Crafting Actionable Insights from Data: A Step-by-Step Approach
Data empowers improved decision-making within organizations. However, many businesses excel at data collection but struggle to derive meaningful insights from it. They may profess to be data-driven, yet they often depend on intuition for critical decisions.
As a Data Scientist, your role is to clarify and convey data insights to stakeholders, enabling them to make more informed choices.
Your value lies not merely in the analyses you conduct or the models you create, but in the tangible business outcomes that result from your efforts—this distinction is what separates senior data scientists from their junior counterparts.
In this guide, I present a comprehensive roadmap based on my experiences at Rippling, Meta, and Uber, aimed at transforming data into actionable insights.
The following topics will be explored:
1. Identifying key metrics: Understanding the revenue model and drivers for your business.
2. Effective tracking: Setting up monitoring systems while avoiding common pitfalls, including mastering time horizons and seasonal patterns.
3. Extracting insights: A structured approach to pinpointing issues and opportunities, along with common trend types you'll encounter.
While this may seem straightforward, the intricacies lie within the details, so let's delve into each aspect.
Part 1: Identifying Key Metrics
Begin by determining which metrics are essential for tracking and analysis. To maximize impact, focus on those that significantly influence revenue.
Start with a high-level revenue equation (e.g., “Revenue = Impressions * CPM / 1000” for an ad-supported business, where CPM is the price per 1,000 impressions) and dissect each element to uncover the underlying drivers. The specific revenue equation will vary based on your business model.
The resulting driver tree, with outputs at the top and inputs at the bottom, illustrates the factors that influence business results and indicates the dashboards required for comprehensive investigations.
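To make the decomposition concrete, here is a minimal Python sketch of the ad-supported equation above, broken down by one level. The input names (`daily_active_users`, `impressions_per_user`) and all figures are illustrative assumptions, not data from a real product.

```python
def impressions(daily_active_users: float, impressions_per_user: float) -> float:
    """One level down the tree: impressions are driven by users and engagement."""
    return daily_active_users * impressions_per_user


def ad_revenue(total_impressions: float, cpm: float) -> float:
    """Top of the tree: Revenue = Impressions * CPM / 1000."""
    return total_impressions * cpm / 1000


# Hypothetical inputs for a single day.
baseline = ad_revenue(impressions(200_000, 25), cpm=4.0)
# Same CPM and engagement, 10% more active users.
more_users = ad_revenue(impressions(220_000, 25), cpm=4.0)

print(baseline, more_users)
```

Writing each node as its own function mirrors the tree: an input at the bottom can be changed in isolation to see how it moves the output at the top.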
Example: Below is a partial driver tree for an ad-based B2C product.
Understanding Leading and Lagging Metrics
While the revenue equation suggests direct input-output relationships, this is often misleading.
A classic example is the Marketing & Sales funnel: leads convert into qualified opportunities, which then close—this process can span several months, depending on the business and customer type.
In essence, when analyzing outcome metrics like revenue, you're often reflecting on actions taken weeks or months prior.
Generally, the deeper you go in your driver tree, the more likely you are to encounter leading indicators; conversely, metrics at the top tend to be lagging.
Quantifying the Lag
Analyzing historical conversion windows is crucial for understanding the lag associated with your metrics.
This analysis enables you to work backwards—knowing how far back to look for causes of revenue fluctuations—and project forward, predicting how long it will take to observe the effects of new initiatives.
From my observations, establishing rules of thumb (e.g., the average time for a new user to become active) can yield 80-90% of the insights needed, eliminating the need for excessive complexity.
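As a rough illustration of such a rule of thumb, the sketch below derives a typical lag from historical conversion windows. The deal dates are made up, and the choice of the median and a 90-day cutoff is an assumption for the example rather than a recommendation.

```python
from datetime import date
from statistics import median

# Hypothetical closed deals: (lead created, deal closed).
closed_deals = [
    (date(2024, 1, 3), date(2024, 2, 14)),
    (date(2024, 1, 10), date(2024, 3, 10)),
    (date(2024, 2, 1), date(2024, 4, 16)),
    (date(2024, 2, 15), date(2024, 5, 15)),
    (date(2024, 3, 1), date(2024, 6, 29)),
]

# Lag in days between the leading event (lead) and the lagging one (close).
lags = sorted((closed - created).days for created, closed in closed_deals)
median_lag = median(lags)


def share_closed_within(days: int) -> float:
    """Fraction of historical deals that closed within `days` of lead creation."""
    return sum(lag <= days for lag in lags) / len(lags)


print(f"median lag: {median_lag} days")
print(f"closed within 90 days: {share_closed_within(90):.0%}")
```

The median tells you roughly how far back to look when revenue moves, and the share within a cutoff tells you how long to wait before judging a new initiative.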
Part 2: Effective Monitoring and Avoiding Common Pitfalls
With your driver tree established, how do you leverage it to monitor business performance and derive insights for stakeholders?
The initial step involves creating a dashboard to track key metrics. I won't delve into a comparison of various BI tools here (I may address that in a future post).
Everything discussed can be executed using Google Sheets or any other tool, so your choice of BI software shouldn't be a limiting factor.
Instead, let’s focus on best practices that facilitate data comprehension while avoiding common traps.
1. Selecting the Appropriate Time Frame for Each Metric
While early detection of trends is beneficial, it’s vital to avoid the pitfall of analyzing overly granular data that can often lead to misleading insights.
Consider the time horizon of the activities you're measuring and whether you can act on the data:
- Real-time data is advantageous for B2C platforms like Uber, as 1) transactions have brief lifecycles (Uber rides are typically requested, accepted, and completed in under an hour) and 2) the platform has mechanisms for immediate response (e.g., surge pricing, incentives, driver communication).
- Conversely, daily Sales data in a B2B SaaS environment can be noisy and less actionable due to prolonged deal cycles.
Also, align the time horizon of the goals with the metrics. If your partner teams have monthly goals, the default view for these metrics should be monthly.
However, the primary issue with monthly metrics (or longer periods) is the limited number of data points, which delays your performance updates.
One solution is to use rolling averages for metrics: this allows you to detect recent trends while smoothing out noise.
Example: Monthly numbers may suggest you're on track to meet the April target; however, a 30-day rolling average might reveal a significant drop in revenue generation, indicating an urgent need for investigation.
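A trailing average like the one described above can be computed in a few lines. This is a minimal sketch with invented daily figures; it uses a 5-day window to keep the data short, whereas the example above assumes 30 days.

```python
def rolling_mean(values, window):
    """Trailing mean over `window` points; None until a full window exists."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the point that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


# Hypothetical daily revenue: a strong start, then a weaker recent stretch.
daily_revenue = [120, 118, 122, 119, 121, 90, 88, 91, 87, 89]
smoothed = rolling_mean(daily_revenue, window=5)

print("period-to-date total:", sum(daily_revenue))
print("rolling mean, first full window:", smoothed[4])
print("rolling mean, latest:", smoothed[-1])
```

The period-to-date total still benefits from the strong early days, while the latest rolling value reflects only the recent, weaker run rate.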
2. Establishing Benchmarks
To extract insights from metrics, context is essential.
- The simplest way is to benchmark metrics over time: Is the metric improving or declining? Ideally, you should have a target level for the metric.
- If an official goal exists for the metric, that's excellent. If not, you can still assess progress by deriving implied goals.
Example: If the Sales team has a monthly quota without a defined pipeline generation goal, you can examine the historical ratio of open pipeline to quota ("Pipeline Coverage") as your benchmark. Be cautious: This approach assumes steady performance (i.e., the team maintains a consistent conversion rate from pipeline to revenue).
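The implied goal in this example can be derived as follows. This is a sketch under the same steady-performance assumption; the history values and the use of a simple average are illustrative choices.

```python
# Hypothetical history: (open pipeline at start of month, quota for that month).
history = [
    (150_000, 50_000),
    (160_000, 50_000),
    (180_000, 60_000),
    (165_000, 55_000),
]

# Pipeline Coverage = open pipeline / quota, averaged over past months.
coverage_ratios = [pipeline / quota for pipeline, quota in history]
benchmark_coverage = sum(coverage_ratios) / len(coverage_ratios)


def implied_pipeline_goal(quota: float, coverage: float = benchmark_coverage) -> float:
    """Pipeline needed to support `quota` if historical coverage holds."""
    return quota * coverage


print(f"benchmark coverage: {benchmark_coverage:.2f}x")
print(f"implied pipeline goal for a 60k quota: {implied_pipeline_goal(60_000):,.0f}")
```

If the conversion rate from pipeline to revenue has drifted, the benchmark should be recomputed on a more recent window before relying on it.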
3. Accounting for Seasonality
In most businesses, it’s crucial to account for seasonality when interpreting data. Does the metric exhibit recurring patterns based on time of day, week, or month?
Example: Consider the monthly trend of new ARR in a B2B SaaS company.
A drop in new ARR during July and August might cause concern, prompting an extensive investigation. However, layering annual data reveals a predictable summer lull, indicating that business is expected to rebound in September.
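One simple way to layer annual data is to compare each month with the same month a year earlier, not only with the previous month. The sketch below uses invented new ARR figures (in thousands) to show how the two views can disagree.

```python
# Hypothetical new ARR by month, in thousands.
new_arr = {
    2023: {"Jun": 100, "Jul": 80, "Aug": 78, "Sep": 105},
    2024: {"Jun": 120, "Jul": 96, "Aug": 95, "Sep": 126},
}


def month_over_month(year: int, month: str, previous_month: str) -> float:
    """Relative change versus the previous month in the same year."""
    return new_arr[year][month] / new_arr[year][previous_month] - 1


def year_over_year(year: int, month: str) -> float:
    """Relative change versus the same month one year earlier."""
    return new_arr[year][month] / new_arr[year - 1][month] - 1


print(f"Jul 2024 vs Jun 2024: {month_over_month(2024, 'Jul', 'Jun'):+.0%}")
print(f"Jul 2024 vs Jul 2023: {year_over_year(2024, 'Jul'):+.0%}")
```

In this made-up data, July looks alarming month over month but healthy year over year, which is the pattern a recurring summer lull would produce.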
Seasonality can occur on different timelines; for instance, certain weekdays may yield stronger or weaker performance, or activity may ramp up towards month-end.
Example: If you assess the Sales team's performance partway through April and find $26k in revenue against a $50k goal, you might conclude they’re on track to miss the target with only six business days left. But if you know the team typically closes many deals at month-end, you might discover they’re on a solid trajectory when you compare cumulative sales at this point in the month against previous months.
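The comparison against previous months can be sketched as a simple pacing calculation. The historical figures below are assumptions chosen for illustration; only the $26k and $50k values come from the example above.

```python
# Hypothetical history: for each past month, revenue booked by the same
# business day we are at now, and that month's final total.
past_months = [(24_000, 50_000), (26_000, 52_000), (28_600, 55_000)]

# Typical share of the month's revenue that is booked by this point.
typical_share = sum(booked / total for booked, total in past_months) / len(past_months)


def projected_month_end(booked_so_far: float, share_by_now: float = typical_share) -> float:
    """Scale current revenue by the share usually booked by this point."""
    return booked_so_far / share_by_now


goal = 50_000
projection = projected_month_end(26_000)
on_track = projection >= goal

print(f"typical share booked by now: {typical_share:.0%}")
print(f"projected month-end revenue: {projection:,.0f} (on track: {on_track})")
```

With only a few past months the typical share is noisy, so it is worth looking at the range across months and not just the average.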
4. Addressing "Baking" Metrics
A frequent pitfall in metric analysis is examining figures that haven't had enough time to "bake," or reach their final value.
Common examples include:
1. User acquisition funnel: Measuring conversions from traffic to signups to activation without knowing how many recent signups will eventually convert.
2. Sales funnel: Your average deal cycle spans several months, making it unclear how many recent open deals will close.
3. Retention: Assessing how well a specific user cohort retains with the business.
In these cases, recent cohort performance may appear worse than it truly is due to incomplete data.
If you prefer not to wait, you generally have three strategies to address this issue:
Option 1: Segment the metric by time period
A straightforward method is to analyze aggregate metrics by time period (e.g., first week conversion, second week conversion). This allows for early insights while ensuring that comparisons are equitable, avoiding bias toward older cohorts.
You can visualize this in a cohort heatmap. For instance, tracking conversions from signup to first transaction reveals trends more accurately.
This approach allows you to see that, on a comparable basis, the conversion rate is declining, which might not be apparent if only looking at the aggregate conversion rate.
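The data behind such a cohort heatmap can be assembled as below. This is a minimal sketch with invented cohorts; the field names `signups` and `converted_by_week` are hypothetical.

```python
# Hypothetical weekly signup cohorts. `converted_by_week[k]` is the cumulative
# number of users with a first transaction within k+1 weeks of signup. Newer
# cohorts have fewer entries because the later weeks have not happened yet.
cohorts = {
    "2024-W01": {"signups": 1000, "converted_by_week": [100, 150, 180, 200]},
    "2024-W02": {"signups": 1000, "converted_by_week": [90, 135, 160]},
    "2024-W03": {"signups": 1200, "converted_by_week": [96, 144]},
    "2024-W04": {"signups": 1100, "converted_by_week": [77]},
}


def conversion_at_age(cohort, week_index):
    """Cumulative conversion rate at a given cohort age, or None if not baked."""
    observed = cohort["converted_by_week"]
    if week_index >= len(observed):
        return None
    return observed[week_index] / cohort["signups"]


max_age = max(len(c["converted_by_week"]) for c in cohorts.values())
heatmap = {
    name: [conversion_at_age(c, k) for k in range(max_age)]
    for name, c in cohorts.items()
}

# Comparing the same column (same cohort age) keeps the comparison fair.
week_1_rates = [row[0] for row in heatmap.values()]

for name, row in heatmap.items():
    cells = ["  n/a" if rate is None else f"{rate:5.1%}" for rate in row]
    print(name, " ".join(cells))
```

Reading down a single column compares every cohort at the same age, so the first-week decline in this made-up data is not an artifact of newer cohorts having had less time to convert.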
Option 2: Alter the metric definition
In certain situations, you can redefine the metric to avoid examining incomplete data. For example, rather than tracking how many deals opened in March have closed so far, analyze how many of the deals that closed in March were won versus lost; that number will not change as time passes.
Option 3: Forecasting
Using historical data, you can project where a cohort's final performance is likely to land. Over time, as more data accumulates, your forecasts will align more closely with actual values.
Caution: Approach forecasting cohort performance carefully, as it’s easy to miscalculate. For instance, in a B2B scenario with low win rates, a single deal can significantly skew a cohort's performance, making accurate forecasting challenging.
Part 3: Extracting Insights from the Data
All this data is beneficial, but how do we convert it into actionable insights?
Given limited time, prioritize by examining the largest gaps and notable movements:
- Where are teams falling short of their goals? Where is unexpected success occurring?
- Which metrics are declining? What trends are reversing?
Once you identify a trend worth investigating, you'll need to drill down to uncover the root causes so that business partners can develop targeted responses.
To provide a structured approach for your analysis, I’ll outline key archetypes of metric trends that you may encounter, along with concrete examples from real-life experiences.
1. Net Neutral Movements
When observing significant shifts in a metric, begin by analyzing the driver tree from the top down. This helps you determine whether the change actually impacts the key metrics you and your team care about; if not, the urgency of finding the root cause diminishes.
Example Scenario: The conversion rate from visits to signups on your website plummets. Instead of panicking, you check total signups and find that they remain stable. The apparent drop in conversion rate stems from a surge in low-quality traffic rather than a decline in your core traffic's performance.
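The check in this scenario can be sketched as follows. The traffic sources and numbers are invented for illustration; the point is to look at the output metric (signups) and the core segment before reacting to the blended rate.

```python
# Hypothetical (visits, signups) per traffic source, before and after the drop.
before = {"core": (10_000, 500), "campaign": (1_000, 5)}
after = {"core": (10_000, 500), "campaign": (15_000, 25)}


def total_signups(period):
    return sum(signups for _, signups in period.values())


def overall_conversion(period):
    """Blended visit-to-signup rate across all traffic sources."""
    visits = sum(v for v, _ in period.values())
    return total_signups(period) / visits


def segment_conversion(period, source):
    visits, signups = period[source]
    return signups / visits


print(f"overall rate: {overall_conversion(before):.1%} -> {overall_conversion(after):.1%}")
print(f"total signups: {total_signups(before)} -> {total_signups(after)}")
print(f"core rate: {segment_conversion(before, 'core'):.1%} -> "
      f"{segment_conversion(after, 'core'):.1%}")
```

Here the blended rate roughly halves while signups and the core conversion rate are essentially unchanged, so the movement is net neutral for the metric that matters.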
2. Denominator vs. Numerator Changes
When analyzing changes in ratio metrics (e.g., impressions per active user, trips per driver), it’s essential to determine whether the numerator or denominator has shifted.
There’s a tendency to presume that the numerator has changed, as these typically reflect engagement or productivity metrics. However, this isn't always the case.
Examples include:
- A decline in leads per Sales rep due to the recent onboarding of new hires, not a demand generation issue.
- Trips per Uber driver per hour may drop not because of fewer rider requests, but because the team increased incentives, resulting in more drivers being active.
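A quick way to see which side of a ratio moved is to report the change in numerator and denominator alongside the ratio itself. The sketch below applies this to the trips-per-driver-hour example with made-up figures.

```python
def ratio_change_breakdown(num_before, den_before, num_after, den_after):
    """Report a ratio's movement together with the change in each component."""
    return {
        "ratio_before": num_before / den_before,
        "ratio_after": num_after / den_after,
        "numerator_change": num_after / num_before - 1,
        "denominator_change": den_after / den_before - 1,
    }


# Hypothetical week: trips are flat, but incentives brought more driver-hours online.
breakdown = ratio_change_breakdown(
    num_before=12_000, den_before=6_000,  # trips, driver-hours
    num_after=12_000, den_after=8_000,
)

for name, value in breakdown.items():
    print(f"{name}: {value:.2f}")
```

In this example the ratio falls from 2.0 to 1.5 with no change in trips at all, which points to supply growth and not to weaker demand.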
3. Isolated / Concentrated Trends
Many metric trends are driven by specific factors within a certain area of the product or business, and aggregate figures may obscure the full picture.
The general process for isolating the root cause involves:
Step 1: Continue breaking down metrics until isolating the trend is no longer possible.
Just as every integer greater than 1 can be factored into prime numbers, most metrics can be broken down further until you reach their fundamental inputs.
This breakdown allows you to pinpoint the issue within your driver tree, facilitating targeted responses.
Step 2: Segment the data to highlight the relevant trend.
Segmentation can reveal whether a particular business area is the source of the issue. Useful dimensions to examine include:
- Geography (region/country/city)
- Time (month, week, etc.)
- Product (various SKUs or product surfaces)
- User demographics (age, gender, etc.)
- Individual entities (sales reps, merchants, users)
Example: Suppose you work at DoorDash and notice a decline in completed deliveries in Boston week-over-week. Instead of hastily proposing solutions to increase demand or completion rates, first isolate the issue by dissecting the "Completed Deliveries" metric.
From the driver tree, you can rule out demand issues, discovering instead that difficulties in securing drivers for order pickups are the cause.
Next, assess whether this is a widespread issue. If a breakdown by merchant shows that the problem affects many restaurants, that dimension doesn’t help you narrow down your focus.
However, by visualizing a heatmap of "delivery requests with no couriers found," you may find that the issue predominantly occurs in Boston's outskirts during nighttime.
What can be done with this information? Identifying the root cause allows for targeted courier acquisition efforts and incentives in specific times and locations instead of spreading efforts thinly across Boston.
In other words, isolating the root cause enhances resource allocation efficiency.
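The segmentation step in the delivery example can be approximated by counting failures per segment and checking how concentrated they are. The records below are invented, and the two dimensions (`zone`, `daypart`) are assumptions standing in for whatever cuts your data supports.

```python
from collections import Counter

# Hypothetical delivery requests with no courier found: (zone, daypart).
unfulfilled = (
    [("outskirts", "night")] * 70
    + [("outskirts", "day")] * 10
    + [("downtown", "night")] * 12
    + [("downtown", "day")] * 8
)

counts = Counter(unfulfilled)
total = sum(counts.values())

# The segment with the most failures and its share of all failures.
top_segment, top_count = counts.most_common(1)[0]
top_share = top_count / total

for segment, count in counts.most_common():
    print(segment, f"{count / total:.0%}")
```

Raw counts are only a first pass: a segment with more requests will naturally have more failures, so a follow-up would divide by total requests per segment to compare failure rates.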
Additional examples of concentrated trends include:
- A small group of "whales" generating most in-game purchases in an online game, prompting the team to concentrate retention and engagement efforts on this segment.
- A few support representatives being responsible for the majority of escalated support tickets to Engineering, providing a targeted opportunity to alleviate Engineering's workload through training.
4. Mix Shifts
A common source of confusion in performance analysis arises from mix shifts and Simpson’s Paradox.
Mix shifts refer to changes in the composition of a total population. Simpson’s Paradox highlights the counterintuitive scenario where trends observed in the total population either disappear or reverse when examining subcomponents (or vice versa).
What does this look like in practice?
Suppose you work at YouTube and notice a decline in revenue. Further investigation reveals a long-term decrease in CPMs.
Since CPM as a metric cannot be decomposed further, the next step is to segment the data, but at first that does not seem to help: CPMs in every individual region appear stable.
Here’s where mix shifts and Simpson’s Paradox become relevant: each region's CPM remains unchanged, but a shift in impression composition from the US to APAC—with APAC generally having a lower CPM—results in a lower overall CPM.
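The effect is easy to reproduce numerically. In the sketch below, per-region CPMs are held constant and only the impression mix changes; all figures are invented for illustration.

```python
# Hypothetical per-region CPMs, identical in both periods.
cpm = {"US": 10.0, "APAC": 2.0}

# Hypothetical impressions (in millions): the mix shifts toward APAC.
impressions_before = {"US": 600, "APAC": 400}
impressions_after = {"US": 500, "APAC": 700}


def blended_cpm(impressions, cpm_by_region):
    """Overall CPM is the impression-weighted average of regional CPMs."""
    total = sum(impressions.values())
    weighted = sum(impressions[region] * cpm_by_region[region] for region in impressions)
    return weighted / total


before = blended_cpm(impressions_before, cpm)
after = blended_cpm(impressions_after, cpm)

print(f"blended CPM: {before:.2f} -> {after:.2f}")
print(f"US share of impressions: "
      f"{impressions_before['US'] / sum(impressions_before.values()):.0%} -> "
      f"{impressions_after['US'] / sum(impressions_after.values()):.0%}")
```

No regional CPM moved, yet the blended CPM falls because a larger share of impressions now comes from the lower-CPM region.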
Understanding the precise root cause enables a more tailored response. Based on this insight, the team can either strive to rekindle growth in high-CPM regions, explore additional monetization strategies for APAC, or aim to offset the lower value of individual impressions through significant volume growth in the large APAC market.
Final Thoughts
Remember, data alone lacks inherent value. Its value emerges only when used to generate insights or recommendations for users or internal stakeholders.
By adhering to a structured framework, you’ll effectively identify significant trends within your data, and by applying the tips outlined above, you can differentiate signal from noise and prevent erroneous conclusions.
If you are interested in more content like this, consider following me on Medium, LinkedIn, or Substack.