Why Your Data Team Can't Answer "Simple" Questions
The Architecture Problem Hiding Behind Every "Quick Ask"
Big Idea
Your CFO asks: "What was our net retention by cohort last quarter?" Your VP of Sales asks: "What's our pipeline coverage by segment?"
Simple questions. Clear business need. And yet your data team needs a week to answer them.
This isn't a people problem. Your analysts aren't slow. Your engineers aren't lazy. The question isn't simple. Your data architecture is. And until you fix the architecture, every "quick ask" will keep taking five days, no matter how many analysts you hire.
Breakdown
Part 1: The Anatomy of a "Simple" Question
Let's trace what actually happens when your CFO asks for net retention by cohort.
First, your analyst needs to find the data. Revenue lives in your billing system (Stripe, Chargebee, whatever). Customer segments live in your CRM. Cohort definitions? Probably in a spreadsheet someone built eighteen months ago that may or may not match what's in the database. Or, more honestly, the spreadsheet IS the database.
Then they need to join the data. Billing system IDs don't match CRM IDs cleanly — there's always a lookup table, a mapping file, or worse, a manual crosswalk that someone maintains in Google Sheets. The analyst spends a day just getting the join logic right.
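The crosswalk problem above can be made concrete with a toy sketch. This is not any specific company's schema; the field names (`stripe_customer_id`, `crm_account_id`) and the records are invented for illustration:

```python
# Hypothetical crosswalk between billing-system IDs and CRM account IDs.
# In practice this lives in a lookup table or, worse, a Google Sheet.
crosswalk = {
    "cus_001": "acct_A",
    "cus_002": "acct_B",
}

billing = [
    {"stripe_customer_id": "cus_001", "mrr": 500},
    {"stripe_customer_id": "cus_002", "mrr": 1200},
    {"stripe_customer_id": "cus_003", "mrr": 300},  # no CRM match
]

crm = {
    "acct_A": {"segment": "SMB"},
    "acct_B": {"segment": "Mid-Market"},
}

joined, unmatched = [], []
for row in billing:
    acct = crosswalk.get(row["stripe_customer_id"])
    if acct is not None and acct in crm:
        joined.append({**row, **crm[acct]})
    else:
        unmatched.append(row)  # these rows are where the day disappears

print(len(joined), len(unmatched))  # 2 joined, 1 unmatched
```

The `unmatched` bucket is the real cost: every row that falls through the crosswalk becomes a manual investigation.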
After that, the analyst needs to calculate it. Net retention sounds straightforward until you realize: Do we include plan changes mid-month? How do we handle credits? What about customers who churned and came back? Every company defines this slightly differently, and if those definitions aren't codified anywhere, your analyst is making judgment calls, or worse, guessing.
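Those judgment calls are not cosmetic; they move the number. Here is a hedged sketch of net revenue retention where the only difference is whether returning customers count. All figures are invented:

```python
# A simplified net revenue retention calculation. The formula and the
# reactivation toggle are illustrative, not any company's official definition.
def net_retention(start_mrr, expansion, contraction, churned,
                  include_reactivations=False, reactivated=0.0):
    """NRR = (start + expansion - contraction - churned [+ reactivated]) / start."""
    end_mrr = start_mrr + expansion - contraction - churned
    if include_reactivations:  # one of the judgment calls from the text
        end_mrr += reactivated
    return end_mrr / start_mrr

# Same cohort, two defensible definitions, two different numbers:
strict = net_retention(100_000, 15_000, 4_000, 6_000)
loose = net_retention(100_000, 15_000, 4_000, 6_000,
                      include_reactivations=True, reactivated=4_000)
print(f"{strict:.0%} vs {loose:.0%}")  # 105% vs 109%
```

Four points of spread from a single undocumented choice, which is exactly how last quarter's board deck ends up disagreeing with this quarter's analysis.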
Then they need to validate it. The number comes out at 108%. That doesn't match the 112% that was on last quarter's board deck. Now the analyst is reverse-engineering someone else's calculation from six months ago, trying to figure out which one is right. Or whether both are right, using different definitions.
A "simple" question just consumed four to five days. Not because the analyst is bad at their job — because the infrastructure forced them to do archaeology instead of analysis.
Key points:
A single business question often touches 3-5 source systems
ID mapping between systems is where most time is lost
Undefined or inconsistent metric definitions create rework loops
Validation against historical numbers adds days, not hours
Part 2: Why Hiring More Analysts Doesn't Fix It
When the data team is slow, the instinct is to ask for more headcount. It's the wrong move, and the numbers show why.
Research from the Modern Data Company found that 63% of data practitioners spend 15-20% or more of their time purely on maintenance: updating schemas, managing data quality, and modifying pipelines. A separate TDWI study found that 50% of teams spend over 61% of their time on data integration tasks alone. And the classic 80/20 rule persists: data professionals still spend roughly 80% of their time finding, cleaning, and organizing data, leaving only 20% for actual analysis.
Think about what that means for a team of three analysts at a mid-market SaaS company. If 80% of their time goes to data prep, you effectively have 0.6 analysts doing analysis. Hire a fourth? Now you have 0.8. You just spent $120K+ in fully loaded cost for 0.2 of an analyst's output. That's not a hiring problem. That's a structural one.
The pattern I see across mid-market companies is consistent: teams of 2-5 data people, drowning in ad hoc requests, spending their days doing the same joins, the same data cleaning, the same validation work… over and over. They're not building anything. They're surviving.
Meanwhile, SaaS companies are running leaner than ever. ARR per employee has climbed across every revenue band since 2022, and headcount is down, especially in functions like data, engineering, and support. Companies aren't going to throw bodies at this problem. They can't afford to.
Key points:
80% of analyst time goes to data prep, not analysis. Hiring more people doesn't change the ratio
Mid-market data teams (2-5 people) are typically in permanent reactive mode
The economics of lean SaaS don't support headcount as a solution
Every new hire inherits the same broken architecture
Part 3: The Architecture Gap Nobody Talks About
What's actually broken? Most mid-market data stacks have two layers: raw data (whatever lands from your integrations) and dashboards (whatever your BI tool shows). That's it. Everything in between, the beautiful data pipelines that make all your data actually usable, is missing.
In a well-architected stack, you'd have something like this:
Raw → Cleaned → Modeled → Metrics → Insights
Raw: Data as it lands from source systems, untouched
Cleaned: Deduplicated, typed, validated (data is like people: you want some basic hygiene)
Modeled: Business entities properly joined and related (customers, subscriptions, invoices as connected objects, not isolated tables)
Metrics: Pre-computed, governed definitions (net retention means this, pipeline coverage means that)
Insights: Anomaly detection, variance analysis, trend identification — answers, not just data
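The layering above can be sketched as a chain of functions, each consuming the output of the one before it. This toy version, with invented record shapes, shows only the cleaned and metrics layers, but the point generalizes: each transformation is written once, not redone per request:

```python
# Raw: data exactly as it lands from the source system, mess included.
raw = [
    {"id": " C1 ", "amount": "500"},
    {"id": " C1 ", "amount": "500"},   # duplicate landing
    {"id": "C2", "amount": "1200"},
]

def cleaned(rows):
    """Cleaned layer: deduplicated, typed, validated."""
    seen, out = set(), []
    for r in rows:
        key = r["id"].strip()
        if key not in seen:
            seen.add(key)
            out.append({"id": key, "amount": float(r["amount"])})
    return out

def metrics(rows):
    """Metrics layer: a governed definition, computed in one place."""
    return {"total_mrr": sum(r["amount"] for r in rows)}

print(metrics(cleaned(raw)))  # {'total_mrr': 1700.0}
```

When these functions live in the warehouse (as dbt models, for instance) rather than in an analyst's head, the next person to ask the question queries the result instead of rebuilding the chain.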
Most companies skip straight from "raw" to "dashboards." The analyst becomes the human middle layer, doing the cleaning, modeling, and metric definition work manually on every single request. They are the architecture. Pretty impressive when you think about it, but not scalable.
This is why the same question takes a week every time it's asked. There's no institutional memory in the system. The analyst rebuilds the answer from scratch because the answer was never encoded into the data layer itself.
Key points:
Most mid-market stacks are missing 3 of 5 critical layers
The analyst fills the gap manually, and that doesn't scale
Pre-computed metrics are what make "simple" questions actually simple
Without a proper semantic/metrics layer, AI tools (including Snowflake Intelligence and Databricks Genie) can't help either
Part 4: What Fixing It Actually Looks Like
This isn't a six-month, boil-the-ocean initiative. It's a focused effort to close the gaps that cause 80% of the pain.
Step 1: Audit your top 10 ad hoc requests from the last 90 days.
Literally go through Slack, email, or your ticketing system and list the 10 most common data requests. I guarantee you'll find that 3-4 questions account for the majority of requests. These are your candidates.
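The audit itself can be almost mechanical. Tag each request from Slack, email, or your ticketing system with a category, then count. The request data here is invented:

```python
from collections import Counter

# Hypothetical 90-day request log, one category tag per request.
requests = ["churn", "net_retention", "churn", "pipeline", "churn",
            "net_retention", "one_off_x", "churn", "pipeline", "one_off_y"]

top = Counter(requests).most_common(3)
print(top)  # [('churn', 4), ('net_retention', 2), ('pipeline', 2)]
```

The repeat offenders at the top of the list are your candidates; the long tail of one-offs can stay ad hoc.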
Step 2: Map the data lineage for those questions.
For each question, trace the path from source system to answer. Where does the data come from? Where does it get joined? Where is the logic applied? Write it down. This is usually where teams discover they have three different definitions for "active customer" being used in production. (And hopefully your production database isn't called dev.)
Step 3: Build the intermediate layer.
Encode those definitions into your data warehouse. If you're using dbt, this is a Gold layer with clear, tested, version-controlled metric definitions. Net retention should be a table you can query in five seconds, not a calculation your analyst rebuilds every Tuesday.
Step 4: Create a governed metrics layer.
This is the key piece. A semantic or metrics layer that defines business terms in a way that both humans and tools can use. This is what makes self-service possible, and it's the prerequisite for AI-powered analytics actually working. (If you read the last newsletter on Snowflake Intelligence, this is the foundation that most companies skip.)
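One way to picture a governed metrics layer is as a registry: each metric is defined exactly once, with its logic and a plain-language description, and every consumer resolves it by name. This is a minimal sketch, not a real semantic-layer product; the metric name and formula are invented:

```python
# A toy metrics registry: one definition, many consumers.
METRICS = {}

def metric(name, description):
    """Register a metric so humans and tools share one definition."""
    def register(fn):
        METRICS[name] = {"fn": fn, "description": description}
        return fn
    return register

@metric("pipeline_coverage", "Open pipeline divided by remaining quota.")
def pipeline_coverage(open_pipeline, remaining_quota):
    return open_pipeline / remaining_quota

# Any consumer (dashboard, notebook, AI agent) resolves by name:
m = METRICS["pipeline_coverage"]
print(m["description"], m["fn"](3_000_000, 1_000_000))  # ... 3.0
```

Real semantic layers (dbt's, Snowflake's, Databricks') are far richer, but the contract is the same: the definition lives in one governed place, and nothing downstream reimplements it.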
Step 5: Measure time-to-answer, not dashboard count.
The goal isn't more dashboards. It's faster answers. Track how long it takes to answer those top 10 questions before and after. If your CFO's variance analysis goes from 5 days to 5 minutes, you've won.
Key points:
Start with your 10 most common requests, not your entire data model
Audit reveals you probably have 3-4 "repeat offender" questions
The fix is a metrics layer that encodes business logic once and serves it forever
Measure time-to-answer as the north star metric
Common Mistakes
Mistake 1: Building dashboards instead of data models.
The request comes in: "We need a churn dashboard." So the team builds a dashboard. But they don't standardize the churn definition, don't model the underlying data properly, and don't connect it to other metrics. Three months later, someone asks a slightly different churn question and the analyst starts from scratch. You don't need more dashboards. You need better data models that make dashboards trivial. Dashboards are also meant to be your visual medium, not your transformation engine.
Mistake 2: Letting the analyst become the institutional memory.
When one person knows how all the metrics are calculated, you don't have a data team, you have a single point of failure. If that person leaves, goes on vacation, or gets sick, every question stalls. And then you bother them while they’re trying to enjoy their Mai Tai. Not cool. The knowledge needs to live in the code, not in someone's head.
Mistake 3: Treating data quality as an afterthought.
Teams build beautiful dbt models on top of garbage source data and wonder why the numbers don't match. Schema changes upstream break everything downstream. Data quality monitoring isn't optional, it's foundational. You can't trust answers if you can't trust the data behind them.
Mistake 4: Solving a structural problem with a tool purchase.
"If we just buy Looker / Tableau / ThoughtSpot / [insert tool], our data problems will go away." They won't. Every BI tool assumes your data is already modeled and governed. They visualize what's underneath them. If what's underneath is a mess, you get a pretty visualization of a mess.
How to Use This
Your 30-day playbook:
Week 1: Pull your team's last 90 days of data requests. Categorize them. Identify the top 3-4 repeat questions. Interview the people who asked them: what were they actually trying to decide?
Week 2: Map the data lineage for each repeat question. Document every source system, every join, every calculation. Identify where definitions conflict or where manual work is required.
Week 3: Build the metric definitions for your top question into your warehouse. If you use dbt, create a Gold-layer model with tests. If not, at minimum create a view with documented business logic.
Week 4: Deploy. Route the next instance of that question to the pre-built model instead of having an analyst build it from scratch. Measure time-to-answer. Repeat for the next question.
If you try this, reach out and let me know how it goes.
Final Insight
The companies with the fastest data teams aren't the ones with the most analysts. They're the ones who invested in the boring middle layer: the data models, the metric definitions, the governed logic that turns raw data into answers.
Your data team can answer "simple" questions. They just need an architecture that lets them.
Two ways I can help:
Want the answers without the archaeology? We've built this exact architecture (Raw → Cleaned → Modeled → Metrics → Insights) dozens of times for companies of all sizes. PipeHouse is a managed data platform that delivers the full stack — clean data, modeled metrics, semantic layer, AI agent — fully built and running in 30 days. No archaeology. No rebuilding from scratch. Just answers to the questions your executives are already asking. → PipeHouse
If you want to fix this in-house but need someone to architect the solution and guide your team through it, I take on a limited number of analytics leadership engagements. I'll help you audit your stack, identify the gaps, and build a roadmap that gets your team from reactive to strategic. Reply to this email and let's talk.
— Chris
Sources: