Decisions happen too fast. We make 10, 20, 30 choices daily with incomplete context. About half of those decisions are wrong.
We recently hosted a webinar with Propel discussing how proper data infrastructure enables dormant user reactivation. For a typical product, over 90% of users become inactive, yet reactivating just 5% of them can boost your customer base by 35-40%. This massive opportunity requires the right technical foundation.
Here's how we approach building data systems that enable effective lifecycle marketing and user reactivation.
Key takeaways
- Over 90% of users become dormant, but reactivating 5% can increase your customer base by 35-40%
- Companies need five core components: ETL tools, data warehouse, modeling layer, BI tools, and reverse ETL
- Marketers who learn basic SQL and understand data flow collaborate better with technical teams
- Start with one proven attribution path before building complex multi-touch models
- AI will compress decision time from question to answer, not replace human judgment
Problems we solve for growth companies
Companies between $20 million and $100 million in revenue come to us with two distinct data challenges that prevent effective user reactivation.
Companies graduating from spreadsheets
These businesses run operations on spreadsheets and native dashboards within Facebook Ads or individual tools. They need consolidation but don't know where to start. They lack someone to decide on tools, negotiate prices, implement a data warehouse, and create a KPI strategy.
Companies with underperforming data teams
These organizations already hired data professionals but aren't seeing results. Wrong tools, missing team structure, no stand-ups or retrospectives, absent version control and peer reviews. We deploy a full-stack data team approach, bringing their team members into our process while running the entire data operation.
Data functions as a service organization. We support finance, marketing, inventory, and sales teams. We systematically meet stakeholders, understand their first, second, and third-degree questions, then build both short-term answers and long-term infrastructure.
Essential data infrastructure components
Setting up data infrastructure requires making vendor decisions based on budget, immediacy, and team composition. We don't take kickbacks from vendors. We build engineering solutions that work for each specific client.
ETL for data movement
You need some way to pipe data in and around. Common tools include Fivetran and Polytomic. Budget determines your choice here.
Data warehouse for storage
You need somewhere to land that data. Either a data lake like S3 or a data warehouse. Key products include Databricks, Snowflake, and BigQuery. This becomes your central repository.
Modeling layer for business logic
We do all our work in dbt. It's a transformation framework where you write SQL and structure the orchestration of models and logic. How do you define an active customer? Say an active customer equals five clicks and three sessions. That business logic needs to be defined and codified in dbt within a version-controlled environment.
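As a rough illustration, that definition might be codified in a dbt model along these lines. This is a minimal sketch, not production code: the table and column names (stg_events, event_type, session_id) are hypothetical, and the date function is Snowflake-style.

```sql
-- models/marts/active_customers.sql -- hypothetical dbt model
-- Codifies one "active customer" rule: at least 5 clicks and 3 sessions in 30 days.
with activity as (
    select
        customer_id,
        sum(case when event_type = 'click' then 1 else 0 end) as clicks,
        count(distinct session_id)                             as sessions
    from {{ ref('stg_events') }}
    where event_timestamp >= dateadd(day, -30, current_date)  -- Snowflake syntax
    group by customer_id
)

select customer_id, clicks, sessions
from activity
where clicks >= 5
  and sessions >= 3
```

Because the model lives in a version-controlled repository alongside every other dbt model, the definition of "active" gets reviewed, versioned, and reused instead of being re-derived in each dashboard.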
BI layer for reporting
What does reporting look like? Will people self-serve or will executives receive weekly reports? We make decisions between Tableau, Looker, and Sigma based on who the stakeholder is and what they need.
Reverse ETL for activation
Sending information from your warehouse back to tools enables lifecycle marketing. This means sending data to Segment for marketing purposes, into Klaviyo for segmentation, or any activation tool.
For example, you calculate a lead score predicting who's ready for an upsell. As a lifecycle marketer, you want to target that person. You need that score in Klaviyo or Braze to trigger emails. This flow requires calculating the score somewhere, sending it accurately, maintaining the relationship between the data and marketing teams, triggering the action, then measuring conversion.
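Here's a rough sketch of the warehouse side of that flow. The scoring rule, thresholds, and table names (stg_events, stg_orders) are all hypothetical; the point is that the score is computed as a mart, and the reverse ETL sync pushes it into Klaviyo or Braze.

```sql
-- models/marts/upsell_lead_scores.sql -- hypothetical dbt model
-- The reverse ETL sync reads this table and pushes upsell_score into Klaviyo/Braze.
with usage as (
    select customer_id, count(distinct session_id) as sessions_30d
    from {{ ref('stg_events') }}
    where event_timestamp >= dateadd(day, -30, current_date)
    group by customer_id
),

orders as (
    select customer_id, count(*) as orders_90d
    from {{ ref('stg_orders') }}
    where ordered_at >= dateadd(day, -90, current_date)
    group by customer_id
)

select
    u.customer_id,
    -- naive rule-based score, purely for illustration
    least(100, u.sessions_30d * 5 + coalesce(o.orders_90d, 0) * 20)   as upsell_score,
    (u.sessions_30d >= 10 and coalesce(o.orders_90d, 0) >= 1)         as upsell_ready
from usage u
left join orders o using (customer_id)
```

The marketing team then builds the Klaviyo or Braze flow off upsell_ready, and conversion on that flow closes the measurement loop.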
Three-layer medallion architecture
We typically see companies evolve through specific data maturity stages. Understanding this evolution helps you make better infrastructure decisions.
Raw layer preserves source data
We ingest data into a raw schema inside the main database. Raw data means you don't modify anything. You bring it from the source without changes. Shopify arrives structured like Excel tables. Amazon provides JSON files, not tables. Both land in raw format.
Intermediate layer standardizes formats
After raw ingestion comes modeling work. In ETL terminology, you've extracted from sources and now enter transformation. Data moves from raw to intermediate following medallion architecture. The intermediate layer prepares data better. It's more curated. Unstructured JSON from Amazon becomes organized tables. Scattered formats get standardized.
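As a concrete sketch of that step, an Amazon order that landed as a single JSON payload in the raw layer might be flattened like this. The field names and the colon syntax for JSON access are Snowflake-style and purely illustrative; other warehouses use different JSON functions.

```sql
-- models/intermediate/int_amazon_orders.sql -- hypothetical, Snowflake-style JSON access
-- The raw table keeps the untouched payload; this model turns it into columns.
select
    payload:AmazonOrderId::string            as order_id,
    payload:BuyerInfo.BuyerEmail::string     as buyer_email,
    payload:OrderTotal.Amount::number(10, 2) as order_total,
    payload:PurchaseDate::timestamp          as purchased_at
from {{ source('raw', 'amazon_orders') }}
```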
Marts layer organizes by business function
From intermediate, transformation continues into marts. We separate by different categories based on client needs. Sales marts combine Shopify and Amazon data. Customer support marts consolidate support information. Logistics marts handle fulfillment. Summary tables provide high-level metrics.
When data reaches marts, it's curated and business-ready. This is ELT, not ETL: Extract, Load, then Transform. You extract from sources and load into the raw schema without changes, then perform the transformations that create the intermediate and marts layers. Transformation happens inside the warehouse using its compute power.
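To make the marts layer concrete, a sales mart might union the two channels into one business-ready table. This assumes hypothetical intermediate models named int_shopify_orders and int_amazon_orders (the latter sketched above).

```sql
-- models/marts/sales/fct_orders.sql -- hypothetical dbt model
-- One row per order across both channels, ready for BI and lifecycle tools.
select
    order_id,
    'shopify' as channel,
    customer_email,
    order_total,
    ordered_at
from {{ ref('int_shopify_orders') }}

union all

select
    order_id,
    'amazon' as channel,
    buyer_email   as customer_email,
    order_total,
    purchased_at  as ordered_at
from {{ ref('int_amazon_orders') }}
```

Every step from raw to mart runs on the warehouse's own compute, which is exactly the ELT point.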
Effective data and marketing collaboration
Many consultants say "we always do this" or "we always do that." We don't. We build great engineering solutions, mixing the tools that work for each client.
Marketers should learn SQL
SQL is really close to English. You can learn the basics in an afternoon. When you write even a little SQL, you build empathy and camaraderie with your data team.
Here's why it matters. You want to know how many people click the nav bar then drop off. If you can write that query and hand it to the data team saying "here's the business problem, here's exploratory work I did," they'll be much happier. Without this, there's extensive back and forth where they probably get it wrong initially.
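For instance, the exploratory version of that nav-bar question could be as simple as the query below. The events table and event names are hypothetical; your data team will know the real ones.

```sql
-- Hypothetical exploratory query: of sessions that clicked the nav bar,
-- how many never started checkout?
with nav_sessions as (
    select distinct session_id from events where event_name = 'nav_bar_click'
),

checkout_sessions as (
    select distinct session_id from events where event_name = 'checkout_started'
)

select
    count(*)                       as nav_click_sessions,
    count(*) - count(c.session_id) as dropped_off_sessions
from nav_sessions n
left join checkout_sessions c using (session_id)
```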
Ask for access to BigQuery or Snowflake. Say "I'd love an account, I'm interested in these questions, can you walk me through sample queries?" If your data team is great, they'll love that someone else wants to run queries.
Explain problems not solutions
Marketers commonly think "I need this column" or "I need this value." Instead, explain the question you're trying to answer. Let them figure out the solution.
If you say "I need this specific column here," most people build just that. They don't consider supporting many team members systematically. Come in saying "we want to target users dropping off from this product feature." Let them propose solutions, because you'll soon be back asking about the next thing, and one point solution won't translate to every feature, event, or drop-off you need.
Understanding technical concepts helps
Know what APIs are. What are server-to-server events versus frontend events? This understanding helps you push back appropriately.
When data teams hear about new APIs or changing frontend events, those are trigger words that sound overwhelming. But the ask might be as small as adding a snippet. If you understand the terminology, you can explain specifically what you need without shocking them.
Managing team structure effectively
This debate continues in data: decentralized versus centralized teams. The natural evolution shows companies starting with a centralized data team serving everybody.
Challenges with pure centralization
Everybody's problems are treated as equal, except executives' requests, which get priority. You lack subject matter experts. Everyone must be an expert in everything, which means nobody specializes in anything.
Problems with full decentralization
Managing data teams differs from having data people in marketing teams. How do you judge performance? How do you promote? What are KPIs? Writing SQL isn't the same as decreasing CPM or CAC. Data people isolated on marketing teams need camaraderie within larger data organizations.
Recommended hybrid approach
For companies under $15-20 million revenue, stay centralized. Companies that decentralize too fast lose data teams. They're isolated, getting criticized by business teams without peer support.
Have subject matter experts within the centralized data team as first contacts. Someone handles marketing questions and triages requests before translating for data engineers. Otherwise engineers with hundreds of tickets might not understand your ask and you get backlogged.
Assign people to business units so they go deeper. They become marketing data experts understanding event datasets, customer data, and reverse ETL procedures.
Attribution and measurement challenges
Many clients struggle with attribution, especially for longer sales cycles. This represents a complex problem without perfect solutions.
Start with known examples
Take a sale where you have a concrete understanding of the attribution source. ABC Corporation came through a LinkedIn post, did a demo, then converted. Trace it through your data. Do you have the demo data? Can you see that their UTM parameter was LinkedIn?
Work on one lead with known truth. Ensure you can accomplish that before moving to scale. If dealing with hundreds of thousands of consumers, you might need an answer today and must turn on attribution immediately.
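A sketch of that single-lead trace might look like the query below. The demo_requests, sessions, and opportunities tables are hypothetical; the point is checking whether each touchpoint was actually captured.

```sql
-- Hypothetical trace of one known deal: is every touchpoint in the data?
select
    d.company_name,
    d.submitted_at as demo_requested_at,
    s.utm_source,                      -- expect 'linkedin' for this deal
    s.utm_campaign,
    o.closed_at    as converted_at
from demo_requests d
left join sessions s
    on s.session_id = d.session_id
left join opportunities o
    on o.company_name = d.company_name
where d.company_name = 'ABC Corporation'
```

If any of those joins comes back empty, you've found an instrumentation gap before trying to scale attribution to thousands of leads.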
Build systematic tracking
You need to capture every touchpoint systematically. Without data points, you can't reach attribution conclusions. This means instrumenting demo forms, tracking UTM parameters, storing user journeys.
Tools like Northbeam, Triple Whale, and others provide attribution capabilities. But start small, validating that you have the data, before attempting complex models.
Accept imperfect attribution
True attribution may never be possible. You never know every influence on user behavior. Multiple teams claim credit for improvements. Product changes, performance marketing, email campaigns all contribute.
No single answer exists. Focus on finding repeatable patterns you can measure consistently rather than perfect attribution.
AI's role in data infrastructure
AI won't eliminate data engineering jobs. You'll probably need fewer people who are more effective with their time.
Current AI applications
Text-to-SQL agents make data accessible without technical skills. But without understanding SQL, you can't verify answers are correct. The agent might say "I can't do that" and without context, you can't push back.
AI agents won't understand missing tables, incorrectly named columns, or missing data sources. Your data team knows this context. They need to document it or provide context for agent interaction.
Compressing decision cycles
My biggest interest involves AI in decision-making processes. Our goal is helping people make more decisions, and more accurate decisions. That's the biggest bottleneck for businesses.
The time between question and answer needs compression. Not just faster Python writing. The real bottleneck: an executive asks a question, waits for an answer, asks follow-ups, then reaches decisions.
Text-to-SQL addresses part of this. But data exists outside tables too: meeting notes, Slack conversations, documentation, industry benchmarks. All of this needs to be available to the AI, alongside your data, for decision support.
Building trust through iteration
We build what are called golden datasets, typically 50-100 question-answer pairs documented with the team. You'd be surprised how often people disagree on what constitutes an accurate answer.
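One entry in such a golden dataset might pair the plain-language question, the canonical query, and the answer the team has signed off on. Everything below is illustrative, reusing the hypothetical fct_orders mart sketched earlier and Snowflake-style date functions.

```sql
-- Golden-dataset entry (illustrative)
-- Question: "How many orders did we receive last month?"
-- Agreed answer: the number this query returns, reviewed and signed off with the team
select count(*) as orders_last_month
from fct_orders
where ordered_at >= date_trunc('month', dateadd(month, -1, current_date))
  and ordered_at <  date_trunc('month', current_date)
```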
We build LLM evaluations scoring responses against control datasets. Over time we increase scores while tackling question tiers. Basic questions like page views and orders. Then intermediate. Then advanced multi-step questions.
Even scoring zero on advanced questions gives us a baseline for measuring improvement. We put scoring mechanisms and feedback loops in place. When the AI responds in Slack, users provide feedback about accuracy, and that feedback returns to our team for iteration.
Implementation timeline and approach
This iterative process takes time with no shortcuts. In one month you'll complete basic questions. Another two to three months gets through medium complexity. Then you tackle advanced challenges.
Setting realistic expectations
We're not a software business. We can't sell dreams about "AI data analyst answering any question." That's impossible currently. It requires business context not written anywhere. Not in code or meeting notes.
Consultancies like ours hold advantageous positions. Doing our jobs requires asking hundreds of questions and documenting answers. That becomes perfect AI context.
Measuring success systematically
Success requires more than just deploying tools. You need golden datasets defining truth. Evaluation frameworks measuring accuracy. Feedback systems capturing user input. Documentation explaining business logic.
The process never truly ends. As businesses evolve, new questions emerge. Customer bases shift. Product features change. Data infrastructure must evolve continuously.
FAQ
Why choose ELT over ETL?
ELT leverages cloud warehouse compute power more efficiently. You preserve raw data exactly as received, then transform using Snowflake or BigQuery processing capabilities. This approach maintains data lineage and enables reprocessing when business logic changes.
Should we build analytics in-house?
Unless you're selling event data or packaging analytics as a product, use third-party tools. Amplitude, Mixpanel, and similar platforms handle complexities you shouldn't rebuild. Focus engineering resources on your core business.
How much SQL should marketers learn?
Learn enough to write basic queries and understand data structure. You don't need advanced skills, just ability to explore data and communicate precisely with technical teams. One afternoon gets you started.
When should companies decentralize data teams?
Stay centralized under $15-20 million revenue. Larger companies benefit from embedded specialists only when they can afford complete data teams within departments. Premature decentralization causes talent retention problems.
How will AI change data work in three years?
Fewer people accomplishing more. AI assists decisions rather than replacing judgment. Focus shifts from writing code to ensuring decision quality. The biggest impact: compressing time from business question to actionable answer.
Summary
Building data infrastructure for user reactivation requires systematic approaches balancing technical excellence with business practicality. The five-component stack (ETL, warehouse, modeling, BI, reverse ETL) creates foundations for lifecycle marketing success.
Success depends equally on technology choices and team collaboration. When marketers learn SQL and data engineers understand business problems, progress accelerates. Centralized teams with specialized liaisons optimize both efficiency and expertise.
Attribution remains imperfect but improvable through systematic tracking and realistic expectations. AI enhances rather than replaces human decision-making, with trust building through careful iteration and measurement.
This infrastructure enables the massive opportunity in dormant users. When 90% of users are inactive, even small reactivation improvements drive significant growth.