
Real-Time Database Analytics Without a Data Warehouse

James Okonkwo · Developer Advocate · March 24, 2026 · 10 min read

The conventional wisdom says that if you want analytics, you need a data warehouse. Copy your data to Snowflake, BigQuery, or Redshift. Build a pipeline. Set up dbt. Then connect a BI tool. Six months later, you have dashboards.

For a company past a certain scale, that's the right architecture. But for most startups, early-stage companies, and small product teams, it's massive overkill, and it actively slows you down.

Your production database already has all the data you need. This article explains how to run real-time analytics directly against your production database (or a read replica) without standing up a data warehouse.

The Data Warehouse Assumption Is Often Wrong

Data warehouses exist to solve specific problems:

  • You're running heavy analytical queries that would overwhelm your production database
  • You need to join data from dozens of different sources (Stripe, Salesforce, your app DB, ad platforms)
  • You have regulatory requirements to keep analytical data separate
  • Your data team is large enough to maintain ETL pipelines

If none of those apply to you, a data warehouse is a solution to a problem you don't have. The cost isn't just money (though Snowflake and Redshift costs add up fast). It's latency, complexity, and the maintenance burden of keeping your pipeline in sync.

    Most teams under 50 employees and many teams up to a few hundred don't need a warehouse. Their production database, whether it's PostgreSQL, MySQL, Supabase, or PlanetScale, has everything they need.

    What "Real-Time" Actually Means Here

    "Real-time" in analytics is often oversold. For most business use cases, you don't need sub-second latency. You need data that is:

  • Fresh: current within minutes to hours, not milliseconds
  • Accurate: not lagging days behind because a pipeline failed silently
  • Accessible: not locked behind a DBA who has to run the query for you

    Querying your production database directly, especially via a read replica, satisfies all three. You get data that's current as of a few seconds ago, with no pipeline to fail, accessible to anyone with the right tool.

    The Read Replica Pattern

    Before running analytics directly against production, the key concern is load. A heavy analytical query, one that scans millions of rows, does multiple JOINs, and computes aggregates, can slow down your application.

    The solution is a read replica: a copy of your database that continuously replicates changes from the primary and serves read-only traffic. Most managed database providers offer this:

  • PostgreSQL (AWS RDS, Supabase, Neon): Create a read replica in a few clicks from the console
  • MySQL / PlanetScale: Read replicas are supported and recommended for analytical workloads
  • Supabase: Analytics queries can be routed to a separate connection string

    Point your analytics tool at the read replica. Your application traffic hits the primary. Your analytical queries run on the replica without competing for the same resources.
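The primary/replica split reduces to a small connection-routing decision in whatever layer opens database connections. A minimal sketch, assuming hypothetical `PRIMARY_DSN` and `REPLICA_DSN` environment variables (the endpoint names here are invented placeholders, not real infrastructure):

```python
import os

# Hypothetical endpoints; in practice your provider's console gives you a
# separate connection string for the read replica.
PRIMARY_DSN = os.environ.get("PRIMARY_DSN", "postgres://app@primary.internal/appdb")
REPLICA_DSN = os.environ.get("REPLICA_DSN", "postgres://analytics@replica.internal/appdb")

def dsn_for(workload: str) -> str:
    """Route analytics connections to the replica, everything else to the primary."""
    return REPLICA_DSN if workload == "analytics" else PRIMARY_DSN

print(dsn_for("analytics"))  # replica endpoint
print(dsn_for("app"))        # primary endpoint
```

The point of centralizing this in one function is that adding a replica later becomes a configuration change, not a code hunt.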

    If you're on a small database (under a few million rows) and your queries complete in under a second, you may not even need a replica yet. Just query the primary with care: avoid full table scans on hot tables during peak traffic.

    What You Can Build Without a Warehouse

    Here's what's possible with direct database analytics using a natural language tool like AI for Database:

    Revenue reporting:

    SELECT
      DATE_TRUNC('month', created_at) AS month,
      SUM(amount) AS mrr,
      COUNT(DISTINCT customer_id) AS paying_customers
    FROM subscriptions
    WHERE status = 'active'
      AND created_at >= CURRENT_DATE - INTERVAL '6 months'
    GROUP BY month
    ORDER BY month DESC;

    Ask this as: "Show me monthly recurring revenue and active customer count for the last 6 months."
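To make the shape of this query concrete, here is a self-contained sketch against an in-memory SQLite database. SQLite has no DATE_TRUNC, so strftime('%Y-%m', ...) stands in for it; the table and rows are invented toy data, and the date filter is omitted for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE subscriptions (
        customer_id INTEGER, amount REAL, status TEXT, created_at TEXT
    )
""")
# Invented toy rows: two active subscriptions in February, one in March,
# plus one canceled subscription that the WHERE clause filters out.
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?, ?)",
    [
        (1, 50.0, "active", "2026-02-10"),
        (2, 30.0, "active", "2026-02-20"),
        (3, 30.0, "canceled", "2026-02-25"),
        (1, 50.0, "active", "2026-03-05"),
    ],
)
rows = conn.execute("""
    SELECT strftime('%Y-%m', created_at) AS month,
           SUM(amount) AS mrr,
           COUNT(DISTINCT customer_id) AS paying_customers
    FROM subscriptions
    WHERE status = 'active'
    GROUP BY month
    ORDER BY month DESC
""").fetchall()
print(rows)  # [('2026-03', 50.0, 1), ('2026-02', 80.0, 2)]
```

The same GROUP BY month / SUM / COUNT DISTINCT skeleton underlies the retention and funnel queries below; only the grouping key and aggregates change.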

    Retention cohorts:

    SELECT
      DATE_TRUNC('week', first_login) AS cohort_week,
      COUNT(DISTINCT user_id) AS cohort_size,
      COUNT(DISTINCT CASE WHEN last_login >= first_login + INTERVAL '7 days' THEN user_id END) AS retained_week1
    FROM user_activity
    GROUP BY cohort_week
    ORDER BY cohort_week DESC;

    Ask this as: "Show me weekly user cohorts and how many came back after 7 days."

    Funnel analysis:

    SELECT
      COUNT(*) FILTER (WHERE signed_up = true) AS signups,
      COUNT(*) FILTER (WHERE connected_db = true) AS connected,
      COUNT(*) FILTER (WHERE ran_first_query = true) AS activated
    FROM onboarding_events
    WHERE created_at >= CURRENT_DATE - INTERVAL '30 days';

    Ask this as: "How many users in the last 30 days signed up, connected a database, and ran their first query?"
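The funnel runs almost unchanged on SQLite; here the FILTER clause is swapped for the more portable SUM(CASE ...), the date filter is dropped, and the rows are invented toy data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE onboarding_events (
        user_id INTEGER, signed_up INTEGER,
        connected_db INTEGER, ran_first_query INTEGER
    )
""")
# Invented toy rows: 3 signups, of whom 2 connected a database and 1 ran a query.
conn.executemany(
    "INSERT INTO onboarding_events VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 1, 0), (3, 1, 0, 0)],
)
signups, connected, activated = conn.execute("""
    SELECT SUM(CASE WHEN signed_up THEN 1 ELSE 0 END),
           SUM(CASE WHEN connected_db THEN 1 ELSE 0 END),
           SUM(CASE WHEN ran_first_query THEN 1 ELSE 0 END)
    FROM onboarding_events
""").fetchone()
print(signups, connected, activated)  # 3 2 1
```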

    These are exactly the kinds of queries that non-technical business stakeholders need, and they run in milliseconds on a well-indexed production database.

    The Performance Concern and How to Address It

    The objection you'll often hear from engineers: "We can't run analytics against production, it'll be too slow."

    Sometimes that's true. Often it's not. Here's when it matters:

    When direct DB analytics works fine:

  • Tables under roughly 10 million rows
  • Queries that filter on indexed columns
  • Dashboards that refresh every hour (not every second)
  • Applications with fewer than 100K daily active users

    When you should consider a replica or a lightweight warehouse:

  • Queries that scan entire large tables with no usable index
  • Sub-minute refresh requirements on heavy aggregation queries
  • Joining five or more large tables with complex transformations
  • When analytical query load is visibly affecting application response time

    The practical path for most teams: start direct. Add a read replica if you notice performance issues. Graduate to a warehouse only if the replica approach isn't sufficient. Most teams reach that point later than they expect.

    Setting Up Automated Dashboards From Your Production Database

    Once you've decided to query your production database (or replica) directly, the next step is making those insights available without running queries manually.

    AI for Database lets you build dashboards entirely from plain-English questions. Set a refresh schedule (hourly, daily, or custom) and the queries run automatically. Your team wakes up to current data without anyone having to remember to run anything.

    A typical startup ops dashboard built this way:

    Metric | Refresh | Natural Language Question
    ------ | ------- | -------------------------
    Daily signups | Hourly | "How many users signed up in the last 24 hours?"
    MRR | Daily 6am | "What is total active subscription revenue this month?"
    Churn this week | Daily | "How many users cancelled in the last 7 days?"
    Active users today | Hourly | "How many unique users logged in today?"
    Open support tickets | Every 4h | "How many open support tickets are there right now?"

    This is a complete business dashboard, built from five natural language questions, with no SQL written and no data warehouse.

    Replacing the "Can You Pull a Report?" Request

    One of the highest-value use cases for direct database analytics is eliminating the ad hoc "can you pull a quick report?" request that interrupts engineers throughout the day.

    With natural language database access, the ops manager who wants "revenue by country for last quarter" can get it themselves in 30 seconds. The product manager asking "which onboarding step has the most drop-off this week?" can answer their own question. The CEO asking "how many paying customers do we have right now?" doesn't need to wait for a data export.

    Each of these queries takes an engineer 10–30 minutes to handle when the request comes in ad hoc: context switching, finding the right tables, writing and debugging SQL, formatting the output. At 10 requests per week across a team, that's hours of engineering time redirected away from product work.

    Setting Up Alerts Without a Pipeline

    Beyond dashboards, your production database is also the right place to watch for conditions that need immediate attention.

    Instead of building a complex event pipeline, you can define alert conditions in plain English through AI for Database's action workflows:

  • "If daily new signups fall below 20, send a Slack message to #growth"
  • "If any transaction amount exceeds $10,000, email the finance team"
  • "If failed logins in the last hour exceed 100, call the security webhook"

    The system checks these conditions on a schedule against your live database and fires the action automatically when the condition is met. No stored procedures. No database triggers. No separate monitoring stack to maintain.
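Under the hood, a scheduled check of this kind reduces to an ordinary query plus a threshold comparison. A minimal sketch of the first rule, with an invented signups table and a returned message standing in for the Slack webhook call:

```python
import sqlite3

def check_signup_alert(conn, threshold=20):
    """Fire an alert when daily signups fall below the threshold.

    Sketch of the scheduled-check pattern: one COUNT query against the
    live database, one comparison, one action when the condition holds.
    """
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM signups WHERE created_at >= datetime('now', '-1 day')"
    ).fetchone()
    if count < threshold:
        # In a real workflow this would POST to a Slack webhook.
        return f"Daily signups low: {count} (threshold {threshold})"
    return None

# Invented toy data: only 5 signups in the last 24 hours.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, datetime('now'))", [(i,) for i in range(5)]
)
print(check_signup_alert(conn))  # Daily signups low: 5 (threshold 20)
```

Run this function from any scheduler (cron, a worker queue, or a managed workflow) and the "no pipeline" property follows: there is nothing to keep in sync, because the check reads the live data every time.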

    Creating a Read-Only Analytics User

    Before connecting any analytics tool to your production database, create a dedicated read-only user. This prevents any accidental writes and makes it easy to audit what the analytics connection is doing.

    In PostgreSQL:

    CREATE USER analytics_user WITH PASSWORD 'choose_a_strong_password';
    GRANT CONNECT ON DATABASE your_database TO analytics_user;
    GRANT USAGE ON SCHEMA public TO analytics_user;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO analytics_user;
    -- Also grant access to future tables:
    ALTER DEFAULT PRIVILEGES IN SCHEMA public
      GRANT SELECT ON TABLES TO analytics_user;

    In MySQL:

    CREATE USER 'analytics_user'@'%' IDENTIFIED BY 'choose_a_strong_password';
    GRANT SELECT ON your_database.* TO 'analytics_user'@'%';
    FLUSH PRIVILEGES; -- only needed when grant tables are edited directly

    Use these credentials when connecting AI for Database (or any analytics tool) to your database. If your credentials are ever compromised, the attacker can only read data, not modify it.
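The effect of a read-only connection can be demonstrated in miniature with SQLite's read-only open mode: SELECT succeeds, any write is rejected. SQLite has no user accounts, so this is an analogy for the Postgres/MySQL grants above, not a substitute for them:

```python
import os
import sqlite3
import tempfile

# Create a small database file, then reopen it read-only.
path = os.path.join(tempfile.mkdtemp(), "app.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE users (id INTEGER)")
rw.execute("INSERT INTO users VALUES (1)")
rw.commit()
rw.close()

ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT COUNT(*) FROM users").fetchone())  # (1,)
try:
    ro.execute("DELETE FROM users")  # rejected: connection is read-only
except sqlite3.OperationalError as exc:
    print("write rejected:", exc)
```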

    When to Graduate to a Data Warehouse

    Direct database analytics works until it doesn't. The signs you're outgrowing it:

  • Analytical queries are noticeably slowing down your application even with a replica
  • Your data team spends significant time maintaining joins between ten or more tables
  • You need to combine database data with external sources (Stripe webhooks, Salesforce exports, ad platform APIs)
  • You have compliance requirements mandating separation of production and analytical data
  • Your heaviest queries take more than 30 seconds on a replica

    At that point, a lightweight warehouse (BigQuery, MotherDuck, or DuckDB) becomes the right call. But most teams reach that point later than they expect, and the time spent doing direct database analytics in the interim means your team already knows exactly what data they need from a warehouse and why.

    Ready to try AI for Database?

    Query your database in plain English. No SQL required. Start free today.