
Database Anomaly Detection: Catch Unusual Patterns Before They Become Problems


Dr. Elena Vasquez · AI Research Lead · April 1, 2026 · 9 min read

Most teams find out about data anomalies the wrong way. A customer emails support saying their account is showing the wrong balance. A sales rep notices their pipeline numbers look different than yesterday. Revenue in the dashboard has inexplicably dropped 40% since Tuesday, which turns out to be a bug in the billing integration that's been silently misfiring for three days.

Your database held the signal the whole time. The problem is that extracting it requires someone to know what to look for, write a query to detect it, run the query regularly, and act on the result. None of those four steps happen automatically without deliberate setup.

This article explains how to define meaningful anomalies in your data, why traditional alerting approaches fall short, and how to build automated detection that actually catches issues before they compound.

What Database Anomaly Detection Actually Means

An anomaly is a data pattern that deviates significantly from what you'd normally expect. But "normal" is specific to your product, your traffic patterns, and your business cycle. What counts as anomalous for a B2B SaaS with enterprise contracts looks completely different from a consumer app with daily active users.

Some anomalies are statistical: a metric that's three standard deviations outside its 30-day average. Some are rule-based: a counter that should only increase is suddenly decreasing. Some are relational: two values that should always be equal aren't.
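To make the statistical case concrete, here's a minimal sketch of a three-sigma check. The trailing window, threshold, and the daily signup counts are all illustrative:

```python
from statistics import mean, stdev

def is_statistical_anomaly(history, today, n_sigma=3.0):
    """Flag `today` if it falls more than n_sigma standard
    deviations from the mean of the trailing history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return False  # flat history: fall back to a rule-based check
    return abs(today - mu) > n_sigma * sigma

# 30 days of daily signup counts hovering around 100
history = [98, 103, 101, 97, 105, 99, 102] * 4 + [100, 101]
print(is_statistical_anomaly(history, today=40))   # far outside the band
print(is_statistical_anomaly(history, today=101))  # well within the band
```

The `sigma == 0` guard matters in practice: a perfectly flat history makes any deviation "infinite sigmas" away, which is exactly the kind of noise a rule-based check handles better.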

Useful anomaly detection starts with this question: what patterns in our database data would tell us something is wrong if we noticed them?

Common examples:

  • Daily signups drop more than 50% compared to the same day last week
  • Total active subscriptions decreases without a corresponding cancellation event
  • A user's session count exceeds 10,000 in a single day (bot or scraping activity)
  • The number of rows in orders grows, but the sum of order_total doesn't change (a data pipeline bug)
  • Payment events exist but the corresponding subscription_updated event doesn't (billing webhook failure)
  • A table that receives hundreds of writes per minute suddenly goes silent
Each of these is a signal that something real is happening: a bug, an infrastructure issue, unusual user behavior, or a broken integration. Catching them within minutes rather than days is the difference between a quick fix and a customer-facing incident.
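The rule-based cases reduce to comparing a current value against a recorded expectation. Here's a sketch of a should-only-increase check, using SQLite as a stand-in for PostgreSQL (the table and metric names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE metric_snapshots (metric TEXT, value INTEGER, taken_at TEXT);
INSERT INTO subscriptions (status) VALUES ('active'), ('active'), ('active');
""")

def check_monotonic(conn, metric, query):
    """Return True if a counter that should only grow has shrunk
    since the last snapshot, then record the new snapshot."""
    current = conn.execute(query).fetchone()[0]
    row = conn.execute(
        "SELECT value FROM metric_snapshots WHERE metric = ? "
        "ORDER BY rowid DESC LIMIT 1", (metric,)).fetchone()  # latest snapshot
    anomaly = row is not None and current < row[0]
    conn.execute(
        "INSERT INTO metric_snapshots VALUES (?, ?, datetime('now'))",
        (metric, current))
    return anomaly

q = "SELECT COUNT(*) FROM subscriptions WHERE status = 'active'"
print(check_monotonic(conn, "active_subs", q))  # first run: records a baseline
conn.execute("DELETE FROM subscriptions WHERE id = 1")  # silent row loss
print(check_monotonic(conn, "active_subs", q))  # the counter went backwards
```

The snapshot table is doing the real work here: without a memory of the previous value, "it decreased" isn't expressible as a single query.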

The Metrics Worth Monitoring in a SaaS Database

Before you set up any alerting, spend 30 minutes listing the metrics in your database that are business-critical and time-sensitive. These are usually the ones you'd check manually during an incident.

Growth metrics

  • New user registrations per day
  • New subscriptions per day
  • Trial-to-paid conversions

Health metrics

  • Active subscriptions (should never drop without cancellations to explain it)
  • Failed payment count (a sudden spike indicates a billing issue)
  • Webhook delivery failures (if you track them)

Engagement signals

  • Daily active users (users with sessions in the last 24 hours)
  • Feature-specific activity counts
  • API call volumes

Data consistency checks

  • Sum of mrr in subscriptions vs. the sum in your billing system's reconciliation table
  • Count of users without a matching profile record
  • Orders without a payment record (or vice versa)

Each of these maps to a SQL query. And each one, left unmonitored, is a quiet problem waiting to surface at the worst possible moment.
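For instance, the orphaned-record checks above are anti-joins. A minimal sketch, with an illustrative schema and SQLite standing in for PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE profiles (user_id INTEGER);
INSERT INTO users (id) VALUES (1), (2), (3);
INSERT INTO profiles (user_id) VALUES (1), (3);
""")

# Users with no matching profile row: a LEFT JOIN anti-join.
orphans = conn.execute("""
    SELECT COUNT(*) FROM users u
    LEFT JOIN profiles p ON p.user_id = u.id
    WHERE p.user_id IS NULL
""").fetchone()[0]
print(orphans)  # user 2 has no profile
```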

The Traditional Approach: Triggers and Stored Procedures

The standard database approach to monitoring is triggers: procedures that fire automatically when data changes. Here's what a trigger-based alert looks like in PostgreSQL:

    CREATE OR REPLACE FUNCTION notify_on_subscription_delete()
    RETURNS trigger AS $$
    BEGIN
      IF (SELECT COUNT(*) FROM subscriptions WHERE status = 'active') < 100 THEN
        PERFORM pg_notify('low_subscriptions', 'Active subscriptions dropped below 100');
      END IF;
      RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER subscription_delete_alert
    AFTER DELETE ON subscriptions
    FOR EACH ROW EXECUTE FUNCTION notify_on_subscription_delete();

This works, but it has significant limitations in practice.

Triggers run on every row change. If you're checking aggregate conditions (total count, sum, average), you're re-running that aggregate query on every single write. On a busy table, this is expensive and creates latency.

Triggers are hard to maintain. They live in the database, often undocumented, and don't show up in your application codebase. Engineers forget they exist. Schema changes silently break them.

They can't express time-based comparisons. "Alert me if signups today are 60% lower than the average of the last 7 days" is difficult to express in a trigger, because triggers fire on row changes, not on a schedule.

They require DBA access. Creating and modifying triggers typically requires elevated database permissions that most engineers don't have, and the ops team certainly doesn't.

A Better Approach: Scheduled Condition Checks

Instead of triggers, think about anomaly detection as a scheduled query that runs periodically and fires an alert when a condition is met. This separates the monitoring logic from the database engine, makes it easier to write and maintain, and handles time-based comparisons naturally.

The pattern is:

  • Write a SQL query that returns a value (a count, sum, boolean, etc.)
  • Define a condition (value < threshold, value deviates from 7-day average, etc.)
  • Schedule it to run every N minutes or hours
  • When the condition is met, fire an alert (Slack message, email, webhook)

Here's what that looks like in practice:

    -- Check if signups today are more than 60% below the 7-day average
    SELECT
      today.signups AS today_signups,
      avg_7d.avg_signups AS avg_7d_signups,
      CASE
        WHEN avg_7d.avg_signups > 0
          AND today.signups < avg_7d.avg_signups * 0.4
        THEN true
        ELSE false
      END AS is_anomaly
    FROM
      (SELECT COUNT(*) AS signups FROM users WHERE DATE(created_at) = CURRENT_DATE) today,
      (
        SELECT AVG(daily_count) AS avg_signups
        FROM (
          SELECT DATE(created_at) AS day, COUNT(*) AS daily_count
          FROM users
          WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
            AND created_at < CURRENT_DATE
          GROUP BY DATE(created_at)
        ) daily
      ) avg_7d;

If is_anomaly returns true, fire the alert. Otherwise, do nothing. This query runs on a schedule; hourly is usually fine for this type of metric.
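Wiring the query to a notification is a thin layer around it. A sketch of the check-and-notify step, where the `notify` hook stands in for a Slack webhook or email, the runner would be invoked by cron or a job scheduler in production, and SQLite stands in for PostgreSQL:

```python
import sqlite3

def run_check(conn, name, query, notify):
    """Run one scheduled check; call notify() when the query
    reports an anomaly. Returns whether the alert fired."""
    row = conn.execute(query).fetchone()
    is_anomaly = bool(row and row[0])
    if is_anomaly:
        notify(f"[anomaly] {name}: check returned {row}")
    return is_anomaly

# Stand-in for posting to Slack / email / a pager.
alerts = []
notify = alerts.append

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_signups (day TEXT, n INTEGER)")
conn.executemany("INSERT INTO daily_signups VALUES (?, ?)",
                 [("d1", 100), ("d2", 110), ("d3", 90), ("today", 20)])

# Simplified form of the condition above: today below 40% of the average.
query = """
SELECT (SELECT n FROM daily_signups WHERE day = 'today')
     < 0.4 * (SELECT AVG(n) FROM daily_signups WHERE day != 'today')
"""
run_check(conn, "signup_drop", query, notify)
print(alerts)
```

The point of this shape is that the check's logic lives entirely in the SQL; the Python layer only schedules it and routes the result.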

Setting Up Anomaly Alerts Without SQL

Writing these queries is straightforward for engineers. For ops managers, product managers, and founders who want monitoring on their business metrics, the SQL barrier is the problem.

AI for Database approaches this differently. You define your alert condition in plain English, such as "notify me in Slack if daily signups drop below 50" or "alert me when failed payments exceed 20 in a single day", and the system translates it into a scheduled check against your database.

The underlying mechanism is the same (a SQL query, running on a schedule, triggering a notification), but you don't need to write or maintain the SQL yourself. The alert condition lives in the tool, not buried in a stored procedure that's difficult to find and modify.

For a team without a dedicated data engineer, this makes the difference between having monitoring in place and not having it at all.

The Anomaly Detection Patterns Worth Knowing

A handful of patterns cover most production monitoring scenarios:

Threshold alert: a value crosses an absolute limit

    SELECT COUNT(*) AS failed_payments
    FROM payment_events
    WHERE status = 'failed'
      AND created_at > NOW() - INTERVAL '1 hour'
    HAVING COUNT(*) > 10;

Use for: failed payments, error rates, queue backlogs

Percentage drop: today's value is X% below a historical baseline

    -- Compare today against the average of the previous 7 full days
    SELECT
      (today.today_count::float / NULLIF(hist.avg_count, 0) - 1) * 100 AS pct_change
    FROM (
      SELECT COUNT(*) AS today_count FROM events WHERE DATE(created_at) = CURRENT_DATE
    ) today,
    (
      SELECT AVG(c) AS avg_count FROM (
        SELECT DATE(created_at) AS day, COUNT(*) AS c
        FROM events
        WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
          AND created_at < CURRENT_DATE
        GROUP BY DATE(created_at)
      ) daily
    ) hist;

Use for: signups, sessions, revenue, API calls

Consistency check: two values that should match don't

    SELECT
      (SELECT COUNT(*) FROM orders WHERE status = 'completed') AS completed_orders,
      (SELECT COUNT(*) FROM payments WHERE status = 'succeeded') AS succeeded_payments,
      ABS(
        (SELECT COUNT(*) FROM orders WHERE status = 'completed') -
        (SELECT COUNT(*) FROM payments WHERE status = 'succeeded')
      ) AS discrepancy;

Use for: billing reconciliation, data pipeline integrity, duplicate detection

Silence alert: a table or event that should have new rows hasn't received any in a defined period

    SELECT
      CASE
        WHEN MAX(created_at) < NOW() - INTERVAL '2 hours'
        THEN 'SILENT - no new events in 2 hours'
        ELSE 'OK'
      END AS status
    FROM application_events;

Use for: webhook processing, job queues, scheduled data syncs

Handling Alert Fatigue

Anomaly detection is only useful if people pay attention to alerts. Alert fatigue, where so many alerts fire that everyone starts ignoring them, is a real failure mode.

A few practices reduce alert fatigue:

Tune thresholds to seasonal patterns. Signups on Sunday are lower than on Tuesday. Alerts should account for day-of-week variation rather than using a flat threshold.
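One way to encode day-of-week awareness is to baseline today against the same weekday in prior weeks rather than a flat trailing average. A sketch using SQLite (PostgreSQL would use EXTRACT(DOW FROM created_at) instead of strftime; the table and dates are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signup_days (day TEXT, n INTEGER)")
# Three Sundays (low traffic), three Tuesdays (high traffic), and today.
conn.executemany("INSERT INTO signup_days VALUES (?, ?)", [
    ("2026-03-01", 20), ("2026-03-08", 22), ("2026-03-15", 18),  # Sundays
    ("2026-03-03", 80), ("2026-03-10", 85), ("2026-03-17", 90),  # Tuesdays
    ("2026-03-22", 19),  # today, a Sunday
])

# Compare today against the average of the same weekday only.
row = conn.execute("""
    SELECT today.n, AVG(hist.n)
    FROM signup_days today
    JOIN signup_days hist
      ON strftime('%w', hist.day) = strftime('%w', today.day)
     AND hist.day < today.day
    WHERE today.day = '2026-03-22'
""").fetchone()
today_n, same_weekday_avg = row
print(today_n, same_weekday_avg)  # normal for a Sunday, despite the weekly avg
```

Against a flat 7-day average (which the Tuesdays pull up), 19 signups would look like a steep drop; against the Sunday-only baseline it is unremarkable.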

Add a minimum volume floor. If you have three signups on a slow day, a 60% drop from an average of 8 will trigger an alert, but three signups is not worth a Slack ping. Add a condition on the baseline, like AND avg_count > 20, to suppress low-volume noise.

Route alerts to the right channel. Not every anomaly needs to go to everyone. Payment failures go to billing. Signup drops go to growth. Data pipeline errors go to engineering. Context determines relevance.

Set resolution alerts. When a condition clears, send a follow-up message. "Signup rate has recovered, now at 95% of the 7-day average" closes the loop and helps the team understand whether their response worked.
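Resolution alerts require remembering whether the previous run was already alerting, so each check carries a small state flag. A sketch (the message strings and `notify` hook are placeholders):

```python
def evaluate(state, name, is_anomaly, notify):
    """Fire one message on the False->True edge and one recovery
    message on the True->False edge; stay quiet otherwise."""
    was_alerting = state.get(name, False)
    if is_anomaly and not was_alerting:
        notify(f"ALERT {name}")
    elif not is_anomaly and was_alerting:
        notify(f"RESOLVED {name}")
    state[name] = is_anomaly
    return state

messages = []
state = {}
# Four consecutive scheduled runs of the same check.
for result in [False, True, True, False]:
    evaluate(state, "signup_drop", result, messages.append)
print(messages)  # one alert, one resolution, no repeats
```

Edge-triggered messaging like this is also what suppresses the repeated "still broken" pings that train a team to mute the channel.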

Building an Anomaly Monitoring Stack

If you're setting this up from scratch, here's a practical starting point:

  • List your 5-10 most critical business metrics (the list earlier in this article is a starting point)
  • Write the detection query for each, using one of the patterns above
  • Decide on alert routing: which channel, and who should see it
  • Schedule each check: most metrics work well on hourly or daily schedules
  • Run in shadow mode first: let the alerts fire to a private log channel for a week before routing them to the team, so you can tune thresholds

This doesn't need to be sophisticated to be effective. Even a handful of well-tuned checks catches the majority of incidents before they escalate.

Ready to try AI for Database?

Query your database in plain English. No SQL required. Start free today.