
Database Anomaly Detection: Catch Unusual Patterns Before They Become Problems


Dr. Elena Vasquez · AI Research Lead · April 1, 2026 · 9 min read

Most teams find out about data anomalies the wrong way. A customer emails support saying their account is showing the wrong balance. A sales rep notices their pipeline numbers look different than yesterday. Revenue in the dashboard has inexplicably dropped 40% since Tuesday, which turns out to be a bug in the billing integration that's been silently misfiring for three days.

Your database held the signal the whole time. The problem is that extracting it requires someone to know what to look for, write a query to detect it, run the query regularly, and act on the result. None of those four steps happen automatically without deliberate setup.

This article explains how to define meaningful anomalies in your data, why traditional alerting approaches fall short, and how to build automated detection that actually catches issues before they compound.

What Database Anomaly Detection Actually Means

An anomaly is a data pattern that deviates significantly from what you'd normally expect. But "normal" is specific to your product, your traffic patterns, and your business cycle. What counts as anomalous for a B2B SaaS with enterprise contracts looks completely different from a consumer app with daily active users.

Some anomalies are statistical: a metric that's three standard deviations outside its 30-day average. Some are rule-based: a counter that should only increase is suddenly decreasing. Some are relational: two values that should always be equal aren't.
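To make the statistical case concrete, here's a minimal sketch of a three-sigma check. The trailing window, threshold, and the daily signup counts are all illustrative:

```python
from statistics import mean, stdev

def is_statistical_anomaly(history, today, n_sigma=3.0):
    """Flag `today` if it falls more than n_sigma standard
    deviations from the mean of the trailing history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return False  # flat history: fall back to a rule-based check
    return abs(today - mu) > n_sigma * sigma

# 30 days of daily signup counts hovering around 100
history = [98, 103, 101, 97, 105, 99, 102] * 4 + [100, 101]
print(is_statistical_anomaly(history, today=40))   # far outside the band
print(is_statistical_anomaly(history, today=101))  # well within the band
```

The `sigma == 0` guard matters in practice: a perfectly flat history makes any deviation "infinite sigmas" away, which is exactly the kind of noise a rule-based check handles better.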

Useful anomaly detection starts with this question: what patterns in our database data would tell us something is wrong if we noticed them?

Common examples:

  • Daily signups drop more than 50% compared to the same day last week
  • Total active subscriptions decreases without a corresponding cancellation event
  • A user's session count exceeds 10,000 in a single day (bot or scraping activity)
  • The number of rows in orders grows, but the sum of order_total doesn't change (a data pipeline bug)
  • Payment events exist but the corresponding subscription_updated event doesn't (billing webhook failure)
  • A table that receives hundreds of writes per minute suddenly goes silent
Each of these is a signal that something real is happening: a bug, an infrastructure issue, unusual user behavior, or a broken integration. Catching them within minutes rather than days is the difference between a quick fix and a customer-facing incident.
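The rule-based cases reduce to comparing a current value against a recorded expectation. Here's a sketch of a should-only-increase check, using SQLite as a stand-in for PostgreSQL (the table and metric names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE metric_snapshots (metric TEXT, value INTEGER, taken_at TEXT);
INSERT INTO subscriptions (status) VALUES ('active'), ('active'), ('active');
""")

def check_monotonic(conn, metric, query):
    """Return True if a counter that should only grow has shrunk
    since the last snapshot, then record the new snapshot."""
    current = conn.execute(query).fetchone()[0]
    row = conn.execute(
        "SELECT value FROM metric_snapshots WHERE metric = ? "
        "ORDER BY rowid DESC LIMIT 1", (metric,)).fetchone()  # latest snapshot
    anomaly = row is not None and current < row[0]
    conn.execute(
        "INSERT INTO metric_snapshots VALUES (?, ?, datetime('now'))",
        (metric, current))
    return anomaly

q = "SELECT COUNT(*) FROM subscriptions WHERE status = 'active'"
print(check_monotonic(conn, "active_subs", q))  # first run: records a baseline
conn.execute("DELETE FROM subscriptions WHERE id = 1")  # silent row loss
print(check_monotonic(conn, "active_subs", q))  # the counter went backwards
```

The snapshot table is doing the real work here: without a memory of the previous value, "it decreased" isn't expressible as a single query.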

The Metrics Worth Monitoring in a SaaS Database

Before you set up any alerting, spend 30 minutes listing the metrics in your database that are business-critical and time-sensitive. These are usually the ones you'd check manually during an incident.

Growth metrics

  • New user registrations per day
  • New subscriptions per day
  • Trial-to-paid conversions

Health metrics

  • Active subscriptions (should never drop without cancellations to explain it)
  • Failed payment count (a sudden spike indicates a billing issue)
  • Webhook delivery failures (if you track them)

Engagement signals

  • Daily active users (users with sessions in the last 24 hours)
  • Feature-specific activity counts
  • API call volumes

Data consistency checks

  • Sum of mrr in subscriptions vs. the sum in your billing system's reconciliation table
  • Count of users without a matching profile record
  • Orders without a payment record (or vice versa)

Each of these maps to a SQL query. And each one, left unmonitored, is a quiet problem waiting to surface at the worst possible moment.
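For instance, the orphaned-record checks above are anti-joins. A minimal sketch, with an illustrative schema and SQLite standing in for PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE profiles (user_id INTEGER);
INSERT INTO users (id) VALUES (1), (2), (3);
INSERT INTO profiles (user_id) VALUES (1), (3);
""")

# Users with no matching profile row: a LEFT JOIN anti-join.
orphans = conn.execute("""
    SELECT COUNT(*) FROM users u
    LEFT JOIN profiles p ON p.user_id = u.id
    WHERE p.user_id IS NULL
""").fetchone()[0]
print(orphans)  # user 2 has no profile
```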

The Traditional Approach: Triggers and Stored Procedures

The standard database approach to monitoring is triggers: procedures that fire automatically when data changes. Here's what a trigger-based alert looks like in PostgreSQL:

    CREATE OR REPLACE FUNCTION notify_on_subscription_delete()
    RETURNS trigger AS $$
    BEGIN
      IF (SELECT COUNT(*) FROM subscriptions WHERE status = 'active') < 100 THEN
        PERFORM pg_notify('low_subscriptions', 'Active subscriptions dropped below 100');
      END IF;
      RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER subscription_delete_alert
    AFTER DELETE ON subscriptions
    FOR EACH ROW EXECUTE FUNCTION notify_on_subscription_delete();

This works, but it has significant limitations in practice.

Triggers run on every row change. If you're checking aggregate conditions (total count, sum, average), you're re-running that aggregate query on every single write. On a busy table, this is expensive and creates latency.

Triggers are hard to maintain. They live in the database, often undocumented, and don't show up in your application codebase. Engineers forget they exist. Schema changes silently break them.

They can't express time-based comparisons. "Alert me if signups today are 60% lower than the average of the last 7 days" is difficult to express in a trigger, because triggers fire on row changes, not on a schedule.

They require DBA access. Creating and modifying triggers typically requires elevated database permissions that most engineers don't have, and the ops team certainly doesn't.

A Better Approach: Scheduled Condition Checks

Instead of triggers, think about anomaly detection as a scheduled query that runs periodically and fires an alert when a condition is met. This separates the monitoring logic from the database engine, makes it easier to write and maintain, and handles time-based comparisons naturally.

The pattern is:

  • Write a SQL query that returns a value (a count, sum, boolean, etc.)
  • Define a condition (value < threshold, value deviates from 7-day average, etc.)
  • Schedule it to run every N minutes or hours
  • When the condition is met, fire an alert (Slack message, email, webhook)

Here's what that looks like in practice:

    -- Check if signups today are more than 60% below the 7-day average
    SELECT
      today.signups AS today_signups,
      avg_7d.avg_signups AS avg_7d_signups,
      CASE
        WHEN avg_7d.avg_signups > 0
          AND today.signups < avg_7d.avg_signups * 0.4
        THEN true
        ELSE false
      END AS is_anomaly
    FROM
      (SELECT COUNT(*) AS signups FROM users WHERE DATE(created_at) = CURRENT_DATE) today,
      (
        SELECT AVG(daily_count) AS avg_signups
        FROM (
          SELECT DATE(created_at) AS day, COUNT(*) AS daily_count
          FROM users
          WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
            AND created_at < CURRENT_DATE
          GROUP BY DATE(created_at)
        ) daily
      ) avg_7d;

If is_anomaly returns true, fire the alert. Otherwise, do nothing. This query runs on a schedule; hourly is usually fine for this type of metric.
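Wiring the query to a notification is a thin layer around it. A sketch of the check-and-notify step, where the `notify` hook stands in for a Slack webhook or email, the runner would be invoked by cron or a job scheduler in production, and SQLite stands in for PostgreSQL:

```python
import sqlite3

def run_check(conn, name, query, notify):
    """Run one scheduled check; call notify() when the query
    reports an anomaly. Returns whether the alert fired."""
    row = conn.execute(query).fetchone()
    is_anomaly = bool(row and row[0])
    if is_anomaly:
        notify(f"[anomaly] {name}: check returned {row}")
    return is_anomaly

# Stand-in for posting to Slack / email / a pager.
alerts = []
notify = alerts.append

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_signups (day TEXT, n INTEGER)")
conn.executemany("INSERT INTO daily_signups VALUES (?, ?)",
                 [("d1", 100), ("d2", 110), ("d3", 90), ("today", 20)])

# Simplified form of the condition above: today below 40% of the average.
query = """
SELECT (SELECT n FROM daily_signups WHERE day = 'today')
     < 0.4 * (SELECT AVG(n) FROM daily_signups WHERE day != 'today')
"""
run_check(conn, "signup_drop", query, notify)
print(alerts)
```

The point of this shape is that the check's logic lives entirely in the SQL; the Python layer only schedules it and routes the result.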

Setting Up Anomaly Alerts Without SQL

Writing these queries is straightforward for engineers. For ops managers, product managers, and founders who want monitoring on their business metrics, the SQL barrier is the problem.

AI for Database approaches this differently. You define your alert condition in plain English, such as "notify me in Slack if daily signups drop below 50" or "alert me when failed payments exceed 20 in a single day", and the system translates it into a scheduled check against your database.

The underlying mechanism is the same (a SQL query, running on a schedule, triggering a notification), but you don't need to write or maintain the SQL yourself. The alert condition lives in the tool, not buried in a stored procedure that's difficult to find and modify.

For a team without a dedicated data engineer, this makes the difference between having monitoring in place and not having it at all.

The Anomaly Detection Patterns Worth Knowing

A handful of patterns cover most production monitoring scenarios:

Threshold alert: a value crosses an absolute limit

    SELECT COUNT(*) AS failed_payments
    FROM payment_events
    WHERE status = 'failed'
      AND created_at > NOW() - INTERVAL '1 hour'
    HAVING COUNT(*) > 10;

Use for: failed payments, error rates, queue backlogs

Percentage drop: today's value is X% below a historical baseline

    -- Compare today against the average of the previous 7 full days
    SELECT
      (today.today_count::float / NULLIF(hist.avg_count, 0) - 1) * 100 AS pct_change
    FROM (
      SELECT COUNT(*) AS today_count FROM events WHERE DATE(created_at) = CURRENT_DATE
    ) today,
    (
      SELECT AVG(c) AS avg_count FROM (
        SELECT DATE(created_at) AS day, COUNT(*) AS c
        FROM events
        WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
          AND created_at < CURRENT_DATE
        GROUP BY DATE(created_at)
      ) daily
    ) hist;

Use for: signups, sessions, revenue, API calls

Consistency check: two values that should match don't

    SELECT
      (SELECT COUNT(*) FROM orders WHERE status = 'completed') AS completed_orders,
      (SELECT COUNT(*) FROM payments WHERE status = 'succeeded') AS succeeded_payments,
      ABS(
        (SELECT COUNT(*) FROM orders WHERE status = 'completed') -
        (SELECT COUNT(*) FROM payments WHERE status = 'succeeded')
      ) AS discrepancy;

Use for: billing reconciliation, data pipeline integrity, duplicate detection

Silence alert: a table or event that should have new rows hasn't received any in a defined period

    SELECT
      CASE
        WHEN MAX(created_at) < NOW() - INTERVAL '2 hours'
        THEN 'SILENT - no new events in 2 hours'
        ELSE 'OK'
      END AS status
    FROM application_events;

Use for: webhook processing, job queues, scheduled data syncs

Handling Alert Fatigue

Anomaly detection is only useful if people pay attention to alerts. Alert fatigue, where so many alerts fire that everyone starts ignoring them, is a real failure mode.

A few practices reduce alert fatigue:

Tune thresholds to seasonal patterns. Signups on Sunday are lower than on Tuesday. Alerts should account for day-of-week variation rather than using a flat threshold.
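One way to encode day-of-week awareness is to baseline today against the same weekday in prior weeks rather than a flat trailing average. A sketch using SQLite (PostgreSQL would use EXTRACT(DOW FROM created_at) instead of strftime; the table and dates are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signup_days (day TEXT, n INTEGER)")
# Three Sundays (low traffic), three Tuesdays (high traffic), and today.
conn.executemany("INSERT INTO signup_days VALUES (?, ?)", [
    ("2026-03-01", 20), ("2026-03-08", 22), ("2026-03-15", 18),  # Sundays
    ("2026-03-03", 80), ("2026-03-10", 85), ("2026-03-17", 90),  # Tuesdays
    ("2026-03-22", 19),  # today, a Sunday
])

# Compare today against the average of the same weekday only.
row = conn.execute("""
    SELECT today.n, AVG(hist.n)
    FROM signup_days today
    JOIN signup_days hist
      ON strftime('%w', hist.day) = strftime('%w', today.day)
     AND hist.day < today.day
    WHERE today.day = '2026-03-22'
""").fetchone()
today_n, same_weekday_avg = row
print(today_n, same_weekday_avg)  # normal for a Sunday, despite the weekly avg
```

Against a flat 7-day average (which the Tuesdays pull up), 19 signups would look like a steep drop; against the Sunday-only baseline it is unremarkable.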

Add a minimum volume floor. If you have three signups on a slow day, a 60% drop from an average of 8 will trigger an alert, but three signups is not worth a Slack ping. Add a condition on the baseline, like AND avg_count > 20, to suppress low-volume noise.

Route alerts to the right channel. Not every anomaly needs to go to everyone. Payment failures go to billing. Signup drops go to growth. Data pipeline errors go to engineering. Context determines relevance.

Set resolution alerts. When a condition clears, send a follow-up message. "Signup rate has recovered, now at 95% of the 7-day average" closes the loop and helps the team understand whether their response worked.
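Resolution alerts require remembering whether the previous run was already alerting, so each check carries a small state flag. A sketch (the message strings and `notify` hook are placeholders):

```python
def evaluate(state, name, is_anomaly, notify):
    """Fire one message on the False->True edge and one recovery
    message on the True->False edge; stay quiet otherwise."""
    was_alerting = state.get(name, False)
    if is_anomaly and not was_alerting:
        notify(f"ALERT {name}")
    elif not is_anomaly and was_alerting:
        notify(f"RESOLVED {name}")
    state[name] = is_anomaly
    return state

messages = []
state = {}
# Four consecutive scheduled runs of the same check.
for result in [False, True, True, False]:
    evaluate(state, "signup_drop", result, messages.append)
print(messages)  # one alert, one resolution, no repeats
```

Edge-triggered messaging like this is also what suppresses the repeated "still broken" pings that train a team to mute the channel.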

Building an Anomaly Monitoring Stack

If you're setting this up from scratch, here's a practical starting point:

  • List your 5-10 most critical business metrics (the list earlier in this article is a starting point)
  • Write the detection query for each, using one of the patterns above
  • Decide on alert routing: which channel, and who should see it
  • Schedule each check: most metrics work well on hourly or daily schedules
  • Run in shadow mode first: let the alerts fire to a private log channel for a week before routing them to the team, so you can tune thresholds

This doesn't need to be sophisticated to be effective. Even a handful of well-tuned checks catches the majority of incidents before they escalate.

Ready to try AI for Database?

Query your database in plain English. No SQL required. Start free today.