Most teams find out about data anomalies the wrong way. A customer emails support saying their account shows the wrong balance. A sales rep notices their pipeline numbers look different from yesterday's. Revenue in the dashboard has inexplicably dropped 40% since Tuesday, which turns out to be a bug in the billing integration that has been silently misfiring for three days.
Your database held the signal the whole time. The problem is that extracting it requires someone to know what to look for, write a query to detect it, run the query regularly, and act on the result. None of those four steps happen automatically without deliberate setup.
This article explains how to define meaningful anomalies in your data, why traditional alerting approaches fall short, and how to build automated detection that actually catches issues before they compound.
What Database Anomaly Detection Actually Means
An anomaly is a data pattern that deviates significantly from what you'd normally expect. But "normal" is specific to your product, your traffic patterns, and your business cycle. What counts as anomalous for a B2B SaaS with enterprise contracts looks completely different from a consumer app with daily active users.
Some anomalies are statistical: a metric lands three standard deviations outside its 30-day average. Some are rule-based: a counter that should only increase is suddenly decreasing. Some are relational: two values that should always be equal aren't.
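A minimal sketch of the statistical variety, assuming you already have a list of daily values for the metric (the function and variable names here are illustrative, not from any particular library):

```python
from statistics import mean, stdev

def is_statistical_anomaly(history: list[float], latest: float,
                           z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it falls more than `z_threshold` standard
    deviations from the mean of the historical values."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is anomalous
    return abs(latest - mu) / sigma > z_threshold

# Thirty days of fairly stable daily signup counts, then a sudden collapse.
baseline = [100, 103, 98, 101, 99, 102, 97, 100, 104, 96] * 3
print(is_statistical_anomaly(baseline, 40))   # far outside the normal band
print(is_statistical_anomaly(baseline, 101))  # within the normal band
```

The same shape works for any daily metric; only the history window and the z-score threshold change.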
Useful anomaly detection starts with this question: what patterns in our database data would tell us something is wrong if we noticed them?
Common examples:

- The row count of orders grows, but the sum of order_total doesn't change (a data pipeline bug).
- An expected subscription_updated event doesn't arrive (a billing webhook failure).

Each of these is a signal that something real is happening: a bug, an infrastructure issue, unusual user behavior, or a broken integration. Catching them within minutes rather than days is the difference between a quick fix and a customer-facing incident.
The Metrics Worth Monitoring in a SaaS Database
Before you set up any alerting, spend 30 minutes listing the metrics in your database that are business-critical and time-sensitive. These are usually the ones you'd check manually during an incident.
- Growth metrics
- Health metrics
- Engagement signals
- Data consistency checks, such as mrr in subscriptions vs. the corresponding sum in your billing system's reconciliation table

Each of these maps to a SQL query. And each one, left unmonitored, is a quiet problem waiting to surface at the worst possible moment.
The Traditional Approach: Triggers and Stored Procedures
The standard database approach to monitoring is triggers: procedures that fire automatically when data changes. Here's what a trigger-based alert looks like in PostgreSQL:
```sql
CREATE OR REPLACE FUNCTION notify_on_subscription_delete()
RETURNS trigger AS $$
BEGIN
  IF (SELECT COUNT(*) FROM subscriptions WHERE status = 'active') < 100 THEN
    PERFORM pg_notify('low_subscriptions', 'Active subscriptions dropped below 100');
  END IF;
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER subscription_delete_alert
AFTER DELETE ON subscriptions
FOR EACH ROW EXECUTE FUNCTION notify_on_subscription_delete();
```

This works, but it has significant limitations in practice.
Triggers run on every row change. If you're checking aggregate conditions (total count, sum, average), you're re-running that aggregate query on every single write. On a busy table, this is expensive and creates latency.
Triggers are hard to maintain. They live in the database, often undocumented, and don't show up in your application codebase. Engineers forget they exist. Schema changes silently break them.
They can't express time-based comparisons. "Alert me if signups today are 60% lower than the average of the last 7 days" is difficult to express in a trigger because triggers fire on row changes, not on schedule.
They require DBA access. Creating and modifying triggers typically requires elevated database permissions that most engineers don't have and the ops team almost never does.
A Better Approach: Scheduled Condition Checks
Instead of triggers, think about anomaly detection as a scheduled query that runs periodically and fires an alert when a condition is met. This separates the monitoring logic from the database engine, makes it easier to write and maintain, and handles time-based comparisons naturally.
The pattern is:

1. Run a query on a schedule (every few minutes, hourly, or daily).
2. Evaluate the result against your alert condition.
3. If the condition is met, send a notification; otherwise, do nothing.
Here's what that looks like in practice:
```sql
-- Check if signups today are more than 60% below the 7-day average
SELECT
  today.signups AS today_signups,
  avg_7d.avg_signups AS avg_7d_signups,
  CASE
    WHEN avg_7d.avg_signups > 0
     AND today.signups < avg_7d.avg_signups * 0.4
    THEN true
    ELSE false
  END AS is_anomaly
FROM
  (SELECT COUNT(*) AS signups FROM users WHERE DATE(created_at) = CURRENT_DATE) today,
  (
    SELECT AVG(daily_count) AS avg_signups
    FROM (
      SELECT DATE(created_at) AS day, COUNT(*) AS daily_count
      FROM users
      WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
        AND created_at < CURRENT_DATE
      GROUP BY DATE(created_at)
    ) daily
  ) avg_7d;
```

If is_anomaly returns true, fire the alert. Otherwise, do nothing. This query runs on a schedule; hourly is usually fine for this type of metric.
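The glue around that query is small. A minimal sketch in Python, assuming a `run_query` callable that executes the SQL above and returns its one-row result, and a `notify` callable that posts to your alerting channel (both names are illustrative stand-ins for your database driver and chat webhook):

```python
import time
from typing import Callable, Mapping

def check_anomaly(run_query: Callable[[], Mapping],
                  notify: Callable[[str], None]) -> bool:
    """Run the anomaly query once; fire a notification if it flags."""
    row = run_query()  # e.g. {"today_signups": 12, "avg_7d_signups": 80.0, "is_anomaly": True}
    if row["is_anomaly"]:
        notify(
            f"Signup anomaly: {row['today_signups']} today vs "
            f"{row['avg_7d_signups']:.0f} 7-day average"
        )
    return bool(row["is_anomaly"])

def run_forever(run_query, notify, interval_seconds: int = 3600) -> None:
    """The schedule itself: hourly is usually fine for this type of metric."""
    while True:
        check_anomaly(run_query, notify)
        time.sleep(interval_seconds)

# One pass with stubbed-out query and notification:
alerts: list[str] = []
check_anomaly(
    run_query=lambda: {"today_signups": 12, "avg_7d_signups": 80.0, "is_anomaly": True},
    notify=alerts.append,
)
print(alerts)
```

In production the loop would more likely be a cron job or a scheduler task than a `while True`, but the shape is the same: query, evaluate, notify.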
Setting Up Anomaly Alerts Without SQL
Writing these queries is straightforward for engineers. For ops managers, product managers, and founders who want monitoring on their business metrics, the SQL barrier is the problem.
AI for Database approaches this differently. You define your alert condition in plain English ("notify me in Slack if daily signups drop below 50" or "alert me when failed payments exceed 20 in a single day") and the system translates it into a scheduled check against your database.
The underlying mechanism is the same (a SQL query, running on a schedule, triggering a notification), but you don't need to write or maintain the SQL yourself. The alert condition lives in the tool, not buried in a stored procedure that's difficult to find and modify.
For a team without a dedicated data engineer, this makes the difference between having monitoring in place and not having it at all.
The Anomaly Detection Patterns Worth Knowing
A handful of patterns cover most production monitoring scenarios:
Threshold alert: a value crosses an absolute limit
```sql
SELECT COUNT(*) AS failed_payments
FROM payment_events
WHERE status = 'failed'
  AND created_at > NOW() - INTERVAL '1 hour'
HAVING COUNT(*) > 10;
```

Use for: failed payments, error rates, queue backlogs
Percentage drop: today's value is X% below a historical baseline
```sql
SELECT
  (today_count::float / NULLIF(avg_count, 0) - 1) * 100 AS pct_change
FROM (
  SELECT COUNT(*) AS today_count FROM events WHERE DATE(created_at) = CURRENT_DATE
) t,
(
  SELECT AVG(c) AS avg_count FROM (
    SELECT DATE(created_at) AS day, COUNT(*) AS c FROM events
    WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
      AND created_at < CURRENT_DATE
    GROUP BY DATE(created_at)
  ) daily
) hist;
```

Use for: signups, sessions, revenue, API calls
Consistency check: two values that should match don't
```sql
SELECT
  (SELECT COUNT(*) FROM orders WHERE status = 'completed') AS completed_orders,
  (SELECT COUNT(*) FROM payments WHERE status = 'succeeded') AS succeeded_payments,
  ABS(
    (SELECT COUNT(*) FROM orders WHERE status = 'completed') -
    (SELECT COUNT(*) FROM payments WHERE status = 'succeeded')
  ) AS discrepancy;
```

Use for: billing reconciliation, data pipeline integrity, duplicate detection
Silence alert: a table or event stream that should have new rows hasn't received any in a defined period
```sql
SELECT
  CASE
    WHEN MAX(created_at) < NOW() - INTERVAL '2 hours'
    THEN 'SILENT - no new events in 2 hours'
    ELSE 'OK'
  END AS status
FROM application_events;
```

Use for: webhook processing, job queues, scheduled data syncs
Handling Alert Fatigue
Anomaly detection is only useful if people pay attention to alerts. Alert fatigue, where so many alerts fire that everyone starts ignoring them, is a real failure mode.
A few practices reduce alert fatigue:
Tune thresholds to seasonal patterns. Signups on a Sunday are lower than on a Tuesday. Alerts should account for day-of-week variation rather than using a flat threshold.
Add a minimum volume floor. If you have three signups on a slow day, a 60% drop from an average of 8 will trigger an alert, but three signups are not worth a Slack ping. Add a condition like AND avg_count > 20 so the check only fires when the baseline volume is meaningful.
Route alerts to the right channel. Not every anomaly needs to go to everyone. Payment failures go to billing. Signup drops go to growth. Data pipeline errors go to engineering. Context determines relevance.
Set resolution alerts. When a condition clears, send a follow-up message. "Signup rate has recovered, now at 95% of the 7-day average" closes the loop and helps the team understand whether their response worked.
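The volume floor and the resolution follow-up can be sketched together. The class and field names below are illustrative, not from any particular tool; the one piece of real state is a flag remembering whether the alert is currently firing:

```python
from typing import Callable

class SignupDropAlert:
    """Fires when today's count falls below 40% of the 7-day average,
    suppresses low-volume noise, and sends a resolution message when
    the condition clears."""

    def __init__(self, notify: Callable[[str], None],
                 drop_ratio: float = 0.4, min_baseline: float = 20.0):
        self.notify = notify
        self.drop_ratio = drop_ratio
        self.min_baseline = min_baseline
        self.firing = False  # remembered so we can send a recovery message

    def evaluate(self, today_count: int, avg_7d: float) -> None:
        is_anomaly = (
            avg_7d >= self.min_baseline            # volume floor: ignore slow days
            and today_count < avg_7d * self.drop_ratio
        )
        if is_anomaly and not self.firing:
            self.notify(f"Signups dropped: {today_count} today vs "
                        f"{avg_7d:.0f} 7-day average")
        elif not is_anomaly and self.firing:
            pct = (today_count / avg_7d * 100) if avg_7d else 100.0
            self.notify(f"Signup rate has recovered, now at {pct:.0f}% "
                        f"of 7-day average")
        self.firing = is_anomaly

messages: list[str] = []
alert = SignupDropAlert(messages.append)
alert.evaluate(today_count=3, avg_7d=8.0)    # low volume: suppressed
alert.evaluate(today_count=12, avg_7d=80.0)  # real drop: fires once
alert.evaluate(today_count=76, avg_7d=80.0)  # recovery follow-up
print(messages)
```

The same edge-triggered structure (notify on entering and leaving the anomalous state, stay quiet in between) applies to any of the patterns above.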
Building an Anomaly Monitoring Stack
If you're setting this up from scratch, here's a practical starting point:

1. List the three to five metrics you'd check first during an incident.
2. Write a scheduled check for each, using the patterns above.
3. Route each alert to the channel of the team that can act on it.
4. Revisit thresholds after a few weeks of real alerts and tune out the noise.
This doesn't need to be sophisticated to be effective. Even a handful of well-tuned checks catches the majority of incidents before they escalate.
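As one concrete shape for such a stack, here is a minimal sketch of a check registry, assuming each check is a SQL string plus a predicate over its one-row result (the `Check` structure, the channel names, and the `run_sql`/`notify` callables are all illustrative, not from any particular tool):

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

@dataclass
class Check:
    name: str
    sql: str                                          # the scheduled query
    is_anomaly: Callable[[Mapping[str, Any]], bool]   # condition over its result
    channel: str                                      # route to the team that can act

CHECKS = [
    Check(
        name="failed payments (1h)",
        sql=("SELECT COUNT(*) AS n FROM payment_events "
             "WHERE status = 'failed' AND created_at > NOW() - INTERVAL '1 hour'"),
        is_anomaly=lambda row: row["n"] > 10,
        channel="#billing",
    ),
    Check(
        name="event stream silence",
        sql=("SELECT EXTRACT(EPOCH FROM NOW() - MAX(created_at)) AS silent_s "
             "FROM application_events"),
        is_anomaly=lambda row: row["silent_s"] > 2 * 3600,
        channel="#engineering",
    ),
]

def run_checks(checks, run_sql: Callable[[str], Mapping],
               notify: Callable[[str, str], None]) -> int:
    """Run every check once; return how many fired."""
    fired = 0
    for check in checks:
        if check.is_anomaly(run_sql(check.sql)):
            notify(check.channel, f"ALERT: {check.name}")
            fired += 1
    return fired
```

Point `run_sql` at your database driver and `notify` at your chat webhook, then invoke `run_checks` from cron or any scheduler; adding a new check is one entry in the list.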