Anomaly Detection
Detecting unusual patterns using Gaussian probability models to identify data points that deviate significantly from normal behavior.
Anomaly detection is an unsupervised learning technique used to identify rare items, events, or observations that differ significantly from the majority of the data. Unlike Clustering, which groups similar items together, anomaly detection focuses on finding the outliers — the data points that don’t fit the expected pattern.
When to Use Anomaly Detection
Anomaly detection works well when:
- You have many “normal” examples but very few (or no) labeled anomalies
- Anomalies are rare and diverse — they could look different each time
- You want to catch new types of anomalies that haven’t been seen before
Common applications include server monitoring (detecting failing machines), fraud detection, manufacturing quality control, and medical diagnostics.
Gaussian-Based Approach
The most common approach models the “normal” behavior of your data using a Gaussian Distribution. The intuition is simple:
- Fit a Gaussian to your training data (assumed to be mostly normal examples)
- For any new example, compute its probability under this distribution
- If the probability is very low (below threshold ε), flag it as an anomaly
Data points in the dense center of the distribution have high probability — they’re normal. Points in the sparse tails have low probability — they’re anomalies.
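As a quick illustration with made-up numbers (the feature, mean, and spread here are hypothetical), a point near the mean of a fitted Gaussian gets a far higher density than a point out in a tail:

from scipy.stats import norm

# Hypothetical 1-D feature: fitted mean 20, standard deviation 2
mu, sigma = 20.0, 2.0
print(norm.pdf(20.5, mu, sigma))   # ~0.19  -- dense center, normal
print(norm.pdf(29.0, mu, sigma))   # ~8e-06 -- sparse tail, anomalous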
Parameter Estimation
Given a training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, estimate the Gaussian parameters for each feature $j$:

Mean: $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$

Variance: $\sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \left( x_j^{(i)} - \mu_j \right)^2$
These are closed-form solutions — no iterative optimization required. Simply compute the average and spread of each feature from your training data.
import numpy as np

def estimate_gaussian(X):
    """Estimate mean and variance for each feature."""
    m, n = X.shape
    mu = (1 / m) * np.sum(X, axis=0)               # per-feature mean
    var = (1 / m) * np.sum((X - mu) ** 2, axis=0)  # per-feature variance
    return mu, var
Computing Probability
For a new example $x$ with $n$ features, compute its probability by assuming the features are independent:

$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{(x_j - \mu_j)^2}{2\sigma_j^2} \right)$
Each feature contributes its own Gaussian probability, and we multiply them together. A low value for any feature will drag down the overall probability, helping detect anomalies that are unusual in just one dimension.
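A minimal sketch of this computation with NumPy, assuming `mu` and `var` come from `estimate_gaussian` above:

import numpy as np

def compute_probability(X, mu, var):
    """Per-example probability under independent per-feature Gaussians."""
    # Density of each feature under its own univariate Gaussian
    densities = (1 / np.sqrt(2 * np.pi * var)) * np.exp(-((X - mu) ** 2) / (2 * var))
    # Multiply across features (axis=1) to get p(x) for each example
    return np.prod(densities, axis=1)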
Threshold Selection (ε)
The threshold ε determines what probability is “too low” to be normal. To select it optimally, use a cross-validation set with labeled examples (some known anomalies), as sketched after this list:
- Compute $p(x)$ for all cross-validation examples
- Try many values of ε
- For each ε, classify an example as an anomaly if $p(x) < \varepsilon$
- Select the ε that maximizes the F1 score
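A minimal sketch of this sweep, assuming `p_cv` holds the probabilities of the cross-validation examples and `y_cv` their labels (1 = anomaly); both names are illustrative, and the F1 computation anticipates the formulas in the next section:

import numpy as np

def select_threshold(y_cv, p_cv):
    """Sweep candidate epsilons and keep the one with the best F1 score."""
    best_epsilon, best_f1 = 0.0, 0.0
    # Candidate thresholds spanning the range of observed probabilities
    for epsilon in np.linspace(p_cv.min(), p_cv.max(), 1000):
        preds = p_cv < epsilon                   # flagged as anomalies
        tp = np.sum((preds == 1) & (y_cv == 1))  # true positives
        fp = np.sum((preds == 1) & (y_cv == 0))  # false positives
        fn = np.sum((preds == 0) & (y_cv == 1))  # false negatives
        if tp + fp == 0 or tp + fn == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_epsilon = f1, epsilon
    return best_epsilon, best_f1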
Evaluation Metrics
For imbalanced data (few anomalies, many normal), accuracy is misleading. Instead use:
Precision: of all examples flagged as anomalies, how many actually are? $P = \frac{tp}{tp + fp}$
Recall: of all actual anomalies, how many did we catch? $R = \frac{tp}{tp + fn}$
F1 Score: the harmonic mean balancing precision and recall, $F_1 = \frac{2PR}{P + R}$
Where:
- $tp$ (true positives): correctly identified anomalies
- $fp$ (false positives): normal examples incorrectly flagged
- $fn$ (false negatives): anomalies we missed
F1 is preferred because it penalizes models that sacrifice either precision or recall too heavily.
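For example, with hypothetical counts of $tp = 8$, $fp = 2$, and $fn = 4$: precision is $8/10 = 0.8$, recall is $8/12 \approx 0.67$, and $F_1 = \frac{2(0.8)(0.67)}{0.8 + 0.67} \approx 0.73$.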
Algorithm Summary
- Choose features that might indicate anomalous behavior
- Fit parameters $\mu_j$ and $\sigma_j^2$ from the training data
- Compute probability $p(x)$ for new examples
- Flag as an anomaly if $p(x) < \varepsilon$
# Training: fit per-feature Gaussian parameters
mu, var = estimate_gaussian(X_train)

# Prediction: score new examples and flag low-probability ones
p = multivariate_gaussian(X_new, mu, var)
anomalies = p < epsilon
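Here `multivariate_gaussian` is presumably supplied by the accompanying notebook; with independent features it computes the same product of per-feature densities as `compute_probability` above. Putting the sketches together end to end (variable names illustrative):

# Fit the model on training data assumed to be mostly normal
mu, var = estimate_gaussian(X_train)

# Choose epsilon on a labeled cross-validation set
p_cv = compute_probability(X_cv, mu, var)
epsilon, f1 = select_threshold(y_cv, p_cv)

# Flag low-probability examples among new data
p_new = compute_probability(X_new, mu, var)
anomalies = p_new < epsilon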
Practical Applications
- Server Monitoring: Track throughput and latency; flag servers with unusual combinations
- Fraud Detection: Model normal transaction patterns; flag unusual purchases
- Manufacturing: Monitor sensor readings; detect defective products
- Network Security: Baseline normal traffic; detect intrusions
High-Dimensional Data
This approach scales well to many features. The notebook example uses 11 features to monitor server health, achieving good detection with F1 ≈ 0.62. As dimensionality increases, ensure you have enough training data to reliably estimate each feature’s parameters.
Anomaly Detection vs. Supervised Learning
| Aspect | Anomaly Detection | Supervised Learning |
|---|---|---|
| Labeled anomalies | Few or none | Many of each class |
| Anomaly types | Diverse, novel | Known patterns |
| Training data | Mostly normal | Balanced classes |
| Best for | Rare, unpredictable events | Well-defined categories |
Use anomaly detection when anomalies are too rare or varied to learn directly. Use supervised classification when you have enough labeled examples of each class.