Taiju Sanagi: Experiments

Anomaly Detection

Note
Updated: April 22, 2025

🚨 Introduction to Anomaly Detection

Anomaly detection is about finding things that don’t belong.

These could be:

  • A fraudulent credit card transaction
  • A faulty machine sensor reading
  • An unusual customer behavior

We want to identify rare, unusual patterns that are different from the normal data — without necessarily having labels for them.

1. What Is an Anomaly?

An anomaly (or outlier) is a data point that is significantly different from the rest of the dataset.

There are three common types:

  • Point anomalies — a single abnormal value (e.g. a $10,000 charge when most are under $100)
  • Contextual anomalies — normal in one context, strange in another (e.g. 40°C is normal in summer, not in winter)
  • Collective anomalies — a group of points is strange together (e.g. multiple failed logins in a row)
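To make the contextual case concrete, here is a minimal sketch: the same 40°C reading blends in among summer temperatures but stands out among winter ones. The temperature values and the z-score threshold of 2 are made-up, illustrative assumptions, not data from this note.

```python
import numpy as np

# Toy temperature readings in °C; values and the threshold of 2 are illustrative.
summer = np.array([32, 35, 33, 40, 34, 36, 38, 31])
winter = np.array([2, -1, 0, 3, 1, -2, 40, 2])

def zscore_flags(x, threshold=2.0):
    # Flag values far from the mean of their own context (here: the season).
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

print(zscore_flags(summer))  # all False: 40°C blends in with other summer readings
print(zscore_flags(winter))  # only the 40°C reading is flagged: unusual for winter
```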

2. Why Use Unsupervised Methods?

In most real-world cases:

  • We don’t have labels saying which data points are normal vs. abnormal
  • Anomalies are rare, making supervised training difficult

So we often use unsupervised algorithms to find anomalies based on the structure of the data.

3. How Does It Work?

The core idea is:

"Learn what normal looks like. Then flag anything that’s far from it."

Common unsupervised strategies include (each is sketched in code after this list):

• Distance-based

  • Anomalies are far from the center or neighbors
  • Example: k-Nearest Neighbors (kNN) distance; Isolation Forest is usually grouped here too, though it isolates anomalies with random splits rather than measuring distances directly

• Density-based

  • Anomalies live in low-density regions
  • Example: Local Outlier Factor (LOF), DBSCAN

• Model-based

  • Fit a model to the data (e.g. PCA, Gaussian) and flag points that don’t fit
  • Example: One-Class SVM, Autoencoders, Elliptic Envelope
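A minimal sketch of the distance-based idea, assuming scikit-learn is installed. The synthetic cluster, the two injected outliers, k = 5, and contamination = 0.01 are illustrative choices, not prescribed values.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
# Mostly "normal" 2-D points plus two obvious outliers (synthetic, illustrative).
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               np.array([[8.0, 8.0], [-9.0, 7.0]])])

# kNN distance: score each point by how far away its 5th nearest neighbor is.
nn = NearestNeighbors(n_neighbors=6).fit(X)   # 6 because each point is its own 0-distance neighbor
distances, _ = nn.kneighbors(X)
knn_score = distances[:, -1]                  # distance to the 5th true neighbor
print("largest kNN distances:", np.argsort(knn_score)[-2:])  # the two injected outliers

# Isolation Forest: anomalies need fewer random splits to be isolated.
iso = IsolationForest(contamination=0.01, random_state=0)
labels = iso.fit_predict(X)                   # -1 = anomaly, 1 = normal
print("Isolation Forest flags:", np.where(labels == -1)[0])  # typically indices 200 and 201
```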
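A sketch of the density-based idea with Local Outlier Factor, again assuming scikit-learn. The dense cluster, the three isolated points, and n_neighbors = 20 (the scikit-learn default) are illustrative.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
# One dense cluster plus a few points sitting in low-density regions (synthetic, illustrative).
dense = rng.normal(0, 0.5, size=(200, 2))
sparse = np.array([[5.0, 5.0], [5.5, -4.0], [-6.0, 0.0]])
X = np.vstack([dense, sparse])

# LOF compares each point's local density to its neighbors' densities:
# a score well above 1 means "much less dense than the points around it".
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 = anomaly, 1 = normal
scores = -lof.negative_outlier_factor_  # higher = more anomalous
print("flagged indices:", np.where(labels == -1)[0])
print("their LOF scores:", scores[labels == -1].round(2))
```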
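A sketch of the model-based idea using Elliptic Envelope (a robust Gaussian fit) as a stand-in for the family; One-Class SVM and autoencoders follow the same fit-on-normal, score-new-data pattern. The synthetic training data, the test points, and contamination = 0.01 are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(42)
# Training data assumed to be (mostly) normal behaviour (synthetic, illustrative).
X_train = rng.normal(loc=[0.0, 0.0], scale=[1.0, 2.0], size=(500, 2))

# Fit a robust Gaussian model of "normal", then flag points it explains poorly.
model = EllipticEnvelope(contamination=0.01, random_state=0).fit(X_train)

X_new = np.array([[0.3, -1.0],    # plausible under the fitted Gaussian
                  [7.0, 15.0]])   # very unlikely under it
print(model.predict(X_new))               # [ 1 -1]  (1 = normal, -1 = anomaly)
print(model.mahalanobis(X_new).round(1))  # squared Mahalanobis distance to the fitted model
```

Note that the scikit-learn estimators in all three sketches share the same convention: fit on the data, then predict returns 1 for points treated as normal and -1 for points flagged as anomalies, which makes it easy to swap one strategy for another.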