---
title: "Online (Real-Time) Anomaly Detection"
description: "Learn how to detect anomalies in real-time streaming data using TimeGPT's detect_anomalies_online method. Complete Python tutorial with code examples for monitoring server logs, IoT sensors, and live data streams."
icon: "bolt"
---

## Overview

Real-time anomaly detection lets you identify unusual patterns in streaming time series data as they appear, which is essential for monitoring server performance, detecting fraud, identifying system failures, and tracking IoT sensor anomalies. TimeGPT's `detect_anomalies_online` method provides:

- **Flexible Control**: Fine-tune detection sensitivity and confidence levels
- **Local & Global Detection**: Analyze individual series or detect system-wide anomalies across multiple correlated metrics
- **Stream Processing**: Monitor live data feeds with rolling window analysis

## Common Use Cases

- **Server Monitoring**: Detect CPU spikes, memory leaks, and downtime
- **IoT Sensors**: Identify equipment failures and sensor malfunctions
- **Fraud Detection**: Flag suspicious transactions in real time
- **Application Performance**: Monitor API response times and error rates

## Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/online-anomaly-detection/01_quickstart.ipynb)

### Step 1: Set up your environment

Initialize your Python environment by importing the required libraries:

```python
import pandas as pd
from nixtla import NixtlaClient
import matplotlib.pyplot as plt
```

### Step 2: Configure your NixtlaClient

Provide your API key (and optionally a custom base URL).

```python
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)
```
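
To keep the key out of source files, it can be read from the environment instead. The sketch below assumes the `NIXTLA_API_KEY` variable mentioned in the comment above; `load_api_key` is a hypothetical helper written for this example, not part of the `nixtla` package.

```python
import os


def load_api_key(var_name: str = "NIXTLA_API_KEY") -> str:
    """Return the API key stored in `var_name`, or fail with a clear message."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the client.")
    return key


# Usage (requires the nixtla package and a valid key):
# nixtla_client = NixtlaClient(api_key=load_api_key())
```

Failing early with an explicit message makes a missing key easier to diagnose than an authentication error returned later by the API.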

### Step 3: Load your dataset

We use a minute-level time series dataset that monitors server usage. This dataset is ideal for showcasing streaming data scenarios, where the task is to detect server failures or downtime in real time.

```python
df = pd.read_csv(
    'https://datasets-nixtla.s3.us-east-1.amazonaws.com/machine-1-1.csv',
    parse_dates=['ts']
)
```

We observe that the time series remains stable during the initial period; however, a spike occurs in the last 20 steps, indicating anomalous behavior. Our goal is to capture this abnormal jump as soon as it appears.

<Frame caption="Server Data with Spike Anomaly">
  ![Server Data with Spike Anomaly](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/01_quickstart_files/figure-markdown_strict/cell-11-output-1.png)
</Frame>
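
Since the dataset is described as minute-level, a quick regularity check before detection can catch gaps or duplicated timestamps. This is a minimal sketch using pandas, with synthetic timestamps standing in for the `ts` column of a single series; `is_regular` is a hypothetical helper, not a `nixtla` function.

```python
import pandas as pd


def is_regular(ts: pd.Series, step: pd.Timedelta) -> bool:
    """Return True if consecutive sorted timestamps are exactly `step` apart."""
    gaps = ts.sort_values().diff().dropna()
    return bool((gaps == step).all())


# Synthetic stand-in for the `ts` column of a single series
ts = pd.Series(pd.date_range("2020-02-01 00:00", periods=5, freq="min"))

print(is_regular(ts, pd.Timedelta(minutes=1)))                # evenly spaced
print(is_regular(ts.drop(index=2), pd.Timedelta(minutes=1)))  # one minute missing
```

For a DataFrame holding several series, the same check would need to be applied to each series separately.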

### Step 4: Detect anomalies in real time

The `detect_anomalies_online` method uses TimeGPT's forecasts to detect anomalies in a time series. Because each step is judged by its forecast error, you can specify and tune parameters similar to those of the `forecast` method. The method returns a DataFrame containing an anomaly flag and an anomaly score for each analyzed step; the absolute value of the score quantifies how abnormal the value is.

To perform real-time anomaly detection, set the following parameters:

- `df`: A pandas DataFrame containing the time series data.
- `time_col`: The name of the column that contains the timestamps.
- `target_col`: The variable to forecast.
- `h`: The forecast horizon, that is, the number of steps ahead to forecast.
- `freq`: The frequency of the time series, given as a pandas frequency string.
- `level`: The percentile of the score distribution at which the threshold is set, which controls how strictly anomalies are flagged. Defaults to 99.
- `detection_size`: The number of steps at the end of the time series to analyze for anomalies.
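
To build intuition for how `level` affects strictness, the sketch below standardizes forecast errors against a reference window and flags scores beyond a two-sided cutoff. It is a simplified illustration that assumes normally distributed errors; it is not TimeGPT's internal implementation, and `z_threshold` and `score_and_flag` are hypothetical names.

```python
from statistics import NormalDist, fmean, pstdev


def z_threshold(level: float) -> float:
    """Two-sided cutoff on a standardized score for a level given in percent."""
    return NormalDist().inv_cdf(0.5 + level / 200)


def score_and_flag(reference_errors, new_errors, level=99):
    """Standardize new forecast errors against a reference window and flag outliers."""
    mu = fmean(reference_errors)
    sigma = pstdev(reference_errors)
    cutoff = z_threshold(level)
    scores = [(e - mu) / sigma for e in new_errors]
    return [(s, abs(s) > cutoff) for s in scores]


# Reference errors with mean 0 and standard deviation 1
reference = [-1.0, 1.0, -1.0, 1.0]
for score, is_anomaly in score_and_flag(reference, [0.5, 3.0], level=99):
    print(f"score={score:+.2f} anomaly={is_anomaly}")
```

In this sketch, lowering `level` lowers the cutoff, so more points are flagged; raising it makes detection stricter.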

```python
anomaly_online = nixtla_client.detect_anomalies_online(
    df,
    time_col='ts',
    target_col='y',
    freq='min',                # Specify the frequency of the data
    h=10,                      # Specify the forecast horizon
    level=99,                  # Percentile threshold for flagging anomalies
    detection_size=100         # Number of steps to analyze for anomalies
)

anomaly_online.tail()
```

```bash Log Output
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...
```

View last 5 anomaly detections:

| unique_id          | ts                    | y          | TimeGPT    | anomaly   | anomaly_score   | TimeGPT-hi-99   | TimeGPT-lo-99   |
| ------------------ | --------------------- | ---------- | ---------- | --------- | --------------- | --------------- | --------------- |
| machine-1-1_y_29   | 2020-02-01 22:11:00   | 0.606017   | 0.544625   | True      | 18.463266       | 0.553161        | 0.536090        |
| machine-1-1_y_29   | 2020-02-01 22:12:00   | 0.044413   | 0.570869   | True      | -158.933850     | 0.579404        | 0.562333        |
| machine-1-1_y_29   | 2020-02-01 22:13:00   | 0.038682   | 0.560303   | True      | -157.474880     | 0.568839        | 0.551767        |
| machine-1-1_y_29   | 2020-02-01 22:14:00   | 0.024355   | 0.521797   | True      | -150.178240     | 0.530333        | 0.513261        |
| machine-1-1_y_29   | 2020-02-01 22:15:00   | 0.044413   | 0.467860   | True      | -127.848560     | 0.476396        | 0.459325        |


<Frame caption="Identified Anomalies">
  ![Identified Anomalies](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/01_quickstart_files/figure-markdown_strict/cell-13-output-1.png)
</Frame>

From the plot, we observe that the anomalous period is promptly detected. 
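
To work with the flagged rows programmatically, one option is to filter on the `anomaly` column and rank by the absolute value of `anomaly_score`. The sketch below re-enters the five rows from the table above by hand so that it runs without an API call; with a real result you would apply the same two steps to `anomaly_online`.

```python
import pandas as pd

# The five rows from the output table, typed in by hand for illustration
anomaly_tail = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            [
                "2020-02-01 22:11:00",
                "2020-02-01 22:12:00",
                "2020-02-01 22:13:00",
                "2020-02-01 22:14:00",
                "2020-02-01 22:15:00",
            ]
        ),
        "y": [0.606017, 0.044413, 0.038682, 0.024355, 0.044413],
        "TimeGPT": [0.544625, 0.570869, 0.560303, 0.521797, 0.467860],
        "anomaly": [True, True, True, True, True],
        "anomaly_score": [18.463266, -158.933850, -157.474880, -150.178240, -127.848560],
    }
)

# Keep only the flagged steps, then sort by how abnormal each one is
flagged = anomaly_tail[anomaly_tail["anomaly"]]
ranked = flagged.assign(abs_score=flagged["anomaly_score"].abs()).sort_values(
    "abs_score", ascending=False
)
print(ranked[["ts", "y", "TimeGPT", "anomaly_score"]])
```

The sign of the score indicates direction: in these rows, negative scores correspond to observed values below the forecast.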

<Check>
  Here we use a detection size of 100 to illustrate the anomaly detection process. In production, running detections more frequently with smaller detection sizes can help identify anomalies as soon as they occur.
</Check>
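
The production pattern described above can be sketched as a loop that re-runs detection each time a small batch of new points arrives. Everything here is an assumption made for illustration: `monitor` is a hypothetical driver, and `toy_detect` is a stand-in 3-sigma rule so the example runs offline. In practice the `detect` callable would wrap a call to `nixtla_client.detect_anomalies_online` with a small `detection_size` and return its `anomaly` column.

```python
from collections import deque
from statistics import fmean, pstdev
from typing import Callable, Iterable, List, Sequence


def monitor(
    stream: Iterable[float],
    detect: Callable[[Sequence[float], int], List[bool]],
    window: int = 50,
    detection_size: int = 5,
) -> List[int]:
    """Run `detect` on a rolling window every `detection_size` new points.

    Returns the stream positions that were flagged.
    """
    buffer = deque(maxlen=window)
    alerts: List[int] = []
    for i, value in enumerate(stream):
        buffer.append(value)
        window_full = len(buffer) == window
        if window_full and (i + 1) % detection_size == 0:
            flags = detect(list(buffer), detection_size)
            first = i - detection_size + 1  # position of the first analyzed point
            alerts.extend(first + k for k, hit in enumerate(flags) if hit)
    return alerts


def toy_detect(values: Sequence[float], detection_size: int) -> List[bool]:
    """Hypothetical stand-in detector: 3-sigma rule against the earlier points."""
    history, recent = values[:-detection_size], values[-detection_size:]
    mu, sigma = fmean(history), pstdev(history)
    return [abs(v - mu) > 3 * sigma for v in recent]


# Stable signal followed by a five-point spike
stream = [1.0, 1.1] * 30 + [5.0] * 5
print(monitor(stream, toy_detect))
```

A smaller `detection_size` shortens the delay between a point arriving and being checked, at the cost of more frequent detection calls.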

## Frequently Asked Questions

**What's the difference between online and historical anomaly detection?**

Online detection analyzes recent data windows for immediate alerting, while historical detection analyzes complete datasets for pattern discovery.

**Can I adjust detection sensitivity?**

Yes. Tune `level` (the percentile of the score distribution used as the threshold) and `detection_size` (the number of trailing steps analyzed) to control the false positive rate.

## Next Steps

Now that you've detected your first anomalies in real time, explore these guides to optimize your detection:

- [Controlling the Anomaly Detection Process](/anomaly_detection/real-time/adjusting_detection) - Learn how to fine-tune key parameters for more accurate detection
- [Local vs Global Anomaly Detection](/anomaly_detection/real-time/univariate_multivariate) - Choose the right detection strategy for single vs multiple correlated time series