Commit f42429f6 authored by bailuo

readme

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Online (Real-Time) Anomaly Detection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Online anomaly detection dynamically identifies anomalies as data streams in, allowing users to specify the number of timestamps to monitor. This method is well-suited for immediate applications, such as fraud detection, live sensor monitoring, or tracking real-time demand changes. By focusing on recent data and continuously generating forecasts, it enables timely responses to anomalies in critical scenarios.\n",
"\n",
"This section provides various recipes for performing real-time anomaly detection using TimeGPT, offering users the ability to detect outliers and unusual patterns as they emerge, ensuring prompt intervention in time-sensitive situations.\n",
"\n",
"This section covers:\n",
"\n",
"* [Online anomaly detection](https://docs.nixtla.io/docs/capabilities-online-anomaly-detection-quickstart)\n",
"\n",
"* [How to adjust the detection process](https://docs.nixtla.io/docs/capabilities-online-anomaly-detection-adjusting_detection_process.ipynb)\n",
"\n",
"* [Univariate vs. multivariate anomaly detection](https://docs.nixtla.io/docs/capabilities-online-anomaly-detection-univariate_vs_multivariate_anomaly_detection)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Online (Real-Time) Anomaly Detection\n",
"In this notebook, we introduce the `detect_anomalies_online` method. You will learn how to get started with this new endpoint and how it differs from the historical anomaly detection endpoint. New features include:\n",
"* More flexibility and control over the anomaly detection process\n",
"* Univariate and multivariate anomaly detection\n",
"* Anomaly detection on streaming data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/online-anomaly-detection/01_quickstart.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv()\n",
" colab_badge('docs/capabilities/online-anomaly-detection/01_quickstart')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from nixtla import NixtlaClient\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 👍 Use an Azure AI endpoint\n",
"> \n",
"> To use an Azure AI endpoint, set the `base_url` argument:\n",
"> \n",
"> `nixtla_client = NixtlaClient(base_url=\"your azure ai endpoint\", api_key=\"your api_key\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Dataset\n",
"In this notebook, we use a minute-level time series dataset that monitors server usage. This is a good example of a streaming data scenario, as the task is to detect server failures or downtime."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('https://datasets-nixtla.s3.us-east-1.amazonaws.com/machine-1-1.csv', parse_dates=['ts'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The time series remains stable during the initial period; however, a spike occurs in the last 20 steps, indicating anomalous behavior. Our goal is to capture this abnormal jump as soon as it appears. Let's see how the real-time anomaly detection capability of TimeGPT performs in this scenario!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1200x200 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"ax = df.tail(300).plot(x='ts', y='y', color = 'navy', title='Time Series', figsize=(12, 2))\n",
"plt.axvspan('2020-02-01 21:00:00', '2020-02-01 21:02:00', color='orchid', alpha=0.3, label='Anomalous Period')\n",
"plt.axvspan('2020-02-01 21:47:00', '2020-02-01 22:11:00', color='orchid', alpha=0.3)\n",
"ax.legend(['Time Series', 'Anomalous Period'])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Detect anomalies in real time\n",
"The `detect_anomalies_online` method detects anomalies in a time series by leveraging TimeGPT's forecasting power. Because it relies on the forecast error to decide whether a step is anomalous, you can specify and tune its parameters much as you would for the `forecast` method. It returns a dataframe containing anomaly flags and an anomaly score whose absolute value quantifies how abnormal each value is.\n",
"\n",
"To perform real-time anomaly detection, set the following parameters:\n",
"\n",
"- `df`: A pandas DataFrame containing the time series data.\n",
"- `time_col`: The column that identifies the datestamp.\n",
"- `target_col`: The variable to forecast.\n",
"- `h`: The horizon, i.e., the number of steps ahead to forecast.\n",
"- `freq`: The frequency of the time series in Pandas format.\n",
"- `level`: The percentile of the score distribution at which the threshold is set, controlling how strictly anomalies are flagged. Defaults to 99.\n",
"- `detection_size`: The number of steps at the end of the time series to analyze for anomalies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ts</th>\n",
" <th>y</th>\n",
" <th>TimeGPT</th>\n",
" <th>anomaly</th>\n",
" <th>anomaly_score</th>\n",
" <th>TimeGPT-hi-99</th>\n",
" <th>TimeGPT-lo-99</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>machine-1-1_y_29</td>\n",
" <td>2020-02-01 22:11:00</td>\n",
" <td>0.606017</td>\n",
" <td>0.544625</td>\n",
" <td>True</td>\n",
" <td>18.463266</td>\n",
" <td>0.553161</td>\n",
" <td>0.536090</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>machine-1-1_y_29</td>\n",
" <td>2020-02-01 22:12:00</td>\n",
" <td>0.044413</td>\n",
" <td>0.570869</td>\n",
" <td>True</td>\n",
" <td>-158.933850</td>\n",
" <td>0.579404</td>\n",
" <td>0.562333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>machine-1-1_y_29</td>\n",
" <td>2020-02-01 22:13:00</td>\n",
" <td>0.038682</td>\n",
" <td>0.560303</td>\n",
" <td>True</td>\n",
" <td>-157.474880</td>\n",
" <td>0.568839</td>\n",
" <td>0.551767</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>machine-1-1_y_29</td>\n",
" <td>2020-02-01 22:14:00</td>\n",
" <td>0.024355</td>\n",
" <td>0.521797</td>\n",
" <td>True</td>\n",
" <td>-150.178240</td>\n",
" <td>0.530333</td>\n",
" <td>0.513261</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>machine-1-1_y_29</td>\n",
" <td>2020-02-01 22:15:00</td>\n",
" <td>0.044413</td>\n",
" <td>0.467860</td>\n",
" <td>True</td>\n",
" <td>-127.848560</td>\n",
" <td>0.476396</td>\n",
" <td>0.459325</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ts y TimeGPT anomaly \\\n",
"95 machine-1-1_y_29 2020-02-01 22:11:00 0.606017 0.544625 True \n",
"96 machine-1-1_y_29 2020-02-01 22:12:00 0.044413 0.570869 True \n",
"97 machine-1-1_y_29 2020-02-01 22:13:00 0.038682 0.560303 True \n",
"98 machine-1-1_y_29 2020-02-01 22:14:00 0.024355 0.521797 True \n",
"99 machine-1-1_y_29 2020-02-01 22:15:00 0.044413 0.467860 True \n",
"\n",
" anomaly_score TimeGPT-hi-99 TimeGPT-lo-99 \n",
"95 18.463266 0.553161 0.536090 \n",
"96 -158.933850 0.579404 0.562333 \n",
"97 -157.474880 0.568839 0.551767 \n",
"98 -150.178240 0.530333 0.513261 \n",
"99 -127.848560 0.476396 0.459325 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"anomaly_online = nixtla_client.detect_anomalies_online(\n",
" df,\n",
" time_col='ts', \n",
" target_col='y', \n",
" freq='min', # Specify the frequency of the data\n",
" h=10, # Specify the forecast horizon\n",
" level=99, # Set the confidence level for anomaly detection\n",
" detection_size=100 # How many steps you want for analyzing anomalies\n",
")\n",
"anomaly_online.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 In this example, we use a detection size of 100 to illustrate the anomaly detection process. In practice, using a smaller detection size and running the detection more frequently improves granularity and enables more timely identification of anomalies as they occur."
]
},
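{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the call returns, the flagged rows can be pulled out with ordinary pandas filtering. The sketch below is only an illustration: it rebuilds a few rows of the output shown above by hand so that it runs without an API key, and the helper name `top_anomalies` is an assumption of this example, not part of the `nixtla` SDK.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy frame mimicking the columns returned by detect_anomalies_online.\n",
"# The first three rows copy values from the output above; the last row is\n",
"# a made-up, non-anomalous point added only to make the filter visible.\n",
"toy = pd.DataFrame({\n",
"    'ts': pd.to_datetime(['2020-02-01 22:11:00', '2020-02-01 22:12:00',\n",
"                          '2020-02-01 22:15:00', '2020-02-01 22:16:00']),\n",
"    'y': [0.606017, 0.044413, 0.044413, 0.470000],\n",
"    'anomaly': [True, True, True, False],\n",
"    'anomaly_score': [18.463266, -158.933850, -127.848560, 0.5],\n",
"})\n",
"\n",
"def top_anomalies(result, n=3):\n",
"    # Hypothetical helper: keep flagged rows, largest absolute score first.\n",
"    flagged = result[result['anomaly']]\n",
"    order = flagged['anomaly_score'].abs().sort_values(ascending=False).index\n",
"    return flagged.loc[order].head(n)\n",
"\n",
"print(top_anomalies(toy)[['ts', 'y', 'anomaly_score']])\n",
"```\n",
"\n",
"In a real workflow you would pass `anomaly_online` instead of `toy`. Sorting by absolute score is one reasonable way to triage alerts, since the sign only indicates whether the value fell above or below the forecast."
]
},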
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plot below shows that both anomalous periods were detected as soon as they arose. For ways to improve detection accuracy and customize the detection process, see our other tutorials on online anomaly detection."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1200x200 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"plt.figure(figsize=(12, 2))\n",
"plt.plot(anomaly_online['ts'], anomaly_online['y'], label='y', color='navy', alpha=0.8)\n",
"plt.plot(anomaly_online['ts'], anomaly_online['TimeGPT'], label='TimeGPT', color='orchid', alpha=0.7)\n",
"plt.scatter(anomaly_online.loc[anomaly_online['anomaly'], 'ts'], anomaly_online.loc[anomaly_online['anomaly'], 'y'], color='orchid', label='Anomalies Detected')\n",
"for t in ['2020-02-01 21:00:00', '2020-02-01 21:47:00']:\n",
" plt.axvline(pd.to_datetime(t), color='red', linestyle='--', alpha=0.7, label='Anomaly Behavior Captured' if t == '2020-02-01 21:00:00' else None)\n",
"\n",
"plt.axvspan('2020-02-01 21:00:00', '2020-02-01 21:02:00', color='orchid', alpha=0.3, label='Anomalous Period')\n",
"plt.axvspan('2020-02-01 21:47:00', '2020-02-01 22:11:00', color='orchid', alpha=0.3)\n",
"plt.legend()\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For an in-depth analysis of the `detect_anomalies_online` method, refer to the tutorial (coming soon)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AzureAI \n",
"\n",
"> The foundational models for time series by Nixtla can be deployed on your Azure subscription. This page explains how to easily get started with TimeGEN-1 deployed as an Azure AI endpoint. If you use the `nixtla` library, it should be a drop-in replacement where you only need to change the client parameters (endpoint URL, API key, model name)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploying TimeGEN-1\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using the model\n",
"\n",
"Once your model is deployed, and provided that you have the relevant permissions, you consume it in essentially the same way as a Nixtla endpoint.\n",
"\n",
"To run the examples below, you will need to define the following environment variables:\n",
"\n",
"- `AZURE_AI_NIXTLA_BASE_URL` is your API URL; it should be of the form `https://your-endpoint.inference.ai.azure.com/`.\n",
"- `AZURE_AI_NIXTLA_API_KEY` is your authentication key."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to use"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Just import the library, set your credentials, and start forecasting in two lines of code!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"pip install nixtla\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"import os\n",
"from nixtla import NixtlaClient\n",
"\n",
"base_url = os.environ[\"AZURE_AI_NIXTLA_BASE_URL\"]\n",
"api_key = os.environ[\"AZURE_AI_NIXTLA_API_KEY\"]\n",
"model = \"azureai\"\n",
"\n",
"nixtla_client = NixtlaClient(api_key=api_key, base_url=base_url)\n",
"nixtla_client.forecast(\n",
" ...,\n",
" model=model,\n",
")\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# About TimeGPT\n",
"\n",
"TimeGPT is a production-ready generative pretrained transformer for time series. It can produce accurate forecasts across domains such as retail, electricity, finance, and IoT with just a few lines of code.\n",
"\n",
"It is user-friendly and low-code. Users can simply upload their time series data and generate forecasts or detect anomalies with just a single line of code.\n",
"\n",
"TimeGPT is the only out-of-the-box foundation model for time series that can be used through our public APIs, through [Azure Studio as TimeGEN-1](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/announcing-timegen-1-in-azure-ai-leap-forward-in-time-series/ba-p/4140446) or on your own infrastructure.\n",
"\n",
"Get started! [Activate your free trial](https://dashboard.nixtla.io/freetrial) and see our [Quickstart Guide](https://docs.nixtla.io/docs/getting-started-timegpt_quickstart).\n",
"\n",
"## Features and capabilities\n",
"\n",
"* **[Zero-shot Inference](https://docs.nixtla.io/docs/capabilities-forecast-quickstart)**: TimeGPT can generate forecasts and detect anomalies straight out of the box, requiring no prior training data. This allows for immediate deployment and quick insights from any time series data.\n",
"\n",
"* **[Fine-tuning](https://docs.nixtla.io/docs/tutorials-fine_tuning)**: Enhance TimeGPT's capabilities by fine-tuning the model on your specific datasets, enabling the model to adapt to the nuances of your unique time series data and improving performance on tailored tasks.\n",
"\n",
"* **[API Access](https://dashboard.nixtla.io/sign_in)**: Integrate TimeGPT seamlessly into your applications via our robust API (obtain an API key through our [Dashboard](https://dashboard.nixtla.io/sign_in)). TimeGPT is also supported through [Azure Studio](https://docs.nixtla.io/docs/deployment-azureai) for even more flexible integration options. Alternatively, deploy TimeGPT on your own infrastructure to maintain full control over your data and workflows.\n",
"\n",
"* **[Add Exogenous Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables)**: Incorporate additional variables that might influence your predictions (e.g., special dates, events, or prices) to enhance forecast accuracy.\n",
"\n",
"* **[Multiple Series Forecasting](https://docs.nixtla.io/docs/tutorials-multiple_series_forecasting)**: Forecast multiple time series simultaneously, optimizing workflows and resources.\n",
"\n",
"* **[Specific Loss Function](https://docs.nixtla.io/docs/tutorials-fine_tuning_with_a_specific_loss_function)**: Tailor the fine-tuning process by choosing from many loss functions to meet specific performance metrics.\n",
"\n",
"* **[Cross-validation](https://docs.nixtla.io/docs/tutorials-cross_validation)**: Implement out-of-the-box cross-validation techniques to ensure model robustness and generalizability.\n",
"\n",
"* **[Prediction Intervals](https://docs.nixtla.io/docs/tutorials-prediction_intervals)**: Provide intervals in your predictions to quantify uncertainty effectively.\n",
"\n",
"* **[Irregular Timestamps](https://docs.nixtla.io/docs/capabilities-forecast-irregular_timestamps)**: Handle data with irregular timestamps, accommodating non-uniform interval series without preprocessing.\n",
"\n",
"* **[Anomaly Detection](https://docs.nixtla.io/docs/tutorials-anomaly_detection)**: Automatically detect anomalies in time series, and use exogenous features for enhanced performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Get started with our [QuickStart guide](https://docs.nixtla.io/docs/getting-started-timegpt_quickstart), walk through tutorials on the different capabilities, and learn from real-world use cases in our documentation.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Architecture\n",
"\n",
"Self-attention, the revolutionary concept introduced by the paper [Attention is all you need](https://arxiv.org/abs/1706.03762), is the basis of this foundation model. The TimeGPT model is not based on any existing large language model (LLM). Instead, it is independently trained on a vast amount of time series data, and the large transformer model is designed to minimize the forecasting error.\n",
"\n",
"<img src=\"https://github.com/Nixtla/nixtla/blob/main/nbs/img/timegpt_archi.png?raw=true\" />\n",
"\n",
"The architecture consists of an encoder-decoder structure with multiple layers, each with residual connections and layer normalization. Finally, a linear layer maps the decoder’s output to the forecasting window dimension. The general intuition is that attention-based mechanisms are able to capture the diversity of past events and correctly extrapolate potential future distributions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make predictions, TimeGPT \"reads\" the input series much like the way humans read a sentence – from left to right. It looks at windows of past data, which we can think of as \"tokens\", and predicts what comes next. This prediction is based on patterns the model identifies in past data and extrapolates into the future."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore examples and use cases"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Visit our comprehensive documentation to explore a wide range of examples and practical use cases for TimeGPT. Whether you're getting started with our [Quickstart Guide](https://docs.nixtla.io/docs/getting-started-timegpt_quickstart), [setting up your API key](https://docs.nixtla.io/docs/getting-started-setting_up_your_api_key), or looking for advanced forecasting techniques, our resources are designed to guide you through every step of the process. \n",
"\n",
"Learn how to handle [anomaly detection](https://docs.nixtla.io/docs/capabilities-anomaly-detection-quickstart), [fine-tune models](https://docs.nixtla.io/docs/tutorials-fine_tuning_with_a_specific_loss_function) with specific loss functions, and scale your computing using frameworks like [Spark](https://docs.nixtla.io/docs/tutorials-spark), [Dask](https://docs.nixtla.io/docs/tutorials-dask), and [Ray](https://docs.nixtla.io/docs/tutorials-ray). \n",
"\n",
"Additionally, our documentation covers specialized topics such as handling [exogenous variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables), validating models through [cross-validation](https://docs.nixtla.io/docs/tutorials-cross_validation), and forecasting under uncertainty with [quantile forecasts](https://docs.nixtla.io/docs/tutorials-quantile_forecasts) and [prediction intervals](https://docs.nixtla.io/docs/tutorials-prediction_intervals). \n",
"\n",
"For those interested in real-world applications, discover how TimeGPT can be used for [forecasting web traffic](https://docs.nixtla.io/docs/use-cases-forecasting_web_traffic) or [predicting Bitcoin prices](https://docs.nixtla.io/docs/use-cases-bitcoin_price_prediction)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "18102cea",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a50470ae",
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ace35684",
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1ea1aa8",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"id": "6ecd9d32-9178-4768-bffa-d70c93c98311",
"metadata": {},
"source": [
"# TimeGEN-1 Quickstart (Azure)\n",
"\n",
"> TimeGEN-1 is TimeGPT optimized for the Azure infrastructure. It is a production-ready generative pretrained transformer for time series. It can produce accurate forecasts across domains such as retail, electricity, finance, and IoT with just a few lines of code 🚀."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c48909a",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/22_azure_quickstart.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv()\n",
" colab_badge('docs/getting-started/22_azure_quickstart')"
]
},
{
"cell_type": "markdown",
"id": "7e5b10a6-b39c-4583-bc2c-ab0b70eeea28",
"metadata": {},
"source": [
"## Step 1: Set up a TimeGEN-1 endpoint account and generate your API key on Azure\n",
"\n",
"* Go to [ml.azure.com](https://ml.azure.com/)\n",
"* Sign in or create a Microsoft account\n",
"* Click on 'Models' in the sidebar\n",
"* Search for 'TimeGEN' in the model catalog\n",
"* Select TimeGEN-1\n",
"\n",
"![Azure Model Catalog landing page. Search for forecast and TimeGEN-1 comes up as the only option.](https://github.com/Nixtla/nixtla/blob/main/nbs/img/azure-model-catalog.png?raw=true)"
]
},
{
"cell_type": "markdown",
"id": "aa3705dd-6133-4eb0-8c53-8a8ff65ee455",
"metadata": {},
"source": [
"* Click 'Deploy' to create an endpoint\n",
"\n",
"![TimeGEN-1 in the model catalog. Cursor is on the Deploy button indicating what to select to deploy TimeGEN-1.](https://github.com/Nixtla/nixtla/blob/main/nbs/img/azure-deploy.png?raw=true)"
]
},
{
"cell_type": "markdown",
"id": "a2ca7c90",
"metadata": {},
"source": [
"* Go to 'Endpoint' in the sidebar, where you will see your TimeGEN-1 endpoint\n",
"* The endpoint page shows the base URL and API key you will use\n",
"\n",
"![Endpoint is highlighted in the side panel. The main panel shows the endpoint and API key for the TimeGEN-1 endpoint, with buttons where you can copy the information.](https://github.com/Nixtla/nixtla/blob/main/nbs/img/azure-endpoint.png?raw=true)"
]
},
{
"cell_type": "markdown",
"id": "5f30be1a-3eaf-4133-8254-680e8f7cda72",
"metadata": {},
"source": [
"## Step 2: Install Nixtla"
]
},
{
"cell_type": "markdown",
"id": "73ad0fc4-bcf1-4b7e-8e6c-5969982e1b78",
"metadata": {},
"source": [
"In your favorite Python development environment:"
]
},
{
"cell_type": "markdown",
"id": "d9c35e70",
"metadata": {},
"source": [
"Install `nixtla` with `pip`:\n",
" \n",
"```shell\n",
"pip install nixtla\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "03e24b4f-6b8c-4ffa-82c6-f5b889fdd423",
"metadata": {},
"source": [
"## Step 3: Import the Nixtla TimeGPT client"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d984aea-1315-4d4e-8b4d-b23efe947be1",
"metadata": {},
"outputs": [],
"source": [
"from nixtla import NixtlaClient"
]
},
{
"cell_type": "markdown",
"id": "8b73a131-390e-46b9-847b-173f7d3c869a",
"metadata": {},
"source": [
"Instantiate the `NixtlaClient` class with the base URL and API key of your endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1098659-f250-4663-b588-f9e17065cafa",
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" base_url = \"YOUR_BASE_URL\",\n",
" api_key = \"YOUR_API_KEY\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5b8ea7f-e30e-4001-a7a6-9e935e12180a",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"id": "8ca0d1f7-9730-4146-b6f3-596099ce6e3b",
"metadata": {},
"source": [
"## Step 4: Start making forecasts!\n",
"\n",
"Now you can start making forecasts! Let's work through an example using the classic `AirPassengers` dataset, which contains the monthly number of international airline passengers between 1949 and 1960. First, load the dataset and plot it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "687802f2-be84-4b81-95eb-44798c591daf",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fded6ec5-949a-44b8-9a4a-8fc8a5295e06",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timestamp</th>\n",
" <th>value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1949-01-01</td>\n",
" <td>112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1949-02-01</td>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1949-03-01</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1949-04-01</td>\n",
" <td>129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1949-05-01</td>\n",
" <td>121</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timestamp value\n",
"0 1949-01-01 112\n",
"1 1949-02-01 118\n",
"2 1949-03-01 132\n",
"3 1949-04-01 129\n",
"4 1949-05-01 121"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6242b73d-fd43-41be-b4db-123cf7cd5b11",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1600x350 with 1 Axes>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.plot(df, time_col='timestamp', target_col='value')"
]
},
{
"cell_type": "markdown",
"id": "defa26b8",
"metadata": {},
"source": [
"> 📘 Data Requirements\n",
">\n",
"> * Make sure the target variable column does not have missing or non-numeric values.\n",
"> * Do not include gaps/jumps in the datestamps (for the given frequency) between the first and last datestamps. The forecast function will not impute missing dates.\n",
"> * The format of the datestamp column should be readable by Pandas (see [this link](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) for more details).\n",
">\n",
">For further details go to [Data Requirements](https://docs.nixtla.io/docs/getting-started-data_requirements)."
]
},
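{
"cell_type": "markdown",
"id": "data-requirements-check",
"metadata": {},
"source": [
"The requirements above can be checked locally before calling the API. The following is a minimal sketch using only pandas; `check_requirements` is a hypothetical helper written for this example and is not part of the `nixtla` SDK. It counts missing target values, verifies that the target column is numeric, and lists datestamps that are absent for the given frequency.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def check_requirements(df, time_col, target_col, freq):\n",
"    # Hypothetical helper, not part of the nixtla SDK.\n",
"    ts = pd.DatetimeIndex(pd.to_datetime(df[time_col]))\n",
"    # Every datestamp we expect between the first and last observation.\n",
"    expected = pd.date_range(ts.min(), ts.max(), freq=freq)\n",
"    return {\n",
"        'missing_values': int(df[target_col].isna().sum()),\n",
"        'numeric_target': pd.api.types.is_numeric_dtype(df[target_col]),\n",
"        'missing_dates': expected.difference(ts).tolist(),\n",
"    }\n",
"\n",
"# Toy monthly series with March 1949 deliberately left out.\n",
"toy = pd.DataFrame({\n",
"    'timestamp': ['1949-01-01', '1949-02-01', '1949-04-01'],\n",
"    'value': [112, 118, 129],\n",
"})\n",
"report = check_requirements(toy, 'timestamp', 'value', freq='MS')\n",
"print(report)\n",
"```\n",
"\n",
"A non-empty `missing_dates` list means the series has gaps that should be filled or otherwise handled before forecasting, since the forecast function will not impute them."
]
},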
{
"cell_type": "markdown",
"id": "19581d85",
"metadata": {},
"source": [
"> 👍 Save figures made with TimeGEN\n",
"> \n",
"> The `plot` method automatically displays figures when in a notebook environment. To save figures locally, you can do:\n",
"> \n",
"> `fig = nixtla_client.plot(df, time_col='timestamp', target_col='value')`\n",
">\n",
">`fig.savefig('plot.png', bbox_inches='tight')`"
]
},
{
"cell_type": "markdown",
"id": "7631f17b-0ca4-4cf1-9b89-182a7d85fda6",
"metadata": {},
"source": [
"### Make forecasts"
]
},
{
"cell_type": "markdown",
"id": "5c4790e3",
"metadata": {},
"source": [
"Next, forecast the next 12 months using the SDK `forecast` method. Set the following parameters:\n",
"\n",
"- `df`: A pandas DataFrame containing the time series data.\n",
"- `h`: The horizon, i.e., the number of steps ahead to forecast.\n",
"- `freq`: The frequency of the time series in Pandas format. See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). (If you don't provide any frequency, the SDK will try to infer it)\n",
"- `time_col`: The column that identifies the datestamp.\n",
"- `target_col`: The variable to forecast.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "793011c6-6845-410f-b6b1-3bdb87b41ce6",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Restricting input...\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timestamp</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1961-01-01</td>\n",
" <td>437.837921</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1961-02-01</td>\n",
" <td>426.062714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1961-03-01</td>\n",
" <td>463.116547</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1961-04-01</td>\n",
" <td>478.244507</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1961-05-01</td>\n",
" <td>505.646484</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timestamp TimeGPT\n",
"0 1961-01-01 437.837921\n",
"1 1961-02-01 426.062714\n",
"2 1961-03-01 463.116547\n",
"3 1961-04-01 478.244507\n",
"4 1961-05-01 505.646484"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#| eval: false\n",
"\n",
"timegen_fcst_df = nixtla_client.forecast(df=df, h=12, freq='MS', time_col='timestamp', target_col='value')\n",
"timegen_fcst_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "638329d2-2d1b-49dd-8df7-1926f7d9b36b",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 2400x350 with 1 Axes>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#| eval: false\n",
"\n",
"nixtla_client.plot(df, timegen_fcst_df, time_col='timestamp', target_col='value')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setting up your API key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial will explain how to set up your API key when using the Nixtla SDK. To create an `Api Key` go to your [Dashboard](https://dashboard.nixtla.io/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are different ways to set up your API key. We provide some examples below. A scematic is given below.\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://github.com/Nixtla/nixtla/blob/main/nbs/img/api_key_process.png?raw=true\" alt=\"Diagram of the API Key configuration process. Method 1. Unsecure. Copy API key from Nixtla dashboard. 2. Paste API Key in Python code. 3. Validate API key. Method 2. Secure. One method, temporary. Open terminal. Set environment variable. Validate API key. Another method, permanent. Create .env File. Set API Key in file. Validate API Key.\" />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Copy and paste your key directly into your Python code\n",
"\n",
"This approach is straightforward and best for quick tests or scripts that won’t be shared.\n",
"\n",
"\n",
"- **Step 1**: Copy the API key found in the `API Keys` of your [Nixtla dashboard](https://dashboard.nixtla.io/). \n",
"- **Step 2**: Paste the key directly into your Python code, by instantiating the `NixtlaClient` with your API key:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nixtla import NixtlaClient \n",
"nixtla_client = NixtlaClient(api_key ='your API key here')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"::: {.callout-important}\n",
"This approach is considered unsecure, as your API key will be part of your source code.\n",
"::: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Secure: using an environment variable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- **Step 1**: Store your API key in an environment variable named `NIXTLA_API_KEY`. This can be done (a) temporarily for a session or (b) permanently, depending on your preference.\n",
"- **Step 2**: When you instantiate the `NixtlaClient` class, the SDK will automatically look for the `NIXTLA_API_KEY` environment variable and use it to authenticate your requests."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"::: {.callout-important}\n",
"The environment variable must be named exactly `NIXTLA_API_KEY`, with all capital letters and no deviations in spelling, for the SDK to recognize it.\n",
"::: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a. Temporary: From the Terminal\n",
"\n",
"This approach is useful if you are working from a terminal, and need a temporary solution. \n",
"\n",
"#### Linux / Mac\n",
"Open a terminal and use the `export` command to set `NIXTLA_API_KEY`. \n",
"\n",
"``` bash\n",
"export NIXTLA_API_KEY=your_api_key\n",
"```\n",
"\n",
"#### Windows\n",
"For Windows users, open a Powershell window and use the `Set` command to set `NIXTLA_API_KEY`. \n",
"\n",
"``` powershell\n",
"Set NIXTLA_API_KEY=your_api_key\n",
"```"
]
},
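{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm from Python that the variable set in the terminal is visible to your session, a minimal sketch like the one below can help. The helper name `read_api_key` is hypothetical and not part of the SDK, which performs its own lookup when `NixtlaClient()` is instantiated without arguments.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"ENV_VAR = 'NIXTLA_API_KEY'  # exact name the SDK looks for\n",
"\n",
"\n",
"def read_api_key(environ=os.environ):\n",
"    # Hypothetical helper: return the key or raise a clear error.\n",
"    key = environ.get(ENV_VAR, '').strip()\n",
"    if not key:\n",
"        raise RuntimeError(f'{ENV_VAR} is not set or is empty')\n",
"    return key\n",
"```\n",
"\n",
"Calling `read_api_key()` at the start of a script surfaces a misspelled or unset variable before any request is made."
]
},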
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b. Permanent: Using a `.env` file\n",
"\n",
"For a more persistent solution place your API key in a `.env` file located in the folder of your Python script. In this file, include the following:\n",
"\n",
"``` python\n",
"NIXTLA_API_KEY=your_api_key\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now load the environment variable within your Python script. Use the `dotenv` package to load the `.env` file and then instantiate the `NIXTLA_API_KEY` class. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"load_dotenv()\n",
"\n",
"from nixtla import NixtlaClient\n",
"nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This approach is more secure and suitable for applications that will be deployed or shared, as it keeps API keys out of the source code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"::: {.callout-important}\n",
"Remember, your API key is like a password - keep it secret, keep it safe!\n",
"::: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Validate your API key\n",
"\n",
"You can always find your API key in the `API Keys` section of your dashboard. To check the status of your API key, use the `validate_api_key` method of the `NixtlaClient` class. This method will return `True` if the API key is valid and `False` otherwise. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nixtla_client.validate_api_key()"
]
},
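{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a script should stop as soon as the key is rejected, the check can be wrapped in a small guard. This is a minimal sketch: `require_valid_key` is a hypothetical helper, not part of the SDK, and it relies only on `validate_api_key()` returning `True` or `False` as described above.\n",
"\n",
"```python\n",
"def require_valid_key(client):\n",
"    # Hypothetical guard: raise early instead of failing on the first request.\n",
"    if not client.validate_api_key():\n",
"        raise ValueError('API key was rejected; check the API Keys section of your dashboard')\n",
"    return client\n",
"```\n",
"\n",
"A typical use would be `nixtla_client = require_valid_key(NixtlaClient())`, run once when the script starts."
]
},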
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You don't need to validate your API key every time you use `TimeGPT`. This function is provided for your convenience to ensure its validity. For full access to `TimeGPT`'s functionalities, in addition to a valid API key, you also need sufficient credits in your account. You can check your credits in the `Usage` section of your [dashboard](https://dashboard.nixtla.io/). "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TimeGPT Subscription Plans\n",
"\n",
"We offer various Enterprise plans tailored to your forecasting needs. The number of API calls, number of users, and support levels can be customized based on your needs. We also offer an option for a self-hosted version and a version hosted on Azure.\n",
"\n",
"Please get in touch with us at support@nixtla.io for more information regarding pricing options and to discuss your specific requirements. For organizations interested in exploring our solution further, you can schedule a demo [here]( https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=dc037f5a-d93b-4[…]90b-a611dd9460af&utm_source=github&utm_medium=pricing_page) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Free trial available**\n",
"\n",
"When you [create your account](https://dashboard.nixtla.io), you’ll receive a 30-day free trial, no credit card required. After 30 days, access will expire unless you upgrade to a paid plan. Contact us to continue leveraging TimeGPT for accurate and easy to use forecasting!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**More information on pricing and billing**\n",
"\n",
"For additional information on pricing and billing please see our FAQ."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Requirements\n",
"\n",
"> This section explains the data requirements for `TimeGPT`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"from nixtla.utils import colab_badge"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/5_data_requirements.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"colab_badge('docs/getting-started/5_data_requirements')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`TimeGPT` accepts `pandas` and `polars` dataframes in [long format](https://www.theanalysisfactor.com/wide-and-long-data/#comments) with the following necessary columns: \n",
"\n",
"- `ds` (timestamp): timestamp in format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. \n",
"- `y` (numeric): The target variable to forecast. \n",
"\n",
"(Optionally, you can also pass a DataFrame without the `ds` column as long as it has DatetimeIndex)\n",
"\n",
"`TimeGPT` also works with distributed dataframes like `dask`, `spark` and `ray`. \n",
"\n",
"You can also include exogenous features in the DataFrame as additional columns. For more information, follow this [tutorial](https://docs.nixtla.io/docs/tutorials-exogenous_variables).\n",
"\n",
"Below is an example of a valid input dataframe for `TimeGPT`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timestamp</th>\n",
" <th>value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1949-01-01</td>\n",
" <td>112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1949-02-01</td>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1949-03-01</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1949-04-01</td>\n",
" <td>129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1949-05-01</td>\n",
" <td>121</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timestamp value\n",
"0 1949-01-01 112\n",
"1 1949-02-01 118\n",
"2 1949-03-01 132\n",
"3 1949-04-01 129\n",
"4 1949-05-01 121"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd \n",
"\n",
"df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that in this example, the `ds` column is named `timestamp` and the `y` column is named `value`. You can either:\n",
"\n",
"1. Rename the columns to `ds` and `y`, respectively, or\n",
"\n",
"2. Keep the current column names and specify them when using any method from the `NixtlaClient` class with the `time_col` and `target_col` arguments. \n",
"\n",
"For example, when using the `forecast` method from the `NixtlaClient` class, you must instantiate the class and then specify the columns names as follows. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nixtla import NixtlaClient\n",
"\n",
"nixtla_client = NixtlaClient(\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Querying model metadata...\n",
"INFO:nixtla.nixtla_client:Restricting input...\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timestamp</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1961-01-01</td>\n",
" <td>437.83792</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1961-02-01</td>\n",
" <td>426.06270</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1961-03-01</td>\n",
" <td>463.11655</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1961-04-01</td>\n",
" <td>478.24450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1961-05-01</td>\n",
" <td>505.64648</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timestamp TimeGPT\n",
"0 1961-01-01 437.83792\n",
"1 1961-02-01 426.06270\n",
"2 1961-03-01 463.11655\n",
"3 1961-04-01 478.24450\n",
"4 1961-05-01 505.64648"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst = nixtla_client.forecast(df=df, h=12, time_col='timestamp', target_col='value')\n",
"fcst.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, the `NixtlaClient` is infereing the frequency, but you can explicitly specify it with the `freq` argument.\n",
"\n",
"\n",
"To learn more about how to instantiate the `NixtlaClient` class, refer to the [TimeGPT Quickstart](https://docs.nixtla.io/docs/getting-started-timegpt_quickstart)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multiple Series "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you're working with multiple time series, make sure that each series has a unique identifier. You can name this column `unique_id` or specify its name using the `id_col` argument when calling any method from the `NixtlaClient` class. This column should be a string, integer, or category.\n",
"\n",
"In this example, we have five series representing hourly electricity prices in five different markets. The columns already have the default names, so it's unnecessary to specify the `id_col`, `time_col`, or `target_col` arguments. If your columns have different names, specify these arguments as required."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 00:00:00</td>\n",
" <td>70.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 01:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 02:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 03:00:00</td>\n",
" <td>44.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 04:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds y\n",
"0 BE 2016-10-22 00:00:00 70.00\n",
"1 BE 2016-10-22 01:00:00 37.10\n",
"2 BE 2016-10-22 02:00:00 37.10\n",
"3 BE 2016-10-22 03:00:00 44.75\n",
"4 BE 2016-10-22 04:00:00 37.10"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Inferred freq: h\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Querying model metadata...\n",
"INFO:nixtla.nixtla_client:Restricting input...\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 00:00:00</td>\n",
" <td>45.190582</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 01:00:00</td>\n",
" <td>43.244987</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 02:00:00</td>\n",
" <td>41.958897</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 03:00:00</td>\n",
" <td>39.796680</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 04:00:00</td>\n",
" <td>39.204865</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds TimeGPT\n",
"0 BE 2016-12-31 00:00:00 45.190582\n",
"1 BE 2016-12-31 01:00:00 43.244987\n",
"2 BE 2016-12-31 02:00:00 41.958897\n",
"3 BE 2016-12-31 03:00:00 39.796680\n",
"4 BE 2016-12-31 04:00:00 39.204865"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst = nixtla_client.forecast(df=df, h=24) # use id_col, time_col and target_col here if needed. \n",
"fcst.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When working with a large number of time series, consider using a [distributed computing framework](https://docs.nixtla.io/docs/tutorials-computing_at_scale) to handle the data efficiently. `TimeGPT` supports frameworks such as [Spark](https://docs.nixtla.io/docs/tutorials-spark), [Dask](https://docs.nixtla.io/docs/tutorials-dask), and [Ray](https://docs.nixtla.io/docs/tutorials-ray)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exogenous Variables "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`TimeGPT` also accepts exogenous variables. You can add exogenous variables to your dataframe by including additional columns after the `y` column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>y</th>\n",
" <th>Exogenous1</th>\n",
" <th>Exogenous2</th>\n",
" <th>day_0</th>\n",
" <th>day_1</th>\n",
" <th>day_2</th>\n",
" <th>day_3</th>\n",
" <th>day_4</th>\n",
" <th>day_5</th>\n",
" <th>day_6</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 00:00:00</td>\n",
" <td>70.00</td>\n",
" <td>57253.0</td>\n",
" <td>49593.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 01:00:00</td>\n",
" <td>37.10</td>\n",
" <td>51887.0</td>\n",
" <td>46073.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 02:00:00</td>\n",
" <td>37.10</td>\n",
" <td>51896.0</td>\n",
" <td>44927.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 03:00:00</td>\n",
" <td>44.75</td>\n",
" <td>48428.0</td>\n",
" <td>44483.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 04:00:00</td>\n",
" <td>37.10</td>\n",
" <td>46721.0</td>\n",
" <td>44338.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds y Exogenous1 Exogenous2 day_0 day_1 \\\n",
"0 BE 2016-10-22 00:00:00 70.00 57253.0 49593.0 0.0 0.0 \n",
"1 BE 2016-10-22 01:00:00 37.10 51887.0 46073.0 0.0 0.0 \n",
"2 BE 2016-10-22 02:00:00 37.10 51896.0 44927.0 0.0 0.0 \n",
"3 BE 2016-10-22 03:00:00 44.75 48428.0 44483.0 0.0 0.0 \n",
"4 BE 2016-10-22 04:00:00 37.10 46721.0 44338.0 0.0 0.0 \n",
"\n",
" day_2 day_3 day_4 day_5 day_6 \n",
"0 0.0 0.0 0.0 1.0 0.0 \n",
"1 0.0 0.0 0.0 1.0 0.0 \n",
"2 0.0 0.0 0.0 1.0 0.0 \n",
"3 0.0 0.0 0.0 1.0 0.0 \n",
"4 0.0 0.0 0.0 1.0 0.0 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using exogenous variables, you also need to provide its future values. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>Exogenous1</th>\n",
" <th>Exogenous2</th>\n",
" <th>day_0</th>\n",
" <th>day_1</th>\n",
" <th>day_2</th>\n",
" <th>day_3</th>\n",
" <th>day_4</th>\n",
" <th>day_5</th>\n",
" <th>day_6</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 00:00:00</td>\n",
" <td>70318.0</td>\n",
" <td>64108.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 01:00:00</td>\n",
" <td>67898.0</td>\n",
" <td>62492.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 02:00:00</td>\n",
" <td>68379.0</td>\n",
" <td>61571.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 03:00:00</td>\n",
" <td>64972.0</td>\n",
" <td>60381.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 04:00:00</td>\n",
" <td>62900.0</td>\n",
" <td>60298.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds Exogenous1 Exogenous2 day_0 day_1 day_2 \\\n",
"0 BE 2016-12-31 00:00:00 70318.0 64108.0 0.0 0.0 0.0 \n",
"1 BE 2016-12-31 01:00:00 67898.0 62492.0 0.0 0.0 0.0 \n",
"2 BE 2016-12-31 02:00:00 68379.0 61571.0 0.0 0.0 0.0 \n",
"3 BE 2016-12-31 03:00:00 64972.0 60381.0 0.0 0.0 0.0 \n",
"4 BE 2016-12-31 04:00:00 62900.0 60298.0 0.0 0.0 0.0 \n",
"\n",
" day_3 day_4 day_5 day_6 \n",
"0 0.0 0.0 1.0 0.0 \n",
"1 0.0 0.0 1.0 0.0 \n",
"2 0.0 0.0 1.0 0.0 \n",
"3 0.0 0.0 1.0 0.0 \n",
"4 0.0 0.0 1.0 0.0 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')\n",
"future_ex_vars_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Inferred freq: h\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Using future exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 00:00:00</td>\n",
" <td>51.632830</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 01:00:00</td>\n",
" <td>45.750877</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 02:00:00</td>\n",
" <td>39.650543</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 03:00:00</td>\n",
" <td>34.000072</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 04:00:00</td>\n",
" <td>33.785370</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds TimeGPT\n",
"0 BE 2016-12-31 00:00:00 51.632830\n",
"1 BE 2016-12-31 01:00:00 45.750877\n",
"2 BE 2016-12-31 02:00:00 39.650543\n",
"3 BE 2016-12-31 03:00:00 34.000072\n",
"4 BE 2016-12-31 04:00:00 33.785370"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst = nixtla_client.forecast(df=df, X_df=future_ex_vars_df, h=24)\n",
"fcst.head()"
]
},
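{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick local check before calling `forecast` with `X_df` can catch shape problems early. The sketch below assumes that the future frame should carry, for each series, one row per forecast step and the same exogenous columns as `df`. The helper name `check_future_exog` is hypothetical and not part of the SDK.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def check_future_exog(df, X_df, h, id_col='unique_id', time_col='ds', target_col='y'):\n",
"    # Hypothetical helper: every non-id, non-time, non-target column is treated as exogenous.\n",
"    exog_cols = [c for c in df.columns if c not in (id_col, time_col, target_col)]\n",
"    missing = [c for c in exog_cols if c not in X_df.columns]\n",
"    if missing:\n",
"        raise ValueError(f'X_df is missing exogenous columns: {missing}')\n",
"    # Assumption: each series needs exactly h future rows.\n",
"    rows_per_series = X_df.groupby(id_col).size()\n",
"    bad = rows_per_series[rows_per_series != h]\n",
"    if not bad.empty:\n",
"        raise ValueError(f'expected {h} future rows per series, got: {bad.to_dict()}')\n",
"    return exog_cols\n",
"```\n",
"\n",
"For the example above, the check would be called as `check_future_exog(df, future_ex_vars_df, h=24)` just before the `forecast` call."
]
},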
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To learn more about how to use exogenous variables with `TimeGPT`, consult the [Exogenous Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables) tutorial. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Important Considerations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using `TimeGPT`, the data cannot contain missing values. This means that for every series, there should be no gaps in the timestamps and no missing values in the target variable. \n",
"\n",
"For more, please refer to the tutorial on [Dealing with Missing Values in TimeGPT](https://docs.nixtla.io/docs/tutorials-dealing_with_missing_values_in_timegpt). \n",
"\n",
"### Minimum Data Requirements (for AzureAI)\n",
"\n",
"`TimeGPT` currently supports any amount of data for generating point forecasts. That is, the minimum size per series to expect results from this call `nixtla_client.forecast(df=df, h=h, freq=freq)` is one, regardless of the frequency.\n",
"\n",
"For Azure AI, when using the arguments `level`, `finetune_steps`, `X_df` (exogenous variables), or `add_history`, the API requires a minimum number of data points depending on the frequency. Here are the minimum sizes for each frequency:\n",
"\n",
"<div align=\"center\">\n",
"\n",
"| Frequency | Minimum Size |\n",
"|--------------------------|--------------|\n",
"| Hourly and subhourly (e.g., \"H\", \"min\", \"15T\") | 1008 |\n",
"| Daily (\"D\") | 300 |\n",
"| Weekly (e.g., \"W-MON\",..., \"W-SUN\") | 64 |\n",
"| Monthly and other frequencies (e.g., \"M\", \"MS\", \"Y\") | 48 |\n",
"\n",
"</div>\n",
"\n",
"For cross-validation, you need to consider these numbers as well as the forecast horizon (`h`), the number of windows (`n_windows`), and the gap between windows (`step_size`). Thus, the minimum number of observations per series in this case would be determined by the following relationship:\n",
"\n",
"<div align=\"center\">\n",
"\n",
"Minimum number described previously + h + step_size + (n_windows - 1)\n",
"\n",
"</div>\n"
]
}
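,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table and the cross-validation relationship above can be sketched as a small lookup. The group labels in `MIN_SIZE` and the function name are hypothetical; the arithmetic transcribes the relationship exactly as written above and does not reflect any check performed by the SDK.\n",
"\n",
"```python\n",
"# Minimum sizes per frequency group, copied from the table above.\n",
"MIN_SIZE = {\n",
"    'hourly_or_subhourly': 1008,\n",
"    'daily': 300,\n",
"    'weekly': 64,\n",
"    'monthly_or_other': 48,\n",
"}\n",
"\n",
"\n",
"def min_obs_for_cross_validation(freq_group, h, n_windows, step_size):\n",
"    # Minimum number described previously + h + step_size + (n_windows - 1)\n",
"    return MIN_SIZE[freq_group] + h + step_size + (n_windows - 1)\n",
"```\n",
"\n",
"For daily data with `h=7`, `n_windows=3`, and `step_size=7`, this gives 300 + 7 + 7 + 2 = 316 observations per series."
]
}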
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# FAQ\n",
"\n",
"Commonly asked questions about TimeGPT"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of contents\n",
"- [TimeGPT](#timegpt)\n",
"- [TimeGPT API Key](#timegpt-api-key)\n",
"- [Features and Capabilities](#features-and-capabilities)\n",
"- [Fine-tuning](#finetuning)\n",
"- [Pricing and Billing](#pricing-and-billing)\n",
"- [Privacy and Security](#privacy-and-security)\n",
"- [Troubleshooting](#troubleshooting)\n",
"- [Additional Support](#additional-support)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TimeGPT"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details> \n",
" <summary>What is TimeGPT?</summary>\n",
"\n",
"`TimeGPT` is the first foundation model for time series forecasting. It can produce accurate forecasts for new time series across a diverse array of domains using only historical values as inputs. The model \"reads\" time series data sequentially from left to right, similarly to how humans read a sentence. It looks at windows of past data, which we can think of as \"tokens\", and then predicts what comes next. This prediction is based on patterns the model identifies and that it extrapolates into the future. Beyond forecasting, `TimeGPT` supports other time series related tasks, such as what-if-scenarios, anomaly detection, and more. \n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Is TimeGPT based on a Large Language Model (LLM)?</summary> \n",
"\n",
"No, `TimeGPT` is not based on any large language model. While it follows the same principle of training a large transformer model on a vast dataset, its architecture is specifically designed to handle time series data and it has been trained to minimize forecasting errors. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How do I get started with TimeGPT?</summary>\n",
"\n",
"To get started with `TimeGPT`, you need to register for an account [here](https://dashboard.nixtla.io/). You will receive an email asking you to confirm your signup. After confirming, you will be able to access your dashboard, which contains the details of your account.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How accessible is TimeGPT and what are the usage costs?</summary>\n",
"\n",
"For a more in-depth understanding of `TimeGPT`, please refer to the [research paper](https://arxiv.org/pdf/2310.03589.pdf). While certain aspects of the model's architecture remain confidential, registration for `TimeGPT` is open to all. New users receive $1,000 USD in free credits and subsequent usage fees are based on token consumption. For more details, please refer to the [Pricing and Billing](#pricing-and-billing) section\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How can I use TimeGPT?</summary>\n",
"\n",
"- Through the [Python SDK](https://github.com/Nixtla/nixtla)\n",
"\n",
"- Via the `TimeGPT` API. For instructions on how to call the API using different languages, please refer to the [API documentation](https://docs.nixtla.io/reference/timegpt_timegpt_post)\n",
"\n",
"Both methods require you to have a [API key](#timegpt-api-key), which is obtained upon registration and can be found in your dashboard under `API Keys`.\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TimeGPT API Key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>What is an API key?</summary>\n",
"\n",
"An API key is a unique string of characters that serves as a key to authenticate your requests when using the Nixtla SDK. It ensures that the person making the requests is authorized to do so.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Where can I get an API key?</summary>\n",
"\n",
"Upon registration, you will receive an API key that can be found in your [dashboard](https://dashboard.nixtla.io/) under `API Keys`. Remember that your API key is personal and should not be shared with anyone.\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How do I use my API key?</summary>\n",
"\n",
"To integrate your API key into your development workflow, please refer to the tutorial on [Setting Up Your API Key](https://docs.nixtla.io/docs/getting-started-setting_up_your_api_key). \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How can I check the status of my API key?</summary>\n",
"\n",
"If you want to check the status of your API key, you can use the [`validate_api_key` method](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-validate-api-key) of the `NixtlaClient` class. \n",
"\n",
"`nixtla_client = NixtlaClient(\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")`\n",
"\n",
"`nixtla_client.validate_api_key()`\n",
"\n",
"If your key is validating correctly, this will return\n",
"\n",
"```\n",
"INFO:nixtla.nixtla_client:Happy Forecasting! :), If you have questions or need support, please email support@nixtla.io\n",
"\n",
"True\n",
"```\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>What if my API key isn't validating?</summary>\n",
"\n",
"When you validate your API key and it returns `False`:\n",
"\n",
"* If you are targeting an Azure endpoint, getting `False` from the `NixtlaClient.validate_api_key` method is expected. You can skip this step when taregting an Azure endpoint and proceed diretly to forecasting instead.\n",
"* If you are not taregting an Azure endpoint, then you should check the following:\n",
" * Make sure you are using the latest version of the SDK (Python or R).\n",
" * Check that your API key is active in your dashboard by visiting https://dashboard.nixtla.io/\n",
" * Consider any firewalls your organization might have. There may be restricted access. If so, you can whitelist our endpoint https://api.nixtla.io/. \n",
" - To use Nixtla's API, you need to let your system know that our endpoint is ok, so it will let you access it. Whitelisting the endpoint isn't something that Nixtla can do on our side. It's something that needs to be done on the user's system. This is a bit of an [overview on whitelisting](https://www.csoonline.com/article/569493/whitelisting-explained-how-it-works-and-where-it-fits-in-a-security-program.html).\n",
" - If you work in an organization, please work with an IT team. They're likely the ones setting the security and you can talk with them to get it addressed. If you run your own systems, then it's something you should be able to update, depending on the system you're using.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Features and Capabilities"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>What is the input to TimeGPT?</summary>\n",
"\n",
"`TimeGPT` accepts `pandas` dataframes in [long format](https://www.theanalysisfactor.com/wide-and-long-data/#comments) with the following necessary columns: \n",
"\n",
"- `ds` (timestamp): timestamp in format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. \n",
"- `y` (numeric): The target variable to forecast. \n",
"\n",
"(Optionally, you can also pass a DataFrame without the `ds` column as long as it has DatetimeIndex)\n",
"\n",
"`TimeGPT` also works with [distributed dataframes](https://docs.nixtla.io/docs/tutorials-computing_at_scale) like `dask`, `spark` and `ray`. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can TimeGPT handle multiple time series?</summary>\n",
"\n",
"Yes. For guidance on forecasting multiple time series at once, consult the [Multiple Series](https://docs.nixtla.io/docs/tutorials-multiple_series_forecasting) tutorial. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Does TimeGPT support forecasting with exogenous variables?</summary>\n",
"\n",
"Yes. For instructions on how to incorporate exogenous variables to `TimeGPT`, see the [Exogenous Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables) tutorial. For incorporating calendar dates specifically, you may find the [Holidays and Special Dates](https://docs.nixtla.io/docs/tutorials-holidays_and_special_dates) tutorial useful. For categorical variables, refer to the [Categorical Variables](https://docs.nixtla.io/docs/tutorials-categorical_variables) tutorial.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can TimeGPT be used for anomaly detection?</summary>\n",
"\n",
"Yes. To learn how to use `TimeGPT` for anomaly detection, refer to the [Anomaly Detection](https://docs.nixtla.io/docs/capabilities-anomaly-detection-anomaly_detection) tutorial.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<details>\n",
" <summary>Does TimeGPT support cross-validation?</summary>\n",
"\n",
"Yes. To learn how to use `TimeGPT` for cross-validation, refer to the [Cross-Validation](https://docs.nixtla.io/docs/tutorials-cross_validation) tutorial.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can TimeGPT be used to forecast historical data?</summary>\n",
"\n",
"Yes. To find out how to forecast historical data using `TimeGPT`, see the [Historical Forecast](https://docs.nixtla.io/docs/tutorials-historical_forecast) tutorial.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can TimeGPT be used for uncertainty quantification?</summary>\n",
"\n",
"Yes. For more information, explore the [Prediction Intervals](https://docs.nixtla.io/docs/tutorials-prediction_intervals) and [Quantile Forecasts](https://docs.nixtla.io/docs/tutorials-quantile_forecasts) tutorials. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can TimeGPT handle large datasets?</summary>\n",
"\n",
"Yes. When dealing with large datasets that contain hundreds of thousands or millions of time series, we recommend using a distributed backend. `TimeGPT` is compatible with several [distributed computing frameworks](https://docs.nixtla.io/docs/tutorials-computing_at_scale), including [Spark](https://docs.nixtla.io/docs/tutorials-spark), [Ray](https://docs.nixtla.io/docs/tutorials-ray), and [Dask](https://docs.nixtla.io/docs/tutorials-dask). Both the `TimeGPT` SDK and API don’t have a limit on the size of the dataset as long as a distributed backend is used.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can TimeGPT be used with limited/short data?</summary>\n",
"\n",
"`TimeGPT` supports any amount of data for generating point forecasts and is capable of producing results with just one observation per series. When using arguments such as `level`, `finetune_steps`, `X_df` (exogenous variables), or `add_history`, additional data points are necessary depending on the data frequency. For more details, please refer to the [Data Requirements](https://docs.nixtla.io/docs/getting-started-data_requirements) tutorial.\n",
"\n",
"</details>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>What is the maximum forecast horizon allowed by TimeGPT?</summary>\n",
"\n",
"While `TimeGPT` does not have a maximum forecast horizon, its performance will decrease as the horizon increases. When the forecast horizon exceeds the season length of the data (for example, more than 12 months for monthly data), you will get this message: `WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon`.\n",
"\n",
"For details, refer to the tutorial on [Long Horizon in Time Series](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting).\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can TimeGPT handle missing values?</summary>\n",
"\n",
"`TimeGPT` cannot handle missing values or series with irregular timestamps. For more information, see the [Forecasting Time Series with Irregular Timestamps](https://docs.nixtla.io/docs/capabilities-forecast-irregular_timestamps) and the [Dealing with Missing Values](https://docs.nixtla.io/docs/tutorials-dealing_with_missing_values_in_timegpt) tutorial. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How can I plot the TimeGPT forecast?</summary>\n",
"\n",
"The `NixtlaClient` class has a [`plot` method](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-validate-token) that can be used to visualize the forecast. This method only works in interactive environments such as Jupyter notebooks and it doesn't work on Python scripts. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Does TimeGPT support polars?</summary>\n",
"\n",
"As of now, `TimeGPT` does not offer support for polars. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Does TimeGPT produce stable predictions?</summary>\n",
"\n",
"`TimeGPT` is engineered for stability, ensuring consistent results for identical input data. This means that given the same dataset, the model will produce the same forecasts.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can TimeGPT forecast data with simple pattern such as a straight line or sine wave?</summary>\n",
"\n",
"While this is not the primary use case for `TimeGPT`, it is capable of generating solid results on simple data such as a straight line. While zero-shot predictions might not always meet expectations, a little help with fine-tuning allows TimeGPT to quickly grasp the trend and produce accurate forecasts. For more details, please refer to the [Improve Forecast Accuracy with TimeGPT](https://docs.nixtla.io/docs/tutorials-improve_forecast_accuracy_with_timegpt) tutorial.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fine-tuning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>What is fine-tuning?</summary>\n",
"\n",
"`TimeGPT` was trained on the largest publicly available time series dataset, covering a wide range of domains such as finance, retail, healthcare, and more. This comprehensive training enables `TimeGPT` to produce accurate forecasts for new time series without additional training, a capability known as zero-shot learning. \n",
"\n",
"While the zero-shot model provides a solid baseline, the performance of `TimeGPT` can often be improved through fine-tuning. During this process, the `TimeGPT` model undergoes additional training using your specific dataset, starting from the pre-trained paramaters. The updated model then produces the forecasts. You can control the number of training iterations and the loss function for fine-tuning with the `finetune_steps` and the `finetune_loss` parameters in the `forecast` method from the `NixtlaClient` class, respectively. \n",
"\n",
"For a comprehensive guide on how to apply fine-tuning, please refer to the [fine-tuning](https://docs.nixtla.io/docs/tutorials-fine_tuning) and the [fine-tuning with a specific loss function](https://docs.nixtla.io/docs/tutorials-fine_tuning_with_a_specific_loss_function) tutorials. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Do I have to fine-tune every series?</summary>\n",
"\n",
"No, you do not need to fine-tune every series individually. When using the `finetune_steps` parameter, the model undergoes fine-tuning across all series in your dataset simultaneously. This method uses a cross-learning approach, allowing the model to learn from multiple series at once, which can improve individual forecasts.\n",
"\n",
"Keep in mind that selecting the right number of fine-tuning steps may require some trial and error. As the number of fine-tuning steps increases, the model becomes more specialized to your dataset, but will take longer to train and may be more prone to overfitting. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Can I save fine-tuned parameters?</summary>\n",
"\n",
"Yes! You can fine-tune the TimeGPT model, save it, and reuse it later. For detailed instructions, see our guide on [Re-using Fine-tuned Models](https://docs.nixtla.io/docs/tutorials-re_using_fine_tuned_models).\n",
"\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pricing and Billing "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How does pricing work?</summary>\n",
"\n",
"See our [Pricing page](https://docs.nixtla.io/docs/getting-started-timegpt_subscription_plans) for information about pricing.\n",
"\n",
"[Start for Free](https://dashboard.nixtla.io/)\n",
"*No credit card needed.\n",
"\n",
"For customized plan details and offerings, book a demo or contact us at [support@nixtla.io](mailto:support@nixtla.io).\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>Are there free options or discounts?</summary>\n",
"\n",
"Yes! We provide some discounted options for academic research. If you would like to learn more, please email us at [support@nixtla.io](mailto:support@nixtla.io).\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>What counts as an API call?</summary> \n",
"\n",
"An API call is a request made to TimeGPT to perform an action like forecasting or detecting anomalies. API Usage is as follows:\n",
"\n",
"### Forecasting:\n",
"\n",
"1. When not requesting historical forecasts (`add_history=False`)\n",
" - If you do not set `num_partitions`, all calls to perform [forecasting](https://docs.nixtla.io/docs/getting-started-timegpt_quickstart), [finetuning](https://docs.nixtla.io/docs/tutorials-fine_tuning), or [cross-validation](https://docs.nixtla.io/docs/tutorials-cross_validation) increase the usage by 1. Note that addition of [exogenous variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables), requesting [uncertainity quantification](https://docs.nixtla.io/docs/tutorials-uncertainty_quantification) or forecasting [multiple series](https://docs.nixtla.io/docs/tutorials-multiple_series_forecasting) does not increase the usage further.\n",
" - If the API call requires to send more than 200MB of data, the API will return an error and will require you to use the `num_partitions` parameter in order to partition your request. Every partition will count as an API call, hence the usage will increase by the value you set for `num_partitions` (e.g. for num_partitions=2, the usage will increase by 2).\n",
" If you set `num_partitions`, all calls to perform forecasting, finetuning, or cross-validation increase the usage by num_partitions.\n",
"2. When requesting [in-sample predictions](https://docs.nixtla.io/docs/tutorials-historical_forecast) (`add_history=True`), the usage from #1 above is multipled by 2.\n",
"\n",
"**Examples**\n",
"\n",
"1. A user uses TimeGPT to forecast daily data, using the `timegpt-1` model. How many API calls are made? (*Ans*: 1)\n",
"2. A user calls the `cross_validation` method on a dataset. How many API calls are made (*Ans*: 1)\n",
"3. A user decides to forecast on a longer horizon, so they use the `timegpt-1-long-horizon` model. How many API calls are made (*Ans*: 1)\n",
"4. A user needs to get the in-sample predicitons when forecasting using `add_history=True`. How many API calls are made (*Ans*: 2)\n",
"5. A user has a very large dataset, with a daily frequency, and they must set `num_partitions=4` when forecasting. How many API calls are made (*Ans*: 4)\n",
"6. A user has to set `num_partitions=4` and is also interesed in getting the in-sample predicitons (`add_history=True`) when forecasting. How many API calls are made (*Ans*: 8)\n",
"\n",
"\n",
"### Anomaly Detection:\n",
"\n",
"1. If you do not set `num_partitions`, all calls to perform [anomaly detection](https://docs.nixtla.io/docs/capabilities-anomaly-detection-quickstart) increase the usage by 1. Note that addition of [exogenous variables](https://docs.nixtla.io/docs/capabilities-anomaly-detection-add_exogenous_variables) does not increase the usage further.\n",
"2. If the API call requires to send more than 200MB of data, the API will return an error and will require you to use the `num_partitions` parameter in order to partition your request. Every partition will count as an API call, hence the usage will increase by the value you set for `num_partitions` (e.g. for num_partitions=2, the usage will increase by 2).\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How does billing work?</summary>\n",
"\n",
"Billing is done through Stripe. We've partnered with Stripe to handle all payment processing. You can view your invoices and payment history in your [dashboard](https://dashboard.nixtla.io/) under `Billing`. \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Privacy and Security "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary>How do you ensure the privacy and security of my data?</summary>\n",
"\n",
"At Nixtla, we take your privacy and security very seriously. To ensure you are fully informed about our policies regarding your data, please refer to the following documents: \n",
"\n",
"- [Privacy Notice](https://docs.nixtla.io/docs/privacy-notice)\n",
"\n",
"- For the Python SDK, please review the [license agreement](https://github.com/Nixtla/nixtla/blob/main/LICENSE). \n",
"\n",
"- For `TimeGPT`, please refer to our [terms and conditions](https://docs.nixtla.io/docs/terms-and-conditions). \n",
"\n",
"In addtion, we are currently developing a self-hosted version of `TimeGPT`, tailored for the unique security requirements of enterprise data. This version is currently in beta. If you are interested in exploring this option, please contact us at `support@nixtla.io`.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Troubleshooting "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following section contains some common errors and warnings "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary><strong>Error message: Invalid API key</strong></summary>\n",
"\n",
"``` python\n",
"ApiError: status_code: 401, body: {'data': None, 'message': 'Invalid API key', 'details': 'Key not found', 'code': 'A12', 'requestID': 'E7F2BBTB2P', 'support': 'If you have questions or need support, please email support@nixtla.io'}\n",
"```\n",
"\n",
"**Solution:** This error occurs when your `TimeGPT` API key is either invalid or has not been set up correctly. Please use the `validate_api_key` method to verify it or make sure it was copied correctly from the `API Keys` section of your [dashboard](https://dashboard.nixtla.io/).\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><strong>Error message: Too many requests</strong></summary>\n",
"\n",
"``` python\n",
"ApiError: status_code: 429, body: {'data': None, 'message': 'Too many requests', 'details': 'You need to add a payment method to continue using the API, do so from https://dashboard.nixtla.io', 'code': 'A21', 'requestID': 'NCJDK7KSJ6', 'support': 'If you have questions or need support, please email support@nixtla.io'}\n",
"```\n",
"\n",
"**Solution:** This error occurs when you have exhausted your free credits and need to add a payment method to continue using `TimeGPT`. You can add a payment method in the `Billing` section of your [dashboard](https://dashboard.nixtla.io/).\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><strong>Error message: WriteTimeout</strong></summary>\n",
"\n",
"**Solution:** If you encounter a `WriteTimeout` error, it means that the request has exceeded the allowable processing time. This is a common issue when working with large datasets. To fix this, consider increasing the `num_partitions` parameter in the [`forecast` method](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) of the `NixtlaClient` class, or use a distributed backend if not already in use.\n",
"\n",
"</details>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional Support"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have any more questions or need support, please reach out by:\n",
"\n",
"- Opening an [issue](https://github.com/Nixtla/nixtla/issues) on GitHub for technical questions or bugs.\n",
"- Sending an email to `support@nixtla.io` for general inquiries or support.\n",
"- Joining our [Slack](https://join.slack.com/t/nixtlacommunity/shared_invite/zt-2ebtgjbip-QMSnvm6ED1NF5vi4xj_13Q) community to connect with our team and the forecasting community."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Glossary\n",
"> These are some key concepts related to time series forecasting, designed to help you better understand and leverage the capabilities of TimeGPT.\n",
"\n",
"- [Time Series](#time-series) \n",
"- [Forecasting](#forecasting) \n",
"- [Foundation Model](#foundation-model) \n",
"- [TimeGPT](#timegpt) \n",
"- [Tokens](#tokens) \n",
"- [Fine-tuning](#fine-tuning)\n",
"- [Historical Forecasts](#historical-forecasts)\n",
"- [Anomaly Detection](#anomaly-detection)\n",
"- [Time Series Cross-Validation](#time-series-cross-validation)\n",
"- [Exogenous Variables](#exogenous-variables)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Time Series \n",
"\n",
"A time series is a sequence of data points indexed by time, used to model phenomena that changes over time, such as stock prices, temperature, or product sales. A time series can generally be thought of as comprising the following components:\n",
"\n",
"- **Trend**: The consistent, long-term direction of the data, whether upward or downward. It reflects the persistent, overall movement in the series over time.\n",
"\n",
"- **Seasonality**: A repeated cycle around a known and fixed period. \n",
"\n",
"- **Remainder**: The residuals or random noise left in the data after the trend and seasonal effects have been accounted for.\n",
"\n",
"## Forecasting\n",
"\n",
"Forecasting is the process of predicting the future values of a time series based on historical data. It plays a crucial role in the decision-making process across various fields such as finance, healthcare, retail, and economics, among others.\n",
"\n",
"Forecasting can use a variety of approaches, from statistical approaches to novel techniques such as machine learning, deep learning, and foundation models. These models can be further classified into univariate and multivariate models, depending on the number of variables used to make the predictions, or local or global models, with local models estimating parameters independently for each series and global models estimating parameters jointly across multiple series.\n",
"\n",
"Forecasts themselves can be presented as point forecasts, which predict a single future value, or as probabilistic forecasts, which provide a full probability distribution of future values, and hence, providing a measure of uncertainty.\n",
"\n",
"## Foundation Model \n",
"\n",
"Foundation model refers to a type of large, pre-trained model that can be adapted to a wide range of tasks, including time series forecasting. Originally developed for domains such as natural language processing and computer vision, foundation models are now increasingly applied to sequential data like time series. These models are typically trained on extensive datasets, capturing complex patterns and dependencies that can be fine-tuned for specific tasks.\n",
"\n",
"## TimeGPT \n",
"\n",
"Developed by Nixtla, `TimeGPT` is the first foundation model for time series forecasting. `TimeGPT` was trained on billions of observations from publicly available datasets across multiple domains and can produce accurate forecasts for new time series without additional training, using only historical values as inputs. The model 'reads' time series data similarly to how humans read a sentence—sequentially from left to right. It looks at windows of past data, which we can think of as 'tokens', and predicts what comes next. This prediction is based on patterns the model identifies in past data and extrapolates into the future.\n",
"\n",
"## Tokens \n",
"\n",
"`TimeGPT` processes time series data in chunks. Each data point in a series can be thought of as a 'token', akin to how individual words or characters are treated in natural language processing (NLP).\n",
"\n",
"## Fine-tuning \n",
"\n",
"Fine-tuning is a process used in machine learning where a pre-trained model like `TimeGPT` undergoes additional training to adapt it for a specific dataset. Initially, `TimeGPT` can operate in a zero-shot manner, meaning it can generate forecasts as-is. While this zero-shot approach provides a solid baseline, the performance of `TimeGPT` can often be improved through fine-tuning. During this process, the `TimeGPT` model undergoes additional training using the specific dataset, starting from the pre-trained parameters. The updated model then produces the forecasts.\n",
"\n",
"[Learn how to fine-tune TimeGPT](https://docs.nixtla.io/docs/tutorials-fine_tuning)\n",
"\n",
"## Historical Forecasts\n",
"\n",
"Historical forecasts, also known as in-sample forecasts, are the predictions made for the historical data. These forecasts are commonly used to evaluate the performance of forecasting models by comparing the predicted values against the actual values. \n",
"\n",
"[Learn how to make historical forecasts with TimeGPT](https://docs.nixtla.io/docs/tutorials-historical_forecast)\n",
"\n",
"## Anomaly Detection\n",
"\n",
"Anomaly detection refers to the process of identifying unusual observations that deviate significantly from the expected behavior of the data. Anomalies, also known as outliers, can be caused by a variety of factors, such as errors in the data collection process, sudden changes in the underlying patterns of the data, or unexpected events. These anomalies can pose challenges for many forecasting models, as they may distort trends, seasonal patterns, or estimates of autocorrelation. Consequently, anomalies can significantly impact the accuracy of forecasts. Therefore, it is crucial to be able to identify them accurately.\n",
"\n",
"Anomaly detection has many applications across different industries, including detecting fraud in financial transactions, monitoring the performance of online services, or identifying unusual patterns in energy usage.\n",
"\n",
"[Learn how to detect anomalies with TimeGPT](https://docs.nixtla.io/docs/capabilities-anomaly-detection-anomaly_detection)\n",
"\n",
"## Time Series Cross Validation\n",
"\n",
"Time series cross-validation is a method for evaluating how a model would have performed on historical data. It works by defining a sliding window across past observations and predicting the period following it. It differs from standard cross-validation by maintaining the chronological order of the data instead of randomly splitting it.\n",
"\n",
"This method allows for a more accurate estimation of a forecasting model's predictive capabilities by considering multiple sequential periods. When only one window is used, this method resembles a standard train-test split, with the last set of observations serving as the test data and all preceding data as the training set.\n",
"\n",
"[Learn how to perform cross-validation with TimeGPT](https://docs.nixtla.io/docs/tutorials-cross_validation)\n",
"\n",
"## Exogenous Variables\n",
"\n",
"Exogenous variables are external factors that can influence the behavior of a time series but are not directly affected by it. For example, in retail sales forecasting, exogenous variables could include factors such as holidays, promotions, prices, or weather data for electricity load forecasts. By incorporating these variables into the forecasting model, it is possible to capture the relationships between the target series and external factors, leading to more accurate predictions.\n",
"\n",
"[Learn how to include exogenous variables in TimeGPT](https://docs.nixtla.io/docs/tutorials-exogenous_variables)"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Why TimeGPT?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we compare the performance of TimeGPT against three forecasting models: the classical model (ARIMA), the machine learning model (LightGBM), and the deep learning model (N-HiTS), using a subset of data from the M5 Forecasting competition. We want to highlight three top-rated benefits our users love about TimeGPT:\n",
"\n",
"🎯 **Accuracy**: TimeGPT consistently outperforms traditional models by capturing complex patterns with precision.\n",
"\n",
"⚡ **Speed**: Generate forecasts faster without needing extensive training or tuning for each series.\n",
"\n",
"🚀 **Ease of Use**: Minimal setup and no complex preprocessing make TimeGPT accessible and ready to use right out of the box!\n",
"\n",
"Before diving into the notebook, please visit our [dashboard](https://dashboard.nixtla.io) to generate your TimeGPT `api_key` and give it a try yourself!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"\n",
"1. [Data Introduction](#1-data-introduction)\n",
"2. [Model Fitting](#2-model-fitting-timegpt-arima-lightgbm-n-hits)\n",
" 1. [Fitting TimeGPT](#21-timegpt)\n",
" 2. [Fitting ARIMA](#22-classical-models-arima)\n",
" 3. [Fitting Light GBM](#23-machine-learning-models-lightgbm)\n",
" 4. [Fitting NHITS](#24-n-hits)\n",
"3. [Results and Evaluation](#3-performance-comparison-and-results)\n",
"4. [Conclusion](#4-conclusion)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/7_why_timegpt.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv()\n",
" colab_badge('docs/getting-started/7_why_timegpt')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from nixtla import NixtlaClient\n",
"from utilsforecast.plotting import plot_series\n",
"from utilsforecast.losses import mae, rmse, smape\n",
"from utilsforecast.evaluation import evaluate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we’re working with an aggregated dataset from the M5 Forecasting - Accuracy competition. This dataset includes **7 daily time series**, each with **1,941 data points**. The last **28 data points** of each series are set aside as the test set, allowing us to evaluate model performance on unseen data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/demand_example.csv', parse_dates=['ds'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe thead tr:last-of-type th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th colspan=\"3\" halign=\"left\">ds</th>\n",
" <th colspan=\"4\" halign=\"left\">y</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th>min</th>\n",
" <th>max</th>\n",
" <th>count</th>\n",
" <th>min</th>\n",
" <th>mean</th>\n",
" <th>median</th>\n",
" <th>max</th>\n",
" </tr>\n",
" <tr>\n",
" <th>unique_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>FOODS_1</th>\n",
" <td>2011-01-29</td>\n",
" <td>2016-05-22</td>\n",
" <td>1941</td>\n",
" <td>0.0</td>\n",
" <td>2674.085523</td>\n",
" <td>2665.0</td>\n",
" <td>5493.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FOODS_2</th>\n",
" <td>2011-01-29</td>\n",
" <td>2016-05-22</td>\n",
" <td>1941</td>\n",
" <td>0.0</td>\n",
" <td>4015.984029</td>\n",
" <td>3894.0</td>\n",
" <td>9069.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FOODS_3</th>\n",
" <td>2011-01-29</td>\n",
" <td>2016-05-22</td>\n",
" <td>1941</td>\n",
" <td>10.0</td>\n",
" <td>16969.089129</td>\n",
" <td>16548.0</td>\n",
" <td>28663.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>HOBBIES_1</th>\n",
" <td>2011-01-29</td>\n",
" <td>2016-05-22</td>\n",
" <td>1941</td>\n",
" <td>0.0</td>\n",
" <td>2936.122617</td>\n",
" <td>2908.0</td>\n",
" <td>5009.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>HOBBIES_2</th>\n",
" <td>2011-01-29</td>\n",
" <td>2016-05-22</td>\n",
" <td>1941</td>\n",
" <td>0.0</td>\n",
" <td>279.053065</td>\n",
" <td>248.0</td>\n",
" <td>871.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>HOUSEHOLD_1</th>\n",
" <td>2011-01-29</td>\n",
" <td>2016-05-22</td>\n",
" <td>1941</td>\n",
" <td>0.0</td>\n",
" <td>6039.594539</td>\n",
" <td>5984.0</td>\n",
" <td>11106.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>HOUSEHOLD_2</th>\n",
" <td>2011-01-29</td>\n",
" <td>2016-05-22</td>\n",
" <td>1941</td>\n",
" <td>0.0</td>\n",
" <td>1566.840289</td>\n",
" <td>1520.0</td>\n",
" <td>2926.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ds y \n",
" min max count min mean median max\n",
"unique_id \n",
"FOODS_1 2011-01-29 2016-05-22 1941 0.0 2674.085523 2665.0 5493.0\n",
"FOODS_2 2011-01-29 2016-05-22 1941 0.0 4015.984029 3894.0 9069.0\n",
"FOODS_3 2011-01-29 2016-05-22 1941 10.0 16969.089129 16548.0 28663.0\n",
"HOBBIES_1 2011-01-29 2016-05-22 1941 0.0 2936.122617 2908.0 5009.0\n",
"HOBBIES_2 2011-01-29 2016-05-22 1941 0.0 279.053065 248.0 871.0\n",
"HOUSEHOLD_1 2011-01-29 2016-05-22 1941 0.0 6039.594539 5984.0 11106.0\n",
"HOUSEHOLD_2 2011-01-29 2016-05-22 1941 0.0 1566.840289 1520.0 2926.0"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('unique_id').agg({\"ds\":[\"min\",\"max\",\"count\"],\\\n",
" \"y\":[\"min\",\"mean\",\"median\",\"max\"]})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(13391, 3) (196, 3)\n"
]
}
],
"source": [
"df_train = df.query('ds <= \"2016-04-24\"')\n",
"df_test = df.query('ds > \"2016-04-24\"')\n",
"\n",
"print(df_train.shape, df_test.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Model Fitting (TimeGPT, ARIMA, LightGBM, N-HiTS)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 TimeGPT\n",
"TimeGPT offers a powerful, streamlined solution for time series forecasting, delivering state-of-the-art results with minimal effort. With TimeGPT, there's no need for data preprocessing or feature engineering -- simply initiate the Nixtla client and call `nixtla_client.forecast` to produce accurate, high-performance forecasts tailored to your unique time series.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Inferred freq: D\n",
"INFO:nixtla.nixtla_client:Querying model metadata...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
}
],
"source": [
"# Forecast with TimeGPT\n",
"fcst_timegpt = nixtla_client.forecast(df = df_train,\n",
" target_col = 'y', \n",
" h=28, # Forecast horizon, predicts the next 28 time steps\n",
" model='timegpt-1-long-horizon', # Use the model for long-horizon forecasting\n",
" finetune_steps=10, # Number of finetuning steps\n",
" level = [90]) # Generate a 90% confidence interval"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"metric\n",
"rmse 592.607378\n",
"smape 0.049403\n",
"Name: TimeGPT, dtype: float64"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Evaluate performance and plot forecast\n",
"fcst_timegpt['ds'] = pd.to_datetime(fcst_timegpt['ds'])\n",
"test_df = pd.merge(df_test, fcst_timegpt, 'left', ['unique_id', 'ds'])\n",
"evaluation_timegpt = evaluate(test_df, metrics=[rmse, smape], models=[\"TimeGPT\"])\n",
"evaluation_timegpt.groupby(['metric'])['TimeGPT'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Classical Models (ARIMA):\n",
"Next, we applied ARIMA, a traditional statistical model, to the same forecasting task. Classical models use historical trends and seasonality to make predictions by relying on linear assumptions. However, they struggled to capture the complex, non-linear patterns within the data, leading to lower accuracy compared to other approaches. Additionally, ARIMA was slower due to its iterative parameter estimation process, which becomes computationally intensive for larger datasets."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Why Use TimeGPT over Classical Models?\n",
">\n",
"> * **Complex Patterns**: TimeGPT captures non-linear trends classical models miss.\n",
">\n",
"> * **Minimal Preprocessing**: TimeGPT requires little to no data preparation.\n",
">\n",
"> * **Scalability**: TimeGPT scales efficiently across multiple series without retraining."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| eval: false\n",
"from statsforecast import StatsForecast\n",
"from statsforecast.models import AutoARIMA"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| eval: false\n",
"#Initiate ARIMA model\n",
"sf = StatsForecast(\n",
" models=[AutoARIMA(season_length=7)],\n",
" freq='D'\n",
")\n",
"# Fit and forecast\n",
"fcst_arima = sf.forecast(h=28, df=df_train) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"fcst_arima = pd.read_csv('../../assets/arima_rst.csv', parse_dates=['ds'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"metric\n",
"rmse 724.957364\n",
"smape 0.055018\n",
"Name: AutoARIMA, dtype: float64"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst_arima.reset_index(inplace=True)\n",
"test_df = pd.merge(df_test, fcst_arima, 'left', ['unique_id', 'ds'])\n",
"evaluation_arima = evaluate(test_df, metrics=[rmse, smape], models=[\"AutoARIMA\"])\n",
"evaluation_arima.groupby(['metric'])['AutoARIMA'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Machine Learning Models (LightGBM)\n",
"\n",
"Thirdly, we used a machine learning model, LightGBM, for the same forecasting task, implemented through the automated pipeline provided by our mlforecast library.\n",
"While LightGBM can capture seasonality and patterns, achieving the best performance often requires detailed feature engineering, careful hyperparameter tuning, and domain knowledge. You can try our mlforecast library to simplify this process and get started quickly!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Why Use TimeGPT over Machine Learning Models?\n",
">\n",
"> * **Automatic Pattern Recognition**: Captures complex patterns from raw data, bypassing the need for feature engineering.\n",
">\n",
"> * **Minimal Tuning**: Works well without extensive tuning.\n",
">\n",
"> * **Scalability**: Forecasts across multiple series without retraining."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| eval: false\n",
"import optuna\n",
"from mlforecast.auto import AutoMLForecast, AutoLightGBM\n",
"\n",
"# Suppress Optuna's logging output\n",
"optuna.logging.set_verbosity(optuna.logging.ERROR)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| eval: false\n",
"# Initialize an automated forecasting pipeline using AutoMLForecast.\n",
"mlf = AutoMLForecast(\n",
" models=[AutoLightGBM()],\n",
" freq='D',\n",
" season_length=7, \n",
" fit_config=lambda trial: {'static_features': ['unique_id']}\n",
")\n",
"\n",
"# Fit the model to the training dataset.\n",
"mlf.fit(\n",
" df=df_train.astype({'unique_id': 'category'}),\n",
" n_windows=1,\n",
" h=28,\n",
" num_samples=10,\n",
")\n",
"fcst_lgbm = mlf.predict(28)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"fcst_lgbm = pd.read_csv('../../assets/lgbm_rst.csv', parse_dates=['ds'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"metric\n",
"rmse 687.773744\n",
"smape 0.051448\n",
"Name: AutoLightGBM, dtype: float64"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_df = pd.merge(df_test, fcst_lgbm, 'left', ['unique_id', 'ds'])\n",
"evaluation_lgbm = evaluate(test_df, metrics=[rmse, smape], models=[\"AutoLightGBM\"])\n",
"evaluation_lgbm.groupby(['metric'])['AutoLightGBM'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 N-HiTS\n",
"\n",
"Lastly, we used N-HiTS, a state-of-the-art deep learning model designed for time series forecasting. The model produced accurate results, demonstrating its ability to capture complex, non-linear patterns within the data. However, setting up and tuning N-HiTS required significantly more time and computational resources compared to TimeGPT."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Why Use TimeGPT Over Deep Learning Models?\n",
">\n",
"> * **Faster Setup**: Quick setup and forecasting, unlike the lengthy configuration and training times of neural networks.\n",
">\n",
"> * **Less Tuning**: Performs well with minimal tuning and preprocessing, while neural networks often need extensive adjustments.\n",
">\n",
"> * **Ease of Use**: Simple deployment with high accuracy, making it accessible without deep technical expertise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| eval: false\n",
"from neuralforecast.core import NeuralForecast\n",
"from neuralforecast.models import NHITS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| eval: false\n",
"# Initialize the N-HiTS model.\n",
"models = [NHITS(h=28, \n",
" input_size=28, \n",
" max_steps=100)]\n",
"\n",
"# Fit the model using training data\n",
"nf = NeuralForecast(models=models, freq='D')\n",
"nf.fit(df=df_train)\n",
"fcst_nhits = nf.predict()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"fcst_nhits = pd.read_csv('../../assets/nhits_rst.csv', parse_dates=['ds'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"metric\n",
"rmse 605.011948\n",
"smape 0.053446\n",
"Name: NHITS, dtype: float64"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_df = pd.merge(df_test,fcst_nhits, 'left', ['unique_id', 'ds'])\n",
"evaluation_nhits = evaluate(test_df, metrics=[rmse, smape], models=[\"NHITS\"])\n",
"evaluation_nhits.groupby(['metric'])['NHITS'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Performance Comparison and Results:\n",
"The performance of each model is evaluated using RMSE (Root Mean Squared Error) and SMAPE (Symmetric Mean Absolute Percentage Error). While RMSE emphasizes the models’ ability to control significant errors, SMAPE provides a relative performance perspective by normalizing errors as percentages. Below, we present a snapshot of performance across all groups. The results demonstrate that TimeGPT outperforms other models on both metrics.\n",
"\n",
"🌟 For a deeper dive into benchmarking, check out our benchmark repository. The summarized results are displayed below:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Overall Performance Metrics\n",
"\n",
"| **Model** | **RMSE** | **SMAPE** |\n",
"|------------------|----------|-----------|\n",
"| ARIMA | 724.9 | 5.50% |\n",
"| LightGBM | 687.8 | 5.14% |\n",
"| N-HiTS | 605.0 | 5.34% |\n",
"| **TimeGPT** | **592.6**| **4.94%** |\n",
" \n",
"\n",
"#### Breakdown for Each Time-series\n",
"Followed below are the metrics for each individual time series groups. TimeGPT consistently delivers accurate forecasts across all time series groups. In many cases, it performs as well as or better than data-specific models, showing its versatility and reliability across different datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| echo: false\n",
"evaluation_df = evaluation_arima.merge(evaluation_lgbm, on = ['unique_id','metric'], how = 'left')\\\n",
" .merge(evaluation_nhits, on = ['unique_id','metric'], how = 'left')\\\n",
" .merge(evaluation_timegpt, on = ['unique_id','metric'], how = 'left')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1400x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# | echo: false\n",
"colors = [\n",
" (\"#A9B9C3\", 0.5), # Grey-bluish color 1\n",
" (\"#7A8D9D\", 0.5), # Grey-bluish color 2\n",
" (\"#5B6D79\", 0.5), # Grey-bluish color 3\n",
" ('#F95D6A', 0.75) # Green color for the last\n",
"]\n",
"\n",
"\n",
"# Filter evaluation data by metric and set unique_id as index\n",
"rmse_df = evaluation_df[evaluation_df['metric'] == 'rmse'].set_index('unique_id')\n",
"smape_df = evaluation_df[evaluation_df['metric'] == 'smape'].set_index('unique_id')\n",
"\n",
"# Plot function with custom colors and opacity\n",
"def plot_metric(ax, df, title, ylabel):\n",
" x = np.arange(len(df))\n",
" bar_width = 0.2\n",
" for i, (col, (color, alpha)) in enumerate(zip(df.columns[1:], colors)):\n",
" ax.bar(x + i * bar_width, df[col], width=bar_width, label=col, color=color, alpha=alpha)\n",
" ax.set(title=title, ylabel=ylabel, xticks=x + bar_width * (len(df.columns[1:]) - 1) / 2, xticklabels=df.index)\n",
" ax.tick_params(axis='x', rotation=45)\n",
" ax.legend()\n",
"\n",
"# Generate side-by-side plots for RMSE and SMAPE\n",
"fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
"plot_metric(axes[0], rmse_df, \"RMSE Comparison Across Models\", \"RMSE\")\n",
"plot_metric(axes[1], smape_df*100, \"%SMAPE Comparison Across Models\", \"SMAPE\")\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Benchmark Results\n",
"For a more comprehensive dive into model accuracy and performance, explore our [Time Series Model Arena](https://github.com/Nixtla/nixtla/tree/main/experiments/foundation-time-series-arena)! TimeGPT continues to lead the pack with exceptional performance across benchmarks! 🌟\n",
"\n",
"<img src=\"https://github.com/Nixtla/nixtla/blob/main/nbs/img/timeseries_model_arena.png?raw=true\" alt=\"Benchmark\" />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Conclusion\n",
"At the end of this notebook, we’ve put together a handy table to show you exactly where TimeGPT shines brightest compared to other forecasting models. ☀️ Think of it as your quick guide to choosing the best model for your unique project needs. We’re confident that TimeGPT will be a valuable tool in your forecasting journey. Don’t forget to visit our [dashboard](https://dashboard.nixtla.io) to generate your TimeGPT `api_key` and get started today! Happy forecasting, and enjoy the insights ahead! "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Scenario | TimeGPT | Classical Models (e.g., ARIMA) | Machine Learning Models (e.g., XGB, LGBM) | Deep Learning Models (e.g., N-HITS) |\n",
"|-----------------------------------|-------------------------------------------|-------------------------------------------------------|---------------------------------------------------------|-------------------------------------------------------|\n",
"| **Seasonal Patterns** | ✅ Performs well with minimal setup | ✅ Handles seasonality with adjustments (e.g., SARIMA) | ✅ Performs well with feature engineering | ✅ Captures seasonal patterns effectively |\n",
"| **Non-Linear Patterns** | ✅ Excels, especially with complex non-linear patterns | ❌ Limited performance | ❌ Struggles without extensive feature engineering | ✅ Performs well with non-linear relationships |\n",
"| **Large Dataset** | ✅ Highly scalable across many series | ❌ Slow and resource-intensive | ✅ Scalable with optimized implementations | ❌ Requires significant resources for large datasets |\n",
"| **Small Dataset** | ✅ Performs well; requires only one data point to start | ✅ Performs well; may struggle with very sparse data | ✅ Performs adequately if enough features are extracted | ❌ May need a minimum data size to learn effectively |\n",
"| **Preprocessing Required** | ✅ Minimal preprocessing needed | ❌ Requires scaling, log-transform, etc., to meet model assumptions | ❌ Requires extensive feature engineering for complex patterns | ❌ Needs data normalization and preprocessing |\n",
"| **Accuracy Requirement** | ✅ Achieves high accuracy with minimal tuning | ❌ May struggle with complex accuracy requirements | ✅ Can achieve good accuracy with tuning | ✅ High accuracy possible but with significant resource use |\n",
"| **Scalability** | ✅ Highly scalable with minimal task-specific configuration | ❌ Not easily scalable | ✅ Moderate scalability, with feature engineering and tuning per task | ❌ Limited scalability due to resource demands |\n",
"| **Computational Resources** | ✅ Highly efficient, operates seamlessly on CPU, no GPU needed | ✅ Light to moderate, scales poorly with large datasets | ❌ Moderate, depends on feature complexity | ❌ High resource consumption, often requires GPU |\n",
"| **Memory Requirement** | ✅ Efficient memory usage for large datasets | ✅ Moderate memory requirements | ❌ High memory usage for larger datasets or many series cases | ❌ High memory consumption for larger datasets and multiple series |\n",
"| **Technical Requirements & Domain Knowledge** | ✅ Low; minimal technical setup and no domain expertise needed | ✅ Low to moderate; needs understanding of stationarity | ❌ Moderate to high; requires feature engineering and tuning | ❌ High; complex architecture and tuning |\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TimeGPT Excel Add-in (Beta)\n",
"\n",
"## Installation\n",
"\n",
"Head to the [TimeGTP excel add-in page in Microsoft Appsource](https://appsource.microsoft.com/en-us/product/office/WA200006429?tab=Overview) and click on \"Get it now\"\n",
"\n",
"## Usage\n",
"> 📘 Access token required\n",
"> \n",
"> The TimeGPT Excel Add-in requires an access token. Get your API Key on the [Nixtla Dashboard](http://dashboard.nixtla.io).\n",
"\n",
"## Support\n",
"\n",
"If you have questions or need support, please email `support@nixtla.io`.\n",
"\n",
"## How-to\n",
"\n",
"### Settings\n",
"\n",
"If this is your first time using Excel add-ins, find information on how to add Excel add-ins with your version of Excel. In the Office Add-ins Store, you'll search for \"TimeGPT\". \n",
"\n",
"Once you have installed the TimeGPT add-in, the add-in comes up in a sidebar task pane. \n",
"* Read through the Welcome screen.\n",
"* Click on the **'Get Started'** button.\n",
"* The API URL is already set to: https://api.nixtla.io.\n",
"* Copy your API key from [Nixtla Dashboard](http://dashboard.nixtla.io). Paste it into the box that say **API Key, Bearer**.\n",
"* Click the gray arrow next to that box on the right. \n",
"* You'll get to a screen with options for 'Forecast' and 'Anomaly Detection'.\n",
"\n",
"To access the settings later, click the gear icon in the top left.\n",
"\n",
"### Data Requirements\n",
"\n",
"* Put your dates in one column and your values in another.\n",
"* Ensure your date format is recognized as a valid date by excel.\n",
"* Ensure your values are recognized as valid number by excel.\n",
"* All data inputs must exist in the same worksheet. The add-in does not support forecasting using multiple worksheets.\n",
"* Do not include headers\n",
"\n",
"Example:\n",
"\n",
"| dates | values | \n",
"| :------------- | :----- | \n",
"| 12/1/16 0:00 | 72 | \n",
"| 12/1/16 1:00 | 65.8 | \n",
"| 12/1/16 2:00 | 59.99 | \n",
"| 12/1/16 3:00 | 50.69 | \n",
"| 12/1/16 4:00 | 52.58 | \n",
"| 12/1/16 5:00 | 65.05 | \n",
"| 12/1/16 6:00 | 80.4 | \n",
"| 12/1/16 7:00 | 200 | \n",
"| 12/1/16 8:00 | 200.63 | \n",
"| 12/1/16 9:00 | 155.47 | \n",
"| 12/1/16 10:00 | 150.91 | \n",
"\n",
"#### Forecasting\n",
"\n",
"Once you've configured your token and formatted your input data then you're all ready to forecast!\n",
"\n",
"With the add-in open, configure the forecasting settings by selecting the column for each input.\n",
"\n",
"* **Frequency** - The frequency of the data (hourly / daily / weekly / monthly)\n",
"\n",
"* **Horizon** - The forecasting horizon. This represents the number of time steps into the future that the forecast should predict.\n",
"\n",
"* **Dates Range** - The column and range of the timeseries timestamps. Must not include header data, and should be formatted as a range, e.g. A2:A145. \n",
"\n",
"* **Values Range** - The column and range of the timeseries values for each point in time. Must not include header data, and should be formatted as a range, e.g. B2:B145. \n",
"\n",
"\n",
"\n",
"\n",
"\n",
"When you're ready, click **Make Prediction** to generate the predicted values. The add-in will generate a plot and append the forecasted data to the end of the column of your existing data and highlight them in green. So, scroll to the end of your data to see the predicted values. \n",
"\n",
"\n",
"\n",
"#### Anomaly Detection\n",
"\n",
"The requirements are the same as for the forecasting functionality, so if you already tried it you are ready to run the anomaly detection one. Go to the main page in the add-in and select \"Anomaly Detection\", then choose your dates and values cell ranges and click on submit. We'll run the model and mark the anomalies cells in yellow while adding a third column for expected values with a green background.\n",
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TimeGPT in R\n",
"\n",
"TimeGPT is also available in R through the `nixtlar` package, which is available on CRAN. This package can be used in a way almost identical to its Python counterpart. It offers nearly the same functionalities, with missing features and documentation currently under development. Originally developed in Python, TimeGPT is now accessible to the R community through `nixtlar`, providing access to the first foundation model for time series forecasting and embracing our core philosophy that _the future is for everybody_."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://github.com/Nixtla/nixtla/blob/main/nbs/img/logo_nixtlar.png?raw=true\" alt=\"Logo for nixtlar\" width=\"700\" />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to use \n",
"\n",
"To learn how to use `nixtlar`, please refer to the [documentation](https://nixtla.github.io/nixtlar/). \n",
"\n",
"To view directly on CRAN, please use this [link](https://cloud.r-project.org/web/packages/nixtlar/index.html). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 API key required\n",
"> \n",
"> The `nixtlar` package requires an API key. Get yours on the [Nixtla Dashboard](http://dashboard.nixtla.io).\n",
"\n",
"## Support\n",
"\n",
"If you have questions or need support, please email `support@nixtla.io`."
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}