Commit f42429f6 authored by bailuo

readme

{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla datasetsforecast utilsforecast"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Long-horizon forecasting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Long-horizon forecasting refers to predictions far into the future, typically exceeding two seasonal periods. However, the exact definition of a 'long horizon' can vary based on the frequency of the data. For example, when dealing with hourly data, a forecast for three days into the future is considered long-horizon, as it covers 72 timestamps (calculated as 3 days × 24 hours/day). In the context of monthly data, a period exceeding two years would typically be classified as long-horizon forecasting. Similarly, for daily data, a forecast spanning more than two weeks falls into the long-horizon category.\n",
"\n",
"Of course, forecasting over a long horizon comes with its challenges. The longer the forecast horizon, the greater the uncertainty in the predictions. Factors that were unknown or unexpected at the time of forecasting can also come into play over the long term.\n",
"\n",
"To tackle those challenges, use TimeGPT's specialized model for long-horizon forecasting by specifying `model='timegpt-1-long-horizon'` in your setup.\n",
"\n",
"For a detailed step-by-step guide, follow this tutorial on long-horizon forecasting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/04_longhorizon.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv() \n",
" colab_badge('docs/tutorials/04_longhorizon')\n",
" import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Import packages\n",
"First, we install and import the required packages and initialize the Nixtla client."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nixtla import NixtlaClient\n",
"from datasetsforecast.long_horizon import LongHorizon\n",
"from utilsforecast.losses import mae"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 👍 Use an Azure AI endpoint\n",
"> \n",
"> To use an Azure AI endpoint, remember to also set the `base_url` argument:\n",
"> \n",
"> `nixtla_client = NixtlaClient(base_url=\"your azure ai endpoint\", api_key=\"your api_key\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Load the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's load the ETTh1 dataset. This is a widely used dataset to evaluate models on their long-horizon forecasting capabilities.\n",
"\n",
"The ETTh1 dataset monitors an electricity transformer in a region of a province of China. It records the oil temperature and several load variants (such as high useful load and high useless load) at an hourly frequency from July 2016 to July 2018.\n",
"\n",
"For this tutorial, let's only consider the oil temperature variation over time."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 314M/314M [00:14<00:00, 21.3MiB/s] \n",
"INFO:datasetsforecast.utils:Successfully downloaded datasets.zip, 314116557, bytes.\n",
"INFO:datasetsforecast.utils:Decompressing zip file...\n",
"INFO:datasetsforecast.utils:Successfully decompressed longhorizon\\datasets\\datasets.zip\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>OT</td>\n",
" <td>2016-07-01 00:00:00</td>\n",
" <td>1.460552</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>OT</td>\n",
" <td>2016-07-01 01:00:00</td>\n",
" <td>1.161527</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>OT</td>\n",
" <td>2016-07-01 02:00:00</td>\n",
" <td>1.161527</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>OT</td>\n",
" <td>2016-07-01 03:00:00</td>\n",
" <td>0.862611</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>OT</td>\n",
" <td>2016-07-01 04:00:00</td>\n",
" <td>0.525227</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds y\n",
"0 OT 2016-07-01 00:00:00 1.460552\n",
"1 OT 2016-07-01 01:00:00 1.161527\n",
"2 OT 2016-07-01 02:00:00 1.161527\n",
"3 OT 2016-07-01 03:00:00 0.862611\n",
"4 OT 2016-07-01 04:00:00 0.525227"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#| eval: false\n",
"Y_df, *_ = LongHorizon.load(directory='./', group='ETTh1')\n",
"\n",
"Y_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this small experiment, let's set the horizon to 96 time steps (4 days into the future) and feed TimeGPT an input sequence of 1008 time steps (42 days)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" Y_df = pd.read_parquet(\"../../assets/long_horizon_example_Y_df.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test = Y_df[-96:] # 96 = 4 days x 24h/day\n",
"input_seq = Y_df[-1104:-96] # Gets a sequence of 1008 observations (1008 = 42 days * 24h/day)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Forecasting over a long horizon"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we are ready to use TimeGPT for long-horizon forecasting. Here, we need to set the `model` parameter to `\"timegpt-1-long-horizon\"`. This is the specialized model in TimeGPT that can handle such tasks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: H\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
}
],
"source": [
"fcst_df = nixtla_client.forecast(\n",
" df=input_seq,\n",
" h=96,\n",
" level=[90],\n",
" finetune_steps=10,\n",
" finetune_loss='mae',\n",
" model='timegpt-1-long-horizon',\n",
" time_col='ds',\n",
" target_col='y'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Available models in Azure AI\n",
">\n",
"> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
">\n",
"> `nixtla_client.forecast(..., model=\"azureai\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 2400x350 with 1 Axes>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.plot(Y_df[-168:], fcst_df, models=['TimeGPT'], level=[90], time_col='ds', target_col='y')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now evaluate the performance of TimeGPT using the mean absolute error (MAE)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test = test.copy()\n",
"\n",
"test.loc[:, 'TimeGPT'] = fcst_df['TimeGPT'].values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" unique_id TimeGPT\n",
"0 OT 0.145393\n"
]
}
],
"source": [
"evaluation = mae(test, models=['TimeGPT'], id_col='unique_id', target_col='y')\n",
"\n",
"print(evaluation)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, TimeGPT achieves an MAE of 0.145."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
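The window arithmetic and the error metric used in the long-horizon notebook above can be checked without calling the API. The following is a minimal, dependency-free sketch: `split_window` and `mean_absolute_error` are hypothetical helper names introduced here for illustration (the notebook itself slices `Y_df` directly and uses `utilsforecast.losses.mae`), and the series is synthetic rather than ETTh1.

```python
HOURS_PER_DAY = 24


def split_window(series, horizon_days, input_days):
    """Return (input_seq, test) taken from the end of an hourly series."""
    h = horizon_days * HOURS_PER_DAY        # e.g. 4 days -> 96 steps
    n_input = input_days * HOURS_PER_DAY    # e.g. 42 days -> 1008 steps
    if h <= 0 or n_input <= 0:
        raise ValueError("both windows must cover at least one day")
    if len(series) < h + n_input:
        raise ValueError("series is too short for the requested windows")
    test = series[-h:]                      # last h observations
    input_seq = series[-(h + n_input):-h]   # the n_input observations before them
    return input_seq, test


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic hourly series with a perfect daily cycle (illustration only).
series = [float(i % HOURS_PER_DAY) for i in range(2000)]
input_seq, test = split_window(series, horizon_days=4, input_days=42)

# Seasonal-naive baseline: repeat the last 96 observed values.
naive_forecast = input_seq[-len(test):]
print(len(input_seq), len(test), mean_absolute_error(test, naive_forecast))
```

With a horizon of 4 days and an input of 42 days, the slices match the notebook's `Y_df[-1104:-96]` and `Y_df[-96:]`, since 1104 = 1008 + 96.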
{
"cells": [
{
"cell_type": "markdown",
"id": "6de758ee-a0d2-4b3f-acff-eed419dd17c5",
"metadata": {},
"source": [
"# Training\n",
"\n",
"This section offers tutorials related to training `TimeGPT` under specific conditions.\n",
"\n",
"### What You Will Learn\n",
"\n",
"1. **[Long Horizon Forecasting](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)**\n",
"\n",
" - Discover how to make predictions beyond two seasonal periods, or even further into the future, using `TimeGPT`'s specialized model for long horizon forecasting.\n",
"\n",
"2. **[Multiple Series Forecasting](https://docs.nixtla.io/docs/tutorials-multiple_series_forecasting)**\n",
"\n",
" - Learn how to use `TimeGPT` to forecast multiple time series simultaneously."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "27371399-17ac-4fcf-8e2d-19091b32cdf7",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e428575b-700a-49a6-a0a9-6fa884119d86",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fba11152-1fbb-43b5-b6c7-ccb5ff688ce2",
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0438f77-6a7e-400d-8739-09c9e347dcac",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"id": "d4bcec3f-9ffe-41e0-a38b-92e77e460154",
"metadata": {},
"source": [
"# Re-using fine-tuned models\n",
"\n",
"Save and re-use fine-tuned models across all of our endpoints."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56e9125c-53b3-41e4-bace-e920fb827c06",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/061_reusing_finetuned_models.ipynb.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv() \n",
" colab_badge('docs/tutorials/061_reusing_finetuned_models')"
]
},
{
"cell_type": "markdown",
"id": "c7eb9fc0-4541-4c1e-8ffe-442d115fd638",
"metadata": {},
"source": [
"## 1. Import packages\n",
"First, we import the required packages and initialize the Nixtla client."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89c80a4a-645d-43f9-9454-415a98685105",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from nixtla import NixtlaClient\n",
"from utilsforecast.losses import rmse\n",
"from utilsforecast.evaluation import evaluate"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73d7516f-2a78-4be1-972e-41cb70800bcd",
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ[\"NIXTLA_API_KEY\"]\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a60ca743-7d68-4d4b-af72-10f63dbf5b26",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"id": "83ca8dec-ca2a-4e9f-8983-886208423769",
"metadata": {},
"source": [
"## 2. Load data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb5ef6b1-4756-4f79-8609-12f051503431",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>H1</td>\n",
" <td>1</td>\n",
" <td>605.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>H1</td>\n",
" <td>2</td>\n",
" <td>586.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>H1</td>\n",
" <td>3</td>\n",
" <td>586.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>H1</td>\n",
" <td>4</td>\n",
" <td>559.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>H1</td>\n",
" <td>5</td>\n",
" <td>511.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds y\n",
"0 H1 1 605.0\n",
"1 H1 2 586.0\n",
"2 H1 3 586.0\n",
"3 H1 4 559.0\n",
"4 H1 5 511.0"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
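],
"source": [
"df = pd.read_parquet('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.parquet')\n",
"\n",
"# Hold out the last h observations of each series for validation\n",
"h = 48\n",
"valid = df.groupby('unique_id', observed=True).tail(h)\n",
"# Train on every row that is not part of the validation set\n",
"train = df.drop(valid.index)\n",
"train.head()"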
]
},
{
"cell_type": "markdown",
"id": "f7b61f18-64a3-4b7f-8f86-76a78d6a0c0c",
"metadata": {},
"source": [
"## 3. Zero-shot forecast\n",
"\n",
"We can try forecasting without any fine-tuning to see how well TimeGPT does."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3e60cbbe-2710-4a7b-a453-27e52bf8b32b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Querying model metadata...\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon, this may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Restricting input...\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>metric</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>rmse</td>\n",
" <td>1504.474342</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" metric TimeGPT\n",
"0 rmse 1504.474342"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst_kwargs = {'df': train, 'freq': 1, 'model': 'timegpt-1-long-horizon'}\n",
"fcst = nixtla_client.forecast(h=h, **fcst_kwargs)\n",
"zero_shot_eval = evaluate(fcst.merge(valid), metrics=[rmse], agg_fn='mean')\n",
"zero_shot_eval"
]
},
{
"cell_type": "markdown",
"id": "f966407c-9c7d-4bce-8d6c-31870e00e7b5",
"metadata": {},
"source": [
"## 4. Fine-tune\n",
"\n",
"We can now fine-tune TimeGPT a little and save our model for later use. We can define the ID that we want that model to have by providing it through `output_model_id`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ffd8395-c30c-4522-b597-349a9d3a4b2e",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Calling Fine-tune Endpoint...\n"
]
},
{
"data": {
"text/plain": [
"'my-first-finetuned-model'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_model_id = 'my-first-finetuned-model'\n",
"nixtla_client.finetune(output_model_id=first_model_id, **fcst_kwargs)"
]
},
{
"cell_type": "markdown",
"id": "1198429a-5518-43a3-bd73-2fa5d1f48cc3",
"metadata": {},
"source": [
"We can now forecast using this fine-tuned model by providing its ID through the `finetuned_model_id` argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb996e6a-37e1-44ea-af8d-3b71cf6276ae",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon, this may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Restricting input...\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>metric</th>\n",
" <th>TimeGPT_zero_shot</th>\n",
" <th>TimeGPT_first_finetune</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>rmse</td>\n",
" <td>1504.474342</td>\n",
" <td>1472.024619</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" metric TimeGPT_zero_shot TimeGPT_first_finetune\n",
"0 rmse 1504.474342 1472.024619"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_finetune_fcst = nixtla_client.forecast(h=h, finetuned_model_id=first_model_id, **fcst_kwargs)\n",
"first_finetune_eval = evaluate(first_finetune_fcst.merge(valid), metrics=[rmse], agg_fn='mean')\n",
"zero_shot_eval.merge(first_finetune_eval, on=['metric'], suffixes=('_zero_shot', '_first_finetune'))"
]
},
{
"cell_type": "markdown",
"id": "fb763ee8-07c0-4a6b-85dd-deb6c8216ddd",
"metadata": {},
"source": [
"We can see that the RMSE dropped from about 1504.47 to 1472.02."
]
},
{
"cell_type": "markdown",
"id": "4b97ad55-a82c-4dd2-878c-40e2e9bf8945",
"metadata": {},
"source": [
"## 5. Further fine-tune\n",
"\n",
"We can now fine-tune this model further. To do so, call the `NixtlaClient.finetune` method again and pass the ID of our already fine-tuned model as `finetuned_model_id`, so that the new run starts from that model. We can also change the fine-tuning settings, for example by setting `finetune_depth=3`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99ede33c-379b-4569-8e1a-996abbe8576e",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Calling Fine-tune Endpoint...\n"
]
},
{
"data": {
"text/plain": [
"'468b13fb-4b26-447a-bd87-87a64b50d913'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"second_model_id = nixtla_client.finetune(finetuned_model_id=first_model_id, finetune_depth=3, **fcst_kwargs)\n",
"second_model_id"
]
},
{
"cell_type": "markdown",
"id": "70f0cab5-7b01-4d2d-8afe-0a2317644eed",
"metadata": {},
"source": [
"Since we didn't provide `output_model_id` this time, the model was assigned a UUID.\n",
"\n",
"We can now use this model to forecast."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4cfeed2e-0a39-4211-82d1-67d1f868b311",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon, this may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Restricting input...\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>metric</th>\n",
" <th>TimeGPT_first_finetune</th>\n",
" <th>TimeGPT_second_finetune</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>rmse</td>\n",
" <td>1472.024619</td>\n",
" <td>1435.365211</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" metric TimeGPT_first_finetune TimeGPT_second_finetune\n",
"0 rmse 1472.024619 1435.365211"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"second_finetune_fcst = nixtla_client.forecast(h=h, finetuned_model_id=second_model_id, **fcst_kwargs)\n",
"second_finetune_eval = evaluate(second_finetune_fcst.merge(valid), metrics=[rmse], agg_fn='mean')\n",
"first_finetune_eval.merge(second_finetune_eval, on=['metric'], suffixes=('_first_finetune', '_second_finetune'))"
]
},
{
"cell_type": "markdown",
"id": "a2bc7c72-47be-4cc5-b774-f75980e8d70b",
"metadata": {},
"source": [
"We can see that the RMSE dropped a bit more, from about 1472.02 to 1435.37."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "611406fe-2379-4b92-bdd4-5f9a86438d91",
"metadata": {},
"source": [
"## 6. Listing fine-tuned models\n",
"\n",
"We can list our fine-tuned models with the `NixtlaClient.finetuned_models` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9648bb4-74ad-4a94-8c8a-74625e9795d7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[FinetunedModel(id='468b13fb-4b26-447a-bd87-87a64b50d913', created_at=datetime.datetime(2024, 12, 30, 17, 57, 31, 241455, tzinfo=TzInfo(UTC)), created_by='user', base_model_id='my-first-finetuned-model', steps=10, depth=3, loss='default', model='timegpt-1-long-horizon', freq='MS'),\n",
" FinetunedModel(id='my-first-finetuned-model', created_at=datetime.datetime(2024, 12, 30, 17, 57, 16, 978907, tzinfo=TzInfo(UTC)), created_by='user', base_model_id='None', steps=10, depth=1, loss='default', model='timegpt-1-long-horizon', freq='MS')]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"finetuned_models = nixtla_client.finetuned_models()\n",
"finetuned_models"
]
},
{
"cell_type": "markdown",
"id": "95e591c8-80b0-43f8-afed-dfa760597af8",
"metadata": {},
"source": [
"While that representation may be useful for programmatic use, in this exploratory setting it's nicer to see them as a dataframe, which we can get by providing `as_df=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0cacc468-0aa3-42af-85d9-7c31bfd2a4f3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>created_at</th>\n",
" <th>created_by</th>\n",
" <th>base_model_id</th>\n",
" <th>steps</th>\n",
" <th>depth</th>\n",
" <th>loss</th>\n",
" <th>model</th>\n",
" <th>freq</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>468b13fb-4b26-447a-bd87-87a64b50d913</td>\n",
" <td>2024-12-30 17:57:31.241455+00:00</td>\n",
" <td>user</td>\n",
" <td>my-first-finetuned-model</td>\n",
" <td>10</td>\n",
" <td>3</td>\n",
" <td>default</td>\n",
" <td>timegpt-1-long-horizon</td>\n",
" <td>MS</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>my-first-finetuned-model</td>\n",
" <td>2024-12-30 17:57:16.978907+00:00</td>\n",
" <td>user</td>\n",
" <td>None</td>\n",
" <td>10</td>\n",
" <td>1</td>\n",
" <td>default</td>\n",
" <td>timegpt-1-long-horizon</td>\n",
" <td>MS</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id created_at \\\n",
"0 468b13fb-4b26-447a-bd87-87a64b50d913 2024-12-30 17:57:31.241455+00:00 \n",
"1 my-first-finetuned-model 2024-12-30 17:57:16.978907+00:00 \n",
"\n",
" created_by base_model_id steps depth loss \\\n",
"0 user my-first-finetuned-model 10 3 default \n",
"1 user None 10 1 default \n",
"\n",
" model freq \n",
"0 timegpt-1-long-horizon MS \n",
"1 timegpt-1-long-horizon MS "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.finetuned_models(as_df=True)"
]
},
{
"cell_type": "markdown",
"id": "9697c759-1b08-4192-a14f-5df1fdb03191",
"metadata": {},
"source": [
"We can see that the `base_model_id` of our second model is our first model, along with other metadata."
]
},
{
"cell_type": "markdown",
"id": "eae29db5-de09-4954-9352-4f22eb0c3675",
"metadata": {},
"source": [
"## 7. Deleting fine-tuned models\n",
"\n",
"To keep things organized, and since there's a limit of 50 fine-tuned models, you can delete models that weren't so promising to make room for more experiments. For example, we can delete our first fine-tuned model. Note that even though it was used as the base for our second model, the two are saved independently, so removing it won't affect our second model. The only trace left is the dangling `base_model_id` in the second model's metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7232bc3b-9096-4875-978a-430b7627688f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.delete_finetuned_model(first_model_id)"
]
},
{
"cell_type": "markdown",
"id": "0973b161-368f-4681-8447-c87537a46583",
"metadata": {},
"source": [
"We can verify that our first model no longer shows up among our available models."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b80edea-8926-4a13-8fb8-ec9bbcf4d575",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>created_at</th>\n",
" <th>created_by</th>\n",
" <th>base_model_id</th>\n",
" <th>steps</th>\n",
" <th>depth</th>\n",
" <th>loss</th>\n",
" <th>model</th>\n",
" <th>freq</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>468b13fb-4b26-447a-bd87-87a64b50d913</td>\n",
" <td>2024-12-30 17:57:31.241455+00:00</td>\n",
" <td>user</td>\n",
" <td>my-first-finetuned-model</td>\n",
" <td>10</td>\n",
" <td>3</td>\n",
" <td>default</td>\n",
" <td>timegpt-1-long-horizon</td>\n",
" <td>MS</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id created_at \\\n",
"0 468b13fb-4b26-447a-bd87-87a64b50d913 2024-12-30 17:57:31.241455+00:00 \n",
"\n",
" created_by base_model_id steps depth loss \\\n",
"0 user my-first-finetuned-model 10 3 default \n",
"\n",
" model freq \n",
"0 timegpt-1-long-horizon MS "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.finetuned_models(as_df=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
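The hold-out evaluation used in the notebook above (keep the last `h` observations of each series for validation, then score the forecasts with RMSE) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `holdout_split` and `mean_rmse` are hypothetical helpers, the data is synthetic, and the sketch assumes that `agg_fn='mean'` averages the per-series RMSE values.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def holdout_split(rows, h):
    """Split (unique_id, ds, y) rows, keeping the last h rows of each series for validation."""
    if h <= 0:
        raise ValueError("h must be a positive number of observations")
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)
    train, valid = [], []
    for series in by_id.values():
        series.sort(key=lambda r: r[1])  # order each series by its time index
        train.extend(series[:-h])
        valid.extend(series[-h:])
    return train, valid


def mean_rmse(valid, forecasts):
    """Compute the RMSE of each series, then average across series.

    `forecasts` maps (unique_id, ds) to the predicted value.
    """
    sq_err = defaultdict(list)
    for uid, ds, y in valid:
        sq_err[uid].append((y - forecasts[(uid, ds)]) ** 2)
    return mean(sqrt(mean(errs)) for errs in sq_err.values())


# Two synthetic series with an integer time index (illustration only).
rows = [(uid, ds, float(ds * scale))
        for uid, scale in (("H1", 1), ("H2", 10))
        for ds in range(1, 11)]
train, valid = holdout_split(rows, h=2)

# Naive baseline: repeat the last training value of each series.
last_value = {}
for uid, ds, y in train:
    last_value[uid] = y  # train rows are ordered by ds within each series
forecasts = {(uid, ds): last_value[uid] for uid, ds, _ in valid}
score = mean_rmse(valid, forecasts)
print(len(train), len(valid), round(score, 3))
```

Scoring every model variant (zero-shot, first fine-tune, second fine-tune) against the same `valid` rows is what makes their RMSE values comparable.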
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "02134a5e",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6d8f223",
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c6c0333",
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce98fab5",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"id": "da753996-54f8-4244-a34e-7316b0c01827",
"metadata": {},
"source": [
"# Fine-tuning"
]
},
{
"cell_type": "markdown",
"id": "75a62889-d81e-462e-b235-c1eba1096da9",
"metadata": {},
"source": [
"Fine-tuning is a powerful process for utilizing TimeGPT more effectively. Foundation models such as TimeGPT are pre-trained on vast amounts of data, capturing wide-ranging features and patterns. These models can then be specialized for specific contexts or domains. With fine-tuning, the model's parameters are refined to forecast a new task, allowing it to tailor its vast pre-existing knowledge towards the requirements of the new data. Fine-tuning thus serves as a crucial bridge, linking TimeGPT's broad capabilities to the specifics of your task.\n",
"\n",
"Concretely, the process of fine-tuning consists of performing a certain number of training iterations on your input data minimizing the forecasting error. The forecasts will then be produced with the updated model. To control the number of iterations, use the `finetune_steps` argument of the `forecast` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "448eaf77-0a40-4b5b-88a2-31de99f404bc",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/06_finetuning.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv() \n",
" colab_badge('docs/tutorials/06_finetuning')"
]
},
{
"cell_type": "markdown",
"id": "10ec4f03",
"metadata": {},
"source": [
"## 1. Import packages\n",
"First, we import the required packages and initialize the Nixtla client."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98942108-d427-42d6-81f8-fa0bb5859395",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from nixtla import NixtlaClient\n",
"from utilsforecast.losses import mae, mse\n",
"from utilsforecast.evaluation import evaluate"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64178d1c-957e-4a04-ab64-fde332b1840c",
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b57a38e6",
"metadata": {},
"source": [
"> 👍 Use an Azure AI endpoint\n",
"> \n",
"> To use an Azure AI endpoint, also set the `base_url` argument:\n",
"> \n",
"> `nixtla_client = NixtlaClient(base_url=\"your azure ai endpoint\", api_key=\"your api_key\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5cd61549-0b00-4a42-a98e-239fa4fae5e5",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"id": "8c2e5387",
"metadata": {},
"source": [
"## 2. Load data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b78cc83e-7d34-4c37-906d-8c7ed1a977fb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timestamp</th>\n",
" <th>value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1949-01-01</td>\n",
" <td>112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1949-02-01</td>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1949-03-01</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1949-04-01</td>\n",
" <td>129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1949-05-01</td>\n",
" <td>121</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timestamp value\n",
"0 1949-01-01 112\n",
"1 1949-02-01 118\n",
"2 1949-03-01 132\n",
"3 1949-04-01 129\n",
"4 1949-05-01 121"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "09be4766",
"metadata": {},
"source": [
"## 3. Fine-tuning"
]
},
{
"cell_type": "markdown",
"id": "7f5b9060",
"metadata": {},
"source": [
"Here, `finetune_steps=10` means the model will go through 10 iterations of training on your time series data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a683abc7-190c-40a6-a4e8-41a4c64bd773",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"INFO:nixtla.nixtla_client:Querying model metadata...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
}
],
"source": [
"timegpt_fcst_finetune_df = nixtla_client.forecast(\n",
" df=df, h=12, finetune_steps=10,\n",
" time_col='timestamp', target_col='value',\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ac469746",
"metadata": {},
"source": [
"> 📘 Available models in Azure AI\n",
">\n",
"> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
">\n",
"> `nixtla_client.forecast(..., model=\"azureai\")`\n",
"> \n",
"> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. \n",
"> \n",
"> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "545ffdac-f166-417b-993f-78f51b0db6a1",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1600x350 with 1 Axes>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.plot(\n",
" df, timegpt_fcst_finetune_df, \n",
" time_col='timestamp', target_col='value',\n",
")"
]
},
{
"cell_type": "markdown",
"id": "62fc9cba-7c6e-4aef-9c68-e05d4fe8f7ba",
"metadata": {},
"source": [
"Keep in mind that fine-tuning can be a bit of trial and error. You might need to adjust the number of `finetune_steps` based on your specific needs and the complexity of your data. Usually, a larger value of `finetune_steps` works better for large datasets.\n",
"\n",
"It's recommended to monitor the model's performance during fine-tuning and adjust as needed. Be aware that more `finetune_steps` may lead to longer training times and could potentially lead to overfitting if not managed properly. \n",
"\n",
"Remember, fine-tuning is a powerful feature, but it should be used thoughtfully and carefully."
]
},
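{
"cell_type": "markdown",
"id": "finetune-steps-holdout-sketch",
"metadata": {},
"source": [
"One way to make this trial and error systematic is to hold out the last part of the series and compare the error obtained with several values of `finetune_steps`. The sketch below is only an illustration under stated assumptions: `compare_finetune_steps` and `naive_forecast` are hypothetical helpers, the toy data is made up, and the stand-in forecaster exists only so that the example runs without an API key. The one detail taken from the client is that the forecast DataFrame has a `TimeGPT` column.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def compare_finetune_steps(train, test, steps_grid, forecast_fn, target_col='value'):\n",
"    # Hold-out MAE for each candidate number of fine-tuning steps\n",
"    scores = {}\n",
"    for steps in steps_grid:\n",
"        preds_df = forecast_fn(train, h=len(test), finetune_steps=steps)\n",
"        abs_err = np.abs(test[target_col].to_numpy() - preds_df['TimeGPT'].to_numpy())\n",
"        scores[steps] = float(abs_err.mean())\n",
"    return pd.Series(scores, name='mae').rename_axis('finetune_steps')\n",
"\n",
"def naive_forecast(train, h, finetune_steps):\n",
"    # Offline stand-in (an assumption): repeats the last value and ignores finetune_steps\n",
"    return pd.DataFrame({'TimeGPT': np.repeat(float(train['value'].iloc[-1]), h)})\n",
"\n",
"toy = pd.DataFrame({\n",
"    'timestamp': pd.date_range('2020-01-01', periods=6, freq='MS'),\n",
"    'value': [1, 2, 3, 4, 5, 6],\n",
"})\n",
"train, test = toy[:-2], toy[-2:]\n",
"\n",
"scores = compare_finetune_steps(train, test, [5, 10], naive_forecast)\n",
"print(scores)\n",
"```\n",
"\n",
"With a real client, `forecast_fn` could wrap the call used earlier, for example `lambda train, h, finetune_steps: nixtla_client.forecast(df=train, h=h, finetune_steps=finetune_steps, time_col='timestamp', target_col='value')`. The candidate with the lowest hold-out error is then a reasonable value to try first."
]
},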
{
"cell_type": "markdown",
"id": "8c546351",
"metadata": {},
"source": [
"For a detailed guide on using a specific loss function for fine-tuning, check out the [Fine-tuning with a specific loss function](https://docs.nixtla.io/docs/tutorials-fine_tuning_with_a_specific_loss_function) tutorial.\n",
"\n",
"See also our detailed tutorial on [controlling the level of fine-tuning](https://docs.nixtla.io/docs/tutorials-finetune_depth_finetuning) using `finetune_depth`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla utilsforecast"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fine-tuning with a specific loss function"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When fine-tuning, the model trains on your dataset to tailor its predictions to your particular scenario. As such, it is possible to specify the loss function used during fine-tuning.\\\n",
"\\\n",
"Specifically, you can choose from:\n",
"\n",
"* `\"default\"` - a proprietary loss function that is robust to outliers\n",
"* `\"mae\"` - mean absolute error\n",
"* `\"mse\"` - mean squared error\n",
"* `\"rmse\"` - root mean squared error\n",
"* `\"mape\"` - mean absolute percentage error\n",
"* `\"smape\"` - symmetric mean absolute percentage error"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/07_loss_function_finetuning.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv() \n",
" colab_badge('docs/tutorials/07_loss_function_finetuning')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Import packages\n",
"First, we import the required packages and initialize the Nixtla client."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from nixtla import NixtlaClient\n",
"from utilsforecast.losses import mae, mse, rmse, mape, smape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 👍 Use an Azure AI endpoint\n",
"> \n",
"> To use an Azure AI endpoint, also set the `base_url` argument:\n",
"> \n",
"> `nixtla_client = NixtlaClient(base_url=\"your azure ai endpoint\", api_key=\"your api_key\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Load data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We load the air passengers dataset and add a `unique_id` column that identifies the series. The `utilsforecast` loss functions used later in this tutorial are called with `id_col='unique_id'`, so the column must be present."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>timestamp</th>\n",
" <th>value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>1949-01-01</td>\n",
" <td>112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1949-02-01</td>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>1949-03-01</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1949-04-01</td>\n",
" <td>129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>1949-05-01</td>\n",
" <td>121</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id timestamp value\n",
"0 1 1949-01-01 112\n",
"1 1 1949-02-01 118\n",
"2 1 1949-03-01 132\n",
"3 1 1949-04-01 129\n",
"4 1 1949-05-01 121"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')\n",
"df.insert(loc=0, column='unique_id', value=1)\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Fine-tuning with Mean Absolute Error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's fine-tune the model on a dataset using the Mean Absolute Error (MAE).\\\n",
"\\\n",
"For that, we simply pass the appropriate string representing the loss function to the `finetune_loss` parameter of the `forecast` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
}
],
"source": [
"timegpt_fcst_finetune_mae_df = nixtla_client.forecast(\n",
" df=df, \n",
" h=12, \n",
" finetune_steps=10,\n",
" finetune_loss='mae', # Set your desired loss function\n",
" time_col='timestamp', \n",
" target_col='value',\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Available models in Azure AI\n",
">\n",
"> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
">\n",
"> `nixtla_client.forecast(..., model=\"azureai\")`\n",
"> \n",
"> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. \n",
"> \n",
"> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 2400x350 with 1 Axes>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.plot(\n",
" df, timegpt_fcst_finetune_mae_df, \n",
" time_col='timestamp', target_col='value',\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The right error metric for evaluating a forecasting model depends on your data and your use case.\\\n",
"\\\n",
"Below is a non-exhaustive guide on which metric to use depending on your use case.\\\n",
"\\\n",
"**Mean absolute error (MAE)**\\\n",
"\\\n",
"<img src=\"https://latex.codecogs.com/svg.image?\\mathrm{MAE}(\\mathbf{y}_{\\tau}, \\mathbf{\\hat{y}}_{\\tau}) = \\frac{1}{H} \\sum^{t+H}_{\\tau=t+1} |y_{\\tau} - \\hat{y}_{\\tau}|\" />\n",
"\n",
"- Robust to outliers\n",
"- Easy to understand\n",
"- You care equally about all error sizes\n",
"- Same units as your data\n",
"\n",
"**Mean squared error (MSE)**\\\n",
"\\\n",
"<img src=\"https://latex.codecogs.com/svg.image?\\mathrm{MSE}(\\mathbf{y}_{\\tau}, \\mathbf{\\hat{y}}_{\\tau}) = \\frac{1}{H} \\sum^{t+H}_{\\tau=t+1} (y_{\\tau} - \\hat{y}_{\\tau})^{2}\" />\n",
"\n",
"- You want to penalize large errors more than small ones\n",
"- Sensitive to outliers\n",
"- Used when large errors must be avoided\n",
"- *Not* the same units as your data\n",
"\n",
"**Root mean squared error (RMSE)**\\\n",
"\\\n",
"<img src=\"https://latex.codecogs.com/svg.image?\\mathrm{RMSE}(\\mathbf{y}_{\\tau}, \\mathbf{\\hat{y}}_{\\tau}) = \\sqrt{\\frac{1}{H} \\sum^{t+H}_{\\tau=t+1} (y_{\\tau} - \\hat{y}_{\\tau})^{2}}\" />\n",
"\n",
"- Brings the MSE back to original units of data\n",
"- Penalizes large errors more than small ones\n",
"\n",
"**Mean absolute percentage error (MAPE)**\\\n",
"\\\n",
"<img src=\"https://latex.codecogs.com/svg.image?\\mathrm{MAPE}(\\mathbf{y}_{\\tau}, \\mathbf{\\hat{y}}_{\\tau}) = \\frac{1}{H} \\sum^{t+H}_{\\tau=t+1} \\frac{|y_{\\tau}-\\hat{y}_{\\tau}|}{|y_{\\tau}|}\" />\n",
"\n",
"- Easy to understand for non-technical stakeholders\n",
"- Expressed as a percentage\n",
"- Asymmetric: for positive actuals and non-negative forecasts, an under-forecast costs at most 100%, while an over-forecast is unbounded\n",
"- To be avoided if your data has values close to 0 or equal to 0\n",
"\n",
"**Symmetric mean absolute percentage error (sMAPE)**\\\n",
"\\\n",
"<img src=\"https://latex.codecogs.com/svg.image?\\mathrm{SMAPE}_{2}(\\mathbf{y}_{\\tau}, \\mathbf{\\hat{y}}_{\\tau}) = \\frac{1}{H} \\sum^{t+H}_{\\tau=t+1} \\frac{|y_{\\tau}-\\hat{y}_{\\tau}|}{|y_{\\tau}|+|\\hat{y}_{\\tau}|}\" />\n",
"\n",
"- Fixes bias of MAPE\n",
"- Equally sensitive to over- and under-forecasting\n",
"- To be avoided if your data has values close to 0 or equal to 0\n",
"\n",
"With TimeGPT, you can choose the loss function used during fine-tuning so as to optimize the metric that matters for your particular use case.\\\n",
"\\\n",
"Let's run a small experiment to see how each loss function improves its associated metric compared with the default setting."
]
},
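{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the experiment, the metric definitions above can be checked by hand on a few numbers. The sketch below applies each formula with NumPy to made-up values of `y` and `y_hat` (both are illustrative assumptions, not data from this tutorial). It follows the formulas exactly as written above; library implementations such as those in `utilsforecast` may use different scaling conventions for the percentage metrics.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"y = np.array([100.0, 200.0, 300.0, 400.0])      # actual values (made up)\n",
"y_hat = np.array([110.0, 190.0, 330.0, 360.0])  # forecasts (made up)\n",
"\n",
"abs_err = np.abs(y - y_hat)\n",
"\n",
"metrics = {\n",
"    'mae': abs_err.mean(),\n",
"    'mse': (abs_err ** 2).mean(),\n",
"    'rmse': np.sqrt((abs_err ** 2).mean()),\n",
"    'mape': (abs_err / np.abs(y)).mean(),\n",
"    'smape': (abs_err / (np.abs(y) + np.abs(y_hat))).mean(),\n",
"}\n",
"\n",
"for name, value in metrics.items():\n",
"    print(f'{name}: {value:.4f}')\n",
"```\n",
"\n",
"Here the absolute errors are 10, 10, 30, and 40, so the MAE is 22.5 while the MSE is 675: the two largest errors dominate the squared metric, which is why MSE and RMSE are the more outlier-sensitive choices."
]
},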
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train = df[:-36]\n",
"test = df[-36:]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n",
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n",
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n",
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n",
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n",
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"WARNING:nixtla.nixtla_client:The specified horizon \"h\" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n"
]
}
],
"source": [
"losses = ['default', 'mae', 'mse', 'rmse', 'mape', 'smape']\n",
"\n",
"test = test.copy()\n",
"\n",
"for loss in losses:\n",
" preds_df = nixtla_client.forecast(\n",
" df=train, \n",
" h=36, \n",
" finetune_steps=10,\n",
" finetune_loss=loss,\n",
" time_col='timestamp', \n",
" target_col='value')\n",
"\n",
" preds = preds_df['TimeGPT'].values\n",
"\n",
" test.loc[:,f'TimeGPT_{loss}'] = preds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Available models in Azure AI\n",
">\n",
"> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
">\n",
"> `nixtla_client.forecast(..., model=\"azureai\")`\n",
"> \n",
"> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. \n",
"> \n",
"> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>timestamp</th>\n",
" <th>value</th>\n",
" <th>TimeGPT_default</th>\n",
" <th>TimeGPT_mae</th>\n",
" <th>TimeGPT_mse</th>\n",
" <th>TimeGPT_rmse</th>\n",
" <th>TimeGPT_mape</th>\n",
" <th>TimeGPT_smape</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>1</td>\n",
" <td>1958-01-01</td>\n",
" <td>340</td>\n",
" <td>347.134094</td>\n",
" <td>341.933563</td>\n",
" <td>347.600616</td>\n",
" <td>347.059113</td>\n",
" <td>356.154938</td>\n",
" <td>341.958679</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109</th>\n",
" <td>1</td>\n",
" <td>1958-02-01</td>\n",
" <td>318</td>\n",
" <td>345.739746</td>\n",
" <td>343.268738</td>\n",
" <td>346.399963</td>\n",
" <td>345.678314</td>\n",
" <td>354.163422</td>\n",
" <td>343.929657</td>\n",
" </tr>\n",
" <tr>\n",
" <th>110</th>\n",
" <td>1</td>\n",
" <td>1958-03-01</td>\n",
" <td>362</td>\n",
" <td>394.611450</td>\n",
" <td>390.873169</td>\n",
" <td>395.436646</td>\n",
" <td>394.636627</td>\n",
" <td>396.496155</td>\n",
" <td>392.543640</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>1</td>\n",
" <td>1958-04-01</td>\n",
" <td>348</td>\n",
" <td>404.133545</td>\n",
" <td>400.997070</td>\n",
" <td>404.369598</td>\n",
" <td>403.498901</td>\n",
" <td>396.927185</td>\n",
" <td>402.459625</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112</th>\n",
" <td>1</td>\n",
" <td>1958-05-01</td>\n",
" <td>363</td>\n",
" <td>421.236542</td>\n",
" <td>418.793365</td>\n",
" <td>422.122223</td>\n",
" <td>421.541443</td>\n",
" <td>410.335663</td>\n",
" <td>422.161255</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id timestamp value TimeGPT_default TimeGPT_mae TimeGPT_mse \\\n",
"108 1 1958-01-01 340 347.134094 341.933563 347.600616 \n",
"109 1 1958-02-01 318 345.739746 343.268738 346.399963 \n",
"110 1 1958-03-01 362 394.611450 390.873169 395.436646 \n",
"111 1 1958-04-01 348 404.133545 400.997070 404.369598 \n",
"112 1 1958-05-01 363 421.236542 418.793365 422.122223 \n",
"\n",
" TimeGPT_rmse TimeGPT_mape TimeGPT_smape \n",
"108 347.059113 356.154938 341.958679 \n",
"109 345.678314 354.163422 343.929657 \n",
"110 394.636627 396.496155 392.543640 \n",
"111 403.498901 396.927185 402.459625 \n",
"112 421.541443 410.335663 422.161255 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#| hide\n",
"test.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! We have predictions from TimeGPT using all the different loss functions. We can evaluate the performance using their associated metric and measure the improvement."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loss_fct_dict = {\n",
"    \"mae\": mae,\n",
"    \"mse\": mse,\n",
"    \"rmse\": rmse,\n",
"    \"mape\": mape,\n",
"    \"smape\": smape\n",
"}\n",
"\n",
"pct_improv = []\n",
"\n",
"for loss in losses[1:]:\n",
"    # Score the default and the loss-specific forecasts with the matching metric\n",
"    evaluation = loss_fct_dict[loss](\n",
"        test, models=['TimeGPT_default', f'TimeGPT_{loss}'], id_col='unique_id', target_col='value'\n",
"    )\n",
"    # Relative improvement over the default loss, in percent (positive means lower error)\n",
"    pct_diff = (evaluation['TimeGPT_default'] - evaluation[f'TimeGPT_{loss}']) / evaluation['TimeGPT_default'] * 100\n",
"    pct_improv.append(round(pct_diff, 2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mae</th>\n",
" <th>mse</th>\n",
" <th>rmse</th>\n",
" <th>mape</th>\n",
" <th>smape</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Metric improvement (%)</th>\n",
" <td>8.54</td>\n",
" <td>0.31</td>\n",
" <td>0.64</td>\n",
" <td>31.02</td>\n",
" <td>7.36</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mae mse rmse mape smape\n",
"Metric improvement (%) 8.54 0.31 0.64 31.02 7.36"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = {\n",
" 'mae': pct_improv[0].values,\n",
" 'mse': pct_improv[1].values,\n",
" 'rmse': pct_improv[2].values,\n",
" 'mape': pct_improv[3].values,\n",
" 'smape': pct_improv[4].values\n",
"}\n",
"\n",
"metrics_df = pd.DataFrame(data)\n",
"metrics_df.index = ['Metric improvement (%)']\n",
"\n",
"metrics_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the table above, we can see that, in this experiment, fine-tuning with a specific loss function improved its associated error metric compared with the default loss function.\\\n",
"\\\n",
"In this example, using the MAE as the loss function improves the metric by 8.54% when compared to using the default loss function.\\\n",
"\\\n",
"That way, depending on your use case and performance metric, you can use the appropriate loss function to maximize the accuracy of the forecasts."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"id": "6de758ee-a0d2-4b3f-acff-eed419dd17c5",
"metadata": {},
"source": [
"# Validation"
]
},
{
"cell_type": "markdown",
"id": "5d267032-535b-4b7b-b7d3-d2db8f673af6",
"metadata": {},
"source": [
"One of the primary challenges in time series forecasting is the inherent uncertainty and variability over time, which makes it crucial to validate the accuracy and reliability of the models you use. `TimeGPT` supports cross-validation and historical forecasts to help you validate your predictions.\n",
"\n",
"### What You Will Learn\n",
"\n",
"1. **[Cross-Validation](https://docs.nixtla.io/docs/tutorials-cross_validation)**\n",
"\n",
"    - Learn how to perform time series cross-validation across different continuous windows of your data. \n",
"\n",
"2. **[Historical Forecasts](https://docs.nixtla.io/docs/tutorials-historical_forecast)**\n",
"\n",
"    - Generate in-sample forecasts to validate how `TimeGPT` would have performed in the past, providing insights into the model's accuracy. \n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "2784576e",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c496982",
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08bb6d93",
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "876fee25",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"id": "7aada13c-3ac8-4664-92c3-82a96503128a",
"metadata": {},
"source": [
"# Historical forecast"
]
},
{
"cell_type": "markdown",
"id": "9de01fae-a231-4481-a080-f4c1ffe0b0cb",
"metadata": {},
"source": [
"TimeGPT can return historical (in-sample) forecasts alongside the future predictions. To get them, set `add_history=True` in the `forecast` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b095960a-9a04-4ce7-b31f-babf181c1a8f",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/09_historical_forecast.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv()\n",
" colab_badge('docs/tutorials/09_historical_forecast')"
]
},
{
"cell_type": "markdown",
"id": "f318762d",
"metadata": {},
"source": [
"## 1. Import packages\n",
"First, we install and import the required packages and initialize the Nixtla client."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11bdfa31-0b38-4044-9018-ebe1c58f91d6",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from nixtla import NixtlaClient"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5f8b2b2b-4f49-479d-b709-b0db1c61fea8",
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"id": "dce721eb",
"metadata": {},
"source": [
"> 👍 Use an Azure AI endpoint\n",
">\n",
"> To use an Azure AI endpoint, set the `base_url` argument:\n",
">\n",
"> `nixtla_client = NixtlaClient(base_url=\"your azure ai endpoint\", api_key=\"your api_key\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d1a958b-23d9-43c1-882a-3d1348a34b54",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"id": "9315e2a0",
"metadata": {},
"source": [
"## 2. Load data"
]
},
{
"cell_type": "markdown",
"id": "1266c856-7e04-4b0a-a54e-593d9ae5d723",
"metadata": {},
"source": [
"Now you can start to make forecasts! Let's import an example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee0996c2-8b1f-4ec4-9b56-1cd5e4667072",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timestamp</th>\n",
" <th>value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1949-01-01</td>\n",
" <td>112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1949-02-01</td>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1949-03-01</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1949-04-01</td>\n",
" <td>129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1949-05-01</td>\n",
" <td>121</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timestamp value\n",
"0 1949-01-01 112\n",
"1 1949-02-01 118\n",
"2 1949-03-01 132\n",
"3 1949-04-01 129\n",
"4 1949-05-01 121"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49d08019-8908-4a4c-8dd1-a2e4def0ce66",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 2400x350 with 1 Axes>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.plot(df, time_col='timestamp', target_col='value')"
]
},
{
"cell_type": "markdown",
"id": "9dc47b26",
"metadata": {},
"source": [
"## 3. Historical forecast"
]
},
{
"cell_type": "markdown",
"id": "d13dc09e-f606-40d2-94a6-f3b24106e85e",
"metadata": {},
"source": [
"Let's add fitted values. When `add_history` is set to `True`, the output DataFrame includes not only the future forecasts determined by the `h` argument but also the historical predictions. Currently, the historical forecasts are not affected by `h` and have a fixed horizon that depends on the frequency of the data. They are produced in a rolling-window fashion and concatenated: the model is applied sequentially at each time step, using only the information available up to that point."
]
},
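{
"cell_type": "markdown",
"id": "rolling-window-fitted-sketch",
"metadata": {},
"source": [
"The rolling-window idea can be illustrated with a much simpler model. The sketch below is an analogy only, not how TimeGPT computes its historical forecasts: it assumes a moving-average forecaster, and `rolling_fitted_values` is a hypothetical helper. Each fitted value is computed from earlier observations only, so no observation is used to predict itself.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def rolling_fitted_values(series, window):\n",
"    # One-step-ahead fitted values: average of the previous `window` observations.\n",
"    # shift(1) drops the current observation, so no value is used to predict itself.\n",
"    return series.shift(1).rolling(window).mean()\n",
"\n",
"values = pd.Series(\n",
"    [112, 118, 132, 129, 121],\n",
"    index=pd.date_range('1949-01-01', periods=5, freq='MS'),\n",
")\n",
"\n",
"fitted = rolling_fitted_values(values, window=2)\n",
"print(fitted)\n",
"```\n",
"\n",
"The first `window` fitted values are missing because there is not enough history to produce them. This is consistent with the historical forecasts in this tutorial starting in 1951 rather than at the first observation in 1949, although the exact amount of history TimeGPT needs is not stated here."
]
},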
{
"cell_type": "code",
"execution_count": null,
"id": "3768a453-8d69-4070-b4a0-9ba19e431f0b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:nixtla.nixtla_client:Validating inputs...\n",
"INFO:nixtla.nixtla_client:Preprocessing dataframes...\n",
"INFO:nixtla.nixtla_client:Inferred freq: MS\n",
"INFO:nixtla.nixtla_client:Calling Forecast Endpoint...\n",
"INFO:nixtla.nixtla_client:Calling Historical Forecast Endpoint...\n"
]
}
],
"source": [
"timegpt_fcst_with_history_df = nixtla_client.forecast(\n",
" df=df, h=12, time_col='timestamp', target_col='value',\n",
" add_history=True,\n",
")"
]
},
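{
"cell_type": "markdown",
"id": "rolling-window-sketch",
"metadata": {},
"source": [
"The rolling-window idea described above can be illustrated with a small sketch. This is not how `TimeGPT` computes its historical forecasts internally: a naive last-value forecaster stands in for the model, and `min_history` is a hypothetical name for the number of observations needed before the first fitted value. The series holds the first five values shown in `df.head()` above.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# First five monthly values from df.head() in this notebook\n",
"y = pd.Series(\n",
"    [112, 118, 132, 129, 121],\n",
"    index=pd.date_range('1949-01-01', periods=5, freq='MS'),\n",
")\n",
"\n",
"min_history = 2  # hypothetical: observations required before forecasting\n",
"fitted = {}\n",
"for i in range(min_history, len(y)):\n",
"    history = y.iloc[:i]  # only the data available before step i\n",
"    fitted[y.index[i]] = history.iloc[-1]  # naive stand-in: repeat the last value\n",
"\n",
"fitted = pd.Series(fitted, name='fitted')\n",
"print(fitted)\n",
"```\n",
"\n",
"In this sketch the first `min_history` timestamps receive no fitted value, because there is not enough history before them to forecast from."
]
},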
{
"cell_type": "markdown",
"id": "6933cbb0",
"metadata": {},
"source": [
"> 📘 Available models in Azure AI\n",
">\n",
"> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
">\n",
"> `nixtla_client.forecast(..., model=\"azureai\")`\n",
"> \n",
"> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. \n",
"> \n",
"> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6ca1274-69ac-4373-a854-827b2deaa2f7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timestamp</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1951-01-01</td>\n",
" <td>135.483673</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1951-02-01</td>\n",
" <td>144.442398</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1951-03-01</td>\n",
" <td>157.191910</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1951-04-01</td>\n",
" <td>148.769363</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1951-05-01</td>\n",
" <td>140.472946</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timestamp TimeGPT\n",
"0 1951-01-01 135.483673\n",
"1 1951-02-01 144.442398\n",
"2 1951-03-01 157.191910\n",
"3 1951-04-01 148.769363\n",
"4 1951-05-01 140.472946"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"timegpt_fcst_with_history_df.head()"
]
},
{
"cell_type": "markdown",
"id": "c9d43c9b-414b-41cd-a336-1e478d9cff46",
"metadata": {},
"source": [
"Let's plot the results. This consolidated view of past and future predictions can be invaluable for understanding the model's behavior and for evaluating its performance over time."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e1e5281-6c6d-4215-add1-2a7066aa491a",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 2400x350 with 1 Axes>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nixtla_client.plot(df, timegpt_fcst_with_history_df, time_col='timestamp', target_col='value')"
]
},
{
"cell_type": "markdown",
"id": "a7de0a08-d05d-4168-9b4d-523af5067434",
"metadata": {},
"source": [
"Please note, however, that the initial values of the series are not included in these historical forecasts. This is because `TimeGPT` requires a certain number of initial observations to generate reliable forecasts. Therefore, while interpreting the output, it's important to be aware that the first few observations serve as the basis for the model's predictions and are not themselves predicted values."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "markdown",
"id": "6de758ee-a0d2-4b3f-acff-eed419dd17c5",
"metadata": {},
"source": [
"# Uncertainty quantification\n",
"\n",
"In forecasting, it is essential to consider the full distribution of predictions rather than only a point prediction. This approach allows for a better understanding of the uncertainty surrounding the forecast. `TimeGPT` supports uncertainty quantification through quantile forecasts and prediction intervals.\n",
"\n",
"### What You Will Learn\n",
"\n",
"1. **[Quantile Forecasts](https://docs.nixtla.io/docs/tutorials-quantile_forecasts)**\n",
"\n",
" - Learn how to compute specific quantiles of the forecast distribution using `TimeGPT`. \n",
"\n",
"2. **[Prediction Intervals](https://docs.nixtla.io/docs/tutorials-prediction_intervals)**\n",
"\n",
" - Learn how to generate prediction intervals with `TimeGPT`, which give you a range of values that the forecast can take with a given probability. \n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "markdown",
"id": "6de758ee-a0d2-4b3f-acff-eed419dd17c5",
"metadata": {},
"source": [
"# Special topics"
]
},
{
"cell_type": "markdown",
"id": "5d267032-535b-4b7b-b7d3-d2db8f673af6",
"metadata": {},
"source": [
"`TimeGPT` is a robust foundation model for time series forecasting, with advanced capabilities such as hierarchical and bounded forecasts. To fully leverage the power of `TimeGPT`, there are specific situations that require special consideration, such as dealing with irregular timestamps or handling datasets with missing values.\n",
"\n",
"In this section, we will cover these special topics.\n",
"\n",
"### What You Will Learn\n",
"\n",
"1. **[Irregular Timestamps](https://docs.nixtla.io/docs/capabilities-forecast-irregular_timestamps)**\n",
"\n",
" - Learn how to deal with irregular timestamps for correct usage of `TimeGPT`.\n",
"\n",
"2. **[Bounded Forecasts](https://docs.nixtla.io/docs/tutorials-bounded_forecasts)**\n",
"\n",
" - Explore `TimeGPT`'s capability to make forecasts within a specified range, ideal for applications where outcomes are bounded.\n",
"\n",
"3. **[Hierarchical Forecasts](https://docs.nixtla.io/docs/tutorials-hierarchical_forecasting)**\n",
"\n",
" - Understand how to use `TimeGPT` to make coherent predictions at various levels of aggregation.\n",
"\n",
"4. **[Missing Values](https://docs.nixtla.io/docs/tutorials-missing_values)**\n",
"\n",
" - Learn how to address missing values within your time series data effectively using `TimeGPT`.\n",
"\n",
"5. **[Improve Forecast Accuracy](https://docs.nixtla.io/docs/tutorials-improve_forecast_accuracy_with_timegpt)**\n",
"\n",
" - Discover multiple techniques to boost forecast accuracy when working with `TimeGPT`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Computing at scale"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Handling large datasets is a common challenge in time series forecasting. For example, when working with retail data, you may have to forecast sales for thousands of products across hundreds of stores. Similarly, when dealing with electricity consumption data, you may need to predict consumption for thousands of households across various regions.\n",
"\n",
"Nixtla's `TimeGPT` enables you to use several distributed computing frameworks to manage large datasets efficiently. `TimeGPT` currently supports `Spark`, `Dask`, and `Ray` through `Fugue`.\n",
"\n",
"In this notebook, we will explain how to leverage these frameworks using `TimeGPT`. \n",
"\n",
"**Outline:**\n",
"\n",
"1. [Getting Started](#1-getting-started)\n",
"\n",
"2. [Forecasting at Scale](#2-forecasting-at-scale) \n",
"\n",
"3. [Important Considerations](#3-important-considerations) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting started \n",
"\n",
"To use `TimeGPT` with any of the supported distributed computing frameworks, you first need an API Key, just as you would when not using any distributed computing.\n",
"\n",
"Upon [registration](https://dashboard.nixtla.io/), you will receive an email asking you to confirm your signup. After confirming, you will receive access to your dashboard. There, under`API Keys`, you will find your API Key. Next, you need to integrate your API Key into your development workflow with the Nixtla SDK. For guidance on how to do this, please refer to the [Setting Up Your Authentication Key tutorial](https://docs.nixtla.io/docs/getting-started-setting_up_your_api_key)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting at Scale "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `TimeGPT` with any of the supported distributed computing frameworks is straightforward and its usage is almost identical to the non-distributed case. \n",
"\n",
"1. Instantiate a `NixtlaClient` class.\n",
"2. Load your data as a `pandas` DataFrame.\n",
"3. Initialize the distributed computing framework. \n",
" - [Spark](https://docs.nixtla.io/docs/tutorials-spark)\n",
" - [Dask](https://docs.nixtla.io/docs/tutorials-dask)\n",
" - [Ray](https://docs.nixtla.io/docs/tutorials-ray)\n",
"4. Use any of the `NixtlaClient` class methods.\n",
"5. Stop the distributed computing framework, if necessary. \n",
"\n",
"These are the general steps that you will need to follow to use `TimeGPT` with any of the supported distributed computing frameworks. For a detailed explanation and a complete example, please refer to the guide for the specific framework linked above."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"::: {.callout-important}\n",
"Parallelization in these frameworks is done along the various time series within your dataset. Therefore, it is essential that your dataset includes multiple time series, each with a unique id. \n",
":::"
]
},
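{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before distributing the work, it can help to confirm that the data really contains several series. The following is a minimal sketch on a toy long-format DataFrame; the column names follow the `unique_id`, `ds`, `y` layout used in the framework guides, and all values are hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy long-format frame with two series; all values are hypothetical\n",
"df = pd.DataFrame({\n",
"    'unique_id': ['BE', 'BE', 'DE', 'DE'],\n",
"    'ds': pd.to_datetime(['2016-10-22 00:00', '2016-10-22 01:00'] * 2),\n",
"    'y': [70.00, 37.10, 30.00, 28.50],\n",
"})\n",
"\n",
"n_series = df['unique_id'].nunique()  # number of distinct series\n",
"rows_per_series = df.groupby('unique_id').size()  # observations per series\n",
"\n",
"print(n_series)  # 2\n",
"print(rows_per_series)\n",
"```\n",
"\n",
"If `n_series` is 1, there is nothing to parallelize across, and the non-distributed client is likely the simpler choice."
]
},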
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Important Considerations "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### When to Use a Distributed Computing Framework\n",
"\n",
"Consider using a distributed computing framework if your dataset:\n",
"\n",
"- Consists of millions of observations over multiple time series.\n",
"- Is too large to fit into the memory of a single machine.\n",
"- Would be too slow to process on a single machine.\n",
"\n",
"### Choosing the Right Framework\n",
"\n",
"When selecting a distributed computing framework, take into account your existing infrastructure and the skill set of your team. Although `TimeGPT` can be used with any of the supported frameworks with minimal code changes, choosing the right one should align with your specific needs and resources. This will ensure that you leverage the full potential of `TimeGPT` while handling large datasets efficiently."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla fugue[spark]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Spark\n",
"\n",
"> Run TimeGPT distributedly on top of Spark"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Spark](https://spark.apache.org/) is an open-source distributed computing framework designed for large-scale data processing. In this guide, we will explain how to use `TimeGPT` on top of Spark. \n",
"\n",
"**Outline:** \n",
"\n",
"1. [Installation](#installation)\n",
"\n",
"2. [Load Your Data](#load-your-data)\n",
"\n",
"3. [Initialize Spark](#initialize-spark) \n",
"\n",
"4. [Use TimeGPT on Spark](#use-timegpt-on-spark)\n",
"\n",
"5. [Stop Spark](#stop-spark)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/16_computing_at_scale_spark_distributed.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv()\n",
" colab_badge('docs/tutorials/17_computing_at_scale_spark_distributed')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Installation "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install Spark through [Fugue](https://fugue-tutorials.readthedocs.io/). Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Spark. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"::: {.callout-note}\n",
"You can install `fugue` with `pip`:\n",
" \n",
"```shell\n",
"pip install fugue[spark]\n",
"```\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If executing on a distributed `Spark` cluster, ensure that the `nixtla` library is installed across all the workers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Load Data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can load your data as a `pandas` DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 00:00:00</td>\n",
" <td>70.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 01:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 02:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 03:00:00</td>\n",
" <td>44.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 04:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds y\n",
"0 BE 2016-10-22 00:00:00 70.00\n",
"1 BE 2016-10-22 01:00:00 37.10\n",
"2 BE 2016-10-22 02:00:00 37.10\n",
"3 BE 2016-10-22 03:00:00 44.75\n",
"4 BE 2016-10-22 04:00:00 37.10"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\n",
" 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',\n",
" parse_dates=['ds'],\n",
") \n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Initialize Spark "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize `Spark` and convert the pandas DataFrame to a `Spark` DataFrame. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.sql import SparkSession"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spark = SparkSession.builder.getOrCreate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spark_df = spark.createDataFrame(df)\n",
"spark_df.show(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Use TimeGPT on Spark "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `TimeGPT` on top of `Spark` is almost identical to the non-distributed case. The only difference is that you need to use a `Spark` DataFrame. \n",
"\n",
"First, instantiate the `NixtlaClient` class. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nixtla import NixtlaClient"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 👍 Use an Azure AI endpoint\n",
">\n",
"> To use an Azure AI endpoint, set the `base_url` argument:\n",
">\n",
"> `nixtla_client = NixtlaClient(base_url=\"you azure ai endpoint\", api_key=\"your api_key\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then use any method from the `NixtlaClient` class such as [`forecast`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) or [`cross_validation`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-cross-validation)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fcst_df = nixtla_client.forecast(spark_df, h=12)\n",
"fcst_df.show(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Available models in Azure AI\n",
">\n",
"> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
">\n",
"> `nixtla_client.forecast(..., model=\"azureai\")`\n",
"> \n",
"> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. \n",
"> \n",
"> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cv_df = nixtla_client.cross_validation(spark_df, h=12, n_windows=5, step_size=2)\n",
"cv_df.show(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also use exogenous variables with `TimeGPT` on top of `Spark`. To do this, please refer to the [Exogenous Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables) tutorial. Just keep in mind that instead of using a pandas DataFrame, you need to use a `Spark` DataFrame instead."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Stop Spark "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you are done, stop the `Spark` session. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spark.stop()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla fugue[dask]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dask\n",
"\n",
"> Run TimeGPT distributedly on top of Dask"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Dask](https://www.dask.org/get-started) is an open source parallel computing library for Python. In this guide, we will explain how to use `TimeGPT` on top of Dask. \n",
"\n",
"**Outline:** \n",
"\n",
"1. [Installation](#installation)\n",
"\n",
"2. [Load Your Data](#load-your-data)\n",
"\n",
"3. [Import Dask](#import-dask) \n",
"\n",
"4. [Use TimeGPT on Dask](#use-timegpt-on-dask)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/17_computing_at_scale_dask_distributed.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv()\n",
" colab_badge('docs/tutorials/18_computing_at_scale_dask_distributed')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Installation "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install Dask through [Fugue](https://fugue-tutorials.readthedocs.io/). Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Dask. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"::: {.callout-note}\n",
"You can install `fugue` with `pip`:\n",
" \n",
"```shell\n",
"pip install fugue[dask]\n",
"```\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If executing on a distributed `Dask` cluster, ensure that the `nixtla` library is installed across all the workers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Load Data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can load your data as a `pandas` DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 00:00:00</td>\n",
" <td>70.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 01:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 02:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 03:00:00</td>\n",
" <td>44.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 04:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds y\n",
"0 BE 2016-10-22 00:00:00 70.00\n",
"1 BE 2016-10-22 01:00:00 37.10\n",
"2 BE 2016-10-22 02:00:00 37.10\n",
"3 BE 2016-10-22 03:00:00 44.75\n",
"4 BE 2016-10-22 04:00:00 37.10"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\n",
" 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',\n",
" parse_dates=['ds'],\n",
") \n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Import Dask"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import Dask and convert the `pandas` DataFrame to a Dask DataFrame. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import dask.dataframe as dd"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div><strong>Dask DataFrame Structure:</strong></div>\n",
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>y</th>\n",
" </tr>\n",
" <tr>\n",
" <th>npartitions=2</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>string</td>\n",
" <td>string</td>\n",
" <td>float64</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4200</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8399</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
"<div>Dask Name: to_pyarrow_string, 2 graph layers</div>"
],
"text/plain": [
"Dask DataFrame Structure:\n",
" unique_id ds y\n",
"npartitions=2 \n",
"0 string string float64\n",
"4200 ... ... ...\n",
"8399 ... ... ...\n",
"Dask Name: to_pyarrow_string, 2 graph layers"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dask_df = dd.from_pandas(df, npartitions=2)\n",
"dask_df "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Use TimeGPT on Dask "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `TimeGPT` on top of `Dask` is almost identical to the non-distributed case. The only difference is that you need to use a `Dask` DataFrame, which we already defined in the previous step. \n",
"\n",
"First, instantiate the `NixtlaClient` class. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nixtla import NixtlaClient"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 👍 Use an Azure AI endpoint\n",
">\n",
"> To use an Azure AI endpoint, set the `base_url` argument:\n",
">\n",
"> `nixtla_client = NixtlaClient(base_url=\"you azure ai endpoint\", api_key=\"your api_key\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then use any method from the `NixtlaClient` class such as [`forecast`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) or [`cross_validation`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-cross-validation)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 00:00:00</td>\n",
" <td>45.190453</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 01:00:00</td>\n",
" <td>43.244446</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 02:00:00</td>\n",
" <td>41.958389</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 03:00:00</td>\n",
" <td>39.796486</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-12-31 04:00:00</td>\n",
" <td>39.204533</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds TimeGPT\n",
"0 BE 2016-12-31 00:00:00 45.190453\n",
"1 BE 2016-12-31 01:00:00 43.244446\n",
"2 BE 2016-12-31 02:00:00 41.958389\n",
"3 BE 2016-12-31 03:00:00 39.796486\n",
"4 BE 2016-12-31 04:00:00 39.204533"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst_df = nixtla_client.forecast(dask_df, h=12)\n",
"fcst_df.compute().head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Available models in Azure AI\n",
">\n",
"> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
">\n",
"> `nixtla_client.forecast(..., model=\"azureai\")`\n",
"> \n",
"> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. \n",
"> \n",
"> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>cutoff</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-12-30 04:00:00</td>\n",
" <td>2016-12-30 03:00:00</td>\n",
" <td>39.375439</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-12-30 05:00:00</td>\n",
" <td>2016-12-30 03:00:00</td>\n",
" <td>40.039215</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-12-30 06:00:00</td>\n",
" <td>2016-12-30 03:00:00</td>\n",
" <td>43.455849</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-12-30 07:00:00</td>\n",
" <td>2016-12-30 03:00:00</td>\n",
" <td>47.716408</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-12-30 08:00:00</td>\n",
" <td>2016-12-30 03:00:00</td>\n",
" <td>50.31665</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds cutoff TimeGPT\n",
"0 BE 2016-12-30 04:00:00 2016-12-30 03:00:00 39.375439\n",
"1 BE 2016-12-30 05:00:00 2016-12-30 03:00:00 40.039215\n",
"2 BE 2016-12-30 06:00:00 2016-12-30 03:00:00 43.455849\n",
"3 BE 2016-12-30 07:00:00 2016-12-30 03:00:00 47.716408\n",
"4 BE 2016-12-30 08:00:00 2016-12-30 03:00:00 50.31665"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cv_df = nixtla_client.cross_validation(dask_df, h=12, n_windows=5, step_size=2)\n",
"cv_df.compute().head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also use exogenous variables with `TimeGPT` on top of `Dask`. To do this, please refer to the [Exogenous Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables) tutorial. Just keep in mind that instead of using a pandas DataFrame, you need to use a `Dask` DataFrame instead."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"!pip install -Uqq nixtla fugue[ray]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"from nixtla.utils import in_colab"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"IN_COLAB = in_colab()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"if not IN_COLAB:\n",
" from nixtla.utils import colab_badge\n",
" from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ray \n",
"\n",
"> Run TimeGPT distributedly on top of Ray"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Ray](https://www.ray.io/) is an open source unified compute framework to scale Python workloads. In this guide, we will explain how to use `TimeGPT` on top of Ray. \n",
"\n",
"**Outline:** \n",
"\n",
"1. [Installation](#installation)\n",
"\n",
"2. [Load Your Data](#load-your-data)\n",
"\n",
"3. [Initialize Ray](#initialize-ray) \n",
"\n",
"4. [Use TimeGPT on Ray](#use-timegpt-on-ray)\n",
"\n",
"5. [Shutdown Ray](#shutdown-ray)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/19_computing_at_scale_ray_distributed.ipynb)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| echo: false\n",
"if not IN_COLAB:\n",
" load_dotenv()\n",
" colab_badge('docs/tutorials/19_computing_at_scale_ray_distributed')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Installation "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install Ray through [Fugue](https://fugue-tutorials.readthedocs.io/). Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Ray. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"::: {.callout-note}\n",
"You can install `fugue` with `pip`:\n",
" \n",
"```shell\n",
"pip install fugue[ray]\n",
"```\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If executing on a distributed `Ray` cluster, ensure that the `nixtla` library is installed across all the workers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Load Data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can load your data as a `pandas` DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 00:00:00</td>\n",
" <td>70.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 01:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 02:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 03:00:00</td>\n",
" <td>44.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>BE</td>\n",
" <td>2016-10-22 04:00:00</td>\n",
" <td>37.10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds y\n",
"0 BE 2016-10-22 00:00:00 70.00\n",
"1 BE 2016-10-22 01:00:00 37.10\n",
"2 BE 2016-10-22 02:00:00 37.10\n",
"3 BE 2016-10-22 03:00:00 44.75\n",
"4 BE 2016-10-22 04:00:00 37.10"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\n",
" 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',\n",
" parse_dates=['ds'],\n",
") \n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Initialize Ray"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize `Ray` and convert the pandas DataFrame to a `Ray` DataFrame. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import ray\n",
"from ray.cluster_utils import Cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-05-10 11:09:17,240\tWARNING cluster_utils.py:157 -- Ray cluster mode is currently experimental and untested on Windows. If you are using it and running into issues please file a report at https://github.com/ray-project/ray/issues.\n",
"2024-05-10 11:09:19,076\tINFO worker.py:1564 -- Connecting to existing Ray cluster at address: 127.0.0.1:63694...\n",
"2024-05-10 11:09:19,092\tINFO worker.py:1740 -- Connected to Ray cluster. View the dashboard at \u001b[1m\u001b[32m127.0.0.1:8265 \u001b[39m\u001b[22m\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1205bee3f172455d9d3f706c2b0ae98b",
"version_major": 2,
"version_minor": 0
},
"text/html": [
"<div class=\"lm-Widget p-Widget lm-Panel p-Panel jp-Cell-outputWrapper\">\n",
" <div style=\"margin-left: 50px;display: flex;flex-direction: row;align-items: center\">\n",
" <div class=\"jp-RenderedHTMLCommon\" style=\"display: flex; flex-direction: row;\">\n",
" <svg viewBox=\"0 0 567 224\" fill=\"none\" xmlns=\"http://www.w3.org/2000/svg\" style=\"height: 3em;\">\n",
" <g clip-path=\"url(#clip0_4338_178347)\">\n",
" <path d=\"M341.29 165.561H355.29L330.13 129.051C345.63 123.991 354.21 112.051 354.21 94.2307C354.21 71.3707 338.72 58.1807 311.88 58.1807H271V165.561H283.27V131.661H311.8C314.25 131.661 316.71 131.501 319.01 131.351L341.25 165.561H341.29ZM283.29 119.851V70.0007H311.82C331.3 70.0007 342.34 78.2907 342.34 94.5507C342.34 111.271 331.34 119.861 311.82 119.861L283.29 119.851ZM451.4 138.411L463.4 165.561H476.74L428.74 58.1807H416L367.83 165.561H380.83L392.83 138.411H451.4ZM446.19 126.601H398L422 72.1407L446.24 126.601H446.19ZM526.11 128.741L566.91 58.1807H554.35L519.99 114.181L485.17 58.1807H472.44L514.01 129.181V165.541H526.13V128.741H526.11Z\" fill=\"var(--jp-ui-font-color0)\"/>\n",
" <path d=\"M82.35 104.44C84.0187 97.8827 87.8248 92.0678 93.1671 87.9146C98.5094 83.7614 105.083 81.5067 111.85 81.5067C118.617 81.5067 125.191 83.7614 130.533 87.9146C135.875 92.0678 139.681 97.8827 141.35 104.44H163.75C164.476 101.562 165.622 98.8057 167.15 96.2605L127.45 56.5605C121.071 60.3522 113.526 61.6823 106.235 60.3005C98.9443 58.9187 92.4094 54.9203 87.8602 49.0574C83.3109 43.1946 81.0609 35.8714 81.5332 28.4656C82.0056 21.0599 85.1679 14.0819 90.4252 8.8446C95.6824 3.60726 102.672 0.471508 110.08 0.0272655C117.487 -0.416977 124.802 1.86091 130.647 6.4324C136.493 11.0039 140.467 17.5539 141.821 24.8501C143.175 32.1463 141.816 39.6859 138 46.0505L177.69 85.7505C182.31 82.9877 187.58 81.4995 192.962 81.4375C198.345 81.3755 203.648 82.742 208.33 85.3976C213.012 88.0532 216.907 91.9029 219.616 96.5544C222.326 101.206 223.753 106.492 223.753 111.875C223.753 117.258 222.326 122.545 219.616 127.197C216.907 131.848 213.012 135.698 208.33 138.353C203.648 141.009 198.345 142.375 192.962 142.313C187.58 142.251 182.31 140.763 177.69 138L138 177.7C141.808 184.071 143.155 191.614 141.79 198.91C140.424 206.205 136.44 212.75 130.585 217.313C124.731 221.875 117.412 224.141 110.004 223.683C102.596 223.226 95.6103 220.077 90.3621 214.828C85.1139 209.58 81.9647 202.595 81.5072 195.187C81.0497 187.779 83.3154 180.459 87.878 174.605C92.4405 168.751 98.9853 164.766 106.281 163.401C113.576 162.035 121.119 163.383 127.49 167.19L167.19 127.49C165.664 124.941 164.518 122.182 163.79 119.3H141.39C139.721 125.858 135.915 131.673 130.573 135.826C125.231 139.98 118.657 142.234 111.89 142.234C105.123 142.234 98.5494 139.98 93.2071 135.826C87.8648 131.673 84.0587 125.858 82.39 119.3H60C58.1878 126.495 53.8086 132.78 47.6863 136.971C41.5641 141.163 34.1211 142.972 26.7579 142.059C19.3947 141.146 12.6191 137.574 7.70605 132.014C2.79302 126.454 0.0813599 119.29 0.0813599 111.87C0.0813599 104.451 2.79302 97.2871 7.70605 91.7272C12.6191 86.1673 19.3947 82.5947 26.7579 81.6817C34.1211 80.7686 
41.5641 82.5781 47.6863 86.7696C53.8086 90.9611 58.1878 97.2456 60 104.44H82.35ZM100.86 204.32C103.407 206.868 106.759 208.453 110.345 208.806C113.93 209.159 117.527 208.258 120.522 206.256C123.517 204.254 125.725 201.276 126.771 197.828C127.816 194.38 127.633 190.677 126.253 187.349C124.874 184.021 122.383 181.274 119.205 179.577C116.027 177.88 112.359 177.337 108.826 178.042C105.293 178.746 102.113 180.654 99.8291 183.44C97.5451 186.226 96.2979 189.718 96.3 193.32C96.2985 195.364 96.7006 197.388 97.4831 199.275C98.2656 201.163 99.4132 202.877 100.86 204.32ZM204.32 122.88C206.868 120.333 208.453 116.981 208.806 113.396C209.159 109.811 208.258 106.214 206.256 103.219C204.254 100.223 201.275 98.0151 197.827 96.97C194.38 95.9249 190.676 96.1077 187.348 97.4873C184.02 98.8669 181.274 101.358 179.577 104.536C177.879 107.714 177.337 111.382 178.041 114.915C178.746 118.448 180.653 121.627 183.439 123.911C186.226 126.195 189.717 127.443 193.32 127.44C195.364 127.443 197.388 127.042 199.275 126.259C201.163 125.476 202.878 124.328 204.32 122.88ZM122.88 19.4205C120.333 16.8729 116.981 15.2876 113.395 14.9347C109.81 14.5817 106.213 15.483 103.218 17.4849C100.223 19.4868 98.0146 22.4654 96.9696 25.9131C95.9245 29.3608 96.1073 33.0642 97.4869 36.3922C98.8665 39.7202 101.358 42.4668 104.535 44.1639C107.713 45.861 111.381 46.4036 114.914 45.6992C118.447 44.9949 121.627 43.0871 123.911 40.301C126.195 37.515 127.442 34.0231 127.44 30.4205C127.44 28.3772 127.038 26.3539 126.255 24.4664C125.473 22.5788 124.326 20.8642 122.88 19.4205ZM19.42 100.86C16.8725 103.408 15.2872 106.76 14.9342 110.345C14.5813 113.93 15.4826 117.527 17.4844 120.522C19.4863 123.518 22.4649 125.726 25.9127 126.771C29.3604 127.816 33.0638 127.633 36.3918 126.254C39.7198 124.874 42.4664 122.383 44.1635 119.205C45.8606 116.027 46.4032 112.359 45.6988 108.826C44.9944 105.293 43.0866 102.114 40.3006 99.8296C37.5145 97.5455 34.0227 96.2983 30.42 96.3005C26.2938 96.3018 22.337 97.9421 19.42 100.86ZM100.86 
100.86C98.3125 103.408 96.7272 106.76 96.3742 110.345C96.0213 113.93 96.9226 117.527 98.9244 120.522C100.926 123.518 103.905 125.726 107.353 126.771C110.8 127.816 114.504 127.633 117.832 126.254C121.16 124.874 123.906 122.383 125.604 119.205C127.301 116.027 127.843 112.359 127.139 108.826C126.434 105.293 124.527 102.114 121.741 99.8296C118.955 97.5455 115.463 96.2983 111.86 96.3005C109.817 96.299 107.793 96.701 105.905 97.4835C104.018 98.2661 102.303 99.4136 100.86 100.86Z\" fill=\"#00AEEF\"/>\n",
" </g>\n",
" <defs>\n",
" <clipPath id=\"clip0_4338_178347\">\n",
" <rect width=\"566.93\" height=\"223.75\" fill=\"white\"/>\n",
" </clipPath>\n",
" </defs>\n",
" </svg>\n",
"</div>\n",
"\n",
" <table class=\"jp-RenderedHTMLCommon\" style=\"border-collapse: collapse;color: var(--jp-ui-font-color1);font-size: var(--jp-ui-font-size1);\">\n",
" <tr>\n",
" <td style=\"text-align: left\"><b>Python version:</b></td>\n",
" <td style=\"text-align: left\"><b>3.10.14</b></td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left\"><b>Ray version:</b></td>\n",
" <td style=\"text-align: left\"><b>2.20.0</b></td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left\"><b>Dashboard:</b></td>\n",
" <td style=\"text-align: left\"><b><a href=\"http://127.0.0.1:8265\" target=\"_blank\">http://127.0.0.1:8265</a></b></td>\n",
"</tr>\n",
"\n",
"</table>\n",
"\n",
" </div>\n",
"</div>\n"
],
"text/plain": [
"RayContext(dashboard_url='127.0.0.1:8265', python_version='3.10.14', ray_version='2.20.0', ray_commit='5708e75978413e46c703e44f43fd89769f3c148b')"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#| output: false\n",
"ray_cluster = Cluster(\n",
" initialize_head=True,\n",
" head_node_args={\"num_cpus\": 2}\n",
")\n",
"ray.init(address=ray_cluster.address, ignore_reinit_error=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "43bbf566465c45368606a5fb057bf534",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"MaterializedDataset(\n",
" num_blocks=1,\n",
" num_rows=8400,\n",
" schema={unique_id: object, ds: object, y: float64}\n",
")"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ray_df = ray.data.from_pandas(df)\n",
"ray_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Use TimeGPT on Ray"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `TimeGPT` on top of `Ray` is almost identical to the non-distributed case. The only difference is that you need to use a `Ray` DataFrame. \n",
"\n",
"First, instantiate the `NixtlaClient` class. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nixtla import NixtlaClient"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nixtla_client = NixtlaClient(\n",
" # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n",
" api_key = 'my_api_key_provided_by_nixtla'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 👍 Use an Azure AI endpoint\n",
">\n",
"> To use an Azure AI endpoint, set the `base_url` argument:\n",
">\n",
"> `nixtla_client = NixtlaClient(base_url=\"you azure ai endpoint\", api_key=\"your api_key\")`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide \n",
"if not IN_COLAB:\n",
" nixtla_client = NixtlaClient()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then use any method from the `NixtlaClient` class such as [`forecast`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) or [`cross_validation`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-cross-validation)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"fcst_df = nixtla_client.forecast(ray_df, h=12)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 📘 Available models in Azure AI\n",
">\n",
"> If you are using an Azure AI endpoint, please be sure to set `model=\"azureai\"`:\n",
">\n",
"> `nixtla_client.forecast(..., model=\"azureai\")`\n",
"> \n",
"> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. \n",
"> \n",
"> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To visualize the result, use the `to_pandas` method to convert the output of `Ray` to a `pandas` DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>NP</td>\n",
" <td>2018-12-24 07:00:00</td>\n",
" <td>55.387066</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>NP</td>\n",
" <td>2018-12-24 08:00:00</td>\n",
" <td>56.115517</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>NP</td>\n",
" <td>2018-12-24 09:00:00</td>\n",
" <td>56.090714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>58</th>\n",
" <td>NP</td>\n",
" <td>2018-12-24 10:00:00</td>\n",
" <td>55.813717</td>\n",
" </tr>\n",
" <tr>\n",
" <th>59</th>\n",
" <td>NP</td>\n",
" <td>2018-12-24 11:00:00</td>\n",
" <td>55.528519</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds TimeGPT\n",
"55 NP 2018-12-24 07:00:00 55.387066\n",
"56 NP 2018-12-24 08:00:00 56.115517\n",
"57 NP 2018-12-24 09:00:00 56.090714\n",
"58 NP 2018-12-24 10:00:00 55.813717\n",
"59 NP 2018-12-24 11:00:00 55.528519"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst_df.to_pandas().tail()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"cv_df = nixtla_client.cross_validation(ray_df, h=12, freq='H', n_windows=5, step_size=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique_id</th>\n",
" <th>ds</th>\n",
" <th>cutoff</th>\n",
" <th>TimeGPT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>295</th>\n",
" <td>NP</td>\n",
" <td>2018-12-23 19:00:00</td>\n",
" <td>2018-12-23 11:00:00</td>\n",
" <td>53.632019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>296</th>\n",
" <td>NP</td>\n",
" <td>2018-12-23 20:00:00</td>\n",
" <td>2018-12-23 11:00:00</td>\n",
" <td>52.512775</td>\n",
" </tr>\n",
" <tr>\n",
" <th>297</th>\n",
" <td>NP</td>\n",
" <td>2018-12-23 21:00:00</td>\n",
" <td>2018-12-23 11:00:00</td>\n",
" <td>51.894035</td>\n",
" </tr>\n",
" <tr>\n",
" <th>298</th>\n",
" <td>NP</td>\n",
" <td>2018-12-23 22:00:00</td>\n",
" <td>2018-12-23 11:00:00</td>\n",
" <td>51.06572</td>\n",
" </tr>\n",
" <tr>\n",
" <th>299</th>\n",
" <td>NP</td>\n",
" <td>2018-12-23 23:00:00</td>\n",
" <td>2018-12-23 11:00:00</td>\n",
" <td>50.32592</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" unique_id ds cutoff TimeGPT\n",
"295 NP 2018-12-23 19:00:00 2018-12-23 11:00:00 53.632019\n",
"296 NP 2018-12-23 20:00:00 2018-12-23 11:00:00 52.512775\n",
"297 NP 2018-12-23 21:00:00 2018-12-23 11:00:00 51.894035\n",
"298 NP 2018-12-23 22:00:00 2018-12-23 11:00:00 51.06572\n",
"299 NP 2018-12-23 23:00:00 2018-12-23 11:00:00 50.32592"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cv_df.to_pandas().tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also use exogenous variables with `TimeGPT` on top of `Ray`. To do this, please refer to the [Exogenous Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables) tutorial. Just keep in mind that instead of using a pandas DataFrame, you need to use a `Ray` DataFrame instead."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Shutdown Ray"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you are done, shutdown the `Ray` session. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ray.shutdown()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}