---
title: "Time Series Forecasting with Dask"
description: "Scale pandas workflows with Dask and TimeGPT for distributed time series forecasting. Learn to process 10M+ observations in Python with minimal code changes."
icon: "server"
---

## Overview

[Dask](https://www.dask.org/) is an open-source parallel computing library for Python that scales pandas workflows seamlessly. This guide explains how to use TimeGPT from Nixtla with Dask for distributed forecasting tasks.

Dask is ideal when you're already using pandas and need to scale beyond single-machine memory limits—typically for datasets with 10-100 million observations across multiple time series. Unlike Spark, Dask requires minimal code changes from your existing pandas workflow.

## Why Use Dask for Time Series Forecasting?

Dask offers several advantages for scaling time series forecasting:

- **Pandas-like API**: Minimal code changes from your existing pandas workflows
- **Easy scaling**: Convert pandas DataFrames to Dask with a single line of code
- **Python-native**: Pure Python implementation, no JVM required (unlike Spark)
- **Flexible deployment**: Run on your laptop or scale to a cluster
- **Memory efficiency**: Process datasets larger than RAM through intelligent chunking

Choose Dask when you need to scale from 10 million to 100 million observations and want the smoothest transition from pandas.

**What you'll learn:**

- Simplify distributed computing with Fugue
- Run TimeGPT at scale on a Dask cluster
- Seamlessly convert pandas DataFrames to Dask

## Prerequisites

Before proceeding, make sure you have an [API key from Nixtla](/setup/setting_up_your_api_key).

## How to Use TimeGPT with Dask

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/17_computing_at_scale_dask_distributed.ipynb)

### Step 1: Install Fugue and Dask

Fugue provides an easy-to-use interface for distributed computing over frameworks like Dask. You can install Fugue with Dask support using:

```bash
pip install "fugue[dask]"
```

If running on a distributed Dask cluster, ensure the `nixtla` library is installed on all worker nodes.

### Step 2: Load Your Data

Start by loading your data into a pandas DataFrame. This example uses hourly electricity prices from multiple markets:

```python
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
)
df.head()
```

Example pandas DataFrame:

|   | unique_id | ds                  | y     |
|---|-----------|---------------------|-------|
| 0 | BE        | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE        | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE        | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE        | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE        | 2016-10-22 04:00:00 | 37.10 |

### Step 3: Import Dask

Convert the pandas DataFrame into a Dask DataFrame for parallel processing:

```python
import dask.dataframe as dd

dask_df = dd.from_pandas(df, npartitions=2)
dask_df
```

When converting to a Dask DataFrame, you can choose the number of partitions based on your data size and system resources.
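If you are unsure whether a partition count fits your data, you can inspect how many rows each partition holds and repartition accordingly. A minimal sketch, continuing from the `dask_df` created above (the target of 4 partitions is an illustrative assumption, not a recommendation):

```python
# Number of rows in each partition; the result is small and cheap to compute
print(dask_df.map_partitions(len).compute())

# Repartition if the split is too coarse or too fine for your memory budget.
# npartitions=4 is purely illustrative; tune it to your data and cluster.
dask_df = dask_df.repartition(npartitions=4)
print(dask_df.npartitions)  # 4
```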
### Step 4: Use TimeGPT on Dask

To use TimeGPT with Dask, provide a Dask DataFrame to Nixtla's client methods instead of a pandas DataFrame.

First, instantiate the `NixtlaClient` class to interact with Nixtla's API:

```python
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
```

You can then use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`.

```python Forecast with TimeGPT and Dask
fcst_df = nixtla_client.forecast(dask_df, h=12)
fcst_df.compute().head()
```

|   | unique_id | ds                  | TimeGPT   |
|---|-----------|---------------------|-----------|
| 0 | BE        | 2016-12-31 00:00:00 | 45.190453 |
| 1 | BE        | 2016-12-31 01:00:00 | 43.244446 |
| 2 | BE        | 2016-12-31 02:00:00 | 41.958389 |
| 3 | BE        | 2016-12-31 03:00:00 | 39.796486 |
| 4 | BE        | 2016-12-31 04:00:00 | 39.204533 |

```python Cross-validation with TimeGPT and Dask
cv_df = nixtla_client.cross_validation(
    dask_df,
    h=12,
    n_windows=5,
    step_size=2
)
cv_df.compute().head()
```

|   | unique_id | ds                  | cutoff              | TimeGPT   |
|---|-----------|---------------------|---------------------|-----------|
| 0 | BE        | 2016-12-30 04:00:00 | 2016-12-30 03:00:00 | 39.375439 |
| 1 | BE        | 2016-12-30 05:00:00 | 2016-12-30 03:00:00 | 40.039215 |
| 2 | BE        | 2016-12-30 06:00:00 | 2016-12-30 03:00:00 | 43.455849 |
| 3 | BE        | 2016-12-30 07:00:00 | 2016-12-30 03:00:00 | 47.716408 |
| 4 | BE        | 2016-12-30 08:00:00 | 2016-12-30 03:00:00 | 50.316650 |

## Working with Exogenous Variables

TimeGPT with Dask also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/forecasting/exogenous-variables/numeric_features) for details. Simply substitute pandas DataFrames with Dask DataFrames—the API remains identical; see the sketch at the end of this page.

## Related Resources

Explore more distributed forecasting options:

- [Distributed Computing Overview](/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray
- [Spark Integration](/forecasting/forecasting-at-scale/spark) - For datasets with 100M+ observations
- [Ray Integration](/forecasting/forecasting-at-scale/ray) - For ML pipeline integration
- [Fine-tuning TimeGPT](/forecasting/fine-tuning/steps) - Improve accuracy at scale
- [Cross-Validation](/forecasting/evaluation/cross_validation) - Validate distributed forecasts
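As referenced in the exogenous variables section above, here is a minimal sketch of that workflow with Dask. The `temperature` column and its synthetic values are hypothetical placeholders, and the pattern of passing future exogenous values through the `X_df` argument of `forecast` follows the exogenous variables tutorial; treat this as an illustration rather than a drop-in recipe.

```python
import pandas as pd
import dask.dataframe as dd
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')

# Historical data with a hypothetical exogenous column `temperature`
hist = pd.DataFrame({
    'unique_id': ['BE'] * 96,
    'ds': pd.date_range('2016-10-22', periods=96, freq='h'),
    'y': [50.0 + i % 24 for i in range(96)],
    'temperature': [10.0 + (i % 24) / 2 for i in range(96)],
})

# Future values of the exogenous variable, one row per forecast step
future_x = pd.DataFrame({
    'unique_id': ['BE'] * 12,
    'ds': pd.date_range('2016-10-26', periods=12, freq='h'),
    'temperature': [12.0] * 12,
})

# Same call shape as before, only the inputs are Dask DataFrames
fcst_df = nixtla_client.forecast(
    df=dd.from_pandas(hist, npartitions=1),
    X_df=dd.from_pandas(future_x, npartitions=1),
    h=12,
)
fcst_df.compute().head()
```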