--- title: "Time Series Forecasting with Ray" description: "Scale machine learning pipelines with Ray and TimeGPT for distributed time series forecasting. Learn to integrate TimeGPT with Ray for complex ML workflows in Python." icon: "server" --- ## Overview [Ray](https://www.ray.io/) is an open-source unified compute framework that helps scale Python workloads for distributed computing. This guide demonstrates how to distribute TimeGPT forecasting jobs on top of Ray. Ray is ideal for machine learning pipelines with complex task dependencies and datasets with 10+ million observations. Its unified framework excels at orchestrating distributed ML workflows, making it perfect for integrating TimeGPT into broader AI applications. ## Why Use Ray for Time Series Forecasting? Ray offers unique advantages for ML-focused time series forecasting: - **ML pipeline integration**: Seamlessly integrate TimeGPT into complex ML workflows with Ray Tune and Ray Serve - **Task parallelism**: Handle complex task dependencies beyond data parallelism - **Python-native**: Pure Python with minimal boilerplate code - **Flexible architecture**: Scale from laptop to cluster with the same code - **Actor model**: Stateful computations for advanced forecasting scenarios Choose Ray when you're building ML pipelines, need complex task orchestration, or want to integrate TimeGPT with other ML frameworks like PyTorch or TensorFlow. **What you'll learn:** - Install Fugue with Ray support for distributed computing - Initialize Ray clusters for distributed forecasting - Run TimeGPT forecasting and cross-validation on Ray ## Prerequisites Before proceeding, make sure you have an [API key from Nixtla](/setup/setting_up_your_api_key). When executing on a distributed Ray cluster, ensure the `nixtla` library is installed on all workers. 
## How to Use TimeGPT with Ray

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/19_computing_at_scale_ray_distributed.ipynb)

### Step 1: Install Fugue and Ray

Fugue provides an easy-to-use interface for distributed computation across frameworks like Ray. Install Fugue with Ray support (quote the extra so your shell does not expand the brackets):

```bash
pip install "fugue[ray]"
```

### Step 2: Load Your Data

Load your dataset into a pandas DataFrame. This tutorial uses hourly electricity prices from various markets:

```python
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
)
df.head()
```

Example pandas DataFrame:

|   | unique_id | ds                  | y     |
| - | --------- | ------------------- | ----- |
| 0 | BE        | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE        | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE        | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE        | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE        | 2016-10-22 04:00:00 | 37.10 |

### Step 3: Initialize Ray

Create a Ray cluster locally by initializing a head node. In a real cluster environment, the same code scales to multiple machines.

```python
import ray
from ray.cluster_utils import Cluster

ray_cluster = Cluster(
    initialize_head=True,
    head_node_args={"num_cpus": 2}
)
ray.init(address=ray_cluster.address, ignore_reinit_error=True)

# Convert your DataFrame to a Ray Dataset:
ray_df = ray.data.from_pandas(df)
ray_df
```

### Step 4: Use TimeGPT on Ray

To use TimeGPT with Ray, pass a Ray Dataset to Nixtla's client methods instead of a pandas DataFrame. The API is the same as for local usage.

Instantiate the `NixtlaClient` class to interact with Nixtla's API:

```python
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
```

You can use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`.
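Whichever method you call, the data must carry the `unique_id`, `ds`, and `y` columns shown in Step 2. A cheap local check before shipping data to the cluster can fail fast on schema mistakes; `check_schema` below is a hypothetical helper, not part of the Nixtla client:

```python
import pandas as pd

def check_schema(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise if required columns are missing."""
    missing = {"unique_id", "ds", "y"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

# Mimic the electricity data's layout with a tiny in-memory frame.
toy = pd.DataFrame({
    "unique_id": ["BE", "BE"],
    "ds": pd.to_datetime(["2016-10-22 00:00", "2016-10-22 01:00"]),
    "y": [70.0, 37.1],
})
check_schema(toy)  # passes silently
```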
```python
fcst_df = nixtla_client.forecast(ray_df, h=12)
fcst_df.to_pandas().tail()
```

Supported public API models include `timegpt-1` (the default) and `timegpt-1-long-horizon`. For long-horizon forecasting, see the [long-horizon model tutorial](/forecasting/model-version/longhorizon_model).

```python
cv_df = nixtla_client.cross_validation(
    ray_df,
    h=12,
    freq='H',
    n_windows=5,
    step_size=2
)
cv_df.to_pandas().tail()
```

### Step 5: Shut Down Ray

Always shut down Ray after you finish your tasks to free up resources:

```python
ray.shutdown()
```

## Working with Exogenous Variables

TimeGPT with Ray also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/forecasting/exogenous-variables/numeric_features) for details. Simply substitute pandas DataFrames with Ray Datasets; the API remains identical.

## Related Resources

Explore more distributed forecasting options:

- [Distributed Computing Overview](/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray
- [Spark Integration](/forecasting/forecasting-at-scale/spark) - For datasets with 100M+ observations
- [Dask Integration](/forecasting/forecasting-at-scale/dask) - For datasets with 10M-100M observations
- [Fine-tuning TimeGPT](/forecasting/fine-tuning/steps) - Improve accuracy at scale
- [Cross-Validation](/forecasting/evaluation/cross_validation) - Validate distributed forecasts
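As a closing aside on the cross-validation settings used in Step 4: the sketch below (plain Python, no Ray or API access needed) shows how `h=12`, `n_windows=5`, and `step_size=2` would position validation cutoffs on an hourly series under the common rolling-origin layout. This is an illustration of the window geometry, not a claim about `cross_validation`'s internals, which may differ in detail.

```python
import pandas as pd

h, n_windows, step_size = 12, 5, 2

# Illustrative last observation of an hourly series.
last_ts = pd.Timestamp("2016-12-31 23:00")

# Each window's cutoff is its last training timestamp. Windows are
# spaced `step_size` hours apart, and the final window's 12-step
# horizon ends exactly at the series' last observation.
cutoffs = [
    last_ts - pd.Timedelta(hours=h + i * step_size)
    for i in reversed(range(n_windows))
]
for c in cutoffs:
    print(c)
```

Here the earliest cutoff is 2016-12-31 03:00 and the latest is 2016-12-31 11:00, whose 12-hour forecast window ends at 23:00.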