---
title: "Time Series Forecasting with Ray"
description: "Scale machine learning pipelines with Ray and TimeGPT for distributed time series forecasting. Learn to integrate TimeGPT with Ray for complex ML workflows in Python."
icon: "server"
---

## Overview

[Ray](https://www.ray.io/) is an open-source unified compute framework that helps scale Python workloads for distributed computing. This guide demonstrates how to distribute TimeGPT forecasting jobs on top of Ray.

Ray is a good fit for machine learning pipelines with complex task dependencies and for datasets with 10+ million observations. Its unified framework orchestrates distributed ML workflows, which makes it well suited for integrating TimeGPT into broader AI applications.

## Why Use Ray for Time Series Forecasting?

Ray offers unique advantages for ML-focused time series forecasting:

- **ML pipeline integration**: Seamlessly integrate TimeGPT into complex ML workflows with Ray Tune and Ray Serve
- **Task parallelism**: Handle complex task dependencies beyond data parallelism
- **Python-native**: Pure Python with minimal boilerplate code
- **Flexible architecture**: Scale from laptop to cluster with the same code
- **Actor model**: Stateful computations for advanced forecasting scenarios

Choose Ray when you're building ML pipelines, need complex task orchestration, or want to integrate TimeGPT with other ML frameworks like PyTorch or TensorFlow.
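
As a minimal illustration of that task model (independent of TimeGPT), Ray turns plain Python functions into parallel tasks with a single decorator:

```python
import ray

ray.init(ignore_reinit_error=True)

@ray.remote
def square(x: int) -> int:
    # Runs as an independent Ray task, potentially on another worker.
    return x * x

# Launch four tasks in parallel, then block until all results arrive.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

ray.shutdown()
```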

**What you'll learn:**

- Install Fugue with Ray support for distributed computing
- Initialize Ray clusters for distributed forecasting
- Run TimeGPT forecasting and cross-validation on Ray

## Prerequisites

Before proceeding, make sure you have an [API key from Nixtla](/setup/setting_up_your_api_key).

When executing on a distributed Ray cluster, ensure the `nixtla` library is installed on all workers.
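
If you don't manage worker images directly, one way to satisfy this (assuming your cluster permits runtime environments) is Ray's `runtime_env` option, which installs pip packages on each worker:

```python
import ray

# Install nixtla on every worker at startup via a runtime environment.
# Alternatively, bake the dependency into the cluster's container image.
ray.init(runtime_env={"pip": ["nixtla"]})
```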

## How to Use TimeGPT with Ray

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/19_computing_at_scale_ray_distributed.ipynb)

### Step 1: Install Fugue and Ray

Fugue provides an easy-to-use interface for distributed computation across frameworks like Ray.

Install Fugue with Ray support:

```bash
pip install "fugue[ray]"
```

### Step 2: Load Your Data

Load your dataset into a pandas DataFrame. This tutorial uses hourly electricity prices from various markets:

```python
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
)
df.head()
```

Example pandas DataFrame:

|       | unique_id   | ds                    | y       |
| ----- | ----------- | --------------------- | ------- |
| 0     | BE          | 2016-10-22 00:00:00   | 70.00   |
| 1     | BE          | 2016-10-22 01:00:00   | 37.10   |
| 2     | BE          | 2016-10-22 02:00:00   | 37.10   |
| 3     | BE          | 2016-10-22 03:00:00   | 44.75   |
| 4     | BE          | 2016-10-22 04:00:00   | 37.10   |
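
Before distributing the job, a quick sanity check on the series can be useful (a sketch using the columns shown above):

```python
# Count the series and inspect each one's date range.
print(df['unique_id'].nunique(), 'series')
print(df.groupby('unique_id')['ds'].agg(['min', 'max']))
```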

### Step 3: Initialize Ray

Create a Ray cluster locally by initializing a head node. You can scale this to multiple machines in a real cluster environment.

```python
import ray
from ray.cluster_utils import Cluster

ray_cluster = Cluster(
    initialize_head=True,
    head_node_args={"num_cpus": 2}
)

ray.init(address=ray_cluster.address, ignore_reinit_error=True)

# Convert the pandas DataFrame to a Ray Dataset:
ray_df = ray.data.from_pandas(df)
ray_df
```
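
On a real multi-node cluster, you would typically connect to an existing head node instead of building a local test cluster. A sketch using the Ray Client (the address is a placeholder for your head node):

```python
import ray

# Connect to a running cluster through the Ray Client (placeholder address).
ray.init(address='ray://<head-node-host>:10001', ignore_reinit_error=True)
```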

### Step 4: Use TimeGPT on Ray

To use TimeGPT with Ray, provide a Ray Dataset to Nixtla's client methods instead of a pandas DataFrame. The API remains the same as local usage.

Instantiate the `NixtlaClient` class to interact with Nixtla's API:

```python
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
```

You can use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`.

<Tabs>
  <Tab title="Forecast Example">
    ```python
    fcst_df = nixtla_client.forecast(ray_df, h=12)
    fcst_df.to_pandas().tail()
    ```

    The public API supports two models: `timegpt-1` (default) and `timegpt-1-long-horizon`. For long-horizon forecasting, see the [long-horizon model tutorial](/forecasting/model-version/longhorizon_model).
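
    Switching models only requires the `model` argument; for example (the horizon value here is illustrative):

    ```python
    fcst_long_df = nixtla_client.forecast(ray_df, h=36, model='timegpt-1-long-horizon')
    fcst_long_df.to_pandas().tail()
    ```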
  </Tab>
  <Tab title="Cross-validation Example">
    ```python
    cv_df = nixtla_client.cross_validation(
        ray_df,
        h=12,
        freq='H',
        n_windows=5,
        step_size=2
    )
    cv_df.to_pandas().tail()
    ```
  </Tab>
</Tabs>
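
To gauge accuracy, you can compare the cross-validation predictions against the actuals (a sketch assuming the client's standard output columns `y` and `TimeGPT`):

```python
cv_pdf = cv_df.to_pandas()

# Mean absolute error across all series, windows, and horizon steps.
mae = (cv_pdf['y'] - cv_pdf['TimeGPT']).abs().mean()
print(f'MAE: {mae:.2f}')
```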

### Step 5: Shutdown Ray

Always shut down Ray after you finish your tasks to free up resources:

```python
ray.shutdown()
```

## Working with Exogenous Variables

TimeGPT with Ray also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/forecasting/exogenous-variables/numeric_features) for details; simply substitute Ray Datasets for pandas DataFrames, and the API remains identical.
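
A minimal sketch, assuming two hypothetical frames: `df_exog` with `unique_id`, `ds`, `y`, and exogenous columns, and `future_exog` with the same exogenous columns over the forecast horizon (without `y`):

```python
# Hypothetical inputs: df_exog holds historical data with exogenous columns;
# future_exog holds the exogenous values covering the forecast horizon.
train_ray = ray.data.from_pandas(df_exog)
future_ray = ray.data.from_pandas(future_exog)

fcst_df = nixtla_client.forecast(df=train_ray, X_df=future_ray, h=24)
```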

## Related Resources

Explore more distributed forecasting options:
- [Distributed Computing Overview](/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray
- [Spark Integration](/forecasting/forecasting-at-scale/spark) - For datasets with 100M+ observations
- [Dask Integration](/forecasting/forecasting-at-scale/dask) - For datasets with 10M-100M observations
- [Fine-tuning TimeGPT](/forecasting/fine-tuning/steps) - Improve accuracy at scale
- [Cross-Validation](/forecasting/evaluation/cross_validation) - Validate distributed forecasts