---
title: "Data Requirements"
description: "Overview of the data format and requirements for TimeGPT forecasting."
icon: "table"
---
TimeGPT accepts **pandas** and **polars** dataframes in [long format](https://www.theanalysisfactor.com/wide-and-long-data/). The minimum required columns are:
- **unique_id** (string or numeric): Label that identifies each series.
- **ds** (timestamp): String or datetime in `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS` format.
- **y** (numeric): Numerical target variable to forecast.
If a DataFrame lacks the `ds` column but uses a **DatetimeIndex**, that is also supported.
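For instance, the following sketch builds a small hypothetical monthly series whose **DatetimeIndex** stands in for the `ds` column (the series name and values are illustrative):

```python DatetimeIndex Example
import pandas as pd

# A hypothetical monthly series indexed by datetime instead of a `ds` column
idx = pd.date_range("2020-01-01", periods=3, freq="MS")
df = pd.DataFrame({"unique_id": "series1", "y": [10.0, 12.5, 11.8]}, index=idx)

# The DatetimeIndex plays the role of `ds`
print(isinstance(df.index, pd.DatetimeIndex))  # True
```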
TimeGPT also supports distributed dataframe libraries such as **dask**, **spark**, and **ray**.
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/5_data_requirements.ipynb)
You can include additional exogenous features in the same DataFrame. See the [Exogenous Variables tutorial](/forecasting/exogenous-variables/numeric_features) for details.
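As a sketch, an exogenous feature is just an extra numeric column alongside `y` in the same long-format frame (the `temperature` column and its values below are hypothetical):

```python Exogenous Column Example
import pandas as pd

# Long-format frame with one illustrative exogenous column (`temperature`)
df = pd.DataFrame({
    "unique_id": "series1",
    "ds": pd.date_range("2020-01-01", periods=3, freq="D"),
    "y": [100.0, 110.0, 105.0],
    "temperature": [21.5, 23.0, 19.8],  # hypothetical exogenous feature
})
print(df.columns.tolist())  # ['unique_id', 'ds', 'y', 'temperature']
```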
---
## Example DataFrame
Below is a sample of a valid input DataFrame for TimeGPT (with columns named `timestamp` and `value` instead of `ds` and `y`):
```python Sample Data Loading
import pandas as pd

# Load the Air Passengers dataset (columns: `timestamp` and `value`)
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')

# Add the required series identifier
df["unique_id"] = "series1"
df.head()
```
**Sample Data Preview**
| **unique_id** | **timestamp** | **value** |
| ----- | ------------ | ------- |
| series1 | 1949-01-01 | 112 |
| series1 | 1949-02-01 | 118 |
| series1 | 1949-03-01 | 132 |
| series1 | 1949-04-01 | 129 |
| series1 | 1949-05-01 | 121 |
In this example:
- `unique_id` identifies the series.
- `timestamp` corresponds to `ds`.
- `value` corresponds to `y`.
---
## Matching Columns to TimeGPT
You can align your DataFrame columns with TimeGPT’s expected structure in either of two ways.
The first is to rename `timestamp` to `ds` and `value` to `y`:
```python Rename Columns Example
df = df.rename(columns={'timestamp': 'ds', 'value': 'y'})
```
Now your DataFrame has the explicitly required columns:
```python Show Head of DataFrame
print(df.head())
```
Alternatively, specify the column names directly when calling the `forecast` method of `NixtlaClient`:
```python NixtlaClient Forecast Example
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
fcst = nixtla_client.forecast(
df=df,
h=12,
time_col='timestamp',
target_col='value'
)
fcst.head()
```
This way, you don’t need to rename your DataFrame columns, as TimeGPT will know which ones to treat as `ds` and `y`.
---
## Example Forecast
When you run the forecast method:
```python Forecast Example
fcst = nixtla_client.forecast(
df=df,
h=12,
time_col='timestamp',
target_col='value'
)
fcst.head()
```
```bash Forecast Logs
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
```
| unique_id | timestamp | TimeGPT |
| ----- | ------------ | ----------- |
| series1 | 1961-01-01 | 437.83792 |
| series1 | 1961-02-01 | 426.06270 |
| series1 | 1961-03-01 | 463.11655 |
| series1 | 1961-04-01 | 478.24450 |
| series1 | 1961-05-01 | 505.64648 |
TimeGPT attempts to automatically infer your data’s frequency (`freq`). You can override this by specifying the **freq** parameter (e.g., `freq='MS'`).
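The logs above show `Inferred freq: MS` for this monthly series; the same inference can be reproduced locally with pandas, which is a quick way to check what frequency your timestamps imply before forecasting:

```python Frequency Inference Check
import pandas as pd

# Reproduce the frequency inference shown in the logs (`Inferred freq: MS`)
timestamps = pd.to_datetime(["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01"])
print(pd.infer_freq(timestamps))  # MS (month start)
```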
For more information, see the [TimeGPT Quickstart](/forecasting/timegpt_quickstart).
---
## Important Considerations
**Warning:** Data passed to TimeGPT must not contain missing values or time gaps.
To handle missing data, see [Dealing with Missing Values in TimeGPT](/data_requirements/missing_values).
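A quick pre-flight check for time gaps is to compare each series against a complete date range. The sketch below uses a hypothetical monthly series with one missing month to illustrate:

```python Gap Detection Sketch
import pandas as pd

# Hypothetical monthly series with one missing month (1949-03-01)
df = pd.DataFrame({
    "ds": pd.to_datetime(["1949-01-01", "1949-02-01", "1949-04-01"]),
    "y": [112, 118, 129],
})

# Any dates in the full range but absent from the series are gaps
full_range = pd.date_range(df["ds"].min(), df["ds"].max(), freq="MS")
gaps = full_range.difference(df["ds"])
print(list(gaps))  # [Timestamp('1949-03-01 00:00:00')]
```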
---
### Minimum Data Requirements (Azure AI)
These are the minimum data sizes required for each frequency when using Azure AI:
| Frequency | Minimum Size |
| ---------------------------------- | -------------- |
| Hourly and subhourly (e.g., "H") | 1008 |
| Daily ("D") | 300 |
| Weekly (e.g., "W-MON") | 64 |
| Monthly and others | 48 |
When preparing your data, also consider:
- The number of future periods you want to predict.
- How many times you want to test the model's performance.
- The periodic offset between validation windows during cross-validation.
This ensures you have enough data for both training and evaluation.
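The minimums in the table above can be encoded as a simple pre-flight check. The mapping and helper below are a sketch that mirrors the table, not an official API:

```python Minimum Size Check
# Minimum observations per frequency on Azure AI, per the table above
MIN_SIZE = {"H": 1008, "D": 300, "W": 64, "M": 48}

def meets_minimum(n_obs: int, freq: str) -> bool:
    """Return True if a series of length n_obs meets the Azure AI minimum.

    Frequencies not listed fall under "Monthly and others" (48).
    """
    return n_obs >= MIN_SIZE.get(freq[0].upper(), 48)

print(meets_minimum(350, "D"))  # True
print(meets_minimum(500, "H"))  # False (hourly requires 1008)
```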