---
title: "Forecasting Web Traffic"
description: "Learn how to predict website traffic patterns using TimeGPT."
icon: "globe"
---

<Info>
  **Goal:** Forecast the next 7 days of daily visits to the website [cienciadedatos.net](https://cienciadedatos.net) using TimeGPT.
</Info>

This tutorial is adapted from *"Forecasting web traffic with machine learning and Python"* by Joaquín Amat Rodrigo and Javier Escobar Ortiz. You will learn how to:

<CardGroup cols={3}>
  <Card title="Improve Accuracy">

    Obtain forecasts nearly 10% more accurate than the original method.
  </Card>
  <Card title="Reduce Complexity">

    Use significantly fewer lines of code and simpler workflows.
  </Card>
  <Card title="Save Time">

    Generate forecasts in substantially less computation time.
  </Card>
</CardGroup>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/1_forecasting_web_traffic.ipynb)


<Steps>

<Step title="1. Import Packages and Initialize Client">

To start, import the required packages and initialize the Nixtla client with your API key.

```python Nixtla Client Initialization
import pandas as pd
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'  # Defaults to os.environ.get("NIXTLA_API_KEY")
)
```

<Check>
  **Use an Azure AI endpoint**
  <br />
  If you are using an Azure AI endpoint, also set the `base_url` argument:

  ```python Azure AI Endpoint Setup
  nixtla_client = NixtlaClient(
    base_url="your_azure_ai_endpoint",
    api_key="your_api_key"
  )
  ```
</Check>

</Step>

<Step title="2. Load Data">

We load the website visit data directly from a CSV file, then add an identifier column named `unique_id` whose value is `daily_visits` for every row.

```python Load and Format Data
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/'
    'master/data/visitas_por_dia_web_cienciadedatos.csv'
)
df = pd.read_csv(url, sep=',', parse_dates=[0], date_format='%d/%m/%y')
df['unique_id'] = 'daily_visits'

df.head(10)
```

<AccordionGroup>
  <Accordion title="Data Preview (first 10 rows)">

|     | date       | users | unique_id    |
| --- | ---------- | ----- | ------------ |
| 0   | 2020-07-01 | 2324  | daily_visits |
| 1   | 2020-07-02 | 2201  | daily_visits |
| 2   | 2020-07-03 | 2146  | daily_visits |
| 3   | 2020-07-04 | 1666  | daily_visits |
| 4   | 2020-07-05 | 1433  | daily_visits |
| 5   | 2020-07-06 | 2195  | daily_visits |
| 6   | 2020-07-07 | 2240  | daily_visits |
| 7   | 2020-07-08 | 2295  | daily_visits |
| 8   | 2020-07-09 | 2279  | daily_visits |
| 9   | 2020-07-10 | 2155  | daily_visits |

  </Accordion>
</AccordionGroup>

<Info>
  **Note:** No further preprocessing is required before we start forecasting.
</Info>
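
If you want to confirm that claim for your own data, a quick sanity check can help. The sketch below is illustrative only: `check_daily_series` is a hypothetical helper (not part of `nixtla`), and the sample frame is a small stand-in built from the first rows of the preview above rather than the full CSV.

```python
import pandas as pd

# Stand-in for the first rows of the dataset (values copied from the preview).
sample = pd.DataFrame({
    'date': pd.date_range('2020-07-01', periods=5, freq='D'),
    'users': [2324, 2201, 2146, 1666, 1433],
})
sample['unique_id'] = 'daily_visits'


def check_daily_series(frame, time_col='date', target_col='users'):
    """Hypothetical helper: basic checks for a single daily series."""
    # Every calendar day between the first and last observation.
    expected = pd.date_range(frame[time_col].min(), frame[time_col].max(), freq='D')
    observed = pd.DatetimeIndex(frame[time_col])
    return {
        'is_datetime': pd.api.types.is_datetime64_any_dtype(frame[time_col]),
        'missing_dates': len(expected.difference(observed)),
        'duplicate_dates': int(frame[time_col].duplicated().sum()),
        'missing_targets': int(frame[target_col].isna().sum()),
    }


checks = check_daily_series(sample)
print(checks)
```

Running the same helper on the full `df` should report zero missing dates, duplicates, and missing targets if the series is ready to use as-is.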

</Step>

<Step title="3. Cross-Validation with TimeGPT">

We will set up a rolling-window cross-validation using TimeGPT. This helps us evaluate forecast accuracy across multiple historical windows.

```python Cross-validation Setup
timegpt_cv_df = nixtla_client.cross_validation(
    df,
    h=7,
    n_windows=8,
    time_col='date',
    target_col='users',
    freq='D',
    level=[80, 90, 99.5]
)

timegpt_cv_df.head()
```
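
To make the windowing concrete, the sketch below computes the cutoff dates that `h=7` and `n_windows=8` imply. It assumes the step between windows equals `h`, so the windows do not overlap, and it uses a made-up last date instead of the dataset's real one. `rolling_cutoffs` is a hypothetical helper, not part of the `nixtla` API.

```python
import pandas as pd


def rolling_cutoffs(last_date, h, n_windows, step_size=None):
    """Hypothetical helper: cutoff date of each backtest window, oldest first."""
    # Assumption: when no step is given, consecutive windows are h days apart.
    step = h if step_size is None else step_size
    last_date = pd.Timestamp(last_date)
    # The final window ends on last_date, so its cutoff is h days earlier;
    # each earlier window is shifted back by one more step.
    offsets = [h + step * i for i in range(n_windows - 1, -1, -1)]
    return [last_date - pd.Timedelta(days=offset) for offset in offsets]


# '2020-12-31' is an illustrative last observation, not the real one.
cutoffs = rolling_cutoffs('2020-12-31', h=7, n_windows=8)
for cutoff in cutoffs:
    print(cutoff.date(), '-> forecast the following 7 days')
```

Under these assumptions the eight windows together hold out the last 56 days of the series, and each forecast uses only data up to its cutoff.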

<Frame caption="Cross-validation forecast plot">
  ![CV Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/1_forecasting_web_traffic_files/figure-markdown_strict/cell-12-output-1.png)
</Frame>


The results align closely with those from the original tutorial on [forecasting web traffic with machine learning](https://cienciadedatos.net/documentos/py37-forecasting-web-traffic-machine-learning.html).


Next, we compute the Mean Absolute Error (MAE) to quantify forecast accuracy:

```python Calculate MAE
from utilsforecast.losses import mae

mae_timegpt = mae(
    df=timegpt_cv_df.drop(columns=['cutoff']),
    models=['TimeGPT'],
    target_col='users'
)

mae_timegpt
```

<Info>
  **MAE Result:** The MAE obtained is `167.69`, outperforming the original pipeline.
</Info>
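
For readers new to the metric, MAE is the average of the absolute differences between actual and predicted values. The sketch below computes it by hand with pandas on a toy frame. The numbers are invented for illustration and are not taken from the cross-validation output.

```python
import pandas as pd

# Toy actuals and predictions; the column names mirror the CV output.
toy_cv = pd.DataFrame({
    'users':   [2000, 2100, 1900, 1500],
    'TimeGPT': [1950, 2200, 1900, 1650],
})

# Absolute error per row: 50, 100, 0, 150.
abs_errors = (toy_cv['users'] - toy_cv['TimeGPT']).abs()

# MAE is the mean of those absolute errors.
mae_by_hand = abs_errors.mean()
print(mae_by_hand)  # 75.0
```

Because MAE is expressed in the units of the target, the value above reads as "off by 75 users per day on average".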

</Step>

<Step title="4. Adding Exogenous Variables (Weekday Indicators)">

Exogenous variables can provide additional context that may improve forecast accuracy. In this example, we add binary indicators for each day of the week.

```python Add Weekday Indicators
for i in range(7):
    df[f'week_day_{i + 1}'] = 1 * (df['date'].dt.weekday == i)

df.head(10)
```
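
As a quick check of the encoding, the sketch below applies the same loop to a one-week toy frame and confirms that exactly one indicator is set per row. Note that `dt.weekday` numbers Monday as 0, so `week_day_1` marks Mondays and `week_day_7` marks Sundays. The toy frame is illustrative only.

```python
import pandas as pd

# One week starting on 2020-07-01, which was a Wednesday.
toy = pd.DataFrame({'date': pd.date_range('2020-07-01', periods=7, freq='D')})

for i in range(7):
    toy[f'week_day_{i + 1}'] = 1 * (toy['date'].dt.weekday == i)

indicator_cols = [f'week_day_{i + 1}' for i in range(7)]

# Each row should have exactly one indicator equal to 1.
one_hot_ok = bool((toy[indicator_cols].sum(axis=1) == 1).all())

# Name of the active indicator for each date.
active = toy[indicator_cols].idxmax(axis=1)
print(one_hot_ok)
print(active.tolist())
```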

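We repeat the cross-validation with the updated `df`. The call itself is unchanged; the difference is that `df` now carries the `week_day_*` columns, which serve as the exogenous features: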

```python Cross-validation with Exogenous Variables
timegpt_cv_df_with_ex = nixtla_client.cross_validation(
    df,
    h=7,
    n_windows=8,
    time_col='date',
    target_col='users',
    freq='D',
    level=[80, 90, 99.5]
)
```

<Frame caption="Forecast with Exogenous Variables">
  ![Exogenous CV Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/1_forecasting_web_traffic_files/figure-markdown_strict/cell-17-output-1.png)
</Frame>

<Info>
  Adding weekday indicators can capture weekly seasonality in user visits.
</Info>

</Step>

<Step title="5. Comparing Results">

<Card title="Results">

| **Model** | **Exogenous features** | **MAE Backtest** |
| --------- | ---------------------- | ---------------- |
| TimeGPT   | No                     | 167.6917         |
| TimeGPT   | Yes                    | 167.2286         |

</Card>

We see a slight improvement in MAE by including the weekday indicators. This illustrates how TimeGPT can incorporate additional signals without complex data processing or extensive model tuning.
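
To put "slight" in numbers, the short calculation below takes the two MAE values from the results table and expresses the difference in absolute and relative terms. The variable names are illustrative.

```python
# MAE values copied from the results table.
mae_without_exog = 167.6917
mae_with_exog = 167.2286

# Absolute gain in users per day, and the same gain as a percentage.
absolute_gain = mae_without_exog - mae_with_exog
relative_gain_pct = 100 * absolute_gain / mae_without_exog

print(f'{absolute_gain:.4f} users/day ({relative_gain_pct:.2f}%)')
```

The gain is roughly 0.46 users per day, or under 0.3%, so the weekday indicators help only marginally on this dataset.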

</Step>

<Step title="6. Final Thoughts">

<Check>
  **Key Takeaways**

  - TimeGPT simplifies forecasting workflows by reducing code and tuning overhead.
  - Feature engineering, such as adding weekday indicators, can yield a small further gain in accuracy (MAE 167.69 to 167.23 in this example).
  - Cross-validation provides a robust way to evaluate model performance.
</Check>

With minimal effort, TimeGPT produced forecasts that are more accurate than those of the original pipeline. It also avoids most of the complex steps required when building custom models, such as extensive feature engineering, validation, model comparisons, and hyperparameter tuning.

**Good luck and happy forecasting!**

</Step>

</Steps>