# Update PyTorch version on vLLM OSS CI/CD

vLLM's current policy is to always use the latest PyTorch stable
release in CI/CD. It is standard practice to submit a PR to update the
PyTorch version as early as possible when a new [PyTorch stable
release](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-cadence) becomes available.
This process is non-trivial due to the gap between PyTorch
releases. Using <gh-pr:16859> as an example, this document outlines common steps to achieve this
update along with a list of potential issues and how to address them.

## Test PyTorch release candidates (RCs)

Updating PyTorch in vLLM after the official release is not
ideal because any issues discovered at that point can only be resolved
by waiting for the next release or by implementing hacky workarounds in vLLM.
The better solution is to test vLLM with PyTorch release candidates (RCs) to ensure
compatibility before each release.

PyTorch release candidates can be downloaded from the [PyTorch test index](https://download.pytorch.org/whl/test).
For example, the `torch2.7.0+cu128` RC can be installed using the following command:

```bash
uv pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/test/cu128
```

When the final RC is ready for testing, it will be announced to the community
on the [PyTorch dev-discuss forum](https://dev-discuss.pytorch.org/c/release-announcements).
After this announcement, we can begin testing vLLM integration by drafting a pull request
following this 3-step process:

1. Update [requirements files](https://github.com/vllm-project/vllm/tree/main/requirements)
to point to the new releases for `torch`, `torchvision`, and `torchaudio`.

2. Use the following option to get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`, and `rocm6.2.4`.

    ```bash
    --extra-index-url https://download.pytorch.org/whl/test/<PLATFORM>
    ```

3. Since vLLM uses `uv`, ensure the following index strategy is applied:

    - Via environment variable:

    ```bash
    export UV_INDEX_STRATEGY=unsafe-best-match
    ```

    - Or via CLI flag:

    ```bash
    --index-strategy unsafe-best-match
    ```
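
As a rough sketch, steps 2 and 3 can also be expressed as environment variables so that every later `uv pip install` in the same shell picks up the RC wheels. The helper name `use_torch_rc_index` is hypothetical, and the sketch assumes your `uv` version honors the `UV_EXTRA_INDEX_URL` and `UV_INDEX_STRATEGY` environment variables; check the `uv` documentation before relying on it.

```bash
# Hypothetical helper (not part of vLLM): point uv at the PyTorch RC index
# for one platform tag, e.g. cpu, cu128, or rocm6.2.4.
use_torch_rc_index() {
  local platform="${1:?usage: use_torch_rc_index <platform>}"
  # Step 2: extra index that hosts the final release candidates.
  export UV_EXTRA_INDEX_URL="https://download.pytorch.org/whl/test/${platform}"
  # Step 3: let uv compare candidate versions across all indexes.
  export UV_INDEX_STRATEGY=unsafe-best-match
}

use_torch_rc_index cu128
echo "$UV_EXTRA_INDEX_URL"   # prints https://download.pytorch.org/whl/test/cu128
# Then install as usual, e.g. (requirements file name is an assumption):
# uv pip install -r requirements/cuda.txt
```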

If failures are found in the pull request, raise them as issues on vLLM and
cc the PyTorch release team to initiate discussion on how to address them.

## Update CUDA version

The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to PyPI packaging limitations, only the latest stable CUDA version (for example, torch `2.7.1+cu126`) is uploaded to PyPI. However, vLLM may require a different CUDA version,
such as 12.8 for Blackwell support.
This complicates the process because we cannot use the out-of-the-box
`pip install torch torchvision torchaudio` command. The solution is to use
`--extra-index-url` in vLLM's Dockerfiles.
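
A quick sanity check after installing is to confirm that the resolved wheel is the CUDA variant you asked for rather than the PyPI default. The helper below is a hypothetical sketch, not part of vLLM; it only inspects the local version suffix of a `torch.__version__` string such as `2.7.0+cu128`.

```bash
# Hypothetical helper (not part of vLLM): succeed only if a torch version
# string carries the expected local tag, e.g. "2.7.0+cu128" with "cu128".
torch_has_local_tag() {
  local version="$1" expected="$2"
  case "$version" in
    *+"$expected") return 0 ;;
    *) echo "expected a +$expected build, got $version" >&2; return 1 ;;
  esac
}

# Usage inside an image or venv that already has torch installed:
# torch_has_local_tag "$(python -c 'import torch; print(torch.__version__)')" cu128
```

A PyPI default wheel such as `2.7.1+cu126` fails this check when `cu128` is expected, which is exactly the mix-up the extra index is meant to avoid.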

- Important indexes at the moment include:

| Platform | `--extra-index-url` |
|----------|-----------------|
| CUDA 12.8| [https://download.pytorch.org/whl/cu128](https://download.pytorch.org/whl/cu128)|
| CPU      | [https://download.pytorch.org/whl/cpu](https://download.pytorch.org/whl/cpu)|
| ROCm 6.2 | [https://download.pytorch.org/whl/rocm6.2.4](https://download.pytorch.org/whl/rocm6.2.4) |
| ROCm 6.3 | [https://download.pytorch.org/whl/rocm6.3](https://download.pytorch.org/whl/rocm6.3) |
| XPU      | [https://download.pytorch.org/whl/xpu](https://download.pytorch.org/whl/xpu) |

- Update the files below to match the new CUDA version. This ensures that the release vLLM wheel is tested on CI.
    - `.buildkite/release-pipeline.yaml`
    - `.buildkite/scripts/upload-wheels.sh`
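
The index table above can also be captured in a small lookup so that a typo in the platform name is caught early in a build script. The function name `torch_index_url` is hypothetical; the URLs are the ones listed in the table.

```bash
# Hypothetical lookup (not part of vLLM) mirroring the index table above.
torch_index_url() {
  case "$1" in
    cu128)     echo "https://download.pytorch.org/whl/cu128" ;;
    cpu)       echo "https://download.pytorch.org/whl/cpu" ;;
    rocm6.2.4) echo "https://download.pytorch.org/whl/rocm6.2.4" ;;
    rocm6.3)   echo "https://download.pytorch.org/whl/rocm6.3" ;;
    xpu)       echo "https://download.pytorch.org/whl/xpu" ;;
    *)         echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Example (commented out because it downloads wheels):
# uv pip install --system torch torchvision torchaudio \
#     --extra-index-url "$(torch_index_url cu128)"
```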

## Address long vLLM build time

When building vLLM with a new PyTorch/CUDA version, no cache will exist
in the vLLM sccache S3 bucket, so the build job on CI can take more than 5 hours
and time out. Additionally, since vLLM's fastcheck pipeline runs in read-only mode,
it doesn't populate the cache, so re-running it to warm up the cache
is ineffective.

While ongoing efforts like <gh-issue:17419>
address the long build time at its source, the current workaround is to set `VLLM_CI_BRANCH`
to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
when manually triggering a build on Buildkite. This branch accomplishes two things:

1. Increase the timeout limit to 10 hours so that the build doesn't time out.
2. Allow the compiled artifacts to be written to the vLLM sccache S3 bucket
to warm it up so that future builds are faster.

<p align="center" width="100%">
    <img width="60%" src="https://github.com/user-attachments/assets/a8ff0fcd-76e0-4e91-b72f-014e3fdb6b94">
</p>
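
For reference, a manual build with the custom branch can also be requested through the Buildkite REST API instead of the web UI. This is a sketch under assumptions: the helper name `buildkite_build_payload` is hypothetical, `<org>`, `<pipeline>`, and `BUILDKITE_API_TOKEN` are placeholders you must fill in, and the `commit`, `branch`, `message`, and `env` fields follow Buildkite's "Create a build" endpoint.

```bash
# Hypothetical helper (not part of vLLM): JSON body for Buildkite's
# "Create a build" endpoint with VLLM_CI_BRANCH set to the custom branch.
# Assumes branch names contain no characters that need JSON escaping.
buildkite_build_payload() {
  local branch="${1:?usage: buildkite_build_payload <branch> [commit]}"
  local commit="${2:-HEAD}"
  printf '{"commit":"%s","branch":"%s","message":"%s","env":{"VLLM_CI_BRANCH":"%s"}}\n' \
    "$commit" "$branch" "PyTorch upgrade build" "khluu/use_postmerge_q"
}

# Placeholders: replace <org>, <pipeline>, and the PR branch name.
# curl -X POST "https://api.buildkite.com/v2/organizations/<org>/pipelines/<pipeline>/builds" \
#   -H "Authorization: Bearer ${BUILDKITE_API_TOKEN}" \
#   -H "Content-Type: application/json" \
#   -d "$(buildkite_build_payload my-torch-upgrade-branch)"
```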

## Update dependencies

Several vLLM dependencies, such as FlashInfer, also depend on PyTorch and need
to be updated accordingly. Rather than waiting for all of them to publish new
releases (which would take too much time), they can be built from
source to unblock the update process.

### FlashInfer

Here is how to build and install it from source with `torch2.7.0+cu128` in the vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):

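```bash
# GPU architectures to compile kernels for; "+PTX" also embeds PTX for newer GPUs
export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX'
# Also build FlashInfer's SM90 (Hopper) kernels
export FLASHINFER_ENABLE_SM90=1
# Build against the already-installed torch instead of an isolated build environment
uv pip install --system \
    --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1"
```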

One caveat is that building FlashInfer from source adds approximately 30
minutes to the vLLM build time. Therefore, it's preferable to cache the wheel in a
public location for immediate installation, such as [this FlashInfer wheel link](https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl). For future releases, contact the PyTorch release
team if you want to get the package published there.
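
When reusing a cached wheel such as the one above, it is worth checking that its local version tag matches the PyTorch/CUDA pair you are upgrading to, since a FlashInfer wheel built against an older torch may fail to import. The helper below is hypothetical and only parses the wheel file name; it assumes the `+cu<CUDA>torch<version>` tag convention used by that link.

```bash
# Hypothetical helper (not part of vLLM): print the local version tag of a
# wheel URL, e.g. "cu128torch2.7" for the FlashInfer wheel linked above.
wheel_local_tag() {
  local name
  # Keep only the file name and decode the URL-encoded "+" (%2B).
  name="$(basename "$1" | sed 's/%2B/+/g')"
  case "$name" in
    *+*) ;;
    *) echo "no local version tag in $1" >&2; return 1 ;;
  esac
  # The local tag sits between "+" and the next "-" (the Python tag).
  name="${name#*+}"
  echo "${name%%-*}"
}

url="https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl"
wheel_local_tag "$url"   # prints cu128torch2.7
```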

### xFormers

Similar to FlashInfer, here is how to build and install xFormers from source:

```bash
# GPU architectures to compile kernels for; "+PTX" also embeds PTX for newer GPUs
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0+PTX'
# MAX_JOBS caps parallel compile jobs to keep build memory use in check
MAX_JOBS=16 uv pip install --system \
    --no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
```

## Update all the different vLLM platforms

Rather than attempting to update all vLLM platforms in a single pull request, it's more manageable
to handle some platforms separately. The separation of requirements and Dockerfiles
for different platforms in vLLM CI/CD allows us to selectively choose
which platforms to update. For instance, updating XPU requires the corresponding
release from [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) by Intel.
While <gh-pr:16859> updated vLLM to PyTorch 2.7.0 on CPU, CUDA, and ROCm,
<gh-pr:17444> completed the update for XPU.