<!-- markdownlint-disable MD041 -->
--8<-- [start:installation]

vLLM has experimental support for the s390x architecture on the IBM Z platform. For now, users must build from source to run vLLM natively on IBM Z.

Currently, the CPU implementation for the s390x architecture supports FP32, BF16 and FP16.

--8<-- [end:installation]
--8<-- [start:requirements]

- OS: `Linux`
- SDK: `gcc/g++ >= 14.0.0` with Command Line Tools
- Instruction Set Architecture (ISA): VXE support is required. Works with Z14 and above.
- Python packages required for the build: `torchvision`, `llvmlite`, `numba`, `pyarrow` (for testing), `opencv-headless`

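The requirements above can be checked before starting a build. The following is a minimal sketch, assuming that the kernel reports the vector-enhancements facility as `vxe` on the `features` line of `/proc/cpuinfo` and that `gcc` is already on `PATH`. The helper names `has_vxe` and `version_ge` are illustrative and not part of vLLM.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight checks for the requirements listed above.
# has_vxe and version_ge are hypothetical helper names, not vLLM tooling.

# Return 0 if the given cpuinfo file (default: /proc/cpuinfo) lists "vxe"
# as a whole word on its "features" line.
has_vxe() {
    grep -E '^features' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw 'vxe'
}

# Return 0 if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if has_vxe; then echo "VXE: supported"; else echo "VXE: not found"; fi

if command -v gcc >/dev/null 2>&1; then
    gcc_version="$(gcc -dumpfullversion)"
    if version_ge "$gcc_version" "14.0.0"; then
        echo "gcc $gcc_version: OK"
    else
        echo "gcc $gcc_version: too old, need >= 14.0.0"
    fi
else
    echo "gcc: not found"
fi
```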
--8<-- [end:requirements]
--8<-- [start:set-up-using-python]

--8<-- [end:set-up-using-python]
--8<-- [start:pre-built-wheels]

Currently, there are no pre-built IBM Z CPU wheels.

--8<-- [end:pre-built-wheels]
--8<-- [start:build-wheel-from-source]

Install the following packages from the package manager before building vLLM. For example, on RHEL 9.6:

```bash
dnf install -y \
    which procps findutils tar vim git gcc-toolset-14 gcc-toolset-14-binutils gcc-toolset-14-libatomic-devel zlib-devel \
    libjpeg-turbo-devel libtiff-devel libpng-devel libwebp-devel freetype-devel harfbuzz-devel \
    openssl-devel openblas openblas-devel autoconf automake libtool cmake numpy libsndfile \
    clang llvm-devel llvm-static clang-devel
```
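On RHEL, the `gcc-toolset-14` packages install the compiler under `/opt/rh/gcc-toolset-14` instead of replacing the system `gcc`, so the toolset typically has to be enabled in the shell that runs the build. A minimal sketch, assuming the default install location; `enable_toolset` is an illustrative helper name, not an existing command:

```bash
#!/usr/bin/env bash
# Sketch: put gcc-toolset-14 on PATH for the current shell.
# The default path below is an assumption; adjust it if your layout differs.

enable_toolset() {
    local enable_script="${1:-/opt/rh/gcc-toolset-14/enable}"
    if [ -f "$enable_script" ]; then
        # Source the script so PATH and related variables change in this shell.
        . "$enable_script"
        echo "enabled: $enable_script"
    else
        echo "not found: $enable_script" >&2
        return 1
    fi
}

enable_toolset || echo "gcc-toolset-14 is not installed in the default location"

# Show which compiler the build would pick up, if any.
if command -v gcc >/dev/null 2>&1; then
    gcc --version | head -n 1
fi
```

Sourcing the script affects only the current shell session, so it has to be repeated in each new shell that runs the build.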

Install Rust >= 1.80, which is needed to install the `outlines-core` and `uvloop` Python packages.

```bash
curl https://sh.rustup.rs -sSf | sh -s -- -y && \
    . "$HOME/.cargo/env"
```

Execute the following commands to build and install vLLM from source.

!!! tip
    Before building vLLM, build the following dependencies from source: `torchvision`, `llvmlite`, `numba`, `llguidance`, `pyarrow`, and `opencv-headless`.

```bash
# Install the build and runtime dependencies, then build and install the wheel.
uv pip install -v \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --torch-backend auto \
    -r requirements/build/cpu.txt \
    -r requirements/cpu.txt && \
VLLM_TARGET_DEVICE=cpu python setup.py bdist_wheel && \
    uv pip install dist/*.whl
```
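Before installing, it can be useful to confirm that the wheel in `dist/` was built for s390x. The sketch below reads the platform tag from the wheel file name, which by the wheel naming convention is the last `-`-separated field before `.whl`. The helper name `wheel_platform` is hypothetical and not part of vLLM, `pip`, or `uv`.

```bash
#!/usr/bin/env bash
# Sketch: report the platform tag of each wheel in dist/.
# wheel_platform is a hypothetical helper, not an existing tool.

# Print the platform tag, i.e. the last "-"-separated field of the
# wheel file name with the ".whl" suffix removed.
wheel_platform() {
    local name
    name="$(basename "$1" .whl)"
    echo "${name##*-}"
}

for whl in dist/*.whl; do
    # If the glob matched nothing, $whl is the literal pattern.
    [ -e "$whl" ] || { echo "no wheels found in dist/"; break; }
    echo "$whl -> $(wheel_platform "$whl")"
done
```

On an IBM Z build host, the reported tag would be expected to end in `s390x`.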

??? console "pip"
    ```bash
    # Install the build and runtime dependencies, then build and install the wheel.
    pip install -v \
        --extra-index-url https://download.pytorch.org/whl/cpu \
        -r requirements/build/cpu.txt \
        -r requirements/cpu.txt && \
    VLLM_TARGET_DEVICE=cpu python setup.py bdist_wheel && \
        pip install dist/*.whl
    ```

--8<-- [end:build-wheel-from-source]
--8<-- [start:pre-built-images]

Currently, there are no pre-built IBM Z CPU images.

--8<-- [end:pre-built-images]
--8<-- [start:build-image-from-source]

```bash
docker build -f docker/Dockerfile.s390x \
    --tag vllm-cpu-env .

# Launch OpenAI server
docker run --rm \
    --privileged=true \
    --shm-size 4g \
    -p 8000:8000 \
    -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
    -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
    vllm-cpu-env \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --dtype float \
    other vLLM OpenAI server arguments
```

!!! tip
    An alternative to `--privileged=true` is `--cap-add SYS_NICE --security-opt seccomp=unconfined`.

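The placeholders in the `docker run` command take concrete values: `VLLM_CPU_KVCACHE_SPACE` is a size in GiB, and `VLLM_CPU_OMP_THREADS_BIND` is a list of CPU cores such as `0-31`. The following is a minimal sketch of deriving such values and building a request body for the server's OpenAI-compatible `/v1/completions` endpoint. Binding every online core and the value `8` are assumptions for illustration, and `core_range` and `completion_payload` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: derive example values for the placeholders and build a request body.
# core_range and completion_payload are hypothetical helpers, not vLLM tooling.

# Print a core range "0-(N-1)" for N cores (default: all online cores).
core_range() {
    local n="${1:-$(nproc)}"
    echo "0-$((n - 1))"
}

# Print a JSON body for the completions endpoint.
# Arguments: model name, prompt (without quotes or backslashes), max tokens.
completion_payload() {
    printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}\n' "$1" "$2" "$3"
}

echo "VLLM_CPU_OMP_THREADS_BIND=$(core_range)"
echo "VLLM_CPU_KVCACHE_SPACE=8   # GiB; an arbitrary example value"

# Once the container is up, send a request (uncomment to run):
# curl -s http://localhost:8000/v1/completions \
#     -H "Content-Type: application/json" \
#     -d "$(completion_payload meta-llama/Llama-3.2-1B-Instruct 'Hello, IBM Z' 16)"
```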
--8<-- [end:build-image-from-source]
--8<-- [start:extra-information]
--8<-- [end:extra-information]