"...e2e/offline_inference/test_diffusion_layerwise_offload.py" did not exist on "356077823ea8569ff15218e51228c1b3d50792a9"
npu.inc.md 2.27 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
# --8<-- [start:requirements]

For detailed hardware and software requirements, please refer to the [vllm-ascend installation documentation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html).
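Before starting the container, it can help to sanity-check the host. The snippet below is a best-effort preflight sketch, not an official check: the Python floor and the driver path are assumptions based on the mounts used later in this guide; the authoritative requirements are in the vllm-ascend documentation linked above.

```python
import os
import shutil
import sys


def preflight() -> dict:
    """Best-effort host preflight for an Ascend NPU setup.

    The thresholds and paths here are assumptions (see the vllm-ascend
    installation docs for authoritative requirements).
    """
    checks = {
        "python>=3.9": sys.version_info >= (3, 9),
        "npu-smi on PATH": shutil.which("npu-smi") is not None,
        "Ascend driver version.info": os.path.exists(
            "/usr/local/Ascend/driver/version.info"
        ),
    }
    for name, ok in checks.items():
        print(f"[{'ok' if ok else 'missing'}] {name}")
    return checks


if __name__ == "__main__":
    preflight()
```

On a correctly provisioned host, all three checks should report `ok`; a `missing` driver file usually means the Ascend driver is not installed or is installed to a non-default prefix.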

# --8<-- [end:requirements]
# --8<-- [start:installation]

The recommended way to use vLLM-Omni on NPU is through the vllm-ascend pre-built Docker images:

```bash
# Update DEVICE according to your NPUs (/dev/davinci[0-7])
export DEVICE0=/dev/davinci0
export DEVICE1=/dev/davinci1
# Update the vllm-ascend image
# Atlas A2:
# export IMAGE=quay.io/ascend/vllm-ascend:v0.14.0
# Atlas A3:
# export IMAGE=quay.io/ascend/vllm-ascend:v0.14.0-a3
export IMAGE=quay.io/ascend/vllm-ascend:v0.14.0
docker run --rm \
    --name vllm-omni-npu \
    --shm-size=1g \
    --device $DEVICE0 \
    --device $DEVICE1 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash

# Install libjemalloc2, a missing mooncake dependency in the original image.
apt update
apt install libjemalloc2
echo "export LD_PRELOAD=/usr/lib/$(uname -m)-linux-gnu/libjemalloc.so.2:$LD_PRELOAD" >> ~/.bashrc
source ~/.bashrc

# Inside the container, install vLLM-Omni from source
cd /vllm-workspace
git clone -b v0.14.0 https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
pip install -v -e .
export VLLM_WORKER_MULTIPROC_METHOD=spawn

# (Optional) Disable mooncake for better stability
mv /usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake \
   /usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake.disabled
```
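The `export VLLM_WORKER_MULTIPROC_METHOD=spawn` step above can also be mirrored from Python before vLLM is imported, which is convenient in scripts where editing the shell profile is not an option. A minimal sketch (the variable name comes from the install steps above; setting it programmatically rather than via `export` is this sketch's own choice):

```python
import multiprocessing as mp
import os

# vLLM reads this variable to choose its worker start method; the install
# steps above require "spawn". Set it before importing vllm so it takes effect.
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")

# The same start method can be used for your own helper processes:
ctx = mp.get_context("spawn")
print(ctx.get_start_method())  # -> spawn
```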

The default working directory is `/workspace`; the vLLM, vLLM-Ascend, and vLLM-Omni source trees live in `/vllm-workspace` and are installed in development (editable) mode.

For other installation methods (pip installation, building from source, custom Docker builds), please refer to the [vllm-ascend installation guide](https://docs.vllm.ai/projects/ascend/en/latest/installation.html).

# --8<-- [end:installation]