# Using Nginx

This document shows how to launch multiple vLLM serving containers and use Nginx to act as a load balancer between the servers.

[](){ #nginxloadbalancer-nginx-build }

## Build Nginx Container

This guide assumes that you have just cloned the vLLM project and are currently in the vLLM root directory.

```bash
export vllm_root="$(pwd)"
```

Create a file named `Dockerfile.nginx`:

```dockerfile
FROM nginx:latest
RUN rm /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
```

Build the container:

```bash
docker build . -f Dockerfile.nginx --tag nginx-lb
```

[](){ #nginxloadbalancer-nginx-conf }

## Create Simple Nginx Config File

Create a file named `nginx_conf/nginx.conf`. You can add as many servers as you like; the example below starts with two. To add more, add another `server vllmN:8000 max_fails=3 fail_timeout=10000s;` entry to `upstream backend`.

??? console "Config"

    ```console
    upstream backend {
        least_conn;
        server vllm0:8000 max_fails=3 fail_timeout=10000s;
        server vllm1:8000 max_fails=3 fail_timeout=10000s;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
    ```
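
For more than a handful of servers, writing the `upstream backend` entries by hand gets tedious. The sketch below prints an upstream block for N servers, assuming the containers follow the `vllm0`, `vllm1`, ... naming used above and all listen on port 8000. The function name `gen_upstream` is hypothetical and not part of vLLM or Nginx.

```bash
# Hypothetical helper: print an nginx upstream block for N vLLM servers
# named vllm0 .. vllm(N-1), reusing the settings from the config above.
# The body runs in a subshell so its variables do not leak into the caller.
gen_upstream() (
    count="$1"
    echo "upstream backend {"
    echo "    least_conn;"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "    server vllm${i}:8000 max_fails=3 fail_timeout=10000s;"
        i=$((i + 1))
    done
    echo "}"
)
```

Calling `gen_upstream 4` prints a block with four `server` entries that can replace the hand-written `upstream backend` block in `nginx_conf/nginx.conf`.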

[](){ #nginxloadbalancer-nginx-vllm-container }

## Build vLLM Container

```bash
cd $vllm_root
docker build -f docker/Dockerfile . --tag vllm
```

If you are behind a proxy, you can pass the proxy settings to the `docker build` command as shown below:

```bash
cd $vllm_root
docker build \
    -f docker/Dockerfile . \
    --tag vllm \
    --build-arg http_proxy=$http_proxy \
    --build-arg https_proxy=$https_proxy
```

[](){ #nginxloadbalancer-nginx-docker-network }

## Create Docker Network

```bash
docker network create vllm_nginx
```

[](){ #nginxloadbalancer-nginx-launch-container }

## Launch vLLM Containers

Notes:

- If you have your HuggingFace models cached somewhere else, update `hf_cache_dir` below.
- If you don't have an existing HuggingFace cache, start `vllm0` first and wait until the model has finished downloading and the server is ready. `vllm1` can then reuse the downloaded model instead of downloading it again.
- The example below assumes the GPU backend. If you are using the CPU backend, remove `--gpus device=ID` and add the `VLLM_CPU_KVCACHE_SPACE` and `VLLM_CPU_OMP_THREADS_BIND` environment variables to the `docker run` command.
- If you don't want to use `Llama-2-7b-chat-hf`, change the model name passed to the vLLM servers.

??? console "Commands"

    ```console
    mkdir -p ~/.cache/huggingface/hub/
    hf_cache_dir=~/.cache/huggingface/
    docker run \
        -itd \
        --ipc host \
        --network vllm_nginx \
        --gpus device=0 \
        --shm-size=10.24gb \
        -v $hf_cache_dir:/root/.cache/huggingface/ \
        -p 8081:8000 \
        --name vllm0 vllm \
        --model meta-llama/Llama-2-7b-chat-hf
    docker run \
        -itd \
        --ipc host \
        --network vllm_nginx \
        --gpus device=1 \
        --shm-size=10.24gb \
        -v $hf_cache_dir:/root/.cache/huggingface/ \
        -p 8082:8000 \
        --name vllm1 vllm \
        --model meta-llama/Llama-2-7b-chat-hf
    ```

!!! note
    If you are behind a proxy, you can pass the proxy settings to the `docker run` command via `-e http_proxy=$http_proxy -e https_proxy=$https_proxy`.
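
The two `docker run` commands above differ only in the GPU index, the host port, and the container name. As a rough sketch, the pattern can be captured in a helper that prints the command for container number `i`. It assumes that container `i` uses GPU `i` and host port `8081 + i`, which extends the `vllm0`/`vllm1` example but is not something vLLM requires. The name `vllm_run_cmd` is hypothetical.

```bash
# Hypothetical helper: print (not run) the docker run command for vLLM
# container number $1, with an optional model name as $2.
# Assumption: container i uses GPU i and host port 8081+i, as in the
# vllm0/vllm1 example above. Runs in a subshell to keep variables local.
vllm_run_cmd() (
    i="$1"
    model="${2:-meta-llama/Llama-2-7b-chat-hf}"
    cache="${hf_cache_dir:-$HOME/.cache/huggingface/}"
    port=$((8081 + i))
    echo "docker run -itd --ipc host --network vllm_nginx" \
         "--gpus device=${i} --shm-size=10.24gb" \
         "-v ${cache}:/root/.cache/huggingface/" \
         "-p ${port}:8000 --name vllm${i} vllm --model ${model}"
)
```

Printing the command first makes it easy to review before anything is started; for example, `vllm_run_cmd 2` shows the command for a third container named `vllm2` on host port 8083.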

[](){ #nginxloadbalancer-nginx-launch-nginx }

## Launch Nginx

```bash
docker run \
    -itd \
    -p 8000:80 \
    --network vllm_nginx \
    -v ./nginx_conf/:/etc/nginx/conf.d/ \
    --name nginx-lb nginx-lb:latest
```

[](){ #nginxloadbalancer-nginx-verify-nginx }

## Verify That vLLM Servers Are Ready

```bash
docker logs vllm0 | grep Uvicorn
docker logs vllm1 | grep Uvicorn
```

Both outputs should look like this:

```console
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
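
Because model loading can take several minutes, it may be more convenient to poll for readiness than to re-run the log check by hand. The sketch below is a generic retry helper; the name `wait_until` is hypothetical, and the probe command is left to the caller.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
    _attempts="$1"
    _delay="$2"
    shift 2
    _n=1
    while [ "$_n" -le "$_attempts" ]; do
        # Run the probe command, discarding its output.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        _n=$((_n + 1))
        sleep "$_delay"
    done
    return 1
}
```

One possible probe, assuming `curl` is installed and the vLLM server exposes a `/health` endpoint on the host ports mapped above, is `wait_until 60 5 curl -sf http://localhost:8081/health` for `vllm0` and the same command with port 8082 for `vllm1`. Alternatively, the log check from this section can serve as the probe: `wait_until 60 5 sh -c 'docker logs vllm0 2>&1 | grep -q Uvicorn'`.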