Unverified commit 02d709a6, authored by Kay Yan, committed by GitHub

[docs] standardize Hugging Face env var to `HF_TOKEN` (deprecates `HUGGING_FACE_HUB_TOKEN`) (#27020)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
parent 4a510ab4
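
For environments that still export only the old name, a small shim can bridge the two during migration. The following is a minimal sketch and not part of this commit: `resolve_hf_token` is a hypothetical helper that prefers `HF_TOKEN` and falls back to the deprecated `HUGGING_FACE_HUB_TOKEN` with a warning on stderr.

```bash
# Hypothetical helper (not part of vLLM): print the token to use,
# preferring HF_TOKEN over the deprecated HUGGING_FACE_HUB_TOKEN.
resolve_hf_token() {
    if [ -n "${HF_TOKEN:-}" ]; then
        printf '%s' "$HF_TOKEN"
    elif [ -n "${HUGGING_FACE_HUB_TOKEN:-}" ]; then
        echo "warning: HUGGING_FACE_HUB_TOKEN is deprecated; set HF_TOKEN instead" >&2
        printf '%s' "$HUGGING_FACE_HUB_TOKEN"
    fi
}

# Export HF_TOKEN only when one of the two variables supplied a value.
token="$(resolve_hf_token)"
if [ -n "$token" ]; then
    export HF_TOKEN="$token"
fi
```

Sourcing this before `docker run` or `podman run` would let the updated `--env "HF_TOKEN=$HF_TOKEN"` line work even where only the old variable is set.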
@@ -10,7 +10,7 @@ The image can be used to run OpenAI compatible server and is available on Docker
  ```bash
  docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
- --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
+ --env "HF_TOKEN=$HF_TOKEN" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
@@ -22,7 +22,7 @@ This image can also be used with other container engines such as [Podman](https:
  ```bash
  podman run --device nvidia.com/gpu=all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
- --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
+ --env "HF_TOKEN=$HF_TOKEN" \
  -p 8000:8000 \
  --ipc=host \
  docker.io/vllm/vllm-openai:latest \
@@ -128,7 +128,7 @@ To run vLLM with the custom-built Docker image:
  docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
- --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
+ --env "HF_TOKEN=<secret>" \
  vllm/vllm-openai <args...>
  ```
......
@@ -35,7 +35,7 @@ Deploy the following yaml file `lws.yaml`
  - name: vllm-leader
  image: docker.io/vllm/vllm-openai:latest
  env:
- - name: HUGGING_FACE_HUB_TOKEN
+ - name: HF_TOKEN
  value: <your-hf-token>
  command:
  - sh
@@ -83,7 +83,7 @@ Deploy the following yaml file `lws.yaml`
  ephemeral-storage: 800Gi
  cpu: 125
  env:
- - name: HUGGING_FACE_HUB_TOKEN
+ - name: HF_TOKEN
  value: <your-hf-token>
  volumeMounts:
  - mountPath: /dev/shm
......
@@ -82,7 +82,7 @@ Next, start the vLLM server as a Kubernetes Deployment and Service:
  "vllm serve meta-llama/Llama-3.2-1B-Instruct"
  ]
  env:
- - name: HUGGING_FACE_HUB_TOKEN
+ - name: HF_TOKEN
  valueFrom:
  secretKeyRef:
  name: hf-token-secret
@@ -209,7 +209,7 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
  "vllm serve mistralai/Mistral-7B-Instruct-v0.3 --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
  ]
  env:
- - name: HUGGING_FACE_HUB_TOKEN
+ - name: HF_TOKEN
  valueFrom:
  secretKeyRef:
  name: hf-token-secret
@@ -298,7 +298,7 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
  "vllm serve mistralai/Mistral-7B-v0.3 --port 8000 --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
  ]
  env:
- - name: HUGGING_FACE_HUB_TOKEN
+ - name: HF_TOKEN
  valueFrom:
  secretKeyRef:
  name: hf-token-secret
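
Because the rename touches several files, a quick search can confirm that no example still uses the old name. This is a minimal sketch, assuming a `grep` that supports `-r` (GNU and BSD grep both do); `find_deprecated_hf_env` is a hypothetical helper, not part of vLLM.

```bash
# Hypothetical helper: list files under a directory (default: current
# directory) that still mention the deprecated variable name.
# Returns non-zero when no such file is found.
find_deprecated_hf_env() {
    grep -rl -- 'HUGGING_FACE_HUB_TOKEN' "${1:-.}" 2>/dev/null
}
```

Running `find_deprecated_hf_env docs/` from a checkout would print any file that still needs the change; an empty result with a non-zero exit status means nothing was found.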
......