Commit 4dd42db5 authored by Tyler Michael Smith, committed by GitHub

Remove VLLM_SKIP_WARMUP tip (#29331)


Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
parent 84371daf
......@@ -22,9 +22,6 @@ export QUANT_CONFIG=/path/to/quant/config/inc/meta-llama-3.1-405b-instruct/maxab
vllm serve meta-llama/Llama-3.1-405B-Instruct --quantization inc --kv-cache-dtype fp8_inc --tensor-parallel-size 8
```
!!! tip
If you are just prototyping or testing your model with FP8, you can use the `VLLM_SKIP_WARMUP=true` environment variable to disable the warmup stage, which can take a long time. However, we do not recommend disabling this feature in production environments as it causes a significant performance drop.
!!! tip
When using FP8 models, you may experience timeouts caused by the long compilation time of FP8 operations. To mitigate this problem, you can use the below environment variables:
`VLLM_ENGINE_ITERATION_TIMEOUT_S` - to adjust the vLLM server timeout. You can set the value in seconds, e.g., 600 equals 10 minutes.
......