Remove VLLM_SKIP_WARMUP tip (#29331)

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

Commit 4dd42db5 (unverified), authored Nov 24, 2025 by Tyler Michael Smith; committed by GitHub on Nov 24, 2025

Showing 1 changed file with 0 additions and 3 deletions

docs/features/quantization/inc.md +0 -3

--- a/docs/features/quantization/inc.md
+++ b/docs/features/quantization/inc.md
@@ -22,9 +22,6 @@ export QUANT_CONFIG=/path/to/quant/config/inc/meta-llama-3.1-405b-instruct/maxab
 vllm serve meta-llama/Llama-3.1-405B-Instruct --quantization inc --kv-cache-dtype fp8_inc --tensor_paralel_size 8
 ```

-!!! tip
-    If you are just prototyping or testing your model with FP8, you can use the `VLLM_SKIP_WARMUP=true` environment variable to disable the warmup stage, which can take a long time. However, we do not recommend disabling this feature in production environments as it causes a significant performance drop.
-
 !!! tip
    When using FP8 models, you may experience timeouts caused by the long compilation time of FP8 operations. To mitigate this problem, you can use the below environment variables:
    `VLLM_ENGINE_ITERATION_TIMEOUT_S` - to adjust the vLLM server timeout. You can set the value in seconds, e.g., 600 equals 10 minutes.