Commit efc9ebf0 authored by ptarasiewiczNV, committed by GitHub

chore: Add note on vLLM DSR1 gibberish outputs upstream issue (#5353)


Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
parent ae03c857
@@ -93,5 +93,6 @@ curl -sS http://localhost:8000/v1/chat/completions \
- If your storage class differs, update `storageClassName` before applying the PVC.
- **If you want to run multinode deployments, IBGDA (InfiniBand GPU Direct Async) must be enabled on your nodes.** To enable IBGDA, you can follow this configuration script: [configure_system_drivers.sh](https://github.com/vllm-project/vllm/blob/v0.11.2/tools/ep_kernels/configure_system_drivers.sh). The script configures NVIDIA driver parameters and requires a system reboot to take effect.
- `VLLM_MOE_DP_CHUNK_SIZE` can be tuned further. The value 384 was chosen as the largest that can still be deployed on 16 H200s. This value should be greater than the per-rank concurrency.
- Starting with vLLM v0.12.0 (Dynamo v0.8.0), DeepSeek-R1 in this configuration might return gibberish outputs. Track the upstream issue [vLLM #32190](https://github.com/vllm-project/vllm/issues/32190) for status.
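
The chunk-size guidance above can be sanity-checked with a small script. This is only a sketch under assumptions the notes do not confirm: it treats per-rank concurrency as the total number of concurrent requests divided evenly across data-parallel ranks (rounded up), and it assumes one rank per GPU, that is 16 ranks for 16 H200s. The helper names are hypothetical and are not part of vLLM or Dynamo.

```python
import math


def per_rank_concurrency(total_concurrency: int, dp_ranks: int) -> int:
    """Assumed definition: concurrent requests spread evenly over ranks, rounded up."""
    if dp_ranks <= 0:
        raise ValueError("dp_ranks must be positive")
    return math.ceil(total_concurrency / dp_ranks)


def chunk_size_is_sufficient(chunk_size: int, total_concurrency: int, dp_ranks: int) -> bool:
    """True when the chunk size is strictly greater than the per-rank concurrency."""
    return chunk_size > per_rank_concurrency(total_concurrency, dp_ranks)


if __name__ == "__main__":
    # 384 is the VLLM_MOE_DP_CHUNK_SIZE value from the notes; 16 ranks is an assumption.
    for total in (2048, 4096, 8192):
        ok = chunk_size_is_sufficient(384, total, 16)
        print(f"total={total} per_rank={per_rank_concurrency(total, 16)} ok={ok}")
```

If the check fails for your expected load, either lower the target concurrency or revisit the chunk size, keeping in mind that larger values may not fit in memory on the same hardware.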