fix: Update the vLLM Docker image to use the CUDA sampler rather than the PyTorch one (#5613)

Commit f9858193 (unverified), authored Jan 26, 2026 by Kyle McGill; committed by GitHub Jan 27, 2026

Showing with 4 additions and 0 deletions

container/Dockerfile.vllm +4 -0

--- a/container/Dockerfile.vllm
+++ b/container/Dockerfile.vllm
@@ -847,5 +847,9 @@ USER dynamo
 ARG DYNAMO_COMMIT_SHA
 ENV DYNAMO_COMMIT_SHA=$DYNAMO_COMMIT_SHA
+# In vLLM 0.12 the default sampler used on the forward pass changed.
+# Set this flag to enable the CUDA (FlashInfer) sampling kernels.
+ENV VLLM_USE_FLASHINFER_SAMPLER=1
 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
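
A quick way to confirm the new ENV line took effect is to check the variable from inside a running container. The sketch below is illustrative only and not part of this commit: the helper name `flashinfer_sampler_requested` is hypothetical, and it only tests for the literal value "1" that the Dockerfile sets. It does not import vLLM or reproduce how vLLM itself parses the variable.

```python
import os
from typing import Mapping, Optional

# Name of the variable set by the ENV line in container/Dockerfile.vllm.
ENV_VAR = "VLLM_USE_FLASHINFER_SAMPLER"


def flashinfer_sampler_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the variable is set to "1", as the Dockerfile does.

    Hypothetical helper: it only inspects the environment mapping and
    makes no claim about which sampler vLLM actually selects at runtime.
    """
    env = os.environ if env is None else env
    return env.get(ENV_VAR, "").strip() == "1"


if __name__ == "__main__":
    state = "set" if flashinfer_sampler_requested() else "not set"
    print(f"{ENV_VAR}=1 is {state} in this environment")
```

Running the script inside the built image should report the variable as set. An environment override passed at container start (for example a different value for the same variable) would change the result.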