fix: reduce VLLM_MOE_DP_CHUNK_SIZE to 384 (#5307)

Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>

Unverified commit 5f8d90a3, authored Jan 09, 2026 by ptarasiewiczNV, committed by GitHub Jan 09, 2026
Showing 2 changed files with 3 additions and 6 deletions

recipes/deepseek-r1/vllm/disagg/README.md +1 -0

recipes/deepseek-r1/vllm/disagg/deploy_hopper_16gpu.yaml +2 -6
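
The README note added in this commit says `VLLM_MOE_DP_CHUNK_SIZE` should be greater than per-rank concurrency. A minimal sketch of that sanity check follows. It assumes that "per-rank concurrency" means in-flight requests split evenly across data-parallel ranks; the commit does not define the term, and the helper names `per_rank_concurrency` and `chunk_size_ok` are hypothetical, not part of vLLM.

```python
import math


def per_rank_concurrency(total_concurrency: int, dp_ranks: int) -> int:
    """Worst-case in-flight requests per rank, assuming an even split (assumption)."""
    if dp_ranks <= 0:
        raise ValueError("dp_ranks must be positive")
    return math.ceil(total_concurrency / dp_ranks)


def chunk_size_ok(chunk_size: int, total_concurrency: int, dp_ranks: int) -> bool:
    """True when the chunk size is strictly greater than per-rank concurrency."""
    return chunk_size > per_rank_concurrency(total_concurrency, dp_ranks)


if __name__ == "__main__":
    # 384 is the value set in deploy_hopper_16gpu.yaml; the load figures are illustrative.
    print(chunk_size_ok(384, total_concurrency=4096, dp_ranks=16))
    print(chunk_size_ok(384, total_concurrency=8192, dp_ranks=16))
```

If the check fails for the expected load, the note suggests the value needs further tuning, bounded above by what still fits on the target GPUs.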

--- a/recipes/deepseek-r1/vllm/disagg/README.md
+++ b/recipes/deepseek-r1/vllm/disagg/README.md
@@ -92,5 +92,6 @@ curl -sS http://localhost:8000/v1/chat/completions \
 - If your cluster/network requires specific interfaces, adjust environment variables (e.g., `NCCL_SOCKET_IFNAME`) in the manifest accordingly.
 - If your storage class differs, update `storageClassName` before applying the PVC.
 - **If you want to run multinode deployments, IBGDA (InfiniBand GPU Direct Async) must be enabled on your nodes.** To enable IBGDA, you can follow this configuration script: [configure_system_drivers.sh](https://github.com/vllm-project/vllm/blob/v0.11.2/tools/ep_kernels/configure_system_drivers.sh). The script configures NVIDIA driver parameters and requires a system reboot to take effect.
+- `VLLM_MOE_DP_CHUNK_SIZE` can be tuned further. The value 384 was chosen to be largest possible that still can be deployed on 16 H200s. This value should be greater than per rank concurrency.
--- a/recipes/deepseek-r1/vllm/disagg/deploy_hopper_16gpu.yaml
+++ b/recipes/deepseek-r1/vllm/disagg/deploy_hopper_16gpu.yaml
@@ -58,7 +58,7 @@ spec:
            - name: VLLM_ALL2ALL_BACKEND
              value: deepep_low_latency
            - name: VLLM_MOE_DP_CHUNK_SIZE
-              value: "512"
+              value: "384"
            - name: VLLM_SKIP_P2P_CHECK
              value: "1"
            - name: VLLM_RANDOMIZE_DP_DUMMY_INPUTS
@@ -67,8 +67,6 @@ spec:
              value: enabled
            - name: VLLM_MOE_ROUTING_SIMULATION_STRATEGY
              value: "uniform_random"
-            - name: NVSHMEM_QP_DEPTH
-              value: "1512"
            - name: GLOO_SOCKET_IFNAME
              value: eth0
          command:
@@ -125,7 +123,7 @@ spec:
            - name: VLLM_ALL2ALL_BACKEND
              value: deepep_high_throughput
            - name: VLLM_MOE_DP_CHUNK_SIZE
-              value: "512"
+              value: "384"
            - name: VLLM_SKIP_P2P_CHECK
              value: "1"
            - name: VLLM_RANDOMIZE_DP_DUMMY_INPUTS
@@ -134,8 +132,6 @@ spec:
              value: enabled
            - name: VLLM_MOE_ROUTING_SIMULATION_STRATEGY
              value: "uniform_random"
-            - name: NVSHMEM_QP_DEPTH
-              value: "1512"
            - name: GLOO_SOCKET_IFNAME
              value: eth0
          command: