Unverified commit 5f8d90a3, authored by ptarasiewiczNV and committed by GitHub

fix: reduce VLLM_MOE_DP_CHUNK_SIZE to 384 (#5307)


Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
parent 0d3ff440
@@ -92,5 +92,6 @@ curl -sS http://localhost:8000/v1/chat/completions \
 - If your cluster/network requires specific interfaces, adjust environment variables (e.g., `NCCL_SOCKET_IFNAME`) in the manifest accordingly.
 - If your storage class differs, update `storageClassName` before applying the PVC.
 - **If you want to run multinode deployments, IBGDA (InfiniBand GPU Direct Async) must be enabled on your nodes.** To enable IBGDA, you can follow this configuration script: [configure_system_drivers.sh](https://github.com/vllm-project/vllm/blob/v0.11.2/tools/ep_kernels/configure_system_drivers.sh). The script configures NVIDIA driver parameters and requires a system reboot to take effect.
+- `VLLM_MOE_DP_CHUNK_SIZE` can be tuned further. The value 384 was chosen as the largest that can still be deployed on 16 H200s. It should be greater than the per-rank concurrency.
@@ -58,7 +58,7 @@ spec:
 - name: VLLM_ALL2ALL_BACKEND
   value: deepep_low_latency
 - name: VLLM_MOE_DP_CHUNK_SIZE
-  value: "512"
+  value: "384"
 - name: VLLM_SKIP_P2P_CHECK
   value: "1"
 - name: VLLM_RANDOMIZE_DP_DUMMY_INPUTS
@@ -67,8 +67,6 @@ spec:
   value: enabled
 - name: VLLM_MOE_ROUTING_SIMULATION_STRATEGY
   value: "uniform_random"
-- name: NVSHMEM_QP_DEPTH
-  value: "1512"
 - name: GLOO_SOCKET_IFNAME
   value: eth0
 command:
@@ -125,7 +123,7 @@ spec:
 - name: VLLM_ALL2ALL_BACKEND
   value: deepep_high_throughput
 - name: VLLM_MOE_DP_CHUNK_SIZE
-  value: "512"
+  value: "384"
 - name: VLLM_SKIP_P2P_CHECK
   value: "1"
 - name: VLLM_RANDOMIZE_DP_DUMMY_INPUTS
@@ -134,8 +132,6 @@ spec:
   value: enabled
 - name: VLLM_MOE_ROUTING_SIMULATION_STRATEGY
   value: "uniform_random"
-- name: NVSHMEM_QP_DEPTH
-  value: "1512"
 - name: GLOO_SOCKET_IFNAME
   value: eth0
 command:
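
The README note added in this commit says `VLLM_MOE_DP_CHUNK_SIZE` should be greater than the per-rank concurrency. A rough way to sanity-check a candidate value before deploying is sketched below. It is not part of the commit. It assumes that per-rank concurrency can be approximated by spreading the total number of in-flight requests evenly across data-parallel ranks, and that there is one rank per GPU (16 for the 16 H200s mentioned above). The helper names `per_rank_concurrency` and `chunk_size_is_sufficient` are hypothetical.

```python
import math


def per_rank_concurrency(total_concurrency: int, dp_ranks: int) -> int:
    """Approximate in-flight requests per data-parallel rank.

    Assumption: requests are spread evenly across ranks, rounded up
    so the busiest rank is covered.
    """
    if dp_ranks <= 0:
        raise ValueError("dp_ranks must be positive")
    return math.ceil(total_concurrency / dp_ranks)


def chunk_size_is_sufficient(chunk_size: int, total_concurrency: int, dp_ranks: int) -> bool:
    """Return True if chunk_size exceeds the estimated per-rank concurrency."""
    return chunk_size > per_rank_concurrency(total_concurrency, dp_ranks)


if __name__ == "__main__":
    # Hypothetical load: 4096 concurrent requests over 16 ranks.
    print(chunk_size_is_sufficient(384, total_concurrency=4096, dp_ranks=16))
```

Under these assumptions, 4096 concurrent requests over 16 ranks gives 256 per rank, which 384 covers. At 8192 concurrent requests the estimate is 512 per rank, so 384 would be too small by this rule of thumb.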