Unverified commit 781e4100, authored by Yuewei Na, committed by GitHub

docs(recipes): clarify Qwen3-235B DEEPGEMM requires Blackwell (SM100+) (#8410)


Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
parent 508aed84
@@ -12,7 +12,7 @@ Production-ready deployments for **Qwen3-235B-A22B** (MoE model with 22B active
 ## Prerequisites
 1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
-2. **GPU cluster** with H100/H200 GPUs (high memory recommended)
+2. **GPU cluster** with Blackwell GPUs (B100/B200; SM100+) — see [Hardware Requirements](#hardware-requirements)
 3. **HuggingFace token** with access to Qwen models
 ## Quick Start
@@ -62,12 +62,16 @@ curl http://localhost:8000/v1/chat/completions \
 ## Hardware Requirements
-This is a large MoE model requiring significant GPU resources:
+This recipe uses `moe_config.backend: DEEPGEMM`, which requires **Blackwell GPUs (SM100+, e.g. B100/B200)**.
+DeepGEMM's FP8 grouped-GEMM kernels are designed for SM100/SM103 only and will crash on Hopper (SM90).
+> **Note:** To run on Hopper (H100/H200, SM90), remove the `moe_config` block from the ConfigMaps in
+> `trtllm/agg/deploy.yaml` and `trtllm/disagg/deploy.yaml`. This falls back to the default MoE backend at a modest throughput reduction.
 | Configuration | GPUs | Min GPU VRAM (Total) |
 |--------------|------|----------------------|
-| Aggregated | 16x H100/H200 | ~1.3TB |
-| Disaggregated | 16x H100/H200 | ~1.3TB |
+| Aggregated | 16x B100/B200 | ~1.3TB |
+| Disaggregated | 16x B100/B200 | ~1.3TB |
 ## Notes
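Reviewer sketch (not part of the commit): the hardware rule added above can be checked before deploying. The helper names below are hypothetical, and the input is assumed to be a compute-capability string such as `9.0` or `10.0`, for example as reported by `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on recent drivers. The check follows the stricter wording in the README ("SM100/SM103 only") and accepts only major version 10; relax it if later architectures are confirmed to work.

```python
def parse_compute_cap(text: str) -> tuple[int, int]:
    """Parse a compute-capability string like '10.0' into (major, minor)."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def deepgemm_supported(compute_cap: str) -> bool:
    """Return True if the GPU falls in the SM100/SM103 family.

    Assumption: 'SM100+' is read conservatively as major version 10,
    matching the README's 'designed for SM100/SM103 only' wording.
    """
    major, _minor = parse_compute_cap(compute_cap)
    return major == 10


def cluster_supports_deepgemm(compute_caps: list[str]) -> bool:
    """Every GPU in the deployment must qualify, and the list must be non-empty."""
    return bool(compute_caps) and all(deepgemm_supported(c) for c in compute_caps)


if __name__ == "__main__":
    # Hypothetical inventory: one string per GPU.
    hopper_node = ["9.0"] * 8
    blackwell_node = ["10.0"] * 8
    print("Hopper node OK for DEEPGEMM:", cluster_supports_deepgemm(hopper_node))
    print("Blackwell node OK for DEEPGEMM:", cluster_supports_deepgemm(blackwell_node))
```

If the check fails on a Hopper cluster, the README note applies: remove the `moe_config` block and use the default MoE backend.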
......
@@ -24,6 +24,7 @@ data:
 max_batch_size: 128
 disable_overlap_scheduler: false
 print_iter_log: false
+# DEEPGEMM requires Blackwell (SM100+). Remove moe_config block to run on Hopper (SM90).
 moe_config:
   backend: DEEPGEMM
 max_num_tokens: 8192
......
@@ -24,6 +24,7 @@ data:
 max_batch_size: 2
 disable_overlap_scheduler: true
 print_iter_log: false
+# DEEPGEMM requires Blackwell (SM100+). Remove moe_config block to run on Hopper (SM90).
 moe_config:
   backend: DEEPGEMM
 max_num_tokens: 8192
@@ -52,6 +53,7 @@ data:
 max_batch_size: 512
 disable_overlap_scheduler: false
 print_iter_log: false
+# DEEPGEMM requires Blackwell (SM100+). Remove moe_config block to run on Hopper (SM90).
 moe_config:
   backend: DEEPGEMM
 max_num_tokens: 8192
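Reviewer sketch (not part of the commit): the Hopper fallback described in the new YAML comments amounts to dropping one key from the engine configuration. The snippet below illustrates that on a plain Python dict standing in for the parsed ConfigMap payload. The function name is hypothetical, the field layout is inferred from the hunks above, and loading or writing the actual YAML files is left out.

```python
import copy


def without_moe_config(engine_config: dict) -> dict:
    """Return a copy of the config with the `moe_config` block removed.

    The input is left untouched so the Blackwell variant stays available.
    """
    trimmed = copy.deepcopy(engine_config)
    trimmed.pop("moe_config", None)  # no-op if the block is already absent
    return trimmed


if __name__ == "__main__":
    # Values taken from the first hunk above; structure is an assumption.
    blackwell_config = {
        "max_batch_size": 128,
        "disable_overlap_scheduler": False,
        "print_iter_log": False,
        "moe_config": {"backend": "DEEPGEMM"},
        "max_num_tokens": 8192,
    }
    hopper_config = without_moe_config(blackwell_config)
    print(sorted(hopper_config))
```

Copying before removing the key keeps both variants available, which is convenient when the same recipe is deployed to mixed Hopper and Blackwell clusters.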
......