docs(recipes): clarify Qwen3-235B DEEPGEMM requires Blackwell (SM100+) (#8410)

Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>

Commit 781e4100 (unverified), authored Apr 20, 2026 by Yuewei Na, committed by GitHub Apr 20, 2026
3 changed files
--- a/recipes/qwen3-235b-a22b-fp8/README.md
+++ b/recipes/qwen3-235b-a22b-fp8/README.md
@@ -12,7 +12,7 @@ Production-ready deployments for **Qwen3-235B-A22B** (MoE model with 22B active
 ## Prerequisites
 1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
-2. **GPU cluster** with H100/H200 GPUs (high memory recommended)
+2. **GPU cluster** with Blackwell GPUs (B100/B200; SM100+) — see [Hardware Requirements](#hardware-requirements)
 3. **HuggingFace token** with access to Qwen models
 ## Quick Start
@@ -62,12 +62,16 @@ curl http://localhost:8000/v1/chat/completions \
 ## Hardware Requirements
-This is a large MoE model requiring significant GPU resources:
+This recipe uses `moe_config.backend: DEEPGEMM`, which requires **Blackwell GPUs (SM100+, e.g. B100/B200)**.
+DeepGEMM's FP8 grouped-GEMM kernels are designed for SM100/SM103 only and will crash on Hopper (SM90).
+> **Note:** To run on Hopper (H100/H200, SM90), remove the `moe_config` block from the ConfigMaps in
+> `trtllm/agg/deploy.yaml` and `trtllm/disagg/deploy.yaml`. This falls back to the default MoE backend at a modest throughput reduction.
 | Configuration | GPUs | Min GPU VRAM (Total) |
 |--------------|------|----------------------|
-| Aggregated | 16x H100/H200 | ~1.3TB |
+| Aggregated | 16x B100/B200 | ~1.3TB |
-| Disaggregated | 16x H100/H200 | ~1.3TB |
+| Disaggregated | 16x B100/B200 | ~1.3TB |
 ## Notes

--- a/recipes/qwen3-235b-a22b-fp8/trtllm/agg/deploy.yaml
+++ b/recipes/qwen3-235b-a22b-fp8/trtllm/agg/deploy.yaml
@@ -24,6 +24,7 @@ data:
      max_batch_size: 128
    disable_overlap_scheduler: false
    print_iter_log: false
+    # DEEPGEMM requires Blackwell (SM100+). Remove moe_config block to run on Hopper (SM90).
    moe_config:
      backend: DEEPGEMM
      max_num_tokens: 8192

--- a/recipes/qwen3-235b-a22b-fp8/trtllm/disagg/deploy.yaml
+++ b/recipes/qwen3-235b-a22b-fp8/trtllm/disagg/deploy.yaml
@@ -24,6 +24,7 @@ data:
      max_batch_size: 2
    disable_overlap_scheduler: true
    print_iter_log: false
+    # DEEPGEMM requires Blackwell (SM100+). Remove moe_config block to run on Hopper (SM90).
    moe_config:
      backend: DEEPGEMM
      max_num_tokens: 8192
@@ -52,6 +53,7 @@ data:
      max_batch_size: 512
    disable_overlap_scheduler: false
    print_iter_log: false
+    # DEEPGEMM requires Blackwell (SM100+). Remove moe_config block to run on Hopper (SM90).
    moe_config:
      backend: DEEPGEMM
      max_num_tokens: 8192