Fix: Extend MAX_VPT from 32 to 256 to accommodate large-scale MoE models (e.g., GLM-5-quantized model).

Commit 03a3c522 authored Mar 11, 2026 by lixh6

Showing with 1 addition and 1 deletion

csrc/moe/moe_fused_gate.cu +1 -1

--- a/csrc/moe/moe_fused_gate.cu
+++ b/csrc/moe/moe_fused_gate.cu
@@ -72,7 +72,7 @@ __device__ inline bool cmp_eq(const T& a, const T& b) {
 static constexpr int SIZE_WARP = 32;
 static constexpr int WARPS_PER_CTA = 6;
 // static constexpr int MAX_VPT = 32;  // maximum VPT we support, > params.VPT = num_expert / num_expert_group
-static constexpr int MAX_VPT = 128; // Extend MAX_VPT from 32 to 128 to accommodate large-scale MoE models (e.g., GLM-4V-quantized model).
+static constexpr int MAX_VPT = 256; // Extend MAX_VPT from 32 to 256 to accommodate large-scale MoE models (e.g., GLM-5-quantized model).
 // Create an alias for Array using AlignedArray
 template <typename T, int N>