OpenDAS / vllm_cscc
Commit 3af22744, authored Mar 05, 2026 by lixh6
Fix: Extend MAX_VPT to 128 for large-scale MoE models (e.g., GLM4.5V-quantized model).
Parent: cfd6a543
Showing 1 changed file with 2 additions and 1 deletion
csrc/moe/moe_fused_gate.cu (+2, -1)
...
...
@@ -71,7 +71,8 @@ __device__ inline bool cmp_eq(const T& a, const T& b) {
 // Fixed constants common to both dynamic and static template versions:
 static constexpr int SIZE_WARP = 32;
 static constexpr int WARPS_PER_CTA = 6;
-static constexpr int MAX_VPT = 32;  // maximum VPT we support, > params.VPT = num_expert / num_expert_group
+// static constexpr int MAX_VPT = 32;  // maximum VPT we support, > params.VPT = num_expert / num_expert_group
+static constexpr int MAX_VPT = 128;  // Extend MAX_VPT from 32 to 128 to accommodate large-scale MoE models (e.g., GLM-4V-quantized model).
 
 // Create an alias for Array using AlignedArray
 template <typename T, int N>
...
...
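The code comment in the diff ties the limit to `params.VPT = num_expert / num_expert_group`. The sketch below illustrates why raising `MAX_VPT` from 32 to 128 matters for configurations with many experts per group. It is host-side C++ that would also compile inside a `.cu` file. The helpers `compute_vpt` and `vpt_supported`, the names `OLD_MAX_VPT` and `NEW_MAX_VPT`, and the expert counts are hypothetical and do not appear in the commit. It also assumes the bound is inclusive (`VPT <= MAX_VPT`) and that experts divide evenly into groups; the actual check in `moe_fused_gate.cu` may differ.

```cpp
// Limits before and after this commit (values taken from the diff).
static constexpr int OLD_MAX_VPT = 32;
static constexpr int NEW_MAX_VPT = 128;

// Hypothetical helper: values per thread, following the source comment
// "params.VPT = num_expert / num_expert_group".
constexpr int compute_vpt(int num_experts, int num_expert_group) {
  return num_experts / num_expert_group;
}

// Hypothetical helper: true when the configuration fits under max_vpt.
// Assumes an inclusive bound and an even split of experts across groups.
// The short-circuit on num_expert_group > 0 guards the division.
constexpr bool vpt_supported(int num_experts, int num_expert_group, int max_vpt) {
  return num_expert_group > 0 &&
         num_experts % num_expert_group == 0 &&
         compute_vpt(num_experts, num_expert_group) <= max_vpt;
}

// Illustrative configuration (not taken from any real model card):
// 128 experts in a single group gives VPT = 128.
static_assert(!vpt_supported(128, 1, OLD_MAX_VPT), "VPT = 128 exceeds the old limit of 32");
static_assert(vpt_supported(128, 1, NEW_MAX_VPT), "VPT = 128 fits the new limit of 128");

// 256 experts in 8 groups gives VPT = 32, which fit even before the change.
static_assert(vpt_supported(256, 8, OLD_MAX_VPT), "VPT = 32 fits the old limit");
```

Under these assumptions, a model whose experts-per-group ratio lies between 33 and 128 was rejected by the old constant and is accepted by the new one. If the kernel sizes per-thread arrays by `MAX_VPT`, the larger constant may also raise register or local-memory use, so checking occupancy after the change would be prudent.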