OpenDAS / vllm_cscc
Commit caeb887b (Unverified)
Authored Feb 18, 2026 by Michael Goin; committed by GitHub on Feb 18, 2026
[Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 (#34725)
Signed-off-by: mgoin <mgoin64@gmail.com>
parent 6b3166a7
Changes: 4 changed files with 20 additions and 0 deletions (+20 -0)
tests/evals/gsm8k/configs/moe-refactor/Nemotron-Nano-30B-Fp8-ModelOpt-fi-trtllm.yaml  +8 -0
tests/evals/gsm8k/configs/moe-refactor/Nemotron-Nano-30B-NvFp4-ModelOpt-fi-cutlass.yaml  +8 -0
tests/evals/gsm8k/configs/moe-refactor/config-b200.txt  +2 -0
vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py  +2 -0
tests/evals/gsm8k/configs/moe-refactor/Nemotron-Nano-30B-Fp8-ModelOpt-fi-trtllm.yaml
0 → 100644
+model_name: "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8"
+accuracy_threshold: 0.29
+num_questions: 1319
+num_fewshot: 5
+server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2"
+env:
+  VLLM_USE_FLASHINFER_MOE_FP8: "1"
+  VLLM_FLASHINFER_MOE_BACKEND: "latency"
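To put `accuracy_threshold` and `num_questions` in concrete terms, the arithmetic can be sketched as follows. This assumes the eval passes when the fraction of correct answers is at least the threshold; the actual harness may apply a tolerance, and the helper name `min_correct_answers` is hypothetical, not part of the test code.

```python
import math
from fractions import Fraction


def min_correct_answers(accuracy_threshold: float, num_questions: int) -> int:
    """Smallest count of correct answers whose fraction meets the threshold.

    Hypothetical helper: assumes a plain `accuracy >= threshold` check.
    Fraction(str(...)) avoids binary floating-point rounding surprises.
    """
    exact = Fraction(str(accuracy_threshold)) * num_questions
    return math.ceil(exact)


# Values taken from the config above.
needed = min_correct_answers(0.29, 1319)
print(needed)
```

Under that assumption, the same bar applies to both new Nemotron configs, since they share the threshold and question count.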
tests/evals/gsm8k/configs/moe-refactor/Nemotron-Nano-30B-NvFp4-ModelOpt-fi-cutlass.yaml
0 → 100644
+model_name: "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4"
+accuracy_threshold: 0.29
+num_questions: 1319
+num_fewshot: 5
+server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2"
+env:
+  VLLM_USE_FLASHINFER_MOE_FP4: "1"
+  VLLM_FLASHINFER_MOE_BACKEND: "throughput"
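As a rough illustration of how a config like this could translate into a server launch, the sketch below builds an argument list and an environment overlay from the NVFP4 values. It assumes the server is started with `vllm serve <model> <server_args>` and that `env` entries are exported into the server process; how the eval harness actually does this is not shown in this commit, and `build_launch` is a hypothetical helper.

```python
import shlex

# Values copied from the NVFP4 config above; a real harness would parse the YAML file.
config = {
    "model_name": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4",
    "server_args": "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2",
    "env": {
        "VLLM_USE_FLASHINFER_MOE_FP4": "1",
        "VLLM_FLASHINFER_MOE_BACKEND": "throughput",
    },
}


def build_launch(cfg: dict) -> tuple[list[str], dict[str, str]]:
    """Return (argv, extra_env) for the server described by cfg (hypothetical helper)."""
    # shlex.split turns the single server_args string into separate CLI tokens.
    argv = ["vllm", "serve", cfg["model_name"], *shlex.split(cfg["server_args"])]
    return argv, dict(cfg.get("env", {}))


argv, extra_env = build_launch(config)
print(" ".join(f"{k}={v}" for k, v in extra_env.items()), shlex.join(argv))
```

The two new configs differ only in the model, the `VLLM_USE_FLASHINFER_MOE_FP8` versus `VLLM_USE_FLASHINFER_MOE_FP4` switch, and the `latency` versus `throughput` backend value.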
tests/evals/gsm8k/configs/moe-refactor/config-b200.txt
@@ -13,3 +13,5 @@ Llama-4-Scout-BF16-fi-cutlass.yaml
 Llama-4-Scout-BF16-triton.yaml
 Mixtral-8x7B-BF16-fi-cutlass.yaml
 Mixtral-8x7B-BF16-triton.yaml
+Nemotron-Nano-30B-Fp8-ModelOpt-fi-trtllm.yaml
+Nemotron-Nano-30B-NvFp4-ModelOpt-fi-cutlass.yaml
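The list file names one config per line. A minimal sketch of resolving those names to paths is shown below. It assumes the names are relative to the directory containing `config-b200.txt`, and that blank lines and `#` comments should be skipped. The comment handling and the helper name `resolve_config_list` are assumptions, not taken from the harness.

```python
from pathlib import PurePosixPath

CONFIG_DIR = PurePosixPath("tests/evals/gsm8k/configs/moe-refactor")


def resolve_config_list(list_text: str, config_dir: PurePosixPath = CONFIG_DIR) -> list[str]:
    """Map each non-empty, non-comment line to a path under config_dir (hypothetical helper)."""
    names = (line.strip() for line in list_text.splitlines())
    return [str(config_dir / name) for name in names if name and not name.startswith("#")]


# The two entries added by this commit.
added = """\
Nemotron-Nano-30B-Fp8-ModelOpt-fi-trtllm.yaml
Nemotron-Nano-30B-NvFp4-ModelOpt-fi-cutlass.yaml
"""
for path in resolve_config_list(added):
    print(path)
```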
vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py
@@ -122,6 +122,8 @@ def is_supported_config_trtllm(
         return False, _make_reason("routing method")
     elif activation_format != mk.FusedMoEActivationFormat.Standard:
         return False, _make_reason("activation format")
+    elif moe_config.hidden_dim % 512 != 0:
+        return False, _make_reason("hidden_dim must be divisible by 512")
     return True, None
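The added branch rejects configurations whose `hidden_dim` is not a multiple of 512. The standalone sketch below isolates that check. `MoEConfigStub` and `check_hidden_dim` are hypothetical stand-ins for illustration; the real function takes more arguments and formats its message through `_make_reason`, whose exact output is not shown in this diff. The two sizes used are illustrative values only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoEConfigStub:
    """Hypothetical stand-in carrying only the field the new check reads."""
    hidden_dim: int


def check_hidden_dim(moe_config: MoEConfigStub, alignment: int = 512) -> tuple[bool, Optional[str]]:
    """Mirror the (supported, reason) return shape of the diff for the new branch only."""
    if moe_config.hidden_dim % alignment != 0:
        return False, f"hidden_dim must be divisible by {alignment}"
    return True, None


print(check_hidden_dim(MoEConfigStub(hidden_dim=4096)))  # (True, None)
print(check_hidden_dim(MoEConfigStub(hidden_dim=2688)))  # (False, 'hidden_dim must be divisible by 512')
```

Returning a reason string alongside the boolean lets the caller report why this backend was not selected instead of failing later inside the kernel.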