OpenDAS / vllm_cscc
Commit 13baa653, authored Mar 23, 2026 by zhuwenwen
Merge tag 'v0.18.0' into v0.18.0-ori
Parents: 3fb4b5fa, bcf2be96
Changes: 5 changed files with 42 additions and 5 deletions (+42 -5)
Changed files:
.buildkite/test_areas/lm_eval.yaml  +16 -0
tests/evals/gsm8k/configs/Qwen3.5-35B-A3B-DEP2.yaml  +8 -0
tests/evals/gsm8k/configs/Qwen3.5-35B-A3B-FP8-DEP2.yaml  +9 -0
tests/evals/gsm8k/configs/models-qwen35-blackwell.txt  +2 -0
vllm/model_executor/layers/fused_moe/experts/trtllm_fp8_moe.py  +7 -5
.buildkite/test_areas/lm_eval.yaml
...
@@ -45,6 +45,22 @@ steps:
   commands:
   - pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-blackwell.txt
 
+- label: LM Eval Qwen3.5 Models (B200)
+  timeout_in_minutes: 120
+  device: b200
+  optional: true
+  num_devices: 2
+  source_file_dependencies:
+  - vllm/model_executor/models/qwen3_5.py
+  - vllm/model_executor/models/qwen3_5_mtp.py
+  - vllm/transformers_utils/configs/qwen3_5.py
+  - vllm/transformers_utils/configs/qwen3_5_moe.py
+  - vllm/model_executor/models/qwen3_next.py
+  - vllm/model_executor/models/qwen3_next_mtp.py
+  - vllm/model_executor/layers/fla/ops/
+  commands:
+  - pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-qwen35-blackwell.txt
+
 - label: LM Eval Large Models (H200)
   timeout_in_minutes: 60
   device: h200
...
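The new step lists `source_file_dependencies`, which suggests the step is meant to run when one of those paths changes. The following is a minimal sketch of such a check, assuming simple path-prefix matching. The function name `step_is_triggered` and the matching rule are hypothetical and are not taken from the pipeline generator; the `optional: true` flag is not modeled.

```python
# Hypothetical sketch: decide whether a CI step should run, given changed paths.
# Prefix matching is an assumption, not the pipeline generator's confirmed rule.

SOURCE_FILE_DEPENDENCIES = [
    "vllm/model_executor/models/qwen3_5.py",
    "vllm/model_executor/models/qwen3_5_mtp.py",
    "vllm/transformers_utils/configs/qwen3_5.py",
    "vllm/transformers_utils/configs/qwen3_5_moe.py",
    "vllm/model_executor/models/qwen3_next.py",
    "vllm/model_executor/models/qwen3_next_mtp.py",
    "vllm/model_executor/layers/fla/ops/",  # directory entry: matches files below it
]


def step_is_triggered(changed_files, dependencies=SOURCE_FILE_DEPENDENCIES):
    """Return True if any changed file starts with one of the dependency paths."""
    return any(
        path.startswith(dep) for path in changed_files for dep in dependencies
    )
```

Under this assumption, a change to any file below `vllm/model_executor/layers/fla/ops/` would trigger the step, while a change limited to unrelated files would not.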
tests/evals/gsm8k/configs/Qwen3.5-35B-A3B-DEP2.yaml
new file, mode 0 → 100644
+model_name: "Qwen/Qwen3.5-35B-A3B"
+accuracy_threshold: 0.86
+num_questions: 1319
+num_fewshot: 5
+server_args: >-
+  --max-model-len 4096
+  --data-parallel-size 2
+  --enable-expert-parallel
tests/evals/gsm8k/configs/Qwen3.5-35B-A3B-FP8-DEP2.yaml
new file, mode 0 → 100644
+model_name: "Qwen/Qwen3.5-35B-A3B-FP8"
+accuracy_threshold: 0.86
+num_questions: 1319
+num_fewshot: 5
+server_args: >-
+  --max-model-len 4096
+  --data-parallel-size 2
+  --enable-expert-parallel
+  --kv-cache-dtype fp8
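In both configs, `server_args` uses a YAML folded scalar (`>-`), so the indented lines are joined with single spaces and the trailing newline is dropped. The sketch below shows how the resulting string could be split into an argument list. The helper `build_serve_command` is hypothetical, and launching the server through `vllm serve` is an assumption about how the test harness consumes these values.

```python
import shlex

# Value of server_args from the FP8 config after YAML folding (">-" joins the
# lines with single spaces and strips the final newline).
SERVER_ARGS = (
    "--max-model-len 4096 --data-parallel-size 2 "
    "--enable-expert-parallel --kv-cache-dtype fp8"
)


def build_serve_command(model_name, server_args):
    """Hypothetical helper: build an argv list for launching the server.

    Assumes the harness starts the server via the `vllm serve` CLI.
    """
    return ["vllm", "serve", model_name, *shlex.split(server_args)]


if __name__ == "__main__":
    print(build_serve_command("Qwen/Qwen3.5-35B-A3B-FP8", SERVER_ARGS))
```

Using `shlex.split` rather than `str.split` keeps any quoted argument values intact, although none of the arguments in these two configs need quoting.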
tests/evals/gsm8k/configs/models-qwen35-blackwell.txt
new file, mode 0 → 100644
+Qwen3.5-35B-A3B-DEP2.yaml
+Qwen3.5-35B-A3B-FP8-DEP2.yaml
vllm/model_executor/layers/fused_moe/experts/trtllm_fp8_moe.py
...
@@ -253,23 +253,25 @@ class TrtLlmFp8ExpertsMonolithic(TrtLlmFp8ExpertsBase, mk.FusedMoEExpertsMonolit
         weight_key: QuantKey | None,
         activation_key: QuantKey | None,
     ) -> bool:
-        """Monolithic kernels need to express router support."""
+        """Monolithic kernels need to express router support.
+        Renormalize/RenormalizeNaive are excluded: the monolithic kernel's
+        internal routing for these methods produces output uncorrelated
+        with the modular kernel's output and with Triton kernel's output
+        for Qwen3.5-35B-A3B-FP8.
+        See: https://github.com/vllm-project/vllm/issues/37591
+        """
         # NOTE(dbari): TopK routing could also be enabled, but need to validate models
         # NOTE(dbari): Default is not implemented and should not be enabled until it is
         if (weight_key, activation_key) == (kFp8Static128BlockSym, kFp8Dynamic128Sym):
             # NOTE(rob): potentially allow others here. This is a conservative list.
             return routing_method in [
                 RoutingMethodType.DeepSeekV3,
-                RoutingMethodType.Renormalize,
-                RoutingMethodType.RenormalizeNaive,
             ]
         elif (weight_key, activation_key) == (kFp8StaticTensorSym, kFp8StaticTensorSym):
             # NOTE(dbari): as above, potentially allow others here.
             return routing_method in [
                 RoutingMethodType.DeepSeekV3,
                 RoutingMethodType.Llama4,
-                RoutingMethodType.Renormalize,
-                RoutingMethodType.RenormalizeNaive,
             ]
         else:
             raise ValueError("Unsupported quantization scheme.")
...
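The effect of this hunk can be summarized as an allow-list per quantization scheme, with `Renormalize` and `RenormalizeNaive` no longer accepted by the monolithic kernel. The following standalone sketch mirrors that logic under stated assumptions: the `RoutingMethodType` enum here is a stand-in with only the four members named in the diff, the quantization keys are represented by plain strings rather than the real `QuantKey` constants, and the function name `supports_routing_method` is hypothetical.

```python
from enum import Enum, auto


class RoutingMethodType(Enum):
    """Stand-in enum; only the members named in the diff are modeled."""
    DeepSeekV3 = auto()
    Llama4 = auto()
    Renormalize = auto()
    RenormalizeNaive = auto()


# Strings stand in for the QuantKey constants referenced in the diff.
BLOCK_FP8 = ("kFp8Static128BlockSym", "kFp8Dynamic128Sym")
TENSOR_FP8 = ("kFp8StaticTensorSym", "kFp8StaticTensorSym")

# Allow-lists after the change: the Renormalize variants are absent from both.
SUPPORTED_ROUTING = {
    BLOCK_FP8: {RoutingMethodType.DeepSeekV3},
    TENSOR_FP8: {RoutingMethodType.DeepSeekV3, RoutingMethodType.Llama4},
}


def supports_routing_method(routing_method, weight_key, activation_key):
    """Return True if the routing method is allowed for the quantization pair.

    Raises ValueError for an unknown (weight_key, activation_key) pair,
    matching the final branch of the diff.
    """
    allowed = SUPPORTED_ROUTING.get((weight_key, activation_key))
    if allowed is None:
        raise ValueError("Unsupported quantization scheme.")
    return routing_method in allowed
```

In this sketch, a model that routes with `Renormalize` gets `False` for either FP8 scheme, so the monolithic path would report it as unsupported rather than produce the mismatched output described in the docstring.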