OpenDAS / vllm_cscc
Commit c03a553b, authored Dec 17, 2025 by 王敏
[feat] w8a8 high-throughput mode: quantize first, then dispatch
parent 4fadef92
Showing 1 changed file with 3 additions and 1 deletion
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe_marlin.py
@@ -520,6 +520,8 @@ class CompressedTensorsW8A8Int8MarlinMoEMethod(CompressedTensorsMarlinMoEMethod)
                 False
             )
         return TritonOrGroupGemmExperts(
-            use_int8_w8a8=envs.VLLM_ENABLE_DEEPEP_HT_DEEPGEMM,
+            #use_int8_w8a8=envs.VLLM_ENABLE_DEEPEP_HT_DEEPGEMM,
+            use_int8_w8a8=True,
+            per_act_token_quant=True,
             fused_experts=self.w8a8_groupgemm_contiguous_forward if envs.VLLM_ENABLE_DEEPEP_HT_DEEPGEMM else self.fused_moe_forward
         )
\ No newline at end of file
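Note on the change: the commit title says activations are quantized before dispatch, and the diff turns on `use_int8_w8a8` and `per_act_token_quant` unconditionally. The sketch below is not code from this repository. It is a minimal pure-Python illustration of what per-token symmetric int8 quantization followed by a dispatch step might look like, assuming one scale per token and top-1 routing. The names `quantize_per_token_int8`, `dequantize`, and `dispatch` are hypothetical, and the real path presumably runs fused GPU kernels over tensors rather than Python lists.

```python
def quantize_per_token_int8(rows):
    """Symmetric per-token int8 quantization: one scale per row (token).

    Hypothetical helper for illustration only, not the vLLM implementation.
    """
    q_rows, scales = [], []
    for row in rows:
        absmax = max((abs(x) for x in row), default=0.0)
        # An all-zero token gets scale 1.0 to avoid dividing by zero.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric int8 range.
        q = [max(-127, min(127, round(x / scale))) for x in row]
        q_rows.append(q)
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover approximate float values from int8 codes and per-token scales."""
    return [[v * s for v in q] for q, s in zip(q_rows, scales)]


def dispatch(q_rows, scales, expert_ids, num_experts):
    """Group already-quantized tokens, with their scales, by destination expert.

    Assumes top-1 routing: expert_ids[i] is the single expert for token i.
    """
    buckets = [[] for _ in range(num_experts)]
    for q, s, e in zip(q_rows, scales, expert_ids):
        buckets[e].append((q, s))
    return buckets


if __name__ == "__main__":
    tokens = [[1.0, -2.0, 0.5], [0.0, 0.0, 0.0]]
    q_rows, scales = quantize_per_token_int8(tokens)
    # Quantize first, then dispatch the int8 payload plus one scale per token.
    buckets = dispatch(q_rows, scales, expert_ids=[1, 0], num_experts=2)
    print(buckets)
```

In this ordering, each dispatched token carries int8 values plus a single scale instead of wider floating-point activations, which is presumably the motivation for quantizing ahead of the dispatch step in the high-throughput mode.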