sglang / Commits

Commit 2f47d710 (Unverified)
Authored Feb 10, 2025 by Xiaoyu Zhang; committed by GitHub, Feb 10, 2025

refine some typo (#3473)

parent 4fe92bfc
Showing 4 changed files with 4 additions and 4 deletions.
benchmark/kernels/fused_moe_triton/benchmark_torch_compile_fused_moe.py (+1 -1)
python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py (+1 -1)
python/sglang/srt/layers/moe/topk.py (+1 -1)
python/sglang/srt/server_args.py (+1 -1)
benchmark/kernels/fused_moe_triton/benchmark_torch_compile_fused_moe.py
@@ -30,7 +30,7 @@ def get_model_config(model_name: str, tp_size: int):
         topk = config.num_experts_per_tok
         intermediate_size = config.moe_intermediate_size
         shard_intermediate_size = 2 * intermediate_size // tp_size
-    elif config.architectures[0] == "DeepseekV2ForCausalLM":
+    elif config.architectures[0] in ["DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM"]:
         E = config.n_routed_experts
         topk = config.num_experts_per_tok
         intermediate_size = config.intermediate_size
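To illustrate the dispatch this hunk extends: the equality check becomes a membership test so that DeepseekV3 checkpoints, whose configs expose the same MoE fields, take the same path. Below is a minimal sketch under that assumption; get_deepseek_moe_shapes is a hypothetical helper, not the benchmark's get_model_config, while the architecture names and config attributes come from the diff.

from transformers import AutoConfig

def get_deepseek_moe_shapes(model_name: str, tp_size: int):
    # Hypothetical helper for illustration only.
    config = AutoConfig.from_pretrained(model_name)
    # Membership test mirrors the change above: one branch now serves
    # both the DeepSeek V2 and V3 architectures.
    if config.architectures[0] not in ["DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM"]:
        raise ValueError(f"unsupported architecture: {config.architectures[0]}")
    E = config.n_routed_experts        # number of routed experts
    topk = config.num_experts_per_tok  # experts activated per token
    intermediate_size = config.intermediate_size
    # Gate and up projections are fused (factor 2) and sharded across
    # tensor-parallel ranks.
    shard_intermediate_size = 2 * intermediate_size // tp_size
    return E, topk, shard_intermediate_size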
python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py
@@ -1094,7 +1094,7 @@ def fused_moe(
     - num_expert_group: Optional[int]: additional parameter for grouped_topk
     - topk_group: Optional[int]: additional parameter for grouped_topk
     - use_grouped_topk: If True, use grouped_topk instead of fused_topk
-        note: Deepseek v2 model uses grouped_topk
+        note: Deepseek V2/V3/R1 series models use grouped_topk
     - use_fp8_w8a8 (bool): If True, use fp8 arithmetic to compute the inner
         products for w1 and w2. Defaults to False.
     - use_int8_w8a16 (bool): If True, use fp8 arithmetic to compute the inner
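The note being reworded concerns grouped_topk routing. As context, a minimal pure-PyTorch sketch of the grouped top-k idea: experts are partitioned into groups, each token keeps only its best topk_group groups, and an ordinary top-k then selects experts among the survivors. The function name and shapes are illustrative assumptions, not sglang's exact kernel.

import torch

def grouped_topk_sketch(
    gating_output: torch.Tensor,  # [num_tokens, num_experts]
    topk: int,
    num_expert_group: int,
    topk_group: int,
    renormalize: bool = True,
):
    scores = torch.softmax(gating_output, dim=-1)
    num_tokens, num_experts = scores.shape
    # Score each group by its strongest expert.
    group_scores = scores.view(num_tokens, num_expert_group, -1).max(dim=-1).values
    # Keep the best `topk_group` groups per token, zero out the rest.
    group_idx = torch.topk(group_scores, k=topk_group, dim=-1, sorted=False).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1)
    score_mask = (
        group_mask.unsqueeze(-1)
        .expand(num_tokens, num_expert_group, num_experts // num_expert_group)
        .reshape(num_tokens, -1)
    )
    masked_scores = scores.masked_fill(score_mask == 0, 0.0)
    # Ordinary top-k over the surviving experts.
    topk_weights, topk_ids = torch.topk(masked_scores, k=topk, dim=-1, sorted=False)
    if renormalize:
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids

# Example: route 4 tokens over 16 experts arranged in 4 groups.
weights, ids = grouped_topk_sketch(torch.randn(4, 16), topk=2, num_expert_group=4, topk_group=2)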
python/sglang/srt/layers/moe/topk.py
@@ -75,7 +75,7 @@ def fused_topk(
     return topk_weights, topk_ids


-# This is used by the Deepseek-V2 model
+# This is used by the Deepseek V2/V3/R1 series models
 @torch.compile(dynamic=True, backend=get_compiler_backend())
 def grouped_topk(
     hidden_states: torch.Tensor,
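The decorator visible in this hunk compiles grouped_topk with dynamic shapes, since the token count varies from batch to batch. Below is a self-contained sketch of the same pattern; the "inductor" backend string is an assumption here, as sglang resolves its backend through get_compiler_backend().

import torch

# Sketch: compile a small routing-style function with dynamic shapes.
@torch.compile(dynamic=True, backend="inductor")
def normalized_topk(scores: torch.Tensor, k: int):
    weights, ids = torch.topk(torch.softmax(scores, dim=-1), k=k, dim=-1)
    return weights / weights.sum(dim=-1, keepdim=True), ids

weights, ids = normalized_topk(torch.randn(8, 64), k=2)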
python/sglang/srt/server_args.py
@@ -795,7 +795,7 @@ class ServerArgs:
         parser.add_argument(
             "--disable-mla",
             action="store_true",
-            help="Disable Multi-head Latent Attention (MLA) for DeepSeek-V2.",
+            help="Disable Multi-head Latent Attention (MLA) for DeepSeek V2/V3/R1 series models.",
         )
         parser.add_argument(
             "--disable-overlap-schedule",
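For reference, a runnable sketch of the store_true pattern this hunk edits: the flag defaults to False and flips to True when passed on the command line. Only the flag name and help text come from the diff; the standalone parser is illustrative.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--disable-mla",
    action="store_true",
    help="Disable Multi-head Latent Attention (MLA) for DeepSeek V2/V3/R1 series models.",
)
args = parser.parse_args(["--disable-mla"])
assert args.disable_mla is True  # defaults to False when the flag is absent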