sglang: Commit 53f7874a (Unverified)
Authored Aug 09, 2025 by valarLip; committed by GitHub Aug 08, 2025

refine aiter_backend for mtp (#7279)

Co-authored-by: HAI <hixiao@gmail.com>

parent 61a46804
Showing 3 changed files with 387 additions and 107 deletions (+387 -107)

python/sglang/srt/layers/attention/aiter_backend.py  +370 -107
python/sglang/srt/managers/schedule_batch.py  +1 -0
python/sglang/srt/speculative/eagle_worker.py  +16 -0

python/sglang/srt/layers/attention/aiter_backend.py @ 53f7874a
This diff is collapsed.

python/sglang/srt/managers/schedule_batch.py @ 53f7874a
...
@@ -1722,6 +1722,7 @@ class ScheduleBatch(ScheduleBatchDisaggregationDecodeMixin):
                or attention_backend_str == "cutlass_mla"
                or attention_backend_str == "ascend"
                or attention_backend_str == "trtllm_mha"
                or attention_backend_str == "aiter"
                or global_server_args_dict["enable_two_batch_overlap"]
            ):
                seq_lens_cpu = (
...
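
The condition in this hunk is a chain of equality checks on the backend name, followed by a flag lookup. As a rough illustration only, the same check can be sketched as a set membership test. The helper name `needs_seq_lens_cpu`, the constant `VISIBLE_BACKENDS`, and the plain dict standing in for `global_server_args_dict` are assumptions made for this sketch and are not part of the commit; the set lists only the backend names visible in the hunk, and the elided lines above it may name others.

```python
# Hypothetical illustration; not code from the commit.
# Only the backend names visible in the hunk are listed here.
VISIBLE_BACKENDS = frozenset({"cutlass_mla", "ascend", "trtllm_mha", "aiter"})


def needs_seq_lens_cpu(attention_backend_str: str, server_args: dict) -> bool:
    """Return True when the visible part of the hunk's condition would hold."""
    return (
        attention_backend_str in VISIBLE_BACKENDS
        or bool(server_args.get("enable_two_batch_overlap", False))
    )


if __name__ == "__main__":
    # "aiter" is among the names the condition accepts in this hunk.
    print(needs_seq_lens_cpu("aiter", {"enable_two_batch_overlap": False}))
```

A set keeps the list of accepted names in one place, though the commit itself keeps the explicit `or` chain.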

python/sglang/srt/speculative/eagle_worker.py @ 53f7874a
...
@@ -226,6 +226,22 @@ class EAGLEWorker(TpModelWorker):
                self.draft_model_runner,
                skip_prefill=False,
            )
        elif self.server_args.attention_backend == "aiter":
            from sglang.srt.layers.attention.aiter_backend import (
                AiterAttnBackend,
                AiterMultiStepDraftBackend,
            )

            self.draft_attn_backend = AiterMultiStepDraftBackend(
                self.draft_model_runner,
                self.topk,
                self.speculative_num_steps,
            )
            self.draft_extend_attn_backend = AiterAttnBackend(
                self.draft_model_runner,
                skip_prefill=False,
            )
            self.has_prefill_wrapper_verify = False
        elif self.server_args.attention_backend == "fa3":
            from sglang.srt.layers.attention.flashattention_backend import (
                FlashAttentionBackend,
...
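
The `aiter` branch in this hunk builds two backends and sets one flag: a multi-step draft backend, a draft-extend backend, and `has_prefill_wrapper_verify = False`. The sketch below mirrors only that shape so it can run without sglang installed. The stub classes and the helper `select_draft_backends` are hypothetical stand-ins; the only details taken from the hunk are the constructor arguments and the flag value.

```python
from dataclasses import dataclass
from typing import Any, Tuple


# Stand-in classes so the sketch runs without sglang or aiter installed.
# They only record the arguments shown in the hunk; the real classes live in
# sglang.srt.layers.attention.aiter_backend.
@dataclass
class StubMultiStepDraftBackend:
    model_runner: Any
    topk: int
    speculative_num_steps: int


@dataclass
class StubAttnBackend:
    model_runner: Any
    skip_prefill: bool = False


def select_draft_backends(
    attention_backend: str, model_runner: Any, topk: int, num_steps: int
) -> Tuple[StubMultiStepDraftBackend, StubAttnBackend, bool]:
    """Hypothetical helper mirroring the shape of the new `aiter` branch."""
    if attention_backend == "aiter":
        draft = StubMultiStepDraftBackend(model_runner, topk, num_steps)
        draft_extend = StubAttnBackend(model_runner, skip_prefill=False)
        has_prefill_wrapper_verify = False
        return draft, draft_extend, has_prefill_wrapper_verify
    # Other backends are handled by other branches not covered here.
    raise ValueError(f"backend not covered by this sketch: {attention_backend!r}")
```

In the commit the import of the two backend classes sits inside the branch, so the aiter module is only loaded when that backend is selected.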