OpenDAS / vllm_cscc
Commit a532c838 (Unverified)
Authored Feb 26, 2026 by gnovack; committed by GitHub Feb 27, 2026
use 'max_active_experts' for moe lora input size (#33197)
Signed-off-by: gnovack <gnovack@amazon.com>
Parent: 1e5ad9b7
Showing 2 changed files with 4 additions and 0 deletions (+4 -0)
tests/lora/test_moe_lora_align_sum.py (+2 -0)
vllm/lora/punica_wrapper/punica_gpu.py (+2 -0)
tests/lora/test_moe_lora_align_sum.py
@@ -47,6 +47,8 @@ def test_moe_lora_align_block_size(
     # compute paddings
     max_num_tokens_padded = topk_ids.numel() + num_experts * (block_size - 1)
     max_num_tokens_padded = round_up(max_num_tokens_padded, block_size)
+    if topk_ids.numel() < num_experts:
+        max_num_tokens_padded = topk_ids.numel() * block_size
     max_num_m_blocks = CEILDIV(max_num_tokens_padded, block_size)
     # init output tensors
...
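The padding arithmetic in this hunk can be sketched in plain Python, without torch. This is only an illustration under stated assumptions: the function name `max_num_tokens_padded` and the integer argument `numel` (standing in for `topk_ids.numel()`) are hypothetical, and `round_up` is reimplemented locally rather than imported from vLLM.

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple` (local stand-in)."""
    return -(-x // multiple) * multiple


def max_num_tokens_padded(numel: int, num_experts: int, block_size: int,
                          pad_sorted_ids: bool = True) -> int:
    # Hypothetical helper mirroring the arithmetic in the diff.
    # Generic bound: each expert may need up to block_size - 1 padding slots.
    padded = numel + num_experts * (block_size - 1)
    if pad_sorted_ids:
        padded = round_up(padded, block_size)
    # Fewer routed entries than experts: at most `numel` experts can be
    # active, so numel * block_size slots are enough.
    if numel < num_experts:
        padded = numel * block_size
    return padded


# 4 routed entries, 64 experts, block size 16.
old_bound = round_up(4 + 64 * (16 - 1), 16)   # bound without the new branch
new_bound = max_num_tokens_padded(4, 64, 16)  # bound with the new branch
print(old_bound, new_bound)  # 976 64
```

When `numel` is at least `num_experts` the new branch is not taken and the bound is unchanged.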
vllm/lora/punica_wrapper/punica_gpu.py
@@ -351,6 +351,8 @@ class PunicaWrapperGPU(PunicaWrapperBase):
         max_num_tokens_padded = topk_ids.numel() + num_experts * (block_size - 1)
         if pad_sorted_ids:
             max_num_tokens_padded = round_up(max_num_tokens_padded, block_size)
+        if topk_ids.numel() < num_experts:
+            max_num_tokens_padded = topk_ids.numel() * block_size
         sorted_ids = torch.empty(
             (max_loras * max_num_tokens_padded,),
             dtype=torch.int32,
...
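One way to see why the tighter bound is safe is a brute-force check over every possible routing of a few entries. The sketch below is an assumption-laden model, not the kernel: it considers a single LoRA slot, treats each routed entry as one expert id, and pads each active expert's count up to a multiple of `block_size`. The helper names `padded_total` and `worst_case_padded` are hypothetical.

```python
import itertools


def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple` (local stand-in)."""
    return -(-x // multiple) * multiple


def padded_total(expert_ids, block_size: int) -> int:
    # Count entries per active expert, then pad each count to a full block.
    counts = {}
    for e in expert_ids:
        counts[e] = counts.get(e, 0) + 1
    return sum(round_up(c, block_size) for c in counts.values())


def worst_case_padded(num_entries: int, num_experts: int, block_size: int) -> int:
    # Enumerate every assignment of entries to experts and keep the maximum.
    return max(
        padded_total(assignment, block_size)
        for assignment in itertools.product(range(num_experts), repeat=num_entries)
    )


# With fewer entries than experts, the worst case is one entry per expert,
# which needs exactly num_entries * block_size slots.
print(worst_case_padded(3, 5, 4))  # 12
```

In this model the worst case equals `num_entries * block_size`, which matches the value assigned in the added branch.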