OpenDAS / vllm_cscc
Commit ba29ab44 (Unverified)
Authored Jan 18, 2026 by Andrey Khalyavin
Committed by GitHub, Jan 18, 2026

Use the same memory for workspace13 and fused_output. (#31531)

Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>

Parent: afc36226
Showing 1 changed file with 8 additions and 6 deletions
vllm/model_executor/layers/fused_moe/modular_kernel.py
@@ -848,15 +848,17 @@ class FusedMoEModularKernel(torch.nn.Module):
         # We can reuse the memory between cache1 and cache3 because by the
         # time we need cache3, we're done with cache1.
         # Construct the entire output that can then be processed in chunks.
-        # Reuse workspace13 for the output in the non-chunked case as long
-        # as it is large enough. This will not always be the case for standard
+        # Reuse workspace13 for the output in the non-chunked case.
+        # This will not always be the case for standard
         # format experts and with experts that have empty workspaces.
-        if num_chunks == 1 and prod(workspace13_shape) >= prod(fused_out_shape):
-            workspace13, workspace2 = current_workspace_manager().get_simultaneous(
-                (workspace13_shape, workspace_dtype),
+        if num_chunks == 1:
+            max_shape_size = max(prod(workspace13_shape), prod(fused_out_shape))
+            common_workspace, workspace2 = current_workspace_manager().get_simultaneous(
+                ((max_shape_size,), workspace_dtype),
                 (workspace2_shape, workspace_dtype),
             )
-            fused_out = _resize_cache(workspace13, fused_out_shape)
+            workspace13 = _resize_cache(common_workspace, workspace13_shape)
+            fused_out = _resize_cache(common_workspace, fused_out_shape)
         else:
             workspace13, workspace2, fused_out = (
                 current_workspace_manager().get_simultaneous(
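
Illustrative note (not part of the commit): the change sizes one flat buffer to the larger of the two element counts and then views a prefix of it under each shape, so the non-chunked path no longer depends on workspace13 already being big enough for the output. The sketch below mimics that idea with standard-library buffers instead of torch tensors. The names alloc-free helpers `resize_view` and `shared_workspace`, and the constants `ITEM` and `ITEM_SIZE`, are hypothetical stand-ins; `resize_view` is only assumed to behave roughly like a "take a prefix and reshape" helper and is not vLLM's `_resize_cache`.

```python
import struct
from math import prod

ITEM = "f"                          # native float stands in for workspace_dtype (assumption)
ITEM_SIZE = struct.calcsize(ITEM)   # bytes per element


def resize_view(buf: bytearray, shape: tuple[int, ...]) -> memoryview:
    """Hypothetical helper: view a prefix of `buf` as an array of `shape`."""
    needed = prod(shape) * ITEM_SIZE
    if needed > len(buf):
        raise ValueError("buffer too small for requested shape")
    # Slice the flat byte buffer to the exact size, then reinterpret it.
    return memoryview(buf)[:needed].cast(ITEM, shape)


def shared_workspace(workspace13_shape: tuple[int, ...],
                     fused_out_shape: tuple[int, ...]):
    """Allocate one buffer large enough for both shapes and view it twice."""
    max_shape_size = max(prod(workspace13_shape), prod(fused_out_shape))
    common = bytearray(max_shape_size * ITEM_SIZE)
    workspace13 = resize_view(common, workspace13_shape)
    fused_out = resize_view(common, fused_out_shape)
    return common, workspace13, fused_out


if __name__ == "__main__":
    # Here the output (4 * 2 elements) is larger than workspace13 (2 * 3),
    # so a guard of the form prod(ws13) >= prod(out) would be False.
    common, workspace13, fused_out = shared_workspace((2, 3), (4, 2))
    print(len(common), workspace13.shape, fused_out.shape)
```

Both views alias the same bytes, so this is only safe under the assumption stated in the diff's comments: by the time the output is written, the earlier contents are no longer needed.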