OpenDAS / vllm_cscc
Commit 75f64d8b
Authored Jul 12, 2024 by Cody Yu
Committed by GitHub on Jul 12, 2024
[Bugfix] Fix illegal memory access in FP8 MoE kernel (#6382)
Parent: 21b2dced
Showing 1 changed file with 5 additions and 3 deletions

vllm/model_executor/layers/fused_moe/fused_moe.py (+5 -3)
@@ -492,12 +492,14 @@ def fused_experts(hidden_states: torch.Tensor,
         if tokens_in_chunk == 0:
             break
 
-        if tokens_in_chunk < CHUNK_SIZE:
-            # will only happen in the last chunk
+        if tokens_in_chunk < CHUNK_SIZE and chunk > 0:
+            # Adjust the intermediate cache size and config for the last
+            # chunk. Note that in most cases we only have one chunk
+            # so the cache size and config are already set correctly and
+            # do not need to be adjusted.
             intermediate_cache1 = intermediate_cache1[:tokens_in_chunk]
             intermediate_cache2 = intermediate_cache2[:tokens_in_chunk]
             intermediate_cache3 = intermediate_cache3[:tokens_in_chunk]
-            # reload config to get better performance on the last chunk
             config = get_config_func(tokens_in_chunk)
 
         curr_topk_ids = topk_ids[begin_chunk_idx:end_chunk_idx]
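The effect of the added `chunk > 0` condition can be illustrated with a small tensor-free sketch. It is not the vLLM implementation: `plan_chunks` is a hypothetical helper, and the loop bounds (`num_tokens // chunk_size + 1` iterations, with `end` clamped to `num_tokens`) are an assumption about the surrounding code, which this hunk does not show. It only reports, for each chunk, whether the patched guard would re-slice the intermediate caches and reload the config.

```python
def plan_chunks(num_tokens: int, chunk_size: int) -> list[tuple[int, int, bool]]:
    """Return (begin, end, needs_adjust) for each chunk that would be processed.

    Hypothetical helper: mirrors only the control flow of the patched check,
    under the assumed loop bounds described above. No tensors are involved.
    """
    plan = []
    for chunk in range(num_tokens // chunk_size + 1):
        begin = chunk * chunk_size
        end = min((chunk + 1) * chunk_size, num_tokens)
        tokens_in_chunk = end - begin

        if tokens_in_chunk == 0:
            break

        # Patched guard: adjust only for a trailing partial chunk that
        # follows at least one earlier chunk. A single short chunk
        # (chunk == 0) is left alone.
        needs_adjust = tokens_in_chunk < chunk_size and chunk > 0
        plan.append((begin, end, needs_adjust))
    return plan


if __name__ == "__main__":
    for n in (3, 8, 10):
        print(n, plan_chunks(n, 4))
```

With `chunk_size=4`, a 3-token input yields one chunk with no adjustment, while a 10-token input adjusts only the final 2-token chunk. Before the patch, the 3-token case would also have taken the adjustment branch.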