OpenDAS / vllm_cscc
Commit e34d130c (Unverified)
Authored Jul 07, 2025 by Chenyaaang; committed by GitHub on Jul 08, 2025
[TPU] Temporary fix vmem oom for long model len by reducing page size (#20278)
Signed-off-by: Chenyaaang <chenyangli@google.com>
parent 7721ef17
Showing 1 changed file with 6 additions and 0 deletions
vllm/v1/attention/backends/pallas.py
...
@@ -86,6 +86,12 @@ class PallasAttentionBackend(AttentionBackend):
     # spill less likely. Meanwhile we make sure the page size is in [16, 256].
     @staticmethod
     def get_page_size(vllm_config: VllmConfig) -> int:
+        # TODO: This is a temporary fix for vmem OOM.
+        # For long model length, we use 16 page-size to avoid too much
+        # VMEM spill. A more robust solution should be implemented to
+        # handle VREG spills.
+        if vllm_config.model_config.max_model_len > 8192:
+            return 16
         page_size = next_power_of_2(
             vllm_config.model_config.max_model_len) // 16
         if page_size <= 16:
...
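To make the effect of the new early return concrete, here is a minimal standalone sketch of the page-size selection. It rests on several assumptions: it takes `max_model_len` directly instead of a `VllmConfig`, the local `next_power_of_2` is a stand-in for the helper named in the diff, and the final clamp to [16, 256] is inferred from the comment above the function, since the diff elides the rest of the body.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n (stand-in for the helper used in the diff)."""
    if n < 1:
        return 1
    return 1 << (n - 1).bit_length()


def get_page_size(max_model_len: int) -> int:
    # Added by this commit: long contexts fall back to the minimum page size.
    if max_model_len > 8192:
        return 16
    page_size = next_power_of_2(max_model_len) // 16
    # Assumed clamp to [16, 256], inferred from the comment in the diff;
    # the real function body past `if page_size <= 16:` is not shown.
    return max(16, min(256, page_size))


for length in (100, 2048, 8192, 8193):
    print(length, get_page_size(length))
```

Under these assumptions the selection is discontinuous at the threshold: a `max_model_len` of 8192 still yields a page size of 256, while 8193 drops straight to 16. This matches the commit's description of the change as a temporary fix.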