OpenDAS / vllm_cscc
Commit 24dc30f7 (Unverified)
Authored Jan 21, 2026 by Nick Hill
Committed by GitHub, Jan 21, 2026

[ModelRunner V2] Don't pin reused flashinfer tensors (#32799)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

parent 180fba65
Showing 1 changed file with 6 additions and 1 deletion
vllm/v1/attention/backends/flashinfer.py
@@ -603,7 +603,12 @@ class FlashInferMetadataBuilder(AttentionMetadataBuilder[FlashInferMetadata]):
                 "earlier GPUs."
             )
         # Preparing persistent buffers
-        self.pin_memory = is_pin_memory_available()
+        # Since we do not have explicit synchronization in ModelRunnerV2, we do not pin
+        # reused CPU buffers to avoid a race condition between step N async copies to
+        # GPU and step N+1 buffer updates.
+        self.pin_memory = (
+            not envs.VLLM_USE_V2_MODEL_RUNNER and is_pin_memory_available()
+        )
         self.paged_kv_indptr = self._make_buffer(max_num_reqs + 1)
         self.paged_kv_indptr_cpu_buffer = torch.zeros_like(
             self.paged_kv_indptr.cpu, pin_memory=self.pin_memory
...
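
The race described in the new comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not vLLM or CUDA code: `ToyStream`, `copy_async`, and `run_two_steps` are hypothetical names, plain Python lists stand in for tensors, and the model assumes that an async copy from a pinned buffer reads the source only when the transfer actually executes, while a copy from pageable memory is staged before the call returns.

```python
from collections import deque


class ToyStream:
    """Hypothetical stand-in for a GPU stream: copies are queued, then run later."""

    def __init__(self):
        self._pending = deque()

    def copy_async(self, src, dst, pinned):
        # Assumption of this toy model:
        # - pinned source: the transfer reads `src` when it executes, so later
        #   host writes to `src` are visible to the copy;
        # - pageable source: the data is staged (snapshotted) at enqueue time.
        source = src if pinned else list(src)

        def run():
            dst[:] = source

        self._pending.append(run)

    def synchronize(self):
        # Execute all queued copies in order.
        while self._pending:
            self._pending.popleft()()


def run_two_steps(pinned):
    stream = ToyStream()
    cpu_buffer = [0, 0, 0]       # persistent CPU buffer, reused every step
    gpu_step_n = [None] * 3      # destination of the step N copy

    cpu_buffer[:] = [0, 4, 9]    # step N fills the buffer
    stream.copy_async(cpu_buffer, gpu_step_n, pinned)

    cpu_buffer[:] = [0, 2, 5]    # step N+1 overwrites it without synchronizing
    stream.synchronize()         # the step N copy only runs now
    return gpu_step_n


if __name__ == "__main__":
    print("pinned:  ", run_two_steps(pinned=True))
    print("pageable:", run_two_steps(pinned=False))
```

In this model the pinned run delivers step N+1's values to the step N destination, while the pageable run delivers the values that were present when the copy was enqueued. That matches the motivation given in the diff comment for not pinning reused buffers when there is no explicit synchronization between steps.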