OpenDAS / vllm_cscc
Commit ac1c9342 (Unverified)
Authored Dec 19, 2025 by Isotr0py; committed by GitHub on Dec 19, 2025
Parent: 4924ac58

[Bugfix] Fix incorrect tiles creation for mm prefix triton attention (#30974)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Changes: 1 changed file with 10 additions and 4 deletions

vllm/attention/ops/triton_unified_attention.py (+10 -4), viewed at ac1c9342
@@ -189,9 +189,14 @@ def kernel_unified_attention_2d(
         + 1
     )
-    # adjust for potential padding in the last q_block by considering the
-    # actual sequence length
-    max_seq_prefix_len = tl.minimum(max_seq_prefix_len, seq_len)
+    if USE_MM_PREFIX:
+        # image bidirectional attention ranges require a full range
+        # including q_block padding to make sure doc mask is correct
+        max_seq_prefix_len = tl.maximum(max_seq_prefix_len, seq_len)
+    else:
+        # adjust for potential padding in the last q_block by considering the
+        # actual sequence length
+        max_seq_prefix_len = tl.minimum(max_seq_prefix_len, seq_len)
     # calculate the number of tiles that need to be processed to
     # cover the longest sequence prefix (due to causal masking, tiles beyond
...
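Going by the comments in the new code, the first hunk changes how far each Q-block scans along the key axis: without an MM prefix the causal bound is clipped down to `seq_len`, while with `USE_MM_PREFIX` it is raised to at least `seq_len` (keeping any Q-block padding), because bidirectional image ranges can need keys past the causal bound. The host-side sketch below models only that bound and the resulting tile count. The helper names `cdiv` and `num_tiles_for_q_block` are hypothetical, and the causal bound is taken as an input because its expression is not visible in this diff.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for non-negative ints (hypothetical helper)."""
    return -(-a // b)


def num_tiles_for_q_block(max_seq_prefix_len: int, seq_len: int,
                          tile_size: int, use_mm_prefix: bool) -> int:
    """Hypothetical model of how many KV tiles one Q-block visits.

    max_seq_prefix_len is the causal upper bound computed earlier in the
    kernel (assumed input; may exceed seq_len for a padded last Q-block).
    """
    if use_mm_prefix:
        # Bidirectional image ranges may reach keys beyond the causal bound,
        # so scan at least the whole sequence, padding included.
        bound = max(max_seq_prefix_len, seq_len)
    else:
        # Causal only: clip the padded bound of the last Q-block to seq_len.
        bound = min(max_seq_prefix_len, seq_len)
    return cdiv(bound, tile_size)


if __name__ == "__main__":
    seq_len, tile_size = 100, 32
    # Early Q-block whose causal bound covers 40 keys.
    print(num_tiles_for_q_block(40, seq_len, tile_size, use_mm_prefix=False))   # 2
    print(num_tiles_for_q_block(40, seq_len, tile_size, use_mm_prefix=True))    # 4
    # Padded last Q-block whose bound overshoots seq_len.
    print(num_tiles_for_q_block(130, seq_len, tile_size, use_mm_prefix=False))  # 4
    print(num_tiles_for_q_block(130, seq_len, tile_size, use_mm_prefix=True))   # 5
```

With the old code, the early Q-block in this example would stop after 2 tiles even if an image range starting inside it extends into tiles 3 and 4.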
@@ -202,7 +207,8 @@ def kernel_unified_attention_2d(
     # Default: keep previous global behavior
     tile_start = 0
     tile_end = num_tiles
-    if SLIDING_WINDOW > 0:
+    # TODO(Isotr0py): sliding window pruning with image bidirectional mask
+    if SLIDING_WINDOW > 0 and not USE_MM_PREFIX:
         # Query rows covered by this Q-block
         qpos_lo = q_block_local_idx * BLOCK_Q
         qpos_hi = tl.minimum(
...
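The second hunk only changes the guard; the pruning arithmetic itself is cut off in this view. To show why the guard matters, the sketch below uses a generic sliding-window tile pruning, not the kernel's truncated formula. It assumes `qpos_lo`/`qpos_hi` are absolute key-space positions and that the window keeps keys `k` with `q - sliding_window < k <= q`; the helper `tile_range` is hypothetical. Pruning of this kind drops tiles after the last query row, which a bidirectional image range may still need, so with an MM prefix the full `[0, num_tiles)` range is kept, as in the patched condition.

```python
from typing import Tuple


def tile_range(qpos_lo: int, qpos_hi: int, num_tiles: int, tile_size: int,
               sliding_window: int, use_mm_prefix: bool) -> Tuple[int, int]:
    """Return the half-open [tile_start, tile_end) range of KV tiles to visit.

    Hypothetical model: positions are absolute key indices, and the window
    keeps keys k with q - sliding_window < k <= q.
    """
    # Default: keep the full range, mirroring the kernel's default.
    tile_start, tile_end = 0, num_tiles
    # Mirror the patched guard: skip pruning when image ranges are bidirectional.
    if sliding_window > 0 and not use_mm_prefix:
        first_key = max(0, qpos_lo - sliding_window + 1)  # oldest key the first row sees
        last_key = qpos_hi                                # newest key the last row sees
        tile_start = first_key // tile_size
        tile_end = min(num_tiles, last_key // tile_size + 1)
    return tile_start, tile_end


if __name__ == "__main__":
    # 8 tiles of 16 keys; query rows 96..111 with a 32-key window.
    print(tile_range(96, 111, 8, 16, 32, use_mm_prefix=False))  # (4, 7)
    print(tile_range(96, 111, 8, 16, 32, use_mm_prefix=True))   # (0, 8)
```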