OpenDAS
vllm_cscc
Commit 1092a467, authored Jul 17, 2025 by zhuwenwen
add VLLM_USE_PA_PRINT_PARAM to print fa-pa size
Parent: 91feb245
Showing 1 changed file with 6 additions and 0 deletions
vllm/attention/backends/rocm_flash_attn.py
@@ -970,6 +970,12 @@ class ROCmFlashAttentionImpl(AttentionImpl):
    tree_attention_masks_tensor = decode_meta.tree_attention_masks_tensor
    if envs.VLLM_USE_FLASH_ATTN_PA:
        from flash_attn import vllm_flash_attn_with_kvcache
        if envs.VLLM_USE_PA_PRINT_PARAM:
            print("PA SIZE:")
            print(f"q.shape = {decode_query.unsqueeze(1).shape}, key_cache.shape = {key_cache.shape}, value_cache.shape = {value_cache.shape}, kv_cache_dtype = {self.kv_cache_dtype}")
            print(f"block_size={block_size}, cache_seqlens.shape = {decode_meta.seq_lens_tensor.shape}, block_tables.shape = {decode_meta.block_tables.shape}")
            print(f"softmax_scale = {self.scale:.3f}, window_size = {self.sliding_window}, softcap = {self.logits_soft_cap}, alibi_slopes = {self.alibi_slopes}")
        # output[num_prefill_tokens:] = self.fa_decode_attn_func(
        output[num_prefill_tokens:] = vllm_flash_attn_with_kvcache(
            q=decode_query.unsqueeze(1),
    ...
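The hunk above gates a few debug prints behind envs.VLLM_USE_PA_PRINT_PARAM. The commit does not show how that flag is defined, so the following is only a minimal standalone sketch of the pattern, assuming the flag is read from an environment variable of the same name and parsed as a 0/1 integer. The helpers env_flag and format_pa_params are hypothetical names, shapes are passed as plain tuples instead of tensors, and the sample values are made up for illustration.

```python
import os


def env_flag(name: str, default: str = "0") -> bool:
    """Read an environment variable as a 0/1 integer flag (assumed convention)."""
    return bool(int(os.environ.get(name, default)))


def format_pa_params(q_shape, key_cache_shape, value_cache_shape,
                     kv_cache_dtype, block_size, softmax_scale):
    """Build debug lines similar to the ones printed in the hunk.

    Hypothetical helper: shapes are plain tuples here, so the sketch
    runs without torch or a GPU.
    """
    return [
        "PA SIZE:",
        f"q.shape = {q_shape}, key_cache.shape = {key_cache_shape}, "
        f"value_cache.shape = {value_cache_shape}, "
        f"kv_cache_dtype = {kv_cache_dtype}",
        f"block_size={block_size}, softmax_scale = {softmax_scale:.3f}",
    ]


if __name__ == "__main__":
    # Print only when the flag is set, mirroring the guard in the diff.
    if env_flag("VLLM_USE_PA_PRINT_PARAM"):
        # Illustrative values only; not taken from a real run.
        for line in format_pa_params((4, 1, 32, 128), (1024, 16, 8, 128),
                                     (1024, 16, 8, 128), "auto", 16,
                                     128 ** -0.5):
            print(line)
```

Saved under any file name, the sketch prints nothing by default and prints the parameter summary when the process is started with VLLM_USE_PA_PRINT_PARAM=1 in its environment.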