OpenDAS / text-generation-inference / Commits

Commit 72501097, authored Jan 24, 2025 by xuxzh1
Parent: d4bccff3

update rocm.py
Showing 1 changed file with 2 additions and 3 deletions.
server/text_generation_server/layers/attention/rocm.py (+2, -3)
@@ -80,7 +80,6 @@ def paged_attention(
         _PARTITION_SIZE = _PARTITION_SIZE_V1V2
-    else:
-        _PARTITION_SIZE = _PARTITION_SIZE_CUSTOM
+    _PARTITION_SIZE = 512
     max_num_partitions = (max_s + _PARTITION_SIZE - 1) // _PARTITION_SIZE
     input_lengths = seqlen.input_lengths + seqlen.cache_lengths
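
For context on the first hunk: _PARTITION_SIZE feeds the ceiling division on the unchanged max_num_partitions line, so hardcoding it to 512 fixes how many partitions the paged-attention kernel splits each sequence into. A minimal sketch of that arithmetic, where the concrete max_s value is an example and not from the commit:

# Sketch of the partition count computed in paged_attention (see hunk above).
# _PARTITION_SIZE = 512 comes from the diff; max_s = 4096 is an assumed example.
_PARTITION_SIZE = 512
max_s = 4096  # maximum sequence length in the batch (example value)

# Ceiling division: number of partitions needed to cover max_s tokens.
max_num_partitions = (max_s + _PARTITION_SIZE - 1) // _PARTITION_SIZE
assert max_num_partitions == 8  # 4096 tokens / 512 per partition

# A non-multiple still rounds up, e.g. 4097 tokens -> 9 partitions.
assert (4097 + _PARTITION_SIZE - 1) // _PARTITION_SIZE == 9
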
@@ -234,7 +233,7 @@ def attention(
             softcap = 0.0
         # We do not need to check window_size_left (not supported) here, so it is already checked ahead of time at model load.
-        return flash_attn_2_cuda.varlen_fwd(
+        return flash_attn_2_cuda.varlen_fwd(
             query,
             key,
             value,
@@ -257,7 +256,7 @@ def attention(
             False,
             None,
         )[0]
     elif ENGINE == "triton":
         from .flash_attn_triton import triton_attention
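
The second and third hunks sit inside the ENGINE dispatch in attention(): one branch returns flash_attn_2_cuda.varlen_fwd(...)[0], and the elif branch imports a Triton kernel instead. Below is a self-contained sketch of that dispatch pattern only; the "ck" engine name and the stub backends are assumptions so the snippet runs without ROCm, flash-attn, or Triton installed, and they are not the file's actual code.

# Structural sketch of the ENGINE-based backend dispatch seen in attention().
ENGINE = "triton"  # in the real module this is chosen at import time

def _ck_attention(query, key, value):
    # Stand-in for flash_attn_2_cuda.varlen_fwd(...); the real call returns a
    # tuple and the code above keeps element 0 as the attention output.
    return ("ck-output", query, key, value)[0]

def _triton_attention(query, key, value):
    # Stand-in for the triton_attention kernel imported in the elif branch.
    return "triton-output"

def attention(query, key, value):
    if ENGINE == "ck":
        return _ck_attention(query, key, value)
    elif ENGINE == "triton":
        return _triton_attention(query, key, value)
    raise RuntimeError(f"Unsupported attention engine: {ENGINE}")

print(attention("q", "k", "v"))  # -> "triton-output"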