norm / vllm · Commits

Commit ba84b872, authored Feb 23, 2023 by Woosuk Kwon

Fix attention

parent 87e0bcd4
Showing 1 changed file with 5 additions and 1 deletion.
cacheflow/models/attention.py (view file @ ba84b872)

-from typing import Optional, Tuple
+from typing import Optional

 import torch
 import torch.nn as nn
 ...
@@ -24,8 +24,12 @@ class OPTCacheFlowAttention(nn.Module):
         key: torch.Tensor,
         value: torch.Tensor,
     ) -> None:
+        query = query.unsqueeze(0)
+        key = key.unsqueeze(0)
+        value = value.unsqueeze(0)
         out = xops.memory_efficient_attention(
             query, key, value, attn_bias=self.attention_mask, scale=self.scale)
+        out = out.squeeze(0)
         # FIXME(woosuk): Directly write the attention output.
         output.copy_(out, non_blocking=True)
 ...
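For context on why the fix is needed: xFormers' memory_efficient_attention interprets a 4-D input as [batch, seq_len, num_heads, head_dim], while a 3-D input is treated as [batch, seq_len, head_dim]. The tensors here are 3-D [num_tokens, num_heads, head_dim], so without the added batch dimension they would be misread as a batch of single-head sequences. A minimal sketch of the shape handling, assuming xformers is installed and a GPU is available (the sizes and the fp16 dtype are illustrative, not from the commit):

    # Sketch of the shape fix: add a batch dim of 1 before the call,
    # remove it afterwards. Illustrative sizes only.
    import torch
    import xformers.ops as xops

    device = "cuda"  # xformers attention kernels generally require a GPU
    num_tokens, num_heads, head_dim = 16, 8, 64
    query = torch.randn(num_tokens, num_heads, head_dim,
                        device=device, dtype=torch.float16)
    key = torch.randn_like(query)
    value = torch.randn_like(query)

    out = xops.memory_efficient_attention(
        query.unsqueeze(0),      # -> [1, num_tokens, num_heads, head_dim]
        key.unsqueeze(0),
        value.unsqueeze(0),
        scale=head_dim ** -0.5,  # plays the role of self.scale in the commit
    )
    out = out.squeeze(0)         # back to [num_tokens, num_heads, head_dim]
    assert out.shape == (num_tokens, num_heads, head_dim)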