OpenDAS / AutoAWQ
Commit 299c460b (Unverified)
Authored Nov 11, 2023 by Casper. Committed by GitHub on Nov 11, 2023.

Fix cache util logic (#186)

Parent: 7c976752

Showing 1 changed file with 1 addition and 1 deletion

awq/utils/fused_utils.py (+1, -1)
...
@@ -7,7 +7,7 @@ def prepare_cache(blocks, seqlen: int) -> int:
         will_cache_be_exceeded = start_pos + seqlen > block.attn.max_seq_len

         # Reset and avoid retaining state when processing context
-        if seqlen > 1 and (will_cache_be_exceeded or seqlen > 1):
+        if seqlen > 1 and (will_cache_be_exceeded or start_pos > 0):
             block.attn.start_pos = block.attn.cache.roll_kv_n_steps(start_pos, n=start_pos)

         # Slowly roll out old tokens without performance hit if exceeded during decoding
...
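
The effect of the one-line change can be checked in isolation. The sketch below is a minimal illustration, not code from the repository: the helper names should_reset_old and should_reset_new are hypothetical, and max_seq_len is passed as a plain integer rather than read from block.attn. It shows that the old condition reduces to seqlen > 1, so the reset branch was taken on every multi-token call, while the new condition skips the reset when the cache is still at position 0 and the context fits.

```python
def should_reset_old(seqlen: int, start_pos: int, max_seq_len: int) -> bool:
    """Condition before the commit (hypothetical standalone helper)."""
    will_cache_be_exceeded = start_pos + seqlen > max_seq_len
    # The inner `seqlen > 1` repeats the outer test, so the whole
    # expression is equivalent to `seqlen > 1`.
    return seqlen > 1 and (will_cache_be_exceeded or seqlen > 1)


def should_reset_new(seqlen: int, start_pos: int, max_seq_len: int) -> bool:
    """Condition after the commit (hypothetical standalone helper)."""
    will_cache_be_exceeded = start_pos + seqlen > max_seq_len
    # Reset only if the cache would overflow or already holds state.
    return seqlen > 1 and (will_cache_be_exceeded or start_pos > 0)


if __name__ == "__main__":
    cases = [
        (16, 0, 2048),   # fresh cache, context fits
        (16, 32, 2048),  # cache already holds 32 positions
        (16, 0, 8),      # context alone exceeds the cache
        (1, 32, 2048),   # single-token decoding step
    ]
    for seqlen, start_pos, max_seq_len in cases:
        old = should_reset_old(seqlen, start_pos, max_seq_len)
        new = should_reset_new(seqlen, start_pos, max_seq_len)
        print(f"seqlen={seqlen} start_pos={start_pos} max={max_seq_len}: old={old} new={new}")
```

The two conditions differ only in the first case: a multi-token context arriving on a cache whose start_pos is 0 and that has room for it.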