OpenDAS / text-generation-inference · Commits
Commit 4b460e72 (unverified), authored Apr 21, 2023 by OlivierDehaene, committed by GitHub on Apr 21, 2023

fix(server): fix flash batch filtering (#220)
parent 1ffea36e
Showing 1 changed file with 4 additions and 3 deletions.
server/text_generation_server/models/flash_causal_lm.py (+4 -3)
@@ -188,9 +188,10 @@ class FlashCausalLMBatch(Batch):
             position_ids.append(self.position_ids[idx])
             cu_seqlens.append(cumulative_length + request_input_length)
             max_seqlen = max(max_seqlen, request_input_length)
+            # True index for past
+            past_key_values.append(self.past_key_values[2 * idx])
+
             if not single_request:
-                # True index for past
-                past_key_values.append(self.past_key_values[2 * idx])
                 # Add one padding
                 past_key_values.append(self.past_pad)
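
The first hunk moves the "True index for past" gather out of the "if not single_request:" branch, so it also runs when a batch is filtered down to a single request. Below is a minimal sketch of the interleaved past layout that the 2 * idx indexing implies; the tensor shapes and variable names are assumptions for illustration, not the repository's code:

    import torch

    # Hypothetical layout: for a batch of N requests, the past list
    # interleaves real pasts with padding entries,
    #   [past_0, pad, past_1, pad, ..., past_{N-1}, pad],
    # so request idx's real ("true") past sits at index 2 * idx.
    pasts = []
    for seq_len in (3, 5, 4):                         # three requests
        pasts.append(torch.randn(2, seq_len, 8, 64))  # real past
        pasts.append(torch.zeros(2, 1, 8, 64))        # padding entry

    kept = [2]                                    # keep only request 2
    filtered = [pasts[2 * idx] for idx in kept]   # true index for past
    # Before this commit the gather was skipped when only one request
    # was kept, so the bs = 1 branch (second hunk) fell back to the
    # unfiltered batch's first past, which belongs to the wrong request
    # unless the kept request happened to be at index 0.
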
@@ -209,7 +210,7 @@ class FlashCausalLMBatch(Batch):
         if single_request:
             # Preallocate tensor for bs = 1 case
             past_key_values = torch.nn.functional.pad(
-                self.past_key_values[0],
+                past_key_values[0],
                 (0, 0,
...
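
The second hunk makes the bs = 1 preallocation pad the past gathered above (past_key_values[0], the kept request's true past) instead of self.past_key_values[0], the first entry of the unfiltered batch. Continuing the hypothetical shapes from the sketch above; the real pad widths are truncated on this page, so the tuple below only illustrates growing the sequence dimension by one slot:

    import torch.nn.functional as F

    # F.pad orders pad widths from the last dimension backwards, so with
    # a [2, seq_len, heads, head_dim] past, (0, 0, 0, 0, 0, 1) grows
    # dim -3 (seq_len) by one preallocated token slot.
    past = filtered[0]                     # from the sketch above
    preallocated = F.pad(past, (0, 0, 0, 0, 0, 1))
    print(preallocated.shape)              # torch.Size([2, 5, 8, 64])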