OpenDAS / text-generation-inference · Commit 96a982ad

fix: better warmup error

Authored Oct 25, 2023 by OlivierDehaene
Parent: f9910d13
Changes: 1 changed file with 1 addition and 1 deletion (+1, -1)

server/text_generation_server/models/flash_causal_lm.py
@@ -670,7 +670,7 @@ class FlashCausalLM(Model):
                 self.device,
             )
             _, batch = self.generate_token(batch)
-        except Exception as e:
+        except torch.cuda.OutOfMemoryError as e:
             raise RuntimeError(
                 f"Not enough memory to handle {len(batch.input_ids)} prefill tokens. "
                 f"You need to decrease `--max-batch-prefill-tokens`"
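The change narrows the warmup `except` clause: only CUDA out-of-memory errors are translated into the actionable "decrease `--max-batch-prefill-tokens`" hint, while any other exception now propagates with its original traceback instead of being masked. A minimal sketch of this pattern, using the builtin `MemoryError` as a stand-in for `torch.cuda.OutOfMemoryError` and a hypothetical `warmup` helper so it runs without a GPU:

```python
def warmup(allocate):
    """Dry-run an allocation; translate only OOM into an actionable error."""
    try:
        allocate()
    except MemoryError as e:  # stand-in for torch.cuda.OutOfMemoryError
        raise RuntimeError(
            "Not enough memory to handle the prefill batch. "
            "You need to decrease `--max-batch-prefill-tokens`"
        ) from e


def oom():
    raise MemoryError("out of memory (simulated)")


def bug():
    raise ValueError("unrelated bug in the model code")


# OOM is translated into the actionable RuntimeError...
try:
    warmup(oom)
except RuntimeError as e:
    print(type(e).__name__)  # RuntimeError

# ...while any other failure propagates unchanged, which is the point
# of narrowing the except clause: a real bug no longer gets reported
# as a memory problem.
try:
    warmup(bug)
except ValueError as e:
    print(type(e).__name__)  # ValueError
```

Note the `raise ... from e`: chaining preserves the original OOM traceback underneath the user-facing hint, so both the cause and the remedy are visible.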