OpenDAS / ColossalAI · Commits

Unverified commit ce2cafae, authored Mar 29, 2023 by ver217, committed via GitHub on Mar 29, 2023

[coati] add repetition_penalty for inference (#3294)
Parent: a88ed0f8
Showing 1 changed file with 2 additions and 0 deletions:

applications/Chat/inference/server.py (+2, -0)
@@ -27,6 +27,7 @@ class GenerationTaskReq(BaseModel):
     top_k: Optional[int] = Field(default=None, gt=0, example=50)
     top_p: Optional[float] = Field(default=None, gt=0.0, lt=1.0, example=0.5)
     temperature: Optional[float] = Field(default=None, gt=0.0, lt=1.0, example=0.7)
+    repetition_penalty: Optional[float] = Field(default=None, gt=1.0, example=1.2)
 
 limiter = Limiter(key_func=get_remote_address)
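The new field constrains the penalty with pydantic's `gt=1.0`, so a request must either omit it or supply a value strictly greater than 1.0 (1.0 itself means "no penalty" and is rejected). A minimal sketch of that validation behavior, showing only the new field of the request model:

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class GenerationTaskReq(BaseModel):
    # Only the field added in this commit; the full model also has
    # max_new_tokens, top_k, top_p, and temperature.
    repetition_penalty: Optional[float] = Field(default=None, gt=1.0, example=1.2)


# A value above 1.0 passes validation.
req = GenerationTaskReq(repetition_penalty=1.2)
print(req.repetition_penalty)

# gt=1.0 is strict: exactly 1.0 fails validation.
try:
    GenerationTaskReq(repetition_penalty=1.0)
    print("accepted")
except ValidationError:
    print("rejected")
```

Omitting the field is also valid, since `default=None` makes it optional.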
@@ -55,6 +56,7 @@ app.add_middleware(
 def generate_streamingly(prompt, max_new_tokens, top_k, top_p, temperature):
     inputs = {k: v.cuda() for k, v in tokenizer(prompt, return_tensors="pt").items()}
+    #TODO(ver217): streaming generation does not support repetition_penalty now
     model_kwargs = {'max_generate_tokens': max_new_tokens,
                     'early_stopping': True,
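The TODO records that the streaming path does not apply the penalty yet. For context on what the parameter does in the non-streaming path: the usual (CTRL-style) repetition penalty, as implemented for example by Hugging Face's `RepetitionPenaltyLogitsProcessor`, rescales the logits of tokens that already appear in the generated sequence so repeats become less likely. A minimal pure-Python sketch (function name and list-based logits are illustrative, not from this commit):

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Rescale logits of already-generated tokens (CTRL-style).

    For penalty > 1.0, a positive logit is divided by the penalty and a
    negative logit is multiplied by it -- both changes push the token's
    score down, discouraging repetition.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


logits = [2.0, -1.0, 0.5]
# Tokens 0 and 1 were already generated: token 0's logit shrinks,
# token 1's logit becomes more negative, token 2 is untouched.
print(apply_repetition_penalty(logits, [0, 1], 1.2))
```

The `gt=1.0` bound on the request field matches this formulation: values above 1.0 penalize repetition, while 1.0 would leave the logits unchanged.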