Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
OpenDAS
vllm_cscc
Commits
3c8a7872
Unverified
Commit
3c8a7872
authored
Aug 19, 2025
by
Daniel Serebrenik
Committed by
GitHub
Aug 19, 2025
Browse files
[Benchmark] Add flag --served-model-name to benchmark_serving_multi_turn (#22889)
Signed-off-by:
daniels
<
daniels@pliops.com
>
parent
01a08739
Changes
2
Show whitespace changes
Inline
Side-by-side
Showing
2 changed files
with
20 additions
and
6 deletions
+20
-6
benchmarks/multi_turn/README.md
benchmarks/multi_turn/README.md
+7
-5
benchmarks/multi_turn/benchmark_serving_multi_turn.py
benchmarks/multi_turn/benchmark_serving_multi_turn.py
+13
-1
No files found.
benchmarks/multi_turn/README.md
View file @
3c8a7872
...
@@ -5,11 +5,13 @@ The requirements (pip) for `benchmark_serving_multi_turn.py` can be found in `re
...
@@ -5,11 +5,13 @@ The requirements (pip) for `benchmark_serving_multi_turn.py` can be found in `re
First start serving your model
First start serving your model
```
bash
```
bash
export
MODEL_
NAME
=
/models/meta-llama/Meta-Llama-3.1-8B-Instruct/
export
MODEL_
PATH
=
/models/meta-llama/Meta-Llama-3.1-8B-Instruct/
vllm serve
$MODEL_
NAME
--disable-log-requests
vllm serve
$MODEL_
PATH
--served-model-name
Llama
--disable-log-requests
```
```
The variable
`MODEL_PATH`
should be a path to the model files (e.g. downloaded from huggingface).
## Synthetic Multi-Turn Conversations
## Synthetic Multi-Turn Conversations
Download the following text file (used for generation of synthetic conversations)
Download the following text file (used for generation of synthetic conversations)
...
@@ -26,10 +28,10 @@ But you may use other text files if you prefer (using this specific file is not
...
@@ -26,10 +28,10 @@ But you may use other text files if you prefer (using this specific file is not
Then run the benchmarking script
Then run the benchmarking script
```
bash
```
bash
export
MODEL_
NAME
=
/models/meta-llama/Meta-Llama-3.1-8B-Instruct/
export
MODEL_
PATH
=
/models/meta-llama/Meta-Llama-3.1-8B-Instruct/
python benchmark_serving_multi_turn.py
--model
$MODEL_
NAME
--input-file
generate_multi_turn.json
\
python benchmark_serving_multi_turn.py
--model
$MODEL_
PATH
--served-model-name
Llama
\
--num-clients
2
--max-active-conversations
6
--input-file
generate_multi_turn.json
--num-clients
2
--max-active-conversations
6
```
```
You can edit the file
`generate_multi_turn.json`
to change the conversation parameters (number of turns, etc.).
You can edit the file
`generate_multi_turn.json`
to change the conversation parameters (number of turns, etc.).
...
...
benchmarks/multi_turn/benchmark_serving_multi_turn.py
View file @
3c8a7872
...
@@ -825,9 +825,11 @@ def get_client_config(
...
@@ -825,9 +825,11 @@ def get_client_config(
# Arguments for API requests
# Arguments for API requests
chat_url
=
f
"
{
args
.
url
}
/v1/chat/completions"
chat_url
=
f
"
{
args
.
url
}
/v1/chat/completions"
model_name
=
args
.
served_model_name
if
args
.
served_model_name
else
args
.
model
req_args
=
RequestArgs
(
req_args
=
RequestArgs
(
chat_url
=
chat_url
,
chat_url
=
chat_url
,
model
=
args
.
model
,
model
=
model
_name
,
stream
=
not
args
.
no_stream
,
stream
=
not
args
.
no_stream
,
limit_min_tokens
=
args
.
limit_min_tokens
,
limit_min_tokens
=
args
.
limit_min_tokens
,
limit_max_tokens
=
args
.
limit_max_tokens
,
limit_max_tokens
=
args
.
limit_max_tokens
,
...
@@ -1247,9 +1249,19 @@ async def main() -> None:
...
@@ -1247,9 +1249,19 @@ async def main() -> None:
default
=
0
,
default
=
0
,
help
=
"Seed for random number generators (default: 0)"
,
help
=
"Seed for random number generators (default: 0)"
,
)
)
parser
.
add_argument
(
parser
.
add_argument
(
"-m"
,
"--model"
,
type
=
str
,
required
=
True
,
help
=
"Path of the LLM model"
"-m"
,
"--model"
,
type
=
str
,
required
=
True
,
help
=
"Path of the LLM model"
)
)
parser
.
add_argument
(
"--served-model-name"
,
type
=
str
,
default
=
None
,
help
=
"The model name used in the API. "
"If not specified, the model name will be the "
"same as the ``--model`` argument. "
,
)
parser
.
add_argument
(
parser
.
add_argument
(
"-u"
,
"-u"
,
"--url"
,
"--url"
,
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment