OpenDAS / vllm_cscc
Commit 1a4f35e2
Authored Jul 10, 2025 by Michael Goin; committed by GitHub, Jul 10, 2025
Normalize lm-eval command between baseline and correctness test (#18560)
Signed-off-by: mgoin <mgoin64@gmail.com>
Parent: be1e128d
Showing 2 changed files with 4 additions and 2 deletions (+4 -2)
.buildkite/lm-eval-harness/run-lm-eval-gsm-vllm-baseline.sh  +1 -1
.buildkite/lm-eval-harness/test_lm_eval_correctness.py  +3 -1
.buildkite/lm-eval-harness/run-lm-eval-gsm-vllm-baseline.sh @ 1a4f35e2
@@ -46,6 +46,6 @@ while getopts "m:b:l:f:t:" OPT; do
 done
 
 lm_eval --model vllm \
-  --model_args "pretrained=$MODEL,tensor_parallel_size=$TP_SIZE,distributed_executor_backend=ray,trust_remote_code=true,max_model_len=4096" \
+  --model_args "pretrained=$MODEL,tensor_parallel_size=$TP_SIZE,add_bos_token=true,trust_remote_code=true,max_model_len=4096" \
   --tasks gsm8k --num_fewshot "$FEWSHOT" --limit "$LIMIT" \
   --batch_size "$BATCH_SIZE"
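The effect of the one-line change above can be checked by parsing both `--model_args` strings and comparing their keys. The following is a minimal sketch, not part of the commit: `parse_model_args` is a hypothetical helper, and `MODEL` and `1` are placeholder values standing in for `$MODEL` and `$TP_SIZE`.

```python
def parse_model_args(arg_string):
    """Split 'k1=v1,k2=v2' into a dict (assumes no commas inside values)."""
    pairs = (item.split("=", 1) for item in arg_string.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


# Placeholder values replace the shell variables $MODEL and $TP_SIZE.
old = (
    "pretrained=MODEL,tensor_parallel_size=1,"
    "distributed_executor_backend=ray,trust_remote_code=true,max_model_len=4096"
)
new = (
    "pretrained=MODEL,tensor_parallel_size=1,"
    "add_bos_token=true,trust_remote_code=true,max_model_len=4096"
)

old_args = parse_model_args(old)
new_args = parse_model_args(new)

# Keys present on only one side of the diff.
removed = sorted(set(old_args) - set(new_args))
added = sorted(set(new_args) - set(old_args))

print("removed:", removed)
print("added:", added)
```

Under these assumptions, the only key removed from the baseline command is `distributed_executor_backend` and the only key added is `add_bos_token`; every other argument is unchanged.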
.buildkite/lm-eval-harness/test_lm_eval_correctness.py @ 1a4f35e2
@@ -18,12 +18,14 @@ RTOL = 0.08
 
 def launch_lm_eval(eval_config, tp_size):
     trust_remote_code = eval_config.get("trust_remote_code", False)
+    max_model_len = eval_config.get("max_model_len", 4096)
     model_args = (
         f"pretrained={eval_config['model_name']},"
         f"tensor_parallel_size={tp_size},"
         f"enforce_eager=true,"
         f"add_bos_token=true,"
-        f"trust_remote_code={trust_remote_code}"
+        f"trust_remote_code={trust_remote_code},"
+        f"max_model_len={max_model_len}"
     )
     results = lm_eval.simple_evaluate(
         model="vllm",
...
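To see what "normalized" means after this commit, the test-side string can be rebuilt and compared against the baseline script's arguments. This is a sketch under stated assumptions: `build_model_args` and `arg_keys` are hypothetical helpers that mirror the string construction in the diff, and `example/model` is a made-up model name. It does not call lm-eval or vLLM.

```python
def build_model_args(eval_config, tp_size):
    """Mirror the post-commit model_args construction from the diff."""
    trust_remote_code = eval_config.get("trust_remote_code", False)
    max_model_len = eval_config.get("max_model_len", 4096)
    return (
        f"pretrained={eval_config['model_name']},"
        f"tensor_parallel_size={tp_size},"
        f"enforce_eager=true,"
        f"add_bos_token=true,"
        f"trust_remote_code={trust_remote_code},"
        f"max_model_len={max_model_len}"
    )


def arg_keys(arg_string):
    """Return the set of keys in a 'k1=v1,k2=v2' string."""
    return {item.split("=", 1)[0] for item in arg_string.split(",") if item}


# Hypothetical config; only 'model_name' is required by the helper.
test_args = build_model_args({"model_name": "example/model"}, tp_size=1)

# Baseline script arguments after the commit, with placeholders filled in.
baseline_args = (
    "pretrained=example/model,tensor_parallel_size=1,"
    "add_bos_token=true,trust_remote_code=true,max_model_len=4096"
)

only_in_test = arg_keys(test_args) - arg_keys(baseline_args)
only_in_baseline = arg_keys(baseline_args) - arg_keys(test_args)

print(test_args)
print("only in test:", sorted(only_in_test))
print("only in baseline:", sorted(only_in_baseline))
```

With these inputs, the two argument sets share every key except `enforce_eager`, which only the test sets. Note that the f-string renders a Python boolean as `True` or `False`, whereas the shell script hard-codes lowercase `true`.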