Commit a65a934e
Authored Nov 09, 2025 by Zhewen Li
Committed by GitHub, Nov 09, 2025

[CI/Build] Temporary fix to LM Eval Small Models (#28324)

Signed-off-by: zhewenli <zhewenli@meta.com>

parent 4a8d6bd1
Showing 3 changed files with 8 additions and 3 deletions

.buildkite/test-pipeline.yaml  +1 -1
tests/evals/gsm8k/configs/Qwen1.5-MoE-W4A16-CT.yaml  +4 -1
tests/evals/gsm8k/test_gsm8k_correctness.py  +3 -1

.buildkite/test-pipeline.yaml
@@ -1253,7 +1253,7 @@ steps:
     - pytest -v -s tests/compile/test_fusions_e2e.py::test_tp2_attn_quant_allreduce_rmsnorm
     - pytest -v -s tests/distributed/test_context_parallel.py
     - CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
     - pytest -v -s tests/v1/distributed/test_dbo.py
 ##### B200 test #####
 - label: Distributed Tests (B200) # optional
...

tests/evals/gsm8k/configs/Qwen1.5-MoE-W4A16-CT.yaml
@@ -2,4 +2,7 @@ model_name: "nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16"
 accuracy_threshold: 0.45
 num_questions: 1319
 num_fewshot: 5
-max_model_len: 4096
\ No newline at end of file
+max_model_len: 4096
+# Duo stream incompatible with this model: https://github.com/vllm-project/vllm/issues/28220
+env:
+  VLLM_DISABLE_SHARED_EXPERTS_STREAM: "1"
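
The new `env` block is an optional mapping, so configs that omit it must keep working. The sketch below shows one way a test could read it; `server_env_from_config` is a hypothetical helper, not part of the commit, and the dict literals stand in for what a YAML loader would return for this file (the quoted `"1"` stays a string, while `4096` becomes an integer).

```python
from typing import Any, Dict, Optional


def server_env_from_config(eval_config: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return the optional per-model environment overrides, or None if absent.

    Hypothetical helper for illustration; the commit itself only calls
    eval_config.get("env", None).
    """
    env = eval_config.get("env", None)
    if env is None:
        return None
    # Environment variables must be strings, so coerce defensively in case
    # a config author writes an unquoted value such as `1`.
    return {str(key): str(value) for key, value in env.items()}


# Stand-in for the parsed Qwen1.5-MoE-W4A16-CT.yaml config.
config_with_env = {
    "model_name": "nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16",
    "accuracy_threshold": 0.45,
    "num_questions": 1319,
    "num_fewshot": 5,
    "max_model_len": 4096,
    "env": {"VLLM_DISABLE_SHARED_EXPERTS_STREAM": "1"},
}

# Stand-in for any other config that has no `env` section.
config_without_env = {"model_name": "some/other-model", "max_model_len": 4096}

env_overrides = server_env_from_config(config_with_env)
no_overrides = server_env_from_config(config_without_env)
```

With this shape, only the one model that needs the workaround carries the override, and every other config resolves to `None`.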

tests/evals/gsm8k/test_gsm8k_correctness.py
@@ -62,9 +62,11 @@ def test_gsm8k_correctness_param(config_filename, tp_size):
         str(tp_size),
     ]
 
+    env_dict = eval_config.get("env", None)
+
     # Launch server and run evaluation
     with RemoteOpenAIServer(
-        eval_config["model_name"], server_args, max_wait_seconds=480
+        eval_config["model_name"], server_args, env_dict=env_dict, max_wait_seconds=480
     ) as remote_server:
         server_url = remote_server.url_for("v1")
...
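
The test change forwards `env_dict` to the server launcher so that the override reaches the server process. The following is a minimal sketch of that general pattern using only the standard library; it is not the implementation of `RemoteOpenAIServer`, and `run_with_env` is a hypothetical name. It assumes the overrides are layered on top of a copy of the parent environment.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def run_with_env(cmd: List[str], env_dict: Optional[Dict[str, str]] = None) -> str:
    """Run cmd with env_dict layered over the parent environment; return stdout."""
    env = os.environ.copy()  # copy so the parent process environment is untouched
    if env_dict is not None:
        env.update(env_dict)  # per-model overrides win over inherited values
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# The child process reports the value it actually sees for the variable.
child_code = (
    "import os; "
    "print(os.environ.get('VLLM_DISABLE_SHARED_EXPERTS_STREAM', 'unset'))"
)
seen_by_child = run_with_env(
    [sys.executable, "-c", child_code],
    {"VLLM_DISABLE_SHARED_EXPERTS_STREAM": "1"},
)
```

Passing `None` leaves the inherited environment unchanged, which matches how configs without an `env` section behave in the test.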