[CI/Build] Temporary fix to LM Eval Small Models (#28324)

Signed-off-by: zhewenli <zhewenli@meta.com>

Unverified commit a65a934e, authored Nov 09, 2025 by Zhewen Li and committed by GitHub on Nov 09, 2025.
3 changed files
--- a/.buildkite/test-pipeline.yaml
+++ b/.buildkite/test-pipeline.yaml
--- a/tests/evals/gsm8k/configs/Qwen1.5-MoE-W4A16-CT.yaml
+++ b/tests/evals/gsm8k/configs/Qwen1.5-MoE-W4A16-CT.yaml
@@ -3,3 +3,6 @@ accuracy_threshold: 0.45
 num_questions: 1319
 num_fewshot: 5
 max_model_len: 4096
+# Dual stream is incompatible with this model: https://github.com/vllm-project/vllm/issues/28220
+env:
+  VLLM_DISABLE_SHARED_EXPERTS_STREAM: "1"
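Values in the `env:` block end up as process environment variables, which can only hold strings. That is likely why the `"1"` above is quoted: an unquoted `1` would be parsed by YAML as an integer. The following is a minimal sketch, assuming the config has already been parsed into a plain dict. `extract_env_block` is a hypothetical helper, not part of the test file; it mirrors the `eval_config.get("env", None)` lookup and rejects non-string values early.

```python
from __future__ import annotations

from typing import Any, Mapping


def extract_env_block(eval_config: Mapping[str, Any]) -> dict[str, str] | None:
    """Hypothetical helper: return the optional `env:` block of a parsed eval config.

    Returns None when the config has no `env:` key, matching
    `eval_config.get("env", None)`. Raises TypeError if any value is not a str,
    since environment variables can only hold strings.
    """
    env = eval_config.get("env", None)
    if env is None:
        return None
    bad = {
        key: type(value).__name__
        for key, value in env.items()
        if not isinstance(value, str)
    }
    if bad:
        raise TypeError(f"env values must be quoted strings in YAML, got: {bad}")
    return dict(env)


# Parsed form of some fields from Qwen1.5-MoE-W4A16-CT.yaml as shown in the diff.
config = {
    "accuracy_threshold": 0.45,
    "num_questions": 1319,
    "num_fewshot": 5,
    "max_model_len": 4096,
    "env": {"VLLM_DISABLE_SHARED_EXPERTS_STREAM": "1"},
}
print(extract_env_block(config))  # {'VLLM_DISABLE_SHARED_EXPERTS_STREAM': '1'}
```

Configs without an `env:` key yield `None`, so existing model configs keep working unchanged.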
--- a/tests/evals/gsm8k/test_gsm8k_correctness.py
+++ b/tests/evals/gsm8k/test_gsm8k_correctness.py
@@ -62,9 +62,11 @@ def test_gsm8k_correctness_param(config_filename, tp_size):
        str(tp_size),
    ]
+    env_dict = eval_config.get("env", None)
    # Launch server and run evaluation
    with RemoteOpenAIServer(
-        eval_config["model_name"], server_args, max_wait_seconds=480
+        eval_config["model_name"], server_args, env_dict=env_dict, max_wait_seconds=480
    ) as remote_server:
        server_url = remote_server.url_for("v1")
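The diff only shows `env_dict` being passed to `RemoteOpenAIServer`; how that helper applies it is not visible here. A common pattern, sketched below under that assumption, is to copy the parent environment and overlay the per-model overrides before spawning the server process, so `VLLM_DISABLE_SHARED_EXPERTS_STREAM` affects only this model's run. `merged_server_env` and `read_var_in_child` are hypothetical names, and a tiny child Python process stands in for the vLLM server.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def merged_server_env(env_dict: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Copy the parent environment and overlay per-model overrides.

    env_dict=None (no `env:` block in the config) returns an unchanged copy.
    os.environ itself is never mutated.
    """
    env = os.environ.copy()
    if env_dict:
        env.update(env_dict)
    return env


def read_var_in_child(name: str, env_dict: Optional[Mapping[str, str]]) -> str:
    """Spawn a child Python process (stand-in for the server) and return what it sees."""
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({name!r}, '<unset>'))"],
        env=merged_server_env(env_dict),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


overrides = {"VLLM_DISABLE_SHARED_EXPERTS_STREAM": "1"}
print(read_var_in_child("VLLM_DISABLE_SHARED_EXPERTS_STREAM", overrides))  # 1
```

Copying `os.environ` rather than mutating it keeps the override scoped to the spawned process, so other configs in the same parametrized test run are unaffected.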