fix(tests): Ensure reliable CUDA cache clearing in MoE test (#23416)

Signed-off-by: AzizCode92 <azizbenothman76@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

fix(tests): Ensure reliable CUDA cache clearing in MoE test (#23416)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
341923b9 · Aziz · GitHub · 424fb7a5 · 341923b9
Unverified Commit 341923b9 authored Aug 22, 2025 by Aziz Committed by GitHub Aug 22, 2025
Show whitespace changes
Inline Side-by-side

Showing with 1 addition and 1 deletion

tests/kernels/moe/test_moe.py tests/kernels/moe/test_moe.py +1 -1

No files found.
--- a/tests/kernels/moe/test_moe.py
+++ b/tests/kernels/moe/test_moe.py
@@ -429,11 +429,11 @@ def test_mixtral_moe(dtype: torch.dtype, padding: bool, use_rocm_aiter: bool,
                vllm_moe.experts.w13_weight, (0, 128), "constant", 0)[...,
                                                                      0:-128],
                                                    requires_grad=False)
-            torch.cuda.empty_cache()
            vllm_moe.experts.w2_weight = Parameter(F.pad(
                vllm_moe.experts.w2_weight, (0, 128), "constant", 0)[...,
                                                                     0:-128],
                                                   requires_grad=False)
+            torch.cuda.synchronize()
            torch.cuda.empty_cache()
        # Run forward passes for both MoE blocks