OpenDAS / vllm_cscc
Commit 7e0ef408 (Unverified)
Authored Oct 14, 2025 by Michael Goin
Committed by GitHub, Oct 14, 2025

[CI Failure] Fix torchao dep failure for Quantization Test (#26824)

Signed-off-by: mgoin <mgoin64@gmail.com>

Parent: 4aed506b

Showing 4 changed files with 8 additions and 4 deletions

.buildkite/test-amd.yaml                          +2 -1
.buildkite/test-pipeline.yaml                     +2 -1
tests/quantization/test_compressed_tensors.py     +2 -1
vllm/model_executor/layers/quantization/rtn.py    +2 -1

.buildkite/test-amd.yaml
@@ -603,7 +603,8 @@ steps:
   # since torchao nightly is only compatible with torch nightly currently
   # https://github.com/pytorch/ao/issues/2919, we'll have to skip new torchao tests for now
   # we can only upgrade after this is resolved
-  - pip install --pre torchao==0.13.0.dev20250814 --index-url https://download.pytorch.org/whl/nightly/cu128
+  # TODO(jerryzh168): resolve the above comment
+  - uv pip install --system torchao==0.13.0
   - VLLM_TEST_FORCE_LOAD_FORMAT=auto pytest -v -s quantization/
 
 - label: LM Eval Small Models # 53min

.buildkite/test-pipeline.yaml
@@ -527,7 +527,8 @@ steps:
   # since torchao nightly is only compatible with torch nightly currently
   # https://github.com/pytorch/ao/issues/2919, we'll have to skip new torchao tests for now
   # we can only upgrade after this is resolved
-  - pip install --pre torchao==0.13.0.dev20250814 --index-url https://download.pytorch.org/whl/nightly/cu128
+  # TODO(jerryzh168): resolve the above comment
+  - uv pip install --system torchao==0.13.0
   - VLLM_TEST_FORCE_LOAD_FORMAT=auto pytest -v -s quantization/
 
 - label: LM Eval Small Models # 53min

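Both pipeline files swap a nightly dev pin (`torchao==0.13.0.dev20250814`) for a stable one (`torchao==0.13.0`). As a rough illustration only, a check like the following could flag dev or pre-release pins before they reach CI. The helper name `is_prerelease_pin` and its regular expression are hypothetical and not part of this commit, and the pattern is a simplified approximation of PEP 440 rather than a full parser.

```python
import re

# Simplified approximation of PEP 440 suffixes (assumption, not a full parser):
# matches a trailing ".devN" dev release or an "aN" / "bN" / "rcN" pre-release tag.
_PRERELEASE = re.compile(r"(\.dev\d+|(a|b|rc)\d+)$")


def is_prerelease_pin(requirement: str) -> bool:
    """Return True if a 'name==version' pin targets a dev or pre-release build."""
    _name, sep, version = requirement.partition("==")
    if not sep:
        raise ValueError(f"not an exact pin: {requirement!r}")
    return bool(_PRERELEASE.search(version.strip()))


old_pin = "torchao==0.13.0.dev20250814"  # pin removed by this commit
new_pin = "torchao==0.13.0"              # pin added by this commit
print(is_prerelease_pin(old_pin), is_prerelease_pin(new_pin))  # True False
```

A production check would more likely rely on a real version parser. The regex is used here only to keep the sketch free of third-party dependencies.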
tests/quantization/test_compressed_tensors.py
@@ -697,7 +697,8 @@ def test_compressed_tensors_2of4_sparse_compressed(vllm_runner, args_2of4):
 @pytest.mark.parametrize(
     "args",
     [
-        ("nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4A16", CompressedTensorsW4A16Fp4),
+        # TODO: Enable once model is available again
+        # ("nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4A16", CompressedTensorsW4A16Fp4),
         ("nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4", CompressedTensorsW4A4Fp4),
     ],
 )

vllm/model_executor/layers/quantization/rtn.py
@@ -15,6 +15,7 @@ from vllm.model_executor.layers.fused_moe.config import (
     FusedMoEConfig,
     FusedMoEQuantConfig,
 )
+from vllm.model_executor.layers.fused_moe.fused_marlin_moe import fused_marlin_moe
 from vllm.model_executor.layers.fused_moe.layer import FusedMoE, FusedMoEMethodBase
 from vllm.model_executor.layers.linear import (
     LinearBase,
@@ -396,7 +397,7 @@ class RTNMoEMethod(FusedMoEMethodBase):
             indices_type=self.topk_indices_dtype,
         )
 
-        return torch.ops.vllm.fused_marlin_moe(
+        return fused_marlin_moe(
             x,
             layer.w13_weight,
             layer.w2_weight,