OpenDAS / vllm_cscc
Commit 0d50fa1d (Unverified)
Authored Mar 20, 2026 by Andreas Karatzas
Committed by GitHub, Mar 21, 2026

[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
parent 1fa1e53a
Showing 2 changed files with 19 additions and 15 deletions
.buildkite/test-amd.yaml +15 -14
tests/models/multimodal/generation/test_common.py +4 -1
.buildkite/test-amd.yaml
...
...
@@ -39,8 +39,7 @@
#####################################################################################################################################
# #
# IMPORTANT: #
-# * Currently AMD CI has MI300 agents, MI325 agents, and MI355 agents. Of those, AMD is using mostly MI325 and MI355. AMD team #
-# is actively working on enabling more MI300 machines. All upcoming feature improvements are tracked in: #
+# * Currently AMD CI has MI250 agents, MI325 agents, and MI355 agents. All upcoming feature improvements are tracked in: #
# https://github.com/vllm-project/vllm/issues/34994 #
# #
#-----------------------------------------------------------------------------------------------------------------------------------#
...
...
@@ -49,13 +48,15 @@
# * [Pytorch Nightly Dependency Override Check]: if this test fails, it means the nightly torch version is not compatible with #
# some of the dependencies. Please check the error message and add the package to #
# whitelist in `/vllm/tools/pre_commit/generate_nightly_torch_test.py`. #
-# * [Entrypoints Integration Test (LLM)]: #
+# * [Entrypoints Integration (LLM)]: #
# - {`pytest -v -s entrypoints/llm/test_generate.py`}: It needs a clean process #
# - {`pytest -v -s entrypoints/offline_mode`}: Needs to avoid interference with other tests #
-# * [V1 Test e2e + engine]: The test uses 4 GPUs, but we schedule it on 8-GPU machines for stability. See discussion here: #
-# https://github.com/vllm-project/vllm/pull/31040 #
-# * [V1 others]: #
-# - Split the tests to avoid interference #
+# * [Engine / Engine (1 GPU) / e2e Scheduling / e2e Core / V1 e2e / Spec Decode / V1 Sample + Logits / V1 Core + KV + Metrics]: #
+# - Previously a single "V1 Test e2e + engine" step, now split across multiple groups. #
+# - V1 e2e (2/4 GPUs) uses 4 GPUs but is scheduled on 8-GPU machines for stability. See: #
+# https://github.com/vllm-project/vllm/pull/31040 #
+# * [V1 Sample + Logits / V1 Core + KV + Metrics / V1 others (CPU)]: #
+# - Previously a single "V1 others" step, now split to avoid interference. #
# - Integration test for streaming correctness (requires special branch for __harness__ lib). #
# * [V1 others (CPU)]: Split the tests to avoid interference #
# * [PyTorch Compilation Unit Tests]: Run unit tests defined directly under `compile/`, not including subdirectories, which #
...
...
@@ -83,9 +84,9 @@
# run plamo2 model in vLLM. #
# * [Language Models Test (Extended Generation)]: Install fast path packages for testing against transformers (mamba, conv1d) #
# and to run plamo2 model in vLLM. #
-# * [Multi-Modal Models (Standard)]: #
+# * [Multi-Modal Models (Standard) 1-4]: #
# - Do NOT remove `VLLM_WORKER_MULTIPROC_METHOD=spawn` setting as ROCm requires this for certain models to function. #
-# * [Transformers Nightly Models Test]: Whisper needs `VLLM_WORKER_MULTIPROC_METHOD=spawn` to avoid deadlock. #
+# * [Transformers Nightly Models]: Whisper needs `VLLM_WORKER_MULTIPROC_METHOD=spawn` to avoid deadlock. #
# * [Plugin Tests (2 GPUs)]: #
# - {`pytest -v -s entrypoints/openai/test_oot_registration.py`}: It needs a clean process #
# - {`pytest -v -s models/test_oot_registration.py`}: It needs a clean process #
...
...
@@ -94,11 +95,11 @@
# - There is some Tensor Parallelism related processing logic in LoRA that requires multi-GPU testing for validation. #
# - {`pytest -v -s -x lora/test_gptoss_tp.py`}: Disabled for now because MXFP4 backend on non-cuda platform doesn't support #
# LoRA yet. #
-# * [Distributed Tests (GPU_TAG)]: Don't test llama model here, it seems hf implementation is buggy. See: #
-# https://github.com/vllm-project/vllm/pull/5689 #
-# * [Distributed Tests (GPU_TAG)]: Some old E2E tests were removed in https://github.com/vllm-project/vllm/pull/33293 in #
-# favor of new tests in fusions_e2e. We avoid replicating the new jobs in #
-# this file as it's deprecated. #
+# * [Distributed Tests (NxGPUs)(HW-TAG)]: Don't test llama model here, it seems hf implementation is buggy. See: #
+# https://github.com/vllm-project/vllm/pull/5689 #
+# * [Distributed Tests (NxGPUs)(HW-TAG)]: Some old E2E tests were removed in https://github.com/vllm-project/vllm/pull/33293 #
+# in favor of new tests in fusions_e2e. We avoid replicating the new jobs in #
+# this file as it's deprecated. #
# #
#####################################################################################################################################
...
...
tests/models/multimodal/generation/test_common.py
...
...
@@ -220,7 +220,10 @@ VLM_TEST_SETTINGS = {
         vllm_runner_kwargs={
             "model_impl": "transformers",
         },
-        marks=[pytest.mark.core_model],
+        marks=[
+            pytest.mark.core_model,
+            *([large_gpu_mark(min_gb=80)] if current_platform.is_rocm() else []),
+        ],
     ),
     "idefics3-transformers": VLMTestInfo(
         models=["HuggingFaceTB/SmolVLM-256M-Instruct"],
...
...
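The `marks` change above uses star-unpacking of a conditional list, so the extra mark is added only on ROCm and the list is unchanged elsewhere. The idiom can be sketched in isolation as below. This is an illustrative sketch, not code from the commit: `build_marks` is a hypothetical helper, and plain strings stand in for `pytest.mark.core_model` and `large_gpu_mark(min_gb=80)` so that the snippet runs without pytest or vLLM installed.

```python
def build_marks(is_rocm: bool, min_gb: int = 80) -> list[str]:
    """Return the mark list for one test entry (hypothetical helper).

    Strings are placeholders for real pytest marks; only the list-building
    idiom mirrors the diff.
    """
    # Off ROCm the conditional list is empty, so unpacking it adds nothing.
    extra = [f"large_gpu(min_gb={min_gb})"] if is_rocm else []
    return ["core_model", *extra]


if __name__ == "__main__":
    print(build_marks(is_rocm=True))
    print(build_marks(is_rocm=False))
```

Compared with appending to the list after construction, the unpacking form keeps the whole `marks` value a single expression, which suits a large dictionary literal such as `VLM_TEST_SETTINGS`.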