Unverified Commit 0d50fa1d authored by Andreas Karatzas's avatar Andreas Karatzas Committed by GitHub
Browse files

[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)


Signed-off-by: default avatarAndreas Karatzas <akaratza@amd.com>
parent 1fa1e53a
...@@ -39,8 +39,7 @@ ...@@ -39,8 +39,7 @@
##################################################################################################################################### #####################################################################################################################################
# # # #
# IMPORTANT: # # IMPORTANT: #
# * Currently AMD CI has MI300 agents, MI325 agents, and MI355 agents. Of those, AMD is using mostly MI325 and MI355. AMD team # # * Currently AMD CI has MI250 agents, MI325 agents, and MI355 agents. All upcoming feature improvements are tracked in: #
# is actively working on enabling more MI300 machines. All upcoming feature improvements are tracked in: #
# https://github.com/vllm-project/vllm/issues/34994 # # https://github.com/vllm-project/vllm/issues/34994 #
# # # #
#-----------------------------------------------------------------------------------------------------------------------------------# #-----------------------------------------------------------------------------------------------------------------------------------#
...@@ -49,13 +48,15 @@ ...@@ -49,13 +48,15 @@
# * [Pytorch Nightly Dependency Override Check]: if this test fails, it means the nightly torch version is not compatible with # # * [Pytorch Nightly Dependency Override Check]: if this test fails, it means the nightly torch version is not compatible with #
# some of the dependencies. Please check the error message and add the package to # # some of the dependencies. Please check the error message and add the package to #
# whitelist in `/vllm/tools/pre_commit/generate_nightly_torch_test.py`. # # whitelist in `/vllm/tools/pre_commit/generate_nightly_torch_test.py`. #
# * [Entrypoints Integration Test (LLM)]: # # * [Entrypoints Integration (LLM)]: #
# - {`pytest -v -s entrypoints/llm/test_generate.py`}: It needs a clean process # # - {`pytest -v -s entrypoints/llm/test_generate.py`}: It needs a clean process #
# - {`pytest -v -s entrypoints/offline_mode`}: Needs to avoid interference with other tests # # - {`pytest -v -s entrypoints/offline_mode`}: Needs to avoid interference with other tests #
# * [V1 Test e2e + engine]: The test uses 4 GPUs, but we schedule it on 8-GPU machines for stability. See discussion here: # # * [Engine / Engine (1 GPU) / e2e Scheduling / e2e Core / V1 e2e / Spec Decode / V1 Sample + Logits / V1 Core + KV + Metrics]: #
# - Previously a single "V1 Test e2e + engine" step, now split across multiple groups. #
# - V1 e2e (2/4 GPUs) uses 4 GPUs but is scheduled on 8-GPU machines for stability. See: #
# https://github.com/vllm-project/vllm/pull/31040 # # https://github.com/vllm-project/vllm/pull/31040 #
# * [V1 others]: # # * [V1 Sample + Logits / V1 Core + KV + Metrics / V1 others (CPU)]: #
# - Split the tests to avoid interference # # - Previously a single "V1 others" step, now split to avoid interference. #
# - Integration test for streaming correctness (requires special branch for __harness__ lib). # # - Integration test for streaming correctness (requires special branch for __harness__ lib). #
# * [V1 others (CPU)]: Split the tests to avoid interference # # * [V1 others (CPU)]: Split the tests to avoid interference #
# * [PyTorch Compilation Unit Tests]: Run unit tests defined directly under `compile/`, not including subdirectories, which # # * [PyTorch Compilation Unit Tests]: Run unit tests defined directly under `compile/`, not including subdirectories, which #
...@@ -83,9 +84,9 @@ ...@@ -83,9 +84,9 @@
# run plamo2 model in vLLM. # # run plamo2 model in vLLM. #
# * [Language Models Test (Extended Generation)]: Install fast path packages for testing against transformers (mamba, conv1d) # # * [Language Models Test (Extended Generation)]: Install fast path packages for testing against transformers (mamba, conv1d) #
# and to run plamo2 model in vLLM. # # and to run plamo2 model in vLLM. #
# * [Multi-Modal Models (Standard)]: # # * [Multi-Modal Models (Standard) 1-4]: #
# - Do NOT remove `VLLM_WORKER_MULTIPROC_METHOD=spawn` setting as ROCm requires this for certain models to function. # # - Do NOT remove `VLLM_WORKER_MULTIPROC_METHOD=spawn` setting as ROCm requires this for certain models to function. #
# * [Transformers Nightly Models Test]: Whisper needs `VLLM_WORKER_MULTIPROC_METHOD=spawn` to avoid deadlock. # # * [Transformers Nightly Models]: Whisper needs `VLLM_WORKER_MULTIPROC_METHOD=spawn` to avoid deadlock. #
# * [Plugin Tests (2 GPUs)]: # # * [Plugin Tests (2 GPUs)]: #
# - {`pytest -v -s entrypoints/openai/test_oot_registration.py`}: It needs a clean process # # - {`pytest -v -s entrypoints/openai/test_oot_registration.py`}: It needs a clean process #
# - {`pytest -v -s models/test_oot_registration.py`}: It needs a clean process # # - {`pytest -v -s models/test_oot_registration.py`}: It needs a clean process #
...@@ -94,10 +95,10 @@ ...@@ -94,10 +95,10 @@
# - There is some Tensor Parallelism related processing logic in LoRA that requires multi-GPU testing for validation. # # - There is some Tensor Parallelism related processing logic in LoRA that requires multi-GPU testing for validation. #
# - {`pytest -v -s -x lora/test_gptoss_tp.py`}: Disabled for now because MXFP4 backend on non-cuda platform doesn't support # # - {`pytest -v -s -x lora/test_gptoss_tp.py`}: Disabled for now because MXFP4 backend on non-cuda platform doesn't support #
# LoRA yet. # # LoRA yet. #
# * [Distributed Tests (GPU_TAG)]: Don't test llama model here, it seems hf implementation is buggy. See: # # * [Distributed Tests (NxGPUs)(HW-TAG)]: Don't test llama model here, it seems hf implementation is buggy. See: #
# https://github.com/vllm-project/vllm/pull/5689 # # https://github.com/vllm-project/vllm/pull/5689 #
# * [Distributed Tests (GPU_TAG)]: Some old E2E tests were removed in https://github.com/vllm-project/vllm/pull/33293 in # # * [Distributed Tests (NxGPUs)(HW-TAG)]: Some old E2E tests were removed in https://github.com/vllm-project/vllm/pull/33293 #
# favor of new tests in fusions_e2e. We avoid replicating the new jobs in # # in favor of new tests in fusions_e2e. We avoid replicating the new jobs in #
# this file as it's deprecated. # # this file as it's deprecated. #
# # # #
##################################################################################################################################### #####################################################################################################################################
......
...@@ -220,7 +220,10 @@ VLM_TEST_SETTINGS = { ...@@ -220,7 +220,10 @@ VLM_TEST_SETTINGS = {
vllm_runner_kwargs={ vllm_runner_kwargs={
"model_impl": "transformers", "model_impl": "transformers",
}, },
marks=[pytest.mark.core_model], marks=[
pytest.mark.core_model,
*([large_gpu_mark(min_gb=80)] if current_platform.is_rocm() else []),
],
), ),
"idefics3-transformers": VLMTestInfo( "idefics3-transformers": VLMTestInfo(
models=["HuggingFaceTB/SmolVLM-256M-Instruct"], models=["HuggingFaceTB/SmolVLM-256M-Instruct"],
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment