Unverified commit 5cc48766, authored by Andreas Karatzas, committed by GitHub

[ROCm][CI] Fix failure in Language Models Tests (Extra Standard) by reducing agent pool size (#31553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
parent 5fff4406
@@ -859,7 +859,7 @@ steps:
- label: Language Models Tests (Extra Standard) %N
timeout_in_minutes: 45
mirror_hardwares: [amdexperimental]
-agent_pool: mi325_8
+agent_pool: mi325_2
# grade: Blocking
torch_nightly: true
source_file_dependencies:
@@ -871,6 +871,7 @@ steps:
# Shard slow subset of standard language models tests. Only run when model
# source is modified, or when specified test files are modified
- pip freeze | grep -E 'torch'
+- export TORCH_NCCL_BLOCKING_WAIT=1
- pytest -v -s models/language -m 'core_model and slow_test' \
--num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT \
--shard-id=$$BUILDKITE_PARALLEL_JOB
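
The `--num-shards` / `--shard-id` pair above splits one test selection across the parallel Buildkite jobs. As a rough illustration of the idea only, and not the code of whichever pytest plugin this pipeline uses, the sketch below assigns each test ID to a shard with a stable hash. The function names and the test IDs are hypothetical.

```python
import hashlib


def shard_of(test_id: str, num_shards: int) -> int:
    """Map a test ID to a shard index in [0, num_shards) with a stable hash."""
    # sha256 is stable across processes, unlike Python's built-in hash(),
    # so every parallel job computes the same assignment independently.
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "little") % num_shards


def select_for_shard(test_ids, shard_id, num_shards):
    """Return the tests that the job with index `shard_id` would run."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return [t for t in test_ids if shard_of(t, num_shards) == shard_id]


# Hypothetical test IDs, for illustration only.
EXAMPLE_TESTS = [
    "models/language/test_a.py::test_one",
    "models/language/test_a.py::test_two",
    "models/language/test_b.py::test_three",
    "models/language/test_c.py::test_four",
]

if __name__ == "__main__":
    num_shards = 2  # stands in for $BUILDKITE_PARALLEL_JOB_COUNT
    for shard_id in range(num_shards):  # stands in for $BUILDKITE_PARALLEL_JOB
        print(shard_id, select_for_shard(EXAMPLE_TESTS, shard_id, num_shards))
```

Under this scheme every test lands in exactly one shard, and no coordination between jobs is needed because each job derives its subset from the same two integers.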