Commit 5cc48766 authored by Andreas Karatzas, committed by GitHub

[ROCm][CI] Fix failure in Language Models Tests (Extra Standard) by reducing agent pool size (#31553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
parent 5fff4406
@@ -859,7 +859,7 @@ steps:
 - label: Language Models Tests (Extra Standard) %N
 timeout_in_minutes: 45
 mirror_hardwares: [amdexperimental]
-agent_pool: mi325_8
+agent_pool: mi325_2
 # grade: Blocking
 torch_nightly: true
 source_file_dependencies:
@@ -871,6 +871,7 @@ steps:
 # Shard slow subset of standard language models tests. Only run when model
 # source is modified, or when specified test files are modified
 - pip freeze | grep -E 'torch'
+- export TORCH_NCCL_BLOCKING_WAIT=1
 - pytest -v -s models/language -m 'core_model and slow_test' \
 --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT \
 --shard-id=$$BUILDKITE_PARALLEL_JOB
......
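For readers unfamiliar with the sharding flags in the pytest command above, here is a minimal sketch of how a test list could be split across parallel jobs. It assumes a simple hash-modulo assignment. The helper name `select_shard` and the test IDs are hypothetical, and the plugin that actually implements `--num-shards` and `--shard-id` may partition tests differently.

```python
import hashlib


def select_shard(test_ids, num_shards, shard_id):
    """Return the subset of test_ids assigned to shard_id (hypothetical helper).

    Assumption: each test is assigned by a stable hash of its ID modulo
    num_shards, so every job computes the same partition independently.
    """
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")

    def stable_hash(test_id):
        # hashlib yields the same value in every process, unlike built-in hash()
        digest = hashlib.sha256(test_id.encode("utf-8")).digest()
        return int.from_bytes(digest, "big")

    return [t for t in test_ids if stable_hash(t) % num_shards == shard_id]


# Illustrative test IDs, not taken from the repository.
tests = [f"models/language/test_example.py::test_case_{i}" for i in range(10)]
shards = [select_shard(tests, 3, s) for s in range(3)]
```

In the pipeline step, `BUILDKITE_PARALLEL_JOB_COUNT` and `BUILDKITE_PARALLEL_JOB` would play the roles of `num_shards` and `shard_id`, so each parallel job runs a disjoint slice of the selected tests.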