chenpangpang / transformers · Commits

Unverified commit 4c737f0e, authored Feb 23, 2022 by Lysandre Debut, committed by GitHub on Feb 23, 2022
[Test refactor 4/5] Improve the scheduled tests (#15728)
parent d3ae2bd3

Showing 1 changed file with 145 additions and 422 deletions.

.github/workflows/self-scheduled.yml (+145, -422)
@@ -3,533 +3,256 @@ name: Self-hosted runner (scheduled)
 on:
-  push:
-    branches:
-      - multi_ci_*
-      - master
-      - ci_*
-      - ci-*
-      - github-actions-workflows
-    paths:
-      - "src/**"
-      - "tests/**"
-      - ".github/**"
-      - "templates/**"
-      - "utils/**"
   repository_dispatch:
   schedule:
-    - cron: "0 0 * * *"
+    - cron: "0 2 * * *"

 env:
   HF_HOME: /mnt/cache
   TRANSFORMERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
   RUN_SLOW: yes
-  OMP_NUM_THREADS: 16
-  MKL_NUM_THREADS: 16
-  PYTEST_TIMEOUT: 600
   SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
   TF_FORCE_GPU_ALLOW_GROWTH: true
   RUN_PT_TF_CROSS_TESTS: 1
 jobs:
-  run_all_tests_torch_gpu:
-    runs-on: [self-hosted, docker-gpu, single-gpu]
+  setup:
+    name: Setup
+    strategy:
+      matrix:
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
     container:
-      image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
+      image: huggingface/transformers-all-latest-gpu
       options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
     steps:
-      - name: Launcher docker
-        uses: actions/checkout@v2
-
-      - name: NVIDIA-SMI
-        run: |
-          nvidia-smi
-
-      - name: Install dependencies
-        run: |
-          apt -y update && apt install -y libsndfile1-dev git espeak-ng
-          pip install --upgrade pip
-          pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
-          pip install https://github.com/kpu/kenlm/archive/master.zip
-          python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
-          wandb login ${{ secrets.WANDB_API_KEY }}
-
-      - name: Are GPUs recognized by our DL frameworks
-        run: |
-          utils/print_env_pt.py
-
-      - name: Run all tests on GPU
-        run: |
-          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_gpu tests
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_torch_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_torch_gpu_durations.txt
-
-      - name: Run examples tests on GPU
-        if: ${{ always() }}
-        env:
-          OMP_NUM_THREADS: 16
-          MKL_NUM_THREADS: 16
-          RUN_SLOW: yes
-          HF_HOME: /mnt/cache
-          TRANSFORMERS_IS_CI: yes
-        run: |
-          pip install -r examples/pytorch/_tests_requirements.txt
-          python -m pytest -n 1 -v --dist=loadfile --make-reports=examples_torch_gpu examples
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/examples_torch_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/examples_torch_gpu_durations.txt
-
-      - name: Run all pipeline tests on GPU
-        if: ${{ always() }}
-        env:
-          RUN_PIPELINE_TESTS: yes
-        run: |
-          python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_gpu tests
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_torch_pipeline_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_torch_pipeline_gpu_durations.txt
-
-      - name: Test suite reports artifacts
-        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
-        with:
-          name: run_all_tests_torch_gpu_test_reports
-          path: reports
-
-#  run_all_tests_flax_gpu:
-#    runs-on: [self-hosted, docker-gpu-test, single-gpu]
-#    container:
-#      image: tensorflow/tensorflow:2.4.1-gpu
-#      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
-#    steps:
-#      - name: Launcher docker
-#        uses: actions/checkout@v2
-#
-#      - name: NVIDIA-SMI
-#        continue-on-error: true
-#        run: |
-#          nvidia-smi
-#
-#      - name: Install dependencies
-#        run: |
-#          pip install --upgrade pip
-#          pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
-#          pip install .[flax,integrations,sklearn,testing,sentencepiece,flax-speech,vision]
-#          pip install https://github.com/kpu/kenlm/archive/master.zip
-#
-#      - name: Are GPUs recognized by our DL frameworks
-#        run: |
-#          python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
-#          python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
-#
-#      - name: Run all tests on GPU
-#        run: |
-#          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_flax_gpu tests
-#
-#      - name: Failure short reports
-#        if: ${{ always() }}
-#        run: cat reports/tests_flax_gpu_failures_short.txt
-#
-#      - name: Test durations
-#        if: ${{ always() }}
-#        run: cat reports/tests_flax_gpu_durations.txt
-#
-#      - name: Test suite reports artifacts
-#        if: ${{ always() }}
-#        uses: actions/upload-artifact@v2
-#        with:
-#          name: run_all_tests_flax_gpu_test_reports
-#          path: reports
-
-  run_all_tests_tf_gpu:
-    runs-on: [self-hosted, docker-gpu, single-gpu]
+      - name: Update clone
+        working-directory: /transformers
+        run: |
+          git fetch && git checkout ${{ github.sha }}
+
+      - name: Cleanup
+        working-directory: /transformers
+        run: |
+          rm -rf tests/__pycache__
+          rm -rf reports
+
+      - id: set-matrix
+        name: Identify models to test
+        working-directory: /transformers/tests
+        run: |
+          echo "::set-output name=matrix::$(python3 -c 'import os; x = list(filter(os.path.isdir, os.listdir(os.getcwd()))); x.sort(); print(x)')"
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: GPU visibility
+        working-directory: /transformers
+        run: |
+          utils/print_env_pt.py
+          TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
+          TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
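The new `setup` job is the heart of this refactor: instead of one monolithic test job, `set-matrix` discovers the per-model directories under `tests/` at runtime and publishes them as a job output. A standalone sketch of what that one-liner computes, with the folder layout assumed from this diff:

# Hypothetical expansion of the set-matrix one-liner: collect the
# sub-directories of tests/ (one per model), sorted, for use as a job matrix.
import os

def identify_model_folders(tests_dir: str = "tests") -> list:
    # Keep only entries that are directories, mirroring filter(os.path.isdir, ...).
    entries = os.listdir(tests_dir)
    folders = [e for e in entries if os.path.isdir(os.path.join(tests_dir, e))]
    folders.sort()
    return folders

if __name__ == "__main__":
    # The workflow wraps this in "::set-output name=matrix::..." so later
    # jobs can read it as needs.setup.outputs.matrix.
    print(identify_model_folders())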
+  run_tests_gpu:
+    name: Model tests
+    strategy:
+      fail-fast: false
+      matrix:
+        folders: ${{ fromJson(needs.setup.outputs.matrix) }}
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
     container:
-      image: tensorflow/tensorflow:2.4.1-gpu
+      image: huggingface/transformers-all-latest-gpu
       options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    needs: setup
     steps:
-      - name: Launcher docker
-        uses: actions/checkout@v2
-
-      - name: NVIDIA-SMI
-        run: |
-          nvidia-smi
-
-      - name: Install dependencies
-        run: |
-          apt -y update && apt install -y libsndfile1-dev git espeak-ng
-          pip install --upgrade pip
-          pip install .[sklearn,testing,onnx,sentencepiece,tf-speech,vision]
-          pip install https://github.com/kpu/kenlm/archive/master.zip
-
-      - name: Are GPUs recognized by our DL frameworks
-        run: |
-          TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
-          TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
-
-      - name: Run all tests on GPU
-        env:
-          TF_NUM_INTEROP_THREADS: 1
-          TF_NUM_INTRAOP_THREADS: 16
-        run: |
-          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_tf_gpu tests
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_tf_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_tf_gpu_durations.txt
-
-      - name: Run all pipeline tests on GPU
-        if: ${{ always() }}
-        env:
-          RUN_PIPELINE_TESTS: yes
-          TF_NUM_INTEROP_THREADS: 1
-          TF_NUM_INTRAOP_THREADS: 16
-        run: |
-          python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_tf_pipeline_gpu tests
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_tf_pipeline_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_tf_pipeline_gpu_durations.txt
+      - name: Echo folder ${{ matrix.folders }}
+        run: echo "${{ matrix.folders }}"
+
+      - name: Update clone
+        working-directory: /transformers
+        run: git fetch && git checkout ${{ github.sha }}
+
+      - name: Run all non-slow tests on GPU
+        working-directory: /transformers
+        run: python3 -m pytest -v --make-reports=${{ matrix.machines }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat /transformers/reports/${{ matrix.machines }}_tests_gpu_${{ matrix.folders }}/failures_short.txt

       - name: Test suite reports artifacts
         if: ${{ always() }}
         uses: actions/upload-artifact@v2
         with:
-          name: run_all_tests_tf_gpu_test_reports
-          path: reports
+          name: ${{ matrix.machines }}_run_all_tests_gpu_${{ matrix.folders }}_test_reports
+          path: /transformers/reports/${{ matrix.machines }}_tests_gpu_${{ matrix.folders }}
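The `folders` axis is consumed through `fromJson(needs.setup.outputs.matrix)`, so GitHub Actions fans `run_tests_gpu` out into one job per (machine, folder) pair, each writing its own `..._tests_gpu_...` report. A rough Python sketch of that fan-out, with an assumed sample of the setup output:

# Hypothetical illustration of how the matrix expands; the folder list is an
# assumed sample of what setup's set-matrix step would emit.
import itertools
import json

matrix_output = '["bert", "gpt2", "t5"]'  # assumed sample, not real CI output
folders = json.loads(matrix_output)
machines = ["multi-gpu-docker", "single-gpu-docker"]

for machine, folder in itertools.product(machines, folders):
    report = f"{machine}_tests_gpu_{folder}"
    print(f"job: machines={machine} folders={folder} --make-reports={report}")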
-  run_all_examples_torch_xla_tpu:
-    runs-on: [self-hosted, docker-tpu-test, tpu-v3-8]
+  run_examples_gpu:
+    name: Examples directory
+    runs-on: [self-hosted, single-gpu-docker]
     container:
-      image: gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm
-      options: --privileged -v "/lib/libtpu.so:/lib/libtpu.so" -v /mnt/cache/.cache/huggingface:/mnt/cache/ --shm-size 16G
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    needs: setup
     steps:
-      - name: Launcher docker
-        uses: actions/checkout@v2
-
-      - name: Install dependencies
-        run: |
-          pip install --upgrade pip
-          pip install .[testing]
-
-      - name: Are TPUs recognized by our DL frameworks
-        env:
-          XRT_TPU_CONFIG: localservice;0;localhost:51011
-        run: |
-          python -c "import torch_xla.core.xla_model as xm; print(xm.xla_device())"
-
-      - name: Run example tests on TPU
-        env:
-          XRT_TPU_CONFIG: "localservice;0;localhost:51011"
-          MKL_SERVICE_FORCE_INTEL: "1"  # See: https://github.com/pytorch/pytorch/issues/37377
-        run: |
-          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_xla_tpu examples/pytorch/test_xla_examples.py
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_torch_xla_tpu_failures_short.txt
-
-      - name: Tests durations
-        if: ${{ always() }}
-        run: cat reports/tests_torch_xla_tpu_durations.txt
+      - name: Update clone
+        working-directory: /transformers
+        run: git fetch && git checkout ${{ github.sha }}
+
+      - name: Run examples tests on GPU
+        working-directory: /transformers
+        run: |
+          pip install -r examples/pytorch/_tests_requirements.txt
+          python3 -m pytest -v --make-reports=examples_gpu examples/pytorch
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat /transformers/reports/examples_gpu/failures_short.txt

       - name: Test suite reports artifacts
         if: ${{ always() }}
         uses: actions/upload-artifact@v2
         with:
-          name: run_all_examples_torch_xla_tpu
-          path: reports
-
-  run_all_tests_torch_multi_gpu:
-    runs-on: [self-hosted, docker-gpu, multi-gpu]
+          name: run_examples_gpu
+          path: /transformers/reports/examples_gpu
+  run_pipelines_torch_gpu:
+    name: PyTorch pipelines
+    strategy:
+      fail-fast: false
+      matrix:
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
     container:
-      image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
-      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+      image: huggingface/transformers-pytorch-latest-gpu
+      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    needs: setup
     steps:
-      - name: Launcher docker
-        uses: actions/checkout@v2
-
-      - name: NVIDIA-SMI
-        continue-on-error: true
-        run: |
-          nvidia-smi
-
-      - name: Install dependencies
-        run: |
-          apt -y update && apt install -y libsndfile1-dev git espeak-ng
-          pip install --upgrade pip
-          pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
-          pip install https://github.com/kpu/kenlm/archive/master.zip
-          python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
-          wandb login ${{ secrets.WANDB_API_KEY }}
-
-      - name: Are GPUs recognized by our DL frameworks
-        run: |
-          utils/print_env_pt.py
-
-      - name: Run all tests on GPU
-        env:
-          MKL_SERVICE_FORCE_INTEL: 1
-        run: |
-          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_multi_gpu tests
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_torch_multi_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_torch_multi_gpu_durations.txt
+      - name: Update clone
+        working-directory: /transformers
+        run: git fetch && git checkout ${{ github.sha }}

       - name: Run all pipeline tests on GPU
-        if: ${{ always() }}
+        working-directory: /transformers
         env:
           RUN_PIPELINE_TESTS: yes
         run: |
-          python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_multi_gpu tests
+          python3 -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=${{ matrix.machines }}_tests_torch_pipeline_gpu tests
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_torch_pipeline_multi_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_torch_pipeline_multi_gpu_durations.txt
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat /transformers/reports/${{ matrix.machines }}_tests_torch_pipeline_gpu/failures_short.txt

       - name: Test suite reports artifacts
         if: ${{ always() }}
         uses: actions/upload-artifact@v2
         with:
-          name: run_all_tests_torch_multi_gpu_test_reports
-          path: reports
-
-  run_all_tests_tf_multi_gpu:
-    runs-on: [self-hosted, docker-gpu, multi-gpu]
+          name: ${{ matrix.machines }}_run_tests_torch_pipeline_gpu
+          path: /transformers/reports/${{ matrix.machines }}_tests_torch_pipeline_gpu
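This job and the TensorFlow pipelines job below select tests with pytest's marker filter (`-m is_pipeline_test`) rather than by path. As a minimal, hypothetical illustration of the mechanism only (in transformers the marker is applied by the library's own test utilities, not written by hand like this):

# Hypothetical example of a marker-selected test; register the marker in
# conftest.py to avoid pytest's unknown-marker warning:
#   def pytest_configure(config):
#       config.addinivalue_line("markers", "is_pipeline_test: pipeline test")
import pytest

@pytest.mark.is_pipeline_test
def test_pipeline_smoke():
    # Collected by `pytest -m is_pipeline_test`; deselected otherwise.
    assert True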
+  run_pipelines_tf_gpu:
+    name: TensorFlow pipelines
+    strategy:
+      fail-fast: false
+      matrix:
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
     container:
-      image: tensorflow/tensorflow:2.4.1-gpu
-      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+      image: huggingface/transformers-tensorflow-latest-gpu
+      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    needs: setup
     steps:
-      - name: Launcher docker
-        uses: actions/checkout@v2
-
-      - name: NVIDIA-SMI
-        continue-on-error: true
-        run: |
-          nvidia-smi
-
-      - name: Install dependencies
-        run: |
-          apt -y update && apt install -y libsndfile1-dev git espeak-ng
-          pip install --upgrade pip
-          pip install .[sklearn,testing,onnx,sentencepiece,tf-speech,vision]
-          pip install https://github.com/kpu/kenlm/archive/master.zip
-
-      - name: Are GPUs recognized by our DL frameworks
-        run: |
-          TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
-          TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
-
-      - name: Run all tests on GPU
-        env:
-          TF_NUM_INTEROP_THREADS: 1
-          TF_NUM_INTRAOP_THREADS: 16
-        run: |
-          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_tf_multi_gpu tests
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_tf_multi_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_tf_multi_gpu_durations.txt
+      - name: Update clone
+        working-directory: /transformers
+        run: git fetch && git checkout ${{ github.sha }}

       - name: Run all pipeline tests on GPU
-        if: ${{ always() }}
+        working-directory: /transformers
         env:
           RUN_PIPELINE_TESTS: yes
           TF_NUM_INTEROP_THREADS: 1
           TF_NUM_INTRAOP_THREADS: 16
         run: |
-          python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_tf_pipeline_multi_gpu tests
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_tf_pipeline_multi_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_tf_pipeline_multi_gpu_durations.txt
+          python3 -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=${{ matrix.machines }}_tests_tf_pipeline_gpu tests
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: |
+          cat /transformers/reports/${{ matrix.machines }}_tests_tf_pipeline_gpu/failures_short.txt
       - name: Test suite reports artifacts
         if: ${{ always() }}
         uses: actions/upload-artifact@v2
         with:
-          name: run_all_tests_tf_multi_gpu_test_reports
-          path: reports
-
-#  run_all_tests_flax_multi_gpu:
-#    runs-on: [self-hosted, docker-gpu, multi-gpu]
-#    container:
-#      image: tensorflow/tensorflow:2.4.1-gpu
-#      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
-#    steps:
-#      - name: Launcher docker
-#        uses: actions/checkout@v2
-#
-#      - name: NVIDIA-SMI
-#        run: |
-#          nvidia-smi
-#
-#      - name: Install dependencies
-#        run: |
-#          pip install --upgrade pip
-#          pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
-#          pip install .[flax,integrations,sklearn,testing,sentencepiece,flax-speech,vision]
-#
-#      - name: Are GPUs recognized by our DL frameworks
-#        run: |
-#          python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
-#          python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
-#
-#      - name: Run all tests on GPU
-#        run: |
-#          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_flax_gpu tests
-#
-#      - name: Failure short reports
-#        if: ${{ always() }}
-#        run: cat reports/tests_flax_gpu_failures_short.txt
-#
-#      - name: Test suite reports artifacts
-#        if: ${{ always() }}
-#        uses: actions/upload-artifact@v2
-#        with:
-#          name: run_all_tests_flax_gpu_test_reports
-#          path: reports
+          name: ${{ matrix.machines }}_run_tests_tf_pipeline_gpu
+          path: /transformers/reports/${{ matrix.machines }}_tests_tf_pipeline_gpu
   run_all_tests_torch_cuda_extensions_gpu:
-    runs-on: [self-hosted, docker-gpu, single-gpu]
+    name: Torch CUDA extension tests
+    strategy:
+      fail-fast: false
+      matrix:
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
+    needs: setup
     container:
-      image: nvcr.io/nvidia/pytorch:21.03-py3
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+      image: huggingface/transformers-pytorch-deepspeed-latest-gpu
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
     steps:
-      - name: Launcher docker
-        uses: actions/checkout@v2
-
-      - name: NVIDIA-SMI
-        run: |
-          nvidia-smi
-
-      - name: Install dependencies
-        run: |
-          apt -y update && apt install -y libaio-dev
-          pip install --upgrade pip
-          pip install .[testing,deepspeed]
-
-      - name: Are GPUs recognized by our DL frameworks
-        run: |
-          utils/print_env_pt.py
+      - name: Update clone
+        working-directory: /workspace/transformers
+        run: git fetch && git checkout ${{ github.sha }}

       - name: Run all tests on GPU
+        working-directory: /workspace/transformers
         run: |
-          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_torch_cuda_extensions_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_torch_cuda_extensions_gpu_durations.txt
-
-      - name: Test suite reports artifacts
-        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
-        with:
-          name: run_tests_torch_cuda_extensions_gpu_test_reports
-          path: reports
-
-  run_all_tests_torch_cuda_extensions_multi_gpu:
-    runs-on: [self-hosted, docker-gpu, multi-gpu]
-    container:
-      image: nvcr.io/nvidia/pytorch:21.03-py3
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
-    steps:
-      - name: Launcher docker
-        uses: actions/checkout@v2
-
-      - name: NVIDIA-SMI
-        run: |
-          nvidia-smi
-
-      - name: Install dependencies
-        run: |
-          apt -y update && apt install -y libaio-dev
-          pip install --upgrade pip
-          rm -rf ~/.cache/torch_extensions/ # shared between conflicting builds
-          pip install .[testing,deepspeed,fairscale]
-
-      - name: Are GPUs recognized by our DL frameworks
-        run: |
-          utils/print_env_pt.py
-
-      - name: Run all tests on GPU
-        run: |
-          python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_cuda_extensions_multi_gpu tests/deepspeed tests/extended
-
-      - name: Failure short reports
-        if: ${{ always() }}
-        run: cat reports/tests_torch_cuda_extensions_multi_gpu_failures_short.txt
-
-      - name: Test durations
-        if: ${{ always() }}
-        run: cat reports/tests_torch_cuda_extensions_multi_gpu_durations.txt
+          python -m pytest -v --make-reports=${{ matrix.machines }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat /workspace/transformers/reports/${{ matrix.machines }}_tests_torch_cuda_extensions_gpu/failures_short.txt

       - name: Test suite reports artifacts
         if: ${{ always() }}
         uses: actions/upload-artifact@v2
         with:
-          name: run_tests_torch_cuda_extensions_multi_gpu_test_reports
-          path: reports
+          name: ${{ matrix.machines }}_run_tests_torch_cuda_extensions_gpu_test_reports
+          path: /workspace/transformers/reports/${{ matrix.machines }}_tests_torch_cuda_extensions_gpu
   send_results:
     name: Send results to webhook
     runs-on: ubuntu-latest
     if: always()
-    needs: [
-      run_all_tests_torch_gpu,
-      run_all_tests_tf_gpu,
-      run_all_tests_torch_multi_gpu,
-      run_all_tests_tf_multi_gpu,
-      run_all_tests_torch_cuda_extensions_gpu,
-      run_all_tests_torch_cuda_extensions_multi_gpu
-    ]
+    needs: [
+      setup,
+      run_tests_gpu,
+      run_examples_gpu,
+      run_pipelines_tf_gpu,
+      run_pipelines_torch_gpu,
+      run_all_tests_torch_cuda_extensions_gpu
+    ]
     steps:
       - uses: actions/checkout@v2
       - uses: actions/download-artifact@v2
       - name: Send message to Slack
         env:
           CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
           CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
           CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
           CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
         run: |
           pip install slack_sdk
-          python utils/notification_service.py scheduled
+          python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"