Commit ba475173 authored by Frank Lee, committed by GitHub

[workflow] fixed example check workflow (#2554)

* [workflow] fixed example check workflow

* polish yaml
parent fb1a4c0d
......@@ -6,13 +6,14 @@
- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Workflows](#workflows)
- [Checks on Pull Requests](#checks-on-pull-requests)
- [Regular Checks](#regular-checks)
- [Release](#release)
- [Manual Dispatch](#manual-dispatch)
- [Release bdist wheel](#release-bdist-wheel)
- [Code Style Check](#code-style-check)
- [Unit Test](#unit-test)
- [Example Test](#example-test)
- [Dispatch Example Test](#dispatch-example-test)
- [Compatibility Test](#compatibility-test)
- [Compatibility Test](#compatibility-test-1)
- [Release](#release)
- [Release bdist wheel](#release-bdist-wheel)
- [User Friendliness](#user-friendliness)
- [Configuration](#configuration)
- [Progress Log](#progress-log)
......@@ -26,25 +27,54 @@ In the section below, we will dive into the details of different workflows avail
## Workflows
### Checks on Pull Requests
Refer to this [documentation](https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow) on how to manually trigger a workflow.
I will provide the details of each workflow below.
### Code Style Check
| Workflow Name | File name | Description |
| --------------------------- | ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Build` | `build.yml` | This workflow is triggered when the label `Run build and Test` is assigned to a PR. It will run all the unit tests in the repository with 4 GPUs. |
| `Pre-commit` | `pre_commit.yml` | This workflow runs pre-commit checks for code style consistency. |
| `Report pre-commit failure` | `report_precommit_failure.yml` | This workflow posts a comment on the PR explaining the pre-commit failure and its remedy. It runs when `Pre-commit` is done. |
| `Report test coverage` | `report_test_coverage.yml` | This workflow posts a comment reporting the test coverage results. It runs when `Build` is completed. |
| `Test example` | `auto_example_check.yml` | The example will be automatically tested if its files are changed in the PR |
| --------------------------- | ------------------------------ | ---------------------------------------------------------------------------------------------------------- |
| `Pre-commit` | `pre_commit.yml` | This workflow runs pre-commit checks for code style consistency for PRs. |
| `Report pre-commit failure` | `report_precommit_failure.yml` | This workflow posts a comment on the PR explaining the pre-commit failure and its remedy if `Pre-commit` fails. |
### Regular Checks
### Unit Test
| Workflow Name | File name | Description |
| ----------------------- | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Test example` | `auto_example_check.yml` | This workflow will test all examples every Sunday |
| `Compatibility Test` | `auto_compatibility_test.yml` | This workflow will check the compatibility of Colossal-AI against PyTorch and CUDA every Sunday. The PyTorch and CUDA versions are specified in `.compatibility`. |
| ---------------------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Build` | `build.yml` | This workflow is triggered when the label `Run build and Test` is assigned to a PR. It will run all the unit tests in the repository with 4 GPUs. |
| `Build on 8 GPUs` | `build_gpu_8.yml` | This workflow will run the unit tests every day with 8 GPUs. |
| `Synchronize submodule` | `submodule.yml` | This workflow will check if any git submodule is updated. If so, it will create a PR to update the submodule pointers. |
| `Close inactive issues` | `close_inactive.yml` | This workflow will close issues that have been stale for 14 days. |
| `Report test coverage` | `report_test_coverage.yml` | This workflow posts a comment reporting the test coverage results when `Build` is done. |
### Example Test
| Workflow Name | File name | Description |
| -------------------------- | ------------------------------- | --------------------------------------------------------------------------- |
| `Test example on PR` | `example_check_on_pr.yml` | The example will be automatically tested if its files are changed in the PR |
| `Test example on Schedule` | `example_check_on_schedule.yml` | This workflow will test all examples every Sunday |
| `Example Test on Dispatch` | `example_check_on_dispatch.yml` | Manually test a specified example. |
#### Dispatch Example Test
Parameters:
- `example_directory`: the example directory to test. Multiple directories are supported and must be separated by commas, for example `language/gpt, images/vit`. Entering only `language` or only `gpt` does not work.
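The rule described for `example_directory` (comma-separated entries, each naming a full example path such as `language/gpt`) can be illustrated with a small validator. This is only a sketch under stated assumptions: `parse_example_directories` is a hypothetical helper written for illustration, not the repository's `check_dispatch_inputs.py`, and it assumes an entry is valid when it has at least two path components.

```python
from __future__ import annotations


def parse_example_directories(raw: str) -> list[str]:
    """Split a comma-separated input into example directories.

    Assumption: a valid entry has at least two path components,
    e.g. `language/gpt`; a bare `language` or `gpt` is rejected.
    """
    directories = []
    for entry in raw.split(","):
        entry = entry.strip().strip("/")
        if not entry:
            continue  # tolerate stray commas and surrounding whitespace
        parts = entry.split("/")
        if len(parts) < 2 or any(not part for part in parts):
            raise ValueError(f"expected '<area>/<name>', got {entry!r}")
        directories.append(entry)
    if not directories:
        raise ValueError("no example directory given")
    return directories
```

Calling `parse_example_directories("language/gpt, images/vit")` yields the two directories, while `parse_example_directories("language")` raises `ValueError`.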
### Compatibility Test
| Workflow Name | File name | Description |
| ---------------------------- | -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| `Compatibility Test` | `auto_compatibility_test.yml` | This workflow will check the compatibility of Colossal-AI against PyTorch and CUDA specified in `.compatibility` every Sunday. |
| `Auto Compatibility Test` | `auto_compatibility_test.yml` | Check Colossal-AI's compatibility when `version.txt` is changed in a PR. |
| `Dispatch Compatiblity Test` | `dispatch_compatiblity_test.yml` | Test PyTorch and Python Compatibility. |
#### Compatibility Test
Parameters:
- `torch version`: torch versions to test against. Multiple versions are supported but must be separated by commas. The default value is `all`, which tests all available torch versions listed in this [repository](https://github.com/hpcaitech/public_assets/tree/main/colossalai/torch_build/torch_wheels).
- `cuda version`: CUDA versions to test against. Multiple versions are supported but must be separated by commas. The CUDA versions must be present in our [DockerHub repository](https://hub.docker.com/r/hpcaitech/cuda-conda).
> It only tests the compatibility of the main branch.
### Release
......@@ -56,18 +86,8 @@ In the section below, we will dive into the details of different workflows avail
| `Release Docker` | `release_docker.yml` | Build and release the Docker image to DockerHub. Triggered when the change of `version.txt` is merged. |
| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. Manually dispatched. See more details in the next section. |
| `Auto Release bdist wheel` | `auto_release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. Triggered when the change of `version.txt` is merged. Build specifications are stored in `.bdist.json` |
| `Auto Compatibility Test` | `auto_compatibility_test.yml` | Check Colossal-AI's compatibility against the PyTorch and CUDA version specified in `.compatibility`. Triggered when `version.txt` is changed in a PR. |
### Manual Dispatch
| Workflow Name | File name | Description |
| ---------------------------- | -------------------------------- | ------------------------------------------------------ |
| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. |
| `Dispatch Example Test` | `dispatch_example_check.yml` | Manually test a specified example. |
| `Dispatch Compatiblity Test` | `dispatch_compatiblity_test.yml` | Test PyTorch and Python Compatibility. |
Refer to this [documentation](https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow) on how to manually trigger a workflow.
I will provide the details of each workflow below.
#### Release bdist wheel
......@@ -76,26 +96,13 @@ Parameters:
- `cuda version`: cuda versions to test against, multiple versions are supported but must be separated by comma. The CUDA versions must be present in our [DockerHub repository](https://hub.docker.com/r/hpcaitech/cuda-conda).
- `ref`: input the branch or tag name to build the wheel for this ref.
#### Dispatch Example Test
Parameters:
- `example_directory`: the example directory to test. Multiple directories are supported and must be separated by commas, for example `language/gpt, images/vit`. Entering only `language` or only `gpt` does not work.
#### Compatibility Test
Parameters:
- `torch version`: torch versions to test against. Multiple versions are supported but must be separated by commas. The default value is `all`, which tests all available torch versions listed in this [repository](https://github.com/hpcaitech/public_assets/tree/main/colossalai/torch_build/torch_wheels).
- `cuda version`: CUDA versions to test against. Multiple versions are supported but must be separated by commas. The CUDA versions must be present in our [DockerHub repository](https://hub.docker.com/r/hpcaitech/cuda-conda).
> It only tests the compatibility of the main branch.
### User Friendliness
| Workflow Name | File name | Description |
| ----------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| ----------------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `issue-translate` | `translate_comment.yml` | This workflow is triggered when a new issue comment is created. The comment will be translated into English if not written in English. |
| `Synchronize submodule` | `submodule.yml` | This workflow will check if any git submodule is updated. If so, it will create a PR to update the submodule pointers. |
| `Close inactive issues` | `close_inactive.yml` | This workflow will close issues that have been stale for 14 days. |
## Configuration
......
name: Test Example on Dispatch
on:
  workflow_dispatch:
    inputs:
      example_directory:
        type: string
        description: example directory, separated by comma. For example, language/gpt, images/vit. Simply input language or simply gpt does not work.
        required: true
jobs:
  matrix_preparation:
    if: |
      github.event.pull_request.draft == false &&
      github.base_ref == 'main' &&
      github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
    name: Check the examples user want
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - name: 📚 Checkout
        uses: actions/checkout@v3
      - name: Set up matrix
        id: set-matrix
        env:
          check_dir: ${{ inputs.example_directory }}
        run: |
          res=`python .github/workflows/scripts/example_checks/check_dispatch_inputs.py --fileNameList $check_dir`
          # compare the script output (not the literal string "res") against "failure"
          if [ "$res" = "failure" ]; then
            exit 1
          fi
          dirs="[${check_dir}]"
          echo "Testing examples in $dirs"
          echo "matrix={\"directory\":$(echo "$dirs")}" >> $GITHUB_OUTPUT
  test_example:
    if: |
      github.event.pull_request.draft == false &&
      github.base_ref == 'main' &&
      github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
    name: Manually check example files
    # must match the id of the job that produces the matrix
    needs: matrix_preparation
    runs-on: [self-hosted, gpu]
    strategy:
      fail-fast: false
      matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
    container:
      image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
      options: --gpus all --rm -v /data/scratch/examples-data:/data/
    timeout-minutes: 10
    steps:
      - name: 📚 Checkout
        uses: actions/checkout@v3
      - name: Install Colossal-AI
        run: |
          pip install -v .
      - name: Test the example
        run: |
          dir=${{ matrix.directory }}
          echo "Testing ${dir} now"
          cd "${PWD}/examples/${dir}"
          bash test_ci.sh
        env:
          NCCL_SHM_DISABLE: 1
name: Test Example on PR
on:
  pull_request:
    # any change in the examples folder will trigger a check for the corresponding example.
    paths:
      - 'examples/**'
jobs:
  # Detect changed example files and output a matrix containing the corresponding directory names.
  detect-changed-example:
    if: |
      github.event.pull_request.draft == false &&
      github.base_ref == 'main' &&
      github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI' && github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.setup-matrix.outputs.matrix }}
      anyChanged: ${{ steps.setup-matrix.outputs.anyChanged }}
    name: Detect changed example files
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
          ref: ${{ github.event.pull_request.head.sha }}
      - name: Locate base commit
        id: locate-base-sha
        run: |
          curBranch=$(git rev-parse --abbrev-ref HEAD)
          commonCommit=$(git merge-base origin/main $curBranch)
          echo $commonCommit
          echo "baseSHA=$commonCommit" >> $GITHUB_OUTPUT
      - name: Get all changed example files
        id: changed-files
        uses: tj-actions/changed-files@v35
        with:
          base_sha: ${{ steps.locate-base-sha.outputs.baseSHA }}
      - name: setup matrix
        id: setup-matrix
        run: |
          changedFileName=""
          for file in ${{ steps.changed-files.outputs.all_changed_files }}; do
            changedFileName="${file}:${changedFileName}"
          done
          echo "$changedFileName was changed"
          res=`python .github/workflows/scripts/example_checks/detect_changed_example.py --fileNameList $changedFileName`
          echo "All changed examples are $res"
          if [ "$res" = "[]" ]; then
            echo "anyChanged=false" >> $GITHUB_OUTPUT
            echo "matrix=null" >> $GITHUB_OUTPUT
          else
            dirs=$( IFS=',' ; echo "${res[*]}" )
            echo "anyChanged=true" >> $GITHUB_OUTPUT
            echo "matrix={\"directory\":$(echo "$dirs")}" >> $GITHUB_OUTPUT
          fi
  # If no file is changed, the matrix has no value and this job would report an error.
  check-changed-example:
    # Add this condition to avoid executing this job if the trigger event is workflow_dispatch.
    if: |
      github.event.pull_request.draft == false &&
      github.base_ref == 'main' &&
      github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI' && github.event_name == 'pull_request' &&
      needs.detect-changed-example.outputs.anyChanged == 'true'
    name: Test the changed example
    needs: detect-changed-example
    runs-on: [self-hosted, gpu]
    strategy:
      fail-fast: false
      matrix: ${{fromJson(needs.detect-changed-example.outputs.matrix)}}
    container:
      image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
      options: --gpus all --rm -v /data/scratch/examples-data:/data/
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v3
      - name: Install Colossal-AI
        run: |
          pip install -v .
      - name: Test the example
        run: |
          example_dir=${{ matrix.directory }}
          cd "${PWD}/examples/${example_dir}"
          bash test_ci.sh
        env:
          NCCL_SHM_DISABLE: 1
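The `setup matrix` step in the PR workflow joins the changed file names with `:` and passes them to `detect_changed_example.py`, whose source is not part of this diff. As a rough illustration of that kind of mapping, the sketch below turns such a list into unique example directories and a matrix JSON string. Both function names are hypothetical, and the `examples/<area>/<name>/...` layout is an assumption drawn from the `language/gpt` and `images/vit` examples in the README.

```python
from __future__ import annotations

import json


def changed_example_dirs(file_name_list: str) -> list[str]:
    """Map a ':'-joined list of changed files to unique example directories.

    Assumed layout: examples/<area>/<name>/<file...>; other paths are ignored.
    """
    dirs: list[str] = []
    for path in file_name_list.split(":"):
        parts = path.strip().split("/")
        if len(parts) >= 4 and parts[0] == "examples":
            example_dir = "/".join(parts[1:3])
            if example_dir not in dirs:  # keep first-seen order, no duplicates
                dirs.append(example_dir)
    return dirs


def to_matrix(dirs: list[str]) -> str:
    """Serialize directories in the {"directory": [...]} shape read by fromJson."""
    return json.dumps({"directory": dirs})


if __name__ == "__main__":
    changed = "examples/language/gpt/train.py:examples/images/vit/run.sh:setup.py:"
    print(to_matrix(changed_example_dirs(changed)))
```

Serializing with `json.dumps` quotes every directory name, so the resulting string is valid input for `fromJson`.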
name: Test Example on Schedule
on:
  # run at 00:00 every Sunday (Singapore time), which is Saturday 16:00 UTC
  schedule:
    - cron: '0 16 * * 6'
jobs:
  # Weekly check of all examples. This job finds all the example directories.
  matrix_preparation:
    if: |
      github.repository == 'hpcaitech/ColossalAI' &&
      github.event_name == 'schedule'
    name: Prepare matrix for weekly check
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.setup-matrix.outputs.matrix }}
    steps:
      - name: 📚 Checkout
        uses: actions/checkout@v3
      - name: setup matrix
        id: setup-matrix
        run: |
          res=`python .github/workflows/scripts/example_checks/check_example_weekly.py`
          all_loc=$( IFS=',' ; echo "${res[*]}" )
          echo "Found the examples: $all_loc"
          echo "matrix={\"directory\":$(echo "$all_loc")}" >> $GITHUB_OUTPUT
  weekly_check:
    if: |
      github.repository == 'hpcaitech/ColossalAI' &&
      github.event_name == 'schedule'
    name: Weekly check all examples
    needs: matrix_preparation
    runs-on: [self-hosted, gpu]
    strategy:
      fail-fast: false
      matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
    container:
      image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
    timeout-minutes: 10
    steps:
      - name: 📚 Checkout
        uses: actions/checkout@v3
      - name: Install Colossal-AI
        run: |
          pip install -v .
      - name: Traverse all files
        run: |
          # the matrix key is "directory", as set by the matrix_preparation job
          example_dir=${{ matrix.directory }}
          echo "Testing ${example_dir} now"
          cd "${PWD}/examples/${example_dir}"
          bash test_ci.sh
        env:
          NCCL_SHM_DISABLE: 1
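The schedule comment in this workflow equates cron `0 16 * * 6` (Saturday 16:00 UTC) with Sunday 00:00 Singapore time. A small sketch can check that arithmetic, assuming Singapore is at a fixed UTC+8 offset; `cron_utc_to_local` is a hypothetical helper written for illustration and is not part of the repository.

```python
from __future__ import annotations

MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY


def cron_utc_to_local(weekday: int, hour: int, minute: int, offset_hours: int) -> tuple[int, int, int]:
    """Shift a weekly cron time from UTC to a fixed-offset local time.

    `weekday` uses the cron convention: 0 = Sunday ... 6 = Saturday.
    Returns (weekday, hour, minute) in local time.
    """
    total = weekday * MINUTES_PER_DAY + hour * 60 + minute + offset_hours * 60
    total %= MINUTES_PER_WEEK  # wrap around the end (or start) of the week
    return total // MINUTES_PER_DAY, (total % MINUTES_PER_DAY) // 60, total % 60


if __name__ == "__main__":
    # cron '0 16 * * 6' is Saturday 16:00 UTC; Singapore is assumed to be UTC+8
    print(cron_utc_to_local(weekday=6, hour=16, minute=0, offset_hours=8))  # (0, 0, 0)
```

The result `(0, 0, 0)` corresponds to Sunday 00:00, which agrees with the comment in the workflow.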