Commit 5a567950 authored by lidc

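Added MPI single-node multi-GPU and multi-node multi-GPU launch methods for yolov5 and updated its README; removed the debug output logging from maskrcnn and updated that model's README.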

parent a30b77fe
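# Introduction
This test case measures the performance of the MaskRCNN object detection model on the ROCm platform. The test procedure is as follows.
This test case measures the performance of the MaskRCNN object detection model. The test procedure is as follows.
# Test procedure
## Enter the working directory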
......@@ -20,7 +20,7 @@ COCO2017 dataset
### Single node, multiple GPUs
1) PyTorch launch method
**1) PyTorch launch method**
export HIP_VISIBLE_DEVICES=0,1,2,3
export NGPUS=4
......@@ -28,11 +28,11 @@ COCO2017 dataset
python3 -m torch.distributed.launch --nproc_per_node= ${NGPUS} --use_env train.py --dataset coco --model maskrcnn_resnet50_fpn --epochs 26 --lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.005 --data-path /path/to/{COCO2017_data_dir} > train_2gpu_lr0.005.log 2>&1 &
Note: when running on multiple GPUs, the learning rate scales with the number of GPUs as 0.02/8*$NGPU; for example, lr_4gpu=0.01, lr_2gpu=0.005, lr_1gpu=0.0025.
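The scaling rule in the note can be sketched as a small helper. This is only an illustration: the function name `scaled_lr` and its parameter names are hypothetical and do not appear in the repository; the constants 0.02 and 8 are taken from the note.

```python
def scaled_lr(num_gpus: int, base_lr: float = 0.02, base_gpus: int = 8) -> float:
    """Scale the learning rate linearly with the GPU count (0.02 / 8 * num_gpus)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return base_lr / base_gpus * num_gpus


# Print the value to pass to --lr for a few common GPU counts.
for n in (1, 2, 4, 8):
    print(f"{n} GPU(s): --lr {scaled_lr(n):g}")
```

The result for the chosen GPU count is what would be passed to `--lr` on the `train.py` command line.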
2) MPI launch
**2) MPI launch**
```
cd references/detection
mpirun -np 4 --bind-to none single_process.sh localhost
mpirun -np $np --bind-to none single_process.sh localhost
```
### Multiple nodes, multiple GPUs
......@@ -43,7 +43,12 @@ MPI launch
mpirun -np $np --hostfile hostfile --bind-to none single_process.sh $dist_url
```
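Here, $dist_url is the IP of the master node; for multi-node runs it must be modified following the format in the hostfile.
Here, $dist_url is the IP of the master node, and hostfile is the configuration file listing the nodes used. An example of the format: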
```
node1 slots=4
node2 slots=4
```
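The README does not say how `$np` should be chosen. One plausible choice, assumed here, is the total number of slots declared in the hostfile. The sketch below sums the `slots=` entries of a hostfile in the format shown above; `total_slots` is a hypothetical helper, and requiring an explicit `slots=` on every host is a simplification of this sketch, not a rule of Open MPI.

```python
HOSTFILE = """\
node1 slots=4
node2 slots=4
"""


def total_slots(hostfile_text: str) -> int:
    """Sum the slots= values over all non-empty, non-comment hostfile lines."""
    total = 0
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        host, *options = line.split()
        slots = [o.split("=", 1)[1] for o in options if o.startswith("slots=")]
        if not slots:
            # Assumption of this sketch: every host must declare slots explicitly.
            raise ValueError(f"no slots= entry for host {host!r}")
        total += int(slots[0])
    return total


num_procs = total_slots(HOSTFILE)
print(f"mpirun -np {num_procs} --hostfile hostfile --bind-to none single_process.sh $dist_url")
```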
# References
......
......@@ -2,15 +2,12 @@
export MIOPEN_DEBUG_DISABLE_FIND_DB=1
export NCCL_SOCKET_IFNAME=ib0
export HSA_USERPTR_FOR_PAGED_MEM=0
#source /public/software/apps/DeepLearning/PyTorch/pytorch-env.sh
module rm compiler/dtk/21.10
module load compiler/dtk/22.10
cd /work/home/sugon_ldc/Gitlab/PyTorch/Compute-Vision/Objection/MaskRCNN/vision/references/detection
#conda init
#source activate
#source deactivate
#conda activate maskrcnn
source env.sh
#module load apps/PyTorch/1.5.0a0/hpcx-2.4.1-gcc-7.3.1-rocm3.3
......@@ -22,10 +19,8 @@ comm_size=$OMPI_COMM_WORLD_SIZE
echo $lrank
echo $comm_rank
echo $comm_size
echo '##################'
#APP="python3 `pwd`/main_bench.py --batch-size=${3} --a=${2} -j 24 --epochs=1 --dist-url tcp://${1}:34567 --dist-backend nccl --world-size=${comm_size} --rank=${comm_rank} --synthetic /public/software/apps/DeepLearning/Data/ImageNet-pytorch/"
APP="python3 train.py --dataset coco --model maskrcnn_resnet50_fpn --epochs 26 --dist-url tcp://${1}:34567 --dist-backend nccl --world-size=${comm_size} --rank=${comm_rank} --lr-steps 16 22 --aspect-ratio-group-factor 3 --data-path /work/home/sugon_ldc/datasets/COCO2017/"
APP="python3 `pwd`/train.py --dataset coco --model maskrcnn_resnet50_fpn --epochs 26 --dist-url tcp://${1}:34567 --dist-backend nccl --world-size=${comm_size} --rank=${comm_rank} --lr-steps 16 22 --aspect-ratio-group-factor 3 --data-path /work/home/sugon_ldc/datasets/COCO2017/"
echo $dist_url
case ${lrank} in
......
......@@ -292,11 +292,11 @@ def init_distributed_mode(args):
args.dist_backend = 'nccl'
print('| distributed init (rank {}): {}'.format(
args.rank, args.dist_url), flush=True)
print('**********************')
print('backend:',args.dist_backend)
print('init_method:',args.dist_url)
print('world_size:',args.world_size)
print('rank:',args.rank)
#print('**********************')
#print('backend:',args.dist_backend)
#print('init_method:',args.dist_url)
#print('world_size:',args.world_size)
#print('rank:',args.rank)
torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
world_size=args.world_size, rank=args.rank)
torch.distributed.barrier()
......
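The arguments passed to `torch.distributed.init_process_group` above are supplied by the launcher script, which reads the world size from `OMPI_COMM_WORLD_SIZE` and builds the URL as `tcp://<master ip>:34567`. The sketch below shows that mapping in Python without importing torch. It assumes the rank and local rank come from Open MPI's `OMPI_COMM_WORLD_RANK` and `OMPI_COMM_WORLD_LOCAL_RANK` (those lines are not visible in the diff); `dist_args_from_ompi` and the IP `10.0.0.1` are hypothetical placeholders.

```python
import os


def dist_args_from_ompi(master_ip, env=None, port=34567):
    """Derive distributed-training arguments from Open MPI environment variables."""
    env = os.environ if env is None else env
    return {
        "rank": int(env["OMPI_COMM_WORLD_RANK"]),
        "world_size": int(env["OMPI_COMM_WORLD_SIZE"]),
        "local_rank": int(env["OMPI_COMM_WORLD_LOCAL_RANK"]),
        "dist_url": f"tcp://{master_ip}:{port}",
        "dist_backend": "nccl",
    }


# Simulated environment for the second process of an 8-process job.
fake_env = {
    "OMPI_COMM_WORLD_RANK": "1",
    "OMPI_COMM_WORLD_SIZE": "8",
    "OMPI_COMM_WORLD_LOCAL_RANK": "1",
}
print(dist_args_from_ompi("10.0.0.1", fake_env))
```

Passing the environment as a parameter keeps the function testable outside of `mpirun`; under a real launch, calling it with `env=None` reads the process environment instead.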
......@@ -15,7 +15,6 @@ data/samples/*
**/*.pt
**/*.pth
**/*.onnx
**/*.engine
**/*.mlmodel
**/*.torchscript
**/*.torchscript.pt
......@@ -24,7 +23,6 @@ data/samples/*
**/*.pb
*_saved_model/
*_web_model/
*_openvino_model/
# Below Copied From .gitignore -----------------------------------------------------------------------------------------
# Below Copied From .gitignore -----------------------------------------------------------------------------------------
......
---
name: "🐛 Bug report"
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''
---
Before submitting a bug report, please be aware that your issue **must be reproducible** with all of the following,
otherwise it is non-actionable, and we can not help you:
- **Current repo**: run `git fetch && git status -uno` to check and `git pull` to update repo
- **Common dataset**: coco.yaml or coco128.yaml
- **Common environment**: Colab, Google Cloud, or Docker image. See https://github.com/ultralytics/yolov5#environments
If this is a custom dataset/training question you **must include** your `train*.jpg`, `val*.jpg` and `results.png`
figures, or we can not help you. You can generate these with `utils.plot_results()`.
## 🐛 Bug
A clear and concise description of what the bug is.
## To Reproduce (REQUIRED)
Input:
```
import torch
a = torch.tensor([5])
c = a / 0
```
Output:
```
Traceback (most recent call last):
File "/Users/glennjocher/opt/anaconda3/envs/env1/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-be04c762b799>", line 5, in <module>
c = a / 0
RuntimeError: ZeroDivisionError
```
## Expected behavior
A clear and concise description of what you expected to happen.
## Environment
If applicable, add screenshots to help explain your problem.
- OS: [e.g. Ubuntu]
- GPU [e.g. 2080 Ti]
## Additional context
Add any other context about the problem here.
name: 🐛 Bug Report
# title: " "
description: Problems with YOLOv5
labels: [bug, triage]
body:
- type: markdown
attributes:
value: |
Thank you for submitting a YOLOv5 🐛 Bug Report!
- type: checkboxes
attributes:
label: Search before asking
description: >
Please search the [issues](https://github.com/ultralytics/yolov5/issues) to see if a similar bug report already exists.
options:
- label: >
I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and found no similar bug report.
required: true
- type: dropdown
attributes:
label: YOLOv5 Component
description: |
Please select the part of YOLOv5 where you found the bug.
multiple: true
options:
- "Training"
- "Validation"
- "Detection"
- "Export"
- "PyTorch Hub"
- "Multi-GPU"
- "Evolution"
- "Integrations"
- "Other"
validations:
required: false
- type: textarea
attributes:
label: Bug
description: Provide console output with error messages and/or screenshots of the bug.
placeholder: |
💡 ProTip! Include as much information as possible (screenshots, logs, tracebacks etc.) to receive the most helpful response.
validations:
required: true
- type: textarea
attributes:
label: Environment
description: Please specify the software and hardware you used to produce the bug.
placeholder: |
- YOLO: YOLOv5 🚀 v6.0-67-g60e42e1 torch 1.9.0+cu111 CUDA:0 (A100-SXM4-40GB, 40536MiB)
- OS: Ubuntu 20.04
- Python: 3.9.0
validations:
required: false
- type: textarea
attributes:
label: Minimal Reproducible Example
description: >
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to **reproduce** the problem.
This is referred to by community members as creating a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example).
placeholder: |
```
# Code to reproduce your issue here
```
validations:
required: false
- type: textarea
attributes:
label: Additional
description: Anything else you would like to share?
- type: checkboxes
attributes:
label: Are you willing to submit a PR?
description: >
(Optional) We encourage you to submit a [Pull Request](https://github.com/ultralytics/yolov5/pulls) (PR) to help improve YOLOv5 for everyone, especially if you have a good understanding of how to implement a fix or feature.
See the YOLOv5 [Contributing Guide](https://github.com/ultralytics/yolov5/blob/master/CONTRIBUTING.md) to get started.
options:
- label: Yes I'd like to help by submitting a PR!
blank_issues_enabled: true
contact_links:
- name: Slack
url: https://join.slack.com/t/ultralytics/shared_invite/zt-w29ei8bp-jczz7QYUmDtgo6r6KcMIAg
about: Ask on Ultralytics Slack Forum
- name: Stack Overflow
url: https://stackoverflow.com/search?q=YOLOv5
about: Ask on Stack Overflow with 'YOLOv5' tag
---
name: "🚀 Feature request"
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''
---
## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem?
e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
## Pitch
<!-- A clear and concise description of what you want to happen. -->
## Alternatives
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
## Additional context
<!-- Add any other context or screenshots about the feature request here. -->
name: 🚀 Feature Request
description: Suggest a YOLOv5 idea
# title: " "
labels: [enhancement]
body:
- type: markdown
attributes:
value: |
Thank you for submitting a YOLOv5 🚀 Feature Request!
- type: checkboxes
attributes:
label: Search before asking
description: >
Please search the [issues](https://github.com/ultralytics/yolov5/issues) to see if a similar feature request already exists.
options:
- label: >
I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and found no similar feature requests.
required: true
- type: textarea
attributes:
label: Description
description: A short description of your feature.
placeholder: |
What new feature would you like to see in YOLOv5?
validations:
required: true
- type: textarea
attributes:
label: Use case
description: |
Describe the use case of your feature request. It will help us understand and prioritize the feature request.
placeholder: |
How would this feature be used, and who would use it?
- type: textarea
attributes:
label: Additional
description: Anything else you would like to share?
- type: checkboxes
attributes:
label: Are you willing to submit a PR?
description: >
(Optional) We encourage you to submit a [Pull Request](https://github.com/ultralytics/yolov5/pulls) (PR) to help improve YOLOv5 for everyone, especially if you have a good understanding of how to implement a fix or feature.
See the YOLOv5 [Contributing Guide](https://github.com/ultralytics/yolov5/blob/master/CONTRIBUTING.md) to get started.
options:
- label: Yes I'd like to help by submitting a PR!
---
name: "❓Question"
about: Ask a general question
title: ''
labels: question
assignees: ''
---
## ❔Question
## Additional context
name: ❓ Question
description: Ask a YOLOv5 question
# title: " "
labels: [question]
body:
- type: markdown
attributes:
value: |
Thank you for asking a YOLOv5 ❓ Question!
- type: checkboxes
attributes:
label: Search before asking
description: >
Please search the [issues](https://github.com/ultralytics/yolov5/issues) and [discussions](https://github.com/ultralytics/yolov5/discussions) to see if a similar question already exists.
options:
- label: >
I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and [discussions](https://github.com/ultralytics/yolov5/discussions) and found no similar questions.
required: true
- type: textarea
attributes:
label: Question
description: What is your question?
placeholder: |
💡 ProTip! Include as much information as possible (screenshots, logs, tracebacks etc.) to receive the most helpful response.
validations:
required: true
- type: textarea
attributes:
label: Additional
description: Anything else you would like to share?
......@@ -10,14 +10,3 @@ updates:
- glenn-jocher
labels:
- dependencies
- package-ecosystem: github-actions
directory: "/"
schedule:
interval: weekly
time: "04:00"
open-pull-requests-limit: 5
reviewers:
- glenn-jocher
labels:
- dependencies
......@@ -19,8 +19,8 @@ jobs:
fail-fast: false
matrix:
os: [ ubuntu-latest, macos-latest, windows-latest ]
python-version: [ 3.9 ]
model: [ 'yolov5n' ] # models to test
python-version: [ 3.8 ]
model: [ 'yolov5s' ] # models to test
# Timeout: https://stackoverflow.com/a/59076067/4521646
timeout-minutes: 50
......@@ -39,27 +39,23 @@ jobs:
python -c "from pip._internal.locations import USER_CACHE_DIR; print('::set-output name=dir::' + USER_CACHE_DIR)"
- name: Cache pip
uses: actions/cache@v2.1.7
uses: actions/cache@v1
with:
path: ${{ steps.pip-cache.outputs.dir }}
key: ${{ runner.os }}-${{ matrix.python-version }}-pip-${{ hashFiles('requirements.txt') }}
restore-keys: |
${{ runner.os }}-${{ matrix.python-version }}-pip-
# Known Keras 2.7.0 issue: https://github.com/ultralytics/yolov5/pull/5486
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -qr requirements.txt -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install -q onnx tensorflow-cpu keras==2.6.0 # wandb # extras
pip install -q onnx tensorflow-cpu # for export
python --version
pip --version
pip list
shell: bash
# - name: W&B login
# run: wandb login 345011b3fb26dc8337fd9b20e53857c1d403f2aa
- name: Download data
run: |
# curl -L -o tmp.zip https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip
......@@ -87,7 +83,7 @@ jobs:
# Python
python - <<EOF
import torch
# Known issue, urllib.error.HTTPError: HTTP Error 403: rate limit exceeded, will be resolved in torch==1.10.0
# Known issue, urllib.error.HTTPError: HTTP Error 403: rate limit exceeded, will be resolved in torch==1.10.0
# model = torch.hub.load('ultralytics/yolov5', 'custom', path='runs/train/exp/weights/last.pt')
EOF
......
# This action runs GitHub's industry-leading static analysis engine, CodeQL, against a repository's source code to find security vulnerabilities.
# This action runs GitHub's industry-leading static analysis engine, CodeQL, against a repository's source code to find security vulnerabilities.
# https://github.com/github/codeql-action
name: "CodeQL"
......
......@@ -13,7 +13,7 @@ jobs:
repo-token: ${{ secrets.GITHUB_TOKEN }}
pr-message: |
👋 Hello @${{ github.actor }}, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is **up-to-date with upstream/master.** If your PR is behind upstream/master an automatic [GitHub actions](https://github.com/ultralytics/yolov5/blob/master/.github/workflows/rebase.yml) rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
- ✅ Verify your PR is **up-to-date with origin/master.** If your PR is behind origin/master an automatic [GitHub actions](https://github.com/ultralytics/yolov5/blob/master/.github/workflows/rebase.yml) rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
```bash
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
......@@ -37,9 +37,9 @@ jobs:
[**Python>=3.6.0**](https://www.python.org/) with all [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) installed including [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/). To get started:
```bash
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
```
## Environments
......@@ -57,3 +57,4 @@ jobs:
<a href="https://github.com/ultralytics/yolov5/actions"><img src="https://github.com/ultralytics/yolov5/workflows/CI%20CPU%20testing/badge.svg" alt="CI CPU testing"></a>
If this badge is green, all [YOLOv5 GitHub Actions](https://github.com/ultralytics/yolov5/actions) Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training ([train.py](https://github.com/ultralytics/yolov5/blob/master/train.py)), validation ([val.py](https://github.com/ultralytics/yolov5/blob/master/val.py)), inference ([detect.py](https://github.com/ultralytics/yolov5/blob/master/detect.py)) and export ([export.py](https://github.com/ultralytics/yolov5/blob/master/export.py)) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
name: Automatic Rebase
# https://github.com/marketplace/actions/automatic-rebase
name: Automatic Rebase
on:
issue_comment:
types: [created]
jobs:
rebase:
name: Rebase
......@@ -13,9 +14,8 @@ jobs:
- name: Checkout the latest code
uses: actions/checkout@v2
with:
token: ${{ secrets.ACTIONS_TOKEN }}
fetch-depth: 0 # otherwise, you will fail to push refs to dest repo
fetch-depth: 0
- name: Automatic Rebase
uses: cirrus-actions/rebase@1.5
uses: cirrus-actions/rebase@1.3.1
env:
GITHUB_TOKEN: ${{ secrets.ACTIONS_TOKEN }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
......@@ -9,7 +9,7 @@ jobs:
stale:
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v4
- uses: actions/stale@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
stale-issue-message: |
......@@ -21,7 +21,7 @@ jobs:
- **Docs** – https://docs.ultralytics.com
Access additional [Ultralytics](https://ultralytics.com) ⚡ resources:
- **Ultralytics HUB** – https://ultralytics.com/hub
- **Ultralytics HUB** – https://ultralytics.com
- **Vision API** – https://ultralytics.com/yolov5
- **About Us** – https://ultralytics.com/about
- **Join Our Team** – https://ultralytics.com/work
......@@ -31,7 +31,7 @@ jobs:
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
stale-pr-message: 'This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions YOLOv5 🚀 and Vision AI ⭐.'
stale-pr-message: 'This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions YOLOv5 🚀 and Vision AI ⭐.'
days-before-stale: 30
days-before-close: 5
exempt-issue-labels: 'documentation,tutorial'
......
......@@ -20,17 +20,12 @@
*.data
*.json
*.cfg
!setup.cfg
!cfg/yolov3*.cfg
storage.googleapis.com
runs/*
data/*
data/images/*
!data/*.yaml
!data/hyps
!data/scripts
!data/images
!data/hyps/*
!data/images/zidane.jpg
!data/images/bus.jpg
!data/*.sh
......@@ -52,14 +47,12 @@ VOC/
*.pt
*.pb
*.onnx
*.engine
*.mlmodel
*.torchscript
*.tflite
*.h5
*_saved_model/
*_web_model/
*_openvino_model/
darknet53.conv.74
yolov3-tiny.conv.15
......
# Define hooks for code formations
# Will be applied on any updated commit files if a user has installed and linked commit hook
default_language_version:
python: python3.8
# Define bot property if installed via https://github.com/marketplace/pre-commit-ci
ci:
autofix_prs: true
autoupdate_commit_msg: '[pre-commit.ci] pre-commit suggestions'
autoupdate_schedule: quarterly
# submodules: true
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.1.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- id: check-case-conflict
- id: check-yaml
- id: check-toml
- id: pretty-format-json
- id: check-docstring-first
- repo: https://github.com/asottile/pyupgrade
rev: v2.31.0
hooks:
- id: pyupgrade
args: [--py36-plus]
name: Upgrade code
- repo: https://github.com/PyCQA/isort
rev: 5.10.1
hooks:
- id: isort
name: Sort imports
# TODO
#- repo: https://github.com/pre-commit/mirrors-yapf
# rev: v0.31.0
# hooks:
# - id: yapf
# name: formatting
# TODO
#- repo: https://github.com/executablebooks/mdformat
# rev: 0.7.7
# hooks:
# - id: mdformat
# additional_dependencies:
# - mdformat-gfm
# - mdformat-black
# - mdformat_frontmatter
# TODO
#- repo: https://github.com/asottile/yesqa
# rev: v1.2.3
# hooks:
# - id: yesqa
- repo: https://github.com/PyCQA/flake8
rev: 4.0.1
hooks:
- id: flake8
name: PEP8