diff --git a/README.md b/README.md index 7a7ca50fe2f3d3f8c6b001542d81538b7c58c330..f93ae9e2baedda76582693ef3826ef351cbd087a 100644 --- a/README.md +++ b/README.md @@ -95,8 +95,9 @@ DCU深度学习样例 | 类别 | 版本 | DCU | 精度 | 多DCU | 支持网络 | 代码位置| | :----------: | :----------: | :----------: | :----------: | :----------: | :----------: | :----------: | -| mmclassification | v0.24.0 | Yes | FP32/FP16 | Yes | ResNet18/ResNet34/ResNet50/ResNet152/Vgg11/SeresNet50/ResNext50/MobileNet-v2/ShuffleNet-v1/ShuffleNet-v2 | [mmclassfication](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmclassification-speed-benchmark) | -| mmdetection | v2.25.2 | Yes | FP32/FP16 | Yes | Faster-Rcnn/Mask-Rcnn/Double-Heads/Cascade-Mask-Rcnn/ResNest/Dcn/RetinaNet/VfNet/Ssd/Yolov3 | [mmdetection](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmdetection-speed_xinpian) | -| mmpose | v0.28.1 | Yes | FP32/FP16 | Yes | ResNet50-Top-Down/ResNet50-Bottom-Up/HrNet-Top-Down | [mmpose](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmpose-speed_test) | -| mmsegmentation | v0.29.1 | Yes | FP32/FP16 | Yes | PspNet-R50/DeepLab-V3-R50/Fcn-R50/UperNet-R50/DeepLab-V3plus-R50 | [mmsegmentation](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmsegmentation) | +| mmclassification | v0.24.0 | Yes | FP32/FP16 | Yes | ResNet18/ResNet34/ResNet50/ResNet152/Vgg11/SeresNet50/ResNext50/MobileNet-v2/ShuffleNet-v1/ShuffleNet-v2 | [mmclassfication](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmclassification-0.24.1) | +| mmdetection | v2.25.2 | Yes | FP32/FP16 | Yes | Faster-Rcnn/Mask-Rcnn/Double-Heads/Cascade-Mask-Rcnn/ResNest/Dcn/RetinaNet/VfNet/Ssd/Yolov3 | [mmdetection](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmdetection-2.25.2) | +| mmpose | v0.28.1 | Yes | FP32/FP16 | Yes | 
ResNet50-Top-Down/ResNet50-Bottom-Up/HrNet-Top-Down | [mmpose](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmpose-0.28.1) | +| mmsegmentation | v0.29.1 | Yes | FP32/FP16 | Yes | PspNet-R50/DeepLab-V3-R50/Fcn-R50/UperNet-R50/DeepLab-V3plus-R50 | [mmsegmentation](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmsegmentation-0.29.1) | +| mmaction2 | v0.24.1 | Yes | FP32/FP16 | Yes | ST-GCN/C3D/R(2+1)D | [mmaction2](http://10.0.50.24/dcutoolkit/deeplearing/dlexamples_new/-/tree/main/openmmlab_test/mmaction2-0.24.1) | diff --git a/openmmlab_test/mmaction2-0.24.1/.github/CODE_OF_CONDUCT.md b/openmmlab_test/mmaction2-0.24.1/.github/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000000000000000000000000000000000..92afad1c5ab5d5781115dee45c131d3751d3cd31 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/CODE_OF_CONDUCT.md @@ -0,0 +1,76 @@ +# Contributor Covenant Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to making participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. 
+ +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +- Using welcoming and inclusive language +- Being respectful of differing viewpoints and experiences +- Gracefully accepting constructive criticism +- Focusing on what is best for the community +- Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +- The use of sexualized language or imagery and unwelcome sexual attention or + advances +- Trolling, insulting/derogatory comments, and personal or political attacks +- Public or private harassment +- Publishing others' private information, such as a physical or electronic + address, without explicit permission +- Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies both within project spaces and in public spaces +when an individual is representing the project or its community. Examples of +representing a project or community include using an official project e-mail +address, posting via an official social media account, or acting as an appointed +representative at an online or offline event. Representation of a project may be +further defined and clarified by project maintainers. 
+ +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at chenkaidev@gmail.com. All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq + +[homepage]: https://www.contributor-covenant.org diff --git a/openmmlab_test/mmaction2-0.24.1/.github/CONTRIBUTING.md b/openmmlab_test/mmaction2-0.24.1/.github/CONTRIBUTING.md new file mode 100644 index 0000000000000000000000000000000000000000..fb894baf62c8da07f2b4ddd79e8d30790b0cc5aa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/CONTRIBUTING.md @@ -0,0 +1 @@ +We appreciate all contributions to improve MMAction2. Please refer to [CONTRIBUTING.md](https://github.com/open-mmlab/mmcv/blob/master/CONTRIBUTING.md) in MMCV for more details about the contributing guideline. 
diff --git a/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/config.yml b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 0000000000000000000000000000000000000000..a772220430aad6537846164a6efb88a894fa552c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,9 @@ +blank_issues_enabled: false + +contact_links: + - name: Common Issues + url: https://mmaction2.readthedocs.io/en/latest/faq.html + about: Check if your issue already has solutions + - name: MMAction2 Documentation + url: https://mmaction2.readthedocs.io/en/latest/ + about: Check if your question is answered in docs diff --git a/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/error-report.md b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/error-report.md new file mode 100644 index 0000000000000000000000000000000000000000..cab4b1b580da0883d8272dc90a219d1cdcd63646 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/error-report.md @@ -0,0 +1,49 @@ +--- +name: Error report +about: Create a report to help us improve +title: '' +labels: '' +assignees: '' +--- + +Thanks for your error report and we appreciate it a lot. +If you feel we have helped you, give us a STAR! :satisfied: + +**Checklist** + +1. I have searched related issues but cannot get the expected help. +2. The bug has not been fixed in the latest version. + +**Describe the bug** + +A clear and concise description of what the bug is. + +**Reproduction** + +1. What command or script did you run? + +``` +A placeholder for the command. +``` + +2. Did you make any modifications to the code or config? Did you understand what you have modified? +3. What dataset did you use? + +**Environment** + +1. Please run `PYTHONPATH=${PWD}:$PYTHONPATH python mmaction/utils/collect_env.py` to collect necessary environment information and paste it here. +2. 
You may add additional information that may be helpful for locating the problem, such as + - How you installed PyTorch \[e.g., pip, conda, source\] + - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.) + +**Error traceback** + +If applicable, paste the error traceback here. + +``` +A placeholder for traceback. +``` + +**Bug fix** + +If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated! diff --git a/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/feature_request.md b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 0000000000000000000000000000000000000000..9b5bc408646ed710e12ca208c7546520511698b2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,27 @@ +--- +name: Feature request +about: Suggest an idea for this project +title: '' +labels: '' +assignees: '' +--- + +Thanks for your feature request and we will review and plan for it when necessary. +If you feel we have helped you, give us a STAR! :satisfied: + +**Describe the feature** + +**Motivation** + +A clear and concise description of the motivation of the feature. +Ex1. It is inconvenient when \[....\]. +Ex2. There is a recent paper \[....\], which is very helpful for \[....\]. + +**Related resources** + +If official code or third-party implementations have been released, please also provide the information here, which would be very helpful. + +**Additional context** + +Add any other context or screenshots about the feature request here. +If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated. 
diff --git a/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/general_questions.md b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/general_questions.md new file mode 100644 index 0000000000000000000000000000000000000000..5aa583cb1cf2ec029f587e136d09831cb6211413 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/general_questions.md @@ -0,0 +1,14 @@ +--- +name: General questions +about: Ask general questions to get help +title: '' +labels: '' +assignees: '' +--- + +Before raising a question, you may need to check the following listed items. + +**Checklist** + +1. I have searched related issues but cannot get the expected help. +2. I have read the [FAQ documentation](https://mmaction2.readthedocs.io/en/latest/faq.html) but cannot get the expected help. diff --git a/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/reimplementation_questions.md b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/reimplementation_questions.md new file mode 100644 index 0000000000000000000000000000000000000000..babbaeb8b74623750d06e22e81000b7c4d37a930 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/ISSUE_TEMPLATE/reimplementation_questions.md @@ -0,0 +1,69 @@ +--- +name: Reimplementation Questions +about: Ask about questions during model reimplementation +title: '' +labels: reimplementation +assignees: '' +--- + +If you feel we have helped you, give us a STAR! :satisfied: + +**Notice** + +There are several common situations in reimplementation issues, as listed below + +1. Reimplement a model in the model zoo using the provided configs +2. Reimplement a model in the model zoo on other dataset (e.g., custom datasets) +3. Reimplement a custom model but all the components are implemented in MMAction2 +4. Reimplement a custom model with new modules implemented by yourself + +What to do depends on the case, as described below. 
+ +- For cases 1 & 3, please follow the steps in the following sections so that we can help identify the issue quickly. +- For cases 2 & 4, please understand that we are not able to help much here because we usually do not know the full code, and users are responsible for the code they write. +- One suggestion for cases 2 & 4 is that users should first check whether the bug lies in the self-implemented code or the original code. For example, users can first make sure that the same model runs well on supported datasets. If you still need help, please describe what you have done and what you obtained in the issue, follow the steps in the following sections, and be as clear as possible so that we can better help you. + +**Checklist** + +1. I have searched related issues but cannot get the expected help. +2. The issue has not been fixed in the latest version. + +**Describe the issue** + +A clear and concise description of the problem you met and what you have done. + +**Reproduction** + +1. What command or script did you run? + +``` +A placeholder for the command. +``` + +2. What config did you run? + +``` +A placeholder for the config. +``` + +3. Did you make any modifications to the code or config? Did you understand what you have modified? +4. What dataset did you use? + +**Environment** + +1. Please run `PYTHONPATH=${PWD}:$PYTHONPATH python mmaction/utils/collect_env.py` to collect necessary environment information and paste it here. +2. You may add additional information that may be helpful for locating the problem, such as + 1. How you installed PyTorch \[e.g., pip, conda, source\] + 2. Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.) + +**Results** + +If applicable, paste the related results here, e.g., what you expect and what you get. + +``` +A placeholder for results comparison +``` + +**Issue fix** + +If you have already identified the reason, you can provide the information here. 
If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated! diff --git a/openmmlab_test/mmaction2-0.24.1/.github/pull_request_template.md b/openmmlab_test/mmaction2-0.24.1/.github/pull_request_template.md new file mode 100644 index 0000000000000000000000000000000000000000..63052769a764157c015fd14a33db8663617c80a7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/pull_request_template.md @@ -0,0 +1,26 @@ +Thanks for your contribution and we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. +If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. + +## Motivation + +Please describe the motivation of this PR and the goal you want to achieve through this PR. + +## Modification + +Please briefly describe what modification is made in this PR. + +## BC-breaking (Optional) + +Does the modification introduce changes that break the back-compatibility of this repo? +If so, please describe how it breaks the compatibility and how users should modify their code to keep compatibility with this PR. + +## Use cases (Optional) + +If this PR introduces a new feature, it is better to list some use cases here, and update the documentation. + +## Checklist + +1. Pre-commit or other linting tools should be used to fix the potential lint issues. +2. The modification should be covered by complete unit tests. If not, please add more unit tests to ensure the correctness. +3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls. +4. The documentation should be modified accordingly, like docstring or example tutorials. 
diff --git a/openmmlab_test/mmaction2-0.24.1/.github/workflows/build.yml b/openmmlab_test/mmaction2-0.24.1/.github/workflows/build.yml new file mode 100644 index 0000000000000000000000000000000000000000..30d72c90f68b04afd00aa63d8c3dcea823911786 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/workflows/build.yml @@ -0,0 +1,248 @@ +name: build + +on: + push: + paths-ignore: + - ".github/**.md" + - "demo/**" + - "docker/**" + - "tools/**" + - "README.md" + - "README_zh-CN.md" + + pull_request: + paths-ignore: + - ".github/**.md" + - "demo/**" + - "docker/**" + - "docs/**" + - "docs_zh-CN/**" + - "tools/**" + - "README.md" + - "README_zh-CN.md" + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + build_cpu: + runs-on: ubuntu-18.04 + strategy: + matrix: + python-version: [3.7] + torch: [1.5.0, 1.7.0, 1.9.0] + include: + - torch: 1.5.0 + torchvision: 0.6.0 + - torch: 1.7.0 + torchvision: 0.8.1 + - torch: 1.9.0 + torchvision: 0.10.0 + python-version: 3.7 + - torch: 1.9.0 + torchvision: 0.10.0 + python-version: 3.8 + - torch: 1.9.0 + torchvision: 0.10.0 + python-version: 3.9 + steps: + - uses: actions/checkout@v2 + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python-version }} + - name: Upgrade pip + run: pip install pip --upgrade + - name: Install soundfile lib + run: sudo apt-get install -y libsndfile1 + - name: Install onnx + run: pip install onnx + - name: Install librosa and soundfile + run: pip install librosa soundfile + - name: Install lmdb + run: pip install lmdb + - name: Install TurboJpeg lib + run: sudo apt-get install -y libturbojpeg + - name: Install PyTorch + run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html + - name: Install MMCV + run: pip install mmcv-full -f 
https://download.openmmlab.com/mmcv/dist/cpu/torch${{matrix.torch}}/index.html + - name: Install MMDet + run: pip install git+https://github.com/open-mmlab/mmdetection/ + - name: Install MMCls + run: pip install git+https://github.com/open-mmlab/mmclassification/ + - name: Install unittest dependencies + run: pip install -r requirements/tests.txt -r requirements/optional.txt + - name: Install PytorchVideo + run: pip install pytorchvideo + if: ${{matrix.torchvision == '0.10.0'}} + - name: Build and install + run: rm -rf .eggs && pip install -e . + - name: Run unittests and generate coverage report + run: | + coverage run --branch --source mmaction -m pytest tests/ + coverage xml + coverage report -m + build_cu101: + runs-on: ubuntu-18.04 + container: + image: pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel + + strategy: + matrix: + python-version: [3.7] + torch: [1.5.0+cu101, 1.6.0+cu101, 1.7.0+cu101] + include: + - torch: 1.5.0+cu101 + torch_version: torch1.5 + torchvision: 0.6.0+cu101 + - torch: 1.6.0+cu101 + torch_version: torch1.6 + torchvision: 0.7.0+cu101 + - torch: 1.7.0+cu101 + torch_version: torch1.7 + torchvision: 0.8.1+cu101 + steps: + - uses: actions/checkout@v2 + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python-version }} + - name: Upgrade pip + run: pip install pip --upgrade + - name: Fetch GPG keys + run: | + apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub + apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub + - name: Install CUDA + run: | + apt-get update && apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libturbojpeg libsndfile1 libsm6 libxrender-dev libxext6 python${{matrix.python-version}}-dev + apt-get clean + rm -rf /var/lib/apt/lists/* + - name: Install librosa and soundfile + run: python -m pip install librosa 
soundfile + - name: Install lmdb + run: python -m pip install lmdb + - name: Install PyTorch + run: python -m pip install torch==${{matrix.torch}} torchvision==${{matrix.torchvision}} -f https://download.pytorch.org/whl/torch_stable.html + - name: Install mmaction dependencies + run: | + python -V + python -m pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu101/${{matrix.torch_version}}/index.html + python -m pip install -q git+https://github.com/open-mmlab/mmdetection/ + python -m pip install -q git+https://github.com/open-mmlab/mmclassification/ + python -m pip install -r requirements.txt + python -c 'import mmcv; print(mmcv.__version__)' + - name: Build and install + run: rm -rf .eggs && pip install -e . + - name: Run unittests and generate coverage report + run: | + coverage run --branch --source mmaction -m pytest tests/ + coverage xml + coverage report -m + # Only upload coverage report for python3.7 && pytorch1.5 + - name: Upload coverage to Codecov + if: ${{matrix.torch == '1.5.0+cu101' && matrix.python-version == '3.7'}} + uses: codecov/codecov-action@v1.0.14 + with: + file: ./coverage.xml + flags: unittests + env_vars: OS,PYTHON + name: codecov-umbrella + fail_ci_if_error: false + + build_cu102: + runs-on: ubuntu-18.04 + container: + image: pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel + + strategy: + matrix: + python-version: [3.7] + torch: [1.9.0+cu102] + include: + - torch: 1.9.0+cu102 + torch_version: torch1.9 + torchvision: 0.10.0+cu102 + steps: + - uses: actions/checkout@v2 + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python-version }} + - name: Upgrade pip + run: pip install pip --upgrade + - name: Fetch GPG keys + run: | + apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub + apt-key adv --fetch-keys 
https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub + - name: Install CUDA + run: | + apt-get update && apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libturbojpeg libsndfile1 libsm6 libxrender-dev libxext6 python${{matrix.python-version}}-dev + apt-get clean + rm -rf /var/lib/apt/lists/* + - name: Install librosa and soundfile + run: python -m pip install librosa soundfile + - name: Install lmdb + run: python -m pip install lmdb + - name: Install PyTorch + run: python -m pip install torch==${{matrix.torch}} torchvision==${{matrix.torchvision}} -f https://download.pytorch.org/whl/torch_stable.html + - name: Install mmaction dependencies + run: | + python -V + python -m pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/${{matrix.torch_version}}/index.html + python -m pip install -q git+https://github.com/open-mmlab/mmdetection/ + python -m pip install -q git+https://github.com/open-mmlab/mmclassification/ + python -m pip install -r requirements.txt + python -c 'import mmcv; print(mmcv.__version__)' + - name: Install PytorchVideo + run: python -m pip install pytorchvideo + if: ${{matrix.torchvision == '0.10.0+cu102'}} + - name: Build and install + run: rm -rf .eggs && pip install -e . 
+ - name: Run unittests and generate coverage report + run: | + coverage run --branch --source mmaction -m pytest tests/ + coverage xml + coverage report -m + + test_windows: + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [windows-2022] + python: [3.8] + platform: [cpu] + steps: + - uses: actions/checkout@v2 + - name: Set up Python ${{ matrix.python }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python }} + - name: Upgrade pip + run: python -m pip install pip --upgrade --user + - name: Install librosa and soundfile + run: python -m pip install librosa soundfile + - name: Install lmdb + run: python -m pip install lmdb + - name: Install PyTorch + # As a complement to Linux CI, we test on PyTorch LTS version + run: pip install torch==1.8.2+${{ matrix.platform }} torchvision==0.9.2+${{ matrix.platform }} -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html + - name: Install MMCV + run: pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cpu/torch1.8/index.html --only-binary mmcv-full + - name: Install mmaction dependencies + run: | + python -V + python -m pip install -q git+https://github.com/open-mmlab/mmdetection/ + python -m pip install -q git+https://github.com/open-mmlab/mmclassification/ + python -m pip install -r requirements.txt + python -c 'import mmcv; print(mmcv.__version__)' + - name: Install PytorchVideo + run: python -m pip install pytorchvideo + - name: Show pip list + run: pip list + - name: Build and install + run: pip install -e . 
+ - name: Run unittests + run: coverage run --branch --source mmaction -m pytest tests -sv diff --git a/openmmlab_test/mmaction2-0.24.1/.github/workflows/deploy.yml b/openmmlab_test/mmaction2-0.24.1/.github/workflows/deploy.yml new file mode 100644 index 0000000000000000000000000000000000000000..a136e0cc3e7084956290881a170ce652671104b4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/workflows/deploy.yml @@ -0,0 +1,26 @@ +name: deploy + +on: push + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + build-n-publish: + runs-on: ubuntu-latest + if: startsWith(github.event.ref, 'refs/tags') + steps: + - uses: actions/checkout@v2 + - name: Set up Python 3.7 + uses: actions/setup-python@v2 + with: + python-version: 3.7 + - name: Build MMAction2 + run: | + pip install wheel + python setup.py sdist bdist_wheel + - name: Publish distribution to PyPI + run: | + pip install twine + twine upload dist/* -u __token__ -p ${{ secrets.pypi_password }} diff --git a/openmmlab_test/mmaction2-0.24.1/.github/workflows/lint.yml b/openmmlab_test/mmaction2-0.24.1/.github/workflows/lint.yml new file mode 100644 index 0000000000000000000000000000000000000000..68b58a2b2139493804c4cb5b03cc993c48d4dde9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/workflows/lint.yml @@ -0,0 +1,27 @@ +name: lint + +on: [push, pull_request] + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - name: Set up Python 3.7 + uses: actions/setup-python@v2 + with: + python-version: 3.7 + - name: Install pre-commit hook + run: | + pip install pre-commit + pre-commit install + - name: Linting + run: pre-commit run --all-files + - name: Check docstring coverage + run: | + pip install interrogate + interrogate -v --ignore-init-method --ignore-module --ignore-nested-functions --ignore-regex "__repr__" --fail-under 80 mmaction diff 
--git a/openmmlab_test/mmaction2-0.24.1/.github/workflows/test_mim.yml b/openmmlab_test/mmaction2-0.24.1/.github/workflows/test_mim.yml new file mode 100644 index 0000000000000000000000000000000000000000..88594d0e77768898b7186d879cf02e1d20af1961 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.github/workflows/test_mim.yml @@ -0,0 +1,47 @@ +name: test-mim + +on: + push: + paths: + - 'model-index.yml' + - 'configs/**' + + pull_request: + paths: + - 'model-index.yml' + - 'configs/**' + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + build_cpu: + runs-on: ubuntu-18.04 + strategy: + matrix: + python-version: [3.7] + torch: [1.8.0] + include: + - torch: 1.8.0 + torch_version: torch1.8 + torchvision: 0.9.0 + steps: + - uses: actions/checkout@v2 + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python-version }} + - name: Upgrade pip + run: pip install pip --upgrade + - name: Install Pillow + run: pip install Pillow==6.2.2 + if: ${{matrix.torchvision == '0.4.2'}} + - name: Install PyTorch + run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html + - name: Install openmim + run: pip install openmim + - name: Build and install + run: rm -rf .eggs && mim install -e . 
+ - name: test commands of mim + run: mim search mmaction2 diff --git a/openmmlab_test/mmaction2-0.24.1/.gitignore b/openmmlab_test/mmaction2-0.24.1/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..587b296482453bcdca7be8b0efd8c99dc8e6ab1a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.gitignore @@ -0,0 +1,140 @@ +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class +**/*.pyc + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +.hypothesis/ +.pytest_cache/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# pyenv +.python-version + +# celery beat schedule file +celerybeat-schedule + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ + +# custom +/data +.vscode +.idea +*.pkl +*.pkl.json +*.log.json +benchlist.txt +work_dirs/ + +# Pytorch +*.pth + +# Profile +*.prof + +# lmdb +*.mdb + +# unignore some data file in tests/data +!tests/data/**/*.pkl +!tests/data/**/*.pkl.json +!tests/data/**/*.log.json +!tests/data/**/*.pth + +# avoid soft links created by MIM 
+mmaction/configs/* +mmaction/tools/* + +*.ipynb + +# unignore ipython notebook files in demo +!demo/*.ipynb +mmaction/.mim diff --git a/openmmlab_test/mmaction2-0.24.1/.pre-commit-config.yaml b/openmmlab_test/mmaction2-0.24.1/.pre-commit-config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5b8740ebfcbcd58025bf85dff1eeacf3eaf38043 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.pre-commit-config.yaml @@ -0,0 +1,52 @@ +exclude: ^tests/data/ +repos: + - repo: https://github.com/PyCQA/flake8 + rev: 3.8.3 + hooks: + - id: flake8 + - repo: https://github.com/PyCQA/isort + rev: 5.10.1 + hooks: + - id: isort + - repo: https://github.com/pre-commit/mirrors-yapf + rev: v0.30.0 + hooks: + - id: yapf + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v3.1.0 + hooks: + - id: trailing-whitespace + - id: check-yaml + - id: end-of-file-fixer + - id: requirements-txt-fixer + - id: double-quote-string-fixer + - id: check-merge-conflict + - id: fix-encoding-pragma + args: ["--remove"] + - id: mixed-line-ending + args: ["--fix=lf"] + - repo: https://github.com/executablebooks/mdformat + rev: 0.7.9 + hooks: + - id: mdformat + args: ["--number"] + additional_dependencies: + - mdformat-openmmlab + - mdformat_frontmatter + - linkify-it-py + - repo: https://github.com/myint/docformatter + rev: v1.3.1 + hooks: + - id: docformatter + args: ["--in-place", "--wrap-descriptions", "79"] + - repo: https://github.com/codespell-project/codespell + rev: v2.1.0 + hooks: + - id: codespell + args: ["--skip", "*.ipynb,tools/data/hvu/label_map.json,docs_zh_CN/*", "-L", "te,nd,thre,Gool,gool"] + - repo: https://github.com/open-mmlab/pre-commit-hooks + rev: v0.2.0 # Use the ref you want to point at + hooks: + - id: check-algo-readme + - id: check-copyright + args: ["mmaction", "tests", "demo", "tools"] # these directories will be checked diff --git a/openmmlab_test/mmaction2-0.24.1/.pylintrc b/openmmlab_test/mmaction2-0.24.1/.pylintrc new file mode 100644 
index 0000000000000000000000000000000000000000..b1add44f16a5107d5e4d1f8ea713cdc4253ab49d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.pylintrc @@ -0,0 +1,624 @@ +[MASTER] + +# A comma-separated list of package or module names from where C extensions may +# be loaded. Extensions are loading into the active Python interpreter and may +# run arbitrary code. +extension-pkg-whitelist= + +# Specify a score threshold to be exceeded before program exits with error. +fail-under=10 + +# Add files or directories to the blacklist. They should be base names, not +# paths. +ignore=CVS,configs + +# Add files or directories matching the regex patterns to the blacklist. The +# regex matches against base names, not paths. +ignore-patterns= + +# Python code to execute, usually for sys.path manipulation such as +# pygtk.require(). +#init-hook= + +# Use multiple processes to speed up Pylint. Specifying 0 will auto-detect the +# number of processors available to use. +jobs=1 + +# Control the amount of potential inferred values when inferring a single +# object. This can help the performance when dealing with large functions or +# complex, nested conditions. +limit-inference-results=100 + +# List of plugins (as comma separated values of python module names) to load, +# usually to register additional checkers. +load-plugins= + +# Pickle collected data for later comparisons. +persistent=yes + +# When enabled, pylint would attempt to guess common misconfiguration and emit +# user-friendly hints instead of false-positive error messages. +suggestion-mode=yes + +# Allow loading of arbitrary C extensions. Extensions are imported into the +# active Python interpreter and may run arbitrary code. +unsafe-load-any-extension=no + + +[MESSAGES CONTROL] + +# Only show warnings with the listed confidence levels. Leave empty to show +# all. Valid levels: HIGH, INFERENCE, INFERENCE_FAILURE, UNDEFINED. +confidence= + +# Disable the message, report, category or checker with the given id(s). 
You +# can either give multiple identifiers separated by comma (,) or put this +# option multiple times (only on the command line, not in the configuration +# file where it should appear only once). You can also use "--disable=all" to +# disable everything first and then reenable specific checks. For example, if +# you want to run only the similarities checker, you can use "--disable=all +# --enable=similarities". If you want to run only the classes checker, but have +# no Warning level messages displayed, use "--disable=all --enable=classes +# --disable=W". +disable=import-outside-toplevel, + redefined-outer-name, + print-statement, + parameter-unpacking, + unpacking-in-except, + old-raise-syntax, + backtick, + long-suffix, + old-ne-operator, + old-octal-literal, + import-star-module-level, + non-ascii-bytes-literal, + raw-checker-failed, + bad-inline-option, + locally-disabled, + file-ignored, + suppressed-message, + useless-suppression, + deprecated-pragma, + use-symbolic-message-instead, + apply-builtin, + basestring-builtin, + buffer-builtin, + cmp-builtin, + coerce-builtin, + execfile-builtin, + file-builtin, + long-builtin, + raw_input-builtin, + reduce-builtin, + standarderror-builtin, + unicode-builtin, + xrange-builtin, + coerce-method, + delslice-method, + getslice-method, + setslice-method, + no-absolute-import, + old-division, + dict-iter-method, + dict-view-method, + next-method-called, + metaclass-assignment, + indexing-exception, + raising-string, + reload-builtin, + oct-method, + hex-method, + nonzero-method, + cmp-method, + input-builtin, + round-builtin, + intern-builtin, + unichr-builtin, + map-builtin-not-iterating, + zip-builtin-not-iterating, + range-builtin-not-iterating, + filter-builtin-not-iterating, + using-cmp-argument, + eq-without-hash, + div-method, + idiv-method, + rdiv-method, + exception-message-attribute, + invalid-str-codec, + sys-max-int, + bad-python3-import, + deprecated-string-function, + deprecated-str-translate-call, +
deprecated-itertools-function, + deprecated-types-field, + next-method-defined, + dict-items-not-iterating, + dict-keys-not-iterating, + dict-values-not-iterating, + deprecated-operator-function, + deprecated-urllib-function, + xreadlines-attribute, + deprecated-sys-function, + exception-escape, + comprehension-escape, + no-member, + invalid-name, + too-many-branches, + wrong-import-order, + too-many-arguments, + missing-function-docstring, + missing-module-docstring, + too-many-locals, + too-few-public-methods, + abstract-method, + broad-except, + too-many-nested-blocks, + too-many-instance-attributes, + missing-class-docstring, + duplicate-code, + not-callable, + protected-access, + dangerous-default-value, + no-name-in-module, + logging-fstring-interpolation, + super-init-not-called, + redefined-builtin, + attribute-defined-outside-init, + arguments-differ, + cyclic-import, + bad-super-call, + too-many-statements, + line-too-long + +# Enable the message, report, category or checker with the given id(s). You can +# either give multiple identifier separated by comma (,) or put this option +# multiple time (only on the command line, not in the configuration file where +# it should appear only once). See also the "--disable" option for examples. +enable=c-extension-no-member + + +[REPORTS] + +# Python expression which should return a score less than or equal to 10. You +# have access to the variables 'error', 'warning', 'refactor', and 'convention' +# which contain the number of messages in each category, as well as 'statement' +# which is the total number of statements analyzed. This score is used by the +# global evaluation report (RP0004). +evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10) + +# Template used to display messages. This is a python new-style format string +# used to format the message information. See doc for all details. +#msg-template= + +# Set the output format. 
Available formats are text, parseable, colorized, json +# and msvs (visual studio). You can also give a reporter class, e.g. +# mypackage.mymodule.MyReporterClass. +output-format=text + +# Tells whether to display a full report or only the messages. +reports=no + +# Activate the evaluation score. +score=yes + + +[REFACTORING] + +# Maximum number of nested blocks for function / method body +max-nested-blocks=5 + +# Complete name of functions that never returns. When checking for +# inconsistent-return-statements if a never returning function is called then +# it will be considered as an explicit return statement and no message will be +# printed. +never-returning-functions=sys.exit + + +[TYPECHECK] + +# List of decorators that produce context managers, such as +# contextlib.contextmanager. Add to this list to register other decorators that +# produce valid context managers. +contextmanager-decorators=contextlib.contextmanager + +# List of members which are set dynamically and missed by pylint inference +# system, and so shouldn't trigger E1101 when accessed. Python regular +# expressions are accepted. +generated-members= + +# Tells whether missing members accessed in mixin class should be ignored. A +# mixin class is detected if its name ends with "mixin" (case insensitive). +ignore-mixin-members=yes + +# Tells whether to warn about missing members when the owner of the attribute +# is inferred to be None. +ignore-none=yes + +# This flag controls whether pylint should warn about no-member and similar +# checks whenever an opaque object is returned when inferring. The inference +# can return multiple potential results while evaluating a Python object, but +# some branches might not be evaluated, which results in partial inference. In +# that case, it might be useful to still emit no-member and other checks for +# the rest of the inferred objects. 
+ignore-on-opaque-inference=yes + +# List of class names for which member attributes should not be checked (useful +# for classes with dynamically set attributes). This supports the use of +# qualified names. +ignored-classes=optparse.Values,thread._local,_thread._local + +# List of module names for which member attributes should not be checked +# (useful for modules/projects where namespaces are manipulated during runtime +# and thus existing member attributes cannot be deduced by static analysis). It +# supports qualified module names, as well as Unix pattern matching. +ignored-modules= + +# Show a hint with possible names when a member name was not found. The aspect +# of finding the hint is based on edit distance. +missing-member-hint=yes + +# The minimum edit distance a name should have in order to be considered a +# similar match for a missing member name. +missing-member-hint-distance=1 + +# The total number of similar names that should be taken in consideration when +# showing a hint for a missing member. +missing-member-max-choices=1 + +# List of decorators that change the signature of a decorated function. +signature-mutators= + + +[SPELLING] + +# Limits count of emitted suggestions for spelling mistakes. +max-spelling-suggestions=4 + +# Spelling dictionary name. Available dictionaries: none. To make it work, +# install the python-enchant package. +spelling-dict= + +# List of comma separated words that should not be checked. +spelling-ignore-words= + +# A path to a file that contains the private dictionary; one word per line. +spelling-private-dict-file= + +# Tells whether to store unknown words to the private dictionary (see the +# --spelling-private-dict-file option) instead of raising a message. +spelling-store-unknown-words=no + + +[LOGGING] + +# The type of string formatting that logging methods do. `old` means using % +# formatting, `new` is for `{}` formatting. 
+logging-format-style=old + +# Logging modules to check that the string format arguments are in logging +# function parameter format. +logging-modules=logging + + +[VARIABLES] + +# List of additional names supposed to be defined in builtins. Remember that +# you should avoid defining new builtins when possible. +additional-builtins= + +# Tells whether unused global variables should be treated as a violation. +allow-global-unused-variables=yes + +# List of strings which can identify a callback function by name. A callback +# name must start or end with one of those strings. +callbacks=cb_, + _cb + +# A regular expression matching the name of dummy variables (i.e. expected to +# not be used). +dummy-variables-rgx=_+$|(_[a-zA-Z0-9_]*[a-zA-Z0-9]+?$)|dummy|^ignored_|^unused_ + +# Argument names that match this expression will be ignored. Default to name +# with leading underscore. +ignored-argument-names=_.*|^ignored_|^unused_ + +# Tells whether we should check for unused import in __init__ files. +init-import=no + +# List of qualified module names which can have objects that can redefine +# builtins. +redefining-builtins-modules=six.moves,past.builtins,future.builtins,builtins,io + + +[FORMAT] + +# Expected format of line ending, e.g. empty (any line ending), LF or CRLF. +expected-line-ending-format= + +# Regexp for a line that is allowed to be longer than the limit. +ignore-long-lines=^\s*(# )??$ + +# Number of spaces of indent required inside a hanging or continued line. +indent-after-paren=4 + +# String used as indentation unit. This is usually " " (4 spaces) or "\t" (1 +# tab). +indent-string=' ' + +# Maximum number of characters on a single line. +max-line-length=100 + +# Maximum number of lines in a module. +max-module-lines=1000 + +# Allow the body of a class to be on the same line as the declaration if body +# contains single statement. +single-line-class-stmt=no + +# Allow the body of an if to be on the same line as the test if there is no +# else. 
+single-line-if-stmt=no + + +[STRING] + +# This flag controls whether inconsistent-quotes generates a warning when the +# character used as a quote delimiter is used inconsistently within a module. +check-quote-consistency=no + +# This flag controls whether the implicit-str-concat should generate a warning +# on implicit string concatenation in sequences defined over several lines. +check-str-concat-over-line-jumps=no + + +[SIMILARITIES] + +# Ignore comments when computing similarities. +ignore-comments=yes + +# Ignore docstrings when computing similarities. +ignore-docstrings=yes + +# Ignore imports when computing similarities. +ignore-imports=no + +# Minimum lines number of a similarity. +min-similarity-lines=4 + + +[MISCELLANEOUS] + +# List of note tags to take in consideration, separated by a comma. +notes=FIXME, + XXX, + TODO + +# Regular expression of note tags to take in consideration. +#notes-rgx= + + +[BASIC] + +# Naming style matching correct argument names. +argument-naming-style=snake_case + +# Regular expression matching correct argument names. Overrides argument- +# naming-style. +#argument-rgx= + +# Naming style matching correct attribute names. +attr-naming-style=snake_case + +# Regular expression matching correct attribute names. Overrides attr-naming- +# style. +#attr-rgx= + +# Bad variable names which should always be refused, separated by a comma. +bad-names=foo, + bar, + baz, + toto, + tutu, + tata + +# Bad variable names regexes, separated by a comma. If names match any regex, +# they will always be refused +bad-names-rgxs= + +# Naming style matching correct class attribute names. +class-attribute-naming-style=any + +# Regular expression matching correct class attribute names. Overrides class- +# attribute-naming-style. +#class-attribute-rgx= + +# Naming style matching correct class names. +class-naming-style=PascalCase + +# Regular expression matching correct class names. Overrides class-naming- +# style. 
+#class-rgx= + +# Naming style matching correct constant names. +const-naming-style=UPPER_CASE + +# Regular expression matching correct constant names. Overrides const-naming- +# style. +#const-rgx= + +# Minimum line length for functions/classes that require docstrings, shorter +# ones are exempt. +docstring-min-length=-1 + +# Naming style matching correct function names. +function-naming-style=snake_case + +# Regular expression matching correct function names. Overrides function- +# naming-style. +#function-rgx= + +# Good variable names which should always be accepted, separated by a comma. +good-names=i, + j, + k, + ex, + Run, + _, + x, + y, + w, + h, + a, + b + +# Good variable names regexes, separated by a comma. If names match any regex, +# they will always be accepted +good-names-rgxs= + +# Include a hint for the correct naming format with invalid-name. +include-naming-hint=no + +# Naming style matching correct inline iteration names. +inlinevar-naming-style=any + +# Regular expression matching correct inline iteration names. Overrides +# inlinevar-naming-style. +#inlinevar-rgx= + +# Naming style matching correct method names. +method-naming-style=snake_case + +# Regular expression matching correct method names. Overrides method-naming- +# style. +#method-rgx= + +# Naming style matching correct module names. +module-naming-style=snake_case + +# Regular expression matching correct module names. Overrides module-naming- +# style. +#module-rgx= + +# Colon-delimited sets of names that determine each other's naming style when +# the name regexes allow several styles. +name-group= + +# Regular expression which should only match function or class names that do +# not require a docstring. +no-docstring-rgx=^_ + +# List of decorators that produce properties, such as abc.abstractproperty. Add +# to this list to register other decorators that produce valid properties. +# These decorators are taken in consideration only for invalid-name. 
+property-classes=abc.abstractproperty + +# Naming style matching correct variable names. +variable-naming-style=snake_case + +# Regular expression matching correct variable names. Overrides variable- +# naming-style. +#variable-rgx= + + +[DESIGN] + +# Maximum number of arguments for function / method. +max-args=5 + +# Maximum number of attributes for a class (see R0902). +max-attributes=7 + +# Maximum number of boolean expressions in an if statement (see R0916). +max-bool-expr=5 + +# Maximum number of branch for function / method body. +max-branches=12 + +# Maximum number of locals for function / method body. +max-locals=15 + +# Maximum number of parents for a class (see R0901). +max-parents=7 + +# Maximum number of public methods for a class (see R0904). +max-public-methods=20 + +# Maximum number of return / yield for function / method body. +max-returns=6 + +# Maximum number of statements in function / method body. +max-statements=50 + +# Minimum number of public methods for a class (see R0903). +min-public-methods=2 + + +[IMPORTS] + +# List of modules that can be imported at any level, not just the top level +# one. +allow-any-import-level= + +# Allow wildcard imports from modules that define __all__. +allow-wildcard-with-all=no + +# Analyse import fallback blocks. This can be used to support both Python 2 and +# 3 compatible code, which means that the block might have code that exists +# only in one or another interpreter, leading to false positives when analysed. +analyse-fallback-blocks=no + +# Deprecated modules which should not be used, separated by a comma. +deprecated-modules=optparse,tkinter.tix + +# Create a graph of external dependencies in the given file (report RP0402 must +# not be disabled). +ext-import-graph= + +# Create a graph of every (i.e. internal and external) dependencies in the +# given file (report RP0402 must not be disabled). 
+import-graph= + +# Create a graph of internal dependencies in the given file (report RP0402 must +# not be disabled). +int-import-graph= + +# Force import order to recognize a module as part of the standard +# compatibility libraries. +known-standard-library= + +# Force import order to recognize a module as part of a third party library. +known-third-party=enchant + +# Couples of modules and preferred modules, separated by a comma. +preferred-modules= + + +[CLASSES] + +# List of method names used to declare (i.e. assign) instance attributes. +defining-attr-methods=__init__, + __new__, + setUp, + __post_init__ + +# List of member names, which should be excluded from the protected access +# warning. +exclude-protected=_asdict, + _fields, + _replace, + _source, + _make + +# List of valid names for the first argument in a class method. +valid-classmethod-first-arg=cls + +# List of valid names for the first argument in a metaclass class method. +valid-metaclass-classmethod-first-arg=cls + + +[EXCEPTIONS] + +# Exceptions that will emit a warning when being caught. Defaults to +# "BaseException, Exception". +overgeneral-exceptions=BaseException, + Exception diff --git a/openmmlab_test/mmaction2-0.24.1/.readthedocs.yml b/openmmlab_test/mmaction2-0.24.1/.readthedocs.yml new file mode 100644 index 0000000000000000000000000000000000000000..73ea4cb7e95530cd18ed94895ca38edd531f0d94 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/.readthedocs.yml @@ -0,0 +1,7 @@ +version: 2 + +python: + version: 3.7 + install: + - requirements: requirements/docs.txt + - requirements: requirements/readthedocs.txt diff --git a/openmmlab_test/mmaction2-0.24.1/CITATION.cff b/openmmlab_test/mmaction2-0.24.1/CITATION.cff new file mode 100644 index 0000000000000000000000000000000000000000..93a03304abf45f2a781ebe6641a0da252f932246 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/CITATION.cff @@ -0,0 +1,8 @@ +cff-version: 1.2.0 +message: "If you use this software, please cite it as below." 
+authors: + - name: "MMAction2 Contributors" +title: "OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark" +date-released: 2020-07-21 +url: "https://github.com/open-mmlab/mmaction2" +license: Apache-2.0 diff --git a/openmmlab_test/mmaction2-0.24.1/LICENSE b/openmmlab_test/mmaction2-0.24.1/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..04adf5cbc620ad190547b092fa449e36df5f7bf4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/LICENSE @@ -0,0 +1,203 @@ +Copyright 2018-2019 Open-MMLab. All rights reserved. + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. 
+ + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright 2018-2019 Open-MMLab. 
+ + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/openmmlab_test/mmaction2-0.24.1/MANIFEST.in b/openmmlab_test/mmaction2-0.24.1/MANIFEST.in new file mode 100644 index 0000000000000000000000000000000000000000..258c4e016b9549e4eb2767738ebcf114ac064eec --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/MANIFEST.in @@ -0,0 +1,3 @@ +include mmaction/.mim/model-index.yml +recursive-include mmaction/.mim/configs *.py *.yml +recursive-include mmaction/.mim/tools *.sh *.py diff --git a/openmmlab_test/mmaction2-0.24.1/README.md b/openmmlab_test/mmaction2-0.24.1/README.md new file mode 100644 index 0000000000000000000000000000000000000000..675fb6be9538b440f8b1b969021889e10d073303 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/README.md @@ -0,0 +1,320 @@ +
+ +
 
+ +[![Documentation](https://readthedocs.org/projects/mmaction2/badge/?version=latest)](https://mmaction2.readthedocs.io/en/latest/) +[![actions](https://github.com/open-mmlab/mmaction2/workflows/build/badge.svg)](https://github.com/open-mmlab/mmaction2/actions) +[![codecov](https://codecov.io/gh/open-mmlab/mmaction2/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmaction2) +[![PyPI](https://img.shields.io/pypi/v/mmaction2)](https://pypi.org/project/mmaction2/) +[![LICENSE](https://img.shields.io/github/license/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/blob/master/LICENSE) +[![Average time to resolve an issue](https://isitmaintained.com/badge/resolution/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/issues) +[![Percentage of issues still open](https://isitmaintained.com/badge/open/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/issues) + +[📘Documentation](https://mmaction2.readthedocs.io/en/latest/) | +[🛠️Installation](https://mmaction2.readthedocs.io/en/latest/install.html) | +[👀Model Zoo](https://mmaction2.readthedocs.io/en/latest/modelzoo.html) | +[🆕Update News](https://mmaction2.readthedocs.io/en/latest/changelog.html) | +[🚀Ongoing Projects](https://github.com/open-mmlab/mmaction2/projects) | +[🤔Reporting Issues](https://github.com/open-mmlab/mmaction2/issues/new/choose) + +
+ +English | [简体中文](/README_zh-CN.md) | [Model testing steps](/train.md) + +## Introduction + +MMAction2 is an open-source toolbox for video understanding based on PyTorch. +It is part of the [OpenMMLab](http://openmmlab.org/) project. + +The master branch works with **PyTorch 1.5+**. + +
+
+
+

Action Recognition Results on Kinetics-400

+
+
+
+

Skeleton-based Action Recognition Results on NTU-RGB+D-120

+
+
+
+
+

Skeleton-based Spatio-Temporal Action Detection and Action Recognition Results on Kinetics-400

+
+
+
+

Spatio-Temporal Action Detection Results on AVA-2.1

+
+ +## Major Features + +- **Modular design**: We decompose a video understanding framework into different components. One can easily construct a customized video understanding framework by combining different modules. + +- **Support for four major video understanding tasks**: MMAction2 implements various algorithms for multiple video understanding tasks, including action recognition, action localization, spatio-temporal action detection, and skeleton-based action recognition. We support **27** different algorithms and **20** different datasets for the four major tasks. + +- **Well tested and documented**: We provide detailed documentation and API reference, as well as unit tests. + +## What's New + +- (2022-03-04) We support **Multigrid** on Kinetics400, achieving 76.07% Top-1 accuracy and faster training. +- (2021-11-24) We support **2s-AGCN** on NTU60 XSub, achieving 86.06% Top-1 accuracy on the joint stream and 86.89% Top-1 accuracy on the bone stream. +- (2021-10-29) We provide a demo for skeleton-based and rgb-based spatio-temporal detection and action recognition (demo/demo_video_structuralize.py). +- (2021-10-26) We train and test **ST-GCN** on NTU60 with 3D keypoint annotations, achieving 84.61% Top-1 accuracy (higher than the 81.5% reported in the [paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/17135)). +- (2021-10-25) We provide a script (tools/data/skeleton/gen_ntu_rgbd_raw.py) to convert the NTU60 and NTU120 3D raw skeleton data to our format. +- (2021-10-25) We provide a [guide](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md) on how to train PoseC3D with custom datasets. [bit-scientist](https://github.com/bit-scientist) authored this PR! +- (2021-10-16) We support **PoseC3D** on UCF101 and HMDB51, achieving 87.0% and 69.3% Top-1 accuracy respectively with 2D skeletons only. Pre-extracted 2D skeletons are also available. + +**Release**: v0.24.0 was released on 05/05/2022. 
Please refer to [changelog.md](docs/changelog.md) for details and release history. + +## Installation + +MMAction2 depends on [PyTorch](https://pytorch.org/), [MMCV](https://github.com/open-mmlab/mmcv), [MMDetection](https://github.com/open-mmlab/mmdetection) (optional), and [MMPose](https://github.com/open-mmlab/mmpose) (optional). +Below are quick steps for installation. +Please refer to [install.md](docs/install.md) for more detailed instructions. + +```shell +conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y +conda activate open-mmlab +pip3 install openmim +mim install mmcv-full +mim install mmdet # optional +mim install mmpose # optional +git clone https://github.com/open-mmlab/mmaction2.git +cd mmaction2 +pip3 install -e . +``` + +## Get Started + +Please see [getting_started.md](docs/getting_started.md) for the basic usage of MMAction2. +There are also tutorials: + +- [learn about configs](docs/tutorials/1_config.md) +- [finetuning models](docs/tutorials/2_finetune.md) +- [adding new dataset](docs/tutorials/3_new_dataset.md) +- [designing data pipeline](docs/tutorials/4_data_pipeline.md) +- [adding new modules](docs/tutorials/5_new_modules.md) +- [exporting model to onnx](docs/tutorials/6_export_model.md) +- [customizing runtime settings](docs/tutorials/7_customize_runtime.md) + +A Colab tutorial is also provided. You may preview the notebook [here](demo/mmaction2_tutorial.ipynb) or directly [run](https://colab.research.google.com/github/open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial.ipynb) on Colab. + +## Supported Methods + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Action Recognition
C3D (CVPR'2014)TSN (ECCV'2016)I3D (CVPR'2017)I3D Non-Local (CVPR'2018)R(2+1)D (CVPR'2018)
TRN (ECCV'2018)TSM (ICCV'2019)TSM Non-Local (ICCV'2019)SlowOnly (ICCV'2019)SlowFast (ICCV'2019)
CSN (ICCV'2019)TIN (AAAI'2020)TPN (CVPR'2020)X3D (CVPR'2020)OmniSource (ECCV'2020)
MultiModality: Audio (ArXiv'2020)TANet (ArXiv'2020)TimeSformer (ICML'2021)
Action Localization
SSN (ICCV'2017)BSN (ECCV'2018)BMN (ICCV'2019)
Spatio-Temporal Action Detection
ACRN (ECCV'2018)SlowOnly+Fast R-CNN (ICCV'2019)SlowFast+Fast R-CNN (ICCV'2019)LFB (CVPR'2019)
Skeleton-based Action Recognition
ST-GCN (AAAI'2018)2s-AGCN (CVPR'2019)PoseC3D (ArXiv'2021)
+ +Results and models are available in the *README.md* of each method's config directory. +A summary can be found on the [**model zoo**](https://mmaction2.readthedocs.io/en/latest/recognition_models.html) page. + +We will keep up with the latest progress of the community and support more popular algorithms and frameworks. +If you have any feature requests, please feel free to leave a comment in [Issues](https://github.com/open-mmlab/mmaction2/issues/19). + +## Supported Datasets + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Action Recognition
HMDB51 (Homepage) (ICCV'2011)UCF101 (Homepage) (CRCV-IR-12-01)ActivityNet (Homepage) (CVPR'2015)Kinetics-[400/600/700] (Homepage) (CVPR'2017)
SthV1 (Homepage) (ICCV'2017)SthV2 (Homepage) (ICCV'2017)Diving48 (Homepage) (ECCV'2018)Jester (Homepage) (ICCV'2019)
Moments in Time (Homepage) (TPAMI'2019)Multi-Moments in Time (Homepage) (ArXiv'2019)HVU (Homepage) (ECCV'2020)OmniSource (Homepage) (ECCV'2020)
FineGYM (Homepage) (CVPR'2020)
Action Localization
THUMOS14 (Homepage) (THUMOS Challenge 2014)ActivityNet (Homepage) (CVPR'2015)
Spatio-Temporal Action Detection
UCF101-24* (Homepage) (CRCV-IR-12-01)JHMDB* (Homepage) (ICCV'2015)AVA (Homepage) (CVPR'2018)
Skeleton-based Action Recognition
PoseC3D-FineGYM (Homepage) (ArXiv'2021)PoseC3D-NTURGB+D (Homepage) (ArXiv'2021)PoseC3D-UCF101 (Homepage) (ArXiv'2021)PoseC3D-HMDB51 (Homepage) (ArXiv'2021)
+ +Datasets marked with * are not fully supported yet, but related dataset preparation steps are provided. A summary can be found on the [**Supported Datasets**](https://mmaction2.readthedocs.io/en/latest/supported_datasets.html) page. + +## Benchmark + +To demonstrate the efficacy and efficiency of our framework, we compare MMAction2 with some other popular frameworks and official releases in terms of speed. Details can be found in [benchmark](docs/benchmark.md). + +## Data Preparation + +Please refer to [data_preparation.md](docs/data_preparation.md) for a general overview of data preparation. +The supported datasets are listed in [supported_datasets.md](docs/supported_datasets.md). + +## FAQ + +Please refer to [FAQ](docs/faq.md) for frequently asked questions. + +## Projects built on MMAction2 + +Many research works and projects have been built on MMAction2 by users from the community, such as: + +- Video Swin Transformer. [\[paper\]](https://arxiv.org/abs/2106.13230)[\[github\]](https://github.com/SwinTransformer/Video-Swin-Transformer) +- Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 **Oral**. [\[paper\]](https://arxiv.org/abs/2107.10161)[\[github\]](https://github.com/Cogito2012/DEAR) +- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 **Oral**. [\[paper\]](https://arxiv.org/abs/2103.17263)[\[github\]](https://github.com/xvjiarui/VFS) + +See [projects.md](docs/projects.md) for all related projects. + +## Contributing + +We appreciate all contributions to improve MMAction2. Please refer to [CONTRIBUTING.md](https://github.com/open-mmlab/mmcv/blob/master/CONTRIBUTING.md) in MMCV for more details about the contributing guidelines. + +## Acknowledgement + +MMAction2 is an open-source project contributed by researchers and engineers from various colleges and companies. 
+We appreciate all the contributors who implement their methods or add new features, and the users who give valuable feedback. +We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new models. + +## Citation + +If you find this project useful in your research, please consider citing it: + +```BibTeX +@misc{2020mmaction2, + title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark}, + author={MMAction2 Contributors}, + howpublished = {\url{https://github.com/open-mmlab/mmaction2}}, + year={2020} +} +``` + +## License + +This project is released under the [Apache 2.0 license](LICENSE). + +## Projects in OpenMMLab + +- [MIM](https://github.com/open-mmlab/mim): MIM installs OpenMMLab packages. +- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark. +- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark. +- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection. +- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark. +- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark. +- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox. +- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark. +- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark. +- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark. +- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark. 
+- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark. +- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark. +- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark. +- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark. +- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox. +- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox. +- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework. diff --git a/openmmlab_test/mmaction2-0.24.1/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..d6a1e2af43706e622150edd63eabf71c8daea976 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/README_zh-CN.md @@ -0,0 +1,331 @@ +
+ +
 
+
+ OpenMMLab 官网 + + + HOT + + +      + OpenMMLab 开放平台 + + + TRY IT OUT + + +
+ +[![Documentation](https://readthedocs.org/projects/mmaction2/badge/?version=latest)](https://mmaction2.readthedocs.io/zh_CN/latest/) +[![actions](https://github.com/open-mmlab/mmaction2/workflows/build/badge.svg)](https://github.com/open-mmlab/mmaction2/actions) +[![codecov](https://codecov.io/gh/open-mmlab/mmaction2/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmaction2) +[![PyPI](https://img.shields.io/pypi/v/mmaction2)](https://pypi.org/project/mmaction2/) +[![LICENSE](https://img.shields.io/github/license/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/blob/master/LICENSE) +[![Average time to resolve an issue](https://isitmaintained.com/badge/resolution/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/issues) +[![Percentage of issues still open](https://isitmaintained.com/badge/open/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/issues) + +[📘文档](https://mmaction2.readthedocs.io/en/latest/) | +[🛠️安装指南](https://mmaction2.readthedocs.io/en/latest/install.html) | +[👀模型库](https://mmaction2.readthedocs.io/en/latest/modelzoo.html) | +[🆕更新](https://mmaction2.readthedocs.io/en/latest/changelog.html) | +[🚀进行中项目](https://github.com/open-mmlab/mmaction2/projects) | +[🤔问题反馈](https://github.com/open-mmlab/mmaction2/issues/new/choose) + +
+ +[English](/README.md) | 简体中文 + +## 简介 + +MMAction2 是一款基于 PyTorch 的视频理解开源工具箱,是 [OpenMMLab](http://openmmlab.org/) 项目的成员之一 + +主分支代码目前支持 **PyTorch 1.5 以上**的版本 + +
+
+
+

Kinetics-400 上的动作识别

+
+
+
+

NTURGB+D-120 上的基于人体姿态的动作识别

+
+
+
+
+

Kinetics-400 上的基于 skeleton 的时空动作检测和动作识别

+
+
+
+

AVA-2.1 上的时空动作检测

+
+ +## 主要特性 + +- **模块化设计**:MMAction2 将统一的视频理解框架解耦成不同的模块组件,通过组合不同的模块组件,用户可以便捷地构建自定义的视频理解模型 + +- **支持多种任务和数据集**:MMAction2 支持多种视频理解任务,包括动作识别,时序动作检测,时空动作检测以及基于人体姿态的动作识别,总共支持 **27** 种算法和 **20** 种数据集 + +- **详尽的单元测试和文档**:MMAction2 提供了详尽的说明文档,API 接口说明,全面的单元测试,以供社区参考 + +## 更新记录 + +- (2021-11-24) 在 NTU60 XSub 上支持 **2s-AGCN**, 在 joint stream 和 bone stream 上分别达到 86.06% 和 86.89% 的识别准确率。 +- (2021-10-29) 支持基于 skeleton 模态和 rgb 模态的时空动作检测和行为识别 demo (demo/demo_video_structuralize.py)。 +- (2021-10-26) 在 NTU60 3d 关键点标注数据集上训练测试 **STGCN**, 可达到 84.61% (高于 [paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/17135) 中的 81.5%) 的识别准确率。 +- (2021-10-25) 提供将 NTU60 和 NTU120 的 3d 骨骼点数据转换成我们项目的格式的脚本(tools/data/skeleton/gen_ntu_rgbd_raw.py)。 +- (2021-10-25) 提供使用自定义数据集训练 PoseC3D 的 [教程](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md),此 PR 由用户 [bit-scientist](https://github.com/bit-scientist) 完成! +- (2021-10-16) 在 UCF101, HMDB51 上支持 **PoseC3D**,仅用 2D 关键点就可分别达到 87.0% 和 69.3% 的识别准确率。两数据集的预提取骨架特征可以公开下载。 + +v0.24.0 版本已于 2022 年 5 月 5 日发布,可通过查阅 [更新日志](/docs/changelog.md) 了解更多细节以及发布历史 + +## 安装 + +MMAction2 依赖 [PyTorch](https://pytorch.org/), [MMCV](https://github.com/open-mmlab/mmcv), [MMDetection](https://github.com/open-mmlab/mmdetection)(可选), [MMPose](https://github.com/open-mmlab/mmpose)(可选),以下是安装的简要步骤。 +更详细的安装指南请参考 [install.md](docs_zh_CN/install.md)。 + +```shell +conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y +conda activate open-mmlab +pip3 install openmim +mim install mmcv-full +mim install mmdet # 可选 +mim install mmpose # 可选 +git clone https://github.com/open-mmlab/mmaction2.git +cd mmaction2 +pip3 install -e . 
+``` + +## 教程 + +请参考 [基础教程](/docs_zh_CN/getting_started.md) 了解 MMAction2 的基本使用。MMAction2也提供了其他更详细的教程: + +- [如何编写配置文件](/docs_zh_CN/tutorials/1_config.md) +- [如何微调模型](/docs_zh_CN/tutorials/2_finetune.md) +- [如何增加新数据集](/docs_zh_CN/tutorials/3_new_dataset.md) +- [如何设计数据处理流程](/docs_zh_CN/tutorials/4_data_pipeline.md) +- [如何增加新模块](/docs_zh_CN/tutorials/5_new_modules.md) +- [如何导出模型为 onnx 格式](/docs_zh_CN/tutorials/6_export_model.md) +- [如何自定义模型运行参数](/docs_zh_CN/tutorials/7_customize_runtime.md) + +MMAction2 也提供了相应的中文 Colab 教程,可以点击 [这里](https://colab.research.google.com/github/open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial_zh-CN.ipynb) 进行体验! + +## 模型库 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
行为识别方法
C3D (CVPR'2014)TSN (ECCV'2016)I3D (CVPR'2017)I3D Non-Local (CVPR'2018)R(2+1)D (CVPR'2018)
TRN (ECCV'2018)TSM (ICCV'2019)TSM Non-Local (ICCV'2019)SlowOnly (ICCV'2019)SlowFast (ICCV'2019)
CSN (ICCV'2019)TIN (AAAI'2020)TPN (CVPR'2020)X3D (CVPR'2020)OmniSource (ECCV'2020)
MultiModality: Audio (ArXiv'2020)TANet (ArXiv'2020)TimeSformer (ICML'2021)
时序动作检测方法
SSN (ICCV'2017)BSN (ECCV'2018)BMN (ICCV'2019)
时空动作检测方法
ACRN (ECCV'2018)SlowOnly+Fast R-CNN (ICCV'2019)SlowFast+Fast R-CNN (ICCV'2019)LFB (CVPR'2019)
基于骨骼点的动作识别方法
ST-GCN (AAAI'2018)2s-AGCN (CVPR'2019)PoseC3D (ArXiv'2021)
+ +各个模型的结果和设置都可以在对应的 config 目录下的 *README_zh-CN.md* 中查看。整体的概况也可也在 [**模型库**](https://mmaction2.readthedocs.io/zh_CN/latest/recognition_models.html) 页面中查看 + +MMAction2 将跟进学界的最新进展,并支持更多算法和框架。如果您对 MMAction2 有任何功能需求,请随时在 [问题](https://github.com/open-mmlab/mmaction2/issues/19) 中留言。 + +## 数据集 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
动作识别数据集
HMDB51 (主页) (ICCV'2011)UCF101 (主页) (CRCV-IR-12-01)ActivityNet (主页) (CVPR'2015)Kinetics-[400/600/700] (主页) (CVPR'2017)
SthV1 (主页) (ICCV'2017)SthV2 (主页) (ICCV'2017)Diving48 (主页) (ECCV'2018)Jester (主页) (ICCV'2019)
Moments in Time (主页) (TPAMI'2019)Multi-Moments in Time (主页) (ArXiv'2019)HVU (主页) (ECCV'2020)OmniSource (主页) (ECCV'2020)
FineGYM (主页) (CVPR'2020)
时序动作检测数据集
THUMOS14 (主页) (THUMOS Challenge 2014)ActivityNet (主页) (CVPR'2015)
时空动作检测数据集
UCF101-24* (主页) (CRCV-IR-12-01)JHMDB* (主页) (ICCV'2015)AVA (主页) (CVPR'2018)
基于骨骼点的动作识别数据集
PoseC3D-FineGYM (主页) (ArXiv'2021)PoseC3D-NTURGB+D (主页) (ArXiv'2021)PoseC3D-UCF101 (主页) (ArXiv'2021)PoseC3D-HMDB51 (主页) (ArXiv'2021)
+ +标记 * 代表对应数据集并未被完全支持,但提供相应的数据准备步骤。整体的概况也可也在 [**数据集**](https://mmaction2.readthedocs.io/en/latest/supported_datasets.html) 页面中查看 + +## 基准测试 + +为了验证 MMAction2 框架的高精度和高效率,开发成员将其与当前其他主流框架进行速度对比。更多详情可见 [基准测试](/docs_zh_CN/benchmark.md) + +## 数据集准备 + +请参考 [数据准备](/docs_zh_CN/data_preparation.md) 了解数据集准备概况。所有支持的数据集都列于 [数据集清单](/docs_zh_CN/supported_datasets.md) 中 + +## 常见问题 + +请参考 [FAQ](/docs_zh_CN/faq.md) 了解其他用户的常见问题 + +## 相关工作 + +目前有许多研究工作或工程项目基于 MMAction2 搭建,例如: + +- Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 **Oral**. [\[论文\]](https://arxiv.org/abs/2107.10161)[\[代码\]](https://github.com/Cogito2012/DEAR) +- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 **Oral**. [\[论文\]](https://arxiv.org/abs/2103.17263)[\[代码\]](https://github.com/xvjiarui/VFS) +- Video Swin Transformer. [\[论文\]](https://arxiv.org/abs/2106.13230)[\[代码\]](https://github.com/SwinTransformer/Video-Swin-Transformer) + +更多详情可见 [相关工作](docs/projects.md) + +## 参与贡献 + +我们非常欢迎用户对于 MMAction2 做出的任何贡献,可以参考 [贡献指南](/.github/CONTRIBUTING.md) 文件了解更多细节 + +## 致谢 + +MMAction2 是一款由不同学校和公司共同贡献的开源项目。我们感谢所有为项目提供算法复现和新功能支持的贡献者,以及提供宝贵反馈的用户。 +我们希望该工具箱和基准测试可以为社区提供灵活的代码工具,供用户复现现有算法并开发自己的新模型,从而不断为开源社区提供贡献。 + +## 引用 + +如果你觉得 MMAction2 对你的研究有所帮助,可以考虑引用它: + +```BibTeX +@misc{2020mmaction2, + title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark}, + author={MMAction2 Contributors}, + howpublished = {\url{https://github.com/open-mmlab/mmaction2}}, + year={2020} +} +``` + +## 许可 + +该项目开源自 [Apache 2.0 license](/LICENSE) + +## OpenMMLab 的其他项目 + +- [MIM](https://github.com/open-mmlab/mim): MIM 是 OpenMMlab 项目、算法、模型的统一入口 +- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab 图像分类工具箱 +- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab 目标检测工具箱 +- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab 新一代通用 3D 目标检测平台 +- [MMRotate](https://github.com/open-mmlab/mmrotate): 
OpenMMLab 旋转框检测工具箱与测试基准 +- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab 语义分割工具箱 +- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab 全流程文字检测识别理解工具箱 +- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab 姿态估计工具箱 +- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 人体参数化模型工具箱与测试基准 +- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab 自监督学习工具箱与测试基准 +- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab 模型压缩工具箱与测试基准 +- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab 少样本学习工具箱与测试基准 +- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab 新一代视频理解工具箱 +- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab 一体化视频目标感知平台 +- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab 光流估计工具箱与测试基准 +- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab 图像视频编辑工具箱 +- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab 图片视频生成模型工具箱 +- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab 模型部署框架 + +## 欢迎加入 OpenMMLab 社区 + +扫描下方的二维码可关注 OpenMMLab 团队的 [知乎官方账号](https://www.zhihu.com/people/openmmlab),加入 OpenMMLab 团队的 [官方交流 QQ 群](https://jq.qq.com/?_wv=1027&k=aCvMxdr3) + +
+ +
+ +我们会在 OpenMMLab 社区为大家 + +- 📢 分享 AI 框架的前沿核心技术 +- 💻 解读 PyTorch 常用模块源码 +- 📰 发布 OpenMMLab 的相关新闻 +- 🚀 介绍 OpenMMLab 开发的前沿算法 +- 🏃 获取更高效的问题答疑和意见反馈 +- 🔥 提供与各行各业开发者充分交流的平台 + +干货满满 📘,等你来撩 💗,OpenMMLab 社区期待您的加入 👬 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/default_runtime.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/default_runtime.py new file mode 100644 index 0000000000000000000000000000000000000000..3bfa975246284f2c8d250c13233268268eee4822 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/default_runtime.py @@ -0,0 +1,18 @@ +checkpoint_config = dict(interval=1) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +load_from = None +resume_from = None +workflow = [('train', 1)] + +# disable opencv multithreading to avoid system being overloaded +opencv_num_threads = 0 +# set multi-process start method as `fork` to speed up the training +mp_start_method = 'fork' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/audioonly_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/audioonly_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..d4a190c819cb0044010f8bbbf9a2a6d383cfb0c6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/audioonly_r50.py @@ -0,0 +1,18 @@ +# model settings +model = dict( + type='AudioRecognizer', + backbone=dict( + type='ResNetAudio', + depth=50, + pretrained=None, + in_channels=1, + norm_eval=False), + cls_head=dict( + type='AudioTSNHead', + num_classes=400, + in_channels=1024, + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bmn_400x100.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bmn_400x100.py new file mode 100644 index 
0000000000000000000000000000000000000000..edaccb98daef29cc3c82a249313a40b07bd9cd16 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bmn_400x100.py @@ -0,0 +1,12 @@ +# model settings +model = dict( + type='BMN', + temporal_dim=100, + boundary_ratio=0.5, + num_samples=32, + num_samples_per_bin=3, + feat_dim=400, + soft_nms_alpha=0.4, + soft_nms_low_threshold=0.5, + soft_nms_high_threshold=0.9, + post_process_top_k=100) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bsn_pem.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bsn_pem.py new file mode 100644 index 0000000000000000000000000000000000000000..7acb7d31d15765133843bab99aa7c9f619573442 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bsn_pem.py @@ -0,0 +1,13 @@ +# model settings +model = dict( + type='PEM', + pem_feat_dim=32, + pem_hidden_dim=256, + pem_u_ratio_m=1, + pem_u_ratio_l=2, + pem_high_temporal_iou_threshold=0.6, + pem_low_temporal_iou_threshold=0.2, + soft_nms_alpha=0.75, + soft_nms_low_threshold=0.65, + soft_nms_high_threshold=0.9, + post_process_top_k=100) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bsn_tem.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bsn_tem.py new file mode 100644 index 0000000000000000000000000000000000000000..84a2b6997498c01e8652608bfa846e27f944d110 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/bsn_tem.py @@ -0,0 +1,8 @@ +# model settings +model = dict( + type='TEM', + temporal_dim=100, + boundary_ratio=0.1, + tem_feat_dim=400, + tem_hidden_dim=512, + tem_match_threshold=0.5) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/c3d_sports1m_pretrained.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/c3d_sports1m_pretrained.py new file mode 100644 index 0000000000000000000000000000000000000000..1cdc3d49f9d6ebac64c3555599cf39e7609ef8e7 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/c3d_sports1m_pretrained.py @@ -0,0 +1,23 @@ +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='C3D', + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_pretrain_20201016-dcc47ddc.pth', # noqa: E501 + style='pytorch', + conv_cfg=dict(type='Conv3d'), + norm_cfg=None, + act_cfg=dict(type='ReLU'), + dropout_ratio=0.5, + init_std=0.005), + cls_head=dict( + type='I3DHead', + num_classes=101, + in_channels=4096, + spatial_type=None, + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='score')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/i3d_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/i3d_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..fee08bc29110a0f4b5a5ed98336a9462962d8483 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/i3d_r50.py @@ -0,0 +1,27 @@ +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3d', + pretrained2d=True, + pretrained='torchvision://resnet50', + depth=50, + conv1_kernel=(5, 7, 7), + conv1_stride_t=2, + pool1_stride_t=2, + conv_cfg=dict(type='Conv3d'), + norm_eval=False, + inflate=((1, 1, 1), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 1, 0)), + zero_init_residual=False), + cls_head=dict( + type='I3DHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) + +# This setting refers to https://github.com/open-mmlab/mmaction/blob/master/mmaction/models/tenons/backbones/resnet_i3d.py#L329-L332 # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/ircsn_r152.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/ircsn_r152.py new file mode 100644 index 
0000000000000000000000000000000000000000..36e700c3849581cb96ce1e285838c74f6620aa10 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/ircsn_r152.py @@ -0,0 +1,22 @@ +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dCSN', + pretrained2d=False, + pretrained=None, + depth=152, + with_pool2=False, + bottleneck_mode='ir', + norm_eval=False, + zero_init_residual=False), + cls_head=dict( + type='I3DHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob', max_testing_views=10)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/r2plus1d_r34.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/r2plus1d_r34.py new file mode 100644 index 0000000000000000000000000000000000000000..b5bcdac0c29983f1520d29153b91b0cea3de3afe --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/r2plus1d_r34.py @@ -0,0 +1,28 @@ +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet2Plus1d', + depth=34, + pretrained=None, + pretrained2d=False, + norm_eval=False, + conv_cfg=dict(type='Conv2plus1d'), + norm_cfg=dict(type='SyncBN', requires_grad=True, eps=1e-3), + conv1_kernel=(3, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(1, 1, 1, 1), + spatial_strides=(1, 2, 2, 2), + temporal_strides=(1, 2, 2, 2), + zero_init_residual=False), + cls_head=dict( + type='I3DHead', + num_classes=400, + in_channels=512, + spatial_type='avg', + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/slowfast_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/slowfast_r50.py new file mode 100644 index 
0000000000000000000000000000000000000000..afa8aab0458605c02237202e9a7e6e53b724af0c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/slowfast_r50.py @@ -0,0 +1,39 @@ +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=8, # tau + speed_ratio=8, # alpha + channel_ratio=8, # beta_inv + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + norm_eval=False)), + cls_head=dict( + type='SlowFastHead', + in_channels=2304, # 2048+256 + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/slowonly_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/slowonly_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..13081786bfadbf72922ecb45a9f596b931986655 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/slowonly_r50.py @@ -0,0 +1,22 @@ +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained='torchvision://resnet50', + lateral=False, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + cls_head=dict( + type='I3DHead', + in_channels=2048, + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tanet_r50.py 
b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tanet_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..b20ea82215abda92ea26f01c792fba972a0b271f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tanet_r50.py @@ -0,0 +1,20 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='TANet', + pretrained='torchvision://resnet50', + depth=50, + num_segments=8, + tam_cfg=dict()), + cls_head=dict( + type='TSMHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.5, + init_std=0.001), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tin_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tin_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..af9ac373e17963f6fd23163e0bbdc7456c265bef --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tin_r50.py @@ -0,0 +1,21 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNetTIN', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + shift_div=4), + cls_head=dict( + type='TSMHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.5, + init_std=0.001, + is_shift=False), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips=None)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tpn_slowonly_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tpn_slowonly_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..072e5e8872a8fe0e1a17b7e912f39a0e2b433498 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tpn_slowonly_r50.py @@ -0,0 +1,40 @@ +# model settings +model = dict( + 
type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained='torchvision://resnet50', + lateral=False, + out_indices=(2, 3), + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + neck=dict( + type='TPN', + in_channels=(1024, 2048), + out_channels=1024, + spatial_modulation_cfg=dict( + in_channels=(1024, 2048), out_channels=2048), + temporal_modulation_cfg=dict(downsample_scales=(8, 8)), + upsample_cfg=dict(scale_factor=(1, 1, 1)), + downsample_cfg=dict(downsample_scale=(1, 1, 1)), + level_fusion_cfg=dict( + in_channels=(1024, 1024), + mid_channels=(1024, 1024), + out_channels=2048, + downsample_scales=((1, 1, 1), (1, 1, 1))), + aux_head_cfg=dict(out_channels=400, loss_weight=0.5)), + cls_head=dict( + type='TPNHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tpn_tsm_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tpn_tsm_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..4a038669f669c8bea3d3c83fd73a068dfc3ca66c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tpn_tsm_r50.py @@ -0,0 +1,36 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNetTSM', + pretrained='torchvision://resnet50', + depth=50, + out_indices=(2, 3), + norm_eval=False, + shift_div=8), + neck=dict( + type='TPN', + in_channels=(1024, 2048), + out_channels=1024, + spatial_modulation_cfg=dict( + in_channels=(1024, 2048), out_channels=2048), + temporal_modulation_cfg=dict(downsample_scales=(8, 8)), + upsample_cfg=dict(scale_factor=(1, 1, 1)), + downsample_cfg=dict(downsample_scale=(1, 1, 1)), + level_fusion_cfg=dict( + in_channels=(1024, 1024), + 
mid_channels=(1024, 1024), + out_channels=2048, + downsample_scales=((1, 1, 1), (1, 1, 1))), + aux_head_cfg=dict(out_channels=174, loss_weight=0.5)), + cls_head=dict( + type='TPNHead', + num_classes=174, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob', fcn_test=True)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/trn_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/trn_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..ff84e78cb111dfbf50619f537834e61e647bc749 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/trn_r50.py @@ -0,0 +1,22 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + partial_bn=True), + cls_head=dict( + type='TRNHead', + num_classes=400, + in_channels=2048, + num_segments=8, + spatial_type='avg', + relation_type='TRNMultiScale', + hidden_dim=256, + dropout_ratio=0.8, + init_std=0.001), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsm_mobilenet_v2.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsm_mobilenet_v2.py new file mode 100644 index 0000000000000000000000000000000000000000..bce81074e28e7498c5f933a2b2a85ea25dcc158c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsm_mobilenet_v2.py @@ -0,0 +1,22 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='MobileNetV2TSM', + shift_div=8, + num_segments=8, + is_shift=True, + pretrained='mmcls://mobilenet_v2'), + cls_head=dict( + type='TSMHead', + num_segments=8, + num_classes=400, + in_channels=1280, + spatial_type='avg', + 
consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.5, + init_std=0.001, + is_shift=True), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsm_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsm_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..477497b67947c64625e8d3f639062fce35fcce16 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsm_r50.py @@ -0,0 +1,21 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNetTSM', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + shift_div=8), + cls_head=dict( + type='TSMHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.5, + init_std=0.001, + is_shift=True), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsn_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsn_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..d879ea692a8846295d36dda85fe06102a7b57ff7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsn_r50.py @@ -0,0 +1,19 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False), + cls_head=dict( + type='TSNHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips=None)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsn_r50_audio.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsn_r50_audio.py new file mode 
100644 index 0000000000000000000000000000000000000000..2c3ab0dff54ba73bdad3393242893932b662e1bd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/tsn_r50_audio.py @@ -0,0 +1,13 @@ +# model settings +model = dict( + type='AudioRecognizer', + backbone=dict(type='ResNet', depth=50, in_channels=1, norm_eval=False), + cls_head=dict( + type='AudioTSNHead', + num_classes=400, + in_channels=2048, + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/x3d.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/x3d.py new file mode 100644 index 0000000000000000000000000000000000000000..10e302055a640f68ec2ee6921b672d020ee0a54b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/models/x3d.py @@ -0,0 +1,14 @@ +# model settings +model = dict( + type='Recognizer3D', + backbone=dict(type='X3D', gamma_w=1, gamma_b=2.25, gamma_d=2.2), + cls_head=dict( + type='X3DHead', + in_channels=432, + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5, + fc1_bias=False), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/adam_20e.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/adam_20e.py new file mode 100644 index 0000000000000000000000000000000000000000..baa535f76cb9d2a32c36a39d7c51e4d2d5402b45 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/adam_20e.py @@ -0,0 +1,7 @@ +# optimizer +optimizer = dict( + type='Adam', lr=0.01, weight_decay=0.00001) # this lr is used for 1 gpus +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=10) +total_epochs = 20 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_100e.py 
b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_100e.py new file mode 100644 index 0000000000000000000000000000000000000000..de37742bc82a91d2df1ec650507b1ca0e80064c4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_100e.py @@ -0,0 +1,10 @@ +# optimizer +optimizer = dict( + type='SGD', + lr=0.01, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[40, 80]) +total_epochs = 100 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_150e_warmup.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_150e_warmup.py new file mode 100644 index 0000000000000000000000000000000000000000..af33a7c4c9151c1813107a188e582e2a3eb669bc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_150e_warmup.py @@ -0,0 +1,13 @@ +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='step', + step=[90, 130], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=10) +total_epochs = 150 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_50e.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_50e.py new file mode 100644 index 0000000000000000000000000000000000000000..9345715defad9dbbf2869911a158dfc4a5275e71 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_50e.py @@ -0,0 +1,10 @@ +# optimizer +optimizer = dict( + type='SGD', + lr=0.01, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[20, 40]) +total_epochs = 50 diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_100e.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_100e.py new file mode 100644 index 0000000000000000000000000000000000000000..dbdc4739878bc62455ab2cf19ed33030189a4cc3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_100e.py @@ -0,0 +1,12 @@ +# optimizer +optimizer = dict( + type='SGD', + constructor='TSMOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), + lr=0.01, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[40, 80]) +total_epochs = 100 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_50e.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_50e.py new file mode 100644 index 0000000000000000000000000000000000000000..24f4f344e9ab9658fe6a70acbdbc6d0a41a15488 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_50e.py @@ -0,0 +1,12 @@ +# optimizer +optimizer = dict( + type='SGD', + constructor='TSMOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), + lr=0.01, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[20, 40]) +total_epochs = 50 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_mobilenet_v2_100e.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_mobilenet_v2_100e.py new file mode 100644 index 0000000000000000000000000000000000000000..63ed3f275a32ca35c9bb3093707e32dcd6e6e6e9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_mobilenet_v2_100e.py @@ -0,0 +1,12 @@ +# optimizer +optimizer = dict( + type='SGD', + constructor='TSMOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), 
+ lr=0.01, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.00002) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[40, 80]) +total_epochs = 100 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_mobilenet_v2_50e.py b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_mobilenet_v2_50e.py new file mode 100644 index 0000000000000000000000000000000000000000..78612def9596476ef5769f55a85d022bf6ea60a8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/_base_/schedules/sgd_tsm_mobilenet_v2_50e.py @@ -0,0 +1,12 @@ +# optimizer +optimizer = dict( + type='SGD', + constructor='TSMOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), + lr=0.01, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.00002) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[20, 40]) +total_epochs = 50 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/_base_/models/slowonly_r50.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/_base_/models/slowonly_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..965338ea445b552c2312f8dcbd5ceb89e8cb9c61 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/_base_/models/slowonly_r50.py @@ -0,0 +1,43 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + pretrained2d=False, + lateral=False, + num_stages=4, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1)), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2048, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + 
rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/_base_/models/slowonly_r50_nl.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/_base_/models/slowonly_r50_nl.py new file mode 100644 index 0000000000000000000000000000000000000000..fd2f739da7d20adc2da60a60aab5b00c620c656e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/_base_/models/slowonly_r50_nl.py @@ -0,0 +1,50 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + pretrained2d=False, + lateral=False, + num_stages=4, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1), + norm_cfg=dict(type='BN3d', requires_grad=True), + non_local=((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)), + non_local_cfg=dict( + sub_sample=True, + use_scale=True, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='embedded_gaussian')), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2048, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/README.md 
b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..18574fcb7b8300f2230f09f1f88b9aac412fe6b0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/README.md @@ -0,0 +1,97 @@ +# ACRN + +[Actor-centric relation network](https://openaccess.thecvf.com/content_ECCV_2018/html/Chen_Sun_Actor-centric_Relation_Network_ECCV_2018_paper.html) + + + +## Abstract + + + +Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level and model temporal context with 3D ConvNets. Here, we go one step further and model spatio-temporal relations to capture the interactions between human actors, relevant objects and scene elements essential to differentiate similar human actions. Our approach is weakly supervised and mines the relevant elements automatically with an actor-centric relational network (ACRN). ACRN computes and accumulates pair-wise relation information from actor and global scene features, and generates relation features for action classification. It is implemented as neural networks and can be trained jointly with an existing action detection system. We show that ACRN outperforms alternative approaches which capture relation information, and that the proposed framework improves upon the state-of-the-art performance on JHMDB and AVA. A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action. + + + +
+ +
+ +## Results and Models + +### AVA2.1 + +| Model | Modality | Pretrained | Backbone | Input | gpus | mAP | log | json | ckpt | +| :---------------------------------------------------------------------------------------------------------------------------------------------------------: | :------: | :----------: | :------: | :---: | :--: | :--: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb](/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 27.1 | [log](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.log) | [json](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb-49b07bf2.pth) | + +### AVA2.2 + +| Model | Modality | Pretrained | Backbone | Input | gpus | mAP | log | json | ckpt | +| 
:-------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------: | :----------: | :------: | :---: | :--: | :--: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb](/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 27.8 | [log](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log) | [json](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-2be32625.pth) | + +:::{note} + +1. The **gpus** indicates the number of gpu we used to get the checkpoint. 
+ According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU. + +::: + +For more details on data preparation, you can refer to AVA in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train ACRN with SlowFast backbone on AVA with periodic validation. + +```shell +python tools/train.py configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py --validate +``` + +For more details and optional arguments, refer to the **Training setting** section of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test ACRN with SlowFast backbone on AVA and dump the result to a CSV file. + +```shell +python tools/test.py configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py checkpoints/SOME_CHECKPOINT.pth --eval mAP --out results.csv +``` + +For more details and optional arguments, refer to the **Test a dataset** section of [getting_started](/docs/getting_started.md#test-a-dataset). 
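The Linear Scaling Rule mentioned in the note of this README can be sketched as a small helper. This is only an illustration, not part of MMAction2: the function name `scale_lr` and its parameters are hypothetical, and the reference point (lr=0.01 at a total batch of 4 GPUs x 2 videos = 8) is taken from the example in the note, not from any particular config.

```python
def scale_lr(base_lr: float, base_batch: int, gpus: int, videos_per_gpu: int) -> float:
    """Scale a reference learning rate linearly with the total batch size.

    Hypothetical helper for illustration; MMAction2 does not ship this function.
    """
    if base_batch <= 0 or gpus <= 0 or videos_per_gpu <= 0:
        raise ValueError("batch size, GPU count and videos per GPU must be positive")
    # Total batch size is the number of videos processed per optimizer step.
    total_batch = gpus * videos_per_gpu
    return base_lr * total_batch / base_batch


if __name__ == "__main__":
    # Reference from the note: lr=0.01 for 4 GPUs x 2 videos per GPU (batch of 8).
    # Scaling to 16 GPUs x 4 videos per GPU (batch of 64) gives roughly 0.08.
    print(scale_lr(0.01, 8, gpus=16, videos_per_gpu=4))
```

The resulting value would then be written by hand into the `lr` field of the `optimizer` dict in the chosen config. Treat it as a starting point only, since the rule is a heuristic and the shipped configs may use a different reference ratio.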
+ +## Citation + + + +```BibTeX +@inproceedings{gu2018ava, + title={Ava: A video dataset of spatio-temporally localized atomic visual actions}, + author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={6047--6056}, + year={2018} +} +``` + +```BibTeX +@inproceedings{sun2018actor, + title={Actor-centric relation network}, + author={Sun, Chen and Shrivastava, Abhinav and Vondrick, Carl and Murphy, Kevin and Sukthankar, Rahul and Schmid, Cordelia}, + booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, + pages={318--334}, + year={2018} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..23ceb9fc74096f8d78ec359695990188d0de49a2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/README_zh-CN.md @@ -0,0 +1,81 @@ +# ACRN + +## 简介 + + + +```BibTeX +@inproceedings{gu2018ava, + title={Ava: A video dataset of spatio-temporally localized atomic visual actions}, + author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={6047--6056}, + year={2018} +} +``` + + + +```BibTeX +@inproceedings{sun2018actor, + title={Actor-centric relation network}, + author={Sun, Chen and Shrivastava, Abhinav and Vondrick, Carl and Murphy, Kevin and Sukthankar, Rahul and Schmid, Cordelia}, + booktitle={Proceedings of the European Conference on Computer Vision 
(ECCV)}, + pages={318--334}, + year={2018} +} +``` + +## 模型库 + +### AVA2.1 + +| 配置文件 | 模态 | 预训练 | 主干网络 | 输入 | GPU 数量 | mAP | log | json | ckpt | +| :---------------------------------------------------------------------------------------------------------------------------------------------------------: | :--: | :----------: | :------: | :--: | :------: | :--: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb](/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 27.1 | [log](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.log) | [json](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb-49b07bf2.pth) | + +### AVA2.2 + +| 配置文件 | 模态 | 预训练 | 主干网络 | 输入 | GPU 数量 | mAP | log | json | ckpt | +| 
:-------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--: | :----------: | :------: | :--: | :------: | :--: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb](/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 27.8 | [log](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log) | [json](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-2be32625.pth) | + +- 注: + +1. 
这里的 **GPU 数量** 指的是得到模型权重文件对应的 GPU 个数。默认地,MMAction2 所提供的配置文件对应使用 8 块 GPU 进行训练的情况。 + 依据 [线性缩放规则](https://arxiv.org/abs/1706.02677),当用户使用不同数量的 GPU 或者每块 GPU 处理不同视频个数时,需要根据批大小等比例地调节学习率。 + 如,lr=0.01 对应 4 GPUs x 2 video/gpu,以及 lr=0.08 对应 16 GPUs x 4 video/gpu。 + +对于数据集准备的细节,用户可参考 [数据准备](/docs_zh_CN/data_preparation.md)。 + +## 如何训练 + +用户可以使用以下指令进行模型训练。 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +例如:在 AVA 数据集上训练 ACRN 辅以 SlowFast 主干网络,并定期验证。 + +```shell +python tools/train.py configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py --validate +``` + +更多训练细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) 中的 **训练配置** 部分。 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 AVA 上测试 ACRN 辅以 SlowFast 主干网络,并将结果存为 csv 文件。 + +```shell +python tools/test.py configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py checkpoints/SOME_CHECKPOINT.pth --eval mAP --out results.csv +``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 **测试某个数据集** 部分。 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..50cacc7ff993d68d6a7022eeae8586d2bcd346e6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/metafile.yml @@ -0,0 +1,49 @@ +Collections: +- Name: ACRN + README: configs/detection/acrn/README.md + Paper: + URL: https://arxiv.org/abs/1807.10982 + Title: Actor-Centric Relation Network +Models: +- Config: configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.py + In Collection: ACRN + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 10 + Input: 32x2 + Modality: RGB + Parameters: 
92232057 + Pretrained: Kinetics-400 + Training Data: AVA v2.1 + Training Resources: 8 GPUs + Name: slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 27.1 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.json + Training Log: https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.log + Weights: https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb-49b07bf2.pth +- Config: configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py + In Collection: ACRN + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 10 + Input: 32x2 + Modality: RGB + Parameters: 92232057 + Pretrained: Kinetics-400 + Training Data: AVA v2.2 + Training Resources: 8 GPUs + Name: slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb + Results: + - Dataset: AVA v2.2 + Metrics: + mAP: 27.8 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json + Training Log: https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log + Weights: https://download.openmmlab.com/mmaction/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-2be32625.pth diff 
--git a/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..d42ef11efa1d811688ee3e228a5e843e69377423 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py @@ -0,0 +1,170 @@ +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=4, + speed_ratio=4, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + fusion_kernel=7, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True, + temporal_pool_mode='max'), + shared_head=dict(type='ACRNHead', in_channels=4608, out_channels=2304), + bbox_head=dict( + type='BBoxHeadAVA', + dropout_ratio=0.5, + in_channels=2304, + num_classes=81, + multilabel=True)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = 
f'{anno_root}/ava_train_v2.2.csv' +ann_file_val = f'{anno_root}/ava_val_v2.2.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.2.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.2.csv' + +label_file = f'{anno_root}/ava_action_list_v2.2_for_activitynet_2019.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] +# optimizer +optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + by_epoch=False, + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=2, + warmup_ratio=0.1) +total_epochs = 10 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb' # noqa: E501 +load_from = 
'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth' # noqa: E501 +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4d069cbb2cac7156dd1c232834214e6c7cd25eb8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/acrn/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava_rgb.py @@ -0,0 +1,170 @@ +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=4, + speed_ratio=4, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + fusion_kernel=7, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True, + temporal_pool_mode='max'), + shared_head=dict(type='ACRNHead', in_channels=4608, out_channels=2304), + bbox_head=dict( + type='BBoxHeadAVA', + dropout_ratio=0.5, + in_channels=2304, + num_classes=81, + multilabel=True)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + 
add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] +# optimizer +optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + by_epoch=False, + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=2, + warmup_ratio=0.1) +total_epochs = 10 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/slowfast_acrn_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb' # noqa: E501 +load_from = 
'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth' # noqa: E501 +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/README.md b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f46a3961bfcfd393f9d129ffc2668813bd1bf72a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/README.md @@ -0,0 +1,146 @@ +# AVA + +[Ava: A video dataset of spatio-temporally localized atomic visual actions](https://openaccess.thecvf.com/content_cvpr_2018/html/Gu_AVA_A_Video_CVPR_2018_paper.html) + + + +
+ +
+ +## Abstract + + + +This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. We will release the dataset publicly. +AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.6% mAP, underscoring the need for developing new approaches for video understanding. + + + +
+ +
+ + + +```BibTeX +@inproceedings{feichtenhofer2019slowfast, + title={Slowfast networks for video recognition}, + author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming}, + booktitle={Proceedings of the IEEE international conference on computer vision}, + pages={6202--6211}, + year={2019} +} +``` + +## Results and Models + +### AVA2.1 + +| Model | Modality | Pretrained | Backbone | Input | gpus | Resolution | mAP | log | json | ckpt | +| :--------------------------------------------------------------------------------------------------------------------------------------------------: | :------: | :----------: | :-------: | :---: | :--: | :------------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 4x16 | 8 | short-side 256 | 20.1 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.json) | 
[ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-40061d5f.pth) | +| [slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb.py) | RGB | OmniSource | ResNet50 | 4x16 | 8 | short-side 256 | 21.8 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201217-0c6d2e98.pth) | +| [slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb](/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 4x16 | 8 | short-side 256 | 21.75 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/20210316_122517.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/20210316_122517.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb_20210316-959829ec.pth) | +| [slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb](/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 8x8 | 8x2 | short-side 256 | 23.79 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/20210316_122517.log) | 
[json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/20210316_122517.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb_20210316-5742e4dd.pth) | +| [slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb](/configs/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet101 | 8x8 | 8x2 | short-side 256 | 24.6 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201217-1c9b4117.pth) | +| [slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb](/configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py) | RGB | OmniSource | ResNet101 | 8x8 | 8x2 | short-side 256 | 25.9 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth) | +| 
[slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 24.4 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-6e7c704d.pth) | +| [slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 25.4 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222-f4d209c9.pth) | +| [slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb](/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 25.5 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217.log) | 
[json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217-ae225e97.pth) | + +### AVA2.2 + +| Model | Modality | Pretrained | Backbone | Input | gpus | mAP | log | json | ckpt | +| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------: | :----------: | :------: | :---: | :--: | :--: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb](/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.1 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log) | 
[json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-b987b516.pth) | +| [slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb](/configs/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.4 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-874e0845.pth) | +| [slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb](/configs/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.8 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log) | 
[json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-345618cd.pth) | + +:::{note} + +1. **gpus** indicates the number of GPUs used to obtain the checkpoint. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. **Context** indicates that both the RoI feature and the global pooled feature are used for classification, which generally improves mAP by around 1%. + +::: + +For more details on data preparation, refer to AVA in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the SlowOnly model on AVA with periodic validation. + +```shell +python tools/train.py configs/detection/ava/slowonly_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py --validate +``` + +For more details and optional arguments, refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +### Train Custom Classes From Ava Dataset + +You can train on custom classes from AVA. AVA suffers from class imbalance: there are more than 100,000 samples for classes like `stand`/`listen to (a person)`/`talk to (e.g., self, a person, a group)`/`watch (a person)`, whereas half of all classes have fewer than 500 samples.
In most cases, training only on custom classes with fewer samples leads to better results. + +Three steps to train custom classes: + +- Step 1: Select custom classes from the original classes and name the selection `custom_classes`. Class `0` should not be selected, since it is reserved for future use (to identify whether a proposal is positive or negative, not implemented yet) and will be added automatically. +- Step 2: Set `num_classes`. To stay compatible with the current code, please make sure `num_classes == len(custom_classes) + 1`. + - The new class `0` corresponds to the original class `0`. The new class `i` (i > 0) corresponds to the original class `custom_classes[i-1]`. + - There are three `num_classes` settings in an AVA config: `model -> roi_head -> bbox_head -> num_classes`, `data -> train -> num_classes` and `data -> val -> num_classes`. + - If `num_classes <= 5`, the input argument `topk` of `BBoxHeadAVA` should be modified. The default value of `topk` is `(3, 5)`, and all elements of `topk` must be smaller than `num_classes`. +- Step 3: Make sure all custom classes are in `label_file`. Note that there are two label files: `ava_action_list_v2.1_for_activitynet_2018.pbtxt` (contains 60 classes, 20 classes are missing) and `ava_action_list_v2.1.pbtxt` (contains all 80 classes). + +Take `slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb` as an example: we train on the custom classes whose AP lies in the range `(0.1, 0.3)`, namely `[3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72]`. Note that the AP mentioned above is computed with the original checkpoint, which was trained on all 80 classes. The results are listed as follows.
+ +| training classes | mAP(custom classes) | config | log | json | ckpt | +| :--------------: | :-----------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| All 80 classes | 0.1948 | [slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-40061d5f.pth) | +| custom classes | 0.3311 | [slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py) | 
[log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes-4ab80419.pth) | +| All 80 classes | 0.1864 | [slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py](/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-6e7c704d.pth) | +| custom classes | 0.3785 | [slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes](/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py) | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes_20210305.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes_20210305.json) | 
[ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes_20210305-c6225546.pth) | + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the SlowOnly model on AVA and dump the results to a CSV file. + +```shell +python tools/test.py configs/detection/ava/slowonly_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py checkpoints/SOME_CHECKPOINT.pth --eval mAP --out results.csv +``` + +For more details and optional arguments, refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset). + +## Citation + + + +```BibTeX +@inproceedings{gu2018ava, + title={Ava: A video dataset of spatio-temporally localized atomic visual actions}, + author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={6047--6056}, + year={2018} +} +``` + +```BibTeX +@article{duan2020omni, + title={Omni-sourced Webly-supervised Learning for Video Recognition}, + author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua}, + journal={arXiv preprint arXiv:2003.13042}, + year={2020} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..1b4b2b08de1501ec35693a7daad56634066fe869 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/README_zh-CN.md @@ -0,0 +1,129 @@ +# AVA + +
+ +
+ +## Introduction + + + +```BibTeX +@inproceedings{gu2018ava, + title={Ava: A video dataset of spatio-temporally localized atomic visual actions}, + author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={6047--6056}, + year={2018} +} +``` + + + +```BibTeX +@article{duan2020omni, + title={Omni-sourced Webly-supervised Learning for Video Recognition}, + author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua}, + journal={arXiv preprint arXiv:2003.13042}, + year={2020} +} +``` + + + +```BibTeX +@inproceedings{feichtenhofer2019slowfast, + title={Slowfast networks for video recognition}, + author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming}, + booktitle={Proceedings of the IEEE international conference on computer vision}, + pages={6202--6211}, + year={2019} +} +``` + +## Model Zoo + +### AVA2.1 + +| Config | Modality | Pretrained | Backbone | Input | GPUs | Resolution | mAP | log | json | ckpt | +| :--------------------------------------------------------------------------------------------------------------------------------------------------: | :--: | :----------: | :-------: | :--: | :------: | :------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 4x16 | 8 | short-side 256 | 20.1 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-40061d5f.pth) | +| [slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb.py) | RGB | OmniSource | ResNet50 | 4x16 | 8 | short-side 256 | 21.8 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201217-0c6d2e98.pth) | +| [slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb](/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 4x16 | 8 | short-side 256 | 21.75 | 
[log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/20210316_122517.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/20210316_122517.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb_20210316-959829ec.pth) | +| [slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb](/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 8x8 | 8x2 | short-side 256 | 23.79 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/20210316_122517.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/20210316_122517.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb_20210316-5742e4dd.pth) | +| [slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb](/configs/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet101 | 8x8 | 8x2 | short-side 256 | 24.6 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201217-1c9b4117.pth) | +| 
[slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb](/configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py) | RGB | OmniSource | ResNet101 | 8x8 | 8x2 | short-side 256 | 25.9 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth) | +| [slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 24.4 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-6e7c704d.pth) | +| [slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 25.4 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222.log) | 
[json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222-f4d209c9.pth) | +| [slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb](/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 25.5 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217-ae225e97.pth) | + +### AVA2.2 + +| Config | Modality | Pretrained | Backbone | Input | GPUs | mAP | log | json | ckpt | +| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--: | :----------: | :------: | :--: | :------: | :--: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb](/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.1 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-b987b516.pth) | +| [slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb](/configs/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.4 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json) | 
[ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-874e0845.pth) | +| [slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb](/configs/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py) | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.8 | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-345618cd.pth) | + +Notes: + +1. The **GPU count** listed in the tables is the number of GPUs used to obtain the checkpoint. By default, the config files provided by MMAction2 assume training on 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), if you use a different number of GPUs or a different number of videos per GPU, scale the learning rate in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. 
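**Context** means that both the RoI feature and the global feature are used for classification, which brings an improvement of about 1% mAP. + +For details on dataset preparation, see [Data Preparation](/docs_zh_CN/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the SlowOnly model on AVA with periodic validation. + +```shell +python tools/train.py configs/detection/ava/slowonly_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py --validate +``` + +For more training details, see the **训练配置** (training setting) section of [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +### Train Custom Classes From the AVA Dataset + +You can train on custom classes from the AVA dataset. AVA is heavily imbalanced across classes: some classes have more than 100000 samples (`stand`/`listen to (a person)`/`talk to (e.g., self, a person, a group)`/`watch (a person)`), while others have few (half of the classes have fewer than 500 samples). In most cases, training only on the classes with few samples yields better accuracy on those classes. + +Training custom classes from the AVA dataset takes three steps: + +1. Select the classes you want to train from the original class list and put them in the `custom_classes` field of the config file. Class `0` does not denote a specific action and should not be selected. +2. Set `num_classes` to `num_classes = len(custom_classes) + 1`. + - In the new class-to-index mapping, index `0` still corresponds to original class `0`, and index `i` (i > 0) corresponds to original class `custom_classes[i-1]`. + - `num_classes` must be updated in three places in the config file: `model -> roi_head -> bbox_head -> num_classes`, `data -> train -> num_classes`, and `data -> val -> num_classes`. + - If `num_classes <= 5`, the `topk` parameter of `BBoxHeadAVA` in the config file should be changed. The default value of `topk` is `(3, 5)`, and every element of `topk` should be smaller than `num_classes`. +3. 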
Make sure all custom classes are in the `label_file`. + +Take `slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb` as an example: the custom-class config trains all classes whose AP lies in `(0.1, 0.3)` (AP here is that of the model trained on all 80 AVA classes), namely `[3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72]`. The table below lists the accuracy of the models trained on custom classes: + +| Training classes | mAP (custom classes) | Config | log | json | ckpt | +| :--------: | :----------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| All 80 classes | 0.1948 | [slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-40061d5f.pth) | +| Custom classes | 0.3311 | 
[slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py) | [log](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes-4ab80419.pth) | +| All 80 classes | 0.1864 | [slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.log) | [json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-6e7c704d.pth) | +| Custom classes | 0.3785 | [slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes](/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py) | [log](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes_20210305.log) | 
[json](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes_20210305.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes_20210305-c6225546.pth) | + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the SlowOnly model on AVA and dump the results to a CSV file. + +```shell +python tools/test.py configs/detection/ava/slowonly_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py checkpoints/SOME_CHECKPOINT.pth --eval mAP --out results.csv +``` + +For more testing details, see the **测试某个数据集** (test a dataset) section of [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..971abd7bd42d130a4d846d8012f2515f9faa16d6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/metafile.yml @@ -0,0 +1,259 @@ +Collections: +- Name: AVA + README: configs/detection/ava/README.md + Paper: + URL: https://arxiv.org/abs/1705.08421 + Title: "AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions" +Models: +- Config: configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 16 + Epochs: 20 + Input: 4x16 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 20.1 + Task: Spatial Temporal Action Detection + Training Json Log: 
https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201127.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-40061d5f.pth +- Config: configs/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 16 + Epochs: 20 + Input: 4x16 + Pretrained: OmniSource + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 21.8 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201127.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201127.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb_20201217-0c6d2e98.pth +- Config: configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 10 + Input: 4x16 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb + 
Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 21.75 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/20210316_122517.log.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/20210316_122517.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb_20210316-959829ec.pth +- Config: configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 10 + Input: 8x8 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 16 GPUs + Modality: RGB + Name: slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 23.79 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/20210316_122517.log.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/20210316_122517.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb_20210316-5742e4dd.pth +- Config: configs/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet101 + Batch Size: 6 + Epochs: 20 + Input: 8x8 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 16 GPUs + Modality: RGB + Name: slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 24.6 
+ Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201127.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201127.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb_20201217-1c9b4117.pth +- Config: configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet101 + Batch Size: 6 + Epochs: 20 + Input: 8x8 + Pretrained: OmniSource + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 16 GPUs + Modality: RGB + Name: slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 25.9 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201127.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201127.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth +- Config: configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 9 + Epochs: 20 + Input: 32x2 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 16 GPUs + Modality: RGB + Name: 
slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 24.4 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-6e7c704d.pth +- Config: configs/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 9 + Epochs: 20 + Input: 32x2 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 16 GPUs + Modality: RGB + Name: slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 25.4 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201222-f4d209c9.pth +- Config: configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 5 + 
Epochs: 20 + Input: 32x2 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 16 GPUs + Modality: RGB + Name: slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 25.5 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217-ae225e97.pth +- Config: configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 10 + Input: 32x2 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.2 + Training Resources: 8 GPUs + Modality: RGB + Name: slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb + Results: + - Dataset: AVA v2.2 + Metrics: + mAP: 26.1 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-b987b516.pth +- Config: 
configs/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 10 + Input: 32x2 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.2 + Training Resources: 8 GPUs + Modality: RGB + Name: slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb + Results: + - Dataset: AVA v2.2 + Metrics: + mAP: 26.8 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-345618cd.pth +- Config: configs/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py + In Collection: AVA + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 10 + Input: 32x2 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.2 + Training Resources: 8 GPUs + Modality: RGB + Name: slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb + Results: + - Dataset: AVA v2.2 + Metrics: + mAP: 26.4 + Task: Spatial Temporal Action Detection + Training Json Log: 
https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.json + Training Log: https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.log + Weights: https://download.openmmlab.com/mmaction/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb-874e0845.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..a180bb91734dc697e6a848987547f93f773ff0df --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py @@ -0,0 +1,175 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=8, + speed_ratio=8, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True, + with_global=True), + bbox_head=dict( + 
type='BBoxHeadAVA', + in_channels=4608, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=9, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.1125, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 'slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb') +load_from = 
('https://download.openmmlab.com/mmaction/recognition/slowfast/' + 'slowfast_r50_4x16x1_256e_kinetics400_rgb/' + 'slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..f649374a0e1d5f794a2414950124d0938a9459bd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py @@ -0,0 +1,174 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=8, + speed_ratio=8, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2304, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 
'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=9, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.1125, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 'slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb') +load_from = 
('https://download.openmmlab.com/mmaction/recognition/slowfast/' + 'slowfast_r50_4x16x1_256e_kinetics400_rgb/' + 'slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py new file mode 100644 index 0000000000000000000000000000000000000000..413065cb846204d86c80cf7df962622d96321123 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py @@ -0,0 +1,184 @@ +# custom classes of ava dataset +# Here we choose classes with AP in range [0.1, 0.3) +# AP is calculated by **slowonly** ckpt, which is trained by all 80 classes +custom_classes = [3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72] +num_classes = len(custom_classes) + 1 + +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=8, + speed_ratio=8, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2304, + num_classes=num_classes, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + 
assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=9, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + num_classes=num_classes, + custom_classes=custom_classes, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + num_classes=num_classes, + custom_classes=custom_classes, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.1125, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.05) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' 
+work_dir = ('./work_dirs/ava/' + 'slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom') +load_from = ('https://download.openmmlab.com/mmaction/recognition/slowfast/' + 'slowfast_r50_4x16x1_256e_kinetics400_rgb/' + 'slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7c3826d8bbd010be9fcba576a869b32eaf8b8ea2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py @@ -0,0 +1,175 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=4, + speed_ratio=4, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + fusion_kernel=7, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2304, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + 
pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=5, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 'slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb') +load_from = 
('https://download.openmmlab.com/mmaction/recognition/slowfast/' + 'slowfast_r50_8x8x1_256e_kinetics400_rgb/' + 'slowfast_r50_8x8x1_256e_kinetics400_rgb_20200704-73547d2b.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..9fa024f2961ac223635f47f0c1ea728a700a386a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py @@ -0,0 +1,168 @@ +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=4, + speed_ratio=4, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + fusion_kernel=7, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + dropout_ratio=0.5, + in_channels=2304, + num_classes=81, + multilabel=True)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 
'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.2.csv' +ann_file_val = f'{anno_root}/ava_val_v2.2.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.2.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.2.csv' + +label_file = f'{anno_root}/ava_action_list_v2.2_for_activitynet_2019.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] +# optimizer +optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + by_epoch=False, + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=2, + warmup_ratio=0.1) +total_epochs = 10 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb' # noqa: E501 +load_from = 
'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth' # noqa: E501 +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..71af48e10b3bedd2633deeac659aea16ed6f0dcb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py @@ -0,0 +1,171 @@ +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=4, + speed_ratio=4, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + fusion_kernel=7, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True, + temporal_pool_mode='max'), + bbox_head=dict( + type='BBoxHeadAVA', + dropout_ratio=0.5, + in_channels=2304, + focal_alpha=3.0, + focal_gamma=1.0, + num_classes=81, + multilabel=True)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + 
pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.2.csv' +ann_file_val = f'{anno_root}/ava_val_v2.2.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.2.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.2.csv' + +label_file = f'{anno_root}/ava_action_list_v2.2_for_activitynet_2019.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] +# optimizer +optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + by_epoch=False, + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=2, + warmup_ratio=0.1) +total_epochs = 10 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb' # noqa: E501 +load_from = 
'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth' # noqa: E501 +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..a4979d9ba11ad3f685851336ee7e1c5e42580e8f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb.py @@ -0,0 +1,169 @@ +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=4, + speed_ratio=4, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + fusion_kernel=7, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + spatial_strides=(1, 2, 2, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1))), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True, + temporal_pool_mode='max'), + bbox_head=dict( + type='BBoxHeadAVA', + dropout_ratio=0.5, + in_channels=2304, + num_classes=81, + multilabel=True)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + 
test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.2.csv' +ann_file_val = f'{anno_root}/ava_val_v2.2.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.2.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.2.csv' + +label_file = f'{anno_root}/ava_action_list_v2.2_for_activitynet_2019.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=32, frame_interval=2), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] +# optimizer +optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + by_epoch=False, + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=2, + warmup_ratio=0.1) +total_epochs = 10 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb' # noqa: E501 +load_from = 
'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth' # noqa: E501 +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..ecc89f7ab0e004de3d2832912b19bc2d6c7662f7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb.py @@ -0,0 +1,158 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowOnly', + depth=101, + pretrained=None, + pretrained2d=False, + lateral=False, + num_stages=4, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1)), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2048, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = 
f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=8, frame_interval=8), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict(type='SampleAVAFrames', clip_len=8, frame_interval=8, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + # During testing, each video may have different shape + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 
'slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb') +load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'omni/slowonly_r101_without_omni_8x8x1_' + 'kinetics400_rgb_20200926-0c730aef.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..54df99e59cc13127a80e6337048f32bfd16eb023 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py @@ -0,0 +1,158 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + pretrained2d=False, + lateral=False, + num_stages=4, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1)), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2048, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = 
f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=4, frame_interval=16), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=4, frame_interval=16, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.2, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 'slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb') +load_from = 
('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py new file mode 100644 index 0000000000000000000000000000000000000000..30d9ba82dd35c78c00abc95608102fbb80dc02c6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes.py @@ -0,0 +1,169 @@ +# custom classes of ava dataset +# Here we choose classes with AP in range [0.1, 0.3) +# AP is calculated by original ckpt, which is trained by all 80 classes +custom_classes = [3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72] +num_classes = len(custom_classes) + 1 + +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + pretrained2d=False, + lateral=False, + num_stages=4, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1)), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2048, + num_classes=num_classes, + multilabel=True, + topk=(3, 5), + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 
'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=4, frame_interval=16), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=4, frame_interval=16, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + num_classes=num_classes, + custom_classes=custom_classes, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + num_classes=num_classes, + custom_classes=custom_classes, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.2, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' 
+work_dir = ('./work_dirs/ava/' + 'slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom') +load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..e0a055108e4b6e762d84b3d22ada905944f7454c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb.py @@ -0,0 +1,120 @@ +_base_ = ['../_base_/models/slowonly_r50_nl.py'] + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' 
+ 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=4, frame_interval=16), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=4, frame_interval=16, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + # During testing, each video may have different shape + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + 
proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict( + type='SGD', lr=0.3, momentum=0.9, weight_decay=1e-06, nesterov=True) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[4, 6, 8], + warmup='linear', + warmup_iters=800, + warmup_ratio=0.01) +total_epochs = 10 + +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 'slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb') +load_from = ( + 'https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/' + 'slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb_20210308-0d6e5a69.pth' # noqa: E501 +) +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..105b83204598be6fee2a1cddeee45fb5a408c028 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb.py @@ -0,0 +1,119 @@ +_base_ = ['../_base_/models/slowonly_r50_nl.py'] + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + 
+ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=8, frame_interval=8), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict(type='SampleAVAFrames', clip_len=8, frame_interval=8, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + # During testing, each video may have different shape + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, weight_decay=1e-06, nesterov=True) +# this lr is used for 8x2 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[4, 6, 8], + warmup='linear', + warmup_iters=1600, + warmup_ratio=0.01) +total_epochs = 10 + +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 
'slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb') +load_from = ( + 'https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/' + 'slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb_20210308-e8dd9e82.pth' # noqa: E501 +) +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..23f3aaf5dbdf8bd3725873e37d77534d9c1a1fad --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py @@ -0,0 +1,158 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowOnly', + depth=101, + pretrained=None, + pretrained2d=False, + lateral=False, + num_stages=4, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1)), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2048, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = 
f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=8, frame_interval=8), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict(type='SampleAVAFrames', clip_len=8, frame_interval=8, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + # During testing, each video may have different shape + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 
'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb') +load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'omni/' + 'slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth') + +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..067e1745590735a19f748779a4aac452f6e79701 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/ava/slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb.py @@ -0,0 +1,159 @@ +# model setting +model = dict( + type='FastRCNN', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + pretrained2d=False, + lateral=False, + num_stages=4, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + spatial_strides=(1, 2, 2, 1)), + roi_head=dict( + type='AVARoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor3D', + roi_layer_type='RoIAlign', + output_size=8, + with_temporal_pool=True), + bbox_head=dict( + type='BBoxHeadAVA', + in_channels=2048, + num_classes=81, + multilabel=True, + dropout_ratio=0.5)), + train_cfg=dict( + rcnn=dict( + assigner=dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.9, + neg_iou_thr=0.9, + min_pos_iou=0.9), + sampler=dict( + type='RandomSampler', + num=32, + pos_fraction=1, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=1.0, + debug=False)), + test_cfg=dict(rcnn=dict(action_thr=0.002))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = 
f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=4, frame_interval=16), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=4, frame_interval=16, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape'], + nested=True) +] + +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + # During testing, each video may have different shape + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] + +optimizer = dict(type='SGD', lr=0.2, momentum=0.9, weight_decay=0.00001) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/ava/' + 
'slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb') +load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'omni/' + 'slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/README.md b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0658acc9deac1a29fbfdafa7b83a6d6ea6ae4dcf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/README.md @@ -0,0 +1,132 @@ +# LFB + +[Long-term feature banks for detailed video understanding](https://openaccess.thecvf.com/content_CVPR_2019/html/Wu_Long-Term_Feature_Banks_for_Detailed_Video_Understanding_CVPR_2019_paper.html) + + + +## Abstract + + + +To understand the world, we humans constantly need to relate the present to the past, and put events in context. In this paper, we enable existing video models to do the same. We propose a long-term feature bank---supportive information extracted over the entire span of a video---to augment state-of-the-art video models that otherwise would only view short clips of 2-5 seconds. Our experiments demonstrate that augmenting 3D convolutional networks with a long-term feature bank yields state-of-the-art results on three challenging video datasets: AVA, EPIC-Kitchens, and Charades. + + + +
+ +
+ +## Results and Models + +### AVA2.1 + +| Model | Modality | Pretrained | Backbone | Input | gpus | Resolution | mAP | log | json | ckpt | +| :-----------------------------------------------------------------------------------------------------------------------------------------------------: | :------: | :----------: | :--------------------------------------------------------------------------------------------------: | :---: | :--: | :------------: | :---: | :------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py](/configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | [slowonly_r50_4x16x1](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | 4x16 | 8 | short-side 256 | 24.11 | [log](https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210224_125052.log) | [json](https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210224_125052.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210224-2ae136d9.pth) | +| [lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py](/configs/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | 
[slowonly_r50_4x16x1](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | 4x16 | 8 | short-side 256 | 20.17 | [log](https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log) | [json](https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210301-19c330b7.pth) | +| [lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py](/configs/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | [slowonly_r50_4x16x1](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | 4x16 | 8 | short-side 256 | 22.15 | [log](https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log) | [json](https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210301-37efcd15.pth) | + +:::{note} + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you should set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. We use `slowonly_r50_4x16x1` instead of the `I3D-R50-NL` used in the original paper as the backbone of LFB, but we achieve a similar improvement: (ours: 20.1 -> 24.11 vs. 
original paper: 22.1 -> 25.8). +3. Because the long-term features are randomly sampled during testing, the test accuracy may vary slightly between runs. +4. Before training or testing LFB, you need to infer the feature bank with [lfb_slowonly_r50_ava_infer.py](/configs/detection/lfb/lfb_slowonly_r50_ava_infer.py). For more details on inferring the feature bank, refer to the [Train](#Train) part. +5. You can also download the long-term feature bank from [AVA_train_val_float32_lfb](https://download.openmmlab.com/mmaction/detection/lfb/AVA_train_val_float32_lfb.rar) or [AVA_train_val_float16_lfb](https://download.openmmlab.com/mmaction/detection/lfb/AVA_train_val_float16_lfb.rar), and then put it under `lfb_prefix_path`. +6. The ROIHead now supports single-label classification (i.e. the network outputs at most + one label per actor). This can be done by setting `multilabel=False` during training and + setting `test_cfg.rcnn.action_thr` for testing. + +::: + +## Train + +### a. Infer long-term feature bank for training + +Before training or testing LFB, you need to infer the long-term feature bank first. + +Specifically, run the test on the training, validation, and testing datasets with the config file [lfb_slowonly_r50_ava_infer](/configs/detection/lfb/lfb_slowonly_r50_ava_infer.py) (by default, the config file only infers the feature bank of the training dataset; set `dataset_mode = 'val'` in the config file to infer the feature bank of the validation dataset), and the shared head [LFBInferHead](/mmaction/models/heads/lfb_infer_head.py) will generate the feature bank. + +A long-term feature bank file of the AVA training and validation datasets with float32 precision occupies 3.3 GB. If the features are stored with float16 precision, the feature bank occupies 1.65 GB. + +You can use the following commands to infer the feature banks of the AVA training and validation datasets; they will be stored in `lfb_prefix_path/lfb_train.pkl` and `lfb_prefix_path/lfb_val.pkl`. 
+ +```shell +# set `dataset_mode = 'train'` in lfb_slowonly_r50_ava_infer.py +python tools/test.py configs/detection/lfb/lfb_slowonly_r50_ava_infer.py \ + checkpoints/YOUR_BASELINE_CHECKPOINT.pth --eval mAP + +# set `dataset_mode = 'val'` in lfb_slowonly_r50_ava_infer.py +python tools/test.py configs/detection/lfb/lfb_slowonly_r50_ava_infer.py \ + checkpoints/YOUR_BASELINE_CHECKPOINT.pth --eval mAP +``` + +We use the [slowonly_r50_4x16x1 checkpoint](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-40061d5f.pth) from [slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) to infer the feature bank. + +### b. Train LFB + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the LFB model on AVA with the half-precision long-term feature bank. + +```shell +python tools/train.py configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py \ + --validate --seed 0 --deterministic +``` + +For more details and optional arguments, refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +## Test + +### a. Infer long-term feature bank for testing + +Before training or testing LFB, you also need to infer the long-term feature bank first. If you have already generated the feature bank file, you can skip this step. + +This step is the same as the **Infer long-term feature bank for training** part in [Train](#Train). + +### b. Test LFB + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the LFB model on AVA with the half-precision long-term feature bank and dump the results to a CSV file. 
+ +```shell +python tools/test.py configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval mAP --out results.csv +``` + +For more details, refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset). + +## Citation + + + +```BibTeX +@inproceedings{gu2018ava, + title={Ava: A video dataset of spatio-temporally localized atomic visual actions}, + author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={6047--6056}, + year={2018} +} +``` + +```BibTeX +@inproceedings{wu2019long, + title={Long-term feature banks for detailed video understanding}, + author={Wu, Chao-Yuan and Feichtenhofer, Christoph and Fan, Haoqi and He, Kaiming and Krahenbuhl, Philipp and Girshick, Ross}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages={284--293}, + year={2019} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..2f42c39362f0537d58d67a0aed5507ada477376e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/README_zh-CN.md @@ -0,0 +1,103 @@ +# LFB + +## Introduction + + + +```BibTeX +@inproceedings{wu2019long, + title={Long-term feature banks for detailed video understanding}, + author={Wu, Chao-Yuan and Feichtenhofer, Christoph and Fan, Haoqi and He, Kaiming and Krahenbuhl, Philipp and Girshick, Ross}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages={284--293}, + year={2019} +} +``` + +## Model Zoo + +### AVA2.1 + +| Config | Modality 
| Pretrained | Backbone | Input | GPUs | Resolution | mAP | log | json | ckpt | +| :-----------------------------------------------------------------------------------------------------------------------------------------------------: | :--: | :----------: | :--------------------------------------------------------------------------------------------------: | :--: | :------: | :------: | :------: | :------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py](/configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | [slowonly_r50_4x16x1](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | 4x16 | 8 | short-side 256 | 24.11 | [log](https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210224_125052.log) | [json](https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210224_125052.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210224-2ae136d9.pth) | +| [lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py](/configs/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | [slowonly_r50_4x16x1](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | 4x16 | 8 | short-side 256 | 20.17 | 
[log](https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log) | [json](https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210301-19c330b7.pth) | +| [lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py](/configs/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py) | RGB | Kinetics-400 | [slowonly_r50_4x16x1](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) | 4x16 | 8 | short-side 256 | 22.15 | [log](https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log) | [json](https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log.json) | [ckpt](https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210301-37efcd15.pth) | + +- Note: + +1. **GPUs** here is the number of GPUs used to obtain the checkpoint. By default, the config files provided by MMAction2 assume training on 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to scale the learning rate in proportion to the batch size when you use a different number of GPUs or a different number of videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. This LFB model does not use the `I3D-R50-NL` backbone from the original paper; it uses `slowonly_r50_4x16x1` instead, yet achieves a similar improvement (this model: 20.1 -> 24.11; original paper: 22.1 -> 25.8). +3. Because the long-term features are randomly sampled during testing, the test accuracy may vary slightly. +4. Before training or testing LFB, you need to infer the long-term feature bank with the config file [lfb_slowonly_r50_ava_infer.py](/configs/detection/lfb/lfb_slowonly_r50_ava_infer.py). For more details on inferring the long-term feature bank, refer to the [Train section](#%E8%AE%AD%E7%BB%83). +5. 
You can also download the float32 or float16 long-term feature bank directly from [AVA_train_val_float32_lfb](https://download.openmmlab.com/mmaction/detection/lfb/AVA_train_val_float32_lfb.rar) or [AVA_train_val_float16_lfb](https://download.openmmlab.com/mmaction/detection/lfb/AVA_train_val_float16_lfb.rar), and put it under `lfb_prefix_path`. + +## Train + +### a. Infer long-term feature bank for training LFB + +Before training or testing LFB, you first need to infer the long-term feature bank. + +Specifically, use the config file [lfb_slowonly_r50_ava_infer](/configs/detection/lfb/lfb_slowonly_r50_ava_infer.py) to run a model test once on each of the training, validation, and testing sets. + +By default, the config file infers the long-term feature bank of the training set; set `dataset_mode` to `'val'` to infer the long-term feature bank of the validation set. During inference, the shared head [LFBInferHead](/mmaction/models/heads/lfb_infer_head.py) generates the long-term feature bank. + +The float32 long-term feature bank file of the AVA training and validation sets occupies about 3.3 GB. If the long-term features are stored in half precision, the file occupies about 1.65 GB. + +You can use the following commands to infer the long-term feature banks of the AVA training and validation sets; the feature banks will be stored as `lfb_prefix_path/lfb_train.pkl` and `lfb_prefix_path/lfb_val.pkl`. + +```shell +# set `dataset_mode = 'train'` in lfb_slowonly_r50_ava_infer.py +python tools/test.py configs/detection/lfb/lfb_slowonly_r50_ava_infer.py \ + checkpoints/YOUR_BASELINE_CHECKPOINT.pth --eval mAP + +# set `dataset_mode = 'val'` in lfb_slowonly_r50_ava_infer.py +python tools/test.py configs/detection/lfb/lfb_slowonly_r50_ava_infer.py \ + checkpoints/YOUR_BASELINE_CHECKPOINT.pth --eval mAP +``` + +MMAction2 uses the [slowonly_r50_4x16x1 checkpoint](https://download.openmmlab.com/mmaction/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_20201217-40061d5f.pth) from the config file [slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb](/configs/detection/ava/slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb.py) as the pretrained backbone of the LFB model that infers the long-term feature bank. + +### b. 
Train LFB + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the LFB model on AVA with the half-precision long-term feature bank. + +```shell +python tools/train.py configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py \ + --validate --seed 0 --deterministic +``` + +For more training details, refer to the **Training setting** part in [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +### a. Infer long-term feature bank for testing LFB + +Before training or testing LFB, you first need to infer the long-term feature bank. If you have already generated the feature bank file, you can skip this step. + +This step is the same as the **Infer long-term feature bank for training LFB** part in the [Train section](#Train). + +### b. Test LFB + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the LFB model on AVA with the half-precision long-term feature bank and dump the results to a CSV file. + +```shell +python tools/test.py configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval mAP --out results.csv +``` + +For more testing details, refer to the **Test a dataset** part in [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6ba6a8fc07e52e85a8a95eeaf4873ad8061b6760 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py @@ -0,0 +1,137 @@ +_base_ = ['../_base_/models/slowonly_r50.py'] + +# model settings +lfb_prefix_path = 'data/ava/lfb_half' +max_num_sampled_feat = 5 +window_size = 60 +lfb_channels = 2048 +dataset_modes = ('train', 'val') + +model = dict( + roi_head=dict( + shared_head=dict( + type='FBOHead', + lfb_cfg=dict( + lfb_prefix_path=lfb_prefix_path, + max_num_sampled_feat=max_num_sampled_feat, + window_size=window_size, + 
lfb_channels=lfb_channels, + dataset_modes=dataset_modes, + device='gpu'), + fbo_cfg=dict(type='avg')), + bbox_head=dict(in_channels=4096))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' + 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=4, frame_interval=16), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids', 'img_key']) +] +# The testing is w/o. 
any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=4, frame_interval=16, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape', 'img_key'], + nested=True) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') + +optimizer = dict(type='SGD', lr=0.15, momentum=0.9, weight_decay=1e-05) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 + +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb' # noqa E501 
+load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6c4dc19d0a741c7737c625aa8742dd655f8c847d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py @@ -0,0 +1,137 @@ +_base_ = ['../_base_/models/slowonly_r50.py'] + +# model settings +lfb_prefix_path = 'data/ava/lfb_half' +max_num_sampled_feat = 5 +window_size = 60 +lfb_channels = 2048 +dataset_modes = ('train', 'val') + +model = dict( + roi_head=dict( + shared_head=dict( + type='FBOHead', + lfb_cfg=dict( + lfb_prefix_path=lfb_prefix_path, + max_num_sampled_feat=max_num_sampled_feat, + window_size=window_size, + lfb_channels=lfb_channels, + dataset_modes=dataset_modes, + device='gpu'), + fbo_cfg=dict(type='max')), + bbox_head=dict(in_channels=4096))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' 
+ 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=4, frame_interval=16), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids', 'img_key']) +] +# The testing is w/o. any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=4, frame_interval=16, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape', 'img_key'], + nested=True) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + 
data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') + +optimizer = dict(type='SGD', lr=0.15, momentum=0.9, weight_decay=1e-05) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 + +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb' # noqa E501 +load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..bdd90ce6e00de183ff0b7d81cb5615ced9f20b40 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py @@ -0,0 +1,147 @@ +_base_ = ['../_base_/models/slowonly_r50.py'] + +# model settings +lfb_prefix_path = 'data/ava/lfb_half' +max_num_sampled_feat = 5 +window_size = 60 +lfb_channels = 2048 +dataset_modes = ('train', 'val') + +model = dict( + roi_head=dict( + 
shared_head=dict( + type='FBOHead', + lfb_cfg=dict( + lfb_prefix_path=lfb_prefix_path, + max_num_sampled_feat=max_num_sampled_feat, + window_size=window_size, + lfb_channels=lfb_channels, + dataset_modes=dataset_modes, + device='gpu'), + fbo_cfg=dict( + type='non_local', + st_feat_channels=2048, + lt_feat_channels=lfb_channels, + latent_channels=512, + num_st_feat=1, + num_lt_feat=window_size * max_num_sampled_feat, + num_non_local_layers=2, + st_feat_dropout_ratio=0.2, + lt_feat_dropout_ratio=0.2, + pre_activate=True)), + bbox_head=dict(in_channels=2560))) + +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_train = f'{anno_root}/ava_train_v2.1.csv' +ann_file_val = f'{anno_root}/ava_val_v2.1.csv' + +exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' +exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' 
+ 'recall_93.9.pkl') +proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleAVAFrames', clip_len=4, frame_interval=16), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=256), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict( + type='ToDataContainer', + fields=[ + dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False) + ]), + dict( + type='Collect', + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], + meta_keys=['scores', 'entity_ids', 'img_key']) +] +# The testing is w/o. any cropping / flipping +val_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=4, frame_interval=16, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape', 'img_key'], + nested=True) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + 
data_prefix=data_root), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) +data['test'] = data['val'] +evaluation = dict(interval=1, save_best='mAP@0.5IOU') + +optimizer = dict(type='SGD', lr=0.15, momentum=0.9, weight_decay=1e-05) +# this lr is used for 8 gpus + +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy + +lr_config = dict( + policy='step', + step=[10, 15], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=5, + warmup_ratio=0.1) +total_epochs = 20 + +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb' # noqa E501 +load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth') +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_slowonly_r50_ava_infer.py b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_slowonly_r50_ava_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..568f0765bdcc91b7586aa22c1d9f0cdf680971e5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/lfb_slowonly_r50_ava_infer.py @@ -0,0 +1,65 @@ +# This config is used to generate long-term feature bank. 
+_base_ = ['../_base_/models/slowonly_r50.py'] + +# model settings +lfb_prefix_path = 'data/ava/lfb_half' +dataset_mode = 'train' # ['train', 'val', 'test'] + +model = dict( + roi_head=dict( + shared_head=dict( + type='LFBInferHead', + lfb_prefix_path=lfb_prefix_path, + dataset_mode=dataset_mode, + use_half_precision=True))) + +# dataset settings +dataset_type = 'AVADataset' +data_root = 'data/ava/rawframes' +anno_root = 'data/ava/annotations' + +ann_file_infer = f'{anno_root}/ava_{dataset_mode}_v2.1.csv' + +exclude_file_infer = ( + f'{anno_root}/ava_{dataset_mode}_excluded_timestamps_v2.1.csv') + +label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' + +proposal_file_infer = ( + f'{anno_root}/ava_dense_proposals_{dataset_mode}.FAIR.recall_93.9.pkl') + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +infer_pipeline = [ + dict( + type='SampleAVAFrames', clip_len=4, frame_interval=16, test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW', collapse=True), + # Rename is needed to use mmdet detectors + dict(type='Rename', mapping=dict(imgs='img')), + dict(type='ToTensor', keys=['img', 'proposals']), + dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]), + dict( + type='Collect', + keys=['img', 'proposals'], + meta_keys=['scores', 'img_shape', 'img_key'], + nested=True) +] + +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=ann_file_infer, + exclude_file=exclude_file_infer, + pipeline=infer_pipeline, + label_file=label_file, + proposal_file=proposal_file_infer, + person_det_score_thr=0.9, + data_prefix=data_root)) + +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/metafile.yml new 
file mode 100644 index 0000000000000000000000000000000000000000..90ec931e97f1613fc517fab72bb798e16cebf5d1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/detection/lfb/metafile.yml @@ -0,0 +1,70 @@ +Collections: +- Name: LFB + README: configs/detection/lfb/README.md + Paper: + URL: https://arxiv.org/abs/1812.05038 + Title: Long-Term Feature Banks for Detailed Video Understanding +Models: +- Config: configs/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py + In Collection: LFB + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 20 + Input: 4x16 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 8 GPUs + Modality: RGB + Name: lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 24.11 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210224_125052.log.json + Training Log: https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210224_125052.log + Weights: https://download.openmmlab.com/mmaction/detection/lfb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_nl_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210224-2ae136d9.pth +- Config: configs/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py + In Collection: LFB + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 20 + Input: 4x16 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 8 GPUs + Modality: RGB + Name: lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 20.17 + Task: Spatial Temporal Action Detection + Training Json Log: 
https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log.json + Training Log: https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log + Weights: https://download.openmmlab.com/mmaction/detection/lfb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_avg_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210301-19c330b7.pth +- Config: configs/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py + In Collection: LFB + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 20 + Input: 4x16 + Pretrained: Kinetics-400 + Resolution: short-side 256 + Training Data: AVA v2.1 + Training Resources: 8 GPUs + Modality: RGB + Name: lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb.py + Results: + - Dataset: AVA v2.1 + Metrics: + mAP: 22.15 + Task: Spatial Temporal Action Detection + Training Json Log: https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log.json + Training Log: https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/20210301_124812.log + Weights: https://download.openmmlab.com/mmaction/detection/lfb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb/lfb_max_kinetics_pretrained_slowonly_r50_4x16x1_20e_ava_rgb_20210301-37efcd15.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ccf07450a0538b0354b0df48e857752e72ae7816 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/README.md @@ -0,0 +1,115 @@ +# BMN + +[Bmn: Boundary-matching network for temporal action proposal 
generation](https://openaccess.thecvf.com/content_ICCV_2019/html/Lin_BMN_Boundary-Matching_Network_for_Temporal_Action_Proposal_Generation_ICCV_2019_paper.html) + + + +## Abstract + + + +Temporal action proposal generation is an challenging and promising task which aims to locate temporal regions in real-world videos where action or event may occur. Current bottom-up proposal generation methods can generate proposals with precise boundary, but cannot efficiently generate adequately reliable confidence scores for retrieving proposals. To address these difficulties, we introduce the Boundary-Matching (BM) mechanism to evaluate confidence scores of densely distributed proposals, which denote a proposal as a matching pair of starting and ending boundaries and combine all densely distributed BM pairs into the BM confidence map. Based on BM mechanism, we propose an effective, efficient and end-to-end proposal generation method, named Boundary-Matching Network (BMN), which generates proposals with precise temporal boundaries as well as reliable confidence scores simultaneously. The two-branches of BMN are jointly trained in an unified framework. We conduct experiments on two challenging datasets: THUMOS-14 and ActivityNet-1.3, where BMN shows significant performance improvement with remarkable efficiency and generalizability. Further, combining with existing action classifier, BMN can achieve state-of-the-art temporal action detection performance. + + + +
+ +
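As an illustration of the Boundary-Matching idea described in the abstract, the sketch below enumerates densely distributed proposals as (start, end) boundary pairs and lays them out in a 2-D map. The (duration, start) indexing, the helper name `bm_pairs`, and the toy sizes are assumptions made for illustration only; they are not taken from the MMAction2 implementation.

```python
def bm_pairs(num_locations, max_duration):
    """Map each (duration, start) cell to a (start, end) boundary pair.

    Hypothetical helper for illustration. Cells whose end would fall past
    the last temporal location are set to None, since no valid proposal
    exists there.
    """
    bm_map = []
    for duration in range(1, max_duration + 1):
        row = []
        for start in range(num_locations):
            end = start + duration
            row.append((start, end) if end <= num_locations else None)
        bm_map.append(row)
    return bm_map


# Toy example: 4 temporal locations, proposals up to 3 locations long.
bm_map = bm_pairs(num_locations=4, max_duration=3)
valid = [cell for row in bm_map for cell in row if cell is not None]
print(len(valid))  # 9
```

In a real model each valid cell would carry a confidence score rather than the pair itself; the sketch only shows how every start/end combination gets a fixed position in the map.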
+ +## Results and Models + +### ActivityNet feature + +| config | feature | gpus | AR@100 | AUC | AP@0.5 | AP@0.75 | AP@0.95 | mAP | gpu_mem(M) | iter time(s) | ckpt | log | json | +| :-----------------------------------------------------------------------------------------------------------: | :------------: | :--: | :----: | :---: | :----: | :-----: | :-----: | :---: | :--------: | ------------ | :----------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------: | -------------------------------------------------------------------------------------------------------------------------------------------------- | +| [bmn_400x100_9e_2x8_activitynet_feature](/configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py) | cuhk_mean_100 | 2 | 75.28 | 67.22 | 42.47 | 31.31 | 9.92 | 30.34 | 5420 | 3.27 | [ckpt](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature_20200619-42a3b111.pth) | [log](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature.log) | [json](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature.log.json) | +| | mmaction_video | 2 | 75.43 | 67.22 | 42.62 | 31.56 | 10.86 | 30.77 | 5420 | 3.27 | [ckpt](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809-c9fd14d2.pth) | [log](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809.log) | 
[json](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809.json) | +| | mmaction_clip | 2 | 75.35 | 67.38 | 43.08 | 32.19 | 10.73 | 31.15 | 5420 | 3.27 | [ckpt](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809-10d803ce.pth) | [log](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809.log) | [json](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809.json) | +| [BMN-official](https://github.com/JJBOY/BMN-Boundary-Matching-Network) (for reference)\* | cuhk_mean_100 | - | 75.27 | 67.49 | 42.22 | 30.98 | 9.22 | 30.00 | - | - | - | - | - | + +:::{note} + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. In the **feature** column, cuhk_mean_100 denotes the widely used CUHK ActivityNet features extracted by [anet2016-cuhk](https://github.com/yjxiong/anet2016-cuhk); mmaction_video and mmaction_clip denote features extracted by MMAction with a video-level or a clip-level ActivityNet-finetuned model, respectively. +3. We evaluate the action detection performance of BMN using the [anet_cuhk_2017](https://download.openmmlab.com/mmaction/localization/cuhk_anet17_pred.json) submission to the ActivityNet2017 Untrimmed Video Classification Track to assign a label to each action proposal. 
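The Linear Scaling Rule from item 1 can be sketched in Python as follows. This is a minimal illustration rather than part of MMAction2: the helper name `scale_lr` is hypothetical, and the reference point (lr=0.01 at 4 GPUs x 2 videos per GPU) is the example given in the note.

```python
def scale_lr(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Scale a learning rate in proportion to the total batch size.

    Hypothetical helper for illustration; it is not an MMAction2 API.
    """
    base_batch = base_gpus * base_videos_per_gpu
    new_batch = gpus * videos_per_gpu
    return base_lr * new_batch / base_batch


# Reference point from the note: lr=0.01 for 4 GPUs x 2 videos per GPU.
lr_16x4 = scale_lr(0.01, 4, 2, gpus=16, videos_per_gpu=4)
print(round(lr_16x4, 6))  # 0.08
```

The scaled value would then replace `lr` in the `optimizer` dict of the config when training with a different number of GPUs or a different `videos_per_gpu`.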
+ +::: + +\*We trained BMN with the [official repo](https://github.com/JJBOY/BMN-Boundary-Matching-Network) and evaluated its proposal generation and action detection performance, using [anet_cuhk_2017](https://download.openmmlab.com/mmaction/localization/cuhk_anet17_pred.json) for label assignment. + +For more details on data preparation, refer to the ActivityNet feature section in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the BMN model on the ActivityNet feature dataset. + +```shell +python tools/train.py configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py +``` + +For more details and optional arguments, refer to the **Training setting** section in [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test BMN on the ActivityNet feature dataset. + +```shell +# Note: To run evaluation, make sure the annotation file for the test data contains ground truth. +python tools/test.py configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth --eval AR@AN --out results.json +``` + +You can also test the action detection performance of the model with the [anet_cuhk_2017](https://download.openmmlab.com/mmaction/localization/cuhk_anet17_pred.json) prediction file and the generated proposal file (`results.json` in the previous command). + +```shell +python tools/analysis/report_map.py --proposal path/to/proposal_file +``` + +:::{note} + +1. (Optional) You can use the following command to generate a formatted proposal file, which will be fed into the action classifier (currently SSN and P-GCN are supported; TSN, I3D, etc. are not) to get the classification results of the proposals. 
+ + ```shell + python tools/data/activitynet/convert_proposal_format.py + ``` + +::: + +For more details and optional arguments infos, you can refer to **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset) . + +## Citation + +```BibTeX +@inproceedings{lin2019bmn, + title={Bmn: Boundary-matching network for temporal action proposal generation}, + author={Lin, Tianwei and Liu, Xiao and Li, Xin and Ding, Errui and Wen, Shilei}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision}, + pages={3889--3898}, + year={2019} +} +``` + + + +```BibTeX +@article{zhao2017cuhk, + title={Cuhk \& ethz \& siat submission to activitynet challenge 2017}, + author={Zhao, Y and Zhang, B and Wu, Z and Yang, S and Zhou, L and Yan, S and Wang, L and Xiong, Y and Lin, D and Qiao, Y and others}, + journal={arXiv preprint arXiv:1710.08011}, + volume={8}, + year={2017} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..72c4f3fe2be13ef6aa5afea253f25c726ecba6a8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/README_zh-CN.md @@ -0,0 +1,98 @@ +# BMN + +## 简介 + + + +```BibTeX +@inproceedings{lin2019bmn, + title={Bmn: Boundary-matching network for temporal action proposal generation}, + author={Lin, Tianwei and Liu, Xiao and Li, Xin and Ding, Errui and Wen, Shilei}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision}, + pages={3889--3898}, + year={2019} +} +``` + + + +```BibTeX +@article{zhao2017cuhk, + title={Cuhk \& ethz \& siat submission to activitynet challenge 2017}, + author={Zhao, Y and Zhang, B and Wu, Z and Yang, S and Zhou, L and Yan, S and Wang, L and Xiong, Y and Lin, D and Qiao, Y and others}, + journal={arXiv preprint arXiv:1710.08011}, + volume={8}, + year={2017} +} +``` + +## 模型库 + 
+### ActivityNet feature + +| 配置文件 | 特征 | GPU 数量 | AR@100 | AUC | AP@0.5 | AP@0.75 | AP@0.95 | mAP | GPU 显存占用 (M) | 推理时间 (s) | ckpt | log | json | +| :-----------------------------------------------------------------------------------------------------------: | :------------: | :------: | :----: | :---: | :----: | :-----: | :-----: | :---: | :--------------: | ------------ | :----------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------: | -------------------------------------------------------------------------------------------------------------------------------------------------- | +| [bmn_400x100_9e_2x8_activitynet_feature](/configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py) | cuhk_mean_100 | 2 | 75.28 | 67.22 | 42.47 | 31.31 | 9.92 | 30.34 | 5420 | 3.27 | [ckpt](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature_20200619-42a3b111.pth) | [log](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature.log) | [json](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature.log.json) | +| | mmaction_video | 2 | 75.43 | 67.22 | 42.62 | 31.56 | 10.86 | 30.77 | 5420 | 3.27 | [ckpt](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809-c9fd14d2.pth) | [log](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809.log) | 
[json](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809.json) | +| | mmaction_clip | 2 | 75.35 | 67.38 | 43.08 | 32.19 | 10.73 | 31.15 | 5420 | 3.27 | [ckpt](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809-10d803ce.pth) | [log](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809.log) | [json](https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809.json) | +| [BMN-official](https://github.com/JJBOY/BMN-Boundary-Matching-Network) (for reference)\* | cuhk_mean_100 | - | 75.27 | 67.49 | 42.22 | 30.98 | 9.22 | 30.00 | - | - | - | - | - | + +- 注: + +1. 这里的 **GPU 数量** 指的是得到模型权重文件对应的 GPU 个数。默认地,MMAction2 所提供的配置文件对应使用 8 块 GPU 进行训练的情况。 + 依据 [线性缩放规则](https://arxiv.org/abs/1706.02677),当用户使用不同数量的 GPU 或者每块 GPU 处理不同视频个数时,需要根据批大小等比例地调节学习率。 + 如,lr=0.01 对应 4 GPUs x 2 video/gpu,以及 lr=0.08 对应 16 GPUs x 4 video/gpu。 +2. 对于 **特征** 这一列,`cuhk_mean_100` 表示所使用的特征为利用 [anet2016-cuhk](https://github.com/yjxiong/anet2016-cuhk) 代码库抽取的,被广泛利用的 CUHK ActivityNet 特征, + `mmaction_video` 和 `mmaction_clip` 分布表示所使用的特征为利用 MMAction 抽取的,视频级别 ActivityNet 预训练模型的特征;视频片段级别 ActivityNet 预训练模型的特征。 +3. 
MMAction2 使用 ActivityNet2017 未剪辑视频分类赛道上 [anet_cuhk_2017](https://download.openmmlab.com/mmaction/localization/cuhk_anet17_pred.json) 所提交的结果来为每个视频的时序动作候选指定标签,以用于 BMN 模型评估。 + +\*MMAction2 在 [原始代码库](https://github.com/JJBOY/BMN-Boundary-Matching-Network) 上训练 BMN,并且在 [anet_cuhk_2017](https://download.openmmlab.com/mmaction/localization/cuhk_anet17_pred.json) 的对应标签上评估时序动作候选生成和时序检测的结果。 + +对于数据集准备的细节,用户可参考 [数据集准备文档](/docs_zh_CN/data_preparation.md) 中的 ActivityNet 特征部分。 + +## 如何训练 + +用户可以使用以下指令进行模型训练。 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +例如:在 ActivityNet 特征上训练 BMN。 + +```shell +python tools/train.py configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py +``` + +更多训练细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) 中的 **训练配置** 部分。 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 ActivityNet 特征上测试 BMN 模型。 + +```shell +# 注:如果需要进行指标验证,需确测试数据的保标注文件包含真实标签 +python tools/test.py configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth --eval AR@AN --out results.json +``` + +用户也可以利用 [anet_cuhk_2017](https://download.openmmlab.com/mmaction/localization/cuhk_anet17_pred.json) 的预测文件评估模型时序检测的结果,并生成时序动作候选文件(即命令中的 `results.json`) + +```shell +python tools/analysis/report_map.py --proposal path/to/proposal_file +``` + +注: + +1. 
(可选项) 用户可以使用以下指令生成格式化的时序动作候选文件,该文件可被送入动作识别器中(目前只支持 SSN 和 P-GCN,不包括 TSN, I3D 等),以获得时序动作候选的分类结果。 + + ```shell + python tools/data/activitynet/convert_proposal_format.py + ``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 **测试某个数据集** 部分。 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py b/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py new file mode 100644 index 0000000000000000000000000000000000000000..6e27661f7757fc06798b1993d157b759f1fbef52 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py @@ -0,0 +1,88 @@ +_base_ = [ + '../../_base_/models/bmn_400x100.py', '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'ActivityNetDataset' +data_root = 'data/ActivityNet/activitynet_feature_cuhk/csv_mean_100/' +data_root_val = 'data/ActivityNet/activitynet_feature_cuhk/csv_mean_100/' +ann_file_train = 'data/ActivityNet/anet_anno_train.json' +ann_file_val = 'data/ActivityNet/anet_anno_val.json' +ann_file_test = 'data/ActivityNet/anet_anno_val.json' + +test_pipeline = [ + dict(type='LoadLocalizationFeature'), + dict( + type='Collect', + keys=['raw_feature'], + meta_name='video_meta', + meta_keys=[ + 'video_name', 'duration_second', 'duration_frame', 'annotations', + 'feature_frame' + ]), + dict(type='ToTensor', keys=['raw_feature']), +] +train_pipeline = [ + dict(type='LoadLocalizationFeature'), + dict(type='GenerateLocalizationLabels'), + dict( + type='Collect', + keys=['raw_feature', 'gt_bbox'], + meta_name='video_meta', + meta_keys=['video_name']), + dict(type='ToTensor', keys=['raw_feature', 'gt_bbox']), + dict( + type='ToDataContainer', + fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) +] +val_pipeline = [ + dict(type='LoadLocalizationFeature'), + dict(type='GenerateLocalizationLabels'), + 
dict( + type='Collect', + keys=['raw_feature', 'gt_bbox'], + meta_name='video_meta', + meta_keys=[ + 'video_name', 'duration_second', 'duration_frame', 'annotations', + 'feature_frame' + ]), + dict(type='ToTensor', keys=['raw_feature', 'gt_bbox']), + dict( + type='ToDataContainer', + fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=8, + train_dataloader=dict(drop_last=True), + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + pipeline=test_pipeline, + data_prefix=data_root_val), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + pipeline=val_pipeline, + data_prefix=data_root_val), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + pipeline=train_pipeline, + data_prefix=data_root)) +evaluation = dict(interval=1, metrics=['AR@AN']) + +# optimizer +optimizer = dict( + type='Adam', lr=0.001, weight_decay=0.0001) # this lr is used for 2 gpus +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=7) +total_epochs = 9 + +# runtime settings +log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) +work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/' +output_config = dict(out=f'{work_dir}/results.json', output_format='json') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..40eafd4f94b626ff60fe0866cf89756208b826b6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bmn/metafile.yml @@ -0,0 +1,73 @@ +Collections: +- Name: BMN + README: configs/localization/bmn/README.md + Paper: + URL: https://arxiv.org/abs/1907.09702 + Title: "BMN: Boundary-Matching Network for Temporal Action Proposal Generation" +Models: +- Config: 
configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py + In Collection: BMN + Metadata: + Batch Size: 8 + Epochs: 9 + Training Data: ActivityNet v1.3 + Training Resources: 2 GPUs + feature: cuhk_mean_100 + Name: bmn_400x100_9e_2x8_activitynet_feature (cuhk_mean_100) + Results: + - Dataset: ActivityNet v1.3 + Metrics: + AP@0.5: 42.47 + AP@0.75: 31.31 + AP@0.95: 9.92 + AR@100: 75.28 + AUC: 67.22 + mAP: 30.34 + Task: Temporal Action Localization + Training Json Log: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature.log.json + Training Log: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature.log + Weights: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_9e_activitynet_feature/bmn_400x100_9e_activitynet_feature_20200619-42a3b111.pth +- Config: configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py + In Collection: BMN + Metadata: + Batch Size: 8 + Epochs: 9 + Training Data: ActivityNet v1.3 + Training Resources: 2 GPUs + feature: mmaction_video + Name: bmn_400x100_9e_2x8_activitynet_feature (mmaction_video) + Results: + - Dataset: ActivityNet v1.3 + Metrics: + AP@0.5: 42.62 + AP@0.75: 31.56 + AP@0.95: 10.86 + AR@100: 75.43 + AUC: 67.22 + mAP: 30.77 + Task: Temporal Action Localization + Training Json Log: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809.json + Training Log: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809.log + Weights: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_video/bmn_400x100_2x8_9e_mmaction_video_20200809-c9fd14d2.pth +- Config: configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py + In Collection: BMN + Metadata: + Batch Size: 8 + 
Epochs: 9 + Training Data: ActivityNet v1.3 + Training Resources: 2 GPUs + feature: mmaction_clip + Name: bmn_400x100_9e_2x8_activitynet_feature (mmaction_clip) + Results: + - Dataset: ActivityNet v1.3 + Metrics: + AP@0.5: 43.08 + AP@0.75: 32.19 + AP@0.95: 10.73 + AR@100: 75.35 + AUC: 67.38 + mAP: 31.15 + Task: Temporal Action Localization + Training Json Log: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809.json + Training Log: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809.log + Weights: https://download.openmmlab.com/mmaction/localization/bmn/bmn_400x100_2x8_9e_mmaction_clip/bmn_400x100_2x8_9e_mmaction_clip_20200809-10d803ce.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..c307cb15d83b441d4d0f85c54ceae8e43cd68928 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/README.md @@ -0,0 +1,173 @@ +# BSN + +[Bsn: Boundary sensitive network for temporal action proposal generation](https://openaccess.thecvf.com/content_ECCV_2018/html/Tianwei_Lin_BSN_Boundary_Sensitive_ECCV_2018_paper.html) + + + +## Abstract + + + +Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos with long duration and high proportion irrelevant content. This problem requires methods not only generating proposals with precise temporal boundaries, but also retrieving proposals to cover truth action instances with high recall and high overlap using relatively fewer proposals. 
To address these difficulties, we introduce an effective proposal generation method, named Boundary-Sensitive Network (BSN), which adopts "local to global" fashion. Locally, BSN first locates temporal boundaries with high probabilities, then directly combines these boundaries as proposals. Globally, with Boundary-Sensitive Proposal feature, BSN retrieves proposals by evaluating the confidence of whether a proposal contains an action within its region. We conduct experiments on two challenging datasets: ActivityNet-1.3 and THUMOS14, where BSN outperforms other state-of-the-art temporal action proposal generation methods with high recall and high temporal precision. Finally, further experiments demonstrate that by combining existing action classifiers, our method significantly improves the state-of-the-art temporal action detection performance. + + + +
+ +
+ +## Results and Models + +### ActivityNet feature + +| config | feature | gpus | pretrain | AR@100 | AUC | gpu_mem(M) | iter time(s) | ckpt | log | json | +| :--------------------------------------- | :------------: | :--: | :------: | :----: | :---: | :-------------: | :-------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| bsn_400x100_1x16_20e_activitynet_feature | cuhk_mean_100 | 1 | None | 74.66 | 66.45 | 41(TEM)+25(PEM) | 0.074(TEM)+0.036(PEM) | [ckpt_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature_20200619-cd6accc3.pth) [ckpt_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature_20210203-1c27763d.pth) | [log_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature.log) 
[log_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature.log) | [json_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature.log.json) [json_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature.log.json) | +| | mmaction_video | 1 | None | 74.93 | 66.74 | 41(TEM)+25(PEM) | 0.074(TEM)+0.036(PEM) | [ckpt_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809-ad6ec626.pth) [ckpt_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809-aa861b26.pth) | [log_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809.log) [log_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809.log) | [json_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809.json) [json_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809.json) | +| | mmaction_clip | 1 | None | 75.19 | 66.81 | 41(TEM)+25(PEM) | 0.074(TEM)+0.036(PEM) | [ckpt_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809-0a563554.pth) 
[ckpt_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809-e32f61e6.pth) | [log_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809.log) [log_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809.log) | [json_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809.json) [json_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809.json) | + +:::{note} + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. In the **feature** column, cuhk_mean_100 denotes the widely used CUHK ActivityNet feature extracted by [anet2016-cuhk](https://github.com/yjxiong/anet2016-cuhk); mmaction_video and mmaction_clip denote features extracted by MMAction with a video-level or a clip-level ActivityNet-finetuned model, respectively. + +::: + +For more details on data preparation, refer to the ActivityNet feature section in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Examples: + +1. Train BSN (TEM) on the ActivityNet feature dataset. + + ```shell + python tools/train.py configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py + ``` + +2. Train BSN (PEM) on PGM results.
+ + ```shell + python tools/train.py configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py + ``` + +For more details and optional arguments, refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +## Inference + +You can use the following commands to run inference with a model. + +1. TEM inference + + ```shell + # Note: TEM results cannot be evaluated. + python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] + ``` + +2. PGM inference + + ```shell + python tools/misc/bsn_proposal_generation.py ${CONFIG_FILE} [--mode ${MODE}] + ``` + +3. PEM inference + + ```shell + python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] + ``` + +Examples: + +1. Run BSN (TEM) inference with a pretrained model. + + ```shell + python tools/test.py configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth + ``` + +2. Run BSN (PGM) inference with a pretrained model. + + ```shell + python tools/misc/bsn_proposal_generation.py configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py --mode train + ``` + +3. Run BSN (PEM) inference with the evaluation metric 'AR@AN' and output the results. + + ```shell + # Note: To evaluate, make sure the annotation file for the test data contains ground truth. + python tools/test.py configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth --eval AR@AN --out results.json + ``` + +## Test + +You can use the following commands to test a model. + +1. TEM + + ```shell + # Note: TEM results cannot be evaluated. + python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] + ``` + +2. PGM + + ```shell + python tools/misc/bsn_proposal_generation.py ${CONFIG_FILE} [--mode ${MODE}] + ``` + +3. PEM + + ```shell + python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] + ``` + +Examples: + +1. Test a TEM model on the ActivityNet dataset.
+ + ```shell + python tools/test.py configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth + ``` + +2. Test a PGM model on the ActivityNet dataset. + + ```shell + python tools/misc/bsn_proposal_generation.py configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py --mode test + ``` + +3. Test a PEM model with the evaluation metric 'AR@AN' and output the results. + + ```shell + python tools/test.py configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth --eval AR@AN --out results.json + ``` + +:::{note} + +1. (Optional) You can use the following command to generate a formatted proposal file, which can be fed into an action classifier (currently only SSN and P-GCN are supported, not TSN, I3D, etc.) to get the classification results of the proposals. + + ```shell + python tools/data/activitynet/convert_proposal_format.py + ``` + +::: + +For more details and optional arguments, refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset).
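The three test stages above run in a fixed order: TEM inference comes first, PGM then generates proposals, and PEM scores them. As a minimal sketch of chaining them, the helper below only assembles the command lines already shown in this README. It is not part of MMAction2: the function name `build_bsn_test_commands` and the two checkpoint paths are hypothetical placeholders chosen for illustration.

```python
import shlex

# Config directory and file names are taken from the commands in this README.
BSN_DIR = "configs/localization/bsn"


def build_bsn_test_commands(tem_ckpt, pem_ckpt, mode="test", out="results.json"):
    """Return the TEM, PGM and PEM test commands, in that order.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    tem_cfg = f"{BSN_DIR}/bsn_tem_400x100_1x16_20e_activitynet_feature.py"
    pgm_cfg = f"{BSN_DIR}/bsn_pgm_400x100_activitynet_feature.py"
    pem_cfg = f"{BSN_DIR}/bsn_pem_400x100_1x16_20e_activitynet_feature.py"
    return [
        # 1. TEM inference (no evaluation metric at this stage).
        ["python", "tools/test.py", tem_cfg, tem_ckpt],
        # 2. PGM proposal generation.
        ["python", "tools/misc/bsn_proposal_generation.py", pgm_cfg, "--mode", mode],
        # 3. PEM test with the AR@AN metric, writing the results file.
        ["python", "tools/test.py", pem_cfg, pem_ckpt, "--eval", "AR@AN", "--out", out],
    ]


if __name__ == "__main__":
    # Placeholder checkpoint paths; substitute real TEM and PEM checkpoints.
    for cmd in build_bsn_test_commands(
        "checkpoints/TEM_CHECKPOINT.pth", "checkpoints/PEM_CHECKPOINT.pth"
    ):
        print(shlex.join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` from the repository root to execute the stages in sequence; keeping the commands as data makes it easy to print and inspect them before running anything.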
+ +## Citation + +```BibTeX +@inproceedings{lin2018bsn, + title={Bsn: Boundary sensitive network for temporal action proposal generation}, + author={Lin, Tianwei and Zhao, Xu and Su, Haisheng and Wang, Chongjing and Yang, Ming}, + booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, + pages={3--19}, + year={2018} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..14e6251a75dfbb6175f9249f54296c3626beb032 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/README_zh-CN.md @@ -0,0 +1,156 @@ +# BSN + +## 简介 + + + +```BibTeX +@inproceedings{lin2018bsn, + title={Bsn: Boundary sensitive network for temporal action proposal generation}, + author={Lin, Tianwei and Zhao, Xu and Su, Haisheng and Wang, Chongjing and Yang, Ming}, + booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, + pages={3--19}, + year={2018} +} +``` + +## 模型库 + +### ActivityNet feature + +| 配置文件 | 特征 | GPU 数量 | 预训练 | AR@100 | AUC | GPU 显存占用 (M) | 迭代时间 (s) | ckpt | log | json | +| :--------------------------------------- | :------------: | :------: | :----: | :----: | :---: | :--------------: | :-------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| bsn_400x100_1x16_20e_activitynet_feature | cuhk_mean_100 | 1 | None | 74.66 | 66.45 | 41(TEM)+25(PEM) | 0.074(TEM)+0.036(PEM) | [ckpt_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature_20200619-cd6accc3.pth) [ckpt_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature_20210203-1c27763d.pth) | [log_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature.log) [log_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature.log) | [json_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature.log.json) [json_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature.log.json) | +| | mmaction_video | 1 | None | 74.93 | 66.74 | 41(TEM)+25(PEM) | 0.074(TEM)+0.036(PEM) | 
[ckpt_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809-ad6ec626.pth) [ckpt_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809-aa861b26.pth) | [log_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809.log) [log_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809.log) | [json_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809.json) [json_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809.json) | +| | mmaction_clip | 1 | None | 75.19 | 66.81 | 41(TEM)+25(PEM) | 0.074(TEM)+0.036(PEM) | [ckpt_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809-0a563554.pth) [ckpt_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809-e32f61e6.pth) | [log_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809.log) [log_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809.log) | [json_tem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809.json) 
[json_pem](https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809.json) | + +Notes: + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 correspond to training with 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when you use a different number of GPUs or a different number of videos per GPU, the learning rate should be scaled in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. In the **feature** column, `cuhk_mean_100` denotes the widely used CUHK ActivityNet feature extracted with the [anet2016-cuhk](https://github.com/yjxiong/anet2016-cuhk) codebase; + `mmaction_video` and `mmaction_clip` denote features extracted with MMAction using a video-level and a clip-level ActivityNet pretrained model, respectively. + +For details on dataset preparation, refer to the ActivityNet feature section in [Data Preparation](/docs_zh_CN/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Examples: + +1. Train the BSN (TEM) model on ActivityNet features. + + ```shell + python tools/train.py configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py + ``` + +2. Train BSN (PEM) on PGM results. + + ```shell + python tools/train.py configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py + ``` + +For more training details, refer to the **Training setting** part in [Getting Started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Inference + +You can use the following commands to run inference with a model. + +1. TEM inference + + ```shell + # Note: TEM results cannot be evaluated. + python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] + ``` + +2. PGM inference + + ```shell + python tools/misc/bsn_proposal_generation.py ${CONFIG_FILE} [--mode ${MODE}] + ``` + +3. PEM inference + + ```shell + python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] + ``` + +Examples: + +1. Run BSN (TEM) inference with a pretrained model. + + ```shell + python tools/test.py configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth + ``` + +2.
Run BSN (PGM) inference with a pretrained model. + + ```shell + python tools/misc/bsn_proposal_generation.py configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py --mode train + ``` + +3. Run BSN (PEM) inference, compute the 'AR@AN' metric, and output the results file. + + ```shell + # Note: To evaluate, make sure the annotation file for the test data contains ground truth. + python tools/test.py configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth --eval AR@AN --out results.json + ``` + +## Test + +You can use the following commands to test a model. + +1. TEM + + ```shell + # Note: This command cannot compute evaluation metrics. + python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] + ``` + +2. PGM + + ```shell + python tools/misc/bsn_proposal_generation.py ${CONFIG_FILE} [--mode ${MODE}] + ``` + +3. PEM + + ```shell + python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] + ``` + +Examples: + +1. Test a TEM model on the ActivityNet dataset. + + ```shell + python tools/test.py configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth + ``` + +2. Test a PGM model on the ActivityNet dataset. + + ```shell + python tools/misc/bsn_proposal_generation.py configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py --mode test + ``` + +3. Test a PEM model, compute the 'AR@AN' metric, and output the results file. + + ```shell + python tools/test.py configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py checkpoints/SOME_CHECKPOINT.pth --eval AR@AN --out results.json + ``` + +Notes: + +1.
(Optional) You can use the following command to generate a formatted proposal file, which can be fed into an action classifier (currently only SSN and P-GCN are supported, not TSN, I3D, etc.) to get the classification results of the proposals. + + ```shell + python tools/data/activitynet/convert_proposal_format.py + ``` + +For more testing details, refer to the **Test a dataset** part in [Getting Started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py new file mode 100644 index 0000000000000000000000000000000000000000..429d2284024d4146a8143c9358838728661e6ae2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py @@ -0,0 +1,95 @@ +_base_ = [ + '../../_base_/models/bsn_pem.py', '../../_base_/schedules/adam_20e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'ActivityNetDataset' +data_root = 'data/ActivityNet/activitynet_feature_cuhk/csv_mean_100/' +data_root_val = 'data/ActivityNet/activitynet_feature_cuhk/csv_mean_100/' +ann_file_train = 'data/ActivityNet/anet_anno_train.json' +ann_file_val = 'data/ActivityNet/anet_anno_val.json' +ann_file_test = 'data/ActivityNet/anet_anno_val.json' + +work_dir = 'work_dirs/bsn_400x100_20e_1x16_activitynet_feature/' +pgm_proposals_dir = f'{work_dir}/pgm_proposals/' +pgm_features_dir = f'{work_dir}/pgm_features/' + +test_pipeline = [ + dict( + type='LoadProposals', + top_k=1000, + pgm_proposals_dir=pgm_proposals_dir, + pgm_features_dir=pgm_features_dir), + dict( + type='Collect', + keys=['bsp_feature', 'tmin', 'tmax', 'tmin_score', 'tmax_score'], + meta_name='video_meta', + meta_keys=[ + 'video_name', 'duration_second', 'duration_frame', 'annotations', + 'feature_frame' + ]), + dict(type='ToTensor', keys=['bsp_feature']) +] + +train_pipeline = [ + dict( + type='LoadProposals', + top_k=500, + pgm_proposals_dir=pgm_proposals_dir, +
pgm_features_dir=pgm_features_dir), + dict( + type='Collect', + keys=['bsp_feature', 'reference_temporal_iou'], + meta_name='video_meta', + meta_keys=[]), + dict(type='ToTensor', keys=['bsp_feature', 'reference_temporal_iou']), + dict( + type='ToDataContainer', + fields=(dict(key='bsp_feature', stack=False), + dict(key='reference_temporal_iou', stack=False))) +] + +val_pipeline = [ + dict( + type='LoadProposals', + top_k=1000, + pgm_proposals_dir=pgm_proposals_dir, + pgm_features_dir=pgm_features_dir), + dict( + type='Collect', + keys=['bsp_feature', 'tmin', 'tmax', 'tmin_score', 'tmax_score'], + meta_name='video_meta', + meta_keys=[ + 'video_name', 'duration_second', 'duration_frame', 'annotations', + 'feature_frame' + ]), + dict(type='ToTensor', keys=['bsp_feature']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=8, + train_dataloader=dict(drop_last=True), + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + pipeline=test_pipeline, + data_prefix=data_root_val), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + pipeline=val_pipeline, + data_prefix=data_root_val), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + pipeline=train_pipeline, + data_prefix=data_root)) +evaluation = dict(interval=1, metrics=['AR@AN']) + +# runtime settings +checkpoint_config = dict(interval=1, filename_tmpl='pem_epoch_{}.pth') +log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) +output_config = dict(out=f'{work_dir}/results.json', output_format='json') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py new file mode 100644 index 0000000000000000000000000000000000000000..2c5f7a0339df48493c7df6e353ba3324de08818c --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py @@ -0,0 +1,32 @@ +# dataset settings +dataset_type = 'ActivityNetDataset' +data_root = 'data/ActivityNet/activitynet_feature_cuhk/csv_mean_100/' +data_root_val = 'data/ActivityNet/activitynet_feature_cuhk/csv_mean_100/' +ann_file_train = 'data/ActivityNet/anet_anno_train.json' +ann_file_val = 'data/ActivityNet/anet_anno_val.json' +ann_file_test = 'data/ActivityNet/anet_anno_test.json' + +work_dir = 'work_dirs/bsn_400x100_20e_1x16_activitynet_feature/' +tem_results_dir = f'{work_dir}/tem_results/' +pgm_proposals_dir = f'{work_dir}/pgm_proposals/' +pgm_features_dir = f'{work_dir}/pgm_features/' + +temporal_scale = 100 +pgm_proposals_cfg = dict( + pgm_proposals_thread=8, temporal_scale=temporal_scale, peak_threshold=0.5) +pgm_features_test_cfg = dict( + pgm_features_thread=4, + top_k=1000, + num_sample_start=8, + num_sample_end=8, + num_sample_action=16, + num_sample_interp=3, + bsp_boundary_ratio=0.2) +pgm_features_train_cfg = dict( + pgm_features_thread=4, + top_k=500, + num_sample_start=8, + num_sample_end=8, + num_sample_action=16, + num_sample_interp=3, + bsp_boundary_ratio=0.2) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py new file mode 100644 index 0000000000000000000000000000000000000000..60093cf4188f30d2865aa825ba0985de815563a4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py @@ -0,0 +1,79 @@ +_base_ = ['../../_base_/models/bsn_tem.py', '../../_base_/default_runtime.py'] + +# dataset settings +dataset_type = 'ActivityNetDataset' +data_root = 'data/ActivityNet/activitynet_feature_cuhk/csv_mean_100/' +data_root_val = 'data/ActivityNet/activitynet_feature_cuhk/csv_mean_100/' +ann_file_train = 
'data/ActivityNet/anet_anno_train.json' +ann_file_val = 'data/ActivityNet/anet_anno_val.json' +ann_file_test = 'data/ActivityNet/anet_anno_full.json' + +test_pipeline = [ + dict(type='LoadLocalizationFeature'), + dict( + type='Collect', + keys=['raw_feature'], + meta_name='video_meta', + meta_keys=['video_name']), + dict(type='ToTensor', keys=['raw_feature']) +] +train_pipeline = [ + dict(type='LoadLocalizationFeature'), + dict(type='GenerateLocalizationLabels'), + dict( + type='Collect', + keys=['raw_feature', 'gt_bbox'], + meta_name='video_meta', + meta_keys=['video_name']), + dict(type='ToTensor', keys=['raw_feature', 'gt_bbox']), + dict(type='ToDataContainer', fields=[dict(key='gt_bbox', stack=False)]) +] +val_pipeline = [ + dict(type='LoadLocalizationFeature'), + dict(type='GenerateLocalizationLabels'), + dict( + type='Collect', + keys=['raw_feature', 'gt_bbox'], + meta_name='video_meta', + meta_keys=['video_name']), + dict(type='ToTensor', keys=['raw_feature', 'gt_bbox']), + dict(type='ToDataContainer', fields=[dict(key='gt_bbox', stack=False)]) +] + +data = dict( + videos_per_gpu=16, + workers_per_gpu=8, + train_dataloader=dict(drop_last=True), + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + pipeline=test_pipeline, + data_prefix=data_root_val), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + pipeline=val_pipeline, + data_prefix=data_root_val), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + pipeline=train_pipeline, + data_prefix=data_root)) + +# optimizer +optimizer = dict( + type='Adam', lr=0.001, weight_decay=0.0001) # this lr is used for 1 gpus +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=7) +total_epochs = 20 + +# runtime settings +checkpoint_config = dict(interval=1, filename_tmpl='tem_epoch_{}.pth') +log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) +workflow 
= [('train', 1), ('val', 1)] +work_dir = 'work_dirs/bsn_400x100_20e_1x16_activitynet_feature/' +tem_results_dir = f'{work_dir}/tem_results/' +output_config = dict(out=tem_results_dir, output_format='csv') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..e1bddeb9cda1ea0397bbc13abc6c02db096746fa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/bsn/metafile.yml @@ -0,0 +1,85 @@ +Collections: +- Name: BSN + README: configs/localization/bsn/README.md + Paper: + URL: https://arxiv.org/abs/1806.02964 + Title: "BSN: Boundary Sensitive Network for Temporal Action Proposal Generation" +Models: +- Config: + - configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py + - configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py + - configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py + In Collection: BSN + Metadata: + Pretrained: None + Training Data: ActivityNet v1.3 + Training Resources: 1 GPUs + feature: cuhk_mean_100 + Name: bsn_400x100_1x16_20e_activitynet_feature (cuhk_mean_100) + Results: + - Dataset: ActivityNet v1.3 + Metrics: + AR@100: 74.66 + AUC: 66.45 + Task: Temporal Action Localization + Training Json Log: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature.log.json + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature.log.json + Training Log: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature.log + - 
https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature.log + Weights: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature/bsn_tem_400x100_1x16_20e_activitynet_feature_20200619-cd6accc3.pth + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature/bsn_pem_400x100_1x16_20e_activitynet_feature_20210203-1c27763d.pth +- Config: + - configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py + - configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py + - configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py + In Collection: BSN + Metadata: + Pretrained: None + Training Data: ActivityNet v1.3 + Training Resources: 1 GPUs + feature: mmaction_video + Name: bsn_400x100_1x16_20e_activitynet_feature (mmaction_video) + Results: + - Dataset: ActivityNet v1.3 + Metrics: + AR@100: 74.93 + AUC: 66.74 + Task: Temporal Action Localization + Training Json Log: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809.json + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809.json + Training Log: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809.log + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809.log + Weights: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_video/bsn_tem_400x100_1x16_20e_mmaction_video_20200809-ad6ec626.pth + - 
https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_video/bsn_pem_400x100_1x16_20e_mmaction_video_20200809-aa861b26.pth +- Config: + - configs/localization/bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py + - configs/localization/bsn/bsn_pgm_400x100_activitynet_feature.py + - configs/localization/bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py + In Collection: BSN + Metadata: + Pretrained: None + Training Data: ActivityNet v1.3 + Training Resources: 1 GPUs + feature: mmaction_clip + Name: bsn_400x100_1x16_20e_activitynet_feature (mmaction_clip) + Results: + - Dataset: ActivityNet v1.3 + Metrics: + AR@100: 75.19 + AUC: 66.81 + Task: Temporal Action Localization + Training Json Log: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809.json + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809.json + Training Log: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809.log + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809.log + Weights: + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_tem_400x100_1x16_20e_mmaction_clip/bsn_tem_400x100_1x16_20e_mmaction_clip_20200809-0a563554.pth + - https://download.openmmlab.com/mmaction/localization/bsn/bsn_pem_400x100_1x16_20e_mmaction_clip/bsn_pem_400x100_1x16_20e_mmaction_clip_20200809-e32f61e6.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7eb73213c47718d057c4d705412b9af71a733cea --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/README.md @@ -0,0 +1,79 @@ +# SSN + +[Temporal Action Detection With Structured Segment Networks](https://openaccess.thecvf.com/content_iccv_2017/html/Zhao_Temporal_Action_Detection_ICCV_2017_paper.html) + + + +## Abstract + + + +Detecting actions in untrimmed videos is an important yet challenging task. In this paper, we present the structured segment network (SSN), a novel framework which models the temporal structure of each action instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, respectively for classifying actions and determining completeness. This allows the framework to effectively distinguish positive proposals from background or incomplete ones, thus leading to both accurate recognition and localization. These components are integrated into a unified network that can be efficiently trained in an end-to-end fashion. Additionally, a simple yet effective temporal action proposal scheme, dubbed temporal actionness grouping (TAG) is devised to generate high quality action proposals. On two challenging benchmarks, THUMOS14 and ActivityNet, our method remarkably outperforms previous state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling actions with various temporal structures. + + + +
+ +
+ +## Results and Models + +| config | gpus | backbone | pretrain | mAP@0.3 | mAP@0.4 | mAP@0.5 | reference mAP@0.3 | reference mAP@0.4 | reference mAP@0.5 | gpu_mem(M) | ckpt | log | json | reference ckpt | reference json | +| :---------------------------------------------------------------------------------------: | :--: | :------: | :------: | :-----: | :-----: | :-----: | :---------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------: | :--------: | :----------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------: | ------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | +| [ssn_r50_450e_thumos14_rgb](/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py) | 8 | ResNet50 | ImageNet | 29.37 | 22.15 | 15.69 | [27.61](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started) | [21.28](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started) | [14.57](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started) | 6352 | 
[ckpt](https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/ssn_r50_450e_thumos14_rgb_20201012-1920ab16.pth) | [log](https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/20201005_144656.log) | [json](https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/20201005_144656.log.json) | [ckpt](https://download.openmmlab.com/mmaction/localization/ssn/mmaction_reference/ssn_r50_450e_thumos14_rgb_ref/ssn_r50_450e_thumos14_rgb_ref_20201014-b6f48f68.pth) | [json](https://download.openmmlab.com/mmaction/localization/ssn/mmaction_reference/ssn_r50_450e_thumos14_rgb_ref/20201008_103258.log.json) | + +:::{note} + +1. The **gpus** column indicates the number of GPUs we used to get the checkpoint. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. Since SSN uses different structured temporal pyramid pooling methods for training and testing, please use [ssn_r50_450e_thumos14_rgb_train](/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py) for training and [ssn_r50_450e_thumos14_rgb_test](/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_test.py) for testing. +3. We evaluate the action detection performance of SSN using TAG action proposals. For more details on data preparation, you can refer to the thumos14 TAG proposals in [Data Preparation](/docs/data_preparation.md). +4. The reference SSN is evaluated in MMAction with a `ResNet50` backbone, the same backbone as ours. Note that the original setting of MMAction SSN uses the `BNInception` backbone. + +::: + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the SSN model on the thumos14 dataset. 
+ +```shell +python tools/train.py configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py +``` + +For more details and optional arguments, you can refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the SSN model on the thumos14 dataset. + +```shell +# Note: when evaluating, make sure the annotation file for the test data contains ground truth. +python tools/test.py configs/localization/ssn/ssn_r50_450e_thumos14_rgb_test.py checkpoints/SOME_CHECKPOINT.pth --eval mAP +``` + +For more details and optional arguments, you can refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset). + +## Citation + +```BibTeX +@InProceedings{Zhao_2017_ICCV, +author = {Zhao, Yue and Xiong, Yuanjun and Wang, Limin and Wu, Zhirong and Tang, Xiaoou and Lin, Dahua}, +title = {Temporal Action Detection With Structured Segment Networks}, +booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)}, +month = {Oct}, +year = {2017} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..62ccc2caed23c3ee6118068ed8966012f403170e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/README_zh-CN.md @@ -0,0 +1,63 @@ +# SSN + +## Introduction + + + +```BibTeX +@InProceedings{Zhao_2017_ICCV, +author = {Zhao, Yue and Xiong, Yuanjun and Wang, Limin and Wu, Zhirong and Tang, Xiaoou and Lin, Dahua}, +title = {Temporal Action Detection With Structured Segment Networks}, +booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)}, +month = {Oct}, +year = {2017} +} +``` + +## Model Zoo + 
+| config | gpus | backbone | pretrain | mAP@0.3 | mAP@0.4 | mAP@0.5 | reference mAP@0.3 | reference mAP@0.4 | reference mAP@0.5 | gpu_mem(M) | ckpt | log | json | reference ckpt | reference json | +| :---------------------------------------------------------------------------------------: | :------: | :------: | :------: | :-----: | :-----: | :-----: | :---------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------: | :--------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------: | ------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | +| [ssn_r50_450e_thumos14_rgb](/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py) | 8 | ResNet50 | ImageNet | 29.37 | 22.15 | 15.69 | [27.61](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started) | [21.28](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started) | [14.57](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started) | 6352 | [ckpt](https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/ssn_r50_450e_thumos14_rgb_20201012-1920ab16.pth) | 
[log](https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/20201005_144656.log) | [json](https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/20201005_144656.log.json) | [ckpt](https://download.openmmlab.com/mmaction/localization/ssn/mmaction_reference/ssn_r50_450e_thumos14_rgb_ref/ssn_r50_450e_thumos14_rgb_ref_20201014-b6f48f68.pth) | [json](https://download.openmmlab.com/mmaction/localization/ssn/mmaction_reference/ssn_r50_450e_thumos14_rgb_ref/20201008_103258.log.json) | + +Notes: + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 are meant for training with 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when you use a different number of GPUs or a different number of videos per GPU, you need to scale the learning rate in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. Since SSN uses different structured temporal pyramid pooling methods for training and testing, please refer to [ssn_r50_450e_thumos14_rgb_train](/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py) and [ssn_r50_450e_thumos14_rgb_test](/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_test.py) respectively. +3. MMAction2 evaluates the accuracy of the SSN model using TAG temporal action proposals. For more details on data preparation, you can refer to [Data Preparation](/docs_zh_CN/data_preparation.md) to prepare the TAG proposals for thumos14. +4. 
The reference SSN model is evaluated with the same `ResNet50` backbone as in MMAction2. Note that the original SSN setting in the reference codebase uses the `BNInception` backbone. + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the SSN model on the thumos14 dataset. + +```shell +python tools/train.py configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py +``` + +For more training details, you can refer to the **Training setting** part in [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the SSN model on the thumos14 dataset. + +```shell +# Note: when evaluating, make sure the annotation file for the test data contains ground truth. +python tools/test.py configs/localization/ssn/ssn_r50_450e_thumos14_rgb_test.py checkpoints/SOME_CHECKPOINT.pth --eval mAP +``` + +For more testing details, you can refer to the **Test a dataset** part in [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..d2b588009d723e170fd5cf5a932782deba541dc9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/metafile.yml @@ -0,0 +1,30 @@ +Collections: +- Name: SSN + README: configs/localization/ssn/README.md + Paper: + URL: https://arxiv.org/abs/1704.06228 + Title: Temporal Action Detection with Structured Segment Networks +Models: +- Config: configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py + In Collection: SSN + Metadata: + Architecture: ResNet50 + Pretrained: ImageNet + Training Data: THUMOS 14 + Training Resources: 8 GPUs + Name: ssn_r50_450e_thumos14_rgb + Results: + - Dataset: THUMOS 14 + Metrics: + mAP@0.3: 29.37 + mAP@0.4: 22.15 + mAP@0.5: 15.69 + Task: Temporal Action Localization + Training Json Log: https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/20201005_144656.log.json + Training 
Log: https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/20201005_144656.log + Weights: https://download.openmmlab.com/mmaction/localization/ssn/ssn_r50_450e_thumos14_rgb/ssn_r50_450e_thumos14_rgb_20201012-1920ab16.pth + reference mAP@0.3: '[27.61](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started)' + reference mAP@0.4: '[21.28](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started)' + reference mAP@0.5: '[14.57](https://github.com/open-mmlab/mmaction/tree/c7e3b7c11fb94131be9b48a8e3d510589addc3ce#Get%20started)' + reference ckpt: '[ckpt](https://download.openmmlab.com/mmaction/localization/ssn/mmaction_reference/ssn_r50_450e_thumos14_rgb_ref/ssn_r50_450e_thumos14_rgb_ref_20201014-b6f48f68.pth)' + reference json: '[json](https://download.openmmlab.com/mmaction/localization/ssn/mmaction_reference/ssn_r50_450e_thumos14_rgb_ref/20201008_103258.log.json)' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_test.py b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b9ed3979ebfa536867c0d9369cf96dec9c2eaa94 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_test.py @@ -0,0 +1,109 @@ +# model training and testing settings +train_cfg_ = dict( + ssn=dict( + assigner=dict( + positive_iou_threshold=0.7, + background_iou_threshold=0.01, + incomplete_iou_threshold=0.3, + background_coverage_threshold=0.02, + incomplete_overlap_threshold=0.01), + sampler=dict( + num_per_video=8, + positive_ratio=1, + background_ratio=1, + incomplete_ratio=6, + add_gt_as_proposals=True), + loss_weight=dict(comp_loss_weight=0.1, reg_loss_weight=0.1), + debug=False)) +test_cfg_ = dict( + ssn=dict( + sampler=dict(test_interval=6, batch_size=16), + evaluater=dict( + 
top_k=2000, + nms=0.2, + softmax_before_filter=True, + cls_score_dict=None, + cls_top_k=2))) +# model settings +model = dict( + type='SSN', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + partial_bn=True), + spatial_type='avg', + dropout_ratio=0.8, + cls_head=dict( + type='SSNHead', + dropout_ratio=0., + in_channels=2048, + num_classes=20, + consensus=dict(type='STPPTest', stpp_stage=(1, 1, 1)), + use_regression=True), + test_cfg=test_cfg_) +# dataset settings +dataset_type = 'SSNDataset' +data_root = './data/thumos14/rawframes/' +data_root_val = './data/thumos14/rawframes/' +ann_file_train = 'data/thumos14/thumos14_tag_val_proposal_list.txt' +ann_file_val = 'data/thumos14/thumos14_tag_val_proposal_list.txt' +ann_file_test = 'data/thumos14/thumos14_tag_test_proposal_list.txt' +img_norm_cfg = dict(mean=[104, 117, 128], std=[1, 1, 1], to_bgr=True) +test_pipeline = [ + dict( + type='SampleProposalFrames', + clip_len=1, + body_segments=5, + aug_segments=(2, 2), + aug_ratio=0.5, + mode='test'), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(340, 256), keep_ratio=True), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict( + type='Collect', + keys=[ + 'imgs', 'relative_proposal_list', 'scale_factor_list', + 'proposal_tick_list', 'reg_norm_consts' + ], + meta_keys=[]), + dict( + type='ToTensor', + keys=[ + 'imgs', 'relative_proposal_list', 'scale_factor_list', + 'proposal_tick_list', 'reg_norm_consts' + ]) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root, + train_cfg=train_cfg_, + test_cfg=test_cfg_, + aug_ratio=0.5, + test_mode=True, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.001, momentum=0.9, + weight_decay=1e-6) # this lr is used for 8 gpus +optimizer_config = 
dict(grad_clip=dict(max_norm=35, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[200, 400]) +checkpoint_config = dict(interval=5) +log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')]) +# runtime settings +total_epochs = 450 +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/ssn_r50_1x5_450e_thumos14_rgb' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py new file mode 100644 index 0000000000000000000000000000000000000000..75d927a76ffd0826e424b98b6193eb453723bf05 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/localization/ssn/ssn_r50_450e_thumos14_rgb_train.py @@ -0,0 +1,154 @@ +# model training and testing settings +train_cfg_ = dict( + ssn=dict( + assigner=dict( + positive_iou_threshold=0.7, + background_iou_threshold=0.01, + incomplete_iou_threshold=0.3, + background_coverage_threshold=0.02, + incomplete_overlap_threshold=0.01), + sampler=dict( + num_per_video=8, + positive_ratio=1, + background_ratio=1, + incomplete_ratio=6, + add_gt_as_proposals=True), + loss_weight=dict(comp_loss_weight=0.1, reg_loss_weight=0.1), + debug=False)) +test_cfg_ = dict( + ssn=dict( + sampler=dict(test_interval=6, batch_size=16), + evaluater=dict( + top_k=2000, + nms=0.2, + softmax_before_filter=True, + cls_score_dict=None, + cls_top_k=2))) +# model settings +model = dict( + type='SSN', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + partial_bn=True), + spatial_type='avg', + dropout_ratio=0.8, + loss_cls=dict(type='SSNLoss'), + cls_head=dict( + type='SSNHead', + dropout_ratio=0., + in_channels=2048, + num_classes=20, + consensus=dict( + type='STPPTrain', + stpp_stage=(1, 1, 1), + num_segments_list=(2, 5, 2)), + use_regression=True), + 
train_cfg=train_cfg_) +# dataset settings +dataset_type = 'SSNDataset' +data_root = './data/thumos14/rawframes/' +data_root_val = './data/thumos14/rawframes/' +ann_file_train = 'data/thumos14/thumos14_tag_val_proposal_list.txt' +ann_file_val = 'data/thumos14/thumos14_tag_val_proposal_list.txt' +ann_file_test = 'data/thumos14/thumos14_tag_test_proposal_list.txt' +img_norm_cfg = dict(mean=[104, 117, 128], std=[1, 1, 1], to_bgr=True) +train_pipeline = [ + dict( + type='SampleProposalFrames', + clip_len=1, + body_segments=5, + aug_segments=(2, 2), + aug_ratio=0.5), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(340, 256), keep_ratio=True), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NPTCHW'), + dict( + type='Collect', + keys=[ + 'imgs', 'reg_targets', 'proposal_scale_factor', 'proposal_labels', + 'proposal_type' + ], + meta_keys=[]), + dict( + type='ToTensor', + keys=[ + 'imgs', 'reg_targets', 'proposal_scale_factor', 'proposal_labels', + 'proposal_type' + ]) +] +val_pipeline = [ + dict( + type='SampleProposalFrames', + clip_len=1, + body_segments=5, + aug_segments=(2, 2), + aug_ratio=0.5), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(340, 256), keep_ratio=True), + dict(type='CenterCrop', crop_size=224), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NPTCHW'), + dict( + type='Collect', + keys=[ + 'imgs', 'reg_targets', 'proposal_scale_factor', 'proposal_labels', + 'proposal_type' + ], + meta_keys=[]), + dict( + type='ToTensor', + keys=[ + 'imgs', 'reg_targets', 'proposal_scale_factor', 'proposal_labels', + 'proposal_type' + ]) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + train_cfg=train_cfg_, + test_cfg=test_cfg_, + body_segments=5, + aug_segments=(2, 2), + aug_ratio=0.5, + test_mode=False, 
+ verbose=True, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root, + train_cfg=train_cfg_, + test_cfg=test_cfg_, + body_segments=5, + aug_segments=(2, 2), + aug_ratio=0.5, + test_mode=False, + pipeline=val_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.001, momentum=0.9, + weight_decay=1e-6) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[200, 400]) +checkpoint_config = dict(interval=5) +log_config = dict(interval=1, hooks=[dict(type='TextLoggerHook')]) +# runtime settings +total_epochs = 450 +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/ssn_r50_1x5_450e_thumos14_rgb' +load_from = None +resume_from = None +workflow = [('train', 1)] +find_unused_parameters = True diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/README.md new file mode 100644 index 0000000000000000000000000000000000000000..859890c11d41c2e2c0551ccb80cf716bdd6a87a7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/README.md @@ -0,0 +1,87 @@ +# C3D + +[Learning Spatiotemporal Features with 3D Convolutional Networks](https://openaccess.thecvf.com/content_iccv_2015/html/Tran_Learning_Spatiotemporal_Features_ICCV_2015_paper.html) + + + +## Abstract + + + +We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. 
Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets; 2) A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets; and 3) Our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks. In addition, the features are compact: achieving 52.8% accuracy on UCF101 dataset with only 10 dimensions and also very efficient to compute due to the fast inference of ConvNets. Finally, they are conceptually very simple and easy to train and use. + + + +
+ +
+ +## Results and Models + +### UCF-101 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | testing protocol | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------ | :--------: | :--: | :------: | :------: | :------: | :------: | :---------------: | :---------------------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: | +| [c3d_sports1m_16x1x1_45e_ucf101_rgb.py](/configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py) | 128x171 | 8 | c3d | sports1m | 83.27 | 95.90 | 10 clips x 1 crop | x | 6053 | [ckpt](https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/c3d_sports1m_16x1x1_45e_ucf101_rgb_20201021-26655025.pth) | [log](https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/20201021_140429.log) | [json](https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/20201021_140429.log.json) | + +:::{note} + +1. The author of C3D normalized UCF-101 with volume mean and used SVM to classify videos, while we normalized the dataset with RGB mean value and used a linear classifier. +2. The **gpus** indicates the number of gpu (32G V100) we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default. 
+ According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +3. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), where we use the frame sampling strategy of the test setting and only count the model inference time, + not including the IO time and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time. + +::: + +For more details on data preparation, you can refer to UCF-101 in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the C3D model on the UCF-101 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py \ + --validate --seed 0 --deterministic +``` + +For more details, you can refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the C3D model on the UCF-101 dataset and evaluate its top-k accuracy. + +```shell +python tools/test.py configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy +``` + +For more details, you can refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset). 
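The Linear Scaling Rule mentioned in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `scale_lr` is a hypothetical helper, not an MMAction2 API, and the reference point (lr=0.001 for 8 GPUs x 30 videos per GPU) is read from the `optimizer` and `videos_per_gpu` entries of the C3D config in this directory.

```python
def scale_lr(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Scale a learning rate in proportion to the total batch size.

    Hypothetical helper (not part of MMAction2) illustrating the
    Linear Scaling Rule: lr_new = lr_base * batch_new / batch_base.
    """
    base_batch = base_gpus * base_videos_per_gpu
    new_batch = gpus * videos_per_gpu
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch / base_batch


# README example: lr=0.01 for 4 GPUs x 2 video/gpu, scaled to 16 GPUs x 4 video/gpu.
readme_lr = scale_lr(0.01, 4, 2, 16, 4)

# C3D config: lr=0.001 is documented for 8 GPUs x 30 videos; here with 4 GPUs.
c3d_lr_4gpus = scale_lr(0.001, 8, 30, 4, 30)

print(f"README example: {readme_lr:.4f}")
print(f"C3D on 4 GPUs:  {c3d_lr_4gpus:.4f}")
```

The resulting value would then replace `lr` in the config's `optimizer = dict(type='SGD', lr=..., ...)` entry. Whether a strictly linear scaling is appropriate for a given model is an empirical question, so treat the output as a starting point.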
+ +## Citation + + + +```BibTeX +@ARTICLE{2014arXiv1412.0767T, +author = {Tran, Du and Bourdev, Lubomir and Fergus, Rob and Torresani, Lorenzo and Paluri, Manohar}, +title = {Learning Spatiotemporal Features with 3D Convolutional Networks}, +keywords = {Computer Science - Computer Vision and Pattern Recognition}, +year = 2014, +month = dec, +eid = {arXiv:1412.0767} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..3344f7d000ccf819631350bcd701631d0a552a99 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/README_zh-CN.md @@ -0,0 +1,69 @@ +# C3D + +## Introduction + + + +```BibTeX +@ARTICLE{2014arXiv1412.0767T, +author = {Tran, Du and Bourdev, Lubomir and Fergus, Rob and Torresani, Lorenzo and Paluri, Manohar}, +title = {Learning Spatiotemporal Features with 3D Convolutional Networks}, +keywords = {Computer Science - Computer Vision and Pattern Recognition}, +year = 2014, +month = dec, +eid = {arXiv:1412.0767} +} +``` + +## Model Zoo + +### UCF-101 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | testing protocol | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------ | :-----: | :------: | :------: | :------: | :---------: | :---------: | :---------------: | :----------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: | +| 
[c3d_sports1m_16x1x1_45e_ucf101_rgb.py](/configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py) | 128x171 | 8 | c3d | sports1m | 83.27 | 95.90 | 10 clips x 1 crop | x | 6053 | [ckpt](https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/c3d_sports1m_16x1x1_45e_ucf101_rgb_20201021-26655025.pth) | [log](https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/20201021_140429.log) | [json](https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/20201021_140429.log.json) | + +Notes: + +1. The original C3D paper normalized the data with the UCF-101 data mean and used an SVM to classify videos. MMAction2 normalizes the data with the ImageNet RGB mean and uses a linear classifier. +2. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 are meant for training with 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when you use a different number of GPUs or a different number of videos per GPU, you need to scale the learning rate in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +3. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + not including the IO time and pre-processing time. For each config, MMAction2 uses 1 GPU and sets the batch size (videos per GPU) to 1 to calculate the inference time. + +For details on dataset preparation, you can refer to the UCF-101 part in [Data Preparation](/docs_zh_CN/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the C3D model on the UCF-101 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py \ + --validate --seed 0 --deterministic +``` + +For more training details, you can refer to the **Training setting** part in [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the C3D model on the UCF-101 dataset and evaluate its top-k accuracy. + +```shell +python tools/test.py configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy +``` + +For more testing details, see 
[getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) (the **Test a dataset** part). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..cd96fca866573966de5c1436e9a786b575552779 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py @@ -0,0 +1,95 @@ +_base_ = '../../_base_/models/c3d_sports1m_pretrained.py' + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ucf101/rawframes' +data_root_val = 'data/ucf101/rawframes' +split = 1 # official train/test splits. valid numbers: 1, 2, 3 +ann_file_train = f'data/ucf101/ucf101_train_split_{split}_rawframes.txt' +ann_file_val = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +ann_file_test = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +img_norm_cfg = dict(mean=[104, 117, 128], std=[1, 1, 1], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=16, frame_interval=1, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(128, 171)), + dict(type='RandomCrop', size=112), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=16, + frame_interval=1, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(128, 171)), + dict(type='CenterCrop', crop_size=112), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] 
+test_pipeline = [ + dict( + type='SampleFrames', + clip_len=16, + frame_interval=1, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(128, 171)), + dict(type='CenterCrop', crop_size=112), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +data = dict( + videos_per_gpu=30, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.001, momentum=0.9, + weight_decay=0.0005) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[20, 40]) +total_epochs = 45 +checkpoint_config = dict(interval=5) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = f'./work_dirs/c3d_sports1m_16x1x1_45e_ucf101_split_{split}_rgb/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..f3e7ec9a5fe13976c4be7b87b22d093772e785aa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/c3d/metafile.yml @@ -0,0 +1,30 @@ 
+Collections: +- Name: C3D + README: configs/recognition/c3d/README.md + Paper: + URL: https://arxiv.org/abs/1412.0767 + Title: Learning Spatiotemporal Features with 3D Convolutional Networks +Models: +- Config: configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py + In Collection: C3D + Metadata: + Architecture: c3d + Batch Size: 30 + Epochs: 45 + FLOPs: 38615475200 + Parameters: 78409573 + Pretrained: sports1m + Resolution: 128x171 + Training Data: UCF101 + Training Resources: 8 GPUs + Modality: RGB + Name: c3d_sports1m_16x1x1_45e_ucf101_rgb + Results: + - Dataset: UCF101 + Metrics: + Top 1 Accuracy: 83.27 + Top 5 Accuracy: 95.9 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/20201021_140429.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/20201021_140429.log + Weights: https://download.openmmlab.com/mmaction/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb/c3d_sports1m_16x1x1_45e_ucf101_rgb_20201021-26655025.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5fa387e5e5114c50a62c67f079457b4354d4d20f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/README.md @@ -0,0 +1,108 @@ +# CSN + +[Video Classification With Channel-Separated Convolutional Networks](https://openaccess.thecvf.com/content_ICCV_2019/html/Tran_Video_Classification_With_Channel-Separated_Convolutional_Networks_ICCV_2019_paper.html) + + + +## Abstract + + + +Group convolution has been shown to offer great computational savings in various 2D convolutional architectures for image classification. 
It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks. This paper studies the effects of different design choices in 3D group convolutional networks for video classification. We empirically demonstrate that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks. Our experiments suggest two main findings. First, it is a good practice to factorize 3D convolutions by separating channel interactions and spatiotemporal interactions as this leads to improved accuracy and lower computational cost. Second, 3D channel-separated convolutions provide a form of regularization, yielding lower training accuracy but higher test accuracy compared to 3D convolutions. These two empirical findings lead us to design an architecture -- Channel-Separated Convolutional Network (CSN) -- which is simple, efficient, yet accurate. On Sports1M, Kinetics, and Something-Something, our CSNs are comparable with or better than the state-of-the-art while being 2-3 times more efficient. + + + +
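The factorization described in the abstract can be made concrete with a small weight count. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical layer width of 256 channels and compares a standard 3x3x3 convolution with a channel-separated pair, a 1x1x1 convolution for channel interactions followed by a depthwise 3x3x3 convolution for spatiotemporal interactions. The helper `conv3d_params` is introduced here only for illustration.

```python
def conv3d_params(in_channels, out_channels, kernel=(3, 3, 3), groups=1):
    """Return the weight count of a 3D convolution (bias omitted)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel
    return (in_channels // groups) * out_channels * kt * kh * kw


channels = 256  # hypothetical width, chosen only for illustration

# Standard 3x3x3 convolution: channel and spatiotemporal interactions
# are learned jointly by one dense kernel.
standard = conv3d_params(channels, channels)

# Channel-separated variant: a 1x1x1 convolution handles channel
# interactions, and a depthwise 3x3x3 convolution (groups == channels)
# handles spatiotemporal interactions.
pointwise = conv3d_params(channels, channels, kernel=(1, 1, 1))
depthwise = conv3d_params(channels, channels, groups=channels)
separated = pointwise + depthwise

print(f"standard 3x3x3:    {standard:,} weights")
print(f"channel-separated: {separated:,} weights")
print(f"reduction:         {standard / separated:.1f}x")
```

This counts the weights of a single layer only. The "2-3 times more efficient" figure in the abstract refers to whole networks, where the other layers dilute the per-layer saving.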
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :-------: | :------: | :--------: | :--------: | :---------------------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb](/configs/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb.py) | short-side 320 | x | ResNet50 | None | 73.6 | 91.3 | x | x | [ckpt](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb_20210618-4e29e2e8.pth) | [log](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb/20210618_182414.log) | [json](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb/20210618_182414.log.json) | +| [ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb](/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb.py) | short-side 320 | x | ResNet50 | IG65M | 79.0 | 94.2 | x 
| x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_ig65m_pretrained_r50_32x2x1_58e_kinetics400_rgb_20210617-86d33018.pth) | x | x | +| [ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb](/configs/recognition/csn/ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py) | short-side 320 | x | ResNet152 | None | 76.5 | 92.1 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_from_scratch_r152_32x2x1_180e_kinetics400_rgb_20210617-5c933ae1.pth) | x | x | +| [ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb](/configs/recognition/csn/ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py) | short-side 320 | x | ResNet152 | Sports1M | 78.2 | 93.0 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_sports1m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-b9b10241.pth) | x | x | +| [ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py](/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py) | short-side 320 | 8x4 | ResNet152 | IG65M | 82.76/82.6 | 95.68/95.3 | x | 8516 | [ckpt](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb_20200812-9037a758.pth)/[infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-e63ee1bd.pth) | [log](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/20200809_053132.log) | [json](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/20200809_053132.log.json) | +| [ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb](/configs/recognition/csn/ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py) | short-side 320 
| x | ResNet152 | None | 77.8 | 92.8 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_from_scratch_r152_32x2x1_180e_kinetics400_rgb_20210617-d565828d.pth) | x | x | +| [ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb](/configs/recognition/csn/ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py) | short-side 320 | x | ResNet152 | Sports1M | 78.8 | 93.5 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_sports1m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-3367437a.pth) | x | x | +| [ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb](/configs/recognition/csn/ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py) | short-side 320 | x | ResNet152 | IG65M | 82.5 | 95.3 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-c3be9793.pth) | x | x | +| [ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py](/configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py) | short-side 320 | 8x4 | ResNet152 | IG65M | 80.14 | 94.93 | x | 8517 | [ckpt](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20200803-fc66ce8d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/20200728_031952.log) | [json](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/20200728_031952.log.json) | + +:::{note} + +1. The **gpus** indicates the number of gpu (32G V100) we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default. 
+ According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), where we use the frame sampling strategy of the test setting and only count the model inference time, + not including the IO time and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time. +3. The validation set of Kinetics400 we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. +4. The **infer_ckpt** label marks checkpoints that are ported from [VMZ](https://github.com/facebookresearch/VMZ). + +::: + +For more details on data preparation, you can refer to Kinetics400 in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the CSN model on the Kinetics-400 dataset in deterministic mode with periodic validation. 
+ +```shell +python tools/train.py configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py \ + --work-dir work_dirs/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, you can refer to **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test CSN model on Kinetics-400 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips prob +``` + +For more details, you can refer to **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset). + +## Citation + +```BibTeX +@inproceedings{inproceedings, +author = {Wang, Heng and Feiszli, Matt and Torresani, Lorenzo}, +year = {2019}, +month = {10}, +pages = {5551-5560}, +title = {Video Classification With Channel-Separated Convolutional Networks}, +doi = {10.1109/ICCV.2019.00565} +} +``` + + + +```BibTeX +@inproceedings{ghadiyaram2019large, + title={Large-scale weakly-supervised pre-training for video action recognition}, + author={Ghadiyaram, Deepti and Tran, Du and Mahajan, Dhruv}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={12046--12055}, + year={2019} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..24f964dbbe9c4894028c1c59be9ba257ecf3a739 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/README_zh-CN.md @@ -0,0 +1,92 @@ 
+# CSN + +## 简介 + + + +```BibTeX +@inproceedings{inproceedings, +author = {Wang, Heng and Feiszli, Matt and Torresani, Lorenzo}, +year = {2019}, +month = {10}, +pages = {5551-5560}, +title = {Video Classification With Channel-Separated Convolutional Networks}, +doi = {10.1109/ICCV.2019.00565} +} +``` + + + +```BibTeX +@inproceedings{ghadiyaram2019large, + title={Large-scale weakly-supervised pre-training for video action recognition}, + author={Ghadiyaram, Deepti and Tran, Du and Mahajan, Dhruv}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={12046--12055}, + year={2019} +} +``` + +## 模型库 + +### Kinetics-400 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :-------: | :------: | :---------: | :---------: | :----------------: | :--------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb](/configs/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb.py) | 短边 320 | x | ResNet50 | None | 73.6 | 91.3 | x | x | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb_20210618-4e29e2e8.pth) | [log](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb/20210618_182414.log) | [json](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb/20210618_182414.log.json) | +| [ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb](/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb.py) | 短边 320 | x | ResNet50 | IG65M | 79.0 | 94.2 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_ig65m_pretrained_r50_32x2x1_58e_kinetics400_rgb_20210617-86d33018.pth) | x | x | +| [ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb](/configs/recognition/csn/ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py) | 短边 320 | x | ResNet152 | None | 76.5 | 92.1 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_from_scratch_r152_32x2x1_180e_kinetics400_rgb_20210617-5c933ae1.pth) | x | x | +| [ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb](/configs/recognition/csn/ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py) | 短边 320 | x | ResNet152 | Sports1M | 78.2 | 93.0 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_sports1m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-b9b10241.pth) | x | x | +| [ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py](/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py) | 短边 320 | 8x4 | ResNet152 | IG65M | 82.76/82.6 | 95.68/95.3 | x | 8516 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb_20200812-9037a758.pth)/[infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-e63ee1bd.pth) | [log](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/20200809_053132.log) | [json](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/20200809_053132.log.json) | +| [ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb](/configs/recognition/csn/ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py) | 短边 320 | x | ResNet152 | None | 77.8 | 92.8 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_from_scratch_r152_32x2x1_180e_kinetics400_rgb_20210617-d565828d.pth) | x | x | +| [ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb](/configs/recognition/csn/ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py) | 短边 320 | x | ResNet152 | Sports1M | 78.8 | 93.5 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_sports1m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-3367437a.pth) | x | x | +| [ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb](/configs/recognition/csn/ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py) | 短边 320 | x | ResNet152 | IG65M | 82.5 | 95.3 | x | x | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-c3be9793.pth) | x | x | +| [ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py](/configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py) | 短边 320 | 8x4 | ResNet152 | IG65M | 80.14 | 
94.93 | x | 8517 | [ckpt](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20200803-fc66ce8d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/20200728_031952.log) | [json](https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/20200728_031952.log.json) | + +注: + +1. 这里的 **GPU 数量** 指的是得到模型权重文件对应的 GPU 个数。默认地,MMAction2 所提供的配置文件对应使用 8 块 GPU 进行训练的情况。 + 依据 [线性缩放规则](https://arxiv.org/abs/1706.02677),当用户使用不同数量的 GPU 或者每块 GPU 处理不同视频个数时,需要根据批大小等比例地调节学习率。 + 如,lr=0.01 对应 4 GPUs x 2 video/gpu,以及 lr=0.08 对应 16 GPUs x 4 video/gpu。 +2. 这里的 **推理时间** 是根据 [基准测试脚本](/tools/analysis/benchmark.py) 获得的,采用测试时的采帧策略,且只考虑模型的推理时间, + 并不包括 IO 时间以及预处理时间。对于每个配置,MMAction2 使用 1 块 GPU 并设置批大小(每块 GPU 处理的视频个数)为 1 来计算推理时间。 +3. 这里使用的 Kinetics400 验证集包含 19796 个视频,用户可以从 [验证集视频](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB) 下载这些视频。同时也提供了对应的 [数据列表](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (每行格式为:视频 ID,视频帧数目,类别序号)以及 [标签映射](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (类别序号到类别名称)。 +4. 
这里的 **infer_ckpt** 表示该模型权重文件是从 [VMZ](https://github.com/facebookresearch/VMZ) 导入的。 + +对于数据集准备的细节,用户可参考 [数据集准备文档](/docs_zh_CN/data_preparation.md) 中的 Kinetics400 部分。 + +## 如何训练 + +用户可以使用以下指令进行模型训练。 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +例如:以一个确定性的训练方式,辅以定期的验证过程进行 CSN 模型在 Kinetics400 数据集上的训练。 + +```shell +python tools/train.py configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py \ + --work-dir work_dirs/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +更多训练细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) 中的 **训练配置** 部分。 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 Kinetics400 数据集上测试 CSN 模型,并将结果导出为一个 json 文件。 + +```shell +python tools/test.py configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips prob +``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 **测试某个数据集** 部分。 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7cd96b726e8944310d4a9375acd8c70384490816 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py @@ -0,0 +1,95 @@ +_base_ = [ + './ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py' +] + +# model settings +model = dict( + backbone=dict( + norm_eval=True, bn_frozen=True, bottleneck_mode='ip', pretrained=None)) + +dataset_type = 'RawframeDataset' +data_root = 
'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[110.2008, 100.63983, 95.99475], + std=[58.14765, 56.46975, 55.332195], + to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + 
type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +optimizer = dict( + type='SGD', lr=0.08, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=40) +total_epochs = 180 + +work_dir = './work_dirs/ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7aed801a62a4cf5b8b4148a99d2a84cd34cf9f6d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py @@ -0,0 +1,15 @@ +_base_ = [ + './ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py' +] + +# model settings +model = dict( + backbone=dict( + norm_eval=True, + bn_frozen=True, + bottleneck_mode='ip', + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/csn/ipcsn_from_scratch_r152_ig65m_20210617-c4b99d38.pth' # noqa: E501 + )) + +work_dir = './work_dirs/ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py new file mode 100644 index 
0000000000000000000000000000000000000000..fc5372a82bd257622f502f48a8c7dab5d81b118a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py @@ -0,0 +1,88 @@ +_base_ = [ + './ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py' +] + +# model settings +model = dict( + backbone=dict( + norm_eval=True, + bn_frozen=True, + bottleneck_mode='ip', + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/csn/ipcsn_from_scratch_r152_sports1m_20210617-7a7cc5b9.pth' # noqa: E501 + )) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[110.2008, 100.63983, 95.99475], + std=[58.14765, 56.46975, 55.332195], + to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', 
keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=3, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +work_dir = './work_dirs/ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..777b2c0c7101adee70bc88e84b65601bc54266a0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py @@ -0,0 +1,95 @@ +_base_ = [ + './ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py' +] + +# model settings +model = dict( + backbone=dict( + norm_eval=True, bn_frozen=True, bottleneck_mode='ir', pretrained=None)) + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 
'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[110.2008, 100.63983, 95.99475], + std=[58.14765, 56.46975, 55.332195], + to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +optimizer = dict( + type='SGD', 
lr=0.08, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=40) +total_epochs = 180 + +work_dir = './work_dirs/ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..cef9d5dea70cb2256690082ddade7558eececc44 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb.py @@ -0,0 +1,97 @@ +_base_ = [ + './ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py' +] + +# model settings +model = dict( + backbone=dict( + depth=50, + norm_eval=True, + bn_frozen=True, + bottleneck_mode='ir', + pretrained=None)) + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), 
+ dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +optimizer = dict( + type='SGD', lr=0.08, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=40) +total_epochs = 180 + +work_dir = './work_dirs/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..54bc5b012f042298be3b88c227e83c28049ac01c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../_base_/models/ircsn_r152.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + norm_eval=True, + bn_frozen=True, + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/csn/ircsn_from_scratch_r152_ig65m_20200807-771c4135.pth' # noqa: E501 + )) +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=3, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.000125, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='step', + step=[32, 48], + warmup='linear', + warmup_ratio=0.1, + warmup_by_epoch=True, + warmup_iters=16) +total_epochs = 58 + +work_dir = './work_dirs/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb' # noqa: E501 +find_unused_parameters = True diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..fc44dc42519c96cda663bd6a4ca10d51448d52d3 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb.py @@ -0,0 +1,103 @@ +_base_ = [ + '../../_base_/models/ircsn_r152.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + depth=50, + norm_eval=True, + bn_frozen=True, + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/csn/ircsn_from_scratch_r50_ig65m_20210617-ce545a37.pth' # noqa: E501 + )) +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + 
dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=3, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.000125, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='step', + step=[32, 48], + warmup='linear', + warmup_ratio=0.1, + warmup_by_epoch=True, + warmup_iters=16) +total_epochs = 58 + +work_dir = './work_dirs/ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb' # noqa: E501 +find_unused_parameters = True diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..015526ccf622bc88bde3686266fd83d15ed0a8c8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py @@ -0,0 +1,100 @@ +_base_ = [ + '../../_base_/models/ircsn_r152.py', '../../_base_/default_runtime.py' +] + +model = dict( + backbone=dict( + pretrained= # noqa: E251 + 
'https://download.openmmlab.com/mmaction/recognition/csn/ircsn_from_scratch_r152_ig65m_20200807-771c4135.pth' # noqa: E501 + )) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=3, + 
workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.000125, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='step', + step=[32, 48], + warmup='linear', + warmup_ratio=0.1, + warmup_by_epoch=True, + warmup_iters=16) +total_epochs = 58 + +work_dir = './work_dirs/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb' +find_unused_parameters = True diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..b4601839b97a2e565ad2f678606359bbc55cb744 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py @@ -0,0 +1,88 @@ +_base_ = [ + './ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py' +] + +# model settings +model = dict( + backbone=dict( + norm_eval=True, + bn_frozen=True, + bottleneck_mode='ir', + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/csn/ircsn_from_scratch_r152_sports1m_20210617-bcc9c0dd.pth' # noqa: E501 + )) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 
'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[110.2008, 100.63983, 95.99475], + std=[58.14765, 56.46975, 55.332195], + to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=3, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + 
data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +work_dir = './work_dirs/ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..408e11944fb3e4d8550c811d2e91efdeda0bbf11 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/csn/metafile.yml @@ -0,0 +1,204 @@ +Collections: +- Name: CSN + README: configs/recognition/csn/README.md + Paper: + URL: https://arxiv.org/abs/1904.02811 + Title: Video Classification with Channel-Separated Convolutional Networks +Models: +- Config: configs/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py + In Collection: CSN + Metadata: + Architecture: ResNet152 + Batch Size: 3 + Epochs: 58 + FLOPs: 98096676864 + Parameters: 29703568 + Pretrained: IG65M + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 80.14 + Top 5 Accuracy: 94.93 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/20200728_031952.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/20200728_031952.log + Weights: https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20200803-fc66ce8d.pth +- Config: configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py + In
Collection: CSN + Metadata: + Architecture: ResNet152 + Batch Size: 3 + Epochs: 58 + FLOPs: 98096676864 + Parameters: 29703568 + Pretrained: IG65M + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 82.76 + Top 5 Accuracy: 95.68 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/20200809_053132.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/20200809_053132.log + Weights: https://download.openmmlab.com/mmaction/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb_20200812-9037a758.pth +- Config: configs/recognition/csn/ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py + In Collection: CSN + Metadata: + Architecture: ResNet152 + Epochs: 180 + FLOPs: 110337228800 + Parameters: 33016592 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: ipcsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb + Converted From: + Weights: https://www.dropbox.com/s/3fihu6ti60047mu/ipCSN_152_kinetics_from_scratch_f129594342.pkl?dl=0 + Code: https://github.com/facebookresearch/VMZ/tree/b61b08194bc3273bef4c45fdfdd36c56c8579ff3 + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 77.8 + Top 5 Accuracy: 92.8 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_from_scratch_r152_32x2x1_180e_kinetics400_rgb_20210617-d565828d.pth +- Config: configs/recognition/csn/ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py + In Collection: CSN + Metadata: + Architecture: ResNet152 + Epochs: 
58 + FLOPs: 110337228800 + Parameters: 33016592 + Pretrained: IG65M + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: ipcsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb + Converted From: + Weights: https://www.dropbox.com/s/zpp3p0vn2i7bibl/ipCSN_152_ft_kinetics_from_ig65m_f133090949.pkl?dl=0 + Code: https://github.com/facebookresearch/VMZ/tree/b61b08194bc3273bef4c45fdfdd36c56c8579ff3 + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 82.5 + Top 5 Accuracy: 95.3 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-c3be9793.pth +- Config: configs/recognition/csn/ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py + In Collection: CSN + Metadata: + Architecture: ResNet152 + Epochs: 58 + FLOPs: 110337228800 + Parameters: 33016592 + Pretrained: Sports1M + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: ipcsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb + Converted From: + Weights: https://www.dropbox.com/s/ir7cr0hda36knux/ipCSN_152_ft_kinetics_from_sports1m_f111279053.pkl?dl=0 + Code: https://github.com/facebookresearch/VMZ/tree/b61b08194bc3273bef4c45fdfdd36c56c8579ff3 + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 78.8 + Top 5 Accuracy: 93.5 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ipcsn_sports1m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-3367437a.pth +- Config: configs/recognition/csn/ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb.py + In Collection: CSN + Metadata: + Architecture: ResNet152 + Epochs: 180 + FLOPs: 98096676864 + Parameters: 29703568 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: ircsn_bnfrozen_r152_32x2x1_180e_kinetics400_rgb + Converted From: + Weights: 
https://www.dropbox.com/s/46gcm7up60ssx5c/irCSN_152_kinetics_from_scratch_f98268019.pkl?dl=0 + Code: https://github.com/facebookresearch/VMZ/tree/b61b08194bc3273bef4c45fdfdd36c56c8579ff3 + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.5 + Top 5 Accuracy: 92.1 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_from_scratch_r152_32x2x1_180e_kinetics400_rgb_20210617-5c933ae1.pth +- Config: configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb.py + In Collection: CSN + Metadata: + Architecture: ResNet50 + Epochs: 58 + FLOPs: 56209211392 + Parameters: 13131152 + Pretrained: IG65M + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: ircsn_ig65m_pretrained_bnfrozen_r50_32x2x1_58e_kinetics400_rgb + Converted From: + Weights: https://www.dropbox.com/s/gmd8r87l3wmkn3h/irCSN_152_ft_kinetics_from_ig65m_f126851907.pkl?dl=0 + Code: https://github.com/facebookresearch/VMZ/tree/b61b08194bc3273bef4c45fdfdd36c56c8579ff3 + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 79.0 + Top 5 Accuracy: 94.2 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_ig65m_pretrained_r50_32x2x1_58e_kinetics400_rgb_20210617-86d33018.pth +- Config: configs/recognition/csn/ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py + In Collection: CSN + Metadata: + Architecture: ResNet152 + Epochs: 58 + FLOPs: 98096676864 + Parameters: 29703568 + Pretrained: Sports1M + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: ircsn_sports1m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb + Converted From: + Weights: https://www.dropbox.com/s/zuoj1aqouh6bo6k/irCSN_152_ft_kinetics_from_sports1m_f101599884.pkl?dl=0 + Code: https://github.com/facebookresearch/VMZ/tree/b61b08194bc3273bef4c45fdfdd36c56c8579ff3 + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 
Accuracy: 78.2 + Top 5 Accuracy: 93.0 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/csn/vmz/vmz_ircsn_sports1m_pretrained_r152_32x2x1_58e_kinetics400_rgb_20210617-b9b10241.pth +- Config: configs/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb.py + In Collection: CSN + Metadata: + Architecture: ResNet50 + Epochs: 180 + FLOPs: 56209211392 + Parameters: 13131152 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.6 + Top 5 Accuracy: 91.3 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/csn/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb/ircsn_bnfrozen_r50_32x2x1_180e_kinetics400_rgb_20210618-4e29e2e8.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/README.md new file mode 100644 index 0000000000000000000000000000000000000000..37fee079c2b12ed749be1dbd95986c9853137388 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/README.md @@ -0,0 +1,108 @@ +# I3D + +[Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset](https://openaccess.thecvf.com/content_cvpr_2017/html/Carreira_Quo_Vadis_Action_CVPR_2017_paper.html) + +[Non-local Neural Networks](https://openaccess.thecvf.com/content_cvpr_2018/html/Wang_Non-Local_Neural_Networks_CVPR_2018_paper.html) + + + +## Abstract + + + +The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset.
Kinetics has two orders of magnitude more data, with 400 human action classes and over 400 clips per class, and is collected from realistic, challenging YouTube videos. We provide an analysis on how current architectures fare on the task of action classification on this dataset and how much performance improves on the smaller benchmark datasets after pre-training on Kinetics. We also introduce a new Two-Stream Inflated 3D ConvNet (I3D) that is based on 2D ConvNet inflation: filters and pooling kernels of very deep image classification ConvNets are expanded into 3D, making it possible to learn seamless spatio-temporal feature extractors from video while leveraging successful ImageNet architecture designs and even their parameters. We show that, after pre-training on Kinetics, I3D models considerably improve upon the state-of-the-art in action classification, reaching 80.9% on HMDB-51 and 98.0% on UCF-101. + + + +
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :----------------------------------------------------------------------------------------------------------------------------------------------- | :-------------: | :--: | :------: | :------: | :------: | :------: | :---------------------: | :--------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | +| [i3d_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 72.68 | 90.78 | 1.7 (320x3 frames) | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/20200614_060456.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/20200614_060456.log.json) | +| [i3d_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 73.27 | 90.92 | x | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/20200725_031555.log) 
| [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/20200725_031555.log.json) | +| [i3d_r50_video_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb.py) | short-side 256p | 8 | ResNet50 | ImageNet | 72.85 | 90.75 | x | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/20200706_143014.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/20200706_143014.log.json) | +| [i3d_r50_dense_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py) | 340x256 | 8x2 | ResNet50 | ImageNet | 72.77 | 90.57 | 1.7 (320x3 frames) | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/i3d_r50_dense_32x2x1_100e_kinetics400_rgb_20200616-2bbb4361.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/20200616_230011.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/20200616_230011.log.json) | +| [i3d_r50_dense_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 73.48 | 91.00 | x | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb_20200725-24eb54cc.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/20200725_031604.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/20200725_031604.log.json) | +| [i3d_r50_lazy_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 72.32 | 90.72 | 1.8 (320x3 frames) | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/i3d_r50_fast_32x2x1_100e_kinetics400_rgb_20200612-000e4d2a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/20200612_233836.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/20200612_233836.log.json) | +| [i3d_r50_lazy_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 73.24 | 90.99 | x | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb_20200817-4e90d1d5.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/20200725_031457.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/20200725_031457.log.json) | +| [i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py) | short-side 256p | 8x4 | ResNet50 | ImageNet | 74.71 | 91.81 | x | 6438 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb_20200813-6e6aef1b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034054.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034054.log.json) | +| [i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py) | short-side 256p | 8x4 | ResNet50 | ImageNet | 73.37 | 91.26 | x | 4944 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb_20200815-17f84aa2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034909.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034909.log.json) | +| [i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py) | short-side 256p | 8x4 | ResNet50 | ImageNet | 73.92 | 91.59 | x | 4832 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb_20200814-7c30d5bb.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/20200814_044208.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/20200814_044208.log.json) | + +:::{note} + +1. The **gpus** column gives the number of GPUs used to train the checkpoint. The configs we provide assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you should scale the learning rate in proportion to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU. +2.
The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, excluding IO and pre-processing time. For each setting, we use 1 GPU and a batch size (videos per GPU) of 1 to measure the inference time. +3. The Kinetics400 validation set we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. + +::: + +For more details on data preparation, see the Kinetics400 section of [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the I3D model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py \ + --work-dir work_dirs/i3d_r50_32x2x1_100e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, see the **Training setting** section of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the I3D model on the Kinetics-400 dataset and dump the results to a JSON file.
+ +```shell +python tools/test.py configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips prob +``` + +For more details, see the **Test a dataset** section of [getting_started](/docs/getting_started.md#test-a-dataset). + +## Citation + +```BibTeX +@inproceedings{inproceedings, + author = {Carreira, J. and Zisserman, Andrew}, + year = {2017}, + month = {07}, + pages = {4724-4733}, + title = {Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset}, + doi = {10.1109/CVPR.2017.502} +} +``` + + + +```BibTeX +@article{NonLocal2018, + author = {Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He}, + title = {Non-local Neural Networks}, + journal = {CVPR}, + year = {2018} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..c04a7e500e557ba44a7148f846c24f8752f555ef --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/README_zh-CN.md @@ -0,0 +1,91 @@ +# I3D + +## Introduction + + + +```BibTeX +@inproceedings{inproceedings, + author = {Carreira, J. and Zisserman, Andrew}, + year = {2017}, + month = {07}, + pages = {4724-4733}, + title = {Quo Vadis, Action Recognition?
A New Model and the Kinetics Dataset}, + doi = {10.1109/CVPR.2017.502} +} +``` + + + +```BibTeX +@article{NonLocal2018, + author = {Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He}, + title = {Non-local Neural Networks}, + journal = {CVPR}, + year = {2018} +} +``` + +## Model Zoo + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time (video/s) | gpu_mem (M) | ckpt | log | json | +| :----------------------------------------------------------------------------------------------------------------------------------------------- | :-------: | :------: | :------: | :------: | :---------: | :---------: | :----------------: | :--------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | +| [i3d_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 72.68 | 90.78 | 1.7 (320x3 frames) | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/20200614_060456.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/20200614_060456.log.json) | +| [i3d_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 73.27 | 90.92 | x | 5170 |
[ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/20200725_031555.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/20200725_031555.log.json) | +| [i3d_r50_video_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb.py) | short-side 256p | 8 | ResNet50 | ImageNet | 72.85 | 90.75 | x | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/20200706_143014.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/20200706_143014.log.json) | +| [i3d_r50_dense_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py) | 340x256 | 8x2 | ResNet50 | ImageNet | 72.77 | 90.57 | 1.7 (320x3 frames) | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/i3d_r50_dense_32x2x1_100e_kinetics400_rgb_20200616-2bbb4361.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/20200616_230011.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/20200616_230011.log.json) | +| [i3d_r50_dense_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 73.48 | 91.00 | x | 5170 |
[ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb_20200725-24eb54cc.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/20200725_031604.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/20200725_031604.log.json) | +| [i3d_r50_lazy_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 72.32 | 90.72 | 1.8 (320x3 frames) | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/i3d_r50_fast_32x2x1_100e_kinetics400_rgb_20200612-000e4d2a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/20200612_233836.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/20200612_233836.log.json) | +| [i3d_r50_lazy_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 73.24 | 90.99 | x | 5170 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb_20200817-4e90d1d5.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/20200725_031457.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/20200725_031457.log.json) | +| [i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py) | short-side 256p | 8x4 | ResNet50 | ImageNet | 74.71 | 91.81 | x | 6438 |
[ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb_20200813-6e6aef1b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034054.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034054.log.json) | +| [i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py) | short-side 256p | 8x4 | ResNet50 | ImageNet | 73.37 | 91.26 | x | 4944 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb_20200815-17f84aa2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034909.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034909.log.json) | +| [i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py) | short-side 256p | 8x4 | ResNet50 | ImageNet | 73.92 | 91.59 | x | 4832 | [ckpt](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb_20200814-7c30d5bb.pth) | [log](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/20200814_044208.log) | [json](https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/20200814_044208.log.json) | + +Note: + +1.
The **gpus** column (GPU count) gives the number of GPUs used to train the checkpoint. By default, the configs provided by MMAction2 assume training on 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you should scale the learning rate in proportion to the batch size when you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU. +2. The **inference_time** column (inference speed) is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + excluding IO and pre-processing time. For each setting, MMAction2 uses 1 GPU and a batch size (videos per GPU) of 1 to measure the inference time. +3. The Kinetics400 validation set we use contains 19796 videos, which can be downloaded from [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format: video ID, number of frames, label index) and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (label index to class name) are also provided. + +For more details on data preparation, see the Kinetics400 section of [Data Preparation](/docs_zh_CN/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the I3D model on the Kinetics400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py \ + --work-dir work_dirs/i3d_r50_32x2x1_100e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details on training, see the **Training setting** section of [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the I3D model on the Kinetics400 dataset and dump the results to a JSON file. + +```shell +python tools/test.py configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips prob +``` + +For more details on testing, see the **Test a dataset** section of [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..466285006a3798cf4a0f919b6ec09ab78052002e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/i3d_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + non_local=((0, 0, 0), (0, 1, 0, 1), (0, 1, 0, 1, 0, 1), (0, 0, 0)), + non_local_cfg=dict( + sub_sample=True, + use_scale=False, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='dot_product'))) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + 
num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..969e42a0ddfd6a1e51c2e7dfac7526d758b0ffff --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,13 @@ +_base_ = 
['./i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py'] + +# model settings +model = dict( + backbone=dict( + non_local_cfg=dict( + sub_sample=True, + use_scale=False, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='embedded_gaussian'))) + +# runtime settings +work_dir = './work_dirs/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..f23775876f23d9978f974f1a4c161705ee703520 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,13 @@ +_base_ = ['./i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py'] + +# model settings +model = dict( + backbone=dict( + non_local_cfg=dict( + sub_sample=True, + use_scale=False, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='gaussian'))) + +# runtime settings +work_dir = './work_dirs/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..aa0e523f141c49d2f9eed6dd00b915cafba9a32f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,86 @@ +_base_ = [ + '../../_base_/models/i3d_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 
'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, 
+ data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/i3d_r50_32x2x1_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..17ea4303b9e529061bb7418eaee2b881d8a50ddf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,80 @@ +_base_ = ['./i3d_r50_32x2x1_100e_kinetics400_rgb.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DenseSampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + 
type='DenseSampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_heavy_8x8x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_heavy_8x8x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..f21feb2a01b479053e32044af270516c40a71051 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_heavy_8x8x1_100e_kinetics400_rgb.py @@ -0,0 +1,88 @@ +_base_ = ['./i3d_r50_32x2x1_100e_kinetics400_rgb.py'] + +# model settings +model = dict( + backbone=dict( + inflate=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, 
+ with_pool2=True)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + 
test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/i3d_r50_heavy_8x8x1_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..de84b8feb5bef3c64b85bdcf9cad36f06cfc6add --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,84 @@ +_base_ = ['./i3d_r50_32x2x1_100e_kinetics400_rgb.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode', decoding_backend='turbojpeg'), + dict(type='Resize', scale=(-1, 256), lazy=True), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0, + lazy=True), + dict(type='Resize', scale=(224, 224), keep_ratio=False, lazy=True), + dict(type='Flip', flip_ratio=0.5, lazy=True), + dict(type='Fuse'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode', decoding_backend='turbojpeg'), + dict(type='Resize', scale=(-1, 256), lazy=True), + dict(type='CenterCrop', crop_size=224, lazy=True), + dict(type='Flip', flip_ratio=0, lazy=True), + dict(type='Fuse'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode', decoding_backend='turbojpeg'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..1477ac2a99988127ad23406c5789eb6d7de2795c 
--- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,83 @@ +_base_ = ['./i3d_r50_32x2x1_100e_kinetics400_rgb.py'] + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + 
dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/i3d_r50_video_3d_32x2x1_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..973f7fb88f06c96b91f958e72fe3507e555f980c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py @@ -0,0 +1,83 @@ +_base_ = ['./i3d_r50_heavy_8x8x1_100e_kinetics400_rgb.py'] + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', 
scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# +# runtime settings +work_dir = './work_dirs/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_imgaug_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_imgaug_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 
0000000000000000000000000000000000000000..86baa0289d5d599a11b5fd6fc69384549d9deb14 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_imgaug_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,111 @@ +_base_ = ['../../_base_/models/i3d_r50.py'] + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict( + type='Imgaug', + transforms=[ + dict(type='Fliplr', p=0.5), + dict(type='Rotate', rotate=(-20, 20)), + dict(type='Dropout', p=(0, 0.05)) + ]), + # dict(type='Imgaug', transforms='default'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + 
type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[40, 80]) +total_epochs = 100 +checkpoint_config = dict(interval=5) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/i3d_r50_video_3d_32x2x1_100e_kinetics400_rgb/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..497c013564208616c71092983bd93948ce382607 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py @@ -0,0 +1,30 @@ +_base_ = ['../../_base_/models/i3d_r50.py'] + +# dataset settings +dataset_type = 'VideoDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict(type='DecordInit', num_threads=1), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..22a7bfe33c2e77970c8a0488b0a1f1a011da80b0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/i3d/metafile.yml @@ -0,0 +1,237 @@ +Collections: +- Name: I3D + README: configs/recognition/i3d/README.md + Paper: + URL: https://arxiv.org/abs/1705.07750 + Title: Quo Vadis, Action Recognition? 
A New Model and the Kinetics Dataset +Models: +- Config: configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 43564040192 + Parameters: 28043472 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: i3d_r50_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.68 + Top 5 Accuracy: 90.78 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/20200614_060456.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/20200614_060456.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth +- Config: configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 43564040192 + Parameters: 28043472 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: i3d_r50_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.27 + Top 5 Accuracy: 90.92 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/20200725_031555.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/20200725_031555.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth +- Config: configs/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb.py + In 
Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 43564040192 + Parameters: 28043472 + Pretrained: ImageNet + Resolution: short-side 256p + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: i3d_r50_video_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.85 + Top 5 Accuracy: 90.75 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/20200706_143014.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/20200706_143014.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth +- Config: configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 43564040192 + Parameters: 28043472 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: i3d_r50_dense_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.77 + Top 5 Accuracy: 90.57 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/20200616_230011.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/20200616_230011.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb/i3d_r50_dense_32x2x1_100e_kinetics400_rgb_20200616-2bbb4361.pth +- Config: configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 
100 + FLOPs: 43564040192 + Parameters: 28043472 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: i3d_r50_dense_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.48 + Top 5 Accuracy: 91.0 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/20200725_031604.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/20200725_031604.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb_20200725-24eb54cc.pth +- Config: configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 43564040192 + Parameters: 28043472 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: i3d_r50_lazy_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.32 + Top 5 Accuracy: 90.72 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/20200612_233836.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/20200612_233836.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/i3d_r50_fast_32x2x1_100e_kinetics400_rgb_20200612-000e4d2a.pth +- Config: configs/recognition/i3d/i3d_r50_lazy_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 43564040192 + Parameters: 28043472 + Pretrained: 
ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: i3d_r50_lazy_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.24 + Top 5 Accuracy: 90.99 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/20200725_031457.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/20200725_031457.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb_20200817-4e90d1d5.pth +- Config: configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 54334488576 + Parameters: 35397840 + Pretrained: ImageNet + Resolution: short-side 256p + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 74.71 + Top 5 Accuracy: 91.81 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034054.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034054.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb_20200813-6e6aef1b.pth +- Config: configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 
48962109440 + Parameters: 31723728 + Pretrained: ImageNet + Resolution: short-side 256p + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.37 + Top 5 Accuracy: 91.26 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034909.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034909.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb_20200815-17f84aa2.pth +- Config: configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py + In Collection: I3D + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 54334488576 + Parameters: 35397840 + Pretrained: ImageNet + Resolution: short-side 256p + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.92 + Top 5 Accuracy: 91.59 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/20200814_044208.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/20200814_044208.log + Weights: https://download.openmmlab.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb_20200814-7c30d5bb.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/README.md 
new file mode 100644 index 0000000000000000000000000000000000000000..daeda15424f686fd44ec576bababd0d1f7af867d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/README.md @@ -0,0 +1,80 @@ +# Omni-sourced Webly-supervised Learning for Video Recognition + +[Omni-sourced Webly-supervised Learning for Video Recognition](https://arxiv.org/abs/2003.13042) + +[Dataset](https://docs.google.com/forms/d/e/1FAIpQLSd8_GlmHzG8FcDbW-OEu__G7qLgOSYZpH-i5vYVJcu7wcb_TQ/viewform?usp=sf_link) + +## Abstract + + + +We introduce OmniSource, a novel framework for leveraging web data to train video recognition models. OmniSource overcomes the barriers between data formats, such as images, short videos, and long untrimmed videos for webly-supervised learning. First, data samples with multiple formats, curated by task-specific data collection and automatically filtered by a teacher model, are transformed into a unified form. Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning. Several good practices, including data balancing, resampling, and cross-dataset mixup are adopted in joint training. Experiments show that by utilizing data from multiple sources and formats, OmniSource is more data-efficient in training. With only 3.5M images and 800K minutes videos crawled from the internet without human labeling (less than 2% of prior works), our models learned with OmniSource improve Top-1 accuracy of 2D- and 3D-ConvNet baseline models by 3.0% and 3.9%, respectively, on the Kinetics-400 benchmark. With OmniSource, we establish new records with different pretraining strategies for video recognition. Our best models achieve 80.4%, 80.5%, and 83.6% Top-1 accuracies on the Kinetics-400 benchmark respectively for training-from-scratch, ImageNet pre-training and IG-65M pre-training. + + + +
+ +
+ +## Results and Models + +### Kinetics-400 Model Release + +We currently released 4 models trained with OmniSource framework, including both 2D and 3D architectures. We compare the performance of models trained with or without OmniSource in the following table. + +| Model | Modality | Pretrained | Backbone | Input | Resolution | Top-1 (Baseline / OmniSource (Delta)) | Top-5 (Baseline / OmniSource (Delta))) | Download | +| :------: | :------: | :--------: | :-------: | :---: | :------------: | :-----------------------------------: | :------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| TSN | RGB | ImageNet | ResNet50 | 3seg | 340x256 | 70.6 / 73.6 (+ 3.0) | 89.4 / 91.0 (+ 1.6) | [Baseline](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth) | +| TSN | RGB | IG-1B | ResNet50 | 3seg | short-side 320 | 73.1 / 75.7 (+ 2.6) | 90.4 / 91.9 (+ 1.5) | [Baseline](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_without_omni_1x1x3_kinetics400_rgb_20200926-c133dd49.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-2863fed0.pth) | +| SlowOnly | RGB | Scratch | ResNet50 | 4x16 | short-side 320 | 72.9 / 76.8 (+ 3.9) | 90.9 / 92.5 (+ 1.6) | 
[Baseline](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth) | +| SlowOnly | RGB | Scratch | ResNet101 | 8x8 | short-side 320 | 76.5 / 80.4 (+ 3.9) | 92.7 / 94.4 (+ 1.7) | [Baseline](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_without_omni_8x8x1_kinetics400_rgb_20200926-0c730aef.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth) | + +1. The validation set of Kinetics400 we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. + +## Benchmark on Mini-Kinetics + +We release a subset of the web dataset used in the OmniSource paper. Specifically, we release the web data in the 200 classes of [Mini-Kinetics](https://arxiv.org/pdf/1712.04851.pdf). The statistics of those datasets are detailed in [preparing_omnisource](/tools/data/omnisource/README.md). To obtain the data, you need to fill in a [data request form](https://docs.google.com/forms/d/e/1FAIpQLSd8_GlmHzG8FcDbW-OEu__G7qLgOSYZpH-i5vYVJcu7wcb_TQ/viewform?usp=sf_link). After we receive your request, the download link for the data will be sent to you. 
For more details on the released OmniSource web dataset, please refer to [preparing_omnisource](/tools/data/omnisource/README.md). + +We benchmark the OmniSource framework on the released subset; results are listed in the following table (we report Top-1 and Top-5 accuracy on the Mini-Kinetics validation set). The benchmark can be used as a baseline for video recognition with web data. + +### TSN-8seg-ResNet50 + +| Model | Modality | Pretrained | Backbone | Input | Resolution | top1 acc | top5 acc | ckpt | json | log | +| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -------- | ---------- | -------- | ----- | -------------- | :------: | :------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x8_100e_minikinetics_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 77.4 | 93.6 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030-b4eaf92b.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030.json) | 
[log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 78.0 | 93.6 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030-23966b4b.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_webimage_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 78.6 | 93.6 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030-66f5e046.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 80.6 | 95.0 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030-011f984d.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 78.6 | 93.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030-59f5d064.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 81.3 | 94.8 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030-0f56ef51.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030.json) | 
[log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030.log) | + +### SlowOnly-8x8-ResNet50 + +| Model | Modality | Pretrained | Backbone | Input | Resolution | top1 acc | top5 acc | ckpt | json | log | +| :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -------- | ---------- | -------- | ----- | -------------- | :------: | :------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_8x8x1_256e_minikinetics_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 78.6 | 93.9 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030-168eb098.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030.log) | +| 
[slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 80.8 | 95.0 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030-7da6dfc3.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030.log) | +| [slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 81.3 | 95.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030-c36616e9.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030.log) | +| [slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 82.4 | 95.6 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030-e2890e8d.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030.log) | +| [slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 80.3 | 94.5 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030-62974bac.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030.log) | +| [slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 82.9 | 95.8 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030-284cfd3b.pth) | 
[json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030.log) | + +For comparison, we also list the benchmark results from the original paper, which were obtained on Kinetics-400: + +| Model | Baseline | +GG-img | +\[GG-IG\]-img | +IG-vid | +KRaw | OmniSource | +| :--------------------: | :---------: | :---------: | :------------: | :---------: | :---------: | :---------: | +| TSN-3seg-ResNet50 | 70.6 / 89.4 | 71.5 / 89.5 | 72.0 / 90.0 | 72.0 / 90.3 | 71.7 / 89.6 | 73.6 / 91.0 | +| SlowOnly-4x16-ResNet50 | 73.8 / 90.9 | 74.5 / 91.4 | 75.2 / 91.6 | 75.2 / 91.7 | 74.5 / 91.1 | 76.6 / 92.5 | + +## Citation + + + +```BibTeX +@article{duan2020omni, + title={Omni-sourced Webly-supervised Learning for Video Recognition}, + author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua}, + journal={arXiv preprint arXiv:2003.13042}, + year={2020} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..ac87258746623b26e8642251cc963f823d3ebfb3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/README_zh-CN.md @@ -0,0 +1,72 @@ +# Omni-sourced Webly-supervised Learning for Video Recognition + +[Haodong Duan](https://github.com/kennymckormick), [Yue Zhao](https://github.com/zhaoyue-zephyrus), [Yuanjun Xiong](https://github.com/yjxiong), Wentao Liu, [Dahua Lin](https://github.com/lindahua) + +In ECCV, 2020. 
[Paper](https://arxiv.org/abs/2003.13042), [Dataset](https://docs.google.com/forms/d/e/1FAIpQLSd8_GlmHzG8FcDbW-OEu__G7qLgOSYZpH-i5vYVJcu7wcb_TQ/viewform?usp=sf_link) + +![pipeline](https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/omnisource/pipeline.png?raw=true) + +## 模型库 + +### Kinetics-400 + +MMAction2 当前公开了 4 个 OmniSource 框架训练的模型,包含 2D 架构与 3D 架构。下表比较了使用或不适用 OmniSource 框架训练得的模型在 Kinetics-400 上的精度: + +| 模型 | 模态 | 预训练 | 主干网络 | 输入 | 分辨率 | Top-1 准确率(Baseline / OmniSource (Delta)) | Top-5 准确率(Baseline / OmniSource (Delta))) | 模型下载链接 | +| :------: | :--: | :------: | :-------: | :--: | :------------: | :-----------------------------------------: | :------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| TSN | RGB | ImageNet | ResNet50 | 3seg | 340x256 | 70.6 / 73.6 (+ 3.0) | 89.4 / 91.0 (+ 1.6) | [Baseline](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth) | +| TSN | RGB | IG-1B | ResNet50 | 3seg | short-side 320 | 73.1 / 75.7 (+ 2.6) | 90.4 / 91.9 (+ 1.5) | [Baseline](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_without_omni_1x1x3_kinetics400_rgb_20200926-c133dd49.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-2863fed0.pth) | +| SlowOnly | RGB | None | ResNet50 | 4x16 | short-side 320 | 72.9 / 76.8 (+ 3.9) | 90.9 / 92.5 (+ 1.6) | 
[Baseline](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth) | +| SlowOnly | RGB | None | ResNet101 | 8x8 | short-side 320 | 76.5 / 80.4 (+ 3.9) | 92.7 / 94.4 (+ 1.7) | [Baseline](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_without_omni_8x8x1_kinetics400_rgb_20200926-0c730aef.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth) | + +1. The Kinetics400 validation set we use contains 19796 videos, which can be downloaded from [validation videos](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). We also provide the corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format: video ID, number of frames, class index) and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (class index to class name). + +## Benchmark on Mini-Kinetics + +The OmniSource project currently releases a subset of the web data it collected, covering the 200 action classes of [Mini-Kinetics](https://arxiv.org/pdf/1712.04851.pdf). Detailed statistics for these datasets are recorded in [OmniSource data preparation](/tools/data/omnisource/README_zh-CN.md). Users can obtain the data by filling in the [request form](https://docs.google.com/forms/d/e/1FAIpQLSd8_GlmHzG8FcDbW-OEu__G7qLgOSYZpH-i5vYVJcu7wcb_TQ/viewform?usp=sf_link); once the form is completed, the download links are sent to the user's email address. For more information on the OmniSource web datasets, see [OmniSource data preparation](/tools/data/omnisource/README_zh-CN.md). + +MMAction2 benchmarked the OmniSource framework on the released datasets. The tables below record the detailed results (accuracy on the Mini-Kinetics validation set), which can serve as baselines for video recognition tasks trained with web data. + +### TSN-8seg-ResNet50 + +| Model | Modality | Pretrained | Backbone | Input | Resolution | Top-1 accuracy | Top-5 accuracy | ckpt | json | log | +| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--: | :------: 
| :------: | :--: | :------------: | :----------: | :----------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x8_100e_minikinetics_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 77.4 | 93.6 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030-b4eaf92b.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 78.0 | 93.6 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030-23966b4b.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030.json) | 
[log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_webimage_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 78.6 | 93.6 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030-66f5e046.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 80.6 | 95.0 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030-011f984d.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 78.6 | 93.2 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030-59f5d064.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030.log) | +| [tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb](/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb.py) | RGB | ImageNet | ResNet50 | 3seg | short-side 320 | 81.3 | 94.8 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030-0f56ef51.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030.log) | + +### SlowOnly-8x8-ResNet50 + +| Model | Modality | Pretrained | Backbone | Input | Resolution | Top-1 accuracy | Top-5 accuracy | ckpt | json | log | +| :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--: | :----: | :------: | :--: | :------------: | :----------: | :----------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_8x8x1_256e_minikinetics_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 78.6 | 93.9 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030-168eb098.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030.log) | +| [slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 80.8 | 95.0 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030-7da6dfc3.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030.log) | +| 
[slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 81.3 | 95.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030-c36616e9.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030.log) | +| [slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 82.4 | 95.6 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030-e2890e8d.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030.log) | +| [slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 80.3 | 94.5 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030-62974bac.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030.log) | +| [slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb](/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb.py) | RGB | None | ResNet50 | 8x8 | short-side 320 | 82.9 | 95.8 | [ckpt](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030-284cfd3b.pth) | [json](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030.json) | [log](https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030.log) | + +For reference, the table below lists the benchmark results from the original paper on Kinetics-400: + +| Model | Baseline | +GG-img | +\[GG-IG\]-img | +IG-vid | +KRaw | OmniSource | +| :--------------------: | :---------: | :---------: | :------------: | :---------: | :---------: | :---------: | +| TSN-3seg-ResNet50 | 70.6 / 89.4 | 71.5 / 89.5 | 72.0 / 90.0 | 72.0 / 90.3 | 71.7 / 89.6 | 73.6 / 91.0 | +| SlowOnly-4x16-ResNet50 | 73.8 / 90.9 | 74.5 / 91.4 | 75.2 / 91.6 | 75.2 / 91.7 | 74.5 / 91.1 | 76.6 / 92.5 | + +## Citation + +If the OmniSource project helps your research, please cite it with the following BibTeX entry: + + + +```BibTeX +@article{duan2020omni, + title={Omni-sourced 
Webly-supervised Learning for Video Recognition}, + author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua}, + journal={arXiv preprint arXiv:2003.13042}, + year={2020} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..ae3db16e52ec0270d4cbbf4069a8d857fdb6f9d8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/metafile.yml @@ -0,0 +1,388 @@ +Collections: +- Name: OmniSource + README: configs/recognition/omnisource/README.md + Paper: + URL: https://arxiv.org/abs/2003.13042 + Title: Omni-sourced Webly-supervised Learning for Video Recognition + +Models: +- Config: configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134526976000 + Input: 3seg + Modality: RGB + Parameters: 23917832 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: tsn_r50_1x1x8_100e_minikinetics_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 77.4 + Top 5 Accuracy: 93.6 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/baseline/tsn_r50_1x1x8_100e_minikinetics_rgb_20201030-b4eaf92b.pth +- Config: 
configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134526976000 + Input: 3seg + Modality: RGB + Parameters: 23917832 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 78.0 + Top 5 Accuracy: 93.6 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/googleimage/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb_20201030-23966b4b.pth +- Config: configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134526976000 + Input: 3seg + Modality: RGB + Parameters: 23917832 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: tsn_r50_1x1x8_100e_minikinetics_webimage_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 78.6 + Top 5 Accuracy: 93.6 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030.json + Training Log: 
https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/webimage/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb_20201030-66f5e046.pth +- Config: configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134526976000 + Input: 3seg + Modality: RGB + Parameters: 23917832 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 80.6 + Top 5 Accuracy: 95.0 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/insvideo/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb_20201030-011f984d.pth +- Config: configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134526976000 + Input: 3seg + Modality: RGB + Parameters: 23917832 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 78.6 + Top 
5 Accuracy: 93.2 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/kineticsraw/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb_20201030-59f5d064.pth +- Config: configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134526976000 + Input: 3seg + Modality: RGB + Parameters: 23917832 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 81.3 + Top 5 Accuracy: 94.8 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb/omnisource/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb_20201030-0f56ef51.pth +- Config: configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 256 + FLOPs: 54860070912 + 
Input: 8x8 + Modality: RGB + Parameters: 32044296 + Pretrained: None + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: slowonly_r50_8x8x1_256e_minikinetics_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 78.6 + Top 5 Accuracy: 93.9 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/baseline/slowonly_r50_8x8x1_256e_minikinetics_rgb_20201030-168eb098.pth +- Config: configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 256 + FLOPs: 54860070912 + Input: 8x8 + Modality: RGB + Parameters: 32044296 + Pretrained: None + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 80.8 + Top 5 Accuracy: 95.0 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/googleimage/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb_20201030-7da6dfc3.pth +- Config: configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 256 + FLOPs: 54860070912 + Input: 8x8 + Modality: RGB + Parameters: 32044296 + Pretrained: None + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 81.3 + Top 5 Accuracy: 95.2 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/webimage/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb_20201030-c36616e9.pth +- Config: configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 256 + FLOPs: 54860070912 + Input: 8x8 + Modality: RGB + Parameters: 32044296 + Pretrained: None + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 82.4 + Top 5 Accuracy: 95.6 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/insvideo/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb_20201030-e2890e8d.pth +- Config: configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 256 + FLOPs: 54860070912 + Input: 8x8 + Modality: RGB + Parameters: 32044296 + Pretrained: None + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 80.3 + Top 5 Accuracy: 94.5 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/kineticsraw/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb_20201030-62974bac.pth +- Config: configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 256 + FLOPs: 
54860070912 + Input: 8x8 + Modality: RGB + Parameters: 32044296 + Pretrained: None + Resolution: short-side 320 + Training Data: MiniKinetics + Modality: RGB + Name: slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb + Results: + - Dataset: MiniKinetics + Metrics: + Top 1 Accuracy: 82.9 + Top 5 Accuracy: 95.8 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030.json + Training Log: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030.log + Weights: https://download.openmmlab.com/mmaction/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb/omnisource/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb_20201030-284cfd3b.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 102997721600 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Modality: RGB + Name: tsn_omnisource_r50_1x1x3_100e_kinetics_rgb + Converted From: + Weights: https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/tsn_OmniSource_kinetics400_se_rgb_r50_seg3_f1s1_imagenet-4066cb7e.pth + Code: https://github.com/open-mmlab/mmaction + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.6 + Top 5 Accuracy: 91.0 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 102997721600 + 
Parameters: 24327632 + Pretrained: IG-1B + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: tsn_IG1B_pretrained_omnisource_r50_1x1x3_100e_kinetics_rgb + Converted From: + Weights: https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/tsn_OmniSource_kinetics400_se_rgb_r50_seg3_f1s1_IG1B-25fc136b.pth + Code: https://github.com/open-mmlab/mmaction/ + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 75.7 + Top 5 Accuracy: 91.9 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-2863fed0.pth +- Config: configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 27430649856 + Parameters: 32454096 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: slowonly_r50_omnisource_4x16x1_256e_kinetics400_rgb + Converted From: + Weights: https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/slowonly_OmniSource_kinetics400_se_rgb_r50_seg1_4x16_scratch-71f7b8ee.pth + Code: https://github.com/open-mmlab/mmaction/ + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.8 + Top 5 Accuracy: 92.5 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth +- Config: configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py + In Collection: OmniSource + Metadata: + Architecture: ResNet101 + Batch Size: 8 + Epochs: 196 + FLOPs: 112063447040 + Parameters: 60359120 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: slowonly_r101_omnisource_8x8x1_196e_kinetics400_rgb + Converted From: + Weights: 
https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/slowonly_OmniSource_kinetics400_se_rgb_r101_seg1_8x8_scratch-2f838cb0.pth + Code: https://github.com/open-mmlab/mmaction/ + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 80.4 + Top 5 Accuracy: 94.4 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/pipeline.png b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/pipeline.png new file mode 100644 index 0000000000000000000000000000000000000000..a3e3a2a046b04ea3f18dd26f007e97d268361b16 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/pipeline.png differ diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..0aee7f2c2c449d8b54160169eec137e3dc563961 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb.py @@ -0,0 +1,130 @@ +_base_ = [ + '../../../_base_/models/slowonly_r50.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None), cls_head=dict(num_classes=200)) + +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +gg_root = 'data/OmniSource/googleimage_200' + +ann_file_train = 
'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_gg = ('data/OmniSource/annotations/googleimage_200/' + 'tsn_8seg_googleimage_200_wodup.txt') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_gg_pipeline = [ + dict(type='ImageDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='BuildPseudoClip', clip_len=8), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + 
test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type='ImageDataset', + ann_file=ann_file_gg, + data_prefix=gg_root, + pipeline=train_gg_pipeline) + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=8, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) + +# runtime settings +total_epochs = 256 +checkpoint_config = dict(interval=8) +work_dir = ('./work_dirs/omnisource/' + 'slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb') +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..06195d431ce230850df8507eba3160f0b298b5b5 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb.py @@ -0,0 +1,134 @@ +_base_ = [ + '../../../_base_/models/slowonly_r50.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None), cls_head=dict(num_classes=200)) + +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +iv_root = 'data/OmniSource/insvideo_200' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_iv = ('data/OmniSource/annotations/insvideo_200/' + 'slowonly_8x8_insvideo_200_wodup.txt') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_iv_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + 
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type=dataset_type, + ann_file=ann_file_iv, + data_prefix=iv_root, + pipeline=train_iv_pipeline, + num_classes=200, + sample_by_class=True, + power=0.5) + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=8, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) + +# runtime 
settings +total_epochs = 256 +checkpoint_config = dict(interval=8) +work_dir = ('./work_dirs/omnisource/' + 'slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb') +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..35263134cd3e05fd07534e86e662c51f368082a8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb.py @@ -0,0 +1,133 @@ +_base_ = [ + '../../../_base_/models/slowonly_r50.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None), cls_head=dict(num_classes=200)) + +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +kraw_root = 'data/OmniSource/kinetics_raw_200_train' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_kraw = ('data/OmniSource/annotations/kinetics_raw_200/' + 'slowonly_8x8_kinetics_raw_200.json') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), 
keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_kraw_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type='RawVideoDataset', + ann_file=ann_file_kraw, + data_prefix=kraw_root, 
+ pipeline=train_kraw_pipeline, + clipname_tmpl='part_{}.mp4', + sampling_strategy='positive') + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=8, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) + +# runtime settings +total_epochs = 256 +checkpoint_config = dict(interval=8) +work_dir = ('./work_dirs/omnisource/' + 'slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb') +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4ef38005bce749d9e7343daac733fe1312d2ae3c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb.py @@ -0,0 +1,181 @@ +_base_ = [ + '../../../_base_/models/slowonly_r50.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None), cls_head=dict(num_classes=200)) + +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +web_root = 'data/OmniSource/' +iv_root = 
'data/OmniSource/insvideo_200' +kraw_root = 'data/OmniSource/kinetics_raw_200_train' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_web = ('data/OmniSource/annotations/webimage_200/' + 'tsn_8seg_webimage_200_wodup.txt') +ann_file_iv = ('data/OmniSource/annotations/insvideo_200/' + 'slowonly_8x8_insvideo_200_wodup.txt') +ann_file_kraw = ('data/OmniSource/annotations/kinetics_raw_200/' + 'slowonly_8x8_kinetics_raw_200.json') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_web_pipeline = [ + dict(type='ImageDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='BuildPseudoClip', clip_len=8), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_iv_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + 
dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_kraw_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train_ratio=[2, 1, 1, 1], + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type='ImageDataset', + ann_file=ann_file_web, + 
data_prefix=web_root, + pipeline=train_web_pipeline, + num_classes=200, + sample_by_class=True, + power=0.5), + dict( + type=dataset_type, + ann_file=ann_file_iv, + data_prefix=iv_root, + pipeline=train_iv_pipeline, + num_classes=200, + sample_by_class=True, + power=0.5), + dict( + type='RawVideoDataset', + ann_file=ann_file_kraw, + data_prefix=kraw_root, + pipeline=train_kraw_pipeline, + clipname_tmpl='part_{}.mp4', + sampling_strategy='positive') + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=8, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) + +# runtime settings +total_epochs = 256 +checkpoint_config = dict(interval=8) +work_dir = ('./work_dirs/omnisource/' + 'slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb') +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..38f7be651bc094ed89e1e699d66883f19c3c43c7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_rgb.py @@ -0,0 +1,108 @@ +_base_ = [ + '../../../_base_/models/slowonly_r50.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = 
dict(backbone=dict(pretrained=None), cls_head=dict(num_classes=200)) + +# dataset settings +dataset_type = 'VideoDataset' + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + 
videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +evaluation = dict( + interval=8, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# The flag indicates using joint training +omnisource = True + +# optimizer +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) + +# runtime settings +total_epochs = 256 +checkpoint_config = dict(interval=8) +work_dir = './work_dirs/omnisource/slowonly_r50_8x8x1_256e_minikinetics_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4acf708c5b0a5ee507ea44679b68590eafad6686 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/slowonly_r50_8x8x1_256e_minikinetics/slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb.py @@ -0,0 +1,132 @@ +_base_ = [ + '../../../_base_/models/slowonly_r50.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None), cls_head=dict(num_classes=200)) + +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 
'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +web_root = 'data/OmniSource/' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_web = ('data/OmniSource/annotations/webimage_200/' + 'tsn_8seg_webimage_200_wodup.txt') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_web_pipeline = [ + dict(type='ImageDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='BuildPseudoClip', clip_len=8), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + 
+test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type='ImageDataset', + ann_file=ann_file_web, + data_prefix=web_root, + pipeline=train_web_pipeline, + num_classes=200, + sample_by_class=True, + power=0.5) + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=8, metrics=['top_k_accuracy', 'mean_class_accuracy']) +# optimizer +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) + +# runtime settings +total_epochs = 256 +checkpoint_config = dict(interval=8) +work_dir = ('./work_dirs/omnisource/' + 'slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb') +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb.py new file mode 100644 index 
0000000000000000000000000000000000000000..447b7cb6c411e7ae2c70557586dc5343737ae439 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb.py @@ -0,0 +1,126 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=200)) + +omnisource = True +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +gg_root = 'data/OmniSource/googleimage_200' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_gg = ('data/OmniSource/annotations/googleimage_200/' + 'tsn_8seg_googleimage_200_wodup.txt') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_gg_pipeline = [ + dict(type='ImageDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + omni_videos_per_gpu=[12, 64], + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type='ImageDataset', + ann_file=ann_file_gg, + data_prefix=gg_root, + pipeline=train_gg_pipeline) + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = ('./work_dirs/omnisource/' + 'tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb') diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..89d369403c0cc5e2e7f66291dae6a8d1a4dbd60b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb.py @@ -0,0 +1,130 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=200)) + +omnisource = True +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +iv_root = 'data/OmniSource/insvideo_200' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_iv = ('data/OmniSource/annotations/insvideo_200/' + 'slowonly_8x8_insvideo_200_wodup.txt') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', 
keys=['imgs', 'label']) +] +train_iv_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type=dataset_type, + ann_file=ann_file_iv, + data_prefix=iv_root, + pipeline=train_iv_pipeline, + num_classes=200, + sample_by_class=True, + power=0.5) + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + 
ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = ('./work_dirs/omnisource/' + 'tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..f86eaa5f698f6e50b810fea425beb1356a151cc2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb.py @@ -0,0 +1,129 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=200)) + +omnisource = True +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +kraw_root = 'data/OmniSource/kinetics_raw_200_train' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_kraw = ('data/OmniSource/annotations/kinetics_raw_200/' + 'slowonly_8x8_kinetics_raw_200.json') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + 
dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_kraw_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + 
test_dataloader=dict(videos_per_gpu=1), + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type='RawVideoDataset', + ann_file=ann_file_kraw, + data_prefix=kraw_root, + pipeline=train_kraw_pipeline, + clipname_tmpl='part_{}.mp4', + sampling_strategy='positive') + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = ('./work_dirs/omnisource/' + 'tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..e87c726b477cd0be8f16bd9d73b03b72a64c9c6d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb.py @@ -0,0 +1,177 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=200)) + +omnisource = True +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +web_root = 'data/OmniSource/' +iv_root = 
'data/OmniSource/insvideo_200' +kraw_root = 'data/OmniSource/kinetics_raw_200_train' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_web = ('data/OmniSource/annotations/webimage_200/' + 'tsn_8seg_webimage_200_wodup.txt') +ann_file_iv = ('data/OmniSource/annotations/insvideo_200/' + 'slowonly_8x8_insvideo_200_wodup.txt') +ann_file_kraw = ('data/OmniSource/annotations/kinetics_raw_200/' + 'slowonly_8x8_kinetics_raw_200.json') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_web_pipeline = [ + dict(type='ImageDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_iv_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + 
dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_kraw_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + omni_videos_per_gpu=[12, 64, 12, 12], + train_ratio=[2, 1, 1, 1], + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=[ + dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type='ImageDataset', + ann_file=ann_file_web, + 
data_prefix=web_root, + pipeline=train_web_pipeline, + num_classes=200, + sample_by_class=True, + power=0.5), + dict( + type=dataset_type, + ann_file=ann_file_iv, + data_prefix=iv_root, + pipeline=train_iv_pipeline, + num_classes=200, + sample_by_class=True, + power=0.5), + dict( + type='RawVideoDataset', + ann_file=ann_file_kraw, + data_prefix=kraw_root, + pipeline=train_kraw_pipeline, + clipname_tmpl='part_{}.mp4', + sampling_strategy='positive') + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = ('./work_dirs/omnisource/' + 'tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6ec9e1dc65f6a49d1698650bcc0c9c21b9b30517 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_rgb.py @@ -0,0 +1,100 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=200)) + +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' + +ann_file_train = 
'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + 
ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = './work_dirs/omnisource/tsn_r50_1x1x8_100e_minikinetics_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..070aa8571ed14802a7a8a35cc746468601de30ab --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/omnisource/tsn_r50_1x1x8_100e_minikinetics/tsn_r50_1x1x8_100e_minikinetics_webimage_rgb.py @@ -0,0 +1,129 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=200)) + +omnisource = True +# dataset settings +dataset_type = 'VideoDataset' +# The flag indicates using joint training +omnisource = True + +data_root = 'data/OmniSource/kinetics_200_train' +data_root_val = 'data/OmniSource/kinetics_200_val' +web_root = 'data/OmniSource/' + +ann_file_train = 'data/OmniSource/annotations/kinetics_200/k200_train.txt' +ann_file_web = ('data/OmniSource/annotations/webimage_200/' + 'tsn_8seg_webimage_200_wodup.txt') + +ann_file_val = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' +ann_file_test = 'data/OmniSource/annotations/kinetics_200/k200_val.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + 
+train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +train_web_pipeline = [ + dict(type='ImageDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] + +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=12, + omni_videos_per_gpu=[12, 64], + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=[ + 
dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + dict( + type='ImageDataset', + ann_file=ann_file_web, + data_prefix=web_root, + pipeline=train_web_pipeline, + num_classes=200, + sample_by_class=True, + power=0.5) + ], + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = ('./work_dirs/omnisource/' + 'tsn_r50_1x1x8_100e_minikinetics_webimage_rgb') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e22f36a60540de2a4618beee0fcaed682cbbc522 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/README.md @@ -0,0 +1,88 @@ +# R2plus1D + +[A closer look at spatiotemporal convolutions for action recognition](https://openaccess.thecvf.com/content_cvpr_2018/html/Tran_A_Closer_Look_CVPR_2018_paper.html) + + + +## Abstract + + + +In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significantly advantages in accuracy. 
Our empirical study leads to the design of a new spatiotemporal convolutional block "R(2+1)D" which gives rise to CNNs that achieve results comparable or superior to the state-of-the-art on Sports-1M, Kinetics, UCF101 and HMDB51. + + + +
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------ | :------------: | :--: | :------: | :------: | :------: | :------: | :---------------------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: | +| [r2plus1d_r34_8x8x1_180e_kinetics400_rgb](/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet34 | None | 67.30 | 87.65 | x | 5019 | [ckpt](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb_20200729-aa94765e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/20200728_021421.log) | [json](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/20200728_021421.log.json) | +| [r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet34 | None | 67.3 | 87.8 | x | 5019 | [ckpt](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb_20200826-ab35a529.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/20200724_201360.log) | [json](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/20200724_201360.log.json) | +| [r2plus1d_r34_8x8x1_180e_kinetics400_rgb](/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet34 | None | 68.68 | 88.36 | 1.6 (80x3 frames) | 5019 | [ckpt](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_8x8x1_180e_kinetics400_rgb_20200618-3fce5629.pth) | [log](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r21d_8x8.log) | [json](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_8x8_69.58_88.36.log.json) | +| [r2plus1d_r34_32x2x1_180e_kinetics400_rgb](/configs/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet34 | None | 74.60 | 91.59 | 0.5 (320x3 frames) | 12975 | [ckpt](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r2plus1d_r34_32x2x1_180e_kinetics400_rgb_20200618-63462eb3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r21d_32x2.log) | [json](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r2plus1d_r34_32x2_74.6_91.6.log.json) | + +:::{note} + +1. The **gpus** column gives the number of GPUs used to train the checkpoint. Note that the provided configs assume 8 GPUs by default. 
+ According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), which uses the frame sampling strategy of the test setting and counts only the model inference time, excluding IO and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to measure the inference time. +3. The Kinetics400 validation set we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. + +::: + +For more details on data preparation, refer to the Kinetics400 section in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the R(2+1)D model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py \ + --work-dir work_dirs/r2plus1d_r34_3d_8x8x1_180e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. 
+ +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the R(2+1)D model on the Kinetics-400 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips=prob +``` + +For more details, refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset). + +## Citation + +```BibTeX +@inproceedings{tran2018closer, + title={A closer look at spatiotemporal convolutions for action recognition}, + author={Tran, Du and Wang, Heng and Torresani, Lorenzo and Ray, Jamie and LeCun, Yann and Paluri, Manohar}, + booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition}, + pages={6450--6459}, + year={2018} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..0d2e1e22da29fe12efb48d1c35fb7ef40a1b5925 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/README_zh-CN.md @@ -0,0 +1,73 @@ +# R2plus1D + +## Introduction + + + +```BibTeX +@inproceedings{tran2018closer, + title={A closer look at spatiotemporal convolutions for action recognition}, + author={Tran, Du and Wang, Heng and Torresani, Lorenzo and Ray, Jamie and LeCun, Yann and Paluri, Manohar}, + booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition}, + pages={6450--6459}, + year={2018} +} +``` + +## Model Zoo + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------ | 
:------: | :------: | :------: | :----: | :---------: | :---------: | :----------------: | :--------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: | +| [r2plus1d_r34_8x8x1_180e_kinetics400_rgb](/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet34 | None | 67.30 | 87.65 | x | 5019 | [ckpt](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb_20200729-aa94765e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/20200728_021421.log) | [json](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/20200728_021421.log.json) | +| [r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet34 | None | 67.3 | 87.8 | x | 5019 | [ckpt](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb_20200826-ab35a529.pth) | [log](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/20200724_201360.log) | [json](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/20200724_201360.log.json) | +| [r2plus1d_r34_8x8x1_180e_kinetics400_rgb](/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py) | 
short-side 320 | 8x2 | ResNet34 | None | 68.68 | 88.36 | 1.6 (80x3 frames) | 5019 | [ckpt](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_8x8x1_180e_kinetics400_rgb_20200618-3fce5629.pth) | [log](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r21d_8x8.log) | [json](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_8x8_69.58_88.36.log.json) | +| [r2plus1d_r34_32x2x1_180e_kinetics400_rgb](/configs/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet34 | None | 74.60 | 91.59 | 0.5 (320x3 frames) | 12975 | [ckpt](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r2plus1d_r34_32x2x1_180e_kinetics400_rgb_20200618-63462eb3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r21d_32x2.log) | [json](https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r2plus1d_r34_32x2_74.6_91.6.log.json) | + +Notes: + +1. The **gpus** column gives the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 assume training on 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when you use a different number of GPUs or a different number of videos per GPU, you need to scale the learning rate in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. The **inference_time** is measured with the [benchmark script](/tools/analysis/benchmark.py), which uses the frame sampling strategy of the test setting and counts only the model inference time, + excluding IO and pre-processing time. For each config, MMAction2 uses 1 GPU and sets the batch size (videos per GPU) to 1 to measure the inference time. +3. 
The Kinetics400 validation set we use contains 19796 videos, which can be downloaded from [validation videos](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format: video ID, number of frames, label index) and [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (label index to class name) are also provided. + +For details on dataset preparation, refer to the Kinetics400 section of the [data preparation guide](/docs_zh_CN/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the R(2+1)D model on the Kinetics400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py \ + --work-dir work_dirs/r2plus1d_r34_3d_8x8x1_180e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more training details, refer to the **Training setting** section of [getting started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the R(2+1)D model on the Kinetics400 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips=prob +``` + +For more testing details, refer to the **Test a dataset** section of [getting started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..f7056af424d0102f1f918af919b22a4f6cfaf416 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/metafile.yml @@ -0,0 +1,99 @@ +Collections: +- Name: R2Plus1D + README: 
configs/recognition/r2plus1d/README.md + Paper: + URL: https://arxiv.org/abs/1711.11248 + Title: A Closer Look at Spatiotemporal Convolutions for Action Recognition +Models: +- Config: configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py + In Collection: R2Plus1D + Metadata: + Architecture: ResNet34 + Batch Size: 8 + Epochs: 180 + FLOPs: 53175572992 + Parameters: 63759281 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: r2plus1d_r34_8x8x1_180e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 67.3 + Top 5 Accuracy: 87.65 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/20200728_021421.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/20200728_021421.log + Weights: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_256p_8x8x1_180e_kinetics400_rgb_20200729-aa94765e.pth +- Config: configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py + In Collection: R2Plus1D + Metadata: + Architecture: ResNet34 + Batch Size: 16 + Epochs: 180 + FLOPs: 53175572992 + Parameters: 63759281 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 67.3 + Top 5 Accuracy: 87.8 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/20200724_201360.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/20200724_201360.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb_20200826-ab35a529.pth +- Config: configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py + In Collection: R2Plus1D + Metadata: + Architecture: ResNet34 + Batch Size: 8 + Epochs: 180 + FLOPs: 53175572992 + Parameters: 63759281 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: r2plus1d_r34_8x8x1_180e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 68.68 + Top 5 Accuracy: 88.36 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_8x8_69.58_88.36.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r21d_8x8.log + Weights: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_8x8x1_180e_kinetics400_rgb_20200618-3fce5629.pth +- Config: configs/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb.py + In Collection: R2Plus1D + Metadata: + Architecture: ResNet34 + Batch Size: 6 + Epochs: 180 + FLOPs: 212701677568 + Parameters: 63759281 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: r2plus1d_r34_32x2x1_180e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 74.6 + Top 5 Accuracy: 91.59 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r2plus1d_r34_32x2_74.6_91.6.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r21d_32x2.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb/r2plus1d_r34_32x2x1_180e_kinetics400_rgb_20200618-63462eb3.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..53b17630990ba4efc3f38a9d4ad6af36a5a9575a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_32x2x1_180e_kinetics400_rgb.py @@ -0,0 +1,81 @@ +_base_ = ['./r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + 
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + test_mode=True), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline, + test_mode=True)) +# optimizer +optimizer = dict( + type='SGD', lr=0.075, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = './work_dirs/r2plus1d_r34_3d_32x2x1_180e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..f06d5696a28c83db2dffbee9c682fd162167c2ae --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py @@ -0,0 +1,92 @@ +_base_ = [ + '../../_base_/models/r2plus1d_r34.py', '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 
'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + test_mode=True), + 
test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline, + test_mode=True)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 180 + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/r2plus1d_r34_8x8x1_180e_kinetics400_rgb/' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..49c85c2ae770e49329a6b82fdf08ff6250bca3a0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py @@ -0,0 +1,87 @@ +_base_ = ['./r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py'] + +# model settings +model = dict(backbone=dict(act_cfg=dict(type='ReLU'))) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), 
keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + test_mode=True), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline, + test_mode=True)) +# optimizer +optimizer = dict( + type='SGD', lr=0.2, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = './work_dirs/r2plus1d_r34_video_3d_8x8x1_180e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_video_inference_8x8x1_180e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_video_inference_8x8x1_180e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..cb4bb161d46207f9fad0b3d9e1c1430bb098ef9d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/r2plus1d/r2plus1d_r34_video_inference_8x8x1_180e_kinetics400_rgb.py @@ -0,0 +1,33 @@ +_base_ = ['../../_base_/models/r2plus1d_r34.py'] + +# model settings +model = dict(backbone=dict(act_cfg=dict(type='ReLU'))) + +# dataset settings +dataset_type = 'VideoDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/README.md new file mode 100644 index 0000000000000000000000000000000000000000..35fbc3f319ec5e9ee78517eaaa23a7262b519802 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/README.md @@ -0,0 +1,101 @@ +# SlowFast + +[SlowFast Networks for Video Recognition](https://openaccess.thecvf.com/content_ICCV_2019/html/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.html) + + + +## Abstract + + + +We present SlowFast networks for video recognition. 
Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pin-pointed as contributions by our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks, Kinetics, Charades and AVA. + + + +
+ +
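To make the two pathways concrete, here is a minimal sketch of how one sampled clip could be split into Slow and Fast inputs by temporal striding. It is an illustration only, not this repository's implementation: `split_pathways`, `resample_rate`, and `speed_ratio` are hypothetical names, and the default values of 8 and 8 are assumptions chosen so that a 32-frame clip yields the "(32+4)" frames per clip listed for the 4x16x1 setting in the results table.

```python
def split_pathways(frame_indices, resample_rate=8, speed_ratio=8):
    """Return (slow, fast) frame index lists taken from one sampled clip.

    The Slow pathway keeps every `resample_rate`-th frame (low frame rate);
    the Fast pathway keeps `speed_ratio` times as many frames (high frame rate).
    """
    if resample_rate % speed_ratio != 0:
        raise ValueError("resample_rate must be a multiple of speed_ratio")
    slow = frame_indices[::resample_rate]
    fast = frame_indices[::resample_rate // speed_ratio]
    return slow, fast


if __name__ == "__main__":
    clip = list(range(0, 64, 2))  # 32 frames, taking every 2nd video frame
    slow, fast = split_pathways(clip)
    print(len(slow), len(fast))  # frames seen by the Slow and Fast pathways
    print(slow)
```

With both values set to 4 instead, the same 32-frame clip gives 8 Slow frames and 32 Fast frames, which corresponds to the "(32+8)" entry of the 8x8x1 setting.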
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------------------: | :------: | :------: | :------: | :----------------------: | :--------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet50 | None | 74.75 | 91.73 | x | 6203 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb_20200728-145f1097.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/20200731_151706.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/20200731_151706.log.json) | +| [slowfast_r50_video_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | None | 73.95 | 91.50 | x | 6203 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/slowfast_r50_video_4x16x1_256e_kinetics400_rgb_20200826-f85b90c5.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/20200812_160237.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/20200812_160237.log.json) | +| [slowfast_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | None | 76.0 | 92.54 | 1.6 ((32+4)x10x3 frames) | 6203 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_256e_kinetics400_rgb_20210722-04e43ed4.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_20210722.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_20210722.log.json) | +| [slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | None | 76.34 | 92.67 | x | 6203 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb_20210722-bb725050.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb/slowfast_prebn_r50_4x16x1_20210722.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb/slowfast_prebn_r50_4x16x1_20210722.log.json) | +| [slowfast_r50_8x8x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py) | short-side 320 | 8x3 
| ResNet50 | None | 76.94 | 92.8 | 1.3 ((32+8)x10x3 frames) | 9062 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/20200716_192653.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/20200716_192653.log.json) | +| [slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr](/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.py) | short-side 320 | 8x4 | ResNet50 | None | 76.34 | 92.61 | | 9062 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr-43988bac.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.json) | +| [slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | None | 76.07 | 92.21 | x | 9062 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb-f82bd304.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.json) | +| 
[slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr](/configs/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.py) | short-side 320 | 8x4 | ResNet50 | None | 76.58 | 92.85 | | 9062 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr-28474e54.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.json) | +| [slowfast_r101_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r101_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 256 | 8x1 | ResNet101 + ResNet50 | None | 76.69 | 93.07 | | 16628 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/slowfast_r101_4x16x1_256e_kinetics400_rgb_20210218-d8b58813.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/20210118_133528.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/20210118_133528.log.json) | +| [slowfast_r101_8x8x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet101 | None | 77.90 | 93.51 | | 25994 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/slowfast_r101_8x8x1_256e_kinetics400_rgb_20210218-0dd54025.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/20210218_121513.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/20210218_121513.log.json) | +| [slowfast_r152_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r152_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 256 | 8x1 | ResNet152 + ResNet50 | None | 77.13 | 93.20 | | 10077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/slowfast_r152_4x16x1_256e_kinetics400_rgb_20210122-bdeb6b87.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/20210122_131321.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/20210122_131321.log.json) | + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------ | :--------: | :--: | :------: | :---------: | :------: | :------: | :---------------------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_r50_16x8x1_22e_sthv1_rgb](/configs/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | Kinetics400 | 49.67 | 79.00 | x | 9293 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb_20211202-aaaf9279.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb.json) | + +:::{note} + +1. The **gpus** column gives the number of GPUs we used to train the checkpoint. Note that the configs we provide assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), which uses the frame sampling strategy of the test setting and counts only the model inference time, excluding IO time and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time. +3. The Kinetics400 validation set we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. + +::: + +For more details on data preparation, you can refer to Kinetics400 in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. 
+ +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the SlowFast model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py \ + --work-dir work_dirs/slowfast_r50_4x16x1_256e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, you can refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the SlowFast model on the Kinetics-400 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips=prob +``` + +For more details, you can refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset). 
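As a rough illustration of the Linear Scaling Rule mentioned in the notes above, the sketch below rescales a learning rate when the number of GPUs or the videos per GPU changes. The helper `scale_lr` and its parameter names are hypothetical and not part of MMAction2; the reference point (lr=0.01 at 4 GPUs x 2 videos/GPU) is taken from the note's own example.

```python
def scale_lr(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Scale a learning rate linearly with the total batch size.

    Hypothetical helper for illustration only; it is not an MMAction2 API.
    """
    if min(base_gpus, base_videos_per_gpu, gpus, videos_per_gpu) <= 0:
        raise ValueError("GPU counts and videos per GPU must be positive")
    base_batch = base_gpus * base_videos_per_gpu  # reference total batch size
    new_batch = gpus * videos_per_gpu             # total batch size of your run
    return base_lr * new_batch / base_batch


# Reference from the note: lr=0.01 for 4 GPUs x 2 videos/GPU (batch size 8).
# Moving to 16 GPUs x 4 videos/GPU (batch size 64) scales the rate by 8.
lr = scale_lr(0.01, base_gpus=4, base_videos_per_gpu=2, gpus=16, videos_per_gpu=4)
print(f"{lr:.2f}")  # 0.08
```

The resulting value would typically be written into the `lr` field of the `optimizer` dict in the config you train with. Treat it as a starting point; the rule is a heuristic and may need tuning for very small or very large batch sizes.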
+ +## Citation + +```BibTeX +@inproceedings{feichtenhofer2019slowfast, + title={Slowfast networks for video recognition}, + author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming}, + booktitle={Proceedings of the IEEE international conference on computer vision}, + pages={6202--6211}, + year={2019} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..479664598ea16344b749a5c89682772963ad4e99 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/README_zh-CN.md @@ -0,0 +1,86 @@ +# SlowFast + +## 简介 + + + +```BibTeX +@inproceedings{feichtenhofer2019slowfast, + title={Slowfast networks for video recognition}, + author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming}, + booktitle={Proceedings of the IEEE international conference on computer vision}, + pages={6202--6211}, + year={2019} +} +``` + +## 模型库 + +### Kinetics-400 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------------------------- | :-----: | :------: | :------------------: | :----: | :---------: | :---------: | :----------------------: | :--------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py) | 短边256 | 8x4 | ResNet50 | None | 74.75 | 91.73 | x | 6203 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb_20200728-145f1097.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/20200731_151706.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/20200731_151706.log.json) | +| [slowfast_r50_video_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 短边256 | 8 | ResNet50 | None | 73.95 | 91.50 | x | 6203 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/slowfast_r50_video_4x16x1_256e_kinetics400_rgb_20200826-f85b90c5.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/20200812_160237.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/20200812_160237.log.json) | +| [slowfast_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py) | 短边320 | 8x2 | ResNet50 | None | 76.0 | 92.54 | 1.6 ((32+4)x10x3 frames) | 6203 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_256e_kinetics400_rgb_20210722-04e43ed4.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_20210722.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_20210722.log.json) | +| [slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb.py) | 短边320 | 8x2 | ResNet50 | None | 76.34 | 92.67 | x | 6203 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb_20210722-bb725050.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb/slowfast_prebn_r50_4x16x1_20210722.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb/slowfast_prebn_r50_4x16x1_20210722.log.json) | +| [slowfast_r50_8x8x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py) | 短边320 | 8x3 | ResNet50 | None | 76.94 | 92.8 | 1.3 ((32+8)x10x3 frames) | 9062 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/20200716_192653.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/20200716_192653.log.json) | +| [slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr](/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py) | 短边320 | 8x4 | ResNet50 | None | 76.34 | 92.61 | 1.3 ((32+8)x10x3 frames) | 9062 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr-43988bac.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.json) | +| [slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.py) | 短边320 | 8x2 | ResNet50 | None | 76.07 | 92.21 | x | 9062 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb-f82bd304.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.json) | +| [slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr](/configs/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb.py) | 短边320 | 8x4 | ResNet50 | None | 76.58 | 92.85 | 1.3 ((32+8)x10x3 frames) | 9062 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr-28474e54.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.json) | +| [slowfast_r101_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r101_r50_4x16x1_256e_kinetics400_rgb.py) | 短边256 | 8x1 | ResNet101 + ResNet50 | None | 76.69 | 93.07 | | 16628 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/slowfast_r101_4x16x1_256e_kinetics400_rgb_20210218-d8b58813.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/20210118_133528.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/20210118_133528.log.json) | +| [slowfast_r101_8x8x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb.py) | 短边256 | 8x4 | ResNet101 | None | 77.90 | 93.51 | | 25994 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/slowfast_r101_8x8x1_256e_kinetics400_rgb_20210218-0dd54025.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/20210218_121513.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/20210218_121513.log.json) | +| [slowfast_r152_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowfast/slowfast_r152_r50_4x16x1_256e_kinetics400_rgb.py) | 短边256 | 8x1 | ResNet152 + ResNet50 | None | 77.13 | 93.20 | | 10077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/slowfast_r152_4x16x1_256e_kinetics400_rgb_20210122-bdeb6b87.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/20210122_131321.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/20210122_131321.log.json) | + +### Something-Something V1 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------ | :----: | :------: | :------: | :---------: | :---------: | :---------: | :----------------: | :--------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowfast_r50_16x8x1_22e_sthv1_rgb](/configs/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb.py) | 高 100 | 8 | ResNet50 | Kinetics400 | 49.67 | 79.00 | x | 9293 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb_20211202-aaaf9279.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb.json) | + +注: + +1. 这里的 **GPU 数量** 指的是得到模型权重文件对应的 GPU 个数。默认地,MMAction2 所提供的配置文件对应使用 8 块 GPU 进行训练的情况。 + 依据 [线性缩放规则](https://arxiv.org/abs/1706.02677),当用户使用不同数量的 GPU 或者每块 GPU 处理不同视频个数时,需要根据批大小等比例地调节学习率。 + 如,lr=0.01 对应 4 GPUs x 2 video/gpu,以及 lr=0.08 对应 16 GPUs x 4 video/gpu。 +2. 
这里的 **推理时间** 是根据 [基准测试脚本](/tools/analysis/benchmark.py) 获得的,采用测试时的采帧策略,且只考虑模型的推理时间, + 并不包括 IO 时间以及预处理时间。对于每个配置,MMAction2 使用 1 块 GPU 并设置批大小(每块 GPU 处理的视频个数)为 1 来计算推理时间。 +3. 我们使用的 Kinetics400 验证集包含 19796 个视频,用户可以从 [验证集视频](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB) 下载这些视频。同时也提供了对应的 [数据列表](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (每行格式为:视频 ID,视频帧数目,类别序号)以及 [标签映射](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (类别序号到类别名称)。 + +对于数据集准备的细节,用户可参考 [数据集准备文档](/docs_zh_CN/data_preparation.md) 中的 Kinetics400 部分。 + +## 如何训练 + +用户可以使用以下指令进行模型训练。 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +例如:以一个确定性的训练方式,辅以定期的验证过程进行 SlowFast 模型在 Kinetics400 数据集上的训练。 + +```shell +python tools/train.py configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py \ + --work-dir work_dirs/slowfast_r50_4x16x1_256e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +更多训练细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) 中的 **训练配置** 部分。 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 Kinetics-400 数据集上测试 SlowFast 模型,并将结果导出为一个 json 文件。 + +```shell +python tools/test.py configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips=prob +``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 **测试某个数据集** 部分。 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..353631e385d2da991b766a917effc065f878cd3d --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/metafile.yml @@ -0,0 +1,260 @@ +Collections: +- Name: SlowFast + README: configs/recognition/slowfast/README.md + Paper: + URL: https://arxiv.org/abs/1812.03982 + Title: SlowFast Networks for Video Recognition +Models: +- Config: configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 36441296896 + Parameters: 34479288 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: slowfast_r50_4x16x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 74.75 + Top 5 Accuracy: 91.73 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/20200731_151706.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/20200731_151706.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb_20200728-145f1097.pth +- Config: configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 36441296896 + Parameters: 34479288 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: slowfast_r50_video_4x16x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.95 + Top 5 Accuracy: 91.50 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/20200812_160237.log.json + Training Log: 
https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/20200812_160237.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/slowfast_r50_video_4x16x1_256e_kinetics400_rgb_20200826-f85b90c5.pth +- Config: configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 36441296896 + Parameters: 34479288 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 24 GPUs + Modality: RGB + Name: slowfast_r50_4x16x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 75.64 + Top 5 Accuracy: 92.3 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/20200704_232901.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/20200704_232901.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth +- Config: configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 66222034944 + Parameters: 34565560 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 24 GPUs + Modality: RGB + Name: slowfast_r50_8x8x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.94 + Top 5 Accuracy: 92.8 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/20200716_192653.log.json + Training Log: 
https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/20200716_192653.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth +- Config: configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.py + In Collection: SlowFast + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 66222034944 + Parameters: 34565560 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.34 + Top 5 Accuracy: 92.61 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr-43988bac.pth +- Config: configs/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 66222034944 + Parameters: 34565560 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.07 + Top 5 Accuracy: 92.21 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb-f82bd304.pth +- Config: configs/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.py + In Collection: SlowFast + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 66222034944 + Parameters: 34565560 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.58 + Top 5 Accuracy: 92.85 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr-28474e54.pth +- Config: configs/recognition/slowfast/slowfast_r101_r50_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet101 + ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 65042780160 + Parameters: 62384312 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + 
Modality: RGB + Name: slowfast_r101_r50_4x16x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.69 + Top 5 Accuracy: 93.07 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/20210118_133528.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/20210118_133528.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_4x16x1_256e_kinetics400_rgb/slowfast_r101_4x16x1_256e_kinetics400_rgb_20210218-d8b58813.pth +- Config: configs/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet101 + Batch Size: 8 + Epochs: 256 + FLOPs: 127070375936 + Parameters: 62912312 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: slowfast_r101_8x8x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 77.9 + Top 5 Accuracy: 93.51 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/20210218_121513.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/20210218_121513.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb/slowfast_r101_8x8x1_256e_kinetics400_rgb_20210218-0dd54025.pth +- Config: configs/recognition/slowfast/slowfast_r152_r50_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet152 + ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 91515654144 + Parameters: 84843704 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + 
Name: slowfast_r152_r50_4x16x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 77.13 + Top 5 Accuracy: 93.2 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/20210122_131321.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/20210122_131321.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r152_4x16x1_256e_kinetics400_rgb/slowfast_r152_4x16x1_256e_kinetics400_rgb_20210122-bdeb6b87.pth +- Config: configs/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb.py + In Collection: SlowFast + Metadata: + Architecture: ResNet50 + Batch Size: 4 + Epochs: 22 + FLOPs: 132442627584 + Parameters: 34044630 + Pretrained: Kinetics400 + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: slowfast_r50_16x8x1_22e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 49.67 + Top 5 Accuracy: 79.00 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb/slowfast_r50_16x8x1_22e_sthv1_rgb_20211202-aaaf9279.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7abce40f178559cb7af2f9b4969ce4b4576a8a84 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb.py @@ -0,0 +1,153 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=4, # tau + speed_ratio=4, # alpha + channel_ratio=8, # beta_inv + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + lateral_norm=True, + fusion_kernel=7, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + norm_eval=False)), + cls_head=dict( + type='SlowFastHead', + in_channels=2304, # 2048+256 + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) + +train_cfg = None +test_cfg = dict(average_clips='prob') + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 
'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=4, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001) # 16gpu 0.1->0.2 +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +lr_config = dict(policy='step', step=[94, 154, 196]) + +total_epochs = 239 + +evaluation = dict( + interval=3, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' + +checkpoint_config = dict(interval=3) +workflow = [('train', 1)] + +find_unused_parameters = False + +multigrid = dict( + long_cycle=True, + short_cycle=True, + epoch_factor=1.5, + long_cycle_factors=[[0.25, 0.7071], 
[0.5, 0.7071], [0.5, 1], [1, 1]], + short_cycle_factors=[0.5, 0.7071], + default_s=(224, 224), +) + +precise_bn = dict(num_iters=200, interval=3) + +load_from = None +resume_from = None + +work_dir = './work_dirs/slowfast_multigrid_r50_8x8x1_358e_kinetics400_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..b407bc15dd42850061d41c0f8e22a7bffb84fcc2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/slowfast_r50.py', '../../_base_/default_runtime.py' +] + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + 
dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=4, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34) +total_epochs = 256 + +# precise bn +precise_bn = dict(num_iters=200, interval=1) + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowfast_prebn_r50_4x16x1_256e_kinetics400_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.py new file mode 100644 index 
0000000000000000000000000000000000000000..392990c7fd9017ece9f4df0091d26a9fff1ec37d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr.py @@ -0,0 +1,15 @@ +_base_ = ['./slowfast_r50_8x8x1_256e_kinetics400_rgb.py'] + +model = dict(backbone=dict(slow_pathway=dict(lateral_norm=True))) + +lr_config = dict( + policy='step', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34, + step=[94, 154, 196]) + +precise_bn = dict(num_iters=200, interval=5) + +work_dir = './work_dirs/slowfast_prebn_r50_8x8x1_256e_kinetics400_rgb_steplr' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..31c52441e8eea6fb56d0d44e4e5e3b89b94eec8f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r101_8x8x1_256e_kinetics400_rgb.py @@ -0,0 +1,137 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=4, # tau + speed_ratio=4, # alpha + channel_ratio=8, # beta_inv + slow_pathway=dict( + type='resnet3d', + depth=101, + pretrained=None, + lateral=True, + fusion_kernel=7, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + fast_pathway=dict( + type='resnet3d', + depth=101, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + norm_eval=False)), + cls_head=dict( + type='SlowFastHead', + in_channels=2304, # 2048+256 + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=None, + test_cfg=dict(average_clips='prob', max_testing_views=10)) + +dataset_type = 'RawframeDataset' +data_root = 
'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + 
ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34) +total_epochs = 256 +checkpoint_config = dict(interval=4) +workflow = [('train', 1)] +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/slowfast_r101_8x8x1_256e_kinetics400_rgb' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r101_r50_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r101_r50_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..b8da9030e6b13f9f9c3404245af6376441a09b8d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r101_r50_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,136 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=8, # tau + speed_ratio=8, # alpha + channel_ratio=8, # beta_inv + slow_pathway=dict( + type='resnet3d', + depth=101, + pretrained=None, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + 
base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + norm_eval=False)), + cls_head=dict( + type='SlowFastHead', + in_channels=2304, # 2048+256 + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=None, + test_cfg=dict(average_clips='prob', max_testing_views=10)) + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34) +total_epochs = 256 +checkpoint_config = dict(interval=4) +workflow = [('train', 1)] +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/slowfast_r101_r50_4x16x1_256e_kinetics400_rgb' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r152_r50_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r152_r50_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..0d9cd7ee10678ba82ca7dde1bb8675a84c461e44 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r152_r50_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,136 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowFast', + pretrained=None, + resample_rate=8, # 
tau + speed_ratio=8, # alpha + channel_ratio=8, # beta_inv + slow_pathway=dict( + type='resnet3d', + depth=152, + pretrained=None, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + norm_eval=False)), + cls_head=dict( + type='SlowFastHead', + in_channels=2304, # 2048+256 + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=None, + test_cfg=dict(average_clips='prob', max_testing_views=8)) + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 
'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34) +total_epochs = 256 +checkpoint_config = dict(interval=4) +workflow = [('train', 1)] +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/slowfast_r152_r50_4x16x1_256e_kinetics400_rgb' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb.py new file mode 100644 index 
0000000000000000000000000000000000000000..6cc79902dbeb44563522b5ca26ceae689fa8125a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_16x8x1_22e_sthv1_rgb.py @@ -0,0 +1,111 @@ +_base_ = [ + '../../_base_/models/slowfast_r50.py', '../../_base_/default_runtime.py' +] + +model = dict( + backbone=dict( + resample_rate=4, # tau + speed_ratio=4, # alpha + channel_ratio=8, # beta_inv + slow_pathway=dict(fusion_kernel=7)), + cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' + +sthv1_flip_label_map = {2: 4, 4: 2, 30: 41, 41: 30, 52: 66, 66: 52} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=64, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=sthv1_flip_label_map), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=64, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=64, + frame_interval=2, + 
num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) + +evaluation = dict( + interval=1, metrics=['top_k_accuracy'], start=18, gpu_collect=True) + +# optimizer +optimizer = dict( + type='SGD', lr=0.06, momentum=0.9, + weight_decay=0.000001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='step', + step=[14, 18], + warmup='linear', + warmup_by_epoch=False, + warmup_iters=16343 // 32) +total_epochs = 22 + +# runtime settings +checkpoint_config = dict(interval=1) +work_dir = './work_dirs/slowfast_r50_16x8x1_22e_sthv1_rgb' +load_from = 'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth' # noqa: E501 +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7e455a7ca6e041634db73d0631aed9c26441af3b --- 
/dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,94 @@ +_base_ = [ + '../../_base_/models/slowfast_r50.py', '../../_base_/default_runtime.py' +] + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + 
dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34) +total_epochs = 256 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowfast_r50_3d_4x16x1_256e_kinetics400_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..49a30be628ecd73596856198458915c7c5a3d0ef --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py @@ -0,0 +1,10 @@ +_base_ = ['./slowfast_r50_4x16x1_256e_kinetics400_rgb.py'] + +model = dict( + backbone=dict( + resample_rate=4, # tau + speed_ratio=4, # alpha + channel_ratio=8, # beta_inv + slow_pathway=dict(fusion_kernel=7))) + +work_dir = './work_dirs/slowfast_r50_3d_8x8x1_256e_kinetics400_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.py new file mode 100644 index 0000000000000000000000000000000000000000..284e10705065537ee7cc2d5bb140e23714d7d977 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr.py @@ -0,0 +1,13 @@ +_base_ = ['./slowfast_r50_8x8x1_256e_kinetics400_rgb.py'] + +model = dict(backbone=dict(slow_pathway=dict(lateral_norm=True))) + +lr_config = dict( + policy='step', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34, + step=[94, 154, 196]) + +work_dir = './work_dirs/slowfast_r50_8x8x1_256e_kinetics400_rgb_steplr' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7335b3e7b4f4269b26647515acb28b2e1dce4804 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,85 @@ +_base_ = ['./slowfast_r50_4x16x1_256e_kinetics400_rgb.py'] + +model = dict( + backbone=dict( + resample_rate=8, # tau + speed_ratio=8, # alpha + channel_ratio=8 # beta_inv + )) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='DecordDecode'), + 
dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/slowfast_r50_video_3d_4x16x1_256e_kinetics400_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_video_inference_4x16x1_256e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_video_inference_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..aac3615fcc7ec599558ea8789de1c2e61c73ab2e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowfast/slowfast_r50_video_inference_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,32 @@ +_base_ = ['../../_base_/models/slowfast_r50.py'] + +# dataset settings +dataset_type = 'VideoDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/README.md new file mode 100644 index 0000000000000000000000000000000000000000..6e18ffc110859fd13b21e9a97fa9a60224666d67 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/README.md @@ -0,0 +1,160 @@ +# SlowOnly + +[Slowfast networks for video recognition](https://openaccess.thecvf.com/content_ICCV_2019/html/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.html) + + + +## Abstract + + + +We present SlowFast networks for video recognition. 
Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pin-pointed as contributions by our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks, Kinetics, Charades and AVA. + + + +
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------: | :------: | :------: | :------: | :---------------------: | :--------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet50 | None | 72.76 | 90.51 | x | 3168 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb_20200820-bea7701f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log.json) | +| [slowonly_r50_video_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | None | 72.90 | 90.82 | x | 8472 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014.json) | +| [slowonly_r50_8x8x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet50 | None | 74.42 | 91.49 | x | 5820 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb_20200820-75851a7d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/20200817_003320.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/20200817_003320.log.json) | +| [slowonly_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | None | 73.02 | 90.77 | 4.0 (40x3 frames) | 3168 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json) | +| [slowonly_r50_8x8x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py) | 
short-side 320 | 8x3 | ResNet50 | None | 74.93 | 91.92 | 2.3 (80x3 frames) | 5820 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/slowonly_r50_8x8x1_256e_kinetics400_rgb_20200703-a79c555a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/so_8x8.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/slowonly_r50_8x8_74.93_91.92.log.json) | +| [slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | ImageNet | 73.39 | 91.12 | x | 3168 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912-1e8fc736.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912.json) | +| [slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb.py) | short-side 320 | 8x4 | ResNet50 | ImageNet | 75.55 | 92.04 | x | 5820 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912-3f9ce182.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912.json) | +| [slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | ImageNet | 74.54 | 91.73 | x | 4435 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb_20210308-0d6e5a69.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/20210305_152630.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/20210305_152630.log.json) | +| [slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb.py) | short-side 320 | 8x4 | ResNet50 | ImageNet | 76.07 | 92.42 | x | 8895 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb_20210308-e8dd9e82.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/20210308_212250.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/20210308_212250.log.json) | +| 
[slowonly_r50_4x16x1_256e_kinetics400_flow](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow.py) | short-side 320 | 8x2 | ResNet50 | ImageNet | 61.79 | 83.62 | x | 8450 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_20200704-decb8568.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_61.8_83.6.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_61.8_83.6.log.json) | +| [slowonly_r50_8x8x1_196e_kinetics400_flow](/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow.py) | short-side 320 | 8x4 | ResNet50 | ImageNet | 65.76 | 86.25 | x | 8455 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_256e_kinetics400_flow_20200704-6b384243.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_196e_kinetics400_flow_65.8_86.3.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_196e_kinetics400_flow_65.8_86.3.log.json) | + +### Kinetics-400 Data Benchmark + +In the data benchmark, we compare three different data preprocessing methods: (1) resize the video to 340x256, (2) resize the short edge of the video to 320px, (3) resize the short edge of the video to 256px. 
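As a rough illustration of how the preprocessing options listed above differ, the sketch below computes the frame size each one produces for a given source video. The helper names `resize_fixed` and `resize_short_side` are hypothetical (they are not part of MMAction2), and rounding to the nearest integer is an assumption; the actual preprocessing scripts may round differently.

```python
from typing import Tuple


def resize_fixed(width: int, height: int,
                 target: Tuple[int, int] = (340, 256)) -> Tuple[int, int]:
    """Fixed-size option: every video gets the same (width, height).

    The source aspect ratio is ignored, so frames may be stretched.
    """
    return target


def resize_short_side(width: int, height: int,
                      short_side: int) -> Tuple[int, int]:
    """Short-side option: scale so the shorter edge equals `short_side`.

    The aspect ratio is preserved; rounding is an assumption of this sketch.
    """
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    src_w, src_h = 1280, 720  # example source resolution (assumed)
    print("340x256       :", resize_fixed(src_w, src_h))
    print("short-side 320:", resize_short_side(src_w, src_h, 320))
    print("short-side 256:", resize_short_side(src_w, src_h, 256))
```

The fixed-size option gives every video identical dimensions at the cost of distortion, while the short-side options keep the aspect ratio, so the longer edge varies from video to video.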
+ +| config | resolution | gpus | backbone | Input | pretrain | top1 acc | top5 acc | testing protocol | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------: | :---: | :------: | :------: | :------: | :----------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb.py) | 340x256 | 8x2 | ResNet50 | 4x16 | None | 71.61 | 90.05 | 10 clips x 3 crops | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb_20200803-dadca1a3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb_20200803.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb_20200803.json) | +| [slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | 4x16 | None | 73.02 | 90.77 | 10 clips x 3 crops | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json) | +| [slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet50 | 4x16 | None | 72.76 | 90.51 | 10 clips x 3 crops | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb_20200820-bea7701f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log.json) | + +### Kinetics-400 OmniSource Experiments + +| config | resolution | backbone | pretrain | w. 
OmniSource | top1 acc | top5 acc | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------: | :------------: | :-------: | :------: | :----------------: | :------: | :------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 320 | ResNet50 | None | :x: | 73.0 | 90.8 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json) | +| x | x | ResNet50 | None | :heavy_check_mark: | 76.8 | 92.5 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth) | x | x | +| [slowonly_r101_8x8x1_196e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py) | x | ResNet101 | None | :x: | 76.5 | 92.7 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_without_omni_8x8x1_kinetics400_rgb_20200926-0c730aef.pth) | x | x | +| x | x | ResNet101 | None | :heavy_check_mark: | 80.4 | 94.4 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth) | x | x | + +### Kinetics-600 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------ | :------------: | :--: | :------: | :------: | :------: | :------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_video_8x8x1_256e_kinetics600_rgb](/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb.py) | short-side 256 | 8x4 | ResNet50 | None | 77.5 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015-81e5153e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015.json) | + +### Kinetics-700 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------ | :------------: | 
:--: | :------: | :------: | :------: | :------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_video_8x8x1_256e_kinetics700_rgb](/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb.py) | short-side 256 | 8x4 | ResNet50 | None | 65.0 | 86.1 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015-9250f662.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015.json) | + +### GYM99 + +| config | resolution | gpus | backbone | pretrain | top1 acc | mean class acc | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------------------------ | :------------: | :--: | :------: | :------: | :------: | :------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb.py) | short-side 256 | 8x2 | ResNet50 | ImageNet | 79.3 | 70.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111-a9c34b54.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111.json) | +| [slowonly_k400_pretrained_r50_4x16x1_120e_gym99_flow](/configs/recognition/slowonly/slowonly_k400_pretrained_r50_4x16x1_120e_gym99_flow.py) | short-side 256 | 8x2 | ResNet50 | Kinetics | 80.3 | 71.0 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111-66ecdb3c.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111.json) | +| 1: 
1 Fusion | | | | | 83.7 | 74.8 | | | | + +### Jester + +| config | resolution | gpus | backbone | pretrain | top1 acc | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 97.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb-b56a5389.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.json) | + +### HMDB51 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------- | :--: | :------: | :---------: | :------: | :------: | :--------: | 
:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb.py) | 8 | ResNet50 | ImageNet | 37.52 | 71.50 | 5812 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb_20210630-16faeb6a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/20210605_185256.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/20210605_185256.log.json) | +| [slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb](/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb.py) | 8 | ResNet50 | Kinetics400 | 65.95 | 91.05 | 5812 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb_20210630-cee5f725.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/20210606_010153.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/20210606_010153.log.json) | + +### UCF101 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| 
:---------------------------------------------------------------------------------------------------------------------------------------------- | :--: | :------: | :---------: | :------: | :------: | :--------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb.py) | 8 | ResNet50 | ImageNet | 71.35 | 89.35 | 5812 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb_20210630-181e1661.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/20210605_213503.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/20210605_213503.log.json) | +| [slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb](/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb.py) | 8 | ResNet50 | Kinetics400 | 92.78 | 99.42 | 5812 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb_20210630-ee8c850f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/20210606_010231.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/20210606_010231.log.json) | + +### Something-Something V1 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------------------------------- | :--: | :------: | :------: | :------: | :------: | :--------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.py) | 8 | ResNet50 | ImageNet | 47.76 | 77.49 | 7759 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb_20211202-d034ff12.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.json) | + +:::{note} + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. Note that the configs we provide assume 8 GPUs by default. 
+ According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), where we use the frame sampling strategy of the test setting and only count the model inference time, excluding IO time and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time. +3. The validation set of Kinetics400 we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. + +::: + +For more details on data preparation, refer to the corresponding parts in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the SlowOnly model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py \ + --work-dir work_dirs/slowonly_r50_4x16x1_256e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. 
+ +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the SlowOnly model on the Kinetics-400 dataset and dump the result to a JSON file. + +```shell +python tools/test.py configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips=prob +``` + +For more details, refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset). + +## Citation + +```BibTeX +@inproceedings{feichtenhofer2019slowfast, + title={Slowfast networks for video recognition}, + author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming}, + booktitle={Proceedings of the IEEE international conference on computer vision}, + pages={6202--6211}, + year={2019} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..a9d341b90942904991a04759a8fe686108979ac6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/README_zh-CN.md @@ -0,0 +1,145 @@ +# SlowOnly + +## Introduction + + + +```BibTeX +@inproceedings{feichtenhofer2019slowfast, + title={Slowfast networks for video recognition}, + author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming}, + booktitle={Proceedings of the IEEE international conference on computer vision}, + pages={6202--6211}, + year={2019} +} +``` + +## Model Zoo + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :------: | :---------: | 
:---------: | :----------------: | :--------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | 短边 256 | 8x4 | ResNet50 | None | 72.76 | 90.51 | x | 3168 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb_20200820-bea7701f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log.json) | +| [slowonly_r50_video_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 短边 320 | 8x2 | ResNet50 | None | 72.90 | 90.82 | x | 8472 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014.json) | +| [slowonly_r50_8x8x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py) | 短边 256 | 8x4 | ResNet50 | None | 74.42 | 91.49 | x | 5820 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb_20200820-75851a7d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/20200817_003320.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/20200817_003320.log.json) | +| [slowonly_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | 短边 320 | 8x2 | ResNet50 | None | 73.02 | 90.77 | 4.0 (40x3 frames) | 3168 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json) | +| [slowonly_r50_8x8x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py) | 短边 320 | 8x3 | ResNet50 | None | 74.93 | 91.92 | 2.3 (80x3 frames) | 5820 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/slowonly_r50_8x8x1_256e_kinetics400_rgb_20200703-a79c555a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/so_8x8.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/slowonly_r50_8x8_74.93_91.92.log.json) | +| [slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb.py) | 短边 320 | 8x2 | ResNet50 | ImageNet | 73.39 | 91.12 | x | 3168 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912-1e8fc736.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912.json) | +| [slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb.py) | 短边 320 | 8x4 | ResNet50 | ImageNet | 75.55 | 92.04 | x | 5820 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912-3f9ce182.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912.json) | +| 
[slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb.py) | 短边 320 | 8x2 | ResNet50 | ImageNet | 74.54 | 91.73 | x | 4435 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb_20210308-0d6e5a69.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/20210305_152630.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/20210305_152630.log.json) | +| [slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb.py) | 短边 320 | 8x4 | ResNet50 | ImageNet | 76.07 | 92.42 | x | 8895 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb_20210308-e8dd9e82.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/20210308_212250.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/20210308_212250.log.json) | +| [slowonly_r50_4x16x1_256e_kinetics400_flow](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow.py) | 短边 320 | 8x2 | ResNet50 | ImageNet | 61.79 | 83.62 | x | 8450 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_20200704-decb8568.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_61.8_83.6.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_61.8_83.6.log.json) | +| [slowonly_r50_8x8x1_196e_kinetics400_flow](/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow.py) | 短边 320 | 8x4 | ResNet50 | ImageNet | 65.76 | 86.25 | x | 8455 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_256e_kinetics400_flow_20200704-6b384243.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_196e_kinetics400_flow_65.8_86.3.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_196e_kinetics400_flow_65.8_86.3.log.json) | + +### Kinetics-400 Data Benchmark + +In the data benchmark, we compare three data preprocessing methods: (1) videos resized to 340x256, (2) videos resized to a short edge (短边) of 320px, and (3) videos resized to a short edge of 256px. 
+ +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 输入 | 预训练 | top1 准确率 | top5 准确率 | 测试方案 | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :--: | :----: | :---------: | :---------: | :----------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb.py) | 340x256 | 8x2 | ResNet50 | 4x16 | None | 71.61 | 90.05 | 10 clips x 3 crops | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb_20200803-dadca1a3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb_20200803.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb_20200803.json) | +| [slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb.py) | 短边 320 | 8x2 | ResNet50 | 4x16 | None | 73.02 | 90.77 | 10 clips x 3 crops | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json) | +| [slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb.py) | 短边 256 | 8x4 | ResNet50 | 4x16 | None | 72.76 | 90.51 | 10 clips x 3 crops | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb_20200820-bea7701f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log.json) | + +### Kinetics-400 OmniSource Experiments + +| 配置文件 | 分辨率 | 主干网络 | 预训练 | w. 
OmniSource | top1 准确率 | top5 准确率 | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------: | :------: | :-------: | :----: | :----------------: | :---------: | :---------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | 短边 320 | ResNet50 | None | :x: | 73.0 | 90.8 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json) | +| x | x | ResNet50 | None | :heavy_check_mark: | 76.8 | 92.5 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth) | x | x | +| [slowonly_r101_8x8x1_196e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py) | x | ResNet101 | None | :x: | 76.5 | 92.7 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_without_omni_8x8x1_kinetics400_rgb_20200926-0c730aef.pth) | x | x | +| x | x | ResNet101 | None | :heavy_check_mark: | 80.4 | 94.4 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth) | x | x | + +### Kinetics-600 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------ | :------: | :------: | :------: | :----: | :---------: | :---------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_video_8x8x1_256e_kinetics600_rgb](/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb.py) | 短边 256 | 8x4 | ResNet50 | None | 77.5 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015-81e5153e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015.json) | + +### Kinetics-700 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------ | :------: | :------: | :------: | :----: | :---------: | 
:---------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_video_8x8x1_256e_kinetics700_rgb](/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb.py) | 短边 256 | 8x4 | ResNet50 | None | 65.0 | 86.1 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015-9250f662.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015.json) | + +### GYM99 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | 类别平均准确率 | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------------------------ | :------: | :------: | :------: | :------: | :---------: | :------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb.py) | 短边 256 | 8x2 | ResNet50 | ImageNet | 79.3 | 70.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111-a9c34b54.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111.json) | +| [slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow](/configs/recognition/slowonly/slowonly_k400_pretrained_r50_4x16x1_120e_gym99_flow.py) | 短边 256 | 8x2 | ResNet50 | Kinetics | 80.3 | 71.0 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111-66ecdb3c.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111.json) | +| 1: 1 融合 | | | | | 83.7 | 74.8 | | | | + +### Jester + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | ckpt | log | json | +| 
:---------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :---------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.py) | 高 100 | 8 | ResNet50 | ImageNet | 97.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb-b56a5389.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.json) | + +### HMDB51 + +| 配置文件 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :---------: | :---------: | :---------: | :--------------: | 
:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb.py) | 8 | ResNet50 | ImageNet | 37.52 | 71.50 | 5812 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb_20210630-16faeb6a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/20210605_185256.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/20210605_185256.log.json) | +| [slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb](/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb.py) | 8 | ResNet50 | Kinetics400 | 65.95 | 91.05 | 5812 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb_20210630-cee5f725.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/20210606_010153.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/20210606_010153.log.json) | + +### UCF101 + +| 配置文件 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| 
:---------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :---------: | :---------: | :---------: | :--------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb.py) | 8 | ResNet50 | ImageNet | 71.35 | 89.35 | 5812 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb_20210630-181e1661.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/20210605_213503.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/20210605_213503.log.json) | +| [slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb](/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb.py) | 8 | ResNet50 | Kinetics400 | 92.78 | 99.42 | 5812 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb_20210630-ee8c850f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/20210606_010231.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/20210606_010231.log.json) | + +### Something-Something V1 + +| Config | GPUs | Backbone | Pretrain | Top-1 acc | Top-5 acc | GPU memory (M) | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :---------: | :---------: | :--------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb](/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.py) | 8 | ResNet50 | ImageNet | 47.76 | 77.49 | 7759 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb_20211202-d034ff12.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.json) | + +Notes: + +1. 
Here, **GPUs** (the GPU 数量 column) refers to the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 are for training with 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when you use a different number of GPUs or videos per GPU, you need to scale the learning rate in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. The **inference time** (the 推理时间 column) is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + excluding IO time and pre-processing time. For each setting, MMAction2 uses 1 GPU and sets the batch size (videos per GPU) to 1 to calculate the inference time. +3. The Kinetics400 validation set we used consists of 19796 videos, which can be downloaded from [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format: video ID, number of frames, label index) and [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (label index to class name) are also provided. + +For details on data preparation, refer to the Kinetics400 part of [Data Preparation](/docs_zh_CN/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the SlowOnly model on the Kinetics400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py \ + --work-dir work_dirs/slowonly_r50_4x16x1_256e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more training details, refer to the **训练配置** (Training setting) part of [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the SlowOnly model on the Kinetics400 dataset and dump the result to a JSON file. + +```shell +python tools/test.py configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips=prob +``` + +For more testing details, refer to the **测试某个数据集** (Test a dataset) part of [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..e79543a59a76715840875164cfb3d8466c7ec9b0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,115 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + lateral=False, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + cls_head=dict( + type='I3DHead', + in_channels=2048, + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=None, + test_cfg=dict(average_clips='prob')) + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_256p' +data_root_val = 'data/kinetics400/rawframes_val_256p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_256p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_256p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_256p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + 
type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.6, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 256 +checkpoint_config = dict(interval=4) +workflow = [('train', 1)] +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = ('./work_dirs/slowonly_r50_randomresizedcrop_256p_4x16x1' + '_256e_kinetics400_rgb') +load_from = None 
+resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..b2d55cefaebd2b91e4cf1984e4ae1a441ece087a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,114 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + lateral=False, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + cls_head=dict( + type='I3DHead', + in_channels=2048, + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=None, + test_cfg=dict(average_clips='prob')) +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_320p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + 
dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.6, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 256 +checkpoint_config = dict(interval=4) +workflow = [('train', 1)] +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = 
('./work_dirs/slowonly_r50_randomresizedcrop_320p_4x16x1' + '_256e_kinetics400_rgb') +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..d5c38635b2de9919d8f465c57c9eb54b10e705d4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/data_benchmark/slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,114 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + lateral=False, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + cls_head=dict( + type='I3DHead', + in_channels=2048, + num_classes=400, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=None, + test_cfg=dict(average_clips='prob')) +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.6, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 256 +checkpoint_config = dict(interval=4) +workflow = [('train', 1)] +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +dist_params = 
dict(backend='nccl') +log_level = 'INFO' +work_dir = ('slowonly_r50_randomresizedcrop_320p_4x16x1' + '_256e_kinetics400_rgb') +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..9e4110ea09805ad4eea80292748aba704b0e875e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/metafile.yml @@ -0,0 +1,550 @@ +Collections: +- Name: SlowOnly + README: configs/recognition/slowonly/README.md + Paper: + URL: https://arxiv.org/abs/1812.03982 + Title: SlowFast Networks for Video Recognition +Models: +- Config: configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 27430649856 + Parameters: 32454096 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: slowonly_r50_omnisource_4x16x1_256e_kinetics400_rgb + Converted From: + Weights: https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/slowonly_OmniSource_kinetics400_se_rgb_r50_seg1_4x16_scratch-71f7b8ee.pth + Code: https://github.com/open-mmlab/mmaction/ + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.8 + Top 5 Accuracy: 92.5 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth +- Config: configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet101 + Batch Size: 8 + Epochs: 196 + FLOPs: 112063447040 + Parameters: 60359120 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: 
slowonly_r101_8x8x1_196e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.5 + Top 5 Accuracy: 92.7 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_without_omni_8x8x1_kinetics400_rgb_20200926-0c730aef.pth +- Config: configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet101 + Batch Size: 8 + Epochs: 196 + FLOPs: 112063447040 + Parameters: 60359120 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: slowonly_r101_omnisource_8x8x1_196e_kinetics400_rgb + Converted From: + Weights: https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/slowonly_OmniSource_kinetics400_se_rgb_r101_seg1_8x8_scratch-2f838cb0.pth + Code: https://github.com/open-mmlab/mmaction/ + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 80.4 + Top 5 Accuracy: 94.4 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth +- Config: configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 27430649856 + Parameters: 32454096 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: slowonly_r50_4x16x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.76 + Top 5 Accuracy: 90.51 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log.json + Training Log: 
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb_20200820-bea7701f.pth +- Config: configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 24 + Epochs: 256 + FLOPs: 27430649856 + Parameters: 32454096 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: slowonly_r50_video_4x16x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.9 + Top 5 Accuracy: 90.82 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth +- Config: configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 54860480512 + Parameters: 32454096 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: slowonly_r50_8x8x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 74.42 + Top 5 Accuracy: 91.49 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/20200817_003320.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/20200817_003320.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb/slowonly_r50_256p_8x8x1_256e_kinetics400_rgb_20200820-75851a7d.pth +- Config: configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 27430649856 + Parameters: 32454096 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: slowonly_r50_4x16x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.02 + Top 5 Accuracy: 90.77 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth +- Config: configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 256 + FLOPs: 54860480512 + Parameters: 32454096 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 24 GPUs + Modality: RGB + Name: slowonly_r50_8x8x1_256e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 74.93 + Top 5 Accuracy: 91.92 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/slowonly_r50_8x8_74.93_91.92.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/so_8x8.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/slowonly_r50_8x8x1_256e_kinetics400_rgb_20200703-a79c555a.pth +- Config: configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 27430649856 + Parameters: 32454096 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.39 + Top 5 Accuracy: 91.12 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb_20200912-1e8fc736.pth +- Config: configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 54860480512 + Parameters: 32454096 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs 
+ Modality: RGB + Name: slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 75.55 + Top 5 Accuracy: 92.04 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912-3f9ce182.pth +- Config: configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 38201098240 + Parameters: 39808464 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 74.54 + Top 5 Accuracy: 91.73 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/20210305_152630.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/20210305_152630.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb_20210308-0d6e5a69.pth +- Config: 
configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 76401377280 + Parameters: 39808464 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.07 + Top 5 Accuracy: 92.42 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/20210308_212250.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/20210308_212250.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb_20210308-e8dd9e82.pth +- Config: configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 24 + Epochs: 256 + FLOPs: 27225128960 + Parameters: 32450960 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: Flow + Name: slowonly_r50_4x16x1_256e_kinetics400_flow + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 61.79 + Top 5 Accuracy: 83.62 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_61.8_83.6.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_61.8_83.6.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow/slowonly_r50_4x16x1_256e_kinetics400_flow_20200704-decb8568.pth +- Config: configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 196 + FLOPs: 54449438720 + Parameters: 32450960 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: Flow + Name: slowonly_r50_8x8x1_196e_kinetics400_flow + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 65.76 + Top 5 Accuracy: 86.25 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_196e_kinetics400_flow_65.8_86.3.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_196e_kinetics400_flow_65.8_86.3.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow/slowonly_r50_8x8x1_256e_kinetics400_flow_20200704-6b384243.pth +- Config: configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 256 + FLOPs: 54860890112 + Parameters: 32863896 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-600 + Training Resources: 32 GPUs + Modality: RGB + Name: slowonly_r50_video_8x8x1_256e_kinetics600_rgb + Results: + - Dataset: Kinetics-600 + Metrics: + Top 1 Accuracy: 77.5 + Top 5 Accuracy: 93.7 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015.json + Training Log: 
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb/slowonly_r50_video_8x8x1_256e_kinetics600_rgb_20201015-81e5153e.pth +- Config: configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 256 + FLOPs: 54861094912 + Parameters: 33068796 + Pretrained: None + Resolution: short-side 256 + Training Data: Kinetics-700 + Training Resources: 32 GPUs + Modality: RGB + Name: slowonly_r50_video_8x8x1_256e_kinetics700_rgb + Results: + - Dataset: Kinetics-700 + Metrics: + Top 1 Accuracy: 65.0 + Top 5 Accuracy: 86.1 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb/slowonly_r50_video_8x8x1_256e_kinetics700_rgb_20201015-9250f662.pth +- Config: configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 24 + Epochs: 120 + FLOPs: 27430649856 + Parameters: 32454096 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: GYM99 + Training Resources: 16 GPUs + Modality: RGB + Name: slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb + Results: + - Dataset: GYM99 + Metrics: + Top 1 Accuracy: 79.3 + mean Top 1 Accuracy: 70.2 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb_20201111-a9c34b54.pth +- Config: configs/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 24 + Epochs: 120 + FLOPs: 27225128960 + Parameters: 32450960 + Pretrained: Kinetics + Resolution: short-side 256 + Training Data: GYM99 + Training Resources: 16 GPUs + Modality: Flow + Name: slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow + Results: + - Dataset: GYM99 + Metrics: + Top 1 Accuracy: 80.3 + mean Top 1 Accuracy: 71.0 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow/slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow_20201111-66ecdb3c.pth +- Config: configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 64 + FLOPs: 54859716608 + Parameters: 31689819 + Pretrained: 
ImageNet + Resolution: height 100 + Training Data: Jester + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb + Results: + - Dataset: Jester + Metrics: + Top 1 Accuracy: 97.2 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb-b56a5389.pth +- Config: configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 64 + FLOPs: 54859765760 + Parameters: 31738995 + Pretrained: ImageNet + Training Data: HMDB51 + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb + Results: + - Dataset: HMDB51 + Metrics: + Top 1 Accuracy: 37.52 + Top 5 Accuracy: 71.5 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/20210605_185256.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/20210605_185256.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb_20210630-16faeb6a.pth +- Config: configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb.py + In Collection: SlowOnly + Metadata: + 
Architecture: ResNet50 + Batch Size: 8 + Epochs: 40 + FLOPs: 54859765760 + Parameters: 31738995 + Pretrained: Kinetics400 + Training Data: HMDB51 + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb + Results: + - Dataset: HMDB51 + Metrics: + Top 1 Accuracy: 65.95 + Top 5 Accuracy: 91.05 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/20210606_010153.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/20210606_010153.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb_20210630-cee5f725.pth +- Config: configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 64 + FLOPs: 54859868160 + Parameters: 31841445 + Pretrained: ImageNet + Training Data: UCF101 + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb + Results: + - Dataset: UCF101 + Metrics: + Top 1 Accuracy: 71.35 + Top 5 Accuracy: 89.35 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/20210605_213503.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/20210605_213503.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb_20210630-181e1661.pth +- Config: configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb.py + In Collection: 
SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 40 + FLOPs: 54859868160 + Parameters: 31841445 + Pretrained: Kinetics400 + Training Data: UCF101 + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb + Results: + - Dataset: UCF101 + Metrics: + Top 1 Accuracy: 92.78 + Top 5 Accuracy: 99.42 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/20210606_010231.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/20210606_010231.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb_20210630-ee8c850f.pth +- Config: configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.py + In Collection: SlowOnly + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 64 + FLOPs: 53907910656 + Parameters: 31991022 + Pretrained: ImageNet + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 47.76 + Top 5 Accuracy: 77.49 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb_20211202-d034ff12.pth diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..f5c3d79fbe9e4ed9652db70475c58ce848553695 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb.py @@ -0,0 +1,89 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/gym/rawframes' +data_root_val = 'data/gym/rawframes' +ann_file_train = 'data/gym/annotations/gym99_train_list_rawframes.txt' +ann_file_val = 'data/gym/annotations/gym99_val_list_rawframes.txt' +ann_file_test = 'data/gym/annotations/gym99_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + 
type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=24, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.03, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[90, 110]) +total_epochs = 120 + +# runtime settings +work_dir = './work_dirs/slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..750d01b8b42730c2f2b4f0872e1d989638cb0025 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# dataset settings 
+dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + 
pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='step', + step=[90, 130], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=10) +total_epochs = 150 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = ('./work_dirs/slowonly_imagenet_pretrained_r50_4x16x1_150e' + '_kinetics400_rgb') +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..0305527d3daded044ec992b5767d68e76e82afba --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_hmdb51_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', + '../../_base_/schedules/sgd_150e_warmup.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=51)) + +# dataset settings +split = 1 +dataset_type = 'RawframeDataset' +data_root = 'data/hmdb51/rawframes' +data_root_val = 'data/hmdb51/rawframes' +ann_file_train = f'data/hmdb51/hmdb51_train_split_{split}_rawframes.txt' +ann_file_val = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +ann_file_test = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 
103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=4, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(lr=0.1) # this lr is used for 8 
gpus +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0, by_epoch=False) +total_epochs = 64 + +# runtime settings +work_dir = './work_dirs/slowonly_r50_8x4x1_64e_hmdb51_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..89457ddf0426411e80bbbf14e529b32e30179e43 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv1_rgb.py @@ -0,0 +1,100 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', + '../../_base_/schedules/sgd_150e_warmup.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(with_pool1=False), cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=4, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 128)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(112, 112), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=1, + test_mode=True), + 
dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 128)), + dict(type='CenterCrop', crop_size=112), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 128)), + dict(type='ThreeCrop', crop_size=128), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + filename_tmpl='{:05}.jpg', + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + filename_tmpl='{:05}.jpg', + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + filename_tmpl='{:05}.jpg', + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(lr=0.1) # this lr is used for 8 gpus +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=10) +total_epochs = 64 + +# runtime settings +work_dir = './work_dirs/slowonly_r50_8x4x1_64e_sthv1_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv2_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv2_rgb.py new file mode 100644 index 
0000000000000000000000000000000000000000..65720cffbc9ac49b560576edb030fe8b7f3f50cb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_sthv2_rgb.py @@ -0,0 +1,97 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', + '../../_base_/schedules/sgd_150e_warmup.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(with_pool1=False), cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv2/rawframes' +data_root_val = 'data/sthv2/rawframes' +ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt' +ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt' +ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=4, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 128)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(112, 112), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 128)), + dict(type='CenterCrop', crop_size=112), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 
128)), + dict(type='ThreeCrop', crop_size=128), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(lr=0.1) # this lr is used for 8 gpus +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=10) +total_epochs = 64 + +# runtime settings +work_dir = './work_dirs/slowonly_r50_8x4x1_64e_sthv2_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..48df87cc320b51fd2cd980cd78eade24f3d1d968 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x4x1_64e_ucf101_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', + '../../_base_/schedules/sgd_150e_warmup.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=101)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ucf101/rawframes/' +data_root_val = 'data/ucf101/rawframes/' +split = 1 # official train/test splits. 
valid numbers: 1, 2, 3 +ann_file_train = f'data/ucf101/ucf101_train_split_{split}_rawframes.txt' +ann_file_val = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +ann_file_test = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=4, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + 
pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(lr=0.1) # this lr is used for 8 gpus +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0, by_epoch=False) +total_epochs = 64 + +# runtime settings +work_dir = './work_dirs/slowonly_r50_8x4x1_64e_ucf101_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..0e34eda9fde94d5e1fd878ee8d9b1d56b82f2d77 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 
'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='step', + step=[90, 130], + warmup='linear', + warmup_by_epoch=True, + warmup_iters=10) +total_epochs = 150 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = ('./work_dirs/slowonly_imagenet_pretrained_r50_8x8x1_150e' + '_kinetics400_rgb') +find_unused_parameters = False diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6e4e7fbc338374b50f048416f36a50dd1f6ad7cb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb.py @@ -0,0 +1,97 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=27)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/jester/rawframes' +data_root_val = 'data/jester/rawframes' +ann_file_train = 'data/jester/jester_train_list_rawframes.txt' +ann_file_val = 'data/jester/jester_val_list_rawframes.txt' +ann_file_test = 'data/jester/jester_val_list_rawframes.txt' +jester_flip_label_map = {0: 1, 1: 0, 6: 7, 7: 6} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=4, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=jester_flip_label_map), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], 
meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0, by_epoch=False) +total_epochs = 64 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowonly_imagenet_pretrained_r50_8x8x1_64e_jester_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_4x16x1_120e_gym99_flow.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_4x16x1_120e_gym99_flow.py new file mode 100644 index 0000000000000000000000000000000000000000..b561287f3229bc10cfce5e20fce53974b4b8a284 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_4x16x1_120e_gym99_flow.py @@ -0,0 +1,101 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None, in_channels=2, with_pool2=False)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/gym/rawframes' +data_root_val = 'data/gym/rawframes' +ann_file_train = 'data/gym/annotations/gym99_train_list_rawframes.txt' +ann_file_val = 'data/gym/annotations/gym99_val_list_rawframes.txt' +ann_file_test = 'data/gym/annotations/gym99_val_list_rawframes.txt' +img_norm_cfg = dict(mean=[128, 128], std=[128, 128]) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', 
keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=24, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.03, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[90, 110]) +total_epochs = 120 + +# runtime settings +work_dir = ('./work_dirs/' + 'slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow') +load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' + 'slowonly_r50_4x16x1_256e_kinetics400_flow/' + 'slowonly_r50_4x16x1_256e_kinetics400_flow_20200704-decb8568.pth') +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..53832d7dc10f9936af8d33d30379986d7aa0f70f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb.py @@ -0,0 +1,81 @@ +_base_ = ['./slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb.py'] + +# model settings +model = 
dict(cls_head=dict(num_classes=51)) + +# dataset settings +split = 1 +dataset_type = 'RawframeDataset' +data_root = 'data/hmdb51/rawframes' +data_root_val = 'data/hmdb51/rawframes' +ann_file_train = f'data/hmdb51/hmdb51_train_split_{split}_rawframes.txt' +ann_file_val = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +ann_file_test = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=4, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + 
ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/slowonly_k400_pretrained_r50_8x4x1_40e_hmdb51_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..c4e5be479db1e4b51009b179891f8c9afd42d615 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb.py @@ -0,0 +1,97 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=101)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ucf101/rawframes/' +data_root_val = 'data/ucf101/rawframes/' +split = 1 # official train/test splits. 
valid numbers: 1, 2, 3 +ann_file_train = f'data/ucf101/ucf101_train_split_{split}_rawframes.txt' +ann_file_val = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +ann_file_test = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=4, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=4, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + 
pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.001, # this lr is used for 8 gpus +) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[15, 30]) +total_epochs = 40 + +# runtime settings +work_dir = './work_dirs/slowonly_k400_pretrained_r50_8x4x1_40e_ucf101_rgb' +load_from = 'https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb/slowonly_r50_8x8x1_256e_kinetics400_rgb_20200703-a79c555a.pth' # noqa: E501 +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..85d8b7f237fdf9bb4c3f5c343f2b27be61059b2f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', + '../../_base_/schedules/sgd_150e_warmup.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + non_local=((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)), + non_local_cfg=dict( + sub_sample=True, + use_scale=True, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='embedded_gaussian'))) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 
'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + 
pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb' # noqa E501 +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4f71e890c5b965805904ac69791bb0dce4dfc4a8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb.py @@ -0,0 +1,98 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', + '../../_base_/schedules/sgd_150e_warmup.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + non_local=((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)), + non_local_cfg=dict( + sub_sample=True, + use_scale=True, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='embedded_gaussian'))) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', 
**img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = './work_dirs/slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb' # noqa E501 +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..9b3f8903f51a43d463033b79a96ec7ce03fbe240 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py @@ -0,0 +1,21 @@ +_base_ = ['./slowonly_r50_8x8x1_256e_kinetics400_rgb.py'] + +# model settings +model = dict(backbone=dict(depth=101, pretrained=None)) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_ratio=0.1, + warmup_by_epoch=True, + warmup_iters=34) +total_epochs = 196 + +# runtime settings +work_dir = './work_dirs/slowonly_r101_8x8x1_196e_kinetics400_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow.py new file mode 100644 index 0000000000000000000000000000000000000000..02a3faf6968319127cb19dfa13902b913a7e5623 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_flow.py @@ -0,0 +1,103 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(in_channels=2, with_pool2=False)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics_flow_train_list.txt' +ann_file_val = 'data/kinetics400/kinetics_flow_val_list.txt' +ann_file_test = 'data/kinetics400/kinetics_flow_val_list.txt' +img_norm_cfg = dict(mean=[128, 128], std=[128, 128]) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, 
frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=24, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer 
+optimizer = dict( + type='SGD', lr=0.06, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34) +total_epochs = 256 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowonly_r50_4x16x1_256e_kinetics400_flow' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..a68c8efa88e6f0a057d0ee1cf958fc7b3563514e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + 
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 256 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowonly_r50_4x16x1_256e_kinetics400_rgb' +find_unused_parameters = False diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow.py new file mode 100644 index 0000000000000000000000000000000000000000..2cba67d9e1c959da33d7301185d09f1deb7a7454 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_flow.py @@ -0,0 +1,103 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(in_channels=2, with_pool2=False)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics_flow_train_list.txt' +ann_file_val = 'data/kinetics400/kinetics_flow_val_list.txt' +ann_file_test = 'data/kinetics400/kinetics_flow_val_list.txt' +img_norm_cfg = dict(mean=[128, 128], std=[128, 128]) +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + 
type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + modality='Flow', + filename_tmpl='{}_{:05d}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.06, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr=0, + warmup='linear', + warmup_by_epoch=True, + warmup_iters=34) +total_epochs = 196 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowonly_r50_8x8x1_256e_kinetics400_flow' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..eec3694e7a7f412012e8915fb7499c577f8e4405 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_8x8x1_256e_kinetics400_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + 
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 256 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowonly_r50_8x8x1_256e_kinetics400_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_clip_feature_extraction_4x16x1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_clip_feature_extraction_4x16x1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..90d8087f83bb74822a1f07e21c3275e3aaa17d40 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_clip_feature_extraction_4x16x1_rgb.py @@ -0,0 +1,45 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + lateral=False, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + norm_eval=False), + train_cfg=None, + test_cfg=dict(feature_extraction=True)) + +# dataset settings +dataset_type = 'VideoDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 
57.375], to_bgr=False) +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) + +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..202fa4e330cf9e568318a327db96f5ac1b452661 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=4, frame_interval=16, num_clips=1), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + 
dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=24, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.3, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) 
+total_epochs = 256 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowonly_r50_video_4x16x1_256e_kinetics400_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4b2b987b68e177e187ba86f20a062caa682c1cfb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics600_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None), cls_head=dict(num_classes=600)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics600/videos_train' +data_root_val = 'data/kinetics600/videos_val' +ann_file_train = 'data/kinetics600/kinetics600_train_list_videos.txt' +ann_file_val = 'data/kinetics600/kinetics600_val_list_videos.txt' +ann_file_test = 'data/kinetics600/kinetics600_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='CenterCrop', 
crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 256 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowonly_r50_video_8x8x1_256e_kinetics600_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4cbc90185091d540898e50ccfe45ef5d56319c60 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_8x8x1_256e_kinetics700_rgb.py @@ -0,0 +1,92 @@ +_base_ = [ + '../../_base_/models/slowonly_r50.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(pretrained=None), cls_head=dict(num_classes=700)) + +dataset_type = 'VideoDataset' +data_root = 'data/kinetics700/videos_train' +data_root_val = 'data/kinetics700/videos_val' +ann_file_train = 'data/kinetics700/kinetics700_train_list_videos.txt' +ann_file_val = 'data/kinetics700/kinetics700_val_list_videos.txt' +ann_file_test = 'data/kinetics700/kinetics700_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='DecordDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='DecordDecode'), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], 
meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.15, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 256 + +# runtime settings +checkpoint_config = dict(interval=4) +work_dir = './work_dirs/slowonly_r50_video_8x8x1_256e_kinetics700_rgb' +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_inference_4x16x1_256e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_inference_4x16x1_256e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..bd61b766c2ade1617ac7649e3b9ebe093e404298 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/slowonly/slowonly_r50_video_inference_4x16x1_256e_kinetics400_rgb.py @@ -0,0 +1,33 @@ +_base_ = ['../../_base_/models/slowonly_r50.py'] + +# model settings +model = dict(backbone=dict(pretrained=None)) + +# dataset settings +dataset_type = 'VideoDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=4, + frame_interval=16, + num_clips=10, + test_mode=True), + dict(type='DecordDecode'), 
+ dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..25d224ea520e2bbe0e04333e2dad7927114a84d1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/README.md @@ -0,0 +1,92 @@ +# TANet + +[TAM: Temporal Adaptive Module for Video Recognition](https://openaccess.thecvf.com/content/ICCV2021/html/Liu_TAM_Temporal_Adaptive_Module_for_Video_Recognition_ICCV_2021_paper.html) + + + +## Abstract + + + +Video data is with complex temporal dynamics due to various factors such as camera motion, speed variation, and different activities. To effectively capture this diverse motion pattern, this paper presents a new temporal adaptive module (**TAM**) to generate video-specific temporal kernels based on its own feature map. TAM proposes a unique two-level adaptive modeling scheme by decoupling the dynamic kernel into a location sensitive importance map and a location invariant aggregation weight. The importance map is learned in a local temporal window to capture short-term information, while the aggregation weight is generated from a global view with a focus on long-term structure. TAM is a modular block and could be integrated into 2D CNNs to yield a powerful video architecture (TANet) with a very small extra computational cost. 
The extensive experiments on Kinetics-400 and Something-Something datasets demonstrate that our TAM outperforms other temporal modeling methods consistently, and achieves the state-of-the-art performance under the similar complexity. + + + +
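The two-level scheme described in the abstract can be illustrated with a toy, pure-Python sketch. It is not MMAction2's TAM implementation: it works on the temporal signal of a single channel (as if the spatial dimensions had already been pooled away), and the names `local_importance`, `global_kernel`, `tam_channel` and `weights` are hypothetical. The window average and the fixed `weights` matrix are stand-ins for layers that would be learned in the real module.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def local_importance(x_c, window=3):
    """Local branch: one importance value per time step, computed from a
    short temporal window (stand-in for a learned temporal convolution)."""
    T, half = len(x_c), window // 2
    out = []
    for t in range(T):
        neigh = [x_c[s] for s in range(t - half, t + half + 1) if 0 <= s < T]
        out.append(sigmoid(sum(neigh) / len(neigh)))
    return out


def global_kernel(x_c, weights):
    """Global branch: a single K-tap kernel derived from the whole sequence
    (stand-in for learned fully connected layers followed by softmax)."""
    logits = [sum(w * v for w, v in zip(row, x_c)) for row in weights]
    return softmax(logits)


def tam_channel(x_c, weights):
    """Gate the signal with the importance map, then aggregate it over time
    with the video-specific kernel (zero padding at the borders)."""
    v = local_importance(x_c)
    theta = global_kernel(x_c, weights)
    gated = [a * b for a, b in zip(v, x_c)]
    K, T = len(theta), len(x_c)
    half = K // 2
    out = []
    for t in range(T):
        acc = 0.0
        for k in range(K):
            s = t + k - half
            if 0 <= s < T:
                acc += theta[k] * gated[s]
        out.append(acc)
    return out, v, theta


x_c = [0.2, -1.0, 0.5, 1.5, 0.0, -0.3, 0.8, 0.1]  # T = 8 frames, one channel
weights = [[0.1 * (k + 1)] * len(x_c) for k in range(3)]  # K = 3, placeholder values
y, v, theta = tam_channel(x_c, weights)
print(len(y), round(sum(theta), 6))
```

The importance map `v` changes with the time step (location sensitive), while the kernel `theta` is shared by every time step of the clip (location invariant) but is recomputed for each input, which is what makes the temporal kernel video-specific.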
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------: | :------: | :------: | :------: | :----------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------: | :---------------------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tanet_r50_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py) | short-side 320 | 8 | TANet | ImageNet | 76.28 | 92.60 | [76.22](https://github.com/liu-zhy/temporal-adaptive-module/blob/master/scripts/test_tam_kinetics_rgb_8f.sh) | [92.53](https://github.com/liu-zhy/temporal-adaptive-module/blob/master/scripts/test_tam_kinetics_rgb_8f.sh) | x | 7124 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219-032c8e94.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219.json) | + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :---------------------------: | :---------------------------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------: | +| [tanet_r50_1x1x8_50e_sthv1_rgb](/configs/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | TANet | ImageNet | 47.34/49.58 | 75.72/77.31 | 7127 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/tanet_r50_1x1x8_50e_sthv1_rgb_20210630-f4a48609.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/20210606_205006.log) | [json](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/20210606_205006.log.json) | +| [tanet_r50_1x1x16_50e_sthv1_rgb](/configs/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb.py) | height 100 | 8 | TANet | ImageNet | 49.05/50.91 | 77.90/79.13 | 7127 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb_20211202-370c2128.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb.json) | + +:::{note} + +1. The **gpus** column indicates the number of GPUs used to train the checkpoint. Note that the provided configs assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you should scale the learning rate in proportion to the total batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 8 GPUs x 8 videos/gpu and lr=0.04 for 16 GPUs x 16 videos/gpu. +2. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, excluding IO time and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time. +3. The values in the columns named after "reference" are the results obtained by testing on our dataset with the checkpoints provided by the author, using the same model settings. The checkpoints for the reference repo can be downloaded [here](https://drive.google.com/drive/folders/1sFfmP3yrfc7IzRshEELOby7-aEoymIFL?usp=sharing). +4. The validation set of Kinetics400 we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. 
+ +::: + +For more details on data preparation, refer to the corresponding parts of [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TANet model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py \ + --work-dir work_dirs/tanet_r50_dense_1x1x8_100e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** part of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TANet model on the Kinetics-400 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more details, refer to the **Test a dataset** part of [getting_started](/docs/getting_started.md#test-a-dataset). 
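The Linear Scaling Rule mentioned in the note under Results and Models can be sketched as follows. This is a minimal illustration rather than part of MMAction2: `scale_lr` is a hypothetical helper, and the reference setting (lr=0.01 for 8 GPUs x 8 videos per GPU) is the one quoted in that note.

```python
def scale_lr(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Scale the learning rate in proportion to the total batch size."""
    base_batch = base_gpus * base_videos_per_gpu
    new_batch = gpus * videos_per_gpu
    return base_lr * new_batch / base_batch


# Reference setting from the note: lr=0.01 for 8 GPUs x 8 videos/gpu.
lr_16x16 = scale_lr(0.01, 8, 8, gpus=16, videos_per_gpu=16)  # 4x the batch size
lr_4x8 = scale_lr(0.01, 8, 8, gpus=4, videos_per_gpu=8)  # half the batch size
print(lr_16x16, lr_4x8)
```

The scaled value would then replace `lr` in the config's `optimizer` dict when training with a different number of GPUs or a different `videos_per_gpu`.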
+ +## Citation + +```BibTeX +@article{liu2020tam, + title={TAM: Temporal Adaptive Module for Video Recognition}, + author={Liu, Zhaoyang and Wang, Limin and Wu, Wayne and Qian, Chen and Lu, Tong}, + journal={arXiv preprint arXiv:2005.06803}, + year={2020} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..c99ddf852f0d38089fe7713141f4dbaf2659ae8f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/README_zh-CN.md @@ -0,0 +1,77 @@ +# TANet + +## Introduction + + + +```BibTeX +@article{liu2020tam, + title={TAM: Temporal Adaptive Module for Video Recognition}, + author={Liu, Zhaoyang and Wang, Limin and Wu, Wayne and Qian, Chen and Lu, Tong}, + journal={arXiv preprint arXiv:2005.06803}, + year={2020} +} +``` + +## Model Zoo + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :------: | :---------: | :---------: | :----------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------: | :----------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:-------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tanet_r50_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py) | short-side 320 | 8 | TANet | ImageNet | 76.28 | 92.60 | [76.22](https://github.com/liu-zhy/temporal-adaptive-module/blob/master/scripts/test_tam_kinetics_rgb_8f.sh) | [92.53](https://github.com/liu-zhy/temporal-adaptive-module/blob/master/scripts/test_tam_kinetics_rgb_8f.sh) | x | 7124 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219-032c8e94.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219.log) | [json](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219.json) | + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :------------------------------: | :------------------------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------: | +| 
[tanet_r50_1x1x8_50e_sthv1_rgb](/configs/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb.py) 
| height 100 | 8 | TANet | ImageNet | 47.34/49.58 | 75.72/77.31 | 7127 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/tanet_r50_1x1x8_50e_sthv1_rgb_20210630-f4a48609.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/20210606_205006.log) | [json](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/20210606_205006.log.json) | +| [tanet_r50_1x1x16_50e_sthv1_rgb](/configs/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb.py) | height 100 | 8 | TANet | ImageNet | 49.05/50.91 | 77.90/79.13 | 7127 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb_20211202-370c2128.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb.json) | + +Note: + +1. The **gpus** column indicates the number of GPUs used to obtain the model checkpoint. By default, the configs provided by MMAction2 assume training with 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when you use a different number of GPUs or a different number of videos per GPU, you need to scale the learning rate in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. The **inference time** is obtained with the [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + excluding IO time and pre-processing time. For each config, MMAction2 uses 1 GPU and sets the batch size (videos per GPU) to 1 to calculate the inference time. +3. 
The reference results were obtained by training on the original code base with the same model settings. The corresponding model checkpoints can be downloaded [here](https://drive.google.com/drive/folders/1sFfmP3yrfc7IzRshEELOby7-aEoymIFL?usp=sharing). +4. 
The Kinetics400 validation set we used consists of 19796 videos, which can be downloaded from [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format: video ID, number of frames, label index) and [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (label index to class name) are also provided. + +For details on dataset preparation, refer to the Kinetics400 part of the [Data Preparation](/docs_zh_CN/data_preparation.md) document. + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TANet model on the Kinetics400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py \ + --work-dir work_dirs/tanet_r50_dense_1x1x8_100e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more training details, refer to the **Training setting** part of [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TANet model on the Kinetics400 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more testing details, refer to the **Test a dataset** part of [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). 
diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..4e5746bf723821153b7355c84b2a0bfa5f992561 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/metafile.yml @@ -0,0 +1,80 @@ +Collections: +- Name: TANet + README: configs/recognition/tanet/README.md + Paper: + URL: 
https://arxiv.org/abs/2005.06803 + Title: "TAM: Temporal Adaptive Module for Video Recognition" +Models: +- Config: configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py + In Collection: TANet + Metadata: + Architecture: TANet + Batch Size: 8 + Epochs: 100 + FLOPs: 43065983104 + Parameters: 25590320 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tanet_r50_dense_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.28 + Top 5 Accuracy: 92.6 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219.log + Weights: https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219-032c8e94.pth +- Config: configs/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb.py + In Collection: TANet + Metadata: + Architecture: TANet + Batch Size: 8 + Epochs: 50 + FLOPs: 32972787840 + Parameters: 25127246 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tanet_r50_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 49.58 + Top 1 Accuracy (efficient): 47.34 + Top 5 Accuracy: 77.31 + Top 5 Accuracy (efficient): 75.72 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/20210606_205006.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/20210606_205006.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb/tanet_r50_1x1x8_50e_sthv1_rgb_20210630-f4a48609.pth +- Config: configs/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb.py + In Collection: TANet + Metadata: + Architecture: TANet + Batch Size: 8 + Epochs: 50 + FLOPs: 65946542336 + Parameters: 25134670 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + gpus: 4 + Modality: RGB + Name: tanet_r50_1x1x16_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 50.91 + Top 1 Accuracy (efficient): 49.05 + Top 5 Accuracy: 79.13 + Top 5 Accuracy (efficient): 77.90 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb/tanet_r50_1x1x16_50e_sthv1_rgb_20211202-370c2128.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..741bd4db655a853ca7df514c77967e56ddd01b18 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_1x1x16_50e_sthv1_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../_base_/models/tanet_r50.py', '../../_base_/default_runtime.py', + '../../_base_/schedules/sgd_tsm_50e.py' +] + +# model settings +model = dict( + backbone=dict(num_segments=16), + cls_head=dict(num_classes=174, num_segments=16, dropout_ratio=0.6)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 
'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' + +sthv1_flip_label_map = {2: 4, 4: 2, 30: 41, 41: 30, 52: 66, 66: 52} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=sthv1_flip_label_map), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + 
ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(lr=0.005, weight_decay=0.001) +lr_config = dict(policy='step', step=[30, 40, 45]) + +# runtime settings +work_dir = './work_dirs/tanet_r50_1x1x16_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..2aa497dca923ddaaca3cfc86c5ad379ee9a7b910 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,100 @@ +_base_ = [ + '../../_base_/models/tanet_r50.py', '../../_base_/default_runtime.py', + '../../_base_/schedules/sgd_tsm_50e.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174, dropout_ratio=0.6)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' + +sthv1_flip_label_map = {2: 4, 4: 2, 30: 41, 41: 30, 52: 66, 66: 52} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + 
type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=sthv1_flip_label_map), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = 
dict(weight_decay=0.001) +lr_config = dict(policy='step', step=[30, 40, 45]) + +# runtime settings +work_dir = './work_dirs/tanet_r50_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..3ac78366c26d864feb9967c143fa307449e669b9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,100 @@ +_base_ = [ + '../../_base_/models/tanet_r50.py', '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DenseSampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 
256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', + constructor='TSMOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), + lr=0.01, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[50, 75, 90]) +total_epochs = 100 + +# runtime settings +work_dir = './work_dirs/tanet_r50_dense_1x1x8_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/README.md new file mode 100644 index 0000000000000000000000000000000000000000..71168eef5b8f8372b7fc1e8de15b0b391e67fa98 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/README.md @@ -0,0 +1,88 @@ +# TimeSformer + +[Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) + + + +## Abstract + + + +We present a convolution-free approach to video classification built exclusively on self-attention over space and time. Our method, named "TimeSformer," adapts the standard Transformer architecture to video by enabling spatiotemporal feature learning directly from a sequence of frame-level patches. Our experimental study compares different self-attention schemes and suggests that "divided attention," where temporal attention and spatial attention are separately applied within each block, leads to the best video classification accuracy among the design choices considered. Despite the radically new design, TimeSformer achieves state-of-the-art results on several action recognition benchmarks, including the best reported accuracy on Kinetics-400 and Kinetics-600. Finally, compared to 3D convolutional networks, our model is faster to train, it can achieve dramatically higher test efficiency (at a small drop in accuracy), and it can also be applied to much longer video clips (over one minute long). + + + +
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :---------: | :----------: | :------: | :------: | :---------------------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [timesformer_divST_8x32x1_15e_kinetics400_rgb](/configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py) | short-side 320 | 8 | TimeSformer | ImageNet-21K | 77.92 | 93.29 | x | 17874 | [ckpt](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb-3f8e5d03.pth) | [log](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb.json) | +| [timesformer_jointST_8x32x1_15e_kinetics400_rgb](/configs/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb.py) | short-side 320 | 8 | TimeSformer | ImageNet-21K | 77.01 | 93.08 | x | 25658 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb-0d6e3984.pth) | [log](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb.json) | +| [timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb](/configs/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.py) | short-side 320 | 8 | TimeSformer | ImageNet-21K | 76.93 | 92.90 | x | 12750 | [ckpt](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb-0cf829cd.pth) | [log](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.json) | + +:::{note} + +1. The **gpus** column gives the number of GPUs (32G V100) used to obtain the checkpoint. The provided configs assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.005 for 8 GPUs x 8 videos/gpu and lr=0.00375 for 8 GPUs x 6 videos/gpu. +2. We keep the test setting consistent with the [original repo](https://github.com/facebookresearch/TimeSformer) (three crop x 1 clip). +3.
The pretrained model `vit_base_patch16_224.pth` used by TimeSformer was converted from [vision_transformer](https://github.com/google-research/vision_transformer). + +::: + +For more details on data preparation, refer to the Kinetics400 section of [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TimeSformer model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py \ + --work-dir work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** part of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TimeSformer model on the Kinetics-400 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more details, refer to the **Test a dataset** part of [getting_started](/docs/getting_started.md#test-a-dataset).
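As a rough illustration of the Linear Scaling Rule mentioned in the notes above, the sketch below rescales a reference learning rate in proportion to the total batch size (GPUs x videos per GPU). The helper name `scale_lr` and its parameters are hypothetical and are not part of MMAction2; the reference values (lr=0.005 for 8 GPUs x 8 videos/gpu) are taken from the note.

```python
def scale_lr(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Scale a learning rate linearly with the total batch size.

    Hypothetical helper for illustration only; it is not an MMAction2 API.
    """
    base_batch = base_gpus * base_videos_per_gpu  # batch size of the reference config
    new_batch = gpus * videos_per_gpu             # batch size actually in use
    return base_lr * new_batch / base_batch


if __name__ == "__main__":
    # Reference setting from the note: lr=0.005 for 8 GPUs x 8 videos/gpu.
    for videos in (8, 7, 6):
        lr = scale_lr(0.005, 8, 8, gpus=8, videos_per_gpu=videos)
        print(f"8 GPUs x {videos} videos/gpu -> lr={lr:.6f}")
```

Under these assumptions, 8 GPUs x 6 videos/gpu gives the lr=0.00375 quoted in the note, and 8 GPUs x 7 videos/gpu gives the lr=0.004375 used by the jointST config. The resulting value would be set as `lr` in the config's `optimizer` dict.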
+ +## Citation + +```BibTeX +@misc{bertasius2021spacetime, + title = {Is Space-Time Attention All You Need for Video Understanding?}, + author = {Gedas Bertasius and Heng Wang and Lorenzo Torresani}, + year = {2021}, + eprint = {2102.05095}, + archivePrefix = {arXiv}, + primaryClass = {cs.CV} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..43d5511483ff29ef1d57ef08e3821548cc2fedbc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/README_zh-CN.md @@ -0,0 +1,72 @@ +# TimeSformer + +## 简介 + + + +```BibTeX +@misc{bertasius2021spacetime, + title = {Is Space-Time Attention All You Need for Video Understanding?}, + author = {Gedas Bertasius and Heng Wang and Lorenzo Torresani}, + year = {2021}, + eprint = {2102.05095}, + archivePrefix = {arXiv}, + primaryClass = {cs.CV} +} +``` + +## 模型库 + +### Kinetics-400 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :---------: | :----------: | :---------: | :---------: | :----------------: | :--------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| 
[timesformer_divST_8x32x1_15e_kinetics400_rgb](/configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py) | 短边 320 | 8 | TimeSformer | ImageNet-21K | 77.92 | 93.29 | x | 17874 | [ckpt](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb-3f8e5d03.pth) | [log](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb.json) | +| [timesformer_jointST_8x32x1_15e_kinetics400_rgb](/configs/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb.py) | 短边 320 | 8 | TimeSformer | ImageNet-21K | 77.01 | 93.08 | x | 25658 | [ckpt](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb-0d6e3984.pth) | [log](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb.json) | +| [timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb](/configs/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.py) | 短边 320 | 8 | TimeSformer | ImageNet-21K | 76.93 | 92.90 | x | 12750 | [ckpt](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb-0cf829cd.pth) |
[log](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.json) | + +注: + +1. 这里的 **GPU 数量** 指的是得到模型权重文件对应的 GPU 个数 (32G V100)。默认地,MMAction2 所提供的配置文件对应使用 8 块 GPU 进行训练的情况。 + 依据 [线性缩放规则](https://arxiv.org/abs/1706.02677),当用户使用不同数量的 GPU 或者每块 GPU 处理不同视频个数时,需要根据批大小等比例地调节学习率。 + 如,lr=0.005 对应 8 GPUs x 8 video/gpu,以及 lr=0.004375 对应 8 GPUs x 7 video/gpu。 +2. MMAction2 保持与 [原代码](https://github.com/facebookresearch/TimeSformer) 的测试设置一致(three crop x 1 clip)。 +3. TimeSformer 使用的预训练模型 `vit_base_patch16_224.pth` 转换自 [vision_transformer](https://github.com/google-research/vision_transformer)。 + +对于数据集准备的细节,用户可参考 [数据集准备文档](/docs_zh_CN/data_preparation.md) 中的 Kinetics400 部分。 + +## 如何训练 + +用户可以使用以下指令进行模型训练。 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +例如:以一个确定性的训练方式,辅以定期的验证过程进行 TimeSformer 模型在 Kinetics400 数据集上的训练。 + +```shell +python tools/train.py configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py \ + --work-dir work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb.py \ + --validate --seed 0 --deterministic +``` + +更多训练细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) 中的 **训练配置** 部分。 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 Kinetics400 数据集上测试 TimeSformer 模型,并将结果导出为一个 json 文件。 + +```shell +python tools/test.py configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 
**测试某个数据集** 部分。 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..a93c57b3929c7b989e89da2d1898d27e618e2626 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/metafile.yml @@ -0,0 +1,70 @@ +Collections: +- Name: TimeSformer + README: configs/recognition/timesformer/README.md + Paper: + URL: https://arxiv.org/abs/2102.05095 + Title: Is Space-Time Attention All You Need for Video Understanding +Models: +- Config: configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py + In Collection: TimeSformer + Metadata: + Architecture: TimeSformer + Batch Size: 8 + Epochs: 15 + Pretrained: ImageNet-21K + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: timesformer_divST_8x32x1_15e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 77.92 + Top 5 Accuracy: 93.29 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb/timesformer_divST_8x32x1_15e_kinetics400_rgb-3f8e5d03.pth +- Config: configs/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb.py + In Collection: TimeSformer + Metadata: + Architecture: TimeSformer + Batch Size: 7 + Epochs: 15 + Pretrained: ImageNet-21K + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: 
timesformer_jointST_8x32x1_15e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 77.01 + Top 5 Accuracy: 93.08 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb/timesformer_jointST_8x32x1_15e_kinetics400_rgb-0d6e3984.pth +- Config: configs/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.py + In Collection: TimeSformer + Metadata: + Architecture: TimeSformer + Batch Size: 8 + Epochs: 15 + Pretrained: ImageNet-21K + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.93 + Top 5 Accuracy: 92.90 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb-0cf829cd.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..8772ad953bf7f7ab44e83af4dd82b6b039f7cf01 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py @@ -0,0 +1,120 @@ +_base_ = ['../../_base_/default_runtime.py'] + +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='TimeSformer', + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth', # noqa: E501 + num_frames=8, + img_size=224, + patch_size=16, + embed_dims=768, + in_channels=3, + dropout_ratio=0., + transformer_layers=None, + attention_type='divided_space_time', + norm_cfg=dict(type='LN', eps=1e-6)), + cls_head=dict(type='TimeSformerHead', num_classes=400, in_channels=768), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=32, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=224), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + 
type='SampleFrames', + clip_len=8, + frame_interval=32, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=32, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 224)), + dict(type='ThreeCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', + lr=0.005, + momentum=0.9, + paramwise_cfg=dict( + custom_keys={ + '.backbone.cls_token': dict(decay_mult=0.0), + '.backbone.pos_embed': dict(decay_mult=0.0), + '.backbone.time_embed': dict(decay_mult=0.0) + }), + weight_decay=1e-4, + nesterov=True) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) + +# learning policy +lr_config = dict(policy='step', step=[5, 10]) +total_epochs = 15 + +# runtime settings +checkpoint_config = dict(interval=1) +work_dir = './work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb' diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4f4fdf7cbc544a35b163d355043483769c6ae69e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_jointST_8x32x1_15e_kinetics400_rgb.py @@ -0,0 +1,119 @@ +_base_ = ['../../_base_/default_runtime.py'] + +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='TimeSformer', + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth', # noqa: E501 + num_frames=8, + img_size=224, + patch_size=16, + embed_dims=768, + in_channels=3, + dropout_ratio=0., + transformer_layers=None, + attention_type='joint_space_time', + norm_cfg=dict(type='LN', eps=1e-6)), + cls_head=dict(type='TimeSformerHead', num_classes=400, in_channels=768), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=32, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=224), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 
'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=32, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=32, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 224)), + dict(type='ThreeCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +data = dict( + videos_per_gpu=7, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', + lr=0.004375, + momentum=0.9, + paramwise_cfg=dict( + custom_keys={ + '.backbone.cls_token': dict(decay_mult=0.0), + '.backbone.pos_embed': dict(decay_mult=0.0), + '.backbone.time_embed': dict(decay_mult=0.0) + }), + weight_decay=1e-4, + nesterov=True) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[5, 10]) +total_epochs = 15 + +# runtime settings +checkpoint_config 
= dict(interval=1) +work_dir = './work_dirs/timesformer_jointST_8x32x1_15e_kinetics400_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..a6207d9542aeb58f11c96c8e7f456464933340c5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/timesformer/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb.py @@ -0,0 +1,118 @@ +_base_ = ['../../_base_/default_runtime.py'] + +# model settings +model = dict( + type='Recognizer3D', + backbone=dict( + type='TimeSformer', + pretrained= # noqa: E251 + 'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth', # noqa: E501 + num_frames=8, + img_size=224, + patch_size=16, + embed_dims=768, + in_channels=3, + dropout_ratio=0., + transformer_layers=None, + attention_type='space_only', + norm_cfg=dict(type='LN', eps=1e-6)), + cls_head=dict(type='TimeSformerHead', num_classes=400, in_channels=768), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=32, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='RandomRescale', scale_range=(256, 320)), + dict(type='RandomCrop', size=224), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize',
**img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=32, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=32, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 224)), + dict(type='ThreeCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', + lr=0.005, + momentum=0.9, + paramwise_cfg=dict( + custom_keys={ + '.backbone.cls_token': dict(decay_mult=0.0), + '.backbone.pos_embed': dict(decay_mult=0.0) + }), + weight_decay=1e-4, + nesterov=True) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[5, 10]) 
+total_epochs = 15 + +# runtime settings +checkpoint_config = dict(interval=1) +work_dir = './work_dirs/timesformer_spaceOnly_8x32x1_15e_kinetics400_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/README.md new file mode 100644 index 0000000000000000000000000000000000000000..72aa519033df0ead2a58bcd7c29283a89822f5ab --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/README.md @@ -0,0 +1,102 @@ +# TIN + +[Temporal Interlacing Network](https://ojs.aaai.org/index.php/AAAI/article/view/6872) + + + +## Abstract + + + +For a long time, the vision community tries to learn the spatio-temporal representation by combining convolutional neural network together with various temporal models, such as the families of Markov chain, optical flow, RNN and temporal convolution. However, these pipelines consume enormous computing resources due to the alternately learning process for spatial and temporal information. One natural question is whether we can embed the temporal information into the spatial one so the information in the two domains can be jointly learned once-only. In this work, we answer this question by presenting a simple yet powerful operator -- temporal interlacing network (TIN). Instead of learning the temporal features, TIN fuses the two kinds of information by interlacing spatial representations from the past to the future, and vice versa. A differentiable interlacing target can be learned to control the interlacing process. In this way, a heavy temporal model is replaced by a simple interlacing operator. We theoretically prove that with a learnable interlacing target, TIN performs equivalently to the regularized temporal convolution network (r-TCN), but gains 4% more accuracy with 6x less latency on 6 challenging benchmarks. These results push the state-of-the-art performances of video understanding by a considerable margin. 
Not surprisingly, the ensemble model of the proposed TIN won 1st place in the ICCV19 - Multi Moments in Time challenge. + + + +
+ +
+ +## Results and Models + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :------: | :------: | :----------------: | :----------------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | +| [tin_r50_1x1x8_40e_sthv1_rgb](/configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py) | height 100 | 8x4 | ResNet50 | ImageNet | 44.25 | 73.94 | 44.04 | 72.72 | 6181 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/tin_r50_1x1x8_40e_sthv1_rgb_20200729-4a33db86.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/20200729_034132.log) | [json](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/20200729_034132.log.json) | + +### Something-Something V2 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :------: | :------: | :----------------: | :----------------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | 
:------------------------------------------------------------------------------------------------------------------: | +| [tin_r50_1x1x8_40e_sthv2_rgb](/configs/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb.py) | height 240 | 8x4 | ResNet50 | ImageNet | 56.70 | 83.62 | 56.48 | 83.45 | 6185 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/tin_r50_1x1x8_40e_sthv2_rgb_20200912-b27a7337.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/20200912_225451.log) | [json](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/20200912_225451.log.json) | + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------: | :-------------: | :------: | :------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | +| [tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet50 | TSM-Kinetics400 | 70.89 | 89.89 | 6187 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb_20200810-4a146a70.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/20200809_142447.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/20200809_142447.log.json) | + +Here, `finetune` indicates that the TIN model is finetuned on Kinetics-400 from a [TSM model](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20200607-af7fb746.pth) trained on Kinetics-400. + +:::{note} + +1. The **reference topk acc** values are obtained by training the [original repo #1aacd0c](https://github.com/deepcs233/TIN/tree/1aacd0c4c30d5e1d334bf023e55b855b59f158db) with the [AverageMeter issue](https://github.com/deepcs233/TIN/issues/4) fixed. + The [AverageMeter issue](https://github.com/deepcs233/TIN/issues/4) leads to incorrectly reported performance, so we fix it before running. +2. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. Note that the configs we provide assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the total batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +3. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + excluding IO time and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time. +4. The values in columns named after "reference" are the results obtained by training with the original repo, using the same model settings. +5. The Kinetics400 validation set we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). 
The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. + +::: + +For more details on data preparation, refer to the Kinetics400, Something-Something V1 and Something-Something V2 parts of [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TIN model on the Something-Something V1 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py \ + --work-dir work_dirs/tin_r50_1x1x8_40e_sthv1_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** part of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TIN model on the Something-Something V1 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more details, refer to the **Test a dataset** part of [getting_started](/docs/getting_started.md#test-a-dataset). 
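The Linear Scaling Rule mentioned in the notes of this README can be sketched in a few lines of Python. This is a minimal illustration, not part of MMAction2: the helper name `scale_lr` and its parameters are assumptions introduced here, and the reference values are the ones quoted in the README note (lr=0.01 for 4 GPUs x 2 video/gpu) and in the TIN configs (lr=0.02 with `videos_per_gpu=6`, commented as "used for 8 gpus").

```python
def scale_lr(base_lr: float, base_batch: int, num_gpus: int, videos_per_gpu: int) -> float:
    """Scale a learning rate linearly with the total batch size.

    `scale_lr` is a hypothetical helper for illustration only; it is not
    an MMAction2 API. `base_lr` is the learning rate known to work at a
    total batch size of `base_batch`.
    """
    if base_batch <= 0 or num_gpus <= 0 or videos_per_gpu <= 0:
        raise ValueError("batch sizes and GPU counts must be positive")
    total_batch = num_gpus * videos_per_gpu
    return base_lr * total_batch / base_batch


# Reference point from the README note: lr=0.01 at 4 GPUs x 2 videos per GPU.
readme_lr = scale_lr(0.01, base_batch=4 * 2, num_gpus=16, videos_per_gpu=4)
print(f"16 GPUs x 4 videos/GPU -> lr={readme_lr:.4f}")

# Reference point from the TIN configs: lr=0.02 at 8 GPUs x 6 videos per GPU.
# Halving the number of GPUs halves the total batch size and the learning rate.
tin_lr = scale_lr(0.02, base_batch=8 * 6, num_gpus=4, videos_per_gpu=6)
print(f"4 GPUs x 6 videos/GPU -> lr={tin_lr:.4f}")
```

The resulting value would then be written into the `lr` field of the `optimizer` dict in the config being trained. Whether strict linear scaling is appropriate for a given model and schedule is a judgment call; the rule is a starting point rather than a guarantee.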
+ +## Citation + +```BibTeX +@article{shao2020temporal, + title={Temporal Interlacing Network}, + author={Hao Shao and Shengju Qian and Yu Liu}, + year={2020}, + journal={AAAI}, +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..7e7e85d1fa7c2cdc25f5d71ec1534be1b5119e2f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/README_zh-CN.md @@ -0,0 +1,85 @@ +# TIN + +## Introduction + + + +```BibTeX +@article{shao2020temporal, + title={Temporal Interlacing Network}, + author={Hao Shao and Shengju Qian and Yu Liu}, + year={2020}, + journal={AAAI}, +} +``` + +## Results and Models + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :---------: | :---------: | :--------------------: | :--------------------: | :--------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | +| [tin_r50_1x1x8_40e_sthv1_rgb](/configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py) | height 100 | 8x4 | ResNet50 | ImageNet | 44.25 | 73.94 | 44.04 | 72.72 | 6181 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/tin_r50_1x1x8_40e_sthv1_rgb_20200729-4a33db86.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/20200729_034132.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/20200729_034132.log.json) | + +### Something-Something V2 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :---------: | :---------: | :--------------------: | :--------------------: | :--------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | +| [tin_r50_1x1x8_40e_sthv2_rgb](/configs/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb.py) | height 240 | 8x4 | ResNet50 | ImageNet | 56.70 | 83.62 | 56.48 | 83.45 | 6185 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/tin_r50_1x1x8_40e_sthv2_rgb_20200912-b27a7337.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/20200912_225451.log) | [json](https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/20200912_225451.log.json) | + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :-------------: | :---------: | :---------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:-------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | +| [tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet50 | TSM-Kinetics400 | 70.89 | 89.89 | 6187 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb_20200810-4a146a70.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/20200809_142447.log) | [json](https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/20200809_142447.log.json) | + +Here, MMAction2 uses the term `finetune` to indicate that the TIN model is finetuned from the [TSM model](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20200607-af7fb746.pth) trained on Kinetics400. + +Notes: + +1. The reference results are obtained by training the [original repo](https://github.com/deepcs233/TIN/tree/1aacd0c4c30d5e1d334bf023e55b855b59f158db) after fixing the [AverageMeter issue](https://github.com/deepcs233/TIN/issues/4), which otherwise leads to incorrect accuracy computation. +2. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 correspond to training with 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when using a different number of GPUs or a different number of videos per GPU, the learning rate should be adjusted in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +3. The **inference time** is obtained with the [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + excluding IO time and pre-processing time. For each config, MMAction2 uses 1 GPU and sets the batch size (videos per GPU) to 1 to calculate the inference time. +4. The reference results are obtained by training on the original codebase with the same model settings. +5. 
The Kinetics400 validation set we use contains 19796 videos, which can be downloaded from [validation videos](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line in the format: video ID, number of frames, label index) and [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (label index to class name) are also provided. + +For details on data preparation, refer to the Kinetics400, Something-Something V1 and Something-Something V2 parts of the [data preparation document](/docs_zh_CN/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TIN model on the Something-Something V1 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py \ + --work-dir work_dirs/tin_r50_1x1x8_40e_sthv1_rgb \ + --validate --seed 0 --deterministic +``` + +For more training details, refer to the **Training setting** part of [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TIN model on the Something-Something V1 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more testing details, refer to the **Test a dataset** part of [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..a820f93c9c8c5fa3d72dfb3b0a2efd5870494fa9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/metafile.yml @@ -0,0 +1,76 @@ +Collections: +- Name: TIN + README: configs/recognition/tin/README.md + Paper: + URL: 
https://arxiv.org/abs/2001.06499 + Title: Temporal Interlacing Network +Models: +- Config: configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py + In Collection: TIN + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 40 + FLOPs: 32962097536 + Parameters: 23895566 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 32 GPUs + Modality: RGB + Name: tin_r50_1x1x8_40e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 44.25 + Top 5 Accuracy: 73.94 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/20200729_034132.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/20200729_034132.log + Weights: https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb/tin_r50_1x1x8_40e_sthv1_rgb_20200729-4a33db86.pth +- Config: configs/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb.py + In Collection: TIN + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 40 + FLOPs: 32962097536 + Parameters: 23895566 + Pretrained: ImageNet + Resolution: height 240 + Training Data: SthV2 + Training Resources: 32 GPUs + Modality: RGB + Name: tin_r50_1x1x8_40e_sthv2_rgb + Results: + - Dataset: SthV2 + Metrics: + Top 1 Accuracy: 56.7 + Top 5 Accuracy: 83.62 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/20200912_225451.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/20200912_225451.log + Weights: https://download.openmmlab.com/mmaction/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb/tin_r50_1x1x8_40e_sthv2_rgb_20200912-b27a7337.pth +- Config: configs/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb.py + In Collection: TIN + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 50 + FLOPs: 32965800320 + 
Parameters: 24358640 + Pretrained: TSM-Kinetics400 + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.89 + Top 5 Accuracy: 89.89 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/20200809_142447.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/20200809_142447.log + Weights: https://download.openmmlab.com/mmaction/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb_20200810-4a146a70.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..3ba652479ba821b5bb8257293a580e3e32540381 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_r50_1x1x8_40e_sthv1_rgb.py @@ -0,0 +1,106 @@ +_base_ = ['../../_base_/models/tin_r50.py', '../../_base_/default_runtime.py'] + +# model settings +model = dict(cls_head=dict(num_classes=174, dropout_ratio=0.8)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', 
+ input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', + constructor='TSMOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), + lr=0.02, # this lr is used for 8 gpus + 
momentum=0.9, + weight_decay=0.0005) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + min_lr_ratio=0.5, + warmup='linear', + warmup_ratio=0.1, + warmup_by_epoch=True, + warmup_iters=1) +total_epochs = 40 + +# runtime settings +work_dir = './work_dirs/tin_r50_1x1x8_40e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..35bbd26b0003a7283d5868ccbe5399474730c0a5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb.py @@ -0,0 +1,103 @@ +_base_ = ['../../_base_/models/tin_r50.py', '../../_base_/default_runtime.py'] + +# model settings +model = dict(cls_head=dict(num_classes=174, dropout_ratio=0.8)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv2/rawframes' +data_root_val = 'data/sthv2/rawframes' +ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt' +ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt' +ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + 
frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', + constructor='TSMOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), + lr=0.02, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0005) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict( + policy='CosineAnnealing', + by_epoch=False, + warmup='linear', + warmup_iters=1, + warmup_by_epoch=True, + min_lr=0) +total_epochs = 40 + +# runtime settings +work_dir = './work_dirs/tin_r50_1x1x8_40e_sthv2_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..81f03a7344ebf624c5440f305fd297c8cf6c9d1e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/tin_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(is_shift=True)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + 
dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +# optimizer +optimizer = dict( + constructor='TSMOptimizerConstructor', paramwise_cfg=dict(fc_lr5=True)) +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) + +# runtime settings +work_dir = './work_dirs/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb/' +load_from = 'https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20200607-af7fb746.pth' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..dbb0d42e0fc5045101ef5cd4d5863652a68b32c9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/README.md @@ -0,0 +1,92 @@ +# TPN + +[Temporal Pyramid Network for Action Recognition](https://openaccess.thecvf.com/content_CVPR_2020/html/Yang_Temporal_Pyramid_Network_for_Action_Recognition_CVPR_2020_paper.html) + + + +## 
Abstract + + + +Visual tempo characterizes the dynamics and the temporal scale of an action. Modeling such visual tempos of different actions facilitates their recognition. Previous works often capture the visual tempo through sampling raw videos at multiple rates and constructing an input-level frame pyramid, which usually requires a costly multi-branch network to handle. In this work we propose a generic Temporal Pyramid Network (TPN) at the feature-level, which can be flexibly integrated into 2D or 3D backbone networks in a plug-and-play manner. Two essential components of TPN, the source of features and the fusion of features, form a feature hierarchy for the backbone so that it can capture action instances at various tempos. TPN also shows consistent improvements over other challenging baselines on several action recognition datasets. Specifically, when equipped with TPN, the 3D ResNet-50 with dense sampling obtains a 2% gain on the validation set of Kinetics-400. A further analysis also reveals that TPN gains most of its improvements on action classes that have large variances in their visual tempos, validating the effectiveness of TPN. + + + +
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------: | :------: | :------: | :------: | :-------------------------------------------------------------------: | :-------------------------------------------------------------------: | :---------------------: | :--------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tpn_slowonly_r50_8x8x1_150e_kinetics_rgb](/configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py) | short-side 320 | 8x2 | ResNet50 | None | 73.58 | 91.35 | x | x | x | 6916 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb-c568e7ad.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.json) | +| 
[tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb](/configs/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.py) | short-side 320 | 8 | ResNet50 | ImageNet | 76.59 | 92.72 | [75.49](https://github.com/decisionforce/TPN/blob/master/MODELZOO.md) | [92.05](https://github.com/decisionforce/TPN/blob/master/MODELZOO.md) | x | 6916 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb-44362b55.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.json) | + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :----------------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :------: | :------: | :--------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------: | +| [tpn_tsm_r50_1x1x8_150e_sthv1_rgb](/configs/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.py) | height 100 | 8x6 | ResNet50 | TSM | 51.50 | 79.15 | 8828 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb_20211202-c28ed83f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.json) | + +:::{note} + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. Note that the provided configs assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU. +2. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + excluding IO and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time. +3. The values in the "reference" columns are obtained by testing the checkpoints released by the original repo with its code, on the same dataset as ours. +4. The Kinetics400 validation set we use consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. 
+ +::: + +For more details on data preparation, you can refer to Kinetics400, Something-Something V1 and Something-Something V2 in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TPN model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py \ + --work-dir work_dirs/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** section of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TPN model on the Kinetics-400 dataset and dump the results to a JSON file. + +```shell +python tools/test.py configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips prob +``` + +For more details, refer to the **Test a dataset** section of [getting_started](/docs/getting_started.md#test-a-dataset). 
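As a rough illustration of the Linear Scaling Rule referenced in the notes of this README, the sketch below scales a learning rate in proportion to the total batch size (GPUs x videos per GPU). The helper name `scale_lr` and its parameters are hypothetical and not part of MMAction2; the reference point (lr=0.01 for 4 GPUs x 2 videos per GPU) is the one quoted in the notes.

```python
def scale_lr(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Scale a learning rate in proportion to the total batch size.

    Hypothetical helper for illustration; MMAction2 does not ship it.
    """
    base_batch = base_gpus * base_videos_per_gpu
    new_batch = gpus * videos_per_gpu
    return base_lr * new_batch / base_batch


# Reference point from the notes: lr=0.01 for 4 GPUs x 2 videos per GPU.
lr_16x4 = scale_lr(0.01, 4, 2, gpus=16, videos_per_gpu=4)
print(round(lr_16x4, 6))  # 0.08
```

The resulting value would then be written into the `optimizer` dict of the config (for example via the `lr` field) before training on the new hardware setup.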
+ +## Citation + +```BibTeX +@inproceedings{yang2020tpn, + title={Temporal Pyramid Network for Action Recognition}, + author={Yang, Ceyuan and Xu, Yinghao and Shi, Jianping and Dai, Bo and Zhou, Bolei}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, + year={2020}, +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..e1d4c21d2252218e8a515254c1f9a60cb34bde0a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/README_zh-CN.md @@ -0,0 +1,74 @@ +# TPN + +## Introduction + + + +```BibTeX +@inproceedings{yang2020tpn, + title={Temporal Pyramid Network for Action Recognition}, + author={Yang, Ceyuan and Xu, Yinghao and Shi, Jianping and Dai, Bo and Zhou, Bolei}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, + year={2020}, +} +``` + +## Model Zoo + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :------: | :---------: | :---------: | :-------------------------------------------------------------------: | :-------------------------------------------------------------------: | :----------------: | :--------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tpn_slowonly_r50_8x8x1_150e_kinetics_rgb](/configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py) | short-side 320 | 8x2 | ResNet50 | None | 73.58 | 91.35 | x | x | x | 6916 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb-c568e7ad.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.json) | +| [tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb](/configs/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.py) | short-side 320 | 8 | ResNet50 | ImageNet | 76.59 | 92.72 | [75.49](https://github.com/decisionforce/TPN/blob/master/MODELZOO.md) | [92.05](https://github.com/decisionforce/TPN/blob/master/MODELZOO.md) | x | 6916 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb-44362b55.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.json) | + +### 
Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:| +|[tpn_tsm_r50_1x1x8_150e_sthv1_rgb](/configs/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.py)|height 100|8x6| ResNet50 | TSM | 51.50 | 79.15 | 8828 |[ckpt](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb_20211202-c28ed83f.pth) |[log](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.log)|[json](https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.json)| + +Notes: + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 assume training with 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when using a different number of GPUs or videos per GPU, the learning rate should be scaled in proportion to the batch size, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. The **inference time** is measured with the [benchmark script](/tools/analysis/benchmark.py), using the test-time frame sampling strategy and counting only the model inference time, + excluding IO and pre-processing time. For each config, MMAction2 uses 1 GPU and a batch size (videos per GPU) of 1 to calculate the inference time. +3. The reference results were obtained by training on the original codebase with the same model config. +4. 
The Kinetics400 validation set we use contains 19796 videos, which can be downloaded from [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video ID, number of frames, label index') and [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (from label index to class name) are also provided. + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TPN model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py \ + --work-dir work_dirs/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** section of [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TPN model on the Kinetics-400 dataset and dump the results to a JSON file. + +```shell +python tools/test.py configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips prob +``` + +For more details, refer to the **Test a dataset** section of [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..973b6adaa6467de257ffaf9dc900b9c497d02baa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/metafile.yml @@ -0,0 +1,76 @@ +Collections: +- Name: TPN + README: configs/recognition/tpn/README.md + Paper: + URL: https://arxiv.org/abs/2004.03548 + Title: Temporal Pyramid Network for Action Recognition +Models: 
+- Config: configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py + In Collection: TPN + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 66014576640 + Parameters: 91498336 + Pretrained: None + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: tpn_slowonly_r50_8x8x1_150e_kinetics_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.58 + Top 5 Accuracy: 91.35 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb-c568e7ad.pth +- Config: configs/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.py + In Collection: TPN + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 66014576640 + Parameters: 91498336 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 76.59 + Top 5 Accuracy: 92.72 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb-44362b55.pth +- Config: configs/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.py + In Collection: TPN + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 54202822656 + Parameters: 82445724 + Pretrained: TSM + Resolution: height 100 + Training Data: SthV1 + Training Resources: 48 GPUs + Modality: RGB + Name: tpn_tsm_r50_1x1x8_150e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 51.50 + Top 5 Accuracy: 79.15 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb/tpn_tsm_r50_1x1x8_150e_sthv1_rgb_20211202-c28ed83f.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..3b1738fdcf23253a53a6d24654e6c3107b384ab7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.py @@ -0,0 +1,89 @@ +_base_ = [ + '../../_base_/models/tpn_slowonly_r50.py', + '../../_base_/default_runtime.py' +] + +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 
'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='ColorJitter'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='ColorJitter'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=8, + frame_interval=8, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=8, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + 
data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001, + nesterov=True) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[75, 125]) +total_epochs = 150 + +# runtime settings +work_dir = './work_dirs/tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics400_rgb' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..009b076e15943e94f1543ea6d9c47195920dd630 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py @@ -0,0 +1,7 @@ +_base_ = ['./tpn_imagenet_pretrained_slowonly_r50_8x8x1_150e_kinetics_rgb.py'] + +# model settings +model = dict(backbone=dict(pretrained=None)) + +# runtime settings +work_dir = './work_dirs/tpn_slowonly_r50_8x8x1_150e_kinetics400_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..0258f4a3d4fa94e01877ace44a56f53f8d989711 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.py @@ -0,0 +1,89 @@ +_base_ = [ + '../../_base_/models/tpn_tsm_r50.py', '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 
'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='ColorJitter'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=8, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) 
+evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005, + nesterov=True) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[75, 125]) +total_epochs = 150 + +# runtime settings +work_dir = './work_dirs/tpn_tsm_r50_1x1x8_150e_sthv1_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..004bead94c192d10e377b21a817afbb425b15491 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/README.md @@ -0,0 +1,94 @@ +# TRN + +[Temporal Relational Reasoning in Videos](https://openaccess.thecvf.com/content_ECCV_2018/html/Bolei_Zhou_Temporal_Relational_Reasoning_ECCV_2018_paper.html) + + + +## Abstract + + + +Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets - Something-Something, Jester, and Charades - which fundamentally depend on temporal relational reasoning. Our results demonstrate that the proposed TRN gives convolutional neural networks a remarkable capacity to discover temporal relations in videos. Through only sparsely sampled video frames, TRN-equipped networks can accurately predict human-object interactions in the Something-Something dataset and identify various human gestures on the Jester dataset with very competitive performance. 
TRN-equipped networks also outperform two-stream networks and 3D convolution networks in recognizing daily activities in the Charades dataset. Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos. + + + +
+ +
+ +## Results and Models + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :---------------------------: | :---------------------------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | +| [trn_r50_1x1x8_50e_sthv1_rgb](/configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 31.62 / 33.88 | 60.01 / 62.12 | 11010 | [ckpt](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/trn_r50_1x1x8_50e_sthv1_rgb_20210401-163704a8.pth) | [log](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/20210326_103948.log) | [json](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/20210326_103948.log.json) | + +### Something-Something V2 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :---------------------------: | :---------------------------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | 
:------------------------------------------------------------------------------------------------------------------: | +| [trn_r50_1x1x8_50e_sthv2_rgb](/configs/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb.py) | height 256 | 8 | ResNet50 | ImageNet | 48.39 / 51.28 | 76.58 / 78.65 | 11010 | [ckpt](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/trn_r50_1x1x8_50e_sthv2_rgb_20210816-7abbc4c1.pth) | [log](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/20210816_221356.log) | [json](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/20210816_221356.log.json) | + +:::{note} + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. Note that the provided configs assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU. +2. There are two test settings for the Something-Something datasets: the efficient setting (center crop x 1 clip) and the accurate setting (three crop x 2 clips). +3. In the original [repository](https://github.com/zhoubolei/TRN-pytorch), the author augments data with random flipping on the Something-Something dataset, but this augmentation may be incorrect for direction-sensitive actions such as `push left to right`. We therefore replaced `flip` with `flip with label mapping`, and changed the testing method from `TenCrop`, which includes five flipped crops, to `Twice Sample & ThreeCrop`. +4. We use `ResNet50` instead of `BNInception` as the backbone of TRN. When training `TRN-ResNet50` on the sthv1 dataset with the original repository, we get a top1 (top5) accuracy of 30.542 (58.627), vs. 31.62 (60.01) for ours. 
+ +::: + +For more details on data preparation, you can refer to + +- [preparing_sthv1](/tools/data/sthv1/README.md) +- [preparing_sthv2](/tools/data/sthv2/README.md) + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TRN model on the sthv1 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py \ + --work-dir work_dirs/trn_r50_1x1x8_50e_sthv1_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** section of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TRN model on the sthv1 dataset and dump the results to a JSON file. + +```shell +python tools/test.py configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more details, refer to the **Test a dataset** section of [getting_started](/docs/getting_started.md#test-a-dataset). 
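To make the "flip with label mapping" idea from the notes of this README more concrete, here is a minimal pure-Python sketch. It assumes a clip is a list of frames, each frame a list of pixel rows; the function name, the `FLIP_LABEL_MAP` table and its label indices are hypothetical and only illustrate the technique, they are not the actual MMAction2 implementation or the real sthv1 label indices.

```python
import random

# Hypothetical label indices, for illustration only. The real indices must be
# taken from the dataset's label file.
# 0: "push left to right", 1: "push right to left"
FLIP_LABEL_MAP = {0: 1, 1: 0}


def flip_with_label_mapping(frames, label, flip_ratio=0.5, rng=random):
    """Horizontally flip every frame of a clip and remap direction-sensitive labels.

    frames: list of frames, each frame a list of rows, each row a list of pixels.
    label: integer class index of the clip.
    """
    if rng.random() >= flip_ratio:
        # No flip this time: frames and label are returned unchanged.
        return frames, label
    flipped = [[row[::-1] for row in frame] for frame in frames]
    # Labels without a mirrored counterpart keep their original index.
    return flipped, FLIP_LABEL_MAP.get(label, label)


clip = [[[1, 2, 3], [4, 5, 6]]]  # one 2x3 frame
flipped_clip, new_label = flip_with_label_mapping(clip, label=0, flip_ratio=1.0)
print(flipped_clip, new_label)  # [[[3, 2, 1], [6, 5, 4]]] 1
```

Applying the same flip decision to all frames of a clip keeps the motion consistent, and remapping the label keeps a mirrored "left to right" clip from being trained as the wrong class.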
+ +## Citation + +```BibTeX +@article{zhou2017temporalrelation, + title = {Temporal Relational Reasoning in Videos}, + author = {Zhou, Bolei and Andonian, Alex and Oliva, Aude and Torralba, Antonio}, + journal={European Conference on Computer Vision}, + year={2018} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..d8c6cdd8948a85b58b684c20d0b3f9d10ff985ae --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/README_zh-CN.md @@ -0,0 +1,78 @@ +# TRN + +## Introduction + + + +```BibTeX +@article{zhou2017temporalrelation, + title = {Temporal Relational Reasoning in Videos}, + author = {Zhou, Bolei and Andonian, Alex and Oliva, Aude and Torralba, Antonio}, + journal={European Conference on Computer Vision}, + year={2018} +} +``` + +## Model Zoo + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :------------------------------: | :------------------------------: | :--------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | +| [trn_r50_1x1x8_50e_sthv1_rgb](/configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 31.62 / 33.88 | 60.01 / 62.12 | 11010 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/trn_r50_1x1x8_50e_sthv1_rgb_20210401-163704a8.pth) | [log](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/20210326_103948.log) | [json](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/20210326_103948.log.json) | + +### Something-Something V2 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 (efficient/accurate) | top5 准确率 (efficient/accurate) | GPU 显存占用 (M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :------------------------------: | :------------------------------: | :--------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | +| [trn_r50_1x1x8_50e_sthv2_rgb](/configs/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb.py) | 高 256 | 8 | ResNet50 | ImageNet | 48.39 / 51.28 | 76.58 / 78.65 | 11010 | [ckpt](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/trn_r50_1x1x8_50e_sthv2_rgb_20210816-7abbc4c1.pth) | [log](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/20210816_221356.log) | [json](https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/20210816_221356.log.json) | + +注: + +1. 这里的 **GPU 数量** 指的是得到模型权重文件对应的 GPU 个数。默认地,MMAction2 所提供的配置文件对应使用 8 块 GPU 进行训练的情况。 + 依据 [线性缩放规则](https://arxiv.org/abs/1706.02677),当用户使用不同数量的 GPU 或者每块 GPU 处理不同视频个数时,需要根据批大小等比例地调节学习率。 + 如,lr=0.01 对应 4 GPUs x 2 video/gpu,以及 lr=0.08 对应 16 GPUs x 4 video/gpu。 +2. 
对于 Something-Something 数据集,有两种测试方案:efficient(对应 center crop x 1 clip)和 accurate(对应 Three crop x 2 clip)。 +3. 在原代码库中,作者在 Something-Something 数据集上使用了随机水平翻转,但这种数据增强方法有一些问题,因为 Something-Something 数据集有一些方向性的动作,比如`从左往右推`。所以 MMAction2 把`随机水平翻转`改为`带标签映射的水平翻转`,同时修改了测试模型的数据处理方法,即把`裁剪 10 个图像块`(这里面包括 5 个翻转后的图像块)修改成`采帧两次 & 裁剪 3 个图像块`。 +4. MMAction2 使用 `ResNet50` 代替 `BNInception` 作为 TRN 的主干网络。使用原代码,在 sthv1 数据集上训练 `TRN-ResNet50` 时,实验得到的 top1 (top5) 的准确度为 30.542 (58.627),而 MMAction2 的精度为 31.62 (60.01)。 + +关于数据处理的更多细节,用户可以参照 + +- [准备 sthv1](/tools/data/sthv1/README_zh-CN.md) +- [准备 sthv2](/tools/data/sthv2/README_zh-CN.md) + +## 如何训练 + +用户可以使用以下指令进行模型训练。 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +例如:以一个确定性的训练方式,辅以定期的验证过程进行 TRN 模型在 sthv1 数据集上的训练。 + +```shell +python tools/train.py configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py \ + --work-dir work_dirs/trn_r50_1x1x8_50e_sthv1_rgb \ + --validate --seed 0 --deterministic +``` + +更多训练细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) 中的 **训练配置** 部分。 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 sthv1 数据集上测试 TRN 模型,并将结果导出为一个 json 文件。 + +```shell +python tools/test.py configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 **测试某个数据集** 部分。 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..39bedaa26f72bf37b35e6346f56a5d3758594d94 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/metafile.yml @@ -0,0 +1,55 @@ +Collections: +- Name: TRN + README: 
configs/recognition/trn/README.md + Paper: + URL: https://arxiv.org/abs/1711.08496 + Title: Temporal Relational Reasoning in Videos +Models: +- Config: configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py + In Collection: TRN + Metadata: + Architecture: ResNet50 + Batch Size: 16 + Epochs: 50 + Parameters: 26641154 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: trn_r50_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 33.88 + Top 1 Accuracy (efficient): 31.62 + Top 5 Accuracy: 62.12 + Top 5 Accuracy (efficient): 60.01 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/20210326_103948.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/20210326_103948.log + Weights: https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb/trn_r50_1x1x8_50e_sthv1_rgb_20210401-163704a8.pth +- Config: configs/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb.py + In Collection: TRN + Metadata: + Architecture: ResNet50 + Batch Size: 16 + Epochs: 50 + Parameters: 26641154 + Pretrained: ImageNet + Resolution: height 256 + Training Data: SthV2 + Training Resources: 8 GPUs + Modality: RGB + Name: trn_r50_1x1x8_50e_sthv2_rgb + Results: + - Dataset: SthV2 + Metrics: + Top 1 Accuracy: 51.28 + Top 1 Accuracy (efficient): 48.39 + Top 5 Accuracy: 78.65 + Top 5 Accuracy (efficient): 76.58 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/20210816_221356.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/20210816_221356.log + Weights: https://download.openmmlab.com/mmaction/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb/trn_r50_1x1x8_50e_sthv2_rgb_20210816-7abbc4c1.pth diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..dac55c03b75053f44bf24fc35795798c4d8de573 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/trn_r50_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../_base_/models/trn_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' + +sthv1_flip_label_map = {2: 4, 4: 2, 30: 41, 41: 30, 52: 66, 66: 52} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=sthv1_flip_label_map), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', 
**img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(lr=0.002, paramwise_cfg=dict(fc_lr5=False), weight_decay=5e-4) +# learning policy +lr_config = dict(policy='step', step=[30, 45]) +total_epochs = 50 + +# runtime settings +find_unused_parameters = True +work_dir = './work_dirs/trn_r50_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..ab0ba48bb2e345d29bf085c252b124f32601a984 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/trn/trn_r50_1x1x8_50e_sthv2_rgb.py @@ -0,0 +1,99 @@ +_base_ = [ + '../../_base_/models/trn_r50.py', 
'../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv2/rawframes' +data_root_val = 'data/sthv2/rawframes' +ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt' +ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt' +ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt' + +sthv2_flip_label_map = {86: 87, 87: 86, 93: 94, 94: 93, 166: 167, 167: 166} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=sthv2_flip_label_map), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', 
**img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(lr=0.002, paramwise_cfg=dict(fc_lr5=False), weight_decay=5e-4) +# learning policy +lr_config = dict(policy='step', step=[30, 45]) +total_epochs = 50 + +# runtime settings +find_unused_parameters = True +work_dir = './work_dirs/trn_r50_1x1x8_50e_sthv2_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/README.md new file mode 100644 index 0000000000000000000000000000000000000000..35333731f477f60f7820a0f95cdc42d0dc455af0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/README.md @@ -0,0 +1,193 @@ +# TSM + +[TSM: Temporal Shift Module for Efficient Video Understanding](https://openaccess.thecvf.com/content_ICCV_2019/html/Lin_TSM_Temporal_Shift_Module_for_Efficient_Video_Understanding_ICCV_2019_paper.html) + + + +## Abstract + + + +The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making it expensive to deploy. 
In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNN but maintain 2D CNN's complexity. TSM shifts part of the channels along the temporal dimension; thus facilitate information exchanged among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extended TSM to online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranks the first place on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves a low latency of 13ms and 35ms for online video recognition. + + + +
+ +
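The core idea in the abstract, shifting part of the channels along the temporal dimension so neighboring frames exchange information, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' or MMAction2's implementation: the function name `temporal_shift`, the `shift_div` parameter, the zero padding at the clip boundaries, and the reduction of each frame to a flat list of channel values are all choices made here for clarity.

```python
def temporal_shift(frames, shift_div=8):
    """Shift two groups of channels by one frame in opposite temporal directions.

    frames: list of T frames, each a list of C channel values (hypothetical layout).
    shift_div: assumed divisor; C // shift_div channels move in each direction.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // shift_div
    # Positions with no temporal neighbor keep this zero padding.
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1   # first group: take the value from the next frame
            elif c < 2 * fold:
                src = t - 1   # second group: take the value from the previous frame
            else:
                src = t       # remaining channels stay in place
            if 0 <= src < num_frames:
                shifted[t][c] = frames[src][c]
    return shifted


# Toy clip: 3 frames, 8 channels, value = 10 * frame_index + channel_index.
clip = [[t * 10 + c for c in range(8)] for t in range(3)]
shifted_clip = temporal_shift(clip, shift_div=4)
# Frame 0 becomes [10, 11, 0.0, 0.0, 4, 5, 6, 7]
```

The operation only moves existing values between frames, which is why the abstract can describe it as adding no parameters and no multiply-accumulate computation to the 2D backbone.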
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :---------: | :------: | :------: | :------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 70.24 | 89.56 | [70.36](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_8f.sh) | [89.49](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_8f.sh) | 74.0 (8x1 frames) | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20200607-af7fb746.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20200607_211800.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20200607_211800.log.json) | +| [tsm_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 70.59 | 89.52 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/tsm_r50_256p_1x1x8_50e_kinetics400_rgb_20200726-020785e2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/20200725_031623.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/20200725_031623.log.json) | +| [tsm_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | short-side 320 | 8 | ResNet50 | ImageNet | 70.73 | 89.81 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20210701-68d582b4.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20210616_021451.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20210616_021451.log.json) | +| [tsm_r50_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb.py) | short-side 320 | 8 | ResNet50 | ImageNet | 71.90 | 90.03 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/tsm_r50_1x1x8_100e_kinetics400_rgb_20210701-7ff22268.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/20210617_103543.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/20210617_103543.log.json) | +| [tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb.py](/configs/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 70.48 | 89.40 | x | x | x | 7076 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219-bf96e6cc.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219.json) | +| [tsm_r50_video_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 70.25 | 89.66 | [70.36](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_8f.sh) | [89.49](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_8f.sh) | 74.0 (8x1 frames) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_1x1x8_100e_kinetics400_rgb_20200702-a77f4328.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_2d_1x1x8_50e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_2d_1x1x8_50e_kinetics400_rgb.log.json) | +| [tsm_r50_dense_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb.py) | short-side 320 | 8 | ResNet50 | 
ImageNet | 73.46 | 90.84 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/tsm_r50_dense_1x1x8_50e_kinetics400_rgb_20210701-a54ff3d3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/20210617_103245.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/20210617_103245.log.json) | +| [tsm_r50_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb.py) | short-side 320 | 8 | ResNet50 | ImageNet | 74.55 | 91.74 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/tsm_r50_dense_1x1x8_100e_kinetics400_rgb_20210701-e3e5e97f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/20210613_034931.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/20210613_034931.log.json) | +| [tsm_r50_1x1x16_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 72.09 | 90.37 | [70.67](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_16f.sh) | [89.98](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_16f.sh) | 47.0 (16x1 frames) | 10404 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/tsm_r50_340x256_1x1x16_50e_kinetics400_rgb_20201011-2f27f229.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20201011_205356.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20201011_205356.log.json) | +| 
[tsm_r50_1x1x16_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet50 | ImageNet | 71.89 | 90.73 | x | x | x | 10398 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/tsm_r50_256p_1x1x16_50e_kinetics400_rgb_20201010-85645c2a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/20201010_224825.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/20201010_224825.log.json) | +| [tsm_r50_1x1x16_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb.py) | short-side 320 | 8 | ResNet50 | ImageNet | 72.80 | 90.75 | x | x | x | 10398 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb/tsm_r50_1x1x16_100e_kinetics400_rgb_20210701-41ac92b9.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb/20210618_193859.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb/20210618_193859.log.json) | +| [tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py) | short-side 320 | 8x4 | ResNet50 | ImageNet | 72.03 | 90.25 | 71.81 | 90.36 | x | 8931 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb_20200724-f00f1336.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200724_120023.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200724_120023.log.json) | +| 
[tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py) | short-side 320 | 8x4 | ResNet50 | ImageNet | 70.70 | 89.90 | x | x | x | 10125 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb_20200816-b93fd297.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200815_210253.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200815_210253.log.json) | +| [tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py) | short-side 320 | 8x4 | ResNet50 | ImageNet | 71.60 | 90.34 | x | x | x | 8358 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb_20200724-d8ad84d2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/20200723_220442.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/20200723_220442.log.json) | +| [tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py) | short-side 320 | 8 | MobileNetV2 | ImageNet | 68.46 | 88.64 | x | x | x | 3385 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/tsm_mobilenetv2_dense_320p_1x1x8_100e_kinetics400_rgb_20210202-61135809.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/20210129_024936.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/20210129_024936.log.json) | +| [tsm_mobilenetv2_dense_1x1x8_kinetics400_rgb_port](/configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py) | short-side 320 | 8 | MobileNetV2 | ImageNet | 69.89 | 89.01 | x | x | x | 3385 | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_kinetics400_rgb_port_20210922-aa5cadf6.pth) | x | x | + +### Diving48 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------- | :--: | :------: | :------: | :------: | :------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_video_1x1x8_50e_diving48_rgb](/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 75.99 | 97.16 | 7070 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/tsm_r50_video_1x1x8_50e_diving48_rgb_20210426-aba5aa3d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/20210426_012424.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/20210426_012424.log.json) | +| [tsm_r50_video_1x1x16_50e_diving48_rgb](/configs/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 81.62 | 97.66 | 7070 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/tsm_r50_video_1x1x16_50e_diving48_rgb_20210426-aa9631c0.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/20210426_012823.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/20210426_012823.log.json) | + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | reference top1 acc (efficient/accurate) | reference top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :----------------------------------------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :---------------------------: | :---------------------------: | :--------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------: | :--------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 45.58 / 47.70 | 75.02 / 76.12 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 
76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/tsm_r50_1x1x8_50e_sthv1_rgb_20210203-01dce462.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/20210203_150227.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/20210203_150227.log.json) | +| [tsm_r50_flip_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.10 / 48.51 | 76.02 / 77.56 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/tsm_r50_flip_1x1x8_50e_sthv1_rgb_20210203-12596f16.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/20210203_145829.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/20210203_145829.log.json) | +| [tsm_r50_randaugment_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.16 / 48.90 | 76.07 / 77.92 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb_20210324-481268d9.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.65 / 48.66 | 76.67 / 77.41 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb-ee93e5e3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 46.26 / 47.68 | 75.92 / 76.49 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb-4f4f4740.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.85 / 50.31 | 76.78 / 78.18 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb_20210324-76937692.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_1x1x16_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.77 / 49.03 | 76.82 / 77.83 | [47.05 / 48.61](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [76.40 / 77.96](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 10390 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb_20211202-b922e5d2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb.json) | +| 
[tsm_r101_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet101 | ImageNet | 46.09 / 48.59 | 75.41 / 77.10 | [46.64 / 48.13](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [75.40 / 77.31](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 9800 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb_20211202-49970a5b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb.json) | + +### Something-Something V2 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | reference top1 acc (efficient/accurate) | reference top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------- | :--------: | :--: | :-------: | :------: | :---------------------------: | :---------------------------: | :----------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------: | :--------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | +| 
[tsm_r50_1x1x8_50e_sthv2_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb.py) | height 256 | 8 | ResNet50 | ImageNet | 59.11 / 61.82 | 85.39 / 86.80 | [xx / 61.2](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [xx / xx](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7069 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/tsm_r50_256h_1x1x8_50e_sthv2_rgb_20210816-032aa4da.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/20210816_224310.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/20210816_224310.log.json) | +| [tsm_r50_1x1x16_50e_sthv2_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py) | height 256 | 8 | ResNet50 | ImageNet | 61.06 / 63.19 | 86.66 / 87.93 | [xx / 63.1](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [xx / xx](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 10400 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/tsm_r50_256h_1x1x16_50e_sthv2_rgb_20210331-0a45549c.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/20210331_134458.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/20210331_134458.log.json) | +| [tsm_r101_1x1x8_50e_sthv2_rgb](/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb.py) | height 256 | 8 | ResNet101 | ImageNet | 60.88 / 63.84 | 86.56 / 88.30 | [xx / 63.3](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [xx / 
xx](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 9727 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/tsm_r101_256h_1x1x8_50e_sthv2_rgb_20210401-df97f3e1.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/20210401_143656.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/20210401_143656.log.json) | + +### MixUp & CutMix on Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | delta top1 acc (efficient/accurate) | delta top5 acc (efficient/accurate) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :---------------------------: | :---------------------------: | :---------------------------------: | :---------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_mixup_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 46.35 / 48.49 | 75.07 / 76.88 | +0.77 / +0.79 | +0.05 / +0.76 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb-9eca48e5.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_cutmix_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 45.92 / 47.46 | 75.23 / 76.71 | +0.34 / -0.24 | +0.21 / +0.59 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb-34934615.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.json) | + +### Jester + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | ckpt | log | json | +| ---------------------------------------------------------------------------------------- | :--------: | :--: | :------: | :------: | :---------------------------: | :------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_1x1x8_50e_jester_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 96.5 / 97.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb-c799267e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb.json) | 
+ +### HMDB51 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------- | :--: | :------: | :---------: | :------: | :------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb](/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb.py) | 8 | ResNet50 | Kinetics400 | 72.68 | 92.03 | 10388 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb_20210630-10c74ee5.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/20210605_182554.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/20210605_182554.log.json) | +| [tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb](/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb.py) | 8 | ResNet50 | Kinetics400 | 74.77 | 93.86 | 10388 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb_20210630-4785548e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/20210605_182505.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/20210605_182505.log.json) | + +### UCF101 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------- | :--: | :------: | :---------: | :------: | :------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb](/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb.py) | 8 | ResNet50 | Kinetics400 | 94.50 | 99.58 | 10389 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb_20210630-1fae312b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/20210605_182720.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/20210605_182720.log.json) | +| [tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb](/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb.py) | 8 | ResNet50 | Kinetics400 | 94.58 | 99.37 | 10389 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb_20210630-8df9c358.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/20210605_182720.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/20210605_182720.log.json) | + +:::{note} + +1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. Note that the provided configs assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU. +2. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + excluding IO and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time. +3. The values in the columns prefixed with "reference" are the results obtained by training with the original repo, using the same model settings. The checkpoints for the reference repo can be downloaded [here](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_reference_ckpt.rar). +4. There are two test settings for the Something-Something datasets, the efficient setting (center crop x 1 clip) and the accurate setting (three crop x 2 clip), both taken from the [original repo](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd). + The config files use the efficient setting by default; it can be changed to the accurate setting as follows: + +```python +... 
+test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, # `num_clips = 8` when using 8 segments + twice_sample=True, # set `twice_sample=True` to sample twice in the accurate setting + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + # dict(type='CenterCrop', crop_size=224), # used for the efficient setting + dict(type='ThreeCrop', crop_size=256), # used for the accurate setting + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +``` + +5. When applying Mixup and CutMix, we use the hyperparameter `alpha=0.2`. +6. The Kinetics400 validation set we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. +7. The **infer_ckpt** label marks checkpoints that are ported from [TSM](https://github.com/mit-han-lab/temporal-shift-module/blob/master/test_models.py). + +::: + +For more details on data preparation, see the corresponding parts of [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TSM model on the Kinetics-400 dataset deterministically, with periodic validation. 
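If the run uses a different number of GPUs or videos per GPU than the checkpoint did, the learning rate may need rescaling first, following the Linear Scaling Rule cited in note 1. The block below is a minimal sketch of that arithmetic, taking the note's own reference point (lr=0.01 for 4 GPUs x 2 videos per GPU) as the base. The function name `scale_lr` and its parameters are hypothetical and are not part of MMAction2.

```python
def scale_lr(num_gpus: int, videos_per_gpu: int,
             base_lr: float = 0.01, base_batch_size: int = 8) -> float:
    """Scale the learning rate linearly with the total batch size.

    The defaults are an assumption taken from note 1: lr=0.01 for a total
    batch of 4 GPUs x 2 videos per GPU = 8 videos.
    """
    if num_gpus <= 0 or videos_per_gpu <= 0:
        raise ValueError("num_gpus and videos_per_gpu must be positive")
    total_batch_size = num_gpus * videos_per_gpu
    return base_lr * total_batch_size / base_batch_size


if __name__ == "__main__":
    # The two settings quoted in note 1.
    print(scale_lr(num_gpus=4, videos_per_gpu=2))
    print(scale_lr(num_gpus=16, videos_per_gpu=4))
```

The scaled value would then presumably replace the `lr` entry of the optimizer in the chosen config (an assumption about where the rate is set; check the config file itself). The example command for the run is: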
+ +```shell +python tools/train.py configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py \ + --work-dir work_dirs/tsm_r50_1x1x8_50e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more details, see the **Training setting** part of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TSM model on the Kinetics-400 dataset and dump the result to a JSON file. + +```shell +python tools/test.py configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more details, see the **Test a dataset** part of [getting_started](/docs/getting_started.md#test-a-dataset). + +## Citation + +```BibTeX +@inproceedings{lin2019tsm, + title={TSM: Temporal Shift Module for Efficient Video Understanding}, + author={Lin, Ji and Gan, Chuang and Han, Song}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision}, + year={2019} +} +``` + + + +```BibTeX +@article{NonLocal2018, + author = {Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He}, + title = {Non-local Neural Networks}, + journal = {CVPR}, + year = {2018} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..81ea735bf1b5c03d6ee231adf0c0a20c1d7cd37c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/README_zh-CN.md @@ -0,0 +1,184 @@ +# TSM + +## 简介 + + + +```BibTeX +@inproceedings{lin2019tsm, + title={TSM: Temporal Shift Module for Efficient Video Understanding}, + author={Lin, Ji and Gan, Chuang and Han, Song}, + booktitle={Proceedings of the IEEE International 
Conference on Computer Vision}, + year={2019} +} +``` + + + +```BibTeX +@article{NonLocal2018, + author = {Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He}, + title = {Non-local Neural Networks}, + journal = {CVPR}, + year = {2018} +} +``` + +## 模型库 + +### Kinetics-400 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 参考代码的 top1 准确率 | 参考代码的 top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :---------: | :------: | :---------: | :---------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | :----------------: | :--------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 70.24 | 89.56 | [70.36](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_8f.sh) | [89.49](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_8f.sh) | 74.0 (8x1 
frames) | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20200607-af7fb746.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20200607_211800.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20200607_211800.log.json) | +| [tsm_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 短边 256 | 8 | ResNet50 | ImageNet | 70.59 | 89.52 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/tsm_r50_256p_1x1x8_50e_kinetics400_rgb_20200726-020785e2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/20200725_031623.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/20200725_031623.log.json) | +| [tsm_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 短边 320 | 8 | ResNet50 | ImageNet | 70.73 | 89.81 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20210701-68d582b4.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20210616_021451.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20210616_021451.log.json) | +| [tsm_r50_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb.py) | 短边 320 | 8 | ResNet50 | ImageNet | 71.90 | 90.03 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/tsm_r50_1x1x8_100e_kinetics400_rgb_20210701-7ff22268.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/20210617_103543.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/20210617_103543.log.json) | +| [tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb.py](/configs/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb.py) | 短边 256 | 8 | ResNet50 | ImageNet | 70.48 | 89.40 | x | x | x | 7076 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219-bf96e6cc.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219.json) | +| [tsm_r50_video_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py) | 短边 256 | 8 | ResNet50 | ImageNet | 70.25 | 89.66 | [70.36](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_8f.sh) | [89.49](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_8f.sh) | 74.0 (8x1 frames) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_1x1x8_100e_kinetics400_rgb_20200702-a77f4328.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_2d_1x1x8_50e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_2d_1x1x8_50e_kinetics400_rgb.log.json) | +| 
[tsm_r50_dense_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb.py) | 短边 320 | 8 | ResNet50 | ImageNet | 73.46 | 90.84 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/tsm_r50_dense_1x1x8_50e_kinetics400_rgb_20210701-a54ff3d3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/20210617_103245.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/20210617_103245.log.json) | +| [tsm_r50_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb.py) | 短边 320 | 8 | ResNet50 | ImageNet | 74.55 | 91.74 | x | x | x | 7079 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/tsm_r50_dense_1x1x8_100e_kinetics400_rgb_20210701-e3e5e97f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/20210613_034931.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/20210613_034931.log.json) | +| [tsm_r50_1x1x16_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 72.09 | 90.37 | [70.67](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_16f.sh) | [89.98](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_16f.sh) | 47.0 (16x1 frames) | 10404 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/tsm_r50_340x256_1x1x16_50e_kinetics400_rgb_20201011-2f27f229.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20201011_205356.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20201011_205356.log.json) | +| [tsm_r50_1x1x16_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py) | 短边 256 | 8x4 | ResNet50 | ImageNet | 71.89 | 90.73 | x | x | x | 10398 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/tsm_r50_256p_1x1x16_50e_kinetics400_rgb_20201010-85645c2a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/20201010_224825.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/20201010_224825.log.json) | +| [tsm_r50_1x1x16_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb.py) | 短边 320 | 8 | ResNet50 | ImageNet | 72.80 | 90.75 | x | x | x | 10398 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb/tsm_r50_1x1x16_100e_kinetics400_rgb_20210701-41ac92b9.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb/20210618_193859.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb/20210618_193859.log.json) | +| [tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py) | 短边 320 | 8x4 | ResNet50 | ImageNet | 72.03 | 90.25 | 71.81 | 90.36 | x | 8931 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb_20200724-f00f1336.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200724_120023.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200724_120023.log.json) | +| [tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py) | 短边 320 | 8x4 | ResNet50 | ImageNet | 70.70 | 89.90 | x | x | x | 10125 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb_20200816-b93fd297.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200815_210253.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200815_210253.log.json) | +| [tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py) | 短边 320 | 8x4 | ResNet50 | ImageNet | 71.60 | 90.34 | x | x | x | 8358 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb_20200724-d8ad84d2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/20200723_220442.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/20200723_220442.log.json) | +| [tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py) | 短边 320 | 8 | MobileNetV2 | ImageNet | 68.46 | 88.64 | x | x | x | 3385 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/tsm_mobilenetv2_dense_320p_1x1x8_100e_kinetics400_rgb_20210202-61135809.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/20210129_024936.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/20210129_024936.log.json) | +| [tsm_mobilenetv2_dense_1x1x8_kinetics400_rgb_port](/configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py) | short-side 320 | 8 | MobileNetV2 | ImageNet | 69.89 | 89.01 | x | x | x | 3385 | [infer_ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_kinetics400_rgb_port_20210922-aa5cadf6.pth) | x | x | + +### Diving48 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :---------: | :---------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_video_1x1x8_50e_diving48_rgb](/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 75.99 | 97.16 | 7070 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/tsm_r50_video_1x1x8_50e_diving48_rgb_20210426-aba5aa3d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/20210426_012424.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/20210426_012424.log.json) | +| [tsm_r50_video_1x1x16_50e_diving48_rgb](/configs/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 81.62 | 97.66 | 7070 |
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/tsm_r50_video_1x1x16_50e_diving48_rgb_20210426-aa9631c0.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/20210426_012823.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/20210426_012823.log.json) | + +### Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | reference top1 acc (efficient/accurate) | reference top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :----------------------------------------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :------------------------------: | :------------------------------: | :--------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------: | :--------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 45.58 / 47.70 | 75.02 / 76.12 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 /
76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/tsm_r50_1x1x8_50e_sthv1_rgb_20210203-01dce462.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/20210203_150227.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/20210203_150227.log.json) | +| [tsm_r50_flip_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.10 / 48.51 | 76.02 / 77.56 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/tsm_r50_flip_1x1x8_50e_sthv1_rgb_20210203-12596f16.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/20210203_145829.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/20210203_145829.log.json) | +| [tsm_r50_randaugment_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.16 / 48.90 | 76.07 / 77.92 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb_20210324-481268d9.pth) |
[log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.65 / 48.66 | 76.67 / 77.41 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb-ee93e5e3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 46.26 / 47.68 | 75.92 / 76.49 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb-4f4f4740.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.log) |
[json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.85 / 50.31 | 76.78 / 78.18 | [45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7077 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb_20210324-76937692.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_1x1x16_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 47.77 / 49.03 | 76.82 / 77.83 | [47.05 / 48.61](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [76.40 / 77.96](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 10390 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb_20211202-b922e5d2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb.json) | +|
[tsm_r101_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet101 | ImageNet | 46.09 / 48.59 | 75.41 / 77.10 | [46.64 / 48.13](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [75.40 / 77.31](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 9800 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb_20201010-43fedf2e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb.json) | + +### Something-Something V2 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | reference top1 acc (efficient/accurate) | reference top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------- | :----: | :------: | :-------: | :------: | :------------------------------: | :------------------------------: | :----------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------: | :--------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | +|
[tsm_r50_1x1x8_50e_sthv2_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb.py) | height 256 | 8 | ResNet50 | ImageNet | 59.11 / 61.82 | 85.39 / 86.80 | [xx / 61.2](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [xx / xx](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 7069 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/tsm_r50_256h_1x1x8_50e_sthv2_rgb_20210816-032aa4da.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/20210816_224310.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/20210816_224310.log.json) | +| [tsm_r50_1x1x16_50e_sthv2_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py) | height 256 | 8 | ResNet50 | ImageNet | 61.06 / 63.19 | 86.66 / 87.93 | [xx / 63.1](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [xx / xx](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 10400 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/tsm_r50_256h_1x1x16_50e_sthv2_rgb_20210331-0a45549c.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/20210331_134458.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/20210331_134458.log.json) | +| [tsm_r101_1x1x8_50e_sthv2_rgb](/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb.py) | height 256 | 8 | ResNet101 | ImageNet | 60.88 / 63.84 | 86.56 / 88.30 | [xx / 63.3](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [xx / xx](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 9727 |
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/tsm_r101_256h_1x1x8_50e_sthv2_rgb_20210401-df97f3e1.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/20210401_143656.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/20210401_143656.log.json) | + +### Diving48 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :---------: | :---------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_video_1x1x8_50e_diving48_rgb](/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 75.99 | 97.16 | 7070 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/tsm_r50_video_1x1x8_50e_diving48_rgb_20210426-aba5aa3d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/20210426_012424.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/20210426_012424.log.json) | +| [tsm_r50_video_1x1x16_50e_diving48_rgb](/configs/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 81.62 | 97.66 | 7070 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/tsm_r50_video_1x1x16_50e_diving48_rgb_20210426-aa9631c0.pth) |
[log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/20210426_012823.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/20210426_012823.log.json) | + +### MixUp & CutMix on Something-Something V1 + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | delta top1 acc (efficient/accurate) | delta top5 acc (efficient/accurate) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :------------------------------: | :------------------------------: | :----------------------------------: | :----------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_mixup_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 46.35 / 48.49 | 75.07 / 76.88 | +0.77 / +0.79 | +0.05 / +0.70 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb-9eca48e5.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.json) | +| [tsm_r50_cutmix_1x1x8_50e_sthv1_rgb](/configs/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 45.92 / 47.46 |
75.23 / 76.71 | +0.34 / -0.24 | +0.21 / +0.59 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb-34934615.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.json) | + +### Jester + +| config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | ckpt | log | json | +| ---------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | +| [tsm_r50_1x1x8_50e_jester_rgb](/configs/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 96.5 / 97.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb-c799267e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb.json) | + +### HMDB51 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :---------: | :---------: | :---------: | :--------------: |
:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb](/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb.py) | 8 | ResNet50 | Kinetics400 | 72.68 | 92.03 | 10388 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb_20210630-10c74ee5.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/20210605_182554.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/20210605_182554.log.json) | +| [tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb](/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb.py) | 8 | ResNet50 | Kinetics400 | 74.77 | 93.86 | 10388 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb_20210630-4785548e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/20210605_182505.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/20210605_182505.log.json) | + +### UCF101 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :---------: | :---------: | :---------: |
:--------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------: | +| [tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb](/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb.py) | 8 | ResNet50 | Kinetics400 | 94.50 | 99.58 | 10389 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb_20210630-1fae312b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/20210605_182720.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/20210605_182720.log.json) | +| [tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb](/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb.py) | 8 | ResNet50 | Kinetics400 | 94.58 | 99.37 | 10389 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb_20210630-8df9c358.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/20210605_182720.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/20210605_182720.log.json) | + +Notes: + +1. The **gpus** column (GPU count) gives the number of GPUs used to train the released checkpoint. By default, the configs provided by MMAction2 assume training on 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when you use a different number of GPUs or a different number of videos per GPU, scale the learning rate in proportion to the total batch size. + For example, lr=0.01 corresponds to 4 GPUs x 2 video/gpu, and lr=0.08 corresponds to 16 GPUs x 4 video/gpu. +2.
The **inference time** is measured with the [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting and counting only the model inference time, + excluding IO time and pre-processing time. For each config, MMAction2 uses 1 GPU and a batch size (videos per GPU) of 1 to measure the inference time. +3. The reference results were obtained by training in the original repo with the same model settings. The corresponding checkpoints can be downloaded [here](https://download.openmmlab.com/mmaction/recognition/tsm/tsm_reference_ckpt.rar). +4. There are two test settings for the Something-Something datasets: efficient (center crop x 1 clip) and accurate (Three crop x 2 clip). Both follow the [original repo](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd). + MMAction2 uses the efficient setting as the default in its config files; you can switch to the accurate setting as follows: + +```python +... +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, # set `num_clips = 8` when using 8 segments + twice_sample=True, # set `twice_sample=True` for Twice Sample in the accurate setting + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + # dict(type='CenterCrop', crop_size=224), used for the efficient setting + dict(type='ThreeCrop', crop_size=256), # used for the accurate setting + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +``` + +5. When the Mixup and CutMix augmentations are applied, the hyperparameter `alpha=0.2` is used. +6. The Kinetics400 validation set we use contains 19796 videos, which can be downloaded from [validation set videos](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). We also provide the corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format: video ID, number of frames, class index) and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (class index to class name). +7.
**infer_ckpt** means that the checkpoint was ported from [TSM](https://github.com/mit-han-lab/temporal-shift-module/blob/master/test_models.py). + +For details on dataset preparation, see the Kinetics400, Something-Something V1 and Something-Something V2 sections of the [data preparation document](/docs_zh_CN/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TSM model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py \ + --work-dir work_dirs/tsm_r50_1x1x8_50e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more training details, see the **Training setting** part of [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TSM model on the Kinetics-400 dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more testing details, see the **Test a dataset** part of [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..6ad13f2948ed1cbed1238931d4768094ecc95ba0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/metafile.yml @@ -0,0 +1,830 @@ +Collections: +- Name: TSM + README: configs/recognition/tsm/README.md + Paper: + URL: https://arxiv.org/abs/1811.08383 + Title: "TSM: Temporal Shift Module for Efficient Video Understanding" +Models: +- Config: configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32965562368 +
Parameters: 24327632 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.24 + Top 5 Accuracy: 89.56 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20200607_211800.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20200607_211800.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20200607-af7fb746.pth +- Config: configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32965562368 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.59 + Top 5 Accuracy: 89.52 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/20200725_031623.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/20200725_031623.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/tsm_r50_256p_1x1x8_50e_kinetics400_rgb_20200726-020785e2.pth +- Config: configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32965562368 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: 
tsm_r50_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.73 + Top 5 Accuracy: 89.81 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20210616_021451.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/20210616_021451.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb/tsm_r50_1x1x8_50e_kinetics400_rgb_20210701-68d582b4.pth +- Config: configs/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 32965562368 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 71.9 + Top 5 Accuracy: 90.03 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/20210617_103543.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/20210617_103543.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb/tsm_r50_1x1x8_100e_kinetics400_rgb_20210701-7ff22268.pth +- Config: configs/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32965562368 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.48 + Top 5 Accuracy: 89.4 + Task: 
Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb_20210219-bf96e6cc.pth +- Config: configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32965562368 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_video_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.25 + Top 5 Accuracy: 89.66 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_2d_1x1x8_50e_kinetics400_rgb.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_2d_1x1x8_50e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_1x1x8_100e_kinetics400_rgb_20200702-a77f4328.pth +- Config: configs/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32965562368 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_dense_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: 
Kinetics-400 + Metrics: + Top 1 Accuracy: 73.46 + Top 5 Accuracy: 90.84 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/20210617_103245.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/20210617_103245.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/tsm_r50_dense_1x1x8_50e_kinetics400_rgb_20210701-a54ff3d3.pth +- Config: configs/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 32965562368 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_dense_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 74.55 + Top 5 Accuracy: 91.74 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/20210613_034931.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/20210613_034931.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/tsm_r50_dense_1x1x8_100e_kinetics400_rgb_20210701-e3e5e97f.pth +- Config: configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 50 + FLOPs: 65931124736 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x16_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.09 + Top 5 Accuracy: 90.37 + Task: Action Recognition + Training 
Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20201011_205356.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20201011_205356.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/tsm_r50_340x256_1x1x16_50e_kinetics400_rgb_20201011-2f27f229.pth +- Config: configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 50 + FLOPs: 65931124736 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: tsm_r50_1x1x16_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 71.89 + Top 5 Accuracy: 90.73 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/20201010_224825.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/20201010_224825.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/tsm_r50_256p_1x1x16_50e_kinetics400_rgb_20201010-85645c2a.pth +- Config: configs/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 100 + FLOPs: 65931124736 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x16_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.80 + Top 5 Accuracy: 90.75 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20210621_115844.log.json + 
Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20210621_115844.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/tsm_r50_1x1x16_50e_kinetics400_rgb_20210701-7c0c5d54.pth +- Config: configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 49457811456 + Parameters: 31682000 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.03 + Top 5 Accuracy: 90.25 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200724_120023.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200724_120023.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb_20200724-f00f1336.pth +- Config: configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 41231355904 + Parameters: 28007888 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.7 + Top 5 Accuracy: 89.9 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200815_210253.log.json + 
Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200815_210253.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb_20200816-b93fd297.pth +- Config: configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 49457811456 + Parameters: 31682000 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: RGB + Name: tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 71.6 + Top 5 Accuracy: 90.34 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/20200723_220442.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/20200723_220442.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb_20200724-d8ad84d2.pth +- Config: configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: MobileNetV2 + Batch Size: 8 + Epochs: 100 + FLOPs: 3337519104 + Parameters: 2736272 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 68.46 + Top 5 Accuracy: 88.64 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/20210129_024936.log.json 
+ Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/20210129_024936.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/tsm_mobilenetv2_dense_320p_1x1x8_100e_kinetics400_rgb_20210202-61135809.pth +- Config: configs/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32959795200 + Parameters: 23606384 + Pretrained: ImageNet + Training Data: Diving48 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_video_1x1x8_50e_diving48_rgb + Results: + - Dataset: Diving48 + Metrics: + Top 1 Accuracy: 75.99 + Top 5 Accuracy: 97.16 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/20210426_012424.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/20210426_012424.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb/tsm_r50_video_1x1x8_50e_diving48_rgb_20210426-aba5aa3d.pth +- Config: configs/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 4 + Epochs: 50 + FLOPs: 65919590400 + Parameters: 23606384 + Pretrained: ImageNet + Training Data: Diving48 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_video_1x1x16_50e_diving48_rgb + Results: + - Dataset: Diving48 + Metrics: + Top 1 Accuracy: 81.62 + Top 5 Accuracy: 97.66 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/20210426_012823.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/20210426_012823.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb/tsm_r50_video_1x1x16_50e_diving48_rgb_20210426-aa9631c0.pth +- Config: configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32961859584 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 47.7 + Top 1 Accuracy (efficient): 45.58 + Top 5 Accuracy: 76.12 + Top 5 Accuracy (efficient): 75.02 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/20210203_150227.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/20210203_150227.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb/tsm_r50_1x1x8_50e_sthv1_rgb_20210203-01dce462.pth + reference top1 acc (efficient/accurate): '[45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' + reference top5 acc (efficient/accurate): '[74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- Config: configs/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32961859584 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_flip_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 48.51 + Top 1 Accuracy (efficient): 47.1 + Top 5 Accuracy: 77.56 + Top 5 Accuracy (efficient): 76.02 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/20210203_145829.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/20210203_145829.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb/tsm_r50_flip_1x1x8_50e_sthv1_rgb_20210203-12596f16.pth + reference top1 acc (efficient/accurate): '[45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' + reference top5 acc (efficient/accurate): '[74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- Config: configs/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32961859584 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_randaugment_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 48.9 + Top 1 Accuracy (efficient): 47.16 + Top 5 Accuracy: 77.92 + Top 5 Accuracy (efficient): 76.07 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb_20210324-481268d9.pth + reference top1 acc (efficient/accurate): '[45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' + reference top5 acc (efficient/accurate): 
'[74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- Config: configs/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 32961859584 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 50.31 + Top 1 Accuracy (efficient): 47.85 + Top 5 Accuracy: 78.18 + Top 5 Accuracy (efficient): 76.78 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb_20210324-76937692.pth + reference top1 acc (efficient/accurate): '[45.50 / 47.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' + reference top5 acc (efficient/accurate): '[74.34 / 76.60](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- Config: configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 50 + FLOPs: 65923719168 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x16_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 49.03 + Top 1 Accuracy 
(efficient): 47.77 + Top 5 Accuracy: 77.83 + Top 5 Accuracy (efficient): 76.82 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb/tsm_r50_1x1x16_50e_sthv1_rgb_20211202-b922e5d2.pth + reference top1 acc (efficient/accurate): '[47.05 / 48.61](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' + reference top5 acc (efficient/accurate): '[76.40 / 77.96](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- Config: configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet101 + Batch Size: 8 + Epochs: 50 + FLOPs: 62782459904 + Parameters: 42856686 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r101_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 48.59 + Top 1 Accuracy (efficient): 46.09 + Top 5 Accuracy: 77.10 + Top 5 Accuracy (efficient): 75.41 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb/tsm_r101_1x1x8_50e_sthv1_rgb_20211202-49970a5b.pth + reference top1 acc (efficient/accurate): '[46.64 / 48.13](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +
reference top5 acc (efficient/accurate): '[75.40 / 77.31](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- Config: configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 50 + FLOPs: 32961859584 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 256 + Training Data: SthV2 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x8_50e_sthv2_rgb + Results: + - Dataset: SthV2 + Metrics: + Top 1 Accuracy: 61.82 + Top 1 Accuracy (efficient): 59.11 + Top 5 Accuracy: 86.80 + Top 5 Accuracy (efficient): 85.39 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/20210816_224310.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/20210816_224310.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/tsm_r50_256h_1x1x8_50e_sthv2_rgb_20210816-032aa4da.pth + reference top1 acc (efficient/accurate): '[57.98 / 60.69](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' + reference top5 acc (efficient/accurate): '[84.57 / 86.28](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- Config: configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 50 + FLOPs: 32961859584 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 256 + Training Data: SthV2 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x16_50e_sthv2_rgb + Results: + - Dataset: SthV2 + Metrics: + Top 1 Accuracy: 63.19 + Top 1 Accuracy (efficient): 61.06 + Top 5 Accuracy: 87.93 + Top 5 Accuracy (efficient): 86.66 + Task: Action Recognition + Training Json 
Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/20210331_134458.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/20210331_134458.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb/tsm_r50_256h_1x1x16_50e_sthv2_rgb_20210331-0a45549c.pth + reference top1 acc (efficient/accurate): '[xx / 63.1](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' + reference top5 acc (efficient/accurate): '[xx / xx](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- Config: configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet101 + Batch Size: 8 + Epochs: 50 + FLOPs: 62782459904 + Parameters: 42856686 + Pretrained: ImageNet + Resolution: height 256 + Training Data: SthV2 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r101_1x1x8_50e_sthv2_rgb + Results: + - Dataset: SthV2 + Metrics: + Top 1 Accuracy: 63.84 + Top 1 Accuracy (efficient): 60.88 + Top 5 Accuracy: 88.30 + Top 5 Accuracy (efficient): 86.56 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/20210401_143656.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/20210401_143656.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb/tsm_r101_256h_1x1x8_50e_sthv2_rgb_20210401-df97f3e1.pth + reference top1 acc (efficient/accurate): '[xx / 63.3](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' + reference top5 acc (efficient/accurate): '[xx / xx](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)' +- 
Config: configs/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 43051352064 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_mixup_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 48.49 + Top 1 Accuracy (efficient): 46.35 + Top 5 Accuracy: 76.88 + Top 5 Accuracy (efficient): 75.07 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/tsm_r50_mixup_1x1x8_50e_sthv1_rgb-9eca48e5.pth + delta top1 acc (efficient/accurate): +0.77 / +0.79 + delta top5 acc (efficient/accurate): +0.05 / +0.70 +- Config: configs/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 43051352064 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_cutmix_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 47.46 + Top 1 Accuracy (efficient): 45.92 + Top 5 Accuracy: 76.71 + Top 5 Accuracy (efficient): 75.23 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb-34934615.pth + delta top1 acc (efficient/accurate): +0.34 / -0.24 + delta top5 acc (efficient/accurate): +0.21 / +0.59 +- Config: configs/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 43048943616 + Parameters: 23563355 + Pretrained: ImageNet + Resolution: height 100 + Training Data: Jester + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_r50_1x1x8_50e_jester_rgb + Results: + - Dataset: Jester + Metrics: + Top 1 Accuracy: 97.2 + Top 1 Accuracy (efficient): 96.5 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb/tsm_r50_1x1x8_50e_jester_rgb-c799267e.pth +- Config: configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 25 + FLOPs: 32959844352 + Parameters: 23612531 + Pretrained: Kinetics400 + Training Data: HMDB51 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb + Results: + - Dataset: HMDB51 + Metrics: + Top 1 Accuracy: 72.68 + Top 5 Accuracy: 92.03 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/20210605_182554.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/20210605_182554.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb_20210630-10c74ee5.pth + gpu_mem(M): '10388' +- Config: configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 25 + FLOPs: 65919688704 + Parameters: 23612531 + Pretrained: Kinetics400 + Training Data: HMDB51 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb + Results: + - Dataset: HMDB51 + Metrics: + Top 1 Accuracy: 74.77 + Top 5 Accuracy: 93.86 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/20210605_182505.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/20210605_182505.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb_20210630-4785548e.pth + gpu_mem(M): '10388' +- Config: configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 25 + FLOPs: 32960663552 + Parameters: 23714981 + Pretrained: Kinetics400 + Training Data: UCF101 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb + Results: + - Dataset: UCF101 + Metrics: + Top 1 Accuracy: 94.5 + Top 5 Accuracy: 99.58 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/20210605_182720.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/20210605_182720.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb_20210630-1fae312b.pth + gpu_mem(M): '10389' +- Config: configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb.py + In Collection: TSM + Metadata: + Architecture: ResNet50 + Batch Size: 6 + Epochs: 25 + FLOPs: 65921327104 + Parameters: 23714981 + Pretrained: Kinetics400 + Training Data: UCF101 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb + Results: + - Dataset: UCF101 + Metrics: + Top 1 Accuracy: 94.58 + Top 5 Accuracy: 99.37 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/20210605_182720.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/20210605_182720.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb_20210630-8df9c358.pth + gpu_mem(M): '10389' +- Config: configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py + In Collection: TSM + Metadata: + Architecture: MobileNetV2 + Batch Size: 8 + Epochs: 100 + FLOPs: 3337519104 + Parameters: 2736272 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsm_mobilenetv2_dense_1x1x8_kinetics400_rgb_port + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 69.89 + Top 5 Accuracy: 89.01 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_kinetics400_rgb_port_20210922-aa5cadf6.pth + gpu_mem(M): '3385' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..9a6535b3ed5dcab9693faa0ff931c15d2b723ec7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb.py @@ -0,0 +1,101 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict(num_segments=16), + cls_head=dict(num_classes=51, num_segments=16)) + +# dataset settings +split = 1 +dataset_type = 'RawframeDataset' +data_root = 'data/hmdb51/rawframes' +data_root_val = 'data/hmdb51/rawframes' +ann_file_train = f'data/hmdb51/hmdb51_train_split_{split}_rawframes.txt' +ann_file_val = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +ann_file_test = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', 
keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.00075, # this lr is used for 8 gpus +) +# learning policy +lr_config = dict(policy='step', step=[10, 20]) +total_epochs = 25 + +load_from = 'https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/tsm_r50_256p_1x1x16_50e_kinetics400_rgb_20201010-85645c2a.pth' # noqa: E501 +# runtime settings +work_dir = './work_dirs/tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..92ef9bfe4cf51573eae1858ddb4daf56065a98f1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb.py @@ -0,0 +1,101 @@ +_base_ = [ + 
'../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict(num_segments=16), + cls_head=dict(num_classes=101, num_segments=16)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ucf101/rawframes/' +data_root_val = 'data/ucf101/rawframes/' +split = 1 # official train/test splits. valid numbers: 1, 2, 3 +ann_file_train = f'data/ucf101/ucf101_train_split_{split}_rawframes.txt' +ann_file_val = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +ann_file_test = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + 
dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.00075, # this lr is used for 8 gpus +) +# learning policy +lr_config = dict(policy='step', step=[10, 20]) +total_epochs = 25 + +load_from = 'https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/tsm_r50_256p_1x1x16_50e_kinetics400_rgb_20201010-85645c2a.pth' # noqa: E501 +# runtime settings +work_dir = './work_dirs/tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..5169eda3a906dce87291528003b364924bcea1d7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb.py @@ -0,0 +1,101 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict(num_segments=8), + cls_head=dict(num_classes=51, num_segments=8)) + +# dataset settings +split = 1 +dataset_type = 'RawframeDataset' +data_root = 'data/hmdb51/rawframes' 
+data_root_val = 'data/hmdb51/rawframes' +ann_file_train = f'data/hmdb51/hmdb51_train_split_{split}_rawframes.txt' +ann_file_val = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +ann_file_test = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + 
val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.0015, # this lr is used for 8 gpus +) +# learning policy +lr_config = dict(policy='step', step=[10, 20]) +total_epochs = 25 + +load_from = 'https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/tsm_r50_256p_1x1x8_50e_kinetics400_rgb_20200726-020785e2.pth' # noqa: E501 +# runtime settings +work_dir = './work_dirs/tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..84317727a400a15bf2a7f66fddf7f92c17e34bc6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb.py @@ -0,0 +1,101 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict(num_segments=8), + cls_head=dict(num_classes=101, num_segments=8)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ucf101/rawframes/' +data_root_val = 'data/ucf101/rawframes/' +split = 1 # official train/test splits. 
valid numbers: 1, 2, 3 +ann_file_train = f'data/ucf101/ucf101_train_split_{split}_rawframes.txt' +ann_file_val = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +ann_file_test = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + 
type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.0015, # this lr is used for 8 gpus +) +# learning policy +lr_config = dict(policy='step', step=[10, 20]) +total_epochs = 25 + +load_from = 'https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x8_50e_kinetics400_rgb/tsm_r50_256p_1x1x8_50e_kinetics400_rgb_20200726-020785e2.pth' # noqa: E501 +# runtime settings +work_dir = './work_dirs/tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..b6df2b32d140cb0a167d83466a4ceea1d93468e2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,88 @@ +_base_ = [ + '../../_base_/models/tsm_mobilenet_v2.py', + '../../_base_/schedules/sgd_tsm_mobilenet_v2_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', 
scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=1) +work_dir = './work_dirs/tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb/' diff 
--git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_video_dense_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_video_dense_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..9442e1d700cc9104edaa440419e983178305bed9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_video_dense_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsm_mobilenet_v2.py', + '../../_base_/schedules/sgd_tsm_mobilenet_v2_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='DenseSampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', 
crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.01, # this lr is used for 8 gpus +) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsm_mobilenetv2_video_dense_1x1x8_100e_kinetics400_rgb/' # noqa diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_video_inference_dense_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_video_inference_dense_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..15a3edd5f426dcd22834e07299792382a0cb3284 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_mobilenetv2_video_inference_dense_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,33 @@ +_base_ = 
['../../_base_/models/tsm_mobilenet_v2.py'] + +# dataset settings +dataset_type = 'VideoDataset' +data_root_val = 'data/kinetics400/videos_val' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..884a2d663c4b77219a3e82a821132df229c1cfeb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + non_local=((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)), + non_local_cfg=dict( + sub_sample=True, + use_scale=False, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='dot_product'))) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 
'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + 
type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..738043ac04c60907f429717320a2a35222d6c8d2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + non_local=((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)), + non_local_cfg=dict( + sub_sample=True, + use_scale=False, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='embedded_gaussian'))) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 
0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/' # noqa: E501 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..9516e93b05016ed4fd90414349a9ee8e083ba10d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + non_local=((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)), + non_local_cfg=dict( + sub_sample=True, + use_scale=False, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='gaussian'))) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', 
scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..02c43a38085b6e69edf82dbba6082f7c0f103cce --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,7 @@ +_base_ = ['./tsm_r50_1x1x8_50e_sthv1_rgb.py'] + +# model settings +model = dict(backbone=dict(pretrained='torchvision://resnet101', depth=101)) + +# runtime settings +work_dir = './work_dirs/tsm_r101_1x1x8_50e_sthv1_rgb/' diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..1926a975ba68effaaead8cf7c0bbcb239adbe330 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r101_1x1x8_50e_sthv2_rgb.py @@ -0,0 +1,90 @@ +_base_ = ['./tsm_r50_1x1x8_50e_sthv2_rgb.py'] + +# model settings +model = dict(backbone=dict(pretrained='torchvision://resnet101', depth=101)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv2/rawframes' +data_root_val = 'data/sthv2/rawframes' +ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt' +ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt' +ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + 
type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.01, # this lr is used for 8 gpus +) +# runtime settings +work_dir = './work_dirs/tsm_r101_1x1x8_50e_sthv2_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..b09b65b4c16b85c0015aad760455058e921fa10a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_100e_kinetics400_rgb.py @@ -0,0 +1,7 @@ +_base_ = ['tsm_r50_1x1x16_50e_kinetics400_rgb.py'] + +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[40, 80]) +total_epochs = 100 +work_dir = './work_dirs/tsm_r50_1x1x16_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..d28b979fd9c036f0e683a8fc9ee652e0d35fa231 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py @@ -0,0 +1,95 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(backbone=dict(num_segments=16), cls_head=dict(num_segments=16)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', 
keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.0075, # this lr is used for 8 gpus +) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsm_r50_1x1x16_50e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..8ca1b6b0c4214c8952ab844e43e39b3069ea7f65 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv1_rgb.py @@ -0,0 +1,99 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict(num_segments=16), + cls_head=dict(num_classes=174, num_segments=16)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' 
+data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + 
pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.0075, # this lr is used for 8 gpus + weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_1x1x16_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..f930f1c2448f20dd84975334c7f4da2884c788b6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict(num_segments=16), + cls_head=dict(num_classes=174, num_segments=16)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv2/rawframes' +data_root_val = 'data/sthv2/rawframes' +ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt' +ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt' +ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', 
scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.0075, # this lr is used for 8 gpus + weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..88b28924f60cc193c1c700d552ba9b0013b294fc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,6 @@ +_base_ = ['./tsm_r50_1x1x8_50e_kinetics400_rgb.py'] + +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +lr_config = dict(policy='step', step=[40, 80]) +total_epochs = 100 +work_dir = './work_dirs/tsm_r50_1x1x8_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4c1daf1d49946e2d8f55c01956bfb900eb20a3c4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_jester_rgb.py @@ -0,0 +1,91 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=27)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/jester/rawframes' +data_root_val = 'data/jester/rawframes' +ann_file_train = 'data/jester/jester_train_list_rawframes.txt' +ann_file_val = 'data/jester/jester_val_list_rawframes.txt' +ann_file_test = 'data/jester/jester_val_list_rawframes.txt' +jester_flip_label_map = {0: 1, 1: 0, 6: 7, 7: 6} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=jester_flip_label_map), + dict(type='Normalize', 
**img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_1x1x8_50e_jester_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..76195eb83efdff7cc6871c822d6e2f7342584c93 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,87 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + 
type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsm_r50_1x1x8_50e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..e57a5b020ccfb9ad32055e9774a6381299c42b44 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,95 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( +
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + 
filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..c51ac187c59062fb3a9b6892eae63f9d2ff1153b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb.py @@ -0,0 +1,94 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv2/rawframes' +data_root_val = 'data/sthv2/rawframes' +ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt' +ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt' +ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + 
dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=6, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.0075, # this lr is used for 8 gpus + weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_1x1x8_50e_sthv2_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..cac9dbb75c4cb53533799d2f24f9a6c1cfa4fc98 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,115 @@ +_base_ = [ + '../../_base_/schedules/sgd_tsm_50e.py', '../../_base_/default_runtime.py' +] + +# 
model settings +# Recognizer2D with a ResNet-50 TSM backbone; CutMix blending is enabled in train_cfg +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNetTSM', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + shift_div=8), + cls_head=dict( + type='TSMHead', + num_classes=174, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.5, + init_std=0.001, + is_shift=True), + # model training and testing settings + train_cfg=dict( + blending=dict(type='CutmixBlending', num_classes=174, alpha=.2)), + test_cfg=dict(average_clips='prob')) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline
= [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_cutmix_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..150e0f14e314e8d7d67767e747dc1098226915b4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,87 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 
'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DenseSampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + 
pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsm_r50_dense_1x1x8_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..66ffd96ba00d1e265a91db20d371a727191f4ba1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_dense_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,7 @@ +_base_ = ['tsm_r50_dense_1x1x8_100e_kinetics400_rgb.py'] + +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[20, 40]) +total_epochs = 50 +work_dir = './work_dirs/tsm_r50_dense_1x1x8_50e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..9b5199a7d004b8fa3b67ab0ccd41ad96664924e2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_flip_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,99 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' 
+ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' + +sthv1_flip_label_map = {2: 4, 4: 2, 30: 41, 41: 30, 52: 66, 66: 52} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=sthv1_flip_label_map), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + 
filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_flip_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..11ae99c946fa04428f3f255f72ee487f01560f24 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,100 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' + +sthv1_flip_label_map = {2: 4, 4: 2, 30: 41, 41: 30, 52: 66, 66: 52} +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + 
max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, flip_label_map=sthv1_flip_label_map), + dict(type='Imgaug', transforms='default'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = 
'./work_dirs/tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..61004a5bd481bdb75562ccbbb7db2f84f4145008 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +module_hooks = [ + dict( + type='GPUNormalize', + hooked_module='backbone', + hook_pos='forward_pre', + input_format='NCHW', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375]) +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + 
dict(type='CenterCrop', crop_size=224), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsm_r50_gpu_normalize_1x1x8_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..24864ec22907d9d43457b2bc181d9027df63a237 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_mixup_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,114 @@ +_base_ = [ + '../../_base_/schedules/sgd_tsm_50e.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNetTSM', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + shift_div=8), + cls_head=dict( + type='TSMHead', + 
num_classes=174, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.5, + init_std=0.001, + is_shift=True), + # model training and testing settings + train_cfg=dict( + blending=dict(type='MixupBlending', num_classes=174, alpha=.2)), + test_cfg=dict(average_clips='prob')) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + 
dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_mixup_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7b39be496417f3a87ca249935bfacc73febf1d47 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], 
to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='pytorchvideo.AugMix'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + 
pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..a7a8346a783129e6fbfd99ea13ee19385f39a0e1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='pytorchvideo.RandAugment'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + 
clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..83ba457bb022fb02ef045a6aa6833181b714203f --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Imgaug', transforms='default'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + twice_sample=True, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + 
dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(weight_decay=0.0005) + +# runtime settings +work_dir = './work_dirs/tsm_r50_randaugment_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6871f53817bb0c5bd03eb6ed27a0c04f19702bc7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x16_50e_diving48_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict(num_segments=16), + cls_head=dict(num_classes=48, num_segments=16)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/diving48/videos' +data_root_val = 'data/diving48/videos' +ann_file_train = 'data/diving48/diving48_train_list_videos.txt' +ann_file_val = 'data/diving48/diving48_val_list_videos.txt' +ann_file_test = 
'data/diving48/diving48_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + 
pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, + metrics=['top_k_accuracy', 'mean_class_accuracy'], +) + +# optimizer +optimizer = dict( + lr=0.005, # this lr is used for 8 gpus +) +# runtime settings +checkpoint_config = dict(interval=1) +work_dir = './work_dirs/tsm_r50_video_1x1x16_50e_diving48_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..65609d21ec4dd2af395212f24ab87edf2d5ba546 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_diving48_rgb.py @@ -0,0 +1,100 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=48)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/diving48/videos' +data_root_val = 'data/diving48/videos' +ann_file_train = 'data/diving48/diving48_train_list_videos.txt' +ann_file_val = 'data/diving48/diving48_val_list_videos.txt' +ann_file_test = 'data/diving48/diving48_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, + metrics=['top_k_accuracy', 'mean_class_accuracy'], +) + +# optimizer +optimizer = dict( + lr=0.01, # this lr is used for 8 gpus +) +# runtime settings +checkpoint_config = dict(interval=1) +work_dir = './work_dirs/tsm_r50_video_1x1x8_50e_diving48_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py new file mode 100644 
index 0000000000000000000000000000000000000000..3e34c822c92c1dbc8afa9d1ff9bdf7d897fa2885 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,94 @@ +_base_ = [ + '../../_base_/models/tsm_r50.py', '../../_base_/schedules/sgd_tsm_50e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + 
frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + lr=0.02, # this lr is used for 8 gpus +) +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsm_r50_video_2d_1x1x8_50e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_inference_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_inference_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7c355ade2f60e169d6897733534bd9e1509b7600 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_r50_video_inference_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,31 @@ +_base_ = ['../../_base_/models/tsm_r50.py'] + +# dataset settings +dataset_type = 'VideoDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict(type='DecordInit', num_threads=1), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 
256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_temporal_pool_r50_1x1x8_50e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_temporal_pool_r50_1x1x8_50e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..2984d37968e89ceba5528c2ee6c127719b9db4ef --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsm/tsm_temporal_pool_r50_1x1x8_50e_kinetics400_rgb.py @@ -0,0 +1,8 @@ +_base_ = ['./tsm_r50_1x1x8_50e_kinetics400_rgb.py'] + +# model settings +model = dict( + backbone=dict(temporal_pool=True), cls_head=dict(temporal_pool=True)) + +# runtime settings +work_dir = './work_dirs/tsm_temporal_pool_r50_1x1x8_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..c3c01dc6eecc76f17815ac590668088120ae8e69 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/README.md @@ -0,0 +1,248 @@ +# TSN + +[Temporal segment networks: Towards good practices for deep action recognition](https://link.springer.com/chapter/10.1007/978-3-319-46484-8_2) + + + +## Abstract + + + +Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. 
This paper aims to discover the principles to design effective ConvNet architectures for action recognition in videos and learn these models given limited training samples. Our first contribution is temporal segment network (TSN), a novel framework for video-based action recognition, which is based on the idea of long-range temporal structure modeling. It combines a sparse temporal sampling strategy and video-level supervision to enable efficient and effective learning using the whole action video. The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network. Our approach obtains state-of-the-art performance on the datasets of HMDB51 (69.4%) and UCF101 (94.2%). We also visualize the learned ConvNet models, which qualitatively demonstrates the effectiveness of temporal segment network and the proposed good practices. + + + +
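The sparse temporal sampling idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the video is split into equal-length segments and one frame index is drawn from each; the function name `sample_segment_indices` and its parameters are hypothetical and are not MMAction2's API (the configs in this directory use the `SampleFrames` pipeline step for this purpose).

```python
import random


def sample_segment_indices(num_frames, num_segments, training=True, rng=None):
    """Pick one frame index per segment (illustrative TSN-style sampling)."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        # Segment k covers the half-open frame range [start, end).
        start = (k * num_frames) // num_segments
        end = ((k + 1) * num_frames) // num_segments
        end = max(end, start + 1)  # keep at least one frame per segment
        if training:
            idx = rng.randrange(start, end)  # random frame inside the segment
        else:
            idx = (start + end - 1) // 2  # deterministic middle frame
        indices.append(min(idx, num_frames - 1))
    return indices


# A 300-frame video with 3 segments, sampled deterministically.
print(sample_segment_indices(300, 3, training=False))  # [49, 149, 249]
```

Because only `num_segments` frames are read per video, the cost per training sample stays fixed regardless of video length, which is the property the abstract refers to as efficient learning over the whole video.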
+ +
+ +## Results and Models + +### UCF-101 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------- | :--: | :------: | :------: | :------: | :------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x3_75e_ucf101_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py) \[1\] | 8 | ResNet50 | ImageNet | 83.03 | 96.78 | 8332 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023-d85ab600.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.json) | + +\[1\] We report the performance on UCF-101 split1. 
+ +### Diving48 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| :----------------------------------------------------------------------------------------------------------- | :--: | :------: | :------: | :------: | :------: | :--------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_video_1x1x8_100e_diving48_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 71.27 | 95.74 | 5699 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/tsn_r50_video_1x1x8_100e_diving48_rgb_20210426-6dde0185.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log.json) | +| [tsn_r50_video_1x1x16_100e_diving48_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 76.75 | 96.95 | 5705 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/tsn_r50_video_1x1x16_100e_diving48_rgb_20210426-63c5f2f7.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log.json) | + +### HMDB51 + +| config | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json | +| 
:--------------------------------------------------------------------------------------------------------------- | :--: | :------: | :---------: | :------: | :------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb.py) | 8 | ResNet50 | ImageNet | 48.95 | 80.19 | 21535 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb_20201123-ce6c27ed.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log.json) | +| [tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb.py) | 8 | ResNet50 | Kinetics400 | 56.08 | 84.31 | 21535 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb_20201123-7f84701b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log.json) | +| [tsn_r50_1x1x8_50e_hmdb51_mit_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb.py) | 8 | ResNet50 | Moments | 54.25 | 83.86 | 21535 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/tsn_r50_1x1x8_50e_hmdb51_mit_rgb_20201123-01526d41.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log.json) | + +### Kinetics-400 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------: | :------: | :------: | :------: | :------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | :---------------------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 70.60 | 89.26 | x | x | 4.3 (25x10 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json) | +| [tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 70.42 | 89.03 | x | x | x | 8343 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_1x1x3_100e_kinetics400_rgb_20200725-22592236.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log.json) | +| [tsn_r50_dense_1x1x5_50e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb.py) | 340x256 | 8x3 | ResNet50 | ImageNet | 70.18 | 89.10 | [69.15](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [88.56](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 12.7 (8x10 frames) | 7028 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/tsn_r50_dense_1x1x5_100e_kinetics400_rgb_20200627-a063165f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log.json) | +| [tsn_r50_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | ImageNet | 70.91 | 89.51 | x | x | 10.7 (25x3 frames) | 8344 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log.json) | +| [tsn_r50_320p_1x1x3_110e_kinetics400_flow](/configs/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow.py) | short-side 320 | 8x2 | ResNet50 | ImageNet | 55.70 | 79.85 | x | x | x | 8471 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_320p_1x1x3_110e_kinetics400_flow_20200705-3036bab6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_f3_kinetics400_flow_shortedge_55.7_79.9.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_f3_kinetics400_flow_shortedge_55.7_79.9.log.json) | +| tsn_r50_320p_1x1x3_kinetics400_twostream \[1: 1\]\* | x | x | ResNet50 | ImageNet | 72.76 | 90.52 | x | x | x | x | x | x | x | +| [tsn_r50_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 71.80 | 90.17 | x | x | x | 8343 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/tsn_r50_256p_1x1x8_100e_kinetics400_rgb_20200817-883baf16.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/20200815_173413.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/20200815_173413.log.json) | +| 
[tsn_r50_320p_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py) | short-side 320 | 8x3 | ResNet50 | ImageNet | 72.41 | 90.55 | x | x | 11.1 (25x3 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_320p_1x1x8_100e_kinetics400_rgb_20200702-ef80e3d7.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_f8_kinetics400_shortedge_72.4_90.6.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_f8_kinetics400_shortedge_72.4_90.6.log.json) | +| [tsn_r50_320p_1x1x8_110e_kinetics400_flow](/configs/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow.py) | short-side 320 | 8x4 | ResNet50 | ImageNet | 57.76 | 80.99 | x | x | x | 8473 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_320p_1x1x8_110e_kinetics400_flow_20200705-1f39486b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_f8_kinetics400_flow_shortedge_57.8_81.0.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_f8_kinetics400_flow_shortedge_57.8_81.0.log.json) | +| tsn_r50_320p_1x1x8_kinetics400_twostream \[1: 1\]\* | x | x | ResNet50 | ImageNet | 74.64 | 91.77 | x | x | x | x | x | x | x | +| [tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | 8 | ResNet50 | ImageNet | 71.11 | 90.04 | x | x | x | 8343 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014-5ae1ee79.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014.json) | +| [tsn_r50_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 70.77 | 89.3 | [68.75](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [88.42](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 12.2 (8x10 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_dense_1x1x8_100e_kinetics400_rgb_20200606-e925e6e3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/20200606_003901.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/20200606_003901.log.json) | +| [tsn_r50_video_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 71.14 | 89.63 | x | x | x | 21558 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_1x1x8_100e_kinetics400_rgb_20200702-568cde33.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_100e_kinetics400_rgb.log.json) | +| 
[tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py) | short-side 256 | 8 | ResNet50 | ImageNet | 70.40 | 89.12 | x | x | x | 21553 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb_20200703-0f19175f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log.json) | + +Here, we use \[1: 1\] to indicate that we combine the RGB and flow scores with coefficients 1: 1 to get the two-stream prediction (without applying softmax). + +### Using backbones from 3rd-party in TSN + +It is possible and convenient to use a 3rd-party backbone for TSN under the framework of MMAction2. Here we provide some examples for: + +- [x] Backbones from [MMClassification](https://github.com/open-mmlab/mmclassification/) +- [x] Backbones from [TorchVision](https://github.com/pytorch/vision/) +- [x] Backbones from [TIMM (pytorch-image-models)](https://github.com/rwightman/pytorch-image-models) + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------------------------------------------------------------------------------------------------------: | :------: | :------: | :------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNeXt101-32x4d \[[MMCls](https://github.com/open-mmlab/mmclassification/tree/master/configs/resnext)\] | ImageNet | 73.43 | 91.01 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb-16a8b561.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.json) | +| [tsn_dense161_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | 8x2 | Densenet-161 \[[TorchVision](https://github.com/pytorch/vision/)\] | ImageNet | 72.78 | 90.75 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb-cbe85332.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.json) | +| 
[tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | 8 | Swin Transformer Base \[[timm](https://github.com/rwightman/pytorch-image-models)\] | ImageNet | 77.51 | 92.92 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb-805380f6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.json) | + +1. Note that some backbones in TIMM are not supported, for several reasons. Please refer to [PR #880](https://github.com/open-mmlab/mmaction2/pull/880) for details. + +### Kinetics-400 Data Benchmark (8-gpus, ResNet50, ImageNet pretrain; 3 segments) + +In the data benchmark, we compare: + +1. Different data preprocessing methods: (1) Resize video to 340x256, (2) Resize the short edge of video to 320px, (3) Resize the short edge of video to 256px; +2. Different data augmentation methods: (1) MultiScaleCrop, (2) RandomResizedCrop; +3. Different testing protocols: (1) 25 frames x 10 crops, (2) 25 frames x 3 crops. 
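To make the two testing protocols concrete, here is a small sketch of how many views each protocol evaluates per video and how per-view class scores could be combined into one video-level prediction. The helper names `num_views` and `average_views` are hypothetical, and plain averaging is an assumption for illustration; the actual aggregation used for the reported numbers depends on the test configuration.

```python
# Testing protocols from the list above: (frames per video, crops per frame).
PROTOCOLS = {
    "25x10 frames": (25, 10),
    "25x3 frames": (25, 3),
}


def num_views(num_frames, num_crops):
    """Number of forward passes (views) evaluated for one video."""
    return num_frames * num_crops


def average_views(view_scores):
    """Average a list of per-view class-score lists into one score list."""
    if not view_scores:
        raise ValueError("view_scores must contain at least one view")
    num_classes = len(view_scores[0])
    totals = [0.0] * num_classes
    for scores in view_scores:
        if len(scores) != num_classes:
            raise ValueError("all views must have the same number of classes")
        for c, s in enumerate(scores):
            totals[c] += s
    return [t / len(view_scores) for t in totals]


for name, (frames, crops) in PROTOCOLS.items():
    print(name, "->", num_views(frames, crops), "views per video")
```

Under these assumptions the 25x10 protocol evaluates 250 views per video and the 25x3 protocol evaluates 75, which is why the 3-crop protocol is cheaper at test time.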
+ +| config | resolution | training augmentation | testing protocol | top1 acc | top5 acc | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------: | :-------------------: | :--------------: | :------: | :------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_multiscalecrop_340x256_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_340x256_1x1x3_100e_kinetics400_rgb.py) | 340x256 | MultiScaleCrop | 25x10 frames | 70.60 | 89.26 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json) | +| x | 340x256 | MultiScaleCrop | 25x3 frames | 70.52 | 89.39 | x | x | x | +| [tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb.py) | 340x256 | RandomResizedCrop | 25x10 frames | 70.11 | 89.01 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb_20200725-88cb325a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb_20200725.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb_20200725.json) | +| x | 340x256 | RandomResizedCrop | 25x3 frames | 69.95 | 89.02 | x | x | x | +| [tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | MultiScaleCrop | 25x10 frames | 70.32 | 89.25 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb_20200725-9922802f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb_20200725.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb_20200725.json) | +| x | short-side 320 | MultiScaleCrop | 25x3 frames | 70.54 | 89.39 | x | x | x | +| [tsn_r50_randomresizedcrop_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | RandomResizedCrop | 25x10 frames | 70.44 | 89.23 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log.json) | +| x | short-side 320 | RandomResizedCrop | 25x3 frames | 70.91 | 89.51 | x | x | x | +| [tsn_r50_multiscalecrop_256p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_256p_1x1x3_100e_kinetics400_rgb.py) | short-side 256 | MultiScaleCrop | 25x10 frames | 70.42 | 89.03 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_1x1x3_100e_kinetics400_rgb_20200725-22592236.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log.json) | +| x | short-side 256 | MultiScaleCrop | 25x3 frames | 70.79 | 89.42 | x | x | x | +| [tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb.py) | short-side 256 | RandomResizedCrop | 25x10 frames | 69.80 | 89.06 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb_20200817-ae7963ca.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/20200815_172601.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/20200815_172601.log.json) | +| x | short-side 256 | RandomResizedCrop | 25x3 frames | 70.48 | 89.89 | x | x | x | + +### Kinetics-400 OmniSource Experiments + +| config | resolution | backbone | pretrain | w. OmniSource | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------: | :------------: | :------: | :---------: | :----------------: | :------: | :------: | :---------------------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 340x256 | ResNet50 | ImageNet | :x: | 70.6 | 89.3 | 4.3 (25x10 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json) | +| x | 340x256 | ResNet50 | ImageNet | :heavy_check_mark: | 73.6 | 91.0 | x | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth) | x | x | +| x | short-side 320 | ResNet50 | IG-1B \[1\] | :x: | 73.1 | 90.4 | x | 8344 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_without_omni_1x1x3_kinetics400_rgb_20200926-c133dd49.pth) | x | x | +| x | short-side 320 | ResNet50 | IG-1B \[1\] | :heavy_check_mark: | 75.7 | 91.9 | x | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-2863fed0.pth) | x | x | + +\[1\] We obtain the pre-trained model from [torch-hub](https://pytorch.org/hub/facebookresearch_semi-supervised-ImageNet1K-models_resnext/); the pre-trained model we used is `resnet50_swsl`. + +### Kinetics-600 + +| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :------: | :------: | :------: | :------: | :---------------------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_video_1x1x8_100e_kinetics600_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb.py) | short-side 256 | 8x2 | ResNet50 | ImageNet | 74.8 | 92.3 | 11.1 (25x3 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015-4db3c461.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015.json) |
+
+### Kinetics-700
+
+| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [tsn_r50_video_1x1x8_100e_kinetics700_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb.py) | short-side 256 | 8x2 | ResNet50 | ImageNet | 61.7 | 83.6 | 11.1 (25x3 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015-e381a6c7.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015.json) |
+
+### Something-Something V1
+
+| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | gpu_mem(M) | ckpt | log | json |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [tsn_r50_1x1x8_50e_sthv1_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 18.55 | 44.80 | [17.53](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [44.29](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 10978 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_r50_1x1x8_50e_sthv1_rgb_20200618-061b9195.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_sthv1.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_r50_f8_sthv1_18.1_45.0.log.json) |
+| [tsn_r50_1x1x16_50e_sthv1_rgb](/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb.py) | height 100 | 8 | ResNet50 | ImageNet | 15.77 | 39.85 | [13.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [35.58](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 5691 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/tsn_r50_1x1x16_50e_sthv1_rgb_20200614-7e2fe4f1.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/20200614_211932.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/20200614_211932.log.json) |
+
+### Something-Something V2
+
+| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | gpu_mem(M) | ckpt | log | json |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [tsn_r50_1x1x8_50e_sthv2_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb.py) | height 256 | 8 | ResNet50 | ImageNet | 28.59 | 59.56 | x | x | 10966 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/tsn_r50_1x1x8_50e_sthv2_rgb_20210816-1aafee8f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/20210816_221116.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/20210816_221116.log.json) |
+| [tsn_r50_1x1x16_50e_sthv2_rgb](/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb.py) | height 256 | 8 | ResNet50 | ImageNet | 20.89 | 49.16 | x | x | 8337 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/tsn_r50_1x1x16_50e_sthv2_rgb_20210816-5d23ac6e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/20210816_225256.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/20210816_225256.log.json) |
+
+### Moments in Time
+
+| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [tsn_r50_1x1x6_100e_mit_rgb](/configs/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb.py) | short-side 256 | 8x2 | ResNet50 | ImageNet | 26.84 | 51.6 | 8339 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_r50_1x1x6_100e_mit_rgb_20200618-d512ab1b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_mit.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_r50_f6_mit_26.8_51.6.log.json) |
+
+### Multi-Moments in Time
+
+| config | resolution | gpus | backbone | pretrain | mAP | gpu_mem(M) | ckpt | log | json |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [tsn_r101_1x1x5_50e_mmit_rgb](/configs/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb.py) | short-side 256 | 8x2 | ResNet101 | ImageNet | 61.09 | 10467 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_r101_1x1x5_50e_mmit_rgb_20200618-642f450d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_mmit.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_r101_f6_mmit_61.1.log.json) |
+
+### ActivityNet v1.3
+
+| config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | gpu_mem(M) | ckpt | log | json |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [tsn_r50_320p_1x1x8_50e_activitynet_video_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb.py) | short-side 320 | 8x1 | ResNet50 | Kinetics400 | 73.93 | 93.44 | 5692 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb_20210301-7f8da0c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/20210228_223327.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/20210228_223327.log.json) |
+| [tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb.py) | short-side 320 | 8x1 | ResNet50 | Kinetics400 | 76.90 | 94.47 | 5692 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb_20210301-c0f04a7e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/20210217_181313.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/20210217_181313.log.json) |
+| [tsn_r50_320p_1x1x8_150e_activitynet_video_flow](/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow.py) | 340x256 | 8x2 | ResNet50 | Kinetics400 | 57.51 | 83.02 | 5780 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804-13313f52.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804.json) |
+| [tsn_r50_320p_1x1x8_150e_activitynet_clip_flow](/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow.py) | 340x256 | 8x2 | ResNet50 | Kinetics400 | 59.51 | 82.69 | 5780 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804-8622cf38.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804.json) |
+
+### HVU
+
+| config\[1\] | tag category | resolution | gpus | backbone | pretrain | mAP | HATNet\[2\] | HATNet-multi\[2\] | ckpt | log | json |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [tsn_r18_1x1x8_100e_hvu_action_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_action_rgb.py) | action | short-side 256 | 8x2 | ResNet18 | ImageNet | 57.5 | 51.8 | 53.5 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027-011b282b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027.json) |
+| [tsn_r18_1x1x8_100e_hvu_scene_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_scene_rgb.py) | scene | short-side 256 | 8 | ResNet18 | ImageNet | 55.2 | 55.8 | 57.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027-00e5748d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027.json) |
+| [tsn_r18_1x1x8_100e_hvu_object_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_object_rgb.py) | object | short-side 256 | 8 | ResNet18 | ImageNet | 45.7 | 34.2 | 35.1 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201102-24a22f30.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201027.json) |
+| [tsn_r18_1x1x8_100e_hvu_event_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_event_rgb.py) | event | short-side 256 | 8 | ResNet18 | ImageNet | 63.7 | 38.5 | 39.8 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027-dea8cd71.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027.json) |
+| [tsn_r18_1x1x8_100e_hvu_concept_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_concept_rgb.py) | concept | short-side 256 | 8 | ResNet18 | ImageNet | 47.5 | 26.1 | 27.3 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027-fc1dd8e3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027.json) |
+| [tsn_r18_1x1x8_100e_hvu_attribute_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_attribute_rgb.py) | attribute | short-side 256 | 8 | ResNet18 | ImageNet | 46.1 | 33.6 | 34.9 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027-0b3b49d2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027.json) |
+| - | Overall | short-side 256 | - | ResNet18 | ImageNet | 52.6 | 40.0 | 41.3 | - | - | - |
+
+\[1\] For simplicity, we train a separate model for each tag category as the baselines for HVU.
+
+\[2\] The results for HATNet and HATNet-multi are taken from the paper [Large Scale Holistic Video Understanding](https://pages.iai.uni-bonn.de/gall_juergen/download/HVU_eccv20.pdf). HATNet is a two-branch convolutional network (one 2D branch, one 3D branch) that shares the same backbone (ResNet18) with our model. Its inputs are video clips of 16 or 32 frames (much longer than ours), while its input resolution is coarser (112 instead of 224). HATNet is trained on each individual task (each tag category), while HATNet-multi is trained on multiple tasks. Since no code or models have been released for HATNet, we only include the performance reported in the original paper.
+
+:::{note}
+
+1. The **gpus** column indicates the number of GPUs we used to train the checkpoint. Note that the configs we provide assume 8 GPUs by default.
+   According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU,
+   e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU.
+2. The **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), which uses the frame sampling strategy of the test setting and counts only the model inference time,
+   excluding IO time and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time.
+3. The values in the columns prefixed with "reference" are the results obtained by training with the original repo, using the same model settings.
+4. The validation set of Kinetics400 we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available.
+
+:::
+
+For more details on data preparation, you can refer to
+
+- [preparing_ucf101](/tools/data/ucf101/README.md)
+- [preparing_kinetics](/tools/data/kinetics/README.md)
+- [preparing_sthv1](/tools/data/sthv1/README.md)
+- [preparing_sthv2](/tools/data/sthv2/README.md)
+- [preparing_mit](/tools/data/mit/README.md)
+- [preparing_mmit](/tools/data/mmit/README.md)
+- [preparing_hvu](/tools/data/hvu/README.md)
+- [preparing_hmdb51](/tools/data/hmdb51/README.md)
+
+## Train
+
+You can use the following command to train a model.
+
+```shell
+python tools/train.py ${CONFIG_FILE} [optional arguments]
+```
+
+Example: train the TSN model on the Kinetics-400 dataset deterministically, with periodic validation.
+
+```shell
+python tools/train.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
+    --work-dir work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb \
+    --validate --seed 0 --deterministic
+```
+
+For more details, you can refer to the **Training setting** part in [getting_started](/docs/getting_started.md#training-setting).
+
+## Test
+
+You can use the following command to test a model.
+
+```shell
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
+```
+
+Example: test the TSN model on the Kinetics-400 dataset and dump the results to a JSON file.
+
+```shell
+python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
+    checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \
+    --out result.json
+```
+
+For more details, you can refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset).
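The `--eval top_k_accuracy mean_class_accuracy` options in the test example above name two metrics. As a rough illustration of what they measure, here is a minimal pure-Python sketch. The function names, the in-memory `scores`/`labels` layout, and the numbers are assumptions made for this example; this is not MMAction2's implementation, which may differ in edge cases (for instance in how classes without samples are handled).

```python
from collections import defaultdict


def sketch_top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score for this sample.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i], reverse=True)
        hits += int(label in ranked[:k])
    return hits / len(labels)


def sketch_mean_class_accuracy(scores, labels):
    """Average of per-class top-1 accuracy over the classes present in labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample_scores, label in zip(scores, labels):
        # Top-1 prediction is the index of the largest score.
        pred = max(range(len(sample_scores)), key=lambda i: sample_scores[i])
        total[label] += 1
        correct[label] += int(pred == label)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical per-class scores for 4 videos over 3 classes (made-up numbers).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
]
labels = [0, 1, 1, 1]

print("top1:", sketch_top_k_accuracy(scores, labels, k=1))
print("top2:", sketch_top_k_accuracy(scores, labels, k=2))
print("mean class acc:", sketch_mean_class_accuracy(scores, labels))
```

Mean class accuracy weights every class equally, so on an imbalanced label set like the one above it can differ noticeably from top-1 accuracy, which weights every video equally.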
+ +## Citation + +```BibTeX +@inproceedings{wang2016temporal, + title={Temporal segment networks: Towards good practices for deep action recognition}, + author={Wang, Limin and Xiong, Yuanjun and Wang, Zhe and Qiao, Yu and Lin, Dahua and Tang, Xiaoou and Van Gool, Luc}, + booktitle={European conference on computer vision}, + pages={20--36}, + year={2016}, + organization={Springer} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..6c57c3d451557ac73d9cf5415c1327af7602756a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/README_zh-CN.md @@ -0,0 +1,234 @@ +# TSN + +## 简介 + + + +```BibTeX +@inproceedings{wang2016temporal, + title={Temporal segment networks: Towards good practices for deep action recognition}, + author={Wang, Limin and Xiong, Yuanjun and Wang, Zhe and Qiao, Yu and Lin, Dahua and Tang, Xiaoou and Van Gool, Luc}, + booktitle={European conference on computer vision}, + pages={20--36}, + year={2016}, + organization={Springer} +} +``` + +## 模型库 + +### UCF-101 + +| 配置文件 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :---------: | :---------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x3_75e_ucf101_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py) \[1\] | 8 | ResNet50 | 
ImageNet | 83.03 | 96.78 | 8332 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023-d85ab600.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.json) | + +\[1\] 这里汇报的是 UCF-101 的 split1 部分的结果。 + +### Diving48 + +| 配置文件 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| :----------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :---------: | :---------: | :--------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_video_1x1x8_100e_diving48_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 71.27 | 95.74 | 5699 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/tsn_r50_video_1x1x8_100e_diving48_rgb_20210426-6dde0185.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log.json) | +| [tsn_r50_video_1x1x16_100e_diving48_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb.py) | 8 | ResNet50 | ImageNet | 76.75 | 96.95 | 5705 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/tsn_r50_video_1x1x16_100e_diving48_rgb_20210426-63c5f2f7.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log.json) | + +### HMDB51 + +| 配置文件 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------- | :------: | :------: | :---------: | :---------: | :---------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb.py) | 8 | ResNet50 | ImageNet | 48.95 | 80.19 | 21535 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb_20201123-ce6c27ed.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log.json) | +| [tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb.py) | 8 | ResNet50 | Kinetics400 | 56.08 | 84.31 | 21535 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb_20201123-7f84701b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log.json) | +| [tsn_r50_1x1x8_50e_hmdb51_mit_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb.py) | 8 | ResNet50 | Moments | 54.25 | 83.86 | 21535 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/tsn_r50_1x1x8_50e_hmdb51_mit_rgb_20201123-01526d41.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log.json) | + +### Kinetics-400 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 参考代码的 top1 准确率 | 参考代码的 top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :------: | :---------: | :---------: | :------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | :----------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 70.60 | 89.26 | x | x | 4.3 (25x10 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json) | +| [tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 短边 256 | 8 | ResNet50 | ImageNet | 70.42 | 89.03 | x | x | x | 8343 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_1x1x3_100e_kinetics400_rgb_20200725-22592236.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log.json) | +| [tsn_r50_dense_1x1x5_50e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb.py) | 340x256 | 8x3 | ResNet50 | ImageNet | 70.18 | 89.10 | [69.15](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [88.56](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 12.7 (8x10 frames) | 7028 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/tsn_r50_dense_1x1x5_100e_kinetics400_rgb_20200627-a063165f.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log.json) | +| [tsn_r50_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb.py) | 短边 320 | 8x2 | ResNet50 | ImageNet | 70.91 | 89.51 | x | x | 10.7 (25x3 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log.json) | +| [tsn_r50_320p_1x1x3_110e_kinetics400_flow](/configs/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow.py) | 短边 320 | 8x2 | ResNet50 | ImageNet | 55.70 | 79.85 | x | x | x | 8471 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_320p_1x1x3_110e_kinetics400_flow_20200705-3036bab6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_f3_kinetics400_flow_shortedge_55.7_79.9.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_f3_kinetics400_flow_shortedge_55.7_79.9.log.json) | +| tsn_r50_320p_1x1x3_kinetics400_twostream \[1: 1\]\* | x | x | ResNet50 | ImageNet | 72.76 | 90.52 | x | x | x | x | x | x | x | +| [tsn_r50_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py) | 短边 256 | 8 | ResNet50 | ImageNet | 71.80 | 90.17 | x | x | x | 8343 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/tsn_r50_256p_1x1x8_100e_kinetics400_rgb_20200817-883baf16.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/20200815_173413.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/20200815_173413.log.json) | +| [tsn_r50_320p_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py) | 短边 320 | 8x3 | ResNet50 | ImageNet | 72.41 | 90.55 | x | x | 11.1 (25x3 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_320p_1x1x8_100e_kinetics400_rgb_20200702-ef80e3d7.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_f8_kinetics400_shortedge_72.4_90.6.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_f8_kinetics400_shortedge_72.4_90.6.log.json) | +| [tsn_r50_320p_1x1x8_110e_kinetics400_flow](/configs/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow.py) | 短边 320 | 8x4 | ResNet50 | ImageNet | 57.76 | 80.99 | x | x | x | 8473 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_320p_1x1x8_110e_kinetics400_flow_20200705-1f39486b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_f8_kinetics400_flow_shortedge_57.8_81.0.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_f8_kinetics400_flow_shortedge_57.8_81.0.log.json) | +| tsn_r50_320p_1x1x8_kinetics400_twostream \[1: 1\]\* | x | x | ResNet50 | ImageNet | 74.64 | 91.77 | x | x | x | x | x | x | x | +| 
[tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py) | 短边 320 | 8 | ResNet50 | ImageNet | 71.11 | 90.04 | x | x | x | 8343 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014-5ae1ee79.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014.json) | +| [tsn_r50_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb.py) | 340x256 | 8 | ResNet50 | ImageNet | 70.77 | 89.3 | [68.75](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [88.42](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 12.2 (8x10 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_dense_1x1x8_100e_kinetics400_rgb_20200606-e925e6e3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/20200606_003901.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/20200606_003901.log.json) | +| [tsn_r50_video_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py) | 短边 256 | 8 | ResNet50 | ImageNet | 71.14 | 89.63 | x | x | x | 21558 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_1x1x8_100e_kinetics400_rgb_20200702-568cde33.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_100e_kinetics400_rgb.log.json) |
+| [tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py) | 短边 256 | 8 | ResNet50 | ImageNet | 70.40 | 89.12 | x | x | x | 21553 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb_20200703-0f19175f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log.json) |
+
+Here, MMAction2 uses \[1: 1\] to indicate that the scores of the RGB and optical-flow branches are fused with weights 1: 1 to obtain the two-stream prediction (softmax is not applied before fusion).
+
+### Using third-party backbones in TSN
+
+Users can train TSN with third-party backbones inside the MMAction2 framework, for example:
+
+- [x] Backbones from MMClassification
+- [x] Backbones from TorchVision
+- [x] Backbones from pytorch-image-models (timm)
+
+| Config | Resolution | GPUs | Backbone | Pretrain | Top-1 acc | Top-5 acc | ckpt | log | json |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNeXt101-32x4d \[[MMCls](https://github.com/open-mmlab/mmclassification/tree/master/configs/resnext)\] | ImageNet | 73.43 | 91.01 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb-16a8b561.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.json) |
+| [tsn_dense161_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | 8x2 | Densenet-161 \[[TorchVision](https://github.com/pytorch/vision/)\] | ImageNet | 72.78 | 90.75 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb-cbe85332.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.json) |
+| [tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.py) | short-side 320 | 8 | Swin Transformer Base \[[timm](https://github.com/rwightman/pytorch-image-models)\] | ImageNet | 77.51 | 92.92 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb-805380f6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.json) |
+
+1. For various reasons, some TIMM models are not supported. See [PR #880](https://github.com/open-mmlab/mmaction2/pull/880) for details.
+
+### Kinetics-400 Data Benchmark (8 GPUs, ResNet50, ImageNet pretrain; 3 segments)
+
+The data benchmark compares:
+
+1. Different data preprocessing methods: (1) video resolution 340x256, (2) video resolution short-side 320px, (3) video resolution short-side 256px;
+2. Different data augmentation methods: (1) MultiScaleCrop, (2) RandomResizedCrop;
+3. Different testing protocols: (1) 25 frames x 10 crops, (2) 25 frames x 3 crops.
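The three comparison axes listed above form a small grid, and enumerating it shows how the benchmark table is organized. The following is a minimal sketch, not part of MMAction2: the names `benchmark_grid` and `views_per_video` are hypothetical helpers introduced for illustration, and the sketch assumes that "25 frames x 10 crops" means 25 sampled frames with 10 crops taken from each frame.

```python
from itertools import product

# Values copied from the three comparison axes of the data benchmark.
RESOLUTIONS = ["340x256", "short-side 320", "short-side 256"]
AUGMENTATIONS = ["MultiScaleCrop", "RandomResizedCrop"]
# Test protocol name -> (frames sampled per video, crops per frame).
# Assumption: "25x10 frames" means 25 frames, each with 10 crops.
TEST_PROTOCOLS = {"25x10 frames": (25, 10), "25x3 frames": (25, 3)}


def benchmark_grid():
    """Return every (resolution, augmentation, test protocol) combination."""
    return list(product(RESOLUTIONS, AUGMENTATIONS, TEST_PROTOCOLS))


def views_per_video(protocol):
    """Number of frame crops scored per video under a test protocol."""
    frames, crops = TEST_PROTOCOLS[protocol]
    return frames * crops


if __name__ == "__main__":
    grid = benchmark_grid()
    print(len(grid), "settings")
    for protocol in TEST_PROTOCOLS:
        print(protocol, "->", views_per_video(protocol), "crops per video")
```

Under these assumptions the grid has 3 x 2 x 2 = 12 settings, one per row of the data benchmark table, and the two test protocols differ in how many crops are scored per video (250 versus 75).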
+ +| 配置文件 | 分辨率 | 训练时的数据增强 | 测试时的策略 | top1 准确率 | top5 准确率 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------: | :---------------: | :----------: | :---------: | :---------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_multiscalecrop_340x256_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_340x256_1x1x3_100e_kinetics400_rgb.py) | 340x256 | MultiScaleCrop | 25x10 frames | 70.60 | 89.26 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json) | +| x | 340x256 | MultiScaleCrop | 25x3 frames | 70.52 | 89.39 | x | x | x | +| [tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb.py) | 340x256 | RandomResizedCrop | 25x10 frames | 70.11 | 89.01 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb_20200725-88cb325a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb_20200725.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb_20200725.json) | +| x | 340x256 | RandomResizedCrop | 25x3 frames | 69.95 | 89.02 | x | x | x | +| [tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb.py) | 短边 320 | MultiScaleCrop | 25x10 frames | 70.32 | 89.25 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb_20200725-9922802f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb_20200725.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb_20200725.json) | +| x | 短边 320 | MultiScaleCrop | 25x3 frames | 70.54 | 89.39 | x | x | x | +| [tsn_r50_randomresizedcrop_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_320p_1x1x3_100e_kinetics400_rgb.py) | 短边 320 | RandomResizedCrop | 25x10 frames | 70.44 | 89.23 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log.json) | +| x | 短边 320 | RandomResizedCrop | 25x3 frames | 70.91 | 89.51 | x | x | x | +| [tsn_r50_multiscalecrop_256p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_256p_1x1x3_100e_kinetics400_rgb.py) | 短边 256 | MultiScaleCrop | 25x10 frames | 70.42 | 89.03 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_1x1x3_100e_kinetics400_rgb_20200725-22592236.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log.json) | +| x | 短边 256 | MultiScaleCrop | 25x3 frames | 70.79 | 89.42 | x | x | x | +| [tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb.py) | 短边 256 | RandomResizedCrop | 25x10 frames | 69.80 | 89.06 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb_20200817-ae7963ca.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/20200815_172601.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/20200815_172601.log.json) | +| x | 短边 256 | 
RandomResizedCrop | 25x3 frames | 70.48 | 89.89 | x | x | x | + +### Kinetics-400 OmniSource 实验 + +| 配置文件 | 分辨率 | 主干网络 | 预训练 | w. OmniSource | top1 准确率 | top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------: | :------: | :------: | :---------: | :----------------: | :---------: | :---------: | :----------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 340x256 | ResNet50 | ImageNet | :x: | 70.6 | 89.3 | 4.3 (25x10 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json) | +| x | 340x256 | ResNet50 | ImageNet | :heavy_check_mark: | 73.6 | 91.0 | x | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth) | x | x | +| x | 短边 320 | ResNet50 | IG-1B \[1\] | :x: | 73.1 | 90.4 | x | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_without_omni_1x1x3_kinetics400_rgb_20200926-c133dd49.pth) | x | x | +| x | 短边 320 | ResNet50 | IG-1B \[1\] | :heavy_check_mark: | 75.7 | 91.9 | x | 8344 
| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-2863fed0.pth) | x | x | + +\[1\] MMAction2 使用 [torch-hub](https://pytorch.org/hub/facebookresearch_semi-supervised-ImageNet1K-models_resnext/) 提供的 `resnet50_swsl` 预训练模型。 + +### Kinetics-600 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :------: | :---------: | :---------: | :----------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_video_1x1x8_100e_kinetics600_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb.py) | 短边 256 | 8x2 | ResNet50 | ImageNet | 74.8 | 92.3 | 11.1 (25x3 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015-4db3c461.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015.json) | + +### Kinetics-700 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| 
:--------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :------: | :---------: | :---------: | :----------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_video_1x1x8_100e_kinetics700_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb.py) | 短边 256 | 8x2 | ResNet50 | ImageNet | 61.7 | 83.6 | 11.1 (25x3 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015-e381a6c7.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015.json) | + +### Something-Something V1 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | 参考代码的 top1 准确率 | 参考代码的 top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :---------: | :---------: | :------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | :--------------: | 
:---------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x8_50e_sthv1_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py) | 高 100 | 8 | ResNet50 | ImageNet | 18.55 | 44.80 | [17.53](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [44.29](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 10978 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_r50_1x1x8_50e_sthv1_rgb_20200618-061b9195.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_sthv1.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_r50_f8_sthv1_18.1_45.0.log.json) | +| [tsn_r50_1x1x16_50e_sthv1_rgb](/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb.py) | 高 100 | 8 | ResNet50 | ImageNet | 15.77 | 39.85 | [13.33](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | [35.58](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training) | 5691 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/tsn_r50_1x1x16_50e_sthv1_rgb_20200614-7e2fe4f1.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/20200614_211932.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/20200614_211932.log.json) | + +### Something-Something V2 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | 
top1 准确率 | top5 准确率 | 参考代码的 top1 准确率 | 参考代码的 top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------- | :----: | :------: | :------: | :------: | :---------: | :---------: | :--------------------: | :--------------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x8_50e_sthv2_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb.py) | 高 256 | 8 | ResNet50 | ImageNet | 28.59 | 59.56 | x | x | 10966 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/tsn_r50_1x1x8_50e_sthv2_rgb_20210816-1aafee8f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/20210816_221116.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/20210816_221116.log.json) | +| [tsn_r50_1x1x16_50e_sthv2_rgb](/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb.py) | 高 256 | 8 | ResNet50 | ImageNet | 20.89 | 49.16 | x | x | 8337 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/tsn_r50_1x1x16_50e_sthv2_rgb_20210816-5d23ac6e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/20210816_225256.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/20210816_225256.log.json) | + +### Moments in Time + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| :----------------------------------------------------------------------------------- | :------: | 
:------: | :------: | :------: | :---------: | :---------: | :--------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_1x1x6_100e_mit_rgb](/configs/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb.py) | 短边 256 | 8x2 | ResNet50 | ImageNet | 26.84 | 51.6 | 8339 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_r50_1x1x6_100e_mit_rgb_20200618-d512ab1b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_mit.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_r50_f6_mit_26.8_51.6.log.json) | + +### Multi-Moments in Time + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | mAP | GPU 显存占用 (M) | ckpt | log | json | +| :------------------------------------------------------------------------------------- | :------: | :------: | :-------: | :------: | :---: | :--------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r101_1x1x5_50e_mmit_rgb](/configs/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb.py) | 短边 256 | 8x2 | ResNet101 | ImageNet | 61.09 | 10467 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_r101_1x1x5_50e_mmit_rgb_20200618-642f450d.pth) | 
[log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_mmit.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_r101_f6_mmit_61.1.log.json) | + +### ActivityNet v1.3 + +| 配置文件 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | top1 准确率 | top5 准确率 | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :---------: | :---------: | :---------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r50_320p_1x1x8_50e_activitynet_video_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb.py) | 短边 320 | 8x1 | ResNet50 | Kinetics400 | 73.93 | 93.44 | 5692 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb_20210301-7f8da0c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/20210228_223327.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/20210228_223327.log.json) | +| [tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb.py) | 短边 320 | 8x1 | ResNet50 | Kinetics400 | 76.90 | 94.47 | 5692 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb_20210301-c0f04a7e.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/20210217_181313.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/20210217_181313.log.json) | +| [tsn_r50_320p_1x1x8_150e_activitynet_video_flow](/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow.py) | 340x256 | 8x2 | ResNet50 | Kinetics400 | 57.51 | 83.02 | 5780 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804-13313f52.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804.json) | +| [tsn_r50_320p_1x1x8_150e_activitynet_clip_flow](/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow.py) | 340x256 | 8x2 | ResNet50 | Kinetics400 | 59.51 | 82.69 | 5780 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804-8622cf38.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804.json) | + +### HVU + +| 配置文件\[1\] | tag 类别 | 分辨率 | GPU 数量 | 主干网络 | 预训练 | mAP | HATNet\[2\] | HATNet-multi\[2\] | ckpt | log | json 
| +| :----------------------------------------------------------------------------------------------------------: | :-------: | :------: | :------: | :------: | :------: | :--: | :---------: | :---------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r18_1x1x8_100e_hvu_action_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_action_rgb.py) | action | 短边 256 | 8x2 | ResNet18 | ImageNet | 57.5 | 51.8 | 53.5 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027-011b282b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027.json) | +| [tsn_r18_1x1x8_100e_hvu_scene_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_scene_rgb.py) | scene | 短边 256 | 8 | ResNet18 | ImageNet | 55.2 | 55.8 | 57.2 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027-00e5748d.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027.json) | +| [tsn_r18_1x1x8_100e_hvu_object_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_object_rgb.py) | object | 短边 256 | 8 | ResNet18 | ImageNet | 45.7 | 34.2 | 35.1 | 
[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201102-24a22f30.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201027.json) | +| [tsn_r18_1x1x8_100e_hvu_event_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_event_rgb.py) | event | 短边 256 | 8 | ResNet18 | ImageNet | 63.7 | 38.5 | 39.8 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027-dea8cd71.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027.json) | +| [tsn_r18_1x1x8_100e_hvu_concept_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_concept_rgb.py) | concept | 短边 256 | 8 | ResNet18 | ImageNet | 47.5 | 26.1 | 27.3 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027-fc1dd8e3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027.json) | +| [tsn_r18_1x1x8_100e_hvu_attribute_rgb](/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_attribute_rgb.py) | attribute | 短边 256 | 8 | ResNet18 | ImageNet | 46.1 | 33.6 | 34.9 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027-0b3b49d2.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027.log) | 
[json](https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027.json) | +| - | all tags | short side 256 | - | ResNet18 | ImageNet | 52.6 | 40.0 | 41.3 | - | - | - | + +\[1\] For simplicity, MMAction2 trains a separate model for each tag category and uses these models as the HVU baseline. + +\[2\] The HATNet and HATNet-multi results are taken from the paper [Large Scale Holistic Video Understanding](https://pages.iai.uni-bonn.de/gall_juergen/download/HVU_eccv20.pdf). +HATNet is a two-branch convolutional network (one 2D branch, one 3D branch) and uses the same backbone as MMAction2 (ResNet18). The inputs to HATNet are 16-frame or 32-frame clips (longer than the clips MMAction2 uses), at a coarser input resolution (112px rather than 224px). +HATNet is trained on each task separately (one task per tag category), while HATNet-multi is trained on multiple tasks jointly. Since no open-source code or models for HATNet are currently available, only the accuracies reported in the original paper are listed here. + +Notes: + +1. The **number of GPUs** here is the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 assume training on 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when a different number of GPUs or a different number of videos per GPU is used, the learning rate must be scaled in proportion to the batch size. + For example, lr=0.01 corresponds to 4 GPUs x 2 video/gpu, and lr=0.08 corresponds to 16 GPUs x 4 video/gpu. +2. The **inference time** here is obtained with the [benchmark script](/tools/analysis/benchmark.py), using the test-time frame sampling strategy, and counts only the model inference time, + excluding IO time and preprocessing time. For each config, MMAction2 uses 1 GPU with a batch size (videos per GPU) of 1 to measure the inference time. +3. The reference-code results are obtained by training with the same model settings on the original codebase. +4. 
The Kinetics400 validation set we use contains 19796 videos, which can be downloaded from [validation videos](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format: video ID, number of frames, class index) and [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (class index to class name) are also provided. + +For details on dataset preparation, see: + +- [preparing ucf101](/tools/data/ucf101/README_zh-CN.md) +- [preparing kinetics](/tools/data/kinetics/README_zh-CN.md) +- [preparing sthv1](/tools/data/sthv1/README_zh-CN.md) +- [preparing sthv2](/tools/data/sthv2/README_zh-CN.md) +- [preparing mit](/tools/data/mit/README_zh-CN.md) +- [preparing mmit](/tools/data/mmit/README_zh-CN.md) +- [preparing hvu](/tools/data/hvu/README_zh-CN.md) +- [preparing hmdb51](/tools/data/hmdb51/README_zh-CN.md) + +## How to train + +Use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the TSN model on the Kinetics-400 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \ + --work-dir work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb \ + --validate --seed 0 --deterministic +``` + +For more training details, see the **training configuration** section of the [getting started guide](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## How to test + +Use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the TSN model on the Kinetics-400 dataset and export the results to a json file. + +```shell +python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more testing details, see the **testing a dataset** section of the [getting started guide](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..d4b5051083742800a2862b98541c2415696cb96b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,99 @@ +_base_ = [ + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict( + type='Recognizer2D', + backbone=dict(type='torchvision.densenet161', pretrained=True), + cls_head=dict( + type='TSNHead', + num_classes=400, + in_channels=2208, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips=None)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_320p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + 
dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/' +optimizer = dict( + type='SGD', + lr=0.00375, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..978cb5bc9dba1ebda24c4aeafbfb5a1e8d5fc13c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,108 @@ +_base_ = [ + '../../../_base_/schedules/sgd_100e.py', + 
'../../../_base_/default_runtime.py' +] + +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='mmcls.ResNeXt', + depth=101, + num_stages=4, + out_indices=(3, ), + groups=32, + width_per_group=4, + style='pytorch'), + cls_head=dict( + type='TSNHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips=None)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_320p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + 
frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb/' +load_from = ('https://download.openmmlab.com/mmclassification/v0/resnext/' + 'resnext101_32x4d_batch256_imagenet_20200708-87f2d1c9.pth') +optimizer = dict( + type='SGD', + lr=0.005, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..dfe70170eeecf079438be8c314ef121c1cdfa2c2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,103 @@ +_base_ = [ + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +model = dict( + type='Recognizer2D', + backbone=dict(type='timm.swin_base_patch4_window7_224', pretrained=True), + cls_head=dict( + type='TSNHead', + 
num_classes=400, + in_channels=1024, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips=None)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='DecordDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', 
keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=24, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/' # noqa +optimizer = dict( + type='SGD', + lr=0.0075, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_256p_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_256p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..bb0a5fe33377f22af6413d2506333d257e0055e3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_256p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,89 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_256p' +data_root_val = 'data/kinetics400/rawframes_val_256p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_256p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_256p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_256p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 
57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# 
runtime settings +checkpoint_config = dict(interval=5) +work_dir = ('./work_dirs/tsn_r50_multiscalecrop_256p_1x1x3' + '_100e_kinetics400_rgb/') +workflow = [('train', 5)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6b77944ee02ae1d1544c9bec1fc3e4e5b572d8b1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_320p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,89 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_320p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + 
clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = ('./work_dirs/tsn_r50_multiscalecrop_320p_1x1x3' + '_100e_kinetics400_rgb/') +workflow = [('train', 5)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_340x256_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_340x256_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..897fb05f9074236c50bb4146ba5c05bcdd68da22 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_multiscalecrop_340x256_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,88 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), 
+ dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = ('./work_dirs/tsn_r50_multiscalecrop_340x256_1x1x3' + '_100e_kinetics400_rgb/') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..3d9e8ca547716a07669d062191c00f72aea5b60f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,83 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_256p' +data_root_val = 'data/kinetics400/rawframes_val_256p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_256p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_256p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_256p.txt' +img_norm_cfg = dict( + 
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = 
dict(interval=5) +work_dir = ('./work_dirs/tsn_r50_randomresizedcrop_256p_1x1x3' + '_100e_kinetics400_rgb/') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_320p_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_320p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..c35a32e4e77bda03cd5159999e91800e76a6d738 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_320p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,83 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_320p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + 
dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = ('./work_dirs/tsn_r50_randomresizedcrop_320p_1x1x3' + '_100e_kinetics400_rgb/') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..968bfc6f38d560082d341954c75c8f02749a2b5d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_340x256_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,84 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + 
'../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + 
train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = ('./work_dirs/tsn_r50_randomresizedcrop_340x256_1x1x3' + '_100e_kinetics400_rgb') +workflow = [('train', 5)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_256p_1x1x25_10crop_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_256p_1x1x25_10crop_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..bb4da3990f81be00b38530f100b2dead10e2dd87 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_256p_1x1x25_10crop_100e_kinetics400_rgb.py @@ -0,0 +1,32 @@ +_base_ = ['../../../_base_/models/tsn_r50.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root_val = 'data/kinetics400/rawframes_val_256p' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_256p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + test=dict( 
+ type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_256p_1x1x25_3crop_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_256p_1x1x25_3crop_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..82f1d3eabe48525b08fac7857eb1142c3ac1621e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_256p_1x1x25_3crop_100e_kinetics400_rgb.py @@ -0,0 +1,32 @@ +_base_ = ['../../../_base_/models/tsn_r50.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root_val = 'data/kinetics400/rawframes_val_256p' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_256p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_320p_1x1x25_10crop_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_320p_1x1x25_10crop_100e_kinetics400_rgb.py new file mode 100644 index 
0000000000000000000000000000000000000000..74aeac51e039facb8ed9f6af48b8858dc76df006 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_320p_1x1x25_10crop_100e_kinetics400_rgb.py @@ -0,0 +1,32 @@ +_base_ = ['../../../_base_/models/tsn_r50.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_320p_1x1x25_3crop_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_320p_1x1x25_3crop_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..ba35eb5922af9a8465447262225b03664711d443 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_320p_1x1x25_3crop_100e_kinetics400_rgb.py @@ -0,0 +1,32 @@ +_base_ = ['../../../_base_/models/tsn_r50.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_test = 
'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_340x256_1x1x25_10crop_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_340x256_1x1x25_10crop_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..ad900cd342ce7d6d42afc4093996e047556496b9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_340x256_1x1x25_10crop_100e_kinetics400_rgb.py @@ -0,0 +1,32 @@ +_base_ = ['../../../_base_/models/tsn_r50.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_340x256_1x1x25_3crop_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_340x256_1x1x25_3crop_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..980259ecbd2466f50965bcc7c9b9dec9a0fbf4a4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/data_benchmark/tsn_r50_test_340x256_1x1x25_3crop_100e_kinetics400_rgb.py @@ -0,0 +1,32 @@ +_base_ = ['../../../_base_/models/tsn_r50.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_action_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_action_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..3e2a4bfa8e39861bbd53a054fc2dfd0d7982a81a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_action_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +category_nums = dict( + action=739, attribute=117, concept=291, event=69, object=1678, scene=248) +target_cate = 'action' + +model = dict( + backbone=dict(pretrained='torchvision://resnet18', depth=18), + cls_head=dict( + in_channels=512, + num_classes=category_nums[target_cate], + multi_class=True, + loss_cls=dict(type='BCELossWithLogits', loss_weight=333.))) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/hvu/videos_train' +data_root_val = 'data/hvu/videos_val' +ann_file_train = f'data/hvu/hvu_{target_cate}_train.json' +ann_file_val = f'data/hvu/hvu_{target_cate}_val.json' +ann_file_test = f'data/hvu/hvu_{target_cate}_val.json' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + 
dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline, + multi_class=True, + num_classes=category_nums[target_cate])) +evaluation = dict(interval=2, metrics=['mean_average_precision']) + +# runtime settings +work_dir = f'./work_dirs/tsn_r18_1x1x8_100e_hvu_{target_cate}_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_attribute_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_attribute_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..f4ebc6ff1930a75b28a6243919ccdd8cfe779e68 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_attribute_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + 
'../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +category_nums = dict( + action=739, attribute=117, concept=291, event=69, object=1678, scene=248) +target_cate = 'attribute' + +model = dict( + backbone=dict(pretrained='torchvision://resnet18', depth=18), + cls_head=dict( + in_channels=512, + num_classes=category_nums[target_cate], + multi_class=True, + loss_cls=dict(type='BCELossWithLogits', loss_weight=333.))) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/hvu/videos_train' +data_root_val = 'data/hvu/videos_val' +ann_file_train = f'data/hvu/hvu_{target_cate}_train.json' +ann_file_val = f'data/hvu/hvu_{target_cate}_val.json' +ann_file_test = f'data/hvu/hvu_{target_cate}_val.json' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + 
dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline, + multi_class=True, + num_classes=category_nums[target_cate])) +evaluation = dict(interval=2, metrics=['mean_average_precision']) + +# runtime settings +work_dir = f'./work_dirs/tsn_r18_1x1x8_100e_hvu_{target_cate}_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_concept_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_concept_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..b8e6812f16db4ab4c429dca0456b13f696b71429 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_concept_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +category_nums = dict( + action=739, attribute=117, concept=291, event=69, object=1678, scene=248) +target_cate = 'concept' + +model = dict( + backbone=dict(pretrained='torchvision://resnet18', depth=18), + cls_head=dict( + in_channels=512, + num_classes=category_nums[target_cate], + 
multi_class=True, + loss_cls=dict(type='BCELossWithLogits', loss_weight=333.))) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/hvu/videos_train' +data_root_val = 'data/hvu/videos_val' +ann_file_train = f'data/hvu/hvu_{target_cate}_train.json' +ann_file_val = f'data/hvu/hvu_{target_cate}_val.json' +ann_file_test = f'data/hvu/hvu_{target_cate}_val.json' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), 
+ train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline, + multi_class=True, + num_classes=category_nums[target_cate])) +evaluation = dict(interval=2, metrics=['mean_average_precision']) + +# runtime settings +work_dir = f'./work_dirs/tsn_r18_1x1x8_100e_hvu_{target_cate}_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_event_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_event_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4073e5994cb88a6f83131e41a964ef245cae07d9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_event_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +category_nums = dict( + action=739, attribute=117, concept=291, event=69, object=1678, scene=248) +target_cate = 'event' + +model = dict( + backbone=dict(pretrained='torchvision://resnet18', depth=18), + cls_head=dict( + in_channels=512, + num_classes=category_nums[target_cate], + multi_class=True, + loss_cls=dict(type='BCELossWithLogits', loss_weight=333.))) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/hvu/videos_train' +data_root_val = 'data/hvu/videos_val' +ann_file_train = f'data/hvu/hvu_{target_cate}_train.json' +ann_file_val = f'data/hvu/hvu_{target_cate}_val.json' +ann_file_test = f'data/hvu/hvu_{target_cate}_val.json' +img_norm_cfg = dict( + mean=[123.675, 
116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + 
data_prefix=data_root_val, + pipeline=test_pipeline, + multi_class=True, + num_classes=category_nums[target_cate])) +evaluation = dict(interval=2, metrics=['mean_average_precision']) + +# runtime settings +work_dir = f'./work_dirs/tsn_r18_1x1x8_100e_hvu_{target_cate}_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_object_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_object_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..350d256ab0dbf63bbdcd88b39909210c201eec0a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_object_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +category_nums = dict( + action=739, attribute=117, concept=291, event=69, object=1678, scene=248) +target_cate = 'object' + +model = dict( + backbone=dict(pretrained='torchvision://resnet18', depth=18), + cls_head=dict( + in_channels=512, + num_classes=category_nums[target_cate], + multi_class=True, + loss_cls=dict(type='BCELossWithLogits', loss_weight=333.))) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/hvu/videos_train' +data_root_val = 'data/hvu/videos_val' +ann_file_train = f'data/hvu/hvu_{target_cate}_train.json' +ann_file_val = f'data/hvu/hvu_{target_cate}_val.json' +ann_file_test = f'data/hvu/hvu_{target_cate}_val.json' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', 
**img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline, + multi_class=True, + num_classes=category_nums[target_cate])) +evaluation = dict(interval=2, metrics=['mean_average_precision']) + +# runtime settings +work_dir = f'./work_dirs/tsn_r18_1x1x8_100e_hvu_{target_cate}_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_scene_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_scene_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..ff60a6593477d73a23d88668f7a88ce421ac3e4f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_scene_rgb.py @@ -0,0 +1,102 @@ +_base_ = [ + '../../../_base_/models/tsn_r50.py', + '../../../_base_/schedules/sgd_100e.py', + '../../../_base_/default_runtime.py' +] + +# model settings +category_nums = dict( + action=739, attribute=117, concept=291, event=69, object=1678, scene=248) +target_cate = 'scene' + +model = dict( + backbone=dict(pretrained='torchvision://resnet18', depth=18), + cls_head=dict( + in_channels=512, + num_classes=category_nums[target_cate], + multi_class=True, + loss_cls=dict(type='BCELossWithLogits', loss_weight=333.))) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/hvu/videos_train' +data_root_val = 'data/hvu/videos_val' +ann_file_train = f'data/hvu/hvu_{target_cate}_train.json' +ann_file_val = f'data/hvu/hvu_{target_cate}_val.json' +ann_file_test = f'data/hvu/hvu_{target_cate}_val.json' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + 
dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + multi_class=True, + num_classes=category_nums[target_cate]), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline, + multi_class=True, + num_classes=category_nums[target_cate])) +evaluation = dict(interval=2, metrics=['mean_average_precision']) + +# runtime settings +work_dir = f'./work_dirs/tsn_r18_1x1x8_100e_hvu_{target_cate}_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..fa941ed5b2eede61f47a3323ae24df60d6325746 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/metafile.yml @@ -0,0 +1,960 @@ +Collections: +- Name: TSN + README: configs/recognition/tsn/README.md + Paper: + URL: https://arxiv.org/abs/1608.00859 + Title: "Temporal Segment Networks: 
Towards Good Practices for Deep Action Recognition" +Models: +- Config: configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 75 + FLOPs: 134526773248 + Parameters: 23714981 + Pretrained: ImageNet + Training Data: UCF101 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x3_75e_ucf101_rgb + Results: + - Dataset: UCF101 + Metrics: + Top 1 Accuracy: 83.03 + Top 5 Accuracy: 96.78 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023-d85ab600.pth +- Config: configs/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 100 + FLOPs: 32959107072 + Parameters: 23606384 + Pretrained: ImageNet + Training Data: Diving48 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_video_1x1x8_100e_diving48_rgb + Results: + - Dataset: Diving48 + Metrics: + Top 1 Accuracy: 71.27 + Top 5 Accuracy: 95.74 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/tsn_r50_video_1x1x8_100e_diving48_rgb_20210426-6dde0185.pth +- Config: configs/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 4 
+ Epochs: 100 + FLOPs: 32959107072 + Parameters: 23606384 + Pretrained: ImageNet + Training Data: Diving48 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_video_1x1x16_100e_diving48_rgb + Results: + - Dataset: Diving48 + Metrics: + Top 1 Accuracy: 76.75 + Top 5 Accuracy: 96.95 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/tsn_r50_video_1x1x16_100e_diving48_rgb_20210426-63c5f2f7.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 50 + FLOPs: 43048605696 + Parameters: 23612531 + Pretrained: ImageNet + Training Data: HMDB51 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb + Results: + - Dataset: HMDB51 + Metrics: + Top 1 Accuracy: 48.95 + Top 5 Accuracy: 80.19 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb_20201123-ce6c27ed.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 50 + FLOPs: 43048605696 + Parameters: 23612531 + Pretrained: Kinetics400 + Training Data: HMDB51 + Training Resources: 8 GPUs + Modality: RGB + Name: 
tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb + Results: + - Dataset: HMDB51 + Metrics: + Top 1 Accuracy: 56.08 + Top 5 Accuracy: 84.31 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb_20201123-7f84701b.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Epochs: 50 + FLOPs: 43048605696 + Parameters: 23612531 + Pretrained: Moments + Training Data: HMDB51 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x8_50e_hmdb51_mit_rgb + Results: + - Dataset: HMDB51 + Metrics: + Top 1 Accuracy: 54.25 + Top 5 Accuracy: 83.86 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/tsn_r50_1x1x8_50e_hmdb51_mit_rgb_20201123-01526d41.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 102997721600 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x3_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.6 + Top 5 Accuracy: 89.26 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 102997721600 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x3_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.42 + Top 5 Accuracy: 89.03 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_1x1x3_100e_kinetics400_rgb_20200725-22592236.pth +- Config: configs/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 16 + Epochs: 100 + FLOPs: 32959827968 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Training Resources: 24 GPUs + Modality: RGB + Name: tsn_r50_dense_1x1x5_50e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.18 + Top 5 Accuracy: 89.1 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log.json + 
Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/tsn_r50_dense_1x1x5_100e_kinetics400_rgb_20200627-a063165f.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 134527385600 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: tsn_r50_320p_1x1x3_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.91 + Top 5 Accuracy: 89.51 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_f3_kinetics400_shortedge_70.9_89.5.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 110 + FLOPs: 109881868800 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: Flow + Name: tsn_r50_320p_1x1x3_110e_kinetics400_flow + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 55.7 + Top 5 Accuracy: 79.85 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_f3_kinetics400_flow_shortedge_55.7_79.9.log.json + 
Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_f3_kinetics400_flow_shortedge_55.7_79.9.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow/tsn_r50_320p_1x1x3_110e_kinetics400_flow_20200705-3036bab6.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134527385600 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 71.8 + Top 5 Accuracy: 90.17 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/20200815_173413.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/20200815_173413.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/tsn_r50_256p_1x1x8_100e_kinetics400_rgb_20200817-883baf16.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134527385600 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 24 GPUs + Modality: RGB + Name: tsn_r50_320p_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.41 + Top 5 Accuracy: 90.55 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_f8_kinetics400_shortedge_72.4_90.6.log.json + Training Log: 
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_f8_kinetics400_shortedge_72.4_90.6.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/tsn_r50_320p_1x1x8_100e_kinetics400_rgb_20200702-ef80e3d7.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 110 + FLOPs: 109881868800 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 32 GPUs + Modality: Flow + Name: tsn_r50_320p_1x1x8_110e_kinetics400_flow + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 57.76 + Top 5 Accuracy: 80.99 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_f8_kinetics400_flow_shortedge_57.8_81.0.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_f8_kinetics400_flow_shortedge_57.8_81.0.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow/tsn_r50_320p_1x1x8_110e_kinetics400_flow_20200705-1f39486b.pth +- Config: configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 102997721600 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 71.11 + Top 5 Accuracy: 90.04 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014-5ae1ee79.pth +- Config: configs/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 32959827968 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_dense_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.77 + Top 5 Accuracy: 89.3 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/20200606_003901.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/20200606_003901.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_dense_1x1x8_100e_kinetics400_rgb_20200606-e925e6e3.pth +- Config: configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 134527385600 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_video_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 71.14 + Top 5 Accuracy: 89.63 + Task: Action Recognition + Training Json 
Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_100e_kinetics400_rgb.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_100e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_1x1x8_100e_kinetics400_rgb_20200702-568cde33.pth +- Config: configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 32959827968 + Parameters: 24327632 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-400 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 70.4 + Top 5 Accuracy: 89.12 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb_20200703-0f19175f.pth +- Config: configs/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNeXt101-32x4d [[MMCls](https://github.com/open-mmlab/mmclassification/tree/master/configs/resnext)] + Batch Size: 16 + Epochs: 100 + FLOPs: 262238208000 + Parameters: 42948304 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + 
Modality: RGB + Name: tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.43 + Top 5 Accuracy: 91.01 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_rn101_32x4d_320p_1x1x3_100e_kinetics400_rgb-16a8b561.pth +- Config: configs/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: DenseNet161 [[TorchVision](https://github.com/pytorch/vision/)] + Batch Size: 12 + Epochs: 100 + FLOPs: 255225561600 + Parameters: 27355600 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics-400 + Training Resources: 16 GPUs + Modality: RGB + Name: tsn_dense161_320p_1x1x3_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 72.78 + Top 5 Accuracy: 90.75 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb/tsn_dense161_320p_1x1x3_100e_kinetics400_rgb-cbe85332.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 102997721600 + Parameters: 24327632 +
Pretrained: ImageNet + Resolution: 340x256 + Training Data: Kinetics-400 + Modality: RGB + Name: tsn_omnisource_r50_1x1x3_100e_kinetics_rgb + Converted From: + Weights: https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/tsn_OmniSource_kinetics400_se_rgb_r50_seg3_f1s1_imagenet-4066cb7e.pth + Code: https://github.com/open-mmlab/mmaction + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.6 + Top 5 Accuracy: 91.0 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 102997721600 + Parameters: 24327632 + Pretrained: IG-1B + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: tsn_IG1B_pretrained_r50_1x1x3_100e_kinetics_rgb + Converted From: + Weights: https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/tsn_OmniSource_kinetics400_se_rgb_r50_seg3_f1s1_IG1B-25fc136b.pth + Code: https://github.com/open-mmlab/mmaction/ + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.1 + Top 5 Accuracy: 90.4 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_without_omni_1x1x3_kinetics400_rgb_20200926-c133dd49.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 32 + Epochs: 100 + FLOPs: 102997721600 + Parameters: 24327632 + Pretrained: IG-1B + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: tsn_IG1B_pretrained_omnisource_r50_1x1x3_100e_kinetics_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 75.7 + Top 5 Accuracy: 91.9 + Task: Action 
Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-2863fed0.pth +- Config: configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134527795200 + Parameters: 24737432 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-600 + Training Resources: 16 GPUs + Modality: RGB + Name: tsn_r50_video_1x1x8_100e_kinetics600_rgb + Results: + - Dataset: Kinetics-600 + Metrics: + Top 1 Accuracy: 74.8 + Top 5 Accuracy: 92.3 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015-4db3c461.pth +- Config: configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 12 + Epochs: 100 + FLOPs: 134528000000 + Parameters: 24942332 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: Kinetics-700 + Training Resources: 16 GPUs + Modality: RGB + Name: tsn_r50_video_1x1x8_100e_kinetics700_rgb + Results: + - Dataset: Kinetics-700 + Metrics: + Top 1 Accuracy: 61.7 + Top 5 Accuracy: 83.6 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015.json + Training Log: 
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015-e381a6c7.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 16 + Epochs: 50 + FLOPs: 32781541376 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x8_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 18.55 + Top 5 Accuracy: 44.8 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_r50_f8_sthv1_18.1_45.0.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_sthv1.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb/tsn_r50_1x1x8_50e_sthv1_rgb_20200618-061b9195.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 4 + Epochs: 50 + FLOPs: 32781541376 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 100 + Training Data: SthV1 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x16_50e_sthv1_rgb + Results: + - Dataset: SthV1 + Metrics: + Top 1 Accuracy: 15.77 + Top 5 Accuracy: 39.85 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/20200614_211932.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/20200614_211932.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb/tsn_r50_1x1x16_50e_sthv1_rgb_20200614-7e2fe4f1.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 16 + Epochs: 50 + FLOPs: 32959365120 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 256 + Training Data: SthV2 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x8_50e_sthv2_rgb + Results: + - Dataset: SthV2 + Metrics: + Top 1 Accuracy: 28.59 + Top 5 Accuracy: 59.56 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/20210816_221116.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/20210816_221116.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb/tsn_r50_1x1x8_50e_sthv2_rgb_20210816-1aafee8f.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 4 + Epochs: 50 + FLOPs: 65918373888 + Parameters: 23864558 + Pretrained: ImageNet + Resolution: height 256 + Training Data: SthV2 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_1x1x16_50e_sthv2_rgb + Results: + - Dataset: SthV2 + Metrics: + Top 1 Accuracy: 20.89 + Top 5 Accuracy: 49.16 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/20210816_225256.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/20210816_225256.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb/tsn_r50_1x1x16_50e_sthv2_rgb_20210816-5d23ac6e.pth +- Config: configs/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 
16 + Epochs: 100 + FLOPs: 32287070208 + Parameters: 24202643 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: MiT + Training Resources: 16 GPUs + Modality: RGB + Name: tsn_r50_1x1x6_100e_mit_rgb + Results: + - Dataset: MiT + Metrics: + Top 1 Accuracy: 26.84 + Top 5 Accuracy: 51.6 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_r50_f6_mit_26.8_51.6.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_mit.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_r50_1x1x6_100e_mit_rgb_20200618-d512ab1b.pth +- Config: configs/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet101 + Batch Size: 16 + Epochs: 50 + FLOPs: 51249301504 + Parameters: 43141497 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: MMiT + Training Resources: 16 GPUs + Modality: RGB + Name: tsn_r101_1x1x5_50e_mmit_rgb + Results: + - Dataset: MMiT + Metrics: + mAP: 61.09 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_r101_f6_mmit_61.1.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_mmit.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb/tsn_r101_1x1x5_50e_mmit_rgb_20200618-642f450d.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 134526976000 + Parameters: 23917832 + Pretrained: Kinetics400 + Resolution: short-side 320 + Training Data: ActivityNet v1.3 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_320p_1x1x8_50e_activitynet_video_rgb + Results: + - Dataset: ActivityNet v1.3 + 
Metrics: + Top 1 Accuracy: 73.93 + Top 5 Accuracy: 93.44 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/20210228_223327.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/20210228_223327.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb_20210301-7f8da0c6.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 50 + FLOPs: 134526976000 + Parameters: 23917832 + Pretrained: Kinetics400 + Resolution: short-side 320 + Training Data: ActivityNet v1.3 + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb + Results: + - Dataset: ActivityNet v1.3 + Metrics: + Top 1 Accuracy: 76.9 + Top 5 Accuracy: 94.47 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/20210217_181313.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/20210217_181313.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb_20210301-c0f04a7e.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 109881459200 + Parameters: 23939784 + Pretrained: Kinetics400 + Resolution: 340x256 + Training Data: ActivityNet v1.3 + Training Resources: 16 GPUs + Modality: Flow + Name: tsn_r50_320p_1x1x8_150e_activitynet_video_flow + Results: + - Dataset: ActivityNet v1.3 + Metrics: + Top 1 
Accuracy: 57.51 + Top 5 Accuracy: 83.02 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/tsn_r50_320p_1x1x8_150e_activitynet_video_flow_20200804-13313f52.pth +- Config: configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow.py + In Collection: TSN + Metadata: + Architecture: ResNet50 + Batch Size: 8 + Epochs: 150 + FLOPs: 109881459200 + Parameters: 23939784 + Pretrained: Kinetics400 + Resolution: 340x256 + Training Data: ActivityNet v1.3 + Training Resources: 16 GPUs + Modality: Flow + Name: tsn_r50_320p_1x1x8_150e_activitynet_clip_flow + Results: + - Dataset: ActivityNet v1.3 + Metrics: + Top 1 Accuracy: 59.51 + Top 5 Accuracy: 82.69 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow_20200804-8622cf38.pth +- Config: configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_action_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet18 + Batch Size: 32 + Epochs: 100 + FLOPs: 59483309568 + Parameters: 11555619 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: HVU + Training Resources: 16 GPUs + 
Modality: RGB + Name: tsn_r18_1x1x8_100e_hvu_action_rgb + Results: + - Dataset: HVU + Metrics: + mAP: 57.5 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/action/tsn_r18_1x1x8_100e_hvu_action_rgb_20201027-011b282b.pth + tag category: action +- Config: configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_scene_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet18 + Batch Size: 32 + Epochs: 100 + FLOPs: 59483058176 + Parameters: 11303736 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: HVU + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r18_1x1x8_100e_hvu_scene_rgb + Results: + - Dataset: HVU + Metrics: + mAP: 55.2 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/scene/tsn_r18_1x1x8_100e_hvu_scene_rgb_20201027-00e5748d.pth + tag category: scene +- Config: configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_object_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet18 + Batch Size: 32 + Epochs: 100 + FLOPs: 59483790336 + Parameters: 12037326 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: HVU + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r18_1x1x8_100e_hvu_object_rgb + Results: + - Dataset: HVU + Metrics: + mAP: 45.7 + Task: Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201027.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201027.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/object/tsn_r18_1x1x8_100e_hvu_object_rgb_20201102-24a22f30.pth + tag category: object +- Config: configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_event_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet18 + Batch Size: 32 + Epochs: 100 + FLOPs: 59482966528 + Parameters: 11211909 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: HVU + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r18_1x1x8_100e_hvu_event_rgb + Results: + - Dataset: HVU + Metrics: + mAP: 63.7 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/event/tsn_r18_1x1x8_100e_hvu_event_rgb_20201027-dea8cd71.pth + tag category: event +- Config: configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_concept_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet18 + Batch Size: 32 + Epochs: 100 + FLOPs: 59483790336 + Parameters: 12037326 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: HVU + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r18_1x1x8_100e_hvu_concept_rgb + Results: + - Dataset: HVU + Metrics: + mAP: 47.5 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027.log + Weights: 
https://download.openmmlab.com/mmaction/recognition/tsn/hvu/concept/tsn_r18_1x1x8_100e_hvu_concept_rgb_20201027-fc1dd8e3.pth + tag category: concept +- Config: configs/recognition/tsn/hvu/tsn_r18_1x1x8_100e_hvu_attribute_rgb.py + In Collection: TSN + Metadata: + Architecture: ResNet18 + Batch Size: 32 + Epochs: 100 + FLOPs: 59482991104 + Parameters: 11236533 + Pretrained: ImageNet + Resolution: short-side 256 + Training Data: HVU + Training Resources: 8 GPUs + Modality: RGB + Name: tsn_r18_1x1x8_100e_hvu_attribute_rgb + Results: + - Dataset: HVU + Metrics: + mAP: 46.1 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027.json + Training Log: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/hvu/attribute/tsn_r18_1x1x8_100e_hvu_attribute_rgb_20201027-0b3b49d2.pth + tag category: attribute +- Config: configs/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.py + In Collection: TSN + Metadata: + Architecture: Swin Transformer + Batch Size: 24 + Epochs: 100 + Parameters: 87153224 + Pretrained: ImageNet + Resolution: short-side 320 + Training Data: Kinetics400 + Training Resources: 8 GPUs + Name: tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 77.51 + Top 5 Accuracy: 92.92 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.json + Training Log: 
https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb.log + Weights: https://download.openmmlab.com/mmaction/recognition/tsn/custom_backbones/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb/tsn_swin_transformer_video_320p_1x1x3_100e_kinetics400_rgb-805380f6.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_fp16_r50_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_fp16_r50_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..5f73da4ae078f2e8e498d55321939306c76f026b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_fp16_r50_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,89 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], 
meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# fp16 settings +fp16 = dict() + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_fp16_r50_1x1x3_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..11da54749244c72548e10bfd2a8db68d67e534bc --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r101_1x1x5_50e_mmit_rgb.py @@ -0,0 +1,116 @@ +_base_ = [ + '../../_base_/schedules/sgd_tsm_50e.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet101', + depth=101, + norm_eval=False), + cls_head=dict( + type='TSNHead', + num_classes=313, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + loss_cls=dict(type='BCELossWithLogits', loss_weight=160.0), + dropout_ratio=0.5, + init_std=0.01, + multi_class=True, + label_smooth_eps=0), + train_cfg=None, + test_cfg=dict(average_clips=None)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/mmit/videos' +data_root_val = '/data/mmit/videos' +ann_file_train = 'data/mmit/mmit_train_list_videos.txt' +ann_file_val = 'data/mmit/mmit_val_list_videos.txt' +ann_file_test = 'data/mmit/mmit_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=5), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=5, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', 
**img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=5, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline, + multi_class=True, + num_classes=313), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + multi_class=True, + num_classes=313), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline, + multi_class=True, + num_classes=313)) +evaluation = dict(interval=5, metrics=['mmit_mean_average_precision']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r101_1x1x5_50e_mmit_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..9b5de9f691b1081c79674b0ae1a25e7656691949 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv1_rgb.py @@ -0,0 +1,94 @@ +_base_ = ['./tsn_r50_1x1x8_50e_sthv1_rgb.py'] +# model settings +model = dict(cls_head=dict(init_std=0.001)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' 
+data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + 
pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0005) # this lr is used for 8 gpus + +# runtime settings +checkpoint_config = dict(interval=1) +work_dir = './work_dirs/tsn_r50_1x1x16_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..1d8b3e014306443ba219bfde207c802c9fecbd69 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x16_50e_sthv2_rgb.py @@ -0,0 +1,92 @@ +_base_ = ['./tsn_r50_1x1x8_50e_sthv2_rgb.py'] + +# model settings +model = dict(cls_head=dict(init_std=0.001)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv2/rawframes' +data_root_val = 'data/sthv2/rawframes' +ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt' +ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt' +ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + 
dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.005, momentum=0.9, + weight_decay=0.0005) # this lr is used for 8 gpus +# optimizer config +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) + +# runtime settings +checkpoint_config = dict(interval=1) +work_dir = './work_dirs/tsn_r50_1x1x16_50e_sthv2_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..1eca1ae6aa25638b1f059e6c9fb0708839ea1ad7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,86 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + 
clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..e902eba9558eea1da36c687664c93e54e7aad3b6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py @@ -0,0 +1,91 @@ +_base_ = ['../../_base_/models/tsn_r50.py', '../../_base_/default_runtime.py'] + +# model settings +model = dict(cls_head=dict(num_classes=101, init_std=0.001)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ucf101/rawframes/' +data_root_val = 'data/ucf101/rawframes/' +split = 1 # official train/test splits. 
valid numbers: 1, 2, 3 +ann_file_train = f'data/ucf101/ucf101_train_split_{split}_rawframes.txt' +ann_file_val = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +ann_file_test = f'data/ucf101/ucf101_val_split_{split}_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + 
pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00128, momentum=0.9, + weight_decay=0.0005) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[]) +total_epochs = 75 + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = f'./work_dirs/tsn_r50_1x1x3_75e_ucf101_split_{split}_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..d11f28323713664a957a483bb42324c0e7c3e7a3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb.py @@ -0,0 +1,95 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=339)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/mit/videos/training' +data_root_val = '/data/mit/videos/validation/' +ann_file_train = 'data/mit/mit_train_list_videos.txt' +ann_file_val = 'data/mit/mit_val_list_videos.txt' +ann_file_test = 'data/mit/mit_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=6), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), 
+ dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=6, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=6, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# optimizer +optimizer = dict( + type='SGD', lr=0.005, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x6_100e_mit_rgb' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb.py new file mode 100644 index 
0000000000000000000000000000000000000000..b881817293f46e619c5b4dc2a74a73dea80eceac --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb.py @@ -0,0 +1,90 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=51)) + +# dataset settings +split = 1 +dataset_type = 'RawframeDataset' +data_root = 'data/hmdb51/rawframes' +data_root_val = 'data/hmdb51/rawframes' +ann_file_train = f'data/hmdb51/hmdb51_train_split_{split}_rawframes.txt' +ann_file_val = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +ann_file_test = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + 
dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5)) + +# optimizer +optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=0.0001) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/' +gpu_ids = range(0, 1) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6b3230ec2d72768eccfc3bd12144b0b13ced6321 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb.py @@ -0,0 +1,91 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=51)) + +# dataset settings +split = 1 +dataset_type = 'RawframeDataset' +data_root = 'data/hmdb51/rawframes' +data_root_val = 'data/hmdb51/rawframes' +ann_file_train = f'data/hmdb51/hmdb51_train_split_{split}_rawframes.txt' +ann_file_val = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +ann_file_test = 
f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, 
metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5)) + +# optimizer +optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=0.0001) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/' +load_from = 'https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x8_100e_kinetics400_rgb/tsn_r50_256p_1x1x8_100e_kinetics400_rgb_20200817-883baf16.pth' # noqa: E501 +gpu_ids = range(0, 1) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..83081300ed070af5ccf4c7684d796e728c8698a6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb.py @@ -0,0 +1,90 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=51)) + +# dataset settings +split = 1 +dataset_type = 'RawframeDataset' +data_root = 'data/hmdb51/rawframes' +data_root_val = 'data/hmdb51/rawframes' +ann_file_train = f'data/hmdb51/hmdb51_train_split_{split}_rawframes.txt' +ann_file_val = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' +ann_file_test = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 
'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5)) + +# optimizer +optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=0.0001) + +# runtime settings +checkpoint_config = dict(interval=5) +log_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/' +load_from = 'https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x6_100e_mit_rgb/tsn_r50_1x1x6_100e_mit_rgb_20200618-d512ab1b.pth' # noqa: E501 +gpu_ids = range(0, 1) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..0147490a4276859faf0ecb32656ffdfd03a386cf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py @@ -0,0 +1,101 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + backbone=dict( + norm_cfg=dict(type='SyncBN', requires_grad=True), norm_eval=True), + cls_head=dict(num_classes=174, dropout_ratio=0.5)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv1/rawframes' +data_root_val = 'data/sthv1/rawframes' +ann_file_train = 'data/sthv1/sthv1_train_list_rawframes.txt' +ann_file_val = 'data/sthv1/sthv1_val_list_rawframes.txt' +ann_file_test = 'data/sthv1/sthv1_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + 
dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.02, momentum=0.9, + weight_decay=0.0005) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x8_50e_sthv1_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6b33b98a1e0f922f863e60bd00cf3d5c0e0500aa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv2_rgb.py @@ -0,0 +1,93 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=174, 
dropout_ratio=0.5)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/sthv2/rawframes' +data_root_val = 'data/sthv2/rawframes' +ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt' +ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt' +ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + 
data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.02, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x8_50e_sthv2_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..64554a7934df240f5f3c295ff68780d4e5b9bef9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,75 @@ +_base_ = ['./tsn_r50_1x1x3_100e_kinetics400_rgb.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_320p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + 
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +# runtime settings +work_dir = './work_dirs/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow.py new file mode 100644 index 0000000000000000000000000000000000000000..761d214aaddec8a4889bdb82346c47bf827f2924 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow.py @@ -0,0 +1,96 @@ +_base_ = ['../../_base_/models/tsn_r50.py', 
'../../_base_/default_runtime.py'] + +# model settings +# ``in_channels`` should be 2 * clip_len +model = dict(backbone=dict(in_channels=10)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics_flow_train_list.txt' +ann_file_val = 'data/kinetics400/kinetics_flow_val_list.txt' +ann_file_test = 'data/kinetics400/kinetics_flow_val_list.txt' +img_norm_cfg = dict(mean=[128, 128], std=[128, 128]) +train_pipeline = [ + dict(type='SampleFrames', clip_len=5, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=5, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=5, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + 
test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{}_{:05d}.jpg', + modality='Flow', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{}_{:05d}.jpg', + modality='Flow', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{}_{:05d}.jpg', + modality='Flow', + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.005, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[70, 100]) +total_epochs = 110 + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_320p_1x1x3_110e_kinetics400_flow/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7641b9771f56791cf4b7612a236911dd9bed7f22 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,85 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes_320p.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +ann_file_test = 
'data/kinetics400/kinetics400_val_list_rawframes_320p.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, 
metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = './work_dirs/tsn_r50_320p_1x1x8_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow.py new file mode 100644 index 0000000000000000000000000000000000000000..3ca87c708cbbcfc80fa3c65319c9e56ce3e162e8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_110e_kinetics400_flow.py @@ -0,0 +1,96 @@ +_base_ = ['../../_base_/models/tsn_r50.py', '../../_base_/default_runtime.py'] + +# model settings +# ``in_channels`` should be 2 * clip_len +model = dict(backbone=dict(in_channels=10)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train_320p' +data_root_val = 'data/kinetics400/rawframes_val_320p' +ann_file_train = 'data/kinetics400/kinetics400_flow_train_list_320p.txt' +ann_file_val = 'data/kinetics400/kinetics400_flow_val_list_320p.txt' +ann_file_test = 'data/kinetics400/kinetics400_flow_val_list_320p.txt' +img_norm_cfg = dict(mean=[128, 128], std=[128, 128]) +train_pipeline = [ + dict(type='SampleFrames', clip_len=5, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=5, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + 
dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=5, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{}_{:05d}.jpg', + modality='Flow', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{}_{:05d}.jpg', + modality='Flow', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{}_{:05d}.jpg', + modality='Flow', + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.001875, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[70, 100]) +total_epochs = 110 + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_320p_1x1x8_110e_kinetics400_flow/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow.py 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow.py new file mode 100644 index 0000000000000000000000000000000000000000..ebb9982850a97d42075bedf350857bdf669ec9c2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow.py @@ -0,0 +1,107 @@ +_base_ = ['../../_base_/models/tsn_r50.py', '../../_base_/default_runtime.py'] + +# model settings +# ``in_channels`` should be 2 * clip_len +model = dict( + backbone=dict(in_channels=10), + cls_head=dict(num_classes=200, dropout_ratio=0.8)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ActivityNet/rawframes' +data_root_val = 'data/ActivityNet/rawframes' +ann_file_train = 'data/ActivityNet/anet_train_clip.txt' +ann_file_val = 'data/ActivityNet/anet_val_clip.txt' +ann_file_test = 'data/ActivityNet/anet_val_clip.txt' +img_norm_cfg = dict(mean=[128, 128], std=[128, 128], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=5, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=5, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=5, + frame_interval=1, + 
num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='flow_{}_{:05d}.jpg', + with_offset=True, + modality='Flow', + start_index=0, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='flow_{}_{:05d}.jpg', + with_offset=True, + modality='Flow', + start_index=0, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='flow_{}_{:05d}.jpg', + with_offset=True, + modality='Flow', + start_index=0, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001) +# this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[60, 120]) +total_epochs = 150 + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_320p_1x1x8_150e_activitynet_clip_flow/' +load_from = ('https://download.openmmlab.com/mmaction/recognition/tsn/' + 'tsn_r50_320p_1x1x8_110e_kinetics400_flow/' + 'tsn_r50_320p_1x1x8_110e_kinetics400_flow_20200705-1f39486b.pth') +workflow = [('train', 5)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow.py 
new file mode 100644 index 0000000000000000000000000000000000000000..dfab68032f1027a21ae30ab15685733f78927456 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_150e_activitynet_video_flow.py @@ -0,0 +1,105 @@ +_base_ = ['../../_base_/models/tsn_r50.py', '../../_base_/default_runtime.py'] + +# model settings +# ``in_channels`` should be 2 * clip_len +model = dict( + backbone=dict(in_channels=10), + cls_head=dict(num_classes=200, dropout_ratio=0.8)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ActivityNet/rawframes' +data_root_val = 'data/ActivityNet/rawframes' +ann_file_train = 'data/ActivityNet/anet_train_video.txt' +ann_file_val = 'data/ActivityNet/anet_val_video.txt' +ann_file_test = 'data/ActivityNet/anet_val_clip.txt' +img_norm_cfg = dict(mean=[128, 128], std=[128, 128], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=5, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=5, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=5, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + 
dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW_Flow'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='flow_{}_{:05d}.jpg', + modality='Flow', + start_index=0, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='flow_{}_{:05d}.jpg', + modality='Flow', + start_index=0, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='flow_{}_{:05d}.jpg', + with_offset=True, + modality='Flow', + start_index=0, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001) +# this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[60, 120]) +total_epochs = 150 + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_320p_1x1x8_150e_activitynet_video_flow/' +load_from = ('https://download.openmmlab.com/mmaction/recognition/tsn/' + 'tsn_r50_320p_1x1x8_110e_kinetics400_flow/' + 'tsn_r50_320p_1x1x8_110e_kinetics400_flow_20200705-1f39486b.pth') +workflow = [('train', 5)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..7ccb2beed5f5f8094f046fe20324b8ccaf90bda5 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb.py @@ -0,0 +1,98 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] +# model settings +model = dict(cls_head=dict(num_classes=200, dropout_ratio=0.8)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ActivityNet/rawframes' +data_root_val = 'data/ActivityNet/rawframes' +ann_file_train = 'data/ActivityNet/anet_train_clip.txt' +ann_file_val = 'data/ActivityNet/anet_val_clip.txt' +ann_file_test = 'data/ActivityNet/anet_val_clip.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + 
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline, + with_offset=True, + start_index=0, + filename_tmpl='image_{:05d}.jpg'), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline, + with_offset=True, + start_index=0, + filename_tmpl='image_{:05d}.jpg'), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline, + with_offset=True, + start_index=0, + filename_tmpl='image_{:05d}.jpg')) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001) + +# runtime settings +work_dir = './work_dirs/tsn_r50_320p_1x1x8_50e_activitynet_clip_rgb/' +load_from = ('https://download.openmmlab.com/mmaction/recognition/tsn/' + 'tsn_r50_320p_1x1x8_100e_kinetics400_rgb/' + 'tsn_r50_320p_1x1x8_100e_kinetics400_rgb_20200702-ef80e3d7.pth') +workflow = [('train', 5)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..17f1a7e79c6d34e5ac9b988fbcf8e8c6191321ff --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb.py @@ -0,0 +1,88 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_50e.py', + '../../_base_/default_runtime.py' +] +# model settings +model = dict(cls_head=dict(num_classes=200, dropout_ratio=0.8)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 
'data/ActivityNet/rawframes' +data_root_val = 'data/ActivityNet/rawframes' +ann_file_train = 'data/ActivityNet/anet_train_video.txt' +ann_file_val = 'data/ActivityNet/anet_val_video.txt' +ann_file_test = 'data/ActivityNet/anet_val_video.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + 
pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001) + +# runtime settings +work_dir = './work_dirs/tsn_r50_320p_1x1x8_50e_activitynet_video_rgb/' +load_from = ('https://download.openmmlab.com/mmaction/recognition/tsn/' + 'tsn_r50_320p_1x1x8_100e_kinetics400_rgb/' + 'tsn_r50_320p_1x1x8_100e_kinetics400_rgb_20200702-ef80e3d7.pth') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..a64608acfe4af30db2757b32b679bbe926fd93ff --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py @@ -0,0 +1,42 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False), + train_cfg=None, + test_cfg=dict(feature_extraction=True)) + +# dataset settings +dataset_type = 'VideoDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict(type='DecordInit', num_threads=1), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + test=dict( 
+ type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) + +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..e8e498e9dfe079171a5b8d86ef2b38ecd62fe11e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb.py @@ -0,0 +1,96 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(dropout_ratio=0.5, init_std=0.001)) + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DenseSampleFrames', clip_len=1, frame_interval=1, num_clips=5), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + 
test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.03, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2)) + +# runtime settings +work_dir = './work_dirs/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..70affa83824eee43ef643613b16a355dee605f55 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,91 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DenseSampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + 
dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.005, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +work_dir = './work_dirs/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..2a9594e27cc257e189dd3deff4ef1b43521ffe57 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,29 @@ +_base_ = ['../../_base_/models/tsn_r50.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + 
dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..a2a3e61e1ce8a26d0c0aed0b18135f5b5cd060f1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb.py @@ -0,0 +1,98 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=48)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/diving48/videos' +data_root_val = 'data/diving48/videos' +ann_file_train = 'data/diving48/diving48_train_list_videos.txt' +ann_file_val = 'data/diving48/diving48_val_list_videos.txt' +ann_file_test = 'data/diving48/diving48_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, 
+ test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=4, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +optimizer = dict( + type='SGD', + lr=0.00125, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) + +# runtime settings +work_dir = './work_dirs/tsn_r50_video_1x1x16_100e_diving48_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..57a8614fd5e0988c901a77a8bf95bee9747529be --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb.py @@ -0,0 +1,98 @@ +_base_ = [ + 
'../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=48)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/diving48/videos' +data_root_val = 'data/diving48/videos' +ann_file_train = 'data/diving48/diving48_train_list_videos.txt' +ann_file_val = 'data/diving48/diving48_val_list_videos.txt' +ann_file_test = 'data/diving48/diving48_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +optimizer = dict( + type='SGD', + lr=0.0025, # this lr is used for 8 gpus + momentum=0.9, + weight_decay=0.0001) + +# runtime settings +work_dir = './work_dirs/tsn_r50_video_1x1x8_100e_diving48_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..6ca137a845c1752a6ae583cebd055002b1713d89 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,87 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, 
num_clips=8), + dict(type='DecordDecode'), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsn_r50_video_1x1x8_100e_kinetics400_rgb/' diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..687ce2018f05893a25f269b205a4ac1b85f12d76 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb.py @@ -0,0 +1,91 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=600)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics600/videos_train' +data_root_val = 'data/kinetics600/videos_val' +ann_file_train = 'data/kinetics600/kinetics600_train_list_videos.txt' +ann_file_val = 'data/kinetics600/kinetics600_val_list_videos.txt' +ann_file_test = 'data/kinetics600/kinetics600_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + 
dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics600_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..62390025f4cf2d40950e618e83e114c62a354507 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb.py @@ -0,0 +1,91 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# model settings +model = dict(cls_head=dict(num_classes=700)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 
'data/kinetics700/videos_train' +data_root_val = 'data/kinetics700/videos_val' +ann_file_train = 'data/kinetics700/kinetics700_train_list_videos.txt' +ann_file_val = 'data/kinetics700/kinetics700_val_list_videos.txt' +ann_file_test = 'data/kinetics700/kinetics700_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + 
type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.00375, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics700_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..aff8c54d2ac04767e6b96edfaea08035654f5f3b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,82 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='DecordDecode'), + dict(type='RandomResizedCrop'), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', 
keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsn_r50_video_1x1x3_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..e2f0107262861ad925e9a39282f50ea7a6381b83 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,88 @@ +_base_ = [ + '../../_base_/models/tsn_r50.py', '../../_base_/schedules/sgd_100e.py', + '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='DenseSampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='DenseSampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + 
dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + val_dataloader=dict(videos_per_gpu=1), + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_imgaug_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_imgaug_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..c16f7a3001e9b1893abd07339effb580f1f0bd28 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_imgaug_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,126 @@ +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False), + cls_head=dict( + type='TSNHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01), + train_cfg=None, + test_cfg=dict(average_clips=None)) +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 
'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), + dict(type='DecordDecode'), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Imgaug', transforms='default'), + # dict( + # type='Imgaug', + # transforms=[ + # dict(type='Rotate', rotate=(-20, 20)), + # dict(type='Dropout', p=(0, 0.05)) + # ]), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + 
type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[40, 80]) +total_epochs = 100 +checkpoint_config = dict(interval=1) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict( + interval=20, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook'), + ]) +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/tsn_r50_video_1x1x8_100e_kinetics400_rgb/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..99dd110ce67d3044ad02e98eb8d3df3d16ddec98 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py @@ -0,0 +1,30 @@ +_base_ = ['../../_base_/models/tsn_r50.py'] + +# dataset settings +dataset_type = 'VideoDataset' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +test_pipeline = [ + dict(type='OpenCVInit', num_threads=1), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='OpenCVDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', 
crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=None, + data_prefix=None, + pipeline=test_pipeline)) diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_mixup_1x1x8_100e_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_mixup_1x1x8_100e_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..4f5f2a3a038c8b6b9697febadc6d69293f1e4655 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/tsn/tsn_r50_video_mixup_1x1x8_100e_kinetics400_rgb.py @@ -0,0 +1,107 @@ +_base_ = [ + '../../_base_/schedules/sgd_100e.py', '../../_base_/default_runtime.py' +] + +# model settings +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False), + cls_head=dict( + type='TSNHead', + num_classes=400, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01), + # model training and testing settings + train_cfg=dict( + blending=dict(type='MixupBlending', num_classes=400, alpha=.2)), + test_cfg=dict(average_clips=None)) + +# dataset settings +dataset_type = 'VideoDataset' +data_root = 'data/kinetics400/videos_train' +data_root_val = 'data/kinetics400/videos_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='DecordInit'), + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), 
+ dict(type='DecordDecode'), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=8, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict(type='DecordInit'), + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='DecordDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# runtime settings +work_dir = './work_dirs/tsn_r50_video_mixup_1x1x8_100e_kinetics400_rgb/' diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0c835e3dee424bf3d4b33c83ff62b98bb111ef24 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/README.md @@ -0,0 +1,68 @@ +# X3D + +[X3D: Expanding Architectures for Efficient Video Recognition](https://openaccess.thecvf.com/content_CVPR_2020/html/Feichtenhofer_X3D_Expanding_Architectures_for_Efficient_Video_Recognition_CVPR_2020_paper.html) + + + +## Abstract + + + +This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth. Inspired by feature selection methods in machine learning, a simple stepwise network expansion approach is employed that expands a single axis in each step, such that good accuracy to complexity trade-off is achieved. To expand X3D to a specific target complexity, we perform progressive forward expansion followed by backward contraction. X3D achieves state-of-the-art performance while requiring 4.8x and 5.5x fewer multiply-adds and parameters for similar accuracy as previous work. Our most surprising finding is that networks with high spatiotemporal resolution can perform well, while being extremely light in terms of network width and parameters. We report competitive accuracy at unprecedented efficiency on video classification and detection benchmarks. + + + +
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | resolution | backbone | top1 10-view | top1 30-view | reference top1 10-view | reference top1 30-view | ckpt | +| :--------------------------------------------------------------------------------------------------------- | :------------: | :------: | :----------: | :----------: | :----------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | +| [x3d_s_13x6x1_facebook_kinetics400_rgb](/configs/recognition/x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py) | short-side 320 | X3D_S | 72.7 | 73.2 | 73.1 \[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)\] | 73.5 \[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)\] | [ckpt](https://download.openmmlab.com/mmaction/recognition/x3d/facebook/x3d_s_facebook_13x6x1_kinetics400_rgb_20201027-623825a0.pth)\[1\] | +| [x3d_m_16x5x1_facebook_kinetics400_rgb](/configs/recognition/x3d/x3d_m_16x5x1_facebook_kinetics400_rgb.py) | short-side 320 | X3D_M | 75.0 | 75.6 | 75.1 \[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)\] | 76.2 \[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)\] | [ckpt](https://download.openmmlab.com/mmaction/recognition/x3d/facebook/x3d_m_facebook_16x5x1_kinetics400_rgb_20201027-3f42382a.pth)\[1\] | + +\[1\] The models are ported from the repo [SlowFast](https://github.com/facebookresearch/SlowFast/) and tested on our data. Currently, we only support the testing of X3D models, training will be available soon. + +:::{note} + +1. 
The values in the columns named "reference" are the results obtained by testing the checkpoints released in the original repo with its code, on the same dataset we use. +2. The Kinetics400 validation set we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. + +::: + +For more details on data preparation, you can refer to Kinetics400 in [Data Preparation](/docs/data_preparation.md). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the X3D model on the Kinetics-400 dataset and dump the results to a JSON file. + +```shell +python tools/test.py configs/recognition/x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips prob +``` + +For more details, you can refer to the **Test a dataset** part in [getting_started](/docs/getting_started.md#test-a-dataset).
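
The `--average-clips prob` option and the "30-view" column can be illustrated with a small sketch. It assumes that each test video yields `num_clips x num_crops` views (10 clips x 3 crops = 30 with the X3D test pipelines in this directory) and that `prob` means "softmax each view, then average over views". The function names `softmax` and `average_views` are hypothetical helpers for illustration, not MMAction2 APIs.

```python
import math

NUM_CLIPS = 10   # SampleFrames(num_clips=10) in the X3D test pipeline
NUM_CROPS = 3    # ThreeCrop produces three spatial crops per clip
NUM_VIEWS = NUM_CLIPS * NUM_CROPS  # number of views scored per video


def softmax(logits):
    """Numerically stable softmax over one view's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_views(view_logits, mode="prob"):
    """Average per-view class scores into one score vector per video.

    mode="prob":  apply softmax to each view first, then average (assumed
                  to mirror `--average-clips prob`).
    mode="score": average the raw scores directly.
    """
    if mode == "prob":
        view_scores = [softmax(v) for v in view_logits]
    elif mode == "score":
        view_scores = [list(v) for v in view_logits]
    else:
        raise ValueError(f"unsupported mode: {mode}")
    num_views = len(view_scores)
    # zip(*...) groups the scores of each class across all views.
    return [sum(per_class) / num_views for per_class in zip(*view_scores)]


# Two made-up views of a two-class problem.
example_views = [[0.0, 0.0], [math.log(3.0), 0.0]]
averaged = average_views(example_views, mode="prob")
```

Averaging probabilities rather than raw scores keeps a single view with very large logits from dominating the video-level prediction.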
+ +## Citation + +```BibTeX +@misc{feichtenhofer2020x3d, + title={X3D: Expanding Architectures for Efficient Video Recognition}, + author={Christoph Feichtenhofer}, + year={2020}, + eprint={2004.04730}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..5a05b88cd191ed4d7c0ff0c475ded0f3e845c142 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/README_zh-CN.md @@ -0,0 +1,52 @@ +# X3D + +## 简介 + + + +```BibTeX +@misc{feichtenhofer2020x3d, + title={X3D: Expanding Architectures for Efficient Video Recognition}, + author={Christoph Feichtenhofer}, + year={2020}, + eprint={2004.04730}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + +## 模型库 + +### Kinetics-400 + +| 配置文件 | 分辨率 | 主干网络 | top1 10-view | top1 30-view | 参考代码的 top1 10-view | 参考代码的 top1 30-view | ckpt | +| :--------------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :----------: | :----------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | +| [x3d_s_13x6x1_facebook_kinetics400_rgb](/configs/recognition/x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py) | 短边 320 | X3D_S | 72.7 | 73.2 | 73.1 \[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)\] | 73.5 \[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)\] | [ckpt](https://download.openmmlab.com/mmaction/recognition/x3d/facebook/x3d_s_facebook_13x6x1_kinetics400_rgb_20201027-623825a0.pth)\[1\] | +| 
[x3d_m_16x5x1_facebook_kinetics400_rgb](/configs/recognition/x3d/x3d_m_16x5x1_facebook_kinetics400_rgb.py) | 短边 320 | X3D_M | 75.0 | 75.6 | 75.1 \[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)\] | 76.2 \[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)\] | [ckpt](https://download.openmmlab.com/mmaction/recognition/x3d/facebook/x3d_m_facebook_16x5x1_kinetics400_rgb_20201027-3f42382a.pth)\[1\] | + +\[1\] 这里的模型是从 [SlowFast](https://github.com/facebookresearch/SlowFast/) 代码库中导入并在 MMAction2 使用的数据上进行测试的。目前仅支持 X3D 模型的测试,训练部分将会在近期提供。 + +注: + +1. 参考代码的结果是通过使用相同的数据和原来的代码库所提供的模型进行测试得到的。 +2. 我们使用的 Kinetics400 验证集包含 19796 个视频,用户可以从 [验证集视频](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB) 下载这些视频。同时也提供了对应的 [数据列表](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (每行格式为:视频 ID,视频帧数目,类别序号)以及 [标签映射](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (类别序号到类别名称)。 + +对于数据集准备的细节,用户可参考 [数据集准备文档](/docs_zh_CN/data_preparation.md) 中的 Kinetics400 部分 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 Kinetics-400 数据集上测试 X3D 模型,并将结果导出为一个 json 文件。 + +```shell +python tools/test.py configs/recognition/x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json --average-clips prob +``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 **测试某个数据集** 部分。 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..2608a6a9106784a6602a91d61c06114100d1a9a3 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/metafile.yml @@ -0,0 +1,51 @@ +Collections: +- Name: X3D + README: configs/recognition/x3d/README.md + Paper: + URL: https://arxiv.org/abs/2004.04730 + Title: "X3D: Expanding Architectures for Efficient Video Recognition" +Models: +- Config: configs/recognition/x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py + In Collection: X3D + Metadata: + Architecture: X3D_S + Batch Size: 1 + FLOPs: 2967543760 + Parameters: 3794322 + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: x3d_s_13x6x1_facebook_kinetics400_rgb + Converted From: + Weights: https://dl.fbaipublicfiles.com/pyslowfast/x3d_models/x3d_s.pyth + Code: https://github.com/facebookresearch/SlowFast/ + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 73.2 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/x3d/facebook/x3d_s_facebook_13x6x1_kinetics400_rgb_20201027-623825a0.pth + reference top1 10-view: 73.1 [[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)] + reference top1 30-view: 73.5 [[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)] +- Config: configs/recognition/x3d/x3d_m_16x5x1_facebook_kinetics400_rgb.py + In Collection: X3D + Metadata: + Architecture: X3D_M + Batch Size: 1 + FLOPs: 6490866832 + Parameters: 3794322 + Resolution: short-side 320 + Training Data: Kinetics-400 + Modality: RGB + Name: x3d_m_16x5x1_facebook_kinetics400_rgb + Converted From: + Weights: https://dl.fbaipublicfiles.com/pyslowfast/x3d_models/x3d_s.pyth + Code: https://github.com/facebookresearch/SlowFast/ + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 75.6 + Task: Action Recognition + Weights: https://download.openmmlab.com/mmaction/recognition/x3d/facebook/x3d_m_facebook_16x5x1_kinetics400_rgb_20201027-3f42382a.pth + reference top1 10-view: 75.1 
[[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)] + reference top1 30-view: 76.2 [[SlowFast](https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/x3d_m_16x5x1_facebook_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/x3d_m_16x5x1_facebook_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..baaed73fb4597d19bb2df3b0deee2b50822e84a1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/x3d_m_16x5x1_facebook_kinetics400_rgb.py @@ -0,0 +1,33 @@ +_base_ = ['../../_base_/models/x3d.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[114.75, 114.75, 114.75], std=[57.38, 57.38, 57.38], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=16, + frame_interval=5, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py new file mode 100644 index 0000000000000000000000000000000000000000..0de5cf95ed6470a650d2d487fa207cdb22a52c25 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/recognition/x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py @@ -0,0 +1,33 @@ +_base_ = ['../../_base_/models/x3d.py'] + +# dataset settings +dataset_type = 'RawframeDataset' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +img_norm_cfg = dict( + mean=[114.75, 114.75, 114.75], std=[57.38, 57.38, 57.38], to_bgr=False) +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=13, + frame_interval=6, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 192)), + dict(type='ThreeCrop', crop_size=192), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=1, + workers_per_gpu=2, + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + +dist_params = dict(backend='nccl') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/audioonly/audioonly_r50_64x1x1_100e_kinetics400_audio_feature.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/audioonly/audioonly_r50_64x1x1_100e_kinetics400_audio_feature.py new file mode 100644 index 0000000000000000000000000000000000000000..d8be216e99415717e502ace53415f35543bd4e9e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/audioonly/audioonly_r50_64x1x1_100e_kinetics400_audio_feature.py @@ -0,0 +1,80 @@ +_base_ = [ + '../../_base_/models/audioonly_r50.py', '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'AudioFeatureDataset' +data_root = 'data/kinetics400/audio_feature_train' +data_root_val = 'data/kinetics400/audio_feature_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_audio_feature.txt' +ann_file_val = 
'data/kinetics400/kinetics400_val_list_audio_feature.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_audio_feature.txt' +train_pipeline = [ + dict(type='LoadAudioFeature'), + dict(type='SampleFrames', clip_len=64, frame_interval=1, num_clips=1), + dict(type='AudioFeatureSelector'), + dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +val_pipeline = [ + dict(type='LoadAudioFeature'), + dict( + type='SampleFrames', + clip_len=64, + frame_interval=1, + num_clips=1, + test_mode=True), + dict(type='AudioFeatureSelector'), + dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +test_pipeline = [ + dict(type='LoadAudioFeature'), + dict( + type='SampleFrames', + clip_len=64, + frame_interval=1, + num_clips=10, + test_mode=True), + dict(type='AudioFeatureSelector'), + dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +data = dict( + videos_per_gpu=160, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=2.0, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 100 + +# runtime settings +checkpoint_config = dict(interval=5) +log_config 
= dict(interval=1) +work_dir = ('./work_dirs/' + + 'audioonly_r50_64x1x1_100e_kinetics400_audio_feature/') diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/README.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7d1526165ce1a464c7dd54e1652f9a85830db585 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/README.md @@ -0,0 +1,97 @@ +# ResNet for Audio + +[Audiovisual SlowFast Networks for Video Recognition](https://arxiv.org/abs/2001.08740) + + + +## Abstract + + + +We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception. AVSlowFast has Slow and Fast visual pathways that are deeply integrated with a Faster Audio pathway to model vision and sound in a unified representation. We fuse audio and visual features at multiple layers, enabling audio to contribute to the formation of hierarchical audiovisual concepts. To overcome training difficulties that arise from different learning dynamics for audio and visual modalities, we introduce DropPathway, which randomly drops the Audio pathway during training as an effective regularization technique. Inspired by prior studies in neuroscience, we perform hierarchical audiovisual synchronization to learn joint audiovisual features. We report state-of-the-art results on six video action classification and detection datasets, perform detailed ablation studies, and show the generalization of AVSlowFast to learn self-supervised audiovisual features. Code will be made available at: https://github.com/facebookresearch/SlowFast. + + + +
+ +
+ +## Results and Models + +### Kinetics-400 + +| config | n_fft | gpus | backbone | pretrain | top1 acc/delta | top5 acc/delta | inference_time(video/s) | gpu_mem(M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :--: | :-----------: | :------: | :------------: | :------------: | :---------------------: | :--------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r18_64x1x1_100e_kinetics400_audio_feature](/configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py) | 1024 | 8 | ResNet18 | None | 19.7 | 35.75 | x | 1897 | [ckpt](https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/tsn_r18_64x1x1_100e_kinetics400_audio_feature_20201012-bf34df6c.pth) | [log](https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/20201010_144630.log) | [json](https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/20201010_144630.log.json) | +| [tsn_r18_64x1x1_100e_kinetics400_audio_feature](/configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py) + [tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py) | 
1024 | 8 | ResNet(18+50) | None | 71.50(+0.39) | 90.18(+0.14) | x | x | x | x | x | + +:::{note} + +1. **gpus** is the number of GPUs used to train the checkpoint. The configs we provide assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you should scale the learning rate in proportion to the total batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. +2. **inference_time** is measured with this [benchmark script](/tools/analysis/benchmark.py), using the frame sampling strategy of the test setting. It covers only the model inference time and excludes IO and pre-processing time. Each setting is measured on 1 GPU with a batch size (videos per GPU) of 1. +3. The Kinetics400 validation set we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line has the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available. + +::: + +For more details on data preparation, refer to `Prepare audio` in [Data Preparation](/docs/data_preparation.md). + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train a ResNet model on the Kinetics-400 audio dataset deterministically (fixed seed), with periodic validation. 
+ +```shell +python tools/train.py configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py \ + --work-dir work_dirs/tsn_r18_64x1x1_100e_kinetics400_audio_feature \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** part of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test a ResNet model on the Kinetics-400 audio dataset and dump the result to a json file. + +```shell +python tools/test.py configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +For more details, refer to the **Test a dataset** part of [getting_started](/docs/getting_started.md#test-a-dataset). + +## Fusion + +For multi-modality fusion, you can use this simple [script](/tools/analysis/report_accuracy.py). The standard usage is: + +```shell +python tools/analysis/report_accuracy.py --scores ${AUDIO_RESULT_PKL} ${VISUAL_RESULT_PKL} --datalist data/kinetics400/kinetics400_val_list_rawframes.txt --coefficient 1 1 +``` + +- AUDIO_RESULT_PKL: The output file of the audio model, saved by `tools/test.py` with the `--out` argument. +- VISUAL_RESULT_PKL: The output file of the visual model, saved by `tools/test.py` with the `--out` argument. 
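As a rough illustration of weighted late fusion, which is presumably what `--coefficient 1 1` controls, the sketch below sums the per-class scores of two modalities with one weight each and reports top-1 accuracy. It is not the code of `report_accuracy.py`: the helpers `fuse_scores` and `top1_accuracy` and the toy scores are assumptions made for this example only.

```python
from typing import List, Sequence


def fuse_scores(score_sets: Sequence[List[List[float]]],
                coefficients: Sequence[float]) -> List[List[float]]:
    """Return the weighted sum of per-sample class scores over modalities."""
    if len(score_sets) != len(coefficients):
        raise ValueError("need exactly one coefficient per score set")
    num_samples = len(score_sets[0])
    if any(len(scores) != num_samples for scores in score_sets):
        raise ValueError("all score sets must cover the same samples")

    fused = []
    for i in range(num_samples):
        num_classes = len(score_sets[0][i])
        row = [0.0] * num_classes
        for scores, coeff in zip(score_sets, coefficients):
            for c in range(num_classes):
                row[c] += coeff * scores[i][c]
        fused.append(row)
    return fused


def top1_accuracy(scores: List[List[float]], labels: List[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = sum(
        1 for row, label in zip(scores, labels)
        if max(range(len(row)), key=row.__getitem__) == label)
    return correct / len(labels)


# Hypothetical scores for 3 samples and 2 classes (not real results).
audio = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]]
visual = [[0.1, 0.9], [0.8, 0.2], [0.55, 0.45]]
labels = [1, 0, 1]

fused = fuse_scores([audio, visual], coefficients=[1, 1])
print("audio :", top1_accuracy(audio, labels))
print("visual:", top1_accuracy(visual, labels))
print("fused :", top1_accuracy(fused, labels))
```

In this toy case each modality misclassifies a different sample, so the fused scores classify all three correctly. Real gains depend on how complementary the two models are.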
+ +## Citation + +```BibTeX +@article{xiao2020audiovisual, + title={Audiovisual SlowFast Networks for Video Recognition}, + author={Xiao, Fanyi and Lee, Yong Jae and Grauman, Kristen and Malik, Jitendra and Feichtenhofer, Christoph}, + journal={arXiv preprint arXiv:2001.08740}, + year={2020} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..922a9c0a73f48f805dd1054e799a3005eff2e2d6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/README_zh-CN.md @@ -0,0 +1,81 @@ +# ResNet for Audio + +## 简介 + + + +```BibTeX +@article{xiao2020audiovisual, + title={Audiovisual SlowFast Networks for Video Recognition}, + author={Xiao, Fanyi and Lee, Yong Jae and Grauman, Kristen and Malik, Jitendra and Feichtenhofer, Christoph}, + journal={arXiv preprint arXiv:2001.08740}, + year={2020} +} +``` + +## 模型库 + +### Kinetics-400 + +| 配置文件 | n_fft | GPU 数量 | 主干网络 | 预训练 | top1 acc/delta | top5 acc/delta | 推理时间 (video/s) | GPU 显存占用 (M) | ckpt | log | json | +| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :------: | :-----------: | :----: | :------------: | :------------: | :----------------: | :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------: | 
:--------------------------------------------------------------------------------------------------------------------------------------------------: | +| [tsn_r18_64x1x1_100e_kinetics400_audio_feature](/configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py) | 1024 | 8 | ResNet18 | None | 19.7 | 35.75 | x | 1897 | [ckpt](https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/tsn_r18_64x1x1_100e_kinetics400_audio_feature_20201012-bf34df6c.pth) | [log](https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/20201010_144630.log) | [json](https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/20201010_144630.log.json) | +| [tsn_r18_64x1x1_100e_kinetics400_audio_feature](/configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py) + [tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py) | 1024 | 8 | ResNet(18+50) | None | 71.50(+0.39) | 90.18(+0.14) | x | x | x | x | x | + +注: + +1. 这里的 **GPU 数量** 指的是得到模型权重文件对应的 GPU 个数。默认地,MMAction2 所提供的配置文件对应使用 8 块 GPU 进行训练的情况。 + 依据 [线性缩放规则](https://arxiv.org/abs/1706.02677),当用户使用不同数量的 GPU 或者每块 GPU 处理不同视频个数时,需要根据批大小等比例地调节学习率。 + 如,lr=0.01 对应 4 GPUs x 2 video/gpu,以及 lr=0.08 对应 16 GPUs x 4 video/gpu。 +2. 这里的 **推理时间** 是根据 [基准测试脚本](/tools/analysis/benchmark.py) 获得的,采用测试时的采帧策略,且只考虑模型的推理时间, + 并不包括 IO 时间以及预处理时间。对于每个配置,MMAction2 使用 1 块 GPU 并设置批大小(每块 GPU 处理的视频个数)为 1 来计算推理时间。 +3. 
我们使用的 Kinetics400 验证集包含 19796 个视频,用户可以从 [验证集视频](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB) 下载这些视频。同时也提供了对应的 [数据列表](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (每行格式为:视频 ID,视频帧数目,类别序号)以及 [标签映射](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) (类别序号到类别名称)。 + +对于数据集准备的细节,用户可参考 [数据集准备文档](/docs_zh_CN/data_preparation.md) 中的准备音频部分。 + +## 如何训练 + +用户可以使用以下指令进行模型训练。 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: 以一个确定性的训练方式,辅以定期的验证过程进行 ResNet 模型在 Kinetics400 音频数据集上的训练。 + +```shell +python tools/train.py configs/audio_recognition/tsn_r50_64x1x1_100e_kinetics400_audio_feature.py \ + --work-dir work_dirs/tsn_r50_64x1x1_100e_kinetics400_audio_feature \ + --validate --seed 0 --deterministic +``` + +更多训练细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) 中的 **训练配置** 部分。 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 Kinetics400 音频数据集上测试 ResNet 模型,并将结果导出为一个 json 文件。 + +```shell +python tools/test.py configs/audio_recognition/tsn_r50_64x1x1_100e_kinetics400_audio_feature.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.json +``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 **测试某个数据集** 部分。 + +## 融合 + +对于多模态融合,用户可以使用这个 [脚本](/tools/analysis/report_accuracy.py),其命令大致为: + +```shell +python tools/analysis/report_accuracy.py --scores ${AUDIO_RESULT_PKL} ${VISUAL_RESULT_PKL} --datalist data/kinetics400/kinetics400_val_list_rawframes.txt --coefficient 1 1 +``` + +- AUDIO_RESULT_PKL: `tools/test.py` 脚本通过 `--out` 选项存储的输出文件。 +- VISUAL_RESULT_PKL: `tools/test.py` 脚本通过 `--out` 选项存储的输出文件。 diff --git 
a/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..42ebc2bdce1e68880a687bb616377268a2864cf6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/metafile.yml @@ -0,0 +1,27 @@ +Collections: +- Name: Audio + README: configs/recognition_audio/resnet/README.md +Models: +- Config: configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py + In Collection: Audio + Metadata: + Architecture: ResNet18 + Pretrained: None + Training Data: Kinetics-400 + Training Resources: 8 GPUs + n_fft: '1024' + Modality: Audio + Name: tsn_r18_64x1x1_100e_kinetics400_audio_feature + Results: + - Dataset: Kinetics-400 + Metrics: + Top 1 Accuracy: 19.7 + Top 1 Accuracy [w. RGB]: 71.5 + Top 1 Accuracy delta [w. RGB]: 0.39 + Top 5 Accuracy: 35.75 + top5 accuracy [w. RGB]: 90.18 + top5 accuracy delta [w. 
RGB]: 0.14 + Task: Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/20201010_144630.log.json + Training Log: https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/20201010_144630.log + Weights: https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/tsn_r18_64x1x1_100e_kinetics400_audio_feature_20201012-bf34df6c.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py new file mode 100644 index 0000000000000000000000000000000000000000..d8b5c1e6f38850c581ae62c0eb8085bd010c9915 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py @@ -0,0 +1,89 @@ +_base_ = ['../../_base_/default_runtime.py'] + +# model settings +model = dict( + type='AudioRecognizer', + backbone=dict(type='ResNet', depth=18, in_channels=1, norm_eval=False), + cls_head=dict( + type='AudioTSNHead', + num_classes=400, + in_channels=512, + dropout_ratio=0.5, + init_std=0.01), + # model training and testing settings + train_cfg=None, + test_cfg=dict(average_clips='prob')) +# dataset settings +dataset_type = 'AudioFeatureDataset' +data_root = 'data/kinetics400/audio_feature_train' +data_root_val = 'data/kinetics400/audio_feature_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_audio_feature.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_audio_feature.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_audio_feature.txt' +train_pipeline = [ + dict(type='LoadAudioFeature'), + dict(type='SampleFrames', clip_len=64, frame_interval=1, num_clips=1), + dict(type='AudioFeatureSelector'), 
+ dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +val_pipeline = [ + dict(type='LoadAudioFeature'), + dict( + type='SampleFrames', + clip_len=64, + frame_interval=1, + num_clips=1, + test_mode=True), + dict(type='AudioFeatureSelector'), + dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +test_pipeline = [ + dict(type='LoadAudioFeature'), + dict( + type='SampleFrames', + clip_len=64, + frame_interval=1, + num_clips=1, + test_mode=True), + dict(type='AudioFeatureSelector'), + dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +data = dict( + videos_per_gpu=320, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 100 + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r18_64x1x1_100e_kinetics400_audio_feature/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/tsn_r50_64x1x1_100e_kinetics400_audio.py b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/tsn_r50_64x1x1_100e_kinetics400_audio.py new file 
mode 100644 index 0000000000000000000000000000000000000000..a806dea747f410b4a141ab1468dd21b54fe1efc9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/recognition_audio/resnet/tsn_r50_64x1x1_100e_kinetics400_audio.py @@ -0,0 +1,84 @@ +_base_ = [ + '../../_base_/models/tsn_r50_audio.py', '../../_base_/default_runtime.py' +] + +# dataset settings +dataset_type = 'AudioDataset' +data_root = 'data/kinetics400/audios' +data_root_val = 'data/kinetics400/audios' +ann_file_train = 'data/kinetics400/kinetics400_train_list_audio.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_audio.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_audio.txt' +train_pipeline = [ + dict(type='AudioDecodeInit'), + dict(type='SampleFrames', clip_len=64, frame_interval=1, num_clips=1), + dict(type='AudioDecode'), + dict(type='AudioAmplify', ratio=1.5), + dict(type='MelLogSpectrogram'), + dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +val_pipeline = [ + dict(type='AudioDecodeInit'), + dict( + type='SampleFrames', + clip_len=64, + frame_interval=1, + num_clips=1, + test_mode=True), + dict(type='AudioDecode'), + dict(type='AudioAmplify', ratio=1.5), + dict(type='MelLogSpectrogram'), + dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +test_pipeline = [ + dict(type='AudioDecodeInit'), + dict( + type='SampleFrames', + clip_len=64, + frame_interval=1, + num_clips=1, + test_mode=True), + dict(type='AudioDecode'), + dict(type='AudioAmplify', ratio=1.5), + dict(type='MelLogSpectrogram'), + dict(type='FormatAudioShape', input_format='NCTF'), + dict(type='Collect', keys=['audios', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['audios']) +] +data = dict( + videos_per_gpu=320, + workers_per_gpu=2, + train=dict( + type=dataset_type, + 
ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) +evaluation = dict( + interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy']) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', min_lr=0) +total_epochs = 100 + +# runtime settings +checkpoint_config = dict(interval=5) +work_dir = './work_dirs/tsn_r50_64x1x1_100e_kinetics400_audio/' diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py new file mode 100644 index 0000000000000000000000000000000000000000..4a8ffbfc977e37ecd0bf127efd5462d57e5d6ed2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py @@ -0,0 +1,79 @@ +model = dict( + type='SkeletonGCN', + backbone=dict( + type='AGCN', + in_channels=3, + graph_cfg=dict(layout='ntu-rgb+d', strategy='agcn')), + cls_head=dict( + type='STGCNHead', + num_classes=60, + in_channels=256, + loss_cls=dict(type='CrossEntropyLoss')), + train_cfg=None, + test_cfg=None) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/ntu/nturgb+d_skeletons_60_3d/xsub/train.pkl' +ann_file_val = 'data/ntu/nturgb+d_skeletons_60_3d/xsub/val.pkl' +train_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='JointToBone'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +val_pipeline = [ + 
dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='JointToBone'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +test_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='JointToBone'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=[30, 40]) +total_epochs = 80 +checkpoint_config = dict(interval=3) +evaluation = dict(interval=3, metrics=['top_k_accuracy']) +log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')]) + +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/2sagcn_80e_ntu60_xsub_bone_3d/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py new file mode 100644 index 0000000000000000000000000000000000000000..b2f4422a6dcf4531f213ab4f21fd6ae09139416b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py @@ -0,0 
+1,76 @@ +model = dict( + type='SkeletonGCN', + backbone=dict( + type='AGCN', + in_channels=3, + graph_cfg=dict(layout='ntu-rgb+d', strategy='agcn')), + cls_head=dict( + type='STGCNHead', + num_classes=60, + in_channels=256, + loss_cls=dict(type='CrossEntropyLoss')), + train_cfg=None, + test_cfg=None) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/ntu/nturgb+d_skeletons_60_3d/xsub/train.pkl' +ann_file_val = 'data/ntu/nturgb+d_skeletons_60_3d/xsub/val.pkl' +train_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +val_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +test_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +data = dict( + videos_per_gpu=12, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=[30, 40]) +total_epochs = 80 +checkpoint_config = dict(interval=3) +evaluation = dict(interval=3, metrics=['top_k_accuracy']) +log_config = 
dict(interval=100, hooks=[dict(type='TextLoggerHook')]) + +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/2sagcn_80e_ntu60_xsub_keypoint_3d/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..651142af70715f14fd1f3760a5d48319dc0a0e72 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/README.md @@ -0,0 +1,90 @@ +# AGCN + +[Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition](https://openaccess.thecvf.com/content_CVPR_2019/html/Shi_Two-Stream_Adaptive_Graph_Convolutional_Networks_for_Skeleton-Based_Action_Recognition_CVPR_2019_paper.html) + + + +## Abstract + + + +In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeletons as spatiotemporal graphs, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually, and it is fixed over all layers and input samples. This may not be optimal for the hierarchical GCN and diverse samples in action recognition tasks. In addition, the second-order information (the lengths and directions of bones) of the skeleton data, which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned by the BP algorithm in an end-to-end manner. This data-driven method increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. 
Moreover, a two-stream framework is proposed to model both the first-order and the second-order information simultaneously, which shows notable improvement for the recognition accuracy. Extensive experiments on the two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art with a significant margin. + + + +
+ +
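To make the "second-order information" from the abstract concrete, here is a minimal sketch of how bone vectors can be derived from joint coordinates: each bone is the difference between a joint and its parent joint, so it encodes both length and direction. This is not the code of MMAction2's `JointToBone` transform; the three-joint skeleton, the `parents` mapping and the function name `joints_to_bones` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def joints_to_bones(joints: List[Point],
                    parents: Dict[int, int]) -> List[Point]:
    """Compute one bone vector per joint as (joint - parent joint).

    Joints without an entry in `parents` (e.g. the root) are treated as
    their own parent, which gives them a zero bone vector.
    """
    bones = []
    for child, (x, y, z) in enumerate(joints):
        px, py, pz = joints[parents.get(child, child)]
        bones.append((x - px, y - py, z - pz))
    return bones


# Hypothetical 3-joint chain: root (0) -> joint 1 -> joint 2.
joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
parents = {1: 0, 2: 1}

bones = joints_to_bones(joints, parents)
print(bones)
```

A two-stream setup would train one model on `joints` and another on `bones`, then combine their scores at test time.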
+ +## Results and Models + +### NTU60_XSub + +| config | type | gpus | backbone | Top-1 | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------- | :---: | :--: | :------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | +| [2sagcn_80e_ntu60_xsub_keypoint_3d](/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py) | joint | 1 | AGCN | 86.06 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d-3bed61ba.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d.log) | [json](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d.json) | +| [2sagcn_80e_ntu60_xsub_bone_3d](/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py) | bone | 2 | AGCN | 86.89 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d-278b8815.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d.log) | [json](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d.json) | + +## Train + +You can use the following command to train a model. 
+ +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train an AGCN model on the joint data of the NTU60 dataset deterministically (fixed seed), with periodic validation. + +```shell +python tools/train.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py \ + --work-dir work_dirs/2sagcn_80e_ntu60_xsub_keypoint_3d \ + --validate --seed 0 --deterministic +``` + +Example: train an AGCN model on the bone data of the NTU60 dataset deterministically (fixed seed), with periodic validation. + +```shell +python tools/train.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py \ + --work-dir work_dirs/2sagcn_80e_ntu60_xsub_bone_3d \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** part of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test an AGCN model on the joint data of the NTU60 dataset and dump the result to a pickle file. + +```shell +python tools/test.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out joint_result.pkl +``` + +Example: test an AGCN model on the bone data of the NTU60 dataset and dump the result to a pickle file. + +```shell +python tools/test.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out bone_result.pkl +``` + +For more details, refer to the **Test a dataset** part of [getting_started](/docs/getting_started.md#test-a-dataset). 
+ +## Citation + +```BibTeX +@inproceedings{shi2019two, + title={Two-stream adaptive graph convolutional networks for skeleton-based action recognition}, + author={Shi, Lei and Zhang, Yifan and Cheng, Jian and Lu, Hanqing}, + booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, + pages={12026--12035}, + year={2019} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..cb7707a2fae21116c164732a3e337cfaa687043d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/README_zh-CN.md @@ -0,0 +1,76 @@ +# AGCN + +## 简介 + + + +```BibTeX +@inproceedings{shi2019two, + title={Two-stream adaptive graph convolutional networks for skeleton-based action recognition}, + author={Shi, Lei and Zhang, Yifan and Cheng, Jian and Lu, Hanqing}, + booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, + pages={12026--12035}, + year={2019} +} +``` + +## 模型库 + +### NTU60_XSub + +| 配置文件 | 数据格式 | GPU 数量 | 主干网络 | top1 准确率 | ckpt | log | json | +| :-------------------------------------------------------------------------------------------------- | :------: | :------: | :------: | :---------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | +| [2sagcn_80e_ntu60_xsub_keypoint_3d](/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py) | joint | 1 | AGCN | 86.06 | 
[ckpt](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d-3bed61ba.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d.log) | [json](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d.json) | +| [2sagcn_80e_ntu60_xsub_bone_3d](/configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py) | bone | 2 | AGCN | 86.89 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d-278b8815.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d.log) | [json](https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d.json) | + +## 如何训练 + +用户可以使用以下指令进行模型训练。 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +例如:以一个确定性的训练方式,辅以定期的验证过程进行 AGCN 模型在 NTU60 数据集的关节数据上的训练。 + +```shell +python tools/train.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py \ + --work-dir work_dirs/2sagcn_80e_ntu60_xsub_keypoint_3d \ + --validate --seed 0 --deterministic +``` + +例如:以一个确定性的训练方式,辅以定期的验证过程进行 AGCN 模型在 NTU60 数据集的骨骼数据上的训练。 + +```shell +python tools/train.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py \ + --work-dir work_dirs/2sagcn_80e_ntu60_xsub_bone_3d \ + --validate --seed 0 --deterministic +``` + +更多训练细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) 中的 **训练配置** 部分。 + +## 如何测试 + +用户可以使用以下指令进行模型测试。 + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +例如:在 NTU60 数据集的关节数据上测试 AGCN 模型,并将结果导出为一个 pickle 文件。 + +```shell +python tools/test.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py \ + checkpoints/SOME_CHECKPOINT.pth --eval 
top_k_accuracy mean_class_accuracy \ + --out joint_result.pkl +``` + +例如:在 NTU60 数据集的骨骼数据上测试 AGCN 模型,并将结果导出为一个 pickle 文件。 + +```shell +python tools/test.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out bone_result.pkl +``` + +更多测试细节,可参考 [基础教程](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) 中的 **测试某个数据集** 部分。 diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..30d5804f745ac814e01eb85d2ae257aa1885d6ca --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/2s-agcn/metafile.yml @@ -0,0 +1,40 @@ +Collections: +- Name: AGCN + README: configs/skeleton/2s-agcn/README.md +Models: +- Config: configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py + In Collection: AGCN + Metadata: + Architecture: AGCN + Batch Size: 24 + Epochs: 80 + Parameters: 3472176 + Training Data: NTU60-XSub + Training Resources: 1 GPU + Name: agcn_80e_ntu60_xsub_keypoint_3d + Results: + Dataset: NTU60-XSub + Metrics: + Top 1 Accuracy: 86.06 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d.log + Weights: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d-3bed61ba.pth +- Config: configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py + In Collection: AGCN + Metadata: + Architecture: AGCN + Batch Size: 24 + Epochs: 80 + Parameters: 3472176 + Training Data: NTU60-XSub + Training Resources: 2 GPUs + Name: 
agcn_80e_ntu60_xsub_bone_3d + Results: + Dataset: NTU60-XSub + Metrics: + Top 1 Accuracy: 86.89 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d.log + Weights: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d-278b8815.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/README.md b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e8837c85d610c50e43f8cc58af2fdbf0cfb7fe02 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/README.md @@ -0,0 +1,149 @@ +# PoseC3D + +[Revisiting Skeleton-based Action Recognition](https://arxiv.org/abs/2104.13586) + + + +## Abstract + + + +Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt graph convolutional networks (GCN) to extract features on top of human skeletons. Despite the positive results shown in previous works, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseC3D is more effective in learning spatiotemporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings. 
Also, PoseC3D can handle multiple-person scenarios without additional computation cost, and its features can be easily integrated with other modalities at early fusion stages, which provides a great design space to further boost the performance. On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality. + + + +
+Figures (images not shown): Pose Estimation Results; Keypoint Heatmap Volume Visualization; Limb Heatmap Volume Visualization
+ +## Results and Models + +### FineGYM + +| config | pseudo heatmap | gpus | backbone | Mean Top-1 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------- | :------------: | :---: | :----------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_u48_240e_gym_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint-b07a98a0.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.json) | +| [slowonly_r50_u48_240e_gym_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 94.0 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb-c0d7b482.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.json) | +| Fusion | | | | 94.3 | | | | + +### NTU60_XSub + +| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json | +| 
:------------------------------------------------------------------------------------------------------------------ | :------------: | :---: | :----------: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_u48_240e_ntu60_xsub_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint-f3adabf1.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.json) | +| [slowonly_r50_u48_240e_ntu60_xsub_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 93.4 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb-1d69006a.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.json) | +| Fusion | | | | 94.1 | | | | + +### NTU120_XSub + +| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | 
json | +| :-------------------------------------------------------------------------------------------------------------------- | :------------: | :---: | :----------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_u48_240e_ntu120_xsub_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 86.3 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.json) | +| [slowonly_r50_u48_240e_ntu120_xsub_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 85.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb-803c2317.pth)
| [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.json) | +| Fusion | | | | 86.9 | | | | + +### UCF101 + +| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :----------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint](/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py) | keypoint | 8 | SlowOnly-R50 | 87.0 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint-cae8aa4a.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.log) | 
[json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.json) | + +### HMDB51 + +| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :----------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint](/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py) | keypoint | 8 | SlowOnly-R50 | 69.3 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint-76ffdd8b.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.json) | + 
+:::{note} + +1. The **gpus** column gives the number of GPUs used to obtain the checkpoint. Note that the provided configs assume 8 GPUs by default. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate in proportion to the batch size if you use a different number of GPUs or videos per GPU, + e.g., lr=0.01 for 8 GPUs x 8 videos/gpu and lr=0.04 for 16 GPUs x 16 videos/gpu. +2. You can follow the guide in [Preparing Skeleton Dataset](https://github.com/open-mmlab/mmaction2/tree/master/tools/data/skeleton) to obtain the skeleton annotations used in the above configs. + +::: + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the PoseC3D model on the FineGYM dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py \ + --work-dir work_dirs/slowonly_r50_u48_240e_gym_keypoint \ + --validate --seed 0 --deterministic +``` + +To train on your own dataset, refer to [Custom Dataset Training](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md). + +For more details, refer to the **Training setting** part of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the PoseC3D model on the FineGYM dataset and dump the result to a pickle file. + +```shell +python tools/test.py configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.pkl +``` + +For more details, refer to the **Test a dataset** part of [getting_started](/docs/getting_started.md#test-a-dataset).
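The Linear Scaling Rule from the note under Results and Models can be written out in a few lines of Python. This is a minimal sketch, not part of MMAction2: the function name `scale_learning_rate` and its parameters are hypothetical, and it assumes the learning rate scales strictly linearly with the total batch size (GPUs x videos per GPU).

```python
def scale_learning_rate(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Scale the learning rate in proportion to the total batch size."""
    base_batch_size = base_gpus * base_videos_per_gpu
    new_batch_size = gpus * videos_per_gpu
    return base_lr * new_batch_size / base_batch_size


# Reference setting from the note: lr=0.01 for 8 GPUs x 8 videos/gpu.
# Moving to 16 GPUs x 16 videos/gpu multiplies the batch size by 4.
lr = scale_learning_rate(0.01, base_gpus=8, base_videos_per_gpu=8,
                         gpus=16, videos_per_gpu=16)
print(f"scaled lr: {lr:.4f}")
```

The result would then be written into the optimizer setting of the config you train with; which config field holds it depends on that config.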
+ +## Citation + +```BibTeX +@misc{duan2021revisiting, + title={Revisiting Skeleton-based Action Recognition}, + author={Haodong Duan and Yue Zhao and Kai Chen and Dian Shao and Dahua Lin and Bo Dai}, + year={2021}, + eprint={2104.13586}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..9aa2bf47ada927331f70d3493b8b3ab9d1ed8082 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/README_zh-CN.md @@ -0,0 +1,133 @@ +# PoseC3D + +## Introduction + + + +```BibTeX +@misc{duan2021revisiting, + title={Revisiting Skeleton-based Action Recognition}, + author={Haodong Duan and Yue Zhao and Kai Chen and Dian Shao and Dahua Lin and Bo Dai}, + year={2021}, + eprint={2104.13586}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + + + + + + + + + +
+Figures (images not shown): Pose Estimation Results; Keypoint Heatmap Volume Visualization; Limb Heatmap Volume Visualization
+ +## Model Zoo + +### FineGYM + +| config | heatmap type | GPUs | backbone | Mean Top-1 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_u48_240e_gym_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint-b07a98a0.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.json) | +| [slowonly_r50_u48_240e_gym_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 94.0 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb-c0d7b482.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.json) | +| Fusion | | | | 94.3 | | | | + +### NTU60_XSub + +| config | heatmap type | GPUs | backbone | Top-1 | ckpt | log | json | +|
:------------------------------------------------------------------------------------------------------------------ | :------: | :------: | :----------: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_u48_240e_ntu60_xsub_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint-f3adabf1.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.json) | +| [slowonly_r50_u48_240e_ntu60_xsub_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 93.4 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb-1d69006a.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.json) | +| Fusion | | | | 94.1 | | | | + +### NTU120_XSub + +| config | heatmap type | GPUs | backbone | Top-1 | ckpt | log | json | +|
:-------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_r50_u48_240e_ntu120_xsub_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 86.3 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.json) | +| [slowonly_r50_u48_240e_ntu120_xsub_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 85.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb-803c2317.pth)
| [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.json) | +| Fusion | | | | 86.9 | | | | + +### UCF101 + +| config | heatmap type | GPUs | backbone | Top-1 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint](/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py) | keypoint | 8 | SlowOnly-R50 | 87.0 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint-cae8aa4a.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.log) |
[json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.json) | + +### HMDB51 + +| config | heatmap type | GPUs | backbone | Top-1 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint](/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py) | keypoint | 8 | SlowOnly-R50 | 69.3 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint-76ffdd8b.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.json) | + +Notes: + +1.
**GPUs** here is the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 assume training on 8 GPUs. + According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), when you use a different number of GPUs or a different number of videos per GPU, you need to scale the learning rate in proportion to the batch size, + e.g., lr=0.2 for 8 GPUs x 16 videos/gpu and lr=0.4 for 16 GPUs x 16 videos/gpu. +2. You can follow [Preparing Skeleton Dataset](https://github.com/open-mmlab/mmaction2/blob/master/tools/data/skeleton/README_zh-CN.md) to obtain the skeleton annotations used in the configs above. + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the PoseC3D model on the FineGYM dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py \ + --work-dir work_dirs/slowonly_r50_u48_240e_gym_keypoint \ + --validate --seed 0 --deterministic +``` + +To train on a custom dataset, refer to [Custom Dataset Training](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md). + +For more training details, see the **Training setting** (训练配置) section of the [getting started guide](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the PoseC3D model on the FineGYM dataset and dump the result to a pickle file. + +```shell +python tools/test.py configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.pkl +``` + +For more testing details, see the **Test a dataset** (测试某个数据集) section of the [getting started guide](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/custom_dataset_training.md b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/custom_dataset_training.md new file mode 100644 index 0000000000000000000000000000000000000000..cb5b2f647f3edaea24d1976e598417857c0cb0fd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/custom_dataset_training.md @@ -0,0 +1,41 @@ +# Custom Dataset Training with PoseC3D + +We
provide a step-by-step tutorial on training PoseC3D on your custom dataset. + +1. First, note that action recognition with PoseC3D requires only skeleton information, so you need to prepare your own annotation files (for training and validation). To start, replace the placeholders `mmdet_root` and `mmpose_root` in `ntu_pose_extraction.py` with your installation paths. Then use [ntu_pose_extraction.py](https://github.com/open-mmlab/mmaction2/blob/90fc8440961987b7fe3ee99109e2c633c4e30158/tools/data/skeleton/ntu_pose_extraction.py) as shown in [Prepare Annotations](https://github.com/open-mmlab/mmaction2/blob/master/tools/data/skeleton/README.md#prepare-annotations) to extract 2D keypoints for each video in your custom dataset. The command looks like this (assuming your video is named `some_video_from_my_dataset.mp4`): + + ```shell + # Run this command for each of your training and validation videos to generate its pickle file. + python ntu_pose_extraction.py some_video_from_my_dataset.mp4 some_video_from_my_dataset.pkl + ``` + + @kennymckormick's [note](https://github.com/open-mmlab/mmaction2/issues/1216#issuecomment-950130079): + + > One only thing you may need to change is that: since ntu_pose_extraction.py is developed specifically for pose extraction of NTU videos, you can skip the [ntu_det_postproc](https://github.com/open-mmlab/mmaction2/blob/90fc8440961987b7fe3ee99109e2c633c4e30158/tools/data/skeleton/ntu_pose_extraction.py#L307) step when using this script for extracting pose from your custom video datasets. + +2. Then, collect all the pickle files into one list for training (and another for validation) and save each list as a single file (e.g., `custom_dataset_train.pkl` or `custom_dataset_val.pkl`). At this point, the annotation files for your custom dataset are ready. + +3.
Next, train with the following command (adjusted to your needs), as shown in [PoseC3D/Train](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/README.md#train): `python tools/train.py configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py --work-dir work_dirs/slowonly_r50_u48_240e_ntu120_xsub_keypoint --validate --test-best --gpus 2 --seed 0 --deterministic`: + + - Before running the above command, modify the following variables in the config so that they point to your newly created annotation files: + + ```python + model = dict( + ... + cls_head=dict( + ... + num_classes=4, # Your class number + ... + ), + ... + ) + + ann_file_train = 'data/posec3d/custom_dataset_train.pkl' # Your annotation for training + ann_file_val = 'data/posec3d/custom_dataset_val.pkl' # Your annotation for validation + + load_from = 'pretrained_weight.pth' # You can use released weights for initialization; set to None if training from scratch + + # You can also alter the hyperparameters or training schedule + ``` + +With that, training should start, and you can grab a cup of coffee while you watch how it goes.
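Step 2 above (collecting the per-video pickle files into one list and saving it as a single file) can be sketched as follows. This is a minimal sketch under stated assumptions, not a script shipped with MMAction2: it assumes each per-video pickle holds a single annotation object and that all files for one split sit in one directory. The function name `merge_annotations` and the `pose_pickles/` directory layout are hypothetical.

```python
import pickle
from pathlib import Path


def merge_annotations(pkl_dir, out_file):
    """Load every per-video pickle in pkl_dir and save them as one list."""
    annotations = []
    for pkl_path in sorted(Path(pkl_dir).glob("*.pkl")):
        with open(pkl_path, "rb") as f:
            annotations.append(pickle.load(f))
    # Keep out_file outside pkl_dir so a re-run does not pick it up as input.
    with open(out_file, "wb") as f:
        pickle.dump(annotations, f)
    return len(annotations)


if __name__ == "__main__":
    # Hypothetical layout: one directory of per-video pickles per split.
    for split in ("train", "val"):
        src = Path("pose_pickles") / split
        if src.is_dir():
            count = merge_annotations(src, f"custom_dataset_{split}.pkl")
            print(f"{split}: merged {count} annotation files")
```

The merged files can then be placed wherever `ann_file_train` and `ann_file_val` point in your config. Whether the class labels stored in each annotation need adjusting for your dataset is not covered by this sketch.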
diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..b4c29ac730afddf61ce9d96deb313f3656af8fb9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/metafile.yml @@ -0,0 +1,159 @@ +Collections: +- Name: PoseC3D + README: configs/skeleton/posec3d/README.md + Paper: + URL: https://arxiv.org/abs/2104.13586 + Title: Revisiting Skeleton-based Action Recognition +Models: +- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py + In Collection: PoseC3D + Metadata: + Architecture: SlowOnly-R50 + Batch Size: 16 + Epochs: 240 + Parameters: 2044867 + Training Data: FineGYM + Training Resources: 16 GPUs + pseudo heatmap: keypoint + Name: slowonly_r50_u48_240e_gym_keypoint + Results: + - Dataset: FineGYM + Metrics: + mean Top 1 Accuracy: 93.7 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.log + Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint-b07a98a0.pth +- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py + In Collection: PoseC3D + Metadata: + Architecture: SlowOnly-R50 + Batch Size: 16 + Epochs: 240 + Parameters: 2044867 + Training Data: FineGYM + Training Resources: 16 GPUs + pseudo heatmap: limb + Name: slowonly_r50_u48_240e_gym_limb + Results: + - Dataset: FineGYM + Metrics: + mean Top 1 Accuracy: 94.0 + Task: Skeleton-based Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.log + Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb-c0d7b482.pth +- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py + In Collection: PoseC3D + Metadata: + Architecture: SlowOnly-R50 + Batch Size: 16 + Epochs: 240 + Parameters: 2024860 + Training Data: NTU60-XSub + Training Resources: 16 GPUs + pseudo heatmap: keypoint + Name: slowonly_r50_u48_240e_ntu60_xsub_keypoint + Results: + - Dataset: NTU60-XSub + Metrics: + Top 1 Accuracy: 93.7 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.log + Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint-f3adabf1.pth +- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py + In Collection: PoseC3D + Metadata: + Architecture: SlowOnly-R50 + Batch Size: 16 + Epochs: 240 + Parameters: 2024860 + Training Data: NTU60-XSub + Training Resources: 16 GPUs + pseudo heatmap: limb + Name: slowonly_r50_u48_240e_ntu60_xsub_limb + Results: + - Dataset: NTU60-XSub + Metrics: + Top 1 Accuracy: 93.4 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.json + Training Log: 
https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.log + Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb-1d69006a.pth +- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py + In Collection: PoseC3D + Metadata: + Architecture: SlowOnly-R50 + Batch Size: 16 + Epochs: 240 + Parameters: 2055640 + Training Data: NTU120-XSub + Training Resources: 16 GPUs + pseudo heatmap: keypoint + Name: slowonly_r50_u48_240e_ntu120_xsub_keypoint + Results: + - Dataset: NTU120-XSub + Metrics: + Top 1 Accuracy: 86.3 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.log + Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth +- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py + In Collection: PoseC3D + Metadata: + Architecture: SlowOnly-R50 + Batch Size: 16 + Epochs: 240 + Parameters: 2055640 + Training Data: NTU120-XSub + Training Resources: 16 GPUs + pseudo heatmap: limb + Name: slowonly_r50_u48_240e_ntu120_xsub_limb + Results: + - Dataset: NTU120-XSub + Metrics: + Top 1 Accuracy: 85.7 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.log + 
Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb-803c2317.pth +- Config: configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py + In Collection: PoseC3D + Metadata: + Architecture: SlowOnly-R50 + Batch Size: 16 + Epochs: 120 + Parameters: 3029984 + Training Data: HMDB51 + Training Resources: 8 GPUs + pseudo heatmap: keypoint + Name: slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint + Results: + - Dataset: HMDB51 + Metrics: + Top 1 Accuracy: 69.3 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.log + Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint-76ffdd8b.pth +- Config: configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py + In Collection: PoseC3D + Metadata: + Architecture: SlowOnly-R50 + Batch Size: 16 + Epochs: 120 + Parameters: 3055584 + Training Data: UCF101 + Training Resources: 8 GPUs + pseudo heatmap: keypoint + Name: slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint + Results: + - Dataset: UCF101 + Metrics: + Top 1 Accuracy: 87.0 + Task: Skeleton-based Action Recognition + Training Json Log: 
https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.log + Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint-cae8aa4a.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py new file mode 100644 index 0000000000000000000000000000000000000000..158469e10796a737ab8c13b3f20aed9647821147 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py @@ -0,0 +1,131 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(3, 4, 6), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)), + cls_head=dict( + type='I3DHead', + in_channels=512, + num_classes=51, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=dict(), + test_cfg=dict(average_clips='prob')) + +dataset_type = 'PoseDataset' +ann_file = 'data/posec3d/hmdb51.pkl' +left_kp = [1, 3, 5, 7, 9, 11, 13, 15] +right_kp = [2, 4, 6, 8, 10, 12, 14, 16] +train_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., 
allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='RandomResizedCrop', area_range=(0.56, 1.0)), + dict(type='Resize', scale=(48, 48), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 56)), + dict(type='CenterCrop', crop_size=56), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 56)), + dict(type='CenterCrop', crop_size=56), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False, + double=True, + left_kp=left_kp, + right_kp=right_kp), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type='RepeatDataset', + times=10, + dataset=dict( + type=dataset_type, + ann_file=ann_file, + split='train1', + data_prefix='', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + ann_file=ann_file, + split='test1', + data_prefix='', + 
pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file, + split='test1', + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0001) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[9, 11]) +total_epochs = 12 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5)) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/posec3d_iclr/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint' # noqa: E501 +load_from = 'https://download.openmmlab.com/mmaction/skeleton/posec3d/k400_posec3d-041f49c6.pth' # noqa: E501 +resume_from = None +find_unused_parameters = True diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py new file mode 100644 index 0000000000000000000000000000000000000000..6e5f34d3d3d36efb95d312c7ddf655a54d69bdad --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py @@ -0,0 +1,131 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(3, 4, 6), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)), + cls_head=dict( + type='I3DHead', + in_channels=512, + num_classes=101, + spatial_type='avg', + 
dropout_ratio=0.5), + train_cfg=dict(), + test_cfg=dict(average_clips='prob')) + +dataset_type = 'PoseDataset' +ann_file = 'data/posec3d/ucf101.pkl' +left_kp = [1, 3, 5, 7, 9, 11, 13, 15] +right_kp = [2, 4, 6, 8, 10, 12, 14, 16] +train_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='RandomResizedCrop', area_range=(0.56, 1.0)), + dict(type='Resize', scale=(48, 48), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 56)), + dict(type='CenterCrop', crop_size=56), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 56)), + dict(type='CenterCrop', crop_size=56), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False, + double=True, + left_kp=left_kp, + right_kp=right_kp), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + 
videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type='RepeatDataset', + times=10, + dataset=dict( + type=dataset_type, + ann_file=ann_file, + split='train1', + data_prefix='', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + ann_file=ann_file, + split='test1', + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file, + split='test1', + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.01, momentum=0.9, + weight_decay=0.0003) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[9, 11]) +total_epochs = 12 +checkpoint_config = dict(interval=1) +workflow = [('train', 1)] +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5)) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/posec3d_iclr/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint' # noqa: E501 +load_from = 'https://download.openmmlab.com/mmaction/skeleton/posec3d/k400_posec3d-041f49c6.pth' # noqa: E501 +resume_from = None +find_unused_parameters = True diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py new file mode 100644 index 0000000000000000000000000000000000000000..8ce6fbcb31fcfaf4b618e7170a63305418a034d4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py @@ -0,0 +1,128 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(4, 6, 
3), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)), + cls_head=dict( + type='I3DHead', + in_channels=512, + num_classes=99, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=dict(), + test_cfg=dict(average_clips='prob')) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/posec3d/gym_train.pkl' +ann_file_val = 'data/posec3d/gym_val.pkl' +left_kp = [1, 3, 5, 7, 9, 11, 13, 15] +right_kp = [2, 4, 6, 8, 10, 12, 14, 16] +train_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='RandomResizedCrop', area_range=(0.56, 1.0)), + dict(type='Resize', scale=(56, 56), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + 
type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False, + double=True, + left_kp=left_kp, + right_kp=right_kp), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.2, momentum=0.9, + weight_decay=0.0003) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0) +total_epochs = 240 +checkpoint_config = dict(interval=10) +workflow = [('train', 10)] +evaluation = dict( + interval=10, + metrics=['top_k_accuracy', 'mean_class_accuracy'], + topk=(1, 5)) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_gym_keypoint' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py new file mode 100644 index 0000000000000000000000000000000000000000..c0c9295e029a169b711d01e196e2e2e1791ba0ab --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py @@ -0,0 +1,134 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + 
pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(4, 6, 3), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)), + cls_head=dict( + type='I3DHead', + in_channels=512, + num_classes=99, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=dict(), + test_cfg=dict(average_clips='prob')) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/posec3d/gym_train.pkl' +ann_file_val = 'data/posec3d/gym_val.pkl' +left_kp = [1, 3, 5, 7, 9, 11, 13, 15] +right_kp = [2, 4, 6, 8, 10, 12, 14, 16] +skeletons = [[0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10], [5, 11], + [11, 13], [13, 15], [6, 12], [12, 14], [14, 16], [0, 1], [0, 2], + [1, 3], [2, 4], [11, 12]] +train_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='RandomResizedCrop', area_range=(0.56, 1.0)), + dict(type='Resize', scale=(56, 56), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, + skeletons=skeletons), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, + skeletons=skeletons), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + 
dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, + skeletons=skeletons, + double=True, + left_kp=left_kp, + right_kp=right_kp), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.2, momentum=0.9, + weight_decay=0.0003) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0) +total_epochs = 240 +checkpoint_config = dict(interval=10) +workflow = [('train', 10)] +evaluation = dict( + interval=10, + metrics=['top_k_accuracy', 'mean_class_accuracy'], + topk=(1, 5)) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_gym_limb' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py 
b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py new file mode 100644 index 0000000000000000000000000000000000000000..640c67485a2ae5448f42e3e291bb60a08fd48312 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py @@ -0,0 +1,130 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(4, 6, 3), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)), + cls_head=dict( + type='I3DHead', + in_channels=512, + num_classes=120, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=dict(), + test_cfg=dict(average_clips='prob')) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/posec3d/ntu120_xsub_train.pkl' +ann_file_val = 'data/posec3d/ntu120_xsub_val.pkl' +left_kp = [1, 3, 5, 7, 9, 11, 13, 15] +right_kp = [2, 4, 6, 8, 10, 12, 14, 16] +train_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='RandomResizedCrop', area_range=(0.56, 1.0)), + dict(type='Resize', scale=(56, 56), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + 
dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False, + double=True, + left_kp=left_kp, + right_kp=right_kp), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + class_prob={i: 1 + int(i >= 60) + for i in range(120)}, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.2, momentum=0.9, + weight_decay=0.0003) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0) +total_epochs = 240 +checkpoint_config = dict(interval=10) +workflow = [('train', 10)] +evaluation = dict( + interval=10, + metrics=['top_k_accuracy', 'mean_class_accuracy'], + topk=(1, 5)) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = 
'./work_dirs/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py new file mode 100644 index 0000000000000000000000000000000000000000..978bb2adcf38a1326156e7cc85763b74267adc8b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py @@ -0,0 +1,136 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(4, 6, 3), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)), + cls_head=dict( + type='I3DHead', + in_channels=512, + num_classes=120, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=dict(), + test_cfg=dict(average_clips='prob')) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/posec3d/ntu120_xsub_train.pkl' +ann_file_val = 'data/posec3d/ntu120_xsub_val.pkl' +left_kp = [1, 3, 5, 7, 9, 11, 13, 15] +right_kp = [2, 4, 6, 8, 10, 12, 14, 16] +skeletons = [[0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10], [5, 11], + [11, 13], [13, 15], [6, 12], [12, 14], [14, 16], [0, 1], [0, 2], + [1, 3], [2, 4], [11, 12]] +train_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='RandomResizedCrop', area_range=(0.56, 1.0)), + dict(type='Resize', scale=(56, 56), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, +
skeletons=skeletons), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, + skeletons=skeletons), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, + skeletons=skeletons, + double=True, + left_kp=left_kp, + right_kp=right_kp), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + class_prob={i: 1 + int(i >= 60) + for i in range(120)}, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.2, momentum=0.9, + weight_decay=0.0003) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# 
learning policy +lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0) +total_epochs = 240 +checkpoint_config = dict(interval=10) +workflow = [('train', 10)] +evaluation = dict( + interval=10, + metrics=['top_k_accuracy', 'mean_class_accuracy'], + topk=(1, 5)) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py new file mode 100644 index 0000000000000000000000000000000000000000..47e541115e1c8c3d9fef417de58b37f1a9fc6396 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py @@ -0,0 +1,128 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(4, 6, 3), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)), + cls_head=dict( + type='I3DHead', + in_channels=512, + num_classes=60, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=dict(), + test_cfg=dict(average_clips='prob')) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/posec3d/ntu60_xsub_train.pkl' +ann_file_val = 'data/posec3d/ntu60_xsub_val.pkl' +left_kp = [1, 3, 5, 7, 9, 11, 13, 15] +right_kp = [2, 4, 6, 8, 10, 12, 14, 16] +train_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='RandomResizedCrop', 
area_range=(0.56, 1.0)), + dict(type='Resize', scale=(56, 56), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False, + double=True, + left_kp=left_kp, + right_kp=right_kp), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.2, 
momentum=0.9, + weight_decay=0.0003) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0) +total_epochs = 240 +checkpoint_config = dict(interval=10) +workflow = [('train', 10)] +evaluation = dict( + interval=10, + metrics=['top_k_accuracy', 'mean_class_accuracy'], + topk=(1, 5)) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py new file mode 100644 index 0000000000000000000000000000000000000000..7e98d22dd640d535639dfd6d2c71f4913e1b79ce --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py @@ -0,0 +1,134 @@ +model = dict( + type='Recognizer3D', + backbone=dict( + type='ResNet3dSlowOnly', + depth=50, + pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(4, 6, 3), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)), + cls_head=dict( + type='I3DHead', + in_channels=512, + num_classes=60, + spatial_type='avg', + dropout_ratio=0.5), + train_cfg=dict(), + test_cfg=dict(average_clips='prob')) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/posec3d/ntu60_xsub_train.pkl' +ann_file_val = 'data/posec3d/ntu60_xsub_val.pkl' +left_kp = [1, 3, 5, 7, 9, 11, 13, 15] +right_kp = [2, 4, 6, 8, 10, 12, 14, 16] +skeletons = [[0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10], [5, 11], + [11, 13], [13, 15], [6, 
12], [12, 14], [14, 16], [0, 1], [0, 2], + [1, 3], [2, 4], [11, 12]] +train_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='RandomResizedCrop', area_range=(0.56, 1.0)), + dict(type='Resize', scale=(56, 56), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, + skeletons=skeletons), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, + skeletons=skeletons), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), + dict(type='PoseDecode'), + dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True), + dict(type='Resize', scale=(-1, 64)), + dict(type='CenterCrop', crop_size=64), + dict( + type='GeneratePoseTarget', + sigma=0.6, + use_score=True, + with_kp=False, + with_limb=True, + skeletons=skeletons, + double=True, + left_kp=left_kp, + right_kp=right_kp), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + 
train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.2, momentum=0.9, + weight_decay=0.0003) # this lr is used for 8 gpus +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0) +total_epochs = 240 +checkpoint_config = dict(interval=10) +workflow = [('train', 10)] +evaluation = dict( + interval=10, + metrics=['top_k_accuracy', 'mean_class_accuracy'], + topk=(1, 5)) +log_config = dict( + interval=20, hooks=[ + dict(type='TextLoggerHook'), + ]) +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb' +load_from = None +resume_from = None +find_unused_parameters = False diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/README.md b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1b8f435d57f0edcf3c61077393eb29ec5816d35a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/README.md @@ -0,0 +1,84 @@ +# STGCN + +[Spatial temporal graph convolutional networks for skeleton-based action recognition](https://ojs.aaai.org/index.php/AAAI/article/view/12328) + + + +## Abstract + + + +Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power and difficulties of generalization. 
In this work, we propose a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data. This formulation not only leads to greater expressive power but also stronger generalization capability. On two large datasets, Kinetics and NTU-RGBD, it achieves substantial improvements over mainstream methods. + + + +
+ +
+ +## Results and Models + +### NTU60_XSub + +| config | keypoint | gpus | backbone | Top-1 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------- | :------: | :--: | :------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------: | +| [stgcn_80e_ntu60_xsub_keypoint](/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py) | 2d | 2 | STGCN | 86.91 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.json) | +| [stgcn_80e_ntu60_xsub_keypoint_3d](/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py) | 3d | 1 | STGCN | 84.61 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d-13e7ccf0.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.log) | [json](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.json) | + +### BABEL + +| config | gpus | backbone | Top-1 | Mean Top-1 | Top-1 Official (AGCN) | Mean Top-1 Official (AGCN) | ckpt | log | +| --------------------------------------------------------------------------- | :--: | :------: | :-------: | :--------: | :-------------------: 
| :------------------------: | :-----------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------: | +| [stgcn_80e_babel60](/configs/skeleton/stgcn/stgcn_80e_babel60.py) | 8 | ST-GCN | **42.39** | **28.28** | 41.14 | 24.46 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60-3d206418.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60.log) | +| [stgcn_80e_babel60_wfl](/configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py) | 8 | ST-GCN | **40.31** | 29.79 | 33.41 | **30.42** | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60_wfl/stgcn_80e_babel60_wfl-1a9102d7.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60_wfl.log) | +| [stgcn_80e_babel120](/configs/skeleton/stgcn/stgcn_80e_babel120.py) | 8 | ST-GCN | **38.95** | **20.58** | 38.41 | 17.56 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120/stgcn_80e_babel120-e41eb6d7.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel120.log) | +| [stgcn_80e_babel120_wfl](/configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py) | 8 | ST-GCN | **33.00** | 24.33 | 27.91 | **26.17**\* | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120_wfl/stgcn_80e_babel120_wfl-3f2c100d.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel120_wfl.log) | + +\* This number is copied from the [paper](https://arxiv.org/pdf/2106.09696.pdf); the [released checkpoints](https://github.com/abhinanda-punnakkal/BABEL/tree/main/action_recognition) for BABEL-120 perform worse. + +## Train + +You can use the following command to train a model.
+ +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the STGCN model on the NTU60 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \ + --work-dir work_dirs/stgcn_80e_ntu60_xsub_keypoint \ + --validate --seed 0 --deterministic +``` + +For more details, refer to the **Training setting** section of [getting_started](/docs/getting_started.md#training-setting). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the STGCN model on the NTU60 dataset and dump the results to a pickle file. + +```shell +python tools/test.py configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.pkl +``` + +For more details, refer to the **Test a dataset** section of [getting_started](/docs/getting_started.md#test-a-dataset).
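The test example above passes `--eval top_k_accuracy mean_class_accuracy`. As a rough illustration of what these two metrics measure, here is a minimal plain-Python sketch under their usual definitions. It is not mmaction2's implementation: the helper functions and the toy `scores` and `labels` are hypothetical and exist only for this example.

```python
# Illustrative sketch only: these helpers are NOT part of mmaction2.
# `scores` holds one list of per-class scores per sample; `labels` holds
# the ground-truth class index of each sample.


def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def mean_class_accuracy(scores, labels):
    """Average of per-class top-1 accuracy over classes present in `labels`."""
    correct, total = {}, {}
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda i: row[i])
        total[label] = total.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (pred == label)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy data (made up): 4 samples, 3 classes.
scores = [
    [0.7, 0.2, 0.1],  # true class 0, predicted 0
    [0.1, 0.3, 0.6],  # true class 1, predicted 2 (class 1 is second best)
    [0.2, 0.1, 0.7],  # true class 2, predicted 2
    [0.5, 0.4, 0.1],  # true class 2, predicted 0 (class 2 ranks last)
]
labels = [0, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
mean_acc = mean_class_accuracy(scores, labels)
print(top1, top2, mean_acc)
```

Mean class accuracy weights every class equally, so it can differ noticeably from Top-1 on long-tailed data such as BABEL, which is why the BABEL table reports both.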
+ +## Citation + +```BibTeX +@inproceedings{yan2018spatial, + title={Spatial temporal graph convolutional networks for skeleton-based action recognition}, + author={Yan, Sijie and Xiong, Yuanjun and Lin, Dahua}, + booktitle={Thirty-second AAAI conference on artificial intelligence}, + year={2018} +} +``` diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..834d47bc80790ca5e1f7eee3dd8603c7019b4056 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/README_zh-CN.md @@ -0,0 +1,70 @@ +# STGCN + +## Introduction + + + +```BibTeX +@inproceedings{yan2018spatial, + title={Spatial temporal graph convolutional networks for skeleton-based action recognition}, + author={Yan, Sijie and Xiong, Yuanjun and Lin, Dahua}, + booktitle={Thirty-second AAAI conference on artificial intelligence}, + year={2018} +} +``` + +## Model Zoo + +### NTU60_XSub + +| config | keypoint | gpus | backbone | Top-1 | ckpt | log | json | +| :---------------------------------------------------------------------------------------------- | :----: | :------: | :------: | :----------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------: | +| [stgcn_80e_ntu60_xsub_keypoint](/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py) | 2d | 2 | STGCN | 86.91 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth) |
[log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.json) | +| [stgcn_80e_ntu60_xsub_keypoint_3d](/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py) | 3d | 1 | STGCN | 84.61 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d-13e7ccf0.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.log) | [json](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.json) | + +### BABEL + +| config | gpus | backbone | Top-1 | Mean Top-1 | Top-1 Official (AGCN) | Mean Top-1 Official (AGCN) | ckpt | log |
+| --------------------------------------------------------------------------- | :------: | :------: | :----------: | :-----------------: | :----------------------------------: | :----------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------: | +| [stgcn_80e_babel60](/configs/skeleton/stgcn/stgcn_80e_babel60.py) | 8 | ST-GCN | **42.39** | **28.28** | 41.14 | 24.46 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60-3d206418.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60.log) | +| [stgcn_80e_babel60_wfl](/configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py) | 8 | ST-GCN | **40.31** | 29.79 | 33.41 | **30.42** | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60_wfl/stgcn_80e_babel60_wfl-1a9102d7.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60_wfl.log) | +| [stgcn_80e_babel120](/configs/skeleton/stgcn/stgcn_80e_babel120.py) | 8 | ST-GCN | **38.95** | **20.58** | 38.41 | 17.56 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120/stgcn_80e_babel120-e41eb6d7.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel120.log) | +| [stgcn_80e_babel120_wfl](/configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py) | 8 | ST-GCN | **33.00** | 24.33 | 27.91 | **26.17**\* | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120_wfl/stgcn_80e_babel120_wfl-3f2c100d.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel120_wfl.log) | + +\* Note: this number is quoted from the original [paper](https://arxiv.org/pdf/2106.09696.pdf); the publicly released [checkpoints](https://github.com/abhinanda-punnakkal/BABEL/tree/main/action_recognition) are slightly less accurate. + +## Train + +You can use the following command to train a model. + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +Example: train the STGCN model on the NTU60 dataset deterministically, with periodic validation. + +```shell +python tools/train.py configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \ + --work-dir work_dirs/stgcn_80e_ntu60_xsub_keypoint \ + --validate --seed 0 --deterministic +``` + +For more training details, refer to the **Training setting** section of [getting_started](/docs_zh_CN/getting_started.md#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE). + +## Test + +You can use the following command to test a model. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] +``` + +Example: test the STGCN model on the NTU60 dataset and dump the results to a pickle file. + +```shell +python tools/test.py configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \ + checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \ + --out result.pkl +``` + +For more testing details, refer to the **Test a dataset** section of [getting_started](/docs_zh_CN/getting_started.md#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86). diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/metafile.yml b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..f4e2b7fc06066743afc33e4189cbda636dc7c54f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/metafile.yml @@ -0,0 +1,112 @@ +Collections: +- Name: STGCN + README: configs/skeleton/stgcn/README.md +Models: +- Config: configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py + In Collection: STGCN + Metadata: + Architecture: STGCN + Batch Size: 16 + Epochs: 80 + Parameters: 3088704 + Training Data: NTU60-XSub + Training Resources: 2 GPUs + Name: stgcn_80e_ntu60_xsub_keypoint + Results: + Dataset: NTU60-XSub + Metrics: + Top 1 Accuracy: 86.91 + Task: Skeleton-based Action Recognition + Training Json Log:
https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.log + Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth +- Config: configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py + In Collection: STGCN + Metadata: + Architecture: STGCN + Batch Size: 32 + Epochs: 80 + Parameters: 3088704 + Training Data: NTU60-XSub + Training Resources: 1 GPU + Name: stgcn_80e_ntu60_xsub_keypoint_3d + Results: + Dataset: NTU60-XSub + Metrics: + Top 1 Accuracy: 84.61 + Task: Skeleton-based Action Recognition + Training Json Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.json + Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.log + Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d-13e7ccf0.pth +- Config: configs/skeleton/stgcn/stgcn_80e_babel60.py + In Collection: STGCN + Metadata: + Architecture: STGCN + Batch Size: 128 + Epochs: 80 + Parameters: 3088704 + Training Data: BABEL60 + Training Resources: 8 GPU + Name: stgcn_80e_babel60 + Results: + Dataset: BABEL60 + Metrics: + Top 1 Accuracy: 42.39 + Mean Top 1 Accuracy: 28.28 + Task: Skeleton-based Action Recognition + Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60.log + Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60-3d206418.pth +- Config: configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py + In Collection: STGCN + Metadata: + Architecture: STGCN + Batch Size: 128 + Epochs: 80 + Parameters: 3088704 + Training 
Data: BABEL60 + Training Resources: 8 GPU + Name: stgcn_80e_babel60_wfl + Results: + Dataset: BABEL60 + Metrics: + Top 1 Accuracy: 40.31 + Mean Top 1 Accuracy: 29.79 + Task: Skeleton-based Action Recognition + Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60_wfl/stgcn_80e_babel60_wfl.log + Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60_wfl/stgcn_80e_babel60_wfl-1a9102d7.pth +- Config: configs/skeleton/stgcn/stgcn_80e_babel120.py + In Collection: STGCN + Metadata: + Architecture: STGCN + Batch Size: 128 + Epochs: 80 + Parameters: 3104320 + Training Data: BABEL120 + Training Resources: 8 GPU + Name: stgcn_80e_babel120 + Results: + Dataset: BABEL120 + Metrics: + Top 1 Accuracy: 38.95 + Mean Top 1 Accuracy: 20.58 + Task: Skeleton-based Action Recognition + Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120/stgcn_80e_babel120.log + Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120/stgcn_80e_babel120-e41eb6d7.pth +- Config: configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py + In Collection: STGCN + Metadata: + Architecture: STGCN + Batch Size: 128 + Epochs: 80 + Parameters: 3104320 + Training Data: BABEL120 + Training Resources: 8 GPU + Name: stgcn_80e_babel120_wfl + Results: + Dataset: BABEL120 + Metrics: + Top 1 Accuracy: 33.00 + Mean Top 1 Accuracy: 24.33 + Task: Skeleton-based Action Recognition + Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120_wfl/stgcn_80e_babel120_wfl.log + Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120_wfl/stgcn_80e_babel120_wfl-3f2c100d.pth diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel120.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel120.py new file mode 100644 index 0000000000000000000000000000000000000000..bf6bac29f03fbdf49b5c8a855c4e10e2b2b357e9 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel120.py @@ -0,0 +1,78 @@ +model = dict( + type='SkeletonGCN', + backbone=dict( + type='STGCN', + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')), + cls_head=dict( + type='STGCNHead', + num_classes=120, + in_channels=256, + num_person=1, + loss_cls=dict(type='CrossEntropyLoss')), + train_cfg=None, + test_cfg=None) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/babel/babel120_train.pkl' +ann_file_val = 'data/babel/babel120_val.pkl' +train_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +val_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +test_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type='RepeatDataset', + times=5, + dataset=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=[10, 14]) +total_epochs = 16 +checkpoint_config = dict(interval=1) +evaluation = 
dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')]) + +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/stgcn_80e_babel120' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py new file mode 100644 index 0000000000000000000000000000000000000000..63516b2e1f73f680e01d1cd0fe0fbe27a86371b8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py @@ -0,0 +1,89 @@ +samples_per_cls = [ + 518, 1993, 6260, 508, 208, 3006, 431, 724, 4527, 2131, 199, 1255, 487, 302, + 136, 571, 267, 646, 1180, 405, 72, 731, 842, 1619, 271, 27, 1198, 1012, + 110, 865, 462, 526, 405, 487, 101, 24, 84, 64, 168, 271, 609, 503, 76, 167, + 415, 137, 421, 283, 2069, 715, 196, 66, 44, 989, 122, 43, 599, 396, 245, + 380, 34, 236, 260, 325, 127, 133, 119, 66, 125, 50, 206, 191, 394, 69, 98, + 145, 38, 21, 29, 64, 277, 65, 39, 31, 35, 85, 54, 80, 133, 66, 39, 64, 268, + 34, 172, 54, 33, 21, 110, 19, 40, 55, 146, 39, 37, 75, 101, 20, 46, 55, 43, + 21, 43, 87, 29, 36, 24, 37, 28, 39 +] + +model = dict( + type='SkeletonGCN', + backbone=dict( + type='STGCN', + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')), + cls_head=dict( + type='STGCNHead', + num_classes=120, + in_channels=256, + num_person=1, + loss_cls=dict(type='CBFocalLoss', samples_per_cls=samples_per_cls)), + train_cfg=None, + test_cfg=None) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/babel/babel120_train.pkl' +ann_file_val = 'data/babel/babel120_val.pkl' +train_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 
'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +val_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +test_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type='RepeatDataset', + times=5, + dataset=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=[10, 14]) +total_epochs = 16 +checkpoint_config = dict(interval=1) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')]) + +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/stgcn_80e_babel120_wfl/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel60.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel60.py new file mode 100644 index 0000000000000000000000000000000000000000..dd338b9d17f87d6e1a777d569951fe32e460e856 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel60.py @@ -0,0 +1,78 @@ +model = 
dict( + type='SkeletonGCN', + backbone=dict( + type='STGCN', + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')), + cls_head=dict( + type='STGCNHead', + num_classes=60, + in_channels=256, + num_person=1, + loss_cls=dict(type='CrossEntropyLoss')), + train_cfg=None, + test_cfg=None) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/babel/babel60_train.pkl' +ann_file_val = 'data/babel/babel60_val.pkl' +train_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +val_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +test_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type='RepeatDataset', + times=5, + dataset=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=[10, 14]) +total_epochs = 16 +checkpoint_config = dict(interval=1) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict(interval=100, 
hooks=[dict(type='TextLoggerHook')]) + +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/stgcn_80e_babel60' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py new file mode 100644 index 0000000000000000000000000000000000000000..b19714d673cdb72e03cd7740a416f1f70d5edced --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py @@ -0,0 +1,86 @@ +samples_per_cls = [ + 518, 1993, 6260, 508, 208, 3006, 431, 724, 4527, 2131, 199, 1255, 487, 302, + 136, 571, 267, 646, 1180, 405, 731, 842, 1619, 271, 1198, 1012, 865, 462, + 526, 405, 487, 168, 271, 609, 503, 167, 415, 421, 283, 2069, 715, 196, 989, + 122, 599, 396, 245, 380, 236, 260, 325, 133, 206, 191, 394, 145, 277, 268, + 172, 146 +] + +model = dict( + type='SkeletonGCN', + backbone=dict( + type='STGCN', + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')), + cls_head=dict( + type='STGCNHead', + num_classes=60, + in_channels=256, + num_person=1, + loss_cls=dict(type='CBFocalLoss', samples_per_cls=samples_per_cls)), + train_cfg=None, + test_cfg=None) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/babel/babel60_train.pkl' +ann_file_val = 'data/babel/babel60_val.pkl' +train_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +val_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +test_pipeline = [ + dict(type='PoseDecode'), + dict(type='FormatGCNInput', 
input_format='NCTVM', num_person=1), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type='RepeatDataset', + times=5, + dataset=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=[10, 14]) +total_epochs = 16 +checkpoint_config = dict(interval=1) +evaluation = dict( + interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) +log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')]) + +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/stgcn_80e_babel60_wfl/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py new file mode 100644 index 0000000000000000000000000000000000000000..e23f501fe5ddbd9b91f29b63e07e0e341dfeb8bb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py @@ -0,0 +1,80 @@ +model = dict( + type='SkeletonGCN', + backbone=dict( + type='STGCN', + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='coco', strategy='spatial')), + cls_head=dict( + type='STGCNHead', + num_classes=60, + in_channels=256, + loss_cls=dict(type='CrossEntropyLoss')), + train_cfg=None, + test_cfg=None) + +dataset_type = 
'PoseDataset' +ann_file_train = 'data/posec3d/ntu60_xsub_train.pkl' +ann_file_val = 'data/posec3d/ntu60_xsub_val.pkl' +train_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='PoseNormalize'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +val_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='PoseNormalize'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +test_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='PoseNormalize'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=[10, 50]) +total_epochs = 80 +checkpoint_config = dict(interval=5) +evaluation = dict(interval=5, metrics=['top_k_accuracy']) +log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')]) + +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/stgcn_80e_ntu60_xsub_keypoint/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff 
--git a/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py new file mode 100644 index 0000000000000000000000000000000000000000..4422dd759c2c9dc22f06a3a563a26a8f07bf61c8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py @@ -0,0 +1,77 @@ +model = dict( + type='SkeletonGCN', + backbone=dict( + type='STGCN', + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')), + cls_head=dict( + type='STGCNHead', + num_classes=60, + in_channels=256, + loss_cls=dict(type='CrossEntropyLoss')), + train_cfg=None, + test_cfg=None) + +dataset_type = 'PoseDataset' +ann_file_train = 'data/ntu/nturgb+d_skeletons_60_3d_nmtvc/xsub/train.pkl' +ann_file_val = 'data/ntu/nturgb+d_skeletons_60_3d_nmtvc/xsub/val.pkl' +train_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +val_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +test_pipeline = [ + dict(type='PaddingWithLoop', clip_len=300), + dict(type='PoseDecode'), + dict(type='FormatGCNInput', input_format='NCTVM'), + dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['keypoint']) +] +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + test_dataloader=dict(videos_per_gpu=1), + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix='', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + 
pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix='', + pipeline=test_pipeline)) + +# optimizer +optimizer = dict( + type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict(policy='step', step=[10, 50]) +total_epochs = 80 +checkpoint_config = dict(interval=3) +evaluation = dict(interval=3, metrics=['top_k_accuracy']) +log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')]) + +# runtime settings +dist_params = dict(backend='nccl') +log_level = 'INFO' +work_dir = './work_dirs/stgcn_80e_ntu60_xsub_keypoint_3d/' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/demo/README.md b/openmmlab_test/mmaction2-0.24.1/demo/README.md new file mode 100644 index 0000000000000000000000000000000000000000..93f85fad85f4f11a302c09b27323f75a39f239df --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/README.md @@ -0,0 +1,674 @@ +# Demo + +## Outline + +- [Modify configs through script arguments](#modify-configs-through-script-arguments): Tricks to directly modify configs through script arguments. +- [Video demo](#video-demo): A demo script to predict the recognition result using a single video. +- [SpatioTemporal Action Detection Video Demo](#spatiotemporal-action-detection-video-demo): A demo script to predict the SpatioTemporal Action Detection result using a single video. +- [Video GradCAM Demo](#video-gradcam-demo): A demo script to visualize GradCAM results using a single video. +- [Webcam demo](#webcam-demo): A demo script to implement real-time action recognition from a web camera. +- [Long Video demo](#long-video-demo): A demo script to predict different labels using a single long video. 
+- [SpatioTemporal Action Detection Webcam Demo](#spatiotemporal-action-detection-webcam-demo): A demo script to implement real-time spatio-temporal action detection from a web camera. +- [Skeleton-based Action Recognition Demo](#skeleton-based-action-recognition-demo): A demo script to predict the skeleton-based action recognition result using a single video. +- [Video Structuralize Demo](#video-structuralize-demo): A demo script to predict the skeleton-based and RGB-based action recognition and spatio-temporal action detection result using a single video. +- [Audio Demo](#audio-demo): A demo script to predict the recognition result using a single audio file. + +## Modify configs through script arguments + +When running demos using our provided scripts, you may specify `--cfg-options` to modify the config in place. + +- Update config keys of dict. + + The config options can be specified following the order of the dict keys in the original config. + For example, `--cfg-options model.backbone.norm_eval=False` changes all the BN modules in model backbones to `train` mode. + +- Update keys inside a list of configs. + + Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list, + e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline, + you may specify `--cfg-options data.train.pipeline.0.type=DenseSampleFrames`. + +- Update values of lists/tuples. + + The value to be updated may be a list or a tuple. For example, the config file normally sets `workflow=[('train', 1)]`. If you want to + change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark " is necessary to + support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value. + +## Video demo + +We provide a demo script to predict the recognition result using a single video. 
To get prediction results in the range `[0, 1]`, make sure to set `model['test_cfg'] = dict(average_clips='prob')` in the config file. + +```shell +python demo/demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} [--use-frames] \ + [--device ${DEVICE_TYPE}] [--fps ${FPS}] [--font-scale ${FONT_SCALE}] [--font-color ${FONT_COLOR}] \ + [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}] +``` + +Optional arguments: + +- `--use-frames`: If specified, the demo will take rawframes as input. Otherwise, it will take a video as input. +- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are a CUDA device such as `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`. +- `FPS`: FPS value of the output video when using rawframes as input. If not specified, it will be set to 30. +- `FONT_SCALE`: Font scale of the label added in the video. If not specified, it will be 0.5. +- `FONT_COLOR`: Font color of the label added in the video. If not specified, it will be `white`. +- `TARGET_RESOLUTION`: Resolution (desired_width, desired_height) for resizing the frames before output when using a video as input. If not specified, it will be None and the frames are resized by keeping the existing aspect ratio. +- `RESIZE_ALGORITHM`: Resize algorithm used for resizing. If not specified, it will be set to `bicubic`. +- `OUT_FILE`: Path to the output file, which can be in a video format or gif format. If not specified, it will be set to `None` and no output file is generated. + +Examples: + +Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, +or use a checkpoint URL from `configs/` to load the corresponding checkpoint directly, which will be automatically saved in `$HOME/.cache/torch/checkpoints`. + +1. Recognize a video file as input by using a TSN model on cuda by default. 
+ + ```shell + # The demo.mp4 and label_map_k400.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt + ``` + +2. Recognize a video file as input by using a TSN model on cuda by default, loading checkpoint from url. + + ```shell + # The demo.mp4 and label_map_k400.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt + ``` + +3. Recognize a list of rawframes as input by using a TSN model on cpu. + + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --device cpu + ``` + +4. Recognize a video file as input by using a TSN model and then generate an mp4 file. + + ```shell + # The demo.mp4 and label_map_k400.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --out-filename demo/demo_out.mp4 + ``` + +5. Recognize a list of rawframes as input by using a TSN model and then generate a gif file. + + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --out-filename demo/demo_out.gif + ``` + +6. 
Recognize a video file as input by using a TSN model, then generate an mp4 file with a given resolution and resize algorithm. + + ```shell + # The demo.mp4 and label_map_k400.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --target-resolution 340 256 --resize-algorithm bilinear \ + --out-filename demo/demo_out.mp4 + ``` + + ```shell + # The demo.mp4 and label_map_k400.txt are both from Kinetics-400 + # If either dimension is set to -1, the frames are resized by keeping the existing aspect ratio + # For --target-resolution 170 -1, original resolution (340, 256) -> target resolution (170, 128) + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --target-resolution 170 -1 --resize-algorithm bilinear \ + --out-filename demo/demo_out.mp4 + ``` + +7. Recognize a video file as input by using a TSN model, then generate an mp4 file with a label in red and font scale 1. + + ```shell + # The demo.mp4 and label_map_k400.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --font-scale 1 --font-color red \ + --out-filename demo/demo_out.mp4 + ``` + +8. Recognize a list of rawframes as input by using a TSN model and then generate a gif file with 24 fps. 
+ + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --fps 24 --out-filename demo/demo_out.gif + ``` + +## SpatioTemporal Action Detection Video Demo + +We provide a demo script to predict the SpatioTemporal Action Detection result using a single video. + +```shell +python demo/demo_spatiotemporal_det.py --video ${VIDEO_FILE} \ + [--config ${SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \ + [--checkpoint ${SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT}] \ + [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \ + [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \ + [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \ + [--action-score-thr ${ACTION_DETECTION_SCORE_THRESHOLD}] \ + [--label-map ${LABEL_MAP}] \ + [--device ${DEVICE}] \ + [--out-filename ${OUTPUT_FILENAME}] \ + [--predict-stepsize ${PREDICT_STEPSIZE}] \ + [--output-stepsize ${OUTPUT_STEPSIZE}] \ + [--output-fps ${OUTPUT_FPS}] +``` + +Optional arguments: + +- `SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE`: The spatiotemporal action detection config file path. +- `SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT`: The spatiotemporal action detection checkpoint URL. +- `HUMAN_DETECTION_CONFIG_FILE`: The human detection config file path. +- `HUMAN_DETECTION_CHECKPOINT`: The human detection checkpoint URL. +- `HUMAN_DETECTION_SCORE_THRESHOLD`: The score threshold for human detection. Default: 0.9. +- `ACTION_DETECTION_SCORE_THRESHOLD`: The score threshold for action detection. Default: 0.5. +- `LABEL_MAP`: The label map used. Default: `tools/data/ava/label_map.txt`. +- `DEVICE`: Type of device to run the demo. Allowed values are a CUDA device such as `cuda:0`, or `cpu`. Default: `cuda:0`. +- `OUTPUT_FILENAME`: Path to the output video file. Default: `demo/stdet_demo.mp4`. +- `PREDICT_STEPSIZE`: Make a prediction per N frames. Default: 8. 
+- `OUTPUT_STEPSIZE`: Output 1 frame per N frames in the input video. Note that `PREDICT_STEPSIZE % OUTPUT_STEPSIZE == 0` must hold. Default: 4. +- `OUTPUT_FPS`: The FPS of demo video output. Default: 6. + +Examples: + +Assume that you are located at `$MMACTION2`. + +1. Use Faster RCNN as the human detector and SlowOnly-8x8-R101 as the action detector. Make a prediction every 8 frames and output 1 frame per 4 frames to the output video. The FPS of the output video is 6. + +```shell +python demo/demo_spatiotemporal_det.py --video demo/demo.mp4 \ + --config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \ + --checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \ + --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \ + --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \ + --det-score-thr 0.9 \ + --action-score-thr 0.5 \ + --label-map tools/data/ava/label_map.txt \ + --predict-stepsize 8 \ + --output-stepsize 4 \ + --output-fps 6 +``` + +## Video GradCAM Demo + +We provide a demo script to visualize GradCAM results using a single video. + +```shell +python demo/demo_gradcam.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} [--use-frames] \ + [--device ${DEVICE_TYPE}] [--target-layer-name ${TARGET_LAYER_NAME}] [--fps ${FPS}] \ + [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}] +``` + +- `--use-frames`: If specified, the demo will take rawframes as input. Otherwise, it will take a video as input. +- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are a CUDA device such as `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`. +- `FPS`: FPS value of the output video when using rawframes as input. 
If not specified, it will be set to 30. +- `OUT_FILE`: Path to the output file, which can be in a video format or gif format. If not specified, it will be set to `None` and no output file is generated. +- `TARGET_LAYER_NAME`: Layer name to generate GradCAM localization map. +- `TARGET_RESOLUTION`: Resolution (desired_width, desired_height) for resizing the frames before output when using a video as input. If not specified, it will be None and the frames are resized by keeping the existing aspect ratio. +- `RESIZE_ALGORITHM`: Resize algorithm used for resizing. If not specified, it will be set to `bilinear`. + +Examples: + +Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, +or use a checkpoint URL from `configs/` to load the corresponding checkpoint directly, which will be automatically saved in `$HOME/.cache/torch/checkpoints`. + +1. Get GradCAM results of an I3D model, using a video file as input and then generate a gif file with 10 fps. + + ```shell + python demo/demo_gradcam.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \ + checkpoints/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth demo/demo.mp4 \ + --target-layer-name backbone/layer4/1/relu --fps 10 \ + --out-filename demo/demo_gradcam.gif + ``` + +2. Get GradCAM results of a TSM model, using a video file as input and then generate a gif file, loading the checkpoint from a URL. + + ```shell + python demo/demo_gradcam.py configs/recognition/tsm/tsm_r50_video_inference_1x1x8_100e_kinetics400_rgb.py \ + https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_1x1x8_100e_kinetics400_rgb_20200702-a77f4328.pth \ + demo/demo.mp4 --target-layer-name backbone/layer4/1/relu --out-filename demo/demo_gradcam_tsm.gif + ``` + +## Webcam demo + +We provide a demo script to implement real-time action recognition from a web camera. 
To get prediction results in the range `[0, 1]`, make sure to set `model['test_cfg'] = dict(average_clips='prob')` in the config file. + +```shell +python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${LABEL_FILE} \ + [--device ${DEVICE_TYPE}] [--camera-id ${CAMERA_ID}] [--threshold ${THRESHOLD}] \ + [--average-size ${AVERAGE_SIZE}] [--drawing-fps ${DRAWING_FPS}] [--inference-fps ${INFERENCE_FPS}] +``` + +Optional arguments: + +- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are a CUDA device such as `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`. +- `CAMERA_ID`: ID of the camera device. If not specified, it will be set to 0. +- `THRESHOLD`: Threshold of prediction score for action recognition. Only labels with a score higher than the threshold will be shown. If not specified, it will be set to 0. +- `AVERAGE_SIZE`: Number of latest clips to be averaged for prediction. If not specified, it will be set to 1. +- `DRAWING_FPS`: Upper bound FPS value of the output drawing. If not specified, it will be set to 20. +- `INFERENCE_FPS`: Upper bound FPS value of model inference. If not specified, it will be set to 4. + +:::{note} +If your hardware is good enough, increasing the value of `DRAWING_FPS` and `INFERENCE_FPS` will get a better experience. +::: + +Examples: + +Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, +or use a checkpoint URL from `configs/` to load the corresponding checkpoint directly, which will be automatically saved in `$HOME/.cache/torch/checkpoints`. + +1. Recognize the action from a web camera as input by using a TSN model on cpu, averaging the scores over the latest 5 clips + and outputting result labels with score higher than 0.2. 
+ + ```shell + python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth tools/data/kinetics/label_map_k400.txt --average-size 5 \ + --threshold 0.2 --device cpu + ``` + +2. Recognize the action from a web camera as input by using a TSN model on cpu, averaging the scores over the latest 5 clips + and outputting result labels with score higher than 0.2, loading the checkpoint from a URL. + + ```shell + python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + tools/data/kinetics/label_map_k400.txt --average-size 5 --threshold 0.2 --device cpu + ``` + +3. Recognize the action from a web camera as input by using an I3D model on gpu by default, averaging the scores over the latest 5 clips + and outputting result labels with score higher than 0.2. + + ```shell + python demo/webcam_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \ + checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth tools/data/kinetics/label_map_k400.txt \ + --average-size 5 --threshold 0.2 + ``` + +:::{note} +Because hardware performance differs between users, some modifications may be needed to suit your case. +Users can change: + +1). `SampleFrames` step (especially the number of `clip_len` and `num_clips`) of `test_pipeline` in the config file, like `--cfg-options data.test.pipeline.0.num_clips=3`. +2). Change to the suitable Crop methods like `TenCrop`, `ThreeCrop`, `CenterCrop`, etc. in `test_pipeline` of the config file, like `--cfg-options data.test.pipeline.4.type=CenterCrop`. +3). Change the number of `--average-size`. The smaller, the faster. +::: + +## Long video demo + +We provide a demo script to predict different labels using a single long video. 
To get prediction results in the range `[0, 1]`, make sure to set `test_cfg = dict(average_clips='prob')` in the config file. + +```shell +python demo/long_video_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} \ + ${OUT_FILE} [--input-step ${INPUT_STEP}] [--device ${DEVICE_TYPE}] [--threshold ${THRESHOLD}] +``` + +Optional arguments: + +- `OUT_FILE`: Path to the output, either a video or a json file. +- `INPUT_STEP`: Input step for sampling frames, which can help to get sparser input. If not specified, it will be set to 1. +- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are a CUDA device such as `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`. +- `THRESHOLD`: Threshold of prediction score for action recognition. Only labels with a score higher than the threshold will be shown. If not specified, it will be set to 0.01. +- `STRIDE`: By default, the demo generates a prediction for each single frame, which might cost lots of time. To speed up, you can set the argument `STRIDE` and then the demo will generate a prediction every `STRIDE x sample_length` frames (`sample_length` indicates the size of the temporal window from which you sample frames, which equals `clip_len x frame_interval`). For example, if the sample_length is 64 frames and you set `STRIDE` to 0.5, predictions will be generated every 32 frames. If set as 0, predictions will be generated for each frame. The desired value of `STRIDE` is (0, 1\], while it also works for `STRIDE > 1` (the generated predictions will be too sparse). Default: 0. +- `LABEL_COLOR`: Font color of the labels in (B, G, R). Default is white, that is (255, 255, 255). +- `MSG_COLOR`: Font color of the messages in (B, G, R). Default is gray, that is (128, 128, 128). 
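The `STRIDE` arithmetic described in the list above can be sketched in a few lines. This is only an illustration of the documented formula, not code taken from `demo/long_video_demo.py`: the helper name `prediction_interval` and its parameters are hypothetical, and clamping the result to at least one frame is an assumption.

```python
def prediction_interval(clip_len: int, frame_interval: int, stride: float) -> int:
    """Return the number of frames between two consecutive predictions.

    Hypothetical helper that mirrors the README text; it is not part of
    the demo script itself.
    """
    # sample_length is the temporal window from which frames are sampled.
    sample_length = clip_len * frame_interval
    # STRIDE == 0 (the documented default) means one prediction per frame.
    if stride <= 0:
        return 1
    # Otherwise predict every STRIDE x sample_length frames.
    # Clamping to a minimum of 1 frame is an assumption of this sketch.
    return max(1, int(stride * sample_length))


if __name__ == "__main__":
    # clip_len=32 and frame_interval=2 give sample_length=64, as in the README example.
    print(prediction_interval(32, 2, 0.5))
```

With a `sample_length` of 64 and `STRIDE` set to 0.5, the helper returns 32, matching the example in the `STRIDE` description.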
+ +Examples: + +Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, +or use a checkpoint URL from `configs/` to load the corresponding checkpoint directly, which will be automatically saved in `$HOME/.cache/torch/checkpoints`. + +1. Predict different labels in a long video by using a TSN model on cpu, with 3 frames for input steps (that is, randomly sample one frame from every 3 frames) + and outputting result labels with score higher than 0.2. + + ```shell + python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO \ + --input-step 3 --device cpu --threshold 0.2 + ``` + +2. Predict different labels in a long video by using a TSN model on cpu, with 3 frames for input steps (that is, randomly sample one frame from every 3 frames) + and outputting result labels with score higher than 0.2, loading the checkpoint from a URL. + + ```shell + python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2 + ``` + +3. Predict different labels in a long video from the web by using a TSN model on cpu, with 3 frames for input steps (that is, randomly sample one frame from every 3 frames) + and outputting result labels with score higher than 0.2, loading the checkpoint from a URL. 
+ + ```shell + python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4 \ + tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2 + ``` + +4. Predict different labels in a long video by using an I3D model on gpu, with input_step=1, threshold=0.01 as default and print the labels in cyan. + + ```shell + python demo/long_video_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \ + checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO \ + --label-color 255 255 0 + ``` + +5. Predict different labels in a long video by using an I3D model on gpu and save the results as a `json` file. + + ```shell + python demo/long_video_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \ + checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt ./results.json + ``` + +## SpatioTemporal Action Detection Webcam Demo + +We provide a demo script to implement real-time spatio-temporal action detection from a web camera. 
+ +```shell +python demo/webcam_demo_spatiotemporal_det.py \ + [--config ${SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \ + [--checkpoint ${SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT}] \ + [--action-score-thr ${ACTION_DETECTION_SCORE_THRESHOLD}] \ + [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \ + [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \ + [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \ + [--input-video ${INPUT_VIDEO}] \ + [--label-map ${LABEL_MAP}] \ + [--device ${DEVICE}] \ + [--output-fps ${OUTPUT_FPS}] \ + [--out-filename ${OUTPUT_FILENAME}] \ + [--show] \ + [--display-height ${DISPLAY_HEIGHT}] \ + [--display-width ${DISPLAY_WIDTH}] \ + [--predict-stepsize ${PREDICT_STEPSIZE}] \ + [--clip-vis-length ${CLIP_VIS_LENGTH}] +``` + +Optional arguments: + +- `SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE`: The spatiotemporal action detection config file path. +- `SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT`: The spatiotemporal action detection checkpoint path or URL. +- `ACTION_DETECTION_SCORE_THRESHOLD`: The score threshold for action detection. Default: 0.4. +- `HUMAN_DETECTION_CONFIG_FILE`: The human detection config file path. +- `HUMAN_DETECTION_CHECKPOINT`: The human detection checkpoint URL. +- `HUMAN_DETECTION_SCORE_THRESHOLD`: The score threshold for human detection. Default: 0.9. +- `INPUT_VIDEO`: The webcam id or video path of the source. Default: `0`. +- `LABEL_MAP`: The label map used. Default: `tools/data/ava/label_map.txt`. +- `DEVICE`: Type of device to run the demo. Allowed values are a CUDA device such as `cuda:0`, or `cpu`. Default: `cuda:0`. +- `OUTPUT_FPS`: The FPS of demo video output. Default: 15. +- `OUTPUT_FILENAME`: Path to the output video file. Default: None. +- `--show`: Whether to show predictions with `cv2.imshow`. +- `DISPLAY_HEIGHT`: The height of the display frame. Default: 0. +- `DISPLAY_WIDTH`: The width of the display frame. Default: 0. 
If `DISPLAY_HEIGHT <= 0 and DISPLAY_WIDTH <= 0`, the display frame and input video share the same shape. +- `PREDICT_STEPSIZE`: Make a prediction per N frames. Default: 8. +- `CLIP_VIS_LENGTH`: The number of frames drawn for each clip. In other words, for each clip, at most `CLIP_VIS_LENGTH` frames are drawn around the keyframe. Default: 8. + +Tips to get a better experience with the webcam demo: + +- How to choose `--output-fps`? + + - `--output-fps` should be almost equal to the read thread FPS. + - The read thread FPS is printed by the logger in the format `DEBUG:__main__:Read Thread: {duration} ms, {fps} fps`. + +- How to choose `--predict-stepsize`? + + - It depends on the choice of human detector and spatio-temporal model. + - Overall, the duration of the read thread for each task should be greater than or equal to that of model inference. + - The durations for read and inference are both printed by the logger. + - A larger `--predict-stepsize` leads to a longer duration for the read thread. + - To take full advantage of the computation resources, decrease the value of `--predict-stepsize`. + +Examples: + +Assume that you are located at `$MMACTION2`. + +1. Use Faster RCNN as the human detector and SlowOnly-8x8-R101 as the action detector. Make a prediction every 40 frames, with an output FPS of 20. Show predictions with `cv2.imshow`.
+ +```shell +python demo/webcam_demo_spatiotemporal_det.py \ + --input-video 0 \ + --config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \ + --checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \ + --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \ + --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \ + --det-score-thr 0.9 \ + --action-score-thr 0.5 \ + --label-map tools/data/ava/label_map.txt \ + --predict-stepsize 40 \ + --output-fps 20 \ + --show +``` + +## Skeleton-based Action Recognition Demo + +We provide a demo script to predict the skeleton-based action recognition result using a single video. + +```shell +python demo/demo_skeleton.py ${VIDEO_FILE} ${OUT_FILENAME} \ + [--config ${SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \ + [--checkpoint ${SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT}] \ + [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \ + [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \ + [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \ + [--pose-config ${HUMAN_POSE_ESTIMATION_CONFIG_FILE}] \ + [--pose-checkpoint ${HUMAN_POSE_ESTIMATION_CHECKPOINT}] \ + [--label-map ${LABEL_MAP}] \ + [--device ${DEVICE}] \ + [--short-side ${SHORT_SIDE}] +``` + +Optional arguments: + +- `SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE`: The skeleton-based action recognition config file path. +- `SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT`: The skeleton-based action recognition checkpoint path or URL. +- `HUMAN_DETECTION_CONFIG_FILE`: The human detection config file path. +- `HUMAN_DETECTION_CHECKPOINT`: The human detection checkpoint URL. +- `HUMAN_DETECTION_SCORE_THRESHOLD`: The score threshold for human detection. Default: 0.9.
+- `HUMAN_POSE_ESTIMATION_CONFIG_FILE`: The human pose estimation config file path (trained on COCO-Keypoint). +- `HUMAN_POSE_ESTIMATION_CHECKPOINT`: The human pose estimation checkpoint URL (trained on COCO-Keypoint). +- `LABEL_MAP`: The label map used. Default: `tools/data/ava/label_map.txt`. +- `DEVICE`: Type of device to run the demo. Allowed values are cuda device like `cuda:0` or `cpu`. Default: `cuda:0`. +- `SHORT_SIDE`: The short side used for frame extraction. Default: 480. + +Examples: + +Assume that you are located at `$MMACTION2` . + +1. Use the Faster RCNN as the human detector, HRNetw32 as the pose estimator, PoseC3D-NTURGB+D-120-Xsub-keypoint as the skeleton-based action recognizer. + +```shell +python demo/demo_skeleton.py demo/ntu_sample.avi demo/skeleton_demo.mp4 \ + --config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \ + --checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth \ + --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \ + --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \ + --det-score-thr 0.9 \ + --pose-config demo/hrnet_w32_coco_256x192.py \ + --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \ + --label-map tools/data/skeleton/label_map_ntu120.txt +``` + +2. Use the Faster RCNN as the human detector, HRNetw32 as the pose estimator, STGCN-NTURGB+D-60-Xsub-keypoint as the skeleton-based action recognizer. 
+ +```shell +python demo/demo_skeleton.py demo/ntu_sample.avi demo/skeleton_demo.mp4 \ + --config configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \ + --checkpoint https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth \ + --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \ + --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \ + --det-score-thr 0.9 \ + --pose-config demo/hrnet_w32_coco_256x192.py \ + --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \ + --label-map tools/data/skeleton/label_map_ntu120.txt +``` + +## Video Structuralize Demo + +We provide a demo script to predict the skeleton-based and rgb-based action recognition and spatio-temporal action detection results using a single video. + +```shell +python demo/demo_video_structuralize.py \ + [--rgb-stdet-config ${RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \ + [--rgb-stdet-checkpoint ${RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT}] \ + [--skeleton-stdet-checkpoint ${SKELETON_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT}] \ + [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \ + [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \ + [--pose-config ${HUMAN_POSE_ESTIMATION_CONFIG_FILE}] \ + [--pose-checkpoint ${HUMAN_POSE_ESTIMATION_CHECKPOINT}] \ + [--skeleton-config ${SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \ + [--skeleton-checkpoint ${SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT}] \ + [--rgb-config ${RGB_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \ + [--rgb-checkpoint ${RGB_BASED_ACTION_RECOGNITION_CHECKPOINT}] \ + [--use-skeleton-stdet ${USE_SKELETON_BASED_SPATIO_TEMPORAL_DETECTION_METHOD}] \ + [--use-skeleton-recog ${USE_SKELETON_BASED_ACTION_RECOGNITION_METHOD}] \ + [--det-score-thr
${HUMAN_DETECTION_SCORE_THRE}] \ + [--action-score-thr ${ACTION_DETECTION_SCORE_THRE}] \ + [--video ${VIDEO_FILE}] \ + [--label-map-stdet ${LABEL_MAP_FOR_SPATIO_TEMPORAL_ACTION_DETECTION}] \ + [--device ${DEVICE}] \ + [--out-filename ${OUTPUT_FILENAME}] \ + [--predict-stepsize ${PREDICT_STEPSIZE}] \ + [--output-stepsize ${OUTPUT_STEPSIZE}] \ + [--output-fps ${OUTPUT_FPS}] \ + [--cfg-options] +``` + +Optional arguments: + +- `RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CONFIG_FILE`: The rgb-based spatio-temporal action detection config file path. +- `RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT`: The rgb-based spatio-temporal action detection checkpoint path or URL. +- `SKELETON_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT`: The skeleton-based spatio-temporal action detection checkpoint path or URL. +- `HUMAN_DETECTION_CONFIG_FILE`: The human detection config file path. +- `HUMAN_DETECTION_CHECKPOINT`: The human detection checkpoint URL. +- `HUMAN_POSE_ESTIMATION_CONFIG_FILE`: The human pose estimation config file path (trained on COCO-Keypoint). +- `HUMAN_POSE_ESTIMATION_CHECKPOINT`: The human pose estimation checkpoint URL (trained on COCO-Keypoint). +- `SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE`: The skeleton-based action recognition config file path. +- `SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT`: The skeleton-based action recognition checkpoint path or URL. +- `RGB_BASED_ACTION_RECOGNITION_CONFIG_FILE`: The rgb-based action recognition config file path. +- `RGB_BASED_ACTION_RECOGNITION_CHECKPOINT`: The rgb-based action recognition checkpoint path or URL. +- `USE_SKELETON_BASED_SPATIO_TEMPORAL_DETECTION_METHOD`: Use the skeleton-based spatio-temporal action detection method. +- `USE_SKELETON_BASED_ACTION_RECOGNITION_METHOD`: Use the skeleton-based action recognition method. +- `HUMAN_DETECTION_SCORE_THRE`: The score threshold for human detection. Default: 0.9. +- `ACTION_DETECTION_SCORE_THRE`: The score threshold for action detection. Default: 0.4.
+- `LABEL_MAP_FOR_SPATIO_TEMPORAL_ACTION_DETECTION`: The label map used for spatio-temporal action detection. Default: `tools/data/ava/label_map.txt`. +- `LABEL_MAP`: The label map for action recognition. Default: `tools/data/kinetics/label_map_k400.txt`. +- `DEVICE`: Type of device to run the demo. Allowed values are a CUDA device such as `cuda:0`, or `cpu`. Default: `cuda:0`. +- `OUTPUT_FILENAME`: Path to the output video file. Default: `demo/test_stdet_recognition_output.mp4`. +- `PREDICT_STEPSIZE`: Make a prediction per N frames. Default: 8. +- `OUTPUT_STEPSIZE`: Output 1 frame per N frames in the input video. Note that `PREDICT_STEPSIZE % OUTPUT_STEPSIZE == 0` must hold. Default: 1. +- `OUTPUT_FPS`: The FPS of demo video output. Default: 24. + +Examples: + +Assume that you are located at `$MMACTION2`. + +1. Use Faster RCNN as the human detector, HRNetw32 as the pose estimator, and PoseC3D as both the skeleton-based action recognizer and the skeleton-based spatio-temporal action detector. Make an action detection prediction every 8 frames and write every input frame to the output video. The FPS of the output video is 24.
+ +```shell +python demo/demo_video_structuralize.py \ + --skeleton-stdet-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_ava.pth \ + --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \ + --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \ + --pose-config demo/hrnet_w32_coco_256x192.py \ + --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \ + --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \ + --skeleton-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_k400.pth \ + --use-skeleton-stdet \ + --use-skeleton-recog \ + --label-map-stdet tools/data/ava/label_map.txt \ + --label-map tools/data/kinetics/label_map_k400.txt +``` + +2. Use Faster RCNN as the human detector, TSN-R50-1x1x3 as the rgb-based action recognizer, and SlowOnly-8x8-R101 as the rgb-based spatio-temporal action detector. Make an action detection prediction every 8 frames and write every input frame to the output video. The FPS of the output video is 24.
+ +```shell +python demo/demo_video_structuralize.py \ + --rgb-stdet-config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \ + --rgb-stdet-checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \ + --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \ + --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \ + --rgb-config configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + --rgb-checkpoint https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + --label-map-stdet tools/data/ava/label_map.txt \ + --label-map tools/data/kinetics/label_map_k400.txt +``` + +3. Use Faster RCNN as the human detector, HRNetw32 as the pose estimator, PoseC3D as the skeleton-based action recognizer, and SlowOnly-8x8-R101 as the rgb-based spatio-temporal action detector. Make an action detection prediction every 8 frames and write every input frame to the output video. The FPS of the output video is 24.
+ +```shell +python demo/demo_video_structuralize.py \ + --rgb-stdet-config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \ + --rgb-stdet-checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \ + --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \ + --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \ + --pose-config demo/hrnet_w32_coco_256x192.py \ + --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \ + --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \ + --skeleton-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_k400.pth \ + --use-skeleton-recog \ + --label-map-stdet tools/data/ava/label_map.txt \ + --label-map tools/data/kinetics/label_map_k400.txt +``` + +4. Use Faster RCNN as the human detector, HRNetw32 as the pose estimator, TSN-R50-1x1x3 as the rgb-based action recognizer, and PoseC3D as the skeleton-based spatio-temporal action detector. Make an action detection prediction every 8 frames and write every input frame to the output video. The FPS of the output video is 24.
+ +```shell +python demo/demo_video_structuralize.py \ + --skeleton-stdet-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_ava.pth \ + --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \ + --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \ + --pose-config demo/hrnet_w32_coco_256x192.py \ + --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \ + --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \ + --rgb-config configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + --rgb-checkpoint https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + --use-skeleton-stdet \ + --label-map-stdet tools/data/ava/label_map.txt \ + --label-map tools/data/kinetics/label_map_k400.txt +``` + +## Audio Demo + +We provide a demo script to predict the audio-based action recognition result using a single audio feature file. + +The script `extract_audio.py` can be used to extract audio from videos, and the script `build_audio_features.py` can be used to extract the audio features. + +```shell +python demo/demo_audio.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${AUDIO_FILE} ${LABEL_FILE} [--device ${DEVICE}] +``` + +Optional arguments: + +- `DEVICE`: Type of device to run the demo. Allowed values are a CUDA device such as `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`. + +Examples: + +Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, +or use a checkpoint URL from `configs/` to load the corresponding checkpoint directly, which will be automatically saved in `$HOME/.cache/torch/checkpoints`. + +1. Recognize an audio feature file by using a TSN model, on CUDA by default.
+ + ```shell + python demo/demo_audio.py \ + configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py \ + https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/tsn_r18_64x1x1_100e_kinetics400_audio_feature_20201012-bf34df6c.pth \ + audio_feature.npy label_map_k400.txt + ``` diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo.gif b/openmmlab_test/mmaction2-0.24.1/demo/demo.gif new file mode 100644 index 0000000000000000000000000000000000000000..3f9953cdcf50f622e56ab81f5d609acd7c34e81a Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/demo/demo.gif differ diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo.ipynb b/openmmlab_test/mmaction2-0.24.1/demo/demo.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..fd9e42487b409224d629419a310cd261988215de --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/demo.ipynb @@ -0,0 +1,128 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "pycharm": { + "is_executing": false + } + }, + "outputs": [], + "source": [ + "from mmaction.apis import init_recognizer, inference_recognizer" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "pycharm": { + "is_executing": false + } + }, + "outputs": [], + "source": [ + "config_file = '../configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'\n", + "# download the checkpoint from model zoo and put it in `checkpoints/`\n", + "checkpoint_file = '../checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "pycharm": { + "is_executing": false + } + }, + "outputs": [], + "source": [ + "# build the model from a config file and a checkpoint file\n", + "model = init_recognizer(config_file, checkpoint_file, device='cpu')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { 
+ "pycharm": { + "is_executing": false + } + }, + "outputs": [], + "source": [ + "# test a single video and show the result:\n", + "video = 'demo.mp4'\n", + "label = '../tools/data/kinetics/label_map_k400.txt'\n", + "results = inference_recognizer(model, video)\n", + "\n", + "labels = open(label).readlines()\n", + "labels = [x.strip() for x in labels]\n", + "results = [(labels[k[0]], k[1]) for k in results]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "pycharm": { + "is_executing": false, + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "arm wrestling: 29.61644\n", + "rock scissors paper: 10.754839\n", + "shaking hands: 9.9084\n", + "clapping: 9.189912\n", + "massaging feet: 8.305307\n" + ] + } + ], + "source": [ + "# show the results\n", + "for result in results:\n", + " print(f'{result[0]}: ', result[1])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "metadata": { + "collapsed": false + }, + "source": [] + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo.mp4 b/openmmlab_test/mmaction2-0.24.1/demo/demo.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..8a1ffbf2cd72d916094a1f2c0ddc56586787e385 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/demo/demo.mp4 differ diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo.py b/openmmlab_test/mmaction2-0.24.1/demo/demo.py new file mode 100644 index 
0000000000000000000000000000000000000000..85565cb5a29c90590f570979ab511bc7c8920b50 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/demo.py @@ -0,0 +1,207 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import os.path as osp + +import cv2 +import decord +import numpy as np +import torch +import webcolors +from mmcv import Config, DictAction + +from mmaction.apis import inference_recognizer, init_recognizer + + +def parse_args(): + parser = argparse.ArgumentParser(description='MMAction2 demo') + parser.add_argument('config', help='test config file path') + parser.add_argument('checkpoint', help='checkpoint file/url') + parser.add_argument('video', help='video file/url or rawframes directory') + parser.add_argument('label', help='label file') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + parser.add_argument( + '--use-frames', + default=False, + action='store_true', + help='whether to use rawframes as input') + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + parser.add_argument( + '--fps', + default=30, + type=int, + help='specify fps value of the output video when using rawframes to ' + 'generate file') + parser.add_argument( + '--font-scale', + default=0.5, + type=float, + help='font scale of the label in output video') + parser.add_argument( + '--font-color', + default='white', + help='font color of the label in output video') + parser.add_argument( + '--target-resolution', + nargs=2, + default=None, + type=int, + help='Target resolution (w, h) for resizing the frames when using a ' + 'video as input. 
If either dimension is set to -1, the frames are ' + 'resized by keeping the existing aspect ratio') + parser.add_argument( + '--resize-algorithm', + default='bicubic', + help='resize algorithm applied to generate video') + parser.add_argument('--out-filename', default=None, help='output filename') + args = parser.parse_args() + return args + + +def get_output(video_path, + out_filename, + label, + fps=30, + font_scale=0.5, + font_color='white', + target_resolution=None, + resize_algorithm='bicubic', + use_frames=False): + """Get demo output using ``moviepy``. + + This function will generate video file or gif file from raw video or + frames, by using ``moviepy``. For more information of some parameters, + you can refer to: https://github.com/Zulko/moviepy. + + Args: + video_path (str): The video file path or the rawframes directory path. + If ``use_frames`` is set to True, it should be rawframes directory + path. Otherwise, it should be video file path. + out_filename (str): Output filename for the generated file. + label (str): Predicted label of the generated file. + fps (int): Number of picture frames to read per second. Default: 30. + font_scale (float): Font scale of the label. Default: 0.5. + font_color (str): Font color of the label. Default: 'white'. + target_resolution (None | tuple[int | None]): Set to + (desired_width desired_height) to have resized frames. If either + dimension is None, the frames are resized by keeping the existing + aspect ratio. Default: None. + resize_algorithm (str): Support "bicubic", "bilinear", "neighbor", + "lanczos", etc. Default: 'bicubic'. For more information, + see https://ffmpeg.org/ffmpeg-scaler.html + use_frames: Determine Whether to use rawframes as input. Default:False. 
+ """ + + if video_path.startswith(('http://', 'https://')): + raise NotImplementedError + + try: + from moviepy.editor import ImageSequenceClip + except ImportError: + raise ImportError('Please install moviepy to enable output file.') + + # Channel Order is BGR + if use_frames: + frame_list = sorted( + [osp.join(video_path, x) for x in os.listdir(video_path)]) + frames = [cv2.imread(x) for x in frame_list] + else: + video = decord.VideoReader(video_path) + frames = [x.asnumpy()[..., ::-1] for x in video] + + if target_resolution: + w, h = target_resolution + frame_h, frame_w, _ = frames[0].shape + if w == -1: + w = int(h / frame_h * frame_w) + if h == -1: + h = int(w / frame_w * frame_h) + frames = [cv2.resize(f, (w, h)) for f in frames] + + textsize = cv2.getTextSize(label, cv2.FONT_HERSHEY_DUPLEX, font_scale, + 1)[0] + textheight = textsize[1] + padding = 10 + location = (padding, padding + textheight) + + if isinstance(font_color, str): + font_color = webcolors.name_to_rgb(font_color)[::-1] + + frames = [np.array(frame) for frame in frames] + for frame in frames: + cv2.putText(frame, label, location, cv2.FONT_HERSHEY_DUPLEX, + font_scale, font_color, 1) + + # RGB order + frames = [x[..., ::-1] for x in frames] + video_clips = ImageSequenceClip(frames, fps=fps) + + out_type = osp.splitext(out_filename)[1][1:] + if out_type == 'gif': + video_clips.write_gif(out_filename) + else: + video_clips.write_videofile(out_filename, remove_temp=True) + + +def main(): + args = parse_args() + # assign the desired device. + device = torch.device(args.device) + + cfg = Config.fromfile(args.config) + cfg.merge_from_dict(args.cfg_options) + + # build the recognizer from a config file and checkpoint file/url + model = init_recognizer(cfg, args.checkpoint, device=device) + + # e.g. 
use ('backbone', ) to return backbone feature + output_layer_names = None + + # test a single video or rawframes of a single video + if output_layer_names: + results, returned_feature = inference_recognizer( + model, args.video, outputs=output_layer_names) + else: + results = inference_recognizer(model, args.video) + + labels = open(args.label).readlines() + labels = [x.strip() for x in labels] + results = [(labels[k[0]], k[1]) for k in results] + + print('The top-5 labels with corresponding scores are:') + for result in results: + print(f'{result[0]}: ', result[1]) + + if args.out_filename is not None: + + if args.target_resolution is not None: + if args.target_resolution[0] == -1: + assert isinstance(args.target_resolution[1], int) + assert args.target_resolution[1] > 0 + if args.target_resolution[1] == -1: + assert isinstance(args.target_resolution[0], int) + assert args.target_resolution[0] > 0 + args.target_resolution = tuple(args.target_resolution) + + get_output( + args.video, + args.out_filename, + results[0][0], + fps=args.fps, + font_scale=args.font_scale, + font_color=args.font_color, + target_resolution=args.target_resolution, + resize_algorithm=args.resize_algorithm, + use_frames=args.use_frames) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo_audio.py b/openmmlab_test/mmaction2-0.24.1/demo/demo_audio.py new file mode 100644 index 0000000000000000000000000000000000000000..bcbde94a1e76f822f6b617142bb8c296cc4312fb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/demo_audio.py @@ -0,0 +1,51 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse + +import torch +from mmcv import Config, DictAction + +from mmaction.apis import inference_recognizer, init_recognizer + + +def parse_args(): + parser = argparse.ArgumentParser(description='MMAction2 demo') + parser.add_argument('config', help='test config file path') + parser.add_argument('checkpoint', help='checkpoint file/url') + parser.add_argument('audio', help='audio file') + parser.add_argument('label', help='label file') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + device = torch.device(args.device) + cfg = Config.fromfile(args.config) + cfg.merge_from_dict(args.cfg_options) + model = init_recognizer(cfg, args.checkpoint, device=device) + + if not args.audio.endswith('.npy'): + raise NotImplementedError('Demo works on extracted audio features') + results = inference_recognizer(model, args.audio) + + labels = open(args.label).readlines() + labels = [x.strip() for x in labels] + results = [(labels[k[0]], k[1]) for k in results] + + print('Scores:') + for result in results: + print(f'{result[0]}: ', result[1]) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo_gradcam.gif b/openmmlab_test/mmaction2-0.24.1/demo/demo_gradcam.gif new file mode 100644 index 0000000000000000000000000000000000000000..56f78ca4c616da1f1db7b66758526c5d55774bab Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/demo/demo_gradcam.gif differ diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo_gradcam.py b/openmmlab_test/mmaction2-0.24.1/demo/demo_gradcam.py new file mode 100644 index 
0000000000000000000000000000000000000000..4af6851ac4e1b70ddf4959d4186851d511e064b0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/demo_gradcam.py @@ -0,0 +1,208 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import os.path as osp + +import mmcv +import numpy as np +import torch +from mmcv import Config, DictAction +from mmcv.parallel import collate, scatter + +from mmaction.apis import init_recognizer +from mmaction.datasets.pipelines import Compose +from mmaction.utils import GradCAM + + +def parse_args(): + parser = argparse.ArgumentParser(description='MMAction2 GradCAM demo') + + parser.add_argument('config', help='test config file path') + parser.add_argument('checkpoint', help='checkpoint file/url') + parser.add_argument('video', help='video file/url or rawframes directory') + parser.add_argument( + '--use-frames', + default=False, + action='store_true', + help='whether to use rawframes as input') + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + parser.add_argument( + '--target-layer-name', + type=str, + default='backbone/layer4/1/relu', + help='GradCAM target layer name') + parser.add_argument('--out-filename', default=None, help='output filename') + parser.add_argument('--fps', default=5, type=int) + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + parser.add_argument( + '--target-resolution', + nargs=2, + default=None, + type=int, + help='Target resolution (w, h) for resizing the frames when using a ' + 'video as input. 
If either dimension is set to -1, the frames are ' + 'resized by keeping the existing aspect ratio') + parser.add_argument( + '--resize-algorithm', + default='bilinear', + help='resize algorithm applied to generate video & gif') + + args = parser.parse_args() + return args + + +def build_inputs(model, video_path, use_frames=False): + """build inputs for GradCAM. + + Note that, building inputs for GradCAM is exactly the same as building + inputs for Recognizer test stage. Codes from `inference_recognizer`. + + Args: + model (nn.Module): Recognizer model. + video_path (str): video file/url or rawframes directory. + use_frames (bool): whether to use rawframes as input. + Returns: + dict: Both GradCAM inputs and Recognizer test stage inputs, + including two keys, ``imgs`` and ``label``. + """ + if not (osp.exists(video_path) or video_path.startswith('http')): + raise RuntimeError(f"'{video_path}' is missing") + + if osp.isfile(video_path) and use_frames: + raise RuntimeError( + f"'{video_path}' is a video file, not a rawframe directory") + if osp.isdir(video_path) and not use_frames: + raise RuntimeError( + f"'{video_path}' is a rawframe directory, not a video file") + + cfg = model.cfg + device = next(model.parameters()).device # model device + + # build the data pipeline + test_pipeline = cfg.data.test.pipeline + test_pipeline = Compose(test_pipeline) + # prepare data + if use_frames: + filename_tmpl = cfg.data.test.get('filename_tmpl', 'img_{:05}.jpg') + modality = cfg.data.test.get('modality', 'RGB') + start_index = cfg.data.test.get('start_index', 1) + data = dict( + frame_dir=video_path, + total_frames=len(os.listdir(video_path)), + label=-1, + start_index=start_index, + filename_tmpl=filename_tmpl, + modality=modality) + else: + start_index = cfg.data.test.get('start_index', 0) + data = dict( + filename=video_path, + label=-1, + start_index=start_index, + modality='RGB') + data = test_pipeline(data) + data = collate([data], samples_per_gpu=1) + if 
next(model.parameters()).is_cuda: + # scatter to specified GPU + data = scatter(data, [device])[0] + + return data + + +def _resize_frames(frame_list, + scale, + keep_ratio=True, + interpolation='bilinear'): + """Resize frames according to the given scale. + + Code is adapted from `mmaction2/datasets/pipelines/augmentation.py`, + `Resize` class. + + Args: + frame_list (list[np.ndarray]): frames to be resized. + scale (tuple[int]): If keep_ratio is True, it serves as scaling + factor or maximum size: the image will be rescaled as large + as possible within the scale. Otherwise, it serves as (w, h) + of output size. + keep_ratio (bool): If set to True, images will be resized without + changing the aspect ratio. Otherwise, it will resize images to a + given size. Default: True. + interpolation (str): Algorithm used for interpolation: + "nearest" | "bilinear". Default: "bilinear". + Returns: + list[np.ndarray]: The resized frames, in the same order as the + input ``frame_list``. + """ + if scale is None or (scale[0] == -1 and scale[1] == -1): + return frame_list + scale = tuple(scale) + max_long_edge = max(scale) + max_short_edge = min(scale) + if max_short_edge == -1: + scale = (np.inf, max_long_edge) + + img_h, img_w, _ = frame_list[0].shape + + if keep_ratio: + new_w, new_h = mmcv.rescale_size((img_w, img_h), scale) + else: + new_w, new_h = scale + + frame_list = [ + mmcv.imresize(img, (new_w, new_h), interpolation=interpolation) + for img in frame_list + ] + + return frame_list + + +def main(): + args = parse_args() + + # assign the desired device. 
+ device = torch.device(args.device) + + cfg = Config.fromfile(args.config) + cfg.merge_from_dict(args.cfg_options) + + # build the recognizer from a config file and checkpoint file/url + model = init_recognizer(cfg, args.checkpoint, device=device) + + inputs = build_inputs(model, args.video, use_frames=args.use_frames) + gradcam = GradCAM(model, args.target_layer_name) + results = gradcam(inputs) + + if args.out_filename is not None: + try: + from moviepy.editor import ImageSequenceClip + except ImportError: + raise ImportError('Please install moviepy to enable output file.') + + # frames_batches shape [B, T, H, W, 3], in RGB order + frames_batches = (results[0] * 255.).numpy().astype(np.uint8) + frames = frames_batches.reshape(-1, *frames_batches.shape[-3:]) + + frame_list = list(frames) + frame_list = _resize_frames( + frame_list, + args.target_resolution, + interpolation=args.resize_algorithm) + + video_clips = ImageSequenceClip(frame_list, fps=args.fps) + out_type = osp.splitext(args.out_filename)[1][1:] + if out_type == 'gif': + video_clips.write_gif(args.out_filename) + else: + video_clips.write_videofile(args.out_filename, remove_temp=True) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo_out.mp4 b/openmmlab_test/mmaction2-0.24.1/demo/demo_out.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..f689f60f70a28974b9d8f90d0ca93fb2e2afaa94 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/demo/demo_out.mp4 differ diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo_skeleton.py b/openmmlab_test/mmaction2-0.24.1/demo/demo_skeleton.py new file mode 100644 index 0000000000000000000000000000000000000000..f74f593f3db9803178b4f09b9994a290072c3d1c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/demo_skeleton.py @@ -0,0 +1,253 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +import os +import os.path as osp +import shutil + +import cv2 +import mmcv +import numpy as np +import torch +from mmcv import DictAction + +from mmaction.apis import inference_recognizer, init_recognizer + +try: + from mmdet.apis import inference_detector, init_detector +except (ImportError, ModuleNotFoundError): + raise ImportError('Failed to import `inference_detector` and ' + '`init_detector` from `mmdet.apis`. These APIs are ' + 'required in this demo! ') + +try: + from mmpose.apis import (inference_top_down_pose_model, init_pose_model, + vis_pose_result) +except (ImportError, ModuleNotFoundError): + raise ImportError('Failed to import `inference_top_down_pose_model`, ' + '`init_pose_model`, and `vis_pose_result` from ' + '`mmpose.apis`. These APIs are required in this demo! ') + +try: + import moviepy.editor as mpy +except ImportError: + raise ImportError('Please install moviepy to enable output file') + +FONTFACE = cv2.FONT_HERSHEY_DUPLEX +FONTSCALE = 0.75 +FONTCOLOR = (255, 255, 255) # BGR, white +THICKNESS = 1 +LINETYPE = 1 + + +def parse_args(): + parser = argparse.ArgumentParser(description='MMAction2 demo') + parser.add_argument('video', help='video file/url') + parser.add_argument('out_filename', help='output filename') + parser.add_argument( + '--config', + default=('configs/skeleton/posec3d/' + 'slowonly_r50_u48_240e_ntu120_xsub_keypoint.py'), + help='skeleton model config file path') + parser.add_argument( + '--checkpoint', + default=('https://download.openmmlab.com/mmaction/skeleton/posec3d/' + 'slowonly_r50_u48_240e_ntu120_xsub_keypoint/' + 'slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth'), + help='skeleton model checkpoint file/url') + parser.add_argument( + '--det-config', + default='demo/faster_rcnn_r50_fpn_2x_coco.py', + help='human detection config file path (from mmdet)') + parser.add_argument( + '--det-checkpoint', + default=('http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/' + 
'faster_rcnn_r50_fpn_2x_coco/' + 'faster_rcnn_r50_fpn_2x_coco_' + 'bbox_mAP-0.384_20200504_210434-a5d8aa15.pth'), + help='human detection checkpoint file/url') + parser.add_argument( + '--pose-config', + default='demo/hrnet_w32_coco_256x192.py', + help='human pose estimation config file path (from mmpose)') + parser.add_argument( + '--pose-checkpoint', + default=('https://download.openmmlab.com/mmpose/top_down/hrnet/' + 'hrnet_w32_coco_256x192-c78dce93_20200708.pth'), + help='human pose estimation checkpoint file/url') + parser.add_argument( + '--det-score-thr', + type=float, + default=0.9, + help='the threshold of human detection score') + parser.add_argument( + '--label-map', + default='tools/data/skeleton/label_map_ntu120.txt', + help='label map file') + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + parser.add_argument( + '--short-side', + type=int, + default=480, + help='specify the short-side length of the image') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + args = parser.parse_args() + return args + + +def frame_extraction(video_path, short_side): + """Extract frames given video_path. + + Args: + video_path (str): The video_path. 
+ """ + # Load the video, extract frames into ./tmp/video_name + target_dir = osp.join('./tmp', osp.basename(osp.splitext(video_path)[0])) + os.makedirs(target_dir, exist_ok=True) + # Should be able to handle videos up to several hours + frame_tmpl = osp.join(target_dir, 'img_{:06d}.jpg') + vid = cv2.VideoCapture(video_path) + frames = [] + frame_paths = [] + flag, frame = vid.read() + cnt = 0 + new_h, new_w = None, None + while flag: + if new_h is None: + h, w, _ = frame.shape + new_w, new_h = mmcv.rescale_size((w, h), (short_side, np.inf)) + + frame = mmcv.imresize(frame, (new_w, new_h)) + + frames.append(frame) + frame_path = frame_tmpl.format(cnt + 1) + frame_paths.append(frame_path) + + cv2.imwrite(frame_path, frame) + cnt += 1 + flag, frame = vid.read() + + return frame_paths, frames + + +def detection_inference(args, frame_paths): + """Detect human boxes given frame paths. + + Args: + args (argparse.Namespace): The arguments. + frame_paths (list[str]): The paths of frames to do detection inference. + + Returns: + list[np.ndarray]: The human detection results. 
+ """ + model = init_detector(args.det_config, args.det_checkpoint, args.device) + assert model.CLASSES[0] == 'person', ('We require you to use a detector ' + 'trained on COCO') + results = [] + print('Performing Human Detection for each frame') + prog_bar = mmcv.ProgressBar(len(frame_paths)) + for frame_path in frame_paths: + result = inference_detector(model, frame_path) + # We only keep human detections with score larger than det_score_thr + result = result[0][result[0][:, 4] >= args.det_score_thr] + results.append(result) + prog_bar.update() + return results + + +def pose_inference(args, frame_paths, det_results): + model = init_pose_model(args.pose_config, args.pose_checkpoint, + args.device) + ret = [] + print('Performing Human Pose Estimation for each frame') + prog_bar = mmcv.ProgressBar(len(frame_paths)) + for f, d in zip(frame_paths, det_results): + # Align input format + d = [dict(bbox=x) for x in list(d)] + pose = inference_top_down_pose_model(model, f, d, format='xyxy')[0] + ret.append(pose) + prog_bar.update() + return ret + + +def main(): + args = parse_args() + + frame_paths, original_frames = frame_extraction(args.video, + args.short_side) + num_frame = len(frame_paths) + h, w, _ = original_frames[0].shape + + # Get clip_len, frame_interval and calculate center index of each clip + config = mmcv.Config.fromfile(args.config) + config.merge_from_dict(args.cfg_options) + for component in config.data.test.pipeline: + if component['type'] == 'PoseNormalize': + component['mean'] = (w // 2, h // 2, .5) + component['max_value'] = (w, h, 1.) 
+ + model = init_recognizer(config, args.checkpoint, args.device) + + # Load label_map + label_map = [x.strip() for x in open(args.label_map).readlines()] + + # Get Human detection results + det_results = detection_inference(args, frame_paths) + torch.cuda.empty_cache() + + pose_results = pose_inference(args, frame_paths, det_results) + torch.cuda.empty_cache() + + fake_anno = dict( + frame_dir='', + label=-1, + img_shape=(h, w), + original_shape=(h, w), + start_index=0, + modality='Pose', + total_frames=num_frame) + num_person = max([len(x) for x in pose_results]) + + num_keypoint = 17 + keypoint = np.zeros((num_person, num_frame, num_keypoint, 2), + dtype=np.float16) + keypoint_score = np.zeros((num_person, num_frame, num_keypoint), + dtype=np.float16) + for i, poses in enumerate(pose_results): + for j, pose in enumerate(poses): + pose = pose['keypoints'] + keypoint[j, i] = pose[:, :2] + keypoint_score[j, i] = pose[:, 2] + fake_anno['keypoint'] = keypoint + fake_anno['keypoint_score'] = keypoint_score + + results = inference_recognizer(model, fake_anno) + + action_label = label_map[results[0][0]] + + pose_model = init_pose_model(args.pose_config, args.pose_checkpoint, + args.device) + vis_frames = [ + vis_pose_result(pose_model, frame_paths[i], pose_results[i]) + for i in range(num_frame) + ] + for frame in vis_frames: + cv2.putText(frame, action_label, (10, 30), FONTFACE, FONTSCALE, + FONTCOLOR, THICKNESS, LINETYPE) + + vid = mpy.ImageSequenceClip([x[:, :, ::-1] for x in vis_frames], fps=24) + vid.write_videofile(args.out_filename, remove_temp=True) + + tmp_frame_dir = osp.dirname(frame_paths[0]) + shutil.rmtree(tmp_frame_dir) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo_spatiotemporal_det.py b/openmmlab_test/mmaction2-0.24.1/demo/demo_spatiotemporal_det.py new file mode 100644 index 0000000000000000000000000000000000000000..78dd7bcaa987059392479115587938e03a74795c --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/demo/demo_spatiotemporal_det.py @@ -0,0 +1,421 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import copy as cp +import os +import os.path as osp +import shutil + +import cv2 +import mmcv +import numpy as np +import torch +from mmcv import DictAction +from mmcv.runner import load_checkpoint + +from mmaction.models import build_detector + +try: + from mmdet.apis import inference_detector, init_detector +except (ImportError, ModuleNotFoundError): + raise ImportError('Failed to import `inference_detector` and ' + '`init_detector` from `mmdet.apis`. These APIs are ' + 'required in this demo! ') + +try: + import moviepy.editor as mpy +except ImportError: + raise ImportError('Please install moviepy to enable output file') + +FONTFACE = cv2.FONT_HERSHEY_DUPLEX +FONTSCALE = 0.5 +FONTCOLOR = (255, 255, 255) # BGR, white +MSGCOLOR = (128, 128, 128) # BGR, gray +THICKNESS = 1 +LINETYPE = 1 + + +def hex2color(h): + """Convert the 6-digit hex string to tuple of 3 int value (RGB)""" + return (int(h[:2], 16), int(h[2:4], 16), int(h[4:], 16)) + + +plate_blue = '03045e-023e8a-0077b6-0096c7-00b4d8-48cae4' +plate_blue = plate_blue.split('-') +plate_blue = [hex2color(h) for h in plate_blue] +plate_green = '004b23-006400-007200-008000-38b000-70e000' +plate_green = plate_green.split('-') +plate_green = [hex2color(h) for h in plate_green] + + +def visualize(frames, annotations, plate=plate_blue, max_num=5): + """Visualize frames with predicted annotations. + + Args: + frames (list[np.ndarray]): Frames for visualization, note that + len(frames) % len(annotations) should be 0. + annotations (list[list[tuple]]): The predicted results. + plate (list[tuple]): Colors used for visualization. Default: plate_blue. + max_num (int): Max number of labels to visualize for a person box. + Default: 5. + + Returns: + list[np.ndarray]: Visualized frames. 
+ """ + + assert max_num + 1 <= len(plate) + plate = [x[::-1] for x in plate] + frames_ = cp.deepcopy(frames) + nf, na = len(frames), len(annotations) + assert nf % na == 0 + nfpa = len(frames) // len(annotations) + anno = None + h, w, _ = frames[0].shape + scale_ratio = np.array([w, h, w, h]) + for i in range(na): + anno = annotations[i] + if anno is None: + continue + for j in range(nfpa): + ind = i * nfpa + j + frame = frames_[ind] + for ann in anno: + box = ann[0] + label = ann[1] + if not len(label): + continue + score = ann[2] + box = (box * scale_ratio).astype(np.int64) + st, ed = tuple(box[:2]), tuple(box[2:]) + cv2.rectangle(frame, st, ed, plate[0], 2) + for k, lb in enumerate(label): + if k >= max_num: + break + text = abbrev(lb) + text = ': '.join([text, str(score[k])]) + location = (0 + st[0], 18 + k * 18 + st[1]) + textsize = cv2.getTextSize(text, FONTFACE, FONTSCALE, + THICKNESS)[0] + textwidth = textsize[0] + diag0 = (location[0] + textwidth, location[1] - 14) + diag1 = (location[0], location[1] + 2) + cv2.rectangle(frame, diag0, diag1, plate[k + 1], -1) + cv2.putText(frame, text, location, FONTFACE, FONTSCALE, + FONTCOLOR, THICKNESS, LINETYPE) + + return frames_ + + +def parse_args(): + parser = argparse.ArgumentParser(description='MMAction2 demo') + parser.add_argument( + '--config', + default=('configs/detection/ava/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py'), + help='spatio temporal detection config file path') + parser.add_argument( + '--checkpoint', + default=('https://download.openmmlab.com/mmaction/detection/ava/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb' + '_20201217-16378594.pth'), + help='spatio temporal detection checkpoint file/url') + parser.add_argument( + '--det-config', + default='demo/faster_rcnn_r50_fpn_2x_coco.py', + help='human detection config file path (from mmdet)') + parser.add_argument( + '--det-checkpoint', + 
default=('http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/' + 'faster_rcnn_r50_fpn_2x_coco/' + 'faster_rcnn_r50_fpn_2x_coco_' + 'bbox_mAP-0.384_20200504_210434-a5d8aa15.pth'), + help='human detection checkpoint file/url') + parser.add_argument( + '--det-score-thr', + type=float, + default=0.9, + help='the threshold of human detection score') + parser.add_argument( + '--action-score-thr', + type=float, + default=0.5, + help='the threshold of human action score') + parser.add_argument('--video', help='video file/url') + parser.add_argument( + '--label-map', + default='tools/data/ava/label_map.txt', + help='label map file') + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + parser.add_argument( + '--out-filename', + default='demo/stdet_demo.mp4', + help='output filename') + parser.add_argument( + '--predict-stepsize', + default=8, + type=int, + help='give out a prediction per n frames') + parser.add_argument( + '--output-stepsize', + default=4, + type=int, + help=('show one frame per n frames in the demo, we should have: ' + 'predict_stepsize % output_stepsize == 0')) + parser.add_argument( + '--output-fps', + default=6, + type=int, + help='the fps of demo video output') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + args = parser.parse_args() + return args + + +def frame_extraction(video_path): + """Extract frames given video_path. + + Args: + video_path (str): The video_path. 
+ """ + # Load the video, extract frames into ./tmp/video_name + target_dir = osp.join('./tmp', osp.basename(osp.splitext(video_path)[0])) + os.makedirs(target_dir, exist_ok=True) + # Should be able to handle videos up to several hours + frame_tmpl = osp.join(target_dir, 'img_{:06d}.jpg') + vid = cv2.VideoCapture(video_path) + frames = [] + frame_paths = [] + flag, frame = vid.read() + cnt = 0 + while flag: + frames.append(frame) + frame_path = frame_tmpl.format(cnt + 1) + frame_paths.append(frame_path) + cv2.imwrite(frame_path, frame) + cnt += 1 + flag, frame = vid.read() + return frame_paths, frames + + +def detection_inference(args, frame_paths): + """Detect human boxes given frame paths. + + Args: + args (argparse.Namespace): The arguments. + frame_paths (list[str]): The paths of frames to do detection inference. + + Returns: + list[np.ndarray]: The human detection results. + """ + model = init_detector(args.det_config, args.det_checkpoint, args.device) + assert model.CLASSES[0] == 'person', ('We require you to use a detector ' + 'trained on COCO') + results = [] + print('Performing Human Detection for each frame') + prog_bar = mmcv.ProgressBar(len(frame_paths)) + for frame_path in frame_paths: + result = inference_detector(model, frame_path) + # We only keep human detections with score larger than det_score_thr + result = result[0][result[0][:, 4] >= args.det_score_thr] + results.append(result) + prog_bar.update() + return results + + +def load_label_map(file_path): + """Load Label Map. + + Args: + file_path (str): The file path of label map. + + Returns: + dict: The label map (int -> label name). + """ + lines = open(file_path).readlines() + lines = [x.strip().split(': ') for x in lines] + return {int(x[0]): x[1] for x in lines} + + +def abbrev(name): + """Get the abbreviation of label name: + + 'take (an object) from (a person)' -> 'take ... from ...' + """ + while name.find('(') != -1: + st, ed = name.find('('), name.find(')') + name = name[:st] + '...' 
+ name[ed + 1:] + return name + + +def pack_result(human_detection, result, img_h, img_w): + """Pack the human detections and predicted labels into tuples. + + Args: + human_detection (torch.Tensor): Human detection result. + result (list[list[tuple]] | None): The predicted (label, score) pairs of each human proposal. + img_h (int): The image height. + img_w (int): The image width. + + Returns: + list[tuple] | None: Tuples of human proposal, label names and label scores. + """ + human_detection[:, 0::2] /= img_w + human_detection[:, 1::2] /= img_h + results = [] + if result is None: + return None + for prop, res in zip(human_detection, result): + res.sort(key=lambda x: -x[1]) + results.append( + (prop.data.cpu().numpy(), [x[0] for x in res], [x[1] + for x in res])) + return results + + +def main(): + args = parse_args() + + frame_paths, original_frames = frame_extraction(args.video) + num_frame = len(frame_paths) + h, w, _ = original_frames[0].shape + + # resize frames to shortside 256 + new_w, new_h = mmcv.rescale_size((w, h), (256, np.inf)) + frames = [mmcv.imresize(img, (new_w, new_h)) for img in original_frames] + w_ratio, h_ratio = new_w / w, new_h / h + + # Get clip_len, frame_interval and calculate center index of each clip + config = mmcv.Config.fromfile(args.config) + config.merge_from_dict(args.cfg_options) + val_pipeline = config.data.val.pipeline + + sampler = [x for x in val_pipeline if x['type'] == 'SampleAVAFrames'][0] + clip_len, frame_interval = sampler['clip_len'], sampler['frame_interval'] + window_size = clip_len * frame_interval + assert clip_len % 2 == 0, 'We would like to have an even clip_len' + # Note that it's 1 based here + timestamps = np.arange(window_size // 2, num_frame + 1 - window_size // 2, + args.predict_stepsize) + + # Load label_map + label_map = load_label_map(args.label_map) + try: + if config['data']['train']['custom_classes'] is not None: + label_map = { + id + 1: label_map[cls] + for id, cls in enumerate(config['data']['train'] + ['custom_classes']) + } + except KeyError: + pass + + # Get Human detection results + 
center_frames = [frame_paths[ind - 1] for ind in timestamps] + human_detections = detection_inference(args, center_frames) + for i in range(len(human_detections)): + det = human_detections[i] + det[:, 0:4:2] *= w_ratio + det[:, 1:4:2] *= h_ratio + human_detections[i] = torch.from_numpy(det[:, :4]).to(args.device) + + # Get img_norm_cfg + img_norm_cfg = config['img_norm_cfg'] + if 'to_rgb' not in img_norm_cfg and 'to_bgr' in img_norm_cfg: + to_bgr = img_norm_cfg.pop('to_bgr') + img_norm_cfg['to_rgb'] = to_bgr + img_norm_cfg['mean'] = np.array(img_norm_cfg['mean']) + img_norm_cfg['std'] = np.array(img_norm_cfg['std']) + + # Build STDET model + try: + # In our spatiotemporal detection demo, different actions should have + # the same number of bboxes. + config['model']['test_cfg']['rcnn']['action_thr'] = .0 + except KeyError: + pass + + config.model.backbone.pretrained = None + model = build_detector(config.model, test_cfg=config.get('test_cfg')) + + load_checkpoint(model, args.checkpoint, map_location='cpu') + model.to(args.device) + model.eval() + + predictions = [] + + print('Performing SpatioTemporal Action Detection for each clip') + assert len(timestamps) == len(human_detections) + prog_bar = mmcv.ProgressBar(len(timestamps)) + for timestamp, proposal in zip(timestamps, human_detections): + if proposal.shape[0] == 0: + predictions.append(None) + continue + + start_frame = timestamp - (clip_len // 2 - 1) * frame_interval + frame_inds = start_frame + np.arange(0, window_size, frame_interval) + frame_inds = list(frame_inds - 1) + imgs = [frames[ind].astype(np.float32) for ind in frame_inds] + _ = [mmcv.imnormalize_(img, **img_norm_cfg) for img in imgs] + # THWC -> CTHW -> 1CTHW + input_array = np.stack(imgs).transpose((3, 0, 1, 2))[np.newaxis] + input_tensor = torch.from_numpy(input_array).to(args.device) + + with torch.no_grad(): + result = model( + return_loss=False, + img=[input_tensor], + img_metas=[[dict(img_shape=(new_h, new_w))]], + proposals=[[proposal]]) + 
result = result[0] + prediction = [] + # N proposals + for i in range(proposal.shape[0]): + prediction.append([]) + # Perform action score thr + for i in range(len(result)): + if i + 1 not in label_map: + continue + for j in range(proposal.shape[0]): + if result[i][j, 4] > args.action_score_thr: + prediction[j].append((label_map[i + 1], result[i][j, + 4])) + predictions.append(prediction) + prog_bar.update() + + results = [] + for human_detection, prediction in zip(human_detections, predictions): + results.append(pack_result(human_detection, prediction, new_h, new_w)) + + def dense_timestamps(timestamps, n): + """Make it nx frames.""" + old_frame_interval = (timestamps[1] - timestamps[0]) + start = timestamps[0] - old_frame_interval / n * (n - 1) / 2 + new_frame_inds = np.arange( + len(timestamps) * n) * old_frame_interval / n + start + return new_frame_inds.astype(int) + + dense_n = int(args.predict_stepsize / args.output_stepsize) + frames = [ + cv2.imread(frame_paths[i - 1]) + for i in dense_timestamps(timestamps, dense_n) + ] + print('Performing visualization') + vis_frames = visualize(frames, results) + vid = mpy.ImageSequenceClip([x[:, :, ::-1] for x in vis_frames], + fps=args.output_fps) + vid.write_videofile(args.out_filename) + + tmp_frame_dir = osp.dirname(frame_paths[0]) + shutil.rmtree(tmp_frame_dir) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/demo/demo_video_structuralize.py b/openmmlab_test/mmaction2-0.24.1/demo/demo_video_structuralize.py new file mode 100644 index 0000000000000000000000000000000000000000..2de2d8b55e199582d6356ca8f9ef1ca9e73f5f36 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/demo_video_structuralize.py @@ -0,0 +1,786 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +import copy as cp +import os +import os.path as osp +import shutil +import warnings + +import cv2 +import mmcv +import numpy as np +import torch +from mmcv import DictAction +from mmcv.runner import load_checkpoint + +from mmaction.apis import inference_recognizer +from mmaction.datasets.pipelines import Compose +from mmaction.models import build_detector, build_model, build_recognizer + +try: + from mmdet.apis import inference_detector, init_detector +except (ImportError, ModuleNotFoundError): + warnings.warn('Failed to import `inference_detector` and `init_detector` ' + 'from `mmdet.apis`. These APIs are required in ' + 'skeleton-based applications! ') + +try: + from mmpose.apis import (inference_top_down_pose_model, init_pose_model, + vis_pose_result) +except (ImportError, ModuleNotFoundError): + warnings.warn('Failed to import `inference_top_down_pose_model`, ' + '`init_pose_model`, and `vis_pose_result` from ' + '`mmpose.apis`. These APIs are required in skeleton-based ' + 'applications! ') + +try: + import moviepy.editor as mpy +except ImportError: + raise ImportError('Please install moviepy to enable output file') + +FONTFACE = cv2.FONT_HERSHEY_DUPLEX +FONTSCALE = 0.5 +FONTCOLOR = (255, 255, 255) # BGR, white +MSGCOLOR = (128, 128, 128) # BGR, gray +THICKNESS = 1 +LINETYPE = 1 + + +def hex2color(h): + """Convert the 6-digit hex string to tuple of 3 int value (RGB)""" + return (int(h[:2], 16), int(h[2:4], 16), int(h[4:], 16)) + + +PLATEBLUE = '03045e-023e8a-0077b6-0096c7-00b4d8-48cae4' +PLATEBLUE = PLATEBLUE.split('-') +PLATEBLUE = [hex2color(h) for h in PLATEBLUE] +PLATEGREEN = '004b23-006400-007200-008000-38b000-70e000' +PLATEGREEN = PLATEGREEN.split('-') +PLATEGREEN = [hex2color(h) for h in PLATEGREEN] + + +def visualize(frames, + annotations, + pose_results, + action_result, + pose_model, + plate=PLATEBLUE, + max_num=5): + """Visualize frames with predicted annotations. 
+ + Args: + frames (list[np.ndarray]): Frames for visualization, note that + len(frames) % len(annotations) should be 0. + annotations (list[list[tuple]]): The predicted spatio-temporal + detection results. + pose_results (list[list[tuple]]): The pose results. + action_result (str): The predicted action recognition results. + pose_model (nn.Module): The constructed pose model. + plate (list[tuple]): Colors used for visualization. Default: PLATEBLUE. + max_num (int): Max number of labels to visualize for a person box. + Default: 5. + + Returns: + list[np.ndarray]: Visualized frames. + """ + + assert max_num + 1 <= len(plate) + plate = [x[::-1] for x in plate] + frames_ = cp.deepcopy(frames) + nf, na = len(frames), len(annotations) + assert nf % na == 0 + nfpa = len(frames) // len(annotations) + anno = None + h, w, _ = frames[0].shape + scale_ratio = np.array([w, h, w, h]) + + # add pose results + if pose_results: + for i in range(nf): + frames_[i] = vis_pose_result(pose_model, frames_[i], + pose_results[i]) + + for i in range(na): + anno = annotations[i] + if anno is None: + continue + for j in range(nfpa): + ind = i * nfpa + j + frame = frames_[ind] + + # add action result for whole video + cv2.putText(frame, action_result, (10, 30), FONTFACE, FONTSCALE, + FONTCOLOR, THICKNESS, LINETYPE) + + # add spatio-temporal action detection results + for ann in anno: + box = ann[0] + label = ann[1] + if not len(label): + continue + score = ann[2] + box = (box * scale_ratio).astype(np.int64) + st, ed = tuple(box[:2]), tuple(box[2:]) + if not pose_results: + cv2.rectangle(frame, st, ed, plate[0], 2) + + for k, lb in enumerate(label): + if k >= max_num: + break + text = abbrev(lb) + text = ': '.join([text, str(score[k])]) + location = (0 + st[0], 18 + k * 18 + st[1]) + textsize = cv2.getTextSize(text, FONTFACE, FONTSCALE, + THICKNESS)[0] + textwidth = textsize[0] + diag0 = (location[0] + textwidth, location[1] - 14) + diag1 = (location[0], location[1] + 2) + cv2.rectangle(frame, 
diag0, diag1, plate[k + 1], -1) + cv2.putText(frame, text, location, FONTFACE, FONTSCALE, + FONTCOLOR, THICKNESS, LINETYPE) + + return frames_ + + +def parse_args(): + parser = argparse.ArgumentParser(description='MMAction2 demo') + parser.add_argument( + '--rgb-stdet-config', + default=('configs/detection/ava/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py'), + help='rgb-based spatio temporal detection config file path') + parser.add_argument( + '--rgb-stdet-checkpoint', + default=('https://download.openmmlab.com/mmaction/detection/ava/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb' + '_20201217-16378594.pth'), + help='rgb-based spatio temporal detection checkpoint file/url') + parser.add_argument( + '--skeleton-stdet-checkpoint', + default=('https://download.openmmlab.com/mmaction/skeleton/posec3d/' + 'posec3d_ava.pth'), + help='skeleton-based spatio temporal detection checkpoint file/url') + parser.add_argument( + '--det-config', + default='demo/faster_rcnn_r50_fpn_2x_coco.py', + help='human detection config file path (from mmdet)') + parser.add_argument( + '--det-checkpoint', + default=('http://download.openmmlab.com/mmdetection/v2.0/' + 'faster_rcnn/faster_rcnn_r50_fpn_2x_coco/' + 'faster_rcnn_r50_fpn_2x_coco_' + 'bbox_mAP-0.384_20200504_210434-a5d8aa15.pth'), + help='human detection checkpoint file/url') + parser.add_argument( + '--pose-config', + default='demo/hrnet_w32_coco_256x192.py', + help='human pose estimation config file path (from mmpose)') + parser.add_argument( + '--pose-checkpoint', + default=('https://download.openmmlab.com/mmpose/top_down/hrnet/' + 'hrnet_w32_coco_256x192-c78dce93_20200708.pth'), + help='human pose estimation checkpoint file/url') + parser.add_argument( + '--skeleton-config', + default='configs/skeleton/posec3d/' + 'slowonly_r50_u48_240e_ntu120_xsub_keypoint.py', + help='skeleton-based action recognition config file path') + 
parser.add_argument( + '--skeleton-checkpoint', + default='https://download.openmmlab.com/mmaction/skeleton/posec3d/' + 'posec3d_k400.pth', + help='skeleton-based action recognition checkpoint file/url') + parser.add_argument( + '--rgb-config', + default='configs/recognition/tsn/' + 'tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py', + help='rgb-based action recognition config file path') + parser.add_argument( + '--rgb-checkpoint', + default='https://download.openmmlab.com/mmaction/recognition/' + 'tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/' + 'tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth', + help='rgb-based action recognition checkpoint file/url') + parser.add_argument( + '--use-skeleton-stdet', + action='store_true', + help='use skeleton-based spatio temporal detection method') + parser.add_argument( + '--use-skeleton-recog', + action='store_true', + help='use skeleton-based action recognition method') + parser.add_argument( + '--det-score-thr', + type=float, + default=0.9, + help='the threshold of human detection score') + parser.add_argument( + '--action-score-thr', + type=float, + default=0.4, + help='the threshold of action prediction score') + parser.add_argument( + '--video', + default='demo/test_video_structuralize.mp4', + help='video file/url') + parser.add_argument( + '--label-map-stdet', + default='tools/data/ava/label_map.txt', + help='label map file for spatio-temporal action detection') + parser.add_argument( + '--label-map', + default='tools/data/kinetics/label_map_k400.txt', + help='label map file for action recognition') + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + parser.add_argument( + '--out-filename', + default='demo/test_stdet_recognition_output.mp4', + help='output filename') + parser.add_argument( + '--predict-stepsize', + default=8, + type=int, + help='give out a spatio-temporal detection prediction per n frames') + parser.add_argument( + '--output-stepsize', + default=1, 
+ type=int, + help=('show one frame per n frames in the demo, we should have: ' + 'predict_stepsize % output_stepsize == 0')) + parser.add_argument( + '--output-fps', + default=24, + type=int, + help='the fps of demo video output') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + args = parser.parse_args() + return args + + +def frame_extraction(video_path): + """Extract frames given video_path. + + Args: + video_path (str): The video_path. + """ + # Load the video, extract frames into ./tmp/video_name + target_dir = osp.join('./tmp', osp.basename(osp.splitext(video_path)[0])) + # target_dir = osp.join('./tmp','spatial_skeleton_dir') + os.makedirs(target_dir, exist_ok=True) + # Should be able to handle videos up to several hours + frame_tmpl = osp.join(target_dir, 'img_{:06d}.jpg') + vid = cv2.VideoCapture(video_path) + frames = [] + frame_paths = [] + flag, frame = vid.read() + cnt = 0 + while flag: + frames.append(frame) + frame_path = frame_tmpl.format(cnt + 1) + frame_paths.append(frame_path) + cv2.imwrite(frame_path, frame) + cnt += 1 + flag, frame = vid.read() + return frame_paths, frames + + +def detection_inference(args, frame_paths): + """Detect human boxes given frame paths. + + Args: + args (argparse.Namespace): The arguments. + frame_paths (list[str]): The paths of frames to do detection inference. + + Returns: + list[np.ndarray]: The human detection results. 
+ """ + model = init_detector(args.det_config, args.det_checkpoint, args.device) + assert model.CLASSES[0] == 'person', ('We require you to use a detector ' + 'trained on COCO') + results = [] + print('Performing Human Detection for each frame') + prog_bar = mmcv.ProgressBar(len(frame_paths)) + for frame_path in frame_paths: + result = inference_detector(model, frame_path) + # We only keep human detections with score of at least det_score_thr + result = result[0][result[0][:, 4] >= args.det_score_thr] + results.append(result) + prog_bar.update() + + return results + + +def pose_inference(args, frame_paths, det_results): + model = init_pose_model(args.pose_config, args.pose_checkpoint, + args.device) + ret = [] + print('Performing Human Pose Estimation for each frame') + prog_bar = mmcv.ProgressBar(len(frame_paths)) + for f, d in zip(frame_paths, det_results): + # Align input format + d = [dict(bbox=x) for x in list(d)] + + pose = inference_top_down_pose_model(model, f, d, format='xyxy')[0] + ret.append(pose) + prog_bar.update() + return ret + + +def load_label_map(file_path): + """Load Label Map. + + Args: + file_path (str): The file path of label map. + + Returns: + dict: The label map (int -> label name). + """ + lines = open(file_path).readlines() + lines = [x.strip().split(': ') for x in lines] + return {int(x[0]): x[1] for x in lines} + + +def abbrev(name): + """Get the abbreviation of label name: + + 'take (an object) from (a person)' -> 'take ... from ...' + """ + while name.find('(') != -1: + st, ed = name.find('('), name.find(')') + name = name[:st] + '...' + name[ed + 1:] + return name + + +def pack_result(human_detection, result, img_h, img_w): + """Pack human proposals and their predicted actions for visualization. + + Args: + human_detection (torch.Tensor): Human detection result. + result (list | None): The predicted labels of each human proposal. + img_h (int): The image height. + img_w (int): The image width. + + Returns: + list[tuple] | None: Tuples of human proposal, label names and label scores. 
+ """ + human_detection[:, 0::2] /= img_w + human_detection[:, 1::2] /= img_h + results = [] + if result is None: + return None + for prop, res in zip(human_detection, result): + res.sort(key=lambda x: -x[1]) + results.append( + (prop.data.cpu().numpy(), [x[0] for x in res], [x[1] + for x in res])) + return results + + +def expand_bbox(bbox, h, w, ratio=1.25): + x1, y1, x2, y2 = bbox + center_x = (x1 + x2) // 2 + center_y = (y1 + y2) // 2 + width = x2 - x1 + height = y2 - y1 + + square_l = max(width, height) + new_width = new_height = square_l * ratio + + new_x1 = max(0, int(center_x - new_width / 2)) + new_x2 = min(int(center_x + new_width / 2), w) + new_y1 = max(0, int(center_y - new_height / 2)) + new_y2 = min(int(center_y + new_height / 2), h) + return (new_x1, new_y1, new_x2, new_y2) + + +def cal_iou(box1, box2): + xmin1, ymin1, xmax1, ymax1 = box1 + xmin2, ymin2, xmax2, ymax2 = box2 + + s1 = (xmax1 - xmin1) * (ymax1 - ymin1) + s2 = (xmax2 - xmin2) * (ymax2 - ymin2) + + xmin = max(xmin1, xmin2) + ymin = max(ymin1, ymin2) + xmax = min(xmax1, xmax2) + ymax = min(ymax1, ymax2) + + w = max(0, xmax - xmin) + h = max(0, ymax - ymin) + intersect = w * h + union = s1 + s2 - intersect + iou = intersect / union + + return iou + + +def skeleton_based_action_recognition(args, pose_results, num_frame, h, w): + fake_anno = dict( + frame_dict='', + label=-1, + img_shape=(h, w), + origin_shape=(h, w), + start_index=0, + modality='Pose', + total_frames=num_frame) + num_person = max([len(x) for x in pose_results]) + + num_keypoint = 17 + keypoint = np.zeros((num_person, num_frame, num_keypoint, 2), + dtype=np.float16) + keypoint_score = np.zeros((num_person, num_frame, num_keypoint), + dtype=np.float16) + for i, poses in enumerate(pose_results): + for j, pose in enumerate(poses): + pose = pose['keypoints'] + keypoint[j, i] = pose[:, :2] + keypoint_score[j, i] = pose[:, 2] + + fake_anno['keypoint'] = keypoint + fake_anno['keypoint_score'] = keypoint_score + + label_map = 
[x.strip() for x in open(args.label_map).readlines()] + num_class = len(label_map) + + skeleton_config = mmcv.Config.fromfile(args.skeleton_config) + skeleton_config.model.cls_head.num_classes = num_class # for K400 dataset + skeleton_pipeline = Compose(skeleton_config.test_pipeline) + skeleton_imgs = skeleton_pipeline(fake_anno)['imgs'][None] + skeleton_imgs = skeleton_imgs.to(args.device) + + # Build skeleton-based recognition model + skeleton_model = build_model(skeleton_config.model) + load_checkpoint( + skeleton_model, args.skeleton_checkpoint, map_location='cpu') + skeleton_model.to(args.device) + skeleton_model.eval() + + with torch.no_grad(): + output = skeleton_model(return_loss=False, imgs=skeleton_imgs) + + action_idx = np.argmax(output) + skeleton_action_result = label_map[ + action_idx] # skeleton-based action result for the whole video + return skeleton_action_result + + +def rgb_based_action_recognition(args): + rgb_config = mmcv.Config.fromfile(args.rgb_config) + rgb_config.model.backbone.pretrained = None + rgb_model = build_recognizer( + rgb_config.model, test_cfg=rgb_config.get('test_cfg')) + load_checkpoint(rgb_model, args.rgb_checkpoint, map_location='cpu') + rgb_model.cfg = rgb_config + rgb_model.to(args.device) + rgb_model.eval() + action_results = inference_recognizer( + rgb_model, args.video, label_path=args.label_map) + rgb_action_result = action_results[0][0] + label_map = [x.strip() for x in open(args.label_map).readlines()] + return label_map[rgb_action_result] + + +def skeleton_based_stdet(args, label_map, human_detections, pose_results, + num_frame, clip_len, frame_interval, h, w): + window_size = clip_len * frame_interval + assert clip_len % 2 == 0, 'We would like to have an even clip_len' + timestamps = np.arange(window_size // 2, num_frame + 1 - window_size // 2, + args.predict_stepsize) + + skeleton_config = mmcv.Config.fromfile(args.skeleton_config) + num_class = max(label_map.keys()) + 1 # for AVA dataset (81) + 
skeleton_config.model.cls_head.num_classes = num_class + skeleton_pipeline = Compose(skeleton_config.test_pipeline) + skeleton_stdet_model = build_model(skeleton_config.model) + load_checkpoint( + skeleton_stdet_model, + args.skeleton_stdet_checkpoint, + map_location='cpu') + skeleton_stdet_model.to(args.device) + skeleton_stdet_model.eval() + + skeleton_predictions = [] + + print('Performing SpatioTemporal Action Detection for each clip') + prog_bar = mmcv.ProgressBar(len(timestamps)) + for timestamp in timestamps: + proposal = human_detections[timestamp - 1] + if proposal.shape[0] == 0: # no people detected + skeleton_predictions.append(None) + continue + + start_frame = timestamp - (clip_len // 2 - 1) * frame_interval + frame_inds = start_frame + np.arange(0, window_size, frame_interval) + frame_inds = list(frame_inds - 1) + num_frame = len(frame_inds) # 30 + + pose_result = [pose_results[ind] for ind in frame_inds] + + skeleton_prediction = [] + for i in range(proposal.shape[0]): # num_person + skeleton_prediction.append([]) + + fake_anno = dict( + frame_dict='', + label=-1, + img_shape=(h, w), + origin_shape=(h, w), + start_index=0, + modality='Pose', + total_frames=num_frame) + num_person = 1 + + num_keypoint = 17 + keypoint = np.zeros( + (num_person, num_frame, num_keypoint, 2)) # M T V 2 + keypoint_score = np.zeros( + (num_person, num_frame, num_keypoint)) # M T V + + # pose matching + person_bbox = proposal[i][:4] + area = expand_bbox(person_bbox, h, w) + + for j, poses in enumerate(pose_result): # num_frame + max_iou = float('-inf') + index = -1 + if len(poses) == 0: + continue + for k, per_pose in enumerate(poses): + iou = cal_iou(per_pose['bbox'][:4], area) + if max_iou < iou: + index = k + max_iou = iou + keypoint[0, j] = poses[index]['keypoints'][:, :2] + keypoint_score[0, j] = poses[index]['keypoints'][:, 2] + + fake_anno['keypoint'] = keypoint + fake_anno['keypoint_score'] = keypoint_score + + skeleton_imgs = 
skeleton_pipeline(fake_anno)['imgs'][None] + skeleton_imgs = skeleton_imgs.to(args.device) + + with torch.no_grad(): + output = skeleton_stdet_model( + return_loss=False, imgs=skeleton_imgs) + output = output[0] + for k in range(len(output)): # 81 + if k not in label_map: + continue + if output[k] > args.action_score_thr: + skeleton_prediction[i].append( + (label_map[k], output[k])) + + skeleton_predictions.append(skeleton_prediction) + prog_bar.update() + + return timestamps, skeleton_predictions + + +def rgb_based_stdet(args, frames, label_map, human_detections, w, h, new_w, + new_h, w_ratio, h_ratio): + + rgb_stdet_config = mmcv.Config.fromfile(args.rgb_stdet_config) + rgb_stdet_config.merge_from_dict(args.cfg_options) + + val_pipeline = rgb_stdet_config.data.val.pipeline + sampler = [x for x in val_pipeline if x['type'] == 'SampleAVAFrames'][0] + clip_len, frame_interval = sampler['clip_len'], sampler['frame_interval'] + assert clip_len % 2 == 0, 'We would like to have an even clip_len' + + window_size = clip_len * frame_interval + num_frame = len(frames) + timestamps = np.arange(window_size // 2, num_frame + 1 - window_size // 2, + args.predict_stepsize) + + # Get img_norm_cfg + img_norm_cfg = rgb_stdet_config['img_norm_cfg'] + if 'to_rgb' not in img_norm_cfg and 'to_bgr' in img_norm_cfg: + to_bgr = img_norm_cfg.pop('to_bgr') + img_norm_cfg['to_rgb'] = to_bgr + img_norm_cfg['mean'] = np.array(img_norm_cfg['mean']) + img_norm_cfg['std'] = np.array(img_norm_cfg['std']) + + # Build STDET model + try: + # In our spatiotemporal detection demo, different actions should have + # the same number of bboxes. 
+ rgb_stdet_config['model']['test_cfg']['rcnn']['action_thr'] = .0 + except KeyError: + pass + + rgb_stdet_config.model.backbone.pretrained = None + rgb_stdet_model = build_detector( + rgb_stdet_config.model, test_cfg=rgb_stdet_config.get('test_cfg')) + + load_checkpoint( + rgb_stdet_model, args.rgb_stdet_checkpoint, map_location='cpu') + rgb_stdet_model.to(args.device) + rgb_stdet_model.eval() + + predictions = [] + + print('Performing SpatioTemporal Action Detection for each clip') + prog_bar = mmcv.ProgressBar(len(timestamps)) + for timestamp in timestamps: + proposal = human_detections[timestamp - 1] + + if proposal.shape[0] == 0: + predictions.append(None) + continue + + start_frame = timestamp - (clip_len // 2 - 1) * frame_interval + frame_inds = start_frame + np.arange(0, window_size, frame_interval) + frame_inds = list(frame_inds - 1) + + imgs = [frames[ind].astype(np.float32) for ind in frame_inds] + _ = [mmcv.imnormalize_(img, **img_norm_cfg) for img in imgs] + # THWC -> CTHW -> 1CTHW + input_array = np.stack(imgs).transpose((3, 0, 1, 2))[np.newaxis] + input_tensor = torch.from_numpy(input_array).to(args.device) + + with torch.no_grad(): + result = rgb_stdet_model( + return_loss=False, + img=[input_tensor], + img_metas=[[dict(img_shape=(new_h, new_w))]], + proposals=[[proposal]]) + result = result[0] + prediction = [] + # N proposals + for i in range(proposal.shape[0]): + prediction.append([]) + + # Perform action score thr + for i in range(len(result)): # 80 + if i + 1 not in label_map: + continue + for j in range(proposal.shape[0]): + if result[i][j, 4] > args.action_score_thr: + prediction[j].append((label_map[i + 1], result[i][j, + 4])) + predictions.append(prediction) + prog_bar.update() + + return timestamps, predictions + + +def main(): + args = parse_args() + + frame_paths, original_frames = frame_extraction(args.video) + num_frame = len(frame_paths) + h, w, _ = original_frames[0].shape + + # Get Human detection results and pose results + 
human_detections = detection_inference(args, frame_paths) + pose_results = None + if args.use_skeleton_recog or args.use_skeleton_stdet: + pose_results = pose_inference(args, frame_paths, human_detections) + + # resize frames so that the short side is 256 + new_w, new_h = mmcv.rescale_size((w, h), (256, np.inf)) + frames = [mmcv.imresize(img, (new_w, new_h)) for img in original_frames] + w_ratio, h_ratio = new_w / w, new_h / h + + # Load spatio-temporal detection label_map + stdet_label_map = load_label_map(args.label_map_stdet) + rgb_stdet_config = mmcv.Config.fromfile(args.rgb_stdet_config) + rgb_stdet_config.merge_from_dict(args.cfg_options) + try: + if rgb_stdet_config['data']['train']['custom_classes'] is not None: + stdet_label_map = { + id + 1: stdet_label_map[cls] + for id, cls in enumerate(rgb_stdet_config['data']['train'] + ['custom_classes']) + } + except KeyError: + pass + + action_result = None + if args.use_skeleton_recog: + print('Use skeleton-based recognition') + action_result = skeleton_based_action_recognition( + args, pose_results, num_frame, h, w) + else: + print('Use rgb-based recognition') + action_result = rgb_based_action_recognition(args) + + stdet_preds = None + if args.use_skeleton_stdet: + print('Use skeleton-based SpatioTemporal Action Detection') + clip_len, frame_interval = 30, 1 + timestamps, stdet_preds = skeleton_based_stdet(args, stdet_label_map, + human_detections, + pose_results, num_frame, + clip_len, + frame_interval, h, w) + for i in range(len(human_detections)): + det = human_detections[i] + det[:, 0:4:2] *= w_ratio + det[:, 1:4:2] *= h_ratio + human_detections[i] = torch.from_numpy(det[:, :4]).to(args.device) + + else: + print('Use rgb-based SpatioTemporal Action Detection') + for i in range(len(human_detections)): + det = human_detections[i] + det[:, 0:4:2] *= w_ratio + det[:, 1:4:2] *= h_ratio + human_detections[i] = torch.from_numpy(det[:, :4]).to(args.device) + timestamps, stdet_preds = rgb_based_stdet(args, frames, + 
stdet_label_map, + human_detections, w, h, + new_w, new_h, w_ratio, + h_ratio) + + stdet_results = [] + for timestamp, prediction in zip(timestamps, stdet_preds): + human_detection = human_detections[timestamp - 1] + stdet_results.append( + pack_result(human_detection, prediction, new_h, new_w)) + + def dense_timestamps(timestamps, n): + """Interpolate timestamps so that there are n times as many.""" + old_frame_interval = (timestamps[1] - timestamps[0]) + start = timestamps[0] - old_frame_interval / n * (n - 1) / 2 + new_frame_inds = np.arange( + len(timestamps) * n) * old_frame_interval / n + start + return new_frame_inds.astype(int) + + dense_n = int(args.predict_stepsize / args.output_stepsize) + output_timestamps = dense_timestamps(timestamps, dense_n) + frames = [ + cv2.imread(frame_paths[timestamp - 1]) + for timestamp in output_timestamps + ] + + print('Performing visualization') + pose_model = init_pose_model(args.pose_config, args.pose_checkpoint, + args.device) + + if args.use_skeleton_recog or args.use_skeleton_stdet: + pose_results = [ + pose_results[timestamp - 1] for timestamp in output_timestamps + ] + + vis_frames = visualize(frames, stdet_results, pose_results, action_result, + pose_model) + vid = mpy.ImageSequenceClip([x[:, :, ::-1] for x in vis_frames], + fps=args.output_fps) + vid.write_videofile(args.out_filename) + + tmp_frame_dir = osp.dirname(frame_paths[0]) + shutil.rmtree(tmp_frame_dir) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/demo/faster_rcnn_r50_fpn_2x_coco.py b/openmmlab_test/mmaction2-0.24.1/demo/faster_rcnn_r50_fpn_2x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..2387ce3c7f2d35b557b9afcb5a8d81bceb82f566 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/faster_rcnn_r50_fpn_2x_coco.py @@ -0,0 +1,182 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+# model config +model = dict( + type='FasterRCNN', + pretrained='torchvision://resnet50', + backbone=dict( + type='ResNet', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + frozen_stages=1, + norm_cfg=dict(type='BN', requires_grad=True), + norm_eval=True, + style='pytorch'), + neck=dict( + type='FPN', + in_channels=[256, 512, 1024, 2048], + out_channels=256, + num_outs=5), + rpn_head=dict( + type='RPNHead', + in_channels=256, + feat_channels=256, + anchor_generator=dict( + type='AnchorGenerator', + scales=[8], + ratios=[0.5, 1.0, 2.0], + strides=[4, 8, 16, 32, 64]), + bbox_coder=dict( + type='DeltaXYWHBBoxCoder', + target_means=[.0, .0, .0, .0], + target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + roi_head=dict( + type='StandardRoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor', + roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + out_channels=256, + featmap_strides=[4, 8, 16, 32]), + bbox_head=dict( + type='Shared2FCBBoxHead', + in_channels=256, + fc_out_channels=1024, + roi_feat_size=7, + num_classes=80, + bbox_coder=dict( + type='DeltaXYWHBBoxCoder', + target_means=[0., 0., 0., 0.], + target_stds=[0.1, 0.1, 0.2, 0.2]), + reg_class_agnostic=False, + loss_cls=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + # model training and testing settings + train_cfg=dict( + rpn=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.7, + neg_iou_thr=0.3, + min_pos_iou=0.3, + match_low_quality=True, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=256, + pos_fraction=0.5, + neg_pos_ub=-1, + add_gt_as_proposals=False), + allowed_border=-1, + pos_weight=-1, + debug=False), + rpn_proposal=dict( + nms_pre=2000, + max_per_img=1000, + nms=dict(type='nms', iou_threshold=0.7), + min_bbox_size=0), + rcnn=dict( + assigner=dict( + 
type='MaxIoUAssigner', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0.5, + match_low_quality=False, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=-1, + debug=False)), + test_cfg=dict( + rpn=dict( + nms_pre=1000, + max_per_img=1000, + nms=dict(type='nms', iou_threshold=0.7), + min_bbox_size=0), + rcnn=dict( + score_thr=0.05, + nms=dict(type='nms', iou_threshold=0.5), + max_per_img=100))) + +# dataset config +dataset_type = 'CocoDataset' +data_root = 'data/coco/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', with_bbox=True), + dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), + dict(type='RandomFlip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=32), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(1333, 800), + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=32), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_train2017.json', + img_prefix=data_root + 'train2017/', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_val2017.json', + img_prefix=data_root + 'val2017/', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_val2017.json', + img_prefix=data_root + 'val2017/', + pipeline=test_pipeline)) +evaluation = 
dict(interval=1, metric='bbox') +# Schedule +# optimizer +optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict( + policy='step', + warmup='linear', + warmup_iters=500, + warmup_ratio=0.001, + step=[16, 22]) +total_epochs = 24 +# runtime +checkpoint_config = dict(interval=1) +# yapf:disable +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook'), + ]) +# yapf:enable +dist_params = dict(backend='nccl') +log_level = 'INFO' +load_from = None +resume_from = None +workflow = [('train', 1)] diff --git a/openmmlab_test/mmaction2-0.24.1/demo/fuse/data_list.txt b/openmmlab_test/mmaction2-0.24.1/demo/fuse/data_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..7fa4c0b3cb262331f34acb1d10b5029f82e09d55 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/fuse/data_list.txt @@ -0,0 +1,100 @@ +jf7RDuUTrsQ 300 325 +JTlatknwOrY 301 233 +8UxlDNur-Z0 300 262 +y9r115bgfNk 300 320 +ZnIDviwA8CE 300 244 +c8ln_nWYMyM 300 333 +9GFfKVeoGm0 300 98 +F5Y_gGsg4x8 300 193 +AuqIu3x_lhY 300 36 +1Hi5GMotrjs 300 26 +czhL0iDbNT8 300 46 +DYpTE_n-Wvk 177 208 +R-xmgefs-M4 300 101 +KPP2qRzMdos 300 131 +PmgfU9ocx5A 300 193 +GI7nIyMEQi4 300 173 +A8TIWMvJVDU 300 72 +ustVqWMM56c 300 289 +03dk7mneDU0 300 254 +jqkyelS4GJk 300 279 +a58tBGuDIg0 231 382 +5l1ajLjqaPo 300 226 +-5wLopwbGX0 300 132 +NUG7kwJ-614 300 103 +wHUvw_R2iv8 300 97 +44Mak5_s6Fk 300 256 +y5vsk8Mj-3w 300 77 +TEj_A_BC-aU 300 393 +fUdu6hpMt_c 299 40 +C5Z1sRArUR0 300 254 +-orecnYvpNw 300 284 +Urmbp1ulIXI 300 319 +bLgdi4w7OAk 299 36 +cVv_XMw4W2U 300 27 +dV8JmKwDUzM 300 312 +yZ9hIqW4bRc 300 239 +9ykbMdR9Jss 213 257 +G8fEnqIOkiA 300 158 +6P2eVJ-Qp1g 300 131 +Y-acp_jXG1Q 302 315 +xthWPdx21r8 301 62 +LExCUx4STW0 300 9 +p2UMwzWsY0U 300 248 +c0UI7f3Plro 300 383 +1MmjE51PeIE 300 93 +OU5dJpNHATk 300 342 +38Uv6dbQkWc 281 44 +5ZNdkbmv274 300 59 +DrSL3Uddj6s 300 283 +aNJ1-bvRox8 175 384 +b5U7A_crvE0 194 377 
+xeWO9Bl9aWA 300 86 +Zy8Ta83mrXo 300 223 +AXnDRH7o2DQ 300 146 +fTPDXmcygjw 300 11 +EhRxb8-cNzQ 164 325 +iO8RYYQzNiE 299 191 +XbCncZcXuTI 300 55 +pSCunaRn45A 300 265 +UqI--TBQRgg 300 165 +yD42KW6cm-A 300 186 +VseX7hoxhbM 300 61 +1FEcfy-moBM 300 8 +BUT8oefH9Nw 300 120 +-49tMSUTnZg 300 227 +cZKPTt_FcFs 300 85 +fiKJm0eavfw 300 323 +gJcVljRRxGE 302 87 +de1rSoht9t4 300 253 +UAIJnI7fQYo 300 284 +c4eIDxmVmCw 300 95 +3LGce3efz7M 300 332 +EC8iyn_q-NM 300 92 +eo15donXwmY 300 351 +NsG31u7Pd2Q 300 87 +ILkPWpZYlPE 300 137 +n5ZHSJRZl1U 300 338 +UoQE44FEqLQ 300 260 +5I-4meP_5wY 300 185 +udLMOf77S3U 300 209 +a4Ye18Mnblk 262 172 +QbDMgHWwt_s 236 395 +S6iAYBBMnwk 300 267 +DNMfmNV8Uug 300 131 +AJdp07pp43c 300 293 +tVuop87KbDY 300 103 +o79s5eOAF-c 300 246 +dMt_nuBNdeY 300 168 +RJU9NV1R4Fw 300 128 +Zhux7Vy-hHc 300 82 +47Cj6jwQKjo 300 228 +a7Mc-0lwAuE 300 129 +taZtEzvkg3M 300 264 +bVDZohQJhBI 240 129 +sBJk5li0O5o 216 154 +DQUNZmbQI_g 300 29 +-zpKHNrNsn4 300 244 +Dcz0r8q-sx0 300 249 +hfRKTH9pOMA 165 116 +8CdUbOHDtes 300 222 diff --git a/openmmlab_test/mmaction2-0.24.1/demo/hrnet_w32_coco_256x192.py b/openmmlab_test/mmaction2-0.24.1/demo/hrnet_w32_coco_256x192.py new file mode 100644 index 0000000000000000000000000000000000000000..3806739d45b54d00d4df24478af83a10aca6af48 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/hrnet_w32_coco_256x192.py @@ -0,0 +1,174 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+log_level = 'INFO' +load_from = None +resume_from = None +dist_params = dict(backend='nccl') +workflow = [('train', 1)] +checkpoint_config = dict(interval=10) +evaluation = dict(interval=10, metric='mAP', key_indicator='AP') + +optimizer = dict( + type='Adam', + lr=5e-4, +) +optimizer_config = dict(grad_clip=None) +# learning policy +lr_config = dict( + policy='step', + warmup='linear', + warmup_iters=500, + warmup_ratio=0.001, + step=[170, 200]) +total_epochs = 210 +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook'), + # dict(type='TensorboardLoggerHook') + ]) + +channel_cfg = dict( + num_output_channels=17, + dataset_joints=17, + dataset_channel=[ + [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], + ], + inference_channel=[ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 + ]) + +# model settings +model = dict( + type='TopDown', + pretrained='https://download.openmmlab.com/mmpose/' + 'pretrain_models/hrnet_w32-36af842e.pth', + backbone=dict( + type='HRNet', + in_channels=3, + extra=dict( + stage1=dict( + num_modules=1, + num_branches=1, + block='BOTTLENECK', + num_blocks=(4, ), + num_channels=(64, )), + stage2=dict( + num_modules=1, + num_branches=2, + block='BASIC', + num_blocks=(4, 4), + num_channels=(32, 64)), + stage3=dict( + num_modules=4, + num_branches=3, + block='BASIC', + num_blocks=(4, 4, 4), + num_channels=(32, 64, 128)), + stage4=dict( + num_modules=3, + num_branches=4, + block='BASIC', + num_blocks=(4, 4, 4, 4), + num_channels=(32, 64, 128, 256))), + ), + keypoint_head=dict( + type='TopdownHeatmapSimpleHead', + in_channels=32, + out_channels=channel_cfg['num_output_channels'], + num_deconv_layers=0, + extra=dict(final_conv_kernel=1, ), + loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)), + train_cfg=dict(), + test_cfg=dict( + flip_test=True, + post_process='default', + shift_heatmap=True, + modulate_kernel=11)) + +data_cfg = dict( + image_size=[192, 256], + heatmap_size=[48, 64], + 
num_output_channels=channel_cfg['num_output_channels'], + num_joints=channel_cfg['dataset_joints'], + dataset_channel=channel_cfg['dataset_channel'], + inference_channel=channel_cfg['inference_channel'], + soft_nms=False, + nms_thr=1.0, + oks_thr=0.9, + vis_thr=0.2, + use_gt_bbox=False, + det_bbox_thr=0.0, + bbox_file='data/coco/person_detection_results/' + 'COCO_val2017_detections_AP_H_56_person.json', +) + +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='TopDownRandomFlip', flip_prob=0.5), + dict( + type='TopDownHalfBodyTransform', + num_joints_half_body=8, + prob_half_body=0.3), + dict( + type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5), + dict(type='TopDownAffine'), + dict(type='ToTensor'), + dict( + type='NormalizeTensor', + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]), + dict(type='TopDownGenerateTarget', sigma=2), + dict( + type='Collect', + keys=['img', 'target', 'target_weight'], + meta_keys=[ + 'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', + 'rotation', 'bbox_score', 'flip_pairs' + ]), +] + +val_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='TopDownGetBboxCenterScale', padding=1.25), + dict(type='TopDownAffine'), + dict(type='ToTensor'), + dict( + type='NormalizeTensor', + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]), + dict( + type='Collect', + keys=['img'], + meta_keys=[ + 'image_file', 'center', 'scale', 'rotation', 'bbox_score', + 'flip_pairs' + ]), +] + +test_pipeline = val_pipeline + +data_root = 'data/coco' +data = dict( + samples_per_gpu=64, + workers_per_gpu=2, + val_dataloader=dict(samples_per_gpu=32), + test_dataloader=dict(samples_per_gpu=32), + train=dict( + type='TopDownCocoDataset', + ann_file=f'{data_root}/annotations/person_keypoints_train2017.json', + img_prefix=f'{data_root}/train2017/', + data_cfg=data_cfg, + pipeline=train_pipeline), + val=dict( + type='TopDownCocoDataset', + 
ann_file=f'{data_root}/annotations/person_keypoints_val2017.json', + img_prefix=f'{data_root}/val2017/', + data_cfg=data_cfg, + pipeline=val_pipeline), + test=dict( + type='TopDownCocoDataset', + ann_file=f'{data_root}/annotations/person_keypoints_val2017.json', + img_prefix=f'{data_root}/val2017/', + data_cfg=data_cfg, + pipeline=val_pipeline), +) diff --git a/openmmlab_test/mmaction2-0.24.1/demo/long_video_demo.py b/openmmlab_test/mmaction2-0.24.1/demo/long_video_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..45df202dbb076f9d9d63a5b64c93c6a97b45dff9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/long_video_demo.py @@ -0,0 +1,265 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import json +import random +from collections import deque +from operator import itemgetter + +import cv2 +import mmcv +import numpy as np +import torch +from mmcv import Config, DictAction +from mmcv.parallel import collate, scatter + +from mmaction.apis import init_recognizer +from mmaction.datasets.pipelines import Compose + +FONTFACE = cv2.FONT_HERSHEY_COMPLEX_SMALL +FONTSCALE = 1 +THICKNESS = 1 +LINETYPE = 1 + +EXCLUED_STEPS = [ + 'OpenCVInit', 'OpenCVDecode', 'DecordInit', 'DecordDecode', 'PyAVInit', + 'PyAVDecode', 'RawFrameDecode' +] + + +def parse_args(): + parser = argparse.ArgumentParser( + description='MMAction2 predict different labels in a long video demo') + parser.add_argument('config', help='test config file path') + parser.add_argument('checkpoint', help='checkpoint file/url') + parser.add_argument('video_path', help='video file/url') + parser.add_argument('label', help='label file') + parser.add_argument('out_file', help='output result file in video/json') + parser.add_argument( + '--input-step', + type=int, + default=1, + help='input step for sampling frames') + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + parser.add_argument( + '--threshold', + type=float, + 
default=0.01, + help='recognition score threshold') + parser.add_argument( + '--stride', + type=float, + default=0, + help=('the prediction stride equals to stride * sample_length ' + '(sample_length indicates the size of temporal window from ' + 'which you sample frames, which equals to ' + 'clip_len x frame_interval), if set as 0, the ' + 'prediction stride is 1')) + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + parser.add_argument( + '--label-color', + nargs='+', + type=int, + default=(255, 255, 255), + help='font color (B, G, R) of the labels in output video') + parser.add_argument( + '--msg-color', + nargs='+', + type=int, + default=(128, 128, 128), + help='font color (B, G, R) of the messages in output video') + args = parser.parse_args() + return args + + +def show_results_video(result_queue, + text_info, + thr, + msg, + frame, + video_writer, + label_color=(255, 255, 255), + msg_color=(128, 128, 128)): + if len(result_queue) != 0: + text_info = {} + results = result_queue.popleft() + for i, result in enumerate(results): + selected_label, score = result + if score < thr: + break + location = (0, 40 + i * 20) + text = selected_label + ': ' + str(round(score, 2)) + text_info[location] = text + cv2.putText(frame, text, location, FONTFACE, FONTSCALE, + label_color, THICKNESS, LINETYPE) + elif len(text_info): + for location, text in text_info.items(): + cv2.putText(frame, text, location, FONTFACE, FONTSCALE, + label_color, THICKNESS, LINETYPE) + else: + cv2.putText(frame, msg, (0, 40), FONTFACE, FONTSCALE, msg_color, + THICKNESS, LINETYPE) + video_writer.write(frame) + return text_info + + +def get_results_json(result_queue, text_info, thr, msg, ind, out_json): + if len(result_queue) != 0: + text_info = {} + 
results = result_queue.popleft() + for i, result in enumerate(results): + selected_label, score = result + if score < thr: + break + text_info[i + 1] = selected_label + ': ' + str(round(score, 2)) + out_json[ind] = text_info + elif len(text_info): + out_json[ind] = text_info + else: + out_json[ind] = msg + return text_info, out_json + + +def show_results(model, data, label, args): + frame_queue = deque(maxlen=args.sample_length) + result_queue = deque(maxlen=1) + + cap = cv2.VideoCapture(args.video_path) + num_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) + frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) + frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = cap.get(cv2.CAP_PROP_FPS) + + msg = 'Preparing action recognition ...' + text_info = {} + out_json = {} + fourcc = cv2.VideoWriter_fourcc(*'mp4v') + frame_size = (frame_width, frame_height) + + ind = 0 + video_writer = None if args.out_file.endswith('.json') \ + else cv2.VideoWriter(args.out_file, fourcc, fps, frame_size) + prog_bar = mmcv.ProgressBar(num_frames) + backup_frames = [] + + while ind < num_frames: + ind += 1 + prog_bar.update() + ret, frame = cap.read() + if frame is None: + # skip frames that failed to decode (frame is None) + continue + backup_frames.append(np.array(frame)[:, :, ::-1]) + if ind == args.sample_length: + # provide a quick show at the beginning + frame_queue.extend(backup_frames) + backup_frames = [] + elif ((len(backup_frames) == args.input_step + and ind > args.sample_length) or ind == num_frames): + # pick a frame from the backup + # when the backup is full or the last frame is reached + chosen_frame = random.choice(backup_frames) + backup_frames = [] + frame_queue.append(chosen_frame) + + ret, scores = inference(model, data, args, frame_queue) + + if ret: + num_selected_labels = min(len(label), 5) + scores_tuples = tuple(zip(label, scores)) + scores_sorted = sorted( + scores_tuples, key=itemgetter(1), reverse=True) + results = scores_sorted[:num_selected_labels] + 
result_queue.append(results) + + if args.out_file.endswith('.json'): + text_info, out_json = get_results_json(result_queue, text_info, + args.threshold, msg, ind, + out_json) + else: + text_info = show_results_video(result_queue, text_info, + args.threshold, msg, frame, + video_writer, args.label_color, + args.msg_color) + + cap.release() + cv2.destroyAllWindows() + if args.out_file.endswith('.json'): + with open(args.out_file, 'w') as js: + json.dump(out_json, js) + + +def inference(model, data, args, frame_queue): + if len(frame_queue) != args.sample_length: + # Skip inference until the queue holds enough frames + return False, None + + cur_windows = list(np.array(frame_queue)) + if data['img_shape'] is None: + data['img_shape'] = frame_queue[0].shape[:2] + + cur_data = data.copy() + cur_data['imgs'] = cur_windows + cur_data = args.test_pipeline(cur_data) + cur_data = collate([cur_data], samples_per_gpu=1) + if next(model.parameters()).is_cuda: + cur_data = scatter(cur_data, [args.device])[0] + with torch.no_grad(): + scores = model(return_loss=False, **cur_data)[0] + + if args.stride > 0: + pred_stride = int(args.sample_length * args.stride) + for _ in range(pred_stride): + frame_queue.popleft() + + # when ``args.stride=0``, the bounded deque (maxlen) + # drops its oldest element automatically on the next append + + return True, scores + + +def main(): + args = parse_args() + + args.device = torch.device(args.device) + + cfg = Config.fromfile(args.config) + cfg.merge_from_dict(args.cfg_options) + + model = init_recognizer(cfg, args.checkpoint, device=args.device) + data = dict(img_shape=None, modality='RGB', label=-1) + with open(args.label, 'r') as f: + label = [line.strip() for line in f] + + # prepare test pipeline from non-camera pipeline + cfg = model.cfg + sample_length = 0 + pipeline = cfg.data.test.pipeline + pipeline_ = pipeline.copy() + for step in pipeline: + if 'SampleFrames' in step['type']: + sample_length = step['clip_len'] * step['num_clips'] + data['num_clips'] = step['num_clips'] + 
data['clip_len'] = step['clip_len'] + pipeline_.remove(step) + if step['type'] in EXCLUED_STEPS: + # remove step to decode frames + pipeline_.remove(step) + test_pipeline = Compose(pipeline_) + + assert sample_length > 0 + args.sample_length = sample_length + args.test_pipeline = test_pipeline + + show_results(model, data, label, args) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/demo/mmaction2_tutorial.ipynb b/openmmlab_test/mmaction2-0.24.1/demo/mmaction2_tutorial.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..8dea211bbb5cf893d72b10cc9b0cb8d511cdb90d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/mmaction2_tutorial.ipynb @@ -0,0 +1,1461 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "view-in-github" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VcjSRFELVbNk" + }, + "source": [ + "# MMAction2 Tutorial\n", + "\n", + "Welcome to MMAction2! This is the official colab tutorial for using MMAction2. In this tutorial, you will learn\n", + "- Perform inference with a MMAction2 recognizer.\n", + "- Train a new recognizer with a new dataset.\n", + "- Perform spatio-temporal detection.\n", + "\n", + "Let's start!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7LqHGkGEVqpm" + }, + "source": [ + "## Install MMAction2" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Bf8PpPXtVvmg", + "outputId": "75519a17-cc0a-491f-98a1-f287b090cf82" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "nvcc: NVIDIA (R) Cuda compiler driver\n", + "Copyright (c) 2005-2020 NVIDIA Corporation\n", + "Built on Mon_Oct_12_20:09:46_PDT_2020\n", + "Cuda compilation tools, release 11.1, V11.1.105\n", + "Build cuda_11.1.TC455_06.29190527_0\n", + "gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0\n", + "Copyright (C) 2017 Free Software Foundation, Inc.\n", + "This is free software; see the source for copying conditions. There is NO\n", + "warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n", + "\n" + ] + } + ], + "source": [ + "# Check nvcc version\n", + "!nvcc -V\n", + "# Check GCC version\n", + "!gcc --version" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5PAJ4ArzV5Ry", + "outputId": "992b30c2-8281-4198-97c8-df2a287b0ae8" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Looking in links: https://download.pytorch.org/whl/torch_stable.html\n", + "Collecting torch==1.8.0+cu101\n", + " Downloading https://download.pytorch.org/whl/cu101/torch-1.8.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (763.5 MB)\n", + "\u001B[K |████████████████████████████████| 763.5 MB 15 kB/s \n", + "\u001B[?25hCollecting torchvision==0.9.0+cu101\n", + " Downloading https://download.pytorch.org/whl/cu101/torchvision-0.9.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (17.3 MB)\n", + "\u001B[K |████████████████████████████████| 17.3 MB 983 kB/s \n", + "\u001B[?25hCollecting torchtext==0.9.0\n", + " Downloading torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl 
(7.1 MB)\n", + "\u001B[K |████████████████████████████████| 7.1 MB 10.9 MB/s \n", + "\u001B[?25hCollecting torchaudio==0.8.0\n", + " Downloading torchaudio-0.8.0-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)\n", + "\u001B[K |████████████████████████████████| 1.9 MB 46.6 MB/s \n", + "\u001B[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torch==1.8.0+cu101) (1.21.5)\n", + "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch==1.8.0+cu101) (3.10.0.2)\n", + "Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from torchvision==0.9.0+cu101) (7.1.2)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from torchtext==0.9.0) (4.62.3)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from torchtext==0.9.0) (2.23.0)\n", + "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (1.24.3)\n", + "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (2.10)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (2021.10.8)\n", + "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (3.0.4)\n", + "Installing collected packages: torch, torchvision, torchtext, torchaudio\n", + " Attempting uninstall: torch\n", + " Found existing installation: torch 1.10.0+cu111\n", + " Uninstalling torch-1.10.0+cu111:\n", + " Successfully uninstalled torch-1.10.0+cu111\n", + " Attempting uninstall: torchvision\n", + " Found existing installation: torchvision 0.11.1+cu111\n", + " Uninstalling torchvision-0.11.1+cu111:\n", + " Successfully uninstalled torchvision-0.11.1+cu111\n", + " Attempting uninstall: 
torchtext\n", + " Found existing installation: torchtext 0.11.0\n", + " Uninstalling torchtext-0.11.0:\n", + " Successfully uninstalled torchtext-0.11.0\n", + " Attempting uninstall: torchaudio\n", + " Found existing installation: torchaudio 0.10.0+cu111\n", + " Uninstalling torchaudio-0.10.0+cu111:\n", + " Successfully uninstalled torchaudio-0.10.0+cu111\n", + "Successfully installed torch-1.8.0+cu101 torchaudio-0.8.0 torchtext-0.9.0 torchvision-0.9.0+cu101\n", + "Looking in links: https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/index.html\n", + "Collecting mmcv-full\n", + " Downloading https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/mmcv_full-1.4.5-cp37-cp37m-manylinux1_x86_64.whl (60.7 MB)\n", + "\u001B[K |████████████████████████████████| 60.7 MB 66 kB/s \n", + "\u001B[?25hCollecting addict\n", + " Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)\n", + "Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (7.1.2)\n", + "Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (3.13)\n", + "Collecting yapf\n", + " Downloading yapf-0.32.0-py2.py3-none-any.whl (190 kB)\n", + "\u001B[K |████████████████████████████████| 190 kB 15.6 MB/s \n", + "\u001B[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (1.21.5)\n", + "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (21.3)\n", + "Requirement already satisfied: opencv-python>=3 in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (4.1.2.30)\n", + "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->mmcv-full) (3.0.7)\n", + "Installing collected packages: yapf, addict, mmcv-full\n", + "Successfully installed addict-2.4.0 mmcv-full-1.4.5 yapf-0.32.0\n", + "Cloning into 'mmaction2'...\n", + "remote: Enumerating objects: 15036, done.\u001B[K\n", + 
"remote: Counting objects: 100% (233/233), done.\u001B[K\n", + "remote: Compressing objects: 100% (192/192), done.\u001B[K\n", + "remote: Total 15036 (delta 86), reused 72 (delta 41), pack-reused 14803\u001B[K\n", + "Receiving objects: 100% (15036/15036), 49.25 MiB | 25.23 MiB/s, done.\n", + "Resolving deltas: 100% (10608/10608), done.\n", + "/content/mmaction2\n", + "Obtaining file:///content/mmaction2\n", + "Collecting decord>=0.4.1\n", + " Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl (13.6 MB)\n", + "\u001B[K |████████████████████████████████| 13.6 MB 10.2 MB/s \n", + "\u001B[?25hCollecting einops\n", + " Downloading einops-0.4.0-py3-none-any.whl (28 kB)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (3.2.2)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (1.21.5)\n", + "Requirement already satisfied: opencv-contrib-python in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (4.1.2.30)\n", + "Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (7.1.2)\n", + "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (1.4.1)\n", + "Requirement already satisfied: torch>=1.3 in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (1.8.0+cu101)\n", + "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.3->mmaction2==0.21.0) (3.10.0.2)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.21.0) (1.3.2)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.21.0) (2.8.2)\n", + "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from 
matplotlib->mmaction2==0.21.0) (3.0.7)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.21.0) (0.11.0)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->mmaction2==0.21.0) (1.15.0)\n", + "Installing collected packages: einops, decord, mmaction2\n", + " Running setup.py develop for mmaction2\n", + "Successfully installed decord-0.6.0 einops-0.4.0 mmaction2-0.21.0\n", + "Collecting av\n", + " Downloading av-8.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36.1 MB)\n", + "\u001B[K |████████████████████████████████| 36.1 MB 298 kB/s \n", + "\u001B[?25hRequirement already satisfied: imgaug in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 2)) (0.2.9)\n", + "Requirement already satisfied: librosa in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 3)) (0.8.1)\n", + "Requirement already satisfied: lmdb in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 4)) (0.99)\n", + "Requirement already satisfied: moviepy in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 5)) (0.2.3.5)\n", + "Collecting onnx\n", + " Downloading onnx-1.11.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (12.8 MB)\n", + "\u001B[K |████████████████████████████████| 12.8 MB 52.3 MB/s \n", + "\u001B[?25hCollecting onnxruntime\n", + " Downloading onnxruntime-1.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB)\n", + "\u001B[K |████████████████████████████████| 4.9 MB 51.6 MB/s \n", + "\u001B[?25hCollecting pims\n", + " Downloading PIMS-0.5.tar.gz (85 kB)\n", + "\u001B[K |████████████████████████████████| 85 kB 5.2 MB/s \n", + "\u001B[?25hCollecting PyTurboJPEG\n", + " Downloading PyTurboJPEG-1.6.5.tar.gz (11 kB)\n", + "Collecting timm\n", + " Downloading timm-0.5.4-py3-none-any.whl 
(431 kB)\n", + "\u001B[K |████████████████████████████████| 431 kB 64.7 MB/s \n", + "\u001B[?25hRequirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (7.1.2)\n", + "Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (1.21.5)\n", + "Requirement already satisfied: scikit-image>=0.11.0 in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (0.18.3)\n", + "Requirement already satisfied: imageio in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (2.4.1)\n", + "Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (4.1.2.30)\n", + "Requirement already satisfied: Shapely in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (1.8.0)\n", + "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (1.4.1)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (3.2.2)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (1.15.0)\n", + "Requirement already satisfied: PyWavelets>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 2)) (1.2.0)\n", + "Requirement already satisfied: tifffile>=2019.7.26 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 2)) (2021.11.2)\n", + "Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 2)) 
(2.6.3)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 2)) (1.3.2)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 2)) (0.11.0)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 2)) (2.8.2)\n", + "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 2)) (3.0.7)\n", + "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (21.3)\n", + "Requirement already satisfied: numba>=0.43.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (0.51.2)\n", + "Requirement already satisfied: resampy>=0.2.2 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (0.2.2)\n", + "Requirement already satisfied: decorator>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (4.4.2)\n", + "Requirement already satisfied: soundfile>=0.10.2 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (0.10.3.post1)\n", + "Requirement already satisfied: scikit-learn!=0.19.0,>=0.14.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (1.0.2)\n", + "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (1.1.0)\n", + "Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (1.6.0)\n", + 
"Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (2.1.9)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->librosa->-r requirements/optional.txt (line 3)) (57.4.0)\n", + "Requirement already satisfied: llvmlite<0.35,>=0.34.0.dev0 in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->librosa->-r requirements/optional.txt (line 3)) (0.34.0)\n", + "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (2.23.0)\n", + "Requirement already satisfied: appdirs>=1.3.0 in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (1.4.4)\n", + "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (3.0.4)\n", + "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (1.24.3)\n", + "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (2.10)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (2021.10.8)\n", + "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn!=0.19.0,>=0.14.0->librosa->-r requirements/optional.txt (line 3)) (3.1.0)\n", + "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.7/dist-packages (from soundfile>=0.10.2->librosa->-r requirements/optional.txt (line 3)) 
(1.15.0)\n", + "Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi>=1.0->soundfile>=0.10.2->librosa->-r requirements/optional.txt (line 3)) (2.21)\n", + "Requirement already satisfied: tqdm<5.0,>=4.11.2 in /usr/local/lib/python3.7/dist-packages (from moviepy->-r requirements/optional.txt (line 5)) (4.62.3)\n", + "Requirement already satisfied: typing-extensions>=3.6.2.1 in /usr/local/lib/python3.7/dist-packages (from onnx->-r requirements/optional.txt (line 6)) (3.10.0.2)\n", + "Requirement already satisfied: protobuf>=3.12.2 in /usr/local/lib/python3.7/dist-packages (from onnx->-r requirements/optional.txt (line 6)) (3.17.3)\n", + "Requirement already satisfied: flatbuffers in /usr/local/lib/python3.7/dist-packages (from onnxruntime->-r requirements/optional.txt (line 7)) (2.0)\n", + "Collecting slicerator>=0.9.8\n", + " Downloading slicerator-1.0.0-py3-none-any.whl (9.3 kB)\n", + "Requirement already satisfied: torch>=1.4 in /usr/local/lib/python3.7/dist-packages (from timm->-r requirements/optional.txt (line 10)) (1.8.0+cu101)\n", + "Requirement already satisfied: torchvision in /usr/local/lib/python3.7/dist-packages (from timm->-r requirements/optional.txt (line 10)) (0.9.0+cu101)\n", + "Building wheels for collected packages: pims, PyTurboJPEG\n", + " Building wheel for pims (setup.py) ... \u001B[?25l\u001B[?25hdone\n", + " Created wheel for pims: filename=PIMS-0.5-py3-none-any.whl size=84325 sha256=acdeb0697c66e2b9cc49a549f9a3c67a35b36642e6724eeac9795e25e6d9de47\n", + " Stored in directory: /root/.cache/pip/wheels/75/02/a9/86571c38081ba4c1832eb95430b5d588dfa15a738e2a603737\n", + " Building wheel for PyTurboJPEG (setup.py) ... 
\u001B[?25l\u001B[?25hdone\n", + " Created wheel for PyTurboJPEG: filename=PyTurboJPEG-1.6.5-py3-none-any.whl size=12160 sha256=b5fffd01e16b4d2a1d2f4e1cd976501c1e3ea1b3872f91bf595f6c025735a4e0\n", + " Stored in directory: /root/.cache/pip/wheels/1b/6a/97/17286b24cd97dda462b5a886107f8663f1ccc7705f148b3850\n", + "Successfully built pims PyTurboJPEG\n", + "Installing collected packages: slicerator, timm, PyTurboJPEG, pims, onnxruntime, onnx, av\n", + "Successfully installed PyTurboJPEG-1.6.5 av-8.1.0 onnx-1.11.0 onnxruntime-1.10.0 pims-0.5 slicerator-1.0.0 timm-0.5.4\n" + ] + } + ], + "source": [ + "# install dependencies: (use cu111 because colab has CUDA 11.1)\n", + "!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html\n", + "\n", + "# install mmcv-full thus we could use CUDA operators\n", + "!pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html\n", + "\n", + "# Install mmaction2\n", + "!rm -rf mmaction2\n", + "!git clone https://github.com/open-mmlab/mmaction2.git\n", + "%cd mmaction2\n", + "\n", + "!pip install -e .\n", + "\n", + "# Install some optional requirements\n", + "!pip install -r requirements/optional.txt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "No_zZAFpWC-a", + "outputId": "1f5dd76e-7749-4fc3-ee97-83c5e1700f29" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.8.0+cu101 True\n", + "0.21.0\n", + "10.1\n", + "GCC 7.3\n" + ] + } + ], + "source": [ + "# Check Pytorch installation\n", + "import torch, torchvision\n", + "print(torch.__version__, torch.cuda.is_available())\n", + "\n", + "# Check MMAction2 installation\n", + "import mmaction\n", + "print(mmaction.__version__)\n", + "\n", + "# Check MMCV installation\n", + "from mmcv.ops import get_compiling_cuda_version, get_compiler_version\n", + 
"print(get_compiling_cuda_version())\n", + "print(get_compiler_version())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pXf7oV5DWdab" + }, + "source": [ + "## Perform inference with a MMAction2 recognizer\n", + "MMAction2 already provides high level APIs to do inference and training." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "64CW6d_AaT-Q", + "outputId": "d08bfb9b-ab1e-451b-d3b2-89023a59766b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2021-07-11 12:44:00-- https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth\n", + "Resolving download.openmmlab.com (download.openmmlab.com)... 47.88.36.78\n", + "Connecting to download.openmmlab.com (download.openmmlab.com)|47.88.36.78|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 97579339 (93M) [application/octet-stream]\n", + "Saving to: ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’\n", + "\n", + "checkpoints/tsn_r50 100%[===================>] 93.06M 11.4MB/s in 8.1s \n", + "\n", + "2021-07-11 12:44:09 (11.4 MB/s) - ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’ saved [97579339/97579339]\n", + "\n" + ] + } + ], + "source": [ + "!mkdir checkpoints\n", + "!wget -c https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \\\n", + " -O checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "HNZB7NoSabzj", + "outputId": "b2f9bd71-1490-44d3-81c6-5037d804f0b1" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Use load_from_local loader\n" + ] + } + ], + "source": [ + "from mmaction.apis import inference_recognizer, init_recognizer\n", + "\n", + "# Choose to use a config and initialize the recognizer\n", + "config = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'\n", + "# Setup a checkpoint file to load\n", + "checkpoint = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", + "# Initialize the recognizer\n", + "model = init_recognizer(config, checkpoint, device='cuda:0')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rEMsBnpHapAn" + }, + "outputs": [], + "source": [ + "# Use the recognizer to do inference\n", + "video = 'demo/demo.mp4'\n", + "label = 'tools/data/kinetics/label_map_k400.txt'\n", + "results = inference_recognizer(model, video)\n", + "\n", + "labels = open(label).readlines()\n", + "labels = [x.strip() for x in labels]\n", + "results = [(labels[k[0]], k[1]) for k in results]" + ] + }, + { 
+ "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "NIyJXqfWathq", + "outputId": "ca24528b-f99d-414a-fa50-456f6068b463" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "arm wrestling: 29.616438\n", + "rock scissors paper: 10.754841\n", + "shaking hands: 9.908401\n", + "clapping: 9.189913\n", + "massaging feet: 8.305307\n" + ] + } + ], + "source": [ + "# Let's show the results\n", + "for result in results:\n", + " print(f'{result[0]}: ', result[1])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QuZG8kZ2fJ5d" + }, + "source": [ + "## Train a recognizer on a customized dataset\n", + "\n", + "To train a new recognizer, there are usually three things to do:\n", + "1. Support a new dataset\n", + "2. Modify the config\n", + "3. Train a new recognizer" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "neEFyxChfgiJ" + }, + "source": [ + "### Support a new dataset\n", + "\n", + "In this tutorial, we give an example of converting the data into the format of an existing dataset. Other methods and more advanced usage can be found in the [doc](/docs/tutorials/new_dataset.md).\n", + "\n", + "First, let's download a tiny dataset obtained from [Kinetics-400](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). We select 30 videos with their labels as the training set and 10 videos with their labels as the test set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "gjsUj9JzgUlJ", + "outputId": "61c4704d-db81-4ca5-ed16-e2454dbdfe8e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "rm: cannot remove 'kinetics400_tiny.zip*': No such file or directory\n", + "--2021-07-11 12:44:29-- https://download.openmmlab.com/mmaction/kinetics400_tiny.zip\n", + "Resolving download.openmmlab.com (download.openmmlab.com)... 47.88.36.78\n", + "Connecting to download.openmmlab.com (download.openmmlab.com)|47.88.36.78|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 18308682 (17M) [application/zip]\n", + "Saving to: ‘kinetics400_tiny.zip’\n", + "\n", + "kinetics400_tiny.zi 100%[===================>] 17.46M 10.7MB/s in 1.6s \n", + "\n", + "2021-07-11 12:44:31 (10.7 MB/s) - ‘kinetics400_tiny.zip’ saved [18308682/18308682]\n", + "\n" + ] + } + ], + "source": [ + "# download, decompress the data\n", + "!rm kinetics400_tiny.zip*\n", + "!rm -rf kinetics400_tiny\n", + "!wget https://download.openmmlab.com/mmaction/kinetics400_tiny.zip\n", + "!unzip kinetics400_tiny.zip > /dev/null" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "AbZ-o7V6hNw4", + "outputId": "b091909c-def2-49b5-88c2-01b00802b162" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Reading package lists...\n", + "Building dependency tree...\n", + "Reading state information...\n", + "The following NEW packages will be installed:\n", + " tree\n", + "0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.\n", + "Need to get 40.7 kB of archives.\n", + "After this operation, 105 kB of additional disk space will be used.\n", + "Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]\n", + "Fetched 40.7 kB in 
0s (88.7 kB/s)\n", + "Selecting previously unselected package tree.\n", + "(Reading database ... 160815 files and directories currently installed.)\n", + "Preparing to unpack .../tree_1.7.0-5_amd64.deb ...\n", + "Unpacking tree (1.7.0-5) ...\n", + "Setting up tree (1.7.0-5) ...\n", + "Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\n", + "kinetics400_tiny\n", + "├── kinetics_tiny_train_video.txt\n", + "├── kinetics_tiny_val_video.txt\n", + "├── train\n", + "│   ├── 27_CSXByd3s.mp4\n", + "│   ├── 34XczvTaRiI.mp4\n", + "│   ├── A-wiliK50Zw.mp4\n", + "│   ├── D32_1gwq35E.mp4\n", + "│   ├── D92m0HsHjcQ.mp4\n", + "│   ├── DbX8mPslRXg.mp4\n", + "│   ├── FMlSTTpN3VY.mp4\n", + "│   ├── h10B9SVE-nk.mp4\n", + "│   ├── h2YqqUhnR34.mp4\n", + "│   ├── iRuyZSKhHRg.mp4\n", + "│   ├── IyfILH9lBRo.mp4\n", + "│   ├── kFC3KY2bOP8.mp4\n", + "│   ├── LvcFDgCAXQs.mp4\n", + "│   ├── O46YA8tI530.mp4\n", + "│   ├── oMrZaozOvdQ.mp4\n", + "│   ├── oXy-e_P_cAI.mp4\n", + "│   ├── P5M-hAts7MQ.mp4\n", + "│   ├── phDqGd0NKoo.mp4\n", + "│   ├── PnOe3GZRVX8.mp4\n", + "│   ├── R8HXQkdgKWA.mp4\n", + "│   ├── RqnKtCEoEcA.mp4\n", + "│   ├── soEcZZsBmDs.mp4\n", + "│   ├── TkkZPZHbAKA.mp4\n", + "│   ├── T_TMNGzVrDk.mp4\n", + "│   ├── WaS0qwP46Us.mp4\n", + "│   ├── Wh_YPQdH1Zg.mp4\n", + "│   ├── WWP5HZJsg-o.mp4\n", + "│   ├── xGY2dP0YUjA.mp4\n", + "│   ├── yLC9CtWU5ws.mp4\n", + "│   └── ZQV4U2KQ370.mp4\n", + "└── val\n", + " ├── 0pVGiAU6XEA.mp4\n", + " ├── AQrbRSnRt8M.mp4\n", + " ├── b6Q_b7vgc7Q.mp4\n", + " ├── ddvJ6-faICE.mp4\n", + " ├── IcLztCtvhb8.mp4\n", + " ├── ik4BW3-SCts.mp4\n", + " ├── jqRrH30V0k4.mp4\n", + " ├── SU_x2LQqSLs.mp4\n", + " ├── u4Rm6srmIS8.mp4\n", + " └── y5Iu7XkTqV0.mp4\n", + "\n", + "2 directories, 42 files\n" + ] + } + ], + "source": [ + "# Check the directory structure of the tiny data\n", + "\n", + "# Install tree first\n", + "!apt-get -q install tree\n", + "!tree kinetics400_tiny" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + 
"base_uri": "https://localhost:8080/" + }, + "id": "fTdi6dI0hY3g", + "outputId": "ffda0997-8d77-431a-d66e-2f273e80c756" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "D32_1gwq35E.mp4 0\n", + "iRuyZSKhHRg.mp4 1\n", + "oXy-e_P_cAI.mp4 0\n", + "34XczvTaRiI.mp4 1\n", + "h2YqqUhnR34.mp4 0\n", + "O46YA8tI530.mp4 0\n", + "kFC3KY2bOP8.mp4 1\n", + "WWP5HZJsg-o.mp4 1\n", + "phDqGd0NKoo.mp4 1\n", + "yLC9CtWU5ws.mp4 0\n", + "27_CSXByd3s.mp4 1\n", + "IyfILH9lBRo.mp4 1\n", + "T_TMNGzVrDk.mp4 1\n", + "TkkZPZHbAKA.mp4 0\n", + "PnOe3GZRVX8.mp4 1\n", + "soEcZZsBmDs.mp4 1\n", + "FMlSTTpN3VY.mp4 1\n", + "WaS0qwP46Us.mp4 0\n", + "A-wiliK50Zw.mp4 1\n", + "oMrZaozOvdQ.mp4 1\n", + "ZQV4U2KQ370.mp4 0\n", + "DbX8mPslRXg.mp4 1\n", + "h10B9SVE-nk.mp4 1\n", + "P5M-hAts7MQ.mp4 0\n", + "R8HXQkdgKWA.mp4 0\n", + "D92m0HsHjcQ.mp4 0\n", + "RqnKtCEoEcA.mp4 0\n", + "LvcFDgCAXQs.mp4 0\n", + "xGY2dP0YUjA.mp4 0\n", + "Wh_YPQdH1Zg.mp4 0\n" + ] + } + ], + "source": [ + "# After downloading the data, we need to check the annotation format\n", + "!cat kinetics400_tiny/kinetics_tiny_train_video.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0bq0mxmEi29H" + }, + "source": [ + "According to the format defined in [`VideoDataset`](./datasets/video_dataset.py), each line indicates a sample video with the filepath and label, which are split with a whitespace." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ht_DGJA9jQar" + }, + "source": [ + "### Modify the config\n", + "\n", + "In the next step, we need to modify the config for the training.\n", + "To accelerate the process, we finetune a recognizer using a pre-trained recognizer." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LjCcmCKOjktc" + }, + "outputs": [], + "source": [ + "from mmcv import Config\n", + "cfg = Config.fromfile('./configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tc8YhFFGjp3e" + }, + "source": [ + "Given a config that trains a TSN model on kinetics400-full dataset, we need to modify some values to use it for training TSN on Kinetics400-tiny dataset.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "tlhu9byjjt-K", + "outputId": "3b9a3c49-ace0-41d3-dd15-d6c8579755f8" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Config:\n", + "model = dict(\n", + " type='Recognizer2D',\n", + " backbone=dict(\n", + " type='ResNet',\n", + " pretrained='torchvision://resnet50',\n", + " depth=50,\n", + " norm_eval=False),\n", + " cls_head=dict(\n", + " type='TSNHead',\n", + " num_classes=2,\n", + " in_channels=2048,\n", + " spatial_type='avg',\n", + " consensus=dict(type='AvgConsensus', dim=1),\n", + " dropout_ratio=0.4,\n", + " init_std=0.01),\n", + " train_cfg=None,\n", + " test_cfg=dict(average_clips=None))\n", + "optimizer = dict(type='SGD', lr=7.8125e-05, momentum=0.9, weight_decay=0.0001)\n", + "optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))\n", + "lr_config = dict(policy='step', step=[40, 80])\n", + "total_epochs = 10\n", + "checkpoint_config = dict(interval=5)\n", + "log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')])\n", + "dist_params = dict(backend='nccl')\n", + "log_level = 'INFO'\n", + "load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", + "resume_from = None\n", + "workflow = [('train', 1)]\n", + "dataset_type = 'VideoDataset'\n", + "data_root = 'kinetics400_tiny/train/'\n", + "data_root_val = 
'kinetics400_tiny/val/'\n", + "ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", + "ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "ann_file_test = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "img_norm_cfg = dict(\n", + " mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)\n", + "train_pipeline = [\n", + " dict(type='DecordInit'),\n", + " dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),\n", + " dict(type='DecordDecode'),\n", + " dict(\n", + " type='MultiScaleCrop',\n", + " input_size=224,\n", + " scales=(1, 0.875, 0.75, 0.66),\n", + " random_crop=False,\n", + " max_wh_scale_gap=1),\n", + " dict(type='Resize', scale=(224, 224), keep_ratio=False),\n", + " dict(type='Flip', flip_ratio=0.5),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs', 'label'])\n", + "]\n", + "val_pipeline = [\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames',\n", + " clip_len=1,\n", + " frame_interval=1,\n", + " num_clips=8,\n", + " test_mode=True),\n", + " dict(type='DecordDecode'),\n", + " dict(type='Resize', scale=(-1, 256)),\n", + " dict(type='CenterCrop', crop_size=224),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs'])\n", + "]\n", + "test_pipeline = [\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames',\n", + " clip_len=1,\n", + " frame_interval=1,\n", + " num_clips=25,\n", + " test_mode=True),\n", + " dict(type='DecordDecode'),\n", + " 
dict(type='Resize', scale=(-1, 256)),\n", + " dict(type='ThreeCrop', crop_size=256),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs'])\n", + "]\n", + "data = dict(\n", + " videos_per_gpu=2,\n", + " workers_per_gpu=2,\n", + " train=dict(\n", + " type='VideoDataset',\n", + " ann_file='kinetics400_tiny/kinetics_tiny_train_video.txt',\n", + " data_prefix='kinetics400_tiny/train/',\n", + " pipeline=[\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames', clip_len=1, frame_interval=1,\n", + " num_clips=8),\n", + " dict(type='DecordDecode'),\n", + " dict(\n", + " type='MultiScaleCrop',\n", + " input_size=224,\n", + " scales=(1, 0.875, 0.75, 0.66),\n", + " random_crop=False,\n", + " max_wh_scale_gap=1),\n", + " dict(type='Resize', scale=(224, 224), keep_ratio=False),\n", + " dict(type='Flip', flip_ratio=0.5),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs', 'label'])\n", + " ]),\n", + " val=dict(\n", + " type='VideoDataset',\n", + " ann_file='kinetics400_tiny/kinetics_tiny_val_video.txt',\n", + " data_prefix='kinetics400_tiny/val/',\n", + " pipeline=[\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames',\n", + " clip_len=1,\n", + " frame_interval=1,\n", + " num_clips=8,\n", + " test_mode=True),\n", + " dict(type='DecordDecode'),\n", + " dict(type='Resize', scale=(-1, 256)),\n", + " dict(type='CenterCrop', crop_size=224),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 
57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs'])\n", + " ]),\n", + " test=dict(\n", + " type='VideoDataset',\n", + " ann_file='kinetics400_tiny/kinetics_tiny_val_video.txt',\n", + " data_prefix='kinetics400_tiny/val/',\n", + " pipeline=[\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames',\n", + " clip_len=1,\n", + " frame_interval=1,\n", + " num_clips=25,\n", + " test_mode=True),\n", + " dict(type='DecordDecode'),\n", + " dict(type='Resize', scale=(-1, 256)),\n", + " dict(type='ThreeCrop', crop_size=256),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs'])\n", + " ]))\n", + "evaluation = dict(\n", + " interval=5,\n", + " metrics=['top_k_accuracy', 'mean_class_accuracy'],\n", + " save_best='auto')\n", + "work_dir = './tutorial_exps'\n", + "omnisource = False\n", + "seed = 0\n", + "gpu_ids = range(0, 1)\n", + "\n" + ] + } + ], + "source": [ + "from mmcv.runner import set_random_seed\n", + "\n", + "# Modify dataset type and path\n", + "cfg.dataset_type = 'VideoDataset'\n", + "cfg.data_root = 'kinetics400_tiny/train/'\n", + "cfg.data_root_val = 'kinetics400_tiny/val/'\n", + "cfg.ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", + "cfg.ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "cfg.ann_file_test = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "\n", + "cfg.data.test.type = 'VideoDataset'\n", + "cfg.data.test.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "cfg.data.test.data_prefix = 'kinetics400_tiny/val/'\n", + "\n", + "cfg.data.train.type = 'VideoDataset'\n", + 
"cfg.data.train.ann_file = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", + "cfg.data.train.data_prefix = 'kinetics400_tiny/train/'\n", + "\n", + "cfg.data.val.type = 'VideoDataset'\n", + "cfg.data.val.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "cfg.data.val.data_prefix = 'kinetics400_tiny/val/'\n", + "\n", + "# The flag is used to determine whether it is omnisource training\n", + "cfg.setdefault('omnisource', False)\n", + "# Modify num classes of the model in cls_head\n", + "cfg.model.cls_head.num_classes = 2\n", + "# We can use the pre-trained TSN model\n", + "cfg.load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", + "\n", + "# Set up working dir to save files and logs.\n", + "cfg.work_dir = './tutorial_exps'\n", + "\n", + "# The original learning rate (LR) is set for 8-GPU training.\n", + "# We divide it by 8 since we only use one GPU.\n", + "cfg.data.videos_per_gpu = cfg.data.videos_per_gpu // 16\n", + "cfg.optimizer.lr = cfg.optimizer.lr / 8 / 16\n", + "cfg.total_epochs = 10\n", + "\n", + "# We can set the checkpoint saving interval to reduce the storage cost\n", + "cfg.checkpoint_config.interval = 5\n", + "# We can set the log print interval to reduce the the times of printing log\n", + "cfg.log_config.interval = 5\n", + "\n", + "# Set seed thus the results are more reproducible\n", + "cfg.seed = 0\n", + "set_random_seed(0, deterministic=False)\n", + "cfg.gpu_ids = range(1)\n", + "\n", + "# Save the best\n", + "cfg.evaluation.save_best='auto'\n", + "\n", + "\n", + "# We can initialize the logger for training and have a look\n", + "# at the final config used for training\n", + "print(f'Config:\\n{cfg.pretty_text}')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tES-qnZ3k38Z" + }, + "source": [ + "### Train a new recognizer\n", + "\n", + "Finally, lets initialize the dataset and recognizer, then train a new recognizer!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "dDBWkdDRk6oz", + "outputId": "a85d80d7-b3c4-43f1-d49a-057e8036807f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Use load_from_torchvision loader\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-07-11 13:00:46,931 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}\n", + "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n", + " cpuset_checked))\n", + "2021-07-11 13:00:46,980 - mmaction - INFO - load checkpoint from ./checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth\n", + "2021-07-11 13:00:46,981 - mmaction - INFO - Use load_from_local loader\n", + "2021-07-11 13:00:47,071 - mmaction - WARNING - The model and loaded state dict do not match exactly\n", + "\n", + "size mismatch for cls_head.fc_cls.weight: copying a param with shape torch.Size([400, 2048]) from checkpoint, the shape in current model is torch.Size([2, 2048]).\n", + "size mismatch for cls_head.fc_cls.bias: copying a param with shape torch.Size([400]) from checkpoint, the shape in current model is torch.Size([2]).\n", + "2021-07-11 13:00:47,074 - mmaction - INFO - Start running, host: root@b465112b4add, work_dir: /content/mmaction2/tutorial_exps\n", + "2021-07-11 13:00:47,078 - mmaction - INFO - Hooks will be executed in the following order:\n", + "before_run:\n", + "(VERY_HIGH ) StepLrUpdaterHook \n", + "(NORMAL ) CheckpointHook \n", + 
"(NORMAL ) EvalHook \n", + "(VERY_LOW ) TextLoggerHook \n", + " -------------------- \n", + "before_train_epoch:\n", + "(VERY_HIGH ) StepLrUpdaterHook \n", + "(NORMAL ) EvalHook \n", + "(LOW ) IterTimerHook \n", + "(VERY_LOW ) TextLoggerHook \n", + " -------------------- \n", + "before_train_iter:\n", + "(VERY_HIGH ) StepLrUpdaterHook \n", + "(NORMAL ) EvalHook \n", + "(LOW ) IterTimerHook \n", + " -------------------- \n", + "after_train_iter:\n", + "(ABOVE_NORMAL) OptimizerHook \n", + "(NORMAL ) CheckpointHook \n", + "(NORMAL ) EvalHook \n", + "(LOW ) IterTimerHook \n", + "(VERY_LOW ) TextLoggerHook \n", + " -------------------- \n", + "after_train_epoch:\n", + "(NORMAL ) CheckpointHook \n", + "(NORMAL ) EvalHook \n", + "(VERY_LOW ) TextLoggerHook \n", + " -------------------- \n", + "before_val_epoch:\n", + "(LOW ) IterTimerHook \n", + "(VERY_LOW ) TextLoggerHook \n", + " -------------------- \n", + "before_val_iter:\n", + "(LOW ) IterTimerHook \n", + " -------------------- \n", + "after_val_iter:\n", + "(LOW ) IterTimerHook \n", + " -------------------- \n", + "after_val_epoch:\n", + "(VERY_LOW ) TextLoggerHook \n", + " -------------------- \n", + "2021-07-11 13:00:47,081 - mmaction - INFO - workflow: [('train', 1)], max: 10 epochs\n", + "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/evaluation.py:190: UserWarning: runner.meta is None. Creating an empty one.\n", + " warnings.warn('runner.meta is None. 
Creating an empty one.')\n", + "2021-07-11 13:00:51,802 - mmaction - INFO - Epoch [1][5/15]\tlr: 7.813e-05, eta: 0:02:16, time: 0.942, data_time: 0.730, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7604, loss: 0.7604, grad_norm: 14.8813\n", + "2021-07-11 13:00:52,884 - mmaction - INFO - Epoch [1][10/15]\tlr: 7.813e-05, eta: 0:01:21, time: 0.217, data_time: 0.028, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6282, loss: 0.6282, grad_norm: 10.1834\n", + "2021-07-11 13:00:53,706 - mmaction - INFO - Epoch [1][15/15]\tlr: 7.813e-05, eta: 0:00:59, time: 0.164, data_time: 0.001, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7165, loss: 0.7165, grad_norm: 10.8534\n", + "2021-07-11 13:00:57,724 - mmaction - INFO - Epoch [2][5/15]\tlr: 7.813e-05, eta: 0:01:09, time: 0.802, data_time: 0.596, memory: 2918, top1_acc: 0.3000, top5_acc: 1.0000, loss_cls: 0.7001, loss: 0.7001, grad_norm: 11.4311\n", + "2021-07-11 13:00:59,219 - mmaction - INFO - Epoch [2][10/15]\tlr: 7.813e-05, eta: 0:01:00, time: 0.296, data_time: 0.108, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6916, loss: 0.6916, grad_norm: 12.7101\n", + "2021-07-11 13:01:00,040 - mmaction - INFO - Epoch [2][15/15]\tlr: 7.813e-05, eta: 0:00:51, time: 0.167, data_time: 0.004, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6567, loss: 0.6567, grad_norm: 8.8837\n", + "2021-07-11 13:01:04,152 - mmaction - INFO - Epoch [3][5/15]\tlr: 7.813e-05, eta: 0:00:56, time: 0.820, data_time: 0.618, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6320, loss: 0.6320, grad_norm: 11.4025\n", + "2021-07-11 13:01:05,526 - mmaction - INFO - Epoch [3][10/15]\tlr: 7.813e-05, eta: 0:00:50, time: 0.276, data_time: 0.075, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6542, loss: 0.6542, grad_norm: 10.6429\n", + "2021-07-11 13:01:06,350 - mmaction - INFO - Epoch [3][15/15]\tlr: 7.813e-05, eta: 0:00:44, time: 0.165, data_time: 0.001, memory: 
2918, top1_acc: 0.2000, top5_acc: 1.0000, loss_cls: 0.7661, loss: 0.7661, grad_norm: 12.8421\n", + "2021-07-11 13:01:10,771 - mmaction - INFO - Epoch [4][5/15]\tlr: 7.813e-05, eta: 0:00:47, time: 0.883, data_time: 0.676, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6410, loss: 0.6410, grad_norm: 10.6697\n", + "2021-07-11 13:01:11,776 - mmaction - INFO - Epoch [4][10/15]\tlr: 7.813e-05, eta: 0:00:42, time: 0.201, data_time: 0.011, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6949, loss: 0.6949, grad_norm: 10.5467\n", + "2021-07-11 13:01:12,729 - mmaction - INFO - Epoch [4][15/15]\tlr: 7.813e-05, eta: 0:00:38, time: 0.190, data_time: 0.026, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6290, loss: 0.6290, grad_norm: 11.2779\n", + "2021-07-11 13:01:16,816 - mmaction - INFO - Epoch [5][5/15]\tlr: 7.813e-05, eta: 0:00:38, time: 0.817, data_time: 0.608, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6011, loss: 0.6011, grad_norm: 9.1335\n", + "2021-07-11 13:01:18,176 - mmaction - INFO - Epoch [5][10/15]\tlr: 7.813e-05, eta: 0:00:35, time: 0.272, data_time: 0.080, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6652, loss: 0.6652, grad_norm: 11.0616\n", + "2021-07-11 13:01:19,119 - mmaction - INFO - Epoch [5][15/15]\tlr: 7.813e-05, eta: 0:00:32, time: 0.188, data_time: 0.017, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6440, loss: 0.6440, grad_norm: 11.6473\n", + "2021-07-11 13:01:19,120 - mmaction - INFO - Saving checkpoint at 5 epochs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 4.9 task/s, elapsed: 2s, ETA: 0s" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-07-11 13:01:21,673 - mmaction - INFO - Evaluating top_k_accuracy ...\n", + "2021-07-11 13:01:21,677 - mmaction - INFO - \n", + "top1_acc\t0.7000\n", + "top5_acc\t1.0000\n", + "2021-07-11 13:01:21,679 - mmaction - 
INFO - Evaluating mean_class_accuracy ...\n", + "2021-07-11 13:01:21,682 - mmaction - INFO - \n", + "mean_acc\t0.7000\n", + "2021-07-11 13:01:22,264 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_5.pth.\n", + "2021-07-11 13:01:22,267 - mmaction - INFO - Best top1_acc is 0.7000 at 5 epoch.\n", + "2021-07-11 13:01:22,271 - mmaction - INFO - Epoch(val) [5][5]\ttop1_acc: 0.7000, top5_acc: 1.0000, mean_class_accuracy: 0.7000\n", + "2021-07-11 13:01:26,623 - mmaction - INFO - Epoch [6][5/15]\tlr: 7.813e-05, eta: 0:00:31, time: 0.868, data_time: 0.656, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6753, loss: 0.6753, grad_norm: 11.8640\n", + "2021-07-11 13:01:27,597 - mmaction - INFO - Epoch [6][10/15]\tlr: 7.813e-05, eta: 0:00:28, time: 0.195, data_time: 0.003, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6715, loss: 0.6715, grad_norm: 11.3347\n", + "2021-07-11 13:01:28,736 - mmaction - INFO - Epoch [6][15/15]\tlr: 7.813e-05, eta: 0:00:25, time: 0.228, data_time: 0.063, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5769, loss: 0.5769, grad_norm: 9.2541\n", + "2021-07-11 13:01:32,860 - mmaction - INFO - Epoch [7][5/15]\tlr: 7.813e-05, eta: 0:00:24, time: 0.822, data_time: 0.620, memory: 2918, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.5379, loss: 0.5379, grad_norm: 8.0147\n", + "2021-07-11 13:01:34,340 - mmaction - INFO - Epoch [7][10/15]\tlr: 7.813e-05, eta: 0:00:22, time: 0.298, data_time: 0.109, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6187, loss: 0.6187, grad_norm: 11.5244\n", + "2021-07-11 13:01:35,165 - mmaction - INFO - Epoch [7][15/15]\tlr: 7.813e-05, eta: 0:00:19, time: 0.165, data_time: 0.002, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7063, loss: 0.7063, grad_norm: 12.4979\n", + "2021-07-11 13:01:39,435 - mmaction - INFO - Epoch [8][5/15]\tlr: 7.813e-05, eta: 0:00:17, time: 0.853, data_time: 0.641, memory: 2918, top1_acc: 1.0000, top5_acc: 
1.0000, loss_cls: 0.5369, loss: 0.5369, grad_norm: 8.6545\n", + "2021-07-11 13:01:40,808 - mmaction - INFO - Epoch [8][10/15]\tlr: 7.813e-05, eta: 0:00:15, time: 0.275, data_time: 0.086, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6407, loss: 0.6407, grad_norm: 12.5537\n", + "2021-07-11 13:01:41,627 - mmaction - INFO - Epoch [8][15/15]\tlr: 7.813e-05, eta: 0:00:12, time: 0.164, data_time: 0.001, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6073, loss: 0.6073, grad_norm: 11.4028\n", + "2021-07-11 13:01:45,651 - mmaction - INFO - Epoch [9][5/15]\tlr: 7.813e-05, eta: 0:00:11, time: 0.803, data_time: 0.591, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5596, loss: 0.5596, grad_norm: 10.0821\n", + "2021-07-11 13:01:46,891 - mmaction - INFO - Epoch [9][10/15]\tlr: 7.813e-05, eta: 0:00:08, time: 0.248, data_time: 0.044, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6470, loss: 0.6470, grad_norm: 11.8979\n", + "2021-07-11 13:01:47,944 - mmaction - INFO - Epoch [9][15/15]\tlr: 7.813e-05, eta: 0:00:06, time: 0.211, data_time: 0.041, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6657, loss: 0.6657, grad_norm: 12.0643\n", + "2021-07-11 13:01:52,200 - mmaction - INFO - Epoch [10][5/15]\tlr: 7.813e-05, eta: 0:00:04, time: 0.849, data_time: 0.648, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6310, loss: 0.6310, grad_norm: 11.5690\n", + "2021-07-11 13:01:53,707 - mmaction - INFO - Epoch [10][10/15]\tlr: 7.813e-05, eta: 0:00:02, time: 0.303, data_time: 0.119, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5178, loss: 0.5178, grad_norm: 9.3324\n", + "2021-07-11 13:01:54,520 - mmaction - INFO - Epoch [10][15/15]\tlr: 7.813e-05, eta: 0:00:00, time: 0.162, data_time: 0.001, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6919, loss: 0.6919, grad_norm: 12.6688\n", + "2021-07-11 13:01:54,522 - mmaction - INFO - Saving checkpoint at 10 epochs\n" + ] + }, + { + 
"name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.9 task/s, elapsed: 2s, ETA: 0s" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-07-11 13:01:56,741 - mmaction - INFO - Evaluating top_k_accuracy ...\n", + "2021-07-11 13:01:56,743 - mmaction - INFO - \n", + "top1_acc\t1.0000\n", + "top5_acc\t1.0000\n", + "2021-07-11 13:01:56,749 - mmaction - INFO - Evaluating mean_class_accuracy ...\n", + "2021-07-11 13:01:56,750 - mmaction - INFO - \n", + "mean_acc\t1.0000\n", + "2021-07-11 13:01:57,267 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_10.pth.\n", + "2021-07-11 13:01:57,269 - mmaction - INFO - Best top1_acc is 1.0000 at 10 epoch.\n", + "2021-07-11 13:01:57,270 - mmaction - INFO - Epoch(val) [10][5]\ttop1_acc: 1.0000, top5_acc: 1.0000, mean_class_accuracy: 1.0000\n" + ] + } + ], + "source": [ + "import os.path as osp\n", + "\n", + "from mmaction.datasets import build_dataset\n", + "from mmaction.models import build_model\n", + "from mmaction.apis import train_model\n", + "\n", + "import mmcv\n", + "\n", + "# Build the dataset\n", + "datasets = [build_dataset(cfg.data.train)]\n", + "\n", + "# Build the recognizer\n", + "model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))\n", + "\n", + "# Create work_dir\n", + "mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))\n", + "train_model(model, datasets, cfg, distributed=False, validate=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zdSd7oTLlxIf" + }, + "source": [ + "### Understand the log\n", + "From the log, we can have a basic understanding the training process and know how well the recognizer is trained.\n", + "\n", + "Firstly, the ResNet-50 backbone pre-trained on ImageNet is loaded, this is a common practice since training from scratch is more cost. 
The log shows that all the weights of the ResNet-50 backbone are loaded except `fc.bias` and `fc.weight`.\n", + "\n", + "Second, since the dataset we are using is small, we load a pre-trained TSN model and fine-tune it for action recognition.\n", + "The original TSN is trained on the full Kinetics-400 dataset, which contains 400 classes, but the Kinetics-400 Tiny dataset has only 2 classes. Therefore, the last FC layer of the pre-trained TSN, which has a different weight shape, is not loaded.\n", + "\n", + "Third, after training, the recognizer is evaluated with the default evaluation. The results show that the recognizer achieves 100% top1 accuracy and 100% top5 accuracy on the val dataset.\n", + " \n", + "Not bad!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ryVoSfZVmogw" + }, + "source": [ + "## Test the trained recognizer\n", + "\n", + "After finetuning the recognizer, let's check the prediction results!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "eyY3hCMwyTct", + "outputId": "ea54ff0a-4299-4e93-c1ca-4fe597e7516b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ ] 0/10, elapsed: 0s, ETA:" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create.
Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n", + " cpuset_checked))\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 2.2 task/s, elapsed: 5s, ETA: 0s\n", + "Evaluating top_k_accuracy ...\n", + "\n", + "top1_acc\t1.0000\n", + "top5_acc\t1.0000\n", + "\n", + "Evaluating mean_class_accuracy ...\n", + "\n", + "mean_acc\t1.0000\n", + "top1_acc: 1.0000\n", + "top5_acc: 1.0000\n", + "mean_class_accuracy: 1.0000\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/content/mmaction2/mmaction/datasets/base.py:166: UserWarning: Option arguments for metrics has been changed to `metric_options`, See 'https://github.com/open-mmlab/mmaction2/pull/286' for more details\n", + " 'Option arguments for metrics has been changed to '\n" + ] + } + ], + "source": [ + "from mmaction.apis import single_gpu_test\n", + "from mmaction.datasets import build_dataloader\n", + "from mmcv.parallel import MMDataParallel\n", + "\n", + "# Build a test dataloader\n", + "dataset = build_dataset(cfg.data.test, dict(test_mode=True))\n", + "data_loader = build_dataloader(\n", + " dataset,\n", + " videos_per_gpu=1,\n", + " workers_per_gpu=cfg.data.workers_per_gpu,\n", + " dist=False,\n", + " shuffle=False)\n", + "model = MMDataParallel(model, device_ids=[0])\n", + "outputs = single_gpu_test(model, data_loader)\n", + "\n", + "eval_config = cfg.evaluation\n", + "eval_config.pop('interval')\n", + "eval_res = dataset.evaluate(outputs, **eval_config)\n", + "for name, val in eval_res.items():\n", + " print(f'{name}: {val:.04f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jZ4t44nWmZDM" + }, + "source": [ + "## Perform Spatio-Temporal Detection\n", + "Here we first install MMDetection." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "w1p0_g76nHOQ", + "outputId": "b30a6be3-c457-452e-c789-7083117c5011" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/content\n", + "Cloning into 'mmdetection'...\n", + "remote: Enumerating objects: 23137, done.\u001B[K\n", + "remote: Total 23137 (delta 0), reused 0 (delta 0), pack-reused 23137\u001B[K\n", + "Receiving objects: 100% (23137/23137), 25.88 MiB | 25.75 MiB/s, done.\n", + "Resolving deltas: 100% (16198/16198), done.\n", + "/content/mmdetection\n", + "Obtaining file:///content/mmdetection\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (3.2.2)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (1.21.5)\n", + "Requirement already satisfied: pycocotools in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (2.0.4)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (1.15.0)\n", + "Collecting terminaltables\n", + " Downloading terminaltables-3.1.10-py2.py3-none-any.whl (15 kB)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (0.11.0)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (2.8.2)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (1.3.2)\n", + "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (3.0.7)\n", + "Installing collected packages: terminaltables, mmdet\n", + " Running setup.py develop for mmdet\n", + "Successfully installed mmdet-2.21.0 
terminaltables-3.1.10\n", + "/content/mmaction2\n" + ] + } + ], + "source": [ + "# Git clone mmdetection repo\n", + "%cd ..\n", + "!git clone https://github.com/open-mmlab/mmdetection.git\n", + "%cd mmdetection\n", + "\n", + "# install mmdet\n", + "!pip install -e .\n", + "%cd ../mmaction2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vlOQsH8OnVKn" + }, + "source": [ + "Download a video to the `demo` directory in MMAction2." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QaW3jg5Enish", + "outputId": "c70cde3a-b337-41d0-cb08-82dfc746d9ef" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2022-02-19 11:02:59-- https://download.openmmlab.com/mmaction/dataset/sample/1j20qq1JyX4.mp4\n", + "Resolving download.openmmlab.com (download.openmmlab.com)... 47.254.186.233\n", + "Connecting to download.openmmlab.com (download.openmmlab.com)|47.254.186.233|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 4864186 (4.6M) [video/mp4]\n", + "Saving to: ‘demo/1j20qq1JyX4.mp4’\n", + "\n", + "demo/1j20qq1JyX4.mp 100%[===================>] 4.64M 3.78MB/s in 1.2s \n", + "\n", + "2022-02-19 11:03:01 (3.78 MB/s) - ‘demo/1j20qq1JyX4.mp4’ saved [4864186/4864186]\n", + "\n" + ] + } + ], + "source": [ + "!wget https://download.openmmlab.com/mmaction/dataset/sample/1j20qq1JyX4.mp4 -O demo/1j20qq1JyX4.mp4" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LYGxdu8Vnoah" + }, + "source": [ + "Run the spatio-temporal demo."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "LPLiaHaYnrb7", + "outputId": "8a8f8a16-ad7b-4559-c19c-c8264533bff3" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Imageio: 'ffmpeg-linux64-v3.3.1' was not found on your computer; downloading it now.\n", + "Try 1. Download from https://github.com/imageio/imageio-binaries/raw/master/ffmpeg/ffmpeg-linux64-v3.3.1 (43.8 MB)\n", + "Downloading: 8192/45929032 bytes (0.0%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b3883008/45929032 bytes (8.5%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b7995392/45929032 bytes (17.4%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b11796480/45929032 bytes (25.7%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b16072704/45929032 bytes (35.0%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b20152320/45929032 bytes (43.9%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b24305664/45929032 bytes (52.9%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b28319744/45929032 bytes (61.7%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b32440320/45929032 bytes (70.6%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b36634624/45929032 bytes (79.8%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b40886272/45929032 bytes (89.0%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b45146112/45929032 bytes (98.3%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b45929032/45929032 bytes (100.0%)\n", + " Done\n", + "File saved as /root/.imageio/ffmpeg/ffmpeg-linux64-v3.3.1.\n", + "load checkpoint from http path: http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth\n", + "Downloading: 
\"http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth\" to /root/.cache/torch/hub/checkpoints/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth\n", + "100% 160M/160M [00:21<00:00, 7.77MB/s]\n", + "Performing Human Detection for each frame\n", + "[>>] 217/217, 8.6 task/s, elapsed: 25s, ETA: 0sload checkpoint from http path: https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth\n", + "Downloading: \"https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth\" to /root/.cache/torch/hub/checkpoints/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth\n", + "100% 228M/228M [00:31<00:00, 7.55MB/s]\n", + "Performing SpatioTemporal Action Detection for each clip\n", + "[> ] 167/217, 7.7 task/s, elapsed: 22s, ETA: 7sPerforming visualization\n", + "[MoviePy] >>>> Building video demo/stdet_demo.mp4\n", + "[MoviePy] Writing video demo/stdet_demo.mp4\n", + "100% 434/434 [00:12<00:00, 36.07it/s]\n", + "[MoviePy] Done.\n", + "[MoviePy] >>>> Video ready: demo/stdet_demo.mp4 \n", + "\n" + ] + } + ], + "source": [ + "!python demo/demo_spatiotemporal_det.py --video demo/1j20qq1JyX4.mp4" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 341 + }, + "id": "-0atQCzBo9-C", + "outputId": "b6bb3a67-669c-45d0-cdf4-25b6210362d0" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check the video\n", + "from IPython.display import HTML\n", + "from 
base64 import b64encode\n", + "mp4 = open('demo/stdet_demo.mp4','rb').read()\n", + "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n", + "HTML(\"\"\"\n", + "\n", + "\"\"\" % data_url)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "include_colab_link": true, + "name": "MMAction2 Tutorial.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/openmmlab_test/mmaction2-0.24.1/demo/mmaction2_tutorial_zh-CN.ipynb b/openmmlab_test/mmaction2-0.24.1/demo/mmaction2_tutorial_zh-CN.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..57bad5e85a0fae726f4ec08957bad3bce579128d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/mmaction2_tutorial_zh-CN.ipynb @@ -0,0 +1,1665 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "view-in-github" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VcjSRFELVbNk" + }, + "source": [ + "# MMAction2 Tutorial\n", + "\n", + "- Perform inference with an MMAction2 recognizer\n", + "- Train a new recognizer on a new dataset\n", + "- Perform inference with an MMAction2 spatio-temporal detector" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7LqHGkGEVqpm" + }, + "source": [ + "## Install MMAction2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "JUFfYElIB3cJ", + "outputId": "cdf9ef1d-9e85-4a77-9e63-fc6f3ca13ae2" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "torch 1.8.1+cu101 \n", + "torchsummary 1.5.1 
\n", + "torchtext 0.9.1 \n", + "torchvision 0.9.1+cu101 \n" + ] + } + ], + "source": [ + "# Check the nvcc and gcc versions\n", + "!nvcc -V\n", + "!gcc --version" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "thuLEJ7lByQv", + "outputId": "4035efd5-103e-4122-8107-a65777937ce7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Looking in links: https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/index.html\n", + "Collecting mmcv-full\n", + "\u001b[?25l Downloading https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/mmcv_full-1.3.5-cp37-cp37m-manylinux1_x86_64.whl (31.2MB)\n", + "\u001b[K |████████████████████████████████| 31.2MB 96kB/s \n", + "\u001b[?25hRequirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (3.13)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (1.19.5)\n", + "Collecting addict\n", + " Downloading https://files.pythonhosted.org/packages/6a/00/b08f23b7d7e1e14ce01419a467b583edbb93c6cdb8654e54a9cc579cd61f/addict-2.4.0-py3-none-any.whl\n", + "Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (7.1.2)\n", + "Requirement already satisfied: opencv-python>=3 in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (4.1.2.30)\n", + "Collecting yapf\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/5f/0d/8814e79eb865eab42d95023b58b650d01dec6f8ea87fc9260978b1bf2167/yapf-0.31.0-py2.py3-none-any.whl (185kB)\n", + "\u001b[K |████████████████████████████████| 194kB 7.7MB/s \n", + "\u001b[?25hInstalling collected packages: addict, yapf, mmcv-full\n", + "Successfully installed addict-2.4.0 mmcv-full-1.3.5 yapf-0.31.0\n" + ] + } + ], + "source": [ + "# Install torch and torchvision\n", + "!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f 
https://download.pytorch.org/whl/torch_stable.html\n", + "\n", + "# Install mmcv-full\n", + "!pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html\n", + "\n", + "# Install mmaction2\n", + "!rm -rf mmaction2\n", + "!git clone https://github.com/open-mmlab/mmaction2.git\n", + "%cd mmaction2\n", + "\n", + "!pip install -e .\n", + "\n", + "# Install the optional dependencies\n", + "!pip install -r requirements/optional.txt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "qKiI1qelB6BT", + "outputId": "1d269eaa-814a-48f5-dfe9-f6952cf5e851" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cloning into 'mmaction2'...\n", + "remote: Enumerating objects: 11360, done.\u001b[K\n", + "remote: Counting objects: 100% (1029/1029), done.\u001b[K\n", + "remote: Compressing objects: 100% (587/587), done.\u001b[K\n", + "remote: Total 11360 (delta 603), reused 721 (delta 436), pack-reused 10331\u001b[K\n", + "Receiving objects: 100% (11360/11360), 37.17 MiB | 14.99 MiB/s, done.\n", + "Resolving deltas: 100% (7930/7930), done.\n", + "/content/mmaction2\n", + "Branch 'fix_nms_config' set up to track remote branch 'fix_nms_config' from 'origin'.\n", + "Switched to a new branch 'fix_nms_config'\n", + "Obtaining file:///content/mmaction2\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.15.0) (3.2.2)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.15.0) (1.19.5)\n", + "Requirement already satisfied: opencv-contrib-python in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.15.0) (4.1.2.30)\n", + "Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.15.0) (7.1.2)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from 
matplotlib->mmaction2==0.15.0) (1.3.1)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.15.0) (0.10.0)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.15.0) (2.8.1)\n", + "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.15.0) (2.4.7)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from cycler>=0.10->matplotlib->mmaction2==0.15.0) (1.15.0)\n", + "Installing collected packages: mmaction2\n", + " Running setup.py develop for mmaction2\n", + "Successfully installed mmaction2\n", + "Collecting av\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/66/ff/bacde7314c646a2bd2f240034809a10cc3f8b096751284d0828640fff3dd/av-8.0.3-cp37-cp37m-manylinux2010_x86_64.whl (37.2MB)\n", + "\u001b[K |████████████████████████████████| 37.2MB 81kB/s \n", + "\u001b[?25hCollecting decord>=0.4.1\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/64/5e/e2be6a3a3a46275059574d9c6a1d422aa6c7c3cbf6614939b8a3c3f8f2d5/decord-0.5.2-py3-none-manylinux2010_x86_64.whl (14.1MB)\n", + "\u001b[K |████████████████████████████████| 14.1MB 225kB/s \n", + "\u001b[?25hRequirement already satisfied: imgaug in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 3)) (0.2.9)\n", + "Requirement already satisfied: librosa in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 4)) (0.8.0)\n", + "Requirement already satisfied: lmdb in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 5)) (0.99)\n", + "Requirement already satisfied: moviepy in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 6)) (0.2.3.5)\n", + "Collecting onnx\n", + "\u001b[?25l Downloading 
https://files.pythonhosted.org/packages/3f/9b/54c950d3256e27f970a83cd0504efb183a24312702deed0179453316dbd0/onnx-1.9.0-cp37-cp37m-manylinux2010_x86_64.whl (12.2MB)\n", + "\u001b[K |████████████████████████████████| 12.2MB 26.1MB/s \n", + "\u001b[?25hCollecting onnxruntime\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/f9/76/3d0f8bb2776961c7335693df06eccf8d099e48fa6fb552c7546867192603/onnxruntime-1.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5MB)\n", + "\u001b[K |████████████████████████████████| 4.5MB 35.9MB/s \n", + "\u001b[?25hCollecting PyTurboJPEG\n", + " Downloading https://files.pythonhosted.org/packages/07/70/8397de6c39476d2cc0fcee6082ade0225b3e67bc4466a0cf07486b0d0de4/PyTurboJPEG-1.5.0.tar.gz\n", + "Requirement already satisfied: numpy>=1.14.0 in /usr/local/lib/python3.7/dist-packages (from decord>=0.4.1->-r requirements/optional.txt (line 2)) (1.19.5)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 3)) (3.2.2)\n", + "Requirement already satisfied: imageio in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 3)) (2.4.1)\n", + "Requirement already satisfied: scikit-image>=0.11.0 in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 3)) (0.16.2)\n", + "Requirement already satisfied: Shapely in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 3)) (1.7.1)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 3)) (1.15.0)\n", + "Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 3)) (7.1.2)\n", + "Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 3)) (4.1.2.30)\n", + 
"Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 3)) (1.4.1)\n", + "Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 4)) (2.1.9)\n", + "Requirement already satisfied: numba>=0.43.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 4)) (0.51.2)\n", + "Requirement already satisfied: resampy>=0.2.2 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 4)) (0.2.2)\n", + "Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 4)) (1.3.0)\n", + "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 4)) (1.0.1)\n", + "Requirement already satisfied: decorator>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 4)) (4.4.2)\n", + "Requirement already satisfied: soundfile>=0.9.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 4)) (0.10.3.post1)\n", + "Requirement already satisfied: scikit-learn!=0.19.0,>=0.14.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 4)) (0.22.2.post1)\n", + "Requirement already satisfied: tqdm<5.0,>=4.11.2 in /usr/local/lib/python3.7/dist-packages (from moviepy->-r requirements/optional.txt (line 6)) (4.41.1)\n", + "Requirement already satisfied: protobuf in /usr/local/lib/python3.7/dist-packages (from onnx->-r requirements/optional.txt (line 7)) (3.12.4)\n", + "Requirement already satisfied: typing-extensions>=3.6.2.1 in /usr/local/lib/python3.7/dist-packages (from onnx->-r requirements/optional.txt (line 7)) (3.7.4.3)\n", + "Requirement already satisfied: flatbuffers in /usr/local/lib/python3.7/dist-packages 
(from onnxruntime->-r requirements/optional.txt (line 8)) (1.12)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 3)) (2.8.1)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 3)) (0.10.0)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 3)) (1.3.1)\n", + "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 3)) (2.4.7)\n", + "Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 3)) (2.5.1)\n", + "Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 3)) (1.1.1)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->librosa->-r requirements/optional.txt (line 4)) (57.0.0)\n", + "Requirement already satisfied: llvmlite<0.35,>=0.34.0.dev0 in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->librosa->-r requirements/optional.txt (line 4)) (0.34.0)\n", + "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 4)) (20.9)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 4)) (2.23.0)\n", + "Requirement already satisfied: appdirs in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 4)) (1.4.4)\n", + "Requirement already 
satisfied: cffi>=1.0 in /usr/local/lib/python3.7/dist-packages (from soundfile>=0.9.0->librosa->-r requirements/optional.txt (line 4)) (1.14.5)\n", + "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->pooch>=1.0->librosa->-r requirements/optional.txt (line 4)) (1.24.3)\n", + "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->pooch>=1.0->librosa->-r requirements/optional.txt (line 4)) (3.0.4)\n", + "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->pooch>=1.0->librosa->-r requirements/optional.txt (line 4)) (2.10)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->pooch>=1.0->librosa->-r requirements/optional.txt (line 4)) (2020.12.5)\n", + "Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi>=1.0->soundfile>=0.9.0->librosa->-r requirements/optional.txt (line 4)) (2.20)\n", + "Building wheels for collected packages: PyTurboJPEG\n", + " Building wheel for PyTurboJPEG (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", + " Created wheel for PyTurboJPEG: filename=PyTurboJPEG-1.5.0-cp37-none-any.whl size=7478 sha256=ea928a968966ea04f37722e3866c986be613835f0598c242df7e44e6e9d6749b\n", + " Stored in directory: /root/.cache/pip/wheels/87/62/6a/834c085b372ce84e5f95addd832a860edd356711b9c7918424\n", + "Successfully built PyTurboJPEG\n", + "Installing collected packages: av, decord, onnx, onnxruntime, PyTurboJPEG\n", + "Successfully installed PyTurboJPEG-1.5.0 av-8.0.3 decord-0.5.2 onnx-1.9.0 onnxruntime-1.8.0\n" + ] + }, + { + "data": { + "application/vnd.colab-display-data+json": { + "pip_warning": { + "packages": [ + "numpy" + ] + } + } + }, + "metadata": { + "tags": [] + }, + "output_type": "display_data" + } + ], + "source": [ + "# Clone the mmaction2 repository\n", + "# %cd /content/\n", + "# !rm -rf mmaction2\n", + "# !git clone https://github.com/open-mmlab/mmaction2.git\n", + "!git clone https://github.com/wangruohui/mmaction2.git\n", + "%cd /content/mmaction2\n", + "!git checkout fix_nms_config\n", + "\n", + "# Install mmaction2 in editable mode\n", + "!pip install -e .\n", + "\n", + "# Install some extra dependencies\n", + "!pip install -r requirements/optional.txt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "No_zZAFpWC-a", + "outputId": "ff4558ab-30ca-42b3-bf4b-27116d0629f7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.8.1+cu101 True\n", + "0.15.0\n", + "10.1\n", + "GCC 7.3\n" + ] + } + ], + "source": [ + "# Check the torch installation and GPU availability\n", + "import torch, torchvision\n", + "print(torch.__version__, torch.cuda.is_available())\n", + "\n", + "# Check the MMAction2 installation\n", + "import mmaction\n", + "print(mmaction.__version__)\n", + "\n", + "# Check the mmcv installation\n", + "from mmcv.ops import get_compiling_cuda_version, get_compiler_version\n", + "print(get_compiling_cuda_version())\n", + "print(get_compiler_version())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"pXf7oV5DWdab" + }, + "source": [ + "## Perform inference with an MMAction2 recognizer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "64CW6d_AaT-Q", + "outputId": "8b1b0465-62a9-4a8b-b1a4-278a5f81945d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2021-06-03 15:01:35-- https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth\n", + "Resolving download.openmmlab.com (download.openmmlab.com)... 47.88.36.78\n", + "Connecting to download.openmmlab.com (download.openmmlab.com)|47.88.36.78|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 97579339 (93M) [application/octet-stream]\n", + "Saving to: ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’\n", + "\n", + "checkpoints/tsn_r50 100%[===================>] 93.06M 11.1MB/s in 8.2s \n", + "\n", + "2021-06-03 15:01:44 (11.4 MB/s) - ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’ saved [97579339/97579339]\n", + "\n" + ] + } + ], + "source": [ + "# Create the checkpoints directory and download the TSN model\n", + "!mkdir checkpoints\n", + "!wget -c https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \\\n", + " -O checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HNZB7NoSabzj" + }, + "outputs": [], + "source": [ + "from mmaction.apis import inference_recognizer, init_recognizer\n", + "\n", + "# Choose the config file for TSN\n", + "config = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'\n", + "# Use the checkpoint file downloaded above\n", + "checkpoint = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", + "# Initialize the recognizer\n", + "model = init_recognizer(config, 
checkpoint, device='cuda:0')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rEMsBnpHapAn" + }, + "outputs": [], + "source": [ + "# Choose a video and run inference on it\n", + "video = 'demo/demo.mp4'\n", + "label = 'tools/data/kinetics/label_map_k400.txt'\n", + "results = inference_recognizer(model, video)\n", + "\n", + "labels = open(label).readlines()\n", + "labels = [x.strip() for x in labels]\n", + "results = [(labels[k[0]], k[1]) for k in results]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YqtUNVTQyLMJ" + }, + "outputs": [], + "source": [ + "# Check the video\n", + "from IPython.display import HTML\n", + "from base64 import b64encode\n", + "mp4 = open(video,'rb').read()\n", + "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n", + "HTML(\"\"\"\n", + "\n", + "\"\"\" % data_url)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIyJXqfWathq" + }, + "outputs": [], + "source": [ + "# Show the top-5 inference results\n", + "for result in results:\n", + " print(f'{result[0]}: ', result[1])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QuZG8kZ2fJ5d" + }, + "source": [ + "## Train a recognizer on a customized dataset\n", + "Training a new recognizer usually takes three steps:\n", + "- Support a new dataset\n", + "- Modify the config\n", + "- Train the recognizer\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kbVu0-D-1JT2" + }, + "source": [ + "### Support a new dataset\n", + "\n", + "Here we give an example of converting the data into the format of an existing dataset. For other methods, see the [doc](/docs/tutorials/new_dataset.md).\n", + "\n", + "We use a tiny dataset taken from [Kinetics-400](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). It contains 30 training videos and 10 test videos." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "gjsUj9JzgUlJ", + "outputId": "7aa8f278-95c2-4073-8c93-2e197e12c6c2" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "rm: cannot remove 'kinetics400_tiny.zip*': No such file or directory\n", + "--2021-06-03 
14:55:03-- https://download.openmmlab.com/mmaction/kinetics400_tiny.zip\n", + "Resolving download.openmmlab.com (download.openmmlab.com)... 47.88.36.78\n", + "Connecting to download.openmmlab.com (download.openmmlab.com)|47.88.36.78|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 18308682 (17M) [application/zip]\n", + "Saving to: ‘kinetics400_tiny.zip’\n", + "\n", + "kinetics400_tiny.zi 100%[===================>] 17.46M 10.5MB/s in 1.7s \n", + "\n", + "2021-06-03 14:55:07 (10.5 MB/s) - ‘kinetics400_tiny.zip’ saved [18308682/18308682]\n", + "\n" + ] + } + ], + "source": [ + "# Download and extract the kinetics400_tiny dataset\n", + "!rm kinetics400_tiny.zip*\n", + "!rm -rf kinetics400_tiny\n", + "!wget https://download.openmmlab.com/mmaction/kinetics400_tiny.zip\n", + "!unzip kinetics400_tiny.zip > /dev/null" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AbZ-o7V6hNw4" + }, + "outputs": [], + "source": [ + "# Install the tree tool and check the dataset directory structure\n", + "!apt-get -q install tree\n", + "!tree kinetics400_tiny" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fTdi6dI0hY3g" + }, + "outputs": [], + "source": [ + "# Check the annotation file format\n", + "!cat kinetics400_tiny/kinetics_tiny_train_video.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0bq0mxmEi29H" + }, + "source": [ + "In the format defined by [`VideoDataset`](./datasets/video_dataset.py), each line gives the file name and label of a sample video, separated by a space.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ht_DGJA9jQar" + }, + "source": [ + "### Modify the config\n", + "\n", + "We need to modify the config, and we use the checkpoint downloaded earlier as the pre-trained model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LjCcmCKOjktc" + }, + "outputs": [], + "source": [ + "# Load the config for TSN as cfg\n", + "from mmcv import Config\n", + "cfg = Config.fromfile('./configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"tc8YhFFGjp3e" + }, + "source": [ + "We modify the TSN config originally written for training on the full Kinetics-400 dataset so that the model can be trained on the Kinetics400-tiny dataset.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "tlhu9byjjt-K", + "outputId": "a1c04b76-9305-497d-9a97-cef55491a7ab" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Config:\n", + "model = dict(\n", + " type='Recognizer2D',\n", + " backbone=dict(\n", + " type='ResNet',\n", + " pretrained='torchvision://resnet50',\n", + " depth=50,\n", + " norm_eval=False),\n", + " cls_head=dict(\n", + " type='TSNHead',\n", + " num_classes=2,\n", + " in_channels=2048,\n", + " spatial_type='avg',\n", + " consensus=dict(type='AvgConsensus', dim=1),\n", + " dropout_ratio=0.4,\n", + " init_std=0.01),\n", + " train_cfg=None,\n", + " test_cfg=dict(average_clips=None))\n", + "optimizer = dict(type='SGD', lr=7.8125e-05, momentum=0.9, weight_decay=0.0001)\n", + "optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))\n", + "lr_config = dict(policy='step', step=[40, 80])\n", + "total_epochs = 30\n", + "checkpoint_config = dict(interval=10)\n", + "log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')])\n", + "dist_params = dict(backend='nccl')\n", + "log_level = 'INFO'\n", + "load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", + "resume_from = None\n", + "workflow = [('train', 1)]\n", + "dataset_type = 'VideoDataset'\n", + "data_root = 'kinetics400_tiny/train/'\n", + "data_root_val = 'kinetics400_tiny/val/'\n", + "ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", + "ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "ann_file_test = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "img_norm_cfg = dict(\n", + " mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)\n", + "train_pipeline = [\n", + " dict(type='DecordInit'),\n", 
+ " dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),\n", + " dict(type='DecordDecode'),\n", + " dict(\n", + " type='MultiScaleCrop',\n", + " input_size=224,\n", + " scales=(1, 0.875, 0.75, 0.66),\n", + " random_crop=False,\n", + " max_wh_scale_gap=1),\n", + " dict(type='Resize', scale=(224, 224), keep_ratio=False),\n", + " dict(type='Flip', flip_ratio=0.5),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs', 'label'])\n", + "]\n", + "val_pipeline = [\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames',\n", + " clip_len=1,\n", + " frame_interval=1,\n", + " num_clips=8,\n", + " test_mode=True),\n", + " dict(type='DecordDecode'),\n", + " dict(type='Resize', scale=(-1, 256)),\n", + " dict(type='CenterCrop', crop_size=224),\n", + " dict(type='Flip', flip_ratio=0),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs'])\n", + "]\n", + "test_pipeline = [\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames',\n", + " clip_len=1,\n", + " frame_interval=1,\n", + " num_clips=25,\n", + " test_mode=True),\n", + " dict(type='DecordDecode'),\n", + " dict(type='Resize', scale=(-1, 256)),\n", + " dict(type='ThreeCrop', crop_size=256),\n", + " dict(type='Flip', flip_ratio=0),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], 
meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs'])\n", + "]\n", + "data = dict(\n", + " videos_per_gpu=2,\n", + " workers_per_gpu=2,\n", + " train=dict(\n", + " type='VideoDataset',\n", + " ann_file='kinetics400_tiny/kinetics_tiny_train_video.txt',\n", + " data_prefix='kinetics400_tiny/train/',\n", + " pipeline=[\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames', clip_len=1, frame_interval=1,\n", + " num_clips=8),\n", + " dict(type='DecordDecode'),\n", + " dict(\n", + " type='MultiScaleCrop',\n", + " input_size=224,\n", + " scales=(1, 0.875, 0.75, 0.66),\n", + " random_crop=False,\n", + " max_wh_scale_gap=1),\n", + " dict(type='Resize', scale=(224, 224), keep_ratio=False),\n", + " dict(type='Flip', flip_ratio=0.5),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs', 'label'])\n", + " ]),\n", + " val=dict(\n", + " type='VideoDataset',\n", + " ann_file='kinetics400_tiny/kinetics_tiny_val_video.txt',\n", + " data_prefix='kinetics400_tiny/val/',\n", + " pipeline=[\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames',\n", + " clip_len=1,\n", + " frame_interval=1,\n", + " num_clips=8,\n", + " test_mode=True),\n", + " dict(type='DecordDecode'),\n", + " dict(type='Resize', scale=(-1, 256)),\n", + " dict(type='CenterCrop', crop_size=224),\n", + " dict(type='Flip', flip_ratio=0),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs'])\n", + " ]),\n", + " test=dict(\n", + " type='VideoDataset',\n", + " 
ann_file='kinetics400_tiny/kinetics_tiny_val_video.txt',\n", + " data_prefix='kinetics400_tiny/val/',\n", + " pipeline=[\n", + " dict(type='DecordInit'),\n", + " dict(\n", + " type='SampleFrames',\n", + " clip_len=1,\n", + " frame_interval=1,\n", + " num_clips=25,\n", + " test_mode=True),\n", + " dict(type='DecordDecode'),\n", + " dict(type='Resize', scale=(-1, 256)),\n", + " dict(type='ThreeCrop', crop_size=256),\n", + " dict(type='Flip', flip_ratio=0),\n", + " dict(\n", + " type='Normalize',\n", + " mean=[123.675, 116.28, 103.53],\n", + " std=[58.395, 57.12, 57.375],\n", + " to_bgr=False),\n", + " dict(type='FormatShape', input_format='NCHW'),\n", + " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", + " dict(type='ToTensor', keys=['imgs'])\n", + " ]))\n", + "evaluation = dict(\n", + " interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'])\n", + "work_dir = './tutorial_exps'\n", + "omnisource = False\n", + "seed = 0\n", + "gpu_ids = range(0, 1)\n", + "\n" + ] + } + ], + "source": [ + "from mmcv.runner import set_random_seed\n", + "\n", + "# Modify the dataset type and the file paths\n", + "cfg.dataset_type = 'VideoDataset'\n", + "cfg.data_root = 'kinetics400_tiny/train/'\n", + "cfg.data_root_val = 'kinetics400_tiny/val/'\n", + "cfg.ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", + "cfg.ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "cfg.ann_file_test = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "\n", + "cfg.data.test.type = 'VideoDataset'\n", + "cfg.data.test.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + "cfg.data.test.data_prefix = 'kinetics400_tiny/val/'\n", + "\n", + "cfg.data.train.type = 'VideoDataset'\n", + "cfg.data.train.ann_file = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", + "cfg.data.train.data_prefix = 'kinetics400_tiny/train/'\n", + "\n", + "cfg.data.val.type = 'VideoDataset'\n", + "cfg.data.val.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", + 
"cfg.data.val.data_prefix = 'kinetics400_tiny/val/'\n", + "\n", + "# Used to determine whether omnisource training is enabled\n", + "cfg.setdefault('omnisource', False)\n", + "# Set the number of classes in cls_head to 2\n", + "cfg.model.cls_head.num_classes = 2\n", + "# Use the pretrained TSN model\n", + "cfg.load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", + "\n", + "# Set the working directory\n", + "cfg.work_dir = './tutorial_exps'\n", + "\n", + "# Since training uses a single GPU, adjust the batch size and lr accordingly\n", + "cfg.data.videos_per_gpu = cfg.data.videos_per_gpu // 16\n", + "cfg.optimizer.lr = cfg.optimizer.lr / 8 / 16\n", + "cfg.total_epochs = 30\n", + "\n", + "# Set the checkpoint interval to reduce storage usage\n", + "cfg.checkpoint_config.interval = 10\n", + "# Set the logging interval to reduce time spent printing\n", + "cfg.log_config.interval = 5\n", + "\n", + "# Fix the random seed so results are reproducible\n", + "cfg.seed = 0\n", + "set_random_seed(0, deterministic=False)\n", + "cfg.gpu_ids = range(1)\n", + "\n", + "# Print all configuration parameters\n", + "print(f'Config:\\n{cfg.pretty_text}')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tES-qnZ3k38Z" + }, + "source": [ + "### Train the recognition model\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000, + "referenced_widgets": [ + "81bfbdf1ec55451b8be8a68fd1b0cf18", + "4a9a4d1a6a554315a7d4362fd9ef0290", + "c992b295041a4908a6a0d4f62a542cca", + "57f2df1708fa455ea8a305b9100ad171", + "8c947d1afee142e4b6cd2e0e26f46d6f", + "adf3a16cdae740cf882999a25d53e8f7", + "e6b45b124776452a85136fc3e18502f6", + "974f4fceb03748f1b346b498df9828a3" + ] + }, + "id": "dDBWkdDRk6oz", + "outputId": "574904cc-29fb-4b0a-ae2f-1dcba0248455" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Use load_from_torchvision 
loader\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Downloading: \"https://download.pytorch.org/models/resnet50-19c8e357.pth\" to /root/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "81bfbdf1ec55451b8be8a68fd1b0cf18", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, max=102502400.0), HTML(value='')))" + ] + }, + "metadata": { + "tags": [] + }, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-06-03 15:02:48,410 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}\n", + "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n", + " cpuset_checked))\n", + "2021-06-03 15:02:59,146 - mmaction - INFO - load checkpoint from ./checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth\n", + "2021-06-03 15:02:59,147 - mmaction - INFO - Use load_from_local loader\n", + "2021-06-03 15:02:59,233 - mmaction - WARNING - The model and loaded state dict do not match exactly\n", + "\n", + "size mismatch for cls_head.fc_cls.weight: copying a param with shape torch.Size([400, 2048]) from checkpoint, the shape in current model is torch.Size([2, 2048]).\n", + "size mismatch for cls_head.fc_cls.bias: copying a param with shape torch.Size([400]) from checkpoint, the shape in current model is torch.Size([2]).\n", + "2021-06-03 15:02:59,235 - mmaction - INFO - Start running, host: root@dd065c1a509c, work_dir: /content/mmaction2/tutorial_exps\n", + "2021-06-03 15:02:59,240 - mmaction - INFO - workflow: [('train', 1)], max: 30 epochs\n", + "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/evaluation.py:144: UserWarning: 
runner.meta is None. Creating an empty one.\n", + " warnings.warn('runner.meta is None. Creating an empty one.')\n", + "2021-06-03 15:03:03,913 - mmaction - INFO - Epoch [1][5/15]\tlr: 7.813e-05, eta: 0:06:55, time: 0.933, data_time: 0.701, memory: 1654, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7604, loss: 0.7604, grad_norm: 14.8813\n", + "2021-06-03 15:03:04,822 - mmaction - INFO - Epoch [1][10/15]\tlr: 7.813e-05, eta: 0:04:05, time: 0.183, data_time: 0.006, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6282, loss: 0.6282, grad_norm: 10.1833\n", + "2021-06-03 15:03:05,630 - mmaction - INFO - Epoch [1][15/15]\tlr: 7.813e-05, eta: 0:03:05, time: 0.162, data_time: 0.002, memory: 1654, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7165, loss: 0.7165, grad_norm: 10.8552\n", + "2021-06-03 15:03:09,840 - mmaction - INFO - Epoch [2][5/15]\tlr: 7.813e-05, eta: 0:03:45, time: 0.824, data_time: 0.620, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6444, loss: 0.6444, grad_norm: 11.3933\n", + "2021-06-03 15:03:11,318 - mmaction - INFO - Epoch [2][10/15]\tlr: 7.813e-05, eta: 0:03:23, time: 0.296, data_time: 0.109, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.7155, loss: 0.7155, grad_norm: 12.3879\n", + "2021-06-03 15:03:12,109 - mmaction - INFO - Epoch [2][15/15]\tlr: 7.813e-05, eta: 0:02:58, time: 0.158, data_time: 0.001, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6797, loss: 0.6797, grad_norm: 10.9274\n", + "2021-06-03 15:03:16,265 - mmaction - INFO - Epoch [3][5/15]\tlr: 7.813e-05, eta: 0:03:19, time: 0.812, data_time: 0.613, memory: 1654, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7126, loss: 0.7126, grad_norm: 11.1647\n", + "2021-06-03 15:03:17,416 - mmaction - INFO - Epoch [3][10/15]\tlr: 7.813e-05, eta: 0:03:04, time: 0.229, data_time: 0.049, memory: 1654, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.6635, loss: 0.6635, grad_norm: 12.1194\n", + "2021-06-03 15:03:18,283 - mmaction - INFO 
- Epoch [3][15/15]\tlr: 7.813e-05, eta: 0:02:49, time: 0.176, data_time: 0.014, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6978, loss: 0.6978, grad_norm: 10.3157\n", + "2021-06-03 15:03:22,394 - mmaction - INFO - Epoch [4][5/15]\tlr: 7.813e-05, eta: 0:03:03, time: 0.803, data_time: 0.595, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6795, loss: 0.6795, grad_norm: 12.0900\n", + "2021-06-03 15:03:23,662 - mmaction - INFO - Epoch [4][10/15]\tlr: 7.813e-05, eta: 0:02:53, time: 0.253, data_time: 0.067, memory: 1654, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7414, loss: 0.7414, grad_norm: 12.6038\n", + "2021-06-03 15:03:24,541 - mmaction - INFO - Epoch [4][15/15]\tlr: 7.813e-05, eta: 0:02:42, time: 0.177, data_time: 0.010, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6761, loss: 0.6761, grad_norm: 11.2109\n", + "2021-06-03 15:03:28,677 - mmaction - INFO - Epoch [5][5/15]\tlr: 7.813e-05, eta: 0:02:52, time: 0.809, data_time: 0.594, memory: 1654, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.6899, loss: 0.6899, grad_norm: 12.3528\n", + "2021-06-03 15:03:29,778 - mmaction - INFO - Epoch [5][10/15]\tlr: 7.813e-05, eta: 0:02:43, time: 0.220, data_time: 0.026, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6337, loss: 0.6337, grad_norm: 12.3525\n", + "2021-06-03 15:03:30,887 - mmaction - INFO - Epoch [5][15/15]\tlr: 7.813e-05, eta: 0:02:36, time: 0.222, data_time: 0.058, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6425, loss: 0.6425, grad_norm: 9.7286\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.7 task/s, elapsed: 2s, ETA: 0s" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-06-03 15:03:32,826 - mmaction - INFO - Evaluating top_k_accuracy ...\n", + "2021-06-03 15:03:32,828 - mmaction - INFO - \n", + "top1_acc\t0.8000\n", + "top5_acc\t1.0000\n", + "2021-06-03 15:03:32,831 - 
mmaction - INFO - Evaluating mean_class_accuracy ...\n", + "2021-06-03 15:03:32,836 - mmaction - INFO - \n", + "mean_acc\t0.8000\n", + "2021-06-03 15:03:33,250 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_5.pth.\n", + "2021-06-03 15:03:33,251 - mmaction - INFO - Best top1_acc is 0.8000 at 5 epoch.\n", + "2021-06-03 15:03:33,255 - mmaction - INFO - Epoch(val) [5][5]\ttop1_acc: 0.8000, top5_acc: 1.0000, mean_class_accuracy: 0.8000\n", + "2021-06-03 15:03:37,510 - mmaction - INFO - Epoch [6][5/15]\tlr: 7.813e-05, eta: 0:02:44, time: 0.848, data_time: 0.638, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5897, loss: 0.5897, grad_norm: 11.0816\n", + "2021-06-03 15:03:38,830 - mmaction - INFO - Epoch [6][10/15]\tlr: 7.813e-05, eta: 0:02:38, time: 0.266, data_time: 0.094, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6937, loss: 0.6937, grad_norm: 11.3882\n", + "2021-06-03 15:03:39,638 - mmaction - INFO - Epoch [6][15/15]\tlr: 7.813e-05, eta: 0:02:30, time: 0.162, data_time: 0.002, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6607, loss: 0.6607, grad_norm: 11.6493\n", + "2021-06-03 15:03:43,948 - mmaction - INFO - Epoch [7][5/15]\tlr: 7.813e-05, eta: 0:02:36, time: 0.844, data_time: 0.643, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6503, loss: 0.6503, grad_norm: 12.5117\n", + "2021-06-03 15:03:45,085 - mmaction - INFO - Epoch [7][10/15]\tlr: 7.813e-05, eta: 0:02:30, time: 0.228, data_time: 0.047, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.6313, loss: 0.6313, grad_norm: 10.8442\n", + "2021-06-03 15:03:45,922 - mmaction - INFO - Epoch [7][15/15]\tlr: 7.813e-05, eta: 0:02:24, time: 0.167, data_time: 0.002, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6310, loss: 0.6310, grad_norm: 10.5798\n", + "2021-06-03 15:03:50,322 - mmaction - INFO - Epoch [8][5/15]\tlr: 7.813e-05, eta: 0:02:28, time: 0.863, data_time: 0.662, memory: 1654, top1_acc: 0.7000, 
top5_acc: 1.0000, loss_cls: 0.6283, loss: 0.6283, grad_norm: 11.3411\n", + "2021-06-03 15:03:51,521 - mmaction - INFO - Epoch [8][10/15]\tlr: 7.813e-05, eta: 0:02:23, time: 0.240, data_time: 0.055, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6765, loss: 0.6765, grad_norm: 11.1512\n", + "2021-06-03 15:03:52,331 - mmaction - INFO - Epoch [8][15/15]\tlr: 7.813e-05, eta: 0:02:17, time: 0.162, data_time: 0.001, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.5961, loss: 0.5961, grad_norm: 11.1990\n", + "2021-06-03 15:03:56,661 - mmaction - INFO - Epoch [9][5/15]\tlr: 7.813e-05, eta: 0:02:21, time: 0.848, data_time: 0.645, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6524, loss: 0.6524, grad_norm: 11.9008\n", + "2021-06-03 15:03:57,882 - mmaction - INFO - Epoch [9][10/15]\tlr: 7.813e-05, eta: 0:02:16, time: 0.244, data_time: 0.061, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6937, loss: 0.6937, grad_norm: 13.0136\n", + "2021-06-03 15:03:58,697 - mmaction - INFO - Epoch [9][15/15]\tlr: 7.813e-05, eta: 0:02:11, time: 0.163, data_time: 0.001, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.5511, loss: 0.5511, grad_norm: 9.5135\n", + "2021-06-03 15:04:02,948 - mmaction - INFO - Epoch [10][5/15]\tlr: 7.813e-05, eta: 0:02:14, time: 0.831, data_time: 0.631, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.5565, loss: 0.5565, grad_norm: 9.2178\n", + "2021-06-03 15:04:03,954 - mmaction - INFO - Epoch [10][10/15]\tlr: 7.813e-05, eta: 0:02:09, time: 0.202, data_time: 0.006, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6199, loss: 0.6199, grad_norm: 10.8341\n", + "2021-06-03 15:04:04,855 - mmaction - INFO - Epoch [10][15/15]\tlr: 7.813e-05, eta: 0:02:05, time: 0.180, data_time: 0.011, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5853, loss: 0.5853, grad_norm: 10.9314\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.8 task/s, elapsed: 2s, ETA: 0s" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-06-03 15:04:06,763 - mmaction - INFO - Evaluating top_k_accuracy ...\n", + "2021-06-03 15:04:06,765 - mmaction - INFO - \n", + "top1_acc\t0.8000\n", + "top5_acc\t1.0000\n", + "2021-06-03 15:04:06,766 - mmaction - INFO - Evaluating mean_class_accuracy ...\n", + "2021-06-03 15:04:06,770 - mmaction - INFO - \n", + "mean_acc\t0.8000\n", + "2021-06-03 15:04:06,772 - mmaction - INFO - Saving checkpoint at 10 epochs\n", + "2021-06-03 15:04:07,188 - mmaction - INFO - Epoch(val) [10][5]\ttop1_acc: 0.8000, top5_acc: 1.0000, mean_class_accuracy: 0.8000\n", + "2021-06-03 15:04:11,319 - mmaction - INFO - Epoch [11][5/15]\tlr: 7.813e-05, eta: 0:02:06, time: 0.825, data_time: 0.620, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5100, loss: 0.5100, grad_norm: 8.8945\n", + "2021-06-03 15:04:12,449 - mmaction - INFO - Epoch [11][10/15]\tlr: 7.813e-05, eta: 0:02:02, time: 0.226, data_time: 0.042, memory: 1654, top1_acc: 0.2000, top5_acc: 1.0000, loss_cls: 0.6959, loss: 0.6959, grad_norm: 13.3499\n", + "2021-06-03 15:04:13,350 - mmaction - INFO - Epoch [11][15/15]\tlr: 7.813e-05, eta: 0:01:58, time: 0.180, data_time: 0.014, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.4929, loss: 0.4929, grad_norm: 8.5170\n", + "2021-06-03 15:04:17,700 - mmaction - INFO - Epoch [12][5/15]\tlr: 7.813e-05, eta: 0:02:00, time: 0.851, data_time: 0.649, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6076, loss: 0.6076, grad_norm: 11.6095\n", + "2021-06-03 15:04:18,762 - mmaction - INFO - Epoch [12][10/15]\tlr: 7.813e-05, eta: 0:01:56, time: 0.213, data_time: 0.032, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5356, loss: 0.5356, grad_norm: 9.7047\n", + "2021-06-03 15:04:19,608 - mmaction - INFO - Epoch [12][15/15]\tlr: 7.813e-05, eta: 0:01:52, time: 0.169, data_time: 0.002, memory: 1654, 
top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6340, loss: 0.6340, grad_norm: 11.7714\n", + "2021-06-03 15:04:23,829 - mmaction - INFO - Epoch [13][5/15]\tlr: 7.813e-05, eta: 0:01:53, time: 0.825, data_time: 0.611, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.5467, loss: 0.5467, grad_norm: 9.3259\n", + "2021-06-03 15:04:24,969 - mmaction - INFO - Epoch [13][10/15]\tlr: 7.813e-05, eta: 0:01:49, time: 0.230, data_time: 0.042, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5878, loss: 0.5878, grad_norm: 11.7431\n", + "2021-06-03 15:04:25,994 - mmaction - INFO - Epoch [13][15/15]\tlr: 7.813e-05, eta: 0:01:46, time: 0.205, data_time: 0.038, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5018, loss: 0.5018, grad_norm: 8.9612\n", + "2021-06-03 15:04:30,330 - mmaction - INFO - Epoch [14][5/15]\tlr: 7.813e-05, eta: 0:01:46, time: 0.850, data_time: 0.643, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6634, loss: 0.6634, grad_norm: 12.9608\n", + "2021-06-03 15:04:31,497 - mmaction - INFO - Epoch [14][10/15]\tlr: 7.813e-05, eta: 0:01:43, time: 0.232, data_time: 0.048, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5646, loss: 0.5646, grad_norm: 10.2523\n", + "2021-06-03 15:04:32,322 - mmaction - INFO - Epoch [14][15/15]\tlr: 7.813e-05, eta: 0:01:39, time: 0.166, data_time: 0.004, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6504, loss: 0.6504, grad_norm: 12.5382\n", + "2021-06-03 15:04:36,355 - mmaction - INFO - Epoch [15][5/15]\tlr: 7.813e-05, eta: 0:01:39, time: 0.789, data_time: 0.589, memory: 1654, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.5893, loss: 0.5893, grad_norm: 11.1704\n", + "2021-06-03 15:04:37,811 - mmaction - INFO - Epoch [15][10/15]\tlr: 7.813e-05, eta: 0:01:36, time: 0.291, data_time: 0.117, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6413, loss: 0.6413, grad_norm: 12.5114\n", + "2021-06-03 15:04:38,647 - mmaction - INFO - Epoch 
[15][15/15]\tlr: 7.813e-05, eta: 0:01:33, time: 0.167, data_time: 0.001, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.4747, loss: 0.4747, grad_norm: 8.3424\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.7 task/s, elapsed: 2s, ETA: 0s" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-06-03 15:04:40,575 - mmaction - INFO - Evaluating top_k_accuracy ...\n", + "2021-06-03 15:04:40,576 - mmaction - INFO - \n", + "top1_acc\t0.8000\n", + "top5_acc\t1.0000\n", + "2021-06-03 15:04:40,586 - mmaction - INFO - Evaluating mean_class_accuracy ...\n", + "2021-06-03 15:04:40,589 - mmaction - INFO - \n", + "mean_acc\t0.8000\n", + "2021-06-03 15:04:40,590 - mmaction - INFO - Epoch(val) [15][5]\ttop1_acc: 0.8000, top5_acc: 1.0000, mean_class_accuracy: 0.8000\n", + "2021-06-03 15:04:44,502 - mmaction - INFO - Epoch [16][5/15]\tlr: 7.813e-05, eta: 0:01:33, time: 0.780, data_time: 0.572, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.4760, loss: 0.4760, grad_norm: 8.9694\n", + "2021-06-03 15:04:45,694 - mmaction - INFO - Epoch [16][10/15]\tlr: 7.813e-05, eta: 0:01:30, time: 0.237, data_time: 0.049, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.5583, loss: 0.5583, grad_norm: 11.0941\n", + "2021-06-03 15:04:46,780 - mmaction - INFO - Epoch [16][15/15]\tlr: 7.813e-05, eta: 0:01:27, time: 0.219, data_time: 0.053, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5707, loss: 0.5707, grad_norm: 11.3002\n", + "2021-06-03 15:04:51,458 - mmaction - INFO - Epoch [17][5/15]\tlr: 7.813e-05, eta: 0:01:27, time: 0.918, data_time: 0.705, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5781, loss: 0.5781, grad_norm: 11.3368\n", + "2021-06-03 15:04:52,369 - mmaction - INFO - Epoch [17][10/15]\tlr: 7.813e-05, eta: 0:01:24, time: 0.181, data_time: 0.004, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5642, loss: 
0.5642, grad_norm: 10.7471\n", + "2021-06-03 15:04:53,264 - mmaction - INFO - Epoch [17][15/15]\tlr: 7.813e-05, eta: 0:01:21, time: 0.180, data_time: 0.014, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.4448, loss: 0.4448, grad_norm: 7.9083\n", + "2021-06-03 15:04:57,485 - mmaction - INFO - Epoch [18][5/15]\tlr: 7.813e-05, eta: 0:01:20, time: 0.827, data_time: 0.617, memory: 1654, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.4346, loss: 0.4346, grad_norm: 8.5470\n", + "2021-06-03 15:04:58,807 - mmaction - INFO - Epoch [18][10/15]\tlr: 7.813e-05, eta: 0:01:17, time: 0.265, data_time: 0.077, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.4648, loss: 0.4648, grad_norm: 8.6081\n", + "2021-06-03 15:04:59,651 - mmaction - INFO - Epoch [18][15/15]\tlr: 7.813e-05, eta: 0:01:14, time: 0.169, data_time: 0.002, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6353, loss: 0.6353, grad_norm: 12.7139\n", + "2021-06-03 15:05:04,048 - mmaction - INFO - Epoch [19][5/15]\tlr: 7.813e-05, eta: 0:01:14, time: 0.860, data_time: 0.654, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.5173, loss: 0.5173, grad_norm: 10.0505\n", + "2021-06-03 15:05:05,140 - mmaction - INFO - Epoch [19][10/15]\tlr: 7.813e-05, eta: 0:01:11, time: 0.220, data_time: 0.032, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.4610, loss: 0.4610, grad_norm: 9.0271\n", + "2021-06-03 15:05:05,992 - mmaction - INFO - Epoch [19][15/15]\tlr: 7.813e-05, eta: 0:01:08, time: 0.170, data_time: 0.003, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.4900, loss: 0.4900, grad_norm: 9.4134\n", + "2021-06-03 15:05:10,251 - mmaction - INFO - Epoch [20][5/15]\tlr: 7.813e-05, eta: 0:01:07, time: 0.832, data_time: 0.633, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.4717, loss: 0.4717, grad_norm: 9.3263\n", + "2021-06-03 15:05:11,296 - mmaction - INFO - Epoch [20][10/15]\tlr: 7.813e-05, eta: 0:01:05, time: 0.210, data_time: 0.010, 
memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6269, loss: 0.6269, grad_norm: 12.3093\n", + "2021-06-03 15:05:12,249 - mmaction - INFO - Epoch [20][15/15]\tlr: 7.813e-05, eta: 0:01:02, time: 0.191, data_time: 0.022, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6329, loss: 0.6329, grad_norm: 11.7156\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.8 task/s, elapsed: 2s, ETA: 0s" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-06-03 15:05:14,159 - mmaction - INFO - Evaluating top_k_accuracy ...\n", + "2021-06-03 15:05:14,161 - mmaction - INFO - \n", + "top1_acc\t1.0000\n", + "top5_acc\t1.0000\n", + "2021-06-03 15:05:14,166 - mmaction - INFO - Evaluating mean_class_accuracy ...\n", + "2021-06-03 15:05:14,168 - mmaction - INFO - \n", + "mean_acc\t1.0000\n", + "2021-06-03 15:05:14,599 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_20.pth.\n", + "2021-06-03 15:05:14,603 - mmaction - INFO - Best top1_acc is 1.0000 at 20 epoch.\n", + "2021-06-03 15:05:14,606 - mmaction - INFO - Saving checkpoint at 20 epochs\n", + "2021-06-03 15:05:15,008 - mmaction - INFO - Epoch(val) [20][5]\ttop1_acc: 1.0000, top5_acc: 1.0000, mean_class_accuracy: 1.0000\n", + "2021-06-03 15:05:19,127 - mmaction - INFO - Epoch [21][5/15]\tlr: 7.813e-05, eta: 0:01:01, time: 0.823, data_time: 0.618, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.3904, loss: 0.3904, grad_norm: 7.6698\n", + "2021-06-03 15:05:20,196 - mmaction - INFO - Epoch [21][10/15]\tlr: 7.813e-05, eta: 0:00:58, time: 0.214, data_time: 0.024, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.5884, loss: 0.5884, grad_norm: 11.4530\n", + "2021-06-03 15:05:21,218 - mmaction - INFO - Epoch [21][15/15]\tlr: 7.813e-05, eta: 0:00:56, time: 0.204, data_time: 0.032, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5800, loss: 0.5800, grad_norm: 
12.1364\n", + "2021-06-03 15:05:25,640 - mmaction - INFO - Epoch [22][5/15]\tlr: 7.813e-05, eta: 0:00:55, time: 0.864, data_time: 0.656, memory: 1654, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.3669, loss: 0.3669, grad_norm: 7.3256\n", + "2021-06-03 15:05:26,903 - mmaction - INFO - Epoch [22][10/15]\tlr: 7.813e-05, eta: 0:00:52, time: 0.255, data_time: 0.063, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5618, loss: 0.5618, grad_norm: 11.0834\n", + "2021-06-03 15:05:27,740 - mmaction - INFO - Epoch [22][15/15]\tlr: 7.813e-05, eta: 0:00:50, time: 0.167, data_time: 0.001, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6190, loss: 0.6190, grad_norm: 12.5605\n", + "2021-06-03 15:05:32,036 - mmaction - INFO - Epoch [23][5/15]\tlr: 7.813e-05, eta: 0:00:48, time: 0.839, data_time: 0.631, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5490, loss: 0.5490, grad_norm: 11.1925\n", + "2021-06-03 15:05:33,384 - mmaction - INFO - Epoch [23][10/15]\tlr: 7.813e-05, eta: 0:00:46, time: 0.272, data_time: 0.081, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.5988, loss: 0.5988, grad_norm: 12.0808\n", + "2021-06-03 15:05:34,222 - mmaction - INFO - Epoch [23][15/15]\tlr: 7.813e-05, eta: 0:00:43, time: 0.167, data_time: 0.001, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6084, loss: 0.6084, grad_norm: 11.4491\n", + "2021-06-03 15:05:38,546 - mmaction - INFO - Epoch [24][5/15]\tlr: 7.813e-05, eta: 0:00:42, time: 0.845, data_time: 0.637, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5125, loss: 0.5125, grad_norm: 10.9388\n", + "2021-06-03 15:05:39,792 - mmaction - INFO - Epoch [24][10/15]\tlr: 7.813e-05, eta: 0:00:39, time: 0.251, data_time: 0.059, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6036, loss: 0.6036, grad_norm: 12.3427\n", + "2021-06-03 15:05:40,640 - mmaction - INFO - Epoch [24][15/15]\tlr: 7.813e-05, eta: 0:00:37, time: 0.169, data_time: 0.001, memory: 1654, 
top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5052, loss: 0.5052, grad_norm: 10.0184\n", + "2021-06-03 15:05:44,885 - mmaction - INFO - Epoch [25][5/15]\tlr: 7.813e-05, eta: 0:00:35, time: 0.831, data_time: 0.623, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.5324, loss: 0.5324, grad_norm: 10.9933\n", + "2021-06-03 15:05:46,302 - mmaction - INFO - Epoch [25][10/15]\tlr: 7.813e-05, eta: 0:00:33, time: 0.283, data_time: 0.097, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6386, loss: 0.6386, grad_norm: 12.9881\n", + "2021-06-03 15:05:47,135 - mmaction - INFO - Epoch [25][15/15]\tlr: 7.813e-05, eta: 0:00:31, time: 0.166, data_time: 0.001, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.4406, loss: 0.4406, grad_norm: 9.0257\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.8 task/s, elapsed: 2s, ETA: 0s" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-06-03 15:05:49,031 - mmaction - INFO - Evaluating top_k_accuracy ...\n", + "2021-06-03 15:05:49,033 - mmaction - INFO - \n", + "top1_acc\t0.8000\n", + "top5_acc\t1.0000\n", + "2021-06-03 15:05:49,039 - mmaction - INFO - Evaluating mean_class_accuracy ...\n", + "2021-06-03 15:05:49,040 - mmaction - INFO - \n", + "mean_acc\t0.8000\n", + "2021-06-03 15:05:49,042 - mmaction - INFO - Epoch(val) [25][5]\ttop1_acc: 0.8000, top5_acc: 1.0000, mean_class_accuracy: 0.8000\n", + "2021-06-03 15:05:53,064 - mmaction - INFO - Epoch [26][5/15]\tlr: 7.813e-05, eta: 0:00:29, time: 0.801, data_time: 0.590, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.3512, loss: 0.3512, grad_norm: 7.0619\n", + "2021-06-03 15:05:54,188 - mmaction - INFO - Epoch [26][10/15]\tlr: 7.813e-05, eta: 0:00:27, time: 0.225, data_time: 0.030, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.3328, loss: 0.3328, grad_norm: 7.1553\n", + "2021-06-03 15:05:55,139 - mmaction - INFO - Epoch 
[26][15/15]\tlr: 7.813e-05, eta: 0:00:25, time: 0.192, data_time: 0.018, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.4698, loss: 0.4698, grad_norm: 9.4666\n", + "2021-06-03 15:05:59,226 - mmaction - INFO - Epoch [27][5/15]\tlr: 7.813e-05, eta: 0:00:23, time: 0.799, data_time: 0.593, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.5434, loss: 0.5434, grad_norm: 10.9087\n", + "2021-06-03 15:06:00,493 - mmaction - INFO - Epoch [27][10/15]\tlr: 7.813e-05, eta: 0:00:21, time: 0.254, data_time: 0.067, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.3672, loss: 0.3672, grad_norm: 7.5920\n", + "2021-06-03 15:06:01,451 - mmaction - INFO - Epoch [27][15/15]\tlr: 7.813e-05, eta: 0:00:18, time: 0.191, data_time: 0.014, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.3633, loss: 0.3633, grad_norm: 7.8609\n", + "2021-06-03 15:06:05,792 - mmaction - INFO - Epoch [28][5/15]\tlr: 7.813e-05, eta: 0:00:16, time: 0.850, data_time: 0.645, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6003, loss: 0.6003, grad_norm: 12.0149\n", + "2021-06-03 15:06:07,078 - mmaction - INFO - Epoch [28][10/15]\tlr: 7.813e-05, eta: 0:00:14, time: 0.257, data_time: 0.068, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6538, loss: 0.6538, grad_norm: 13.2297\n", + "2021-06-03 15:06:07,941 - mmaction - INFO - Epoch [28][15/15]\tlr: 7.813e-05, eta: 0:00:12, time: 0.172, data_time: 0.003, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.4151, loss: 0.4151, grad_norm: 8.6073\n", + "2021-06-03 15:06:12,212 - mmaction - INFO - Epoch [29][5/15]\tlr: 7.813e-05, eta: 0:00:10, time: 0.836, data_time: 0.629, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.3997, loss: 0.3997, grad_norm: 8.2630\n", + "2021-06-03 15:06:13,414 - mmaction - INFO - Epoch [29][10/15]\tlr: 7.813e-05, eta: 0:00:08, time: 0.240, data_time: 0.050, memory: 1654, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.3257, loss: 0.3257, 
grad_norm: 6.8715\n", + "2021-06-03 15:06:14,279 - mmaction - INFO - Epoch [29][15/15]\tlr: 7.813e-05, eta: 0:00:06, time: 0.173, data_time: 0.002, memory: 1654, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5843, loss: 0.5843, grad_norm: 12.2261\n", + "2021-06-03 15:06:18,611 - mmaction - INFO - Epoch [30][5/15]\tlr: 7.813e-05, eta: 0:00:04, time: 0.849, data_time: 0.645, memory: 1654, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.4302, loss: 0.4302, grad_norm: 8.8877\n", + "2021-06-03 15:06:20,008 - mmaction - INFO - Epoch [30][10/15]\tlr: 7.813e-05, eta: 0:00:02, time: 0.280, data_time: 0.091, memory: 1654, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.2355, loss: 0.2355, grad_norm: 5.3905\n", + "2021-06-03 15:06:20,850 - mmaction - INFO - Epoch [30][15/15]\tlr: 7.813e-05, eta: 0:00:00, time: 0.168, data_time: 0.001, memory: 1654, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.4508, loss: 0.4508, grad_norm: 9.6814\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.8 task/s, elapsed: 2s, ETA: 0s" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-06-03 15:06:22,740 - mmaction - INFO - Evaluating top_k_accuracy ...\n", + "2021-06-03 15:06:22,742 - mmaction - INFO - \n", + "top1_acc\t1.0000\n", + "top5_acc\t1.0000\n", + "2021-06-03 15:06:22,746 - mmaction - INFO - Evaluating mean_class_accuracy ...\n", + "2021-06-03 15:06:22,747 - mmaction - INFO - \n", + "mean_acc\t1.0000\n", + "2021-06-03 15:06:22,756 - mmaction - INFO - Saving checkpoint at 30 epochs\n", + "2021-06-03 15:06:23,168 - mmaction - INFO - Epoch(val) [30][5]\ttop1_acc: 1.0000, top5_acc: 1.0000, mean_class_accuracy: 1.0000\n" + ] + } + ], + "source": [ + "import os.path as osp\n", + "\n", + "from mmaction.datasets import build_dataset\n", + "from mmaction.models import build_model\n", + "from mmaction.apis import train_model\n", + "\n", + "import mmcv\n", + "\n", + "# Build the dataset\n", + 
"datasets = 
[build_dataset(cfg.data.train)]\n", + "\n", + "# Build the action recognition model\n", + "model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))\n", + "\n", + "# Create the working directory and train the model\n", + "mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))\n", + "train_model(model, datasets, cfg, distributed=False, validate=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ryVoSfZVmogw" + }, + "source": [ + "## Evaluate the model\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "eyY3hCMwyTct", + "outputId": "54c2d6ce-3f3e-45ed-b3d4-f628ba4263b0" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ ] 0/10, elapsed: 0s, ETA:" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. 
Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n", + " cpuset_checked))\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 2.2 task/s, elapsed: 4s, ETA: 0s\n", + "Evaluating top_k_accuracy ...\n", + "\n", + "top1_acc\t0.9000\n", + "top5_acc\t1.0000\n", + "\n", + "Evaluating mean_class_accuracy ...\n", + "\n", + "mean_acc\t0.9000\n", + "top1_acc: 0.9000\n", + "top5_acc: 1.0000\n", + "mean_class_accuracy: 0.9000\n" + ] + } + ], + "source": [ + "from mmaction.apis import single_gpu_test\n", + "from mmaction.datasets import build_dataloader\n", + "from mmcv.parallel import MMDataParallel\n", + "\n", + "# Build the test dataset\n", + "dataset = build_dataset(cfg.data.test, dict(test_mode=True))\n", + "data_loader = build_dataloader(\n", + " dataset,\n", + " videos_per_gpu=1,\n", + " workers_per_gpu=cfg.data.workers_per_gpu,\n", + " dist=False,\n", + " shuffle=False)\n", + "model = MMDataParallel(model, device_ids=[0])\n", + "outputs = single_gpu_test(model, data_loader)\n", + "\n", + "# Evaluate the trained recognizer on the test set\n", + "eval_config = cfg.evaluation\n", + "eval_config.pop('interval')\n", + "eval_res = dataset.evaluate(outputs, **eval_config)\n", + "for name, val in eval_res.items():\n", + " print(f'{name}: {val:.04f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8EbJVEEmrv0S" + }, + "source": [ + "## Spatio-temporal action recognition\n", + "\n", + "Here we use mmdet to assist with the spatio-temporal action recognition task, so we first need to install it under the main directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Yq5e5l-zEMpf", + "outputId": "178b2d61-d00c-4b93-847c-efc4b249ceaa" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/content\n", + "Cloning into 'mmdetection'...\n", + "remote: Enumerating objects: 18118, done.\u001b[K\n", + "remote: Counting 
objects: 100% (207/207), done.\u001b[K\n", + "remote: Compressing objects: 100% (163/163), done.\u001b[K\n", + "remote: Total 18118 (delta 87), reused 113 (delta 44), pack-reused 17911\u001b[K\n", + "Receiving objects: 100% (18118/18118), 21.50 MiB | 33.66 MiB/s, done.\n", + "Resolving deltas: 100% (12576/12576), done.\n", + "/content/mmdetection\n", + "Obtaining file:///content/mmdetection\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from mmdet==2.13.0) (3.2.2)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmdet==2.13.0) (1.19.5)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from mmdet==2.13.0) (1.15.0)\n", + "Collecting terminaltables\n", + " Downloading https://files.pythonhosted.org/packages/9b/c4/4a21174f32f8a7e1104798c445dacdc1d4df86f2f26722767034e4de4bff/terminaltables-3.1.0.tar.gz\n", + "Requirement already satisfied: pycocotools in /usr/local/lib/python3.7/dist-packages (from mmdet==2.13.0) (2.0.2)\n", + "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.13.0) (2.4.7)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.13.0) (0.10.0)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.13.0) (1.3.1)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.13.0) (2.8.1)\n", + "Requirement already satisfied: cython>=0.27.3 in /usr/local/lib/python3.7/dist-packages (from pycocotools->mmdet==2.13.0) (0.29.23)\n", + "Requirement already satisfied: setuptools>=18.0 in /usr/local/lib/python3.7/dist-packages (from pycocotools->mmdet==2.13.0) (57.0.0)\n", + "Building wheels for collected packages: terminaltables\n", + " Building wheel for 
terminaltables (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for terminaltables: filename=terminaltables-3.1.0-cp37-none-any.whl size=15356 sha256=37a2b87aceff6ca4b32508fac67142e960106f99a33a0a1d2127aaaecd9fae0b\n", + " Stored in directory: /root/.cache/pip/wheels/30/6b/50/6c75775b681fb36cdfac7f19799888ef9d8813aff9e379663e\n", + "Successfully built terminaltables\n", + "Installing collected packages: terminaltables, mmdet\n", + " Running setup.py develop for mmdet\n", + "Successfully installed mmdet terminaltables-3.1.0\n", + "/content/mmaction2\n" + ] + } + ], + "source": [ + "# Clone the mmdetection repository\n", + "%cd ..\n", + "!git clone https://github.com/open-mmlab/mmdetection.git\n", + "%cd mmdetection\n", + "\n", + "# Install mmdet in editable mode\n", + "!pip install -e .\n", + "%cd ../mmaction2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dWLxybK6INRI" + }, + "source": [ + "We also need to upload a video to the mmaction2 directory" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "'wget' is not recognized as an internal or external command,\n", + "operable program or batch file.\n" + ] + } + ], + "source": [ + "!wget https://download.openmmlab.com/mmaction/dataset/sample/1j20qq1JyX4.mp4 -O demo/1j20qq1JyX4.mp4" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "AUw6xa1YrvZb", + "outputId": "566e2683-9158-4173-b821-b9d9a34cf893" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Imageio: 'ffmpeg-linux64-v3.3.1' was not found on your computer; downloading it now.\n", + "Try 1. 
Download from https://github.com/imageio/imageio-binaries/raw/master/ffmpeg/ffmpeg-linux64-v3.3.1 (43.8 MB)\n", + "Downloading: 8192/45929032 bytes (0.02826240/45929032 bytes (6.2%6922240/45929032 bytes (15.110977280/45929032 bytes (23.9%14925824/45929032 bytes (32.5%19046400/45929032 bytes (41.5%23068672/45929032 bytes (50.2%26279936/45929032 bytes (57.2%30392320/45929032 bytes (66.2%34471936/45929032 bytes (75.1%38543360/45929032 bytes (83.9%42688512/45929032 bytes (92.9%45929032/45929032 bytes (100.0%)\n", + " Done\n", + "File saved as /root/.imageio/ffmpeg/ffmpeg-linux64-v3.3.1.\n", + "Use load_from_http loader\n", + "Downloading: \"http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth\" to /root/.cache/torch/hub/checkpoints/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth\n", + "100% 160M/160M [00:17<00:00, 9.30MB/s]\n", + "Performing Human Detection for each frame\n", + "100% 217/217 [00:26<00:00, 8.24it/s]\n", + "Use load_from_http loader\n", + "Downloading: \"https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth\" to /root/.cache/torch/hub/checkpoints/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth\n", + "100% 228M/228M [00:24<00:00, 9.79MB/s]\n", + "Performing SpatioTemporal Action Detection for each clip\n", + "217it [00:23, 9.19it/s]\n", + "Performing visualization\n", + "[MoviePy] >>>> Building video demo/stdet_demo.mp4\n", + "[MoviePy] Writing video demo/stdet_demo.mp4\n", + "100% 434/434 [00:10<00:00, 39.93it/s]\n", + "[MoviePy] Done.\n", + "[MoviePy] >>>> Video ready: demo/stdet_demo.mp4 \n", + "\n" + ] + } + ], + "source": [ + "# Run spatio-temporal detection\n", + "!python demo/demo_spatiotemporal_det.py --video demo/1j20qq1JyX4.mp4" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 341 + }, + "id": "oRabUF1TsE-v", + "outputId": "ff8cee1a-6715-4368-edf2-ce796fd946db" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 15, + "metadata": { + "tags": [] + }, + "output_type": "execute_result" + } + ], + "source": [ + "# View the video\n", + "from IPython.display import HTML\n", + "from base64 import b64encode\n", + "mp4 = open('demo/stdet_demo.mp4','rb').read()\n", + "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n", + "HTML(\"\"\"\n", + "<video width=400 controls><source src=\"%s\" type=\"video/mp4\"></video>\n", + "\"\"\" % data_url)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "include_colab_link": true, + "name": "MMAction2 new.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "4a9a4d1a6a554315a7d4362fd9ef0290": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "57f2df1708fa455ea8a305b9100ad171": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_974f4fceb03748f1b346b498df9828a3", + "placeholder": "​", + "style": "IPY_MODEL_e6b45b124776452a85136fc3e18502f6", + "value": " 97.8M/97.8M [00:45<00:00, 2.26MB/s]" + } + }, + "81bfbdf1ec55451b8be8a68fd1b0cf18": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c992b295041a4908a6a0d4f62a542cca", + "IPY_MODEL_57f2df1708fa455ea8a305b9100ad171" + ], + "layout": "IPY_MODEL_4a9a4d1a6a554315a7d4362fd9ef0290" + } + }, + "8c947d1afee142e4b6cd2e0e26f46d6f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "initial" + } + }, + "974f4fceb03748f1b346b498df9828a3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "adf3a16cdae740cf882999a25d53e8f7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": 
null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c992b295041a4908a6a0d4f62a542cca": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "100%", + "description_tooltip": null, + "layout": "IPY_MODEL_adf3a16cdae740cf882999a25d53e8f7", + "max": 102502400, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_8c947d1afee142e4b6cd2e0e26f46d6f", + "value": 102502400 + } + }, + "e6b45b124776452a85136fc3e18502f6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/openmmlab_test/mmaction2-0.24.1/demo/ntu_sample.avi b/openmmlab_test/mmaction2-0.24.1/demo/ntu_sample.avi new file mode 100644 index 
0000000000000000000000000000000000000000..42f8e03b1e3a294e026b340ac8bb3b94e66effb0 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/demo/ntu_sample.avi differ diff --git a/openmmlab_test/mmaction2-0.24.1/demo/test_video_structuralize.mp4 b/openmmlab_test/mmaction2-0.24.1/demo/test_video_structuralize.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..1170c88e8856aacf38332a50b448635e1e311a8b Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/demo/test_video_structuralize.mp4 differ diff --git a/openmmlab_test/mmaction2-0.24.1/demo/visualize_heatmap_volume.ipynb b/openmmlab_test/mmaction2-0.24.1/demo/visualize_heatmap_volume.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..87f26a7d2be12bdb4cc163d8114a34ddc3de7f86 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/visualize_heatmap_volume.ipynb @@ -0,0 +1,403 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 6, + "id": "speaking-algebra", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import cv2\n", + "import os.path as osp\n", + "import decord\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import urllib\n", + "import moviepy.editor as mpy\n", + "import random as rd\n", + "from mmpose.apis import vis_pose_result\n", + "from mmpose.models import TopDown\n", + "from mmcv import load, dump\n", + "\n", + "# We assume the annotation is already prepared\n", + "gym_train_ann_file = '../data/skeleton/gym_train.pkl'\n", + "gym_val_ann_file = '../data/skeleton/gym_val.pkl'\n", + "ntu60_xsub_train_ann_file = '../data/skeleton/ntu60_xsub_train.pkl'\n", + "ntu60_xsub_val_ann_file = '../data/skeleton/ntu60_xsub_val.pkl'" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "alive-consolidation", + "metadata": {}, + "outputs": [], + "source": [ + "FONTFACE = cv2.FONT_HERSHEY_DUPLEX\n", + "FONTSCALE = 0.6\n", + "FONTCOLOR = (255, 255, 255)\n", + "BGBLUE = (0, 119, 182)\n", + 
"THICKNESS = 1\n", + "LINETYPE = 1" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "ranging-conjunction", + "metadata": {}, + "outputs": [], + "source": [ + "def add_label(frame, label, BGCOLOR=BGBLUE):\n", + " threshold = 30\n", + " def split_label(label):\n", + " label = label.split()\n", + " lines, cline = [], ''\n", + " for word in label:\n", + " if len(cline) + len(word) < threshold:\n", + " cline = cline + ' ' + word\n", + " else:\n", + " lines.append(cline)\n", + " cline = word\n", + " if cline != '':\n", + " lines += [cline]\n", + " return lines\n", + " \n", + " if len(label) > 30:\n", + " label = split_label(label)\n", + " else:\n", + " label = [label]\n", + " label = ['Action: '] + label\n", + " \n", + " sizes = []\n", + " for line in label:\n", + " sizes.append(cv2.getTextSize(line, FONTFACE, FONTSCALE, THICKNESS)[0])\n", + " box_width = max([x[0] for x in sizes]) + 10\n", + " text_height = sizes[0][1]\n", + " box_height = len(sizes) * (text_height + 6)\n", + " \n", + " cv2.rectangle(frame, (0, 0), (box_width, box_height), BGCOLOR, -1)\n", + " for i, line in enumerate(label):\n", + " location = (5, (text_height + 6) * i + text_height + 3)\n", + " cv2.putText(frame, line, location, FONTFACE, FONTSCALE, FONTCOLOR, THICKNESS, LINETYPE)\n", + " return frame\n", + " \n", + "\n", + "def vis_skeleton(vid_path, anno, category_name=None, ratio=0.5):\n", + " vid = decord.VideoReader(vid_path)\n", + " frames = [x.asnumpy() for x in vid]\n", + " \n", + " h, w, _ = frames[0].shape\n", + " new_shape = (int(w * ratio), int(h * ratio))\n", + " frames = [cv2.resize(f, new_shape) for f in frames]\n", + " \n", + " assert len(frames) == anno['total_frames']\n", + " # The shape is N x T x K x 3\n", + " kps = np.concatenate([anno['keypoint'], anno['keypoint_score'][..., None]], axis=-1)\n", + " kps[..., :2] *= ratio\n", + " # Convert to T x N x K x 3\n", + " kps = kps.transpose([1, 0, 2, 3])\n", + " vis_frames = []\n", + "\n", + " # we need an instance of 
TopDown model, so build a minimal one\n", + " model = TopDown(backbone=dict(type='ShuffleNetV1'))\n", + "\n", + " for f, kp in zip(frames, kps):\n", + " bbox = np.zeros([0, 4], dtype=np.float32)\n", + " result = [dict(bbox=bbox, keypoints=k) for k in kp]\n", + " vis_frame = vis_pose_result(model, f, result)\n", + " \n", + " if category_name is not None:\n", + " vis_frame = add_label(vis_frame, category_name)\n", + " \n", + " vis_frames.append(vis_frame)\n", + " return vis_frames" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "id": "applied-humanity", + "metadata": {}, + "outputs": [], + "source": [ + "keypoint_pipeline = [\n", + " dict(type='PoseDecode'),\n", + " dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),\n", + " dict(type='Resize', scale=(-1, 64)),\n", + " dict(type='CenterCrop', crop_size=64),\n", + " dict(type='GeneratePoseTarget', sigma=0.6, use_score=True, with_kp=True, with_limb=False)\n", + "]\n", + "\n", + "limb_pipeline = [\n", + " dict(type='PoseDecode'),\n", + " dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),\n", + " dict(type='Resize', scale=(-1, 64)),\n", + " dict(type='CenterCrop', crop_size=64),\n", + " dict(type='GeneratePoseTarget', sigma=0.6, use_score=True, with_kp=False, with_limb=True)\n", + "]\n", + "\n", + "from mmaction.datasets.pipelines import Compose\n", + "def get_pseudo_heatmap(anno, flag='keypoint'):\n", + " assert flag in ['keypoint', 'limb']\n", + " pipeline = Compose(keypoint_pipeline if flag == 'keypoint' else limb_pipeline)\n", + " return pipeline(anno)['imgs']\n", + "\n", + "def vis_heatmaps(heatmaps, channel=-1, ratio=8):\n", + " # if channel is -1, draw all keypoints / limbs on the same map\n", + " import matplotlib.cm as cm\n", + " h, w, _ = heatmaps[0].shape\n", + " newh, neww = int(h * ratio), int(w * ratio)\n", + " \n", + " if channel == -1:\n", + " heatmaps = [np.max(x, axis=-1) for x in heatmaps]\n", + " cmap = cm.viridis\n", + " heatmaps = [(cmap(x)[..., :3] * 
255).astype(np.uint8) for x in heatmaps]\n", + " heatmaps = [cv2.resize(x, (neww, newh)) for x in heatmaps]\n", + " return heatmaps" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "automatic-commons", + "metadata": {}, + "outputs": [], + "source": [ + "# Load GYM annotations\n", + "lines = list(urllib.request.urlopen('https://sdolivia.github.io/FineGym/resources/dataset/gym99_categories.txt'))\n", + "gym_categories = [x.decode().strip().split('; ')[-1] for x in lines]\n", + "gym_annos = load(gym_train_ann_file) + load(gym_val_ann_file)" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "id": "numerous-bristol", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2021-04-25 22:18:53-- https://download.openmmlab.com/mmaction/posec3d/gym_samples.tar\n", + "Resolving download.openmmlab.com (download.openmmlab.com)... 124.160.145.22\n", + "Connecting to download.openmmlab.com (download.openmmlab.com)|124.160.145.22|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 36300800 (35M) [application/x-tar]\n", + "Saving to: ‘gym_samples.tar’\n", + "\n", + "100%[======================================>] 36,300,800 11.5MB/s in 3.0s \n", + "\n", + "2021-04-25 22:18:58 (11.5 MB/s) - ‘gym_samples.tar’ saved [36300800/36300800]\n", + "\n" + ] + } + ], + "source": [ + "# download sample videos of GYM\n", + "!wget https://download.openmmlab.com/mmaction/posec3d/gym_samples.tar\n", + "!tar -xf gym_samples.tar\n", + "!rm gym_samples.tar" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "id": "ranging-harrison", + "metadata": {}, + "outputs": [], + "source": [ + "gym_root = 'gym_samples/'\n", + "gym_vids = os.listdir(gym_root)\n", + "# visualize pose of which video? 
index in 0 - 50.\n", + "idx = 1\n", + "vid = gym_vids[idx]\n", + "\n", + "frame_dir = vid.split('.')[0]\n", + "vid_path = osp.join(gym_root, vid)\n", + "anno = [x for x in gym_annos if x['frame_dir'] == frame_dir][0]" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "id": "fitting-courage", + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize Skeleton\n", + "vis_frames = vis_skeleton(vid_path, anno, gym_categories[anno['label']])\n", + "vid = mpy.ImageSequenceClip(vis_frames, fps=24)\n", + "vid.ipython_display()" + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "id": "orange-logging", + "metadata": {}, + "outputs": [], + "source": [ + "keypoint_heatmap = get_pseudo_heatmap(anno)\n", + "keypoint_mapvis = vis_heatmaps(keypoint_heatmap)\n", + "keypoint_mapvis = [add_label(f, gym_categories[anno['label']]) for f in keypoint_mapvis]\n", + "vid = mpy.ImageSequenceClip(keypoint_mapvis, fps=24)\n", + "vid.ipython_display()" + ] + }, + { + "cell_type": "code", + "execution_count": 88, + "id": "residential-conjunction", + "metadata": {}, + "outputs": [], + "source": [ + "limb_heatmap = get_pseudo_heatmap(anno, 'limb')\n", + "limb_mapvis = vis_heatmaps(limb_heatmap)\n", + "limb_mapvis = [add_label(f, gym_categories[anno['label']]) for f in limb_mapvis]\n", + "vid = mpy.ImageSequenceClip(limb_mapvis, fps=24)\n", + "vid.ipython_display()" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "id": "coupled-stranger", + "metadata": {}, + "outputs": [], + "source": [ + "# The name list of NTU-60 action categories\n", + "ntu_categories = ['drink water', 'eat meal/snack', 'brushing teeth', 'brushing hair', 'drop', 'pickup', \n", + " 'throw', 'sitting down', 'standing up (from sitting position)', 'clapping', 'reading', \n", + " 'writing', 'tear up paper', 'wear jacket', 'take off jacket', 'wear a shoe', \n", + " 'take off a shoe', 'wear on glasses', 'take off glasses', 'put on a hat/cap', \n", + " 'take off a hat/cap', 'cheer up', 'hand waving', 'kicking 
something', \n", + " 'reach into pocket', 'hopping (one foot jumping)', 'jump up', \n", + " 'make a phone call/answer phone', 'playing with phone/tablet', 'typing on a keyboard', \n", + " 'pointing to something with finger', 'taking a selfie', 'check time (from watch)', \n", + " 'rub two hands together', 'nod head/bow', 'shake head', 'wipe face', 'salute', \n", + " 'put the palms together', 'cross hands in front (say stop)', 'sneeze/cough', \n", + " 'staggering', 'falling', 'touch head (headache)', 'touch chest (stomachache/heart pain)', \n", + " 'touch back (backache)', 'touch neck (neckache)', 'nausea or vomiting condition', \n", + " 'use a fan (with hand or paper)/feeling warm', 'punching/slapping other person', \n", + " 'kicking other person', 'pushing other person', 'pat on back of other person', \n", + " 'point finger at the other person', 'hugging other person', \n", + " 'giving something to other person', \"touch other person's pocket\", 'handshaking', \n", + " 'walking towards each other', 'walking apart from each other']\n", + "ntu_annos = load(ntu60_xsub_train_ann_file) + load(ntu60_xsub_val_ann_file)" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "id": "critical-review", + "metadata": {}, + "outputs": [], + "source": [ + "ntu_root = 'ntu_samples/'\n", + "ntu_vids = os.listdir(ntu_root)\n", + "# visualize pose of which video? index in 0 - 50.\n", + "idx = 20\n", + "vid = ntu_vids[idx]\n", + "\n", + "frame_dir = vid.split('.')[0]\n", + "vid_path = osp.join(ntu_root, vid)\n", + "anno = [x for x in ntu_annos if x['frame_dir'] == frame_dir.split('_')[0]][0]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "seasonal-palmer", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2021-04-25 22:21:16-- https://download.openmmlab.com/mmaction/posec3d/ntu_samples.tar\n", + "Resolving download.openmmlab.com (download.openmmlab.com)... 
124.160.145.22\n", + "Connecting to download.openmmlab.com (download.openmmlab.com)|124.160.145.22|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 121753600 (116M) [application/x-tar]\n", + "Saving to: ‘ntu_samples.tar’\n", + "\n", + "100%[======================================>] 121,753,600 14.4MB/s in 9.2s \n", + "\n", + "2021-04-25 22:21:26 (12.6 MB/s) - ‘ntu_samples.tar’ saved [121753600/121753600]\n", + "\n" + ] + } + ], + "source": [ + "# download sample videos of NTU-60\n", + "!wget https://download.openmmlab.com/mmaction/posec3d/ntu_samples.tar\n", + "!tar -xf ntu_samples.tar\n", + "!rm ntu_samples.tar" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "id": "accompanied-invitation", + "metadata": {}, + "outputs": [], + "source": [ + "vis_frames = vis_skeleton(vid_path, anno, ntu_categories[anno['label']])\n", + "vid = mpy.ImageSequenceClip(vis_frames, fps=24)\n", + "vid.ipython_display()" + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "id": "respiratory-conclusion", + "metadata": {}, + "outputs": [], + "source": [ + "keypoint_heatmap = get_pseudo_heatmap(anno)\n", + "keypoint_mapvis = vis_heatmaps(keypoint_heatmap)\n", + "keypoint_mapvis = [add_label(f, ntu_categories[anno['label']]) for f in keypoint_mapvis]\n", + "vid = mpy.ImageSequenceClip(keypoint_mapvis, fps=24)\n", + "vid.ipython_display()" + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "id": "thirty-vancouver", + "metadata": {}, + "outputs": [], + "source": [ + "limb_heatmap = get_pseudo_heatmap(anno, 'limb')\n", + "limb_mapvis = vis_heatmaps(limb_heatmap)\n", + "limb_mapvis = [add_label(f, ntu_categories[anno['label']]) for f in limb_mapvis]\n", + "vid = mpy.ImageSequenceClip(limb_mapvis, fps=24)\n", + "vid.ipython_display()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", 
+ "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openmmlab_test/mmaction2-0.24.1/demo/webcam_demo.py b/openmmlab_test/mmaction2-0.24.1/demo/webcam_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..575a503a9ca9ae880bb14168183bca030d13f627 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/webcam_demo.py @@ -0,0 +1,223 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import time +from collections import deque +from operator import itemgetter +from threading import Thread + +import cv2 +import numpy as np +import torch +from mmcv import Config, DictAction +from mmcv.parallel import collate, scatter + +from mmaction.apis import init_recognizer +from mmaction.datasets.pipelines import Compose + +FONTFACE = cv2.FONT_HERSHEY_COMPLEX_SMALL +FONTSCALE = 1 +FONTCOLOR = (255, 255, 255) # BGR, white +MSGCOLOR = (128, 128, 128) # BGR, gray +THICKNESS = 1 +LINETYPE = 1 + +EXCLUED_STEPS = [ + 'OpenCVInit', 'OpenCVDecode', 'DecordInit', 'DecordDecode', 'PyAVInit', + 'PyAVDecode', 'RawFrameDecode' +] + + +def parse_args(): + parser = argparse.ArgumentParser(description='MMAction2 webcam demo') + parser.add_argument('config', help='test config file path') + parser.add_argument('checkpoint', help='checkpoint file') + parser.add_argument('label', help='label file') + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + parser.add_argument( + '--camera-id', type=int, default=0, help='camera device id') + parser.add_argument( + '--threshold', + type=float, + default=0.01, + help='recognition score threshold') + parser.add_argument( + '--average-size', + type=int, + default=1, + help='number of latest clips to be averaged for prediction') + parser.add_argument( + '--drawing-fps', + type=int, + 
default=20, + help='Set upper bound FPS value of the output drawing') + parser.add_argument( + '--inference-fps', + type=int, + default=4, + help='Set upper bound FPS value of model inference') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + args = parser.parse_args() + assert args.drawing_fps >= 0 and args.inference_fps >= 0, \ + 'upper bound FPS value of drawing and inference should be set as ' \ + 'positive number, or zero for no limit' + return args + + +def show_results(): + print('Press "Esc", "q" or "Q" to exit') + + text_info = {} + cur_time = time.time() + while True: + msg = 'Waiting for action ...' + _, frame = camera.read() + frame_queue.append(np.array(frame[:, :, ::-1])) + + if len(result_queue) != 0: + text_info = {} + results = result_queue.popleft() + for i, result in enumerate(results): + selected_label, score = result + if score < threshold: + break + location = (0, 40 + i * 20) + text = selected_label + ': ' + str(round(score, 2)) + text_info[location] = text + cv2.putText(frame, text, location, FONTFACE, FONTSCALE, + FONTCOLOR, THICKNESS, LINETYPE) + + elif len(text_info) != 0: + for location, text in text_info.items(): + cv2.putText(frame, text, location, FONTFACE, FONTSCALE, + FONTCOLOR, THICKNESS, LINETYPE) + + else: + cv2.putText(frame, msg, (0, 40), FONTFACE, FONTSCALE, MSGCOLOR, + THICKNESS, LINETYPE) + + cv2.imshow('camera', frame) + ch = cv2.waitKey(1) + + if ch == 27 or ch == ord('q') or ch == ord('Q'): + break + + if drawing_fps > 0: + # add a limiter for actual drawing fps <= drawing_fps + sleep_time = 1 / drawing_fps - (time.time() - cur_time) + if sleep_time > 0: + time.sleep(sleep_time) + cur_time = time.time() + + +def inference(): + score_cache = deque() + scores_sum = 0 + 
cur_time = time.time() + while True: + cur_windows = [] + + while len(cur_windows) == 0: + if len(frame_queue) == sample_length: + cur_windows = list(np.array(frame_queue)) + if data['img_shape'] is None: + data['img_shape'] = frame_queue.popleft().shape[:2] + + cur_data = data.copy() + cur_data['imgs'] = cur_windows + cur_data = test_pipeline(cur_data) + cur_data = collate([cur_data], samples_per_gpu=1) + if next(model.parameters()).is_cuda: + cur_data = scatter(cur_data, [device])[0] + + with torch.no_grad(): + scores = model(return_loss=False, **cur_data)[0] + + score_cache.append(scores) + scores_sum += scores + + if len(score_cache) == average_size: + scores_avg = scores_sum / average_size + num_selected_labels = min(len(label), 5) + + scores_tuples = tuple(zip(label, scores_avg)) + scores_sorted = sorted( + scores_tuples, key=itemgetter(1), reverse=True) + results = scores_sorted[:num_selected_labels] + + result_queue.append(results) + scores_sum -= score_cache.popleft() + + if inference_fps > 0: + # add a limiter for actual inference fps <= inference_fps + sleep_time = 1 / inference_fps - (time.time() - cur_time) + if sleep_time > 0: + time.sleep(sleep_time) + cur_time = time.time() + + camera.release() + cv2.destroyAllWindows() + + +def main(): + global frame_queue, camera, frame, results, threshold, sample_length, \ + data, test_pipeline, model, device, average_size, label, \ + result_queue, drawing_fps, inference_fps + + args = parse_args() + average_size = args.average_size + threshold = args.threshold + drawing_fps = args.drawing_fps + inference_fps = args.inference_fps + + device = torch.device(args.device) + + cfg = Config.fromfile(args.config) + cfg.merge_from_dict(args.cfg_options) + + model = init_recognizer(cfg, args.checkpoint, device=device) + camera = cv2.VideoCapture(args.camera_id) + data = dict(img_shape=None, modality='RGB', label=-1) + + with open(args.label, 'r') as f: + label = [line.strip() for line in f] + + # prepare test pipeline 
from non-camera pipeline + cfg = model.cfg + sample_length = 0 + pipeline = cfg.data.test.pipeline + pipeline_ = pipeline.copy() + for step in pipeline: + if 'SampleFrames' in step['type']: + sample_length = step['clip_len'] * step['num_clips'] + data['num_clips'] = step['num_clips'] + data['clip_len'] = step['clip_len'] + pipeline_.remove(step) + if step['type'] in EXCLUED_STEPS: + # remove step to decode frames + pipeline_.remove(step) + test_pipeline = Compose(pipeline_) + + assert sample_length > 0 + + try: + frame_queue = deque(maxlen=sample_length) + result_queue = deque(maxlen=1) + pw = Thread(target=show_results, args=(), daemon=True) + pr = Thread(target=inference, args=(), daemon=True) + pw.start() + pr.start() + pw.join() + except KeyboardInterrupt: + pass + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/demo/webcam_demo_spatiotemporal_det.py b/openmmlab_test/mmaction2-0.24.1/demo/webcam_demo_spatiotemporal_det.py new file mode 100644 index 0000000000000000000000000000000000000000..fd02cbdb8b5fc5f94311afa77b4ef12600add2ce --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/demo/webcam_demo_spatiotemporal_det.py @@ -0,0 +1,856 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""Webcam Spatio-Temporal Action Detection Demo. + +Some codes are based on https://github.com/facebookresearch/SlowFast +""" + +import argparse +import atexit +import copy +import logging +import queue +import threading +import time +from abc import ABCMeta, abstractmethod + +import cv2 +import mmcv +import numpy as np +import torch +from mmcv import Config, DictAction +from mmcv.runner import load_checkpoint + +from mmaction.models import build_detector + +try: + from mmdet.apis import inference_detector, init_detector +except (ImportError, ModuleNotFoundError): + raise ImportError('Failed to import `inference_detector` and ' + '`init_detector` form `mmdet.apis`. These apis are ' + 'required in this demo! 
') + +logging.basicConfig(level=logging.DEBUG) +logger = logging.getLogger(__name__) + + +def parse_args(): + parser = argparse.ArgumentParser( + description='MMAction2 webcam spatio-temporal detection demo') + + parser.add_argument( + '--config', + default=('configs/detection/ava/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py'), + help='spatio temporal detection config file path') + parser.add_argument( + '--checkpoint', + default=('https://download.openmmlab.com/mmaction/detection/ava/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/' + 'slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb' + '_20201217-16378594.pth'), + help='spatio temporal detection checkpoint file/url') + parser.add_argument( + '--action-score-thr', + type=float, + default=0.4, + help='the threshold of human action score') + parser.add_argument( + '--det-config', + default='demo/faster_rcnn_r50_fpn_2x_coco.py', + help='human detection config file path (from mmdet)') + parser.add_argument( + '--det-checkpoint', + default=('http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/' + 'faster_rcnn_r50_fpn_2x_coco/' + 'faster_rcnn_r50_fpn_2x_coco_' + 'bbox_mAP-0.384_20200504_210434-a5d8aa15.pth'), + help='human detection checkpoint file/url') + parser.add_argument( + '--det-score-thr', + type=float, + default=0.9, + help='the threshold of human detection score') + parser.add_argument( + '--input-video', + default='0', + type=str, + help='webcam id or input video file/url') + parser.add_argument( + '--label-map', + default='tools/data/ava/label_map.txt', + help='label map file') + parser.add_argument( + '--device', type=str, default='cuda:0', help='CPU/CUDA device option') + parser.add_argument( + '--output-fps', + default=15, + type=int, + help='the fps of demo video output') + parser.add_argument( + '--out-filename', + default=None, + type=str, + help='the filename of output video') + parser.add_argument( + '--show', + action='store_true', + help='Whether to show 
results with cv2.imshow') + parser.add_argument( + '--display-height', + type=int, + default=0, + help='Image height for human detector and draw frames.') + parser.add_argument( + '--display-width', + type=int, + default=0, + help='Image width for human detector and draw frames.') + parser.add_argument( + '--predict-stepsize', + default=8, + type=int, + help='give out a prediction per n frames') + parser.add_argument( + '--clip-vis-length', + default=8, + type=int, + help='Number of draw frames per clip.') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + + args = parser.parse_args() + return args + + +class TaskInfo: + """Wapper for a clip. + + Transmit data around three threads. + + 1) Read Thread: Create task and put task into read queue. Init `frames`, + `processed_frames`, `img_shape`, `ratio`, `clip_vis_length`. + 2) Main Thread: Get data from read queue, predict human bboxes and stdet + action labels, draw predictions and put task into display queue. Init + `display_bboxes`, `stdet_bboxes` and `action_preds`, update `frames`. + 3) Display Thread: Get data from display queue, show/write frames and + delete task. + """ + + def __init__(self): + self.id = -1 + + # raw frames, used as human detector input, draw predictions input + # and output, display input + self.frames = None + + # stdet params + self.processed_frames = None # model inputs + self.frames_inds = None # select frames from processed frames + self.img_shape = None # model inputs, processed frame shape + # `action_preds` is `list[list[tuple]]`. The outer brackets indicate + # different bboxes and the intter brackets indicate different action + # results for the same bbox. tuple contains `class_name` and `score`. 
+ self.action_preds = None # stdet results + + # human bboxes with the format (xmin, ymin, xmax, ymax) + self.display_bboxes = None # bboxes coords for self.frames + self.stdet_bboxes = None # bboxes coords for self.processed_frames + self.ratio = None # processed_frames.shape[1::-1]/frames.shape[1::-1] + + # for each clip, draw predictions on clip_vis_length frames + self.clip_vis_length = -1 + + def add_frames(self, idx, frames, processed_frames): + """Add the clip and corresponding id. + + Args: + idx (int): the current index of the clip. + frames (list[ndarray]): list of images in "BGR" format. + processed_frames (list[ndarray]): list of resize and normed images + in "BGR" format. + """ + self.frames = frames + self.processed_frames = processed_frames + self.id = idx + self.img_shape = processed_frames[0].shape[:2] + + def add_bboxes(self, display_bboxes): + """Add correspondding bounding boxes.""" + self.display_bboxes = display_bboxes + self.stdet_bboxes = display_bboxes.clone() + self.stdet_bboxes[:, ::2] = self.stdet_bboxes[:, ::2] * self.ratio[0] + self.stdet_bboxes[:, 1::2] = self.stdet_bboxes[:, 1::2] * self.ratio[1] + + def add_action_preds(self, preds): + """Add the corresponding action predictions.""" + self.action_preds = preds + + def get_model_inputs(self, device): + """Convert preprocessed images to MMAction2 STDet model inputs.""" + cur_frames = [self.processed_frames[idx] for idx in self.frames_inds] + input_array = np.stack(cur_frames).transpose((3, 0, 1, 2))[np.newaxis] + input_tensor = torch.from_numpy(input_array).to(device) + return dict( + return_loss=False, + img=[input_tensor], + proposals=[[self.stdet_bboxes]], + img_metas=[[dict(img_shape=self.img_shape)]]) + + +class BaseHumanDetector(metaclass=ABCMeta): + """Base class for Human Dector. + + Args: + device (str): CPU/CUDA device option. 
+ """ + + def __init__(self, device): + self.device = torch.device(device) + + @abstractmethod + def _do_detect(self, image): + """Get human bboxes with shape [n, 4]. + + The format of bboxes is (xmin, ymin, xmax, ymax) in pixels. + """ + + def predict(self, task): + """Add keyframe bboxes to task.""" + # keyframe idx == (clip_len * frame_interval) // 2 + keyframe = task.frames[len(task.frames) // 2] + + # call detector + bboxes = self._do_detect(keyframe) + + # convert bboxes to torch.Tensor and move to target device + if isinstance(bboxes, np.ndarray): + bboxes = torch.from_numpy(bboxes).to(self.device) + elif isinstance(bboxes, torch.Tensor) and bboxes.device != self.device: + bboxes = bboxes.to(self.device) + + # update task + task.add_bboxes(bboxes) + + return task + + +class MmdetHumanDetector(BaseHumanDetector): + """Wrapper for mmdetection human detector. + + Args: + config (str): Path to mmdetection config. + ckpt (str): Path to mmdetection checkpoint. + device (str): CPU/CUDA device option. + score_thr (float): The threshold of human detection score. + person_classid (int): Choose class from detection results. + Default: 0. Suitable for COCO pretrained models. + """ + + def __init__(self, config, ckpt, device, score_thr, person_classid=0): + super().__init__(device) + self.model = init_detector(config, ckpt, device) + self.person_classid = person_classid + self.score_thr = score_thr + + def _do_detect(self, image): + """Get bboxes in shape [n, 4] and values in pixels.""" + result = inference_detector(self.model, image)[self.person_classid] + result = result[result[:, 4] >= self.score_thr][:, :4] + return result + + +class StdetPredictor: + """Wrapper for MMAction2 spatio-temporal action models. + + Args: + config (str): Path to stdet config. + ckpt (str): Path to stdet checkpoint. + device (str): CPU/CUDA device option. + score_thr (float): The threshold of human action score. + label_map_path (str): Path to label map file. 
The format for each line + is `{class_id}: {class_name}`. + """ + + def __init__(self, config, checkpoint, device, score_thr, label_map_path): + self.score_thr = score_thr + + # load model + config.model.backbone.pretrained = None + model = build_detector(config.model, test_cfg=config.get('test_cfg')) + load_checkpoint(model, checkpoint, map_location='cpu') + model.to(device) + model.eval() + self.model = model + self.device = device + + # init label map, aka class_id to class_name dict + with open(label_map_path) as f: + lines = f.readlines() + lines = [x.strip().split(': ') for x in lines] + self.label_map = {int(x[0]): x[1] for x in lines} + try: + if config['data']['train']['custom_classes'] is not None: + self.label_map = { + id + 1: self.label_map[cls] + for id, cls in enumerate(config['data']['train'] + ['custom_classes']) + } + except KeyError: + pass + + def predict(self, task): + """Spatio-temporval Action Detection model inference.""" + # No need to do inference if no one in keyframe + if len(task.stdet_bboxes) == 0: + return task + + with torch.no_grad(): + result = self.model(**task.get_model_inputs(self.device))[0] + + # pack results of human detector and stdet + preds = [] + for _ in range(task.stdet_bboxes.shape[0]): + preds.append([]) + for class_id in range(len(result)): + if class_id + 1 not in self.label_map: + continue + for bbox_id in range(task.stdet_bboxes.shape[0]): + if result[class_id][bbox_id, 4] > self.score_thr: + preds[bbox_id].append((self.label_map[class_id + 1], + result[class_id][bbox_id, 4])) + + # update task + # `preds` is `list[list[tuple]]`. The outer brackets indicate + # different bboxes and the intter brackets indicate different action + # results for the same bbox. tuple contains `class_name` and `score`. 
+ task.add_action_preds(preds) + + return task + + +class ClipHelper: + """Multithrading utils to manage the lifecycle of task.""" + + def __init__(self, + config, + display_height=0, + display_width=0, + input_video=0, + predict_stepsize=40, + output_fps=25, + clip_vis_length=8, + out_filename=None, + show=True, + stdet_input_shortside=256): + # stdet sampling strategy + val_pipeline = config.data.val.pipeline + sampler = [x for x in val_pipeline + if x['type'] == 'SampleAVAFrames'][0] + clip_len, frame_interval = sampler['clip_len'], sampler[ + 'frame_interval'] + self.window_size = clip_len * frame_interval + + # asserts + assert (out_filename or show), \ + 'out_filename and show cannot both be None' + assert clip_len % 2 == 0, 'We would like to have an even clip_len' + assert clip_vis_length <= predict_stepsize + assert 0 < predict_stepsize <= self.window_size + + # source params + try: + self.cap = cv2.VideoCapture(int(input_video)) + self.webcam = True + except ValueError: + self.cap = cv2.VideoCapture(input_video) + self.webcam = False + assert self.cap.isOpened() + + # stdet input preprocessing params + h = int(self.cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) + w = int(self.cap.get(cv2.CAP_PROP_FRAME_WIDTH)) + self.stdet_input_size = mmcv.rescale_size( + (w, h), (stdet_input_shortside, np.Inf)) + img_norm_cfg = config['img_norm_cfg'] + if 'to_rgb' not in img_norm_cfg and 'to_bgr' in img_norm_cfg: + to_bgr = img_norm_cfg.pop('to_bgr') + img_norm_cfg['to_rgb'] = to_bgr + img_norm_cfg['mean'] = np.array(img_norm_cfg['mean']) + img_norm_cfg['std'] = np.array(img_norm_cfg['std']) + self.img_norm_cfg = img_norm_cfg + + # task init params + self.clip_vis_length = clip_vis_length + self.predict_stepsize = predict_stepsize + self.buffer_size = self.window_size - self.predict_stepsize + frame_start = self.window_size // 2 - (clip_len // 2) * frame_interval + self.frames_inds = [ + frame_start + frame_interval * i for i in range(clip_len) + ] + self.buffer = [] + 
self.processed_buffer = [] + + # output/display params + if display_height > 0 and display_width > 0: + self.display_size = (display_width, display_height) + elif display_height > 0 or display_width > 0: + self.display_size = mmcv.rescale_size( + (w, h), (np.Inf, max(display_height, display_width))) + else: + self.display_size = (w, h) + self.ratio = tuple( + n / o for n, o in zip(self.stdet_input_size, self.display_size)) + if output_fps <= 0: + self.output_fps = int(self.cap.get(cv2.CAP_PROP_FPS)) + else: + self.output_fps = output_fps + self.show = show + self.video_writer = None + if out_filename is not None: + self.video_writer = self.get_output_video_writer(out_filename) + display_start_idx = self.window_size // 2 - self.predict_stepsize // 2 + self.display_inds = [ + display_start_idx + i for i in range(self.predict_stepsize) + ] + + # display multi-theading params + self.display_id = -1 # task.id for display queue + self.display_queue = {} + self.display_lock = threading.Lock() + self.output_lock = threading.Lock() + + # read multi-theading params + self.read_id = -1 # task.id for read queue + self.read_id_lock = threading.Lock() + self.read_queue = queue.Queue() + self.read_lock = threading.Lock() + self.not_end = True # cap.read() flag + + # program state + self.stopped = False + + atexit.register(self.clean) + + def read_fn(self): + """Main function for read thread. + + Contains three steps: + + 1) Read and preprocess (resize + norm) frames from source. + 2) Create task by frames from previous step and buffer. + 3) Put task into read queue. 
+ """ + was_read = True + start_time = time.time() + while was_read and not self.stopped: + # init task + task = TaskInfo() + task.clip_vis_length = self.clip_vis_length + task.frames_inds = self.frames_inds + task.ratio = self.ratio + + # read buffer + frames = [] + processed_frames = [] + if len(self.buffer) != 0: + frames = self.buffer + if len(self.processed_buffer) != 0: + processed_frames = self.processed_buffer + + # read and preprocess frames from source and update task + with self.read_lock: + before_read = time.time() + read_frame_cnt = self.window_size - len(frames) + while was_read and len(frames) < self.window_size: + was_read, frame = self.cap.read() + if not self.webcam: + # Reading frames too fast may lead to unexpected + # performance degradation. If you have enough + # resource, this line could be commented. + time.sleep(1 / self.output_fps) + if was_read: + frames.append(mmcv.imresize(frame, self.display_size)) + processed_frame = mmcv.imresize( + frame, self.stdet_input_size).astype(np.float32) + _ = mmcv.imnormalize_(processed_frame, + **self.img_norm_cfg) + processed_frames.append(processed_frame) + task.add_frames(self.read_id + 1, frames, processed_frames) + + # update buffer + if was_read: + self.buffer = frames[-self.buffer_size:] + self.processed_buffer = processed_frames[-self.buffer_size:] + + # update read state + with self.read_id_lock: + self.read_id += 1 + self.not_end = was_read + + self.read_queue.put((was_read, copy.deepcopy(task))) + cur_time = time.time() + logger.debug( + f'Read thread: {1000*(cur_time - start_time):.0f} ms, ' + f'{read_frame_cnt / (cur_time - before_read):.0f} fps') + start_time = cur_time + + def display_fn(self): + """Main function for display thread. + + Read data from display queue and display predictions. 
+ """ + start_time = time.time() + while not self.stopped: + # get the state of the read thread + with self.read_id_lock: + read_id = self.read_id + not_end = self.not_end + + with self.display_lock: + # If video ended and we have display all frames. + if not not_end and self.display_id == read_id: + break + + # If the next task are not available, wait. + if (len(self.display_queue) == 0 or + self.display_queue.get(self.display_id + 1) is None): + time.sleep(0.02) + continue + + # get display data and update state + self.display_id += 1 + was_read, task = self.display_queue[self.display_id] + del self.display_queue[self.display_id] + display_id = self.display_id + + # do display predictions + with self.output_lock: + if was_read and task.id == 0: + # the first task + cur_display_inds = range(self.display_inds[-1] + 1) + elif not was_read: + # the last task + cur_display_inds = range(self.display_inds[0], + len(task.frames)) + else: + cur_display_inds = self.display_inds + + for frame_id in cur_display_inds: + frame = task.frames[frame_id] + if self.show: + cv2.imshow('Demo', frame) + cv2.waitKey(int(1000 / self.output_fps)) + if self.video_writer: + self.video_writer.write(frame) + + cur_time = time.time() + logger.debug( + f'Display thread: {1000*(cur_time - start_time):.0f} ms, ' + f'read id {read_id}, display id {display_id}') + start_time = cur_time + + def __iter__(self): + return self + + def __next__(self): + """Get data from read queue. + + This function is part of the main thread. + """ + if self.read_queue.qsize() == 0: + time.sleep(0.02) + return not self.stopped, None + + was_read, task = self.read_queue.get() + if not was_read: + # If we reach the end of the video, there aren't enough frames + # in the task.processed_frames, so no need to model inference + # and draw predictions. Put task into display queue. 
+ with self.read_id_lock: + read_id = self.read_id + with self.display_lock: + self.display_queue[read_id] = was_read, copy.deepcopy(task) + + # main thread doesn't need to handle this task again + task = None + return was_read, task + + def start(self): + """Start read thread and display thread.""" + self.read_thread = threading.Thread( + target=self.read_fn, args=(), name='VidRead-Thread', daemon=True) + self.read_thread.start() + self.display_thread = threading.Thread( + target=self.display_fn, + args=(), + name='VidDisplay-Thread', + daemon=True) + self.display_thread.start() + + return self + + def clean(self): + """Close all threads and release all resources.""" + self.stopped = True + self.read_lock.acquire() + self.cap.release() + self.read_lock.release() + self.output_lock.acquire() + cv2.destroyAllWindows() + if self.video_writer: + self.video_writer.release() + self.output_lock.release() + + def join(self): + """Waiting for the finalization of read and display thread.""" + self.read_thread.join() + self.display_thread.join() + + def display(self, task): + """Add the visualized task to the display queue. + + Args: + task (TaskInfo object): task object that contain the necessary + information for prediction visualization. + """ + with self.display_lock: + self.display_queue[task.id] = (True, task) + + def get_output_video_writer(self, path): + """Return a video writer object. + + Args: + path (str): path to the output video file. 
+ """ + return cv2.VideoWriter( + filename=path, + fourcc=cv2.VideoWriter_fourcc(*'mp4v'), + fps=float(self.output_fps), + frameSize=self.display_size, + isColor=True) + + +class BaseVisualizer(metaclass=ABCMeta): + """Base class for visualization tools.""" + + def __init__(self, max_labels_per_bbox): + self.max_labels_per_bbox = max_labels_per_bbox + + def draw_predictions(self, task): + """Visualize stdet predictions on raw frames.""" + # read bboxes from task + bboxes = task.display_bboxes.cpu().numpy() + + # draw predictions and update task + keyframe_idx = len(task.frames) // 2 + draw_range = [ + keyframe_idx - task.clip_vis_length // 2, + keyframe_idx + (task.clip_vis_length - 1) // 2 + ] + assert draw_range[0] >= 0 and draw_range[1] < len(task.frames) + task.frames = self.draw_clip_range(task.frames, task.action_preds, + bboxes, draw_range) + + return task + + def draw_clip_range(self, frames, preds, bboxes, draw_range): + """Draw a range of frames with the same bboxes and predictions.""" + # no predictions to be draw + if bboxes is None or len(bboxes) == 0: + return frames + + # draw frames in `draw_range` + left_frames = frames[:draw_range[0]] + right_frames = frames[draw_range[1] + 1:] + draw_frames = frames[draw_range[0]:draw_range[1] + 1] + + # get labels(texts) and draw predictions + draw_frames = [ + self.draw_one_image(frame, bboxes, preds) for frame in draw_frames + ] + + return list(left_frames) + draw_frames + list(right_frames) + + @abstractmethod + def draw_one_image(self, frame, bboxes, preds): + """Draw bboxes and corresponding texts on one frame.""" + + @staticmethod + def abbrev(name): + """Get the abbreviation of label name: + + 'take (an object) from (a person)' -> 'take ... from ...' + """ + while name.find('(') != -1: + st, ed = name.find('('), name.find(')') + name = name[:st] + '...' + name[ed + 1:] + return name + + +class DefaultVisualizer(BaseVisualizer): + """Tools to visualize predictions. 
+ + Args: + max_labels_per_bbox (int): Max number of labels to visualize for a + person box. Default: 5. + plate (str): The color plate used for visualization. Two recommended + plates are blue plate `03045e-023e8a-0077b6-0096c7-00b4d8-48cae4` + and green plate `004b23-006400-007200-008000-38b000-70e000`. These + plates are generated by https://coolors.co/. + Default: '03045e-023e8a-0077b6-0096c7-00b4d8-48cae4'. + text_fontface (int): Fontface from OpenCV for texts. + Default: cv2.FONT_HERSHEY_DUPLEX. + text_fontscale (float): Fontscale from OpenCV for texts. + Default: 0.5. + text_fontcolor (tuple): Font color from OpenCV for texts. + Default: (255, 255, 255). + text_thickness (int): Thickness from OpenCV for texts. + Default: 1. + text_linetype (int): Linetype from OpenCV for texts. + Default: 1. + """ + + def __init__( + self, + max_labels_per_bbox=5, + plate='03045e-023e8a-0077b6-0096c7-00b4d8-48cae4', + text_fontface=cv2.FONT_HERSHEY_DUPLEX, + text_fontscale=0.5, + text_fontcolor=(255, 255, 255), # white + text_thickness=1, + text_linetype=1): + super().__init__(max_labels_per_bbox=max_labels_per_bbox) + self.text_fontface = text_fontface + self.text_fontscale = text_fontscale + self.text_fontcolor = text_fontcolor + self.text_thickness = text_thickness + self.text_linetype = text_linetype + + def hex2color(h): + """Convert the 6-digit hex string to tuple of 3 int values (RGB)""" + return (int(h[:2], 16), int(h[2:4], 16), int(h[4:], 16)) + + plate = plate.split('-') + self.plate = [hex2color(h) for h in plate] + + def draw_one_image(self, frame, bboxes, preds): + """Draw predictions on one image.""" + for bbox, pred in zip(bboxes, preds): + # draw bbox + box = bbox.astype(np.int64) + st, ed = tuple(box[:2]), tuple(box[2:]) + cv2.rectangle(frame, st, ed, (0, 0, 255), 2) + + # draw texts + for k, (label, score) in enumerate(pred): + if k >= self.max_labels_per_bbox: + break + text = f'{self.abbrev(label)}: {score:.4f}' + location = (0 + st[0], 18 + k * 18 + st[1])
+ textsize = cv2.getTextSize(text, self.text_fontface, + self.text_fontscale, + self.text_thickness)[0] + textwidth = textsize[0] + diag0 = (location[0] + textwidth, location[1] - 14) + diag1 = (location[0], location[1] + 2) + cv2.rectangle(frame, diag0, diag1, self.plate[k + 1], -1) + cv2.putText(frame, text, location, self.text_fontface, + self.text_fontscale, self.text_fontcolor, + self.text_thickness, self.text_linetype) + + return frame + + +def main(args): + # init human detector + human_detector = MmdetHumanDetector(args.det_config, args.det_checkpoint, + args.device, args.det_score_thr) + + # init action detector + config = Config.fromfile(args.config) + config.merge_from_dict(args.cfg_options) + + try: + # In our spatiotemporal detection demo, different actions should have + # the same number of bboxes. + config['model']['test_cfg']['rcnn']['action_thr'] = .0 + except KeyError: + pass + stdet_predictor = StdetPredictor( + config=config, + checkpoint=args.checkpoint, + device=args.device, + score_thr=args.action_score_thr, + label_map_path=args.label_map) + + # init clip helper + clip_helper = ClipHelper( + config=config, + display_height=args.display_height, + display_width=args.display_width, + input_video=args.input_video, + predict_stepsize=args.predict_stepsize, + output_fps=args.output_fps, + clip_vis_length=args.clip_vis_length, + out_filename=args.out_filename, + show=args.show) + + # init visualizer + vis = DefaultVisualizer() + + # start read and display thread + clip_helper.start() + + try: + # Main thread main function contains: + # 1) get data from read queue + # 2) get human bboxes and stdet predictions + # 3) draw stdet predictions and update task + # 4) put task into display queue + for able_to_read, task in clip_helper: + # get data from read queue + + if not able_to_read: + # read thread is dead and all tasks are processed + break + + if task is None: + # when no data in read queue, wait + time.sleep(0.01) + continue + + inference_start = 
time.time() + + # get human bboxes + human_detector.predict(task) + + # get stdet predictions + stdet_predictor.predict(task) + + # draw stdet predictions in raw frames + vis.draw_predictions(task) + logger.info(f'Stdet Results: {task.action_preds}') + + # add draw frames to display queue + clip_helper.display(task) + + logger.debug('Main thread inference time ' + f'{1000*(time.time() - inference_start):.0f} ms') + + # wait for display thread + clip_helper.join() + except KeyboardInterrupt: + pass + finally: + # close read & display thread, release all resources + clip_helper.clean() + + +if __name__ == '__main__': + main(parse_args()) diff --git a/openmmlab_test/mmaction2-0.24.1/docker/Dockerfile b/openmmlab_test/mmaction2-0.24.1/docker/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..506366f70ab585bfd11542ae1b982432db3af2d8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docker/Dockerfile @@ -0,0 +1,25 @@ +ARG PYTORCH="1.6.0" +ARG CUDA="10.1" +ARG CUDNN="7" + +FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel + +ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX" +ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all" +ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" + +RUN apt-get update && apt-get install -y git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 ffmpeg \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +# Install mmcv-full +RUN pip install mmcv-full==latest -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html + +# Install MMAction2 +RUN conda clean --all +RUN git clone https://github.com/open-mmlab/mmaction2.git /mmaction2 +WORKDIR /mmaction2 +RUN mkdir -p /mmaction2/data +ENV FORCE_CUDA="1" +RUN pip install cython --no-cache-dir +RUN pip install --no-cache-dir -e . 
diff --git a/openmmlab_test/mmaction2-0.24.1/docker/serve/Dockerfile b/openmmlab_test/mmaction2-0.24.1/docker/serve/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..8ea55de3f69fbc405c7cbe17aacd0f20e1f70929 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docker/serve/Dockerfile @@ -0,0 +1,51 @@ +ARG PYTORCH="1.9.0" +ARG CUDA="10.2" +ARG CUDNN="7" +FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel + +ARG MMCV="1.3.8" +ARG MMACTION="0.24.0" + +ENV PYTHONUNBUFFERED TRUE + +RUN apt-get update && \ + DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \ + ca-certificates \ + g++ \ + openjdk-11-jre-headless \ + # MMDET Requirements + ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \ + libsndfile1 libturbojpeg \ + && rm -rf /var/lib/apt/lists/* + +ENV PATH="/opt/conda/bin:$PATH" +RUN export FORCE_CUDA=1 + +# TORCHSEVER +RUN pip install torchserve torch-model-archiver + +# MMLAB +ARG PYTORCH +ARG CUDA +RUN ["/bin/bash", "-c", "pip install mmcv-full==${MMCV} -f https://download.openmmlab.com/mmcv/dist/cu${CUDA//./}/torch${PYTORCH}/index.html"] +# RUN pip install mmaction2==${MMACTION} +RUN pip install git+https://github.com/open-mmlab/mmaction2.git + +RUN useradd -m model-server \ + && mkdir -p /home/model-server/tmp + +COPY entrypoint.sh /usr/local/bin/entrypoint.sh + +RUN chmod +x /usr/local/bin/entrypoint.sh \ + && chown -R model-server /home/model-server + +COPY config.properties /home/model-server/config.properties +RUN mkdir /home/model-server/model-store && chown -R model-server /home/model-server/model-store + +EXPOSE 8080 8081 8082 + +USER model-server +WORKDIR /home/model-server +ENV TEMP=/home/model-server/tmp +ENTRYPOINT ["/usr/local/bin/entrypoint.sh"] +CMD ["serve"] diff --git a/openmmlab_test/mmaction2-0.24.1/docker/serve/config.properties b/openmmlab_test/mmaction2-0.24.1/docker/serve/config.properties new file mode 100644 index 
0000000000000000000000000000000000000000..efb9c47e40ab550bac765611e6c6c6f2a7152f11 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docker/serve/config.properties @@ -0,0 +1,5 @@ +inference_address=http://0.0.0.0:8080 +management_address=http://0.0.0.0:8081 +metrics_address=http://0.0.0.0:8082 +model_store=/home/model-server/model-store +load_models=all diff --git a/openmmlab_test/mmaction2-0.24.1/docker/serve/entrypoint.sh b/openmmlab_test/mmaction2-0.24.1/docker/serve/entrypoint.sh new file mode 100644 index 0000000000000000000000000000000000000000..41ba00b048aed84b45c5a8015a016ff148e97d86 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docker/serve/entrypoint.sh @@ -0,0 +1,12 @@ +#!/bin/bash +set -e + +if [[ "$1" = "serve" ]]; then + shift 1 + torchserve --start --ts-config /home/model-server/config.properties +else + eval "$@" +fi + +# prevent docker exit +tail -f /dev/null diff --git a/openmmlab_test/mmaction2-0.24.1/docs/Makefile b/openmmlab_test/mmaction2-0.24.1/docs/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..d4bb2cbb9eddb1bb1b4f366623044af8e4830919 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
+%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/openmmlab_test/mmaction2-0.24.1/docs/_static/css/readthedocs.css b/openmmlab_test/mmaction2-0.24.1/docs/_static/css/readthedocs.css new file mode 100644 index 0000000000000000000000000000000000000000..c8b2f6bdda09021e18b5304ddfbd8d158f84adcf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/_static/css/readthedocs.css @@ -0,0 +1,6 @@ +.header-logo { + background-image: url("../images/mmaction2.png"); + background-size: 130px 40px; + height: 40px; + width: 130px; +} diff --git a/openmmlab_test/mmaction2-0.24.1/docs/_static/images/mmaction2.png b/openmmlab_test/mmaction2-0.24.1/docs/_static/images/mmaction2.png new file mode 100644 index 0000000000000000000000000000000000000000..f0c759bb78c5424b4394d18a5ba833a8c9f43add Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/docs/_static/images/mmaction2.png differ diff --git a/openmmlab_test/mmaction2-0.24.1/docs/api.rst b/openmmlab_test/mmaction2-0.24.1/docs/api.rst new file mode 100644 index 0000000000000000000000000000000000000000..ecc9b810e909ba1c4a904c517474b79451921b3e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/api.rst @@ -0,0 +1,101 @@ +mmaction.apis +------------- +.. automodule:: mmaction.apis + :members: + +mmaction.core +------------- + +optimizer +^^^^^^^^^ +.. automodule:: mmaction.core.optimizer + :members: + +evaluation +^^^^^^^^^^ +.. automodule:: mmaction.core.evaluation + :members: + +scheduler +^^^^^^^^^ +.. automodule:: mmaction.core.scheduler + :members: + +mmaction.localization +--------------------- + +localization +^^^^^^^^^^^^ +.. automodule:: mmaction.localization + :members: + +mmaction.models +--------------- + +models +^^^^^^ +.. automodule:: mmaction.models + :members: + +recognizers +^^^^^^^^^^^ +.. automodule:: mmaction.models.recognizers + :members: + +localizers +^^^^^^^^^^ +.. automodule:: mmaction.models.localizers + :members: + +common +^^^^^^ +.. 
automodule:: mmaction.models.common + :members: + +backbones +^^^^^^^^^ +.. automodule:: mmaction.models.backbones + :members: + +heads +^^^^^ +.. automodule:: mmaction.models.heads + :members: + +necks +^^^^^ +.. automodule:: mmaction.models.necks + :members: + +losses +^^^^^^ +.. automodule:: mmaction.models.losses + :members: + +mmaction.datasets +----------------- + +datasets +^^^^^^^^ +.. automodule:: mmaction.datasets + :members: + +pipelines +^^^^^^^^^ +.. automodule:: mmaction.datasets.pipelines + :members: + +samplers +^^^^^^^^ +.. automodule:: mmaction.datasets.samplers + :members: + +mmaction.utils +-------------- +.. automodule:: mmaction.utils + :members: + +mmaction.localization +--------------------- +.. automodule:: mmaction.localization + :members: diff --git a/openmmlab_test/mmaction2-0.24.1/docs/benchmark.md b/openmmlab_test/mmaction2-0.24.1/docs/benchmark.md new file mode 100644 index 0000000000000000000000000000000000000000..562064e500fc1770693e4b47ee5830e8cff01147 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/benchmark.md @@ -0,0 +1,160 @@ +# Benchmark + +We compare our results with some popular frameworks and official releases in terms of speed. + +## Settings + +### Hardware + +- 8 NVIDIA Tesla V100 (32G) GPUs +- Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz + +### Software Environment + +- Python 3.7 +- PyTorch 1.4 +- CUDA 10.1 +- CUDNN 7.6.03 +- NCCL 2.4.08 + +### Metrics + +The time we measure is the average training time per iteration, including data processing and model training. +The training speed is measured in s/iter; lower is better. Note that we skip the first 50 iterations, as their times may include device warm-up. + +### Comparison Rules + +Here we compare our MMAction2 repo with other video understanding toolboxes under the same data and model settings +by the training time per iteration. 
Here, we use + +- commit id [7f3490d](https://github.com/open-mmlab/mmaction/tree/7f3490d3db6a67fe7b87bfef238b757403b670e3) (1/5/2020) of MMAction +- commit id [8d53d6f](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd) (5/5/2020) of Temporal-Shift-Module +- commit id [8299c98](https://github.com/facebookresearch/SlowFast/tree/8299c9862f83a067fa7114ce98120ae1568a83ec) (7/7/2020) of PySlowFast +- commit id [f13707f](https://github.com/wzmsltw/BSN-boundary-sensitive-network/tree/f13707fbc362486e93178c39f9c4d398afe2cb2f) (12/12/2018) of BSN (boundary sensitive network) +- commit id [45d0514](https://github.com/JJBOY/BMN-Boundary-Matching-Network/tree/45d05146822b85ca672b65f3d030509583d0135a) (17/10/2019) of BMN (boundary matching network) + +To ensure a fair comparison, all experiments were conducted in the same hardware environment using the same dataset. The rawframe dataset is generated by the [data preparation tools](/tools/data/kinetics/README.md). The video dataset is a special resized video cache called '256p dense-encoded video', which decodes faster and is generated by [this script](/tools/data/resize_videos.py). As the table below shows, it is significantly faster than normal 256p videos, especially when the sampling is sparse (like [TSN](/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py)). + +For each model setting, we kept the same data preprocessing methods to ensure identical feature input. +In addition, we used Memcached, a distributed caching system, to load the data so that IO time is the same across frameworks, except in the comparisons with PySlowFast, which reads raw videos directly from disk by default. + +We provide the training logs from which we calculate the average iteration time, with the actual settings recorded inside. Feel free to verify them and file an issue if something does not make sense. 
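The averaging rule described above (mean s/iter after dropping the first 50 iterations) can be sketched in a few lines of Python. This is only an illustrative sketch, not the script used to produce the numbers below: the helper names are hypothetical, and the log layout (one JSON object per line with a `mode` field and a numeric `time` field) is an assumption that should be adapted to the actual training logs.

```python
import json
from statistics import mean

WARMUP_ITERS = 50  # the benchmark skips the first 50 iterations


def average_iter_time(iter_times, warmup=WARMUP_ITERS):
    """Return the mean s/iter after discarding the first `warmup` values."""
    steady = list(iter_times)[warmup:]
    if not steady:
        raise ValueError("not enough iterations left after warm-up")
    return mean(steady)


def iter_times_from_json_log(lines, key="time", mode="train"):
    """Yield per-iteration times from JSON-lines log records.

    Assumption: each record is a JSON object with a `mode` field and a
    numeric `key` field. Lines that are not valid JSON are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("mode") == mode and key in record:
            yield float(record[key])
```

A typical call would be `average_iter_time(iter_times_from_json_log(open(path)))`, where `path` points at a log file in the assumed format.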
+ +## Main Results + +### Recognizers + +| Model | input | io backend | batch size x gpus | MMAction2 (s/iter) | GPU mem(GB) | MMAction (s/iter) | GPU mem(GB) | Temporal-Shift-Module (s/iter) | GPU mem(GB) | PySlowFast (s/iter) | GPU mem(GB) | +| :------------------------------------------------------------------------------------------ | :----------------------: | :--------: | :---------------: | :-------------------------------------------------------------------------------------------------------------------------: | :---------: | :------------------------------------------------------------------------------------------------------------------: | :---------: | :-------------------------------------------------------------------------------------------------------------------------------: | :---------: | :--------------------------------------------------------------------------------------------------------------------: | :---------: | +| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p rawframes | Memcached | 32x8 | **[0.32](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_rawframes_memcahed_32x8.zip)** | 8.1 | [0.38](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction/tsn_256p_rawframes_memcached_32x8.zip) | 8.1 | [0.42](https://download.openmmlab.com/mmaction/benchmark/recognition/temporal_shift_module/tsn_256p_rawframes_memcached_32x8.zip) | 10.5 | x | x | +| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p videos | Disk | 32x8 | **[1.42](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO | +| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 32x8 | **[0.61](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_fast_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO | +| 
[I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_video.log) | 4.6 | +| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.35](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_fast_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.36](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_fast_video.log) | 4.6 | +| [I3D](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.43](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_256p_rawframes_memcahed_8x8.zip)** | 5.0 | [0.56](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction/i3d_256p_rawframes_memcached_8x8.zip) | 5.0 | x | x | x | x | +| [TSM](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.31](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsm_256p_rawframes_memcahed_8x8.zip)** | 6.9 | x | x | [0.41](https://download.openmmlab.com/mmaction/benchmark/recognition/temporal_shift_module/tsm_256p_rawframes_memcached_8x8.zip) | 9.1 | x | x | +| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.32](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_video.log) | 3.4 | +| 
[Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.25](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_fast_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.28](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_fast_video.log) | 3.4 | +| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.69](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [1.04](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_video.log) | 7.0 | +| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.68](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_fast_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [0.96](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_fast_video.log) | 7.0 | +| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.45](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x | +| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_fast_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x | + +### Localizers + +| Model | MMAction2 (s/iter) | BSN(boundary sensitive network) (s/iter) | BMN(boundary matching network) (s/iter) | +| 
:------------------------------------------------------------------------------------------------------------------ | :-----------------------: | :--------------------------------------: | :-------------------------------------: | +| BSN ([TEM + PEM + PGM](/configs/localization/bsn)) | **0.074(TEM)+0.040(PEM)** | 0.101(TEM)+0.040(PEM) | x | +| BMN ([bmn_400x100_2x8_9e_activitynet_feature](/configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py)) | **3.27** | x | 3.30 | + +## Details of Comparison + +### TSN + +- **MMAction2** + +```shell +# rawframes +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_tsn configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_tsn_rawframes + +# videos +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_tsn configs/recognition/tsn/tsn_r50_video_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_tsn_video +``` + +- **MMAction** + +```shell +python -u tools/train_recognizer.py configs/TSN/tsn_kinetics400_2d_rgb_r50_seg3_f1s1.py +``` + +- **Temporal-Shift-Module** + +```shell +python main.py kinetics RGB --arch resnet50 --num_segments 3 --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 1 --batch-size 256 -j 32 --dropout 0.5 --consensus_type=avg --eval-freq=10 --npb --print-freq 1 +``` + +### I3D + +- **MMAction2** + +```shell +# rawframes +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_i3d configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_i3d_rawframes + +# videos +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_i3d configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_i3d_video +``` + +- **MMAction** + +```shell +python -u tools/train_recognizer.py configs/I3D_RGB/i3d_kinetics400_3d_rgb_r50_c3d_inflate3x1x1_seg1_f32s2.py +``` + +- **PySlowFast** + +```shell +python tools/run_net.py --cfg configs/Kinetics/I3D_8x8_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} 
NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_i3d_r50_8x8_video.log +``` + +You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'. + +### SlowFast + +- **MMAction2** + +```shell +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_slowfast configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py --work-dir work_dirs/benchmark_slowfast_video +``` + +- **PySlowFast** + +```shell +python tools/run_net.py --cfg configs/Kinetics/SLOWFAST_4x16_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_slowfast_r50_4x16_video.log +``` + +You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'. + +### SlowOnly + +- **MMAction2** + +```shell +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_slowonly configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py --work-dir work_dirs/benchmark_slowonly_video +``` + +- **PySlowFast** + +```shell +python tools/run_net.py --cfg configs/Kinetics/SLOW_4x16_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_slowonly_r50_4x16_video.log +``` + +You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'. 
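A minimal sketch of such a parsing script is shown here. It assumes that each relevant log line contains the field as `"time_diff": <number>` (or `time_diff=<number>`), which is a guess at the PySlowFast log layout rather than a documented format; the function names and the default warm-up of 50 iterations (taken from the Metrics section) are likewise illustrative.

```python
import re
from statistics import mean

# Assumed field layout: "time_diff": 0.44  or  time_diff=0.44
TIME_DIFF_RE = re.compile(
    r'"?time_diff"?\s*[:=]\s*(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)'
)


def parse_time_diff(lines):
    """Collect every `time_diff` value found in the given log lines."""
    values = []
    for line in lines:
        match = TIME_DIFF_RE.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def mean_time_diff(lines, warmup=50):
    """Average `time_diff` after skipping the first `warmup` iterations."""
    values = parse_time_diff(lines)[warmup:]
    if not values:
        raise ValueError("no time_diff values left after warm-up")
    return mean(values)
```

For example, `mean_time_diff(open("pysf_slowonly_r50_4x16_video.log"))` would report the average over the redirected log, provided the log matches the assumed layout.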
+ +### R2plus1D + +- **MMAction2** + +```shell +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_r2plus1d configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py --work-dir work_dirs/benchmark_r2plus1d_video +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/changelog.md b/openmmlab_test/mmaction2-0.24.1/docs/changelog.md new file mode 100644 index 0000000000000000000000000000000000000000..94c3632fcbab1a18bf5cdbc70299ce223577ac65 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/changelog.md @@ -0,0 +1,792 @@ +## Changelog + +### 0.24.0 (05/05/2022) + +**Highlights** + +- Support different seeds + +**New Features** + +- Add lateral norm in multigrid config ([#1567](https://github.com/open-mmlab/mmaction2/pull/1567)) +- Add openpose 25 joints in graph config ([#1578](https://github.com/open-mmlab/mmaction2/pull/1578)) +- Support MLU Backend ([#1608](https://github.com/open-mmlab/mmaction2/pull/1608)) + +**Bug and Typo Fixes** + +- Fix local_rank ([#1558](https://github.com/open-mmlab/mmaction2/pull/1558)) +- Fix install typo ([#1571](https://github.com/open-mmlab/mmaction2/pull/1571)) +- Fix the inference API doc ([#1580](https://github.com/open-mmlab/mmaction2/pull/1580)) +- Fix zh-CN demo.md and getting_started.md ([#1587](https://github.com/open-mmlab/mmaction2/pull/1587)) +- Remove Recommonmark ([#1595](https://github.com/open-mmlab/mmaction2/pull/1595)) +- Fix inference with ndarray ([#1603](https://github.com/open-mmlab/mmaction2/pull/1603)) +- Fix the log error when `IterBasedRunner` is used ([#1606](https://github.com/open-mmlab/mmaction2/pull/1606)) + +### 0.23.0 (04/01/2022) + +**Highlights** + +- Support different seeds +- Provide multi-node training & testing script +- Update error log + +**New Features** + +- Support different seeds([#1502](https://github.com/open-mmlab/mmaction2/pull/1502)) +- Provide multi-node training & testing script([#1521](https://github.com/open-mmlab/mmaction2/pull/1521)) +- Update error 
log([#1546](https://github.com/open-mmlab/mmaction2/pull/1546)) + +**Documentations** + +- Update gpus in Slowfast readme([#1497](https://github.com/open-mmlab/mmaction2/pull/1497)) +- Fix work_dir in multigrid config([#1498](https://github.com/open-mmlab/mmaction2/pull/1498)) +- Add sub bn docs([#1503](https://github.com/open-mmlab/mmaction2/pull/1503)) +- Add shortcycle sampler docs([#1513](https://github.com/open-mmlab/mmaction2/pull/1513)) +- Update Windows Declaration([#1520](https://github.com/open-mmlab/mmaction2/pull/1520)) +- Update the link for ST-GCN([#1544](https://github.com/open-mmlab/mmaction2/pull/1544)) +- Update install commands([#1549](https://github.com/open-mmlab/mmaction2/pull/1549)) + +**Bug and Typo Fixes** + +- Update colab tutorial install cmds([#1522](https://github.com/open-mmlab/mmaction2/pull/1522)) +- Fix num_iters_per_epoch in analyze_logs.py([#1530](https://github.com/open-mmlab/mmaction2/pull/1530)) +- Fix distributed_sampler([#1532](https://github.com/open-mmlab/mmaction2/pull/1532)) +- Fix cd dir error([#1545](https://github.com/open-mmlab/mmaction2/pull/1545)) +- Update arg names([#1548](https://github.com/open-mmlab/mmaction2/pull/1548)) + +**ModelZoo** + +### 0.22.0 (03/05/2022) + +**Highlights** + +- Support Multigrid training strategy +- Support CPU training +- Support audio demo +- Support topk customizing in models/heads/base.py + +**New Features** + +- Support Multigrid training strategy([#1378](https://github.com/open-mmlab/mmaction2/pull/1378)) +- Support STGCN in demo_skeleton.py([#1391](https://github.com/open-mmlab/mmaction2/pull/1391)) +- Support CPU training([#1407](https://github.com/open-mmlab/mmaction2/pull/1407)) +- Support audio demo([#1425](https://github.com/open-mmlab/mmaction2/pull/1425)) +- Support topk customizing in models/heads/base.py([#1452](https://github.com/open-mmlab/mmaction2/pull/1452)) + +**Documentations** + +- Add OpenMMLab platform([#1393](https://github.com/open-mmlab/mmaction2/pull/1393)) 
+- Update links([#1394](https://github.com/open-mmlab/mmaction2/pull/1394)) +- Update readme in configs([#1404](https://github.com/open-mmlab/mmaction2/pull/1404)) +- Update instructions to install mmcv-full([#1426](https://github.com/open-mmlab/mmaction2/pull/1426)) +- Add shortcut([#1433](https://github.com/open-mmlab/mmaction2/pull/1433)) +- Update modelzoo([#1439](https://github.com/open-mmlab/mmaction2/pull/1439)) +- add video_structuralize in readme([#1455](https://github.com/open-mmlab/mmaction2/pull/1455)) +- Update OpenMMLab repo information([#1482](https://github.com/open-mmlab/mmaction2/pull/1482)) + +**Bug and Typo Fixes** + +- Update train.py([#1375](https://github.com/open-mmlab/mmaction2/pull/1375)) +- Fix printout bug([#1382](https://github.com/open-mmlab/mmaction2/pull/1382)) +- Update multi processing setting([#1395](https://github.com/open-mmlab/mmaction2/pull/1395)) +- Setup multi processing both in train and test([#1405](https://github.com/open-mmlab/mmaction2/pull/1405)) +- Fix bug in nondistributed multi-gpu training([#1406](https://github.com/open-mmlab/mmaction2/pull/1406)) +- Add variable fps in ava_dataset.py([#1409](https://github.com/open-mmlab/mmaction2/pull/1409)) +- Only support distributed training([#1414](https://github.com/open-mmlab/mmaction2/pull/1414)) +- Set test_mode for AVA configs([#1432](https://github.com/open-mmlab/mmaction2/pull/1432)) +- Support single label([#1434](https://github.com/open-mmlab/mmaction2/pull/1434)) +- Add check copyright([#1447](https://github.com/open-mmlab/mmaction2/pull/1447)) +- Support Windows CI([#1448](https://github.com/open-mmlab/mmaction2/pull/1448)) +- Fix wrong device of class_weight in models/losses/cross_entropy_loss.py([#1457](https://github.com/open-mmlab/mmaction2/pull/1457)) +- Fix bug caused by distributed([#1459](https://github.com/open-mmlab/mmaction2/pull/1459)) +- Update readme([#1460](https://github.com/open-mmlab/mmaction2/pull/1460)) +- Fix lint caused by colab automatic 
upload([#1461](https://github.com/open-mmlab/mmaction2/pull/1461)) +- Refine CI([#1471](https://github.com/open-mmlab/mmaction2/pull/1471)) +- Update pre-commit([#1474](https://github.com/open-mmlab/mmaction2/pull/1474)) +- Add deprecation message for deploy tool([#1483](https://github.com/open-mmlab/mmaction2/pull/1483)) + +**ModelZoo** + +- Support slowfast_steplr([#1421](https://github.com/open-mmlab/mmaction2/pull/1421)) + +### 0.21.0 (31/12/2021) + +**Highlights** + +- Support 2s-AGCN +- Support publish models in Windows +- Improve some sthv1 related models +- Support BABEL + +**New Features** + +- Support 2s-AGCN([#1248](https://github.com/open-mmlab/mmaction2/pull/1248)) +- Support skip postproc in ntu_pose_extraction([#1295](https://github.com/open-mmlab/mmaction2/pull/1295)) +- Support publish models in Windows([#1325](https://github.com/open-mmlab/mmaction2/pull/1325)) +- Add copyright checkhook in pre-commit-config([#1344](https://github.com/open-mmlab/mmaction2/pull/1344)) + +**Documentations** + +- Add MMFlow ([#1273](https://github.com/open-mmlab/mmaction2/pull/1273)) +- Revise README.md and add projects.md ([#1286](https://github.com/open-mmlab/mmaction2/pull/1286)) +- Add 2s-AGCN in Updates([#1289](https://github.com/open-mmlab/mmaction2/pull/1289)) +- Add MMFewShot([#1300](https://github.com/open-mmlab/mmaction2/pull/1300)) +- Add MMHuman3d([#1304](https://github.com/open-mmlab/mmaction2/pull/1304)) +- Update pre-commit([#1313](https://github.com/open-mmlab/mmaction2/pull/1313)) +- Use share menu from the theme instead([#1328](https://github.com/open-mmlab/mmaction2/pull/1328)) +- Update installation command([#1340](https://github.com/open-mmlab/mmaction2/pull/1340)) + +**Bug and Typo Fixes** + +- Update the inference part in notebooks([#1256](https://github.com/open-mmlab/mmaction2/pull/1256)) +- Update the map_location([#1262](https://github.com/open-mmlab/mmaction2/pull/1262)) +- Fix bug that start_index is not used in 
RawFrameDecode([#1278](https://github.com/open-mmlab/mmaction2/pull/1278)) +- Fix bug in init_random_seed([#1282](https://github.com/open-mmlab/mmaction2/pull/1282)) +- Fix bug in setup.py([#1303](https://github.com/open-mmlab/mmaction2/pull/1303)) +- Fix interrogate error in workflows([#1305](https://github.com/open-mmlab/mmaction2/pull/1305)) +- Fix typo in slowfast config([#1309](https://github.com/open-mmlab/mmaction2/pull/1309)) +- Cancel previous runs that are not completed([#1327](https://github.com/open-mmlab/mmaction2/pull/1327)) +- Fix missing skip_postproc parameter([#1347](https://github.com/open-mmlab/mmaction2/pull/1347)) +- Update ssn.py([#1355](https://github.com/open-mmlab/mmaction2/pull/1355)) +- Use latest youtube-dl([#1357](https://github.com/open-mmlab/mmaction2/pull/1357)) +- Fix test-best([#1362](https://github.com/open-mmlab/mmaction2/pull/1362)) + +**ModelZoo** + +- Improve some sthv1 related models([#1306](https://github.com/open-mmlab/mmaction2/pull/1306)) +- Support BABEL([#1332](https://github.com/open-mmlab/mmaction2/pull/1332)) + +### 0.20.0 (07/10/2021) + +**Highlights** + +- Support TorchServe +- Add video structuralize demo +- Support using 3D skeletons for skeleton-based action recognition +- Benchmark PoseC3D on UCF and HMDB + +**New Features** + +- Support TorchServe ([#1212](https://github.com/open-mmlab/mmaction2/pull/1212)) +- Support 3D skeletons pre-processing ([#1218](https://github.com/open-mmlab/mmaction2/pull/1218)) +- Support video structuralize demo ([#1197](https://github.com/open-mmlab/mmaction2/pull/1197)) + +**Documentations** + +- Revise README.md and add projects.md ([#1214](https://github.com/open-mmlab/mmaction2/pull/1214)) +- Add CN docs for Skeleton dataset, PoseC3D and ST-GCN ([#1228](https://github.com/open-mmlab/mmaction2/pull/1228), [#1237](https://github.com/open-mmlab/mmaction2/pull/1237), [#1236](https://github.com/open-mmlab/mmaction2/pull/1236)) +- Add tutorial for custom dataset training for 
skeleton-based action recognition ([#1234](https://github.com/open-mmlab/mmaction2/pull/1234)) + +**Bug and Typo Fixes** + +- Fix tutorial link ([#1219](https://github.com/open-mmlab/mmaction2/pull/1219)) +- Fix GYM links ([#1224](https://github.com/open-mmlab/mmaction2/pull/1224)) + +**ModelZoo** + +- Benchmark PoseC3D on UCF and HMDB ([#1223](https://github.com/open-mmlab/mmaction2/pull/1223)) +- Add ST-GCN + 3D skeleton model for NTU60-XSub ([#1236](https://github.com/open-mmlab/mmaction2/pull/1236)) + +### 0.19.0 (07/10/2021) + +**Highlights** + +- Support ST-GCN +- Refactor the inference API +- Add code spell check hook + +**New Features** + +- Support ST-GCN ([#1123](https://github.com/open-mmlab/mmaction2/pull/1123)) + +**Improvement** + +- Add label maps for every dataset ([#1127](https://github.com/open-mmlab/mmaction2/pull/1127)) +- Remove useless code MultiGroupCrop ([#1180](https://github.com/open-mmlab/mmaction2/pull/1180)) +- Refactor Inference API ([#1191](https://github.com/open-mmlab/mmaction2/pull/1191)) +- Add code spell check hook ([#1208](https://github.com/open-mmlab/mmaction2/pull/1208)) +- Use docker in CI ([#1159](https://github.com/open-mmlab/mmaction2/pull/1159)) + +**Documentations** + +- Update metafiles to new OpenMMLAB protocols ([#1134](https://github.com/open-mmlab/mmaction2/pull/1134)) +- Switch to new doc style ([#1160](https://github.com/open-mmlab/mmaction2/pull/1160)) +- Improve the ERROR message ([#1203](https://github.com/open-mmlab/mmaction2/pull/1203)) +- Fix invalid URL in getting_started ([#1169](https://github.com/open-mmlab/mmaction2/pull/1169)) + +**Bug and Typo Fixes** + +- Compatible with new MMClassification ([#1139](https://github.com/open-mmlab/mmaction2/pull/1139)) +- Add missing runtime dependencies ([#1144](https://github.com/open-mmlab/mmaction2/pull/1144)) +- Fix THUMOS tag proposals path ([#1156](https://github.com/open-mmlab/mmaction2/pull/1156)) +- Fix LoadHVULabel 
([#1194](https://github.com/open-mmlab/mmaction2/pull/1194)) +- Switch the default value of `persistent_workers` to False ([#1202](https://github.com/open-mmlab/mmaction2/pull/1202)) +- Fix `_freeze_stages` for MobileNetV2 ([#1193](https://github.com/open-mmlab/mmaction2/pull/1193)) +- Fix resume when building rawframes ([#1150](https://github.com/open-mmlab/mmaction2/pull/1150)) +- Fix device bug for class weight ([#1188](https://github.com/open-mmlab/mmaction2/pull/1188)) +- Correct Arg names in extract_audio.py ([#1148](https://github.com/open-mmlab/mmaction2/pull/1148)) + +**ModelZoo** + +- Add TSM-MobileNetV2 ported from TSM ([#1163](https://github.com/open-mmlab/mmaction2/pull/1163)) +- Add ST-GCN for NTURGB+D-XSub-60 ([#1123](https://github.com/open-mmlab/mmaction2/pull/1123)) + +### 0.18.0 (02/09/2021) + +**Improvement** + +- Add CopyRight ([#1099](https://github.com/open-mmlab/mmaction2/pull/1099)) +- Support NTU Pose Extraction ([#1076](https://github.com/open-mmlab/mmaction2/pull/1076)) +- Support Caching in RawFrameDecode ([#1078](https://github.com/open-mmlab/mmaction2/pull/1078)) +- Add citations & Support python3.9 CI & Use fixed-version sphinx ([#1125](https://github.com/open-mmlab/mmaction2/pull/1125)) + +**Documentations** + +- Add Descriptions of PoseC3D dataset ([#1053](https://github.com/open-mmlab/mmaction2/pull/1053)) + +**Bug and Typo Fixes** + +- Fix SSV2 checkpoints ([#1101](https://github.com/open-mmlab/mmaction2/pull/1101)) +- Fix CSN normalization ([#1116](https://github.com/open-mmlab/mmaction2/pull/1116)) +- Fix typo ([#1121](https://github.com/open-mmlab/mmaction2/pull/1121)) +- Fix new_crop_quadruple bug ([#1108](https://github.com/open-mmlab/mmaction2/pull/1108)) + +### 0.17.0 (03/08/2021) + +**Highlights** + +- Support PyTorch 1.9 +- Support Pytorchvideo Transforms +- Support PreciseBN + +**New Features** + +- Support Pytorchvideo Transforms ([#1008](https://github.com/open-mmlab/mmaction2/pull/1008)) +- Support PreciseBN 
([#1038](https://github.com/open-mmlab/mmaction2/pull/1038)) + +**Improvements** + +- Remove redundant augmentations in config files ([#996](https://github.com/open-mmlab/mmaction2/pull/996)) +- Make resource directory to hold common resource pictures ([#1011](https://github.com/open-mmlab/mmaction2/pull/1011)) +- Remove deprecated FrameSelector ([#1010](https://github.com/open-mmlab/mmaction2/pull/1010)) +- Support Concat Dataset ([#1000](https://github.com/open-mmlab/mmaction2/pull/1000)) +- Add `to-mp4` option to resize_videos.py ([#1021](https://github.com/open-mmlab/mmaction2/pull/1021)) +- Add option to keep tail frames ([#1050](https://github.com/open-mmlab/mmaction2/pull/1050)) +- Update MIM support ([#1061](https://github.com/open-mmlab/mmaction2/pull/1061)) +- Calculate Top-K accurate and inaccurate classes ([#1047](https://github.com/open-mmlab/mmaction2/pull/1047)) + +**Bug and Typo Fixes** + +- Fix bug in PoseC3D demo ([#1009](https://github.com/open-mmlab/mmaction2/pull/1009)) +- Fix some problems in resize_videos.py ([#1012](https://github.com/open-mmlab/mmaction2/pull/1012)) +- Support torch1.9 ([#1015](https://github.com/open-mmlab/mmaction2/pull/1015)) +- Remove redundant code in CI ([#1046](https://github.com/open-mmlab/mmaction2/pull/1046)) +- Fix bug about persistent_workers ([#1044](https://github.com/open-mmlab/mmaction2/pull/1044)) +- Support TimeSformer feature extraction ([#1035](https://github.com/open-mmlab/mmaction2/pull/1035)) +- Fix ColorJitter ([#1025](https://github.com/open-mmlab/mmaction2/pull/1025)) + +**ModelZoo** + +- Add TSM-R50 sthv1 models trained by PytorchVideo RandAugment and AugMix ([#1008](https://github.com/open-mmlab/mmaction2/pull/1008)) +- Update SlowOnly SthV1 checkpoints ([#1034](https://github.com/open-mmlab/mmaction2/pull/1034)) +- Add SlowOnly Kinetics400 checkpoints trained with Precise-BN ([#1038](https://github.com/open-mmlab/mmaction2/pull/1038)) +- Add CSN-R50 from scratch checkpoints 
([#1045](https://github.com/open-mmlab/mmaction2/pull/1045)) +- TPN Kinetics-400 Checkpoints trained with the new ColorJitter ([#1025](https://github.com/open-mmlab/mmaction2/pull/1025)) + +**Documentation** + +- Add Chinese translation of feature_extraction.md ([#1020](https://github.com/open-mmlab/mmaction2/pull/1020)) +- Fix the code snippet in getting_started.md ([#1023](https://github.com/open-mmlab/mmaction2/pull/1023)) +- Fix TANet config table ([#1028](https://github.com/open-mmlab/mmaction2/pull/1028)) +- Add description to PoseC3D dataset ([#1053](https://github.com/open-mmlab/mmaction2/pull/1053)) + +### 0.16.0 (01/07/2021) + +**Highlights** + +- Support using backbone from pytorch-image-models(timm) +- Support PIMS Decoder +- Demo for skeleton-based action recognition +- Support Timesformer + +**New Features** + +- Support using backbones from pytorch-image-models(timm) for TSN ([#880](https://github.com/open-mmlab/mmaction2/pull/880)) +- Support torchvision transformations in preprocessing pipelines ([#972](https://github.com/open-mmlab/mmaction2/pull/972)) +- Demo for skeleton-based action recognition ([#972](https://github.com/open-mmlab/mmaction2/pull/972)) +- Support Timesformer ([#839](https://github.com/open-mmlab/mmaction2/pull/839)) + +**Improvements** + +- Add a tool to find invalid videos ([#907](https://github.com/open-mmlab/mmaction2/pull/907), [#950](https://github.com/open-mmlab/mmaction2/pull/950)) +- Add an option to specify spectrogram_type ([#909](https://github.com/open-mmlab/mmaction2/pull/909)) +- Add json output to video demo ([#906](https://github.com/open-mmlab/mmaction2/pull/906)) +- Add MIM related docs ([#918](https://github.com/open-mmlab/mmaction2/pull/918)) +- Rename lr to scheduler ([#916](https://github.com/open-mmlab/mmaction2/pull/916)) +- Support `--cfg-options` for demos ([#911](https://github.com/open-mmlab/mmaction2/pull/911)) +- Support number counting for flow-wise filename template 
([#922](https://github.com/open-mmlab/mmaction2/pull/922)) +- Add Chinese tutorial ([#941](https://github.com/open-mmlab/mmaction2/pull/941)) +- Change ResNet3D default values ([#939](https://github.com/open-mmlab/mmaction2/pull/939)) +- Adjust script structure ([#935](https://github.com/open-mmlab/mmaction2/pull/935)) +- Add font color to args in long_video_demo ([#947](https://github.com/open-mmlab/mmaction2/pull/947)) +- Polish code style with Pylint ([#908](https://github.com/open-mmlab/mmaction2/pull/908)) +- Support PIMS Decoder ([#946](https://github.com/open-mmlab/mmaction2/pull/946)) +- Improve Metafiles ([#956](https://github.com/open-mmlab/mmaction2/pull/956), [#979](https://github.com/open-mmlab/mmaction2/pull/979), [#966](https://github.com/open-mmlab/mmaction2/pull/966)) +- Add links to download Kinetics400 validation ([#920](https://github.com/open-mmlab/mmaction2/pull/920)) +- Audit the usage of shutil.rmtree ([#943](https://github.com/open-mmlab/mmaction2/pull/943)) +- Polish localizer related codes([#913](https://github.com/open-mmlab/mmaction2/pull/913)) + +**Bug and Typo Fixes** + +- Fix spatiotemporal detection demo ([#899](https://github.com/open-mmlab/mmaction2/pull/899)) +- Fix docstring for 3D inflate ([#925](https://github.com/open-mmlab/mmaction2/pull/925)) +- Fix bug of writing text to video with TextClip ([#952](https://github.com/open-mmlab/mmaction2/pull/952)) +- Fix mmcv install in CI ([#977](https://github.com/open-mmlab/mmaction2/pull/977)) + +**ModelZoo** + +- Add TSN with Swin Transformer backbone as an example for using pytorch-image-models(timm) backbones ([#880](https://github.com/open-mmlab/mmaction2/pull/880)) +- Port CSN checkpoints from VMZ ([#945](https://github.com/open-mmlab/mmaction2/pull/945)) +- Release various checkpoints for UCF101, HMDB51 and Sthv1 ([#938](https://github.com/open-mmlab/mmaction2/pull/938)) +- Support Timesformer ([#839](https://github.com/open-mmlab/mmaction2/pull/839)) +- Update TSM modelzoo 
([#981](https://github.com/open-mmlab/mmaction2/pull/981)) + +### 0.15.0 (31/05/2021) + +**Highlights** + +- Support PoseC3D +- Support ACRN +- Support MIM + +**New Features** + +- Support PoseC3D ([#786](https://github.com/open-mmlab/mmaction2/pull/786), [#890](https://github.com/open-mmlab/mmaction2/pull/890)) +- Support MIM ([#870](https://github.com/open-mmlab/mmaction2/pull/870)) +- Support ACRN and Focal Loss ([#891](https://github.com/open-mmlab/mmaction2/pull/891)) +- Support Jester dataset ([#864](https://github.com/open-mmlab/mmaction2/pull/864)) + +**Improvements** + +- Add `metric_options` for evaluation to docs ([#873](https://github.com/open-mmlab/mmaction2/pull/873)) +- Support creating a new label map based on custom classes for demos about spatio temporal demo ([#879](https://github.com/open-mmlab/mmaction2/pull/879)) +- Improve document about AVA dataset preparation ([#878](https://github.com/open-mmlab/mmaction2/pull/878)) +- Provide a script to extract clip-level feature ([#856](https://github.com/open-mmlab/mmaction2/pull/856)) + +**Bug and Typo Fixes** + +- Fix issues about resume ([#877](https://github.com/open-mmlab/mmaction2/pull/877), [#878](https://github.com/open-mmlab/mmaction2/pull/878)) +- Correct the key name of `eval_results` dictionary for metric 'mmit_mean_average_precision' ([#885](https://github.com/open-mmlab/mmaction2/pull/885)) + +**ModelZoo** + +- Support Jester dataset ([#864](https://github.com/open-mmlab/mmaction2/pull/864)) +- Support ACRN and Focal Loss ([#891](https://github.com/open-mmlab/mmaction2/pull/891)) + +### 0.14.0 (30/04/2021) + +**Highlights** + +- Support TRN +- Support Diving48 + +**New Features** + +- Support TRN ([#755](https://github.com/open-mmlab/mmaction2/pull/755)) +- Support Diving48 ([#835](https://github.com/open-mmlab/mmaction2/pull/835)) +- Support Webcam Demo for Spatio-temporal Action Detection Models ([#795](https://github.com/open-mmlab/mmaction2/pull/795)) + +**Improvements** + +- Add 
softmax option for pytorch2onnx tool ([#781](https://github.com/open-mmlab/mmaction2/pull/781)) +- Support TRN ([#755](https://github.com/open-mmlab/mmaction2/pull/755)) +- Test with onnx models and TensorRT engines ([#758](https://github.com/open-mmlab/mmaction2/pull/758)) +- Speed up AVA Testing ([#784](https://github.com/open-mmlab/mmaction2/pull/784)) +- Add `self.with_neck` attribute ([#796](https://github.com/open-mmlab/mmaction2/pull/796)) +- Update installation document ([#798](https://github.com/open-mmlab/mmaction2/pull/798)) +- Use a random master port ([#809](https://github.com/open-mmlab/mmaction2/pull/809)) +- Update AVA processing data document ([#801](https://github.com/open-mmlab/mmaction2/pull/801)) +- Refactor spatio-temporal augmentation ([#782](https://github.com/open-mmlab/mmaction2/pull/782)) +- Add QR code in CN README ([#812](https://github.com/open-mmlab/mmaction2/pull/812)) +- Add alternative way to download Kinetics ([#817](https://github.com/open-mmlab/mmaction2/pull/817), [#822](https://github.com/open-mmlab/mmaction2/pull/822)) +- Refactor Sampler ([#790](https://github.com/open-mmlab/mmaction2/pull/790)) +- Use EvalHook in MMCV with backward compatibility ([#793](https://github.com/open-mmlab/mmaction2/pull/793)) +- Use MMCV Model Registry ([#843](https://github.com/open-mmlab/mmaction2/pull/843)) + +**Bug and Typo Fixes** + +- Fix a bug in pytorch2onnx.py when `num_classes <= 4` ([#800](https://github.com/open-mmlab/mmaction2/pull/800), [#824](https://github.com/open-mmlab/mmaction2/pull/824)) +- Fix `demo_spatiotemporal_det.py` error ([#803](https://github.com/open-mmlab/mmaction2/pull/803), [#805](https://github.com/open-mmlab/mmaction2/pull/805)) +- Fix loading config bugs when resuming ([#820](https://github.com/open-mmlab/mmaction2/pull/820)) +- Make HMDB51 annotation generation more robust ([#811](https://github.com/open-mmlab/mmaction2/pull/811)) + +**ModelZoo** + +- Update checkpoint for 256 height in something-V2
([#789](https://github.com/open-mmlab/mmaction2/pull/789)) +- Support Diving48 ([#835](https://github.com/open-mmlab/mmaction2/pull/835)) + +### 0.13.0 (31/03/2021) + +**Highlights** + +- Support LFB +- Support using backbone from MMCls/TorchVision +- Add Chinese documentation + +**New Features** + +- Support LFB ([#553](https://github.com/open-mmlab/mmaction2/pull/553)) +- Support using backbones from MMCls for TSN ([#679](https://github.com/open-mmlab/mmaction2/pull/679)) +- Support using backbones from TorchVision for TSN ([#720](https://github.com/open-mmlab/mmaction2/pull/720)) +- Support Mixup and Cutmix for recognizers ([#681](https://github.com/open-mmlab/mmaction2/pull/681)) +- Support Chinese documentation ([#665](https://github.com/open-mmlab/mmaction2/pull/665), [#680](https://github.com/open-mmlab/mmaction2/pull/680), [#689](https://github.com/open-mmlab/mmaction2/pull/689), [#701](https://github.com/open-mmlab/mmaction2/pull/701), [#702](https://github.com/open-mmlab/mmaction2/pull/702), [#703](https://github.com/open-mmlab/mmaction2/pull/703), [#706](https://github.com/open-mmlab/mmaction2/pull/706), [#716](https://github.com/open-mmlab/mmaction2/pull/716), [#717](https://github.com/open-mmlab/mmaction2/pull/717), [#731](https://github.com/open-mmlab/mmaction2/pull/731), [#733](https://github.com/open-mmlab/mmaction2/pull/733), [#735](https://github.com/open-mmlab/mmaction2/pull/735), [#736](https://github.com/open-mmlab/mmaction2/pull/736), [#737](https://github.com/open-mmlab/mmaction2/pull/737), [#738](https://github.com/open-mmlab/mmaction2/pull/738), [#739](https://github.com/open-mmlab/mmaction2/pull/739), [#740](https://github.com/open-mmlab/mmaction2/pull/740), [#742](https://github.com/open-mmlab/mmaction2/pull/742), [#752](https://github.com/open-mmlab/mmaction2/pull/752), [#759](https://github.com/open-mmlab/mmaction2/pull/759), [#761](https://github.com/open-mmlab/mmaction2/pull/761), 
[#772](https://github.com/open-mmlab/mmaction2/pull/772), [#775](https://github.com/open-mmlab/mmaction2/pull/775)) + +**Improvements** + +- Add slowfast config/json/log/ckpt for training custom classes of AVA ([#678](https://github.com/open-mmlab/mmaction2/pull/678)) +- Set RandAugment as Imgaug default transforms ([#585](https://github.com/open-mmlab/mmaction2/pull/585)) +- Add `--test-last` & `--test-best` for `tools/train.py` to test checkpoints after training ([#608](https://github.com/open-mmlab/mmaction2/pull/608)) +- Add fcn_testing in TPN ([#684](https://github.com/open-mmlab/mmaction2/pull/684)) +- Remove redundant recall functions ([#741](https://github.com/open-mmlab/mmaction2/pull/741)) +- Recursively remove pretrained step for testing ([#695](https://github.com/open-mmlab/mmaction2/pull/695)) +- Improve demo by limiting inference fps ([#668](https://github.com/open-mmlab/mmaction2/pull/668)) + +**Bug and Typo Fixes** + +- Fix a bug about multi-class in VideoDataset ([#723](https://github.com/open-mmlab/mmaction2/pull/678)) +- Reverse key-value in anet filelist generation ([#686](https://github.com/open-mmlab/mmaction2/pull/686)) +- Fix flow norm cfg typo ([#693](https://github.com/open-mmlab/mmaction2/pull/693)) + +**ModelZoo** + +- Add LFB for AVA2.1 ([#553](https://github.com/open-mmlab/mmaction2/pull/553)) +- Add TSN with ResNeXt-101-32x4d backbone as an example for using MMCls backbones ([#679](https://github.com/open-mmlab/mmaction2/pull/679)) +- Add TSN with Densenet161 backbone as an example for using TorchVision backbones ([#720](https://github.com/open-mmlab/mmaction2/pull/720)) +- Add slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb ([#690](https://github.com/open-mmlab/mmaction2/pull/690)) +- Add slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb ([#704](https://github.com/open-mmlab/mmaction2/pull/704)) +- Add slowonly_nl_kinetics_pretrained_r50_4x16x1(8x8x1)\_20e_ava_rgb 
([#730](https://github.com/open-mmlab/mmaction2/pull/730)) + +### 0.12.0 (28/02/2021) + +**Highlights** + +- Support TSM-MobileNetV2 +- Support TANet +- Support GPU Normalize + +**New Features** + +- Support TSM-MobileNetV2 ([#415](https://github.com/open-mmlab/mmaction2/pull/415)) +- Support flip with label mapping ([#591](https://github.com/open-mmlab/mmaction2/pull/591)) +- Add seed option for sampler ([#642](https://github.com/open-mmlab/mmaction2/pull/642)) +- Support GPU Normalize ([#586](https://github.com/open-mmlab/mmaction2/pull/586)) +- Support TANet ([#595](https://github.com/open-mmlab/mmaction2/pull/595)) + +**Improvements** + +- Training custom classes of ava dataset ([#555](https://github.com/open-mmlab/mmaction2/pull/555)) +- Add CN README in homepage ([#592](https://github.com/open-mmlab/mmaction2/pull/592), [#594](https://github.com/open-mmlab/mmaction2/pull/594)) +- Support soft label for CrossEntropyLoss ([#625](https://github.com/open-mmlab/mmaction2/pull/625)) +- Refactor config: Specify `train_cfg` and `test_cfg` in `model` ([#629](https://github.com/open-mmlab/mmaction2/pull/629)) +- Provide an alternative way to download older kinetics annotations ([#597](https://github.com/open-mmlab/mmaction2/pull/597)) +- Update FAQ for + - 1). data pipeline about video and frames ([#598](https://github.com/open-mmlab/mmaction2/pull/598)) + - 2). how to show results ([#598](https://github.com/open-mmlab/mmaction2/pull/598)) + - 3). batch size setting for batchnorm ([#657](https://github.com/open-mmlab/mmaction2/pull/657)) + - 4). 
how to fix stages of backbone when finetuning models ([#658](https://github.com/open-mmlab/mmaction2/pull/658)) +- Modify default value of `save_best` ([#600](https://github.com/open-mmlab/mmaction2/pull/600)) +- Use BibTex rather than latex in markdown ([#607](https://github.com/open-mmlab/mmaction2/pull/607)) +- Add warnings of uninstalling mmdet and supplementary documents ([#624](https://github.com/open-mmlab/mmaction2/pull/624)) +- Support soft label for CrossEntropyLoss ([#625](https://github.com/open-mmlab/mmaction2/pull/625)) + +**Bug and Typo Fixes** + +- Fix value of `pem_low_temporal_iou_threshold` in BSN ([#556](https://github.com/open-mmlab/mmaction2/pull/556)) +- Fix ActivityNet download script ([#601](https://github.com/open-mmlab/mmaction2/pull/601)) + +**ModelZoo** + +- Add TSM-MobileNetV2 for Kinetics400 ([#415](https://github.com/open-mmlab/mmaction2/pull/415)) +- Add deeper SlowFast models ([#605](https://github.com/open-mmlab/mmaction2/pull/605)) + +### 0.11.0 (31/01/2021) + +**Highlights** + +- Support imgaug +- Support spatial temporal demo +- Refactor EvalHook, config structure, unittest structure + +**New Features** + +- Support [imgaug](https://imgaug.readthedocs.io/en/latest/index.html) for augmentations in the data pipeline ([#492](https://github.com/open-mmlab/mmaction2/pull/492)) +- Support setting `max_testing_views` for extremely large models to save GPU memory used ([#511](https://github.com/open-mmlab/mmaction2/pull/511)) +- Add spatial temporal demo ([#547](https://github.com/open-mmlab/mmaction2/pull/547), [#566](https://github.com/open-mmlab/mmaction2/pull/566)) + +**Improvements** + +- Refactor EvalHook ([#395](https://github.com/open-mmlab/mmaction2/pull/395)) +- Refactor AVA hook ([#567](https://github.com/open-mmlab/mmaction2/pull/567)) +- Add repo citation ([#545](https://github.com/open-mmlab/mmaction2/pull/545)) +- Add dataset size of Kinetics400 ([#503](https://github.com/open-mmlab/mmaction2/pull/503)) +- Add lazy 
operation docs ([#504](https://github.com/open-mmlab/mmaction2/pull/504)) +- Add class_weight for CrossEntropyLoss and BCELossWithLogits ([#509](https://github.com/open-mmlab/mmaction2/pull/509)) +- Add some explanation about the resampling in SlowFast ([#502](https://github.com/open-mmlab/mmaction2/pull/502)) +- Modify paper title in README.md ([#512](https://github.com/open-mmlab/mmaction2/pull/512)) +- Add alternative ways to download Kinetics ([#521](https://github.com/open-mmlab/mmaction2/pull/521)) +- Add OpenMMLab projects link in README ([#530](https://github.com/open-mmlab/mmaction2/pull/530)) +- Change default preprocessing to resize the short edge to 256 ([#538](https://github.com/open-mmlab/mmaction2/pull/538)) +- Add config tag in dataset README ([#540](https://github.com/open-mmlab/mmaction2/pull/540)) +- Add solution for markdownlint installation issue ([#497](https://github.com/open-mmlab/mmaction2/pull/497)) +- Add dataset overview in readthedocs ([#548](https://github.com/open-mmlab/mmaction2/pull/548)) +- Modify the trigger mode of the warnings of missing mmdet ([#583](https://github.com/open-mmlab/mmaction2/pull/583)) +- Refactor config structure ([#488](https://github.com/open-mmlab/mmaction2/pull/488), [#572](https://github.com/open-mmlab/mmaction2/pull/572)) +- Refactor unittest structure ([#433](https://github.com/open-mmlab/mmaction2/pull/433)) + +**Bug and Typo Fixes** + +- Fix a bug about AVA dataset validation ([#527](https://github.com/open-mmlab/mmaction2/pull/527)) +- Fix a bug about ResNet pretrain weight initialization ([#582](https://github.com/open-mmlab/mmaction2/pull/582)) +- Fix a bug in CI due to MMCV index ([#495](https://github.com/open-mmlab/mmaction2/pull/495)) +- Remove invalid links of MiT and MMiT ([#516](https://github.com/open-mmlab/mmaction2/pull/516)) +- Fix frame rate bug for AVA preparation ([#576](https://github.com/open-mmlab/mmaction2/pull/576)) + +**ModelZoo** + +### 0.10.0 (31/12/2020) + +**Highlights** + +- Support
Spatio-Temporal Action Detection (AVA) +- Support precise BN + +**New Features** + +- Support precise BN ([#501](https://github.com/open-mmlab/mmaction2/pull/501/)) +- Support Spatio-Temporal Action Detection (AVA) ([#351](https://github.com/open-mmlab/mmaction2/pull/351)) +- Support to return feature maps in `inference_recognizer` ([#458](https://github.com/open-mmlab/mmaction2/pull/458)) + +**Improvements** + +- Add arg `stride` to long_video_demo.py, to make inference faster ([#468](https://github.com/open-mmlab/mmaction2/pull/468)) +- Support training and testing for Spatio-Temporal Action Detection ([#351](https://github.com/open-mmlab/mmaction2/pull/351)) +- Fix CI due to pip upgrade ([#454](https://github.com/open-mmlab/mmaction2/pull/454)) +- Add markdown lint in pre-commit hook ([#255](https://github.com/open-mmlab/mmaction2/pull/225)) +- Speed up confusion matrix calculation ([#465](https://github.com/open-mmlab/mmaction2/pull/465)) +- Use title case in modelzoo statistics ([#456](https://github.com/open-mmlab/mmaction2/pull/456)) +- Add FAQ documents for easy troubleshooting. 
([#413](https://github.com/open-mmlab/mmaction2/pull/413), [#420](https://github.com/open-mmlab/mmaction2/pull/420), [#439](https://github.com/open-mmlab/mmaction2/pull/439)) +- Support Spatio-Temporal Action Detection with context ([#471](https://github.com/open-mmlab/mmaction2/pull/471)) +- Add class weight for CrossEntropyLoss and BCELossWithLogits ([#509](https://github.com/open-mmlab/mmaction2/pull/509)) +- Add Lazy OPs docs ([#504](https://github.com/open-mmlab/mmaction2/pull/504)) + +**Bug and Typo Fixes** + +- Fix typo in default argument of BaseHead ([#446](https://github.com/open-mmlab/mmaction2/pull/446)) +- Fix potential bug about `output_config` overwrite ([#463](https://github.com/open-mmlab/mmaction2/pull/463)) + +**ModelZoo** + +- Add SlowOnly, SlowFast for AVA2.1 ([#351](https://github.com/open-mmlab/mmaction2/pull/351)) + +### 0.9.0 (30/11/2020) + +**Highlights** + +- Support GradCAM utils for recognizers +- Support ResNet Audio model + +**New Features** + +- Automatically add modelzoo statistics to readthedocs ([#327](https://github.com/open-mmlab/mmaction2/pull/327)) +- Support GYM99 ([#331](https://github.com/open-mmlab/mmaction2/pull/331), [#336](https://github.com/open-mmlab/mmaction2/pull/336)) +- Add AudioOnly Pathway from AVSlowFast. 
([#355](https://github.com/open-mmlab/mmaction2/pull/355)) +- Add GradCAM utils for recognizer ([#324](https://github.com/open-mmlab/mmaction2/pull/324)) +- Add print config script ([#345](https://github.com/open-mmlab/mmaction2/pull/345)) +- Add online motion vector decoder ([#291](https://github.com/open-mmlab/mmaction2/pull/291)) + +**Improvements** + +- Support PyTorch 1.7 in CI ([#312](https://github.com/open-mmlab/mmaction2/pull/312)) +- Support predicting different labels in a long video ([#274](https://github.com/open-mmlab/mmaction2/pull/274)) +- Update docs about test crops ([#359](https://github.com/open-mmlab/mmaction2/pull/359)) +- Polish code format using pylint manually ([#338](https://github.com/open-mmlab/mmaction2/pull/338)) +- Update unittest coverage ([#358](https://github.com/open-mmlab/mmaction2/pull/358), [#322](https://github.com/open-mmlab/mmaction2/pull/322), [#325](https://github.com/open-mmlab/mmaction2/pull/325)) +- Add random seed for building filelists ([#323](https://github.com/open-mmlab/mmaction2/pull/323)) +- Update colab tutorial ([#367](https://github.com/open-mmlab/mmaction2/pull/367)) +- Set default batch_size of evaluation and testing to 1 ([#250](https://github.com/open-mmlab/mmaction2/pull/250)) +- Rename the preparation docs to `README.md` ([#388](https://github.com/open-mmlab/mmaction2/pull/388)) +- Move docs about demo to `demo/README.md` ([#329](https://github.com/open-mmlab/mmaction2/pull/329)) +- Remove redundant code in `tools/test.py` ([#310](https://github.com/open-mmlab/mmaction2/pull/310)) +- Automatically calculate number of test clips for Recognizer2D ([#359](https://github.com/open-mmlab/mmaction2/pull/359)) + +**Bug and Typo Fixes** + +- Fix the bug in renaming Kinetics class names ([#384](https://github.com/open-mmlab/mmaction2/pull/384)) +- Fix a bug in BaseDataset when `data_prefix` is None ([#314](https://github.com/open-mmlab/mmaction2/pull/314)) +- Fix a bug about `tmp_folder` in `OpenCVInit`
([#357](https://github.com/open-mmlab/mmaction2/pull/357)) +- Fix `get_thread_id` when not using disk as backend ([#354](https://github.com/open-mmlab/mmaction2/pull/354), [#357](https://github.com/open-mmlab/mmaction2/pull/357)) +- Fix the bug of HVU object `num_classes` from 1679 to 1678 ([#307](https://github.com/open-mmlab/mmaction2/pull/307)) +- Fix typo in `export_model.md` ([#399](https://github.com/open-mmlab/mmaction2/pull/399)) +- Fix OmniSource training configs ([#321](https://github.com/open-mmlab/mmaction2/pull/321)) +- Fix Issue #306: Bug of SampleAVAFrames ([#317](https://github.com/open-mmlab/mmaction2/pull/317)) + +**ModelZoo** + +- Add SlowOnly model for GYM99, both RGB and Flow ([#336](https://github.com/open-mmlab/mmaction2/pull/336)) +- Add auto modelzoo statistics in readthedocs ([#327](https://github.com/open-mmlab/mmaction2/pull/327)) +- Add TSN for HMDB51 pretrained on Kinetics400, Moments in Time and ImageNet. ([#372](https://github.com/open-mmlab/mmaction2/pull/372)) + +### v0.8.0 (31/10/2020) + +**Highlights** + +- Support [OmniSource](https://arxiv.org/abs/2003.13042) +- Support C3D +- Support video recognition with audio modality +- Support HVU +- Support X3D + +**New Features** + +- Support AVA dataset preparation ([#266](https://github.com/open-mmlab/mmaction2/pull/266)) +- Support the training of video recognition dataset with multiple tag categories ([#235](https://github.com/open-mmlab/mmaction2/pull/235)) +- Support joint training with multiple training datasets of multiple formats, including images, untrimmed videos, etc. 
([#242](https://github.com/open-mmlab/mmaction2/pull/242)) +- Support specifying a start epoch to conduct evaluation ([#216](https://github.com/open-mmlab/mmaction2/pull/216)) +- Implement X3D models, support testing with model weights converted from SlowFast ([#288](https://github.com/open-mmlab/mmaction2/pull/288)) + +**Improvements** + +- Set default values of 'average_clips' in each config file so that there is no need to set it explicitly during testing in most cases ([#232](https://github.com/open-mmlab/mmaction2/pull/232)) +- Extend HVU datatools to generate individual file list for each tag category ([#258](https://github.com/open-mmlab/mmaction2/pull/258)) +- Support data preparation for Kinetics-600 and Kinetics-700 ([#254](https://github.com/open-mmlab/mmaction2/pull/254)) +- Use `metric_dict` to replace hardcoded arguments in `evaluate` function ([#286](https://github.com/open-mmlab/mmaction2/pull/286)) +- Add `cfg-options` in arguments to override some settings in the used config for convenience ([#212](https://github.com/open-mmlab/mmaction2/pull/212)) +- Rename the old evaluating protocol `mean_average_precision` as `mmit_mean_average_precision` since it is only used on MMIT and is not the `mAP` we usually talk about.
Add `mean_average_precision`, which is the real `mAP` ([#235](https://github.com/open-mmlab/mmaction2/pull/235)) +- Add accurate setting (Three crop * 2 clip) and report corresponding performance for TSM model ([#241](https://github.com/open-mmlab/mmaction2/pull/241)) +- Add citations in each preparing_dataset.md in `tools/data/dataset` ([#289](https://github.com/open-mmlab/mmaction2/pull/289)) +- Update the performance of audio-visual fusion on Kinetics-400 ([#281](https://github.com/open-mmlab/mmaction2/pull/281)) +- Support data preparation of OmniSource web datasets, including GoogleImage, InsImage, InsVideo and KineticsRawVideo ([#294](https://github.com/open-mmlab/mmaction2/pull/294)) +- Use `metric_options` dict to provide metric args in `evaluate` ([#286](https://github.com/open-mmlab/mmaction2/pull/286)) + +**Bug Fixes** + +- Register `FrameSelector` in `PIPELINES` ([#268](https://github.com/open-mmlab/mmaction2/pull/268)) +- Fix the potential bug for default value in dataset_setting ([#245](https://github.com/open-mmlab/mmaction2/pull/245)) +- Fix multi-node dist test ([#292](https://github.com/open-mmlab/mmaction2/pull/292)) +- Fix the data preparation bug for `something-something` dataset ([#278](https://github.com/open-mmlab/mmaction2/pull/278)) +- Fix the invalid config url in slowonly README data benchmark ([#249](https://github.com/open-mmlab/mmaction2/pull/249)) +- Validate that the performance of models trained with videos has no significant difference compared to the performance of models trained with rawframes ([#256](https://github.com/open-mmlab/mmaction2/pull/256)) +- Correct the `img_norm_cfg` used by TSN-3seg-R50 UCF-101 model, improving the Top-1 accuracy by 3% ([#273](https://github.com/open-mmlab/mmaction2/pull/273)) + +**ModelZoo** + +- Add Baselines for Kinetics-600 and Kinetics-700, including TSN-R50-8seg and SlowOnly-R50-8x8 ([#259](https://github.com/open-mmlab/mmaction2/pull/259)) +- Add OmniSource benchmark on MiniKinetics
([#296](https://github.com/open-mmlab/mmaction2/pull/296)) +- Add Baselines for HVU, including TSN-R18-8seg on 6 tag categories of HVU ([#287](https://github.com/open-mmlab/mmaction2/pull/287)) +- Add X3D models ported from [SlowFast](https://github.com/facebookresearch/SlowFast/) ([#288](https://github.com/open-mmlab/mmaction2/pull/288)) + +### v0.7.0 (30/9/2020) + +**Highlights** + +- Support TPN +- Support JHMDB, UCF101-24, HVU dataset preparation +- Support ONNX model conversion + +**New Features** + +- Support the data pre-processing pipeline for the HVU Dataset ([#277](https://github.com/open-mmlab/mmaction2/pull/227/)) +- Support real-time action recognition from web camera ([#171](https://github.com/open-mmlab/mmaction2/pull/171)) +- Support onnx ([#160](https://github.com/open-mmlab/mmaction2/pull/160)) +- Support UCF101-24 preparation ([#219](https://github.com/open-mmlab/mmaction2/pull/219)) +- Support evaluating mAP for ActivityNet with [CUHK17_activitynet_pred](http://activity-net.org/challenges/2017/evaluation.html) ([#176](https://github.com/open-mmlab/mmaction2/pull/176)) +- Add the data pipeline for ActivityNet, including downloading videos, extracting RGB and Flow frames, finetuning TSN and extracting features ([#190](https://github.com/open-mmlab/mmaction2/pull/190)) +- Support JHMDB preparation ([#220](https://github.com/open-mmlab/mmaction2/pull/220)) + +**ModelZoo** + +- Add finetuning setting for SlowOnly ([#173](https://github.com/open-mmlab/mmaction2/pull/173)) +- Add TSN and SlowOnly models trained with [OmniSource](https://arxiv.org/abs/2003.13042), which achieve 75.7% Top-1 with TSN-R50-3seg and 80.4% Top-1 with SlowOnly-R101-8x8 ([#215](https://github.com/open-mmlab/mmaction2/pull/215)) + +**Improvements** + +- Support demo with video url ([#165](https://github.com/open-mmlab/mmaction2/pull/165)) +- Support multi-batch when testing ([#184](https://github.com/open-mmlab/mmaction2/pull/184)) +- Add tutorial for adding a new learning rate
updater ([#181](https://github.com/open-mmlab/mmaction2/pull/181)) +- Add config name in meta info ([#183](https://github.com/open-mmlab/mmaction2/pull/183)) +- Remove git hash in `__version__` ([#189](https://github.com/open-mmlab/mmaction2/pull/189)) +- Check mmcv version ([#189](https://github.com/open-mmlab/mmaction2/pull/189)) +- Update url with 'https://download.openmmlab.com' ([#208](https://github.com/open-mmlab/mmaction2/pull/208)) +- Update Docker file to support PyTorch 1.6 and update `install.md` ([#209](https://github.com/open-mmlab/mmaction2/pull/209)) +- Polish readthedocs display ([#217](https://github.com/open-mmlab/mmaction2/pull/217), [#229](https://github.com/open-mmlab/mmaction2/pull/229)) + +**Bug Fixes** + +- Fix the bug when using OpenCV to extract only RGB frames with original shape ([#184](https://github.com/open-mmlab/mmaction2/pull/187)) +- Fix the bug of sthv2 `num_classes` from 339 to 174 ([#174](https://github.com/open-mmlab/mmaction2/pull/174), [#207](https://github.com/open-mmlab/mmaction2/pull/207)) + +### v0.6.0 (2/9/2020) + +**Highlights** + +- Support TIN, CSN, SSN, NonLocal +- Support FP16 training + +**New Features** + +- Support NonLocal module and provide ckpt in TSM and I3D ([#41](https://github.com/open-mmlab/mmaction2/pull/41)) +- Support SSN ([#33](https://github.com/open-mmlab/mmaction2/pull/33), [#37](https://github.com/open-mmlab/mmaction2/pull/37), [#52](https://github.com/open-mmlab/mmaction2/pull/52), [#55](https://github.com/open-mmlab/mmaction2/pull/55)) +- Support CSN ([#87](https://github.com/open-mmlab/mmaction2/pull/87)) +- Support TIN ([#53](https://github.com/open-mmlab/mmaction2/pull/53)) +- Support HMDB51 dataset preparation ([#60](https://github.com/open-mmlab/mmaction2/pull/60)) +- Support encoding videos from frames ([#84](https://github.com/open-mmlab/mmaction2/pull/84)) +- Support FP16 training ([#25](https://github.com/open-mmlab/mmaction2/pull/25)) +- Enhance demo by supporting rawframe inference
([#59](https://github.com/open-mmlab/mmaction2/pull/59)), output video/gif ([#72](https://github.com/open-mmlab/mmaction2/pull/72)) + +**ModelZoo** + +- Update Slowfast modelzoo ([#51](https://github.com/open-mmlab/mmaction2/pull/51)) +- Update TSN, TSM video checkpoints ([#50](https://github.com/open-mmlab/mmaction2/pull/50)) +- Add data benchmark for TSN ([#57](https://github.com/open-mmlab/mmaction2/pull/57)) +- Add data benchmark for SlowOnly ([#77](https://github.com/open-mmlab/mmaction2/pull/77)) +- Add BSN/BMN performance results with feature extracted by our codebase ([#99](https://github.com/open-mmlab/mmaction2/pull/99)) + +**Improvements** + +- Polish data preparation codes ([#70](https://github.com/open-mmlab/mmaction2/pull/70)) +- Improve data preparation scripts ([#58](https://github.com/open-mmlab/mmaction2/pull/58)) +- Improve unittest coverage and minor fix ([#62](https://github.com/open-mmlab/mmaction2/pull/62)) +- Support PyTorch 1.6 in CI ([#117](https://github.com/open-mmlab/mmaction2/pull/117)) +- Support `with_offset` for rawframe dataset ([#48](https://github.com/open-mmlab/mmaction2/pull/48)) +- Support json annotation files ([#119](https://github.com/open-mmlab/mmaction2/pull/119)) +- Support `multi-class` in TSMHead ([#104](https://github.com/open-mmlab/mmaction2/pull/104)) +- Support using `val_step()` to validate data for each `val` workflow ([#123](https://github.com/open-mmlab/mmaction2/pull/123)) +- Use `xxInit()` method to get `total_frames` and make `total_frames` a required key ([#90](https://github.com/open-mmlab/mmaction2/pull/90)) +- Add paper introduction in model readme ([#140](https://github.com/open-mmlab/mmaction2/pull/140)) +- Adjust the directory structure of `tools/` and rename some scripts files ([#142](https://github.com/open-mmlab/mmaction2/pull/142)) + +**Bug Fixes** + +- Fix configs for localization test ([#67](https://github.com/open-mmlab/mmaction2/pull/67)) +- Fix configs of SlowOnly by fixing lr to 8 gpus 
([#136](https://github.com/open-mmlab/mmaction2/pull/136)) +- Fix the bug in analyze_log ([#54](https://github.com/open-mmlab/mmaction2/pull/54)) +- Fix the bug of generating HMDB51 class index file ([#69](https://github.com/open-mmlab/mmaction2/pull/69)) +- Fix the bug of using `load_checkpoint()` in ResNet ([#93](https://github.com/open-mmlab/mmaction2/pull/93)) +- Fix the bug of `--work-dir` when using slurm training script ([#110](https://github.com/open-mmlab/mmaction2/pull/110)) +- Correct the sthv1/sthv2 rawframes filelist generate command ([#71](https://github.com/open-mmlab/mmaction2/pull/71)) +- `CosineAnnealing` typo ([#47](https://github.com/open-mmlab/mmaction2/pull/47)) + +### v0.5.0 (9/7/2020) + +**Highlights** + +- MMAction2 is released + +**New Features** + +- Support various datasets: UCF101, Kinetics-400, Something-Something V1&V2, Moments in Time, + Multi-Moments in Time, THUMOS14 +- Support various action recognition methods: TSN, TSM, R(2+1)D, I3D, SlowOnly, SlowFast, Non-local +- Support various action localization methods: BSN, BMN +- Colab demo for action recognition diff --git a/openmmlab_test/mmaction2-0.24.1/docs/conf.py b/openmmlab_test/mmaction2-0.24.1/docs/conf.py new file mode 100644 index 0000000000000000000000000000000000000000..049b1065a6429f92b2422f84040ef67c0b72d183 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/conf.py @@ -0,0 +1,136 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. 
If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os +import subprocess +import sys + +import pytorch_sphinx_theme + +sys.path.insert(0, os.path.abspath('..')) + +# -- Project information ----------------------------------------------------- + +project = 'MMAction2' +copyright = '2020, OpenMMLab' +author = 'MMAction2 Authors' +version_file = '../mmaction/version.py' + + +def get_version(): + with open(version_file, 'r') as f: + exec(compile(f.read(), version_file, 'exec')) + return locals()['__version__'] + + +# The full version, including alpha/beta/rc tags +release = get_version() + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'sphinx.ext.viewcode', + 'sphinx_markdown_tables', 'sphinx_copybutton', 'myst_parser' +] + +# numpy and torch are required +autodoc_mock_imports = ['mmaction.version', 'PIL'] + +copybutton_prompt_text = r'>>> |\.\.\. ' +copybutton_prompt_is_regexp = True + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + +# -- Options for HTML output ------------------------------------------------- +source_suffix = {'.rst': 'restructuredtext', '.md': 'markdown'} + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = 'pytorch_sphinx_theme' + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. 
They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". + +html_theme_path = [pytorch_sphinx_theme.get_html_theme_path()] +html_theme_options = { + # 'logo_url': 'https://mmaction2.readthedocs.io/en/latest/', + 'menu': [ + { + 'name': + 'Tutorial', + 'url': + 'https://colab.research.google.com/github/' + 'open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial.ipynb' + }, + { + 'name': 'GitHub', + 'url': 'https://github.com/open-mmlab/mmaction2' + }, + { + 'name': + 'Upstream', + 'children': [ + { + 'name': 'MMCV', + 'url': 'https://github.com/open-mmlab/mmcv', + 'description': 'Foundational library for computer vision' + }, + { + 'name': + 'MMClassification', + 'url': + 'https://github.com/open-mmlab/mmclassification', + 'description': + 'Open source image classification toolbox based on PyTorch' + }, + { + 'name': 'MMDetection', + 'url': 'https://github.com/open-mmlab/mmdetection', + 'description': 'Object detection toolbox and benchmark' + }, + ] + }, + ], + # Specify the language of shared menu + 'menu_lang': + 'en' +} + +language = 'en' +master_doc = 'index' + +html_static_path = ['_static'] +html_css_files = ['css/readthedocs.css'] + +myst_enable_extensions = ['colon_fence'] +myst_heading_anchors = 3 + + +def builder_inited_handler(app): + subprocess.run(['./merge_docs.sh']) + subprocess.run(['./stat.py']) + + +def setup(app): + app.connect('builder-inited', builder_inited_handler) diff --git a/openmmlab_test/mmaction2-0.24.1/docs/data_preparation.md b/openmmlab_test/mmaction2-0.24.1/docs/data_preparation.md new file mode 100644 index 0000000000000000000000000000000000000000..84788dcf2cca6bd2122ea4849290f092e561b601 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/data_preparation.md @@ -0,0 +1,154 @@ +# Data Preparation + +We provide some tips for MMAction2 data preparation in this file. 
+ + + +- [Notes on Video Data Format](#notes-on-video-data-format) +- [Getting Data](#getting-data) + - [Prepare videos](#prepare-videos) + - [Extract frames](#extract-frames) + - [Alternative to denseflow](#alternative-to-denseflow) + - [Generate file list](#generate-file-list) + - [Prepare audio](#prepare-audio) + + + +## Notes on Video Data Format + +MMAction2 supports two data formats: raw frames and video. The former is widely used in previous projects such as [TSN](https://github.com/yjxiong/temporal-segment-networks). +Raw frames are fast to read when an SSD is available, but the format fails to scale to fast-growing datasets. +(For example, the newest edition of [Kinetics](https://deepmind.com/research/open-source/open-source-datasets/kinetics/) has 650K videos and the total frames will take up several TBs.) +The latter saves much space but requires computation-intensive video decoding at execution time. +To make video decoding faster, we support several efficient video loading libraries, such as [decord](https://github.com/zhreshold/decord), [PyAV](https://github.com/PyAV-Org/PyAV), etc. + +## Getting Data + +The following guide is helpful when you want to experiment with a custom dataset. +Similar to the datasets stated above, it is recommended to organize the data in `$MMACTION2/data/$DATASET`. + +### Prepare videos + +Please refer to the official website and/or the official script to prepare the videos. +Note that the videos should be arranged in either + +(1). A two-level directory organized by `${CLASS_NAME}/${VIDEO_ID}`, which is recommended for action recognition datasets (such as UCF101 and Kinetics). + +(2). A single-level directory, which is recommended for action detection datasets or those with multiple annotations per video (such as THUMOS14). + +### Extract frames + +To extract both frames and optical flow, you can use the tool [denseflow](https://github.com/open-mmlab/denseflow) we wrote. 
+Since different frame extraction tools produce different numbers of frames, +it is beneficial to use the same tool for both frame extraction and flow computation, to avoid mismatched frame counts. + +```shell +python build_rawframes.py ${SRC_FOLDER} ${OUT_FOLDER} [--task ${TASK}] [--level ${LEVEL}] \ + [--num-worker ${NUM_WORKER}] [--flow-type ${FLOW_TYPE}] [--out-format ${OUT_FORMAT}] \ + [--ext ${EXT}] [--new-width ${NEW_WIDTH}] [--new-height ${NEW_HEIGHT}] [--new-short ${NEW_SHORT}] \ + [--resume] [--use-opencv] [--mixed-ext] +``` + +- `SRC_FOLDER`: Folder of the original videos. +- `OUT_FOLDER`: Root folder where the extracted frames and optical flow are stored. +- `TASK`: Extraction task indicating which kind of frames to extract. Allowed choices are `rgb`, `flow`, `both`. +- `LEVEL`: Directory level. 1 for the single-level directory or 2 for the two-level directory. +- `NUM_WORKER`: Number of workers to build rawframes. +- `FLOW_TYPE`: Flow type to extract, e.g., `None`, `tvl1`, `warp_tvl1`, `farn`, `brox`. +- `OUT_FORMAT`: Output format for extracted frames, e.g., `jpg`, `h5`, `png`. +- `EXT`: Video file extension, e.g., `avi`, `mp4`. +- `NEW_WIDTH`: Resized image width of output. +- `NEW_HEIGHT`: Resized image height of output. +- `NEW_SHORT`: Resized length of the short side of the image, keeping the aspect ratio. +- `--resume`: Whether to resume optical flow extraction instead of overwriting. +- `--use-opencv`: Whether to use OpenCV to extract rgb frames. +- `--mixed-ext`: Whether to process video files with mixed extensions. + +The recommended practice is: + +1. Set `$OUT_FOLDER` to a folder located on an SSD. +2. Symlink `$OUT_FOLDER` to `$MMACTION2/data/$DATASET/rawframes`. +3. Set `new-short` instead of using `new-width` and `new-height`. 
+ +```shell +ln -s ${YOUR_FOLDER} $MMACTION2/data/$DATASET/rawframes +``` + +#### Alternative to denseflow + +In case your device doesn't fulfill the installation requirement of [denseflow](https://github.com/open-mmlab/denseflow)(like Nvidia driver version), or you just want to see some quick demos about flow extraction, we provide a python script `tools/misc/flow_extraction.py` as an alternative to denseflow. You can use it for rgb frames and optical flow extraction from one or several videos. Note that the speed of the script is much slower than denseflow, since it runs optical flow algorithms on CPU. + +```shell +python tools/misc/flow_extraction.py --input ${INPUT} [--prefix ${PREFIX}] [--dest ${DEST}] [--rgb-tmpl ${RGB_TMPL}] \ + [--flow-tmpl ${FLOW_TMPL}] [--start-idx ${START_IDX}] [--method ${METHOD}] [--bound ${BOUND}] [--save-rgb] +``` + +- `INPUT`: Videos for frame extraction, can be single video or a video list, the video list should be a txt file and just consists of filenames without directories. +- `PREFIX`: The prefix of input videos, used when input is a video list. +- `DEST`: The destination to save extracted frames. +- `RGB_TMPL`: The template filename of rgb frames. +- `FLOW_TMPL`: The template filename of flow frames. +- `START_IDX`: The start index of extracted frames. +- `METHOD`: The method used to generate flow. +- `BOUND`: The maximum of optical flow. +- `SAVE_RGB`: Also save extracted rgb frames. + +### Generate file list + +We provide a convenient script to generate annotation file list. You can use the following command to generate file lists given extracted frames / downloaded videos. 
+ +```shell +cd $MMACTION2 +python tools/data/build_file_list.py ${DATASET} ${SRC_FOLDER} [--rgb-prefix ${RGB_PREFIX}] \ + [--flow-x-prefix ${FLOW_X_PREFIX}] [--flow-y-prefix ${FLOW_Y_PREFIX}] [--num-split ${NUM_SPLIT}] \ + [--subset ${SUBSET}] [--level ${LEVEL}] [--format ${FORMAT}] [--out-root-path ${OUT_ROOT_PATH}] \ + [--seed ${SEED}] [--shuffle] +``` + +- `DATASET`: Dataset to be prepared, e.g., `ucf101`, `kinetics400`, `thumos14`, `sthv1`, `sthv2`, etc. +- `SRC_FOLDER`: Folder of the corresponding data format: + - "$MMACTION2/data/$DATASET/rawframes" if `--format rawframes`. + - "$MMACTION2/data/$DATASET/videos" if `--format videos`. +- `RGB_PREFIX`: Name prefix of rgb frames. +- `FLOW_X_PREFIX`: Name prefix of x flow frames. +- `FLOW_Y_PREFIX`: Name prefix of y flow frames. +- `NUM_SPLIT`: Number of splits of the file list. +- `SUBSET`: Subset to generate the file list for. Allowed choices are `train`, `val`, `test`. +- `LEVEL`: Directory level. 1 for the single-level directory or 2 for the two-level directory. +- `FORMAT`: Source data format to generate file list. Allowed choices are `rawframes`, `videos`. +- `OUT_ROOT_PATH`: Root path for output. +- `SEED`: Random seed. +- `--shuffle`: Whether to shuffle the file list. + +Now, you can go to [getting_started.md](getting_started.md) to train and test the model. + +### Prepare audio + +We also provide a simple script for audio waveform extraction and mel-spectrogram generation. + +```shell +cd $MMACTION2 +python tools/data/extract_audio.py ${ROOT} ${DST_ROOT} [--ext ${EXT}] [--num-workers ${N_WORKERS}] \ + [--level ${LEVEL}] +``` + +- `ROOT`: The root directory of the videos. +- `DST_ROOT`: The destination root directory of the audios. +- `EXT`: Extension of the video files, e.g., `mp4`. +- `N_WORKERS`: Number of processes to be used. + +After extracting the audio, you are free to decode it and generate the spectrogram on the fly, as in [this config](/configs/recognition_audio/resnet/tsn_r50_64x1x1_100e_kinetics400_audio.py). 
As for the annotations, you can directly use those of the rawframes as long as the audio files keep the same relative paths as the rawframes directories. However, extracting spectrograms on the fly is slow and hinders fast prototyping. Therefore, we also provide a script (and many useful tools to play with) to generate spectrograms offline. + +```shell +cd $MMACTION2 +python tools/data/build_audio_features.py ${AUDIO_HOME_PATH} ${SPECTROGRAM_SAVE_PATH} [--level ${LEVEL}] \ + [--ext $EXT] [--num-workers $N_WORKERS] [--part $PART] +``` + +- `AUDIO_HOME_PATH`: The root directory of the audio files. +- `SPECTROGRAM_SAVE_PATH`: The destination root directory of the audio features. +- `EXT`: Extension of the audio files, e.g., `m4a`. +- `N_WORKERS`: Number of processes to be used. +- `PART`: Determines how many parts the files are split into and which part to run, e.g., `2/5` means splitting all files into 5 parts and processing the 2nd one. This is useful if you have several machines. + +The annotations for audio spectrogram features are identical to those of rawframes. You can simply make a copy of `dataset_[train/val]_list_rawframes.txt` and rename it as `dataset_[train/val]_list_audio_feature.txt`. diff --git a/openmmlab_test/mmaction2-0.24.1/docs/faq.md b/openmmlab_test/mmaction2-0.24.1/docs/faq.md new file mode 100644 index 0000000000000000000000000000000000000000..7ec9727aae652d57dfe8d8ce5b2b9a09df21b462 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/faq.md @@ -0,0 +1,132 @@ +# FAQ + +## Outline + +We list some common issues faced by many users and their corresponding solutions here. + +- [Installation](#installation) +- [Data](#data) +- [Training](#training) +- [Testing](#testing) +- [Deploying](#deploying) + +Feel free to enrich the list if you find any frequent issues and have ways to help others solve them. 
+If the contents here do not cover your issue, please create an issue using the [provided templates](/.github/ISSUE_TEMPLATE/error-report.md) and make sure you fill in all required information in the template. + +## Installation + +- **"No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'"** + + 1. Uninstall existing mmcv in the environment using `pip uninstall mmcv` + 2. Install mmcv-full following the [installation instruction](https://mmcv.readthedocs.io/en/latest/#installation) + +- **"OSError: MoviePy Error: creation of None failed because of the following error"** + + Refer to [install.md](https://github.com/open-mmlab/mmaction2/blob/master/docs/install.md#requirements) + + 1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) will not be automatically detected by MoviePy, there is a need to modify `moviepy/config_defaults.py` file by providing the path to the ImageMagick binary called `magick`, like `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"` + 2. For Linux users, there is a need to modify the `/etc/ImageMagick-6/policy.xml` file by commenting out `` to ``, if ImageMagick is not detected by moviepy. + +- **"Why I got the error message 'Please install XXCODEBASE to use XXX' even if I have already installed XXCODEBASE?"** + + You got that error message because our project failed to import a function or a class from XXCODEBASE. You can try to run the corresponding line to see what happens. One possible reason is, for some codebases in OpenMMLAB, you need to install mmcv-full before you install them. + +## Data + +- **FileNotFound like `No such file or directory: xxx/xxx/img_00300.jpg`** + + In our repo, we set `start_index=1` as default value for rawframe dataset, and `start_index=0` as default value for video dataset. 
+ If you encounter a FileNotFound error for the first or last frame of the data, check whether the file names begin with offset 0 or 1, + that is `xxx_00000.jpg` or `xxx_00001.jpg`, and then change the `start_index` value of the data pipeline in the configs accordingly. + +- **How should we preprocess the videos in the dataset? (1) Resizing them to a fixed size (all videos with the same height-width ratio) like `340x256`, or (2) resizing them so that the short edges of all videos are of the same length (256px or 320px)?** + + We have tried both preprocessing approaches and found (2) is a better solution in general, so we use (2) with short edge length 256px as the default preprocessing setting. We benchmarked these preprocessing approaches and you may find the results in [TSN Data Benchmark](https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/tsn) and [SlowOnly Data Benchmark](https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/slowonly). + +- **Mismatched data pipeline items lead to errors like `KeyError: 'total_frames'`** + + We have separate pipelines for processing videos and frames. + + **For videos**, we should decode them on the fly in the pipeline, so pairs like `DecordInit & DecordDecode`, `OpenCVInit & OpenCVDecode`, `PyAVInit & PyAVDecode` should be used for this case like [this example](https://github.com/open-mmlab/mmaction2/blob/023777cfd26bb175f85d78c455f6869673e0aa09/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py#L47-L49). + + **For frames**, the images have been decoded offline, so the pipeline item `RawFrameDecode` should be used for this case like [this example](https://github.com/open-mmlab/mmaction2/blob/023777cfd26bb175f85d78c455f6869673e0aa09/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py#L49). + + `KeyError: 'total_frames'` is caused by incorrectly using the `RawFrameDecode` step for videos: when the input is a video, `total_frames` cannot be known beforehand and is only set by the corresponding `xxInit` step. 
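The pairing rule in the last entry can be checked mechanically before launching a job. The sketch below is a hypothetical helper, not part of MMAction2: it only looks at the `type` names of the pipeline steps and uses the pairs listed in the FAQ entry above. The function name `check_decode_steps` and the `SampleFrames` values are illustrative assumptions.

```python
# Hypothetical helper (not an MMAction2 API): it inspects only the `type`
# names of pipeline steps, using the Init/Decode pairs named in the FAQ.
VIDEO_PAIRS = {
    'DecordDecode': 'DecordInit',
    'OpenCVDecode': 'OpenCVInit',
    'PyAVDecode': 'PyAVInit',
}


def check_decode_steps(pipeline, input_is_video):
    """Return a list of human-readable problems found in `pipeline`."""
    types = [step['type'] for step in pipeline]
    problems = []
    if input_is_video:
        if 'RawFrameDecode' in types:
            problems.append('RawFrameDecode cannot be used with video input')
        decoders = [t for t in types if t in VIDEO_PAIRS]
        if not decoders:
            problems.append('no video decode step found')
        for decode in decoders:
            init = VIDEO_PAIRS[decode]
            # The Init step must exist and must run before its Decode step.
            if init not in types or types.index(init) > types.index(decode):
                problems.append(f'{decode} must come after {init}')
    else:
        if 'RawFrameDecode' not in types:
            problems.append('rawframe input needs RawFrameDecode')
        for t in types:
            if t in VIDEO_PAIRS or t in VIDEO_PAIRS.values():
                problems.append(f'{t} is a video-only step')
    return problems


# Illustrative pipelines; the SampleFrames values are placeholders.
video_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
    dict(type='DecordDecode'),
]
frame_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
    dict(type='RawFrameDecode'),
]

print(check_decode_steps(video_pipeline, input_is_video=True))
print(check_decode_steps(frame_pipeline, input_is_video=False))
# Using the rawframe pipeline on video input is the mismatch described above.
print(check_decode_steps(frame_pipeline, input_is_video=True))
```

A check like this catches only naming mismatches; it does not verify that the annotation file itself matches the chosen data format.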
+ +## Training + +- **How to just use trained recognizer models for backbone pre-training?** + + Refer to [Use Pre-Trained Model](https://github.com/open-mmlab/mmaction2/blob/master/docs/tutorials/2_finetune.md#use-pre-trained-model), + in order to use the pre-trained model for the whole network, the new config adds the link of pre-trained models in the `load_from`. + + And to use backbone for pre-training, you can change `pretrained` value in the backbone dict of config files to the checkpoint path / url. + When training, the unexpected keys will be ignored. + +- **How to visualize the training accuracy/loss curves in real-time?** + + Use `TensorboardLoggerHook` in `log_config` like + + ```python + log_config=dict(interval=20, hooks=[dict(type='TensorboardLoggerHook')]) + ``` + + You can refer to [tutorials/1_config.md](tutorials/1_config.md), [tutorials/7_customize_runtime.md](tutorials/7_customize_runtime.md#log-config), and [this](https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py#L118). + +- **In batchnorm.py: Expected more than 1 value per channel when training** + + To use batchnorm, the batch_size should be larger than 1. If `drop_last` is set as False when building dataloaders, sometimes the last batch of an epoch will have `batch_size==1` (what a coincidence ...) and training will throw out this error. 
You can set `drop_last` as True to avoid this error: + + ```python + train_dataloader=dict(drop_last=True) + ``` + +- **How to fix stages of backbone when finetuning a model?** + + You can refer to [`def _freeze_stages()`](https://github.com/open-mmlab/mmaction2/blob/0149a0e8c1e0380955db61680c0006626fd008e9/mmaction/models/backbones/x3d.py#L458) and [`frozen_stages`](https://github.com/open-mmlab/mmaction2/blob/0149a0e8c1e0380955db61680c0006626fd008e9/mmaction/models/backbones/x3d.py#L183-L184), + reminding to set `find_unused_parameters = True` in config files for distributed training or testing. + + Actually, users can set `frozen_stages` to freeze stages in backbones except C3D model, since all backbones inheriting from `ResNet` and `ResNet3D` support the inner function `_freeze_stages()`. + +- **How to set memcached setting in config files?** + + In MMAction2, you can pass memcached kwargs to `class DecordInit` for video dataset or `RawFrameDecode` for rawframes dataset. + For more details, you can refer to [`class FileClient`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py) in MMCV for more details. + + Here is an example to use memcached for rawframes dataset: + + ```python + mc_cfg = dict(server_list_cfg='server_list_cfg', client_cfg='client_cfg', sys_path='sys_path') + + train_pipeline = [ + ... + dict(type='RawFrameDecode', io_backend='memcached', **mc_cfg), + ... + ] + ``` + +- **How to set `load_from` value in config files to finetune models?** + + In MMAction2, We set `load_from=None` as default in `configs/_base_/default_runtime.py` and owing to [inheritance design](/docs/tutorials/1_config.md), + users can directly change it by setting `load_from` in their configs. + +## Testing + +- **How to make predicted score normalized by softmax within \[0, 1\]?** + + change this in the config, make `model['test_cfg'] = dict(average_clips='prob')`. 
+ +- **What if the model is too large and the GPU memory cannot fit even a single testing sample?** + + By default, the 3d models are tested with 10 clips x 3 crops, which are 30 views in total. For extremely large models, the GPU memory cannot fit even a single testing sample (because there are 30 views). To handle this, you can set `max_testing_views=n` in `model['test_cfg']` of the config file. If so, n views will be used as a batch during forwarding to reduce GPU memory usage. + +- **How to show test results?** + + During testing, we can use the option `--out xxx.json/pkl/yaml` to output result files for checking. The testing output has exactly the same order as the test dataset. + Besides, we provide an analysis tool for evaluating a model using the output result files in [`tools/analysis/eval_metric.py`](/tools/analysis/eval_metric.py) + +## Deploying + +- **Why is the onnx model converted by mmaction2 throwing error when converting to other frameworks such as TensorRT?** + + For now, we can only make sure that models in mmaction2 are onnx-compatible. However, some operations in onnx may be unsupported by your target framework for deployment, e.g. TensorRT in [this issue](https://github.com/open-mmlab/mmaction2/issues/414). When such a situation occurs, we suggest you raise an issue and ask the community for help, as long as `pytorch2onnx.py` works well and is verified numerically. diff --git a/openmmlab_test/mmaction2-0.24.1/docs/feature_extraction.md b/openmmlab_test/mmaction2-0.24.1/docs/feature_extraction.md new file mode 100644 index 0000000000000000000000000000000000000000..919fcc35af0d430c59f63e61d8731ee4d21fe4bb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/feature_extraction.md @@ -0,0 +1,70 @@ +# Feature Extraction + +We provide easy-to-use scripts for feature extraction. + +## Clip-level Feature Extraction + +Clip-level feature extraction extracts deep features from a video clip, which usually lasts several to tens of seconds. 
The extracted feature is an n-dim vector for each clip. When performing multi-view feature extraction, e.g. n clips x m crops, the extracted feature will be the average of the n * m views. + +Before applying clip-level feature extraction, you need to prepare a video list (which include all videos that you want to extract feature from). For example, the video list for videos in UCF101 will look like: + +``` +ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi +ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02.avi +ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03.avi +ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c04.avi +ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c05.avi +... +YoYo/v_YoYo_g25_c01.avi +YoYo/v_YoYo_g25_c02.avi +YoYo/v_YoYo_g25_c03.avi +YoYo/v_YoYo_g25_c04.avi +YoYo/v_YoYo_g25_c05.avi +``` + +Assume the root of UCF101 videos is `data/ucf101/videos` and the name of the video list is `ucf101.txt`, to extract clip-level feature of UCF101 videos with Kinetics-400 pretrained TSN, you can use the following script: + +```shell +python tools/misc/clip_feature_extraction.py \ +configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \ +https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \ +--video-list ucf101.txt \ +--video-root data/ucf101/videos \ +--out ucf101_feature.pkl +``` + +and the extracted feature will be stored in `ucf101_feature.pkl` + +You can also use distributed clip-level feature extraction. Below is an example for a node with 8 gpus. 
+ +```shell +bash tools/misc/dist_clip_feature_extraction.sh \ +configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \ +https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \ +8 \ +--video-list ucf101.txt \ +--video-root data/ucf101/videos \ +--out ucf101_feature.pkl +``` + +To extract clip-level features of UCF101 videos with a Kinetics-400 pretrained SlowOnly, you can use the following script: + +```shell +python tools/misc/clip_feature_extraction.py \ +configs/recognition/slowonly/slowonly_r50_clip_feature_extraction_4x16x1_rgb.py \ +https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \ +--video-list ucf101.txt \ +--video-root data/ucf101/videos \ +--out ucf101_feature.pkl +``` + +The two config files demonstrate what a minimal config file for feature extraction looks like. 
You can also use other existing config files for feature extraction, as long as they use videos rather than raw frames for training and testing: + +```shell +python tools/misc/clip_feature_extraction.py \ +configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py \ +https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \ +--video-list ucf101.txt \ +--video-root data/ucf101/videos \ +--out ucf101_feature.pkl +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/getting_started.md b/openmmlab_test/mmaction2-0.24.1/docs/getting_started.md new file mode 100644 index 0000000000000000000000000000000000000000..9b492360fe98baa92ffb758a51709f6d1b46abb9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/getting_started.md @@ -0,0 +1,468 @@ +# Getting Started + +This page provides basic tutorials about the usage of MMAction2. +For installation instructions, please see [install.md](install.md). 
+ + + +- [Getting Started](#getting-started) + - [Datasets](#datasets) + - [Inference with Pre-Trained Models](#inference-with-pre-trained-models) + - [Test a dataset](#test-a-dataset) + - [High-level APIs for testing a video and rawframes](#high-level-apis-for-testing-a-video-and-rawframes) + - [Build a Model](#build-a-model) + - [Build a model with basic components](#build-a-model-with-basic-components) + - [Write a new model](#write-a-new-model) + - [Train a Model](#train-a-model) + - [Iteration pipeline](#iteration-pipeline) + - [Training setting](#training-setting) + - [Train with a single GPU](#train-with-a-single-gpu) + - [Train with multiple GPUs](#train-with-multiple-gpus) + - [Train with multiple machines](#train-with-multiple-machines) + - [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine) + - [Tutorials](#tutorials) + + + +## Datasets + +It is recommended to symlink the dataset root to `$MMACTION2/data`. +If your folder structure is different, you may need to change the corresponding paths in config files. + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── kinetics400 +│ │ ├── rawframes_train +│ │ ├── rawframes_val +│ │ ├── kinetics_train_list.txt +│ │ ├── kinetics_val_list.txt +│ ├── ucf101 +│ │ ├── rawframes_train +│ │ ├── rawframes_val +│ │ ├── ucf101_train_list.txt +│ │ ├── ucf101_val_list.txt +│ ├── ... +``` + +For more information on data preparation, please see [data_preparation.md](data_preparation.md) + +For using custom datasets, please refer to [Tutorial 3: Adding New Dataset](tutorials/3_new_dataset.md) + +## Inference with Pre-Trained Models + +We provide testing scripts to evaluate a whole dataset (Kinetics-400, Something-Something V1&V2, (Multi-)Moments in Time, etc.), +and provide some high-level apis for easier integration to other projects. + +MMAction2 also supports testing with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU. 
+To test with CPU, one should first disable all GPUs (if exist) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the testing scripts directly with `python tools/test.py {OTHER_ARGS}`. + +### Test a dataset + +- [x] single GPU +- [x] single node multiple GPUs +- [x] multiple node + +You can use the following commands to test a dataset. + +```shell +# single-gpu testing +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \ + [--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \ + [--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}] [--onnx] [--tensorrt] + +# multi-gpu testing +./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \ + [--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \ + [--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}] +``` + +Optional arguments: + +- `RESULT_FILE`: Filename of the output results. If not specified, the results will not be saved to a file. +- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `top_k_accuracy`, `mean_class_accuracy` are available for all datasets in recognition, `mmit_mean_average_precision` for Multi-Moments in Time, `mean_average_precision` for Multi-Moments in Time and HVU single category. `AR@AN` for ActivityNet, etc. +- `--gpu-collect`: If specified, recognition results will be collected using gpu communication. Otherwise, it will save the results on different gpus to `TMPDIR` and collect them by the rank 0 worker. +- `TMPDIR`: Temporary directory used for collecting results from multiple workers, available when `--gpu-collect` is not specified. +- `OPTIONS`: Custom options used for evaluation. Allowed values depend on the arguments of the `evaluate` function in dataset. +- `AVG_TYPE`: Items to average the test clips. 
If set to `prob`, it will apply softmax before averaging the clip scores. Otherwise, it will directly average the clip scores. +- `JOB_LAUNCHER`: Items for distributed job initialization launcher. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. Especially, if set to none, it will test in a non-distributed mode. +- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0. +- `--onnx`: If specified, recognition results will be generated by onnx model and `CHECKPOINT_FILE` should be onnx model file path. Onnx model files are generated by `/tools/deployment/pytorch2onnx.py`. For now, multi-gpu mode and dynamic input shape mode are not supported. Please note that the output tensors of dataset and the input tensors of onnx model should share the same shape. And it is recommended to remove all test-time augmentation methods in `test_pipeline`(`ThreeCrop`, `TenCrop`, `twice_sample`, etc.) +- `--tensorrt`: If specified, recognition results will be generated by TensorRT engine and `CHECKPOINT_FILE` should be TensorRT engine file path. TensorRT engines are generated by exported onnx models and TensorRT official conversion tools. For now, multi-gpu mode and dynamic input shape mode are not supported. Please note that the output tensors of dataset and the input tensors of TensorRT engine should share the same shape. And it is recommended to remove all test-time augmentation methods in `test_pipeline`(`ThreeCrop`, `TenCrop`, `twice_sample`, etc.) + +Examples: + +Assume that you have already downloaded the checkpoints to the directory `checkpoints/`. + +1. Test TSN on Kinetics-400 (without saving the test results) and evaluate the top-k accuracy and mean class accuracy. + + ```shell + python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth \ + --eval top_k_accuracy mean_class_accuracy + ``` + +2. Test TSN on Something-Something V1 with 8 GPUS, and evaluate the top-k accuracy. 
+ + ```shell + ./tools/dist_test.sh configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth \ + 8 --out results.pkl --eval top_k_accuracy + ``` + +3. Test TSN on Kinetics-400 in a slurm environment and evaluate the top-k accuracy. + + ```shell + python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth \ + --launcher slurm --eval top_k_accuracy + ``` + +4. Test TSN on Kinetics-400 with an onnx model and evaluate the top-k accuracy. + + ```shell + python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.onnx \ + --eval top_k_accuracy --onnx + ``` + +### High-level APIs for testing a video and rawframes + +Here is an example of building the model and testing a given video. + +```python +import torch + +from mmaction.apis import init_recognizer, inference_recognizer + +config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py' +# download the checkpoint from model zoo and put it in `checkpoints/` +checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' + +# assign the desired device. +device = 'cuda:0' # or 'cpu' +device = torch.device(device) + + # build the model from a config file and a checkpoint file +model = init_recognizer(config_file, checkpoint_file, device=device) + +# test a single video and show the result: +video = 'demo/demo.mp4' +labels = 'tools/data/kinetics/label_map_k400.txt' +results = inference_recognizer(model, video) + +# show the results +labels = open('tools/data/kinetics/label_map_k400.txt').readlines() +labels = [x.strip() for x in labels] +results = [(labels[k[0]], k[1]) for k in results] + +print(f'The top-5 labels with corresponding scores are:') +for result in results: + print(f'{result[0]}: ', result[1]) +``` + +Here is an example of building the model and testing with a given rawframes directory. 
+ +```python +import torch + +from mmaction.apis import init_recognizer, inference_recognizer + +config_file = 'configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py' +# download the checkpoint from model zoo and put it in `checkpoints/` +checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' + +# assign the desired device. +device = 'cuda:0' # or 'cpu' +device = torch.device(device) + + # build the model from a config file and a checkpoint file +model = init_recognizer(config_file, checkpoint_file, device=device) + +# test rawframe directory of a single video and show the result: +video = 'SOME_DIR_PATH/' +labels = 'tools/data/kinetics/label_map_k400.txt' +results = inference_recognizer(model, video) + +# show the results +labels = open('tools/data/kinetics/label_map_k400.txt').readlines() +labels = [x.strip() for x in labels] +results = [(labels[k[0]], k[1]) for k in results] + +print(f'The top-5 labels with corresponding scores are:') +for result in results: + print(f'{result[0]}: ', result[1]) +``` + +Here is an example of building the model and testing with a given video url. + +```python +import torch + +from mmaction.apis import init_recognizer, inference_recognizer + +config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py' +# download the checkpoint from model zoo and put it in `checkpoints/` +checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' + +# assign the desired device. 
+device = 'cuda:0' # or 'cpu' +device = torch.device(device) + + # build the model from a config file and a checkpoint file +model = init_recognizer(config_file, checkpoint_file, device=device) + +# test url of a single video and show the result: +video = 'https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4' +labels = 'tools/data/kinetics/label_map_k400.txt' +results = inference_recognizer(model, video) + +# show the results +labels = open('tools/data/kinetics/label_map_k400.txt').readlines() +labels = [x.strip() for x in labels] +results = [(labels[k[0]], k[1]) for k in results] + +print(f'The top-5 labels with corresponding scores are:') +for result in results: + print(f'{result[0]}: ', result[1]) +``` + +:::{note} +We define `data_prefix` in config files and set it None as default for our provided inference configs. +If the `data_prefix` is not None, the path for the video file (or rawframe directory) to get will be `data_prefix/video`. +Here, the `video` is the param in the demo scripts above. +This detail can be found in `rawframe_dataset.py` and `video_dataset.py`. For example, + +- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is None in the config file, + the param `video` should be `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME`). +- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is `SOME_DIR_PATH` in the config file, + the param `video` should be `VIDEO.mp4` (`VIDEO_NAME`). +- When rawframes path is `VIDEO_NAME/img_xxxxx.jpg`, and `data_prefix` is None in the config file, the param `video` should be `VIDEO_NAME`. +- When passing a url instead of a local video file, you need to use OpenCV as the video decoding backend. 
+ +::: + +A notebook demo can be found in [demo/demo.ipynb](/demo/demo.ipynb) + +## Build a Model + +### Build a model with basic components + +In MMAction2, model components are basically categorized as 4 types. + +- recognizer: the whole recognizer model pipeline, usually contains a backbone and cls_head. +- backbone: usually an FCN network to extract feature maps, e.g., ResNet, BNInception. +- cls_head: the component for classification task, usually contains an FC layer with some pooling layers. +- localizer: the model for localization task, currently available: BSN, BMN. + +Following some basic pipelines (e.g., `Recognizer2D`), the model structure +can be customized through config files with no pains. + +If we want to implement some new components, e.g., the temporal shift backbone structure as +in [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383), there are several things to do. + +1. create a new file in `mmaction/models/backbones/resnet_tsm.py`. + + ```python + from ..builder import BACKBONES + from .resnet import ResNet + + @BACKBONES.register_module() + class ResNetTSM(ResNet): + + def __init__(self, + depth, + num_segments=8, + is_shift=True, + shift_div=8, + shift_place='blockres', + temporal_pool=False, + **kwargs): + pass + + def forward(self, x): + # implementation is ignored + pass + ``` + +2. Import the module in `mmaction/models/backbones/__init__.py` + + ```python + from .resnet_tsm import ResNetTSM + ``` + +3. modify the config file from + + ```python + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False) + ``` + + to + + ```python + backbone=dict( + type='ResNetTSM', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + shift_div=8) + ``` + +### Write a new model + +To write a new recognition pipeline, you need to inherit from `BaseRecognizer`, +which defines the following abstract methods. 
+ +- `forward_train()`: forward method of the training mode. +- `forward_test()`: forward method of the testing mode. + +[Recognizer2D](/mmaction/models/recognizers/recognizer2d.py) and [Recognizer3D](/mmaction/models/recognizers/recognizer3d.py) +are good examples which show how to do that. + +## Train a Model + +### Iteration pipeline + +MMAction2 implements distributed training and non-distributed training, +which use `MMDistributedDataParallel` and `MMDataParallel` respectively. + +We adopt distributed training for both single machine and multiple machines. +Supposing that the server has 8 GPUs, 8 processes will be started and each process runs on a single GPU. + +Each process keeps an isolated model, data loader, and optimizer. +Model parameters are only synchronized once at the beginning. +After a forward and backward pass, gradients will be allreduced among all GPUs, +and the optimizer will update model parameters. +Since the gradients are allreduced, the model parameters stay the same for all processes after the iteration. + +### Training setting + +All outputs (log files and checkpoints) will be saved to the working directory, +which is specified by `work_dir` in the config file. + +By default, we evaluate the model on the validation set after each epoch. You can change the evaluation interval by modifying the `interval` argument in the training config. + +```python +evaluation = dict(interval=5) # This evaluates the model every 5 epochs. +``` + +According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu. + +MMAction2 also supports training with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
+To train with CPU, one should first disable all GPUs (if any exist) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the training scripts directly with `python tools/train.py {OTHER_ARGS}`. + +### Train with a single GPU + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`. + +### Train with multiple GPUs + +```shell +./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments] +``` + +Optional arguments are: + +- `--validate` (**strongly recommended**): Perform evaluation every k epochs during training (the default value of k is 5, which can be modified by changing the `interval` value in the `evaluation` dict in each config file). +- `--test-last`: Test the final checkpoint when training is over and save the prediction to `${WORK_DIR}/last_pred.pkl`. +- `--test-best`: Test the best checkpoint when training is over and save the prediction to `${WORK_DIR}/best_pred.pkl`. +- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file. +- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file. +- `--gpus ${GPU_NUM}`: Number of GPUs to use, which is only applicable to non-distributed training. +- `--gpu-ids ${GPU_IDS}`: IDs of GPUs to use, which is only applicable to non-distributed training. +- `--seed ${SEED}`: Seed for the random state in python, numpy and pytorch. +- `--deterministic`: If specified, it will set deterministic options for the CUDNN backend. +- `JOB_LAUNCHER`: Items for distributed job initialization launcher. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. In particular, if set to `none`, it will train in a non-distributed mode. +- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
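When you change `${GPU_NUM}` or the number of videos per GPU, the Linear Scaling Rule described under "Training setting" suggests rescaling the learning rate with the total batch size. The following is a minimal sketch of that arithmetic only; `scale_lr` is a hypothetical helper written for illustration and is not part of MMAction2.

```python
def scale_lr(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Scale a learning rate linearly with the total batch size.

    Hypothetical helper for illustration; not an MMAction2 API.
    """
    base_batch = base_gpus * base_videos_per_gpu
    new_batch = gpus * videos_per_gpu
    return base_lr * new_batch / base_batch


# Reference setting from the text: lr=0.01 for 4 GPUs x 2 videos per GPU.
lr = scale_lr(0.01, 4, 2, gpus=16, videos_per_gpu=4)
# 16 x 4 = 64 videos is 8 times the reference batch of 8 videos.
print(f'{lr:.2f}')
```

The resulting value would then be written into the optimizer settings of your config file by hand.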
+ +Difference between `resume-from` and `load-from`: +`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally. +`load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning. + +Here is an example of using 8 GPUs to load TSN checkpoint. + +```shell +./tools/dist_train.sh configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py 8 --resume-from work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth +``` + +### Train with multiple machines + +If you can run MMAction2 on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.) + +```shell +[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [--work-dir ${WORK_DIR}] +``` + +Here is an example of using 16 GPUs to train TSN on the dev partition in a slurm cluster. (use `GPUS_PER_NODE=8` to specify a single slurm cluster node with 8 GPUs.) + +```shell +GPUS=16 ./tools/slurm_train.sh dev tsn_r50_k400 configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb +``` + +You can check [slurm_train.sh](/tools/slurm_train.sh) for full arguments and environment variables. + +If you have just multiple machines connected with ethernet, you can simply run the following commands: + +On the first machine: + +```shell +NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS +``` + +On the second machine: + +```shell +NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS +``` + +It can be extremely slow if you do not have high-speed networking like InfiniBand. 
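The per-machine commands above differ only in `NODE_RANK`. As a rough sketch, they could be generated for any number of machines as shown below; `dist_train_commands` is a hypothetical helper, and the address `10.0.0.1` and port `29500` are placeholder values chosen for illustration.

```python
def dist_train_commands(config, gpus_per_node, nnodes, master_addr, master_port):
    """Return one launch command per machine, indexed by NODE_RANK.

    Hypothetical helper for illustration; it only formats strings that
    follow the command pattern shown above and does not launch anything.
    """
    return [
        f'NNODES={nnodes} NODE_RANK={rank} PORT={master_port} '
        f'MASTER_ADDR={master_addr} sh tools/dist_train.sh {config} {gpus_per_node}'
        for rank in range(nnodes)
    ]


# Placeholder address and port; replace them with the values of your first machine.
cmds = dist_train_commands(
    'configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py',
    gpus_per_node=8,
    nnodes=2,
    master_addr='10.0.0.1',
    master_port=29500)
for cmd in cmds:
    print(cmd)
```

Each printed line is meant to be run on the machine whose rank it names, with the same address and port on every machine.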
+ +### Launch multiple jobs on a single machine + +If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, +you need to specify different ports (29500 by default) for each job to avoid communication conflict. + +If you use `dist_train.sh` to launch training jobs, you can set the port in commands. + +```shell +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4 +CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4 +``` + +If you launch training jobs with slurm, you need to modify `dist_params` in the config files (usually the 6th line from the bottom in config files) to set different communication ports. + +In `config1.py`, + +```python +dist_params = dict(backend='nccl', port=29500) +``` + +In `config2.py`, + +```python +dist_params = dict(backend='nccl', port=29501) +``` + +Then you can launch two jobs with `config1.py` and `config2.py`. + +```shell +CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py [--work-dir ${WORK_DIR}] +CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py [--work-dir ${WORK_DIR}] +``` + +## Tutorials + +Currently, we provide some tutorials for users to [learn about configs](tutorials/1_config.md), [finetune model](tutorials/2_finetune.md), +[add new dataset](tutorials/3_new_dataset.md), [customize data pipelines](tutorials/4_data_pipeline.md), +[add new modules](tutorials/5_new_modules.md), [export a model to ONNX](tutorials/6_export_model.md) and [customize runtime settings](tutorials/7_customize_runtime.md). diff --git a/openmmlab_test/mmaction2-0.24.1/docs/index.rst b/openmmlab_test/mmaction2-0.24.1/docs/index.rst new file mode 100644 index 0000000000000000000000000000000000000000..b64cb6ea471803b064afe169ea19c8d1138ac5e0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/index.rst @@ -0,0 +1,75 @@ +Welcome to MMAction2's documentation!
+===================================== + +You can switch between Chinese and English documents in the lower-left corner of the layout. + +您可以在页面左下角切换文档语言。 + +.. toctree:: + :maxdepth: 2 + + install.md + getting_started.md + demo.md + benchmark.md + +.. toctree:: + :maxdepth: 2 + :caption: Datasets + + datasets.md + data_preparation.md + supported_datasets.md + +.. toctree:: + :maxdepth: 2 + :caption: Model Zoo + + modelzoo.md + recognition_models.md + localization_models.md + detection_models.md + skeleton_models.md + +.. toctree:: + :maxdepth: 2 + :caption: Tutorials + + tutorials/1_config.md + tutorials/2_finetune.md + tutorials/3_new_dataset.md + tutorials/4_data_pipeline.md + tutorials/5_new_modules.md + tutorials/6_export_model.md + tutorials/7_customize_runtime.md + +.. toctree:: + :maxdepth: 2 + :caption: Useful Tools and Scripts + + useful_tools.md + +.. toctree:: + :maxdepth: 2 + :caption: Notes + + changelog.md + faq.md + +.. toctree:: + :caption: API Reference + + api.rst + +.. toctree:: + :caption: Switch Language + + switch_language.md + + + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`search` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/install.md b/openmmlab_test/mmaction2-0.24.1/docs/install.md new file mode 100644 index 0000000000000000000000000000000000000000..d1bde35709bd2bfb202c358cbe2db5a4a8be0e7e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/install.md @@ -0,0 +1,255 @@ +# Installation + +We provide some tips for MMAction2 installation in this file. 
+ + + +- [Installation](#installation) + - [Requirements](#requirements) + - [Prepare environment](#prepare-environment) + - [Install MMAction2](#install-mmaction2) + - [Install with CPU only](#install-with-cpu-only) + - [Another option: Docker Image](#another-option-docker-image) + - [A from-scratch setup script](#a-from-scratch-setup-script) + - [Developing with multiple MMAction2 versions](#developing-with-multiple-mmaction2-versions) + - [Verification](#verification) + + + +## Requirements + +- Linux, Windows (We can successfully install mmaction2 on Windows and run inference, but we haven't tried training yet) +- Python 3.6+ +- PyTorch 1.3+ +- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible) +- GCC 5+ +- [mmcv](https://github.com/open-mmlab/mmcv) 1.1.1+ +- Numpy +- ffmpeg (4.2 is preferred) +- [decord](https://github.com/dmlc/decord) (optional, 0.4.1+): Install CPU version by `pip install decord==0.4.1` and install GPU version from source +- [PyAV](https://github.com/mikeboers/PyAV) (optional): `conda install av -c conda-forge -y` +- [PyTurboJPEG](https://github.com/lilohuang/PyTurboJPEG) (optional): `pip install PyTurboJPEG` +- [denseflow](https://github.com/open-mmlab/denseflow) (optional): See [here](https://github.com/innerlee/setup) for simple install scripts. +- [moviepy](https://zulko.github.io/moviepy/) (optional): `pip install moviepy`. See [here](https://zulko.github.io/moviepy/install.html) for official installation. **Note**(according to [this issue](https://github.com/Zulko/moviepy/issues/693)) that: + 1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) will not be automatically detected by MoviePy, + there is a need to modify `moviepy/config_defaults.py` file by providing the path to the ImageMagick binary called `magick`, like `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"` + 2. 
For Linux users, there is a need to modify the `/etc/ImageMagick-6/policy.xml` file by commenting out + `` to ``, if [ImageMagick](https://www.imagemagick.org/script/index.php) is not detected by `moviepy`. +- [Pillow-SIMD](https://docs.fast.ai/performance.html#pillow-simd) (optional): Install it by the following scripts. + +```shell +conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo +pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo +conda install -yc conda-forge libjpeg-turbo +CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd +conda install -y jpeg libtiff +``` + +:::{note} +You need to run `pip uninstall mmcv` first if you have mmcv installed. +If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`. +::: + +## Prepare environment + +a. Create a conda virtual environment and activate it. + +```shell +conda create -n open-mmlab python=3.7 -y +conda activate open-mmlab +``` + +b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g., + +```shell +conda install pytorch torchvision -c pytorch +``` + +:::{note} +Make sure that your compilation CUDA version and runtime CUDA version match. +You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/). + +`E.g.1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install PyTorch 1.5, +you need to install the prebuilt PyTorch with CUDA 10.1. + +```shell +conda install pytorch cudatoolkit=10.1 torchvision -c pytorch +``` + +`E.g.2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.3.1., +you need to install the prebuilt PyTorch with CUDA 9.2. + +```shell +conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch +``` + +If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0. 
+::: + +## Install MMAction2 + +We recommend you to install MMAction2 with [MIM](https://github.com/open-mmlab/mim). + +```shell +pip install git+https://github.com/open-mmlab/mim.git +mim install mmaction2 -f https://github.com/open-mmlab/mmaction2.git +``` + +MIM can automatically install OpenMMLab projects and their requirements. + +Or, you can install MMAction2 manually: + +a. Install mmcv-full, we recommend you to install the pre-built package as below. + +```shell +# pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html +pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html +``` + +mmcv-full is only compiled on PyTorch 1.x.0 because the compatibility usually holds between 1.x.0 and 1.x.1. If your PyTorch version is 1.x.1, you can install mmcv-full compiled with PyTorch 1.x.0 and it usually works well. + +``` +# We can ignore the micro version of PyTorch +pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10/index.html +``` + +See [here](https://github.com/open-mmlab/mmcv#installation) for different versions of MMCV compatible to different PyTorch and CUDA versions. + +Optionally you can choose to compile mmcv from source by the following command + +```shell +git clone https://github.com/open-mmlab/mmcv.git +cd mmcv +MMCV_WITH_OPS=1 pip install -e . # package mmcv-full, which contains cuda ops, will be installed after this step +# OR pip install -e . # package mmcv, which contains no cuda ops, will be installed after this step +cd .. +``` + +Or directly run + +```shell +pip install mmcv-full +# alternative: pip install mmcv +``` + +**Important:** You need to run `pip uninstall mmcv` first if you have mmcv installed. If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`. + +b. Clone the MMAction2 repository. + +```shell +git clone https://github.com/open-mmlab/mmaction2.git +cd mmaction2 +``` + +c. 
Install build requirements and then install MMAction2. + +```shell +pip install -r requirements/build.txt +pip install -v -e . # or "python setup.py develop" +``` + +If you build MMAction2 on macOS, replace the last command with + +```shell +CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' pip install -e . +``` + +d. Install mmdetection for spatial temporal detection tasks. + +This part is **optional** if you're not going to do spatial temporal detection. + +See [here](https://github.com/open-mmlab/mmdetection#installation) to install mmdetection. + +:::{note} + +1. The git commit id will be written to the version number with step b, e.g. 0.6.0+2e7045c. The version will also be saved in trained models. + It is recommended that you run step b each time you pull some updates from github. If C++/CUDA codes are modified, then this step is compulsory. + +2. Following the above instructions, MMAction2 is installed on `dev` mode, any local modifications made to the code will take effect without the need to reinstall it (unless you submit some commits and want to update the version number). + +3. If you would like to use `opencv-python-headless` instead of `opencv-python`, + you can install it before installing MMCV. + +4. If you would like to use `PyAV`, you can install it with `conda install av -c conda-forge -y`. + +5. Some dependencies are optional. Running `python setup.py develop` will only install the minimum runtime requirements. + To use optional dependencies like `decord`, either install them with `pip install -r requirements/optional.txt` + or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`, + valid keys for the `[optional]` field are `all`, `tests`, `build`, and `optional`) like `pip install -v -e .[tests,build]`. + +::: + +## Install with CPU only + +The code can be built for CPU only environment (where CUDA isn't available). + +In CPU mode you can run the demo/demo.py for example. 
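For scripts that should run on both GPU and CPU-only machines, one possible approach is to pick the device at runtime. This is a minimal sketch, assuming PyTorch may or may not be installed; `pick_device` is a hypothetical helper and not part of MMAction2.

```python
def pick_device():
    """Return 'cuda:0' when a CUDA device is usable, otherwise 'cpu'.

    Hypothetical helper for illustration; it falls back to 'cpu' when
    PyTorch is not installed or when CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return 'cpu'
    return 'cuda:0' if torch.cuda.is_available() else 'cpu'


device = pick_device()
print(f'Selected device: {device}')
```

The returned string can be used wherever the docs assign `device = 'cuda:0'  # or 'cpu'`.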
+ +## Another option: Docker Image + +We provide a [Dockerfile](/docker/Dockerfile) to build an image. + +```shell +# build an image with PyTorch 1.6.0, CUDA 10.1, CUDNN 7. +docker build -f ./docker/Dockerfile --rm -t mmaction2 . +``` + +**Important:** Make sure you've installed the [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker). + +Run it with command: + +```shell +docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmaction2/data mmaction2 +``` + +## A from-scratch setup script + +Here is a full script for setting up MMAction2 with conda and link the dataset path (supposing that your Kinetics-400 dataset path is $KINETICS400_ROOT). + +```shell +conda create -n open-mmlab python=3.7 -y +conda activate open-mmlab + +# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest) +conda install -c pytorch pytorch torchvision -y + +# install the latest mmcv or mmcv-full, here we take mmcv as example +pip install mmcv + +# install mmaction2 +git clone https://github.com/open-mmlab/mmaction2.git +cd mmaction2 +pip install -r requirements/build.txt +python setup.py develop + +mkdir data +ln -s $KINETICS400_ROOT data +``` + +## Developing with multiple MMAction2 versions + +The train and test scripts already modify the `PYTHONPATH` to ensure the script use the MMAction2 in the current directory. + +To use the default MMAction2 installed in the environment rather than that you are working with, you can remove the following line in those scripts. 
+ +```shell +PYTHONPATH="$(dirname $0)/..":$PYTHONPATH +``` + +## Verification + +To verify whether MMAction2 and the required environment are installed correctly, +we can run sample python codes to initialize a recognizer and inference a demo video: + +```python +import torch +from mmaction.apis import init_recognizer, inference_recognizer + +config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py' +device = 'cuda:0' # or 'cpu' +device = torch.device(device) + +model = init_recognizer(config_file, device=device) +# inference the demo video +inference_recognizer(model, 'demo/demo.mp4') +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/make.bat b/openmmlab_test/mmaction2-0.24.1/docs/make.bat new file mode 100644 index 0000000000000000000000000000000000000000..922152e96a04a242e6fc40f124261d74890617d8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=. +set BUILDDIR=_build + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. 
+ echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/openmmlab_test/mmaction2-0.24.1/docs/merge_docs.sh b/openmmlab_test/mmaction2-0.24.1/docs/merge_docs.sh new file mode 100644 index 0000000000000000000000000000000000000000..b38366f616de8243009b04349f16610eb7546b69 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/merge_docs.sh @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +sed -i '$a\\n' ../demo/README.md + +# gather models +cat ../configs/localization/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Action Localization Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > localization_models.md +cat ../configs/recognition/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Action Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > recognition_models.md +cat ../configs/recognition_audio/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" >> recognition_models.md +cat ../configs/detection/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Spatio Temporal Action Detection Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > detection_models.md +cat ../configs/skeleton/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Skeleton-based Action Recognition Models' | sed 's/](\/docs\//](/g' | sed 
's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > skeleton_models.md + + +# demo +cat ../demo/README.md | sed "s/md#t/html#t/g" | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > demo.md + +# gather datasets +cat ../tools/data/*/README.md | sed 's/# Preparing/# /g' | sed 's/#/#&/' > prepare_data.md + +sed -i 's/(\/tools\/data\/activitynet\/README.md/(#activitynet/g' supported_datasets.md +sed -i 's/(\/tools\/data\/kinetics\/README.md/(#kinetics-400-600-700/g' supported_datasets.md +sed -i 's/(\/tools\/data\/mit\/README.md/(#moments-in-time/g' supported_datasets.md +sed -i 's/(\/tools\/data\/mmit\/README.md/(#multi-moments-in-time/g' supported_datasets.md +sed -i 's/(\/tools\/data\/sthv1\/README.md/(#something-something-v1/g' supported_datasets.md +sed -i 's/(\/tools\/data\/sthv2\/README.md/(#something-something-v2/g' supported_datasets.md +sed -i 's/(\/tools\/data\/thumos14\/README.md/(#thumos-14/g' supported_datasets.md +sed -i 's/(\/tools\/data\/ucf101\/README.md/(#ucf-101/g' supported_datasets.md +sed -i 's/(\/tools\/data\/ucf101_24\/README.md/(#ucf101-24/g' supported_datasets.md +sed -i 's/(\/tools\/data\/jhmdb\/README.md/(#jhmdb/g' supported_datasets.md +sed -i 's/(\/tools\/data\/hvu\/README.md/(#hvu/g' supported_datasets.md +sed -i 's/(\/tools\/data\/hmdb51\/README.md/(#hmdb51/g' supported_datasets.md +sed -i 's/(\/tools\/data\/jester\/README.md/(#jester/g' supported_datasets.md +sed -i 's/(\/tools\/data\/ava\/README.md/(#ava/g' supported_datasets.md +sed -i 's/(\/tools\/data\/gym\/README.md/(#gym/g' supported_datasets.md +sed -i 's/(\/tools\/data\/omnisource\/README.md/(#omnisource/g' supported_datasets.md +sed -i 's/(\/tools\/data\/diving48\/README.md/(#diving48/g' supported_datasets.md +sed -i 's/(\/tools\/data\/skeleton\/README.md/(#skeleton/g' supported_datasets.md + + +cat prepare_data.md >> 
supported_datasets.md +sed -i 's/](\/docs\//](/g' supported_datasets.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' supported_datasets.md + +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' benchmark.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' getting_started.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' install.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' changelog.md +sed -i 's/](\/docs\//](/g' ./tutorials/*.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' ./tutorials/*.md diff --git a/openmmlab_test/mmaction2-0.24.1/docs/projects.md b/openmmlab_test/mmaction2-0.24.1/docs/projects.md new file mode 100644 index 0000000000000000000000000000000000000000..01a68643394f3a35f290417f9ae4cc81865ec2cd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/projects.md @@ -0,0 +1,23 @@ +# Projects based on MMAction2 + +There are many research works and projects built on MMAction2. +We list some of them as examples of how to extend MMAction2 for your own projects. +As the page might not be completed, please feel free to create a PR to update this page. + +## Projects as an extension + +- [OTEAction2](https://github.com/openvinotoolkit/mmaction2): OpenVINO Training Extensions for Action Recognition. + +## Projects of papers + +There are also projects released with papers. +Some of the papers are published in top-tier conferences (CVPR, ICCV, and ECCV), the others are also highly influential. +To make this list also a reference for the community to develop and compare new video understanding algorithms, we list them following the time order of top-tier conferences. +Methods already supported and maintained by MMAction2 are not listed. + +- Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 Oral. 
[\[paper\]](https://arxiv.org/abs/2107.10161)[\[github\]](https://github.com/Cogito2012/DEAR) +- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2103.17263)[\[github\]](https://github.com/xvjiarui/VFS) +- MGSampler: An Explainable Sampling Strategy for Video Action Recognition, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2104.09952)[\[github\]](https://github.com/MCG-NJU/MGSampler) +- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2105.07404) +- Video Swin Transformer. [\[paper\]](https://arxiv.org/abs/2106.13230)[\[github\]](https://github.com/SwinTransformer/Video-Swin-Transformer) +- Long Short-Term Transformer for Online Action Detection. [\[paper\]](https://arxiv.org/abs/2107.03377) diff --git a/openmmlab_test/mmaction2-0.24.1/docs/stat.py b/openmmlab_test/mmaction2-0.24.1/docs/stat.py new file mode 100644 index 0000000000000000000000000000000000000000..53e64004e256f21e0445883e258e66a8bc2e140a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/stat.py @@ -0,0 +1,174 @@ +#!/usr/bin/env python +# Copyright (c) OpenMMLab. All rights reserved. 
+import functools as func +import glob +import re +from os.path import basename, splitext + +import numpy as np +import titlecase + + +def anchor(name): + return re.sub(r'-+', '-', re.sub(r'[^a-zA-Z0-9]', '-', + name.strip().lower())).strip('-') + + +# Count algorithms + +files = sorted(glob.glob('*_models.md')) +# files = sorted(glob.glob('docs/*_models.md')) + +stats = [] + +for f in files: + with open(f, 'r') as content_file: + content = content_file.read() + + # title + title = content.split('\n')[0].replace('#', '') + + # skip IMAGE and ABSTRACT tags + content = [ + x for x in content.split('\n') + if 'IMAGE' not in x and 'ABSTRACT' not in x + ] + content = '\n'.join(content) + + # count papers + papers = set( + (papertype, titlecase.titlecase(paper.lower().strip())) + for (papertype, paper) in re.findall( + r'\s*\n.*?\btitle\s*=\s*{(.*?)}', + content, re.DOTALL)) + # paper links + revcontent = '\n'.join(list(reversed(content.splitlines()))) + paperlinks = {} + for _, p in papers: + print(p) + q = p.replace('\\', '\\\\').replace('?', '\\?') + paperlinks[p] = ' '.join( + (f'[->]({splitext(basename(f))[0]}.html#{anchor(paperlink)})' + for paperlink in re.findall( + rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n', + revcontent, re.DOTALL | re.IGNORECASE))) + print(' ', paperlinks[p]) + paperlist = '\n'.join( + sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers)) + # count configs + configs = set(x.lower().strip() + for x in re.findall(r'https.*configs/.*\.py', content)) + + # count ckpts + ckpts = set(x.lower().strip() + for x in re.findall(r'https://download.*\.pth', content) + if 'mmaction' in x) + + statsmsg = f""" +## [{title}]({f}) + +* Number of checkpoints: {len(ckpts)} +* Number of configs: {len(configs)} +* Number of papers: {len(papers)} +{paperlist} + + """ + + stats.append((papers, configs, ckpts, statsmsg)) + +allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _, _ in stats]) +allconfigs = func.reduce(lambda a, b: 
a.union(b), [c for _, c, _, _ in stats]) +allckpts = func.reduce(lambda a, b: a.union(b), [c for _, _, c, _ in stats]) +msglist = '\n'.join(x for _, _, _, x in stats) + +papertypes, papercounts = np.unique([t for t, _ in allpapers], + return_counts=True) +countstr = '\n'.join( + [f' - {t}: {c}' for t, c in zip(papertypes, papercounts)]) + +modelzoo = f""" +# Overview + +* Number of checkpoints: {len(allckpts)} +* Number of configs: {len(allconfigs)} +* Number of papers: {len(allpapers)} +{countstr} + +For supported datasets, see [datasets overview](datasets.md). + +{msglist} +""" + +with open('modelzoo.md', 'w') as f: + f.write(modelzoo) + +# Count datasets + +files = ['supported_datasets.md'] +# files = sorted(glob.glob('docs/tasks/*.md')) + +datastats = [] + +for f in files: + with open(f, 'r') as content_file: + content = content_file.read() + + # title + title = content.split('\n')[0].replace('#', '') + + # count papers + papers = set( + (papertype, titlecase.titlecase(paper.lower().strip())) + for (papertype, paper) in re.findall( + r'\s*\n.*?\btitle\s*=\s*{(.*?)}', + content, re.DOTALL)) + # paper links + revcontent = '\n'.join(list(reversed(content.splitlines()))) + paperlinks = {} + for _, p in papers: + print(p) + q = p.replace('\\', '\\\\').replace('?', '\\?') + paperlinks[p] = ', '.join( + (f'[{p.strip()} ->]({splitext(basename(f))[0]}.html#{anchor(p)})' + for p in re.findall( + rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n', + revcontent, re.DOTALL | re.IGNORECASE))) + print(' ', paperlinks[p]) + paperlist = '\n'.join( + sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers)) + + statsmsg = f""" +## [{title}]({f}) + +* Number of papers: {len(papers)} +{paperlist} + + """ + + datastats.append((papers, configs, ckpts, statsmsg)) + +alldatapapers = func.reduce(lambda a, b: a.union(b), + [p for p, _, _, _ in datastats]) + +# Summarize + +msglist = '\n'.join(x for _, _, _, x in stats) +datamsglist = '\n'.join(x for _, _, _, x in datastats) 
+papertypes, papercounts = np.unique([t for t, _ in alldatapapers], + return_counts=True) +countstr = '\n'.join( + [f' - {t}: {c}' for t, c in zip(papertypes, papercounts)]) + +modelzoo = f""" +# Overview + +* Number of papers: {len(alldatapapers)} +{countstr} + +For supported action algorithms, see [modelzoo overview](modelzoo.md). + +{datamsglist} +""" + +with open('datasets.md', 'w') as f: + f.write(modelzoo) diff --git a/openmmlab_test/mmaction2-0.24.1/docs/supported_datasets.md b/openmmlab_test/mmaction2-0.24.1/docs/supported_datasets.md new file mode 100644 index 0000000000000000000000000000000000000000..8a4403df0d77528ce704baf561df4cbacf483c66 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/supported_datasets.md @@ -0,0 +1,36 @@ +# Supported Datasets + +- Action Recognition + + - [UCF101](/tools/data/ucf101/README.md) \[ [Homepage](https://www.crcv.ucf.edu/research/data-sets/ucf101/) \]. + - [HMDB51](/tools/data/hmdb51/README.md) \[ [Homepage](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) \]. 
+ - [Kinetics-\[400/600/700\]](/tools/data/kinetics/README.md) \[ [Homepage](https://deepmind.com/research/open-source/kinetics) \] + - [Something-Something V1](/tools/data/sthv1/README.md) \[ [Homepage](https://20bn.com/datasets/something-something/v1) \] + - [Something-Something V2](/tools/data/sthv2/README.md) \[ [Homepage](https://20bn.com/datasets/something-something) \] + - [Moments in Time](/tools/data/mit/README.md) \[ [Homepage](http://moments.csail.mit.edu/) \] + - [Multi-Moments in Time](/tools/data/mmit/README.md) \[ [Homepage](http://moments.csail.mit.edu/challenge_iccv_2019.html) \] + - [HVU](/tools/data/hvu/README.md) \[ [Homepage](https://github.com/holistic-video-understanding/HVU-Dataset) \] + - [Jester](/tools/data/jester/README.md) \[ [Homepage](https://20bn.com/datasets/jester/v1) \] + - [GYM](/tools/data/gym/README.md) \[ [Homepage](https://sdolivia.github.io/FineGym/) \] + - [ActivityNet](/tools/data/activitynet/README.md) \[ [Homepage](http://activity-net.org/) \] + - [Diving48](/tools/data/diving48/README.md) \[ [Homepage](http://www.svcl.ucsd.edu/projects/resound/dataset.html) \] + - [OmniSource](/tools/data/omnisource/README.md) \[ [Homepage](https://kennymckormick.github.io/omnisource/) \] + +- Temporal Action Detection + + - [ActivityNet](/tools/data/activitynet/README.md) \[ [Homepage](http://activity-net.org/) \] + - [THUMOS14](/tools/data/thumos14/README.md) \[ [Homepage](https://www.crcv.ucf.edu/THUMOS14/download.html) \] + +- Spatial Temporal Action Detection + + - [AVA](/tools/data/ava/README.md) \[ [Homepage](https://research.google.com/ava/index.html) \] + - [UCF101-24](/tools/data/ucf101_24/README.md) \[ [Homepage](http://www.thumos.info/download.html) \] + - [JHMDB](/tools/data/jhmdb/README.md) \[ [Homepage](http://jhmdb.is.tue.mpg.de/) \] + +- Skeleton-based Action Recognition + + - [PoseC3D Skeleton Dataset](/tools/data/skeleton/README.md) \[ [Homepage](https://kennymckormick.github.io/posec3d/) \] + +The supported datasets 
are listed above. +We provide shell scripts for data preparation under the path `$MMACTION2/tools/data/`. +Below are the detailed tutorials on data deployment for each dataset. diff --git a/openmmlab_test/mmaction2-0.24.1/docs/switch_language.md b/openmmlab_test/mmaction2-0.24.1/docs/switch_language.md new file mode 100644 index 0000000000000000000000000000000000000000..4bade2237f4cd26b1999da90baafef3543b333cf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/switch_language.md @@ -0,0 +1,3 @@ +## English + +## 简体中文 diff --git a/openmmlab_test/mmaction2-0.24.1/docs/tutorials/1_config.md b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/1_config.md new file mode 100644 index 0000000000000000000000000000000000000000..617c71330a2edd62fe41fb6cc94f1636314cc4b0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/1_config.md @@ -0,0 +1,757 @@ +# Tutorial 1: Learn about Configs + +We use Python files as configs and incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments. +You can find all the provided configs under `$MMAction2/configs`. If you wish to inspect a config file, +you may run `python tools/analysis/print_config.py /PATH/TO/CONFIG` to see the complete config.
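
To make the inheritance idea concrete, here is a minimal sketch of how a child config could override a few fields of a base config. It assumes a plain recursive dict merge; `merge_config`, `base_cfg`, and `child_cfg` are hypothetical names used only for illustration, and this is not mmcv's actual implementation, which also resolves `_base_` files and has additional merge rules.

```python
import copy


def merge_config(base, override):
    """Return a new dict with `override` recursively merged onto `base`.

    Illustrative only: nested dicts are merged key by key, any other
    value in `override` simply replaces the one in `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base and child configs, not taken from the repository.
base_cfg = dict(
    model=dict(type='Recognizer2D', backbone=dict(type='ResNet', depth=50)),
    total_epochs=100)
child_cfg = dict(model=dict(backbone=dict(depth=101)), total_epochs=50)

cfg = merge_config(base_cfg, child_cfg)
print(cfg['model']['backbone'])
```

Only the fields named in the child config change; everything else is carried over from the base, which is why a child config file can stay short.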
+ + + +- [Modify config through script arguments](#modify-config-through-script-arguments) +- [Config File Structure](#config-file-structure) +- [Config File Naming Convention](#config-file-naming-convention) + - [Config System for Action localization](#config-system-for-action-localization) + - [Config System for Action Recognition](#config-system-for-action-recognition) + - [Config System for Spatio-Temporal Action Detection](#config-system-for-spatio-temporal-action-detection) +- [FAQ](#faq) + - [Use intermediate variables in configs](#use-intermediate-variables-in-configs) + + + +## Modify config through script arguments + +When submitting jobs using "tools/train.py" or "tools/test.py", you may specify `--cfg-options` to modify the config in place. + +- Update config keys of dict. + + The config options can be specified following the order of the dict keys in the original config. + For example, `--cfg-options model.backbone.norm_eval=False` changes all the BN modules in model backbones to `train` mode. + +- Update keys inside a list of configs. + + Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list, + e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline, + you may specify `--cfg-options data.train.pipeline.0.type=DenseSampleFrames`. + +- Update values of list/tuples. + + The value to be updated may be a list or a tuple. For example, the config file normally sets `workflow=[('train', 1)]`. If you want to + change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark " is necessary to + support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value. + +## Config File Structure + +There are 3 basic component types under `configs/_base_`: model, schedule, and default_runtime.
+Many methods, such as TSN, I3D, and SlowOnly, can be easily constructed with one of each. +The configs that are composed of components from `_base_` are called _primitive_. + +For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3. + +For easy understanding, we recommend that contributors inherit from existing methods. +For example, if some modification is made based on TSN, users may first inherit the basic TSN structure by specifying `_base_ = '../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py'`, then modify the necessary fields in the config files. + +If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder under `configs/TASK`. + +Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/understand_mmcv/config.html) for detailed documentation. + +## Config File Naming Convention + +We follow the style below to name config files. Contributors are advised to follow the same style. + +``` +{model}_[model setting]_{backbone}_[misc]_{data setting}_[gpu x batch_per_gpu]_{schedule}_{dataset}_{modality} +``` + +`{xxx}` is a required field and `[yyy]` is optional. + +- `{model}`: model type, e.g. `tsn`, `i3d`, etc. +- `[model setting]`: specific setting for some models. +- `{backbone}`: backbone type, e.g. `r50` (ResNet-50), etc. +- `[misc]`: miscellaneous setting/plugins of model, e.g. `dense`, `320p`, `video`, etc. +- `{data setting}`: frame sample setting in `{clip_len}x{frame_interval}x{num_clips}` format. +- `[gpu x batch_per_gpu]`: GPUs and samples per GPU. +- `{schedule}`: training schedule, e.g. `20e` means 20 epochs. +- `{dataset}`: dataset name, e.g. `kinetics400`, `mmit`, etc. +- `{modality}`: frame modality, e.g. `rgb`, `flow`, etc.
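
As a rough illustration of the naming convention above, the sketch below splits a config name into its fields. `parse_config_name` is a hypothetical helper, not part of MMAction2. It assumes the name is underscore-separated, that the dataset and modality are the last two tokens and contain no underscores themselves, and it does not try to tell `[model setting]`, `{backbone}`, and `[misc]` apart (they are collected under `extra`).

```python
import re

# Patterns for the fields that have a recognizable shape.
DATA_SETTING = re.compile(r'^\d+x\d+x\d+$')  # {clip_len}x{frame_interval}x{num_clips}
GPU_SETTING = re.compile(r'^\d+x\d+$')       # [gpu x batch_per_gpu]
SCHEDULE = re.compile(r'^\d+e$')             # e.g. 20e means 20 epochs


def parse_config_name(name):
    """Split a config name into fields (illustrative, not exhaustive)."""
    tokens = name.split('_')
    if len(tokens) < 5:
        raise ValueError(f'unexpected config name: {name!r}')
    info = dict(
        model=tokens[0],
        dataset=tokens[-2],
        modality=tokens[-1],
        data_setting=None,
        gpu_setting=None,
        schedule=None,
        extra=[])  # model setting, backbone and misc tokens, left unclassified
    for token in tokens[1:-2]:
        if DATA_SETTING.match(token):
            info['data_setting'] = token
        elif GPU_SETTING.match(token):
            info['gpu_setting'] = token
        elif SCHEDULE.match(token):
            info['schedule'] = token
        else:
            info['extra'].append(token)
    return info


parsed = parse_config_name('tsn_r50_1x1x3_100e_kinetics400_rgb')
print(parsed)
```

Dataset names that contain an underscore would break the positional assumption, so a real tool would need a list of known dataset names instead.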
+ +### Config System for Action localization + +We incorporate modular design into our config system, +which is convenient to conduct various experiments. + +- An Example of BMN + + To help the users have a basic idea of a complete config structure and the modules in an action localization system, + we make brief comments on the config of BMN as follows. + For more detailed usage and the alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html). + + ```python + # model settings + model = dict( # Config of the model + type='BMN', # Type of the localizer + temporal_dim=100, # Total frames selected for each video + boundary_ratio=0.5, # Ratio for determining video boundaries + num_samples=32, # Number of samples for each proposal + num_samples_per_bin=3, # Number of bin samples for each sample + feat_dim=400, # Dimension of feature + soft_nms_alpha=0.4, # Soft NMS alpha + soft_nms_low_threshold=0.5, # Soft NMS low threshold + soft_nms_high_threshold=0.9, # Soft NMS high threshold + post_process_top_k=100) # Top k proposals in post process + # model training and testing settings + train_cfg = None # Config of training hyperparameters for BMN + test_cfg = dict(average_clips='score') # Config for testing hyperparameters for BMN + + # dataset settings + dataset_type = 'ActivityNetDataset' # Type of dataset for training, validation and testing + data_root = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for training + data_root_val = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for validation and testing + ann_file_train = 'data/ActivityNet/anet_anno_train.json' # Path to the annotation file for training + ann_file_val = 'data/ActivityNet/anet_anno_val.json' # Path to the annotation file for validation + ann_file_test = 'data/ActivityNet/anet_anno_test.json' # Path to the annotation file for testing + + train_pipeline = [ # List of training pipeline
steps + dict(type='LoadLocalizationFeature'), # Load localization feature pipeline + dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer + keys=['raw_feature', 'gt_bbox'], # Keys of input + meta_name='video_meta', # Meta name + meta_keys=['video_name']), # Meta keys of input + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['raw_feature']), # Keys to be converted from image to tensor + dict( # Config of ToDataContainer + type='ToDataContainer', # Pipeline to convert the data to DataContainer + fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # Required fields to be converted with keys and attributes + ] + val_pipeline = [ # List of validation pipeline steps + dict(type='LoadLocalizationFeature'), # Load localization feature pipeline + dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer + keys=['raw_feature', 'gt_bbox'], # Keys of input + meta_name='video_meta', # Meta name + meta_keys=[ + 'video_name', 'duration_second', 'duration_frame', 'annotations', + 'feature_frame' + ]), # Meta keys of input + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['raw_feature']), # Keys to be converted from image to tensor + dict( # Config of ToDataContainer + type='ToDataContainer', # Pipeline to convert the data to DataContainer + fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # Required fields to be converted with keys and attributes + ] + test_pipeline = [ # List of testing pipeline steps + dict(type='LoadLocalizationFeature'), # Load localization feature pipeline + dict( # Config of Collect + type='Collect', # Collect pipeline that 
decides which keys in the data should be passed to the localizer + keys=['raw_feature'], # Keys of input + meta_name='video_meta', # Meta name + meta_keys=[ + 'video_name', 'duration_second', 'duration_frame', 'annotations', + 'feature_frame' + ]), # Meta keys of input + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['raw_feature']), # Keys to be converted from image to tensor + ] + data = dict( # Config of data + videos_per_gpu=8, # Batch size of each single GPU + workers_per_gpu=8, # Workers to pre-fetch data for each single GPU + train_dataloader=dict( # Additional config of train dataloader + drop_last=True), # Whether to drop out the last batch of data in training + val_dataloader=dict( # Additional config of validation dataloader + videos_per_gpu=1), # Batch size of each single GPU during evaluation + test_dataloader=dict( # Additional config of test dataloader + videos_per_gpu=2), # Batch size of each single GPU during testing + test=dict( # Testing dataset config + type=dataset_type, + ann_file=ann_file_test, + pipeline=test_pipeline, + data_prefix=data_root_val), + val=dict( # Validation dataset config + type=dataset_type, + ann_file=ann_file_val, + pipeline=val_pipeline, + data_prefix=data_root_val), + train=dict( # Training dataset config + type=dataset_type, + ann_file=ann_file_train, + pipeline=train_pipeline, + data_prefix=data_root)) + + # optimizer + optimizer = dict( + # Config used to build optimizer, support (1). All the optimizers in PyTorch + # whose arguments are also the same as those in PyTorch. (2). Custom optimizers + # which are built on `constructor`, referring to "tutorials/5_new_modules.md" + # for implementation. 
+ type='Adam', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details + lr=0.001, # Learning rate, see detail usages of the parameters in the documentation of PyTorch + weight_decay=0.0001) # Weight decay of Adam + optimizer_config = dict( # Config used to build the optimizer hook + grad_clip=None) # Most of the methods do not use gradient clip + # learning policy + lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook + policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9 + step=7) # Steps to decay the learning rate + + total_epochs = 9 # Total epochs to train the model + checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation + interval=1) # Interval to save checkpoint + evaluation = dict( # Config of evaluation during training + interval=1, # Interval to perform evaluation + metrics=['AR@AN']) # Metrics to be performed + log_config = dict( # Config to register logger hook + interval=50, # Interval to print the log + hooks=[ # Hooks to be implemented during training + dict(type='TextLoggerHook'), # The logger used to record the training process + # dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported + ]) + + # runtime settings + dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set + log_level = 'INFO' # The level of logging + work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/' # Directory to save the model checkpoints and logs for the current experiments + load_from = None # load models as a pre-trained model from a given path. 
This will not resume training + resume_from = None # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint was saved + workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once + output_config = dict( # Config of localization output + out=f'{work_dir}/results.json', # Path to output file + output_format='json') # File format of output file + ``` + +### Config System for Action Recognition + +We incorporate modular design into our config system, +which is convenient to conduct various experiments. + +- An Example of TSN + + To help the users have a basic idea of a complete config structure and the modules in an action recognition system, + we make brief comments on the config of TSN as follows. + For more detailed usage and the alternatives for each parameter in each module, please refer to the API documentation. + + ```python + # model settings + model = dict( # Config of the model + type='Recognizer2D', # Type of the recognizer + backbone=dict( # Dict for backbone + type='ResNet', # Name of the backbone + pretrained='torchvision://resnet50', # The url/site of the pretrained model + depth=50, # Depth of ResNet model + norm_eval=False), # Whether to set BN layers to eval mode when training + cls_head=dict( # Dict for classification head + type='TSNHead', # Name of classification head + num_classes=400, # Number of classes to be classified. + in_channels=2048, # The input channels of classification head. + spatial_type='avg', # Type of pooling in spatial dimension + consensus=dict(type='AvgConsensus', dim=1), # Config of consensus module + dropout_ratio=0.4, # Probability in dropout layer + init_std=0.01), # Std value for linear layer initialization + # model training and testing settings + train_cfg=None, # Config of training hyperparameters for TSN + test_cfg=dict(average_clips=None)) # Config for testing hyperparameters for TSN.
+ + # dataset settings + dataset_type = 'RawframeDataset' # Type of dataset for training, validation and testing + data_root = 'data/kinetics400/rawframes_train/' # Root path to data for training + data_root_val = 'data/kinetics400/rawframes_val/' # Root path to data for validation and testing + ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' # Path to the annotation file for training + ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for validation + ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for testing + img_norm_cfg = dict( # Config of image normalization used in data pipeline + mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize + std=[58.395, 57.12, 57.375], # Std values of different channels to normalize + to_bgr=False) # Whether to convert channels from RGB to BGR + + train_pipeline = [ # List of training pipeline steps + dict( # Config of SampleFrames + type='SampleFrames', # Sample frames pipeline, sampling frames from video + clip_len=1, # Frames of each sampled output clip + frame_interval=1, # Temporal interval of adjacent sampled frames + num_clips=3), # Number of clips to be sampled + dict( # Config of RawFrameDecode + type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices + dict( # Config of Resize + type='Resize', # Resize pipeline + scale=(-1, 256)), # The scale to resize images + dict( # Config of MultiScaleCrop + type='MultiScaleCrop', # Multi scale crop pipeline, cropping images with a list of randomly selected scales + input_size=224, # Input size of the network + scales=(1, 0.875, 0.75, 0.66), # Scales of width and height to be selected + random_crop=False, # Whether to randomly sample cropping bbox + max_wh_scale_gap=1), # Maximum gap of w and h scale levels + dict( # Config of Resize + type='Resize', # Resize pipeline + scale=(224, 224), # 
The scale to resize images + keep_ratio=False), # Whether to resize with changing the aspect ratio + dict( # Config of Flip + type='Flip', # Flip Pipeline + flip_ratio=0.5), # Probability of implementing flip + dict( # Config of Normalize + type='Normalize', # Normalize pipeline + **img_norm_cfg), # Config of image normalization + dict( # Config of FormatShape + type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format + input_format='NCHW'), # Final image shape format + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer + keys=['imgs', 'label'], # Keys of input + meta_keys=[]), # Meta keys of input + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['imgs', 'label']) # Keys to be converted from image to tensor + ] + val_pipeline = [ # List of validation pipeline steps + dict( # Config of SampleFrames + type='SampleFrames', # Sample frames pipeline, sampling frames from video + clip_len=1, # Frames of each sampled output clip + frame_interval=1, # Temporal interval of adjacent sampled frames + num_clips=3, # Number of clips to be sampled + test_mode=True), # Whether to set test mode in sampling + dict( # Config of RawFrameDecode + type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices + dict( # Config of Resize + type='Resize', # Resize pipeline + scale=(-1, 256)), # The scale to resize images + dict( # Config of CenterCrop + type='CenterCrop', # Center crop pipeline, cropping the center area from images + crop_size=224), # The size to crop images + dict( # Config of Flip + type='Flip', # Flip pipeline + flip_ratio=0), # Probability of implementing flip + dict( # Config of Normalize + type='Normalize', # Normalize pipeline + **img_norm_cfg), # Config of image normalization + dict( # Config of FormatShape + type='FormatShape', # Format shape pipeline, 
Format final image shape to the given input_format + input_format='NCHW'), # Final image shape format + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer + keys=['imgs', 'label'], # Keys of input + meta_keys=[]), # Meta keys of input + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['imgs']) # Keys to be converted from image to tensor + ] + test_pipeline = [ # List of testing pipeline steps + dict( # Config of SampleFrames + type='SampleFrames', # Sample frames pipeline, sampling frames from video + clip_len=1, # Frames of each sampled output clip + frame_interval=1, # Temporal interval of adjacent sampled frames + num_clips=25, # Number of clips to be sampled + test_mode=True), # Whether to set test mode in sampling + dict( # Config of RawFrameDecode + type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices + dict( # Config of Resize + type='Resize', # Resize pipeline + scale=(-1, 256)), # The scale to resize images + dict( # Config of TenCrop + type='TenCrop', # Ten crop pipeline, cropping ten area from images + crop_size=224), # The size to crop images + dict( # Config of Flip + type='Flip', # Flip pipeline + flip_ratio=0), # Probability of implementing flip + dict( # Config of Normalize + type='Normalize', # Normalize pipeline + **img_norm_cfg), # Config of image normalization + dict( # Config of FormatShape + type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format + input_format='NCHW'), # Final image shape format + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer + keys=['imgs', 'label'], # Keys of input + meta_keys=[]), # Meta keys of input + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['imgs']) # Keys to be 
converted from image to tensor + ] + data = dict( # Config of data + videos_per_gpu=32, # Batch size of each single GPU + workers_per_gpu=2, # Workers to pre-fetch data for each single GPU + train_dataloader=dict( # Additional config of train dataloader + drop_last=True), # Whether to drop out the last batch of data in training + val_dataloader=dict( # Additional config of validation dataloader + videos_per_gpu=1), # Batch size of each single GPU during evaluation + test_dataloader=dict( # Additional config of test dataloader + videos_per_gpu=2), # Batch size of each single GPU during testing + train=dict( # Training dataset config + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( # Validation dataset config + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( # Testing dataset config + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + # optimizer + optimizer = dict( + # Config used to build optimizer, support (1). All the optimizers in PyTorch + # whose arguments are also the same as those in PyTorch. (2). Custom optimizers + # which are built on `constructor`, referring to "tutorials/5_new_modules.md" + # for implementation. + type='SGD', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details + lr=0.01, # Learning rate, see detail usages of the parameters in the documentation of PyTorch + momentum=0.9, # Momentum, + weight_decay=0.0001) # Weight decay of SGD + optimizer_config = dict( # Config used to build the optimizer hook + grad_clip=dict(max_norm=40, norm_type=2)) # Use gradient clip + # learning policy + lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook + policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. 
Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9 + step=[40, 80]) # Steps to decay the learning rate + total_epochs = 100 # Total epochs to train the model + checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation + interval=5) # Interval to save checkpoint + evaluation = dict( # Config of evaluation during training + interval=5, # Interval to perform evaluation + metrics=['top_k_accuracy', 'mean_class_accuracy'], # Metrics to evaluate + metric_options=dict(top_k_accuracy=dict(topk=(1, 3))), # Compute top-1 and top-3 accuracy during validation + save_best='top_k_accuracy') # Use `top_k_accuracy` as the key indicator for saving the best checkpoint + eval_config = dict( + metric_options=dict(top_k_accuracy=dict(topk=(1, 3)))) # Compute top-1 and top-3 accuracy during testing. You can also use `--eval top_k_accuracy` to assign evaluation metrics + log_config = dict( # Config to register logger hook + interval=20, # Interval to print the log + hooks=[ # Hooks to be implemented during training + dict(type='TextLoggerHook'), # The logger used to record the training process + # dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported + ]) + + # runtime settings + dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set + log_level = 'INFO' # The level of logging + work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/' # Directory to save the model checkpoints and logs for the current experiments + load_from = None # Load a model as a pre-trained model from a given path. This will not resume training + resume_from = None # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint was saved + workflow = [('train', 1)] # Workflow for runner.
[('train', 1)] means there is only one workflow and the workflow named 'train' is executed once + + ``` + +### Config System for Spatio-Temporal Action Detection + +We incorporate modular design into our config system, which is convenient to conduct various experiments. + +- An Example of FastRCNN + + To help the users have a basic idea of a complete config structure and the modules in a spatio-temporal action detection system, + we make brief comments on the config of FastRCNN as follows. + For more detailed usage and the alternatives for each parameter in each module, please refer to the API documentation. + + ```python + # model setting + model = dict( # Config of the model + type='FastRCNN', # Type of the detector + backbone=dict( # Dict for backbone + type='ResNet3dSlowOnly', # Name of the backbone + depth=50, # Depth of ResNet model + pretrained=None, # The url/site of the pretrained model + pretrained2d=False, # If the pretrained model is 2D + lateral=False, # If the backbone is with lateral connections + num_stages=4, # Stages of ResNet model + conv1_kernel=(1, 7, 7), # Conv1 kernel size + conv1_stride_t=1, # Conv1 temporal stride + pool1_stride_t=1, # Pool1 temporal stride + spatial_strides=(1, 2, 2, 1)), # The spatial stride for each ResNet stage + roi_head=dict( # Dict for roi_head + type='AVARoIHead', # Name of the roi_head + bbox_roi_extractor=dict( # Dict for bbox_roi_extractor + type='SingleRoIExtractor3D', # Name of the bbox_roi_extractor + roi_layer_type='RoIAlign', # Type of the RoI op + output_size=8, # Output feature size of the RoI op + with_temporal_pool=True), # If temporal dim is pooled + bbox_head=dict( # Dict for bbox_head + type='BBoxHeadAVA', # Name of the bbox_head + in_channels=2048, # Number of channels of the input feature + num_classes=81, # Number of action classes + 1 + multilabel=True, # If the dataset is multilabel + dropout_ratio=0.5)), # The dropout ratio used + # model training and testing settings + train_cfg=dict( #
Training config of FastRCNN + rcnn=dict( # Dict for rcnn training config + assigner=dict( # Dict for assigner + type='MaxIoUAssignerAVA', # Name of the assigner + pos_iou_thr=0.9, # IoU threshold for positive examples, > pos_iou_thr -> positive + neg_iou_thr=0.9, # IoU threshold for negative examples, < neg_iou_thr -> negative + min_pos_iou=0.9), # Minimum acceptable IoU for positive examples + sampler=dict( # Dict for sample + type='RandomSampler', # Name of the sampler + num=32, # Batch Size of the sampler + pos_fraction=1, # Positive bbox fraction of the sampler + neg_pos_ub=-1, # Upper bound of the ratio of num negative to num positive + add_gt_as_proposals=True), # Add gt bboxes as proposals + pos_weight=1.0, # Loss weight of positive examples + debug=False)), # Debug mode + test_cfg=dict( # Testing config of FastRCNN + rcnn=dict( # Dict for rcnn testing config + action_thr=0.002))) # The threshold of an action + + # dataset settings + dataset_type = 'AVADataset' # Type of dataset for training, validation and testing + data_root = 'data/ava/rawframes' # Root path to data + anno_root = 'data/ava/annotations' # Root path to annotations + + ann_file_train = f'{anno_root}/ava_train_v2.1.csv' # Path to the annotation file for training + ann_file_val = f'{anno_root}/ava_val_v2.1.csv' # Path to the annotation file for validation + + exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' # Path to the exclude annotation file for training + exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' # Path to the exclude annotation file for validation + + label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' # Path to the label file + + proposal_file_train = f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl' # Path to the human detection proposals for training examples + proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' # Path to the human detection proposals for validation 
examples + + img_norm_cfg = dict( # Config of image normalization used in data pipeline + mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize + std=[58.395, 57.12, 57.375], # Std values of different channels to normalize + to_bgr=False) # Whether to convert channels from RGB to BGR + + train_pipeline = [ # List of training pipeline steps + dict( # Config of SampleFrames + type='AVASampleFrames', # Sample frames pipeline, sampling frames from video + clip_len=4, # Frames of each sampled output clip + frame_interval=16), # Temporal interval of adjacent sampled frames + dict( # Config of RawFrameDecode + type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices + dict( # Config of RandomRescale + type='RandomRescale', # Randomly rescale the shortedge by a given range + scale_range=(256, 320)), # The shortedge size range of RandomRescale + dict( # Config of RandomCrop + type='RandomCrop', # Randomly crop a patch with the given size + size=256), # The size of the cropped patch + dict( # Config of Flip + type='Flip', # Flip Pipeline + flip_ratio=0.5), # Probability of implementing flip + dict( # Config of Normalize + type='Normalize', # Normalize pipeline + **img_norm_cfg), # Config of image normalization + dict( # Config of FormatShape + type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format + input_format='NCTHW', # Final image shape format + collapse=True), # Collapse the dim N if N == 1 + dict( # Config of Rename + type='Rename', # Rename keys + mapping=dict(imgs='img')), # The old name to new name mapping + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), # Keys to be converted from image to tensor + dict( # Config of ToDataContainer + type='ToDataContainer', # Convert other types to DataContainer type pipeline + fields=[ # Fields to convert to DataContainer + 
dict( # Dict of fields + key=['proposals', 'gt_bboxes', 'gt_labels'], # Keys to Convert to DataContainer + stack=False)]), # Whether to stack these tensors + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data should be passed to the detector + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], # Keys of input + meta_keys=['scores', 'entity_ids']), # Meta keys of input + ] + + val_pipeline = [ # List of validation pipeline steps + dict( # Config of SampleFrames + type='AVASampleFrames', # Sample frames pipeline, sampling frames from video + clip_len=4, # Frames of each sampled output clip + frame_interval=16), # Temporal interval of adjacent sampled frames + dict( # Config of RawFrameDecode + type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices + dict( # Config of Resize + type='Resize', # Resize pipeline + scale=(-1, 256)), # The scale to resize images + dict( # Config of Normalize + type='Normalize', # Normalize pipeline + **img_norm_cfg), # Config of image normalization + dict( # Config of FormatShape + type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format + input_format='NCTHW', # Final image shape format + collapse=True), # Collapse the dim N if N == 1 + dict( # Config of Rename + type='Rename', # Rename keys + mapping=dict(imgs='img')), # The old name to new name mapping + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['img', 'proposals']), # Keys to be converted from image to tensor + dict( # Config of ToDataContainer + type='ToDataContainer', # Convert other types to DataContainer type pipeline + fields=[ # Fields to convert to DataContainer + dict( # Dict of fields + key=['proposals'], # Keys to Convert to DataContainer + stack=False)]), # Whether to stack these tensors + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data
should be passed to the detector + keys=['img', 'proposals'], # Keys of input + meta_keys=['scores', 'entity_ids'], # Meta keys of input + nested=True) # Whether to wrap the data in a nested list + ] + + data = dict( # Config of data + videos_per_gpu=16, # Batch size of each single GPU + workers_per_gpu=2, # Workers to pre-fetch data for each single GPU + val_dataloader=dict( # Additional config of validation dataloader + videos_per_gpu=1), # Batch size of each single GPU during evaluation + train=dict( # Training dataset config + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( # Validation dataset config + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) + data['test'] = data['val'] # Set test_dataset as val_dataset + + # optimizer + optimizer = dict( + # Config used to build optimizer, support (1). All the optimizers in PyTorch + # whose arguments are also the same as those in PyTorch. (2). Custom optimizers + # which are built on `constructor`, referring to "tutorials/5_new_modules.md" + # for implementation. 
+ type='SGD', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details + lr=0.2, # Learning rate, see detail usages of the parameters in the documentation of PyTorch (for 8gpu) + momentum=0.9, # Momentum, + weight_decay=0.00001) # Weight decay of SGD + + optimizer_config = dict( # Config used to build the optimizer hook + grad_clip=dict(max_norm=40, norm_type=2)) # Use gradient clip + + lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook + policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9 + step=[40, 80], # Steps to decay the learning rate + warmup='linear', # Warmup strategy + warmup_by_epoch=True, # Warmup_iters indicates iter num or epoch num + warmup_iters=5, # Number of iters or epochs for warmup + warmup_ratio=0.1) # The initial learning rate is warmup_ratio * lr + + total_epochs = 20 # Total epochs to train the model + checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation + interval=1) # Interval to save checkpoint + workflow = [('train', 1)] # Workflow for runner. 
[('train', 1)] means there is only one workflow and the workflow named 'train' is executed once + evaluation = dict( # Config of evaluation during training + interval=1, save_best='mAP@0.5IOU') # Interval to perform evaluation and the key for saving best checkpoint + log_config = dict( # Config to register logger hook + interval=20, # Interval to print the log + hooks=[ # Hooks to be implemented during training + dict(type='TextLoggerHook'), # The logger used to record the training process + ]) + + # runtime settings + dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set + log_level = 'INFO' # The level of logging + work_dir = ('./work_dirs/ava/' # Directory to save the model checkpoints and logs for the current experiments + 'slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb') + load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' # load models as a pre-trained model from a given path. This will not resume training + 'slowonly_r50_4x16x1_256e_kinetics400_rgb/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth') + resume_from = None # Resume checkpoints from a given path, the training will be resumed from the epoch at which the checkpoint was saved + ``` + +## FAQ + +### Use intermediate variables in configs + +Some intermediate variables are used in the config files, like `train_pipeline`/`val_pipeline`/`test_pipeline`, +`ann_file_train`/`ann_file_val`/`ann_file_test`, `img_norm_cfg` etc. + +For example, we first define `train_pipeline`/`val_pipeline`/`test_pipeline` and then pass them into `data`. +Thus, `train_pipeline`/`val_pipeline`/`test_pipeline` are intermediate variables. + +We also define `ann_file_train`/`ann_file_val`/`ann_file_test` and `data_root`/`data_root_val` to provide the data pipeline with some +basic information. + +In addition, we use `img_norm_cfg` as an intermediate variable to construct data augmentation components. + +```python +...
+dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + train=dict( + type=dataset_type, + 
ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/tutorials/2_finetune.md b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/2_finetune.md new file mode 100644 index 0000000000000000000000000000000000000000..ea2c83046e4873c185567fa940ab7d71f0e500a5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/2_finetune.md @@ -0,0 +1,99 @@ +# Tutorial 2: Finetuning Models + +This tutorial explains how to finetune the pre-trained models +on other datasets, so that better performance can be achieved. + + + +- [Outline](#outline) +- [Modify Head](#modify-head) +- [Modify Dataset](#modify-dataset) +- [Modify Training Schedule](#modify-training-schedule) +- [Use Pre-Trained Model](#use-pre-trained-model) + + + +## Outline + +There are two steps to finetune a model on a new dataset. + +1. Add support for the new dataset. See [Tutorial 3: Adding New Dataset](3_new_dataset.md). +2. Modify the configs. This will be discussed in this tutorial. + +For example, if users want to finetune models pre-trained on the Kinetics-400 dataset on another dataset, say UCF101, +then four parts of the config (see [here](1_config.md)) need attention. + +## Modify Head + +The `num_classes` in the `cls_head` needs to be changed to the number of classes in the new dataset. +The weights of the pre-trained models are reused except for the final prediction layer. +So it is safe to change the class number. +In our case, UCF101 has 101 classes. +So we change it from 400 (class number of Kinetics-400) to 101.
+ +```python +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False), + cls_head=dict( + type='TSNHead', + num_classes=101, # change from 400 to 101 + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01), + train_cfg=None, + test_cfg=dict(average_clips=None)) +``` + +Note that the `pretrained='torchvision://resnet50'` setting is used for initializing the backbone. +It is the setting to use if you are training a new model from ImageNet-pretrained weights. +However, this setting is not related to our task at hand. +What we need is `load_from`, which will be discussed later. + +## Modify Dataset + +MMAction2 supports UCF101, Kinetics-400, Moments in Time, Multi-Moments in Time, THUMOS14, +Something-Something V1&V2, and the ActivityNet dataset. +Users may need to adapt one of the above datasets to fit their own data. +In our case, UCF101 is already supported by various dataset types, like `RawframeDataset`, +so we change the config as follows. + +```python +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'data/ucf101/rawframes_train/' +data_root_val = 'data/ucf101/rawframes_val/' +ann_file_train = 'data/ucf101/ucf101_train_list.txt' +ann_file_val = 'data/ucf101/ucf101_val_list.txt' +ann_file_test = 'data/ucf101/ucf101_val_list.txt' +``` + +## Modify Training Schedule + +Finetuning usually requires a smaller learning rate and fewer training epochs.
+ +```python +# optimizer +optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001) # change from 0.01 to 0.005 +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# learning policy +lr_config = dict(policy='step', step=[20, 40]) +total_epochs = 50 # change from 100 to 50 +checkpoint_config = dict(interval=5) +``` + +## Use Pre-Trained Model + +To use the pre-trained model for the whole network, the new config adds the link of pre-trained models in the `load_from`. +We set `load_from=None` as default in `configs/_base_/default_runtime.py` and owing to [inheritance design](/docs/tutorials/1_config.md), users can directly change it by setting `load_from` in their configs. + +```python +# use the pre-trained model for the whole TSN network +load_from = 'https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/mmaction-v1/recognition/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' # model path can be found in model zoo +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/tutorials/3_new_dataset.md b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/3_new_dataset.md new file mode 100644 index 0000000000000000000000000000000000000000..223117aa5751a8eb84c13de24402e63e8808ff63 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/3_new_dataset.md @@ -0,0 +1,252 @@ +# Tutorial 3: Adding New Dataset + +In this tutorial, we will introduce some methods about how to customize your own dataset by reorganizing data and mixing dataset for the project. 
+ + + +- [Customize Datasets by Reorganizing Data](#customize-datasets-by-reorganizing-data) + - [Reorganize datasets to existing format](#reorganize-datasets-to-existing-format) + - [An example of a custom dataset](#an-example-of-a-custom-dataset) +- [Customize Dataset by Mixing Dataset](#customize-dataset-by-mixing-dataset) + - [Repeat dataset](#repeat-dataset) + + + +## Customize Datasets by Reorganizing Data + +### Reorganize datasets to existing format + +The simplest way is to convert your dataset to existing dataset formats (RawframeDataset or VideoDataset). + +There are three kinds of annotation files. + +- rawframe annotation + + The annotation of a rawframe dataset is a text file with multiple lines, + and each line indicates `frame_directory` (relative path) of a video, + `total_frames` of a video and the `label` of a video, which are split by a whitespace. + + Here is an example. + + ``` + some/directory-1 163 1 + some/directory-2 122 1 + some/directory-3 258 2 + some/directory-4 234 2 + some/directory-5 295 3 + some/directory-6 121 3 + ``` + +- video annotation + + The annotation of a video dataset is a text file with multiple lines, + and each line indicates a sample video with the `filepath` (relative path) and `label`, + which are split by a whitespace. + + Here is an example. + + ``` + some/path/000.mp4 1 + some/path/001.mp4 1 + some/path/002.mp4 2 + some/path/003.mp4 2 + some/path/004.mp4 3 + some/path/005.mp4 3 + ``` + +- ActivityNet annotation + + The annotation of ActivityNet dataset is a json file. Each key is a video name + and the corresponding value is the meta data and annotation for the video. + + Here is an example. 
+ + ``` + { + "video1": { + "duration_second": 211.53, + "duration_frame": 6337, + "annotations": [ + { + "segment": [ + 30.025882995319815, + 205.2318595943838 + ], + "label": "Rock climbing" + } + ], + "feature_frame": 6336, + "fps": 30.0, + "rfps": 29.9579255898 + }, + "video2": { + "duration_second": 26.75, + "duration_frame": 647, + "annotations": [ + { + "segment": [ + 2.578755070202808, + 24.914101404056165 + ], + "label": "Drinking beer" + } + ], + "feature_frame": 624, + "fps": 24.0, + "rfps": 24.1869158879 + } + } + ``` + +There are two ways to work with custom datasets. + +- online conversion + + You can write a new Dataset class inherited from [BaseDataset](/mmaction/datasets/base.py), and overwrite three methods + `load_annotations(self)`, `evaluate(self, results, metrics, logger)` and `dump_results(self, results, out)`, + like [RawframeDataset](/mmaction/datasets/rawframe_dataset.py), [VideoDataset](/mmaction/datasets/video_dataset.py) or [ActivityNetDataset](/mmaction/datasets/activitynet_dataset.py). + +- offline conversion + + You can convert the annotation format to the expected format above and save it to + a pickle or json file, then you can simply use `RawframeDataset`, `VideoDataset` or `ActivityNetDataset`. + +After the data pre-processing, the users need to further modify the config files to use the dataset. +Here is an example of using a custom dataset in rawframe format. + +In `configs/task/method/my_custom_config.py`: + +```python +... +# dataset settings +dataset_type = 'RawframeDataset' +data_root = 'path/to/your/root' +data_root_val = 'path/to/your/root_val' +ann_file_train = 'data/custom/custom_train_list.txt' +ann_file_val = 'data/custom/custom_val_list.txt' +ann_file_test = 'data/custom/custom_val_list.txt' +... 
+data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + ...), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + ...), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + ...)) +... +``` + +This is how the rawframe dataset is supported. + +### An example of a custom dataset + +Assume the annotation is in a new format in text files, and the image file names follow a template like `img_00005.jpg`. +The video annotations are stored in the text file `annotation.txt` as follows: + +``` +directory,total frames,class +D32_1gwq35E,299,66 +-G-5CJ0JkKY,249,254 +T4h1bvOd9DA,299,33 +4uZ27ivBl00,299,341 +0LfESFkfBSw,249,186 +-YIsNpBEx6c,299,169 +``` + +We can create a new dataset in `mmaction/datasets/my_dataset.py` to load the data. + +```python +import copy +import os.path as osp + +import mmcv + +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class MyDataset(BaseDataset): + + def __init__(self, + ann_file, + pipeline, + data_prefix=None, + test_mode=False, + filename_tmpl='img_{:05}.jpg'): + super(MyDataset, self).__init__(ann_file, pipeline, data_prefix=data_prefix, test_mode=test_mode) + + self.filename_tmpl = filename_tmpl + + def load_annotations(self): + video_infos = [] + with open(self.ann_file, 'r') as fin: + for line in fin: + if line.startswith("directory"): + continue + frame_dir, total_frames, label = line.split(',') + if self.data_prefix is not None: + frame_dir = osp.join(self.data_prefix, frame_dir) + video_infos.append( + dict( + frame_dir=frame_dir, + total_frames=int(total_frames), + label=int(label))) + return video_infos + + def prepare_train_frames(self, idx): + results = copy.deepcopy(self.video_infos[idx]) + results['filename_tmpl'] = self.filename_tmpl + return self.pipeline(results) + + def prepare_test_frames(self, idx): + results = copy.deepcopy(self.video_infos[idx]) + results['filename_tmpl'] = self.filename_tmpl + return self.pipeline(results) + + def
evaluate(self, + results, + metrics='top_k_accuracy', + topk=(1, 5), + logger=None): + pass +``` + +Then, to use `MyDataset`, modify the config as follows: + +```python +dataset_A_train = dict( + type='MyDataset', + ann_file=ann_file_train, + pipeline=train_pipeline +) +``` + +## Customize Dataset by Mixing Dataset + +MMAction2 also supports mixing datasets for training. Currently, it supports repeating a dataset. + +### Repeat dataset + +We use `RepeatDataset` as a wrapper to repeat the dataset. For example, to repeat an original dataset `Dataset_A`, +the config looks like the following: + +```python +dataset_A_train = dict( + type='RepeatDataset', + times=N, + dataset=dict( # This is the original config of Dataset_A + type='Dataset_A', + ... + pipeline=train_pipeline + ) + ) +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/tutorials/4_data_pipeline.md b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/4_data_pipeline.md new file mode 100644 index 0000000000000000000000000000000000000000..97c5deb10c820f1ca72bdaa8393373da204ed368 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/4_data_pipeline.md @@ -0,0 +1,262 @@ +# Tutorial 4: Customize Data Pipelines + +In this tutorial, we introduce the design of data pipelines, and how to customize and extend your own data pipelines for the project. + + + +- [Tutorial 4: Customize Data Pipelines](#tutorial-4-customize-data-pipelines) + - [Design of Data Pipelines](#design-of-data-pipelines) + - [Data loading](#data-loading) + - [Pre-processing](#pre-processing) + - [Formatting](#formatting) + - [Extend and Use Custom Pipelines](#extend-and-use-custom-pipelines) + + + +## Design of Data Pipelines + +Following typical conventions, we use `Dataset` and `DataLoader` for data loading +with multiple workers. `Dataset` returns a dict of data items corresponding +to the arguments of the model's forward method.
+Since the data in action recognition & localization may not be the same size (image size, gt bbox size, etc.), +the `DataContainer` in MMCV is used to help collect and distribute data of different sizes. +See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details. + +The data preparation pipeline and the dataset are decoupled. Usually a dataset +defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict. +A pipeline consists of a sequence of operations. Each operation takes a dict as input and outputs a dict for the next operation. + +We present a typical pipeline in the following figure. The blue blocks are pipeline operations. +As the pipeline proceeds, each operator can add new keys (marked in green) to the result dict or update the existing keys (marked in orange). +![pipeline figure](https://github.com/open-mmlab/mmaction2/raw/master/resources/data_pipeline.png) + +The operations are categorized into data loading, pre-processing and formatting. + +Here is a pipeline example for TSN.
+ +```python +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3), + dict(type='RawFrameDecode', io_backend='disk'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode', io_backend='disk'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode', io_backend='disk'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +``` + +Some lazy operators are supported, and users are encouraged to apply them. +Lazy ops record how the data should be processed, but postpone the actual processing of the raw data until the `Fuse` stage.
+Specifically, lazy ops avoid repeatedly reading and modifying the raw data and instead process it once in the final `Fuse` stage, thus accelerating data preprocessing. + +Here is a pipeline example applying lazy ops. + +```python +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode', decoding_backend='turbojpeg'), + # The following three lazy ops only process the bbox of frames without + # modifying the raw data. + dict(type='Resize', scale=(-1, 256), lazy=True), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0, + lazy=True), + dict(type='Resize', scale=(224, 224), keep_ratio=False, lazy=True), + # Lazy operator `Flip` only records whether a frame should be flipped and the + # flip direction. + dict(type='Flip', flip_ratio=0.5, lazy=True), + # Process the raw data once in the Fuse stage. + dict(type='Fuse'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +``` + +For each operation, we list the related dict fields that are added/updated/removed below, where `*` means the key may not be affected.
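As a rough illustration of what "add" and "update" mean in the lists that follow, here is a toy, framework-free sketch. The `ToyResize` and `ToyFlip` classes and the `run_pipeline` helper are hypothetical and not part of MMAction2; they only mimic the convention that each operation takes a result dict and returns it with new or modified keys.

```python
class ToyResize:
    """Hypothetical op: adds `img_shape` and updates `imgs`."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, results):
        # Each "image" is represented only by its (height, width) tuple.
        results['imgs'] = [self.scale for _ in results['imgs']]
        results['img_shape'] = self.scale
        return results


class ToyFlip:
    """Hypothetical op: adds `flip` and leaves the other keys untouched."""

    def __init__(self, flip):
        self.flip = flip

    def __call__(self, results):
        results['flip'] = self.flip
        return results


def run_pipeline(ops, results):
    """Apply ops in order and report which keys each one added or updated."""
    report = []
    for op in ops:
        before = dict(results)  # shallow snapshot of the keys and values
        results = op(results)
        added = sorted(k for k in results if k not in before)
        updated = sorted(
            k for k in results if k in before and results[k] != before[k])
        report.append((type(op).__name__, added, updated))
    return results, report


final, report = run_pipeline(
    [ToyResize((224, 224)), ToyFlip(True)],
    {'imgs': [(240, 320), (240, 320)], 'label': 3})
for name, added, updated in report:
    print(name, 'add:', added, 'update:', updated)
```

The real operators follow the same dict-in, dict-out convention, but the exact keys each one touches are the ones listed in this tutorial, not the ones in this toy.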
+ +### Data loading + +`SampleFrames` + +- add: frame_inds, clip_len, frame_interval, num_clips, \*total_frames + +`DenseSampleFrames` + +- add: frame_inds, clip_len, frame_interval, num_clips, \*total_frames + +`PyAVDecode` + +- add: imgs, original_shape +- update: \*frame_inds + +`DecordDecode` + +- add: imgs, original_shape +- update: \*frame_inds + +`OpenCVDecode` + +- add: imgs, original_shape +- update: \*frame_inds + +`RawFrameDecode` + +- add: imgs, original_shape +- update: \*frame_inds + +### Pre-processing + +`RandomCrop` + +- add: crop_bbox, img_shape +- update: imgs + +`RandomResizedCrop` + +- add: crop_bbox, img_shape +- update: imgs + +`MultiScaleCrop` + +- add: crop_bbox, img_shape, scales +- update: imgs + +`Resize` + +- add: img_shape, keep_ratio, scale_factor +- update: imgs + +`Flip` + +- add: flip, flip_direction +- update: imgs, label + +`Normalize` + +- add: img_norm_cfg +- update: imgs + +`CenterCrop` + +- add: crop_bbox, img_shape +- update: imgs + +`ThreeCrop` + +- add: crop_bbox, img_shape +- update: imgs + +`TenCrop` + +- add: crop_bbox, img_shape +- update: imgs + +### Formatting + +`ToTensor` + +- update: specified by `keys`. + +`ImageToTensor` + +- update: specified by `keys`. + +`Transpose` + +- update: specified by `keys`. + +`Collect` + +- add: img_metas (the keys of img_metas is specified by `meta_keys`) +- remove: all other keys except for those specified by `keys` + +It is **noteworthy** that the first key, commonly `imgs`, will be used as the main key to calculate the batch size. + +`FormatShape` + +- add: input_shape +- update: imgs + +## Extend and Use Custom Pipelines + +1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and return a dict. + + ```python + from mmaction.datasets import PIPELINES + + @PIPELINES.register_module() + class MyTransform: + + def __call__(self, results): + results['key'] = value + return results + ``` + +2. Import the new class. 
+ ```python + from .my_pipeline import MyTransform + ``` + +3. Use it in config files. + + ```python + img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + train_pipeline = [ + dict(type='DenseSampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode', io_backend='disk'), + dict(type='MyTransform'), # use a custom pipeline + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) + ] + ``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/tutorials/5_new_modules.md b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/5_new_modules.md new file mode 100644 index 0000000000000000000000000000000000000000..c683c7f96c4ad520d09dbec7f84484262e0eea07 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/5_new_modules.md @@ -0,0 +1,291 @@ +# Tutorial 5: Adding New Modules + +In this tutorial, we introduce how to customize an optimizer, develop new components, and add a new learning rate scheduler for this project. + + + +- [Customize Optimizer](#customize-optimizer) +- [Customize Optimizer Constructor](#customize-optimizer-constructor) +- [Develop New Components](#develop-new-components) + - [Add new backbones](#add-new-backbones) + - [Add new heads](#add-new-heads) + - [Add new loss](#add-new-loss) +- [Add new learning rate scheduler (updater)](#add-new-learning-rate-scheduler--updater-) + + + +## Customize Optimizer + +An example of a customized optimizer is [CopyOfSGD](/mmaction/core/optimizer/copy_of_sgd.py), defined in `mmaction/core/optimizer/copy_of_sgd.py`. +More generally, a customized optimizer can be defined as follows. + +Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b` and `c`.
+You need to first implement the new optimizer in a file, e.g., in `mmaction/core/optimizer/my_optimizer.py`: + +```python +from mmcv.runner import OPTIMIZERS +from torch.optim import Optimizer + +@OPTIMIZERS.register_module() +class MyOptimizer(Optimizer): + + def __init__(self, a, b, c): +``` + +Then add this module in `mmaction/core/optimizer/__init__.py`, thus the registry will find the new module and add it: + +```python +from .my_optimizer import MyOptimizer +``` + +Then you can use `MyOptimizer` in `optimizer` field of config files. +In the configs, the optimizers are defined by the field `optimizer` like the following: + +```python +optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) +``` + +To use your own optimizer, the field can be changed as + +```python +optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value) +``` + +We already support to use all the optimizers implemented by PyTorch, and the only modification is to change the `optimizer` field of config files. +For example, if you want to use `ADAM`, though the performance will drop a lot, the modification could be as the following. + +```python +optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001) +``` + +The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch. + +## Customize Optimizer Constructor + +Some models may have some parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers. +The users can do those fine-grained parameter tuning through customizing optimizer constructor. + +You can write a new optimizer constructor inherit from [DefaultOptimizerConstructor](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py) +and overwrite the `add_params(self, params, module)` method. 
+ +An example of customized optimizer constructor is [TSMOptimizerConstructor](/mmaction/core/optimizer/tsm_optimizer_constructor.py). +More generally, a customized optimizer constructor could be defined as following. + +In `mmaction/core/optimizer/my_optimizer_constructor.py`: + +```python +from mmcv.runner import OPTIMIZER_BUILDERS, DefaultOptimizerConstructor + +@OPTIMIZER_BUILDERS.register_module() +class MyOptimizerConstructor(DefaultOptimizerConstructor): + +``` + +In `mmaction/core/optimizer/__init__.py`: + +```python +from .my_optimizer_constructor import MyOptimizerConstructor +``` + +Then you can use `MyOptimizerConstructor` in `optimizer` field of config files. + +```python +# optimizer +optimizer = dict( + type='SGD', + constructor='MyOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), + lr=0.02, + momentum=0.9, + weight_decay=0.0001) +``` + +## Develop New Components + +We basically categorize model components into 4 types. + +- recognizer: the whole recognizer model pipeline, usually contains a backbone and cls_head. +- backbone: usually an FCN network to extract feature maps, e.g., ResNet, BNInception. +- cls_head: the component for classification task, usually contains an FC layer with some pooling layers. +- localizer: the model for temporal localization task, currently available: BSN, BMN, SSN. + +### Add new backbones + +Here we show how to develop new components with an example of TSN. + +1. Create a new file `mmaction/models/backbones/resnet.py`. + + ```python + import torch.nn as nn + + from ..builder import BACKBONES + + @BACKBONES.register_module() + class ResNet(nn.Module): + + def __init__(self, arg1, arg2): + pass + + def forward(self, x): # should return a tuple + pass + + def init_weights(self, pretrained=None): + pass + ``` + +2. Import the module in `mmaction/models/backbones/__init__.py`. + + ```python + from .resnet import ResNet + ``` + +3. Use it in your config file. + + ```python + model = dict( + ... 
+ backbone=dict( + type='ResNet', + arg1=xxx, + arg2=xxx), + ) + ``` + +### Add new heads + +Here we show how to develop a new head with the example of TSNHead as the following. + +1. Create a new file `mmaction/models/heads/tsn_head.py`. + + You can write a new classification head inheriting from [BaseHead](/mmaction/models/heads/base.py), + and overwrite `init_weights(self)` and `forward(self, x)` method. + + ```python + from ..builder import HEADS + from .base import BaseHead + + + @HEADS.register_module() + class TSNHead(BaseHead): + + def __init__(self, arg1, arg2): + pass + + def forward(self, x): + pass + + def init_weights(self): + pass + ``` + +2. Import the module in `mmaction/models/heads/__init__.py` + + ```python + from .tsn_head import TSNHead + ``` + +3. Use it in your config file + + ```python + model = dict( + ... + cls_head=dict( + type='TSNHead', + num_classes=400, + in_channels=2048, + arg1=xxx, + arg2=xxx), + ``` + +### Add new loss + +Assume you want to add a new loss as `MyLoss`. To add a new loss function, the users need implement it in `mmaction/models/losses/my_loss.py`. + +```python +import torch +import torch.nn as nn + +from ..builder import LOSSES + +def my_loss(pred, target): + assert pred.size() == target.size() and target.numel() > 0 + loss = torch.abs(pred - target) + return loss + + +@LOSSES.register_module() +class MyLoss(nn.Module): + + def forward(self, pred, target): + loss = my_loss(pred, target) + return loss +``` + +Then the users need to add it in the `mmaction/models/losses/__init__.py` + +```python +from .my_loss import MyLoss, my_loss +``` + +To use it, modify the `loss_xxx` field. Since MyLoss is for regression, we can use it for the bbox loss `loss_bbox`. + +```python +loss_bbox=dict(type='MyLoss')) +``` + +## Add new learning rate scheduler (updater) + +The default manner of constructing a lr updater(namely, 'scheduler' by pytorch convention), is to modify the config such as: + +```python +... 
+lr_config = dict(policy='step', step=[20, 40]) +... +``` + +The API in [`train.py`](/mmaction/apis/train.py) registers the learning rate updater hook based on the config at: + +```python +... + runner.register_training_hooks( + cfg.lr_config, + optimizer_config, + cfg.checkpoint_config, + cfg.log_config, + cfg.get('momentum_config', None)) +... +``` + +So far, the supported updaters can be found in [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), but if you want to customize a new learning rate updater, you may follow the steps below: + +1. First, write your own LrUpdaterHook in `$MMAction2/mmaction/core/scheduler`. The following snippet is an example of a customized lr updater that takes its learning rate from a list of values, `lrs`, switching to the next value at each boundary in `steps`: + +```python +from mmcv.runner import HOOKS, LrUpdaterHook + + +@HOOKS.register_module() +# Register it here +class RelativeStepLrUpdaterHook(LrUpdaterHook): + # Inherit from mmcv's LrUpdaterHook + def __init__(self, steps, lrs, **kwargs): + super().__init__(**kwargs) + assert len(steps) == (len(lrs)) + self.steps = steps + self.lrs = lrs + + def get_lr(self, runner, base_lr): + # Only this function is required to override + # This function is called before each training epoch, return the specific learning rate here. + progress = runner.epoch if self.by_epoch else runner.iter + for i in range(len(self.steps)): + if progress < self.steps[i]: + return self.lrs[i] + # Assumed fallback: keep the last value once progress passes the final step + return self.lrs[-1] +``` + +2. Modify your config: + +In your config file, replace the original `lr_config` with: + +```python +lr_config = dict(policy='RelativeStep', steps=[20, 40, 60], lrs=[0.1, 0.01, 0.001]) +``` + +More examples can be found in [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py).
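To make the step lookup in `get_lr` concrete, here is a minimal standalone sketch of the same loop, written without MMCV so that it can be run directly. The function name `relative_step_lr` is hypothetical, and the fallback to the last entry of `lrs` once `progress` passes the final boundary is an assumption rather than documented behavior of the hook.

```python
def relative_step_lr(progress, steps, lrs):
    """Return the learning rate for a given epoch or iteration index.

    Mirrors the loop in ``RelativeStepLrUpdaterHook.get_lr``: the first
    boundary in ``steps`` that ``progress`` has not reached yet selects
    the matching entry of ``lrs``.
    """
    assert len(steps) == len(lrs)
    for step, lr in zip(steps, lrs):
        if progress < step:
            return lr
    # Assumption: keep the last value after the final boundary.
    return lrs[-1]


# Hypothetical schedule, matching the config example in this tutorial.
steps, lrs = [20, 40, 60], [0.1, 0.01, 0.001]
for epoch in (0, 19, 20, 45, 60):
    print(epoch, relative_step_lr(epoch, steps, lrs))
```

With these values, epochs 0 to 19 use `0.1`, epochs 20 to 39 use `0.01`, and every later epoch uses `0.001`.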
diff --git a/openmmlab_test/mmaction2-0.24.1/docs/tutorials/6_export_model.md b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/6_export_model.md new file mode 100644 index 0000000000000000000000000000000000000000..d445ab12260a8d63df6df1a72f1f18848e6684e7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/6_export_model.md @@ -0,0 +1,74 @@ +# Tutorial 6: Exporting a model to ONNX + +Open Neural Network Exchange [(ONNX)](https://onnx.ai/) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. + + + +- [Supported Models](#supported-models) +- [Usage](#usage) + - [Prerequisite](#prerequisite) + - [Recognizers](#recognizers) + - [Localizers](#localizers) + + + +## Supported Models + +So far, our codebase supports onnx exporting from pytorch models trained with MMAction2. The supported models are: + +- I3D +- TSN +- TIN +- TSM +- R(2+1)D +- SLOWFAST +- SLOWONLY +- BMN +- BSN(tem, pem) + +## Usage + +For simple exporting, you can use the [script](/tools/deployment/pytorch2onnx.py) here. Note that the package `onnx` and `onnxruntime` are required for verification after exporting. + +### Prerequisite + +First, install onnx. + +```shell +pip install onnx onnxruntime +``` + +We provide a python script to export the pytorch model trained by MMAction2 to ONNX. + +```shell +python tools/deployment/pytorch2onnx.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--shape ${SHAPE}] \ + [--verify] [--show] [--output-file ${OUTPUT_FILE}] [--is-localizer] [--opset-version ${VERSION}] +``` + +Optional arguments: + +- `--shape`: The shape of input tensor to the model. For 2D recognizer(e.g. TSN), the input should be `$batch $clip $channel $height $width`(e.g. `1 1 3 224 224`); For 3D recognizer(e.g. I3D), the input should be `$batch $clip $channel $time $height $width`(e.g. `1 1 3 32 224 224`); For localizer such as BSN, the input for each module is different, please check the `forward` function for it. 
If not specified, it will be set to `1 1 3 224 224`. +- `--verify`: Determines whether to verify the exported model, checking both that it runs and that its outputs match numerically. If not specified, it will be set to `False`. +- `--show`: Determines whether to print the architecture of the exported model. If not specified, it will be set to `False`. +- `--output-file`: The output onnx model name. If not specified, it will be set to `tmp.onnx`. +- `--is-localizer`: Determines whether the model to be exported is a localizer. If not specified, it will be set to `False`. +- `--opset-version`: Determines the operation set version of onnx. We recommend using a higher version such as 11 for compatibility. If not specified, it will be set to `11`. +- `--softmax`: Determines whether to add a softmax layer at the end of recognizers. If not specified, it will be set to `False`. For now, localizers are not supported. + +### Recognizers + +For recognizers, please run: + +```shell +python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify +``` + +### Localizers + +For localizers, please run: + +```shell +python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify +``` + +Please file an issue if you discover any checkpoints that are not perfectly exported or suffer some loss in accuracy. diff --git a/openmmlab_test/mmaction2-0.24.1/docs/tutorials/7_customize_runtime.md b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/7_customize_runtime.md new file mode 100644 index 0000000000000000000000000000000000000000..e0f2834db8c15568beb0f10f8fd38964c0d0174d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/tutorials/7_customize_runtime.md @@ -0,0 +1,350 @@ +# Tutorial 7: Customize Runtime Settings + +In this tutorial, we introduce how to customize optimization methods, training schedules, workflow and hooks when running your own settings for the project.
+ + + +- [Customize Optimization Methods](#customize-optimization-methods) + - [Customize optimizer supported by PyTorch](#customize-optimizer-supported-by-pytorch) + - [Customize self-implemented optimizer](#customize-self-implemented-optimizer) + - [1. Define a new optimizer](#1-define-a-new-optimizer) + - [2. Add the optimizer to registry](#2-add-the-optimizer-to-registry) + - [3. Specify the optimizer in the config file](#3-specify-the-optimizer-in-the-config-file) + - [Customize optimizer constructor](#customize-optimizer-constructor) + - [Additional settings](#additional-settings) +- [Customize Training Schedules](#customize-training-schedules) +- [Customize Workflow](#customize-workflow) +- [Customize Hooks](#customize-hooks) + - [Customize self-implemented hooks](#customize-self-implemented-hooks) + - [1. Implement a new hook](#1-implement-a-new-hook) + - [2. Register the new hook](#2-register-the-new-hook) + - [3. Modify the config](#3-modify-the-config) + - [Use hooks implemented in MMCV](#use-hooks-implemented-in-mmcv) + - [Modify default runtime hooks](#modify-default-runtime-hooks) + - [Checkpoint config](#checkpoint-config) + - [Log config](#log-config) + - [Evaluation config](#evaluation-config) + + + +## Customize Optimization Methods + +### Customize optimizer supported by PyTorch + +We already support to use all the optimizers implemented by PyTorch, and the only modification is to change the `optimizer` field of config files. +For example, if you want to use `Adam`, the modification could be as the following. + +```python +optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001) +``` + +To modify the learning rate of the model, the users only need to modify the `lr` in the config of optimizer. +The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch. 
+ +For example, if you want to use `Adam` with the setting like `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` in PyTorch, +the modification could be set as the following. + +```python +optimizer = dict(type='Adam', lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False) +``` + +### Customize self-implemented optimizer + +#### 1. Define a new optimizer + +A customized optimizer could be defined as following. + +Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`. +You need to create a new directory named `mmaction/core/optimizer`. +And then implement the new optimizer in a file, e.g., in `mmaction/core/optimizer/my_optimizer.py`: + +```python +from mmcv.runner import OPTIMIZERS +from torch.optim import Optimizer + + +@OPTIMIZERS.register_module() +class MyOptimizer(Optimizer): + + def __init__(self, a, b, c): + +``` + +#### 2. Add the optimizer to registry + +To find the above module defined above, this module should be imported into the main namespace at first. There are two ways to achieve it. + +- Modify `mmaction/core/optimizer/__init__.py` to import it. + + The newly defined module should be imported in `mmaction/core/optimizer/__init__.py` so that the registry will + find the new module and add it: + +```python +from .my_optimizer import MyOptimizer +``` + +- Use `custom_imports` in the config to manually import it + +```python +custom_imports = dict(imports=['mmaction.core.optimizer.my_optimizer'], allow_failed_imports=False) +``` + +The module `mmaction.core.optimizer.my_optimizer` will be imported at the beginning of the program and the class `MyOptimizer` is then automatically registered. +Note that only the package containing the class `MyOptimizer` should be imported. `mmaction.core.optimizer.my_optimizer.MyOptimizer` **cannot** be imported directly. + +#### 3. 
Specify the optimizer in the config file + +Then you can use `MyOptimizer` in `optimizer` field of config files. +In the configs, the optimizers are defined by the field `optimizer` like the following: + +```python +optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) +``` + +To use your own optimizer, the field can be changed to + +```python +optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value) +``` + +### Customize optimizer constructor + +Some models may have some parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers. +The users can do those fine-grained parameter tuning through customizing optimizer constructor. + +```python +from mmcv.runner.optimizer import OPTIMIZER_BUILDERS + + +@OPTIMIZER_BUILDERS.register_module() +class MyOptimizerConstructor: + + def __init__(self, optimizer_cfg, paramwise_cfg=None): + pass + + def __call__(self, model): + + return my_optimizer +``` + +The default optimizer constructor is implemented [here](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11), +which could also serve as a template for new optimizer constructor. + +### Additional settings + +Tricks not implemented by the optimizer should be implemented through optimizer constructor (e.g., set parameter-wise learning rates) or hooks. +We list some common settings that could stabilize the training or accelerate the training. Feel free to create PR, issue for more settings. + +- __Use gradient clip to stabilize training__: + Some models need gradient clip to clip the gradients to stabilize the training process. An example is as below: + + ```python + optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) + ``` + +- __Use momentum schedule to accelerate model convergence__: + We support momentum scheduler to modify model's momentum according to learning rate, which could make the model converge in a faster way. 
+ A momentum scheduler is usually used together with an LR scheduler. For example, the following config is used in 3D detection to accelerate convergence. + For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327) + and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130). + + ```python + lr_config = dict( + policy='cyclic', + target_ratio=(10, 1e-4), + cyclic_times=1, + step_ratio_up=0.4, + ) + momentum_config = dict( + policy='cyclic', + target_ratio=(0.85 / 0.95, 1), + cyclic_times=1, + step_ratio_up=0.4, + ) + ``` + +## Customize Training Schedules + +By default, the config files use a step learning rate schedule, which calls [`StepLrUpdaterHook`](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L153) in MMCV. +We support many other learning rate schedules [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), such as the `CosineAnnealing` and `Poly` schedules. Here are some examples: + +- Poly schedule: + + ```python + lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) + ``` + +- CosineAnnealing schedule: + + ```python + lr_config = dict( + policy='CosineAnnealing', + warmup='linear', + warmup_iters=1000, + warmup_ratio=1.0 / 10, + min_lr_ratio=1e-5) + ``` + +## Customize Workflow + +By default, we recommend using `EvalHook` to run evaluation after each training epoch, but the `val` workflow can still be used as an alternative. + +Workflow is a list of (phase, epochs) to specify the running order and epochs. By default it is set to be + +```python +workflow = [('train', 1)] +``` + +which means running 1 epoch for training. +Sometimes users may want to check some metrics (e.g. loss, accuracy) about the model on the validation set.
+In such case, we can set the workflow as + +```python +[('train', 1), ('val', 1)] +``` + +so that 1 epoch for training and 1 epoch for validation will be run iteratively. + +:::{note} + +1. The parameters of model will not be updated during val epoch. +2. Keyword `total_epochs` in the config only controls the number of training epochs and will not affect the validation workflow. +3. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EvalHook` because `EvalHook` is called by `after_train_epoch` and validation workflow only affect hooks that are called through `after_val_epoch`. + Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on validation set after each training epoch. + +::: + +## Customize Hooks + +### Customize self-implemented hooks + +#### 1. Implement a new hook + +Here we give an example of creating a new hook in MMAction2 and using it in training. + +```python +from mmcv.runner import HOOKS, Hook + + +@HOOKS.register_module() +class MyHook(Hook): + + def __init__(self, a, b): + pass + + def before_run(self, runner): + pass + + def after_run(self, runner): + pass + + def before_epoch(self, runner): + pass + + def after_epoch(self, runner): + pass + + def before_iter(self, runner): + pass + + def after_iter(self, runner): + pass +``` + +Depending on the functionality of the hook, the users need to specify what the hook will do at each stage of the training in `before_run`, `after_run`, `before_epoch`, `after_epoch`, `before_iter`, and `after_iter`. + +#### 2. Register the new hook + +Then we need to make `MyHook` imported. Assuming the file is in `mmaction/core/utils/my_hook.py` there are two ways to do that: + +- Modify `mmaction/core/utils/__init__.py` to import it. 
+ + The newly defined module should be imported in `mmaction/core/utils/__init__.py` so that the registry will + find the new module and add it: + +```python +from .my_hook import MyHook +``` + +- Use `custom_imports` in the config to manually import it + +```python +custom_imports = dict(imports=['mmaction.core.utils.my_hook'], allow_failed_imports=False) +``` + +#### 3. Modify the config + +```python +custom_hooks = [ + dict(type='MyHook', a=a_value, b=b_value) +] +``` + +You can also set the priority of the hook by setting the key `priority` to `'NORMAL'` or `'HIGHEST'`, as below: + +```python +custom_hooks = [ + dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL') +] +``` + +By default the hook's priority is set as `NORMAL` during registration. + +### Use hooks implemented in MMCV + +If the hook is already implemented in MMCV, you can directly modify the config to use the hook as below + +```python +mmcv_hooks = [ + dict(type='MMCVHook', a=a_value, b=b_value, priority='NORMAL') +] +``` + +### Modify default runtime hooks + +Some common hooks are not registered through `custom_hooks` but are registered by default when importing MMCV. They are: + +- log_config +- checkpoint_config +- evaluation +- lr_config +- optimizer_config +- momentum_config + +Among those hooks, only the logger hook has the `VERY_LOW` priority; the others have `NORMAL` priority. +The above-mentioned tutorials already cover how to modify `optimizer_config`, `momentum_config`, and `lr_config`. +Here we show what we can do with `log_config`, `checkpoint_config`, and `evaluation`. + +#### Checkpoint config + +The MMCV runner will use `checkpoint_config` to initialize [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9).
+ +```python +checkpoint_config = dict(interval=1) +``` + +Users can set `max_keep_ckpts` to keep only a small number of checkpoints, or use `save_optimizer` to decide whether to store the state dict of the optimizer. +More details of the arguments are [here](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.CheckpointHook). + +#### Log config + +The `log_config` wraps multiple logger hooks and allows setting the logging interval. Now MMCV supports `WandbLoggerHook`, `MlflowLoggerHook`, and `TensorboardLoggerHook`. +Detailed usage can be found in the [doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook). + +```python +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook'), + dict(type='TensorboardLoggerHook') + ]) +``` + +#### Evaluation config + +The config of `evaluation` will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmaction2/blob/master/mmaction/core/evaluation/eval_hooks.py#L12). +Except for the key `interval`, other arguments such as `metrics` will be passed to `dataset.evaluate()`. + +```python +evaluation = dict(interval=1, metrics='bbox') +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs/useful_tools.md b/openmmlab_test/mmaction2-0.24.1/docs/useful_tools.md new file mode 100644 index 0000000000000000000000000000000000000000..086061020b28cc7f18539a47e38356075c84f426 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs/useful_tools.md @@ -0,0 +1,230 @@ +Apart from training/testing scripts, we provide lots of useful tools under the `tools/` directory. + +## Useful Tools Link + + + +- [Useful Tools Link](#useful-tools-link) +- [Log Analysis](#log-analysis) +- [Model Complexity](#model-complexity) +- [Model Conversion](#model-conversion) + - [MMAction2 model to ONNX (experimental)](#mmaction2-model-to-onnx-experimental) + - [Prepare a model for publishing](#prepare-a-model-for-publishing) +- [Model Serving](#model-serving) + - [1.
Convert model from MMAction2 to TorchServe](#1-convert-model-from-mmaction2-to-torchserve) + - [2. Build `mmaction-serve` docker image](#2-build-mmaction-serve-docker-image) + - [3. Launch `mmaction-serve`](#3-launch-mmaction-serve) + - [4. Test deployment](#4-test-deployment) +- [Miscellaneous](#miscellaneous) + - [Evaluating a metric](#evaluating-a-metric) + - [Print the entire config](#print-the-entire-config) + - [Check videos](#check-videos) + + + +## Log Analysis + +`tools/analysis/analyze_logs.py` plots loss/top-k acc curves given a training log file. Run `pip install seaborn` first to install the dependency. + +![acc_curve_image](https://github.com/open-mmlab/mmaction2/raw/master/resources/acc_curve.png) + +```shell +python tools/analysis/analyze_logs.py plot_curve ${JSON_LOGS} [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] +``` + +Examples: + +- Plot the classification loss of some run. + + ```shell + python tools/analysis/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls + ``` + +- Plot the top-1 acc and top-5 acc of some run, and save the figure to a pdf. + + ```shell + python tools/analysis/analyze_logs.py plot_curve log.json --keys top1_acc top5_acc --out results.pdf + ``` + +- Compare the top-1 acc of two runs in the same figure. + + ```shell + python tools/analysis/analyze_logs.py plot_curve log1.json log2.json --keys top1_acc --legend run1 run2 + ``` + + You can also compute the average training speed. + + ```shell + python tools/analysis/analyze_logs.py cal_train_time ${JSON_LOGS} [--include-outliers] + ``` + +- Compute the average training speed for a config file. + + ```shell + python tools/analysis/analyze_logs.py cal_train_time work_dirs/some_exp/20200422_153324.log.json + ``` + + The output is expected to be like the following. 
+ + ```text + -----Analyze train time of work_dirs/some_exp/20200422_153324.log.json----- + slowest epoch 60, average time is 0.9736 + fastest epoch 18, average time is 0.9001 + time std over epochs is 0.0177 + average iter time: 0.9330 s/iter + ``` + +## Model Complexity + +`/tools/analysis/get_flops.py` is a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model. + +```shell +python tools/analysis/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}] +``` + +We will get the result like this + +```text +============================== +Input shape: (1, 3, 32, 340, 256) +Flops: 37.1 GMac +Params: 28.04 M +============================== +``` + +:::{note} +This tool is still experimental and we do not guarantee that the number is absolutely correct. +You may use the result for simple comparisons, but double check it before you adopt it in technical reports or papers. + +(1) FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 340, 256) for 2D recognizer, (1, 3, 32, 340, 256) for 3D recognizer. +(2) Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details. +::: + +## Model Conversion + +### MMAction2 model to ONNX (experimental) + +`/tools/deployment/pytorch2onnx.py` is a script to convert model to [ONNX](https://github.com/onnx/onnx) format. +It also supports comparing the output results between Pytorch and ONNX model for verification. +Run `pip install onnx onnxruntime` first to install the dependency. +Please note that a softmax layer could be added for recognizers by `--softmax` option, in order to get predictions in range `[0, 1]`. 
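As a rough illustration of what the `--softmax` option changes in the outputs, the sketch below applies a softmax to raw class scores in plain Python. The function name `softmax` and the example scores are made up for illustration; how the conversion script actually attaches the layer to the exported model is not shown here.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

Each value lies in `[0, 1]`, the values sum to 1, and the ranking of the classes is unchanged.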
+ +- For recognizers, please run: + + ```shell + python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify + ``` + +- For localizers, please run: + + ```shell + python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify + ``` + +### Prepare a model for publishing + +`tools/deployment/publish_model.py` helps users to prepare their model for publishing. + +Before you upload a model to AWS, you may want to: + +(1) convert model weights to CPU tensors. +(2) delete the optimizer states. +(3) compute the hash of the checkpoint file and append the hash id to the filename. + +```shell +python tools/deployment/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME} +``` + +E.g., + +```shell +python tools/deployment/publish_model.py work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth tsn_r50_1x1x3_100e_kinetics400_rgb.pth +``` + +The final output filename will be `tsn_r50_1x1x3_100e_kinetics400_rgb-{hash id}.pth`. + +## Model Serving + +In order to serve an `MMAction2` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps: + +### 1. Convert model from MMAction2 to TorchServe + +```shell +python tools/deployment/mmaction2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \ +--output_folder ${MODEL_STORE} \ +--model-name ${MODEL_NAME} \ +--label-file ${LABLE_FILE} + +``` + +### 2. Build `mmaction-serve` docker image + +```shell +DOCKER_BUILDKIT=1 docker build -t mmaction-serve:latest docker/serve/ +``` + +### 3. Launch `mmaction-serve` + +Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment). 
+ +Example: + +```shell +docker run --rm \ +--cpus 8 \ +--gpus device=0 \ +-p8080:8080 -p8081:8081 -p8082:8082 \ +--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \ +mmaction-serve:latest +``` + +**Note**: ${MODEL_STORE} needs to be an absolute path. +[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md) about the Inference (8080), Management (8081) and Metrics (8082) APIs. + +### 4. Test deployment + +```shell +# Assume you are under the directory `mmaction2` +curl http://127.0.0.1:8080/predictions/${MODEL_NAME} -T demo/demo.mp4 +``` + +You should obtain a response similar to: + +```json +{ + "arm wrestling": 1.0, + "rock scissors paper": 4.962051880497143e-10, + "shaking hands": 3.9761663406245873e-10, + "massaging feet": 1.1924419784925533e-10, + "stretching leg": 1.0601879096849842e-10 +} +``` + +## Miscellaneous + +### Evaluating a metric + +`tools/analysis/eval_metric.py` evaluates certain metrics of the results saved in a file according to a config file. + +The saved result file is created by `tools/test.py` when the argument `--out ${RESULT_FILE}` is set to indicate the result file, +which stores the final output of the whole model. + +```shell +python tools/analysis/eval_metric.py ${CONFIG_FILE} ${RESULT_FILE} [--eval ${EVAL_METRICS}] [--cfg-options ${CFG_OPTIONS}] [--eval-options ${EVAL_OPTIONS}] +``` + +### Print the entire config + +`tools/analysis/print_config.py` prints the whole config verbatim, expanding all its imports. + +```shell +python tools/analysis/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}] +``` + +### Check videos + +`tools/analysis/check_videos.py` uses the specified video decoder to iterate over all samples specified by the input configuration file, looks for invalid videos (corrupted or missing), and saves the corresponding file paths to the output file.
Please note that after deleting invalid videos, users need to regenerate the video file list. + +```shell +python tools/analysis/check_videos.py ${CONFIG} [-h] [--options OPTIONS [OPTIONS ...]] [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] [--output-file OUTPUT_FILE] [--split SPLIT] [--decoder DECODER] [--num-processes NUM_PROCESSES] [--remove-corrupted-videos] +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/Makefile b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..d4bb2cbb9eddb1bb1b4f366623044af8e4830919 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
+%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/README.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/README.md new file mode 100644 index 0000000000000000000000000000000000000000..94dfbcae744cd474081282232ef480ed587068ab --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/README.md @@ -0,0 +1 @@ +../README_zh-CN.md diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/api.rst b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/api.rst new file mode 100644 index 0000000000000000000000000000000000000000..ecc9b810e909ba1c4a904c517474b79451921b3e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/api.rst @@ -0,0 +1,101 @@ +mmaction.apis +------------- +.. automodule:: mmaction.apis + :members: + +mmaction.core +------------- + +optimizer +^^^^^^^^^ +.. automodule:: mmaction.core.optimizer + :members: + +evaluation +^^^^^^^^^^ +.. automodule:: mmaction.core.evaluation + :members: + +scheduler +^^^^^^^^^ +.. automodule:: mmaction.core.scheduler + :members: + +mmaction.localization +--------------------- + +localization +^^^^^^^^^^^^ +.. automodule:: mmaction.localization + :members: + +mmaction.models +--------------- + +models +^^^^^^ +.. automodule:: mmaction.models + :members: + +recognizers +^^^^^^^^^^^ +.. automodule:: mmaction.models.recognizers + :members: + +localizers +^^^^^^^^^^ +.. automodule:: mmaction.models.localizers + :members: + +common +^^^^^^ +.. automodule:: mmaction.models.common + :members: + +backbones +^^^^^^^^^ +.. automodule:: mmaction.models.backbones + :members: + +heads +^^^^^ +.. automodule:: mmaction.models.heads + :members: + +necks +^^^^^ +.. automodule:: mmaction.models.necks + :members: + +losses +^^^^^^ +.. automodule:: mmaction.models.losses + :members: + +mmaction.datasets +----------------- + +datasets +^^^^^^^^ +.. automodule:: mmaction.datasets + :members: + +pipelines +^^^^^^^^^ +.. 
automodule:: mmaction.datasets.pipelines + :members: + +samplers +^^^^^^^^ +.. automodule:: mmaction.datasets.samplers + :members: + +mmaction.utils +-------------- +.. automodule:: mmaction.utils + :members: + +mmaction.localization +--------------------- +.. automodule:: mmaction.localization + :members: diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/benchmark.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/benchmark.md new file mode 100644 index 0000000000000000000000000000000000000000..a737f9489b2d2243779159cd422b6083fc55076b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/benchmark.md @@ -0,0 +1,157 @@ +# 基准测试 + +这里将 MMAction2 与其他流行的代码框架和官方开源代码的速度性能进行对比 + +## 配置 + +### 硬件环境 + +- 8 NVIDIA Tesla V100 (32G) GPUs +- Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz + +### 软件环境 + +- Python 3.7 +- PyTorch 1.4 +- CUDA 10.1 +- CUDNN 7.6.03 +- NCCL 2.4.08 + +### 评测指标 + +这里测量的时间是一轮训练迭代的平均时间,包括数据处理和模型训练。 +训练速度以 s/iter 为单位,其值越低越好。注意,这里跳过了前 50 个迭代时间,因为它们可能包含设备的预热时间。 + +### 比较规则 + +这里以一轮训练迭代时间为基准,使用了相同的数据和模型设置对 MMAction2 和其他的视频理解工具箱进行比较。参与评测的其他代码库包括 + +- MMAction: commit id [7f3490d](https://github.com/open-mmlab/mmaction/tree/7f3490d3db6a67fe7b87bfef238b757403b670e3)(1/5/2020) +- Temporal-Shift-Module: commit id [8d53d6f](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd)(5/5/2020) +- PySlowFast: commit id [8299c98](https://github.com/facebookresearch/SlowFast/tree/8299c9862f83a067fa7114ce98120ae1568a83ec)(7/7/2020) +- BSN(boundary sensitive network): commit id [f13707f](https://github.com/wzmsltw/BSN-boundary-sensitive-network/tree/f13707fbc362486e93178c39f9c4d398afe2cb2f)(12/12/2018) +- BMN(boundary matching network): commit id [45d0514](https://github.com/JJBOY/BMN-Boundary-Matching-Network/tree/45d05146822b85ca672b65f3d030509583d0135a)(17/10/2019) + +为了公平比较,这里基于相同的硬件环境和数据进行对比实验。 +使用的视频帧数据集是通过 [数据准备工具](/tools/data/kinetics/README.md) 生成的, +使用的视频数据集是通过 [该脚本](/tools/data/resize_videos.py) 生成的,以快速解码为特点的,"短边 
256,密集关键帧编码“的视频数据集。 +正如以下表格所示,在对比正常的短边 256 视频时,可以观察到速度上的显著提升,尤其是在采样特别稀疏的情况下,如 [TSN](/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py)。 + +## 主要结果 + +### 行为识别器 + +| 模型 | 输入 | IO 后端 | 批大小 x GPU 数量 | MMAction2 (s/iter) | GPU 显存占用 (GB) | MMAction (s/iter) | GPU 显存占用 (GB) | Temporal-Shift-Module (s/iter) | GPU 显存占用 (GB) | PySlowFast (s/iter) | GPU 显存占用 (GB) | +| :------------------------------------------------------------------------------------------ | :----------------------: | :-------: | :---------------: | :-------------------------------------------------------------------------------------------------------------------------: | :---------------: | :------------------------------------------------------------------------------------------------------------------: | :---------------: | :-------------------------------------------------------------------------------------------------------------------------------: | :---------------: | :--------------------------------------------------------------------------------------------------------------------: | :---------------: | +| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p rawframes | Memcached | 32x8 | **[0.32](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_rawframes_memcahed_32x8.zip)** | 8.1 | [0.38](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction/tsn_256p_rawframes_memcached_32x8.zip) | 8.1 | [0.42](https://download.openmmlab.com/mmaction/benchmark/recognition/temporal_shift_module/tsn_256p_rawframes_memcached_32x8.zip) | 10.5 | x | x | +| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p videos | Disk | 32x8 | **[1.42](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO | +| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 32x8 | 
**[0.61](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_fast_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO | +| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_video.log) | 4.6 | +| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.35](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_fast_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.36](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_fast_video.log) | 4.6 | +| [I3D](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.43](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_256p_rawframes_memcahed_8x8.zip)** | 5.0 | [0.56](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction/i3d_256p_rawframes_memcached_8x8.zip) | 5.0 | x | x | x | x | +| [TSM](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.31](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsm_256p_rawframes_memcahed_8x8.zip)** | 6.9 | x | x | [0.41](https://download.openmmlab.com/mmaction/benchmark/recognition/temporal_shift_module/tsm_256p_rawframes_memcached_8x8.zip) | 9.1 | x | x | +| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.32](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | 
[0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_video.log) | 3.4 | +| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.25](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_fast_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.28](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_fast_video.log) | 3.4 | +| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.69](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [1.04](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_video.log) | 7.0 | +| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.68](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_fast_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [0.96](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_fast_video.log) | 7.0 | +| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.45](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x | +| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_fast_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x | + +### 时序动作检测器 + +| Model | MMAction2 (s/iter) | BSN(boundary sensitive network) (s/iter) | 
BMN(boundary matching network) (s/iter) | +| :------------------------------------------------------------------------------------------------------------------ | :-----------------------: | :--------------------------------------: | :-------------------------------------: | +| BSN ([TEM + PEM + PGM](/configs/localization/bsn)) | **0.074(TEM)+0.040(PEM)** | 0.101(TEM)+0.040(PEM) | x | +| BMN ([bmn_400x100_2x8_9e_activitynet_feature](/configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py)) | **3.27** | x | 3.30 | + +## 比较细节 + +### TSN + +- **MMAction2** + +```shell +# 处理视频帧 +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_tsn configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_tsn_rawframes + +# 处理视频 +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_tsn configs/recognition/tsn/tsn_r50_video_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_tsn_video +``` + +- **MMAction** + +```shell +python -u tools/train_recognizer.py configs/TSN/tsn_kinetics400_2d_rgb_r50_seg3_f1s1.py +``` + +- **Temporal-Shift-Module** + +```shell +python main.py kinetics RGB --arch resnet50 --num_segments 3 --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 1 --batch-size 256 -j 32 --dropout 0.5 --consensus_type=avg --eval-freq=10 --npb --print-freq 1 +``` + +### I3D + +- **MMAction2** + +```shell +# 处理视频帧 +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_i3d configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_i3d_rawframes + +# 处理视频 +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_i3d configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_i3d_video +``` + +- **MMAction** + +```shell +python -u tools/train_recognizer.py configs/I3D_RGB/i3d_kinetics400_3d_rgb_r50_c3d_inflate3x1x1_seg1_f32s2.py +``` + +- **PySlowFast** + +```shell +python tools/run_net.py --cfg configs/Kinetics/I3D_8x8_R50.yaml DATA.PATH_TO_DATA_DIR 
${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_i3d_r50_8x8_video.log +``` + +可以通过编写一个简单的脚本对日志文件的 'time_diff' 域进行解析,以复现对应的结果。 + +### SlowFast + +- **MMAction2** + +```shell +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_slowfast configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py --work-dir work_dirs/benchmark_slowfast_video +``` + +- **PySlowFast** + +```shell +python tools/run_net.py --cfg configs/Kinetics/SLOWFAST_4x16_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_slowfast_r50_4x16_video.log +``` + +可以通过编写一个简单的脚本对日志文件的 'time_diff' 域进行解析,以复现对应的结果。 + +### SlowOnly + +- **MMAction2** + +```shell +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_slowonly configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py --work-dir work_dirs/benchmark_slowonly_video +``` + +- **PySlowFast** + +```shell +python tools/run_net.py --cfg configs/Kinetics/SLOW_4x16_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_slowonly_r50_4x16_video.log +``` + +可以通过编写一个简单的脚本对日志文件的 'time_diff' 域进行解析,以复现对应的结果。 + +### R2plus1D + +- **MMAction2** + +```shell +bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_r2plus1d configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py --work-dir work_dirs/benchmark_r2plus1d_video +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/conf.py b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/conf.py new file mode 100644 index 0000000000000000000000000000000000000000..9ee1b8262bcbce88eb27d1cbd6712775b1522c93 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/conf.py @@ -0,0 +1,132 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# Configuration file for the Sphinx documentation builder. 
+# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os +import subprocess +import sys + +import pytorch_sphinx_theme + +sys.path.insert(0, os.path.abspath('..')) + +# -- Project information ----------------------------------------------------- + +project = 'MMAction2' +copyright = '2020, OpenMMLab' +author = 'MMAction2 Authors' +version_file = '../mmaction/version.py' + + +def get_version(): + with open(version_file, 'r') as f: + exec(compile(f.read(), version_file, 'exec')) + return locals()['__version__'] + + +# The full version, including alpha/beta/rc tags +release = get_version() + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'sphinx.ext.viewcode', + 'sphinx_markdown_tables', 'sphinx_copybutton', 'myst_parser' +] + +# numpy and torch are required +autodoc_mock_imports = ['mmaction.version', 'PIL'] + +copybutton_prompt_text = r'>>> |\.\.\. ' +copybutton_prompt_is_regexp = True + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. 
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + +# -- Options for HTML output ------------------------------------------------- +source_suffix = {'.rst': 'restructuredtext', '.md': 'markdown'} + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = 'pytorch_sphinx_theme' +html_theme_path = [pytorch_sphinx_theme.get_html_theme_path()] +html_theme_options = { + # 'logo_url': 'https://mmocr.readthedocs.io/en/latest/', + 'menu': [ + { + 'name': + '教程', + 'url': + 'https://colab.research.google.com/github/' + 'open-mmlab/mmaction2/blob/master/demo/' + 'mmaction2_tutorial_zh-CN.ipynb' + }, + { + 'name': 'GitHub', + 'url': 'https://github.com/open-mmlab/mmaction2' + }, + { + 'name': + '上游代码库', + 'children': [ + { + 'name': 'MMCV', + 'url': 'https://github.com/open-mmlab/mmcv', + 'description': '计算机视觉基础库' + }, + { + 'name': 'MMClassification', + 'url': 'https://github.com/open-mmlab/mmclassification', + 'description': '图像分类代码库' + }, + { + 'name': 'MMDetection', + 'url': 'https://github.com/open-mmlab/mmdetection', + 'description': '物体检测代码库' + }, + ] + }, + ], + # Specify the language of shared menu + 'menu_lang': + 'cn' +} + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". 
+html_static_path = ['_static'] +html_css_files = ['css/readthedocs.css'] + +myst_enable_extensions = ['colon_fence'] +myst_heading_anchors = 3 + +language = 'zh_CN' +master_doc = 'index' + + +def builder_inited_handler(app): + subprocess.run(['./merge_docs.sh']) + subprocess.run(['./stat.py']) + + +def setup(app): + app.connect('builder-inited', builder_inited_handler) diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/data_preparation.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/data_preparation.md new file mode 100644 index 0000000000000000000000000000000000000000..bd43422bba9dc9247c0a9f70a7a8f8e5b13dc8f5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/data_preparation.md @@ -0,0 +1,155 @@ +# 准备数据 + +本文为 MMAction2 的数据准备提供一些指南。 + + + +- [视频格式数据的一些注意事项](#%E8%A7%86%E9%A2%91%E6%A0%BC%E5%BC%8F%E6%95%B0%E6%8D%AE%E7%9A%84%E4%B8%80%E4%BA%9B%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A1%B9) +- [获取数据](#%E8%8E%B7%E5%8F%96%E6%95%B0%E6%8D%AE) + - [准备视频](#%E5%87%86%E5%A4%87%E8%A7%86%E9%A2%91) + - [提取帧](#%E6%8F%90%E5%8F%96%E5%B8%A7) + - [denseflow 的替代项](#denseflow-%E7%9A%84%E6%9B%BF%E4%BB%A3%E9%A1%B9) + - [生成文件列表](#%E7%94%9F%E6%88%90%E6%96%87%E4%BB%B6%E5%88%97%E8%A1%A8) + - [准备音频](#%E5%87%86%E5%A4%87%E9%9F%B3%E9%A2%91) + + + +## 视频格式数据的一些注意事项 + +MMAction2 支持两种数据类型:原始帧和视频。前者在过去的项目中经常出现,如 TSN。 +如果能把原始帧存储在固态硬盘上,处理帧格式的数据是非常快的,但对于大规模的数据集来说,原始帧需要占据大量的磁盘空间。 +(举例来说,最新版本的 [Kinetics](https://deepmind.com/research/open-source/open-source-datasets/kinetics/) 有 650K 个视频,其所有原始帧需要占据几个 TB 的磁盘空间。) +视频格式的数据能够节省很多空间,但在运行模型时,必须进行视频解码,算力开销很大。 +为了加速视频解码,MMAction2 支持了若干种高效的视频加载库,如 [decord](https://github.com/zhreshold/decord), [PyAV](https://github.com/PyAV-Org/PyAV) 等。 + +## 获取数据 + +本文介绍如何构建自定义数据集。 +与上述数据集相似,推荐用户把数据放在 `$MMACTION2/data/$DATASET` 中。 + +### 准备视频 + +请参照官网或官方脚本准备视频。 +注意,应该按照下面两种方法之一来组织视频数据文件夹结构: + +(1) 形如 `${CLASS_NAME}/${VIDEO_ID}` 的两级文件目录结构,这种结构推荐在动作识别数据集中使用(如 UCF101 和 Kinetics) + +(2) 单级文件目录结构,这种结构推荐在动作检测数据集或者多标签数据集中使用(如 THUMOS14) + +### 提取帧 + +若想同时提取帧和光流,可以使用 OpenMMLab 准备的 
[denseflow](https://github.com/open-mmlab/denseflow) 工具。 +因为不同的帧提取工具可能产生不同数量的帧,建议使用同一工具来提取 RGB 帧和光流,以避免它们的数量不同。 + +```shell +python build_rawframes.py ${SRC_FOLDER} ${OUT_FOLDER} [--task ${TASK}] [--level ${LEVEL}] \ + [--num-worker ${NUM_WORKER}] [--flow-type ${FLOW_TYPE}] [--out-format ${OUT_FORMAT}] \ + [--ext ${EXT}] [--new-width ${NEW_WIDTH}] [--new-height ${NEW_HEIGHT}] [--new-short ${NEW_SHORT}] \ + [--resume] [--use-opencv] [--mixed-ext] +``` + +- `SRC_FOLDER`: 视频源文件夹 +- `OUT_FOLDER`: 存储提取出的帧和光流的根文件夹 +- `TASK`: 提取任务,说明提取帧,光流,还是都提取,选项为 `rgb`, `flow`, `both` +- `LEVEL`: 目录层级。1 指单级文件目录,2 指两级文件目录 +- `NUM_WORKER`: 提取原始帧的线程数 +- `FLOW_TYPE`: 提取的光流类型,如 `None`, `tvl1`, `warp_tvl1`, `farn`, `brox` +- `OUT_FORMAT`: 提取帧的输出文件类型,如 `jpg`, `h5`, `png` +- `EXT`: 视频文件后缀名,如 `avi`, `mp4` +- `NEW_WIDTH`: 调整尺寸后,输出图像的宽 +- `NEW_HEIGHT`: 调整尺寸后,输出图像的高 +- `NEW_SHORT`: 等比例缩放图片后,输出图像的短边长 +- `--resume`: 是否接续之前的光流提取任务,还是覆盖之前的输出结果重新提取 +- `--use-opencv`: 是否使用 OpenCV 提取 RGB 帧 +- `--mixed-ext`: 说明是否处理不同文件类型的视频文件 + +根据实际经验,推荐设置为: + +1. 将 `$OUT_FOLDER` 设置为固态硬盘上的文件夹。 +2. 软连接 `$OUT_FOLDER` 到 `$MMACTION2/data/$DATASET/rawframes` +3. 
使用 `new-short` 而不是 `new-width` 和 `new-height` 来调整图像尺寸 + +```shell +ln -s ${YOUR_FOLDER} $MMACTION2/data/$DATASET/rawframes +``` + +#### denseflow 的替代项 + +如果用户因依赖要求(如 Nvidia 显卡驱动版本),无法安装 [denseflow](https://github.com/open-mmlab/denseflow), +或者只需要一些关于光流提取的快速演示,可用 Python 脚本 `tools/misc/flow_extraction.py` 替代 denseflow。 +这个脚本可用于从一个或多个视频中提取 RGB 帧和光流。注意,由于该脚本是在 CPU 上运行光流算法,其速度比 denseflow 慢很多。 + +```shell +python tools/misc/flow_extraction.py --input ${INPUT} [--prefix ${PREFIX}] [--dest ${DEST}] [--rgb-tmpl ${RGB_TMPL}] \ + [--flow-tmpl ${FLOW_TMPL}] [--start-idx ${START_IDX}] [--method ${METHOD}] [--bound ${BOUND}] [--save-rgb] +``` + +- `INPUT`: 用于提取帧的视频,可以是单个视频或一个视频列表,视频列表应该是一个 txt 文件,并且只包含视频文件名,不包含目录 +- `PREFIX`: 输入视频的前缀,当输入是一个视频列表时使用 +- `DEST`: 保存提取出的帧的位置 +- `RGB_TMPL`: RGB 帧的文件名格式 +- `FLOW_TMPL`: 光流的文件名格式 +- `START_IDX`: 提取帧的开始索引 +- `METHOD`: 用于生成光流的方法 +- `BOUND`: 光流的最大值 +- `SAVE_RGB`: 同时保存提取的 RGB 帧 + +### 生成文件列表 + +MMAction2 提供了便利的脚本用于生成文件列表。在完成视频下载(或更进一步完成视频抽帧)后,用户可以使用如下的脚本生成文件列表。 + +```shell +cd $MMACTION2 +python tools/data/build_file_list.py ${DATASET} ${SRC_FOLDER} [--rgb-prefix ${RGB_PREFIX}] \ + [--flow-x-prefix ${FLOW_X_PREFIX}] [--flow-y-prefix ${FLOW_Y_PREFIX}] [--num-split ${NUM_SPLIT}] \ + [--subset ${SUBSET}] [--level ${LEVEL}] [--format ${FORMAT}] [--out-root-path ${OUT_ROOT_PATH}] \ + [--seed ${SEED}] [--shuffle] +``` + +- `DATASET`: 所要准备的数据集,例如:`ucf101` , `kinetics400` , `thumos14` , `sthv1` , `sthv2` 等。 +- `SRC_FOLDER`: 存放对应格式的数据的目录: + - 如目录为 "$MMACTION2/data/$DATASET/rawframes",则需设置 `--format rawframes`。 + - 如目录为 "$MMACTION2/data/$DATASET/videos",则需设置 `--format videos`。 +- `RGB_PREFIX`: RGB 帧的文件前缀。 +- `FLOW_X_PREFIX`: 光流 x 分量帧的文件前缀。 +- `FLOW_Y_PREFIX`: 光流 y 分量帧的文件前缀。 +- `NUM_SPLIT`: 数据集总共的划分个数。 +- `SUBSET`: 需要生成文件列表的子集名称。可选项为 `train`, `val`, `test`。 +- `LEVEL`: 目录级别数量,1 表示一级目录(数据集中所有视频或帧文件夹位于同一目录), 2 表示二级目录(数据集中所有视频或帧文件夹按类别存放于各子目录)。 +- `FORMAT`: 需要生成文件列表的源数据格式。可选项为 `rawframes`, `videos`。 +- `OUT_ROOT_PATH`: 生成文件的根目录。 +- `SEED`: 随机种子。 +- 
`--shuffle`: 是否打乱生成的文件列表。 + +至此为止,用户可参考 [基础教程](getting_started.md) 来进行模型的训练及测试。 + +### 准备音频 + +MMAction2 还提供如下脚本来提取音频的波形并生成梅尔频谱。 + +```shell +cd $MMACTION2 +python tools/data/extract_audio.py ${ROOT} ${DST_ROOT} [--ext ${EXT}] [--num-workers ${N_WORKERS}] \ + [--level ${LEVEL}] +``` + +- `ROOT`: 视频的根目录。 +- `DST_ROOT`: 存放生成音频的根目录。 +- `EXT`: 视频的后缀名,如 `mp4`。 +- `N_WORKERS`: 使用的进程数量。 + +成功提取出音频后,用户可参照 [配置文件](/configs/recognition_audio/resnet/tsn_r50_64x1x1_100e_kinetics400_audio.py) 在线解码并生成梅尔频谱。如果音频文件的目录结构与帧文件夹一致,用户可以直接使用帧数据所用的标注文件作为音频数据的标注文件。在线解码的缺陷在于速度较慢,因此,MMAction2 也提供如下脚本用于离线地生成梅尔频谱。 + +```shell +cd $MMACTION2 +python tools/data/build_audio_features.py ${AUDIO_HOME_PATH} ${SPECTROGRAM_SAVE_PATH} [--level ${LEVEL}] \ + [--ext $EXT] [--num-workers $N_WORKERS] [--part $PART] +``` + +- `AUDIO_HOME_PATH`: 音频文件的根目录。 +- `SPECTROGRAM_SAVE_PATH`: 存放生成音频特征的根目录。 +- `EXT`: 音频的后缀名,如 `m4a`。 +- `N_WORKERS`: 使用的进程数量。 +- `PART`: 将完整的解码任务分为几部分并执行其中一份。如 `2/5` 表示将所有待解码数据分成 5 份,并对其中的第 2 份进行解码。这一选项在用户有多台机器时发挥作用。 + +梅尔频谱特征所对应的标注文件与帧文件夹一致,用户可以直接复制 `dataset_[train/val]_list_rawframes.txt` 并将其重命名为 `dataset_[train/val]_list_audio_feature.txt`。 diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/demo.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/demo.md new file mode 100644 index 0000000000000000000000000000000000000000..c5d12538bf4935f1cba889f823ec2d026c5efdb1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/demo.md @@ -0,0 +1,630 @@ +# Demo 示例 + +## 目录 + +- [Demo 示例](#demo-%E7%A4%BA%E4%BE%8B) + - [目录](#%E7%9B%AE%E5%BD%95) + - [预测视频的动作标签](#%E9%A2%84%E6%B5%8B%E8%A7%86%E9%A2%91%E7%9A%84%E5%8A%A8%E4%BD%9C%E6%A0%87%E7%AD%BE) + - [预测视频的时空检测结果](#%E9%A2%84%E6%B5%8B%E8%A7%86%E9%A2%91%E7%9A%84%E6%97%B6%E7%A9%BA%E6%A3%80%E6%B5%8B%E7%BB%93%E6%9E%9C) + - [可视化输入视频的 GradCAM](#%E5%8F%AF%E8%A7%86%E5%8C%96%E8%BE%93%E5%85%A5%E8%A7%86%E9%A2%91%E7%9A%84-gradcam) + - 
[使用网络摄像头的实时动作识别](#%E4%BD%BF%E7%94%A8%E7%BD%91%E7%BB%9C%E6%91%84%E5%83%8F%E5%A4%B4%E7%9A%84%E5%AE%9E%E6%97%B6%E5%8A%A8%E4%BD%9C%E8%AF%86%E5%88%AB) + - [滑动窗口预测长视频中不同动作类别](#%E6%BB%91%E5%8A%A8%E7%AA%97%E5%8F%A3%E9%A2%84%E6%B5%8B%E9%95%BF%E8%A7%86%E9%A2%91%E4%B8%AD%E4%B8%8D%E5%90%8C%E5%8A%A8%E4%BD%9C%E7%B1%BB%E5%88%AB) + - [基于网络摄像头的实时时空动作检测](#%E5%9F%BA%E4%BA%8E%E7%BD%91%E7%BB%9C%E6%91%84%E5%83%8F%E5%A4%B4%E7%9A%84%E5%AE%9E%E6%97%B6%E6%97%B6%E7%A9%BA%E5%8A%A8%E4%BD%9C%E6%A3%80%E6%B5%8B) + - [基于人体姿态预测动作标签](#%E5%9F%BA%E4%BA%8E%E4%BA%BA%E4%BD%93%E5%A7%BF%E6%80%81%E9%A2%84%E6%B5%8B%E5%8A%A8%E4%BD%9C%E6%A0%87%E7%AD%BE) + - [视频结构化预测](#%E8%A7%86%E9%A2%91%E7%BB%93%E6%9E%84%E5%8C%96%E9%A2%84%E6%B5%8B) + - [基于音频的动作识别](#%E5%9F%BA%E4%BA%8E%E9%9F%B3%E9%A2%91%E7%9A%84%E5%8A%A8%E4%BD%9C%E8%AF%86%E5%88%AB) + +## 预测视频的动作标签 + +MMAction2 提供如下脚本以预测视频的动作标签。为得到 \[0, 1\] 间的动作分值,请确保在配置文件中设定 `model['test_cfg'] = dict(average_clips='prob')`。 + +```shell +python demo/demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} [--use-frames] \ + [--device ${DEVICE_TYPE}] [--fps ${FPS}] [--font-scale ${FONT_SCALE}] [--font-color ${FONT_COLOR}] \ + [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}] +``` + +可选参数: + +- `--use-frames`: 如指定,代表使用帧目录作为输入;否则代表使用视频作为输入。 +- `DEVICE_TYPE`: 指定脚本运行设备,支持 cuda 设备(如 `cuda:0`)或 cpu(`cpu`)。默认为 `cuda:0`。 +- `FPS`: 使用帧目录作为输入时,代表输入的帧率。默认为 30。 +- `FONT_SCALE`: 输出视频上的字体缩放比例。默认为 0.5。 +- `FONT_COLOR`: 输出视频上的字体颜色,默认为白色( `white`)。 +- `TARGET_RESOLUTION`: 输出视频的分辨率,如未指定,使用输入视频的分辨率。 +- `RESIZE_ALGORITHM`: 缩放视频时使用的插值方法,默认为 `bicubic`。 +- `OUT_FILE`: 输出视频的路径,如未指定,则不会生成输出视频。 + +示例: + +以下示例假设用户的当前目录为 `$MMACTION2`,并已经将所需的模型权重文件下载至目录 `checkpoints/` 下,用户也可以使用所提供的 URL 来直接加载模型权重,文件将会被默认下载至 `$HOME/.cache/torch/checkpoints`。 + +1. 
在 cuda 设备上,使用 TSN 模型进行视频识别: + + ```shell + # demo.mp4 及 label_map_k400.txt 均来自 Kinetics-400 数据集 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt + ``` + +2. 在 cuda 设备上,使用 TSN 模型进行视频识别,并利用 URL 加载模型权重文件: + + ```shell + # demo.mp4 及 label_map_k400.txt 均来自 Kinetics-400 数据集 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt + ``` + +3. 在 CPU 上,使用 TSN 模型进行视频识别,输入为视频抽好的帧: + + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --device cpu + ``` + +4. 使用 TSN 模型进行视频识别,输出 MP4 格式的识别结果: + + ```shell + # demo.mp4 及 label_map_k400.txt 均来自 Kinetics-400 数据集 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --out-filename demo/demo_out.mp4 + ``` + +5. 使用 TSN 模型进行视频识别,输入为视频抽好的帧,将识别结果存为 GIF 格式: + + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --out-filename demo/demo_out.gif + ``` + +6. 
使用 TSN 模型进行视频识别,输出 MP4 格式的识别结果,并指定输出视频分辨率及缩放视频时使用的插值方法: + + ```shell + # demo.mp4 及 label_map_k400.txt 均来自 Kinetics-400 数据集 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --target-resolution 340 256 --resize-algorithm bilinear \ + --out-filename demo/demo_out.mp4 + ``` + + ```shell + # demo.mp4 及 label_map_k400.txt 均来自 Kinetics-400 数据集 + # 若 TARGET_RESOLUTION 的任一维度被设置为 -1,视频帧缩放时将保持长宽比 + # 如设定 --target-resolution 为 170 -1,原先长宽为 (340, 256) 的视频帧将被缩放至 (170, 128) + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --target-resolution 170 -1 --resize-algorithm bilinear \ + --out-filename demo/demo_out.mp4 + ``` + +7. 使用 TSN 模型进行视频识别,输出 MP4 格式的识别结果,指定输出视频中使用红色文字,字体缩放比例为 1: + + ```shell + # demo.mp4 及 label_map_k400.txt 均来自 Kinetics-400 数据集 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --font-scale 1 --font-color red \ + --out-filename demo/demo_out.mp4 + ``` + +8. 
使用 TSN 模型进行视频识别,输入为视频抽好的帧,将识别结果存为 MP4 格式,帧率设置为 24fps: + + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --fps 24 --out-filename demo/demo_out.mp4 + ``` + +## 预测视频的时空检测结果 + +MMAction2 提供如下脚本以预测视频的时空检测结果。 + +```shell +python demo/demo_spatiotemporal_det.py --video ${VIDEO_FILE} \ + [--config ${SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \ + [--checkpoint ${SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT}] \ + [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \ + [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \ + [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \ + [--action-score-thr ${ACTION_DETECTION_SCORE_THRESHOLD}] \ + [--label-map ${LABEL_MAP}] \ + [--device ${DEVICE}] \ + [--out-filename ${OUTPUT_FILENAME}] \ + [--predict-stepsize ${PREDICT_STEPSIZE}] \ + [--output-stepsize ${OUTPUT_STEPSIZE}] \ + [--output-fps ${OUTPUT_FPS}] +``` + +可选参数: + +- `SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE`: 时空检测配置文件路径。 +- `SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT`: 时空检测模型权重文件路径。 +- `HUMAN_DETECTION_CONFIG_FILE`: 人体检测配置文件路径。 +- `HUMAN_DETECTION_CHECKPOINT`: 人体检测模型权重文件路径。 +- `HUMAN_DETECTION_SCORE_THRESHOLD`: 人体检测分数阈值,默认为 0.9。 +- `ACTION_DETECTION_SCORE_THRESHOLD`: 动作检测分数阈值,默认为 0.5。 +- `LABEL_MAP`: 所使用的标签映射文件,默认为 `tools/data/ava/label_map.txt`。 +- `DEVICE`: 指定脚本运行设备,支持 cuda 设备(如 `cuda:0`)或 cpu(`cpu`)。默认为 `cuda:0`。 +- `OUTPUT_FILENAME`: 输出视频的路径,默认为 `demo/stdet_demo.mp4`。 +- `PREDICT_STEPSIZE`: 每 N 帧进行一次预测(以节约计算资源),默认值为 8。 +- `OUTPUT_STEPSIZE`: 对于输入视频的每 N 帧,输出 1 帧至输出视频中, 默认值为 4,注意需满足 `PREDICT_STEPSIZE % OUTPUT_STEPSIZE == 0`。 +- `OUTPUT_FPS`: 输出视频的帧率,默认值为 6。 + +示例: + +以下示例假设用户的当前目录为 `$MMACTION2`,并已经将所需的模型权重文件下载至目录 `checkpoints/` 下,用户也可以使用所提供的 URL 来直接加载模型权重,文件将会被默认下载至 `$HOME/.cache/torch/checkpoints`。 + +1. 
Use Faster RCNN as the human detector and SlowOnly-8x8-R101 as the action detector. Predict once every 8 frames, write 1 of every 4 input frames to the output video, and set the output frame rate to 6.
+
+```shell
+python demo/demo_spatiotemporal_det.py --video demo/demo.mp4 \
+    --config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
+    --checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
+    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
+    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
+    --det-score-thr 0.9 \
+    --action-score-thr 0.5 \
+    --label-map tools/data/ava/label_map.txt \
+    --predict-stepsize 8 \
+    --output-stepsize 4 \
+    --output-fps 6
+```
+
+## Visualizing GradCAM for an input video
+
+MMAction2 provides the following script to visualize GradCAM for an input video.
+
+```shell
+python demo/demo_gradcam.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} [--use-frames] \
+    [--device ${DEVICE_TYPE}] [--target-layer-name ${TARGET_LAYER_NAME}] [--fps ${FPS}] \
+    [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}]
+```
+
+Optional arguments:
+
+- `--use-frames`: If specified, a directory of frames is used as input; otherwise a video is used.
+- `DEVICE_TYPE`: Device to run on: a CUDA device (e.g. `cuda:0`) or `cpu`. Defaults to `cuda:0`.
+- `TARGET_LAYER_NAME`: Name of the network layer for which the GradCAM visualization is generated.
+- `FPS`: Frame rate of the input when a directory of frames is used. Defaults to 30.
+- `TARGET_RESOLUTION`: Resolution of the output video. If not specified, the resolution of the input video is used.
+- `RESIZE_ALGORITHM`: Interpolation method used when resizing the video. Defaults to `bilinear`.
+- `OUT_FILE`: Path of the output video. If not specified, no output video is generated.
+
+Examples:
+
+The examples below assume that the current directory is `$MMACTION2` and that the required checkpoints have been downloaded to `checkpoints/`. Checkpoints can also be loaded directly from the provided URLs, in which case they are downloaded to `$HOME/.cache/torch/checkpoints` by default.
+
+1. 
Visualize GradCAM for the I3D model, using a video as input and writing a GIF file at 10 fps:
+
+    ```shell
+    python demo/demo_gradcam.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
+        checkpoints/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth demo/demo.mp4 \
+        --target-layer-name backbone/layer4/1/relu --fps 10 \
+        --out-filename demo/demo_gradcam.gif
+    ```
+
+2. Visualize GradCAM for the TSM model, using a video as input and writing a GIF file. This example loads the checkpoint from a URL:
+
+    ```shell
+    python demo/demo_gradcam.py configs/recognition/tsm/tsm_r50_video_inference_1x1x8_100e_kinetics400_rgb.py \
+        https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_1x1x8_100e_kinetics400_rgb_20200702-a77f4328.pth \
+        demo/demo.mp4 --target-layer-name backbone/layer4/1/relu --out-filename demo/demo_gradcam_tsm.gif
+    ```
+
+## Real-time action recognition with a webcam
+
+MMAction2 provides the following script for real-time action recognition with a webcam. To get action scores in \[0, 1\], make sure the config file sets `model['test_cfg'] = dict(average_clips='prob')`.
+
+```shell
+python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${LABEL_FILE} \
+    [--device ${DEVICE_TYPE}] [--camera-id ${CAMERA_ID}] [--threshold ${THRESHOLD}] \
+    [--average-size ${AVERAGE_SIZE}] [--drawing-fps ${DRAWING_FPS}] [--inference-fps ${INFERENCE_FPS}]
+```
+
+Optional arguments:
+
+- `DEVICE_TYPE`: Device to run on: a CUDA device (e.g. `cuda:0`) or `cpu`. Defaults to `cuda:0`.
+- `CAMERA_ID`: ID of the camera device. Defaults to 0.
+- `THRESHOLD`: Score threshold for action recognition. Only actions scoring above the threshold are shown. Defaults to 0.
+- `AVERAGE_SIZE`: Use the average result of the latest N clips as the prediction. Defaults to 1.
+- `DRAWING_FPS`: Upper bound on the frame rate when visualizing results. Defaults to 20.
+- `INFERENCE_FPS`: Upper bound on the frame rate when running inference. Defaults to 4.
+
+**Note**: If your hardware is powerful enough, raise the drawing and inference frame rates for a smoother experience.
+
+Examples:
+
+The examples below assume that the current directory is `$MMACTION2` and that the required checkpoints have been downloaded to `checkpoints/`. Checkpoints can also be loaded directly from the provided URLs, in which case they are downloaded to `$HOME/.cache/torch/checkpoints` by default.
+
+1. 
Use the TSN model on CPU for real-time action recognition with a webcam, averaging the results of the latest 5 clips and showing actions that score above the threshold 0.2:
+
+```shell
+  python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
+    checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth tools/data/kinetics/label_map_k400.txt --average-size 5 \
+    --threshold 0.2 --device cpu
+```
+
+2. Use the TSN model on CPU for real-time action recognition with a webcam, averaging the results of the latest 5 clips and showing actions that score above the threshold 0.2. This example loads the checkpoint from a URL:
+
+```shell
+  python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
+    https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
+    tools/data/kinetics/label_map_k400.txt --average-size 5 --threshold 0.2 --device cpu
+```
+
+3. Use the I3D model on GPU for real-time action recognition with a webcam, averaging the results of the latest 5 clips and showing actions that score above the threshold 0.2:
+
+```shell
+  python demo/webcam_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
+    checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth tools/data/kinetics/label_map_k400.txt \
+    --average-size 5 --threshold 0.2
+```
+
+**Note:** Since inference hardware varies in performance, the following changes may give better results on your device:
+
+1). Change the `SampleFrames` step under `test_pipeline` in the config file (in particular `clip_len` and `num_clips`).
+2). Change the crop type under `test_pipeline` in the config file (options include `TenCrop`, `ThreeCrop`, `CenterCrop`).
+3). 
Lower `AVERAGE_SIZE` to speed up inference.
+
+## Predicting different action labels in a long video with a sliding window
+
+MMAction2 provides the following script to predict the different action labels in a long video. To get action scores in \[0, 1\], make sure the config file sets `model['test_cfg'] = dict(average_clips='prob')`.
+
+```shell
+python demo/long_video_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} \
+    ${OUT_FILE} [--input-step ${INPUT_STEP}] [--device ${DEVICE_TYPE}] [--threshold ${THRESHOLD}]
+```
+
+Optional arguments:
+
+- `OUT_FILE`: Path of the output video.
+- `INPUT_STEP`: Pick one frame out of every N frames of the video as input. Defaults to 1.
+- `DEVICE_TYPE`: Device to run on: a CUDA device (e.g. `cuda:0`) or `cpu`. Defaults to `cuda:0`.
+- `THRESHOLD`: Score threshold for action recognition. Only actions scoring above the threshold are shown. Defaults to 0.01.
+- `STRIDE`: By default the script makes a separate prediction for every frame, which is slow. Setting `STRIDE` speeds this up: the script then predicts once every `STRIDE x sample_length` frames (`sample_length` is the temporal window the model samples from, equal to `clip_len x frame_interval`). For example, with a `sample_length` of 64 frames and `STRIDE` set to 0.5, the model predicts once every 32 frames. With `STRIDE` set to 0, it predicts on every frame. `STRIDE` should ideally lie in (0, 1\]; values above 1 also run without error. Defaults to 0.
+
+Examples:
+
+The examples below assume that the current directory is `$MMACTION2` and that the required checkpoints have been downloaded to `checkpoints/`. Checkpoints can also be loaded directly from the provided URLs, in which case they are downloaded to `$HOME/.cache/torch/checkpoints` by default.
+
+1. Use the TSN model on CPU to predict the different action labels in a long video, with `INPUT_STEP` set to 3 (one frame is picked at random from every 3 frames), showing actions that score above 0.2:
+
+```shell
+  python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
+    checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO \
+    --input-step 3 --device cpu --threshold 0.2
+```
+
+2. Use the TSN model on CPU to predict the different action labels in a long video, with `INPUT_STEP` set to 3, showing actions that score above 0.2. This example loads the checkpoint from a URL:
+
+```shell
+  python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
+    https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
+    PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
+```
+
+3. 
Use the TSN model on CPU to predict the different action labels in a long online video (read from a URL), with `INPUT_STEP` set to 3, showing actions that score above 0.2. This example loads the checkpoint from a URL:
+
+```shell
+  python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
+    https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
+    https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4 \
+    tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
+```
+
+4. Use the I3D model on GPU to predict the different action labels in a long video, with `INPUT_STEP` set to 3 and a score threshold of 0.01:
+
+    ```shell
+    python demo/long_video_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
+        checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO \
+        --input-step 3 --threshold 0.01
+    ```
+
+## Real-time spatio-temporal action detection with a webcam
+
+MMAction2 provides the following script for real-time spatio-temporal action detection with a webcam.
+
+```shell
+python demo/webcam_demo_spatiotemporal_det.py \
+    [--config ${SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \
+    [--checkpoint ${SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT}] \
+    [--action-score-thr ${ACTION_DETECTION_SCORE_THRESHOLD}] \
+    [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \
+    [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \
+    [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \
+    [--input-video ${INPUT_VIDEO}] \
+    [--label-map ${LABEL_MAP}] \
+    [--device ${DEVICE}] \
+    [--output-fps ${OUTPUT_FPS}] \
+    [--out-filename ${OUTPUT_FILENAME}] \
+    [--show] \
+    [--display-height ${DISPLAY_HEIGHT}] \
+    [--display-width ${DISPLAY_WIDTH}] \
+    [--predict-stepsize ${PREDICT_STEPSIZE}] \
+    [--clip-vis-length ${CLIP_VIS_LENGTH}]
+```
+
+Optional arguments:
+
+- `SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE`: Path of the spatio-temporal detection config file.
+- `SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT`: Path of the spatio-temporal detection checkpoint.
+- `ACTION_DETECTION_SCORE_THRESHOLD`: Score threshold for action detection. Defaults to 0.4.
+- `HUMAN_DETECTION_CONFIG_FILE`: Path of the human detection config file.
+- 
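`HUMAN_DETECTION_CHECKPOINT`: Path of the human detection checkpoint.
+- `HUMAN_DETECTION_SCORE_THRESHOLD`: Score threshold for human detection. Defaults to 0.9.
+- `INPUT_VIDEO`: Webcam ID or path of a local video file. Defaults to `0`.
+- `LABEL_MAP`: Label map file to use. Defaults to `tools/data/ava/label_map.txt`.
+- `DEVICE`: Device to run on: a CUDA device (e.g. `cuda:0`) or `cpu`. Defaults to `cuda:0`.
+- `OUTPUT_FPS`: Frame rate of the output video. Defaults to 15.
+- `OUTPUT_FILENAME`: Path of the output video. Defaults to `None`.
+- `--show`: Whether to show the predictions with `cv2.imshow`.
+- `DISPLAY_HEIGHT`: Height of the displayed image. Defaults to 0.
+- `DISPLAY_WIDTH`: Width of the displayed image. Defaults to 0. If `DISPLAY_HEIGHT <= 0 and DISPLAY_WIDTH <= 0`, the displayed image has the same shape as the input video.
+- `PREDICT_STEPSIZE`: Predict once every N frames (to limit computation). Defaults to 8.
+- `CLIP_VIS_LENGTH`: Number of frames over which each prediction is visualized, i.e. each prediction is drawn on `CLIP_VIS_LENGTH` frames. Defaults to 8.
+
+Tips:
+
+- How should `--output-fps` be set?
+
+  - Set `--output-fps` to the frame rate of the video reading thread.
+  - That frame rate is printed in the log in the form `DEBUG:__main__:Read Thread: {duration} ms, {fps} fps`.
+
+- How should `--predict-stepsize` be set?
+
+  - The right value depends on the model chosen.
+  - The time to build the model input (video reading thread) should be greater than or equal to the model inference time (main thread).
+  - Both the input build time and the inference time are printed in the log.
+  - The larger `--predict-stepsize` is, the longer the input build time.
+  - Lowering `--predict-stepsize` increases the inference frequency and makes fuller use of the available compute.
+
+Examples:
+
+The examples below assume that the current directory is `$MMACTION2` and that the required checkpoints have been downloaded to `checkpoints/`. Checkpoints can also be loaded directly from the provided URLs, in which case they are downloaded to `$HOME/.cache/torch/checkpoints` by default.
+
+1. 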
Use Faster RCNN as the human detector and SlowOnly-8x8-R101 as the action detector. Predict once every 40 frames, set the output frame rate to 20, and show the predictions with `cv2.imshow`.
+
+```shell
+python demo/webcam_demo_spatiotemporal_det.py \
+    --input-video 0 \
+    --config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
+    --checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
+    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
+    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
+    --det-score-thr 0.9 \
+    --action-score-thr 0.5 \
+    --label-map tools/data/ava/label_map.txt \
+    --predict-stepsize 40 \
+    --output-fps 20 \
+    --show
+```
+
+## Predicting action labels from human poses
+
+MMAction2 provides the following script to predict action labels from human poses.
+
+```shell
+python demo/demo_skeleton.py ${VIDEO_FILE} ${OUT_FILENAME} \
+    [--config ${SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \
+    [--checkpoint ${SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT}] \
+    [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \
+    [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \
+    [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \
+    [--pose-config ${HUMAN_POSE_ESTIMATION_CONFIG_FILE}] \
+    [--pose-checkpoint ${HUMAN_POSE_ESTIMATION_CHECKPOINT}] \
+    [--label-map ${LABEL_MAP}] \
+    [--device ${DEVICE}] \
+    [--short-side ${SHORT_SIDE}]
+```
+
+Optional arguments:
+
+- `SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE`: Path of the skeleton-based action recognition config file.
+- `SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT`: Path of the skeleton-based action recognition checkpoint.
+- `HUMAN_DETECTION_CONFIG_FILE`: Path of the human detection config file.
+- `HUMAN_DETECTION_CHECKPOINT`: Path of the human detection checkpoint.
+- `HUMAN_DETECTION_SCORE_THRESHOLD`: Score threshold for human detection. Defaults to 0.9.
+- `HUMAN_POSE_ESTIMATION_CONFIG_FILE`: Path of the human pose estimation config file (the model must be trained on COCO-keypoint).
+- `HUMAN_POSE_ESTIMATION_CHECKPOINT`: Path of the human pose estimation checkpoint (the model must be trained on COCO-keypoint).
+- `LABEL_MAP`: Label map file to use. Defaults to `tools/data/skeleton/label_map_ntu120.txt`.
+- `DEVICE`: Device to run on: a CUDA device (e.g. `cuda:0`) or `cpu`. Defaults to `cuda:0`.
+- `SHORT_SIDE`: Length of the short side used when extracting frames from the video. Defaults to 480.
+
+Examples:
+
+The examples below assume that the current directory is `$MMACTION2`.
+
+1. Use Faster RCNN as the human detector, HRNetw32 as the pose estimator, and PoseC3D-NTURGB+D-120-Xsub-keypoint as the skeleton-based action recognizer.
+
+```shell
+python demo/demo_skeleton.py demo/ntu_sample.avi demo/skeleton_demo.mp4 \
+    --config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
+    --checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth \
+    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
+    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
+    --det-score-thr 0.9 \
+    --pose-config demo/hrnet_w32_coco_256x192.py \
+    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
+    --label-map tools/data/skeleton/label_map_ntu120.txt
+```
+
+2. 
Use Faster RCNN as the human detector, HRNetw32 as the pose estimator, and STGCN-NTURGB+D-60-Xsub-keypoint as the skeleton-based action recognizer.
+
+```shell
+python demo/demo_skeleton.py demo/ntu_sample.avi demo/skeleton_demo.mp4 \
+    --config configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
+    --checkpoint https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth \
+    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
+    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
+    --det-score-thr 0.9 \
+    --pose-config demo/hrnet_w32_coco_256x192.py \
+    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
+    --label-map tools/data/skeleton/label_map_ntu120.txt
+```
+
+## Video structuralization
+
+MMAction2 provides the following script for video structuralization based on human poses and RGB input.
+
+```shell
+python demo/demo_video_structuralize.py \
+    [--rgb-stdet-config ${RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \
+    [--rgb-stdet-checkpoint ${RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT}] \
+    [--skeleton-stdet-checkpoint ${SKELETON_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT}] \
+    [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \
+    [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \
+    [--pose-config ${HUMAN_POSE_ESTIMATION_CONFIG_FILE}] \
+    [--pose-checkpoint ${HUMAN_POSE_ESTIMATION_CHECKPOINT}] \
+    [--skeleton-config ${SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \
+    [--skeleton-checkpoint ${SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT}] \
+    [--rgb-config ${RGB_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \
+    [--rgb-checkpoint ${RGB_BASED_ACTION_RECOGNITION_CHECKPOINT}] \
+    [--use-skeleton-stdet ${USE_SKELETON_BASED_SPATIO_TEMPORAL_DETECTION_METHOD}] \
+    [--use-skeleton-recog ${USE_SKELETON_BASED_ACTION_RECOGNITION_METHOD}] \
+    [--det-score-thr ${HUMAN_DETECTION_SCORE_THRE}] \
+    [--action-score-thr 
${ACTION_DETECTION_SCORE_THRE}] \
+    [--video ${VIDEO_FILE}] \
+    [--label-map-stdet ${LABEL_MAP_FOR_SPATIO_TEMPORAL_ACTION_DETECTION}] \
+    [--label-map ${LABEL_MAP}] \
+    [--device ${DEVICE}] \
+    [--out-filename ${OUTPUT_FILENAME}] \
+    [--predict-stepsize ${PREDICT_STEPSIZE}] \
+    [--output-stepsize ${OUTPUT_STEPSIZE}] \
+    [--output-fps ${OUTPUT_FPS}] \
+    [--cfg-options]
+```
+
+Optional arguments:
+
+- `RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CONFIG_FILE`: Path of the RGB-based spatio-temporal detection config file.
+- `RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT`: Path of the RGB-based spatio-temporal detection checkpoint.
+- `SKELETON_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT`: Path of the skeleton-based spatio-temporal detection checkpoint.
+- `HUMAN_DETECTION_CONFIG_FILE`: Path of the human detection config file.
+- `HUMAN_DETECTION_CHECKPOINT`: URL of the human detection checkpoint.
+- `HUMAN_POSE_ESTIMATION_CONFIG_FILE`: Path of the human pose estimation config file (the model must be trained on COCO-keypoint).
+- `HUMAN_POSE_ESTIMATION_CHECKPOINT`: Path of the human pose estimation checkpoint (the model must be trained on COCO-keypoint).
+- `SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE`: Path of the skeleton-based action recognition config file.
+- `SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT`: Path of the skeleton-based action recognition checkpoint.
+- `RGB_BASED_ACTION_RECOGNITION_CONFIG_FILE`: Path of the RGB-based action recognition config file.
+- `RGB_BASED_ACTION_RECOGNITION_CHECKPOINT`: Path of the RGB-based action recognition checkpoint.
+- `USE_SKELETON_BASED_SPATIO_TEMPORAL_DETECTION_METHOD`: Use the skeleton-based spatio-temporal detection method.
+- `USE_SKELETON_BASED_ACTION_RECOGNITION_METHOD`: Use the skeleton-based action recognition method.
+- `HUMAN_DETECTION_SCORE_THRE`: Score threshold for human detection. Defaults to 0.9.
+- `ACTION_DETECTION_SCORE_THRE`: Score threshold for action detection. Defaults to 0.5.
+- `LABEL_MAP_FOR_SPATIO_TEMPORAL_ACTION_DETECTION`: Label map file used for spatio-temporal action detection. Defaults to `tools/data/ava/label_map.txt`.
+- `LABEL_MAP`: Label map file used for action recognition. Defaults to `tools/data/kinetics/label_map_k400.txt`.
+- `DEVICE`: Device to run on: a CUDA device (e.g. `cuda:0`) or `cpu`. Defaults to `cuda:0`.
+- `OUTPUT_FILENAME`: Path of the output video. Defaults to `demo/test_stdet_recognition_output.mp4`.
+- `PREDICT_STEPSIZE`: Predict once every N frames (to save computation). Defaults to 8.
+- `OUTPUT_STEPSIZE`: Write 1 of every N input frames to the output video. Defaults to 1. Note that `PREDICT_STEPSIZE % OUTPUT_STEPSIZE == 0` must hold.
+- `OUTPUT_FPS`: Frame rate of the output video. Defaults to 24.
+
+Examples:
+
+The examples below assume that the current directory is `$MMACTION2`.
+
+1. 
Use Faster RCNN as the human detector, HRNetw32 as the pose estimator, and PoseC3D as both the skeleton-based action recognizer and the spatio-temporal action detector. Predict once every 8 frames, write every input frame to the output video, and set the output frame rate to 24.
+
+```shell
+python demo/demo_video_structuralize.py \
+    --skeleton-stdet-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_ava.pth \
+    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
+    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
+    --pose-config demo/hrnet_w32_coco_256x192.py \
+    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
+    --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
+    --skeleton-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_k400.pth \
+    --use-skeleton-stdet \
+    --use-skeleton-recog \
+    --label-map-stdet tools/data/ava/label_map.txt \
+    --label-map tools/data/kinetics/label_map_k400.txt
+```
+
+2. 
Use Faster RCNN as the human detector, TSN-R50-1x1x3 as the action recognizer, and SlowOnly-8x8-R101 as the spatio-temporal action detector. Predict once every 8 frames, write every input frame to the output video, and set the output frame rate to 24.
+
+```shell
+python demo/demo_video_structuralize.py \
+    --rgb-stdet-config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
+    --rgb-stdet-checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
+    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
+    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
+    --rgb-config configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
+    --rgb-checkpoint https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
+    --label-map-stdet tools/data/ava/label_map.txt \
+    --label-map tools/data/kinetics/label_map_k400.txt
+```
+
+3. 
Use Faster RCNN as the human detector, HRNetw32 as the pose estimator, PoseC3D as the skeleton-based action recognizer, and SlowOnly-8x8-R101 as the spatio-temporal action detector. Predict once every 8 frames, write every input frame to the output video, and set the output frame rate to 24.
+
+```shell
+python demo/demo_video_structuralize.py \
+    --rgb-stdet-config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
+    --rgb-stdet-checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
+    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
+    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
+    --pose-config demo/hrnet_w32_coco_256x192.py \
+    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
+    --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
+    --skeleton-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_k400.pth \
+    --use-skeleton-recog \
+    --label-map-stdet tools/data/ava/label_map.txt \
+    --label-map tools/data/kinetics/label_map_k400.txt
+```
+
+4. 
Use Faster RCNN as the human detector, HRNetw32 as the pose estimator, TSN-R50-1x1x3 as the action recognizer, and PoseC3D as the skeleton-based spatio-temporal action detector. Predict once every 8 frames, write every input frame to the output video, and set the output frame rate to 24.
+
+```shell
+python demo/demo_video_structuralize.py \
+    --skeleton-stdet-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_ava.pth \
+    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
+    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
+    --pose-config demo/hrnet_w32_coco_256x192.py \
+    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
+    --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
+    --rgb-config configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
+    --rgb-checkpoint https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
+    --use-skeleton-stdet \
+    --label-map-stdet tools/data/ava/label_map.txt \
+    --label-map tools/data/kinetics/label_map_k400.txt
+```
+
+## Audio-based action recognition
+
+The following script performs action recognition based on audio features.
+
+The script `extract_audio.py` extracts audio from videos, and the script `build_audio_features.py` extracts audio features from audio files.
+
+```shell
+python demo/demo_audio.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${AUDIO_FILE} ${LABEL_FILE} [--device ${DEVICE}]
+```
+
+Optional arguments:
+
+- `DEVICE`: Device to run on: a CUDA device (e.g. `cuda:0`) or `cpu`. Defaults to `cuda:0`.
+
+Examples:
+
+The examples below assume that the current directory is `$MMACTION2`.
+
+1. 
Use the TSN model on GPU for action recognition based on audio features.
+
+    ```shell
+    python demo/demo_audio.py \
+        configs/recognition_audio/resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py \
+        https://download.openmmlab.com/mmaction/recognition/audio_recognition/tsn_r18_64x1x1_100e_kinetics400_audio_feature/tsn_r18_64x1x1_100e_kinetics400_audio_feature_20201012-bf34df6c.pth \
+        audio_feature.npy label_map_k400.txt
+    ```
diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/faq.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/faq.md
new file mode 100644
index 0000000000000000000000000000000000000000..4c46302ecd67b56fead0e08e40d909e762c05449
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/faq.md
@@ -0,0 +1,112 @@
+# FAQ
+
+This page lists common problems users have run into, along with their solutions.
+If you come across a problem that shows up often in the community and know how to solve it, feel free to add it here to help others.
+If your problem is not covered here, please open an issue using the provided [template](/.github/ISSUE_TEMPLATE/error-report.md), and make sure you fill in all the required information.
+
+## Installation
+
+- **"No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'"**
+
+    1. Run `pip uninstall mmcv` to remove the `mmcv` already installed in the environment.
+    2. Follow the [MMCV installation guide](https://mmcv.readthedocs.io/en/latest/#installation) to install `mmcv-full`.
+
+- **"OSError: MoviePy Error: creation of None failed because of the following error"**
+
+    See the [MMAction2 installation guide](https://github.com/open-mmlab/mmaction2/blob/master/docs_zh_CN/install.md#%E5%AE%89%E8%A3%85%E4%BE%9D%E8%B5%96%E5%8C%85).
+
+    1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) is no longer detected automatically by MoviePy.
+       Find the path of the ImageMagick binary named `magick` and set `IMAGEMAGICK_BINARY` in `moviepy/config_defaults.py` to it, e.g. `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"`
+    2. 
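For Linux users, if ImageMagick is not detected by MoviePy, comment out `<policy domain="path" rights="none" pattern="@*" />` in `/etc/ImageMagick-6/policy.xml`, i.e. change it to `<!-- <policy domain="path" rights="none" pattern="@*" /> -->`.
+
+- **"Please install XXCODEBASE to use XXX"**
+
+    The error message "Please install XXCODEBASE to use XXX" means that MMAction2 failed to import XXX from XXCODEBASE. Run the corresponding import statement yourself to find the cause.
+    One possible cause is that some OpenMMLab codebases must be installed only after mmcv-full has been installed.
+
+## Data
+
+- **FileNotFound errors such as `No such file or directory: xxx/xxx/img_00300.jpg`**
+
+    In MMAction2, `start_index` defaults to 1 for frame datasets and to 0 for video datasets.
+    If the FileNotFound error occurs at the first or last frame of a video, adjust `start_index` in the data pipeline of the config file according to the offset of the first frame (i.e. `xxx_00000.jpg` or `xxx_00001.jpg`).
+
+- **How should the size of input videos in a dataset be handled? Resize all videos to a fixed size such as "340x256", or resize them so that the short side has the same length (256 or 320 pixels)?**
+
+    According to the benchmarks, the latter (resizing so that the short side has the same length) generally works better, so "resize to a short side of 256 pixels" is the default preprocessing. See the [TSN data benchmark](https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/tsn) and the [SlowOnly data benchmark](https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/tsn) for the results.
+
+- **The input data format (video or frames) does not match the data pipeline and raises an exception such as `KeyError: 'total_frames'`**
+
+    There is a dedicated pipeline for videos and another for frames.
+
+    **Videos** must be decoded first in the pipeline. Available decoders include `DecordInit & DecordDecode`, `OpenCVInit & OpenCVDecode`, `PyAVInit & PyAVDecode`, and others. See [this example](https://github.com/open-mmlab/mmaction2/blob/023777cfd26bb175f85d78c455f6869673e0aa09/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py#L47-L49).
+
+    **Frames** have already been decoded locally, so `RawFrameDecode` is all that is needed. See [this example](https://github.com/open-mmlab/mmaction2/blob/023777cfd26bb175f85d78c455f6869673e0aa09/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py#L49).
+
+    `KeyError: 'total_frames'` is raised when `RawFrameDecode` is mistakenly used on videos. When the input is a video, `total_frames` cannot be known in advance.
+
+## Training
+
+- **How can a trained recognizer be used as the pretrained model for a backbone?**
+
+    See [Using pretrained models](https://github.com/open-mmlab/mmaction2/blob/master/docs_zh_CN/tutorials/2_finetune.md#%E4%BD%BF%E7%94%A8%E9%A2%84%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B).
+    To use a pretrained model for the whole network, set `load_from` in the config file to the link of the pretrained model.
+
+    To use a pretrained model for the backbone only, set `pretrained` in the `backbone` section of the config file to the path or link of the pretrained model.
+    During training, parameters of the pretrained model that have no counterpart in the backbone are ignored.
+
+- **How can the accuracy and loss curves of the training and validation sets be plotted in real time?**
+
+    Use `TensorboardLoggerHook` in `log_config`, for example:
+
+    ```python
+    log_config=dict(
+        interval=20,
+        hooks=[
+            dict(type='TensorboardLoggerHook')
+        ]
+    )
+    ```
+
+    For more details, see [Tutorial 1: Learn about Configs](tutorials/1_config.md), [Tutorial 7: Customize Runtime Settings](tutorials/7_customize_runtime.md#log-config), and [this example](https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py#L118).
+
+- **Error raised in batchnorm.py: Expected more than 1 value per channel when training**
+
+    BatchNorm layers require a batch size greater than 1. If `drop_last` is set to `False` when the dataset is built, the last batch of an epoch can end up with a batch size of 1, which raises this error during training. Set `drop_last=True` to avoid it, for example:
+
+    ```python
+    train_dataloader=dict(drop_last=True)
+    ```
+
+- **How can part of the backbone be frozen when fine-tuning a model?**
+
+    See [`def _freeze_stages()`](https://github.com/open-mmlab/mmaction2/blob/0149a0e8c1e0380955db61680c0006626fd008e9/mmaction/models/backbones/x3d.py#L458) and [`frozen_stages`](https://github.com/open-mmlab/mmaction2/blob/0149a0e8c1e0380955db61680c0006626fd008e9/mmaction/models/backbones/x3d.py#L183-L184). For distributed training and testing, `find_unused_parameters = True` must also be set.
+
+    In practice, apart from a few models such as C3D, parameters can be frozen by setting `frozen_stages`, because most backbones inherit from `ResNet` or `ResNet3D`, both of which support `_freeze_stages()`.
+
+- **How should `load_from` be set in the config file for fine-tuning?**
+
+    MMAction2 sets `load_from=None` by default in `configs/_base_/default_runtime.py`. Because config files are inheritable, the value can be overridden by setting `load_from` directly in the downstream config file.
+
+## Testing
+
+- **How can the prediction scores be normalized to \[0, 1\] with softmax?**
+
+    Set `model['test_cfg'] = dict(average_clips='prob')`.
+
+- **What if the model is so large that not even a single test sample fits in GPU memory?**
+
+    By default, 3D models are tested with `10 clips x 3 crops`: 10 clips are sampled and 3 crops are taken from each frame, for a total of 30 views.
+    For very large models, GPU memory may not hold even one video. In that case, set `max_testing_views=n` in `model['test_cfg']` in the config file.
+    With this setting, only n views are used per batch during inference, which saves GPU memory.
+
+- **How can the test results be saved?**
+
+    When testing, pass the option `--out xxx.json/pkl/yaml` on the command line to write a result file for later inspection. The results are written in the same order as the test set.
+    In addition, MMAction2 provides an analysis tool in [`tools/analysis/eval_metric.py`](/tools/analysis/eval_metric.py) for evaluating a model from a result file.
+
+## Deployment
+
+- **Why does an ONNX model converted by MMAction2 raise errors when converted to another framework such as TensorRT?**
+
+    For now, only compatibility between MMAction2 models and ONNX is guaranteed. Some ONNX operators may not be supported by other frameworks, as with TensorRT in [this issue](https://github.com/open-mmlab/mmaction2/issues/414). When this happens, if `pytorch2onnx.py` ran without problems and the converted ONNX model passed the numerical check, open an issue to ask the community for help.
diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/feature_extraction.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/feature_extraction.md
new file mode 100644
index 0000000000000000000000000000000000000000..62b76066bc8e2b37a9e4d6d3d7ef8a2388942e19
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/feature_extraction.md
@@ -0,0 +1,70 @@
+# Feature extraction
+
+MMAction2 provides easy-to-use scripts for feature extraction.
+
+## Clip-level feature extraction
+
+Clip-level feature extraction extracts deep features from trimmed clips that usually last from a few seconds to a few tens of seconds. The feature extracted from each clip is an n-dimensional vector. With multi-view extraction, for example n clips × m crops, the extracted feature is the average over the n\*m views.
+
+Before running clip-level feature extraction, prepare a video list containing every video to extract features from. For example, a video list for the videos in UCF101 looks like this:
+
+```
+ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi
+ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02.avi
+ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03.avi
+ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c04.avi
+ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c05.avi
+...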
+YoYo/v_YoYo_g25_c01.avi
+YoYo/v_YoYo_g25_c02.avi
+YoYo/v_YoYo_g25_c03.avi
+YoYo/v_YoYo_g25_c04.avi
+YoYo/v_YoYo_g25_c05.avi
+```
+
+Assuming the UCF101 videos are stored in `data/ucf101/videos` and the video list is named `ucf101.txt`, clip-level features can be extracted from UCF101 with TSN (pretrained on Kinetics-400) using the following script:
+
+```shell
+python tools/misc/clip_feature_extraction.py \
+configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \
+https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \
+--video-list ucf101.txt \
+--video-root data/ucf101/videos \
+--out ucf101_feature.pkl
+```
+
+The extracted features are stored in `ucf101_feature.pkl`.
+
+Clip-level feature extraction can also be run in distributed mode. The following example uses a compute node with 8 GPUs.
+
+```shell
+bash tools/misc/dist_clip_feature_extraction.sh \
+configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \
+https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \
+8 \
+--video-list ucf101.txt \
+--video-root data/ucf101/videos \
+--out ucf101_feature.pkl
+```
+
+To extract clip-level features from UCF101 with SlowOnly (pretrained on Kinetics-400), use the following script:
+
+```shell
+python tools/misc/clip_feature_extraction.py \
+configs/recognition/slowonly/slowonly_r50_clip_feature_extraction_4x16x1_rgb.py \
+https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \
+--video-list ucf101.txt \
+--video-root data/ucf101/videos \
+--out ucf101_feature.pkl
+```
+
+These two config files show the minimal configuration needed for feature extraction. Other existing config files can also be used, as long as they train and test on video data rather than raw frames.
+
+```shell
+python tools/misc/clip_feature_extraction.py \
+configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py \
+https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \ +--video-list ucf101.txt \ +--video-root data/ucf101/videos \ +--out ucf101_feature.pkl +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/getting_started.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/getting_started.md new file mode 100644 index 0000000000000000000000000000000000000000..b1672f13009806906be9deeb7cfd1be963f34b72 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/getting_started.md @@ -0,0 +1,457 @@ +# 基础教程 + +本文档提供 MMAction2 相关用法的基本教程。对于安装说明,请参阅 [安装指南](install.md)。 + + + +- [基础教程](#%E5%9F%BA%E7%A1%80%E6%95%99%E7%A8%8B) + - [数据集](#%E6%95%B0%E6%8D%AE%E9%9B%86) + - [使用预训练模型进行推理](#%E4%BD%BF%E7%94%A8%E9%A2%84%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B%E8%BF%9B%E8%A1%8C%E6%8E%A8%E7%90%86) + - [测试某个数据集](#%E6%B5%8B%E8%AF%95%E6%9F%90%E4%B8%AA%E6%95%B0%E6%8D%AE%E9%9B%86) + - [使用高级 API 对视频和帧文件夹进行测试](#%E4%BD%BF%E7%94%A8%E9%AB%98%E7%BA%A7-api-%E5%AF%B9%E8%A7%86%E9%A2%91%E5%92%8C%E5%B8%A7%E6%96%87%E4%BB%B6%E5%A4%B9%E8%BF%9B%E8%A1%8C%E6%B5%8B%E8%AF%95) + - [如何建立模型](#%E5%A6%82%E4%BD%95%E5%BB%BA%E7%AB%8B%E6%A8%A1%E5%9E%8B) + - [使用基本组件建立模型](#%E4%BD%BF%E7%94%A8%E5%9F%BA%E6%9C%AC%E7%BB%84%E4%BB%B6%E5%BB%BA%E7%AB%8B%E6%A8%A1%E5%9E%8B) + - [构建新模型](#%E6%9E%84%E5%BB%BA%E6%96%B0%E6%A8%A1%E5%9E%8B) + - [如何训练模型](#%E5%A6%82%E4%BD%95%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B) + - [推理流水线](#%E6%8E%A8%E7%90%86%E6%B5%81%E6%B0%B4%E7%BA%BF) + - [训练配置](#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE) + - [使用单个 GPU 进行训练](#%E4%BD%BF%E7%94%A8%E5%8D%95%E4%B8%AA-gpu-%E8%BF%9B%E8%A1%8C%E8%AE%AD%E7%BB%83) + - [使用多个 GPU 进行训练](#%E4%BD%BF%E7%94%A8%E5%A4%9A%E4%B8%AA-gpu-%E8%BF%9B%E8%A1%8C%E8%AE%AD%E7%BB%83) + - [使用多台机器进行训练](#%E4%BD%BF%E7%94%A8%E5%A4%9A%E5%8F%B0%E6%9C%BA%E5%99%A8%E8%BF%9B%E8%A1%8C%E8%AE%AD%E7%BB%83) + - 
[使用单台机器启动多个任务](#%E4%BD%BF%E7%94%A8%E5%8D%95%E5%8F%B0%E6%9C%BA%E5%99%A8%E5%90%AF%E5%8A%A8%E5%A4%9A%E4%B8%AA%E4%BB%BB%E5%8A%A1) + - [详细教程](#%E8%AF%A6%E7%BB%86%E6%95%99%E7%A8%8B) + + + +## 数据集 + +MMAction2 建议用户将数据集根目录链接到 `$MMACTION2/data` 下。 +如果用户的文件夹结构与默认结构不同,则需要在配置文件中进行对应路径的修改。 + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── kinetics400 +│ │ ├── rawframes_train +│ │ ├── rawframes_val +│ │ ├── kinetics_train_list.txt +│ │ ├── kinetics_val_list.txt +│ ├── ucf101 +│ │ ├── rawframes_train +│ │ ├── rawframes_val +│ │ ├── ucf101_train_list.txt +│ │ ├── ucf101_val_list.txt +│ ├── ... +``` + +请参阅 [数据集准备](data_preparation.md) 获取数据集准备的相关信息。 + +对于用户自定义数据集的准备,请参阅 [教程 3:如何增加新数据集](tutorials/3_new_dataset.md) + +## 使用预训练模型进行推理 + +MMAction2 提供了一些脚本用于测试数据集(如 Kinetics-400,Something-Something V1&V2,(Multi-)Moments in Time,等), +并提供了一些高级 API,以便更好地兼容其他项目。 + +MMAction2 支持仅使用 CPU 进行测试。然而,这样做的速度**非常慢**,用户应仅使用其作为无 GPU 机器上的 debug 手段。 +如需使用 CPU 进行测试,用户需要首先使用命令 `export CUDA_VISIBLE_DEVICES=-1` 禁用机器上的 GPU (如有),然后使用命令 `python tools/test.py {OTHER_ARGS}` 直接调用测试脚本。 + +### 测试某个数据集 + +- [x] 支持单 GPU +- [x] 支持单节点,多 GPU +- [x] 支持多节点 + +用户可使用以下命令进行数据集测试 + +```shell +# 单 GPU 测试 +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \ + [--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \ + [--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}] [--onnx] [--tensorrt] + +# 多 GPU 测试 +./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \ + [--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \ + [--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}] +``` + +可选参数: + +- `RESULT_FILE`:输出结果文件名。如果没有被指定,则不会保存测试结果。 +- `EVAL_METRICS`:测试指标。其可选值与对应数据集相关,如 `top_k_accuracy`,`mean_class_accuracy` 适用于所有动作识别数据集,`mmit_mean_average_precision` 适用于 Multi-Moments in Time 数据集,`mean_average_precision` 适用于 
Multi-Moments in Time 和单类 HVU 数据集,`AR@AN` 适用于 ActivityNet 数据集等。 +- `--gpu-collect`:如果被指定,动作识别结果将会通过 GPU 通信进行收集。否则,它将被存储到不同 GPU 上的 `TMPDIR` 文件夹中,并在 rank 0 的进程中被收集。 +- `TMPDIR`:用于存储不同进程收集的结果文件的临时文件夹。该变量仅当 `--gpu-collect` 没有被指定时有效。 +- `OPTIONS`:用于验证过程的自定义选项。其可选值与对应数据集的 `evaluate` 函数变量有关。 +- `AVG_TYPE`:用于平均测试片段结果的选项。如果被设置为 `prob`,则会在平均测试片段结果之前施加 softmax 函数。否则,会直接进行平均。 +- `JOB_LAUNCHER`:分布式任务初始化启动器选项。可选值有 `none`,`pytorch`,`slurm`,`mpi`。特别地,如果被设置为 `none`, 则会以非分布式模式进行测试。 +- `LOCAL_RANK`:本地 rank 的 ID。如果没有被指定,则会被设置为 0。 +- `--onnx`: 如果指定,将通过 onnx 模型推理获取预测结果,输入参数 `CHECKPOINT_FILE` 应为 onnx 模型文件。Onnx 模型文件由 `/tools/deployment/pytorch2onnx.py` 脚本导出。目前,不支持多 GPU 测试以及动态张量形状(Dynamic shape)。请注意,数据集输出与模型输入张量的形状应保持一致。同时,不建议使用测试时数据增强,如 `ThreeCrop`,`TenCrop`,`twice_sample` 等。 +- `--tensorrt`: 如果指定,将通过 TensorRT 模型推理获取预测结果,输入参数 `CHECKPOINT_FILE` 应为 TensorRT 模型文件。TensorRT 模型文件由导出的 onnx 模型以及 TensorRT 官方模型转换工具生成。目前,不支持多 GPU 测试以及动态张量形状(Dynamic shape)。请注意,数据集输出与模型输入张量的形状应保持一致。同时,不建议使用测试时数据增强,如 `ThreeCrop`,`TenCrop`,`twice_sample` 等。 + +例子: + +假定用户将下载的模型权重文件放置在 `checkpoints/` 目录下。 + +1. 在 Kinetics-400 数据集下测试 TSN (不存储测试结果为文件),并验证 `top-k accuracy` 和 `mean class accuracy` 指标 + + ```shell + python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth \ + --eval top_k_accuracy mean_class_accuracy + ``` + +2. 使用 8 块 GPU 在 Something-Something V1 下测试 TSN,并验证 `top-k accuracy` 指标 + + ```shell + ./tools/dist_test.sh configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth \ + 8 --out results.pkl --eval top_k_accuracy + ``` + +3. 在 slurm 分布式环境中测试 TSN 在 Kinetics-400 数据集下的 `top-k accuracy` 指标 + + ```shell + python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.pth \ + --launcher slurm --eval top_k_accuracy + ``` + +4. 
在 Something-Something V1 下测试 onnx 格式的 TSN 模型,并验证 `top-k accuracy` 指标 + + ```shell + python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/SOME_CHECKPOINT.onnx \ + --eval top_k_accuracy --onnx + ``` + +### 使用高级 API 对视频和帧文件夹进行测试 + +这里举例说明如何构建模型并测试给定视频 + +```python +import torch + +from mmaction.apis import init_recognizer, inference_recognizer + +config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py' +# 从模型库中下载检查点,并把它放到 `checkpoints/` 文件夹下 +checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' + +# 指定设备 +device = 'cuda:0' # or 'cpu' +device = torch.device(device) + +# 根据配置文件和检查点来建立模型 +model = init_recognizer(config_file, checkpoint_file, device=device) + +# 测试单个视频并显示其结果 +video = 'demo/demo.mp4' +labels = 'tools/data/kinetics/label_map_k400.txt' +results = inference_recognizer(model, video) + +# 显示结果 +labels = open('tools/data/kinetics/label_map_k400.txt').readlines() +labels = [x.strip() for x in labels] +results = [(labels[k[0]], k[1]) for k in results] + +print('The top-5 labels with corresponding scores are:') +for result in results: + print(f'{result[0]}: ', result[1]) +``` + +这里举例说明如何构建模型并测试给定帧文件夹 + +```python +import torch + +from mmaction.apis import init_recognizer, inference_recognizer + +config_file = 'configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py' +# 从模型库中下载检查点,并把它放到 `checkpoints/` 文件夹下 +checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' + +# 指定设备 +device = 'cuda:0' # or 'cpu' +device = torch.device(device) + +# 根据配置文件和检查点来建立模型 +model = init_recognizer(config_file, checkpoint_file, device=device) + +# 测试单个视频的帧文件夹并显示其结果 +video = 'SOME_DIR_PATH/' +labels = 'tools/data/kinetics/label_map_k400.txt' +results = inference_recognizer(model, video) + +# 显示结果 +labels = open('tools/data/kinetics/label_map_k400.txt').readlines() +labels = [x.strip() for x in labels] +results = 
[(labels[k[0]], k[1]) for k in results] + +print('The top-5 labels with corresponding scores are:') +for result in results: + print(f'{result[0]}: ', result[1]) +``` + +这里举例说明如何构建模型并通过 url 测试给定视频 + +```python +import torch + +from mmaction.apis import init_recognizer, inference_recognizer + +config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py' +# 从模型库中下载检查点,并把它放到 `checkpoints/` 文件夹下 +checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' + +# 指定设备 +device = 'cuda:0' # or 'cpu' +device = torch.device(device) + +# 根据配置文件和检查点来建立模型 +model = init_recognizer(config_file, checkpoint_file, device=device) + +# 测试单个视频的 url 并显示其结果 +video = 'https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4' +labels = 'tools/data/kinetics/label_map_k400.txt' +results = inference_recognizer(model, video) + +# 显示结果 +labels = open('tools/data/kinetics/label_map_k400.txt').readlines() +labels = [x.strip() for x in labels] +results = [(labels[k[0]], k[1]) for k in results] + +print('The top-5 labels with corresponding scores are:') +for result in results: + print(f'{result[0]}: ', result[1]) +``` + +**注意**:MMAction2 在默认提供的推理配置文件(inference configs)中定义 `data_prefix` 变量,并将其设置为 None 作为默认值。 +如果 `data_prefix` 不为 None,则要获取的视频文件(或帧文件夹)的路径将为 `data_prefix/video`。 +在这里,`video` 是上述脚本中的同名变量。可以在 `rawframe_dataset.py` 文件和 `video_dataset.py` 文件中找到此详细信息。例如, + +- 当视频(帧文件夹)路径为 `SOME_DIR_PATH/VIDEO.mp4`(`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`),并且配置文件中的 `data_prefix` 为 None,则 `video` 变量应为 `SOME_DIR_PATH/VIDEO.mp4`(`SOME_DIR_PATH/VIDEO_NAME`)。 +- 当视频(帧文件夹)路径为 `SOME_DIR_PATH/VIDEO.mp4`(`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`),并且配置文件中的 `data_prefix` 为 `SOME_DIR_PATH`,则 `video` 变量应为 `VIDEO.mp4`(`VIDEO_NAME`)。 +- 当帧文件夹路径为 `VIDEO_NAME/img_xxxxx.jpg`,并且配置文件中的 `data_prefix` 为 None,则 `video` 变量应为 `VIDEO_NAME`。 +- 当传递参数为视频 url 而非本地路径,则需使用 OpenCV 作为视频解码后端。 + +[demo/demo.ipynb](/demo/demo.ipynb) 中提供了相应的 notebook 演示文件。 + 
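上文关于 `data_prefix` 的注意事项可以用下面的极简示意来理解。这只是在若干假设下的草图:`resolve_video_path` 是为演示而虚构的辅助函数,并非 MMAction2 提供的 API;实际的路径拼接逻辑请以 `rawframe_dataset.py` 和 `video_dataset.py` 中的实现为准。

```python
import os.path as osp


def resolve_video_path(video, data_prefix=None):
    """示意函数(假设,非 MMAction2 API):返回最终读取的视频或帧文件夹路径。"""
    # data_prefix 为 None 时,直接使用传入的 video
    if data_prefix is None:
        return video
    # 否则最终路径为 data_prefix/video
    return osp.join(data_prefix, video)


# 情形一:data_prefix 为 None,video 需给出完整路径
print(resolve_video_path('SOME_DIR_PATH/VIDEO.mp4'))
# 情形二:data_prefix 为 SOME_DIR_PATH,video 只需给出文件名
print(resolve_video_path('VIDEO.mp4', data_prefix='SOME_DIR_PATH'))
# 帧文件夹同理:video 为文件夹名
print(resolve_video_path('VIDEO_NAME', data_prefix='SOME_DIR_PATH'))
```

在以 `/` 作为路径分隔符的系统上,前两次调用得到的是同一个路径,对应上面列表中的前两种情形。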
+## 如何建立模型 + +### 使用基本组件建立模型 + +MMAction2 将模型组件分为 4 种基础模型: + +- 识别器(recognizer):整个识别器模型管道,通常包含一个主干网络(backbone)和分类头(cls_head)。 +- 主干网络(backbone):通常为一个用于提取特征的 FCN 网络,例如 ResNet,BNInception。 +- 分类头(cls_head):用于分类任务的组件,通常包括一个带有池化层的 FC 层。 +- 时序检测器(localizer):用于时序检测的模型,目前有的检测器包含 BSN,BMN,SSN。 + +用户可参照给出的配置文件里的基础模型搭建流水线(如 `Recognizer2D`) + +如果想创建一些新的组件,如 [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383) 中的 temporal shift backbone 结构,则需: + +1. 创建 `mmaction/models/backbones/resnet_tsm.py` 文件 + + ```python + from ..builder import BACKBONES + from .resnet import ResNet + + @BACKBONES.register_module() + class ResNetTSM(ResNet): + + def __init__(self, + depth, + num_segments=8, + is_shift=True, + shift_div=8, + shift_place='blockres', + temporal_pool=False, + **kwargs): + pass + + def forward(self, x): + # implementation is ignored + pass + ``` + +2. 从 `mmaction/models/backbones/__init__.py` 中导入模型 + + ```python + from .resnet_tsm import ResNetTSM + ``` + +3. 
修改配置文件 + + ```python + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False) + ``` + + 修改为 + + ```python + backbone=dict( + type='ResNetTSM', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False, + shift_div=8) + ``` + +### 构建新模型 + +要编写一个新的动作识别器流水线,用户需要继承 `BaseRecognizer`,其定义了如下抽象方法 + +- `forward_train()`: 训练模式下的前向方法 +- `forward_test()`: 测试模式下的前向方法 + +具体可参照 [Recognizer2D](/mmaction/models/recognizers/recognizer2d.py) 和 [Recognizer3D](/mmaction/models/recognizers/recognizer3d.py) + +## 如何训练模型 + +### 推理流水线 + +MMAction2 使用 `MMDistributedDataParallel` 进行分布式训练,使用 `MMDataParallel` 进行非分布式训练。 + +对于单机多卡与多台机器的情况,MMAction2 使用分布式训练。假设服务器有 8 块 GPU,则会启动 8 个进程,并且每块 GPU 对应一个进程。 + +每个进程拥有一个独立的模型,以及对应的数据加载器和优化器。 +模型参数同步只发生于最开始。之后,每经过一次前向与后向计算,所有 GPU 中梯度都执行一次 allreduce 操作,而后优化器将更新模型参数。 +由于梯度执行了 allreduce 操作,因此不同 GPU 中模型参数将保持一致。 + +### 训练配置 + +所有的输出(日志文件和模型权重文件)将被保存到工作目录下。工作目录通过配置文件中的参数 `work_dir` 指定。 + +默认情况下,MMAction2 在每个周期后会在验证集上评估模型,可以通过在训练配置中修改 `interval` 参数来更改评估间隔 + +```python +evaluation = dict(interval=5) # 每 5 个周期进行一次模型评估 +``` + +根据 [Linear Scaling Rule](https://arxiv.org/abs/1706.02677),当 GPU 数量或每个 GPU 上的视频批大小改变时,用户可根据批大小按比例地调整学习率,如,当 4 GPUs x 2 video/gpu 时,lr=0.01;当 16 GPUs x 4 video/gpu 时,lr=0.08。 + +MMAction2 支持仅使用 CPU 进行训练。然而,这样做的速度**非常慢**,用户应仅使用其作为无 GPU 机器上的 debug 手段。 +如需使用 CPU 进行训练,用户需要首先使用命令 `export CUDA_VISIBLE_DEVICES=-1` 禁用机器上的 GPU (如有),然后使用命令 `python tools/train.py {OTHER_ARGS}` 直接调用训练脚本。 + +### 使用单个 GPU 进行训练 + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +如果用户想在命令中指定工作目录,则需要增加参数 `--work-dir ${YOUR_WORK_DIR}` + +### 使用多个 GPU 进行训练 + +```shell +./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments] +``` + +可选参数为: + +- `--validate` (**强烈建议**):在训练期间每 k 个周期进行一次验证(默认值为 5,可通过修改每个配置文件中的 `evaluation` 字典变量的 `interval` 值进行改变)。 +- `--test-last`:在训练结束后使用最后一个检查点的参数进行测试,将测试结果存储在 `${WORK_DIR}/last_pred.pkl` 中。 +- `--test-best`:在训练结束后使用效果最好的检查点的参数进行测试,将测试结果存储在 
`${WORK_DIR}/best_pred.pkl` 中。 +- `--work-dir ${WORK_DIR}`:覆盖配置文件中指定的工作目录。 +- `--resume-from ${CHECKPOINT_FILE}`:从以前的模型权重文件恢复训练。 +- `--gpus ${GPU_NUM}`:使用的 GPU 数量,仅适用于非分布式训练。 +- `--gpu-ids ${GPU_IDS}`:使用的 GPU ID,仅适用于非分布式训练。 +- `--seed ${SEED}`:设置 python,numpy 和 pytorch 里的种子 ID,以用于生成随机数。 +- `--deterministic`:如果被指定,程序将设置 CUDNN 后端的确定化选项。 +- `JOB_LAUNCHER`:分布式任务初始化启动器选项。可选值有 `none`,`pytorch`,`slurm`,`mpi`。特别地,如果被设置为 `none`, 则会以非分布式模式进行训练。 +- `LOCAL_RANK`:本地 rank 的 ID。如果没有被指定,则会被设置为 0。 + +`resume-from` 和 `load-from` 的不同点: +`resume-from` 加载模型参数和优化器状态,并且保留检查点所在的周期数,常被用于恢复意外被中断的训练。 +`load-from` 只加载模型参数,但周期数从 0 开始计数,常被用于微调模型。 + +这里提供一个使用 8 块 GPU 加载 TSN 模型权重文件的例子。 + +```shell +./tools/dist_train.sh configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py 8 --resume-from work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth +``` + +### 使用多台机器进行训练 + +如果用户在 [slurm](https://slurm.schedmd.com/) 集群上运行 MMAction2,可使用 `slurm_train.sh` 脚本。(该脚本也支持单台机器上进行训练) + +```shell +[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [--work-dir ${WORK_DIR}] +``` + +这里给出一个在 slurm 集群上的 dev 分区使用 16 块 GPU 训练 TSN 的例子。(使用 `GPUS_PER_NODE=8` 参数来指定一个有 8 块 GPU 的 slurm 集群节点) + +```shell +GPUS=16 ./tools/slurm_train.sh dev tsn_r50_k400 configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb +``` + +用户可以查看 [slurm_train.sh](/tools/slurm_train.sh) 文件来检查完整的参数和环境变量。 + +如果您想使用由 ethernet 连接起来的多台机器, 您可以使用以下命令: + +在第一台机器上: + +```shell +NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS +``` + +在第二台机器上: + +```shell +NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS +``` + +但是,如果您不使用高速网络连接这几台机器的话,训练将会非常慢。 + +### 使用单台机器启动多个任务 + +如果用户使用单台机器启动多个任务,如在有 8 块 GPU 的单台机器上启动 2 个需要 4 块 GPU 的训练任务,则需要为每个任务指定不同端口,以避免通信冲突。 + +如果用户使用 `dist_train.sh` 脚本启动训练任务,则可以通过以下命令指定端口 + +```shell +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 
./tools/dist_train.sh ${CONFIG_FILE} 4 +CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4 +``` + +如果用户在 slurm 集群下启动多个训练任务,则需要修改配置文件(通常是配置文件的倒数第 6 行)中的 `dist_params` 变量,以设置不同的通信端口。 + +在 `config1.py` 中, + +```python +dist_params = dict(backend='nccl', port=29500) +``` + +在 `config2.py` 中, + +```python +dist_params = dict(backend='nccl', port=29501) +``` + +之后便可启动两个任务,分别对应 `config1.py` 和 `config2.py`。 + +```shell +CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py [--work-dir ${WORK_DIR}] +CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py [--work-dir ${WORK_DIR}] +``` + +## 详细教程 + +目前, MMAction2 提供以下几种更详细的教程: + +- [如何编写配置文件](tutorials/1_config.md) +- [如何微调模型](tutorials/2_finetune.md) +- [如何增加新数据集](tutorials/3_new_dataset.md) +- [如何设计数据处理流程](tutorials/4_data_pipeline.md) +- [如何增加新模块](tutorials/5_new_modules.md) +- [如何导出模型为 onnx 格式](tutorials/6_export_model.md) +- [如何自定义模型运行参数](tutorials/7_customize_runtime.md) diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/index.rst b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/index.rst new file mode 100644 index 0000000000000000000000000000000000000000..4c4351e59bb5dff557624bf7d119ddf1ac3c8722 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/index.rst @@ -0,0 +1,74 @@ +欢迎来到 MMAction2 的中文文档! +===================================== + +您可以在页面左下角切换中英文文档。 + +You can change the documentation language at the lower-left corner of the page. + +.. toctree:: + :maxdepth: 2 + + install.md + getting_started.md + demo.md + benchmark.md + +.. toctree:: + :maxdepth: 2 + :caption: 数据集 + + datasets.md + data_preparation.md + supported_datasets.md + +.. toctree:: + :maxdepth: 2 + :caption: 模型库 + + modelzoo.md + recognition_models.md + localization_models.md + detection_models.md + skeleton_models.md + +.. 
toctree:: + :maxdepth: 2 + :caption: 教程 + + tutorials/1_config.md + tutorials/2_finetune.md + tutorials/3_new_dataset.md + tutorials/4_data_pipeline.md + tutorials/5_new_modules.md + tutorials/6_export_model.md + tutorials/7_customize_runtime.md + +.. toctree:: + :maxdepth: 2 + :caption: 实用工具和脚本 + + useful_tools.md + +.. toctree:: + :maxdepth: 2 + :caption: 记录 + + changelog.md + faq.md + +.. toctree:: + :caption: API 参考文档 + + api.rst + +.. toctree:: + :caption: 语言切换 + + switch_language.md + + +索引和表格 +================== + +* :ref:`genindex` +* :ref:`search` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/install.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/install.md new file mode 100644 index 0000000000000000000000000000000000000000..14ee70e1696ad4f16b9533bd7906eb705e64a7d7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/install.md @@ -0,0 +1,244 @@ +# 安装 + +本文档提供了安装 MMAction2 的相关步骤。 + + + +- [安装](#%E5%AE%89%E8%A3%85) + - [安装依赖包](#%E5%AE%89%E8%A3%85%E4%BE%9D%E8%B5%96%E5%8C%85) + - [准备环境](#%E5%87%86%E5%A4%87%E7%8E%AF%E5%A2%83) + - [MMAction2 的安装步骤](#mmaction2-%E7%9A%84%E5%AE%89%E8%A3%85%E6%AD%A5%E9%AA%A4) + - [CPU 环境下的安装步骤](#cpu-%E7%8E%AF%E5%A2%83%E4%B8%8B%E7%9A%84%E5%AE%89%E8%A3%85%E6%AD%A5%E9%AA%A4) + - [利用 Docker 镜像安装 MMAction2](#%E5%88%A9%E7%94%A8-docker-%E9%95%9C%E5%83%8F%E5%AE%89%E8%A3%85-mmaction2) + - [源码安装 MMAction2](#%E6%BA%90%E7%A0%81%E5%AE%89%E8%A3%85-mmaction2) + - [在多个 MMAction2 版本下进行开发](#%E5%9C%A8%E5%A4%9A%E4%B8%AA-mmaction2-%E7%89%88%E6%9C%AC%E4%B8%8B%E8%BF%9B%E8%A1%8C%E5%BC%80%E5%8F%91) + - [安装验证](#%E5%AE%89%E8%A3%85%E9%AA%8C%E8%AF%81) + + + +## 安装依赖包 + +- Linux (Windows 系统暂未有官方支持) +- Python 3.6+ +- PyTorch 1.3+ +- CUDA 9.2+ (如果要从源码对 PyTorch 进行编译, CUDA 9.0 版本同样可以兼容) +- GCC 5+ +- [mmcv](https://github.com/open-mmlab/mmcv) 1.1.1+ +- Numpy +- ffmpeg (4.2 版本最佳) +- [decord](https://github.com/dmlc/decord) (可选项, 0.4.1+):使用 `pip install decord==0.4.1` 命令安装其 CPU 版本,GPU 版本需从源码进行编译。 +- [PyAV](https://github.com/mikeboers/PyAV) 
(可选项):`conda install av -c conda-forge -y`。 +- [PyTurboJPEG](https://github.com/lilohuang/PyTurboJPEG) (可选项):`pip install PyTurboJPEG`。 +- [denseflow](https://github.com/open-mmlab/denseflow) (可选项):可参考 [这里](https://github.com/innerlee/setup) 获取简便安装步骤。 +- [moviepy](https://zulko.github.io/moviepy/) (可选项):`pip install moviepy`。官方安装步骤可参考 [这里](https://zulko.github.io/moviepy/install.html)。**特别地**,如果安装过程碰到 [这个问题](https://github.com/Zulko/moviepy/issues/693),可参考: + 1. 对于 Windows 用户, [ImageMagick](https://www.imagemagick.org/script/index.php) 将不会被 MoviePy 自动检测到,用户需要对 `moviepy/config_defaults.py` 文件进行修改,以提供 ImageMagick 的二进制文件(即,`magick`)的路径,如 `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"` + 2. 对于 Linux 用户, 如果 [ImageMagick](https://www.imagemagick.org/script/index.php) 没有被 `moviepy` 检测到,用户需要对 `/etc/ImageMagick-6/policy.xml` 文件进行修改,把文件中的 `<policy domain="path" rights="none" pattern="@*" />` 代码行修改为 `<!-- <policy domain="path" rights="none" pattern="@*" /> -->`。 +- [Pillow-SIMD](https://docs.fast.ai/performance.html#pillow-simd) (可选项):可使用如下脚本进行安装: + +```shell +conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo +pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo +conda install -yc conda-forge libjpeg-turbo +CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd +conda install -y jpeg libtiff +``` + +**注意**:如果已经安装了 mmcv,用户需要首先运行 `pip uninstall mmcv` 命令将其卸载。 +如果 mmcv 和 mmcv-full 同时被安装, 会报 `ModuleNotFoundError` 的错误。 + +## 准备环境 + +a. 创建并激活 conda 虚拟环境,如: + +```shell +conda create -n open-mmlab python=3.7 -y +conda activate open-mmlab +``` + +b. 
根据 [官方文档](https://pytorch.org/) 进行 PyTorch 和 torchvision 的安装,如: + +```shell +conda install pytorch torchvision -c pytorch +``` + +**注**:确保 CUDA 的编译版本和 CUDA 的运行版本相匹配。 +用户可以参照 [PyTorch 官网](https://pytorch.org/) 对预编译包所支持的 CUDA 版本进行核对。 + +`例 1`:如果用户的 `/usr/local/cuda` 文件夹下已安装 CUDA 10.1 版本,并且想要安装 PyTorch 1.5 版本, +则需要安装 CUDA 10.1 下预编译的 PyTorch。 + +```shell +conda install pytorch cudatoolkit=10.1 torchvision -c pytorch +``` + +`例 2`:如果用户的 `/usr/local/cuda` 文件夹下已安装 CUDA 9.2 版本,并且想要安装 PyTorch 1.3.1 版本, +则需要安装 CUDA 9.2 下预编译的 PyTorch。 + +```shell +conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch +``` + +如果 PyTorch 是由源码进行编译安装(而非直接下载预编译好的安装包),则可以使用更多的 CUDA 版本(如 9.0 版本)。 + +## MMAction2 的安装步骤 + +这里推荐用户使用 [MIM](https://github.com/open-mmlab/mim) 安装 MMAction2。 + +```shell +pip install git+https://github.com/open-mmlab/mim.git +mim install mmaction2 -f https://github.com/open-mmlab/mmaction2.git +``` + +MIM 可以自动安装 OpenMMLab 项目及其依赖。 + +或者,用户也可以通过以下步骤手动安装 MMAction2。 + +a. 安装 mmcv-full,我们推荐您安装以下预构建包: + +```shell +# pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html +pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html +``` + +PyTorch 在 1.x.0 和 1.x.1 之间通常是兼容的,故 mmcv-full 只提供 1.x.0 的编译包。如果你的 PyTorch 版本是 1.x.1,你可以放心地安装在 1.x.0 版本编译的 mmcv-full。 + +``` +# 我们可以忽略 PyTorch 的小版本号 +pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10/index.html +``` + +可查阅 [这里](https://github.com/open-mmlab/mmcv#installation) 以参考不同版本的 MMCV 所兼容的 PyTorch 和 CUDA 版本。 + +另外,用户也可以通过使用以下命令从源码进行编译: + +```shell +git clone https://github.com/open-mmlab/mmcv.git +cd mmcv +MMCV_WITH_OPS=1 pip install -e . # mmcv-full 包含一些 cuda 算子,执行该步骤会安装 mmcv-full(而非 mmcv) +# 或者使用 pip install -e . # 这个命令安装的 mmcv 将不包含 cuda ops,通常适配 CPU(无 GPU)环境 +cd .. 
+``` + +或者直接运行脚本: + +```shell +pip install mmcv-full +``` + +**注意**:如果 mmcv 已经被安装,用户需要使用 `pip uninstall mmcv` 命令进行卸载。如果 mmcv 和 mmcv-full 同时被安装, 会报 `ModuleNotFoundError` 的错误。 + +b. 克隆 MMAction2 库。 + +```shell +git clone https://github.com/open-mmlab/mmaction2.git +cd mmaction2 +``` + +c. 安装依赖包和 MMAction2。 + +```shell +pip install -r requirements/build.txt +pip install -v -e . # or "python setup.py develop" +``` + +如果是在 macOS 环境安装 MMAction2,则需使用如下命令: + +```shell +CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' pip install -e . +``` + +d. 安装 mmdetection 以支持时空检测任务。 + +如果用户不想做时空检测相关任务,这部分步骤可以选择跳过。 + +可参考 [这里](https://github.com/open-mmlab/mmdetection#installation) 进行 mmdetection 的安装。 + +注意: + +1. 在步骤 b 中,git commit 的 id 将会被写到版本号中,如 0.6.0+2e7045c。这个版本号也会被保存到训练好的模型中。 + 这里推荐用户每次在步骤 b 中对本地代码和 github 上的源码进行同步。如果 C++/CUDA 代码被修改,就必须进行这一步骤。 + +2. 根据上述步骤,MMAction2 就会以 `dev` 模式被安装,任何本地的代码修改都会立刻生效,不需要再重新安装一遍(除非用户提交了 commits,并且想更新版本号)。 + +3. 如果用户想使用 `opencv-python-headless` 而不是 `opencv-python`,可在安装 MMCV 前安装 `opencv-python-headless`。 + +4. 如果用户想使用 `PyAV`,可以通过 `conda install av -c conda-forge -y` 进行安装。 + +5. 一些依赖包是可选的。运行 `python setup.py develop` 将只会安装运行代码所需的最小要求依赖包。 + 要想使用一些可选的依赖包,如 `decord`,用户需要通过 `pip install -r requirements/optional.txt` 进行安装, + 或者通过调用 `pip`(如 `pip install -v -e .[optional]`,这里的 `[optional]` 可替换为 `all`,`tests`,`build` 或 `optional`) 指定安装对应的依赖包,如 `pip install -v -e .[tests,build]`。 + +## CPU 环境下的安装步骤 + +MMAction2 可以在只有 CPU 的环境下安装(即无法使用 GPU 的环境)。 + +在 CPU 模式下,用户可以运行 `demo/demo.py` 的代码。 + +## 利用 Docker 镜像安装 MMAction2 + +MMAction2 提供一个 [Dockerfile](/docker/Dockerfile) 用于创建 docker 镜像。 + +```shell +# 创建拥有 PyTorch 1.6.0, CUDA 10.1, CUDNN 7 配置的 docker 镜像. +docker build -f ./docker/Dockerfile --rm -t mmaction2 . 
+``` + +**注意**:用户需要确保已经安装了 [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker)。 + +运行以下命令: + +```shell +docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmaction2/data mmaction2 +``` + +## 源码安装 MMAction2 + +这里提供了 conda 下安装 MMAction2 并链接数据集路径的完整脚本(假设 Kinetics-400 数据的路径在 $KINETICS400_ROOT)。 + +```shell +conda create -n open-mmlab python=3.7 -y +conda activate open-mmlab + +# 安装最新的,使用默认版本的 CUDA 版本(一般为最新版本)预编译的 PyTorch 包 +conda install -c pytorch pytorch torchvision -y + +# 安装最新版本的 mmcv 或 mmcv-full,这里以 mmcv 为例 +pip install mmcv + +# 安装 mmaction2 +git clone https://github.com/open-mmlab/mmaction2.git +cd mmaction2 +pip install -r requirements/build.txt +python setup.py develop + +mkdir data +ln -s $KINETICS400_ROOT data +``` + +## 在多个 MMAction2 版本下进行开发 + +MMAction2 的训练和测试脚本已经修改了 `PYTHONPATH` 变量,以确保其能够运行当前目录下的 MMAction2。 + +如果想要运行环境下默认的 MMAction2,用户需要在训练和测试脚本中去除这一行: + +```shell +PYTHONPATH="$(dirname $0)/..":$PYTHONPATH +``` + +## 安装验证 + +为了验证 MMAction2 和所需的依赖包是否已经安装成功, +用户可以运行以下的 python 代码,以测试其是否能成功地初始化动作识别器,并进行演示视频的推理: + +```python +import torch +from mmaction.apis import init_recognizer, inference_recognizer + +config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py' +device = 'cuda:0' # 或 'cpu' +device = torch.device(device) + +model = init_recognizer(config_file, device=device) +# 进行演示视频的推理 +inference_recognizer(model, 'demo/demo.mp4') +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/make.bat b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/make.bat new file mode 100644 index 0000000000000000000000000000000000000000..922152e96a04a242e6fc40f124261d74890617d8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=. 
+set BUILDDIR=_build + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/merge_docs.sh b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/merge_docs.sh new file mode 100644 index 0000000000000000000000000000000000000000..1265731a9798f05be69e626b5de322f3842a0808 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/merge_docs.sh @@ -0,0 +1,41 @@ +#!/usr/bin/env bash +# gather models +cat ../configs/localization/*/README_zh-CN.md | sed "s/md#测/html#测/g" | sed "s/md#训/html#训/g" | sed "s/#/#&/" | sed '1i\# 时序动作检测模型' | sed 's/](\/docs_zh_CN\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##/getting_started.html#/g" > localization_models.md +cat ../configs/recognition/*/README_zh-CN.md | sed "s/md#测/html#t测/g" | sed "s/md#训/html#训/g" | sed "s/#/#&/" | sed '1i\# 动作识别模型' | sed 's/](\/docs_zh_CN\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g'| sed "s/getting_started.html##/getting_started.html#/g" > recognition_models.md +cat ../configs/recognition_audio/*/README_zh-CN.md | sed "s/md#测/html#测/g" | sed "s/md#训/html#训/g" | sed "s/#/#&/" | sed 's/](\/docs_zh_CN\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g'| sed "s/getting_started.html##/getting_started.html#/g" >> recognition_models.md +cat ../configs/detection/*/README_zh-CN.md | sed 
"s/md#测/html#测/g" | sed "s/md#训/html#训/g" | sed "s/#/#&/" | sed '1i\# 时空动作检测模型' | sed 's/](\/docs_zh_CN\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g'| sed "s/getting_started.html##/getting_started.html#/g" > detection_models.md +cat ../configs/skeleton/*/README_zh-CN.md | sed "s/md#测/html#测/g" | sed "s/md#训/html#训/g" | sed "s/#/#&/" | sed '1i\# 骨骼动作识别模型' | sed 's/](\/docs_zh_CN\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g'| sed "s/getting_started.html##/getting_started.html#/g" > skeleton_models.md + +# gather datasets +cat ../tools/data/*/README_zh-CN.md | sed 's/# 准备/# /g' | sed 's/#/#&/' > prepare_data.md + +sed -i 's/(\/tools\/data\/activitynet\/README_zh-CN.md/(#activitynet/g' supported_datasets.md +sed -i 's/(\/tools\/data\/kinetics\/README_zh-CN.md/(#kinetics-400-600-700/g' supported_datasets.md +sed -i 's/(\/tools\/data\/mit\/README_zh-CN.md/(#moments-in-time/g' supported_datasets.md +sed -i 's/(\/tools\/data\/mmit\/README_zh-CN.md/(#multi-moments-in-time/g' supported_datasets.md +sed -i 's/(\/tools\/data\/sthv1\/README_zh-CN.md/(#something-something-v1/g' supported_datasets.md +sed -i 's/(\/tools\/data\/sthv2\/README_zh-CN.md/(#something-something-v2/g' supported_datasets.md +sed -i 's/(\/tools\/data\/thumos14\/README_zh-CN.md/(#thumos-14/g' supported_datasets.md +sed -i 's/(\/tools\/data\/ucf101\/README_zh-CN.md/(#ucf-101/g' supported_datasets.md +sed -i 's/(\/tools\/data\/ucf101_24\/README_zh-CN.md/(#ucf101-24/g' supported_datasets.md +sed -i 's/(\/tools\/data\/jhmdb\/README_zh-CN.md/(#jhmdb/g' supported_datasets.md +sed -i 's/(\/tools\/data\/hvu\/README_zh-CN.md/(#hvu/g' supported_datasets.md +sed -i 's/(\/tools\/data\/hmdb51\/README_zh-CN.md/(#hmdb51/g' supported_datasets.md +sed -i 's/(\/tools\/data\/jester\/README_zh-CN.md/(#jester/g' supported_datasets.md +sed -i 's/(\/tools\/data\/ava\/README_zh-CN.md/(#ava/g' supported_datasets.md +sed -i 
's/(\/tools\/data\/gym\/README_zh-CN.md/(#gym/g' supported_datasets.md +sed -i 's/(\/tools\/data\/omnisource\/README_zh-CN.md/(#omnisource/g' supported_datasets.md +sed -i 's/(\/tools\/data\/diving48\/README_zh-CN.md/(#diving48/g' supported_datasets.md +sed -i 's/(\/tools\/data\/skeleton\/README_zh-CN.md/(#skeleton/g' supported_datasets.md + +cat prepare_data.md >> supported_datasets.md +sed -i 's/](\/docs_zh_CN\//](/g' supported_datasets.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' supported_datasets.md + +sed -i "s/md###t/html#t/g" demo.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' demo.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' benchmark.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' getting_started.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' install.md +sed -i 's/](\/docs_zh_CN\//](/g' ./tutorials/*.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' ./tutorials/*.md diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/stat.py b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/stat.py new file mode 100644 index 0000000000000000000000000000000000000000..fe7590afddd1756cb1ee8b06e494ca5c6f06e052 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/stat.py @@ -0,0 +1,173 @@ +#!/usr/bin/env python +# Copyright (c) OpenMMLab. All rights reserved. 
+import functools as func +import glob +import re +from os.path import basename, splitext + +import numpy as np +import titlecase + + +def anchor(name): + return re.sub(r'-+', '-', re.sub(r'[^a-zA-Z0-9]', '-', + name.strip().lower())).strip('-') + + +# Count algorithms + +files = sorted(glob.glob('*_models.md')) + +stats = [] + +for f in files: + with open(f, 'r') as content_file: + content = content_file.read() + + # title + title = content.split('\n')[0].replace('#', '') + + # skip IMAGE and ABSTRACT tags + content = [ + x for x in content.split('\n') + if 'IMAGE' not in x and 'ABSTRACT' not in x + ] + content = '\n'.join(content) + + # count papers + papers = set( + (papertype, titlecase.titlecase(paper.lower().strip())) + for (papertype, paper) in re.findall( + r'<!--\s*\[([A-Z]*?)\]\s*-->\s*\n.*?\btitle\s*=\s*{(.*?)}', + content, re.DOTALL)) + # paper links + revcontent = '\n'.join(list(reversed(content.splitlines()))) + paperlinks = {} + for _, p in papers: + print(p) + q = p.replace('\\', '\\\\').replace('?', '\\?') + paperlinks[p] = ' '.join( + (f'[->]({splitext(basename(f))[0]}.html#{anchor(paperlink)})' + for paperlink in re.findall( + rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n', + revcontent, re.DOTALL | re.IGNORECASE))) + print(' ', paperlinks[p]) + paperlist = '\n'.join( + sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers)) + # count configs + configs = set(x.lower().strip() + for x in re.findall(r'https.*configs/.*\.py', content)) + + # count ckpts + ckpts = set(x.lower().strip() + for x in re.findall(r'https://download.*\.pth', content) + if 'mmaction' in x) + + statsmsg = f""" +## [{title}]({f}) + +* 模型权重文件数量: {len(ckpts)} +* 配置文件数量: {len(configs)} +* 论文数量: {len(papers)} +{paperlist} + + """ + + stats.append((papers, configs, ckpts, statsmsg)) + +allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _, _ in stats]) +allconfigs = func.reduce(lambda a, b: a.union(b), [c for _, c, _, _ in stats]) +allckpts = func.reduce(lambda a, 
b: a.union(b), 
[c for _, _, c, _ in stats]) +msglist = '\n'.join(x for _, _, _, x in stats) + +papertypes, papercounts = np.unique([t for t, _ in allpapers], + return_counts=True) +countstr = '\n'.join( + [f' - {t}: {c}' for t, c in zip(papertypes, papercounts)]) + +modelzoo = f""" +# 总览 + +* 模型权重文件数量: {len(allckpts)} +* 配置文件数量: {len(allconfigs)} +* 论文数量: {len(allpapers)} +{countstr} + +有关受支持的数据集,可参见 [数据集总览](datasets.md)。 + +{msglist} +""" + +with open('modelzoo.md', 'w') as f: + f.write(modelzoo) + +# Count datasets + +files = ['supported_datasets.md'] + +datastats = [] + +for f in files: + with open(f, 'r') as content_file: + content = content_file.read() + + # title + title = content.split('\n')[0].replace('#', '') + + # count papers + papers = set( + (papertype, titlecase.titlecase(paper.lower().strip())) + for (papertype, paper) in re.findall( + r'<!--\s*\[([A-Z]*?)\]\s*-->\s*\n.*?\btitle\s*=\s*{(.*?)}', + content, re.DOTALL)) + # paper links + revcontent = '\n'.join(list(reversed(content.splitlines()))) + paperlinks = {} + for _, p in papers: + print(p) + q = p.replace('\\', '\\\\').replace('?', '\\?') + paperlinks[p] = ', '.join( + (f'[{p.strip()} ->]({splitext(basename(f))[0]}.html#{anchor(p)})' + for p in re.findall( + rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n', + revcontent, re.DOTALL | re.IGNORECASE))) + print(' ', paperlinks[p]) + paperlist = '\n'.join( + sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers)) + + statsmsg = f""" +## [{title}]({f}) + +* 论文数量: {len(papers)} +{paperlist} + + """ + + datastats.append((papers, configs, ckpts, statsmsg)) + +alldatapapers = func.reduce(lambda a, b: a.union(b), + [p for p, _, _, _ in datastats]) + +# Summarize + +msglist = '\n'.join(x for _, _, _, x in stats) +datamsglist = '\n'.join(x for _, _, _, x in datastats) +papertypes, papercounts = np.unique([t for t, _ in alldatapapers], + return_counts=True) +countstr = '\n'.join( + [f' - {t}: {c}' for t, c in zip(papertypes, papercounts)]) + +modelzoo = f""" +# 总览 + 
+* 论文数量: 
{len(alldatapapers)} +{countstr} + + +有关受支持的视频理解算法,可参见 [模型总览](modelzoo.md)。 + +{datamsglist} +""" + +with open('datasets.md', 'w') as f: + f.write(modelzoo) diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/supported_datasets.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/supported_datasets.md new file mode 100644 index 0000000000000000000000000000000000000000..7cafa129dccf6d01081c81af57cab1189ad73a07 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/supported_datasets.md @@ -0,0 +1,34 @@ +# 支持的数据集 + +- 支持的动作识别数据集: + + - [UCF101](/tools/data/ucf101/README_zh-CN.md) \[ [主页](https://www.crcv.ucf.edu/research/data-sets/ucf101/) \]. + - [HMDB51](/tools/data/hmdb51/README_zh-CN.md) \[ [主页](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) \]. + - [Kinetics-\[400/600/700\]](/tools/data/kinetics/README_zh-CN.md) \[ [主页](https://deepmind.com/research/open-source/kinetics) \] + - [Something-Something V1](/tools/data/sthv1/README_zh-CN.md) \[ [主页](https://20bn.com/datasets/something-something/v1) \] + - [Something-Something V2](/tools/data/sthv2/README_zh-CN.md) \[ [主页](https://20bn.com/datasets/something-something) \] + - [Moments in Time](/tools/data/mit/README_zh-CN.md) \[ [主页](http://moments.csail.mit.edu/) \] + - [Multi-Moments in Time](/tools/data/mmit/README_zh-CN.md) \[ [主页](http://moments.csail.mit.edu/challenge_iccv_2019.html) \] + - [HVU](/tools/data/hvu/README_zh-CN.md) \[ [主页](https://github.com/holistic-video-understanding/HVU-Dataset) \] + - [Jester](/tools/data/jester/README_zh-CN.md) \[ [主页](https://20bn.com/datasets/jester/v1) \] + - [GYM](/tools/data/gym/README_zh-CN.md) \[ [主页](https://sdolivia.github.io/FineGym/) \] + - [ActivityNet](/tools/data/activitynet/README_zh-CN.md) \[ [主页](http://activity-net.org/) \] + +- 支持的时序动作检测数据集: + + - [ActivityNet](/tools/data/activitynet/README_zh-CN.md) \[ [主页](http://activity-net.org/) \] + - [THUMOS14](/tools/data/thumos14/README_zh-CN.md) \[ 
[主页](https://www.crcv.ucf.edu/THUMOS14/download.html) \] + +- 支持的时空动作检测数据集: + + - [AVA](/tools/data/ava/README_zh-CN.md) \[ [主页](https://research.google.com/ava/index.html) \] + - [UCF101-24](/tools/data/ucf101_24/README_zh-CN.md) \[ [主页](http://www.thumos.info/download.html) \] + - [JHMDB](/tools/data/jhmdb/README_zh-CN.md) \[ [主页](http://jhmdb.is.tue.mpg.de/) \] + +- 基于人体骨架的动作识别数据集: + + - [PoseC3D Skeleton Dataset](/tools/data/skeleton/README.md) \[ [主页](https://kennymckormick.github.io/posec3d/) \] + +MMAction2 目前支持的数据集如上所列。 +MMAction2 在 `$MMACTION2/tools/data/` 路径下提供数据集准备脚本。 +每个数据集的详细准备教程也在 [Readthedocs](https://mmaction2.readthedocs.io/zh_CN/latest/supported_datasets.html) 中给出。 diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/switch_language.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/switch_language.md new file mode 100644 index 0000000000000000000000000000000000000000..4bade2237f4cd26b1999da90baafef3543b333cf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/switch_language.md @@ -0,0 +1,3 @@ +## English + +## 简体中文 diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/1_config.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/1_config.md new file mode 100644 index 0000000000000000000000000000000000000000..7c2f04abf5fe9152357899ee0a63d810e9bcc201 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/1_config.md @@ -0,0 +1,748 @@ +# 教程 1:如何编写配置文件 + +MMAction2 使用 python 文件作为配置文件。其配置文件系统的设计将模块化与继承整合进来,方便用户进行各种实验。 +MMAction2 提供的所有配置文件都放置在 `$MMAction2/configs` 文件夹下,用户可以通过运行命令 +`python tools/analysis/print_config.py /PATH/TO/CONFIG` 来查看完整的配置信息,从而方便检查所对应的配置文件。 + + + +- [通过命令行参数修改配置信息](#%E9%80%9A%E8%BF%87%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0%E4%BF%AE%E6%94%B9%E9%85%8D%E7%BD%AE%E4%BF%A1%E6%81%AF) +- [配置文件结构](#%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E7%BB%93%E6%9E%84) +- [配置文件命名规则](#%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E5%91%BD%E5%90%8D%E8%A7%84%E5%88%99) + - 
[时序动作检测的配置文件系统](#%E6%97%B6%E5%BA%8F%E5%8A%A8%E4%BD%9C%E6%A3%80%E6%B5%8B%E7%9A%84%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F) + - [动作识别的配置文件系统](#%E5%8A%A8%E4%BD%9C%E8%AF%86%E5%88%AB%E7%9A%84%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F) + - [时空动作检测的配置文件系统](#%E6%97%B6%E7%A9%BA%E5%8A%A8%E4%BD%9C%E6%A3%80%E6%B5%8B%E7%9A%84%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F) +- [常见问题](#%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98) + - [配置文件中的中间变量](#%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E4%B8%AD%E7%9A%84%E4%B8%AD%E9%97%B4%E5%8F%98%E9%87%8F) + + + +## 通过命令行参数修改配置信息 + +当用户使用脚本 `tools/train.py` 或者 `tools/test.py` 提交任务时,可以通过指定 `--cfg-options` 参数来直接修改所使用的配置文件内容。 + +- 更新配置文件内的字典 + + 用户可以按照原始配置中的字典键顺序来指定配置文件的设置。 + 例如,`--cfg-options model.backbone.norm_eval=False` 会将模型主干网络 backbone 中所有的 BN 模块都设置为 `train` 模式。 + +- 更新配置文件内列表的键 + + 配置文件中,存在一些由字典组成的列表。例如,训练数据前处理流水线 `data.train.pipeline` 就是 python 列表, + 如 `[dict(type='SampleFrames'), ...]`。如果用户想更改其中的 `'SampleFrames'` 为 `'DenseSampleFrames'`, + 可以指定 `--cfg-options data.train.pipeline.0.type=DenseSampleFrames`。 + +- 更新列表/元组的值 + + 当配置文件中需要更新的是一个列表或者元组,例如,配置文件通常会设置 `workflow=[('train', 1)]`,用户如果想更改, + 需要指定 `--cfg-options workflow="[(train,1),(val,1)]"`。注意这里的引号 " 对于列表/元组数据类型的修改是必要的, + 并且 **不允许** 引号内所指定的值的书写存在空格。 + +## 配置文件结构 + +在 `configs/_base_` 文件夹下存在 3 种基本组件类型: 模型(model), 训练策略(schedule), 运行时的默认设置(default_runtime)。 +许多方法都可以方便地通过组合这些组件进行实现,如 TSN,I3D,SlowOnly 等。 +其中,通过 `_base_` 下组件来构建的配置被称为 _原始配置_(_primitive_)。 + +对于在同一文件夹下的所有配置文件,MMAction2 推荐只存在 **一个** 对应的 _原始配置_ 文件。 +所有其他的配置文件都应该继承 _原始配置_ 文件,这样就能保证配置文件的最大继承深度为 3。 + +为了方便理解,MMAction2 推荐用户继承现有方法的配置文件。 +例如,如需修改 TSN 的配置文件,用户应先通过 `_base_ = '../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py'` 继承 TSN 配置文件的基本结构, +并修改其中必要的内容以完成继承。 + +如果用户想实现一个独立于任何一个现有的方法结构的新方法,则需要像 `configs/recognition`, `configs/detection` 等一样,在 `configs/TASK` 中建立新的文件夹。 + +更多详细内容,请参考 [mmcv](https://mmcv.readthedocs.io/en/latest/understand_mmcv/config.html)。 + +## 配置文件命名规则 + +MMAction2 按照以下风格进行配置文件命名,代码库的贡献者需要遵循相同的命名规则。 + 
+``` +{model}_[model setting]_{backbone}_[misc]_{data setting}_[gpu x batch_per_gpu]_{schedule}_{dataset}_{modality} +``` + +其中,`{xxx}` 表示必要的命名域,`[yyy]` 表示可选的命名域。 + +- `{model}`:模型类型,如 `tsn`,`i3d` 等。 +- `[model setting]`:一些模型上的特殊设置。 +- `{backbone}`:主干网络类型,如 `r50`(ResNet-50)等。 +- `[misc]`:模型的额外设置或插件,如 `dense`,`320p`,`video`等。 +- `{data setting}`:采帧数据格式,形如 `{clip_len}x{frame_interval}x{num_clips}`。 +- `[gpu x batch_per_gpu]`:GPU 数量以及每个 GPU 上的采样。 +- `{schedule}`:训练策略设置,如 `20e` 表示 20 个周期(epoch)。 +- `{dataset}`:数据集名,如 `kinetics400`,`mmit`等。 +- `{modality}`:帧的模态,如 `rgb`, `flow`等。 + +### 时序动作检测的配置文件系统 + +MMAction2 将模块化设计整合到配置文件系统中,以便于执行各种不同的实验。 + +- 以 BMN 为例 + + 为了帮助用户理解 MMAction2 的配置文件结构,以及时序动作检测系统中的一些模块,这里以 BMN 为例,给出其配置文件的注释。 + 对于每个模块的详细用法以及对应参数的选择,请参照 [API 文档](https://mmaction2.readthedocs.io/en/latest/api.html)。 + + ```python + # 模型设置 + model = dict( # 模型的配置 + type='BMN', # 时序动作检测器的类型 + temporal_dim=100, # 每个视频中所选择的帧数量 + boundary_ratio=0.5, # 视频边界的决策几率 + num_samples=32, # 每个候选的采样数 + num_samples_per_bin=3, # 每个样本的直方图采样数 + feat_dim=400, # 特征维度 + soft_nms_alpha=0.4, # soft-NMS 的 alpha 值 + soft_nms_low_threshold=0.5, # soft-NMS 的下界 + soft_nms_high_threshold=0.9, # soft-NMS 的上界 + post_process_top_k=100) # 后处理得到的最好的 K 个 proposal + # 模型训练和测试的设置 + train_cfg = None # 训练 BMN 的超参配置 + test_cfg = dict(average_clips='score') # 测试 BMN 的超参配置 + + # 数据集设置 + dataset_type = 'ActivityNetDataset' # 训练,验证,测试的数据集类型 + data_root = 'data/activitynet_feature_cuhk/csv_mean_100/' # 训练集的根目录 + data_root_val = 'data/activitynet_feature_cuhk/csv_mean_100/' # 验证集和测试集的根目录 + ann_file_train = 'data/ActivityNet/anet_anno_train.json' # 训练集的标注文件 + ann_file_val = 'data/ActivityNet/anet_anno_val.json' # 验证集的标注文件 + ann_file_test = 'data/ActivityNet/anet_anno_test.json' # 测试集的标注文件 + + train_pipeline = [ # 训练数据前处理流水线步骤组成的列表 + dict(type='LoadLocalizationFeature'), # 加载时序动作检测特征 + dict(type='GenerateLocalizationLabels'), # 生成时序动作检测标签 + dict( # Collect 类的配置 + type='Collect', # Collect 类决定哪些键会被传递到时序检测器中 + 
keys=['raw_feature', 'gt_bbox'], # 输入的键 + meta_name='video_meta', # 元名称 + meta_keys=['video_name']), # 输入的元键 + dict( # ToTensor 类的配置 + type='ToTensor', # ToTensor 类将其他类型转化为 Tensor 类型 + keys=['raw_feature']), # 将被从其他类型转化为 Tensor 类型的特征 + dict( # ToDataContainer 类的配置 + type='ToDataContainer', # 将一些信息转入到 ToDataContainer 中 + fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # 携带额外键和属性的信息域 + ] + val_pipeline = [ # 验证数据前处理流水线步骤组成的列表 + dict(type='LoadLocalizationFeature'), # 加载时序动作检测特征 + dict(type='GenerateLocalizationLabels'), # 生成时序动作检测标签 + dict( # Collect 类的配置 + type='Collect', # Collect 类决定哪些键会被传递到时序检测器中 + keys=['raw_feature', 'gt_bbox'], # 输入的键 + meta_name='video_meta', # 元名称 + meta_keys=[ + 'video_name', 'duration_second', 'duration_frame', 'annotations', + 'feature_frame' + ]), # 输入的元键 + dict( # ToTensor 类的配置 + type='ToTensor', # ToTensor 类将其他类型转化为 Tensor 类型 + keys=['raw_feature']), # 将被从其他类型转化为 Tensor 类型的特征 + dict( # ToDataContainer 类的配置 + type='ToDataContainer', # 将一些信息转入到 ToDataContainer 中 + fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # 携带额外键和属性的信息域 + ] + test_pipeline = [ # 测试数据前处理流水线步骤组成的列表 + dict(type='LoadLocalizationFeature'), # 加载时序动作检测特征 + dict( # Collect 类的配置 + type='Collect', # Collect 类决定哪些键会被传递到时序检测器中 + keys=['raw_feature'], # 输入的键 + meta_name='video_meta', # 元名称 + meta_keys=[ + 'video_name', 'duration_second', 'duration_frame', 'annotations', + 'feature_frame' + ]), # 输入的元键 + dict( # ToTensor 类的配置 + type='ToTensor', # ToTensor 类将其他类型转化为 Tensor 类型 + keys=['raw_feature']), # 将被从其他类型转化为 Tensor 类型的特征 + ] + data = dict( # 数据的配置 + videos_per_gpu=8, # 单个 GPU 的批大小 + workers_per_gpu=8, # 单个 GPU 的 dataloader 的进程 + train_dataloader=dict( # 训练过程 dataloader 的额外设置 + drop_last=True), # 在训练过程中是否丢弃最后一个批次 + val_dataloader=dict( # 验证过程 dataloader 的额外设置 + videos_per_gpu=1), # 单个 GPU 的批大小 + test_dataloader=dict( # 测试过程 dataloader 的额外设置 + videos_per_gpu=2), # 单个 GPU 的批大小 + test=dict( # 测试数据集的设置 + type=dataset_type, + ann_file=ann_file_test, + 
pipeline=test_pipeline, + data_prefix=data_root_val), + val=dict( # 验证数据集的设置 + type=dataset_type, + ann_file=ann_file_val, + pipeline=val_pipeline, + data_prefix=data_root_val), + train=dict( # 训练数据集的设置 + type=dataset_type, + ann_file=ann_file_train, + pipeline=train_pipeline, + data_prefix=data_root)) + + # 优化器设置 + optimizer = dict( + # 构建优化器的设置,支持: + # (1) 所有 PyTorch 原生的优化器,这些优化器的参数和 PyTorch 对应的一致; + # (2) 自定义的优化器,这些优化器在 `constructor` 的基础上构建。 + # 更多细节可参考 "tutorials/5_new_modules.md" 部分 + type='Adam', # 优化器类型, 参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details + lr=0.001, # 学习率, 参数的细节使用可参考 PyTorch 的对应文档 + weight_decay=0.0001) # Adam 优化器的权重衰减 + optimizer_config = dict( # 用于构建优化器钩子的设置 + grad_clip=None) # 大部分的方法不使用梯度裁剪 + # 学习策略设置 + lr_config = dict( # 用于注册学习率调整钩子的设置 + policy='step', # 调整器策略, 支持 CosineAnnealing,Cyclic等方法。更多细节可参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9 + step=7) # 学习率衰减步长 + + total_epochs = 9 # 训练模型的总周期数 + checkpoint_config = dict( # 模型权重文件钩子设置,更多细节可参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py + interval=1) # 模型权重文件保存间隔 + evaluation = dict( # 训练期间做验证的设置 + interval=1, # 执行验证的间隔 + metrics=['AR@AN']) # 验证方法 + log_config = dict( # 注册日志钩子的设置 + interval=50, # 打印日志间隔 + hooks=[ # 训练期间执行的钩子 + dict(type='TextLoggerHook'), # 记录训练过程信息的日志 + # dict(type='TensorboardLoggerHook'), # 同时支持 Tensorboard 日志 + ]) + + # 运行设置 + dist_params = dict(backend='nccl') # 建立分布式训练的设置(端口号,多 GPU 通信框架等) + log_level = 'INFO' # 日志等级 + work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/' # 记录当前实验日志和模型权重文件的文件夹 + load_from = None # 从给定路径加载模型作为预训练模型. 这个选项不会用于断点恢复训练 + resume_from = None # 加载给定路径的模型权重文件作为断点续连的模型, 训练将从该时间点保存的周期点继续进行 + workflow = [('train', 1)] # runner 的执行流. 
[('train', 1)] 代表只有一个执行流,并且这个名为 train 的执行流只执行一次 + output_config = dict( # 时序检测器输出设置 + out=f'{work_dir}/results.json', # 输出文件路径 + output_format='json') # 输出文件格式 + ``` + +### 动作识别的配置文件系统 + +MMAction2 将模块化设计整合到配置文件系统中,以便执行各类不同实验。 + +- 以 TSN 为例 + + 为了帮助用户理解 MMAction2 的配置文件结构,以及动作识别系统中的一些模块,这里以 TSN 为例,给出其配置文件的注释。 + 对于每个模块的详细用法以及对应参数的选择,请参照 [API 文档](https://mmaction2.readthedocs.io/en/latest/api.html)。 + + ```python + # 模型设置 + model = dict( # 模型的配置 + type='Recognizer2D', # 动作识别器的类型 + backbone=dict( # Backbone 字典设置 + type='ResNet', # Backbone 名 + pretrained='torchvision://resnet50', # 预训练模型的 url 或文件位置 + depth=50, # ResNet 模型深度 + norm_eval=False), # 训练时是否设置 BN 层为验证模式 + cls_head=dict( # 分类器字典设置 + type='TSNHead', # 分类器名 + num_classes=400, # 分类类别数量 + in_channels=2048, # 分类器里输入通道数 + spatial_type='avg', # 空间维度的池化种类 + consensus=dict(type='AvgConsensus', dim=1), # consensus 模块设置 + dropout_ratio=0.4, # dropout 层概率 + init_std=0.01), # 线性层初始化 std 值 + # 模型训练和测试的设置 + train_cfg=None, # 训练 TSN 的超参配置 + test_cfg=dict(average_clips=None)) # 测试 TSN 的超参配置 + + # 数据集设置 + dataset_type = 'RawframeDataset' # 训练,验证,测试的数据集类型 + data_root = 'data/kinetics400/rawframes_train/' # 训练集的根目录 + data_root_val = 'data/kinetics400/rawframes_val/' # 验证集,测试集的根目录 + ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' # 训练集的标注文件 + ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # 验证集的标注文件 + ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # 测试集的标注文件 + img_norm_cfg = dict( # 图像正则化参数设置 + mean=[123.675, 116.28, 103.53], # 图像正则化平均值 + std=[58.395, 57.12, 57.375], # 图像正则化方差 + to_bgr=False) # 是否将通道数从 RGB 转为 BGR + + train_pipeline = [ # 训练数据前处理流水线步骤组成的列表 + dict( # SampleFrames 类的配置 + type='SampleFrames', # 选定采样哪些视频帧 + clip_len=1, # 每个输出视频片段的帧 + frame_interval=1, # 所采相邻帧的时序间隔 + num_clips=3), # 所采帧片段的数量 + dict( # RawFrameDecode 类的配置 + type='RawFrameDecode'), # 给定帧序列,加载对应帧,解码对应帧 + dict( # Resize 类的配置 + type='Resize', # 调整图片尺寸 + scale=(-1, 256)), # 调整比例 + dict( # 
MultiScaleCrop 类的配置 + type='MultiScaleCrop', # 多尺寸裁剪,随机从一系列给定尺寸中选择一个比例尺寸进行裁剪 + input_size=224, # 网络输入 + scales=(1, 0.875, 0.75, 0.66), # 长宽比例选择范围 + random_crop=False, # 是否进行随机裁剪 + max_wh_scale_gap=1), # 长宽最大比例间隔 + dict( # Resize 类的配置 + type='Resize', # 调整图片尺寸 + scale=(224, 224), # 调整比例 + keep_ratio=False), # 是否保持长宽比 + dict( # Flip 类的配置 + type='Flip', # 图片翻转 + flip_ratio=0.5), # 执行翻转几率 + dict( # Normalize 类的配置 + type='Normalize', # 图片正则化 + **img_norm_cfg), # 图片正则化参数 + dict( # FormatShape 类的配置 + type='FormatShape', # 将图片格式转变为给定的输入格式 + input_format='NCHW'), # 最终的图片组成格式 + dict( # Collect 类的配置 + type='Collect', # Collect 类决定哪些键会被传递到行为识别器中 + keys=['imgs', 'label'], # 输入的键 + meta_keys=[]), # 输入的元键 + dict( # ToTensor 类的配置 + type='ToTensor', # ToTensor 类将其他类型转化为 Tensor 类型 + keys=['imgs', 'label']) # 将被从其他类型转化为 Tensor 类型的特征 + ] + val_pipeline = [ # 验证数据前处理流水线步骤组成的列表 + dict( # SampleFrames 类的配置 + type='SampleFrames', # 选定采样哪些视频帧 + clip_len=1, # 每个输出视频片段的帧 + frame_interval=1, # 所采相邻帧的时序间隔 + num_clips=3, # 所采帧片段的数量 + test_mode=True), # 是否设置为测试模式采帧 + dict( # RawFrameDecode 类的配置 + type='RawFrameDecode'), # 给定帧序列,加载对应帧,解码对应帧 + dict( # Resize 类的配置 + type='Resize', # 调整图片尺寸 + scale=(-1, 256)), # 调整比例 + dict( # CenterCrop 类的配置 + type='CenterCrop', # 中心裁剪 + crop_size=224), # 裁剪部分的尺寸 + dict( # Flip 类的配置 + type='Flip', # 图片翻转 + flip_ratio=0), # 翻转几率 + dict( # Normalize 类的配置 + type='Normalize', # 图片正则化 + **img_norm_cfg), # 图片正则化参数 + dict( # FormatShape 类的配置 + type='FormatShape', # 将图片格式转变为给定的输入格式 + input_format='NCHW'), # 最终的图片组成格式 + dict( # Collect 类的配置 + type='Collect', # Collect 类决定哪些键会被传递到行为识别器中 + keys=['imgs', 'label'], # 输入的键 + meta_keys=[]), # 输入的元键 + dict( # ToTensor 类的配置 + type='ToTensor', # ToTensor 类将其他类型转化为 Tensor 类型 + keys=['imgs']) # 将被从其他类型转化为 Tensor 类型的特征 + ] + test_pipeline = [ # 测试数据前处理流水线步骤组成的列表 + dict( # SampleFrames 类的配置 + type='SampleFrames', # 选定采样哪些视频帧 + clip_len=1, # 每个输出视频片段的帧 + frame_interval=1, # 所采相邻帧的时序间隔 + num_clips=25, # 所采帧片段的数量 + test_mode=True), # 
是否设置为测试模式采帧 + dict( # RawFrameDecode 类的配置 + type='RawFrameDecode'), # 给定帧序列,加载对应帧,解码对应帧 + dict( # Resize 类的配置 + type='Resize', # 调整图片尺寸 + scale=(-1, 256)), # 调整比例 + dict( # TenCrop 类的配置 + type='TenCrop', # 裁剪 10 个区域 + crop_size=224), # 裁剪部分的尺寸 + dict( # Flip 类的配置 + type='Flip', # 图片翻转 + flip_ratio=0), # 执行翻转几率 + dict( # Normalize 类的配置 + type='Normalize', # 图片正则化 + **img_norm_cfg), # 图片正则化参数 + dict( # FormatShape 类的配置 + type='FormatShape', # 将图片格式转变为给定的输入格式 + input_format='NCHW'), # 最终的图片组成格式 + dict( # Collect 类的配置 + type='Collect', # Collect 类决定哪些键会被传递到行为识别器中 + keys=['imgs', 'label'], # 输入的键 + meta_keys=[]), # 输入的元键 + dict( # ToTensor 类的配置 + type='ToTensor', # ToTensor 类将其他类型转化为 Tensor 类型 + keys=['imgs']) # 将被从其他类型转化为 Tensor 类型的特征 + ] + data = dict( # 数据的配置 + videos_per_gpu=32, # 单个 GPU 的批大小 + workers_per_gpu=2, # 单个 GPU 的 dataloader 的进程 + train_dataloader=dict( # 训练过程 dataloader 的额外设置 + drop_last=True), # 在训练过程中是否丢弃最后一个批次 + val_dataloader=dict( # 验证过程 dataloader 的额外设置 + videos_per_gpu=1), # 单个 GPU 的批大小 + test_dataloader=dict( # 测试过程 dataloader 的额外设置 + videos_per_gpu=2), # 单个 GPU 的批大小 + train=dict( # 训练数据集的设置 + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( # 验证数据集的设置 + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( # 测试数据集的设置 + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + # 优化器设置 + optimizer = dict( + # 构建优化器的设置,支持: + # (1) 所有 PyTorch 原生的优化器,这些优化器的参数和 PyTorch 对应的一致; + # (2) 自定义的优化器,这些优化器在 `constructor` 的基础上构建。 + # 更多细节可参考 "tutorials/5_new_modules.md" 部分 + type='SGD', # 优化器类型, 参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 + lr=0.01, # 学习率, 参数的细节使用可参考 PyTorch 的对应文档 + momentum=0.9, # 动量大小 + weight_decay=0.0001) # SGD 优化器权重衰减 + optimizer_config = dict( # 用于构建优化器钩子的设置 + grad_clip=dict(max_norm=40, norm_type=2)) # 使用梯度裁剪 + # 学习策略设置 + 
lr_config = dict( # 用于注册学习率调整钩子的设置 + policy='step', # 调整器策略, 支持 CosineAnnealing,Cyclic等方法。更多细节可参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9 + step=[40, 80]) # 学习率衰减步长 + total_epochs = 100 # 训练模型的总周期数 + checkpoint_config = dict( # 模型权重钩子设置,更多细节可参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py + interval=5) # 模型权重文件保存间隔 + evaluation = dict( # 训练期间做验证的设置 + interval=5, # 执行验证的间隔 + metrics=['top_k_accuracy', 'mean_class_accuracy'], # 验证方法 + save_best='top_k_accuracy') # 设置 `top_k_accuracy` 作为指示器,用于存储最好的模型权重文件 + log_config = dict( # 注册日志钩子的设置 + interval=20, # 打印日志间隔 + hooks=[ # 训练期间执行的钩子 + dict(type='TextLoggerHook'), # 记录训练过程信息的日志 + # dict(type='TensorboardLoggerHook'), # 同时支持 Tensorboard 日志 + ]) + + # 运行设置 + dist_params = dict(backend='nccl') # 建立分布式训练的设置,其中端口号也可以设置 + log_level = 'INFO' # 日志等级 + work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/' # 记录当前实验日志和模型权重文件的文件夹 + load_from = None # 从给定路径加载模型作为预训练模型. 这个选项不会用于断点恢复训练 + resume_from = None # 加载给定路径的模型权重文件作为断点续连的模型, 训练将从该时间点保存的周期点继续进行 + workflow = [('train', 1)] # runner 的执行流. 
[('train', 1)] 代表只有一个执行流,并且这个名为 train 的执行流只执行一次 + + ``` + +### 时空动作检测的配置文件系统 + +MMAction2 将模块化设计整合到配置文件系统中,以便于执行各种不同的实验。 + +- 以 FastRCNN 为例 + + 为了帮助用户理解 MMAction2 的完整配置文件结构,以及时空检测系统中的一些模块,这里以 FastRCNN 为例,给出其配置文件的注释。 + 对于每个模块的详细用法以及对应参数的选择,请参照 [API 文档](https://mmaction2.readthedocs.io/en/latest/api.html)。 + + ```python + # 模型设置 + model = dict( # 模型的配置 + type='FastRCNN', # 时空检测器类型 + backbone=dict( # Backbone 字典设置 + type='ResNet3dSlowOnly', # Backbone 名 + depth=50, # ResNet 模型深度 + pretrained=None, # 预训练模型的 url 或文件位置 + pretrained2d=False, # 预训练模型是否为 2D 模型 + lateral=False, # backbone 是否有侧连接 + num_stages=4, # ResNet 模型阶数 + conv1_kernel=(1, 7, 7), # Conv1 卷积核尺寸 + conv1_stride_t=1, # Conv1 时序步长 + pool1_stride_t=1, # Pool1 时序步长 + spatial_strides=(1, 2, 2, 1)), # 每个 ResNet 阶的空间步长 + roi_head=dict( # roi_head 字典设置 + type='AVARoIHead', # roi_head 名 + bbox_roi_extractor=dict( # bbox_roi_extractor 字典设置 + type='SingleRoIExtractor3D', # bbox_roi_extractor 名 + roi_layer_type='RoIAlign', # RoI op 类型 + output_size=8, # RoI op 输出特征尺寸 + with_temporal_pool=True), # 时序维度是否要经过池化 + bbox_head=dict( # bbox_head 字典设置 + type='BBoxHeadAVA', # bbox_head 名 + in_channels=2048, # 输入特征通道数 + num_classes=81, # 动作类别数 + 1(背景) + multilabel=True, # 数据集是否多标签 + dropout_ratio=0.5)), # dropout 比率 + # 模型训练和测试的设置 + train_cfg=dict( # 训练 FastRCNN 的超参配置 + rcnn=dict( # rcnn 训练字典设置 + assigner=dict( # assigner 字典设置 + type='MaxIoUAssignerAVA', # assigner 名 + pos_iou_thr=0.9, # 正样本 IoU 阈值, > pos_iou_thr -> positive + neg_iou_thr=0.9, # 负样本 IoU 阈值, < neg_iou_thr -> negative + min_pos_iou=0.9), # 正样本最小可接受 IoU + sampler=dict( # sample 字典设置 + type='RandomSampler', # sampler 名 + num=32, # sampler 批大小 + pos_fraction=1, # sampler 正样本边界框比率 + neg_pos_ub=-1, # 负样本数转正样本数的比率上界 + add_gt_as_proposals=True), # 是否添加 ground truth 为候选 + pos_weight=1.0, # 正样本 loss 权重 + debug=False)), # 是否为 debug 模式 + test_cfg=dict( # 测试 FastRCNN 的超参设置 + rcnn=dict( # rcnn 测试字典设置 + action_thr=0.002))) # 某行为的阈值 + + # 数据集设置 + dataset_type = 'AVADataset' # 
训练,验证,测试的数据集类型 + data_root = 'data/ava/rawframes' # 训练集的根目录 + anno_root = 'data/ava/annotations' # 标注文件目录 + + ann_file_train = f'{anno_root}/ava_train_v2.1.csv' # 训练集的标注文件 + ann_file_val = f'{anno_root}/ava_val_v2.1.csv' # 验证集的标注文件 + + exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' # 训练除外数据集文件路径 + exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' # 验证除外数据集文件路径 + + label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' # 标签文件路径 + + proposal_file_train = f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl' # 训练样本检测候选框的文件路径 + proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' # 验证样本检测候选框的文件路径 + + img_norm_cfg = dict( # 图像归一化参数设置 + mean=[123.675, 116.28, 103.53], # 图像归一化平均值 + std=[58.395, 57.12, 57.375], # 图像归一化标准差 + to_bgr=False) # 是否将通道顺序从 RGB 转为 BGR + + train_pipeline = [ # 训练数据前处理流水线步骤组成的列表 + dict( # AVASampleFrames 类的配置 + type='AVASampleFrames', # 选定采样哪些视频帧 + clip_len=4, # 每个输出视频片段的帧数 + frame_interval=16), # 所采相邻帧的时序间隔 + dict( # RawFrameDecode 类的配置 + type='RawFrameDecode'), # 给定帧序列,加载对应帧,解码对应帧 + dict( # RandomRescale 类的配置 + type='RandomRescale', # 给定一个范围,进行随机短边缩放 + scale_range=(256, 320)), # RandomRescale 的短边缩放范围 + dict( # RandomCrop 类的配置 + type='RandomCrop', # 给定一个尺寸进行随机裁剪 + size=256), # 裁剪尺寸 + dict( # Flip 类的配置 + type='Flip', # 图片翻转 + flip_ratio=0.5), # 执行翻转几率 + dict( # Normalize 类的配置 + type='Normalize', # 图片归一化 + **img_norm_cfg), # 图片归一化参数 + dict( # FormatShape 类的配置 + type='FormatShape', # 将图片格式转变为给定的输入格式 + input_format='NCTHW', # 最终的图片组成格式 + collapse=True), # 当 N == 1 时去掉 N 维度 + dict( # Rename 类的配置 + type='Rename', # 重命名 key 名 + mapping=dict(imgs='img')), # 改名映射字典 + dict( # ToTensor 类的配置 + type='ToTensor', # ToTensor 类将其他类型转化为 Tensor 类型 + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), # 将被从其他类型转化为 Tensor 类型的特征 + dict( # ToDataContainer 类的配置 + type='ToDataContainer', # 将一些信息转入到 ToDataContainer 中 + fields=[ # 转化为 DataContainer 的域 + dict( # 域字典 + 
key=['proposals', 'gt_bboxes', 'gt_labels'], # 将转化为 DataContainer 的键 + stack=False)]), # 是否要堆叠这些 tensor + dict( # Collect 类的配置 + type='Collect', # Collect 类决定哪些键会被传递到时空检测器中 + keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], # 输入的键 + meta_keys=['scores', 'entity_ids']), # 输入的元键 + ] + + val_pipeline = [ # 验证数据前处理流水线步骤组成的列表 + dict( # AVASampleFrames 类的配置 + type='AVASampleFrames', # 选定采样哪些视频帧 + clip_len=4, # 每个输出视频片段的帧数 + frame_interval=16), # 所采相邻帧的时序间隔 + dict( # RawFrameDecode 类的配置 + type='RawFrameDecode'), # 给定帧序列,加载对应帧,解码对应帧 + dict( # Resize 类的配置 + type='Resize', # 调整图片尺寸 + scale=(-1, 256)), # 调整比例 + dict( # Normalize 类的配置 + type='Normalize', # 图片归一化 + **img_norm_cfg), # 图片归一化参数 + dict( # FormatShape 类的配置 + type='FormatShape', # 将图片格式转变为给定的输入格式 + input_format='NCTHW', # 最终的图片组成格式 + collapse=True), # 当 N == 1 时去掉 N 维度 + dict( # Rename 类的配置 + type='Rename', # 重命名 key 名 + mapping=dict(imgs='img')), # 改名映射字典 + dict( # ToTensor 类的配置 + type='ToTensor', # ToTensor 类将其他类型转化为 Tensor 类型 + keys=['img', 'proposals']), # 将被从其他类型转化为 Tensor 类型的特征 + dict( # ToDataContainer 类的配置 + type='ToDataContainer', # 将一些信息转入到 ToDataContainer 中 + fields=[ # 转化为 DataContainer 的域 + dict( # 域字典 + key=['proposals'], # 将转化为 DataContainer 的键 + stack=False)]), # 是否要堆叠这些 tensor + dict( # Collect 类的配置 + type='Collect', # Collect 类决定哪些键会被传递到时空检测器中 + keys=['img', 'proposals'], # 输入的键 + meta_keys=['scores', 'entity_ids'], # 输入的元键 + nested=True) # 是否将数据包装为嵌套列表 + ] + + data = dict( # 数据的配置 + videos_per_gpu=16, # 单个 GPU 的批大小 + workers_per_gpu=2, # 单个 GPU 的 dataloader 的进程 + val_dataloader=dict( # 验证过程 dataloader 的额外设置 + videos_per_gpu=1), # 单个 GPU 的批大小 + train=dict( # 训练数据集的设置 + type=dataset_type, + ann_file=ann_file_train, + exclude_file=exclude_file_train, + pipeline=train_pipeline, + label_file=label_file, + proposal_file=proposal_file_train, + person_det_score_thr=0.9, + data_prefix=data_root), + val=dict( # 验证数据集的设置 + type=dataset_type, + ann_file=ann_file_val, + exclude_file=exclude_file_val, + 
pipeline=val_pipeline, + label_file=label_file, + proposal_file=proposal_file_val, + person_det_score_thr=0.9, + data_prefix=data_root)) + data['test'] = data['val'] # 将验证数据集设置复制到测试数据集设置 + + # 优化器设置 + optimizer = dict( + # 构建优化器的设置,支持: + # (1) 所有 PyTorch 原生的优化器,这些优化器的参数和 PyTorch 对应的一致; + # (2) 自定义的优化器,这些优化器在 `constructor` 的基础上构建。 + # 更多细节可参考 "tutorials/5_new_modules.md" 部分 + type='SGD', # 优化器类型, 参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 + lr=0.2, # 学习率, 参数的细节使用可参考 PyTorch 的对应文档 + momentum=0.9, # 动量大小 + weight_decay=0.00001) # SGD 优化器权重衰减 + + optimizer_config = dict( # 用于构建优化器钩子的设置 + grad_clip=dict(max_norm=40, norm_type=2)) # 使用梯度裁剪 + + lr_config = dict( # 用于注册学习率调整钩子的设置 + policy='step', # 调整器策略, 支持 CosineAnnealing,Cyclic等方法。更多细节可参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9 + step=[40, 80], # 学习率衰减步长 + warmup='linear', # Warmup 策略 + warmup_by_epoch=True, # Warmup 单位为 epoch 还是 iteration + warmup_iters=5, # warmup 数 + warmup_ratio=0.1) # 初始学习率为 warmup_ratio * lr + + total_epochs = 20 # 训练模型的总周期数 + checkpoint_config = dict( # 模型权重文件钩子设置,更多细节可参考 https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py + interval=1) # 模型权重文件保存间隔 + workflow = [('train', 1)] # runner 的执行流. [('train', 1)] 代表只有一个执行流,并且这个名为 train 的执行流只执行一次 + evaluation = dict( # 训练期间做验证的设置 + interval=1, save_best='mAP@0.5IOU') # 执行验证的间隔,以及设置 `mAP@0.5IOU` 作为指示器,用于存储最好的模型权重文件 + log_config = dict( # 注册日志钩子的设置 + interval=20, # 打印日志间隔 + hooks=[ # 训练期间执行的钩子 + dict(type='TextLoggerHook'), # 记录训练过程信息的日志 + ]) + + # 运行设置 + dist_params = dict(backend='nccl') # 建立分布式训练的设置,其中端口号也可以设置 + log_level = 'INFO' # 日志等级 + work_dir = ('./work_dirs/ava/' # 记录当前实验日志和模型权重文件的文件夹 + 'slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb') + load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' # 从给定路径加载模型作为预训练模型. 
这个选项不会用于断点恢复训练 + 'slowonly_r50_4x16x1_256e_kinetics400_rgb/' + 'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth') + resume_from = None # 加载给定路径的模型权重文件作为断点续训的模型, 训练将从该权重文件保存时的周期继续进行 + ``` + +## 常见问题 + +### 配置文件中的中间变量 + +配置文件中会用到一些中间变量,如 `train_pipeline`/`val_pipeline`/`test_pipeline`, `ann_file_train`/`ann_file_val`/`ann_file_test`, `img_norm_cfg` 等。 + +例如,首先定义中间变量 `train_pipeline`/`val_pipeline`/`test_pipeline`,再将上述变量传递到 `data`。因此,`train_pipeline`/`val_pipeline`/`test_pipeline` 为中间变量。 + +这里也定义了 `ann_file_train`/`ann_file_val`/`ann_file_test` 和 `data_root`/`data_root_val` 为数据处理流程提供一些基本信息。 + +此外,使用 `img_norm_cfg` 作为中间变量,构建一些数据增强组件。 + +```python +... +dataset_type = 'RawframeDataset' +data_root = 'data/kinetics400/rawframes_train' +data_root_val = 'data/kinetics400/rawframes_val' +ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' +ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' +ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' + +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) + +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', 
input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=10, + test_mode=True), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='ThreeCrop', crop_size=256), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] + +data = dict( + videos_per_gpu=8, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=test_pipeline)) +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/2_finetune.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/2_finetune.md new file mode 100644 index 0000000000000000000000000000000000000000..dc3f19db25b7992d086bddb6da317f5d74d85aab --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/2_finetune.md @@ -0,0 +1,93 @@ +# 教程 2:如何微调模型 + +本教程介绍如何使用预训练模型在其他数据集上进行微调。 + + + +- [概要](#%E6%A6%82%E8%A6%81) +- [修改 Head](#%E4%BF%AE%E6%94%B9-Head) +- [修改数据集](#%E4%BF%AE%E6%94%B9%E6%95%B0%E6%8D%AE%E9%9B%86) +- [修改训练策略](#%E4%BF%AE%E6%94%B9%E8%AE%AD%E7%BB%83%E7%AD%96%E7%95%A5) +- [使用预训练模型](#%E4%BD%BF%E7%94%A8%E9%A2%84%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B) + + + +## 概要 + +对新数据集上的模型进行微调需要进行两个步骤: + +1. 增加对新数据集的支持。详情请见 [教程 3:如何增加新数据集](3_new_dataset.md) +2. 
修改配置文件。这部分将在本教程中做具体讨论。 + +例如,如果用户想要微调 Kinetics-400 数据集的预训练模型到另一个数据集上,如 UCF101,则需要注意 [配置文件](1_config.md) 中 Head、数据集、训练策略、预训练模型四个部分,下面分别介绍。 + +## 修改 Head + +`cls_head` 中的 `num_classes` 参数需改为新数据集中的类别数。 +预训练模型中,除了最后一层外的权重都会被重新利用,因此这个改动是安全的。 +例如,UCF101 拥有 101 类行为,因此需要把 400 (Kinetics-400 的类别数) 改为 101。 + +```python +model = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + pretrained='torchvision://resnet50', + depth=50, + norm_eval=False), + cls_head=dict( + type='TSNHead', + num_classes=101, # 从 400 改为 101 + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01), + train_cfg=None, + test_cfg=dict(average_clips=None)) +``` + +其中, `pretrained='torchvision://resnet50'` 表示通过 ImageNet 预训练权重初始化 backbone。 +然而,模型微调时的预训练权重一般通过 `load_from`(而不是 `pretrained`)指定。 + +## 修改数据集 + +MMAction2 支持 UCF101, Kinetics-400, Moments in Time, Multi-Moments in Time, THUMOS14, +Something-Something V1&V2, ActivityNet 等数据集。 +用户可将自建数据集转换为已有数据集格式。 +对动作识别任务来讲,MMAction2 提供了 `RawframeDataset` 和 `VideoDataset` 等通用的数据集读取类,数据集格式相对简单。 +以 `UCF101` 和 `RawframeDataset` 为例: + +```python +# 数据集设置 +dataset_type = 'RawframeDataset' +data_root = 'data/ucf101/rawframes_train/' +data_root_val = 'data/ucf101/rawframes_val/' +ann_file_train = 'data/ucf101/ucf101_train_list.txt' +ann_file_val = 'data/ucf101/ucf101_val_list.txt' +ann_file_test = 'data/ucf101/ucf101_val_list.txt' +``` + +## 修改训练策略 + +通常情况下,设置较小的学习率,并只训练少量周期(epoch),即可取得较好的微调效果。 + +```python +# 优化器 +optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001) # 从 0.01 改为 0.005 +optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2)) +# 学习策略 +lr_config = dict(policy='step', step=[20, 40]) # step 与 total_epochs 相适应 +total_epochs = 50 # 从 100 改为 50 +checkpoint_config = dict(interval=5) +``` + +## 使用预训练模型 + +若要将预训练模型用于整个网络(主干网络设置中的 `pretrained`,仅会在主干网络模型上加载预训练参数),可通过 `load_from` 指定模型文件路径或模型链接,实现预训练权重导入。 +MMAction2 在 `configs/_base_/default_runtime.py` 文件中将 
`load_from=None` 设为默认。由于配置文件的可继承性,用户可直接在下游配置文件中设置 `load_from` 的值来进行更改。 + +```python +# 将预训练模型用于整个 TSN 网络 +load_from = 'https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/mmaction-v1/recognition/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' # 模型路径可以在 model zoo 中找到 +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/3_new_dataset.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/3_new_dataset.md new file mode 100644 index 0000000000000000000000000000000000000000..9b2368c15c9edae328af89abc80e8a82259ccada --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/3_new_dataset.md @@ -0,0 +1,245 @@ +# 教程 3:如何增加新数据集 + +在本教程中,我们将介绍一些有关如何按已支持的数据格式进行数据组织,和组合已有数据集来自定义数据集的方法。 + + + +- [通过重组数据来自定义数据集](#%E9%80%9A%E8%BF%87%E9%87%8D%E7%BB%84%E6%95%B0%E6%8D%AE%E6%9D%A5%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86) + - [将数据集重新组织为现有格式](#%E5%B0%86%E6%95%B0%E6%8D%AE%E9%9B%86%E9%87%8D%E6%96%B0%E7%BB%84%E7%BB%87%E4%B8%BA%E7%8E%B0%E6%9C%89%E6%A0%BC%E5%BC%8F) + - [自定义数据集的示例](#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86%E7%9A%84%E7%A4%BA%E4%BE%8B) +- [通过组合已有数据集来自定义数据集](#%E9%80%9A%E8%BF%87%E7%BB%84%E5%90%88%E5%B7%B2%E6%9C%89%E6%95%B0%E6%8D%AE%E9%9B%86%E6%9D%A5%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86) + - [重复数据集](#%E9%87%8D%E5%A4%8D%E6%95%B0%E6%8D%AE%E9%9B%86) + + + +## 通过重组数据来自定义数据集 + +### 将数据集重新组织为现有格式 + +最简单的方法是将数据集转换为现有的数据集格式(RawframeDataset 或 VideoDataset)。 + +有三种标注文件: + +- 帧标注(rawframe annotation) + + 帧数据集(rawframe dataset)标注文件由多行文本组成,每行代表一个样本,每个样本分为三个部分,分别是 `帧(相对)文件夹`(rawframe directory of relative path), + `总帧数`(total frames)以及 `标签`(label),通过空格进行划分 + + 示例如下: + + ``` + some/directory-1 163 1 + some/directory-2 122 1 + some/directory-3 258 2 + some/directory-4 234 2 + some/directory-5 295 3 + some/directory-6 121 3 + ``` + +- 视频标注(video annotation) + + 视频数据集(video dataset)标注文件由多行文本组成,每行代表一个样本,每个样本分为两个部分,分别是 `文件(相对)路径`(filepath of relative path) + 和 
`标签`(label),通过空格进行划分 + + 示例如下: + + ``` + some/path/000.mp4 1 + some/path/001.mp4 1 + some/path/002.mp4 2 + some/path/003.mp4 2 + some/path/004.mp4 3 + some/path/005.mp4 3 + ``` + +- ActivityNet 标注 + + ActivityNet 数据集的标注文件是一个 json 文件。每个键是一个视频名,其对应的值是这个视频的元数据和注释。 + + 示例如下: + + ``` + { + "video1": { + "duration_second": 211.53, + "duration_frame": 6337, + "annotations": [ + { + "segment": [ + 30.025882995319815, + 205.2318595943838 + ], + "label": "Rock climbing" + } + ], + "feature_frame": 6336, + "fps": 30.0, + "rfps": 29.9579255898 + }, + "video2": { + "duration_second": 26.75, + "duration_frame": 647, + "annotations": [ + { + "segment": [ + 2.578755070202808, + 24.914101404056165 + ], + "label": "Drinking beer" + } + ], + "feature_frame": 624, + "fps": 24.0, + "rfps": 24.1869158879 + } + } + ``` + +有两种使用自定义数据集的方法: + +- 在线转换 + + 用户可以通过继承 [BaseDataset](/mmaction/datasets/base.py) 基类编写一个新的数据集类,并重写三个抽象类方法: + `load_annotations(self)`,`evaluate(self, results, metrics, logger)` 和 `dump_results(self, results, out)`, + 如 [RawframeDataset](/mmaction/datasets/rawframe_dataset.py),[VideoDataset](/mmaction/datasets/video_dataset.py) 或 [ActivityNetDataset](/mmaction/datasets/activitynet_dataset.py)。 + +- 本地转换 + + 用户可以转换标注文件格式为上述期望的格式,并将其存储为 pickle 或 json 文件,然后便可以应用于 `RawframeDataset`,`VideoDataset` 或 `ActivityNetDataset` 中。 + +数据预处理后,用户需要进一步修改配置文件以使用数据集。 这里展示了以帧形式使用自定义数据集的例子: + +在 `configs/task/method/my_custom_config.py` 下: + +```python +... +# 数据集设定 +dataset_type = 'RawframeDataset' +data_root = 'path/to/your/root' +data_root_val = 'path/to/your/root_val' +ann_file_train = 'data/custom/custom_train_list.txt' +ann_file_val = 'data/custom/custom_val_list.txt' +ann_file_test = 'data/custom/custom_val_list.txt' +... +data = dict( + videos_per_gpu=32, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + ...), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + ...), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + ...)) +... 
+``` + +### 自定义数据集的示例 + +假设标注以一种新格式存储在文本文件中,并且帧图像文件名符合类似 “img_00005.jpg” 的模板。 +那么视频标注将以如下形式存储在文本文件 `annotation.txt` 中。 + +``` +#文件夹,总帧数,类别 +D32_1gwq35E,299,66 +-G-5CJ0JkKY,249,254 +T4h1bvOd9DA,299,33 +4uZ27ivBl00,299,341 +0LfESFkfBSw,249,186 +-YIsNpBEx6c,299,169 +``` + +在 `mmaction/datasets/my_dataset.py` 中创建新的数据集类来加载数据 + +```python +import copy +import os.path as osp + +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class MyDataset(BaseDataset): + + def __init__(self, + ann_file, + pipeline, + data_prefix=None, + test_mode=False, + filename_tmpl='img_{:05}.jpg'): + # 需要把 data_prefix 一并传给基类,否则 test_mode 会被当作 data_prefix + super(MyDataset, self).__init__(ann_file, pipeline, data_prefix, test_mode) + + self.filename_tmpl = filename_tmpl + + def load_annotations(self): + video_infos = [] + with open(self.ann_file, 'r') as fin: + for line in fin: + # 跳过以 '#' 开头的表头行 + if line.startswith('#'): + continue + frame_dir, total_frames, label = line.strip().split(',') + if self.data_prefix is not None: + frame_dir = osp.join(self.data_prefix, frame_dir) + video_infos.append( + dict( + frame_dir=frame_dir, + total_frames=int(total_frames), + label=int(label))) + return video_infos + + def prepare_train_frames(self, idx): + results = copy.deepcopy(self.video_infos[idx]) + results['filename_tmpl'] = self.filename_tmpl + return self.pipeline(results) + + def prepare_test_frames(self, idx): + results = copy.deepcopy(self.video_infos[idx]) + results['filename_tmpl'] = self.filename_tmpl + return self.pipeline(results) + + def evaluate(self, + results, + metrics='top_k_accuracy', + topk=(1, 5), + logger=None): + pass +``` + +然后在配置文件中,用户可通过如下修改来使用 `MyDataset`: + +```python +dataset_A_train = dict( + type='MyDataset', + ann_file=ann_file_train, + pipeline=train_pipeline +) +``` + +## 通过组合已有数据集来自定义数据集 + +MMAction2 还支持组合已有数据集以进行训练。目前,它支持重复数据集(repeat dataset)。 + +### 重复数据集 + +MMAction2 使用 `RepeatDataset` 作为包装器来重复数据集。例如,假设原始数据集为 `Dataset_A`, +为了重复此数据集,可设置配置如下: + +```python +dataset_A_train = dict( + 
type='RepeatDataset', + times=N, + dataset=dict( # 这是 Dataset_A 的原始配置 + type='Dataset_A', + ... + pipeline=train_pipeline + ) + ) +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/4_data_pipeline.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/4_data_pipeline.md new file mode 100644 index 0000000000000000000000000000000000000000..8f52ff5a25aa61e79d4d058d225bb51af89b2c1f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/4_data_pipeline.md @@ -0,0 +1,257 @@ +# 教程 4:如何设计数据处理流程 + +在本教程中,我们将介绍一些有关数据前处理流水线设计的方法,以及如何为项目自定义和扩展自己的数据流水线。 + + + +- [教程 4:如何设计数据处理流程](#%E6%95%99%E7%A8%8B-4%E5%A6%82%E4%BD%95%E8%AE%BE%E8%AE%A1%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86%E6%B5%81%E7%A8%8B) + - [数据前处理流水线设计](#%E6%95%B0%E6%8D%AE%E5%89%8D%E5%A4%84%E7%90%86%E6%B5%81%E6%B0%B4%E7%BA%BF%E8%AE%BE%E8%AE%A1) + - [数据加载](#%E6%95%B0%E6%8D%AE%E5%8A%A0%E8%BD%BD) + - [数据预处理](#%E6%95%B0%E6%8D%AE%E9%A2%84%E5%A4%84%E7%90%86) + - [数据格式化](#%E6%95%B0%E6%8D%AE%E6%A0%BC%E5%BC%8F%E5%8C%96) + - [扩展和使用自定义流水线](#%E6%89%A9%E5%B1%95%E5%92%8C%E4%BD%BF%E7%94%A8%E8%87%AA%E5%AE%9A%E4%B9%89%E6%B5%81%E6%B0%B4%E7%BA%BF) + + + +## 数据前处理流水线设计 + +按照惯例,MMAction2 使用 `Dataset` 和 `DataLoader` 实现多进程数据加载。 `Dataset` 返回一个字典,作为模型的输入。 +由于动作识别和时序动作检测的数据大小不一定相同(图片大小,边界框大小等),MMAction2 使用 MMCV 中的 `DataContainer` 收集和分配不同大小的数据, +详情可见 [这里](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py)。 + +“数据前处理流水线” 和 “数据集构建” 是相互解耦的。通常,“数据集构建” 定义如何处理标注文件,“数据前处理流水线” 定义数据加载、预处理、格式化等功能(后文将详细介绍)。 +数据前处理流水线由一系列相互解耦的操作组成。每个操作都输入一个字典(dict),新增/更新/删除相关字段,最终输出该字典,作为下一个操作的输入。 + +我们在下图中展示了一个典型的流水线。 蓝色块是流水线操作。 +随着流水线的深入,每个操作都可以向结果字典添加新键(标记为绿色)或更新现有键(标记为橙色)。 + +![流水线](https://github.com/open-mmlab/mmaction2/raw/master/resources/data_pipeline.png) + +这些操作分为数据加载,数据预处理和数据格式化。 + +这里以 TSN 的数据前处理流水线为例: + +```python +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False) +train_pipeline = [ + dict(type='SampleFrames', clip_len=1, frame_interval=1, 
num_clips=3), + dict(type='RawFrameDecode', io_backend='disk'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +val_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=3, + test_mode=True), + dict(type='RawFrameDecode', io_backend='disk'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +test_pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=25, + test_mode=True), + dict(type='RawFrameDecode', io_backend='disk'), + dict(type='Resize', scale=(-1, 256)), + dict(type='TenCrop', crop_size=224), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) +] +``` + +MMAction2 也支持一些 lazy 操作符。 +Lazy 操作记录如何处理数据,但是它会推迟对原始数据的处理,直到进入 Fuse 阶段。 +具体而言,lazy 操作符避免了对原始数据的频繁读取和修改操作,只在最后的 Fuse 阶段中对原始数据进行了一次处理,从而加快了数据预处理速度,因此,推荐用户使用本功能。 + +这是使用 lazy 运算符的数据前处理流水线的例子: + +```python +train_pipeline = [ + dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1), + dict(type='RawFrameDecode', decoding_backend='turbojpeg'), + # 以下三个 lazy 操作符仅处理帧的 bbox 而不修改原始数据。 + dict(type='Resize', scale=(-1, 256), lazy=True), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0, + lazy=True), 
+ dict(type='Resize', scale=(224, 224), keep_ratio=False, lazy=True), + # lazy 操作符 “Flip” 仅记录是否应该翻转框架和翻转方向。 + dict(type='Flip', flip_ratio=0.5, lazy=True), + # 在 Fuse 阶段处理一次原始数据 + dict(type='Fuse'), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) +] +``` + +本节将所有操作分为数据加载、数据预处理、数据格式化三类,列出每个操作 新增/更新/删除 的相关字典字段,其中 `*` 代表所对应的键值不一定会被影响。 + +### 数据加载 + +`SampleFrames` + +- 新增: frame_inds, clip_len, frame_interval, num_clips, \*total_frames + +`DenseSampleFrames` + +- 新增: frame_inds, clip_len, frame_interval, num_clips, \*total_frames + +`PyAVDecode` + +- 新增: imgs, original_shape +- 更新: \*frame_inds + +`DecordDecode` + +- 新增: imgs, original_shape +- 更新: \*frame_inds + +`OpenCVDecode` + +- 新增: imgs, original_shape +- 更新: \*frame_inds + +`RawFrameDecode` + +- 新增: imgs, original_shape +- 更新: \*frame_inds + +### 数据预处理 + +`RandomCrop` + +- 新增: crop_bbox, img_shape +- 更新: imgs + +`RandomResizedCrop` + +- 新增: crop_bbox, img_shape +- 更新: imgs + +`MultiScaleCrop` + +- 新增: crop_bbox, img_shape, scales +- 更新: imgs + +`Resize` + +- 新增: img_shape, keep_ratio, scale_factor +- 更新: imgs + +`Flip` + +- 新增: flip, flip_direction +- 更新: imgs, label + +`Normalize` + +- 新增: img_norm_cfg +- 更新: imgs + +`CenterCrop` + +- 新增: crop_bbox, img_shape +- 更新: imgs + +`ThreeCrop` + +- 新增: crop_bbox, img_shape +- 更新: imgs + +`TenCrop` + +- 新增: crop_bbox, img_shape +- 更新: imgs + +### 数据格式化 + +`ToTensor` + +- 更新: specified by `keys`. + +`ImageToTensor` + +- 更新: specified by `keys`. + +`Transpose` + +- 更新: specified by `keys`. + +`Collect` + +- 新增: img_metas (所有需要的图像元数据,会被在此阶段整合进 `meta_keys` 键值中) +- 删除: 所有没有被整合进 `keys` 的键值 + +**值得注意的是**,第一个键,通常是 `imgs`,会作为主键用来计算批大小。 + +`FormatShape` + +- 新增: input_shape +- 更新: imgs + +## 扩展和使用自定义流水线 + +1. 
在任何文件写入一个新的处理流水线,如 `my_pipeline.py`。它以一个字典作为输入并返回一个字典 + + ```python + from mmaction.datasets import PIPELINES + + @PIPELINES.register_module() + class MyTransform: + + def __call__(self, results): + results['key'] = value + return results + ``` + +2. 导入新类 + + ```python + from .my_pipeline import MyTransform + ``` + +3. 在配置文件使用它 + + ```python + img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) + train_pipeline = [ + dict(type='DenseSampleFrames', clip_len=8, frame_interval=8, num_clips=1), + dict(type='RawFrameDecode', io_backend='disk'), + dict(type='MyTransform'), # 使用自定义流水线操作 + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCTHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) + ] + ``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/5_new_modules.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/5_new_modules.md new file mode 100644 index 0000000000000000000000000000000000000000..7c381af8b100938e53493bf2d05dac1726846edf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/5_new_modules.md @@ -0,0 +1,279 @@ +# 教程 5:如何添加新模块 + +在本教程中,我们将介绍一些有关如何为该项目定制优化器,开发新组件,以及添加新的学习率调整器(更新器)的方法。 + + + +- [自定义优化器](#%E8%87%AA%E5%AE%9A%E4%B9%89%E4%BC%98%E5%8C%96%E5%99%A8) +- [自定义优化器构造器](#%E8%87%AA%E5%AE%9A%E4%B9%89%E4%BC%98%E5%8C%96%E5%99%A8%E6%9E%84%E9%80%A0%E5%99%A8) +- [开发新组件](#%E5%BC%80%E5%8F%91%E6%96%B0%E7%BB%84%E4%BB%B6) + - [添加新的 backbones](#%E6%B7%BB%E5%8A%A0%E6%96%B0-backbones) + - [添加新的 heads](#%E6%B7%BB%E5%8A%A0%E6%96%B0-heads) + - [添加新的 loss function](#%E6%B7%BB%E5%8A%A0%E6%96%B0-loss-function) +- [添加新的学习率调节器(更新器)](#%E6%B7%BB%E5%8A%A0%E6%96%B0%E7%9A%84%E5%AD%A6%E4%B9%A0%E7%8E%87%E8%B0%83%E8%8A%82%E5%99%A8%EF%BC%88%E6%9B%B4%E6%96%B0%E5%99%A8%EF%BC%89) + + + +## 自定义优化器 + +[CopyOfSGD](/mmaction/core/optimizer/copy_of_sgd.py) 是自定义优化器的一个例子,写在 `mmaction/core/optimizer/copy_of_sgd.py` 
文件中。 +更一般地,可以根据如下方法自定义优化器。 + +假设添加的优化器名为 `MyOptimizer`,它有 `a`,`b` 和 `c` 三个参数。 +用户需要首先实现一个新的优化器文件,如 `mmaction/core/optimizer/my_optimizer.py`: + +```python +from mmcv.runner import OPTIMIZERS +from torch.optim import Optimizer + +@OPTIMIZERS.register_module() +class MyOptimizer(Optimizer): + + def __init__(self, a, b, c): +``` + +然后添加这个模块到 `mmaction/core/optimizer/__init__.py` 中,从而让注册器可以找到这个新的模块并添加它: + +```python +from .my_optimizer import MyOptimizer +``` + +之后,用户便可以在配置文件的 `optimizer` 字段中使用 `MyOptimizer`。 +在配置中,优化器由 `optimizer` 字段所定义,如下所示: + +```python +optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) +``` + +用户可以直接根据 [PyTorch API 文档](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) 对参数进行直接设置。 + +## 自定义优化器构造器 + +某些模型可能对不同层的参数有特定的优化设置,例如 BatchNorm 层的梯度衰减。 +用户可以通过自定义优化器构造函数来进行那些细粒度的参数调整。 + +用户可以编写一个基于 [DefaultOptimizerConstructor](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py) 的新的优化器构造器, +并且重写 `add_params(self, params, module)` 方法。 + +一个自定义优化器构造器的例子是 [TSMOptimizerConstructor](/mmaction/core/optimizer/tsm_optimizer_constructor.py)。 +更具体地,可以如下定义定制的优化器构造器。 + +在 `mmaction/core/optimizer/my_optimizer_constructor.py`: + +```python +from mmcv.runner import OPTIMIZER_BUILDERS, DefaultOptimizerConstructor + +@OPTIMIZER_BUILDERS.register_module() +class MyOptimizerConstructor(DefaultOptimizerConstructor): + +``` + +在 `mmaction/core/optimizer/__init__.py`: + +```python +from .my_optimizer_constructor import MyOptimizerConstructor +``` + +之后便可在配置文件的 `optimizer` 域中使用 `MyOptimizerConstructor`。 + +```python +# 优化器 +optimizer = dict( + type='SGD', + constructor='MyOptimizerConstructor', + paramwise_cfg=dict(fc_lr5=True), + lr=0.02, + momentum=0.9, + weight_decay=0.0001) +``` + +## 开发新组件 + +MMAction2 将模型组件分为 4 种基础模型: + +- 识别器(recognizer):整个识别器模型流水线,通常包含一个主干网络(backbone)和分类头(cls_head)。 +- 主干网络(backbone):通常为一个用于提取特征的 FCN 网络,例如 ResNet,BNInception。 +- 分类头(cls_head):用于分类任务的组件,通常包括一个带有池化层的 FC 
层。 +- 时序检测器(localizer):用于时序检测的模型,目前有的检测器包含 BSN,BMN,SSN。 + +### 添加新的 backbones + +这里以 TSN 为例,说明如何开发新的组件。 + +1. 创建新文件 `mmaction/models/backbones/resnet.py` + + ```python + import torch.nn as nn + + from ..builder import BACKBONES + + @BACKBONES.register_module() + class ResNet(nn.Module): + + def __init__(self, arg1, arg2): + pass + + def forward(self, x): # 应该返回一个元组 + pass + + def init_weights(self, pretrained=None): + pass + ``` + +2. 在 `mmaction/models/backbones/__init__.py` 中导入模型 + + ```python + from .resnet import ResNet + ``` + +3. 在配置文件中使用它 + + ```python + model = dict( + ... + backbone=dict( + type='ResNet', + arg1=xxx, + arg2=xxx), + ) + ``` + +### 添加新的 heads + +这里以 TSNHead 为例,说明如何开发新的 head + +1. 创建新文件 `mmaction/models/heads/tsn_head.py` + + 可以通过继承 [BaseHead](/mmaction/models/heads/base.py) 编写一个新的分类头, + 并重写 `init_weights(self)` 和 `forward(self, x)` 方法 + + ```python + from ..builder import HEADS + from .base import BaseHead + + + @HEADS.register_module() + class TSNHead(BaseHead): + + def __init__(self, arg1, arg2): + pass + + def forward(self, x): + pass + + def init_weights(self): + pass + ``` + +2. 在 `mmaction/models/heads/__init__.py` 中导入模型 + + ```python + from .tsn_head import TSNHead + ``` + +3. 在配置文件中使用它 + + ```python + model = dict( + ... 
+ cls_head=dict( + type='TSNHead', + num_classes=400, + in_channels=2048, + arg1=xxx, + arg2=xxx), + ``` + +### 添加新的 loss function + +假设用户想添加名为 `MyLoss` 的新损失函数,则需要在 `mmaction/models/losses/my_loss.py` 中进行实现。 + +```python +import torch +import torch.nn as nn + +from ..builder import LOSSES + +def my_loss(pred, target): + assert pred.size() == target.size() and target.numel() > 0 + loss = torch.abs(pred - target) + return loss + + +@LOSSES.register_module() +class MyLoss(nn.Module): + + def forward(self, pred, target): + loss = my_loss(pred, target) + return loss +``` + +之后,用户需要把它添加进 `mmaction/models/losses/__init__.py` + +```python +from .my_loss import MyLoss, my_loss +``` + +为了使用它,需要修改 `loss_xxx` 字段。由于 MyLoss 用于回归任务,可以把它作为边界框损失 `loss_bbox` + +```python +loss_bbox=dict(type='MyLoss') +``` + +### 添加新的学习率调节器(更新器) + +构造学习率更新器(即 PyTorch 中的 "scheduler")的默认方法是修改配置,例如: + +```python +... +lr_config = dict(policy='step', step=[20, 40]) +... +``` + +在 [`train.py`](/mmaction/apis/train.py) 的 api 中,它会在以下位置注册用于学习率更新的钩子: + +```python +... + runner.register_training_hooks( + cfg.lr_config, + optimizer_config, + cfg.checkpoint_config, + cfg.log_config, + cfg.get('momentum_config', None)) +... +``` + +到目前为止,所有支持的更新器可参考 [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), +但如果用户想自定义学习率更新器,则需要遵循以下步骤: + +1. 首先,在 `$MMAction2/mmaction/core/scheduler` 编写自定义的学习率更新钩子(LrUpdaterHook)。以下代码段是自定义学习率更新器的一个例子,它使用一组指定的学习率 `lrs`,并在每个 `steps` 处进行学习率衰减: + +```python +from mmcv.runner import HOOKS, LrUpdaterHook + + +# 在此注册 +@HOOKS.register_module() +class RelativeStepLrUpdaterHook(LrUpdaterHook): + # 该类应当继承于 mmcv.LrUpdaterHook + def __init__(self, steps, lrs, **kwargs): + super().__init__(**kwargs) + assert len(steps) == (len(lrs)) + self.steps = steps + self.lrs = lrs + + def get_lr(self, runner, base_lr): + # 仅需要重写该函数 + # 该函数在每个训练周期之前被调用, 并返回特定的学习率. 
+ progress = runner.epoch if self.by_epoch else runner.iter + for i in range(len(self.steps)): + if progress < self.steps[i]: + return self.lrs[i] +``` + +2. 修改配置 + +在配置文件下替换原先的 `lr_config` 变量 + +```python +lr_config = dict(policy='RelativeStep', steps=[20, 40, 60], lrs=[0.1, 0.01, 0.001]) +``` + +更多例子可参考 [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py) diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/6_export_model.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/6_export_model.md new file mode 100644 index 0000000000000000000000000000000000000000..01b861d058efd49276b0f65efaf99d1d501386f9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/6_export_model.md @@ -0,0 +1,75 @@ +# 教程 6:如何导出模型为 onnx 格式 + +开放式神经网络交换格式(Open Neural Network Exchange,即 [ONNX](https://onnx.ai/))是一个开放的生态系统,使 AI 开发人员能够随着项目的发展选择正确的工具。 + + + +- [支持的模型](#%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B) +- [如何使用](#%E5%A6%82%E4%BD%95%E4%BD%BF%E7%94%A8) + - [准备工作](#%E5%87%86%E5%A4%87%E5%B7%A5%E4%BD%9C) + - [行为识别器](#%E8%A1%8C%E4%B8%BA%E8%AF%86%E5%88%AB%E5%99%A8) + - [时序动作检测器](#%E6%97%B6%E5%BA%8F%E5%8A%A8%E4%BD%9C%E6%A3%80%E6%B5%8B%E5%99%A8) + + + +## 支持的模型 + +到目前为止,MMAction2 支持将训练的 pytorch 模型中进行 onnx 导出。支持的模型有: + +- I3D +- TSN +- TIN +- TSM +- R(2+1)D +- SLOWFAST +- SLOWONLY +- BMN +- BSN(tem, pem) + +## 如何使用 + +对于简单的模型导出,用户可以使用这里的 [脚本](/tools/deployment/pytorch2onnx.py)。 +注意,需要安装 `onnx` 和 `onnxruntime` 包以进行导出后的验证。 + +### 准备工作 + +首先,安装 onnx + +```shell +pip install onnx onnxruntime +``` + +MMAction2 提供了一个 python 脚本,用于将 MMAction2 训练的 pytorch 模型导出到 ONNX。 + +```shell +python tools/deployment/pytorch2onnx.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--shape ${SHAPE}] \ + [--verify] [--show] [--output-file ${OUTPUT_FILE}] [--is-localizer] [--opset-version ${VERSION}] +``` + +可选参数: + +- `--shape`: 模型输入张量的形状。对于 2D 模型(如 TSN),输入形状应当为 `$batch $clip $channel $height $width` (例如,`1 1 3 224 224`);对于 3D 模型(如 I3D),输入形状应当为 `$batch $clip 
$channel $time $height $width` (如,`1 1 3 32 224 224`);对于时序检测器如 BSN,每个模块的数据都不相同,请查看对应的 `forward` 函数。如果没有被指定,它将被置为 `1 1 3 224 224`。 +- `--verify`: 决定是否对导出模型进行验证,验证项包括是否可运行,数值是否正确等。如果没有被指定,它将被置为 `False`。 +- `--show`: 决定是否打印导出模型的结构。如果没有被指定,它将被置为 `False`。 +- `--output-file`: 导出的 onnx 模型名。如果没有被指定,它将被置为 `tmp.onnx`。 +- `--is-localizer`:决定导出的模型是否为时序检测器。如果没有被指定,它将被置为 `False`。 +- `--opset-version`:决定 onnx 的执行版本,MMAction2 推荐用户使用高版本(例如 11 版本)的 onnx 以确保稳定性。如果没有被指定,它将被置为 `11`。 +- `--softmax`: 是否在行为识别器末尾添加 Softmax。如果没有指定,将被置为 `False`。目前仅支持行为识别器,不支持时序动作检测器。 + +### 行为识别器 + +对于行为识别器,可运行: + +```shell +python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify +``` + +### 时序动作检测器 + +对于时序动作检测器,可运行: + +```shell +python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify +``` + +如果发现提供的模型权重文件没有被成功导出,或者存在精度损失,可以在本 repo 下提出问题(issue)。 diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/7_customize_runtime.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/7_customize_runtime.md new file mode 100644 index 0000000000000000000000000000000000000000..76507a50888653b1dd75d647d5500417dd0ae59a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/tutorials/7_customize_runtime.md @@ -0,0 +1,347 @@ +# 教程 7:如何自定义模型运行参数 + +在本教程中,我们将介绍如何在运行自定义模型时,进行自定义参数优化方法,学习率调整策略,工作流和钩子的方法。 + + + +- [定制优化方法](#%E5%AE%9A%E5%88%B6%E4%BC%98%E5%8C%96%E6%96%B9%E6%B3%95) + - [使用 PyTorch 内置的优化器](#%E4%BD%BF%E7%94%A8-PyTorch-%E5%86%85%E7%BD%AE%E7%9A%84%E4%BC%98%E5%8C%96%E5%99%A8) + - [定制用户自定义的优化器](#%E5%AE%9A%E5%88%B6%E7%94%A8%E6%88%B7%E8%87%AA%E5%AE%9A%E4%B9%89%E7%9A%84%E4%BC%98%E5%8C%96%E5%99%A8) + - [1. 定义一个新的优化器](#1-%E5%AE%9A%E4%B9%89%E4%B8%80%E4%B8%AA%E6%96%B0%E7%9A%84%E4%BC%98%E5%8C%96%E5%99%A8) + - [2. 注册优化器](#2-%E6%B3%A8%E5%86%8C%E4%BC%98%E5%8C%96%E5%99%A8) + - [3. 
在配置文件中指定优化器](#3-%E5%9C%A8%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E4%B8%AD%E6%8C%87%E5%AE%9A%E4%BC%98%E5%8C%96%E5%99%A8) + - [定制优化器构造器](#%E5%AE%9A%E5%88%B6%E4%BC%98%E5%8C%96%E5%99%A8%E6%9E%84%E9%80%A0%E5%99%A8) + - [额外设定](#%E9%A2%9D%E5%A4%96%E8%AE%BE%E5%AE%9A) +- [定制学习率调整策略](#%E5%AE%9A%E5%88%B6%E5%AD%A6%E4%B9%A0%E7%8E%87%E8%B0%83%E6%95%B4%E7%AD%96%E7%95%A5) +- [定制工作流](#%E5%AE%9A%E5%88%B6%E5%B7%A5%E4%BD%9C%E6%B5%81) +- [定制钩子](#%E5%AE%9A%E5%88%B6%E9%92%A9%E5%AD%90) + - [定制用户自定义钩子](#%E5%AE%9A%E5%88%B6%E7%94%A8%E6%88%B7%E8%87%AA%E5%AE%9A%E4%B9%89%E9%92%A9%E5%AD%90) + - [1. 创建一个新钩子](#1-%E5%88%9B%E5%BB%BA%E4%B8%80%E4%B8%AA%E6%96%B0%E9%92%A9%E5%AD%90) + - [2. 注册新钩子](#2-%E6%B3%A8%E5%86%8C%E6%96%B0%E9%92%A9%E5%AD%90) + - [3. 修改配置](#3-%E4%BF%AE%E6%94%B9%E9%85%8D%E7%BD%AE) + - [使用 MMCV 内置钩子](#%E4%BD%BF%E7%94%A8-MMCV-%E5%86%85%E7%BD%AE%E9%92%A9%E5%AD%90) + - [修改默认运行的钩子](#%E4%BF%AE%E6%94%B9%E9%BB%98%E8%AE%A4%E8%BF%90%E8%A1%8C%E7%9A%84%E9%92%A9%E5%AD%90) + - [模型权重文件配置](#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E6%96%87%E4%BB%B6%E9%85%8D%E7%BD%AE) + - [日志配置](#%E6%97%A5%E5%BF%97%E9%85%8D%E7%BD%AE) + - [验证配置](#%E9%AA%8C%E8%AF%81%E9%85%8D%E7%BD%AE) + + + +## 定制优化方法 + +### 使用 PyTorch 内置的优化器 + +MMAction2 支持 PyTorch 实现的所有优化器,仅需在配置文件中,指定 “optimizer” 字段 +例如,如果要使用 “Adam”,则修改如下。 + +```python +optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001) +``` + +要修改模型的学习率,用户只需要在优化程序的配置中修改 “lr” 即可。 +用户可根据 [PyTorch API 文档](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) 进行参数设置 + +例如,如果想使用 `Adam` 并设置参数为 `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)`, +则需要进行如下修改 + +```python +optimizer = dict(type='Adam', lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False) +``` + +### 定制用户自定义的优化器 + +#### 1. 
定义一个新的优化器 + +一个自定义的优化器可根据如下规则进行定制 + +假设用户想添加一个名为 `MyOptimizer` 的优化器,其拥有参数 `a`, `b` 和 `c`, +可以在 `mmaction/core/optimizer` 目录下新建一个文件进行实现,如 `mmaction/core/optimizer/my_optimizer.py`: + +```python +from mmcv.runner import OPTIMIZERS +from torch.optim import Optimizer + + +@OPTIMIZERS.register_module() +class MyOptimizer(Optimizer): + + def __init__(self, a, b, c): + pass # 在此实现优化器的初始化逻辑 + +``` + +#### 2. 注册优化器 + +为了让注册器找到上面定义的模块,首先应将此模块导入到主命名空间中。有两种方法可以实现它。 + +- 修改 `mmaction/core/optimizer/__init__.py` 来进行导入 + + 新定义的模块应导入到 `mmaction/core/optimizer/__init__.py` 中,以便注册器能找到新模块并将其添加: + +```python +from .my_optimizer import MyOptimizer +``` + +- 在配置中使用 `custom_imports` 手动导入 + +```python +custom_imports = dict(imports=['mmaction.core.optimizer.my_optimizer'], allow_failed_imports=False) +``` + +`mmaction.core.optimizer.my_optimizer` 模块将会在程序开始阶段被导入,`MyOptimizer` 类会随之自动被注册。 +注意,只有包含 `MyOptimizer` 类的模块会被导入。`mmaction.core.optimizer.my_optimizer.MyOptimizer` **不会** 被直接导入。 + +#### 3. 在配置文件中指定优化器 + +之后,用户便可在配置文件的 `optimizer` 字段中使用 `MyOptimizer`。 +在配置中,优化器由 `optimizer` 字段定义,如下所示: + +```python +optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) +``` + +要使用自定义的优化器,可以将该字段更改为 + +```python +optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value) +``` + +### 定制优化器构造器 + +某些模型可能需要针对特定参数进行优化设置,例如 BatchNorm 层的权重衰减。 +用户可以通过自定义优化器构造器来进行这些细粒度的参数调整。 + +```python +from mmcv.runner.optimizer import OPTIMIZER_BUILDERS + + +@OPTIMIZER_BUILDERS.register_module() +class MyOptimizerConstructor: + + def __init__(self, optimizer_cfg, paramwise_cfg=None): + pass + + def __call__(self, model): + + return my_optimizer +``` + +默认的优化器构造器的实现见[此](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11), +可被视为新优化器构造器的模板。 + +### 额外设定 + +优化器没有实现的优化技巧(trick)可通过优化器构造器(例如,设置按参数的学习率)或钩子来实现。 +下面列出了一些可以稳定训练或加快训练速度的常用设置。用户亦可通过为 MMAction2 创建 PR,发布更多设置。 + +- __使用梯度裁剪来稳定训练__ + 一些模型需要对梯度进行裁剪以稳定训练过程。一个例子如下: + + ```python + 
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) + ``` + +- __使用动量调整来加速模型收敛__ + MMAction2 支持动量调整器根据学习率修改模型的动量,从而使模型收敛更快。 + 动量调整器通常与学习率调整器一起使用,例如,以下配置可用于加速收敛。 + 更多细节可参考 [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327) + 和 [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130)。 + + ```python + lr_config = dict( + policy='cyclic', + target_ratio=(10, 1e-4), + cyclic_times=1, + step_ratio_up=0.4, + ) + momentum_config = dict( + policy='cyclic', + target_ratio=(0.85 / 0.95, 1), + cyclic_times=1, + step_ratio_up=0.4, + ) + ``` + +## 定制学习率调整策略 + +配置文件默认使用分步(step)学习率调整策略,它调用 MMCV 中的 [`StepLrUpdaterHook`](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L153)。 +此外,也支持其他学习率调整方法,如 `CosineAnnealing` 和 `Poly`。详情可见 [这里](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py) + +- Poly: + + ```python + lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) + ``` + +- CosineAnnealing: + + ```python + lr_config = dict( + policy='CosineAnnealing', + warmup='linear', + warmup_iters=1000, + warmup_ratio=1.0 / 10, + min_lr_ratio=1e-5) + ``` + +## 定制工作流 + +默认情况下,MMAction2 推荐用户在训练周期中使用 `EvalHook` 进行模型验证,也可以选择 “val” 工作流进行模型验证。 + +工作流是一个由 (工作流名, 周期数) 元组组成的列表,用于指定运行顺序和周期数。其默认设置为: + +```python +workflow = [('train', 1)] +``` + +其表示运行 1 个周期的训练。 +有时,用户可能希望检查模型在验证集上的某些指标(例如,损失、准确率)。 +在这种情况下,可以将工作流设置为 + +```python +workflow = [('train', 1), ('val', 1)] +``` + +从而交替运行 1 个训练周期和 1 个验证周期。 + +**值得注意的是**: + +1. 在验证周期时不会更新模型参数。 +2. 配置文件内的关键词 `total_epochs` 只控制训练周期数,不会影响验证工作流。 +3. 
工作流 `[('train', 1), ('val', 1)]` 和 `[('train', 1)]` 不会改变 `EvalHook` 的行为。 + 因为 `EvalHook` 在 `after_train_epoch` 阶段被调用,而验证工作流只会影响在 `after_val_epoch` 阶段被调用的钩子。 + 因此,`[('train', 1), ('val', 1)]` 和 `[('train', 1)]` 的区别仅在于,使用前者时 runner 在完成每一轮训练后,还会计算验证集上的损失。 + +## 定制钩子 + +### 定制用户自定义钩子 + +#### 1. 创建一个新钩子 + +这里举一个在 MMAction2 中创建新钩子,并在训练中使用它的示例: + +```python +from mmcv.runner import HOOKS, Hook + + +@HOOKS.register_module() +class MyHook(Hook): + + def __init__(self, a, b): + pass + + def before_run(self, runner): + pass + + def after_run(self, runner): + pass + + def before_epoch(self, runner): + pass + + def after_epoch(self, runner): + pass + + def before_iter(self, runner): + pass + + def after_iter(self, runner): + pass +``` + +根据钩子的功能,用户需要指定钩子在训练的每个阶段将要执行的操作,比如 `before_run`,`after_run`,`before_epoch`,`after_epoch`,`before_iter` 和 `after_iter`。 + +#### 2. 注册新钩子 + +之后,需要导入 `MyHook`。假设该文件在 `mmaction/core/utils/my_hook.py`,有两种办法导入它: + +- 修改 `mmaction/core/utils/__init__.py` 进行导入 + + 新定义的模块应导入到 `mmaction/core/utils/__init__.py` 中,以便注册器能找到并添加新模块: + +```python +from .my_hook import MyHook +``` + +- 使用配置文件中的 `custom_imports` 变量手动导入 + +```python +custom_imports = dict(imports=['mmaction.core.utils.my_hook'], allow_failed_imports=False) +``` + +#### 3. 
修改配置 + +```python +custom_hooks = [ + dict(type='MyHook', a=a_value, b=b_value) +] +``` + +还可通过 `priority` 参数(可选参数值包括 `'NORMAL'` 或 `'HIGHEST'`)设置钩子优先级,如下所示: + +```python +custom_hooks = [ + dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL') +] +``` + +默认情况下,在注册过程中,钩子的优先级设置为 “NORMAL”。 + +### 使用 MMCV 内置钩子 + +如果该钩子已在 MMCV 中实现,则可以直接修改配置中的 `custom_hooks` 以使用该钩子,如下所示 + +```python +custom_hooks = [ + dict(type='MMCVHook', a=a_value, b=b_value, priority='NORMAL') +] +``` + +### 修改默认运行的钩子 + +有一些常见的钩子未通过 `custom_hooks` 注册,但在导入 MMCV 时已默认注册,它们是: + +- log_config +- checkpoint_config +- evaluation +- lr_config +- optimizer_config +- momentum_config + +在这些钩子中,只有 log_config 具有 “VERY_LOW” 优先级,其他钩子具有 “NORMAL” 优先级。 +上述教程已经介绍了如何修改 “optimizer_config”,“momentum_config” 和 “lr_config”。 +下面介绍如何配置 log_config,checkpoint_config 和 evaluation。 + +#### 模型权重文件配置 + +MMCV 的 runner 使用 `checkpoint_config` 来初始化 [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9)。 + +```python +checkpoint_config = dict(interval=1) +``` + +用户可以设置 “max_keep_ckpts” 来仅保存少量模型权重文件,或者通过 “save_optimizer” 决定是否存储优化器的状态字典。 +更多细节可参考 [这里](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.CheckpointHook)。 + +#### 日志配置 + +`log_config` 包装了多个记录器钩子,并可以设置间隔。 +目前,MMCV 支持 `WandbLoggerHook`,`MlflowLoggerHook` 和 `TensorboardLoggerHook`。 +更多细节可参考[这里](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook)。 + +```python +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook'), + dict(type='TensorboardLoggerHook') + ]) +``` + +#### 验证配置 + +`evaluation` 字段的配置将用于初始化 [`EvalHook`](https://github.com/open-mmlab/mmaction2/blob/master/mmaction/core/evaluation/eval_hooks.py#L12)。 +除了键 `interval` 外,其他参数,如 “metrics” 也将传递给 `dataset.evaluate()`。 + +```python +evaluation = dict(interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy']) +``` diff --git a/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/useful_tools.md b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/useful_tools.md new file 
mode 100644 index 0000000000000000000000000000000000000000..a0969a2bffa26f1a441ebccb8ff7943401a4266d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/docs_zh_CN/useful_tools.md @@ -0,0 +1,161 @@ +除了训练/测试脚本外,MMAction2 还在 `tools/` 目录下提供了许多有用的工具。 + +## 目录 + + + +- [日志分析](#%E6%97%A5%E5%BF%97%E5%88%86%E6%9E%90) +- [模型复杂度分析](#%E6%A8%A1%E5%9E%8B%E5%A4%8D%E6%9D%82%E5%BA%A6%E5%88%86%E6%9E%90) +- [模型转换](#%E6%A8%A1%E5%9E%8B%E8%BD%AC%E6%8D%A2) + - [导出 MMAction2 模型为 ONNX 格式(实验特性)](#%E5%AF%BC%E5%87%BA-MMAction2-%E6%A8%A1%E5%9E%8B%E4%B8%BA-ONNX-%E6%A0%BC%E5%BC%8F%EF%BC%88%E5%AE%9E%E9%AA%8C%E7%89%B9%E6%80%A7%EF%BC%89) + - [发布模型](#%E5%8F%91%E5%B8%83%E6%A8%A1%E5%9E%8B) +- [其他脚本](#%E5%85%B6%E4%BB%96%E8%84%9A%E6%9C%AC) + - [指标评价](#%E6%8C%87%E6%A0%87%E8%AF%84%E4%BB%B7) + - [打印完整配置](#%E6%89%93%E5%8D%B0%E5%AE%8C%E6%95%B4%E9%85%8D%E7%BD%AE) + + + +## 日志分析 + +输入变量指定一个训练日志文件,可通过 `tools/analysis/analyze_logs.py` 脚本绘制 loss/top-k 曲线。本功能依赖于 `seaborn`,使用前请先通过 `pip install seaborn` 安装依赖包。 + +![准确度曲线图](https://github.com/open-mmlab/mmaction2/raw/master/resources/acc_curve.png) + +```shell +python tools/analysis/analyze_logs.py plot_curve ${JSON_LOGS} [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] +``` + +例如: + +- 绘制某日志文件对应的分类损失曲线图。 + + ```shell + python tools/analysis/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls + ``` + +- 绘制某日志文件对应的 top-1 和 top-5 准确率曲线图,并将曲线图导出为 PDF 文件。 + + ```shell + python tools/analysis/analyze_logs.py plot_curve log.json --keys top1_acc top5_acc --out results.pdf + ``` + +- 在同一图像内绘制两份日志文件对应的 top-1 准确率曲线图。 + + ```shell + python tools/analysis/analyze_logs.py plot_curve log1.json log2.json --keys top1_acc --legend run1 run2 + ``` + + 用户还可以通过本工具计算平均训练速度。 + + ```shell + python tools/analysis/analyze_logs.py cal_train_time ${JSON_LOGS} [--include-outliers] + ``` + +- 计算某日志文件对应的平均训练速度。 + + ```shell + python tools/analysis/analyze_logs.py cal_train_time 
work_dirs/some_exp/20200422_153324.log.json + ``` + + The expected output looks like this: + + ```text + -----Analyze train time of work_dirs/some_exp/20200422_153324.log.json----- + slowest epoch 60, average time is 0.9736 + fastest epoch 18, average time is 0.9001 + time std over epochs is 0.0177 + average iter time: 0.9330 s/iter + ``` + +## Model Complexity + +`tools/analysis/get_flops.py` is a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) that computes the FLOPs and parameter count of a given model. + +```shell +python tools/analysis/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}] +``` + +The expected output looks like this: + +```text +============================== +Input shape: (1, 3, 32, 340, 256) +Flops: 37.1 GMac +Params: 28.04 M +============================== +``` + +**Note**: This tool is still experimental, and the numbers are not guaranteed to be exactly correct. +The results are fine for simple comparisons, but double-check them before using them in technical reports or papers. + +(1) FLOPs depend on the input shape, whereas the parameter count does not. The default input shape is (1, 3, 340, 256) for 2D recognizers and (1, 3, 32, 340, 256) for 3D recognizers. +(2) Some operators, such as GN and custom operators, are not counted toward FLOPs or parameters. See [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details. + +## Model Conversion + +### Export MMAction2 models to ONNX (experimental) + +`tools/deployment/pytorch2onnx.py` converts a model to the [ONNX](https://github.com/onnx/onnx) format. +The script can also compare the outputs of the PyTorch and ONNX models to verify that they match. +This feature depends on `onnx` and `onnxruntime`; install them first with `pip install onnx onnxruntime`. +Note that the `--softmax` option appends a Softmax layer to the end of the recognizer so that predictions fall in the range `[0, 1]`. + +- For recognizers, run: + + ```shell + python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify + ``` + +- For temporal action localizers, run: + + ```shell + python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify + ``` + +### Prepare a model for publishing + +`tools/deployment/publish_model.py` prepares a model for publishing. It: + +(1) Converts the model weights to CPU tensors. +(2) Deletes the optimizer state. +(3) Computes the hash of the checkpoint file and appends it to the filename. + +```shell +python tools/deployment/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME} +``` + +For example: + +```shell +python
tools/deployment/publish_model.py work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth tsn_r50_1x1x3_100e_kinetics400_rgb.pth +``` + +The final output filename will be `tsn_r50_1x1x3_100e_kinetics400_rgb-{hash id}.pth`. + +## Miscellaneous + +### Evaluating a metric + +`tools/analysis/eval_metric.py` computes a given evaluation metric from a config file and the corresponding results file. + +The results file is generated by `tools/test.py` (via the `--out ${RESULT_FILE}` argument) and stores the predictions of the given model on the given dataset. + +```shell +python tools/analysis/eval_metric.py ${CONFIG_FILE} ${RESULT_FILE} [--eval ${EVAL_METRICS}] [--cfg-options ${CFG_OPTIONS}] [--eval-options ${EVAL_OPTIONS}] +``` + +### Print the entire config + +`tools/analysis/print_config.py` parses all of its input arguments and prints the complete config. + +```shell +python tools/analysis/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}] +``` + +### Check videos + +`tools/analysis/check_videos.py` uses the specified video decoder to iterate over all samples in the video dataset defined by the given config, looks for invalid videos (corrupted or missing files), and saves their paths to the output file. Note that after deleting invalid videos, you need to regenerate the video file list. + +```shell +python tools/analysis/check_videos.py ${CONFIG} [-h] [--options OPTIONS [OPTIONS ...]] [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] [--output-file OUTPUT_FILE] [--split SPLIT] [--decoder DECODER] [--num-processes NUM_PROCESSES] [--remove-corrupted-videos] +``` diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..27fd1f3ea6e226cd3fbb35320290f8a8c9ba3ef8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import mmcv +from mmcv import digit_version + +from .version import __version__ + +mmcv_minimum_version = '1.3.6' +mmcv_maximum_version = '1.7.0' +mmcv_version = digit_version(mmcv.__version__) + +assert (digit_version(mmcv_minimum_version) <= mmcv_version + <= digit_version(mmcv_maximum_version)), \ + f'MMCV=={mmcv.__version__} is used but incompatible. ' \ + f'Please install mmcv>={mmcv_minimum_version}, <={mmcv_maximum_version}.'
+ +__all__ = ['__version__'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/apis/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/apis/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..15961080e7b2031ea419de8ca0050d6cb4a35443 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/apis/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .inference import inference_recognizer, init_recognizer +from .test import multi_gpu_test, single_gpu_test +from .train import init_random_seed, train_model + +__all__ = [ + 'train_model', 'init_recognizer', 'inference_recognizer', 'multi_gpu_test', + 'single_gpu_test', 'init_random_seed' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/apis/inference.py b/openmmlab_test/mmaction2-0.24.1/mmaction/apis/inference.py new file mode 100644 index 0000000000000000000000000000000000000000..f303d20ed83be9319280e3093f5172495c3f804f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/apis/inference.py @@ -0,0 +1,192 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import os.path as osp +import re +import warnings +from operator import itemgetter + +import mmcv +import numpy as np +import torch +from mmcv.parallel import collate, scatter +from mmcv.runner import load_checkpoint + +from mmaction.core import OutputHook +from mmaction.datasets.pipelines import Compose +from mmaction.models import build_recognizer + + +def init_recognizer(config, checkpoint=None, device='cuda:0', **kwargs): + """Initialize a recognizer from config file. + + Args: + config (str | :obj:`mmcv.Config`): Config file path or the config + object. + checkpoint (str | None, optional): Checkpoint path/url. If set to None, + the model will not load any weights. Default: None. + device (str | :obj:`torch.device`): The desired device of returned + tensor. Default: 'cuda:0'. + + Returns: + nn.Module: The constructed recognizer. 
+ """ + if 'use_frames' in kwargs: + warnings.warn('The argument `use_frames` is deprecated PR #1191. ' + 'Now you can use models trained with frames or videos ' + 'arbitrarily. ') + + if isinstance(config, str): + config = mmcv.Config.fromfile(config) + elif not isinstance(config, mmcv.Config): + raise TypeError('config must be a filename or Config object, ' + f'but got {type(config)}') + + # pretrained model is unnecessary since we directly load checkpoint later + config.model.backbone.pretrained = None + model = build_recognizer(config.model, test_cfg=config.get('test_cfg')) + + if checkpoint is not None: + load_checkpoint(model, checkpoint, map_location='cpu') + model.cfg = config + model.to(device) + model.eval() + return model + + +def inference_recognizer(model, video, outputs=None, as_tensor=True, **kwargs): + """Inference a video with the recognizer. + + Args: + model (nn.Module): The loaded recognizer. + video (str | dict | ndarray): The video file path / url or the + rawframes directory path / results dictionary (the input of + pipeline) / a 4D array T x H x W x 3 (The input video). + outputs (list(str) | tuple(str) | str | None) : Names of layers whose + outputs need to be returned, default: None. + as_tensor (bool): Same as that in ``OutputHook``. Default: True. + + Returns: + dict[tuple(str, float)]: Top-5 recognition result dict. + dict[torch.tensor | np.ndarray]: + Output feature maps from layers specified in `outputs`. + """ + if 'use_frames' in kwargs: + warnings.warn('The argument `use_frames` is deprecated PR #1191. ' + 'Now you can use models trained with frames or videos ' + 'arbitrarily. ') + if 'label_path' in kwargs: + warnings.warn('The argument `label_path` is deprecated PR #1191. ' + 'Now the label file is not needed in ' + 'inference_recognizer. 
') + + input_flag = None + if isinstance(video, dict): + input_flag = 'dict' + elif isinstance(video, np.ndarray): + assert len(video.shape) == 4, 'The shape should be T x H x W x C' + input_flag = 'array' + elif isinstance(video, str) and video.startswith('http'): + input_flag = 'video' + elif isinstance(video, str) and osp.exists(video): + if osp.isfile(video): + if video.endswith('.npy'): + input_flag = 'audio' + else: + input_flag = 'video' + if osp.isdir(video): + input_flag = 'rawframes' + else: + raise RuntimeError('The type of argument video is not supported: ' + f'{type(video)}') + + if isinstance(outputs, str): + outputs = (outputs, ) + assert outputs is None or isinstance(outputs, (tuple, list)) + + cfg = model.cfg + device = next(model.parameters()).device # model device + # build the data pipeline + test_pipeline = cfg.data.test.pipeline + # Alter data pipelines & prepare inputs + if input_flag == 'dict': + data = video + if input_flag == 'array': + modality_map = {2: 'Flow', 3: 'RGB'} + modality = modality_map.get(video.shape[-1]) + data = dict( + total_frames=video.shape[0], + label=-1, + start_index=0, + array=video, + modality=modality) + for i in range(len(test_pipeline)): + if 'Decode' in test_pipeline[i]['type']: + test_pipeline[i] = dict(type='ArrayDecode') + test_pipeline = [x for x in test_pipeline if 'Init' not in x['type']] + if input_flag == 'video': + data = dict(filename=video, label=-1, start_index=0, modality='RGB') + if 'Init' not in test_pipeline[0]['type']: + test_pipeline = [dict(type='OpenCVInit')] + test_pipeline + else: + test_pipeline[0] = dict(type='OpenCVInit') + for i in range(len(test_pipeline)): + if 'Decode' in test_pipeline[i]['type']: + test_pipeline[i] = dict(type='OpenCVDecode') + if input_flag == 'rawframes': + filename_tmpl = cfg.data.test.get('filename_tmpl', 'img_{:05}.jpg') + modality = cfg.data.test.get('modality', 'RGB') + start_index = cfg.data.test.get('start_index', 1) + + # count the number of frames that 
match the format of `filename_tmpl` + # RGB pattern example: img_{:05}.jpg -> ^img_\d+.jpg$ + # Flow pattern example: {}_{:05d}.jpg -> ^x_\d+.jpg$ + pattern = f'^{filename_tmpl}$' + if modality == 'Flow': + pattern = pattern.replace('{}', 'x') + pattern = pattern.replace( + pattern[pattern.find('{'):pattern.find('}') + 1], '\\d+') + total_frames = len( + list( + filter(lambda x: re.match(pattern, x) is not None, + os.listdir(video)))) + data = dict( + frame_dir=video, + total_frames=total_frames, + label=-1, + start_index=start_index, + filename_tmpl=filename_tmpl, + modality=modality) + if 'Init' in test_pipeline[0]['type']: + test_pipeline = test_pipeline[1:] + for i in range(len(test_pipeline)): + if 'Decode' in test_pipeline[i]['type']: + test_pipeline[i] = dict(type='RawFrameDecode') + if input_flag == 'audio': + data = dict( + audio_path=video, + total_frames=len(np.load(video)), + start_index=cfg.data.test.get('start_index', 1), + label=-1) + + test_pipeline = Compose(test_pipeline) + data = test_pipeline(data) + data = collate([data], samples_per_gpu=1) + + if next(model.parameters()).is_cuda: + # scatter to specified GPU + data = scatter(data, [device])[0] + + # forward the model + with OutputHook(model, outputs=outputs, as_tensor=as_tensor) as h: + with torch.no_grad(): + scores = model(return_loss=False, **data)[0] + returned_features = h.layer_outputs if outputs else None + + num_classes = scores.shape[-1] + score_tuples = tuple(zip(range(num_classes), scores)) + score_sorted = sorted(score_tuples, key=itemgetter(1), reverse=True) + + top5_label = score_sorted[:5] + if outputs: + return top5_label, returned_features + return top5_label diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/apis/test.py b/openmmlab_test/mmaction2-0.24.1/mmaction/apis/test.py new file mode 100644 index 0000000000000000000000000000000000000000..742b0e4ff760a631672202d09d68879a504a9f42 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/apis/test.py @@ -0,0 +1,206
@@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp +import pickle +import shutil +import tempfile +# TODO import test functions from mmcv and delete them from mmaction2 +import warnings + +import mmcv +import torch +import torch.distributed as dist +from mmcv.runner import get_dist_info + +try: + from mmcv.engine import (collect_results_cpu, collect_results_gpu, + multi_gpu_test, single_gpu_test) + from_mmcv = True +except (ImportError, ModuleNotFoundError): + warnings.warn( + 'DeprecationWarning: single_gpu_test, multi_gpu_test, ' + 'collect_results_cpu, collect_results_gpu from mmaction2 will be ' + 'deprecated. Please install mmcv through master branch.') + from_mmcv = False + +if not from_mmcv: + + def single_gpu_test(model, data_loader): # noqa: F811 + """Test model with a single gpu. + + This method tests model with a single gpu and + displays test progress bar. + + Args: + model (nn.Module): Model to be tested. + data_loader (nn.Dataloader): Pytorch data loader. + + Returns: + list: The prediction results. + """ + model.eval() + results = [] + dataset = data_loader.dataset + prog_bar = mmcv.ProgressBar(len(dataset)) + for data in data_loader: + with torch.no_grad(): + result = model(return_loss=False, **data) + results.extend(result) + + # use the first key as main key to calculate the batch size + batch_size = len(next(iter(data.values()))) + for _ in range(batch_size): + prog_bar.update() + return results + + def multi_gpu_test( # noqa: F811 + model, data_loader, tmpdir=None, gpu_collect=True): + """Test model with multiple gpus. + + This method tests model with multiple gpus and collects the results + under two different modes: gpu and cpu modes. By setting + 'gpu_collect=True' it encodes results to gpu tensors and use gpu + communication for results collection. On cpu mode it saves the results + on different gpus to 'tmpdir' and collects them by the rank 0 worker. + + Args: + model (nn.Module): Model to be tested. 
+ data_loader (nn.Dataloader): Pytorch data loader. + tmpdir (str): Path of directory to save the temporary results from + different gpus under cpu mode. Default: None + gpu_collect (bool): Option to use either gpu or cpu to collect + results. Default: True + + Returns: + list: The prediction results. + """ + model.eval() + results = [] + dataset = data_loader.dataset + rank, world_size = get_dist_info() + if rank == 0: + prog_bar = mmcv.ProgressBar(len(dataset)) + for data in data_loader: + with torch.no_grad(): + result = model(return_loss=False, **data) + results.extend(result) + + if rank == 0: + # use the first key as main key to calculate the batch size + batch_size = len(next(iter(data.values()))) + for _ in range(batch_size * world_size): + prog_bar.update() + + # collect results from all ranks + if gpu_collect: + results = collect_results_gpu(results, len(dataset)) + else: + results = collect_results_cpu(results, len(dataset), tmpdir) + return results + + def collect_results_cpu(result_part, size, tmpdir=None): # noqa: F811 + """Collect results in cpu mode. + + It saves the results on different gpus to 'tmpdir' and collects + them by the rank 0 worker. + + Args: + result_part (list): Results to be collected + size (int): Result size. + tmpdir (str): Path of directory to save the temporary results from + different gpus under cpu mode. Default: None + + Returns: + list: Ordered results. 
+ """ + rank, world_size = get_dist_info() + # create a tmp dir if it is not specified + if tmpdir is None: + MAX_LEN = 512 + # 32 is whitespace + dir_tensor = torch.full((MAX_LEN, ), + 32, + dtype=torch.uint8, + device='cuda') + if rank == 0: + mmcv.mkdir_or_exist('.dist_test') + tmpdir = tempfile.mkdtemp(dir='.dist_test') + tmpdir = torch.tensor( + bytearray(tmpdir.encode()), + dtype=torch.uint8, + device='cuda') + dir_tensor[:len(tmpdir)] = tmpdir + dist.broadcast(dir_tensor, 0) + tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip() + else: + tmpdir = osp.join(tmpdir, '.dist_test') + mmcv.mkdir_or_exist(tmpdir) + # synchronizes all processes to make sure tmpdir exist + dist.barrier() + # dump the part result to the dir + mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl')) + # synchronizes all processes for loading pickle file + dist.barrier() + # collect all parts + if rank != 0: + return None + # load results of all parts from tmp dir + part_list = [] + for i in range(world_size): + part_file = osp.join(tmpdir, f'part_{i}.pkl') + part_list.append(mmcv.load(part_file)) + # sort the results + ordered_results = [] + for res in zip(*part_list): + ordered_results.extend(list(res)) + # the dataloader may pad some samples + ordered_results = ordered_results[:size] + # remove tmp dir + shutil.rmtree(tmpdir) + return ordered_results + + def collect_results_gpu(result_part, size): # noqa: F811 + """Collect results in gpu mode. + + It encodes results to gpu tensors and use gpu communication for results + collection. + + Args: + result_part (list): Results to be collected + size (int): Result size. + + Returns: + list: Ordered results. 
+ """ + rank, world_size = get_dist_info() + # dump result part to tensor with pickle + part_tensor = torch.tensor( + bytearray(pickle.dumps(result_part)), + dtype=torch.uint8, + device='cuda') + # gather all result part tensor shape + shape_tensor = torch.tensor(part_tensor.shape, device='cuda') + shape_list = [shape_tensor.clone() for _ in range(world_size)] + dist.all_gather(shape_list, shape_tensor) + # padding result part tensor to max length + shape_max = torch.tensor(shape_list).max() + part_send = torch.zeros(shape_max, dtype=torch.uint8, device='cuda') + part_send[:shape_tensor[0]] = part_tensor + part_recv_list = [ + part_tensor.new_zeros(shape_max) for _ in range(world_size) + ] + # gather all result part + dist.all_gather(part_recv_list, part_send) + + if rank == 0: + part_list = [] + for recv, shape in zip(part_recv_list, shape_list): + part_list.append( + pickle.loads(recv[:shape[0]].cpu().numpy().tobytes())) + # sort the results + ordered_results = [] + for res in zip(*part_list): + ordered_results.extend(list(res)) + # the dataloader may pad some samples + ordered_results = ordered_results[:size] + return ordered_results + return None diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/apis/train.py b/openmmlab_test/mmaction2-0.24.1/mmaction/apis/train.py new file mode 100644 index 0000000000000000000000000000000000000000..b0c7e06a7f8c9203d08eaff1329ecf049d2c3e78 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/apis/train.py @@ -0,0 +1,304 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy as cp +import os +import os.path as osp +import time + +import numpy as np +import torch +import torch.distributed as dist +from mmcv.runner import (DistSamplerSeedHook, EpochBasedRunner, OptimizerHook, + build_optimizer, get_dist_info) +from mmcv.runner.hooks import Fp16OptimizerHook + +from ..core import (DistEvalHook, EvalHook, OmniSourceDistSamplerSeedHook, + OmniSourceRunner) +from ..datasets import build_dataloader, build_dataset +from ..utils import (PreciseBNHook, build_ddp, build_dp, default_device, + get_root_logger) +from .test import multi_gpu_test + + +def init_random_seed(seed=None, device=default_device, distributed=True): + """Initialize random seed. + + If the seed is not set, the seed will be automatically randomized, + and then broadcast to all processes to prevent some potential bugs. + Args: + seed (int, Optional): The seed. Default to None. + device (str): The device where the seed will be put on. + Default to 'cuda'. + distributed (bool): Whether to use distributed training. + Default: True. + Returns: + int: Seed to be used. + """ + if seed is not None: + return seed + + # Make sure all ranks share the same random seed to prevent + # some potential bugs. Please refer to + # https://github.com/open-mmlab/mmdetection/issues/6339 + rank, world_size = get_dist_info() + seed = np.random.randint(2**31) + + if world_size == 1: + return seed + + if rank == 0: + random_num = torch.tensor(seed, dtype=torch.int32, device=device) + else: + random_num = torch.tensor(0, dtype=torch.int32, device=device) + + if distributed: + dist.broadcast(random_num, src=0) + return random_num.item() + + +def train_model(model, + dataset, + cfg, + distributed=False, + validate=False, + test=dict(test_best=False, test_last=False), + timestamp=None, + meta=None): + """Train model entry function. + + Args: + model (nn.Module): The model to be trained. + dataset (:obj:`Dataset`): Train dataset. + cfg (dict): The config dict for training. 
+ distributed (bool): Whether to use distributed training. + Default: False. + validate (bool): Whether to do evaluation. Default: False. + test (dict): The testing option, with two keys: test_last & test_best. + The value is True or False, indicating whether to test the + corresponding checkpoint. + Default: dict(test_best=False, test_last=False). + timestamp (str | None): Local time for runner. Default: None. + meta (dict | None): Meta dict to record some important information. + Default: None + """ + logger = get_root_logger(log_level=cfg.log_level) + + # prepare data loaders + dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset] + + dataloader_setting = dict( + videos_per_gpu=cfg.data.get('videos_per_gpu', 1), + workers_per_gpu=cfg.data.get('workers_per_gpu', 1), + persistent_workers=cfg.data.get('persistent_workers', False), + num_gpus=len(cfg.gpu_ids), + dist=distributed, + seed=cfg.seed) + dataloader_setting = dict(dataloader_setting, + **cfg.data.get('train_dataloader', {})) + + if cfg.omnisource: + # The option can override videos_per_gpu + train_ratio = cfg.data.get('train_ratio', [1] * len(dataset)) + omni_videos_per_gpu = cfg.data.get('omni_videos_per_gpu', None) + if omni_videos_per_gpu is None: + dataloader_settings = [dataloader_setting] * len(dataset) + else: + dataloader_settings = [] + for videos_per_gpu in omni_videos_per_gpu: + this_setting = cp.deepcopy(dataloader_setting) + this_setting['videos_per_gpu'] = videos_per_gpu + dataloader_settings.append(this_setting) + data_loaders = [ + build_dataloader(ds, **setting) + for ds, setting in zip(dataset, dataloader_settings) + ] + + else: + data_loaders = [ + build_dataloader(ds, **dataloader_setting) for ds in dataset + ] + + # put model on gpus + if distributed: + find_unused_parameters = cfg.get('find_unused_parameters', False) + # Sets the `find_unused_parameters` parameter in + # torch.nn.parallel.DistributedDataParallel + + model = build_ddp( + model, + default_device, + 
default_args=dict( + device_ids=[int(os.environ['LOCAL_RANK'])], + broadcast_buffers=False, + find_unused_parameters=find_unused_parameters)) + else: + model = build_dp( + model, default_device, default_args=dict(device_ids=cfg.gpu_ids)) + + # build runner + optimizer = build_optimizer(model, cfg.optimizer) + + Runner = OmniSourceRunner if cfg.omnisource else EpochBasedRunner + runner = Runner( + model, + optimizer=optimizer, + work_dir=cfg.work_dir, + logger=logger, + meta=meta) + # an ugly workaround to make .log and .log.json filenames the same + runner.timestamp = timestamp + + # fp16 setting + fp16_cfg = cfg.get('fp16', None) + if fp16_cfg is not None: + optimizer_config = Fp16OptimizerHook( + **cfg.optimizer_config, **fp16_cfg, distributed=distributed) + elif distributed and 'type' not in cfg.optimizer_config: + optimizer_config = OptimizerHook(**cfg.optimizer_config) + else: + optimizer_config = cfg.optimizer_config + + # register hooks + runner.register_training_hooks( + cfg.lr_config, + optimizer_config, + cfg.checkpoint_config, + cfg.log_config, + cfg.get('momentum_config', None), + custom_hooks_config=cfg.get('custom_hooks', None)) + + # multigrid setting + multigrid_cfg = cfg.get('multigrid', None) + if multigrid_cfg is not None: + from mmaction.utils.multigrid import LongShortCycleHook + multigrid_scheduler = LongShortCycleHook(cfg) + runner.register_hook(multigrid_scheduler) + logger.info('Finish register multigrid hook') + + # subbn3d aggregation is VERY_HIGH, as it should be done before + # saving and evaluation + from mmaction.utils.multigrid import SubBatchNorm3dAggregationHook + subbn3d_aggre_hook = SubBatchNorm3dAggregationHook() + runner.register_hook(subbn3d_aggre_hook, priority='VERY_HIGH') + logger.info('Finish register subbn3daggre hook') + + # precise bn setting + if cfg.get('precise_bn', False): + precise_bn_dataset = build_dataset(cfg.data.train) + dataloader_setting = dict( + videos_per_gpu=cfg.data.get('videos_per_gpu', 1), +
workers_per_gpu=1, # save memory and time + persistent_workers=cfg.data.get('persistent_workers', False), + num_gpus=len(cfg.gpu_ids), + dist=distributed, + seed=cfg.seed) + data_loader_precise_bn = build_dataloader(precise_bn_dataset, + **dataloader_setting) + precise_bn_hook = PreciseBNHook(data_loader_precise_bn, + **cfg.get('precise_bn')) + runner.register_hook(precise_bn_hook, priority='HIGHEST') + logger.info('Finish register precisebn hook') + + if distributed: + if cfg.omnisource: + runner.register_hook(OmniSourceDistSamplerSeedHook()) + else: + runner.register_hook(DistSamplerSeedHook()) + + if validate: + eval_cfg = cfg.get('evaluation', {}) + val_dataset = build_dataset(cfg.data.val, dict(test_mode=True)) + dataloader_setting = dict( + videos_per_gpu=cfg.data.get('videos_per_gpu', 1), + workers_per_gpu=cfg.data.get('workers_per_gpu', 1), + persistent_workers=cfg.data.get('persistent_workers', False), + # cfg.gpus will be ignored if distributed + num_gpus=len(cfg.gpu_ids), + dist=distributed, + shuffle=False) + dataloader_setting = dict(dataloader_setting, + **cfg.data.get('val_dataloader', {})) + val_dataloader = build_dataloader(val_dataset, **dataloader_setting) + eval_hook = DistEvalHook(val_dataloader, **eval_cfg) if distributed \ + else EvalHook(val_dataloader, **eval_cfg) + runner.register_hook(eval_hook, priority='LOW') + + if cfg.resume_from: + runner.resume(cfg.resume_from) + elif cfg.load_from: + runner.load_checkpoint(cfg.load_from) + runner_kwargs = dict() + if cfg.omnisource: + runner_kwargs = dict(train_ratio=train_ratio) + runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs) + + if distributed: + dist.barrier() + time.sleep(5) + + if test['test_last'] or test['test_best']: + best_ckpt_path = None + if test['test_best']: + ckpt_paths = [x for x in os.listdir(cfg.work_dir) if 'best' in x] + ckpt_paths = [x for x in ckpt_paths if x.endswith('.pth')] + if len(ckpt_paths) == 0: + runner.logger.info('Warning: test_best set, 
but no ckpt found') + test['test_best'] = False + if not test['test_last']: + return + elif len(ckpt_paths) > 1: + epoch_ids = [ + int(x.split('epoch_')[-1][:-4]) for x in ckpt_paths + ] + best_ckpt_path = ckpt_paths[np.argmax(epoch_ids)] + else: + best_ckpt_path = ckpt_paths[0] + if best_ckpt_path: + best_ckpt_path = osp.join(cfg.work_dir, best_ckpt_path) + + test_dataset = build_dataset(cfg.data.test, dict(test_mode=True)) + gpu_collect = cfg.get('evaluation', {}).get('gpu_collect', False) + tmpdir = cfg.get('evaluation', {}).get('tmpdir', + osp.join(cfg.work_dir, 'tmp')) + dataloader_setting = dict( + videos_per_gpu=cfg.data.get('videos_per_gpu', 1), + workers_per_gpu=cfg.data.get('workers_per_gpu', 1), + persistent_workers=cfg.data.get('persistent_workers', False), + num_gpus=len(cfg.gpu_ids), + dist=distributed, + shuffle=False) + dataloader_setting = dict(dataloader_setting, + **cfg.data.get('test_dataloader', {})) + + test_dataloader = build_dataloader(test_dataset, **dataloader_setting) + + names, ckpts = [], [] + + if test['test_last']: + names.append('last') + ckpts.append(None) + if test['test_best'] and best_ckpt_path is not None: + names.append('best') + ckpts.append(best_ckpt_path) + + for name, ckpt in zip(names, ckpts): + if ckpt is not None: + runner.load_checkpoint(ckpt) + + outputs = multi_gpu_test(runner.model, test_dataloader, tmpdir, + gpu_collect) + rank, _ = get_dist_info() + if rank == 0: + out = osp.join(cfg.work_dir, f'{name}_pred.pkl') + test_dataset.dump_results(outputs, out) + + eval_cfg = cfg.get('evaluation', {}) + for key in [ + 'interval', 'tmpdir', 'start', 'gpu_collect', + 'save_best', 'rule', 'by_epoch', 'broadcast_bn_buffers' + ]: + eval_cfg.pop(key, None) + + eval_res = test_dataset.evaluate(outputs, **eval_cfg) + runner.logger.info(f'Testing results of the {name} checkpoint') + for metric_name, val in eval_res.items(): + runner.logger.info(f'{metric_name}: {val:.04f}') diff --git 
a/openmmlab_test/mmaction2-0.24.1/mmaction/core/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..92c53bf8de53e1e020d198e92743643b0fc93ed3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .bbox import * # noqa: F401, F403 +from .dist_utils import * # noqa: F401, F403 +from .evaluation import * # noqa: F401, F403 +from .hooks import * # noqa: F401, F403 +from .lr import * # noqa: F401, F403 +from .optimizer import * # noqa: F401, F403 +from .runner import * # noqa: F401, F403 +from .scheduler import * # noqa: F401, F403 diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..27d8fe053264d015c5ae952be06048a08942b679 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .assigners import MaxIoUAssignerAVA +from .bbox_target import bbox_target +from .transforms import bbox2result + +__all__ = ['MaxIoUAssignerAVA', 'bbox_target', 'bbox2result'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/assigners/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/assigners/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..0e9112412b1e032799a64272ba75d92e28a522e2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/assigners/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .max_iou_assigner_ava import MaxIoUAssignerAVA + +__all__ = ['MaxIoUAssignerAVA'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/assigners/max_iou_assigner_ava.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/assigners/max_iou_assigner_ava.py new file mode 100644 index 0000000000000000000000000000000000000000..3f5439bbbe7f5486366b3997a733648a2d1d03a0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/assigners/max_iou_assigner_ava.py @@ -0,0 +1,142 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +try: + from mmdet.core.bbox import AssignResult, MaxIoUAssigner + from mmdet.core.bbox.builder import BBOX_ASSIGNERS + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + +if mmdet_imported: + + @BBOX_ASSIGNERS.register_module() + class MaxIoUAssignerAVA(MaxIoUAssigner): + """Assign a corresponding gt bbox or background to each bbox. + + Each proposals will be assigned with `-1`, `0`, or a positive integer + indicating the ground truth index. + + - -1: don't care + - 0: negative sample, no assigned gt + - positive integer: positive sample, index (1-based) of assigned gt + + Args: + pos_iou_thr (float): IoU threshold for positive bboxes. + neg_iou_thr (float | tuple): IoU threshold for negative bboxes. + min_pos_iou (float): Minimum iou for a bbox to be considered as a + positive bbox. Positive samples can have smaller IoU than + pos_iou_thr due to the 4th step (assign max IoU sample to each + gt). Default: 0. + gt_max_assign_all (bool): Whether to assign all bboxes with the + same highest overlap with some gt to that gt. Default: True. + """ + + # The function is overridden, to handle the case that gt_label is not + # int + def assign_wrt_overlaps(self, overlaps, gt_labels=None): + """Assign w.r.t. the overlaps of bboxes with gts. + + Args: + overlaps (Tensor): Overlaps between k gt_bboxes and n bboxes, + shape(k, n). 
+ gt_labels (Tensor, optional): Labels of k gt_bboxes, shape + (k, ). + + Returns: + :obj:`AssignResult`: The assign result. + """ + num_gts, num_bboxes = overlaps.size(0), overlaps.size(1) + + # 1. assign -1 by default + assigned_gt_inds = overlaps.new_full((num_bboxes, ), + -1, + dtype=torch.long) + + if num_gts == 0 or num_bboxes == 0: + # No ground truth or boxes, return empty assignment + max_overlaps = overlaps.new_zeros((num_bboxes, )) + if num_gts == 0: + # No truth, assign everything to background + assigned_gt_inds[:] = 0 + if gt_labels is None: + assigned_labels = None + else: + assigned_labels = overlaps.new_full((num_bboxes, ), + -1, + dtype=torch.long) + return AssignResult( + num_gts, + assigned_gt_inds, + max_overlaps, + labels=assigned_labels) + + # for each anchor, which gt best overlaps with it + # for each anchor, the max iou of all gts + max_overlaps, argmax_overlaps = overlaps.max(dim=0) + # for each gt, which anchor best overlaps with it + # for each gt, the max iou of all proposals + gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=1) + + # 2. assign negative: below + # the negative inds are set to be 0 + if isinstance(self.neg_iou_thr, float): + assigned_gt_inds[(max_overlaps >= 0) + & (max_overlaps < self.neg_iou_thr)] = 0 + elif isinstance(self.neg_iou_thr, tuple): + assert len(self.neg_iou_thr) == 2 + assigned_gt_inds[(max_overlaps >= self.neg_iou_thr[0]) + & (max_overlaps < self.neg_iou_thr[1])] = 0 + + # 3. assign positive: above positive IoU threshold + pos_inds = max_overlaps >= self.pos_iou_thr + assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1 + + if self.match_low_quality: + # Low-quality matching will overwrite the assigned_gt_inds + # assigned in Step 3. Thus, the assigned gt might not be the + # best one for prediction. + # For example, if bbox A has 0.9 and 0.8 iou with GT bbox + # 1 & 2, bbox 1 will be assigned as the best target for bbox A + # in step 3. 
However, if GT bbox 2's gt_argmax_overlaps = A, + # bbox A's assigned_gt_inds will be overwritten to be bbox B. + # This might be the reason that it is not used in ROI Heads. + for i in range(num_gts): + if gt_max_overlaps[i] >= self.min_pos_iou: + if self.gt_max_assign_all: + max_iou_inds = overlaps[i, :] == gt_max_overlaps[i] + assigned_gt_inds[max_iou_inds] = i + 1 + else: + assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1 + + if gt_labels is not None: + # consider multi-class case (AVA) + assert len(gt_labels[0]) > 1 + assigned_labels = assigned_gt_inds.new_zeros( + (num_bboxes, len(gt_labels[0])), dtype=torch.float32) + + # If not assigned, labels will be all 0 + pos_inds = torch.nonzero( + assigned_gt_inds > 0, as_tuple=False).squeeze() + if pos_inds.numel() > 0: + assigned_labels[pos_inds] = gt_labels[ + assigned_gt_inds[pos_inds] - 1] + else: + assigned_labels = None + + return AssignResult( + num_gts, + assigned_gt_inds, + max_overlaps, + labels=assigned_labels) + +else: + # define an empty class, so that can be imported + class MaxIoUAssignerAVA: + + def __init__(self, *args, **kwargs): + raise ImportError( + 'Failed to import `AssignResult`, `MaxIoUAssigner` from ' + '`mmdet.core.bbox` or failed to import `BBOX_ASSIGNERS` from ' + '`mmdet.core.bbox.builder`. The class `MaxIoUAssignerAVA` is ' + 'invalid. ') diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/bbox_target.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/bbox_target.py new file mode 100644 index 0000000000000000000000000000000000000000..2d9f099e1b753ee8b295e3241b92dfbef9a10e7f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/bbox_target.py @@ -0,0 +1,42 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn.functional as F + + +def bbox_target(pos_bboxes_list, neg_bboxes_list, gt_labels, cfg): + """Generate classification targets for bboxes. + + Args: + pos_bboxes_list (list[Tensor]): Positive bboxes list. 
+ neg_bboxes_list (list[Tensor]): Negative bboxes list. + gt_labels (list[Tensor]): Groundtruth classification label list. + cfg (Config): RCNN config. + + Returns: + (Tensor, Tensor): Label and label_weight for bboxes. + """ + labels, label_weights = [], [] + pos_weight = 1.0 if cfg.pos_weight <= 0 else cfg.pos_weight + + assert len(pos_bboxes_list) == len(neg_bboxes_list) == len(gt_labels) + length = len(pos_bboxes_list) + + for i in range(length): + pos_bboxes = pos_bboxes_list[i] + neg_bboxes = neg_bboxes_list[i] + gt_label = gt_labels[i] + + num_pos = pos_bboxes.size(0) + num_neg = neg_bboxes.size(0) + num_samples = num_pos + num_neg + label = F.pad(gt_label, (0, 0, 0, num_neg)) + label_weight = pos_bboxes.new_zeros(num_samples) + label_weight[:num_pos] = pos_weight + label_weight[-num_neg:] = 1. + + labels.append(label) + label_weights.append(label_weight) + + labels = torch.cat(labels, 0) + label_weights = torch.cat(label_weights, 0) + return labels, label_weights diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/transforms.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/transforms.py new file mode 100644 index 0000000000000000000000000000000000000000..4defb1817dfc10db71d1f9bb9463af692bb7ec45 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/bbox/transforms.py @@ -0,0 +1,57 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np + + +def bbox2result(bboxes, labels, num_classes, thr=0.01): + """Convert detection results to a list of numpy arrays. + + This identifies single-label classification (as opposed to multi-label) + through the thr parameter which is set to a negative value. + + Currently, the way to set this is to set + `test_cfg.rcnn.action_thr=-1.0` + ToDo: The ideal way would be for this to be automatically set when the + model cfg uses multilabel=False, however this could be a breaking change + and is left as a future exercise. + NB - this should not interfere with the evaluation in any case. 
+ + Args: + bboxes (Tensor): shape (n, 4) + labels (Tensor): shape (n, #num_classes) + num_classes (int): class number, including background class + thr (float): The score threshold used when converting predictions to + detection results. If a single negative value, uses single-label + classification + Returns: + list(ndarray): bbox results of each class + """ + if bboxes.shape[0] == 0: + return list(np.zeros((num_classes - 1, 0, 5), dtype=np.float32)) + + bboxes = bboxes.cpu().numpy() + scores = labels.cpu().numpy() # rename for clarification + + # Although we can handle single-label classification, we still want scores + assert scores.shape[-1] > 1 + + # Robustly check for multi/single-label: + if not hasattr(thr, '__len__'): + multilabel = thr >= 0 + thr = (thr, ) * num_classes + else: + multilabel = True + + # Check Shape + assert scores.shape[1] == num_classes + assert len(thr) == num_classes + + result = [] + for i in range(num_classes - 1): + if multilabel: + where = (scores[:, i + 1] > thr[i + 1]) + else: + where = (scores[:, 1:].argmax(axis=1) == i) + result.append( + np.concatenate((bboxes[where, :4], scores[where, i + 1:i + 2]), + axis=1)) + return result diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/dist_utils.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/dist_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..cae452d9bd42590406d353053548fd8e71f6329a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/dist_utils.py @@ -0,0 +1,43 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import torch +import torch.distributed as dist +from mmcv.runner import get_dist_info + +from ..utils import default_device + + +def sync_random_seed(seed=None, device=default_device): + """Make sure different ranks share the same seed. All workers must call + this function, otherwise it will deadlock. 
This method is generally used in + `DistributedSampler`, because the seed should be identical across all + processes in the distributed group. + + In distributed sampling, different ranks should sample non-overlapped + data in the dataset. Therefore, this function is used to make sure that + each rank shuffles the data indices in the same order based + on the same seed. Then different ranks could use different indices + to select non-overlapped data from the same data list. + + Args: + seed (int, Optional): The seed. Default to None. + device (str): The device where the seed will be put on. + Default to 'cuda'. + Returns: + int: Seed to be used. + """ + if seed is None: + seed = np.random.randint(2**31) + assert isinstance(seed, int) + + rank, world_size = get_dist_info() + + if world_size == 1: + return seed + + if rank == 0: + random_num = torch.tensor(seed, dtype=torch.int32, device=device) + else: + random_num = torch.tensor(0, dtype=torch.int32, device=device) + dist.broadcast(random_num, src=0) + return random_num.item() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..354d525c7453611433ec5799a64764b3caca2a5a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/__init__.py @@ -0,0 +1,18 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .accuracy import (average_precision_at_temporal_iou, + average_recall_at_avg_proposals, confusion_matrix, + get_weighted_score, interpolated_precision_recall, + mean_average_precision, mean_class_accuracy, + mmit_mean_average_precision, pairwise_temporal_iou, + softmax, top_k_accuracy, top_k_classes) +from .eval_detection import ActivityNetLocalization +from .eval_hooks import DistEvalHook, EvalHook + +__all__ = [ + 'DistEvalHook', 'EvalHook', 'top_k_accuracy', 'mean_class_accuracy', + 'confusion_matrix', 'mean_average_precision', 'get_weighted_score', + 'average_recall_at_avg_proposals', 'pairwise_temporal_iou', + 'average_precision_at_temporal_iou', 'ActivityNetLocalization', 'softmax', + 'interpolated_precision_recall', 'mmit_mean_average_precision', + 'top_k_classes' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/accuracy.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/accuracy.py new file mode 100644 index 0000000000000000000000000000000000000000..08cb4b49b92b4d044ba348691ef392f2d04e13b1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/accuracy.py @@ -0,0 +1,568 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np + + +def confusion_matrix(y_pred, y_real, normalize=None): + """Compute confusion matrix. + + Args: + y_pred (list[int] | np.ndarray[int]): Prediction labels. + y_real (list[int] | np.ndarray[int]): Ground truth labels. + normalize (str | None): Normalizes confusion matrix over the true + (rows), predicted (columns) conditions or all the population. + If None, confusion matrix will not be normalized. Options are + "true", "pred", "all", None. Default: None. + + Returns: + np.ndarray: Confusion matrix. 
+ """ + if normalize not in ['true', 'pred', 'all', None]: + raise ValueError("normalize must be one of {'true', 'pred', " + "'all', None}") + + if isinstance(y_pred, list): + y_pred = np.array(y_pred) + if y_pred.dtype == np.int32: + y_pred = y_pred.astype(np.int64) + if not isinstance(y_pred, np.ndarray): + raise TypeError( + f'y_pred must be list or np.ndarray, but got {type(y_pred)}') + if not y_pred.dtype == np.int64: + raise TypeError( + f'y_pred dtype must be np.int64, but got {y_pred.dtype}') + + if isinstance(y_real, list): + y_real = np.array(y_real) + if y_real.dtype == np.int32: + y_real = y_real.astype(np.int64) + if not isinstance(y_real, np.ndarray): + raise TypeError( + f'y_real must be list or np.ndarray, but got {type(y_real)}') + if not y_real.dtype == np.int64: + raise TypeError( + f'y_real dtype must be np.int64, but got {y_real.dtype}') + + label_set = np.unique(np.concatenate((y_pred, y_real))) + num_labels = len(label_set) + max_label = label_set[-1] + label_map = np.zeros(max_label + 1, dtype=np.int64) + for i, label in enumerate(label_set): + label_map[label] = i + + y_pred_mapped = label_map[y_pred] + y_real_mapped = label_map[y_real] + + confusion_mat = np.bincount( + num_labels * y_real_mapped + y_pred_mapped, + minlength=num_labels**2).reshape(num_labels, num_labels) + + with np.errstate(all='ignore'): + if normalize == 'true': + confusion_mat = ( + confusion_mat / confusion_mat.sum(axis=1, keepdims=True)) + elif normalize == 'pred': + confusion_mat = ( + confusion_mat / confusion_mat.sum(axis=0, keepdims=True)) + elif normalize == 'all': + confusion_mat = (confusion_mat / confusion_mat.sum()) + confusion_mat = np.nan_to_num(confusion_mat) + + return confusion_mat + + +def mean_class_accuracy(scores, labels): + """Calculate mean class accuracy. + + Args: + scores (list[np.ndarray]): Prediction scores for each class. + labels (list[int]): Ground truth labels. + + Returns: + np.ndarray: Mean class accuracy. 
+ """ + pred = np.argmax(scores, axis=1) + cf_mat = confusion_matrix(pred, labels).astype(float) + + cls_cnt = cf_mat.sum(axis=1) + cls_hit = np.diag(cf_mat) + + mean_class_acc = np.mean( + [hit / cnt if cnt else 0.0 for cnt, hit in zip(cls_cnt, cls_hit)]) + + return mean_class_acc + + +def top_k_classes(scores, labels, k=10, mode='accurate'): + """Calculate the most K accurate (inaccurate) classes. + + Given the prediction scores, ground truth label and top-k value, + compute the top K accurate (inaccurate) classes. + + Args: + scores (list[np.ndarray]): Prediction scores for each class. + labels (list[int] | np.ndarray): Ground truth labels. + k (int): Top-k values. Default: 10. + mode (str): Comparison mode for Top-k. Options are 'accurate' + and 'inaccurate'. Default: 'accurate'. + + Return: + list: List of sorted (from high accuracy to low accuracy for + 'accurate' mode, and from low accuracy to high accuracy for + inaccurate mode) top K classes in format of (label_id, + acc_ratio). + """ + assert mode in ['accurate', 'inaccurate'] + pred = np.argmax(scores, axis=1) + cf_mat = confusion_matrix(pred, labels).astype(float) + + cls_cnt = cf_mat.sum(axis=1) + cls_hit = np.diag(cf_mat) + hit_ratio = np.array( + [hit / cnt if cnt else 0.0 for cnt, hit in zip(cls_cnt, cls_hit)]) + + if mode == 'accurate': + max_index = np.argsort(hit_ratio)[-k:][::-1] + max_value = hit_ratio[max_index] + results = list(zip(max_index, max_value)) + else: + min_index = np.argsort(hit_ratio)[:k] + min_value = hit_ratio[min_index] + results = list(zip(min_index, min_value)) + return results + + +def top_k_accuracy(scores, labels, topk=(1, )): + """Calculate top k accuracy score. + + Args: + scores (list[np.ndarray]): Prediction scores for each class. + labels (list[int]): Ground truth labels. + topk (tuple[int]): K value for top_k_accuracy. Default: (1, ). + + Returns: + list[float]: Top k accuracy score for each k. 
+ """ + res = [] + labels = np.array(labels)[:, np.newaxis] + for k in topk: + max_k_preds = np.argsort(scores, axis=1)[:, -k:][:, ::-1] + match_array = np.logical_or.reduce(max_k_preds == labels, axis=1) + topk_acc_score = match_array.sum() / match_array.shape[0] + res.append(topk_acc_score) + + return res + + +def mmit_mean_average_precision(scores, labels): + """Mean average precision for multi-label recognition. Used for reporting + MMIT style mAP on Multi-Moments in Times. The difference is that this + method calculates average-precision for each sample and averages them among + samples. + + Args: + scores (list[np.ndarray]): Prediction scores of different classes for + each sample. + labels (list[np.ndarray]): Ground truth many-hot vector for each + sample. + + Returns: + np.float: The MMIT style mean average precision. + """ + results = [] + for score, label in zip(scores, labels): + precision, recall, _ = binary_precision_recall_curve(score, label) + ap = -np.sum(np.diff(recall) * np.array(precision)[:-1]) + results.append(ap) + return np.mean(results) + + +def mean_average_precision(scores, labels): + """Mean average precision for multi-label recognition. + + Args: + scores (list[np.ndarray]): Prediction scores of different classes for + each sample. + labels (list[np.ndarray]): Ground truth many-hot vector for each + sample. + + Returns: + np.float: The mean average precision. + """ + results = [] + scores = np.stack(scores).T + labels = np.stack(labels).T + + for score, label in zip(scores, labels): + precision, recall, _ = binary_precision_recall_curve(score, label) + ap = -np.sum(np.diff(recall) * np.array(precision)[:-1]) + results.append(ap) + results = [x for x in results if not np.isnan(x)] + if results == []: + return np.nan + return np.mean(results) + + +def binary_precision_recall_curve(y_score, y_true): + """Calculate the binary precision recall curve at step thresholds. + + Args: + y_score (np.ndarray): Prediction scores for each class. 
+ Shape should be (num_classes, ). + y_true (np.ndarray): Ground truth many-hot vector. + Shape should be (num_classes, ). + + Returns: + precision (np.ndarray): The precision of different thresholds. + recall (np.ndarray): The recall of different thresholds. + thresholds (np.ndarray): Different thresholds at which precision and + recall are tested. + """ + assert isinstance(y_score, np.ndarray) + assert isinstance(y_true, np.ndarray) + assert y_score.shape == y_true.shape + + # make y_true a boolean vector + y_true = (y_true == 1) + # sort scores and corresponding truth values + desc_score_indices = np.argsort(y_score, kind='mergesort')[::-1] + y_score = y_score[desc_score_indices] + y_true = y_true[desc_score_indices] + # There may be ties in values, therefore find the `distinct_value_inds` + distinct_value_inds = np.where(np.diff(y_score))[0] + threshold_inds = np.r_[distinct_value_inds, y_true.size - 1] + # accumulate the true positives with decreasing threshold + tps = np.cumsum(y_true)[threshold_inds] + fps = 1 + threshold_inds - tps + thresholds = y_score[threshold_inds] + + precision = tps / (tps + fps) + precision[np.isnan(precision)] = 0 + recall = tps / tps[-1] + # stop when full recall attained + # and reverse the outputs so recall is decreasing + last_ind = tps.searchsorted(tps[-1]) + sl = slice(last_ind, None, -1) + + return np.r_[precision[sl], 1], np.r_[recall[sl], 0], thresholds[sl] + + +def pairwise_temporal_iou(candidate_segments, + target_segments, + calculate_overlap_self=False): + """Compute intersection over union between segments. + + Args: + candidate_segments (np.ndarray): 1-dim/2-dim array in format + ``[init, end]/[m x 2:=[init, end]]``. + target_segments (np.ndarray): 2-dim array in format + ``[n x 2:=[init, end]]``. + calculate_overlap_self (bool): Whether to calculate overlap_self + (intersection / candidate_length) or not. Default: False. + + Returns: + t_iou (np.ndarray): 1-dim array [n] / + 2-dim array [n x m] with IoU ratio. 
+ t_overlap_self (np.ndarray, optional): 1-dim array [n] / + 2-dim array [n x m] with overlap_self, returns when + calculate_overlap_self is True. + """ + candidate_segments_ndim = candidate_segments.ndim + if target_segments.ndim != 2 or candidate_segments_ndim not in [1, 2]: + raise ValueError('Dimension of arguments is incorrect') + + if candidate_segments_ndim == 1: + candidate_segments = candidate_segments[np.newaxis, :] + + n, m = target_segments.shape[0], candidate_segments.shape[0] + t_iou = np.empty((n, m), dtype=np.float32) + if calculate_overlap_self: + t_overlap_self = np.empty((n, m), dtype=np.float32) + + for i in range(m): + candidate_segment = candidate_segments[i, :] + tt1 = np.maximum(candidate_segment[0], target_segments[:, 0]) + tt2 = np.minimum(candidate_segment[1], target_segments[:, 1]) + # Intersection including Non-negative overlap score. + segments_intersection = (tt2 - tt1).clip(0) + # Segment union. + segments_union = ((target_segments[:, 1] - target_segments[:, 0]) + + (candidate_segment[1] - candidate_segment[0]) - + segments_intersection) + # Compute overlap as the ratio of the intersection + # over union of two segments. + t_iou[:, i] = (segments_intersection.astype(float) / segments_union) + if calculate_overlap_self: + candidate_length = candidate_segment[1] - candidate_segment[0] + t_overlap_self[:, i] = ( + segments_intersection.astype(float) / candidate_length) + + if candidate_segments_ndim == 1: + t_iou = np.squeeze(t_iou, axis=1) + if calculate_overlap_self: + if candidate_segments_ndim == 1: + t_overlap_self = np.squeeze(t_overlap_self, axis=1) + return t_iou, t_overlap_self + + return t_iou + + +def average_recall_at_avg_proposals(ground_truth, + proposals, + total_num_proposals, + max_avg_proposals=None, + temporal_iou_thresholds=np.linspace( + 0.5, 0.95, 10)): + """Computes the average recall given an average number (percentile) of + proposals per video. 
+ + Args: + ground_truth (dict): Dict containing the ground truth instances. + proposals (dict): Dict containing the proposal instances. + total_num_proposals (int): Total number of proposals in the + proposal dict. + max_avg_proposals (int | None): Max number of proposals for one video. + Default: None. + temporal_iou_thresholds (np.ndarray): 1D array with temporal_iou + thresholds. Default: ``np.linspace(0.5, 0.95, 10)``. + + Returns: + tuple([np.ndarray, np.ndarray, np.ndarray, float]): + (recall, average_recall, proposals_per_video, auc) + In recall, ``recall[i,j]`` is recall at i-th temporal_iou threshold + at the j-th average number (percentile) of average number of + proposals per video. The average_recall is recall averaged + over a list of temporal_iou threshold (1D array). This is + equivalent to ``recall.mean(axis=0)``. The ``proposals_per_video`` + is the average number of proposals per video. The auc is the area + under ``AR@AN`` curve. + """ + + total_num_videos = len(ground_truth) + + if not max_avg_proposals: + max_avg_proposals = float(total_num_proposals) / total_num_videos + + ratio = (max_avg_proposals * float(total_num_videos) / total_num_proposals) + + # For each video, compute temporal_iou scores among the retrieved proposals + score_list = [] + total_num_retrieved_proposals = 0 + for video_id in ground_truth: + # Get proposals for this video. + proposals_video_id = proposals[video_id] + this_video_proposals = proposals_video_id[:, :2] + # Sort proposals by score. + sort_idx = proposals_video_id[:, 2].argsort()[::-1] + this_video_proposals = this_video_proposals[sort_idx, :].astype( + np.float32) + + # Get ground-truth instances associated to this video. 
+ ground_truth_video_id = ground_truth[video_id] + this_video_ground_truth = ground_truth_video_id[:, :2].astype( + np.float32) + if this_video_proposals.shape[0] == 0: + n = this_video_ground_truth.shape[0] + score_list.append(np.zeros((n, 1))) + continue + + if this_video_proposals.ndim != 2: + this_video_proposals = np.expand_dims(this_video_proposals, axis=0) + if this_video_ground_truth.ndim != 2: + this_video_ground_truth = np.expand_dims( + this_video_ground_truth, axis=0) + + num_retrieved_proposals = np.minimum( + int(this_video_proposals.shape[0] * ratio), + this_video_proposals.shape[0]) + total_num_retrieved_proposals += num_retrieved_proposals + this_video_proposals = this_video_proposals[: + num_retrieved_proposals, :] + + # Compute temporal_iou scores. + t_iou = pairwise_temporal_iou(this_video_proposals, + this_video_ground_truth) + score_list.append(t_iou) + + # Given that the length of the videos is really varied, we + # compute the number of proposals in terms of a ratio of the total + # proposals retrieved, i.e. average recall at a percentage of proposals + # retrieved per video. + + # Computes average recall. + pcn_list = np.arange(1, 101) / 100.0 * ( + max_avg_proposals * float(total_num_videos) / + total_num_retrieved_proposals) + matches = np.empty((total_num_videos, pcn_list.shape[0])) + positives = np.empty(total_num_videos) + recall = np.empty((temporal_iou_thresholds.shape[0], pcn_list.shape[0])) + # Iterates over each temporal_iou threshold. + for ridx, temporal_iou in enumerate(temporal_iou_thresholds): + # Inspect positives retrieved per video at different + # number of proposals (percentage of the total retrieved). + for i, score in enumerate(score_list): + # Total positives per video. + positives[i] = score.shape[0] + # Find proposals that satisfy the minimum temporal_iou threshold. + true_positives_temporal_iou = score >= temporal_iou + # Get number of proposals as a percentage of total retrieved. 
+ pcn_proposals = np.minimum( + (score.shape[1] * pcn_list).astype(int), score.shape[1]) + + for j, num_retrieved_proposals in enumerate(pcn_proposals): + # Compute the number of matches + # for each percentage of the proposals + matches[i, j] = np.count_nonzero( + (true_positives_temporal_iou[:, :num_retrieved_proposals] + ).sum(axis=1)) + + # Computes recall given the set of matches per video. + recall[ridx, :] = matches.sum(axis=0) / positives.sum() + + # Recall is averaged. + avg_recall = recall.mean(axis=0) + + # Get the average number of proposals per video. + proposals_per_video = pcn_list * ( + float(total_num_retrieved_proposals) / total_num_videos) + # Get AUC + area_under_curve = np.trapz(avg_recall, proposals_per_video) + auc = 100. * float(area_under_curve) / proposals_per_video[-1] + return recall, avg_recall, proposals_per_video, auc + + +def get_weighted_score(score_list, coeff_list): + """Get weighted score with given scores and coefficients. + + Given n predictions by different classifiers: [score_1, score_2, ..., + score_n] (score_list) and their coefficients: [coeff_1, coeff_2, ..., + coeff_n] (coeff_list), return weighted score: weighted_score = + score_1 * coeff_1 + score_2 * coeff_2 + ... + score_n * coeff_n + + Args: + score_list (list[list[np.ndarray]]): List of list of scores, with shape + n(number of predictions) X num_samples X num_classes + coeff_list (list[float]): List of coefficients, with shape n. + + Returns: + list[np.ndarray]: List of weighted scores. 
+ """ + assert len(score_list) == len(coeff_list) + num_samples = len(score_list[0]) + for i in range(1, len(score_list)): + assert len(score_list[i]) == num_samples + + scores = np.array(score_list) # (num_coeff, num_samples, num_classes) + coeff = np.array(coeff_list) # (num_coeff, ) + weighted_scores = list(np.dot(scores.T, coeff).T) + return weighted_scores + + +def softmax(x, dim=1): + """Compute softmax values for each sets of scores in x.""" + e_x = np.exp(x - np.max(x, axis=dim, keepdims=True)) + return e_x / e_x.sum(axis=dim, keepdims=True) + + +def interpolated_precision_recall(precision, recall): + """Interpolated AP - VOCdevkit from VOC 2011. + + Args: + precision (np.ndarray): The precision of different thresholds. + recall (np.ndarray): The recall of different thresholds. + + Returns: + float: Average precision score. + """ + mprecision = np.hstack([[0], precision, [0]]) + mrecall = np.hstack([[0], recall, [1]]) + for i in range(len(mprecision) - 1)[::-1]: + mprecision[i] = max(mprecision[i], mprecision[i + 1]) + idx = np.where(mrecall[1::] != mrecall[0:-1])[0] + 1 + ap = np.sum((mrecall[idx] - mrecall[idx - 1]) * mprecision[idx]) + return ap + + +def average_precision_at_temporal_iou(ground_truth, + prediction, + temporal_iou_thresholds=(np.linspace( + 0.5, 0.95, 10))): + """Compute average precision (in detection task) between ground truth and + predicted data frames. If multiple predictions match the same predicted + segment, only the one with highest score is matched as true positive. This + code is greatly inspired by Pascal VOC devkit. + + Args: + ground_truth (dict): Dict containing the ground truth instances. + Key: 'video_id' + Value (np.ndarray): 1D array of 't-start' and 't-end'. + prediction (np.ndarray): 2D array containing the information of + proposal instances, including 'video_id', 'class_id', 't-start', + 't-end' and 'score'. + temporal_iou_thresholds (np.ndarray): 1D array with temporal_iou + thresholds. 
Default: ``np.linspace(0.5, 0.95, 10)``. + + Returns: + np.ndarray: 1D array of average precision scores. + """ + ap = np.zeros(len(temporal_iou_thresholds), dtype=np.float32) + if len(prediction) < 1: + return ap + + num_gts = 0. + lock_gt = dict() + for key in ground_truth: + lock_gt[key] = np.ones( + (len(temporal_iou_thresholds), len(ground_truth[key]))) * -1 + num_gts += len(ground_truth[key]) + + # Sort predictions by decreasing score order. + prediction = np.array(prediction) + scores = prediction[:, 4].astype(float) + sort_idx = np.argsort(scores)[::-1] + prediction = prediction[sort_idx] + + # Initialize true positive and false positive vectors. + tp = np.zeros((len(temporal_iou_thresholds), len(prediction)), + dtype=np.int32) + fp = np.zeros((len(temporal_iou_thresholds), len(prediction)), + dtype=np.int32) + + # Assign true positives to ground truth instances. + for idx, this_pred in enumerate(prediction): + + # Check if there is at least one ground truth in the video. + if this_pred[0] in ground_truth: + this_gt = np.array(ground_truth[this_pred[0]], dtype=float) + else: + fp[:, idx] = 1 + continue + + t_iou = pairwise_temporal_iou(this_pred[2:4].astype(float), this_gt) + # We would like to retrieve the predictions with highest t_iou score. + t_iou_sorted_idx = t_iou.argsort()[::-1] + for t_idx, t_iou_threshold in enumerate(temporal_iou_thresholds): + for jdx in t_iou_sorted_idx: + if t_iou[jdx] < t_iou_threshold: + fp[t_idx, idx] = 1 + break + if lock_gt[this_pred[0]][t_idx, jdx] >= 0: + continue + # Assign as true positive after the filters above. 
+ tp[t_idx, idx] = 1 + lock_gt[this_pred[0]][t_idx, jdx] = idx + break + + if fp[t_idx, idx] == 0 and tp[t_idx, idx] == 0: + fp[t_idx, idx] = 1 + + tp_cumsum = np.cumsum(tp, axis=1).astype(np.float32) + fp_cumsum = np.cumsum(fp, axis=1).astype(np.float32) + recall_cumsum = tp_cumsum / num_gts + + precision_cumsum = tp_cumsum / (tp_cumsum + fp_cumsum) + + for t_idx in range(len(temporal_iou_thresholds)): + ap[t_idx] = interpolated_precision_recall(precision_cumsum[t_idx, :], + recall_cumsum[t_idx, :]) + + return ap diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/README.md b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7414d0fbbd32d24d1e1b745d1df6a3fd2a2c2a43 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/README.md @@ -0,0 +1,2 @@ +The code under this folder is from the official [ActivityNet repo](https://github.com/activitynet/ActivityNet). +Some unused codes are removed to minimize the length of codes added. diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/metrics.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..4d566accb59a8f81a96eff24c4d2907b92449386 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/metrics.py @@ -0,0 +1,142 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================= +"""Functions for computing metrics like precision, recall, CorLoc and etc.""" + +import numpy as np + + +def compute_precision_recall(scores, labels, num_gt): + """Compute precision and recall. + + Args: + scores: A float numpy array representing detection score + labels: A boolean numpy array representing true/false positive labels + num_gt: Number of ground truth instances + + Raises: + ValueError: if the input is not of the correct format + + Returns: + precision: Fraction of positive instances over detected ones. This + value is None if no ground truth labels are present. + recall: Fraction of detected positive instance over all positive + instances. This value is None if no ground truth labels are + present. 
+ """ + if (not isinstance(labels, np.ndarray) or labels.dtype != bool + or len(labels.shape) != 1): + raise ValueError('labels must be single dimension bool numpy array') + + if not isinstance(scores, np.ndarray) or len(scores.shape) != 1: + raise ValueError('scores must be single dimension numpy array') + + if num_gt < np.sum(labels): + raise ValueError( + 'Number of true positives must be smaller than num_gt.') + + if len(scores) != len(labels): + raise ValueError('scores and labels must be of the same size.') + + if num_gt == 0: + return None, None + + sorted_indices = np.argsort(scores) + sorted_indices = sorted_indices[::-1] + labels = labels.astype(int) + true_positive_labels = labels[sorted_indices] + false_positive_labels = 1 - true_positive_labels + cum_true_positives = np.cumsum(true_positive_labels) + cum_false_positives = np.cumsum(false_positive_labels) + precision = cum_true_positives.astype(float) / ( + cum_true_positives + cum_false_positives) + recall = cum_true_positives.astype(float) / num_gt + return precision, recall + + +def compute_average_precision(precision, recall): + """Compute Average Precision according to the definition in VOCdevkit. + + Precision is modified to ensure that it does not decrease as recall + decreases. + + Args: + precision: A float [N, 1] numpy array of precisions + recall: A float [N, 1] numpy array of recalls + + Raises: + ValueError: if the input is not of the correct format + + Returns: + average_precision: The area under the precision recall curve. NaN if + precision and recall are None. 
+ """ + if precision is None: + if recall is not None: + raise ValueError('If precision is None, recall must also be None') + return np.nan + + if not isinstance(precision, np.ndarray) or not isinstance( + recall, np.ndarray): + raise ValueError('precision and recall must be numpy array') + if precision.dtype != float or recall.dtype != float: + raise ValueError('input must be float numpy array.') + if len(precision) != len(recall): + raise ValueError('precision and recall must be of the same size.') + if not precision.size: + return 0.0 + if np.amin(precision) < 0 or np.amax(precision) > 1: + raise ValueError('Precision must be in the range of [0, 1].') + if np.amin(recall) < 0 or np.amax(recall) > 1: + raise ValueError('recall must be in the range of [0, 1].') + if not all(recall[i] <= recall[i + 1] for i in range(len(recall) - 1)): + raise ValueError('recall must be a non-decreasing array') + + recall = np.concatenate([[0], recall, [1]]) + precision = np.concatenate([[0], precision, [0]]) + + # Preprocess precision to be a non-increasing array + for i in range(len(precision) - 2, -1, -1): + precision[i] = np.maximum(precision[i], precision[i + 1]) + + indices = np.where(recall[1:] != recall[:-1])[0] + 1 + average_precision = np.sum( + (recall[indices] - recall[indices - 1]) * precision[indices]) + return average_precision + + +def compute_cor_loc(num_gt_imgs_per_class, + num_images_correctly_detected_per_class): + """Compute CorLoc according to the definition in the following paper. + + https://www.robots.ox.ac.uk/~vgg/rg/papers/deselaers-eccv10.pdf + + Returns NaN if there are no ground truth images for a class. 
+ + Args: + num_gt_imgs_per_class: 1D array, representing number of images + containing at least one object instance of a particular class + num_images_correctly_detected_per_class: 1D array, representing number + of images in which at least one object instance of a particular + class is correctly detected + + Returns: + corloc_per_class: A float numpy array representing the corloc score of + each class + """ + # Divide by zero expected for classes with no gt examples. + with np.errstate(divide='ignore', invalid='ignore'): + return np.where( + num_gt_imgs_per_class == 0, np.nan, + num_images_correctly_detected_per_class / num_gt_imgs_per_class) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/np_box_list.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/np_box_list.py new file mode 100644 index 0000000000000000000000000000000000000000..255bebe399c8fe3f3d1a47bf351802ab7ff0237e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/np_box_list.py @@ -0,0 +1,139 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================= +"""Numpy BoxList classes and functions.""" + +import numpy as np + + +class BoxList: + """Box collection. 
+ + BoxList represents a list of bounding boxes as a numpy array, where each + bounding box is represented as a row of 4 numbers, + [y_min, x_min, y_max, x_max]. It is assumed that all bounding boxes within + a given list correspond to a single image. + + Optionally, users can add additional related fields (such as + objectness/classification scores). + """ + + def __init__(self, data): + """Constructs box collection. + + Args: + data: a numpy array of shape [N, 4] representing box coordinates + + Raises: + ValueError: if bbox data is not a numpy array + ValueError: if invalid dimensions for bbox data + """ + if not isinstance(data, np.ndarray): + raise ValueError('data must be a numpy array.') + if len(data.shape) != 2 or data.shape[1] != 4: + raise ValueError('Invalid dimensions for box data.') + if data.dtype != np.float32 and data.dtype != np.float64: + raise ValueError( + 'Invalid data type for box data: float is required.') + if not self._is_valid_boxes(data): + raise ValueError('Invalid box data. data must be a numpy array of ' + 'N*[y_min, x_min, y_max, x_max]') + self.data = {'boxes': data} + + def num_boxes(self): + """Return number of boxes held in the collection.""" + return self.data['boxes'].shape[0] + + def get_extra_fields(self): + """Return all non-box fields.""" + return [k for k in self.data if k != 'boxes'] + + def has_field(self, field): + return field in self.data + + def add_field(self, field, field_data): + """Add data to a specified field. + + Args: + field: a string parameter used to specify a related field to be + accessed. + field_data: a numpy array of [N, ...] representing the data + associated with the field. + Raises: + ValueError: if the field already exists or the dimension of the + field data does not match the number of boxes. 
+ """ + if self.has_field(field): + raise ValueError('Field ' + field + ' already exists') + if len(field_data.shape) < 1 or field_data.shape[0] != self.num_boxes( + ): + raise ValueError('Invalid dimensions for field data') + self.data[field] = field_data + + def get(self): + """Convenience function for accessing box coordinates. + + Returns: + a numpy array of shape [N, 4] representing box corners + """ + return self.get_field('boxes') + + def get_field(self, field): + """Accesses data associated with the specified field in the box + collection. + + Args: + field: a string parameter used to specify a related field to be + accessed. + + Returns: + a numpy 1-d array representing data of an associated field + + Raises: + ValueError: if invalid field + """ + if not self.has_field(field): + raise ValueError(f'field {field} does not exist') + return self.data[field] + + def get_coordinates(self): + """Get corner coordinates of boxes. + + Returns: + a list of 4 1-d numpy arrays [y_min, x_min, y_max, x_max] + """ + box_coordinates = self.get() + y_min = box_coordinates[:, 0] + x_min = box_coordinates[:, 1] + y_max = box_coordinates[:, 2] + x_max = box_coordinates[:, 3] + return [y_min, x_min, y_max, x_max] + + @staticmethod + def _is_valid_boxes(data): + """Check whether data fulfills the format of N*[ymin, xmin, ymax, + xmax]. + + Args: + data: a numpy array of shape [N, 4] representing box coordinates + + Returns: + a boolean indicating whether all ymax of boxes are equal or greater + than ymin, and all xmax of boxes are equal or greater than xmin. 
+ """ + if len(data) != 0: + for v in data: + if v[0] > v[2] or v[1] > v[3]: + return False + return True diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/np_box_ops.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/np_box_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..94e7d300c80195f8a0299fbf33000dba9719bb0d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/np_box_ops.py @@ -0,0 +1,98 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Operations for [N, 4] numpy arrays representing bounding boxes. + +Example box operations that are supported: + * Areas: compute bounding box areas + * IOU: pairwise intersection-over-union scores +""" + +import numpy as np + + +def area(boxes): + """Computes area of boxes. + + Args: + boxes: Numpy array with shape [N, 4] holding N boxes + + Returns: + a numpy array with shape [N*1] representing box areas + """ + return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) + + +def intersection(boxes1, boxes2): + """Compute pairwise intersection areas between boxes. 
+ + Args: + boxes1: a numpy array with shape [N, 4] holding N boxes + boxes2: a numpy array with shape [M, 4] holding M boxes + + Returns: + a numpy array with shape [N, M] representing pairwise intersection area + """ + [y_min1, x_min1, y_max1, x_max1] = np.split(boxes1, 4, axis=1) + [y_min2, x_min2, y_max2, x_max2] = np.split(boxes2, 4, axis=1) + + all_pairs_min_ymax = np.minimum(y_max1, np.transpose(y_max2)) + all_pairs_max_ymin = np.maximum(y_min1, np.transpose(y_min2)) + intersect_heights = np.maximum( + np.zeros(all_pairs_max_ymin.shape), + all_pairs_min_ymax - all_pairs_max_ymin) + all_pairs_min_xmax = np.minimum(x_max1, np.transpose(x_max2)) + all_pairs_max_xmin = np.maximum(x_min1, np.transpose(x_min2)) + intersect_widths = np.maximum( + np.zeros(all_pairs_max_xmin.shape), + all_pairs_min_xmax - all_pairs_max_xmin) + return intersect_heights * intersect_widths + + +def iou(boxes1, boxes2): + """Computes pairwise intersection-over-union between box collections. + + Args: + boxes1: a numpy array with shape [N, 4] holding N boxes. + boxes2: a numpy array with shape [M, 4] holding M boxes. + + Returns: + a numpy array with shape [N, M] representing pairwise iou scores. + """ + intersect = intersection(boxes1, boxes2) + area1 = area(boxes1) + area2 = area(boxes2) + union = ( + np.expand_dims(area1, axis=1) + np.expand_dims(area2, axis=0) - + intersect) + return intersect / union + + +def ioa(boxes1, boxes2): + """Computes pairwise intersection-over-area between box collections. + + Intersection-over-area (ioa) between two boxes box1 and box2 is defined as + their intersection area over box2's area. Note that ioa is not symmetric, + that is, IOA(box1, box2) != IOA(box2, box1). + + Args: + boxes1: a numpy array with shape [N, 4] holding N boxes. + boxes2: a numpy array with shape [M, 4] holding M boxes. + + Returns: + a numpy array with shape [N, M] representing pairwise ioa scores. 
+ """ + intersect = intersection(boxes1, boxes2) + areas = np.expand_dims(area(boxes2), axis=0) + return intersect / areas diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/object_detection_evaluation.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/object_detection_evaluation.py new file mode 100644 index 0000000000000000000000000000000000000000..188652148524ec81124892dfabede05358de22af --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/object_detection_evaluation.py @@ -0,0 +1,574 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================= +"""object_detection_evaluation module. + +ObjectDetectionEvaluation is a class which manages ground truth information of +a object detection dataset, and computes frequently used detection metrics such +as Precision, Recall, CorLoc of the provided detection results. +It supports the following operations: +1) Add ground truth information of images sequentially. +2) Add detection result of images sequentially. +3) Evaluate detection metrics on already inserted detection results. +4) Write evaluation result into a pickle file for future processing or + visualization. + +Note: This module operates on numpy boxes and box lists. 
+""" + +import collections +import logging +import warnings +from abc import ABCMeta, abstractmethod +from collections import defaultdict + +import numpy as np + +from . import metrics, per_image_evaluation, standard_fields + + +class DetectionEvaluator: + """Interface for object detection evaluation classes. + + Example usage of the Evaluator: + ------------------------------ + evaluator = DetectionEvaluator(categories) + + # Detections and groundtruth for image 1. + evaluator.add_single_ground_truth_image_info(...) + evaluator.add_single_detected_image_info(...) + + # Detections and groundtruth for image 2. + evaluator.add_single_ground_truth_image_info(...) + evaluator.add_single_detected_image_info(...) + + metrics_dict = evaluator.evaluate() + """ + + __metaclass__ = ABCMeta + + def __init__(self, categories): + """Constructor. + + Args: + categories: A list of dicts, each of which has the following keys - + 'id': (required) an integer id uniquely identifying this + category. + 'name': (required) string representing category name e.g., + 'cat', 'dog'. + """ + self._categories = categories + + @abstractmethod + def add_single_ground_truth_image_info(self, image_id, groundtruth_dict): + """Adds groundtruth for a single image to be used for evaluation. + + Args: + image_id: A unique string/integer identifier for the image. + groundtruth_dict: A dictionary of groundtruth numpy arrays required + for evaluations. + """ + + @abstractmethod + def add_single_detected_image_info(self, image_id, detections_dict): + """Adds detections for a single image to be used for evaluation. + + Args: + image_id: A unique string/integer identifier for the image. + detections_dict: A dictionary of detection numpy arrays required + for evaluation. 
+ """ + + @abstractmethod + def evaluate(self): + """Evaluates detections and returns a dictionary of metrics.""" + + @abstractmethod + def clear(self): + """Clears the state to prepare for a fresh evaluation.""" + + +class ObjectDetectionEvaluator(DetectionEvaluator): + """A class to evaluate detections.""" + + def __init__(self, + categories, + matching_iou_threshold=0.5, + evaluate_corlocs=False, + metric_prefix=None, + use_weighted_mean_ap=False, + evaluate_masks=False): + """Constructor. + + Args: + categories: A list of dicts, each of which has the following keys - + 'id': (required) an integer id uniquely identifying this + category. + 'name': (required) string representing category name e.g., + 'cat', 'dog'. + matching_iou_threshold: IOU threshold to use for matching + groundtruth boxes to detection boxes. + evaluate_corlocs: (optional) boolean which determines if corloc + scores are to be returned or not. + metric_prefix: (optional) string prefix for metric name; if None, + no prefix is used. + use_weighted_mean_ap: (optional) boolean which determines if the + mean average precision is computed directly from the scores and + tp_fp_labels of all classes. + evaluate_masks: If False, evaluation will be performed based on + boxes. If True, mask evaluation will be performed instead. + + Raises: + ValueError: If the category ids are not 1-indexed. 
+ """ + super(ObjectDetectionEvaluator, self).__init__(categories) + self._num_classes = max([cat['id'] for cat in categories]) + if min(cat['id'] for cat in categories) < 1: + raise ValueError('Classes should be 1-indexed.') + self._matching_iou_threshold = matching_iou_threshold + self._use_weighted_mean_ap = use_weighted_mean_ap + self._label_id_offset = 1 + self._evaluate_masks = evaluate_masks + self._evaluation = ObjectDetectionEvaluation( + num_groundtruth_classes=self._num_classes, + matching_iou_threshold=self._matching_iou_threshold, + use_weighted_mean_ap=self._use_weighted_mean_ap, + label_id_offset=self._label_id_offset, + ) + self._image_ids = set([]) + self._evaluate_corlocs = evaluate_corlocs + self._metric_prefix = (metric_prefix + '_') if metric_prefix else '' + + def add_single_ground_truth_image_info(self, image_id, groundtruth_dict): + """Adds groundtruth for a single image to be used for evaluation. + + Args: + image_id: A unique string/integer identifier for the image. + groundtruth_dict: A dictionary containing - + standard_fields.InputDataFields.groundtruth_boxes: float32 + numpy array of shape [num_boxes, 4] containing `num_boxes` + groundtruth boxes of the format [ymin, xmin, ymax, xmax] in + absolute image coordinates. + standard_fields.InputDataFields.groundtruth_classes: integer + numpy array of shape [num_boxes] containing 1-indexed + groundtruth classes for the boxes. + standard_fields.InputDataFields.groundtruth_instance_masks: + Optional numpy array of shape [num_boxes, height, width] + with values in {0, 1}. + + Raises: + ValueError: On adding groundtruth for an image more than once. Will + also raise error if instance masks are not in groundtruth + dictionary. 
+ """ + if image_id in self._image_ids: + raise ValueError( + 'Image with id {} already added.'.format(image_id)) + + groundtruth_classes = ( + groundtruth_dict[ + standard_fields.InputDataFields.groundtruth_classes] - + self._label_id_offset) + + groundtruth_masks = None + if self._evaluate_masks: + if (standard_fields.InputDataFields.groundtruth_instance_masks + not in groundtruth_dict): + raise ValueError( + 'Instance masks not in groundtruth dictionary.') + groundtruth_masks = groundtruth_dict[ + standard_fields.InputDataFields.groundtruth_instance_masks] + self._evaluation.add_single_ground_truth_image_info( + image_key=image_id, + groundtruth_boxes=groundtruth_dict[ + standard_fields.InputDataFields.groundtruth_boxes], + groundtruth_class_labels=groundtruth_classes, + groundtruth_masks=groundtruth_masks, + ) + self._image_ids.update([image_id]) + + def add_single_detected_image_info(self, image_id, detections_dict): + """Adds detections for a single image to be used for evaluation. + + Args: + image_id: A unique string/integer identifier for the image. + detections_dict: A dictionary containing - + standard_fields.DetectionResultFields.detection_boxes: float32 + numpy array of shape [num_boxes, 4] containing `num_boxes` + detection boxes of the format [ymin, xmin, ymax, xmax] in + absolute image coordinates. + standard_fields.DetectionResultFields.detection_scores: float32 + numpy array of shape [num_boxes] containing detection + scores for the boxes. + standard_fields.DetectionResultFields.detection_classes: + integer numpy array of shape [num_boxes] containing + 1-indexed detection classes for the boxes. + standard_fields.DetectionResultFields.detection_masks: uint8 + numpy array of shape [num_boxes, height, width] containing + `num_boxes` masks of values ranging between 0 and 1. + + Raises: + ValueError: If detection masks are not in detections dictionary. 
+ """ + detection_classes = ( + detections_dict[ + standard_fields.DetectionResultFields.detection_classes] - + self._label_id_offset) + detection_masks = None + if self._evaluate_masks: + if (standard_fields.DetectionResultFields.detection_masks + not in detections_dict): + raise ValueError( + 'Detection masks not in detections dictionary.') + detection_masks = detections_dict[ + standard_fields.DetectionResultFields.detection_masks] + self._evaluation.add_single_detected_image_info( + image_key=image_id, + detected_boxes=detections_dict[ + standard_fields.DetectionResultFields.detection_boxes], + detected_scores=detections_dict[ + standard_fields.DetectionResultFields.detection_scores], + detected_class_labels=detection_classes, + detected_masks=detection_masks, + ) + + @staticmethod + def create_category_index(categories): + """Creates dictionary of COCO compatible categories keyed by category + id. + + Args: + categories: a list of dicts, each of which has the following keys: + 'id': (required) an integer id uniquely identifying this + category. + 'name': (required) string representing category name + e.g., 'cat', 'dog', 'pizza'. + + Returns: + category_index: a dict containing the same entries as categories, + but keyed by the 'id' field of each category. + """ + category_index = {} + for cat in categories: + category_index[cat['id']] = cat + return category_index + + def evaluate(self): + """Compute evaluation result. + + Returns: + A dictionary of metrics with the following fields - + + 1. summary_metrics: + 'Precision/mAP@IOU': mean average + precision at the specified IOU threshold + + 2. 
per_category_ap: category specific results with keys of the form + 'PerformanceByCategory/mAP@IOU/category' + """ + (per_class_ap, mean_ap, _, _, per_class_corloc, + mean_corloc) = self._evaluation.evaluate() + + metric = f'mAP@{self._matching_iou_threshold}IOU' + pascal_metrics = {self._metric_prefix + metric: mean_ap} + if self._evaluate_corlocs: + pascal_metrics[self._metric_prefix + + 'Precision/meanCorLoc@{}IOU'.format( + self._matching_iou_threshold)] = mean_corloc + category_index = self.create_category_index(self._categories) + for idx in range(per_class_ap.size): + if idx + self._label_id_offset in category_index: + display_name = ( + self._metric_prefix + + 'PerformanceByCategory/AP@{}IOU/{}'.format( + self._matching_iou_threshold, + category_index[idx + self._label_id_offset]['name'], + )) + pascal_metrics[display_name] = per_class_ap[idx] + + # Optionally add CorLoc metrics.classes + if self._evaluate_corlocs: + display_name = ( + self._metric_prefix + + 'PerformanceByCategory/CorLoc@{}IOU/{}'.format( + self._matching_iou_threshold, + category_index[idx + + self._label_id_offset]['name'], + )) + pascal_metrics[display_name] = per_class_corloc[idx] + + return pascal_metrics + + def clear(self): + """Clears the state to prepare for a fresh evaluation.""" + self._evaluation = ObjectDetectionEvaluation( + num_groundtruth_classes=self._num_classes, + matching_iou_threshold=self._matching_iou_threshold, + use_weighted_mean_ap=self._use_weighted_mean_ap, + label_id_offset=self._label_id_offset, + ) + self._image_ids.clear() + + +class PascalDetectionEvaluator(ObjectDetectionEvaluator): + """A class to evaluate detections using PASCAL metrics.""" + + def __init__(self, categories, matching_iou_threshold=0.5): + super(PascalDetectionEvaluator, self).__init__( + categories, + matching_iou_threshold=matching_iou_threshold, + evaluate_corlocs=False, + use_weighted_mean_ap=False, + ) + + +ObjectDetectionEvalMetrics = collections.namedtuple( + 
'ObjectDetectionEvalMetrics', + [ + 'average_precisions', + 'mean_ap', + 'precisions', + 'recalls', + 'corlocs', + 'mean_corloc', + ], +) + + +class ObjectDetectionEvaluation: + """Internal implementation of Pascal object detection metrics.""" + + def __init__(self, + num_groundtruth_classes, + matching_iou_threshold=0.5, + nms_iou_threshold=1.0, + nms_max_output_boxes=10000, + use_weighted_mean_ap=False, + label_id_offset=0): + if num_groundtruth_classes < 1: + raise ValueError( + 'Need at least 1 groundtruth class for evaluation.') + + self.per_image_eval = per_image_evaluation.PerImageEvaluation( + num_groundtruth_classes=num_groundtruth_classes, + matching_iou_threshold=matching_iou_threshold, + ) + self.num_class = num_groundtruth_classes + self.use_weighted_mean_ap = use_weighted_mean_ap + self.label_id_offset = label_id_offset + + self.groundtruth_boxes = {} + self.groundtruth_class_labels = {} + self.groundtruth_masks = {} + self.num_gt_instances_per_class = np.zeros(self.num_class, dtype=int) + self.num_gt_imgs_per_class = np.zeros(self.num_class, dtype=int) + + self._initialize_detections() + + def _initialize_detections(self): + self.detection_keys = set() + self.scores_per_class = [[] for _ in range(self.num_class)] + self.tp_fp_labels_per_class = [[] for _ in range(self.num_class)] + self.num_images_correctly_detected_per_class = np.zeros(self.num_class) + self.average_precision_per_class = np.empty( + self.num_class, dtype=float) + self.average_precision_per_class.fill(np.nan) + self.precisions_per_class = [] + self.recalls_per_class = [] + self.corloc_per_class = np.ones(self.num_class, dtype=float) + + def clear_detections(self): + self._initialize_detections() + + def add_single_ground_truth_image_info(self, + image_key, + groundtruth_boxes, + groundtruth_class_labels, + groundtruth_masks=None): + """Adds groundtruth for a single image to be used for evaluation. + + Args: + image_key: A unique string/integer identifier for the image. 
+ groundtruth_boxes: float32 numpy array of shape [num_boxes, 4] + containing `num_boxes` groundtruth boxes of the format + [ymin, xmin, ymax, xmax] in absolute image coordinates. + groundtruth_class_labels: integer numpy array of shape [num_boxes] + containing 0-indexed groundtruth classes for the boxes. + groundtruth_masks: uint8 numpy array of shape + [num_boxes, height, width] containing `num_boxes` groundtruth + masks. The mask values range from 0 to 1. + """ + if image_key in self.groundtruth_boxes: + warnings.warn(f'image {image_key} has already been added to the ' + 'ground truth database.') + return + + self.groundtruth_boxes[image_key] = groundtruth_boxes + self.groundtruth_class_labels[image_key] = groundtruth_class_labels + self.groundtruth_masks[image_key] = groundtruth_masks + + self._update_ground_truth_statistics(groundtruth_class_labels) + + def add_single_detected_image_info(self, + image_key, + detected_boxes, + detected_scores, + detected_class_labels, + detected_masks=None): + """Adds detections for a single image to be used for evaluation. + + Args: + image_key: A unique string/integer identifier for the image. + detected_boxes: float32 numpy array of shape [num_boxes, 4] + containing `num_boxes` detection boxes of the format + [ymin, xmin, ymax, xmax] in absolute image coordinates. + detected_scores: float32 numpy array of shape [num_boxes] + containing detection scores for the boxes. + detected_class_labels: integer numpy array of shape [num_boxes] + containing 0-indexed detection classes for the boxes. + detected_masks: np.uint8 numpy array of shape + [num_boxes, height, width] containing `num_boxes` detection + masks with values ranging between 0 and 1. + + Raises: + ValueError: if the number of boxes, scores and class labels differ + in length. 
+ """ + if len(detected_boxes) != len(detected_scores) or len( + detected_boxes) != len(detected_class_labels): + raise ValueError( + 'detected_boxes, detected_scores and ' + 'detected_class_labels should all have same lengths. Got ' + '[%d, %d, %d]' % (len(detected_boxes), + len(detected_scores), + len(detected_class_labels)) + ) + + if image_key in self.detection_keys: + warnings.warn(f'image {image_key} has already been added to the ' + 'detection database.') + return + + self.detection_keys.add(image_key) + if image_key in self.groundtruth_boxes: + groundtruth_boxes = self.groundtruth_boxes[image_key] + groundtruth_class_labels = self.groundtruth_class_labels[image_key] + # Masks are popped instead of look up. The reason is that we do not + # want to keep all masks in memory which can cause memory overflow. + groundtruth_masks = self.groundtruth_masks.pop(image_key) + else: + groundtruth_boxes = np.empty(shape=[0, 4], dtype=float) + groundtruth_class_labels = np.array([], dtype=int) + if detected_masks is None: + groundtruth_masks = None + else: + groundtruth_masks = np.empty(shape=[0, 1, 1], dtype=float) + ( + scores, + tp_fp_labels, + ) = self.per_image_eval.compute_object_detection_metrics( + detected_boxes=detected_boxes, + detected_scores=detected_scores, + detected_class_labels=detected_class_labels, + groundtruth_boxes=groundtruth_boxes, + groundtruth_class_labels=groundtruth_class_labels, + detected_masks=detected_masks, + groundtruth_masks=groundtruth_masks, + ) + + for i in range(self.num_class): + if scores[i].shape[0] > 0: + self.scores_per_class[i].append(scores[i]) + self.tp_fp_labels_per_class[i].append(tp_fp_labels[i]) + + def _update_ground_truth_statistics(self, groundtruth_class_labels): + """Update ground truth statistics. 
+ + Args: + groundtruth_class_labels: An integer numpy array of length M, + representing M class labels of object instances in ground truth + """ + count = defaultdict(lambda: 0) + for label in groundtruth_class_labels: + count[label] += 1 + for k in count: + self.num_gt_instances_per_class[k] += count[k] + self.num_gt_imgs_per_class[k] += 1 + + def evaluate(self): + """Compute evaluation result. + + Returns: + A named tuple with the following fields - + average_precision: float numpy array of average precision for + each class. + mean_ap: mean average precision of all classes, float scalar + precisions: List of precisions, each precision is a float numpy + array + recalls: List of recalls, each recall is a float numpy array + corloc: numpy float array + mean_corloc: Mean CorLoc score for each class, float scalar + """ + if (self.num_gt_instances_per_class == 0).any(): + logging.info( + 'The following classes have no ground truth examples: %s', + np.squeeze(np.argwhere(self.num_gt_instances_per_class == 0)) + + self.label_id_offset) + + if self.use_weighted_mean_ap: + all_scores = np.array([], dtype=float) + all_tp_fp_labels = np.array([], dtype=bool) + + for class_index in range(self.num_class): + if self.num_gt_instances_per_class[class_index] == 0: + continue + if not self.scores_per_class[class_index]: + scores = np.array([], dtype=float) + tp_fp_labels = np.array([], dtype=bool) + else: + scores = np.concatenate(self.scores_per_class[class_index]) + tp_fp_labels = np.concatenate( + self.tp_fp_labels_per_class[class_index]) + if self.use_weighted_mean_ap: + all_scores = np.append(all_scores, scores) + all_tp_fp_labels = np.append(all_tp_fp_labels, tp_fp_labels) + precision, recall = metrics.compute_precision_recall( + scores, tp_fp_labels, + self.num_gt_instances_per_class[class_index]) + self.precisions_per_class.append(precision) + self.recalls_per_class.append(recall) + average_precision = metrics.compute_average_precision( + precision, recall) + 
self.average_precision_per_class[class_index] = average_precision + + self.corloc_per_class = metrics.compute_cor_loc( + self.num_gt_imgs_per_class, + self.num_images_correctly_detected_per_class) + + if self.use_weighted_mean_ap: + num_gt_instances = np.sum(self.num_gt_instances_per_class) + precision, recall = metrics.compute_precision_recall( + all_scores, all_tp_fp_labels, num_gt_instances) + mean_ap = metrics.compute_average_precision(precision, recall) + else: + mean_ap = np.nanmean(self.average_precision_per_class) + mean_corloc = np.nanmean(self.corloc_per_class) + return ObjectDetectionEvalMetrics( + self.average_precision_per_class, + mean_ap, + self.precisions_per_class, + self.recalls_per_class, + self.corloc_per_class, + mean_corloc, + ) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/per_image_evaluation.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/per_image_evaluation.py new file mode 100644 index 0000000000000000000000000000000000000000..9a6e0d9e405ec00b272ce9d195a3406af4ebbafc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/per_image_evaluation.py @@ -0,0 +1,358 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================= +"""Evaluate Object Detection result on a single image. 
+ +Annotate each detected result as true positive or false positive according to +a predefined IOU ratio. Non Maximum Suppression is used by default. Multi class +detection is supported by default. Based on the settings, per image evaluation +is either performed on boxes or on object masks. +""" + +import numpy as np + +from . import np_box_list, np_box_ops + + +class PerImageEvaluation: + """Evaluate detection result of a single image.""" + + def __init__(self, num_groundtruth_classes, matching_iou_threshold=0.5): + """Initialize PerImageEvaluation with evaluation parameters. + + Args: + num_groundtruth_classes: Number of ground truth object classes + matching_iou_threshold: A ratio of area intersection to union, + which is the threshold to consider whether a detection is true + positive or not + """ + self.matching_iou_threshold = matching_iou_threshold + self.num_groundtruth_classes = num_groundtruth_classes + + def compute_object_detection_metrics(self, + detected_boxes, + detected_scores, + detected_class_labels, + groundtruth_boxes, + groundtruth_class_labels, + detected_masks=None, + groundtruth_masks=None): + """Evaluates detections as being tp, fp or ignored from a single image. + + The evaluation is done in two stages: + 1. All detections are matched to non group-of boxes. + + Args: + detected_boxes: A float numpy array of shape [N, 4], representing N + regions of detected object regions. + Each row is of the format [y_min, x_min, y_max, x_max] + detected_scores: A float numpy array of shape [N, 1], representing + the confidence scores of the detected N object instances. + detected_class_labels: An integer numpy array of shape [N, 1], + representing the class labels of the detected N object + instances. 
+ groundtruth_boxes: A float numpy array of shape [M, 4], + representing M regions of object instances in ground truth + groundtruth_class_labels: An integer numpy array of shape [M, 1], + representing M class labels of object instances in ground truth + detected_masks: (optional) A uint8 numpy array of shape + [N, height, width]. If not None, the metrics will be computed + based on masks. + groundtruth_masks: (optional) A uint8 numpy array of shape + [M, height, width]. + + Returns: + scores: A list of C float numpy arrays. Each numpy array is of + shape [K, 1], representing K scores detected with object class + label c + tp_fp_labels: A list of C boolean numpy arrays. Each numpy array + is of shape [K, 1], representing K True/False positive label of + object instances detected with class label c + """ + ( + detected_boxes, + detected_scores, + detected_class_labels, + detected_masks, + ) = self._remove_invalid_boxes( + detected_boxes, + detected_scores, + detected_class_labels, + detected_masks, + ) + scores, tp_fp_labels = self._compute_tp_fp( + detected_boxes=detected_boxes, + detected_scores=detected_scores, + detected_class_labels=detected_class_labels, + groundtruth_boxes=groundtruth_boxes, + groundtruth_class_labels=groundtruth_class_labels, + detected_masks=detected_masks, + groundtruth_masks=groundtruth_masks, + ) + + return scores, tp_fp_labels + + def _compute_tp_fp(self, + detected_boxes, + detected_scores, + detected_class_labels, + groundtruth_boxes, + groundtruth_class_labels, + detected_masks=None, + groundtruth_masks=None): + """Labels true/false positives of detections of an image across all + classes. + + Args: + detected_boxes: A float numpy array of shape [N, 4], representing N + regions of detected object regions. + Each row is of the format [y_min, x_min, y_max, x_max] + detected_scores: A float numpy array of shape [N, 1], representing + the confidence scores of the detected N object instances. 
+ detected_class_labels: An integer numpy array of shape [N, 1], + representing the class labels of the detected N object + instances. + groundtruth_boxes: A float numpy array of shape [M, 4], + representing M regions of object instances in ground truth + groundtruth_class_labels: An integer numpy array of shape [M, 1], + representing M class labels of object instances in ground truth + detected_masks: (optional) A np.uint8 numpy array of shape + [N, height, width]. If not None, the scores will be computed + based on masks. + groundtruth_masks: (optional) A np.uint8 numpy array of shape + [M, height, width]. + + Returns: + result_scores: A list of float numpy arrays. Each numpy array is of + shape [K, 1], representing K scores detected with object class + label c + result_tp_fp_labels: A list of boolean numpy array. Each numpy + array is of shape [K, 1], representing K True/False positive + label of object instances detected with class label c + + Raises: + ValueError: If detected masks is not None but groundtruth masks are + None, or the other way around. 
+ """ + if detected_masks is not None and groundtruth_masks is None: + raise ValueError( + 'Detected masks is available but groundtruth masks is not.') + if detected_masks is None and groundtruth_masks is not None: + raise ValueError( + 'Groundtruth masks is available but detected masks is not.') + + result_scores = [] + result_tp_fp_labels = [] + for i in range(self.num_groundtruth_classes): + (gt_boxes_at_ith_class, gt_masks_at_ith_class, + detected_boxes_at_ith_class, detected_scores_at_ith_class, + detected_masks_at_ith_class) = self._get_ith_class_arrays( + detected_boxes, detected_scores, detected_masks, + detected_class_labels, groundtruth_boxes, groundtruth_masks, + groundtruth_class_labels, i) + scores, tp_fp_labels = self._compute_tp_fp_for_single_class( + detected_boxes=detected_boxes_at_ith_class, + detected_scores=detected_scores_at_ith_class, + groundtruth_boxes=gt_boxes_at_ith_class, + detected_masks=detected_masks_at_ith_class, + groundtruth_masks=gt_masks_at_ith_class, + ) + result_scores.append(scores) + result_tp_fp_labels.append(tp_fp_labels) + return result_scores, result_tp_fp_labels + + @staticmethod + def _get_overlaps_and_scores_box_mode(detected_boxes, detected_scores, + groundtruth_boxes): + """Computes overlaps and scores between detected and groundtruth boxes. + + Args: + detected_boxes: A numpy array of shape [N, 4] representing detected + box coordinates + detected_scores: A 1-d numpy array of length N representing + classification score + groundtruth_boxes: A numpy array of shape [M, 4] representing + ground truth box coordinates + + Returns: + iou: A float numpy array of size [num_detected_boxes, + num_gt_boxes]. If gt_non_group_of_boxlist.num_boxes() == 0 it + will be None. + ioa: A float numpy array of size [num_detected_boxes, + num_gt_boxes]. If gt_group_of_boxlist.num_boxes() == 0 it will + be None. + scores: The score of the detected boxlist. + num_boxes: Number of non-maximum suppressed detected boxes. 
+ """ + detected_boxlist = np_box_list.BoxList(detected_boxes) + detected_boxlist.add_field('scores', detected_scores) + gt_non_group_of_boxlist = np_box_list.BoxList(groundtruth_boxes) + + iou = np_box_ops.iou(detected_boxlist.get(), + gt_non_group_of_boxlist.get()) + scores = detected_boxlist.get_field('scores') + num_boxes = detected_boxlist.num_boxes() + return iou, None, scores, num_boxes + + def _compute_tp_fp_for_single_class(self, + detected_boxes, + detected_scores, + groundtruth_boxes, + detected_masks=None, + groundtruth_masks=None): + """Labels boxes detected with the same class from the same image as + tp/fp. + + Args: + detected_boxes: A numpy array of shape [N, 4] representing detected + box coordinates + detected_scores: A 1-d numpy array of length N representing + classification score + groundtruth_boxes: A numpy array of shape [M, 4] representing + groundtruth box coordinates + detected_masks: (optional) A uint8 numpy array of shape + [N, height, width]. If not None, the scores will be computed + based on masks. + groundtruth_masks: (optional) A uint8 numpy array of shape + [M, height, width]. + + Returns: + Two arrays of the same size, containing all boxes that were + evaluated as being true positives or false positives. + + scores: A numpy array representing the detection scores. + tp_fp_labels: a boolean numpy array indicating whether a detection + is a true positive. + """ + if detected_boxes.size == 0: + return np.array([], dtype=float), np.array([], dtype=bool) + + (iou, _, scores, + num_detected_boxes) = self._get_overlaps_and_scores_box_mode( + detected_boxes=detected_boxes, + detected_scores=detected_scores, + groundtruth_boxes=groundtruth_boxes) + + if groundtruth_boxes.size == 0: + return scores, np.zeros(num_detected_boxes, dtype=bool) + + tp_fp_labels = np.zeros(num_detected_boxes, dtype=bool) + + # The evaluation is done in two stages: + # 1. All detections are matched to non group-of boxes. + # 2. 
Detections that are determined as false positives are matched + # against group-of boxes and ignored if matched. + + # Tp-fp evaluation for non-group of boxes (if any). + if iou.shape[1] > 0: + max_overlap_gt_ids = np.argmax(iou, axis=1) + is_gt_box_detected = np.zeros(iou.shape[1], dtype=bool) + for i in range(num_detected_boxes): + gt_id = max_overlap_gt_ids[i] + if iou[i, gt_id] >= self.matching_iou_threshold: + if not is_gt_box_detected[gt_id]: + tp_fp_labels[i] = True + is_gt_box_detected[gt_id] = True + + return scores, tp_fp_labels + + @staticmethod + def _get_ith_class_arrays(detected_boxes, detected_scores, detected_masks, + detected_class_labels, groundtruth_boxes, + groundtruth_masks, groundtruth_class_labels, + class_index): + """Returns numpy arrays belonging to class with index `class_index`. + + Args: + detected_boxes: A numpy array containing detected boxes. + detected_scores: A numpy array containing detected scores. + detected_masks: A numpy array containing detected masks. + detected_class_labels: A numpy array containing detected class + labels. + groundtruth_boxes: A numpy array containing groundtruth boxes. + groundtruth_masks: A numpy array containing groundtruth masks. + groundtruth_class_labels: A numpy array containing groundtruth + class labels. + class_index: An integer index. + + Returns: + gt_boxes_at_ith_class: A numpy array containing groundtruth boxes + labeled as ith class. + gt_masks_at_ith_class: A numpy array containing groundtruth masks + labeled as ith class. + detected_boxes_at_ith_class: A numpy array containing detected + boxes corresponding to the ith class. + detected_scores_at_ith_class: A numpy array containing detected + scores corresponding to the ith class. + detected_masks_at_ith_class: A numpy array containing detected + masks corresponding to the ith class. 
+ """ + selected_groundtruth = groundtruth_class_labels == class_index + gt_boxes_at_ith_class = groundtruth_boxes[selected_groundtruth] + if groundtruth_masks is not None: + gt_masks_at_ith_class = groundtruth_masks[selected_groundtruth] + else: + gt_masks_at_ith_class = None + selected_detections = detected_class_labels == class_index + detected_boxes_at_ith_class = detected_boxes[selected_detections] + detected_scores_at_ith_class = detected_scores[selected_detections] + if detected_masks is not None: + detected_masks_at_ith_class = detected_masks[selected_detections] + else: + detected_masks_at_ith_class = None + return (gt_boxes_at_ith_class, gt_masks_at_ith_class, + detected_boxes_at_ith_class, detected_scores_at_ith_class, + detected_masks_at_ith_class) + + @staticmethod + def _remove_invalid_boxes(detected_boxes, + detected_scores, + detected_class_labels, + detected_masks=None): + """Removes entries with invalid boxes. + + A box is invalid if either its xmax is smaller than its xmin, or its + ymax is smaller than its ymin. + + Args: + detected_boxes: A float numpy array of size [num_boxes, 4] + containing box coordinates in [ymin, xmin, ymax, xmax] format. + detected_scores: A float numpy array of size [num_boxes]. + detected_class_labels: A int32 numpy array of size [num_boxes]. + detected_masks: A uint8 numpy array of size + [num_boxes, height, width]. + + Returns: + valid_detected_boxes: A float numpy array of size + [num_valid_boxes, 4] containing box coordinates in + [ymin, xmin, ymax, xmax] format. + valid_detected_scores: A float numpy array of size + [num_valid_boxes]. + valid_detected_class_labels: A int32 numpy array of size + [num_valid_boxes]. + valid_detected_masks: A uint8 numpy array of size + [num_valid_boxes, height, width]. 
+ """ + valid_indices = np.logical_and( + detected_boxes[:, 0] < detected_boxes[:, 2], + detected_boxes[:, 1] < detected_boxes[:, 3]) + detected_boxes = detected_boxes[valid_indices] + detected_scores = detected_scores[valid_indices] + detected_class_labels = detected_class_labels[valid_indices] + if detected_masks is not None: + detected_masks = detected_masks[valid_indices] + return [ + detected_boxes, detected_scores, detected_class_labels, + detected_masks + ] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/standard_fields.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/standard_fields.py new file mode 100644 index 0000000000000000000000000000000000000000..8edf46d0816ab34458e5587b39b735c977f71572 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_evaluation/standard_fields.py @@ -0,0 +1,115 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================= +"""Contains classes specifying naming conventions used for object detection. + +Specifies: + InputDataFields: standard fields used by reader/preprocessor/batcher. + DetectionResultFields: standard fields returned by object detector. +""" + + +class InputDataFields: + """Names for the input tensors. + + Holds the standard data field names to use for identifying input tensors. 
+ This should be used by the decoder to identify keys for the returned + tensor_dict containing input tensors. And it should be used by the model to + identify the tensors it needs. + + Attributes: + image: image. + original_image: image in the original input size. + key: unique key corresponding to image. + source_id: source of the original image. + filename: original filename of the dataset (without common path). + groundtruth_image_classes: image-level class labels. + groundtruth_boxes: coordinates of the ground truth boxes in the image. + groundtruth_classes: box-level class labels. + groundtruth_label_types: box-level label types (e.g. explicit + negative). + groundtruth_is_crowd: [DEPRECATED, use groundtruth_group_of instead] + is the groundtruth a single object or a crowd. + groundtruth_area: area of a groundtruth segment. + groundtruth_difficult: is a `difficult` object + groundtruth_group_of: is a `group_of` objects, e.g. multiple objects of + the same class, forming a connected group, where instances are + heavily occluding each other. + proposal_boxes: coordinates of object proposal boxes. + proposal_objectness: objectness score of each proposal. + groundtruth_instance_masks: ground truth instance masks. + groundtruth_instance_boundaries: ground truth instance boundaries. + groundtruth_instance_classes: instance mask-level class labels. + groundtruth_keypoints: ground truth keypoints. + groundtruth_keypoint_visibilities: ground truth keypoint visibilities. + groundtruth_label_scores: groundtruth label scores. + groundtruth_weights: groundtruth weight factor for bounding boxes. + num_groundtruth_boxes: number of groundtruth boxes. + true_image_shapes: true shapes of images in the resized images, as + resized images can be padded with zeros. 
+ """ + + image = 'image' + original_image = 'original_image' + key = 'key' + source_id = 'source_id' + filename = 'filename' + groundtruth_image_classes = 'groundtruth_image_classes' + groundtruth_boxes = 'groundtruth_boxes' + groundtruth_classes = 'groundtruth_classes' + groundtruth_label_types = 'groundtruth_label_types' + groundtruth_is_crowd = 'groundtruth_is_crowd' + groundtruth_area = 'groundtruth_area' + groundtruth_difficult = 'groundtruth_difficult' + groundtruth_group_of = 'groundtruth_group_of' + proposal_boxes = 'proposal_boxes' + proposal_objectness = 'proposal_objectness' + groundtruth_instance_masks = 'groundtruth_instance_masks' + groundtruth_instance_boundaries = 'groundtruth_instance_boundaries' + groundtruth_instance_classes = 'groundtruth_instance_classes' + groundtruth_keypoints = 'groundtruth_keypoints' + groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities' + groundtruth_label_scores = 'groundtruth_label_scores' + groundtruth_weights = 'groundtruth_weights' + num_groundtruth_boxes = 'num_groundtruth_boxes' + true_image_shape = 'true_image_shape' + + +class DetectionResultFields: + """Naming conventions for storing the output of the detector. + + Attributes: + source_id: source of the original image. + key: unique key corresponding to image. + detection_boxes: coordinates of the detection boxes in the image. + detection_scores: detection scores for the detection boxes in the + image. + detection_classes: detection-level class labels. + detection_masks: contains a segmentation mask for each detection box. + detection_boundaries: contains an object boundary for each detection + box. + detection_keypoints: contains detection keypoints for each detection + box. + num_detections: number of detections in the batch. 
+ """ + + source_id = 'source_id' + key = 'key' + detection_boxes = 'detection_boxes' + detection_scores = 'detection_scores' + detection_classes = 'detection_classes' + detection_masks = 'detection_masks' + detection_boundaries = 'detection_boundaries' + detection_keypoints = 'detection_keypoints' + num_detections = 'num_detections' diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_utils.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..ab11669b64ac2d68649f627ab971a50cfb8f4166 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/ava_utils.py @@ -0,0 +1,240 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# This piece of code is directly adapted from ActivityNet official repo +# https://github.com/activitynet/ActivityNet/blob/master/ +# Evaluation/get_ava_performance.py. Some unused codes are removed. +import csv +import logging +import time +from collections import defaultdict + +import numpy as np + +from .ava_evaluation import object_detection_evaluation as det_eval +from .ava_evaluation import standard_fields + + +def det2csv(dataset, results, custom_classes): + csv_results = [] + for idx in range(len(dataset)): + video_id = dataset.video_infos[idx]['video_id'] + timestamp = dataset.video_infos[idx]['timestamp'] + result = results[idx] + for label, _ in enumerate(result): + for bbox in result[label]: + bbox_ = tuple(bbox.tolist()) + if custom_classes is not None: + actual_label = custom_classes[label + 1] + else: + actual_label = label + 1 + csv_results.append(( + video_id, + timestamp, + ) + bbox_[:4] + (actual_label, ) + bbox_[4:]) + return csv_results + + +# results is organized by class +def results2csv(dataset, results, out_file, custom_classes=None): + if isinstance(results[0], list): + csv_results = det2csv(dataset, results, custom_classes) + + # save space for float + def to_str(item): + if isinstance(item, 
float): + return f'{item:.3f}' + return str(item) + + with open(out_file, 'w') as f: + for csv_result in csv_results: + f.write(','.join(map(to_str, csv_result))) + f.write('\n') + + +def print_time(message, start): + print('==> %g seconds to %s' % (time.time() - start, message), flush=True) + + +def make_image_key(video_id, timestamp): + """Returns a unique identifier for a video id & timestamp.""" + return f'{video_id},{int(timestamp):04d}' + + +def read_csv(csv_file, class_whitelist=None): + """Loads boxes and class labels from a CSV file in the AVA format. + + CSV file format described at https://research.google.com/ava/download.html. + + Args: + csv_file: A file object. + class_whitelist: If provided, boxes corresponding to (integer) class + labels not in this set are skipped. + + Returns: + boxes: A dictionary mapping each unique image key (string) to a list of + boxes, given as coordinates [y1, x1, y2, x2]. + labels: A dictionary mapping each unique image key (string) to a list + of integer class labels, matching the corresponding box in `boxes`. + scores: A dictionary mapping each unique image key (string) to a list + of score values labels, matching the corresponding label in `labels`. + If scores are not provided in the csv, then they will default to 1.0. 
+ """ + start = time.time() + entries = defaultdict(list) + boxes = defaultdict(list) + labels = defaultdict(list) + scores = defaultdict(list) + reader = csv.reader(csv_file) + for row in reader: + assert len(row) in [7, 8], f'Wrong number of columns: {row}' + image_key = make_image_key(row[0], row[1]) + x1, y1, x2, y2 = [float(n) for n in row[2:6]] + action_id = int(row[6]) + if class_whitelist and action_id not in class_whitelist: + continue + + score = 1.0 + if len(row) == 8: + score = float(row[7]) + + entries[image_key].append((score, action_id, y1, x1, y2, x2)) + + for image_key in entries: + # Evaluation API assumes boxes with descending scores + entry = sorted(entries[image_key], key=lambda tup: -tup[0]) + boxes[image_key] = [x[2:] for x in entry] + labels[image_key] = [x[1] for x in entry] + scores[image_key] = [x[0] for x in entry] + + print_time('read file ' + csv_file.name, start) + return boxes, labels, scores + + +def read_exclusions(exclusions_file): + """Reads a CSV file of excluded timestamps. + + Args: + exclusions_file: A file object containing a csv of video-id,timestamp. + + Returns: + A set of strings containing excluded image keys, e.g. + "aaaaaaaaaaa,0904", + or an empty set if exclusions file is None. + """ + excluded = set() + if exclusions_file: + reader = csv.reader(exclusions_file) + for row in reader: + assert len(row) == 2, f'Expected only 2 columns, got: {row}' + excluded.add(make_image_key(row[0], row[1])) + return excluded + + +def read_labelmap(labelmap_file): + """Reads a labelmap without the dependency on protocol buffers. + + Args: + labelmap_file: A file object containing a label map protocol buffer. + + Returns: + labelmap: The label map in the form used by the + object_detection_evaluation + module - a list of {"id": integer, "name": classname } dicts. + class_ids: A set containing all of the valid class id integers. 
+ """ + labelmap = [] + class_ids = set() + name = '' + class_id = '' + for line in labelmap_file: + if line.startswith(' name:'): + name = line.split('"')[1] + elif line.startswith(' id:') or line.startswith(' label_id:'): + class_id = int(line.strip().split(' ')[-1]) + labelmap.append({'id': class_id, 'name': name}) + class_ids.add(class_id) + return labelmap, class_ids + + +# Seems there is at most 100 detections for each image +def ava_eval(result_file, + result_type, + label_file, + ann_file, + exclude_file, + verbose=True, + custom_classes=None): + + assert result_type in ['mAP'] + + start = time.time() + categories, class_whitelist = read_labelmap(open(label_file)) + if custom_classes is not None: + custom_classes = custom_classes[1:] + assert set(custom_classes).issubset(set(class_whitelist)) + class_whitelist = custom_classes + categories = [cat for cat in categories if cat['id'] in custom_classes] + + # loading gt, do not need gt score + gt_boxes, gt_labels, _ = read_csv(open(ann_file), class_whitelist) + if verbose: + print_time('Reading detection results', start) + + if exclude_file is not None: + excluded_keys = read_exclusions(open(exclude_file)) + else: + excluded_keys = list() + + start = time.time() + boxes, labels, scores = read_csv(open(result_file), class_whitelist) + if verbose: + print_time('Reading detection results', start) + + # Evaluation for mAP + pascal_evaluator = det_eval.PascalDetectionEvaluator(categories) + + start = time.time() + for image_key in gt_boxes: + if verbose and image_key in excluded_keys: + logging.info( + 'Found excluded timestamp in detections: %s.' 
+ ' It will be ignored.', image_key) + continue + pascal_evaluator.add_single_ground_truth_image_info( + image_key, { + standard_fields.InputDataFields.groundtruth_boxes: + np.array(gt_boxes[image_key], dtype=float), + standard_fields.InputDataFields.groundtruth_classes: + np.array(gt_labels[image_key], dtype=int) + }) + if verbose: + print_time('Convert groundtruth', start) + + start = time.time() + for image_key in boxes: + if verbose and image_key in excluded_keys: + logging.info( + 'Found excluded timestamp in detections: %s.' + ' It will be ignored.', image_key) + continue + pascal_evaluator.add_single_detected_image_info( + image_key, { + standard_fields.DetectionResultFields.detection_boxes: + np.array(boxes[image_key], dtype=float), + standard_fields.DetectionResultFields.detection_classes: + np.array(labels[image_key], dtype=int), + standard_fields.DetectionResultFields.detection_scores: + np.array(scores[image_key], dtype=float) + }) + if verbose: + print_time('Convert detections', start) + + start = time.time() + metrics = pascal_evaluator.evaluate() + if verbose: + print_time('run_evaluator', start) + for display_name in metrics: + print(f'{display_name}=\t{metrics[display_name]}') + return { + display_name: metrics[display_name] + for display_name in metrics if 'ByCategory' not in display_name + } diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/eval_detection.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/eval_detection.py new file mode 100644 index 0000000000000000000000000000000000000000..604ba4fb7ef04e3399283f6fb1f75bd14b1d50fd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/eval_detection.py @@ -0,0 +1,234 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import json + +import numpy as np +from mmcv.utils import print_log + +from ...utils import get_root_logger +from .accuracy import interpolated_precision_recall, pairwise_temporal_iou + + +class ActivityNetLocalization: + """Class to evaluate detection results on ActivityNet. + + Args: + ground_truth_filename (str | None): The filename of groundtruth. + Default: None. + prediction_filename (str | None): The filename of action detection + results. Default: None. + tiou_thresholds (np.ndarray): The thresholds of temporal iou to + evaluate. Default: ``np.linspace(0.5, 0.95, 10)``. + verbose (bool): Whether to print verbose logs. Default: False. + """ + + def __init__(self, + ground_truth_filename=None, + prediction_filename=None, + tiou_thresholds=np.linspace(0.5, 0.95, 10), + verbose=False): + if not ground_truth_filename: + raise IOError('Please input a valid ground truth file.') + if not prediction_filename: + raise IOError('Please input a valid prediction file.') + self.ground_truth_filename = ground_truth_filename + self.prediction_filename = prediction_filename + self.tiou_thresholds = tiou_thresholds + self.verbose = verbose + self.ap = None + self.logger = get_root_logger() + # Import ground truth and predictions. + self.ground_truth, self.activity_index = self._import_ground_truth( + ground_truth_filename) + self.prediction = self._import_prediction(prediction_filename) + + if self.verbose: + log_msg = ( + '[INIT] Loaded ground_truth from ' + f'{self.ground_truth_filename}, prediction from ' + f'{self.prediction_filename}.\n' + f'Number of ground truth instances: {len(self.ground_truth)}\n' + f'Number of predictions: {len(self.prediction)}\n' + f'Fixed threshold for tiou score: {self.tiou_thresholds}') + print_log(log_msg, logger=self.logger) + + @staticmethod + def _import_ground_truth(ground_truth_filename): + """Read ground truth file and return the ground truth instances and the + activity classes. 
+ + Args: + ground_truth_filename (str): Full path to the ground truth json + file. + + Returns: + tuple[list, dict]: (ground_truth, activity_index). + ground_truth contains the ground truth instances, which is in a + dict format. + activity_index contains classes index. + """ + with open(ground_truth_filename, 'r') as f: + data = json.load(f) + # Checking format + activity_index, class_idx = {}, 0 + ground_truth = [] + for video_id, video_info in data.items(): + for anno in video_info['annotations']: + if anno['label'] not in activity_index: + activity_index[anno['label']] = class_idx + class_idx += 1 + # old video_anno + ground_truth_item = {} + ground_truth_item['video-id'] = video_id[2:] + ground_truth_item['t-start'] = float(anno['segment'][0]) + ground_truth_item['t-end'] = float(anno['segment'][1]) + ground_truth_item['label'] = activity_index[anno['label']] + ground_truth.append(ground_truth_item) + + return ground_truth, activity_index + + def _import_prediction(self, prediction_filename): + """Read prediction file and return the prediction instances. + + Args: + prediction_filename (str): Full path to the prediction json file. + + Returns: + List: List containing the prediction instances (dictionaries). + """ + with open(prediction_filename, 'r') as f: + data = json.load(f) + # Read predictions. 
+ prediction = [] + for video_id, video_info in data['results'].items(): + for result in video_info: + prediction_item = dict() + prediction_item['video-id'] = video_id + prediction_item['label'] = self.activity_index[result['label']] + prediction_item['t-start'] = float(result['segment'][0]) + prediction_item['t-end'] = float(result['segment'][1]) + prediction_item['score'] = result['score'] + prediction.append(prediction_item) + + return prediction + + def wrapper_compute_average_precision(self): + """Computes average precision for each class.""" + ap = np.zeros((len(self.tiou_thresholds), len(self.activity_index))) + + # Adaptation to query faster + ground_truth_by_label = [] + prediction_by_label = [] + for i in range(len(self.activity_index)): + ground_truth_by_label.append([]) + prediction_by_label.append([]) + for gt in self.ground_truth: + ground_truth_by_label[gt['label']].append(gt) + for pred in self.prediction: + prediction_by_label[pred['label']].append(pred) + + for i in range(len(self.activity_index)): + ap_result = compute_average_precision_detection( + ground_truth_by_label[i], prediction_by_label[i], + self.tiou_thresholds) + ap[:, i] = ap_result + + return ap + + def evaluate(self): + """Evaluates a prediction file. + + For the detection task we measure the interpolated mean average + precision to measure the performance of a method. + """ + self.ap = self.wrapper_compute_average_precision() + + self.mAP = self.ap.mean(axis=1) + self.average_mAP = self.mAP.mean() + + return self.mAP, self.average_mAP + + +def compute_average_precision_detection(ground_truth, + prediction, + tiou_thresholds=np.linspace( + 0.5, 0.95, 10)): + """Compute average precision (detection task) between ground truth and + predictions data frames. If multiple predictions occur for the same + predicted segment, only the one with highest score is matched as true + positive. This code is greatly inspired by Pascal VOC devkit. 
+ + Args: + ground_truth (list[dict]): List containing the ground truth instances + (dictionaries). Required keys are 'video-id', 't-start' and + 't-end'. + prediction (list[dict]): List containing the prediction instances + (dictionaries). Required keys are: 'video-id', 't-start', 't-end' + and 'score'. + tiou_thresholds (np.ndarray): A 1darray indicates the temporal + intersection over union threshold, which is optional. + Default: ``np.linspace(0.5, 0.95, 10)``. + + Returns: + np.ndarray: ap, average precision score at each tIoU threshold. + """ + num_thresholds = len(tiou_thresholds) + num_gts = len(ground_truth) + num_preds = len(prediction) + ap = np.zeros(num_thresholds) + if len(prediction) == 0: + return ap + + num_positive = float(num_gts) + lock_gt = np.ones((num_thresholds, num_gts)) * -1 + # Sort predictions by decreasing score order. + prediction.sort(key=lambda x: -x['score']) + # Initialize true positive and false positive vectors. + tp = np.zeros((num_thresholds, num_preds)) + fp = np.zeros((num_thresholds, num_preds)) + + # Adaptation to query faster + ground_truth_by_videoid = {} + for i, item in enumerate(ground_truth): + item['index'] = i + ground_truth_by_videoid.setdefault(item['video-id'], []).append(item) + + # Assign true positives to matching ground truth instances. + for idx, pred in enumerate(prediction): + if pred['video-id'] in ground_truth_by_videoid: + gts = ground_truth_by_videoid[pred['video-id']] + else: + fp[:, idx] = 1 + continue + + tiou_arr = pairwise_temporal_iou( + np.array([pred['t-start'], pred['t-end']]), + np.array([np.array([gt['t-start'], gt['t-end']]) for gt in gts])) + tiou_arr = tiou_arr.reshape(-1) + # We would like to retrieve the predictions with highest tiou score. 
+ tiou_sorted_idx = tiou_arr.argsort()[::-1] + for t_idx, tiou_threshold in enumerate(tiou_thresholds): + for j_idx in tiou_sorted_idx: + if tiou_arr[j_idx] < tiou_threshold: + fp[t_idx, idx] = 1 + break + if lock_gt[t_idx, gts[j_idx]['index']] >= 0: + continue + # Assign as true positive after the filters above. + tp[t_idx, idx] = 1 + lock_gt[t_idx, gts[j_idx]['index']] = idx + break + + if fp[t_idx, idx] == 0 and tp[t_idx, idx] == 0: + fp[t_idx, idx] = 1 + + tp_cumsum = np.cumsum(tp, axis=1).astype(np.float64) + fp_cumsum = np.cumsum(fp, axis=1).astype(np.float64) + recall_cumsum = tp_cumsum / num_positive + + precision_cumsum = tp_cumsum / (tp_cumsum + fp_cumsum) + + for t_idx in range(len(tiou_thresholds)): + ap[t_idx] = interpolated_precision_recall(precision_cumsum[t_idx, :], + recall_cumsum[t_idx, :]) + + return ap diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/eval_hooks.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/eval_hooks.py new file mode 100644 index 0000000000000000000000000000000000000000..e125c3d2c6f8cbd14b466bd00fe9fa7968da6ae1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/evaluation/eval_hooks.py @@ -0,0 +1,391 @@ +# Copyright (c) OpenMMLab. All rights reserved.
+import os +import os.path as osp +import warnings +from math import inf + +import torch.distributed as dist +from torch.nn.modules.batchnorm import _BatchNorm +from torch.utils.data import DataLoader + +try: + from mmcv.runner import DistEvalHook as BasicDistEvalHook + from mmcv.runner import EvalHook as BasicEvalHook + + from_mmcv = True + + class EvalHook(BasicEvalHook): + greater_keys = [ + 'acc', 'top', 'AR@', 'auc', 'precision', 'mAP@', 'Recall@' + ] + less_keys = ['loss'] + + def __init__(self, *args, save_best='auto', **kwargs): + super().__init__(*args, save_best=save_best, **kwargs) + + class DistEvalHook(BasicDistEvalHook): + greater_keys = [ + 'acc', 'top', 'AR@', 'auc', 'precision', 'mAP@', 'Recall@' + ] + less_keys = ['loss'] + + def __init__(self, *args, save_best='auto', **kwargs): + super().__init__(*args, save_best=save_best, **kwargs) + +except (ImportError, ModuleNotFoundError): + warnings.warn('DeprecationWarning: EvalHook and DistEvalHook in mmaction2 ' + 'will be deprecated, please install mmcv through master ' + 'branch.') + from_mmcv = False + +if not from_mmcv: + + from mmcv.runner import Hook + + class EvalHook(Hook): # noqa: F811 + """Non-Distributed evaluation hook. + + Notes: + If new arguments are added for EvalHook, tools/test.py, + tools/eval_metric.py may be effected. + + This hook will regularly perform evaluation in a given interval when + performing in non-distributed environment. + + Args: + dataloader (DataLoader): A PyTorch dataloader. + start (int | None, optional): Evaluation starting epoch. It enables + evaluation before the training starts if ``start`` <= the + resuming epoch. If None, whether to evaluate is merely decided + by ``interval``. Default: None. + interval (int): Evaluation interval. Default: 1. + by_epoch (bool): Determine perform evaluation by epoch or by + iteration. If set to True, it will perform by epoch. + Otherwise, by iteration. default: True. 
+ save_best (str | None, optional): If a metric is specified, it + would measure the best checkpoint during evaluation. The + information about best checkpoint would be save in best.json. + Options are the evaluation metrics to the test dataset. e.g., + ``top1_acc``, ``top5_acc``, ``mean_class_accuracy``, + ``mean_average_precision``, ``mmit_mean_average_precision`` + for action recognition dataset (RawframeDataset and + VideoDataset). ``AR@AN``, ``auc`` for action localization + dataset. (ActivityNetDataset). ``mAP@0.5IOU`` for + spatio-temporal action detection dataset (AVADataset). + If ``save_best`` is ``auto``, the first key of the returned + ``OrderedDict`` result will be used. Default: 'auto'. + rule (str | None, optional): Comparison rule for best score. + If set to None, it will infer a reasonable rule. Keys such as + 'acc', 'top' .etc will be inferred by 'greater' rule. Keys + contain 'loss' will be inferred by 'less' rule. Options are + 'greater', 'less', None. Default: None. + **eval_kwargs: Evaluation arguments fed into the evaluate function + of the dataset. + """ + + rule_map = {'greater': lambda x, y: x > y, 'less': lambda x, y: x < y} + init_value_map = {'greater': -inf, 'less': inf} + greater_keys = [ + 'acc', 'top', 'AR@', 'auc', 'precision', 'mAP@', 'Recall@' + ] + less_keys = ['loss'] + + def __init__(self, + dataloader, + start=None, + interval=1, + by_epoch=True, + save_best='auto', + rule=None, + **eval_kwargs): + + if 'key_indicator' in eval_kwargs: + raise RuntimeError( + '"key_indicator" is deprecated, ' + 'you need to use "save_best" instead. 
' + 'See https://github.com/open-mmlab/mmaction2/pull/395 ' + 'for more info') + + if not isinstance(dataloader, DataLoader): + raise TypeError(f'dataloader must be a pytorch DataLoader, ' + f'but got {type(dataloader)}') + + if interval <= 0: + raise ValueError( + f'interval must be positive, but got {interval}') + + assert isinstance(by_epoch, bool) + + if start is not None and start < 0: + warnings.warn( + f'The evaluation start epoch {start} is smaller than 0, ' + f'use 0 instead', UserWarning) + start = 0 + self.dataloader = dataloader + self.interval = interval + self.start = start + self.by_epoch = by_epoch + + assert isinstance(save_best, str) or save_best is None + self.save_best = save_best + self.eval_kwargs = eval_kwargs + self.initial_flag = True + + if self.save_best is not None: + self.best_ckpt_path = None + self._init_rule(rule, self.save_best) + + def _init_rule(self, rule, key_indicator): + """Initialize rule, key_indicator, comparison_func, and best score. + + Args: + rule (str | None): Comparison rule for best score. + key_indicator (str | None): Key indicator to determine the + comparison rule. + """ + if rule not in self.rule_map and rule is not None: + raise KeyError(f'rule must be greater, less or None, ' + f'but got {rule}.') + + if rule is None: + if key_indicator != 'auto': + if any(key in key_indicator for key in self.greater_keys): + rule = 'greater' + elif any(key in key_indicator for key in self.less_keys): + rule = 'less' + else: + raise ValueError( + f'Cannot infer the rule for key ' + f'{key_indicator}, thus a specific rule ' + f'must be specified.') + self.rule = rule + self.key_indicator = key_indicator + if self.rule is not None: + self.compare_func = self.rule_map[self.rule] + + def before_run(self, runner): + if self.save_best is not None: + if runner.meta is None: + warnings.warn('runner.meta is None. 
Creating an empty one.') + runner.meta = dict() + runner.meta.setdefault('hook_msgs', dict()) + + def before_train_iter(self, runner): + """Evaluate the model only at the start of training by + iteration.""" + if self.by_epoch: + return + if not self.initial_flag: + return + if self.start is not None and runner.iter >= self.start: + self.after_train_iter(runner) + self.initial_flag = False + + def before_train_epoch(self, runner): + """Evaluate the model only at the start of training by epoch.""" + if not self.by_epoch: + return + if not self.initial_flag: + return + if self.start is not None and runner.epoch >= self.start: + self.after_train_epoch(runner) + self.initial_flag = False + + def after_train_iter(self, runner): + """Called after every training iter to evaluate the results.""" + if not self.by_epoch: + self._do_evaluate(runner) + + def after_train_epoch(self, runner): + """Called after every training epoch to evaluate the results.""" + if self.by_epoch: + self._do_evaluate(runner) + + def _do_evaluate(self, runner): + """perform evaluation and save ckpt.""" + if not self.evaluation_flag(runner): + return + + from mmaction.apis import single_gpu_test + results = single_gpu_test(runner.model, self.dataloader) + key_score = self.evaluate(runner, results) + if self.save_best: + self._save_ckpt(runner, key_score) + + def evaluation_flag(self, runner): + """Judge whether to perform evaluation. + + Returns: + bool: The flag indicating whether to perform evaluation. + """ + if self.by_epoch: + current = runner.epoch + check_time = self.every_n_epochs + else: + current = runner.iter + check_time = self.every_n_iters + + if self.start is None: + if not check_time(runner, self.interval): + # No evaluation during the interval. + return False + elif (current + 1) < self.start: + # No evaluation if start is larger than the current time. + return False + else: + # Evaluation only at epochs/iters 3, 5, 7...
+ # if start==3 and interval==2 + if (current + 1 - self.start) % self.interval: + return False + return True + + def _save_ckpt(self, runner, key_score): + if self.by_epoch: + current = f'epoch_{runner.epoch + 1}' + cur_type, cur_time = 'epoch', runner.epoch + 1 + else: + current = f'iter_{runner.iter + 1}' + cur_type, cur_time = 'iter', runner.iter + 1 + + best_score = runner.meta['hook_msgs'].get( + 'best_score', self.init_value_map[self.rule]) + if self.compare_func(key_score, best_score): + best_score = key_score + runner.meta['hook_msgs']['best_score'] = best_score + + if self.best_ckpt_path and osp.isfile(self.best_ckpt_path): + os.remove(self.best_ckpt_path) + + best_ckpt_name = f'best_{self.key_indicator}_{current}.pth' + runner.save_checkpoint( + runner.work_dir, best_ckpt_name, create_symlink=False) + self.best_ckpt_path = osp.join(runner.work_dir, best_ckpt_name) + + runner.meta['hook_msgs']['best_ckpt'] = self.best_ckpt_path + runner.logger.info( + f'Now best checkpoint is saved as {best_ckpt_name}.') + runner.logger.info( + f'Best {self.key_indicator} is {best_score:0.4f} ' + f'at {cur_time} {cur_type}.') + + def evaluate(self, runner, results): + """Evaluate the results. + + Args: + runner (:obj:`mmcv.Runner`): The underlined training runner. + results (list): Output results. + """ + eval_res = self.dataloader.dataset.evaluate( + results, logger=runner.logger, **self.eval_kwargs) + for name, val in eval_res.items(): + runner.log_buffer.output[name] = val + runner.log_buffer.ready = True + if self.save_best is not None: + if self.key_indicator == 'auto': + # infer from eval_results + self._init_rule(self.rule, list(eval_res.keys())[0]) + return eval_res[self.key_indicator] + + return None + + class DistEvalHook(EvalHook): # noqa: F811 + """Distributed evaluation hook. + + This hook will regularly perform evaluation in a given interval when + performing in distributed environment. + + Args: + dataloader (DataLoader): A PyTorch dataloader. 
+ start (int | None, optional): Evaluation starting epoch. It enables + evaluation before the training starts if ``start`` <= the + resuming epoch. If None, whether to evaluate is merely decided + by ``interval``. Default: None. + interval (int): Evaluation interval. Default: 1. + by_epoch (bool): Determine perform evaluation by epoch or by + iteration. If set to True, it will perform by epoch. Otherwise, + by iteration. default: True. + save_best (str | None, optional): If a metric is specified, it + would measure the best checkpoint during evaluation. The + information about best checkpoint would be save in best.json. + Options are the evaluation metrics to the test dataset. e.g., + ``top1_acc``, ``top5_acc``, ``mean_class_accuracy``, + ``mean_average_precision``, ``mmit_mean_average_precision`` + for action recognition dataset (RawframeDataset and + VideoDataset). ``AR@AN``, ``auc`` for action localization + dataset (ActivityNetDataset). ``mAP@0.5IOU`` for + spatio-temporal action detection dataset (AVADataset). + If ``save_best`` is ``auto``, the first key of the returned + ``OrderedDict`` result will be used. Default: 'auto'. + rule (str | None, optional): Comparison rule for best score. If + set to None, it will infer a reasonable rule. Keys such as + 'acc', 'top' .etc will be inferred by 'greater' rule. Keys + contain 'loss' will be inferred by 'less' rule. Options are + 'greater', 'less', None. Default: None. + tmpdir (str | None): Temporary directory to save the results of all + processes. Default: None. + gpu_collect (bool): Whether to use gpu or cpu to collect results. + Default: False. + broadcast_bn_buffer (bool): Whether to broadcast the + buffer(running_mean and running_var) of rank 0 to other rank + before evaluation. Default: True. + **eval_kwargs: Evaluation arguments fed into the evaluate function + of the dataset. 
+ """ + + def __init__(self, + dataloader, + start=None, + interval=1, + by_epoch=True, + save_best='auto', + rule=None, + broadcast_bn_buffer=True, + tmpdir=None, + gpu_collect=False, + **eval_kwargs): + super().__init__( + dataloader, + start=start, + interval=interval, + by_epoch=by_epoch, + save_best=save_best, + rule=rule, + **eval_kwargs) + self.broadcast_bn_buffer = broadcast_bn_buffer + self.tmpdir = tmpdir + self.gpu_collect = gpu_collect + + def _do_evaluate(self, runner): + """perform evaluation and save ckpt.""" + # Synchronization of BatchNorm's buffer (running_mean + # and running_var) is not supported in the DDP of pytorch, + # which may cause the inconsistent performance of models in + # different ranks, so we broadcast BatchNorm's buffers + # of rank 0 to other ranks to avoid this. + if self.broadcast_bn_buffer: + model = runner.model + for _, module in model.named_modules(): + if isinstance(module, + _BatchNorm) and module.track_running_stats: + dist.broadcast(module.running_var, 0) + dist.broadcast(module.running_mean, 0) + + if not self.evaluation_flag(runner): + return + + from mmaction.apis import multi_gpu_test + tmpdir = self.tmpdir + if tmpdir is None: + tmpdir = osp.join(runner.work_dir, '.eval_hook') + + results = multi_gpu_test( + runner.model, + self.dataloader, + tmpdir=tmpdir, + gpu_collect=self.gpu_collect) + if runner.rank == 0: + print('\n') + key_score = self.evaluate(runner, results) + + if self.save_best: + self._save_ckpt(runner, key_score) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/hooks/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/hooks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..42ce6c6c0e87c43828de96d2749d4725a69677f0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/hooks/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .output import OutputHook + +__all__ = ['OutputHook'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/hooks/output.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/hooks/output.py new file mode 100644 index 0000000000000000000000000000000000000000..fb30bebaac3d5b367bf23360d02bca75439038bb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/hooks/output.py @@ -0,0 +1,68 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import functools +import warnings + +import torch + + +class OutputHook: + """Output feature map of some layers. + + Args: + module (nn.Module): The whole module to get layers. + outputs (tuple[str] | list[str]): Layer name to output. Default: None. + as_tensor (bool): Determine to return a tensor or a numpy array. + Default: False. + """ + + def __init__(self, module, outputs=None, as_tensor=False): + self.outputs = outputs + self.as_tensor = as_tensor + self.layer_outputs = {} + self.handles = [] + self.register(module) + + def register(self, module): + + def hook_wrapper(name): + + def hook(model, input, output): + if not isinstance(output, torch.Tensor): + warnings.warn(f'Directly return the output from {name}, ' + f'since it is not a tensor') + self.layer_outputs[name] = output + elif self.as_tensor: + self.layer_outputs[name] = output + else: + self.layer_outputs[name] = output.detach().cpu().numpy() + + return hook + + if isinstance(self.outputs, (list, tuple)): + for name in self.outputs: + try: + layer = rgetattr(module, name) + h = layer.register_forward_hook(hook_wrapper(name)) + except AttributeError: + raise AttributeError(f'Module {name} not found') + self.handles.append(h) + + def remove(self): + for h in self.handles: + h.remove() + + def __enter__(self): + return self + + def __exit__(self, exc_type, exc_val, exc_tb): + self.remove() + + +# using wonder's beautiful simplification: +# https://stackoverflow.com/questions/31174295/getattr-and-setattr-on-nested-objects +def rgetattr(obj, attr, *args): + 
+ def _getattr(obj, attr): + return getattr(obj, attr, *args) + + return functools.reduce(_getattr, [obj] + attr.split('.')) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/lr/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/lr/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..056c2933eb141780bf15bdb76e77715a4d6bba2b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/lr/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .multigridlr import RelativeStepLrUpdaterHook + +__all__ = ['RelativeStepLrUpdaterHook'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/lr/multigridlr.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/lr/multigridlr.py new file mode 100644 index 0000000000000000000000000000000000000000..1a98b68dec12ce5e73f58d657583c4ce0a9adbf6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/lr/multigridlr.py @@ -0,0 +1,41 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmcv.runner.hooks.hook import HOOKS +from mmcv.runner.hooks.lr_updater import LrUpdaterHook + + +@HOOKS.register_module() +class RelativeStepLrUpdaterHook(LrUpdaterHook): + """RelativeStepLrUpdaterHook. + Args: + runner (:obj:`mmcv.Runner`): The runner instance used. + steps (list[int]): The list of epochs at which decrease + the learning rate. + **kwargs (dict): Same as that of mmcv. 
+ """ + + def __init__(self, + runner, + steps, + lrs, + warmup_epochs=34, + warmuplr_start=0.01, + **kwargs): + super().__init__(**kwargs) + assert len(steps) == (len(lrs)) + self.steps = steps + self.lrs = lrs + self.warmup_epochs = warmup_epochs + self.warmuplr_start = warmuplr_start + self.warmuplr_end = self.lrs[0] + super().before_run(runner) + + def get_lr(self, runner, base_lr): + """Similar to that of mmcv.""" + progress = runner.epoch if self.by_epoch else runner.iter + if progress <= self.warmup_epochs: + alpha = (self.warmuplr_end - + self.warmuplr_start) / self.warmup_epochs + return progress * alpha + self.warmuplr_start + for i in range(len(self.steps)): + if progress < self.steps[i]: + return self.lrs[i] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..9b96eb660f8f079ed4d79ede9a1853e0329852d5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .copy_of_sgd import CopyOfSGD +from .tsm_optimizer_constructor import TSMOptimizerConstructor + +__all__ = ['CopyOfSGD', 'TSMOptimizerConstructor'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/copy_of_sgd.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/copy_of_sgd.py new file mode 100644 index 0000000000000000000000000000000000000000..daec4851dbd858df69d4fd490c56c32862b4ad06 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/copy_of_sgd.py @@ -0,0 +1,12 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmcv.runner import OPTIMIZERS +from torch.optim import SGD + + +@OPTIMIZERS.register_module() +class CopyOfSGD(SGD): + """A clone of torch.optim.SGD. + + A customized optimizer could be defined like CopyOfSGD. 
You may derive from + built-in optimizers in torch.optim, or directly implement a new optimizer. + """ diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/tsm_optimizer_constructor.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/tsm_optimizer_constructor.py new file mode 100644 index 0000000000000000000000000000000000000000..340e37bcbb4ae954f0ebd120253e1eda16df1aac --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/optimizer/tsm_optimizer_constructor.py @@ -0,0 +1,110 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +from mmcv.runner import OPTIMIZER_BUILDERS, DefaultOptimizerConstructor +from mmcv.utils import SyncBatchNorm, _BatchNorm, _ConvNd + + +@OPTIMIZER_BUILDERS.register_module() +class TSMOptimizerConstructor(DefaultOptimizerConstructor): + """Optimizer constructor in TSM model. + + This constructor builds optimizer in different ways from the default one. + + 1. Parameters of the first conv layer have default lr and weight decay. + 2. Parameters of BN layers have default lr and zero weight decay. + 3. If the field "fc_lr5" in paramwise_cfg is set to True, the parameters + of the last fc layer in cls_head have 5x lr multiplier and 10x weight + decay multiplier. + 4. Weights of other layers have default lr and weight decay, and biases + have a 2x lr multiplier and zero weight decay. + """ + + def add_params(self, params, model): + """Add parameters and their corresponding lr and wd to the params. + + Args: + params (list): The list to be modified, containing all parameter + groups and their corresponding lr and wd configurations. + model (nn.Module): The model to be trained with the optimizer. + """ + # use fc_lr5 to determine whether to specify higher multi-factor + # for fc layer weights and bias. 
+ fc_lr5 = self.paramwise_cfg['fc_lr5'] + first_conv_weight = [] + first_conv_bias = [] + normal_weight = [] + normal_bias = [] + lr5_weight = [] + lr10_bias = [] + bn = [] + + conv_cnt = 0 + + for m in model.modules(): + if isinstance(m, _ConvNd): + m_params = list(m.parameters()) + conv_cnt += 1 + if conv_cnt == 1: + first_conv_weight.append(m_params[0]) + if len(m_params) == 2: + first_conv_bias.append(m_params[1]) + else: + normal_weight.append(m_params[0]) + if len(m_params) == 2: + normal_bias.append(m_params[1]) + elif isinstance(m, torch.nn.Linear): + m_params = list(m.parameters()) + normal_weight.append(m_params[0]) + if len(m_params) == 2: + normal_bias.append(m_params[1]) + elif isinstance(m, + (_BatchNorm, SyncBatchNorm, torch.nn.GroupNorm)): + for param in list(m.parameters()): + if param.requires_grad: + bn.append(param) + elif len(m._modules) == 0: + if len(list(m.parameters())) > 0: + raise ValueError(f'New atomic module type: {type(m)}. ' + 'Need to give it a learning policy') + + # pop the cls_head fc layer params + last_fc_weight = normal_weight.pop() + last_fc_bias = normal_bias.pop() + if fc_lr5: + lr5_weight.append(last_fc_weight) + lr10_bias.append(last_fc_bias) + else: + normal_weight.append(last_fc_weight) + normal_bias.append(last_fc_bias) + + params.append({ + 'params': first_conv_weight, + 'lr': self.base_lr, + 'weight_decay': self.base_wd + }) + params.append({ + 'params': first_conv_bias, + 'lr': self.base_lr * 2, + 'weight_decay': 0 + }) + params.append({ + 'params': normal_weight, + 'lr': self.base_lr, + 'weight_decay': self.base_wd + }) + params.append({ + 'params': normal_bias, + 'lr': self.base_lr * 2, + 'weight_decay': 0 + }) + params.append({'params': bn, 'lr': self.base_lr, 'weight_decay': 0}) + params.append({ + 'params': lr5_weight, + 'lr': self.base_lr * 5, + 'weight_decay': self.base_wd + }) + params.append({ + 'params': lr10_bias, + 'lr': self.base_lr * 10, + 'weight_decay': 0 + }) diff --git 
a/openmmlab_test/mmaction2-0.24.1/mmaction/core/runner/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/runner/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c870e1da443e98a18a5c00cb2a56e24c85d56044 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/runner/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .omnisource_runner import OmniSourceDistSamplerSeedHook, OmniSourceRunner + +__all__ = ['OmniSourceRunner', 'OmniSourceDistSamplerSeedHook'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/runner/omnisource_runner.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/runner/omnisource_runner.py new file mode 100644 index 0000000000000000000000000000000000000000..0209d5d0b1decdb7141459a32cac24372f122c81 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/runner/omnisource_runner.py @@ -0,0 +1,162 @@ +# Copyright (c) Open-MMLab. All rights reserved. +import time +import warnings + +import mmcv +from mmcv.runner import EpochBasedRunner, Hook +from mmcv.runner.utils import get_host_info + + +def cycle(iterable): + iterator = iter(iterable) + while True: + try: + yield next(iterator) + except StopIteration: + iterator = iter(iterable) + + +class OmniSourceDistSamplerSeedHook(Hook): + + def before_epoch(self, runner): + for data_loader in runner.data_loaders: + if hasattr(data_loader.sampler, 'set_epoch'): + # in case the data loader uses `SequentialSampler` in Pytorch + data_loader.sampler.set_epoch(runner.epoch) + elif hasattr(data_loader.batch_sampler.sampler, 'set_epoch'): + # batch sampler in pytorch wraps the sampler as its attributes. + data_loader.batch_sampler.sampler.set_epoch(runner.epoch) + + +class OmniSourceRunner(EpochBasedRunner): + """OmniSource Epoch-based Runner. + + This runner train models epoch by epoch, the epoch length is defined by the + dataloader[0], which is the main dataloader. 
+ """ + + def run_iter(self, data_batch, train_mode, source, **kwargs): + if self.batch_processor is not None: + outputs = self.batch_processor( + self.model, data_batch, train_mode=train_mode, **kwargs) + elif train_mode: + outputs = self.model.train_step(data_batch, self.optimizer, + **kwargs) + else: + outputs = self.model.val_step(data_batch, self.optimizer, **kwargs) + if not isinstance(outputs, dict): + raise TypeError('"batch_processor()" or "model.train_step()"' + 'and "model.val_step()" must return a dict') + # Since we have multiple sources, we add a suffix to log_var names, + # so that we can differentiate them. + if 'log_vars' in outputs: + log_vars = outputs['log_vars'] + log_vars = {k + source: v for k, v in log_vars.items()} + self.log_buffer.update(log_vars, outputs['num_samples']) + + self.outputs = outputs + + def train(self, data_loaders, **kwargs): + self.model.train() + self.mode = 'train' + self.data_loaders = data_loaders + self.main_loader = self.data_loaders[0] + # Add aliasing + self.data_loader = self.main_loader + self.aux_loaders = self.data_loaders[1:] + self.aux_iters = [cycle(loader) for loader in self.aux_loaders] + + auxiliary_iter_times = [1] * len(self.aux_loaders) + use_aux_per_niter = 1 + if 'train_ratio' in kwargs: + train_ratio = kwargs.pop('train_ratio') + use_aux_per_niter = train_ratio[0] + auxiliary_iter_times = train_ratio[1:] + + self._max_iters = self._max_epochs * len(self.main_loader) + + self.call_hook('before_train_epoch') + time.sleep(2) # Prevent possible deadlock during epoch transition + + for i, data_batch in enumerate(self.main_loader): + self._inner_iter = i + self.call_hook('before_train_iter') + self.run_iter(data_batch, train_mode=True, source='') + self.call_hook('after_train_iter') + + if self._iter % use_aux_per_niter != 0: + self._iter += 1 + continue + + for idx, n_times in enumerate(auxiliary_iter_times): + for _ in range(n_times): + data_batch = next(self.aux_iters[idx]) + 
self.call_hook('before_train_iter') + self.run_iter( + data_batch, train_mode=True, source=f'/aux{idx}') + self.call_hook('after_train_iter') + self._iter += 1 + + self.call_hook('after_train_epoch') + self._epoch += 1 + + # Now that we use validate hook, not implement this func to save efforts. + def val(self, data_loader, **kwargs): + raise NotImplementedError + + def run(self, data_loaders, workflow, max_epochs=None, **kwargs): + """Start running. + + Args: + data_loaders (list[:obj:`DataLoader`]): Dataloaders for training. + `data_loaders[0]` is the main data_loader, which contains + target datasets and determines the epoch length. + `data_loaders[1:]` are auxiliary data loaders, which contain + auxiliary web datasets. + workflow (list[tuple]): A list of (phase, epochs) to specify the + running order and epochs. E.g, [('train', 2)] means running 2 + epochs for training iteratively. Note that val epoch is not + supported for this runner for simplicity. + max_epochs (int | None): The max epochs that training lasts, + deprecated now. Default: None. 
+ """ + assert isinstance(data_loaders, list) + assert mmcv.is_list_of(workflow, tuple) + assert len(workflow) == 1 and workflow[0][0] == 'train' + if max_epochs is not None: + warnings.warn( + 'setting max_epochs in run is deprecated, ' + 'please set max_epochs in runner_config', DeprecationWarning) + self._max_epochs = max_epochs + + assert self._max_epochs is not None, ( + 'max_epochs must be specified during instantiation') + + mode, epochs = workflow[0] + self._max_iters = self._max_epochs * len(data_loaders[0]) + + work_dir = self.work_dir if self.work_dir is not None else 'NONE' + self.logger.info('Start running, host: %s, work_dir: %s', + get_host_info(), work_dir) + self.logger.info('workflow: %s, max: %d epochs', workflow, + self._max_epochs) + self.call_hook('before_run') + + while self.epoch < self._max_epochs: + if isinstance(mode, str): # self.train() + if not hasattr(self, mode): + raise ValueError( + f'runner has no method named "{mode}" to run an ' + 'epoch') + epoch_runner = getattr(self, mode) + else: + raise TypeError( + f'mode in workflow must be a str, but got {mode}') + + for _ in range(epochs): + if mode == 'train' and self.epoch >= self._max_epochs: + break + epoch_runner(data_loaders, **kwargs) + + time.sleep(1) # wait for some hooks like loggers to finish + self.call_hook('after_run') diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/scheduler/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/scheduler/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..55757c435779459bf9aaca0f0798286ff90f1e79 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/scheduler/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .lr_updater import TINLrUpdaterHook + +__all__ = ['TINLrUpdaterHook'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/core/scheduler/lr_updater.py b/openmmlab_test/mmaction2-0.24.1/mmaction/core/scheduler/lr_updater.py new file mode 100644 index 0000000000000000000000000000000000000000..a36f2bb70dadd178c0c57be40663420ab4cce618 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/core/scheduler/lr_updater.py @@ -0,0 +1,40 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmcv.runner import HOOKS, LrUpdaterHook +from mmcv.runner.hooks.lr_updater import annealing_cos + + +@HOOKS.register_module() +class TINLrUpdaterHook(LrUpdaterHook): + + def __init__(self, min_lr, **kwargs): + self.min_lr = min_lr + super().__init__(**kwargs) + + def get_warmup_lr(self, cur_iters): + if self.warmup == 'linear': + # 'linear' warmup is rewritten according to TIN repo: + # https://github.com/deepcs233/TIN/blob/master/main.py#L409-L412 + k = (cur_iters / self.warmup_iters) * ( + 1 - self.warmup_ratio) + self.warmup_ratio + warmup_lr = [_lr * k for _lr in self.regular_lr] + elif self.warmup == 'constant': + warmup_lr = [_lr * self.warmup_ratio for _lr in self.regular_lr] + elif self.warmup == 'exp': + k = self.warmup_ratio**(1 - cur_iters / self.warmup_iters) + warmup_lr = [_lr * k for _lr in self.regular_lr] + return warmup_lr + + def get_lr(self, runner, base_lr): + if self.by_epoch: + progress = runner.epoch + max_progress = runner.max_epochs + else: + progress = runner.iter + max_progress = runner.max_iters + + target_lr = self.min_lr + if self.warmup is not None: + progress = progress - self.warmup_iters + max_progress = max_progress - self.warmup_iters + factor = progress / max_progress + return annealing_cos(base_lr, target_lr, factor) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..2c2bc8966c6cebd0f2f617f587ae82ae8cb0f639 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/__init__.py @@ -0,0 +1,28 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .activitynet_dataset import ActivityNetDataset +from .audio_dataset import AudioDataset +from .audio_feature_dataset import AudioFeatureDataset +from .audio_visual_dataset import AudioVisualDataset +from .ava_dataset import AVADataset +from .base import BaseDataset +from .blending_utils import (BaseMiniBatchBlending, CutmixBlending, + MixupBlending) +from .builder import (BLENDINGS, DATASETS, PIPELINES, build_dataloader, + build_dataset) +from .dataset_wrappers import ConcatDataset, RepeatDataset +from .hvu_dataset import HVUDataset +from .image_dataset import ImageDataset +from .pose_dataset import PoseDataset +from .rawframe_dataset import RawframeDataset +from .rawvideo_dataset import RawVideoDataset +from .ssn_dataset import SSNDataset +from .video_dataset import VideoDataset + +__all__ = [ + 'VideoDataset', 'build_dataloader', 'build_dataset', 'RepeatDataset', + 'RawframeDataset', 'BaseDataset', 'ActivityNetDataset', 'SSNDataset', + 'HVUDataset', 'AudioDataset', 'AudioFeatureDataset', 'ImageDataset', + 'RawVideoDataset', 'AVADataset', 'AudioVisualDataset', + 'BaseMiniBatchBlending', 'CutmixBlending', 'MixupBlending', 'DATASETS', + 'PIPELINES', 'BLENDINGS', 'PoseDataset', 'ConcatDataset' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/activitynet_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/activitynet_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..811d059c1696e8fa10be1b6f0e1c81bda4830e33 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/activitynet_dataset.py @@ -0,0 +1,270 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import os +import os.path as osp +import warnings +from collections import OrderedDict + +import mmcv +import numpy as np + +from ..core import average_recall_at_avg_proposals +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class ActivityNetDataset(BaseDataset): + """ActivityNet dataset for temporal action localization. + + The dataset loads raw features and applies specified transforms to return a + dict containing the frame tensors and other information. + + The ann_file is a json file with multiple objects, and each object has a + key of the name of a video, and value of total frames of the video, total + seconds of the video, annotations of a video, feature frames (frames + covered by features) of the video, fps and rfps. Example of an + annotation file: + + .. code-block:: JSON + + { + "v_--1DO2V4K74": { + "duration_second": 211.53, + "duration_frame": 6337, + "annotations": [ + { + "segment": [ + 30.025882995319815, + 205.2318595943838 + ], + "label": "Rock climbing" + } + ], + "feature_frame": 6336, + "fps": 30.0, + "rfps": 29.9579255898 + }, + "v_--6bJUbfpnQ": { + "duration_second": 26.75, + "duration_frame": 647, + "annotations": [ + { + "segment": [ + 2.578755070202808, + 24.914101404056165 + ], + "label": "Drinking beer" + } + ], + "feature_frame": 624, + "fps": 24.0, + "rfps": 24.1869158879 + }, + ... + } + + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + data_prefix (str | None): Path to a directory where videos are held. + Default: None. + test_mode (bool): Store True when building test or validation dataset. + Default: False.
+ """ + + def __init__(self, ann_file, pipeline, data_prefix=None, test_mode=False): + super().__init__(ann_file, pipeline, data_prefix, test_mode) + + def load_annotations(self): + """Load the annotation according to ann_file into video_infos.""" + video_infos = [] + anno_database = mmcv.load(self.ann_file) + for video_name in anno_database: + video_info = anno_database[video_name] + video_info['video_name'] = video_name + video_infos.append(video_info) + return video_infos + + def prepare_test_frames(self, idx): + """Prepare the frames for testing given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + results['data_prefix'] = self.data_prefix + return self.pipeline(results) + + def prepare_train_frames(self, idx): + """Prepare the frames for training given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + results['data_prefix'] = self.data_prefix + return self.pipeline(results) + + def __len__(self): + """Get the size of the dataset.""" + return len(self.video_infos) + + def _import_ground_truth(self): + """Read ground truth data from video_infos.""" + ground_truth = {} + for video_info in self.video_infos: + video_id = video_info['video_name'][2:] + this_video_ground_truths = [] + for ann in video_info['annotations']: + t_start, t_end = ann['segment'] + label = ann['label'] + this_video_ground_truths.append([t_start, t_end, label]) + ground_truth[video_id] = np.array(this_video_ground_truths) + return ground_truth + + @staticmethod + def proposals2json(results, show_progress=False): + """Convert all proposals to a final dict(json) format. + + Args: + results (list[dict]): All proposals. + show_progress (bool): Whether to show the progress bar. + Defaults: False. + + Returns: + dict: The final result dict. E.g. + + .. code-block:: Python + + dict(video-1=[dict(segment=[1.1, 2.0],
score=0.9), + dict(segment=[50.1, 129.3], score=0.6)]) + """ + result_dict = {} + print('Convert proposals to json format') + if show_progress: + prog_bar = mmcv.ProgressBar(len(results)) + for result in results: + video_name = result['video_name'] + result_dict[video_name[2:]] = result['proposal_list'] + if show_progress: + prog_bar.update() + return result_dict + + @staticmethod + def _import_proposals(results): + """Read predictions from results.""" + proposals = {} + num_proposals = 0 + for result in results: + video_id = result['video_name'][2:] + this_video_proposals = [] + for proposal in result['proposal_list']: + t_start, t_end = proposal['segment'] + score = proposal['score'] + this_video_proposals.append([t_start, t_end, score]) + num_proposals += 1 + proposals[video_id] = np.array(this_video_proposals) + return proposals, num_proposals + + def dump_results(self, results, out, output_format, version='VERSION 1.3'): + """Dump data to json/csv files.""" + if output_format == 'json': + result_dict = self.proposals2json(results) + output_dict = { + 'version': version, + 'results': result_dict, + 'external_data': {} + } + mmcv.dump(output_dict, out) + elif output_format == 'csv': + # TODO: add csv handler to mmcv and use mmcv.dump + os.makedirs(out, exist_ok=True) + header = 'action,start,end,tmin,tmax' + for result in results: + video_name, outputs = result + output_path = osp.join(out, video_name + '.csv') + np.savetxt( + output_path, + outputs, + header=header, + delimiter=',', + comments='') + else: + raise ValueError( + f'The output format {output_format} is not supported.') + + def evaluate( + self, + results, + metrics='AR@AN', + metric_options={ + 'AR@AN': + dict( + max_avg_proposals=100, + temporal_iou_thresholds=np.linspace(0.5, 0.95, 10)) + }, + logger=None, + **deprecated_kwargs): + """Evaluation in feature dataset. + + Args: + results (list[dict]): Output results. + metrics (str | sequence[str]): Metrics to be performed. + Defaults: 'AR@AN'. 
+ metric_options (dict): Dict for metric options. Options are + ``max_avg_proposals``, ``temporal_iou_thresholds`` for + ``AR@AN``. + default: ``{'AR@AN': dict(max_avg_proposals=100, + temporal_iou_thresholds=np.linspace(0.5, 0.95, 10))}``. + logger (logging.Logger | None): Training logger. Defaults: None. + deprecated_kwargs (dict): Used for containing deprecated arguments. + See 'https://github.com/open-mmlab/mmaction2/pull/286'. + + Returns: + dict: Evaluation results for evaluation metrics. + """ + # Protect ``metric_options`` since it uses mutable value as default + metric_options = copy.deepcopy(metric_options) + + if deprecated_kwargs != {}: + warnings.warn( + 'Option arguments for metrics has been changed to ' + "`metric_options`, See 'https://github.com/open-mmlab/mmaction2/pull/286' " # noqa: E501 + 'for more details') + metric_options['AR@AN'] = dict(metric_options['AR@AN'], + **deprecated_kwargs) + + if not isinstance(results, list): + raise TypeError(f'results must be a list, but got {type(results)}') + assert len(results) == len(self), ( + f'The length of results is not equal to the dataset len: ' + f'{len(results)} != {len(self)}') + + metrics = metrics if isinstance(metrics, (list, tuple)) else [metrics] + allowed_metrics = ['AR@AN'] + for metric in metrics: + if metric not in allowed_metrics: + raise KeyError(f'metric {metric} is not supported') + + eval_results = OrderedDict() + ground_truth = self._import_ground_truth() + proposal, num_proposals = self._import_proposals(results) + + for metric in metrics: + if metric == 'AR@AN': + temporal_iou_thresholds = metric_options.setdefault( + 'AR@AN', {}).setdefault('temporal_iou_thresholds', + np.linspace(0.5, 0.95, 10)) + max_avg_proposals = metric_options.setdefault( + 'AR@AN', {}).setdefault('max_avg_proposals', 100) + if isinstance(temporal_iou_thresholds, list): + temporal_iou_thresholds = np.array(temporal_iou_thresholds) + + recall, _, _, auc = ( + average_recall_at_avg_proposals( + ground_truth, 
+ proposal, + num_proposals, + max_avg_proposals=max_avg_proposals, + temporal_iou_thresholds=temporal_iou_thresholds)) + eval_results['auc'] = auc + eval_results['AR@1'] = np.mean(recall[:, 0]) + eval_results['AR@5'] = np.mean(recall[:, 4]) + eval_results['AR@10'] = np.mean(recall[:, 9]) + eval_results['AR@100'] = np.mean(recall[:, 99]) + + return eval_results diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..df19b1806a919184604627a0ad7867fda3137541 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_dataset.py @@ -0,0 +1,70 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + +import torch + +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class AudioDataset(BaseDataset): + """Audio dataset for video recognition. Extracts the audio feature on-the- + fly. Annotation file can be that of the rawframe dataset, or: + + .. code-block:: txt + + some/directory-1.wav 163 1 + some/directory-2.wav 122 1 + some/directory-3.wav 258 2 + some/directory-4.wav 234 2 + some/directory-5.wav 295 3 + some/directory-6.wav 121 3 + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + suffix (str): The suffix of the audio file. Default: '.wav'. + kwargs (dict): Other keyword args for `BaseDataset`. 
+ """ + + def __init__(self, ann_file, pipeline, suffix='.wav', **kwargs): + self.suffix = suffix + super().__init__(ann_file, pipeline, modality='Audio', **kwargs) + + def load_annotations(self): + """Load annotation file to get video information.""" + if self.ann_file.endswith('.json'): + return self.load_json_annotations() + video_infos = [] + with open(self.ann_file, 'r') as fin: + for line in fin: + line_split = line.strip().split() + video_info = {} + idx = 0 + filename = line_split[idx] + if self.data_prefix is not None: + if not filename.endswith(self.suffix): + filename = osp.join(self.data_prefix, + filename + self.suffix) + else: + filename = osp.join(self.data_prefix, filename) + video_info['audio_path'] = filename + idx += 1 + # idx for total_frames + video_info['total_frames'] = int(line_split[idx]) + idx += 1 + # idx for label[s] + label = [int(x) for x in line_split[idx:]] + assert label, f'missing label in line: {line}' + if self.multi_class: + assert self.num_classes is not None + onehot = torch.zeros(self.num_classes) + onehot[label] = 1.0 + video_info['label'] = onehot + else: + assert len(label) == 1 + video_info['label'] = label[0] + video_infos.append(video_info) + + return video_infos diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_feature_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_feature_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..eaa54642f08bf3fd55916bc59af314bce0dcb7da --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_feature_dataset.py @@ -0,0 +1,71 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + +import torch + +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class AudioFeatureDataset(BaseDataset): + """Audio feature dataset for video recognition. Reads the features + extracted off-line. Annotation file can be that of the rawframe dataset, + or: + + .. 
code-block:: txt + + some/directory-1.npy 163 1 + some/directory-2.npy 122 1 + some/directory-3.npy 258 2 + some/directory-4.npy 234 2 + some/directory-5.npy 295 3 + some/directory-6.npy 121 3 + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + suffix (str): The suffix of the audio feature file. Default: '.npy'. + kwargs (dict): Other keyword args for `BaseDataset`. + """ + + def __init__(self, ann_file, pipeline, suffix='.npy', **kwargs): + self.suffix = suffix + super().__init__(ann_file, pipeline, modality='Audio', **kwargs) + + def load_annotations(self): + """Load annotation file to get video information.""" + if self.ann_file.endswith('.json'): + return self.load_json_annotations() + video_infos = [] + with open(self.ann_file, 'r') as fin: + for line in fin: + line_split = line.strip().split() + video_info = {} + idx = 0 + filename = line_split[idx] + if self.data_prefix is not None: + if not filename.endswith(self.suffix): + filename = osp.join(self.data_prefix, + filename) + self.suffix + else: + filename = osp.join(self.data_prefix, filename) + video_info['audio_path'] = filename + idx += 1 + # idx for total_frames + video_info['total_frames'] = int(line_split[idx]) + idx += 1 + # idx for label[s] + label = [int(x) for x in line_split[idx:]] + assert label, f'missing label in line: {line}' + if self.multi_class: + assert self.num_classes is not None + onehot = torch.zeros(self.num_classes) + onehot[label] = 1.0 + video_info['label'] = onehot + else: + assert len(label) == 1 + video_info['label'] = label[0] + video_infos.append(video_info) + + return video_infos diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_visual_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_visual_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..15a31240d8625b45cdda3cdb4c3fed3a7cca0e4b --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/audio_visual_dataset.py @@ -0,0 +1,77 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + +from .builder import DATASETS +from .rawframe_dataset import RawframeDataset + + +@DATASETS.register_module() +class AudioVisualDataset(RawframeDataset): + """Dataset that reads both audio and visual data, supporting both rawframes + and videos. The annotation file is same as that of the rawframe dataset, + such as: + + .. code-block:: txt + + some/directory-1 163 1 + some/directory-2 122 1 + some/directory-3 258 2 + some/directory-4 234 2 + some/directory-5 295 3 + some/directory-6 121 3 + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + audio_prefix (str): Directory of the audio files. + kwargs (dict): Other keyword args for `RawframeDataset`. `video_prefix` + is also allowed if pipeline is designed for videos. + """ + + def __init__(self, ann_file, pipeline, audio_prefix, **kwargs): + self.audio_prefix = audio_prefix + self.video_prefix = kwargs.pop('video_prefix', None) + self.data_prefix = kwargs.get('data_prefix', None) + super().__init__(ann_file, pipeline, **kwargs) + + def load_annotations(self): + video_infos = [] + with open(self.ann_file, 'r') as fin: + for line in fin: + line_split = line.strip().split() + video_info = {} + idx = 0 + # idx for frame_dir + frame_dir = line_split[idx] + if self.audio_prefix is not None: + audio_path = osp.join(self.audio_prefix, + frame_dir + '.npy') + video_info['audio_path'] = audio_path + if self.video_prefix: + video_path = osp.join(self.video_prefix, + frame_dir + '.mp4') + video_info['filename'] = video_path + if self.data_prefix is not None: + frame_dir = osp.join(self.data_prefix, frame_dir) + video_info['frame_dir'] = frame_dir + idx += 1 + if self.with_offset: + # idx for offset and total_frames + video_info['offset'] = int(line_split[idx]) + video_info['total_frames'] = 
int(line_split[idx + 1]) + idx += 2 + else: + # idx for total_frames + video_info['total_frames'] = int(line_split[idx]) + idx += 1 + # idx for label[s] + label = [int(x) for x in line_split[idx:]] + assert len(label) != 0, f'missing label in line: {line}' + if self.multi_class: + assert self.num_classes is not None + video_info['label'] = label + else: + assert len(label) == 1 + video_info['label'] = label[0] + video_infos.append(video_info) + return video_infos diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/ava_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/ava_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..ec64a20c114e2d784eefa717ea7540612a6e38a7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/ava_dataset.py @@ -0,0 +1,393 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os +import os.path as osp +from collections import defaultdict +from datetime import datetime + +import mmcv +import numpy as np +from mmcv.utils import print_log + +from ..core.evaluation.ava_utils import ava_eval, read_labelmap, results2csv +from ..utils import get_root_logger +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class AVADataset(BaseDataset): + """AVA dataset for spatial temporal detection. + + Based on official AVA annotation files, the dataset loads raw frames, + bounding boxes, proposals and applies specified transformations to return + a dict containing the frame tensors and other information. + + This dataset can load information from the following files: + + ..
code-block:: txt + + ann_file -> ava_{train, val}_{v2.1, v2.2}.csv + exclude_file -> ava_{train, val}_excluded_timestamps_{v2.1, v2.2}.csv + label_file -> ava_action_list_{v2.1, v2.2}.pbtxt / + ava_action_list_{v2.1, v2.2}_for_activitynet_2019.pbtxt + proposal_file -> ava_dense_proposals_{train, val}.FAIR.recall_93.9.pkl + + Particularly, the proposal_file is a pickle file which contains + ``img_key`` (in format of ``{video_id},{timestamp}``). Example of a pickle + file: + + .. code-block:: JSON + + { + ... + '0f39OWEqJ24,0902': + array([[0.011 , 0.157 , 0.655 , 0.983 , 0.998163]]), + '0f39OWEqJ24,0912': + array([[0.054 , 0.088 , 0.91 , 0.998 , 0.068273], + [0.016 , 0.161 , 0.519 , 0.974 , 0.984025], + [0.493 , 0.283 , 0.981 , 0.984 , 0.983621]]), + ... + } + + Args: + ann_file (str): Path to the annotation file like + ``ava_{train, val}_{v2.1, v2.2}.csv``. + exclude_file (str): Path to the excluded timestamp file like + ``ava_{train, val}_excluded_timestamps_{v2.1, v2.2}.csv``. + pipeline (list[dict | callable]): A sequence of data transforms. + label_file (str): Path to the label file like + ``ava_action_list_{v2.1, v2.2}.pbtxt`` or + ``ava_action_list_{v2.1, v2.2}_for_activitynet_2019.pbtxt``. + Default: None. + filename_tmpl (str): Template for each filename. + Default: 'img_{:05}.jpg'. + start_index (int): Specify a start index for frames in consideration of + different filename format. However, when taking videos as input, + it should be set to 0, since frames loaded from videos count + from 0. Default: 0. + proposal_file (str): Path to the proposal file like + ``ava_dense_proposals_{train, val}.FAIR.recall_93.9.pkl``. + Default: None. + person_det_score_thr (float): The threshold of person detection scores, + bboxes with scores above the threshold will be used. Default: 0.9. + Note that 0 <= person_det_score_thr <= 1. If no proposal has + detection score larger than the threshold, the one with the largest + detection score will be used. 
+ num_classes (int): The number of classes of the dataset. Default: 81. + (AVA has 80 action classes, another 1-dim is added for potential + usage) + custom_classes (list[int]): A subset of class ids from origin dataset. + Please note that 0 should NOT be selected, and ``num_classes`` + should be equal to ``len(custom_classes) + 1`` + data_prefix (str): Path to a directory where videos are held. + Default: None. + test_mode (bool): Store True when building test or validation dataset. + Default: False. + modality (str): Modality of data. Support 'RGB', 'Flow'. + Default: 'RGB'. + num_max_proposals (int): Max proposals number to store. Default: 1000. + timestamp_start (int): The start point of included timestamps. The + default value is referred from the official website. Default: 900. + timestamp_end (int): The end point of included timestamps. The + default value is referred from the official website. Default: 1800. + fps (int): Overrides the default FPS for the dataset. Default: 30. + """ + + def __init__(self, + ann_file, + exclude_file, + pipeline, + label_file=None, + filename_tmpl='img_{:05}.jpg', + start_index=0, + proposal_file=None, + person_det_score_thr=0.9, + num_classes=81, + custom_classes=None, + data_prefix=None, + test_mode=False, + modality='RGB', + num_max_proposals=1000, + timestamp_start=900, + timestamp_end=1800, + fps=30): + # since it inherits from `BaseDataset`, some arguments + # should be assigned before performing `load_annotations()` + self._FPS = fps # Keep this as standard + self.custom_classes = custom_classes + if custom_classes is not None: + assert num_classes == len(custom_classes) + 1 + assert 0 not in custom_classes + _, class_whitelist = read_labelmap(open(label_file)) + assert set(custom_classes).issubset(class_whitelist) + + self.custom_classes = tuple([0] + custom_classes) + self.exclude_file = exclude_file + self.label_file = label_file + self.proposal_file = proposal_file + assert 0 <= person_det_score_thr <= 1, ( + 'The 
value of ' + 'person_det_score_thr should in [0, 1]. ') + self.person_det_score_thr = person_det_score_thr + self.num_classes = num_classes + self.filename_tmpl = filename_tmpl + self.num_max_proposals = num_max_proposals + self.timestamp_start = timestamp_start + self.timestamp_end = timestamp_end + self.logger = get_root_logger() + super().__init__( + ann_file, + pipeline, + data_prefix, + test_mode, + start_index=start_index, + modality=modality, + num_classes=num_classes) + + if self.proposal_file is not None: + self.proposals = mmcv.load(self.proposal_file) + else: + self.proposals = None + + if not test_mode: + valid_indexes = self.filter_exclude_file() + self.logger.info( + f'{len(valid_indexes)} out of {len(self.video_infos)} ' + f'frames are valid.') + self.video_infos = [self.video_infos[i] for i in valid_indexes] + + def parse_img_record(self, img_records): + """Merge image records of the same entity at the same time. + + Args: + img_records (list[dict]): List of img_records (lines in AVA + annotations). + + Returns: + tuple(list): A tuple consists of lists of bboxes, action labels and + entity_ids + """ + bboxes, labels, entity_ids = [], [], [] + while len(img_records) > 0: + img_record = img_records[0] + num_img_records = len(img_records) + + selected_records = [ + x for x in img_records + if np.array_equal(x['entity_box'], img_record['entity_box']) + ] + + num_selected_records = len(selected_records) + img_records = [ + x for x in img_records if + not np.array_equal(x['entity_box'], img_record['entity_box']) + ] + + assert len(img_records) + num_selected_records == num_img_records + + bboxes.append(img_record['entity_box']) + valid_labels = np.array([ + selected_record['label'] + for selected_record in selected_records + ]) + + # The format can be directly used by BCELossWithLogits + label = np.zeros(self.num_classes, dtype=np.float32) + label[valid_labels] = 1. 
+ + labels.append(label) + entity_ids.append(img_record['entity_id']) + + bboxes = np.stack(bboxes) + labels = np.stack(labels) + entity_ids = np.stack(entity_ids) + return bboxes, labels, entity_ids + + def filter_exclude_file(self): + """Filter out records in the exclude_file.""" + valid_indexes = [] + if self.exclude_file is None: + valid_indexes = list(range(len(self.video_infos))) + else: + exclude_video_infos = [ + x.strip().split(',') for x in open(self.exclude_file) + ] + for i, video_info in enumerate(self.video_infos): + valid_indexes.append(i) + for video_id, timestamp in exclude_video_infos: + if (video_info['video_id'] == video_id + and video_info['timestamp'] == int(timestamp)): + valid_indexes.pop() + break + return valid_indexes + + def load_annotations(self): + """Load AVA annotations.""" + video_infos = [] + records_dict_by_img = defaultdict(list) + with open(self.ann_file, 'r') as fin: + for line in fin: + line_split = line.strip().split(',') + + label = int(line_split[6]) + if self.custom_classes is not None: + if label not in self.custom_classes: + continue + label = self.custom_classes.index(label) + + video_id = line_split[0] + timestamp = int(line_split[1]) + img_key = f'{video_id},{timestamp:04d}' + + entity_box = np.array(list(map(float, line_split[2:6]))) + entity_id = int(line_split[7]) + shot_info = (0, (self.timestamp_end - self.timestamp_start) * + self._FPS) + + video_info = dict( + video_id=video_id, + timestamp=timestamp, + entity_box=entity_box, + label=label, + entity_id=entity_id, + shot_info=shot_info) + records_dict_by_img[img_key].append(video_info) + + for img_key in records_dict_by_img: + video_id, timestamp = img_key.split(',') + bboxes, labels, entity_ids = self.parse_img_record( + records_dict_by_img[img_key]) + ann = dict( + gt_bboxes=bboxes, gt_labels=labels, entity_ids=entity_ids) + frame_dir = video_id + if self.data_prefix is not None: + frame_dir = osp.join(self.data_prefix, frame_dir) + video_info = dict( + 
frame_dir=frame_dir, + video_id=video_id, + timestamp=int(timestamp), + img_key=img_key, + shot_info=shot_info, + fps=self._FPS, + ann=ann) + video_infos.append(video_info) + + return video_infos + + def prepare_train_frames(self, idx): + """Prepare the frames for training given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + img_key = results['img_key'] + + results['filename_tmpl'] = self.filename_tmpl + results['modality'] = self.modality + results['start_index'] = self.start_index + results['timestamp_start'] = self.timestamp_start + results['timestamp_end'] = self.timestamp_end + + if self.proposals is not None: + if img_key not in self.proposals: + results['proposals'] = np.array([[0, 0, 1, 1]]) + results['scores'] = np.array([1]) + else: + proposals = self.proposals[img_key] + assert proposals.shape[-1] in [4, 5] + if proposals.shape[-1] == 5: + thr = min(self.person_det_score_thr, max(proposals[:, 4])) + positive_inds = (proposals[:, 4] >= thr) + proposals = proposals[positive_inds] + proposals = proposals[:self.num_max_proposals] + results['proposals'] = proposals[:, :4] + results['scores'] = proposals[:, 4] + else: + proposals = proposals[:self.num_max_proposals] + results['proposals'] = proposals + + ann = results.pop('ann') + results['gt_bboxes'] = ann['gt_bboxes'] + results['gt_labels'] = ann['gt_labels'] + results['entity_ids'] = ann['entity_ids'] + + return self.pipeline(results) + + def prepare_test_frames(self, idx): + """Prepare the frames for testing given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + img_key = results['img_key'] + + results['filename_tmpl'] = self.filename_tmpl + results['modality'] = self.modality + results['start_index'] = self.start_index + results['timestamp_start'] = self.timestamp_start + results['timestamp_end'] = self.timestamp_end + + if self.proposals is not None: + if img_key not in self.proposals: + results['proposals'] = np.array([[0, 0, 1, 1]]) + results['scores'] = np.array([1]) 
+ else: + proposals = self.proposals[img_key] + assert proposals.shape[-1] in [4, 5] + if proposals.shape[-1] == 5: + thr = min(self.person_det_score_thr, max(proposals[:, 4])) + positive_inds = (proposals[:, 4] >= thr) + proposals = proposals[positive_inds] + proposals = proposals[:self.num_max_proposals] + results['proposals'] = proposals[:, :4] + results['scores'] = proposals[:, 4] + else: + proposals = proposals[:self.num_max_proposals] + results['proposals'] = proposals + + ann = results.pop('ann') + # Follow the mmdet variable naming style. + results['gt_bboxes'] = ann['gt_bboxes'] + results['gt_labels'] = ann['gt_labels'] + results['entity_ids'] = ann['entity_ids'] + + return self.pipeline(results) + + def dump_results(self, results, out): + """Dump predictions into a csv file.""" + assert out.endswith('csv') + results2csv(self, results, out, self.custom_classes) + + def evaluate(self, + results, + metrics=('mAP', ), + metric_options=None, + logger=None): + """Evaluate the prediction results and report mAP.""" + assert len(metrics) == 1 and metrics[0] == 'mAP', ( + 'For evaluation on AVADataset, you need to use metrics "mAP" ' + 'See https://github.com/open-mmlab/mmaction2/pull/567 ' + 'for more info.') + time_now = datetime.now().strftime('%Y%m%d_%H%M%S') + temp_file = f'AVA_{time_now}_result.csv' + results2csv(self, results, temp_file, self.custom_classes) + + ret = {} + for metric in metrics: + msg = f'Evaluating {metric} ...' 
+ if logger is None: + msg = '\n' + msg + print_log(msg, logger=logger) + + eval_result = ava_eval( + temp_file, + metric, + self.label_file, + self.ann_file, + self.exclude_file, + custom_classes=self.custom_classes) + log_msg = [] + for k, v in eval_result.items(): + log_msg.append(f'\n{k}\t{v: .4f}') + log_msg = ''.join(log_msg) + print_log(log_msg, logger=logger) + ret.update(eval_result) + + os.remove(temp_file) + + return ret diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/base.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/base.py new file mode 100644 index 0000000000000000000000000000000000000000..8d2589ca1265c23e2ae813f4a896d75245cb6716 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/base.py @@ -0,0 +1,289 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os.path as osp +import warnings +from abc import ABCMeta, abstractmethod +from collections import OrderedDict, defaultdict + +import mmcv +import numpy as np +import torch +from mmcv.utils import print_log +from torch.utils.data import Dataset + +from ..core import (mean_average_precision, mean_class_accuracy, + mmit_mean_average_precision, top_k_accuracy) +from .pipelines import Compose + + +class BaseDataset(Dataset, metaclass=ABCMeta): + """Base class for datasets. + + All datasets to process video should subclass it. + All subclasses should overwrite: + + - Methods:`load_annotations`, supporting to load information from an + annotation file. + - Methods:`prepare_train_frames`, providing train data. + - Methods:`prepare_test_frames`, providing test data. + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + data_prefix (str | None): Path to a directory where videos are held. + Default: None. + test_mode (bool): Store True when building test or validation dataset. + Default: False. + multi_class (bool): Determines whether the dataset is a multi-class + dataset. 
Default: False. + num_classes (int | None): Number of classes of the dataset, used in + multi-class datasets. Default: None. + start_index (int): Specify a start index for frames in consideration of + different filename format. However, when taking videos as input, + it should be set to 0, since frames loaded from videos count + from 0. Default: 1. + modality (str): Modality of data. Support 'RGB', 'Flow', 'Audio'. + Default: 'RGB'. + sample_by_class (bool): Sampling by class, should be set `True` when + performing inter-class data balancing. Only compatible with + `multi_class == False`. Only applies for training. Default: False. + power (float): We support sampling data with the probability + proportional to the power of its label frequency (freq ^ power) + when sampling data. `power == 1` indicates uniformly sampling all + data; `power == 0` indicates uniformly sampling all classes. + Default: 0. + dynamic_length (bool): If the dataset length is dynamic (used by + ClassSpecificDistributedSampler). Default: False. 
+ """ + + def __init__(self, + ann_file, + pipeline, + data_prefix=None, + test_mode=False, + multi_class=False, + num_classes=None, + start_index=1, + modality='RGB', + sample_by_class=False, + power=0, + dynamic_length=False): + super().__init__() + + self.ann_file = ann_file + self.data_prefix = osp.realpath( + data_prefix) if data_prefix is not None and osp.isdir( + data_prefix) else data_prefix + self.test_mode = test_mode + self.multi_class = multi_class + self.num_classes = num_classes + self.start_index = start_index + self.modality = modality + self.sample_by_class = sample_by_class + self.power = power + self.dynamic_length = dynamic_length + + assert not (self.multi_class and self.sample_by_class) + + self.pipeline = Compose(pipeline) + self.video_infos = self.load_annotations() + if self.sample_by_class: + self.video_infos_by_class = self.parse_by_class() + + class_prob = [] + for _, samples in self.video_infos_by_class.items(): + class_prob.append(len(samples) / len(self.video_infos)) + class_prob = [x**self.power for x in class_prob] + + summ = sum(class_prob) + class_prob = [x / summ for x in class_prob] + + self.class_prob = dict(zip(self.video_infos_by_class, class_prob)) + + @abstractmethod + def load_annotations(self): + """Load the annotation according to ann_file into video_infos.""" + + # json annotations already looks like video_infos, so for each dataset, + # this func should be the same + def load_json_annotations(self): + """Load json annotation file to get video information.""" + video_infos = mmcv.load(self.ann_file) + num_videos = len(video_infos) + path_key = 'frame_dir' if 'frame_dir' in video_infos[0] else 'filename' + for i in range(num_videos): + path_value = video_infos[i][path_key] + if self.data_prefix is not None: + path_value = osp.join(self.data_prefix, path_value) + video_infos[i][path_key] = path_value + if self.multi_class: + assert self.num_classes is not None + else: + assert len(video_infos[i]['label']) == 1 + 
video_infos[i]['label'] = video_infos[i]['label'][0] + return video_infos + + def parse_by_class(self): + video_infos_by_class = defaultdict(list) + for item in self.video_infos: + label = item['label'] + video_infos_by_class[label].append(item) + return video_infos_by_class + + @staticmethod + def label2array(num, label): + arr = np.zeros(num, dtype=np.float32) + arr[label] = 1. + return arr + + def evaluate(self, + results, + metrics='top_k_accuracy', + metric_options=dict(top_k_accuracy=dict(topk=(1, 5))), + logger=None, + **deprecated_kwargs): + """Perform evaluation for common datasets. + + Args: + results (list): Output results. + metrics (str | sequence[str]): Metrics to be performed. + Defaults: 'top_k_accuracy'. + metric_options (dict): Dict for metric options. Options are + ``topk`` for ``top_k_accuracy``. + Default: ``dict(top_k_accuracy=dict(topk=(1, 5)))``. + logger (logging.Logger | None): Logger for recording. + Default: None. + deprecated_kwargs (dict): Used for containing deprecated arguments. + See 'https://github.com/open-mmlab/mmaction2/pull/286'. + + Returns: + dict: Evaluation results dict. 
+ """ + # Protect ``metric_options`` since it uses mutable value as default + metric_options = copy.deepcopy(metric_options) + + if deprecated_kwargs != {}: + warnings.warn( + 'Option arguments for metrics has been changed to ' + "`metric_options`, See 'https://github.com/open-mmlab/mmaction2/pull/286' " # noqa: E501 + 'for more details') + metric_options['top_k_accuracy'] = dict( + metric_options['top_k_accuracy'], **deprecated_kwargs) + + if not isinstance(results, list): + raise TypeError(f'results must be a list, but got {type(results)}') + assert len(results) == len(self), ( + f'The length of results is not equal to the dataset len: ' + f'{len(results)} != {len(self)}') + + metrics = metrics if isinstance(metrics, (list, tuple)) else [metrics] + allowed_metrics = [ + 'top_k_accuracy', 'mean_class_accuracy', 'mean_average_precision', + 'mmit_mean_average_precision' + ] + + for metric in metrics: + if metric not in allowed_metrics: + raise KeyError(f'metric {metric} is not supported') + + eval_results = OrderedDict() + gt_labels = [ann['label'] for ann in self.video_infos] + + for metric in metrics: + msg = f'Evaluating {metric} ...' 
+ if logger is None: + msg = '\n' + msg + print_log(msg, logger=logger) + + if metric == 'top_k_accuracy': + topk = metric_options.setdefault('top_k_accuracy', + {}).setdefault( + 'topk', (1, 5)) + if not isinstance(topk, (int, tuple)): + raise TypeError('topk must be int or tuple of int, ' + f'but got {type(topk)}') + if isinstance(topk, int): + topk = (topk, ) + + top_k_acc = top_k_accuracy(results, gt_labels, topk) + log_msg = [] + for k, acc in zip(topk, top_k_acc): + eval_results[f'top{k}_acc'] = acc + log_msg.append(f'\ntop{k}_acc\t{acc:.4f}') + log_msg = ''.join(log_msg) + print_log(log_msg, logger=logger) + continue + + if metric == 'mean_class_accuracy': + mean_acc = mean_class_accuracy(results, gt_labels) + eval_results['mean_class_accuracy'] = mean_acc + log_msg = f'\nmean_acc\t{mean_acc:.4f}' + print_log(log_msg, logger=logger) + continue + + if metric in [ + 'mean_average_precision', 'mmit_mean_average_precision' + ]: + gt_labels_arrays = [ + self.label2array(self.num_classes, label) + for label in gt_labels + ] + if metric == 'mean_average_precision': + mAP = mean_average_precision(results, gt_labels_arrays) + eval_results['mean_average_precision'] = mAP + log_msg = f'\nmean_average_precision\t{mAP:.4f}' + elif metric == 'mmit_mean_average_precision': + mAP = mmit_mean_average_precision(results, + gt_labels_arrays) + eval_results['mmit_mean_average_precision'] = mAP + log_msg = f'\nmmit_mean_average_precision\t{mAP:.4f}' + print_log(log_msg, logger=logger) + continue + + return eval_results + + @staticmethod + def dump_results(results, out): + """Dump data to json/yaml/pickle strings or files.""" + return mmcv.dump(results, out) + + def prepare_train_frames(self, idx): + """Prepare the frames for training given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + results['modality'] = self.modality + results['start_index'] = self.start_index + + # prepare tensor in getitem + # If HVU, type(results['label']) is dict + if self.multi_class 
and isinstance(results['label'], list): + onehot = torch.zeros(self.num_classes) + onehot[results['label']] = 1. + results['label'] = onehot + + return self.pipeline(results) + + def prepare_test_frames(self, idx): + """Prepare the frames for testing given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + results['modality'] = self.modality + results['start_index'] = self.start_index + + # prepare tensor in getitem + # If HVU, type(results['label']) is dict + if self.multi_class and isinstance(results['label'], list): + onehot = torch.zeros(self.num_classes) + onehot[results['label']] = 1. + results['label'] = onehot + + return self.pipeline(results) + + def __len__(self): + """Get the size of the dataset.""" + return len(self.video_infos) + + def __getitem__(self, idx): + """Get the sample for either training or testing given index.""" + if self.test_mode: + return self.prepare_test_frames(idx) + + return self.prepare_train_frames(idx) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/blending_utils.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/blending_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..bd8ded3674983ffffecc36b424ee46241b493d0c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/blending_utils.py @@ -0,0 +1,143 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from abc import ABCMeta, abstractmethod + +import torch +import torch.nn.functional as F +from torch.distributions.beta import Beta + +from .builder import BLENDINGS + +__all__ = ['BaseMiniBatchBlending', 'MixupBlending', 'CutmixBlending'] + + +class BaseMiniBatchBlending(metaclass=ABCMeta): + """Base class for Image Aliasing.""" + + def __init__(self, num_classes): + self.num_classes = num_classes + + @abstractmethod + def do_blending(self, imgs, label, **kwargs): + pass + + def __call__(self, imgs, label, **kwargs): + """Blending data in a mini-batch. 
+ + Images are float tensors with the shape of (B, N, C, H, W) for 2D + recognizers or (B, N, C, T, H, W) for 3D recognizers. + + Besides, labels are converted from hard labels to soft labels. + Hard labels are integer tensors with the shape of (B, 1) and all of the + elements are in the range [0, num_classes - 1]. + Soft labels (probability distribution over classes) are float tensors + with the shape of (B, 1, num_classes) and all of the elements are in + the range [0, 1]. + + Args: + imgs (torch.Tensor): Model input images, float tensor with the + shape of (B, N, C, H, W) or (B, N, C, T, H, W). + label (torch.Tensor): Hard labels, integer tensor with the shape + of (B, 1) and all elements are in range [0, num_classes). + kwargs (dict, optional): Other keyword arguments used to + blend imgs and labels in a mini-batch. + + Returns: + mixed_imgs (torch.Tensor): Blended images, float tensor with the + same shape as the input imgs. + mixed_label (torch.Tensor): Blended soft labels, float tensor with + the shape of (B, 1, num_classes) and all elements are in range + [0, 1]. + """ + one_hot_label = F.one_hot(label, num_classes=self.num_classes) + + mixed_imgs, mixed_label = self.do_blending(imgs, one_hot_label, + **kwargs) + + return mixed_imgs, mixed_label + + +@BLENDINGS.register_module() +class MixupBlending(BaseMiniBatchBlending): + """Implementing Mixup in a mini-batch. + + This module is proposed in `mixup: Beyond Empirical Risk Minimization + `_. + Code Reference https://github.com/open-mmlab/mmclassification/blob/master/mmcls/models/utils/mixup.py # noqa + + Args: + num_classes (int): The number of classes. + alpha (float): Parameters for Beta distribution.
+ """ + + def __init__(self, num_classes, alpha=.2): + super().__init__(num_classes=num_classes) + self.beta = Beta(alpha, alpha) + + def do_blending(self, imgs, label, **kwargs): + """Blending images with mixup.""" + assert len(kwargs) == 0, f'unexpected kwargs for mixup {kwargs}' + + lam = self.beta.sample() + batch_size = imgs.size(0) + rand_index = torch.randperm(batch_size) + + mixed_imgs = lam * imgs + (1 - lam) * imgs[rand_index, :] + mixed_label = lam * label + (1 - lam) * label[rand_index, :] + + return mixed_imgs, mixed_label + + +@BLENDINGS.register_module() +class CutmixBlending(BaseMiniBatchBlending): + """Implementing Cutmix in a mini-batch. + + This module is proposed in `CutMix: Regularization Strategy to Train Strong + Classifiers with Localizable Features `_. + Code Reference https://github.com/clovaai/CutMix-PyTorch + + Args: + num_classes (int): The number of classes. + alpha (float): Parameters for Beta distribution. + """ + + def __init__(self, num_classes, alpha=.2): + super().__init__(num_classes=num_classes) + self.beta = Beta(alpha, alpha) + + @staticmethod + def rand_bbox(img_size, lam): + """Generate a random bounding box.""" + w = img_size[-1] + h = img_size[-2] + cut_rat = torch.sqrt(1.
- lam) + cut_w = torch.tensor(int(w * cut_rat)) + cut_h = torch.tensor(int(h * cut_rat)) + + # uniform + cx = torch.randint(w, (1, ))[0] + cy = torch.randint(h, (1, ))[0] + + bbx1 = torch.clamp(cx - cut_w // 2, 0, w) + bby1 = torch.clamp(cy - cut_h // 2, 0, h) + bbx2 = torch.clamp(cx + cut_w // 2, 0, w) + bby2 = torch.clamp(cy + cut_h // 2, 0, h) + + return bbx1, bby1, bbx2, bby2 + + def do_blending(self, imgs, label, **kwargs): + """Blending images with cutmix.""" + assert len(kwargs) == 0, f'unexpected kwargs for cutmix {kwargs}' + + batch_size = imgs.size(0) + rand_index = torch.randperm(batch_size) + lam = self.beta.sample() + + bbx1, bby1, bbx2, bby2 = self.rand_bbox(imgs.size(), lam) + imgs[:, ..., bby1:bby2, bbx1:bbx2] = imgs[rand_index, ..., bby1:bby2, + bbx1:bbx2] + lam = 1 - (1.0 * (bbx2 - bbx1) * (bby2 - bby1) / + (imgs.size()[-1] * imgs.size()[-2])) + + label = lam * label + (1 - lam) * label[rand_index, :] + + return imgs, label diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/builder.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/builder.py new file mode 100644 index 0000000000000000000000000000000000000000..8a516af5425ae0082ee1dc53ae5fd178532940f3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/builder.py @@ -0,0 +1,168 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
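Reviewer note on `CutmixBlending` in `blending_utils.py` above: the sampled `lam` only sets the target box size, and `do_blending` then recomputes `lam` from the box that survives clipping at the frame border. The following is a minimal standalone sketch of that arithmetic using only the standard library, not the module's torch implementation. The helper names `corrected_lam` and `mix_labels`, the `rng` parameter, and the 224x224 frame size are assumptions made for illustration.

```python
import math
import random


def rand_bbox(width, height, lam, rng=random):
    """Pick a box covering about (1 - lam) of the frame, clipped to its borders."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_w, cut_h = int(width * cut_ratio), int(height * cut_ratio)
    cx, cy = rng.randrange(width), rng.randrange(height)  # uniform box centre

    def clamp(value, low, high):
        return max(low, min(value, high))

    return (clamp(cx - cut_w // 2, 0, width), clamp(cy - cut_h // 2, 0, height),
            clamp(cx + cut_w // 2, 0, width), clamp(cy + cut_h // 2, 0, height))


def corrected_lam(bbox, width, height):
    """Recompute lam from the area that was actually pasted (hypothetical helper)."""
    x1, y1, x2, y2 = bbox
    return 1.0 - (x2 - x1) * (y2 - y1) / (width * height)


def mix_labels(label_a, label_b, lam):
    """Blend two one-hot (or soft) label vectors with weight lam (hypothetical helper)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]


if __name__ == '__main__':
    rng = random.Random(0)
    lam = rng.betavariate(0.2, 0.2)  # Beta(alpha, alpha) with alpha=.2, as in the diff
    box = rand_bbox(224, 224, lam, rng)
    lam = corrected_lam(box, 224, 224)
    print(box, lam, mix_labels([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], lam))
```

Because clipping can only shrink the box, the corrected `lam` is never smaller than the sampled one, so the label weights stay consistent with the pixels that were actually replaced.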
+import platform +import random +from functools import partial + +import numpy as np +import torch +from mmcv.parallel import collate +from mmcv.runner import get_dist_info +from mmcv.utils import Registry, build_from_cfg, digit_version +from torch.utils.data import DataLoader + +from ..utils.multigrid import ShortCycleSampler +from .samplers import ClassSpecificDistributedSampler, DistributedSampler + +if platform.system() != 'Windows': + # https://github.com/pytorch/pytorch/issues/973 + import resource + rlimit = resource.getrlimit(resource.RLIMIT_NOFILE) + hard_limit = rlimit[1] + soft_limit = min(4096, hard_limit) + resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard_limit)) + +DATASETS = Registry('dataset') +PIPELINES = Registry('pipeline') +BLENDINGS = Registry('blending') + + +def build_dataset(cfg, default_args=None): + """Build a dataset from config dict. + + Args: + cfg (dict): Config dict. It should at least contain the key "type". + default_args (dict | None, optional): Default initialization arguments. + Default: None. + + Returns: + Dataset: The constructed dataset. + """ + dataset = build_from_cfg(cfg, DATASETS, default_args) + return dataset + + +def build_dataloader(dataset, + videos_per_gpu, + workers_per_gpu, + num_gpus=1, + dist=True, + shuffle=True, + seed=None, + drop_last=False, + pin_memory=True, + persistent_workers=False, + **kwargs): + """Build PyTorch DataLoader. + + In distributed training, each GPU/process has a dataloader. + In non-distributed training, there is only one dataloader for all GPUs. + + Args: + dataset (:obj:`Dataset`): A PyTorch dataset. + videos_per_gpu (int): Number of videos on each GPU, i.e., + batch size of each GPU. + workers_per_gpu (int): How many subprocesses to use for data + loading for each GPU. + num_gpus (int): Number of GPUs. Only used in non-distributed + training. Default: 1. + dist (bool): Distributed training/test or not. Default: True. 
+ shuffle (bool): Whether to shuffle the data at every epoch. + Default: True. + seed (int | None): Seed to be used. Default: None. + drop_last (bool): Whether to drop the last incomplete batch in epoch. + Default: False + pin_memory (bool): Whether to use pin_memory in DataLoader. + Default: True + persistent_workers (bool): If True, the data loader will not shut down + the worker processes after a dataset has been consumed once. + This keeps the workers' Dataset instances alive. + The argument only takes effect in PyTorch>=1.8.0. + Default: False + kwargs (dict, optional): Any keyword argument to be used to initialize + DataLoader. + + Returns: + DataLoader: A PyTorch dataloader. + """ + rank, world_size = get_dist_info() + sample_by_class = getattr(dataset, 'sample_by_class', False) + + short_cycle = kwargs.pop('short_cycle', False) + multigrid_cfg = kwargs.pop('multigrid_cfg', None) + crop_size = kwargs.pop('crop_size', 224) + + if dist: + if sample_by_class: + dynamic_length = getattr(dataset, 'dynamic_length', True) + sampler = ClassSpecificDistributedSampler( + dataset, + world_size, + rank, + dynamic_length=dynamic_length, + shuffle=shuffle, + seed=seed) + else: + sampler = DistributedSampler( + dataset, world_size, rank, shuffle=shuffle, seed=seed) + shuffle = False + batch_size = videos_per_gpu + num_workers = workers_per_gpu + + if short_cycle: + batch_sampler = ShortCycleSampler(sampler, batch_size, + multigrid_cfg, crop_size) + init_fn = partial( + worker_init_fn, num_workers=num_workers, rank=rank, + seed=seed) if seed is not None else None + + if digit_version(torch.__version__) >= digit_version('1.8.0'): + kwargs['persistent_workers'] = persistent_workers + + data_loader = DataLoader( + dataset, + batch_sampler=batch_sampler, + num_workers=num_workers, + pin_memory=pin_memory, + worker_init_fn=init_fn, + **kwargs) + return data_loader + + else: + if short_cycle: + raise NotImplementedError( + 'Short cycle using non-dist is not supported') +
+ sampler = None + batch_size = num_gpus * videos_per_gpu + num_workers = num_gpus * workers_per_gpu + + init_fn = partial( + worker_init_fn, num_workers=num_workers, rank=rank, + seed=seed) if seed is not None else None + + if digit_version(torch.__version__) >= digit_version('1.8.0'): + kwargs['persistent_workers'] = persistent_workers + + data_loader = DataLoader( + dataset, + batch_size=batch_size, + sampler=sampler, + num_workers=num_workers, + collate_fn=partial(collate, samples_per_gpu=videos_per_gpu), + pin_memory=pin_memory, + shuffle=shuffle, + worker_init_fn=init_fn, + drop_last=drop_last, + **kwargs) + + return data_loader + + +def worker_init_fn(worker_id, num_workers, rank, seed): + """Init the random seed for various workers.""" + # The seed of each worker equals to + # num_worker * rank + worker_id + user_seed + worker_seed = num_workers * rank + worker_id + seed + np.random.seed(worker_seed) + random.seed(worker_seed) + torch.manual_seed(worker_seed) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/dataset_wrappers.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/dataset_wrappers.py new file mode 100644 index 0000000000000000000000000000000000000000..7868e40709e02a25cfc0fa7cba23223eb2f461d6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/dataset_wrappers.py @@ -0,0 +1,71 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np + +from .builder import DATASETS, build_dataset + + +@DATASETS.register_module() +class RepeatDataset: + """A wrapper of repeated dataset. + + The length of repeated dataset will be ``times`` larger than the original + dataset. This is useful when the data loading time is long but the dataset + is small. Using RepeatDataset can reduce the data loading time between + epochs. + + Args: + dataset (dict): The config of the dataset to be repeated. + times (int): Repeat times. + test_mode (bool): Store True when building test or validation dataset. + Default: False. 
+ """ + + def __init__(self, dataset, times, test_mode=False): + dataset['test_mode'] = test_mode + self.dataset = build_dataset(dataset) + self.times = times + + self._ori_len = len(self.dataset) + + def __getitem__(self, idx): + """Get data.""" + return self.dataset[idx % self._ori_len] + + def __len__(self): + """Length after repetition.""" + return self.times * self._ori_len + + +@DATASETS.register_module() +class ConcatDataset: + """A wrapper of concatenated dataset. + + The length of concatenated dataset will be the sum of lengths of all + datasets. This is useful when you want to train a model with multiple data + sources. + + Args: + datasets (list[dict]): The configs of the datasets. + test_mode (bool): Store True when building test or validation dataset. + Default: False. + """ + + def __init__(self, datasets, test_mode=False): + + for item in datasets: + item['test_mode'] = test_mode + + datasets = [build_dataset(cfg) for cfg in datasets] + self.datasets = datasets + self.lens = [len(x) for x in self.datasets] + self.cumsum = np.cumsum(self.lens) + + def __getitem__(self, idx): + """Get data.""" + dataset_idx = np.searchsorted(self.cumsum, idx, side='right') + item_idx = idx if dataset_idx == 0 else idx - self.cumsum[dataset_idx] + return self.datasets[dataset_idx][item_idx] + + def __len__(self): + """Length after concatenation.""" + return sum(self.lens) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/hvu_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/hvu_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..7049944a21ef066571bdec5530a544d9c8de48cd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/hvu_dataset.py @@ -0,0 +1,192 @@ +# Copyright (c) OpenMMLab. All rights reserved.
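Reviewer note on `ConcatDataset.__getitem__` in `dataset_wrappers.py` above: the wrapper subtracts `self.cumsum[dataset_idx]`, the cumulative length that includes the selected dataset, so for every dataset after the first it produces a negative local index. Python sequence indexing resolves that negative index to the intended element, which means the wrapper relies on the wrapped dataset accepting negative indices. The sketch below shows the same lookup with a non-negative local index, using `bisect_right` from the standard library in the role of `np.searchsorted(..., side='right')`. The function name `locate` and the sample lengths are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(idx, lens):
    """Map a global index to (dataset_idx, item_idx) across concatenated datasets."""
    cumsum = list(accumulate(lens))
    if not 0 <= idx < cumsum[-1]:
        raise IndexError(f'index {idx} out of range for total length {cumsum[-1]}')
    # Same role as np.searchsorted(cumsum, idx, side='right') in the wrapper.
    dataset_idx = bisect_right(cumsum, idx)
    # Subtract the combined length of the datasets that come before the hit.
    offset = 0 if dataset_idx == 0 else cumsum[dataset_idx - 1]
    return dataset_idx, idx - offset


if __name__ == '__main__':
    lens = [3, 2, 4]  # hypothetical lengths of three wrapped datasets
    for idx in range(sum(lens)):
        print(idx, locate(idx, lens))
```

Subtracting `cumsum[dataset_idx - 1]` keeps the local index in `[0, len)`, which would also work for a wrapped dataset whose `__getitem__` does not handle negative indices.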
+import copy +import os.path as osp +from collections import OrderedDict + +import mmcv +import numpy as np +from mmcv.utils import print_log + +from ..core import mean_average_precision +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class HVUDataset(BaseDataset): + """HVU dataset, which supports the recognition tags of multiple categories. + Accept both video annotation files or rawframe annotation files. + + The dataset loads videos or raw frames and applies specified transforms to + return a dict containing the frame tensors and other information. + + The ann_file is a json file with multiple dictionaries, and each dictionary + indicates a sample video with the filename and tags, the tags are organized + as different categories. Example of a video dictionary: + + .. code-block:: txt + + { + 'filename': 'gD_G1b0wV5I_001015_001035.mp4', + 'label': { + 'concept': [250, 131, 42, 51, 57, 155, 122], + 'object': [1570, 508], + 'event': [16], + 'action': [180], + 'scene': [206] + } + } + + Example of a rawframe dictionary: + + .. code-block:: txt + + { + 'frame_dir': 'gD_G1b0wV5I_001015_001035', + 'total_frames': 61 + 'label': { + 'concept': [250, 131, 42, 51, 57, 155, 122], + 'object': [1570, 508], + 'event': [16], + 'action': [180], + 'scene': [206] + } + } + + + Args: + ann_file (str): Path to the annotation file, should be a json file. + pipeline (list[dict | callable]): A sequence of data transforms. + tag_categories (list[str]): List of category names of tags. + tag_category_nums (list[int]): List of number of tags in each category. + filename_tmpl (str | None): Template for each filename. If set to None, + video dataset is used. Default: None. + **kwargs: Keyword arguments for ``BaseDataset``. 
+ """ + + def __init__(self, + ann_file, + pipeline, + tag_categories, + tag_category_nums, + filename_tmpl=None, + **kwargs): + assert len(tag_categories) == len(tag_category_nums) + self.tag_categories = tag_categories + self.tag_category_nums = tag_category_nums + self.filename_tmpl = filename_tmpl + self.num_categories = len(self.tag_categories) + self.num_tags = sum(self.tag_category_nums) + self.category2num = dict(zip(tag_categories, tag_category_nums)) + self.start_idx = [0] + for i in range(self.num_categories - 1): + self.start_idx.append(self.start_idx[-1] + + self.tag_category_nums[i]) + self.category2startidx = dict(zip(tag_categories, self.start_idx)) + self.start_index = kwargs.pop('start_index', 0) + self.dataset_type = None + super().__init__( + ann_file, pipeline, start_index=self.start_index, **kwargs) + + def load_annotations(self): + """Load annotation file to get video information.""" + assert self.ann_file.endswith('.json') + return self.load_json_annotations() + + def load_json_annotations(self): + video_infos = mmcv.load(self.ann_file) + num_videos = len(video_infos) + + video_info0 = video_infos[0] + assert ('filename' in video_info0) != ('frame_dir' in video_info0) + path_key = 'filename' if 'filename' in video_info0 else 'frame_dir' + self.dataset_type = 'video' if path_key == 'filename' else 'rawframe' + if self.dataset_type == 'rawframe': + assert self.filename_tmpl is not None + + for i in range(num_videos): + path_value = video_infos[i][path_key] + if self.data_prefix is not None: + path_value = osp.join(self.data_prefix, path_value) + video_infos[i][path_key] = path_value + + # We will convert label to torch tensors in the pipeline + video_infos[i]['categories'] = self.tag_categories + video_infos[i]['category_nums'] = self.tag_category_nums + if self.dataset_type == 'rawframe': + video_infos[i]['filename_tmpl'] = self.filename_tmpl + video_infos[i]['start_index'] = self.start_index + video_infos[i]['modality'] = self.modality + + 
return video_infos + + @staticmethod + def label2array(num, label): + arr = np.zeros(num, dtype=np.float32) + arr[label] = 1. + return arr + + def evaluate(self, + results, + metrics='mean_average_precision', + metric_options=None, + logger=None): + """Evaluation in HVU Video Dataset. We only support evaluating mAP for + each tag categories. Since some tag categories are missing for some + videos, we can not evaluate mAP for all tags. + + Args: + results (list): Output results. + metrics (str | sequence[str]): Metrics to be performed. + Defaults: 'mean_average_precision'. + metric_options (dict | None): Dict for metric options. + Default: None. + logger (logging.Logger | None): Logger for recording. + Default: None. + + Returns: + dict: Evaluation results dict. + """ + # Protect ``metric_options`` since it uses mutable value as default + metric_options = copy.deepcopy(metric_options) + + if not isinstance(results, list): + raise TypeError(f'results must be a list, but got {type(results)}') + assert len(results) == len(self), ( + f'The length of results is not equal to the dataset len: ' + f'{len(results)} != {len(self)}') + + metrics = metrics if isinstance(metrics, (list, tuple)) else [metrics] + + # There should be only one metric in the metrics list: + # 'mean_average_precision' + assert len(metrics) == 1 + metric = metrics[0] + assert metric == 'mean_average_precision' + + gt_labels = [ann['label'] for ann in self.video_infos] + + eval_results = OrderedDict() + + for category in self.tag_categories: + + start_idx = self.category2startidx[category] + num = self.category2num[category] + preds = [ + result[start_idx:start_idx + num] + for video_idx, result in enumerate(results) + if category in gt_labels[video_idx] + ] + gts = [ + gt_label[category] for gt_label in gt_labels + if category in gt_label + ] + + gts = [self.label2array(num, item) for item in gts] + + mAP = mean_average_precision(preds, gts) + eval_results[f'{category}_mAP'] = mAP + log_msg = 
f'\n{category}_mAP\t{mAP:.4f}' + print_log(log_msg, logger=logger) + + return eval_results diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/image_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/image_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..6d84b35f85fb595981fa46e65e59c0ceb586510c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/image_dataset.py @@ -0,0 +1,46 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .builder import DATASETS +from .video_dataset import VideoDataset + + +@DATASETS.register_module() +class ImageDataset(VideoDataset): + """Image dataset for action recognition, used in the Project OmniSource. + + The dataset loads an image list and applies the specified + transforms to return a dict containing the image tensors and other + information. + + The ann_file is a text file with multiple lines, and each line indicates + the image path and the image label, which are separated by whitespace. + Example of an annotation file: + + .. code-block:: txt + + path/to/image1.jpg 1 + path/to/image2.jpg 1 + path/to/image3.jpg 2 + path/to/image4.jpg 2 + path/to/image5.jpg 3 + path/to/image6.jpg 3 + + Example of a multi-class annotation file: + + .. code-block:: txt + + path/to/image1.jpg 1 3 5 + path/to/image2.jpg 1 2 + path/to/image3.jpg 2 + path/to/image4.jpg 2 4 6 8 + path/to/image5.jpg 3 + path/to/image6.jpg 3 + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + **kwargs: Keyword arguments for ``BaseDataset``.
+ """ + + def __init__(self, ann_file, pipeline, **kwargs): + super().__init__(ann_file, pipeline, start_index=None, **kwargs) + # use `start_index=None` to indicate it is for `ImageDataset` diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1905bf9893539c915c4d0554b43a23473405898d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/__init__.py @@ -0,0 +1,41 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .augmentations import (AudioAmplify, CenterCrop, ColorJitter, Flip, Fuse, + Imgaug, MelSpectrogram, MultiScaleCrop, Normalize, + PytorchVideoTrans, RandomCrop, RandomRescale, + RandomResizedCrop, Resize, TenCrop, ThreeCrop, + TorchvisionTrans) +from .compose import Compose +from .formatting import (Collect, FormatAudioShape, FormatGCNInput, + FormatShape, ImageToTensor, JointToBone, Rename, + ToDataContainer, ToTensor, Transpose) +from .loading import (ArrayDecode, AudioDecode, AudioDecodeInit, + AudioFeatureSelector, BuildPseudoClip, DecordDecode, + DecordInit, DenseSampleFrames, + GenerateLocalizationLabels, ImageDecode, + LoadAudioFeature, LoadHVULabel, LoadLocalizationFeature, + LoadProposals, OpenCVDecode, OpenCVInit, PIMSDecode, + PIMSInit, PyAVDecode, PyAVDecodeMotionVector, PyAVInit, + RawFrameDecode, SampleAVAFrames, SampleFrames, + SampleProposalFrames, UntrimmedSampleFrames) +from .pose_loading import (GeneratePoseTarget, LoadKineticsPose, + PaddingWithLoop, PoseDecode, PoseNormalize, + UniformSampleFrames) + +__all__ = [ + 'SampleFrames', 'PyAVDecode', 'DecordDecode', 'DenseSampleFrames', + 'OpenCVDecode', 'MultiScaleCrop', 'RandomResizedCrop', 'RandomCrop', + 'Resize', 'Flip', 'Fuse', 'Normalize', 'ThreeCrop', 'CenterCrop', + 'TenCrop', 'ImageToTensor', 'Transpose', 'Collect', 'FormatShape', + 'Compose', 'ToTensor', 'ToDataContainer', 
'GenerateLocalizationLabels', + 'LoadLocalizationFeature', 'LoadProposals', 'DecordInit', 'OpenCVInit', + 'PyAVInit', 'SampleProposalFrames', 'UntrimmedSampleFrames', + 'RawFrameDecode', 'DecordInit', 'OpenCVInit', 'PyAVInit', + 'SampleProposalFrames', 'ColorJitter', 'LoadHVULabel', 'SampleAVAFrames', + 'AudioAmplify', 'MelSpectrogram', 'AudioDecode', 'FormatAudioShape', + 'LoadAudioFeature', 'AudioFeatureSelector', 'AudioDecodeInit', + 'ImageDecode', 'BuildPseudoClip', 'RandomRescale', + 'PyAVDecodeMotionVector', 'Rename', 'Imgaug', 'UniformSampleFrames', + 'PoseDecode', 'LoadKineticsPose', 'GeneratePoseTarget', 'PIMSInit', + 'PIMSDecode', 'TorchvisionTrans', 'PytorchVideoTrans', 'PoseNormalize', + 'FormatGCNInput', 'PaddingWithLoop', 'ArrayDecode', 'JointToBone' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/augmentations.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/augmentations.py new file mode 100644 index 0000000000000000000000000000000000000000..9bd5d266a1f995507603639c46fd7d2f9a90ecdd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/augmentations.py @@ -0,0 +1,1905 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import random +import warnings +from collections.abc import Sequence + +import cv2 +import mmcv +import numpy as np +from mmcv.utils import digit_version +from torch.nn.modules.utils import _pair + +from ..builder import PIPELINES +from .formatting import to_tensor + + +def _combine_quadruple(a, b): + return (a[0] + a[2] * b[0], a[1] + a[3] * b[1], a[2] * b[2], a[3] * b[3]) + + +def _flip_quadruple(a): + return (1 - a[0] - a[2], a[1], a[2], a[3]) + + +def _init_lazy_if_proper(results, lazy): + """Initialize lazy operation properly. + + Make sure that a lazy operation is properly initialized, + and avoid a non-lazy operation accidentally getting mixed in. 
+ + Required keys in results are "imgs" if "img_shape" not in results, + otherwise, Required keys in results are "img_shape", add or modified keys + are "img_shape", "lazy". + Add or modified keys in "lazy" are "original_shape", "crop_bbox", "flip", + "flip_direction", "interpolation". + + Args: + results (dict): A dict stores data pipeline result. + lazy (bool): Determine whether to apply lazy operation. Default: False. + """ + + if 'img_shape' not in results: + results['img_shape'] = results['imgs'][0].shape[:2] + if lazy: + if 'lazy' not in results: + img_h, img_w = results['img_shape'] + lazyop = dict() + lazyop['original_shape'] = results['img_shape'] + lazyop['crop_bbox'] = np.array([0, 0, img_w, img_h], + dtype=np.float32) + lazyop['flip'] = False + lazyop['flip_direction'] = None + lazyop['interpolation'] = None + results['lazy'] = lazyop + else: + assert 'lazy' not in results, 'Use Fuse after lazy operations' + + +@PIPELINES.register_module() +class TorchvisionTrans: + """Torchvision Augmentations, under torchvision.transforms. + + Args: + type (str): The name of the torchvision transformation. 
+ """ + + def __init__(self, type, **kwargs): + try: + import torchvision + import torchvision.transforms as tv_trans + except ImportError: + raise RuntimeError('Install torchvision to use TorchvisionTrans') + if digit_version(torchvision.__version__) < digit_version('0.8.0'): + raise RuntimeError('The version of torchvision should be at least ' + '0.8.0') + + trans = getattr(tv_trans, type, None) + assert trans, f'Transform {type} not in torchvision' + self.trans = trans(**kwargs) + + def __call__(self, results): + assert 'imgs' in results + + imgs = [x.transpose(2, 0, 1) for x in results['imgs']] + imgs = to_tensor(np.stack(imgs)) + + imgs = self.trans(imgs).data.numpy() + imgs[imgs > 255] = 255 + imgs[imgs < 0] = 0 + imgs = imgs.astype(np.uint8) + imgs = [x.transpose(1, 2, 0) for x in imgs] + results['imgs'] = imgs + return results + + +@PIPELINES.register_module() +class PytorchVideoTrans: + """PytorchVideoTrans Augmentations, under pytorchvideo.transforms. + + Args: + type (str): The name of the pytorchvideo transformation. + """ + + def __init__(self, type, **kwargs): + try: + import pytorchvideo.transforms as ptv_trans + import torch + except ImportError: + raise RuntimeError('Install pytorchvideo to use PytorchVideoTrans') + if digit_version(torch.__version__) < digit_version('1.8.0'): + raise RuntimeError( + 'The version of PyTorch should be at least 1.8.0') + + trans = getattr(ptv_trans, type, None) + assert trans, f'Transform {type} not in pytorchvideo' + + supported_pytorchvideo_trans = ('AugMix', 'RandAugment', + 'RandomResizedCrop', 'ShortSideScale', + 'RandomShortSideScale') + assert type in supported_pytorchvideo_trans,\ + f'PytorchVideo Transform {type} is not supported in MMAction2' + + self.trans = trans(**kwargs) + self.type = type + + def __call__(self, results): + assert 'imgs' in results + + assert 'gt_bboxes' not in results,\ + f'PytorchVideo {self.type} doesn\'t support bboxes yet.' 
+ assert 'proposals' not in results,\ + f'PytorchVideo {self.type} doesn\'t support bboxes yet.' + + if self.type in ('AugMix', 'RandAugment'): + # list[ndarray(h, w, 3)] -> torch.tensor(t, c, h, w) + imgs = [x.transpose(2, 0, 1) for x in results['imgs']] + imgs = to_tensor(np.stack(imgs)) + else: + # list[ndarray(h, w, 3)] -> torch.tensor(c, t, h, w) + # uint8 -> float32 + imgs = to_tensor((np.stack(results['imgs']).transpose(3, 0, 1, 2) / + 255.).astype(np.float32)) + + imgs = self.trans(imgs).data.numpy() + + if self.type in ('AugMix', 'RandAugment'): + imgs[imgs > 255] = 255 + imgs[imgs < 0] = 0 + imgs = imgs.astype(np.uint8) + + # torch.tensor(t, c, h, w) -> list[ndarray(h, w, 3)] + imgs = [x.transpose(1, 2, 0) for x in imgs] + else: + # float32 -> uint8 + imgs = imgs * 255 + imgs[imgs > 255] = 255 + imgs[imgs < 0] = 0 + imgs = imgs.astype(np.uint8) + + # torch.tensor(c, t, h, w) -> list[ndarray(h, w, 3)] + imgs = [x for x in imgs.transpose(1, 2, 3, 0)] + + results['imgs'] = imgs + + return results + + +@PIPELINES.register_module() +class PoseCompact: + """Convert the coordinates of keypoints to make it more compact. + Specifically, it first find a tight bounding box that surrounds all joints + in each frame, then we expand the tight box by a given padding ratio. For + example, if 'padding == 0.25', then the expanded box has unchanged center, + and 1.25x width and height. + + Required keys in results are "img_shape", "keypoint", add or modified keys + are "img_shape", "keypoint", "crop_quadruple". + + Args: + padding (float): The padding size. Default: 0.25. + threshold (int): The threshold for the tight bounding box. If the width + or height of the tight bounding box is smaller than the threshold, + we do not perform the compact operation. Default: 10. + hw_ratio (float | tuple[float] | None): The hw_ratio of the expanded + box. Float indicates the specific ratio and tuple indicates a + ratio range. 
If set as None, it means there is no requirement on + hw_ratio. Default: None. + allow_imgpad (bool): Whether to allow expanding the box outside the + image to meet the hw_ratio requirement. Default: True. + + Returns: + dict: The updated results dict. + """ + + def __init__(self, + padding=0.25, + threshold=10, + hw_ratio=None, + allow_imgpad=True): + + self.padding = padding + self.threshold = threshold + if hw_ratio is not None: + hw_ratio = _pair(hw_ratio) + + self.hw_ratio = hw_ratio + + self.allow_imgpad = allow_imgpad + assert self.padding >= 0 + + def __call__(self, results): + img_shape = results['img_shape'] + h, w = img_shape + kp = results['keypoint'] + + # Make NaN zero + kp[np.isnan(kp)] = 0. + kp_x = kp[..., 0] + kp_y = kp[..., 1] + + min_x = np.min(kp_x[kp_x != 0], initial=np.inf) + min_y = np.min(kp_y[kp_y != 0], initial=np.inf) + max_x = np.max(kp_x[kp_x != 0], initial=-np.inf) + max_y = np.max(kp_y[kp_y != 0], initial=-np.inf) + + # The compact area is too small + if max_x - min_x < self.threshold or max_y - min_y < self.threshold: + return results + + center = ((max_x + min_x) / 2, (max_y + min_y) / 2) + half_width = (max_x - min_x) / 2 * (1 + self.padding) + half_height = (max_y - min_y) / 2 * (1 + self.padding) + + if self.hw_ratio is not None: + half_height = max(self.hw_ratio[0] * half_width, half_height) + half_width = max(1 / self.hw_ratio[1] * half_height, half_width) + + min_x, max_x = center[0] - half_width, center[0] + half_width + min_y, max_y = center[1] - half_height, center[1] + half_height + + # hot update + if not self.allow_imgpad: + min_x, min_y = int(max(0, min_x)), int(max(0, min_y)) + max_x, max_y = int(min(w, max_x)), int(min(h, max_y)) + else: + min_x, min_y = int(min_x), int(min_y) + max_x, max_y = int(max_x), int(max_y) + + kp_x[kp_x != 0] -= min_x + kp_y[kp_y != 0] -= min_y + + new_shape = (max_y - min_y, max_x - min_x) + results['img_shape'] = new_shape + + # the order is x, y, w, h (in [0, 1]), a tuple +
crop_quadruple = results.get('crop_quadruple', (0., 0., 1., 1.)) + new_crop_quadruple = (min_x / w, min_y / h, (max_x - min_x) / w, + (max_y - min_y) / h) + crop_quadruple = _combine_quadruple(crop_quadruple, new_crop_quadruple) + results['crop_quadruple'] = crop_quadruple + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(padding={self.padding}, ' + f'threshold={self.threshold}, ' + f'hw_ratio={self.hw_ratio}, ' + f'allow_imgpad={self.allow_imgpad})') + return repr_str + + +@PIPELINES.register_module() +class Imgaug: + """Imgaug augmentation. + + Adds custom transformations from imgaug library. + Please visit `https://imgaug.readthedocs.io/en/latest/index.html` + to get more information. Two demo configs could be found in tsn and i3d + config folder. + + It's better to use uint8 images as inputs since imgaug works best with + numpy dtype uint8 and isn't well tested with other dtypes. It should be + noted that not all of the augmenters have the same input and output dtype, + which may cause unexpected results. + + Required keys are "imgs", "img_shape"(if "gt_bboxes" is not None) and + "modality", added or modified keys are "imgs", "img_shape", "gt_bboxes" + and "proposals". + + It is worth mentioning that `Imgaug` will NOT create custom keys like + "interpolation", "crop_bbox", "flip_direction", etc. So when using + `Imgaug` along with other mmaction2 pipelines, we should pay more attention + to required keys. + + Two steps to use `Imgaug` pipeline: + 1. Create initialization parameter `transforms`. There are three ways + to create `transforms`. + 1) string: only support `default` for now. + e.g. `transforms='default'` + 2) list[dict]: create a list of augmenters by a list of dicts, each + dict corresponds to one augmenter. Every dict MUST contain a key + named `type`. `type` should be a string(iaa.Augmenter's name) or + an iaa.Augmenter subclass. + e.g. `transforms=[dict(type='Rotate', rotate=(-20, 20))]` + e.g. 
`transforms=[dict(type=iaa.Rotate, rotate=(-20, 20))]` + 3) iaa.Augmenter: create an imgaug.Augmenter object. + e.g. `transforms=iaa.Rotate(rotate=(-20, 20))` + 2. Add `Imgaug` in dataset pipeline. It is recommended to insert imgaug + pipeline before `Normalize`. A demo pipeline is listed as follows. + ``` + pipeline = [ + dict( + type='SampleFrames', + clip_len=1, + frame_interval=1, + num_clips=16, + ), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict( + type='MultiScaleCrop', + input_size=224, + scales=(1, 0.875, 0.75, 0.66), + random_crop=False, + max_wh_scale_gap=1, + num_fixed_crops=13), + dict(type='Resize', scale=(224, 224), keep_ratio=False), + dict(type='Flip', flip_ratio=0.5), + dict(type='Imgaug', transforms='default'), + # dict(type='Imgaug', transforms=[ + # dict(type='Rotate', rotate=(-20, 20)) + # ]), + dict(type='Normalize', **img_norm_cfg), + dict(type='FormatShape', input_format='NCHW'), + dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs', 'label']) + ] + ``` + + Args: + transforms (str | list[dict] | :obj:`iaa.Augmenter`): Three different + ways to create imgaug augmenter. + """ + + def __init__(self, transforms): + import imgaug.augmenters as iaa + + if transforms == 'default': + self.transforms = self.default_transforms() + elif isinstance(transforms, list): + assert all(isinstance(trans, dict) for trans in transforms) + self.transforms = transforms + elif isinstance(transforms, iaa.Augmenter): + self.aug = self.transforms = transforms + else: + raise ValueError('transforms must be `default` or a list of dicts' + ' or iaa.Augmenter object') + + if not isinstance(transforms, iaa.Augmenter): + self.aug = iaa.Sequential( + [self.imgaug_builder(t) for t in self.transforms]) + + @staticmethod + def default_transforms(): + """Default transforms for imgaug. + + Implement RandAugment by imgaug. + Please visit `https://arxiv.org/abs/1909.13719` for more information. 
+ + Augmenters and hyper parameters are borrowed from the following repo: + https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/autoaugment.py # noqa + + The ``SolarizeAdd`` augmenter is omitted since imgaug doesn't support it. + + Returns: + list[dict]: The constructed RandAugment transforms. + """ + # RandAugment hyper params + num_augmenters = 2 + cur_magnitude, max_magnitude = 9, 10 + cur_level = 1.0 * cur_magnitude / max_magnitude + + return [ + dict( + type='SomeOf', + n=num_augmenters, + children=[ + dict( + type='ShearX', + shear=17.19 * cur_level * random.choice([-1, 1])), + dict( + type='ShearY', + shear=17.19 * cur_level * random.choice([-1, 1])), + dict( + type='TranslateX', + percent=.2 * cur_level * random.choice([-1, 1])), + dict( + type='TranslateY', + percent=.2 * cur_level * random.choice([-1, 1])), + dict( + type='Rotate', + rotate=30 * cur_level * random.choice([-1, 1])), + dict(type='Posterize', nb_bits=max(1, int(4 * cur_level))), + dict(type='Solarize', threshold=256 * cur_level), + dict(type='EnhanceColor', factor=1.8 * cur_level + .1), + dict(type='EnhanceContrast', factor=1.8 * cur_level + .1), + dict( + type='EnhanceBrightness', factor=1.8 * cur_level + .1), + dict(type='EnhanceSharpness', factor=1.8 * cur_level + .1), + dict(type='Autocontrast', cutoff=0), + dict(type='Equalize'), + dict(type='Invert', p=1.), + dict( + type='Cutout', + nb_iterations=1, + size=0.2 * cur_level, + squared=True) + ]) + ] + + def imgaug_builder(self, cfg): + """Import a module from imgaug. + + It follows the logic of :func:`build_from_cfg`. Use a dict object to + create an iaa.Augmenter object. + + Args: + cfg (dict): Config dict. It should at least contain the key "type". + + Returns: + obj:`iaa.Augmenter`: The constructed imgaug augmenter.
+ """ + import imgaug.augmenters as iaa + + assert isinstance(cfg, dict) and 'type' in cfg + args = cfg.copy() + + obj_type = args.pop('type') + if mmcv.is_str(obj_type): + obj_cls = getattr(iaa, obj_type) if hasattr(iaa, obj_type) \ + else getattr(iaa.pillike, obj_type) + elif issubclass(obj_type, iaa.Augmenter): + obj_cls = obj_type + else: + raise TypeError( + f'type must be a str or valid type, but got {type(obj_type)}') + + if 'children' in args: + args['children'] = [ + self.imgaug_builder(child) for child in args['children'] + ] + + return obj_cls(**args) + + def __repr__(self): + repr_str = self.__class__.__name__ + f'(transforms={self.aug})' + return repr_str + + def __call__(self, results): + assert results['modality'] == 'RGB', 'Imgaug only support RGB images.' + in_type = results['imgs'][0].dtype.type + + cur_aug = self.aug.to_deterministic() + + results['imgs'] = [ + cur_aug.augment_image(frame) for frame in results['imgs'] + ] + img_h, img_w, _ = results['imgs'][0].shape + + out_type = results['imgs'][0].dtype.type + assert in_type == out_type, \ + ('Imgaug input dtype and output dtype are not the same. 
', + f'Convert from {in_type} to {out_type}') + + if 'gt_bboxes' in results: + from imgaug.augmentables import bbs + bbox_list = [ + bbs.BoundingBox( + x1=bbox[0], y1=bbox[1], x2=bbox[2], y2=bbox[3]) + for bbox in results['gt_bboxes'] + ] + bboxes = bbs.BoundingBoxesOnImage( + bbox_list, shape=results['img_shape']) + bbox_aug, *_ = cur_aug.augment_bounding_boxes([bboxes]) + results['gt_bboxes'] = [[ + max(bbox.x1, 0), + max(bbox.y1, 0), + min(bbox.x2, img_w), + min(bbox.y2, img_h) + ] for bbox in bbox_aug.items] + if 'proposals' in results: + bbox_list = [ + bbs.BoundingBox( + x1=bbox[0], y1=bbox[1], x2=bbox[2], y2=bbox[3]) + for bbox in results['proposals'] + ] + bboxes = bbs.BoundingBoxesOnImage( + bbox_list, shape=results['img_shape']) + bbox_aug, *_ = cur_aug.augment_bounding_boxes([bboxes]) + results['proposals'] = [[ + max(bbox.x1, 0), + max(bbox.y1, 0), + min(bbox.x2, img_w), + min(bbox.y2, img_h) + ] for bbox in bbox_aug.items] + + results['img_shape'] = (img_h, img_w) + + return results + + +@PIPELINES.register_module() +class Fuse: + """Fuse lazy operations. + + Fusion order: + crop -> resize -> flip + + Required keys are "imgs", "img_shape" and "lazy", added or modified keys + are "imgs", "lazy". + Required keys in "lazy" are "crop_bbox", "interpolation", "flip_direction". 
+ """ + + def __call__(self, results): + if 'lazy' not in results: + raise ValueError('No lazy operation detected') + lazyop = results['lazy'] + imgs = results['imgs'] + + # crop + left, top, right, bottom = lazyop['crop_bbox'].round().astype(int) + imgs = [img[top:bottom, left:right] for img in imgs] + + # resize + img_h, img_w = results['img_shape'] + if lazyop['interpolation'] is None: + interpolation = 'bilinear' + else: + interpolation = lazyop['interpolation'] + imgs = [ + mmcv.imresize(img, (img_w, img_h), interpolation=interpolation) + for img in imgs + ] + + # flip + if lazyop['flip']: + for img in imgs: + mmcv.imflip_(img, lazyop['flip_direction']) + + results['imgs'] = imgs + del results['lazy'] + + return results + + +@PIPELINES.register_module() +class RandomCrop: + """Vanilla square random crop that specifies the output size. + + Required keys in results are "img_shape", "keypoint" (optional), "imgs" + (optional), added or modified keys are "keypoint", "imgs", "lazy"; Required + keys in "lazy" are "flip", "crop_bbox", added or modified key is + "crop_bbox". + + Args: + size (int): The output size of the images. + lazy (bool): Determine whether to apply lazy operation. Default: False. + """ + + def __init__(self, size, lazy=False): + if not isinstance(size, int): + raise TypeError(f'Size must be an int, but got {type(size)}') + self.size = size + self.lazy = lazy + + @staticmethod + def _crop_kps(kps, crop_bbox): + return kps - crop_bbox[:2] + + @staticmethod + def _crop_imgs(imgs, crop_bbox): + x1, y1, x2, y2 = crop_bbox + return [img[y1:y2, x1:x2] for img in imgs] + + @staticmethod + def _box_crop(box, crop_bbox): + """Crop the bounding boxes according to the crop_bbox. + + Args: + box (np.ndarray): The bounding boxes. + crop_bbox (np.ndarray): The bbox used to crop the original image.
+ """ + + x1, y1, x2, y2 = crop_bbox + img_w, img_h = x2 - x1, y2 - y1 + + box_ = box.copy() + box_[..., 0::2] = np.clip(box[..., 0::2] - x1, 0, img_w - 1) + box_[..., 1::2] = np.clip(box[..., 1::2] - y1, 0, img_h - 1) + return box_ + + def _all_box_crop(self, results, crop_bbox): + """Crop the gt_bboxes and proposals in results according to crop_bbox. + + Args: + results (dict): All information about the sample, which contain + 'gt_bboxes' and 'proposals' (optional). + crop_bbox(np.ndarray): The bbox used to crop the original image. + """ + results['gt_bboxes'] = self._box_crop(results['gt_bboxes'], crop_bbox) + if 'proposals' in results and results['proposals'] is not None: + assert results['proposals'].shape[1] == 4 + results['proposals'] = self._box_crop(results['proposals'], + crop_bbox) + return results + + def __call__(self, results): + """Performs the RandomCrop augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + _init_lazy_if_proper(results, self.lazy) + if 'keypoint' in results: + assert not self.lazy, ('Keypoint Augmentations are not compatible ' + 'with lazy == True') + + img_h, img_w = results['img_shape'] + assert self.size <= img_h and self.size <= img_w + + y_offset = 0 + x_offset = 0 + if img_h > self.size: + y_offset = int(np.random.randint(0, img_h - self.size)) + if img_w > self.size: + x_offset = int(np.random.randint(0, img_w - self.size)) + + if 'crop_quadruple' not in results: + results['crop_quadruple'] = np.array( + [0, 0, 1, 1], # x, y, w, h + dtype=np.float32) + + x_ratio, y_ratio = x_offset / img_w, y_offset / img_h + w_ratio, h_ratio = self.size / img_w, self.size / img_h + + old_crop_quadruple = results['crop_quadruple'] + old_x_ratio, old_y_ratio = old_crop_quadruple[0], old_crop_quadruple[1] + old_w_ratio, old_h_ratio = old_crop_quadruple[2], old_crop_quadruple[3] + new_crop_quadruple = [ + old_x_ratio + x_ratio * old_w_ratio, + old_y_ratio + y_ratio * old_h_ratio, w_ratio * old_w_ratio, + h_ratio * old_h_ratio + ] + results['crop_quadruple'] = np.array( + new_crop_quadruple, dtype=np.float32) + + new_h, new_w = self.size, self.size + + crop_bbox = np.array( + [x_offset, y_offset, x_offset + new_w, y_offset + new_h]) + results['crop_bbox'] = crop_bbox + + results['img_shape'] = (new_h, new_w) + + if not self.lazy: + if 'keypoint' in results: + results['keypoint'] = self._crop_kps(results['keypoint'], + crop_bbox) + if 'imgs' in results: + results['imgs'] = self._crop_imgs(results['imgs'], crop_bbox) + else: + lazyop = results['lazy'] + if lazyop['flip']: + raise NotImplementedError('Put Flip at last for now') + + # record crop_bbox in lazyop dict to ensure only crop once in Fuse + lazy_left, lazy_top, lazy_right, lazy_bottom = lazyop['crop_bbox'] + left = x_offset * (lazy_right - lazy_left) / img_w + right = (x_offset + new_w) * (lazy_right - lazy_left) / img_w + top = y_offset * (lazy_bottom - lazy_top) / img_h + bottom = (y_offset + new_h) 
* (lazy_bottom - lazy_top) / img_h + lazyop['crop_bbox'] = np.array([(lazy_left + left), + (lazy_top + top), + (lazy_left + right), + (lazy_top + bottom)], + dtype=np.float32) + + # Process entity boxes + if 'gt_bboxes' in results: + assert not self.lazy + results = self._all_box_crop(results, results['crop_bbox']) + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(size={self.size}, ' + f'lazy={self.lazy})') + return repr_str + + +@PIPELINES.register_module() +class RandomResizedCrop(RandomCrop): + """Random crop that specifies the area and aspect ratio ranges. + + Required keys in results are "img_shape", "crop_bbox", "imgs" (optional), + "keypoint" (optional), added or modified keys are "imgs", "keypoint", + "crop_bbox" and "lazy"; Required keys in "lazy" are "flip", "crop_bbox", + added or modified key is "crop_bbox". + + Args: + area_range (Tuple[float]): The candidate area scales range of + output cropped images. Default: (0.08, 1.0). + aspect_ratio_range (Tuple[float]): The candidate aspect ratio range of + output cropped images. Default: (3 / 4, 4 / 3). + lazy (bool): Determine whether to apply lazy operation. Default: False. + """ + + def __init__(self, + area_range=(0.08, 1.0), + aspect_ratio_range=(3 / 4, 4 / 3), + lazy=False): + self.area_range = area_range + self.aspect_ratio_range = aspect_ratio_range + self.lazy = lazy + if not mmcv.is_tuple_of(self.area_range, float): + raise TypeError(f'Area_range must be a tuple of float, ' + f'but got {type(area_range)}') + if not mmcv.is_tuple_of(self.aspect_ratio_range, float): + raise TypeError(f'Aspect_ratio_range must be a tuple of float, ' + f'but got {type(aspect_ratio_range)}') + + @staticmethod + def get_crop_bbox(img_shape, + area_range, + aspect_ratio_range, + max_attempts=10): + """Get a crop bbox given the area range and aspect ratio range.
+ + Args: + img_shape (Tuple[int]): The image shape as (h, w). + area_range (Tuple[float]): The candidate area scales range of + output cropped images. Default: (0.08, 1.0). + aspect_ratio_range (Tuple[float]): The candidate aspect + ratio range of output cropped images. Default: (3 / 4, 4 / 3). + max_attempts (int): The maximum number of attempts to generate a + random candidate bounding box. If none of the candidates + qualifies, the center bounding box will be used. + Default: 10. + Returns: + (tuple[int]) A random crop bbox within the area range and aspect + ratio range. + """ + assert 0 < area_range[0] <= area_range[1] <= 1 + assert 0 < aspect_ratio_range[0] <= aspect_ratio_range[1] + + img_h, img_w = img_shape + area = img_h * img_w + + min_ar, max_ar = aspect_ratio_range + aspect_ratios = np.exp( + np.random.uniform( + np.log(min_ar), np.log(max_ar), size=max_attempts)) + target_areas = np.random.uniform(*area_range, size=max_attempts) * area + candidate_crop_w = np.round(np.sqrt(target_areas * + aspect_ratios)).astype(np.int32) + candidate_crop_h = np.round(np.sqrt(target_areas / + aspect_ratios)).astype(np.int32) + + for i in range(max_attempts): + crop_w = candidate_crop_w[i] + crop_h = candidate_crop_h[i] + if crop_h <= img_h and crop_w <= img_w: + x_offset = random.randint(0, img_w - crop_w) + y_offset = random.randint(0, img_h - crop_h) + return x_offset, y_offset, x_offset + crop_w, y_offset + crop_h + + # Fallback + crop_size = min(img_h, img_w) + x_offset = (img_w - crop_size) // 2 + y_offset = (img_h - crop_size) // 2 + return x_offset, y_offset, x_offset + crop_size, y_offset + crop_size + + def __call__(self, results): + """Performs the RandomResizedCrop augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline.
+ """ + _init_lazy_if_proper(results, self.lazy) + if 'keypoint' in results: + assert not self.lazy, ('Keypoint Augmentations are not compatible ' + 'with lazy == True') + + img_h, img_w = results['img_shape'] + + left, top, right, bottom = self.get_crop_bbox( + (img_h, img_w), self.area_range, self.aspect_ratio_range) + new_h, new_w = bottom - top, right - left + + if 'crop_quadruple' not in results: + results['crop_quadruple'] = np.array( + [0, 0, 1, 1], # x, y, w, h + dtype=np.float32) + + x_ratio, y_ratio = left / img_w, top / img_h + w_ratio, h_ratio = new_w / img_w, new_h / img_h + + old_crop_quadruple = results['crop_quadruple'] + old_x_ratio, old_y_ratio = old_crop_quadruple[0], old_crop_quadruple[1] + old_w_ratio, old_h_ratio = old_crop_quadruple[2], old_crop_quadruple[3] + new_crop_quadruple = [ + old_x_ratio + x_ratio * old_w_ratio, + old_y_ratio + y_ratio * old_h_ratio, w_ratio * old_w_ratio, + h_ratio * old_h_ratio + ] + results['crop_quadruple'] = np.array( + new_crop_quadruple, dtype=np.float32) + + crop_bbox = np.array([left, top, right, bottom]) + results['crop_bbox'] = crop_bbox + results['img_shape'] = (new_h, new_w) + + if not self.lazy: + if 'keypoint' in results: + results['keypoint'] = self._crop_kps(results['keypoint'], + crop_bbox) + if 'imgs' in results: + results['imgs'] = self._crop_imgs(results['imgs'], crop_bbox) + else: + lazyop = results['lazy'] + if lazyop['flip']: + raise NotImplementedError('Put Flip at last for now') + + # record crop_bbox in lazyop dict to ensure only crop once in Fuse + lazy_left, lazy_top, lazy_right, lazy_bottom = lazyop['crop_bbox'] + left = left * (lazy_right - lazy_left) / img_w + right = right * (lazy_right - lazy_left) / img_w + top = top * (lazy_bottom - lazy_top) / img_h + bottom = bottom * (lazy_bottom - lazy_top) / img_h + lazyop['crop_bbox'] = np.array([(lazy_left + left), + (lazy_top + top), + (lazy_left + right), + (lazy_top + bottom)], + dtype=np.float32) + + if 'gt_bboxes' in results: + assert 
not self.lazy + results = self._all_box_crop(results, results['crop_bbox']) + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'area_range={self.area_range}, ' + f'aspect_ratio_range={self.aspect_ratio_range}, ' + f'lazy={self.lazy})') + return repr_str + + +@PIPELINES.register_module() +class MultiScaleCrop(RandomCrop): + """Crop images with a list of randomly selected scales. + + Randomly select the w and h scales from a list of scales. Scale of 1 means + the base size, which is the minimum of image width and height. The gap + between the w and h scale levels is limited to `max_wh_scale_gap` to + prevent overly large or small aspect ratios. + + Required keys are "img_shape", "imgs" (optional), "keypoint" (optional), + added or modified keys are "imgs", "crop_bbox", "img_shape", "lazy" and + "scales". Required keys in "lazy" are "crop_bbox", added or modified key is + "crop_bbox". + + Args: + input_size (int | tuple[int]): (w, h) of network input. + scales (tuple[float]): width and height scales to be selected. + max_wh_scale_gap (int): Maximum gap of w and h scale levels. + Default: 1. + random_crop (bool): If set to True, the cropping bbox will be randomly + sampled, otherwise it will be sampled from fixed regions. + Default: False. + num_fixed_crops (int): If set to 5, the cropping bbox will keep 5 + basic fixed regions: "upper left", "upper right", "lower left", + "lower right", "center". If set to 13, the cropping bbox will + append another 8 fixed regions: "center left", "center right", + "lower center", "upper center", "upper left quarter", + "upper right quarter", "lower left quarter", "lower right quarter". + Default: 5. + lazy (bool): Determine whether to apply lazy operation. Default: False.
+ """ + + def __init__(self, + input_size, + scales=(1, ), + max_wh_scale_gap=1, + random_crop=False, + num_fixed_crops=5, + lazy=False): + self.input_size = _pair(input_size) + if not mmcv.is_tuple_of(self.input_size, int): + raise TypeError(f'Input_size must be int or tuple of int, ' + f'but got {type(input_size)}') + + if not isinstance(scales, tuple): + raise TypeError(f'Scales must be tuple, but got {type(scales)}') + + if num_fixed_crops not in [5, 13]: + raise ValueError(f'Num_fix_crops must be in {[5, 13]}, ' + f'but got {num_fixed_crops}') + + self.scales = scales + self.max_wh_scale_gap = max_wh_scale_gap + self.random_crop = random_crop + self.num_fixed_crops = num_fixed_crops + self.lazy = lazy + + def __call__(self, results): + """Performs the MultiScaleCrop augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + _init_lazy_if_proper(results, self.lazy) + if 'keypoint' in results: + assert not self.lazy, ('Keypoint Augmentations are not compatible ' + 'with lazy == True') + + img_h, img_w = results['img_shape'] + base_size = min(img_h, img_w) + crop_sizes = [int(base_size * s) for s in self.scales] + + candidate_sizes = [] + for i, h in enumerate(crop_sizes): + for j, w in enumerate(crop_sizes): + if abs(i - j) <= self.max_wh_scale_gap: + candidate_sizes.append([w, h]) + + crop_size = random.choice(candidate_sizes) + for i in range(2): + if abs(crop_size[i] - self.input_size[i]) < 3: + crop_size[i] = self.input_size[i] + + crop_w, crop_h = crop_size + + if self.random_crop: + x_offset = random.randint(0, img_w - crop_w) + y_offset = random.randint(0, img_h - crop_h) + else: + w_step = (img_w - crop_w) // 4 + h_step = (img_h - crop_h) // 4 + candidate_offsets = [ + (0, 0), # upper left + (4 * w_step, 0), # upper right + (0, 4 * h_step), # lower left + (4 * w_step, 4 * h_step), # lower right + (2 * w_step, 2 * h_step), # center + ] + if self.num_fixed_crops == 13: + 
extra_candidate_offsets = [ + (0, 2 * h_step), # center left + (4 * w_step, 2 * h_step), # center right + (2 * w_step, 4 * h_step), # lower center + (2 * w_step, 0 * h_step), # upper center + (1 * w_step, 1 * h_step), # upper left quarter + (3 * w_step, 1 * h_step), # upper right quarter + (1 * w_step, 3 * h_step), # lower left quarter + (3 * w_step, 3 * h_step) # lower right quarter + ] + candidate_offsets.extend(extra_candidate_offsets) + x_offset, y_offset = random.choice(candidate_offsets) + + new_h, new_w = crop_h, crop_w + + crop_bbox = np.array( + [x_offset, y_offset, x_offset + new_w, y_offset + new_h]) + results['crop_bbox'] = crop_bbox + results['img_shape'] = (new_h, new_w) + results['scales'] = self.scales + + if 'crop_quadruple' not in results: + results['crop_quadruple'] = np.array( + [0, 0, 1, 1], # x, y, w, h + dtype=np.float32) + + x_ratio, y_ratio = x_offset / img_w, y_offset / img_h + w_ratio, h_ratio = new_w / img_w, new_h / img_h + + old_crop_quadruple = results['crop_quadruple'] + old_x_ratio, old_y_ratio = old_crop_quadruple[0], old_crop_quadruple[1] + old_w_ratio, old_h_ratio = old_crop_quadruple[2], old_crop_quadruple[3] + new_crop_quadruple = [ + old_x_ratio + x_ratio * old_w_ratio, + old_y_ratio + y_ratio * old_h_ratio, w_ratio * old_w_ratio, + h_ratio * old_h_ratio + ] + results['crop_quadruple'] = np.array( + new_crop_quadruple, dtype=np.float32) + + if not self.lazy: + if 'keypoint' in results: + results['keypoint'] = self._crop_kps(results['keypoint'], + crop_bbox) + if 'imgs' in results: + results['imgs'] = self._crop_imgs(results['imgs'], crop_bbox) + else: + lazyop = results['lazy'] + if lazyop['flip']: + raise NotImplementedError('Put Flip at last for now') + + # record crop_bbox in lazyop dict to ensure only crop once in Fuse + lazy_left, lazy_top, lazy_right, lazy_bottom = lazyop['crop_bbox'] + left = x_offset * (lazy_right - lazy_left) / img_w + right = (x_offset + new_w) * (lazy_right - lazy_left) / img_w + top = y_offset * 
(lazy_bottom - lazy_top) / img_h + bottom = (y_offset + new_h) * (lazy_bottom - lazy_top) / img_h + lazyop['crop_bbox'] = np.array([(lazy_left + left), + (lazy_top + top), + (lazy_left + right), + (lazy_top + bottom)], + dtype=np.float32) + + if 'gt_bboxes' in results: + assert not self.lazy + results = self._all_box_crop(results, results['crop_bbox']) + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'input_size={self.input_size}, scales={self.scales}, ' + f'max_wh_scale_gap={self.max_wh_scale_gap}, ' + f'random_crop={self.random_crop}, ' + f'num_fixed_crops={self.num_fixed_crops}, ' + f'lazy={self.lazy})') + return repr_str + + +@PIPELINES.register_module() +class Resize: + """Resize images to a specific size. + + Required keys are "img_shape", "modality", "imgs" (optional), "keypoint" + (optional), added or modified keys are "imgs", "img_shape", "keep_ratio", + "scale_factor", "lazy", "resize_size". Required keys in "lazy" is None, + added or modified key is "interpolation". + + Args: + scale (float | Tuple[int]): If keep_ratio is True, it serves as scaling + factor or maximum size: + If it is a float number, the image will be rescaled by this + factor, else if it is a tuple of 2 integers, the image will + be rescaled as large as possible within the scale. + Otherwise, it serves as (w, h) of output size. + keep_ratio (bool): If set to True, Images will be resized without + changing the aspect ratio. Otherwise, it will resize images to a + given size. Default: True. + interpolation (str): Algorithm used for interpolation: + "nearest" | "bilinear". Default: "bilinear". + lazy (bool): Determine whether to apply lazy operation. Default: False. 
+ """ + + def __init__(self, + scale, + keep_ratio=True, + interpolation='bilinear', + lazy=False): + if isinstance(scale, float): + if scale <= 0: + raise ValueError(f'Invalid scale {scale}, must be positive.') + elif isinstance(scale, tuple): + max_long_edge = max(scale) + max_short_edge = min(scale) + if max_short_edge == -1: + # assign np.inf to long edge for rescaling short edge later. + scale = (np.inf, max_long_edge) + else: + raise TypeError( + f'Scale must be float or tuple of int, but got {type(scale)}') + self.scale = scale + self.keep_ratio = keep_ratio + self.interpolation = interpolation + self.lazy = lazy + + def _resize_imgs(self, imgs, new_w, new_h): + return [ + mmcv.imresize( + img, (new_w, new_h), interpolation=self.interpolation) + for img in imgs + ] + + @staticmethod + def _resize_kps(kps, scale_factor): + return kps * scale_factor + + @staticmethod + def _box_resize(box, scale_factor): + """Rescale the bounding boxes according to the scale_factor. + + Args: + box (np.ndarray): The bounding boxes. + scale_factor (np.ndarray): The scale factor used for rescaling. + """ + assert len(scale_factor) == 2 + scale_factor = np.concatenate([scale_factor, scale_factor]) + return box * scale_factor + + def __call__(self, results): + """Performs the Resize augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + + _init_lazy_if_proper(results, self.lazy) + if 'keypoint' in results: + assert not self.lazy, ('Keypoint Augmentations are not compatible ' + 'with lazy == True') + + if 'scale_factor' not in results: + results['scale_factor'] = np.array([1, 1], dtype=np.float32) + img_h, img_w = results['img_shape'] + + if self.keep_ratio: + new_w, new_h = mmcv.rescale_size((img_w, img_h), self.scale) + else: + new_w, new_h = self.scale + + self.scale_factor = np.array([new_w / img_w, new_h / img_h], + dtype=np.float32) + + results['img_shape'] = (new_h, new_w) + results['keep_ratio'] = self.keep_ratio + results['scale_factor'] = results['scale_factor'] * self.scale_factor + + if not self.lazy: + if 'imgs' in results: + results['imgs'] = self._resize_imgs(results['imgs'], new_w, + new_h) + if 'keypoint' in results: + results['keypoint'] = self._resize_kps(results['keypoint'], + self.scale_factor) + else: + lazyop = results['lazy'] + if lazyop['flip']: + raise NotImplementedError('Put Flip at last for now') + lazyop['interpolation'] = self.interpolation + + if 'gt_bboxes' in results: + assert not self.lazy + results['gt_bboxes'] = self._box_resize(results['gt_bboxes'], + self.scale_factor) + if 'proposals' in results and results['proposals'] is not None: + assert results['proposals'].shape[1] == 4 + results['proposals'] = self._box_resize( + results['proposals'], self.scale_factor) + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'scale={self.scale}, keep_ratio={self.keep_ratio}, ' + f'interpolation={self.interpolation}, ' + f'lazy={self.lazy})') + return repr_str + + +@PIPELINES.register_module() +class RandomRescale: + """Randomly resize images so that the short_edge is resized to a specific + size in a given range. The scale ratio is unchanged after resizing. + + Required keys are "imgs", "img_shape", "modality", added or modified + keys are "imgs", "img_shape", "keep_ratio", "scale_factor", "resize_size", + "short_edge". 
+ + Args: + scale_range (tuple[int]): The range of short edge length. A closed + interval. + interpolation (str): Algorithm used for interpolation: + "nearest" | "bilinear". Default: "bilinear". + """ + + def __init__(self, scale_range, interpolation='bilinear'): + self.scale_range = scale_range + # make sure scale_range is legal, first make sure the type is OK + assert mmcv.is_tuple_of(scale_range, int) + assert len(scale_range) == 2 + assert scale_range[0] < scale_range[1] + assert np.all([x > 0 for x in scale_range]) + + self.keep_ratio = True + self.interpolation = interpolation + + def __call__(self, results): + """Performs the Resize augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + short_edge = np.random.randint(self.scale_range[0], + self.scale_range[1] + 1) + resize = Resize((-1, short_edge), + keep_ratio=True, + interpolation=self.interpolation, + lazy=False) + results = resize(results) + + results['short_edge'] = short_edge + return results + + def __repr__(self): + scale_range = self.scale_range + repr_str = (f'{self.__class__.__name__}(' + f'scale_range=({scale_range[0]}, {scale_range[1]}), ' + f'interpolation={self.interpolation})') + return repr_str + + +@PIPELINES.register_module() +class Flip: + """Flip the input images with a probability. + + Reverse the order of elements in the given imgs with a specific direction. + The shape of the imgs is preserved, but the elements are reordered. + + Required keys are "img_shape", "modality", "imgs" (optional), "keypoint" + (optional), added or modified keys are "imgs", "keypoint", "lazy" and + "flip_direction". Required keys in "lazy" is None, added or modified key + are "flip" and "flip_direction". The Flip augmentation should be placed + after any cropping / reshaping augmentations, to make sure crop_quadruple + is calculated properly. + + Args: + flip_ratio (float): Probability of implementing flip. Default: 0.5. 
+ direction (str): Flip imgs horizontally or vertically. Options are + "horizontal" | "vertical". Default: "horizontal". + flip_label_map (Dict[int, int] | None): Transform the label of the + flipped image with the specific label. Default: None. + left_kp (list[int]): Indexes of left keypoints, used to flip keypoints. + Default: None. + right_kp (list[int]): Indexes of right keypoints, used to flip + keypoints. Default: None. + lazy (bool): Determine whether to apply lazy operation. Default: False. + """ + _directions = ['horizontal', 'vertical'] + + def __init__(self, + flip_ratio=0.5, + direction='horizontal', + flip_label_map=None, + left_kp=None, + right_kp=None, + lazy=False): + if direction not in self._directions: + raise ValueError(f'Direction {direction} is not supported. ' + f'Supported directions are {self._directions}') + self.flip_ratio = flip_ratio + self.direction = direction + self.flip_label_map = flip_label_map + self.left_kp = left_kp + self.right_kp = right_kp + self.lazy = lazy + + def _flip_imgs(self, imgs, modality): + _ = [mmcv.imflip_(img, self.direction) for img in imgs] + lt = len(imgs) + if modality == 'Flow': + # The 1st frame of each 2 frames is flow-x + for i in range(0, lt, 2): + imgs[i] = mmcv.iminvert(imgs[i]) + return imgs + + def _flip_kps(self, kps, kpscores, img_width): + kp_x = kps[..., 0] + kp_x[kp_x != 0] = img_width - kp_x[kp_x != 0] + new_order = list(range(kps.shape[2])) + if self.left_kp is not None and self.right_kp is not None: + for left, right in zip(self.left_kp, self.right_kp): + new_order[left] = right + new_order[right] = left + kps = kps[:, :, new_order] + if kpscores is not None: + kpscores = kpscores[:, :, new_order] + return kps, kpscores + + @staticmethod + def _box_flip(box, img_width): + """Flip the bounding boxes given the width of the image. + + Args: + box (np.ndarray): The bounding boxes. + img_width (int): The img width. 
+ """ + box_ = box.copy() + box_[..., 0::4] = img_width - box[..., 2::4] + box_[..., 2::4] = img_width - box[..., 0::4] + return box_ + + def __call__(self, results): + """Performs the Flip augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + _init_lazy_if_proper(results, self.lazy) + if 'keypoint' in results: + assert not self.lazy, ('Keypoint Augmentations are not compatible ' + 'with lazy == True') + assert self.direction == 'horizontal', ( + 'Only horizontal flips are ' + 'supported for human keypoints') + + modality = results['modality'] + if modality == 'Flow': + assert self.direction == 'horizontal' + + flip = np.random.rand() < self.flip_ratio + + results['flip'] = flip + results['flip_direction'] = self.direction + img_width = results['img_shape'][1] + + if self.flip_label_map is not None and flip: + results['label'] = self.flip_label_map.get(results['label'], + results['label']) + + if not self.lazy: + if flip: + if 'imgs' in results: + results['imgs'] = self._flip_imgs(results['imgs'], + modality) + if 'keypoint' in results: + kp = results['keypoint'] + kpscore = results.get('keypoint_score', None) + kp, kpscore = self._flip_kps(kp, kpscore, img_width) + results['keypoint'] = kp + if 'keypoint_score' in results: + results['keypoint_score'] = kpscore + else: + lazyop = results['lazy'] + if lazyop['flip']: + raise NotImplementedError('Use one Flip please') + lazyop['flip'] = flip + lazyop['flip_direction'] = self.direction + + if 'gt_bboxes' in results and flip: + assert not self.lazy and self.direction == 'horizontal' + width = results['img_shape'][1] + results['gt_bboxes'] = self._box_flip(results['gt_bboxes'], width) + if 'proposals' in results and results['proposals'] is not None: + assert results['proposals'].shape[1] == 4 + results['proposals'] = self._box_flip(results['proposals'], + width) + + return results + + def __repr__(self): + repr_str = ( + 
f'{self.__class__.__name__}(' + f'flip_ratio={self.flip_ratio}, direction={self.direction}, ' + f'flip_label_map={self.flip_label_map}, lazy={self.lazy})') + return repr_str + + +@PIPELINES.register_module() +class Normalize: + """Normalize images with the given mean and std value. + + Required keys are "imgs", "img_shape", "modality", added or modified + keys are "imgs" and "img_norm_cfg". If modality is 'Flow', additional + keys "scale_factor" is required + + Args: + mean (Sequence[float]): Mean values of different channels. + std (Sequence[float]): Std values of different channels. + to_bgr (bool): Whether to convert channels from RGB to BGR. + Default: False. + adjust_magnitude (bool): Indicate whether to adjust the flow magnitude + on 'scale_factor' when modality is 'Flow'. Default: False. + """ + + def __init__(self, mean, std, to_bgr=False, adjust_magnitude=False): + if not isinstance(mean, Sequence): + raise TypeError( + f'Mean must be list, tuple or np.ndarray, but got {type(mean)}' + ) + + if not isinstance(std, Sequence): + raise TypeError( + f'Std must be list, tuple or np.ndarray, but got {type(std)}') + + self.mean = np.array(mean, dtype=np.float32) + self.std = np.array(std, dtype=np.float32) + self.to_bgr = to_bgr + self.adjust_magnitude = adjust_magnitude + + def __call__(self, results): + modality = results['modality'] + + if modality == 'RGB': + n = len(results['imgs']) + h, w, c = results['imgs'][0].shape + imgs = np.empty((n, h, w, c), dtype=np.float32) + for i, img in enumerate(results['imgs']): + imgs[i] = img + + for img in imgs: + mmcv.imnormalize_(img, self.mean, self.std, self.to_bgr) + + results['imgs'] = imgs + results['img_norm_cfg'] = dict( + mean=self.mean, std=self.std, to_bgr=self.to_bgr) + return results + if modality == 'Flow': + num_imgs = len(results['imgs']) + assert num_imgs % 2 == 0 + assert self.mean.shape[0] == 2 + assert self.std.shape[0] == 2 + n = num_imgs // 2 + h, w = results['imgs'][0].shape + x_flow = np.empty((n, 
h, w), dtype=np.float32) + y_flow = np.empty((n, h, w), dtype=np.float32) + for i in range(n): + x_flow[i] = results['imgs'][2 * i] + y_flow[i] = results['imgs'][2 * i + 1] + x_flow = (x_flow - self.mean[0]) / self.std[0] + y_flow = (y_flow - self.mean[1]) / self.std[1] + if self.adjust_magnitude: + x_flow = x_flow * results['scale_factor'][0] + y_flow = y_flow * results['scale_factor'][1] + imgs = np.stack([x_flow, y_flow], axis=-1) + results['imgs'] = imgs + args = dict( + mean=self.mean, + std=self.std, + to_bgr=self.to_bgr, + adjust_magnitude=self.adjust_magnitude) + results['img_norm_cfg'] = args + return results + raise NotImplementedError + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'mean={self.mean}, ' + f'std={self.std}, ' + f'to_bgr={self.to_bgr}, ' + f'adjust_magnitude={self.adjust_magnitude})') + return repr_str + + +@PIPELINES.register_module() +class ColorJitter: + """Perform ColorJitter to each img. + + Required keys are "imgs", added or modified keys are "imgs". + + Args: + brightness (float | tuple[float]): The jitter range for brightness, if + set as a float, the range will be (1 - brightness, 1 + brightness). + Default: 0.5. + contrast (float | tuple[float]): The jitter range for contrast, if set + as a float, the range will be (1 - contrast, 1 + contrast). + Default: 0.5. + saturation (float | tuple[float]): The jitter range for saturation, if + set as a float, the range will be (1 - saturation, 1 + saturation). + Default: 0.5. + hue (float | tuple[float]): The jitter range for hue, if set as a + float, the range will be (-hue, hue). Default: 0.1. 
+ """ + + @staticmethod + def check_input(val, max, base): + if isinstance(val, tuple): + assert base - max <= val[0] <= val[1] <= base + max + return val + assert val <= max + return (base - val, base + val) + + @staticmethod + def rgb_to_grayscale(img): + return 0.2989 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2] + + @staticmethod + def adjust_contrast(img, factor): + val = np.mean(ColorJitter.rgb_to_grayscale(img)) + return factor * img + (1 - factor) * val + + @staticmethod + def adjust_saturation(img, factor): + gray = np.stack([ColorJitter.rgb_to_grayscale(img)] * 3, axis=-1) + return factor * img + (1 - factor) * gray + + @staticmethod + def adjust_hue(img, factor): + img = np.clip(img, 0, 255).astype(np.uint8) + hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV) + offset = int(factor * 255) + hsv[..., 0] = (hsv[..., 0] + offset) % 180 + img = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB) + return img.astype(np.float32) + + def __init__(self, brightness=0.5, contrast=0.5, saturation=0.5, hue=0.1): + self.brightness = self.check_input(brightness, 1, 1) + self.contrast = self.check_input(contrast, 1, 1) + self.saturation = self.check_input(saturation, 1, 1) + self.hue = self.check_input(hue, 0.5, 0) + self.fn_idx = np.random.permutation(4) + + def __call__(self, results): + imgs = results['imgs'] + num_clips, clip_len = 1, len(imgs) + + new_imgs = [] + for i in range(num_clips): + b = np.random.uniform( + low=self.brightness[0], high=self.brightness[1]) + c = np.random.uniform(low=self.contrast[0], high=self.contrast[1]) + s = np.random.uniform( + low=self.saturation[0], high=self.saturation[1]) + h = np.random.uniform(low=self.hue[0], high=self.hue[1]) + start, end = i * clip_len, (i + 1) * clip_len + + for img in imgs[start:end]: + img = img.astype(np.float32) + for fn_id in self.fn_idx: + if fn_id == 0 and b != 1: + img *= b + if fn_id == 1 and c != 1: + img = self.adjust_contrast(img, c) + if fn_id == 2 and s != 1: + img = self.adjust_saturation(img, s) + if 
fn_id == 3 and h != 0: + img = self.adjust_hue(img, h) + img = np.clip(img, 0, 255).astype(np.uint8) + new_imgs.append(img) + results['imgs'] = new_imgs + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'brightness={self.brightness}, ' + f'contrast={self.contrast}, ' + f'saturation={self.saturation}, ' + f'hue={self.hue})') + return repr_str + + +@PIPELINES.register_module() +class CenterCrop(RandomCrop): + """Crop the center area from images. + + Required keys are "img_shape", "imgs" (optional), "keypoint" (optional), + added or modified keys are "imgs", "keypoint", "crop_bbox", "lazy" and + "img_shape". Required keys in "lazy" is "crop_bbox", added or modified key + is "crop_bbox". + + Args: + crop_size (int | tuple[int]): (w, h) of crop size. + lazy (bool): Determine whether to apply lazy operation. Default: False. + """ + + def __init__(self, crop_size, lazy=False): + self.crop_size = _pair(crop_size) + self.lazy = lazy + if not mmcv.is_tuple_of(self.crop_size, int): + raise TypeError(f'Crop_size must be int or tuple of int, ' + f'but got {type(crop_size)}') + + def __call__(self, results): + """Performs the CenterCrop augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + _init_lazy_if_proper(results, self.lazy) + if 'keypoint' in results: + assert not self.lazy, ('Keypoint Augmentations are not compatible ' + 'with lazy == True') + + img_h, img_w = results['img_shape'] + crop_w, crop_h = self.crop_size + + left = (img_w - crop_w) // 2 + top = (img_h - crop_h) // 2 + right = left + crop_w + bottom = top + crop_h + new_h, new_w = bottom - top, right - left + + crop_bbox = np.array([left, top, right, bottom]) + results['crop_bbox'] = crop_bbox + results['img_shape'] = (new_h, new_w) + + if 'crop_quadruple' not in results: + results['crop_quadruple'] = np.array( + [0, 0, 1, 1], # x, y, w, h + dtype=np.float32) + + x_ratio, y_ratio = left / img_w, top / img_h + w_ratio, h_ratio = new_w / img_w, new_h / img_h + + old_crop_quadruple = results['crop_quadruple'] + old_x_ratio, old_y_ratio = old_crop_quadruple[0], old_crop_quadruple[1] + old_w_ratio, old_h_ratio = old_crop_quadruple[2], old_crop_quadruple[3] + new_crop_quadruple = [ + old_x_ratio + x_ratio * old_w_ratio, + old_y_ratio + y_ratio * old_h_ratio, w_ratio * old_w_ratio, + h_ratio * old_h_ratio + ] + results['crop_quadruple'] = np.array( + new_crop_quadruple, dtype=np.float32) + + if not self.lazy: + if 'keypoint' in results: + results['keypoint'] = self._crop_kps(results['keypoint'], + crop_bbox) + if 'imgs' in results: + results['imgs'] = self._crop_imgs(results['imgs'], crop_bbox) + else: + lazyop = results['lazy'] + if lazyop['flip']: + raise NotImplementedError('Put Flip at last for now') + + # record crop_bbox in lazyop dict to ensure only crop once in Fuse + lazy_left, lazy_top, lazy_right, lazy_bottom = lazyop['crop_bbox'] + left = left * (lazy_right - lazy_left) / img_w + right = right * (lazy_right - lazy_left) / img_w + top = top * (lazy_bottom - lazy_top) / img_h + bottom = bottom * (lazy_bottom - lazy_top) / img_h + lazyop['crop_bbox'] = np.array([(lazy_left + left), + (lazy_top + top), + (lazy_left + right), + (lazy_top + bottom)], + dtype=np.float32) + + if 
'gt_bboxes' in results: + assert not self.lazy + results = self._all_box_crop(results, results['crop_bbox']) + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(crop_size={self.crop_size}, ' + f'lazy={self.lazy})') + return repr_str + + +@PIPELINES.register_module() +class ThreeCrop: + """Crop images into three crops. + + Crop the images equally into three crops with equal intervals along the + shorter side. + Required keys are "imgs", "img_shape", added or modified keys are "imgs", + "crop_bbox" and "img_shape". + + Args: + crop_size(int | tuple[int]): (w, h) of crop size. + """ + + def __init__(self, crop_size): + self.crop_size = _pair(crop_size) + if not mmcv.is_tuple_of(self.crop_size, int): + raise TypeError(f'Crop_size must be int or tuple of int, ' + f'but got {type(crop_size)}') + + def __call__(self, results): + """Performs the ThreeCrop augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + _init_lazy_if_proper(results, False) + if 'gt_bboxes' in results or 'proposals' in results: + warnings.warn('ThreeCrop cannot process bounding boxes') + + imgs = results['imgs'] + img_h, img_w = results['imgs'][0].shape[:2] + crop_w, crop_h = self.crop_size + assert crop_h == img_h or crop_w == img_w + + if crop_h == img_h: + w_step = (img_w - crop_w) // 2 + offsets = [ + (0, 0), # left + (2 * w_step, 0), # right + (w_step, 0), # middle + ] + elif crop_w == img_w: + h_step = (img_h - crop_h) // 2 + offsets = [ + (0, 0), # top + (0, 2 * h_step), # down + (0, h_step), # middle + ] + + cropped = [] + crop_bboxes = [] + for x_offset, y_offset in offsets: + bbox = [x_offset, y_offset, x_offset + crop_w, y_offset + crop_h] + crop = [ + img[y_offset:y_offset + crop_h, x_offset:x_offset + crop_w] + for img in imgs + ] + cropped.extend(crop) + crop_bboxes.extend([bbox for _ in range(len(imgs))]) + + crop_bboxes = np.array(crop_bboxes) + results['imgs'] = cropped + results['crop_bbox'] = crop_bboxes + results['img_shape'] = results['imgs'][0].shape[:2] + + return results + + def __repr__(self): + repr_str = f'{self.__class__.__name__}(crop_size={self.crop_size})' + return repr_str + + +@PIPELINES.register_module() +class TenCrop: + """Crop the images into 10 crops (corner + center + flip). + + Crop the four corners and the center part of the image with the same + given crop_size, and flip it horizontally. + Required keys are "imgs", "img_shape", added or modified keys are "imgs", + "crop_bbox" and "img_shape". + + Args: + crop_size(int | tuple[int]): (w, h) of crop size. + """ + + def __init__(self, crop_size): + self.crop_size = _pair(crop_size) + if not mmcv.is_tuple_of(self.crop_size, int): + raise TypeError(f'Crop_size must be int or tuple of int, ' + f'but got {type(crop_size)}') + + def __call__(self, results): + """Performs the TenCrop augmentation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + _init_lazy_if_proper(results, False) + + if 'gt_bboxes' in results or 'proposals' in results: + warnings.warn('TenCrop cannot process bounding boxes') + + imgs = results['imgs'] + + img_h, img_w = results['imgs'][0].shape[:2] + crop_w, crop_h = self.crop_size + + w_step = (img_w - crop_w) // 4 + h_step = (img_h - crop_h) // 4 + + offsets = [ + (0, 0), # upper left + (4 * w_step, 0), # upper right + (0, 4 * h_step), # lower left + (4 * w_step, 4 * h_step), # lower right + (2 * w_step, 2 * h_step), # center + ] + + img_crops = list() + crop_bboxes = list() + for x_offset, y_offsets in offsets: + crop = [ + img[y_offsets:y_offsets + crop_h, x_offset:x_offset + crop_w] + for img in imgs + ] + flip_crop = [np.flip(c, axis=1).copy() for c in crop] + bbox = [x_offset, y_offsets, x_offset + crop_w, y_offsets + crop_h] + img_crops.extend(crop) + img_crops.extend(flip_crop) + crop_bboxes.extend([bbox for _ in range(len(imgs) * 2)]) + + crop_bboxes = np.array(crop_bboxes) + results['imgs'] = img_crops + results['crop_bbox'] = crop_bboxes + results['img_shape'] = results['imgs'][0].shape[:2] + + return results + + def __repr__(self): + repr_str = f'{self.__class__.__name__}(crop_size={self.crop_size})' + return repr_str + + +@PIPELINES.register_module() +class AudioAmplify: + """Amplify the waveform. + + Required keys are "audios", added or modified keys are "audios", + "amplify_ratio". + + Args: + ratio (float): The ratio used to amplify the audio waveform. + """ + + def __init__(self, ratio): + if isinstance(ratio, float): + self.ratio = ratio + else: + raise TypeError('Amplification ratio should be float.') + + def __call__(self, results): + """Perform the audio amplification. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + + assert 'audios' in results + results['audios'] *= self.ratio + results['amplify_ratio'] = self.ratio + + return results + + def __repr__(self): + repr_str = f'{self.__class__.__name__}(ratio={self.ratio})' + return repr_str + + +@PIPELINES.register_module() +class MelSpectrogram: + """MelSpectrogram. Convert an audio waveform into a mel spectrogram. + + Required keys are "audios", "sample_rate", "num_clips", added or modified + keys are "audios". + + Args: + window_size (int): The window size in milliseconds. Default: 32. + step_size (int): The step size in milliseconds. Default: 16. + n_mels (int): Number of mels. Default: 80. + fixed_length (int): The mel spectrogram length may differ across + samples (e.g. due to different fps), so it is truncated or padded + to this fixed length for batch collation. Default: 128. + """ + + def __init__(self, + window_size=32, + step_size=16, + n_mels=80, + fixed_length=128): + if all( + isinstance(x, int) + for x in [window_size, step_size, n_mels, fixed_length]): + self.window_size = window_size + self.step_size = step_size + self.n_mels = n_mels + self.fixed_length = fixed_length + else: + raise TypeError('All arguments should be int.') + + def __call__(self, results): + """Perform MelSpectrogram transformation. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + try: + import librosa + except ImportError: + raise ImportError('Install librosa first.') + signals = results['audios'] + sample_rate = results['sample_rate'] + n_fft = int(round(sample_rate * self.window_size / 1000)) + hop_length = int(round(sample_rate * self.step_size / 1000)) + melspectrograms = list() + for clip_idx in range(results['num_clips']): + clip_signal = signals[clip_idx] + mel = librosa.feature.melspectrogram( + y=clip_signal, + sr=sample_rate, + n_fft=n_fft, + hop_length=hop_length, + n_mels=self.n_mels) + if mel.shape[0] >= self.fixed_length: + mel = mel[:self.fixed_length, :] + else: + mel = np.pad( + mel, ((0, self.fixed_length - mel.shape[0]), (0, 0)), + mode='edge') + melspectrograms.append(mel) + + results['audios'] = np.array(melspectrograms) + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}' + f'(window_size={self.window_size}, ' + f'step_size={self.step_size}, ' + f'n_mels={self.n_mels}, ' + f'fixed_length={self.fixed_length})') + return repr_str diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/compose.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/compose.py new file mode 100644 index 0000000000000000000000000000000000000000..61fc5c56451a6cc1facbf2f5a7879ae2f215e4e1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/compose.py @@ -0,0 +1,61 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from collections.abc import Sequence + +from mmcv.utils import build_from_cfg + +from ..builder import PIPELINES +from .augmentations import PytorchVideoTrans, TorchvisionTrans + + +@PIPELINES.register_module() +class Compose: + """Compose a data pipeline with a sequence of transforms. + + Args: + transforms (list[dict | callable]): + Either config dicts of transforms or transform objects. 
+ """ + + def __init__(self, transforms): + assert isinstance(transforms, Sequence) + self.transforms = [] + for transform in transforms: + if isinstance(transform, dict): + if transform['type'].startswith('torchvision.'): + trans_type = transform.pop('type')[12:] + transform = TorchvisionTrans(trans_type, **transform) + elif transform['type'].startswith('pytorchvideo.'): + trans_type = transform.pop('type')[13:] + transform = PytorchVideoTrans(trans_type, **transform) + else: + transform = build_from_cfg(transform, PIPELINES) + self.transforms.append(transform) + elif callable(transform): + self.transforms.append(transform) + else: + raise TypeError(f'transform must be callable or a dict, ' + f'but got {type(transform)}') + + def __call__(self, data): + """Call function to apply transforms sequentially. + + Args: + data (dict): A result dict contains the data to transform. + + Returns: + dict: Transformed data. + """ + + for t in self.transforms: + data = t(data) + if data is None: + return None + return data + + def __repr__(self): + format_string = self.__class__.__name__ + '(' + for t in self.transforms: + format_string += '\n' + format_string += ' {0}'.format(t) + format_string += '\n)' + return format_string diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/formatting.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/formatting.py new file mode 100644 index 0000000000000000000000000000000000000000..4b1fbc3f7f490b9591df4be45916d709c3c1bb26 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/formatting.py @@ -0,0 +1,490 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from collections.abc import Sequence + +import mmcv +import numpy as np +import torch +from mmcv.parallel import DataContainer as DC + +from ..builder import PIPELINES + + +def to_tensor(data): + """Convert objects of various python types to :obj:`torch.Tensor`. 
+ + Supported types are: :class:`numpy.ndarray`, :class:`torch.Tensor`, + :class:`Sequence`, :class:`int` and :class:`float`. + """ + if isinstance(data, torch.Tensor): + return data + if isinstance(data, np.ndarray): + return torch.from_numpy(data) + if isinstance(data, Sequence) and not mmcv.is_str(data): + return torch.tensor(data) + if isinstance(data, int): + return torch.LongTensor([data]) + if isinstance(data, float): + return torch.FloatTensor([data]) + raise TypeError(f'type {type(data)} cannot be converted to tensor.') + + +@PIPELINES.register_module() +class ToTensor: + """Convert some values in results dict to `torch.Tensor` type in data + loader pipeline. + + Args: + keys (Sequence[str]): Required keys to be converted. + """ + + def __init__(self, keys): + self.keys = keys + + def __call__(self, results): + """Performs the ToTensor formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + for key in self.keys: + results[key] = to_tensor(results[key]) + return results + + def __repr__(self): + return f'{self.__class__.__name__}(keys={self.keys})' + + +@PIPELINES.register_module() +class Rename: + """Rename the key in results. + + Args: + mapping (dict): The keys in results that need to be renamed. The key of + the dict is the original name, while the value is the new name. If + the original name not found in results, do nothing. + Default: dict(). + """ + + def __init__(self, mapping): + self.mapping = mapping + + def __call__(self, results): + for key, value in self.mapping.items(): + if key in results: + assert isinstance(key, str) and isinstance(value, str) + assert value not in results, ('the new name already exists in ' + 'results') + results[value] = results[key] + results.pop(key) + return results + + +@PIPELINES.register_module() +class ToDataContainer: + """Convert the data to DataContainer. 
+ + Args: + fields (Sequence[dict]): Required fields to be converted + with keys and attributes. E.g. + fields=(dict(key='gt_bbox', stack=False),). + Note that key can also be a list of keys, if so, every tensor in + the list will be converted to DataContainer. + """ + + def __init__(self, fields): + self.fields = fields + + def __call__(self, results): + """Performs the ToDataContainer formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + for field in self.fields: + _field = field.copy() + key = _field.pop('key') + if isinstance(key, list): + for item in key: + results[item] = DC(results[item], **_field) + else: + results[key] = DC(results[key], **_field) + return results + + def __repr__(self): + return self.__class__.__name__ + f'(fields={self.fields})' + + +@PIPELINES.register_module() +class ImageToTensor: + """Convert image type to `torch.Tensor` type. + + Args: + keys (Sequence[str]): Required keys to be converted. + """ + + def __init__(self, keys): + self.keys = keys + + def __call__(self, results): + """Performs the ImageToTensor formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + for key in self.keys: + results[key] = to_tensor(results[key].transpose(2, 0, 1)) + return results + + def __repr__(self): + return f'{self.__class__.__name__}(keys={self.keys})' + + +@PIPELINES.register_module() +class Transpose: + """Transpose image channels to a given order. + + Args: + keys (Sequence[str]): Required keys to be converted. + order (Sequence[int]): Image channel order. + """ + + def __init__(self, keys, order): + self.keys = keys + self.order = order + + def __call__(self, results): + """Performs the Transpose formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + for key in self.keys: + results[key] = results[key].transpose(self.order) + return results + + def __repr__(self): + return (f'{self.__class__.__name__}(' + f'keys={self.keys}, order={self.order})') + + +@PIPELINES.register_module() +class Collect: + """Collect data from the loader relevant to the specific task. + + This keeps the items in ``keys`` as they are, and collects items in + ``meta_keys`` into a meta item called ``meta_name``. This is usually + the last stage of the data loader pipeline. + For example, when keys='imgs', meta_keys=('filename', 'label', + 'original_shape'), meta_name='img_metas', the results will be a dict with + keys 'imgs' and 'img_metas', where 'img_metas' is a DataContainer of + another dict with keys 'filename', 'label', 'original_shape'. + + Args: + keys (Sequence[str]): Required keys to be collected. + meta_name (str): The name of the key that contains meta information. + This key is always populated. Default: "img_metas". + meta_keys (Sequence[str]): Keys that are collected under meta_name. + The contents of the ``meta_name`` dictionary depend on + ``meta_keys``. + By default this includes: + + - "filename": path to the image file + - "label": label of the image file + - "original_shape": original shape of the image as a tuple + (h, w, c) + - "img_shape": shape of the image input to the network as a tuple + (h, w, c). Note that images may be zero padded on the + bottom/right, if the batch tensor is larger than this shape. + - "pad_shape": image shape after padding + - "flip_direction": a str in ("horizontal", "vertical") to + indicate if the image is flipped horizontally or vertically. + - "img_norm_cfg": a dict of normalization information: + - mean - per channel mean subtraction + - std - per channel std divisor + - to_rgb - bool indicating if bgr was converted to rgb + nested (bool): If set as True, will apply data[x] = [data[x]] to all + items in data. The arg is added for compatibility. Default: False.
+ """ + + def __init__(self, + keys, + meta_keys=('filename', 'label', 'original_shape', 'img_shape', + 'pad_shape', 'flip_direction', 'img_norm_cfg'), + meta_name='img_metas', + nested=False): + self.keys = keys + self.meta_keys = meta_keys + self.meta_name = meta_name + self.nested = nested + + def __call__(self, results): + """Performs the Collect formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + data = {} + for key in self.keys: + data[key] = results[key] + + if len(self.meta_keys) != 0: + meta = {} + for key in self.meta_keys: + meta[key] = results[key] + data[self.meta_name] = DC(meta, cpu_only=True) + if self.nested: + for k in data: + data[k] = [data[k]] + + return data + + def __repr__(self): + return (f'{self.__class__.__name__}(' + f'keys={self.keys}, meta_keys={self.meta_keys}, ' + f'nested={self.nested})') + + +@PIPELINES.register_module() +class FormatShape: + """Format final imgs shape to the given input_format. + + Required keys are "imgs", "num_clips" and "clip_len", added or modified + keys are "imgs" and "input_shape". + + Args: + input_format (str): Define the final imgs format. + collapse (bool): Whether to collapse input_format N... to ... (NCTHW to + CTHW, etc.) if N is 1. Should be set as True when training and testing + detectors. Default: False. + """ + + def __init__(self, input_format, collapse=False): + self.input_format = input_format + self.collapse = collapse + if self.input_format not in ['NCTHW', 'NCHW', 'NCHW_Flow', 'NPTCHW']: + raise ValueError( + f'The input format {self.input_format} is invalid.') + + def __call__(self, results): + """Performs the FormatShape formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline.
+ """ + if not isinstance(results['imgs'], np.ndarray): + results['imgs'] = np.array(results['imgs']) + imgs = results['imgs'] + # [M x H x W x C] + # M = 1 * N_crops * N_clips * L + if self.collapse: + assert results['num_clips'] == 1 + + if self.input_format == 'NCTHW': + num_clips = results['num_clips'] + clip_len = results['clip_len'] + + imgs = imgs.reshape((-1, num_clips, clip_len) + imgs.shape[1:]) + # N_crops x N_clips x L x H x W x C + imgs = np.transpose(imgs, (0, 1, 5, 2, 3, 4)) + # N_crops x N_clips x C x L x H x W + imgs = imgs.reshape((-1, ) + imgs.shape[2:]) + # M' x C x L x H x W + # M' = N_crops x N_clips + elif self.input_format == 'NCHW': + imgs = np.transpose(imgs, (0, 3, 1, 2)) + # M x C x H x W + elif self.input_format == 'NCHW_Flow': + num_clips = results['num_clips'] + clip_len = results['clip_len'] + imgs = imgs.reshape((-1, num_clips, clip_len) + imgs.shape[1:]) + # N_crops x N_clips x L x H x W x C + imgs = np.transpose(imgs, (0, 1, 2, 5, 3, 4)) + # N_crops x N_clips x L x C x H x W + imgs = imgs.reshape((-1, imgs.shape[2] * imgs.shape[3]) + + imgs.shape[4:]) + # M' x C' x H x W + # M' = N_crops x N_clips + # C' = L x C + elif self.input_format == 'NPTCHW': + num_proposals = results['num_proposals'] + num_clips = results['num_clips'] + clip_len = results['clip_len'] + imgs = imgs.reshape((num_proposals, num_clips * clip_len) + + imgs.shape[1:]) + # P x M x H x W x C + # M = N_clips x L + imgs = np.transpose(imgs, (0, 1, 4, 2, 3)) + # P x M x C x H x W + + if self.collapse: + assert imgs.shape[0] == 1 + imgs = imgs.squeeze(0) + + results['imgs'] = imgs + results['input_shape'] = imgs.shape + return results + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += f"(input_format='{self.input_format}')" + return repr_str + + +@PIPELINES.register_module() +class FormatAudioShape: + """Format final audio shape to the given input_format. 
+ + Required keys are "audios", added or modified keys are "audios" and + "input_shape". + + Args: + input_format (str): Define the final audio format. + """ + + def __init__(self, input_format): + self.input_format = input_format + if self.input_format not in ['NCTF']: + raise ValueError( + f'The input format {self.input_format} is invalid.') + + def __call__(self, results): + """Performs the FormatAudioShape formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + audios = results['audios'] + # clip x sample x freq -> clip x channel x sample x freq + clip, sample, freq = audios.shape + audios = audios.reshape(clip, 1, sample, freq) + results['audios'] = audios + results['input_shape'] = audios.shape + return results + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += f"(input_format='{self.input_format}')" + return repr_str + + +@PIPELINES.register_module() +class JointToBone: + """Convert the joint information to bone information. + + Required keys are "keypoint", + added or modified keys are "keypoint". + + Args: + dataset (str): Define the type of dataset: 'nturgb+d', 'openpose-18', + 'coco'. Default: 'nturgb+d'.
+ """ + + def __init__(self, dataset='nturgb+d'): + self.dataset = dataset + if self.dataset not in ['nturgb+d', 'openpose-18', 'coco']: + raise ValueError( + f'The dataset type {self.dataset} is not supported') + if self.dataset == 'nturgb+d': + self.pairs = [(0, 1), (1, 20), (2, 20), (3, 2), (4, 20), (5, 4), + (6, 5), (7, 6), (8, 20), (9, 8), (10, 9), (11, 10), + (12, 0), (13, 12), (14, 13), (15, 14), (16, 0), + (17, 16), (18, 17), (19, 18), (21, 22), (20, 20), + (22, 7), (23, 24), (24, 11)] + elif self.dataset == 'openpose-18': + self.pairs = ((0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 1), + (6, 5), (7, 6), (8, 2), (9, 8), (10, 9), (11, 5), + (12, 11), (13, 12), (14, 0), (15, 0), (16, 14), (17, + 15)) + elif self.dataset == 'coco': + self.pairs = ((0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 0), + (6, 0), (7, 5), (8, 6), (9, 7), (10, 8), (11, 0), + (12, 0), (13, 11), (14, 12), (15, 13), (16, 14)) + + def __call__(self, results): + """Performs the Bone formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + keypoint = results['keypoint'] + M, T, V, C = keypoint.shape + bone = np.zeros((M, T, V, C), dtype=np.float32) + + assert C in [2, 3] + for v1, v2 in self.pairs: + bone[..., v1, :] = keypoint[..., v1, :] - keypoint[..., v2, :] + if C == 3 and self.dataset in ['openpose-18', 'coco']: + score = (keypoint[..., v1, 2] + keypoint[..., v2, 2]) / 2 + bone[..., v1, 2] = score + + results['keypoint'] = bone + return results + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += f"(dataset_type='{self.dataset}')" + return repr_str + + +@PIPELINES.register_module() +class FormatGCNInput: + """Format final skeleton shape to the given input_format. + + Required keys are "keypoint" and "keypoint_score"(optional), + added or modified keys are "keypoint" and "input_shape". + + Args: + input_format (str): Define the final skeleton format. 
+ """ + + def __init__(self, input_format, num_person=2): + self.input_format = input_format + if self.input_format not in ['NCTVM']: + raise ValueError( + f'The input format {self.input_format} is invalid.') + self.num_person = num_person + + def __call__(self, results): + """Performs the FormatShape formatting. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + keypoint = results['keypoint'] + + if 'keypoint_score' in results: + keypoint_confidence = results['keypoint_score'] + keypoint_confidence = np.expand_dims(keypoint_confidence, -1) + keypoint_3d = np.concatenate((keypoint, keypoint_confidence), + axis=-1) + else: + keypoint_3d = keypoint + + keypoint_3d = np.transpose(keypoint_3d, + (3, 1, 2, 0)) # M T V C -> C T V M + + if keypoint_3d.shape[-1] < self.num_person: + pad_dim = self.num_person - keypoint_3d.shape[-1] + pad = np.zeros( + keypoint_3d.shape[:-1] + (pad_dim, ), dtype=keypoint_3d.dtype) + keypoint_3d = np.concatenate((keypoint_3d, pad), axis=-1) + elif keypoint_3d.shape[-1] > self.num_person: + keypoint_3d = keypoint_3d[:, :, :, :self.num_person] + + results['keypoint'] = keypoint_3d + results['input_shape'] = keypoint_3d.shape + return results + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += f"(input_format='{self.input_format}')" + return repr_str diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/loading.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/loading.py new file mode 100644 index 0000000000000000000000000000000000000000..5d7832c96786aee0c11edf2f576f78626d1beba6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/loading.py @@ -0,0 +1,1850 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy as cp +import io +import os +import os.path as osp +import shutil +import warnings + +import mmcv +import numpy as np +import torch +from mmcv.fileio import FileClient +from torch.nn.modules.utils import _pair + +from ...utils import get_random_string, get_shm_dir, get_thread_id +from ..builder import PIPELINES + + +@PIPELINES.register_module() +class LoadHVULabel: + """Convert the HVU label from dictionaries to torch tensors. + + Required keys are "label", "categories", "category_nums", added or modified + keys are "label", "mask" and "category_mask". + """ + + def __init__(self, **kwargs): + self.hvu_initialized = False + self.kwargs = kwargs + + def init_hvu_info(self, categories, category_nums): + assert len(categories) == len(category_nums) + self.categories = categories + self.category_nums = category_nums + self.num_categories = len(self.categories) + self.num_tags = sum(self.category_nums) + self.category2num = dict(zip(categories, category_nums)) + self.start_idx = [0] + for i in range(self.num_categories - 1): + self.start_idx.append(self.start_idx[-1] + self.category_nums[i]) + self.category2startidx = dict(zip(categories, self.start_idx)) + self.hvu_initialized = True + + def __call__(self, results): + """Convert the label dictionary to 3 tensors: "label", "mask" and + "category_mask". + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + + if not self.hvu_initialized: + self.init_hvu_info(results['categories'], results['category_nums']) + + onehot = torch.zeros(self.num_tags) + onehot_mask = torch.zeros(self.num_tags) + category_mask = torch.zeros(self.num_categories) + + for category, tags in results['label'].items(): + # skip if not training on this category + if category not in self.categories: + continue + category_mask[self.categories.index(category)] = 1. 
+ start_idx = self.category2startidx[category] + category_num = self.category2num[category] + tags = [idx + start_idx for idx in tags] + onehot[tags] = 1. + onehot_mask[start_idx:category_num + start_idx] = 1. + + results['label'] = onehot + results['mask'] = onehot_mask + results['category_mask'] = category_mask + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'hvu_initialized={self.hvu_initialized})') + return repr_str + + +@PIPELINES.register_module() +class SampleFrames: + """Sample frames from the video. + + Required keys are "total_frames", "start_index" , added or modified keys + are "frame_inds", "frame_interval" and "num_clips". + + Args: + clip_len (int): Frames of each sampled output clip. + frame_interval (int): Temporal interval of adjacent sampled frames. + Default: 1. + num_clips (int): Number of clips to be sampled. Default: 1. + temporal_jitter (bool): Whether to apply temporal jittering. + Default: False. + twice_sample (bool): Whether to use twice sample when testing. + If set to True, it will sample frames with and without fixed shift, + which is commonly used for testing in TSM model. Default: False. + out_of_bound_opt (str): The way to deal with out of bounds frame + indexes. Available options are 'loop', 'repeat_last'. + Default: 'loop'. + test_mode (bool): Store True when building test or validation dataset. + Default: False. + start_index (None): This argument is deprecated and moved to dataset + class (``BaseDataset``, ``VideoDataset``, ``RawframeDataset``, + etc), see this: https://github.com/open-mmlab/mmaction2/pull/89. + keep_tail_frames (bool): Whether to keep tail frames when sampling. + Default: False. 
+ """ + + def __init__(self, + clip_len, + frame_interval=1, + num_clips=1, + temporal_jitter=False, + twice_sample=False, + out_of_bound_opt='loop', + test_mode=False, + start_index=None, + keep_tail_frames=False): + + self.clip_len = clip_len + self.frame_interval = frame_interval + self.num_clips = num_clips + self.temporal_jitter = temporal_jitter + self.twice_sample = twice_sample + self.out_of_bound_opt = out_of_bound_opt + self.test_mode = test_mode + self.keep_tail_frames = keep_tail_frames + assert self.out_of_bound_opt in ['loop', 'repeat_last'] + + if start_index is not None: + warnings.warn('No longer support "start_index" in "SampleFrames", ' + 'it should be set in dataset class, see this pr: ' + 'https://github.com/open-mmlab/mmaction2/pull/89') + + def _get_train_clips(self, num_frames): + """Get clip offsets in train mode. + + It will calculate the average interval for selected frames, + and randomly shift them within offsets between [0, avg_interval]. + If the total number of frames is smaller than clips num or origin + frames length, it will return all zero indices. + + Args: + num_frames (int): Total number of frame in the video. + + Returns: + np.ndarray: Sampled frame indices in train mode. 
+ """ + ori_clip_len = self.clip_len * self.frame_interval + + if self.keep_tail_frames: + avg_interval = (num_frames - ori_clip_len + 1) / float( + self.num_clips) + if num_frames > ori_clip_len - 1: + base_offsets = np.arange(self.num_clips) * avg_interval + clip_offsets = (base_offsets + np.random.uniform( + 0, avg_interval, self.num_clips)).astype(int) + else: + clip_offsets = np.zeros((self.num_clips, ), dtype=int) + else: + avg_interval = (num_frames - ori_clip_len + 1) // self.num_clips + + if avg_interval > 0: + base_offsets = np.arange(self.num_clips) * avg_interval + clip_offsets = base_offsets + np.random.randint( + avg_interval, size=self.num_clips) + elif num_frames > max(self.num_clips, ori_clip_len): + clip_offsets = np.sort( + np.random.randint( + num_frames - ori_clip_len + 1, size=self.num_clips)) + elif avg_interval == 0: + ratio = (num_frames - ori_clip_len + 1.0) / self.num_clips + clip_offsets = np.around(np.arange(self.num_clips) * ratio) + else: + clip_offsets = np.zeros((self.num_clips, ), dtype=int) + + return clip_offsets + + def _get_test_clips(self, num_frames): + """Get clip offsets in test mode. + + Calculate the average interval for selected frames, and shift them + by a fixed avg_interval/2. If twice_sample is True, it will also sample + frames without the fixed shift. If the total number of frames is + not enough, it will return all zero indices. + + Args: + num_frames (int): Total number of frames in the video. + + Returns: + np.ndarray: Sampled frame indices in test mode.
+ """ + ori_clip_len = self.clip_len * self.frame_interval + avg_interval = (num_frames - ori_clip_len + 1) / float(self.num_clips) + if num_frames > ori_clip_len - 1: + base_offsets = np.arange(self.num_clips) * avg_interval + clip_offsets = (base_offsets + avg_interval / 2.0).astype(int) + if self.twice_sample: + clip_offsets = np.concatenate([clip_offsets, base_offsets]) + else: + clip_offsets = np.zeros((self.num_clips, ), dtype=int) + return clip_offsets + + def _sample_clips(self, num_frames): + """Choose clip offsets for the video in a given mode. + + Args: + num_frames (int): Total number of frames in the video. + + Returns: + np.ndarray: Sampled frame indices. + """ + if self.test_mode: + clip_offsets = self._get_test_clips(num_frames) + else: + clip_offsets = self._get_train_clips(num_frames) + + return clip_offsets + + def __call__(self, results): + """Perform the SampleFrames loading. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline.
+ """ + total_frames = results['total_frames'] + + clip_offsets = self._sample_clips(total_frames) + frame_inds = clip_offsets[:, None] + np.arange( + self.clip_len)[None, :] * self.frame_interval + frame_inds = np.concatenate(frame_inds) + + if self.temporal_jitter: + perframe_offsets = np.random.randint( + self.frame_interval, size=len(frame_inds)) + frame_inds += perframe_offsets + + frame_inds = frame_inds.reshape((-1, self.clip_len)) + if self.out_of_bound_opt == 'loop': + frame_inds = np.mod(frame_inds, total_frames) + elif self.out_of_bound_opt == 'repeat_last': + safe_inds = frame_inds < total_frames + unsafe_inds = 1 - safe_inds + last_ind = np.max(safe_inds * frame_inds, axis=1) + new_inds = (safe_inds * frame_inds + (unsafe_inds.T * last_ind).T) + frame_inds = new_inds + else: + raise ValueError('Illegal out_of_bound option.') + + start_index = results['start_index'] + frame_inds = np.concatenate(frame_inds) + start_index + results['frame_inds'] = frame_inds.astype(int) + results['clip_len'] = self.clip_len + results['frame_interval'] = self.frame_interval + results['num_clips'] = self.num_clips + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'clip_len={self.clip_len}, ' + f'frame_interval={self.frame_interval}, ' + f'num_clips={self.num_clips}, ' + f'temporal_jitter={self.temporal_jitter}, ' + f'twice_sample={self.twice_sample}, ' + f'out_of_bound_opt={self.out_of_bound_opt}, ' + f'test_mode={self.test_mode})') + return repr_str + + +@PIPELINES.register_module() +class UntrimmedSampleFrames: + """Sample frames from the untrimmed video. + + Required keys are "filename", "total_frames", added or modified keys are + "frame_inds", "frame_interval" and "num_clips". + + Args: + clip_len (int): The length of sampled clips. Default: 1. + frame_interval (int): Temporal interval of adjacent sampled frames. + Default: 16.
+ start_index (None): This argument is deprecated and moved to dataset + class (``BaseDataset``, ``VideoDataset``, ``RawframeDataset``, + etc), see this: https://github.com/open-mmlab/mmaction2/pull/89. + """ + + def __init__(self, clip_len=1, frame_interval=16, start_index=None): + + self.clip_len = clip_len + self.frame_interval = frame_interval + + if start_index is not None: + warnings.warn('No longer support "start_index" in "SampleFrames", ' + 'it should be set in dataset class, see this pr: ' + 'https://github.com/open-mmlab/mmaction2/pull/89') + + def __call__(self, results): + """Perform the SampleFrames loading. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + total_frames = results['total_frames'] + start_index = results['start_index'] + + clip_centers = np.arange(self.frame_interval // 2, total_frames, + self.frame_interval) + num_clips = clip_centers.shape[0] + frame_inds = clip_centers[:, None] + np.arange( + -(self.clip_len // 2), self.clip_len - + (self.clip_len // 2))[None, :] + # clip frame_inds to legal range + frame_inds = np.clip(frame_inds, 0, total_frames - 1) + + frame_inds = np.concatenate(frame_inds) + start_index + results['frame_inds'] = frame_inds.astype(int) + results['clip_len'] = self.clip_len + results['frame_interval'] = self.frame_interval + results['num_clips'] = num_clips + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'clip_len={self.clip_len}, ' + f'frame_interval={self.frame_interval})') + return repr_str + + +@PIPELINES.register_module() +class DenseSampleFrames(SampleFrames): + """Select frames from the video by dense sample strategy. + + Required keys are "filename", added or modified keys are "total_frames", + "frame_inds", "frame_interval" and "num_clips". + + Args: + clip_len (int): Frames of each sampled output clip. + frame_interval (int): Temporal interval of adjacent sampled frames. + Default: 1.
+ num_clips (int): Number of clips to be sampled. Default: 1. + sample_range (int): Total sample range for dense sample. + Default: 64. + num_sample_positions (int): Number of sample start positions, which is + only used in test mode. Default: 10. That is to say, by default, + there are at least 10 clips for one input sample in test mode. + temporal_jitter (bool): Whether to apply temporal jittering. + Default: False. + test_mode (bool): Store True when building test or validation dataset. + Default: False. + """ + + def __init__(self, + *args, + sample_range=64, + num_sample_positions=10, + **kwargs): + super().__init__(*args, **kwargs) + self.sample_range = sample_range + self.num_sample_positions = num_sample_positions + + def _get_train_clips(self, num_frames): + """Get clip offsets by dense sample strategy in train mode. + + It will calculate a sample position and sample interval and set + start index 0 when sample_pos == 1 or randomly choose from + [0, sample_pos - 1]. Then it will shift the start index by each + base offset. + + Args: + num_frames (int): Total number of frames in the video. + + Returns: + np.ndarray: Sampled frame indices in train mode. + """ + sample_position = max(1, 1 + num_frames - self.sample_range) + interval = self.sample_range // self.num_clips + start_idx = 0 if sample_position == 1 else np.random.randint( + 0, sample_position - 1) + base_offsets = np.arange(self.num_clips) * interval + clip_offsets = (base_offsets + start_idx) % num_frames + return clip_offsets + + def _get_test_clips(self, num_frames): + """Get clip offsets by dense sample strategy in test mode. + + It will calculate a sample position and sample interval and evenly + sample several start indexes as start positions between + [0, sample_position-1]. Then it will shift each start index by the + base offsets. + + Args: + num_frames (int): Total number of frames in the video. + + Returns: + np.ndarray: Sampled frame indices in test mode.
+ """ + sample_position = max(1, 1 + num_frames - self.sample_range) + interval = self.sample_range // self.num_clips + start_list = np.linspace( + 0, sample_position - 1, num=self.num_sample_positions, dtype=int) + base_offsets = np.arange(self.num_clips) * interval + clip_offsets = list() + for start_idx in start_list: + clip_offsets.extend((base_offsets + start_idx) % num_frames) + clip_offsets = np.array(clip_offsets) + return clip_offsets + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'clip_len={self.clip_len}, ' + f'frame_interval={self.frame_interval}, ' + f'num_clips={self.num_clips}, ' + f'sample_range={self.sample_range}, ' + f'num_sample_positions={self.num_sample_positions}, ' + f'temporal_jitter={self.temporal_jitter}, ' + f'out_of_bound_opt={self.out_of_bound_opt}, ' + f'test_mode={self.test_mode})') + return repr_str + + +@PIPELINES.register_module() +class SampleAVAFrames(SampleFrames): + + def __init__(self, clip_len, frame_interval=2, test_mode=False): + + super().__init__(clip_len, frame_interval, test_mode=test_mode) + + def _get_clips(self, center_index, skip_offsets, shot_info): + start = center_index - (self.clip_len // 2) * self.frame_interval + end = center_index + ((self.clip_len + 1) // 2) * self.frame_interval + frame_inds = list(range(start, end, self.frame_interval)) + if not self.test_mode: + frame_inds = frame_inds + skip_offsets + frame_inds = np.clip(frame_inds, shot_info[0], shot_info[1] - 1) + return frame_inds + + def __call__(self, results): + fps = results['fps'] + timestamp = results['timestamp'] + timestamp_start = results['timestamp_start'] + shot_info = results['shot_info'] + + center_index = fps * (timestamp - timestamp_start) + 1 + + skip_offsets = np.random.randint( + -self.frame_interval // 2, (self.frame_interval + 1) // 2, + size=self.clip_len) + frame_inds = self._get_clips(center_index, skip_offsets, shot_info) + start_index = results.get('start_index', 0) + + frame_inds = 
np.array(frame_inds, dtype=int) + start_index + results['frame_inds'] = frame_inds + results['clip_len'] = self.clip_len + results['frame_interval'] = self.frame_interval + results['num_clips'] = 1 + results['crop_quadruple'] = np.array([0, 0, 1, 1], dtype=np.float32) + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'clip_len={self.clip_len}, ' + f'frame_interval={self.frame_interval}, ' + f'test_mode={self.test_mode})') + return repr_str + + +@PIPELINES.register_module() +class SampleProposalFrames(SampleFrames): + """Sample frames from proposals in the video. + + Required keys are "total_frames" and "out_proposals", added or + modified keys are "frame_inds", "frame_interval", "num_clips", + 'clip_len' and 'num_proposals'. + + Args: + clip_len (int): Frames of each sampled output clip. + body_segments (int): Number of segments in course period. + aug_segments (list[int]): Number of segments in starting and + ending period. + aug_ratio (int | float | tuple[int | float]): The ratio + of the length of augmentation to that of the proposal. + frame_interval (int): Temporal interval of adjacent sampled frames. + Default: 1. + test_interval (int): Temporal interval of adjacent sampled frames + in test mode. Default: 6. + temporal_jitter (bool): Whether to apply temporal jittering. + Default: False. + mode (str): Choose 'train', 'val' or 'test' mode. + Default: 'train'.
+ """ + + def __init__(self, + clip_len, + body_segments, + aug_segments, + aug_ratio, + frame_interval=1, + test_interval=6, + temporal_jitter=False, + mode='train'): + super().__init__( + clip_len, + frame_interval=frame_interval, + temporal_jitter=temporal_jitter) + self.body_segments = body_segments + self.aug_segments = aug_segments + self.aug_ratio = _pair(aug_ratio) + if not mmcv.is_tuple_of(self.aug_ratio, (int, float)): + raise TypeError(f'aug_ratio should be int, float ' + f'or tuple of int and float, ' + f'but got {type(aug_ratio)}') + assert len(self.aug_ratio) == 2 + assert mode in ['train', 'val', 'test'] + self.mode = mode + self.test_interval = test_interval + + @staticmethod + def _get_train_indices(valid_length, num_segments): + """Get indices of different stages of proposals in train mode. + + It will calculate the average interval for each segment, + and randomly shift them within offsets between [0, average_duration]. + If the total number of frames is smaller than num segments, it will + return all zero indices. + + Args: + valid_length (int): The length of the starting point's + valid interval. + num_segments (int): Total number of segments. + + Returns: + np.ndarray: Sampled frame indices in train mode. + """ + avg_interval = (valid_length + 1) // num_segments + if avg_interval > 0: + base_offsets = np.arange(num_segments) * avg_interval + offsets = base_offsets + np.random.randint( + avg_interval, size=num_segments) + else: + offsets = np.zeros((num_segments, ), dtype=int) + + return offsets + + @staticmethod + def _get_val_indices(valid_length, num_segments): + """Get indices of different stages of proposals in validation mode. + + It will calculate the average interval for each segment. + If the valid length is smaller than num segments, + it will return all zero indices. + + Args: + valid_length (int): The length of the starting point's + valid interval. + num_segments (int): Total number of segments.
+ + Returns: + np.ndarray: Sampled frame indices in validation mode. + """ + if valid_length >= num_segments: + avg_interval = valid_length / float(num_segments) + base_offsets = np.arange(num_segments) * avg_interval + offsets = (base_offsets + avg_interval / 2.0).astype(int) + else: + offsets = np.zeros((num_segments, ), dtype=int) + + return offsets + + def _get_proposal_clips(self, proposal, num_frames): + """Get clip offsets in train mode. + + It will calculate sampled frame indices in the proposal's three + stages: starting, course and ending stage. + + Args: + proposal (obj): The proposal object. + num_frames (int): Total number of frames in the video. + + Returns: + np.ndarray: Sampled frame indices in train mode. + """ + # proposal interval: [start_frame, end_frame) + start_frame = proposal.start_frame + end_frame = proposal.end_frame + ori_clip_len = self.clip_len * self.frame_interval + + duration = end_frame - start_frame + assert duration != 0 + valid_length = duration - ori_clip_len + + valid_starting = max(0, + start_frame - int(duration * self.aug_ratio[0])) + valid_ending = min(num_frames - ori_clip_len + 1, + end_frame - 1 + int(duration * self.aug_ratio[1])) + + valid_starting_length = start_frame - valid_starting - ori_clip_len + valid_ending_length = (valid_ending - end_frame + 1) - ori_clip_len + + if self.mode == 'train': + starting_offsets = self._get_train_indices(valid_starting_length, + self.aug_segments[0]) + course_offsets = self._get_train_indices(valid_length, + self.body_segments) + ending_offsets = self._get_train_indices(valid_ending_length, + self.aug_segments[1]) + elif self.mode == 'val': + starting_offsets = self._get_val_indices(valid_starting_length, + self.aug_segments[0]) + course_offsets = self._get_val_indices(valid_length, + self.body_segments) + ending_offsets = self._get_val_indices(valid_ending_length, + self.aug_segments[1]) + starting_offsets += valid_starting + course_offsets += start_frame + ending_offsets +=
end_frame + + offsets = np.concatenate( + (starting_offsets, course_offsets, ending_offsets)) + return offsets + + def _get_train_clips(self, num_frames, proposals): + """Get clip offsets in train mode. + + It will calculate sampled frame indices of each proposal, and then + assemble them. + + Args: + num_frames (int): Total number of frames in the video. + proposals (list): Proposals fetched. + + Returns: + np.ndarray: Sampled frame indices in train mode. + """ + clip_offsets = [] + for proposal in proposals: + proposal_clip_offsets = self._get_proposal_clips( + proposal[0][1], num_frames) + clip_offsets = np.concatenate( + [clip_offsets, proposal_clip_offsets]) + + return clip_offsets + + def _get_test_clips(self, num_frames): + """Get clip offsets in test mode. + + It will calculate sampled frame indices based on test interval. + + Args: + num_frames (int): Total number of frames in the video. + + Returns: + np.ndarray: Sampled frame indices in test mode. + """ + ori_clip_len = self.clip_len * self.frame_interval + return np.arange( + 0, num_frames - ori_clip_len, self.test_interval, dtype=int) + + def _sample_clips(self, num_frames, proposals): + """Choose clip offsets for the video in a given mode. + + Args: + num_frames (int): Total number of frames in the video. + proposals (list | None): Proposals fetched. + It is set to None in test mode. + + Returns: + np.ndarray: Sampled frame indices. + """ + if self.mode == 'test': + clip_offsets = self._get_test_clips(num_frames) + else: + assert proposals is not None + clip_offsets = self._get_train_clips(num_frames, proposals) + + return clip_offsets + + def __call__(self, results): + """Perform the SampleFrames loading. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline.
+ """ + total_frames = results['total_frames'] + + out_proposals = results.get('out_proposals', None) + clip_offsets = self._sample_clips(total_frames, out_proposals) + frame_inds = clip_offsets[:, None] + np.arange( + self.clip_len)[None, :] * self.frame_interval + frame_inds = np.concatenate(frame_inds) + + if self.temporal_jitter: + perframe_offsets = np.random.randint( + self.frame_interval, size=len(frame_inds)) + frame_inds += perframe_offsets + + start_index = results['start_index'] + frame_inds = np.mod(frame_inds, total_frames) + start_index + + results['frame_inds'] = np.array(frame_inds).astype(np.int) + results['clip_len'] = self.clip_len + results['frame_interval'] = self.frame_interval + results['num_clips'] = ( + self.body_segments + self.aug_segments[0] + self.aug_segments[1]) + if self.mode in ['train', 'val']: + results['num_proposals'] = len(results['out_proposals']) + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'clip_len={self.clip_len}, ' + f'body_segments={self.body_segments}, ' + f'aug_segments={self.aug_segments}, ' + f'aug_ratio={self.aug_ratio}, ' + f'frame_interval={self.frame_interval}, ' + f'test_interval={self.test_interval}, ' + f'temporal_jitter={self.temporal_jitter}, ' + f'mode={self.mode})') + return repr_str + + +@PIPELINES.register_module() +class PyAVInit: + """Using pyav to initialize the video. + + PyAV: https://github.com/mikeboers/PyAV + + Required keys are "filename", + added or modified keys are "video_reader", and "total_frames". + + Args: + io_backend (str): io backend where frames are store. + Default: 'disk'. + kwargs (dict): Args for file client. + """ + + def __init__(self, io_backend='disk', **kwargs): + self.io_backend = io_backend + self.kwargs = kwargs + self.file_client = None + + def __call__(self, results): + """Perform the PyAV initialization. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + try: + import av + except ImportError: + raise ImportError('Please run "conda install av -c conda-forge" ' + 'or "pip install av" to install PyAV first.') + + if self.file_client is None: + self.file_client = FileClient(self.io_backend, **self.kwargs) + + file_obj = io.BytesIO(self.file_client.get(results['filename'])) + container = av.open(file_obj) + + results['video_reader'] = container + results['total_frames'] = container.streams.video[0].frames + + return results + + def __repr__(self): + repr_str = f'{self.__class__.__name__}(io_backend={self.io_backend})' + return repr_str + + +@PIPELINES.register_module() +class PyAVDecode: + """Using PyAV to decode the video. + + PyAV: https://github.com/mikeboers/PyAV + + Required keys are "video_reader" and "frame_inds", + added or modified keys are "imgs", "img_shape" and "original_shape". + + Args: + multi_thread (bool): If set to True, it will apply multi + thread processing. Default: False. + mode (str): Decoding mode. Options are 'accurate' and 'efficient'. + If set to 'accurate', it will decode videos into accurate frames. + If set to 'efficient', it will adopt fast seeking but only return + the nearest key frames, which may be duplicated and inaccurate, + and more suitable for large scene-based video datasets. + Default: 'accurate'. + """ + + def __init__(self, multi_thread=False, mode='accurate'): + self.multi_thread = multi_thread + self.mode = mode + assert mode in ['accurate', 'efficient'] + + @staticmethod + def frame_generator(container, stream): + """Frame generator for PyAV.""" + for packet in container.demux(stream): + for frame in packet.decode(): + if frame: + return frame.to_rgb().to_ndarray() + + def __call__(self, results): + """Perform the PyAV decoding. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + container = results['video_reader'] + imgs = list() + + if self.multi_thread: + container.streams.video[0].thread_type = 'AUTO' + if results['frame_inds'].ndim != 1: + results['frame_inds'] = np.squeeze(results['frame_inds']) + + if self.mode == 'accurate': + # set max indice to make early stop + max_inds = max(results['frame_inds']) + i = 0 + for frame in container.decode(video=0): + if i > max_inds + 1: + break + imgs.append(frame.to_rgb().to_ndarray()) + i += 1 + + # the available frame in pyav may be less than its length, + # which may raise error + results['imgs'] = [ + imgs[i % len(imgs)] for i in results['frame_inds'] + ] + elif self.mode == 'efficient': + for frame in container.decode(video=0): + backup_frame = frame + break + stream = container.streams.video[0] + for idx in results['frame_inds']: + pts_scale = stream.average_rate * stream.time_base + frame_pts = int(idx / pts_scale) + container.seek( + frame_pts, any_frame=False, backward=True, stream=stream) + frame = self.frame_generator(container, stream) + if frame is not None: + imgs.append(frame) + backup_frame = frame + else: + imgs.append(backup_frame) + results['imgs'] = imgs + results['original_shape'] = imgs[0].shape[:2] + results['img_shape'] = imgs[0].shape[:2] + results['video_reader'] = None + del container + + return results + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += f'(multi_thread={self.multi_thread}, mode={self.mode})' + return repr_str + + +@PIPELINES.register_module() +class PIMSInit: + """Use PIMS to initialize the video. + + PIMS: https://github.com/soft-matter/pims + + Args: + io_backend (str): io backend where frames are store. + Default: 'disk'. + mode (str): Decoding mode. Options are 'accurate' and 'efficient'. + If set to 'accurate', it will always use ``pims.PyAVReaderIndexed`` + to decode videos into accurate frames. If set to 'efficient', it + will adopt fast seeking by using ``pims.PyAVReaderTimed``. 
+ Both will return the accurate frames in most cases. + Default: 'accurate'. + kwargs (dict): Args for file client. + """ + + def __init__(self, io_backend='disk', mode='accurate', **kwargs): + self.io_backend = io_backend + self.kwargs = kwargs + self.file_client = None + self.mode = mode + assert mode in ['accurate', 'efficient'] + + def __call__(self, results): + try: + import pims + except ImportError: + raise ImportError('Please run "conda install pims -c conda-forge" ' + 'or "pip install pims" to install pims first.') + + if self.file_client is None: + self.file_client = FileClient(self.io_backend, **self.kwargs) + + file_obj = io.BytesIO(self.file_client.get(results['filename'])) + if self.mode == 'accurate': + container = pims.PyAVReaderIndexed(file_obj) + else: + container = pims.PyAVReaderTimed(file_obj) + + results['video_reader'] = container + results['total_frames'] = len(container) + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(io_backend={self.io_backend}, ' + f'mode={self.mode})') + return repr_str + + +@PIPELINES.register_module() +class PIMSDecode: + """Using PIMS to decode the videos. + + PIMS: https://github.com/soft-matter/pims + + Required keys are "video_reader" and "frame_inds", + added or modified keys are "imgs", "img_shape" and "original_shape". + """ + + def __call__(self, results): + container = results['video_reader'] + + if results['frame_inds'].ndim != 1: + results['frame_inds'] = np.squeeze(results['frame_inds']) + + frame_inds = results['frame_inds'] + imgs = [container[idx] for idx in frame_inds] + + results['video_reader'] = None + del container + + results['imgs'] = imgs + results['original_shape'] = imgs[0].shape[:2] + results['img_shape'] = imgs[0].shape[:2] + + return results + + +@PIPELINES.register_module() +class PyAVDecodeMotionVector(PyAVDecode): + """Using pyav to decode the motion vectors from video. 
+ + Reference: https://github.com/PyAV-Org/PyAV/ + blob/main/tests/test_decode.py + + Required keys are "video_reader" and "frame_inds", + added or modified keys are "motion_vectors", "frame_inds". + """ + + @staticmethod + def _parse_vectors(mv, vectors, height, width): + """Parse the returned vectors.""" + (w, h, src_x, src_y, dst_x, + dst_y) = (vectors['w'], vectors['h'], vectors['src_x'], + vectors['src_y'], vectors['dst_x'], vectors['dst_y']) + val_x = dst_x - src_x + val_y = dst_y - src_y + start_x = dst_x - w // 2 + start_y = dst_y - h // 2 + end_x = start_x + w + end_y = start_y + h + for sx, ex, sy, ey, vx, vy in zip(start_x, end_x, start_y, end_y, + val_x, val_y): + if (sx >= 0 and ex < width and sy >= 0 and ey < height): + mv[sy:ey, sx:ex] = (vx, vy) + + return mv + + def __call__(self, results): + """Perform the PyAV motion vector decoding. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + container = results['video_reader'] + imgs = list() + + if self.multi_thread: + container.streams.video[0].thread_type = 'AUTO' + if results['frame_inds'].ndim != 1: + results['frame_inds'] = np.squeeze(results['frame_inds']) + + # set max index to make early stop + max_idx = max(results['frame_inds']) + i = 0 + stream = container.streams.video[0] + codec_context = stream.codec_context + codec_context.options = {'flags2': '+export_mvs'} + for packet in container.demux(stream): + for frame in packet.decode(): + if i > max_idx + 1: + break + i += 1 + height = frame.height + width = frame.width + mv = np.zeros((height, width, 2), dtype=np.int8) + vectors = frame.side_data.get('MOTION_VECTORS') + if frame.key_frame: + # Key frame don't have motion vectors + assert vectors is None + if vectors is not None and len(vectors) > 0: + mv = self._parse_vectors(mv, vectors.to_ndarray(), height, + width) + imgs.append(mv) + + results['video_reader'] = None + del container + + # the available frame in pyav may be 
less than its length, + # which may raise error + results['motion_vectors'] = np.array( + [imgs[i % len(imgs)] for i in results['frame_inds']]) + return results + + +@PIPELINES.register_module() +class DecordInit: + """Using decord to initialize the video_reader. + + Decord: https://github.com/dmlc/decord + + Required keys are "filename", + added or modified keys are "video_reader" and "total_frames". + + Args: + io_backend (str): io backend where frames are store. + Default: 'disk'. + num_threads (int): Number of thread to decode the video. Default: 1. + kwargs (dict): Args for file client. + """ + + def __init__(self, io_backend='disk', num_threads=1, **kwargs): + self.io_backend = io_backend + self.num_threads = num_threads + self.kwargs = kwargs + self.file_client = None + + def __call__(self, results): + """Perform the Decord initialization. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + try: + import decord + except ImportError: + raise ImportError( + 'Please run "pip install decord" to install Decord first.') + + if self.file_client is None: + self.file_client = FileClient(self.io_backend, **self.kwargs) + + file_obj = io.BytesIO(self.file_client.get(results['filename'])) + container = decord.VideoReader(file_obj, num_threads=self.num_threads) + results['video_reader'] = container + results['total_frames'] = len(container) + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'io_backend={self.io_backend}, ' + f'num_threads={self.num_threads})') + return repr_str + + +@PIPELINES.register_module() +class DecordDecode: + """Using decord to decode the video. + + Decord: https://github.com/dmlc/decord + + Required keys are "video_reader", "filename" and "frame_inds", + added or modified keys are "imgs" and "original_shape". + + Args: + mode (str): Decoding mode. Options are 'accurate' and 'efficient'. 
+ If set to 'accurate', it will decode videos into accurate frames. + If set to 'efficient', it will adopt fast seeking but only return + key frames, which may be duplicated and inaccurate, and more + suitable for large scene-based video datasets. Default: 'accurate'. + """ + + def __init__(self, mode='accurate'): + self.mode = mode + assert mode in ['accurate', 'efficient'] + + def __call__(self, results): + """Perform the Decord decoding. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + container = results['video_reader'] + + if results['frame_inds'].ndim != 1: + results['frame_inds'] = np.squeeze(results['frame_inds']) + + frame_inds = results['frame_inds'] + + if self.mode == 'accurate': + imgs = container.get_batch(frame_inds).asnumpy() + imgs = list(imgs) + elif self.mode == 'efficient': + # This mode is faster, however it always returns I-FRAME + container.seek(0) + imgs = list() + for idx in frame_inds: + container.seek(idx) + frame = container.next() + imgs.append(frame.asnumpy()) + + results['video_reader'] = None + del container + + results['imgs'] = imgs + results['original_shape'] = imgs[0].shape[:2] + results['img_shape'] = imgs[0].shape[:2] + + return results + + def __repr__(self): + repr_str = f'{self.__class__.__name__}(mode={self.mode})' + return repr_str + + +@PIPELINES.register_module() +class OpenCVInit: + """Using OpenCV to initialize the video_reader. + + Required keys are "filename", added or modified keys are "new_path", + "video_reader" and "total_frames". + + Args: + io_backend (str): io backend where frames are store. + Default: 'disk'. + kwargs (dict): Args for file client. 
+ """ + + def __init__(self, io_backend='disk', **kwargs): + self.io_backend = io_backend + self.kwargs = kwargs + self.file_client = None + self.tmp_folder = None + if self.io_backend != 'disk': + random_string = get_random_string() + thread_id = get_thread_id() + self.tmp_folder = osp.join(get_shm_dir(), + f'{random_string}_{thread_id}') + os.mkdir(self.tmp_folder) + + def __call__(self, results): + """Perform the OpenCV initialization. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + if self.io_backend == 'disk': + new_path = results['filename'] + else: + if self.file_client is None: + self.file_client = FileClient(self.io_backend, **self.kwargs) + + thread_id = get_thread_id() + # save the file of same thread at the same place + new_path = osp.join(self.tmp_folder, f'tmp_{thread_id}.mp4') + with open(new_path, 'wb') as f: + f.write(self.file_client.get(results['filename'])) + + container = mmcv.VideoReader(new_path) + results['new_path'] = new_path + results['video_reader'] = container + results['total_frames'] = len(container) + + return results + + def __del__(self): + if self.tmp_folder and osp.exists(self.tmp_folder): + shutil.rmtree(self.tmp_folder) + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'io_backend={self.io_backend})') + return repr_str + + +@PIPELINES.register_module() +class OpenCVDecode: + """Using OpenCV to decode the video. + + Required keys are "video_reader", "filename" and "frame_inds", added or + modified keys are "imgs", "img_shape" and "original_shape". + """ + + def __call__(self, results): + """Perform the OpenCV decoding. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + container = results['video_reader'] + imgs = list() + + if results['frame_inds'].ndim != 1: + results['frame_inds'] = np.squeeze(results['frame_inds']) + + for frame_ind in results['frame_inds']: + cur_frame = container[frame_ind] + # last frame may be None in OpenCV + while isinstance(cur_frame, type(None)): + frame_ind -= 1 + cur_frame = container[frame_ind] + imgs.append(cur_frame) + + results['video_reader'] = None + del container + + imgs = np.array(imgs) + # The default channel order of OpenCV is BGR, thus we change it to RGB + imgs = imgs[:, :, :, ::-1] + results['imgs'] = list(imgs) + results['original_shape'] = imgs[0].shape[:2] + results['img_shape'] = imgs[0].shape[:2] + + return results + + +@PIPELINES.register_module() +class RawFrameDecode: + """Load and decode frames with given indices. + + Required keys are "frame_dir", "filename_tmpl" and "frame_inds", + added or modified keys are "imgs", "img_shape" and "original_shape". + + Args: + io_backend (str): IO backend where frames are stored. Default: 'disk'. + decoding_backend (str): Backend used for image decoding. + Default: 'cv2'. + kwargs (dict, optional): Arguments for FileClient. + """ + + def __init__(self, io_backend='disk', decoding_backend='cv2', **kwargs): + self.io_backend = io_backend + self.decoding_backend = decoding_backend + self.kwargs = kwargs + self.file_client = None + + def __call__(self, results): + """Perform the ``RawFrameDecode`` to pick frames given indices. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + mmcv.use_backend(self.decoding_backend) + + directory = results['frame_dir'] + filename_tmpl = results['filename_tmpl'] + modality = results['modality'] + + if self.file_client is None: + self.file_client = FileClient(self.io_backend, **self.kwargs) + + imgs = list() + + if results['frame_inds'].ndim != 1: + results['frame_inds'] = np.squeeze(results['frame_inds']) + + offset = results.get('offset', 0) + + cache = {} + for i, frame_idx in enumerate(results['frame_inds']): + # Avoid loading duplicated frames + if frame_idx in cache: + if modality == 'RGB': + imgs.append(cp.deepcopy(imgs[cache[frame_idx]])) + else: + imgs.append(cp.deepcopy(imgs[2 * cache[frame_idx]])) + imgs.append(cp.deepcopy(imgs[2 * cache[frame_idx] + 1])) + continue + else: + cache[frame_idx] = i + + frame_idx += offset + if modality == 'RGB': + filepath = osp.join(directory, filename_tmpl.format(frame_idx)) + img_bytes = self.file_client.get(filepath) + # Get frame with channel order RGB directly. + cur_frame = mmcv.imfrombytes(img_bytes, channel_order='rgb') + imgs.append(cur_frame) + elif modality == 'Flow': + x_filepath = osp.join(directory, + filename_tmpl.format('x', frame_idx)) + y_filepath = osp.join(directory, + filename_tmpl.format('y', frame_idx)) + x_img_bytes = self.file_client.get(x_filepath) + x_frame = mmcv.imfrombytes(x_img_bytes, flag='grayscale') + y_img_bytes = self.file_client.get(y_filepath) + y_frame = mmcv.imfrombytes(y_img_bytes, flag='grayscale') + imgs.extend([x_frame, y_frame]) + else: + raise NotImplementedError + + results['imgs'] = imgs + results['original_shape'] = imgs[0].shape[:2] + results['img_shape'] = imgs[0].shape[:2] + + # we resize the gt_bboxes and proposals to their real scale + if 'gt_bboxes' in results: + h, w = results['img_shape'] + scale_factor = np.array([w, h, w, h]) + gt_bboxes = results['gt_bboxes'] + gt_bboxes = (gt_bboxes * scale_factor).astype(np.float32) + results['gt_bboxes'] = gt_bboxes + if 'proposals' in results and 
results['proposals'] is not None: + proposals = results['proposals'] + proposals = (proposals * scale_factor).astype(np.float32) + results['proposals'] = proposals + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'io_backend={self.io_backend}, ' + f'decoding_backend={self.decoding_backend})') + return repr_str + + +@PIPELINES.register_module() +class ArrayDecode: + """Load and decode frames with given indices from a 4D array. + + Required keys are "array" and "frame_inds", added or modified keys are + "imgs", "img_shape" and "original_shape". + """ + + def __call__(self, results): + """Perform the ``ArrayDecode`` to pick frames given indices. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + + modality = results['modality'] + array = results['array'] + + imgs = list() + + if results['frame_inds'].ndim != 1: + results['frame_inds'] = np.squeeze(results['frame_inds']) + + offset = results.get('offset', 0) + + for i, frame_idx in enumerate(results['frame_inds']): + + frame_idx += offset + if modality == 'RGB': + imgs.append(array[frame_idx]) + elif modality == 'Flow': + imgs.extend( + [array[frame_idx, ..., 0], array[frame_idx, ..., 1]]) + else: + raise NotImplementedError + + results['imgs'] = imgs + results['original_shape'] = imgs[0].shape[:2] + results['img_shape'] = imgs[0].shape[:2] + + return results + + def __repr__(self): + return f'{self.__class__.__name__}()' + + +@PIPELINES.register_module() +class ImageDecode: + """Load and decode images. + + Required key is "filename", added or modified keys are "imgs", "img_shape" + and "original_shape". + + Args: + io_backend (str): IO backend where frames are stored. Default: 'disk'. + decoding_backend (str): Backend used for image decoding. + Default: 'cv2'. + kwargs (dict, optional): Arguments for FileClient. 
+ """ + + def __init__(self, io_backend='disk', decoding_backend='cv2', **kwargs): + self.io_backend = io_backend + self.decoding_backend = decoding_backend + self.kwargs = kwargs + self.file_client = None + + def __call__(self, results): + """Perform the ``ImageDecode`` to load image given the file path. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + mmcv.use_backend(self.decoding_backend) + + filename = results['filename'] + + if self.file_client is None: + self.file_client = FileClient(self.io_backend, **self.kwargs) + + imgs = list() + img_bytes = self.file_client.get(filename) + + img = mmcv.imfrombytes(img_bytes, channel_order='rgb') + imgs.append(img) + + results['imgs'] = imgs + results['original_shape'] = imgs[0].shape[:2] + results['img_shape'] = imgs[0].shape[:2] + return results + + +@PIPELINES.register_module() +class AudioDecodeInit: + """Using librosa to initialize the audio reader. + + Required keys are "audio_path", added or modified keys are "length", + "sample_rate", "audios". + + Args: + io_backend (str): io backend where frames are store. + Default: 'disk'. + sample_rate (int): Audio sampling times per second. Default: 16000. + """ + + def __init__(self, + io_backend='disk', + sample_rate=16000, + pad_method='zero', + **kwargs): + self.io_backend = io_backend + self.sample_rate = sample_rate + if pad_method in ['random', 'zero']: + self.pad_method = pad_method + else: + raise NotImplementedError + self.kwargs = kwargs + self.file_client = None + + @staticmethod + def _zero_pad(shape): + return np.zeros(shape, dtype=np.float32) + + @staticmethod + def _random_pad(shape): + # librosa load raw audio file into a distribution of -1~+1 + return np.random.rand(shape).astype(np.float32) * 2 - 1 + + def __call__(self, results): + """Perform the librosa initialization. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + try: + import librosa + except ImportError: + raise ImportError('Please install librosa first.') + + if self.file_client is None: + self.file_client = FileClient(self.io_backend, **self.kwargs) + if osp.exists(results['audio_path']): + file_obj = io.BytesIO(self.file_client.get(results['audio_path'])) + y, sr = librosa.load(file_obj, sr=self.sample_rate) + else: + # Generate a random dummy 10s input + pad_func = getattr(self, f'_{self.pad_method}_pad') + y = pad_func(int(round(10.0 * self.sample_rate))) + sr = self.sample_rate + + results['length'] = y.shape[0] + results['sample_rate'] = sr + results['audios'] = y + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'io_backend={self.io_backend}, ' + f'sample_rate={self.sample_rate}, ' + f'pad_method={self.pad_method})') + return repr_str + + +@PIPELINES.register_module() +class LoadAudioFeature: + """Load offline extracted audio features. + + Required keys are "audio_path", added or modified keys are "length", + "audios". + """ + + def __init__(self, pad_method='zero'): + if pad_method not in ['zero', 'random']: + raise NotImplementedError + self.pad_method = pad_method + + @staticmethod + def _zero_pad(shape): + return np.zeros(shape, dtype=np.float32) + + @staticmethod + def _random_pad(shape): + # spectrogram is normalized into a distribution of 0~1 + # `shape` is a tuple here, which np.random.rand does not accept + return np.random.random_sample(shape).astype(np.float32) + + def __call__(self, results): + """Perform the numpy loading. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + if osp.exists(results['audio_path']): + feature_map = np.load(results['audio_path']) + else: + # Generate a random dummy 10s input + # Some videos do not have audio stream + pad_func = getattr(self, f'_{self.pad_method}_pad') + feature_map = pad_func((640, 80)) + + results['length'] = feature_map.shape[0] + results['audios'] = feature_map + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'pad_method={self.pad_method})') + return repr_str + + +@PIPELINES.register_module() +class AudioDecode: + """Sample the audio w.r.t. the frames selected. + + Args: + fixed_length (int): As the audio clip selected by frames sampled may + not be exactly the same, `fixed_length` will truncate or pad them + into the same size. Default: 32000. + + Required keys are "frame_inds", "num_clips", "total_frames", "length", + added or modified keys are "audios", "audios_shape". + """ + + def __init__(self, fixed_length=32000): + self.fixed_length = fixed_length + + def __call__(self, results): + """Perform the ``AudioDecode`` to pick audio clips.""" + audio = results['audios'] + frame_inds = results['frame_inds'] + num_clips = results['num_clips'] + resampled_clips = list() + frame_inds = frame_inds.reshape(num_clips, -1) + for clip_idx in range(num_clips): + clip_frame_inds = frame_inds[clip_idx] + start_idx = max( + 0, + int( + round((clip_frame_inds[0] + 1) / results['total_frames'] * + results['length']))) + end_idx = min( + results['length'], + int( + round((clip_frame_inds[-1] + 1) / results['total_frames'] * + results['length']))) + cropped_audio = audio[start_idx:end_idx] + if cropped_audio.shape[0] >= self.fixed_length: + truncated_audio = cropped_audio[:self.fixed_length] + else: + truncated_audio = np.pad( + cropped_audio, + ((0, self.fixed_length - cropped_audio.shape[0])), + mode='constant') + + resampled_clips.append(truncated_audio) + + results['audios'] = np.array(resampled_clips) + results['audios_shape'] = results['audios'].shape 
+ return results + + +@PIPELINES.register_module() +class BuildPseudoClip: + """Build pseudo clips with one single image by repeating it n times. + + Required key is "imgs", added or modified keys are "imgs", "num_clips", + "clip_len". + + Args: + clip_len (int): Frames of the generated pseudo clips. + """ + + def __init__(self, clip_len): + self.clip_len = clip_len + + def __call__(self, results): + # the input should be one single image + assert len(results['imgs']) == 1 + im = results['imgs'][0] + for _ in range(1, self.clip_len): + results['imgs'].append(np.copy(im)) + results['clip_len'] = self.clip_len + results['num_clips'] = 1 + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'clip_len={self.clip_len})') + return repr_str + + +@PIPELINES.register_module() +class AudioFeatureSelector: + """Sample the audio feature w.r.t. the frames selected. + + Required keys are "audios", "frame_inds", "num_clips", "length", + "total_frames", added or modified keys are "audios", "audios_shape". + + Args: + fixed_length (int): As the features selected by frames sampled may + not be exactly the same, `fixed_length` will truncate or pad them + into the same size. Default: 128. + """ + + def __init__(self, fixed_length=128): + self.fixed_length = fixed_length + + def __call__(self, results): + """Perform the ``AudioFeatureSelector`` to pick audio feature clips. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + audio = results['audios'] + frame_inds = results['frame_inds'] + num_clips = results['num_clips'] + resampled_clips = list() + + frame_inds = frame_inds.reshape(num_clips, -1) + for clip_idx in range(num_clips): + clip_frame_inds = frame_inds[clip_idx] + start_idx = max( + 0, + int( + round((clip_frame_inds[0] + 1) / results['total_frames'] * + results['length']))) + end_idx = min( + results['length'], + int( + round((clip_frame_inds[-1] + 1) / results['total_frames'] * + results['length']))) + cropped_audio = audio[start_idx:end_idx, :] + if cropped_audio.shape[0] >= self.fixed_length: + truncated_audio = cropped_audio[:self.fixed_length, :] + else: + truncated_audio = np.pad( + cropped_audio, + ((0, self.fixed_length - cropped_audio.shape[0]), (0, 0)), + mode='constant') + + resampled_clips.append(truncated_audio) + results['audios'] = np.array(resampled_clips) + results['audios_shape'] = results['audios'].shape + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'fix_length={self.fixed_length})') + return repr_str + + +@PIPELINES.register_module() +class LoadLocalizationFeature: + """Load Video features for localizer with given video_name list. + + Required keys are "video_name" and "data_prefix", added or modified keys + are "raw_feature". + + Args: + raw_feature_ext (str): Raw feature file extension. Default: '.csv'. + """ + + def __init__(self, raw_feature_ext='.csv'): + valid_raw_feature_ext = ('.csv', ) + if raw_feature_ext not in valid_raw_feature_ext: + raise NotImplementedError + self.raw_feature_ext = raw_feature_ext + + def __call__(self, results): + """Perform the LoadLocalizationFeature loading. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + video_name = results['video_name'] + data_prefix = results['data_prefix'] + + data_path = osp.join(data_prefix, video_name + self.raw_feature_ext) + raw_feature = np.loadtxt( + data_path, dtype=np.float32, delimiter=',', skiprows=1) + + results['raw_feature'] = np.transpose(raw_feature, (1, 0)) + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'raw_feature_ext={self.raw_feature_ext})') + return repr_str + + +@PIPELINES.register_module() +class GenerateLocalizationLabels: + """Load video label for localizer with given video_name list. + + Required keys are "duration_frame", "duration_second", "feature_frame", + "annotations", added or modified keys are "gt_bbox". + """ + + def __call__(self, results): + """Perform the GenerateLocalizationLabels loading. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + video_frame = results['duration_frame'] + video_second = results['duration_second'] + feature_frame = results['feature_frame'] + corrected_second = float(feature_frame) / video_frame * video_second + annotations = results['annotations'] + + gt_bbox = [] + + for annotation in annotations: + current_start = max( + min(1, annotation['segment'][0] / corrected_second), 0) + current_end = max( + min(1, annotation['segment'][1] / corrected_second), 0) + gt_bbox.append([current_start, current_end]) + + gt_bbox = np.array(gt_bbox) + results['gt_bbox'] = gt_bbox + return results + + +@PIPELINES.register_module() +class LoadProposals: + """Loading proposals with given proposal results. + + Required keys are "video_name", added or modified keys are 'bsp_feature', + 'tmin', 'tmax', 'tmin_score', 'tmax_score' and 'reference_temporal_iou'. + + Args: + top_k (int): The top k proposals to be loaded. + pgm_proposals_dir (str): Directory to load proposals. + pgm_features_dir (str): Directory to load proposal features. + proposal_ext (str): Proposal file extension. 
Default: '.csv'. + feature_ext (str): Feature file extension. Default: '.npy'. + """ + + def __init__(self, + top_k, + pgm_proposals_dir, + pgm_features_dir, + proposal_ext='.csv', + feature_ext='.npy'): + self.top_k = top_k + self.pgm_proposals_dir = pgm_proposals_dir + self.pgm_features_dir = pgm_features_dir + valid_proposal_ext = ('.csv', ) + if proposal_ext not in valid_proposal_ext: + raise NotImplementedError + self.proposal_ext = proposal_ext + valid_feature_ext = ('.npy', ) + if feature_ext not in valid_feature_ext: + raise NotImplementedError + self.feature_ext = feature_ext + + def __call__(self, results): + """Perform the LoadProposals loading. + + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. + """ + video_name = results['video_name'] + proposal_path = osp.join(self.pgm_proposals_dir, + video_name + self.proposal_ext) + if self.proposal_ext == '.csv': + pgm_proposals = np.loadtxt( + proposal_path, dtype=np.float32, delimiter=',', skiprows=1) + + pgm_proposals = np.array(pgm_proposals[:self.top_k]) + tmin = pgm_proposals[:, 0] + tmax = pgm_proposals[:, 1] + tmin_score = pgm_proposals[:, 2] + tmax_score = pgm_proposals[:, 3] + reference_temporal_iou = pgm_proposals[:, 5] + + feature_path = osp.join(self.pgm_features_dir, + video_name + self.feature_ext) + if self.feature_ext == '.npy': + bsp_feature = np.load(feature_path).astype(np.float32) + + bsp_feature = bsp_feature[:self.top_k, :] + + results['bsp_feature'] = bsp_feature + results['tmin'] = tmin + results['tmax'] = tmax + results['tmin_score'] = tmin_score + results['tmax_score'] = tmax_score + results['reference_temporal_iou'] = reference_temporal_iou + + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'top_k={self.top_k}, ' + f'pgm_proposals_dir={self.pgm_proposals_dir}, ' + f'pgm_features_dir={self.pgm_features_dir}, ' + f'proposal_ext={self.proposal_ext}, ' + f'feature_ext={self.feature_ext})') + 
return repr_str diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/pose_loading.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/pose_loading.py new file mode 100644 index 0000000000000000000000000000000000000000..51a210da2805416c350ec07ed4bc89228cfd9372 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pipelines/pose_loading.py @@ -0,0 +1,695 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy as cp +import pickle + +import numpy as np +from mmcv.fileio import FileClient +from scipy.stats import mode + +from ..builder import PIPELINES +from .augmentations import Flip + + +@PIPELINES.register_module() +class UniformSampleFrames: + """Uniformly sample frames from the video. + + To sample an n-frame clip from the video, UniformSampleFrames basically + divides the video into n segments of equal length and randomly samples one + frame from each segment. To make the testing results reproducible, a + random seed is set during testing, to make the sampling results + deterministic. + + Required keys are "total_frames", "start_index", added or modified keys + are "frame_inds", "clip_len", "frame_interval" and "num_clips". + + Args: + clip_len (int): Frames of each sampled output clip. + num_clips (int): Number of clips to be sampled. Default: 1. + test_mode (bool): Store True when building test or validation dataset. + Default: False. + seed (int): The random seed used during test time. Default: 255. + """ + + def __init__(self, clip_len, num_clips=1, test_mode=False, seed=255): + + self.clip_len = clip_len + self.num_clips = num_clips + self.test_mode = test_mode + self.seed = seed + + def _get_train_clips(self, num_frames, clip_len): + """Uniformly sample indices for training clips. + + Args: + num_frames (int): The number of frames. + clip_len (int): The length of the clip.
+ """ + + assert self.num_clips == 1 + if num_frames < clip_len: + start = np.random.randint(0, num_frames) + inds = np.arange(start, start + clip_len) + elif clip_len <= num_frames < 2 * clip_len: + basic = np.arange(clip_len) + inds = np.random.choice( + clip_len + 1, num_frames - clip_len, replace=False) + offset = np.zeros(clip_len + 1, dtype=np.int64) + offset[inds] = 1 + offset = np.cumsum(offset) + inds = basic + offset[:-1] + else: + bids = np.array( + [i * num_frames // clip_len for i in range(clip_len + 1)]) + bsize = np.diff(bids) + bst = bids[:clip_len] + offset = np.random.randint(bsize) + inds = bst + offset + return inds + + def _get_test_clips(self, num_frames, clip_len): + """Uniformly sample indices for testing clips. + + Args: + num_frames (int): The number of frames. + clip_len (int): The length of the clip. + """ + + np.random.seed(self.seed) + if num_frames < clip_len: + # Then we use a simple strategy + if num_frames < self.num_clips: + start_inds = list(range(self.num_clips)) + else: + start_inds = [ + i * num_frames // self.num_clips + for i in range(self.num_clips) + ] + inds = np.concatenate( + [np.arange(i, i + clip_len) for i in start_inds]) + elif clip_len <= num_frames < clip_len * 2: + all_inds = [] + for i in range(self.num_clips): + basic = np.arange(clip_len) + inds = np.random.choice( + clip_len + 1, num_frames - clip_len, replace=False) + offset = np.zeros(clip_len + 1, dtype=np.int64) + offset[inds] = 1 + offset = np.cumsum(offset) + inds = basic + offset[:-1] + all_inds.append(inds) + inds = np.concatenate(all_inds) + else: + bids = np.array( + [i * num_frames // clip_len for i in range(clip_len + 1)]) + bsize = np.diff(bids) + bst = bids[:clip_len] + all_inds = [] + for i in range(self.num_clips): + offset = np.random.randint(bsize) + all_inds.append(bst + offset) + inds = np.concatenate(all_inds) + return inds + + def __call__(self, results): + num_frames = results['total_frames'] + + if self.test_mode: + inds = 
self._get_test_clips(num_frames, self.clip_len) + else: + inds = self._get_train_clips(num_frames, self.clip_len) + + inds = np.mod(inds, num_frames) + start_index = results['start_index'] + inds = inds + start_index + + results['frame_inds'] = inds.astype(int) + results['clip_len'] = self.clip_len + results['frame_interval'] = None + results['num_clips'] = self.num_clips + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'clip_len={self.clip_len}, ' + f'num_clips={self.num_clips}, ' + f'test_mode={self.test_mode}, ' + f'seed={self.seed})') + return repr_str + + +@PIPELINES.register_module() +class PoseDecode: + """Load and decode pose with given indices. + + Required keys are "keypoint", "frame_inds" (optional), "keypoint_score" + (optional), added or modified keys are "keypoint", "keypoint_score" (if + applicable). + """ + + @staticmethod + def _load_kp(kp, frame_inds): + """Load keypoints given frame indices. + + Args: + kp (np.ndarray): The keypoint coordinates. + frame_inds (np.ndarray): The frame indices. + """ + + return [x[frame_inds].astype(np.float32) for x in kp] + + @staticmethod + def _load_kpscore(kpscore, frame_inds): + """Load keypoint scores given frame indices. + + Args: + kpscore (np.ndarray): The confidence scores of keypoints. + frame_inds (np.ndarray): The frame indices.
+ """ + + return [x[frame_inds].astype(np.float32) for x in kpscore] + + def __call__(self, results): + + if 'frame_inds' not in results: + results['frame_inds'] = np.arange(results['total_frames']) + + if results['frame_inds'].ndim != 1: + results['frame_inds'] = np.squeeze(results['frame_inds']) + + offset = results.get('offset', 0) + frame_inds = results['frame_inds'] + offset + + if 'keypoint_score' in results: + kpscore = results['keypoint_score'] + results['keypoint_score'] = kpscore[:, + frame_inds].astype(np.float32) + + if 'keypoint' in results: + results['keypoint'] = results['keypoint'][:, frame_inds].astype( + np.float32) + + return results + + def __repr__(self): + repr_str = f'{self.__class__.__name__}()' + return repr_str + + +@PIPELINES.register_module() +class LoadKineticsPose: + """Load Kinetics Pose given filename (The format should be pickle) + + Required keys are "filename", "total_frames", "img_shape", "frame_inds", + "anno_inds" (for mmpose source, optional), added or modified keys are + "keypoint", "keypoint_score". + + Args: + io_backend (str): IO backend where frames are stored. Default: 'disk'. + squeeze (bool): Whether to remove frames with no human pose. + Default: True. + max_person (int): The max number of persons in a frame. Default: 100. + keypoint_weight (dict): The weight of keypoints. We set the confidence + score of a person as the weighted sum of confidence scores of each + joint. Persons with low confidence scores are dropped (if exceed + max_person). Default: dict(face=1, torso=2, limb=3). + source (str): The sources of the keypoints used. Choices are 'mmpose' + and 'openpose-18'. Default: 'mmpose'. + kwargs (dict, optional): Arguments for FileClient.
+ """ + + def __init__(self, + io_backend='disk', + squeeze=True, + max_person=100, + keypoint_weight=dict(face=1, torso=2, limb=3), + source='mmpose', + **kwargs): + + self.io_backend = io_backend + self.squeeze = squeeze + self.max_person = max_person + self.keypoint_weight = cp.deepcopy(keypoint_weight) + self.source = source + + if source == 'openpose-18': + self.kpsubset = dict( + face=[0, 14, 15, 16, 17], + torso=[1, 2, 8, 5, 11], + limb=[3, 4, 6, 7, 9, 10, 12, 13]) + elif source == 'mmpose': + self.kpsubset = dict( + face=[0, 1, 2, 3, 4], + torso=[5, 6, 11, 12], + limb=[7, 8, 9, 10, 13, 14, 15, 16]) + else: + raise NotImplementedError('Unknown source of Kinetics Pose') + + self.kwargs = kwargs + self.file_client = None + + def __call__(self, results): + + assert 'filename' in results + filename = results.pop('filename') + + # only applicable to source == 'mmpose' + anno_inds = None + if 'anno_inds' in results: + assert self.source == 'mmpose' + anno_inds = results.pop('anno_inds') + results.pop('box_score', None) + + if self.file_client is None: + self.file_client = FileClient(self.io_backend, **self.kwargs) + + bytes = self.file_client.get(filename) + + # only the kp array is in the pickle file, each kp include x, y, score. 
+ kps = pickle.loads(bytes) + + total_frames = results['total_frames'] + + frame_inds = results.pop('frame_inds') + + if anno_inds is not None: + kps = kps[anno_inds] + frame_inds = frame_inds[anno_inds] + + frame_inds = list(frame_inds) + + def mapinds(inds): + uni = np.unique(inds) + map_ = {x: i for i, x in enumerate(uni)} + inds = [map_[x] for x in inds] + return np.array(inds, dtype=np.int16) + + if self.squeeze: + frame_inds = mapinds(frame_inds) + total_frames = np.max(frame_inds) + 1 + + # write it back + results['total_frames'] = total_frames + + h, w = results['img_shape'] + if self.source == 'openpose-18': + kps[:, :, 0] *= w + kps[:, :, 1] *= h + + num_kp = kps.shape[1] + num_person = mode(frame_inds)[-1][0] + + new_kp = np.zeros([num_person, total_frames, num_kp, 2], + dtype=np.float16) + new_kpscore = np.zeros([num_person, total_frames, num_kp], + dtype=np.float16) + # 32768 is enough + num_person_frame = np.zeros([total_frames], dtype=np.int16) + + for frame_ind, kp in zip(frame_inds, kps): + person_ind = num_person_frame[frame_ind] + new_kp[person_ind, frame_ind] = kp[:, :2] + new_kpscore[person_ind, frame_ind] = kp[:, 2] + num_person_frame[frame_ind] += 1 + + kpgrp = self.kpsubset + weight = self.keypoint_weight + results['num_person'] = num_person + + if num_person > self.max_person: + for i in range(total_frames): + np_frame = num_person_frame[i] + val = new_kpscore[:np_frame, i] + + val = ( + np.sum(val[:, kpgrp['face']], 1) * weight['face'] + + np.sum(val[:, kpgrp['torso']], 1) * weight['torso'] + + np.sum(val[:, kpgrp['limb']], 1) * weight['limb']) + inds = sorted(range(np_frame), key=lambda x: -val[x]) + new_kpscore[:np_frame, i] = new_kpscore[inds, i] + new_kp[:np_frame, i] = new_kp[inds, i] + results['num_person'] = self.max_person + + results['keypoint'] = new_kp[:self.max_person] + results['keypoint_score'] = new_kpscore[:self.max_person] + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + 
f'io_backend={self.io_backend}, ' + f'squeeze={self.squeeze}, ' + f'max_person={self.max_person}, ' + f'keypoint_weight={self.keypoint_weight}, ' + f'source={self.source}, ' + f'kwargs={self.kwargs})') + return repr_str + + +@PIPELINES.register_module() +class GeneratePoseTarget: + """Generate pseudo heatmaps based on joint coordinates and confidence. + + Required keys are "keypoint", "img_shape", "keypoint_score" (optional), + added or modified keys are "imgs". + + Args: + sigma (float): The sigma of the generated gaussian map. Default: 0.6. + use_score (bool): Use the confidence score of keypoints as the maximum + of the gaussian maps. Default: True. + with_kp (bool): Generate pseudo heatmaps for keypoints. Default: True. + with_limb (bool): Generate pseudo heatmaps for limbs. At least one of + 'with_kp' and 'with_limb' should be True. Default: False. + skeletons (tuple[tuple]): The definition of human skeletons. + Default: ((0, 1), (0, 2), (1, 3), (2, 4), (0, 5), (5, 7), (7, 9), + (0, 6), (6, 8), (8, 10), (5, 11), (11, 13), (13, 15), + (6, 12), (12, 14), (14, 16), (11, 12)), + which is the definition of COCO-17p skeletons. + double (bool): Output both original heatmaps and flipped heatmaps. + Default: False. + left_kp (tuple[int]): Indexes of left keypoints, which is used when + flipping heatmaps. Default: (1, 3, 5, 7, 9, 11, 13, 15), + which is left keypoints in COCO-17p. + right_kp (tuple[int]): Indexes of right keypoints, which is used when + flipping heatmaps. Default: (2, 4, 6, 8, 10, 12, 14, 16), + which is right keypoints in COCO-17p. 
+ """ + + def __init__(self, + sigma=0.6, + use_score=True, + with_kp=True, + with_limb=False, + skeletons=((0, 1), (0, 2), (1, 3), (2, 4), (0, 5), (5, 7), + (7, 9), (0, 6), (6, 8), (8, 10), (5, 11), (11, 13), + (13, 15), (6, 12), (12, 14), (14, 16), (11, 12)), + double=False, + left_kp=(1, 3, 5, 7, 9, 11, 13, 15), + right_kp=(2, 4, 6, 8, 10, 12, 14, 16)): + + self.sigma = sigma + self.use_score = use_score + self.with_kp = with_kp + self.with_limb = with_limb + self.double = double + + # an auxiliary const + self.eps = 1e-4 + + assert self.with_kp or self.with_limb, ( + 'At least one of "with_limb" ' + 'and "with_kp" should be set as True.') + self.left_kp = left_kp + self.right_kp = right_kp + self.skeletons = skeletons + + def generate_a_heatmap(self, img_h, img_w, centers, sigma, max_values): + """Generate pseudo heatmap for one keypoint in one frame. + + Args: + img_h (int): The height of the heatmap. + img_w (int): The width of the heatmap. + centers (np.ndarray): The coordinates of corresponding keypoints + (of multiple persons). + sigma (float): The sigma of generated gaussian. + max_values (np.ndarray): The max values of each keypoint. + + Returns: + np.ndarray: The generated pseudo heatmap. 
+ """ + + heatmap = np.zeros([img_h, img_w], dtype=np.float32) + + for center, max_value in zip(centers, max_values): + mu_x, mu_y = center[0], center[1] + if max_value < self.eps: + continue + + st_x = max(int(mu_x - 3 * sigma), 0) + ed_x = min(int(mu_x + 3 * sigma) + 1, img_w) + st_y = max(int(mu_y - 3 * sigma), 0) + ed_y = min(int(mu_y + 3 * sigma) + 1, img_h) + x = np.arange(st_x, ed_x, 1, np.float32) + y = np.arange(st_y, ed_y, 1, np.float32) + + # if the keypoint not in the heatmap coordinate system + if not (len(x) and len(y)): + continue + y = y[:, None] + + patch = np.exp(-((x - mu_x)**2 + (y - mu_y)**2) / 2 / sigma**2) + patch = patch * max_value + heatmap[st_y:ed_y, + st_x:ed_x] = np.maximum(heatmap[st_y:ed_y, st_x:ed_x], + patch) + + return heatmap + + def generate_a_limb_heatmap(self, img_h, img_w, starts, ends, sigma, + start_values, end_values): + """Generate pseudo heatmap for one limb in one frame. + + Args: + img_h (int): The height of the heatmap. + img_w (int): The width of the heatmap. + starts (np.ndarray): The coordinates of one keypoint in the + corresponding limbs (of multiple persons). + ends (np.ndarray): The coordinates of the other keypoint in the + corresponding limbs (of multiple persons). + sigma (float): The sigma of generated gaussian. + start_values (np.ndarray): The max values of one keypoint in the + corresponding limbs. + end_values (np.ndarray): The max values of the other keypoint in + the corresponding limbs. + + Returns: + np.ndarray: The generated pseudo heatmap. 
+ """ + + heatmap = np.zeros([img_h, img_w], dtype=np.float32) + + for start, end, start_value, end_value in zip(starts, ends, + start_values, + end_values): + value_coeff = min(start_value, end_value) + if value_coeff < self.eps: + continue + + min_x, max_x = min(start[0], end[0]), max(start[0], end[0]) + min_y, max_y = min(start[1], end[1]), max(start[1], end[1]) + + min_x = max(int(min_x - 3 * sigma), 0) + max_x = min(int(max_x + 3 * sigma) + 1, img_w) + min_y = max(int(min_y - 3 * sigma), 0) + max_y = min(int(max_y + 3 * sigma) + 1, img_h) + + x = np.arange(min_x, max_x, 1, np.float32) + y = np.arange(min_y, max_y, 1, np.float32) + + if not (len(x) and len(y)): + continue + + y = y[:, None] + x_0 = np.zeros_like(x) + y_0 = np.zeros_like(y) + + # distance to start keypoints + d2_start = ((x - start[0])**2 + (y - start[1])**2) + + # distance to end keypoints + d2_end = ((x - end[0])**2 + (y - end[1])**2) + + # the distance between start and end keypoints. + d2_ab = ((start[0] - end[0])**2 + (start[1] - end[1])**2) + + if d2_ab < 1: + full_map = self.generate_a_heatmap(img_h, img_w, [start], + sigma, [start_value]) + heatmap = np.maximum(heatmap, full_map) + continue + + coeff = (d2_start - d2_end + d2_ab) / 2. / d2_ab + + a_dominate = coeff <= 0 + b_dominate = coeff >= 1 + seg_dominate = 1 - a_dominate - b_dominate + + position = np.stack([x + y_0, y + x_0], axis=-1) + projection = start + np.stack([coeff, coeff], axis=-1) * ( + end - start) + d2_line = position - projection + d2_line = d2_line[:, :, 0]**2 + d2_line[:, :, 1]**2 + d2_seg = ( + a_dominate * d2_start + b_dominate * d2_end + + seg_dominate * d2_line) + + patch = np.exp(-d2_seg / 2. / sigma**2) + patch = patch * value_coeff + + heatmap[min_y:max_y, min_x:max_x] = np.maximum( + heatmap[min_y:max_y, min_x:max_x], patch) + + return heatmap + + def generate_heatmap(self, img_h, img_w, kps, sigma, max_values): + """Generate pseudo heatmap for all keypoints and limbs in one frame (if + needed). 
+ + Args: + img_h (int): The height of the heatmap. + img_w (int): The width of the heatmap. + kps (np.ndarray): The coordinates of keypoints in this frame. + sigma (float): The sigma of generated gaussian. + max_values (np.ndarray): The confidence score of each keypoint. + + Returns: + np.ndarray: The generated pseudo heatmap. + """ + + heatmaps = [] + if self.with_kp: + num_kp = kps.shape[1] + for i in range(num_kp): + heatmap = self.generate_a_heatmap(img_h, img_w, kps[:, i], + sigma, max_values[:, i]) + heatmaps.append(heatmap) + + if self.with_limb: + for limb in self.skeletons: + start_idx, end_idx = limb + starts = kps[:, start_idx] + ends = kps[:, end_idx] + + start_values = max_values[:, start_idx] + end_values = max_values[:, end_idx] + heatmap = self.generate_a_limb_heatmap(img_h, img_w, starts, + ends, sigma, + start_values, + end_values) + heatmaps.append(heatmap) + + return np.stack(heatmaps, axis=-1) + + def gen_an_aug(self, results): + """Generate pseudo heatmaps for all frames. + + Args: + results (dict): The dictionary that contains all info of a sample. + + Returns: + list[np.ndarray]: The generated pseudo heatmaps. 
+ """ + + all_kps = results['keypoint'] + kp_shape = all_kps.shape + + if 'keypoint_score' in results: + all_kpscores = results['keypoint_score'] + else: + all_kpscores = np.ones(kp_shape[:-1], dtype=np.float32) + + img_h, img_w = results['img_shape'] + num_frame = kp_shape[1] + + imgs = [] + for i in range(num_frame): + sigma = self.sigma + kps = all_kps[:, i] + kpscores = all_kpscores[:, i] + + max_values = np.ones(kpscores.shape, dtype=np.float32) + if self.use_score: + max_values = kpscores + + hmap = self.generate_heatmap(img_h, img_w, kps, sigma, max_values) + imgs.append(hmap) + + return imgs + + def __call__(self, results): + if not self.double: + results['imgs'] = np.stack(self.gen_an_aug(results)) + else: + results_ = cp.deepcopy(results) + flip = Flip( + flip_ratio=1, left_kp=self.left_kp, right_kp=self.right_kp) + results_ = flip(results_) + results['imgs'] = np.concatenate( + [self.gen_an_aug(results), + self.gen_an_aug(results_)]) + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'sigma={self.sigma}, ' + f'use_score={self.use_score}, ' + f'with_kp={self.with_kp}, ' + f'with_limb={self.with_limb}, ' + f'skeletons={self.skeletons}, ' + f'double={self.double}, ' + f'left_kp={self.left_kp}, ' + f'right_kp={self.right_kp})') + return repr_str + + +@PIPELINES.register_module() +class PaddingWithLoop: + """Sample frames from the video. + + To sample an n-frame clip from the video, PaddingWithLoop samples + the frames from zero index, and loops the frames if the length of + video frames is less than the value of 'clip_len'. + + Required keys are "total_frames", added or modified keys + are "frame_inds", "clip_len", "frame_interval" and "num_clips". + + Args: + clip_len (int): Frames of each sampled output clip. + num_clips (int): Number of clips to be sampled. Default: 1.
+ """ + + def __init__(self, clip_len, num_clips=1): + + self.clip_len = clip_len + self.num_clips = num_clips + + def __call__(self, results): + num_frames = results['total_frames'] + + start = 0 + inds = np.arange(start, start + self.clip_len) + inds = np.mod(inds, num_frames) + + results['frame_inds'] = inds.astype(int) + results['clip_len'] = self.clip_len + results['frame_interval'] = None + results['num_clips'] = self.num_clips + return results + + +@PIPELINES.register_module() +class PoseNormalize: + """Normalize the range of keypoint values to [-1,1]. + + Args: + mean (list | tuple): The mean value of the keypoint values. + min_value (list | tuple): The minimum value of the keypoint values. + max_value (list | tuple): The maximum value of the keypoint values. + """ + + def __init__(self, + mean=(960., 540., 0.5), + min_value=(0., 0., 0.), + max_value=(1920, 1080, 1.)): + self.mean = np.array(mean, dtype=np.float32).reshape(-1, 1, 1, 1) + self.min_value = np.array( + min_value, dtype=np.float32).reshape(-1, 1, 1, 1) + self.max_value = np.array( + max_value, dtype=np.float32).reshape(-1, 1, 1, 1) + + def __call__(self, results): + keypoint = results['keypoint'] + keypoint = (keypoint - self.mean) / (self.max_value - self.min_value) + results['keypoint'] = keypoint + results['keypoint_norm_cfg'] = dict( + mean=self.mean, min_value=self.min_value, max_value=self.max_value) + return results diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pose_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pose_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..2bbea4c4345ab8085ec202bae5d37e6e28459fc2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/pose_dataset.py @@ -0,0 +1,113 @@ +# Copyright (c) OpenMMLab. All rights reserved.
+import os.path as osp + +import mmcv +import numpy as np + +from ..utils import get_root_logger +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class PoseDataset(BaseDataset): + """Pose dataset for action recognition. + + The dataset loads pose and applies specified transforms to return a + dict containing pose information. + + The ann_file is a pickle file that contains a list of + annotations; the fields of an annotation include frame_dir (video_id), + total_frames, label, kp, kpscore. + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + split (str | None): The dataset split used. Only applicable to UCF or + HMDB. Allowed choices are 'train1', 'test1', 'train2', 'test2', + 'train3', 'test3'. Default: None. + valid_ratio (float | None): The valid_ratio for videos in KineticsPose. + For a video with n frames, it is a valid training sample only if + n * valid_ratio frames have human pose. None means not applicable + (only applicable to Kinetics Pose). Default: None. + box_thr (str | None): The threshold for human proposals. Only boxes + with confidence score larger than `box_thr` are kept. None means + not applicable (only applicable to Kinetics Pose [ours]). Allowed + choices are '0.5', '0.6', '0.7', '0.8', '0.9'. Default: None. + class_prob (dict | None): The per class sampling probability. If not + None, it will override the class_prob calculated in + BaseDataset.__init__(). Default: None. + **kwargs: Keyword arguments for ``BaseDataset``.
+ """ + + def __init__(self, + ann_file, + pipeline, + split=None, + valid_ratio=None, + box_thr=None, + class_prob=None, + **kwargs): + modality = 'Pose' + # split, applicable to ucf or hmdb + self.split = split + + super().__init__( + ann_file, pipeline, start_index=0, modality=modality, **kwargs) + + # box_thr, which should be a string + self.box_thr = box_thr + if self.box_thr is not None: + assert box_thr in ['0.5', '0.6', '0.7', '0.8', '0.9'] + + # Thresholding Training Examples + self.valid_ratio = valid_ratio + if self.valid_ratio is not None: + assert isinstance(self.valid_ratio, float) + if self.box_thr is None: + self.video_infos = [ + x for x in self.video_infos + if x['valid_frames'] / x['total_frames'] >= valid_ratio + ] + else: + key = f'valid@{self.box_thr}' + self.video_infos = [ + x for x in self.video_infos + if x[key] / x['total_frames'] >= valid_ratio + ] + if self.box_thr != '0.5': + box_thr = float(self.box_thr) + for item in self.video_infos: + inds = [ + i for i, score in enumerate(item['box_score']) + if score >= box_thr + ] + item['anno_inds'] = np.array(inds) + + if class_prob is not None: + self.class_prob = class_prob + + logger = get_root_logger() + logger.info(f'{len(self)} videos remain after valid thresholding') + + def load_annotations(self): + """Load annotation file to get video information.""" + assert self.ann_file.endswith('.pkl') + return self.load_pkl_annotations() + + def load_pkl_annotations(self): + data = mmcv.load(self.ann_file) + + if self.split: + split, data = data['split'], data['annotations'] + identifier = 'filename' if 'filename' in data[0] else 'frame_dir' + data = [x for x in data if x[identifier] in split[self.split]] + + for item in data: + # Sometimes we may need to load anno from the file + if 'filename' in item: + item['filename'] = osp.join(self.data_prefix, item['filename']) + if 'frame_dir' in item: + item['frame_dir'] = osp.join(self.data_prefix, + item['frame_dir']) + return data
diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/rawframe_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/rawframe_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..9359e117b7f52bc234b0e389de0b731e96c9e8db --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/rawframe_dataset.py @@ -0,0 +1,212 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os.path as osp + +import torch + +from mmaction.datasets.pipelines import Resize +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class RawframeDataset(BaseDataset): + """Rawframe dataset for action recognition. + + The dataset loads raw frames and applies specified transforms to return a + dict containing the frame tensors and other information. + + The ann_file is a text file with multiple lines, and each line indicates + the directory to frames of a video, total frames of the video and + the label of a video, which are split with a whitespace. + Example of an annotation file: + + .. code-block:: txt + + some/directory-1 163 1 + some/directory-2 122 1 + some/directory-3 258 2 + some/directory-4 234 2 + some/directory-5 295 3 + some/directory-6 121 3 + + Example of a multi-class annotation file: + + + .. code-block:: txt + + some/directory-1 163 1 3 5 + some/directory-2 122 1 2 + some/directory-3 258 2 + some/directory-4 234 2 4 6 8 + some/directory-5 295 3 + some/directory-6 121 3 + + Example of a with_offset annotation file (clips from long videos), each + line indicates the directory to frames of a video, the index of the start + frame, total frames of the video clip and the label of a video clip, which + are split with a whitespace. + + + ..
code-block:: txt + + some/directory-1 12 163 3 + some/directory-2 213 122 4 + some/directory-3 100 258 5 + some/directory-4 98 234 2 + some/directory-5 0 295 3 + some/directory-6 50 121 3 + + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + data_prefix (str | None): Path to a directory where videos are held. + Default: None. + test_mode (bool): Store True when building test or validation dataset. + Default: False. + filename_tmpl (str): Template for each filename. + Default: 'img_{:05}.jpg'. + with_offset (bool): Determines whether the offset information is in + ann_file. Default: False. + multi_class (bool): Determines whether it is a multi-class + recognition dataset. Default: False. + num_classes (int | None): Number of classes in the dataset. + Default: None. + modality (str): Modality of data. Support 'RGB', 'Flow'. + Default: 'RGB'. + sample_by_class (bool): Sampling by class, should be set `True` when + performing inter-class data balancing. Only compatible with + `multi_class == False`. Only applies for training. Default: False. + power (float): We support sampling data with the probability + proportional to the power of its label frequency (freq ^ power) + when sampling data. `power == 1` indicates uniformly sampling all + data; `power == 0` indicates uniformly sampling all classes. + Default: 0. + dynamic_length (bool): If the dataset length is dynamic (used by + ClassSpecificDistributedSampler). Default: False. 
+ """ + + def __init__(self, + ann_file, + pipeline, + data_prefix=None, + test_mode=False, + filename_tmpl='img_{:05}.jpg', + with_offset=False, + multi_class=False, + num_classes=None, + start_index=1, + modality='RGB', + sample_by_class=False, + power=0., + dynamic_length=False, + **kwargs): + self.filename_tmpl = filename_tmpl + self.with_offset = with_offset + super().__init__( + ann_file, + pipeline, + data_prefix, + test_mode, + multi_class, + num_classes, + start_index, + modality, + sample_by_class=sample_by_class, + power=power, + dynamic_length=dynamic_length) + self.short_cycle_factors = kwargs.get('short_cycle_factors', + [0.5, 0.7071]) + self.default_s = kwargs.get('default_s', (224, 224)) + + def load_annotations(self): + """Load annotation file to get video information.""" + if self.ann_file.endswith('.json'): + return self.load_json_annotations() + video_infos = [] + with open(self.ann_file, 'r') as fin: + for line in fin: + line_split = line.strip().split() + video_info = {} + idx = 0 + # idx for frame_dir + frame_dir = line_split[idx] + if self.data_prefix is not None: + frame_dir = osp.join(self.data_prefix, frame_dir) + video_info['frame_dir'] = frame_dir + idx += 1 + if self.with_offset: + # idx for offset and total_frames + video_info['offset'] = int(line_split[idx]) + video_info['total_frames'] = int(line_split[idx + 1]) + idx += 2 + else: + # idx for total_frames + video_info['total_frames'] = int(line_split[idx]) + idx += 1 + # idx for label[s] + label = [int(x) for x in line_split[idx:]] + assert label, f'missing label in line: {line}' + if self.multi_class: + assert self.num_classes is not None + video_info['label'] = label + else: + assert len(label) == 1 + video_info['label'] = label[0] + video_infos.append(video_info) + + return video_infos + + def prepare_train_frames(self, idx): + """Prepare the frames for training given the index.""" + + def pipeline_for_a_sample(idx): + results = copy.deepcopy(self.video_infos[idx]) + 
results['filename_tmpl'] = self.filename_tmpl + results['modality'] = self.modality + results['start_index'] = self.start_index + + # prepare tensor in getitem + if self.multi_class: + onehot = torch.zeros(self.num_classes) + onehot[results['label']] = 1. + results['label'] = onehot + + return self.pipeline(results) + + if isinstance(idx, tuple): + index, short_cycle_idx = idx + last_resize = None + for trans in self.pipeline.transforms: + if isinstance(trans, Resize): + last_resize = trans + origin_scale = self.default_s + long_cycle_scale = last_resize.scale + + if short_cycle_idx in [0, 1]: + # 0 and 1 is hard-coded as PySlowFast + scale_ratio = self.short_cycle_factors[short_cycle_idx] + target_scale = tuple( + [int(round(scale_ratio * s)) for s in origin_scale]) + last_resize.scale = target_scale + res = pipeline_for_a_sample(index) + last_resize.scale = long_cycle_scale + return res + else: + return pipeline_for_a_sample(idx) + + def prepare_test_frames(self, idx): + """Prepare the frames for testing given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + results['filename_tmpl'] = self.filename_tmpl + results['modality'] = self.modality + results['start_index'] = self.start_index + + # prepare tensor in getitem + if self.multi_class: + onehot = torch.zeros(self.num_classes) + onehot[results['label']] = 1. + results['label'] = onehot + + return self.pipeline(results) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/rawvideo_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/rawvideo_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..7199f1dff17555f9fa8f29cde78320b1253bfc41 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/rawvideo_dataset.py @@ -0,0 +1,147 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import os.path as osp +import random + +import mmcv + +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class RawVideoDataset(BaseDataset): + """RawVideo dataset for action recognition, used in the Project OmniSource. + + The dataset loads clips of raw videos and applies specified transforms to + return a dict containing the frame tensors and other information. Note that + for this dataset, `multi_class` should be False. + + The ann_file is a text file with multiple lines, and each line indicates + a sample video with the filepath (without suffix), label, number of clips + and index of positive clips (starting from 0), which are split with a + whitespace. Raw videos should be first trimmed into 10 second clips, + organized in the following format: + + .. code-block:: txt + + some/path/D32_1gwq35E/part_0.mp4 + some/path/D32_1gwq35E/part_1.mp4 + ...... + some/path/D32_1gwq35E/part_n.mp4 + + Example of an annotation file: + + .. code-block:: txt + + some/path/D32_1gwq35E 66 10 0 1 2 + some/path/-G-5CJ0JkKY 254 5 3 4 + some/path/T4h1bvOd9DA 33 1 0 + some/path/4uZ27ivBl00 341 2 0 1 + some/path/0LfESFkfBSw 186 234 7 9 11 + some/path/-YIsNpBEx6c 169 100 9 10 11 + + The first line indicates that the raw video `some/path/D32_1gwq35E` has + action label `66` and consists of 10 clips (from `part_0.mp4` to + `part_9.mp4`). The 1st, 2nd and 3rd clips are positive clips. + + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + sampling_strategy (str): The strategy to sample clips from raw videos. + Choices are 'random' or 'positive'. Default: 'positive'. + clipname_tmpl (str): The template of clip name in the raw video. + Default: 'part_{}.mp4'. + **kwargs: Keyword arguments for ``BaseDataset``.
+ """ + + def __init__(self, + ann_file, + pipeline, + clipname_tmpl='part_{}.mp4', + sampling_strategy='positive', + **kwargs): + super().__init__(ann_file, pipeline, start_index=0, **kwargs) + assert self.multi_class is False + self.sampling_strategy = sampling_strategy + self.clipname_tmpl = clipname_tmpl + # If positive, we should only keep those raw videos with positive + # clips + if self.sampling_strategy == 'positive': + self.video_infos = [ + x for x in self.video_infos if len(x['positive_clip_inds']) + ] + + # do not support multi_class + def load_annotations(self): + """Load annotation file to get video information.""" + if self.ann_file.endswith('.json'): + return self.load_json_annotations() + + video_infos = [] + with open(self.ann_file, 'r') as fin: + for line in fin: + line_split = line.strip().split() + video_dir = line_split[0] + label = int(line_split[1]) + num_clips = int(line_split[2]) + positive_clip_inds = [int(ind) for ind in line_split[3:]] + + if self.data_prefix is not None: + video_dir = osp.join(self.data_prefix, video_dir) + video_infos.append( + dict( + video_dir=video_dir, + label=label, + num_clips=num_clips, + positive_clip_inds=positive_clip_inds)) + return video_infos + + # do not support multi_class + def load_json_annotations(self): + """Load json annotation file to get video information.""" + video_infos = mmcv.load(self.ann_file) + num_videos = len(video_infos) + path_key = 'video_dir' + for i in range(num_videos): + if self.data_prefix is not None: + path_value = video_infos[i][path_key] + path_value = osp.join(self.data_prefix, path_value) + video_infos[i][path_key] = path_value + return video_infos + + def sample_clip(self, results): + """Sample a clip from the raw video given the sampling strategy.""" + assert self.sampling_strategy in ['positive', 'random'] + if self.sampling_strategy == 'positive': + assert results['positive_clip_inds'] + ind = random.choice(results['positive_clip_inds']) + else: + ind = 
random.randint(0, results['num_clips'] - 1) + clipname = self.clipname_tmpl.format(ind) + + # if the first char of self.clipname_tmpl is a letter, use osp.join; + # otherwise, directly concat them + if self.clipname_tmpl[0].isalpha(): + filename = osp.join(results['video_dir'], clipname) + else: + filename = results['video_dir'] + clipname + results['filename'] = filename + return results + + def prepare_train_frames(self, idx): + """Prepare the frames for training given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + results = self.sample_clip(results) + results['modality'] = self.modality + results['start_index'] = self.start_index + return self.pipeline(results) + + def prepare_test_frames(self, idx): + """Prepare the frames for testing given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + results = self.sample_clip(results) + results['modality'] = self.modality + results['start_index'] = self.start_index + return self.pipeline(results) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/samplers/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/samplers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..00dfae83de84c5a6a8b703e1d362b68c94185fda --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/samplers/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .distributed_sampler import (ClassSpecificDistributedSampler, + DistributedSampler) + +__all__ = ['DistributedSampler', 'ClassSpecificDistributedSampler'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/samplers/distributed_sampler.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/samplers/distributed_sampler.py new file mode 100644 index 0000000000000000000000000000000000000000..1d54079de06f2f54ce50830fcda8be94df13c679 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/samplers/distributed_sampler.py @@ -0,0 +1,142 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import math +from collections import defaultdict + +import torch +from torch.utils.data import DistributedSampler as _DistributedSampler + +from mmaction.core import sync_random_seed + + +class DistributedSampler(_DistributedSampler): + """DistributedSampler inheriting from + ``torch.utils.data.DistributedSampler``. + + In pytorch of lower versions, there is no ``shuffle`` argument. This child + class will port one to DistributedSampler. + """ + + def __init__(self, + dataset, + num_replicas=None, + rank=None, + shuffle=True, + seed=0): + super().__init__( + dataset, num_replicas=num_replicas, rank=rank, shuffle=shuffle) + # for the compatibility from PyTorch 1.3+ + # In distributed sampling, different ranks should sample non-overlapped + # data in the dataset. Therefore, this function is used to make sure + # that each rank shuffles the data indices in the same order based + # on the same seed. Then different ranks could use different indices + # to select non-overlapped data from the same data list. 
+ self.seed = sync_random_seed(seed) if seed is not None else 0 + + def __iter__(self): + # deterministically shuffle based on epoch + if self.shuffle: + g = torch.Generator() + g.manual_seed(self.epoch + self.seed) + indices = torch.randperm(len(self.dataset), generator=g).tolist() + else: + indices = torch.arange(len(self.dataset)).tolist() + + # add extra samples to make it evenly divisible + indices += indices[:(self.total_size - len(indices))] + assert len(indices) == self.total_size + + # subsample + indices = indices[self.rank:self.total_size:self.num_replicas] + assert len(indices) == self.num_samples + return iter(indices) + + +class ClassSpecificDistributedSampler(_DistributedSampler): + """ClassSpecificDistributedSampler inheriting from + ``torch.utils.data.DistributedSampler``. + + Samples are sampled with a class specific probability, which should be an + attribute of the dataset (dataset.class_prob, which is a dictionary that + maps label index to the prob). This sampler is only applicable to + single-class recognition datasets. This sampler is also compatible with + RepeatDataset. + + The default value of dynamic_length is True, which means we use + oversampling / subsampling, and the dataset length may change. If + dynamic_length is set as False, the dataset length is fixed.
+ """ + + def __init__(self, + dataset, + num_replicas=None, + rank=None, + dynamic_length=True, + shuffle=True, + seed=0): + super().__init__(dataset, num_replicas=num_replicas, rank=rank) + self.shuffle = shuffle + + if type(dataset).__name__ == 'RepeatDataset': + dataset = dataset.dataset + + assert hasattr(dataset, 'class_prob') + + self.class_prob = dataset.class_prob + self.dynamic_length = dynamic_length + # for the compatibility from PyTorch 1.3+ + self.seed = seed if seed is not None else 0 + + def __iter__(self): + g = torch.Generator() + g.manual_seed(self.seed + self.epoch) + + class_indices = defaultdict(list) + + # To be compatible with RepeatDataset + times = 1 + dataset = self.dataset + if type(dataset).__name__ == 'RepeatDataset': + times = dataset.times + dataset = dataset.dataset + for i, item in enumerate(dataset.video_infos): + class_indices[item['label']].append(i) + + if self.dynamic_length: + indices = [] + for k, prob in self.class_prob.items(): + prob = prob * times + for i in range(int(prob // 1)): + indices.extend(class_indices[k]) + rem = int((prob % 1) * len(class_indices[k])) + rem_indices = torch.randperm( + len(class_indices[k]), generator=g).tolist()[:rem] + indices.extend(rem_indices) + if self.shuffle: + shuffle = torch.randperm(len(indices), generator=g).tolist() + indices = [indices[i] for i in shuffle] + + # re-calc num_samples & total_size + self.num_samples = math.ceil(len(indices) / self.num_replicas) + self.total_size = self.num_samples * self.num_replicas + else: + # We want to keep the dataloader length same as original + video_labels = [x['label'] for x in dataset.video_infos] + probs = [ + self.class_prob[lb] / len(class_indices[lb]) + for lb in video_labels + ] + + indices = torch.multinomial( + torch.Tensor(probs), + self.total_size, + replacement=True, + generator=g) + indices = indices.data.numpy().tolist() + + indices += indices[:(self.total_size - len(indices))] + assert len(indices) == self.total_size + + # 
retrieve indices for current process + indices = indices[self.rank:self.total_size:self.num_replicas] + assert len(indices) == self.num_samples + return iter(indices) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/ssn_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/ssn_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..065c7422b28b5c35dcd1dbd8bf3c4e85d162213b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/ssn_dataset.py @@ -0,0 +1,882 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os.path as osp +import warnings +from collections import OrderedDict + +import mmcv +import numpy as np +from torch.nn.modules.utils import _pair + +from ..core import softmax +from ..localization import (eval_ap, load_localize_proposal_file, + perform_regression, temporal_iou, temporal_nms) +from ..utils import get_root_logger +from .base import BaseDataset +from .builder import DATASETS + + +class SSNInstance: + """Proposal instance of SSN. + + Args: + start_frame (int): Index of the proposal's start frame. + end_frame (int): Index of the proposal's end frame. + num_video_frames (int): Total frames of the video. + label (int | None): The category label of the proposal. Default: None. + best_iou (float): The highest IOU with the groundtruth instance. + Default: 0. + overlap_self (float): Percent of the proposal's own span contained + in a groundtruth instance. Default: 0. + """ + + def __init__(self, + start_frame, + end_frame, + num_video_frames, + label=None, + best_iou=0, + overlap_self=0): + self.start_frame = start_frame + self.end_frame = min(end_frame, num_video_frames) + self.num_video_frames = num_video_frames + self.label = label if label is not None else -1 + self.coverage = (end_frame - start_frame) / num_video_frames + self.best_iou = best_iou + self.overlap_self = overlap_self + self.loc_reg = None + self.size_reg = None + self.regression_targets = [0., 0.] 
+ + def compute_regression_targets(self, gt_list): + """Compute regression targets of positive proposals. + + Args: + gt_list (list): The list of groundtruth instances. + """ + # Find the groundtruth instance with the highest IOU. + ious = [ + temporal_iou(self.start_frame, self.end_frame, gt.start_frame, + gt.end_frame) for gt in gt_list + ] + best_gt = gt_list[np.argmax(ious)] + + # interval: [start_frame, end_frame) + proposal_center = (self.start_frame + self.end_frame - 1) / 2 + gt_center = (best_gt.start_frame + best_gt.end_frame - 1) / 2 + proposal_size = self.end_frame - self.start_frame + gt_size = best_gt.end_frame - best_gt.start_frame + + # Get regression targets: + # (1). Localization regression target: + # center shift proportional to the proposal duration + # (2). Duration/Size regression target: + # logarithm of the groundtruth duration over proposal duration + + self.loc_reg = (gt_center - proposal_center) / proposal_size + self.size_reg = np.log(gt_size / proposal_size) + self.regression_targets = ([self.loc_reg, self.size_reg] + if self.loc_reg is not None else [0., 0.]) + + +@DATASETS.register_module() +class SSNDataset(BaseDataset): + """Proposal frame dataset for Structured Segment Networks. + + Based on proposal information, the dataset loads raw frames and applies + specified transforms to return a dict containing the frame tensors and + other information. + + The ann_file is a text file with multiple lines and each + video's information takes up several lines. This file can be a normalized + file with percent or standard file with specific frame indexes. If the file + is a normalized file, it will be converted into a standard file first. + + Template information of a video in a standard file: + .. code-block:: txt + # index + video_id + num_frames + fps + num_gts + label, start_frame, end_frame + label, start_frame, end_frame + ... 
+ num_proposals + label, best_iou, overlap_self, start_frame, end_frame + label, best_iou, overlap_self, start_frame, end_frame + ... + + Example of a standard annotation file: + .. code-block:: txt + # 0 + video_validation_0000202 + 5666 + 1 + 3 + 8 130 185 + 8 832 1136 + 8 1303 1381 + 5 + 8 0.0620 0.0620 790 5671 + 8 0.1656 0.1656 790 2619 + 8 0.0833 0.0833 3945 5671 + 8 0.0960 0.0960 4173 5671 + 8 0.0614 0.0614 3327 5671 + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms. + train_cfg (dict): Config for training. + test_cfg (dict): Config for testing. + data_prefix (str): Path to a directory where videos are held. + test_mode (bool): Store True when building test or validation dataset. + Default: False. + filename_tmpl (str): Template for each filename. + Default: 'img_{:05}.jpg'. + start_index (int): Specify a start index for frames in consideration of + different filename format. Default: 1. + modality (str): Modality of data. Support 'RGB', 'Flow'. + Default: 'RGB'. + video_centric (bool): Whether to sample proposals just from + this video or sample proposals randomly from the entire dataset. + Default: True. + reg_normalize_constants (list): Regression target normalized constants, + including mean and standard deviation of location and duration. + body_segments (int): Number of segments in course period. + Default: 5. + aug_segments (list[int]): Number of segments in starting and + ending period. Default: (2, 2). + aug_ratio (int | float | tuple[int | float]): The ratio of the length + of augmentation to that of the proposal. Default: (0.5, 0.5). + clip_len (int): Frames of each sampled output clip. + Default: 1. + frame_interval (int): Temporal interval of adjacent sampled frames. + Default: 1. + filter_gt (bool): Whether to filter videos with no annotation + during training. Default: True. + use_regression (bool): Whether to perform regression. Default: True. 
+ verbose (bool): Whether to print full information or not. + Default: False. + """ + + def __init__(self, + ann_file, + pipeline, + train_cfg, + test_cfg, + data_prefix, + test_mode=False, + filename_tmpl='img_{:05d}.jpg', + start_index=1, + modality='RGB', + video_centric=True, + reg_normalize_constants=None, + body_segments=5, + aug_segments=(2, 2), + aug_ratio=(0.5, 0.5), + clip_len=1, + frame_interval=1, + filter_gt=True, + use_regression=True, + verbose=False): + self.logger = get_root_logger() + super().__init__( + ann_file, + pipeline, + data_prefix=data_prefix, + test_mode=test_mode, + start_index=start_index, + modality=modality) + self.train_cfg = train_cfg + self.test_cfg = test_cfg + self.assigner = train_cfg.ssn.assigner + self.sampler = train_cfg.ssn.sampler + self.evaluater = test_cfg.ssn.evaluater + self.verbose = verbose + self.filename_tmpl = filename_tmpl + + if filter_gt or not test_mode: + valid_inds = [ + i for i, video_info in enumerate(self.video_infos) + if len(video_info['gts']) > 0 + ] + self.logger.info(f'{len(valid_inds)} out of {len(self.video_infos)} ' + f'videos are valid.') + self.video_infos = [self.video_infos[i] for i in valid_inds] + + # construct three pools: + # 1. Positive(Foreground) + # 2. Background + # 3. 
Incomplete + self.positive_pool = [] + self.background_pool = [] + self.incomplete_pool = [] + self.construct_proposal_pools() + + if reg_normalize_constants is None: + self.reg_norm_consts = self._compute_reg_normalize_constants() + else: + self.reg_norm_consts = reg_normalize_constants + self.video_centric = video_centric + self.body_segments = body_segments + self.aug_segments = aug_segments + self.aug_ratio = _pair(aug_ratio) + if not mmcv.is_tuple_of(self.aug_ratio, (int, float)): + raise TypeError(f'aug_ratio should be int, float' + f'or tuple of int and float, ' + f'but got {type(aug_ratio)}') + assert len(self.aug_ratio) == 2 + + total_ratio = ( + self.sampler.positive_ratio + self.sampler.background_ratio + + self.sampler.incomplete_ratio) + self.positive_per_video = int( + self.sampler.num_per_video * + (self.sampler.positive_ratio / total_ratio)) + self.background_per_video = int( + self.sampler.num_per_video * + (self.sampler.background_ratio / total_ratio)) + self.incomplete_per_video = ( + self.sampler.num_per_video - self.positive_per_video - + self.background_per_video) + + self.test_interval = self.test_cfg.ssn.sampler.test_interval + # number of consecutive frames + self.clip_len = clip_len + # number of steps (sparse sampling for efficiency of io) + self.frame_interval = frame_interval + + # test mode or not + self.filter_gt = filter_gt + self.use_regression = use_regression + self.test_mode = test_mode + + # yapf: disable + if self.verbose: + self.logger.info(f""" + SSNDataset: proposal file {self.proposal_file} parsed. + + There are {len(self.positive_pool) + len(self.background_pool) + + len(self.incomplete_pool)} usable proposals from {len(self.video_infos)} videos. 
+ {len(self.positive_pool)} positive proposals + {len(self.incomplete_pool)} incomplete proposals + {len(self.background_pool)} background proposals + + Sample config: + FG/BG/INCOMP: {self.positive_per_video}/{self.background_per_video}/{self.incomplete_per_video} # noqa:E501 + Video Centric: {self.video_centric} + + Regression Normalization Constants: + Location: mean {self.reg_norm_consts[0][0]:.05f} std {self.reg_norm_consts[1][0]:.05f} # noqa: E501 + Duration: mean {self.reg_norm_consts[0][1]:.05f} std {self.reg_norm_consts[1][1]:.05f} # noqa: E501 + """) + # yapf: enable + else: + self.logger.info( + f'SSNDataset: proposal file {self.proposal_file} parsed.') + + def load_annotations(self): + """Load annotation file to get video information.""" + video_infos = [] + if 'normalized_' in self.ann_file: + self.proposal_file = self.ann_file.replace('normalized_', '') + if not osp.exists(self.proposal_file): + raise Exception(f'Please refer to `$MMACTION2/tools/data` to' + f'denormalize {self.ann_file}.') + else: + self.proposal_file = self.ann_file + proposal_infos = load_localize_proposal_file(self.proposal_file) + # proposal_info:[video_id, num_frames, gt_list, proposal_list] + # gt_list member: [label, start_frame, end_frame] + # proposal_list member: [label, best_iou, overlap_self, + # start_frame, end_frame] + for proposal_info in proposal_infos: + if self.data_prefix is not None: + frame_dir = osp.join(self.data_prefix, proposal_info[0]) + num_frames = int(proposal_info[1]) + # gts:start, end, num_frames, class_label, tIoU=1 + gts = [] + for x in proposal_info[2]: + if int(x[2]) > int(x[1]) and int(x[1]) < num_frames: + ssn_instance = SSNInstance( + int(x[1]), + int(x[2]), + num_frames, + label=int(x[0]), + best_iou=1.0) + gts.append(ssn_instance) + # proposals:start, end, num_frames, class_label + # tIoU=best_iou, overlap_self + proposals = [] + for x in proposal_info[3]: + if int(x[4]) > int(x[3]) and int(x[3]) < num_frames: + ssn_instance = SSNInstance( + 
int(x[3]), + int(x[4]), + num_frames, + label=int(x[0]), + best_iou=float(x[1]), + overlap_self=float(x[2])) + proposals.append(ssn_instance) + video_infos.append( + dict( + frame_dir=frame_dir, + video_id=proposal_info[0], + total_frames=num_frames, + gts=gts, + proposals=proposals)) + return video_infos + + def results_to_detections(self, results, top_k=2000, **kwargs): + """Convert prediction results into detections. + + Args: + results (list): Prediction results. + top_k (int): Number of top results. Default: 2000. + + Returns: + list: Detection results. + """ + num_classes = results[0]['activity_scores'].shape[1] - 1 + detections = [dict() for _ in range(num_classes)] + + for idx in range(len(self)): + video_id = self.video_infos[idx]['video_id'] + relative_proposals = results[idx]['relative_proposal_list'] + if len(relative_proposals[0].shape) == 3: + relative_proposals = np.squeeze(relative_proposals, 0) + + activity_scores = results[idx]['activity_scores'] + completeness_scores = results[idx]['completeness_scores'] + regression_scores = results[idx]['bbox_preds'] + if regression_scores is None: + regression_scores = np.zeros( + (len(relative_proposals), num_classes, 2), + dtype=np.float32) + regression_scores = regression_scores.reshape((-1, num_classes, 2)) + + if top_k <= 0: + combined_scores = ( + softmax(activity_scores[:, 1:], dim=1) * + np.exp(completeness_scores)) + for i in range(num_classes): + center_scores = regression_scores[:, i, 0][:, None] + duration_scores = regression_scores[:, i, 1][:, None] + detections[i][video_id] = np.concatenate( + (relative_proposals, combined_scores[:, i][:, None], + center_scores, duration_scores), + axis=1) + else: + combined_scores = ( + softmax(activity_scores[:, 1:], dim=1) * + np.exp(completeness_scores)) + keep_idx = np.argsort(combined_scores.ravel())[-top_k:] + for k in keep_idx: + class_idx = k % num_classes + proposal_idx = k // num_classes + new_item = [ + relative_proposals[proposal_idx, 0], + 
relative_proposals[proposal_idx, + 1], combined_scores[proposal_idx, + class_idx], + regression_scores[proposal_idx, class_idx, + 0], regression_scores[proposal_idx, + class_idx, 1] + ] + if video_id not in detections[class_idx]: + detections[class_idx][video_id] = np.array([new_item]) + else: + detections[class_idx][video_id] = np.vstack( + [detections[class_idx][video_id], new_item]) + + return detections + + def evaluate(self, + results, + metrics='mAP', + metric_options=dict(mAP=dict(eval_dataset='thumos14')), + logger=None, + **deprecated_kwargs): + """Evaluation in SSN proposal dataset. + + Args: + results (list[dict]): Output results. + metrics (str | sequence[str]): Metrics to be performed. + Defaults: 'mAP'. + metric_options (dict): Dict for metric options. Options are + ``eval_dataset`` for ``mAP``. + Default: ``dict(mAP=dict(eval_dataset='thumos14'))``. + logger (logging.Logger | None): Logger for recording. + Default: None. + deprecated_kwargs (dict): Used for containing deprecated arguments. + See 'https://github.com/open-mmlab/mmaction2/pull/286'. + + Returns: + dict: Evaluation results for evaluation metrics. 
+ """ + # Protect ``metric_options`` since it uses mutable value as default + metric_options = copy.deepcopy(metric_options) + + if deprecated_kwargs != {}: + warnings.warn( + 'Option arguments for metrics has been changed to ' + "`metric_options`, See 'https://github.com/open-mmlab/mmaction2/pull/286' " # noqa: E501 + 'for more details') + metric_options['mAP'] = dict(metric_options['mAP'], + **deprecated_kwargs) + + if not isinstance(results, list): + raise TypeError(f'results must be a list, but got {type(results)}') + assert len(results) == len(self), ( + f'The length of results is not equal to the dataset len: ' + f'{len(results)} != {len(self)}') + + metrics = metrics if isinstance(metrics, (list, tuple)) else [metrics] + allowed_metrics = ['mAP'] + for metric in metrics: + if metric not in allowed_metrics: + raise KeyError(f'metric {metric} is not supported') + + detections = self.results_to_detections(results, **self.evaluater) + + if self.use_regression: + self.logger.info('Performing location regression') + for class_idx, _ in enumerate(detections): + detections[class_idx] = { + k: perform_regression(v) + for k, v in detections[class_idx].items() + } + self.logger.info('Regression finished') + + self.logger.info('Performing NMS') + for class_idx, _ in enumerate(detections): + detections[class_idx] = { + k: temporal_nms(v, self.evaluater.nms) + for k, v in detections[class_idx].items() + } + self.logger.info('NMS finished') + + # get gts + all_gts = self.get_all_gts() + for class_idx, _ in enumerate(detections): + if class_idx not in all_gts: + all_gts[class_idx] = dict() + + # get predictions + plain_detections = {} + for class_idx, _ in enumerate(detections): + detection_list = [] + for video, dets in detections[class_idx].items(): + detection_list.extend([[video, class_idx] + x[:3] + for x in dets.tolist()]) + plain_detections[class_idx] = detection_list + + eval_results = OrderedDict() + for metric in metrics: + if metric == 'mAP': + eval_dataset = 
metric_options.setdefault('mAP', {}).setdefault( + 'eval_dataset', 'thumos14') + if eval_dataset == 'thumos14': + iou_range = np.arange(0.1, 1.0, .1) + ap_values = eval_ap(plain_detections, all_gts, iou_range) + map_ious = ap_values.mean(axis=0) + self.logger.info('Evaluation finished') + + for iou, map_iou in zip(iou_range, map_ious): + eval_results[f'mAP@{iou:.02f}'] = map_iou + + return eval_results + + def construct_proposal_pools(self): + """Construct positive proposal pool, incomplete proposal pool and + background proposal pool of the entire dataset.""" + for video_info in self.video_infos: + positives = self.get_positives( + video_info['gts'], video_info['proposals'], + self.assigner.positive_iou_threshold, + self.sampler.add_gt_as_proposals) + self.positive_pool.extend([(video_info['video_id'], proposal) + for proposal in positives]) + + incompletes, backgrounds = self.get_negatives( + video_info['proposals'], + self.assigner.incomplete_iou_threshold, + self.assigner.background_iou_threshold, + self.assigner.background_coverage_threshold, + self.assigner.incomplete_overlap_threshold) + self.incomplete_pool.extend([(video_info['video_id'], proposal) + for proposal in incompletes]) + self.background_pool.extend([video_info['video_id'], proposal] + for proposal in backgrounds) + + def get_all_gts(self): + """Fetch groundtruth instances of the entire dataset.""" + gts = {} + for video_info in self.video_infos: + video = video_info['video_id'] + for gt in video_info['gts']: + class_idx = gt.label - 1 + # gt_info: [relative_start, relative_end] + gt_info = [ + gt.start_frame / video_info['total_frames'], + gt.end_frame / video_info['total_frames'] + ] + gts.setdefault(class_idx, {}).setdefault(video, + []).append(gt_info) + + return gts + + @staticmethod + def get_positives(gts, proposals, positive_threshold, with_gt=True): + """Get positive/foreground proposals. + + Args: + gts (list): List of groundtruth instances(:obj:`SSNInstance`). 
+ proposals (list): List of proposal instances(:obj:`SSNInstance`). + positive_threshold (float): Minimum threshold of overlap of + positive/foreground proposals and groundtruths. + with_gt (bool): Whether to include groundtruth instances in + positive proposals. Default: True. + + Returns: + list[:obj:`SSNInstance`]: (positives), positives is a list + comprised of positive proposal instances. + """ + positives = [ + proposal for proposal in proposals + if proposal.best_iou > positive_threshold + ] + + if with_gt: + positives.extend(gts) + + for proposal in positives: + proposal.compute_regression_targets(gts) + + return positives + + @staticmethod + def get_negatives(proposals, + incomplete_iou_threshold, + background_iou_threshold, + background_coverage_threshold=0.01, + incomplete_overlap_threshold=0.7): + """Get negative proposals, including incomplete proposals and + background proposals. + + Args: + proposals (list): List of proposal instances(:obj:`SSNInstance`). + incomplete_iou_threshold (float): Maximum threshold of overlap + of incomplete proposals and groundtruths. + background_iou_threshold (float): Maximum threshold of overlap + of background proposals and groundtruths. + background_coverage_threshold (float): Minimum coverage + of background proposals in video duration. Default: 0.01. + incomplete_overlap_threshold (float): Minimum percent of incomplete + proposals' own span contained in a groundtruth instance. + Default: 0.7. + + Returns: + list[:obj:`SSNInstance`]: (incompletes, backgrounds), incompletes + and backgrounds are lists comprised of incomplete + proposal instances and background proposal instances. 
+ """ + incompletes = [] + backgrounds = [] + + for proposal in proposals: + if (proposal.best_iou < incomplete_iou_threshold + and proposal.overlap_self > incomplete_overlap_threshold): + incompletes.append(proposal) + elif (proposal.best_iou < background_iou_threshold + and proposal.coverage > background_coverage_threshold): + backgrounds.append(proposal) + + return incompletes, backgrounds + + def _video_centric_sampling(self, record): + """Sample proposals from this video instance. + + Args: + record (dict): Information of the video instance(video_info[idx]). + key: frame_dir, video_id, total_frames, + gts: List of groundtruth instances(:obj:`SSNInstance`). + proposals: List of proposal instances(:obj:`SSNInstance`). + """ + positives = self.get_positives(record['gts'], record['proposals'], + self.assigner.positive_iou_threshold, + self.sampler.add_gt_as_proposals) + incompletes, backgrounds = self.get_negatives( + record['proposals'], self.assigner.incomplete_iou_threshold, + self.assigner.background_iou_threshold, + self.assigner.background_coverage_threshold, + self.assigner.incomplete_overlap_threshold) + + def sample_video_proposals(proposal_type, video_id, video_pool, + num_requested_proposals, dataset_pool): + """This method will sample proposals from this video's pool. If + the video pool is empty, it will fetch from the dataset pool + (which collects proposals of the entire dataset). + + Args: + proposal_type (int): Type id of proposal. + Positive/Foreground: 0 + Negative: + Incomplete: 1 + Background: 2 + video_id (str): Name of the video. + video_pool (list): Pool comprised of proposals in this video. + num_requested_proposals (int): Number of proposals + to be sampled. + dataset_pool (list): Proposals of the entire dataset. + + Returns: + list[(str, :obj:`SSNInstance`), int]: + video_id (str): Name of the video. + :obj:`SSNInstance`: Instance of class SSNInstance. + proposal_type (int): Type of proposal.
+ """ + + if len(video_pool) == 0: + idx = np.random.choice( + len(dataset_pool), num_requested_proposals, replace=False) + return [(dataset_pool[x], proposal_type) for x in idx] + + replicate = len(video_pool) < num_requested_proposals + idx = np.random.choice( + len(video_pool), num_requested_proposals, replace=replicate) + return [((video_id, video_pool[x]), proposal_type) for x in idx] + + out_proposals = [] + out_proposals.extend( + sample_video_proposals(0, record['video_id'], positives, + self.positive_per_video, + self.positive_pool)) + out_proposals.extend( + sample_video_proposals(1, record['video_id'], incompletes, + self.incomplete_per_video, + self.incomplete_pool)) + out_proposals.extend( + sample_video_proposals(2, record['video_id'], backgrounds, + self.background_per_video, + self.background_pool)) + + return out_proposals + + def _random_sampling(self): + """Randomly sample proposals from the entire dataset.""" + out_proposals = [] + + positive_idx = np.random.choice( + len(self.positive_pool), + self.positive_per_video, + replace=len(self.positive_pool) < self.positive_per_video) + out_proposals.extend([(self.positive_pool[x], 0) + for x in positive_idx]) + incomplete_idx = np.random.choice( + len(self.incomplete_pool), + self.incomplete_per_video, + replace=len(self.incomplete_pool) < self.incomplete_per_video) + out_proposals.extend([(self.incomplete_pool[x], 1) + for x in incomplete_idx]) + background_idx = np.random.choice( + len(self.background_pool), + self.background_per_video, + replace=len(self.background_pool) < self.background_per_video) + out_proposals.extend([(self.background_pool[x], 2) + for x in background_idx]) + + return out_proposals + + def _get_stage(self, proposal, num_frames): + """Fetch the scale factor of starting and ending stage and get the + stage split. + + Args: + proposal (:obj:`SSNInstance`): Proposal instance. + num_frames (int): Total frames of the video. 
+ + Returns: + tuple[float, float, list]: (starting_scale_factor, + ending_scale_factor, stage_split), starting_scale_factor is + the ratio of the effective sampling length to augment length + in starting stage, ending_scale_factor is the ratio of the + effective sampling length to augment length in ending stage, + stage_split is ending segment id of starting, course and + ending stage. + """ + # proposal interval: [start_frame, end_frame) + start_frame = proposal.start_frame + end_frame = proposal.end_frame + ori_clip_len = self.clip_len * self.frame_interval + + duration = end_frame - start_frame + assert duration != 0 + + valid_starting = max(0, + start_frame - int(duration * self.aug_ratio[0])) + valid_ending = min(num_frames - ori_clip_len + 1, + end_frame - 1 + int(duration * self.aug_ratio[1])) + + valid_starting_length = start_frame - valid_starting - ori_clip_len + valid_ending_length = (valid_ending - end_frame + 1) - ori_clip_len + + starting_scale_factor = ((valid_starting_length + ori_clip_len + 1) / + (duration * self.aug_ratio[0])) + ending_scale_factor = (valid_ending_length + ori_clip_len + 1) / ( + duration * self.aug_ratio[1]) + + aug_start, aug_end = self.aug_segments + stage_split = [ + aug_start, aug_start + self.body_segments, + aug_start + self.body_segments + aug_end + ] + + return starting_scale_factor, ending_scale_factor, stage_split + + def _compute_reg_normalize_constants(self): + """Compute regression target normalized constants.""" + if self.verbose: + self.logger.info('Compute regression target normalized constants') + targets = [] + for video_info in self.video_infos: + positives = self.get_positives( + video_info['gts'], video_info['proposals'], + self.assigner.positive_iou_threshold, False) + for positive in positives: + targets.append(list(positive.regression_targets)) + + return np.array((np.mean(targets, axis=0), np.std(targets, axis=0))) + + def prepare_train_frames(self, idx): + """Prepare the frames for training given the 
index.""" + results = copy.deepcopy(self.video_infos[idx]) + results['filename_tmpl'] = self.filename_tmpl + results['modality'] = self.modality + results['start_index'] = self.start_index + + if self.video_centric: + # yapf: disable + results['out_proposals'] = self._video_centric_sampling(self.video_infos[idx]) # noqa: E501 + # yapf: enable + else: + results['out_proposals'] = self._random_sampling() + + out_proposal_scale_factor = [] + out_proposal_type = [] + out_proposal_labels = [] + out_proposal_reg_targets = [] + + for _, proposal in enumerate(results['out_proposals']): + # proposal: [(video_id, SSNInstance), proposal_type] + num_frames = proposal[0][1].num_video_frames + + (starting_scale_factor, ending_scale_factor, + _) = self._get_stage(proposal[0][1], num_frames) + + # proposal[1]: Type id of proposal. + # Positive/Foreground: 0 + # Negative: + # Incomplete: 1 + # Background: 2 + + # Positive/Foreground proposal + if proposal[1] == 0: + label = proposal[0][1].label + # Incomplete proposal + elif proposal[1] == 1: + label = proposal[0][1].label + # Background proposal + elif proposal[1] == 2: + label = 0 + else: + raise ValueError(f'Proposal type should be 0, 1, or 2, ' + f'but got {proposal[1]}') + out_proposal_scale_factor.append( + [starting_scale_factor, ending_scale_factor]) + if not isinstance(label, int): + raise TypeError(f'proposal_label must be an int, ' + f'but got {type(label)}') + out_proposal_labels.append(label) + out_proposal_type.append(proposal[1]) + + reg_targets = proposal[0][1].regression_targets + if proposal[1] == 0: + # Normalize regression targets of positive proposals.
+ reg_targets = ((reg_targets[0] - self.reg_norm_consts[0][0]) / + self.reg_norm_consts[1][0], + (reg_targets[1] - self.reg_norm_consts[0][1]) / + self.reg_norm_consts[1][1]) + out_proposal_reg_targets.append(reg_targets) + + results['reg_targets'] = np.array( + out_proposal_reg_targets, dtype=np.float32) + results['proposal_scale_factor'] = np.array( + out_proposal_scale_factor, dtype=np.float32) + results['proposal_labels'] = np.array(out_proposal_labels) + results['proposal_type'] = np.array(out_proposal_type) + + return self.pipeline(results) + + def prepare_test_frames(self, idx): + """Prepare the frames for testing given the index.""" + results = copy.deepcopy(self.video_infos[idx]) + results['filename_tmpl'] = self.filename_tmpl + results['modality'] = self.modality + results['start_index'] = self.start_index + + proposals = results['proposals'] + num_frames = results['total_frames'] + ori_clip_len = self.clip_len * self.frame_interval + frame_ticks = np.arange( + 0, num_frames - ori_clip_len, self.test_interval, dtype=int) + 1 + + num_sampled_frames = len(frame_ticks) + + if len(proposals) == 0: + proposals.append(SSNInstance(0, num_frames - 1, num_frames)) + + relative_proposal_list = [] + proposal_tick_list = [] + scale_factor_list = [] + + for proposal in proposals: + relative_proposal = (proposal.start_frame / num_frames, + proposal.end_frame / num_frames) + relative_duration = relative_proposal[1] - relative_proposal[0] + relative_starting_duration = relative_duration * self.aug_ratio[0] + relative_ending_duration = relative_duration * self.aug_ratio[1] + relative_starting = ( + relative_proposal[0] - relative_starting_duration) + relative_ending = relative_proposal[1] + relative_ending_duration + + real_relative_starting = max(0.0, relative_starting) + real_relative_ending = min(1.0, relative_ending) + + starting_scale_factor = ( + (relative_proposal[0] - real_relative_starting) / + relative_starting_duration) + ending_scale_factor = ( + 
(real_relative_ending - relative_proposal[1]) / + relative_ending_duration) + + proposal_ranges = (real_relative_starting, *relative_proposal, + real_relative_ending) + proposal_ticks = (np.array(proposal_ranges) * + num_sampled_frames).astype(np.int32) + + relative_proposal_list.append(relative_proposal) + proposal_tick_list.append(proposal_ticks) + scale_factor_list.append( + (starting_scale_factor, ending_scale_factor)) + + results['relative_proposal_list'] = np.array( + relative_proposal_list, dtype=np.float32) + results['scale_factor_list'] = np.array( + scale_factor_list, dtype=np.float32) + results['proposal_tick_list'] = np.array( + proposal_tick_list, dtype=np.int32) + results['reg_norm_consts'] = self.reg_norm_consts + + return self.pipeline(results) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/video_dataset.py b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/video_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..21c47808b6198977ebe91e466bd1a16d1551ca5c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/datasets/video_dataset.py @@ -0,0 +1,61 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + +from .base import BaseDataset +from .builder import DATASETS + + +@DATASETS.register_module() +class VideoDataset(BaseDataset): + """Video dataset for action recognition. + + The dataset loads raw videos and applies specified transforms to return a + dict containing the frame tensors and other information. + + The ann_file is a text file with multiple lines, and each line indicates + a sample video with the filepath and label, which are split with a + whitespace. Example of an annotation file: + + .. code-block:: txt + + some/path/000.mp4 1 + some/path/001.mp4 1 + some/path/002.mp4 2 + some/path/003.mp4 2 + some/path/004.mp4 3 + some/path/005.mp4 3 + + + Args: + ann_file (str): Path to the annotation file. + pipeline (list[dict | callable]): A sequence of data transforms.
+ start_index (int): Specify a start index for frames in consideration of + different filename format. However, when taking videos as input, + it should be set to 0, since frames loaded from videos count + from 0. Default: 0. + **kwargs: Keyword arguments for ``BaseDataset``. + """ + + def __init__(self, ann_file, pipeline, start_index=0, **kwargs): + super().__init__(ann_file, pipeline, start_index=start_index, **kwargs) + + def load_annotations(self): + """Load annotation file to get video information.""" + if self.ann_file.endswith('.json'): + return self.load_json_annotations() + + video_infos = [] + with open(self.ann_file, 'r') as fin: + for line in fin: + line_split = line.strip().split() + if self.multi_class: + assert self.num_classes is not None + filename, label = line_split[0], line_split[1:] + label = list(map(int, label)) + else: + filename, label = line_split + label = int(label) + if self.data_prefix is not None: + filename = osp.join(self.data_prefix, filename) + video_infos.append(dict(filename=filename, label=label)) + return video_infos diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/localization/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/localization/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..64ebdaab9feef6fadbd94de2324984b43f7a7de2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/localization/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .bsn_utils import generate_bsp_feature, generate_candidate_proposals +from .proposal_utils import soft_nms, temporal_iop, temporal_iou +from .ssn_utils import (eval_ap, load_localize_proposal_file, + perform_regression, temporal_nms) + +__all__ = [ + 'generate_candidate_proposals', 'generate_bsp_feature', 'temporal_iop', + 'temporal_iou', 'soft_nms', 'load_localize_proposal_file', + 'perform_regression', 'temporal_nms', 'eval_ap' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/localization/bsn_utils.py b/openmmlab_test/mmaction2-0.24.1/mmaction/localization/bsn_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..27de337215a4c7918e3f58ef894f75f8b4abedf3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/localization/bsn_utils.py @@ -0,0 +1,268 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + +import numpy as np + +from .proposal_utils import temporal_iop, temporal_iou + + +def generate_candidate_proposals(video_list, + video_infos, + tem_results_dir, + temporal_scale, + peak_threshold, + tem_results_ext='.csv', + result_dict=None): + """Generate Candidate Proposals with given temporal evaluation results. + Each proposal file will contain: + 'tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa'. + + Args: + video_list (list[int]): List of video indexes to generate proposals. + video_infos (list[dict]): List of video_info dicts that contain + 'video_name', 'duration_frame', 'duration_second', + 'feature_frame', and 'annotations'. + tem_results_dir (str): Directory to load temporal evaluation + results. + temporal_scale (int): The number (scale) on temporal axis. + peak_threshold (float): The threshold for proposal generation. + tem_results_ext (str): File extension for temporal evaluation + model output. Default: '.csv'. + result_dict (dict | None): The dict to save the results. Default: None. + + Returns: + dict: A dict containing video_name as keys and proposal list as value.
+ If result_dict is not None, save the results to it. + """ + if tem_results_ext != '.csv': + raise NotImplementedError('Only support csv format now.') + + tscale = temporal_scale + tgap = 1. / tscale + proposal_dict = {} + for video_index in video_list: + video_name = video_infos[video_index]['video_name'] + tem_path = osp.join(tem_results_dir, video_name + tem_results_ext) + tem_results = np.loadtxt( + tem_path, dtype=np.float32, delimiter=',', skiprows=1) + start_scores = tem_results[:, 1] + end_scores = tem_results[:, 2] + + max_start = max(start_scores) + max_end = max(end_scores) + + start_bins = np.zeros(len(start_scores)) + start_bins[[0, -1]] = 1 + end_bins = np.zeros(len(end_scores)) + end_bins[[0, -1]] = 1 + for idx in range(1, tscale - 1): + if start_scores[idx] > start_scores[ + idx + 1] and start_scores[idx] > start_scores[idx - 1]: + start_bins[idx] = 1 + elif start_scores[idx] > (peak_threshold * max_start): + start_bins[idx] = 1 + if end_scores[idx] > end_scores[ + idx + 1] and end_scores[idx] > end_scores[idx - 1]: + end_bins[idx] = 1 + elif end_scores[idx] > (peak_threshold * max_end): + end_bins[idx] = 1 + + tmin_list = [] + tmin_score_list = [] + tmax_list = [] + tmax_score_list = [] + for idx in range(tscale): + if start_bins[idx] == 1: + tmin_list.append(tgap / 2 + tgap * idx) + tmin_score_list.append(start_scores[idx]) + if end_bins[idx] == 1: + tmax_list.append(tgap / 2 + tgap * idx) + tmax_score_list.append(end_scores[idx]) + + new_props = [] + for tmax, tmax_score in zip(tmax_list, tmax_score_list): + for tmin, tmin_score in zip(tmin_list, tmin_score_list): + if tmin >= tmax: + break + new_props.append([tmin, tmax, tmin_score, tmax_score]) + + new_props = np.stack(new_props) + + score = (new_props[:, 2] * new_props[:, 3]).reshape(-1, 1) + new_props = np.concatenate((new_props, score), axis=1) + + new_props = new_props[new_props[:, -1].argsort()[::-1]] + video_info = video_infos[video_index] + video_frame = video_info['duration_frame'] + 
video_second = video_info['duration_second'] + feature_frame = video_info['feature_frame'] + corrected_second = float(feature_frame) / video_frame * video_second + + gt_tmins = [] + gt_tmaxs = [] + for annotations in video_info['annotations']: + gt_tmins.append(annotations['segment'][0] / corrected_second) + gt_tmaxs.append(annotations['segment'][1] / corrected_second) + + new_iou_list = [] + new_ioa_list = [] + for new_prop in new_props: + new_iou = max( + temporal_iou(new_prop[0], new_prop[1], gt_tmins, gt_tmaxs)) + new_ioa = max( + temporal_iop(new_prop[0], new_prop[1], gt_tmins, gt_tmaxs)) + new_iou_list.append(new_iou) + new_ioa_list.append(new_ioa) + + new_iou_list = np.array(new_iou_list).reshape(-1, 1) + new_ioa_list = np.array(new_ioa_list).reshape(-1, 1) + new_props = np.concatenate((new_props, new_iou_list), axis=1) + new_props = np.concatenate((new_props, new_ioa_list), axis=1) + proposal_dict[video_name] = new_props + if result_dict is not None: + result_dict[video_name] = new_props + return proposal_dict + + +def generate_bsp_feature(video_list, + video_infos, + tem_results_dir, + pgm_proposals_dir, + top_k=1000, + bsp_boundary_ratio=0.2, + num_sample_start=8, + num_sample_end=8, + num_sample_action=16, + num_sample_interp=3, + tem_results_ext='.csv', + pgm_proposal_ext='.csv', + result_dict=None): + """Generate Boundary-Sensitive Proposal Feature with given proposals. + + Args: + video_list (list[int]): List of video indexes to generate bsp_feature. + video_infos (list[dict]): List of video_info dict that contains + 'video_name'. + tem_results_dir (str): Directory to load temporal evaluation + results. + pgm_proposals_dir (str): Directory to load proposals. + top_k (int): Number of proposals to be considered. Default: 1000 + bsp_boundary_ratio (float): Ratio for proposal boundary + (start/end). Default: 0.2. + num_sample_start (int): Num of samples for actionness in + start region. Default: 8. 
+ num_sample_end (int): Num of samples for actionness in end region. + Default: 8. + num_sample_action (int): Num of samples for actionness in center + region. Default: 16. + num_sample_interp (int): Num of samples for interpolation for + each sample point. Default: 3. + tem_results_ext (str): File extension for temporal evaluation + model output. Default: '.csv'. + pgm_proposal_ext (str): File extension for proposals. Default: '.csv'. + result_dict (dict | None): The dict to save the results. Default: None. + + Returns: + bsp_feature_dict (dict): A dict contains video_name as keys and + bsp_feature as value. If result_dict is not None, save the + results to it. + """ + if tem_results_ext != '.csv' or pgm_proposal_ext != '.csv': + raise NotImplementedError('Only support csv format now.') + + bsp_feature_dict = {} + for video_index in video_list: + video_name = video_infos[video_index]['video_name'] + + # Load temporal evaluation results + tem_path = osp.join(tem_results_dir, video_name + tem_results_ext) + tem_results = np.loadtxt( + tem_path, dtype=np.float32, delimiter=',', skiprows=1) + score_action = tem_results[:, 0] + seg_tmins = tem_results[:, 3] + seg_tmaxs = tem_results[:, 4] + video_scale = len(tem_results) + video_gap = seg_tmaxs[0] - seg_tmins[0] + video_extend = int(video_scale / 4 + 10) + + # Load proposals results + proposal_path = osp.join(pgm_proposals_dir, + video_name + pgm_proposal_ext) + pgm_proposals = np.loadtxt( + proposal_path, dtype=np.float32, delimiter=',', skiprows=1) + pgm_proposals = pgm_proposals[:top_k] + + # Generate temporal sample points + boundary_zeros = np.zeros([video_extend]) + score_action = np.concatenate( + (boundary_zeros, score_action, boundary_zeros)) + begin_tp = [] + middle_tp = [] + end_tp = [] + for i in range(video_extend): + begin_tp.append(-video_gap / 2 - + (video_extend - 1 - i) * video_gap) + end_tp.append(video_gap / 2 + seg_tmaxs[-1] + i * video_gap) + for i in range(video_scale): + 
middle_tp.append(video_gap / 2 + i * video_gap) + t_points = begin_tp + middle_tp + end_tp + + bsp_feature = [] + for pgm_proposal in pgm_proposals: + tmin = pgm_proposal[0] + tmax = pgm_proposal[1] + + tlen = tmax - tmin + # Temporal range for start + tmin_0 = tmin - tlen * bsp_boundary_ratio + tmin_1 = tmin + tlen * bsp_boundary_ratio + # Temporal range for end + tmax_0 = tmax - tlen * bsp_boundary_ratio + tmax_1 = tmax + tlen * bsp_boundary_ratio + + # Generate features at start boundary + tlen_start = (tmin_1 - tmin_0) / (num_sample_start - 1) + tlen_start_sample = tlen_start / num_sample_interp + t_new = [ + tmin_0 - tlen_start / 2 + tlen_start_sample * i + for i in range(num_sample_start * num_sample_interp + 1) + ] + y_new_start_action = np.interp(t_new, t_points, score_action) + y_new_start = [ + np.mean(y_new_start_action[i * num_sample_interp:(i + 1) * + num_sample_interp + 1]) + for i in range(num_sample_start) + ] + # Generate features at end boundary + tlen_end = (tmax_1 - tmax_0) / (num_sample_end - 1) + tlen_end_sample = tlen_end / num_sample_interp + t_new = [ + tmax_0 - tlen_end / 2 + tlen_end_sample * i + for i in range(num_sample_end * num_sample_interp + 1) + ] + y_new_end_action = np.interp(t_new, t_points, score_action) + y_new_end = [ + np.mean(y_new_end_action[i * num_sample_interp:(i + 1) * + num_sample_interp + 1]) + for i in range(num_sample_end) + ] + # Generate features for action + tlen_action = (tmax - tmin) / (num_sample_action - 1) + tlen_action_sample = tlen_action / num_sample_interp + t_new = [ + tmin - tlen_action / 2 + tlen_action_sample * i + for i in range(num_sample_action * num_sample_interp + 1) + ] + y_new_action = np.interp(t_new, t_points, score_action) + y_new_action = [ + np.mean(y_new_action[i * num_sample_interp:(i + 1) * + num_sample_interp + 1]) + for i in range(num_sample_action) + ] + feature = np.concatenate([y_new_action, y_new_start, y_new_end]) + bsp_feature.append(feature) + bsp_feature = 
np.array(bsp_feature) + bsp_feature_dict[video_name] = bsp_feature + if result_dict is not None: + result_dict[video_name] = bsp_feature + return bsp_feature_dict diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/localization/proposal_utils.py b/openmmlab_test/mmaction2-0.24.1/mmaction/localization/proposal_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..a3e2f4cf0ddfbad8570daf52cf217379a1b0e8cc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/localization/proposal_utils.py @@ -0,0 +1,95 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np + + +def temporal_iou(proposal_min, proposal_max, gt_min, gt_max): + """Compute IoU score between a groundtruth bbox and the proposals. + + Args: + proposal_min (list[float]): List of temporal anchor min. + proposal_max (list[float]): List of temporal anchor max. + gt_min (float): Groundtruth temporal box min. + gt_max (float): Groundtruth temporal box max. + + Returns: + list[float]: List of iou scores. + """ + len_anchors = proposal_max - proposal_min + int_tmin = np.maximum(proposal_min, gt_min) + int_tmax = np.minimum(proposal_max, gt_max) + inter_len = np.maximum(int_tmax - int_tmin, 0.) + union_len = len_anchors - inter_len + gt_max - gt_min + jaccard = np.divide(inter_len, union_len) + return jaccard + + +def temporal_iop(proposal_min, proposal_max, gt_min, gt_max): + """Compute IoP score between a groundtruth bbox and the proposals. + + Compute the IoP which is defined as the overlap ratio with + groundtruth proportional to the duration of this proposal. + + Args: + proposal_min (list[float]): List of temporal anchor min. + proposal_max (list[float]): List of temporal anchor max. + gt_min (float): Groundtruth temporal box min. + gt_max (float): Groundtruth temporal box max. + + Returns: + list[float]: List of intersection over anchor scores. 
+ """ + len_anchors = np.array(proposal_max - proposal_min) + int_tmin = np.maximum(proposal_min, gt_min) + int_tmax = np.minimum(proposal_max, gt_max) + inter_len = np.maximum(int_tmax - int_tmin, 0.) + scores = np.divide(inter_len, len_anchors) + return scores + + +def soft_nms(proposals, alpha, low_threshold, high_threshold, top_k): + """Soft NMS for temporal proposals. + + Args: + proposals (np.ndarray): Proposals generated by network. + alpha (float): Alpha value of Gaussian decaying function. + low_threshold (float): Low threshold for soft nms. + high_threshold (float): High threshold for soft nms. + top_k (int): Top k values to be considered. + + Returns: + np.ndarray: The updated proposals. + """ + proposals = proposals[proposals[:, -1].argsort()[::-1]] + tstart = list(proposals[:, 0]) + tend = list(proposals[:, 1]) + tscore = list(proposals[:, -1]) + rstart = [] + rend = [] + rscore = [] + + while len(tscore) > 0 and len(rscore) <= top_k: + max_index = np.argmax(tscore) + max_width = tend[max_index] - tstart[max_index] + iou_list = temporal_iou(tstart[max_index], tend[max_index], + np.array(tstart), np.array(tend)) + iou_exp_list = np.exp(-np.square(iou_list) / alpha) + + for idx, _ in enumerate(tscore): + if idx != max_index: + current_iou = iou_list[idx] + if current_iou > low_threshold + (high_threshold - + low_threshold) * max_width: + tscore[idx] = tscore[idx] * iou_exp_list[idx] + + rstart.append(tstart[max_index]) + rend.append(tend[max_index]) + rscore.append(tscore[max_index]) + tstart.pop(max_index) + tend.pop(max_index) + tscore.pop(max_index) + + rstart = np.array(rstart).reshape(-1, 1) + rend = np.array(rend).reshape(-1, 1) + rscore = np.array(rscore).reshape(-1, 1) + new_proposals = np.concatenate((rstart, rend, rscore), axis=1) + return new_proposals diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/localization/ssn_utils.py b/openmmlab_test/mmaction2-0.24.1/mmaction/localization/ssn_utils.py new file mode 100644 index 
0000000000000000000000000000000000000000..51f434b0ff9338138174718a1340b08100c4cd90 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/localization/ssn_utils.py @@ -0,0 +1,169 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from itertools import groupby + +import numpy as np + +from ..core import average_precision_at_temporal_iou +from . import temporal_iou + + +def load_localize_proposal_file(filename): + """Load the proposal file and split it into many parts which contain one + video's information separately. + + Args: + filename(str): Path to the proposal file. + + Returns: + list: List of all videos' information. + """ + lines = list(open(filename)) + + # Split the proposal file into many parts which contain one video's + # information separately. + groups = groupby(lines, lambda x: x.startswith('#')) + + video_infos = [[x.strip() for x in list(g)] for k, g in groups if not k] + + def parse_group(video_info): + """Parse the video's information. + + Template information of a video in a standard file: + # index + video_id + num_frames + fps + num_gts + label, start_frame, end_frame + label, start_frame, end_frame + ... + num_proposals + label, best_iou, overlap_self, start_frame, end_frame + label, best_iou, overlap_self, start_frame, end_frame + ... + + Example of a standard annotation file: + + .. code-block:: txt + + # 0 + video_validation_0000202 + 5666 + 1 + 3 + 8 130 185 + 8 832 1136 + 8 1303 1381 + 5 + 8 0.0620 0.0620 790 5671 + 8 0.1656 0.1656 790 2619 + 8 0.0833 0.0833 3945 5671 + 8 0.0960 0.0960 4173 5671 + 8 0.0614 0.0614 3327 5671 + + Args: + video_info (list): Information of the video. + + Returns: + tuple[str, int, list, list]: + video_id (str): Name of the video. + num_frames (int): Number of frames in the video. + gt_boxes (list): List of the information of gt boxes. + proposal_boxes (list): List of the information of + proposal boxes. 
+ """ + offset = 0 + video_id = video_info[offset] + offset += 1 + + num_frames = int(float(video_info[1]) * float(video_info[2])) + num_gts = int(video_info[3]) + offset = 4 + + gt_boxes = [x.split() for x in video_info[offset:offset + num_gts]] + offset += num_gts + num_proposals = int(video_info[offset]) + offset += 1 + proposal_boxes = [ + x.split() for x in video_info[offset:offset + num_proposals] + ] + + return video_id, num_frames, gt_boxes, proposal_boxes + + return [parse_group(video_info) for video_info in video_infos] + + +def perform_regression(detections): + """Perform regression on detection results. + + Args: + detections (list): Detection results before regression. + + Returns: + list: Detection results after regression. + """ + starts = detections[:, 0] + ends = detections[:, 1] + centers = (starts + ends) / 2 + durations = ends - starts + + new_centers = centers + durations * detections[:, 3] + new_durations = durations * np.exp(detections[:, 4]) + + new_detections = np.concatenate( + (np.clip(new_centers - new_durations / 2, 0, + 1)[:, None], np.clip(new_centers + new_durations / 2, 0, + 1)[:, None], detections[:, 2:]), + axis=1) + return new_detections + + +def temporal_nms(detections, threshold): + """Perform temporal non-maximum suppression (NMS) on detection results. + + Args: + detections (list): Detection results before NMS. + threshold (float): Threshold of NMS. + + Returns: + list: Detection results after NMS. + """ + starts = detections[:, 0] + ends = detections[:, 1] + scores = detections[:, 2] + + order = scores.argsort()[::-1] + + keep = [] + while order.size > 0: + i = order[0] + keep.append(i) + ious = temporal_iou(starts[order[1:]], ends[order[1:]], starts[i], + ends[i]) + idxs = np.where(ious <= threshold)[0] + order = order[idxs + 1] + + return detections[keep, :] + + +def eval_ap(detections, gt_by_cls, iou_range): + """Evaluate average precisions. + + Args: + detections (dict): Results of detections. + gt_by_cls (dict): Information of groundtruth.
+ iou_range (list): Ranges of iou. + + Returns: + list: Average precision values of classes at ious. + """ + ap_values = np.zeros((len(detections), len(iou_range))) + + for iou_idx, min_overlap in enumerate(iou_range): + for class_idx, _ in enumerate(detections): + ap = average_precision_at_temporal_iou(gt_by_cls[class_idx], + detections[class_idx], + [min_overlap]) + ap_values[class_idx, iou_idx] = ap + + return ap_values diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..d3936ced2e699af6c0500e28ce5ee421afe8c2bb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/__init__.py @@ -0,0 +1,45 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .backbones import (C3D, STGCN, X3D, MobileNetV2, MobileNetV2TSM, ResNet, + ResNet2Plus1d, ResNet3d, ResNet3dCSN, ResNet3dLayer, + ResNet3dSlowFast, ResNet3dSlowOnly, ResNetAudio, + ResNetTIN, ResNetTSM, TANet, TimeSformer) +from .builder import (BACKBONES, DETECTORS, HEADS, LOCALIZERS, LOSSES, NECKS, + RECOGNIZERS, build_backbone, build_detector, build_head, + build_localizer, build_loss, build_model, build_neck, + build_recognizer) +from .common import (LFB, TAM, Conv2plus1d, ConvAudio, + DividedSpatialAttentionWithNorm, + DividedTemporalAttentionWithNorm, FFNWithNorm, + SubBatchNorm3D) +from .heads import (ACRNHead, AudioTSNHead, AVARoIHead, BaseHead, BBoxHeadAVA, + FBOHead, I3DHead, LFBInferHead, SlowFastHead, STGCNHead, + TimeSformerHead, TPNHead, TRNHead, TSMHead, TSNHead, + X3DHead) +from .localizers import BMN, PEM, TEM +from .losses import (BCELossWithLogits, BinaryLogisticRegressionLoss, BMNLoss, + CBFocalLoss, CrossEntropyLoss, HVULoss, NLLLoss, + OHEMHingeLoss, SSNLoss) +from .necks import TPN +from .recognizers import (AudioRecognizer, BaseRecognizer, Recognizer2D, + Recognizer3D) +from .roi_extractors import SingleRoIExtractor3D +from 
.skeleton_gcn import BaseGCN, SkeletonGCN + +__all__ = [ + 'BACKBONES', 'HEADS', 'RECOGNIZERS', 'build_recognizer', 'build_head', + 'build_backbone', 'Recognizer2D', 'Recognizer3D', 'C3D', 'ResNet', 'STGCN', + 'ResNet3d', 'ResNet2Plus1d', 'I3DHead', 'TSNHead', 'TSMHead', 'BaseHead', + 'STGCNHead', 'BaseRecognizer', 'LOSSES', 'CrossEntropyLoss', 'NLLLoss', + 'HVULoss', 'ResNetTSM', 'ResNet3dSlowFast', 'SlowFastHead', 'Conv2plus1d', + 'ResNet3dSlowOnly', 'BCELossWithLogits', 'LOCALIZERS', 'build_localizer', + 'PEM', 'TAM', 'TEM', 'BinaryLogisticRegressionLoss', 'BMN', 'BMNLoss', + 'build_model', 'OHEMHingeLoss', 'SSNLoss', 'ResNet3dCSN', 'ResNetTIN', + 'TPN', 'TPNHead', 'build_loss', 'build_neck', 'AudioRecognizer', + 'AudioTSNHead', 'X3D', 'X3DHead', 'ResNet3dLayer', 'DETECTORS', + 'SingleRoIExtractor3D', 'BBoxHeadAVA', 'ResNetAudio', 'build_detector', + 'ConvAudio', 'AVARoIHead', 'MobileNetV2', 'MobileNetV2TSM', 'TANet', 'LFB', + 'FBOHead', 'LFBInferHead', 'TRNHead', 'NECKS', 'TimeSformer', + 'TimeSformerHead', 'DividedSpatialAttentionWithNorm', + 'DividedTemporalAttentionWithNorm', 'FFNWithNorm', 'ACRNHead', 'BaseGCN', + 'SkeletonGCN', 'CBFocalLoss', 'SubBatchNorm3D' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..0beb89ddfe9e1b11112875948b9456d91e427ea6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/__init__.py @@ -0,0 +1,25 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .agcn import AGCN +from .c3d import C3D +from .mobilenet_v2 import MobileNetV2 +from .mobilenet_v2_tsm import MobileNetV2TSM +from .resnet import ResNet +from .resnet2plus1d import ResNet2Plus1d +from .resnet3d import ResNet3d, ResNet3dLayer +from .resnet3d_csn import ResNet3dCSN +from .resnet3d_slowfast import ResNet3dSlowFast +from .resnet3d_slowonly import ResNet3dSlowOnly +from .resnet_audio import ResNetAudio +from .resnet_tin import ResNetTIN +from .resnet_tsm import ResNetTSM +from .stgcn import STGCN +from .tanet import TANet +from .timesformer import TimeSformer +from .x3d import X3D + +__all__ = [ + 'C3D', 'ResNet', 'ResNet3d', 'ResNetTSM', 'ResNet2Plus1d', + 'ResNet3dSlowFast', 'ResNet3dSlowOnly', 'ResNet3dCSN', 'ResNetTIN', 'X3D', + 'ResNetAudio', 'ResNet3dLayer', 'MobileNetV2TSM', 'MobileNetV2', 'TANet', + 'TimeSformer', 'STGCN', 'AGCN' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/agcn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/agcn.py new file mode 100644 index 0000000000000000000000000000000000000000..b3932de9fe6d034b3cbf00c6d17c9ff32b6f6453 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/agcn.py @@ -0,0 +1,338 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import math + +import torch +import torch.nn as nn +from mmcv.cnn import constant_init, kaiming_init, normal_init +from mmcv.runner import load_checkpoint + +from ...utils import get_root_logger +from ..builder import BACKBONES +from ..skeleton_gcn.utils import Graph + + +def conv_branch_init(conv, branches): + weight = conv.weight + n = weight.size(0) + k1 = weight.size(1) + k2 = weight.size(2) + normal_init(weight, mean=0, std=math.sqrt(2. 
/ (n * k1 * k2 * branches))) + constant_init(conv.bias, 0) + + +def conv_init(conv): + kaiming_init(conv.weight) + constant_init(conv.bias, 0) + + +def bn_init(bn, scale): + constant_init(bn.weight, scale) + constant_init(bn.bias, 0) + + +def zero(x): + """return zero.""" + return 0 + + +def identity(x): + """return input itself.""" + return x + + +class AGCNBlock(nn.Module): + """Applies spatial graph convolution and temporal convolution over an + input graph sequence. + + Args: + in_channels (int): Number of channels in the input sequence data + out_channels (int): Number of channels produced by the convolution + kernel_size (tuple): Size of the temporal convolving kernel and + graph convolving kernel + stride (int, optional): Stride of the temporal convolution. Default: 1 + adj_len (int, optional): The length of the adjacency matrix. + Default: 17 + dropout (float, optional): Dropout rate of the final output. Default: 0 + residual (bool, optional): If ``True``, applies a residual mechanism. + Default: ``True`` + + Shape: + - Input[0]: Input graph sequence in :math:`(N, in_channels, T_{in}, V)` + format + - Input[1]: Input graph adjacency matrix in :math:`(K, V, V)` format + - Output[0]: Output graph sequence in :math:`(N, out_channels, T_{out}, + V)` format + - Output[1]: Graph adjacency matrix for output data in :math:`(K, V, + V)` format + + where + :math:`N` is a batch size, + :math:`K` is the spatial kernel size, as :math:`K == kernel_size[1] + `, + :math:`T_{in}/T_{out}` is a length of input/output sequence, + :math:`V` is the number of graph nodes.
+ """ + + def __init__(self, + in_channels, + out_channels, + kernel_size, + stride=1, + adj_len=17, + dropout=0, + residual=True): + super().__init__() + + assert len(kernel_size) == 2 + assert kernel_size[0] % 2 == 1 + padding = ((kernel_size[0] - 1) // 2, 0) + + self.gcn = ConvTemporalGraphical( + in_channels, out_channels, kernel_size[1], adj_len=adj_len) + self.tcn = nn.Sequential( + nn.Conv2d(out_channels, out_channels, (kernel_size[0], 1), + (stride, 1), padding), nn.BatchNorm2d(out_channels), + nn.Dropout(dropout, inplace=True)) + + # tcn init + for m in self.tcn.modules(): + if isinstance(m, nn.Conv2d): + conv_init(m) + elif isinstance(m, nn.BatchNorm2d): + bn_init(m, 1) + + if not residual: + self.residual = zero + + elif (in_channels == out_channels) and (stride == 1): + self.residual = identity + + else: + self.residual = nn.Sequential( + nn.Conv2d( + in_channels, + out_channels, + kernel_size=1, + stride=(stride, 1)), nn.BatchNorm2d(out_channels)) + + self.relu = nn.ReLU(inplace=True) + + def forward(self, x, adj_mat): + """Defines the computation performed at every call.""" + res = self.residual(x) + x, adj_mat = self.gcn(x, adj_mat) + + x = self.tcn(x) + res + + return self.relu(x), adj_mat + + +class ConvTemporalGraphical(nn.Module): + """The basic module for applying a graph convolution. + + Args: + in_channels (int): Number of channels in the input sequence data + out_channels (int): Number of channels produced by the convolution + kernel_size (int): Size of the graph convolving kernel + t_kernel_size (int): Size of the temporal convolving kernel + t_stride (int, optional): Stride of the temporal convolution. + Default: 1 + t_padding (int, optional): Temporal zero-padding added to both sides + of the input. Default: 0 + t_dilation (int, optional): Spacing between temporal kernel elements. + Default: 1 + adj_len (int, optional): The length of the adjacency matrix. 
+ Default: 17 + bias (bool, optional): If ``True``, adds a learnable bias to the + output. Default: ``True`` + + Shape: + - Input[0]: Input graph sequence in :math:`(N, in_channels, T_{in}, V)` + format + - Input[1]: Input graph adjacency matrix in :math:`(K, V, V)` format + - Output[0]: Output graph sequence in :math:`(N, out_channels, T_{out} + , V)` format + - Output[1]: Graph adjacency matrix for output data in :math:`(K, V, V) + ` format + + where + :math:`N` is a batch size, + :math:`K` is the spatial kernel size, as :math:`K == kernel_size[1] + `, + :math:`T_{in}/T_{out}` is a length of input/output sequence, + :math:`V` is the number of graph nodes. + """ + + def __init__(self, + in_channels, + out_channels, + kernel_size, + t_kernel_size=1, + t_stride=1, + t_padding=0, + t_dilation=1, + adj_len=17, + bias=True): + super().__init__() + + self.kernel_size = kernel_size + + self.PA = nn.Parameter(torch.FloatTensor(3, adj_len, adj_len)) + torch.nn.init.constant_(self.PA, 1e-6) + + self.num_subset = 3 + inter_channels = out_channels // 4 + self.inter_c = inter_channels + self.conv_a = nn.ModuleList() + self.conv_b = nn.ModuleList() + self.conv_d = nn.ModuleList() + for i in range(self.num_subset): + self.conv_a.append(nn.Conv2d(in_channels, inter_channels, 1)) + self.conv_b.append(nn.Conv2d(in_channels, inter_channels, 1)) + self.conv_d.append(nn.Conv2d(in_channels, out_channels, 1)) + + if in_channels != out_channels: + self.down = nn.Sequential( + nn.Conv2d(in_channels, out_channels, 1), + nn.BatchNorm2d(out_channels)) + else: + self.down = lambda x: x + + self.bn = nn.BatchNorm2d(out_channels) + self.soft = nn.Softmax(-2) + self.relu = nn.ReLU() + + for m in self.modules(): + if isinstance(m, nn.Conv2d): + conv_init(m) + elif isinstance(m, nn.BatchNorm2d): + bn_init(m, 1) + bn_init(self.bn, 1e-6) + for i in range(self.num_subset): + conv_branch_init(self.conv_d[i], self.num_subset) + + def forward(self, x, adj_mat): + """Defines the computation performed at 
every call.""" + assert adj_mat.size(0) == self.kernel_size + + N, C, T, V = x.size() + A = adj_mat + self.PA + + y = None + for i in range(self.num_subset): + A1 = self.conv_a[i](x).permute(0, 3, 1, 2).contiguous().view( + N, V, self.inter_c * T) + A2 = self.conv_b[i](x).view(N, self.inter_c * T, V) + A1 = self.soft(torch.matmul(A1, A2) / A1.size(-1)) # N V V + A1 = A1 + A[i] + A2 = x.view(N, C * T, V) + z = self.conv_d[i](torch.matmul(A2, A1).view(N, C, T, V)) + y = z + y if y is not None else z + y = self.bn(y) + y += self.down(x) + + return self.relu(y), adj_mat + + +@BACKBONES.register_module() +class AGCN(nn.Module): + """Backbone of Two-Stream Adaptive Graph Convolutional Networks for + Skeleton-Based Action Recognition. + + Args: + in_channels (int): Number of channels in the input data. + graph_cfg (dict): The arguments for building the graph. + data_bn (bool): If 'True', adds data normalization to the inputs. + Default: True. + pretrained (str | None): Name of pretrained model. + **kwargs (optional): Other parameters for graph convolution units. + + Shape: + - Input: :math:`(N, in_channels, T_{in}, V_{in}, M_{in})` + - Output: :math:`(N, num_class)` where + :math:`N` is a batch size, + :math:`T_{in}` is a length of input sequence, + :math:`V_{in}` is the number of graph nodes, + :math:`M_{in}` is the number of instance in a frame. 
+ """ + + def __init__(self, + in_channels, + graph_cfg, + data_bn=True, + pretrained=None, + **kwargs): + super().__init__() + + # load graph + self.graph = Graph(**graph_cfg) + A = torch.tensor( + self.graph.A, dtype=torch.float32, requires_grad=False) + self.register_buffer('A', A) + + # build networks + spatial_kernel_size = A.size(0) + temporal_kernel_size = 9 + kernel_size = (temporal_kernel_size, spatial_kernel_size) + self.data_bn = nn.BatchNorm1d(in_channels * + A.size(1)) if data_bn else identity + + kwargs0 = {k: v for k, v in kwargs.items() if k != 'dropout'} + self.agcn_networks = nn.ModuleList(( + AGCNBlock( + in_channels, + 64, + kernel_size, + 1, + adj_len=A.size(1), + residual=False, + **kwargs0), + AGCNBlock(64, 64, kernel_size, 1, adj_len=A.size(1), **kwargs), + AGCNBlock(64, 64, kernel_size, 1, adj_len=A.size(1), **kwargs), + AGCNBlock(64, 64, kernel_size, 1, adj_len=A.size(1), **kwargs), + AGCNBlock(64, 128, kernel_size, 2, adj_len=A.size(1), **kwargs), + AGCNBlock(128, 128, kernel_size, 1, adj_len=A.size(1), **kwargs), + AGCNBlock(128, 128, kernel_size, 1, adj_len=A.size(1), **kwargs), + AGCNBlock(128, 256, kernel_size, 2, adj_len=A.size(1), **kwargs), + AGCNBlock(256, 256, kernel_size, 1, adj_len=A.size(1), **kwargs), + AGCNBlock(256, 256, kernel_size, 1, adj_len=A.size(1), **kwargs), + )) + + self.pretrained = pretrained + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + if isinstance(self.pretrained, str): + logger = get_root_logger() + logger.info(f'load model from: {self.pretrained}') + + load_checkpoint(self, self.pretrained, strict=False, logger=logger) + + elif self.pretrained is None: + pass + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x): + """Defines the computation performed at every call. + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. 
+ """ + # data normalization + x = x.float() + n, c, t, v, m = x.size() + x = x.permute(0, 4, 3, 1, 2).contiguous() # N M V C T + x = x.view(n * m, v * c, t) + x = self.data_bn(x) + x = x.view(n, m, v, c, t) + x = x.permute(0, 1, 3, 4, 2).contiguous() + x = x.view(n * m, c, t, v) + + for gcn in self.agcn_networks: + x, _ = gcn(x, self.A) + + return x diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/c3d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/c3d.py new file mode 100644 index 0000000000000000000000000000000000000000..ad5d4aa672cfdddd37a2c6b89bad0754158ae14a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/c3d.py @@ -0,0 +1,143 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import ConvModule, constant_init, kaiming_init, normal_init +from mmcv.runner import load_checkpoint +from mmcv.utils import _BatchNorm + +from ...utils import get_root_logger +from ..builder import BACKBONES + + +@BACKBONES.register_module() +class C3D(nn.Module): + """C3D backbone. + + Args: + pretrained (str | None): Name of pretrained model. + style (str): ``pytorch`` or ``caffe``. If set to "pytorch", the + stride-two layer is the 3x3 conv layer, otherwise the stride-two + layer is the first 1x1 conv layer. Default: 'pytorch'. + conv_cfg (dict | None): Config dict for convolution layer. + If set to None, it uses ``dict(type='Conv3d')`` to construct + layers. Default: None. + norm_cfg (dict | None): Config for norm layers. Required keys are + ``type``. Default: None. + act_cfg (dict | None): Config dict for activation layer. If set to + None, it uses ``dict(type='ReLU')`` to construct layers. + Default: None. + out_dim (int): The dimension of last layer feature (after flatten). + Depends on the input shape. Default: 8192. + dropout_ratio (float): Probability of dropout layer. Default: 0.5. + init_std (float): Std value for initialization of fc layers. Default: 0.005.
+ """ + + def __init__(self, + pretrained=None, + style='pytorch', + conv_cfg=None, + norm_cfg=None, + act_cfg=None, + out_dim=8192, + dropout_ratio=0.5, + init_std=0.005): + super().__init__() + if conv_cfg is None: + conv_cfg = dict(type='Conv3d') + if act_cfg is None: + act_cfg = dict(type='ReLU') + self.pretrained = pretrained + self.style = style + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.dropout_ratio = dropout_ratio + self.init_std = init_std + + c3d_conv_param = dict( + kernel_size=(3, 3, 3), + padding=(1, 1, 1), + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + self.conv1a = ConvModule(3, 64, **c3d_conv_param) + self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)) + + self.conv2a = ConvModule(64, 128, **c3d_conv_param) + self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) + + self.conv3a = ConvModule(128, 256, **c3d_conv_param) + self.conv3b = ConvModule(256, 256, **c3d_conv_param) + self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) + + self.conv4a = ConvModule(256, 512, **c3d_conv_param) + self.conv4b = ConvModule(512, 512, **c3d_conv_param) + self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) + + self.conv5a = ConvModule(512, 512, **c3d_conv_param) + self.conv5b = ConvModule(512, 512, **c3d_conv_param) + self.pool5 = nn.MaxPool3d( + kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1)) + + self.fc6 = nn.Linear(out_dim, 4096) + self.fc7 = nn.Linear(4096, 4096) + + self.relu = nn.ReLU() + self.dropout = nn.Dropout(p=self.dropout_ratio) + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + if isinstance(self.pretrained, str): + logger = get_root_logger() + logger.info(f'load model from: {self.pretrained}') + + load_checkpoint(self, self.pretrained, strict=False, logger=logger) + + elif self.pretrained is None: + for m in self.modules(): + if isinstance(m, 
nn.Conv3d): + kaiming_init(m) + elif isinstance(m, nn.Linear): + normal_init(m, std=self.init_std) + elif isinstance(m, _BatchNorm): + constant_init(m, 1) + + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + the size of x is (num_batches, 3, 16, 112, 112). + + Returns: + torch.Tensor: The feature of the input + samples extracted by the backbone. + """ + x = self.conv1a(x) + x = self.pool1(x) + + x = self.conv2a(x) + x = self.pool2(x) + + x = self.conv3a(x) + x = self.conv3b(x) + x = self.pool3(x) + + x = self.conv4a(x) + x = self.conv4b(x) + x = self.pool4(x) + + x = self.conv5a(x) + x = self.conv5b(x) + x = self.pool5(x) + + x = x.flatten(start_dim=1) + x = self.relu(self.fc6(x)) + x = self.dropout(x) + x = self.relu(self.fc7(x)) + + return x diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/mobilenet_v2.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/mobilenet_v2.py new file mode 100644 index 0000000000000000000000000000000000000000..b0047b81e8f6bf931cc0913429e3c7753f0e9ecb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/mobilenet_v2.py @@ -0,0 +1,301 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +import torch.utils.checkpoint as cp +from mmcv.cnn import ConvModule, constant_init, kaiming_init +from mmcv.runner import load_checkpoint +from torch.nn.modules.batchnorm import _BatchNorm + +from ...utils import get_root_logger +from ..builder import BACKBONES + + +def make_divisible(value, divisor, min_value=None, min_ratio=0.9): + """Make divisible function. + + This function rounds the channel number down to the nearest value that can + be divisible by the divisor. + Args: + value (int): The original channel number. + divisor (int): The divisor to fully divide the channel number. 
+ min_value (int, optional): The minimum value of the output channel. + Default: None, means that the minimum value equal to the divisor. + min_ratio (float, optional): The minimum ratio of the rounded channel + number to the original channel number. Default: 0.9. + Returns: + int: The modified output channel number + """ + + if min_value is None: + min_value = divisor + new_value = max(min_value, int(value + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than (1-min_ratio). + if new_value < min_ratio * value: + new_value += divisor + return new_value + + +class InvertedResidual(nn.Module): + """InvertedResidual block for MobileNetV2. + + Args: + in_channels (int): The input channels of the InvertedResidual block. + out_channels (int): The output channels of the InvertedResidual block. + stride (int): Stride of the middle (first) 3x3 convolution. + expand_ratio (int): adjusts number of channels of the hidden layer + in InvertedResidual by this amount. + conv_cfg (dict): Config dict for convolution layer. + Default: None, which means using conv2d. + norm_cfg (dict): Config dict for normalization layer. + Default: dict(type='BN'). + act_cfg (dict): Config dict for activation layer. + Default: dict(type='ReLU6'). + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + Returns: + Tensor: The output tensor + """ + + def __init__(self, + in_channels, + out_channels, + stride, + expand_ratio, + conv_cfg=None, + norm_cfg=dict(type='BN'), + act_cfg=dict(type='ReLU6'), + with_cp=False): + super(InvertedResidual, self).__init__() + self.stride = stride + assert stride in [1, 2], f'stride must in [1, 2]. ' \ + f'But received {stride}.' 
+ self.with_cp = with_cp + self.use_res_connect = self.stride == 1 and in_channels == out_channels + hidden_dim = int(round(in_channels * expand_ratio)) + + layers = [] + if expand_ratio != 1: + layers.append( + ConvModule( + in_channels=in_channels, + out_channels=hidden_dim, + kernel_size=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg)) + layers.extend([ + ConvModule( + in_channels=hidden_dim, + out_channels=hidden_dim, + kernel_size=3, + stride=stride, + padding=1, + groups=hidden_dim, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg), + ConvModule( + in_channels=hidden_dim, + out_channels=out_channels, + kernel_size=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + ]) + self.conv = nn.Sequential(*layers) + + def forward(self, x): + + def _inner_forward(x): + if self.use_res_connect: + return x + self.conv(x) + + return self.conv(x) + + if self.with_cp and x.requires_grad: + out = cp.checkpoint(_inner_forward, x) + else: + out = _inner_forward(x) + + return out + + +@BACKBONES.register_module() +class MobileNetV2(nn.Module): + """MobileNetV2 backbone. + + Args: + pretrained (str | None): Name of pretrained model. Default: None. + widen_factor (float): Width multiplier, multiply number of + channels in each layer by this amount. Default: 1.0. + out_indices (None or Sequence[int]): Output from which stages. + Default: (7, ). + frozen_stages (int): Stages to be frozen (all param fixed). Note that + the last stage in ``MobileNetV2`` is ``conv2``. Default: -1, + which means not freezing any parameters. + conv_cfg (dict): Config dict for convolution layer. + Default: None, which means using conv2d. + norm_cfg (dict): Config dict for normalization layer. + Default: dict(type='BN'). + act_cfg (dict): Config dict for activation layer. + Default: dict(type='ReLU6'). + norm_eval (bool): Whether to set norm layers to eval mode, namely, + freeze running stats (mean and var). Note: Effect on Batch Norm + and its variants only. 
Default: False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + """ + + # Parameters to build layers. 4 parameters are needed to construct a + # layer, from left to right: expand_ratio, channel, num_blocks, stride. + arch_settings = [[1, 16, 1, 1], [6, 24, 2, 2], [6, 32, 3, 2], + [6, 64, 4, 2], [6, 96, 3, 1], [6, 160, 3, 2], + [6, 320, 1, 1]] + + def __init__(self, + pretrained=None, + widen_factor=1., + out_indices=(7, ), + frozen_stages=-1, + conv_cfg=dict(type='Conv'), + norm_cfg=dict(type='BN2d', requires_grad=True), + act_cfg=dict(type='ReLU6', inplace=True), + norm_eval=False, + with_cp=False): + super().__init__() + self.pretrained = pretrained + self.widen_factor = widen_factor + self.out_indices = out_indices + for index in out_indices: + if index not in range(0, 8): + raise ValueError('the item in out_indices must in ' + f'range(0, 8). But received {index}') + + if frozen_stages not in range(-1, 9): + raise ValueError('frozen_stages must be in range(-1, 9). 
' + f'But received {frozen_stages}') + self.out_indices = out_indices + self.frozen_stages = frozen_stages + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.norm_eval = norm_eval + self.with_cp = with_cp + + self.in_channels = make_divisible(32 * widen_factor, 8) + + self.conv1 = ConvModule( + in_channels=3, + out_channels=self.in_channels, + kernel_size=3, + stride=2, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + self.layers = [] + + for i, layer_cfg in enumerate(self.arch_settings): + expand_ratio, channel, num_blocks, stride = layer_cfg + out_channels = make_divisible(channel * widen_factor, 8) + inverted_res_layer = self.make_layer( + out_channels=out_channels, + num_blocks=num_blocks, + stride=stride, + expand_ratio=expand_ratio) + layer_name = f'layer{i + 1}' + self.add_module(layer_name, inverted_res_layer) + self.layers.append(layer_name) + + if widen_factor > 1.0: + self.out_channel = int(1280 * widen_factor) + else: + self.out_channel = 1280 + + layer = ConvModule( + in_channels=self.in_channels, + out_channels=self.out_channel, + kernel_size=1, + stride=1, + padding=0, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + self.add_module('conv2', layer) + self.layers.append('conv2') + + def make_layer(self, out_channels, num_blocks, stride, expand_ratio): + """Stack InvertedResidual blocks to build a layer for MobileNetV2. + + Args: + out_channels (int): out_channels of block. + num_blocks (int): number of blocks. + stride (int): stride of the first block. Default: 1 + expand_ratio (int): Expand the number of channels of the + hidden layer in InvertedResidual by this ratio. Default: 6. 
+ """ + layers = [] + for i in range(num_blocks): + if i >= 1: + stride = 1 + layers.append( + InvertedResidual( + self.in_channels, + out_channels, + stride, + expand_ratio=expand_ratio, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg, + with_cp=self.with_cp)) + self.in_channels = out_channels + + return nn.Sequential(*layers) + + def init_weights(self): + if isinstance(self.pretrained, str): + logger = get_root_logger() + load_checkpoint(self, self.pretrained, strict=False, logger=logger) + elif self.pretrained is None: + for m in self.modules(): + if isinstance(m, nn.Conv2d): + kaiming_init(m) + elif isinstance(m, (_BatchNorm, nn.GroupNorm)): + constant_init(m, 1) + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x): + x = self.conv1(x) + + outs = [] + for i, layer_name in enumerate(self.layers): + layer = getattr(self, layer_name) + x = layer(x) + if i in self.out_indices: + outs.append(x) + + if len(outs) == 1: + return outs[0] + + return tuple(outs) + + def _freeze_stages(self): + if self.frozen_stages >= 0: + self.conv1.eval() + for param in self.conv1.parameters(): + param.requires_grad = False + for i in range(1, self.frozen_stages + 1): + layer_name = self.layers[i - 1] + layer = getattr(self, layer_name) + layer.eval() + for param in layer.parameters(): + param.requires_grad = False + + def train(self, mode=True): + super(MobileNetV2, self).train(mode) + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/mobilenet_v2_tsm.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/mobilenet_v2_tsm.py new file mode 100644 index 0000000000000000000000000000000000000000..a7050e559dfe1c0113d16c98f4576a87573aec6a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/mobilenet_v2_tsm.py @@ -0,0 +1,41 @@ +# Copyright (c) OpenMMLab. 
All rights reserved. +from ..builder import BACKBONES +from .mobilenet_v2 import InvertedResidual, MobileNetV2 +from .resnet_tsm import TemporalShift + + +@BACKBONES.register_module() +class MobileNetV2TSM(MobileNetV2): + """MobileNetV2 backbone for TSM. + + Args: + num_segments (int): Number of frame segments. Default: 8. + is_shift (bool): Whether to make temporal shift in residual layers. + Default: True. + shift_div (int): Number of divisions for shift. Default: 8. + **kwargs (keyword arguments, optional): Arguments for MobileNetV2. + """ + + def __init__(self, num_segments=8, is_shift=True, shift_div=8, **kwargs): + super().__init__(**kwargs) + self.num_segments = num_segments + self.is_shift = is_shift + self.shift_div = shift_div + + def make_temporal_shift(self): + """Make temporal shift for some layers.""" + for m in self.modules(): + if isinstance(m, InvertedResidual) and \ + len(m.conv) == 3 and m.use_res_connect: + m.conv[0] = TemporalShift( + m.conv[0], + num_segments=self.num_segments, + shift_div=self.shift_div, + ) + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + super().init_weights() + if self.is_shift: + self.make_temporal_shift() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..d8f697a00168495d081de04bbb4c4a155e6616a0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet.py @@ -0,0 +1,591 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import ConvModule, constant_init, kaiming_init +from mmcv.runner import _load_checkpoint, load_checkpoint +from mmcv.utils import _BatchNorm +from torch.utils import checkpoint as cp + +from ...utils import get_root_logger +from ..builder import BACKBONES + + +class BasicBlock(nn.Module): + """Basic block for ResNet.
+ + Args: + inplanes (int): Number of channels for the input in first conv2d layer. + planes (int): Number of channels produced by some norm/conv2d layers. + stride (int): Stride in the conv layer. Default: 1. + dilation (int): Spacing between kernel elements. Default: 1. + downsample (nn.Module | None): Downsample layer. Default: None. + style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two + layer is the 3x3 conv layer, otherwise the stride-two layer is + the first 1x1 conv layer. Default: 'pytorch'. + conv_cfg (dict): Config for conv layers. Default: dict(type='Conv'). + norm_cfg (dict): + Config for norm layers. Required keys are `type` and + `requires_grad`. Default: dict(type='BN', requires_grad=True). + act_cfg (dict): Config for activation layers. + Default: dict(type='ReLU', inplace=True). + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + """ + expansion = 1 + + def __init__(self, + inplanes, + planes, + stride=1, + dilation=1, + downsample=None, + style='pytorch', + conv_cfg=dict(type='Conv'), + norm_cfg=dict(type='BN', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + with_cp=False): + super().__init__() + assert style in ['pytorch', 'caffe'] + self.conv1 = ConvModule( + inplanes, + planes, + kernel_size=3, + stride=stride, + padding=dilation, + dilation=dilation, + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg) + + self.conv2 = ConvModule( + planes, + planes, + kernel_size=3, + stride=1, + padding=1, + dilation=1, + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + self.relu = nn.ReLU(inplace=True) + self.downsample = downsample + self.style = style + self.stride = stride + self.dilation = dilation + self.norm_cfg = norm_cfg + assert not with_cp + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data.
+ + Returns: + torch.Tensor: The output of the module. + """ + identity = x + + out = self.conv1(x) + out = self.conv2(out) + + if self.downsample is not None: + identity = self.downsample(x) + + out = out + identity + out = self.relu(out) + + return out + + +class Bottleneck(nn.Module): + """Bottleneck block for ResNet. + + Args: + inplanes (int): + Number of channels for the input feature in first conv layer. + planes (int): + Number of channels produced by some norm layers and conv layers. + stride (int): Spatial stride in the conv layer. Default: 1. + dilation (int): Spacing between kernel elements. Default: 1. + downsample (nn.Module | None): Downsample layer. Default: None. + style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two + layer is the 3x3 conv layer, otherwise the stride-two layer is + the first 1x1 conv layer. Default: 'pytorch'. + conv_cfg (dict): Config for conv layers. Default: dict(type='Conv'). + norm_cfg (dict): + Config for norm layers. Required keys are `type` and + `requires_grad`. Default: dict(type='BN', requires_grad=True). + act_cfg (dict): Config for activation layers. + Default: dict(type='ReLU', inplace=True). + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False.
+ """ + + expansion = 4 + + def __init__(self, + inplanes, + planes, + stride=1, + dilation=1, + downsample=None, + style='pytorch', + conv_cfg=dict(type='Conv'), + norm_cfg=dict(type='BN', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + with_cp=False): + super().__init__() + assert style in ['pytorch', 'caffe'] + self.inplanes = inplanes + self.planes = planes + if style == 'pytorch': + self.conv1_stride = 1 + self.conv2_stride = stride + else: + self.conv1_stride = stride + self.conv2_stride = 1 + self.conv1 = ConvModule( + inplanes, + planes, + kernel_size=1, + stride=self.conv1_stride, + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg) + self.conv2 = ConvModule( + planes, + planes, + kernel_size=3, + stride=self.conv2_stride, + padding=dilation, + dilation=dilation, + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg) + + self.conv3 = ConvModule( + planes, + planes * self.expansion, + kernel_size=1, + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + self.relu = nn.ReLU(inplace=True) + self.downsample = downsample + self.stride = stride + self.dilation = dilation + self.norm_cfg = norm_cfg + self.with_cp = with_cp + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. 
+ """ + + def _inner_forward(x): + """Forward wrapper for utilizing checkpoint.""" + identity = x + + out = self.conv1(x) + out = self.conv2(out) + out = self.conv3(out) + + if self.downsample is not None: + identity = self.downsample(x) + + out = out + identity + + return out + + if self.with_cp and x.requires_grad: + out = cp.checkpoint(_inner_forward, x) + else: + out = _inner_forward(x) + + out = self.relu(out) + + return out + + +def make_res_layer(block, + inplanes, + planes, + blocks, + stride=1, + dilation=1, + style='pytorch', + conv_cfg=None, + norm_cfg=None, + act_cfg=None, + with_cp=False): + """Build residual layer for ResNet. + + Args: + block (nn.Module): Residual module to be built. + inplanes (int): Number of channels for the input feature in each block. + planes (int): Number of channels for the output feature in each block. + blocks (int): Number of residual blocks. + stride (int): Stride in the conv layer. Default: 1. + dilation (int): Spacing between kernel elements. Default: 1. + style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two + layer is the 3x3 conv layer, otherwise the stride-two layer is + the first 1x1 conv layer. Default: 'pytorch'. + conv_cfg (dict | None): Config for conv layers. Default: None. + norm_cfg (dict | None): Config for norm layers. Default: None. + act_cfg (dict | None): Config for activation layers. Default: None. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + + Returns: + nn.Module: A residual layer for the given config.
+ """ + downsample = None + if stride != 1 or inplanes != planes * block.expansion: + downsample = ConvModule( + inplanes, + planes * block.expansion, + kernel_size=1, + stride=stride, + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + layers = [] + layers.append( + block( + inplanes, + planes, + stride, + dilation, + downsample, + style=style, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg, + with_cp=with_cp)) + inplanes = planes * block.expansion + for _ in range(1, blocks): + layers.append( + block( + inplanes, + planes, + 1, + dilation, + style=style, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg, + with_cp=with_cp)) + + return nn.Sequential(*layers) + + +@BACKBONES.register_module() +class ResNet(nn.Module): + """ResNet backbone. + + Args: + depth (int): Depth of resnet, from {18, 34, 50, 101, 152}. + pretrained (str | None): Name of pretrained model. Default: None. + in_channels (int): Channel num of input features. Default: 3. + num_stages (int): Resnet stages. Default: 4. + strides (Sequence[int]): Strides of the first block of each stage. + out_indices (Sequence[int]): Indices of output feature. Default: (3, ). + dilations (Sequence[int]): Dilation of each stage. + style (str): ``pytorch`` or ``caffe``. If set to "pytorch", the + stride-two layer is the 3x3 conv layer, otherwise the stride-two + layer is the first 1x1 conv layer. Default: ``pytorch``. + frozen_stages (int): Stages to be frozen (all param fixed). -1 means + not freezing any parameters. Default: -1. + conv_cfg (dict): Config for conv layers. Default: dict(type='Conv'). + norm_cfg (dict): + Config for norm layers. Required keys are `type` and + `requires_grad`. Default: dict(type='BN2d', requires_grad=True). + act_cfg (dict): Config for activation layers. + Default: dict(type='ReLU', inplace=True). + norm_eval (bool): Whether to set BN layers to eval mode, namely, freeze + running stats (mean and var). Default: False.
+ partial_bn (bool): Whether to use partial bn. Default: False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + """ + + arch_settings = { + 18: (BasicBlock, (2, 2, 2, 2)), + 34: (BasicBlock, (3, 4, 6, 3)), + 50: (Bottleneck, (3, 4, 6, 3)), + 101: (Bottleneck, (3, 4, 23, 3)), + 152: (Bottleneck, (3, 8, 36, 3)) + } + + def __init__(self, + depth, + pretrained=None, + torchvision_pretrain=True, + in_channels=3, + num_stages=4, + out_indices=(3, ), + strides=(1, 2, 2, 2), + dilations=(1, 1, 1, 1), + style='pytorch', + frozen_stages=-1, + conv_cfg=dict(type='Conv'), + norm_cfg=dict(type='BN2d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + norm_eval=False, + partial_bn=False, + with_cp=False): + super().__init__() + if depth not in self.arch_settings: + raise KeyError(f'invalid depth {depth} for resnet') + self.depth = depth + self.in_channels = in_channels + self.pretrained = pretrained + self.torchvision_pretrain = torchvision_pretrain + self.num_stages = num_stages + assert 1 <= num_stages <= 4 + self.out_indices = out_indices + assert max(out_indices) < num_stages + self.strides = strides + self.dilations = dilations + assert len(strides) == len(dilations) == num_stages + self.style = style + self.frozen_stages = frozen_stages + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.norm_eval = norm_eval + self.partial_bn = partial_bn + self.with_cp = with_cp + + self.block, stage_blocks = self.arch_settings[depth] + self.stage_blocks = stage_blocks[:num_stages] + self.inplanes = 64 + + self._make_stem_layer() + + self.res_layers = [] + for i, num_blocks in enumerate(self.stage_blocks): + stride = strides[i] + dilation = dilations[i] + planes = 64 * 2**i + res_layer = make_res_layer( + self.block, + self.inplanes, + planes, + num_blocks, + stride=stride, + dilation=dilation, + style=self.style, + conv_cfg=conv_cfg, + 
norm_cfg=norm_cfg, + act_cfg=act_cfg, + with_cp=with_cp) + self.inplanes = planes * self.block.expansion + layer_name = f'layer{i + 1}' + self.add_module(layer_name, res_layer) + self.res_layers.append(layer_name) + + self.feat_dim = self.block.expansion * 64 * 2**( + len(self.stage_blocks) - 1) + + def _make_stem_layer(self): + """Construct the stem layers consists of a conv+norm+act module and a + pooling layer.""" + self.conv1 = ConvModule( + self.in_channels, + 64, + kernel_size=7, + stride=2, + padding=3, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + + @staticmethod + def _load_conv_params(conv, state_dict_tv, module_name_tv, + loaded_param_names): + """Load the conv parameters of resnet from torchvision. + + Args: + conv (nn.Module): The destination conv module. + state_dict_tv (OrderedDict): The state dict of pretrained + torchvision model. + module_name_tv (str): The name of corresponding conv module in the + torchvision model. + loaded_param_names (list[str]): List of parameters that have been + loaded. + """ + + weight_tv_name = module_name_tv + '.weight' + if conv.weight.data.shape == state_dict_tv[weight_tv_name].shape: + conv.weight.data.copy_(state_dict_tv[weight_tv_name]) + loaded_param_names.append(weight_tv_name) + + if getattr(conv, 'bias') is not None: + bias_tv_name = module_name_tv + '.bias' + if conv.bias.data.shape == state_dict_tv[bias_tv_name].shape: + conv.bias.data.copy_(state_dict_tv[bias_tv_name]) + loaded_param_names.append(bias_tv_name) + + @staticmethod + def _load_bn_params(bn, state_dict_tv, module_name_tv, loaded_param_names): + """Load the bn parameters of resnet from torchvision. + + Args: + bn (nn.Module): The destination bn module. + state_dict_tv (OrderedDict): The state dict of pretrained + torchvision model. + module_name_tv (str): The name of corresponding bn module in the + torchvision model. 
+ loaded_param_names (list[str]): List of parameters that have been + loaded. + """ + + for param_name, param in bn.named_parameters(): + param_tv_name = f'{module_name_tv}.{param_name}' + param_tv = state_dict_tv[param_tv_name] + if param.data.shape == param_tv.shape: + param.data.copy_(param_tv) + loaded_param_names.append(param_tv_name) + + for param_name, param in bn.named_buffers(): + param_tv_name = f'{module_name_tv}.{param_name}' + # some buffers like num_batches_tracked may not exist + if param_tv_name in state_dict_tv: + param_tv = state_dict_tv[param_tv_name] + if param.data.shape == param_tv.shape: + param.data.copy_(param_tv) + loaded_param_names.append(param_tv_name) + + def _load_torchvision_checkpoint(self, logger=None): + """Initiate the parameters from torchvision pretrained checkpoint.""" + state_dict_torchvision = _load_checkpoint(self.pretrained) + if 'state_dict' in state_dict_torchvision: + state_dict_torchvision = state_dict_torchvision['state_dict'] + + loaded_param_names = [] + for name, module in self.named_modules(): + if isinstance(module, ConvModule): + # we use a ConvModule to wrap conv+bn+relu layers, thus the + # name mapping is needed + if 'downsample' in name: + # layer{X}.{Y}.downsample.conv->layer{X}.{Y}.downsample.0 + original_conv_name = name + '.0' + # layer{X}.{Y}.downsample.bn->layer{X}.{Y}.downsample.1 + original_bn_name = name + '.1' + else: + # layer{X}.{Y}.conv{n}.conv->layer{X}.{Y}.conv{n} + original_conv_name = name + # layer{X}.{Y}.conv{n}.bn->layer{X}.{Y}.bn{n} + original_bn_name = name.replace('conv', 'bn') + self._load_conv_params(module.conv, state_dict_torchvision, + original_conv_name, loaded_param_names) + self._load_bn_params(module.bn, state_dict_torchvision, + original_bn_name, loaded_param_names) + + # check if any parameters in the 2d checkpoint are not loaded + remaining_names = set( + state_dict_torchvision.keys()) - set(loaded_param_names) + if remaining_names: + logger.info( + f'These parameters in 
pretrained checkpoint are not loaded' + f': {remaining_names}') + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + if isinstance(self.pretrained, str): + logger = get_root_logger() + if self.torchvision_pretrain: + # torchvision's + self._load_torchvision_checkpoint(logger) + else: + # ours + load_checkpoint( + self, self.pretrained, strict=False, logger=logger) + elif self.pretrained is None: + for m in self.modules(): + if isinstance(m, nn.Conv2d): + kaiming_init(m) + elif isinstance(m, nn.BatchNorm2d): + constant_init(m, 1) + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The feature of the input samples extracted + by the backbone. + """ + x = self.conv1(x) + x = self.maxpool(x) + outs = [] + for i, layer_name in enumerate(self.res_layers): + res_layer = getattr(self, layer_name) + x = res_layer(x) + if i in self.out_indices: + outs.append(x) + if len(outs) == 1: + return outs[0] + + return tuple(outs) + + def _freeze_stages(self): + """Prevent all the parameters from being optimized before + ``self.frozen_stages``.""" + if self.frozen_stages >= 0: + self.conv1.bn.eval() + for m in self.conv1.modules(): + for param in m.parameters(): + param.requires_grad = False + + for i in range(1, self.frozen_stages + 1): + m = getattr(self, f'layer{i}') + m.eval() + for param in m.parameters(): + param.requires_grad = False + + def _partial_bn(self): + logger = get_root_logger() + logger.info('Freezing BatchNorm2D except the first one.') + count_bn = 0 + for m in self.modules(): + if isinstance(m, nn.BatchNorm2d): + count_bn += 1 + if count_bn >= 2: + m.eval() + # shutdown update in frozen mode + m.weight.requires_grad = False + m.bias.requires_grad = False + + def train(self, mode=True): + """Set the optimization status when training.""" + 
super().train(mode) + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() + if mode and self.partial_bn: + self._partial_bn() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet2plus1d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet2plus1d.py new file mode 100644 index 0000000000000000000000000000000000000000..1055343b0c471b63daee064490875f72514b65d4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet2plus1d.py @@ -0,0 +1,50 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from ..builder import BACKBONES +from .resnet3d import ResNet3d + + +@BACKBONES.register_module() +class ResNet2Plus1d(ResNet3d): + """ResNet (2+1)d backbone. + + This model is proposed in `A Closer Look at Spatiotemporal Convolutions for + Action Recognition `_ + """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + assert self.pretrained2d is False + assert self.conv_cfg['type'] == 'Conv2plus1d' + + def _freeze_stages(self): + """Prevent all the parameters from being optimized before + ``self.frozen_stages``.""" + if self.frozen_stages >= 0: + self.conv1.eval() + for param in self.conv1.parameters(): + param.requires_grad = False + + for i in range(1, self.frozen_stages + 1): + m = getattr(self, f'layer{i}') + m.eval() + for param in m.parameters(): + param.requires_grad = False + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The feature of the input + samples extracted by the backbone. 
+ """ + x = self.conv1(x) + x = self.maxpool(x) + for layer_name in self.res_layers: + res_layer = getattr(self, layer_name) + # no pool2 in R(2+1)d + x = res_layer(x) + + return x diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d.py new file mode 100644 index 0000000000000000000000000000000000000000..f4ab71f9b9319b27aa432332322a64efd7b9d104 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d.py @@ -0,0 +1,1034 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings + +import torch.nn as nn +import torch.utils.checkpoint as cp +from mmcv.cnn import (ConvModule, NonLocal3d, build_activation_layer, + constant_init, kaiming_init) +from mmcv.runner import _load_checkpoint, load_checkpoint +from mmcv.utils import _BatchNorm +from torch.nn.modules.utils import _ntuple, _triple + +from ...utils import get_root_logger +from ..builder import BACKBONES + +try: + from mmdet.models import BACKBONES as MMDET_BACKBONES + from mmdet.models.builder import SHARED_HEADS as MMDET_SHARED_HEADS + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + + +class BasicBlock3d(nn.Module): + """BasicBlock 3d block for ResNet3D. + + Args: + inplanes (int): Number of channels for the input in first conv3d layer. + planes (int): Number of channels produced by some norm/conv3d layers. + spatial_stride (int): Spatial stride in the conv3d layer. Default: 1. + temporal_stride (int): Temporal stride in the conv3d layer. Default: 1. + dilation (int): Spacing between kernel elements. Default: 1. + downsample (nn.Module | None): Downsample layer. Default: None. + style (str): ``pytorch`` or ``caffe``. If set to "pytorch", the + stride-two layer is the 3x3 conv layer, otherwise the stride-two + layer is the first 1x1 conv layer. Default: 'pytorch'. + inflate (bool): Whether to inflate kernel. Default: True. 
+ non_local (bool): Determine whether to apply non-local module in this + block. Default: False. + non_local_cfg (dict): Config for non-local module. Default: ``dict()``. + conv_cfg (dict): Config dict for convolution layer. + Default: ``dict(type='Conv3d')``. + norm_cfg (dict): Config for norm layers. required keys are ``type``, + Default: ``dict(type='BN3d')``. + act_cfg (dict): Config dict for activation layer. + Default: ``dict(type='ReLU')``. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + """ + expansion = 1 + + def __init__(self, + inplanes, + planes, + spatial_stride=1, + temporal_stride=1, + dilation=1, + downsample=None, + style='pytorch', + inflate=True, + non_local=False, + non_local_cfg=dict(), + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d'), + act_cfg=dict(type='ReLU'), + with_cp=False, + **kwargs): + super().__init__() + assert style in ['pytorch', 'caffe'] + # make sure that only ``inflate_style`` is passed into kwargs + assert set(kwargs).issubset(['inflate_style']) + + self.inplanes = inplanes + self.planes = planes + self.spatial_stride = spatial_stride + self.temporal_stride = temporal_stride + self.dilation = dilation + self.style = style + self.inflate = inflate + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.with_cp = with_cp + self.non_local = non_local + self.non_local_cfg = non_local_cfg + + self.conv1_stride_s = spatial_stride + self.conv2_stride_s = 1 + self.conv1_stride_t = temporal_stride + self.conv2_stride_t = 1 + + if self.inflate: + conv1_kernel_size = (3, 3, 3) + conv1_padding = (1, dilation, dilation) + conv2_kernel_size = (3, 3, 3) + conv2_padding = (1, 1, 1) + else: + conv1_kernel_size = (1, 3, 3) + conv1_padding = (0, dilation, dilation) + conv2_kernel_size = (1, 3, 3) + conv2_padding = (0, 1, 1) + + self.conv1 = ConvModule( + inplanes, + planes, + conv1_kernel_size, + 
stride=(self.conv1_stride_t, self.conv1_stride_s, + self.conv1_stride_s), + padding=conv1_padding, + dilation=(1, dilation, dilation), + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + self.conv2 = ConvModule( + planes, + planes * self.expansion, + conv2_kernel_size, + stride=(self.conv2_stride_t, self.conv2_stride_s, + self.conv2_stride_s), + padding=conv2_padding, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=None) + + self.downsample = downsample + self.relu = build_activation_layer(self.act_cfg) + + if self.non_local: + self.non_local_block = NonLocal3d(self.conv2.norm.num_features, + **self.non_local_cfg) + + def forward(self, x): + """Defines the computation performed at every call.""" + + def _inner_forward(x): + """Forward wrapper for utilizing checkpoint.""" + identity = x + + out = self.conv1(x) + out = self.conv2(out) + + if self.downsample is not None: + identity = self.downsample(x) + + out = out + identity + return out + + if self.with_cp and x.requires_grad: + out = cp.checkpoint(_inner_forward, x) + else: + out = _inner_forward(x) + out = self.relu(out) + + if self.non_local: + out = self.non_local_block(out) + + return out + + +class Bottleneck3d(nn.Module): + """Bottleneck 3d block for ResNet3D. + + Args: + inplanes (int): Number of channels for the input in first conv3d layer. + planes (int): Number of channels produced by some norm/conv3d layers. + spatial_stride (int): Spatial stride in the conv3d layer. Default: 1. + temporal_stride (int): Temporal stride in the conv3d layer. Default: 1. + dilation (int): Spacing between kernel elements. Default: 1. + downsample (nn.Module | None): Downsample layer. Default: None. + style (str): ``pytorch`` or ``caffe``. If set to "pytorch", the + stride-two layer is the 3x3 conv layer, otherwise the stride-two + layer is the first 1x1 conv layer. Default: 'pytorch'. + inflate (bool): Whether to inflate kernel. Default: True. 
+ inflate_style (str): ``3x1x1`` or ``3x3x3``. which determines the + kernel sizes and padding strides for conv1 and conv2 in each block. + Default: '3x1x1'. + non_local (bool): Determine whether to apply non-local module in this + block. Default: False. + non_local_cfg (dict): Config for non-local module. Default: ``dict()``. + conv_cfg (dict): Config dict for convolution layer. + Default: ``dict(type='Conv3d')``. + norm_cfg (dict): Config for norm layers. required keys are ``type``, + Default: ``dict(type='BN3d')``. + act_cfg (dict): Config dict for activation layer. + Default: ``dict(type='ReLU')``. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + """ + expansion = 4 + + def __init__(self, + inplanes, + planes, + spatial_stride=1, + temporal_stride=1, + dilation=1, + downsample=None, + style='pytorch', + inflate=True, + inflate_style='3x1x1', + non_local=False, + non_local_cfg=dict(), + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d'), + act_cfg=dict(type='ReLU'), + with_cp=False): + super().__init__() + assert style in ['pytorch', 'caffe'] + assert inflate_style in ['3x1x1', '3x3x3'] + + self.inplanes = inplanes + self.planes = planes + self.spatial_stride = spatial_stride + self.temporal_stride = temporal_stride + self.dilation = dilation + self.style = style + self.inflate = inflate + self.inflate_style = inflate_style + self.norm_cfg = norm_cfg + self.conv_cfg = conv_cfg + self.act_cfg = act_cfg + self.with_cp = with_cp + self.non_local = non_local + self.non_local_cfg = non_local_cfg + + if self.style == 'pytorch': + self.conv1_stride_s = 1 + self.conv2_stride_s = spatial_stride + self.conv1_stride_t = 1 + self.conv2_stride_t = temporal_stride + else: + self.conv1_stride_s = spatial_stride + self.conv2_stride_s = 1 + self.conv1_stride_t = temporal_stride + self.conv2_stride_t = 1 + + if self.inflate: + if inflate_style == '3x1x1': + conv1_kernel_size = (3, 1, 
1) + conv1_padding = (1, 0, 0) + conv2_kernel_size = (1, 3, 3) + conv2_padding = (0, dilation, dilation) + else: + conv1_kernel_size = (1, 1, 1) + conv1_padding = (0, 0, 0) + conv2_kernel_size = (3, 3, 3) + conv2_padding = (1, dilation, dilation) + else: + conv1_kernel_size = (1, 1, 1) + conv1_padding = (0, 0, 0) + conv2_kernel_size = (1, 3, 3) + conv2_padding = (0, dilation, dilation) + + self.conv1 = ConvModule( + inplanes, + planes, + conv1_kernel_size, + stride=(self.conv1_stride_t, self.conv1_stride_s, + self.conv1_stride_s), + padding=conv1_padding, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + self.conv2 = ConvModule( + planes, + planes, + conv2_kernel_size, + stride=(self.conv2_stride_t, self.conv2_stride_s, + self.conv2_stride_s), + padding=conv2_padding, + dilation=(1, dilation, dilation), + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + self.conv3 = ConvModule( + planes, + planes * self.expansion, + 1, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + # No activation in the third ConvModule for bottleneck + act_cfg=None) + + self.downsample = downsample + self.relu = build_activation_layer(self.act_cfg) + + if self.non_local: + self.non_local_block = NonLocal3d(self.conv3.norm.num_features, + **self.non_local_cfg) + + def forward(self, x): + """Defines the computation performed at every call.""" + + def _inner_forward(x): + """Forward wrapper for utilizing checkpoint.""" + identity = x + + out = self.conv1(x) + out = self.conv2(out) + out = self.conv3(out) + + if self.downsample is not None: + identity = self.downsample(x) + + out = out + identity + return out + + if self.with_cp and x.requires_grad: + out = cp.checkpoint(_inner_forward, x) + else: + out = _inner_forward(x) + out = self.relu(out) + + if self.non_local: + out = self.non_local_block(out) + + return out + + +@BACKBONES.register_module() +class ResNet3d(nn.Module): + """ResNet 3d 
backbone. + + Args: + depth (int): Depth of resnet, from {18, 34, 50, 101, 152}. + pretrained (str | None): Name of pretrained model. + stage_blocks (tuple | None): Set number of stages for each res layer. + Default: None. + pretrained2d (bool): Whether to load pretrained 2D model. + Default: True. + in_channels (int): Channel num of input features. Default: 3. + base_channels (int): Channel num of stem output features. Default: 64. + out_indices (Sequence[int]): Indices of output feature. Default: (3, ). + num_stages (int): Resnet stages. Default: 4. + spatial_strides (Sequence[int]): + Spatial strides of residual blocks of each stage. + Default: ``(1, 2, 2, 2)``. + temporal_strides (Sequence[int]): + Temporal strides of residual blocks of each stage. + Default: ``(1, 1, 1, 1)``. + dilations (Sequence[int]): Dilation of each stage. + Default: ``(1, 1, 1, 1)``. + conv1_kernel (Sequence[int]): Kernel size of the first conv layer. + Default: ``(3, 7, 7)``. + conv1_stride_s (int): Spatial stride of the first conv layer. + Default: 2. + conv1_stride_t (int): Temporal stride of the first conv layer. + Default: 1. + pool1_stride_s (int): Spatial stride of the first pooling layer. + Default: 2. + pool1_stride_t (int): Temporal stride of the first pooling layer. + Default: 1. + with_pool2 (bool): Whether to use pool2. Default: True. + style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two + layer is the 3x3 conv layer, otherwise the stride-two layer is + the first 1x1 conv layer. Default: 'pytorch'. + frozen_stages (int): Stages to be frozen (all param fixed). -1 means + not freezing any parameters. Default: -1. + inflate (Sequence[int]): Inflate Dims of each block. + Default: (1, 1, 1, 1). + inflate_style (str): ``3x1x1`` or ``3x3x3``. which determines the + kernel sizes and padding strides for conv1 and conv2 in each block. + Default: '3x1x1'. + conv_cfg (dict): Config for conv layers. required keys are ``type`` + Default: ``dict(type='Conv3d')``. 
+ norm_cfg (dict): Config for norm layers. required keys are ``type`` and + ``requires_grad``. + Default: ``dict(type='BN3d', requires_grad=True)``. + act_cfg (dict): Config dict for activation layer. + Default: ``dict(type='ReLU', inplace=True)``. + norm_eval (bool): Whether to set BN layers to eval mode, namely, freeze + running stats (mean and var). Default: False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + non_local (Sequence[int]): Determine whether to apply non-local module + in the corresponding block of each stages. Default: (0, 0, 0, 0). + non_local_cfg (dict): Config for non-local module. Default: ``dict()``. + zero_init_residual (bool): + Whether to use zero initialization for residual block, + Default: True. + kwargs (dict, optional): Key arguments for "make_res_layer". + """ + + arch_settings = { + 18: (BasicBlock3d, (2, 2, 2, 2)), + 34: (BasicBlock3d, (3, 4, 6, 3)), + 50: (Bottleneck3d, (3, 4, 6, 3)), + 101: (Bottleneck3d, (3, 4, 23, 3)), + 152: (Bottleneck3d, (3, 8, 36, 3)) + } + + def __init__(self, + depth, + pretrained, + stage_blocks=None, + pretrained2d=True, + in_channels=3, + num_stages=4, + base_channels=64, + out_indices=(3, ), + spatial_strides=(1, 2, 2, 2), + temporal_strides=(1, 1, 1, 1), + dilations=(1, 1, 1, 1), + conv1_kernel=(3, 7, 7), + conv1_stride_s=2, + conv1_stride_t=1, + pool1_stride_s=2, + pool1_stride_t=1, + with_pool1=True, + with_pool2=True, + style='pytorch', + frozen_stages=-1, + inflate=(1, 1, 1, 1), + inflate_style='3x1x1', + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + norm_eval=False, + with_cp=False, + non_local=(0, 0, 0, 0), + non_local_cfg=dict(), + zero_init_residual=True, + **kwargs): + super().__init__() + if depth not in self.arch_settings: + raise KeyError(f'invalid depth {depth} for resnet') + self.depth = depth + self.pretrained = 
pretrained + self.pretrained2d = pretrained2d + self.in_channels = in_channels + self.base_channels = base_channels + self.num_stages = num_stages + assert 1 <= num_stages <= 4 + self.stage_blocks = stage_blocks + self.out_indices = out_indices + assert max(out_indices) < num_stages + self.spatial_strides = spatial_strides + self.temporal_strides = temporal_strides + self.dilations = dilations + assert len(spatial_strides) == len(temporal_strides) == len( + dilations) == num_stages + if self.stage_blocks is not None: + assert len(self.stage_blocks) == num_stages + + self.conv1_kernel = conv1_kernel + self.conv1_stride_s = conv1_stride_s + self.conv1_stride_t = conv1_stride_t + self.pool1_stride_s = pool1_stride_s + self.pool1_stride_t = pool1_stride_t + self.with_pool1 = with_pool1 + self.with_pool2 = with_pool2 + self.style = style + self.frozen_stages = frozen_stages + self.stage_inflations = _ntuple(num_stages)(inflate) + self.non_local_stages = _ntuple(num_stages)(non_local) + self.inflate_style = inflate_style + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.norm_eval = norm_eval + self.with_cp = with_cp + self.zero_init_residual = zero_init_residual + + self.block, stage_blocks = self.arch_settings[depth] + + if self.stage_blocks is None: + self.stage_blocks = stage_blocks[:num_stages] + + self.inplanes = self.base_channels + + self.non_local_cfg = non_local_cfg + + self._make_stem_layer() + + self.res_layers = [] + for i, num_blocks in enumerate(self.stage_blocks): + spatial_stride = spatial_strides[i] + temporal_stride = temporal_strides[i] + dilation = dilations[i] + planes = self.base_channels * 2**i + res_layer = self.make_res_layer( + self.block, + self.inplanes, + planes, + num_blocks, + spatial_stride=spatial_stride, + temporal_stride=temporal_stride, + dilation=dilation, + style=self.style, + norm_cfg=self.norm_cfg, + conv_cfg=self.conv_cfg, + act_cfg=self.act_cfg, + non_local=self.non_local_stages[i], + 
non_local_cfg=self.non_local_cfg, + inflate=self.stage_inflations[i], + inflate_style=self.inflate_style, + with_cp=with_cp, + **kwargs) + self.inplanes = planes * self.block.expansion + layer_name = f'layer{i + 1}' + self.add_module(layer_name, res_layer) + self.res_layers.append(layer_name) + + self.feat_dim = self.block.expansion * self.base_channels * 2**( + len(self.stage_blocks) - 1) + + @staticmethod + def make_res_layer(block, + inplanes, + planes, + blocks, + spatial_stride=1, + temporal_stride=1, + dilation=1, + style='pytorch', + inflate=1, + inflate_style='3x1x1', + non_local=0, + non_local_cfg=dict(), + norm_cfg=None, + act_cfg=None, + conv_cfg=None, + with_cp=False, + **kwargs): + """Build residual layer for ResNet3D. + + Args: + block (nn.Module): Residual module to be built. + inplanes (int): Number of channels for the input feature + in each block. + planes (int): Number of channels for the output feature + in each block. + blocks (int): Number of residual blocks. + spatial_stride (int | Sequence[int]): Spatial strides in + residual and conv layers. Default: 1. + temporal_stride (int | Sequence[int]): Temporal strides in + residual and conv layers. Default: 1. + dilation (int): Spacing between kernel elements. Default: 1. + style (str): ``pytorch`` or ``caffe``. If set to ``pytorch``, + the stride-two layer is the 3x3 conv layer, otherwise + the stride-two layer is the first 1x1 conv layer. + Default: ``pytorch``. + inflate (int | Sequence[int]): Determine whether to inflate + for each block. Default: 1. + inflate_style (str): ``3x1x1`` or ``3x3x3``, which determines + the kernel sizes and padding strides for conv1 and conv2 + in each block. Default: '3x1x1'. + non_local (int | Sequence[int]): Determine whether to apply + non-local module in the corresponding block of each stage. + Default: 0. + non_local_cfg (dict): Config for non-local module. + Default: ``dict()``. + conv_cfg (dict | None): Config for conv layers. Default: None. 
+ norm_cfg (dict | None): Config for norm layers. Default: None. + act_cfg (dict | None): Config for activate layers. Default: None. + with_cp (bool | None): Use checkpoint or not. Using checkpoint + will save some memory while slowing down the training speed. + Default: False. + + Returns: + nn.Module: A residual layer for the given config. + """ + inflate = inflate if not isinstance(inflate, + int) else (inflate, ) * blocks + non_local = non_local if not isinstance( + non_local, int) else (non_local, ) * blocks + assert len(inflate) == blocks and len(non_local) == blocks + downsample = None + if spatial_stride != 1 or inplanes != planes * block.expansion: + downsample = ConvModule( + inplanes, + planes * block.expansion, + kernel_size=1, + stride=(temporal_stride, spatial_stride, spatial_stride), + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + layers = [] + layers.append( + block( + inplanes, + planes, + spatial_stride=spatial_stride, + temporal_stride=temporal_stride, + dilation=dilation, + downsample=downsample, + style=style, + inflate=(inflate[0] == 1), + inflate_style=inflate_style, + non_local=(non_local[0] == 1), + non_local_cfg=non_local_cfg, + norm_cfg=norm_cfg, + conv_cfg=conv_cfg, + act_cfg=act_cfg, + with_cp=with_cp, + **kwargs)) + inplanes = planes * block.expansion + for i in range(1, blocks): + layers.append( + block( + inplanes, + planes, + spatial_stride=1, + temporal_stride=1, + dilation=dilation, + style=style, + inflate=(inflate[i] == 1), + inflate_style=inflate_style, + non_local=(non_local[i] == 1), + non_local_cfg=non_local_cfg, + norm_cfg=norm_cfg, + conv_cfg=conv_cfg, + act_cfg=act_cfg, + with_cp=with_cp, + **kwargs)) + + return nn.Sequential(*layers) + + @staticmethod + def _inflate_conv_params(conv3d, state_dict_2d, module_name_2d, + inflated_param_names): + """Inflate a conv module from 2d to 3d. + + Args: + conv3d (nn.Module): The destination conv3d module. 
+ state_dict_2d (OrderedDict): The state dict of pretrained 2d model. + module_name_2d (str): The name of corresponding conv module in the + 2d model. + inflated_param_names (list[str]): List of parameters that have been + inflated. + """ + weight_2d_name = module_name_2d + '.weight' + + conv2d_weight = state_dict_2d[weight_2d_name] + kernel_t = conv3d.weight.data.shape[2] + + new_weight = conv2d_weight.data.unsqueeze(2).expand_as( + conv3d.weight) / kernel_t + conv3d.weight.data.copy_(new_weight) + inflated_param_names.append(weight_2d_name) + + if getattr(conv3d, 'bias') is not None: + bias_2d_name = module_name_2d + '.bias' + conv3d.bias.data.copy_(state_dict_2d[bias_2d_name]) + inflated_param_names.append(bias_2d_name) + + @staticmethod + def _inflate_bn_params(bn3d, state_dict_2d, module_name_2d, + inflated_param_names): + """Inflate a norm module from 2d to 3d. + + Args: + bn3d (nn.Module): The destination bn3d module. + state_dict_2d (OrderedDict): The state dict of pretrained 2d model. + module_name_2d (str): The name of corresponding bn module in the + 2d model. + inflated_param_names (list[str]): List of parameters that have been + inflated. + """ + for param_name, param in bn3d.named_parameters(): + param_2d_name = f'{module_name_2d}.{param_name}' + param_2d = state_dict_2d[param_2d_name] + if param.data.shape != param_2d.shape: + warnings.warn(f'The parameter of {module_name_2d} is not ' + 'loaded due to incompatible shapes.') + return + + param.data.copy_(param_2d) + inflated_param_names.append(param_2d_name) + + for param_name, param in bn3d.named_buffers(): + param_2d_name = f'{module_name_2d}.{param_name}' + # some buffers like num_batches_tracked may not exist in old + # checkpoints + if param_2d_name in state_dict_2d: + param_2d = state_dict_2d[param_2d_name] + param.data.copy_(param_2d) + inflated_param_names.append(param_2d_name) + + @staticmethod + def _inflate_weights(self, logger): + """Inflate the resnet2d parameters to resnet3d. 
+ + The differences between resnet3d and resnet2d mainly lie in an extra + axis of conv kernel. To utilize the pretrained parameters in 2d model, + the weight of conv2d models should be inflated to fit in the shapes of + the 3d counterpart. + + Args: + logger (logging.Logger): The logger used to print + debugging information. + """ + + state_dict_r2d = _load_checkpoint(self.pretrained) + if 'state_dict' in state_dict_r2d: + state_dict_r2d = state_dict_r2d['state_dict'] + + inflated_param_names = [] + for name, module in self.named_modules(): + if isinstance(module, ConvModule): + # we use a ConvModule to wrap conv+bn+relu layers, thus the + # name mapping is needed + if 'downsample' in name: + # layer{X}.{Y}.downsample.conv->layer{X}.{Y}.downsample.0 + original_conv_name = name + '.0' + # layer{X}.{Y}.downsample.bn->layer{X}.{Y}.downsample.1 + original_bn_name = name + '.1' + else: + # layer{X}.{Y}.conv{n}.conv->layer{X}.{Y}.conv{n} + original_conv_name = name + # layer{X}.{Y}.conv{n}.bn->layer{X}.{Y}.bn{n} + original_bn_name = name.replace('conv', 'bn') + if original_conv_name + '.weight' not in state_dict_r2d: + logger.warning(f'Module does not exist in the state_dict_r2d' + f': {original_conv_name}') + else: + shape_2d = state_dict_r2d[original_conv_name + + '.weight'].shape + shape_3d = module.conv.weight.data.shape + if shape_2d != shape_3d[:2] + shape_3d[3:]: + logger.warning(f'Weight shape mismatch for ' + f'{original_conv_name}: ' + f'3d weight shape: {shape_3d}; ' + f'2d weight shape: {shape_2d}. 
') + else: + self._inflate_conv_params(module.conv, state_dict_r2d, + original_conv_name, + inflated_param_names) + + if original_bn_name + '.weight' not in state_dict_r2d: + logger.warning(f'Module not exist in the state_dict_r2d' + f': {original_bn_name}') + else: + self._inflate_bn_params(module.bn, state_dict_r2d, + original_bn_name, + inflated_param_names) + + # check if any parameters in the 2d checkpoint are not loaded + remaining_names = set( + state_dict_r2d.keys()) - set(inflated_param_names) + if remaining_names: + logger.info(f'These parameters in the 2d checkpoint are not loaded' + f': {remaining_names}') + + def inflate_weights(self, logger): + self._inflate_weights(self, logger) + + def _make_stem_layer(self): + """Construct the stem layers consists of a conv+norm+act module and a + pooling layer.""" + self.conv1 = ConvModule( + self.in_channels, + self.base_channels, + kernel_size=self.conv1_kernel, + stride=(self.conv1_stride_t, self.conv1_stride_s, + self.conv1_stride_s), + padding=tuple([(k - 1) // 2 for k in _triple(self.conv1_kernel)]), + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + self.maxpool = nn.MaxPool3d( + kernel_size=(1, 3, 3), + stride=(self.pool1_stride_t, self.pool1_stride_s, + self.pool1_stride_s), + padding=(0, 1, 1)) + + self.pool2 = nn.MaxPool3d(kernel_size=(2, 1, 1), stride=(2, 1, 1)) + + def _freeze_stages(self): + """Prevent all the parameters from being optimized before + ``self.frozen_stages``.""" + if self.frozen_stages >= 0: + self.conv1.eval() + for param in self.conv1.parameters(): + param.requires_grad = False + + for i in range(1, self.frozen_stages + 1): + m = getattr(self, f'layer{i}') + m.eval() + for param in m.parameters(): + param.requires_grad = False + + @staticmethod + def _init_weights(self, pretrained=None): + """Initiate the parameters either from existing checkpoint or from + scratch. + + Args: + pretrained (str | None): The path of the pretrained weight. 
Will + override the original `pretrained` if set. The arg is added to + be compatible with mmdet. Default: None. + """ + if pretrained: + self.pretrained = pretrained + if isinstance(self.pretrained, str): + logger = get_root_logger() + logger.info(f'load model from: {self.pretrained}') + + if self.pretrained2d: + # Inflate 2D model into 3D model. + self.inflate_weights(logger) + + else: + # Directly load 3D model. + load_checkpoint( + self, self.pretrained, strict=False, logger=logger) + + elif self.pretrained is None: + for m in self.modules(): + if isinstance(m, nn.Conv3d): + kaiming_init(m) + elif isinstance(m, _BatchNorm): + constant_init(m, 1) + + if self.zero_init_residual: + for m in self.modules(): + if isinstance(m, Bottleneck3d): + constant_init(m.conv3.bn, 0) + elif isinstance(m, BasicBlock3d): + constant_init(m.conv2.bn, 0) + else: + raise TypeError('pretrained must be a str or None') + + def init_weights(self, pretrained=None): + self._init_weights(self, pretrained) + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The feature of the input + samples extracted by the backbone. + """ + x = self.conv1(x) + if self.with_pool1: + x = self.maxpool(x) + outs = [] + for i, layer_name in enumerate(self.res_layers): + res_layer = getattr(self, layer_name) + x = res_layer(x) + if i == 0 and self.with_pool2: + x = self.pool2(x) + if i in self.out_indices: + outs.append(x) + if len(outs) == 1: + return outs[0] + + return tuple(outs) + + def train(self, mode=True): + """Set the optimization status when training.""" + super().train(mode) + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() + + +@BACKBONES.register_module() +class ResNet3dLayer(nn.Module): + """ResNet 3d Layer. + + Args: + depth (int): Depth of resnet, from {18, 34, 50, 101, 152}. 
+ pretrained (str | None): Name of pretrained model. + pretrained2d (bool): Whether to load pretrained 2D model. + Default: True. + stage (int): The index of Resnet stage. Default: 3. + base_channels (int): Channel num of stem output features. Default: 64. + spatial_stride (int): The 1st res block's spatial stride. Default: 2. + temporal_stride (int): The 1st res block's temporal stride. Default: 1. + dilation (int): The dilation. Default: 1. + style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two + layer is the 3x3 conv layer, otherwise the stride-two layer is + the first 1x1 conv layer. Default: 'pytorch'. + all_frozen (bool): Freeze all modules in the layer. Default: False. + inflate (int): Inflate Dims of each block. Default: 1. + inflate_style (str): ``3x1x1`` or ``3x3x3``, which determines the + kernel sizes and padding strides for conv1 and conv2 in each block. + Default: '3x1x1'. + conv_cfg (dict): Config for conv layers. Required keys are ``type``. + Default: ``dict(type='Conv3d')``. + norm_cfg (dict): Config for norm layers. Required keys are ``type`` and + ``requires_grad``. + Default: ``dict(type='BN3d', requires_grad=True)``. + act_cfg (dict): Config dict for activation layer. + Default: ``dict(type='ReLU', inplace=True)``. + norm_eval (bool): Whether to set BN layers to eval mode, namely, freeze + running stats (mean and var). Default: False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + zero_init_residual (bool): + Whether to use zero initialization for residual block. + Default: True. + kwargs (dict, optional): Keyword arguments for "make_res_layer".
+ """ + + def __init__(self, + depth, + pretrained, + pretrained2d=True, + stage=3, + base_channels=64, + spatial_stride=2, + temporal_stride=1, + dilation=1, + style='pytorch', + all_frozen=False, + inflate=1, + inflate_style='3x1x1', + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + norm_eval=False, + with_cp=False, + zero_init_residual=True, + **kwargs): + + super().__init__() + self.arch_settings = ResNet3d.arch_settings + assert depth in self.arch_settings + + self.make_res_layer = ResNet3d.make_res_layer + self._inflate_conv_params = ResNet3d._inflate_conv_params + self._inflate_bn_params = ResNet3d._inflate_bn_params + self._inflate_weights = ResNet3d._inflate_weights + self._init_weights = ResNet3d._init_weights + + self.depth = depth + self.pretrained = pretrained + self.pretrained2d = pretrained2d + self.stage = stage + # stage index is 0 based + assert 0 <= stage <= 3 + self.base_channels = base_channels + + self.spatial_stride = spatial_stride + self.temporal_stride = temporal_stride + self.dilation = dilation + + self.style = style + self.all_frozen = all_frozen + + self.stage_inflation = inflate + self.inflate_style = inflate_style + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.norm_eval = norm_eval + self.with_cp = with_cp + self.zero_init_residual = zero_init_residual + + block, stage_blocks = self.arch_settings[depth] + stage_block = stage_blocks[stage] + planes = 64 * 2**stage + inplanes = 64 * 2**(stage - 1) * block.expansion + + res_layer = self.make_res_layer( + block, + inplanes, + planes, + stage_block, + spatial_stride=spatial_stride, + temporal_stride=temporal_stride, + dilation=dilation, + style=self.style, + norm_cfg=self.norm_cfg, + conv_cfg=self.conv_cfg, + act_cfg=self.act_cfg, + inflate=self.stage_inflation, + inflate_style=self.inflate_style, + with_cp=with_cp, + **kwargs) + + self.layer_name = f'layer{stage + 1}' + 
self.add_module(self.layer_name, res_layer) + + def inflate_weights(self, logger): + self._inflate_weights(self, logger) + + def _freeze_stages(self): + """Prevent all the parameters from being optimized before + ``self.frozen_stages``.""" + if self.all_frozen: + layer = getattr(self, self.layer_name) + layer.eval() + for param in layer.parameters(): + param.requires_grad = False + + def init_weights(self, pretrained=None): + self._init_weights(self, pretrained) + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The feature of the input + samples extracted by the backbone. + """ + res_layer = getattr(self, self.layer_name) + out = res_layer(x) + return out + + def train(self, mode=True): + """Set the optimization status when training.""" + super().train(mode) + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() + + +if mmdet_imported: + MMDET_SHARED_HEADS.register_module()(ResNet3dLayer) + MMDET_BACKBONES.register_module()(ResNet3d) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_csn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_csn.py new file mode 100644 index 0000000000000000000000000000000000000000..8b7a5feebcb799cf51b4cbbc5cf9164ba70060c7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_csn.py @@ -0,0 +1,157 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import ConvModule +from mmcv.utils import _BatchNorm + +from ..builder import BACKBONES +from .resnet3d import Bottleneck3d, ResNet3d + + +class CSNBottleneck3d(Bottleneck3d): + """Channel-Separated Bottleneck Block. 
+ + This module is proposed in + "Video Classification with Channel-Separated Convolutional Networks" + Link: https://arxiv.org/pdf/1711.11248.pdf + + Args: + inplanes (int): Number of channels for the input in first conv3d layer. + planes (int): Number of channels produced by some norm/conv3d layers. + bottleneck_mode (str): Determines how to factorize a 3D + bottleneck block using channel-separated convolutional networks. + If set to 'ip', it will replace the 3x3x3 conv2 layer with a + 1x1x1 traditional convolution and a 3x3x3 depthwise + convolution, i.e., Interaction-preserved channel-separated + bottleneck block. + If set to 'ir', it will replace the 3x3x3 conv2 layer with a + 3x3x3 depthwise convolution, which is derived from preserved + bottleneck block by removing the extra 1x1x1 convolution, + i.e., Interaction-reduced channel-separated bottleneck block. + Default: 'ir'. + args (positional arguments): Positional arguments for Bottleneck. + kwargs (dict, optional): Keyword arguments for Bottleneck.
+ """ + + def __init__(self, + inplanes, + planes, + *args, + bottleneck_mode='ir', + **kwargs): + super(CSNBottleneck3d, self).__init__(inplanes, planes, *args, + **kwargs) + self.bottleneck_mode = bottleneck_mode + conv2 = [] + if self.bottleneck_mode == 'ip': + conv2.append( + ConvModule( + planes, + planes, + 1, + stride=1, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=None)) + conv2_kernel_size = self.conv2.conv.kernel_size + conv2_stride = self.conv2.conv.stride + conv2_padding = self.conv2.conv.padding + conv2_dilation = self.conv2.conv.dilation + conv2_bias = bool(self.conv2.conv.bias) + self.conv2 = ConvModule( + planes, + planes, + conv2_kernel_size, + stride=conv2_stride, + padding=conv2_padding, + dilation=conv2_dilation, + bias=conv2_bias, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg, + groups=planes) + conv2.append(self.conv2) + self.conv2 = nn.Sequential(*conv2) + + +@BACKBONES.register_module() +class ResNet3dCSN(ResNet3d): + """ResNet backbone for CSN. + + Args: + depth (int): Depth of ResNetCSN, from {18, 34, 50, 101, 152}. + pretrained (str | None): Name of pretrained model. + temporal_strides (tuple[int]): + Temporal strides of residual blocks of each stage. + Default: (1, 2, 2, 2). + conv1_kernel (tuple[int]): Kernel size of the first conv layer. + Default: (3, 7, 7). + conv1_stride_t (int): Temporal stride of the first conv layer. + Default: 1. + pool1_stride_t (int): Temporal stride of the first pooling layer. + Default: 1. + norm_cfg (dict): Config for norm layers. required keys are `type` and + `requires_grad`. + Default: dict(type='BN3d', requires_grad=True, eps=1e-3). + inflate_style (str): `3x1x1` or `3x3x3`. which determines the kernel + sizes and padding strides for conv1 and conv2 in each block. + Default: '3x3x3'. + bottleneck_mode (str): Determine which ways to factorize a 3D + bottleneck block using channel-separated convolutional networks. 
+ If set to 'ip', it will replace the 3x3x3 conv2 layer with a + 1x1x1 traditional convolution and a 3x3x3 depthwise + convolution, i.e., Interaction-preserved channel-separated + bottleneck block. + If set to 'ir', it will replace the 3x3x3 conv2 layer with a + 3x3x3 depthwise convolution, which is derived from preserved + bottleneck block by removing the extra 1x1x1 convolution, + i.e., Interaction-reduced channel-separated bottleneck block. + Default: 'ir'. + kwargs (dict, optional): Keyword arguments for "make_res_layer". + """ + + def __init__(self, + depth, + pretrained, + temporal_strides=(1, 2, 2, 2), + conv1_kernel=(3, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + norm_cfg=dict(type='BN3d', requires_grad=True, eps=1e-3), + inflate_style='3x3x3', + bottleneck_mode='ir', + bn_frozen=False, + **kwargs): + self.arch_settings = { + # 18: (BasicBlock3d, (2, 2, 2, 2)), + # 34: (BasicBlock3d, (3, 4, 6, 3)), + 50: (CSNBottleneck3d, (3, 4, 6, 3)), + 101: (CSNBottleneck3d, (3, 4, 23, 3)), + 152: (CSNBottleneck3d, (3, 8, 36, 3)) + } + self.bn_frozen = bn_frozen + if bottleneck_mode not in ['ip', 'ir']: + raise ValueError(f'Bottleneck mode must be "ip" or "ir", ' + f'but got {bottleneck_mode}.') + super(ResNet3dCSN, self).__init__( + depth, + pretrained, + temporal_strides=temporal_strides, + conv1_kernel=conv1_kernel, + conv1_stride_t=conv1_stride_t, + pool1_stride_t=pool1_stride_t, + norm_cfg=norm_cfg, + inflate_style=inflate_style, + bottleneck_mode=bottleneck_mode, + **kwargs) + + def train(self, mode=True): + super(ResNet3d, self).train(mode) + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() + if self.bn_frozen: + for param in m.parameters(): + param.requires_grad = False diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_slowfast.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_slowfast.py new file mode 100644 index
0000000000000000000000000000000000000000..31da6fde0726e68a54235c07e848082756af4fd4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_slowfast.py @@ -0,0 +1,531 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings + +import torch +import torch.nn as nn +from mmcv.cnn import ConvModule, kaiming_init +from mmcv.runner import _load_checkpoint, load_checkpoint +from mmcv.utils import print_log + +from ...utils import get_root_logger +from ..builder import BACKBONES +from .resnet3d import ResNet3d + +try: + from mmdet.models import BACKBONES as MMDET_BACKBONES + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + + +class ResNet3dPathway(ResNet3d): + """A pathway of Slowfast based on ResNet3d. + + Args: + *args (arguments): Arguments same as :class:``ResNet3d``. + lateral (bool): Determines whether to enable the lateral connection + from another pathway. Default: False. + speed_ratio (int): Speed ratio indicating the ratio between time + dimension of the fast and slow pathway, corresponding to the + ``alpha`` in the paper. Default: 8. + channel_ratio (int): Reduce the channel number of fast pathway + by ``channel_ratio``, corresponding to ``beta`` in the paper. + Default: 8. + fusion_kernel (int): The kernel size of lateral fusion. + Default: 5. + **kwargs (keyword arguments): Keywords arguments for ResNet3d. 
+ """ + + def __init__(self, + *args, + lateral=False, + lateral_norm=False, + speed_ratio=8, + channel_ratio=8, + fusion_kernel=5, + **kwargs): + self.lateral = lateral + self.lateral_norm = lateral_norm + self.speed_ratio = speed_ratio + self.channel_ratio = channel_ratio + self.fusion_kernel = fusion_kernel + super().__init__(*args, **kwargs) + self.inplanes = self.base_channels + if self.lateral: + self.conv1_lateral = ConvModule( + self.inplanes // self.channel_ratio, + # https://arxiv.org/abs/1812.03982, the + # third type of lateral connection has out_channel: + # 2 * \beta * C + self.inplanes * 2 // self.channel_ratio, + kernel_size=(fusion_kernel, 1, 1), + stride=(self.speed_ratio, 1, 1), + padding=((fusion_kernel - 1) // 2, 0, 0), + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg if self.lateral_norm else None, + act_cfg=self.act_cfg if self.lateral_norm else None) + + self.lateral_connections = [] + for i in range(len(self.stage_blocks)): + planes = self.base_channels * 2**i + self.inplanes = planes * self.block.expansion + + if lateral and i != self.num_stages - 1: + # no lateral connection needed in final stage + lateral_name = f'layer{(i + 1)}_lateral' + setattr( + self, lateral_name, + ConvModule( + self.inplanes // self.channel_ratio, + self.inplanes * 2 // self.channel_ratio, + kernel_size=(fusion_kernel, 1, 1), + stride=(self.speed_ratio, 1, 1), + padding=((fusion_kernel - 1) // 2, 0, 0), + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg if self.lateral_norm else None, + act_cfg=self.act_cfg if self.lateral_norm else None)) + self.lateral_connections.append(lateral_name) + + def make_res_layer(self, + block, + inplanes, + planes, + blocks, + spatial_stride=1, + temporal_stride=1, + dilation=1, + style='pytorch', + inflate=1, + inflate_style='3x1x1', + non_local=0, + non_local_cfg=dict(), + conv_cfg=None, + norm_cfg=None, + act_cfg=None, + with_cp=False): + """Build residual layer for Slowfast. 
+ + Args: + block (nn.Module): Residual module to be built. + inplanes (int): Number of channels for the input + feature in each block. + planes (int): Number of channels for the output + feature in each block. + blocks (int): Number of residual blocks. + spatial_stride (int | Sequence[int]): Spatial strides + in residual and conv layers. Default: 1. + temporal_stride (int | Sequence[int]): Temporal strides in + residual and conv layers. Default: 1. + dilation (int): Spacing between kernel elements. Default: 1. + style (str): ``pytorch`` or ``caffe``. If set to ``pytorch``, + the stride-two layer is the 3x3 conv layer, + otherwise the stride-two layer is the first 1x1 conv layer. + Default: ``pytorch``. + inflate (int | Sequence[int]): Determine whether to inflate + for each block. Default: 1. + inflate_style (str): ``3x1x1`` or ``3x3x3``. which determines + the kernel sizes and padding strides for conv1 and + conv2 in each block. Default: ``3x1x1``. + non_local (int | Sequence[int]): Determine whether to apply + non-local module in the corresponding block of each stages. + Default: 0. + non_local_cfg (dict): Config for non-local module. + Default: ``dict()``. + conv_cfg (dict | None): Config for conv layers. Default: None. + norm_cfg (dict | None): Config for norm layers. Default: None. + act_cfg (dict | None): Config for activate layers. Default: None. + with_cp (bool): Use checkpoint or not. Using checkpoint will save + some memory while slowing down the training speed. + Default: False. + + Returns: + nn.Module: A residual layer for the given config. 
+ """ + inflate = inflate if not isinstance(inflate, + int) else (inflate, ) * blocks + non_local = non_local if not isinstance( + non_local, int) else (non_local, ) * blocks + assert len(inflate) == blocks and len(non_local) == blocks + if self.lateral: + lateral_inplanes = inplanes * 2 // self.channel_ratio + else: + lateral_inplanes = 0 + if (spatial_stride != 1 + or (inplanes + lateral_inplanes) != planes * block.expansion): + downsample = ConvModule( + inplanes + lateral_inplanes, + planes * block.expansion, + kernel_size=1, + stride=(temporal_stride, spatial_stride, spatial_stride), + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + else: + downsample = None + + layers = [] + layers.append( + block( + inplanes + lateral_inplanes, + planes, + spatial_stride, + temporal_stride, + dilation, + downsample, + style=style, + inflate=(inflate[0] == 1), + inflate_style=inflate_style, + non_local=(non_local[0] == 1), + non_local_cfg=non_local_cfg, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg, + with_cp=with_cp)) + inplanes = planes * block.expansion + + for i in range(1, blocks): + layers.append( + block( + inplanes, + planes, + 1, + 1, + dilation, + style=style, + inflate=(inflate[i] == 1), + inflate_style=inflate_style, + non_local=(non_local[i] == 1), + non_local_cfg=non_local_cfg, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg, + with_cp=with_cp)) + + return nn.Sequential(*layers) + + def inflate_weights(self, logger): + """Inflate the resnet2d parameters to resnet3d pathway. + + The differences between resnet3d and resnet2d mainly lie in an extra + axis of conv kernel. To utilize the pretrained parameters in 2d model, + the weight of conv2d models should be inflated to fit in the shapes of + the 3d counterpart. For pathway the ``lateral_connection`` part should + not be inflated from 2d weights. + + Args: + logger (logging.Logger): The logger used to print + debugging information. 
+ """ + + state_dict_r2d = _load_checkpoint(self.pretrained) + if 'state_dict' in state_dict_r2d: + state_dict_r2d = state_dict_r2d['state_dict'] + + inflated_param_names = [] + for name, module in self.named_modules(): + if 'lateral' in name: + continue + if isinstance(module, ConvModule): + # we use a ConvModule to wrap conv+bn+relu layers, thus the + # name mapping is needed + if 'downsample' in name: + # layer{X}.{Y}.downsample.conv->layer{X}.{Y}.downsample.0 + original_conv_name = name + '.0' + # layer{X}.{Y}.downsample.bn->layer{X}.{Y}.downsample.1 + original_bn_name = name + '.1' + else: + # layer{X}.{Y}.conv{n}.conv->layer{X}.{Y}.conv{n} + original_conv_name = name + # layer{X}.{Y}.conv{n}.bn->layer{X}.{Y}.bn{n} + original_bn_name = name.replace('conv', 'bn') + if original_conv_name + '.weight' not in state_dict_r2d: + logger.warning(f'Module not exist in the state_dict_r2d' + f': {original_conv_name}') + else: + self._inflate_conv_params(module.conv, state_dict_r2d, + original_conv_name, + inflated_param_names) + if original_bn_name + '.weight' not in state_dict_r2d: + logger.warning(f'Module not exist in the state_dict_r2d' + f': {original_bn_name}') + else: + self._inflate_bn_params(module.bn, state_dict_r2d, + original_bn_name, + inflated_param_names) + + # check if any parameters in the 2d checkpoint are not loaded + remaining_names = set( + state_dict_r2d.keys()) - set(inflated_param_names) + if remaining_names: + logger.info(f'These parameters in the 2d checkpoint are not loaded' + f': {remaining_names}') + + def _inflate_conv_params(self, conv3d, state_dict_2d, module_name_2d, + inflated_param_names): + """Inflate a conv module from 2d to 3d. + + The differences of conv modules between 2d and 3d in Pathway + mainly lie in the inplanes due to lateral connections. To fit the + shapes of the lateral connection counterpart, it will expand + parameters by concatenating conv2d parameters and extra zero paddings.
+ + Args: + conv3d (nn.Module): The destination conv3d module. + state_dict_2d (OrderedDict): The state dict of pretrained 2d model. + module_name_2d (str): The name of the corresponding conv module in + the 2d model. + inflated_param_names (list[str]): List of parameters that have been + inflated. + """ + weight_2d_name = module_name_2d + '.weight' + conv2d_weight = state_dict_2d[weight_2d_name] + old_shape = conv2d_weight.shape + new_shape = conv3d.weight.data.shape + kernel_t = new_shape[2] + + if new_shape[1] != old_shape[1]: + if new_shape[1] < old_shape[1]: + warnings.warn(f'The parameter of {module_name_2d} is not ' + 'loaded due to incompatible shapes.') + return + # Inplanes may be different due to lateral connections + new_channels = new_shape[1] - old_shape[1] + pad_shape = old_shape + pad_shape = pad_shape[:1] + (new_channels, ) + pad_shape[2:] + # Expand parameters by concatenating extra zero channels + conv2d_weight = torch.cat( + (conv2d_weight, + torch.zeros(pad_shape).type_as(conv2d_weight).to( + conv2d_weight.device)), + dim=1) + + new_weight = conv2d_weight.data.unsqueeze(2).expand_as( + conv3d.weight) / kernel_t + conv3d.weight.data.copy_(new_weight) + inflated_param_names.append(weight_2d_name) + + if getattr(conv3d, 'bias') is not None: + bias_2d_name = module_name_2d + '.bias' + conv3d.bias.data.copy_(state_dict_2d[bias_2d_name]) + inflated_param_names.append(bias_2d_name) + + def _freeze_stages(self): + """Prevent all the parameters from being optimized before + `self.frozen_stages`.""" + if self.frozen_stages >= 0: + self.conv1.eval() + for param in self.conv1.parameters(): + param.requires_grad = False + + for i in range(1, self.frozen_stages + 1): + m = getattr(self, f'layer{i}') + m.eval() + for param in m.parameters(): + param.requires_grad = False + + if i != len(self.res_layers) and self.lateral: + # No fusion needed in the final stage + lateral_name = self.lateral_connections[i - 1] + conv_lateral = getattr(self, lateral_name) + conv_lateral.eval()
+ for param in conv_lateral.parameters(): + param.requires_grad = False + + def init_weights(self, pretrained=None): + """Initialize the parameters either from existing checkpoint or from + scratch.""" + if pretrained: + self.pretrained = pretrained + + # Override the init_weights of i3d + super().init_weights() + for module_name in self.lateral_connections: + layer = getattr(self, module_name) + for m in layer.modules(): + if isinstance(m, (nn.Conv3d, nn.Conv2d)): + kaiming_init(m) + + +pathway_cfg = { + 'resnet3d': ResNet3dPathway, + # TODO: BNInceptionPathway +} + + +def build_pathway(cfg, *args, **kwargs): + """Build pathway. + + Args: + cfg (dict): cfg should contain: + - type (str): identifies the pathway type. + + Returns: + nn.Module: Created pathway. + """ + if not (isinstance(cfg, dict) and 'type' in cfg): + raise TypeError('cfg must be a dict containing the key "type"') + cfg_ = cfg.copy() + + pathway_type = cfg_.pop('type') + if pathway_type not in pathway_cfg: + raise KeyError(f'Unrecognized pathway type {pathway_type}') + + pathway_cls = pathway_cfg[pathway_type] + pathway = pathway_cls(*args, **kwargs, **cfg_) + + return pathway + + +@BACKBONES.register_module() +class ResNet3dSlowFast(nn.Module): + """Slowfast backbone. + + This module is proposed in `SlowFast Networks for Video Recognition + <https://arxiv.org/abs/1812.03982>`_ + + Args: + pretrained (str): The file path to a pretrained model. + resample_rate (int): A large temporal stride ``resample_rate`` + on input frames. The actual resample rate is calculated by + multiplying the ``interval`` in ``SampleFrames`` in the + pipeline with ``resample_rate``, equivalent to the :math:`\\tau` + in the paper, i.e. it processes only one out of + ``resample_rate * interval`` frames. Default: 8. + speed_ratio (int): Speed ratio indicating the ratio between time + dimension of the fast and slow pathway, corresponding to the + :math:`\\alpha` in the paper. Default: 8.
+ channel_ratio (int): Reduce the channel number of fast pathway + by ``channel_ratio``, corresponding to :math:`\\beta` in the paper. + Default: 8. + slow_pathway (dict): Configuration of slow branch, should contain + necessary arguments for building the specific type of pathway + and: + type (str): type of backbone the pathway is based on. + lateral (bool): determine whether to build lateral connection + for the pathway. Default: + + .. code-block:: Python + + dict(type='resnet3d', + lateral=True, depth=50, pretrained=None, + conv1_kernel=(1, 7, 7), dilations=(1, 1, 1, 1), + conv1_stride_t=1, pool1_stride_t=1, inflate=(0, 0, 1, 1)) + + fast_pathway (dict): Configuration of fast branch, similar to + `slow_pathway`. Default: + + .. code-block:: Python + + dict(type='resnet3d', + lateral=False, depth=50, pretrained=None, base_channels=8, + conv1_kernel=(5, 7, 7), conv1_stride_t=1, pool1_stride_t=1) + """ + + def __init__(self, + pretrained, + resample_rate=8, + speed_ratio=8, + channel_ratio=8, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1)), + fast_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + base_channels=8, + conv1_kernel=(5, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1)): + super().__init__() + self.pretrained = pretrained + self.resample_rate = resample_rate + self.speed_ratio = speed_ratio + self.channel_ratio = channel_ratio + + if slow_pathway['lateral']: + slow_pathway['speed_ratio'] = speed_ratio + slow_pathway['channel_ratio'] = channel_ratio + + self.slow_path = build_pathway(slow_pathway) + self.fast_path = build_pathway(fast_pathway) + + def init_weights(self, pretrained=None): + """Initialize the parameters either from existing checkpoint or from + scratch.""" + if pretrained: + self.pretrained = pretrained + + if isinstance(self.pretrained, str): + logger
= get_root_logger() + msg = f'load model from: {self.pretrained}' + print_log(msg, logger=logger) + # Directly load 3D model. + load_checkpoint(self, self.pretrained, strict=True, logger=logger) + elif self.pretrained is None: + # Init two branch separately. + self.fast_path.init_weights() + self.slow_path.init_weights() + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + tuple[torch.Tensor]: The feature of the input samples extracted + by the backbone. + """ + x_slow = nn.functional.interpolate( + x, + mode='nearest', + scale_factor=(1.0 / self.resample_rate, 1.0, 1.0)) + x_slow = self.slow_path.conv1(x_slow) + x_slow = self.slow_path.maxpool(x_slow) + + x_fast = nn.functional.interpolate( + x, + mode='nearest', + scale_factor=(1.0 / (self.resample_rate // self.speed_ratio), 1.0, + 1.0)) + x_fast = self.fast_path.conv1(x_fast) + x_fast = self.fast_path.maxpool(x_fast) + + if self.slow_path.lateral: + x_fast_lateral = self.slow_path.conv1_lateral(x_fast) + x_slow = torch.cat((x_slow, x_fast_lateral), dim=1) + + for i, layer_name in enumerate(self.slow_path.res_layers): + res_layer = getattr(self.slow_path, layer_name) + x_slow = res_layer(x_slow) + res_layer_fast = getattr(self.fast_path, layer_name) + x_fast = res_layer_fast(x_fast) + if (i != len(self.slow_path.res_layers) - 1 + and self.slow_path.lateral): + # No fusion needed in the final stage + lateral_name = self.slow_path.lateral_connections[i] + conv_lateral = getattr(self.slow_path, lateral_name) + x_fast_lateral = conv_lateral(x_fast) + x_slow = torch.cat((x_slow, x_fast_lateral), dim=1) + + out = (x_slow, x_fast) + + return out + + +if mmdet_imported: + MMDET_BACKBONES.register_module()(ResNet3dSlowFast) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_slowonly.py 
b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_slowonly.py new file mode 100644 index 0000000000000000000000000000000000000000..b983b2a1f95e22a5f527b06376ffd2f9334af7d9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet3d_slowonly.py @@ -0,0 +1,53 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from ..builder import BACKBONES +from .resnet3d_slowfast import ResNet3dPathway + +try: + from mmdet.models.builder import BACKBONES as MMDET_BACKBONES + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + + +@BACKBONES.register_module() +class ResNet3dSlowOnly(ResNet3dPathway): + """SlowOnly backbone based on ResNet3dPathway. + + Args: + *args (arguments): Arguments same as :class:`ResNet3dPathway`. + conv1_kernel (Sequence[int]): Kernel size of the first conv layer. + Default: (1, 7, 7). + conv1_stride_t (int): Temporal stride of the first conv layer. + Default: 1. + pool1_stride_t (int): Temporal stride of the first pooling layer. + Default: 1. + inflate (Sequence[int]): Inflate Dims of each block. + Default: (0, 0, 1, 1). + **kwargs (keyword arguments): Keywords arguments for + :class:`ResNet3dPathway`. 
+ """ + + def __init__(self, + *args, + lateral=False, + conv1_kernel=(1, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + with_pool2=False, + **kwargs): + super().__init__( + *args, + lateral=lateral, + conv1_kernel=conv1_kernel, + conv1_stride_t=conv1_stride_t, + pool1_stride_t=pool1_stride_t, + inflate=inflate, + with_pool2=with_pool2, + **kwargs) + + assert not self.lateral + + +if mmdet_imported: + MMDET_BACKBONES.register_module()(ResNet3dSlowOnly) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_audio.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_audio.py new file mode 100644 index 0000000000000000000000000000000000000000..2245219a60966338f732037972adad4857685417 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_audio.py @@ -0,0 +1,374 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +import torch.utils.checkpoint as cp +from mmcv.cnn import ConvModule, constant_init, kaiming_init +from mmcv.runner import load_checkpoint +from torch.nn.modules.batchnorm import _BatchNorm +from torch.nn.modules.utils import _ntuple + +from ...utils import get_root_logger +from ..builder import BACKBONES + + +class Bottleneck2dAudio(nn.Module): + """Bottleneck2D block for ResNet2D. + + Args: + inplanes (int): Number of channels for the input in first conv2d layer. + planes (int): Number of channels produced by some norm/conv2d layers. + stride (int | tuple[int]): Stride in the conv layer. Default: 2. + dilation (int): Spacing between kernel elements. Default: 1. + downsample (nn.Module): Downsample layer. Default: None. + factorize (bool): Whether to factorize kernel. Default: True. + norm_cfg (dict): + Config for norm layers. required keys are `type` and + `requires_grad`. Default: None. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. 
+ """ + expansion = 4 + + def __init__(self, + inplanes, + planes, + stride=2, + dilation=1, + downsample=None, + factorize=True, + norm_cfg=None, + with_cp=False): + super().__init__() + + self.inplanes = inplanes + self.planes = planes + self.stride = stride + self.dilation = dilation + self.factorize = factorize + self.norm_cfg = norm_cfg + self.with_cp = with_cp + + self.conv1_stride = 1 + self.conv2_stride = stride + + conv1_kernel_size = (1, 1) + conv1_padding = 0 + conv2_kernel_size = (3, 3) + conv2_padding = (dilation, dilation) + self.conv1 = ConvModule( + inplanes, + planes, + kernel_size=conv1_kernel_size, + padding=conv1_padding, + dilation=dilation, + norm_cfg=self.norm_cfg, + bias=False) + self.conv2 = ConvModule( + planes, + planes, + kernel_size=conv2_kernel_size, + stride=stride, + padding=conv2_padding, + dilation=dilation, + bias=False, + conv_cfg=dict(type='ConvAudio') if factorize else dict( + type='Conv'), + norm_cfg=None, + act_cfg=None) + self.conv3 = ConvModule( + 2 * planes if factorize else planes, + planes * self.expansion, + kernel_size=1, + bias=False, + norm_cfg=self.norm_cfg, + act_cfg=None) + + self.relu = nn.ReLU(inplace=True) + self.downsample = downsample + + def forward(self, x): + + def _inner_forward(x): + identity = x + out = self.conv1(x) + out = self.conv2(out) + out = self.conv3(out) + + if self.downsample is not None: + identity = self.downsample(x) + out += identity + + return out + + if self.with_cp and x.requires_grad: + out = cp.checkpoint(_inner_forward, x) + else: + out = _inner_forward(x) + + out = self.relu(out) + + return out + + +@BACKBONES.register_module() +class ResNetAudio(nn.Module): + """ResNet 2d audio backbone. Reference: + + `_. + + Args: + depth (int): Depth of resnet, from {50, 101, 152}. + pretrained (str | None): Name of pretrained model. + in_channels (int): Channel num of input features. Default: 1. + base_channels (int): Channel num of stem output features. Default: 32. 
+ num_stages (int): Resnet stages. Default: 4. + strides (Sequence[int]): Strides of residual blocks of each stage. + Default: (1, 2, 2, 2). + dilations (Sequence[int]): Dilation of each stage. + Default: (1, 1, 1, 1). + conv1_kernel (int): Kernel size of the first conv layer. Default: 9. + conv1_stride (int | tuple[int]): Stride of the first conv layer. + Default: 1. + frozen_stages (int): Stages to be frozen (all param fixed). -1 means + not freezing any parameters. + factorize (Sequence[int]): factorize Dims of each block for audio. + Default: (1, 1, 0, 0). + norm_eval (bool): Whether to set BN layers to eval mode, namely, freeze + running stats (mean and var). Default: False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + conv_cfg (dict): Config for conv layers. Default: dict(type='Conv'). + norm_cfg (dict): + Config for norm layers. required keys are `type` and + `requires_grad`. Default: dict(type='BN2d', requires_grad=True). + act_cfg (dict): Config for activation layers. + Default: dict(type='ReLU', inplace=True). + zero_init_residual (bool): + Whether to use zero initialization for residual block, + Default: True. 
+ """ + + arch_settings = { + # 18: (BasicBlock2dAudio, (2, 2, 2, 2)), + # 34: (BasicBlock2dAudio, (3, 4, 6, 3)), + 50: (Bottleneck2dAudio, (3, 4, 6, 3)), + 101: (Bottleneck2dAudio, (3, 4, 23, 3)), + 152: (Bottleneck2dAudio, (3, 8, 36, 3)) + } + + def __init__(self, + depth, + pretrained, + in_channels=1, + num_stages=4, + base_channels=32, + strides=(1, 2, 2, 2), + dilations=(1, 1, 1, 1), + conv1_kernel=9, + conv1_stride=1, + frozen_stages=-1, + factorize=(1, 1, 0, 0), + norm_eval=False, + with_cp=False, + conv_cfg=dict(type='Conv'), + norm_cfg=dict(type='BN2d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + zero_init_residual=True): + super().__init__() + if depth not in self.arch_settings: + raise KeyError(f'invalid depth {depth} for resnet') + self.depth = depth + self.pretrained = pretrained + self.in_channels = in_channels + self.base_channels = base_channels + self.num_stages = num_stages + assert 1 <= num_stages <= 4 + self.dilations = dilations + self.conv1_kernel = conv1_kernel + self.conv1_stride = conv1_stride + self.frozen_stages = frozen_stages + self.stage_factorization = _ntuple(num_stages)(factorize) + self.norm_eval = norm_eval + self.with_cp = with_cp + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.zero_init_residual = zero_init_residual + + self.block, stage_blocks = self.arch_settings[depth] + self.stage_blocks = stage_blocks[:num_stages] + self.inplanes = self.base_channels + + self._make_stem_layer() + + self.res_layers = [] + for i, num_blocks in enumerate(self.stage_blocks): + stride = strides[i] + dilation = dilations[i] + planes = self.base_channels * 2**i + res_layer = self.make_res_layer( + self.block, + self.inplanes, + planes, + num_blocks, + stride=stride, + dilation=dilation, + factorize=self.stage_factorization[i], + norm_cfg=self.norm_cfg, + with_cp=with_cp) + self.inplanes = planes * self.block.expansion + layer_name = f'layer{i + 1}' + self.add_module(layer_name, 
res_layer) + self.res_layers.append(layer_name) + + self.feat_dim = self.block.expansion * self.base_channels * 2**( + len(self.stage_blocks) - 1) + + @staticmethod + def make_res_layer(block, + inplanes, + planes, + blocks, + stride=1, + dilation=1, + factorize=1, + norm_cfg=None, + with_cp=False): + """Build residual layer for ResNetAudio. + + Args: + block (nn.Module): Residual module to be built. + inplanes (int): Number of channels for the input feature + in each block. + planes (int): Number of channels for the output feature + in each block. + blocks (int): Number of residual blocks. + stride (int): Stride of the first block in the layer; the remaining + blocks use stride 1. Default: 1. + dilation (int): Spacing between kernel elements. Default: 1. + factorize (int | Sequence[int]): Determine whether to factorize + for each block. Default: 1. + norm_cfg (dict): + Config for norm layers. required keys are `type` and + `requires_grad`. Default: None. + with_cp (bool): Use checkpoint or not. Using checkpoint will save + some memory while slowing down the training speed. + Default: False. + + Returns: + A residual layer for the given config. 
+ """ + factorize = factorize if not isinstance( + factorize, int) else (factorize, ) * blocks + assert len(factorize) == blocks + downsample = None + if stride != 1 or inplanes != planes * block.expansion: + downsample = ConvModule( + inplanes, + planes * block.expansion, + kernel_size=1, + stride=stride, + bias=False, + norm_cfg=norm_cfg, + act_cfg=None) + + layers = [] + layers.append( + block( + inplanes, + planes, + stride, + dilation, + downsample, + factorize=(factorize[0] == 1), + norm_cfg=norm_cfg, + with_cp=with_cp)) + inplanes = planes * block.expansion + for i in range(1, blocks): + layers.append( + block( + inplanes, + planes, + 1, + dilation, + factorize=(factorize[i] == 1), + norm_cfg=norm_cfg, + with_cp=with_cp)) + + return nn.Sequential(*layers) + + def _make_stem_layer(self): + """Construct the stem layers consists of a conv+norm+act module and a + pooling layer.""" + self.conv1 = ConvModule( + self.in_channels, + self.base_channels, + kernel_size=self.conv1_kernel, + stride=self.conv1_stride, + bias=False, + conv_cfg=dict(type='ConvAudio', op='sum'), + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + def _freeze_stages(self): + """Prevent all the parameters from being optimized before + ``self.frozen_stages``.""" + if self.frozen_stages >= 0: + self.conv1.bn.eval() + for m in [self.conv1.conv, self.conv1.bn]: + for param in m.parameters(): + param.requires_grad = False + + for i in range(1, self.frozen_stages + 1): + m = getattr(self, f'layer{i}') + m.eval() + for param in m.parameters(): + param.requires_grad = False + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + if isinstance(self.pretrained, str): + logger = get_root_logger() + logger.info(f'load model from: {self.pretrained}') + + load_checkpoint(self, self.pretrained, strict=False, logger=logger) + + elif self.pretrained is None: + for m in self.modules(): + if isinstance(m, nn.Conv2d): + kaiming_init(m) + elif 
isinstance(m, _BatchNorm): + constant_init(m, 1) + + if self.zero_init_residual: + for m in self.modules(): + if isinstance(m, Bottleneck2dAudio): + constant_init(m.conv3.bn, 0) + + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The feature of the input samples extracted + by the backbone. + """ + x = self.conv1(x) + for layer_name in self.res_layers: + res_layer = getattr(self, layer_name) + x = res_layer(x) + return x + + def train(self, mode=True): + """Set the optimization status when training.""" + super().train(mode) + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_tin.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_tin.py new file mode 100644 index 0000000000000000000000000000000000000000..f5c8307c8da50a31f7be6e193bb2845514bf0444 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_tin.py @@ -0,0 +1,377 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn + +from ..builder import BACKBONES +from .resnet_tsm import ResNetTSM + + +def linear_sampler(data, offset): + """Differentiable Temporal-wise Frame Sampling, which is essentially a + linear interpolation process. + + It gets the feature map which has been split into several groups + and shift them by different offsets according to their groups. + Then compute the weighted sum along with the temporal dimension. + + Args: + data (torch.Tensor): Split data for certain group in shape + [N, num_segments, C, H, W]. + offset (torch.Tensor): Data offsets for this group data in shape + [N, num_segments]. 
+ """ + # [N, num_segments, C, H, W] + n, t, c, h, w = data.shape + + # offset0, offset1: [N, num_segments] + offset0 = torch.floor(offset).int() + offset1 = offset0 + 1 + + # data, data0, data1: [N, num_segments, C, H * W] + data = data.view(n, t, c, h * w).contiguous() + + try: + from mmcv.ops import tin_shift + except (ImportError, ModuleNotFoundError): + raise ImportError('Failed to import `tin_shift` from `mmcv.ops`. You ' + 'will be unable to use TIN. ') + + data0 = tin_shift(data, offset0) + data1 = tin_shift(data, offset1) + + # weight0, weight1: [N, num_segments] + weight0 = 1 - (offset - offset0.float()) + weight1 = 1 - weight0 + + # weight0, weight1: + # [N, num_segments] -> [N, num_segments, C // num_segments] -> [N, C] + group_size = offset.shape[1] + weight0 = weight0[:, :, None].repeat(1, 1, c // group_size) + weight0 = weight0.view(weight0.size(0), -1) + weight1 = weight1[:, :, None].repeat(1, 1, c // group_size) + weight1 = weight1.view(weight1.size(0), -1) + + # weight0, weight1: [N, C] -> [N, 1, C, 1] + weight0 = weight0[:, None, :, None] + weight1 = weight1[:, None, :, None] + + # output: [N, num_segments, C, H * W] -> [N, num_segments, C, H, W] + output = weight0 * data0 + weight1 * data1 + output = output.view(n, t, c, h, w) + + return output + + +class CombineNet(nn.Module): + """Combine Net. + + It combines Temporal interlace module with some part of ResNet layer. + + Args: + net1 (nn.module): Temporal interlace module. + net2 (nn.module): Some part of ResNet layer. + """ + + def __init__(self, net1, net2): + super().__init__() + self.net1 = net1 + self.net2 = net2 + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. 
+ """ + # input shape: [num_batches * num_segments, C, H, W] + # output x shape: [num_batches * num_segments, C, H, W] + x = self.net1(x) + # [num_batches * num_segments, C, H, W] + x = self.net2(x) + return x + + +class WeightNet(nn.Module): + """WeightNet in Temporal interlace module. + + The WeightNet consists of two parts: one convolution layer + and a sigmoid function. Following the convolution layer, the sigmoid + function and rescale module can scale our output to the range (0, 2). + Here we set the initial bias of the convolution layer to 0, and the + final initial output will be 1.0. + + Args: + in_channels (int): Channel num of input features. + groups (int): Number of groups for fc layer outputs. + """ + + def __init__(self, in_channels, groups): + super().__init__() + self.sigmoid = nn.Sigmoid() + self.groups = groups + + self.conv = nn.Conv1d(in_channels, groups, 3, padding=1) + + self.init_weights() + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + # we set the initial bias of the convolution + # layer to 0, and the final initial output will be 1.0 + self.conv.bias.data[...] = 0 + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. + """ + # calculate weight + # [N, C, T] + n, _, t = x.shape + # [N, groups, T] + x = self.conv(x) + x = x.view(n, self.groups, t) + # [N, T, groups] + x = x.permute(0, 2, 1) + + # scale the output to range (0, 2) + x = 2 * self.sigmoid(x) + # [N, T, groups] + return x + + +class OffsetNet(nn.Module): + """OffsetNet in Temporal interlace module. + + The OffsetNet consists of one convolution layer and two fc layers + with a relu activation following with a sigmoid function. Following + the convolution layer, two fc layers and relu are applied to the output. 
+ Then, apply the sigmoid function with a multiply factor and a minus 0.5 + to transform the output to (-2, 2). + + Args: + in_channels (int): Channel num of input features. + groups (int): Number of groups for fc layer outputs. + num_segments (int): Number of frame segments. + """ + + def __init__(self, in_channels, groups, num_segments): + super().__init__() + self.sigmoid = nn.Sigmoid() + # hard code ``kernel_size`` and ``padding`` according to original repo. + kernel_size = 3 + padding = 1 + + self.conv = nn.Conv1d(in_channels, 1, kernel_size, padding=padding) + self.fc1 = nn.Linear(num_segments, num_segments) + self.relu = nn.ReLU() + self.fc2 = nn.Linear(num_segments, groups) + + self.init_weights() + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + # The bias of the last fc layer is initialized to + # make the post-sigmoid output start from 1 + self.fc2.bias.data[...] = 0.5108 + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. + """ + # calculate offset + # [N, C, T] + n, _, t = x.shape + # [N, 1, T] + x = self.conv(x) + # [N, T] + x = x.view(n, t) + # [N, T] + x = self.relu(self.fc1(x)) + # [N, groups] + x = self.fc2(x) + # [N, 1, groups] + x = x.view(n, 1, -1) + + # sigmoid(x) - 0.5 lies in (-0.5, 0.5), so the output is in (-2, 2), + # i.e. (-t/4, t/4) where t = num_segments = 8 + x = 4 * (self.sigmoid(x) - 0.5) + # [N, 1, groups] + return x + + +class TemporalInterlace(nn.Module): + """Temporal interlace module. + + This module is proposed in `Temporal Interlacing Network + `_ + + Args: + in_channels (int): Channel num of input features. + num_segments (int): Number of frame segments. Default: 3. + shift_div (int): Number of division parts for shift. Default: 1. 
+ """ + + def __init__(self, in_channels, num_segments=3, shift_div=1): + super().__init__() + self.num_segments = num_segments + self.shift_div = shift_div + self.in_channels = in_channels + # hard code ``deform_groups`` according to original repo. + self.deform_groups = 2 + + self.offset_net = OffsetNet(in_channels // shift_div, + self.deform_groups, num_segments) + self.weight_net = WeightNet(in_channels // shift_div, + self.deform_groups) + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. + """ + # x: [N, C, H, W], + # where N = num_batches x num_segments, C = shift_div * num_folds + n, c, h, w = x.size() + num_batches = n // self.num_segments + num_folds = c // self.shift_div + + # x_out: [num_batches x num_segments, C, H, W] + x_out = torch.zeros((n, c, h, w), device=x.device) + # x_descriptor: [num_batches, num_segments, num_folds, H, W] + x_descriptor = x[:, :num_folds, :, :].view(num_batches, + self.num_segments, + num_folds, h, w) + + # x should only obtain information on temporal and channel dimensions + # x_pooled: [num_batches, num_segments, num_folds, W] + x_pooled = torch.mean(x_descriptor, 3) + # x_pooled: [num_batches, num_segments, num_folds] + x_pooled = torch.mean(x_pooled, 3) + # x_pooled: [num_batches, num_folds, num_segments] + x_pooled = x_pooled.permute(0, 2, 1).contiguous() + + # Calculate weight and bias, here groups = 2 + # x_offset: [num_batches, groups] + x_offset = self.offset_net(x_pooled).view(num_batches, -1) + # x_weight: [num_batches, num_segments, groups] + x_weight = self.weight_net(x_pooled) + + # x_offset: [num_batches, 2 * groups] + x_offset = torch.cat([x_offset, -x_offset], 1) + # x_shift: [num_batches, num_segments, num_folds, H, W] + x_shift = linear_sampler(x_descriptor, x_offset) + + # x_weight: [num_batches, num_segments, groups, 1] + x_weight = x_weight[:, :, :, None] + # x_weight: + # 
[num_batches, num_segments, groups * 2, c // self.shift_div // 4] + x_weight = x_weight.repeat(1, 1, 2, num_folds // 2 // 2) + # x_weight: + # [num_batches, num_segments, c // self.shift_div = num_folds] + x_weight = x_weight.view(x_weight.size(0), x_weight.size(1), -1) + + # x_weight: [num_batches, num_segments, num_folds, 1, 1] + x_weight = x_weight[:, :, :, None, None] + # x_shift: [num_batches, num_segments, num_folds, H, W] + x_shift = x_shift * x_weight + # x_shift: [num_batches, num_segments, num_folds, H, W] + x_shift = x_shift.contiguous().view(n, num_folds, h, w) + + # x_out: [num_batches x num_segments, C, H, W] + x_out[:, :num_folds, :] = x_shift + x_out[:, num_folds:, :] = x[:, num_folds:, :] + + return x_out + + +@BACKBONES.register_module() +class ResNetTIN(ResNetTSM): + """ResNet backbone for TIN. + + Args: + depth (int): Depth of ResNet, from {18, 34, 50, 101, 152}. + num_segments (int): Number of frame segments. Default: 8. + is_tin (bool): Whether to apply temporal interlace. Default: True. + shift_div (int): Number of division parts for shift. Default: 4. + kwargs (dict, optional): Arguments for ResNet. + """ + + def __init__(self, + depth, + num_segments=8, + is_tin=True, + shift_div=4, + **kwargs): + super().__init__(depth, **kwargs) + self.num_segments = num_segments + self.is_tin = is_tin + self.shift_div = shift_div + + def make_temporal_interlace(self): + """Make temporal interlace for some layers.""" + num_segment_list = [self.num_segments] * 4 + assert num_segment_list[-1] > 0 + + n_round = 1 + if len(list(self.layer3.children())) >= 23: + print(f'=> Using n_round {n_round} to insert temporal shift.') + + def make_block_interlace(stage, num_segments, shift_div): + """Apply Deformable shift for a ResNet layer module. + + Args: + stage (nn.module): A ResNet layer to be deformed. + num_segments (int): Number of frame segments. + shift_div (int): Number of division parts for shift. 
+ + Returns: + nn.Sequential: A Sequential container consisted of + deformed Interlace blocks. + """ + blocks = list(stage.children()) + for i, b in enumerate(blocks): + if i % n_round == 0: + tds = TemporalInterlace( + b.conv1.in_channels, + num_segments=num_segments, + shift_div=shift_div) + blocks[i].conv1.conv = CombineNet(tds, + blocks[i].conv1.conv) + return nn.Sequential(*blocks) + + self.layer1 = make_block_interlace(self.layer1, num_segment_list[0], + self.shift_div) + self.layer2 = make_block_interlace(self.layer2, num_segment_list[1], + self.shift_div) + self.layer3 = make_block_interlace(self.layer3, num_segment_list[2], + self.shift_div) + self.layer4 = make_block_interlace(self.layer4, num_segment_list[3], + self.shift_div) + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + super(ResNetTSM, self).init_weights() + if self.is_tin: + self.make_temporal_interlace() + if len(self.non_local_cfg) != 0: + self.make_non_local() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_tsm.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_tsm.py new file mode 100644 index 0000000000000000000000000000000000000000..0fbc20ed103fb9f8ae43dcaa97952af222d83ee1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/resnet_tsm.py @@ -0,0 +1,295 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +from mmcv.cnn import NonLocal3d +from torch.nn.modules.utils import _ntuple + +from ..builder import BACKBONES +from .resnet import ResNet + + +class NL3DWrapper(nn.Module): + """3D Non-local wrapper for ResNet50. + + Wrap ResNet layers with 3D NonLocal modules. + + Args: + block (nn.Module): Residual blocks to be built. + num_segments (int): Number of frame segments. + non_local_cfg (dict): Config for non-local layers. Default: ``dict()``. 
+ """ + + def __init__(self, block, num_segments, non_local_cfg=dict()): + super(NL3DWrapper, self).__init__() + self.block = block + self.non_local_cfg = non_local_cfg + self.non_local_block = NonLocal3d(self.block.conv3.norm.num_features, + **self.non_local_cfg) + self.num_segments = num_segments + + def forward(self, x): + x = self.block(x) + + n, c, h, w = x.size() + x = x.view(n // self.num_segments, self.num_segments, c, h, + w).transpose(1, 2).contiguous() + x = self.non_local_block(x) + x = x.transpose(1, 2).contiguous().view(n, c, h, w) + return x + + +class TemporalShift(nn.Module): + """Temporal shift module. + + This module is proposed in + `TSM: Temporal Shift Module for Efficient Video Understanding + `_ + + Args: + net (nn.module): Module to make temporal shift. + num_segments (int): Number of frame segments. Default: 3. + shift_div (int): Number of divisions for shift. Default: 8. + """ + + def __init__(self, net, num_segments=3, shift_div=8): + super().__init__() + self.net = net + self.num_segments = num_segments + self.shift_div = shift_div + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. + """ + x = self.shift(x, self.num_segments, shift_div=self.shift_div) + return self.net(x) + + @staticmethod + def shift(x, num_segments, shift_div=3): + """Perform temporal shift operation on the feature. + + Args: + x (torch.Tensor): The input feature to be shifted. + num_segments (int): Number of frame segments. + shift_div (int): Number of divisions for shift. Default: 3. + + Returns: + torch.Tensor: The shifted feature. 
+ """ + # [N, C, H, W] + n, c, h, w = x.size() + + # [N // num_segments, num_segments, C, H*W] + # can't use 5 dimensional array on PPL2D backend for caffe + x = x.view(-1, num_segments, c, h * w) + + # get shift fold + fold = c // shift_div + + # split c channel into three parts: + # left_split, mid_split, right_split + left_split = x[:, :, :fold, :] + mid_split = x[:, :, fold:2 * fold, :] + right_split = x[:, :, 2 * fold:, :] + + # can't use torch.zeros(*A.shape) or torch.zeros_like(A) + # because arrays for caffe inference must be obtained by computation + + # shift left on num_segments channel in `left_split` + zeros = left_split - left_split + blank = zeros[:, :1, :, :] + left_split = left_split[:, 1:, :, :] + left_split = torch.cat((left_split, blank), 1) + + # shift right on num_segments channel in `mid_split` + zeros = mid_split - mid_split + blank = zeros[:, :1, :, :] + mid_split = mid_split[:, :-1, :, :] + mid_split = torch.cat((blank, mid_split), 1) + + # right_split: no shift + + # concatenate + out = torch.cat((left_split, mid_split, right_split), 2) + + # [N, C, H, W] + # restore the original dimension + return out.view(n, c, h, w) + + +@BACKBONES.register_module() +class ResNetTSM(ResNet): + """ResNet backbone for TSM. + + Args: + depth (int): Depth of ResNet, from {18, 34, 50, 101, 152}. + num_segments (int): Number of frame segments. Default: 8. + is_shift (bool): Whether to make temporal shift in resnet layers. + Default: True. + non_local (Sequence[int]): Determine whether to apply non-local module + in the corresponding block of each stage. Default: (0, 0, 0, 0). + non_local_cfg (dict): Config for non-local module. Default: ``dict()``. + shift_div (int): Number of div for shift. Default: 8. + shift_place (str): Places in resnet layers for shift, which is chosen + from ['block', 'blockres']. + If set to 'block', it will apply temporal shift to all child blocks + in each resnet layer. + If set to 'blockres', it will apply temporal shift to each `conv1` + layer of all child blocks in each resnet layer. 
+ Default: 'blockres'. + temporal_pool (bool): Whether to add temporal pooling. Default: False. + **kwargs (keyword arguments, optional): Arguments for ResNet. + """ + + def __init__(self, + depth, + num_segments=8, + is_shift=True, + non_local=(0, 0, 0, 0), + non_local_cfg=dict(), + shift_div=8, + shift_place='blockres', + temporal_pool=False, + **kwargs): + super().__init__(depth, **kwargs) + self.num_segments = num_segments + self.is_shift = is_shift + self.shift_div = shift_div + self.shift_place = shift_place + self.temporal_pool = temporal_pool + self.non_local = non_local + self.non_local_stages = _ntuple(self.num_stages)(non_local) + self.non_local_cfg = non_local_cfg + + def make_temporal_shift(self): + """Make temporal shift for some layers.""" + if self.temporal_pool: + num_segment_list = [ + self.num_segments, self.num_segments // 2, + self.num_segments // 2, self.num_segments // 2 + ] + else: + num_segment_list = [self.num_segments] * 4 + if num_segment_list[-1] <= 0: + raise ValueError('num_segment_list[-1] must be positive') + + if self.shift_place == 'block': + + def make_block_temporal(stage, num_segments): + """Make temporal shift on some blocks. + + Args: + stage (nn.Module): Model layers to be shifted. + num_segments (int): Number of frame segments. + + Returns: + nn.Module: The shifted blocks. 
+ """ + blocks = list(stage.children()) + for i, b in enumerate(blocks): + blocks[i] = TemporalShift( + b, num_segments=num_segments, shift_div=self.shift_div) + return nn.Sequential(*blocks) + + self.layer1 = make_block_temporal(self.layer1, num_segment_list[0]) + self.layer2 = make_block_temporal(self.layer2, num_segment_list[1]) + self.layer3 = make_block_temporal(self.layer3, num_segment_list[2]) + self.layer4 = make_block_temporal(self.layer4, num_segment_list[3]) + + elif 'blockres' in self.shift_place: + n_round = 1 + if len(list(self.layer3.children())) >= 23: + n_round = 2 + + def make_block_temporal(stage, num_segments): + """Make temporal shift on some blocks. + + Args: + stage (nn.Module): Model layers to be shifted. + num_segments (int): Number of frame segments. + + Returns: + nn.Module: The shifted blocks. + """ + blocks = list(stage.children()) + for i, b in enumerate(blocks): + if i % n_round == 0: + blocks[i].conv1.conv = TemporalShift( + b.conv1.conv, + num_segments=num_segments, + shift_div=self.shift_div) + return nn.Sequential(*blocks) + + self.layer1 = make_block_temporal(self.layer1, num_segment_list[0]) + self.layer2 = make_block_temporal(self.layer2, num_segment_list[1]) + self.layer3 = make_block_temporal(self.layer3, num_segment_list[2]) + self.layer4 = make_block_temporal(self.layer4, num_segment_list[3]) + + else: + raise NotImplementedError + + def make_temporal_pool(self): + """Make temporal pooling between layer1 and layer2, using a 3D max + pooling layer.""" + + class TemporalPool(nn.Module): + """Temporal pool module. + + Wrap layer2 in ResNet50 with a 3D max pooling layer. + + Args: + net (nn.Module): Module to make temporal pool. + num_segments (int): Number of frame segments. 
+ """ + + def __init__(self, net, num_segments): + super().__init__() + self.net = net + self.num_segments = num_segments + self.max_pool3d = nn.MaxPool3d( + kernel_size=(3, 1, 1), stride=(2, 1, 1), padding=(1, 0, 0)) + + def forward(self, x): + # [N, C, H, W] + n, c, h, w = x.size() + # [N // num_segments, C, num_segments, H, W] + x = x.view(n // self.num_segments, self.num_segments, c, h, + w).transpose(1, 2) + # [N // num_segments, C, num_segments // 2, H, W] + x = self.max_pool3d(x) + # [N // 2, C, H, W] + x = x.transpose(1, 2).contiguous().view(n // 2, c, h, w) + return self.net(x) + + self.layer2 = TemporalPool(self.layer2, self.num_segments) + + def make_non_local(self): + # This part is for ResNet50 + for i in range(self.num_stages): + non_local_stage = self.non_local_stages[i] + if sum(non_local_stage) == 0: + continue + + layer_name = f'layer{i + 1}' + res_layer = getattr(self, layer_name) + + for idx, non_local in enumerate(non_local_stage): + if non_local: + res_layer[idx] = NL3DWrapper(res_layer[idx], + self.num_segments, + self.non_local_cfg) + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + super().init_weights() + if self.is_shift: + self.make_temporal_shift() + if len(self.non_local_cfg) != 0: + self.make_non_local() + if self.temporal_pool: + self.make_temporal_pool() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/stgcn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/stgcn.py new file mode 100644 index 0000000000000000000000000000000000000000..99ab938b080b29cadc42458d3584dc6fc3c3e03f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/stgcn.py @@ -0,0 +1,281 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch
+import torch.nn as nn
+from mmcv.cnn import constant_init, kaiming_init, normal_init
+from mmcv.runner import load_checkpoint
+from mmcv.utils import _BatchNorm
+
+from ...utils import get_root_logger
+from ..builder import BACKBONES
+from ..skeleton_gcn.utils import Graph
+
+
+def zero(x):
+    """return zero."""
+    return 0
+
+
+def identity(x):
+    """return input itself."""
+    return x
+
+
+class STGCNBlock(nn.Module):
+    """Applies a spatial temporal graph convolution over an input graph
+    sequence.
+
+    Args:
+        in_channels (int): Number of channels in the input sequence data
+        out_channels (int): Number of channels produced by the convolution
+        kernel_size (tuple): Size of the temporal convolving kernel and
+            graph convolving kernel
+        stride (int, optional): Stride of the temporal convolution. Default: 1
+        dropout (float, optional): Dropout rate of the final output. Default: 0
+        residual (bool, optional): If ``True``, applies a residual mechanism.
+            Default: ``True``
+
+    Shape:
+        - Input[0]: Input graph sequence in :math:`(N, in_channels, T_{in}, V)`
+            format
+        - Input[1]: Input graph adjacency matrix in :math:`(K, V, V)` format
+        - Output[0]: Output graph sequence in :math:`(N, out_channels, T_{out},
+            V)` format
+        - Output[1]: Graph adjacency matrix for output data in :math:`(K, V,
+            V)` format
+
+        where
+            :math:`N` is a batch size,
+            :math:`K` is the spatial kernel size, as :math:`K == kernel_size[1]
+                `,
+            :math:`T_{in}/T_{out}` is a length of input/output sequence,
+            :math:`V` is the number of graph nodes.
+ """ + + def __init__(self, + in_channels, + out_channels, + kernel_size, + stride=1, + dropout=0, + residual=True): + super().__init__() + + assert len(kernel_size) == 2 + assert kernel_size[0] % 2 == 1 + padding = ((kernel_size[0] - 1) // 2, 0) + + self.gcn = ConvTemporalGraphical(in_channels, out_channels, + kernel_size[1]) + self.tcn = nn.Sequential( + nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True), + nn.Conv2d(out_channels, out_channels, (kernel_size[0], 1), + (stride, 1), padding), nn.BatchNorm2d(out_channels), + nn.Dropout(dropout, inplace=True)) + + if not residual: + self.residual = zero + + elif (in_channels == out_channels) and (stride == 1): + self.residual = identity + + else: + self.residual = nn.Sequential( + nn.Conv2d( + in_channels, + out_channels, + kernel_size=1, + stride=(stride, 1)), nn.BatchNorm2d(out_channels)) + + self.relu = nn.ReLU(inplace=True) + + def forward(self, x, adj_mat): + """Defines the computation performed at every call.""" + res = self.residual(x) + x, adj_mat = self.gcn(x, adj_mat) + x = self.tcn(x) + res + + return self.relu(x), adj_mat + + +class ConvTemporalGraphical(nn.Module): + """The basic module for applying a graph convolution. + + Args: + in_channels (int): Number of channels in the input sequence data + out_channels (int): Number of channels produced by the convolution + kernel_size (int): Size of the graph convolving kernel + t_kernel_size (int): Size of the temporal convolving kernel + t_stride (int, optional): Stride of the temporal convolution. + Default: 1 + t_padding (int, optional): Temporal zero-padding added to both sides + of the input. Default: 0 + t_dilation (int, optional): Spacing between temporal kernel elements. + Default: 1 + bias (bool, optional): If ``True``, adds a learnable bias to the + output. 
Default: ``True`` + + Shape: + - Input[0]: Input graph sequence in :math:`(N, in_channels, T_{in}, V)` + format + - Input[1]: Input graph adjacency matrix in :math:`(K, V, V)` format + - Output[0]: Output graph sequence in :math:`(N, out_channels, T_{out} + , V)` format + - Output[1]: Graph adjacency matrix for output data in :math:`(K, V, V) + ` format + + where + :math:`N` is a batch size, + :math:`K` is the spatial kernel size, as :math:`K == kernel_size[1] + `, + :math:`T_{in}/T_{out}` is a length of input/output sequence, + :math:`V` is the number of graph nodes. + """ + + def __init__(self, + in_channels, + out_channels, + kernel_size, + t_kernel_size=1, + t_stride=1, + t_padding=0, + t_dilation=1, + bias=True): + super().__init__() + + self.kernel_size = kernel_size + self.conv = nn.Conv2d( + in_channels, + out_channels * kernel_size, + kernel_size=(t_kernel_size, 1), + padding=(t_padding, 0), + stride=(t_stride, 1), + dilation=(t_dilation, 1), + bias=bias) + + def forward(self, x, adj_mat): + """Defines the computation performed at every call.""" + assert adj_mat.size(0) == self.kernel_size + + x = self.conv(x) + + n, kc, t, v = x.size() + x = x.view(n, self.kernel_size, kc // self.kernel_size, t, v) + x = torch.einsum('nkctv,kvw->nctw', (x, adj_mat)) + + return x.contiguous(), adj_mat + + +@BACKBONES.register_module() +class STGCN(nn.Module): + """Backbone of Spatial temporal graph convolutional networks. + + Args: + in_channels (int): Number of channels in the input data. + graph_cfg (dict): The arguments for building the graph. + edge_importance_weighting (bool): If ``True``, adds a learnable + importance weighting to the edges of the graph. Default: True. + data_bn (bool): If 'True', adds data normalization to the inputs. + Default: True. + pretrained (str | None): Name of pretrained model. + **kwargs (optional): Other parameters for graph convolution units. 
+ + Shape: + - Input: :math:`(N, in_channels, T_{in}, V_{in}, M_{in})` + - Output: :math:`(N, num_class)` where + :math:`N` is a batch size, + :math:`T_{in}` is a length of input sequence, + :math:`V_{in}` is the number of graph nodes, + :math:`M_{in}` is the number of instance in a frame. + """ + + def __init__(self, + in_channels, + graph_cfg, + edge_importance_weighting=True, + data_bn=True, + pretrained=None, + **kwargs): + super().__init__() + + # load graph + self.graph = Graph(**graph_cfg) + A = torch.tensor( + self.graph.A, dtype=torch.float32, requires_grad=False) + self.register_buffer('A', A) + + # build networks + spatial_kernel_size = A.size(0) + temporal_kernel_size = 9 + kernel_size = (temporal_kernel_size, spatial_kernel_size) + self.data_bn = nn.BatchNorm1d(in_channels * + A.size(1)) if data_bn else identity + + kwargs0 = {k: v for k, v in kwargs.items() if k != 'dropout'} + self.st_gcn_networks = nn.ModuleList(( + STGCNBlock( + in_channels, 64, kernel_size, 1, residual=False, **kwargs0), + STGCNBlock(64, 64, kernel_size, 1, **kwargs), + STGCNBlock(64, 64, kernel_size, 1, **kwargs), + STGCNBlock(64, 64, kernel_size, 1, **kwargs), + STGCNBlock(64, 128, kernel_size, 2, **kwargs), + STGCNBlock(128, 128, kernel_size, 1, **kwargs), + STGCNBlock(128, 128, kernel_size, 1, **kwargs), + STGCNBlock(128, 256, kernel_size, 2, **kwargs), + STGCNBlock(256, 256, kernel_size, 1, **kwargs), + STGCNBlock(256, 256, kernel_size, 1, **kwargs), + )) + + # initialize parameters for edge importance weighting + if edge_importance_weighting: + self.edge_importance = nn.ParameterList([ + nn.Parameter(torch.ones(self.A.size())) + for i in self.st_gcn_networks + ]) + else: + self.edge_importance = [1 for _ in self.st_gcn_networks] + + self.pretrained = pretrained + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + if isinstance(self.pretrained, str): + logger = get_root_logger() + logger.info(f'load model from: 
{self.pretrained}') + + load_checkpoint(self, self.pretrained, strict=False, logger=logger) + + elif self.pretrained is None: + for m in self.modules(): + if isinstance(m, nn.Conv2d): + kaiming_init(m) + elif isinstance(m, nn.Linear): + normal_init(m) + elif isinstance(m, _BatchNorm): + constant_init(m, 1) + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x): + """Defines the computation performed at every call. + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. + """ + # data normalization + x = x.float() + n, c, t, v, m = x.size() # bs 3 300 25(17) 2 + x = x.permute(0, 4, 3, 1, 2).contiguous() # N M V C T + x = x.view(n * m, v * c, t) + x = self.data_bn(x) + x = x.view(n, m, v, c, t) + x = x.permute(0, 1, 3, 4, 2).contiguous() + x = x.view(n * m, c, t, v) # bsx2 3 300 25(17) + + # forward + for gcn, importance in zip(self.st_gcn_networks, self.edge_importance): + x, _ = gcn(x, self.A * importance) + + return x diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/tanet.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/tanet.py new file mode 100644 index 0000000000000000000000000000000000000000..8cbaa8fcd93fa39aa2d13f8e46ea93f7dae9cecc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/tanet.py @@ -0,0 +1,115 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from copy import deepcopy + +import torch.nn as nn +from torch.utils import checkpoint as cp + +from ..builder import BACKBONES +from ..common import TAM +from .resnet import Bottleneck, ResNet + + +class TABlock(nn.Module): + """Temporal Adaptive Block (TA-Block) for TANet. + + This block is proposed in `TAM: TEMPORAL ADAPTIVE MODULE FOR VIDEO + RECOGNITION `_ + + The temporal adaptive module (TAM) is embedded into ResNet-Block + after the first Conv2D, which turns the vanilla ResNet-Block + into TA-Block. 
+ + Args: + block (nn.Module): Residual blocks to be substituted. + num_segments (int): Number of frame segments. + tam_cfg (dict): Config for temporal adaptive module (TAM). + Default: dict(). + """ + + def __init__(self, block, num_segments, tam_cfg=dict()): + super().__init__() + self.tam_cfg = deepcopy(tam_cfg) + self.block = block + self.num_segments = num_segments + self.tam = TAM( + in_channels=block.conv1.out_channels, + num_segments=num_segments, + **self.tam_cfg) + + if not isinstance(self.block, Bottleneck): + raise NotImplementedError('TA-Blocks have not been fully ' + 'implemented except the pattern based ' + 'on Bottleneck block.') + + def forward(self, x): + assert isinstance(self.block, Bottleneck) + + def _inner_forward(x): + """Forward wrapper for utilizing checkpoint.""" + identity = x + + out = self.block.conv1(x) + out = self.tam(out) + out = self.block.conv2(out) + out = self.block.conv3(out) + + if self.block.downsample is not None: + identity = self.block.downsample(x) + + out = out + identity + + return out + + if self.block.with_cp and x.requires_grad: + out = cp.checkpoint(_inner_forward, x) + else: + out = _inner_forward(x) + + out = self.block.relu(out) + + return out + + +@BACKBONES.register_module() +class TANet(ResNet): + """Temporal Adaptive Network (TANet) backbone. + + This backbone is proposed in `TAM: TEMPORAL ADAPTIVE MODULE FOR VIDEO + RECOGNITION `_ + + Embedding the temporal adaptive module (TAM) into ResNet to + instantiate TANet. + + Args: + depth (int): Depth of resnet, from {18, 34, 50, 101, 152}. + num_segments (int): Number of frame segments. + tam_cfg (dict | None): Config for temporal adaptive module (TAM). + Default: dict(). + **kwargs (keyword arguments, optional): Arguments for ResNet except + ```depth```. 
+ """ + + def __init__(self, depth, num_segments, tam_cfg=dict(), **kwargs): + super().__init__(depth, **kwargs) + assert num_segments >= 3 + self.num_segments = num_segments + self.tam_cfg = deepcopy(tam_cfg) + + def init_weights(self): + super().init_weights() + self.make_tam_modeling() + + def make_tam_modeling(self): + """Replace ResNet-Block with TA-Block.""" + + def make_tam_block(stage, num_segments, tam_cfg=dict()): + blocks = list(stage.children()) + for i, block in enumerate(blocks): + blocks[i] = TABlock(block, num_segments, deepcopy(tam_cfg)) + return nn.Sequential(*blocks) + + for i in range(self.num_stages): + layer_name = f'layer{i + 1}' + res_layer = getattr(self, layer_name) + setattr(self, layer_name, + make_tam_block(res_layer, self.num_segments, self.tam_cfg)) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/timesformer.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/timesformer.py new file mode 100644 index 0000000000000000000000000000000000000000..26a9d7ad681465c6e4cf68a998a65d0e2adb0a35 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/timesformer.py @@ -0,0 +1,285 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import torch +import torch.nn as nn +from einops import rearrange +from mmcv import ConfigDict +from mmcv.cnn import build_conv_layer, build_norm_layer, kaiming_init +from mmcv.cnn.bricks.transformer import build_transformer_layer_sequence +from mmcv.cnn.utils.weight_init import trunc_normal_ +from mmcv.runner import _load_checkpoint, load_state_dict +from torch.nn.modules.utils import _pair + +from ...utils import get_root_logger +from ..builder import BACKBONES + + +class PatchEmbed(nn.Module): + """Image to Patch Embedding. + + Args: + img_size (int | tuple): Size of input image. + patch_size (int): Size of one patch. + in_channels (int): Channel num of input features. Defaults to 3. + embed_dims (int): Dimensions of embedding. 
Defaults to 768. + conv_cfg (dict | None): Config dict for convolution layer. Defaults to + `dict(type='Conv2d')`. + """ + + def __init__(self, + img_size, + patch_size, + in_channels=3, + embed_dims=768, + conv_cfg=dict(type='Conv2d')): + super().__init__() + self.img_size = _pair(img_size) + self.patch_size = _pair(patch_size) + + num_patches = (self.img_size[1] // self.patch_size[1]) * ( + self.img_size[0] // self.patch_size[0]) + assert num_patches * self.patch_size[0] * self.patch_size[1] == \ + self.img_size[0] * self.img_size[1], \ + 'The image size H*W must be divisible by patch size' + self.num_patches = num_patches + + # Use conv layer to embed + self.projection = build_conv_layer( + conv_cfg, + in_channels, + embed_dims, + kernel_size=patch_size, + stride=patch_size) + + self.init_weights() + + def init_weights(self): + # Lecun norm from ClassyVision + kaiming_init(self.projection, mode='fan_in', nonlinearity='linear') + + def forward(self, x): + x = rearrange(x, 'b c t h w -> (b t) c h w') + x = self.projection(x).flatten(2).transpose(1, 2) + return x + + +@BACKBONES.register_module() +class TimeSformer(nn.Module): + """TimeSformer. A PyTorch impl of `Is Space-Time Attention All You Need for + Video Understanding? `_ + + Args: + num_frames (int): Number of frames in the video. + img_size (int | tuple): Size of input image. + patch_size (int): Size of one patch. + pretrained (str | None): Name of pretrained model. Default: None. + embed_dims (int): Dimensions of embedding. Defaults to 768. + num_heads (int): Number of parallel attention heads in + TransformerCoder. Defaults to 12. + num_transformer_layers (int): Number of transformer layers. Defaults to + 12. + in_channels (int): Channel num of input features. Defaults to 3. + dropout_ratio (float): Probability of dropout layer. Defaults to 0.. + transformer_layers (list[obj:`mmcv.ConfigDict`] | + obj:`mmcv.ConfigDict` | None): Config of transformerlayer in + TransformerCoder. 
If it is obj:`mmcv.ConfigDict`, it would be + repeated `num_transformer_layers` times to a + list[obj:`mmcv.ConfigDict`]. Defaults to None. + attention_type (str): Type of attentions in TransformerCoder. Choices + are 'divided_space_time', 'space_only' and 'joint_space_time'. + Defaults to 'divided_space_time'. + norm_cfg (dict): Config for norm layers. Defaults to + `dict(type='LN', eps=1e-6)`. + """ + supported_attention_types = [ + 'divided_space_time', 'space_only', 'joint_space_time' + ] + + def __init__(self, + num_frames, + img_size, + patch_size, + pretrained=None, + embed_dims=768, + num_heads=12, + num_transformer_layers=12, + in_channels=3, + dropout_ratio=0., + transformer_layers=None, + attention_type='divided_space_time', + norm_cfg=dict(type='LN', eps=1e-6), + **kwargs): + super().__init__(**kwargs) + assert attention_type in self.supported_attention_types, ( + f'Unsupported Attention Type {attention_type}!') + assert transformer_layers is None or isinstance( + transformer_layers, (dict, list)) + + self.num_frames = num_frames + self.pretrained = pretrained + self.embed_dims = embed_dims + self.num_transformer_layers = num_transformer_layers + self.attention_type = attention_type + + self.patch_embed = PatchEmbed( + img_size=img_size, + patch_size=patch_size, + in_channels=in_channels, + embed_dims=embed_dims) + num_patches = self.patch_embed.num_patches + + self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dims)) + self.pos_embed = nn.Parameter( + torch.zeros(1, num_patches + 1, embed_dims)) + self.drop_after_pos = nn.Dropout(p=dropout_ratio) + if self.attention_type != 'space_only': + self.time_embed = nn.Parameter( + torch.zeros(1, num_frames, embed_dims)) + self.drop_after_time = nn.Dropout(p=dropout_ratio) + + self.norm = build_norm_layer(norm_cfg, embed_dims)[1] + + if transformer_layers is None: + # stochastic depth decay rule + dpr = np.linspace(0, 0.1, num_transformer_layers) + + if self.attention_type == 'divided_space_time': + 
_transformerlayers_cfg = [
+                    dict(
+                        type='BaseTransformerLayer',
+                        attn_cfgs=[
+                            dict(
+                                type='DividedTemporalAttentionWithNorm',
+                                embed_dims=embed_dims,
+                                num_heads=num_heads,
+                                num_frames=num_frames,
+                                dropout_layer=dict(
+                                    type='DropPath', drop_prob=dpr[i]),
+                                norm_cfg=dict(type='LN', eps=1e-6)),
+                            dict(
+                                type='DividedSpatialAttentionWithNorm',
+                                embed_dims=embed_dims,
+                                num_heads=num_heads,
+                                num_frames=num_frames,
+                                dropout_layer=dict(
+                                    type='DropPath', drop_prob=dpr[i]),
+                                norm_cfg=dict(type='LN', eps=1e-6))
+                        ],
+                        ffn_cfgs=dict(
+                            type='FFNWithNorm',
+                            embed_dims=embed_dims,
+                            feedforward_channels=embed_dims * 4,
+                            num_fcs=2,
+                            act_cfg=dict(type='GELU'),
+                            dropout_layer=dict(
+                                type='DropPath', drop_prob=dpr[i]),
+                            norm_cfg=dict(type='LN', eps=1e-6)),
+                        operation_order=('self_attn', 'self_attn', 'ffn'))
+                    for i in range(num_transformer_layers)
+                ]
+            else:
+                # Space Only & Joint Space Time
+                _transformerlayers_cfg = [
+                    dict(
+                        type='BaseTransformerLayer',
+                        attn_cfgs=[
+                            dict(
+                                type='MultiheadAttention',
+                                embed_dims=embed_dims,
+                                num_heads=num_heads,
+                                batch_first=True,
+                                dropout_layer=dict(
+                                    type='DropPath', drop_prob=dpr[i]))
+                        ],
+                        ffn_cfgs=dict(
+                            type='FFN',
+                            embed_dims=embed_dims,
+                            feedforward_channels=embed_dims * 4,
+                            num_fcs=2,
+                            act_cfg=dict(type='GELU'),
+                            dropout_layer=dict(
+                                type='DropPath', drop_prob=dpr[i])),
+                        operation_order=('norm', 'self_attn', 'norm', 'ffn'),
+                        norm_cfg=dict(type='LN', eps=1e-6),
+                        batch_first=True)
+                    for i in range(num_transformer_layers)
+                ]
+
+            transformer_layers = ConfigDict(
+                dict(
+                    type='TransformerLayerSequence',
+                    transformerlayers=_transformerlayers_cfg,
+                    num_layers=num_transformer_layers))
+
+        self.transformer_layers = build_transformer_layer_sequence(
+            transformer_layers)
+
+    def init_weights(self, pretrained=None):
+        """Initiate the parameters either from existing checkpoint or from
+        scratch."""
+        trunc_normal_(self.pos_embed, std=.02)
+        trunc_normal_(self.cls_token, std=.02)
+
+        if pretrained:
+ self.pretrained = pretrained + if isinstance(self.pretrained, str): + logger = get_root_logger() + logger.info(f'load model from: {self.pretrained}') + + state_dict = _load_checkpoint(self.pretrained) + if 'state_dict' in state_dict: + state_dict = state_dict['state_dict'] + + if self.attention_type == 'divided_space_time': + # modify the key names of norm layers + old_state_dict_keys = list(state_dict.keys()) + for old_key in old_state_dict_keys: + if 'norms' in old_key: + new_key = old_key.replace('norms.0', + 'attentions.0.norm') + new_key = new_key.replace('norms.1', 'ffns.0.norm') + state_dict[new_key] = state_dict.pop(old_key) + + # copy the parameters of space attention to time attention + old_state_dict_keys = list(state_dict.keys()) + for old_key in old_state_dict_keys: + if 'attentions.0' in old_key: + new_key = old_key.replace('attentions.0', + 'attentions.1') + state_dict[new_key] = state_dict[old_key].clone() + + load_state_dict(self, state_dict, strict=False, logger=logger) + + def forward(self, x): + """Defines the computation performed at every call.""" + # x [batch_size * num_frames, num_patches, embed_dims] + batches = x.shape[0] + x = self.patch_embed(x) + + # x [batch_size * num_frames, num_patches + 1, embed_dims] + cls_tokens = self.cls_token.expand(x.size(0), -1, -1) + x = torch.cat((cls_tokens, x), dim=1) + x = x + self.pos_embed + x = self.drop_after_pos(x) + + # Add Time Embedding + if self.attention_type != 'space_only': + # x [batch_size, num_patches * num_frames + 1, embed_dims] + cls_tokens = x[:batches, 0, :].unsqueeze(1) + x = rearrange(x[:, 1:, :], '(b t) p m -> (b p) t m', b=batches) + x = x + self.time_embed + x = self.drop_after_time(x) + x = rearrange(x, '(b p) t m -> b (p t) m', b=batches) + x = torch.cat((cls_tokens, x), dim=1) + + x = self.transformer_layers(x, None, None) + + if self.attention_type == 'space_only': + # x [batch_size, num_patches + 1, embed_dims] + x = x.view(-1, self.num_frames, *x.size()[-2:]) + x = 
torch.mean(x, 1)
+
+        x = self.norm(x)
+
+        # Return Class Token
+        return x[:, 0]
diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/x3d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/x3d.py
new file mode 100644
index 0000000000000000000000000000000000000000..357af53ae5205dec94c93a48e1d6839393e8a109
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/backbones/x3d.py
@@ -0,0 +1,524 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import math
+
+import torch.nn as nn
+import torch.utils.checkpoint as cp
+from mmcv.cnn import (ConvModule, Swish, build_activation_layer, constant_init,
+                      kaiming_init)
+from mmcv.runner import load_checkpoint
+from mmcv.utils import _BatchNorm
+
+from ...utils import get_root_logger
+from ..builder import BACKBONES
+
+
+class SEModule(nn.Module):
+
+    def __init__(self, channels, reduction):
+        super().__init__()
+        self.avg_pool = nn.AdaptiveAvgPool3d(1)
+        self.bottleneck = self._round_width(channels, reduction)
+        self.fc1 = nn.Conv3d(
+            channels, self.bottleneck, kernel_size=1, padding=0)
+        self.relu = nn.ReLU()
+        self.fc2 = nn.Conv3d(
+            self.bottleneck, channels, kernel_size=1, padding=0)
+        self.sigmoid = nn.Sigmoid()
+
+    @staticmethod
+    def _round_width(width, multiplier, min_width=8, divisor=8):
+        width *= multiplier
+        min_width = min_width or divisor
+        width_out = max(min_width,
+                        int(width + divisor / 2) // divisor * divisor)
+        if width_out < 0.9 * width:
+            width_out += divisor
+        return int(width_out)
+
+    def forward(self, x):
+        module_input = x
+        x = self.avg_pool(x)
+        x = self.fc1(x)
+        x = self.relu(x)
+        x = self.fc2(x)
+        x = self.sigmoid(x)
+        return module_input * x
+
+
+class BlockX3D(nn.Module):
+    """BlockX3D 3d building block for X3D.
+
+    Args:
+        inplanes (int): Number of channels for the input in first conv3d layer.
+        planes (int): Number of channels produced by some norm/conv3d layers.
+        outplanes (int): Number of channels produced by the final conv3d layer.
+ spatial_stride (int): Spatial stride in the conv3d layer. Default: 1. + downsample (nn.Module | None): Downsample layer. Default: None. + se_ratio (float | None): The reduction ratio of squeeze and excitation + unit. If set as None, it means not using SE unit. Default: None. + use_swish (bool): Whether to use swish as the activation function + before and after the 3x3x3 conv. Default: True. + conv_cfg (dict): Config dict for convolution layer. + Default: ``dict(type='Conv3d')``. + norm_cfg (dict): Config for norm layers. required keys are ``type``, + Default: ``dict(type='BN3d')``. + act_cfg (dict): Config dict for activation layer. + Default: ``dict(type='ReLU')``. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + """ + + def __init__(self, + inplanes, + planes, + outplanes, + spatial_stride=1, + downsample=None, + se_ratio=None, + use_swish=True, + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d'), + act_cfg=dict(type='ReLU'), + with_cp=False): + super().__init__() + + self.inplanes = inplanes + self.planes = planes + self.outplanes = outplanes + self.spatial_stride = spatial_stride + self.downsample = downsample + self.se_ratio = se_ratio + self.use_swish = use_swish + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.act_cfg_swish = dict(type='Swish') + self.with_cp = with_cp + + self.conv1 = ConvModule( + in_channels=inplanes, + out_channels=planes, + kernel_size=1, + stride=1, + padding=0, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + # Here we use the channel-wise conv + self.conv2 = ConvModule( + in_channels=planes, + out_channels=planes, + kernel_size=3, + stride=(1, self.spatial_stride, self.spatial_stride), + padding=1, + groups=planes, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=None) + + self.swish = Swish() + + self.conv3 = ConvModule( + 
in_channels=planes,
+            out_channels=outplanes,
+            kernel_size=1,
+            stride=1,
+            padding=0,
+            bias=False,
+            conv_cfg=self.conv_cfg,
+            norm_cfg=self.norm_cfg,
+            act_cfg=None)
+
+        if self.se_ratio is not None:
+            self.se_module = SEModule(planes, self.se_ratio)
+
+        self.relu = build_activation_layer(self.act_cfg)
+
+    def forward(self, x):
+        """Defines the computation performed at every call."""
+
+        def _inner_forward(x):
+            """Forward wrapper for utilizing checkpoint."""
+            identity = x
+
+            out = self.conv1(x)
+            out = self.conv2(out)
+            if self.se_ratio is not None:
+                out = self.se_module(out)
+
+            out = self.swish(out)
+
+            out = self.conv3(out)
+
+            if self.downsample is not None:
+                identity = self.downsample(x)
+
+            out = out + identity
+            return out
+
+        if self.with_cp and x.requires_grad:
+            out = cp.checkpoint(_inner_forward, x)
+        else:
+            out = _inner_forward(x)
+        out = self.relu(out)
+        return out
+
+
+# We do not support initializing X3D with 2D pretrained weights
+@BACKBONES.register_module()
+class X3D(nn.Module):
+    """X3D backbone. https://arxiv.org/pdf/2004.04730.pdf.
+
+    Args:
+        gamma_w (float): Global channel width expansion factor. Default: 1.
+        gamma_b (float): Bottleneck channel width expansion factor. Default: 1.
+        gamma_d (float): Network depth expansion factor. Default: 1.
+        pretrained (str | None): Name of pretrained model. Default: None.
+        in_channels (int): Channel num of input features. Default: 3.
+        num_stages (int): Resnet stages. Default: 4.
+        spatial_strides (Sequence[int]):
+            Spatial strides of residual blocks of each stage.
+            Default: ``(2, 2, 2, 2)``.
+        frozen_stages (int): Stages to be frozen (all param fixed). If set to
+            -1, it means not freezing any parameters. Default: -1.
+        se_style (str): The style of inserting SE modules into BlockX3D, 'half'
+            denotes insert into half of the blocks, while 'all' denotes insert
+            into all blocks. Default: 'half'.
+        se_ratio (float | None): The reduction ratio of squeeze and excitation
+            unit.
If set as None, it means not using SE unit. Default: 1 / 16. + use_swish (bool): Whether to use swish as the activation function + before and after the 3x3x3 conv. Default: True. + conv_cfg (dict): Config for conv layers. required keys are ``type`` + Default: ``dict(type='Conv3d')``. + norm_cfg (dict): Config for norm layers. required keys are ``type`` and + ``requires_grad``. + Default: ``dict(type='BN3d', requires_grad=True)``. + act_cfg (dict): Config dict for activation layer. + Default: ``dict(type='ReLU', inplace=True)``. + norm_eval (bool): Whether to set BN layers to eval mode, namely, freeze + running stats (mean and var). Default: False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + zero_init_residual (bool): + Whether to use zero initialization for residual block, + Default: True. + kwargs (dict, optional): Key arguments for "make_res_layer". + """ + + def __init__(self, + gamma_w=1.0, + gamma_b=1.0, + gamma_d=1.0, + pretrained=None, + in_channels=3, + num_stages=4, + spatial_strides=(2, 2, 2, 2), + frozen_stages=-1, + se_style='half', + se_ratio=1 / 16, + use_swish=True, + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + norm_eval=False, + with_cp=False, + zero_init_residual=True, + **kwargs): + super().__init__() + self.gamma_w = gamma_w + self.gamma_b = gamma_b + self.gamma_d = gamma_d + + self.pretrained = pretrained + self.in_channels = in_channels + # Hard coded, can be changed by gamma_w + self.base_channels = 24 + self.stage_blocks = [1, 2, 5, 3] + + # apply parameters gamma_w and gamma_d + self.base_channels = self._round_width(self.base_channels, + self.gamma_w) + + self.stage_blocks = [ + self._round_repeats(x, self.gamma_d) for x in self.stage_blocks + ] + + self.num_stages = num_stages + assert 1 <= num_stages <= 4 + self.spatial_strides = spatial_strides + assert 
len(spatial_strides) == num_stages + self.frozen_stages = frozen_stages + + self.se_style = se_style + assert self.se_style in ['all', 'half'] + self.se_ratio = se_ratio + assert (self.se_ratio is None) or (self.se_ratio > 0) + self.use_swish = use_swish + + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.norm_eval = norm_eval + self.with_cp = with_cp + self.zero_init_residual = zero_init_residual + + self.block = BlockX3D + self.stage_blocks = self.stage_blocks[:num_stages] + self.layer_inplanes = self.base_channels + self._make_stem_layer() + + self.res_layers = [] + for i, num_blocks in enumerate(self.stage_blocks): + spatial_stride = spatial_strides[i] + inplanes = self.base_channels * 2**i + planes = int(inplanes * self.gamma_b) + + res_layer = self.make_res_layer( + self.block, + self.layer_inplanes, + inplanes, + planes, + num_blocks, + spatial_stride=spatial_stride, + se_style=self.se_style, + se_ratio=self.se_ratio, + use_swish=self.use_swish, + norm_cfg=self.norm_cfg, + conv_cfg=self.conv_cfg, + act_cfg=self.act_cfg, + with_cp=with_cp, + **kwargs) + self.layer_inplanes = inplanes + layer_name = f'layer{i + 1}' + self.add_module(layer_name, res_layer) + self.res_layers.append(layer_name) + + self.feat_dim = self.base_channels * 2**(len(self.stage_blocks) - 1) + self.conv5 = ConvModule( + self.feat_dim, + int(self.feat_dim * self.gamma_b), + kernel_size=1, + stride=1, + padding=0, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + self.feat_dim = int(self.feat_dim * self.gamma_b) + + @staticmethod + def _round_width(width, multiplier, min_depth=8, divisor=8): + """Round width of filters based on width multiplier.""" + if not multiplier: + return width + + width *= multiplier + min_depth = min_depth or divisor + new_filters = max(min_depth, + int(width + divisor / 2) // divisor * divisor) + if new_filters < 0.9 * width: + new_filters += divisor + return int(new_filters) + + 
@staticmethod
+    def _round_repeats(repeats, multiplier):
+        """Round number of layers based on depth multiplier."""
+        if not multiplier:
+            return repeats
+        return int(math.ceil(multiplier * repeats))
+
+    # the module is parameterized with gamma_b
+    # no temporal_stride
+    def make_res_layer(self,
+                       block,
+                       layer_inplanes,
+                       inplanes,
+                       planes,
+                       blocks,
+                       spatial_stride=1,
+                       se_style='half',
+                       se_ratio=None,
+                       use_swish=True,
+                       norm_cfg=None,
+                       act_cfg=None,
+                       conv_cfg=None,
+                       with_cp=False,
+                       **kwargs):
+        """Build residual layer for ResNet3D.
+
+        Args:
+            block (nn.Module): Residual module to be built.
+            layer_inplanes (int): Number of channels for the input feature
+                of the res layer.
+            inplanes (int): Number of channels for the input feature in each
+                block, which equals base_channels * gamma_w.
+            planes (int): Number of channels for the output feature in each
+                block, which equals base_channels * gamma_w * gamma_b.
+            blocks (int): Number of residual blocks.
+            spatial_stride (int): Spatial strides in residual and conv layers.
+                Default: 1.
+            se_style (str): The style of inserting SE modules into BlockX3D,
+                'half' denotes insert into half of the blocks, while 'all'
+                denotes insert into all blocks. Default: 'half'.
+            se_ratio (float | None): The reduction ratio of squeeze and
+                excitation unit. If set as None, it means not using SE unit.
+                Default: None.
+            use_swish (bool): Whether to use swish as the activation function
+                before and after the 3x3x3 conv. Default: True.
+            conv_cfg (dict | None): Config for conv layers. Default: None.
+            norm_cfg (dict | None): Config for norm layers. Default: None.
+            act_cfg (dict | None): Config for activation layers. Default: None.
+            with_cp (bool | None): Use checkpoint or not. Using checkpoint
+                will save some memory while slowing down the training speed.
+                Default: False.
+
+        Returns:
+            nn.Module: A residual layer for the given config.
+ """ + downsample = None + if spatial_stride != 1 or layer_inplanes != inplanes: + downsample = ConvModule( + layer_inplanes, + inplanes, + kernel_size=1, + stride=(1, spatial_stride, spatial_stride), + padding=0, + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + use_se = [False] * blocks + if self.se_style == 'all': + use_se = [True] * blocks + elif self.se_style == 'half': + use_se = [i % 2 == 0 for i in range(blocks)] + else: + raise NotImplementedError + + layers = [] + layers.append( + block( + layer_inplanes, + planes, + inplanes, + spatial_stride=spatial_stride, + downsample=downsample, + se_ratio=se_ratio if use_se[0] else None, + use_swish=use_swish, + norm_cfg=norm_cfg, + conv_cfg=conv_cfg, + act_cfg=act_cfg, + with_cp=with_cp, + **kwargs)) + + for i in range(1, blocks): + layers.append( + block( + inplanes, + planes, + inplanes, + spatial_stride=1, + se_ratio=se_ratio if use_se[i] else None, + use_swish=use_swish, + norm_cfg=norm_cfg, + conv_cfg=conv_cfg, + act_cfg=act_cfg, + with_cp=with_cp, + **kwargs)) + + return nn.Sequential(*layers) + + def _make_stem_layer(self): + """Construct the stem layers consists of a conv+norm+act module and a + pooling layer.""" + self.conv1_s = ConvModule( + self.in_channels, + self.base_channels, + kernel_size=(1, 3, 3), + stride=(1, 2, 2), + padding=(0, 1, 1), + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=None, + act_cfg=None) + self.conv1_t = ConvModule( + self.base_channels, + self.base_channels, + kernel_size=(5, 1, 1), + stride=(1, 1, 1), + padding=(2, 0, 0), + groups=self.base_channels, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + def _freeze_stages(self): + """Prevent all the parameters from being optimized before + ``self.frozen_stages``.""" + if self.frozen_stages >= 0: + self.conv1_s.eval() + self.conv1_t.eval() + for param in self.conv1_s.parameters(): + param.requires_grad = False + for param in self.conv1_t.parameters(): + 
param.requires_grad = False + + for i in range(1, self.frozen_stages + 1): + m = getattr(self, f'layer{i}') + m.eval() + for param in m.parameters(): + param.requires_grad = False + + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + if isinstance(self.pretrained, str): + logger = get_root_logger() + logger.info(f'load model from: {self.pretrained}') + + load_checkpoint(self, self.pretrained, strict=False, logger=logger) + + elif self.pretrained is None: + for m in self.modules(): + if isinstance(m, nn.Conv3d): + kaiming_init(m) + elif isinstance(m, _BatchNorm): + constant_init(m, 1) + + if self.zero_init_residual: + for m in self.modules(): + if isinstance(m, BlockX3D): + constant_init(m.conv3.bn, 0) + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The feature of the input + samples extracted by the backbone. + """ + x = self.conv1_s(x) + x = self.conv1_t(x) + for layer_name in self.res_layers: + res_layer = getattr(self, layer_name) + x = res_layer(x) + x = self.conv5(x) + return x + + def train(self, mode=True): + """Set the optimization status when training.""" + super().train(mode) + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/builder.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/builder.py new file mode 100644 index 0000000000000000000000000000000000000000..86a5cef146404582126d08960abfead54ed722db --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/builder.py @@ -0,0 +1,92 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
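The channel bookkeeping in the X3D constructor above (`inplanes`, `planes`, `layer_inplanes`, and the final `feat_dim`) can be traced without building any layers. This is a rough sketch of that arithmetic only; the helper name `x3d_stage_widths` and the values `base_channels=24`, `gamma_b=2.25` are assumptions picked for illustration and do not come from this diff.

```python
def x3d_stage_widths(base_channels, gamma_b, num_stages):
    """Trace the channel counts passed to make_res_layer for each stage.

    Returns a list of (layer_inplanes, inplanes, planes) tuples and the
    output width of the final 1x1x1 conv (`feat_dim` after conv5).
    """
    stages = []
    layer_inplanes = base_channels  # the stem output feeds the first stage
    for i in range(num_stages):
        inplanes = base_channels * 2**i       # doubles at every stage
        planes = int(inplanes * gamma_b)      # widened by the bottleneck factor
        stages.append((layer_inplanes, inplanes, planes))
        layer_inplanes = inplanes             # next stage starts from this width
    feat_dim = int(base_channels * 2**(num_stages - 1) * gamma_b)
    return stages, feat_dim


# Assumed example values, not read from any config in this diff.
stages, feat_dim = x3d_stage_widths(base_channels=24, gamma_b=2.25, num_stages=4)
for layer_in, inplanes, planes in stages:
    print(layer_in, inplanes, planes)
print('feat_dim:', feat_dim)
```

Under these assumed values the last stage has `inplanes=192`, so `conv5` maps 192 channels to `int(192 * 2.25) = 432`.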
+import warnings + +from mmcv.cnn import MODELS as MMCV_MODELS +from mmcv.utils import Registry + +MODELS = Registry('models', parent=MMCV_MODELS) +BACKBONES = MODELS +NECKS = MODELS +HEADS = MODELS +RECOGNIZERS = MODELS +LOSSES = MODELS +LOCALIZERS = MODELS + +try: + from mmdet.models.builder import DETECTORS, build_detector +except (ImportError, ModuleNotFoundError): + # Define an empty registry and building func, so that can import + DETECTORS = MODELS + + def build_detector(cfg, train_cfg, test_cfg): + warnings.warn( + 'Failed to import `DETECTORS`, `build_detector` from ' + '`mmdet.models.builder`. You will be unable to register or build ' + 'a spatio-temporal detection model. ') + + +def build_backbone(cfg): + """Build backbone.""" + return BACKBONES.build(cfg) + + +def build_head(cfg): + """Build head.""" + return HEADS.build(cfg) + + +def build_recognizer(cfg, train_cfg=None, test_cfg=None): + """Build recognizer.""" + if train_cfg is not None or test_cfg is not None: + warnings.warn( + 'train_cfg and test_cfg is deprecated, ' + 'please specify them in model. 
Details see this ' + 'PR: https://github.com/open-mmlab/mmaction2/pull/629', + UserWarning) + assert cfg.get( + 'train_cfg' + ) is None or train_cfg is None, 'train_cfg specified in both outer field and model field' # noqa: E501 + assert cfg.get( + 'test_cfg' + ) is None or test_cfg is None, 'test_cfg specified in both outer field and model field ' # noqa: E501 + return RECOGNIZERS.build( + cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg)) + + +def build_loss(cfg): + """Build loss.""" + return LOSSES.build(cfg) + + +def build_localizer(cfg): + """Build localizer.""" + return LOCALIZERS.build(cfg) + + +def build_model(cfg, train_cfg=None, test_cfg=None): + """Build model.""" + args = cfg.copy() + obj_type = args.pop('type') + if obj_type in LOCALIZERS: + return build_localizer(cfg) + if obj_type in RECOGNIZERS: + return build_recognizer(cfg, train_cfg, test_cfg) + if obj_type in DETECTORS: + if train_cfg is not None or test_cfg is not None: + warnings.warn( + 'train_cfg and test_cfg is deprecated, ' + 'please specify them in model. Details see this ' + 'PR: https://github.com/open-mmlab/mmaction2/pull/629', + UserWarning) + return build_detector(cfg, train_cfg, test_cfg) + model_in_mmdet = ['FastRCNN'] + if obj_type in model_in_mmdet: + raise ImportError( + 'Please install mmdet for spatial temporal detection tasks.') + raise ValueError(f'{obj_type} is not registered in ' + 'LOCALIZERS, RECOGNIZERS or DETECTORS') + + +def build_neck(cfg): + """Build neck.""" + return NECKS.build(cfg) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..3fca90af6430eceaf57942909c80eb3bbe14c186 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
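In `builder.py` above, `LOCALIZERS`, `RECOGNIZERS` and the other role names are all bound to the same `MODELS` object, and `build_model` tests them in a fixed order. The sketch below imitates that dispatch with a hypothetical `TinyRegistry` class; it is not the mmcv `Registry`, and it assumes that a membership test simply checks whether a name has been registered. Whether the real registry behaves the same way depends on mmcv's `Registry.__contains__`, which this diff does not show.

```python
class TinyRegistry:
    """Hypothetical stand-in for a registry; not the mmcv implementation."""

    def __init__(self):
        self._modules = {}

    def register(self, name, factory):
        self._modules[name] = factory

    def __contains__(self, name):
        # Assumption: membership means "a factory is registered under name".
        return name in self._modules

    def build(self, cfg):
        args = dict(cfg)                      # copy so the caller's cfg is untouched
        factory = self._modules[args.pop('type')]
        return factory(**args)


MODELS = TinyRegistry()
# As in builder.py, the role-specific names alias one registry object.
LOCALIZERS = MODELS
RECOGNIZERS = MODELS

# 'ToyRecognizer' is an illustrative name, not a real mmaction2 model.
MODELS.register('ToyRecognizer', lambda **kw: ('recognizer', kw))


def build_model(cfg):
    """Mirror the order of the membership checks in builder.build_model."""
    obj_type = dict(cfg)['type']
    if obj_type in LOCALIZERS:
        return 'localizer-branch', LOCALIZERS.build(cfg)
    if obj_type in RECOGNIZERS:
        return 'recognizer-branch', RECOGNIZERS.build(cfg)
    raise ValueError(f'{obj_type} is not registered')


branch, model = build_model(dict(type='ToyRecognizer', num_classes=10))
print(branch, model)
```

Under the stated assumption, every registered type satisfies the first check because the names are aliases, so the later branches are reached only for types the shared registry does not contain.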
+from .conv2plus1d import Conv2plus1d +from .conv_audio import ConvAudio +from .lfb import LFB +from .sub_batchnorm3d import SubBatchNorm3D +from .tam import TAM +from .transformer import (DividedSpatialAttentionWithNorm, + DividedTemporalAttentionWithNorm, FFNWithNorm) + +__all__ = [ + 'Conv2plus1d', 'ConvAudio', 'LFB', 'TAM', + 'DividedSpatialAttentionWithNorm', 'DividedTemporalAttentionWithNorm', + 'FFNWithNorm', 'SubBatchNorm3D' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/conv2plus1d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/conv2plus1d.py new file mode 100644 index 0000000000000000000000000000000000000000..72965617b25e6b72a83efa6029a566d1c6fb461e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/conv2plus1d.py @@ -0,0 +1,105 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import CONV_LAYERS, build_norm_layer, constant_init, kaiming_init +from torch.nn.modules.utils import _triple + + +@CONV_LAYERS.register_module() +class Conv2plus1d(nn.Module): + """(2+1)d Conv module for R(2+1)d backbone. + + https://arxiv.org/pdf/1711.11248.pdf. + + Args: + in_channels (int): Same as nn.Conv3d. + out_channels (int): Same as nn.Conv3d. + kernel_size (int | tuple[int]): Same as nn.Conv3d. + stride (int | tuple[int]): Same as nn.Conv3d. + padding (int | tuple[int]): Same as nn.Conv3d. + dilation (int | tuple[int]): Same as nn.Conv3d. + groups (int): Same as nn.Conv3d. + bias (bool | str): If specified as `auto`, it will be decided by the + norm_cfg. Bias will be set as True if norm_cfg is None, otherwise + False. 
+ """ + + def __init__(self, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + bias=True, + norm_cfg=dict(type='BN3d')): + super().__init__() + + kernel_size = _triple(kernel_size) + stride = _triple(stride) + padding = _triple(padding) + assert len(kernel_size) == len(stride) == len(padding) == 3 + + self.in_channels = in_channels + self.out_channels = out_channels + self.kernel_size = kernel_size + self.stride = stride + self.padding = padding + self.dilation = dilation + self.groups = groups + self.bias = bias + self.norm_cfg = norm_cfg + self.output_padding = (0, 0, 0) + self.transposed = False + + # The middle-plane is calculated according to: + # M_i = \floor{\frac{t * d^2 N_i-1 * N_i} + # {d^2 * N_i-1 + t * N_i}} + # where d, t are spatial and temporal kernel, and + # N_i, N_i-1 are planes + # and inplanes. https://arxiv.org/pdf/1711.11248.pdf + mid_channels = 3 * ( + in_channels * out_channels * kernel_size[1] * kernel_size[2]) + mid_channels /= ( + in_channels * kernel_size[1] * kernel_size[2] + 3 * out_channels) + mid_channels = int(mid_channels) + + self.conv_s = nn.Conv3d( + in_channels, + mid_channels, + kernel_size=(1, kernel_size[1], kernel_size[2]), + stride=(1, stride[1], stride[2]), + padding=(0, padding[1], padding[2]), + bias=bias) + _, self.bn_s = build_norm_layer(self.norm_cfg, mid_channels) + self.relu = nn.ReLU(inplace=True) + self.conv_t = nn.Conv3d( + mid_channels, + out_channels, + kernel_size=(kernel_size[0], 1, 1), + stride=(stride[0], 1, 1), + padding=(padding[0], 0, 0), + bias=bias) + + self.init_weights() + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. 
+ """ + x = self.conv_s(x) + x = self.bn_s(x) + x = self.relu(x) + x = self.conv_t(x) + return x + + def init_weights(self): + """Initiate the parameters from scratch.""" + kaiming_init(self.conv_s) + kaiming_init(self.conv_t) + constant_init(self.bn_s, 1, bias=0) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/conv_audio.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/conv_audio.py new file mode 100644 index 0000000000000000000000000000000000000000..54f04c9cad04fba8e4f7c4a4c6650f0987f92430 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/conv_audio.py @@ -0,0 +1,105 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +from mmcv.cnn import CONV_LAYERS, ConvModule, constant_init, kaiming_init +from torch.nn.modules.utils import _pair + + +@CONV_LAYERS.register_module() +class ConvAudio(nn.Module): + """Conv2d module for AudioResNet backbone. + + `_. + + Args: + in_channels (int): Same as nn.Conv2d. + out_channels (int): Same as nn.Conv2d. + kernel_size (int | tuple[int]): Same as nn.Conv2d. + op (string): Operation to merge the output of freq + and time feature map. Choices are 'sum' and 'concat'. + Default: 'concat'. + stride (int | tuple[int]): Same as nn.Conv2d. + padding (int | tuple[int]): Same as nn.Conv2d. + dilation (int | tuple[int]): Same as nn.Conv2d. + groups (int): Same as nn.Conv2d. + bias (bool | str): If specified as `auto`, it will be decided by the + norm_cfg. Bias will be set as True if norm_cfg is None, otherwise + False. 
+ """ + + def __init__(self, + in_channels, + out_channels, + kernel_size, + op='concat', + stride=1, + padding=0, + dilation=1, + groups=1, + bias=False): + super().__init__() + + kernel_size = _pair(kernel_size) + stride = _pair(stride) + padding = _pair(padding) + + self.in_channels = in_channels + self.out_channels = out_channels + self.kernel_size = kernel_size + assert op in ['concat', 'sum'] + self.op = op + self.stride = stride + self.padding = padding + self.dilation = dilation + self.groups = groups + self.bias = bias + self.output_padding = (0, 0) + self.transposed = False + + self.conv_1 = ConvModule( + in_channels, + out_channels, + kernel_size=(kernel_size[0], 1), + stride=stride, + padding=(kernel_size[0] // 2, 0), + bias=bias, + conv_cfg=dict(type='Conv'), + norm_cfg=dict(type='BN'), + act_cfg=dict(type='ReLU')) + + self.conv_2 = ConvModule( + in_channels, + out_channels, + kernel_size=(1, kernel_size[1]), + stride=stride, + padding=(0, kernel_size[1] // 2), + bias=bias, + conv_cfg=dict(type='Conv'), + norm_cfg=dict(type='BN'), + act_cfg=dict(type='ReLU')) + + self.init_weights() + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. 
+ """ + x_1 = self.conv_1(x) + x_2 = self.conv_2(x) + if self.op == 'concat': + out = torch.cat([x_1, x_2], 1) + else: + out = x_1 + x_2 + return out + + def init_weights(self): + """Initiate the parameters from scratch.""" + kaiming_init(self.conv_1.conv) + kaiming_init(self.conv_2.conv) + constant_init(self.conv_1.bn, 1, bias=0) + constant_init(self.conv_2.bn, 1, bias=0) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/lfb.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/lfb.py new file mode 100644 index 0000000000000000000000000000000000000000..3fb82cf3330247da10847a8f939073a41cda8e83 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/lfb.py @@ -0,0 +1,189 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import io +import os.path as osp +import warnings + +import numpy as np +import torch +import torch.distributed as dist +from mmcv.runner import get_dist_info + +try: + import lmdb + lmdb_imported = True +except (ImportError, ModuleNotFoundError): + lmdb_imported = False + + +class LFB: + """Long-Term Feature Bank (LFB). + + LFB is proposed in `Long-Term Feature Banks for Detailed Video + Understanding `_ + + The ROI features of videos are stored in the feature bank. The feature bank + was generated by inferring with a lfb infer config. + + Formally, LFB is a Dict whose keys are video IDs and its values are also + Dicts whose keys are timestamps in seconds. Example of LFB: + + .. code-block:: Python + { + '0f39OWEqJ24': { + 901: tensor([[ 1.2760, 1.1965, ..., 0.0061, -0.0639], + [-0.6320, 0.3794, ..., -1.2768, 0.5684], + [ 0.2535, 1.0049, ..., 0.4906, 1.2555], + [-0.5838, 0.8549, ..., -2.1736, 0.4162]]), + ... + 1705: tensor([[-1.0169, -1.1293, ..., 0.6793, -2.0540], + [ 1.2436, -0.4555, ..., 0.2281, -0.8219], + [ 0.2815, -0.0547, ..., -0.4199, 0.5157]]), + ... + }, + 'xmqSaQPzL1E': { + ... + }, + ... + } + + Args: + lfb_prefix_path (str): The storage path of lfb. 
+ max_num_sampled_feat (int): The max number of sampled features. + Default: 5. + window_size (int): Window size of sampling long term feature. + Default: 60. + lfb_channels (int): Number of the channels of the features stored + in LFB. Default: 2048. + dataset_modes (tuple[str] | str): Load LFB of datasets with different + modes, such as training, validation, testing datasets. If you don't + do cross validation during training, just load the training dataset + i.e. setting `dataset_modes = ('train')`. + Default: ('train', 'val'). + device (str): Where to load lfb. Choices are 'gpu', 'cpu' and 'lmdb'. + A 1.65GB half-precision ava lfb (including training and validation) + occupies about 2GB GPU memory. Default: 'gpu'. + lmdb_map_size (int): Map size of lmdb. Default: 4e9. + construct_lmdb (bool): Whether to construct lmdb. If you have + constructed lmdb of lfb, you can set to False to skip the + construction. Default: True. + """ + + def __init__(self, + lfb_prefix_path, + max_num_sampled_feat=5, + window_size=60, + lfb_channels=2048, + dataset_modes=('train', 'val'), + device='gpu', + lmdb_map_size=4e9, + construct_lmdb=True): + if not osp.exists(lfb_prefix_path): + raise ValueError( + f'lfb prefix path {lfb_prefix_path} does not exist!') + self.lfb_prefix_path = lfb_prefix_path + self.max_num_sampled_feat = max_num_sampled_feat + self.window_size = window_size + self.lfb_channels = lfb_channels + if not isinstance(dataset_modes, tuple): + assert isinstance(dataset_modes, str) + dataset_modes = (dataset_modes, ) + self.dataset_modes = dataset_modes + self.device = device + + rank, world_size = get_dist_info() + + # Loading LFB + if self.device == 'gpu': + self.load_lfb(f'cuda:{rank}') + elif self.device == 'cpu': + if world_size > 1: + warnings.warn( + 'If distributed training is used with multi-GPUs, lfb ' + 'will be loaded multiple times on RAM. 
In this case, ' + "'lmdb' is recommended.", UserWarning) + self.load_lfb('cpu') + elif self.device == 'lmdb': + assert lmdb_imported, ( + 'Please install `lmdb` to load lfb on lmdb!') + self.lmdb_map_size = lmdb_map_size + self.construct_lmdb = construct_lmdb + self.lfb_lmdb_path = osp.normpath( + osp.join(self.lfb_prefix_path, 'lmdb')) + + if rank == 0 and self.construct_lmdb: + print('Constructing LFB lmdb...') + self.load_lfb_on_lmdb() + + # Synchronizes all processes to make sure lfb lmdb exists. + if world_size > 1: + dist.barrier() + self.lmdb_env = lmdb.open(self.lfb_lmdb_path, readonly=True) + else: + raise ValueError("Device must be 'gpu', 'cpu' or 'lmdb', " + f'but got {self.device}.') + + def load_lfb(self, map_location): + self.lfb = {} + for dataset_mode in self.dataset_modes: + lfb_path = osp.normpath( + osp.join(self.lfb_prefix_path, f'lfb_{dataset_mode}.pkl')) + print(f'Loading LFB from {lfb_path}...') + self.lfb.update(torch.load(lfb_path, map_location=map_location)) + print(f'LFB has been loaded on {map_location}.') + + def load_lfb_on_lmdb(self): + lfb = {} + for dataset_mode in self.dataset_modes: + lfb_path = osp.normpath( + osp.join(self.lfb_prefix_path, f'lfb_{dataset_mode}.pkl')) + lfb.update(torch.load(lfb_path, map_location='cpu')) + + lmdb_env = lmdb.open(self.lfb_lmdb_path, map_size=self.lmdb_map_size) + for key, value in lfb.items(): + txn = lmdb_env.begin(write=True) + buff = io.BytesIO() + torch.save(value, buff) + buff.seek(0) + txn.put(key.encode(), buff.read()) + txn.commit() + buff.close() + + print(f'LFB lmdb has been constructed on {self.lfb_lmdb_path}!') + + def sample_long_term_features(self, video_id, timestamp): + if self.device == 'lmdb': + with self.lmdb_env.begin(write=False) as txn: + buf = txn.get(video_id.encode()) + video_features = torch.load(io.BytesIO(buf)) + else: + video_features = self.lfb[video_id] + + # Sample long term features. 
+ window_size, K = self.window_size, self.max_num_sampled_feat + start = timestamp - (window_size // 2) + lt_feats = torch.zeros(window_size * K, self.lfb_channels) + + for idx, sec in enumerate(range(start, start + window_size)): + if sec in video_features: + # `num_feat` is the number of roi features in this second. + num_feat = len(video_features[sec]) + num_feat_sampled = min(num_feat, K) + # Sample some roi features randomly. + random_lfb_indices = np.random.choice( + range(num_feat), num_feat_sampled, replace=False) + + for k, rand_idx in enumerate(random_lfb_indices): + lt_feats[idx * K + k] = video_features[sec][rand_idx] + + # [window_size * max_num_sampled_feat, lfb_channels] + return lt_feats + + def __getitem__(self, img_key): + """Sample long term features like `lfb['0f39OWEqJ24,0902']` where `lfb` + is an instance of class LFB.""" + video_id, timestamp = img_key.split(',') + return self.sample_long_term_features(video_id, int(timestamp)) + + def __len__(self): + """The number of videos whose ROI features are stored in LFB.""" + return len(self.lfb) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/sub_batchnorm3d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/sub_batchnorm3d.py new file mode 100644 index 0000000000000000000000000000000000000000..c020e875832d0041693901b4e4a7e7e7def31d9c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/sub_batchnorm3d.py @@ -0,0 +1,75 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from copy import deepcopy + +import torch +import torch.nn as nn +from mmcv.cnn import NORM_LAYERS + + +@NORM_LAYERS.register_module() +class SubBatchNorm3D(nn.Module): + """Sub BatchNorm3d splits the batch dimension into N splits, and runs BN on + each of them separately (so that the stats are computed on each subset of + examples (1/N of batch) independently). During evaluation, it aggregates + the stats from all splits into one BN. 
+ + Args: + num_features (int): Dimensions of BatchNorm. + """ + + def __init__(self, num_features, **cfg): + super(SubBatchNorm3D, self).__init__() + + self.num_features = num_features + self.cfg_ = deepcopy(cfg) + self.num_splits = self.cfg_.pop('num_splits', 1) + self.num_features_split = self.num_features * self.num_splits + # only keep one set of affine params, not in .bn or .split_bn + self.cfg_['affine'] = False + self.bn = nn.BatchNorm3d(num_features, **self.cfg_) + self.split_bn = nn.BatchNorm3d(self.num_features_split, **self.cfg_) + self.init_weights(cfg) + + def init_weights(self, cfg): + if cfg.get('affine', True): + self.weight = torch.nn.Parameter(torch.ones(self.num_features)) + self.bias = torch.nn.Parameter(torch.zeros(self.num_features)) + self.affine = True + else: + self.affine = False + + def _get_aggregated_mean_std(self, means, stds, n): + mean = means.view(n, -1).sum(0) / n + std = stds.view(n, -1).sum(0) / n + ( + (means.view(n, -1) - mean)**2).view(n, -1).sum(0) / n + return mean.detach(), std.detach() + + def aggregate_stats(self): + """Synchronize running_mean, and running_var to self.bn. + + Call this before eval, then call model.eval(); When eval, forward + function will call self.bn instead of self.split_bn, During this time + the running_mean, and running_var of self.bn has been obtained from + self.split_bn. 
+ """ + if self.split_bn.track_running_stats: + aggre_func = self._get_aggregated_mean_std + self.bn.running_mean.data, self.bn.running_var.data = aggre_func( + self.split_bn.running_mean, self.split_bn.running_var, + self.num_splits) + self.bn.num_batches_tracked = self.split_bn.num_batches_tracked.detach( + ) + + def forward(self, x): + if self.training: + n, c, t, h, w = x.shape + assert n % self.num_splits == 0 + x = x.view(n // self.num_splits, c * self.num_splits, t, h, w) + x = self.split_bn(x) + x = x.view(n, c, t, h, w) + else: + x = self.bn(x) + if self.affine: + x = x * self.weight.view(-1, 1, 1, 1) + x = x + self.bias.view(-1, 1, 1, 1) + return x diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/tam.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/tam.py new file mode 100644 index 0000000000000000000000000000000000000000..5574213de07d944524d042fa13baefcbf4b3e194 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/tam.py @@ -0,0 +1,122 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +import torch.nn.functional as F + + +class TAM(nn.Module): + """Temporal Adaptive Module(TAM) for TANet. + + This module is proposed in `TAM: TEMPORAL ADAPTIVE MODULE FOR VIDEO + RECOGNITION `_ + + Args: + in_channels (int): Channel num of input features. + num_segments (int): Number of frame segments. + alpha (int): ```alpha``` in the paper and is the ratio of the + intermediate channel number to the initial channel number in the + global branch. Default: 2. + adaptive_kernel_size (int): ```K``` in the paper and is the size of the + adaptive kernel size in the global branch. Default: 3. + beta (int): ```beta``` in the paper and is set to control the model + complexity in the local branch. Default: 4. + conv1d_kernel_size (int): Size of the convolution kernel of Conv1d in + the local branch. Default: 3. 
+ adaptive_convolution_stride (int): The first dimension of strides in + the adaptive convolution of ```Temporal Adaptive Aggregation```. + Default: 1. + adaptive_convolution_padding (int): The first dimension of paddings in + the adaptive convolution of ```Temporal Adaptive Aggregation```. + Default: 1. + init_std (float): Std value for initiation of `nn.Linear`. Default: + 0.001. + """ + + def __init__(self, + in_channels, + num_segments, + alpha=2, + adaptive_kernel_size=3, + beta=4, + conv1d_kernel_size=3, + adaptive_convolution_stride=1, + adaptive_convolution_padding=1, + init_std=0.001): + super().__init__() + + assert beta > 0 and alpha > 0 + self.in_channels = in_channels + self.num_segments = num_segments + self.alpha = alpha + self.adaptive_kernel_size = adaptive_kernel_size + self.beta = beta + self.conv1d_kernel_size = conv1d_kernel_size + self.adaptive_convolution_stride = adaptive_convolution_stride + self.adaptive_convolution_padding = adaptive_convolution_padding + self.init_std = init_std + + self.G = nn.Sequential( + nn.Linear(num_segments, num_segments * alpha, bias=False), + nn.BatchNorm1d(num_segments * alpha), nn.ReLU(inplace=True), + nn.Linear(num_segments * alpha, adaptive_kernel_size, bias=False), + nn.Softmax(-1)) + + self.L = nn.Sequential( + nn.Conv1d( + in_channels, + in_channels // beta, + conv1d_kernel_size, + stride=1, + padding=conv1d_kernel_size // 2, + bias=False), nn.BatchNorm1d(in_channels // beta), + nn.ReLU(inplace=True), + nn.Conv1d(in_channels // beta, in_channels, 1, bias=False), + nn.Sigmoid()) + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. 
+ """ + # [n, c, h, w] + n, c, h, w = x.size() + num_segments = self.num_segments + num_batches = n // num_segments + assert c == self.in_channels + + # [num_batches, c, num_segments, h, w] + x = x.view(num_batches, num_segments, c, h, w) + x = x.permute(0, 2, 1, 3, 4).contiguous() + + # [num_batches * c, num_segments, 1, 1] + theta_out = F.adaptive_avg_pool2d( + x.view(-1, num_segments, h, w), (1, 1)) + + # [num_batches * c, 1, adaptive_kernel_size, 1] + conv_kernel = self.G(theta_out.view(-1, num_segments)).view( + num_batches * c, 1, -1, 1) + + # [num_batches, c, num_segments, 1, 1] + local_activation = self.L(theta_out.view(-1, c, num_segments)).view( + num_batches, c, num_segments, 1, 1) + + # [num_batches, c, num_segments, h, w] + new_x = x * local_activation + + # [1, num_batches * c, num_segments, h * w] + y = F.conv2d( + new_x.view(1, num_batches * c, num_segments, h * w), + conv_kernel, + bias=None, + stride=(self.adaptive_convolution_stride, 1), + padding=(self.adaptive_convolution_padding, 0), + groups=num_batches * c) + + # [n, c, h, w] + y = y.view(num_batches, c, num_segments, h, w) + y = y.permute(0, 2, 1, 3, 4).contiguous().view(n, c, h, w) + + return y diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/transformer.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..f7b6796859ab826f54643ebba85a97e45a3bfd3d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/common/transformer.py @@ -0,0 +1,216 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
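The final `y.view(num_batches, c, num_segments, h, w)` in `TAM.forward` above only succeeds if the adaptive convolution leaves the temporal length unchanged. As a hedged sketch, the standard output-length formula for an undilated convolution can be used to check a parameter combination before building the module; the helper names `conv_out_len` and `tam_preserves_segments` are hypothetical and not part of the diff.

```python
def conv_out_len(length, kernel_size, stride=1, padding=0):
    """Output length along one axis of an undilated convolution."""
    return (length + 2 * padding - kernel_size) // stride + 1


def tam_preserves_segments(num_segments,
                           adaptive_kernel_size=3,
                           stride=1,
                           padding=1):
    """Check that the temporal axis keeps `num_segments` entries.

    The defaults mirror the defaults of the TAM constructor arguments
    adaptive_kernel_size, adaptive_convolution_stride and
    adaptive_convolution_padding.
    """
    out_len = conv_out_len(num_segments, adaptive_kernel_size, stride, padding)
    return out_len == num_segments


print(tam_preserves_segments(8))                                   # defaults
print(tam_preserves_segments(8, adaptive_kernel_size=5))           # padding too small
print(tam_preserves_segments(8, adaptive_kernel_size=5, padding=2))
print(tam_preserves_segments(8, stride=2))                         # stride shrinks the axis
```

With stride 1, the length is preserved when `padding == adaptive_kernel_size // 2` for an odd kernel size, which is what the defaults (3 and 1) satisfy.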
+import torch +import torch.nn as nn +from einops import rearrange +from mmcv.cnn import build_norm_layer, constant_init +from mmcv.cnn.bricks.registry import ATTENTION, FEEDFORWARD_NETWORK +from mmcv.cnn.bricks.transformer import FFN, build_dropout +from mmcv.runner.base_module import BaseModule +from mmcv.utils import digit_version + + +@ATTENTION.register_module() +class DividedTemporalAttentionWithNorm(BaseModule): + """Temporal Attention in Divided Space Time Attention. + + Args: + embed_dims (int): Dimensions of embedding. + num_heads (int): Number of parallel attention heads in + TransformerCoder. + num_frames (int): Number of frames in the video. + attn_drop (float): A Dropout layer on attn_output_weights. Defaults to + 0.. + proj_drop (float): A Dropout layer after `nn.MultiheadAttention`. + Defaults to 0.. + dropout_layer (dict): The dropout_layer used when adding the shortcut. + Defaults to `dict(type='DropPath', drop_prob=0.1)`. + norm_cfg (dict): Config dict for normalization layer. Defaults to + `dict(type='LN')`. + init_cfg (dict | None): The Config for initialization. Defaults to + None. 
+ """ + + def __init__(self, + embed_dims, + num_heads, + num_frames, + attn_drop=0., + proj_drop=0., + dropout_layer=dict(type='DropPath', drop_prob=0.1), + norm_cfg=dict(type='LN'), + init_cfg=None, + **kwargs): + super().__init__(init_cfg) + self.embed_dims = embed_dims + self.num_heads = num_heads + self.num_frames = num_frames + self.norm = build_norm_layer(norm_cfg, self.embed_dims)[1] + + if digit_version(torch.__version__) < digit_version('1.9.0'): + kwargs.pop('batch_first', None) + self.attn = nn.MultiheadAttention(embed_dims, num_heads, attn_drop, + **kwargs) + self.proj_drop = nn.Dropout(proj_drop) + self.dropout_layer = build_dropout( + dropout_layer) if dropout_layer else nn.Identity() + self.temporal_fc = nn.Linear(self.embed_dims, self.embed_dims) + + self.init_weights() + + def init_weights(self): + constant_init(self.temporal_fc, val=0, bias=0) + + def forward(self, query, key=None, value=None, residual=None, **kwargs): + assert residual is None, ( + 'Always adding the shortcut in the forward function') + + init_cls_token = query[:, 0, :].unsqueeze(1) + identity = query_t = query[:, 1:, :] + + # query_t [batch_size, num_patches * num_frames, embed_dims] + b, pt, m = query_t.size() + p, t = pt // self.num_frames, self.num_frames + + # res_temporal [batch_size * num_patches, num_frames, embed_dims] + query_t = self.norm(query_t.reshape(b * p, t, m)).permute(1, 0, 2) + res_temporal = self.attn(query_t, query_t, query_t)[0].permute(1, 0, 2) + res_temporal = self.dropout_layer( + self.proj_drop(res_temporal.contiguous())) + res_temporal = self.temporal_fc(res_temporal) + + # res_temporal [batch_size, num_patches * num_frames, embed_dims] + res_temporal = res_temporal.reshape(b, p * t, m) + + # ret_value [batch_size, num_patches * num_frames + 1, embed_dims] + new_query_t = identity + res_temporal + new_query = torch.cat((init_cls_token, new_query_t), 1) + return new_query + + +@ATTENTION.register_module() +class 
DividedSpatialAttentionWithNorm(BaseModule): + """Spatial Attention in Divided Space Time Attention. + + Args: + embed_dims (int): Dimensions of embedding. + num_heads (int): Number of parallel attention heads in + TransformerCoder. + num_frames (int): Number of frames in the video. + attn_drop (float): A Dropout layer on attn_output_weights. Defaults to + 0.. + proj_drop (float): A Dropout layer after `nn.MultiheadAttention`. + Defaults to 0.. + dropout_layer (dict): The dropout_layer used when adding the shortcut. + Defaults to `dict(type='DropPath', drop_prob=0.1)`. + norm_cfg (dict): Config dict for normalization layer. Defaults to + `dict(type='LN')`. + init_cfg (dict | None): The Config for initialization. Defaults to + None. + """ + + def __init__(self, + embed_dims, + num_heads, + num_frames, + attn_drop=0., + proj_drop=0., + dropout_layer=dict(type='DropPath', drop_prob=0.1), + norm_cfg=dict(type='LN'), + init_cfg=None, + **kwargs): + super().__init__(init_cfg) + self.embed_dims = embed_dims + self.num_heads = num_heads + self.num_frames = num_frames + self.norm = build_norm_layer(norm_cfg, self.embed_dims)[1] + if digit_version(torch.__version__) < digit_version('1.9.0'): + kwargs.pop('batch_first', None) + self.attn = nn.MultiheadAttention(embed_dims, num_heads, attn_drop, + **kwargs) + self.proj_drop = nn.Dropout(proj_drop) + self.dropout_layer = build_dropout( + dropout_layer) if dropout_layer else nn.Identity() + + self.init_weights() + + def init_weights(self): + # init DividedSpatialAttentionWithNorm by default + pass + + def forward(self, query, key=None, value=None, residual=None, **kwargs): + assert residual is None, ( + 'Always adding the shortcut in the forward function') + + identity = query + init_cls_token = query[:, 0, :].unsqueeze(1) + query_s = query[:, 1:, :] + + # query_s [batch_size, num_patches * num_frames, embed_dims] + b, pt, m = query_s.size() + p, t = pt // self.num_frames, self.num_frames + + # cls_token [batch_size * 
num_frames, 1, embed_dims] + cls_token = init_cls_token.repeat(1, t, 1).reshape(b * t, + m).unsqueeze(1) + + # query_s [batch_size * num_frames, num_patches + 1, embed_dims] + query_s = rearrange(query_s, 'b (p t) m -> (b t) p m', p=p, t=t) + query_s = torch.cat((cls_token, query_s), 1) + + # res_spatial [batch_size * num_frames, num_patches + 1, embed_dims] + query_s = self.norm(query_s).permute(1, 0, 2) + res_spatial = self.attn(query_s, query_s, query_s)[0].permute(1, 0, 2) + res_spatial = self.dropout_layer( + self.proj_drop(res_spatial.contiguous())) + + # cls_token [batch_size, 1, embed_dims] + cls_token = res_spatial[:, 0, :].reshape(b, t, m) + cls_token = torch.mean(cls_token, 1, True) + + # res_spatial [batch_size, num_patches * num_frames + 1, embed_dims] + res_spatial = rearrange( + res_spatial[:, 1:, :], '(b t) p m -> b (p t) m', p=p, t=t) + res_spatial = torch.cat((cls_token, res_spatial), 1) + + new_query = identity + res_spatial + return new_query + + +@FEEDFORWARD_NETWORK.register_module() +class FFNWithNorm(FFN): + """FFN with pre normalization layer. + + FFNWithNorm is implemented to be compatible with `BaseTransformerLayer` + when using `DividedTemporalAttentionWithNorm` and + `DividedSpatialAttentionWithNorm`. + + FFNWithNorm has one main difference with FFN: + + - It applies one normalization layer before forwarding the input data to + feed-forward networks. + + Args: + embed_dims (int): Dimensions of embedding. Defaults to 256. + feedforward_channels (int): Hidden dimension of FFNs. Defaults to 1024. + num_fcs (int, optional): Number of fully-connected layers in FFNs. + Defaults to 2. + act_cfg (dict): Config for activate layers. + Defaults to `dict(type='ReLU')` + ffn_drop (float, optional): Probability of an element to be + zeroed in FFN. Defaults to 0.. + add_residual (bool, optional): Whether to add the + residual connection. Defaults to `True`. + dropout_layer (dict | None): The dropout_layer used when adding the + shortcut. 
Defaults to None. + init_cfg (dict): The Config for initialization. Defaults to None. + norm_cfg (dict): Config dict for normalization layer. Defaults to + `dict(type='LN')`. + """ + + def __init__(self, *args, norm_cfg=dict(type='LN'), **kwargs): + super().__init__(*args, **kwargs) + self.norm = build_norm_layer(norm_cfg, self.embed_dims)[1] + + def forward(self, x, residual=None): + assert residual is None, ('Cannot apply pre-norm with FFNWithNorm') + return super().forward(self.norm(x), x) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..edc3a0d553ac6c9c5b831541223e970ec7a7741b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/__init__.py @@ -0,0 +1,25 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .audio_tsn_head import AudioTSNHead +from .base import BaseHead +from .bbox_head import BBoxHeadAVA +from .fbo_head import FBOHead +from .i3d_head import I3DHead +from .lfb_infer_head import LFBInferHead +from .misc_head import ACRNHead +from .roi_head import AVARoIHead +from .slowfast_head import SlowFastHead +from .ssn_head import SSNHead +from .stgcn_head import STGCNHead +from .timesformer_head import TimeSformerHead +from .tpn_head import TPNHead +from .trn_head import TRNHead +from .tsm_head import TSMHead +from .tsn_head import TSNHead +from .x3d_head import X3DHead + +__all__ = [ + 'TSNHead', 'I3DHead', 'BaseHead', 'TSMHead', 'SlowFastHead', 'SSNHead', + 'TPNHead', 'AudioTSNHead', 'X3DHead', 'BBoxHeadAVA', 'AVARoIHead', + 'FBOHead', 'LFBInferHead', 'TRNHead', 'TimeSformerHead', 'ACRNHead', + 'STGCNHead' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/audio_tsn_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/audio_tsn_head.py new file mode 100644 index 
0000000000000000000000000000000000000000..9f5f35efa8c08d394382915afea86c9390316ec3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/audio_tsn_head.py @@ -0,0 +1,74 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS +from .base import BaseHead + + +@HEADS.register_module() +class AudioTSNHead(BaseHead): + """Classification head for TSN on audio. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss'). + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + dropout_ratio (float): Probability of dropout layer. Default: 0.4. + init_std (float): Std value for Initiation. Default: 0.01. + kwargs (dict, optional): Any keyword argument to be used to initialize + the head. + """ + + def __init__(self, + num_classes, + in_channels, + loss_cls=dict(type='CrossEntropyLoss'), + spatial_type='avg', + dropout_ratio=0.4, + init_std=0.01, + **kwargs): + super().__init__(num_classes, in_channels, loss_cls=loss_cls, **kwargs) + + self.spatial_type = spatial_type + self.dropout_ratio = dropout_ratio + self.init_std = init_std + + if self.spatial_type == 'avg': + # use `nn.AdaptiveAvgPool2d` to adaptively match the in_channels. + self.avg_pool = nn.AdaptiveAvgPool2d((1, 1)) + else: + self.avg_pool = None + + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + self.fc_cls = nn.Linear(self.in_channels, self.num_classes) + + def init_weights(self): + """Initiate the parameters from scratch.""" + normal_init(self.fc_cls, std=self.init_std) + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The classification scores for input samples. 
+ """ + # [N * num_segs, in_channels, h, w] + x = self.avg_pool(x) + # [N, in_channels, 1, 1] + x = x.view(x.size(0), -1) + # [N, in_channels] + if self.dropout is not None: + x = self.dropout(x) + # [N, in_channels] + cls_score = self.fc_cls(x) + # [N, num_classes] + return cls_score diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/base.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/base.py new file mode 100644 index 0000000000000000000000000000000000000000..d89e3af312feb232ab4f6ae4712d2e8d8683d1fa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/base.py @@ -0,0 +1,117 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from abc import ABCMeta, abstractmethod + +import torch +import torch.nn as nn + +from ...core import top_k_accuracy +from ..builder import build_loss + + +class AvgConsensus(nn.Module): + """Average consensus module. + + Args: + dim (int): Decide which dim consensus function to apply. + Default: 1. + """ + + def __init__(self, dim=1): + super().__init__() + self.dim = dim + + def forward(self, x): + """Defines the computation performed at every call.""" + return x.mean(dim=self.dim, keepdim=True) + + +class BaseHead(nn.Module, metaclass=ABCMeta): + """Base class for head. + + All heads should subclass it. + All subclasses should override: + - Methods:``init_weights``, initializing weights in some modules. + - Methods:``forward``, supporting forward passes for training and testing. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss', loss_weight=1.0). + multi_class (bool): Determines whether it is a multi-class + recognition task. Default: False. + label_smooth_eps (float): Epsilon used in label smoothing. + Reference: arxiv.org/abs/1906.02629. Default: 0. + topk (int | tuple): Top-k accuracy. Default: (1, 5). 
+ """ + + def __init__(self, + num_classes, + in_channels, + loss_cls=dict(type='CrossEntropyLoss', loss_weight=1.0), + multi_class=False, + label_smooth_eps=0.0, + topk=(1, 5)): + super().__init__() + self.num_classes = num_classes + self.in_channels = in_channels + self.loss_cls = build_loss(loss_cls) + self.multi_class = multi_class + self.label_smooth_eps = label_smooth_eps + assert isinstance(topk, (int, tuple)) + if isinstance(topk, int): + topk = (topk, ) + for _topk in topk: + assert _topk > 0, 'Top-k should be larger than 0' + self.topk = topk + + @abstractmethod + def init_weights(self): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + + @abstractmethod + def forward(self, x): + """Defines the computation performed at every call.""" + + def loss(self, cls_score, labels, **kwargs): + """Calculate the loss given output ``cls_score``, target ``labels``. + + Args: + cls_score (torch.Tensor): The output of the model. + labels (torch.Tensor): The target output of the model. + + Returns: + dict: A dict containing field 'loss_cls'(mandatory) + and 'topk_acc'(optional). + """ + losses = dict() + if labels.shape == torch.Size([]): + labels = labels.unsqueeze(0) + elif labels.dim() == 1 and labels.size()[0] == self.num_classes \ + and cls_score.size()[0] == 1: + # Fix a bug when training with soft labels and batch size is 1. + # When using soft labels, `labels` and `cls_score` share the same + # shape. 
+ labels = labels.unsqueeze(0) + + if not self.multi_class and cls_score.size() != labels.size(): + top_k_acc = top_k_accuracy(cls_score.detach().cpu().numpy(), + labels.detach().cpu().numpy(), + self.topk) + for k, a in zip(self.topk, top_k_acc): + losses[f'top{k}_acc'] = torch.tensor( + a, device=cls_score.device) + + elif self.multi_class and self.label_smooth_eps != 0: + labels = ((1 - self.label_smooth_eps) * labels + + self.label_smooth_eps / self.num_classes) + + loss_cls = self.loss_cls(cls_score, labels, **kwargs) + # loss_cls may be dictionary or single tensor + if isinstance(loss_cls, dict): + losses.update(loss_cls) + else: + losses['loss_cls'] = loss_cls + + return losses diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/bbox_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/bbox_head.py new file mode 100644 index 0000000000000000000000000000000000000000..19787a5eb1ddafc34ddd9b11ab78dbf8f242b5d7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/bbox_head.py @@ -0,0 +1,306 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +import torch.nn as nn +import torch.nn.functional as F + +from mmaction.core.bbox import bbox_target + +try: + from mmdet.models.builder import HEADS as MMDET_HEADS + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + +# Resolve cross-entropy function to support multi-target in Torch < 1.10 +# This is a very basic 'hack', with minimal functionality to support the +# procedure under prior torch versions +from packaging import version as pv + +if pv.parse(torch.__version__) < pv.parse('1.10'): + + def cross_entropy_loss(input, target, reduction='None'): + input = input.log_softmax(dim=-1) # Compute Log of Softmax + loss = -(input * target).sum(dim=-1) # Compute Loss manually + if reduction.lower() == 'mean': + return loss.mean() + elif reduction.lower() == 'sum': + return loss.sum() + else: + return loss +else: + cross_entropy_loss = F.cross_entropy + + +class BBoxHeadAVA(nn.Module): + """Simplest RoI head, with only two fc layers for classification and + regression respectively. + + Args: + temporal_pool_type (str): The temporal pool type. Choices are 'avg' or + 'max'. Default: 'avg'. + spatial_pool_type (str): The spatial pool type. Choices are 'avg' or + 'max'. Default: 'max'. + in_channels (int): The number of input channels. Default: 2048. + focal_alpha (float): The hyper-parameter alpha for Focal Loss. + When alpha == 1 and gamma == 0, Focal Loss degenerates to + BCELossWithLogits. Default: 1. + focal_gamma (float): The hyper-parameter gamma for Focal Loss. + When alpha == 1 and gamma == 0, Focal Loss degenerates to + BCELossWithLogits. Default: 0. + num_classes (int): The number of classes. Default: 81. + dropout_ratio (float): A float in [0, 1], indicates the dropout_ratio. + Default: 0. + dropout_before_pool (bool): Dropout Feature before spatial temporal + pooling. Default: True. + topk (int or tuple[int]): Parameter for evaluating Top-K accuracy. 
+ Default: (3, 5) + multilabel (bool): Whether used for a multilabel task. Default: True. + """ + + def __init__( + self, + temporal_pool_type='avg', + spatial_pool_type='max', + in_channels=2048, + focal_gamma=0., + focal_alpha=1., + num_classes=81, # First class reserved (BBox as pos/neg) + dropout_ratio=0, + dropout_before_pool=True, + topk=(3, 5), + multilabel=True): + + super(BBoxHeadAVA, self).__init__() + assert temporal_pool_type in ['max', 'avg'] + assert spatial_pool_type in ['max', 'avg'] + self.temporal_pool_type = temporal_pool_type + self.spatial_pool_type = spatial_pool_type + + self.in_channels = in_channels + self.num_classes = num_classes + + self.dropout_ratio = dropout_ratio + self.dropout_before_pool = dropout_before_pool + + self.multilabel = multilabel + + self.focal_gamma = focal_gamma + self.focal_alpha = focal_alpha + + if topk is None: + self.topk = () + elif isinstance(topk, int): + self.topk = (topk, ) + elif isinstance(topk, tuple): + assert all([isinstance(k, int) for k in topk]) + self.topk = topk + else: + raise TypeError('topk should be int or tuple[int], ' + f'but get {type(topk)}') + # Class 0 is ignored when calculating accuracy, + # so topk cannot be equal to num_classes. 
+ assert all([k < num_classes for k in self.topk]) + + in_channels = self.in_channels + # Pool by default + if self.temporal_pool_type == 'avg': + self.temporal_pool = nn.AdaptiveAvgPool3d((1, None, None)) + else: + self.temporal_pool = nn.AdaptiveMaxPool3d((1, None, None)) + if self.spatial_pool_type == 'avg': + self.spatial_pool = nn.AdaptiveAvgPool3d((None, 1, 1)) + else: + self.spatial_pool = nn.AdaptiveMaxPool3d((None, 1, 1)) + + if dropout_ratio > 0: + self.dropout = nn.Dropout(dropout_ratio) + + self.fc_cls = nn.Linear(in_channels, num_classes) + self.debug_imgs = None + + def init_weights(self): + nn.init.normal_(self.fc_cls.weight, 0, 0.01) + nn.init.constant_(self.fc_cls.bias, 0) + + def forward(self, x): + if self.dropout_before_pool and self.dropout_ratio > 0: + x = self.dropout(x) + + x = self.temporal_pool(x) + x = self.spatial_pool(x) + + if not self.dropout_before_pool and self.dropout_ratio > 0: + x = self.dropout(x) + + x = x.view(x.size(0), -1) + cls_score = self.fc_cls(x) + # We do not predict bbox, so return None + return cls_score, None + + @staticmethod + def get_targets(sampling_results, gt_bboxes, gt_labels, rcnn_train_cfg): + pos_proposals = [res.pos_bboxes for res in sampling_results] + neg_proposals = [res.neg_bboxes for res in sampling_results] + pos_gt_labels = [res.pos_gt_labels for res in sampling_results] + cls_reg_targets = bbox_target(pos_proposals, neg_proposals, + pos_gt_labels, rcnn_train_cfg) + return cls_reg_targets + + @staticmethod + def get_recall_prec(pred_vec, target_vec): + """Computes the Recall/Precision for both multi-label and single label + scenarios. + + Note that the computation calculates the micro average. + + Note, that in both cases, the concept of correct/incorrect is the same. + Args: + pred_vec (tensor[N x C]): each element is either 0 or 1 + target_vec (tensor[N x C]): each element is either 0 or 1 - for + single label it is expected that only one element is on (1) + although this is not enforced. 
+ """ + correct = pred_vec & target_vec + recall = correct.sum(1) / target_vec.sum(1).float() # Enforce Float + prec = correct.sum(1) / (pred_vec.sum(1) + 1e-6) + return recall.mean(), prec.mean() + + @staticmethod + def topk_to_matrix(probs, k): + """Converts top-k to binary matrix.""" + topk_labels = probs.topk(k, 1, True, True)[1] + topk_matrix = probs.new_full(probs.size(), 0, dtype=torch.bool) + for i in range(probs.shape[0]): + topk_matrix[i, topk_labels[i]] = 1 + return topk_matrix + + def topk_accuracy(self, pred, target, thr=0.5): + """Computes the Top-K Accuracies for both single and multi-label + scenarios.""" + # Define Target vector: + target_bool = target > 0.5 + + # Branch on Multilabel for computing output classification + if self.multilabel: + pred = pred.sigmoid() + else: + pred = pred.softmax(dim=1) + + # Compute at threshold (K=1 for single) + if self.multilabel: + pred_bool = pred > thr + else: + pred_bool = self.topk_to_matrix(pred, 1) + recall_thr, prec_thr = self.get_recall_prec(pred_bool, target_bool) + + # Compute at various K + recalls_k, precs_k = [], [] + for k in self.topk: + pred_bool = self.topk_to_matrix(pred, k) + recall, prec = self.get_recall_prec(pred_bool, target_bool) + recalls_k.append(recall) + precs_k.append(prec) + + # Return all + return recall_thr, prec_thr, recalls_k, precs_k + + def loss(self, + cls_score, + bbox_pred, + rois, + labels, + label_weights, + bbox_targets=None, + bbox_weights=None, + reduce=True): + + losses = dict() + # Only use the cls_score + if cls_score is not None: + labels = labels[:, 1:] # Get valid labels (ignore first one) + pos_inds = torch.sum(labels, dim=-1) > 0 + cls_score = cls_score[pos_inds, 1:] + labels = labels[pos_inds] + + # Compute First Recall/Precisions + # This has to be done first before normalising the label-space. 
+ recall_thr, prec_thr, recall_k, prec_k = self.topk_accuracy( + cls_score, labels, thr=0.5) + losses['recall@thr=0.5'] = recall_thr + losses['prec@thr=0.5'] = prec_thr + for i, k in enumerate(self.topk): + losses[f'recall@top{k}'] = recall_k[i] + losses[f'prec@top{k}'] = prec_k[i] + + # If Single-label, need to ensure that target labels sum to 1: ie + # that they are valid probabilities. + if not self.multilabel: + labels = labels / labels.sum(dim=1, keepdim=True) + + # Select Loss function based on single/multi-label + # NB. Both losses auto-compute sigmoid/softmax on prediction + if self.multilabel: + loss_func = F.binary_cross_entropy_with_logits + else: + loss_func = cross_entropy_loss + + # Compute loss + loss = loss_func(cls_score, labels, reduction='none') + pt = torch.exp(-loss) + F_loss = self.focal_alpha * (1 - pt)**self.focal_gamma * loss + losses['loss_action_cls'] = torch.mean(F_loss) + + return losses + + def get_det_bboxes(self, + rois, + cls_score, + img_shape, + flip=False, + crop_quadruple=None, + cfg=None): + + # might be used by testing w. 
augmentation + if isinstance(cls_score, list): + cls_score = sum(cls_score) / float(len(cls_score)) + + # Handle Multi/Single Label + if cls_score is not None: + if self.multilabel: + scores = cls_score.sigmoid() + else: + scores = cls_score.softmax(dim=-1) + else: + scores = None + + bboxes = rois[:, 1:] + assert bboxes.shape[-1] == 4 + + # First reverse the flip + img_h, img_w = img_shape + if flip: + bboxes_ = bboxes.clone() + bboxes_[:, 0] = img_w - 1 - bboxes[:, 2] + bboxes_[:, 2] = img_w - 1 - bboxes[:, 0] + bboxes = bboxes_ + + # Then normalize the bbox to [0, 1] + bboxes[:, 0::2] /= img_w + bboxes[:, 1::2] /= img_h + + def _bbox_crop_undo(bboxes, crop_quadruple): + decropped = bboxes.clone() + + if crop_quadruple is not None: + x1, y1, tw, th = crop_quadruple + decropped[:, 0::2] = bboxes[..., 0::2] * tw + x1 + decropped[:, 1::2] = bboxes[..., 1::2] * th + y1 + + return decropped + + bboxes = _bbox_crop_undo(bboxes, crop_quadruple) + return bboxes, scores + + +if mmdet_imported: + MMDET_HEADS.register_module()(BBoxHeadAVA) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/fbo_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/fbo_head.py new file mode 100644 index 0000000000000000000000000000000000000000..42bbbb34d9ca6e7e6a522d2ce49e27af35dd492f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/fbo_head.py @@ -0,0 +1,401 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy + +import torch +import torch.nn as nn +from mmcv.cnn import ConvModule, constant_init, kaiming_init +from mmcv.runner import load_checkpoint +from mmcv.utils import _BatchNorm + +from mmaction.models.common import LFB +from mmaction.utils import get_root_logger + +try: + from mmdet.models.builder import SHARED_HEADS as MMDET_SHARED_HEADS + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + + +class NonLocalLayer(nn.Module): + """Non-local layer used in `FBONonLocal` is a variation of the vanilla non- + local block. + + Args: + st_feat_channels (int): Channels of short-term features. + lt_feat_channels (int): Channels of long-term features. + latent_channels (int): Channels of latent features. + use_scale (bool): Whether to scale pairwise_weight by + `1/sqrt(latent_channels)`. Default: True. + pre_activate (bool): Whether to use the activation function before + upsampling. Default: True. + conv_cfg (Dict | None): The config dict for convolution layers. If + not specified, it will use `nn.Conv3d` for convolution layers. + Default: None. + norm_cfg (Dict | None): The config dict for normalization layers. + Default: None. + dropout_ratio (float, optional): Probability of dropout layer. + Default: 0.2. + zero_init_out_conv (bool): Whether to use zero initialization for + out_conv. Default: False. 
+ """ + + def __init__(self, + st_feat_channels, + lt_feat_channels, + latent_channels, + num_st_feat, + num_lt_feat, + use_scale=True, + pre_activate=True, + pre_activate_with_ln=True, + conv_cfg=None, + norm_cfg=None, + dropout_ratio=0.2, + zero_init_out_conv=False): + super().__init__() + if conv_cfg is None: + conv_cfg = dict(type='Conv3d') + self.st_feat_channels = st_feat_channels + self.lt_feat_channels = lt_feat_channels + self.latent_channels = latent_channels + self.num_st_feat = num_st_feat + self.num_lt_feat = num_lt_feat + self.use_scale = use_scale + self.pre_activate = pre_activate + self.pre_activate_with_ln = pre_activate_with_ln + self.dropout_ratio = dropout_ratio + self.zero_init_out_conv = zero_init_out_conv + + self.st_feat_conv = ConvModule( + self.st_feat_channels, + self.latent_channels, + kernel_size=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + self.lt_feat_conv = ConvModule( + self.lt_feat_channels, + self.latent_channels, + kernel_size=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + self.global_conv = ConvModule( + self.lt_feat_channels, + self.latent_channels, + kernel_size=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + if pre_activate: + self.ln = nn.LayerNorm([latent_channels, num_st_feat, 1, 1]) + else: + self.ln = nn.LayerNorm([st_feat_channels, num_st_feat, 1, 1]) + + self.relu = nn.ReLU() + + self.out_conv = ConvModule( + self.latent_channels, + self.st_feat_channels, + kernel_size=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None) + + if self.dropout_ratio > 0: + self.dropout = nn.Dropout(self.dropout_ratio) + + def init_weights(self, pretrained=None): + """Initiate the parameters either from existing checkpoint or from + scratch.""" + if isinstance(pretrained, str): + logger = get_root_logger() + logger.info(f'load model from: {pretrained}') + load_checkpoint(self, pretrained, strict=False, logger=logger) + elif pretrained is None: + for m in self.modules(): + if 
isinstance(m, nn.Conv3d): + kaiming_init(m) + elif isinstance(m, _BatchNorm): + constant_init(m, 1) + if self.zero_init_out_conv: + constant_init(self.out_conv, 0, bias=0) + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, st_feat, lt_feat): + n, c = st_feat.size(0), self.latent_channels + num_st_feat, num_lt_feat = self.num_st_feat, self.num_lt_feat + + theta = self.st_feat_conv(st_feat) + theta = theta.view(n, c, num_st_feat) + + phi = self.lt_feat_conv(lt_feat) + phi = phi.view(n, c, num_lt_feat) + + g = self.global_conv(lt_feat) + g = g.view(n, c, num_lt_feat) + + # (n, num_st_feat, c), (n, c, num_lt_feat) + # -> (n, num_st_feat, num_lt_feat) + theta_phi = torch.matmul(theta.permute(0, 2, 1), phi) + if self.use_scale: + theta_phi /= c**0.5 + + p = theta_phi.softmax(dim=-1) + + # (n, c, num_lt_feat), (n, num_lt_feat, num_st_feat) + # -> (n, c, num_st_feat, 1, 1) + out = torch.matmul(g, p.permute(0, 2, 1)).view(n, c, num_st_feat, 1, 1) + + # If need to activate it before out_conv, use relu here, otherwise + # use relu outside the non local layer. + if self.pre_activate: + if self.pre_activate_with_ln: + out = self.ln(out) + out = self.relu(out) + + out = self.out_conv(out) + + if not self.pre_activate: + out = self.ln(out) + if self.dropout_ratio > 0: + out = self.dropout(out) + + return out + + +class FBONonLocal(nn.Module): + """Non local feature bank operator. + + Args: + st_feat_channels (int): Channels of short-term features. + lt_feat_channels (int): Channels of long-term features. + latent_channels (int): Channels of latent features. + num_st_feat (int): Number of short-term roi features. + num_lt_feat (int): Number of long-term roi features. + num_non_local_layers (int): Number of non-local layers, which is + at least 1. Default: 2. + st_feat_dropout_ratio (float): Probability of dropout layer for + short-term features. Default: 0.2. + lt_feat_dropout_ratio (float): Probability of dropout layer for + long-term features. 
Default: 0.2. + pre_activate (bool): Whether to use the activation function before + upsampling in non local layers. Default: True. + zero_init_out_conv (bool): Whether to use zero initialization for + out_conv in NonLocalLayer. Default: False. + """ + + def __init__(self, + st_feat_channels, + lt_feat_channels, + latent_channels, + num_st_feat, + num_lt_feat, + num_non_local_layers=2, + st_feat_dropout_ratio=0.2, + lt_feat_dropout_ratio=0.2, + pre_activate=True, + zero_init_out_conv=False): + super().__init__() + assert num_non_local_layers >= 1, ( + 'At least one non_local_layer is needed.') + self.st_feat_channels = st_feat_channels + self.lt_feat_channels = lt_feat_channels + self.latent_channels = latent_channels + self.num_st_feat = num_st_feat + self.num_lt_feat = num_lt_feat + self.num_non_local_layers = num_non_local_layers + self.st_feat_dropout_ratio = st_feat_dropout_ratio + self.lt_feat_dropout_ratio = lt_feat_dropout_ratio + self.pre_activate = pre_activate + self.zero_init_out_conv = zero_init_out_conv + + self.st_feat_conv = nn.Conv3d( + st_feat_channels, latent_channels, kernel_size=1) + self.lt_feat_conv = nn.Conv3d( + lt_feat_channels, latent_channels, kernel_size=1) + + if self.st_feat_dropout_ratio > 0: + self.st_feat_dropout = nn.Dropout(self.st_feat_dropout_ratio) + + if self.lt_feat_dropout_ratio > 0: + self.lt_feat_dropout = nn.Dropout(self.lt_feat_dropout_ratio) + + if not self.pre_activate: + self.relu = nn.ReLU() + + self.non_local_layers = [] + for idx in range(self.num_non_local_layers): + layer_name = f'non_local_layer_{idx + 1}' + self.add_module( + layer_name, + NonLocalLayer( + latent_channels, + latent_channels, + latent_channels, + num_st_feat, + num_lt_feat, + pre_activate=self.pre_activate, + zero_init_out_conv=self.zero_init_out_conv)) + self.non_local_layers.append(layer_name) + + def init_weights(self, pretrained=None): + if isinstance(pretrained, str): + logger = get_root_logger() + load_checkpoint(self, pretrained, 
strict=False, logger=logger) + elif pretrained is None: + kaiming_init(self.st_feat_conv) + kaiming_init(self.lt_feat_conv) + for layer_name in self.non_local_layers: + non_local_layer = getattr(self, layer_name) + non_local_layer.init_weights(pretrained=pretrained) + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, st_feat, lt_feat): + # prepare st_feat + st_feat = self.st_feat_conv(st_feat) + if self.st_feat_dropout_ratio > 0: + st_feat = self.st_feat_dropout(st_feat) + + # prepare lt_feat + lt_feat = self.lt_feat_conv(lt_feat) + if self.lt_feat_dropout_ratio > 0: + lt_feat = self.lt_feat_dropout(lt_feat) + + # fuse short-term and long-term features in NonLocal Layer + for layer_name in self.non_local_layers: + identity = st_feat + non_local_layer = getattr(self, layer_name) + nl_out = non_local_layer(st_feat, lt_feat) + nl_out = identity + nl_out + if not self.pre_activate: + nl_out = self.relu(nl_out) + st_feat = nl_out + + return nl_out + + +class FBOAvg(nn.Module): + """Avg pool feature bank operator.""" + + def __init__(self): + super().__init__() + self.avg_pool = nn.AdaptiveAvgPool3d((1, None, None)) + + def init_weights(self, pretrained=None): + # FBOAvg has no parameters to be initialized. + pass + + def forward(self, st_feat, lt_feat): + out = self.avg_pool(lt_feat) + return out + + +class FBOMax(nn.Module): + """Max pool feature bank operator.""" + + def __init__(self): + super().__init__() + self.max_pool = nn.AdaptiveMaxPool3d((1, None, None)) + + def init_weights(self, pretrained=None): + # FBOMax has no parameters to be initialized. + pass + + def forward(self, st_feat, lt_feat): + out = self.max_pool(lt_feat) + return out + + +class FBOHead(nn.Module): + """Feature Bank Operator Head. + + Add feature bank operator for the spatiotemporal detection model to fuse + short-term features and long-term features. + + Args: + lfb_cfg (Dict): The config dict for LFB which is used to sample + long-term features. 
+ fbo_cfg (Dict): The config dict for feature bank operator (FBO). The + type of fbo is also in the config dict and supported fbo type is + `fbo_dict`. + temporal_pool_type (str): The temporal pool type. Choices are 'avg' or + 'max'. Default: 'avg'. + spatial_pool_type (str): The spatial pool type. Choices are 'avg' or + 'max'. Default: 'max'. + """ + + fbo_dict = {'non_local': FBONonLocal, 'avg': FBOAvg, 'max': FBOMax} + + def __init__(self, + lfb_cfg, + fbo_cfg, + temporal_pool_type='avg', + spatial_pool_type='max', + pretrained=None): + super().__init__() + fbo_type = fbo_cfg.pop('type', 'non_local') + assert fbo_type in FBOHead.fbo_dict + assert temporal_pool_type in ['max', 'avg'] + assert spatial_pool_type in ['max', 'avg'] + + self.lfb_cfg = copy.deepcopy(lfb_cfg) + self.fbo_cfg = copy.deepcopy(fbo_cfg) + self.pretrained = pretrained + + self.lfb = LFB(**self.lfb_cfg) + self.fbo = self.fbo_dict[fbo_type](**self.fbo_cfg) + + # Pool by default + if temporal_pool_type == 'avg': + self.temporal_pool = nn.AdaptiveAvgPool3d((1, None, None)) + else: + self.temporal_pool = nn.AdaptiveMaxPool3d((1, None, None)) + if spatial_pool_type == 'avg': + self.spatial_pool = nn.AdaptiveAvgPool3d((None, 1, 1)) + else: + self.spatial_pool = nn.AdaptiveMaxPool3d((None, 1, 1)) + + def init_weights(self, pretrained=None): + """Initialize the weights in the module. + + Args: + pretrained (str, optional): Path to pre-trained weights. + Default: None. 
+ """ + self.fbo.init_weights(pretrained=pretrained) + + def sample_lfb(self, rois, img_metas): + """Sample long-term features for each ROI feature.""" + inds = rois[:, 0].type(torch.int64) + lt_feat_list = [] + for ind in inds: + lt_feat_list.append(self.lfb[img_metas[ind]['img_key']].to()) + lt_feat = torch.stack(lt_feat_list, dim=0) + # [N, lfb_channels, window_size * max_num_feat_per_step] + lt_feat = lt_feat.permute(0, 2, 1).contiguous() + return lt_feat.unsqueeze(-1).unsqueeze(-1) + + def forward(self, x, rois, img_metas, **kwargs): + # [N, C, 1, 1, 1] + st_feat = self.temporal_pool(x) + st_feat = self.spatial_pool(st_feat) + identity = st_feat + + # [N, C, window_size * num_feat_per_step, 1, 1] + lt_feat = self.sample_lfb(rois, img_metas).to(st_feat.device) + + fbo_feat = self.fbo(st_feat, lt_feat) + + out = torch.cat([identity, fbo_feat], dim=1) + return out + + +if mmdet_imported: + MMDET_SHARED_HEADS.register_module()(FBOHead) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/i3d_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/i3d_head.py new file mode 100644 index 0000000000000000000000000000000000000000..a5fe18e52633cdeed01af5ce0a4e041f1cf83a56 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/i3d_head.py @@ -0,0 +1,74 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS +from .base import BaseHead + + +@HEADS.register_module() +class I3DHead(BaseHead): + """Classification head for I3D. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss') + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + dropout_ratio (float): Probability of dropout layer. Default: 0.5. + init_std (float): Std value for Initiation. Default: 0.01. 
+ kwargs (dict, optional): Any keyword argument to be used to initialize + the head. + """ + + def __init__(self, + num_classes, + in_channels, + loss_cls=dict(type='CrossEntropyLoss'), + spatial_type='avg', + dropout_ratio=0.5, + init_std=0.01, + **kwargs): + super().__init__(num_classes, in_channels, loss_cls, **kwargs) + + self.spatial_type = spatial_type + self.dropout_ratio = dropout_ratio + self.init_std = init_std + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + self.fc_cls = nn.Linear(self.in_channels, self.num_classes) + + if self.spatial_type == 'avg': + # use `nn.AdaptiveAvgPool3d` to adaptively match the in_channels. + self.avg_pool = nn.AdaptiveAvgPool3d((1, 1, 1)) + else: + self.avg_pool = None + + def init_weights(self): + """Initiate the parameters from scratch.""" + normal_init(self.fc_cls, std=self.init_std) + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The classification scores for input samples. + """ + # [N, in_channels, 4, 7, 7] + if self.avg_pool is not None: + x = self.avg_pool(x) + # [N, in_channels, 1, 1, 1] + if self.dropout is not None: + x = self.dropout(x) + # [N, in_channels, 1, 1, 1] + x = x.view(x.shape[0], -1) + # [N, in_channels] + cls_score = self.fc_cls(x) + # [N, num_classes] + return cls_score diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/lfb_infer_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/lfb_infer_head.py new file mode 100644 index 0000000000000000000000000000000000000000..2ad7cc5828d094ab54e86285b96ab3552e91a428 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/lfb_infer_head.py @@ -0,0 +1,148 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +import mmcv +import torch +import torch.distributed as dist +import torch.nn as nn +from mmcv.runner import get_dist_info + +try: + from mmdet.models.builder import SHARED_HEADS as MMDET_SHARED_HEADS + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + + +class LFBInferHead(nn.Module): + """Long-Term Feature Bank Infer Head. + + This head is used to derive and save the LFB without affecting the input. + + Args: + lfb_prefix_path (str): The prefix path to store the lfb. + dataset_mode (str, optional): Which dataset to be inferred. Choices are + 'train', 'val' or 'test'. Default: 'train'. + use_half_precision (bool, optional): Whether to store the + half-precision roi features. Default: True. + temporal_pool_type (str): The temporal pool type. Choices are 'avg' or + 'max'. Default: 'avg'. + spatial_pool_type (str): The spatial pool type. Choices are 'avg' or + 'max'. Default: 'max'. + """ + + def __init__(self, + lfb_prefix_path, + dataset_mode='train', + use_half_precision=True, + temporal_pool_type='avg', + spatial_pool_type='max', + pretrained=None): + super().__init__() + rank, _ = get_dist_info() + if rank == 0: + if not osp.exists(lfb_prefix_path): + print(f'lfb prefix path {lfb_prefix_path} does not exist. 
' + f'Creating the folder...') + mmcv.mkdir_or_exist(lfb_prefix_path) + print('\nInferring LFB...') + + assert temporal_pool_type in ['max', 'avg'] + assert spatial_pool_type in ['max', 'avg'] + self.lfb_prefix_path = lfb_prefix_path + self.dataset_mode = dataset_mode + self.use_half_precision = use_half_precision + self.pretrained = pretrained + + # Pool by default + if temporal_pool_type == 'avg': + self.temporal_pool = nn.AdaptiveAvgPool3d((1, None, None)) + else: + self.temporal_pool = nn.AdaptiveMaxPool3d((1, None, None)) + if spatial_pool_type == 'avg': + self.spatial_pool = nn.AdaptiveAvgPool3d((None, 1, 1)) + else: + self.spatial_pool = nn.AdaptiveMaxPool3d((None, 1, 1)) + + self.all_features = [] + self.all_metadata = [] + + def init_weights(self, pretrained=None): + # LFBInferHead has no parameters to be initialized. + pass + + def forward(self, x, rois, img_metas, **kwargs): + # [N, C, 1, 1, 1] + features = self.temporal_pool(x) + features = self.spatial_pool(features) + if self.use_half_precision: + features = features.half() + + inds = rois[:, 0].type(torch.int64) + for ind in inds: + self.all_metadata.append(img_metas[ind]['img_key']) + self.all_features += list(features) + + # Return the input unchanged; this head only records the features.
+ return x + + def __del__(self): + assert len(self.all_features) == len(self.all_metadata), ( + 'features and metadata are not equal in length!') + + rank, world_size = get_dist_info() + if world_size > 1: + dist.barrier() + + _lfb = {} + for feature, metadata in zip(self.all_features, self.all_metadata): + video_id, timestamp = metadata.split(',') + timestamp = int(timestamp) + + if video_id not in _lfb: + _lfb[video_id] = {} + if timestamp not in _lfb[video_id]: + _lfb[video_id][timestamp] = [] + + _lfb[video_id][timestamp].append(torch.squeeze(feature)) + + _lfb_file_path = osp.normpath( + osp.join(self.lfb_prefix_path, + f'_lfb_{self.dataset_mode}_{rank}.pkl')) + torch.save(_lfb, _lfb_file_path) + print(f'{len(self.all_features)} features from {len(_lfb)} videos ' + f'on GPU {rank} have been stored in {_lfb_file_path}.') + + # Synchronizes all processes to make sure all gpus have stored their + # roi features + if world_size > 1: + dist.barrier() + if rank > 0: + return + + print('Gathering all the roi features...') + + lfb = {} + for rank_id in range(world_size): + _lfb_file_path = osp.normpath( + osp.join(self.lfb_prefix_path, + f'_lfb_{self.dataset_mode}_{rank_id}.pkl')) + + # Since each frame will only be distributed to one GPU, + # the roi features on the same timestamp of the same video are all + # on the same GPU + _lfb = torch.load(_lfb_file_path) + for video_id in _lfb: + if video_id not in lfb: + lfb[video_id] = _lfb[video_id] + else: + lfb[video_id].update(_lfb[video_id]) + + lfb_file_path = osp.normpath( + osp.join(self.lfb_prefix_path, f'lfb_{self.dataset_mode}.pkl')) + torch.save(lfb, lfb_file_path) + print(f'LFB has been constructed in {lfb_file_path}!') + + +if mmdet_imported: + MMDET_SHARED_HEADS.register_module()(LFBInferHead) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/misc_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/misc_head.py new file mode 100644 index 
0000000000000000000000000000000000000000..a2888a26d81e7e80c3b1227216344519cd05d5a9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/misc_head.py @@ -0,0 +1,134 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +from mmcv.cnn import ConvModule, constant_init, kaiming_init +from mmcv.utils import _BatchNorm + +try: + from mmdet.models.builder import SHARED_HEADS as MMDET_SHARED_HEADS + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + +# Note: All these heads take 5D Tensors as input (N, C, T, H, W) + + +class ACRNHead(nn.Module): + """ACRN Head: Tile + 1x1 convolution + 3x3 convolution. + + This module is proposed in + `Actor-Centric Relation Network + `_ + + Args: + in_channels (int): The input channel. + out_channels (int): The output channel. + stride (int): The spatial stride. + num_convs (int): The number of 3x3 convolutions in ACRNHead. + conv_cfg (dict): Config for conv layers. Default: dict(type='Conv3d'). + norm_cfg (dict): + Config for norm layers. Required keys are `type` and + `requires_grad`. Default: dict(type='BN3d', requires_grad=True). + act_cfg (dict): Config for activation layers. + Default: dict(type='ReLU', inplace=True). + kwargs (dict): Other new arguments, to be compatible with MMDet update.
+ """ + + def __init__(self, + in_channels, + out_channels, + stride=1, + num_convs=1, + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + **kwargs): + + super().__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.stride = stride + self.num_convs = num_convs + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.max_pool = nn.AdaptiveMaxPool3d(1) + + self.conv1 = ConvModule( + in_channels, + out_channels, + kernel_size=1, + stride=1, + padding=0, + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg) + + assert num_convs >= 1 + self.conv2 = ConvModule( + out_channels, + out_channels, + kernel_size=(1, 3, 3), + stride=(1, stride, stride), + padding=(0, 1, 1), + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg) + + convs = [] + for _ in range(num_convs - 1): + conv = ConvModule( + out_channels, + out_channels, + kernel_size=(1, 3, 3), + padding=(0, 1, 1), + bias=False, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg) + convs.append(conv) + self.convs = nn.ModuleList(convs) + + def init_weights(self, **kwargs): + """Weight Initialization for ACRNHead.""" + for m in self.modules(): + if isinstance(m, nn.Conv3d): + kaiming_init(m) + elif isinstance(m, _BatchNorm): + constant_init(m, 1) + + def forward(self, x, feat, rois, **kwargs): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The extracted RoI feature. + feat (torch.Tensor): The context feature. + rois (torch.Tensor): The regions of interest. + + Returns: + torch.Tensor: The RoI features that have interacted with context + feature. 
+ """ + # We use max pooling by default + x = self.max_pool(x) + + h, w = feat.shape[-2:] + x_tile = x.repeat(1, 1, 1, h, w) + + roi_inds = rois[:, 0].type(torch.long) + roi_gfeat = feat[roi_inds] + + new_feat = torch.cat([x_tile, roi_gfeat], dim=1) + new_feat = self.conv1(new_feat) + new_feat = self.conv2(new_feat) + + for conv in self.convs: + new_feat = conv(new_feat) + + return new_feat + + +if mmdet_imported: + MMDET_SHARED_HEADS.register_module()(ACRNHead) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/roi_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/roi_head.py new file mode 100644 index 0000000000000000000000000000000000000000..2a06a2586716caef4f0ce93bc851f12b116e8647 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/roi_head.py @@ -0,0 +1,128 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np + +from mmaction.core.bbox import bbox2result + +try: + from mmdet.core.bbox import bbox2roi + from mmdet.models import HEADS as MMDET_HEADS + from mmdet.models.roi_heads import StandardRoIHead + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + +if mmdet_imported: + + @MMDET_HEADS.register_module() + class AVARoIHead(StandardRoIHead): + + def _bbox_forward(self, x, rois, img_metas): + """Defines the computation performed to get bbox predictions. + + Args: + x (torch.Tensor): The input tensor. + rois (torch.Tensor): The regions of interest. + img_metas (list): The meta info of images + + Returns: + dict: bbox predictions with features and classification scores. 
+ """ + bbox_feat, global_feat = self.bbox_roi_extractor(x, rois) + + if self.with_shared_head: + bbox_feat = self.shared_head( + bbox_feat, + feat=global_feat, + rois=rois, + img_metas=img_metas) + + cls_score, bbox_pred = self.bbox_head(bbox_feat) + + bbox_results = dict( + cls_score=cls_score, bbox_pred=bbox_pred, bbox_feats=bbox_feat) + return bbox_results + + def _bbox_forward_train(self, x, sampling_results, gt_bboxes, + gt_labels, img_metas): + """Run forward function and calculate loss for box head in + training.""" + rois = bbox2roi([res.bboxes for res in sampling_results]) + bbox_results = self._bbox_forward(x, rois, img_metas) + + bbox_targets = self.bbox_head.get_targets(sampling_results, + gt_bboxes, gt_labels, + self.train_cfg) + loss_bbox = self.bbox_head.loss(bbox_results['cls_score'], + bbox_results['bbox_pred'], rois, + *bbox_targets) + + bbox_results.update(loss_bbox=loss_bbox) + return bbox_results + + def simple_test(self, + x, + proposal_list, + img_metas, + proposals=None, + rescale=False): + """Defines the computation performed for simple testing.""" + assert self.with_bbox, 'Bbox head must be implemented.' 
+ + if isinstance(x, tuple): + x_shape = x[0].shape + else: + x_shape = x.shape + + assert x_shape[0] == 1, 'only accept 1 sample at test mode' + assert x_shape[0] == len(img_metas) == len(proposal_list) + + det_bboxes, det_labels = self.simple_test_bboxes( + x, img_metas, proposal_list, self.test_cfg, rescale=rescale) + bbox_results = bbox2result( + det_bboxes, + det_labels, + self.bbox_head.num_classes, + thr=self.test_cfg.action_thr) + return [bbox_results] + + def simple_test_bboxes(self, + x, + img_metas, + proposals, + rcnn_test_cfg, + rescale=False): + """Test only det bboxes without augmentation.""" + rois = bbox2roi(proposals) + bbox_results = self._bbox_forward(x, rois, img_metas) + cls_score = bbox_results['cls_score'] + + img_shape = img_metas[0]['img_shape'] + crop_quadruple = np.array([0, 0, 1, 1]) + flip = False + + if 'crop_quadruple' in img_metas[0]: + crop_quadruple = img_metas[0]['crop_quadruple'] + + if 'flip' in img_metas[0]: + flip = img_metas[0]['flip'] + + det_bboxes, det_labels = self.bbox_head.get_det_bboxes( + rois, + cls_score, + img_shape, + flip=flip, + crop_quadruple=crop_quadruple, + cfg=rcnn_test_cfg) + + return det_bboxes, det_labels +else: + # Just define an empty class, so that __init__ can import it. + class AVARoIHead: + + def __init__(self, *args, **kwargs): + raise ImportError( + 'Failed to import `bbox2roi` from `mmdet.core.bbox`, ' + 'or failed to import `HEADS` from `mmdet.models`, ' + 'or failed to import `StandardRoIHead` from ' + '`mmdet.models.roi_heads`. You will be unable to use ' + '`AVARoIHead`. ') diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/slowfast_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/slowfast_head.py new file mode 100644 index 0000000000000000000000000000000000000000..62ff22c0c1749049d23498099e1b88b39430384c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/slowfast_head.py @@ -0,0 +1,80 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS +from .base import BaseHead + + +@HEADS.register_module() +class SlowFastHead(BaseHead): + """The classification head for SlowFast. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss'). + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + dropout_ratio (float): Probability of dropout layer. Default: 0.8. + init_std (float): Std value for Initiation. Default: 0.01. + kwargs (dict, optional): Any keyword argument to be used to initialize + the head. + """ + + def __init__(self, + num_classes, + in_channels, + loss_cls=dict(type='CrossEntropyLoss'), + spatial_type='avg', + dropout_ratio=0.8, + init_std=0.01, + **kwargs): + + super().__init__(num_classes, in_channels, loss_cls, **kwargs) + self.spatial_type = spatial_type + self.dropout_ratio = dropout_ratio + self.init_std = init_std + + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + self.fc_cls = nn.Linear(in_channels, num_classes) + + if self.spatial_type == 'avg': + self.avg_pool = nn.AdaptiveAvgPool3d((1, 1, 1)) + else: + self.avg_pool = None + + def init_weights(self): + """Initiate the parameters from scratch.""" + normal_init(self.fc_cls, std=self.init_std) + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The classification scores for input samples. 
+ """ + # ([N, channel_fast, T, H, W], [(N, channel_slow, T, H, W)]) + x_fast, x_slow = x + # ([N, channel_fast, 1, 1, 1], [N, channel_slow, 1, 1, 1]) + x_fast = self.avg_pool(x_fast) + x_slow = self.avg_pool(x_slow) + # [N, channel_fast + channel_slow, 1, 1, 1] + x = torch.cat((x_slow, x_fast), dim=1) + + if self.dropout is not None: + x = self.dropout(x) + + # [N x C] + x = x.view(x.size(0), -1) + # [N x num_classes] + cls_score = self.fc_cls(x) + + return cls_score diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/ssn_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/ssn_head.py new file mode 100644 index 0000000000000000000000000000000000000000..239e349d6945b4ccdc611818e3e7bc8c2f4750a0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/ssn_head.py @@ -0,0 +1,413 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS + + +def parse_stage_config(stage_cfg): + """Parse config of STPP for three stages. + + Args: + stage_cfg (int | tuple[int]): + Config of structured temporal pyramid pooling. + + Returns: + tuple[tuple[int], int]: + Config of structured temporal pyramid pooling and + total number of parts(number of multipliers). + """ + if isinstance(stage_cfg, int): + return (stage_cfg, ), stage_cfg + if isinstance(stage_cfg, tuple): + return stage_cfg, sum(stage_cfg) + raise ValueError(f'Incorrect STPP config {stage_cfg}') + + +class STPPTrain(nn.Module): + """Structured temporal pyramid pooling for SSN at training. + + Args: + stpp_stage (tuple): Config of structured temporal pyramid pooling. + Default: (1, (1, 2), 1). + num_segments_list (tuple): Number of segments to be sampled + in three stages. Default: (2, 5, 2). 
+ """ + + def __init__(self, stpp_stage=(1, (1, 2), 1), num_segments_list=(2, 5, 2)): + super().__init__() + + starting_part, starting_multiplier = parse_stage_config(stpp_stage[0]) + course_part, course_multiplier = parse_stage_config(stpp_stage[1]) + ending_part, ending_multiplier = parse_stage_config(stpp_stage[2]) + + self.num_multipliers = ( + starting_multiplier + course_multiplier + ending_multiplier) + self.stpp_stages = (starting_part, course_part, ending_part) + self.multiplier_list = (starting_multiplier, course_multiplier, + ending_multiplier) + + self.num_segments_list = num_segments_list + + @staticmethod + def _extract_stage_feature(stage_feat, stage_parts, num_multipliers, + scale_factors, num_samples): + """Extract stage feature based on structured temporal pyramid pooling. + + Args: + stage_feat (torch.Tensor): Stage features to be STPP. + stage_parts (tuple): Config of STPP. + num_multipliers (int): Total number of parts in the stage. + scale_factors (list): Ratios of the effective sampling lengths + to augmented lengths. + num_samples (int): Number of samples. + + Returns: + torch.Tensor: Features of the stage. + """ + stage_stpp_feat = [] + stage_len = stage_feat.size(1) + for stage_part in stage_parts: + ticks = torch.arange(0, stage_len + 1e-5, + stage_len / stage_part).int() + for i in range(stage_part): + part_feat = stage_feat[:, ticks[i]:ticks[i + 1], :].mean( + dim=1) / num_multipliers + if scale_factors is not None: + part_feat = ( + part_feat * scale_factors.view(num_samples, 1)) + stage_stpp_feat.append(part_feat) + return stage_stpp_feat + + def forward(self, x, scale_factors): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + scale_factors (list): Ratios of the effective sampling lengths + to augmented lengths. + + Returns: + tuple[torch.Tensor, torch.Tensor]: + Features for predicting activity scores and + completeness scores. 
+ """ + x0 = self.num_segments_list[0] + x1 = x0 + self.num_segments_list[1] + num_segments = x1 + self.num_segments_list[2] + + feat_dim = x.size(1) + x = x.view(-1, num_segments, feat_dim) + num_samples = x.size(0) + + scale_factors = scale_factors.view(-1, 2) + + stage_stpp_feats = [] + stage_stpp_feats.extend( + self._extract_stage_feature(x[:, :x0, :], self.stpp_stages[0], + self.multiplier_list[0], + scale_factors[:, 0], num_samples)) + stage_stpp_feats.extend( + self._extract_stage_feature(x[:, x0:x1, :], self.stpp_stages[1], + self.multiplier_list[1], None, + num_samples)) + stage_stpp_feats.extend( + self._extract_stage_feature(x[:, x1:, :], self.stpp_stages[2], + self.multiplier_list[2], + scale_factors[:, 1], num_samples)) + stpp_feat = torch.cat(stage_stpp_feats, dim=1) + + course_feat = x[:, x0:x1, :].mean(dim=1) + return course_feat, stpp_feat + + +class STPPTest(nn.Module): + """Structured temporal pyramid pooling for SSN at testing. + + Args: + num_classes (int): Number of classes to be classified. + use_regression (bool): Whether to perform regression or not. + Default: True. + stpp_stage (tuple): Config of structured temporal pyramid pooling. + Default: (1, (1, 2), 1). 
+ """ + + def __init__(self, + num_classes, + use_regression=True, + stpp_stage=(1, (1, 2), 1)): + super().__init__() + + self.activity_score_len = num_classes + 1 + self.complete_score_len = num_classes + self.reg_score_len = num_classes * 2 + self.use_regression = use_regression + + starting_parts, starting_multiplier = parse_stage_config(stpp_stage[0]) + course_parts, course_multiplier = parse_stage_config(stpp_stage[1]) + ending_parts, ending_multiplier = parse_stage_config(stpp_stage[2]) + + self.num_multipliers = ( + starting_multiplier + course_multiplier + ending_multiplier) + if self.use_regression: + self.feat_dim = ( + self.activity_score_len + self.num_multipliers * + (self.complete_score_len + self.reg_score_len)) + else: + self.feat_dim = ( + self.activity_score_len + + self.num_multipliers * self.complete_score_len) + self.stpp_stage = (starting_parts, course_parts, ending_parts) + + self.activity_slice = slice(0, self.activity_score_len) + self.complete_slice = slice( + self.activity_slice.stop, self.activity_slice.stop + + self.complete_score_len * self.num_multipliers) + self.reg_slice = slice( + self.complete_slice.stop, self.complete_slice.stop + + self.reg_score_len * self.num_multipliers) + + @staticmethod + def _pyramids_pooling(out_scores, index, raw_scores, ticks, scale_factors, + score_len, stpp_stage): + """Perform pyramids pooling. + + Args: + out_scores (torch.Tensor): Scores to be returned. + index (int): Index of output scores. + raw_scores (torch.Tensor): Raw scores before STPP. + ticks (list): Ticks of raw scores. + scale_factors (list): Ratios of the effective sampling lengths + to augmented lengths. + score_len (int): Length of the score. + stpp_stage (tuple): Config of STPP. 
+ """ + offset = 0 + for stage_idx, stage_cfg in enumerate(stpp_stage): + if stage_idx == 0: + scale_factor = scale_factors[0] + elif stage_idx == len(stpp_stage) - 1: + scale_factor = scale_factors[1] + else: + scale_factor = 1.0 + + sum_parts = sum(stage_cfg) + tick_left = ticks[stage_idx] + tick_right = float(max(ticks[stage_idx] + 1, ticks[stage_idx + 1])) + + if tick_right <= 0 or tick_left >= raw_scores.size(0): + offset += sum_parts + continue + for num_parts in stage_cfg: + part_ticks = torch.arange(tick_left, tick_right + 1e-5, + (tick_right - tick_left) / + num_parts).int() + + for i in range(num_parts): + part_tick_left = part_ticks[i] + part_tick_right = part_ticks[i + 1] + if part_tick_right - part_tick_left >= 1: + raw_score = raw_scores[part_tick_left:part_tick_right, + offset * + score_len:(offset + 1) * + score_len] + raw_scale_score = raw_score.mean(dim=0) * scale_factor + out_scores[index, :] += raw_scale_score.detach().cpu() + offset += 1 + + return out_scores + + def forward(self, x, proposal_ticks, scale_factors): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + proposal_ticks (list): Ticks of proposals to be STPP. + scale_factors (list): Ratios of the effective sampling lengths + to augmented lengths. + + Returns: + tuple[torch.Tensor, torch.Tensor, torch.Tensor]: + out_activity_scores (torch.Tensor): Activity scores + out_complete_scores (torch.Tensor): Completeness scores. + out_reg_scores (torch.Tensor): Regression scores. 
+ """ + assert x.size(1) == self.feat_dim + num_ticks = proposal_ticks.size(0) + + out_activity_scores = torch.zeros((num_ticks, self.activity_score_len), + dtype=x.dtype) + raw_activity_scores = x[:, self.activity_slice] + + out_complete_scores = torch.zeros((num_ticks, self.complete_score_len), + dtype=x.dtype) + raw_complete_scores = x[:, self.complete_slice] + + if self.use_regression: + out_reg_scores = torch.zeros((num_ticks, self.reg_score_len), + dtype=x.dtype) + raw_reg_scores = x[:, self.reg_slice] + else: + out_reg_scores = None + raw_reg_scores = None + + for i in range(num_ticks): + ticks = proposal_ticks[i] + + out_activity_scores[i, :] = raw_activity_scores[ + ticks[1]:max(ticks[1] + 1, ticks[2]), :].mean(dim=0) + + out_complete_scores = self._pyramids_pooling( + out_complete_scores, i, raw_complete_scores, ticks, + scale_factors[i], self.complete_score_len, self.stpp_stage) + + if self.use_regression: + out_reg_scores = self._pyramids_pooling( + out_reg_scores, i, raw_reg_scores, ticks, scale_factors[i], + self.reg_score_len, self.stpp_stage) + + return out_activity_scores, out_complete_scores, out_reg_scores + + +@HEADS.register_module() +class SSNHead(nn.Module): + """The classification head for SSN. + + Args: + dropout_ratio (float): Probability of dropout layer. Default: 0.8. + in_channels (int): Number of channels for input data. Default: 1024. + num_classes (int): Number of classes to be classified. Default: 20. + consensus (dict): Config of segmental consensus. + use_regression (bool): Whether to perform regression or not. + Default: True. + init_std (float): Std value for Initiation. Default: 0.001. 
+ """ + + def __init__(self, + dropout_ratio=0.8, + in_channels=1024, + num_classes=20, + consensus=dict( + type='STPPTrain', + standalong_classifier=True, + stpp_cfg=(1, 1, 1), + num_seg=(2, 5, 2)), + use_regression=True, + init_std=0.001): + + super().__init__() + + self.dropout_ratio = dropout_ratio + self.num_classes = num_classes + self.use_regression = use_regression + self.init_std = init_std + + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + + # Based on this copy, the model will utilize different + # structured temporal pyramid pooling at training and testing. + # Warning: this copy cannot be removed. + consensus_ = consensus.copy() + consensus_type = consensus_.pop('type') + if consensus_type == 'STPPTrain': + self.consensus = STPPTrain(**consensus_) + elif consensus_type == 'STPPTest': + consensus_['num_classes'] = self.num_classes + self.consensus = STPPTest(**consensus_) + + self.in_channels_activity = in_channels + self.in_channels_complete = ( + self.consensus.num_multipliers * in_channels) + self.activity_fc = nn.Linear(in_channels, num_classes + 1) + self.completeness_fc = nn.Linear(self.in_channels_complete, + num_classes) + if self.use_regression: + self.regressor_fc = nn.Linear(self.in_channels_complete, + num_classes * 2) + + def init_weights(self): + """Initiate the parameters from scratch.""" + normal_init(self.activity_fc, std=self.init_std) + normal_init(self.completeness_fc, std=self.init_std) + if self.use_regression: + normal_init(self.regressor_fc, std=self.init_std) + + def prepare_test_fc(self, stpp_feat_multiplier): + """Reorganize the shape of fully connected layer at testing, in order + to improve testing efficiency. + + Args: + stpp_feat_multiplier (int): Total number of parts. + + Returns: + bool: Whether the shape transformation is ready for testing. 
+ """ + + in_features = self.activity_fc.in_features + out_features = ( + self.activity_fc.out_features + + self.completeness_fc.out_features * stpp_feat_multiplier) + if self.use_regression: + out_features += ( + self.regressor_fc.out_features * stpp_feat_multiplier) + self.test_fc = nn.Linear(in_features, out_features) + + # Fetch weight and bias of the reorganized fc. + complete_weight = self.completeness_fc.weight.data.view( + self.completeness_fc.out_features, stpp_feat_multiplier, + in_features).transpose(0, 1).contiguous().view(-1, in_features) + complete_bias = self.completeness_fc.bias.data.view(1, -1).expand( + stpp_feat_multiplier, self.completeness_fc.out_features + ).contiguous().view(-1) / stpp_feat_multiplier + + weight = torch.cat((self.activity_fc.weight.data, complete_weight)) + bias = torch.cat((self.activity_fc.bias.data, complete_bias)) + + if self.use_regression: + reg_weight = self.regressor_fc.weight.data.view( + self.regressor_fc.out_features, stpp_feat_multiplier, + in_features).transpose(0, + 1).contiguous().view(-1, in_features) + reg_bias = self.regressor_fc.bias.data.view(1, -1).expand( + stpp_feat_multiplier, self.regressor_fc.out_features + ).contiguous().view(-1) / stpp_feat_multiplier + weight = torch.cat((weight, reg_weight)) + bias = torch.cat((bias, reg_bias)) + + self.test_fc.weight.data = weight + self.test_fc.bias.data = bias + return True + + def forward(self, x, test_mode=False): + """Defines the computation performed at every call.""" + if not test_mode: + x, proposal_scale_factor = x + activity_feat, completeness_feat = self.consensus( + x, proposal_scale_factor) + + if self.dropout is not None: + activity_feat = self.dropout(activity_feat) + completeness_feat = self.dropout(completeness_feat) + + activity_scores = self.activity_fc(activity_feat) + complete_scores = self.completeness_fc(completeness_feat) + if self.use_regression: + bbox_preds = self.regressor_fc(completeness_feat) + bbox_preds = bbox_preds.view(-1, + 
self.completeness_fc.out_features, + 2) + else: + bbox_preds = None + return activity_scores, complete_scores, bbox_preds + + x, proposal_tick_list, scale_factor_list = x + test_scores = self.test_fc(x) + (activity_scores, completeness_scores, + bbox_preds) = self.consensus(test_scores, proposal_tick_list, + scale_factor_list) + + return (test_scores, activity_scores, completeness_scores, bbox_preds) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/stgcn_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/stgcn_head.py new file mode 100644 index 0000000000000000000000000000000000000000..1961b46421e1ac9ee67a278fbbace8165f349f39 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/stgcn_head.py @@ -0,0 +1,65 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS +from .base import BaseHead + + +@HEADS.register_module() +class STGCNHead(BaseHead): + """The classification head for STGCN. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss') + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + num_person (int): Number of persons. Default: 2. + init_std (float): Std value for initialization. Default: 0.01. + kwargs (dict, optional): Any keyword argument to be used to initialize + the head.
+ """ + + def __init__(self, + num_classes, + in_channels, + loss_cls=dict(type='CrossEntropyLoss'), + spatial_type='avg', + num_person=2, + init_std=0.01, + **kwargs): + super().__init__(num_classes, in_channels, loss_cls, **kwargs) + + self.spatial_type = spatial_type + self.in_channels = in_channels + self.num_classes = num_classes + self.num_person = num_person + self.init_std = init_std + + self.pool = None + if self.spatial_type == 'avg': + self.pool = nn.AdaptiveAvgPool2d((1, 1)) + elif self.spatial_type == 'max': + self.pool = nn.AdaptiveMaxPool2d((1, 1)) + else: + raise NotImplementedError + + self.fc = nn.Conv2d(self.in_channels, self.num_classes, kernel_size=1) + + def init_weights(self): + normal_init(self.fc, std=self.init_std) + + def forward(self, x): + # global pooling + assert self.pool is not None + x = self.pool(x) + x = x.view(x.shape[0] // self.num_person, self.num_person, -1, 1, + 1).mean(dim=1) + + # prediction + x = self.fc(x) + x = x.view(x.shape[0], -1) + + return x diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/timesformer_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/timesformer_head.py new file mode 100644 index 0000000000000000000000000000000000000000..72ccf562bd3a6f12ad4707df7ca40db8182b05b6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/timesformer_head.py @@ -0,0 +1,41 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import trunc_normal_init + +from ..builder import HEADS +from .base import BaseHead + + +@HEADS.register_module() +class TimeSformerHead(BaseHead): + """Classification head for TimeSformer. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Defaults to `dict(type='CrossEntropyLoss')`. + init_std (float): Std value for Initiation. Defaults to 0.02. 
+ kwargs (dict, optional): Any keyword argument to be used to initialize + the head. + """ + + def __init__(self, + num_classes, + in_channels, + loss_cls=dict(type='CrossEntropyLoss'), + init_std=0.02, + **kwargs): + super().__init__(num_classes, in_channels, loss_cls, **kwargs) + self.init_std = init_std + self.fc_cls = nn.Linear(self.in_channels, self.num_classes) + + def init_weights(self): + """Initiate the parameters from scratch.""" + trunc_normal_init(self.fc_cls, std=self.init_std) + + def forward(self, x): + # [N, in_channels] + cls_score = self.fc_cls(x) + # [N, num_classes] + return cls_score diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tpn_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tpn_head.py new file mode 100644 index 0000000000000000000000000000000000000000..051feaa2176459d1b0399c65563467b6d85b7e05 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tpn_head.py @@ -0,0 +1,91 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn + +from ..builder import HEADS +from .tsn_head import TSNHead + + +@HEADS.register_module() +class TPNHead(TSNHead): + """Class head for TPN. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss'). + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + consensus (dict): Consensus config dict. + dropout_ratio (float): Probability of dropout layer. Default: 0.4. + init_std (float): Std value for Initiation. Default: 0.01. + multi_class (bool): Determines whether it is a multi-class + recognition task. Default: False. + label_smooth_eps (float): Epsilon used in label smooth. + Reference: https://arxiv.org/abs/1906.02629. Default: 0. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + if self.spatial_type == 'avg': + # use `nn.AdaptiveAvgPool3d` to adaptively match the in_channels. + self.avg_pool3d = nn.AdaptiveAvgPool3d((1, 1, 1)) + else: + self.avg_pool3d = None + + self.avg_pool2d = None + self.new_cls = None + + def _init_new_cls(self): + self.new_cls = nn.Conv3d(self.in_channels, self.num_classes, 1, 1, 0) + if next(self.fc_cls.parameters()).is_cuda: + self.new_cls = self.new_cls.cuda() + self.new_cls.weight.copy_(self.fc_cls.weight[..., None, None, None]) + self.new_cls.bias.copy_(self.fc_cls.bias) + + def forward(self, x, num_segs=None, fcn_test=False): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + num_segs (int | None): Number of segments into which a video + is divided. Default: None. + fcn_test (bool): Whether to apply full convolution (fcn) testing. + Default: False. + + Returns: + torch.Tensor: The classification scores for input samples. 
+ """ + if fcn_test: + if self.avg_pool3d: + x = self.avg_pool3d(x) + if self.new_cls is None: + self._init_new_cls() + cls_score_feat_map = self.new_cls(x) + return cls_score_feat_map + + if self.avg_pool2d is None: + kernel_size = (1, x.shape[-2], x.shape[-1]) + self.avg_pool2d = nn.AvgPool3d(kernel_size, stride=1, padding=0) + + if num_segs is None: + # [N, in_channels, 3, 7, 7] + x = self.avg_pool3d(x) + else: + # [N * num_segs, in_channels, 7, 7] + x = self.avg_pool2d(x) + # [N * num_segs, in_channels, 1, 1] + x = x.reshape((-1, num_segs) + x.shape[1:]) + # [N, num_segs, in_channels, 1, 1] + x = self.consensus(x) + # [N, 1, in_channels, 1, 1] + x = x.squeeze(1) + # [N, in_channels, 1, 1] + if self.dropout is not None: + x = self.dropout(x) + # [N, in_channels, 1, 1] + x = x.view(x.size(0), -1) + # [N, in_channels] + cls_score = self.fc_cls(x) + # [N, num_classes] + return cls_score diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/trn_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/trn_head.py new file mode 100644 index 0000000000000000000000000000000000000000..7a2a21bb6a78a7ec2bf90b61ea33c58ea6c0e2ac --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/trn_head.py @@ -0,0 +1,211 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import itertools + +import numpy as np +import torch +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS +from .base import BaseHead + + +class RelationModule(nn.Module): + """Relation Module of TRN. + + Args: + hidden_dim (int): The dimension of hidden layer of MLP in relation + module. + num_segments (int): Number of frame segments. + num_classes (int): Number of classes to be classified. 
+ """ + + def __init__(self, hidden_dim, num_segments, num_classes): + super().__init__() + self.hidden_dim = hidden_dim + self.num_segments = num_segments + self.num_classes = num_classes + bottleneck_dim = 512 + self.classifier = nn.Sequential( + nn.ReLU(), + nn.Linear(self.num_segments * self.hidden_dim, bottleneck_dim), + nn.ReLU(), nn.Linear(bottleneck_dim, self.num_classes)) + + def init_weights(self): + # Use the default kaiming_uniform for all nn.linear layers. + pass + + def forward(self, x): + # [N, num_segs * hidden_dim] + x = x.view(x.size(0), -1) + x = self.classifier(x) + return x + + +class RelationModuleMultiScale(nn.Module): + """Relation Module with Multi Scale of TRN. + + Args: + hidden_dim (int): The dimension of hidden layer of MLP in relation + module. + num_segments (int): Number of frame segments. + num_classes (int): Number of classes to be classified. + """ + + def __init__(self, hidden_dim, num_segments, num_classes): + super().__init__() + self.hidden_dim = hidden_dim + self.num_segments = num_segments + self.num_classes = num_classes + + # generate the multiple frame relations + self.scales = range(num_segments, 1, -1) + + self.relations_scales = [] + self.subsample_scales = [] + max_subsample = 3 + for scale in self.scales: + # select the different frame features for different scales + relations_scale = list( + itertools.combinations(range(self.num_segments), scale)) + self.relations_scales.append(relations_scale) + # sample `max_subsample` relation_scale at most + self.subsample_scales.append( + min(max_subsample, len(relations_scale))) + assert len(self.relations_scales[0]) == 1 + + bottleneck_dim = 256 + self.fc_fusion_scales = nn.ModuleList() + for scale in self.scales: + fc_fusion = nn.Sequential( + nn.ReLU(), nn.Linear(scale * self.hidden_dim, bottleneck_dim), + nn.ReLU(), nn.Linear(bottleneck_dim, self.num_classes)) + self.fc_fusion_scales.append(fc_fusion) + + def init_weights(self): + # Use the default kaiming_uniform for all 
nn.linear layers. + pass + + def forward(self, x): + # the first one is the largest scale + act_all = x[:, self.relations_scales[0][0], :] + act_all = act_all.view( + act_all.size(0), self.scales[0] * self.hidden_dim) + act_all = self.fc_fusion_scales[0](act_all) + + for scaleID in range(1, len(self.scales)): + # iterate over the scales + idx_relations_randomsample = np.random.choice( + len(self.relations_scales[scaleID]), + self.subsample_scales[scaleID], + replace=False) + for idx in idx_relations_randomsample: + act_relation = x[:, self.relations_scales[scaleID][idx], :] + act_relation = act_relation.view( + act_relation.size(0), + self.scales[scaleID] * self.hidden_dim) + act_relation = self.fc_fusion_scales[scaleID](act_relation) + act_all += act_relation + return act_all + + +@HEADS.register_module() +class TRNHead(BaseHead): + """Class head for TRN. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + num_segments (int): Number of frame segments. Default: 8. + loss_cls (dict): Config for building loss. Default: + dict(type='CrossEntropyLoss') + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + relation_type (str): The relation module type. Choices are 'TRN' or + 'TRNMultiScale'. Default: 'TRNMultiScale'. + hidden_dim (int): The dimension of hidden layer of MLP in relation + module. Default: 256. + dropout_ratio (float): Probability of dropout layer. Default: 0.8. + init_std (float): Std value for Initiation. Default: 0.001. + kwargs (dict, optional): Any keyword argument to be used to initialize + the head. 
+ """ + + def __init__(self, + num_classes, + in_channels, + num_segments=8, + loss_cls=dict(type='CrossEntropyLoss'), + spatial_type='avg', + relation_type='TRNMultiScale', + hidden_dim=256, + dropout_ratio=0.8, + init_std=0.001, + **kwargs): + super().__init__(num_classes, in_channels, loss_cls, **kwargs) + + self.num_classes = num_classes + self.in_channels = in_channels + self.num_segments = num_segments + self.spatial_type = spatial_type + self.relation_type = relation_type + self.hidden_dim = hidden_dim + self.dropout_ratio = dropout_ratio + self.init_std = init_std + + if self.relation_type == 'TRN': + self.consensus = RelationModule(self.hidden_dim, self.num_segments, + self.num_classes) + elif self.relation_type == 'TRNMultiScale': + self.consensus = RelationModuleMultiScale(self.hidden_dim, + self.num_segments, + self.num_classes) + else: + raise ValueError(f'Unknown Relation Type {self.relation_type}!') + + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + self.fc_cls = nn.Linear(self.in_channels, self.hidden_dim) + + if self.spatial_type == 'avg': + # use `nn.AdaptiveAvgPool2d` to adaptively match the in_channels. + self.avg_pool = nn.AdaptiveAvgPool2d(1) + else: + self.avg_pool = None + + def init_weights(self): + """Initiate the parameters from scratch.""" + normal_init(self.fc_cls, std=self.init_std) + self.consensus.init_weights() + + def forward(self, x, num_segs): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + num_segs (int): Useless in TRNHead. By default, `num_segs` + is equal to `clip_len * num_clips * num_crops`, which is + automatically generated in Recognizer forward phase and + useless in TRN models. The `self.num_segments` we need is a + hyper parameter to build TRN models. + Returns: + torch.Tensor: The classification scores for input samples. 
+ """ + # [N * num_segs, in_channels, 7, 7] + if self.avg_pool is not None: + x = self.avg_pool(x) + # [N * num_segs, in_channels, 1, 1] + x = torch.flatten(x, 1) + # [N * num_segs, in_channels] + if self.dropout is not None: + x = self.dropout(x) + + # [N, num_segs, hidden_dim] + cls_score = self.fc_cls(x) + cls_score = cls_score.view((-1, self.num_segments) + + cls_score.size()[1:]) + + # [N, num_classes] + cls_score = self.consensus(cls_score) + return cls_score diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tsm_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tsm_head.py new file mode 100644 index 0000000000000000000000000000000000000000..b181f3db435e2c0f9140950d7d996497ed11fbfa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tsm_head.py @@ -0,0 +1,112 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS +from .base import AvgConsensus, BaseHead + + +@HEADS.register_module() +class TSMHead(BaseHead): + """Class head for TSM. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + num_segments (int): Number of frame segments. Default: 8. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss') + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + consensus (dict): Consensus config dict. + dropout_ratio (float): Probability of dropout layer. Default: 0.8. + init_std (float): Std value for Initiation. Default: 0.001. + is_shift (bool): Indicating whether the feature is shifted. + Default: True. + temporal_pool (bool): Indicating whether feature is temporal pooled. + Default: False. + kwargs (dict, optional): Any keyword argument to be used to initialize + the head. 
+ """ + + def __init__(self, + num_classes, + in_channels, + num_segments=8, + loss_cls=dict(type='CrossEntropyLoss'), + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.8, + init_std=0.001, + is_shift=True, + temporal_pool=False, + **kwargs): + super().__init__(num_classes, in_channels, loss_cls, **kwargs) + + self.spatial_type = spatial_type + self.dropout_ratio = dropout_ratio + self.num_segments = num_segments + self.init_std = init_std + self.is_shift = is_shift + self.temporal_pool = temporal_pool + + consensus_ = consensus.copy() + + consensus_type = consensus_.pop('type') + if consensus_type == 'AvgConsensus': + self.consensus = AvgConsensus(**consensus_) + else: + self.consensus = None + + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + self.fc_cls = nn.Linear(self.in_channels, self.num_classes) + + if self.spatial_type == 'avg': + # use `nn.AdaptiveAvgPool2d` to adaptively match the in_channels. + self.avg_pool = nn.AdaptiveAvgPool2d(1) + else: + self.avg_pool = None + + def init_weights(self): + """Initiate the parameters from scratch.""" + normal_init(self.fc_cls, std=self.init_std) + + def forward(self, x, num_segs): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + num_segs (int): Useless in TSMHead. By default, `num_segs` + is equal to `clip_len * num_clips * num_crops`, which is + automatically generated in Recognizer forward phase and + useless in TSM models. The `self.num_segments` we need is a + hyper parameter to build TSM models. + Returns: + torch.Tensor: The classification scores for input samples. 
+ """ + # [N * num_segs, in_channels, 7, 7] + if self.avg_pool is not None: + x = self.avg_pool(x) + # [N * num_segs, in_channels, 1, 1] + x = torch.flatten(x, 1) + # [N * num_segs, in_channels] + if self.dropout is not None: + x = self.dropout(x) + # [N * num_segs, num_classes] + cls_score = self.fc_cls(x) + + if self.is_shift and self.temporal_pool: + # [2 * N, num_segs // 2, num_classes] + cls_score = cls_score.view((-1, self.num_segments // 2) + + cls_score.size()[1:]) + else: + # [N, num_segs, num_classes] + cls_score = cls_score.view((-1, self.num_segments) + + cls_score.size()[1:]) + # [N, 1, num_classes] + cls_score = self.consensus(cls_score) + # [N, num_classes] + return cls_score.squeeze(1) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tsn_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tsn_head.py new file mode 100644 index 0000000000000000000000000000000000000000..73d9ae4f3b987381ed0c6d4beee3b903aec1ae02 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/tsn_head.py @@ -0,0 +1,95 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS +from .base import AvgConsensus, BaseHead + + +@HEADS.register_module() +class TSNHead(BaseHead): + """Class head for TSN. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss'). + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + consensus (dict): Consensus config dict. + dropout_ratio (float): Probability of dropout layer. Default: 0.4. + init_std (float): Std value for Initiation. Default: 0.01. + kwargs (dict, optional): Any keyword argument to be used to initialize + the head. 
+ """ + + def __init__(self, + num_classes, + in_channels, + loss_cls=dict(type='CrossEntropyLoss'), + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1), + dropout_ratio=0.4, + init_std=0.01, + **kwargs): + super().__init__(num_classes, in_channels, loss_cls=loss_cls, **kwargs) + + self.spatial_type = spatial_type + self.dropout_ratio = dropout_ratio + self.init_std = init_std + + consensus_ = consensus.copy() + + consensus_type = consensus_.pop('type') + if consensus_type == 'AvgConsensus': + self.consensus = AvgConsensus(**consensus_) + else: + self.consensus = None + + if self.spatial_type == 'avg': + # use `nn.AdaptiveAvgPool2d` to adaptively match the in_channels. + self.avg_pool = nn.AdaptiveAvgPool2d((1, 1)) + else: + self.avg_pool = None + + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + self.fc_cls = nn.Linear(self.in_channels, self.num_classes) + + def init_weights(self): + """Initiate the parameters from scratch.""" + normal_init(self.fc_cls, std=self.init_std) + + def forward(self, x, num_segs): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + num_segs (int): Number of segments into which a video + is divided. + Returns: + torch.Tensor: The classification scores for input samples. 
+ """ + # [N * num_segs, in_channels, 7, 7] + if self.avg_pool is not None: + if isinstance(x, tuple): + shapes = [y.shape for y in x] + assert 1 == 0, f'x is tuple {shapes}' + x = self.avg_pool(x) + # [N * num_segs, in_channels, 1, 1] + x = x.reshape((-1, num_segs) + x.shape[1:]) + # [N, num_segs, in_channels, 1, 1] + x = self.consensus(x) + # [N, 1, in_channels, 1, 1] + x = x.squeeze(1) + # [N, in_channels, 1, 1] + if self.dropout is not None: + x = self.dropout(x) + # [N, in_channels, 1, 1] + x = x.view(x.size(0), -1) + # [N, in_channels] + cls_score = self.fc_cls(x) + # [N, num_classes] + return cls_score diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/x3d_head.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/x3d_head.py new file mode 100644 index 0000000000000000000000000000000000000000..4007744ff8ff9ee9a7969007181b3514144b9224 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/heads/x3d_head.py @@ -0,0 +1,90 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn as nn +from mmcv.cnn import normal_init + +from ..builder import HEADS +from .base import BaseHead + + +@HEADS.register_module() +class X3DHead(BaseHead): + """Classification head for X3D. + + Args: + num_classes (int): Number of classes to be classified. + in_channels (int): Number of channels in input feature. + loss_cls (dict): Config for building loss. + Default: dict(type='CrossEntropyLoss') + spatial_type (str): Pooling type in spatial dimension. Default: 'avg'. + dropout_ratio (float): Probability of dropout layer. Default: 0.5. + init_std (float): Std value for Initiation. Default: 0.01. + fc1_bias (bool): If the first fc layer has bias. Default: False. 
+ """ + + def __init__(self, + num_classes, + in_channels, + loss_cls=dict(type='CrossEntropyLoss'), + spatial_type='avg', + dropout_ratio=0.5, + init_std=0.01, + fc1_bias=False): + super().__init__(num_classes, in_channels, loss_cls) + + self.spatial_type = spatial_type + self.dropout_ratio = dropout_ratio + self.init_std = init_std + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + self.in_channels = in_channels + self.mid_channels = 2048 + self.num_classes = num_classes + self.fc1_bias = fc1_bias + + self.fc1 = nn.Linear( + self.in_channels, self.mid_channels, bias=self.fc1_bias) + self.fc2 = nn.Linear(self.mid_channels, self.num_classes) + + self.relu = nn.ReLU() + + self.pool = None + if self.spatial_type == 'avg': + self.pool = nn.AdaptiveAvgPool3d((1, 1, 1)) + elif self.spatial_type == 'max': + self.pool = nn.AdaptiveMaxPool3d((1, 1, 1)) + else: + raise NotImplementedError + + def init_weights(self): + """Initiate the parameters from scratch.""" + normal_init(self.fc1, std=self.init_std) + normal_init(self.fc2, std=self.init_std) + + def forward(self, x): + """Defines the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The classification scores for input samples. 
+ """ + # [N, in_channels, T, H, W] + assert self.pool is not None + x = self.pool(x) + # [N, in_channels, 1, 1, 1] + # [N, in_channels, 1, 1, 1] + x = x.view(x.shape[0], -1) + # [N, in_channels] + x = self.fc1(x) + # [N, 2048] + x = self.relu(x) + + if self.dropout is not None: + x = self.dropout(x) + + cls_score = self.fc2(x) + # [N, num_classes] + return cls_score diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..4befe6f0b2ca8a38e997aaa151701486c77e98e6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/__init__.py @@ -0,0 +1,7 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .base import BaseTAGClassifier, BaseTAPGenerator +from .bmn import BMN +from .bsn import PEM, TEM +from .ssn import SSN + +__all__ = ['PEM', 'TEM', 'BMN', 'SSN', 'BaseTAPGenerator', 'BaseTAGClassifier'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/base.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/base.py new file mode 100644 index 0000000000000000000000000000000000000000..65b5c6f304cf8b1e27bb0f7abeb849489b9f1903 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/base.py @@ -0,0 +1,262 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings +from abc import ABCMeta, abstractmethod +from collections import OrderedDict + +import torch +import torch.distributed as dist +import torch.nn as nn + +from .. import builder + + +class BaseTAPGenerator(nn.Module, metaclass=ABCMeta): + """Base class for temporal action proposal generator. + + All temporal action proposal generator should subclass it. All subclass + should overwrite: Methods:``forward_train``, supporting to forward when + training. Methods:``forward_test``, supporting to forward when testing. 
+ """ + + @abstractmethod + def forward_train(self, *args, **kwargs): + """Defines the computation performed at training.""" + + @abstractmethod + def forward_test(self, *args): + """Defines the computation performed at testing.""" + + @abstractmethod + def forward(self, *args, **kwargs): + """Define the computation performed at every call.""" + + @staticmethod + def _parse_losses(losses): + """Parse the raw outputs (losses) of the network. + + Args: + losses (dict): Raw output of the network, which usually contain + losses and other necessary information. + + Returns: + tuple[Tensor, dict]: (loss, log_vars), loss is the loss tensor + which may be a weighted sum of all losses, log_vars contains + all the variables to be sent to the logger. + """ + log_vars = OrderedDict() + for loss_name, loss_value in losses.items(): + if isinstance(loss_value, torch.Tensor): + log_vars[loss_name] = loss_value.mean() + elif isinstance(loss_value, list): + log_vars[loss_name] = sum(_loss.mean() for _loss in loss_value) + else: + raise TypeError( + f'{loss_name} is not a tensor or list of tensors') + + loss = sum(_value for _key, _value in log_vars.items() + if 'loss' in _key) + + log_vars['loss'] = loss + for loss_name, loss_value in log_vars.items(): + # reduce loss when distributed training + if dist.is_available() and dist.is_initialized(): + loss_value = loss_value.data.clone() + dist.all_reduce(loss_value.div_(dist.get_world_size())) + log_vars[loss_name] = loss_value.item() + + return loss, log_vars + + def train_step(self, data_batch, optimizer, **kwargs): + """The iteration step during training. + + This method defines an iteration step during training, except for the + back propagation and optimizer updating, which are done in an optimizer + hook. Note that in some complicated cases or models, the whole process + including back propagation and optimizer updating is also defined in + this method, such as GAN. + + Args: + data_batch (dict): The output of dataloader. 
+ optimizer (:obj:`torch.optim.Optimizer` | dict): The optimizer of + runner is passed to ``train_step()``. This argument is unused + and reserved. + + Returns: + dict: It should contain at least 3 keys: ``loss``, ``log_vars``, + ``num_samples``. + ``loss`` is a tensor for back propagation, which can be a + weighted sum of multiple losses. + ``log_vars`` contains all the variables to be sent to the + logger. + ``num_samples`` indicates the batch size (when the model is + DDP, it means the batch size on each GPU), which is used for + averaging the logs. + """ + losses = self.forward(**data_batch) + + loss, log_vars = self._parse_losses(losses) + + outputs = dict( + loss=loss, + log_vars=log_vars, + num_samples=len(next(iter(data_batch.values())))) + + return outputs + + def val_step(self, data_batch, optimizer, **kwargs): + """The iteration step during validation. + + This method shares the same signature as :func:`train_step`, but used + during val epochs. Note that the evaluation after training epochs is + not implemented with this method, but an evaluation hook. + """ + results = self.forward(return_loss=False, **data_batch) + + outputs = dict(results=results) + + return outputs + + +class BaseTAGClassifier(nn.Module, metaclass=ABCMeta): + """Base class for temporal action proposal classifier. + + All temporal action generation classifier should subclass it. All subclass + should overwrite: Methods:``forward_train``, supporting to forward when + training. Methods:``forward_test``, supporting to forward when testing. 
+ """ + + def __init__(self, backbone, cls_head, train_cfg=None, test_cfg=None): + super().__init__() + self.backbone = builder.build_backbone(backbone) + self.cls_head = builder.build_head(cls_head) + + self.train_cfg = train_cfg + self.test_cfg = test_cfg + self.init_weights() + + def init_weights(self): + """Weight initialization for model.""" + self.backbone.init_weights() + self.cls_head.init_weights() + + def extract_feat(self, imgs): + """Extract features through a backbone. + + Args: + imgs (torch.Tensor): The input images. + Returns: + torch.tensor: The extracted features. + """ + x = self.backbone(imgs) + return x + + @abstractmethod + def forward_train(self, *args, **kwargs): + """Defines the computation performed at training.""" + + @abstractmethod + def forward_test(self, *args, **kwargs): + """Defines the computation performed at testing.""" + + def forward(self, *args, return_loss=True, **kwargs): + """Define the computation performed at every call.""" + if return_loss: + return self.forward_train(*args, **kwargs) + + return self.forward_test(*args, **kwargs) + + @staticmethod + def _parse_losses(losses): + """Parse the raw outputs (losses) of the network. + + Args: + losses (dict): Raw output of the network, which usually contain + losses and other necessary information. + + Returns: + tuple[Tensor, dict]: (loss, log_vars), loss is the loss tensor + which may be a weighted sum of all losses, log_vars contains + all the variables to be sent to the logger. 
+ """ + log_vars = OrderedDict() + for loss_name, loss_value in losses.items(): + if isinstance(loss_value, torch.Tensor): + log_vars[loss_name] = loss_value.mean() + elif isinstance(loss_value, list): + log_vars[loss_name] = sum(_loss.mean() for _loss in loss_value) + else: + raise TypeError( + f'{loss_name} is not a tensor or list of tensors') + + loss = sum(_value for _key, _value in log_vars.items() + if 'loss' in _key) + + log_vars['loss'] = loss + for loss_name, loss_value in log_vars.items(): + # reduce loss when distributed training + if dist.is_available() and dist.is_initialized(): + loss_value = loss_value.data.clone() + dist.all_reduce(loss_value.div_(dist.get_world_size())) + log_vars[loss_name] = loss_value.item() + + return loss, log_vars + + def train_step(self, data_batch, optimizer, **kwargs): + """The iteration step during training. + + This method defines an iteration step during training, except for the + back propagation and optimizer updating, which are done in an optimizer + hook. Note that in some complicated cases or models, the whole process + including back propagation and optimizer updating is also defined in + this method, such as GAN. + + Args: + data_batch (dict): The output of dataloader. + optimizer (:obj:`torch.optim.Optimizer` | dict): The optimizer of + runner is passed to ``train_step()``. This argument is unused + and reserved. + + Returns: + dict: It should contain at least 3 keys: ``loss``, ``log_vars``, + ``num_samples``. + ``loss`` is a tensor for back propagation, which can be a + weighted sum of multiple losses. + ``log_vars`` contains all the variables to be sent to the + logger. + ``num_samples`` indicates the batch size (when the model is + DDP, it means the batch size on each GPU), which is used for + averaging the logs. 
+ """ + losses = self.forward(**data_batch) + + loss, log_vars = self._parse_losses(losses) + + outputs = dict( + loss=loss, + log_vars=log_vars, + num_samples=len(next(iter(data_batch.values())))) + + return outputs + + def val_step(self, data_batch, optimizer, **kwargs): + """The iteration step during validation. + + This method shares the same signature as :func:`train_step`, but used + during val epochs. Note that the evaluation after training epochs is + not implemented with this method, but an evaluation hook. + """ + results = self.forward(return_loss=False, **data_batch) + + outputs = dict(results=results) + + return outputs + + +class BaseLocalizer(BaseTAGClassifier): + """Deprecated class for ``BaseTAPGenerator`` and ``BaseTAGClassifier``.""" + + def __init__(self, *args, **kwargs): + warnings.warn('``BaseLocalizer`` is deprecated, please switch to ' + '``BaseTAPGenerator`` or ``BaseTAGClassifier``. Details ' + 'see https://github.com/open-mmlab/mmaction2/pull/913') + super().__init__(*args, **kwargs) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/bmn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/bmn.py new file mode 100644 index 0000000000000000000000000000000000000000..df137b3818e24e0f90066fc4819dc0d73529fccf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/bmn.py @@ -0,0 +1,417 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import math + +import numpy as np +import torch +import torch.nn as nn + +from ...localization import temporal_iop, temporal_iou +from ..builder import LOCALIZERS, build_loss +from .base import BaseTAPGenerator +from .utils import post_processing + + +@LOCALIZERS.register_module() +class BMN(BaseTAPGenerator): + """Boundary Matching Network for temporal action proposal generation. + + Please refer `BMN: Boundary-Matching Network for Temporal Action Proposal + Generation `_. 
+ Code Reference https://github.com/JJBOY/BMN-Boundary-Matching-Network + + Args: + temporal_dim (int): Total frames selected for each video. + boundary_ratio (float): Ratio for determining video boundaries. + num_samples (int): Number of samples for each proposal. + num_samples_per_bin (int): Number of bin samples for each sample. + feat_dim (int): Feature dimension. + soft_nms_alpha (float): Soft NMS alpha. + soft_nms_low_threshold (float): Soft NMS low threshold. + soft_nms_high_threshold (float): Soft NMS high threshold. + post_process_top_k (int): Top k proposals in post process. + feature_extraction_interval (int): + Interval used in feature extraction. Default: 16. + loss_cls (dict): Config for building loss. + Default: ``dict(type='BMNLoss')``. + hidden_dim_1d (int): Hidden dim for 1d conv. Default: 256. + hidden_dim_2d (int): Hidden dim for 2d conv. Default: 128. + hidden_dim_3d (int): Hidden dim for 3d conv. Default: 512. + """ + + def __init__(self, + temporal_dim, + boundary_ratio, + num_samples, + num_samples_per_bin, + feat_dim, + soft_nms_alpha, + soft_nms_low_threshold, + soft_nms_high_threshold, + post_process_top_k, + feature_extraction_interval=16, + loss_cls=dict(type='BMNLoss'), + hidden_dim_1d=256, + hidden_dim_2d=128, + hidden_dim_3d=512): + super().__init__() + + self.tscale = temporal_dim + self.boundary_ratio = boundary_ratio + self.num_samples = num_samples + self.num_samples_per_bin = num_samples_per_bin + self.feat_dim = feat_dim + self.soft_nms_alpha = soft_nms_alpha + self.soft_nms_low_threshold = soft_nms_low_threshold + self.soft_nms_high_threshold = soft_nms_high_threshold + self.post_process_top_k = post_process_top_k + self.feature_extraction_interval = feature_extraction_interval + self.loss_cls = build_loss(loss_cls) + self.hidden_dim_1d = hidden_dim_1d + self.hidden_dim_2d = hidden_dim_2d + self.hidden_dim_3d = hidden_dim_3d + + self._get_interp1d_mask() + + # Base Module + self.x_1d_b = nn.Sequential( + nn.Conv1d( + 
self.feat_dim, + self.hidden_dim_1d, + kernel_size=3, + padding=1, + groups=4), nn.ReLU(inplace=True), + nn.Conv1d( + self.hidden_dim_1d, + self.hidden_dim_1d, + kernel_size=3, + padding=1, + groups=4), nn.ReLU(inplace=True)) + + # Temporal Evaluation Module + self.x_1d_s = nn.Sequential( + nn.Conv1d( + self.hidden_dim_1d, + self.hidden_dim_1d, + kernel_size=3, + padding=1, + groups=4), nn.ReLU(inplace=True), + nn.Conv1d(self.hidden_dim_1d, 1, kernel_size=1), nn.Sigmoid()) + self.x_1d_e = nn.Sequential( + nn.Conv1d( + self.hidden_dim_1d, + self.hidden_dim_1d, + kernel_size=3, + padding=1, + groups=4), nn.ReLU(inplace=True), + nn.Conv1d(self.hidden_dim_1d, 1, kernel_size=1), nn.Sigmoid()) + + # Proposal Evaluation Module + self.x_1d_p = nn.Sequential( + nn.Conv1d( + self.hidden_dim_1d, + self.hidden_dim_1d, + kernel_size=3, + padding=1), nn.ReLU(inplace=True)) + self.x_3d_p = nn.Sequential( + nn.Conv3d( + self.hidden_dim_1d, + self.hidden_dim_3d, + kernel_size=(self.num_samples, 1, 1)), nn.ReLU(inplace=True)) + self.x_2d_p = nn.Sequential( + nn.Conv2d(self.hidden_dim_3d, self.hidden_dim_2d, kernel_size=1), + nn.ReLU(inplace=True), + nn.Conv2d( + self.hidden_dim_2d, + self.hidden_dim_2d, + kernel_size=3, + padding=1), nn.ReLU(inplace=True), + nn.Conv2d( + self.hidden_dim_2d, + self.hidden_dim_2d, + kernel_size=3, + padding=1), nn.ReLU(inplace=True), + nn.Conv2d(self.hidden_dim_2d, 2, kernel_size=1), nn.Sigmoid()) + self.anchors_tmins, self.anchors_tmaxs = self._temporal_anchors( + -0.5, 1.5) + self.match_map = self._match_map() + self.bm_mask = self._get_bm_mask() + + def _match_map(self): + """Generate match map.""" + temporal_gap = 1. 
/ self.tscale + match_map = [] + for idx in range(self.tscale): + match_window = [] + tmin = temporal_gap * idx + for jdx in range(1, self.tscale + 1): + tmax = tmin + temporal_gap * jdx + match_window.append([tmin, tmax]) + match_map.append(match_window) + match_map = np.array(match_map) + match_map = np.transpose(match_map, [1, 0, 2]) + match_map = np.reshape(match_map, [-1, 2]) + return match_map + + def _temporal_anchors(self, tmin_offset=0., tmax_offset=1.): + """Generate temporal anchors. + + Args: + tmin_offset (float): Offset for the minimum value of temporal anchor. + Default: 0. + tmax_offset (float): Offset for the maximum value of temporal anchor. + Default: 1. + + Returns: + tuple[Sequence[float]]: The minimum and maximum values of temporal + anchors. + """ + temporal_gap = 1. / self.tscale + anchors_tmins = [] + anchors_tmaxs = [] + for i in range(self.tscale): + anchors_tmins.append(temporal_gap * (i + tmin_offset)) + anchors_tmaxs.append(temporal_gap * (i + tmax_offset)) + + return anchors_tmins, anchors_tmaxs + + def _forward(self, x): + """Define the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + tuple[torch.Tensor]: The confidence map, start and end scores. 
+ """ + # x.shape [batch_size, self.feat_dim, self.tscale] + base_feature = self.x_1d_b(x) + # base_feature.shape [batch_size, self.hidden_dim_1d, self.tscale] + start = self.x_1d_s(base_feature).squeeze(1) + # start.shape [batch_size, self.tscale] + end = self.x_1d_e(base_feature).squeeze(1) + # end.shape [batch_size, self.tscale] + confidence_map = self.x_1d_p(base_feature) + # [batch_size, self.hidden_dim_1d, self.tscale] + confidence_map = self._boundary_matching_layer(confidence_map) + # [batch_size, self.hidden_dim_1d, self.num_samples, self.tscale, self.tscale] # noqa + confidence_map = self.x_3d_p(confidence_map).squeeze(2) + # [batch_size, self.hidden_dim_3d, self.tscale, self.tscale] + confidence_map = self.x_2d_p(confidence_map) + # [batch_size, 2, self.tscale, self.tscale] + + return confidence_map, start, end + + def _boundary_matching_layer(self, x): + """Generate matching layer.""" + input_size = x.size() + out = torch.matmul(x, + self.sample_mask).reshape(input_size[0], + input_size[1], + self.num_samples, + self.tscale, self.tscale) + return out + + def forward_test(self, raw_feature, video_meta): + """Define the computation performed at every call when testing.""" + confidence_map, start, end = self._forward(raw_feature) + start_scores = start[0].cpu().numpy() + end_scores = end[0].cpu().numpy() + cls_confidence = (confidence_map[0][1]).cpu().numpy() + reg_confidence = (confidence_map[0][0]).cpu().numpy() + + max_start = max(start_scores) + max_end = max(end_scores) + + # generate the set of start points and end points + start_bins = np.zeros(len(start_scores)) + start_bins[0] = 1 # [1,0,0...,0,0] + end_bins = np.zeros(len(end_scores)) + end_bins[-1] = 1 # [0,0,0...,0,1] + for idx in range(1, self.tscale - 1): + if start_scores[idx] > start_scores[ + idx + 1] and start_scores[idx] > start_scores[idx - 1]: + start_bins[idx] = 1 + elif start_scores[idx] > (0.5 * max_start): + start_bins[idx] = 1 + if end_scores[idx] > end_scores[ + idx + 1] and 
end_scores[idx] > end_scores[idx - 1]: + end_bins[idx] = 1 + elif end_scores[idx] > (0.5 * max_end): + end_bins[idx] = 1 + + # iterate through all combinations of start_index and end_index + new_proposals = [] + for idx in range(self.tscale): + for jdx in range(self.tscale): + start_index = jdx + end_index = start_index + idx + 1 + if end_index < self.tscale and start_bins[ + start_index] == 1 and end_bins[end_index] == 1: + tmin = start_index / self.tscale + tmax = end_index / self.tscale + tmin_score = start_scores[start_index] + tmax_score = end_scores[end_index] + cls_score = cls_confidence[idx, jdx] + reg_score = reg_confidence[idx, jdx] + score = tmin_score * tmax_score * cls_score * reg_score + new_proposals.append([ + tmin, tmax, tmin_score, tmax_score, cls_score, + reg_score, score + ]) + new_proposals = np.stack(new_proposals) + video_info = dict(video_meta[0]) + proposal_list = post_processing(new_proposals, video_info, + self.soft_nms_alpha, + self.soft_nms_low_threshold, + self.soft_nms_high_threshold, + self.post_process_top_k, + self.feature_extraction_interval) + output = [ + dict( + video_name=video_info['video_name'], + proposal_list=proposal_list) + ] + return output + + def forward_train(self, raw_feature, label_confidence, label_start, + label_end): + """Define the computation performed at every call when training.""" + confidence_map, start, end = self._forward(raw_feature) + loss = self.loss_cls(confidence_map, start, end, label_confidence, + label_start, label_end, + self.bm_mask.to(raw_feature.device)) + loss_dict = dict(loss=loss[0]) + return loss_dict + + def generate_labels(self, gt_bbox): + """Generate training labels.""" + match_score_confidence_list = [] + match_score_start_list = [] + match_score_end_list = [] + for every_gt_bbox in gt_bbox: + gt_iou_map = [] + for start, end in every_gt_bbox: + if isinstance(start, torch.Tensor): + start = start.numpy() + if isinstance(end, torch.Tensor): + end = end.numpy() + current_gt_iou_map = 
temporal_iou(self.match_map[:, 0], + self.match_map[:, 1], start, + end) + current_gt_iou_map = np.reshape(current_gt_iou_map, + [self.tscale, self.tscale]) + gt_iou_map.append(current_gt_iou_map) + gt_iou_map = np.array(gt_iou_map).astype(np.float32) + gt_iou_map = np.max(gt_iou_map, axis=0) + + gt_tmins = every_gt_bbox[:, 0] + gt_tmaxs = every_gt_bbox[:, 1] + + gt_len_pad = 3 * (1. / self.tscale) + + gt_start_bboxs = np.stack( + (gt_tmins - gt_len_pad / 2, gt_tmins + gt_len_pad / 2), axis=1) + gt_end_bboxs = np.stack( + (gt_tmaxs - gt_len_pad / 2, gt_tmaxs + gt_len_pad / 2), axis=1) + + match_score_start = [] + match_score_end = [] + + for anchor_tmin, anchor_tmax in zip(self.anchors_tmins, + self.anchors_tmaxs): + match_score_start.append( + np.max( + temporal_iop(anchor_tmin, anchor_tmax, + gt_start_bboxs[:, 0], gt_start_bboxs[:, + 1]))) + match_score_end.append( + np.max( + temporal_iop(anchor_tmin, anchor_tmax, + gt_end_bboxs[:, 0], gt_end_bboxs[:, 1]))) + match_score_confidence_list.append(gt_iou_map) + match_score_start_list.append(match_score_start) + match_score_end_list.append(match_score_end) + match_score_confidence_list = torch.Tensor(match_score_confidence_list) + match_score_start_list = torch.Tensor(match_score_start_list) + match_score_end_list = torch.Tensor(match_score_end_list) + return (match_score_confidence_list, match_score_start_list, + match_score_end_list) + + def forward(self, + raw_feature, + gt_bbox=None, + video_meta=None, + return_loss=True): + """Define the computation performed at every call.""" + if return_loss: + label_confidence, label_start, label_end = ( + self.generate_labels(gt_bbox)) + device = raw_feature.device + label_confidence = label_confidence.to(device) + label_start = label_start.to(device) + label_end = label_end.to(device) + return self.forward_train(raw_feature, label_confidence, + label_start, label_end) + + return self.forward_test(raw_feature, video_meta) + + @staticmethod + def 
_get_interp1d_bin_mask(seg_tmin, seg_tmax, tscale, num_samples, + num_samples_per_bin): + """Generate sample mask for a boundary-matching pair.""" + plen = float(seg_tmax - seg_tmin) + plen_sample = plen / (num_samples * num_samples_per_bin - 1.0) + total_samples = [ + seg_tmin + plen_sample * i + for i in range(num_samples * num_samples_per_bin) + ] + p_mask = [] + for idx in range(num_samples): + bin_samples = total_samples[idx * num_samples_per_bin:(idx + 1) * + num_samples_per_bin] + bin_vector = np.zeros(tscale) + for sample in bin_samples: + sample_upper = math.ceil(sample) + sample_decimal, sample_down = math.modf(sample) + if 0 <= int(sample_down) <= (tscale - 1): + bin_vector[int(sample_down)] += 1 - sample_decimal + if 0 <= int(sample_upper) <= (tscale - 1): + bin_vector[int(sample_upper)] += sample_decimal + bin_vector = 1.0 / num_samples_per_bin * bin_vector + p_mask.append(bin_vector) + p_mask = np.stack(p_mask, axis=1) + return p_mask + + def _get_interp1d_mask(self): + """Generate sample mask for each point in Boundary-Matching Map.""" + mask_mat = [] + for start_index in range(self.tscale): + mask_mat_vector = [] + for duration_index in range(self.tscale): + if start_index + duration_index < self.tscale: + p_tmin = start_index + p_tmax = start_index + duration_index + center_len = float(p_tmax - p_tmin) + 1 + sample_tmin = p_tmin - (center_len * self.boundary_ratio) + sample_tmax = p_tmax + (center_len * self.boundary_ratio) + p_mask = self._get_interp1d_bin_mask( + sample_tmin, sample_tmax, self.tscale, + self.num_samples, self.num_samples_per_bin) + else: + p_mask = np.zeros([self.tscale, self.num_samples]) + mask_mat_vector.append(p_mask) + mask_mat_vector = np.stack(mask_mat_vector, axis=2) + mask_mat.append(mask_mat_vector) + mask_mat = np.stack(mask_mat, axis=3) + mask_mat = mask_mat.astype(np.float32) + self.sample_mask = nn.Parameter( + torch.tensor(mask_mat).view(self.tscale, -1), requires_grad=False) + + def _get_bm_mask(self): + 
"""Generate Boundary-Matching Mask.""" + bm_mask = [] + for idx in range(self.tscale): + mask_vector = [1] * (self.tscale - idx) + [0] * idx + bm_mask.append(mask_vector) + bm_mask = torch.tensor(bm_mask, dtype=torch.float) + return bm_mask diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/bsn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/bsn.py new file mode 100644 index 0000000000000000000000000000000000000000..ef595fe7832ebfc00ee740fec33ad8452adcc074 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/bsn.py @@ -0,0 +1,395 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ...localization import temporal_iop +from ..builder import LOCALIZERS, build_loss +from .base import BaseTAPGenerator +from .utils import post_processing + + +@LOCALIZERS.register_module() +class TEM(BaseTAPGenerator): + """Temporal Evaluation Model for Boundary Sensitive Network. + + Please refer `BSN: Boundary Sensitive Network for Temporal Action + Proposal Generation `_. + + Code reference + https://github.com/wzmsltw/BSN-boundary-sensitive-network + + Args: + tem_feat_dim (int): Feature dimension. + tem_hidden_dim (int): Hidden layer dimension. + tem_match_threshold (float): Temporal evaluation match threshold. + loss_cls (dict): Config for building loss. + Default: ``dict(type='BinaryLogisticRegressionLoss')``. + loss_weight (float): Weight term for action_loss. Default: 2. + output_dim (int): Output dimension. Default: 3. + conv1_ratio (float): Ratio of conv1 layer output. Default: 1.0. + conv2_ratio (float): Ratio of conv2 layer output. Default: 1.0. + conv3_ratio (float): Ratio of conv3 layer output. Default: 0.01. 
+ """ + + def __init__(self, + temporal_dim, + boundary_ratio, + tem_feat_dim, + tem_hidden_dim, + tem_match_threshold, + loss_cls=dict(type='BinaryLogisticRegressionLoss'), + loss_weight=2, + output_dim=3, + conv1_ratio=1, + conv2_ratio=1, + conv3_ratio=0.01): + super().__init__() + + self.temporal_dim = temporal_dim + self.boundary_ratio = boundary_ratio + self.feat_dim = tem_feat_dim + self.c_hidden = tem_hidden_dim + self.match_threshold = tem_match_threshold + self.output_dim = output_dim + self.loss_cls = build_loss(loss_cls) + self.loss_weight = loss_weight + self.conv1_ratio = conv1_ratio + self.conv2_ratio = conv2_ratio + self.conv3_ratio = conv3_ratio + + self.conv1 = nn.Conv1d( + in_channels=self.feat_dim, + out_channels=self.c_hidden, + kernel_size=3, + stride=1, + padding=1, + groups=1) + self.conv2 = nn.Conv1d( + in_channels=self.c_hidden, + out_channels=self.c_hidden, + kernel_size=3, + stride=1, + padding=1, + groups=1) + self.conv3 = nn.Conv1d( + in_channels=self.c_hidden, + out_channels=self.output_dim, + kernel_size=1, + stride=1, + padding=0) + self.anchors_tmins, self.anchors_tmaxs = self._temporal_anchors() + + def _temporal_anchors(self, tmin_offset=0., tmax_offset=1.): + """Generate temporal anchors. + + Args: + tmin_offset (float): Offset for the minimum value of temporal anchor. + Default: 0. + tmax_offset (float): Offset for the maximum value of temporal anchor. + Default: 1. + + Returns: + tuple[Sequence[float]]: The minimum and maximum values of temporal + anchors. + """ + temporal_gap = 1. / self.temporal_dim + anchors_tmins = [] + anchors_tmaxs = [] + for i in range(self.temporal_dim): + anchors_tmins.append(temporal_gap * (i + tmin_offset)) + anchors_tmaxs.append(temporal_gap * (i + tmax_offset)) + + return anchors_tmins, anchors_tmaxs + + def _forward(self, x): + """Define the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. 
+ """ + x = F.relu(self.conv1_ratio * self.conv1(x)) + x = F.relu(self.conv2_ratio * self.conv2(x)) + x = torch.sigmoid(self.conv3_ratio * self.conv3(x)) + return x + + def forward_train(self, raw_feature, label_action, label_start, label_end): + """Define the computation performed at every call when training.""" + tem_output = self._forward(raw_feature) + score_action = tem_output[:, 0, :] + score_start = tem_output[:, 1, :] + score_end = tem_output[:, 2, :] + + loss_action = self.loss_cls(score_action, label_action, + self.match_threshold) + loss_start_small = self.loss_cls(score_start, label_start, + self.match_threshold) + loss_end_small = self.loss_cls(score_end, label_end, + self.match_threshold) + loss_dict = { + 'loss_action': loss_action * self.loss_weight, + 'loss_start': loss_start_small, + 'loss_end': loss_end_small + } + + return loss_dict + + def forward_test(self, raw_feature, video_meta): + """Define the computation performed at every call when testing.""" + tem_output = self._forward(raw_feature).cpu().numpy() + batch_action = tem_output[:, 0, :] + batch_start = tem_output[:, 1, :] + batch_end = tem_output[:, 2, :] + + video_meta_list = [dict(x) for x in video_meta] + + video_results = [] + + for batch_idx, _ in enumerate(batch_action): + video_name = video_meta_list[batch_idx]['video_name'] + video_action = batch_action[batch_idx] + video_start = batch_start[batch_idx] + video_end = batch_end[batch_idx] + video_result = np.stack((video_action, video_start, video_end, + self.anchors_tmins, self.anchors_tmaxs), + axis=1) + video_results.append((video_name, video_result)) + return video_results + + def generate_labels(self, gt_bbox): + """Generate training labels.""" + match_score_action_list = [] + match_score_start_list = [] + match_score_end_list = [] + for every_gt_bbox in gt_bbox: + gt_tmins = every_gt_bbox[:, 0].cpu().numpy() + gt_tmaxs = every_gt_bbox[:, 1].cpu().numpy() + + gt_lens = gt_tmaxs - gt_tmins + gt_len_pad = np.maximum(1. 
/ self.temporal_dim, + self.boundary_ratio * gt_lens) + + gt_start_bboxs = np.stack( + (gt_tmins - gt_len_pad / 2, gt_tmins + gt_len_pad / 2), axis=1) + gt_end_bboxs = np.stack( + (gt_tmaxs - gt_len_pad / 2, gt_tmaxs + gt_len_pad / 2), axis=1) + + match_score_action = [] + match_score_start = [] + match_score_end = [] + + for anchor_tmin, anchor_tmax in zip(self.anchors_tmins, + self.anchors_tmaxs): + match_score_action.append( + np.max( + temporal_iop(anchor_tmin, anchor_tmax, gt_tmins, + gt_tmaxs))) + match_score_start.append( + np.max( + temporal_iop(anchor_tmin, anchor_tmax, + gt_start_bboxs[:, 0], gt_start_bboxs[:, + 1]))) + match_score_end.append( + np.max( + temporal_iop(anchor_tmin, anchor_tmax, + gt_end_bboxs[:, 0], gt_end_bboxs[:, 1]))) + match_score_action_list.append(match_score_action) + match_score_start_list.append(match_score_start) + match_score_end_list.append(match_score_end) + match_score_action_list = torch.Tensor(match_score_action_list) + match_score_start_list = torch.Tensor(match_score_start_list) + match_score_end_list = torch.Tensor(match_score_end_list) + return (match_score_action_list, match_score_start_list, + match_score_end_list) + + def forward(self, + raw_feature, + gt_bbox=None, + video_meta=None, + return_loss=True): + """Define the computation performed at every call.""" + if return_loss: + label_action, label_start, label_end = ( + self.generate_labels(gt_bbox)) + device = raw_feature.device + label_action = label_action.to(device) + label_start = label_start.to(device) + label_end = label_end.to(device) + return self.forward_train(raw_feature, label_action, label_start, + label_end) + + return self.forward_test(raw_feature, video_meta) + + +@LOCALIZERS.register_module() +class PEM(BaseTAPGenerator): + """Proposals Evaluation Model for Boundary Sensitive Network. + + Please refer `BSN: Boundary Sensitive Network for Temporal Action + Proposal Generation `_. 
+ + Code reference + https://github.com/wzmsltw/BSN-boundary-sensitive-network + + Args: + pem_feat_dim (int): Feature dimension. + pem_hidden_dim (int): Hidden layer dimension. + pem_u_ratio_m (float): Ratio for medium score proposals to balance + data. + pem_u_ratio_l (float): Ratio for low score proposals to balance data. + pem_high_temporal_iou_threshold (float): High IoU threshold. + pem_low_temporal_iou_threshold (float): Low IoU threshold. + soft_nms_alpha (float): Soft NMS alpha. + soft_nms_low_threshold (float): Soft NMS low threshold. + soft_nms_high_threshold (float): Soft NMS high threshold. + post_process_top_k (int): Top k proposals in post process. + feature_extraction_interval (int): + Interval used in feature extraction. Default: 16. + fc1_ratio (float): Ratio for fc1 layer output. Default: 0.1. + fc2_ratio (float): Ratio for fc2 layer output. Default: 0.1. + output_dim (int): Output dimension. Default: 1. + """ + + def __init__(self, + pem_feat_dim, + pem_hidden_dim, + pem_u_ratio_m, + pem_u_ratio_l, + pem_high_temporal_iou_threshold, + pem_low_temporal_iou_threshold, + soft_nms_alpha, + soft_nms_low_threshold, + soft_nms_high_threshold, + post_process_top_k, + feature_extraction_interval=16, + fc1_ratio=0.1, + fc2_ratio=0.1, + output_dim=1): + super().__init__() + + self.feat_dim = pem_feat_dim + self.hidden_dim = pem_hidden_dim + self.u_ratio_m = pem_u_ratio_m + self.u_ratio_l = pem_u_ratio_l + self.pem_high_temporal_iou_threshold = pem_high_temporal_iou_threshold + self.pem_low_temporal_iou_threshold = pem_low_temporal_iou_threshold + self.soft_nms_alpha = soft_nms_alpha + self.soft_nms_low_threshold = soft_nms_low_threshold + self.soft_nms_high_threshold = soft_nms_high_threshold + self.post_process_top_k = post_process_top_k + self.feature_extraction_interval = feature_extraction_interval + self.fc1_ratio = fc1_ratio + self.fc2_ratio = fc2_ratio + self.output_dim = output_dim + + self.fc1 = nn.Linear( + in_features=self.feat_dim, 
out_features=self.hidden_dim, bias=True) + self.fc2 = nn.Linear( + in_features=self.hidden_dim, + out_features=self.output_dim, + bias=True) + + def _forward(self, x): + """Define the computation performed at every call. + + Args: + x (torch.Tensor): The input data. + + Returns: + torch.Tensor: The output of the module. + """ + x = torch.cat(list(x)) + x = F.relu(self.fc1_ratio * self.fc1(x)) + x = torch.sigmoid(self.fc2_ratio * self.fc2(x)) + return x + + def forward_train(self, bsp_feature, reference_temporal_iou): + """Define the computation performed at every call when training.""" + pem_output = self._forward(bsp_feature) + reference_temporal_iou = torch.cat(list(reference_temporal_iou)) + device = pem_output.device + reference_temporal_iou = reference_temporal_iou.to(device) + + anchors_temporal_iou = pem_output.view(-1) + u_hmask = (reference_temporal_iou > + self.pem_high_temporal_iou_threshold).float() + u_mmask = ( + (reference_temporal_iou <= self.pem_high_temporal_iou_threshold) + & (reference_temporal_iou > self.pem_low_temporal_iou_threshold) + ).float() + u_lmask = (reference_temporal_iou <= + self.pem_low_temporal_iou_threshold).float() + + num_h = torch.sum(u_hmask) + num_m = torch.sum(u_mmask) + num_l = torch.sum(u_lmask) + + r_m = self.u_ratio_m * num_h / (num_m) + r_m = torch.min(r_m, torch.Tensor([1.0]).to(device))[0] + u_smmask = torch.rand(u_hmask.size()[0], device=device) + u_smmask = u_smmask * u_mmask + u_smmask = (u_smmask > (1. - r_m)).float() + + r_l = self.u_ratio_l * num_h / (num_l) + r_l = torch.min(r_l, torch.Tensor([1.0]).to(device))[0] + u_slmask = torch.rand(u_hmask.size()[0], device=device) + u_slmask = u_slmask * u_lmask + u_slmask = (u_slmask > (1. 
- r_l)).float() + + temporal_iou_weights = u_hmask + u_smmask + u_slmask + temporal_iou_loss = F.smooth_l1_loss(anchors_temporal_iou, + reference_temporal_iou) + temporal_iou_loss = torch.sum( + temporal_iou_loss * + temporal_iou_weights) / torch.sum(temporal_iou_weights) + loss_dict = dict(temporal_iou_loss=temporal_iou_loss) + + return loss_dict + + def forward_test(self, bsp_feature, tmin, tmax, tmin_score, tmax_score, + video_meta): + """Define the computation performed at every call when testing.""" + pem_output = self._forward(bsp_feature).view(-1).cpu().numpy().reshape( + -1, 1) + + tmin = tmin.view(-1).cpu().numpy().reshape(-1, 1) + tmax = tmax.view(-1).cpu().numpy().reshape(-1, 1) + tmin_score = tmin_score.view(-1).cpu().numpy().reshape(-1, 1) + tmax_score = tmax_score.view(-1).cpu().numpy().reshape(-1, 1) + score = np.array(pem_output * tmin_score * tmax_score).reshape(-1, 1) + result = np.concatenate( + (tmin, tmax, tmin_score, tmax_score, pem_output, score), axis=1) + result = result.reshape(-1, 6) + video_info = dict(video_meta[0]) + proposal_list = post_processing(result, video_info, + self.soft_nms_alpha, + self.soft_nms_low_threshold, + self.soft_nms_high_threshold, + self.post_process_top_k, + self.feature_extraction_interval) + output = [ + dict( + video_name=video_info['video_name'], + proposal_list=proposal_list) + ] + return output + + def forward(self, + bsp_feature, + reference_temporal_iou=None, + tmin=None, + tmax=None, + tmin_score=None, + tmax_score=None, + video_meta=None, + return_loss=True): + """Define the computation performed at every call.""" + if return_loss: + return self.forward_train(bsp_feature, reference_temporal_iou) + + return self.forward_test(bsp_feature, tmin, tmax, tmin_score, + tmax_score, video_meta) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/ssn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/ssn.py new file mode 100644 index 
0000000000000000000000000000000000000000..3136d651f6d76f4be04410605d7dcf7a2d0a34a4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/ssn.py @@ -0,0 +1,136 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn + +from .. import builder +from ..builder import LOCALIZERS +from .base import BaseTAGClassifier + + +@LOCALIZERS.register_module() +class SSN(BaseTAGClassifier): + """Temporal Action Detection with Structured Segment Networks. + + Args: + backbone (dict): Config for building backbone. + cls_head (dict): Config for building classification head. + in_channels (int): Number of channels for input data. + Default: 3. + spatial_type (str): Type of spatial pooling. + Default: 'avg'. + dropout_ratio (float): Ratio of dropout. + Default: 0.5. + loss_cls (dict): Config for building loss. + Default: ``dict(type='SSNLoss')``. + train_cfg (dict | None): Config for training. Default: None. + test_cfg (dict | None): Config for testing. Default: None. 
+ """ + + def __init__(self, + backbone, + cls_head, + in_channels=3, + spatial_type='avg', + dropout_ratio=0.5, + loss_cls=dict(type='SSNLoss'), + train_cfg=None, + test_cfg=None): + + super().__init__(backbone, cls_head, train_cfg, test_cfg) + + self.is_test_prepared = False + self.in_channels = in_channels + + self.spatial_type = spatial_type + if self.spatial_type == 'avg': + self.pool = nn.AvgPool2d((7, 7), stride=1, padding=0) + elif self.spatial_type == 'max': + self.pool = nn.MaxPool2d((7, 7), stride=1, padding=0) + else: + self.pool = None + + self.dropout_ratio = dropout_ratio + if self.dropout_ratio != 0: + self.dropout = nn.Dropout(p=self.dropout_ratio) + else: + self.dropout = None + self.loss_cls = builder.build_loss(loss_cls) + + def forward_train(self, imgs, proposal_scale_factor, proposal_type, + proposal_labels, reg_targets, **kwargs): + """Define the computation performed at every call when training.""" + imgs = imgs.reshape((-1, self.in_channels) + imgs.shape[4:]) + + x = self.extract_feat(imgs) + + if self.pool: + x = self.pool(x) + if self.dropout is not None: + x = self.dropout(x) + + activity_scores, completeness_scores, bbox_preds = self.cls_head( + (x, proposal_scale_factor)) + + loss = self.loss_cls(activity_scores, completeness_scores, bbox_preds, + proposal_type, proposal_labels, reg_targets, + self.train_cfg) + loss_dict = dict(**loss) + + return loss_dict + + def forward_test(self, imgs, relative_proposal_list, scale_factor_list, + proposal_tick_list, reg_norm_consts, **kwargs): + """Define the computation performed at every call when testing.""" + num_crops = imgs.shape[0] + imgs = imgs.reshape((num_crops, -1, self.in_channels) + imgs.shape[3:]) + num_ticks = imgs.shape[1] + + output = [] + minibatch_size = self.test_cfg.ssn.sampler.batch_size + for idx in range(0, num_ticks, minibatch_size): + chunk = imgs[:, idx:idx + + minibatch_size, :, :, :].view((-1, ) + imgs.shape[2:]) + x = self.extract_feat(chunk) + if self.pool: + x = 
self.pool(x) + # Merge crop to save memory. + x = x.reshape((num_crops, x.size(0) // num_crops, -1)).mean(dim=0) + output.append(x) + output = torch.cat(output, dim=0) + + relative_proposal_list = relative_proposal_list.squeeze(0) + proposal_tick_list = proposal_tick_list.squeeze(0) + scale_factor_list = scale_factor_list.squeeze(0) + reg_norm_consts = reg_norm_consts.squeeze(0) + + if not self.is_test_prepared: + self.is_test_prepared = self.cls_head.prepare_test_fc( + self.cls_head.consensus.num_multipliers) + + (output, activity_scores, completeness_scores, + bbox_preds) = self.cls_head( + (output, proposal_tick_list, scale_factor_list), test_mode=True) + + relative_proposal_list = relative_proposal_list.cpu().numpy() + activity_scores = activity_scores.cpu().numpy() + completeness_scores = completeness_scores.cpu().numpy() + reg_norm_consts = reg_norm_consts.cpu().numpy() + if bbox_preds is not None: + bbox_preds = bbox_preds.view(-1, self.cls_head.num_classes, 2) + bbox_preds[:, :, 0] = ( + bbox_preds[:, :, 0] * reg_norm_consts[1, 0] + + reg_norm_consts[0, 0]) + bbox_preds[:, :, 1] = ( + bbox_preds[:, :, 1] * reg_norm_consts[1, 1] + + reg_norm_consts[0, 1]) + bbox_preds = bbox_preds.cpu().numpy() + + result = [ + dict( + relative_proposal_list=relative_proposal_list, + activity_scores=activity_scores, + completeness_scores=completeness_scores, + bbox_preds=bbox_preds) + ] + + return result diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/utils/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..13f70f35f57a6fb057d4d2724192691271c6aee8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/utils/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .post_processing import post_processing + +__all__ = ['post_processing'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/utils/post_processing.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/utils/post_processing.py new file mode 100644 index 0000000000000000000000000000000000000000..4ac81e2f07bb7e86a90a51317528f273546a9169 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/localizers/utils/post_processing.py @@ -0,0 +1,45 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmaction.localization import soft_nms + + +def post_processing(result, video_info, soft_nms_alpha, soft_nms_low_threshold, + soft_nms_high_threshold, post_process_top_k, + feature_extraction_interval): + """Post process for temporal proposals generation. + + Args: + result (np.ndarray): Proposals generated by network. + video_info (dict): Meta data of video. Required keys are + 'duration_frame', 'duration_second'. + soft_nms_alpha (float): Alpha value of Gaussian decaying function. + soft_nms_low_threshold (float): Low threshold for soft nms. + soft_nms_high_threshold (float): High threshold for soft nms. + post_process_top_k (int): Top k values to be considered. + feature_extraction_interval (int): Interval used in feature extraction. + + Returns: + list[dict]: The updated proposals, e.g. + [{'score': 0.9, 'segment': [0, 1]}, + {'score': 0.8, 'segment': [0, 2]}, + ...]. 
+ """ + if len(result) > 1: + result = soft_nms(result, soft_nms_alpha, soft_nms_low_threshold, + soft_nms_high_threshold, post_process_top_k) + + result = result[result[:, -1].argsort()[::-1]] + video_duration = float( + video_info['duration_frame'] // feature_extraction_interval * + feature_extraction_interval + ) / video_info['duration_frame'] * video_info['duration_second'] + proposal_list = [] + + for j in range(min(post_process_top_k, len(result))): + proposal = {} + proposal['score'] = float(result[j, -1]) + proposal['segment'] = [ + max(0, result[j, 0]) * video_duration, + min(1, result[j, 1]) * video_duration + ] + proposal_list.append(proposal) + return proposal_list diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..41afcb7ace2ad28f44e3036a42dc5461301fe11e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .base import BaseWeightedLoss +from .binary_logistic_regression_loss import BinaryLogisticRegressionLoss +from .bmn_loss import BMNLoss +from .cross_entropy_loss import (BCELossWithLogits, CBFocalLoss, + CrossEntropyLoss) +from .hvu_loss import HVULoss +from .nll_loss import NLLLoss +from .ohem_hinge_loss import OHEMHingeLoss +from .ssn_loss import SSNLoss + +__all__ = [ + 'BaseWeightedLoss', 'CrossEntropyLoss', 'NLLLoss', 'BCELossWithLogits', + 'BinaryLogisticRegressionLoss', 'BMNLoss', 'OHEMHingeLoss', 'SSNLoss', + 'HVULoss', 'CBFocalLoss' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/base.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/base.py new file mode 100644 index 0000000000000000000000000000000000000000..9e1df07d7dd7c7db4b7f308f8beb3adaa8256a29 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/base.py @@ -0,0 +1,45 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from abc import ABCMeta, abstractmethod + +import torch.nn as nn + + +class BaseWeightedLoss(nn.Module, metaclass=ABCMeta): + """Base class for loss. + + All subclasses should override the ``_forward()`` method, which returns + the loss without the loss weight applied. + + Args: + loss_weight (float): Scalar factor multiplied with the loss. + Default: 1.0. + """ + + def __init__(self, loss_weight=1.0): + super().__init__() + self.loss_weight = loss_weight + + @abstractmethod + def _forward(self, *args, **kwargs): + pass + + def forward(self, *args, **kwargs): + """Defines the computation performed at every call. + + Args: + *args: The positional arguments for the corresponding + loss. + **kwargs: The keyword arguments for the corresponding + loss. + + Returns: + torch.Tensor: The calculated loss. 
+ """ + ret = self._forward(*args, **kwargs) + if isinstance(ret, dict): + for k in ret: + if 'loss' in k: + ret[k] *= self.loss_weight + else: + ret *= self.loss_weight + return ret diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/binary_logistic_regression_loss.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/binary_logistic_regression_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..74ed294f53884eb0902383678d6707c000077a63 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/binary_logistic_regression_loss.py @@ -0,0 +1,62 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn + +from ..builder import LOSSES + + +def binary_logistic_regression_loss(reg_score, + label, + threshold=0.5, + ratio_range=(1.05, 21), + eps=1e-5): + """Binary Logistic Regression Loss.""" + label = label.view(-1).to(reg_score.device) + reg_score = reg_score.contiguous().view(-1) + + pmask = (label > threshold).float().to(reg_score.device) + num_positive = max(torch.sum(pmask), 1) + num_entries = len(label) + ratio = num_entries / num_positive + # clip ratio value between ratio_range + ratio = min(max(ratio, ratio_range[0]), ratio_range[1]) + + coef_0 = 0.5 * ratio / (ratio - 1) + coef_1 = 0.5 * ratio + loss = coef_1 * pmask * torch.log(reg_score + eps) + coef_0 * ( + 1.0 - pmask) * torch.log(1.0 - reg_score + eps) + loss = -torch.mean(loss) + return loss + + +@LOSSES.register_module() +class BinaryLogisticRegressionLoss(nn.Module): + """Binary Logistic Regression Loss. + + It will calculate binary logistic regression loss given reg_score and + label. + """ + + def forward(self, + reg_score, + label, + threshold=0.5, + ratio_range=(1.05, 21), + eps=1e-5): + """Calculate Binary Logistic Regression Loss. + + Args: + reg_score (torch.Tensor): Predicted score by model. + label (torch.Tensor): Groundtruth labels. + threshold (float): Threshold for positive instances. 
+ Default: 0.5. + ratio_range (tuple): Lower bound and upper bound for ratio. + Default: (1.05, 21) + eps (float): Epsilon for small value. Default: 1e-5. + + Returns: + torch.Tensor: Returned binary logistic loss. + """ + + return binary_logistic_regression_loss(reg_score, label, threshold, + ratio_range, eps) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/bmn_loss.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/bmn_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..eb997c9ea2487b87a3d0ae3da1c3090bdf7ccad9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/bmn_loss.py @@ -0,0 +1,181 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..builder import LOSSES +from .binary_logistic_regression_loss import binary_logistic_regression_loss + + +@LOSSES.register_module() +class BMNLoss(nn.Module): + """BMN Loss. + + From paper https://arxiv.org/abs/1907.09702, + code https://github.com/JJBOY/BMN-Boundary-Matching-Network. + It calculates the loss for the BMN model. This loss is a weighted sum of + + 1) temporal evaluation loss based on confidence score of start and + end positions. + 2) proposal evaluation regression loss based on confidence scores of + candidate proposals. + 3) proposal evaluation classification loss based on classification + results of candidate proposals. + """ + + @staticmethod + def tem_loss(pred_start, pred_end, gt_start, gt_end): + """Calculate Temporal Evaluation Module Loss. + + This function calculates the binary_logistic_regression_loss for start + and end respectively and returns the sum of their losses. + + Args: + pred_start (torch.Tensor): Predicted start score by BMN model. + pred_end (torch.Tensor): Predicted end score by BMN model. + gt_start (torch.Tensor): Groundtruth confidence score for start. + gt_end (torch.Tensor): Groundtruth confidence score for end.
+ + Returns: + torch.Tensor: Returned binary logistic loss. + """ + loss_start = binary_logistic_regression_loss(pred_start, gt_start) + loss_end = binary_logistic_regression_loss(pred_end, gt_end) + loss = loss_start + loss_end + return loss + + @staticmethod + def pem_reg_loss(pred_score, + gt_iou_map, + mask, + high_temporal_iou_threshold=0.7, + low_temporal_iou_threshold=0.3): + """Calculate Proposal Evaluation Module Regression Loss. + + Args: + pred_score (torch.Tensor): Predicted temporal_iou score by BMN. + gt_iou_map (torch.Tensor): Groundtruth temporal_iou score. + mask (torch.Tensor): Boundary-Matching mask. + high_temporal_iou_threshold (float): Higher threshold of + temporal_iou. Default: 0.7. + low_temporal_iou_threshold (float): Lower threshold of + temporal_iou. Default: 0.3. + + Returns: + torch.Tensor: Proposal evaluation regression loss. + """ + u_hmask = (gt_iou_map > high_temporal_iou_threshold).float() + u_mmask = ((gt_iou_map <= high_temporal_iou_threshold) & + (gt_iou_map > low_temporal_iou_threshold)).float() + u_lmask = ((gt_iou_map <= low_temporal_iou_threshold) & + (gt_iou_map > 0.)).float() + u_lmask = u_lmask * mask + + num_h = torch.sum(u_hmask) + num_m = torch.sum(u_mmask) + num_l = torch.sum(u_lmask) + + r_m = num_h / num_m + u_smmask = torch.rand_like(gt_iou_map) + u_smmask = u_mmask * u_smmask + u_smmask = (u_smmask > (1. - r_m)).float() + + r_l = num_h / num_l + u_slmask = torch.rand_like(gt_iou_map) + u_slmask = u_lmask * u_slmask + u_slmask = (u_slmask > (1. - r_l)).float() + + weights = u_hmask + u_smmask + u_slmask + + loss = F.mse_loss(pred_score * weights, gt_iou_map * weights) + loss = 0.5 * torch.sum( + loss * torch.ones_like(weights)) / torch.sum(weights) + + return loss + + @staticmethod + def pem_cls_loss(pred_score, + gt_iou_map, + mask, + threshold=0.9, + ratio_range=(1.05, 21), + eps=1e-5): + """Calculate Proposal Evaluation Module Classification Loss.
+ + Args: + pred_score (torch.Tensor): Predicted temporal_iou score by BMN. + gt_iou_map (torch.Tensor): Groundtruth temporal_iou score. + mask (torch.Tensor): Boundary-Matching mask. + threshold (float): Threshold of temporal_iou for positive + instances. Default: 0.9. + ratio_range (tuple): Lower bound and upper bound for ratio. + Default: (1.05, 21) + eps (float): Epsilon for small value. Default: 1e-5 + + Returns: + torch.Tensor: Proposal evaluation classification loss. + """ + pmask = (gt_iou_map > threshold).float() + nmask = (gt_iou_map <= threshold).float() + nmask = nmask * mask + + num_positive = max(torch.sum(pmask), 1) + num_entries = num_positive + torch.sum(nmask) + ratio = num_entries / num_positive + ratio = torch.clamp(ratio, ratio_range[0], ratio_range[1]) + + coef_0 = 0.5 * ratio / (ratio - 1) + coef_1 = 0.5 * ratio + + loss_pos = coef_1 * torch.log(pred_score + eps) * pmask + loss_neg = coef_0 * torch.log(1.0 - pred_score + eps) * nmask + loss = -1 * torch.sum(loss_pos + loss_neg) / num_entries + return loss + + def forward(self, + pred_bm, + pred_start, + pred_end, + gt_iou_map, + gt_start, + gt_end, + bm_mask, + weight_tem=1.0, + weight_pem_reg=10.0, + weight_pem_cls=1.0): + """Calculate Boundary Matching Network Loss. + + Args: + pred_bm (torch.Tensor): Predicted confidence score for boundary + matching map. + pred_start (torch.Tensor): Predicted confidence score for start. + pred_end (torch.Tensor): Predicted confidence score for end. + gt_iou_map (torch.Tensor): Groundtruth score for boundary matching + map. + gt_start (torch.Tensor): Groundtruth temporal_iou score for start. + gt_end (torch.Tensor): Groundtruth temporal_iou score for end. + bm_mask (torch.Tensor): Boundary-Matching mask. + weight_tem (float): Weight for tem loss. Default: 1.0. + weight_pem_reg (float): Weight for pem regression loss. + Default: 10.0. + weight_pem_cls (float): Weight for pem classification loss. + Default: 1.0. 
+ + Returns: + tuple([torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]): + (loss, tem_loss, pem_reg_loss, pem_cls_loss). Loss is the bmn + loss, tem_loss is the temporal evaluation loss, pem_reg_loss is + the proposal evaluation regression loss, pem_cls_loss is the + proposal evaluation classification loss. + """ + pred_bm_reg = pred_bm[:, 0].contiguous() + pred_bm_cls = pred_bm[:, 1].contiguous() + gt_iou_map = gt_iou_map * bm_mask + + pem_reg_loss = self.pem_reg_loss(pred_bm_reg, gt_iou_map, bm_mask) + pem_cls_loss = self.pem_cls_loss(pred_bm_cls, gt_iou_map, bm_mask) + tem_loss = self.tem_loss(pred_start, pred_end, gt_start, gt_end) + loss = ( + weight_tem * tem_loss + weight_pem_reg * pem_reg_loss + + weight_pem_cls * pem_cls_loss) + return loss, tem_loss, pem_reg_loss, pem_cls_loss diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/cross_entropy_loss.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/cross_entropy_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..fbb91d19bae15eadefe9204ec235c7ce631add37 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/cross_entropy_loss.py @@ -0,0 +1,191 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import torch +import torch.nn.functional as F + +from ..builder import LOSSES +from .base import BaseWeightedLoss + + +@LOSSES.register_module() +class CrossEntropyLoss(BaseWeightedLoss): + """Cross Entropy Loss. + + Support two kinds of labels and their corresponding loss type. It's worth + mentioning that loss type will be detected by the shape of ``cls_score`` + and ``label``. + 1) Hard label: This label is an integer array and all of the elements are + in the range [0, num_classes - 1]. This label's shape should be + ``cls_score``'s shape with the `num_classes` dimension removed. 
+ 2) Soft label (probability distribution over classes): This label is a + probability distribution and all of the elements are in the range + [0, 1]. This label's shape must be the same as ``cls_score``. For now, + only 2-dim soft label is supported. + + Args: + loss_weight (float): Factor scalar multiplied on the loss. + Default: 1.0. + class_weight (list[float] | None): Loss weight for each class. If set + as None, use the same weight 1 for all classes. Only applies + to CrossEntropyLoss and BCELossWithLogits (should not be set when + using other losses). Default: None. + """ + + def __init__(self, loss_weight=1.0, class_weight=None): + super().__init__(loss_weight=loss_weight) + self.class_weight = None + if class_weight is not None: + self.class_weight = torch.Tensor(class_weight) + + def _forward(self, cls_score, label, **kwargs): + """Forward function. + + Args: + cls_score (torch.Tensor): The class score. + label (torch.Tensor): The ground truth label. + kwargs: Any keyword argument to be used to calculate + CrossEntropy loss. + + Returns: + torch.Tensor: The returned CrossEntropy loss. + """ + if cls_score.size() == label.size(): + # calculate loss for soft label + + assert cls_score.dim() == 2, 'Only support 2-dim soft label' + assert len(kwargs) == 0, \ + ('For now, no extra args are supported for soft label, ' + f'but got {kwargs}') + + lsm = F.log_softmax(cls_score, 1) + if self.class_weight is not None: + self.class_weight = self.class_weight.to(cls_score.device) + lsm = lsm * self.class_weight.unsqueeze(0) + loss_cls = -(label * lsm).sum(1) + + # default reduction 'mean' + if self.class_weight is not None: + # Use weighted average as pytorch CrossEntropyLoss does.
+ # For more information, please visit https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html # noqa + loss_cls = loss_cls.sum() / torch.sum( + self.class_weight.unsqueeze(0) * label) + else: + loss_cls = loss_cls.mean() + else: + # calculate loss for hard label + + if self.class_weight is not None: + assert 'weight' not in kwargs, \ + "The key 'weight' already exists." + kwargs['weight'] = self.class_weight.to(cls_score.device) + loss_cls = F.cross_entropy(cls_score, label, **kwargs) + + return loss_cls + + +@LOSSES.register_module() +class BCELossWithLogits(BaseWeightedLoss): + """Binary Cross Entropy Loss with logits. + + Args: + loss_weight (float): Factor scalar multiplied on the loss. + Default: 1.0. + class_weight (list[float] | None): Loss weight for each class. If set + as None, use the same weight 1 for all classes. Only applies + to CrossEntropyLoss and BCELossWithLogits (should not be set when + using other losses). Default: None. + """ + + def __init__(self, loss_weight=1.0, class_weight=None): + super().__init__(loss_weight=loss_weight) + self.class_weight = None + if class_weight is not None: + self.class_weight = torch.Tensor(class_weight) + + def _forward(self, cls_score, label, **kwargs): + """Forward function. + + Args: + cls_score (torch.Tensor): The class score. + label (torch.Tensor): The ground truth label. + kwargs: Any keyword argument to be used to calculate + bce loss with logits. + + Returns: + torch.Tensor: The returned bce loss with logits. + """ + if self.class_weight is not None: + assert 'weight' not in kwargs, "The key 'weight' already exists." + kwargs['weight'] = self.class_weight.to(cls_score.device) + loss_cls = F.binary_cross_entropy_with_logits(cls_score, label, + **kwargs) + return loss_cls + + +@LOSSES.register_module() +class CBFocalLoss(BaseWeightedLoss): + """Class Balanced Focal Loss. Adapted from https://github.com/abhinanda- + punnakkal/BABEL/. 
 This loss is used in the skeleton-based action + recognition baseline for BABEL. + + Args: + loss_weight (float): Factor scalar multiplied on the loss. + Default: 1.0. + samples_per_cls (list[int]): The number of samples per class. + Default: []. + beta (float): Hyperparameter that controls the per class loss weight. + Default: 0.9999. + gamma (float): Hyperparameter of the focal loss. Default: 2.0. + """ + + def __init__(self, + loss_weight=1.0, + samples_per_cls=[], + beta=0.9999, + gamma=2.): + super().__init__(loss_weight=loss_weight) + self.samples_per_cls = samples_per_cls + self.beta = beta + self.gamma = gamma + effective_num = 1.0 - np.power(beta, samples_per_cls) + weights = (1.0 - beta) / np.array(effective_num) + weights = weights / np.sum(weights) * len(weights) + self.weights = weights + self.num_classes = len(weights) + + def _forward(self, cls_score, label, **kwargs): + """Forward function. + + Args: + cls_score (torch.Tensor): The class score. + label (torch.Tensor): The ground truth label. + kwargs: Extra keyword arguments. They are accepted but not + used by this loss. + + Returns: + torch.Tensor: The returned class-balanced focal loss.
+ """ + weights = torch.tensor(self.weights).float().to(cls_score.device) + label_one_hot = F.one_hot(label, self.num_classes).float() + weights = weights.unsqueeze(0) + weights = weights.repeat(label_one_hot.shape[0], 1) * label_one_hot + weights = weights.sum(1) + weights = weights.unsqueeze(1) + weights = weights.repeat(1, self.num_classes) + + BCELoss = F.binary_cross_entropy_with_logits( + input=cls_score, target=label_one_hot, reduction='none') + + modulator = 1.0 + if self.gamma: + modulator = torch.exp(-self.gamma * label_one_hot * cls_score - + self.gamma * + torch.log(1 + torch.exp(-1.0 * cls_score))) + + loss = modulator * BCELoss + weighted_loss = weights * loss + + focal_loss = torch.sum(weighted_loss) + focal_loss /= torch.sum(label_one_hot) + + return focal_loss diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/hvu_loss.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/hvu_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..9deb862177be8257eb432adcb84fb0122e4e254d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/hvu_loss.py @@ -0,0 +1,142 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn.functional as F + +from ..builder import LOSSES +from .base import BaseWeightedLoss + + +@LOSSES.register_module() +class HVULoss(BaseWeightedLoss): + """Calculate the BCELoss for HVU. + + Args: + categories (tuple[str]): Names of tag categories, tags are organized in + this order. Default: ['action', 'attribute', 'concept', 'event', + 'object', 'scene']. + category_nums (tuple[int]): Number of tags for each category. Default: + (739, 117, 291, 69, 1678, 248). + category_loss_weights (tuple[float]): Loss weights of categories, it + applies only if `loss_type == 'individual'`. The loss weights will + be normalized so that the sum equals to 1, so that you can give any + positive number as loss weight. Default: (1, 1, 1, 1, 1, 1). 
+ loss_type (str): The loss type we calculate, we can either calculate + the BCELoss for all tags, or calculate the BCELoss for tags in each + category. Choices are 'individual' or 'all'. Default: 'all'. + with_mask (bool): Since some tag categories are missing for some video + clips. If `with_mask == True`, we will not calculate loss for these + missing categories. Otherwise, these missing categories are treated + as negative samples. + reduction (str): Reduction way. Choices are 'mean' or 'sum'. Default: + 'mean'. + loss_weight (float): The loss weight. Default: 1.0. + """ + + def __init__(self, + categories=('action', 'attribute', 'concept', 'event', + 'object', 'scene'), + category_nums=(739, 117, 291, 69, 1678, 248), + category_loss_weights=(1, 1, 1, 1, 1, 1), + loss_type='all', + with_mask=False, + reduction='mean', + loss_weight=1.0): + + super().__init__(loss_weight) + self.categories = categories + self.category_nums = category_nums + self.category_loss_weights = category_loss_weights + assert len(self.category_nums) == len(self.category_loss_weights) + for category_loss_weight in self.category_loss_weights: + assert category_loss_weight >= 0 + self.loss_type = loss_type + self.with_mask = with_mask + self.reduction = reduction + self.category_startidx = [0] + for i in range(len(self.category_nums) - 1): + self.category_startidx.append(self.category_startidx[-1] + + self.category_nums[i]) + assert self.loss_type in ['individual', 'all'] + assert self.reduction in ['mean', 'sum'] + + def _forward(self, cls_score, label, mask, category_mask): + """Forward function. + + Args: + cls_score (torch.Tensor): The class score. + label (torch.Tensor): The ground truth label. + mask (torch.Tensor): The mask of tags. 0 indicates that the + category of this tag is missing in the label of the video. + category_mask (torch.Tensor): The category mask. 
 For each sample, + it's a tensor of length `len(self.categories)` that indicates + whether each category is labeled for this video. + + Returns: + dict: The computed BCE losses, keyed by loss name. + """ + + if self.loss_type == 'all': + loss_cls = F.binary_cross_entropy_with_logits( + cls_score, label, reduction='none') + if self.with_mask: + w_loss_cls = mask * loss_cls + w_loss_cls = torch.sum(w_loss_cls, dim=1) + if self.reduction == 'mean': + w_loss_cls = w_loss_cls / torch.sum(mask, dim=1) + w_loss_cls = torch.mean(w_loss_cls) + return dict(loss_cls=w_loss_cls) + + if self.reduction == 'sum': + loss_cls = torch.sum(loss_cls, dim=-1) + return dict(loss_cls=torch.mean(loss_cls)) + + if self.loss_type == 'individual': + losses = {} + loss_weights = {} + for name, num, start_idx in zip(self.categories, + self.category_nums, + self.category_startidx): + category_score = cls_score[:, start_idx:start_idx + num] + category_label = label[:, start_idx:start_idx + num] + category_loss = F.binary_cross_entropy_with_logits( + category_score, category_label, reduction='none') + if self.reduction == 'mean': + category_loss = torch.mean(category_loss, dim=1) + elif self.reduction == 'sum': + category_loss = torch.sum(category_loss, dim=1) + + idx = self.categories.index(name) + if self.with_mask: + category_mask_i = category_mask[:, idx].reshape(-1) + # there should be at least one sample which contains tags + # in this category + if torch.sum(category_mask_i) < 0.5: + losses[f'{name}_LOSS'] = torch.tensor(.0).to(cls_score.device) + loss_weights[f'{name}_LOSS'] = .0 + continue + category_loss = torch.sum(category_loss * category_mask_i) + category_loss = category_loss / torch.sum(category_mask_i) + else: + category_loss = torch.mean(category_loss) + # We name the loss of each category as 'LOSS', since we only + # want to monitor them, not backward them.
We will also provide + # the loss used for backward in the losses dictionary + losses[f'{name}_LOSS'] = category_loss + loss_weights[f'{name}_LOSS'] = self.category_loss_weights[idx] + loss_weight_sum = sum(loss_weights.values()) + loss_weights = { + k: v / loss_weight_sum + for k, v in loss_weights.items() + } + loss_cls = sum([losses[k] * loss_weights[k] for k in losses]) + losses['loss_cls'] = loss_cls + # We also trace the loss weights + losses.update({ + k + '_weight': torch.tensor(v).to(losses[k].device) + for k, v in loss_weights.items() + }) + # Note that the loss weights are just for reference. + return losses + else: + raise ValueError("loss_type should be 'all' or 'individual', " + f'but got {self.loss_type}') diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/nll_loss.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/nll_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..754b498ac4888084d5a3942e4200b4fc556cc57d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/nll_loss.py @@ -0,0 +1,27 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch.nn.functional as F + +from ..builder import LOSSES +from .base import BaseWeightedLoss + + +@LOSSES.register_module() +class NLLLoss(BaseWeightedLoss): + """NLL Loss. + + It will calculate NLL loss given cls_score and label. + """ + + def _forward(self, cls_score, label, **kwargs): + """Forward function. + + Args: + cls_score (torch.Tensor): The class score. + label (torch.Tensor): The ground truth label. + kwargs: Any keyword argument to be used to calculate nll loss. + + Returns: + torch.Tensor: The returned nll loss. 
+ """ + loss_cls = F.nll_loss(cls_score, label, **kwargs) + return loss_cls diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/ohem_hinge_loss.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/ohem_hinge_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..8804a194ee210ba86de4dcf47f317e6e93495278 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/ohem_hinge_loss.py @@ -0,0 +1,65 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + + +class OHEMHingeLoss(torch.autograd.Function): + """This class is the core implementation of the completeness loss in the + paper. + + It computes class-wise hinge loss and performs online hard example mining + (OHEM). + """ + + @staticmethod + def forward(ctx, pred, labels, is_positive, ohem_ratio, group_size): + """Calculate OHEM hinge loss. + + Args: + pred (torch.Tensor): Predicted completeness score. + labels (torch.Tensor): Groundtruth class label. + is_positive (int): Set to 1 when proposals are positive and + set to -1 when proposals are incomplete. + ohem_ratio (float): Ratio of hard examples. + group_size (int): Number of proposals sampled per video. + + Returns: + torch.Tensor: Returned class-wise hinge loss.
+ """ + num_samples = pred.size(0) + if num_samples != len(labels): + raise ValueError(f'Number of samples should be equal to that ' + f'of labels, but got {num_samples} samples and ' + f'{len(labels)} labels.') + + losses = torch.zeros(num_samples, device=pred.device) + slopes = torch.zeros(num_samples, device=pred.device) + for i in range(num_samples): + losses[i] = max(0, 1 - is_positive * pred[i, labels[i] - 1]) + slopes[i] = -is_positive if losses[i] != 0 else 0 + + losses = losses.view(-1, group_size).contiguous() + sorted_losses, indices = torch.sort(losses, dim=1, descending=True) + keep_length = int(group_size * ohem_ratio) + loss = torch.zeros(1, device=pred.device) + for i in range(losses.size(0)): + loss += sorted_losses[i, :keep_length].sum() + ctx.loss_index = indices[:, :keep_length] + ctx.labels = labels + ctx.slopes = slopes + ctx.shape = pred.size() + ctx.group_size = group_size + ctx.num_groups = losses.size(0) + return loss + + @staticmethod + def backward(ctx, grad_output): + labels = ctx.labels + slopes = ctx.slopes + + grad_in = torch.zeros(ctx.shape, device=ctx.slopes.device) + for group in range(ctx.num_groups): + for idx in ctx.loss_index[group]: + loc = idx + group * ctx.group_size + grad_in[loc, labels[loc] - 1] = ( + slopes[loc] * grad_output.data[0]) + return torch.autograd.Variable(grad_in), None, None, None, None diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/ssn_loss.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/ssn_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..02c03e3efa8dac5b2e875d3a62310df609f34880 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/losses/ssn_loss.py @@ -0,0 +1,180 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..builder import LOSSES +from .ohem_hinge_loss import OHEMHingeLoss + + +@LOSSES.register_module() +class SSNLoss(nn.Module): + + @staticmethod + def activity_loss(activity_score, labels, activity_indexer): + """Activity Loss. + + It will calculate activity loss given activity_score and label. + + Args: + activity_score (torch.Tensor): Predicted activity score. + labels (torch.Tensor): Groundtruth class label. + activity_indexer (torch.Tensor): Index slices of proposals. + + Returns: + torch.Tensor: Returned cross entropy loss. + """ + pred = activity_score[activity_indexer, :] + gt = labels[activity_indexer] + return F.cross_entropy(pred, gt) + + @staticmethod + def completeness_loss(completeness_score, + labels, + completeness_indexer, + positive_per_video, + incomplete_per_video, + ohem_ratio=0.17): + """Completeness Loss. + + It will calculate completeness loss given completeness_score and label. + + Args: + completeness_score (torch.Tensor): Predicted completeness score. + labels (torch.Tensor): Groundtruth class label. + completeness_indexer (torch.Tensor): Index slices of positive and + incomplete proposals. + positive_per_video (int): Number of positive proposals sampled + per video. + incomplete_per_video (int): Number of incomplete proposals sampled + per video. + ohem_ratio (float): Ratio of online hard example mining. + Default: 0.17. + + Returns: + torch.Tensor: Returned class-wise completeness loss.
+ """ + pred = completeness_score[completeness_indexer, :] + gt = labels[completeness_indexer] + + pred_dim = pred.size(1) + pred = pred.view(-1, positive_per_video + incomplete_per_video, + pred_dim) + gt = gt.view(-1, positive_per_video + incomplete_per_video) + + # yapf:disable + positive_pred = pred[:, :positive_per_video, :].contiguous().view(-1, pred_dim) # noqa:E501 + incomplete_pred = pred[:, positive_per_video:, :].contiguous().view(-1, pred_dim) # noqa:E501 + # yapf:enable + + positive_loss = OHEMHingeLoss.apply( + positive_pred, gt[:, :positive_per_video].contiguous().view(-1), 1, + 1.0, positive_per_video) + incomplete_loss = OHEMHingeLoss.apply( + incomplete_pred, gt[:, positive_per_video:].contiguous().view(-1), + -1, ohem_ratio, incomplete_per_video) + num_positives = positive_pred.size(0) + num_incompletes = int(incomplete_pred.size(0) * ohem_ratio) + + return ((positive_loss + incomplete_loss) / + float(num_positives + num_incompletes)) + + @staticmethod + def classwise_regression_loss(bbox_pred, labels, bbox_targets, + regression_indexer): + """Classwise Regression Loss. + + It will calculate classwise_regression loss given + class_reg_pred and targets. + + Args: + bbox_pred (torch.Tensor): Predicted interval center and span + of positive proposals. + labels (torch.Tensor): Groundtruth class label. + bbox_targets (torch.Tensor): Groundtruth center and span + of positive proposals. + regression_indexer (torch.Tensor): Index slices of + positive proposals. + + Returns: + torch.Tensor: Returned class-wise regression loss. 
+ """ + pred = bbox_pred[regression_indexer, :, :] + gt = labels[regression_indexer] + reg_target = bbox_targets[regression_indexer, :] + + class_idx = gt.data - 1 + classwise_pred = pred[:, class_idx, :] + classwise_reg_pred = torch.cat( + (torch.diag(classwise_pred[:, :, 0]).view( + -1, 1), torch.diag(classwise_pred[:, :, 1]).view(-1, 1)), + dim=1) + loss = F.smooth_l1_loss( + classwise_reg_pred.view(-1), reg_target.view(-1)) * 2 + return loss + + def forward(self, activity_score, completeness_score, bbox_pred, + proposal_type, labels, bbox_targets, train_cfg): + """Calculate the SSN loss. + + Args: + activity_score (torch.Tensor): Predicted activity score. + completeness_score (torch.Tensor): Predicted completeness score. + bbox_pred (torch.Tensor): Predicted interval center and span + of positive proposals. + proposal_type (torch.Tensor): Type index slices of proposals. + labels (torch.Tensor): Groundtruth class label. + bbox_targets (torch.Tensor): Groundtruth center and span + of positive proposals. + train_cfg (dict): Config for training. + + Returns: + dict([torch.Tensor, torch.Tensor, torch.Tensor]): + (loss_activity, loss_completeness, loss_reg). + loss_activity is the activity loss, loss_completeness is + the class-wise completeness loss, + loss_reg is the class-wise regression loss.
+ """ + self.sampler = train_cfg.ssn.sampler + self.loss_weight = train_cfg.ssn.loss_weight + losses = dict() + + proposal_type = proposal_type.view(-1) + labels = labels.view(-1) + activity_indexer = ((proposal_type == 0) + + (proposal_type == 2)).nonzero().squeeze(1) + completeness_indexer = ((proposal_type == 0) + + (proposal_type == 1)).nonzero().squeeze(1) + + total_ratio = ( + self.sampler.positive_ratio + self.sampler.background_ratio + + self.sampler.incomplete_ratio) + positive_per_video = int(self.sampler.num_per_video * + (self.sampler.positive_ratio / total_ratio)) + background_per_video = int( + self.sampler.num_per_video * + (self.sampler.background_ratio / total_ratio)) + incomplete_per_video = ( + self.sampler.num_per_video - positive_per_video - + background_per_video) + + losses['loss_activity'] = self.activity_loss(activity_score, labels, + activity_indexer) + + losses['loss_completeness'] = self.completeness_loss( + completeness_score, + labels, + completeness_indexer, + positive_per_video, + incomplete_per_video, + ohem_ratio=positive_per_video / incomplete_per_video) + losses['loss_completeness'] *= self.loss_weight.comp_loss_weight + + if bbox_pred is not None: + regression_indexer = (proposal_type == 0).nonzero().squeeze(1) + bbox_targets = bbox_targets.view(-1, 2) + losses['loss_reg'] = self.classwise_regression_loss( + bbox_pred, labels, bbox_targets, regression_indexer) + losses['loss_reg'] *= self.loss_weight.reg_loss_weight + + return losses diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/necks/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/necks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..4ffd340960170e55db6976ef9e5b8fa368a4647d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/necks/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .tpn import TPN + +__all__ = ['TPN'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/necks/tpn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/necks/tpn.py new file mode 100644 index 0000000000000000000000000000000000000000..5770ffa98ea7605ee8485ad025ddd5be7f3cba50 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/necks/tpn.py @@ -0,0 +1,449 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import torch +import torch.nn as nn +from mmcv.cnn import ConvModule, constant_init, normal_init, xavier_init + +from ..builder import NECKS, build_loss + + +class Identity(nn.Module): + """Identity mapping.""" + + def forward(self, x): + return x + + +class DownSample(nn.Module): + """DownSample modules. + + It uses convolution and maxpooling to downsample the input feature, + and specifies downsample position to determine `pool-conv` or `conv-pool`. + + Args: + in_channels (int): Channel number of input features. + out_channels (int): Channel number of output feature. + kernel_size (int | tuple[int]): Same as :class:`ConvModule`. + Default: (3, 1, 1). + stride (int | tuple[int]): Same as :class:`ConvModule`. + Default: (1, 1, 1). + padding (int | tuple[int]): Same as :class:`ConvModule`. + Default: (1, 0, 0). + groups (int): Same as :class:`ConvModule`. Default: 1. + bias (bool | str): Same as :class:`ConvModule`. Default: False. + conv_cfg (dict | None): Same as :class:`ConvModule`. + Default: dict(type='Conv3d'). + norm_cfg (dict | None): Same as :class:`ConvModule`. Default: None. + act_cfg (dict | None): Same as :class:`ConvModule`. Default: None. + downsample_position (str): Type of downsample position. Options are + 'before' and 'after'. Default: 'after'. + downsample_scale (int | tuple[int]): downsample scale for maxpooling. + It will be used for kernel size and stride of maxpooling. + Default: (1, 2, 2). 
+ """ + + def __init__(self, + in_channels, + out_channels, + kernel_size=(3, 1, 1), + stride=(1, 1, 1), + padding=(1, 0, 0), + groups=1, + bias=False, + conv_cfg=dict(type='Conv3d'), + norm_cfg=None, + act_cfg=None, + downsample_position='after', + downsample_scale=(1, 2, 2)): + super().__init__() + self.conv = ConvModule( + in_channels, + out_channels, + kernel_size, + stride, + padding, + groups=groups, + bias=bias, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg) + assert downsample_position in ['before', 'after'] + self.downsample_position = downsample_position + self.pool = nn.MaxPool3d( + downsample_scale, downsample_scale, (0, 0, 0), ceil_mode=True) + + def forward(self, x): + if self.downsample_position == 'before': + x = self.pool(x) + x = self.conv(x) + else: + x = self.conv(x) + x = self.pool(x) + return x + + +class LevelFusion(nn.Module): + """Level Fusion module. + + This module is used to aggregate the hierarchical features dynamic in + visual tempos and consistent in spatial semantics. The top/bottom features + for top-down/bottom-up flow would be combined to achieve two additional + options, namely 'Cascade Flow' or 'Parallel Flow'. While applying a + bottom-up flow after a top-down flow will lead to the cascade flow, + applying them simultaneously will result in the parallel flow. + + Args: + in_channels (tuple[int]): Channel numbers of input features tuple. + mid_channels (tuple[int]): Channel numbers of middle features tuple. + out_channels (int): Channel numbers of output features. + downsample_scales (tuple[int | tuple[int]]): downsample scales for + each :class:`DownSample` module. Default: ((1, 1, 1), (1, 1, 1)). 
+ """ + + def __init__(self, + in_channels, + mid_channels, + out_channels, + downsample_scales=((1, 1, 1), (1, 1, 1))): + super().__init__() + num_stages = len(in_channels) + + self.downsamples = nn.ModuleList() + for i in range(num_stages): + downsample = DownSample( + in_channels[i], + mid_channels[i], + kernel_size=(1, 1, 1), + stride=(1, 1, 1), + bias=False, + padding=(0, 0, 0), + groups=32, + norm_cfg=dict(type='BN3d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True), + downsample_position='before', + downsample_scale=downsample_scales[i]) + self.downsamples.append(downsample) + + self.fusion_conv = ConvModule( + sum(mid_channels), + out_channels, + 1, + stride=1, + padding=0, + bias=False, + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True)) + + def forward(self, x): + out = [self.downsamples[i](feature) for i, feature in enumerate(x)] + out = torch.cat(out, 1) + out = self.fusion_conv(out) + + return out + + +class SpatialModulation(nn.Module): + """Spatial Semantic Modulation. + + This module is used to align spatial semantics of features in the + multi-depth pyramid. For each but the top-level feature, a stack + of convolutions with level-specific stride are applied to it, matching + its spatial shape and receptive field with the top one. + + Args: + in_channels (tuple[int]): Channel numbers of input features tuple. + out_channels (int): Channel numbers of output features tuple. 
+ """ + + def __init__(self, in_channels, out_channels): + super().__init__() + + self.spatial_modulation = nn.ModuleList() + for channel in in_channels: + downsample_scale = out_channels // channel + downsample_factor = int(np.log2(downsample_scale)) + op = nn.ModuleList() + if downsample_factor < 1: + op = Identity() + else: + for factor in range(downsample_factor): + in_factor = 2**factor + out_factor = 2**(factor + 1) + op.append( + ConvModule( + channel * in_factor, + channel * out_factor, (1, 3, 3), + stride=(1, 2, 2), + padding=(0, 1, 1), + bias=False, + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d', requires_grad=True), + act_cfg=dict(type='ReLU', inplace=True))) + self.spatial_modulation.append(op) + + def forward(self, x): + out = [] + for i, _ in enumerate(x): + if isinstance(self.spatial_modulation[i], nn.ModuleList): + out_ = x[i] + for op in self.spatial_modulation[i]: + out_ = op(out_) + out.append(out_) + else: + out.append(self.spatial_modulation[i](x[i])) + return out + + +class AuxHead(nn.Module): + """Auxiliary Head. + + This auxiliary head is appended to receive stronger supervision, + leading to enhanced semantics. + + Args: + in_channels (int): Channel number of input features. + out_channels (int): Channel number of output features. + loss_weight (float): Weight of loss for the auxiliary head. + Default: 0.5. + loss_cls (dict): Config for building loss. + Default: ``dict(type='CrossEntropyLoss')``.
+ """ + + def __init__(self, + in_channels, + out_channels, + loss_weight=0.5, + loss_cls=dict(type='CrossEntropyLoss')): + super().__init__() + + self.conv = ConvModule( + in_channels, + in_channels * 2, (1, 3, 3), + stride=(1, 2, 2), + padding=(0, 1, 1), + bias=False, + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d', requires_grad=True)) + self.avg_pool = nn.AdaptiveAvgPool3d((1, 1, 1)) + self.loss_weight = loss_weight + self.dropout = nn.Dropout(p=0.5) + self.fc = nn.Linear(in_channels * 2, out_channels) + self.loss_cls = build_loss(loss_cls) + + def init_weights(self): + for m in self.modules(): + if isinstance(m, nn.Linear): + normal_init(m, std=0.01) + if isinstance(m, nn.Conv3d): + xavier_init(m, distribution='uniform') + if isinstance(m, nn.BatchNorm3d): + constant_init(m, 1) + + def forward(self, x, target=None): + losses = dict() + if target is None: + return losses + x = self.conv(x) + x = self.avg_pool(x).squeeze(-1).squeeze(-1).squeeze(-1) + x = self.dropout(x) + x = self.fc(x) + + if target.shape == torch.Size([]): + target = target.unsqueeze(0) + + losses['loss_aux'] = self.loss_weight * self.loss_cls(x, target) + return losses + + +class TemporalModulation(nn.Module): + """Temporal Rate Modulation. + + The module is used to equip TPN with a similar flexibility for temporal + tempo modulation as in the input-level frame pyramid. + + Args: + in_channels (int): Channel number of input features. + out_channels (int): Channel number of output features. + downsample_scale (int): Downsample scale for maxpooling. Default: 8. 
+ """ + + def __init__(self, in_channels, out_channels, downsample_scale=8): + super().__init__() + + self.conv = ConvModule( + in_channels, + out_channels, (3, 1, 1), + stride=(1, 1, 1), + padding=(1, 0, 0), + bias=False, + groups=32, + conv_cfg=dict(type='Conv3d'), + act_cfg=None) + self.pool = nn.MaxPool3d((downsample_scale, 1, 1), + (downsample_scale, 1, 1), (0, 0, 0), + ceil_mode=True) + + def forward(self, x): + x = self.conv(x) + x = self.pool(x) + return x + + +@NECKS.register_module() +class TPN(nn.Module): + """TPN neck. + + This module is proposed in `Temporal Pyramid Network for Action Recognition + `_ + + Args: + in_channels (tuple[int]): Channel numbers of input features tuple. + out_channels (int): Channel number of output feature. + spatial_modulation_cfg (dict | None): Config for spatial modulation + layers. Required keys are `in_channels` and `out_channels`. + Default: None. + temporal_modulation_cfg (dict | None): Config for temporal modulation + layers. Default: None. + upsample_cfg (dict | None): Config for upsample layers. The keys are + same as that in :class:``nn.Upsample``. Default: None. + downsample_cfg (dict | None): Config for downsample layers. + Default: None. + level_fusion_cfg (dict | None): Config for level fusion layers. + Required keys are 'in_channels', 'mid_channels', 'out_channels'. + Default: None. + aux_head_cfg (dict | None): Config for aux head layers. + Required keys are 'out_channels'. Default: None. + flow_type (str): Flow type to combine the features. Options are + 'cascade' and 'parallel'. Default: 'cascade'. 
+ """ + + def __init__(self, + in_channels, + out_channels, + spatial_modulation_cfg=None, + temporal_modulation_cfg=None, + upsample_cfg=None, + downsample_cfg=None, + level_fusion_cfg=None, + aux_head_cfg=None, + flow_type='cascade'): + super().__init__() + assert isinstance(in_channels, tuple) + assert isinstance(out_channels, int) + self.in_channels = in_channels + self.out_channels = out_channels + self.num_tpn_stages = len(in_channels) + + assert spatial_modulation_cfg is None or isinstance( + spatial_modulation_cfg, dict) + assert temporal_modulation_cfg is None or isinstance( + temporal_modulation_cfg, dict) + assert upsample_cfg is None or isinstance(upsample_cfg, dict) + assert downsample_cfg is None or isinstance(downsample_cfg, dict) + assert aux_head_cfg is None or isinstance(aux_head_cfg, dict) + assert level_fusion_cfg is None or isinstance(level_fusion_cfg, dict) + + if flow_type not in ['cascade', 'parallel']: + raise ValueError( + f"flow type in TPN should be 'cascade' or 'parallel', " + f'but got {flow_type} instead.') + self.flow_type = flow_type + + self.temporal_modulation_ops = nn.ModuleList() + self.upsample_ops = nn.ModuleList() + self.downsample_ops = nn.ModuleList() + + self.level_fusion_1 = LevelFusion(**level_fusion_cfg) + self.spatial_modulation = SpatialModulation(**spatial_modulation_cfg) + + for i in range(self.num_tpn_stages): + + if temporal_modulation_cfg is not None: + downsample_scale = temporal_modulation_cfg[ + 'downsample_scales'][i] + temporal_modulation = TemporalModulation( + in_channels[-1], out_channels, downsample_scale) + self.temporal_modulation_ops.append(temporal_modulation) + + if i < self.num_tpn_stages - 1: + if upsample_cfg is not None: + upsample = nn.Upsample(**upsample_cfg) + self.upsample_ops.append(upsample) + + if downsample_cfg is not None: + downsample = DownSample(out_channels, out_channels, + **downsample_cfg) + self.downsample_ops.append(downsample) + + out_dims = level_fusion_cfg['out_channels'] + + 
# two pyramids + self.level_fusion_2 = LevelFusion(**level_fusion_cfg) + + self.pyramid_fusion = ConvModule( + out_dims * 2, + 2048, + 1, + stride=1, + padding=0, + bias=False, + conv_cfg=dict(type='Conv3d'), + norm_cfg=dict(type='BN3d', requires_grad=True)) + + if aux_head_cfg is not None: + self.aux_head = AuxHead(self.in_channels[-2], **aux_head_cfg) + else: + self.aux_head = None + self.init_weights() + + # default init_weights for conv(msra) and norm in ConvModule + def init_weights(self): + for m in self.modules(): + if isinstance(m, nn.Conv3d): + xavier_init(m, distribution='uniform') + if isinstance(m, nn.BatchNorm3d): + constant_init(m, 1) + + if self.aux_head is not None: + self.aux_head.init_weights() + + def forward(self, x, target=None): + loss_aux = dict() + + # Auxiliary loss + if self.aux_head is not None: + loss_aux = self.aux_head(x[-2], target) + + # Spatial Modulation + spatial_modulation_outs = self.spatial_modulation(x) + + # Temporal Modulation + temporal_modulation_outs = [] + for i, temporal_modulation in enumerate(self.temporal_modulation_ops): + temporal_modulation_outs.append( + temporal_modulation(spatial_modulation_outs[i])) + + outs = [out.clone() for out in temporal_modulation_outs] + if len(self.upsample_ops) != 0: + for i in range(self.num_tpn_stages - 1, 0, -1): + outs[i - 1] = outs[i - 1] + self.upsample_ops[i - 1](outs[i]) + + # Get top-down outs + top_down_outs = self.level_fusion_1(outs) + + # Build bottom-up flow using downsample operation + if self.flow_type == 'parallel': + outs = [out.clone() for out in temporal_modulation_outs] + if len(self.downsample_ops) != 0: + for i in range(self.num_tpn_stages - 1): + outs[i + 1] = outs[i + 1] + self.downsample_ops[i](outs[i]) + + # Get bottom-up outs + bottom_up_outs = self.level_fusion_2(outs) + + # fuse two pyramid outs + outs = self.pyramid_fusion( + torch.cat([top_down_outs, bottom_up_outs], 1)) + + return outs, loss_aux diff --git
a/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..47c06f879a297e8f7d9c069d61d20ea6dfbae681 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/__init__.py @@ -0,0 +1,7 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .audio_recognizer import AudioRecognizer +from .base import BaseRecognizer +from .recognizer2d import Recognizer2D +from .recognizer3d import Recognizer3D + +__all__ = ['BaseRecognizer', 'Recognizer2D', 'Recognizer3D', 'AudioRecognizer'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/audio_recognizer.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/audio_recognizer.py new file mode 100644 index 0000000000000000000000000000000000000000..6d5c828207778c906edcbeab160b312c39be2162 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/audio_recognizer.py @@ -0,0 +1,102 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from ..builder import RECOGNIZERS +from .base import BaseRecognizer + + +@RECOGNIZERS.register_module() +class AudioRecognizer(BaseRecognizer): + """Audio recognizer model framework.""" + + def forward(self, audios, label=None, return_loss=True): + """Define the computation performed at every call.""" + if return_loss: + if label is None: + raise ValueError('Label should not be None.') + return self.forward_train(audios, label) + + return self.forward_test(audios) + + def forward_train(self, audios, labels): + """Defines the computation performed at every call when training.""" + audios = audios.reshape((-1, ) + audios.shape[2:]) + x = self.extract_feat(audios) + cls_score = self.cls_head(x) + gt_labels = labels.squeeze() + loss = self.cls_head.loss(cls_score, gt_labels) + + return loss + + def forward_test(self, audios): + """Defines the computation performed at every call when evaluation and + testing.""" + num_segs = audios.shape[1] + audios = audios.reshape((-1, ) + audios.shape[2:]) + x = self.extract_feat(audios) + cls_score = self.cls_head(x) + cls_score = self.average_clip(cls_score, num_segs) + + return cls_score.cpu().numpy() + + def forward_gradcam(self, audios): + raise NotImplementedError + + def train_step(self, data_batch, optimizer, **kwargs): + """The iteration step during training. + + This method defines an iteration step during training, except for the + back propagation and optimizer updating, which are done in an optimizer + hook. Note that in some complicated cases or models, the whole process + including back propagation and optimizer updating is also defined in + this method, such as GAN. + + Args: + data_batch (dict): The output of dataloader. + optimizer (:obj:`torch.optim.Optimizer` | dict): The optimizer of + runner is passed to ``train_step()``. This argument is unused + and reserved. + + Returns: + dict: It should contain at least 3 keys: ``loss``, ``log_vars``, + ``num_samples``. 
+ ``loss`` is a tensor for back propagation, which can be a + weighted sum of multiple losses. + ``log_vars`` contains all the variables to be sent to the + logger. + ``num_samples`` indicates the batch size (when the model is + DDP, it means the batch size on each GPU), which is used for + averaging the logs. + """ + audios = data_batch['audios'] + label = data_batch['label'] + + losses = self(audios, label) + + loss, log_vars = self._parse_losses(losses) + + outputs = dict( + loss=loss, + log_vars=log_vars, + num_samples=len(next(iter(data_batch.values())))) + + return outputs + + def val_step(self, data_batch, optimizer, **kwargs): + """The iteration step during validation. + + This method shares the same signature as :func:`train_step`, but used + during val epochs. Note that the evaluation after training epochs is + not implemented with this method, but an evaluation hook. + """ + audios = data_batch['audios'] + label = data_batch['label'] + + losses = self(audios, label) + + loss, log_vars = self._parse_losses(losses) + + outputs = dict( + loss=loss, + log_vars=log_vars, + num_samples=len(next(iter(data_batch.values())))) + + return outputs diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/base.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/base.py new file mode 100644 index 0000000000000000000000000000000000000000..a06ec10461fc9bf1d084d345e41d8a8b42ccf551 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/base.py @@ -0,0 +1,335 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings +from abc import ABCMeta, abstractmethod +from collections import OrderedDict + +import torch +import torch.distributed as dist +import torch.nn as nn +import torch.nn.functional as F +from mmcv.runner import auto_fp16 + +from .. import builder + + +class BaseRecognizer(nn.Module, metaclass=ABCMeta): + """Base class for recognizers. + + All recognizers should subclass it. 
+ All subclass should overwrite: + + - Methods:``forward_train``, supporting to forward when training. + - Methods:``forward_test``, supporting to forward when testing. + + Args: + backbone (dict): Backbone modules to extract feature. + cls_head (dict | None): Classification head to process feature. + Default: None. + neck (dict | None): Neck for feature fusion. Default: None. + train_cfg (dict | None): Config for training. Default: None. + test_cfg (dict | None): Config for testing. Default: None. + """ + + def __init__(self, + backbone, + cls_head=None, + neck=None, + train_cfg=None, + test_cfg=None): + super().__init__() + # record the source of the backbone + self.backbone_from = 'mmaction2' + + if backbone['type'].startswith('mmcls.'): + try: + import mmcls.models.builder as mmcls_builder + except (ImportError, ModuleNotFoundError): + raise ImportError('Please install mmcls to use this backbone.') + backbone['type'] = backbone['type'][6:] + self.backbone = mmcls_builder.build_backbone(backbone) + self.backbone_from = 'mmcls' + elif backbone['type'].startswith('torchvision.'): + try: + import torchvision.models + except (ImportError, ModuleNotFoundError): + raise ImportError('Please install torchvision to use this ' + 'backbone.') + backbone_type = backbone.pop('type')[12:] + self.backbone = torchvision.models.__dict__[backbone_type]( + **backbone) + # disable the classifier + self.backbone.classifier = nn.Identity() + self.backbone.fc = nn.Identity() + self.backbone_from = 'torchvision' + elif backbone['type'].startswith('timm.'): + try: + import timm + except (ImportError, ModuleNotFoundError): + raise ImportError('Please install timm to use this ' + 'backbone.') + backbone_type = backbone.pop('type')[5:] + # disable the classifier + backbone['num_classes'] = 0 + self.backbone = timm.create_model(backbone_type, **backbone) + self.backbone_from = 'timm' + else: + self.backbone = builder.build_backbone(backbone) + + if neck is not None: + self.neck = 
builder.build_neck(neck) + + self.cls_head = builder.build_head(cls_head) if cls_head else None + + self.train_cfg = train_cfg + self.test_cfg = test_cfg + + # aux_info is the list of tensor names beyond 'imgs' and 'label' which + # will be used in train_step and val_step, data_batch should contain + # these tensors + self.aux_info = [] + if train_cfg is not None and 'aux_info' in train_cfg: + self.aux_info = train_cfg['aux_info'] + # max_testing_views should be int + self.max_testing_views = None + if test_cfg is not None and 'max_testing_views' in test_cfg: + self.max_testing_views = test_cfg['max_testing_views'] + assert isinstance(self.max_testing_views, int) + + if test_cfg is not None and 'feature_extraction' in test_cfg: + self.feature_extraction = test_cfg['feature_extraction'] + else: + self.feature_extraction = False + + # mini-batch blending, e.g. mixup, cutmix, etc. + self.blending = None + if train_cfg is not None and 'blending' in train_cfg: + from mmcv.utils import build_from_cfg + + from mmaction.datasets.builder import BLENDINGS + self.blending = build_from_cfg(train_cfg['blending'], BLENDINGS) + + self.init_weights() + + self.fp16_enabled = False + + @property + def with_neck(self): + """bool: whether the recognizer has a neck""" + return hasattr(self, 'neck') and self.neck is not None + + @property + def with_cls_head(self): + """bool: whether the recognizer has a cls_head""" + return hasattr(self, 'cls_head') and self.cls_head is not None + + def init_weights(self): + """Initialize the model network weights.""" + if self.backbone_from in ['mmcls', 'mmaction2']: + self.backbone.init_weights() + elif self.backbone_from in ['torchvision', 'timm']: + warnings.warn('We do not initialize weights for backbones in ' + f'{self.backbone_from}, since the weights for ' + f'backbones in {self.backbone_from} are initialized' + ' in their __init__ functions.') + else: + raise NotImplementedError('Unsupported backbone source ' + f'{self.backbone_from}!') + + if
self.with_cls_head: + self.cls_head.init_weights() + if self.with_neck: + self.neck.init_weights() + + @auto_fp16() + def extract_feat(self, imgs): + """Extract features through a backbone. + + Args: + imgs (torch.Tensor): The input images. + + Returns: + torch.Tensor: The extracted features. + """ + if (hasattr(self.backbone, 'features') + and self.backbone_from == 'torchvision'): + x = self.backbone.features(imgs) + elif self.backbone_from == 'timm': + x = self.backbone.forward_features(imgs) + elif self.backbone_from == 'mmcls': + x = self.backbone(imgs) + if isinstance(x, tuple): + assert len(x) == 1 + x = x[0] + else: + x = self.backbone(imgs) + return x + + def average_clip(self, cls_score, num_segs=1): + """Averaging class score over multiple clips. + + Uses the averaging type ('score', 'prob' or None, + which is defined in test_cfg) to compute the final averaged + class score. Only called in test mode. + + Args: + cls_score (torch.Tensor): Class score to be averaged. + num_segs (int): Number of clips for each input sample. + + Returns: + torch.Tensor: Averaged class score. + """ + if 'average_clips' not in self.test_cfg.keys(): + raise KeyError('"average_clips" must be defined in test_cfg\'s keys') + + average_clips = self.test_cfg['average_clips'] + if average_clips not in ['score', 'prob', None]: + raise ValueError(f'{average_clips} is not supported. 
' + f'Currently supported ones are ' + f'["score", "prob", None]') + + if average_clips is None: + return cls_score + + batch_size = cls_score.shape[0] + cls_score = cls_score.view(batch_size // num_segs, num_segs, -1) + + if average_clips == 'prob': + cls_score = F.softmax(cls_score, dim=2).mean(dim=1) + elif average_clips == 'score': + cls_score = cls_score.mean(dim=1) + + return cls_score + + @abstractmethod + def forward_train(self, imgs, labels, **kwargs): + """Defines the computation performed at every call when training.""" + + @abstractmethod + def forward_test(self, imgs): + """Defines the computation performed at every call when evaluation and + testing.""" + + @abstractmethod + def forward_gradcam(self, imgs): + """Defines the computation performed at every call when using gradcam + utils.""" + + @staticmethod + def _parse_losses(losses): + """Parse the raw outputs (losses) of the network. + + Args: + losses (dict): Raw output of the network, which usually contains + losses and other necessary information. + + Returns: + tuple[Tensor, dict]: (loss, log_vars), loss is the loss tensor + which may be a weighted sum of all losses, log_vars contains + all the variables to be sent to the logger.
+ """ + log_vars = OrderedDict() + for loss_name, loss_value in losses.items(): + if isinstance(loss_value, torch.Tensor): + log_vars[loss_name] = loss_value.mean() + elif isinstance(loss_value, list): + log_vars[loss_name] = sum(_loss.mean() for _loss in loss_value) + else: + raise TypeError( + f'{loss_name} is not a tensor or list of tensors') + + loss = sum(_value for _key, _value in log_vars.items() + if 'loss' in _key) + + log_vars['loss'] = loss + for loss_name, loss_value in log_vars.items(): + # reduce loss when distributed training + if dist.is_available() and dist.is_initialized(): + loss_value = loss_value.data.clone() + dist.all_reduce(loss_value.div_(dist.get_world_size())) + log_vars[loss_name] = loss_value.item() + + return loss, log_vars + + def forward(self, imgs, label=None, return_loss=True, **kwargs): + """Define the computation performed at every call.""" + if kwargs.get('gradcam', False): + del kwargs['gradcam'] + return self.forward_gradcam(imgs, **kwargs) + if return_loss: + if label is None: + raise ValueError('Label should not be None.') + if self.blending is not None: + imgs, label = self.blending(imgs, label) + return self.forward_train(imgs, label, **kwargs) + + return self.forward_test(imgs, **kwargs) + + def train_step(self, data_batch, optimizer, **kwargs): + """The iteration step during training. + + This method defines an iteration step during training, except for the + back propagation and optimizer updating, which are done in an optimizer + hook. Note that in some complicated cases or models, the whole process + including back propagation and optimizer updating is also defined in + this method, such as GAN. + + Args: + data_batch (dict): The output of dataloader. + optimizer (:obj:`torch.optim.Optimizer` | dict): The optimizer of + runner is passed to ``train_step()``. This argument is unused + and reserved. + + Returns: + dict: It should contain at least 3 keys: ``loss``, ``log_vars``, + ``num_samples``. 
+ ``loss`` is a tensor for back propagation, which can be a + weighted sum of multiple losses. + ``log_vars`` contains all the variables to be sent to the + logger. + ``num_samples`` indicates the batch size (when the model is + DDP, it means the batch size on each GPU), which is used for + averaging the logs. + """ + imgs = data_batch['imgs'] + label = data_batch['label'] + + aux_info = {} + for item in self.aux_info: + assert item in data_batch + aux_info[item] = data_batch[item] + + losses = self(imgs, label, return_loss=True, **aux_info) + + loss, log_vars = self._parse_losses(losses) + + outputs = dict( + loss=loss, + log_vars=log_vars, + num_samples=len(next(iter(data_batch.values())))) + + return outputs + + def val_step(self, data_batch, optimizer, **kwargs): + """The iteration step during validation. + + This method shares the same signature as :func:`train_step`, but used + during val epochs. Note that the evaluation after training epochs is + not implemented with this method, but an evaluation hook. + """ + imgs = data_batch['imgs'] + label = data_batch['label'] + + aux_info = {} + for item in self.aux_info: + aux_info[item] = data_batch[item] + + losses = self(imgs, label, return_loss=True, **aux_info) + + loss, log_vars = self._parse_losses(losses) + + outputs = dict( + loss=loss, + log_vars=log_vars, + num_samples=len(next(iter(data_batch.values())))) + + return outputs diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/recognizer2d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/recognizer2d.py new file mode 100644 index 0000000000000000000000000000000000000000..a1acc091381744d1dee48da0c0404aa4e03b909f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/recognizer2d.py @@ -0,0 +1,186 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +from torch import nn + +from ..builder import RECOGNIZERS +from .base import BaseRecognizer + + +@RECOGNIZERS.register_module() +class Recognizer2D(BaseRecognizer): + """2D recognizer model framework.""" + + def forward_train(self, imgs, labels, **kwargs): + """Defines the computation performed at every call when training.""" + + assert self.with_cls_head + batches = imgs.shape[0] + imgs = imgs.reshape((-1, ) + imgs.shape[2:]) + num_segs = imgs.shape[0] // batches + + losses = dict() + + x = self.extract_feat(imgs) + + if self.backbone_from in ['torchvision', 'timm']: + if len(x.shape) == 4 and (x.shape[2] > 1 or x.shape[3] > 1): + # apply adaptive avg pooling + x = nn.AdaptiveAvgPool2d(1)(x) + x = x.reshape((x.shape[0], -1)) + x = x.reshape(x.shape + (1, 1)) + + if self.with_neck: + x = [ + each.reshape((-1, num_segs) + + each.shape[1:]).transpose(1, 2).contiguous() + for each in x + ] + x, loss_aux = self.neck(x, labels.squeeze()) + x = x.squeeze(2) + num_segs = 1 + losses.update(loss_aux) + + cls_score = self.cls_head(x, num_segs) + gt_labels = labels.squeeze() + loss_cls = self.cls_head.loss(cls_score, gt_labels, **kwargs) + losses.update(loss_cls) + + return losses + + def _do_test(self, imgs): + """Defines the computation performed at every call when evaluation, + testing and gradcam.""" + batches = imgs.shape[0] + imgs = imgs.reshape((-1, ) + imgs.shape[2:]) + num_segs = imgs.shape[0] // batches + + x = self.extract_feat(imgs) + + if self.backbone_from in ['torchvision', 'timm']: + if len(x.shape) == 4 and (x.shape[2] > 1 or x.shape[3] > 1): + # apply adaptive avg pooling + x = nn.AdaptiveAvgPool2d(1)(x) + x = x.reshape((x.shape[0], -1)) + x = x.reshape(x.shape + (1, 1)) + + if self.with_neck: + x = [ + each.reshape((-1, num_segs) + + each.shape[1:]).transpose(1, 2).contiguous() + for each in x + ] + x, _ = self.neck(x) + x = x.squeeze(2) + num_segs = 1 + + if self.feature_extraction: + # perform spatial pooling + avg_pool = 
nn.AdaptiveAvgPool2d(1) + x = avg_pool(x) + # squeeze dimensions + x = x.reshape((batches, num_segs, -1)) + # temporal average pooling + x = x.mean(axis=1) + return x + + # When using `TSNHead` or `TPNHead`, shape is [batch_size, num_classes] + # When using `TSMHead`, shape is [batch_size * num_crops, num_classes] + # `num_crops` is calculated by: + # 1) `twice_sample` in `SampleFrames` + # 2) `num_sample_positions` in `DenseSampleFrames` + # 3) `ThreeCrop/TenCrop` in `test_pipeline` + # 4) `num_clips` in `SampleFrames` or its subclass if `clip_len != 1` + + # should have cls_head if not extracting features + cls_score = self.cls_head(x, num_segs) + + assert cls_score.size()[0] % batches == 0 + # calculate num_crops automatically + cls_score = self.average_clip(cls_score, + cls_score.size()[0] // batches) + return cls_score + + def _do_fcn_test(self, imgs): + # [N, num_crops * num_segs, C, H, W] -> + # [N * num_crops * num_segs, C, H, W] + batches = imgs.shape[0] + imgs = imgs.reshape((-1, ) + imgs.shape[2:]) + num_segs = self.test_cfg.get('num_segs', self.backbone.num_segments) + + if self.test_cfg.get('flip', False): + imgs = torch.flip(imgs, [-1]) + x = self.extract_feat(imgs) + + if self.with_neck: + x = [ + each.reshape((-1, num_segs) + + each.shape[1:]).transpose(1, 2).contiguous() + for each in x + ] + x, _ = self.neck(x) + else: + x = x.reshape((-1, num_segs) + + x.shape[1:]).transpose(1, 2).contiguous() + + # When using `TSNHead` or `TPNHead`, shape is [batch_size, num_classes] + # When using `TSMHead`, shape is [batch_size * num_crops, num_classes] + # `num_crops` is calculated by: + # 1) `twice_sample` in `SampleFrames` + # 2) `num_sample_positions` in `DenseSampleFrames` + # 3) `ThreeCrop/TenCrop` in `test_pipeline` + # 4) `num_clips` in `SampleFrames` or its subclass if `clip_len != 1` + cls_score = self.cls_head(x, fcn_test=True) + + assert cls_score.size()[0] % batches == 0 + # calculate num_crops automatically + cls_score = 
self.average_clip(cls_score, + cls_score.size()[0] // batches) + return cls_score + + def forward_test(self, imgs): + """Defines the computation performed at every call when evaluation and + testing.""" + if self.test_cfg.get('fcn_test', False): + # If specified, spatially fully-convolutional testing is performed + assert not self.feature_extraction + assert self.with_cls_head + return self._do_fcn_test(imgs).cpu().numpy() + return self._do_test(imgs).cpu().numpy() + + def forward_dummy(self, imgs, softmax=False): + """Used for computing network FLOPs. + + See ``tools/analysis/get_flops.py``. + + Args: + imgs (torch.Tensor): Input images. + + Returns: + Tensor: Class score. + """ + assert self.with_cls_head + batches = imgs.shape[0] + imgs = imgs.reshape((-1, ) + imgs.shape[2:]) + num_segs = imgs.shape[0] // batches + + x = self.extract_feat(imgs) + if self.with_neck: + x = [ + each.reshape((-1, num_segs) + + each.shape[1:]).transpose(1, 2).contiguous() + for each in x + ] + x, _ = self.neck(x) + x = x.squeeze(2) + num_segs = 1 + + outs = self.cls_head(x, num_segs) + if softmax: + outs = nn.functional.softmax(outs) + return (outs, ) + + def forward_gradcam(self, imgs): + """Defines the computation performed at every call when using gradcam + utils.""" + assert self.with_cls_head + return self._do_test(imgs) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/recognizer3d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/recognizer3d.py new file mode 100644 index 0000000000000000000000000000000000000000..8133e7c12e975e69e67954fb737004a96e91d055 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/recognizers/recognizer3d.py @@ -0,0 +1,128 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +from torch import nn + +from ..builder import RECOGNIZERS +from .base import BaseRecognizer + + +@RECOGNIZERS.register_module() +class Recognizer3D(BaseRecognizer): + """3D recognizer model framework.""" + + def forward_train(self, imgs, labels, **kwargs): + """Defines the computation performed at every call when training.""" + + assert self.with_cls_head + imgs = imgs.reshape((-1, ) + imgs.shape[2:]) + losses = dict() + + x = self.extract_feat(imgs) + if self.with_neck: + x, loss_aux = self.neck(x, labels.squeeze()) + losses.update(loss_aux) + + cls_score = self.cls_head(x) + gt_labels = labels.squeeze() + loss_cls = self.cls_head.loss(cls_score, gt_labels, **kwargs) + losses.update(loss_cls) + + return losses + + def _do_test(self, imgs): + """Defines the computation performed at every call when evaluation, + testing and gradcam.""" + batches = imgs.shape[0] + num_segs = imgs.shape[1] + imgs = imgs.reshape((-1, ) + imgs.shape[2:]) + + if self.max_testing_views is not None: + total_views = imgs.shape[0] + assert num_segs == total_views, ( + 'max_testing_views is only compatible ' + 'with batch_size == 1') + view_ptr = 0 + feats = [] + while view_ptr < total_views: + batch_imgs = imgs[view_ptr:view_ptr + self.max_testing_views] + x = self.extract_feat(batch_imgs) + if self.with_neck: + x, _ = self.neck(x) + feats.append(x) + view_ptr += self.max_testing_views + # should consider the case that feat is a tuple + if isinstance(feats[0], tuple): + len_tuple = len(feats[0]) + feat = [ + torch.cat([x[i] for x in feats]) for i in range(len_tuple) + ] + feat = tuple(feat) + else: + feat = torch.cat(feats) + else: + feat = self.extract_feat(imgs) + if self.with_neck: + feat, _ = self.neck(feat) + + if self.feature_extraction: + feat_dim = len(feat[0].size()) if isinstance(feat, tuple) else len( + feat.size()) + assert feat_dim in [ + 5, 2 + ], ('Got feature of unknown architecture, ' + 'only 3D-CNN-like ([N, in_channels, T, H, W]), and ' + 'transformer-like 
([N, in_channels]) features are supported.') + if feat_dim == 5: # 3D-CNN architecture + # perform spatio-temporal pooling + avg_pool = nn.AdaptiveAvgPool3d(1) + if isinstance(feat, tuple): + feat = [avg_pool(x) for x in feat] + # concat them + feat = torch.cat(feat, axis=1) + else: + feat = avg_pool(feat) + # squeeze dimensions + feat = feat.reshape((batches, num_segs, -1)) + # temporal average pooling + feat = feat.mean(axis=1) + return feat + + # should have cls_head if not extracting features + assert self.with_cls_head + cls_score = self.cls_head(feat) + cls_score = self.average_clip(cls_score, num_segs) + return cls_score + + def forward_test(self, imgs): + """Defines the computation performed at every call when evaluation and + testing.""" + return self._do_test(imgs).cpu().numpy() + + def forward_dummy(self, imgs, softmax=False): + """Used for computing network FLOPs. + + See ``tools/analysis/get_flops.py``. + + Args: + imgs (torch.Tensor): Input images. + + Returns: + Tensor: Class score. + """ + assert self.with_cls_head + imgs = imgs.reshape((-1, ) + imgs.shape[2:]) + x = self.extract_feat(imgs) + + if self.with_neck: + x, _ = self.neck(x) + + outs = self.cls_head(x) + if softmax: + outs = nn.functional.softmax(outs) + return (outs, ) + + def forward_gradcam(self, imgs): + """Defines the computation performed at every call when using gradcam + utils.""" + assert self.with_cls_head + return self._do_test(imgs) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/roi_extractors/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/roi_extractors/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..62d681419605a52045c48d1dd92cbdd248772c9a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/roi_extractors/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .single_straight3d import SingleRoIExtractor3D + +__all__ = ['SingleRoIExtractor3D'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/roi_extractors/single_straight3d.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/roi_extractors/single_straight3d.py new file mode 100644 index 0000000000000000000000000000000000000000..fb0c1542dbb7a9a96b788dabd606c8c842a09ccb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/roi_extractors/single_straight3d.py @@ -0,0 +1,121 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +import torch.nn.functional as F + +try: + from mmdet.models import ROI_EXTRACTORS + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + + +class SingleRoIExtractor3D(nn.Module): + """Extract RoI features from a single level feature map. + + Args: + roi_layer_type (str): Specify the RoI layer type. Default: 'RoIAlign'. + featmap_stride (int): Strides of input feature maps. Default: 16. + output_size (int | tuple): Size or (Height, Width). Default: 16. + sampling_ratio (int): number of inputs samples to take for each + output sample. 0 to take samples densely for current models. + Default: 0. + pool_mode (str, 'avg' or 'max'): pooling mode in each bin. + Default: 'avg'. + aligned (bool): if False, use the legacy implementation in + MMDetection. If True, align the results more perfectly. + Default: True. + with_temporal_pool (bool): if True, avgpool the temporal dim. + Default: True. + with_global (bool): if True, concatenate the RoI feature with global + feature. Default: False. + + Note that sampling_ratio, pool_mode, aligned only apply when roi_layer_type + is set as RoIAlign. 
+ """ + + def __init__(self, + roi_layer_type='RoIAlign', + featmap_stride=16, + output_size=16, + sampling_ratio=0, + pool_mode='avg', + aligned=True, + with_temporal_pool=True, + temporal_pool_mode='avg', + with_global=False): + super().__init__() + self.roi_layer_type = roi_layer_type + assert self.roi_layer_type in ['RoIPool', 'RoIAlign'] + self.featmap_stride = featmap_stride + self.spatial_scale = 1. / self.featmap_stride + + self.output_size = output_size + self.sampling_ratio = sampling_ratio + self.pool_mode = pool_mode + self.aligned = aligned + + self.with_temporal_pool = with_temporal_pool + self.temporal_pool_mode = temporal_pool_mode + + self.with_global = with_global + + try: + from mmcv.ops import RoIAlign, RoIPool + except (ImportError, ModuleNotFoundError): + raise ImportError('Failed to import `RoIAlign` and `RoIPool` from ' + '`mmcv.ops`. The two modules will be used in ' + '`SingleRoIExtractor3D`! ') + + if self.roi_layer_type == 'RoIPool': + self.roi_layer = RoIPool(self.output_size, self.spatial_scale) + else: + self.roi_layer = RoIAlign( + self.output_size, + self.spatial_scale, + sampling_ratio=self.sampling_ratio, + pool_mode=self.pool_mode, + aligned=self.aligned) + self.global_pool = nn.AdaptiveAvgPool2d(self.output_size) + + def init_weights(self): + pass + + # The shape of feat is N, C, T, H, W + def forward(self, feat, rois): + if not isinstance(feat, tuple): + feat = (feat, ) + + if len(feat) >= 2: + maxT = max([x.shape[2] for x in feat]) + max_shape = (maxT, ) + feat[0].shape[3:] + # resize each feat to the largest shape (w. 
nearest) + feat = [F.interpolate(x, max_shape).contiguous() for x in feat] + + if self.with_temporal_pool: + if self.temporal_pool_mode == 'avg': + feat = [torch.mean(x, 2, keepdim=True) for x in feat] + elif self.temporal_pool_mode == 'max': + feat = [torch.max(x, 2, keepdim=True)[0] for x in feat] + else: + raise NotImplementedError + + feat = torch.cat(feat, axis=1).contiguous() + + roi_feats = [] + for t in range(feat.size(2)): + frame_feat = feat[:, :, t].contiguous() + roi_feat = self.roi_layer(frame_feat, rois) + if self.with_global: + global_feat = self.global_pool(frame_feat.contiguous()) + inds = rois[:, 0].type(torch.int64) + global_feat = global_feat[inds] + roi_feat = torch.cat([roi_feat, global_feat], dim=1) + roi_feat = roi_feat.contiguous() + roi_feats.append(roi_feat) + + return torch.stack(roi_feats, dim=2), feat + + +if mmdet_imported: + ROI_EXTRACTORS.register_module()(SingleRoIExtractor3D) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..914fd3ec1e0f4876f94fe69bcbabc17702bd96e0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .base import BaseGCN +from .skeletongcn import SkeletonGCN + +__all__ = ['BaseGCN', 'SkeletonGCN'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/base.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/base.py new file mode 100644 index 0000000000000000000000000000000000000000..656266a4f515ad95bd7e3e7ac17f9224bcc1601e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/base.py @@ -0,0 +1,176 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from abc import ABCMeta, abstractmethod +from collections import OrderedDict + +import torch +import torch.distributed as dist +import torch.nn as nn + +from .. import builder + + +class BaseGCN(nn.Module, metaclass=ABCMeta): + """Base class for GCN-based action recognition. + + All GCN-based recognizers should subclass it. + All subclasses should override: + + - Method ``forward_train``, the forward pass used when training. + - Method ``forward_test``, the forward pass used when testing. + + Args: + backbone (dict): Backbone modules to extract feature. + cls_head (dict | None): Classification head to process feature. + Default: None. + train_cfg (dict | None): Config for training. Default: None. + test_cfg (dict | None): Config for testing. Default: None. + """ + + def __init__(self, backbone, cls_head=None, train_cfg=None, test_cfg=None): + super().__init__() + # record the source of the backbone + self.backbone_from = 'mmaction2' + self.backbone = builder.build_backbone(backbone) + self.cls_head = builder.build_head(cls_head) if cls_head else None + + self.train_cfg = train_cfg + self.test_cfg = test_cfg + + self.init_weights() + + @property + def with_cls_head(self): + """bool: whether the recognizer has a cls_head""" + return hasattr(self, 'cls_head') and self.cls_head is not None + + def init_weights(self): + """Initialize the model network weights.""" + if self.backbone_from in ['mmcls', 'mmaction2']: + self.backbone.init_weights() + else: + raise NotImplementedError('Unsupported backbone source ' + f'{self.backbone_from}!') + + if self.with_cls_head: + self.cls_head.init_weights() + + @abstractmethod + def forward_train(self, *args, **kwargs): + """Defines the computation performed at training.""" + + @abstractmethod + def forward_test(self, *args): + """Defines the computation performed at testing.""" + + @staticmethod + def _parse_losses(losses): + """Parse the raw outputs (losses) of the network. 
+ + Args: + losses (dict): Raw output of the network, which usually contain + losses and other necessary information. + + Returns: + tuple[Tensor, dict]: (loss, log_vars), loss is the loss tensor + which may be a weighted sum of all losses, log_vars contains + all the variables to be sent to the logger. + """ + log_vars = OrderedDict() + for loss_name, loss_value in losses.items(): + if isinstance(loss_value, torch.Tensor): + log_vars[loss_name] = loss_value.mean() + elif isinstance(loss_value, list): + log_vars[loss_name] = sum(_loss.mean() for _loss in loss_value) + else: + raise TypeError( + f'{loss_name} is not a tensor or list of tensors') + + loss = sum(_value for _key, _value in log_vars.items() + if 'loss' in _key) + + log_vars['loss'] = loss + for loss_name, loss_value in log_vars.items(): + # reduce loss when distributed training + if dist.is_available() and dist.is_initialized(): + loss_value = loss_value.data.clone() + dist.all_reduce(loss_value.div_(dist.get_world_size())) + log_vars[loss_name] = loss_value.item() + + return loss, log_vars + + def forward(self, keypoint, label=None, return_loss=True, **kwargs): + """Define the computation performed at every call.""" + if return_loss: + if label is None: + raise ValueError('Label should not be None.') + return self.forward_train(keypoint, label, **kwargs) + + return self.forward_test(keypoint, **kwargs) + + def extract_feat(self, skeletons): + """Extract features through a backbone. + + Args: + skeletons (torch.Tensor): The input skeletons. + + Returns: + torch.tensor: The extracted features. + """ + x = self.backbone(skeletons) + return x + + def train_step(self, data_batch, optimizer, **kwargs): + """The iteration step during training. + + This method defines an iteration step during training, except for the + back propagation and optimizer updating, which are done in an optimizer + hook. 
Note that in some complicated cases or models, the whole process + including back propagation and optimizer updating is also defined in + this method, such as GAN. + + Args: + data_batch (dict): The output of dataloader. + optimizer (:obj:`torch.optim.Optimizer` | dict): The optimizer of + runner is passed to ``train_step()``. This argument is unused + and reserved. + + Returns: + dict: It should contain at least 3 keys: ``loss``, ``log_vars``, + ``num_samples``. + ``loss`` is a tensor for back propagation, which can be a + weighted sum of multiple losses. + ``log_vars`` contains all the variables to be sent to the + logger. + ``num_samples`` indicates the batch size (when the model is + DDP, it means the batch size on each GPU), which is used for + averaging the logs. + """ + skeletons = data_batch['keypoint'] + label = data_batch['label'] + label = label.squeeze(-1) + + losses = self(skeletons, label, return_loss=True) + + loss, log_vars = self._parse_losses(losses) + outputs = dict( + loss=loss, log_vars=log_vars, num_samples=len(skeletons.data)) + + return outputs + + def val_step(self, data_batch, optimizer, **kwargs): + """The iteration step during validation. + + This method shares the same signature as :func:`train_step`, but used + during val epochs. Note that the evaluation after training epochs is + not implemented with this method, but an evaluation hook. 
+ """ + skeletons = data_batch['keypoint'] + label = data_batch['label'] + + losses = self(skeletons, label, return_loss=True) + + loss, log_vars = self._parse_losses(losses) + outputs = dict( + loss=loss, log_vars=log_vars, num_samples=len(skeletons.data)) + + return outputs diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/skeletongcn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/skeletongcn.py new file mode 100644 index 0000000000000000000000000000000000000000..0576ee20a36569570c7cd7b06b979bb81b96eac7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/skeletongcn.py @@ -0,0 +1,30 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from ..builder import RECOGNIZERS +from .base import BaseGCN + + +@RECOGNIZERS.register_module() +class SkeletonGCN(BaseGCN): + """Spatial temporal graph convolutional networks.""" + + def forward_train(self, skeletons, labels, **kwargs): + """Defines the computation performed at every call when training.""" + assert self.with_cls_head + losses = dict() + + x = self.extract_feat(skeletons) + output = self.cls_head(x) + gt_labels = labels.squeeze(-1) + loss = self.cls_head.loss(output, gt_labels) + losses.update(loss) + + return losses + + def forward_test(self, skeletons): + """Defines the computation performed at every call when evaluation and + testing.""" + x = self.extract_feat(skeletons) + assert self.with_cls_head + output = self.cls_head(x) + + return output.data.cpu().numpy() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/utils/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6c0b7c0529cf179f08a128a12004ee1e9385b57c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/utils/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .graph import Graph + +__all__ = ['Graph'] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/utils/graph.py b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/utils/graph.py new file mode 100644 index 0000000000000000000000000000000000000000..e0fce39cb18f74cceacc9193a24b81f967c74020 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/models/skeleton_gcn/utils/graph.py @@ -0,0 +1,196 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np + + +def get_hop_distance(num_node, edge, max_hop=1): + adj_mat = np.zeros((num_node, num_node)) + for i, j in edge: + adj_mat[i, j] = 1 + adj_mat[j, i] = 1 + + # compute hop steps + hop_dis = np.zeros((num_node, num_node)) + np.inf + transfer_mat = [ + np.linalg.matrix_power(adj_mat, d) for d in range(max_hop + 1) + ] + arrive_mat = (np.stack(transfer_mat) > 0) + for d in range(max_hop, -1, -1): + hop_dis[arrive_mat[d]] = d + return hop_dis + + +def normalize_digraph(adj_matrix): + Dl = np.sum(adj_matrix, 0) + num_nodes = adj_matrix.shape[0] + Dn = np.zeros((num_nodes, num_nodes)) + for i in range(num_nodes): + if Dl[i] > 0: + Dn[i, i] = Dl[i]**(-1) + norm_matrix = np.dot(adj_matrix, Dn) + return norm_matrix + + +def edge2mat(link, num_node): + A = np.zeros((num_node, num_node)) + for i, j in link: + A[j, i] = 1 + return A + + +class Graph: + """The Graph to model the skeletons extracted by OpenPose. + + Args: + layout (str): must be one of the following candidates + - openpose: 18 or 25 joints. For more information, please refer to: + https://github.com/CMU-Perceptual-Computing-Lab/openpose#output + - ntu-rgb+d: It consists of 25 joints. 
For more information, please + refer to https://github.com/shahroudy/NTURGB-D + + strategy (str): must be one of the following candidates + - uniform: Uniform Labeling + - distance: Distance Partitioning + - spatial: Spatial Configuration + For more information, please refer to the section 'Partition + Strategies' in our paper (https://arxiv.org/abs/1801.07455). + + max_hop (int): the maximal distance between two connected nodes. + Default: 1 + dilation (int): controls the spacing between the kernel points. + Default: 1 + """ + + def __init__(self, + layout='openpose-18', + strategy='uniform', + max_hop=1, + dilation=1): + self.max_hop = max_hop + self.dilation = dilation + + assert layout in [ + 'openpose-18', 'openpose-25', 'ntu-rgb+d', 'ntu_edge', 'coco' + ] + assert strategy in ['uniform', 'distance', 'spatial', 'agcn'] + self.get_edge(layout) + self.hop_dis = get_hop_distance( + self.num_node, self.edge, max_hop=max_hop) + self.get_adjacency(strategy) + + def __str__(self): + # __str__ must return a str; self.A is a numpy array + return str(self.A) + + def get_edge(self, layout): + """This method returns the edge pairs of the layout.""" + + if layout == 'openpose-18': + self.num_node = 18 + self_link = [(i, i) for i in range(self.num_node)] + neighbor_link = [(4, 3), (3, 2), (7, 6), (6, 5), + (13, 12), (12, 11), (10, 9), (9, 8), (11, 5), + (8, 2), (5, 1), (2, 1), (0, 1), (15, 0), (14, 0), + (17, 15), (16, 14)] + self.edge = self_link + neighbor_link + self.center = 1 + elif layout == 'openpose-25': + self.num_node = 25 + self_link = [(i, i) for i in range(self.num_node)] + neighbor_link = [(4, 3), (3, 2), (7, 6), (6, 5), (23, 22), + (22, 11), (24, 11), (11, 10), (10, 9), (9, 8), + (20, 19), (19, 14), (21, 14), (14, 13), (13, 12), + (12, 8), (8, 1), (5, 1), (2, 1), (0, 1), (15, 0), + (16, 0), (17, 15), (18, 16)] + self.self_link = self_link + self.neighbor_link = neighbor_link + self.edge = self_link + neighbor_link + self.center = 1 + elif layout == 'ntu-rgb+d': + self.num_node = 25 + self_link = [(i, i) for i in 
range(self.num_node)] + neighbor_1base = [(1, 2), (2, 21), (3, 21), + (4, 3), (5, 21), (6, 5), (7, 6), (8, 7), (9, 21), + (10, 9), (11, 10), (12, 11), (13, 1), (14, 13), + (15, 14), (16, 15), (17, 1), (18, 17), (19, 18), + (20, 19), (22, 23), (23, 8), (24, 25), (25, 12)] + neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base] + self.self_link = self_link + self.neighbor_link = neighbor_link + self.edge = self_link + neighbor_link + self.center = 21 - 1 + elif layout == 'ntu_edge': + self.num_node = 24 + self_link = [(i, i) for i in range(self.num_node)] + neighbor_1base = [(1, 2), (3, 2), (4, 3), (5, 2), (6, 5), (7, 6), + (8, 7), (9, 2), (10, 9), (11, 10), (12, 11), + (13, 1), (14, 13), (15, 14), (16, 15), (17, 1), + (18, 17), (19, 18), (20, 19), (21, 22), (22, 8), + (23, 24), (24, 12)] + neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base] + self.edge = self_link + neighbor_link + self.center = 2 + elif layout == 'coco': + self.num_node = 17 + self_link = [(i, i) for i in range(self.num_node)] + neighbor_1base = [[16, 14], [14, 12], [17, 15], [15, 13], [12, 13], + [6, 12], [7, 13], [6, 7], [8, 6], [9, 7], + [10, 8], [11, 9], [2, 3], [2, 1], [3, 1], [4, 2], + [5, 3], [4, 6], [5, 7]] + neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base] + self.edge = self_link + neighbor_link + self.center = 0 + else: + raise ValueError(f'{layout} is not supported.') + + def get_adjacency(self, strategy): + """This method returns the adjacency matrix according to strategy.""" + + valid_hop = range(0, self.max_hop + 1, self.dilation) + adjacency = np.zeros((self.num_node, self.num_node)) + for hop in valid_hop: + adjacency[self.hop_dis == hop] = 1 + normalize_adjacency = normalize_digraph(adjacency) + + if strategy == 'uniform': + A = np.zeros((1, self.num_node, self.num_node)) + A[0] = normalize_adjacency + self.A = A + elif strategy == 'distance': + A = np.zeros((len(valid_hop), self.num_node, self.num_node)) + for i, hop in enumerate(valid_hop): + 
A[i][self.hop_dis == hop] = normalize_adjacency[self.hop_dis == + hop] + self.A = A + elif strategy == 'spatial': + A = [] + for hop in valid_hop: + a_root = np.zeros((self.num_node, self.num_node)) + a_close = np.zeros((self.num_node, self.num_node)) + a_further = np.zeros((self.num_node, self.num_node)) + for i in range(self.num_node): + for j in range(self.num_node): + if self.hop_dis[j, i] == hop: + if self.hop_dis[j, self.center] == self.hop_dis[ + i, self.center]: + a_root[j, i] = normalize_adjacency[j, i] + elif self.hop_dis[j, self.center] > self.hop_dis[ + i, self.center]: + a_close[j, i] = normalize_adjacency[j, i] + else: + a_further[j, i] = normalize_adjacency[j, i] + if hop == 0: + A.append(a_root) + else: + A.append(a_root + a_close) + A.append(a_further) + A = np.stack(A) + self.A = A + elif strategy == 'agcn': + A = [] + link_mat = edge2mat(self.self_link, self.num_node) + In = normalize_digraph(edge2mat(self.neighbor_link, self.num_node)) + outward = [(j, i) for (i, j) in self.neighbor_link] + Out = normalize_digraph(edge2mat(outward, self.num_node)) + A = np.stack((link_mat, In, Out)) + self.A = A + else: + raise ValueError(f'{strategy} is not a supported strategy.') diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a1bbbb761ad71c10037fdf5c6983c7f2466d0f0a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .collect_env import collect_env +from .distribution_env import build_ddp, build_dp, default_device +from .gradcam_utils import GradCAM +from .logger import get_root_logger +from .misc import get_random_string, get_shm_dir, get_thread_id +from .module_hooks import register_module_hooks +from .precise_bn import PreciseBNHook +from .setup_env import setup_multi_processes + +__all__ = [ + 'get_root_logger', 'collect_env', 'get_random_string', 'get_thread_id', + 'get_shm_dir', 'GradCAM', 'PreciseBNHook', 'register_module_hooks', + 'setup_multi_processes', 'build_ddp', 'build_dp', 'default_device' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/collect_env.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/collect_env.py new file mode 100644 index 0000000000000000000000000000000000000000..fb8e26409599f969b29eef0cfeba4f81e9be1e31 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/collect_env.py @@ -0,0 +1,17 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmcv.utils import collect_env as collect_basic_env +from mmcv.utils import get_git_hash + +import mmaction + + +def collect_env(): + env_info = collect_basic_env() + env_info['MMAction2'] = ( + mmaction.__version__ + '+' + get_git_hash(digits=7)) + return env_info + + +if __name__ == '__main__': + for name, val in collect_env().items(): + print(f'{name}: {val}') diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/distribution_env.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/distribution_env.py new file mode 100644 index 0000000000000000000000000000000000000000..6e241e032e979034f691f699b60c0d5fcb6c30fc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/distribution_env.py @@ -0,0 +1,94 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +from mmcv.parallel import MMDataParallel, MMDistributedDataParallel + +dp_factory = {'cuda': MMDataParallel, 'cpu': MMDataParallel} + +ddp_factory = {'cuda': MMDistributedDataParallel} + + +def build_dp(model, device='cuda', default_args=None): + """Build DataParallel module by device type. + + If device is cuda, return a MMDataParallel model; if device is mlu, + return a MLUDataParallel model. + Args: + model(nn.Module): model to be parallelized. + device(str): device type, cuda, cpu or mlu. Defaults to cuda. + default_args: dict type, include the following parameters. + device_ids(int): device ids of modules to be scattered to. + Defaults to None when GPU or MLU is not available. + Returns: + model(nn.Module): the model to be parallelized. + """ + + if device == 'cuda': + model = model.cuda() + elif device == 'mlu': + from mmcv.device.mlu import MLUDataParallel + dp_factory['mlu'] = MLUDataParallel + model = model.mlu() + + return dp_factory[device](model, **default_args) + + +def build_ddp(model, device='cuda', default_args=None): + """Build DistributedDataParallel module by device type. + If device is cuda, return a MMDistributedDataParallel model; + if device is mlu, return a MLUDistributedDataParallel model. + Args: + model(:class:`nn.Module`): module to be parallelized. + device(str): device type, mlu or cuda. + default_args: dict type, include the following parameters. + device_ids(int): which represents the only device where the input + module corresponding to this process resides. Defaults to None. + broadcast_buffers(bool): Flag that enables syncing (broadcasting) + buffers of the module at beginning of the forward function. + Defaults to True. + find_unused_parameters(bool): Traverse the autograd graph of all + tensors contained in the return value of the wrapped module's + ``forward`` function. + Parameters that don't receive gradients as part of this graph + are preemptively marked as being ready to be reduced. 
Note that + all ``forward`` outputs that are derived from module parameters + must participate in calculating loss and later the gradient + computation. If they don't, this wrapper will hang waiting + for autograd to produce gradients for those parameters. Any + outputs derived from module parameters that are otherwise + unused can be detached from the autograd graph using + ``torch.Tensor.detach``. Defaults to False. + Returns: + model(nn.Module): the module to be parallelized + References: + .. [1] https://pytorch.org/docs/stable/generated/torch.nn.parallel. + DistributedDataParallel.html + """ + + assert device in ['cuda', 'mlu' + ], 'Only available for cuda or mlu devices currently.' + if device == 'cuda': + model = model.cuda() + elif device == 'mlu': + from mmcv.device.mlu import MLUDistributedDataParallel + ddp_factory['mlu'] = MLUDistributedDataParallel + model = model.mlu() + + return ddp_factory[device](model, **default_args) + + +def is_mlu_available(): + """Returns a bool indicating if MLU is currently available.""" + return hasattr(torch, 'is_mlu_available') and torch.is_mlu_available() + + +def get_device(): + """Returns an available device, cpu, cuda or mlu.""" + is_device_available = { + 'cuda': torch.cuda.is_available(), + 'mlu': is_mlu_available() + } + device_list = [k for k, v in is_device_available.items() if v] + return device_list[0] if len(device_list) == 1 else 'cpu' + + +default_device = get_device() diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/gradcam_utils.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/gradcam_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..06d0c78b8ed6dd9595ac59d8cbe1fac71397373f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/gradcam_utils.py @@ -0,0 +1,232 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn.functional as F + + +class GradCAM: + """GradCAM class helps create visualization results. 
+ + Visualization results are blended by heatmaps and input images. + This class is modified from + https://github.com/facebookresearch/SlowFast/blob/master/slowfast/visualization/gradcam_utils.py # noqa + For more information about GradCAM, please visit: + https://arxiv.org/pdf/1610.02391.pdf + """ + + def __init__(self, model, target_layer_name, colormap='viridis'): + """Create GradCAM class with recognizer, target layername & colormap. + + Args: + model (nn.Module): the recognizer model to be used. + target_layer_name (str): name of convolutional layer to + be used to get gradients and feature maps from for creating + localization maps. + colormap (Optional[str]): matplotlib colormap used to create + heatmap. Default: 'viridis'. For more information, please visit + https://matplotlib.org/3.3.0/tutorials/colors/colormaps.html + """ + from ..models.recognizers import Recognizer2D, Recognizer3D + if isinstance(model, Recognizer2D): + self.is_recognizer2d = True + elif isinstance(model, Recognizer3D): + self.is_recognizer2d = False + else: + raise ValueError( + 'GradCAM utils only support Recognizer2D & Recognizer3D.') + + self.model = model + self.model.eval() + self.target_gradients = None + self.target_activations = None + + import matplotlib.pyplot as plt + self.colormap = plt.get_cmap(colormap) + self.data_mean = torch.tensor(model.cfg.img_norm_cfg['mean']) + self.data_std = torch.tensor(model.cfg.img_norm_cfg['std']) + self._register_hooks(target_layer_name) + + def _register_hooks(self, layer_name): + """Register forward and backward hook to a layer, given layer_name, to + obtain gradients and activations. + + Args: + layer_name (str): name of the layer. 
+ """ + + def get_gradients(module, grad_input, grad_output): + self.target_gradients = grad_output[0].detach() + + def get_activations(module, input, output): + self.target_activations = output.clone().detach() + + layer_ls = layer_name.split('/') + prev_module = self.model + for layer in layer_ls: + prev_module = prev_module._modules[layer] + + target_layer = prev_module + target_layer.register_forward_hook(get_activations) + target_layer.register_backward_hook(get_gradients) + + def _calculate_localization_map(self, inputs, use_labels, delta=1e-20): + """Calculate localization map for all inputs with Grad-CAM. + + Args: + inputs (dict): model inputs, generated by test pipeline, + at least including two keys, ``imgs`` and ``label``. + use_labels (bool): Whether to use given labels to generate + localization map. Labels are in ``inputs['label']``. + delta (float): used in localization map normalization, + must be small enough. Please make sure + `localization_map_max - localization_map_min >> delta` + Returns: + tuple[torch.Tensor, torch.Tensor]: (localization_map, preds) + localization_map (torch.Tensor): the localization map for + input imgs. + preds (torch.Tensor): Model predictions for `inputs` with + shape (batch_size, num_classes). 
+ """ + inputs['imgs'] = inputs['imgs'].clone() + + # model forward & backward + preds = self.model(gradcam=True, **inputs) + if use_labels: + labels = inputs['label'] + if labels.ndim == 1: + labels = labels.unsqueeze(-1) + score = torch.gather(preds, dim=1, index=labels) + else: + score = torch.max(preds, dim=-1)[0] + self.model.zero_grad() + score = torch.sum(score) + score.backward() + + if self.is_recognizer2d: + # [batch_size, num_segments, 3, H, W] + b, t, _, h, w = inputs['imgs'].size() + else: + # [batch_size, num_crops*num_clips, 3, clip_len, H, W] + b1, b2, _, t, h, w = inputs['imgs'].size() + b = b1 * b2 + + gradients = self.target_gradients + activations = self.target_activations + if self.is_recognizer2d: + # [B*Tg, C', H', W'] + b_tg, c, _, _ = gradients.size() + tg = b_tg // b + else: + # source shape: [B, C', Tg, H', W'] + _, c, tg, _, _ = gradients.size() + # target shape: [B, Tg, C', H', W'] + gradients = gradients.permute(0, 2, 1, 3, 4) + activations = activations.permute(0, 2, 1, 3, 4) + + # calculate & resize to [B, 1, T, H, W] + weights = torch.mean(gradients.view(b, tg, c, -1), dim=3) + weights = weights.view(b, tg, c, 1, 1) + activations = activations.view([b, tg, c] + + list(activations.size()[-2:])) + localization_map = torch.sum( + weights * activations, dim=2, keepdim=True) + localization_map = F.relu(localization_map) + localization_map = localization_map.permute(0, 2, 1, 3, 4) + localization_map = F.interpolate( + localization_map, + size=(t, h, w), + mode='trilinear', + align_corners=False) + + # Normalize the localization map. 
+ localization_map_min, localization_map_max = ( + torch.min(localization_map.view(b, -1), dim=-1, keepdim=True)[0], + torch.max(localization_map.view(b, -1), dim=-1, keepdim=True)[0]) + localization_map_min = torch.reshape( + localization_map_min, shape=(b, 1, 1, 1, 1)) + localization_map_max = torch.reshape( + localization_map_max, shape=(b, 1, 1, 1, 1)) + localization_map = (localization_map - localization_map_min) / ( + localization_map_max - localization_map_min + delta) + localization_map = localization_map.data + + return localization_map.squeeze(dim=1), preds + + def _alpha_blending(self, localization_map, input_imgs, alpha): + """Blend heatmaps and model input images and get visualization results. + + Args: + localization_map (torch.Tensor): localization map for all inputs, + generated with Grad-CAM + input_imgs (torch.Tensor): model inputs, normed images. + alpha (float): transparency level of the heatmap, + in the range [0, 1]. + Returns: + torch.Tensor: blending results for localization map and input + images, with shape [B, T, H, W, 3] and pixel values in + RGB order within range [0, 1]. + """ + # localization_map shape [B, T, H, W] + localization_map = localization_map.cpu() + + # heatmap shape [B, T, H, W, 3] in RGB order + heatmap = self.colormap(localization_map.detach().numpy()) + heatmap = heatmap[:, :, :, :, :3] + heatmap = torch.from_numpy(heatmap) + + # Permute input imgs to [B, T, H, W, 3], like heatmap + if self.is_recognizer2d: + # Recognizer2D input (B, T, C, H, W) + curr_inp = input_imgs.permute(0, 1, 3, 4, 2) + else: + # Recognizer3D input (B', num_clips*num_crops, C, T, H, W) + # B = B' * num_clips * num_crops + curr_inp = input_imgs.view([-1] + list(input_imgs.size()[2:])) + curr_inp = curr_inp.permute(0, 2, 3, 4, 1) + + # renormalize input imgs to [0, 1] + curr_inp = curr_inp.cpu() + curr_inp *= self.data_std + curr_inp += self.data_mean + curr_inp /= 255. 
+ + # alpha blending + blended_imgs = alpha * heatmap + (1 - alpha) * curr_inp + + return blended_imgs + + def __call__(self, inputs, use_labels=False, alpha=0.5): + """Visualize the localization maps on their corresponding inputs as + heatmap, using Grad-CAM. + + Generate visualization results for **ALL CROPS**. + For example, for I3D model, if `clip_len=32, num_clips=10` and + use `ThreeCrop` in test pipeline, then for every model inputs, + there are 960(32*10*3) images generated. + + Args: + inputs (dict): model inputs, generated by test pipeline, + at least including two keys, ``imgs`` and ``label``. + use_labels (bool): Whether to use given labels to generate + localization map. Labels are in ``inputs['label']``. + alpha (float): transparency level of the heatmap, + in the range [0, 1]. + Returns: + blended_imgs (torch.Tensor): Visualization results, blended by + localization maps and model inputs. + preds (torch.Tensor): Model predictions for inputs. + """ + + # localization_map shape [B, T, H, W] + # preds shape [batch_size, num_classes] + localization_map, preds = self._calculate_localization_map( + inputs, use_labels=use_labels) + + # blended_imgs shape [B, T, H, W, 3] + blended_imgs = self._alpha_blending(localization_map, inputs['imgs'], + alpha) + + # blended_imgs shape [B, T, H, W, 3] + # preds shape [batch_size, num_classes] + # Recognizer2D: B = batch_size, T = num_segments + # Recognizer3D: B = batch_size * num_crops * num_clips, T = clip_len + return blended_imgs, preds diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/logger.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/logger.py new file mode 100644 index 0000000000000000000000000000000000000000..6b4a3fc0ee75717b048ff60f8fe9387a36d1319b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/logger.py @@ -0,0 +1,25 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
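The localization-map arithmetic in `_calculate_localization_map` above (per-channel weights from the mean gradient, a weighted sum of activations, ReLU, then min-max normalization with `delta`) can be illustrated for a single frame with a minimal pure-Python sketch. It assumes nested lists in place of tensors and leaves out the trilinear interpolation; `grad_cam_map` and its arguments are hypothetical names for illustration, not part of mmaction2.

```python
def grad_cam_map(activations, gradients, delta=1e-20):
    """Hypothetical helper: Grad-CAM map for one frame.

    Both arguments are lists of C channels, each a list of H' rows
    holding W' floats (an assumption made for this sketch).
    """
    # Channel weights: spatial mean of each channel's gradient.
    weights = [
        sum(sum(row) for row in grad) / (len(grad) * len(grad[0]))
        for grad in gradients
    ]
    height, width = len(activations[0]), len(activations[0][0])
    # Weighted sum over channels, followed by ReLU.
    cam = [[
        max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
        for j in range(width)
    ] for i in range(height)]
    # Min-max normalization; delta avoids division by zero on a flat map.
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    return [[(v - lo) / (hi - lo + delta) for v in row] for row in cam]


activations = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam_map(activations, gradients)
```

In this toy input the second channel has a negative weight, so positions where it dominates are clipped to zero by the ReLU before normalization.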
+import logging + +from mmcv.utils import get_logger + + +def get_root_logger(log_file=None, log_level=logging.INFO): + """Use ``get_logger`` method in mmcv to get the root logger. + + The logger will be initialized if it has not been initialized. By default a + StreamHandler will be added. If ``log_file`` is specified, a FileHandler + will also be added. The name of the root logger is the top-level package + name, e.g., "mmaction". + + Args: + log_file (str | None): The log filename. If specified, a FileHandler + will be added to the root logger. + log_level (int): The root logger level. Note that only the process of + rank 0 is affected, while other processes will set the level to + "Error" and be silent most of the time. + + Returns: + :obj:`logging.Logger`: The root logger. + """ + return get_logger(__name__.split('.')[0], log_file, log_level) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/misc.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/misc.py new file mode 100644 index 0000000000000000000000000000000000000000..cc1efc95984033cc6faf9042c6db72e3a3c8bca7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/misc.py @@ -0,0 +1,27 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import ctypes +import random +import string + + +def get_random_string(length=15): + """Get random string with letters and digits. + + Args: + length (int): Length of random string. Default: 15. 
+ """ + return ''.join( + random.choice(string.ascii_letters + string.digits) + for _ in range(length)) + + +def get_thread_id(): + """Get current thread id.""" + # use ctype to find thread id + thread_id = ctypes.CDLL('libc.so.6').syscall(186) + return thread_id + + +def get_shm_dir(): + """Get shm dir for temporary usage.""" + return '/dev/shm' diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/module_hooks.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/module_hooks.py new file mode 100644 index 0000000000000000000000000000000000000000..6ee6227d3ce09748de7c49b8ec2829f28a0c79a5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/module_hooks.py @@ -0,0 +1,88 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +from mmcv.utils import Registry, build_from_cfg + +MODULE_HOOKS = Registry('module_hooks') + + +def register_module_hooks(Module, module_hooks_list): + handles = [] + for module_hook_cfg in module_hooks_list: + hooked_module_name = module_hook_cfg.pop('hooked_module', 'backbone') + if not hasattr(Module, hooked_module_name): + raise ValueError( + f'{Module.__class__} has no {hooked_module_name}!') + hooked_module = getattr(Module, hooked_module_name) + hook_pos = module_hook_cfg.pop('hook_pos', 'forward_pre') + + if hook_pos == 'forward_pre': + handle = hooked_module.register_forward_pre_hook( + build_from_cfg(module_hook_cfg, MODULE_HOOKS).hook_func()) + elif hook_pos == 'forward': + handle = hooked_module.register_forward_hook( + build_from_cfg(module_hook_cfg, MODULE_HOOKS).hook_func()) + elif hook_pos == 'backward': + handle = hooked_module.register_backward_hook( + build_from_cfg(module_hook_cfg, MODULE_HOOKS).hook_func()) + else: + raise ValueError( + f'hook_pos must be `forward_pre`, `forward` or `backward`, ' + f'but get {hook_pos}') + handles.append(handle) + return handles + + +@MODULE_HOOKS.register_module() +class GPUNormalize: + """Normalize images with the given mean and std value on GPUs. 
+ + Calling the member function ``hook_func`` returns the forward pre-hook + function for module registration. + + GPU normalization, rather than CPU normalization, is recommended in + the case of a model running on GPUs with strong compute capacity such as + Tesla V100. + + Args: + mean (Sequence[float]): Mean values of different channels. + std (Sequence[float]): Std values of different channels. + """ + + def __init__(self, input_format, mean, std): + if input_format not in ['NCTHW', 'NCHW', 'NCHW_Flow', 'NPTCHW']: + raise ValueError(f'The input format {input_format} is invalid.') + self.input_format = input_format + _mean = torch.tensor(mean) + _std = torch.tensor(std) + if input_format == 'NCTHW': + self._mean = _mean[None, :, None, None, None] + self._std = _std[None, :, None, None, None] + elif input_format == 'NCHW': + self._mean = _mean[None, :, None, None] + self._std = _std[None, :, None, None] + elif input_format == 'NCHW_Flow': + self._mean = _mean[None, :, None, None] + self._std = _std[None, :, None, None] + elif input_format == 'NPTCHW': + self._mean = _mean[None, None, None, :, None, None] + self._std = _std[None, None, None, :, None, None] + else: + raise ValueError(f'The input format {input_format} is invalid.') + + def hook_func(self): + + def normalize_hook(Module, input): + x = input[0] + assert x.dtype == torch.uint8, ( + f'The previous augmentation should use uint8 data type to ' + f'speed up computation, but get {x.dtype}') + + mean = self._mean.to(x.device) + std = self._std.to(x.device) + + with torch.no_grad(): + x = x.float().sub_(mean).div_(std) + + return (x, *input[1:]) + + return normalize_hook diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/__init__.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..fd183a6df792d61e3b38190eca6b7378dd196c96 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/__init__.py @@ -0,0 +1,8 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .longshortcyclehook import LongShortCycleHook +from .short_sampler import ShortCycleSampler +from .subbn_aggregate import SubBatchNorm3dAggregationHook + +__all__ = [ + 'ShortCycleSampler', 'LongShortCycleHook', 'SubBatchNorm3dAggregationHook' +] diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/longshortcyclehook.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/longshortcyclehook.py new file mode 100644 index 0000000000000000000000000000000000000000..202c81045fd22b9fc0743eacb393026ba56930ea --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/longshortcyclehook.py @@ -0,0 +1,257 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import torch +import torch.nn as nn +from mmcv.runner import Hook +from mmcv.runner.hooks.lr_updater import LrUpdaterHook, StepLrUpdaterHook +from torch.nn.modules.utils import _ntuple + +from mmaction.core.lr import RelativeStepLrUpdaterHook +from mmaction.utils import get_root_logger + + +def modify_subbn3d_num_splits(logger, module, num_splits): + """Recursively modify the number of splits of subbn3ds in module. + Inherits the running_mean and running_var from the last subbn.bn. + + Args: + logger (:obj:`logging.Logger`): The logger to log information. + module (nn.Module): The module to be modified. + num_splits (int): The targeted number of splits. + Returns: + int: The number of subbn3d modules modified. 
+ """ + count = 0 + for child in module.children(): + from mmaction.models import SubBatchNorm3D + if isinstance(child, SubBatchNorm3D): + new_split_bn = nn.BatchNorm3d( + child.num_features * num_splits, affine=False).cuda() + new_state_dict = new_split_bn.state_dict() + + for param_name, param in child.bn.state_dict().items(): + origin_param_shape = param.size() + new_param_shape = new_state_dict[param_name].size() + if len(origin_param_shape) == 1 and len( + new_param_shape + ) == 1 and new_param_shape[0] >= origin_param_shape[ + 0] and new_param_shape[0] % origin_param_shape[0] == 0: + # weight bias running_var running_mean + new_state_dict[param_name] = torch.cat( + [param] * + (new_param_shape[0] // origin_param_shape[0])) + else: + logger.info(f'skip {param_name}') + + child.num_splits = num_splits + new_split_bn.load_state_dict(new_state_dict) + child.split_bn = new_split_bn + count += 1 + else: + count += modify_subbn3d_num_splits(logger, child, num_splits) + return count + + +class LongShortCycleHook(Hook): + """A multigrid method for efficiently training video models. + + This hook defines the multigrid training schedule and updates cfg + accordingly, as proposed in `A Multigrid Method for Efficiently + Training Video Models <https://arxiv.org/abs/1912.00998>`_. + + Args: + cfg (:obj:`mmcv.ConfigDict`): The whole config for the experiment. 
+ """ + + def __init__(self, cfg): + self.cfg = cfg + self.multi_grid_cfg = cfg.get('multigrid', None) + self.data_cfg = cfg.get('data', None) + assert (self.multi_grid_cfg is not None and self.data_cfg is not None) + self.logger = get_root_logger() + self.logger.info(self.multi_grid_cfg) + + def before_run(self, runner): + """Called before running, change the StepLrUpdaterHook to + RelativeStepLrHook.""" + self._init_schedule(runner, self.multi_grid_cfg, self.data_cfg) + steps = [] + steps = [s[-1] for s in self.schedule] + steps.insert(-1, (steps[-2] + steps[-1]) // 2) # add finetune stage + for index, hook in enumerate(runner.hooks): + if isinstance(hook, StepLrUpdaterHook): + base_lr = hook.base_lr[0] + gamma = hook.gamma + lrs = [base_lr * gamma**s[0] * s[1][0] for s in self.schedule] + lrs = lrs[:-1] + [lrs[-2], lrs[-1] * gamma + ] # finetune-stage lrs + new_hook = RelativeStepLrUpdaterHook(runner, steps, lrs) + runner.hooks[index] = new_hook + + def before_train_epoch(self, runner): + """Before training epoch, update the runner based on long-cycle + schedule.""" + self._update_long_cycle(runner) + + def _update_long_cycle(self, runner): + """Before every epoch, check if long cycle shape should change. If it + should, change the pipelines accordingly. 
+ + change dataloader and model's subbn3d(split_bn) + """ + base_b, base_t, base_s = self._get_schedule(runner.epoch) + + # rebuild dataset + from mmaction.datasets import build_dataset + resize_list = [] + for trans in self.cfg.data.train.pipeline: + if trans['type'] == 'SampleFrames': + curr_t = trans['clip_len'] + trans['clip_len'] = base_t + trans['frame_interval'] = (curr_t * + trans['frame_interval']) / base_t + elif trans['type'] == 'Resize': + resize_list.append(trans) + resize_list[-1]['scale'] = _ntuple(2)(base_s) + + ds = build_dataset(self.cfg.data.train) + + from mmaction.datasets import build_dataloader + + dataloader = build_dataloader( + ds, + self.data_cfg.videos_per_gpu * base_b, + self.data_cfg.workers_per_gpu, + dist=True, + num_gpus=len(self.cfg.gpu_ids), + drop_last=True, + seed=self.cfg.get('seed', None), + ) + runner.data_loader = dataloader + self.logger.info('Rebuild runner.data_loader') + + # the self._max_epochs is changed, therefore update here + runner._max_iters = runner._max_epochs * len(runner.data_loader) + + # rebuild all the sub_batch_bn layers + num_modifies = modify_subbn3d_num_splits(self.logger, runner.model, + base_b) + self.logger.info(f'{num_modifies} subbns modified to {base_b}.') + + def _get_long_cycle_schedule(self, runner, cfg): + # `schedule` is a list of [step_index, base_shape, epochs] + schedule = [] + avg_bs = [] + all_shapes = [] + self.default_size = self.default_t * self.default_s**2 + for t_factor, s_factor in cfg.long_cycle_factors: + base_t = int(round(self.default_t * t_factor)) + base_s = int(round(self.default_s * s_factor)) + if cfg.short_cycle: + shapes = [[ + base_t, + int(round(self.default_s * cfg.short_cycle_factors[0])) + ], + [ + base_t, + int( + round(self.default_s * + cfg.short_cycle_factors[1])) + ], [base_t, base_s]] + else: + shapes = [[base_t, base_s]] + # calculate the batchsize, shape = [batchsize, #frames, scale] + shapes = [[ + int(round(self.default_size / (s[0] * s[1]**2))), s[0], 
s[1] + ] for s in shapes] + avg_bs.append(np.mean([s[0] for s in shapes])) + all_shapes.append(shapes) + + for hook in runner.hooks: + if isinstance(hook, LrUpdaterHook): + if isinstance(hook, StepLrUpdaterHook): + steps = hook.step if isinstance(hook.step, + list) else [hook.step] + steps = [0] + steps + break + else: + raise NotImplementedError( + 'Only step scheduler supports multi grid now') + else: + pass + total_iters = 0 + default_iters = steps[-1] + for step_index in range(len(steps) - 1): + # except the final step + step_epochs = steps[step_index + 1] - steps[step_index] + # number of epochs for this step + for long_cycle_index, shapes in enumerate(all_shapes): + cur_epochs = ( + step_epochs * avg_bs[long_cycle_index] / sum(avg_bs)) + cur_iters = cur_epochs / avg_bs[long_cycle_index] + total_iters += cur_iters + schedule.append((step_index, shapes[-1], cur_epochs)) + iter_saving = default_iters / total_iters + final_step_epochs = runner.max_epochs - steps[-1] + # the fine-tuning phase to have the same amount of iteration + # saving as the rest of the training + ft_epochs = final_step_epochs / iter_saving * avg_bs[-1] + # in `schedule` we ignore the shape of ShortCycle + schedule.append((step_index + 1, all_shapes[-1][-1], ft_epochs)) + + x = ( + runner.max_epochs * cfg.epoch_factor / sum(s[-1] + for s in schedule)) + runner._max_epochs = int(runner._max_epochs * cfg.epoch_factor) + final_schedule = [] + total_epochs = 0 + for s in schedule: + # extend the epochs by `factor` + epochs = s[2] * x + total_epochs += epochs + final_schedule.append((s[0], s[1], int(round(total_epochs)))) + self.logger.info(final_schedule) + return final_schedule + + def _print_schedule(self, schedule): + """logging the schedule.""" + self.logger.info('\tLongCycleId\tBase shape\tEpochs\t') + for s in schedule: + self.logger.info(f'\t{s[0]}\t{s[1]}\t{s[2]}\t') + + def _get_schedule(self, epoch): + """Returning the corresponding shape.""" + for s in self.schedule: + if epoch < 
s[-1]: + return s[1] + return self.schedule[-1][1] + + def _init_schedule(self, runner, multi_grid_cfg, data_cfg): + """Initialize the multigrid schedule. + + Args: + runner (:obj: `mmcv.Runner`): The runner within which to train. + multi_grid_cfg (:obj: `mmcv.ConfigDict`): The multigrid config. + data_cfg (:obj: `mmcv.ConfigDict`): The data config. + """ + self.default_bs = data_cfg.videos_per_gpu + data_cfg = data_cfg.get('train', None) + final_resize_cfg = [ + aug for aug in data_cfg.pipeline if aug.type == 'Resize' + ][-1] + if isinstance(final_resize_cfg.scale, tuple): + # Assume square image + if max(final_resize_cfg.scale) == min(final_resize_cfg.scale): + self.default_s = max(final_resize_cfg.scale) + else: + raise NotImplementedError('non-square scale not considered.') + sample_frame_cfg = [ + aug for aug in data_cfg.pipeline if aug.type == 'SampleFrames' + ][0] + self.default_t = sample_frame_cfg.clip_len + + if multi_grid_cfg.long_cycle: + self.schedule = self._get_long_cycle_schedule( + runner, multi_grid_cfg) + else: + raise ValueError('There should be at least long cycle.') diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/short_sampler.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/short_sampler.py new file mode 100644 index 0000000000000000000000000000000000000000..01326f85bf8edd6f961f087c89d3066335490106 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/short_sampler.py @@ -0,0 +1,61 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +from torch.utils.data.sampler import Sampler + + +class ShortCycleSampler(Sampler): + """Extend Sampler to support "short cycle" sampling. + + See paper "A Multigrid Method for Efficiently Training Video Models", Wu et + al., 2019 (https://arxiv.org/abs/1912.00998) for details. + + Args: + sampler (:obj: `torch.Sampler`): The default sampler to be wrapped. + batch_size (int): The batchsize before short-cycle modification. 
+ multigrid_cfg (dict): The config dict for multigrid training. + crop_size (int): The actual spatial scale. + drop_last (bool): Whether to drop the last incomplete batch in epoch. + Default: True. + """ + + def __init__(self, + sampler, + batch_size, + multigrid_cfg, + crop_size, + drop_last=True): + + self.sampler = sampler + self.drop_last = drop_last + + bs_factor = [ + int( + round( + (float(crop_size) / (s * multigrid_cfg.default_s[0]))**2)) + for s in multigrid_cfg.short_cycle_factors + ] + + self.batch_sizes = [ + batch_size * bs_factor[0], batch_size * bs_factor[1], batch_size + ] + + def __iter__(self): + counter = 0 + batch_size = self.batch_sizes[0] + batch = [] + for idx in self.sampler: + batch.append((idx, counter % 3)) + if len(batch) == batch_size: + yield batch + counter += 1 + batch_size = self.batch_sizes[counter % 3] + batch = [] + if len(batch) > 0 and not self.drop_last: + yield batch + + def __len__(self): + avg_batch_size = sum(self.batch_sizes) / 3.0 + if self.drop_last: + return int(np.floor(len(self.sampler) / avg_batch_size)) + else: + return int(np.ceil(len(self.sampler) / avg_batch_size)) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/subbn_aggregate.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/subbn_aggregate.py new file mode 100644 index 0000000000000000000000000000000000000000..ce0da1f8a272c02f8d7767a9724eec9db9ca0d47 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/multigrid/subbn_aggregate.py @@ -0,0 +1,22 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
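The batching pattern of `ShortCycleSampler.__iter__` above, which rotates through three batch sizes and tags every index with its position in the cycle, can be sketched without PyTorch. This is a simplified illustration that assumes a plain iterable of indices and explicit batch sizes; `short_cycle_batches` is a hypothetical name, not an mmaction2 function.

```python
def short_cycle_batches(indices, batch_sizes, drop_last=True):
    """Hypothetical generator mirroring the short-cycle batching loop."""
    counter = 0
    batch = []
    for idx in indices:
        # Tag each sample with the cycle slot it belongs to.
        slot = counter % len(batch_sizes)
        batch.append((idx, slot))
        if len(batch) == batch_sizes[slot]:
            yield batch
            counter += 1
            batch = []
    # Emit the trailing partial batch only when asked to keep it.
    if batch and not drop_last:
        yield batch


# 14 samples with batch sizes 4, 2, 1 fill exactly two full cycles.
batches = list(short_cycle_batches(range(14), [4, 2, 1]))
sizes = [len(b) for b in batches]
```

The slot tag lets a downstream dataset pick the spatial scale that matches the batch size of that slot, which is how the sampler output is presumably consumed in the multigrid pipeline.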
+from mmcv.runner import HOOKS, Hook + + +def aggregate_sub_bn_status(module): + from mmaction.models import SubBatchNorm3D + count = 0 + for child in module.children(): + if isinstance(child, SubBatchNorm3D): + child.aggregate_stats() + count += 1 + else: + count += aggregate_sub_bn_status(child) + return count + + +@HOOKS.register_module() +class SubBatchNorm3dAggregationHook(Hook): + """Recursively find all SubBN modules and aggregate sub-BN stats.""" + + def after_train_epoch(self, runner): + _ = aggregate_sub_bn_status(runner.model) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/precise_bn.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/precise_bn.py new file mode 100644 index 0000000000000000000000000000000000000000..2751b2e736c03ca41a6048b321e6f5527ca5bf94 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/precise_bn.py @@ -0,0 +1,155 @@ +# Adapted from https://github.com/facebookresearch/fvcore/blob/master/fvcore/nn/precise_bn.py # noqa: E501 +# Original licence: Copyright (c) 2019 Facebook, Inc under the Apache License 2.0 # noqa: E501 + +import logging +import time + +import mmcv +import torch +from mmcv.parallel import MMDistributedDataParallel +from mmcv.runner import Hook +from mmcv.utils import print_log +from torch.nn import GroupNorm +from torch.nn.modules.batchnorm import _BatchNorm +from torch.nn.modules.instancenorm import _InstanceNorm +from torch.nn.parallel import DataParallel, DistributedDataParallel +from torch.utils.data import DataLoader + + +def is_parallel_module(module): + """Check if a module is a parallel module. + + The following 3 modules (and their subclasses) are regarded as parallel + modules: DataParallel, DistributedDataParallel, + MMDistributedDataParallel (the deprecated version). + + Args: + module (nn.Module): The module to be checked. + Returns: + bool: True if the input module is a parallel module. 
+ """ + parallels = (DataParallel, DistributedDataParallel, + MMDistributedDataParallel) + return bool(isinstance(module, parallels)) + + +@torch.no_grad() +def update_bn_stats(model, data_loader, num_iters=200, logger=None): + """Recompute and update the batch norm stats to make them more precise. + + During + training both BN stats and the weight are changing after every iteration, + so the running average can not precisely reflect the actual stats of the + current model. + In this function, the BN stats are recomputed with fixed weights, to make + the running average more precise. Specifically, it computes the true + average of per-batch mean/variance instead of the running average. + + Args: + model (nn.Module): The model whose bn stats will be recomputed. + data_loader (iterator): The DataLoader iterator. + num_iters (int): number of iterations to compute the stats. + logger (:obj:`logging.Logger` | None): Logger for logging. + Default: None. + """ + + model.train() + + assert len(data_loader) >= num_iters, ( + f'length of dataloader {len(data_loader)} must be greater than ' + f'iteration number {num_iters}') + + if is_parallel_module(model): + parallel_module = model + model = model.module + else: + parallel_module = model + # Finds all the bn layers with training=True. + bn_layers = [ + m for m in model.modules() if m.training and isinstance(m, _BatchNorm) + ] + + if len(bn_layers) == 0: + print_log('No BN found in model', logger=logger, level=logging.WARNING) + return + print_log(f'{len(bn_layers)} BN found', logger=logger) + + # Finds all the other norm layers with training=True. + for m in model.modules(): + if m.training and isinstance(m, (_InstanceNorm, GroupNorm)): + print_log( + 'IN/GN stats will be updated like training.', + logger=logger, + level=logging.WARNING) + + # In order to make the running stats only reflect the current batch, the + # momentum is disabled. 
+ # bn.running_mean = (1 - momentum) * bn.running_mean + momentum * + # batch_mean + # Setting the momentum to 1.0 to compute the stats without momentum. + momentum_actual = [bn.momentum for bn in bn_layers] # pyre-ignore + for bn in bn_layers: + bn.momentum = 1.0 + + # Note that running_var actually means "running average of variance" + running_mean = [torch.zeros_like(bn.running_mean) for bn in bn_layers] + running_var = [torch.zeros_like(bn.running_var) for bn in bn_layers] + + finish_before_loader = False + prog_bar = mmcv.ProgressBar(len(data_loader)) + for ind, data in enumerate(data_loader): + with torch.no_grad(): + parallel_module(**data, return_loss=False) + prog_bar.update() + for i, bn in enumerate(bn_layers): + # Accumulates the bn stats. + running_mean[i] += (bn.running_mean - running_mean[i]) / (ind + 1) + # running var is actually + running_var[i] += (bn.running_var - running_var[i]) / (ind + 1) + + if (ind + 1) >= num_iters: + finish_before_loader = True + break + assert finish_before_loader, 'Dataloader stopped before ' \ + f'iteration {num_iters}' + + for i, bn in enumerate(bn_layers): + # Sets the precise bn stats. + bn.running_mean = running_mean[i] + bn.running_var = running_var[i] + bn.momentum = momentum_actual[i] + + +class PreciseBNHook(Hook): + """Precise BN hook. + + Attributes: + dataloader (DataLoader): A PyTorch dataloader. + num_iters (int): Number of iterations to update the bn stats. + Default: 200. + interval (int): Perform precise bn interval (by epochs). Default: 1. + """ + + def __init__(self, dataloader, num_iters=200, interval=1): + if not isinstance(dataloader, DataLoader): + raise TypeError('dataloader must be a pytorch DataLoader, but got' + f' {type(dataloader)}') + self.dataloader = dataloader + self.interval = interval + self.num_iters = num_iters + + def after_train_epoch(self, runner): + if self.every_n_epochs(runner, self.interval): + # sleep to avoid possible deadlock + time.sleep(2.) 
+ print_log( + f'Running Precise BN for {self.num_iters} iterations', + logger=runner.logger) + update_bn_stats( + runner.model, + self.dataloader, + self.num_iters, + logger=runner.logger) + print_log('BN stats updated', logger=runner.logger) + # sleep to avoid possible deadlock + time.sleep(2.) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/utils/setup_env.py b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/setup_env.py new file mode 100644 index 0000000000000000000000000000000000000000..21def2f0809153a5f755af2431f7e702db625e5c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/utils/setup_env.py @@ -0,0 +1,47 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import platform +import warnings + +import cv2 +import torch.multiprocessing as mp + + +def setup_multi_processes(cfg): + """Setup multi-processing environment variables.""" + # set multi-process start method as `fork` to speed up the training + if platform.system() != 'Windows': + mp_start_method = cfg.get('mp_start_method', 'fork') + current_method = mp.get_start_method(allow_none=True) + if current_method is not None and current_method != mp_start_method: + warnings.warn( + f'Multi-processing start method `{mp_start_method}` is ' + f'different from the previous setting `{current_method}`. ' + f'It will be force set to `{mp_start_method}`. 
You can change ' + f'this behavior by changing `mp_start_method` in your config.') + mp.set_start_method(mp_start_method, force=True) + + # disable opencv multithreading to avoid system being overloaded + opencv_num_threads = cfg.get('opencv_num_threads', 0) + cv2.setNumThreads(opencv_num_threads) + + # setup OMP threads + # This code is referred from https://github.com/pytorch/pytorch/blob/master/torch/distributed/run.py # noqa + if 'OMP_NUM_THREADS' not in os.environ and cfg.data.workers_per_gpu > 1: + omp_num_threads = 1 + warnings.warn( + f'Setting OMP_NUM_THREADS environment variable for each process ' + f'to be {omp_num_threads} in default, to avoid your system being ' + f'overloaded, please further tune the variable for optimal ' + f'performance in your application as needed.') + os.environ['OMP_NUM_THREADS'] = str(omp_num_threads) + + # setup MKL threads + if 'MKL_NUM_THREADS' not in os.environ and cfg.data.workers_per_gpu > 1: + mkl_num_threads = 1 + warnings.warn( + f'Setting MKL_NUM_THREADS environment variable for each process ' + f'to be {mkl_num_threads} in default, to avoid your system being ' + f'overloaded, please further tune the variable for optimal ' + f'performance in your application as needed.') + os.environ['MKL_NUM_THREADS'] = str(mkl_num_threads) diff --git a/openmmlab_test/mmaction2-0.24.1/mmaction/version.py b/openmmlab_test/mmaction2-0.24.1/mmaction/version.py new file mode 100644 index 0000000000000000000000000000000000000000..e05146f0a07709d5f51efed652b4cb9af900c0d2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/mmaction/version.py @@ -0,0 +1,18 @@ +# Copyright (c) Open-MMLab. All rights reserved. 
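The thread-count handling in `setup_multi_processes` above follows a "set only if unset" pattern: `OMP_NUM_THREADS` and `MKL_NUM_THREADS` default to 1 when several dataloader workers are used and the user has not chosen a value. A minimal sketch of that pattern, assuming a plain dict stands in for `os.environ` so the real environment is left alone; `default_thread_env` is a hypothetical helper, not part of mmaction2.

```python
import os


def default_thread_env(workers_per_gpu, env=None):
    """Hypothetical helper: default thread variables to '1' if unset."""
    env = os.environ if env is None else env
    applied = []
    # Only intervene when multiple workers could oversubscribe the CPU.
    if workers_per_gpu > 1:
        for name in ('OMP_NUM_THREADS', 'MKL_NUM_THREADS'):
            if name not in env:
                env[name] = '1'
                applied.append(name)
    return applied


# A user-provided OMP value is respected; only the MKL default is added.
fake_env = {'OMP_NUM_THREADS': '8'}
changed = default_thread_env(workers_per_gpu=4, env=fake_env)
```

Returning the list of variables that were actually set makes it easy to emit one warning per change, as the original function does.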
+ +__version__ = '0.24.1' + + +def parse_version_info(version_str): + version_info = [] + for x in version_str.split('.'): + if x.isdigit(): + version_info.append(int(x)) + elif x.find('rc') != -1: + patch_version = x.split('rc') + version_info.append(int(patch_version[0])) + version_info.append(f'rc{patch_version[1]}') + return tuple(version_info) + + +version_info = parse_version_info(__version__) diff --git a/openmmlab_test/mmaction2-0.24.1/model-index.yml b/openmmlab_test/mmaction2-0.24.1/model-index.yml new file mode 100644 index 0000000000000000000000000000000000000000..e76d6e5b17d6b38c35ab5baa47804f28bb33d50b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/model-index.yml @@ -0,0 +1,24 @@ +Import: +- configs/localization/bmn/metafile.yml +- configs/localization/bsn/metafile.yml +- configs/localization/ssn/metafile.yml +- configs/recognition/csn/metafile.yml +- configs/recognition/i3d/metafile.yml +- configs/recognition/omnisource/metafile.yml +- configs/recognition/r2plus1d/metafile.yml +- configs/recognition/slowfast/metafile.yml +- configs/recognition/slowonly/metafile.yml +- configs/recognition/timesformer/metafile.yml +- configs/recognition/tin/metafile.yml +- configs/recognition/tpn/metafile.yml +- configs/recognition/tsm/metafile.yml +- configs/recognition/tsn/metafile.yml +- configs/recognition/c3d/metafile.yml +- configs/recognition/tanet/metafile.yml +- configs/recognition/x3d/metafile.yml +- configs/recognition/trn/metafile.yml +- configs/detection/ava/metafile.yml +- configs/detection/lfb/metafile.yml +- configs/detection/acrn/metafile.yml +- configs/recognition_audio/resnet/metafile.yml +- configs/skeleton/posec3d/metafile.yml diff --git a/openmmlab_test/mmaction2-0.24.1/requirements.txt b/openmmlab_test/mmaction2-0.24.1/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..3f6205f8dcdc6b2e9a1e1dee0ec65f0aad0b66c6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/requirements.txt @@ -0,0 +1,3 @@ +-r 
requirements/build.txt +-r requirements/optional.txt +-r requirements/tests.txt diff --git a/openmmlab_test/mmaction2-0.24.1/requirements/build.txt b/openmmlab_test/mmaction2-0.24.1/requirements/build.txt new file mode 100644 index 0000000000000000000000000000000000000000..9bbe532c191d28b823d61261971dffdc382b1dea --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/requirements/build.txt @@ -0,0 +1,8 @@ +decord >= 0.4.1 +einops +matplotlib +numpy +opencv-contrib-python +Pillow +scipy +torch>=1.3 diff --git a/openmmlab_test/mmaction2-0.24.1/requirements/docs.txt b/openmmlab_test/mmaction2-0.24.1/requirements/docs.txt new file mode 100644 index 0000000000000000000000000000000000000000..bced1dc3e1c7c9d51d8b63316e58a3018d9806dd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/requirements/docs.txt @@ -0,0 +1,17 @@ +docutils==0.16.0 +einops +markdown<3.4.0 +myst-parser +opencv-python!=4.5.5.62,!=4.5.5.64 +# Skip problematic opencv-python versions +# MMCV depends on opencv-python instead of the headless build, thus we install opencv-python +# Due to an upstream bug, we skip these two versions +# https://github.com/opencv/opencv-python/issues/602 +# https://github.com/opencv/opencv/issues/21366 +# It seems to be fixed in https://github.com/opencv/opencv/pull/21382 +-e git+https://github.com/gaotongxiao/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme +scipy +sphinx==4.0.2 +sphinx_copybutton +sphinx_markdown_tables +sphinx_rtd_theme==0.5.2 diff --git a/openmmlab_test/mmaction2-0.24.1/requirements/mminstall.txt b/openmmlab_test/mmaction2-0.24.1/requirements/mminstall.txt new file mode 100644 index 0000000000000000000000000000000000000000..7651fd8f8ced8df366a10cf89960e253dd56f10f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/requirements/mminstall.txt @@ -0,0 +1 @@ +mmcv-full>=1.3.1 diff --git a/openmmlab_test/mmaction2-0.24.1/requirements/optional.txt b/openmmlab_test/mmaction2-0.24.1/requirements/optional.txt new file mode 100644 index
0000000000000000000000000000000000000000..631cfe7b869dc4da3383811567bb89d87ebec34d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/requirements/optional.txt @@ -0,0 +1,11 @@ +av +imgaug +librosa +lmdb +moviepy +onnx +onnxruntime +packaging +pims +PyTurboJPEG +timm diff --git a/openmmlab_test/mmaction2-0.24.1/requirements/readthedocs.txt b/openmmlab_test/mmaction2-0.24.1/requirements/readthedocs.txt new file mode 100644 index 0000000000000000000000000000000000000000..70a4cd3546befeffb983a46e501a6a5b6c4bc58c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/requirements/readthedocs.txt @@ -0,0 +1,4 @@ +mmcv +titlecase +torch +torchvision diff --git a/openmmlab_test/mmaction2-0.24.1/requirements/tests.txt b/openmmlab_test/mmaction2-0.24.1/requirements/tests.txt new file mode 100644 index 0000000000000000000000000000000000000000..2552b69f49f54c7a74a438fb89aedd595fc800ba --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/requirements/tests.txt @@ -0,0 +1,9 @@ +coverage +flake8 +interrogate +isort==4.3.21 +protobuf<=3.20.1 +pytest +pytest-runner +xdoctest >= 0.10.0 +yapf diff --git a/openmmlab_test/mmaction2-0.24.1/resources/acc_curve.png b/openmmlab_test/mmaction2-0.24.1/resources/acc_curve.png new file mode 100644 index 0000000000000000000000000000000000000000..27a2f0851e7d9ee0c912f73af947b11453422988 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/resources/acc_curve.png differ diff --git a/openmmlab_test/mmaction2-0.24.1/resources/data_pipeline.png b/openmmlab_test/mmaction2-0.24.1/resources/data_pipeline.png new file mode 100644 index 0000000000000000000000000000000000000000..c5217b17aa745a654bc44898ea342ca9c85dd2b4 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/resources/data_pipeline.png differ diff --git a/openmmlab_test/mmaction2-0.24.1/resources/mmaction2_logo.png b/openmmlab_test/mmaction2-0.24.1/resources/mmaction2_logo.png new file mode 100644 index 
0000000000000000000000000000000000000000..f0c759bb78c5424b4394d18a5ba833a8c9f43add Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/resources/mmaction2_logo.png differ diff --git a/openmmlab_test/mmaction2-0.24.1/resources/mmaction2_overview.gif b/openmmlab_test/mmaction2-0.24.1/resources/mmaction2_overview.gif new file mode 100644 index 0000000000000000000000000000000000000000..9e77ed8c1af30292ef703676660891fd45fcebe0 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/resources/mmaction2_overview.gif differ diff --git a/openmmlab_test/mmaction2-0.24.1/resources/qq_group_qrcode.png b/openmmlab_test/mmaction2-0.24.1/resources/qq_group_qrcode.png new file mode 100644 index 0000000000000000000000000000000000000000..dad05428f5602ed1d20621db0906701bd7955166 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/resources/qq_group_qrcode.png differ diff --git a/openmmlab_test/mmaction2-0.24.1/resources/spatio-temporal-det.gif b/openmmlab_test/mmaction2-0.24.1/resources/spatio-temporal-det.gif new file mode 100644 index 0000000000000000000000000000000000000000..6f52fc76aa5fffe5bc95fb4ddff06afcf22a9910 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/resources/spatio-temporal-det.gif differ diff --git a/openmmlab_test/mmaction2-0.24.1/resources/zhihu_qrcode.jpg b/openmmlab_test/mmaction2-0.24.1/resources/zhihu_qrcode.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c745fb027f06564d41794e9a40069b06c34e2bb5 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/resources/zhihu_qrcode.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/setup.cfg b/openmmlab_test/mmaction2-0.24.1/setup.cfg new file mode 100644 index 0000000000000000000000000000000000000000..ad08ec3e4c8fd4b86d1e356328a5613812ca3860 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/setup.cfg @@ -0,0 +1,24 @@ +[bdist_wheel] +universal=1 + +[aliases] +test=pytest + +[tool:pytest] +addopts=tests/ + +[yapf] +based_on_style = pep8 
+blank_line_before_nested_class_or_def = true +split_before_expression_after_opening_paren = true +split_penalty_import_names=0 +SPLIT_PENALTY_AFTER_OPENING_BRACKET=800 + +[isort] +line_length = 79 +multi_line_output = 0 +extra_standard_library = pkg_resources,setuptools +known_first_party = mmaction +known_third_party = cv2,decord,einops,joblib,matplotlib,mmcv,numpy,pandas,pytest,pytorch_sphinx_theme,scipy,seaborn,titlecase,torch,webcolors +no_lines_before = STDLIB,LOCALFOLDER +default_section = THIRDPARTY diff --git a/openmmlab_test/mmaction2-0.24.1/setup.py b/openmmlab_test/mmaction2-0.24.1/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..16923e9815d071714256d682aadcf8b7ba400077 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/setup.py @@ -0,0 +1,196 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import os.path as osp +import shutil +import sys +import warnings +from setuptools import find_packages, setup + + +def readme(): + with open('README.md', encoding='utf-8') as f: + content = f.read() + return content + + +version_file = 'mmaction/version.py' + + +def get_version(): + with open(version_file, 'r') as f: + exec(compile(f.read(), version_file, 'exec')) + return locals()['__version__'] + + +def parse_requirements(fname='requirements.txt', with_version=True): + """Parse the package dependencies listed in a requirements file but strips + specific versioning information. 
+ + Args: + fname (str): path to requirements file + with_version (bool, default=True): if True, include version specs + + Returns: + List[str]: list of requirements items + + CommandLine: + python -c "import setup; print(setup.parse_requirements())" + """ + import re + import sys + from os.path import exists + require_fpath = fname + + def parse_line(line): + """Parse information from a line in a requirements text file.""" + if line.startswith('-r '): + # Allow specifying requirements in other files + target = line.split(' ')[1] + for info in parse_require_file(target): + yield info + else: + info = {'line': line} + if line.startswith('-e '): + info['package'] = line.split('#egg=')[1] + elif '@git+' in line: + info['package'] = line + else: + # Remove versioning from the package + pat = '(' + '|'.join(['>=', '==', '>']) + ')' + parts = re.split(pat, line, maxsplit=1) + parts = [p.strip() for p in parts] + + info['package'] = parts[0] + if len(parts) > 1: + op, rest = parts[1:] + if ';' in rest: + # Handle platform specific dependencies + # http://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-platform-specific-dependencies + version, platform_deps = map(str.strip, + rest.split(';')) + info['platform_deps'] = platform_deps + else: + version = rest # NOQA + info['version'] = (op, version) + yield info + + def parse_require_file(fpath): + with open(fpath, 'r') as f: + for line in f.readlines(): + line = line.strip() + if line and not line.startswith('#'): + for info in parse_line(line): + yield info + + def gen_packages_items(): + if exists(require_fpath): + for info in parse_require_file(require_fpath): + parts = [info['package']] + if with_version and 'version' in info: + parts.extend(info['version']) + if not sys.version.startswith('3.4'): + # apparently platform_deps are broken in 3.4 + platform_deps = info.get('platform_deps') + if platform_deps is not None: + parts.append(';' + platform_deps) + item = ''.join(parts) + yield item + + packages = 
list(gen_packages_items()) + return packages + + +def add_mim_extension(): + """Add extra files that are required to support MIM into the package. + + These files will be added by creating a symlink to the originals if the + package is installed in `editable` mode (e.g. pip install -e .), or by + copying from the originals otherwise. + """ + + # determine the installation mode + if 'develop' in sys.argv: + # installed by `pip install -e .` + mode = 'symlink' + elif 'sdist' in sys.argv or 'bdist_wheel' in sys.argv: + # installed by `pip install .` + # or a source distribution created by `python setup.py sdist` + mode = 'copy' + else: + return + + filenames = ['tools', 'configs', 'model-index.yml'] + repo_path = osp.dirname(__file__) + mim_path = osp.join(repo_path, 'mmaction', '.mim') + os.makedirs(mim_path, exist_ok=True) + + for filename in filenames: + if osp.exists(filename): + src_path = osp.join(repo_path, filename) + tar_path = osp.join(mim_path, filename) + + if osp.isfile(tar_path) or osp.islink(tar_path): + os.remove(tar_path) + elif osp.isdir(tar_path): + shutil.rmtree(tar_path) + + if mode == 'symlink': + src_relpath = osp.relpath(src_path, osp.dirname(tar_path)) + try: + os.symlink(src_relpath, tar_path) + except OSError: + # Creating a symbolic link on Windows may raise an + # `OSError: [WinError 1314]` due to privilege.
If + # the error happens, the src file will be copied + mode = 'copy' + warnings.warn( + f'Failed to create a symbolic link for {src_relpath}, ' + f'and it will be copied to {tar_path}') + else: + continue + elif mode == 'copy': + if osp.isfile(src_path): + shutil.copyfile(src_path, tar_path) + elif osp.isdir(src_path): + shutil.copytree(src_path, tar_path) + else: + warnings.warn(f'Cannot copy file {src_path}.') + else: + raise ValueError(f'Invalid mode {mode}') + + +if __name__ == '__main__': + add_mim_extension() + setup( + name='mmaction2', + version=get_version(), + description='OpenMMLab Video Understanding Toolbox and Benchmark', + long_description=readme(), + long_description_content_type='text/markdown', + author='MMAction2 Contributors', + author_email='openmmlab@gmail.com', + maintainer='MMAction2 Contributors', + maintainer_email='openmmlab@gmail.com', + packages=find_packages(exclude=('configs', 'tools', 'demo')), + keywords='computer vision, video understanding', + include_package_data=True, + classifiers=[ + 'Development Status :: 4 - Beta', + 'License :: OSI Approved :: Apache Software License', + 'Operating System :: OS Independent', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', + 'Programming Language :: Python :: 3.8', + 'Programming Language :: Python :: 3.9', + ], + url='https://github.com/open-mmlab/mmaction2', + license='Apache License 2.0', + install_requires=parse_requirements('requirements/build.txt'), + extras_require={ + 'all': parse_requirements('requirements.txt'), + 'tests': parse_requirements('requirements/tests.txt'), + 'optional': parse_requirements('requirements/optional.txt'), + 'mim': parse_requirements('requirements/mminstall.txt'), + }, + zip_safe=False) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/activitynet_features/v_test1.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/activitynet_features/v_test1.csv new file mode 100644 index 
0000000000000000000000000000000000000000..5e713e7f8e1924812a5b6b1e4fc66f22cf0e6692 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/activitynet_features/v_test1.csv @@ -0,0 +1,6 @@ +f0,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62,f63,f64,f65,f66,f67,f68,f69,f70,f71,f72,f73,f74,f75,f76,f77,f78,f79,f80,f81,f82,f83,f84,f85,f86,f87,f88,f89,f90,f91,f92,f93,f94,f95,f96,f97,f98,f99,f100,f101,f102,f103,f104,f105,f106,f107,f108,f109,f110,f111,f112,f113,f114,f115,f116,f117,f118,f119,f120,f121,f122,f123,f124,f125,f126,f127,f128,f129,f130,f131,f132,f133,f134,f135,f136,f137,f138,f139,f140,f141,f142,f143,f144,f145,f146,f147,f148,f149,f150,f151,f152,f153,f154,f155,f156,f157,f158,f159,f160,f161,f162,f163,f164,f165,f166,f167,f168,f169,f170,f171,f172,f173,f174,f175,f176,f177,f178,f179,f180,f181,f182,f183,f184,f185,f186,f187,f188,f189,f190,f191,f192,f193,f194,f195,f196,f197,f198,f199,f200,f201,f202,f203,f204,f205,f206,f207,f208,f209,f210,f211,f212,f213,f214,f215,f216,f217,f218,f219,f220,f221,f222,f223,f224,f225,f226,f227,f228,f229,f230,f231,f232,f233,f234,f235,f236,f237,f238,f239,f240,f241,f242,f243,f244,f245,f246,f247,f248,f249,f250,f251,f252,f253,f254,f255,f256,f257,f258,f259,f260,f261,f262,f263,f264,f265,f266,f267,f268,f269,f270,f271,f272,f273,f274,f275,f276,f277,f278,f279,f280,f281,f282,f283,f284,f285,f286,f287,f288,f289,f290,f291,f292,f293,f294,f295,f296,f297,f298,f299,f300,f301,f302,f303,f304,f305,f306,f307,f308,f309,f310,f311,f312,f313,f314,f315,f316,f317,f318,f319,f320,f321,f322,f323,f324,f325,f326,f327,f328,f329,f330,f331,f332,f333,f334,f335,f336,f337,f338,f339,f340,f341,f342,f343,f344,f345,f346,f347,f348,f349,f350,f351,f352,f353,f354,f355,f356,f357,f358,f359,f360,f361,f362,f363,f364,f365,f366,f367,f368,f369,f370,f371,f372,f373,f374,f375,f376,f377,f378,f379,f380,f381,f382
,f383,f384,f385,f386,f387,f388,f389,f390,f391,f392,f393,f394,f395,f396,f397,f398,f399 +-2.52400826749,0.0481050342173,-0.727137195971,2.75537272315,3.09127621822,-1.57007092339,-0.418208286763,0.0913230466118,-0.536148328353,-0.527615223662,1.09348152733,-0.740857539139,1.03076939449,0.947990020203,-0.00932133916349,0.546988826083,-0.737920381243,0.823520260094,-1.44379751155,1.67705288164,1.85386635752,0.62453161102,1.13374109944,-0.161873651211,1.40335457467,0.267813141882,1.40327533282,0.143771933515,-0.29447495679,0.779869429758,-1.38585145822,-0.361671252653,-1.46679541523,0.0859254586217,0.266080879981,-0.680839165484,-0.774731957742,-0.618207527285,1.57201336054,0.875829197772,-0.896498123858,-2.55398872891,-0.796735937603,-0.338483746318,0.511324636391,-1.21437529424,-0.0488620607446,0.253289302886,2.71006785221,-0.573161459164,-0.341657902954,-0.854258292083,0.562081610284,-0.828878082845,2.00327134909,1.29068322546,-0.418051389774,1.14570354001,1.39098484308,-1.13415579068,-1.01751984858,-0.823485884605,0.354335798556,1.79059040272,0.609877418462,-1.01807533199,1.56390048495,1.00308338848,0.226345738051,-0.145077751076,0.0986133282503,-0.0274079232177,0.0618308794267,2.33058959297,0.0527062771437,-1.11440070055,-2.85928208684,2.15750540841,0.866524370256,-0.999664886812,0.65322760642,-1.01907039308,-0.827862563442,0.702348045951,-0.266591888881,-0.51787754913,-0.87550654118,-1.08840756221,-0.330164993751,-0.885034718769,-1.09602854198,-1.90739000514,-1.41201400125,3.55564525741,2.24864990051,1.85192671744,-0.886962869481,-0.706411036437,0.962288821262,-1.30219301658,0.0603706527015,-0.672105670826,-0.147220359933,-1.00931681574,-1.34130794644,-0.0213488208036,-0.965187689045,0.427090878957,-0.922304333641,-1.13947635577,0.637382086489,-1.706998011,0.00132625548269,0.663770250584,1.58249601114,-1.04340366269,0.375227416108,-0.0870821477482,0.551722806776,0.588611513848,-0.477017772079,-1.51536188044,0.237936462599,0.31261506067,-0.198127712396,-0.3185724292
09,-1.18890325315,0.035582087437,2.67528950232,-0.197889107378,1.55762961412,0.104639883842,-1.66993450781,0.702282006582,1.36717389178,0.634535223722,2.85315937821,-1.27367064913,0.483830422936,-0.869812565212,0.641265734616,-0.11914733068,1.0239396073,-3.92902142357,0.694317328488,1.34085481986,-0.135329176331,0.0261293066915,-0.303456270416,0.909167548313,-2.04735304332,-0.285427697695,-1.03457319064,-2.77420531572,0.197031497599,-0.520362589547,-1.37924786457,-0.418569629841,1.54322130788,1.83725603097,3.35605137842,-0.117215889143,-0.970470848036,-0.339063598965,1.57921290781,0.196319119013,-1.22568776573,-0.448961007657,0.609897182756,-0.168152849526,0.254480323573,-0.51589471003,-0.253088873187,-0.716572365129,-1.56268640697,-3.33835895995,-0.679914745818,0.107016925667,-1.61204098026,-0.387739681651,2.40210230323,-1.0956975287,-1.72501473746,-0.766200882827,0.752211827669,1.55532805525,0.113983938016,4.54239864121,-1.36827292666,-1.88835217549,1.40817465219,0.708602657522,1.31514883588,0.0314930005956,-0.79571607963,0.75615035674,1.14977174081,-1.72166323668,0.565034879125,-1.41448308724,-1.57710396359,-1.17078288789,1.1485206762,0.393694747107,1.20387821507,0.699366232003,1.80047030851,1.42655580688,-1.41627641805,-0.0899006423315,1.0611155262,-1.131250839,2.23898952868,-3.58230877813,-0.889216990584,1.40956827182,-1.46751403757,-0.691296854089,-1.54265676827,2.65262625498,2.19788404633,-2.01697903653,0.611521417417,0.359316692791,4.6816105414,0.862952723244,0.167491980372,2.6932665368,-3.00625465314,-0.351348050268,-0.89827277051,1.1813078626,-0.683418750015,0.612255702038,1.80744153164,0.0561640557506,-1.55411351133,0.711329718813,-3.72017506799,0.381065155569,-0.414420442519,-1.60570235569,-0.599320146458,1.05618929973,-1.47036342112,1.14814616981,-0.245414197276,-1.86036272008,2.96957122081,-1.61679375941,-0.50189343957,3.2102935297,3.52676818145,3.37559696234,1.65133903096,1.07003903059,0.246458431642,-2.86996585644,2.9472088513,0.156860758686,2.653484
88352,-1.65249707957,-1.10731408448,1.62994935577,-1.96909845304,-1.9090510476,2.51069158859,-1.65984114813,0.148115664273,1.10611308391,1.18241718985,-4.85953441229,-1.0049765752,3.88280249662,-1.75265659238,0.372608524032,-2.22002927662,1.18168715581,-2.87508345833,-0.676288569625,-2.44675108062,-1.55716385372,-1.62059798953,0.724381881496,-0.960783561886,-0.552230426264,0.121615798579,1.04462357852,0.118085120237,1.26606201262,-0.380661477003,-2.58578204132,4.03374155601,-2.25326988394,-2.88061044978,3.26819336615,1.91267201179,-0.19674664532,2.05710699236,-3.54867236793,-0.326269919106,0.752888089223,0.132116086772,-1.54644230279,-2.836589684,0.141382075407,-1.44156945706,1.19807019893,1.68431397116,0.438746488152,-2.06834516275,-0.842738093366,0.465043608979,-0.629041527666,-0.0120976683258,-3.00099798249,-1.73881566772,0.881273090875,-0.540746588847,-0.38645376593,-2.43880278615,-0.563591295604,1.477140512,-1.75295748363,1.76406287775,2.66264589914,0.484454554128,0.273973214982,-2.05206947308,-0.369256326252,-0.689306857174,1.66270560488,-0.131857610115,0.955091272134,-1.60116198558,-2.28544168464,2.11164102397,-4.18991734267,0.173959671197,-0.0354114097397,-1.4089728089,-0.311132524,1.89336391541,2.43192427419,1.01858890895,2.03606205304,1.62452822335,3.64225894583,2.28056802496,5.64531833088,-1.1566376147,2.07540663589,0.620578413989,0.750977221371,0.0162535885321,-2.16207619048,-0.105952032448,-0.117025236938,-2.50755272675,1.48142693144,-0.430885550216,2.23543980132,-0.326485130108,0.0243268507167,2.06152002688,-1.02234084951,-2.0303752323,0.561301589735,2.3433107876,-0.925805005171,2.80904484078,-1.94807647011,0.329007639042,0.397634451785,1.47111085828,-2.50084066219,1.09999789629,-2.99330297808,-0.0599839422321,-1.9690194292,0.960052060426,-2.19808352939,-2.01816409011,-5.65800942077,-0.0169289777679,1.16420775694,0.723551353918,0.643957264021,-0.140148446853,-0.056547111384,1.91572655252,-1.37543404733,0.484043939791,2.79265339713,-1.17311209973,-0.371
278463653,0.469582405128,-2.31444814128,1.41635027072,-1.07100369346 +-4.16998558362,2.12975610028,-2.56134395649,7.28089529038,5.71112143199,-1.43967841108,-2.27770995537,-0.621412650546,-1.44766437213,-2.65973161459,1.36775091092,-0.475116016803,-0.587382383942,4.81157625596,0.770176066954,0.363275742132,-0.0876347057022,-0.475521533538,-0.0547252563637,4.64327842236,3.68908154567,2.63090462903,4.96261648734,-2.3996240147,0.249490239721,1.12136919369,2.95945439398,-1.5711039712,2.68638911406,0.584886546134,-2.50314228614,-2.72285134157,0.61815967679,-1.74822253416,-0.311564020118,-2.74809125702,1.47346679886,-3.40588476142,1.47545339028,3.02455658674,-3.94506848613,-4.14376579285,-1.73336583535,-2.40840473334,-2.22219073812,-5.15251653036,0.988312865494,1.78566960146,6.54388860067,-1.45725802938,0.214708279868,-2.72405630668,2.83319289843,-1.85521226009,3.58616267999,3.34310981591,1.02165599783,3.42570413748,0.846149519881,-2.93276470105,-1.80281494916,-4.22263733625,-1.52749340316,3.2283666563,4.42827975909,-1.44139790932,1.73660321256,1.17811784268,4.59021838108,1.89355262021,0.455512814919,1.27808425168,1.62865997315,6.70429563522,-0.847455751549,-5.35004572391,-5.12095170339,6.48116056124,0.300556570692,-5.01764505545,-0.875816748044,-1.82039844963,-1.25923923691,0.632047503791,2.15801657677,-2.92180285851,0.511598025958,-2.96027669827,0.547309962512,-2.98510901829,-0.335630682309,-4.73974208434,-2.01421547413,3.362338895,5.79285810471,9.42033552887,-2.91738398632,1.82035643975,1.98379708379,-2.70420178073,-1.48058941424,1.56434452216,-0.992579338154,2.37859466165,-3.72032371362,1.26282515267,-3.50253353516,0.00376921892301,1.18962185065,-1.0557041204,0.54337829232,-1.99295026461,2.62920855999,3.76263545752,1.2841622142,-2.72069926341,-1.80479015474,1.58534218073,2.60577425917,-0.440677909057,2.20203198473,-3.39447330793,2.79975073894,2.23906295717,0.677189537287,1.39489221702,-0.518861652811,-1.19545238594,5.21395279209,2.14497482498,3.99990809123,-1.70296090
881,-2.09669830044,-0.502894639969,3.01051452478,1.25882732471,-1.28701953888,-3.64675308704,0.679585470159,-3.88040889422,0.100971349178,-3.87473366777,8.57528485777,-7.33635827383,0.620873548189,4.256256452,-3.20197622975,0.181273331641,-1.08387027582,4.7040402782,-4.30957582315,-3.2032131656,-3.55255149682,-5.39665594737,-1.43142532587,-1.0020887959,0.310152183772,-4.9755616792,0.544686280489,3.23141360442,3.48532564084,-2.27912784214,-5.4400074927,-2.9422715648,5.55690115452,1.07856818487,-2.60423706293,0.296417542696,0.018438497484,-1.6693427813,-1.97826829297,-0.649584023059,1.0299335142,-1.30126957735,-1.49028243661,-7.05598390897,1.53666977635,2.47103852113,0.548410004575,-2.33345104297,1.05941242347,-2.22456861824,-0.833920312524,0.616063261429,1.08299628615,4.64962686857,-0.85913300693,8.38019424758,-3.35722782453,-5.88692650636,2.48297270139,-1.82296590428,-1.72441059232,-3.50540684352,-4.86662904103,1.4669711864,4.01910547892,-0.666310483219,1.94299481273,-1.65633018176,-0.233463008008,2.92032059917,-3.11237916489,1.65681514025,-5.82044394652,-0.84150699973,5.2420919474,1.65209466338,5.1169664971,2.8554833293,2.7991078945,1.85252228816,-1.80552712282,0.913601561388,0.441482040088,-0.160765804846,1.5659571287,-5.15831661542,1.85946914524,4.30885611724,2.5515617756,4.66296468178,6.40177754471,0.323659792742,2.79168056408,-2.54396620949,2.11927359978,3.5409553499,0.143619238635,0.247531717618,-3.67236700398,0.0737643596032,6.4369303449,-4.20339368939,1.39238156477,-0.479590680996,1.23359161367,1.11356295109,-0.530017747878,2.8127275755,1.67139578978,-0.648806054595,-3.56483347257,-0.00777567660002,-4.97657731056,2.76010027647,2.79106523007,-2.92366722226,-0.381967118582,-8.20272569498,-1.22538543622,-0.975923561257,-1.2079847001,5.68413191756,-0.519274702668,1.34021991417,0.46834429979,-0.752738639987,-4.23064642449,-6.19847359916,1.9824349012,-5.77588344375,-6.11922142108,3.66428396702,7.66924429814,-0.776042481264,7.10654588699,-0.732527501781,2.015950492
62,-0.872191261451,2.67919575771,2.4503210032,-2.90921337763,8.53517298381,0.212812230588,0.476091645162,0.748127258619,-0.886277671655,-2.89118565341,0.142637886207,-3.79416944186,1.11709731897,-1.30126662016,0.359220613638,-2.86900741637,-4.63997180067,1.53915568789,-4.55603598674,-2.03369594216,-1.81275931041,-2.69728669763,-2.77373948296,-0.780138870872,-0.710413366953,-1.87378830453,2.78039755662,-3.32990742207,3.18837203344,-1.00930721204,-4.34471332073,-2.7804454573,1.49880246004,1.22752761165,-1.44689382633,1.45333088478,4.27367163022,1.721656696,-3.6055589668,3.01899054011,7.5569880708,-2.61906720797,-2.57271003584,2.80881048858,-0.415334333976,-3.0628209281,-3.63716221015,-0.194801000356,2.79870586514,2.79689924727,0.0788984746723,-1.96187414487,-2.75171196282,-2.28218094111,0.444554739001,4.8369281887,0.373838265736,-3.15276482065,4.03460666657,-1.86244435867,0.253326237999,-0.800799566707,-1.74990467469,-2.74444140275,-5.73288337012,-4.91918236891,-0.418412837584,-2.99338801781,-1.38950726748,1.11461923277,5.90281201998,-0.707580384415,-2.67438790878,4.21448961059,0.828290172268,1.15630444248,2.80011883676,2.65575761526,0.483185992143,1.03626998862,0.131995103361,-2.91395613949,-1.43565141161,-2.69984012683,-0.626701895692,3.98586324195,-2.19652486801,-2.48867563566,-1.19348388483,2.79217995802,-0.750475711823,-0.945274029968,-0.126381392279,-3.6633948501,-1.54844618718,1.36196402073,0.468697243529,1.29018088311,0.94496485432,0.257892522415,-5.15796130657,-1.53281098127,0.595785883914,-0.833150585492,2.10806567272,5.13338648002,0.01430302143,1.24969169378,0.00611201127369,1.25787633081,-0.926280161539,2.16456234137,2.116730539,4.47622630279,2.12537882169,0.520683592956,-1.542467405,6.23520137549,-1.31958263814,0.309113717082,-1.16410690943,2.81666246732,1.45756631712,-5.58640872558,-0.689133227666,-1.21494281928,-2.40350431559,-2.07186533292,-4.34414368868,-0.898425387144,-2.84011162599 
+-2.85525532881,4.14924573302,-1.27022984872,4.43080223083,1.04979521433,-1.7563615183,-1.1571517543,0.443647010723,-0.840120493175,-0.564384366473,-0.631840480766,0.532262438599,0.584832645258,3.23352189611,3.05675490737,2.79432141225,-1.4358461082,0.0141486930853,0.928806241353,4.37966580232,2.8490308106,0.783738804857,3.78208962361,-2.80982620994,2.02718123476,0.447202665606,2.01867037753,0.748949680329,0.626896452109,-0.226885780966,-2.62637141645,-4.79518300573,0.517160896062,-0.495881884893,0.551008209387,-1.1999056525,1.58518931756,0.092337232629,-1.19481320501,2.92050409516,0.70208245794,-1.14886969738,0.497751923401,-0.698487961093,-1.87117256582,-1.65841737827,3.39620117505,3.17374242703,3.50091727654,0.480773175558,1.40684746265,-3.40429907004,0.423096078237,-1.25402658423,1.40384977142,2.23528889895,-0.70792874376,3.44265838623,-0.298643459876,-2.92092214823,-0.387096325756,-3.39548440655,-2.21305868606,4.01884763082,2.1962247467,0.178924582303,-0.175330102443,-1.81287087758,4.0013677895,0.506375047565,0.164289975565,2.65211846734,1.90428843131,4.45052925507,0.60681405703,-2.01008831143,-1.829990381,3.47248803615,-1.04316819509,-2.40825766305,-2.3010283341,1.26562317558,1.44828870733,0.254433333177,-0.294035871825,-2.39190562248,1.16849062324,-2.10750372112,-0.213768513898,-4.53380696336,-2.05353827099,-6.3679600064,-3.59502876282,-0.357480708757,3.44140817722,7.012797233,-1.16484250784,-0.17219096899,-1.65201326678,-3.91428116242,-1.39317485134,-1.78935467323,-2.13693570018,1.49206449827,-1.47030715466,0.326555347044,-2.8691151468,0.987859331371,-0.0670162276435,-1.38699082017,1.38502636115,-0.891648494402,1.63707906797,0.654039901097,0.315870566068,-1.13308484296,-1.63928325141,-0.569100450525,1.42651925405,-0.627428011101,0.216225209237,-1.25899307927,0.828946293494,0.974174125592,-0.332280605535,2.90402588169,-1.104502304,-0.644741526048,-1.07491079171,0.416999756893,1.47221087893,-3.26141314586,-2.26964950522,-0.0280790646872,3.24086038212,1.2000986
2085,-1.75527016382,-0.539535063108,3.23909044464,-2.99914438327,-0.492613923551,-2.91626054168,7.31597944102,-1.64774904013,-0.73017560184,0.442671738662,0.283633226553,0.714817404846,-1.79878552278,5.11262804588,-4.30506066322,-3.61411044379,-3.82477523089,-2.89008922736,-1.73692337195,-0.71265813748,0.314715143045,-3.16757190545,3.47336832523,1.5834569327,-0.637929768363,-1.56214804153,-2.64970105807,-1.12900751829,3.98810140292,1.87983502666,-2.51413838069,-0.909131198054,2.46703845749,-1.16912671606,-0.352692016586,2.6085906589,0.711290110747,1.82539761384,0.137608984311,-4.09530947288,1.01127222915,1.98808420658,0.725776154994,0.456542024016,2.36024162223,-1.51671710104,0.909857604951,1.3748901693,1.41866263221,2.22546428785,-0.842200076581,3.517446268,-1.94564609289,-2.96543750087,3.66959119841,-4.30324907561,-3.19456482887,-2.38057807227,-4.43179172357,-0.982803171277,-0.41006461223,0.280178608544,1.95349114498,0.637461675009,0.711961734593,1.80234276384,-1.78083568494,0.520603844326,-2.37248194615,0.146621232829,1.95268532594,1.55047165434,0.825010337035,2.16551250696,0.958925328056,-1.03714228699,0.654975053468,-3.01727262656,0.247705178956,-0.0690905296781,-0.235510739784,-3.40891237398,1.3884248968,1.15451488764,2.64650440057,0.807570249241,2.08921063463,0.508586264452,2.52009829918,-1.11128878554,1.39935349762,1.06951609214,-0.485668144226,-0.460008237761,-1.70877252301,0.942621914198,3.41737226328,-2.40122259855,1.40087889274,1.62360543887,-1.58665239096,1.05352225239,-1.45161462784,0.468765456079,1.15845116933,-0.269039389293,-1.64486767074,1.02112615665,-3.15314137697,1.83668091496,-0.21584566613,-3.70026185195,-0.418916064101,-3.95508877378,-2.58916404287,-0.282405416965,0.0237940859794,1.56997692525,1.15945299725,1.77722654502,2.98457802137,-1.70026101914,-1.18428363204,-3.13462997307,2.47967257818,-3.06139141003,-3.33533022483,1.78348285884,3.65876099269,-0.542083423932,4.338555275,0.646300950845,2.75761772871,-1.33789882819,3.41355988423,-2.10382
32104,-1.58832200845,4.30315493663,-0.497908014457,1.43125514845,1.23661852837,1.89458917022,-1.27604429007,-0.118665337562,-2.98061999162,0.96282290379,-0.317447299958,0.177331671019,0.190233225426,-2.25885749382,0.633996060689,-0.931709454854,-0.453512817619,-1.06709086379,-0.45003234585,-2.11921728969,0.742342797123,-1.2796056505,-3.18736832539,1.89475087484,0.647759524982,3.05645425161,1.20850815674,-2.71339397748,-0.888974133234,2.6798757871,0.973526877165,-3.10087224166,0.282148707707,-0.588648343086,1.1617284895,-0.947238893711,1.91763001402,2.77221791545,0.242102493444,-2.7309236304,1.19404949462,-2.29922574123,-0.496662088036,-1.43388394435,-0.541529648303,0.914798926115,-1.00208673149,0.693878029583,-1.63149386843,-1.92279982587,-2.83413622906,1.15527868609,1.48624739955,0.0722957122324,-2.01015367587,2.79194158167,-1.34159947316,0.350978424549,-0.150799014967,0.594457630018,-0.702615435521,-2.49834770679,-3.44722706755,0.724352367323,-1.91413194974,-1.50618719021,0.208274304816,2.56051458041,-3.38282206297,-2.67611726205,4.30181331436,2.60196872592,0.980345343721,4.28195017179,2.45016477822,-0.720569800933,0.134198579739,-0.29681619644,-0.620866637628,0.0668065834062,-0.820043117604,0.427079674204,1.07770038346,-1.89850125671,-0.367198590239,0.309245206813,1.49165853987,-1.93249949853,-0.770264412958,0.697864535651,-1.92503979524,0.36664308548,0.6772959585,-0.407557226819,-0.110297719638,0.0780190831417,1.13796422362,-2.93108891884,-0.108831601143,0.0333983715381,-0.582767866453,1.68451089442,1.07477574031,0.759609896341,1.02592154245,-1.07680930615,0.977406439981,-2.15689084132,0.897650267382,0.871076323589,0.485362575054,-0.271094031335,0.392738024197,-1.50007651523,4.16120113373,-0.87542103827,0.770962069035,-0.193105610213,2.63168554207,-0.0860587771735,-1.02318051895,1.64206330359,-1.97631421804,-0.459768193164,-0.987577437561,-3.05661367973,0.700944906869,-2.85832208077 
+-2.40668356418,3.32200128635,-0.583146995504,5.17893602371,0.543722619215,-3.61351331234,-2.15219051798,0.154239282607,-1.86185589939,1.86499222438,0.546306239763,0.173791361054,4.68988918622,1.45787520011,4.61635592778,4.10645994823,-3.2520207723,2.82534058571,1.75578262289,1.28921755393,-2.56118538936,-0.681506864627,1.08718702157,-1.73322505633,1.85559087117,2.59411209822,4.86438429197,0.952494197489,-2.00043742815,1.56013310157,-2.24776257197,-3.37023128669,-4.3081034263,2.49645762126,0.0613088496522,1.5614004375,0.196160220802,6.71882646243,-0.515890210072,4.46806035837,6.49843154748,-1.07791967916,3.66291252851,0.340969046157,-0.717211693128,0.893422653279,4.23518612067,1.59024640679,4.00953623931,7.01554282506,3.3829888622,-5.28307714462,-2.56433442275,-1.21852455298,0.420509056251,3.97645592849,-3.46140729904,2.4203199927,0.499145697951,-3.22149805546,-0.210846113366,-1.82363392035,-0.608880066672,6.9203904438,-1.60331305107,-0.572833641767,-0.809020875096,-3.67446678479,-0.751598751347,-4.0169324255,-2.54423304001,3.43391434272,2.22814426263,0.720494257411,2.44403583368,0.126800663272,1.21261574904,2.80068611622,-1.46503902833,-1.02387386938,-2.22691595475,2.92893217762,4.35140001932,2.05282717824,-1.44687641621,-1.2482182169,-2.92161394775,-1.7117234171,-0.664106516638,-4.8541015784,-5.77170533816,-4.39334596634,-3.39425205867,2.10928462108,1.63525372922,2.20211301041,1.6979695034,-3.62859933059,-5.0955384318,-3.70584682147,0.913468626738,-7.92930506388,-5.18711395264,-2.14751714547,0.553891262807,-1.69585991979,-3.80843970299,5.93398868561,-2.32868751923,-1.18235898415,2.63725592931,-1.31388532559,0.924713171173,-3.68923300982,1.09287478288,0.447131590248,-1.02456968466,-1.82614021699,-1.27993409872,1.58124616583,-3.71338141124,2.08220694741,-2.52321253032,-1.8201927487,-0.489585822324,2.26087673823,-3.07679171085,-2.40032638788,-2.84321398576,-1.48280228813,-0.933238696854,-4.71049805482,-2.02947084586,3.95902432919,4.56408443928,2.77234577179,1.1479027
6547,3.24662017902,8.24014697393,-2.22661842028,2.16570036093,-1.47694238861,-1.56150964896,3.00861291885,-1.58600352287,-2.14261006952,3.36371217092,-0.277815688848,-2.55071312587,3.11163931847,-3.03255870501,-5.94063932737,-4.34915611903,-1.83065024058,0.344852973223,1.66785877029,-2.92896215598,1.02625600656,3.99294057846,-0.764026923974,-1.21331283232,1.14239682655,2.6062800312,0.555238639911,4.5995118173,4.17675596714,-4.47169959545,0.607188218708,4.99268372536,-1.10778329849,0.359094379742,2.3692166694,0.923014166752,1.39561937173,0.489449826081,-3.64099951267,1.49465563099,-0.864940508206,-3.8856684494,3.41578993161,3.80568179767,-3.16751228809,-1.90362671534,-1.0676062123,-0.827274825275,0.810656501699,-2.94211922248,3.80980886777,-0.505323204397,-2.70784498771,5.20668672403,-4.93021412532,-4.60470018069,-0.988903661569,-3.12164619764,-0.759834496776,-1.40789370815,-1.30719569206,3.67482577324,1.25514381965,0.729897277155,-0.221074349482,0.727831269502,-0.159013110398,2.35894515037,-1.60380238533,2.4536198473,-0.0437957082188,-2.46773814758,1.21704642216,-0.603128572703,-1.80407706489,3.83205666224,-6.8485059007,-0.767830495338,3.48311652978,-1.5156415077,-0.384740158121,-2.00051572681,1.33816781203,2.74709336281,-4.9876317811,-0.8754006657,1.1287828366,4.12337694327,-0.0656415704896,0.988705775539,1.21024437666,-5.10868624687,-0.0440934690972,0.0288263560326,-0.0765196786313,-2.51989612102,0.279547793863,3.3720527331,0.871332397062,-1.06302194118,3.50864712556,-5.51388967832,-2.0657237343,1.25920955737,-0.851355524064,0.682309628327,1.77832262437,0.827240066528,2.64016712666,-1.44682307978,-1.31921160618,3.49327129126,-0.484558734299,0.692844864529,1.00374541759,2.69691859166,0.154326318701,3.57687735557,2.06113112768,0.991898488825,-2.44635528803,3.95126618067,0.472989312112,2.33190206448,-1.30573364337,-0.437735764186,-0.251160595852,-4.47043835958,-1.51135720338,0.506121761999,-2.44358267824,0.0295987832554,-0.774288076561,1.33123704235,-9.26131312053,-1
.16106868347,2.29522511721,-0.143810934227,1.58851175785,-0.934488321146,2.50735031327,-2.19833483537,0.610350404581,0.244342085619,-0.716844118736,2.41659238497,-1.20272970358,-0.134129219055,1.19137221933,1.4639560167,2.79779875596,0.0395902216937,-1.13805999756,-1.18215333223,-4.94711904526,2.09147545735,-2.13449596683,-5.07175304095,3.36638139486,3.70780602773,-0.945894616344,2.34982962509,-3.65934572061,1.50665946653,-1.83905771414,0.419523326158,-3.01953722636,-2.5896670572,-3.02772776922,-0.675756167273,2.18817773163,0.581919515134,-2.15692337871,-0.136594539186,-0.149565262596,-0.947531465589,-2.10921741764,2.44600348274,-0.959342634677,-1.03096477588,-0.498095233439,-2.70281470935,0.375763909419,-1.34648666104,-1.03886758149,0.246556117833,1.06395082176,1.52048031847,4.41094911337,1.58565980355,-0.538471474896,-3.59832179228,-1.84744771719,-1.98041345438,0.181751922071,-1.86992271225,2.09672110558,-3.00351278146,-2.34073953231,1.90364366372,-3.77574122826,-1.82476956447,-1.66754270395,-4.17944864114,-1.643569568,3.2956170102,4.84715448697,1.54389404712,0.413878052236,2.01489253759,3.26832122485,0.128817051054,5.05713614782,1.29822279056,3.8207182916,1.3051289777,2.15857723474,-1.16148341576,-2.10272764564,2.65485213935,3.33767395735,-0.225942747493,-0.0608929246157,0.386773107847,3.04139202913,0.880819526515,3.79432223876,1.34475161622,-1.15084494869,-2.72890689214,-2.20355211159,4.0270291551,0.831315397334,3.15832736333,-1.64833269834,-1.15337079207,4.42843692621,3.73798665524,1.77370616277,-0.414466093183,-5.21718411287,2.14873480677,-3.09902131875,-0.431480846305,-2.21315110326,-2.32947000265,-7.03267769655,0.620159295995,0.669400061817,-1.29065409263,0.639066412349,0.412046761511,1.52948790789,1.63768410901,-3.5861120669,1.49905408064,3.24001261135,-1.20556717555,-1.63470236778,-0.0621758023897,-2.13516124328,1.88267453392,-0.0397303390498 
+-1.94168348829,1.77759615302,0.00324969291651,2.76537520647,0.356809294373,-3.04903903445,-0.571212081513,0.542071000835,-0.3627079765,1.24325743755,-0.427730951508,0.239423566062,3.11484637578,0.816348610718,2.79456279387,3.34600726088,-2.36868370374,0.960648590526,0.492966024081,1.63726032575,-0.520594614346,-0.710762829333,0.599766778151,-1.0725793888,1.89727054477,1.25175032437,2.9051876543,1.70391878923,-1.64619573633,0.92607907027,-1.19849523346,-2.36430278246,-2.74358758171,2.20087053259,0.111789479652,-0.408449259351,-0.328728172382,4.23223049204,-0.694623126908,2.42802311579,2.87807498376,-0.740068741241,1.62616318464,0.592944381834,0.159329541226,0.917487897477,1.93800289412,0.471566647292,2.55488344431,3.85311472585,1.20064109365,-2.55722387075,-1.1082152708,-1.02037551522,0.175513311128,2.44115464489,-1.72615523438,0.765665462018,0.977433196902,-1.83432733496,0.349625592828,-1.36312687636,0.715892488958,4.3416105775,-1.06046443701,-0.509649908741,0.497787223061,-1.32805623293,-0.711287250494,-3.12120837728,-0.91976089676,0.324390023948,1.0570640707,1.21327393075,0.919257143736,0.591331963142,0.457342879871,1.42952108284,-0.507037276824,-0.131746374767,-0.758843362331,0.87595297138,1.89250633915,1.27972093304,-1.20407567422,-1.03513803601,-1.98328871648,-0.134607525474,-1.06089170476,-3.55350990852,-3.87543780128,-3.44827607234,-1.73182748934,0.474614856841,0.146985151768,1.38470371882,1.86053468158,-2.31340465764,-2.91458182531,-2.94684989293,0.104040072957,-5.12927719911,-2.74529750084,-1.73663758914,-0.050694579085,-1.07352878233,-2.26959511399,3.31798489332,-2.0237891674,-0.758091919621,1.11872776558,-0.914398193459,0.528536718686,-2.51926944176,1.82730301281,-0.9518930234,-0.139356404841,-1.27573636502,-1.00687736809,0.303327493866,-2.61500696495,0.686027350427,-1.85459803333,-0.927233275571,-0.823916974465,1.41521901513,-2.20459855944,-1.11158712417,-2.42684030851,-0.775827880999,-0.958329697748,-2.08249924024,-1.46892851353,2.33831092993,2.1452542
154,1.52739960074,2.11092267672,1.58193236212,4.76442153255,-0.500175990462,1.07725728969,-0.0380358799299,-0.134679699142,1.48794374386,-0.768634611766,-0.826269167265,3.30978691737,0.666516949734,-0.977266769807,2.49315859,-2.23554114183,-2.87566772004,-2.56910360535,-2.05376130993,-0.0498415172097,0.0825265093635,-2.29967758298,1.79486357128,2.97849754334,0.294754260181,-0.787193464239,1.07911070456,2.28606698573,1.0229565537,3.31828083058,2.20088501116,-2.9493214941,0.495515686273,3.67637633016,-0.658510478187,1.48509448349,0.636143160462,-0.197983719906,1.11500002464,0.257854927381,-2.22390707294,-0.292455268702,-0.33748634686,-2.67893269607,1.85805808067,3.25056247751,-1.99736208757,-0.936352628816,-1.63319216132,0.0197323211029,0.691710200409,-1.60304873069,2.2222357738,-0.494030532042,-1.08713753581,3.57224936565,-3.01036009913,-1.89224094709,0.0352522730817,-1.16673329006,-0.490160132845,-0.675803392779,-1.24046576689,1.81585133443,0.898491098881,0.944541202783,-1.58925671836,0.205119132002,0.531537915468,0.253908309937,-0.676644065382,2.24614178936,1.33602100372,-0.244497090975,1.76761640933,-1.35158142765,-0.414446250596,3.73523249467,-6.43518327713,1.29597712395,4.63066692571,-0.613321131865,0.561347877184,0.0711209956796,-0.127557443778,3.31162866752,-3.6926500086,-0.285006345312,0.5099318854,4.93547654788,0.868819684585,0.137249038219,1.1507523783,-4.40680729866,-0.0998956763242,0.600819382666,-0.423278113404,-2.2000334398,1.6212370952,3.27790774345,1.55507115324,-1.22907078028,-0.029405062197,-3.98268408457,-0.990495022685,3.63038349777,0.218821062246,-0.752823298723,-0.248150258065,1.06529252927,0.178199325207,1.01655516048,-1.81574172656,3.30965251942,-1.68384102901,-1.26371297797,1.5262250487,3.47741630872,-0.265562386513,-0.801813617449,2.82347845296,0.909660657868,-3.02272153417,1.71411415577,0.936149520477,2.0352847523,-2.75044326146,-2.03834780852,1.15331309106,-1.52446875255,0.960695721459,0.943234353662,-0.174043610891,-1.19030439148,-2.10528
520733,-0.142430415601,-8.60760105293,-1.17194890261,0.969270079944,-0.40029415071,2.89137278279,2.72696355025,2.00195770899,-3.36749429703,-0.503019749323,-0.34973009636,1.07590580434,1.94033220887,0.0662941793603,-0.844664263724,-0.205224726594,-0.927714525858,1.82151385903,0.716765152912,-0.505078462759,-1.39169801553,-3.75972258488,3.03527408441,-3.67419142961,-4.80039191882,3.66278994322,2.01150090774,1.10328300466,1.56799778958,-1.6706875356,2.33891484579,-2.23406077822,-0.0790168158239,-2.29835296949,-3.19061029037,-3.09279531479,-1.82349063297,0.689713494777,2.95120071332,-0.454457020759,-1.06216011772,1.86404781302,0.412750553488,-0.312192496856,-0.901166524788,-1.71619956255,-0.00137017687125,-0.982375823656,-2.71353883425,-2.3097029229,-1.10547592401,0.557556620639,-0.718295222919,0.482262715501,0.333214447946,6.6798358191,1.22103029927,-0.80201618895,-4.47059836705,-1.35593414923,-2.29788345019,0.258600590626,-0.521844846309,0.437594954967,-2.18392724584,-0.493631593385,1.37908267339,-2.26255252202,-0.30756078084,-0.831326435408,-3.69798319499,-0.223937482238,4.03011242072,3.48706426779,0.441070608024,0.828923836351,1.9435341994,1.40788590272,0.878239062827,4.71550399621,-1.09901936968,3.4838750726,0.68982342432,2.16981327931,-1.96828734874,-2.21202177366,0.926186291775,1.88568594058,-1.7648316586,0.236547902774,2.64866965254,1.89112312635,2.27105943719,2.17706474463,0.199846277238,-0.520975260338,-4.22671780745,1.23446348428,2.73591025611,-0.260378292885,2.32260232011,-1.45908730944,-2.6201878802,3.38336368958,3.02514527659,0.979315823712,-1.99782266637,-1.60707169851,0.0483122205721,-4.34822138787,-0.213511068026,-2.52483056227,-1.12644841035,-6.88962921143,0.44326542365,0.096239015261,-0.0212235212322,-0.688477359512,-0.351519578299,2.47742046833,1.44010951281,-1.97519741376,2.6616740036,2.26513570666,-0.766266692481,-2.78300611377,-0.376965727806,-2.54099842787,2.187827818,1.03102740248 diff --git 
a/openmmlab_test/mmaction2-0.24.1/tests/data/activitynet_features/v_test2.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/activitynet_features/v_test2.csv new file mode 100644 index 0000000000000000000000000000000000000000..95ab472569d02a2a91c19e22189c95c7ae867367 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/activitynet_features/v_test2.csv @@ -0,0 +1,6 @@ +f0,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62,f63,f64,f65,f66,f67,f68,f69,f70,f71,f72,f73,f74,f75,f76,f77,f78,f79,f80,f81,f82,f83,f84,f85,f86,f87,f88,f89,f90,f91,f92,f93,f94,f95,f96,f97,f98,f99,f100,f101,f102,f103,f104,f105,f106,f107,f108,f109,f110,f111,f112,f113,f114,f115,f116,f117,f118,f119,f120,f121,f122,f123,f124,f125,f126,f127,f128,f129,f130,f131,f132,f133,f134,f135,f136,f137,f138,f139,f140,f141,f142,f143,f144,f145,f146,f147,f148,f149,f150,f151,f152,f153,f154,f155,f156,f157,f158,f159,f160,f161,f162,f163,f164,f165,f166,f167,f168,f169,f170,f171,f172,f173,f174,f175,f176,f177,f178,f179,f180,f181,f182,f183,f184,f185,f186,f187,f188,f189,f190,f191,f192,f193,f194,f195,f196,f197,f198,f199,f200,f201,f202,f203,f204,f205,f206,f207,f208,f209,f210,f211,f212,f213,f214,f215,f216,f217,f218,f219,f220,f221,f222,f223,f224,f225,f226,f227,f228,f229,f230,f231,f232,f233,f234,f235,f236,f237,f238,f239,f240,f241,f242,f243,f244,f245,f246,f247,f248,f249,f250,f251,f252,f253,f254,f255,f256,f257,f258,f259,f260,f261,f262,f263,f264,f265,f266,f267,f268,f269,f270,f271,f272,f273,f274,f275,f276,f277,f278,f279,f280,f281,f282,f283,f284,f285,f286,f287,f288,f289,f290,f291,f292,f293,f294,f295,f296,f297,f298,f299,f300,f301,f302,f303,f304,f305,f306,f307,f308,f309,f310,f311,f312,f313,f314,f315,f316,f317,f318,f319,f320,f321,f322,f323,f324,f325,f326,f327,f328,f329,f330,f331,f332,f333,f334,f335,f336,f337,f338,f339,f340,f341,f342,f343,f344,f345,f
346,f347,f348,f349,f350,f351,f352,f353,f354,f355,f356,f357,f358,f359,f360,f361,f362,f363,f364,f365,f366,f367,f368,f369,f370,f371,f372,f373,f374,f375,f376,f377,f378,f379,f380,f381,f382,f383,f384,f385,f386,f387,f388,f389,f390,f391,f392,f393,f394,f395,f396,f397,f398,f399 +-2.50391422427,1.68599787994,-6.01226188664,-0.125473405835,-4.05747392075,6.31113406836,3.125083399,-1.28819161128,-0.594363160034,-4.04687042561,3.33266554158,2.05021273438,5.06569788016,-1.51135614382,-1.75754686884,-0.330255823582,2.89510802927,-0.73977406509,-7.89353751824,-3.45772308633,1.17079686934,-4.14460512795,-1.39475490187,3.86253584502,0.447348279778,3.92883117367,-4.46848521844,-3.76229701362,1.69349113829,-3.27463325871,0.924009592578,2.12999677853,2.85659594768,-4.17102590297,5.99293164916,10.2884632288,1.83231558377,1.4478797998,-4.38947245616,3.90167659309,-1.85908630842,-3.78404481822,-4.00131390917,-5.05896560394,-5.12547527286,-1.43005141799,-0.799648821025,-3.57910595264,-2.2926393485,5.31605148185,-4.44407908701,2.30758203368,4.12896344555,-2.10192899924,-1.57365770347,1.46184540219,1.02006796352,0.693975594963,-0.882507590565,-0.268305251769,-1.78810432009,-1.44049936972,-1.30807676828,-2.54602796889,1.91918086343,-1.87330246853,-1.19116743588,-4.94173944111,3.41346881759,1.04477840726,-3.87883468949,-1.6401990057,-3.11963649974,3.10739194639,2.00107403406,-3.01992488162,-2.17734208151,1.18544464156,-3.26027744456,-1.38117784752,-1.12807281493,1.23731617227,-4.22769494609,-2.31104123998,-2.73342858264,-2.60609814517,-3.91516964902,-1.43564934755,5.86923505644,10.8698481406,-0.0644558026284,1.29974983175,11.9821762355,2.63645925008,-0.800439528532,0.305979802689,10.4448009584,3.89998507623,10.3629906773,0.987935663397,1.06111665476,1.15934493999,4.74597180691,-0.53357543254,-5.53819862455,-1.08892905758,-2.84128587559,2.54403880204,3.08628575869,2.26009004126,2.77060999349,-0.582374569877,-1.77802346002,-0.2937931835,1.02838354244,3.37142584142,-6.2468072647,2.20336157741,4.026
69576097,7.7139954797,-2.62292807265,-1.63856477894,5.24209850422,-5.95689444574,10.9237309757,5.56173629091,-0.06239338509,-0.11586309122,10.5260359799,0.0455641002992,-0.143587274683,6.85981490484,1.30256727268,0.099060309792,-0.99507694974,-2.39523977029,0.646837872527,-0.549287130061,0.528060432284,0.478981495421,-2.87669151504,-1.24631201746,-2.76280551886,-4.99648601327,1.56782352093,1.72098800023,-0.0553381940814,-5.35496277362,-1.12433242997,-0.526286978024,4.84426140262,-1.67891876845,-0.0265676538691,-3.17656040053,0.26415708479,4.03517758548,1.4993594204,3.83278299704,-2.77651900406,-0.861125229206,11.2030357751,-3.15313750697,-2.50459314309,1.78739187732,-7.82420403389,0.904809594012,-4.18456179152,0.60995901817,-1.44564015234,3.83168430104,-0.00437937539048,-2.3451228437,5.58568740368,2.97791145801,4.32271502614,-1.54512997459,0.536759116431,-1.1815032758,-3.14896126398,-6.86535051022,-2.70346348657,0.0113500145858,-2.77794979296,2.35137890776,-2.64285167165,-3.95364762035,-5.22867795339,6.15572625407,-6.91736113212,-1.52054794698,-2.80880080933,0.30321730122,-5.91560237718,-7.42976562356,-1.07937648743,-3.26394725639,5.0495641506,-0.553299233738,3.96384933141,-2.30659410078,-1.92410211898,-0.0740623548288,-0.741995456365,1.25729537246,3.06146581722,2.64592689772,-0.768545938319,-0.368544330909,-4.14440217226,1.39461226592,0.549227126659,-2.66866894906,2.50084337145,-6.41121511041,0.753405646177,0.280067476256,0.0344201303652,1.11097541213,-0.756136736626,-0.134220118965,5.6025168238,-2.69538654726,-1.20349766834,-2.90915489789,-3.07136878235,5.78831844318,4.79880530822,-1.54153241949,-4.93687499883,-1.02846407186,2.11793406884,1.81036372992,0.928447236083,-1.67445344365,5.93752378918,5.25534441684,-1.32955752029,5.02874157984,-8.32498580794,1.22665544488,0.729978278127,3.76998885216,1.18933444305,-4.01561953996,-1.91036380149,-2.01600540918,-2.19074894269,-6.06838036269,1.91566910093,3.16219263298,-5.36112836713,-3.03646755643,2.60723549671,-4.73392456
058,-1.27864055974,1.65558185437,0.35871136493,-1.97445669054,2.00282359886,0.766041404302,0.935142604145,0.146960995005,0.90301123882,0.584378651645,2.43738964301,2.14986027277,2.13076803503,3.4849176696,3.37372560032,1.19906408345,-3.25606738189,-7.18101082565,-1.28755031363,0.930275378818,0.638566405974,4.33632120663,3.7835789624,3.41258601273,-0.279865853117,-0.651737863704,-4.7223025058,5.75545690528,-0.820105519292,-4.00676441302,2.11396374954,2.60952237005,-0.820631582523,-0.546553676079,5.33481172893,1.34852465273,2.93794032376,-1.33422280837,0.00903616898423,-2.36627310158,-4.99107783527,4.48972757256,3.85615534734,0.528791357535,5.58767522678,0.127227772965,0.973913995567,-1.8062694088,2.32322553868,-0.442473914737,-0.123751487135,-1.67863033336,0.0891421785383,2.82212784306,-0.478511586228,-3.3537200428,-0.522387139102,-4.25974474021,2.87018204241,-0.111813521457,3.94839403804,3.74490500576,-2.30623158975,1.49538655047,0.530469396242,5.1296629385,-0.453469798231,0.306027388129,0.35104102143,-2.34272025863,2.87870763106,0.212640115016,0.719817214469,-0.20345939615,-0.506974699062,5.3592568385,-2.28140813929,2.88992723737,1.65410613199,4.48693866632,-0.09672872709,-1.87582435405,-2.46928755752,-3.56278716312,1.74785164057,2.74009034813,-7.29490411233,-3.16100976408,0.847520336401,2.92602454656,-0.0986801903656,-2.16201799224,-3.39690165524,1.53765563161,-1.41997380147,2.71161737728,-0.0167333157083,1.75945290337,2.10004583364,0.765974609689,1.79493778887,3.43569638106,1.49552039321,1.90617850633,-0.592973705882,4.00305455331,0.0335191789012,1.05186070161,2.48385107847,4.89055257951,2.06091725733,-0.18432842804,-4.0123498625,-1.32194922277,2.87064841629,-2.07818711219,0.695646315956,-2.8474977249,-0.372025591391,0.277543174562,0.348284025789,-0.54074715731,2.48928393808,-5.685446576,-1.66416304574,-7.02726226008,-4.88155203391,-5.57406386037,-4.91916411608,-7.94337537982,-3.65389317081,-2.97659988583,-5.97952768511,-0.575712613136,-3.38044490327,1.8959422477
6,-0.106777342905,-1.21814931744,2.66339186237,2.37583883107,-2.34277046832,0.0847875222918,2.1196259109,-2.034442402,0.994460807731,-5.99126604669 +-3.61196599602,1.54396823943,-7.05199570656,0.70936037898,-4.42450754642,5.79873381853,4.79998759627,-1.51375595927,0.041889913378,-5.36947724223,3.11711617708,1.87290850281,5.37537143231,-0.140440261367,-1.07927534082,-0.8091666732,4.91609726548,-1.47799203396,-8.695467484,-4.09717354178,-1.04496299029,-3.85961924196,-2.10038466751,3.32289713025,-0.286860848963,3.96218072772,-4.39675701856,-4.40787660479,3.73622534722,-2.87716412544,0.454319910706,2.42820411325,3.82069679498,-2.79692421705,4.38538633883,10.2156878471,3.4358463645,2.12645539939,-4.04702971578,3.87549848557,-3.44834155142,-4.70891635418,-3.76960349679,-4.85522414446,-4.31793097854,-1.22963698059,0.447048375012,-2.53883199245,-3.42271156311,4.74730663896,-3.28625443876,1.15255518705,4.48008643985,-2.00973020792,0.25715895891,2.01633035838,1.72455749959,2.46865062863,-2.55920924097,-0.941734179414,-1.01115750857,-1.55530408025,-1.35561941266,-1.23846808225,4.0139059037,-2.82922329605,-1.54500077367,-4.14823132754,3.46829478144,1.42298098058,-3.60501238108,-0.478655001521,-2.27799000442,3.80441823602,0.555091810227,-4.56343603134,-3.86684781313,2.51266635656,-2.34452754557,-3.54211790189,-1.63034411222,-1.93864814639,-3.73451783657,-1.60328631774,-2.4672175467,-3.80095796585,-4.04769252539,-1.72506986559,5.59767432213,11.0820033073,-0.191732565476,1.90799899697,11.6760043621,4.55487689376,-0.31670263633,0.824923895671,8.5647937417,6.5042055428,11.780738759,1.50271001905,-0.0258838802575,0.435441556572,3.30290358961,0.377896644174,-6.5453125,-1.00815881342,-4.10386363864,1.63551698476,3.23607475758,1.42431855202,2.55384192467,-0.456127088517,-1.94804133773,0.550055715443,0.636448504358,2.32128318697,-6.70778397321,2.73787901104,3.27784690857,8.87038059237,-3.74099546671,-1.75985428691,4.34281664491,-6.43530688286,12.9979223013,6.78234988451,-0.806176937745,-
0.697792875396,12.720209074,1.51877520681,0.540385435523,6.74378789664,0.843219137377,-0.0813938416541,0.253477528694,-0.220510208608,-0.133373232186,0.959342181682,1.10231779218,0.231312006339,-1.99769770503,-2.40456032157,-2.95679311156,-5.95258055926,1.98243983686,2.28856839836,-0.382299264148,-5.90337668657,-2.26504155695,-2.81989197582,5.54886015653,-2.23119397462,0.655153363942,-3.77459974289,1.65176175833,5.3708147645,0.977352631095,1.60295453668,-4.00599938631,-1.69029248208,10.0866486311,-3.23101823926,-3.1206391573,-0.391065824031,-6.68118602037,2.16630054861,-4.7760153234,0.383674836252,-2.48520847857,2.07149813026,-1.99720753431,-1.20698849112,6.08765767813,2.54862617255,4.67334094047,-2.9711537391,0.948479612171,-1.01456621587,-3.11699818373,-6.72917854786,-2.92183075547,0.496130555124,-1.61810724959,4.37298168838,-1.93378492743,-1.86215627491,-4.90786517859,8.62715418338,-7.5756526351,-3.27301322818,-1.76513157338,0.75444869213,-6.96635819673,-8.78930687905,-1.7524562791,-2.41629351974,3.68741244673,-1.43222312816,3.23068808318,-1.59724262357,-3.27234983742,1.24265492261,-0.0109941303718,2.80159805715,2.48849355877,3.07970299125,-0.557770807296,0.432648000119,-3.69374324679,0.0467125833038,0.424763832987,-3.38139162659,3.42404463887,-4.51077946425,2.03796033263,0.507232870907,-0.506469908358,1.50909484178,-1.27529992908,-0.255473581143,6.49730739594,-3.27221466898,0.583703720573,-2.57865453363,-2.25019647181,5.4004673481,4.42697024941,-0.0842542231125,-3.7730645895,-0.905618444086,2.8413999021,1.14175421931,0.425801990927,-0.551772788169,4.81836385727,2.67149700224,-1.60633691549,3.67677226961,-7.09939215183,3.07843704373,-0.603567731382,1.07058879137,-0.284542271494,-2.65182375908,-0.966910338403,-2.21251030267,-1.5918459788,-6.73685925007,2.16504070461,3.16708334088,-5.73397156,-0.0308346152315,3.96178902388,-4.34651784301,-0.626209998878,2.96317673624,1.55037861467,-1.6240209043,-0.916502046583,2.22772178277,1.73989147246,0.425792780239,2.4474841684
1,1.27179720402,3.01824558973,0.45870998502,1.6810954839,4.9340551734,4.52931187153,1.22776987255,-4.30461632609,-8.0007350564,0.293104887008,2.59760651291,-2.09017359019,2.84267843664,3.92640956045,4.39850687385,0.263943502309,-2.52996243984,-4.9456074357,3.01140740514,0.060671949388,-3.45182769299,3.45659797787,-0.717935073377,-1.70038859993,-0.159526935219,4.78994245529,1.73284136951,3.39466386437,-3.02896884084,0.745040215552,-2.42295794487,-5.48635936975,5.81924671531,4.81498251557,0.588836860656,5.34480842352,-1.69491340667,-0.931661537289,-1.47670565099,1.95115838945,4.33551876547,-2.35900820047,-2.03742983938,-2.51175971031,2.00818323493,-1.02861073502,-2.83876619935,-1.42532885447,-3.22665929496,3.24723680019,2.50910392105,1.66940991878,1.98924016655,-2.976414603,2.39372268021,0.0301794916395,2.93753557801,-2.53472368196,-0.224031038582,2.22086050436,-4.60367997885,0.344105190041,0.892087735609,-0.732750460502,-0.0278959076854,-2.04538312331,4.39118845462,-1.92525613308,2.48760456741,2.12224386633,4.20933679342,-0.160378366895,-0.847533833979,-2.68713091612,-2.85529101193,1.45633238703,3.13940095305,-6.84778351784,-3.07674325108,2.9240462061,1.66283178181,0.366562292727,-0.474471753836,-2.22659401149,2.12781591714,-0.698044653983,3.11203145981,-0.0878812848356,2.08509909212,2.37360790372,-0.383632448313,2.85876693129,1.43884898126,2.44588458538,1.13197429609,0.669784083962,2.82567384094,-0.303028093278,0.0804680705045,1.01148720384,3.96722738147,3.78676999509,0.484674140066,-5.0017509222,0.154588726159,2.53468632102,-2.48899200261,0.211847947538,-2.28771493435,-0.277051561698,1.01623694403,0.347248692065,-1.88412645785,0.431219244007,-5.62209599018,-2.32514169514,-6.17786878348,-4.5459565401,-5.45559768676,-5.25804600716,-7.30329209566,-4.18787643314,-1.41929989755,-6.36565381289,0.691979244352,-5.4266118586,0.243365764617,-0.33372869622,-1.60025772154,2.65902011394,1.72226278037,-3.51518207789,0.837280854209,2.64499332011,-0.451456475259,4.05596930012,-4.5
1415959 +-4.72683149606,1.45348708808,-8.07086817742,1.63604789376,-4.73549800873,5.20675960303,6.51230325818,-1.76839387298,0.728590119478,-6.74178866983,2.8130164218,1.58456622004,5.62148933888,1.37578496694,-0.371593541978,-1.41557620727,7.0383985126,-2.29083102226,-9.45700079202,-4.80206114411,-3.43986400128,-3.55278479934,-2.83554306328,2.7735268724,-1.13780232042,3.92281681627,-4.25488941192,-5.10927104115,5.96552311688,-2.43940485954,-0.0862283119556,2.73895709873,4.89024929762,-1.2763541922,2.57022780523,9.9613841939,5.07362765074,2.82582543075,-3.62501172424,3.7390643692,-5.19941673696,-5.66170942306,-3.52688271404,-4.6018137145,-3.43782470346,-0.992310488373,1.76652944327,-1.43113125652,-4.60094419718,3.99586562991,-2.03482079327,-0.160126103461,4.7740121144,-1.88776037335,2.26084538698,2.65253681004,2.54412336618,4.38450802416,-4.3977601847,-1.7176710071,-0.0724306311467,-1.70681380391,-1.41692107796,0.200332455933,6.24482979595,-3.83351793349,-1.88694544792,-3.24113301516,3.48263311743,1.83456811458,-3.1987385869,0.769642335775,-1.36940517485,4.47494917393,-1.01712017417,-6.15526720286,-5.62981627226,3.9166711688,-1.23287549198,-5.84563351884,-2.13252854615,-5.38287308335,-3.12790068805,-0.774887352436,-2.1297221756,-5.0906492424,-4.12367990136,-1.97023809493,5.23813544751,11.0778312242,-0.275825666287,2.59604639888,11.1118171802,6.55417260289,0.203035293669,1.38965836696,6.4515772891,9.32944820284,13.2775346517,2.04594562918,-1.18929040372,-0.312611132264,1.6740041858,1.40754847616,-7.60108621597,-0.907561735809,-5.39238245725,0.626936522051,3.35088065982,0.46351477623,2.31236622334,-0.229608643204,-2.07551843763,1.55680642903,0.263669897775,1.0858634612,-7.05738488197,3.32455673039,2.40335632682,10.0899427987,-4.92568675757,-1.80175588966,3.28225847542,-6.88330174923,14.9608820614,8.02759130716,-1.60224438258,-1.24848374822,14.9900966168,3.09142677188,1.2888044706,6.5442295146,0.330789602659,-0.286123776287,1.62822659672,2.06531837225,-0.982651502788,2
.60571396113,1.63691263556,0.01017631717,-1.03312850952,-3.68506930947,-3.12813932538,-6.89839523554,2.3975418067,2.95167421162,-0.811870859787,-6.43306355715,-3.44969232738,-5.32219609171,6.3486418271,-2.75835331619,1.37597230494,-4.40136899472,3.19074914694,6.78243587256,0.445585229398,-0.808829127549,-5.32398023844,-2.61561192304,8.69628513216,-3.31122705817,-3.75478894711,-2.72484310418,-5.34768217325,3.53855306476,-5.38706000924,0.145446739923,-3.58612233102,0.120355840028,-4.15744045019,0.0731746891131,6.55438641787,1.99956796408,4.91731314421,-4.42644771397,1.40971697062,-0.784811406731,-3.00484983444,-6.53485749721,-3.15200479388,1.03534908369,-0.301970368177,6.51142239392,-1.10611675471,0.418995252622,-4.4721977675,11.1724257183,-8.21665349245,-5.11762260079,-0.615411399901,1.18636612185,-8.06906448126,-10.1247596884,-2.49426667422,-1.32065032601,2.17061477065,-2.33631666951,2.38926856876,-0.913166025876,-4.7118704623,2.72928834141,0.775672697726,4.4457443577,1.71014433921,3.57591197133,-0.235582885593,1.25215531408,-3.14634150744,-1.4078004086,0.365033659041,-4.17761438727,4.40297134757,-2.42336025477,3.4580388701,0.689679874331,-1.04557964027,1.87770598858,-1.80380414367,-0.417696796171,7.45841611862,-3.81225969553,2.56200723887,-2.21683688522,-1.32409115911,4.95071142197,3.92624093532,1.46352795839,-2.46225001812,-0.77849281013,3.50410349012,0.434351972267,-0.0288636657596,0.669223650095,3.49293913841,-0.137764969467,-1.96554630518,2.1402142328,-5.7265598917,5.16214273542,-2.05637966395,-1.8495585683,-1.87955528319,-1.25644548416,0.00674796104395,-2.43147389591,-0.893102669418,-7.49637273312,2.34914988339,3.13358963132,-6.12764425039,3.23036705017,5.41211955786,-3.91730147004,0.0444684042034,4.39211372912,2.92113072753,-1.25977230668,-4.12997387886,3.87697173372,2.66106281221,0.736292763781,4.03323895753,2.06197661877,3.74714529276,-1.27023549199,1.21123526514,6.49754122019,5.87128979206,1.2765970856,-5.3870420897,-8.90884536504,2.01509624004,4.444668157
7,-5.09674575568,1.20312212527,4.04149165512,5.50566021562,0.953406482342,-4.55933359832,-5.21267021895,-0.036395560507,1.1284481287,-2.80024212361,4.99020810008,-4.41919901133,-2.62691727608,0.226202066541,4.16152264595,2.0979556495,3.87861913562,-4.9043425262,1.60233154863,-2.46861632347,-6.0349463439,7.17580538869,5.88561519026,0.718002053499,5.10737453699,-3.68287960738,-3.00543767631,-1.03803471714,1.53446617425,9.3747028375,-4.76337719411,-2.39580845952,-5.3522044754,1.13427948356,-1.6372946959,-2.29562118411,-2.37800694928,-2.10207263529,3.68294849873,5.38075784862,-0.940855975155,0.0137967544802,-3.74462119222,3.33829092682,-0.57550301969,0.537392029762,-4.84174327537,-0.825007719694,4.19546295956,-7.04726793528,-2.39606908321,1.61995286934,-2.34724253952,0.159427139386,-3.66048334882,3.28457990646,-1.59395935536,2.02604223549,2.65396766722,3.91925804377,-0.170175538174,0.293078864813,-2.97810955763,-2.11363542974,1.19750591725,3.54246556639,-6.34636378288,-2.98813998103,5.24311850287,0.266658103764,0.848274391745,1.48310565829,-0.99932412535,2.74228922785,0.028886015862,3.42641401768,-0.174800014277,2.45710129201,2.67823993087,-1.63095737636,3.88755993008,-0.699719142316,3.417716069,0.163006665744,2.16666536272,1.66770118028,-0.553962221444,-1.03107923508,-0.689737435581,2.84424331307,5.59421723187,1.20365538374,-6.0307972002,1.79253649413,2.07976007581,-2.97050522506,-0.320198328197,-1.71101762295,-0.148553741649,1.92997103455,0.389586392492,-3.34172380107,-1.60005307674,-5.45010868966,-3.076508376,-5.23991111994,-4.07970976352,-5.24768321514,-5.51570352555,-6.46153886914,-4.78648862958,0.280570674728,-6.66282331825,2.05202573478,-7.5744939363,-1.66311737061,-0.568106225319,-1.98653774977,2.69276298046,1.04291445166,-4.88652718305,1.6799737481,3.19981912076,1.09642167091,7.33881660357,-2.92239319682 
+-3.95618640951,2.16822504699,-7.02749201775,2.07438584924,-3.7952008903,4.66516063452,5.66598080516,-1.93683131397,0.83286083467,-6.31038688779,1.93803728581,0.415994385479,4.63695873261,2.03064954996,-0.546765608815,-2.54600209773,6.67720080018,-2.60086139083,-8.36665858864,-5.08000973701,-3.84362360537,-3.51486208201,-2.64075744003,3.07348869205,-1.94571852326,3.0428294871,-3.48582068503,-5.26945194721,6.5893364191,-2.27115260124,-0.558212063015,2.65741990924,5.38911813021,-0.610317340195,1.36496483032,7.88430027903,4.24496084571,2.5491838041,-2.95291282773,2.46365449905,-5.8806508565,-5.27971760869,-3.57540645719,-4.17462575197,-3.20521330357,-0.712461964526,1.66458856776,-1.43753664225,-4.29921654403,2.28583934903,-1.82383457958,-1.12579432636,3.8323690407,-1.60873620778,2.88645622611,3.1870587337,3.35539863348,4.68089458585,-5.01220222473,-2.40511398852,1.23198447682,-2.04995642841,-1.54208872378,0.738531426192,6.23694182634,-3.66800229013,-1.47559821933,-2.51566377998,2.96481087386,1.93647179783,-1.85266061902,0.897218961718,-1.2290535754,3.62848708004,-1.39016747028,-5.53799726665,-5.19588583469,3.79989851355,0.365908132196,-5.86183534264,-1.74588927373,-6.0965897572,-2.17361679807,0.099301538021,-1.49651467532,-5.28756560326,-3.35764337569,-1.22807119251,4.41288296581,8.37310397655,0.329299056678,3.0666925776,8.31520066255,6.03162533879,0.254658643305,1.52927615046,5.15474370718,9.92706954478,13.1178707933,1.9851475221,-1.25251645445,-0.040588879585,0.598402907254,2.09637820482,-7.39962798595,-0.736607771963,-4.72784618586,0.148764773328,2.82482881815,-0.363951296807,2.18847515703,0.851648719757,-1.44513312698,2.82303802848,0.789665968129,-0.284895439446,-5.39480451405,3.52706449866,1.50199447424,9.94445934776,-4.85012166024,-0.775828022365,2.07768519119,-6.15859429717,12.0614514388,7.37984260201,-1.64554053068,-0.434650133851,14.1951656962,3.12879480362,1.52092895806,5.6518155706,0.0597475437445,-0.432820611596,2.15243572235,1.70108392119,-1.19518387556,3.
0659382844,0.729992161989,0.512096637264,-0.702464946806,-4.23238757848,-2.71316921115,-6.04356548428,2.08492669598,3.63833817005,-1.76652027816,-5.79197620272,-3.09022756994,-6.01349622488,6.92608562946,-2.03923279405,1.31198180869,-4.27980091691,3.90416300416,6.64981202126,0.73166857958,-1.23485268474,-5.4199275887,-3.10880723954,6.33416883498,-3.2787891686,-3.49453917981,-2.87733795069,-3.98702534318,3.87149213552,-5.16316780805,0.178835353982,-3.50880401373,-0.771996193229,-4.59445316195,0.868211128412,5.75491086721,0.921819759609,3.39493911088,-3.67554339618,1.67544182837,-0.174868727922,-2.08721256792,-5.95615169048,-3.12308293462,1.30280533791,0.644019361586,6.33218312264,-0.25693573624,1.04176057992,-3.36895969659,10.1426500809,-7.50808531523,-4.85486101508,-0.170589606464,0.612994321586,-7.87276499986,-8.79793308139,-2.78509446978,0.942439908986,1.39931613266,-1.95726648182,1.68011825532,-1.75475023031,-4.74921035767,3.71489373327,0.868516312915,4.43326895118,-0.263135685322,3.9764669311,0.911694865376,0.85224120736,-2.35560669035,-1.62565724194,1.2212044698,-4.61154775619,4.34895780444,-1.68536224604,4.06422766924,-0.0101673817625,-0.609392880799,1.22532760024,-1.5149737785,-0.805999085308,7.55067921162,-2.93719872087,3.43533396363,-2.10260034561,-0.721583162695,4.52110221148,2.69720968336,1.40812491387,-1.62846618414,-0.822517428993,2.23470644593,0.491862057373,0.920802225173,0.962496383188,1.928562572,-0.802637988328,-2.72160144806,1.0092707017,-4.93745543241,6.46554609537,-2.43392473698,-2.37087579571,-2.17133839786,-1.93240495443,-0.362681306601,-2.54449704886,-0.17978616923,-8.05280478001,1.39086142182,2.67881788671,-6.08614060402,3.92572582901,5.49754135013,-3.72346940279,-0.242022804468,4.81397798061,4.11047571898,-1.36651873588,-5.34488024235,4.95870956659,3.41118116498,0.89432107985,3.33253220856,2.74165137768,5.04070746183,-0.415948872567,1.31926612794,6.72856174469,7.17419068098,1.49098495662,-4.98007160067,-9.318038764,2.46224850535,5.276408712
87,-6.26628448487,0.635381773711,3.60578859449,6.173201437,2.24732711256,-4.89329962254,-5.55538270712,-1.49875565291,2.64946635843,-2.09067063332,6.20336785316,-6.25677093268,-2.50105109721,-0.0861860245474,3.59812706232,1.57726798058,3.84794261813,-5.72557672262,2.46239029348,-2.29553559303,-6.28103302002,6.47278197646,6.46319063902,1.48405849189,5.35767221928,-4.23237529636,-3.51878979206,-0.00904786854982,1.29577608407,8.77539933744,-5.03432886004,-2.11539484441,-6.16999167681,1.0546652633,-1.90332779229,-2.35973056435,-2.26917619407,-1.82008438647,4.08268388271,6.31470301866,-3.08372749806,-1.22069035709,-4.38186541558,3.19182102323,-1.42976873428,-0.223793095648,-5.89660835981,-1.25134502113,3.99110957295,-7.45729860783,-2.86559789747,1.66721295506,-3.13464591861,0.162813140824,-3.38049943731,2.39996716856,-2.15944387913,1.63885930896,3.04169135332,3.98578349114,0.511457957626,0.823394746482,-3.67019996286,-2.25544205963,1.80545994013,3.28000457585,-6.05162557602,-3.00187867403,6.49878694773,-0.326051785648,0.684602611069,3.36035886407,-1.228521097,2.57487190307,-0.46660696879,2.10812581897,-0.305482617393,2.75176966548,2.83328473449,-1.89653189778,2.65913075805,-0.83185869336,2.94031493856,-1.53106848534,3.9481344676,2.79967945367,0.710376281441,-1.93211027801,-2.24844452739,1.20713421225,5.22792970717,1.27727831364,-5.73701616764,2.55549032926,0.93986610532,-3.48593280315,-0.51567519635,-1.94204506159,0.172434514092,3.41956290126,0.900014420896,-3.65240677357,0.294835821394,-4.22226468399,-3.63110159874,-4.85140349388,-2.80221052408,-4.28761808038,-4.3011406827,-4.58334078341,-5.13591312647,0.760158468181,-5.32113479346,2.1639226532,-7.19870259762,-3.37775546551,-0.481121961772,-1.74219072804,3.14396611452,1.24187298924,-6.32387711763,2.16209208607,3.14260455966,-0.531431690456,7.58907546639,-2.70918695331 
+-1.8262197373,3.46346980632,-4.49737847328,2.16065784335,-1.95281272531,4.15987870455,2.97505878091,-2.04312422812,0.517240521534,-4.57863372207,0.651493945123,-1.38716154456,2.76521640778,2.06453320265,-1.35841371835,-4.05420722187,4.5255736053,-2.54840182424,-5.94124334216,-5.05016509056,-2.81190917194,-3.67080595732,-1.77554705888,3.98575968385,-2.7226165092,1.5568113219,-2.26458370864,-5.039455657,6.05570743084,-2.2971960926,-0.980765613618,2.29306887269,5.47656385183,-0.560339287222,0.599394002372,4.49311896622,1.63815704465,1.56890586585,-2.10052304626,0.367122573851,-5.79060420752,-3.93543901801,-3.83389718414,-3.62215631246,-3.43940053463,-0.401958022528,0.53790163979,-2.24713481337,-2.93054077685,-0.115260034797,-2.36293837487,-1.84129033774,1.99996710196,-1.21648682684,2.51857071042,3.64827320695,4.16069695712,3.80975566268,-4.74414714813,-3.02875908077,2.80003528588,-2.53125300467,-1.71329928577,0.627459498644,4.61502670049,-2.65913504899,-0.521180084049,-1.92113643766,2.06333797991,1.81511531889,0.170950590374,0.216834144594,-1.64254453719,1.68837884694,-0.898700688779,-3.32811955512,-3.17814456463,2.586751405,2.31587963283,-4.22904350161,-0.718470119983,-4.84180047393,-0.968689443319,1.00650176987,-0.650119360983,-4.69666749001,-1.9845663628,0.225895456072,3.25188346505,3.7214648661,1.43130174562,3.38060764193,3.90915834465,3.69100523353,-0.0311959603429,1.36241446137,4.44646521807,8.91873133182,11.7640150213,1.48888324678,-0.522589141129,0.966836237908,-0.0783090251672,2.53949897528,-6.29179323673,-0.514931401759,-2.65529345423,0.0529807021281,1.83676325411,-1.09529018879,2.14935440898,2.54911320939,-0.268381892444,4.27633724332,1.96361587535,-1.7532244429,-2.28158183038,3.45261253476,0.581260310115,8.81487321379,-3.86599963188,1.0199303931,0.769287056774,-4.58845028997,5.65657658659,5.3673800838,-1.14614109278,1.36291043639,11.190714488,2.05933052063,1.38084109723,4.25990110517,-0.0372709861028,-0.537679631114,2.06346488953,-0.573661087751,-0.9486668
8013,2.67103304624,-1.21616028428,1.53528989427,-0.828803119361,-4.25112649202,-1.87550593019,-3.89059672713,1.24767834112,4.34198590755,-3.09971417338,-4.30684231401,-1.61756324589,-5.39918883562,7.34323934078,-0.421631439929,0.682198462336,-3.61873383403,4.0223959434,5.40389529228,1.60736526251,-0.229281746149,-4.63487404227,-3.29047121048,3.27148656696,-3.16514086127,-2.58948973179,-1.45728034504,-2.60679310679,3.4551587224,-4.33734436273,0.408040666226,-2.58206250429,-0.901034320293,-3.78914433122,1.31350503355,4.04259036302,-0.537077046634,0.599152674081,-1.33412403017,1.80021581352,0.709162880183,-0.588873988985,-5.10033129215,-2.90737100959,1.37433995247,1.32315881923,4.48205828667,0.607754543123,0.468945158216,-1.78444975466,6.53551485418,-5.82657202005,-3.07283199429,-0.23395036608,-0.685120877029,-6.73997298956,-5.55178875566,-2.75079527497,4.04717801094,1.16547028959,-0.65315918714,1.06632480771,-3.69622943163,-3.77567577362,4.33929287315,0.461161797867,3.2264848721,-3.09787745297,4.30806201577,2.65380481884,-0.426790667771,-1.38944570095,-0.951971263289,2.73767599046,-4.78429574967,3.55026640534,-1.92020277738,4.08365875006,-1.34608724355,0.52991460502,-0.16311301112,-0.636902204679,-1.35728861392,7.01656522274,-1.04194504768,3.51204951167,-2.16685251236,-0.352366254777,4.10601737738,0.943123545944,0.196937171517,-1.13858823061,-0.989929439423,-0.42760035634,1.10082056798,2.88289318532,0.586945049761,0.191918394566,0.0784438192848,-3.76375469208,0.170746930539,-4.56917799234,7.20640591383,-2.03627742291,-1.162803858,-1.52358367979,-4.10165442705,-1.70038300634,-2.58114453554,0.544036584797,-8.4628292799,-0.39101603627,1.92033407032,-5.73090517998,2.77129089892,4.59895916582,-3.6993329978,-1.21856652394,4.50981304645,5.16903883219,-1.81281971931,-5.11896033764,5.63131075621,4.03798224807,0.94242796123,0.983446211217,3.34165998936,6.74135187387,2.30066786289,1.84391614258,5.99896759033,8.44891975641,1.82473051131,-3.49935032487,-9.36754787922,1.99033073068
,5.37617359877,-6.11145027161,0.840345951319,2.77300786376,6.52381299019,3.97701953709,-4.00499682903,-5.95263335466,-1.81812204599,4.49724048495,-1.33929533959,7.18550524711,-6.75075093508,-1.61648165077,-0.901867833736,3.08160940886,0.417978906632,3.44625248551,-5.78684989214,3.32444414914,-1.96475922584,-6.30903808594,4.28494357228,6.6853062272,2.70926908404,5.95959033489,-3.74483410478,-2.90718505382,1.44551556975,1.18541310728,4.11113968076,-3.76723547578,-1.37445659459,-5.52958389044,1.54767837942,-1.92233352125,-2.86162799716,-1.39507161677,-2.14555080593,4.45648285389,5.85169536352,-4.88964028836,-1.92099967062,-4.92469491005,2.25878873229,-2.46324559539,0.196596455872,-6.0487574029,-1.55177951395,2.21588104457,-6.40127635479,-1.69814975291,1.22380428523,-3.3257760489,0.0335933651775,-1.73429207817,1.6753979218,-3.37144515515,1.30529874444,3.32560100317,4.3093956852,1.69156472504,0.913729201853,-4.65146396518,-3.03416330755,3.03830222517,2.53774605692,-5.90589033127,-3.08939878225,6.98781540632,-0.339520339072,0.055654079916,5.17970392108,-2.50772905916,1.84376598061,-1.84338212296,-0.387204087972,-0.467715920136,2.99068574548,2.88048758626,-1.45433287024,-0.196598118542,0.482496773005,1.41811820263,-3.74789556999,5.93477154017,5.5825525865,3.06712063462,-2.68137378276,-3.70440641046,-0.800623167756,3.29453107536,0.885642924309,-4.48956893444,2.6876345849,-0.693841806652,-4.02581026554,-0.468509003222,-2.75538569212,0.6321949444,5.32430804849,1.74790291831,-3.13624450207,5.02021304012,-2.23322165832,-4.04383488655,-4.85901101112,-0.93985485792,-2.78530479312,-2.02520967245,-1.95793826431,-5.30569067717,0.360007514108,-2.79794396788,1.37599081039,-5.00351879596,-4.95408667088,-0.162458266318,-1.04320560053,3.89612897635,2.07402950018,-7.80881192922,2.38426132559,2.64415159821,-4.4487659061,5.65304963946,-3.48982639909 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/action_test_anno.json 
b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/action_test_anno.json new file mode 100644 index 0000000000000000000000000000000000000000..e19fc76c4c796be35d0536d5b35b4f55eb45fa9b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/action_test_anno.json @@ -0,0 +1,34 @@ + { + "v_test1": { + "duration_second": 1, + "duration_frame": 30, + "annotations": [ + { + "segment": [ + 0.3, + 0.6 + ], + "label": "Rock climbing" + } + ], + "feature_frame": 30, + "fps": 30.0, + "rfps": 30 + }, + "v_test2": { + "duration_second": 2, + "duration_frame": 48, + "annotations": [ + { + "segment": [ + 1.0, + 2.0 + ], + "label": "Drinking beer" + } + ], + "feature_frame": 48, + "fps": 24.0, + "rfps": 24.0 + } + } diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/audio_feature_test_list.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/audio_feature_test_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..98a4ea9cccc28b492789f78b6ffb0a682ff230e3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/audio_feature_test_list.txt @@ -0,0 +1,2 @@ +test 100 127 +test 100 127 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/audio_test_list.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/audio_test_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..57935b0877970cf6bf4640e609f6a1d1f0067d49 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/audio_test_list.txt @@ -0,0 +1,2 @@ +test.wav 100 127 +test.wav 100 127 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_frame_test_anno.json b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_frame_test_anno.json new file mode 100644 index 0000000000000000000000000000000000000000..2a54307358a101c73bec0a6afa8ca013a1cdd909 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_frame_test_anno.json @@ -0,0 +1,24 @@ +[ 
+ { + "frame_dir":"imgs", + "total_frames":5, + "label":{ + "concept":[250, 131, 42, 51, 57, 155, 122], + "object":[1570, 508], + "event":[16], + "action":[180], + "scene":[206] + } + }, + { + "frame_dir":"imgs", + "total_frames":5, + "label":{ + "concept":[250, 131, 42, 51, 57, 155, 122], + "object":[1570, 508], + "event":[16], + "action":[180], + "scene":[206] + } + } +] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_video_eval_test_anno.json b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_video_eval_test_anno.json new file mode 100644 index 0000000000000000000000000000000000000000..f0f98ddc610f959e8b995d7d2ee1932ac58afe20 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_video_eval_test_anno.json @@ -0,0 +1,18 @@ +[ + { + "filename":"test.mp4", + "label":{ + "action": [2], + "scene": [2], + "object": [1] + } + }, + { + "filename":"test.avi", + "label":{ + "action": [1], + "scene": [1], + "object": [2] + } + } +] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_video_test_anno.json b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_video_test_anno.json new file mode 100644 index 0000000000000000000000000000000000000000..ae20d24c846ccd5c8b67a974d1f42b3ada89e318 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/hvu_video_test_anno.json @@ -0,0 +1,22 @@ +[ + { + "filename":"tmp.mp4", + "label":{ + "concept":[250, 131, 42, 51, 57, 155, 122], + "object":[1570, 508], + "event":[16], + "action":[180], + "scene":[206] + } + }, + { + "filename":"tmp.mp4", + "label":{ + "concept":[250, 131, 42, 51, 57, 155, 122], + "object":[1570, 508], + "event":[16], + "action":[180], + "scene":[206] + } + } +] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/proposal_normalized_list.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/proposal_normalized_list.txt new file mode 100644 index 
0000000000000000000000000000000000000000..e9a5a3f5c87d5a640378ff66fb0d2662cad1ab00 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/proposal_normalized_list.txt @@ -0,0 +1,18 @@ +# 0 +imgs +5 +1 +2 +3 0.2000 0.4000 +3 0.6000 1.0000 +10 +3 1.0000 1.0000 0.2000 0.4000 +3 0.5000 0.5000 0.2000 0.6000 +3 0.3333 0.3333 0.2000 0.8000 +3 0.5000 0.5000 0.2000 1.0000 +3 0.0000 0.0000 0.4000 0.6000 +3 0.3333 0.5000 0.4000 0.8000 +3 0.6666 0.6666 0.4000 1.0000 +3 0.5000 1.0000 0.6000 0.8000 +3 1.0000 1.0000 0.6000 1.0000 +3 0.5000 1.0000 0.8000 1.0000 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/proposal_test_list.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/proposal_test_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..840a8d680503f3b8f0cf7b2ef2045102bc5c1a70 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/proposal_test_list.txt @@ -0,0 +1,18 @@ +# 0 +imgs +5 +1 +2 +3 1 2 +3 3 5 +10 +3 1.0000 1.0000 1 2 +3 0.5000 0.5000 1 3 +3 0.3333 0.3333 1 4 +3 0.5000 0.5000 1 5 +3 0.0000 0.0000 2 3 +3 0.3333 0.5000 2 4 +3 0.6666 0.6666 2 5 +3 0.5000 1.0000 3 4 +3 1.0000 1.0000 3 5 +3 0.5000 1.0000 4 5 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..e0c7c713f6a1c1f2c1355ae8796665e69c01af82 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list.txt @@ -0,0 +1,2 @@ +imgs 5 127 +imgs 5 127 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list_multi_label.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list_multi_label.txt new file mode 100644 index 0000000000000000000000000000000000000000..bfdee423443be88e3aa41e6c829530c854b8c6aa --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list_multi_label.txt @@ -0,0 +1,2 @@ +imgs 5 1 +imgs 5 3 5 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list_with_offset.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list_with_offset.txt new file mode 100644 index 0000000000000000000000000000000000000000..a3a81015a96012e7277dec399d08781a4121012a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawframe_test_list_with_offset.txt @@ -0,0 +1,2 @@ +imgs 2 5 127 +imgs 2 5 127 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawvideo_test_anno.json b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawvideo_test_anno.json new file mode 100644 index 0000000000000000000000000000000000000000..8ce4ffcb6bd6250ccee48c3a068d11b2741a6e5f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawvideo_test_anno.json @@ -0,0 +1,8 @@ +[ + { + "video_dir":"rawvideo_dataset", + "label":1, + "num_clips":2, + "positive_clip_inds":[0] + } +] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawvideo_test_anno.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawvideo_test_anno.txt new file mode 100644 index 0000000000000000000000000000000000000000..d487afb61bc27a16d537ce440ddc051b801a1268 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/rawvideo_test_anno.txt @@ -0,0 +1 @@ +rawvideo_dataset 1 2 0 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/sample.pkl b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/sample.pkl new file mode 100644 index 0000000000000000000000000000000000000000..ee61c7125247ab7d622d9ef6528ce01e88af99c5 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/sample.pkl differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/video_test_list.txt 
b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/video_test_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..2f415bddb5d9a5114ddd2ab6031369b5b03c8e48 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/video_test_list.txt @@ -0,0 +1,2 @@ +test.mp4 0 +test.mp4 0 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/video_test_list_multi_label.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/video_test_list_multi_label.txt new file mode 100644 index 0000000000000000000000000000000000000000..0d59b257e04c9800228a76cbeae88de7e58ef42b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/annotations/video_test_list_multi_label.txt @@ -0,0 +1,2 @@ +test.mp4 0 3 +test.mp4 0 2 4 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/action_list.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/action_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..48f8ec821c61e3275cc8e40f3dce735d84e61047 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/action_list.txt @@ -0,0 +1,16 @@ +item { + name: "action1" + id: 12 +} +item { + name: "action2" + id: 17 +} +item { + name: "action3" + id: 79 +} +item { + name: "action3" + id: 80 +} diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_excluded_timestamps_sample.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_excluded_timestamps_sample.csv new file mode 100644 index 0000000000000000000000000000000000000000..c3467e84e498acfc42e9cffae11786ea489dfc0c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_excluded_timestamps_sample.csv @@ -0,0 +1,2 @@ +0f39OWEqJ24,0903 +_-Z6wFjXtGQ,0902 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_proposals_sample.pkl b/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_proposals_sample.pkl new file mode 100644 index 
0000000000000000000000000000000000000000..ada67b5319a0bef13bb95a53c4badc0b041251fa Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_proposals_sample.pkl differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_sample.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_sample.csv new file mode 100644 index 0000000000000000000000000000000000000000..e1a6982e2c704da219799a74694584312b4a9a7e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/ava_dataset/ava_sample.csv @@ -0,0 +1,8 @@ +0f39OWEqJ24,0902,0.031,0.162,0.670,0.995,12,0 +0f39OWEqJ24,0902,0.031,0.162,0.670,0.995,17,0 +0f39OWEqJ24,0902,0.031,0.162,0.670,0.995,79,0 +0f39OWEqJ24,0903,0.034,0.189,0.669,0.980,12,0 +0f39OWEqJ24,0903,0.034,0.189,0.669,0.980,17,0 +_-Z6wFjXtGQ,0902,0.063,0.049,0.524,0.996,12,0 +_-Z6wFjXtGQ,0902,0.063,0.049,0.524,0.996,74,0 +_-Z6wFjXtGQ,0902,0.063,0.049,0.524,0.996,80,0 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/bsp_features/v_test1.npy b/openmmlab_test/mmaction2-0.24.1/tests/data/bsp_features/v_test1.npy new file mode 100644 index 0000000000000000000000000000000000000000..f291a057fe9ba0555d22fce81873c8c5accbd2fa Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/bsp_features/v_test1.npy differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/action_list.txt b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/action_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..758f3daa50ea6b7bb44e54ed7baeb8b86708ab81 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/action_list.txt @@ -0,0 +1,12 @@ +item { + name: "action1" + id: 1 +} +item { + name: "action2" + id: 2 +} +item { + name: "action3" + id: 3 +} diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/gt.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/gt.csv new file mode 100644 index 
0000000000000000000000000000000000000000..13913a869406aae19eb37f87b2cb74839e8a5cce --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/gt.csv @@ -0,0 +1,12 @@ +3reY9zJKhqN,1774,0.278,0.203,0.964,0.677,3,0 +3reY9zJKhqN,1774,0.050,0.230,0.522,0.952,1,1 +3reY9zJKhqN,1774,0.154,0.039,0.757,0.743,1,2 +3reY9zJKhqN,1774,0.428,0.482,0.659,0.607,2,3 +HmR8SmNIoxu,1384,0.278,0.296,0.729,0.957,3,0 +HmR8SmNIoxu,1384,0.254,0.371,0.677,0.859,3,1 +HmR8SmNIoxu,1384,0.061,0.318,0.584,0.710,1,2 +HmR8SmNIoxu,1384,0.484,0.483,0.895,0.837,3,3 +5HNXoce1raG,1097,0.195,0.031,1.000,0.664,2,0 +5HNXoce1raG,1097,0.047,0.218,0.512,0.504,1,1 +5HNXoce1raG,1097,0.362,0.465,0.932,0.696,2,2 +5HNXoce1raG,1097,0.446,0.156,0.856,0.951,3,3 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/pred.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/pred.csv new file mode 100644 index 0000000000000000000000000000000000000000..6e8ff802639ee64f743a2a67f0f44ecdf9764b55 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/pred.csv @@ -0,0 +1,30 @@ +3reY9zJKhqN,1774,0.072,0.470,0.840,0.898,2,0.655 +3reY9zJKhqN,1774,0.230,0.215,0.781,0.534,1,0.949 +3reY9zJKhqN,1774,0.195,0.128,0.643,0.944,1,0.640 +3reY9zJKhqN,1774,0.236,0.189,0.689,0.740,3,0.681 +3reY9zJKhqN,1774,0.375,0.371,0.726,0.804,3,0.425 +3reY9zJKhqN,1774,0.024,0.398,0.776,0.719,1,0.160 +3reY9zJKhqN,1774,0.477,0.135,0.959,0.967,2,0.753 +3reY9zJKhqN,1774,0.435,0.071,0.966,0.578,1,0.088 +3reY9zJKhqN,1774,0.089,0.494,0.583,0.669,1,0.084 +3reY9zJKhqN,1774,0.136,0.129,0.507,0.532,1,0.041 +HmR8SmNIoxu,1384,0.152,0.299,0.599,0.577,1,0.060 +HmR8SmNIoxu,1384,0.360,0.170,0.731,0.987,3,0.138 +HmR8SmNIoxu,1384,0.348,0.193,0.533,0.727,2,0.429 +HmR8SmNIoxu,1384,0.242,0.396,0.875,0.907,2,0.470 +HmR8SmNIoxu,1384,0.496,0.023,0.730,0.673,3,0.473 +HmR8SmNIoxu,1384,0.038,0.025,0.843,0.570,1,0.606 +HmR8SmNIoxu,1384,0.156,0.193,0.836,0.836,2,0.388 +HmR8SmNIoxu,1384,0.433,0.072,0.962,0.755,3,0.787 
+HmR8SmNIoxu,1384,0.430,0.026,0.948,0.524,2,0.518 +HmR8SmNIoxu,1384,0.273,0.210,0.907,0.712,3,0.396 +5HNXoce1raG,1097,0.331,0.328,0.783,0.825,3,0.157 +5HNXoce1raG,1097,0.140,0.195,0.558,0.983,3,0.989 +5HNXoce1raG,1097,0.130,0.207,0.761,0.523,2,0.976 +5HNXoce1raG,1097,0.145,0.444,0.611,0.571,1,0.560 +5HNXoce1raG,1097,0.448,0.116,0.513,0.657,1,0.131 +5HNXoce1raG,1097,0.468,0.361,0.511,0.512,2,0.608 +5HNXoce1raG,1097,0.321,0.093,0.749,0.841,1,0.298 +5HNXoce1raG,1097,0.018,0.137,0.650,0.832,3,0.390 +5HNXoce1raG,1097,0.002,0.417,0.851,0.573,1,0.083 +5HNXoce1raG,1097,0.130,0.389,0.872,0.611,2,0.912 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/proposal.pkl b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/proposal.pkl new file mode 100644 index 0000000000000000000000000000000000000000..f28ad8530258fa49ad389c29f8f4c78e9ea3e5f3 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_detection/proposal.pkl differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/eval_localization/gt.json b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_localization/gt.json new file mode 100644 index 0000000000000000000000000000000000000000..4ee7b1285be8e2822f53ed1bf40eabb6d4df2956 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_localization/gt.json @@ -0,0 +1,46 @@ +{ + "v_bYUmtLBL7W4": { + "duration": 224.49, + "subset": "validation", + "resolution": "1920x1080", + "url": "https://www.youtube.com/watch?v=bYUmtLBL7W4", + "annotations": [ + { + "segment": [ + 11.553655226209049, + 57.06805460218409 + ], + "label": "Wakeboarding" + }, + { + "segment": [ + 68.62170982839314, + 126.03987519500778 + ], + "label": "Wakeboarding" + }, + { + "segment": [ + 135.4928658346334, + 201.31368954758187 + ], + "label": "Wakeboarding" + } + ] + }, + "v_hDPLy21Yyuk": { + "duration": 76.23, + "subset": "validation", + "resolution": "1280x720", + "url": "https://www.youtube.com/watch?v=hDPLy21Yyuk", + "annotations": [ + { + 
"segment": [ + 21.392480499219968, + 76.161 + ], + "label": "Cleaning shoes" + } + ] + } +} diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/eval_localization/result.json b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_localization/result.json new file mode 100644 index 0000000000000000000000000000000000000000..98a6075cad58fafeea14a95bcfaa9ea4303b3004 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/data/eval_localization/result.json @@ -0,0 +1,120 @@ +{ + "results": { + "bYUmtLBL7W4": [ + { + "label": "Wakeboarding", + "score": 0.6533445119857788, + "segment": [ + 0.0, + 206.3465619982159 + ] + }, + { + "label": "Wakeboarding", + "score": 0.5620265007019043, + "segment": [ + 33.64346119536128, + 206.3465619982159 + ] + }, + { + "label": "Wakeboarding", + "score": 0.4421495497226715, + "segment": [ + 148.03122925958965, + 204.1036645851918 + ] + }, + { + "label": "Wakeboarding", + "score": 0.31284379959106445, + "segment": [ + 0.0, + 123.35935771632472 + ] + }, + { + "label": "Wakeboarding", + "score": 0.2897574603557587, + "segment": [ + 67.28692239072257, + 206.3465619982159 + ] + }, + { + "label": "Wakeboarding", + "score": 0.284942090511322, + "segment": [ + 33.64346119536128, + 125.60225512934882 + ] + }, + { + "label": "Wakeboarding", + "score": 0.12905514240264893, + "segment": [ + 0.0, + 53.829537912578054 + ] + }, + { + "label": "Wakeboarding", + "score": 0.12616874277591705, + "segment": [ + 67.28692239072257, + 123.35935771632472 + ] + }, + { + "label": "Wakeboarding", + "score": 0.12591737508773804, + "segment": [ + 100.93038358608386, + 204.1036645851918 + ] + }, + { + "label": "Wakeboarding", + "score": 0.10444077104330064, + "segment": [ + 38.12925602140946, + 53.829537912578054 + ] + } + ], + "hDPLy21Yyuk": [ + { + "label": "Cleaning shoes", + "score": 0.5667440891265869, + "segment": [ + 21.222965776805253, + 75.03834328227572 + ] + }, + { + "label": "Cleaning shoes", + "score": 0.414698988199234, + "segment": [ + 
21.222965776805253, + 43.96185768052516 + ] + }, + { + "label": "Cleaning shoes", + "score": 0.21768000721931455, + "segment": [ + 0.0, + 75.03834328227572 + ] + }, + { + "label": "Cleaning shoes", + "score": 0.10800375044345856, + "segment": [ + 29.560559474835888, + 70.49056490153174 + ] + } + ] + } +} diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00001.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00001.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e846e5af2e8cad0c4d99f440b0d1d7709f82fd26 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00001.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00002.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00002.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6d7c81b31702ec1e861eb94fbeed4509d4a23d75 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00002.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00003.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00003.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6289b32ecf59281f846de097bad6d577b9fb59a4 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00003.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00004.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00004.jpg new file mode 100644 index 0000000000000000000000000000000000000000..a75094d0d5b64889ed0c36c5ecdb98428ecd3b94 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00004.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00005.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00005.jpg new file mode 100644 index 0000000000000000000000000000000000000000..25828b83669ad72e1e76fb249282899798167eaf Binary files /dev/null and 
b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00005.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00006.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00006.jpg new file mode 100644 index 0000000000000000000000000000000000000000..7f0fa6ca5ce2bd44b3c2108c9024cafcd9e12fc7 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00006.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00007.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00007.jpg new file mode 100644 index 0000000000000000000000000000000000000000..2ebc51fe1b110d26e1201299eaad86c4ee5c0460 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00007.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00008.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00008.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f9747042fbb3c2409c013b288076bdc9d2a0d3aa Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00008.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00009.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00009.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b4a74ebb0debc4fcbd9c96e75fb679383a05fbbb Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00009.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00010.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00010.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9944e620895f613649e97c7cd74a4c3f6d1ab746 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/img_00010.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00001.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00001.jpg new file mode 100644 index 
0000000000000000000000000000000000000000..705ba4b6aee3fd579f0c6ff3edf709b27bacdb8b Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00001.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00002.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00002.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f5016755fb98a2f7f81913b85d1c8e728f6098bb Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00002.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00003.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00003.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f419d712d874e67305799c39818c8375e3c66d15 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00003.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00004.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00004.jpg new file mode 100644 index 0000000000000000000000000000000000000000..cb52d25933899bc3bee8a29e36fa22554f0b2e31 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00004.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00005.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00005.jpg new file mode 100644 index 0000000000000000000000000000000000000000..399fda2544f4c819ec86bcd460d90fd90a27c1c2 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/x_00005.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00001.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00001.jpg new file mode 100644 index 0000000000000000000000000000000000000000..743b0b2a6d1c16093a5a20b1d0583d791145d4b9 Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00001.jpg differ diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00002.jpg 
b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00002.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..37f84d07eec4abc29311f743636c9e33d67bbe15
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00002.jpg differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00003.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00003.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..938a5b6cdc6280c477f88f1815177888200aead5
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00003.jpg differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00004.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00004.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..af4c666c4c411c97ab8d48034ddaaa0b8c05855a
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00004.jpg differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00005.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00005.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..41e05d707236b7cf9cb3c48d01ac3b8d2cbfe3bf
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/imgs/y_00005.jpg differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/lfb/lfb_unittest.pkl b/openmmlab_test/mmaction2-0.24.1/tests/data/lfb/lfb_unittest.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..26de4b75b5b1806e11fd6c749c027f8afa2d15a1
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/lfb/lfb_unittest.pkl differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/proposals/v_test1.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/proposals/v_test1.csv
new file mode 100644
index 0000000000000000000000000000000000000000..5c65dc30ba666dd3f6c3cd7bfd2e9d7ed265faa9
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/data/proposals/v_test1.csv
@@ -0,0 +1,10 @@
+tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa
+0.1,0.2,0.95,0.96,0.97,0.85,0.84
+0.2,0.3,0.94,0.95,0.96,0.84,0.83
+0.3,0.4,0.93,0.94,0.95,0.83,0.82
+0.4,0.5,0.92,0.93,0.94,0.82,0.81
+0.5,0.6,0.91,0.92,0.93,0.81,0.80
+0.6,0.7,0.90,0.91,0.92,0.80,0.79
+0.5,0.7,0.90,0.91,0.92,0.80,0.79
+0.6,0.8,0.90,0.91,0.92,0.80,0.79
+0.4,0.7,0.90,0.91,0.92,0.80,0.79
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/proposals/v_test2.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/proposals/v_test2.csv
new file mode 100644
index 0000000000000000000000000000000000000000..5fa47cff2a053401e9f2446959e16aa2aeffcb83
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/data/proposals/v_test2.csv
@@ -0,0 +1,7 @@
+tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa
+0.1,0.2,0.95,0.96,0.97,0.75,0.74
+0.2,0.3,0.94,0.95,0.96,0.74,0.73
+0.3,0.4,0.93,0.94,0.95,0.73,0.72
+0.4,0.5,0.92,0.93,0.94,0.72,0.71
+0.5,0.6,0.91,0.92,0.93,0.71,0.70
+0.6,0.7,0.90,0.91,0.92,0.70,0.79
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/rawvideo_dataset/part_0.mp4 b/openmmlab_test/mmaction2-0.24.1/tests/data/rawvideo_dataset/part_0.mp4
new file mode 100644
index 0000000000000000000000000000000000000000..285e1b423c6f9aa0932ee63d2fce04c956e23165
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/rawvideo_dataset/part_0.mp4 differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/rawvideo_dataset/part_1.mp4 b/openmmlab_test/mmaction2-0.24.1/tests/data/rawvideo_dataset/part_1.mp4
new file mode 100644
index 0000000000000000000000000000000000000000..285e1b423c6f9aa0932ee63d2fce04c956e23165
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/rawvideo_dataset/part_1.mp4 differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/tem_results/v_test1.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/tem_results/v_test1.csv
new file mode 100644
index 0000000000000000000000000000000000000000..2a20d5a58b9de607dd60a7c88a9d7fb4d265dbe7
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/data/tem_results/v_test1.csv
@@ -0,0 +1,11 @@
+action,start,end,tmin,tmax
+3.711169585585594177e-02,5.839086771011352539e-01,1.464508026838302612e-01,0.0,0.1
+1.555041410028934479e-02,3.062666654586791992e-01,2.622193098068237305e-01,0.1,0.2
+1.146762818098068237e-02,1.464279890060424805e-01,3.260520696640014648e-01,0.2,0.3
+1.371797081083059311e-02,1.365097165107727051e-01,3.570831716060638428e-01,0.3,0.4
+1.519643329083919525e-02,1.688144057989120483e-01,3.057994544506072998e-01,0.4,0.5
+1.968025043606758118e-02,1.974480003118515015e-01,2.933082580566406250e-01,0.5,0.6
+2.251588553190231323e-02,1.885317713022232056e-01,3.326449990272521973e-01,0.6,0.7
+2.402217499911785126e-02,1.918197423219680786e-01,3.420312106609344482e-01,0.7,0.8
+2.045033127069473267e-02,1.970291137695312500e-01,3.339000344276428223e-01,0.8,0.9
+3.435279428958892822e-02,5.583426356315612793e-01,1.250019371509552002e-01,0.9,1.0
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/tem_results/v_test2.csv b/openmmlab_test/mmaction2-0.24.1/tests/data/tem_results/v_test2.csv
new file mode 100644
index 0000000000000000000000000000000000000000..89fd5c40aca73ce64fcb294fd630d56c63fafdfe
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/data/tem_results/v_test2.csv
@@ -0,0 +1,11 @@
+action,start,end,tmin,tmax
+5.711169585585594177e-02,7.839086771011352539e-01,3.464508026838302612e-01,0.0,0.1
+2.555041410028934479e-02,3.062666654586791992e-01,3.622193098068237305e-01,0.1,0.2
+2.146762818098068237e-02,2.464279890060424805e-01,3.260520696640014648e-01,0.2,0.3
+1.371797081083059311e-02,1.365097165107727051e-01,3.570831716060638428e-01,0.3,0.4
+1.519643329083919525e-02,1.688144057989120483e-01,3.057994544506072998e-01,0.4,0.5
+1.968025043606758118e-02,1.974480003118515015e-01,2.933082580566406250e-01,0.5,0.6
+2.251588553190231323e-02,1.885317713022232056e-01,3.326449990272521973e-01,0.6,0.7
+2.402217499911785126e-02,1.918197423219680786e-01,3.420312106609344482e-01,0.7,0.8
+2.045033127069473267e-02,1.970291137695312500e-01,3.339000344276428223e-01,0.8,0.9
+3.435279428958892822e-02,5.583426356315612793e-01,1.250019371509552002e-01,0.9,1.0
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/test.avi b/openmmlab_test/mmaction2-0.24.1/tests/data/test.avi
new file mode 100644
index 0000000000000000000000000000000000000000..4026d9088d0c8bab0eff9f700ae7a376bfad14a7
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/test.avi differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/test.jpg b/openmmlab_test/mmaction2-0.24.1/tests/data/test.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d88aea0ac50bce6efdde58c2248bbd25d1ae9122
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/test.jpg differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/test.mp4 b/openmmlab_test/mmaction2-0.24.1/tests/data/test.mp4
new file mode 100644
index 0000000000000000000000000000000000000000..ef46c799088f7306950ed6804316e6f688d79888
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/test.mp4 differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/data/test.wav b/openmmlab_test/mmaction2-0.24.1/tests/data/test.wav
new file mode 100644
index 0000000000000000000000000000000000000000..c66741b24bd3d02a37adc261a5f222a02b9f73c5
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tests/data/test.wav differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_blending.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_blending.py
new file mode 100644
index 0000000000000000000000000000000000000000..cff88e161eea5b66bbd825cde01989b225b1ed79
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_blending.py
@@ -0,0 +1,42 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import torch
+
+from mmaction.datasets import CutmixBlending, MixupBlending
+
+
+def test_mixup():
+    alpha = 0.2
+    num_classes = 10
+    label = torch.randint(0, num_classes, (4, ))
+    mixup = MixupBlending(num_classes, alpha)
+
+    # NCHW imgs
+    imgs = torch.randn(4, 4, 3, 32, 32)
+    mixed_imgs, mixed_label = mixup(imgs, label)
+    assert mixed_imgs.shape == torch.Size((4, 4, 3, 32, 32))
+    assert mixed_label.shape == torch.Size((4, num_classes))
+
+    # NCTHW imgs
+    imgs = torch.randn(4, 4, 2, 3, 32, 32)
+    mixed_imgs, mixed_label = mixup(imgs, label)
+    assert mixed_imgs.shape == torch.Size((4, 4, 2, 3, 32, 32))
+    assert mixed_label.shape == torch.Size((4, num_classes))
+
+
+def test_cutmix():
+    alpha = 0.2
+    num_classes = 10
+    label = torch.randint(0, num_classes, (4, ))
+    mixup = CutmixBlending(num_classes, alpha)
+
+    # NCHW imgs
+    imgs = torch.randn(4, 4, 3, 32, 32)
+    mixed_imgs, mixed_label = mixup(imgs, label)
+    assert mixed_imgs.shape == torch.Size((4, 4, 3, 32, 32))
+    assert mixed_label.shape == torch.Size((4, num_classes))
+
+    # NCTHW imgs
+    imgs = torch.randn(4, 4, 2, 3, 32, 32)
+    mixed_imgs, mixed_label = mixup(imgs, label)
+    assert mixed_imgs.shape == torch.Size((4, 4, 2, 3, 32, 32))
+    assert mixed_label.shape == torch.Size((4, num_classes))
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_compose.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_compose.py
new file mode 100644
index 0000000000000000000000000000000000000000..5e782b80e141130157e62c208db45277508eb4c2
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_compose.py
@@ -0,0 +1,72 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import numpy as np
+import pytest
+from mmcv.utils import assert_keys_equal, digit_version
+
+from mmaction.datasets.pipelines import Compose, ImageToTensor
+
+try:
+    import torchvision
+    torchvision_ok = False
+    if digit_version(torchvision.__version__) >= digit_version('0.8.0'):
+        torchvision_ok = True
+except (ImportError, ModuleNotFoundError):
+    torchvision_ok = False
+
+
+def test_compose():
+    with pytest.raises(TypeError):
+        # transform must be callable or a dict
+        Compose('LoadImage')
+
+    target_keys = ['img', 'img_metas']
+
+    # test Compose given a data pipeline
+    img = np.random.randn(256, 256, 3)
+    results = dict(img=img, abandoned_key=None, img_name='test_image.png')
+    test_pipeline = [
+        dict(type='Collect', keys=['img'], meta_keys=['img_name']),
+        dict(type='ImageToTensor', keys=['img'])
+    ]
+    compose = Compose(test_pipeline)
+    compose_results = compose(results)
+    assert assert_keys_equal(compose_results.keys(), target_keys)
+    assert assert_keys_equal(compose_results['img_metas'].data.keys(),
+                             ['img_name'])
+
+    # test Compose when forward data is None
+    results = None
+    image_to_tensor = ImageToTensor(keys=[])
+    test_pipeline = [image_to_tensor]
+    compose = Compose(test_pipeline)
+    compose_results = compose(results)
+    assert compose_results is None
+
+    assert repr(compose) == compose.__class__.__name__ + \
+        f'(\n    {image_to_tensor}\n)'
+
+
+@pytest.mark.skipif(
+    not torchvision_ok, reason='torchvision >= 0.8.0 is required')
+def test_compose_support_torchvision():
+    target_keys = ['imgs', 'img_metas']
+
+    # test Compose given a data pipeline
+    imgs = [np.random.randn(256, 256, 3)] * 8
+    results = dict(
+        imgs=imgs,
+        abandoned_key=None,
+        img_name='test_image.png',
+        clip_len=8,
+        num_clips=1)
+    test_pipeline = [
+        dict(type='torchvision.Grayscale', num_output_channels=3),
+        dict(type='FormatShape', input_format='NCTHW'),
+        dict(type='Collect', keys=['imgs'], meta_keys=['img_name']),
+        dict(type='ToTensor', keys=['imgs'])
+    ]
+    compose = Compose(test_pipeline)
+    compose_results = compose(results)
+    assert assert_keys_equal(compose_results.keys(), target_keys)
+    assert assert_keys_equal(compose_results['img_metas'].data.keys(),
+                             ['img_name'])
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..1a8f94080954e71ff8f8aef4d2051a667e0e9f39
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .base import BaseTestDataset
+
+__all__ = ['BaseTestDataset']
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/base.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/base.py
new file mode 100644
index 0000000000000000000000000000000000000000..3b4604c5bbd45958d3dc1e233db1242432e1193e
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/base.py
@@ -0,0 +1,150 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import os.path as osp + +from mmcv import ConfigDict + + +class BaseTestDataset: + + @classmethod + def setup_class(cls): + # prefix path + cls.data_prefix = osp.normpath( + osp.join(osp.dirname(__file__), '../../data')) + cls.ann_file_prefix = osp.join(cls.data_prefix, 'annotations') + + # annotations path + cls.action_ann_file = osp.join(cls.ann_file_prefix, + 'action_test_anno.json') + cls.audio_feature_ann_file = osp.join(cls.ann_file_prefix, + 'audio_feature_test_list.txt') + cls.audio_ann_file = osp.join(cls.ann_file_prefix, + 'audio_test_list.txt') + cls.frame_ann_file_multi_label = osp.join( + cls.ann_file_prefix, 'rawframe_test_list_multi_label.txt') + cls.frame_ann_file_with_offset = osp.join( + cls.ann_file_prefix, 'rawframe_test_list_with_offset.txt') + cls.frame_ann_file = osp.join(cls.ann_file_prefix, + 'rawframe_test_list.txt') + cls.hvu_frame_ann_file = osp.join(cls.ann_file_prefix, + 'hvu_frame_test_anno.json') + cls.hvu_video_ann_file = osp.join(cls.ann_file_prefix, + 'hvu_video_test_anno.json') + cls.hvu_video_eval_ann_file = osp.join( + cls.ann_file_prefix, 'hvu_video_eval_test_anno.json') + cls.proposal_ann_file = osp.join(cls.ann_file_prefix, + 'proposal_test_list.txt') + cls.proposal_norm_ann_file = osp.join(cls.ann_file_prefix, + 'proposal_normalized_list.txt') + cls.rawvideo_test_anno_json = osp.join(cls.ann_file_prefix, + 'rawvideo_test_anno.json') + cls.rawvideo_test_anno_txt = osp.join(cls.ann_file_prefix, + 'rawvideo_test_anno.txt') + cls.video_ann_file = osp.join(cls.ann_file_prefix, + 'video_test_list.txt') + cls.video_ann_file_multi_label = osp.join( + cls.ann_file_prefix, 'video_test_list_multi_label.txt') + cls.pose_ann_file = osp.join(cls.ann_file_prefix, 'sample.pkl') + + # pipeline configuration + cls.action_pipeline = [] + cls.audio_feature_pipeline = [ + dict(type='LoadAudioFeature'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1), + dict(type='AudioFeatureSelector') + ] + cls.audio_pipeline 
= [ + dict(type='AudioDecodeInit'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1), + dict(type='AudioDecode') + ] + cls.frame_pipeline = [ + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1), + dict(type='RawFrameDecode', io_backend='disk') + ] + cls.proposal_pipeline = [ + dict( + type='SampleProposalFrames', + clip_len=1, + body_segments=5, + aug_segments=(2, 2), + aug_ratio=0.5), + dict(type='RawFrameDecode', io_backend='disk') + ] + cls.proposal_test_pipeline = [ + dict( + type='SampleProposalFrames', + clip_len=1, + body_segments=5, + aug_segments=(2, 2), + aug_ratio=0.5, + mode='test'), + dict(type='RawFrameDecode', io_backend='disk') + ] + cls.proposal_train_cfg = ConfigDict( + dict( + ssn=dict( + assigner=dict( + positive_iou_threshold=0.7, + background_iou_threshold=0.01, + incomplete_iou_threshold=0.5, + background_coverage_threshold=0.02, + incomplete_overlap_threshold=0.01), + sampler=dict( + num_per_video=8, + positive_ratio=1, + background_ratio=1, + incomplete_ratio=6, + add_gt_as_proposals=True), + loss_weight=dict( + comp_loss_weight=0.1, reg_loss_weight=0.1), + debug=False))) + cls.proposal_test_cfg = ConfigDict( + dict( + ssn=dict( + sampler=dict(test_interval=6, batch_size=16), + evaluater=dict( + top_k=2000, + nms=0.2, + softmax_before_filter=True, + cls_top_k=2)))) + cls.proposal_test_cfg_topall = ConfigDict( + dict( + ssn=dict( + sampler=dict(test_interval=6, batch_size=16), + evaluater=dict( + top_k=-1, + nms=0.2, + softmax_before_filter=True, + cls_top_k=2)))) + cls.rawvideo_pipeline = [] + cls.video_pipeline = [ + dict(type='OpenCVInit'), + dict( + type='SampleFrames', + clip_len=32, + frame_interval=2, + num_clips=1), + dict(type='OpenCVDecode') + ] + + cls.hvu_categories = [ + 'action', 'attribute', 'concept', 'event', 'object', 'scene' + ] + cls.hvu_category_nums = [739, 117, 291, 69, 1679, 248] + cls.hvu_categories_for_eval = ['action', 'scene', 'object'] + 
cls.hvu_category_nums_for_eval = [3, 3, 3] + + cls.filename_tmpl = 'img_{:05d}.jpg' diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_activitynet_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_activitynet_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..02ae3fdf96342b476a94dfbac8a8390f67bf1da9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_activitynet_dataset.py @@ -0,0 +1,176 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp +import tempfile + +import mmcv +import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_equal + +from mmaction.datasets import ActivityNetDataset +from .base import BaseTestDataset + + +class TestActivitynetDataset(BaseTestDataset): + + def test_activitynet_dataset(self): + activitynet_dataset = ActivityNetDataset(self.action_ann_file, + self.action_pipeline, + self.data_prefix) + activitynet_infos = activitynet_dataset.video_infos + assert activitynet_infos == [ + dict( + video_name='v_test1', + duration_second=1, + duration_frame=30, + annotations=[dict(segment=[0.3, 0.6], label='Rock climbing')], + feature_frame=30, + fps=30.0, + rfps=30), + dict( + video_name='v_test2', + duration_second=2, + duration_frame=48, + annotations=[dict(segment=[1.0, 2.0], label='Drinking beer')], + feature_frame=48, + fps=24.0, + rfps=24.0) + ] + + def test_activitynet_proposals2json(self): + activitynet_dataset = ActivityNetDataset(self.action_ann_file, + self.action_pipeline, + self.data_prefix) + results = [ + dict( + video_name='v_test1', + proposal_list=[dict(segment=[0.1, 0.9], score=0.1)]), + dict( + video_name='v_test2', + proposal_list=[dict(segment=[10.1, 20.9], score=0.9)]) + ] + result_dict = activitynet_dataset.proposals2json(results) + assert result_dict == dict( + test1=[{ + 'segment': [0.1, 0.9], + 'score': 0.1 + }], + 
test2=[{ + 'segment': [10.1, 20.9], + 'score': 0.9 + }]) + result_dict = activitynet_dataset.proposals2json(results, True) + assert result_dict == dict( + test1=[{ + 'segment': [0.1, 0.9], + 'score': 0.1 + }], + test2=[{ + 'segment': [10.1, 20.9], + 'score': 0.9 + }]) + + def test_activitynet_evaluate(self): + activitynet_dataset = ActivityNetDataset(self.action_ann_file, + self.action_pipeline, + self.data_prefix) + + with pytest.raises(TypeError): + # results must be a list + activitynet_dataset.evaluate('0.5') + + with pytest.raises(AssertionError): + # The length of results must be equal to the dataset len + activitynet_dataset.evaluate([0] * 5) + + with pytest.raises(KeyError): + # unsupported metric + activitynet_dataset.evaluate( + [0] * len(activitynet_dataset), metrics='iou') + + # evaluate AR@AN metric + results = [ + dict( + video_name='v_test1', + proposal_list=[dict(segment=[0.1, 0.9], score=0.1)]), + dict( + video_name='v_test2', + proposal_list=[dict(segment=[10.1, 20.9], score=0.9)]) + ] + eval_result = activitynet_dataset.evaluate(results, metrics=['AR@AN']) + assert set(eval_result) == set( + ['auc', 'AR@1', 'AR@5', 'AR@10', 'AR@100']) + + def test_activitynet_dump_results(self): + activitynet_dataset = ActivityNetDataset(self.action_ann_file, + self.action_pipeline, + self.data_prefix) + # test dumping json file + results = [ + dict( + video_name='v_test1', + proposal_list=[dict(segment=[0.1, 0.9], score=0.1)]), + dict( + video_name='v_test2', + proposal_list=[dict(segment=[10.1, 20.9], score=0.9)]) + ] + dump_results = { + 'version': 'VERSION 1.3', + 'results': { + 'test1': [{ + 'segment': [0.1, 0.9], + 'score': 0.1 + }], + 'test2': [{ + 'segment': [10.1, 20.9], + 'score': 0.9 + }] + }, + 'external_data': {} + } + + with tempfile.TemporaryDirectory() as tmpdir: + + tmp_filename = osp.join(tmpdir, 'result.json') + activitynet_dataset.dump_results(results, tmp_filename, 'json') + assert osp.isfile(tmp_filename) + with open(tmp_filename, 'r+') as 
f: + load_obj = mmcv.load(f, file_format='json') + assert load_obj == dump_results + + # test dumping csv file + results = [('test_video', np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, + 10]]))] + with tempfile.TemporaryDirectory() as tmpdir: + activitynet_dataset.dump_results(results, tmpdir, 'csv') + load_obj = np.loadtxt( + osp.join(tmpdir, 'test_video.csv'), + dtype=np.float32, + delimiter=',', + skiprows=1) + assert_array_equal( + load_obj, + np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], + dtype=np.float32)) + + def test_action_pipeline(self): + target_keys = ['video_name', 'data_prefix'] + + # ActivityNet Dataset not in test mode + action_dataset = ActivityNetDataset( + self.action_ann_file, + self.action_pipeline, + self.data_prefix, + test_mode=False) + result = action_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + # ActivityNet Dataset in test mode + action_dataset = ActivityNetDataset( + self.action_ann_file, + self.action_pipeline, + self.data_prefix, + test_mode=True) + result = action_dataset[0] + assert assert_dict_has_keys(result, target_keys) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..bb7015223da267d2bdcace88b04bd6cdc898e2b7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_dataset.py @@ -0,0 +1,78 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets import AudioDataset +from .base import BaseTestDataset + + +class TestAudioDataset(BaseTestDataset): + + def test_audio_dataset(self): + audio_dataset = AudioDataset( + self.audio_ann_file, + self.audio_pipeline, + data_prefix=self.data_prefix) + audio_infos = audio_dataset.video_infos + wav_path = osp.join(self.data_prefix, 'test.wav') + assert audio_infos == [ + dict(audio_path=wav_path, total_frames=100, label=127) + ] * 2 + + def test_audio_pipeline(self): + target_keys = [ + 'audio_path', 'label', 'start_index', 'modality', 'audios_shape', + 'length', 'sample_rate', 'total_frames' + ] + + # Audio dataset not in test mode + audio_dataset = AudioDataset( + self.audio_ann_file, + self.audio_pipeline, + data_prefix=self.data_prefix, + test_mode=False) + result = audio_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + # Audio dataset in test mode + audio_dataset = AudioDataset( + self.audio_ann_file, + self.audio_pipeline, + data_prefix=self.data_prefix, + test_mode=True) + result = audio_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + def test_audio_evaluate(self): + audio_dataset = AudioDataset( + self.audio_ann_file, + self.audio_pipeline, + data_prefix=self.data_prefix) + + with pytest.raises(TypeError): + # results must be a list + audio_dataset.evaluate('0.5') + + with pytest.raises(AssertionError): + # The length of results must be equal to the dataset len + audio_dataset.evaluate([0] * 5) + + with pytest.raises(TypeError): + # topk must be int or tuple of int + audio_dataset.evaluate( + [0] * len(audio_dataset), + metric_options=dict(top_k_accuracy=dict(topk=1.))) + + with pytest.raises(KeyError): + # unsupported metric + audio_dataset.evaluate([0] * len(audio_dataset), metrics='iou') + + # evaluate top_k_accuracy and mean_class_accuracy metric + results = [np.array([0.1, 0.5, 0.4])] * 2 + 
eval_result = audio_dataset.evaluate( + results, metrics=['top_k_accuracy', 'mean_class_accuracy']) + assert set(eval_result) == set( + ['top1_acc', 'top5_acc', 'mean_class_accuracy']) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_feature_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_feature_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..ceb4946133b6a8f582a278092ead0bae3cfec4e6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_feature_dataset.py @@ -0,0 +1,78 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + +import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets import AudioFeatureDataset +from .base import BaseTestDataset + + +class TestAudioFeatureDataset(BaseTestDataset): + + def test_audio_feature_dataset(self): + audio_dataset = AudioFeatureDataset( + self.audio_feature_ann_file, + self.audio_feature_pipeline, + data_prefix=self.data_prefix) + audio_infos = audio_dataset.video_infos + feature_path = osp.join(self.data_prefix, 'test.npy') + assert audio_infos == [ + dict(audio_path=feature_path, total_frames=100, label=127) + ] * 2 + + def test_audio_feature_pipeline(self): + target_keys = [ + 'audio_path', 'label', 'start_index', 'modality', 'audios', + 'total_frames' + ] + + # Audio feature dataset not in test mode + audio_feature_dataset = AudioFeatureDataset( + self.audio_feature_ann_file, + self.audio_feature_pipeline, + data_prefix=self.data_prefix, + test_mode=False) + result = audio_feature_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + # Audio dataset in test mode + audio_feature_dataset = AudioFeatureDataset( + self.audio_feature_ann_file, + self.audio_feature_pipeline, + data_prefix=self.data_prefix, + test_mode=True) + result = audio_feature_dataset[0] + assert assert_dict_has_keys(result, 
target_keys) + + def test_audio_feature_evaluate(self): + audio_dataset = AudioFeatureDataset( + self.audio_feature_ann_file, + self.audio_feature_pipeline, + data_prefix=self.data_prefix) + + with pytest.raises(TypeError): + # results must be a list + audio_dataset.evaluate('0.5') + + with pytest.raises(AssertionError): + # The length of results must be equal to the dataset len + audio_dataset.evaluate([0] * 5) + + with pytest.raises(TypeError): + # topk must be int or tuple of int + audio_dataset.evaluate( + [0] * len(audio_dataset), + metric_options=dict(top_k_accuracy=dict(topk=1.))) + + with pytest.raises(KeyError): + # unsupported metric + audio_dataset.evaluate([0] * len(audio_dataset), metrics='iou') + + # evaluate top_k_accuracy and mean_class_accuracy metric + results = [np.array([0.1, 0.5, 0.4])] * 2 + eval_result = audio_dataset.evaluate( + results, metrics=['top_k_accuracy', 'mean_class_accuracy']) + assert set(eval_result) == set( + ['top1_acc', 'top5_acc', 'mean_class_accuracy']) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_visual_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_visual_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..34fedabb5801bada95be6f3d00bedbe1dfba5b01 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_audio_visual_dataset.py @@ -0,0 +1,29 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +from mmaction.datasets import AudioVisualDataset +from .base import BaseTestDataset + + +class TestAudioVisualDataset(BaseTestDataset): + + def test_audio_visual_dataset(self): + test_dataset = AudioVisualDataset( + self.frame_ann_file, + self.frame_pipeline, + self.data_prefix, + video_prefix=self.data_prefix, + data_prefix=self.data_prefix) + video_infos = test_dataset.video_infos + frame_dir = osp.join(self.data_prefix, 'imgs') + audio_path = osp.join(self.data_prefix, 'imgs.npy') + filename = osp.join(self.data_prefix, 'imgs.mp4') + assert video_infos == [ + dict( + frame_dir=frame_dir, + audio_path=audio_path, + filename=filename, + total_frames=5, + label=127) + ] * 2 + assert test_dataset.start_index == 1 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_ava_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_ava_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..0d054023acf26753c90069f15ed0c9b7c0f3cb84 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_ava_dataset.py @@ -0,0 +1,221 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +import mmcv +import numpy as np +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_almost_equal, assert_array_equal + +from mmaction.datasets import AVADataset + + +class TestAVADataset: + + @classmethod + def setup_class(cls): + cls.data_prefix = osp.normpath( + osp.join(osp.dirname(__file__), '../../data', 'ava_dataset')) + cls.label_file = osp.join(cls.data_prefix, 'action_list.txt') + cls.ann_file = osp.join(cls.data_prefix, 'ava_sample.csv') + cls.exclude_file = osp.join(cls.data_prefix, + 'ava_excluded_timestamps_sample.csv') + cls.proposal_file = osp.join(cls.data_prefix, + 'ava_proposals_sample.pkl') + cls.pipeline = [ + dict(dict(type='SampleAVAFrames', clip_len=32, frame_interval=2)) + ] + cls.proposal = mmcv.load(cls.proposal_file) + + def test_ava_dataset(self): + target_keys = [ + 'frame_dir', 'video_id', 'timestamp', 'img_key', 'shot_info', + 'fps', 'ann' + ] + ann_keys = ['gt_labels', 'gt_bboxes', 'entity_ids'] + pkl_keys = ['0f39OWEqJ24,0902', '0f39OWEqJ24,0903', '_-Z6wFjXtGQ,0902'] + + ava_dataset = AVADataset( + self.ann_file, + self.exclude_file, + self.pipeline, + data_prefix=self.data_prefix, + proposal_file=self.proposal_file) + ava_infos = ava_dataset.video_infos + assert assert_dict_has_keys(ava_dataset.proposals, pkl_keys) + + assert assert_dict_has_keys(ava_infos[0], target_keys) + assert assert_dict_has_keys(ava_infos[0]['ann'], ann_keys) + assert len(ava_infos) == 1 + assert ava_infos[0]['frame_dir'] == osp.join(self.data_prefix, + '0f39OWEqJ24') + assert ava_infos[0]['video_id'] == '0f39OWEqJ24' + assert ava_infos[0]['timestamp'] == 902 + assert ava_infos[0]['img_key'] == '0f39OWEqJ24,0902' + assert ava_infos[0]['shot_info'] == (0, 27000) + assert ava_infos[0]['fps'] == 30 + assert len(ava_infos[0]['ann']) == 3 + target_labels = np.array([12, 17, 79]) + labels = np.zeros([81]) + labels[target_labels] = 1. + target_labels = labels[None, ...] 
+ assert_array_equal(ava_infos[0]['ann']['gt_labels'], target_labels) + assert_array_equal(ava_infos[0]['ann']['gt_bboxes'], + np.array([[0.031, 0.162, 0.67, 0.995]])) + assert_array_equal(ava_infos[0]['ann']['entity_ids'], np.array([0])) + + # custom classes + ava_dataset = AVADataset( + self.ann_file, + self.exclude_file, + self.pipeline, + label_file=self.label_file, + custom_classes=[17, 79], + num_classes=3, + data_prefix=self.data_prefix, + proposal_file=self.proposal_file) + ava_infos = ava_dataset.video_infos + target_labels = np.array([1, 2]) + labels = np.zeros([3]) + labels[target_labels] = 1. + target_labels = labels[None, ...] + assert_array_equal(ava_infos[0]['ann']['gt_labels'], target_labels) + assert_array_equal(ava_infos[0]['ann']['gt_bboxes'], + np.array([[0.031, 0.162, 0.67, 0.995]])) + assert_array_equal(ava_infos[0]['ann']['entity_ids'], np.array([0])) + + ava_dataset = AVADataset( + self.ann_file, + None, + self.pipeline, + data_prefix=self.data_prefix, + proposal_file=self.proposal_file) + ava_infos = ava_dataset.video_infos + assert len(ava_infos) == 3 + + ava_dataset = AVADataset( + self.ann_file, + None, + self.pipeline, + test_mode=True, + data_prefix=self.data_prefix, + proposal_file=self.proposal_file) + ava_infos = ava_dataset.video_infos + assert len(ava_infos) == 3 + + ava_dataset = AVADataset( + self.ann_file, + None, + self.pipeline, + test_mode=True, + data_prefix=self.data_prefix, + proposal_file=self.proposal_file) + + def test_ava_pipeline(self): + target_keys = [ + 'frame_dir', 'video_id', 'timestamp', 'img_key', 'shot_info', + 'fps', 'filename_tmpl', 'modality', 'start_index', + 'timestamp_start', 'timestamp_end', 'proposals', 'scores', + 'frame_inds', 'clip_len', 'frame_interval', 'gt_labels', + 'gt_bboxes', 'entity_ids' + ] + + ava_dataset = AVADataset( + self.ann_file, + self.exclude_file, + self.pipeline, + data_prefix=self.data_prefix, + proposal_file=self.proposal_file) + result = ava_dataset[0] + assert 
assert_dict_has_keys(result, target_keys) + + assert result['filename_tmpl'] == 'img_{:05}.jpg' + assert result['modality'] == 'RGB' + assert result['start_index'] == 0 + assert result['timestamp_start'] == 900 + assert result['timestamp_end'] == 1800 + assert_array_equal(result['proposals'], + np.array([[0.011, 0.157, 0.655, 0.983]])) + assert_array_equal(result['scores'], np.array([0.998163])) + + assert result['clip_len'] == 32 + assert result['frame_interval'] == 2 + assert len(result['frame_inds']) == 32 + + ava_dataset = AVADataset( + self.ann_file, + None, + self.pipeline, + test_mode=True, + data_prefix=self.data_prefix, + proposal_file=self.proposal_file) + # Try to get a sample + result = ava_dataset[0] + assert result['filename_tmpl'] == 'img_{:05}.jpg' + assert result['modality'] == 'RGB' + assert result['start_index'] == 0 + assert result['timestamp_start'] == 900 + assert result['timestamp_end'] == 1800 + + @staticmethod + def test_ava_evaluate(): + data_prefix = osp.normpath( + osp.join(osp.dirname(__file__), '../../data', 'eval_detection')) + ann_file = osp.join(data_prefix, 'gt.csv') + label_file = osp.join(data_prefix, 'action_list.txt') + + ava_dataset = AVADataset( + ann_file, None, [], label_file=label_file, num_classes=4) + fake_result = [[ + np.array([[0.362, 0.156, 0.969, 0.666, 0.106], + [0.442, 0.083, 0.721, 0.947, 0.162]]), + np.array([[0.288, 0.365, 0.766, 0.551, 0.706], + [0.178, 0.296, 0.707, 0.995, 0.223]]), + np.array([[0.417, 0.167, 0.843, 0.939, 0.015], + [0.35, 0.421, 0.57, 0.689, 0.427]]) + ], + [ + np.array([[0.256, 0.338, 0.726, 0.799, 0.563], + [0.071, 0.256, 0.64, 0.75, 0.297]]), + np.array([[0.326, 0.036, 0.513, 0.991, 0.405], + [0.351, 0.035, 0.729, 0.936, 0.945]]), + np.array([[0.051, 0.005, 0.975, 0.942, 0.424], + [0.347, 0.05, 0.97, 0.944, 0.396]]) + ], + [ + np.array([[0.39, 0.087, 0.833, 0.616, 0.447], + [0.461, 0.212, 0.627, 0.527, 0.036]]), + np.array([[0.022, 0.394, 0.93, 0.527, 0.109], + [0.208, 0.462, 0.874, 
0.948, 0.954]]), + np.array([[0.206, 0.456, 0.564, 0.725, 0.685], + [0.106, 0.445, 0.782, 0.673, 0.367]]) + ]] + res = ava_dataset.evaluate(fake_result) + assert_array_almost_equal(res['mAP@0.5IOU'], 0.027777778) + + # custom classes + ava_dataset = AVADataset( + ann_file, + None, [], + label_file=label_file, + num_classes=3, + custom_classes=[1, 3]) + fake_result = [[ + np.array([[0.362, 0.156, 0.969, 0.666, 0.106], + [0.442, 0.083, 0.721, 0.947, 0.162]]), + np.array([[0.417, 0.167, 0.843, 0.939, 0.015], + [0.35, 0.421, 0.57, 0.689, 0.427]]) + ], + [ + np.array([[0.256, 0.338, 0.726, 0.799, 0.563], + [0.071, 0.256, 0.64, 0.75, 0.297]]), + np.array([[0.051, 0.005, 0.975, 0.942, 0.424], + [0.347, 0.05, 0.97, 0.944, 0.396]]) + ], + [ + np.array([[0.39, 0.087, 0.833, 0.616, 0.447], + [0.461, 0.212, 0.627, 0.527, 0.036]]), + np.array([[0.206, 0.456, 0.564, 0.725, 0.685], + [0.106, 0.445, 0.782, 0.673, 0.367]]) + ]] + res = ava_dataset.evaluate(fake_result) + assert_array_almost_equal(res['mAP@0.5IOU'], 0.04166667) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_concat_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_concat_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..4c9b6ed782e866a93025705ffc76d4db6a92f054 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_concat_dataset.py @@ -0,0 +1,34 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np
+
+from mmaction.datasets import ConcatDataset
+from .base import BaseTestDataset
+
+
+class TestConcatDataset(BaseTestDataset):
+
+    def test_concat_dataset(self):
+        dataset_cfg = dict(
+            type='RawframeDataset',
+            ann_file=self.frame_ann_file,
+            pipeline=self.frame_pipeline,
+            data_prefix=self.data_prefix)
+        repeat_dataset_cfg = dict(
+            type='RepeatDataset', times=2, dataset=dataset_cfg)
+
+        concat_dataset = ConcatDataset(
+            datasets=[dataset_cfg, repeat_dataset_cfg])
+
+        assert len(concat_dataset) == 6
+        result_a = concat_dataset[0]
+        result_b = concat_dataset[4]
+        assert set(result_a) == set(result_b)
+        for key in result_a:
+            if isinstance(result_a[key], np.ndarray):
+                assert np.equal(result_a[key], result_b[key]).all()
+            elif isinstance(result_a[key], list):
+                assert all(
+                    np.array_equal(a, b)
+                    for (a, b) in zip(result_a[key], result_b[key]))
+            else:
+                assert result_a[key] == result_b[key]
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_hvu_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_hvu_dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..eb449778da40a99a8d852f72b039169e736a068b
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_hvu_dataset.py
@@ -0,0 +1,82 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import os.path as osp + +import numpy as np +from numpy.testing import assert_array_almost_equal + +from mmaction.datasets import HVUDataset +from .base import BaseTestDataset + + +class TestHVUDataset(BaseTestDataset): + + def test_hvu_dataset(self): + hvu_frame_dataset = HVUDataset( + ann_file=self.hvu_frame_ann_file, + pipeline=self.frame_pipeline, + tag_categories=self.hvu_categories, + tag_category_nums=self.hvu_category_nums, + filename_tmpl=self.filename_tmpl, + data_prefix=self.data_prefix, + start_index=1) + hvu_frame_infos = hvu_frame_dataset.video_infos + frame_dir = osp.join(self.data_prefix, 'imgs') + assert hvu_frame_infos == [ + dict( + frame_dir=frame_dir, + total_frames=5, + label=dict( + concept=[250, 131, 42, 51, 57, 155, 122], + object=[1570, 508], + event=[16], + action=[180], + scene=[206]), + categories=self.hvu_categories, + category_nums=self.hvu_category_nums, + filename_tmpl=self.filename_tmpl, + start_index=1, + modality='RGB') + ] * 2 + + hvu_video_dataset = HVUDataset( + ann_file=self.hvu_video_ann_file, + pipeline=self.video_pipeline, + tag_categories=self.hvu_categories, + tag_category_nums=self.hvu_category_nums, + data_prefix=self.data_prefix) + hvu_video_infos = hvu_video_dataset.video_infos + filename = osp.join(self.data_prefix, 'tmp.mp4') + assert hvu_video_infos == [ + dict( + filename=filename, + label=dict( + concept=[250, 131, 42, 51, 57, 155, 122], + object=[1570, 508], + event=[16], + action=[180], + scene=[206]), + categories=self.hvu_categories, + category_nums=self.hvu_category_nums) + ] * 2 + + hvu_video_eval_dataset = HVUDataset( + ann_file=self.hvu_video_eval_ann_file, + pipeline=self.video_pipeline, + tag_categories=self.hvu_categories_for_eval, + tag_category_nums=self.hvu_category_nums_for_eval, + data_prefix=self.data_prefix) + + results = [ + np.array([ + -1.59812844, 0.24459082, 1.38486497, 0.28801252, 1.09813449, + -0.28696971, 0.0637848, 0.22877678, -1.82406999 + ]), + np.array([ + 0.87904563, 1.64264224, 
0.46382051, 0.72865088, -2.13712525, + 1.28571358, 1.01320328, 0.59292737, -0.05502892 + ]) + ] + mAP = hvu_video_eval_dataset.evaluate(results) + assert_array_almost_equal(mAP['action_mAP'], 1.0) + assert_array_almost_equal(mAP['scene_mAP'], 0.5) + assert_array_almost_equal(mAP['object_mAP'], 0.75) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_pose_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_pose_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..3449cc878bd7b0758f7e99d06662acbcaba67b91 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_pose_dataset.py @@ -0,0 +1,62 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest + +from mmaction.datasets import PoseDataset +from .base import BaseTestDataset + + +class TestPoseDataset(BaseTestDataset): + + def test_pose_dataset(self): + ann_file = self.pose_ann_file + data_prefix = 'root' + dataset = PoseDataset( + ann_file=ann_file, + pipeline=[], + box_thr='0.5', + data_prefix=data_prefix) + assert len(dataset) == 100 + item = dataset[0] + assert item['filename'].startswith(data_prefix) + + dataset = PoseDataset( + ann_file=ann_file, + pipeline=[], + valid_ratio=0.2, + box_thr='0.9', + data_prefix=data_prefix) + assert len(dataset) == 84 + for item in dataset: + assert item['filename'].startswith(data_prefix) + assert np.all(item['box_score'][item['anno_inds']] >= 0.9) + assert item['valid@0.9'] / item['total_frames'] >= 0.2 + + dataset = PoseDataset( + ann_file=ann_file, + pipeline=[], + valid_ratio=0.3, + box_thr='0.7', + data_prefix=data_prefix) + assert len(dataset) == 87 + for item in dataset: + assert item['filename'].startswith(data_prefix) + assert np.all(item['box_score'][item['anno_inds']] >= 0.7) + assert item['valid@0.7'] / item['total_frames'] >= 0.3 + + class_prob = {i: 1 for i in range(400)} + dataset = PoseDataset( + ann_file=ann_file, + 
pipeline=[], + valid_ratio=0.3, + box_thr='0.7', + data_prefix=data_prefix, + class_prob=class_prob) + + with pytest.raises(AssertionError): + dataset = PoseDataset( + ann_file=ann_file, + pipeline=[], + valid_ratio=0.2, + box_thr='0.55', + data_prefix=data_prefix) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_rawframe_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_rawframe_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..43fbeec100035ae5adcc3538570b7227cda847d6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_rawframe_dataset.py @@ -0,0 +1,165 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + +import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets import RawframeDataset +from .base import BaseTestDataset + + +class TestRawframDataset(BaseTestDataset): + + def test_rawframe_dataset(self): + rawframe_dataset = RawframeDataset(self.frame_ann_file, + self.frame_pipeline, + self.data_prefix) + rawframe_infos = rawframe_dataset.video_infos + frame_dir = osp.join(self.data_prefix, 'imgs') + assert rawframe_infos == [ + dict(frame_dir=frame_dir, total_frames=5, label=127) + ] * 2 + assert rawframe_dataset.start_index == 1 + + def test_rawframe_dataset_with_offset(self): + rawframe_dataset = RawframeDataset( + self.frame_ann_file_with_offset, + self.frame_pipeline, + self.data_prefix, + with_offset=True) + rawframe_infos = rawframe_dataset.video_infos + frame_dir = osp.join(self.data_prefix, 'imgs') + assert rawframe_infos == [ + dict(frame_dir=frame_dir, offset=2, total_frames=5, label=127) + ] * 2 + assert rawframe_dataset.start_index == 1 + + def test_rawframe_dataset_multi_label(self): + rawframe_dataset = RawframeDataset( + self.frame_ann_file_multi_label, + self.frame_pipeline, + self.data_prefix, + multi_class=True, + num_classes=100) + rawframe_infos = 
rawframe_dataset.video_infos + frame_dir = osp.join(self.data_prefix, 'imgs') + label0 = [1] + label1 = [3, 5] + labels = [label0, label1] + for info, label in zip(rawframe_infos, labels): + assert info['frame_dir'] == frame_dir + assert info['total_frames'] == 5 + assert set(info['label']) == set(label) + assert rawframe_dataset.start_index == 1 + + def test_dataset_realpath(self): + dataset = RawframeDataset(self.frame_ann_file, self.frame_pipeline, + '.') + assert dataset.data_prefix == osp.realpath('.') + dataset = RawframeDataset(self.frame_ann_file, self.frame_pipeline, + 's3://good') + assert dataset.data_prefix == 's3://good' + + dataset = RawframeDataset(self.frame_ann_file, self.frame_pipeline) + assert dataset.data_prefix is None + assert dataset.video_infos[0]['frame_dir'] == 'imgs' + + def test_rawframe_pipeline(self): + target_keys = [ + 'frame_dir', 'total_frames', 'label', 'filename_tmpl', + 'start_index', 'modality' + ] + + # RawframeDataset not in test mode + rawframe_dataset = RawframeDataset( + self.frame_ann_file, + self.frame_pipeline, + self.data_prefix, + test_mode=False) + result = rawframe_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + # RawframeDataset in multi-class tasks + rawframe_dataset = RawframeDataset( + self.frame_ann_file, + self.frame_pipeline, + self.data_prefix, + multi_class=True, + num_classes=400, + test_mode=False) + result = rawframe_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + # RawframeDataset with offset + rawframe_dataset = RawframeDataset( + self.frame_ann_file_with_offset, + self.frame_pipeline, + self.data_prefix, + with_offset=True, + num_classes=400, + test_mode=False) + result = rawframe_dataset[0] + assert assert_dict_has_keys(result, target_keys + ['offset']) + + # RawframeDataset in test mode + rawframe_dataset = RawframeDataset( + self.frame_ann_file, + self.frame_pipeline, + self.data_prefix, + test_mode=True) + result = rawframe_dataset[0] + assert 
assert_dict_has_keys(result, target_keys) + + # RawframeDataset in multi-class tasks in test mode + rawframe_dataset = RawframeDataset( + self.frame_ann_file, + self.frame_pipeline, + self.data_prefix, + multi_class=True, + num_classes=400, + test_mode=True) + result = rawframe_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + # RawframeDataset with offset + rawframe_dataset = RawframeDataset( + self.frame_ann_file_with_offset, + self.frame_pipeline, + self.data_prefix, + with_offset=True, + num_classes=400, + test_mode=True) + result = rawframe_dataset[0] + assert assert_dict_has_keys(result, target_keys + ['offset']) + + def test_rawframe_evaluate(self): + rawframe_dataset = RawframeDataset(self.frame_ann_file, + self.frame_pipeline, + self.data_prefix) + + with pytest.raises(TypeError): + # results must be a list + rawframe_dataset.evaluate('0.5') + + with pytest.raises(AssertionError): + # The length of results must be equal to the dataset len + rawframe_dataset.evaluate([0] * 5) + + with pytest.raises(TypeError): + # topk must be int or tuple of int + rawframe_dataset.evaluate( + [0] * len(rawframe_dataset), + metric_options=dict(top_k_accuracy=dict(topk=1.))) + + with pytest.raises(KeyError): + # unsupported metric + rawframe_dataset.evaluate( + [0] * len(rawframe_dataset), metrics='iou') + + # evaluate top_k_accuracy and mean_class_accuracy metric + results = [np.array([0.1, 0.5, 0.4])] * 2 + eval_result = rawframe_dataset.evaluate( + results, metrics=['top_k_accuracy', 'mean_class_accuracy']) + assert set(eval_result) == set( + ['top1_acc', 'top5_acc', 'mean_class_accuracy']) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_rawvideo_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_rawvideo_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..86fd4b0c153728fbbc72bc79c5723af867457be3 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_rawvideo_dataset.py
@@ -0,0 +1,30 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import os.path as osp
+
+from mmaction.datasets import RawVideoDataset
+from .base import BaseTestDataset
+
+
+class TestRawVideoDataset(BaseTestDataset):
+
+    def test_rawvideo_dataset(self):
+        # Try to load txt file
+        rawvideo_dataset = RawVideoDataset(
+            ann_file=self.rawvideo_test_anno_txt,
+            pipeline=self.rawvideo_pipeline,
+            clipname_tmpl='part_{}.mp4',
+            sampling_strategy='positive',
+            data_prefix=self.data_prefix)
+        result = rawvideo_dataset[0]
+        clipname = osp.join(self.data_prefix, 'rawvideo_dataset', 'part_0.mp4')
+        assert result['filename'] == clipname
+
+        # Try to load json file
+        rawvideo_dataset = RawVideoDataset(
+            ann_file=self.rawvideo_test_anno_json,
+            pipeline=self.rawvideo_pipeline,
+            clipname_tmpl='part_{}.mp4',
+            sampling_strategy='random',
+            data_prefix=self.data_prefix,
+            test_mode=True)
+        result = rawvideo_dataset[0]
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_repeat_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_repeat_dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..736fcc3998cf96c44df69bff05347ce632786cb4
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_repeat_dataset.py
@@ -0,0 +1,30 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import numpy as np
+
+from mmaction.datasets import RepeatDataset
+from .base import BaseTestDataset
+
+
+class TestRepeatDataset(BaseTestDataset):
+
+    def test_repeat_dataset(self):
+        dataset_cfg = dict(
+            type='RawframeDataset',
+            ann_file=self.frame_ann_file,
+            pipeline=self.frame_pipeline,
+            data_prefix=self.data_prefix)
+
+        repeat_dataset = RepeatDataset(dataset_cfg, 5)
+        assert len(repeat_dataset) == 10
+        result_a = repeat_dataset[0]
+        result_b = repeat_dataset[2]
+        assert set(result_a) == set(result_b)
+        for key in result_a:
+            if isinstance(result_a[key], np.ndarray):
+                assert np.equal(result_a[key], result_b[key]).all()
+            elif isinstance(result_a[key], list):
+                assert all(
+                    np.array_equal(a, b)
+                    for (a, b) in zip(result_a[key], result_b[key]))
+            else:
+                assert result_a[key] == result_b[key]
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_ssn_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_ssn_dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..3b71f3cde975feca237a94ed459d221f56cf07b2
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_ssn_dataset.py
@@ -0,0 +1,176 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets import SSNDataset +from .base import BaseTestDataset + + +class TestSSNDataset(BaseTestDataset): + + def test_proposal_pipeline(self): + target_keys = [ + 'frame_dir', 'video_id', 'total_frames', 'gts', 'proposals', + 'filename_tmpl', 'modality', 'out_proposals', 'reg_targets', + 'proposal_scale_factor', 'proposal_labels', 'proposal_type', + 'start_index' + ] + + # SSN Dataset not in test mode + proposal_dataset = SSNDataset( + self.proposal_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix) + result = proposal_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + # SSN Dataset with random sampling proposals + proposal_dataset = SSNDataset( + self.proposal_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix, + video_centric=False) + result = proposal_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + target_keys = [ + 'frame_dir', 'video_id', 'total_frames', 'gts', 'proposals', + 'filename_tmpl', 'modality', 'relative_proposal_list', + 'scale_factor_list', 'proposal_tick_list', 'reg_norm_consts', + 'start_index' + ] + + # SSN Dataset in test mode + proposal_dataset = SSNDataset( + self.proposal_ann_file, + self.proposal_test_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix, + test_mode=True) + result = proposal_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + def test_ssn_dataset(self): + # test ssn dataset + ssn_dataset = SSNDataset( + self.proposal_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix) + ssn_infos = ssn_dataset.video_infos + assert ssn_infos[0]['video_id'] == 'imgs' + assert ssn_infos[0]['total_frames'] == 5 + + # test ssn dataset with verbose + 
ssn_dataset = SSNDataset( + self.proposal_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix, + verbose=True) + ssn_infos = ssn_dataset.video_infos + assert ssn_infos[0]['video_id'] == 'imgs' + assert ssn_infos[0]['total_frames'] == 5 + + # test ssn dataset with normalized proposal file + with pytest.raises(Exception): + ssn_dataset = SSNDataset( + self.proposal_norm_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix) + ssn_infos = ssn_dataset.video_infos + + # test ssn dataset with reg_normalize_constants + ssn_dataset = SSNDataset( + self.proposal_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix, + reg_normalize_constants=[[[-0.0603, 0.0325], [0.0752, 0.1596]]]) + ssn_infos = ssn_dataset.video_infos + assert ssn_infos[0]['video_id'] == 'imgs' + assert ssn_infos[0]['total_frames'] == 5 + + # test error case + with pytest.raises(TypeError): + ssn_dataset = SSNDataset( + self.proposal_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix, + aug_ratio=('error', 'error')) + ssn_infos = ssn_dataset.video_infos + + def test_ssn_evaluate(self): + ssn_dataset = SSNDataset( + self.proposal_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg, + data_prefix=self.data_prefix) + ssn_dataset_topall = SSNDataset( + self.proposal_ann_file, + self.proposal_pipeline, + self.proposal_train_cfg, + self.proposal_test_cfg_topall, + data_prefix=self.data_prefix) + + with pytest.raises(TypeError): + # results must be a list + ssn_dataset.evaluate('0.5') + + with pytest.raises(AssertionError): + # The length of results must be equal to the dataset len + ssn_dataset.evaluate([0] * 5) + + with pytest.raises(KeyError): + # unsupported metric + ssn_dataset.evaluate([0] * 
len(ssn_dataset), metrics='iou') + + # evaluate mAP metric + results_relative_proposal_list = np.random.randn(16, 2) + results_activity_scores = np.random.randn(16, 21) + results_completeness_scores = np.random.randn(16, 20) + results_bbox_preds = np.random.randn(16, 20, 2) + results = [ + dict( + relative_proposal_list=results_relative_proposal_list, + activity_scores=results_activity_scores, + completeness_scores=results_completeness_scores, + bbox_preds=results_bbox_preds) + ] + eval_result = ssn_dataset.evaluate(results, metrics=['mAP']) + assert set(eval_result) == set([ + 'mAP@0.10', 'mAP@0.20', 'mAP@0.30', 'mAP@0.40', 'mAP@0.50', + 'mAP@0.50', 'mAP@0.60', 'mAP@0.70', 'mAP@0.80', 'mAP@0.90' + ]) + + # evaluate mAP metric without filtering topk + results_relative_proposal_list = np.random.randn(16, 2) + results_activity_scores = np.random.randn(16, 21) + results_completeness_scores = np.random.randn(16, 20) + results_bbox_preds = np.random.randn(16, 20, 2) + results = [ + dict( + relative_proposal_list=results_relative_proposal_list, + activity_scores=results_activity_scores, + completeness_scores=results_completeness_scores, + bbox_preds=results_bbox_preds) + ] + eval_result = ssn_dataset_topall.evaluate(results, metrics=['mAP']) + assert set(eval_result) == set([ + 'mAP@0.10', 'mAP@0.20', 'mAP@0.30', 'mAP@0.40', 'mAP@0.50', + 'mAP@0.50', 'mAP@0.60', 'mAP@0.70', 'mAP@0.80', 'mAP@0.90' + ]) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_video_dataset.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_video_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..36d280b3ef8f70fa03a8bab981f9588679ea63cf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_datasets/test_video_dataset.py @@ -0,0 +1,100 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets import VideoDataset +from .base import BaseTestDataset + + +class TestVideoDataset(BaseTestDataset): + + def test_video_dataset(self): + video_dataset = VideoDataset( + self.video_ann_file, + self.video_pipeline, + data_prefix=self.data_prefix, + start_index=3) + assert len(video_dataset) == 2 + assert video_dataset.start_index == 3 + + video_dataset = VideoDataset( + self.video_ann_file, + self.video_pipeline, + data_prefix=self.data_prefix) + video_infos = video_dataset.video_infos + video_filename = osp.join(self.data_prefix, 'test.mp4') + assert video_infos == [dict(filename=video_filename, label=0)] * 2 + assert video_dataset.start_index == 0 + + def test_video_dataset_multi_label(self): + video_dataset = VideoDataset( + self.video_ann_file_multi_label, + self.video_pipeline, + data_prefix=self.data_prefix, + multi_class=True, + num_classes=100) + video_infos = video_dataset.video_infos + video_filename = osp.join(self.data_prefix, 'test.mp4') + label0 = [0, 3] + label1 = [0, 2, 4] + labels = [label0, label1] + for info, label in zip(video_infos, labels): + print(info, video_filename) + assert info['filename'] == video_filename + assert set(info['label']) == set(label) + assert video_dataset.start_index == 0 + + def test_video_pipeline(self): + target_keys = ['filename', 'label', 'start_index', 'modality'] + + # VideoDataset not in test mode + video_dataset = VideoDataset( + self.video_ann_file, + self.video_pipeline, + data_prefix=self.data_prefix, + test_mode=False) + result = video_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + # VideoDataset in test mode + video_dataset = VideoDataset( + self.video_ann_file, + self.video_pipeline, + data_prefix=self.data_prefix, + test_mode=True) + result = video_dataset[0] + assert assert_dict_has_keys(result, target_keys) + + def test_video_evaluate(self): + video_dataset = 
VideoDataset( + self.video_ann_file, + self.video_pipeline, + data_prefix=self.data_prefix) + + with pytest.raises(TypeError): + # results must be a list + video_dataset.evaluate('0.5') + + with pytest.raises(AssertionError): + # The length of results must be equal to the dataset len + video_dataset.evaluate([0] * 5) + + with pytest.raises(TypeError): + # topk must be int or tuple of int + video_dataset.evaluate( + [0] * len(video_dataset), + metric_options=dict(top_k_accuracy=dict(topk=1.))) + + with pytest.raises(KeyError): + # unsupported metric + video_dataset.evaluate([0] * len(video_dataset), metrics='iou') + + # evaluate top_k_accuracy and mean_class_accuracy metric + results = [np.array([0.1, 0.5, 0.4])] * 2 + eval_result = video_dataset.evaluate( + results, metrics=['top_k_accuracy', 'mean_class_accuracy']) + assert set(eval_result) == set( + ['top1_acc', 'top5_acc', 'mean_class_accuracy']) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_formating.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_formating.py new file mode 100644 index 0000000000000000000000000000000000000000..c3607e64a3a20c13cdb2bf3348a9c3a6443a12ca --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_formating.py @@ -0,0 +1,227 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch +from mmcv.parallel import DataContainer as DC +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets.pipelines import (Collect, FormatAudioShape, + FormatGCNInput, FormatShape, + ImageToTensor, Rename, + ToDataContainer, ToTensor, Transpose) + + +def test_rename(): + org_name = 'a' + new_name = 'b' + mapping = {org_name: new_name} + rename = Rename(mapping) + results = dict(a=2) + results = rename(results) + assert results['b'] == 2 + assert 'a' not in results + + +def test_to_tensor(): + to_tensor = ToTensor(['str']) + with pytest.raises(TypeError): + # str cannot be converted to tensor + results = dict(str='0') + to_tensor(results) + + # convert tensor, numpy, sequence, int, float to tensor + target_keys = ['tensor', 'numpy', 'sequence', 'int', 'float'] + to_tensor = ToTensor(target_keys) + original_results = dict( + tensor=torch.randn(2, 3), + numpy=np.random.randn(2, 3), + sequence=list(range(10)), + int=1, + float=0.1) + results = to_tensor(original_results) + assert assert_dict_has_keys(results, target_keys) + for key in target_keys: + assert isinstance(results[key], torch.Tensor) + assert torch.equal(results[key].data, original_results[key]) + + # Add an additional key which is not in keys. 
+ original_results = dict( + tensor=torch.randn(2, 3), + numpy=np.random.randn(2, 3), + sequence=list(range(10)), + int=1, + float=0.1, + str='test') + results = to_tensor(original_results) + assert assert_dict_has_keys(results, target_keys) + for key in target_keys: + assert isinstance(results[key], torch.Tensor) + assert torch.equal(results[key].data, original_results[key]) + + assert repr(to_tensor) == to_tensor.__class__.__name__ + \ + f'(keys={target_keys})' + + +def test_to_data_container(): + # check user-defined fields + fields = (dict(key='key1', stack=True), dict(key='key2')) + to_data_container = ToDataContainer(fields=fields) + target_keys = ['key1', 'key2'] + original_results = dict(key1=np.random.randn(10, 20), key2=['a', 'b']) + results = to_data_container(original_results.copy()) + assert assert_dict_has_keys(results, target_keys) + for key in target_keys: + assert isinstance(results[key], DC) + assert np.all(results[key].data == original_results[key]) + assert results['key1'].stack + assert not results['key2'].stack + + # Add an additional key which is not in keys. 
+ original_results = dict( + key1=np.random.randn(10, 20), key2=['a', 'b'], key3='value3') + results = to_data_container(original_results.copy()) + assert assert_dict_has_keys(results, target_keys) + for key in target_keys: + assert isinstance(results[key], DC) + assert np.all(results[key].data == original_results[key]) + assert results['key1'].stack + assert not results['key2'].stack + + assert repr(to_data_container) == ( + to_data_container.__class__.__name__ + f'(fields={fields})') + + +def test_image_to_tensor(): + original_results = dict(imgs=np.random.randn(256, 256, 3)) + keys = ['imgs'] + image_to_tensor = ImageToTensor(keys) + results = image_to_tensor(original_results) + assert results['imgs'].shape == torch.Size([3, 256, 256]) + assert isinstance(results['imgs'], torch.Tensor) + assert torch.equal(results['imgs'].data, original_results['imgs']) + assert repr(image_to_tensor) == image_to_tensor.__class__.__name__ + \ + f'(keys={keys})' + + +def test_transpose(): + results = dict(imgs=np.random.randn(256, 256, 3)) + keys = ['imgs'] + order = [2, 0, 1] + transpose = Transpose(keys, order) + results = transpose(results) + assert results['imgs'].shape == (3, 256, 256) + assert repr(transpose) == transpose.__class__.__name__ + \ + f'(keys={keys}, order={order})' + + +def test_collect(): + inputs = dict( + imgs=np.random.randn(256, 256, 3), + label=[1], + filename='test.txt', + original_shape=(256, 256, 3), + img_shape=(256, 256, 3), + pad_shape=(256, 256, 3), + flip_direction='vertical', + img_norm_cfg=dict(to_bgr=False)) + keys = ['imgs', 'label'] + collect = Collect(keys) + results = collect(inputs) + assert sorted(list(results.keys())) == sorted( + ['imgs', 'label', 'img_metas']) + imgs = inputs.pop('imgs') + assert set(results['img_metas'].data) == set(inputs) + for key in results['img_metas'].data: + assert results['img_metas'].data[key] == inputs[key] + assert repr(collect) == collect.__class__.__name__ + \ + (f'(keys={keys}, 
meta_keys={collect.meta_keys}, ' + f'nested={collect.nested})') + + inputs['imgs'] = imgs + collect = Collect(keys, nested=True) + results = collect(inputs) + assert sorted(list(results.keys())) == sorted( + ['imgs', 'label', 'img_metas']) + for k in results: + assert isinstance(results[k], list) + + +def test_format_shape(): + with pytest.raises(ValueError): + # invalid input format + FormatShape('NHWC') + + # 'NCHW' input format + results = dict( + imgs=np.random.randn(3, 224, 224, 3), num_clips=1, clip_len=3) + format_shape = FormatShape('NCHW') + assert format_shape(results)['input_shape'] == (3, 3, 224, 224) + + # `NCTHW` input format with num_clips=1, clip_len=3 + results = dict( + imgs=np.random.randn(3, 224, 224, 3), num_clips=1, clip_len=3) + format_shape = FormatShape('NCTHW') + assert format_shape(results)['input_shape'] == (1, 3, 3, 224, 224) + + # `NCTHW` input format with num_clips=2, clip_len=3 + results = dict( + imgs=np.random.randn(18, 224, 224, 3), num_clips=2, clip_len=3) + assert format_shape(results)['input_shape'] == (6, 3, 3, 224, 224) + target_keys = ['imgs', 'input_shape'] + assert assert_dict_has_keys(results, target_keys) + + assert repr(format_shape) == format_shape.__class__.__name__ + \ + "(input_format='NCTHW')" + + # 'NPTCHW' input format + results = dict( + imgs=np.random.randn(72, 224, 224, 3), + num_clips=9, + clip_len=1, + num_proposals=8) + format_shape = FormatShape('NPTCHW') + assert format_shape(results)['input_shape'] == (8, 9, 3, 224, 224) + + +def test_format_audio_shape(): + with pytest.raises(ValueError): + # invalid input format + FormatAudioShape('XXXX') + + # 'NCTF' input format + results = dict(audios=np.random.randn(3, 128, 8)) + format_shape = FormatAudioShape('NCTF') + assert format_shape(results)['input_shape'] == (3, 1, 128, 8) + assert repr(format_shape) == format_shape.__class__.__name__ + \ + "(input_format='NCTF')" + + +def test_format_gcn_input(): + with pytest.raises(ValueError): + # invalid input format 
+ FormatGCNInput('XXXX') + + # 'NCTVM' input format + results = dict( + keypoint=np.random.randn(2, 300, 17, 2), + keypoint_score=np.random.randn(2, 300, 17)) + format_shape = FormatGCNInput('NCTVM', num_person=2) + assert format_shape(results)['input_shape'] == (3, 300, 17, 2) + assert repr(format_shape) == format_shape.__class__.__name__ + \ + "(input_format='NCTVM')" + + # test real num_person < 2 + results = dict( + keypoint=np.random.randn(1, 300, 17, 2), + keypoint_score=np.random.randn(1, 300, 17)) + assert format_shape(results)['input_shape'] == (3, 300, 17, 2) + assert repr(format_shape) == format_shape.__class__.__name__ + \ + "(input_format='NCTVM')" + + # test real num_person > 2 + results = dict( + keypoint=np.random.randn(3, 300, 17, 2), + keypoint_score=np.random.randn(3, 300, 17)) + assert format_shape(results)['input_shape'] == (3, 300, 17, 2) + assert repr(format_shape) == format_shape.__class__.__name__ + \ + "(input_format='NCTVM')" diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..949b51e923c8098fd526ca1c68b96fe5984aa56f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .base import check_crop, check_flip, check_normalize + +__all__ = ['check_crop', 'check_flip', 'check_normalize'] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/base.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/base.py new file mode 100644 index 0000000000000000000000000000000000000000..cc75917bfc818566e712d441eea0153ec13aef0e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/base.py @@ -0,0 +1,70 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +from numpy.testing import assert_array_almost_equal + + +def check_crop(origin_imgs, result_imgs, result_bbox, num_crops=1): + """Check if the result_bbox corresponds to result_imgs.""" + + def check_single_crop(origin_imgs, result_imgs, result_bbox): + result_img_shape = result_imgs[0].shape[:2] + crop_w = result_bbox[2] - result_bbox[0] + crop_h = result_bbox[3] - result_bbox[1] + crop_shape = (crop_h, crop_w) + if not crop_shape == result_img_shape: + return False + left, top, right, bottom = result_bbox + return np.array_equal( + np.array(origin_imgs)[:, top:bottom, left:right, :], + np.array(result_imgs)) + + if result_bbox.ndim == 1: + return check_single_crop(origin_imgs, result_imgs, result_bbox) + if result_bbox.ndim == 2: + num_batch = len(origin_imgs) + for i, bbox in enumerate(result_bbox): + if num_crops == 10: + if (i // num_batch) % 2 == 0: + flag = check_single_crop([origin_imgs[i % num_batch]], + [result_imgs[i]], bbox) + else: + flag = check_single_crop([origin_imgs[i % num_batch]], + [np.flip(result_imgs[i], axis=1)], + bbox) + else: + flag = check_single_crop([origin_imgs[i % num_batch]], + [result_imgs[i]], bbox) + if not flag: + return False + return True + else: + # bbox has a wrong dimension + return False + + +def check_flip(origin_imgs, result_imgs, flip_type): + """Check if the origin_imgs are flipped correctly into
result_imgs in + different flip_types.""" + n, _, _, _ = np.shape(origin_imgs) + if flip_type == 'horizontal': + for i in range(n): + if np.any(result_imgs[i] != np.fliplr(origin_imgs[i])): + return False + else: + # yapf: disable + for i in range(n): + if np.any(result_imgs[i] != np.transpose(np.fliplr(np.transpose(origin_imgs[i], (1, 0, 2))), (1, 0, 2))): # noqa:E501 + return False + # yapf: enable + return True + + +def check_normalize(origin_imgs, result_imgs, norm_cfg): + """Check if the origin_imgs are normalized correctly into result_imgs in a + given norm_cfg.""" + target_imgs = result_imgs.copy() + target_imgs *= norm_cfg['std'] + target_imgs += norm_cfg['mean'] + if norm_cfg['to_bgr']: + target_imgs = target_imgs[..., ::-1].copy() + assert_array_almost_equal(origin_imgs, target_imgs, decimal=4) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_audio.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_audio.py new file mode 100644 index 0000000000000000000000000000000000000000..cf1a53e14cb2a01a1178b2720058a7d0f53635aa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_audio.py @@ -0,0 +1,54 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets.pipelines import AudioAmplify, MelSpectrogram + + +class TestAudio: + + @staticmethod + def test_audio_amplify(): + target_keys = ['audios', 'amplify_ratio'] + with pytest.raises(TypeError): + # ratio should be float + AudioAmplify(1) + + audio = (np.random.rand(8, )) + results = dict(audios=audio) + amplifier = AudioAmplify(1.5) + results = amplifier(results) + assert assert_dict_has_keys(results, target_keys) + assert repr(amplifier) == (f'{amplifier.__class__.__name__}' + f'(ratio={amplifier.ratio})') + + @staticmethod + def test_melspectrogram(): + target_keys = ['audios'] + with pytest.raises(TypeError): + # window_size should be int + MelSpectrogram(window_size=12.5) + audio = (np.random.rand(1, 160000)) + + # test padding + results = dict(audios=audio, sample_rate=16000) + results['num_clips'] = 1 + results['sample_rate'] = 16000 + mel = MelSpectrogram() + results = mel(results) + assert assert_dict_has_keys(results, target_keys) + + # test truncating + audio = (np.random.rand(1, 160000)) + results = dict(audios=audio, sample_rate=16000) + results['num_clips'] = 1 + results['sample_rate'] = 16000 + mel = MelSpectrogram(fixed_length=1) + results = mel(results) + assert assert_dict_has_keys(results, target_keys) + assert repr(mel) == (f'{mel.__class__.__name__}' + f'(window_size={mel.window_size}), ' + f'step_size={mel.step_size}, ' + f'n_mels={mel.n_mels}, ' + f'fixed_length={mel.fixed_length})') diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_color.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_color.py new file mode 100644 index 0000000000000000000000000000000000000000..ebf849cc1681302041e5ecf7a1f1e3b5ab253b9c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_color.py @@ -0,0 +1,35 @@ +# Copyright (c)
OpenMMLab. All rights reserved. +import numpy as np +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets.pipelines import ColorJitter + + +class TestColor: + + @staticmethod + def test_color_jitter(): + imgs = list( + np.random.randint(0, 255, size=(3, 112, 112, 3), dtype=np.uint8)) + results = dict(imgs=imgs) + + color_jitter = ColorJitter() + assert color_jitter.brightness == (0.5, 1.5) + assert color_jitter.contrast == (0.5, 1.5) + assert color_jitter.saturation == (0.5, 1.5) + assert color_jitter.hue == (-0.1, 0.1) + + color_jitter_results = color_jitter(results) + target_keys = ['imgs'] + + assert assert_dict_has_keys(color_jitter_results, target_keys) + assert np.shape(color_jitter_results['imgs']) == (3, 112, 112, 3) + for img in color_jitter_results['imgs']: + assert np.all(img >= 0) + assert np.all(img <= 255) + + assert repr(color_jitter) == (f'{color_jitter.__class__.__name__}(' + f'brightness={(0.5, 1.5)}, ' + f'contrast={(0.5, 1.5)}, ' + f'saturation={(0.5, 1.5)}, ' + f'hue={-0.1, 0.1})') diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_crop.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_crop.py new file mode 100644 index 0000000000000000000000000000000000000000..400327deaca004e162afe9a894e7b2914fd3f810 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_crop.py @@ -0,0 +1,294 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets.pipelines import (CenterCrop, MultiScaleCrop, + RandomCrop, RandomResizedCrop, + TenCrop, ThreeCrop) +from .base import check_crop + + +class TestCrops: + + @staticmethod + def test_random_crop(): + with pytest.raises(TypeError): + # size must be an int + RandomCrop(size=(112, 112)) + with pytest.raises(AssertionError): + # "size > height" or "size > width" is not allowed + imgs = list(np.random.rand(2, 224, 341, 3)) + results = dict(imgs=imgs) + random_crop = RandomCrop(size=320) + random_crop(results) + + target_keys = ['imgs', 'crop_bbox', 'img_shape'] + + # General case + imgs = list(np.random.rand(2, 224, 341, 3)) + results = dict(imgs=imgs) + random_crop = RandomCrop(size=224) + results['gt_bboxes'] = np.array([[0, 0, 340, 224]]) + results['proposals'] = np.array([[0, 0, 340, 224]]) + kp = np.array([[160, 120], [160, 120]]).reshape([1, 1, 2, 2]) + results['keypoint'] = kp + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert check_crop(imgs, random_crop_result['imgs'], + results['crop_bbox']) + h, w = random_crop_result['img_shape'] + assert h == w == 224 + + # Test the case that no need for cropping + imgs = list(np.random.rand(2, 224, 224, 3)) + results = dict(imgs=imgs) + random_crop = RandomCrop(size=224) + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert check_crop(imgs, random_crop_result['imgs'], + results['crop_bbox']) + h, w = random_crop_result['img_shape'] + assert h == w == 224 + + # Test the one-side-equal case + imgs = list(np.random.rand(2, 224, 225, 3)) + results = dict(imgs=imgs) + random_crop = RandomCrop(size=224) + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert check_crop(imgs, random_crop_result['imgs'], + results['crop_bbox']) + h, w = 
random_crop_result['img_shape'] + assert h == w == 224 + + assert repr(random_crop) == (f'{random_crop.__class__.__name__}' + f'(size={224}, lazy={False})') + + @staticmethod + def test_random_resized_crop(): + with pytest.raises(TypeError): + # area_range must be a tuple of float + RandomResizedCrop(area_range=0.5) + with pytest.raises(TypeError): + # aspect_ratio_range must be a tuple of float + RandomResizedCrop(area_range=(0.08, 1.0), aspect_ratio_range=0.1) + + target_keys = ['imgs', 'crop_bbox', 'img_shape'] + # There will be a slight difference because of rounding + eps = 0.01 + imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + results['gt_bboxes'] = np.array([[0, 0, 340, 256]]) + results['proposals'] = np.array([[0, 0, 340, 256]]) + kp = np.array([[160, 120], [160, 120]]).reshape([1, 1, 2, 2]) + results['keypoint'] = kp + + with pytest.raises(AssertionError): + # area_range[0] > area_range[1], which is wrong + random_crop = RandomResizedCrop(area_range=(0.9, 0.7)) + random_crop(results) + with pytest.raises(AssertionError): + # aspect_ratio_range[0] < 0, which is wrong + random_crop = RandomResizedCrop(aspect_ratio_range=(-0.1, 2.0)) + random_crop(results) + + random_crop = RandomResizedCrop() + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert check_crop(imgs, random_crop_result['imgs'], + results['crop_bbox']) + h, w = random_crop_result['img_shape'] + assert ((0.08 - eps <= h * w / 256 / 341) + and (h * w / 256 / 341 <= 1 + eps)) + assert (3. / 4. - eps <= h / w) and (h / w - eps <= 4. / 3.) 
+ assert repr(random_crop) == (f'{random_crop.__class__.__name__}' + f'(area_range={(0.08, 1.0)}, ' + f'aspect_ratio_range={(3 / 4, 4 / 3)}, ' + f'lazy={False})') + + random_crop = RandomResizedCrop( + area_range=(0.9, 0.9), aspect_ratio_range=(10.0, 10.1)) + # Test fallback cases by very big area range + imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert check_crop(imgs, random_crop_result['imgs'], + results['crop_bbox']) + h, w = random_crop_result['img_shape'] + assert h == w == 256 + + @staticmethod + def test_multi_scale_crop(): + with pytest.raises(TypeError): + # input_size must be int or tuple of int + MultiScaleCrop(0.5) + + with pytest.raises(TypeError): + # input_size must be int or tuple of int + MultiScaleCrop('224') + + with pytest.raises(TypeError): + # scales must be tuple. + MultiScaleCrop( + 224, scales=[ + 1, + ]) + + with pytest.raises(ValueError): + # num_fix_crops must be in [5, 13] + MultiScaleCrop(224, num_fixed_crops=6) + + target_keys = ['imgs', 'crop_bbox', 'img_shape', 'scales'] + + # MultiScaleCrop with normal crops. + imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + results['gt_bboxes'] = np.array([[0, 0, 340, 256]]) + results['proposals'] = np.array([[0, 0, 340, 256]]) + kp = np.array([[160, 120], [160, 120]]).reshape([1, 1, 2, 2]) + results['keypoint'] = kp + config = dict( + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0) + multi_scale_crop = MultiScaleCrop(**config) + multi_scale_crop_results = multi_scale_crop(results) + assert assert_dict_has_keys(multi_scale_crop_results, target_keys) + assert check_crop(imgs, multi_scale_crop_results['imgs'], + multi_scale_crop_results['crop_bbox']) + assert multi_scale_crop_results['img_shape'] in [(256, 256), + (204, 204)] + + # MultiScaleCrop with more fixed crops. 
+ imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + config = dict( + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0, + num_fixed_crops=13) + multi_scale_crop = MultiScaleCrop(**config) + multi_scale_crop_results = multi_scale_crop(results) + assert assert_dict_has_keys(multi_scale_crop_results, target_keys) + assert check_crop(imgs, multi_scale_crop_results['imgs'], + multi_scale_crop_results['crop_bbox']) + assert multi_scale_crop_results['img_shape'] in [(256, 256), + (204, 204)] + + # MultiScaleCrop with random crop. + imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + config = dict( + input_size=224, + scales=(1, 0.8), + random_crop=True, + max_wh_scale_gap=0) + multi_scale_crop = MultiScaleCrop(**config) + multi_scale_crop_results = multi_scale_crop(results) + assert assert_dict_has_keys(multi_scale_crop_results, target_keys) + assert check_crop(imgs, multi_scale_crop_results['imgs'], + multi_scale_crop_results['crop_bbox']) + assert (multi_scale_crop_results['img_shape'] in [(256, 256), + (204, 204)]) + + assert repr(multi_scale_crop) == ( + f'{multi_scale_crop.__class__.__name__}' + f'(input_size={(224, 224)}, scales={(1, 0.8)}, ' + f'max_wh_scale_gap={0}, random_crop={True}, ' + f'num_fixed_crops=5, lazy={False})') + + @staticmethod + def test_center_crop(): + with pytest.raises(TypeError): + # crop_size must be int or tuple of int + CenterCrop(0.5) + + with pytest.raises(TypeError): + # crop_size must be int or tuple of int + CenterCrop('224') + + # center crop with crop_size 224 + # add kps in test_center_crop + imgs = list(np.random.rand(2, 240, 320, 3)) + results = dict(imgs=imgs) + kp = np.array([[160, 120], [160, 120]]).reshape([1, 1, 2, 2]) + results['keypoint'] = kp + + results['gt_bboxes'] = np.array([[0, 0, 320, 240]]) + results['proposals'] = np.array([[0, 0, 320, 240]]) + center_crop = CenterCrop(crop_size=224) + center_crop_results = center_crop(results) + target_keys 
= ['imgs', 'crop_bbox', 'img_shape', 'keypoint'] + assert assert_dict_has_keys(center_crop_results, target_keys) + assert check_crop(imgs, center_crop_results['imgs'], + center_crop_results['crop_bbox']) + assert np.all( + center_crop_results['crop_bbox'] == np.array([48, 8, 272, 232])) + assert center_crop_results['img_shape'] == (224, 224) + assert np.all(center_crop_results['keypoint'] == 112) + + assert repr(center_crop) == (f'{center_crop.__class__.__name__}' + f'(crop_size={(224, 224)}, lazy={False})') + + @staticmethod + def test_three_crop(): + with pytest.raises(TypeError): + # crop_size must be int or tuple of int + ThreeCrop(0.5) + + with pytest.raises(TypeError): + # crop_size must be int or tuple of int + ThreeCrop('224') + + # three crop with crop_size 120 + imgs = list(np.random.rand(2, 240, 120, 3)) + results = dict(imgs=imgs) + three_crop = ThreeCrop(crop_size=120) + three_crop_results = three_crop(results) + target_keys = ['imgs', 'crop_bbox', 'img_shape'] + assert assert_dict_has_keys(three_crop_results, target_keys) + assert check_crop(imgs, three_crop_results['imgs'], + three_crop_results['crop_bbox'], 3) + assert three_crop_results['img_shape'] == (120, 120) + + # three crop with crop_size 224 + imgs = list(np.random.rand(2, 224, 224, 3)) + results = dict(imgs=imgs) + three_crop = ThreeCrop(crop_size=224) + three_crop_results = three_crop(results) + target_keys = ['imgs', 'crop_bbox', 'img_shape'] + assert assert_dict_has_keys(three_crop_results, target_keys) + assert check_crop(imgs, three_crop_results['imgs'], + three_crop_results['crop_bbox'], 3) + assert three_crop_results['img_shape'] == (224, 224) + + assert repr(three_crop) == (f'{three_crop.__class__.__name__}' + f'(crop_size={(224, 224)})') + + @staticmethod + def test_ten_crop(): + with pytest.raises(TypeError): + # crop_size must be int or tuple of int + TenCrop(0.5) + + with pytest.raises(TypeError): + # crop_size must be int or tuple of int + TenCrop('224') + + # ten crop with 
crop_size 224 + imgs = list(np.random.rand(2, 256, 256, 3)) + results = dict(imgs=imgs) + ten_crop = TenCrop(crop_size=224) + ten_crop_results = ten_crop(results) + target_keys = ['imgs', 'crop_bbox', 'img_shape'] + assert assert_dict_has_keys(ten_crop_results, target_keys) + assert check_crop(imgs, ten_crop_results['imgs'], + ten_crop_results['crop_bbox'], 10) + assert ten_crop_results['img_shape'] == (224, 224) + + assert repr(ten_crop) == (f'{ten_crop.__class__.__name__}' + f'(crop_size={(224, 224)})') diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_flip.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_flip.py new file mode 100644 index 0000000000000000000000000000000000000000..fd62e13f0034b8ab2d39f309cbef6a65f796a43b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_flip.py @@ -0,0 +1,136 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy + +import mmcv +import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_almost_equal + +from mmaction.datasets.pipelines import Flip +from .base import check_flip + + +class TestFlip: + + @staticmethod + def test_flip(): + with pytest.raises(ValueError): + # direction must be in ['horizontal', 'vertical'] + Flip(direction='vertically') + + target_keys = ['imgs', 'flip_direction', 'modality'] + + # do not flip imgs. + imgs = list(np.random.rand(2, 64, 64, 3)) + results = dict(imgs=copy.deepcopy(imgs), modality='RGB') + flip = Flip(flip_ratio=0, direction='horizontal') + flip_results = flip(results) + assert assert_dict_has_keys(flip_results, target_keys) + assert np.array_equal(imgs, results['imgs']) + assert id(flip_results['imgs']) == id(results['imgs']) + assert np.shape(flip_results['imgs']) == np.shape(imgs) + + # always flip imgs horizontally.
+ imgs = list(np.random.rand(2, 64, 64, 3)) + results = dict(imgs=copy.deepcopy(imgs), modality='RGB') + results['gt_bboxes'] = np.array([[0, 0, 60, 60]]) + results['proposals'] = np.array([[0, 0, 60, 60]]) + flip = Flip(flip_ratio=1, direction='horizontal') + flip_results = flip(results) + assert assert_dict_has_keys(flip_results, target_keys) + if flip_results['flip'] is True: + assert check_flip(imgs, flip_results['imgs'], + flip_results['flip_direction']) + assert id(flip_results['imgs']) == id(results['imgs']) + assert np.shape(flip_results['imgs']) == np.shape(imgs) + + # flip flow images horizontally + imgs = [ + np.arange(16).reshape(4, 4).astype(np.float32), + np.arange(16, 32).reshape(4, 4).astype(np.float32) + ] + results = dict(imgs=copy.deepcopy(imgs), modality='Flow') + flip = Flip(flip_ratio=1, direction='horizontal') + flip_results = flip(results) + assert assert_dict_has_keys(flip_results, target_keys) + imgs = [x.reshape(4, 4, 1) for x in imgs] + flip_results['imgs'] = [ + x.reshape(4, 4, 1) for x in flip_results['imgs'] + ] + if flip_results['flip'] is True: + assert check_flip([imgs[0]], + [mmcv.iminvert(flip_results['imgs'][0])], + flip_results['flip_direction']) + assert check_flip([imgs[1]], [flip_results['imgs'][1]], + flip_results['flip_direction']) + assert id(flip_results['imgs']) == id(results['imgs']) + assert np.shape(flip_results['imgs']) == np.shape(imgs) + + # always flip imgs vertically.
+ imgs = list(np.random.rand(2, 64, 64, 3)) + results = dict(imgs=copy.deepcopy(imgs), modality='RGB') + flip = Flip(flip_ratio=1, direction='vertical') + flip_results = flip(results) + assert assert_dict_has_keys(flip_results, target_keys) + if flip_results['flip'] is True: + assert check_flip(imgs, flip_results['imgs'], + flip_results['flip_direction']) + assert id(flip_results['imgs']) == id(results['imgs']) + assert np.shape(flip_results['imgs']) == np.shape(imgs) + + assert repr(flip) == (f'{flip.__class__.__name__}' + f'(flip_ratio={1}, direction=vertical, ' + f'flip_label_map={None}, lazy={False})') + + # transform label for the flipped image with the specific label. + _flip_label_map = {4: 6} + imgs = list(np.random.rand(2, 64, 64, 3)) + + # the label should be mapped. + results = dict(imgs=copy.deepcopy(imgs), modality='RGB', label=4) + flip = Flip( + flip_ratio=1, + direction='horizontal', + flip_label_map=_flip_label_map) + flip_results = flip(results) + assert results['label'] == 6 + + # the label should not be mapped. 
+ results = dict(imgs=copy.deepcopy(imgs), modality='RGB', label=3) + flip = Flip( + flip_ratio=1, + direction='horizontal', + flip_label_map=_flip_label_map) + flip_results = flip(results) + assert results['label'] == 3 + + # flip the keypoints + results = dict( + keypoint=np.array([[1, 1], [63, 63]]).reshape([1, 1, 2, 2]), + modality='Pose', + img_shape=(64, 64)) + flip = Flip( + flip_ratio=1, direction='horizontal', left_kp=[0], right_kp=[1]) + flip_results = flip(results) + assert_array_almost_equal(flip_results['keypoint'][0, 0], + np.array([[1, 63], [63, 1]])) + + results = dict( + keypoint=np.array([[1, 1], [63, 63]]).reshape([1, 1, 2, 2]), + modality='Pose', + img_shape=(64, 64)) + flip = Flip( + flip_ratio=1, direction='horizontal', left_kp=[], right_kp=[]) + flip_results = flip(results) + assert_array_almost_equal(flip_results['keypoint'][0, 0], + np.array([[63, 1], [1, 63]])) + + with pytest.raises(AssertionError): + results = dict( + keypoint=np.array([[1, 1], [63, 63]]).reshape([1, 1, 2, 2]), + modality='Pose', + img_shape=(64, 64)) + flip = Flip( + flip_ratio=1, direction='vertical', left_kp=[], right_kp=[]) + flip_results = flip(results) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_imgaug.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_imgaug.py new file mode 100644 index 0000000000000000000000000000000000000000..646e0fb8130e1b235b6b8ab1c204d712f260a973 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_imgaug.py @@ -0,0 +1,101 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_almost_equal + +from mmaction.datasets.pipelines import CenterCrop, Imgaug +from .base import check_flip + + +class TestAugumentations: + + @staticmethod + def test_imgaug(): + + with pytest.raises(ValueError): + # transforms only support one string, 'default' + Imgaug(transforms='test') + + with pytest.raises(ValueError): + # transforms only support string or list of dicts + # or iaa.Augmenter object + Imgaug(transforms=dict(type='Rotate')) + + with pytest.raises(AssertionError): + # each dict must have a `type` key + Imgaug(transforms=[dict(rotate=(-30, 30))]) + + with pytest.raises(AttributeError): + # `type` must be available in imgaug + Imgaug(transforms=[dict(type='BlaBla')]) + + with pytest.raises(TypeError): + # `type` must be str or iaa available type + Imgaug(transforms=[dict(type=CenterCrop)]) + + from imgaug import augmenters as iaa + + # check default configs + target_keys = ['imgs', 'img_shape', 'modality'] + imgs = list(np.random.randint(0, 255, (1, 64, 64, 3)).astype(np.uint8)) + results = dict(imgs=imgs, modality='RGB') + default_imgaug = Imgaug(transforms='default') + default_results = default_imgaug(results) + assert_dict_has_keys(default_results, target_keys) + assert default_results['img_shape'] == (64, 64) + + # check flip (both images and bboxes) + target_keys = ['imgs', 'gt_bboxes', 'proposals', 'img_shape'] + imgs = list(np.random.rand(1, 64, 64, 3).astype(np.float32)) + results = dict( + imgs=imgs, + modality='RGB', + proposals=np.array([[0, 0, 25, 35]]), + img_shape=(64, 64), + gt_bboxes=np.array([[0, 0, 25, 35]])) + imgaug_flip = Imgaug(transforms=[dict(type='Fliplr')]) + flip_results = imgaug_flip(results) + assert assert_dict_has_keys(flip_results, target_keys) + assert check_flip(imgs, flip_results['imgs'], 'horizontal') + assert_array_almost_equal(flip_results['gt_bboxes'], + np.array([[39, 0, 64, 35]])) + 
assert_array_almost_equal(flip_results['proposals'], + np.array([[39, 0, 64, 35]])) + transforms = iaa.Sequential([iaa.Fliplr()]) + assert repr(imgaug_flip) == f'Imgaug(transforms={transforms})' + + # check crop (both images and bboxes) + target_keys = ['crop_bbox', 'gt_bboxes', 'imgs', 'img_shape'] + imgs = list(np.random.rand(1, 122, 122, 3)) + results = dict( + imgs=imgs, + modality='RGB', + img_shape=(122, 122), + gt_bboxes=np.array([[1.5, 2.5, 110, 64]])) + imgaug_center_crop = Imgaug(transforms=[ + dict( + type=iaa.CropToFixedSize, + width=100, + height=100, + position='center') + ]) + crop_results = imgaug_center_crop(results) + assert_dict_has_keys(crop_results, target_keys) + assert_array_almost_equal(crop_results['gt_bboxes'], + np.array([[0., 0., 99., 53.]])) + assert 'proposals' not in results + transforms = iaa.Sequential( + [iaa.CropToFixedSize(width=100, height=100, position='center')]) + assert repr(imgaug_center_crop) == f'Imgaug(transforms={transforms})' + + # check resize (images only) + target_keys = ['imgs', 'img_shape'] + imgs = list(np.random.rand(1, 64, 64, 3)) + results = dict(imgs=imgs, modality='RGB') + transforms = iaa.Resize(32) + imgaug_resize = Imgaug(transforms=transforms) + resize_results = imgaug_resize(results) + assert_dict_has_keys(resize_results, target_keys) + assert resize_results['img_shape'] == (32, 32) + assert repr(imgaug_resize) == f'Imgaug(transforms={transforms})' diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_lazy.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_lazy.py new file mode 100644 index 0000000000000000000000000000000000000000..34d535c502c021f3ce16ad2268cb3b2d07c85ba8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_lazy.py @@ -0,0 +1,373 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets.pipelines import (CenterCrop, Flip, Fuse, + MultiScaleCrop, RandomCrop, + RandomResizedCrop, Resize) +from .base import check_crop, check_flip + + +class TestLazy: + + @staticmethod + def test_init_lazy(): + from mmaction.datasets.pipelines.augmentations import \ + _init_lazy_if_proper # noqa: E501 + with pytest.raises(AssertionError): + # use lazy operation but "lazy" not in results + result = dict(lazy=dict(), img_shape=[64, 64]) + _init_lazy_if_proper(result, False) + + lazy_keys = [ + 'original_shape', 'crop_bbox', 'flip', 'flip_direction', + 'interpolation' + ] + + # 'img_shape' not in results + result = dict(imgs=list(np.random.randn(3, 64, 64, 3))) + _init_lazy_if_proper(result, True) + assert assert_dict_has_keys(result, ['imgs', 'lazy', 'img_shape']) + assert assert_dict_has_keys(result['lazy'], lazy_keys) + + # 'img_shape' in results + result = dict(img_shape=[64, 64]) + _init_lazy_if_proper(result, True) + assert assert_dict_has_keys(result, ['lazy', 'img_shape']) + assert assert_dict_has_keys(result['lazy'], lazy_keys) + + # do not use lazy operation + result = dict(img_shape=[64, 64]) + _init_lazy_if_proper(result, False) + assert assert_dict_has_keys(result, ['img_shape']) + assert 'lazy' not in result + + @staticmethod + def test_random_crop_lazy(): + with pytest.raises(TypeError): + # size must be an int + RandomCrop(size=(112, 112), lazy=True) + with pytest.raises(AssertionError): + # "size > height" or "size > width" is not allowed + imgs = list(np.random.rand(2, 224, 341, 3)) + results = dict(imgs=imgs) + random_crop = RandomCrop(size=320, lazy=True) + random_crop(results) + + target_keys = ['imgs', 'crop_bbox', 'img_shape', 'lazy'] + + # General case + imgs = list(np.random.rand(2, 224, 341, 3)) + results = dict(imgs=imgs) + random_crop = RandomCrop(size=224, lazy=True) + random_crop_result = random_crop(results) + assert 
assert_dict_has_keys(random_crop_result, target_keys) + assert id(imgs) == id(random_crop_result['imgs']) + random_crop_result_fuse = Fuse()(random_crop_result) + assert 'lazy' not in random_crop_result_fuse + assert check_crop(imgs, random_crop_result_fuse['imgs'], + results['crop_bbox']) + h, w = random_crop_result_fuse['img_shape'] + assert h == w == 224 + + # Test the case that no need for cropping + imgs = list(np.random.rand(2, 224, 224, 3)) + results = dict(imgs=imgs) + random_crop = RandomCrop(size=224, lazy=True) + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert id(imgs) == id(random_crop_result['imgs']) + random_crop_result_fuse = Fuse()(random_crop_result) + assert 'lazy' not in random_crop_result_fuse + assert check_crop(imgs, random_crop_result_fuse['imgs'], + results['crop_bbox']) + h, w = random_crop_result_fuse['img_shape'] + assert h == w == 224 + + # Test the one-side-equal case + imgs = list(np.random.rand(2, 224, 225, 3)) + results = dict(imgs=imgs) + random_crop = RandomCrop(size=224, lazy=True) + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert id(imgs) == id(random_crop_result['imgs']) + random_crop_result_fuse = Fuse()(random_crop_result) + assert 'lazy' not in random_crop_result_fuse + assert check_crop(imgs, random_crop_result_fuse['imgs'], + results['crop_bbox']) + h, w = random_crop_result_fuse['img_shape'] + assert h == w == 224 + + assert repr(random_crop) == (f'{random_crop.__class__.__name__}' + f'(size={224}, lazy={True})') + + @staticmethod + def test_random_resized_crop_lazy(): + + target_keys = ['imgs', 'crop_bbox', 'img_shape', 'lazy'] + # There will be a slight difference because of rounding + eps = 0.01 + imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + + with pytest.raises(AssertionError): + # area_range[0] > area_range[1], which is wrong + random_crop = 
RandomResizedCrop(area_range=(0.9, 0.7), lazy=True) + random_crop(results) + with pytest.raises(AssertionError): + # aspect_ratio_range[0] < 0, which is wrong + random_crop = RandomResizedCrop( + aspect_ratio_range=(-0.1, 2.0), lazy=True) + random_crop(results) + + random_crop = RandomResizedCrop(lazy=True) + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert id(imgs) == id(random_crop_result['imgs']) + random_crop_result_fuse = Fuse()(random_crop_result) + assert check_crop(imgs, random_crop_result_fuse['imgs'], + results['crop_bbox']) + h, w = random_crop_result['img_shape'] + assert ((0.08 - eps <= h * w / 256 / 341) + and (h * w / 256 / 341 <= 1 + eps)) + assert (3. / 4. - eps <= h / w) and (h / w - eps <= 4. / 3.) + assert repr(random_crop) == (f'{random_crop.__class__.__name__}' + f'(area_range={(0.08, 1.0)}, ' + f'aspect_ratio_range={(3 / 4, 4 / 3)}, ' + f'lazy={True})') + + random_crop = RandomResizedCrop( + area_range=(0.9, 0.9), aspect_ratio_range=(10.0, 10.1), lazy=True) + # Test fallback cases by very big area range + imgs = np.random.rand(2, 256, 341, 3) + results = dict(imgs=imgs) + random_crop_result = random_crop(results) + assert assert_dict_has_keys(random_crop_result, target_keys) + assert id(imgs) == id(random_crop_result['imgs']) + random_crop_result_fuse = Fuse()(random_crop_result) + assert check_crop(imgs, random_crop_result_fuse['imgs'], + results['crop_bbox']) + h, w = random_crop_result['img_shape'] + assert h == w == 256 + + @staticmethod + def test_multi_scale_crop_lazy(): + with pytest.raises(TypeError): + # input_size must be int or tuple of int + MultiScaleCrop(0.5, lazy=True) + + with pytest.raises(TypeError): + # input_size must be int or tuple of int + MultiScaleCrop('224', lazy=True) + + with pytest.raises(TypeError): + # scales must be tuple. 
+ MultiScaleCrop( + 224, scales=[ + 1, + ], lazy=True) + + with pytest.raises(ValueError): + # num_fix_crops must be in [5, 13] + MultiScaleCrop(224, num_fixed_crops=6, lazy=True) + + target_keys = ['imgs', 'crop_bbox', 'img_shape', 'scales'] + + # MultiScaleCrop with normal crops. + imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + config = dict( + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0, + lazy=True) + multi_scale_crop = MultiScaleCrop(**config) + multi_scale_crop_result = multi_scale_crop(results) + assert id(imgs) == id(multi_scale_crop_result['imgs']) + assert assert_dict_has_keys(multi_scale_crop_result, target_keys) + multi_scale_crop_result_fuse = Fuse()(multi_scale_crop_result) + assert check_crop(imgs, multi_scale_crop_result_fuse['imgs'], + multi_scale_crop_result['crop_bbox']) + assert multi_scale_crop_result_fuse['img_shape'] in [(256, 256), + (204, 204)] + + # MultiScaleCrop with more fixed crops. + imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + config = dict( + input_size=224, + scales=(1, 0.8), + random_crop=False, + max_wh_scale_gap=0, + num_fixed_crops=13, + lazy=True) + multi_scale_crop = MultiScaleCrop(**config) + multi_scale_crop_result = multi_scale_crop(results) + assert id(imgs) == id(multi_scale_crop_result['imgs']) + assert assert_dict_has_keys(multi_scale_crop_result, target_keys) + multi_scale_crop_result_fuse = Fuse()(multi_scale_crop_result) + assert check_crop(imgs, multi_scale_crop_result_fuse['imgs'], + multi_scale_crop_result['crop_bbox']) + assert multi_scale_crop_result_fuse['img_shape'] in [(256, 256), + (204, 204)] + + # MultiScaleCrop with random crop. 
+ imgs = list(np.random.rand(2, 256, 341, 3)) + results = dict(imgs=imgs) + config = dict( + input_size=224, + scales=(1, 0.8), + random_crop=True, + max_wh_scale_gap=0, + lazy=True) + multi_scale_crop = MultiScaleCrop(**config) + multi_scale_crop_result = multi_scale_crop(results) + assert id(imgs) == id(multi_scale_crop_result['imgs']) + assert assert_dict_has_keys(multi_scale_crop_result, target_keys) + multi_scale_crop_result_fuse = Fuse()(multi_scale_crop_result) + assert check_crop(imgs, multi_scale_crop_result_fuse['imgs'], + multi_scale_crop_result['crop_bbox']) + assert (multi_scale_crop_result_fuse['img_shape'] in [(256, 256), + (204, 204)]) + + assert repr(multi_scale_crop) == ( + f'{multi_scale_crop.__class__.__name__}' + f'(input_size={(224, 224)}, scales={(1, 0.8)}, ' + f'max_wh_scale_gap={0}, random_crop={True}, ' + f'num_fixed_crops={5}, lazy={True})') + + @staticmethod + def test_resize_lazy(): + with pytest.raises(ValueError): + # scale must be positive + Resize(-0.5, lazy=True) + + with pytest.raises(TypeError): + # scale must be tuple of int + Resize('224', lazy=True) + + target_keys = [ + 'imgs', 'img_shape', 'keep_ratio', 'scale_factor', 'modality' + ] + + # scale with -1 to indicate np.inf + imgs = list(np.random.rand(2, 240, 320, 3)) + results = dict(imgs=imgs, modality='RGB') + resize = Resize(scale=(-1, 256), keep_ratio=True, lazy=True) + resize_results = resize(results) + assert id(imgs) == id(resize_results['imgs']) + assert assert_dict_has_keys(resize_results, target_keys) + resize_results_fuse = Fuse()(resize_results) + assert np.all(resize_results_fuse['scale_factor'] == np.array( + [341 / 320, 256 / 240], dtype=np.float32)) + assert resize_results_fuse['img_shape'] == (256, 341) + + # scale with a normal tuple (320, 320) to indicate np.inf + imgs = list(np.random.rand(2, 240, 320, 3)) + results = dict(imgs=imgs, modality='RGB') + resize = Resize(scale=(320, 320), keep_ratio=False, lazy=True) + resize_results = resize(results) + 
assert id(imgs) == id(resize_results['imgs']) + assert assert_dict_has_keys(resize_results, target_keys) + resize_results_fuse = Fuse()(resize_results) + assert np.all(resize_results_fuse['scale_factor'] == np.array( + [1, 320 / 240], dtype=np.float32)) + assert resize_results_fuse['img_shape'] == (320, 320) + + # scale with a normal tuple (341, 256) to indicate np.inf + imgs = list(np.random.rand(2, 240, 320, 3)) + results = dict(imgs=imgs, modality='RGB') + resize = Resize(scale=(341, 256), keep_ratio=False, lazy=True) + resize_results = resize(results) + assert id(imgs) == id(resize_results['imgs']) + assert assert_dict_has_keys(resize_results, target_keys) + resize_results_fuse = Fuse()(resize_results) + assert np.all(resize_results_fuse['scale_factor'] == np.array( + [341 / 320, 256 / 240], dtype=np.float32)) + assert resize_results_fuse['img_shape'] == (256, 341) + + assert repr(resize) == (f'{resize.__class__.__name__ }' + f'(scale={(341, 256)}, keep_ratio={False}, ' + + f'interpolation=bilinear, lazy={True})') + + @staticmethod + def test_flip_lazy(): + with pytest.raises(ValueError): + Flip(direction='vertically', lazy=True) + + target_keys = ['imgs', 'flip_direction', 'modality'] + + # do not flip imgs. + imgs = list(np.random.rand(2, 64, 64, 3)) + imgs_tmp = imgs.copy() + results = dict(imgs=imgs_tmp, modality='RGB') + flip = Flip(flip_ratio=0, direction='horizontal', lazy=True) + flip_results = flip(results) + assert id(imgs_tmp) == id(flip_results['imgs']) + assert assert_dict_has_keys(flip_results, target_keys) + flip_results_fuse = Fuse()(flip_results) + assert np.equal(imgs, results['imgs']).all() + assert id(flip_results['imgs']) == id(results['imgs']) + assert flip_results_fuse['imgs'][0].shape == (64, 64, 3) + + # always flip imgs horizontally. 
+ imgs = list(np.random.rand(2, 64, 64, 3)) + imgs_tmp = imgs.copy() + results = dict(imgs=imgs_tmp, modality='RGB') + flip = Flip(flip_ratio=1, direction='horizontal', lazy=True) + flip_results = flip(results) + assert id(imgs_tmp) == id(flip_results['imgs']) + assert assert_dict_has_keys(flip_results, target_keys) + flip_results_fuse = Fuse()(flip_results) + assert check_flip(imgs, flip_results['imgs'], + flip_results['flip_direction']) + assert id(flip_results['imgs']) == id(results['imgs']) + assert flip_results_fuse['imgs'][0].shape == (64, 64, 3) + + # always flip imgs vertically. + imgs = list(np.random.rand(2, 64, 64, 3)) + imgs_tmp = imgs.copy() + results = dict(imgs=imgs_tmp, modality='RGB') + flip = Flip(flip_ratio=1, direction='vertical', lazy=True) + flip_results = flip(results) + assert id(imgs_tmp) == id(flip_results['imgs']) + assert assert_dict_has_keys(flip_results, target_keys) + flip_results_fuse = Fuse()(flip_results) + assert check_flip(imgs, flip_results['imgs'], + flip_results['flip_direction']) + assert id(flip_results['imgs']) == id(results['imgs']) + assert flip_results_fuse['imgs'][0].shape == (64, 64, 3) + + assert repr(flip) == (f'{flip.__class__.__name__}' + f'(flip_ratio={1}, direction=vertical, ' + f'flip_label_map={None}, lazy={True})') + + @staticmethod + def test_center_crop_lazy(): + with pytest.raises(TypeError): + # crop_size must be int or tuple of int + CenterCrop(0.5) + + with pytest.raises(TypeError): + # crop_size must be int or tuple of int + CenterCrop('224') + + # center crop with crop_size 224 + imgs = list(np.random.rand(2, 240, 320, 3)) + results = dict(imgs=imgs) + center_crop = CenterCrop(crop_size=224, lazy=True) + center_crop_results = center_crop(results) + + target_keys = ['imgs', 'crop_bbox', 'img_shape'] + assert assert_dict_has_keys(center_crop_results, target_keys) + center_crop_results_fuse = Fuse()(center_crop_results) + assert check_crop(imgs, center_crop_results_fuse['imgs'], + 
center_crop_results['crop_bbox']) + assert np.all(center_crop_results_fuse['crop_bbox'] == np.array( + [48, 8, 272, 232])) + assert center_crop_results_fuse['img_shape'] == (224, 224) + + assert repr(center_crop) == (f'{center_crop.__class__.__name__}' + f'(crop_size={(224, 224)}, lazy={True})') diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_misc.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_misc.py new file mode 100644 index 0000000000000000000000000000000000000000..a3ad2c6abc50f19bb51c8bd27653bfa183bb2aca --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_misc.py @@ -0,0 +1,19 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmaction.datasets.pipelines.augmentations import (_combine_quadruple, + _flip_quadruple) + + +class TestQuadrupleOps: + + @staticmethod + def test_combine_quadruple(): + a = (0.1, 0.1, 0.5, 0.5) + b = (0.3, 0.3, 0.7, 0.7) + res = _combine_quadruple(a, b) + assert res == (0.25, 0.25, 0.35, 0.35) + + @staticmethod + def test_flip_quadruple(): + a = (0.1, 0.1, 0.5, 0.5) + res = _flip_quadruple(a) + assert res == (0.4, 0.1, 0.5, 0.5) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_normalization.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_normalization.py new file mode 100644 index 0000000000000000000000000000000000000000..ee3bb1cee3ad52c693c098ed4a7b09b8c1068a36 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_normalization.py @@ -0,0 +1,71 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets.pipelines import Normalize +from .base import check_normalize + + +class TestNormalization: + + @staticmethod + def test_normalize(): + with pytest.raises(TypeError): + # mean must be list, tuple or np.ndarray + Normalize( + dict(mean=[123.675, 116.28, 103.53]), [58.395, 57.12, 57.375]) + + with pytest.raises(TypeError): + # std must be list, tuple or np.ndarray + Normalize([123.675, 116.28, 103.53], + dict(std=[58.395, 57.12, 57.375])) + + target_keys = ['imgs', 'img_norm_cfg', 'modality'] + + # normalize imgs in RGB format + imgs = list(np.random.rand(2, 240, 320, 3).astype(np.float32)) + results = dict(imgs=imgs, modality='RGB') + config = dict( + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_bgr=False) + normalize = Normalize(**config) + normalize_results = normalize(results) + assert assert_dict_has_keys(normalize_results, target_keys) + check_normalize(imgs, normalize_results['imgs'], + normalize_results['img_norm_cfg']) + + # normalize flow imgs + imgs = list(np.random.rand(4, 240, 320).astype(np.float32)) + results = dict(imgs=imgs, modality='Flow') + config = dict(mean=[128, 128], std=[128, 128]) + normalize = Normalize(**config) + normalize_results = normalize(results) + assert assert_dict_has_keys(normalize_results, target_keys) + assert normalize_results['imgs'].shape == (2, 240, 320, 2) + x_components = np.array(imgs[0::2]) + y_components = np.array(imgs[1::2]) + x_components = (x_components - config['mean'][0]) / config['std'][0] + y_components = (y_components - config['mean'][1]) / config['std'][1] + result_imgs = np.stack([x_components, y_components], axis=-1) + assert np.all(np.isclose(result_imgs, normalize_results['imgs'])) + + # normalize imgs in BGR format + imgs = list(np.random.rand(2, 240, 320, 3).astype(np.float32)) + results = dict(imgs=imgs, modality='RGB') + config = dict( + mean=[123.675, 116.28, 103.53], + 
std=[58.395, 57.12, 57.375], + to_bgr=True) + normalize = Normalize(**config) + normalize_results = normalize(results) + assert assert_dict_has_keys(normalize_results, target_keys) + check_normalize(imgs, normalize_results['imgs'], + normalize_results['img_norm_cfg']) + + assert normalize.__repr__() == ( + normalize.__class__.__name__ + + f'(mean={np.array([123.675, 116.28, 103.53])}, ' + + f'std={np.array([58.395, 57.12, 57.375])}, to_bgr={True}, ' + f'adjust_magnitude={False})') diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_pytorchvideo.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_pytorchvideo.py new file mode 100644 index 0000000000000000000000000000000000000000..61ab7d28d16261abcd9970baa5cf68546524cf17 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_pytorchvideo.py @@ -0,0 +1,71 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys, digit_version + +try: + import torch + + from mmaction.datasets.pipelines import PytorchVideoTrans + pytorchvideo_ok = False + if digit_version(torch.__version__) >= digit_version('1.8.0'): + pytorchvideo_ok = True +except (ImportError, ModuleNotFoundError): + pytorchvideo_ok = False + + +@pytest.mark.skipif(not pytorchvideo_ok, reason='torch >= 1.8.0 is required') +class TestPytorchVideoTrans: + + @staticmethod + def test_pytorchvideo_trans(): + with pytest.raises(AssertionError): + # transforms not supported in pytorchvideo + PytorchVideoTrans(type='BlaBla') + + with pytest.raises(AssertionError): + # This trans exists in pytorchvideo but not supported in MMAction2 + PytorchVideoTrans(type='MixUp') + + target_keys = ['imgs'] + + imgs = list(np.random.randint(0, 256, (4, 32, 32, 3)).astype(np.uint8)) + results = dict(imgs=imgs) + + # test AugMix + augmix = PytorchVideoTrans(type='AugMix') + results = augmix(results) + assert assert_dict_has_keys(results, target_keys) + assert (all(img.shape == (32, 32, 3) for img in results['imgs'])) + + # test RandAugment + rand_augment = PytorchVideoTrans(type='RandAugment') + results = rand_augment(results) + assert assert_dict_has_keys(results, target_keys) + assert (all(img.shape == (32, 32, 3) for img in results['imgs'])) + + # test RandomResizedCrop + random_resized_crop = PytorchVideoTrans( + type='RandomResizedCrop', + target_height=16, + target_width=16, + scale=(0.1, 1.), + aspect_ratio=(0.8, 1.2)) + results = random_resized_crop(results) + assert assert_dict_has_keys(results, target_keys) + assert (all(img.shape == (16, 16, 3) for img in results['imgs'])) + + # test ShortSideScale + short_side_scale = PytorchVideoTrans(type='ShortSideScale', size=24) + results = short_side_scale(results) + assert assert_dict_has_keys(results, target_keys) + assert (all(img.shape == (24, 24, 3) for img in results['imgs'])) + + # test RandomShortSideScale + 
random_short_side_scale = PytorchVideoTrans( + type='RandomShortSideScale', min_size=24, max_size=36) + results = random_short_side_scale(results) + target_shape = results['imgs'][0].shape + assert 36 >= target_shape[0] >= 24 + assert assert_dict_has_keys(results, target_keys) + assert (all(img.shape == target_shape for img in results['imgs'])) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_transform.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_transform.py new file mode 100644 index 0000000000000000000000000000000000000000..31abd647f15706ca37629237f67cfc14cbbab17a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_augmentations/test_transform.py @@ -0,0 +1,160 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy + +import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_almost_equal + +from mmaction.datasets.pipelines import RandomRescale, Resize +from mmaction.datasets.pipelines.augmentations import PoseCompact + + +class TestTransform: + + @staticmethod + def test_random_rescale(): + with pytest.raises(AssertionError): + # scale_range must be a tuple of int + RandomRescale(scale_range=224) + + with pytest.raises(AssertionError): + # scale_range must be a tuple of int + RandomRescale(scale_range=(224.0, 256.0)) + + with pytest.raises(AssertionError): + # scale_range[0] > scale_range[1], which is wrong + RandomRescale(scale_range=(320, 256)) + + with pytest.raises(AssertionError): + # scale_range[0] <= 0, which is wrong + RandomRescale(scale_range=(0, 320)) + + target_keys = ['imgs', 'short_edge', 'img_shape'] + # There will be a slight difference because of rounding + eps = 0.01 + imgs = list(np.random.rand(2, 256, 340, 3)) + results = dict(imgs=imgs, img_shape=(256, 340), modality='RGB') + + random_rescale = RandomRescale(scale_range=(300, 400)) + 
random_rescale_result = random_rescale(results) + + assert assert_dict_has_keys(random_rescale_result, target_keys) + + h, w = random_rescale_result['img_shape'] + + # check rescale + assert np.abs(h / 256 - w / 340) < eps + assert 300 / 256 - eps <= h / 256 <= 400 / 256 + eps + assert repr(random_rescale) == (f'{random_rescale.__class__.__name__}' + f'(scale_range={(300, 400)}, ' + 'interpolation=bilinear)') + + @staticmethod + def test_resize(): + with pytest.raises(ValueError): + # scale must be positive + Resize(-0.5) + + with pytest.raises(TypeError): + # scale must be tuple of int + Resize('224') + + target_keys = [ + 'imgs', 'img_shape', 'keep_ratio', 'scale_factor', 'modality' + ] + + # test resize for flow images + imgs = list(np.random.rand(2, 240, 320)) + kp = np.array([60, 60]).reshape([1, 1, 1, 2]) + results = dict(imgs=imgs, keypoint=kp, modality='Flow') + resize = Resize(scale=(160, 80), keep_ratio=False) + resize_results = resize(results) + assert assert_dict_has_keys(resize_results, target_keys) + assert np.all(resize_results['scale_factor'] == np.array( + [.5, 1. 
/ 3.], dtype=np.float32)) + assert resize_results['img_shape'] == (80, 160) + kp = resize_results['keypoint'][0, 0, 0] + assert_array_almost_equal(kp, np.array([30, 20])) + + # scale with -1 to indicate np.inf + imgs = list(np.random.rand(2, 240, 320, 3)) + results = dict(imgs=imgs, modality='RGB') + results['gt_bboxes'] = np.array([[0, 0, 320, 240]]) + results['proposals'] = np.array([[0, 0, 320, 240]]) + resize = Resize(scale=(-1, 256), keep_ratio=True) + resize_results = resize(results) + assert assert_dict_has_keys(resize_results, target_keys) + assert np.all(resize_results['scale_factor'] == np.array( + [341 / 320, 256 / 240], dtype=np.float32)) + assert resize_results['img_shape'] == (256, 341) + + # scale with a normal tuple (320, 320) to indicate np.inf + imgs = list(np.random.rand(2, 240, 320, 3)) + results = dict(imgs=imgs, modality='RGB') + resize = Resize(scale=(320, 320), keep_ratio=False) + resize_results = resize(results) + assert assert_dict_has_keys(resize_results, target_keys) + assert np.all(resize_results['scale_factor'] == np.array( + [1, 320 / 240], dtype=np.float32)) + assert resize_results['img_shape'] == (320, 320) + + # scale with a normal tuple (341, 256) to indicate np.inf + imgs = list(np.random.rand(2, 240, 320, 3)) + results = dict(imgs=imgs, modality='RGB') + resize = Resize(scale=(341, 256), keep_ratio=False) + resize_results = resize(results) + assert assert_dict_has_keys(resize_results, target_keys) + assert np.all(resize_results['scale_factor'] == np.array( + [341 / 320, 256 / 240], dtype=np.float32)) + assert resize_results['img_shape'] == (256, 341) + + assert repr(resize) == ( + resize.__class__.__name__ + + f'(scale={(341, 256)}, keep_ratio={False}, ' + + f'interpolation=bilinear, lazy={False})') + + +class TestPoseCompact: + + @staticmethod + def test_pose_compact(): + results = {} + results['img_shape'] = (100, 100) + fake_kp = np.zeros([1, 4, 2, 2]) + fake_kp[:, :, 0] = [10, 10] + fake_kp[:, :, 1] = [90, 90] + 
results['keypoint'] = fake_kp + + pose_compact = PoseCompact( + padding=0, threshold=0, hw_ratio=None, allow_imgpad=False) + inp = copy.deepcopy(results) + ret = pose_compact(inp) + assert ret['img_shape'] == (80, 80) + assert str(pose_compact) == ( + 'PoseCompact(padding=0, threshold=0, hw_ratio=None, ' + 'allow_imgpad=False)') + + pose_compact = PoseCompact( + padding=0.3, threshold=0, hw_ratio=None, allow_imgpad=False) + inp = copy.deepcopy(results) + ret = pose_compact(inp) + assert ret['img_shape'] == (100, 100) + + pose_compact = PoseCompact( + padding=0.3, threshold=0, hw_ratio=None, allow_imgpad=True) + inp = copy.deepcopy(results) + ret = pose_compact(inp) + assert ret['img_shape'] == (104, 104) + + pose_compact = PoseCompact( + padding=0, threshold=100, hw_ratio=None, allow_imgpad=False) + inp = copy.deepcopy(results) + ret = pose_compact(inp) + assert ret['img_shape'] == (100, 100) + + pose_compact = PoseCompact( + padding=0, threshold=0, hw_ratio=0.75, allow_imgpad=True) + inp = copy.deepcopy(results) + ret = pose_compact(inp) + assert ret['img_shape'] == (80, 106) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..fe54e15c46f8ecd1b68cfb1356fe5d63e62198d4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .base import BaseTestLoading + +__all__ = ['BaseTestLoading'] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/base.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/base.py new file mode 100644 index 0000000000000000000000000000000000000000..3c74628779f14c1463d2a39240c049f9225c5589 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/base.py @@ -0,0 +1,93 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + +import mmcv +import numpy as np + + +class BaseTestLoading: + + @classmethod + def setup_class(cls): + cls.data_prefix = osp.normpath( + osp.join(osp.dirname(__file__), '../../../data')) + cls.img_path = osp.join(cls.data_prefix, 'test.jpg') + cls.video_path = osp.join(cls.data_prefix, 'test.mp4') + cls.wav_path = osp.join(cls.data_prefix, 'test.wav') + cls.audio_spec_path = osp.join(cls.data_prefix, 'test.npy') + cls.img_dir = osp.join(cls.data_prefix, 'imgs') + cls.raw_feature_dir = osp.join(cls.data_prefix, 'activitynet_features') + cls.bsp_feature_dir = osp.join(cls.data_prefix, 'bsp_features') + cls.proposals_dir = osp.join(cls.data_prefix, 'proposals') + + cls.total_frames = 5 + cls.filename_tmpl = 'img_{:05}.jpg' + cls.flow_filename_tmpl = '{}_{:05d}.jpg' + video_total_frames = len(mmcv.VideoReader(cls.video_path)) + cls.audio_total_frames = video_total_frames + + cls.video_results = dict( + filename=cls.video_path, + label=1, + total_frames=video_total_frames, + start_index=0) + cls.audio_results = dict( + audios=np.random.randn(1280, ), + audio_path=cls.wav_path, + total_frames=cls.audio_total_frames, + label=1, + start_index=0) + cls.audio_feature_results = dict( + audios=np.random.randn(128, 80), + audio_path=cls.audio_spec_path, + total_frames=cls.audio_total_frames, + label=1, + start_index=0) + cls.frame_results = dict( + frame_dir=cls.img_dir, + total_frames=cls.total_frames, + 
filename_tmpl=cls.filename_tmpl, + start_index=1, + modality='RGB', + offset=0, + label=1) + cls.flow_frame_results = dict( + frame_dir=cls.img_dir, + total_frames=cls.total_frames, + filename_tmpl=cls.flow_filename_tmpl, + modality='Flow', + offset=0, + label=1) + cls.action_results = dict( + video_name='v_test1', + data_prefix=cls.raw_feature_dir, + temporal_scale=5, + boundary_ratio=0.1, + duration_second=10, + duration_frame=10, + feature_frame=8, + annotations=[{ + 'segment': [3.0, 5.0], + 'label': 'Rock climbing' + }]) + from mmaction.datasets.ssn_dataset import SSNInstance + cls.proposal_results = dict( + frame_dir=cls.img_dir, + video_id='imgs', + total_frames=cls.total_frames, + filename_tmpl=cls.filename_tmpl, + start_index=1, + out_proposals=[[['imgs', SSNInstance(1, 4, 10, 1, 1, 1)], 0], + [['imgs', SSNInstance(2, 5, 10, 2, 1, 1)], 0]]) + + cls.ava_results = dict( + fps=30, timestamp=902, timestamp_start=840, shot_info=(0, 27000)) + + cls.hvu_label_example1 = dict( + categories=['action', 'object', 'scene', 'concept'], + category_nums=[2, 5, 3, 2], + label=dict(action=[0], object=[2, 3], scene=[0, 1])) + cls.hvu_label_example2 = dict( + categories=['action', 'object', 'scene', 'concept'], + category_nums=[2, 5, 3, 2], + label=dict(action=[1], scene=[1, 2], concept=[1])) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_decode.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_decode.py new file mode 100644 index 0000000000000000000000000000000000000000..aca0943d245076d54b9a860fe43ea189ec7f0b54 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_decode.py @@ -0,0 +1,498 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import platform + +import numpy as np +from mmcv.utils import assert_dict_has_keys + +from mmaction.datasets.pipelines import (AudioDecode, AudioDecodeInit, + DecordDecode, DecordInit, + OpenCVDecode, OpenCVInit, PIMSDecode, + PIMSInit, PyAVDecode, + PyAVDecodeMotionVector, PyAVInit, + RawFrameDecode) +from .base import BaseTestLoading + + +class TestDecode(BaseTestLoading): + + def test_pyav_init(self): + target_keys = ['video_reader', 'total_frames'] + video_result = copy.deepcopy(self.video_results) + pyav_init = PyAVInit() + pyav_init_result = pyav_init(video_result) + assert assert_dict_has_keys(pyav_init_result, target_keys) + assert pyav_init_result['total_frames'] == 300 + assert repr( + pyav_init) == f'{pyav_init.__class__.__name__}(io_backend=disk)' + + def test_pyav_decode(self): + target_keys = ['frame_inds', 'imgs', 'original_shape'] + + # test PyAV with 2 dim input and start_index = 0 + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, self.total_frames, + 2)[:, np.newaxis] + pyav_init = PyAVInit() + pyav_init_result = pyav_init(video_result) + video_result['video_reader'] = pyav_init_result['video_reader'] + + pyav_decode = PyAVDecode() + pyav_decode_result = pyav_decode(video_result) + assert assert_dict_has_keys(pyav_decode_result, target_keys) + assert pyav_decode_result['original_shape'] == (256, 340) + assert np.shape(pyav_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + assert repr(pyav_decode) == (f'{pyav_decode.__class__.__name__}(' + f'multi_thread={False}, mode=accurate)') + + # test PyAV with 1 dim input and start_index = 0 + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, self.total_frames, 5) + pyav_init = PyAVInit() + pyav_init_result = pyav_init(video_result) + video_result['video_reader'] = pyav_init_result['video_reader'] + + pyav_decode = PyAVDecode() + pyav_decode_result = pyav_decode(video_result) + 
assert assert_dict_has_keys(pyav_decode_result, target_keys) + assert pyav_decode_result['original_shape'] == (256, 340) + assert np.shape(pyav_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # PyAV with multi thread and start_index = 0 + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, self.total_frames, 5) + pyav_init = PyAVInit() + pyav_init_result = pyav_init(video_result) + video_result['video_reader'] = pyav_init_result['video_reader'] + + pyav_decode = PyAVDecode(multi_thread=True) + pyav_decode_result = pyav_decode(video_result) + assert assert_dict_has_keys(pyav_decode_result, target_keys) + assert pyav_decode_result['original_shape'] == (256, 340) + assert np.shape(pyav_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + assert repr(pyav_decode) == (f'{pyav_decode.__class__.__name__}(' + f'multi_thread={True}, mode=accurate)') + + # test PyAV with 2 dim input + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(1, self.total_frames, + 2)[:, np.newaxis] + pyav_init = PyAVInit() + pyav_init_result = pyav_init(video_result) + video_result['video_reader'] = pyav_init_result['video_reader'] + + pyav_decode = PyAVDecode() + pyav_decode_result = pyav_decode(video_result) + assert assert_dict_has_keys(pyav_decode_result, target_keys) + assert pyav_decode_result['original_shape'] == (256, 340) + assert np.shape(pyav_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # test PyAV with 1 dim input + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(1, self.total_frames, 5) + pyav_init = PyAVInit() + pyav_init_result = pyav_init(video_result) + video_result['video_reader'] = pyav_init_result['video_reader'] + + pyav_decode = PyAVDecode() + pyav_decode_result = pyav_decode(video_result) + assert assert_dict_has_keys(pyav_decode_result, target_keys) + assert 
pyav_decode_result['original_shape'] == (256, 340) + assert np.shape(pyav_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # PyAV with multi thread + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(1, self.total_frames, 5) + pyav_init = PyAVInit() + pyav_init_result = pyav_init(video_result) + video_result['video_reader'] = pyav_init_result['video_reader'] + + pyav_decode = PyAVDecode(multi_thread=True) + pyav_decode_result = pyav_decode(video_result) + assert assert_dict_has_keys(pyav_decode_result, target_keys) + assert pyav_decode_result['original_shape'] == (256, 340) + assert np.shape(pyav_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # PyAV with efficient mode + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(1, self.total_frames, 5) + pyav_init = PyAVInit() + pyav_init_result = pyav_init(video_result) + video_result['video_reader'] = pyav_init_result['video_reader'] + + pyav_decode = PyAVDecode(multi_thread=True, mode='efficient') + pyav_decode_result = pyav_decode(video_result) + assert assert_dict_has_keys(pyav_decode_result, target_keys) + assert pyav_decode_result['original_shape'] == (256, 340) + assert np.shape(pyav_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + assert pyav_decode_result['video_reader'] is None + + assert (repr(pyav_decode) == pyav_decode.__class__.__name__ + + f'(multi_thread={True}, mode=efficient)') + + def test_pims_init(self): + target_keys = ['video_reader', 'total_frames'] + video_result = copy.deepcopy(self.video_results) + pims_init = PIMSInit() + pims_init_result = pims_init(video_result) + assert assert_dict_has_keys(pims_init_result, target_keys) + assert pims_init_result['total_frames'] == 300 + + pims_init = PIMSInit(mode='efficient') + pims_init_result = pims_init(video_result) + assert assert_dict_has_keys(pims_init_result, target_keys) + 
assert pims_init_result['total_frames'] == 300 + + assert repr(pims_init) == (f'{pims_init.__class__.__name__}' + f'(io_backend=disk, mode=efficient)') + + def test_pims_decode(self): + target_keys = ['frame_inds', 'imgs', 'original_shape'] + + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, self.total_frames, + 2)[:, np.newaxis] + pims_init = PIMSInit() + pims_init_result = pims_init(video_result) + + pims_decode = PIMSDecode() + pims_decode_result = pims_decode(pims_init_result) + assert assert_dict_has_keys(pims_decode_result, target_keys) + assert pims_decode_result['original_shape'] == (256, 340) + assert np.shape(pims_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + def test_decord_init(self): + target_keys = ['video_reader', 'total_frames'] + video_result = copy.deepcopy(self.video_results) + decord_init = DecordInit() + decord_init_result = decord_init(video_result) + assert assert_dict_has_keys(decord_init_result, target_keys) + assert decord_init_result['total_frames'] == len( + decord_init_result['video_reader']) + assert repr(decord_init) == (f'{decord_init.__class__.__name__}(' + f'io_backend=disk, ' + f'num_threads={1})') + + def test_decord_decode(self): + target_keys = ['frame_inds', 'imgs', 'original_shape'] + + # test Decord with 2 dim input and start_index = 0 + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, self.total_frames, + 3)[:, np.newaxis] + decord_init = DecordInit() + decord_init_result = decord_init(video_result) + video_result['video_reader'] = decord_init_result['video_reader'] + + decord_decode = DecordDecode() + decord_decode_result = decord_decode(video_result) + assert assert_dict_has_keys(decord_decode_result, target_keys) + assert decord_decode_result['original_shape'] == (256, 340) + assert np.shape(decord_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # test Decord with 1 
dim input and start_index = 0 + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, self.total_frames, 3) + decord_init = DecordInit() + decord_init_result = decord_init(video_result) + video_result['video_reader'] = decord_init_result['video_reader'] + + decord_decode = DecordDecode() + decord_decode_result = decord_decode(video_result) + assert assert_dict_has_keys(decord_decode_result, target_keys) + assert decord_decode_result['original_shape'] == (256, 340) + assert np.shape(decord_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # test Decord with 2 dim input and start_index = 0 + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, self.total_frames, + 3)[:, np.newaxis] + decord_init = DecordInit() + decord_init_result = decord_init(video_result) + video_result['video_reader'] = decord_init_result['video_reader'] + + decord_decode = DecordDecode() + decord_decode_result = decord_decode(video_result) + assert assert_dict_has_keys(decord_decode_result, target_keys) + assert decord_decode_result['original_shape'] == (256, 340) + assert np.shape(decord_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # test Decord with 1 dim input + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(1, self.total_frames, 3) + decord_init = DecordInit() + decord_init_result = decord_init(video_result) + video_result['video_reader'] = decord_init_result['video_reader'] + + decord_decode = DecordDecode(mode='efficient') + decord_decode_result = decord_decode(video_result) + assert assert_dict_has_keys(decord_decode_result, target_keys) + assert decord_decode_result['original_shape'] == (256, 340) + assert np.shape(decord_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + assert repr(decord_decode) == (f'{decord_decode.__class__.__name__}(' + f'mode=efficient)') + + def 
test_opencv_init(self): + target_keys = ['new_path', 'video_reader', 'total_frames'] + video_result = copy.deepcopy(self.video_results) + opencv_init = OpenCVInit() + opencv_init_result = opencv_init(video_result) + assert assert_dict_has_keys(opencv_init_result, target_keys) + assert opencv_init_result['total_frames'] == len( + opencv_init_result['video_reader']) + assert repr(opencv_init) == (f'{opencv_init.__class__.__name__}(' + f'io_backend=disk)') + + def test_opencv_decode(self): + target_keys = ['frame_inds', 'imgs', 'original_shape'] + + # test OpenCV with 2 dim input when start_index = 0 + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, self.total_frames, + 2)[:, np.newaxis] + opencv_init = OpenCVInit() + opencv_init_result = opencv_init(video_result) + video_result['video_reader'] = opencv_init_result['video_reader'] + + opencv_decode = OpenCVDecode() + opencv_decode_result = opencv_decode(video_result) + assert assert_dict_has_keys(opencv_decode_result, target_keys) + assert opencv_decode_result['original_shape'] == (256, 340) + assert np.shape(opencv_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # test OpenCV with 2 dim input + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(1, self.total_frames, + 2)[:, np.newaxis] + opencv_init = OpenCVInit() + opencv_init_result = opencv_init(video_result) + video_result['video_reader'] = opencv_init_result['video_reader'] + + opencv_decode = OpenCVDecode() + opencv_decode_result = opencv_decode(video_result) + assert assert_dict_has_keys(opencv_decode_result, target_keys) + assert opencv_decode_result['original_shape'] == (256, 340) + assert np.shape(opencv_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + # test OpenCV with 1 dim input when start_index = 0 + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(0, 
self.total_frames, 3) + opencv_init = OpenCVInit() + opencv_init_result = opencv_init(video_result) + video_result['video_reader'] = opencv_init_result['video_reader'] + + # test OpenCV with 1 dim input + video_result = copy.deepcopy(self.video_results) + video_result['frame_inds'] = np.arange(1, self.total_frames, 3) + opencv_init = OpenCVInit() + opencv_init_result = opencv_init(video_result) + video_result['video_reader'] = opencv_init_result['video_reader'] + + opencv_decode = OpenCVDecode() + opencv_decode_result = opencv_decode(video_result) + assert assert_dict_has_keys(opencv_decode_result, target_keys) + assert opencv_decode_result['original_shape'] == (256, 340) + assert np.shape(opencv_decode_result['imgs']) == (len( + video_result['frame_inds']), 256, 340, 3) + + def test_rawframe_decode(self): + target_keys = ['frame_inds', 'imgs', 'original_shape', 'modality'] + + # test frame selector with 2 dim input when start_index = 0 + inputs = copy.deepcopy(self.frame_results) + inputs['frame_inds'] = np.arange(0, self.total_frames, 2)[:, + np.newaxis] + # since the test images start at index 1, we add 1 to frame_inds + # in order to pass the CI + inputs['frame_inds'] = inputs['frame_inds'] + 1 + + inputs['gt_bboxes'] = np.array([[0, 0, 1, 1]]) + inputs['proposals'] = np.array([[0, 0, 1, 1]]) + frame_selector = RawFrameDecode(io_backend='disk') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']), 240, + 320, 3) + assert results['original_shape'] == (240, 320) + + # test frame selector with 2 dim input + inputs = copy.deepcopy(self.frame_results) + inputs['frame_inds'] = np.arange(1, self.total_frames, 2)[:, + np.newaxis] + frame_selector = RawFrameDecode(io_backend='disk') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']), 240, + 320, 3) + assert results['original_shape'] == (240, 
320) + + # test frame selector with 1 dim input when start_index = 0 + inputs = copy.deepcopy(self.frame_results) + inputs['frame_inds'] = np.arange(0, self.total_frames, 5) + # since the test images start at index 1, we add 1 to frame_inds + # in order to pass the CI + inputs['frame_inds'] = inputs['frame_inds'] + 1 + frame_selector = RawFrameDecode(io_backend='disk') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']), 240, + 320, 3) + assert results['original_shape'] == (240, 320) + + # test frame selector with 1 dim input + inputs = copy.deepcopy(self.frame_results) + inputs['frame_inds'] = np.arange(1, self.total_frames, 5) + frame_selector = RawFrameDecode(io_backend='disk') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']), 240, + 320, 3) + assert results['original_shape'] == (240, 320) + + # test frame selector with 1 dim input when start_index = 0 + inputs = copy.deepcopy(self.frame_results) + inputs['frame_inds'] = np.arange(0, self.total_frames, 2) + # since the test images start at index 1, we add 1 to frame_inds + # in order to pass the CI + inputs['frame_inds'] = inputs['frame_inds'] + 1 + frame_selector = RawFrameDecode(io_backend='disk') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']), 240, + 320, 3) + assert results['original_shape'] == (240, 320) + + # test frame selector with 1 dim input + inputs = copy.deepcopy(self.frame_results) + inputs['frame_inds'] = np.arange(1, self.total_frames, 2) + frame_selector = RawFrameDecode(io_backend='disk') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']), 240, + 320, 3) + assert results['original_shape'] == (240, 320) + + # 
test frame selector with 1 dim input for flow images + inputs = copy.deepcopy(self.flow_frame_results) + inputs['frame_inds'] = np.arange(0, self.total_frames, 2) + # since the test images start at index 1, we add 1 to frame_inds + # in order to pass the CI + inputs['frame_inds'] = inputs['frame_inds'] + 1 + frame_selector = RawFrameDecode(io_backend='disk') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']) * 2, + 240, 320) + assert results['original_shape'] == (240, 320) + + # test frame selector with 1 dim input for flow images + inputs = copy.deepcopy(self.flow_frame_results) + inputs['frame_inds'] = np.arange(1, self.total_frames, 2) + frame_selector = RawFrameDecode(io_backend='disk') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']) * 2, + 240, 320) + assert results['original_shape'] == (240, 320) + + if platform.system() != 'Windows': + # test frame selector in turbojpeg decoding backend + # when start_index = 0 + inputs = copy.deepcopy(self.frame_results) + inputs['frame_inds'] = np.arange(0, self.total_frames, 5) + # since the test images start at index 1, we add 1 to frame_inds + # in order to pass the CI + inputs['frame_inds'] = inputs['frame_inds'] + 1 + frame_selector = RawFrameDecode( + io_backend='disk', decoding_backend='turbojpeg') + results = frame_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']), + 240, 320, 3) + assert results['original_shape'] == (240, 320) + + # test frame selector in turbojpeg decoding backend + inputs = copy.deepcopy(self.frame_results) + inputs['frame_inds'] = np.arange(1, self.total_frames, 5) + frame_selector = RawFrameDecode( + io_backend='disk', decoding_backend='turbojpeg') + results = frame_selector(inputs) + assert 
assert_dict_has_keys(results, target_keys) + assert np.shape(results['imgs']) == (len(inputs['frame_inds']), + 240, 320, 3) + assert results['original_shape'] == (240, 320) + assert repr(frame_selector) == ( + f'{frame_selector.__class__.__name__}(io_backend=disk, ' + f'decoding_backend=turbojpeg)') + + def test_audio_decode_init(self): + target_keys = ['audios', 'length', 'sample_rate'] + inputs = copy.deepcopy(self.audio_results) + audio_decode_init = AudioDecodeInit() + results = audio_decode_init(inputs) + assert assert_dict_has_keys(results, target_keys) + + # test when no audio file exists + inputs = copy.deepcopy(self.audio_results) + inputs['audio_path'] = 'foo/foo/bar.wav' + audio_decode_init = AudioDecodeInit() + results = audio_decode_init(inputs) + assert assert_dict_has_keys(results, target_keys) + assert results['audios'].shape == (10.0 * + audio_decode_init.sample_rate, ) + assert repr(audio_decode_init) == ( + f'{audio_decode_init.__class__.__name__}(' + f'io_backend=disk, ' + f'sample_rate=16000, ' + f'pad_method=zero)') + + def test_audio_decode(self): + target_keys = ['frame_inds', 'audios'] + inputs = copy.deepcopy(self.audio_results) + inputs['frame_inds'] = np.arange(0, self.audio_total_frames, + 2)[:, np.newaxis] + inputs['num_clips'] = 1 + inputs['length'] = 1280 + audio_selector = AudioDecode() + results = audio_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + + def test_pyav_decode_motion_vector(self): + pyav_init = PyAVInit() + pyav = PyAVDecodeMotionVector() + + # test pyav with 2-dim input + results = { + 'filename': self.video_path, + 'frame_inds': np.arange(0, 32, 1)[:, np.newaxis] + } + results = pyav_init(results) + results = pyav(results) + target_keys = ['motion_vectors'] + assert assert_dict_has_keys(results, target_keys) + + # test pyav with 1 dim input + results = { + 'filename': self.video_path, + 'frame_inds': np.arange(0, 32, 1) + } + pyav_init = PyAVInit() + results = pyav_init(results) + pyav = 
PyAVDecodeMotionVector() + results = pyav(results) + + assert assert_dict_has_keys(results, target_keys) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_load.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_load.py new file mode 100644 index 0000000000000000000000000000000000000000..560edd090369fe411dd9c4a69b265c66fa1f23e3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_load.py @@ -0,0 +1,152 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy + +import numpy as np +import pytest +import torch +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_almost_equal + +from mmaction.datasets.pipelines import (LoadAudioFeature, LoadHVULabel, + LoadLocalizationFeature, + LoadProposals) +from .base import BaseTestLoading + + +class TestLoad(BaseTestLoading): + + def test_load_hvu_label(self): + hvu_label_example1 = copy.deepcopy(self.hvu_label_example1) + hvu_label_example2 = copy.deepcopy(self.hvu_label_example2) + categories = hvu_label_example1['categories'] + category_nums = hvu_label_example1['category_nums'] + num_tags = sum(category_nums) + num_categories = len(categories) + + loader = LoadHVULabel() + assert repr(loader) == (f'{loader.__class__.__name__}(' + f'hvu_initialized={False})') + + result1 = loader(hvu_label_example1) + label1 = torch.zeros(num_tags) + mask1 = torch.zeros(num_tags) + category_mask1 = torch.zeros(num_categories) + + assert repr(loader) == (f'{loader.__class__.__name__}(' + f'hvu_initialized={True})') + + label1[[0, 4, 5, 7, 8]] = 1. + mask1[:10] = 1. + category_mask1[:3] = 1. 
+ + assert torch.all(torch.eq(label1, result1['label'])) + assert torch.all(torch.eq(mask1, result1['mask'])) + assert torch.all(torch.eq(category_mask1, result1['category_mask'])) + + result2 = loader(hvu_label_example2) + label2 = torch.zeros(num_tags) + mask2 = torch.zeros(num_tags) + category_mask2 = torch.zeros(num_categories) + + label2[[1, 8, 9, 11]] = 1. + mask2[:2] = 1. + mask2[7:] = 1. + category_mask2[[0, 2, 3]] = 1. + + assert torch.all(torch.eq(label2, result2['label'])) + assert torch.all(torch.eq(mask2, result2['mask'])) + assert torch.all(torch.eq(category_mask2, result2['category_mask'])) + + def test_load_localization_feature(self): + target_keys = ['raw_feature'] + + action_result = copy.deepcopy(self.action_results) + + # test error cases + with pytest.raises(NotImplementedError): + load_localization_feature = LoadLocalizationFeature( + 'unsupport_ext') + + # test normal cases + load_localization_feature = LoadLocalizationFeature() + load_localization_feature_result = load_localization_feature( + action_result) + assert assert_dict_has_keys(load_localization_feature_result, + target_keys) + assert load_localization_feature_result['raw_feature'].shape == (400, + 5) + assert repr(load_localization_feature) == ( + f'{load_localization_feature.__class__.__name__}(' + f'raw_feature_ext=.csv)') + + def test_load_proposals(self): + target_keys = [ + 'bsp_feature', 'tmin', 'tmax', 'tmin_score', 'tmax_score', + 'reference_temporal_iou' + ] + + action_result = copy.deepcopy(self.action_results) + + # test error cases + with pytest.raises(NotImplementedError): + load_proposals = LoadProposals(5, self.proposals_dir, + self.bsp_feature_dir, + 'unsupport_ext') + + with pytest.raises(NotImplementedError): + load_proposals = LoadProposals(5, self.proposals_dir, + self.bsp_feature_dir, '.csv', + 'unsupport_ext') + + # test normal cases + load_proposals = LoadProposals(5, self.proposals_dir, + self.bsp_feature_dir) + load_proposals_result = 
load_proposals(action_result) + assert assert_dict_has_keys(load_proposals_result, target_keys) + assert load_proposals_result['bsp_feature'].shape[0] == 5 + assert load_proposals_result['tmin'].shape == (5, ) + assert_array_almost_equal( + load_proposals_result['tmin'], np.arange(0.1, 0.6, 0.1), decimal=4) + assert load_proposals_result['tmax'].shape == (5, ) + assert_array_almost_equal( + load_proposals_result['tmax'], np.arange(0.2, 0.7, 0.1), decimal=4) + assert load_proposals_result['tmin_score'].shape == (5, ) + assert_array_almost_equal( + load_proposals_result['tmin_score'], + np.arange(0.95, 0.90, -0.01), + decimal=4) + assert load_proposals_result['tmax_score'].shape == (5, ) + assert_array_almost_equal( + load_proposals_result['tmax_score'], + np.arange(0.96, 0.91, -0.01), + decimal=4) + assert load_proposals_result['reference_temporal_iou'].shape == (5, ) + assert_array_almost_equal( + load_proposals_result['reference_temporal_iou'], + np.arange(0.85, 0.80, -0.01), + decimal=4) + assert repr(load_proposals) == ( + f'{load_proposals.__class__.__name__}(' + f'top_k={5}, ' + f'pgm_proposals_dir={self.proposals_dir}, ' + f'pgm_features_dir={self.bsp_feature_dir}, ' + f'proposal_ext=.csv, ' + f'feature_ext=.npy)') + + def test_load_audio_feature(self): + target_keys = ['audios'] + inputs = copy.deepcopy(self.audio_feature_results) + load_audio_feature = LoadAudioFeature() + results = load_audio_feature(inputs) + assert assert_dict_has_keys(results, target_keys) + + # test when no audio feature file exists + inputs = copy.deepcopy(self.audio_feature_results) + inputs['audio_path'] = 'foo/foo/bar.npy' + load_audio_feature = LoadAudioFeature() + results = load_audio_feature(inputs) + assert results['audios'].shape == (640, 80) + assert assert_dict_has_keys(results, target_keys) + assert repr(load_audio_feature) == ( + f'{load_audio_feature.__class__.__name__}(' + f'pad_method=zero)') diff --git 
a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_localization.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_localization.py new file mode 100644 index 0000000000000000000000000000000000000000..40005965b6a06e91c7a013a52d1695dc2c5ed455 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_localization.py @@ -0,0 +1,28 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy + +import numpy as np +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_almost_equal + +from mmaction.datasets.pipelines import GenerateLocalizationLabels +from .base import BaseTestLoading + + +class TestLocalization(BaseTestLoading): + + def test_generate_localization_label(self): + action_result = copy.deepcopy(self.action_results) + action_result['raw_feature'] = np.random.randn(400, 5) + + # test default setting + target_keys = ['gt_bbox'] + generate_localization_labels = GenerateLocalizationLabels() + generate_localization_labels_result = generate_localization_labels( + action_result) + assert assert_dict_has_keys(generate_localization_labels_result, + target_keys) + + assert_array_almost_equal( + generate_localization_labels_result['gt_bbox'], [[0.375, 0.625]], + decimal=4) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_pose_loading.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_pose_loading.py new file mode 100644 index 0000000000000000000000000000000000000000..055f4e67253c223c5322bd5bd495eb3d4e09c450 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_pose_loading.py @@ -0,0 +1,391 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy as cp +import os.path as osp +import tempfile +from collections import defaultdict + +import numpy as np +import pytest +from mmcv import dump +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_almost_equal, assert_array_equal + +from mmaction.datasets.pipelines import (GeneratePoseTarget, LoadKineticsPose, + PaddingWithLoop, PoseDecode, + PoseNormalize, UniformSampleFrames) + + +class TestPoseLoading: + + @staticmethod + def test_uniform_sample_frames(): + results = dict(total_frames=64, start_index=0) + sampling = UniformSampleFrames( + clip_len=8, num_clips=1, test_mode=True, seed=0) + + assert str(sampling) == ('UniformSampleFrames(clip_len=8, ' + 'num_clips=1, test_mode=True, seed=0)') + sampling_results = sampling(results) + assert sampling_results['clip_len'] == 8 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 1 + assert_array_equal(sampling_results['frame_inds'], + np.array([4, 15, 21, 24, 35, 43, 51, 63])) + + results = dict(total_frames=15, start_index=0) + sampling = UniformSampleFrames( + clip_len=8, num_clips=1, test_mode=True, seed=0) + sampling_results = sampling(results) + assert sampling_results['clip_len'] == 8 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 1 + assert_array_equal(sampling_results['frame_inds'], + np.array([0, 2, 4, 6, 8, 9, 11, 13])) + + results = dict(total_frames=7, start_index=0) + sampling = UniformSampleFrames( + clip_len=8, num_clips=1, test_mode=True, seed=0) + sampling_results = sampling(results) + assert sampling_results['clip_len'] == 8 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 1 + assert_array_equal(sampling_results['frame_inds'], + np.array([0, 1, 2, 3, 4, 5, 6, 0])) + + results = dict(total_frames=7, start_index=0) + sampling = UniformSampleFrames( + clip_len=8, num_clips=8, test_mode=True, seed=0) + sampling_results = 
sampling(results) + assert sampling_results['clip_len'] == 8 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 8 + assert len(sampling_results['frame_inds']) == 64 + + results = dict(total_frames=64, start_index=0) + sampling = UniformSampleFrames( + clip_len=8, num_clips=4, test_mode=True, seed=0) + sampling_results = sampling(results) + assert sampling_results['clip_len'] == 8 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 4 + assert_array_equal( + sampling_results['frame_inds'], + np.array([ + 4, 15, 21, 24, 35, 43, 51, 63, 1, 11, 21, 26, 36, 47, 54, 56, + 0, 12, 18, 25, 38, 47, 55, 62, 0, 9, 21, 25, 37, 40, 49, 60 + ])) + + results = dict(total_frames=64, start_index=0) + sampling = UniformSampleFrames( + clip_len=8, num_clips=1, test_mode=False, seed=0) + sampling_results = sampling(results) + assert sampling_results['clip_len'] == 8 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 1 + assert len(sampling_results['frame_inds']) == 8 + + results = dict(total_frames=7, start_index=0) + sampling = UniformSampleFrames( + clip_len=8, num_clips=1, test_mode=False, seed=0) + sampling_results = sampling(results) + assert sampling_results['clip_len'] == 8 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 1 + assert len(sampling_results['frame_inds']) == 8 + + results = dict(total_frames=15, start_index=0) + sampling = UniformSampleFrames( + clip_len=8, num_clips=1, test_mode=False, seed=0) + sampling_results = sampling(results) + assert sampling_results['clip_len'] == 8 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 1 + assert len(sampling_results['frame_inds']) == 8 + + @staticmethod + def test_pose_decode(): + kp = np.random.random([1, 16, 17, 2]) + kpscore = np.random.random([1, 16, 17]) + frame_inds = np.array([2, 4, 6, 8, 10]) + results = 
dict( + keypoint=kp, keypoint_score=kpscore, frame_inds=frame_inds) + pose_decode = PoseDecode() + assert str(pose_decode) == ('PoseDecode()') + decode_results = pose_decode(results) + assert_array_almost_equal(decode_results['keypoint'], kp[:, + frame_inds]) + assert_array_almost_equal(decode_results['keypoint_score'], + kpscore[:, frame_inds]) + + results = dict(keypoint=kp, keypoint_score=kpscore, total_frames=16) + pose_decode = PoseDecode() + decode_results = pose_decode(results) + assert_array_almost_equal(decode_results['keypoint'], kp) + assert_array_almost_equal(decode_results['keypoint_score'], kpscore) + + @staticmethod + def test_load_kinetics_pose(): + + def get_mode(arr): + cnt = defaultdict(lambda: 0) + for num in arr: + cnt[num] += 1 + max_val = max(cnt.values()) + return [k for k in cnt if cnt[k] == max_val], max_val + + with tempfile.TemporaryDirectory() as tmpdir: + filename = osp.join(tmpdir, 'tmp.pkl') + total_frames = 100 + img_shape = (224, 224) + frame_inds = np.random.choice(range(100), size=120) + frame_inds.sort() + anno_flag = np.random.random(120) > 0.1 + anno_inds = np.array([i for i, f in enumerate(anno_flag) if f]) + kp = np.random.random([120, 17, 3]) + dump(kp, filename) + results = dict( + filename=filename, + total_frames=total_frames, + img_shape=img_shape, + frame_inds=frame_inds) + + inp = cp.deepcopy(results) + + with pytest.raises(NotImplementedError): + LoadKineticsPose(squeeze=True, max_person=100, source='xxx') + + load_kinetics_pose = LoadKineticsPose( + squeeze=True, max_person=100, source='openpose-18') + + assert str(load_kinetics_pose) == ( + 'LoadKineticsPose(io_backend=disk, ' + 'squeeze=True, max_person=100, ' + "keypoint_weight={'face': 1, " + "'torso': 2, 'limb': 3}, " + 'source=openpose-18, kwargs={})') + return_results = load_kinetics_pose(inp) + assert return_results['keypoint'].shape[:-1] == \ + return_results['keypoint_score'].shape + + num_person = return_results['keypoint'].shape[0] + num_frame = 
return_results['keypoint'].shape[1] + assert num_person == get_mode(frame_inds)[1] + assert np.max(return_results['keypoint']) > 1 + assert num_frame == len(set(frame_inds)) + + inp = cp.deepcopy(results) + load_kinetics_pose = LoadKineticsPose( + squeeze=False, max_person=100, source='openpose-18') + return_results = load_kinetics_pose(inp) + assert return_results['keypoint'].shape[:-1] == \ + return_results['keypoint_score'].shape + + num_person = return_results['keypoint'].shape[0] + num_frame = return_results['keypoint'].shape[1] + assert num_person == get_mode(frame_inds)[1] + assert np.max(return_results['keypoint']) > 1 + assert num_frame == total_frames + + inp = cp.deepcopy(results) + inp['anno_inds'] = anno_inds + load_kinetics_pose = LoadKineticsPose( + squeeze=True, max_person=100, source='mmpose') + return_results = load_kinetics_pose(inp) + assert return_results['keypoint'].shape[:-1] == \ + return_results['keypoint_score'].shape + + num_person = return_results['keypoint'].shape[0] + num_frame = return_results['keypoint'].shape[1] + assert num_person == get_mode(frame_inds[anno_inds])[1] + assert np.max(return_results['keypoint']) <= 1 + assert num_frame == len(set(frame_inds[anno_inds])) + + inp = cp.deepcopy(results) + inp['anno_inds'] = anno_inds + load_kinetics_pose = LoadKineticsPose( + squeeze=True, max_person=2, source='mmpose') + return_results = load_kinetics_pose(inp) + assert return_results['keypoint'].shape[:-1] == \ + return_results['keypoint_score'].shape + + num_person = return_results['keypoint'].shape[0] + num_frame = return_results['keypoint'].shape[1] + assert num_person <= 2 + assert np.max(return_results['keypoint']) <= 1 + assert num_frame == len(set(frame_inds[anno_inds])) + + @staticmethod + def test_generate_pose_target(): + img_shape = (64, 64) + kp = np.array([[[[24, 24], [40, 40], [24, 40]]]]) + kpscore = np.array([[[1., 1., 1.]]]) + kp = np.concatenate([kp] * 8, axis=1) + kpscore = np.concatenate([kpscore] * 8, axis=1) + 
results = dict( + img_shape=img_shape, + keypoint=kp, + keypoint_score=kpscore, + modality='Pose') + + generate_pose_target = GeneratePoseTarget( + sigma=1, with_kp=True, left_kp=(0, ), right_kp=(1, ), skeletons=()) + assert str(generate_pose_target) == ('GeneratePoseTarget(sigma=1, ' + 'use_score=True, with_kp=True, ' + 'with_limb=False, skeletons=(), ' + 'double=False, left_kp=(0,), ' + 'right_kp=(1,))') + return_results = generate_pose_target(results) + assert return_results['imgs'].shape == (8, 64, 64, 3) + assert_array_almost_equal(return_results['imgs'][0], + return_results['imgs'][1]) + + results = dict(img_shape=img_shape, keypoint=kp, modality='Pose') + + generate_pose_target = GeneratePoseTarget( + sigma=1, with_kp=True, left_kp=(0, ), right_kp=(1, ), skeletons=()) + return_results = generate_pose_target(results) + assert return_results['imgs'].shape == (8, 64, 64, 3) + assert_array_almost_equal(return_results['imgs'][0], + return_results['imgs'][1]) + + generate_pose_target = GeneratePoseTarget( + sigma=1, + with_kp=False, + with_limb=True, + left_kp=(0, ), + right_kp=(1, ), + skeletons=((0, 1), (1, 2), (0, 2))) + return_results = generate_pose_target(results) + assert return_results['imgs'].shape == (8, 64, 64, 3) + assert_array_almost_equal(return_results['imgs'][0], + return_results['imgs'][1]) + + generate_pose_target = GeneratePoseTarget( + sigma=1, + with_kp=True, + with_limb=True, + left_kp=(0, ), + right_kp=(1, ), + skeletons=((0, 1), (1, 2), (0, 2))) + return_results = generate_pose_target(results) + assert return_results['imgs'].shape == (8, 64, 64, 6) + assert_array_almost_equal(return_results['imgs'][0], + return_results['imgs'][1]) + + generate_pose_target = GeneratePoseTarget( + sigma=1, + with_kp=True, + with_limb=True, + double=True, + left_kp=(0, ), + right_kp=(1, ), + skeletons=((0, 1), (1, 2), (0, 2))) + return_results = generate_pose_target(results) + imgs = return_results['imgs'] + assert imgs.shape == (16, 64, 64, 6) + 
assert_array_almost_equal(imgs[0], imgs[1]) + assert_array_almost_equal(imgs[:8, 2], imgs[8:, 2, :, ::-1]) + assert_array_almost_equal(imgs[:8, 0], imgs[8:, 1, :, ::-1]) + assert_array_almost_equal(imgs[:8, 1], imgs[8:, 0, :, ::-1]) + + img_shape = (64, 64) + kp = np.array([[[[24, 24], [40, 40], [24, 40]]]]) + kpscore = np.array([[[0., 0., 0.]]]) + kp = np.concatenate([kp] * 8, axis=1) + kpscore = np.concatenate([kpscore] * 8, axis=1) + results = dict( + img_shape=img_shape, + keypoint=kp, + keypoint_score=kpscore, + modality='Pose') + generate_pose_target = GeneratePoseTarget( + sigma=1, with_kp=True, left_kp=(0, ), right_kp=(1, ), skeletons=()) + return_results = generate_pose_target(results) + assert_array_almost_equal(return_results['imgs'], 0) + + img_shape = (64, 64) + kp = np.array([[[[24, 24], [40, 40], [24, 40]]]]) + kpscore = np.array([[[0., 0., 0.]]]) + kp = np.concatenate([kp] * 8, axis=1) + kpscore = np.concatenate([kpscore] * 8, axis=1) + results = dict( + img_shape=img_shape, + keypoint=kp, + keypoint_score=kpscore, + modality='Pose') + generate_pose_target = GeneratePoseTarget( + sigma=1, + with_kp=False, + with_limb=True, + left_kp=(0, ), + right_kp=(1, ), + skeletons=((0, 1), (1, 2), (0, 2))) + return_results = generate_pose_target(results) + assert_array_almost_equal(return_results['imgs'], 0) + + img_shape = (64, 64) + kp = np.array([[[[124, 124], [140, 140], [124, 140]]]]) + kpscore = np.array([[[0., 0., 0.]]]) + kp = np.concatenate([kp] * 8, axis=1) + kpscore = np.concatenate([kpscore] * 8, axis=1) + results = dict( + img_shape=img_shape, + keypoint=kp, + keypoint_score=kpscore, + modality='Pose') + generate_pose_target = GeneratePoseTarget( + sigma=1, with_kp=True, left_kp=(0, ), right_kp=(1, ), skeletons=()) + return_results = generate_pose_target(results) + assert_array_almost_equal(return_results['imgs'], 0) + + img_shape = (64, 64) + kp = np.array([[[[124, 124], [140, 140], [124, 140]]]]) + kpscore = np.array([[[0., 0., 0.]]]) + kp = 
np.concatenate([kp] * 8, axis=1) + kpscore = np.concatenate([kpscore] * 8, axis=1) + results = dict( + img_shape=img_shape, + keypoint=kp, + keypoint_score=kpscore, + modality='Pose') + generate_pose_target = GeneratePoseTarget( + sigma=1, + with_kp=False, + with_limb=True, + left_kp=(0, ), + right_kp=(1, ), + skeletons=((0, 1), (1, 2), (0, 2))) + return_results = generate_pose_target(results) + assert_array_almost_equal(return_results['imgs'], 0) + + @staticmethod + def test_padding_with_loop(): + results = dict(total_frames=3) + sampling = PaddingWithLoop(clip_len=6) + sampling_results = sampling(results) + assert sampling_results['clip_len'] == 6 + assert sampling_results['frame_interval'] is None + assert sampling_results['num_clips'] == 1 + assert_array_equal(sampling_results['frame_inds'], + np.array([0, 1, 2, 0, 1, 2])) + + @staticmethod + def test_pose_normalize(): + target_keys = ['keypoint', 'keypoint_norm_cfg'] + keypoints = np.random.randn(3, 300, 17, 2) + results = dict(keypoint=keypoints) + pose_normalize = PoseNormalize( + mean=[960., 540., 0.5], + min_value=[0., 0., 0.], + max_value=[1920, 1080, 1.]) + normalize_results = pose_normalize(results) + assert assert_dict_has_keys(normalize_results, target_keys) + check_pose_normalize(keypoints, normalize_results['keypoint'], + normalize_results['keypoint_norm_cfg']) + + +def check_pose_normalize(origin_keypoints, result_keypoints, norm_cfg): + target_keypoints = result_keypoints.copy() + target_keypoints *= (norm_cfg['max_value'] - norm_cfg['min_value']) + target_keypoints += norm_cfg['mean'] + assert_array_almost_equal(origin_keypoints, target_keypoints, decimal=4) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_sampling.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_sampling.py new file mode 100644 index 0000000000000000000000000000000000000000..ff08436ac960176f2a6078937f911ebb20f411b9 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_pipelines/test_loadings/test_sampling.py @@ -0,0 +1,757 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy + +import numpy as np +import pytest +from mmcv.utils import assert_dict_has_keys +from numpy.testing import assert_array_equal + +from mmaction.datasets.pipelines import (AudioFeatureSelector, + DenseSampleFrames, SampleAVAFrames, + SampleFrames, SampleProposalFrames, + UntrimmedSampleFrames) +from .base import BaseTestLoading + + +class TestSampling(BaseTestLoading): + + def test_sample_frames(self): + target_keys = [ + 'frame_inds', 'clip_len', 'frame_interval', 'num_clips', + 'total_frames' + ] + + with pytest.warns(UserWarning): + # start_index has been deprecated + config = dict( + clip_len=3, frame_interval=1, num_clips=5, start_index=1) + SampleFrames(**config) + + # Sample Frame with tail Frames + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=3, frame_interval=1, num_clips=5, keep_tail_frames=True) + sample_frames = SampleFrames(**config) + sample_frames(video_result) + sample_frames(frame_result) + + # Sample Frame with no temporal_jitter + # clip_len=3, frame_interval=1, num_clips=5 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=3, frame_interval=1, num_clips=5, temporal_jitter=False) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 15 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 15 + assert np.max(sample_frames_results['frame_inds']) <= 5 + assert np.min(sample_frames_results['frame_inds']) >= 1 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={3}, ' + 
f'frame_interval={1}, ' + f'num_clips={5}, ' + f'temporal_jitter={False}, ' + f'twice_sample={False}, ' + f'out_of_bound_opt=loop, ' + f'test_mode={False})') + + # Sample Frame with no temporal_jitter + # clip_len=5, frame_interval=1, num_clips=5, + # out_of_bound_opt='repeat_last' + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=5, + frame_interval=1, + num_clips=5, + temporal_jitter=False, + out_of_bound_opt='repeat_last') + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={5}, ' + f'frame_interval={1}, ' + f'num_clips={5}, ' + f'temporal_jitter={False}, ' + f'twice_sample={False}, ' + f'out_of_bound_opt=repeat_last, ' + f'test_mode={False})') + + def check_monotonous(arr): + length = arr.shape[0] + for i in range(length - 1): + if arr[i] > arr[i + 1]: + return False + return True + + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 25 + frame_inds = sample_frames_results['frame_inds'].reshape([5, 5]) + for i in range(5): + assert check_monotonous(frame_inds[i]) + + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 25 + frame_inds = sample_frames_results['frame_inds'].reshape([5, 5]) + for i in range(5): + assert check_monotonous(frame_inds[i]) + assert np.max(sample_frames_results['frame_inds']) <= 5 + assert np.min(sample_frames_results['frame_inds']) >= 1 + + # Sample Frame with temporal_jitter + # clip_len=4, frame_interval=2, num_clips=5 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=4, frame_interval=2, num_clips=5, temporal_jitter=True) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert 
assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 20 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 20 + assert np.max(sample_frames_results['frame_inds']) <= 5 + assert np.min(sample_frames_results['frame_inds']) >= 1 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={4}, ' + f'frame_interval={2}, ' + f'num_clips={5}, ' + f'temporal_jitter={True}, ' + f'twice_sample={False}, ' + f'out_of_bound_opt=loop, ' + f'test_mode={False})') + + # Sample Frame with no temporal_jitter in test mode + # clip_len=4, frame_interval=1, num_clips=6 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=4, + frame_interval=1, + num_clips=6, + temporal_jitter=False, + test_mode=True) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 24 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 24 + assert np.max(sample_frames_results['frame_inds']) <= 5 + assert np.min(sample_frames_results['frame_inds']) >= 1 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={4}, ' + f'frame_interval={1}, ' + f'num_clips={6}, ' + f'temporal_jitter={False}, ' + f'twice_sample={False}, ' + f'out_of_bound_opt=loop, ' + f'test_mode={True})') + + # Sample Frame with no temporal_jitter in test mode + # clip_len=3, frame_interval=1, num_clips=6 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=3, + frame_interval=1, + num_clips=6, + temporal_jitter=False, + test_mode=True) + sample_frames = SampleFrames(**config) + sample_frames_results = 
sample_frames(video_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 18 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 18 + assert np.max(sample_frames_results['frame_inds']) <= 5 + assert np.min(sample_frames_results['frame_inds']) >= 1 + + # Sample Frame with no temporal_jitter to get clip_offsets + # clip_len=1, frame_interval=1, num_clips=8 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + frame_result['total_frames'] = 6 + config = dict( + clip_len=1, + frame_interval=1, + num_clips=8, + temporal_jitter=False, + test_mode=True) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 8 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 8 + assert_array_equal(sample_frames_results['frame_inds'], + np.array([1, 2, 2, 3, 4, 5, 5, 6])) + + # Sample Frame with no temporal_jitter to get clip_offsets + # clip_len=1, frame_interval=1, num_clips=8 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + frame_result['total_frames'] = 6 + config = dict( + clip_len=1, + frame_interval=1, + num_clips=8, + temporal_jitter=False, + test_mode=True) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 8 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 8 + assert_array_equal(sample_frames_results['frame_inds'], + np.array([1, 2, 2, 3, 4, 5, 5, 6])) + + # Sample Frame 
with no temporal_jitter to get clip_offsets zero + # clip_len=6, frame_interval=1, num_clips=1 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + frame_result['total_frames'] = 5 + config = dict( + clip_len=6, + frame_interval=1, + num_clips=1, + temporal_jitter=False, + test_mode=True) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 6 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 6 + assert_array_equal(sample_frames_results['frame_inds'], + [1, 2, 3, 4, 5, 1]) + + # Sample Frame with no temporal_jitter to get avg_interval <= 0 + # clip_len=12, frame_interval=1, num_clips=20 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + frame_result['total_frames'] = 30 + config = dict( + clip_len=12, + frame_interval=1, + num_clips=20, + temporal_jitter=False, + test_mode=False) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 240 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 240 + assert np.max(sample_frames_results['frame_inds']) <= 30 + assert np.min(sample_frames_results['frame_inds']) >= 1 + + # Sample Frame with no temporal_jitter to get clip_offsets + # clip_len=1, frame_interval=1, num_clips=8 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + frame_result['total_frames'] = 6 + config = dict( + clip_len=1, + frame_interval=1, + num_clips=8, + temporal_jitter=False, + 
test_mode=False) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert sample_frames_results['start_index'] == 0 + assert len(sample_frames_results['frame_inds']) == 8 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 8 + assert_array_equal(sample_frames_results['frame_inds'], + np.array([1, 2, 3, 3, 4, 5, 5, 6])) + + # Sample Frame with no temporal_jitter to get clip_offsets zero + # clip_len=12, frame_interval=1, num_clips=2 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + frame_result['total_frames'] = 10 + config = dict( + clip_len=12, + frame_interval=1, + num_clips=2, + temporal_jitter=False, + test_mode=False) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 24 + sample_frames_results = sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 24 + assert np.max(sample_frames_results['frame_inds']) <= 10 + assert np.min(sample_frames_results['frame_inds']) >= 1 + + # Sample Frame using twice sample + # clip_len=12, frame_interval=1, num_clips=2 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + frame_result['total_frames'] = 40 + config = dict( + clip_len=12, + frame_interval=1, + num_clips=2, + temporal_jitter=False, + twice_sample=True, + test_mode=True) + sample_frames = SampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 48 + sample_frames_results = 
sample_frames(frame_result) + assert len(sample_frames_results['frame_inds']) == 48 + assert np.max(sample_frames_results['frame_inds']) <= 40 + assert np.min(sample_frames_results['frame_inds']) >= 1 + + def test_dense_sample_frames(self): + target_keys = [ + 'frame_inds', 'clip_len', 'frame_interval', 'num_clips', + 'total_frames' + ] + + # Dense sample with no temporal_jitter in test mode + # clip_len=4, frame_interval=1, num_clips=6 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=4, + frame_interval=1, + num_clips=6, + temporal_jitter=False, + test_mode=True) + dense_sample_frames = DenseSampleFrames(**config) + dense_sample_frames_results = dense_sample_frames(video_result) + assert dense_sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(dense_sample_frames_results, target_keys) + assert len(dense_sample_frames_results['frame_inds']) == 240 + dense_sample_frames_results = dense_sample_frames(frame_result) + assert len(dense_sample_frames_results['frame_inds']) == 240 + assert repr(dense_sample_frames) == ( + f'{dense_sample_frames.__class__.__name__}(' + f'clip_len={4}, ' + f'frame_interval={1}, ' + f'num_clips={6}, ' + f'sample_range={64}, ' + f'num_sample_positions={10}, ' + f'temporal_jitter={False}, ' + f'out_of_bound_opt=loop, ' + f'test_mode={True})') + + # Dense sample with no temporal_jitter + # clip_len=4, frame_interval=1, num_clips=6 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=4, frame_interval=1, num_clips=6, temporal_jitter=False) + dense_sample_frames = DenseSampleFrames(**config) + dense_sample_frames_results = dense_sample_frames(video_result) + assert dense_sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(dense_sample_frames_results, target_keys) + assert len(dense_sample_frames_results['frame_inds']) == 24 + 
dense_sample_frames_results = dense_sample_frames(frame_result) + assert len(dense_sample_frames_results['frame_inds']) == 24 + + # Dense sample with no temporal_jitter, sample_range=32 in test mode + # clip_len=4, frame_interval=1, num_clips=6 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=4, + frame_interval=1, + num_clips=6, + sample_range=32, + temporal_jitter=False, + test_mode=True) + dense_sample_frames = DenseSampleFrames(**config) + dense_sample_frames_results = dense_sample_frames(video_result) + assert dense_sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(dense_sample_frames_results, target_keys) + assert len(dense_sample_frames_results['frame_inds']) == 240 + dense_sample_frames_results = dense_sample_frames(frame_result) + assert len(dense_sample_frames_results['frame_inds']) == 240 + + # Dense sample with no temporal_jitter, sample_range=32 + # clip_len=4, frame_interval=1, num_clips=6 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=4, + frame_interval=1, + num_clips=6, + sample_range=32, + temporal_jitter=False) + dense_sample_frames = DenseSampleFrames(**config) + dense_sample_frames_results = dense_sample_frames(video_result) + assert dense_sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(dense_sample_frames_results, target_keys) + assert len(dense_sample_frames_results['frame_inds']) == 24 + dense_sample_frames_results = dense_sample_frames(frame_result) + assert len(dense_sample_frames_results['frame_inds']) == 24 + assert repr(dense_sample_frames) == ( + f'{dense_sample_frames.__class__.__name__}(' + f'clip_len={4}, ' + f'frame_interval={1}, ' + f'num_clips={6}, ' + f'sample_range={32}, ' + f'num_sample_positions={10}, ' + f'temporal_jitter={False}, ' + f'out_of_bound_opt=loop, ' + f'test_mode={False})') + + # Dense sample with no 
temporal_jitter, sample_range=1000 to check mod + # clip_len=4, frame_interval=1, num_clips=6 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=4, + frame_interval=1, + num_clips=6, + sample_range=1000, + temporal_jitter=False) + dense_sample_frames = DenseSampleFrames(**config) + dense_sample_frames_results = dense_sample_frames(video_result) + assert dense_sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(dense_sample_frames_results, target_keys) + assert len(dense_sample_frames_results['frame_inds']) == 24 + dense_sample_frames_results = dense_sample_frames(frame_result) + assert len(dense_sample_frames_results['frame_inds']) == 24 + + # Dense sample with no temporal_jitter in test mode + # sample_range=32, num_sample_positions=5 + # clip_len=4, frame_interval=1, num_clips=6 + video_result = copy.deepcopy(self.video_results) + frame_result = copy.deepcopy(self.frame_results) + config = dict( + clip_len=4, + frame_interval=1, + num_clips=6, + num_sample_positions=5, + sample_range=32, + temporal_jitter=False, + test_mode=True) + dense_sample_frames = DenseSampleFrames(**config) + dense_sample_frames_results = dense_sample_frames(video_result) + assert dense_sample_frames_results['start_index'] == 0 + assert assert_dict_has_keys(dense_sample_frames_results, target_keys) + assert len(dense_sample_frames_results['frame_inds']) == 120 + dense_sample_frames_results = dense_sample_frames(frame_result) + assert len(dense_sample_frames_results['frame_inds']) == 120 + assert repr(dense_sample_frames) == ( + f'{dense_sample_frames.__class__.__name__}(' + f'clip_len={4}, ' + f'frame_interval={1}, ' + f'num_clips={6}, ' + f'sample_range={32}, ' + f'num_sample_positions={5}, ' + f'temporal_jitter={False}, ' + f'out_of_bound_opt=loop, ' + f'test_mode={True})') + + def test_untrim_sample_frames(self): + + target_keys = [ + 'frame_inds', 'clip_len', 'frame_interval', 
'num_clips', + 'total_frames' + ] + + frame_result = dict( + frame_dir=None, + total_frames=100, + filename_tmpl=None, + modality='RGB', + start_index=0, + label=1) + video_result = copy.deepcopy(self.video_results) + + config = dict(clip_len=1, frame_interval=16, start_index=0) + sample_frames = UntrimmedSampleFrames(**config) + sample_frames_results = sample_frames(frame_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 6 + assert_array_equal(sample_frames_results['frame_inds'], + np.array([8, 24, 40, 56, 72, 88])) + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'frame_interval={16})') + + config = dict(clip_len=1, frame_interval=16, start_index=0) + sample_frames = UntrimmedSampleFrames(**config) + sample_frames_results = sample_frames(video_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + frame_inds = np.array(list(range(8, 300, 16))) + assert len(sample_frames_results['frame_inds']) == frame_inds.shape[0] + assert_array_equal(sample_frames_results['frame_inds'], frame_inds) + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'frame_interval={16})') + + config = dict(clip_len=1, frame_interval=16) + sample_frames = UntrimmedSampleFrames(**config) + frame_result_ = copy.deepcopy(frame_result) + frame_result_['start_index'] = 1 + sample_frames_results = sample_frames(frame_result_) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 6 + assert_array_equal(sample_frames_results['frame_inds'], + np.array([8, 24, 40, 56, 72, 88]) + 1) + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'frame_interval={16})') + + config = dict(clip_len=3, frame_interval=16, start_index=0) + sample_frames = UntrimmedSampleFrames(**config) + sample_frames_results = 
sample_frames(frame_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 18 + assert_array_equal( + sample_frames_results['frame_inds'], + np.array([ + 7, 8, 9, 23, 24, 25, 39, 40, 41, 55, 56, 57, 71, 72, 73, 87, + 88, 89 + ])) + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={3}, ' + f'frame_interval={16})') + + def test_sample_ava_frames(self): + target_keys = [ + 'fps', 'timestamp', 'timestamp_start', 'shot_info', 'frame_inds', + 'clip_len', 'frame_interval' + ] + config = dict(clip_len=32, frame_interval=2) + sample_ava_dataset = SampleAVAFrames(**config) + ava_result = sample_ava_dataset(results=self.ava_results) + assert assert_dict_has_keys(ava_result, target_keys) + assert ava_result['clip_len'] == 32 + assert ava_result['frame_interval'] == 2 + assert len(ava_result['frame_inds']) == 32 + assert repr(sample_ava_dataset) == ( + f'{sample_ava_dataset.__class__.__name__}(' + f'clip_len={32}, ' + f'frame_interval={2}, ' + f'test_mode={False})') + + # add test case in Issue #306 + config = dict(clip_len=8, frame_interval=8) + sample_ava_dataset = SampleAVAFrames(**config) + ava_result = sample_ava_dataset(results=self.ava_results) + assert assert_dict_has_keys(ava_result, target_keys) + assert ava_result['clip_len'] == 8 + assert ava_result['frame_interval'] == 8 + assert len(ava_result['frame_inds']) == 8 + assert repr(sample_ava_dataset) == ( + f'{sample_ava_dataset.__class__.__name__}(' + f'clip_len={8}, ' + f'frame_interval={8}, ' + f'test_mode={False})') + + def test_sample_proposal_frames(self): + target_keys = [ + 'frame_inds', 'clip_len', 'frame_interval', 'num_clips', + 'total_frames', 'start_index' + ] + + # test error cases + with pytest.raises(TypeError): + proposal_result = copy.deepcopy(self.proposal_results) + config = dict( + clip_len=1, + frame_interval=1, + body_segments=2, + aug_segments=('error', 'error'), + aug_ratio=0.5, + 
temporal_jitter=False) + sample_frames = SampleProposalFrames(**config) + sample_frames(proposal_result) + + # test normal cases + # Sample Frame with no temporal_jitter + # clip_len=1, frame_interval=1 + # body_segments=2, aug_segments=(1, 1) + proposal_result = copy.deepcopy(self.proposal_results) + proposal_result['total_frames'] = 9 + config = dict( + clip_len=1, + frame_interval=1, + body_segments=2, + aug_segments=(1, 1), + aug_ratio=0.5, + temporal_jitter=False) + sample_frames = SampleProposalFrames(**config) + sample_frames_results = sample_frames(proposal_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 8 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'body_segments={2}, ' + f'aug_segments={(1, 1)}, ' + f'aug_ratio={(0.5, 0.5)}, ' + f'frame_interval={1}, ' + f'test_interval={6}, ' + f'temporal_jitter={False}, ' + f'mode=train)') + + # Sample Frame with temporal_jitter + # clip_len=1, frame_interval=1 + # body_segments=2, aug_segments=(1, 1) + proposal_result = copy.deepcopy(self.proposal_results) + proposal_result['total_frames'] = 9 + config = dict( + clip_len=1, + frame_interval=1, + body_segments=2, + aug_segments=(1, 1), + aug_ratio=0.5, + temporal_jitter=True) + sample_frames = SampleProposalFrames(**config) + sample_frames_results = sample_frames(proposal_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 8 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'body_segments={2}, ' + f'aug_segments={(1, 1)}, ' + f'aug_ratio={(0.5, 0.5)}, ' + f'frame_interval={1}, ' + f'test_interval={6}, ' + f'temporal_jitter={True}, ' + f'mode=train)') + + # Sample Frame with no temporal_jitter in val mode + # clip_len=1, frame_interval=1 + # body_segments=2, aug_segments=(1, 1) + proposal_result = 
copy.deepcopy(self.proposal_results) + proposal_result['total_frames'] = 9 + config = dict( + clip_len=1, + frame_interval=1, + body_segments=2, + aug_segments=(1, 1), + aug_ratio=0.5, + temporal_jitter=False, + mode='val') + sample_frames = SampleProposalFrames(**config) + sample_frames_results = sample_frames(proposal_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 8 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'body_segments={2}, ' + f'aug_segments={(1, 1)}, ' + f'aug_ratio={(0.5, 0.5)}, ' + f'frame_interval={1}, ' + f'test_interval={6}, ' + f'temporal_jitter={False}, ' + f'mode=val)') + + # Sample Frame with no temporal_jitter in test mode + # test_interval=2 + proposal_result = copy.deepcopy(self.proposal_results) + proposal_result['out_proposals'] = None + proposal_result['total_frames'] = 10 + config = dict( + clip_len=1, + frame_interval=1, + body_segments=2, + aug_segments=(1, 1), + aug_ratio=0.5, + test_interval=2, + temporal_jitter=False, + mode='test') + sample_frames = SampleProposalFrames(**config) + sample_frames_results = sample_frames(proposal_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 5 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'body_segments={2}, ' + f'aug_segments={(1, 1)}, ' + f'aug_ratio={(0.5, 0.5)}, ' + f'frame_interval={1}, ' + f'test_interval={2}, ' + f'temporal_jitter={False}, ' + f'mode=test)') + + # Sample Frame with no temporal_jitter to get clip_offsets zero + # clip_len=1, frame_interval=1 + # body_segments=2, aug_segments=(1, 1) + proposal_result = copy.deepcopy(self.proposal_results) + proposal_result['total_frames'] = 3 + config = dict( + clip_len=1, + frame_interval=1, + body_segments=2, + aug_segments=(1, 1), + aug_ratio=0.5, + temporal_jitter=False) + sample_frames = 
SampleProposalFrames(**config) + sample_frames_results = sample_frames(proposal_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 8 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'body_segments={2}, ' + f'aug_segments={(1, 1)}, ' + f'aug_ratio={(0.5, 0.5)}, ' + f'frame_interval={1}, ' + f'test_interval={6}, ' + f'temporal_jitter={False}, ' + f'mode=train)') + + # Sample Frame with no temporal_jitter to + # get clip_offsets zero in val mode + # clip_len=1, frame_interval=1 + # body_segments=4, aug_segments=(2, 2) + proposal_result = copy.deepcopy(self.proposal_results) + proposal_result['total_frames'] = 3 + config = dict( + clip_len=1, + frame_interval=1, + body_segments=4, + aug_segments=(2, 2), + aug_ratio=0.5, + temporal_jitter=False, + mode='val') + sample_frames = SampleProposalFrames(**config) + sample_frames_results = sample_frames(proposal_result) + assert assert_dict_has_keys(sample_frames_results, target_keys) + assert len(sample_frames_results['frame_inds']) == 16 + assert repr(sample_frames) == (f'{sample_frames.__class__.__name__}(' + f'clip_len={1}, ' + f'body_segments={4}, ' + f'aug_segments={(2, 2)}, ' + f'aug_ratio={(0.5, 0.5)}, ' + f'frame_interval={1}, ' + f'test_interval={6}, ' + f'temporal_jitter={False}, ' + f'mode=val)') + + def test_audio_feature_selector(self): + target_keys = ['audios'] + # test frame selector with 2 dim input + inputs = copy.deepcopy(self.audio_feature_results) + inputs['frame_inds'] = np.arange(0, self.audio_total_frames, + 2)[:, np.newaxis] + inputs['num_clips'] = 1 + inputs['length'] = 1280 + audio_feature_selector = AudioFeatureSelector() + results = audio_feature_selector(inputs) + assert assert_dict_has_keys(results, target_keys) + assert repr(audio_feature_selector) == ( + f'{audio_feature_selector.__class__.__name__}(' + f'fix_length={128})') diff --git 
a/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_sampler.py b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_sampler.py new file mode 100644 index 0000000000000000000000000000000000000000..19bfd64a9592cb66b3e06d1a8feb54c45b85bec7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_data/test_sampler.py @@ -0,0 +1,96 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from torch.utils.data import DataLoader, Dataset + +from mmaction.datasets.samplers import (ClassSpecificDistributedSampler, + DistributedSampler) + + +class MyDataset(Dataset): + + def __init__(self, class_prob={i: 1 for i in range(10)}): + super().__init__() + self.class_prob = class_prob + self.video_infos = [ + dict(data=idx, label=idx % 10) for idx in range(100) + ] + + def __len__(self): + return len(self.video_infos) + + def __getitem__(self, idx): + return self.video_infos[idx] + + +def test_distributed_sampler(): + dataset = MyDataset() + sampler = DistributedSampler(dataset, num_replicas=1, rank=0) + data_loader = DataLoader(dataset, batch_size=4, sampler=sampler) + batches = [] + for _, data in enumerate(data_loader): + batches.append(data) + + assert len(batches) == 25 + assert sum([len(x['data']) for x in batches]) == 100 + + sampler = DistributedSampler(dataset, num_replicas=4, rank=2) + data_loader = DataLoader(dataset, batch_size=4, sampler=sampler) + batches = [] + for i, data in enumerate(data_loader): + batches.append(data) + + assert len(batches) == 7 + assert sum([len(x['data']) for x in batches]) == 25 + + sampler = DistributedSampler(dataset, num_replicas=6, rank=3) + data_loader = DataLoader(dataset, batch_size=4, sampler=sampler) + batches = [] + for i, data in enumerate(data_loader): + batches.append(data) + + assert len(batches) == 5 + assert sum([len(x['data']) for x in batches]) == 17 + + +def test_class_specific_distributed_sampler(): + class_prob = dict(zip(list(range(10)), [1] * 5 + [3] * 5)) + dataset = MyDataset(class_prob=class_prob) + + sampler 
= ClassSpecificDistributedSampler( + dataset, num_replicas=1, rank=0, dynamic_length=True) + data_loader = DataLoader(dataset, batch_size=4, sampler=sampler) + batches = [] + for _, data in enumerate(data_loader): + batches.append(data) + + assert len(batches) == 50 + assert sum([len(x['data']) for x in batches]) == 200 + + sampler = ClassSpecificDistributedSampler( + dataset, num_replicas=1, rank=0, dynamic_length=False) + data_loader = DataLoader(dataset, batch_size=4, sampler=sampler) + batches = [] + for i, data in enumerate(data_loader): + batches.append(data) + + assert len(batches) == 25 + assert sum([len(x['data']) for x in batches]) == 100 + + sampler = ClassSpecificDistributedSampler( + dataset, num_replicas=6, rank=2, dynamic_length=True) + data_loader = DataLoader(dataset, batch_size=4, sampler=sampler) + batches = [] + for i, data in enumerate(data_loader): + batches.append(data) + + assert len(batches) == 9 + assert sum([len(x['data']) for x in batches]) == 34 + + sampler = ClassSpecificDistributedSampler( + dataset, num_replicas=6, rank=2, dynamic_length=False) + data_loader = DataLoader(dataset, batch_size=4, sampler=sampler) + batches = [] + for i, data in enumerate(data_loader): + batches.append(data) + + assert len(batches) == 5 + assert sum([len(x['data']) for x in batches]) == 17 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_metrics/test_accuracy.py b/openmmlab_test/mmaction2-0.24.1/tests/test_metrics/test_accuracy.py new file mode 100644 index 0000000000000000000000000000000000000000..e2ac82cbda371026a4a01e899063250955e1b8f8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_metrics/test_accuracy.py @@ -0,0 +1,343 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp +import random + +import numpy as np +import pytest +from numpy.testing import assert_array_almost_equal, assert_array_equal + +from mmaction.core import (ActivityNetLocalization, + average_recall_at_avg_proposals, confusion_matrix, + get_weighted_score, mean_average_precision, + mean_class_accuracy, mmit_mean_average_precision, + pairwise_temporal_iou, top_k_accuracy, + top_k_classes) +from mmaction.core.evaluation.ava_utils import ava_eval + + +def gt_confusion_matrix(gt_labels, pred_labels, normalize=None): + """Calculate the ground truth confusion matrix.""" + max_index = max(max(gt_labels), max(pred_labels)) + confusion_mat = np.zeros((max_index + 1, max_index + 1), dtype=np.int64) + for gt, pred in zip(gt_labels, pred_labels): + confusion_mat[gt][pred] += 1 + del_index = [] + for i in range(max_index): + if sum(confusion_mat[i]) == 0 and sum(confusion_mat[:, i]) == 0: + del_index.append(i) + confusion_mat = np.delete(confusion_mat, del_index, axis=0) + confusion_mat = np.delete(confusion_mat, del_index, axis=1) + + if normalize is not None: + confusion_mat = np.array(confusion_mat, dtype=float) + m, n = confusion_mat.shape + if normalize == 'true': + for i in range(m): + s = np.sum(confusion_mat[i], dtype=float) + if s == 0: + continue + confusion_mat[i, :] = confusion_mat[i, :] / s + print(confusion_mat[i, :]) + elif normalize == 'pred': + for i in range(n): + s = sum(confusion_mat[:, i]) + if s == 0: + continue + confusion_mat[:, i] = confusion_mat[:, i] / s + elif normalize == 'all': + s = np.sum(confusion_mat) + if s != 0: + confusion_mat /= s + + return confusion_mat + + +def test_activitynet_localization(): + data_prefix = osp.normpath( + osp.join(osp.dirname(__file__), '../data/eval_localization')) + + gt_path = osp.join(data_prefix, 'gt.json') + result_path = osp.join(data_prefix, 'result.json') + localization = ActivityNetLocalization(gt_path, result_path) + + results = localization.evaluate() + mAP = np.array([ + 
0.71428571, 0.71428571, 0.71428571, 0.6875, 0.6875, 0.59722222, + 0.52083333, 0.52083333, 0.52083333, 0.5 + ]) + average_mAP = 0.6177579365079365 + + assert_array_almost_equal(results[0], mAP) + assert_array_almost_equal(results[1], average_mAP) + + +def test_ava_detection(): + data_prefix = osp.normpath( + osp.join(osp.dirname(__file__), '../data/eval_detection')) + + gt_path = osp.join(data_prefix, 'gt.csv') + result_path = osp.join(data_prefix, 'pred.csv') + label_map = osp.join(data_prefix, 'action_list.txt') + + # eval bbox + detection = ava_eval(result_path, 'mAP', label_map, gt_path, None) + assert_array_almost_equal(detection['mAP@0.5IOU'], 0.09385522) + + +def test_confusion_matrix(): + # custom confusion_matrix + gt_labels = [np.int64(random.randint(0, 9)) for _ in range(100)] + pred_labels = np.random.randint(10, size=100, dtype=np.int64) + + for normalize in [None, 'true', 'pred', 'all']: + cf_mat = confusion_matrix(pred_labels, gt_labels, normalize) + gt_cf_mat = gt_confusion_matrix(gt_labels, pred_labels, normalize) + assert_array_equal(cf_mat, gt_cf_mat) + + with pytest.raises(ValueError): + # normalize must be in ['true', 'pred', 'all', None] + confusion_matrix([1], [1], 'unsupport') + + with pytest.raises(TypeError): + # y_pred must be list or np.ndarray + confusion_matrix(0.5, [1]) + + with pytest.raises(TypeError): + # y_real must be list or np.ndarray + confusion_matrix([1], 0.5) + + with pytest.raises(TypeError): + # y_pred dtype must be np.int64 + confusion_matrix([0.5], [1]) + + with pytest.raises(TypeError): + # y_real dtype must be np.int64 + confusion_matrix([1], [0.5]) + + +def test_topk(): + scores = [ + np.array([-0.2203, -0.7538, 1.8789, 0.4451, -0.2526]), + np.array([-0.0413, 0.6366, 1.1155, 0.3484, 0.0395]), + np.array([0.0365, 0.5158, 1.1067, -0.9276, -0.2124]), + np.array([0.6232, 0.9912, -0.8562, 0.0148, 1.6413]) + ] + + # top1 acc + k = (1, ) + top1_labels_0 = [3, 1, 1, 1] + top1_labels_25 = [2, 0, 4, 3] + top1_labels_50 = [2, 2, 
3, 1] + top1_labels_75 = [2, 2, 2, 3] + top1_labels_100 = [2, 2, 2, 4] + res = top_k_accuracy(scores, top1_labels_0, k) + assert res == [0] + res = top_k_accuracy(scores, top1_labels_25, k) + assert res == [0.25] + res = top_k_accuracy(scores, top1_labels_50, k) + assert res == [0.5] + res = top_k_accuracy(scores, top1_labels_75, k) + assert res == [0.75] + res = top_k_accuracy(scores, top1_labels_100, k) + assert res == [1.0] + + # top1 acc, top2 acc + k = (1, 2) + top2_labels_0_100 = [3, 1, 1, 1] + top2_labels_25_75 = [3, 1, 2, 3] + res = top_k_accuracy(scores, top2_labels_0_100, k) + assert res == [0, 1.0] + res = top_k_accuracy(scores, top2_labels_25_75, k) + assert res == [0.25, 0.75] + + # top1 acc, top3 acc, top5 acc + k = (1, 3, 5) + top5_labels_0_0_100 = [1, 0, 3, 2] + top5_labels_0_50_100 = [1, 3, 4, 0] + top5_labels_25_75_100 = [2, 3, 0, 2] + res = top_k_accuracy(scores, top5_labels_0_0_100, k) + assert res == [0, 0, 1.0] + res = top_k_accuracy(scores, top5_labels_0_50_100, k) + assert res == [0, 0.5, 1.0] + res = top_k_accuracy(scores, top5_labels_25_75_100, k) + assert res == [0.25, 0.75, 1.0] + + +def test_mean_class_accuracy(): + scores = [ + np.array([-0.2203, -0.7538, 1.8789, 0.4451, -0.2526]), + np.array([-0.0413, 0.6366, 1.1155, 0.3484, 0.0395]), + np.array([0.0365, 0.5158, 1.1067, -0.9276, -0.2124]), + np.array([0.6232, 0.9912, -0.8562, 0.0148, 1.6413]) + ] + + # test mean class accuracy in [0, 0.25, 1/3, 0.75, 1.0] + mean_cls_acc_0 = np.int64([1, 4, 0, 2]) + mean_cls_acc_25 = np.int64([2, 0, 4, 3]) + mean_cls_acc_33 = np.int64([2, 2, 2, 3]) + mean_cls_acc_75 = np.int64([4, 2, 2, 4]) + mean_cls_acc_100 = np.int64([2, 2, 2, 4]) + assert mean_class_accuracy(scores, mean_cls_acc_0) == 0 + assert mean_class_accuracy(scores, mean_cls_acc_25) == 0.25 + assert mean_class_accuracy(scores, mean_cls_acc_33) == 1 / 3 + assert mean_class_accuracy(scores, mean_cls_acc_75) == 0.75 + assert mean_class_accuracy(scores, mean_cls_acc_100) == 1.0 + + +def 
test_mmit_mean_average_precision(): + # One sample + y_true = [np.array([0, 0, 1, 1])] + y_scores = [np.array([0.1, 0.4, 0.35, 0.8])] + map = mmit_mean_average_precision(y_scores, y_true) + + precision = [2.0 / 3.0, 0.5, 1., 1.] + recall = [1., 0.5, 0.5, 0.] + target = -np.sum(np.diff(recall) * np.array(precision)[:-1]) + assert target == map + + +def test_pairwise_temporal_iou(): + target_segments = np.array([]) + candidate_segments = np.array([]) + with pytest.raises(ValueError): + pairwise_temporal_iou(target_segments, candidate_segments) + + # test temporal iou + target_segments = np.array([[1, 2], [2, 3]]) + candidate_segments = np.array([[2, 3], [2.5, 3]]) + temporal_iou = pairwise_temporal_iou(candidate_segments, target_segments) + assert_array_equal(temporal_iou, [[0, 0], [1, 0.5]]) + + # test temporal overlap_self + target_segments = np.array([[1, 2], [2, 3]]) + candidate_segments = np.array([[2, 3], [2.5, 3]]) + temporal_iou, temporal_overlap_self = pairwise_temporal_iou( + candidate_segments, target_segments, calculate_overlap_self=True) + assert_array_equal(temporal_overlap_self, [[0, 0], [1, 1]]) + + # test temporal overlap_self when candidate_segments is 1d + target_segments = np.array([[1, 2], [2, 3]]) + candidate_segments = np.array([2.5, 3]) + temporal_iou, temporal_overlap_self = pairwise_temporal_iou( + candidate_segments, target_segments, calculate_overlap_self=True) + assert_array_equal(temporal_overlap_self, [0, 1]) + + +def test_average_recall_at_avg_proposals(): + ground_truth1 = { + 'v_test1': np.array([[0, 1], [1, 2]]), + 'v_test2': np.array([[0, 1], [1, 2]]) + } + ground_truth2 = {'v_test1': np.array([[0, 1]])} + proposals1 = { + 'v_test1': np.array([[0, 1, 1], [1, 2, 1]]), + 'v_test2': np.array([[0, 1, 1], [1, 2, 1]]) + } + proposals2 = { + 'v_test1': np.array([[10, 11, 0.6], [11, 12, 0.4]]), + 'v_test2': np.array([[10, 11, 0.6], [11, 12, 0.4]]) + } + proposals3 = { + 'v_test1': np.array([[i, i + 1, 1 / (i + 1)] for i in range(100)]) + } 
+ + recall, avg_recall, proposals_per_video, auc = ( + average_recall_at_avg_proposals(ground_truth1, proposals1, 4)) + assert_array_equal(recall, [[0.] * 49 + [0.5] * 50 + [1.]] * 10) + assert_array_equal(avg_recall, [0.] * 49 + [0.5] * 50 + [1.]) + assert_array_almost_equal( + proposals_per_video, np.arange(0.02, 2.02, 0.02), decimal=10) + assert auc == 25.5 + + recall, avg_recall, proposals_per_video, auc = ( + average_recall_at_avg_proposals(ground_truth1, proposals2, 4)) + assert_array_equal(recall, [[0.] * 100] * 10) + assert_array_equal(avg_recall, [0.] * 100) + assert_array_almost_equal( + proposals_per_video, np.arange(0.02, 2.02, 0.02), decimal=10) + assert auc == 0 + + recall, avg_recall, proposals_per_video, auc = ( + average_recall_at_avg_proposals(ground_truth2, proposals3, 100)) + assert_array_equal(recall, [[1.] * 100] * 10) + assert_array_equal(avg_recall, ([1.] * 100)) + assert_array_almost_equal( + proposals_per_video, np.arange(1, 101, 1), decimal=10) + assert auc == 99.0 + + +def test_get_weighted_score(): + score_a = [ + np.array([-0.2203, -0.7538, 1.8789, 0.4451, -0.2526]), + np.array([-0.0413, 0.6366, 1.1155, 0.3484, 0.0395]), + np.array([0.0365, 0.5158, 1.1067, -0.9276, -0.2124]), + np.array([0.6232, 0.9912, -0.8562, 0.0148, 1.6413]) + ] + score_b = [ + np.array([-0.0413, 0.6366, 1.1155, 0.3484, 0.0395]), + np.array([0.0365, 0.5158, 1.1067, -0.9276, -0.2124]), + np.array([0.6232, 0.9912, -0.8562, 0.0148, 1.6413]), + np.array([-0.2203, -0.7538, 1.8789, 0.4451, -0.2526]) + ] + weighted_score = get_weighted_score([score_a], [1]) + assert np.all(np.isclose(np.array(score_a), np.array(weighted_score))) + coeff_a, coeff_b = 2., 1. 
+ weighted_score = get_weighted_score([score_a, score_b], [coeff_a, coeff_b]) + ground_truth = [ + x * coeff_a + y * coeff_b for x, y in zip(score_a, score_b) + ] + assert np.all(np.isclose(np.array(ground_truth), np.array(weighted_score))) + + +def test_mean_average_precision(): + + def content_for_unittest(scores, labels, result): + gt = mean_average_precision(scores, labels) + assert gt == result + + scores = [ + np.array([0.1, 0.2, 0.3, 0.4]), + np.array([0.2, 0.3, 0.4, 0.1]), + np.array([0.3, 0.4, 0.1, 0.2]), + np.array([0.4, 0.1, 0.2, 0.3]) + ] + + label1 = np.array([[1, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 0, 1]]) + result1 = 2 / 3 + label2 = np.array([[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 1]]) + result2 = np.mean([0.5, 0.5833333333333333, 0.8055555555555556, 1.0]) + + content_for_unittest(scores, label1, result1) + content_for_unittest(scores, label2, result2) + + +def test_top_k_accurate_classes(): + scores = [ + np.array([0.1, 0.2, 0.3, 0.4]), # 3 + np.array([0.2, 0.3, 0.4, 0.1]), # 2 + np.array([0.3, 0.4, 0.1, 0.2]), # 1 + np.array([0.4, 0.1, 0.2, 0.3]), # 0 + np.array([0.25, 0.1, 0.3, 0.35]), # 3 + np.array([0.2, 0.15, 0.3, 0.35]), # 3 + ] + label = np.array([3, 2, 2, 1, 3, 3], dtype=np.int64) + + with pytest.raises(AssertionError): + top_k_classes(scores, label, 1, mode='wrong') + + results_top1 = top_k_classes(scores, label, 1) + results_top3 = top_k_classes(scores, label, 3) + assert len(results_top1) == 1 + assert len(results_top3) == 3 + assert results_top3[0] == results_top1[0] + assert results_top1 == [(3, 1.)] + assert results_top3 == [(3, 1.), (2, 0.5), (1, 0.0)] + + label = np.array([3, 2, 1, 1, 3, 0], dtype=np.int64) + results_top1 = top_k_classes(scores, label, 1, mode='inaccurate') + results_top3 = top_k_classes(scores, label, 3, mode='inaccurate') + assert len(results_top1) == 1 + assert len(results_top3) == 3 + assert results_top3[0] == results_top1[0] + assert results_top1 == [(0, 0.)] + assert results_top3 == [(0, 
0.0), (1, 0.5), (2, 1.0)] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_metrics/test_losses.py b/openmmlab_test/mmaction2-0.24.1/tests/test_metrics/test_losses.py new file mode 100644 index 0000000000000000000000000000000000000000..1c0d657798353ec64f74c6e2e1dc3af24a07fdf2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_metrics/test_losses.py @@ -0,0 +1,332 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmcv import ConfigDict +from numpy.testing import assert_almost_equal, assert_array_almost_equal +from torch.autograd import Variable + +from mmaction.models import (BCELossWithLogits, BinaryLogisticRegressionLoss, + BMNLoss, CrossEntropyLoss, HVULoss, NLLLoss, + OHEMHingeLoss, SSNLoss) + + +def test_hvu_loss(): + pred = torch.tensor([[-1.0525, -0.7085, 0.1819, -0.8011], + [0.1555, -1.5550, 0.5586, 1.9746]]) + gt = torch.tensor([[1., 0., 0., 0.], [0., 0., 1., 1.]]) + mask = torch.tensor([[1., 1., 0., 0.], [0., 0., 1., 1.]]) + category_mask = torch.tensor([[1., 0.], [0., 1.]]) + categories = ['action', 'scene'] + category_nums = (2, 2) + category_loss_weights = (1, 1) + loss_all_nomask_sum = HVULoss( + categories=categories, + category_nums=category_nums, + category_loss_weights=category_loss_weights, + loss_type='all', + with_mask=False, + reduction='sum') + loss = loss_all_nomask_sum(pred, gt, mask, category_mask) + loss1 = F.binary_cross_entropy_with_logits(pred, gt, reduction='none') + loss1 = torch.sum(loss1, dim=1) + assert torch.eq(loss['loss_cls'], torch.mean(loss1)) + + loss_all_mask = HVULoss( + categories=categories, + category_nums=category_nums, + category_loss_weights=category_loss_weights, + loss_type='all', + with_mask=True) + loss = loss_all_mask(pred, gt, mask, category_mask) + loss1 = F.binary_cross_entropy_with_logits(pred, gt, reduction='none') + loss1 = torch.sum(loss1 * mask, dim=1) / torch.sum(mask, dim=1) 
+ loss1 = torch.mean(loss1) + assert torch.eq(loss['loss_cls'], loss1) + + loss_ind_mask = HVULoss( + categories=categories, + category_nums=category_nums, + category_loss_weights=category_loss_weights, + loss_type='individual', + with_mask=True) + loss = loss_ind_mask(pred, gt, mask, category_mask) + action_loss = F.binary_cross_entropy_with_logits(pred[:1, :2], gt[:1, :2]) + scene_loss = F.binary_cross_entropy_with_logits(pred[1:, 2:], gt[1:, 2:]) + loss1 = (action_loss + scene_loss) / 2 + assert torch.eq(loss['loss_cls'], loss1) + + loss_ind_nomask_sum = HVULoss( + categories=categories, + category_nums=category_nums, + category_loss_weights=category_loss_weights, + loss_type='individual', + with_mask=False, + reduction='sum') + loss = loss_ind_nomask_sum(pred, gt, mask, category_mask) + action_loss = F.binary_cross_entropy_with_logits( + pred[:, :2], gt[:, :2], reduction='none') + action_loss = torch.sum(action_loss, dim=1) + action_loss = torch.mean(action_loss) + + scene_loss = F.binary_cross_entropy_with_logits( + pred[:, 2:], gt[:, 2:], reduction='none') + scene_loss = torch.sum(scene_loss, dim=1) + scene_loss = torch.mean(scene_loss) + + loss1 = (action_loss + scene_loss) / 2 + assert torch.eq(loss['loss_cls'], loss1) + + +def test_cross_entropy_loss(): + cls_scores = torch.rand((3, 4)) + hard_gt_labels = torch.LongTensor([0, 1, 2]).squeeze() + soft_gt_labels = torch.FloatTensor([[1, 0, 0, 0], [0, 1, 0, 0], + [0, 0, 1, 0]]).squeeze() + + # hard label without weight + cross_entropy_loss = CrossEntropyLoss() + output_loss = cross_entropy_loss(cls_scores, hard_gt_labels) + assert torch.equal(output_loss, F.cross_entropy(cls_scores, + hard_gt_labels)) + + # hard label with class weight + weight = torch.rand(4) + class_weight = weight.numpy().tolist() + cross_entropy_loss = CrossEntropyLoss(class_weight=class_weight) + output_loss = cross_entropy_loss(cls_scores, hard_gt_labels) + assert torch.equal( + output_loss, + F.cross_entropy(cls_scores, hard_gt_labels, 
weight=weight)) + + # soft label without class weight + cross_entropy_loss = CrossEntropyLoss() + output_loss = cross_entropy_loss(cls_scores, soft_gt_labels) + assert_almost_equal( + output_loss.numpy(), + F.cross_entropy(cls_scores, hard_gt_labels).numpy(), + decimal=4) + + # soft label with class weight + cross_entropy_loss = CrossEntropyLoss(class_weight=class_weight) + output_loss = cross_entropy_loss(cls_scores, soft_gt_labels) + assert_almost_equal( + output_loss.numpy(), + F.cross_entropy(cls_scores, hard_gt_labels, weight=weight).numpy(), + decimal=4) + + +def test_bce_loss_with_logits(): + cls_scores = torch.rand((3, 4)) + gt_labels = torch.rand((3, 4)) + + bce_loss_with_logits = BCELossWithLogits() + output_loss = bce_loss_with_logits(cls_scores, gt_labels) + assert torch.equal( + output_loss, F.binary_cross_entropy_with_logits(cls_scores, gt_labels)) + + weight = torch.rand(4) + class_weight = weight.numpy().tolist() + bce_loss_with_logits = BCELossWithLogits(class_weight=class_weight) + output_loss = bce_loss_with_logits(cls_scores, gt_labels) + assert torch.equal( + output_loss, + F.binary_cross_entropy_with_logits( + cls_scores, gt_labels, weight=weight)) + + +def test_nll_loss(): + cls_scores = torch.randn(3, 3) + gt_labels = torch.tensor([0, 2, 1]).squeeze() + + sm = nn.Softmax(dim=0) + nll_loss = NLLLoss() + cls_score_log = torch.log(sm(cls_scores)) + output_loss = nll_loss(cls_score_log, gt_labels) + assert torch.equal(output_loss, F.nll_loss(cls_score_log, gt_labels)) + + +def test_binary_logistic_loss(): + binary_logistic_regression_loss = BinaryLogisticRegressionLoss() + reg_score = torch.tensor([0., 1.]) + label = torch.tensor([0., 1.]) + output_loss = binary_logistic_regression_loss(reg_score, label, 0.5) + assert_array_almost_equal(output_loss.numpy(), np.array([0.]), decimal=4) + + reg_score = torch.tensor([0.3, 0.9]) + label = torch.tensor([0., 1.]) + output_loss = binary_logistic_regression_loss(reg_score, label, 0.5) + 
assert_array_almost_equal( + output_loss.numpy(), np.array([0.231]), decimal=4) + + +def test_bmn_loss(): + bmn_loss = BMNLoss() + + # test tem_loss + pred_start = torch.tensor([0.9, 0.1]) + pred_end = torch.tensor([0.1, 0.9]) + gt_start = torch.tensor([1., 0.]) + gt_end = torch.tensor([0., 1.]) + output_tem_loss = bmn_loss.tem_loss(pred_start, pred_end, gt_start, gt_end) + binary_logistic_regression_loss = BinaryLogisticRegressionLoss() + assert_loss = ( + binary_logistic_regression_loss(pred_start, gt_start) + + binary_logistic_regression_loss(pred_end, gt_end)) + assert_array_almost_equal( + output_tem_loss.numpy(), assert_loss.numpy(), decimal=4) + + # test pem_reg_loss + seed = 1 + torch.manual_seed(seed) + torch.cuda.manual_seed(seed) + torch.cuda.manual_seed_all(seed) + + pred_bm_reg = torch.tensor([[0.1, 0.99], [0.5, 0.4]]) + gt_iou_map = torch.tensor([[0, 1.], [0, 1.]]) + mask = torch.tensor([[0.1, 0.4], [0.4, 0.1]]) + output_pem_reg_loss = bmn_loss.pem_reg_loss(pred_bm_reg, gt_iou_map, mask) + assert_array_almost_equal( + output_pem_reg_loss.numpy(), np.array([0.2140]), decimal=4) + + # test pem_cls_loss + pred_bm_cls = torch.tensor([[0.1, 0.99], [0.95, 0.2]]) + gt_iou_map = torch.tensor([[0., 1.], [0., 1.]]) + mask = torch.tensor([[0.1, 0.4], [0.4, 0.1]]) + output_pem_cls_loss = bmn_loss.pem_cls_loss(pred_bm_cls, gt_iou_map, mask) + assert_array_almost_equal( + output_pem_cls_loss.numpy(), np.array([1.6137]), decimal=4) + + # test bmn_loss + pred_bm = torch.tensor([[[[0.1, 0.99], [0.5, 0.4]], + [[0.1, 0.99], [0.95, 0.2]]]]) + pred_start = torch.tensor([[0.9, 0.1]]) + pred_end = torch.tensor([[0.1, 0.9]]) + gt_iou_map = torch.tensor([[[0., 2.5], [0., 10.]]]) + gt_start = torch.tensor([[1., 0.]]) + gt_end = torch.tensor([[0., 1.]]) + mask = torch.tensor([[0.1, 0.4], [0.4, 0.1]]) + output_loss = bmn_loss(pred_bm, pred_start, pred_end, gt_iou_map, gt_start, + gt_end, mask) + assert_array_almost_equal( + output_loss[0].numpy(), + output_tem_loss + 10 * 
output_pem_reg_loss + output_pem_cls_loss) + assert_array_almost_equal(output_loss[1].numpy(), output_tem_loss) + assert_array_almost_equal(output_loss[2].numpy(), output_pem_reg_loss) + assert_array_almost_equal(output_loss[3].numpy(), output_pem_cls_loss) + + +def test_ohem_hinge_loss(): + # test normal case + pred = torch.tensor([[ + 0.5161, 0.5228, 0.7748, 0.0573, 0.1113, 0.8862, 0.1752, 0.9448, 0.0253, + 0.1009, 0.4371, 0.2232, 0.0412, 0.3487, 0.3350, 0.9294, 0.7122, 0.3072, + 0.2942, 0.7679 + ]], + requires_grad=True) + gt = torch.tensor([8]) + num_video = 1 + loss = OHEMHingeLoss.apply(pred, gt, 1, 1.0, num_video) + assert_array_almost_equal( + loss.detach().numpy(), np.array([0.0552]), decimal=4) + loss.backward(Variable(torch.ones([1]))) + assert_array_almost_equal( + np.array(pred.grad), + np.array([[ + 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0., 0., 0., 0., 0., 0., + 0., 0., 0., 0. + ]]), + decimal=4) + + # test error case + with pytest.raises(ValueError): + gt = torch.tensor([8, 10]) + loss = OHEMHingeLoss.apply(pred, gt, 1, 1.0, num_video) + + +def test_ssn_loss(): + ssn_loss = SSNLoss() + + # test activity_loss + activity_score = torch.rand((8, 21)) + labels = torch.LongTensor([8] * 8).squeeze() + activity_indexer = torch.tensor([0, 7]) + output_activity_loss = ssn_loss.activity_loss(activity_score, labels, + activity_indexer) + assert torch.equal( + output_activity_loss, + F.cross_entropy(activity_score[activity_indexer, :], + labels[activity_indexer])) + + # test completeness_loss + completeness_score = torch.rand((8, 20), requires_grad=True) + labels = torch.LongTensor([8] * 8).squeeze() + completeness_indexer = torch.tensor([0, 1, 2, 3, 4, 5, 6]) + positive_per_video = 1 + incomplete_per_video = 6 + output_completeness_loss = ssn_loss.completeness_loss( + completeness_score, labels, completeness_indexer, positive_per_video, + incomplete_per_video) + + pred = completeness_score[completeness_indexer, :] + gt = labels[completeness_indexer] + 
pred_dim = pred.size(1) + pred = pred.view(-1, positive_per_video + incomplete_per_video, pred_dim) + gt = gt.view(-1, positive_per_video + incomplete_per_video) + # yapf:disable + positive_pred = pred[:, :positive_per_video, :].contiguous().view(-1, pred_dim) # noqa:E501 + incomplete_pred = pred[:, positive_per_video:, :].contiguous().view(-1, pred_dim) # noqa:E501 + # yapf:enable + ohem_ratio = 0.17 + positive_loss = OHEMHingeLoss.apply( + positive_pred, gt[:, :positive_per_video].contiguous().view(-1), 1, + 1.0, positive_per_video) + incomplete_loss = OHEMHingeLoss.apply( + incomplete_pred, gt[:, positive_per_video:].contiguous().view(-1), -1, + ohem_ratio, incomplete_per_video) + num_positives = positive_pred.size(0) + num_incompletes = int(incomplete_pred.size(0) * ohem_ratio) + assert_loss = ((positive_loss + incomplete_loss) / + float(num_positives + num_incompletes)) + assert torch.equal(output_completeness_loss, assert_loss) + + # test reg_loss + bbox_pred = torch.rand((8, 20, 2)) + labels = torch.LongTensor([8] * 8).squeeze() + bbox_targets = torch.rand((8, 2)) + regression_indexer = torch.tensor([0]) + output_reg_loss = ssn_loss.classwise_regression_loss( + bbox_pred, labels, bbox_targets, regression_indexer) + + pred = bbox_pred[regression_indexer, :, :] + gt = labels[regression_indexer] + reg_target = bbox_targets[regression_indexer, :] + class_idx = gt.data - 1 + classwise_pred = pred[:, class_idx, :] + classwise_reg_pred = torch.cat((torch.diag(classwise_pred[:, :, 0]).view( + -1, 1), torch.diag(classwise_pred[:, :, 1]).view(-1, 1)), + dim=1) + assert torch.equal( + output_reg_loss, + F.smooth_l1_loss(classwise_reg_pred.view(-1), reg_target.view(-1)) * 2) + + # test ssn_loss + proposal_type = torch.tensor([[0, 1, 1, 1, 1, 1, 1, 2]]) + train_cfg = ConfigDict( + dict( + ssn=dict( + sampler=dict( + num_per_video=8, + positive_ratio=1, + background_ratio=1, + incomplete_ratio=6, + add_gt_as_proposals=True), + loss_weight=dict(comp_loss_weight=0.1, 
reg_loss_weight=0.1)))) + output_loss = ssn_loss(activity_score, completeness_score, bbox_pred, + proposal_type, labels, bbox_targets, train_cfg) + assert torch.equal(output_loss['loss_activity'], output_activity_loss) + assert torch.equal(output_loss['loss_completeness'], + output_completeness_loss * 0.1) + assert torch.equal(output_loss['loss_reg'], output_reg_loss * 0.1) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7ae5f7087a9016a3d9c6c1265a253c7e093860ee --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .base import (check_norm_state, generate_backbone_demo_inputs, + generate_detector_demo_inputs, generate_gradcam_inputs, + generate_recognizer_demo_inputs, get_audio_recognizer_cfg, + get_cfg, get_detector_cfg, get_localizer_cfg, + get_recognizer_cfg, get_skeletongcn_cfg) + +__all__ = [ + 'check_norm_state', 'generate_backbone_demo_inputs', + 'generate_recognizer_demo_inputs', 'generate_gradcam_inputs', 'get_cfg', + 'get_recognizer_cfg', 'get_audio_recognizer_cfg', 'get_localizer_cfg', + 'get_detector_cfg', 'generate_detector_demo_inputs', 'get_skeletongcn_cfg' +] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/base.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/base.py new file mode 100644 index 0000000000000000000000000000000000000000..49c1fd7ad9009dcae0ff258d53f98b099fc166cc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/base.py @@ -0,0 +1,167 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +import mmcv +import numpy as np +import torch +from mmcv.utils import _BatchNorm + + +def check_norm_state(modules, train_state): + """Check if norm layer is in correct train state.""" + for mod in modules: + if isinstance(mod, _BatchNorm): + if mod.training != train_state: + return False + return True + + +def generate_backbone_demo_inputs(input_shape=(1, 3, 64, 64)): + """Create a superset of inputs needed to run backbone. + + Args: + input_shape (tuple): input batch dimensions. + Default: (1, 3, 64, 64). + """ + imgs = np.random.random(input_shape) + imgs = torch.FloatTensor(imgs) + + return imgs + + +def generate_recognizer_demo_inputs( + input_shape=(1, 3, 3, 224, 224), model_type='2D'): + """Create a superset of inputs needed to run test or train batches. + + Args: + input_shape (tuple): input batch dimensions. + Default: (1, 3, 3, 224, 224). + model_type (str): Model type for data generation, one of + {'2D', '3D', 'skeleton', 'audio'}. Default: '2D'. + """ + if len(input_shape) == 5: + (N, L, _, _, _) = input_shape + elif len(input_shape) == 6: + (N, M, _, L, _, _) = input_shape + + imgs = np.random.random(input_shape) + + if model_type == '2D' or model_type == 'skeleton': + gt_labels = torch.LongTensor([2] * N) + elif model_type == '3D': + gt_labels = torch.LongTensor([2] * M) + elif model_type == 'audio': + gt_labels = torch.LongTensor([2] * L) + else: + raise ValueError(f'Data type {model_type} is not available') + + inputs = {'imgs': torch.FloatTensor(imgs), 'gt_labels': gt_labels} + return inputs + + +def generate_detector_demo_inputs( + input_shape=(1, 3, 4, 224, 224), num_classes=81, train=True, + device='cpu'): + num_samples = input_shape[0] + if not train: + assert num_samples == 1 + + def random_box(n): + box = torch.rand(n, 4) * 0.5 + box[:, 2:] += 0.5 + box[:, 0::2] *= input_shape[3] + box[:, 1::2] *= input_shape[4] + if device == 'cuda': + box = box.cuda() + return box + + def random_label(n): + label = torch.randn(n, num_classes) + label =
(label > 0.8).type(torch.float32) + label[:, 0] = 0 + if device == 'cuda': + label = label.cuda() + return label + + img = torch.FloatTensor(np.random.random(input_shape)) + if device == 'cuda': + img = img.cuda() + + proposals = [random_box(2) for i in range(num_samples)] + gt_bboxes = [random_box(2) for i in range(num_samples)] + gt_labels = [random_label(2) for i in range(num_samples)] + img_metas = [dict(img_shape=input_shape[-2:]) for i in range(num_samples)] + + if train: + return dict( + img=img, + proposals=proposals, + gt_bboxes=gt_bboxes, + gt_labels=gt_labels, + img_metas=img_metas) + + return dict(img=[img], proposals=[proposals], img_metas=[img_metas]) + + +def generate_gradcam_inputs(input_shape=(1, 3, 3, 224, 224), model_type='2D'): + """Create a superset of inputs needed to run gradcam. + + Args: + input_shape (tuple[int]): input batch dimensions. + Default: (1, 3, 3, 224, 224). + model_type (str): Model type for data generation, from {'2D', '3D'}. + Default: '2D'. + Returns: + dict: model inputs, including two keys, ``imgs`` and ``label``. + """ + imgs = np.random.random(input_shape) + + if model_type in ['2D', '3D']: + gt_labels = torch.LongTensor([2] * input_shape[0]) + else: + raise ValueError(f'Data type {model_type} is not available') + + inputs = { + 'imgs': torch.FloatTensor(imgs), + 'label': gt_labels, + } + return inputs + + +def get_cfg(config_type, fname): + """Grab configs necessary to create a recognizer. + + These are deep copied to allow for safe modification of parameters without + influencing other tests.
+ """ + config_types = ('recognition', 'recognition_audio', 'localization', + 'detection', 'skeleton') + assert config_type in config_types + + repo_dpath = osp.dirname(osp.dirname(osp.dirname(__file__))) + config_dpath = osp.join(repo_dpath, 'configs/' + config_type) + config_fpath = osp.join(config_dpath, fname) + if not osp.exists(config_dpath): + raise Exception('Cannot find config path') + config = mmcv.Config.fromfile(config_fpath) + return config + + +def get_recognizer_cfg(fname): + return get_cfg('recognition', fname) + + +def get_audio_recognizer_cfg(fname): + return get_cfg('recognition_audio', fname) + + +def get_localizer_cfg(fname): + return get_cfg('localization', fname) + + +def get_detector_cfg(fname): + return get_cfg('detection', fname) + + +def get_skeletongcn_cfg(fname): + return get_cfg('skeleton', fname) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_backbones.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_backbones.py new file mode 100644 index 0000000000000000000000000000000000000000..1933b98182ebab17b1239d30b343a99f8addf53c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_backbones.py @@ -0,0 +1,931 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy + +import pytest +import torch +import torch.nn as nn +from mmcv.utils import _BatchNorm + +from mmaction.models import (C3D, STGCN, X3D, MobileNetV2TSM, ResNet2Plus1d, + ResNet3dCSN, ResNet3dSlowFast, ResNet3dSlowOnly, + ResNetAudio, ResNetTIN, ResNetTSM, TANet, + TimeSformer) +from mmaction.models.backbones.resnet_tsm import NL3DWrapper +from .base import check_norm_state, generate_backbone_demo_inputs + + +def test_x3d_backbone(): + """Test x3d backbone.""" + with pytest.raises(AssertionError): + # In X3D: 1 <= num_stages <= 4 + X3D(gamma_w=1.0, gamma_b=2.25, gamma_d=2.2, num_stages=0) + + with pytest.raises(AssertionError): + # In X3D: 1 <= num_stages <= 4 + X3D(gamma_w=1.0, gamma_b=2.25, gamma_d=2.2, num_stages=5) + + with pytest.raises(AssertionError): + # len(spatial_strides) == num_stages + X3D(gamma_w=1.0, + gamma_b=2.25, + gamma_d=2.2, + spatial_strides=(1, 2), + num_stages=4) + + with pytest.raises(AssertionError): + # se_style in ['half', 'all'] + X3D(gamma_w=1.0, gamma_b=2.25, gamma_d=2.2, se_style=None) + + with pytest.raises(AssertionError): + # se_ratio should be None or > 0 + X3D(gamma_w=1.0, + gamma_b=2.25, + gamma_d=2.2, + se_style='half', + se_ratio=0) + + # x3d_s, no pretrained, norm_eval True + x3d_s = X3D(gamma_w=1.0, gamma_b=2.25, gamma_d=2.2, norm_eval=True) + x3d_s.init_weights() + x3d_s.train() + assert check_norm_state(x3d_s.modules(), False) + + # x3d_l, no pretrained, norm_eval True + x3d_l = X3D(gamma_w=1.0, gamma_b=2.25, gamma_d=5.0, norm_eval=True) + x3d_l.init_weights() + x3d_l.train() + assert check_norm_state(x3d_l.modules(), False) + + # x3d_s, no pretrained, norm_eval False + x3d_s = X3D(gamma_w=1.0, gamma_b=2.25, gamma_d=2.2, norm_eval=False) + x3d_s.init_weights() + x3d_s.train() + assert check_norm_state(x3d_s.modules(), True) + + # x3d_l, no pretrained, norm_eval False + x3d_l = X3D(gamma_w=1.0, gamma_b=2.25, gamma_d=5.0, norm_eval=False) + x3d_l.init_weights() + x3d_l.train() + assert 
check_norm_state(x3d_l.modules(), True) + + # x3d_s, no pretrained, frozen_stages, norm_eval False + frozen_stages = 1 + x3d_s_frozen = X3D( + gamma_w=1.0, + gamma_b=2.25, + gamma_d=2.2, + norm_eval=False, + frozen_stages=frozen_stages) + + x3d_s_frozen.init_weights() + x3d_s_frozen.train() + assert x3d_s_frozen.conv1_t.bn.training is False + for param in x3d_s_frozen.conv1_s.parameters(): + assert param.requires_grad is False + for param in x3d_s_frozen.conv1_t.parameters(): + assert param.requires_grad is False + + for i in range(1, frozen_stages + 1): + layer = getattr(x3d_s_frozen, f'layer{i}') + for mod in layer.modules(): + if isinstance(mod, _BatchNorm): + assert mod.training is False + for param in layer.parameters(): + assert param.requires_grad is False + + # test zero_init_residual, zero_init_residual is True by default + for m in x3d_s_frozen.modules(): + if hasattr(m, 'conv3'): + assert torch.equal(m.conv3.bn.weight, + torch.zeros_like(m.conv3.bn.weight)) + assert torch.equal(m.conv3.bn.bias, + torch.zeros_like(m.conv3.bn.bias)) + + # x3d_s inference + input_shape = (1, 3, 13, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + x3d_s_frozen = x3d_s_frozen.cuda() + imgs_gpu = imgs.cuda() + feat = x3d_s_frozen(imgs_gpu) + assert feat.shape == torch.Size([1, 432, 13, 2, 2]) + else: + feat = x3d_s_frozen(imgs) + assert feat.shape == torch.Size([1, 432, 13, 2, 2]) + + # x3d_m inference + input_shape = (1, 3, 16, 96, 96) + imgs = generate_backbone_demo_inputs(input_shape) + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + x3d_s_frozen = x3d_s_frozen.cuda() + imgs_gpu = imgs.cuda() + feat = x3d_s_frozen(imgs_gpu) + assert feat.shape == torch.Size([1, 432, 16, 3, 3]) + else: + feat = x3d_s_frozen(imgs) + assert feat.shape == torch.Size([1, 432, 16, 3, 3]) + + +def 
test_resnet2plus1d_backbone(): + # Test r2+1d backbone + with pytest.raises(AssertionError): + # r2+1d does not support inflation + ResNet2Plus1d(50, None, pretrained2d=True) + + with pytest.raises(AssertionError): + # r2+1d requires conv(2+1)d module + ResNet2Plus1d( + 50, None, pretrained2d=False, conv_cfg=dict(type='Conv3d')) + + frozen_stages = 1 + r2plus1d_34_frozen = ResNet2Plus1d( + 34, + None, + conv_cfg=dict(type='Conv2plus1d'), + pretrained2d=False, + frozen_stages=frozen_stages, + conv1_kernel=(3, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(1, 1, 1, 1), + spatial_strides=(1, 2, 2, 2), + temporal_strides=(1, 2, 2, 2)) + r2plus1d_34_frozen.init_weights() + r2plus1d_34_frozen.train() + assert r2plus1d_34_frozen.conv1.conv.bn_s.training is False + assert r2plus1d_34_frozen.conv1.bn.training is False + for param in r2plus1d_34_frozen.conv1.parameters(): + assert param.requires_grad is False + for i in range(1, frozen_stages + 1): + layer = getattr(r2plus1d_34_frozen, f'layer{i}') + for mod in layer.modules(): + if isinstance(mod, _BatchNorm): + assert mod.training is False + for param in layer.parameters(): + assert param.requires_grad is False + input_shape = (1, 3, 8, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + r2plus1d_34_frozen = r2plus1d_34_frozen.cuda() + imgs_gpu = imgs.cuda() + feat = r2plus1d_34_frozen(imgs_gpu) + assert feat.shape == torch.Size([1, 512, 1, 2, 2]) + else: + feat = r2plus1d_34_frozen(imgs) + assert feat.shape == torch.Size([1, 512, 1, 2, 2]) + + r2plus1d_50_frozen = ResNet2Plus1d( + 50, + None, + conv_cfg=dict(type='Conv2plus1d'), + pretrained2d=False, + conv1_kernel=(3, 7, 7), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(1, 1, 1, 1), + spatial_strides=(1, 2, 2, 2), + temporal_strides=(1, 2, 2, 2), + frozen_stages=frozen_stages) + r2plus1d_50_frozen.init_weights() + + 
r2plus1d_50_frozen.train() + assert r2plus1d_50_frozen.conv1.conv.bn_s.training is False + assert r2plus1d_50_frozen.conv1.bn.training is False + for param in r2plus1d_50_frozen.conv1.parameters(): + assert param.requires_grad is False + for i in range(1, frozen_stages + 1): + layer = getattr(r2plus1d_50_frozen, f'layer{i}') + for mod in layer.modules(): + if isinstance(mod, _BatchNorm): + assert mod.training is False + for param in layer.parameters(): + assert param.requires_grad is False + input_shape = (1, 3, 8, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + r2plus1d_50_frozen = r2plus1d_50_frozen.cuda() + imgs_gpu = imgs.cuda() + feat = r2plus1d_50_frozen(imgs_gpu) + assert feat.shape == torch.Size([1, 2048, 1, 2, 2]) + else: + feat = r2plus1d_50_frozen(imgs) + assert feat.shape == torch.Size([1, 2048, 1, 2, 2]) + + +def test_resnet_tsm_backbone(): + """Test resnet_tsm backbone.""" + with pytest.raises(NotImplementedError): + # shift_place must be block or blockres + resnet_tsm_50_block = ResNetTSM(50, shift_place='Block') + resnet_tsm_50_block.init_weights() + + from mmaction.models.backbones.resnet import Bottleneck + from mmaction.models.backbones.resnet_tsm import TemporalShift + + input_shape = (8, 3, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + + # resnet_tsm with depth 50 + resnet_tsm_50 = ResNetTSM(50) + resnet_tsm_50.init_weights() + for layer_name in resnet_tsm_50.res_layers: + layer = getattr(resnet_tsm_50, layer_name) + blocks = list(layer.children()) + for block in blocks: + assert isinstance(block.conv1.conv, TemporalShift) + assert block.conv1.conv.num_segments == resnet_tsm_50.num_segments + assert block.conv1.conv.shift_div == resnet_tsm_50.shift_div + assert isinstance(block.conv1.conv.net, nn.Conv2d) + + # resnet_tsm with depth 50, no pretrained, shift_place is block + resnet_tsm_50_block = 
ResNetTSM(50, shift_place='block') + resnet_tsm_50_block.init_weights() + for layer_name in resnet_tsm_50_block.res_layers: + layer = getattr(resnet_tsm_50_block, layer_name) + blocks = list(layer.children()) + for block in blocks: + assert isinstance(block, TemporalShift) + assert block.num_segments == resnet_tsm_50_block.num_segments + assert block.num_segments == resnet_tsm_50_block.num_segments + assert block.shift_div == resnet_tsm_50_block.shift_div + assert isinstance(block.net, Bottleneck) + + # resnet_tsm with depth 50, no pretrained, use temporal_pool + resnet_tsm_50_temporal_pool = ResNetTSM(50, temporal_pool=True) + resnet_tsm_50_temporal_pool.init_weights() + for layer_name in resnet_tsm_50_temporal_pool.res_layers: + layer = getattr(resnet_tsm_50_temporal_pool, layer_name) + blocks = list(layer.children()) + + if layer_name == 'layer2': + assert len(blocks) == 2 + assert isinstance(blocks[1], nn.MaxPool3d) + blocks = copy.deepcopy(blocks[0]) + + for block in blocks: + assert isinstance(block.conv1.conv, TemporalShift) + if layer_name == 'layer1': + assert block.conv1.conv.num_segments == \ + resnet_tsm_50_temporal_pool.num_segments + else: + assert block.conv1.conv.num_segments == \ + resnet_tsm_50_temporal_pool.num_segments // 2 + assert block.conv1.conv.shift_div == resnet_tsm_50_temporal_pool.shift_div # noqa: E501 + assert isinstance(block.conv1.conv.net, nn.Conv2d) + + # resnet_tsm with non-local module + non_local_cfg = dict( + sub_sample=True, + use_scale=False, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='embedded_gaussian') + non_local = ((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)) + resnet_tsm_nonlocal = ResNetTSM( + 50, non_local=non_local, non_local_cfg=non_local_cfg) + resnet_tsm_nonlocal.init_weights() + for layer_name in ['layer2', 'layer3']: + layer = getattr(resnet_tsm_nonlocal, layer_name) + for i, _ in enumerate(layer): + if i % 2 == 0: + assert isinstance(layer[i], NL3DWrapper) + + resnet_tsm_50_full = 
ResNetTSM( + 50, + non_local=non_local, + non_local_cfg=non_local_cfg, + temporal_pool=True) + resnet_tsm_50_full.init_weights() + + # TSM forward + feat = resnet_tsm_50(imgs) + assert feat.shape == torch.Size([8, 2048, 2, 2]) + + # TSM with non-local forward + feat = resnet_tsm_nonlocal(imgs) + assert feat.shape == torch.Size([8, 2048, 2, 2]) + + # TSM with temporal pool forward + feat = resnet_tsm_50_temporal_pool(imgs) + assert feat.shape == torch.Size([4, 2048, 2, 2]) + + # TSM with temporal pool + non-local forward + input_shape = (16, 3, 32, 32) + imgs = generate_backbone_demo_inputs(input_shape) + feat = resnet_tsm_50_full(imgs) + assert feat.shape == torch.Size([8, 2048, 1, 1]) + + +def test_mobilenetv2_tsm_backbone(): + """Test mobilenetv2_tsm backbone.""" + from mmcv.cnn import ConvModule + + from mmaction.models.backbones.mobilenet_v2 import InvertedResidual + from mmaction.models.backbones.resnet_tsm import TemporalShift + + input_shape = (8, 3, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + + # mobilenetv2_tsm with widen_factor = 1.0 + mobilenetv2_tsm = MobileNetV2TSM() + mobilenetv2_tsm.init_weights() + for cur_module in mobilenetv2_tsm.modules(): + if isinstance(cur_module, InvertedResidual) and \ + len(cur_module.conv) == 3 and \ + cur_module.use_res_connect: + assert isinstance(cur_module.conv[0], TemporalShift) + assert cur_module.conv[0].num_segments == \ + mobilenetv2_tsm.num_segments + assert cur_module.conv[0].shift_div == mobilenetv2_tsm.shift_div + assert isinstance(cur_module.conv[0].net, ConvModule) + + # TSM-MobileNetV2 with widen_factor = 1.0 forward + feat = mobilenetv2_tsm(imgs) + assert feat.shape == torch.Size([8, 1280, 2, 2]) + + # mobilenetv2 with widen_factor = 0.5 forward + mobilenetv2_tsm_05 = MobileNetV2TSM(widen_factor=0.5) + mobilenetv2_tsm_05.init_weights() + feat = mobilenetv2_tsm_05(imgs) + assert feat.shape == torch.Size([8, 1280, 2, 2]) + + # mobilenetv2 with widen_factor = 1.5 forward + mobilenetv2_tsm_15 = 
MobileNetV2TSM(widen_factor=1.5) + mobilenetv2_tsm_15.init_weights() + feat = mobilenetv2_tsm_15(imgs) + assert feat.shape == torch.Size([8, 1920, 2, 2]) + + +def test_slowfast_backbone(): + """Test SlowFast backbone.""" + with pytest.raises(TypeError): + # cfg should be a dict + ResNet3dSlowFast(None, slow_pathway=list(['foo', 'bar'])) + with pytest.raises(TypeError): + # pretrained should be a str + sf_50 = ResNet3dSlowFast(dict(foo='bar')) + sf_50.init_weights() + with pytest.raises(KeyError): + # pathway type should be implemented + ResNet3dSlowFast(None, slow_pathway=dict(type='resnext')) + + # test slowfast with slow inflated + sf_50_inflate = ResNet3dSlowFast( + None, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained='torchvision://resnet50', + pretrained2d=True, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1))) + sf_50_inflate.init_weights() + sf_50_inflate.train() + + # test slowfast with no lateral connection + sf_50_wo_lateral = ResNet3dSlowFast( + None, + slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + lateral=False, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1))) + sf_50_wo_lateral.init_weights() + sf_50_wo_lateral.train() + + # slowfast w/o lateral connection inference test + input_shape = (1, 3, 8, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + sf_50_wo_lateral = sf_50_wo_lateral.cuda() + imgs_gpu = imgs.cuda() + feat = sf_50_wo_lateral(imgs_gpu) + else: + feat = sf_50_wo_lateral(imgs) + + assert isinstance(feat, tuple) + assert feat[0].shape == torch.Size([1, 2048, 1, 2, 2]) + assert feat[1].shape == torch.Size([1, 256, 8, 2, 2]) + + # test slowfast with frozen stages config + frozen_slow = 3 + sf_50 = ResNet3dSlowFast( + None, + 
slow_pathway=dict( + type='resnet3d', + depth=50, + pretrained=None, + pretrained2d=True, + lateral=True, + conv1_kernel=(1, 7, 7), + dilations=(1, 1, 1, 1), + conv1_stride_t=1, + pool1_stride_t=1, + inflate=(0, 0, 1, 1), + frozen_stages=frozen_slow)) + sf_50.init_weights() + sf_50.train() + + for stage in range(1, sf_50.slow_path.num_stages): + lateral_name = sf_50.slow_path.lateral_connections[stage - 1] + conv_lateral = getattr(sf_50.slow_path, lateral_name) + for mod in conv_lateral.modules(): + if isinstance(mod, _BatchNorm): + if stage <= frozen_slow: + assert mod.training is False + else: + assert mod.training is True + for param in conv_lateral.parameters(): + if stage <= frozen_slow: + assert param.requires_grad is False + else: + assert param.requires_grad is True + + # test slowfast with normal config + sf_50 = ResNet3dSlowFast(None) + sf_50.init_weights() + sf_50.train() + + # slowfast inference test + input_shape = (1, 3, 8, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + sf_50 = sf_50.cuda() + imgs_gpu = imgs.cuda() + feat = sf_50(imgs_gpu) + else: + feat = sf_50(imgs) + + assert isinstance(feat, tuple) + assert feat[0].shape == torch.Size([1, 2048, 1, 2, 2]) + assert feat[1].shape == torch.Size([1, 256, 8, 2, 2]) + + +def test_slowonly_backbone(): + """Test SlowOnly backbone.""" + with pytest.raises(AssertionError): + # SlowOnly should contain no lateral connection + ResNet3dSlowOnly(50, None, lateral=True) + + # test SlowOnly for PoseC3D + so_50 = ResNet3dSlowOnly( + depth=50, + pretrained=None, + in_channels=17, + base_channels=32, + num_stages=3, + out_indices=(2, ), + stage_blocks=(4, 6, 3), + conv1_stride_s=1, + pool1_stride_s=1, + inflate=(0, 1, 1), + spatial_strides=(2, 2, 2), + temporal_strides=(1, 1, 2), + dilations=(1, 1, 1)) + so_50.init_weights() + so_50.train() + + # test SlowOnly with normal config + so_50 = 
ResNet3dSlowOnly(50, None) + so_50.init_weights() + so_50.train() + + # SlowOnly inference test + input_shape = (1, 3, 8, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + so_50 = so_50.cuda() + imgs_gpu = imgs.cuda() + feat = so_50(imgs_gpu) + else: + feat = so_50(imgs) + assert feat.shape == torch.Size([1, 2048, 8, 2, 2]) + + +def test_resnet_csn_backbone(): + """Test resnet_csn backbone.""" + with pytest.raises(ValueError): + # Bottleneck mode must be "ip" or "ir" + ResNet3dCSN(152, None, bottleneck_mode='id') + + input_shape = (2, 3, 6, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + + resnet3d_csn_frozen = ResNet3dCSN( + 152, None, bn_frozen=True, norm_eval=True) + resnet3d_csn_frozen.train() + for m in resnet3d_csn_frozen.modules(): + if isinstance(m, _BatchNorm): + for param in m.parameters(): + assert param.requires_grad is False + + # Interaction-preserved channel-separated bottleneck block + resnet3d_csn_ip = ResNet3dCSN(152, None, bottleneck_mode='ip') + resnet3d_csn_ip.init_weights() + resnet3d_csn_ip.train() + for i, layer_name in enumerate(resnet3d_csn_ip.res_layers): + layers = getattr(resnet3d_csn_ip, layer_name) + num_blocks = resnet3d_csn_ip.stage_blocks[i] + assert len(layers) == num_blocks + for layer in layers: + assert isinstance(layer.conv2, nn.Sequential) + assert len(layer.conv2) == 2 + assert layer.conv2[1].groups == layer.planes + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + resnet3d_csn_ip = resnet3d_csn_ip.cuda() + imgs_gpu = imgs.cuda() + feat = resnet3d_csn_ip(imgs_gpu) + assert feat.shape == torch.Size([2, 2048, 1, 2, 2]) + else: + feat = resnet3d_csn_ip(imgs) + assert feat.shape == torch.Size([2, 2048, 1, 2, 2]) + + # Interaction-reduced channel-separated bottleneck block + resnet3d_csn_ir = ResNet3dCSN(152, None, bottleneck_mode='ir') + resnet3d_csn_ir.init_weights() + 
resnet3d_csn_ir.train() + for i, layer_name in enumerate(resnet3d_csn_ir.res_layers): + layers = getattr(resnet3d_csn_ir, layer_name) + num_blocks = resnet3d_csn_ir.stage_blocks[i] + assert len(layers) == num_blocks + for layer in layers: + assert isinstance(layer.conv2, nn.Sequential) + assert len(layer.conv2) == 1 + assert layer.conv2[0].groups == layer.planes + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + resnet3d_csn_ir = resnet3d_csn_ir.cuda() + imgs_gpu = imgs.cuda() + feat = resnet3d_csn_ir(imgs_gpu) + assert feat.shape == torch.Size([2, 2048, 1, 2, 2]) + else: + feat = resnet3d_csn_ir(imgs) + assert feat.shape == torch.Size([2, 2048, 1, 2, 2]) + + # Set training status = False + resnet3d_csn_ip = ResNet3dCSN(152, None, bottleneck_mode='ip') + resnet3d_csn_ip.init_weights() + resnet3d_csn_ip.train(False) + for module in resnet3d_csn_ip.children(): + assert module.training is False + + +def test_tanet_backbone(): + """Test tanet backbone.""" + with pytest.raises(NotImplementedError): + # TA-Blocks are only based on Bottleneck block now + tanet_18 = TANet(18, 8) + tanet_18.init_weights() + + from mmaction.models.backbones.resnet import Bottleneck + from mmaction.models.backbones.tanet import TABlock + + # tanet with depth 50 + tanet_50 = TANet(50, 8) + tanet_50.init_weights() + + for layer_name in tanet_50.res_layers: + layer = getattr(tanet_50, layer_name) + blocks = list(layer.children()) + for block in blocks: + assert isinstance(block, TABlock) + assert isinstance(block.block, Bottleneck) + assert block.tam.num_segments == block.num_segments + assert block.tam.in_channels == block.block.conv1.out_channels + + input_shape = (8, 3, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + feat = tanet_50(imgs) + assert feat.shape == torch.Size([8, 2048, 2, 2]) + + input_shape = (16, 3, 32, 32) + imgs = generate_backbone_demo_inputs(input_shape) + feat = tanet_50(imgs) + assert feat.shape == torch.Size([16, 2048, 1, 1]) + + +def 
test_timesformer_backbone(): + input_shape = (1, 3, 8, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + + # divided_space_time + timesformer = TimeSformer( + 8, 64, 16, embed_dims=768, attention_type='divided_space_time') + timesformer.init_weights() + from mmaction.models.common import (DividedSpatialAttentionWithNorm, + DividedTemporalAttentionWithNorm, + FFNWithNorm) + assert isinstance(timesformer.transformer_layers.layers[0].attentions[0], + DividedTemporalAttentionWithNorm) + assert isinstance(timesformer.transformer_layers.layers[11].attentions[1], + DividedSpatialAttentionWithNorm) + assert isinstance(timesformer.transformer_layers.layers[0].ffns[0], + FFNWithNorm) + assert hasattr(timesformer, 'time_embed') + assert timesformer.patch_embed.num_patches == 16 + + cls_tokens = timesformer(imgs) + assert cls_tokens.shape == torch.Size([1, 768]) + + # space_only + timesformer = TimeSformer( + 8, 64, 16, embed_dims=512, num_heads=8, attention_type='space_only') + timesformer.init_weights() + + assert not hasattr(timesformer, 'time_embed') + assert timesformer.patch_embed.num_patches == 16 + + cls_tokens = timesformer(imgs) + assert cls_tokens.shape == torch.Size([1, 512]) + + # joint_space_time + input_shape = (1, 3, 2, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + timesformer = TimeSformer( + 2, + 64, + 8, + embed_dims=256, + num_heads=8, + attention_type='joint_space_time') + timesformer.init_weights() + + assert hasattr(timesformer, 'time_embed') + assert timesformer.patch_embed.num_patches == 64 + + cls_tokens = timesformer(imgs) + assert cls_tokens.shape == torch.Size([1, 256]) + + with pytest.raises(AssertionError): + # unsupported attention type + timesformer = TimeSformer( + 8, 64, 16, attention_type='wrong_attention_type') + + with pytest.raises(AssertionError): + # Wrong transformer_layers type + timesformer = TimeSformer(8, 64, 16, transformer_layers='wrong_type') + + +def test_c3d_backbone(): + """Test c3d backbone.""" + 
input_shape = (1, 3, 16, 24, 24) + imgs = generate_backbone_demo_inputs(input_shape) + + # c3d inference test + c3d = C3D(out_dim=512) + c3d.init_weights() + c3d.train() + feat = c3d(imgs) + assert feat.shape == torch.Size([1, 4096]) + + # c3d with bn inference test + c3d_bn = C3D(out_dim=512, norm_cfg=dict(type='BN3d')) + c3d_bn.init_weights() + c3d_bn.train() + feat = c3d_bn(imgs) + assert feat.shape == torch.Size([1, 4096]) + + +def test_resnet_audio_backbone(): + """Test ResNetAudio backbone.""" + input_shape = (1, 1, 16, 16) + spec = generate_backbone_demo_inputs(input_shape) + # inference + audioonly = ResNetAudio(50, None) + audioonly.init_weights() + audioonly.train() + feat = audioonly(spec) + assert feat.shape == torch.Size([1, 1024, 2, 2]) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_resnet_tin_backbone(): + """Test resnet_tin backbone.""" + with pytest.raises(AssertionError): + # num_segments should be positive + resnet_tin = ResNetTIN(50, num_segments=-1) + resnet_tin.init_weights() + + from mmaction.models.backbones.resnet_tin import (CombineNet, + TemporalInterlace) + + # resnet_tin with normal config + resnet_tin = ResNetTIN(50) + resnet_tin.init_weights() + for layer_name in resnet_tin.res_layers: + layer = getattr(resnet_tin, layer_name) + blocks = list(layer.children()) + for block in blocks: + assert isinstance(block.conv1.conv, CombineNet) + assert isinstance(block.conv1.conv.net1, TemporalInterlace) + assert ( + block.conv1.conv.net1.num_segments == resnet_tin.num_segments) + assert block.conv1.conv.net1.shift_div == resnet_tin.shift_div + + # resnet_tin with partial batchnorm + resnet_tin_pbn = ResNetTIN(50, partial_bn=True) + resnet_tin_pbn.train() + count_bn = 0 + for m in resnet_tin_pbn.modules(): + if isinstance(m, nn.BatchNorm2d): + count_bn += 1 + if count_bn >= 2: + assert m.training is False + assert m.weight.requires_grad is False + assert m.bias.requires_grad is False + else: + 
assert m.training is True + assert m.weight.requires_grad is True + assert m.bias.requires_grad is True + + input_shape = (8, 3, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape).cuda() + resnet_tin = resnet_tin.cuda() + + # resnet_tin with normal cfg inference + feat = resnet_tin(imgs) + assert feat.shape == torch.Size([8, 2048, 2, 2]) + + +def test_stgcn_backbone(): + """Test STGCN backbone.""" + # test coco layout, spatial strategy + input_shape = (1, 3, 300, 17, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='coco', strategy='spatial')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 17]) + + # test openpose-18 layout, spatial strategy + input_shape = (1, 3, 300, 18, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='openpose-18', strategy='spatial')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 18]) + + # test ntu-rgb+d layout, spatial strategy + input_shape = (1, 3, 300, 25, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 25]) + + # test ntu_edge layout, spatial strategy + input_shape = (1, 3, 300, 24, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu_edge', strategy='spatial')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 24]) + + # test coco layout, uniform strategy + input_shape = (1, 3, 
300, 17, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='coco', strategy='uniform')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 17]) + + # test openpose-18 layout, uniform strategy + input_shape = (1, 3, 300, 18, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='openpose-18', strategy='uniform')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 18]) + + # test ntu-rgb+d layout, uniform strategy + input_shape = (1, 3, 300, 25, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu-rgb+d', strategy='uniform')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 25]) + + # test ntu_edge layout, uniform strategy + input_shape = (1, 3, 300, 24, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu_edge', strategy='uniform')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 24]) + + # test coco layout, distance strategy + input_shape = (1, 3, 300, 17, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='coco', strategy='distance')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 17]) + + # test openpose-18 layout, distance strategy + input_shape = (1, 3, 300, 18, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + 
in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='openpose-18', strategy='distance')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 18]) + + # test ntu-rgb+d layout, distance strategy + input_shape = (1, 3, 300, 25, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu-rgb+d', strategy='distance')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 25]) + + # test ntu_edge layout, distance strategy + input_shape = (1, 3, 300, 24, 2) + skeletons = generate_backbone_demo_inputs(input_shape) + + stgcn = STGCN( + in_channels=3, + edge_importance_weighting=True, + graph_cfg=dict(layout='ntu_edge', strategy='distance')) + stgcn.init_weights() + stgcn.train() + feat = stgcn(skeletons) + assert feat.shape == torch.Size([2, 256, 75, 24]) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common.py new file mode 100644 index 0000000000000000000000000000000000000000..3cd6de2f09e463ee5066d5560fabe98265bd5e79 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common.py @@ -0,0 +1,149 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +import pytest +import torch +import torch.nn as nn +from mmcv.utils import assert_params_all_zeros + +from mmaction.models.common import (LFB, TAM, Conv2plus1d, ConvAudio, + DividedSpatialAttentionWithNorm, + DividedTemporalAttentionWithNorm, + FFNWithNorm, SubBatchNorm3D) + + +def test_conv2plus1d(): + with pytest.raises(AssertionError): + # Length of kernel size, stride and padding must be the same + Conv2plus1d(3, 8, (2, 2)) + + conv_2plus1d = Conv2plus1d(3, 8, 2) + conv_2plus1d.init_weights() + + assert torch.equal(conv_2plus1d.bn_s.weight, + torch.ones_like(conv_2plus1d.bn_s.weight)) + assert torch.equal(conv_2plus1d.bn_s.bias, + torch.zeros_like(conv_2plus1d.bn_s.bias)) + + x = torch.rand(1, 3, 8, 256, 256) + output = conv_2plus1d(x) + assert output.shape == torch.Size([1, 8, 7, 255, 255]) + + +def test_conv_audio(): + conv_audio = ConvAudio(3, 8, 3) + conv_audio.init_weights() + + x = torch.rand(1, 3, 8, 8) + output = conv_audio(x) + assert output.shape == torch.Size([1, 16, 8, 8]) + + conv_audio_sum = ConvAudio(3, 8, 3, op='sum') + output = conv_audio_sum(x) + assert output.shape == torch.Size([1, 8, 8, 8]) + + +def test_divided_temporal_attention_with_norm(): + _cfg = dict(embed_dims=768, num_heads=12, num_frames=8) + divided_temporal_attention = DividedTemporalAttentionWithNorm(**_cfg) + assert isinstance(divided_temporal_attention.norm, nn.LayerNorm) + assert assert_params_all_zeros(divided_temporal_attention.temporal_fc) + + x = torch.rand(1, 1 + 8 * 14 * 14, 768) + output = divided_temporal_attention(x) + assert output.shape == torch.Size([1, 1 + 8 * 14 * 14, 768]) + + +def test_divided_spatial_attention_with_norm(): + _cfg = dict(embed_dims=512, num_heads=8, num_frames=4, dropout_layer=None) + divided_spatial_attention = DividedSpatialAttentionWithNorm(**_cfg) + assert isinstance(divided_spatial_attention.dropout_layer, nn.Identity) + assert isinstance(divided_spatial_attention.norm, nn.LayerNorm) + + x = torch.rand(1, 1 + 4 
* 14 * 14, 512) + output = divided_spatial_attention(x) + assert output.shape == torch.Size([1, 1 + 4 * 14 * 14, 512]) + + +def test_ffn_with_norm(): + _cfg = dict( + embed_dims=256, feedforward_channels=256 * 2, norm_cfg=dict(type='LN')) + ffn_with_norm = FFNWithNorm(**_cfg) + assert isinstance(ffn_with_norm.norm, nn.LayerNorm) + + x = torch.rand(1, 1 + 4 * 14 * 14, 256) + output = ffn_with_norm(x) + assert output.shape == torch.Size([1, 1 + 4 * 14 * 14, 256]) + + +def test_TAM(): + """test TAM.""" + with pytest.raises(AssertionError): + # alpha must be a positive integer + TAM(16, 8, alpha=0, beta=4) + + with pytest.raises(AssertionError): + # beta must be a positive integer + TAM(16, 8, alpha=2, beta=0) + + with pytest.raises(AssertionError): + # the channels number of x should be equal to self.in_channels of TAM + tam = TAM(16, 8) + x = torch.rand(64, 8, 112, 112) + tam(x) + + tam = TAM(16, 8) + x = torch.rand(32, 16, 112, 112) + output = tam(x) + assert output.shape == torch.Size([32, 16, 112, 112]) + + +def test_LFB(): + """test LFB.""" + with pytest.raises(ValueError): + LFB(lfb_prefix_path='./_non_exist_path') + + lfb_prefix_path = osp.normpath( + osp.join(osp.dirname(__file__), '../data/lfb')) + + with pytest.raises(AssertionError): + LFB(lfb_prefix_path=lfb_prefix_path, dataset_modes=100) + + with pytest.raises(ValueError): + LFB(lfb_prefix_path=lfb_prefix_path, device='ceph') + + # load on cpu + lfb_cpu = LFB( + lfb_prefix_path=lfb_prefix_path, + max_num_sampled_feat=5, + window_size=60, + lfb_channels=16, + dataset_modes=('unittest'), + device='cpu') + + lt_feat_cpu = lfb_cpu['video_1,930'] + assert lt_feat_cpu.shape == (5 * 60, 16) + assert len(lfb_cpu) == 1 + + # load on lmdb + lfb_lmdb = LFB( + lfb_prefix_path=lfb_prefix_path, + max_num_sampled_feat=3, + window_size=30, + lfb_channels=16, + dataset_modes=('unittest'), + device='lmdb', + lmdb_map_size=1e6) + lt_feat_lmdb = lfb_lmdb['video_1,930'] + assert lt_feat_lmdb.shape == (3 * 30, 16) + + +def 
test_SubBatchNorm3D(): + _cfg = dict(num_splits=2) + num_features = 4 + sub_batchnorm_3d = SubBatchNorm3D(num_features, **_cfg) + assert sub_batchnorm_3d.bn.num_features == num_features + assert sub_batchnorm_3d.split_bn.num_features == num_features * 2 + + assert sub_batchnorm_3d.bn.affine is False + assert sub_batchnorm_3d.split_bn.affine is False diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_base_head.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_base_head.py new file mode 100644 index 0000000000000000000000000000000000000000..cff9eb4a7f18e1b7124c84d1dd97c99d951b9b3d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_base_head.py @@ -0,0 +1,73 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +import torch.nn.functional as F +from mmcv.utils import assert_dict_has_keys + +from mmaction.models import BaseHead + + +class ExampleHead(BaseHead): + # use an ExampleHead to test BaseHead + def init_weights(self): + pass + + def forward(self, x): + pass + + +def test_base_head(): + head = ExampleHead(3, 400, dict(type='CrossEntropyLoss')) + + cls_scores = torch.rand((3, 4)) + # When truth is non-empty then cls loss should be nonzero for random inputs + gt_labels = torch.LongTensor([2] * 3).squeeze() + losses = head.loss(cls_scores, gt_labels) + assert 'loss_cls' in losses.keys() + assert losses.get('loss_cls') > 0, 'cls loss should be non-zero' + + head = ExampleHead(3, 400, dict(type='CrossEntropyLoss', loss_weight=2.0)) + + cls_scores = torch.rand((3, 4)) + # When truth is non-empty then cls loss should be nonzero for random inputs + gt_labels = torch.LongTensor([2] * 3).squeeze() + losses = head.loss(cls_scores, gt_labels) + assert_dict_has_keys(losses, ['loss_cls']) + assert losses.get('loss_cls') > 0, 'cls loss should be non-zero' + + # Test Soft label with batch size > 1 + cls_scores = torch.rand((3, 3)) + gt_labels = torch.LongTensor([[2] * 3]) + gt_one_hot_labels = F.one_hot(gt_labels, num_classes=3).squeeze() + losses = head.loss(cls_scores, gt_one_hot_labels) + assert 'loss_cls' in losses.keys() + assert losses.get('loss_cls') > 0, 'cls loss should be non-zero' + + # Test Soft label with batch size = 1 + cls_scores = torch.rand((1, 3)) + gt_labels = torch.LongTensor([2]) + gt_one_hot_labels = F.one_hot(gt_labels, num_classes=3).squeeze() + losses = head.loss(cls_scores, gt_one_hot_labels) + assert 'loss_cls' in losses.keys() + assert losses.get('loss_cls') > 0, 'cls loss should be non-zero' + + # test multi-class & label smoothing + head = ExampleHead( + 3, + 400, + dict(type='BCELossWithLogits'), + multi_class=True, + label_smooth_eps=0.1) + + # batch size > 1 + cls_scores = torch.rand((2, 3)) + gt_labels = torch.LongTensor([[1, 0, 1], 
[0, 1, 0]]).squeeze() + losses = head.loss(cls_scores, gt_labels) + assert 'loss_cls' in losses.keys() + assert losses.get('loss_cls') > 0, 'cls loss should be non-zero' + + # batch size = 1 + cls_scores = torch.rand((1, 3)) + gt_labels = torch.LongTensor([[1, 0, 1]]).squeeze() + losses = head.loss(cls_scores, gt_labels) + assert 'loss_cls' in losses.keys() + assert losses.get('loss_cls') > 0, 'cls loss should be non-zero' diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_base_recognizers.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_base_recognizers.py new file mode 100644 index 0000000000000000000000000000000000000000..7a145701d050fc380ba5eb32fc8416a45daee4ed --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_base_recognizers.py @@ -0,0 +1,66 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch +import torch.nn.functional as F + +from mmaction.models import BaseRecognizer + + +class ExampleRecognizer(BaseRecognizer): + + def __init__(self, train_cfg, test_cfg): + super(BaseRecognizer, self).__init__() + # reconstruct `__init__()` method in BaseRecognizer to avoid building + # backbone and head which are useless to ExampleRecognizer, + # since ExampleRecognizer is only used for model-unrelated methods + # (like `average_clip`) testing. 
+ self.train_cfg = train_cfg + self.test_cfg = test_cfg + + def forward_train(self, imgs, labels): + pass + + def forward_test(self, imgs): + pass + + def forward_gradcam(self, imgs): + pass + + +def test_base_recognizer(): + cls_score = torch.rand(5, 400) + with pytest.raises(KeyError): + # "average_clips" must be defined in test_cfg + wrong_test_cfg = dict(clip='score') + recognizer = ExampleRecognizer(None, wrong_test_cfg) + recognizer.average_clip(cls_score) + + with pytest.raises(ValueError): + # unsupported average clips type + wrong_test_cfg = dict(average_clips='softmax') + recognizer = ExampleRecognizer(None, wrong_test_cfg) + recognizer.average_clip(cls_score) + + with pytest.raises(ValueError): + # Label should not be None + recognizer = ExampleRecognizer(None, None) + recognizer(torch.tensor(0)) + + # average_clips=None + test_cfg = dict(average_clips=None) + recognizer = ExampleRecognizer(None, test_cfg) + score = recognizer.average_clip(cls_score, num_segs=5) + assert torch.equal(score, cls_score) + + # average_clips='score' + test_cfg = dict(average_clips='score') + recognizer = ExampleRecognizer(None, test_cfg) + score = recognizer.average_clip(cls_score, num_segs=5) + assert torch.equal(score, cls_score.mean(dim=0, keepdim=True)) + + # average_clips='prob' + test_cfg = dict(average_clips='prob') + recognizer = ExampleRecognizer(None, test_cfg) + score = recognizer.average_clip(cls_score, num_segs=5) + assert torch.equal(score, + F.softmax(cls_score, dim=1).mean(dim=0, keepdim=True)) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_mobilenet_v2.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_mobilenet_v2.py new file mode 100644 index 0000000000000000000000000000000000000000..09baee92fa861f2a68586bea0873b39ba751373b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_mobilenet_v2.py @@ -0,0 +1,218 @@ +# Copyright (c) OpenMMLab. 
All rights reserved. +import pytest +import torch +from mmcv.utils import _BatchNorm + +from mmaction.models import MobileNetV2 +from ..base import check_norm_state, generate_backbone_demo_inputs + + +def test_mobilenetv2_backbone(): + """Test MobileNetV2. + + Modified from mmclassification. + """ + from torch.nn.modules import GroupNorm + + from mmaction.models.backbones.mobilenet_v2 import InvertedResidual + + def is_norm(modules): + """Check whether the module is a norm layer.""" + if isinstance(modules, (GroupNorm, _BatchNorm)): + return True + return False + + def is_block(modules): + """Check whether the module is an InvertedResidual block.""" + if isinstance(modules, (InvertedResidual, )): + return True + return False + + with pytest.raises(TypeError): + # pretrained must be a string path + model = MobileNetV2(pretrained=0) + model.init_weights() + + with pytest.raises(ValueError): + # frozen_stages=9 is out of the valid range + MobileNetV2(frozen_stages=9) + + with pytest.raises(ValueError): + # out_indices=[8] is out of the valid range + MobileNetV2(out_indices=[8]) + + input_shape = (1, 3, 224, 224) + imgs = generate_backbone_demo_inputs(input_shape) + + # Test MobileNetV2 with first stage frozen + frozen_stages = 1 + model = MobileNetV2(frozen_stages=frozen_stages) + model.init_weights() + model.train() + + for mod in model.conv1.modules(): + for param in mod.parameters(): + assert param.requires_grad is False + for i in range(1, frozen_stages + 1): + layer = getattr(model, f'layer{i}') + for mod in layer.modules(): + if isinstance(mod, _BatchNorm): + assert mod.training is False + for param in layer.parameters(): + assert param.requires_grad is False + + # Test MobileNetV2 with all stages frozen + frozen_stages = 8 + model = MobileNetV2(frozen_stages=frozen_stages) + model.init_weights() + model.train() + + for mod in model.modules(): + if not isinstance(mod, MobileNetV2): + assert mod.training is False + for param in mod.parameters(): + assert param.requires_grad is False + + # Test MobileNetV2 with 
norm_eval=True + model = MobileNetV2(norm_eval=True) + model.init_weights() + model.train() + + assert check_norm_state(model.modules(), False) + + # Test MobileNetV2 forward with widen_factor=1.0, pretrained + model = MobileNetV2( + widen_factor=1.0, + out_indices=range(0, 8), + pretrained='mmcls://mobilenet_v2') + model.init_weights() + model.train() + + assert check_norm_state(model.modules(), True) + + feat = model(imgs) + assert len(feat) == 8 + assert feat[0].shape == torch.Size((1, 16, 112, 112)) + assert feat[1].shape == torch.Size((1, 24, 56, 56)) + assert feat[2].shape == torch.Size((1, 32, 28, 28)) + assert feat[3].shape == torch.Size((1, 64, 14, 14)) + assert feat[4].shape == torch.Size((1, 96, 14, 14)) + assert feat[5].shape == torch.Size((1, 160, 7, 7)) + assert feat[6].shape == torch.Size((1, 320, 7, 7)) + assert feat[7].shape == torch.Size((1, 1280, 7, 7)) + + # Test MobileNetV2 forward with widen_factor=0.5 + model = MobileNetV2(widen_factor=0.5, out_indices=range(0, 7)) + model.init_weights() + model.train() + + feat = model(imgs) + assert len(feat) == 7 + assert feat[0].shape == torch.Size((1, 8, 112, 112)) + assert feat[1].shape == torch.Size((1, 16, 56, 56)) + assert feat[2].shape == torch.Size((1, 16, 28, 28)) + assert feat[3].shape == torch.Size((1, 32, 14, 14)) + assert feat[4].shape == torch.Size((1, 48, 14, 14)) + assert feat[5].shape == torch.Size((1, 80, 7, 7)) + assert feat[6].shape == torch.Size((1, 160, 7, 7)) + + # Test MobileNetV2 forward with widen_factor=2.0 + model = MobileNetV2(widen_factor=2.0) + model.init_weights() + model.train() + + feat = model(imgs) + assert feat.shape == torch.Size((1, 2560, 7, 7)) + + # Test MobileNetV2 forward with out_indices=None + model = MobileNetV2(widen_factor=1.0) + model.init_weights() + model.train() + + feat = model(imgs) + assert feat.shape == torch.Size((1, 1280, 7, 7)) + + # Test MobileNetV2 forward with dict(type='ReLU') + model = MobileNetV2( + widen_factor=1.0, 
act_cfg=dict(type='ReLU'), out_indices=range(0, 7)) + model.init_weights() + model.train() + + feat = model(imgs) + assert len(feat) == 7 + assert feat[0].shape == torch.Size((1, 16, 112, 112)) + assert feat[1].shape == torch.Size((1, 24, 56, 56)) + assert feat[2].shape == torch.Size((1, 32, 28, 28)) + assert feat[3].shape == torch.Size((1, 64, 14, 14)) + assert feat[4].shape == torch.Size((1, 96, 14, 14)) + assert feat[5].shape == torch.Size((1, 160, 7, 7)) + assert feat[6].shape == torch.Size((1, 320, 7, 7)) + + # Test MobileNetV2 with BatchNorm forward + model = MobileNetV2(widen_factor=1.0, out_indices=range(0, 7)) + for m in model.modules(): + if is_norm(m): + assert isinstance(m, _BatchNorm) + model.init_weights() + model.train() + + feat = model(imgs) + assert len(feat) == 7 + assert feat[0].shape == torch.Size((1, 16, 112, 112)) + assert feat[1].shape == torch.Size((1, 24, 56, 56)) + assert feat[2].shape == torch.Size((1, 32, 28, 28)) + assert feat[3].shape == torch.Size((1, 64, 14, 14)) + assert feat[4].shape == torch.Size((1, 96, 14, 14)) + assert feat[5].shape == torch.Size((1, 160, 7, 7)) + assert feat[6].shape == torch.Size((1, 320, 7, 7)) + + # Test MobileNetV2 with GroupNorm forward + model = MobileNetV2( + widen_factor=1.0, + norm_cfg=dict(type='GN', num_groups=2, requires_grad=True), + out_indices=range(0, 7)) + for m in model.modules(): + if is_norm(m): + assert isinstance(m, GroupNorm) + model.init_weights() + model.train() + + feat = model(imgs) + assert len(feat) == 7 + assert feat[0].shape == torch.Size((1, 16, 112, 112)) + assert feat[1].shape == torch.Size((1, 24, 56, 56)) + assert feat[2].shape == torch.Size((1, 32, 28, 28)) + assert feat[3].shape == torch.Size((1, 64, 14, 14)) + assert feat[4].shape == torch.Size((1, 96, 14, 14)) + assert feat[5].shape == torch.Size((1, 160, 7, 7)) + assert feat[6].shape == torch.Size((1, 320, 7, 7)) + + # Test MobileNetV2 with layers 1, 3, 5 out forward + model = MobileNetV2(widen_factor=1.0, 
out_indices=(0, 2, 4)) + model.init_weights() + model.train() + + feat = model(imgs) + assert len(feat) == 3 + assert feat[0].shape == torch.Size((1, 16, 112, 112)) + assert feat[1].shape == torch.Size((1, 32, 28, 28)) + assert feat[2].shape == torch.Size((1, 96, 14, 14)) + + # Test MobileNetV2 with checkpoint forward + model = MobileNetV2( + widen_factor=1.0, with_cp=True, out_indices=range(0, 7)) + for m in model.modules(): + if is_block(m): + assert m.with_cp + model.init_weights() + model.train() + + feat = model(imgs) + assert len(feat) == 7 + assert feat[0].shape == torch.Size((1, 16, 112, 112)) + assert feat[1].shape == torch.Size((1, 24, 56, 56)) + assert feat[2].shape == torch.Size((1, 32, 28, 28)) + assert feat[3].shape == torch.Size((1, 64, 14, 14)) + assert feat[4].shape == torch.Size((1, 96, 14, 14)) + assert feat[5].shape == torch.Size((1, 160, 7, 7)) + assert feat[6].shape == torch.Size((1, 320, 7, 7)) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_resnet.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..7f4a46ecd0f9d7de75e1f303b3af88ba3a0b2eab --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_resnet.py @@ -0,0 +1,128 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch +import torch.nn as nn +from mmcv.utils import _BatchNorm + +from mmaction.models import ResNet +from ..base import check_norm_state, generate_backbone_demo_inputs + + +def test_resnet_backbone(): + """Test resnet backbone.""" + with pytest.raises(KeyError): + # ResNet depth should be in [18, 34, 50, 101, 152] + ResNet(20) + + with pytest.raises(AssertionError): + # In ResNet: 1 <= num_stages <= 4 + ResNet(50, num_stages=0) + + with pytest.raises(AssertionError): + # In ResNet: 1 <= num_stages <= 4 + ResNet(50, num_stages=5) + + with pytest.raises(AssertionError): + # len(strides) == len(dilations) == num_stages + ResNet(50, strides=(1, ), dilations=(1, 1), num_stages=3) + + with pytest.raises(TypeError): + # pretrain must be a str + resnet50 = ResNet(50, pretrained=0) + resnet50.init_weights() + + with pytest.raises(AssertionError): + # style must be in ['pytorch', 'caffe'] + ResNet(18, style='tensorflow') + + with pytest.raises(AssertionError): + # assert not with_cp + ResNet(18, with_cp=True) + + # resnet with depth 18, norm_eval False, initial weights + resnet18 = ResNet(18) + resnet18.init_weights() + + # resnet with depth 50, norm_eval True + resnet50 = ResNet(50, norm_eval=True) + resnet50.init_weights() + resnet50.train() + assert check_norm_state(resnet50.modules(), False) + + # resnet with depth 50, norm_eval True, pretrained + resnet50_pretrain = ResNet( + pretrained='torchvision://resnet50', depth=50, norm_eval=True) + resnet50_pretrain.init_weights() + resnet50_pretrain.train() + assert check_norm_state(resnet50_pretrain.modules(), False) + + # resnet with depth 50, norm_eval True, frozen_stages 1 + frozen_stages = 1 + resnet50_frozen = ResNet(50, frozen_stages=frozen_stages) + resnet50_frozen.init_weights() + resnet50_frozen.train() + assert resnet50_frozen.conv1.bn.training is False + for layer in resnet50_frozen.conv1.modules(): + for param in layer.parameters(): + assert param.requires_grad is False + for i in range(1, 
frozen_stages + 1): + layer = getattr(resnet50_frozen, f'layer{i}') + for mod in layer.modules(): + if isinstance(mod, _BatchNorm): + assert mod.training is False + for param in layer.parameters(): + assert param.requires_grad is False + + # resnet with depth 50, partial batchnorm + resnet_pbn = ResNet(50, partial_bn=True) + resnet_pbn.train() + count_bn = 0 + for m in resnet_pbn.modules(): + if isinstance(m, nn.BatchNorm2d): + count_bn += 1 + if count_bn >= 2: + assert m.weight.requires_grad is False + assert m.bias.requires_grad is False + assert m.training is False + else: + assert m.weight.requires_grad is True + assert m.bias.requires_grad is True + assert m.training is True + + input_shape = (1, 3, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + + # resnet with depth 18 inference + resnet18 = ResNet(18, norm_eval=False) + resnet18.init_weights() + resnet18.train() + feat = resnet18(imgs) + assert feat.shape == torch.Size([1, 512, 2, 2]) + + # resnet with depth 50 inference + resnet50 = ResNet(50, norm_eval=False) + resnet50.init_weights() + resnet50.train() + feat = resnet50(imgs) + assert feat.shape == torch.Size([1, 2048, 2, 2]) + + # resnet with depth 50 in caffe style inference + resnet50_caffe = ResNet(50, style='caffe', norm_eval=False) + resnet50_caffe.init_weights() + resnet50_caffe.train() + feat = resnet50_caffe(imgs) + assert feat.shape == torch.Size([1, 2048, 2, 2]) + + resnet50_flow = ResNet( + depth=50, pretrained='torchvision://resnet50', in_channels=10) + input_shape = (1, 10, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + feat = resnet50_flow(imgs) + assert feat.shape == torch.Size([1, 2048, 2, 2]) + + resnet50 = ResNet( + depth=50, pretrained='torchvision://resnet50', in_channels=3) + input_shape = (1, 3, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + feat = resnet50(imgs) + assert feat.shape == torch.Size([1, 2048, 2, 2]) diff --git 
a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_resnet3d.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_resnet3d.py new file mode 100644 index 0000000000000000000000000000000000000000..d0c354eaa44ef7651c27744a2293e8c5fe8b93da --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_common_modules/test_resnet3d.py @@ -0,0 +1,335 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch +import torch.nn as nn +from mmcv.utils import _BatchNorm + +from mmaction.models import ResNet3d, ResNet3dLayer +from ..base import check_norm_state, generate_backbone_demo_inputs + + +def test_resnet3d_backbone(): + """Test resnet3d backbone.""" + with pytest.raises(AssertionError): + # In ResNet3d: 1 <= num_stages <= 4 + ResNet3d(34, None, num_stages=0) + + with pytest.raises(AssertionError): + # In ResNet3d: 1 <= num_stages <= 4 + ResNet3d(34, None, num_stages=5) + + with pytest.raises(AssertionError): + # In ResNet3d: 1 <= num_stages <= 4 + ResNet3d(50, None, num_stages=0) + + with pytest.raises(AssertionError): + # In ResNet3d: 1 <= num_stages <= 4 + ResNet3d(50, None, num_stages=5) + + with pytest.raises(AssertionError): + # len(spatial_strides) == len(temporal_strides) + # == len(dilations) == num_stages + ResNet3d( + 50, + None, + spatial_strides=(1, ), + temporal_strides=(1, 1), + dilations=(1, 1, 1), + num_stages=4) + + with pytest.raises(AssertionError): + # len(spatial_strides) == len(temporal_strides) + # == len(dilations) == num_stages + ResNet3d( + 34, + None, + spatial_strides=(1, ), + temporal_strides=(1, 1), + dilations=(1, 1, 1), + num_stages=4) + + with pytest.raises(TypeError): + # pretrain must be str or None. + resnet3d_34 = ResNet3d(34, ['resnet', 'bninception']) + resnet3d_34.init_weights() + + with pytest.raises(TypeError): + # pretrain must be str or None. 
+ resnet3d_50 = ResNet3d(50, ['resnet', 'bninception']) + resnet3d_50.init_weights() + + # resnet3d with depth 34, no pretrained, norm_eval True + resnet3d_34 = ResNet3d(34, None, pretrained2d=False, norm_eval=True) + resnet3d_34.init_weights() + resnet3d_34.train() + assert check_norm_state(resnet3d_34.modules(), False) + + # resnet3d with depth 50, no pretrained, norm_eval True + resnet3d_50 = ResNet3d(50, None, pretrained2d=False, norm_eval=True) + resnet3d_50.init_weights() + resnet3d_50.train() + assert check_norm_state(resnet3d_50.modules(), False) + + # resnet3d with depth 50, pretrained2d, norm_eval True + resnet3d_50_pretrain = ResNet3d( + 50, 'torchvision://resnet50', norm_eval=True) + resnet3d_50_pretrain.init_weights() + resnet3d_50_pretrain.train() + assert check_norm_state(resnet3d_50_pretrain.modules(), False) + from mmcv.runner import _load_checkpoint + chkp_2d = _load_checkpoint('torchvision://resnet50') + for name, module in resnet3d_50_pretrain.named_modules(): + if len(name.split('.')) == 4: + # layer.block.module.submodule + prefix = name.split('.')[:2] + module_type = name.split('.')[2] + submodule_type = name.split('.')[3] + + if module_type == 'downsample': + name2d = name.replace('conv', '0').replace('bn', '1') + else: + layer_id = name.split('.')[2][-1] + name2d = prefix[0] + '.' + prefix[1] + '.' + \ + submodule_type + layer_id + + if isinstance(module, nn.Conv3d): + conv2d_weight = chkp_2d[name2d + '.weight'] + conv3d_weight = getattr(module, 'weight').data + assert torch.equal( + conv3d_weight, + conv2d_weight.data.unsqueeze(2).expand_as(conv3d_weight) / + conv3d_weight.shape[2]) + if getattr(module, 'bias') is not None: + conv2d_bias = chkp_2d[name2d + '.bias'] + conv3d_bias = getattr(module, 'bias').data + assert torch.equal(conv2d_bias, conv3d_bias) + + elif isinstance(module, nn.BatchNorm3d): + for pname in ['weight', 'bias', 'running_mean', 'running_var']: + param_2d = chkp_2d[name2d + '.' 
+ pname] + param_3d = getattr(module, pname).data + assert torch.equal(param_2d, param_3d) + + conv3d = resnet3d_50_pretrain.conv1.conv + assert torch.equal( + conv3d.weight, + chkp_2d['conv1.weight'].unsqueeze(2).expand_as(conv3d.weight) / + conv3d.weight.shape[2]) + conv3d = resnet3d_50_pretrain.layer3[2].conv2.conv + assert torch.equal( + conv3d.weight, chkp_2d['layer3.2.conv2.weight'].unsqueeze(2).expand_as( + conv3d.weight) / conv3d.weight.shape[2]) + + # resnet3d with depth 34, no pretrained, norm_eval False + resnet3d_34_no_bn_eval = ResNet3d( + 34, None, pretrained2d=False, norm_eval=False) + resnet3d_34_no_bn_eval.init_weights() + resnet3d_34_no_bn_eval.train() + assert check_norm_state(resnet3d_34_no_bn_eval.modules(), True) + + # resnet3d with depth 50, no pretrained, norm_eval False + resnet3d_50_no_bn_eval = ResNet3d( + 50, None, pretrained2d=False, norm_eval=False) + resnet3d_50_no_bn_eval.init_weights() + resnet3d_50_no_bn_eval.train() + assert check_norm_state(resnet3d_50_no_bn_eval.modules(), True) + + # resnet3d with depth 34, no pretrained, frozen_stages, norm_eval False + frozen_stages = 1 + resnet3d_34_frozen = ResNet3d( + 34, None, pretrained2d=False, frozen_stages=frozen_stages) + resnet3d_34_frozen.init_weights() + resnet3d_34_frozen.train() + assert resnet3d_34_frozen.conv1.bn.training is False + for param in resnet3d_34_frozen.conv1.parameters(): + assert param.requires_grad is False + for i in range(1, frozen_stages + 1): + layer = getattr(resnet3d_34_frozen, f'layer{i}') + for mod in layer.modules(): + if isinstance(mod, _BatchNorm): + assert mod.training is False + for param in layer.parameters(): + assert param.requires_grad is False + # test zero_init_residual + for m in resnet3d_34_frozen.modules(): + if hasattr(m, 'conv2'): + assert torch.equal(m.conv2.bn.weight, + torch.zeros_like(m.conv2.bn.weight)) + assert torch.equal(m.conv2.bn.bias, + torch.zeros_like(m.conv2.bn.bias)) + + # resnet3d with depth 50, no pretrained, 
frozen_stages, norm_eval False + frozen_stages = 1 + resnet3d_50_frozen = ResNet3d( + 50, None, pretrained2d=False, frozen_stages=frozen_stages) + resnet3d_50_frozen.init_weights() + resnet3d_50_frozen.train() + assert resnet3d_50_frozen.conv1.bn.training is False + for param in resnet3d_50_frozen.conv1.parameters(): + assert param.requires_grad is False + for i in range(1, frozen_stages + 1): + layer = getattr(resnet3d_50_frozen, f'layer{i}') + for mod in layer.modules(): + if isinstance(mod, _BatchNorm): + assert mod.training is False + for param in layer.parameters(): + assert param.requires_grad is False + # test zero_init_residual + for m in resnet3d_50_frozen.modules(): + if hasattr(m, 'conv3'): + assert torch.equal(m.conv3.bn.weight, + torch.zeros_like(m.conv3.bn.weight)) + assert torch.equal(m.conv3.bn.bias, + torch.zeros_like(m.conv3.bn.bias)) + + # resnet3d frozen with depth 34 inference + input_shape = (1, 3, 6, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + resnet3d_34_frozen = resnet3d_34_frozen.cuda() + imgs_gpu = imgs.cuda() + feat = resnet3d_34_frozen(imgs_gpu) + assert feat.shape == torch.Size([1, 512, 3, 2, 2]) + else: + feat = resnet3d_34_frozen(imgs) + assert feat.shape == torch.Size([1, 512, 3, 2, 2]) + + # resnet3d with depth 50 inference + input_shape = (1, 3, 6, 64, 64) + imgs = generate_backbone_demo_inputs(input_shape) + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + resnet3d_50_frozen = resnet3d_50_frozen.cuda() + imgs_gpu = imgs.cuda() + feat = resnet3d_50_frozen(imgs_gpu) + assert feat.shape == torch.Size([1, 2048, 3, 2, 2]) + else: + feat = resnet3d_50_frozen(imgs) + assert feat.shape == torch.Size([1, 2048, 3, 2, 2]) + + # resnet3d with depth 50 in caffe style inference + resnet3d_50_caffe = ResNet3d(50, None, pretrained2d=False, 
style='caffe') + resnet3d_50_caffe.init_weights() + resnet3d_50_caffe.train() + + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + resnet3d_50_caffe = resnet3d_50_caffe.cuda() + imgs_gpu = imgs.cuda() + feat = resnet3d_50_caffe(imgs_gpu) + assert feat.shape == torch.Size([1, 2048, 3, 2, 2]) + else: + feat = resnet3d_50_caffe(imgs) + assert feat.shape == torch.Size([1, 2048, 3, 2, 2]) + + # resnet3d with depth 34 in caffe style inference + resnet3d_34_caffe = ResNet3d(34, None, pretrained2d=False, style='caffe') + resnet3d_34_caffe.init_weights() + resnet3d_34_caffe.train() + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + resnet3d_34_caffe = resnet3d_34_caffe.cuda() + imgs_gpu = imgs.cuda() + feat = resnet3d_34_caffe(imgs_gpu) + assert feat.shape == torch.Size([1, 512, 3, 2, 2]) + else: + feat = resnet3d_34_caffe(imgs) + assert feat.shape == torch.Size([1, 512, 3, 2, 2]) + + # resnet3d with depth 50 and 3x3x3 inflate_style inference + resnet3d_50_1x1x1 = ResNet3d( + 50, None, pretrained2d=False, inflate_style='3x3x3') + resnet3d_50_1x1x1.init_weights() + resnet3d_50_1x1x1.train() + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + resnet3d_50_1x1x1 = resnet3d_50_1x1x1.cuda() + imgs_gpu = imgs.cuda() + feat = resnet3d_50_1x1x1(imgs_gpu) + assert feat.shape == torch.Size([1, 2048, 3, 2, 2]) + else: + feat = resnet3d_50_1x1x1(imgs) + assert feat.shape == torch.Size([1, 2048, 3, 2, 2]) + + # resnet3d with depth 34 and 3x3x3 inflate_style inference + resnet3d_34_1x1x1 = ResNet3d( + 34, None, pretrained2d=False, inflate_style='3x3x3') + resnet3d_34_1x1x1.init_weights() + resnet3d_34_1x1x1.train() + + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + resnet3d_34_1x1x1 = resnet3d_34_1x1x1.cuda() + imgs_gpu = imgs.cuda() + feat = resnet3d_34_1x1x1(imgs_gpu) + assert feat.shape 
== torch.Size([1, 512, 3, 2, 2]) + else: + feat = resnet3d_34_1x1x1(imgs) + assert feat.shape == torch.Size([1, 512, 3, 2, 2]) + + # resnet3d with non-local module + non_local_cfg = dict( + sub_sample=True, + use_scale=False, + norm_cfg=dict(type='BN3d', requires_grad=True), + mode='embedded_gaussian') + non_local = ((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)) + resnet3d_nonlocal = ResNet3d( + 50, + None, + pretrained2d=False, + non_local=non_local, + non_local_cfg=non_local_cfg) + resnet3d_nonlocal.init_weights() + for layer_name in ['layer2', 'layer3']: + layer = getattr(resnet3d_nonlocal, layer_name) + for i, _ in enumerate(layer): + if i % 2 == 0: + assert hasattr(layer[i], 'non_local_block') + + feat = resnet3d_nonlocal(imgs) + assert feat.shape == torch.Size([1, 2048, 3, 2, 2]) + + +def test_resnet3d_layer(): + with pytest.raises(AssertionError): + ResNet3dLayer(22, None) + + with pytest.raises(AssertionError): + ResNet3dLayer(50, None, stage=4) + + res_layer = ResNet3dLayer(50, None, stage=3, norm_eval=True) + res_layer.init_weights() + res_layer.train() + input_shape = (1, 1024, 1, 4, 4) + imgs = generate_backbone_demo_inputs(input_shape) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + res_layer = res_layer.cuda() + imgs_gpu = imgs.cuda() + feat = res_layer(imgs_gpu) + assert feat.shape == torch.Size([1, 2048, 1, 2, 2]) + else: + feat = res_layer(imgs) + assert feat.shape == torch.Size([1, 2048, 1, 2, 2]) + + res_layer = ResNet3dLayer( + 50, 'torchvision://resnet50', stage=3, all_frozen=True) + res_layer.init_weights() + res_layer.train() + imgs = generate_backbone_demo_inputs(input_shape) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + res_layer = res_layer.cuda() + imgs_gpu = imgs.cuda() + feat = res_layer(imgs_gpu) + assert feat.shape == torch.Size([1, 2048, 1, 2, 2]) + else: + feat = res_layer(imgs) + assert feat.shape == torch.Size([1, 2048, 1, 2, 2]) diff --git 
a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_detectors/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_detectors/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_detectors/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_detectors/test_detectors.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_detectors/test_detectors.py new file mode 100644 index 0000000000000000000000000000000000000000..e1590be44208669e179a65cf0d6055f0fac6e2cf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_detectors/test_detectors.py @@ -0,0 +1,42 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from ..base import generate_detector_demo_inputs, get_detector_cfg + +try: + from mmaction.models import build_detector + mmdet_imported = True +except (ImportError, ModuleNotFoundError): + mmdet_imported = False + + +@pytest.mark.skipif(not mmdet_imported, reason='requires mmdet') +def test_ava_detector(): + config = get_detector_cfg('ava/slowonly_kinetics_pretrained_r50_' + '4x16x1_20e_ava_rgb.py') + detector = build_detector(config.model) + + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + train_demo_inputs = generate_detector_demo_inputs( + train=True, device='cuda') + test_demo_inputs = generate_detector_demo_inputs( + train=False, device='cuda') + detector = detector.cuda() + + losses = detector(**train_demo_inputs) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + _ = detector(**test_demo_inputs, return_loss=False) + else: + train_demo_inputs = generate_detector_demo_inputs(train=True) + test_demo_inputs = generate_detector_demo_inputs(train=False) + losses = detector(**train_demo_inputs) + assert 
isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + _ = detector(**test_demo_inputs, return_loss=False) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_gradcam.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_gradcam.py new file mode 100644 index 0000000000000000000000000000000000000000..f80333deee66b56f40681e7065db3ef6b6b52c72 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_gradcam.py @@ -0,0 +1,230 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmaction.models import build_recognizer +from mmaction.utils.gradcam_utils import GradCAM +from .base import generate_gradcam_inputs, get_recognizer_cfg + + +def _get_target_shapes(input_shape, num_classes=400, model_type='2D'): + if model_type not in ['2D', '3D']: + raise ValueError(f'Data type {model_type} is not available') + + preds_target_shape = (input_shape[0], num_classes) + if model_type == '3D': + # input shape (batch_size, num_crops*num_clips, C, clip_len, H, W) + # target shape (batch_size*num_crops*num_clips, clip_len, H, W, C) + blended_imgs_target_shape = (input_shape[0] * input_shape[1], + input_shape[3], input_shape[4], + input_shape[5], input_shape[2]) + else: + # input shape (batch_size, num_segments, C, H, W) + # target shape (batch_size, num_segments, H, W, C) + blended_imgs_target_shape = (input_shape[0], input_shape[1], + input_shape[3], input_shape[4], + input_shape[2]) + + return blended_imgs_target_shape, preds_target_shape + + +def _do_test_2D_models(recognizer, + target_layer_name, + input_shape, + num_classes=400, + device='cpu'): + demo_inputs = generate_gradcam_inputs(input_shape) + demo_inputs['imgs'] = demo_inputs['imgs'].to(device) + demo_inputs['label'] = demo_inputs['label'].to(device) + + recognizer = recognizer.to(device) + gradcam = GradCAM(recognizer, target_layer_name) + + blended_imgs_target_shape, preds_target_shape = _get_target_shapes( + input_shape, 
num_classes=num_classes, model_type='2D') + + blended_imgs, preds = gradcam(demo_inputs) + assert blended_imgs.size() == blended_imgs_target_shape + assert preds.size() == preds_target_shape + + blended_imgs, preds = gradcam(demo_inputs, True) + assert blended_imgs.size() == blended_imgs_target_shape + assert preds.size() == preds_target_shape + + +def _do_test_3D_models(recognizer, + target_layer_name, + input_shape, + num_classes=400): + blended_imgs_target_shape, preds_target_shape = _get_target_shapes( + input_shape, num_classes=num_classes, model_type='3D') + demo_inputs = generate_gradcam_inputs(input_shape, '3D') + + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + recognizer = recognizer.cuda() + demo_inputs['imgs'] = demo_inputs['imgs'].cuda() + demo_inputs['label'] = demo_inputs['label'].cuda() + gradcam = GradCAM(recognizer, target_layer_name) + + blended_imgs, preds = gradcam(demo_inputs) + assert blended_imgs.size() == blended_imgs_target_shape + assert preds.size() == preds_target_shape + + blended_imgs, preds = gradcam(demo_inputs, True) + assert blended_imgs.size() == blended_imgs_target_shape + assert preds.size() == preds_target_shape + else: + gradcam = GradCAM(recognizer, target_layer_name) + + blended_imgs, preds = gradcam(demo_inputs) + assert blended_imgs.size() == blended_imgs_target_shape + assert preds.size() == preds_target_shape + + blended_imgs, preds = gradcam(demo_inputs, True) + assert blended_imgs.size() == blended_imgs_target_shape + assert preds.size() == preds_target_shape + + +def test_tsn(): + config = get_recognizer_cfg('tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + recognizer = build_recognizer(config.model) + recognizer.cfg = config + + input_shape = (1, 25, 3, 32, 32) + target_layer_name = 'backbone/layer4/1/relu' + + _do_test_2D_models(recognizer, target_layer_name, input_shape) + + +def test_i3d(): + config = 
get_recognizer_cfg('i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py') + config.model['backbone']['pretrained2d'] = False + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + recognizer.cfg = config + + input_shape = [1, 1, 3, 32, 32, 32] + target_layer_name = 'backbone/layer4/1/relu' + + _do_test_3D_models(recognizer, target_layer_name, input_shape) + + +def test_r2plus1d(): + config = get_recognizer_cfg( + 'r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py') + config.model['backbone']['pretrained2d'] = False + config.model['backbone']['pretrained'] = None + config.model['backbone']['norm_cfg'] = dict(type='BN3d') + + recognizer = build_recognizer(config.model) + recognizer.cfg = config + + input_shape = (1, 3, 3, 8, 32, 32) + target_layer_name = 'backbone/layer4/1/relu' + + _do_test_3D_models(recognizer, target_layer_name, input_shape) + + +def test_slowfast(): + config = get_recognizer_cfg( + 'slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py') + + recognizer = build_recognizer(config.model) + recognizer.cfg = config + + input_shape = (1, 1, 3, 32, 32, 32) + target_layer_name = 'backbone/slow_path/layer4/1/relu' + + _do_test_3D_models(recognizer, target_layer_name, input_shape) + + +def test_tsm(): + config = get_recognizer_cfg('tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + target_layer_name = 'backbone/layer4/1/relu' + + # base config + recognizer = build_recognizer(config.model) + recognizer.cfg = config + input_shape = (1, 8, 3, 32, 32) + _do_test_2D_models(recognizer, target_layer_name, input_shape) + + # test twice sample + 3 crops, 2*3*8=48 + config.model.test_cfg = dict(average_clips='prob') + recognizer = build_recognizer(config.model) + recognizer.cfg = config + input_shape = (1, 48, 3, 32, 32) + _do_test_2D_models(recognizer, target_layer_name, input_shape) + + +def test_csn(): + config = get_recognizer_cfg( + 
'csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py') + config.model['backbone']['pretrained2d'] = False + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + recognizer.cfg = config + input_shape = (1, 1, 3, 32, 32, 32) + target_layer_name = 'backbone/layer4/1/relu' + + _do_test_3D_models(recognizer, target_layer_name, input_shape) + + +def test_tpn(): + target_layer_name = 'backbone/layer4/1/relu' + + config = get_recognizer_cfg('tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.py') + config.model['backbone']['pretrained'] = None + recognizer = build_recognizer(config.model) + recognizer.cfg = config + + input_shape = (1, 8, 3, 32, 32) + _do_test_2D_models(recognizer, target_layer_name, input_shape, 174) + + config = get_recognizer_cfg( + 'tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py') + config.model['backbone']['pretrained'] = None + recognizer = build_recognizer(config.model) + recognizer.cfg = config + input_shape = (1, 3, 3, 8, 32, 32) + _do_test_3D_models(recognizer, target_layer_name, input_shape) + + +def test_c3d(): + config = get_recognizer_cfg('c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py') + config.model['backbone']['pretrained'] = None + recognizer = build_recognizer(config.model) + recognizer.cfg = config + input_shape = (1, 1, 3, 16, 112, 112) + target_layer_name = 'backbone/conv5a/activate' + _do_test_3D_models(recognizer, target_layer_name, input_shape, 101) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_tin(): + config = get_recognizer_cfg( + 'tin/tin_tsm_finetune_r50_1x1x8_50e_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + target_layer_name = 'backbone/layer4/1/relu' + + recognizer = build_recognizer(config.model) + recognizer.cfg = config + input_shape = (1, 8, 3, 64, 64) + _do_test_2D_models( + recognizer, target_layer_name, input_shape, device='cuda:0') + + +def test_x3d(): + config = 
get_recognizer_cfg('x3d/x3d_s_13x6x1_facebook_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + recognizer = build_recognizer(config.model) + recognizer.cfg = config + input_shape = (1, 1, 3, 13, 32, 32) + target_layer_name = 'backbone/layer4/1/relu' + _do_test_3D_models(recognizer, target_layer_name, input_shape) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_head.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_head.py new file mode 100644 index 0000000000000000000000000000000000000000..21ebf9a3982fdbe88df72a1936d0a18daedb0243 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_head.py @@ -0,0 +1,608 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp +import tempfile +from unittest.mock import Mock, patch + +import numpy as np +import pytest +import torch +import torch.nn as nn + +import mmaction +from mmaction.models import (ACRNHead, AudioTSNHead, BBoxHeadAVA, FBOHead, + I3DHead, LFBInferHead, SlowFastHead, STGCNHead, + TimeSformerHead, TPNHead, TRNHead, TSMHead, + TSNHead, X3DHead) +from .base import generate_backbone_demo_inputs + + +def test_i3d_head(): + """Test loss method, layer construction, attributes and forward function in + i3d head.""" + i3d_head = I3DHead(num_classes=4, in_channels=2048) + i3d_head.init_weights() + + assert i3d_head.num_classes == 4 + assert i3d_head.dropout_ratio == 0.5 + assert i3d_head.in_channels == 2048 + assert i3d_head.init_std == 0.01 + + assert isinstance(i3d_head.dropout, nn.Dropout) + assert i3d_head.dropout.p == i3d_head.dropout_ratio + + assert isinstance(i3d_head.fc_cls, nn.Linear) + assert i3d_head.fc_cls.in_features == i3d_head.in_channels + assert i3d_head.fc_cls.out_features == i3d_head.num_classes + + assert isinstance(i3d_head.avg_pool, nn.AdaptiveAvgPool3d) + assert i3d_head.avg_pool.output_size == (1, 1, 1) + + input_shape = (3, 2048, 4, 7, 7) + feat = torch.rand(input_shape) + + # i3d head inference + 
cls_scores = i3d_head(feat) + assert cls_scores.shape == torch.Size([3, 4]) + + +def test_bbox_head_ava(): + """Test loss method, layer construction, attributes and forward function in + bbox head.""" + with pytest.raises(TypeError): + # topk must be None, int or tuple[int] + BBoxHeadAVA(topk=0.1) + + with pytest.raises(AssertionError): + # topk should be smaller than num_classes + BBoxHeadAVA(num_classes=5, topk=(3, 5)) + + bbox_head = BBoxHeadAVA(in_channels=10, num_classes=4, topk=1) + input = torch.randn([3, 10, 2, 2, 2]) + ret, _ = bbox_head(input) + assert ret.shape == (3, 4) + + cls_score = torch.tensor( + [[0.568, -0.162, 0.273, -0.390, 0.447, 0.102, -0.409], + [2.388, 0.609, 0.369, 1.630, -0.808, -0.212, 0.296], + [0.252, -0.533, -0.644, -0.591, 0.148, 0.963, -0.525], + [0.134, -0.311, -0.764, -0.752, 0.656, -1.517, 0.185]]) + labels = torch.tensor([[0., 0., 1., 0., 0., 1., 0.], + [0., 0., 0., 1., 0., 0., 0.], + [0., 1., 0., 0., 1., 0., 1.], + [0., 0., 1., 1., 0., 0., 1.]]) + label_weights = torch.tensor([1., 1., 1., 1.]) + + # Test topk_to_matrix() + assert torch.equal( + BBoxHeadAVA.topk_to_matrix(cls_score[:, 1:], 1), + torch.tensor([[0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0], + [0, 0, 0, 0, 1, 0], [0, 0, 0, 1, 0, 0]], + dtype=bool)) + assert torch.equal( + BBoxHeadAVA.topk_to_matrix(cls_score[:, 1:], 2), + torch.tensor([[0, 1, 0, 1, 0, 0], [1, 0, 1, 0, 0, 0], + [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 1]], + dtype=bool)) + assert torch.equal( + BBoxHeadAVA.topk_to_matrix(cls_score[:, 1:], 3), + torch.tensor([[0, 1, 0, 1, 1, 0], [1, 1, 1, 0, 0, 0], + [0, 0, 0, 1, 1, 1], [1, 0, 0, 1, 0, 1]], + dtype=bool)) + assert torch.equal( + BBoxHeadAVA.topk_to_matrix(cls_score[:, 1:], 6), + torch.ones([4, 6], dtype=bool)) + + # Test Multi-Label Loss + bbox_head = BBoxHeadAVA() # Why is this here? isn't this redundant? 
+ bbox_head.init_weights() + bbox_head = BBoxHeadAVA(temporal_pool_type='max', spatial_pool_type='avg') + bbox_head.init_weights() + losses = bbox_head.loss( + cls_score=cls_score, + bbox_pred=None, + rois=None, + labels=labels, + label_weights=label_weights) + assert torch.isclose(losses['loss_action_cls'], torch.tensor(0.7162495)) + assert torch.isclose(losses['recall@thr=0.5'], torch.tensor(0.6666666)) + assert torch.isclose(losses['prec@thr=0.5'], torch.tensor(0.4791665)) + assert torch.isclose(losses['recall@top3'], torch.tensor(0.75)) + assert torch.isclose(losses['prec@top3'], torch.tensor(0.5)) + assert torch.isclose(losses['recall@top5'], torch.tensor(1.0)) + assert torch.isclose(losses['prec@top5'], torch.tensor(0.45)) + + # Test Single-Label Loss + bbox_head = BBoxHeadAVA(multilabel=False) + losses = bbox_head.loss( + cls_score=cls_score, + bbox_pred=None, + rois=None, + labels=labels, + label_weights=label_weights) + assert torch.isclose(losses['loss_action_cls'], torch.tensor(1.639561)) + assert torch.isclose(losses['recall@thr=0.5'], torch.tensor(0.25)) + assert torch.isclose(losses['prec@thr=0.5'], torch.tensor(0.25)) + assert torch.isclose(losses['recall@top3'], torch.tensor(0.75)) + assert torch.isclose(losses['prec@top3'], torch.tensor(0.5)) + assert torch.isclose(losses['recall@top5'], torch.tensor(1.0)) + assert torch.isclose(losses['prec@top5'], torch.tensor(0.45)) + + # Test ROI + rois = torch.tensor([[0.0, 0.1, 0.2, 0.3, 0.4], [0.0, 0.5, 0.6, 0.7, 0.8]]) + rois[1::2] *= 380 + rois[2::2] *= 220 + crop_quadruple = np.array([0.1, 0.2, 0.8, 0.7]) + cls_score = torch.tensor([0.995, 0.728]) + img_shape = (320, 480) + flip = True + + bbox_head = BBoxHeadAVA(multilabel=True) + bboxes, scores = bbox_head.get_det_bboxes( + rois=rois, + cls_score=cls_score, + img_shape=img_shape, + flip=flip, + crop_quadruple=crop_quadruple) + assert torch.all( + torch.isclose( + bboxes, + torch.tensor([[0.89783341, 0.20043750, 0.89816672, 0.20087500], + [0.45499998, 
0.69875002, 0.58166665, 0.86499995]]))) + assert torch.all( + torch.isclose(scores, torch.tensor([0.73007441, 0.67436624]))) + + bbox_head = BBoxHeadAVA(multilabel=False) + bboxes, scores = bbox_head.get_det_bboxes( + rois=rois, + cls_score=cls_score, + img_shape=img_shape, + flip=flip, + crop_quadruple=crop_quadruple) + assert torch.all( + torch.isclose( + bboxes, + torch.tensor([[0.89783341, 0.20043750, 0.89816672, 0.20087500], + [0.45499998, 0.69875002, 0.58166665, 0.86499995]]))) + assert torch.all(torch.isclose(scores, torch.tensor([0.56636, 0.43364]))) + + +def test_x3d_head(): + """Test loss method, layer construction, attributes and forward function in + x3d head.""" + x3d_head = X3DHead(in_channels=432, num_classes=4, fc1_bias=False) + x3d_head.init_weights() + + assert x3d_head.num_classes == 4 + assert x3d_head.dropout_ratio == 0.5 + assert x3d_head.in_channels == 432 + assert x3d_head.init_std == 0.01 + + assert isinstance(x3d_head.dropout, nn.Dropout) + assert x3d_head.dropout.p == x3d_head.dropout_ratio + + assert isinstance(x3d_head.fc1, nn.Linear) + assert x3d_head.fc1.in_features == x3d_head.in_channels + assert x3d_head.fc1.out_features == x3d_head.mid_channels + assert x3d_head.fc1.bias is None + + assert isinstance(x3d_head.fc2, nn.Linear) + assert x3d_head.fc2.in_features == x3d_head.mid_channels + assert x3d_head.fc2.out_features == x3d_head.num_classes + + assert isinstance(x3d_head.pool, nn.AdaptiveAvgPool3d) + assert x3d_head.pool.output_size == (1, 1, 1) + + input_shape = (3, 432, 4, 7, 7) + feat = torch.rand(input_shape) + + # x3d head inference + cls_scores = x3d_head(feat) + assert cls_scores.shape == torch.Size([3, 4]) + + +def test_slowfast_head(): + """Test loss method, layer construction, attributes and forward function in + slowfast head.""" + sf_head = SlowFastHead(num_classes=4, in_channels=2304) + sf_head.init_weights() + + assert sf_head.num_classes == 4 + assert sf_head.dropout_ratio == 0.8 + assert sf_head.in_channels == 2304 
+ assert sf_head.init_std == 0.01 + + assert isinstance(sf_head.dropout, nn.Dropout) + assert sf_head.dropout.p == sf_head.dropout_ratio + + assert isinstance(sf_head.fc_cls, nn.Linear) + assert sf_head.fc_cls.in_features == sf_head.in_channels + assert sf_head.fc_cls.out_features == sf_head.num_classes + + assert isinstance(sf_head.avg_pool, nn.AdaptiveAvgPool3d) + assert sf_head.avg_pool.output_size == (1, 1, 1) + + input_shape = (3, 2048, 32, 7, 7) + feat_slow = torch.rand(input_shape) + + input_shape = (3, 256, 4, 7, 7) + feat_fast = torch.rand(input_shape) + + sf_head = SlowFastHead(num_classes=4, in_channels=2304) + cls_scores = sf_head((feat_slow, feat_fast)) + assert cls_scores.shape == torch.Size([3, 4]) + + +def test_tsn_head(): + """Test loss method, layer construction, attributes and forward function in + tsn head.""" + tsn_head = TSNHead(num_classes=4, in_channels=2048) + tsn_head.init_weights() + + assert tsn_head.num_classes == 4 + assert tsn_head.dropout_ratio == 0.4 + assert tsn_head.in_channels == 2048 + assert tsn_head.init_std == 0.01 + assert tsn_head.consensus.dim == 1 + assert tsn_head.spatial_type == 'avg' + + assert isinstance(tsn_head.dropout, nn.Dropout) + assert tsn_head.dropout.p == tsn_head.dropout_ratio + + assert isinstance(tsn_head.fc_cls, nn.Linear) + assert tsn_head.fc_cls.in_features == tsn_head.in_channels + assert tsn_head.fc_cls.out_features == tsn_head.num_classes + + assert isinstance(tsn_head.avg_pool, nn.AdaptiveAvgPool2d) + assert tsn_head.avg_pool.output_size == (1, 1) + + input_shape = (8, 2048, 7, 7) + feat = torch.rand(input_shape) + + # tsn head inference + num_segs = input_shape[0] + cls_scores = tsn_head(feat, num_segs) + assert cls_scores.shape == torch.Size([1, 4]) + + # Test multi-class recognition + multi_tsn_head = TSNHead( + num_classes=4, + in_channels=2048, + loss_cls=dict(type='BCELossWithLogits', loss_weight=160.0), + multi_class=True, + label_smooth_eps=0.01) + multi_tsn_head.init_weights() + assert 
multi_tsn_head.num_classes == 4 + assert multi_tsn_head.dropout_ratio == 0.4 + assert multi_tsn_head.in_channels == 2048 + assert multi_tsn_head.init_std == 0.01 + assert multi_tsn_head.consensus.dim == 1 + + assert isinstance(multi_tsn_head.dropout, nn.Dropout) + assert multi_tsn_head.dropout.p == multi_tsn_head.dropout_ratio + + assert isinstance(multi_tsn_head.fc_cls, nn.Linear) + assert multi_tsn_head.fc_cls.in_features == multi_tsn_head.in_channels + assert multi_tsn_head.fc_cls.out_features == multi_tsn_head.num_classes + + assert isinstance(multi_tsn_head.avg_pool, nn.AdaptiveAvgPool2d) + assert multi_tsn_head.avg_pool.output_size == (1, 1) + + input_shape = (8, 2048, 7, 7) + feat = torch.rand(input_shape) + + # multi-class tsn head inference + num_segs = input_shape[0] + cls_scores = multi_tsn_head(feat, num_segs) + assert cls_scores.shape == torch.Size([1, 4]) + + +def test_tsn_head_audio(): + """Test loss method, layer construction, attributes and forward function in + audio tsn head.""" + tsn_head_audio = AudioTSNHead(num_classes=4, in_channels=5) + tsn_head_audio.init_weights() + + assert tsn_head_audio.num_classes == 4 + assert tsn_head_audio.dropout_ratio == 0.4 + assert tsn_head_audio.in_channels == 5 + assert tsn_head_audio.init_std == 0.01 + assert tsn_head_audio.spatial_type == 'avg' + + assert isinstance(tsn_head_audio.dropout, nn.Dropout) + assert tsn_head_audio.dropout.p == tsn_head_audio.dropout_ratio + + assert isinstance(tsn_head_audio.fc_cls, nn.Linear) + assert tsn_head_audio.fc_cls.in_features == tsn_head_audio.in_channels + assert tsn_head_audio.fc_cls.out_features == tsn_head_audio.num_classes + + assert isinstance(tsn_head_audio.avg_pool, nn.AdaptiveAvgPool2d) + assert tsn_head_audio.avg_pool.output_size == (1, 1) + + input_shape = (8, 5, 7, 7) + feat = torch.rand(input_shape) + + # audio tsn head inference + cls_scores = tsn_head_audio(feat) + assert cls_scores.shape == torch.Size([8, 4]) + + +def test_tsm_head(): + """Test loss method, layer 
construction, attributes and forward function in + tsm head.""" + tsm_head = TSMHead(num_classes=4, in_channels=2048) + tsm_head.init_weights() + + assert tsm_head.num_classes == 4 + assert tsm_head.dropout_ratio == 0.8 + assert tsm_head.in_channels == 2048 + assert tsm_head.init_std == 0.001 + assert tsm_head.consensus.dim == 1 + assert tsm_head.spatial_type == 'avg' + + assert isinstance(tsm_head.dropout, nn.Dropout) + assert tsm_head.dropout.p == tsm_head.dropout_ratio + + assert isinstance(tsm_head.fc_cls, nn.Linear) + assert tsm_head.fc_cls.in_features == tsm_head.in_channels + assert tsm_head.fc_cls.out_features == tsm_head.num_classes + + assert isinstance(tsm_head.avg_pool, nn.AdaptiveAvgPool2d) + assert tsm_head.avg_pool.output_size == 1 + + input_shape = (8, 2048, 7, 7) + feat = torch.rand(input_shape) + + # tsm head inference without temporal pooling + num_segs = input_shape[0] + cls_scores = tsm_head(feat, num_segs) + assert cls_scores.shape == torch.Size([1, 4]) + + # tsm head inference with temporal pooling + tsm_head = TSMHead(num_classes=4, in_channels=2048, temporal_pool=True) + tsm_head.init_weights() + cls_scores = tsm_head(feat, num_segs) + assert cls_scores.shape == torch.Size([2, 4]) + + +def test_trn_head(): + """Test loss method, layer construction, attributes and forward function in + trn head.""" + from mmaction.models.heads.trn_head import (RelationModule, + RelationModuleMultiScale) + trn_head = TRNHead(num_classes=4, in_channels=2048, relation_type='TRN') + trn_head.init_weights() + + assert trn_head.num_classes == 4 + assert trn_head.dropout_ratio == 0.8 + assert trn_head.in_channels == 2048 + assert trn_head.init_std == 0.001 + assert trn_head.spatial_type == 'avg' + + relation_module = trn_head.consensus + assert isinstance(relation_module, RelationModule) + assert relation_module.hidden_dim == 256 + assert isinstance(relation_module.classifier[3], nn.Linear) + assert relation_module.classifier[3].out_features == trn_head.num_classes + + assert 
trn_head.dropout.p == trn_head.dropout_ratio + assert isinstance(trn_head.dropout, nn.Dropout) + assert isinstance(trn_head.fc_cls, nn.Linear) + assert trn_head.fc_cls.in_features == trn_head.in_channels + assert trn_head.fc_cls.out_features == trn_head.hidden_dim + + assert isinstance(trn_head.avg_pool, nn.AdaptiveAvgPool2d) + assert trn_head.avg_pool.output_size == 1 + + input_shape = (8, 2048, 7, 7) + feat = torch.rand(input_shape) + + # trn head inference with the single-scale relation module + num_segs = input_shape[0] + cls_scores = trn_head(feat, num_segs) + assert cls_scores.shape == torch.Size([1, 4]) + + # trn head inference with the multi-scale relation module + trn_head = TRNHead( + num_classes=4, + in_channels=2048, + num_segments=8, + relation_type='TRNMultiScale') + trn_head.init_weights() + assert isinstance(trn_head.consensus, RelationModuleMultiScale) + assert trn_head.consensus.scales == range(8, 1, -1) + cls_scores = trn_head(feat, num_segs) + assert cls_scores.shape == torch.Size([1, 4]) + + with pytest.raises(ValueError): + trn_head = TRNHead( + num_classes=4, + in_channels=2048, + num_segments=8, + relation_type='RelationModlue') + + +def test_timesformer_head(): + """Test loss method, layer construction, attributes and forward function in + timesformer head.""" + timesformer_head = TimeSformerHead(num_classes=4, in_channels=64) + timesformer_head.init_weights() + + assert timesformer_head.num_classes == 4 + assert timesformer_head.in_channels == 64 + assert timesformer_head.init_std == 0.02 + + input_shape = (2, 64) + feat = torch.rand(input_shape) + + cls_scores = timesformer_head(feat) + assert cls_scores.shape == torch.Size([2, 4]) + + +@patch.object(mmaction.models.LFBInferHead, '__del__', Mock) +def test_lfb_infer_head(): + """Test layer construction, attributes and forward function in lfb infer + head.""" + with tempfile.TemporaryDirectory() as tmpdir: + lfb_infer_head = LFBInferHead( + lfb_prefix_path=tmpdir, use_half_precision=True) + lfb_infer_head.init_weights() + + st_feat_shape = 
(3, 16, 1, 8, 8) + st_feat = generate_backbone_demo_inputs(st_feat_shape) + rois = torch.cat( + (torch.tensor([0, 1, 0]).float().view(3, 1), torch.randn(3, 4)), dim=1) + img_metas = [dict(img_key='video_1,777'), dict(img_key='video_2, 888')] + result = lfb_infer_head(st_feat, rois, img_metas) + assert st_feat.equal(result) + assert len(lfb_infer_head.all_features) == 3 + assert lfb_infer_head.all_features[0].shape == (16, 1, 1, 1) + + +def test_fbo_head(): + """Test layer construction, attributes and forward function in fbo head.""" + lfb_prefix_path = osp.normpath( + osp.join(osp.dirname(__file__), '../data/lfb')) + + st_feat_shape = (1, 16, 1, 8, 8) + st_feat = generate_backbone_demo_inputs(st_feat_shape) + rois = torch.randn(1, 5) + rois[0][0] = 0 + img_metas = [dict(img_key='video_1, 930')] + + # non local fbo + fbo_head = FBOHead( + lfb_cfg=dict( + lfb_prefix_path=lfb_prefix_path, + max_num_sampled_feat=5, + window_size=60, + lfb_channels=16, + dataset_modes=('unittest'), + device='cpu'), + fbo_cfg=dict( + type='non_local', + st_feat_channels=16, + lt_feat_channels=16, + latent_channels=8, + num_st_feat=1, + num_lt_feat=5 * 60, + )) + fbo_head.init_weights() + out = fbo_head(st_feat, rois, img_metas) + assert out.shape == (1, 24, 1, 1, 1) + + # avg fbo + fbo_head = FBOHead( + lfb_cfg=dict( + lfb_prefix_path=lfb_prefix_path, + max_num_sampled_feat=5, + window_size=60, + lfb_channels=16, + dataset_modes=('unittest'), + device='cpu'), + fbo_cfg=dict(type='avg')) + fbo_head.init_weights() + out = fbo_head(st_feat, rois, img_metas) + assert out.shape == (1, 32, 1, 1, 1) + + # max fbo + fbo_head = FBOHead( + lfb_cfg=dict( + lfb_prefix_path=lfb_prefix_path, + max_num_sampled_feat=5, + window_size=60, + lfb_channels=16, + dataset_modes=('unittest'), + device='cpu'), + fbo_cfg=dict(type='max')) + fbo_head.init_weights() + out = fbo_head(st_feat, rois, img_metas) + assert out.shape == (1, 32, 1, 1, 1) + + +def test_tpn_head(): + """Test loss method, layer construction, 
attributes and forward function in + tpn head.""" + tpn_head = TPNHead(num_classes=4, in_channels=2048) + tpn_head.init_weights() + + assert hasattr(tpn_head, 'avg_pool2d') + assert hasattr(tpn_head, 'avg_pool3d') + assert isinstance(tpn_head.avg_pool3d, nn.AdaptiveAvgPool3d) + assert tpn_head.avg_pool3d.output_size == (1, 1, 1) + assert tpn_head.avg_pool2d is None + + input_shape = (4, 2048, 7, 7) + feat = torch.rand(input_shape) + + # tpn head inference with num_segs + num_segs = 2 + cls_scores = tpn_head(feat, num_segs) + assert isinstance(tpn_head.avg_pool2d, nn.AvgPool3d) + assert tpn_head.avg_pool2d.kernel_size == (1, 7, 7) + assert cls_scores.shape == torch.Size([2, 4]) + + # tpn head inference with no num_segs + input_shape = (2, 2048, 3, 7, 7) + feat = torch.rand(input_shape) + cls_scores = tpn_head(feat) + assert isinstance(tpn_head.avg_pool2d, nn.AvgPool3d) + assert tpn_head.avg_pool2d.kernel_size == (1, 7, 7) + assert cls_scores.shape == torch.Size([2, 4]) + + +def test_acrn_head(): + roi_feat = torch.randn(4, 16, 1, 7, 7) + feat = torch.randn(2, 16, 1, 16, 16) + rois = torch.Tensor([[0, 2.2268, 0.5926, 10.6142, 8.0029], + [0, 2.2577, 0.1519, 11.6451, 8.9282], + [1, 1.9874, 1.0000, 11.1585, 8.2840], + [1, 3.3338, 3.7166, 8.4174, 11.2785]]) + + acrn_head = ACRNHead(32, 16) + acrn_head.init_weights() + new_feat = acrn_head(roi_feat, feat, rois) + assert new_feat.shape == (4, 16, 1, 16, 16) + + acrn_head = ACRNHead(32, 16, stride=2) + new_feat = acrn_head(roi_feat, feat, rois) + assert new_feat.shape == (4, 16, 1, 8, 8) + + acrn_head = ACRNHead(32, 16, stride=2, num_convs=2) + new_feat = acrn_head(roi_feat, feat, rois) + assert new_feat.shape == (4, 16, 1, 8, 8) + + +def test_stgcn_head(): + """Test loss method, layer construction, attributes and forward function in + stgcn head.""" + with pytest.raises(NotImplementedError): + # spatial_type not in ['avg', 'max'] + stgcn_head = STGCNHead( + num_classes=60, in_channels=256, spatial_type='min') + 
stgcn_head.init_weights() + + # spatial_type='avg' + stgcn_head = STGCNHead(num_classes=60, in_channels=256, spatial_type='avg') + stgcn_head.init_weights() + + assert stgcn_head.num_classes == 60 + assert stgcn_head.in_channels == 256 + + input_shape = (2, 256, 75, 17) + feat = torch.rand(input_shape) + + cls_scores = stgcn_head(feat) + assert cls_scores.shape == torch.Size([1, 60]) + + # spatial_type='max' + stgcn_head = STGCNHead(num_classes=60, in_channels=256, spatial_type='max') + stgcn_head.init_weights() + + assert stgcn_head.num_classes == 60 + assert stgcn_head.in_channels == 256 + + input_shape = (2, 256, 75, 17) + feat = torch.rand(input_shape) + + cls_scores = stgcn_head(feat) + assert cls_scores.shape == torch.Size([1, 60]) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_bmn.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_bmn.py new file mode 100644 index 0000000000000000000000000000000000000000..dde3029cafb540a7196683024b4a217fe92a86b4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_bmn.py @@ -0,0 +1,68 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import platform + +import numpy as np +import pytest +import torch + +from mmaction.models import build_localizer +from ..base import get_localizer_cfg + + +@pytest.mark.skipif(platform.system() == 'Windows', reason='Windows mem limit') +def test_bmn_train(): + model_cfg = get_localizer_cfg( + 'bmn/bmn_400x100_2x8_9e_activitynet_feature.py') + + if torch.cuda.is_available(): + localizer_bmn = build_localizer(model_cfg.model).cuda() + raw_feature = torch.rand(3, 400, 100).cuda() + gt_bbox = np.array([[[0.1, 0.3], [0.375, 0.625]]] * 3) + losses = localizer_bmn(raw_feature, gt_bbox) + assert isinstance(losses, dict) + + else: + localizer_bmn = build_localizer(model_cfg.model) + raw_feature = torch.rand(3, 400, 100) + gt_bbox = torch.Tensor([[[0.1, 0.3], [0.375, 0.625]]] * 3) + losses = localizer_bmn(raw_feature, gt_bbox) + assert isinstance(losses, dict) + + +@pytest.mark.skipif(platform.system() == 'Windows', reason='Windows mem limit') +def test_bmn_test(): + model_cfg = get_localizer_cfg( + 'bmn/bmn_400x100_2x8_9e_activitynet_feature.py') + + if torch.cuda.is_available(): + localizer_bmn = build_localizer(model_cfg.model).cuda() + video_meta = [ + dict( + video_name='v_test', + duration_second=100, + duration_frame=960, + feature_frame=960) + ] + with torch.no_grad(): + one_raw_feature = torch.rand(1, 400, 100).cuda() + localizer_bmn( + one_raw_feature, + gt_bbox=None, + video_meta=video_meta, + return_loss=False) + else: + localizer_bmn = build_localizer(model_cfg.model) + video_meta = [ + dict( + video_name='v_test', + duration_second=100, + duration_frame=960, + feature_frame=960) + ] + with torch.no_grad(): + one_raw_feature = torch.rand(1, 400, 100) + localizer_bmn( + one_raw_feature, + gt_bbox=None, + video_meta=video_meta, + return_loss=False) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_localizers.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_localizers.py new file mode 100644 index 
0000000000000000000000000000000000000000..98df755126b5b2e8082eeacc0d2b12e6c2bbfd8d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_localizers.py @@ -0,0 +1,34 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np + +from mmaction.models.localizers.utils import post_processing + + +def test_post_processing(): + # test with multiple results + result = np.array([[0., 1., 1., 1., 0.5, 0.5], [0., 0.4, 1., 1., 0.4, 0.4], + [0., 0.95, 1., 1., 0.6, 0.6]]) + video_info = dict( + video_name='v_test', + duration_second=100, + duration_frame=960, + feature_frame=960) + proposal_list = post_processing(result, video_info, 0.75, 0.65, 0.9, 2, 16) + assert isinstance(proposal_list[0], dict) + assert proposal_list[0]['score'] == 0.6 + assert proposal_list[0]['segment'] == [0., 95.0] + assert isinstance(proposal_list[1], dict) + assert proposal_list[1]['score'] == 0.4 + assert proposal_list[1]['segment'] == [0., 40.0] + + # test with only result + result = np.array([[0., 1., 1., 1., 0.5, 0.5]]) + video_info = dict( + video_name='v_test', + duration_second=100, + duration_frame=960, + feature_frame=960) + proposal_list = post_processing(result, video_info, 0.75, 0.65, 0.9, 1, 16) + assert isinstance(proposal_list[0], dict) + assert proposal_list[0]['score'] == 0.5 + assert proposal_list[0]['segment'] == [0., 100.0] diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_pem.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_pem.py new file mode 100644 index 0000000000000000000000000000000000000000..c0e5ff775025091ba549c88a78b69f50a8110003 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_pem.py @@ -0,0 +1,49 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import platform + +import pytest +import torch + +from mmaction.models import build_localizer +from ..base import get_localizer_cfg + + +@pytest.mark.skipif(platform.system() == 'Windows', reason='Windows mem limit') +def test_pem(): + model_cfg = get_localizer_cfg( + 'bsn/bsn_pem_400x100_1x16_20e_activitynet_feature.py') + + localizer_pem = build_localizer(model_cfg.model) + bsp_feature = torch.rand(8, 100, 32) + reference_temporal_iou = torch.rand(8, 100) + losses = localizer_pem(bsp_feature, reference_temporal_iou) + assert isinstance(losses, dict) + + # Test forward test + tmin = torch.rand(100) + tmax = torch.rand(100) + tmin_score = torch.rand(100) + tmax_score = torch.rand(100) + + video_meta = [ + dict( + video_name='v_test', + duration_second=100, + duration_frame=1000, + annotations=[{ + 'segment': [0.3, 0.6], + 'label': 'Rock climbing' + }], + feature_frame=900) + ] + with torch.no_grad(): + for one_bsp_feature in bsp_feature: + one_bsp_feature = one_bsp_feature.reshape(1, 100, 32) + localizer_pem( + one_bsp_feature, + tmin=tmin, + tmax=tmax, + tmin_score=tmin_score, + tmax_score=tmax_score, + video_meta=video_meta, + return_loss=False) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_ssn.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_ssn.py new file mode 100644 index 0000000000000000000000000000000000000000..f1de07462a88a7e7ec66d1bed880203d9a9cfcbd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_ssn.py @@ -0,0 +1,206 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import platform + +import mmcv +import pytest +import torch + +from mmaction.models import build_localizer + + +@pytest.mark.skipif(platform.system() == 'Windows', reason='Windows mem limit') +def test_ssn_train(): + train_cfg = mmcv.ConfigDict( + dict( + ssn=dict( + assigner=dict( + positive_iou_threshold=0.7, + background_iou_threshold=0.01, + incomplete_iou_threshold=0.3, + background_coverage_threshold=0.02, + incomplete_overlap_threshold=0.01), + sampler=dict( + num_per_video=8, + positive_ratio=1, + background_ratio=1, + incomplete_ratio=6, + add_gt_as_proposals=True), + loss_weight=dict(comp_loss_weight=0.1, reg_loss_weight=0.1), + debug=False))) + base_model_cfg = dict( + type='SSN', + backbone=dict( + type='ResNet', pretrained=None, depth=18, norm_eval=True), + spatial_type='avg', + dropout_ratio=0.8, + loss_cls=dict(type='SSNLoss'), + cls_head=dict( + type='SSNHead', + dropout_ratio=0., + in_channels=512, + num_classes=20, + consensus=dict( + type='STPPTrain', + stpp_stage=(1, 1, 1), + num_segments_list=(2, 5, 2)), + use_regression=True), + train_cfg=train_cfg) + dropout_cfg = copy.deepcopy(base_model_cfg) + dropout_cfg['dropout_ratio'] = 0 + dropout_cfg['cls_head']['dropout_ratio'] = 0.5 + non_regression_cfg = copy.deepcopy(base_model_cfg) + non_regression_cfg['cls_head']['use_regression'] = False + + imgs = torch.rand(1, 8, 9, 3, 224, 224) + proposal_scale_factor = torch.Tensor([[[1.0345, 1.0345], [1.0028, 0.0028], + [1.0013, 1.0013], [1.0008, 1.0008], + [0.3357, 1.0006], [1.0006, 1.0006], + [0.0818, 1.0005], [1.0030, + 1.0030]]]) + proposal_type = torch.Tensor([[0, 1, 1, 1, 1, 1, 1, 2]]) + proposal_labels = torch.LongTensor([[8, 8, 8, 8, 8, 8, 8, 0]]) + reg_targets = torch.Tensor([[[0.2929, 0.2694], [0.0000, 0.0000], + [0.0000, 0.0000], [0.0000, 0.0000], + [0.0000, 0.0000], [0.0000, 0.0000], + [0.0000, 0.0000], [0.0000, 0.0000]]]) + + localizer_ssn = build_localizer(base_model_cfg) + localizer_ssn_dropout = build_localizer(dropout_cfg) + 
localizer_ssn_non_regression = build_localizer(non_regression_cfg) + + if torch.cuda.is_available(): + localizer_ssn = localizer_ssn.cuda() + localizer_ssn_dropout = localizer_ssn_dropout.cuda() + localizer_ssn_non_regression = localizer_ssn_non_regression.cuda() + imgs = imgs.cuda() + proposal_scale_factor = proposal_scale_factor.cuda() + proposal_type = proposal_type.cuda() + proposal_labels = proposal_labels.cuda() + reg_targets = reg_targets.cuda() + + # Train normal case + losses = localizer_ssn( + imgs, + proposal_scale_factor=proposal_scale_factor, + proposal_type=proposal_type, + proposal_labels=proposal_labels, + reg_targets=reg_targets) + assert isinstance(losses, dict) + + # Train SSN without dropout in model, with dropout in head + losses = localizer_ssn_dropout( + imgs, + proposal_scale_factor=proposal_scale_factor, + proposal_type=proposal_type, + proposal_labels=proposal_labels, + reg_targets=reg_targets) + assert isinstance(losses, dict) + + # Train SSN model without regression + losses = localizer_ssn_non_regression( + imgs, + proposal_scale_factor=proposal_scale_factor, + proposal_type=proposal_type, + proposal_labels=proposal_labels, + reg_targets=reg_targets) + assert isinstance(losses, dict) + + +@pytest.mark.skipif(platform.system() == 'Windows', reason='Windows mem limit') +def test_ssn_test(): + test_cfg = mmcv.ConfigDict( + dict( + ssn=dict( + sampler=dict(test_interval=6, batch_size=16), + evaluater=dict( + top_k=2000, + nms=0.2, + softmax_before_filter=True, + cls_score_dict=None, + cls_top_k=2)))) + base_model_cfg = dict( + type='SSN', + backbone=dict( + type='ResNet', pretrained=None, depth=18, norm_eval=True), + spatial_type='avg', + dropout_ratio=0.8, + cls_head=dict( + type='SSNHead', + dropout_ratio=0., + in_channels=512, + num_classes=20, + consensus=dict(type='STPPTest', stpp_stage=(1, 1, 1)), + use_regression=True), + test_cfg=test_cfg) + maxpool_model_cfg = copy.deepcopy(base_model_cfg) + maxpool_model_cfg['spatial_type'] = 
'max' + non_regression_cfg = copy.deepcopy(base_model_cfg) + non_regression_cfg['cls_head']['use_regression'] = False + non_regression_cfg['cls_head']['consensus']['use_regression'] = False + tuple_stage_cfg = copy.deepcopy(base_model_cfg) + tuple_stage_cfg['cls_head']['consensus']['stpp_stage'] = (1, (1, 2), 1) + str_stage_cfg = copy.deepcopy(base_model_cfg) + str_stage_cfg['cls_head']['consensus']['stpp_stage'] = ('error', ) + + imgs = torch.rand(1, 8, 3, 224, 224) + relative_proposal_list = torch.Tensor([[[0.2500, 0.6250], [0.3750, + 0.7500]]]) + scale_factor_list = torch.Tensor([[[1.0000, 1.0000], [1.0000, 0.2661]]]) + proposal_tick_list = torch.LongTensor([[[1, 2, 5, 7], [20, 30, 60, 80]]]) + reg_norm_consts = torch.Tensor([[[-0.0603, 0.0325], [0.0752, 0.1596]]]) + + localizer_ssn = build_localizer(base_model_cfg) + localizer_ssn_maxpool = build_localizer(maxpool_model_cfg) + localizer_ssn_non_regression = build_localizer(non_regression_cfg) + localizer_ssn_tuple_stage_cfg = build_localizer(tuple_stage_cfg) + with pytest.raises(ValueError): + build_localizer(str_stage_cfg) + + if torch.cuda.is_available(): + localizer_ssn = localizer_ssn.cuda() + localizer_ssn_maxpool = localizer_ssn_maxpool.cuda() + localizer_ssn_non_regression = localizer_ssn_non_regression.cuda() + localizer_ssn_tuple_stage_cfg = localizer_ssn_tuple_stage_cfg.cuda() + imgs = imgs.cuda() + relative_proposal_list = relative_proposal_list.cuda() + scale_factor_list = scale_factor_list.cuda() + proposal_tick_list = proposal_tick_list.cuda() + reg_norm_consts = reg_norm_consts.cuda() + + with torch.no_grad(): + # Test normal case + localizer_ssn( + imgs, + relative_proposal_list=relative_proposal_list, + scale_factor_list=scale_factor_list, + proposal_tick_list=proposal_tick_list, + reg_norm_consts=reg_norm_consts, + return_loss=False) + + # Test SSN model with max spatial pooling + localizer_ssn_maxpool( + imgs, + relative_proposal_list=relative_proposal_list, + 
scale_factor_list=scale_factor_list, + proposal_tick_list=proposal_tick_list, + reg_norm_consts=reg_norm_consts, + return_loss=False) + + # Test SSN model without regression + localizer_ssn_non_regression( + imgs, + relative_proposal_list=relative_proposal_list, + scale_factor_list=scale_factor_list, + proposal_tick_list=proposal_tick_list, + reg_norm_consts=reg_norm_consts, + return_loss=False) + + # Test SSN model with tuple stage cfg. + localizer_ssn_tuple_stage_cfg( + imgs, + relative_proposal_list=relative_proposal_list, + scale_factor_list=scale_factor_list, + proposal_tick_list=proposal_tick_list, + reg_norm_consts=reg_norm_consts, + return_loss=False) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_tem.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_tem.py new file mode 100644 index 0000000000000000000000000000000000000000..ce19d385cb0da4311478c3b9b703cc8a9158a88b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_localizers/test_tem.py @@ -0,0 +1,28 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import platform + +import pytest +import torch + +from mmaction.models import build_localizer +from ..base import get_localizer_cfg + + +@pytest.mark.skipif(platform.system() == 'Windows', reason='Windows mem limit') +def test_tem(): + model_cfg = get_localizer_cfg( + 'bsn/bsn_tem_400x100_1x16_20e_activitynet_feature.py') + + localizer_tem = build_localizer(model_cfg.model) + raw_feature = torch.rand(8, 400, 100) + gt_bbox = torch.Tensor([[[1.0, 3.0], [3.0, 5.0]]] * 8) + losses = localizer_tem(raw_feature, gt_bbox) + assert isinstance(losses, dict) + + # Test forward test + video_meta = [{'video_name': 'v_test'}] + with torch.no_grad(): + for one_raw_feature in raw_feature: + one_raw_feature = one_raw_feature.reshape(1, 400, 100) + localizer_tem( + one_raw_feature, video_meta=video_meta, return_loss=False) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_neck.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_neck.py new file mode 100644 index 0000000000000000000000000000000000000000..6fc97fd19f9ff39fa4d62cf48345f03f20985dee --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_neck.py @@ -0,0 +1,87 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy + +import pytest +import torch + +from mmaction.models import TPN +from .base import generate_backbone_demo_inputs + + +def test_tpn(): + """Test TPN neck.""" + + tpn_cfg = dict( + in_channels=(1024, 2048), + out_channels=1024, + spatial_modulation_cfg=dict( + in_channels=(1024, 2048), out_channels=2048), + temporal_modulation_cfg=dict(downsample_scales=(8, 8)), + upsample_cfg=dict(scale_factor=(1, 1, 1)), + downsample_cfg=dict(downsample_scale=(1, 1, 1)), + level_fusion_cfg=dict( + in_channels=(1024, 1024), + mid_channels=(1024, 1024), + out_channels=2048, + downsample_scales=((1, 1, 1), (1, 1, 1))), + aux_head_cfg=dict(out_channels=400, loss_weight=0.5)) + + with pytest.raises(AssertionError): + tpn_cfg_ = copy.deepcopy(tpn_cfg) + tpn_cfg_['in_channels'] = list(tpn_cfg_['in_channels']) + TPN(**tpn_cfg_) + + with pytest.raises(AssertionError): + tpn_cfg_ = copy.deepcopy(tpn_cfg) + tpn_cfg_['out_channels'] = float(tpn_cfg_['out_channels']) + TPN(**tpn_cfg_) + + with pytest.raises(AssertionError): + tpn_cfg_ = copy.deepcopy(tpn_cfg) + tpn_cfg_['downsample_cfg']['downsample_position'] = 'unsupport' + TPN(**tpn_cfg_) + + for k in tpn_cfg: + if not k.endswith('_cfg'): + continue + tpn_cfg_ = copy.deepcopy(tpn_cfg) + tpn_cfg_[k] = list() + with pytest.raises(AssertionError): + TPN(**tpn_cfg_) + + with pytest.raises(ValueError): + tpn_cfg_ = copy.deepcopy(tpn_cfg) + tpn_cfg_['flow_type'] = 'unsupport' + TPN(**tpn_cfg_) + + target_shape = (32, 1) + target = generate_backbone_demo_inputs(target_shape).long().squeeze() + x0_shape = (32, 1024, 1, 4, 4) + x1_shape = (32, 2048, 1, 2, 2) + x0 = generate_backbone_demo_inputs(x0_shape) + x1 = generate_backbone_demo_inputs(x1_shape) + x = [x0, x1] + + # ResNetTPN with 'cascade' flow_type + tpn_cfg_ = copy.deepcopy(tpn_cfg) + tpn_cascade = TPN(**tpn_cfg_) + feat, loss_aux = tpn_cascade(x, target) + assert feat.shape == torch.Size([32, 2048, 1, 2, 2]) + assert len(loss_aux) == 1 + + # ResNetTPN with 'parallel'
flow_type + tpn_cfg_ = copy.deepcopy(tpn_cfg) + tpn_parallel = TPN(flow_type='parallel', **tpn_cfg_) + feat, loss_aux = tpn_parallel(x, target) + assert feat.shape == torch.Size([32, 2048, 1, 2, 2]) + assert len(loss_aux) == 1 + + # ResNetTPN with 'cascade' flow_type and target is None + feat, loss_aux = tpn_cascade(x, None) + assert feat.shape == torch.Size([32, 2048, 1, 2, 2]) + assert len(loss_aux) == 0 + + # ResNetTPN with 'parallel' flow_type and target is None + feat, loss_aux = tpn_parallel(x, None) + assert feat.shape == torch.Size([32, 2048, 1, 2, 2]) + assert len(loss_aux) == 0 diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_audio_recognizer.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_audio_recognizer.py new file mode 100644 index 0000000000000000000000000000000000000000..b2d0b2ef04ed0fe0f21dbaa4e8f39efbb2e921e2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_audio_recognizer.py @@ -0,0 +1,29 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch + +from mmaction.models import build_recognizer +from ..base import generate_recognizer_demo_inputs, get_audio_recognizer_cfg + + +def test_audio_recognizer(): + config = get_audio_recognizer_cfg( + 'resnet/tsn_r18_64x1x1_100e_kinetics400_audio_feature.py') + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 1, 128, 80) + demo_inputs = generate_recognizer_demo_inputs( + input_shape, model_type='audio') + + audios = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(audios, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + audio_list = [audio[None, :] for audio in audios] + for one_spectro in audio_list: + recognizer(one_spectro, None, return_loss=False) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_recognizer2d.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_recognizer2d.py new file mode 100644 index 0000000000000000000000000000000000000000..21c3a725d7d8cf0c51598bfb0250eb7ec736a9ff --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_recognizer2d.py @@ -0,0 +1,282 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch + +from mmaction.models import build_recognizer +from ..base import generate_recognizer_demo_inputs, get_recognizer_cfg + + +def test_tsn(): + config = get_recognizer_cfg('tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + # test forward dummy + recognizer.forward_dummy(imgs, softmax=False) + res = recognizer.forward_dummy(imgs, softmax=True)[0] + assert torch.min(res) >= 0 + assert torch.max(res) <= 1 + + mmcls_backbone = dict( + type='mmcls.ResNeXt', + depth=101, + num_stages=4, + out_indices=(3, ), + groups=32, + width_per_group=4, + style='pytorch') + config.model['backbone'] = mmcls_backbone + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # test mixup forward + config = get_recognizer_cfg( + 'tsn/tsn_r50_video_mixup_1x1x8_100e_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + recognizer = build_recognizer(config.model) + input_shape = (2, 8, 3, 32, 32) + demo_inputs = 
generate_recognizer_demo_inputs(input_shape) + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # test torchvision backbones + tv_backbone = dict(type='torchvision.densenet161', pretrained=True) + config.model['backbone'] = tv_backbone + config.model['cls_head']['in_channels'] = 2208 + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # test timm backbones + timm_backbone = dict(type='timm.efficientnet_b0', pretrained=False) + config.model['backbone'] = timm_backbone + config.model['cls_head']['in_channels'] = 1280 + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + +def test_tsm(): + config = get_recognizer_cfg('tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 8, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in 
imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # test twice sample + 3 crops + input_shape = (2, 48, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + imgs = demo_inputs['imgs'] + + config.model.test_cfg = dict(average_clips='prob') + recognizer = build_recognizer(config.model) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + +def test_trn(): + config = get_recognizer_cfg('trn/trn_r50_1x1x8_50e_sthv1_rgb.py') + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 8, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # test twice sample + 3 crops + input_shape = (2, 48, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + imgs = demo_inputs['imgs'] + + config.model.test_cfg = dict(average_clips='prob') + recognizer = build_recognizer(config.model) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + +def test_tpn(): + config = get_recognizer_cfg('tpn/tpn_tsm_r50_1x1x8_150e_sthv1_rgb.py') + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 8, 3, 224, 224) 
+ demo_inputs = generate_recognizer_demo_inputs(input_shape) + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + assert 'loss_aux' in losses and 'loss_cls' in losses + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + # Test forward dummy + with torch.no_grad(): + _recognizer = build_recognizer(config.model) + img_list = [img[None, :] for img in imgs] + if hasattr(_recognizer, 'forward_dummy'): + _recognizer.forward = _recognizer.forward_dummy + for one_img in img_list: + _recognizer(one_img) + + +def test_tanet(): + config = get_recognizer_cfg( + 'tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 8, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # test twice sample + 3 crops + input_shape = (2, 48, 3, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape) + imgs = demo_inputs['imgs'] + + config.model.test_cfg = dict(average_clips='prob') + recognizer = build_recognizer(config.model) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) 
diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_recognizer3d.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_recognizer3d.py new file mode 100644 index 0000000000000000000000000000000000000000..f3bf5d62e758ccbde367bb77dbffd8a66937923a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_recognizer3d.py @@ -0,0 +1,314 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +from mmaction.models import build_recognizer +from ..base import generate_recognizer_demo_inputs, get_recognizer_cfg + + +def test_i3d(): + config = get_recognizer_cfg('i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py') + config.model['backbone']['pretrained2d'] = False + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 8, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape, '3D') + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + recognizer = recognizer.cuda() + imgs = imgs.cuda() + gt_labels = gt_labels.cuda() + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + # Test forward dummy + recognizer.forward_dummy(imgs, softmax=False) + res = recognizer.forward_dummy(imgs, softmax=True)[0] + assert torch.min(res) >= 0 + assert torch.max(res) <= 1 + + else: + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + 
recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + # Test forward dummy + recognizer.forward_dummy(imgs, softmax=False) + res = recognizer.forward_dummy(imgs, softmax=True)[0] + assert torch.min(res) >= 0 + assert torch.max(res) <= 1 + + +def test_r2plus1d(): + config = get_recognizer_cfg( + 'r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py') + config.model['backbone']['pretrained2d'] = False + config.model['backbone']['pretrained'] = None + config.model['backbone']['norm_cfg'] = dict(type='BN3d') + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 8, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape, '3D') + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + recognizer = recognizer.cuda() + imgs = imgs.cuda() + gt_labels = gt_labels.cuda() + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + else: + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + +def test_slowfast(): + config = get_recognizer_cfg( + 'slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py') + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 16, 32, 32) + 
demo_inputs = generate_recognizer_demo_inputs(input_shape, '3D') + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + recognizer = recognizer.cuda() + imgs = imgs.cuda() + gt_labels = gt_labels.cuda() + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + else: + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + # Test the feature max_testing_views + config.model.test_cfg['max_testing_views'] = 1 + recognizer = build_recognizer(config.model) + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + +def test_csn(): + config = get_recognizer_cfg( + 'csn/ircsn_ig65m_pretrained_r152_32x2x1_58e_kinetics400_rgb.py') + config.model['backbone']['pretrained2d'] = False + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 8, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape, '3D') + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + # parrots 3dconv is only implemented on gpu + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + recognizer = recognizer.cuda() + imgs = imgs.cuda() + gt_labels = gt_labels.cuda() + losses 
= recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + else: + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + +def test_tpn(): + config = get_recognizer_cfg( + 'tpn/tpn_slowonly_r50_8x8x1_150e_kinetics_rgb.py') + config.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config.model) + + input_shape = (1, 8, 3, 1, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape, '3D') + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + # Test dummy forward + with torch.no_grad(): + _recognizer = build_recognizer(config.model) + img_list = [img[None, :] for img in imgs] + if hasattr(_recognizer, 'forward_dummy'): + _recognizer.forward = _recognizer.forward_dummy + for one_img in img_list: + _recognizer(one_img) + + +def test_timesformer(): + config = get_recognizer_cfg( + 'timesformer/timesformer_divST_8x32x1_15e_kinetics400_rgb.py') + config.model['backbone']['pretrained'] = None + config.model['backbone']['img_size'] = 32 + + recognizer = 
build_recognizer(config.model) + + input_shape = (1, 3, 3, 8, 32, 32) + demo_inputs = generate_recognizer_demo_inputs(input_shape, '3D') + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) + + +def test_c3d(): + config = get_recognizer_cfg('c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py') + config.model['backbone']['pretrained'] = None + config.model['backbone']['out_dim'] = 512 + + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 3, 16, 28, 28) + demo_inputs = generate_recognizer_demo_inputs(input_shape, '3D') + + imgs = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(imgs, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + img_list = [img[None, :] for img in imgs] + for one_img in img_list: + recognizer(one_img, None, return_loss=False) + + # Test forward gradcam + recognizer(imgs, gradcam=True) + for one_img in img_list: + recognizer(one_img, gradcam=True) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_skeletongcn.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_skeletongcn.py new file mode 100644 index 0000000000000000000000000000000000000000..063a090214d43e390ef8a81618a583a02069bdc5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_recognizers/test_skeletongcn.py @@ -0,0 +1,51 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch + +from mmaction.models import build_recognizer +from ..base import generate_recognizer_demo_inputs, get_skeletongcn_cfg + + +def test_skeletongcn(): + config = get_skeletongcn_cfg('stgcn/stgcn_80e_ntu60_xsub_keypoint.py') + with pytest.raises(TypeError): + # "pretrained" must be a str or None + config.model['backbone']['pretrained'] = ['None'] + recognizer = build_recognizer(config.model) + + config.model['backbone']['pretrained'] = None + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 300, 17, 2) + demo_inputs = generate_recognizer_demo_inputs(input_shape, 'skeleton') + + skeletons = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(skeletons, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + skeleton_list = [skeleton[None, :] for skeleton in skeletons] + for one_skeleton in skeleton_list: + recognizer(one_skeleton, None, return_loss=False) + + # test stgcn without edge importance weighting + config.model['backbone']['edge_importance_weighting'] = False + recognizer = build_recognizer(config.model) + + input_shape = (1, 3, 300, 17, 2) + demo_inputs = generate_recognizer_demo_inputs(input_shape, 'skeleton') + + skeletons = demo_inputs['imgs'] + gt_labels = demo_inputs['gt_labels'] + + losses = recognizer(skeletons, gt_labels) + assert isinstance(losses, dict) + + # Test forward test + with torch.no_grad(): + skeleton_list = [skeleton[None, :] for skeleton in skeletons] + for one_skeleton in skeleton_list: + recognizer(one_skeleton, None, return_loss=False) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_roi_extractor.py b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_roi_extractor.py new file mode 100644 index 0000000000000000000000000000000000000000..6448019845329f91e5033185fdb3235d8097e3a9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_models/test_roi_extractor.py @@ -0,0 +1,58 @@ +# 
Copyright (c) OpenMMLab. All rights reserved. +import torch + +from mmaction.models import SingleRoIExtractor3D + + +def test_single_roi_extractor3d(): + roi_extractor = SingleRoIExtractor3D( + roi_layer_type='RoIAlign', + featmap_stride=16, + output_size=8, + sampling_ratio=0, + pool_mode='avg', + aligned=True, + with_temporal_pool=True) + feat = torch.randn([4, 64, 8, 16, 16]) + rois = torch.tensor([[0., 1., 1., 6., 6.], [1., 2., 2., 7., 7.], + [3., 2., 2., 9., 9.], [2., 2., 0., 10., 9.]]) + roi_feat, feat = roi_extractor(feat, rois) + assert roi_feat.shape == (4, 64, 1, 8, 8) + assert feat.shape == (4, 64, 1, 16, 16) + + feat = (torch.randn([4, 64, 8, 16, 16]), torch.randn([4, 32, 16, 16, 16])) + roi_feat, feat = roi_extractor(feat, rois) + assert roi_feat.shape == (4, 96, 1, 8, 8) + assert feat.shape == (4, 96, 1, 16, 16) + + feat = torch.randn([4, 64, 8, 16, 16]) + roi_extractor = SingleRoIExtractor3D( + roi_layer_type='RoIAlign', + featmap_stride=16, + output_size=8, + sampling_ratio=0, + pool_mode='avg', + aligned=True, + with_temporal_pool=False) + roi_feat, feat = roi_extractor(feat, rois) + assert roi_feat.shape == (4, 64, 8, 8, 8) + assert feat.shape == (4, 64, 8, 16, 16) + + feat = (torch.randn([4, 64, 8, 16, 16]), torch.randn([4, 32, 16, 16, 16])) + roi_feat, feat = roi_extractor(feat, rois) + assert roi_feat.shape == (4, 96, 16, 8, 8) + assert feat.shape == (4, 96, 16, 16, 16) + + feat = torch.randn([4, 64, 8, 16, 16]) + roi_extractor = SingleRoIExtractor3D( + roi_layer_type='RoIAlign', + featmap_stride=16, + output_size=8, + sampling_ratio=0, + pool_mode='avg', + aligned=True, + with_temporal_pool=True, + with_global=True) + roi_feat, feat = roi_extractor(feat, rois) + assert roi_feat.shape == (4, 128, 1, 8, 8) + assert feat.shape == (4, 64, 1, 16, 16) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_apis_test.py b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_apis_test.py new file mode 100644 index 
0000000000000000000000000000000000000000..c3b853d3e72e190d2116cf11c7a124c1b9e0989a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_apis_test.py @@ -0,0 +1,119 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import sys +import warnings +from unittest.mock import MagicMock, Mock, patch + +import pytest +import torch +import torch.nn as nn +from torch.utils.data import DataLoader, Dataset + +# TODO import test functions from mmcv and delete them from mmaction2 +try: + from mmcv.engine import (collect_results_cpu, multi_gpu_test, + single_gpu_test) + pytest.skip( + 'Test functions are supported in MMCV', allow_module_level=True) +except (ImportError, ModuleNotFoundError): + warnings.warn( + 'DeprecationWarning: single_gpu_test, multi_gpu_test, ' + 'collect_results_cpu, collect_results_gpu from mmaction2 will be ' + 'deprecated. Please install mmcv through master branch.') + from mmaction.apis.test import (collect_results_cpu, multi_gpu_test, + single_gpu_test) + + +class OldStyleModel(nn.Module): + + def __init__(self): + super().__init__() + self.conv = nn.Conv2d(3, 3, 1) + self.cnt = 0 + + def forward(self, *args, **kwargs): + result = [self.cnt] + self.cnt += 1 + return result + + +class Model(OldStyleModel): + + def train_step(self): + pass + + def val_step(self): + pass + + +class ExampleDataset(Dataset): + + def __init__(self): + self.index = 0 + self.eval_result = [1, 4, 3, 7, 2, -3, 4, 6] + + def __getitem__(self, idx): + results = dict(imgs=torch.tensor([1])) + return results + + def __len__(self): + return len(self.eval_result) + + +def test_single_gpu_test(): + test_dataset = ExampleDataset() + loader = DataLoader(test_dataset, batch_size=1) + model = Model() + + results = single_gpu_test(model, loader) + assert results == list(range(8)) + + +def mock_tensor_without_cuda(*args, **kwargs): + if 'device' not in kwargs: + return torch.Tensor(*args) + return torch.IntTensor(*args, device='cpu') + + 
+@patch('mmaction.apis.test.collect_results_gpu', + Mock(return_value=list(range(8)))) +@patch('mmaction.apis.test.collect_results_cpu', + Mock(return_value=list(range(8)))) +def test_multi_gpu_test(): + test_dataset = ExampleDataset() + loader = DataLoader(test_dataset, batch_size=1) + model = Model() + + results = multi_gpu_test(model, loader) + assert results == list(range(8)) + + results = multi_gpu_test(model, loader, gpu_collect=False) + assert results == list(range(8)) + + +@patch('mmcv.runner.get_dist_info', Mock(return_value=(0, 1))) +@patch('torch.distributed.broadcast', MagicMock) +@patch('torch.distributed.barrier', Mock) +@pytest.mark.skipif( + sys.version_info[:2] == (3, 8), reason='Not for python 3.8') +def test_collect_results_cpu(): + + def content_for_unittest(): + results_part = list(range(8)) + size = 8 + + results = collect_results_cpu(results_part, size) + assert results == list(range(8)) + + results = collect_results_cpu(results_part, size, 'unittest') + assert results == list(range(8)) + + if not torch.cuda.is_available(): + with patch( + 'torch.full', + Mock( + return_value=torch.full( + (512, ), 32, dtype=torch.uint8, device='cpu'))): + with patch('torch.tensor', mock_tensor_without_cuda): + content_for_unittest() + else: + content_for_unittest() diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_config.py b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_config.py new file mode 100644 index 0000000000000000000000000000000000000000..21c7cb43b753aa04203ab1d2bf5a96bf11bfbabf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_config.py @@ -0,0 +1,74 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import glob +import os +import os.path as osp + +import mmcv +import torch.nn as nn + +from mmaction.models import build_localizer, build_recognizer + + +def _get_config_path(): + """Find the predefined recognizer config path.""" + repo_dir = osp.dirname(osp.dirname(osp.dirname(__file__))) + config_dpath = osp.join(repo_dir, 'configs') + if not osp.exists(config_dpath): + raise Exception('Cannot find config path') + config_fpaths = list(glob.glob(osp.join(config_dpath, '*.py'))) + config_names = [os.path.relpath(p, config_dpath) for p in config_fpaths] + print(f'Using {len(config_names)} config files') + config_fpaths = [ + osp.join(config_dpath, config_fpath) for config_fpath in config_fpaths + ] + return config_fpaths + + +def test_config_build_recognizer(): + """Test that all mmaction models defined in the configs can be + initialized.""" + repo_dir = osp.dirname(osp.dirname(osp.dirname(__file__))) + config_dpath = osp.join(repo_dir, 'configs/recognition') + if not osp.exists(config_dpath): + raise Exception('Cannot find config path') + config_fpaths = list(glob.glob(osp.join(config_dpath, '*.py'))) + # test all config file in `configs` directory + for config_fpath in config_fpaths: + config_mod = mmcv.Config.fromfile(config_fpath) + print(f'Building recognizer, config_fpath = {config_fpath!r}') + + # Remove pretrained keys to allow for testing in an offline environment + if 'pretrained' in config_mod.model['backbone']: + config_mod.model['backbone']['pretrained'] = None + + recognizer = build_recognizer(config_mod.model) + assert isinstance(recognizer, nn.Module) + + +def _get_config_path_for_localizer(): + """Find the predefined localizer config path for localizer.""" + repo_dir = osp.dirname(osp.dirname(osp.dirname(__file__))) + config_dpath = osp.join(repo_dir, 'configs/localization') + if not osp.exists(config_dpath): + raise Exception('Cannot find config path') + config_fpaths = list(glob.glob(osp.join(config_dpath, '*.py'))) + config_names = 
[os.path.relpath(p, config_dpath) for p in config_fpaths] + print(f'Using {len(config_names)} config files') + config_fpaths = [ + osp.join(config_dpath, config_fpath) for config_fpath in config_fpaths + ] + return config_fpaths + + +def test_config_build_localizer(): + """Test that all mmaction models defined in the configs can be + initialized.""" + config_fpaths = _get_config_path_for_localizer() + + # test all config file in `configs/localization` directory + for config_fpath in config_fpaths: + config_mod = mmcv.Config.fromfile(config_fpath) + print(f'Building localizer, config_fpath = {config_fpath!r}') + if config_mod.get('model', None): + localizer = build_localizer(config_mod.model) + assert isinstance(localizer, nn.Module) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_eval_hook.py b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_eval_hook.py new file mode 100644 index 0000000000000000000000000000000000000000..8d601f247af47835e19568a7911cc709f45a1e82 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_eval_hook.py @@ -0,0 +1,347 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp +import shutil +import tempfile +import unittest.mock as mock +import warnings +from collections import OrderedDict +from unittest.mock import MagicMock, patch + +import pytest +import torch +import torch.nn as nn +from mmcv.runner import EpochBasedRunner, IterBasedRunner +from mmcv.utils import get_logger +from torch.utils.data import DataLoader, Dataset + +# TODO import eval hooks from mmcv and delete them from mmaction2 +try: + from mmcv.runner import DistEvalHook, EvalHook + pytest.skip( + 'EvalHook and DistEvalHook are supported in MMCV', + allow_module_level=True) +except ImportError: + warnings.warn('DeprecationWarning: EvalHook and DistEvalHook from ' + 'mmaction2 will be deprecated. 
Please install mmcv through ' + 'master branch.') + from mmaction.core import DistEvalHook, EvalHook + + +class ExampleDataset(Dataset): + + def __init__(self): + self.index = 0 + self.eval_result = [1, 4, 3, 7, 2, -3, 4, 6] + + def __getitem__(self, idx): + results = dict(x=torch.tensor([1])) + return results + + def __len__(self): + return 1 + + @mock.create_autospec + def evaluate(self, results, logger=None): + pass + + +class EvalDataset(ExampleDataset): + + def evaluate(self, results, logger=None): + acc = self.eval_result[self.index] + output = OrderedDict(acc=acc, index=self.index, score=acc) + self.index += 1 + return output + + +class Model(nn.Module): + + def __init__(self): + super().__init__() + self.linear = nn.Linear(2, 1) + + @staticmethod + def forward(x, **kwargs): + return x + + @staticmethod + def train_step(data_batch, optimizer, **kwargs): + if not isinstance(data_batch, dict): + data_batch = dict(x=data_batch) + return data_batch + + def val_step(self, x, optimizer, **kwargs): + return dict(loss=self(x)) + + +def _build_epoch_runner(): + + model = Model() + tmp_dir = tempfile.mkdtemp() + + runner = EpochBasedRunner( + model=model, work_dir=tmp_dir, logger=get_logger('demo')) + return runner + + +def _build_iter_runner(): + + model = Model() + tmp_dir = tempfile.mkdtemp() + + runner = IterBasedRunner( + model=model, work_dir=tmp_dir, logger=get_logger('demo')) + return runner + + +def test_eval_hook(): + with pytest.raises(AssertionError): + # `save_best` should be a str + test_dataset = Model() + data_loader = DataLoader(test_dataset) + EvalHook(data_loader, save_best=True) + + with pytest.raises(TypeError): + # dataloader must be a pytorch DataLoader + test_dataset = Model() + data_loader = [DataLoader(test_dataset)] + EvalHook(data_loader) + + with pytest.raises(ValueError): + # save_best must be valid when rule_map is None + test_dataset = ExampleDataset() + data_loader = DataLoader(test_dataset) + EvalHook(data_loader, 
save_best='unsupport') + + with pytest.raises(KeyError): + # rule must be in keys of rule_map + test_dataset = Model() + data_loader = DataLoader(test_dataset) + EvalHook(data_loader, save_best='auto', rule='unsupport') + + test_dataset = ExampleDataset() + loader = DataLoader(test_dataset) + model = Model() + data_loader = DataLoader(test_dataset) + eval_hook = EvalHook(data_loader, save_best=None) + + with tempfile.TemporaryDirectory() as tmpdir: + + # total_epochs = 1 + logger = get_logger('test_eval') + runner = EpochBasedRunner(model=model, work_dir=tmpdir, logger=logger) + runner.register_hook(eval_hook) + runner.run([loader], [('train', 1)], 1) + test_dataset.evaluate.assert_called_with( + test_dataset, [torch.tensor([1])], logger=runner.logger) + assert runner.meta is None or 'best_score' not in runner.meta[ + 'hook_msgs'] + assert runner.meta is None or 'best_ckpt' not in runner.meta[ + 'hook_msgs'] + + # when `save_best` is set to 'auto', first metric will be used. + loader = DataLoader(EvalDataset()) + model = Model() + data_loader = DataLoader(EvalDataset()) + eval_hook = EvalHook(data_loader, interval=1, save_best='auto') + + with tempfile.TemporaryDirectory() as tmpdir: + logger = get_logger('test_eval') + runner = EpochBasedRunner(model=model, work_dir=tmpdir, logger=logger) + runner.register_checkpoint_hook(dict(interval=1)) + runner.register_hook(eval_hook) + runner.run([loader], [('train', 1)], 8) + + ckpt_path = osp.join(tmpdir, 'best_acc_epoch_4.pth') + + assert runner.meta['hook_msgs']['best_ckpt'] == osp.realpath(ckpt_path) + assert osp.exists(ckpt_path) + assert runner.meta['hook_msgs']['best_score'] == 7 + + # total_epochs = 8, return the best acc and corresponding epoch + loader = DataLoader(EvalDataset()) + model = Model() + data_loader = DataLoader(EvalDataset()) + eval_hook = EvalHook(data_loader, interval=1, save_best='acc') + + with tempfile.TemporaryDirectory() as tmpdir: + logger = get_logger('test_eval') + runner = 
EpochBasedRunner(model=model, work_dir=tmpdir, logger=logger) + runner.register_checkpoint_hook(dict(interval=1)) + runner.register_hook(eval_hook) + runner.run([loader], [('train', 1)], 8) + + ckpt_path = osp.join(tmpdir, 'best_acc_epoch_4.pth') + + assert runner.meta['hook_msgs']['best_ckpt'] == osp.realpath(ckpt_path) + assert osp.exists(ckpt_path) + assert runner.meta['hook_msgs']['best_score'] == 7 + + # total_epochs = 8, return the best score and corresponding epoch + data_loader = DataLoader(EvalDataset()) + eval_hook = EvalHook( + data_loader, interval=1, save_best='score', rule='greater') + with tempfile.TemporaryDirectory() as tmpdir: + logger = get_logger('test_eval') + runner = EpochBasedRunner(model=model, work_dir=tmpdir, logger=logger) + runner.register_checkpoint_hook(dict(interval=1)) + runner.register_hook(eval_hook) + runner.run([loader], [('train', 1)], 8) + + ckpt_path = osp.join(tmpdir, 'best_score_epoch_4.pth') + + assert runner.meta['hook_msgs']['best_ckpt'] == osp.realpath(ckpt_path) + assert osp.exists(ckpt_path) + assert runner.meta['hook_msgs']['best_score'] == 7 + + # total_epochs = 8, return the best score using less compare func + # and indicate corresponding epoch + data_loader = DataLoader(EvalDataset()) + eval_hook = EvalHook(data_loader, save_best='acc', rule='less') + with tempfile.TemporaryDirectory() as tmpdir: + logger = get_logger('test_eval') + runner = EpochBasedRunner(model=model, work_dir=tmpdir, logger=logger) + runner.register_checkpoint_hook(dict(interval=1)) + runner.register_hook(eval_hook) + runner.run([loader], [('train', 1)], 8) + + ckpt_path = osp.join(tmpdir, 'best_acc_epoch_6.pth') + + assert runner.meta['hook_msgs']['best_ckpt'] == osp.realpath(ckpt_path) + assert osp.exists(ckpt_path) + assert runner.meta['hook_msgs']['best_score'] == -3 + + # Test the EvalHook when resume happened + data_loader = DataLoader(EvalDataset()) + eval_hook = EvalHook(data_loader, save_best='acc') + with 
tempfile.TemporaryDirectory() as tmpdir: + logger = get_logger('test_eval') + runner = EpochBasedRunner(model=model, work_dir=tmpdir, logger=logger) + runner.register_checkpoint_hook(dict(interval=1)) + runner.register_hook(eval_hook) + runner.run([loader], [('train', 1)], 2) + + ckpt_path = osp.join(tmpdir, 'best_acc_epoch_2.pth') + + assert runner.meta['hook_msgs']['best_ckpt'] == osp.realpath(ckpt_path) + assert osp.exists(ckpt_path) + assert runner.meta['hook_msgs']['best_score'] == 4 + + resume_from = osp.join(tmpdir, 'latest.pth') + loader = DataLoader(ExampleDataset()) + eval_hook = EvalHook(data_loader, save_best='acc') + runner = EpochBasedRunner(model=model, work_dir=tmpdir, logger=logger) + runner.register_checkpoint_hook(dict(interval=1)) + runner.register_hook(eval_hook) + runner.resume(resume_from) + runner.run([loader], [('train', 1)], 8) + + ckpt_path = osp.join(tmpdir, 'best_acc_epoch_4.pth') + + assert runner.meta['hook_msgs']['best_ckpt'] == osp.realpath(ckpt_path) + assert osp.exists(ckpt_path) + assert runner.meta['hook_msgs']['best_score'] == 7 + + +@patch('mmaction.apis.single_gpu_test', MagicMock) +@patch('mmaction.apis.multi_gpu_test', MagicMock) +@pytest.mark.parametrize('EvalHookParam', [EvalHook, DistEvalHook]) +@pytest.mark.parametrize('_build_demo_runner,by_epoch', + [(_build_epoch_runner, True), + (_build_iter_runner, False)]) +def test_start_param(EvalHookParam, _build_demo_runner, by_epoch): + # create dummy data + dataloader = DataLoader(torch.ones((5, 2))) + + # 0.1. dataloader is not a DataLoader object + with pytest.raises(TypeError): + EvalHookParam(dataloader=MagicMock(), interval=-1) + + # 0.2. negative interval + with pytest.raises(ValueError): + EvalHookParam(dataloader, interval=-1) + + # 1. start=None, interval=1: perform evaluation after each epoch. 
+ runner = _build_demo_runner() + evalhook = EvalHookParam( + dataloader, interval=1, by_epoch=by_epoch, save_best=None) + evalhook.evaluate = MagicMock() + runner.register_hook(evalhook) + runner.run([dataloader], [('train', 1)], 2) + assert evalhook.evaluate.call_count == 2 # after epoch 1 & 2 + + # 2. start=1, interval=1: perform evaluation after each epoch. + runner = _build_demo_runner() + evalhook = EvalHookParam( + dataloader, start=1, interval=1, by_epoch=by_epoch, save_best=None) + evalhook.evaluate = MagicMock() + runner.register_hook(evalhook) + runner.run([dataloader], [('train', 1)], 2) + assert evalhook.evaluate.call_count == 2 # after epoch 1 & 2 + + # 3. start=None, interval=2: perform evaluation after epoch 2, 4, 6, etc + runner = _build_demo_runner() + evalhook = EvalHookParam( + dataloader, interval=2, by_epoch=by_epoch, save_best=None) + evalhook.evaluate = MagicMock() + runner.register_hook(evalhook) + runner.run([dataloader], [('train', 1)], 2) + assert evalhook.evaluate.call_count == 1 # after epoch 2 + + # 4. start=1, interval=2: perform evaluation after epoch 1, 3, 5, etc + runner = _build_demo_runner() + evalhook = EvalHookParam( + dataloader, start=1, interval=2, by_epoch=by_epoch, save_best=None) + evalhook.evaluate = MagicMock() + runner.register_hook(evalhook) + runner.run([dataloader], [('train', 1)], 3) + assert evalhook.evaluate.call_count == 2 # after epoch 1 & 3 + + # 5. start=0/negative, interval=1: perform evaluation after each epoch and + # before epoch 1. 
+ runner = _build_demo_runner() + evalhook = EvalHookParam( + dataloader, start=0, by_epoch=by_epoch, save_best=None) + evalhook.evaluate = MagicMock() + runner.register_hook(evalhook) + runner.run([dataloader], [('train', 1)], 2) + assert evalhook.evaluate.call_count == 3 # before epoch1 and after e1 & e2 + + runner = _build_demo_runner() + with pytest.warns(UserWarning): + evalhook = EvalHookParam( + dataloader, start=-2, by_epoch=by_epoch, save_best=None) + evalhook.evaluate = MagicMock() + runner.register_hook(evalhook) + runner.run([dataloader], [('train', 1)], 2) + assert evalhook.evaluate.call_count == 3 # before epoch1 and after e1 & e2 + + # 6. resuming from epoch i, start = x (x<=i), interval =1: perform + # evaluation after each epoch and before the first epoch. + runner = _build_demo_runner() + evalhook = EvalHookParam( + dataloader, start=1, by_epoch=by_epoch, save_best=None) + evalhook.evaluate = MagicMock() + runner.register_hook(evalhook) + if by_epoch: + runner._epoch = 2 + else: + runner._iter = 2 + runner.run([dataloader], [('train', 1)], 3) + assert evalhook.evaluate.call_count == 2 # before & after epoch 3 + + # 7. resuming from epoch i, start = i+1/None, interval =1: perform + # evaluation after each epoch. 
+ runner = _build_demo_runner() + evalhook = EvalHookParam( + dataloader, start=2, by_epoch=by_epoch, save_best=None) + evalhook.evaluate = MagicMock() + runner.register_hook(evalhook) + if by_epoch: + runner._epoch = 1 + else: + runner._iter = 1 + runner.run([dataloader], [('train', 1)], 3) + assert evalhook.evaluate.call_count == 2 # after epoch 2 & 3 + + shutil.rmtree(runner.work_dir) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_inference.py b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_inference.py new file mode 100644 index 0000000000000000000000000000000000000000..f1f6a7b5cea9231f269bb6cea531133c6fba5064 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_inference.py @@ -0,0 +1,149 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import mmcv +import numpy as np +import pytest +import torch +import torch.nn as nn + +from mmaction.apis import inference_recognizer, init_recognizer + +video_config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py' # noqa: E501 +frame_config_file = 'configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py' # noqa: E501 +flow_frame_config_file = 'configs/recognition/tsn/tsn_r50_320p_1x1x3_110e_kinetics400_flow.py' # noqa: E501 +video_path = 'demo/demo.mp4' +frames_path = 'tests/data/imgs' + + +def test_init_recognizer(): + with pytest.raises(TypeError): + # config must be a filename or Config object + init_recognizer(dict(config_file=None)) + + if torch.cuda.is_available(): + device = 'cuda:0' + else: + device = 'cpu' + + model = init_recognizer(video_config_file, None, device) + + config = mmcv.Config.fromfile(video_config_file) + config.model.backbone.pretrained = None + + assert isinstance(model, nn.Module) + if torch.cuda.is_available(): + assert next(model.parameters()).is_cuda is True + else: + assert next(model.parameters()).is_cuda is False + assert model.cfg.model.backbone.pretrained is None + + +def
test_video_inference_recognizer(): + if torch.cuda.is_available(): + device = 'cuda:0' + else: + device = 'cpu' + model = init_recognizer(video_config_file, None, device) + + with pytest.raises(RuntimeError): + # video path doesn't exist + inference_recognizer(model, 'missing.mp4') + + for ops in model.cfg.data.test.pipeline: + if ops['type'] in ('TenCrop', 'ThreeCrop'): + # Use CenterCrop to reduce memory in order to pass CI + ops['type'] = 'CenterCrop' + + top5_label = inference_recognizer(model, video_path) + scores = [item[1] for item in top5_label] + assert len(top5_label) == 5 + assert scores == sorted(scores, reverse=True) + + _, feat = inference_recognizer( + model, video_path, outputs=('backbone', 'cls_head'), as_tensor=False) + assert isinstance(feat, dict) + assert 'backbone' in feat and 'cls_head' in feat + assert isinstance(feat['backbone'], np.ndarray) + assert isinstance(feat['cls_head'], np.ndarray) + assert feat['backbone'].shape == (25, 2048, 7, 7) + assert feat['cls_head'].shape == (1, 400) + + _, feat = inference_recognizer( + model, + video_path, + outputs=('backbone.layer3', 'backbone.layer3.1.conv1')) + assert 'backbone.layer3.1.conv1' in feat and 'backbone.layer3' in feat + assert isinstance(feat['backbone.layer3.1.conv1'], torch.Tensor) + assert isinstance(feat['backbone.layer3'], torch.Tensor) + assert feat['backbone.layer3'].size() == (25, 1024, 14, 14) + assert feat['backbone.layer3.1.conv1'].size() == (25, 256, 14, 14) + + cfg_file = 'configs/recognition/slowfast/slowfast_r50_video_inference_4x16x1_256e_kinetics400_rgb.py' # noqa: E501 + sf_model = init_recognizer(cfg_file, None, device) + for ops in sf_model.cfg.data.test.pipeline: + # Changes to reduce memory in order to pass CI + if ops['type'] in ('TenCrop', 'ThreeCrop'): + ops['type'] = 'CenterCrop' + if ops['type'] == 'SampleFrames': + ops['num_clips'] = 1 + _, feat = inference_recognizer( + sf_model, video_path, outputs=('backbone', 'cls_head')) + assert isinstance(feat, dict) 
and isinstance(feat['backbone'], tuple) + assert 'backbone' in feat and 'cls_head' in feat + assert len(feat['backbone']) == 2 + assert isinstance(feat['backbone'][0], torch.Tensor) + assert isinstance(feat['backbone'][1], torch.Tensor) + assert feat['backbone'][0].size() == (1, 2048, 4, 8, 8) + assert feat['backbone'][1].size() == (1, 256, 32, 8, 8) + assert feat['cls_head'].size() == (1, 400) + + +def test_frames_inference_recognizer(): + if torch.cuda.is_available(): + device = 'cuda:0' + else: + device = 'cpu' + rgb_model = init_recognizer(frame_config_file, None, device) + flow_model = init_recognizer(flow_frame_config_file, None, device) + + with pytest.raises(RuntimeError): + # video path doesn't exist + inference_recognizer(rgb_model, 'missing_path') + + for ops in rgb_model.cfg.data.test.pipeline: + if ops['type'] in ('TenCrop', 'ThreeCrop'): + # Use CenterCrop to reduce memory in order to pass CI + ops['type'] = 'CenterCrop' + ops['crop_size'] = 224 + for ops in flow_model.cfg.data.test.pipeline: + if ops['type'] in ('TenCrop', 'ThreeCrop'): + # Use CenterCrop to reduce memory in order to pass CI + ops['type'] = 'CenterCrop' + ops['crop_size'] = 224 + + top5_label = inference_recognizer(rgb_model, frames_path) + scores = [item[1] for item in top5_label] + assert len(top5_label) == 5 + assert scores == sorted(scores, reverse=True) + + _, feat = inference_recognizer( + flow_model, + frames_path, + outputs=('backbone', 'cls_head'), + as_tensor=False) + assert isinstance(feat, dict) + assert 'backbone' in feat and 'cls_head' in feat + assert isinstance(feat['backbone'], np.ndarray) + assert isinstance(feat['cls_head'], np.ndarray) + assert feat['backbone'].shape == (25, 2048, 7, 7) + assert feat['cls_head'].shape == (1, 400) + + _, feat = inference_recognizer( + rgb_model, + frames_path, + outputs=('backbone.layer3', 'backbone.layer3.1.conv1')) + + assert 'backbone.layer3.1.conv1' in feat and 'backbone.layer3' in feat + assert 
isinstance(feat['backbone.layer3.1.conv1'], torch.Tensor) + assert isinstance(feat['backbone.layer3'], torch.Tensor) + assert feat['backbone.layer3'].size() == (25, 1024, 14, 14) + assert feat['backbone.layer3.1.conv1'].size() == (25, 256, 14, 14) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_lr.py b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_lr.py new file mode 100644 index 0000000000000000000000000000000000000000..7a530fecdfbf27f3538658ad2d6080c5a7567310 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_lr.py @@ -0,0 +1,121 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import logging +import os.path as osp +import shutil +import sys +import tempfile +from unittest.mock import MagicMock, call + +import torch +import torch.nn as nn +from mmcv.runner import IterTimerHook, PaviLoggerHook, build_runner +from torch.utils.data import DataLoader + + +def test_tin_lr_updater_hook(): + sys.modules['pavi'] = MagicMock() + loader = DataLoader(torch.ones((10, 2))) + runner = _build_demo_runner() + + hook_cfg = dict(type='TINLrUpdaterHook', min_lr=0.1) + runner.register_hook_from_cfg(hook_cfg) + + hook_cfg = dict( + type='TINLrUpdaterHook', + by_epoch=False, + min_lr=0.1, + warmup='exp', + warmup_iters=2, + warmup_ratio=0.9) + runner.register_hook_from_cfg(hook_cfg) + runner.register_hook_from_cfg(dict(type='IterTimerHook')) + runner.register_hook(IterTimerHook()) + + hook_cfg = dict( + type='TINLrUpdaterHook', + by_epoch=False, + min_lr=0.1, + warmup='constant', + warmup_iters=2, + warmup_ratio=0.9) + runner.register_hook_from_cfg(hook_cfg) + runner.register_hook_from_cfg(dict(type='IterTimerHook')) + runner.register_hook(IterTimerHook()) + + hook_cfg = dict( + type='TINLrUpdaterHook', + by_epoch=False, + min_lr=0.1, + warmup='linear', + warmup_iters=2, + warmup_ratio=0.9) + runner.register_hook_from_cfg(hook_cfg) + runner.register_hook_from_cfg(dict(type='IterTimerHook')) + 
runner.register_hook(IterTimerHook()) + # add pavi hook + hook = PaviLoggerHook(interval=1, add_graph=False, add_last_ckpt=True) + runner.register_hook(hook) + runner.run([loader], [('train', 1)]) + shutil.rmtree(runner.work_dir) + + assert hasattr(hook, 'writer') + calls = [ + call('train', { + 'learning_rate': 0.028544155877284292, + 'momentum': 0.95 + }, 1), + call('train', { + 'learning_rate': 0.04469266270539641, + 'momentum': 0.95 + }, 6), + call('train', { + 'learning_rate': 0.09695518130045147, + 'momentum': 0.95 + }, 10) + ] + hook.writer.add_scalars.assert_has_calls(calls, any_order=True) + + +def _build_demo_runner(runner_type='EpochBasedRunner', + max_epochs=1, + max_iters=None): + + class Model(nn.Module): + + def __init__(self): + super().__init__() + self.linear = nn.Linear(2, 1) + + def forward(self, x): + return self.linear(x) + + def train_step(self, x, optimizer, **kwargs): + return dict(loss=self(x)) + + def val_step(self, x, optimizer, **kwargs): + return dict(loss=self(x)) + + model = Model() + + optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.95) + + log_config = dict( + interval=1, hooks=[ + dict(type='TextLoggerHook'), + ]) + + tmp_dir = tempfile.mkdtemp() + tmp_dir = osp.join(tmp_dir, '.test_lr_tmp') + + runner = build_runner( + dict(type=runner_type), + default_args=dict( + model=model, + work_dir=tmp_dir, + optimizer=optimizer, + logger=logging.getLogger(), + max_epochs=max_epochs, + max_iters=max_iters)) + runner.register_checkpoint_hook(dict(interval=1)) + runner.register_logger_hooks(log_config) + return runner diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_optimizer.py b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_optimizer.py new file mode 100644 index 0000000000000000000000000000000000000000..f0c06fe768d5114713d5326fbcbff50e817c10bb --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_optimizer.py @@ -0,0 +1,214 @@ +# Copyright (c) OpenMMLab. 
All rights reserved. +import torch +import torch.nn as nn +from mmcv.runner import build_optimizer_constructor + + +class SubModel(nn.Module): + + def __init__(self): + super().__init__() + self.conv1 = nn.Conv2d(2, 2, kernel_size=1, groups=2) + self.gn = nn.GroupNorm(2, 2) + self.fc = nn.Linear(2, 2) + self.param1 = nn.Parameter(torch.ones(1)) + + def forward(self, x): + return x + + +class ExampleModel(nn.Module): + + def __init__(self): + super().__init__() + self.param1 = nn.Parameter(torch.ones(1)) + self.conv1 = nn.Conv2d(3, 4, kernel_size=1, bias=False) + self.conv2 = nn.Conv2d(4, 2, kernel_size=1) + self.bn = nn.BatchNorm2d(2) + self.sub = SubModel() + self.fc = nn.Linear(2, 1) + + def forward(self, x): + return x + + +class PseudoDataParallel(nn.Module): + + def __init__(self): + super().__init__() + self.module = ExampleModel() + + def forward(self, x): + return x + + +base_lr = 0.01 +base_wd = 0.0001 +momentum = 0.9 + + +def check_optimizer(optimizer, + model, + prefix='', + bias_lr_mult=1, + bias_decay_mult=1, + norm_decay_mult=1, + dwconv_decay_mult=1): + param_groups = optimizer.param_groups + assert isinstance(optimizer, torch.optim.SGD) + assert optimizer.defaults['lr'] == base_lr + assert optimizer.defaults['momentum'] == momentum + assert optimizer.defaults['weight_decay'] == base_wd + model_parameters = list(model.parameters()) + assert len(param_groups) == len(model_parameters) + for i, param in enumerate(model_parameters): + param_group = param_groups[i] + assert torch.equal(param_group['params'][0], param) + assert param_group['momentum'] == momentum + # param1 + param1 = param_groups[0] + assert param1['lr'] == base_lr + assert param1['weight_decay'] == base_wd + # conv1.weight + conv1_weight = param_groups[1] + assert conv1_weight['lr'] == base_lr + assert conv1_weight['weight_decay'] == base_wd + # conv2.weight + conv2_weight = param_groups[2] + assert conv2_weight['lr'] == base_lr + assert conv2_weight['weight_decay'] == base_wd + # 
conv2.bias + conv2_bias = param_groups[3] + assert conv2_bias['lr'] == base_lr * bias_lr_mult + assert conv2_bias['weight_decay'] == base_wd * bias_decay_mult + # bn.weight + bn_weight = param_groups[4] + assert bn_weight['lr'] == base_lr + assert bn_weight['weight_decay'] == base_wd * norm_decay_mult + # bn.bias + bn_bias = param_groups[5] + assert bn_bias['lr'] == base_lr + assert bn_bias['weight_decay'] == base_wd * norm_decay_mult + # sub.param1 + sub_param1 = param_groups[6] + assert sub_param1['lr'] == base_lr + assert sub_param1['weight_decay'] == base_wd + # sub.conv1.weight + sub_conv1_weight = param_groups[7] + assert sub_conv1_weight['lr'] == base_lr + assert sub_conv1_weight['weight_decay'] == base_wd * dwconv_decay_mult + # sub.conv1.bias + sub_conv1_bias = param_groups[8] + assert sub_conv1_bias['lr'] == base_lr * bias_lr_mult + assert sub_conv1_bias['weight_decay'] == base_wd * dwconv_decay_mult + # sub.gn.weight + sub_gn_weight = param_groups[9] + assert sub_gn_weight['lr'] == base_lr + assert sub_gn_weight['weight_decay'] == base_wd * norm_decay_mult + # sub.gn.bias + sub_gn_bias = param_groups[10] + assert sub_gn_bias['lr'] == base_lr + assert sub_gn_bias['weight_decay'] == base_wd * norm_decay_mult + # sub.fc1.weight + sub_fc_weight = param_groups[11] + assert sub_fc_weight['lr'] == base_lr + assert sub_fc_weight['weight_decay'] == base_wd + # sub.fc1.bias + sub_fc_bias = param_groups[12] + assert sub_fc_bias['lr'] == base_lr * bias_lr_mult + assert sub_fc_bias['weight_decay'] == base_wd * bias_decay_mult + # fc1.weight + fc_weight = param_groups[13] + assert fc_weight['lr'] == base_lr + assert fc_weight['weight_decay'] == base_wd + # fc1.bias + fc_bias = param_groups[14] + assert fc_bias['lr'] == base_lr * bias_lr_mult + assert fc_bias['weight_decay'] == base_wd * bias_decay_mult + + +def check_tsm_optimizer(optimizer, model, fc_lr5=True): + param_groups = optimizer.param_groups + assert isinstance(optimizer, torch.optim.SGD) + assert 
optimizer.defaults['lr'] == base_lr + assert optimizer.defaults['momentum'] == momentum + assert optimizer.defaults['weight_decay'] == base_wd + model_parameters = list(model.parameters()) + # first_conv_weight + first_conv_weight = param_groups[0] + assert torch.equal(first_conv_weight['params'][0], model_parameters[1]) + assert first_conv_weight['lr'] == base_lr + assert first_conv_weight['weight_decay'] == base_wd + # first_conv_bias + first_conv_bias = param_groups[1] + assert first_conv_bias['params'] == [] + assert first_conv_bias['lr'] == base_lr * 2 + assert first_conv_bias['weight_decay'] == 0 + # normal_weight + normal_weight = param_groups[2] + assert torch.equal(normal_weight['params'][0], model_parameters[2]) + assert torch.equal(normal_weight['params'][1], model_parameters[7]) + assert normal_weight['lr'] == base_lr + assert normal_weight['weight_decay'] == base_wd + # normal_bias + normal_bias = param_groups[3] + assert torch.equal(normal_bias['params'][0], model_parameters[3]) + assert torch.equal(normal_bias['params'][1], model_parameters[8]) + assert normal_bias['lr'] == base_lr * 2 + assert normal_bias['weight_decay'] == 0 + # bn + bn = param_groups[4] + assert torch.equal(bn['params'][0], model_parameters[4]) + assert torch.equal(bn['params'][1], model_parameters[5]) + assert torch.equal(bn['params'][2], model_parameters[9]) + assert torch.equal(bn['params'][3], model_parameters[10]) + assert bn['lr'] == base_lr + assert bn['weight_decay'] == 0 + # normal linear weight + assert torch.equal(normal_weight['params'][2], model_parameters[11]) + # normal linear bias + assert torch.equal(normal_bias['params'][2], model_parameters[12]) + # fc_lr5 + lr5_weight = param_groups[5] + lr10_bias = param_groups[6] + assert lr5_weight['lr'] == base_lr * 5 + assert lr5_weight['weight_decay'] == base_wd + assert lr10_bias['lr'] == base_lr * 10 + assert lr10_bias['weight_decay'] == 0 + if fc_lr5: + # lr5_weight + assert torch.equal(lr5_weight['params'][0], 
model_parameters[13]) + # lr10_bias + assert torch.equal(lr10_bias['params'][0], model_parameters[14]) + else: + # lr5_weight + assert lr5_weight['params'] == [] + # lr10_bias + assert lr10_bias['params'] == [] + assert torch.equal(normal_weight['params'][3], model_parameters[13]) + assert torch.equal(normal_bias['params'][3], model_parameters[14]) + + +def test_tsm_optimizer_constructor(): + model = ExampleModel() + optimizer_cfg = dict( + type='SGD', lr=base_lr, weight_decay=base_wd, momentum=momentum) + # fc_lr5 is True + paramwise_cfg = dict(fc_lr5=True) + optim_constructor_cfg = dict( + type='TSMOptimizerConstructor', + optimizer_cfg=optimizer_cfg, + paramwise_cfg=paramwise_cfg) + optim_constructor = build_optimizer_constructor(optim_constructor_cfg) + optimizer = optim_constructor(model) + check_tsm_optimizer(optimizer, model, **paramwise_cfg) + + # fc_lr5 is False + paramwise_cfg = dict(fc_lr5=False) + optim_constructor_cfg = dict( + type='TSMOptimizerConstructor', + optimizer_cfg=optimizer_cfg, + paramwise_cfg=paramwise_cfg) + optim_constructor = build_optimizer_constructor(optim_constructor_cfg) + optimizer = optim_constructor(model) + check_tsm_optimizer(optimizer, model, **paramwise_cfg) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_precise_bn.py b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_precise_bn.py new file mode 100644 index 0000000000000000000000000000000000000000..42d5fed7e6aca2d30acc36f19f51256ad841744b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_precise_bn.py @@ -0,0 +1,205 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy + +import numpy as np +import pytest +import torch +import torch.nn as nn +from mmcv.parallel import MMDistributedDataParallel +from mmcv.runner import EpochBasedRunner, build_optimizer +from mmcv.utils import get_logger +from torch.utils.data import DataLoader, Dataset + +from mmaction.utils import PreciseBNHook + + +class ExampleDataset(Dataset): + + def __init__(self): + self.index = 0 + + def __getitem__(self, idx): + results = dict(imgs=torch.tensor([1.0], dtype=torch.float32)) + return results + + def __len__(self): + return 1 + + +class BiggerDataset(ExampleDataset): + + def __init__(self, fixed_values=range(0, 12)): + assert len(self) == len(fixed_values) + self.fixed_values = fixed_values + + def __getitem__(self, idx): + results = dict( + imgs=torch.tensor([self.fixed_values[idx]], dtype=torch.float32)) + return results + + def __len__(self): + # a bigger dataset + return 12 + + +class ExampleModel(nn.Module): + + def __init__(self): + super().__init__() + self.conv = nn.Linear(1, 1) + self.bn = nn.BatchNorm1d(1) + self.test_cfg = None + + def forward(self, imgs, return_loss=False): + return self.bn(self.conv(imgs)) + + @staticmethod + def train_step(data_batch, optimizer, **kwargs): + outputs = { + 'loss': 0.5, + 'log_vars': { + 'accuracy': 0.98 + }, + 'num_samples': 1 + } + return outputs + + +class SingleBNModel(ExampleModel): + + def __init__(self): + super().__init__() + self.bn = nn.BatchNorm1d(1) + self.test_cfg = None + + def forward(self, imgs, return_loss=False): + return self.bn(imgs) + + +class GNExampleModel(ExampleModel): + + def __init__(self): + super().__init__() + self.conv = nn.Linear(1, 1) + self.bn = nn.GroupNorm(1, 1) + self.test_cfg = None + + +class NoBNExampleModel(ExampleModel): + + def __init__(self): + super().__init__() + self.conv = nn.Linear(1, 1) + self.test_cfg = None + + def forward(self, imgs, return_loss=False): + return self.conv(imgs) + + +def test_precise_bn(): + with pytest.raises(TypeError): + # 
`data_loader` must be a Pytorch DataLoader + test_dataset = ExampleModel() + data_loader = DataLoader( + test_dataset, + batch_size=2, + sampler=None, + num_workers=0, + shuffle=False) + PreciseBNHook('data_loader') + + optimizer_cfg = dict( + type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) + + test_dataset = ExampleDataset() + loader = DataLoader(test_dataset, batch_size=2) + model = ExampleModel() + optimizer = build_optimizer(model, optimizer_cfg) + + data_loader = DataLoader(test_dataset, batch_size=2) + precise_bn_loader = copy.deepcopy(data_loader) + logger = get_logger('precise_bn') + runner = EpochBasedRunner( + model=model, batch_processor=None, optimizer=optimizer, logger=logger) + + with pytest.raises(AssertionError): + # num_iters should be no larger than total + # iters + precise_bn_hook = PreciseBNHook(precise_bn_loader, num_iters=5) + runner.register_hook(precise_bn_hook) + runner.run([loader], [('train', 1)], 1) + + # test non-DDP model + test_bigger_dataset = BiggerDataset() + loader = DataLoader(test_bigger_dataset, batch_size=2) + precise_bn_hook = PreciseBNHook(loader, num_iters=5) + assert precise_bn_hook.num_iters == 5 + assert precise_bn_hook.interval == 1 + runner = EpochBasedRunner( + model=model, batch_processor=None, optimizer=optimizer, logger=logger) + runner.register_hook(precise_bn_hook) + runner.run([loader], [('train', 1)], 1) + + # test model w/ gn layer + loader = DataLoader(test_bigger_dataset, batch_size=2) + precise_bn_hook = PreciseBNHook(loader, num_iters=5) + assert precise_bn_hook.num_iters == 5 + assert precise_bn_hook.interval == 1 + model = GNExampleModel() + runner = EpochBasedRunner( + model=model, batch_processor=None, optimizer=optimizer, logger=logger) + runner.register_hook(precise_bn_hook) + runner.run([loader], [('train', 1)], 1) + + # test model without bn layer + loader = DataLoader(test_bigger_dataset, batch_size=2) + precise_bn_hook = PreciseBNHook(loader, num_iters=5) + assert 
precise_bn_hook.num_iters == 5 + assert precise_bn_hook.interval == 1 + model = NoBNExampleModel() + runner = EpochBasedRunner( + model=model, batch_processor=None, optimizer=optimizer, logger=logger) + runner.register_hook(precise_bn_hook) + runner.run([loader], [('train', 1)], 1) + + # test how precise it is + loader = DataLoader(test_bigger_dataset, batch_size=2) + precise_bn_hook = PreciseBNHook(loader, num_iters=6) # run all + assert precise_bn_hook.num_iters == 6 + assert precise_bn_hook.interval == 1 + model = SingleBNModel() + runner = EpochBasedRunner( + model=model, batch_processor=None, optimizer=optimizer, logger=logger) + runner.register_hook(precise_bn_hook) + runner.run([loader], [('train', 1)], 1) + imgs_list = list() + for _, data in enumerate(loader): + imgs_list.append(np.array(data['imgs'])) + mean = np.mean([np.mean(batch) for batch in imgs_list]) + # Bessel's correction is used in PyTorch, therefore ddof=1 + var = np.mean([np.var(batch, ddof=1) for batch in imgs_list]) + assert np.equal(mean, np.array(model.bn.running_mean)) + assert np.equal(var, np.array(model.bn.running_var)) + + @pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') + def test_ddp_model_precise_bn(): + # test DDP model + test_bigger_dataset = BiggerDataset() + loader = DataLoader(test_bigger_dataset, batch_size=2) + precise_bn_hook = PreciseBNHook(loader, num_iters=5) + assert precise_bn_hook.num_iters == 5 + assert precise_bn_hook.interval == 1 + model = ExampleModel() + model = MMDistributedDataParallel( + model.cuda(), + device_ids=[torch.cuda.current_device()], + broadcast_buffers=False, + find_unused_parameters=True) + runner = EpochBasedRunner( + model=model, + batch_processor=None, + optimizer=optimizer, + logger=logger) + runner.register_hook(precise_bn_hook) + runner.run([loader], [('train', 1)], 1) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_train.py
b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_train.py new file mode 100644 index 0000000000000000000000000000000000000000..3a205dfbb4f3b018be4abe552443b076416db194 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_runtime/test_train.py @@ -0,0 +1,125 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import tempfile +from collections import OrderedDict + +import pytest +import torch +import torch.nn as nn +from mmcv import Config +from torch.utils.data import Dataset + +from mmaction.apis import train_model +from mmaction.datasets import DATASETS + + +@DATASETS.register_module() +class ExampleDataset(Dataset): + + def __init__(self, test_mode=False): + self.test_mode = test_mode + + @staticmethod + def evaluate(results, logger=None): + eval_results = OrderedDict() + eval_results['acc'] = 1 + return eval_results + + def __getitem__(self, idx): + results = dict(imgs=torch.tensor([1])) + return results + + def __len__(self): + return 1 + + +class ExampleModel(nn.Module): + + def __init__(self): + super().__init__() + self.test_cfg = None + self.conv1 = nn.Conv2d(3, 8, kernel_size=1) + self.norm1 = nn.BatchNorm1d(2) + + def forward(self, imgs, return_loss=False): + self.norm1(torch.rand(3, 2).cuda()) + losses = dict() + losses['test_loss'] = torch.tensor([0.5], requires_grad=True) + return losses + + def train_step(self, data_batch, optimizer, **kwargs): + imgs = data_batch['imgs'] + losses = self.forward(imgs, True) + loss = torch.tensor([0.5], requires_grad=True) + outputs = dict(loss=loss, log_vars=losses, num_samples=3) + return outputs + + def val_step(self, data_batch, optimizer, **kwargs): + imgs = data_batch['imgs'] + self.forward(imgs, False) + outputs = dict(results=0.5) + return outputs + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_train_model(): + model = ExampleModel() + dataset = ExampleDataset() + datasets = [ExampleDataset(), ExampleDataset()] + _cfg = dict( 
+ seed=0, + gpus=1, + gpu_ids=[0], + resume_from=None, + load_from=None, + workflow=[('train', 1)], + total_epochs=5, + evaluation=dict(interval=1, save_best='acc'), + data=dict( + videos_per_gpu=1, + workers_per_gpu=0, + val=dict(type='ExampleDataset')), + optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001), + optimizer_config=dict(grad_clip=dict(max_norm=40, norm_type=2)), + lr_config=dict(policy='step', step=[40, 80]), + omnisource=False, + precise_bn=False, + checkpoint_config=dict(interval=1), + log_level='INFO', + log_config=dict(interval=20, hooks=[dict(type='TextLoggerHook')])) + + with tempfile.TemporaryDirectory() as tmpdir: + # normal train + cfg = copy.deepcopy(_cfg) + cfg['work_dir'] = tmpdir + config = Config(cfg) + train_model(model, dataset, config) + + with tempfile.TemporaryDirectory() as tmpdir: + # train with validation + cfg = copy.deepcopy(_cfg) + cfg['work_dir'] = tmpdir + config = Config(cfg) + train_model(model, dataset, config, validate=True) + + with tempfile.TemporaryDirectory() as tmpdir: + cfg = copy.deepcopy(_cfg) + cfg['work_dir'] = tmpdir + cfg['omnisource'] = True + config = Config(cfg) + train_model(model, datasets, config) + + with tempfile.TemporaryDirectory() as tmpdir: + # train with precise_bn on + cfg = copy.deepcopy(_cfg) + cfg['work_dir'] = tmpdir + cfg['workflow'] = [('train', 1), ('val', 1)] + cfg['data'] = dict( + videos_per_gpu=1, + workers_per_gpu=0, + train=dict(type='ExampleDataset'), + val=dict(type='ExampleDataset')) + cfg['precise_bn'] = dict(num_iters=1, interval=1) + config = Config(cfg) + train_model(model, datasets, config) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_utils/__init__.py b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. 
All rights reserved. diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_bbox.py b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_bbox.py new file mode 100644 index 0000000000000000000000000000000000000000..8f5e0ab7dd82c175ad58865acfe371abbfcff034 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_bbox.py @@ -0,0 +1,151 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp +from abc import abstractproperty + +import numpy as np +import torch + +from mmaction.core.bbox import bbox2result, bbox_target +from mmaction.datasets import AVADataset + + +def test_assigner_sampler(): + try: + from mmdet.core.bbox import build_assigner, build_sampler + except (ImportError, ModuleNotFoundError): + raise ImportError( + 'Failed to import `build_assigner` and `build_sampler` ' + 'from `mmdet.core.bbox`. The two APIs are required for ' + 'the testing in `test_bbox.py`! ') + data_prefix = osp.normpath( + osp.join(osp.dirname(__file__), '../data/eval_detection')) + ann_file = osp.join(data_prefix, 'gt.csv') + label_file = osp.join(data_prefix, 'action_list.txt') + proposal_file = osp.join(data_prefix, 'proposal.pkl') + dataset = AVADataset( + ann_file=ann_file, + exclude_file=None, + pipeline=[], + label_file=label_file, + proposal_file=proposal_file, + num_classes=4) + + assigner = dict( + type='MaxIoUAssignerAVA', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0.5) + assigner = build_assigner(assigner) + proposal = torch.tensor(dataset[0]['proposals']) + + gt_bboxes = torch.tensor(dataset[0]['gt_bboxes']) + gt_labels = torch.tensor(dataset[0]['gt_labels']) + assign_result = assigner.assign( + bboxes=proposal, + gt_bboxes=gt_bboxes, + gt_bboxes_ignore=None, + gt_labels=gt_labels) + assert assign_result.num_gts == 4 + assert torch.all( + assign_result.gt_inds == torch.tensor([0, 0, 3, 3, 0, 0, 0, 1, 0, 0])) + assert torch.all( + torch.isclose( + assign_result.max_overlaps, + torch.tensor([ + 0.40386841, 
0.47127257, 0.53544776, 0.58797631, 0.29281288, + 0.40979504, 0.45902917, 0.50093938, 0.21560125, 0.32948171 + ], + dtype=torch.float64))) + assert torch.all( + torch.isclose( + assign_result.labels, + torch.tensor([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 1., 0., 0.], + [0., 1., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], + [0., 0., 0., 0.], [0., 0., 0., 1.], [0., 0., 0., 0.], + [0., 0., 0., 0.]]))) + sampler = dict(type='RandomSampler', num=32, pos_fraction=1) + sampler = build_sampler(sampler) + sampling_result = sampler.sample(assign_result, proposal, gt_bboxes, + gt_labels) + assert (sampling_result.pos_inds.shape[0] == + sampling_result.pos_bboxes.shape[0]) + assert (sampling_result.neg_inds.shape[0] == + sampling_result.neg_bboxes.shape[0]) + return sampling_result + + +def test_bbox2result(): + bboxes = torch.tensor([[0.072, 0.47, 0.84, 0.898], + [0.23, 0.215, 0.781, 0.534], + [0.195, 0.128, 0.643, 0.944], + [0.236, 0.189, 0.689, 0.74], + [0.375, 0.371, 0.726, 0.804], + [0.024, 0.398, 0.776, 0.719]]) + labels = torch.tensor([[-1.650, 0.515, 0.798, 1.240], + [1.368, -1.128, 0.037, -1.087], + [0.481, -1.303, 0.501, -0.463], + [-0.356, 0.126, -0.840, 0.438], + [0.079, 1.269, -0.263, -0.538], + [-0.853, 0.391, 0.103, 0.398]]) + num_classes = 4 + # Test for multi-label + result = bbox2result(bboxes, labels, num_classes) + assert np.all( + np.isclose( + result[0], + np.array([[0.072, 0.47, 0.84, 0.898, 0.515], + [0.236, 0.189, 0.689, 0.74, 0.126], + [0.375, 0.371, 0.726, 0.804, 1.269], + [0.024, 0.398, 0.776, 0.719, 0.391]]))) + assert np.all( + np.isclose( + result[1], + np.array([[0.072, 0.47, 0.84, 0.898, 0.798], + [0.23, 0.215, 0.781, 0.534, 0.037], + [0.195, 0.128, 0.643, 0.944, 0.501], + [0.024, 0.398, 0.776, 0.719, 0.103]]))) + assert np.all( + np.isclose( + result[2], + np.array([[0.072, 0.47, 0.84, 0.898, 1.24], + [0.236, 0.189, 0.689, 0.74, 0.438], + [0.024, 0.398, 0.776, 0.719, 0.398]]))) + + # Test for single-label + result = bbox2result(bboxes, 
labels, num_classes, -1.0) + assert np.all( + np.isclose(result[0], np.array([[0.375, 0.371, 0.726, 0.804, 1.269]]))) + assert np.all( + np.isclose( + result[1], + np.array([[0.23, 0.215, 0.781, 0.534, 0.037], + [0.195, 0.128, 0.643, 0.944, 0.501]]))) + assert np.all( + np.isclose( + result[2], + np.array([[0.072, 0.47, 0.84, 0.898, 1.240], + [0.236, 0.189, 0.689, 0.74, 0.438], + [0.024, 0.398, 0.776, 0.719, 0.398]]))) + + +def test_bbox_target(): + pos_bboxes = torch.tensor([[0.072, 0.47, 0.84, 0.898], + [0.23, 0.215, 0.781, 0.534], + [0.195, 0.128, 0.643, 0.944], + [0.236, 0.189, 0.689, 0.74]]) + neg_bboxes = torch.tensor([[0.375, 0.371, 0.726, 0.804], + [0.024, 0.398, 0.776, 0.719]]) + pos_gt_labels = torch.tensor([[0., 0., 1., 0.], [0., 0., 0., 1.], + [0., 1., 0., 0.], [0., 1., 0., 0.]]) + cfg = abstractproperty() + cfg.pos_weight = 0.8 + labels, label_weights = bbox_target([pos_bboxes], [neg_bboxes], + [pos_gt_labels], cfg) + assert torch.all( + torch.isclose( + labels, + torch.tensor([[0., 0., 1., 0.], [0., 0., 0., 1.], [0., 1., 0., 0.], + [0., 1., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., + 0.]]))) + assert torch.all( + torch.isclose(label_weights, torch.tensor([0.8] * 4 + [1.0] * 2))) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_localization_utils.py b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_localization_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b4709fe8695b4984296eea928858304469ee48e7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_localization_utils.py @@ -0,0 +1,204 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +import numpy as np +import pytest +from numpy.testing import assert_array_almost_equal, assert_array_equal + +from mmaction.localization import (generate_bsp_feature, + generate_candidate_proposals, soft_nms, + temporal_iop, temporal_iou) + + +def test_temporal_iou(): + anchors_min = np.array([0.0, 0.5]) + anchors_max = np.array([1.0, 1.5]) + box_min = 0.5 + box_max = 1.0 + + iou = temporal_iou(anchors_min, anchors_max, box_min, box_max) + assert_array_equal(iou, np.array([0.5, 0.5])) + + +def test_temporal_iop(): + anchors_min = np.array([0.0, 0.5]) + anchors_max = np.array([1.0, 1.5]) + box_min = 0.4 + box_max = 1.1 + + ioa = temporal_iop(anchors_min, anchors_max, box_min, box_max) + assert_array_almost_equal(ioa, np.array([0.6, 0.6])) + + +def test_soft_nms(): + proposals = np.array([[0., 1., 1., 1., 0.5, 0.5], + [0., 0.4, 1., 1., 0.4, 0.4], + [0., 0.95, 1., 1., 0.6, 0.6]]) + proposal_list = soft_nms(proposals, 0.75, 0.65, 0.9, 1) + assert_array_equal(proposal_list, [[0., 0.95, 0.6], [0., 0.4, 0.4]]) + + +def test_generate_candidate_proposals(): + video_list = [0, 1] + video_infos = [ + dict( + video_name='v_test1', + duration_second=100, + duration_frame=1000, + annotations=[{ + 'segment': [30.0, 60.0], + 'label': 'Rock climbing' + }], + feature_frame=900), + dict( + video_name='v_test2', + duration_second=100, + duration_frame=1000, + annotations=[{ + 'segment': [6.0, 8.0], + 'label': 'Drinking beer' + }], + feature_frame=900) + ] + tem_results_dir = osp.normpath( + osp.join(osp.dirname(__file__), '../data/tem_results')) + # test when tem_result_ext is not valid + with pytest.raises(NotImplementedError): + result_dict = generate_candidate_proposals( + video_list, + video_infos, + tem_results_dir, + 5, + 0.5, + tem_results_ext='unsupport_ext') + # test without result_dict + assert_result1 = np.array([ + [0.1, 0.7, 0.58390868, 0.35708317, 0.20850396, 0.55555556, 0.55555556], + [0.1, 0.5, 0.58390868, 0.32605207, 0.19038463, 0.29411765, 
0.41666667], + [0.1, 0.3, 0.58390868, 0.26221931, 0.15311213, 0., 0.], + [0.3, 0.7, 0.30626667, 0.35708317, 0.10936267, 0.83333333, 0.83333333], + [0.3, 0.5, 0.30626667, 0.32605207, 0.09985888, 0.45454545, 0.83333333] + ]) + assert_result2 = np.array( + [[0.1, 0.3, 0.78390867, 0.3622193, 0.28394685, 0., 0.], + [0.1, 0.7, 0.78390867, 0.35708317, 0.27992059, 0., 0.], + [0.1, 0.5, 0.78390867, 0.32605207, 0.25559504, 0., 0.]]) + result_dict = generate_candidate_proposals(video_list, video_infos, + tem_results_dir, 5, 0.5) + + assert_array_almost_equal(result_dict['v_test1'], assert_result1) + assert_array_almost_equal(result_dict['v_test2'], assert_result2) + + # test with result_dict + result_dict = {} + generate_candidate_proposals( + video_list, + video_infos, + tem_results_dir, + 5, + 0.5, + result_dict=result_dict) + + assert_array_almost_equal(result_dict['v_test1'], assert_result1) + assert_array_almost_equal(result_dict['v_test2'], assert_result2) + + +def test_generate_bsp_feature(): + video_list = [0, 1] + video_infos = [ + dict( + video_name='v_test1', + duration_second=100, + duration_frame=1000, + annotations=[{ + 'segment': [30.0, 60.0], + 'label': 'Rock climbing' + }], + feature_frame=900), + dict( + video_name='v_test2', + duration_second=100, + duration_frame=1000, + annotations=[{ + 'segment': [6.0, 8.0], + 'label': 'Drinking beer' + }], + feature_frame=900) + ] + tem_results_dir = osp.normpath( + osp.join(osp.dirname(__file__), '../data/tem_results')) + pgm_proposals_dir = osp.normpath( + osp.join(osp.dirname(__file__), '../data/proposals')) + + # test when extension is not valid + with pytest.raises(NotImplementedError): + result_dict = generate_bsp_feature( + video_list, + video_infos, + tem_results_dir, + pgm_proposals_dir, + tem_results_ext='unsupport_ext') + + with pytest.raises(NotImplementedError): + result_dict = generate_bsp_feature( + video_list, + video_infos, + tem_results_dir, + pgm_proposals_dir, + pgm_proposal_ext='unsupport_ext') + + 
# test without result_dict + result_dict = generate_bsp_feature( + video_list, video_infos, tem_results_dir, pgm_proposals_dir, top_k=2) + assert_result1 = np.array( + [[ + 0.02633105, 0.02489364, 0.02345622, 0.0220188, 0.02058138, + 0.01914396, 0.01770654, 0.01626912, 0.01541432, 0.01514214, + 0.01486995, 0.01459776, 0.01432558, 0.01405339, 0.01378121, + 0.01350902, 0.03064331, 0.02941124, 0.02817916, 0.02694709, + 0.02571502, 0.02448295, 0.02325087, 0.0220188, 0.01432558, + 0.01409228, 0.01385897, 0.01362567, 0.01339237, 0.01315907, + 0.01292577, 0.01269246 + ], + [ + 0.01350902, 0.01323684, 0.01296465, 0.01269246, 0.01242028, + 0.01214809, 0.01187591, 0.01160372, 0.01154264, 0.01169266, + 0.01184269, 0.01199271, 0.01214273, 0.01229275, 0.01244278, + 0.0125928, 0.01432558, 0.01409228, 0.01385897, 0.01362567, + 0.01339237, 0.01315907, 0.01292577, 0.01269246, 0.01214273, + 0.01227132, 0.01239991, 0.0125285, 0.0126571, 0.01278569, + 0.01291428, 0.01304287 + ]]) + assert_result2 = np.array( + [[ + 0.04133105, 0.03922697, 0.03712288, 0.0350188, 0.03291471, + 0.03081063, 0.02870654, 0.02660246, 0.02541432, 0.02514214, + 0.02486995, 0.02459776, 0.02432558, 0.02405339, 0.02378121, + 0.02350902, 0.04764331, 0.04583981, 0.04403631, 0.04223281, + 0.0404293, 0.0386258, 0.0368223, 0.0350188, 0.02432558, 0.02409228, + 0.02385897, 0.02362567, 0.02339237, 0.02315907, 0.02292577, + 0.02269246 + ], + [ + 0.02350902, 0.02323684, 0.02296465, 0.02269246, 0.02242028, + 0.02214809, 0.02187591, 0.02160372, 0.02120931, 0.02069266, + 0.02017602, 0.01965937, 0.01914273, 0.01862609, 0.01810944, + 0.0175928, 0.02432558, 0.02409228, 0.02385897, 0.02362567, + 0.02339237, 0.02315907, 0.02292577, 0.02269246, 0.01914273, + 0.01869989, 0.01825706, 0.01781422, 0.01737138, 0.01692854, + 0.0164857, 0.01604287 + ]]) + assert_array_almost_equal(result_dict['v_test1'], assert_result1) + assert_array_almost_equal(result_dict['v_test2'], assert_result2) + + # test with result_dict + result_dict = {} + 
generate_bsp_feature( + video_list, + video_infos, + tem_results_dir, + pgm_proposals_dir, + top_k=2, + result_dict=result_dict) + assert_array_almost_equal(result_dict['v_test1'], assert_result1) + assert_array_almost_equal(result_dict['v_test2'], assert_result2) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_module_hooks.py b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_module_hooks.py new file mode 100644 index 0000000000000000000000000000000000000000..d77d9e94d93a93e387057953ab07f70977cb0ca0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_module_hooks.py @@ -0,0 +1,144 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os.path as osp + +import mmcv +import numpy as np +import pytest +import torch + +from mmaction.models import build_recognizer +from mmaction.utils import register_module_hooks +from mmaction.utils.module_hooks import GPUNormalize +from mmaction.utils.multigrid import LongShortCycleHook + + +def test_register_module_hooks(): + _module_hooks = [ + dict( + type='GPUNormalize', + hooked_module='backbone', + hook_pos='forward_pre', + input_format='NCHW', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375]) + ] + + repo_dpath = osp.dirname(osp.dirname(osp.dirname(__file__))) + config_fpath = osp.join(repo_dpath, 'configs/_base_/models/tsm_r50.py') + config = mmcv.Config.fromfile(config_fpath) + config.model['backbone']['pretrained'] = None + + # case 1 + module_hooks = copy.deepcopy(_module_hooks) + module_hooks[0]['hook_pos'] = 'forward_pre' + recognizer = build_recognizer(config.model) + handles = register_module_hooks(recognizer, module_hooks) + assert recognizer.backbone._forward_pre_hooks[ + handles[0].id].__name__ == 'normalize_hook' + + # case 2 + module_hooks = copy.deepcopy(_module_hooks) + module_hooks[0]['hook_pos'] = 'forward' + recognizer = build_recognizer(config.model) + handles = register_module_hooks(recognizer, module_hooks) + assert 
recognizer.backbone._forward_hooks[ + handles[0].id].__name__ == 'normalize_hook' + + # case 3 + module_hooks = copy.deepcopy(_module_hooks) + module_hooks[0]['hooked_module'] = 'cls_head' + module_hooks[0]['hook_pos'] = 'backward' + recognizer = build_recognizer(config.model) + handles = register_module_hooks(recognizer, module_hooks) + assert recognizer.cls_head._backward_hooks[ + handles[0].id].__name__ == 'normalize_hook' + + # case 4 + module_hooks = copy.deepcopy(_module_hooks) + module_hooks[0]['hook_pos'] = '_other_pos' + recognizer = build_recognizer(config.model) + with pytest.raises(ValueError): + handles = register_module_hooks(recognizer, module_hooks) + + # case 5 + module_hooks = copy.deepcopy(_module_hooks) + module_hooks[0]['hooked_module'] = '_other_module' + recognizer = build_recognizer(config.model) + with pytest.raises(ValueError): + handles = register_module_hooks(recognizer, module_hooks) + + +def test_gpu_normalize(): + + def check_normalize(origin_imgs, result_imgs, norm_cfg): + """Check if the origin_imgs are normalized correctly into result_imgs + in a given norm_cfg.""" + from numpy.testing import assert_array_almost_equal + target_imgs = result_imgs.copy() + target_imgs *= norm_cfg['std'] + target_imgs += norm_cfg['mean'] + assert_array_almost_equal(origin_imgs, target_imgs, decimal=4) + + _gpu_normalize_cfg = dict( + input_format='NCTHW', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375]) + + # case 1 + gpu_normalize_cfg = copy.deepcopy(_gpu_normalize_cfg) + gpu_normalize_cfg['input_format'] = 'NCHW' + gpu_normalize = GPUNormalize(**gpu_normalize_cfg) + assert gpu_normalize._mean.shape == (1, 3, 1, 1) + imgs = np.random.randint(256, size=(2, 240, 320, 3), dtype=np.uint8) + _input = (torch.tensor(imgs).permute(0, 3, 1, 2), ) + normalize_hook = gpu_normalize.hook_func() + _input = normalize_hook(torch.nn.Module, _input) + result_imgs = np.array(_input[0].permute(0, 2, 3, 1)) + check_normalize(imgs, result_imgs, 
gpu_normalize_cfg) + + # case 2 + gpu_normalize_cfg = copy.deepcopy(_gpu_normalize_cfg) + gpu_normalize_cfg['input_format'] = 'NCTHW' + gpu_normalize = GPUNormalize(**gpu_normalize_cfg) + assert gpu_normalize._mean.shape == (1, 3, 1, 1, 1) + + # case 3 + gpu_normalize_cfg = copy.deepcopy(_gpu_normalize_cfg) + gpu_normalize_cfg['input_format'] = 'NCHW_Flow' + gpu_normalize = GPUNormalize(**gpu_normalize_cfg) + assert gpu_normalize._mean.shape == (1, 3, 1, 1) + + # case 4 + gpu_normalize_cfg = copy.deepcopy(_gpu_normalize_cfg) + gpu_normalize_cfg['input_format'] = 'NPTCHW' + gpu_normalize = GPUNormalize(**gpu_normalize_cfg) + assert gpu_normalize._mean.shape == (1, 1, 1, 3, 1, 1) + + # case 5 + gpu_normalize_cfg = copy.deepcopy(_gpu_normalize_cfg) + gpu_normalize_cfg['input_format'] = '_format' + with pytest.raises(ValueError): + gpu_normalize = GPUNormalize(**gpu_normalize_cfg) + + +def test_multigrid_hook(): + multigrid_cfg = dict(data=dict( + videos_per_gpu=8, + workers_per_gpu=4, + )) + with pytest.raises(AssertionError): + LongShortCycleHook(multigrid_cfg) + + multigrid_cfg = dict( + multigrid=dict( + long_cycle=True, + short_cycle=True, + epoch_factor=1.5, + long_cycle_factors=[[0.25, 0.7071], [0.5, 0.7071], [0.5, 1], + [1, 1]], + short_cycle_factors=[0.5, 0.7071], + default_s=(224, 224), + )) + with pytest.raises(AssertionError): + LongShortCycleHook(multigrid_cfg) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_onnx.py b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_onnx.py new file mode 100644 index 0000000000000000000000000000000000000000..6324ccc3410f6c54cd41476fd495ca6b1c79bcb7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_onnx.py @@ -0,0 +1,33 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp +import tempfile + +import torch.nn as nn + +from tools.deployment.pytorch2onnx import _convert_batchnorm, pytorch2onnx + + +class TestModel(nn.Module): + + def __init__(self): + super().__init__() + self.conv = nn.Conv3d(1, 2, 1) + self.bn = nn.SyncBatchNorm(2) + + def forward(self, x): + return self.bn(self.conv(x)) + + def forward_dummy(self, x): + out = self.bn(self.conv(x)) + return (out, ) + + +def test_onnx_exporting(): + with tempfile.TemporaryDirectory() as tmpdir: + out_file = osp.join(tmpdir, 'tmp.onnx') + model = TestModel() + model = _convert_batchnorm(model) + # test exporting + if hasattr(model, 'forward_dummy'): + model.forward = model.forward_dummy + pytorch2onnx(model, (2, 1, 1, 1, 1), output_file=out_file, verify=True) diff --git a/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_setup_env.py b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_setup_env.py new file mode 100644 index 0000000000000000000000000000000000000000..87c2f755a8db3c41510ff32c6f9992c4f87d68bd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tests/test_utils/test_setup_env.py @@ -0,0 +1,68 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import multiprocessing as mp +import os +import platform + +import cv2 +from mmcv import Config + +from mmaction.utils import setup_multi_processes + + +def test_setup_multi_processes(): + # temp save system setting + sys_start_mehod = mp.get_start_method(allow_none=True) + sys_cv_threads = cv2.getNumThreads() + # pop and temp save system env vars + sys_omp_threads = os.environ.pop('OMP_NUM_THREADS', default=None) + sys_mkl_threads = os.environ.pop('MKL_NUM_THREADS', default=None) + + # test config without setting env + config = dict(data=dict(workers_per_gpu=2)) + cfg = Config(config) + setup_multi_processes(cfg) + assert os.getenv('OMP_NUM_THREADS') == '1' + assert os.getenv('MKL_NUM_THREADS') == '1' + # when set to 0, the num threads will be 1 + assert cv2.getNumThreads() == 1 + if platform.system() != 'Windows': + assert mp.get_start_method() == 'fork' + + # test num workers <= 1 + os.environ.pop('OMP_NUM_THREADS') + os.environ.pop('MKL_NUM_THREADS') + config = dict(data=dict(workers_per_gpu=0)) + cfg = Config(config) + setup_multi_processes(cfg) + assert 'OMP_NUM_THREADS' not in os.environ + assert 'MKL_NUM_THREADS' not in os.environ + + # test manually set env var + os.environ['OMP_NUM_THREADS'] = '4' + config = dict(data=dict(workers_per_gpu=2)) + cfg = Config(config) + setup_multi_processes(cfg) + assert os.getenv('OMP_NUM_THREADS') == '4' + + # test manually set opencv threads and mp start method + config = dict( + data=dict(workers_per_gpu=2), + opencv_num_threads=4, + mp_start_method='spawn') + cfg = Config(config) + setup_multi_processes(cfg) + assert cv2.getNumThreads() == 4 + assert mp.get_start_method() == 'spawn' + + # revert setting to avoid affecting other programs + if sys_start_mehod: + mp.set_start_method(sys_start_mehod, force=True) + cv2.setNumThreads(sys_cv_threads) + if sys_omp_threads: + os.environ['OMP_NUM_THREADS'] = sys_omp_threads + else: + os.environ.pop('OMP_NUM_THREADS') + if sys_mkl_threads: + os.environ['MKL_NUM_THREADS'] = 
sys_mkl_threads + else: + os.environ.pop('MKL_NUM_THREADS') diff --git a/openmmlab_test/mmaction2-0.24.1/tools/__init__.py b/openmmlab_test/mmaction2-0.24.1/tools/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..fd77cb4ce5196b72410db7c07a5a461c060b19ad --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .analysis import * # noqa: F401, F403 +from .data import * # noqa: F401, F403 +from .deployment import * # noqa: F401, F403 +from .misc import * # noqa: F401, F403 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/analyze_logs.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/analyze_logs.py new file mode 100644 index 0000000000000000000000000000000000000000..4d2ca5b018aa2a47dd0f78574f62ea51b4d1688e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/analysis/analyze_logs.py @@ -0,0 +1,167 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import json +from collections import defaultdict + +import matplotlib.pyplot as plt +import numpy as np +import seaborn as sns + + +def cal_train_time(log_dicts, args): + for i, log_dict in enumerate(log_dicts): + print(f'{"-" * 5}Analyze train time of {args.json_logs[i]}{"-" * 5}') + all_times = [] + for epoch in log_dict.keys(): + if args.include_outliers: + all_times.append(log_dict[epoch]['time']) + else: + all_times.append(log_dict[epoch]['time'][1:]) + all_times = np.array(all_times) + epoch_ave_time = all_times.mean(-1) + slowest_epoch = epoch_ave_time.argmax() + fastest_epoch = epoch_ave_time.argmin() + std_over_epoch = epoch_ave_time.std() + print(f'slowest epoch {slowest_epoch + 1}, ' + f'average time is {epoch_ave_time[slowest_epoch]:.4f}') + print(f'fastest epoch {fastest_epoch + 1}, ' + f'average time is {epoch_ave_time[fastest_epoch]:.4f}') + print(f'time std over epochs is {std_over_epoch:.4f}') + print(f'average iter time: {np.mean(all_times):.4f} s/iter') + 
print() + + +def plot_curve(log_dicts, args): + if args.backend is not None: + plt.switch_backend(args.backend) + sns.set_style(args.style) + # if legend is None, use {filename}_{key} as legend + legend = args.legend + if legend is None: + legend = [] + for json_log in args.json_logs: + for metric in args.keys: + legend.append(f'{json_log}_{metric}') + assert len(legend) == (len(args.json_logs) * len(args.keys)) + metrics = args.keys + + num_metrics = len(metrics) + for i, log_dict in enumerate(log_dicts): + epochs = list(log_dict.keys()) + for j, metric in enumerate(metrics): + print(f'plot curve of {args.json_logs[i]}, metric is {metric}') + if metric not in log_dict[epochs[0]]: + raise KeyError( + f'{args.json_logs[i]} does not contain metric {metric}') + xs = [] + ys = [] + for epoch in epochs: + iters = log_dict[epoch]['iter'] + if log_dict[epoch]['mode'][-1] == 'val': + iters = iters[:-1] + num_iters_per_epoch = iters[-1] + xs.append(np.array(iters) + (epoch - 1) * num_iters_per_epoch) + ys.append(np.array(log_dict[epoch][metric][:len(iters)])) + xs = np.concatenate(xs) + ys = np.concatenate(ys) + plt.xlabel('iter') + plt.plot(xs, ys, label=legend[i * num_metrics + j], linewidth=0.5) + plt.legend() + if args.title is not None: + plt.title(args.title) + if args.out is None: + plt.show() + else: + print(f'save curve to: {args.out}') + plt.savefig(args.out) + plt.cla() + + +def add_plot_parser(subparsers): + parser_plt = subparsers.add_parser( + 'plot_curve', help='parser for plotting curves') + parser_plt.add_argument( + 'json_logs', + type=str, + nargs='+', + help='path of train log in json format') + parser_plt.add_argument( + '--keys', + type=str, + nargs='+', + default=['top1_acc'], + help='the metric that you want to plot') + parser_plt.add_argument('--title', type=str, help='title of figure') + parser_plt.add_argument( + '--legend', + type=str, + nargs='+', + default=None, + help='legend of each plot') + parser_plt.add_argument( + '--backend', type=str, 
default=None, help='backend of plt') + parser_plt.add_argument( + '--style', type=str, default='dark', help='style of plt') + parser_plt.add_argument('--out', type=str, default=None) + + +def add_time_parser(subparsers): + parser_time = subparsers.add_parser( + 'cal_train_time', + help='parser for computing the average time per training iteration') + parser_time.add_argument( + 'json_logs', + type=str, + nargs='+', + help='path of train log in json format') + parser_time.add_argument( + '--include-outliers', + action='store_true', + help='include the first value of every epoch when computing ' + 'the average time') + + +def parse_args(): + parser = argparse.ArgumentParser(description='Analyze Json Log') + # currently only support plot curve and calculate average train time + subparsers = parser.add_subparsers(dest='task', help='task parser') + add_plot_parser(subparsers) + add_time_parser(subparsers) + args = parser.parse_args() + return args + + +def load_json_logs(json_logs): + # load and convert json_logs to log_dict, key is epoch, value is a sub dict + # keys of sub dict is different metrics, e.g. 
memory, top1_acc + # value of sub dict is a list of corresponding values of all iterations + log_dicts = [dict() for _ in json_logs] + for json_log, log_dict in zip(json_logs, log_dicts): + with open(json_log, 'r') as log_file: + for line in log_file: + log = json.loads(line.strip()) + # skip lines without `epoch` field + if 'epoch' not in log: + continue + epoch = log.pop('epoch') + if epoch not in log_dict: + log_dict[epoch] = defaultdict(list) + for k, v in log.items(): + log_dict[epoch][k].append(v) + return log_dicts + + +def main(): + args = parse_args() + + json_logs = args.json_logs + for json_log in json_logs: + assert json_log.endswith('.json') + + log_dicts = load_json_logs(json_logs) + + eval(args.task)(log_dicts, args) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/bench_processing.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/bench_processing.py new file mode 100644 index 0000000000000000000000000000000000000000..df90899da6b8c381e27846cebb87be2f2ffe3cc0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/analysis/bench_processing.py @@ -0,0 +1,65 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""This file is for benchmark dataloading process. 
The command line to run this +file is: + +$ python -m cProfile -o program.prof tools/analysis/bench_processing.py +configs/task/method/[config filename] + +It use cProfile to record cpu running time and output to program.prof +To visualize cProfile output program.prof, use Snakeviz and run: +$ snakeviz program.prof +""" +import argparse +import os + +import mmcv +from mmcv import Config + +from mmaction import __version__ +from mmaction.datasets import build_dataloader, build_dataset +from mmaction.utils import get_root_logger + + +def main(): + parser = argparse.ArgumentParser(description='Benchmark dataloading') + parser.add_argument('config', help='train config file path') + args = parser.parse_args() + cfg = Config.fromfile(args.config) + + # init logger before other steps + logger = get_root_logger() + logger.info(f'MMAction2 Version: {__version__}') + logger.info(f'Config: {cfg.text}') + + # create bench data list + ann_file_bench = 'benchlist.txt' + if not os.path.exists(ann_file_bench): + with open(cfg.ann_file_train) as f: + lines = f.readlines()[:256] + with open(ann_file_bench, 'w') as f1: + f1.writelines(lines) + cfg.data.train.ann_file = ann_file_bench + + dataset = build_dataset(cfg.data.train) + data_loader = build_dataloader( + dataset, + videos_per_gpu=cfg.data.videos_per_gpu, + workers_per_gpu=0, + persistent_workers=False, + num_gpus=1, + dist=False) + + # Start progress bar after first 5 batches + prog_bar = mmcv.ProgressBar( + len(dataset) - 5 * cfg.data.videos_per_gpu, start=False) + for i, data in enumerate(data_loader): + if i == 5: + prog_bar.start() + for _ in data['imgs']: + if i < 5: + continue + prog_bar.update() + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/benchmark.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/benchmark.py new file mode 100644 index 0000000000000000000000000000000000000000..8e97a3b2e15555aad5652bf0b03080bd9fd8f0f9 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/tools/analysis/benchmark.py @@ -0,0 +1,94 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import time + +import torch +from mmcv import Config +from mmcv.cnn import fuse_conv_bn +from mmcv.parallel import MMDataParallel +from mmcv.runner.fp16_utils import wrap_fp16_model + +from mmaction.datasets import build_dataloader, build_dataset +from mmaction.models import build_model + + +def parse_args(): + parser = argparse.ArgumentParser( + description='MMAction2 benchmark a recognizer') + parser.add_argument('config', help='test config file path') + parser.add_argument( + '--log-interval', type=int, default=10, help='interval of logging') + parser.add_argument( + '--fuse-conv-bn', + action='store_true', + help='Whether to fuse conv and bn, this will slightly increase ' + 'the inference speed') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + + cfg = Config.fromfile(args.config) + # set cudnn_benchmark + if cfg.get('cudnn_benchmark', False): + torch.backends.cudnn.benchmark = True + cfg.model.backbone.pretrained = None + cfg.data.test.test_mode = True + + # build the dataloader + dataset = build_dataset(cfg.data.test, dict(test_mode=True)) + data_loader = build_dataloader( + dataset, + videos_per_gpu=1, + workers_per_gpu=cfg.data.workers_per_gpu, + persistent_workers=cfg.data.get('persistent_workers', False), + dist=False, + shuffle=False) + + # build the model and load checkpoint + model = build_model( + cfg.model, train_cfg=None, test_cfg=cfg.get('test_cfg')) + fp16_cfg = cfg.get('fp16', None) + if fp16_cfg is not None: + wrap_fp16_model(model) + if args.fuse_conv_bn: + model = fuse_conv_bn(model) + + model = MMDataParallel(model, device_ids=[0]) + + model.eval() + + # the first several iterations may be very slow so skip them + num_warmup = 5 + pure_inf_time = 0 + + # benchmark with 200 videos and take the average + for i, data in enumerate(data_loader): + + torch.cuda.synchronize() + 
start_time = time.perf_counter() + + with torch.no_grad(): + model(return_loss=False, **data) + + torch.cuda.synchronize() + elapsed = time.perf_counter() - start_time + + if i >= num_warmup: + pure_inf_time += elapsed + if (i + 1) % args.log_interval == 0: + fps = (i + 1 - num_warmup) / pure_inf_time + print( + f'Done video [{i + 1:<3}/ 200], fps: {fps:.1f} video / s') + + if (i + 1) == 200: + # elapsed for this iteration was already added above + fps = (i + 1 - num_warmup) / pure_inf_time + print(f'Overall fps: {fps:.1f} video / s') + break + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/check_videos.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/check_videos.py new file mode 100644 index 0000000000000000000000000000000000000000..d2b45761935c7f378974761bd4af5738ca81cb49 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/analysis/check_videos.py @@ -0,0 +1,158 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import warnings +from functools import partial +from multiprocessing import Manager, Pool, cpu_count + +import mmcv +import numpy as np +from mmcv import Config, DictAction + +from mmaction.datasets import PIPELINES, build_dataset + + +def parse_args(): + parser = argparse.ArgumentParser(description='MMAction2 check datasets') + parser.add_argument('config', help='test config file path') + parser.add_argument( + '--options', + nargs='+', + action=DictAction, + default={}, + help='custom options for evaluation, the key-value pair in xxx=yyy ' + 'format will be kwargs for dataset.evaluate() function (deprecated), ' + 'change to --eval-options instead.') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. 
For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + parser.add_argument( + '--output-file', + default='invalid-video.txt', + help='Output file path which keeps corrupted/missing video file paths') + parser.add_argument( + '--split', + default='train', + choices=['train', 'val', 'test'], + help='Dataset split') + parser.add_argument( + '--decoder', + default='decord', + choices=['decord', 'opencv', 'pyav'], + help='Video decoder type, should be one of [decord, opencv, pyav]') + parser.add_argument( + '--num-processes', + type=int, + default=(cpu_count() - 1 or 1), + help='Number of processes to check videos') + parser.add_argument( + '--remove-corrupted-videos', + action='store_true', + help='Whether to delete all corrupted videos') + args = parser.parse_args() + + if args.options and getattr(args, 'eval_options', None): + raise ValueError( + '--options and --eval-options cannot be both ' + 'specified, --options is deprecated in favor of --eval-options') + if args.options: + warnings.warn('--options is deprecated in favor of --eval-options') + args.eval_options = args.options + return args + + +@PIPELINES.register_module() +class RandomSampleFrames: + + def __call__(self, results): + """Select frames to verify. + + Select the first, last and three random frames. Required key is + "total_frames", added or modified key is "frame_inds". + Args: + results (dict): The resulting dict to be modified and passed + to the next transform in pipeline. 
+ """ + assert results['total_frames'] > 0 + + # first and last frames + results['frame_inds'] = np.array([0, results['total_frames'] - 1]) + + # choose 3 random frames + if results['total_frames'] > 2: + results['frame_inds'] = np.concatenate([ + results['frame_inds'], + np.random.randint(1, results['total_frames'] - 1, 3) + ]) + + return results + + +def _do_check_videos(lock, dataset, output_file, idx): + try: + dataset[idx] + except: # noqa + # save invalid video path to output file + lock.acquire() + with open(output_file, 'a') as f: + f.write(dataset.video_infos[idx]['filename'] + '\n') + lock.release() + + +if __name__ == '__main__': + args = parse_args() + + decoder_to_pipeline_prefix = dict( + decord='Decord', opencv='OpenCV', pyav='PyAV') + + # read config file + cfg = Config.fromfile(args.config) + cfg.merge_from_dict(args.cfg_options) + + # build dataset + dataset_type = cfg.data[args.split].type + assert dataset_type == 'VideoDataset' + cfg.data[args.split].pipeline = [ + dict(type=decoder_to_pipeline_prefix[args.decoder] + 'Init'), + dict(type='RandomSampleFrames'), + dict(type=decoder_to_pipeline_prefix[args.decoder] + 'Decode') + ] + dataset = build_dataset(cfg.data[args.split], + dict(test_mode=(args.split != 'train'))) + + # prepare for checking + if os.path.exists(args.output_file): + # remove existing output file + os.remove(args.output_file) + pool = Pool(args.num_processes) + lock = Manager().Lock() + worker_fn = partial(_do_check_videos, lock, dataset, args.output_file) + ids = range(len(dataset)) + + # start checking + prog_bar = mmcv.ProgressBar(len(dataset)) + for _ in pool.imap_unordered(worker_fn, ids): + prog_bar.update() + pool.close() + pool.join() + + if os.path.exists(args.output_file): + num_lines = sum(1 for _ in open(args.output_file)) + print(f'Checked {len(dataset)} videos, ' + f'{num_lines} are corrupted/missing.') + if args.remove_corrupted_videos: + print('Start deleting corrupted videos') + cnt = 0 + with 
open(args.output_file, 'r') as f: + for line in f: + if os.path.exists(line.strip()): + os.remove(line.strip()) + cnt += 1 + print(f'Deleted {cnt} corrupted videos.') + else: + print(f'Checked {len(dataset)} videos, none are corrupted/missing') diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/eval_metric.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/eval_metric.py new file mode 100644 index 0000000000000000000000000000000000000000..7841a4cb66899ef10b81cc90069751a63abfce4c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/analysis/eval_metric.py @@ -0,0 +1,66 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse + +import mmcv +from mmcv import Config, DictAction + +from mmaction.datasets import build_dataset + + +def parse_args(): + parser = argparse.ArgumentParser(description='Evaluate metric of the ' + 'results saved in pkl/yaml/json format') + parser.add_argument('config', help='Config of the model') + parser.add_argument('results', help='Results in pkl/yaml/json format') + parser.add_argument( + '--eval', + type=str, + nargs='+', + help='evaluation metrics, which depends on the dataset, e.g.,' + ' "top_k_accuracy", "mean_class_accuracy" for video dataset') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. 
For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + parser.add_argument( + '--eval-options', + nargs='+', + action=DictAction, + help='custom options for evaluation, the key-value pair in xxx=yyy ' + 'format will be kwargs for dataset.evaluate() function') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + + cfg = Config.fromfile(args.config) + + assert args.eval is not None + + if args.cfg_options is not None: + cfg.merge_from_dict(args.cfg_options) + cfg.data.test.test_mode = True + + dataset = build_dataset(cfg.data.test) + outputs = mmcv.load(args.results) + + kwargs = {} if args.eval_options is None else args.eval_options + eval_kwargs = cfg.get('evaluation', {}).copy() + # hard-code way to remove EvalHook args + for key in [ + 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best', 'rule', + 'by_epoch' + ]: + eval_kwargs.pop(key, None) + eval_kwargs.update(dict(metrics=args.eval, **kwargs)) + print(dataset.evaluate(outputs, **eval_kwargs)) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/get_flops.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/get_flops.py new file mode 100644 index 0000000000000000000000000000000000000000..d4c8e9732ea132985108e70f3dbd0d4691aa40a1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/analysis/get_flops.py @@ -0,0 +1,73 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse + +from mmcv import Config + +from mmaction.models import build_recognizer + +try: + from mmcv.cnn import get_model_complexity_info +except ImportError: + raise ImportError('Please upgrade mmcv to >0.6.2') + + +def parse_args(): + parser = argparse.ArgumentParser( description='Compute FLOPs and params of a recognizer') + parser.add_argument('config', help='train config file path') + parser.add_argument( + '--shape', + type=int, + nargs='+', + default=[340, 256], + help='input image size') + args = parser.parse_args() + return args + + +def main(): + + args = parse_args() + + if len(args.shape) == 1: + input_shape = (1, 3, args.shape[0], args.shape[0]) + elif len(args.shape) == 2: + input_shape = ( + 1, + 3, + ) + tuple(args.shape) + elif len(args.shape) == 4: + # n, c, h, w = args.shape + input_shape = tuple(args.shape) + elif len(args.shape) == 5: + # n, c, t, h, w = args.shape + input_shape = tuple(args.shape) + else: + raise ValueError('invalid input shape') + + cfg = Config.fromfile(args.config) + model = build_recognizer( + cfg.model, + train_cfg=cfg.get('train_cfg'), + test_cfg=cfg.get('test_cfg')) + + model = model.cuda() + model.eval() + + if hasattr(model, 'forward_dummy'): + model.forward = model.forward_dummy + else: + raise NotImplementedError( + 'FLOPs counter is currently not supported with {}'. + format(model.__class__.__name__)) + + flops, params = get_model_complexity_info(model, input_shape) + split_line = '=' * 30 + print(f'{split_line}\nInput shape: {input_shape}\n' + f'Flops: {flops}\nParams: {params}\n{split_line}') + print('!!!Please be cautious if you use the results in papers. 
' + 'You may need to check if all ops are supported and verify that the ' + 'flops computation is correct.') + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/print_config.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/print_config.py new file mode 100644 index 0000000000000000000000000000000000000000..c3538ef56bdd07a841352c138ccf23ac3390561a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/analysis/print_config.py @@ -0,0 +1,27 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse + +from mmcv import Config, DictAction + + +def parse_args(): + parser = argparse.ArgumentParser(description='Print the whole config') + parser.add_argument('config', help='config file path') + parser.add_argument( + '--options', nargs='+', action=DictAction, help='arguments in dict') + args = parser.parse_args() + + return args + + +def main(): + args = parse_args() + + cfg = Config.fromfile(args.config) + if args.options is not None: + cfg.merge_from_dict(args.options) + print(f'Config:\n{cfg.pretty_text}') + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/report_accuracy.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/report_accuracy.py new file mode 100644 index 0000000000000000000000000000000000000000..329434d13f2fc26c81bc7af1ab92bd2fb777d493 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/analysis/report_accuracy.py @@ -0,0 +1,57 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse + +from mmcv import load +from scipy.special import softmax + +from mmaction.core.evaluation import (get_weighted_score, mean_class_accuracy, + top_k_accuracy) + + +def parse_args(): + parser = argparse.ArgumentParser(description='Fusing multiple scores') + parser.add_argument( + '--scores', + nargs='+', + help='list of scores', + default=['demo/fuse/rgb.pkl', 'demo/fuse/flow.pkl']) + parser.add_argument( + '--coefficients', + nargs='+', + type=float, + help='coefficients of each score file', + default=[1.0, 1.0]) + parser.add_argument( + '--datalist', + help='list of testing data', + default='demo/fuse/data_list.txt') + parser.add_argument('--apply-softmax', action='store_true') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + assert len(args.scores) == len(args.coefficients) + score_list = args.scores + score_list = [load(f) for f in score_list] + if args.apply_softmax: + + def apply_softmax(scores): + return [softmax(score) for score in scores] + + score_list = [apply_softmax(scores) for scores in score_list] + + weighted_scores = get_weighted_score(score_list, args.coefficients) + data = open(args.datalist).readlines() + labels = [int(x.strip().split()[-1]) for x in data] + + mean_class_acc = mean_class_accuracy(weighted_scores, labels) + top_1_acc, top_5_acc = top_k_accuracy(weighted_scores, labels, (1, 5)) + print(f'Mean Class Accuracy: {mean_class_acc:.04f}') + print(f'Top 1 Accuracy: {top_1_acc:.04f}') + print(f'Top 5 Accuracy: {top_5_acc:.04f}') + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/analysis/report_map.py b/openmmlab_test/mmaction2-0.24.1/tools/analysis/report_map.py new file mode 100644 index 0000000000000000000000000000000000000000..2aa46a1c505a74e1c9876a35a66ef4f05d45d1f9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/analysis/report_map.py @@ -0,0 +1,87 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +import os +import os.path as osp + +import mmcv +import numpy as np + +from mmaction.core import ActivityNetLocalization + +args = None + + +def cuhk17_top1(): + """Assign label for each proposal with the cuhk17 result, which is the #2 + entry in http://activity-net.org/challenges/2017/evaluation.html.""" + if not osp.exists('cuhk_anet17_pred.json'): + os.system('wget https://download.openmmlab.com/' + 'mmaction/localization/cuhk_anet17_pred.json') + proposal = mmcv.load(args.proposal) + results = proposal['results'] + cuhk_pred = mmcv.load('cuhk_anet17_pred.json')['results'] + + def get_topk(preds, k): + preds.sort(key=lambda x: x['score']) + return preds[-k:] + + for k, v in results.items(): + action_pred = cuhk_pred[k] + top1 = get_topk(action_pred, 1) + top1_label = top1[0]['label'] + new_value = [] + for item in v: + x = dict(label=top1_label) + x.update(item) + new_value.append(x) + results[k] = new_value + proposal['results'] = results + mmcv.dump(proposal, args.det_output) + + +cls_funcs = {'cuhk17_top1': cuhk17_top1} + + +def parse_args(): + parser = argparse.ArgumentParser(description='Report detection mAP for ' + 'ActivityNet proposal file') + parser.add_argument('--proposal', type=str, help='proposal file') + parser.add_argument( + '--gt', + type=str, + default='data/ActivityNet/' + 'anet_anno_val.json', + help='groundtruth file') + parser.add_argument( + '--cls', + type=str, + default='cuhk17_top1', + choices=['cuhk17_top1'], + help='the way to assign label for each ' + 'proposal') + parser.add_argument( + '--det-output', + type=str, + default='det_result.json', + help='the path to store detection results') + args = parser.parse_args() + return args + + +def main(): + global args, cls_funcs + args = parse_args() + func = cls_funcs[args.cls] + func() + anet_detection = ActivityNetLocalization( + args.gt, + args.det_output, + tiou_thresholds=np.linspace(0.5, 0.95, 10), + verbose=True) + mAP, average_mAP = anet_detection.evaluate() + 
print('[RESULTS] Performance on ActivityNet detection task.\n' + f'mAP: {mAP}\nAverage-mAP: {average_mAP}') + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/argparse.bash b/openmmlab_test/mmaction2-0.24.1/tools/argparse.bash new file mode 100644 index 0000000000000000000000000000000000000000..4e034cdc924a575a0455d1833b7a064220078ed0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/argparse.bash @@ -0,0 +1,103 @@ +#!/usr/bin/env bash + +# Use python's argparse module in shell scripts +# +# The function `argparse` parses its arguments using +# argparse.ArgumentParser; the parser is defined in the function's +# stdin. +# +# Executing ``argparse.bash`` (as opposed to sourcing it) prints a +# script template. +# +# https://github.com/nhoffman/argparse-bash +# MIT License - Copyright (c) 2015 Noah Hoffman +# +# The MIT License (MIT) +# +# Copyright (c) 2015 Noah Hoffman +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. + +argparse(){ + argparser=$(mktemp 2>/dev/null || mktemp -t argparser) + cat > "$argparser" <> "$argparser" + + cat >> "$argparser" < /dev/null; then + eval $(python "$argparser" "$@") + retval=0 + else + python "$argparser" "$@" + retval=1 + fi + + rm "$argparser" + return $retval +} + +# print a script template when this script is executed +if [[ $0 == *argparse.bash ]]; then + cat < + +```BibTeX +@article{Heilbron2015ActivityNetAL, + title={ActivityNet: A large-scale video benchmark for human activity understanding}, + author={Fabian Caba Heilbron and Victor Escorcia and Bernard Ghanem and Juan Carlos Niebles}, + journal={2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, + year={2015}, + pages={961-970} +} +``` + +For basic dataset information, please refer to the official [website](http://activity-net.org/). +For action detection, you can either use the rescaled ActivityNet features provided in this [repo](https://github.com/wzmsltw/BSN-boundary-sensitive-network#code-and-data-preparation) or extract features with MMAction2 (which gives better performance). +We release both pipelines. +Before we start, please make sure that the current working directory is `$MMACTION2/tools/data/activitynet/`. + +## Option 1: Use the ActivityNet rescaled feature provided in this [repo](https://github.com/wzmsltw/BSN-boundary-sensitive-network#code-and-data-preparation) + +### Step 1. Download Annotations + +First of all, you can run the following script to download annotation files. + +```shell +bash download_feature_annotations.sh +``` + +### Step 2. Prepare Video Features + +Then, you can run the following script to download the ActivityNet features. 
+ +```shell +bash download_features.sh +``` + +### Step 3. Process Annotation Files + +Next, you can run the following script to process the downloaded annotation files for training and testing. +It first merges the two annotation files and then splits the annotations into `train`, `val` and `test`. + +```shell +python process_annotations.py +``` + +## Option 2: Extract ActivityNet feature using MMAction2 with all videos provided on the official [website](http://activity-net.org/) + +### Step 1. Download Annotations + +First of all, you can run the following script to download annotation files. + +```shell +bash download_annotations.sh +``` + +### Step 2. Prepare Videos + +Then, you can run the following script to prepare videos. +The code is adapted from the [official crawler](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics). Note that this might take a long time. + +```shell +bash download_videos.sh +``` + +Since some videos in the ActivityNet dataset might no longer be available on YouTube, the official [website](http://activity-net.org/) has made the full dataset available on Google and Baidu drives. +To request the missing data, you can fill in this [request form](https://docs.google.com/forms/d/e/1FAIpQLSeKaFq9ZfcmZ7W0B0PbEhfbTHY41GeEgwsa7WobJgGUhn4DTQ/viewform) provided on the official [download page](http://activity-net.org/download.html) to get 7-day access for downloading the videos from the drive folders. + +We also provide download steps for the annotations from the [BSN repo](https://github.com/wzmsltw/BSN-boundary-sensitive-network#code-and-data-preparation): + +```shell +bash download_bsn_videos.sh +``` + +In this case, the download script updates the annotation file after downloading to make sure every video in it exists. + +### Step 3. Extract RGB and Flow + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). 
+ +Use the following script to extract both RGB frames and optical flow. + +```shell +bash extract_frames.sh +``` + +The command above generates images whose short edge is resized to 256. If you want to generate images with short edge 320 (320p), or with a fixed size of 340x256, you can change the argument `--new-short 256` to `--new-short 320` or `--new-width 340 --new-height 256`. +More details can be found in [data_preparation](/docs/data_preparation.md). + +### Step 4. Generate File List for ActivityNet Finetuning + +With the extracted frames, you can generate video-level or clip-level lists of rawframes, which can be used for ActivityNet finetuning. + +```shell +python generate_rawframes_filelist.py +``` + +### Step 5. Finetune TSN models on ActivityNet + +You can use the ActivityNet configs in `configs/recognition/tsn` to finetune TSN models on ActivityNet. +You need to use Kinetics models for pretraining. +Both RGB models and Flow models are supported. + +### Step 6. Extract ActivityNet Feature with finetuned ckpts + +After finetuning TSN on ActivityNet, you can use it to extract both RGB and Flow features. 
+ +```shell +python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_train_video.txt --output-prefix ../../../data/ActivityNet/rgb_feat --modality RGB --ckpt /path/to/rgb_checkpoint.pth + +python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_val_video.txt --output-prefix ../../../data/ActivityNet/rgb_feat --modality RGB --ckpt /path/to/rgb_checkpoint.pth + +python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_train_video.txt --output-prefix ../../../data/ActivityNet/flow_feat --modality Flow --ckpt /path/to/flow_checkpoint.pth + +python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_val_video.txt --output-prefix ../../../data/ActivityNet/flow_feat --modality Flow --ckpt /path/to/flow_checkpoint.pth +``` + +After feature extraction, you can use our post-processing script to pool the RGB and Flow features to 100 temporal steps and concatenate them, generating the `100-t X 400-d` feature for action detection. + +```shell +python activitynet_feature_postprocessing.py --rgb ../../../data/ActivityNet/rgb_feat --flow ../../../data/ActivityNet/flow_feat --dest ../../../data/ActivityNet/mmaction_feat +``` + +## Final Step. Check Directory Structure + +After completing the whole ActivityNet data preparation pipeline, +you will have the features, videos, frames and annotation files. + +In the context of the whole project (for ActivityNet only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ActivityNet + +(if Option 1 used) +│ │ ├── anet_anno_{train,val,test,full}.json +│ │ ├── anet_anno_action.json +│ │ ├── video_info_new.csv +│ │ ├── activitynet_feature_cuhk +│ │ │ ├── csv_mean_100 +│ │ │ │ ├── v___c8enCfzqw.csv +│ │ │ │ ├── v___dXUJsj3yo.csv +│ │ │ │ ├── ..
+ +(if Option 2 used) +│ │ ├── anet_train_video.txt +│ │ ├── anet_val_video.txt +│ │ ├── anet_train_clip.txt +│ │ ├── anet_val_clip.txt +│ │ ├── activity_net.v1-3.min.json +│ │ ├── mmaction_feat +│ │ │ ├── v___c8enCfzqw.csv +│ │ │ ├── v___dXUJsj3yo.csv +│ │ │ ├── .. +│ │ ├── rawframes +│ │ │ ├── v___c8enCfzqw +│ │ │ │ ├── img_00000.jpg +│ │ │ │ ├── flow_x_00000.jpg +│ │ │ │ ├── flow_y_00000.jpg +│ │ │ │ ├── .. +│ │ │ ├── .. + +``` + +For training and evaluating on ActivityNet, please refer to [getting_started.md](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..7687b948db7a747c147aeee8c0b33632c1e9b489 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/README_zh-CN.md @@ -0,0 +1,169 @@ +# Preparing ActivityNet + +## Introduction + + + +```BibTeX +@article{Heilbron2015ActivityNetAL, + title={ActivityNet: A large-scale video benchmark for human activity understanding}, + author={Fabian Caba Heilbron and Victor Escorcia and Bernard Ghanem and Juan Carlos Niebles}, + journal={2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, + year={2015}, + pages={961-970} +} +``` + +Users can refer to the dataset's [official website](http://activity-net.org/) for basic information about the dataset. +For the temporal action detection task, users can either use the rescaled ActivityNet features provided by this [repo](https://github.com/wzmsltw/BSN-boundary-sensitive-network#code-and-data-preparation), +or extract features with MMAction2 (which yields better accuracy). MMAction2 provides both of the data pipelines described above. +Before preparing the dataset, please make sure the current shell directory is `$MMACTION2/tools/data/activitynet/`. + +## Option 1: Use the features provided by this [repo](https://github.com/wzmsltw/BSN-boundary-sensitive-network#code-and-data-preparation) + +### Step 1. Download Annotations + +First, users can run the following command to download the annotation files. + +```shell +bash download_feature_annotations.sh +``` + +### Step 2. Prepare Video Features + +Then, users can run the following command to download the ActivityNet features. + +```shell +bash download_features.sh +``` + +### Step 3.
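Process Annotation Files + +Next, users can run the following command to process the downloaded annotation files for training and testing. +The script first merges the two annotation files and then splits them into `train`, `val` and `test`. + +```shell +python process_annotations.py +``` + +## Option 2: Extract features with MMAction2 from the videos provided on the [official website](http://activity-net.org/) + +### Step 1. Download Annotations + +First, users can run the following command to download the annotation files. + +```shell +bash download_annotations.sh +``` + +### Step 2. Prepare Videos + +Then, users can run the following script to prepare the video data. +The code is adapted from the [official crawler](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics). Note that this might take a long time. + +```shell +bash download_videos.sh +``` + +Since some videos in the ActivityNet dataset are no longer available on YouTube, the [official website](http://activity-net.org/) provides the full dataset on Google Drive and Baidu drive. +To obtain the missing videos, users need to fill in the [request form](https://docs.google.com/forms/d/e/1FAIpQLSeKaFq9ZfcmZ7W0B0PbEhfbTHY41GeEgwsa7WobJgGUhn4DTQ/viewform) provided on the [download page](http://activity-net.org/download.html) to get 7-day download access. + +MMAction2 also provides download steps for the annotation files of the [BSN repo](https://github.com/wzmsltw/BSN-boundary-sensitive-network#code-and-data-preparation). + +```shell +bash download_bsn_videos.sh +``` + +In this case, the download script updates the annotation file after downloading so that every video listed in it exists. + +### Step 3. Extract RGB Frames and Optical Flow + +Before extracting frames and optical flow, please refer to the [installation guide](/docs_zh_CN/install.md) to install [denseflow](https://github.com/open-mmlab/denseflow). + +Use the following command to extract frames and optical flow. + +```shell +bash extract_frames.sh +``` + +The script above generates frames whose short edge is 256. To generate frames with short edge 320 (320p), or with a fixed size of 340x256, change the argument `--new-short 256` to `--new-short 320` or `--new-width 340 --new-height 256`. +More details can be found in the [data preparation guide](/docs_zh_CN/data_preparation.md). + +### Step 4. Generate File List for ActivityNet Finetuning + +With the extracted frames, users can generate video-level or clip-level file lists, which can be used for finetuning on ActivityNet. + +```shell +python generate_rawframes_filelist.py +``` + +### Step 5. Finetune TSN Models on ActivityNet + +Users can use the ActivityNet configs in `configs/recognition/tsn` to finetune TSN models. +Users need to use Kinetics models for pretraining (both RGB and optical flow models are supported). + +### Step 6.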
Extract ActivityNet Features with the Finetuned Models + +After finetuning TSN on ActivityNet, users can use the model to extract both RGB and optical flow features. + +```shell +python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_train_video.txt --output-prefix ../../../data/ActivityNet/rgb_feat --modality RGB --ckpt /path/to/rgb_checkpoint.pth + +python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_val_video.txt --output-prefix ../../../data/ActivityNet/rgb_feat --modality RGB --ckpt /path/to/rgb_checkpoint.pth + +python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_train_video.txt --output-prefix ../../../data/ActivityNet/flow_feat --modality Flow --ckpt /path/to/flow_checkpoint.pth + +python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_val_video.txt --output-prefix ../../../data/ActivityNet/flow_feat --modality Flow --ckpt /path/to/flow_checkpoint.pth +``` + +After feature extraction, users can run the post-processing script to combine the RGB and optical flow features into a `100-t X 400-d` feature for temporal action detection. + +```shell +python activitynet_feature_postprocessing.py --rgb ../../../data/ActivityNet/rgb_feat --flow ../../../data/ActivityNet/flow_feat --dest ../../../data/ActivityNet/mmaction_feat +``` + +## Final Step: Check Directory Structure + +After completing the whole ActivityNet preparation pipeline, users will have the feature files, RGB and optical flow frames, video files and annotation files. + +Within the whole MMAction2 folder, the ActivityNet directory structure looks like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ActivityNet + +(if Option 1 used) +│ │ ├── anet_anno_{train,val,test,full}.json +│ │ ├── anet_anno_action.json +│ │ ├── video_info_new.csv +│ │ ├── activitynet_feature_cuhk +│ │ │ ├── csv_mean_100 +│ │ │ │ ├── v___c8enCfzqw.csv +│ │ │ │ ├── v___dXUJsj3yo.csv +│ │ │ │ ├── ..
+ +(if Option 2 used) +│ │ ├── anet_train_video.txt +│ │ ├── anet_val_video.txt +│ │ ├── anet_train_clip.txt +│ │ ├── anet_val_clip.txt +│ │ ├── activity_net.v1-3.min.json +│ │ ├── mmaction_feat +│ │ │ ├── v___c8enCfzqw.csv +│ │ │ ├── v___dXUJsj3yo.csv +│ │ │ ├── .. +│ │ ├── rawframes +│ │ │ ├── v___c8enCfzqw +│ │ │ │ ├── img_00000.jpg +│ │ │ │ ├── flow_x_00000.jpg +│ │ │ │ ├── flow_y_00000.jpg +│ │ │ │ ├── .. +│ │ │ ├── .. + +``` + +For training and evaluating on ActivityNet, please refer to the [getting started guide](/docs_zh_CN/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/action_name.csv b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/action_name.csv new file mode 100644 index 0000000000000000000000000000000000000000..5f5fe1d9c9a2a2895e034ffaac7182bf304158d8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/action_name.csv @@ -0,0 +1,201 @@ +action +Applying sunscreen +Arm wrestling +Assembling bicycle +BMX +Baking cookies +Baton twirling +Beach soccer +Beer pong +Blow-drying hair +Blowing leaves +Playing ten pins +Braiding hair +Building sandcastles +Bullfighting +Calf roping +Camel ride +Canoeing +Capoeira +Carving jack-o-lanterns +Changing car wheel +Cleaning sink +Clipping cat claws +Croquet +Curling +Cutting the grass +Decorating the Christmas tree +Disc dog +Doing a powerbomb +Doing crunches +Drum corps +Elliptical trainer +Doing fencing +Fixing the roof +Fun sliding down +Futsal +Gargling mouthwash +Grooming dog +Hand car wash +Hanging wallpaper +Having an ice cream +Hitting a pinata +Hula hoop +Hurling +Ice fishing +Installing carpet +Kite flying +Kneeling +Knitting +Laying tile +Longboarding +Making a cake +Making a lemonade +Making an omelette +Mooping floor +Painting fence +Painting furniture +Peeling potatoes +Plastering +Playing beach volleyball +Playing blackjack +Playing congas +Playing drums +Playing ice hockey +Playing pool +Playing rubik cube +Powerbocking +Putting in contact lenses +Putting on shoes +Rafting +Raking
leaves +Removing ice from car +Riding bumper cars +River tubing +Rock-paper-scissors +Rollerblading +Roof shingle removal +Rope skipping +Running a marathon +Scuba diving +Sharpening knives +Shuffleboard +Skiing +Slacklining +Snow tubing +Snowboarding +Spread mulch +Sumo +Surfing +Swimming +Swinging at the playground +Table soccer +Throwing darts +Trimming branches or hedges +Tug of war +Using the monkey bar +Using the rowing machine +Wakeboarding +Waterskiing +Waxing skis +Welding +Drinking coffee +Zumba +Doing kickboxing +Doing karate +Tango +Putting on makeup +High jump +Playing bagpipes +Cheerleading +Wrapping presents +Cricket +Clean and jerk +Preparing pasta +Bathing dog +Discus throw +Playing field hockey +Grooming horse +Preparing salad +Playing harmonica +Playing saxophone +Chopping wood +Washing face +Using the pommel horse +Javelin throw +Spinning +Ping-pong +Making a sandwich +Brushing hair +Playing guitarra +Doing step aerobics +Drinking beer +Playing polo +Snatch +Paintball +Long jump +Cleaning windows +Brushing teeth +Playing flauta +Tennis serve with ball bouncing +Bungee jumping +Triple jump +Horseback riding +Layup drill in basketball +Vacuuming floor +Cleaning shoes +Doing nails +Shot put +Fixing bicycle +Washing hands +Ironing clothes +Using the balance beam +Shoveling snow +Tumbling +Using parallel bars +Getting a tattoo +Rock climbing +Smoking hookah +Shaving +Getting a piercing +Springboard diving +Playing squash +Playing piano +Dodgeball +Smoking a cigarette +Sailing +Getting a haircut +Playing lacrosse +Cumbia +Tai chi +Painting +Mowing the lawn +Shaving legs +Walking the dog +Hammer throw +Skateboarding +Polishing shoes +Ballet +Hand washing clothes +Plataform diving +Playing violin +Breakdancing +Windsurfing +Hopscotch +Doing motocross +Mixing drinks +Starting a campfire +Belly dance +Removing curlers +Archery +Volleyball +Playing water polo +Playing racquetball +Kayaking +Polishing forniture +Playing kickball +Using uneven bars +Washing 
dishes +Pole vault +Playing accordion +Playing badminton diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/activitynet_feature_postprocessing.py b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/activitynet_feature_postprocessing.py new file mode 100644 index 0000000000000000000000000000000000000000..8dcd7bfe2612c96bd5e5ad0e7bd28356d3a87b06 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/activitynet_feature_postprocessing.py @@ -0,0 +1,99 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import multiprocessing +import os +import os.path as osp + +import numpy as np +import scipy.interpolate +from mmcv import dump, load + +args = None + + +def parse_args(): + parser = argparse.ArgumentParser(description='ANet Feature Prepare') + parser.add_argument('--rgb', default='', help='rgb feature root') + parser.add_argument('--flow', default='', help='flow feature root') + parser.add_argument('--dest', default='', help='dest root') + parser.add_argument('--output-format', default='csv') + args = parser.parse_args() + return args + + +def pool_feature(data, num_proposals=100, num_sample_bins=3, pool_type='mean'): + """Pool features with arbitrary temporal length. + + Args: + data (list[np.ndarray] | np.ndarray): Features of an untrimmed video, + with arbitrary temporal length. + num_proposals (int): The temporal dim of pooled feature. Default: 100. + num_sample_bins (int): How many points to sample to get the feature + vector at one timestamp. Default: 3. + pool_type (str): Type of pooling to pool features. Choices are + ['mean', 'max']. Default: 'mean'. + + Returns: + np.ndarray: The pooled feature with shape num_proposals x feature_dim. 
+ """ + if len(data) == 1: + return np.concatenate([data] * num_proposals) + x_range = list(range(len(data))) + f = scipy.interpolate.interp1d(x_range, data, axis=0) + eps = 1e-4 + start, end = eps, len(data) - 1 - eps + anchor_size = (end - start) / num_proposals + ptr = start + feature = [] + for _ in range(num_proposals): + x_new = [ + ptr + i / num_sample_bins * anchor_size + for i in range(num_sample_bins) + ] + y_new = f(x_new) + if pool_type == 'mean': + y_new = np.mean(y_new, axis=0) + elif pool_type == 'max': + y_new = np.max(y_new, axis=0) + else: + raise NotImplementedError('Unsupported pool type') + feature.append(y_new) + ptr += anchor_size + feature = np.stack(feature) + return feature + + +def merge_feat(name): + # concatenate rgb feat and flow feat for a single sample + rgb_feat = load(osp.join(args.rgb, name)) + flow_feat = load(osp.join(args.flow, name)) + rgb_feat = pool_feature(rgb_feat) + flow_feat = pool_feature(flow_feat) + feat = np.concatenate([rgb_feat, flow_feat], axis=-1) + if not osp.exists(args.dest): + os.system(f'mkdir -p {args.dest}') + if args.output_format == 'pkl': + dump(feat, osp.join(args.dest, name)) + elif args.output_format == 'csv': + feat = feat.tolist() + lines = [] + line0 = ','.join([f'f{i}' for i in range(400)]) + lines.append(line0) + for line in feat: + lines.append(','.join([f'{x:.4f}' for x in line])) + with open(osp.join(args.dest, name.replace('.pkl', '.csv')), 'w') as f: + f.write('\n'.join(lines)) + + +def main(): + global args + args = parse_args() + rgb_feat = os.listdir(args.rgb) + flow_feat = os.listdir(args.flow) + assert set(rgb_feat) == set(flow_feat) + pool = multiprocessing.Pool(32) + pool.map(merge_feat, rgb_feat) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/convert_proposal_format.py b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/convert_proposal_format.py new file mode 100644 index 
0000000000000000000000000000000000000000..f2f8613eb4192382375714fb99d1d955970caf2f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/convert_proposal_format.py @@ -0,0 +1,162 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""This file converts the output proposal file of proposal generator (BSN, BMN) +into the input proposal file of action classifier (Currently supports SSN and +P-GCN, not including TSN, I3D etc.).""" +import argparse + +import mmcv +import numpy as np + +from mmaction.core import pairwise_temporal_iou + + +def load_annotations(ann_file): + """Load the annotation according to ann_file into video_infos.""" + video_infos = [] + anno_database = mmcv.load(ann_file) + for video_name in anno_database: + video_info = anno_database[video_name] + video_info['video_name'] = video_name + video_infos.append(video_info) + return video_infos + + +def import_ground_truth(video_infos, activity_index): + """Read ground truth data from video_infos.""" + ground_truth = {} + for video_info in video_infos: + video_id = video_info['video_name'][2:] + this_video_ground_truths = [] + for ann in video_info['annotations']: + t_start, t_end = ann['segment'] + label = activity_index[ann['label']] + this_video_ground_truths.append([t_start, t_end, label]) + ground_truth[video_id] = np.array(this_video_ground_truths) + return ground_truth + + +def import_proposals(result_dict): + """Read predictions from result dict.""" + proposals = {} + num_proposals = 0 + for video_id in result_dict: + result = result_dict[video_id] + this_video_proposals = [] + for proposal in result: + t_start, t_end = proposal['segment'] + score = proposal['score'] + this_video_proposals.append([t_start, t_end, score]) + num_proposals += 1 + proposals[video_id] = np.array(this_video_proposals) + return proposals, num_proposals + + +def dump_formatted_proposal(video_idx, video_id, num_frames, fps, gts, + proposals, tiou, t_overlap_self, + formatted_proposal_file): + """dump 
the formatted proposal file, which is the input proposal file of + action classifier (e.g: SSN). + + Args: + video_idx (int): Index of video. + video_id (str): ID of video. + num_frames (int): Total frames of the video. + fps (float): Fps of the video. + gts (np.ndarray[float]): t_start, t_end and label of groundtruths. + proposals (np.ndarray[float]): t_start, t_end and score of proposals. + tiou (np.ndarray[float]): 2-dim array with IoU ratio. + t_overlap_self (np.ndarray[float]): 2-dim array with overlap_self + (union / self_len) ratio. + formatted_proposal_file (open file object): Open file object of + formatted_proposal_file. + """ + + formatted_proposal_file.write( + f'#{video_idx}\n{video_id}\n{num_frames}\n{fps}\n{gts.shape[0]}\n') + for gt in gts: + formatted_proposal_file.write(f'{int(gt[2])} {gt[0]} {gt[1]}\n') + formatted_proposal_file.write(f'{proposals.shape[0]}\n') + + best_iou = np.amax(tiou, axis=0) + best_iou_index = np.argmax(tiou, axis=0) + best_overlap = np.amax(t_overlap_self, axis=0) + best_overlap_index = np.argmax(t_overlap_self, axis=0) + + for i in range(proposals.shape[0]): + index_iou = best_iou_index[i] + index_overlap = best_overlap_index[i] + label_iou = gts[index_iou][2] + label_overlap = gts[index_overlap][2] + if label_iou != label_overlap: + label = label_iou if label_iou != 0 else label_overlap + else: + label = label_iou + if best_iou[i] == 0 and best_overlap[i] == 0: + formatted_proposal_file.write( + f'0 0 0 {proposals[i][0]} {proposals[i][1]}\n') + else: + formatted_proposal_file.write( + f'{int(label)} {best_iou[i]} {best_overlap[i]} ' + f'{proposals[i][0]} {proposals[i][1]}\n') + + +def parse_args(): + parser = argparse.ArgumentParser(description='convert proposal format') + parser.add_argument( + '--ann-file', + type=str, + default='../../../data/ActivityNet/anet_anno_val.json', + help='name of annotation file') + parser.add_argument( + '--activity-index-file', + type=str, + 
default='../../../data/ActivityNet/anet_activity_indexes_val.txt', + help='name of activity index file') + parser.add_argument( + '--proposal-file', + type=str, + default='../../../results.json', + help='name of proposal file, which is the ' + 'output of proposal generator (BMN)') + parser.add_argument( + '--formatted-proposal-file', + type=str, + default='../../../anet_val_formatted_proposal.txt', + help='name of formatted proposal file, which is the ' + 'input of action classifier (SSN)') + args = parser.parse_args() + + return args + + +if __name__ == '__main__': + args = parse_args() + formatted_proposal_file = open(args.formatted_proposal_file, 'w') + + # The activity index file is constructed according to + # 'https://github.com/activitynet/ActivityNet/blob/master/Evaluation/eval_classification.py' + activity_index, class_idx = {}, 0 + for line in open(args.activity_index_file).readlines(): + activity_index[line.strip()] = class_idx + class_idx += 1 + + video_infos = load_annotations(args.ann_file) + ground_truth = import_ground_truth(video_infos, activity_index) + proposal, num_proposals = import_proposals( + mmcv.load(args.proposal_file)['results']) + video_idx = 0 + + for video_info in video_infos: + video_id = video_info['video_name'][2:] + num_frames = video_info['duration_frame'] + fps = video_info['fps'] + tiou, t_overlap = pairwise_temporal_iou( + proposal[video_id][:, :2].astype(float), + ground_truth[video_id][:, :2].astype(float), + calculate_overlap_self=True) + + dump_formatted_proposal(video_idx, video_id, num_frames, fps, + ground_truth[video_id], proposal[video_id], + tiou, t_overlap, formatted_proposal_file) + video_idx += 1 + formatted_proposal_file.close() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download.py b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download.py new file mode 100644 index 0000000000000000000000000000000000000000..1d1bf41a2dc53018cc514d14fd6bf4e5324ca6d9 --- /dev/null +++
b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download.py @@ -0,0 +1,148 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# This scripts is copied from +# https://github.com/activitynet/ActivityNet/blob/master/Crawler/Kinetics/download.py # noqa: E501 +# The code is licensed under the MIT licence. +import argparse +import os +import ssl +import subprocess + +import mmcv +from joblib import Parallel, delayed + +ssl._create_default_https_context = ssl._create_unverified_context +data_file = '../../../data/ActivityNet' +output_dir = f'{data_file}/videos' + + +def parse_args(): + parser = argparse.ArgumentParser(description='ActivityNet downloader') + parser.add_argument( + '--bsn', + action='store_true', + help='download for BSN annotation or official one') + args = parser.parse_args() + return args + + +def download_clip(video_identifier, + output_filename, + num_attempts=5, + url_base='https://www.youtube.com/watch?v='): + """Download a video from youtube if exists and is not blocked. + arguments: + --------- + video_identifier: str + Unique YouTube video identifier (11 characters) + output_filename: str + File path where the video will be stored. + """ + # Defensive argument checking. 
+ assert isinstance(video_identifier, str), 'video_identifier must be string' + assert isinstance(output_filename, str), 'output_filename must be string' + assert len(video_identifier) == 11, 'video_identifier must have length 11' + + status = False + + if not os.path.exists(output_filename): + command = [ + 'youtube-dl', '--quiet', '--no-warnings', '--no-check-certificate', + '-f', 'mp4', '-o', + '"%s"' % output_filename, + '"%s"' % (url_base + video_identifier) + ] + command = ' '.join(command) + print(command) + attempts = 0 + while True: + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError: + attempts += 1 + if attempts == num_attempts: + return status, 'Fail' + else: + break + # Check if the video was successfully saved. + status = os.path.exists(output_filename) + return status, 'Downloaded' + + +def download_clip_wrapper(youtube_id, output_dir): + """Wrapper for parallel processing purposes.""" + # we do this to align with names in annotations + output_filename = os.path.join(output_dir, 'v_' + youtube_id + '.mp4') + if os.path.exists(output_filename): + status = tuple(['v_' + youtube_id, True, 'Exists']) + return status + + downloaded, log = download_clip(youtube_id, output_filename) + status = tuple(['v_' + youtube_id, downloaded, log]) + return status + + +def parse_activitynet_annotations(input_csv, is_bsn_case=False): + """Returns a list of YoutubeID. + arguments: + --------- + input_csv: str + Path to CSV file containing the following columns: + 'video,numFrame,seconds,fps,rfps,subset,featureFrame' + returns: + ------- + youtube_ids: list + List of all YoutubeIDs in ActivityNet. 
+ + """ + if is_bsn_case: + lines = open(input_csv).readlines() + lines = lines[1:] + # YoutubeIDs do not have prefix `v_` + youtube_ids = [x.split(',')[0][2:] for x in lines] + else: + data = mmcv.load(input_csv)['database'] + youtube_ids = list(data.keys()) + + return youtube_ids + + +def main(input_csv, output_dir, anno_file, num_jobs=24, is_bsn_case=False): + # Reading and parsing ActivityNet. + youtube_ids = parse_activitynet_annotations(input_csv, is_bsn_case) + + # Creates folders where videos will be saved later. + if not os.path.exists(output_dir): + os.makedirs(output_dir) + # Download all clips. + if num_jobs == 1: + status_list = [] + for index in youtube_ids: + status_list.append(download_clip_wrapper(index, output_dir)) + else: + status_list = Parallel(n_jobs=num_jobs)( + delayed(download_clip_wrapper)(index, output_dir) + for index in youtube_ids) + + # Save download report. + mmcv.dump(status_list, 'download_report.json') + # Only the BSN annotation file is keyed by `v_<id>`, so filter it here. + if is_bsn_case: + annotation = mmcv.load(anno_file) + downloaded = {status[0]: status[1] for status in status_list} + annotation = {k: v for k, v in annotation.items() if downloaded[k]} + anno_file_bak = anno_file.replace('.json', '_bak.json') + os.rename(anno_file, anno_file_bak) + mmcv.dump(annotation, anno_file) + + +if __name__ == '__main__': + args = parse_args() + is_bsn_case = args.bsn + if is_bsn_case: + video_list = f'{data_file}/video_info_new.csv' + anno_file = f'{data_file}/anet_anno_action.json' + else: + video_list = f'{data_file}/activity_net.v1-3.min.json' + anno_file = video_list + main(video_list, output_dir, anno_file, 24, is_bsn_case) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..3be2e229215d79fca6f2f52990248cd2e915143d --- /dev/null +++
b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_annotations.sh @@ -0,0 +1,12 @@ +DATA_DIR="../../../data/ActivityNet/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +cd ${DATA_DIR} + +wget http://ec2-52-25-205-214.us-west-2.compute.amazonaws.com/files/activity_net.v1-3.min.json + +cd - diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_bsn_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_bsn_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..706aafc5279c52a46ed0baa326c2b23cd7a4f6c1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_bsn_videos.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +# set up environment +conda env create -f environment.yml +source activate activitynet +pip install --upgrade youtube-dl +pip install mmcv + +DATA_DIR="../../../data/ActivityNet" +python download.py --bsn + +source deactivate activitynet +conda remove -n activitynet --all diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_feature_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_feature_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..9ef9fc0bbf455d7ab62787da588610e47f3b2b69 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_feature_annotations.sh @@ -0,0 +1,16 @@ +DATA_DIR="../../../data/ActivityNet/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. 
Creating"; + mkdir -p ${DATA_DIR} +fi + +cd ${DATA_DIR} + +wget https://raw.githubusercontent.com/wzmsltw/BSN-boundary-sensitive-network/master/data/activitynet_annotations/anet_anno_action.json + +wget https://raw.githubusercontent.com/wzmsltw/BSN-boundary-sensitive-network/master/data/activitynet_annotations/video_info_new.csv + +wget https://download.openmmlab.com/mmaction/localization/anet_activity_indexes_val.txt + +cd - diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_features.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_features.sh new file mode 100644 index 0000000000000000000000000000000000000000..b9762597b7ae92badba066bb02790508aa8982d3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_features.sh @@ -0,0 +1,11 @@ +DATA_DIR="../../../data/ActivityNet/activitynet_feature_cuhk/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1ISemndlSDS2FtqQOKL0t3Cjj9yk2yznF' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1ISemndlSDS2FtqQOKL0t3Cjj9yk2yznF" -O "csv_mean_100.zip" && rm -rf /tmp/cookies.txt + +unzip csv_mean_100.zip -d ${DATA_DIR}/ +rm csv_mean_100.zip diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..5d10a1017d52e19d74a7651712cfe09da6cf192b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/download_videos.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +# set up environment +conda env create -f environment.yml +source activate activitynet +pip install --upgrade youtube-dl +pip 
install mmcv + +DATA_DIR="../../../data/ActivityNet" +python download.py + +source deactivate activitynet +conda remove -n activitynet --all diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/environment.yml b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/environment.yml new file mode 100644 index 0000000000000000000000000000000000000000..f4e6d51fe8021aecf4829252a5966202debde4da --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/environment.yml @@ -0,0 +1,36 @@ +name: activitynet +channels: + - anaconda + - menpo + - conda-forge + - defaults +dependencies: + - ca-certificates=2020.1.1 + - certifi=2020.4.5.1 + - ffmpeg=2.8.6 + - libcxx=10.0.0 + - libedit=3.1.20181209 + - libffi=3.3 + - ncurses=6.2 + - openssl=1.1.1g + - pip=20.0.2 + - python=3.7.7 + - readline=8.0 + - setuptools=46.4.0 + - sqlite=3.31.1 + - tk=8.6.8 + - wheel=0.34.2 + - xz=5.2.5 + - zlib=1.2.11 + - pip: + - decorator==4.4.2 + - intel-openmp==2019.0 + - joblib==0.15.1 + - mkl==2019.0 + - numpy==1.18.4 + - olefile==0.46 + - pandas==1.0.3 + - python-dateutil==2.8.1 + - pytz==2020.1 + - six==1.14.0 + - youtube-dl diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..1449dded1f8d5b2281f500577f32e03067595b7f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/extract_frames.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash +cd ../ +python build_rawframes.py ../../data/ActivityNet/videos/ ../../data/ActivityNet/rawframes/ --level 1 --flow-type tvl1 --ext mp4 --task both --new-short 256 +echo "Raw frames (RGB and tv-l1) Generated for train set" + +cd activitynet/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/generate_rawframes_filelist.py b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/generate_rawframes_filelist.py new file mode 100644 index 
0000000000000000000000000000000000000000..4be9262288d0ab3bece0ecf07ffe3693dae814f6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/generate_rawframes_filelist.py @@ -0,0 +1,113 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import json +import os +import os.path as osp + +data_file = '../../../data/ActivityNet' +video_list = f'{data_file}/video_info_new.csv' +anno_file = f'{data_file}/anet_anno_action.json' +rawframe_dir = f'{data_file}/rawframes' +action_name_list = 'action_name.csv' + +train_rawframe_dir = rawframe_dir +val_rawframe_dir = rawframe_dir + +json_file = f'{data_file}/activity_net.v1-3.min.json' + + +def generate_rawframes_filelist(): + load_dict = json.load(open(json_file)) + + anet_labels = open(action_name_list).readlines() + anet_labels = [x.strip() for x in anet_labels[1:]] + + train_dir_list = [ + osp.join(train_rawframe_dir, x) for x in os.listdir(train_rawframe_dir) + ] + val_dir_list = [ + osp.join(val_rawframe_dir, x) for x in os.listdir(val_rawframe_dir) + ] + + def simple_label(anno): + label = anno[0]['label'] + return anet_labels.index(label) + + def count_frames(dir_list, video): + for dir_name in dir_list: + if video in dir_name: + return osp.basename(dir_name), len(os.listdir(dir_name)) + return None, None + + database = load_dict['database'] + training = {} + validation = {} + key_dict = {} + + for k in database: + data = database[k] + subset = data['subset'] + + if subset in ['training', 'validation']: + annotations = data['annotations'] + label = simple_label(annotations) + if subset == 'training': + dir_list = train_dir_list + data_dict = training + else: + dir_list = val_dir_list + data_dict = validation + + else: + continue + + gt_dir_name, num_frames = count_frames(dir_list, k) + if gt_dir_name is None: + continue + data_dict[gt_dir_name] = [num_frames, label] + key_dict[gt_dir_name] = k + + train_lines = [ + k + ' ' + str(training[k][0]) + ' ' + str(training[k][1]) + for k in training + ] + 
val_lines = [ + k + ' ' + str(validation[k][0]) + ' ' + str(validation[k][1]) + for k in validation + ] + + with open(osp.join(data_file, 'anet_train_video.txt'), 'w') as fout: + fout.write('\n'.join(train_lines)) + with open(osp.join(data_file, 'anet_val_video.txt'), 'w') as fout: + fout.write('\n'.join(val_lines)) + + def clip_list(k, anno, video_anno): + duration = anno['duration'] + num_frames = video_anno[0] + fps = num_frames / duration + segs = anno['annotations'] + lines = [] + for seg in segs: + segment = seg['segment'] + label = seg['label'] + label = anet_labels.index(label) + start, end = int(segment[0] * fps), int(segment[1] * fps) + if end > num_frames - 1: + end = num_frames - 1 + newline = f'{k} {start} {end - start + 1} {label}' + lines.append(newline) + return lines + + train_clips, val_clips = [], [] + for k in training: + train_clips.extend(clip_list(k, database[key_dict[k]], training[k])) + for k in validation: + val_clips.extend(clip_list(k, database[key_dict[k]], validation[k])) + + with open(osp.join(data_file, 'anet_train_clip.txt'), 'w') as fout: + fout.write('\n'.join(train_clips)) + with open(osp.join(data_file, 'anet_val_clip.txt'), 'w') as fout: + fout.write('\n'.join(val_clips)) + + +if __name__ == '__main__': + generate_rawframes_filelist() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..6b1bb01db45609d71cad17c68c4789890e0230cd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/label_map.txt @@ -0,0 +1,200 @@ +Applying sunscreen +Arm wrestling +Assembling bicycle +BMX +Baking cookies +Baton twirling +Beach soccer +Beer pong +Blow-drying hair +Blowing leaves +Playing ten pins +Braiding hair +Building sandcastles +Bullfighting +Calf roping +Camel ride +Canoeing +Capoeira +Carving jack-o-lanterns +Changing car wheel +Cleaning sink +Clipping 
cat claws +Croquet +Curling +Cutting the grass +Decorating the Christmas tree +Disc dog +Doing a powerbomb +Doing crunches +Drum corps +Elliptical trainer +Doing fencing +Fixing the roof +Fun sliding down +Futsal +Gargling mouthwash +Grooming dog +Hand car wash +Hanging wallpaper +Having an ice cream +Hitting a pinata +Hula hoop +Hurling +Ice fishing +Installing carpet +Kite flying +Kneeling +Knitting +Laying tile +Longboarding +Making a cake +Making a lemonade +Making an omelette +Mooping floor +Painting fence +Painting furniture +Peeling potatoes +Plastering +Playing beach volleyball +Playing blackjack +Playing congas +Playing drums +Playing ice hockey +Playing pool +Playing rubik cube +Powerbocking +Putting in contact lenses +Putting on shoes +Rafting +Raking leaves +Removing ice from car +Riding bumper cars +River tubing +Rock-paper-scissors +Rollerblading +Roof shingle removal +Rope skipping +Running a marathon +Scuba diving +Sharpening knives +Shuffleboard +Skiing +Slacklining +Snow tubing +Snowboarding +Spread mulch +Sumo +Surfing +Swimming +Swinging at the playground +Table soccer +Throwing darts +Trimming branches or hedges +Tug of war +Using the monkey bar +Using the rowing machine +Wakeboarding +Waterskiing +Waxing skis +Welding +Drinking coffee +Zumba +Doing kickboxing +Doing karate +Tango +Putting on makeup +High jump +Playing bagpipes +Cheerleading +Wrapping presents +Cricket +Clean and jerk +Preparing pasta +Bathing dog +Discus throw +Playing field hockey +Grooming horse +Preparing salad +Playing harmonica +Playing saxophone +Chopping wood +Washing face +Using the pommel horse +Javelin throw +Spinning +Ping-pong +Making a sandwich +Brushing hair +Playing guitarra +Doing step aerobics +Drinking beer +Playing polo +Snatch +Paintball +Long jump +Cleaning windows +Brushing teeth +Playing flauta +Tennis serve with ball bouncing +Bungee jumping +Triple jump +Horseback riding +Layup drill in basketball +Vacuuming floor +Cleaning shoes +Doing nails +Shot put 
+Fixing bicycle +Washing hands +Ironing clothes +Using the balance beam +Shoveling snow +Tumbling +Using parallel bars +Getting a tattoo +Rock climbing +Smoking hookah +Shaving +Getting a piercing +Springboard diving +Playing squash +Playing piano +Dodgeball +Smoking a cigarette +Sailing +Getting a haircut +Playing lacrosse +Cumbia +Tai chi +Painting +Mowing the lawn +Shaving legs +Walking the dog +Hammer throw +Skateboarding +Polishing shoes +Ballet +Hand washing clothes +Plataform diving +Playing violin +Breakdancing +Windsurfing +Hopscotch +Doing motocross +Mixing drinks +Starting a campfire +Belly dance +Removing curlers +Archery +Volleyball +Playing water polo +Playing racquetball +Kayaking +Polishing forniture +Playing kickball +Using uneven bars +Washing dishes +Pole vault +Playing accordion +Playing badminton diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/process_annotations.py b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/process_annotations.py new file mode 100644 index 0000000000000000000000000000000000000000..09ed5b5c8f7ab1f83a129b3249ebd38ee8a86222 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/process_annotations.py @@ -0,0 +1,54 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+"""This file processes the annotation files and generates proper annotation +files for localizers.""" +import json + +import numpy as np + + +def load_json(file): + with open(file) as json_file: + data = json.load(json_file) + return data + + +data_file = '../../../data/ActivityNet' +info_file = f'{data_file}/video_info_new.csv' +ann_file = f'{data_file}/anet_anno_action.json' + +anno_database = load_json(ann_file) + +video_record = np.loadtxt(info_file, dtype=str, delimiter=',', skiprows=1) + +video_dict_train = {} +video_dict_val = {} +video_dict_test = {} +video_dict_full = {} + +for _, video_item in enumerate(video_record): + video_name = video_item[0] + video_info = anno_database[video_name] + video_subset = video_item[5] + video_info['fps'] = float(video_item[3]) + video_info['rfps'] = float(video_item[4]) + video_dict_full[video_name] = video_info + if video_subset == 'training': + video_dict_train[video_name] = video_info + elif video_subset == 'testing': + video_dict_test[video_name] = video_info + elif video_subset == 'validation': + video_dict_val[video_name] = video_info + +print(f'full subset video numbers: {len(video_record)}') + +with open(f'{data_file}/anet_anno_train.json', 'w') as result_file: + json.dump(video_dict_train, result_file) + +with open(f'{data_file}/anet_anno_val.json', 'w') as result_file: + json.dump(video_dict_val, result_file) + +with open(f'{data_file}/anet_anno_test.json', 'w') as result_file: + json.dump(video_dict_test, result_file) + +with open(f'{data_file}/anet_anno_full.json', 'w') as result_file: + json.dump(video_dict_full, result_file) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/tsn_feature_extraction.py b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/tsn_feature_extraction.py new file mode 100644 index 0000000000000000000000000000000000000000..c3d53f46e509b1d70132ab646ace9d4fe2714aab --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/tools/data/activitynet/tsn_feature_extraction.py @@ -0,0 +1,149 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import os.path as osp +import pickle + +import mmcv +import numpy as np +import torch + +from mmaction.datasets.pipelines import Compose +from mmaction.models import build_model + + +def parse_args(): + parser = argparse.ArgumentParser(description='Extract TSN Feature') + parser.add_argument('--data-prefix', default='', help='dataset prefix') + parser.add_argument('--output-prefix', default='', help='output prefix') + parser.add_argument( + '--data-list', + help='video list of the dataset, the format should be ' + '`frame_dir num_frames output_file`') + parser.add_argument( + '--frame-interval', + type=int, + default=16, + help='the sampling frequency of frame in the untrimmed video') + parser.add_argument('--modality', default='RGB', choices=['RGB', 'Flow']) + parser.add_argument('--ckpt', help='checkpoint for feature extraction') + parser.add_argument( + '--part', + type=int, + default=0, + help='which part of dataset to forward(alldata[part::total])') + parser.add_argument( + '--total', type=int, default=1, help='how many parts exist') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + args.is_rgb = args.modality == 'RGB' + args.clip_len = 1 if args.is_rgb else 5 + args.input_format = 'NCHW' if args.is_rgb else 'NCHW_Flow' + rgb_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_bgr=False) + flow_norm_cfg = dict(mean=[128, 128], std=[128, 128]) + args.img_norm_cfg = rgb_norm_cfg if args.is_rgb else flow_norm_cfg + args.f_tmpl = 'img_{:05d}.jpg' if args.is_rgb else 'flow_{}_{:05d}.jpg' + args.in_channels = args.clip_len * (3 if args.is_rgb else 2) + # max batch_size for one forward + args.batch_size = 200 + + # define the data pipeline for Untrimmed Videos + data_pipeline = [ + dict( + type='UntrimmedSampleFrames', + 
clip_len=args.clip_len, + frame_interval=args.frame_interval, + start_index=0), + dict(type='RawFrameDecode'), + dict(type='Resize', scale=(-1, 256)), + dict(type='CenterCrop', crop_size=256), + dict(type='Normalize', **args.img_norm_cfg), + dict(type='FormatShape', input_format=args.input_format), + dict(type='Collect', keys=['imgs'], meta_keys=[]), + dict(type='ToTensor', keys=['imgs']) + ] + data_pipeline = Compose(data_pipeline) + + # define TSN R50 model, the model is used as the feature extractor + model_cfg = dict( + type='Recognizer2D', + backbone=dict( + type='ResNet', + depth=50, + in_channels=args.in_channels, + norm_eval=False), + cls_head=dict( + type='TSNHead', + num_classes=200, + in_channels=2048, + spatial_type='avg', + consensus=dict(type='AvgConsensus', dim=1)), + test_cfg=dict(average_clips=None)) + model = build_model(model_cfg) + # load pretrained weight into the feature extractor + state_dict = torch.load(args.ckpt)['state_dict'] + model.load_state_dict(state_dict) + model = model.cuda() + model.eval() + + data = open(args.data_list).readlines() + data = [x.strip() for x in data] + data = data[args.part::args.total] + + # enumerate Untrimmed videos, extract feature from each of them + prog_bar = mmcv.ProgressBar(len(data)) + if not osp.exists(args.output_prefix): + os.system(f'mkdir -p {args.output_prefix}') + + for item in data: + frame_dir, length, _ = item.split() + output_file = osp.basename(frame_dir) + '.pkl' + frame_dir = osp.join(args.data_prefix, frame_dir) + output_file = osp.join(args.output_prefix, output_file) + assert output_file.endswith('.pkl') + length = int(length) + + # prepare a pseudo sample + tmpl = dict( + frame_dir=frame_dir, + total_frames=length, + filename_tmpl=args.f_tmpl, + start_index=0, + modality=args.modality) + sample = data_pipeline(tmpl) + imgs = sample['imgs'] + shape = imgs.shape + # the original shape should be N_seg * C * H * W, resize it to N_seg * + # 1 * C * H * W so that the network return feature 
of each frame (No + # score average among segments) + imgs = imgs.reshape((shape[0], 1) + shape[1:]) + imgs = imgs.cuda() + + def forward_data(model, data): + # chop large data into pieces and extract feature from them + results = [] + start_idx = 0 + num_clip = data.shape[0] + while start_idx < num_clip: + with torch.no_grad(): + part = data[start_idx:start_idx + args.batch_size] + feat = model.forward(part, return_loss=False) + results.append(feat) + start_idx += args.batch_size + return np.concatenate(results) + + feat = forward_data(model, imgs) + with open(output_file, 'wb') as fout: + pickle.dump(feat, fout) + prog_bar.update() + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/anno_txt2json.py b/openmmlab_test/mmaction2-0.24.1/tools/data/anno_txt2json.py new file mode 100644 index 0000000000000000000000000000000000000000..fcefc7778e4f860697e6d9102e9b8f3dfad48eed --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/anno_txt2json.py @@ -0,0 +1,103 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse + +import mmcv + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Convert txt annotation list to json') + parser.add_argument( + 'annofile', type=str, help='the txt annotation file to convert') + parser.add_argument( + '--format', + type=str, + default='rawframes', + choices=['rawframes', 'videos'], + help='the format of the txt annotation file') + parser.add_argument( + '--output', + type=str, + default=None, + help=( + 'the output file name, use annofile.replace(\'.txt\', \'.json\') ' + 'if the arg value is None')) + args = parser.parse_args() + + return args + + +def lines2dictlist(lines, format): + """Convert lines in 'txt' format to dictionaries in 'json' format. + Currently support single-label and multi-label. + + Example of a single-label rawframes annotation txt file: + + .. 
code-block:: txt + + (frame_dir num_frames label) + some/directory-1 163 1 + some/directory-2 122 1 + some/directory-3 258 2 + + Example of a multi-label rawframes annotation txt file: + + .. code-block:: txt + + (frame_dir num_frames label1 label2 ...) + some/directory-1 163 1 3 5 + some/directory-2 122 1 2 + some/directory-3 258 2 + + Example of a single-label videos annotation txt file: + + .. code-block:: txt + + (filename label) + some/path/000.mp4 1 + some/path/001.mp4 1 + some/path/002.mp4 2 + + Example of a multi-label videos annotation txt file: + + .. code-block:: txt + + (filename label1 label2 ...) + some/path/000.mp4 1 3 5 + some/path/001.mp4 1 4 8 + some/path/002.mp4 2 4 9 + + Args: + lines (list): List of lines in 'txt' label format. + format (str): Data format, choices are 'rawframes' and 'videos'. + + Returns: + list[dict]: For rawframes format, each dict has keys: frame_dir, + total_frames, label; for videos format, each dict has keys: + filename, label. + """ + lines = [x.split() for x in lines] + if format == 'rawframes': + data = [ + dict( + frame_dir=line[0], + total_frames=int(line[1]), + label=[int(x) for x in line[2:]]) for line in lines + ] + elif format == 'videos': + data = [ + dict(filename=line[0], label=[int(x) for x in line[1:]]) + for line in lines + ] + return data + + +if __name__ == '__main__': + # convert txt anno list to json + args = parse_args() + lines = open(args.annofile).readlines() + lines = [x.strip() for x in lines] + result = lines2dictlist(lines, args.format) + if args.output is None: + args.output = args.annofile.replace('.txt', '.json') + mmcv.dump(result, args.output) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/AVA_annotation_explained.md b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/AVA_annotation_explained.md new file mode 100644 index 0000000000000000000000000000000000000000..3d0002d1b35bfb240da1f6dc34d05596f29d27b5 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/AVA_annotation_explained.md @@ -0,0 +1,34 @@ +# AVA Annotation Explained + +In this section, we explain the annotation format of AVA in detail: + +``` +mmaction2 +├── data +│ ├── ava +│ │ ├── annotations +│ │ | ├── ava_dense_proposals_train.FAIR.recall_93.9.pkl +│ │ | ├── ava_dense_proposals_val.FAIR.recall_93.9.pkl +│ │ | ├── ava_dense_proposals_test.FAIR.recall_93.9.pkl +│ │ | ├── ava_train_v2.1.csv +│ │ | ├── ava_val_v2.1.csv +│ │ | ├── ava_train_excluded_timestamps_v2.1.csv +│ │ | ├── ava_val_excluded_timestamps_v2.1.csv +│ │ | ├── ava_action_list_v2.1_for_activitynet_2018.pbtxt +``` + +## The proposals generated by human detectors + +In the annotation folder, `ava_dense_proposals_[train/val/test].FAIR.recall_93.9.pkl` are human proposals generated by a human detector. They are used in training, validation and testing respectively. Take `ava_dense_proposals_train.FAIR.recall_93.9.pkl` as an example. It is a dictionary of size 203626. Each key consists of the `videoID` and the `timestamp`. For example, the key `-5KQ66BBWC4,0902` means the value holds the detection results for the frame at the 902nd second in the video `-5KQ66BBWC4`. The values in the dictionary are numpy arrays with shape $$N \\times 5$$, where $$N$$ is the number of detected human bounding boxes in the corresponding frame. The format of a bounding box is $$\[x_1, y_1, x_2, y_2, score\], 0 \\le x_1, y_1, x_2, y_2, score \\le 1$$. $$(x_1, y_1)$$ indicates the top-left corner of the bounding box, and $$(x_2, y_2)$$ indicates the bottom-right corner of the bounding box; $$(0, 0)$$ indicates the top-left corner of the image, while $$(1, 1)$$ indicates the bottom-right corner of the image. + +## The ground-truth labels for spatio-temporal action detection + +In the annotation folder, `ava_[train/val]_v[2.1/2.2].csv` are ground-truth labels for spatio-temporal action detection, which are used during training & validation. 
Take `ava_train_v2.1.csv` as an example: it is a CSV file with 837318 lines, and each line is the annotation for a human instance in one frame. For example, the first line in `ava_train_v2.1.csv` is `'-5KQ66BBWC4,0902,0.077,0.151,0.283,0.811,80,1'`: the first two items `-5KQ66BBWC4` and `0902` indicate that it corresponds to the 902nd second in the video `-5KQ66BBWC4`. The next four items ($$\[0.077(x_1), 0.151(y_1), 0.283(x_2), 0.811(y_2)\]$$) indicate the location of the bounding box; the bbox format is the same as for the human proposals. The next item `80` is the action label. The last item `1` is the ID of this bounding box. + +## Excluded timestamps + +`ava_[train/val]_excluded_timestamps_v[2.1/2.2].csv` contain excluded timestamps which are not used during training or validation. The format is `video_id, second_idx`. + +## Label map + +`ava_action_list_v[2.1/2.2]_for_activitynet_[2018/2019].pbtxt` contains the label map of the AVA dataset, which maps the action name to the label index. diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a416eb26320e7933a0f0d43007990dae07342352 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/README.md @@ -0,0 +1,148 @@ +# Preparing AVA + +## Introduction + + + +```BibTeX +@inproceedings{gu2018ava, + title={Ava: A video dataset of spatio-temporally localized atomic visual actions}, + author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={6047--6056}, + year={2018} +} +``` + +For basic dataset information, please refer to the official [website](https://research.google.com/ava/index.html). 
+Before we start, please make sure that the current directory is `$MMACTION2/tools/data/ava/`. + +## Step 1. Prepare Annotations + +First of all, you can run the following script to prepare annotations. + +```shell +bash download_annotations.sh +``` + +This command will download `ava_v2.1.zip` for the AVA `v2.1` annotations. If you need the AVA `v2.2` annotations, you can try the following script. + +```shell +VERSION=2.2 bash download_annotations.sh +``` + +## Step 2. Prepare Videos + +Then, use the following script to prepare videos. The code is adapted from the [official crawler](https://github.com/cvdfoundation/ava-dataset). +Note that this might take a long time. + +```shell +bash download_videos.sh +``` + +Alternatively, you can use the following command to download AVA videos in parallel with a Python script. + +```shell +bash download_videos_parallel.sh +``` + +Note that if you have sudo privileges or already have [GNU parallel](https://www.gnu.org/software/parallel/) on your machine, +you can speed up the procedure by downloading in parallel. + +```shell +# sudo apt-get install parallel +bash download_videos_gnu_parallel.sh +``` + +## Step 3. Cut Videos + +Cut each video from its 15th to its 30th minute and re-encode it at 30 fps. + +```shell +bash cut_videos.sh +``` + +## Step 4. Extract RGB and Flow + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, we recommend extracting frames there for better I/O performance. You can run the following commands to soft link the extracted frames. + +```shell +# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/ava_extracted/ +ln -s /mnt/SSD/ava_extracted/ ../data/ava/rawframes/ +``` + +If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. 
+ +```shell +bash extract_rgb_frames.sh +``` + +If you have not installed denseflow, you can still extract RGB frames with ffmpeg using the following script. + +```shell +bash extract_rgb_frames_ffmpeg.sh +``` + +If both RGB frames and optical flow are required, run the following script to extract frames. + +```shell +bash extract_frames.sh +``` + +## Step 5. Fetch Proposal Files + +The scripts are adapted from FAIR's [Long-Term Feature Banks](https://github.com/facebookresearch/video-long-term-feature-banks). + +Run the following script to fetch the pre-computed proposal list. + +```shell +bash fetch_ava_proposals.sh +``` + +## Step 6. Folder Structure + +After completing the whole data pipeline for AVA preparation, +you will have the raw frames (RGB + Flow), videos and annotation files for AVA. + +In the context of the whole project (for AVA only), the *minimal* folder structure will look like: +(*minimal* means that some data are not necessary: for example, you may want to evaluate AVA using the original video format.) + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ava +│ │ ├── annotations +│ │ | ├── ava_dense_proposals_train.FAIR.recall_93.9.pkl +│ │ | ├── ava_dense_proposals_val.FAIR.recall_93.9.pkl +│ │ | ├── ava_dense_proposals_test.FAIR.recall_93.9.pkl +│ │ | ├── ava_train_v2.1.csv +│ │ | ├── ava_val_v2.1.csv +│ │ | ├── ava_train_excluded_timestamps_v2.1.csv +│ │ | ├── ava_val_excluded_timestamps_v2.1.csv +│ │ | ├── ava_action_list_v2.1_for_activitynet_2018.pbtxt +│ │ ├── videos +│ │ │ ├── 053oq2xB3oU.mkv +│ │ │ ├── 0f39OWEqJ24.mp4 +│ │ │ ├── ... +│ │ ├── videos_15min +│ │ │ ├── 053oq2xB3oU.mkv +│ │ │ ├── 0f39OWEqJ24.mp4 +│ │ │ ├── ... +│ │ ├── rawframes +│ │ │ ├── 053oq2xB3oU +| │ │ │ ├── img_00001.jpg +| │ │ │ ├── img_00002.jpg +| │ │ │ ├── ... +``` + +For training and evaluating on AVA, please refer to [getting_started](/docs/getting_started.md). + +## Reference + +1. O. 
Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..5a7b96da88c98f629f7061158cddb0d831a0af6e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/README_zh-CN.md @@ -0,0 +1,134 @@ +# Preparing AVA + +## Introduction + + + +```BibTeX +@inproceedings{gu2018ava, + title={Ava: A video dataset of spatio-temporally localized atomic visual actions}, + author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={6047--6056}, + year={2018} +} +``` + +For basic dataset information, please refer to the [official website](https://research.google.com/ava/index.html). +Before starting, make sure that the current directory is `$MMACTION2/tools/data/ava/`. + +## 1. Prepare Annotations + +First, run the following script to download and preprocess the annotation files: + +```shell +bash download_annotations.sh +``` + +This command downloads `ava_v2.1.zip` to obtain the AVA v2.1 annotations. If you need the AVA v2.2 annotations, use the following script: + +```shell +VERSION=2.2 bash download_annotations.sh +``` + +## 2. Download Videos + +Use the following script to prepare the videos. The code is adapted from the [official crawler](https://github.com/cvdfoundation/ava-dataset). +Note that this step may take a long time. + +```shell +bash download_videos.sh +``` + +Alternatively, use the following script to download the AVA videos in parallel with Python: + +```shell +bash download_videos_parallel.sh +``` + +## 3. 
Cut Videos + +Cut each video from its 15th to its 30th minute and set the frame rate to 30 fps. + +```shell +bash cut_videos.sh +``` + +## 4. 
Extract RGB Frames and Optical Flow + +Before extracting, please refer to the [installation guide](/docs_zh_CN/install.md) to install [denseflow](https://github.com/open-mmlab/denseflow). + +If you have enough SSD space, we recommend extracting the videos to RGB frames there for better I/O performance. Use the following commands to create a soft link for the extracted frame folder: + +```shell +# run the following commands (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/ava_extracted/ +ln -s /mnt/SSD/ava_extracted/ ../data/ava/rawframes/ +``` + +If you only need RGB frames (optical flow extraction is very time-consuming), run the following script to extract RGB frames with denseflow: + +```shell +bash extract_rgb_frames.sh +``` + +If denseflow is not installed, run the following script to extract RGB frames with ffmpeg: + +```shell +bash extract_rgb_frames_ffmpeg.sh +``` + +If both RGB frames and optical flow are required, use the following script to extract frames: + +```shell +bash extract_frames.sh +``` + +## 5. Fetch Human Detection Results on AVA + +The following script is adapted from [Long-Term Feature Banks](https://github.com/facebookresearch/video-long-term-feature-banks). + +Use the following script to download the pre-computed human detection results on AVA: + +```shell +bash fetch_ava_proposals.sh +``` + +## 6. Folder Structure + +After completing the whole AVA data preparation, you will have the frame folders (RGB and optical flow frames), videos and annotation files. 
+ +Within the whole project directory (for AVA only), the *minimal* folder structure looks like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ava +│ │ ├── annotations +│ │ | ├── ava_dense_proposals_train.FAIR.recall_93.9.pkl +│ │ | ├── ava_dense_proposals_val.FAIR.recall_93.9.pkl +│ │ | ├── ava_dense_proposals_test.FAIR.recall_93.9.pkl +│ │ | ├── ava_train_v2.1.csv +│ │ | ├── ava_val_v2.1.csv +│ │ | ├── ava_train_excluded_timestamps_v2.1.csv +│ │ | ├── ava_val_excluded_timestamps_v2.1.csv +│ │ | ├── ava_action_list_v2.1_for_activitynet_2018.pbtxt +│ │ ├── videos +│ │ │ ├── 053oq2xB3oU.mkv +│ │ │ ├── 0f39OWEqJ24.mp4 +│ │ │ ├── ... +│ │ ├── videos_15min +│ │ │ ├── 053oq2xB3oU.mkv +│ │ │ ├── 0f39OWEqJ24.mp4 +│ │ │ ├── ... +│ │ ├── rawframes +│ │ │ ├── 053oq2xB3oU +| │ │ │ ├── img_00001.jpg +| │ │ │ ├── img_00002.jpg +| │ │ │ ├── ... 
+``` + +For training and evaluating on AVA, please refer to [getting started](/docs_zh_CN/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/cut_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/cut_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..763c9127f47eb20d94397d0856fbcfee4f4cc351 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/cut_videos.sh @@ -0,0 +1,34 @@ +#!/usr/bin/env bash + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +############################################################################## + +# Cut each video from its 15th to 30th minute. + +IN_DATA_DIR="../../../data/ava/videos" +OUT_DATA_DIR="../../../data/ava/videos_15min" + +if [[ ! -d "${OUT_DATA_DIR}" ]]; then + echo "${OUT_DATA_DIR} doesn't exist. Creating it."; + mkdir -p ${OUT_DATA_DIR} +fi + +for video in $(ls -A1 -U ${IN_DATA_DIR}/*) +do + out_name="${OUT_DATA_DIR}/${video##*/}" + if [ ! 
-f "${out_name}" ]; then + ffmpeg -ss 900 -t 901 -i "${video}" -r 30 -strict experimental "${out_name}" + fi +done diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..ba4a501583b020a4f72e5479eb20bfe928005a44 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_annotations.sh @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +set -e + +VERSION=${VERSION:-"2.1"} +DATA_DIR="../../../data/ava/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +wget https://research.google.com/ava/download/ava_v${VERSION}.zip +unzip -j ava_v${VERSION}.zip -d ${DATA_DIR}/ +rm ava_v${VERSION}.zip diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..ba8c5692b2dd13bca7c9e1d0d3736496da0d036b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos.sh @@ -0,0 +1,19 @@ +#!/usr/bin/env bash + +set -e + +DATA_DIR="../../../data/ava/videos" +ANNO_DIR="../../../data/ava/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +wget https://s3.amazonaws.com/ava-dataset/annotations/ava_file_names_trainval_v2.1.txt -P ${ANNO_DIR} + +cat ${ANNO_DIR}/ava_file_names_trainval_v2.1.txt | +while read vid; + do wget -c "https://s3.amazonaws.com/ava-dataset/trainval/${vid}" -P ${DATA_DIR}; done + +echo "Downloading finished." 
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_gnu_parallel.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_gnu_parallel.sh new file mode 100644 index 0000000000000000000000000000000000000000..6ef5bf11cf4c4d7f2f4d5da8f008f74ef702af53 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_gnu_parallel.sh @@ -0,0 +1,20 @@ +#!/usr/bin/env bash + +set -e + +DATA_DIR="../../../data/ava/videos" +ANNO_DIR="../../../data/ava/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +wget https://s3.amazonaws.com/ava-dataset/annotations/ava_file_names_trainval_v2.1.txt -P ${ANNO_DIR} + +# sudo apt-get install parallel +# parallel downloading to speed up +awk '{print "https://s3.amazonaws.com/ava-dataset/trainval/"$0}' ${ANNO_DIR}/ava_file_names_trainval_v2.1.txt | +parallel -j 8 wget -c -q {} -P ${DATA_DIR} + +echo "Downloading finished." diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_parallel.py b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_parallel.py new file mode 100644 index 0000000000000000000000000000000000000000..7be4b1b883738ee7f6dec903e7dfa08b805a23cd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_parallel.py @@ -0,0 +1,66 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +import os.path as osp +import subprocess + +import mmcv +from joblib import Parallel, delayed + +URL_PREFIX = 'https://s3.amazonaws.com/ava-dataset/trainval/' + + +def download_video(video_url, output_dir, num_attempts=5): + video_file = osp.basename(video_url) + output_file = osp.join(output_dir, video_file) + + status = False + + if not osp.exists(output_file): + command = ['wget', '-c', video_url, '-P', output_dir] + command = ' '.join(command) + print(command) + attempts = 0 + while True: + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError: + attempts += 1 + if attempts == num_attempts: + return status, 'Downloading Failed' + else: + break + + status = osp.exists(output_file) + return status, 'Downloaded' + + +def main(source_file, output_dir, num_jobs=24, num_attempts=5): + mmcv.mkdir_or_exist(output_dir) + video_list = open(source_file).read().strip().split('\n') + video_list = [osp.join(URL_PREFIX, video) for video in video_list] + + if num_jobs == 1: + status_list = [] + for video in video_list: + status_list.append(download_video(video, output_dir, num_attempts)) + else: + status_list = Parallel(n_jobs=num_jobs)( + delayed(download_video)(video, output_dir, num_attempts) + for video in video_list) + + mmcv.dump(status_list, 'download_report.json') + + +if __name__ == '__main__': + description = 'Helper script for downloading AVA videos' + parser = argparse.ArgumentParser(description=description) + parser.add_argument( + 'source_file', type=str, help='TXT file containing the video filename') + parser.add_argument( + 'output_dir', + type=str, + help='Output directory where videos will be saved') + parser.add_argument('-n', '--num-jobs', type=int, default=24) + parser.add_argument('--num-attempts', type=int, default=5) + main(**vars(parser.parse_args())) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_parallel.sh 
b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_parallel.sh new file mode 100644 index 0000000000000000000000000000000000000000..23329227201d0bb76a98d65d19dc3051e8a831b8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/download_videos_parallel.sh @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +set -e + +DATA_DIR="../../../data/ava/videos" +ANNO_DIR="../../../data/ava/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +wget https://s3.amazonaws.com/ava-dataset/annotations/ava_file_names_trainval_v2.1.txt -P ${ANNO_DIR} + +python download_videos_parallel.py ${ANNO_DIR}/ava_file_names_trainval_v2.1.txt ${DATA_DIR} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..68f5cf0c879c2c4dab970825d013641687538fc6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_frames.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/ava/videos_15min/ ../../data/ava/rawframes/ --task both --level 1 --flow-type tvl1 --mixed-ext +echo "Raw frames (RGB and Flow) Generated" +cd ava/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_rgb_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..84d21d8458548f3e7602ddae33c6fab50da44aa8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_rgb_frames.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/ava/videos_15min/ ../../data/ava/rawframes/ --task rgb --level 1 --mixed-ext +echo "Generate raw frames (RGB only)" + +cd ava/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_rgb_frames_ffmpeg.sh
b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_rgb_frames_ffmpeg.sh new file mode 100644 index 0000000000000000000000000000000000000000..b299a5cdba36add2f668f68f41daa62951745a12 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/extract_rgb_frames_ffmpeg.sh @@ -0,0 +1,44 @@ +#!/usr/bin/env bash + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +############################################################################## + +# Extract frames from videos. + +IN_DATA_DIR="../../../data/ava/videos_15min" +OUT_DATA_DIR="../../../data/ava/rawframes" + +if [[ ! -d "${OUT_DATA_DIR}" ]]; then + echo "${OUT_DATA_DIR} doesn't exist. 
Creating it."; + mkdir -p ${OUT_DATA_DIR} +fi + +for video in $(ls -A1 -U ${IN_DATA_DIR}/*) +do + video_name=${video##*/} + + if [[ $video_name = *".webm" ]]; then + video_name=${video_name::-5} + else + video_name=${video_name::-4} + fi + + out_video_dir=${OUT_DATA_DIR}/${video_name} + mkdir -p "${out_video_dir}" + + out_name="${out_video_dir}/img_%05d.jpg" + + ffmpeg -i "${video}" -r 30 -q:v 1 "${out_name}" +done diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/fetch_ava_proposals.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/fetch_ava_proposals.sh new file mode 100644 index 0000000000000000000000000000000000000000..57d2b2aa0743f1dc10a9ff01a93d0a062110763b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/fetch_ava_proposals.sh @@ -0,0 +1,9 @@ +#!/usr/bin/env bash + +set -e + +DATA_DIR="../../../data/ava/annotations" + +wget https://download.openmmlab.com/mmaction/dataset/ava/ava_dense_proposals_train.FAIR.recall_93.9.pkl -P ${DATA_DIR} +wget https://download.openmmlab.com/mmaction/dataset/ava/ava_dense_proposals_val.FAIR.recall_93.9.pkl -P ${DATA_DIR} +wget https://download.openmmlab.com/mmaction/dataset/ava/ava_dense_proposals_test.FAIR.recall_93.9.pkl -P ${DATA_DIR} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ava/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..0348b039963b0ba3db6dfc3ef4dabc670722affc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ava/label_map.txt @@ -0,0 +1,60 @@ +1: bend/bow (at the waist) +3: crouch/kneel +4: dance +5: fall down +6: get up +7: jump/leap +8: lie/sleep +9: martial art +10: run/jog +11: sit +12: stand +13: swim +14: walk +15: answer phone +17: carry/hold (an object) +20: climb (e.g., a mountain) +22: close (e.g., a door, a box) +24: cut +26: dress/put on clothing +27: drink +28: drive (e.g., a car, a truck) +29: eat +30: enter +34: hit (an object) +36: lift/pick up +37: 
listen (e.g., to music) +38: open (e.g., a window, a car door) +41: play musical instrument +43: point to (an object) +45: pull (an object) +46: push (an object) +47: put down +48: read +49: ride (e.g., a bike, a car, a horse) +51: sail boat +52: shoot +54: smoke +56: take a photo +57: text on/look at a cellphone +58: throw +59: touch (an object) +60: turn (e.g., a screwdriver) +61: watch (e.g., TV) +62: work on a computer +63: write +64: fight/hit (a person) +65: give/serve (an object) to (a person) +66: grab (a person) +67: hand clap +68: hand shake +69: hand wave +70: hug (a person) +72: kiss (a person) +73: lift (a person) +74: listen to (a person) +76: push (another person) +77: sing to (e.g., self, a person, a group) +78: take (an object) from (a person) +79: talk to (e.g., self, a person, a group) +80: watch (a person) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/build_audio_features.py b/openmmlab_test/mmaction2-0.24.1/tools/data/build_audio_features.py new file mode 100644 index 0000000000000000000000000000000000000000..f143427c506a788cf589b6f94bd67b7bb3fdc1f3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/build_audio_features.py @@ -0,0 +1,316 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import glob +import os +import os.path as osp +import sys +from multiprocessing import Pool + +import mmcv +import numpy as np +from scipy.io import wavfile + +try: + import librosa + import lws +except ImportError: + print('Please import librosa, lws first.') + +sys.path.append('..') + +SILENCE_THRESHOLD = 2 +FMIN = 125 +FMAX = 7600 +FRAME_SHIFT_MS = None +MIN_LEVEL_DB = -100 +REF_LEVEL_DB = 20 +RESCALING = True +RESCALING_MAX = 0.999 +ALLOW_CLIPPING_IN_NORMALIZATION = True +LOG_SCALE_MIN = -32.23619130191664 +NORM_AUDIO = True + + +class AudioTools: + """All methods related to audio feature extraction. Code Reference: + + `_, + `_. + + Args: + frame_rate (int): The frame rate per second of the video. Default: 30. 
+ sample_rate (int): The sample rate for audio sampling. Default: 16000. + num_mels (int): Number of channels of the melspectrogram. Default: 80. + fft_size (int): fft_size / sample_rate is window size. Default: 1280. + hop_size (int): hop_size / sample_rate is step size. Default: 320. + """ + + def __init__(self, + frame_rate=30, + sample_rate=16000, + num_mels=80, + fft_size=1280, + hop_size=320, + spectrogram_type='lws'): + self.frame_rate = frame_rate + self.sample_rate = sample_rate + self.silence_threshold = SILENCE_THRESHOLD + self.num_mels = num_mels + self.fmin = FMIN + self.fmax = FMAX + self.fft_size = fft_size + self.hop_size = hop_size + self.frame_shift_ms = FRAME_SHIFT_MS + self.min_level_db = MIN_LEVEL_DB + self.ref_level_db = REF_LEVEL_DB + self.rescaling = RESCALING + self.rescaling_max = RESCALING_MAX + self.allow_clipping_in_normalization = ALLOW_CLIPPING_IN_NORMALIZATION + self.log_scale_min = LOG_SCALE_MIN + self.norm_audio = NORM_AUDIO + self.spectrogram_type = spectrogram_type + assert spectrogram_type in ['lws', 'librosa'] + + def load_wav(self, path): + """Load an audio file into numpy array.""" + return librosa.core.load(path, sr=self.sample_rate)[0] + + @staticmethod + def audio_normalize(samples, desired_rms=0.1, eps=1e-4): + """RMS normalize the audio data.""" + rms = np.maximum(eps, np.sqrt(np.mean(samples**2))) + samples = samples * (desired_rms / rms) + return samples + + def generate_spectrogram_magphase(self, audio, with_phase=False): + """Separate a complex-valued spectrogram D into its magnitude (S) + + and phase (P) components, so that D = S * P. + + Args: + audio (np.ndarray): The input audio signal. + with_phase (bool): Determines whether to output the + phase components. Default: False. + + Returns: + np.ndarray: magnitude and phase component of the complex-valued + spectrogram. 
+ """ + spectro = librosa.core.stft( + audio, + hop_length=self.get_hop_size(), + n_fft=self.fft_size, + center=True) + spectro_mag, spectro_phase = librosa.core.magphase(spectro) + spectro_mag = np.expand_dims(spectro_mag, axis=0) + if with_phase: + spectro_phase = np.expand_dims(np.angle(spectro_phase), axis=0) + return spectro_mag, spectro_phase + + return spectro_mag + + def save_wav(self, wav, path): + """Save the wav to disk.""" + # 32767 = (2 ^ 15 - 1) maximum of int16 + wav *= 32767 / max(0.01, np.max(np.abs(wav))) + wavfile.write(path, self.sample_rate, wav.astype(np.int16)) + + def trim(self, quantized): + """Trim the audio wavfile.""" + start, end = self.start_and_end_indices(quantized, + self.silence_threshold) + return quantized[start:end] + + def adjust_time_resolution(self, quantized, mel): + """Adjust time resolution by repeating features. + + Args: + quantized (np.ndarray): (T,) + mel (np.ndarray): (N, D) + + Returns: + tuple: Tuple of (T,) and (T, D) + """ + assert quantized.ndim == 1 + assert mel.ndim == 2 + + upsample_factor = quantized.size // mel.shape[0] + mel = np.repeat(mel, upsample_factor, axis=0) + n_pad = quantized.size - mel.shape[0] + if n_pad != 0: + assert n_pad > 0 + mel = np.pad( + mel, [(0, n_pad), (0, 0)], mode='constant', constant_values=0) + + # trim + start, end = self.start_and_end_indices(quantized, + self.silence_threshold) + + return quantized[start:end], mel[start:end, :] + + @staticmethod + def start_and_end_indices(quantized, silence_threshold=2): + """Trim the audio file when reaches the silence threshold.""" + for start in range(quantized.size): + if abs(quantized[start] - 127) > silence_threshold: + break + for end in range(quantized.size - 1, 1, -1): + if abs(quantized[end] - 127) > silence_threshold: + break + + assert abs(quantized[start] - 127) > silence_threshold + assert abs(quantized[end] - 127) > silence_threshold + + return start, end + + def melspectrogram(self, y): + """Generate the melspectrogram.""" + D 
= self._lws_processor().stft(y).T + S = self._amp_to_db(self._linear_to_mel(np.abs(D))) - self.ref_level_db + if not self.allow_clipping_in_normalization: + assert S.max() <= 0 and S.min() - self.min_level_db >= 0 + return self._normalize(S) + + def get_hop_size(self): + """Calculate the hop size.""" + hop_size = self.hop_size + if hop_size is None: + assert self.frame_shift_ms is not None + hop_size = int(self.frame_shift_ms / 1000 * self.sample_rate) + return hop_size + + def _lws_processor(self): + """Perform local weighted sum. + + Please refer to `_. + """ + return lws.lws(self.fft_size, self.get_hop_size(), mode='speech') + + @staticmethod + def lws_num_frames(length, fsize, fshift): + """Compute number of time frames of lws spectrogram. + + Please refer to `_. + """ + pad = (fsize - fshift) + if length % fshift == 0: + M = (length + pad * 2 - fsize) // fshift + 1 + else: + M = (length + pad * 2 - fsize) // fshift + 2 + return M + + def lws_pad_lr(self, x, fsize, fshift): + """Compute left and right padding lws internally uses. + + Please refer to `_. + """ + M = self.lws_num_frames(len(x), fsize, fshift) + pad = (fsize - fshift) + T = len(x) + 2 * pad + r = (M - 1) * fshift + fsize - T + return pad, pad + r + + def _linear_to_mel(self, spectrogram): + """Warp linear scale spectrograms to the mel scale. + + Please refer to `_ + """ + global _mel_basis + _mel_basis = self._build_mel_basis() + return np.dot(_mel_basis, spectrogram) + + def _build_mel_basis(self): + """Build mel filters. 
+ + Please refer to `_ + """ + assert self.fmax <= self.sample_rate // 2 + return librosa.filters.mel( + self.sample_rate, + self.fft_size, + fmin=self.fmin, + fmax=self.fmax, + n_mels=self.num_mels) + + def _amp_to_db(self, x): + min_level = np.exp(self.min_level_db / 20 * np.log(10)) + return 20 * np.log10(np.maximum(min_level, x)) + + @staticmethod + def _db_to_amp(x): + return np.power(10.0, x * 0.05) + + def _normalize(self, S): + return np.clip((S - self.min_level_db) / -self.min_level_db, 0, 1) + + def _denormalize(self, S): + return (np.clip(S, 0, 1) * -self.min_level_db) + self.min_level_db + + def read_audio(self, audio_path): + wav = self.load_wav(audio_path) + if self.norm_audio: + wav = self.audio_normalize(wav) + else: + wav = wav / np.abs(wav).max() + + return wav + + def audio_to_spectrogram(self, wav): + if self.spectrogram_type == 'lws': + spectrogram = self.melspectrogram(wav).astype(np.float32).T + elif self.spectrogram_type == 'librosa': + spectrogram = self.generate_spectrogram_magphase(wav) + return spectrogram + + +def extract_audio_feature(wav_path, audio_tools, mel_out_dir): + file_name, _ = osp.splitext(osp.basename(wav_path)) + # Write the spectrograms to disk: + mel_filename = os.path.join(mel_out_dir, file_name + '.npy') + if not os.path.exists(mel_filename): + try: + wav = audio_tools.read_audio(wav_path) + + spectrogram = audio_tools.audio_to_spectrogram(wav) + + np.save( + mel_filename, + spectrogram.astype(np.float32), + allow_pickle=False) + + except BaseException: + print(f'Read audio [{wav_path}] failed.') + + +if __name__ == '__main__': + audio_tools = AudioTools( + fft_size=512, hop_size=256) # window_size:32ms hop_size:16ms + + parser = argparse.ArgumentParser() + parser.add_argument('audio_home_path', type=str) + parser.add_argument('spectrogram_save_path', type=str) + parser.add_argument('--level', type=int, default=1) + parser.add_argument('--ext', default='m4a') + parser.add_argument('--num-workers', type=int, default=4) 
+ parser.add_argument('--part', type=str, default='1/1') + args = parser.parse_args() + + mmcv.mkdir_or_exist(args.spectrogram_save_path) + + files = glob.glob( + # osp.join(args.audio_home_path, '*/' * args.level, '*' + args.ext) + args.audio_home_path + '/*' * args.level + '.' + args.ext) + print(f'found {len(files)} files.') + files = sorted(files) + if args.part is not None: + [this_part, num_parts] = [int(i) for i in args.part.split('/')] + part_len = len(files) // num_parts + + p = Pool(args.num_workers) + for file in files[part_len * (this_part - 1):( + part_len * this_part) if this_part != num_parts else len(files)]: + p.apply_async( + extract_audio_feature, + args=(file, audio_tools, args.spectrogram_save_path)) + p.close() + p.join() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/build_file_list.py b/openmmlab_test/mmaction2-0.24.1/tools/data/build_file_list.py new file mode 100644 index 0000000000000000000000000000000000000000..0ba15e75d01e5bdf9777b8f1e7e6c67cb8e60426 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/build_file_list.py @@ -0,0 +1,269 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +import glob +import json +import os.path as osp +import random + +from mmcv.runner import set_random_seed + +from tools.data.anno_txt2json import lines2dictlist +from tools.data.parse_file_list import (parse_directory, parse_diving48_splits, + parse_hmdb51_split, + parse_jester_splits, + parse_kinetics_splits, + parse_mit_splits, parse_mmit_splits, + parse_sthv1_splits, parse_sthv2_splits, + parse_ucf101_splits) + + +def parse_args(): + parser = argparse.ArgumentParser(description='Build file list') + parser.add_argument( + 'dataset', + type=str, + choices=[ + 'ucf101', 'kinetics400', 'kinetics600', 'kinetics700', 'thumos14', + 'sthv1', 'sthv2', 'mit', 'mmit', 'activitynet', 'hmdb51', 'jester', + 'diving48' + ], + help='dataset to be built file list') + parser.add_argument( + 'src_folder', type=str, help='root directory for the frames or videos') + parser.add_argument( + '--rgb-prefix', type=str, default='img_', help='prefix of rgb frames') + parser.add_argument( + '--flow-x-prefix', + type=str, + default='flow_x_', + help='prefix of flow x frames') + parser.add_argument( + '--flow-y-prefix', + type=str, + default='flow_y_', + help='prefix of flow y frames') + parser.add_argument( + '--num-split', + type=int, + default=3, + help='number of split to file list') + parser.add_argument( + '--subset', + type=str, + default='train', + choices=['train', 'val', 'test'], + help='subset to generate file list') + parser.add_argument( + '--level', + type=int, + default=2, + choices=[1, 2], + help='directory level of data') + parser.add_argument( + '--format', + type=str, + default='rawframes', + choices=['rawframes', 'videos'], + help='data format') + parser.add_argument( + '--out-root-path', + type=str, + default='data/', + help='root path for output') + parser.add_argument( + '--output-format', + type=str, + default='txt', + choices=['txt', 'json'], + help='built file list format') + parser.add_argument('--seed', type=int, default=None, help='random seed') 
+ parser.add_argument( + '--shuffle', + action='store_true', + default=False, + help='whether to shuffle the file list') + args = parser.parse_args() + + return args + + +def build_file_list(splits, frame_info, shuffle=False): + """Build file list for a certain data split. + + Args: + splits (tuple): Data split to generate file list. + frame_info (dict): Dict mapping from frames to path. e.g., + 'Skiing/v_Skiing_g18_c02': ('data/ucf101/rawframes/Skiing/v_Skiing_g18_c02', 0, 0). # noqa: E501 + shuffle (bool): Whether to shuffle the file list. + + Returns: + tuple: RGB file list for training and testing, together with + Flow file list for training and testing. + """ + + def build_list(split): + """Build RGB and Flow file list with a given split. + + Args: + split (list): Split to generate the file list for. + + Returns: + tuple[list, list]: (rgb_list, flow_list), rgb_list is the + generated file list for rgb, flow_list is the generated + file list for flow. + """ + rgb_list, flow_list = list(), list() + for item in split: + if item[0] not in frame_info: + continue + if frame_info[item[0]][1] > 0: + # rawframes + rgb_cnt = frame_info[item[0]][1] + flow_cnt = frame_info[item[0]][2] + if isinstance(item[1], int): + rgb_list.append(f'{item[0]} {rgb_cnt} {item[1]}\n') + flow_list.append(f'{item[0]} {flow_cnt} {item[1]}\n') + elif isinstance(item[1], list): + # only for multi-label datasets like mmit + rgb_list.append(f'{item[0]} {rgb_cnt} ' + + ' '.join([str(digit) + for digit in item[1]]) + '\n') + # the flow entry belongs in flow_list, not rgb_list + flow_list.append(f'{item[0]} {flow_cnt} ' + + ' '.join([str(digit) + for digit in item[1]]) + '\n') + else: + raise ValueError( + 'frame_info should be ' + + '[`video`(str), `label`(int)|`labels(list[int])`') + else: + # videos + if isinstance(item[1], int): + rgb_list.append(f'{frame_info[item[0]][0]} {item[1]}\n') + flow_list.append(f'{frame_info[item[0]][0]} {item[1]}\n') + elif isinstance(item[1], list): + # only for multi-label datasets like mmit +
rgb_list.append(f'{frame_info[item[0]][0]} ' + + ' '.join([str(digit) + for digit in item[1]]) + '\n') + flow_list.append( + f'{frame_info[item[0]][0]} ' + + ' '.join([str(digit) for digit in item[1]]) + '\n') + else: + raise ValueError( + 'frame_info should be ' + + '[`video`(str), `label`(int)|`labels(list[int])`') + if shuffle: + random.shuffle(rgb_list) + random.shuffle(flow_list) + return rgb_list, flow_list + + train_rgb_list, train_flow_list = build_list(splits[0]) + test_rgb_list, test_flow_list = build_list(splits[1]) + return (train_rgb_list, test_rgb_list), (train_flow_list, test_flow_list) + + +def main(): + args = parse_args() + + if args.seed is not None: + print(f'Set random seed to {args.seed}') + set_random_seed(args.seed) + + if args.format == 'rawframes': + frame_info = parse_directory( + args.src_folder, + rgb_prefix=args.rgb_prefix, + flow_x_prefix=args.flow_x_prefix, + flow_y_prefix=args.flow_y_prefix, + level=args.level) + elif args.format == 'videos': + if args.level == 1: + # search for one-level directory + video_list = glob.glob(osp.join(args.src_folder, '*')) + elif args.level == 2: + # search for two-level directory + video_list = glob.glob(osp.join(args.src_folder, '*', '*')) + else: + raise ValueError(f'level must be 1 or 2, but got {args.level}') + frame_info = {} + for video in video_list: + video_path = osp.relpath(video, args.src_folder) + # video_id: (video_relative_path, -1, -1) + frame_info[osp.splitext(video_path)[0]] = (video_path, -1, -1) + else: + raise NotImplementedError('only rawframes and videos are supported') + + if args.dataset == 'ucf101': + splits = parse_ucf101_splits(args.level) + elif args.dataset == 'sthv1': + splits = parse_sthv1_splits(args.level) + elif args.dataset == 'sthv2': + splits = parse_sthv2_splits(args.level) + elif args.dataset == 'mit': + splits = parse_mit_splits() + elif args.dataset == 'mmit': + splits = parse_mmit_splits() + elif args.dataset in ['kinetics400', 'kinetics600', 'kinetics700']: 
+ splits = parse_kinetics_splits(args.level, args.dataset) + elif args.dataset == 'hmdb51': + splits = parse_hmdb51_split(args.level) + elif args.dataset == 'jester': + splits = parse_jester_splits(args.level) + elif args.dataset == 'diving48': + splits = parse_diving48_splits() + else: + raise ValueError( + f"Supported datasets are 'ucf101, sthv1, sthv2', 'jester', " + f"'mmit', 'mit', 'kinetics400', 'kinetics600', 'kinetics700', but " + f'got {args.dataset}') + + assert len(splits) == args.num_split + + out_path = args.out_root_path + args.dataset + + if len(splits) > 1: + for i, split in enumerate(splits): + file_lists = build_file_list( + split, frame_info, shuffle=args.shuffle) + train_name = f'{args.dataset}_train_split_{i+1}_{args.format}.txt' + val_name = f'{args.dataset}_val_split_{i+1}_{args.format}.txt' + if args.output_format == 'txt': + with open(osp.join(out_path, train_name), 'w') as f: + f.writelines(file_lists[0][0]) + with open(osp.join(out_path, val_name), 'w') as f: + f.writelines(file_lists[0][1]) + elif args.output_format == 'json': + train_list = lines2dictlist(file_lists[0][0], args.format) + val_list = lines2dictlist(file_lists[0][1], args.format) + train_name = train_name.replace('.txt', '.json') + val_name = val_name.replace('.txt', '.json') + with open(osp.join(out_path, train_name), 'w') as f: + json.dump(train_list, f) + with open(osp.join(out_path, val_name), 'w') as f: + json.dump(val_list, f) + else: + lists = build_file_list(splits[0], frame_info, shuffle=args.shuffle) + + if args.subset == 'train': + ind = 0 + elif args.subset == 'val': + ind = 1 + elif args.subset == 'test': + ind = 2 + else: + raise ValueError(f"subset must be in ['train', 'val', 'test'], " + f'but got {args.subset}.') + + filename = f'{args.dataset}_{args.subset}_list_{args.format}.txt' + if args.output_format == 'txt': + with open(osp.join(out_path, filename), 'w') as f: + f.writelines(lists[0][ind]) + elif args.output_format == 'json': + data_list = 
lines2dictlist(lists[0][ind], args.format) + filename = filename.replace('.txt', '.json') + with open(osp.join(out_path, filename), 'w') as f: + json.dump(data_list, f) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/build_rawframes.py b/openmmlab_test/mmaction2-0.24.1/tools/data/build_rawframes.py new file mode 100644 index 0000000000000000000000000000000000000000..70054e5b5e2cedc6ac5817af511827cb15251c7d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/build_rawframes.py @@ -0,0 +1,278 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import glob +import os +import os.path as osp +import sys +import warnings +from multiprocessing import Lock, Pool + +import mmcv +import numpy as np + + +def extract_frame(vid_item): + """Generate optical flow using dense flow. + + Args: + vid_item (list): Video item containing video full path, + video (short) path, video id. + + Returns: + bool: Whether generate optical flow successfully. 
+ """ + full_path, vid_path, vid_id, method, task, report_file = vid_item + if '/' in vid_path: + act_name = osp.basename(osp.dirname(vid_path)) + out_full_path = osp.join(args.out_dir, act_name) + else: + out_full_path = args.out_dir + + run_success = -1 + + if task == 'rgb': + if args.use_opencv: + # Not like using denseflow, + # Use OpenCV will not make a sub directory with the video name + try: + video_name = osp.splitext(osp.basename(vid_path))[0] + out_full_path = osp.join(out_full_path, video_name) + + vr = mmcv.VideoReader(full_path) + for i, vr_frame in enumerate(vr): + if vr_frame is not None: + w, h, _ = np.shape(vr_frame) + if args.new_short == 0: + if args.new_width == 0 or args.new_height == 0: + # Keep original shape + out_img = vr_frame + else: + out_img = mmcv.imresize( + vr_frame, + (args.new_width, args.new_height)) + else: + if min(h, w) == h: + new_h = args.new_short + new_w = int((new_h / h) * w) + else: + new_w = args.new_short + new_h = int((new_w / w) * h) + out_img = mmcv.imresize(vr_frame, (new_h, new_w)) + mmcv.imwrite(out_img, + f'{out_full_path}/img_{i + 1:05d}.jpg') + else: + warnings.warn( + 'Length inconsistent!' + f'Early stop with {i + 1} out of {len(vr)} frames.' 
+ ) + break + run_success = 0 + except Exception: + run_success = -1 + else: + if args.new_short == 0: + cmd = osp.join( + f"denseflow '{full_path}' -b=20 -s=0 -o='{out_full_path}'" + f' -nw={args.new_width} -nh={args.new_height} -v') + else: + cmd = osp.join( + f"denseflow '{full_path}' -b=20 -s=0 -o='{out_full_path}'" + f' -ns={args.new_short} -v') + run_success = os.system(cmd) + elif task == 'flow': + if args.input_frames: + if args.new_short == 0: + cmd = osp.join( + f"denseflow '{full_path}' -a={method} -b=20 -s=1 -o='{out_full_path}'" # noqa: E501 + f' -nw={args.new_width} --nh={args.new_height} -v --if') + else: + cmd = osp.join( + f"denseflow '{full_path}' -a={method} -b=20 -s=1 -o='{out_full_path}'" # noqa: E501 + f' -ns={args.new_short} -v --if') + else: + if args.new_short == 0: + cmd = osp.join( + f"denseflow '{full_path}' -a={method} -b=20 -s=1 -o='{out_full_path}'" # noqa: E501 + f' -nw={args.new_width} --nh={args.new_height} -v') + else: + cmd = osp.join( + f"denseflow '{full_path}' -a={method} -b=20 -s=1 -o='{out_full_path}'" # noqa: E501 + f' -ns={args.new_short} -v') + run_success = os.system(cmd) + else: + if args.new_short == 0: + cmd_rgb = osp.join( + f"denseflow '{full_path}' -b=20 -s=0 -o='{out_full_path}'" + f' -nw={args.new_width} -nh={args.new_height} -v') + cmd_flow = osp.join( + f"denseflow '{full_path}' -a={method} -b=20 -s=1 -o='{out_full_path}'" # noqa: E501 + f' -nw={args.new_width} -nh={args.new_height} -v') + else: + cmd_rgb = osp.join( + f"denseflow '{full_path}' -b=20 -s=0 -o='{out_full_path}'" + f' -ns={args.new_short} -v') + cmd_flow = osp.join( + f"denseflow '{full_path}' -a={method} -b=20 -s=1 -o='{out_full_path}'" # noqa: E501 + f' -ns={args.new_short} -v') + run_success_rgb = os.system(cmd_rgb) + run_success_flow = os.system(cmd_flow) + if run_success_flow == 0 and run_success_rgb == 0: + run_success = 0 + + if run_success == 0: + print(f'{task} {vid_id} {vid_path} {method} done') + sys.stdout.flush() + + lock.acquire() + 
with open(report_file, 'a') as f: + line = full_path + '\n' + f.write(line) + lock.release() + else: + print(f'{task} {vid_id} {vid_path} {method} got something wrong') + sys.stdout.flush() + + return True + + +def parse_args(): + parser = argparse.ArgumentParser(description='extract optical flows') + parser.add_argument('src_dir', type=str, help='source video directory') + parser.add_argument('out_dir', type=str, help='output rawframe directory') + parser.add_argument( + '--task', + type=str, + default='flow', + choices=['rgb', 'flow', 'both'], + help='which type of frames to be extracted') + parser.add_argument( + '--level', + type=int, + choices=[1, 2], + default=2, + help='directory level of data') + parser.add_argument( + '--num-worker', + type=int, + default=8, + help='number of workers to build rawframes') + parser.add_argument( + '--flow-type', + type=str, + default=None, + choices=[None, 'tvl1', 'warp_tvl1', 'farn', 'brox'], + help='flow type to be generated') + parser.add_argument( + '--out-format', + type=str, + default='jpg', + choices=['jpg', 'h5', 'png'], + help='output format') + parser.add_argument( + '--ext', + type=str, + default='avi', + choices=['avi', 'mp4', 'webm'], + help='video file extensions') + parser.add_argument( + '--mixed-ext', + action='store_true', + help='process video files with mixed extensions') + parser.add_argument( + '--new-width', type=int, default=0, help='resize image width') + parser.add_argument( + '--new-height', type=int, default=0, help='resize image height') + parser.add_argument( + '--new-short', + type=int, + default=0, + help='resize image short side length keeping ratio') + parser.add_argument('--num-gpu', type=int, default=8, help='number of GPU') + parser.add_argument( + '--resume', + action='store_true', + default=False, + help='resume optical flow extraction instead of overwriting') + parser.add_argument( + '--use-opencv', + action='store_true', + help='Whether to use opencv to extract rgb frames') + 
parser.add_argument( + '--input-frames', + action='store_true', + help='Whether to extract flow frames based on rgb frames') + parser.add_argument( + '--report-file', + type=str, + default='build_report.txt', + help='report to record files which have been successfully processed') + args = parser.parse_args() + + return args + + +def init(lock_): + global lock + lock = lock_ + + +if __name__ == '__main__': + args = parse_args() + + if not osp.isdir(args.out_dir): + print(f'Creating folder: {args.out_dir}') + os.makedirs(args.out_dir) + + if args.level == 2: + classes = os.listdir(args.src_dir) + for classname in classes: + new_dir = osp.join(args.out_dir, classname) + if not osp.isdir(new_dir): + print(f'Creating folder: {new_dir}') + os.makedirs(new_dir) + + if args.input_frames: + print('Reading rgb frames from folder: ', args.src_dir) + fullpath_list = glob.glob(args.src_dir + '/*' * args.level) + print('Total number of rgb frame folders found: ', len(fullpath_list)) + else: + print('Reading videos from folder: ', args.src_dir) + if args.mixed_ext: + print('Extension of videos is mixed') + fullpath_list = glob.glob(args.src_dir + '/*' * args.level) + else: + print('Extension of videos: ', args.ext) + fullpath_list = glob.glob(args.src_dir + '/*' * args.level + '.' 
+ + args.ext) + print('Total number of videos found: ', len(fullpath_list)) + + if args.resume: + done_fullpath_list = [] + with open(args.report_file) as f: + for line in f: + if line == '\n': + continue + done_full_path = line.strip().split()[0] + done_fullpath_list.append(done_full_path) + done_fullpath_list = set(done_fullpath_list) + fullpath_list = list(set(fullpath_list).difference(done_fullpath_list)) + + if args.level == 2: + vid_list = list( + map( + lambda p: osp.join( + osp.basename(osp.dirname(p)), osp.basename(p)), + fullpath_list)) + elif args.level == 1: + vid_list = list(map(osp.basename, fullpath_list)) + + lock = Lock() + pool = Pool(args.num_worker, initializer=init, initargs=(lock, )) + pool.map( + extract_frame, + zip(fullpath_list, vid_list, range(len(vid_list)), + len(vid_list) * [args.flow_type], + len(vid_list) * [args.task], + len(vid_list) * [args.report_file])) + pool.close() + pool.join() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/build_videos.py b/openmmlab_test/mmaction2-0.24.1/tools/data/build_videos.py new file mode 100644 index 0000000000000000000000000000000000000000..77a3a0bd393758414d73730d621f53f513d0c71e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/build_videos.py @@ -0,0 +1,127 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import glob +import os +import os.path as osp +import sys +from multiprocessing import Pool + + +def encode_video(frame_dir_item): + """Encode frames to video using ffmpeg. + + Args: + frame_dir_item (list): Rawframe item containing raw frame directory + full path, rawframe directory (short) path, rawframe directory id. + + Returns: + bool: Whether synthesize video successfully. + """ + full_path, frame_dir_path, frame_dir_id = frame_dir_item + out_full_path = args.out_dir + + img_name_tmpl = args.filename_tmpl + '.' + args.in_format + img_path = osp.join(full_path, img_name_tmpl) + + out_vid_name = frame_dir_path + '.' 
+ args.ext + out_vid_path = osp.join(out_full_path, out_vid_name) + + cmd = ( + f"ffmpeg -start_number {args.start_idx} -r {args.fps} -i '{img_path}' " + f"-vcodec {args.vcodec} '{out_vid_path}'") + os.system(cmd) + + print(f'{frame_dir_id} {frame_dir_path} done') + sys.stdout.flush() + return True + + +def parse_args(): + parser = argparse.ArgumentParser(description='synthesize videos') + parser.add_argument('src_dir', type=str, help='source rawframe directory') + parser.add_argument('out_dir', type=str, help='output video directory') + parser.add_argument( + '--fps', type=int, default=30, help='fps of videos to be synthesized') + parser.add_argument( + '--level', + type=int, + choices=[1, 2], + default=2, + help='directory level of data') + parser.add_argument( + '--num-worker', + type=int, + default=8, + help='number of workers to build videos') + parser.add_argument( + '--in-format', + type=str, + default='jpg', + choices=['jpg', 'png'], + help='input format') + parser.add_argument( + '--start-idx', type=int, default=0, help='starting index of rawframes') + parser.add_argument( + '--filename-tmpl', + type=str, + default='img_%05d', + help='filename template of rawframes') + parser.add_argument( + '--vcodec', type=str, default='mpeg4', help='coding method of videos') + parser.add_argument( + '--ext', + type=str, + default='mp4', + choices=['mp4', 'avi'], + help='video file extensions') + parser.add_argument('--num-gpu', type=int, default=8, help='number of GPU') + parser.add_argument( + '--resume', + action='store_true', + default=False, + help='resume video synthesis instead of overwriting') + args = parser.parse_args() + + return args + + +if __name__ == '__main__': + args = parse_args() + + if not osp.isdir(args.out_dir): + print(f'Creating folder: {args.out_dir}') + os.makedirs(args.out_dir) + + if args.level == 2: + classes = os.listdir(args.src_dir) + for classname in classes: + new_dir = osp.join(args.out_dir, classname) + if not 
osp.isdir(new_dir): + print(f'Creating folder: {new_dir}') + os.makedirs(new_dir) + + print('Reading rgb frames from folder: ', args.src_dir) + print('Input format of rgb frames: ', args.in_format) + fullpath_list = glob.glob(args.src_dir + '/*' * args.level) + done_fullpath_list = glob.glob(args.src_dir + '/*' * args.level + '.' + + args.ext) + print('Total number of rgb frame folders found: ', len(fullpath_list)) + + if args.resume: + fullpath_list = set(fullpath_list).difference(set(done_fullpath_list)) + fullpath_list = list(fullpath_list) + print('Resuming. number of videos to be synthesized: ', + len(fullpath_list)) + + if args.level == 2: + frame_dir_list = list( + map( + lambda p: osp.join( + osp.basename(osp.dirname(p)), osp.basename(p)), + fullpath_list)) + elif args.level == 1: + frame_dir_list = list(map(osp.basename, fullpath_list)) + + pool = Pool(args.num_worker) + pool.map(encode_video, + zip(fullpath_list, frame_dir_list, range(len(frame_dir_list)))) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/denormalize_proposal_file.py b/openmmlab_test/mmaction2-0.24.1/tools/data/denormalize_proposal_file.py new file mode 100644 index 0000000000000000000000000000000000000000..1e198d032d9bd78fffeda910864c06b6e7db122f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/denormalize_proposal_file.py @@ -0,0 +1,82 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os.path as osp + +from mmaction.localization import load_localize_proposal_file +from tools.data.parse_file_list import parse_directory + + +def process_norm_proposal_file(norm_proposal_file, frame_dict): + """Process the normalized proposal file and denormalize it. + + Args: + norm_proposal_file (str): Name of normalized proposal file. + frame_dict (dict): Information of frame folders. 
+ """ + proposal_file = norm_proposal_file.replace('normalized_', '') + norm_proposals = load_localize_proposal_file(norm_proposal_file) + + processed_proposal_list = [] + for idx, norm_proposal in enumerate(norm_proposals): + video_id = norm_proposal[0] + frame_info = frame_dict[video_id] + num_frames = frame_info[1] + frame_path = osp.basename(frame_info[0]) + + gt = [[ + int(x[0]), + int(float(x[1]) * num_frames), + int(float(x[2]) * num_frames) + ] for x in norm_proposal[2]] + + proposal = [[ + int(x[0]), + float(x[1]), + float(x[2]), + int(float(x[3]) * num_frames), + int(float(x[4]) * num_frames) + ] for x in norm_proposal[3]] + + gt_dump = '\n'.join(['{} {} {}'.format(*x) for x in gt]) + gt_dump += '\n' if len(gt) else '' + proposal_dump = '\n'.join( + ['{} {:.04f} {:.04f} {} {}'.format(*x) for x in proposal]) + proposal_dump += '\n' if len(proposal) else '' + + processed_proposal_list.append( + f'# {idx}\n{frame_path}\n{num_frames}\n1' + f'\n{len(gt)}\n{gt_dump}{len(proposal)}\n{proposal_dump}') + + with open(proposal_file, 'w') as f: + f.writelines(processed_proposal_list) + + +def parse_args(): + parser = argparse.ArgumentParser(description='Denormalize proposal file') + parser.add_argument( + 'dataset', + type=str, + choices=['thumos14'], + help='dataset to be denormalize proposal file') + parser.add_argument( + '--norm-proposal-file', + type=str, + help='normalized proposal file to be denormalize') + parser.add_argument( + '--data-prefix', + type=str, + help='path to a directory where rawframes are held') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + + print(f'Converting from {args.norm_proposal_file}.') + frame_dict = parse_directory(args.data_prefix) + process_norm_proposal_file(args.norm_proposal_file, frame_dict) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/README.md new file mode 100644 
index 0000000000000000000000000000000000000000..588cddd173d33c7e3d6e4241feffa43dd487ee2b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/README.md @@ -0,0 +1,123 @@ +# Preparing Diving48 + +## Introduction + + + +```BibTeX +@inproceedings{li2018resound, + title={Resound: Towards action recognition without representation bias}, + author={Li, Yingwei and Li, Yi and Vasconcelos, Nuno}, + booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, + pages={513--528}, + year={2018} +} +``` + +For basic dataset information, you can refer to the official dataset [website](http://www.svcl.ucsd.edu/projects/resound/dataset.html). +Before we start, please make sure that the directory is located at `$MMACTION2/tools/data/diving48/`. + +## Step 1. Prepare Annotations + +You can run the following script to download annotations (considering the correctness of annotation files, we only download V2 version here). + +```shell +bash download_annotations.sh +``` + +## Step 2. Prepare Videos + +You can run the following script to download videos. + +```shell +bash download_videos.sh +``` + +## Step 3. Prepare RGB and Flow + +This part is **optional** if you only want to use the video loader. + +The frames provided in official compressed file are not complete. You may need to go through the following extraction steps to get the complete frames. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. + +You can run the following script to soft link SSD. 
+ +```shell +# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/diving48_extracted/ +ln -s /mnt/SSD/diving48_extracted/ ../../../data/diving48/rawframes +``` + +If you only want to work with RGB frames (extracting optical flow can be time-consuming), run the following script to extract **RGB-only** frames using denseflow. + +```shell +cd $MMACTION2/tools/data/diving48/ +bash extract_rgb_frames.sh +``` + +If you have not installed denseflow, you can still extract RGB frames with OpenCV using the following script; note that it keeps the original size of the images. + +```shell +cd $MMACTION2/tools/data/diving48/ +bash extract_rgb_frames_opencv.sh +``` + +If both RGB frames and optical flow are required, run the following script to extract them. + +```shell +cd $MMACTION2/tools/data/diving48/ +bash extract_frames.sh +``` + +## Step 4. Generate File List + +You can run the following scripts to generate file lists in the rawframes and videos formats. + +```shell +bash generate_videos_filelist.sh +bash generate_rawframes_filelist.sh +``` + +## Step 5. Check Directory Structure + +After completing the data preparation for Diving48, +you will have the rawframes (RGB + Flow), videos and annotation files for Diving48. + +In the context of the whole project (for Diving48 only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── diving48 +│ │ ├── diving48_{train,val}_list_rawframes.txt +│ │ ├── diving48_{train,val}_list_videos.txt +│ │ ├── annotations +│ | | ├── Diving48_V2_train.json +│ | | ├── Diving48_V2_test.json +│ | | ├── Diving48_vocab.json +│ | ├── videos +│ | | ├── _8Vy3dlHg2w_00000.mp4 +│ | | ├── _8Vy3dlHg2w_00001.mp4 +│ | | ├── ... +│ | ├── rawframes +│ | | ├── 2x00lRzlTVQ_00000 +│ | | | ├── img_00001.jpg +│ | | | ├── img_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_x_00001.jpg +│ | | | ├── flow_x_00002.jpg +│ | | | ├── ... 
+│ | | | ├── flow_y_00001.jpg +│ | | | ├── flow_y_00002.jpg +│ | | | ├── ... +│ | | ├── 2x00lRzlTVQ_00001 +│ | | ├── ... +``` + +For training and evaluating on Diving48, please refer to [getting_started.md](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..e91f8729a575c05ceb0461e334a6acde5cceb971 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/README_zh-CN.md @@ -0,0 +1,123 @@ +# 准备 Diving48 + +## 简介 + + + +```BibTeX +@inproceedings{li2018resound, + title={Resound: Towards action recognition without representation bias}, + author={Li, Yingwei and Li, Yi and Vasconcelos, Nuno}, + booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, + pages={513--528}, + year={2018} +} +``` + +用户可参考该数据集的 [官网](http://www.svcl.ucsd.edu/projects/resound/dataset.html),以获取数据集相关的基本信息。 +在数据集准备前,请确保命令行当前路径为 `$MMACTION2/tools/data/diving48/`。 + +## 步骤 1. 下载标注文件 + +用户可以使用以下命令下载标注文件(考虑到标注的准确性,这里仅下载 V2 版本)。 + +```shell +bash download_annotations.sh +``` + +## 步骤 2. 准备视频 + +用户可以使用以下命令下载视频。 + +```shell +bash download_videos.sh +``` + +## Step 3. 
抽取 RGB 帧和光流 + +如果用户只想使用视频加载训练,则该部分是 **可选项**。 + +官网提供的帧压缩包并不完整。若想获取完整的数据,可以使用以下步骤解帧。 + +在抽取视频帧和光流之前,请参考 [安装指南](/docs_zh_CN/install.md) 安装 [denseflow](https://github.com/open-mmlab/denseflow)。 + +如果拥有大量的 SSD 存储空间,则推荐将抽取的帧存储至 I/O 性能更优秀的 SSD 中。 + +可以运行以下命令为 SSD 建立软链接。 + +```shell +# 执行这两行进行抽取(假设 SSD 挂载在 "/mnt/SSD/") +mkdir /mnt/SSD/diving48_extracted/ +ln -s /mnt/SSD/diving48_extracted/ ../../../data/diving48/rawframes +``` + +如果用户需要抽取 RGB 帧(因为抽取光流的过程十分耗时),可以考虑运行以下命令使用 denseflow **只抽取 RGB 帧**。 + +```shell +cd $MMACTION2/tools/data/diving48/ +bash extract_rgb_frames.sh +``` + +如果用户没有安装 denseflow,则可以运行以下命令使用 OpenCV 抽取 RGB 帧。然而,该方法只能抽取与原始视频分辨率相同的帧。 + +```shell +cd $MMACTION2/tools/data/diving48/ +bash extract_rgb_frames_opencv.sh +``` + +如果用户想抽取 RGB 帧和光流,则可以运行以下脚本进行抽取。 + +```shell +cd $MMACTION2/tools/data/diving48/ +bash extract_frames.sh +``` + +## 步骤 4. 生成文件列表 + +用户可以通过运行以下命令生成帧和视频格式的文件列表。 + +```shell +bash generate_videos_filelist.sh +bash generate_rawframes_filelist.sh +``` + +## 步骤 5. 检查文件夹结构 + +在完成所有 Diving48 数据集准备流程后, +用户可以获得对应的 RGB + 光流文件,视频文件以及标注文件。 + +在整个 MMAction2 文件夹下,Diving48 的文件结构如下: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── diving48 +│ │ ├── diving48_{train,val}_list_rawframes.txt +│ │ ├── diving48_{train,val}_list_videos.txt +│ │ ├── annotations +│ | | ├── Diving48_V2_train.json +│ | | ├── Diving48_V2_test.json +│ | | ├── Diving48_vocab.json +│ | ├── videos +│ | | ├── _8Vy3dlHg2w_00000.mp4 +│ | | ├── _8Vy3dlHg2w_00001.mp4 +│ | | ├── ... +│ | ├── rawframes +│ | | ├── 2x00lRzlTVQ_00000 +│ | | | ├── img_00001.jpg +│ | | | ├── img_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_x_00001.jpg +│ | | | ├── flow_x_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_y_00001.jpg +│ | | | ├── flow_y_00002.jpg +│ | | | ├── ... +│ | | ├── 2x00lRzlTVQ_00001 +│ | | ├── ... 
+``` + +关于对 Diving48 进行训练和验证,可以参考 [基础教程](/docs_zh_CN/getting_started.md)。 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..1f8845672b9d9f27d94c876184c2f01ba6578aa6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/download_annotations.sh @@ -0,0 +1,16 @@ +#!/usr/bin/env bash + +DATA_DIR="../../../data/diving48/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +cd ${DATA_DIR} + +wget http://www.svcl.ucsd.edu/projects/resound/Diving48_vocab.json +wget http://www.svcl.ucsd.edu/projects/resound/Diving48_V2_train.json +wget http://www.svcl.ucsd.edu/projects/resound/Diving48_V2_test.json + +cd - diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..757f443fc98306aa263236dc98949091f612b85c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/download_videos.sh @@ -0,0 +1,16 @@ +#!/usr/bin/env bash + +DATA_DIR="../../../data/diving48/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. 
Creating"; + mkdir -p ${DATA_DIR} +fi + +cd ${DATA_DIR} + +wget http://www.svcl.ucsd.edu/projects/resound/Diving48_rgb.tar.gz --no-check-certificate +tar -zxvf Diving48_rgb.tar.gz +mv ./rgb ./videos + +cd - diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..1563d9993ffcc53fd6852e9d269bd3fcb75ae62e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_frames.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/diving48/videos/ ../../data/diving48/rawframes/ --task both --level 1 --flow-type tvl1 --ext mp4 +echo "Raw frames (RGB and tv-l1) Generated" +cd - diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_rgb_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..830d1433a3d6b8b93f3de8364f8ae431c20744b2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_rgb_frames.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/diving48/videos/ ../../data/diving48/rawframes/ --task rgb --level 1 --ext mp4 +echo "Raw frames (RGB only) generated" + +cd - diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_rgb_frames_opencv.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_rgb_frames_opencv.sh new file mode 100644 index 0000000000000000000000000000000000000000..db4c83c313d2946dbc4948edbfa3a1456e144609 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/extract_rgb_frames_opencv.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/diving48/videos/ ../../data/diving48/rawframes/ --task rgb --level 1 --ext mp4 --use-opencv +echo "Raw frames (RGB only) generated" + +cd - diff --git 
a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..96d7397607350fad547ac88d76ef3fadbd6bba98 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/generate_rawframes_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py diving48 data/diving48/rawframes/ --num-split 1 --level 1 --subset train --format rawframes --shuffle +PYTHONPATH=. python tools/data/build_file_list.py diving48 data/diving48/rawframes/ --num-split 1 --level 1 --subset val --format rawframes --shuffle +echo "Filelist for rawframes generated." + +cd tools/data/diving48/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..68d7ff199ca9f9e77e1342d7c6a8bba8d10bbf25 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/generate_videos_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py diving48 data/diving48/videos/ --num-split 1 --level 1 --subset train --format videos --shuffle +PYTHONPATH=. python tools/data/build_file_list.py diving48 data/diving48/videos/ --num-split 1 --level 1 --subset val --format videos --shuffle +echo "Filelist for videos generated." 
+ +cd tools/data/diving48/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..e2f629dd4f8f12d3005d80186154c58e6bdb8b27 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/diving48/label_map.txt @@ -0,0 +1,48 @@ +Back+15som+05Twis+FREE +Back+15som+15Twis+FREE +Back+15som+25Twis+FREE +Back+15som+NoTwis+PIKE +Back+15som+NoTwis+TUCK +Back+25som+15Twis+PIKE +Back+25som+25Twis+PIKE +Back+25som+NoTwis+PIKE +Back+25som+NoTwis+TUCK +Back+2som+15Twis+FREE +Back+2som+25Twis+FREE +Back+35som+NoTwis+PIKE +Back+35som+NoTwis+TUCK +Back+3som+NoTwis+PIKE +Back+3som+NoTwis+TUCK +Back+Dive+NoTwis+PIKE +Back+Dive+NoTwis+TUCK +Forward+15som+1Twis+FREE +Forward+15som+2Twis+FREE +Forward+15som+NoTwis+PIKE +Forward+1som+NoTwis+PIKE +Forward+25som+1Twis+PIKE +Forward+25som+2Twis+PIKE +Forward+25som+3Twis+PIKE +Forward+25som+NoTwis+PIKE +Forward+25som+NoTwis+TUCK +Forward+35som+NoTwis+PIKE +Forward+35som+NoTwis+TUCK +Forward+45som+NoTwis+TUCK +Forward+Dive+NoTwis+PIKE +Forward+Dive+NoTwis+STR +Inward+15som+NoTwis+PIKE +Inward+15som+NoTwis+TUCK +Inward+25som+NoTwis+PIKE +Inward+25som+NoTwis+TUCK +Inward+35som+NoTwis+TUCK +Inward+Dive+NoTwis+PIKE +Reverse+15som+05Twis+FREE +Reverse+15som+15Twis+FREE +Reverse+15som+25Twis+FREE +Reverse+15som+35Twis+FREE +Reverse+15som+NoTwis+PIKE +Reverse+25som+15Twis+PIKE +Reverse+25som+NoTwis+PIKE +Reverse+25som+NoTwis+TUCK +Reverse+35som+NoTwis+TUCK +Reverse+Dive+NoTwis+PIKE +Reverse+Dive+NoTwis+TUCK diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/extract_audio.py b/openmmlab_test/mmaction2-0.24.1/tools/data/extract_audio.py new file mode 100644 index 0000000000000000000000000000000000000000..ed828f990c8f30129982c16b2d495ad467851c28 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/extract_audio.py @@ -0,0 +1,61 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +import glob +import os +import os.path as osp +from multiprocessing import Pool + +import mmcv + + +def extract_audio_wav(line): + """Extract the audio wave from video streams using FFMPEG.""" + video_id, _ = osp.splitext(osp.basename(line)) + video_dir = osp.dirname(line) + video_rel_dir = osp.relpath(video_dir, args.root) + dst_dir = osp.join(args.dst_root, video_rel_dir) + os.popen(f'mkdir -p {dst_dir}') + try: + if osp.exists(f'{dst_dir}/{video_id}.wav'): + return + cmd = f'ffmpeg -i {line} -map 0:a -y {dst_dir}/{video_id}.wav' + os.popen(cmd) + except BaseException: + with open('extract_wav_err_file.txt', 'a+') as f: + f.write(f'{line}\n') + + +def parse_args(): + parser = argparse.ArgumentParser(description='Extract audios') + parser.add_argument('root', type=str, help='source video directory') + parser.add_argument('dst_root', type=str, help='output audio directory') + parser.add_argument( + '--level', type=int, default=2, help='directory level of data') + parser.add_argument( + '--ext', + type=str, + default='mp4', + choices=['avi', 'mp4', 'webm'], + help='video file extensions') + parser.add_argument( + '--num-workers', type=int, default=8, help='number of workers') + args = parser.parse_args() + + return args + + +if __name__ == '__main__': + args = parse_args() + + mmcv.mkdir_or_exist(args.dst_root) + + print('Reading videos from folder: ', args.root) + print('Extension of videos: ', args.ext) + fullpath_list = glob.glob(args.root + '/*' * args.level + '.' 
+ args.ext) + done_fullpath_list = glob.glob(args.dst_root + '/*' * args.level + '.wav') + print('Total number of videos found: ', len(fullpath_list)) + print('Total number of videos already extracted: ', + len(done_fullpath_list)) + + pool = Pool(args.num_workers) + pool.map(extract_audio_wav, fullpath_list) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a39eda6fd48688f0441f20107a1b92abca4428a6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/README.md @@ -0,0 +1,109 @@ +# Preparing GYM + +## Introduction + + + +```BibTeX +@inproceedings{shao2020finegym, + title={Finegym: A hierarchical video dataset for fine-grained action understanding}, + author={Shao, Dian and Zhao, Yue and Dai, Bo and Lin, Dahua}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages={2616--2625}, + year={2020} +} +``` + +For basic dataset information, please refer to the official [project](https://sdolivia.github.io/FineGym/) and the [paper](https://arxiv.org/abs/2004.06704). +We currently provide the data pre-processing pipeline for GYM99. +Before we start, please make sure that your current working directory is `$MMACTION2/tools/data/gym/`. + +## Step 1. Prepare Annotations + +First of all, you can run the following script to prepare annotations. + +```shell +bash download_annotations.sh +``` + +## Step 2. Prepare Videos + +Then, you can run the following script to prepare videos. +The code is adapted from the [official crawler](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics). Note that this might take a long time. + +```shell +bash download_videos.sh +``` + +## Step 3. Trim Videos into Events + +First, you need to trim long videos into events based on the annotation of GYM with the following script. 
+ +```shell +python trim_event.py +``` + +## Step 4. Trim Events into Subactions + +Then, you need to trim events into subactions based on the annotation of GYM with the following script. We use two-stage trimming for better efficiency (trimming many short clips directly from a long video is extremely inefficient, since the video must be scanned many times). + +```shell +python trim_subaction.py +``` + +## Step 5. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader for RGB model training. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +Run the following script to extract both RGB frames and optical flow using the "tvl1" algorithm. + +```shell +bash extract_frames.sh +``` + +## Step 6. Generate file list for GYM99 based on extracted subactions + +You can use the following script to generate train / val lists for GYM99. + +```shell +python generate_file_list.py +``` + +## Step 7. Folder Structure + +After completing the whole data pipeline for GYM, you will have the subaction clips, event clips, raw videos and GYM99 train/val lists. + +In the context of the whole project (for GYM only), the full folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── gym +| | ├── annotations +| | | ├── gym99_train_org.txt +| | | ├── gym99_val_org.txt +| | | ├── gym99_train.txt +| | | ├── gym99_val.txt +| | | ├── annotation.json +| | | └── event_annotation.json +│ │ ├── videos +| | | ├── 0LtLS9wROrk.mp4 +| | | ├── ... +| | | └── zfqS-wCJSsw.mp4 +│ │ ├── events +| | | ├── 0LtLS9wROrk_E_002407_002435.mp4 +| | | ├── ... +| | | └── zfqS-wCJSsw_E_006732_006824.mp4 +│ │ ├── subactions +| | | ├── 0LtLS9wROrk_E_002407_002435_A_0003_0005.mp4 +| | | ├── ... 
+| | | └── zfqS-wCJSsw_E_006244_006252_A_0000_0007.mp4 +| | └── subaction_frames +``` + +For training and evaluating on GYM, please refer to [getting_started](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..cb3a796ec7595fbbb03e72798ef02992b0bf27ce --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/README_zh-CN.md @@ -0,0 +1,109 @@ +# 准备 GYM + +## 简介 + + + +```BibTeX +@inproceedings{shao2020finegym, + title={Finegym: A hierarchical video dataset for fine-grained action understanding}, + author={Shao, Dian and Zhao, Yue and Dai, Bo and Lin, Dahua}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages={2616--2625}, + year={2020} +} +``` + +请参照 [项目主页](https://sdolivia.github.io/FineGym/) 及 [原论文](https://arxiv.org/abs/2004.06704) 以获取数据集基本信息。 +MMAction2 当前支持 GYM99 的数据集预处理。 +在开始之前,用户需确保当前目录为 `$MMACTION2/tools/data/gym/`。 + +## 1. 准备标注文件 + +首先,用户可以使用如下脚本下载标注文件并进行预处理: + +```shell +bash download_annotations.sh +``` + +## 2. 准备视频 + +用户可以使用以下脚本准备视频,视频准备代码修改自 [ActivityNet 爬虫](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics)。 +注意这一步骤将花费较长时间。 + +```shell +bash download_videos.sh +``` + +## 3. 裁剪长视频至动作级别 + +用户首先需要使用以下脚本将 GYM 中的长视频依据标注文件裁剪至动作级别。 + +```shell +python trim_event.py +``` + +## 4. 裁剪动作视频至分动作级别 + +随后,用户需要使用以下脚本将 GYM 中的动作视频依据标注文件裁剪至分动作级别。将视频的裁剪分成两个级别可以带来更高的效率(在长视频中裁剪多个极短片段异常耗时)。 + +```shell +python trim_subaction.py +``` + +## 5. 提取 RGB 帧和光流 + +如果用户仅使用 video loader,则可以跳过本步。 + +在提取之前,请参考 [安装教程](/docs_zh_CN/install.md) 安装 [denseflow](https://github.com/open-mmlab/denseflow)。 + +用户可使用如下脚本同时抽取 RGB 帧和光流(提取光流时使用 tvl1 算法): + +```shell +bash extract_frames.sh +``` + +## 6. 基于提取出的分动作生成文件列表 + +用户可使用以下脚本为 GYM99 生成训练及测试的文件列表: + +```shell +python generate_file_list.py +``` + +## 7. 
目录结构 + +在完整完成 GYM 的数据处理后,将得到帧文件夹(RGB 帧和光流帧),动作视频片段,分动作视频片段以及训练测试所用标注文件。 + +在整个项目目录下(仅针对 GYM),完整目录结构如下所示: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── gym +| | ├── annotations +| | | ├── gym99_train_org.txt +| | | ├── gym99_val_org.txt +| | | ├── gym99_train.txt +| | | ├── gym99_val.txt +| | | ├── annotation.json +| | | └── event_annotation.json +│ │ ├── videos +| | | ├── 0LtLS9wROrk.mp4 +| | | ├── ... +| | | └── zfqS-wCJSsw.mp4 +│ │ ├── events +| | | ├── 0LtLS9wROrk_E_002407_002435.mp4 +| | | ├── ... +| | | └── zfqS-wCJSsw_E_006732_006824.mp4 +│ │ ├── subactions +| | | ├── 0LtLS9wROrk_E_002407_002435_A_0003_0005.mp4 +| | | ├── ... +| | | └── zfqS-wCJSsw_E_006244_006252_A_0000_0007.mp4 +| | └── subaction_frames +``` + +关于 GYM 数据集上的训练与测试,请参照 [基础教程](/docs_zh_CN/getting_started.md)。 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download.py b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download.py new file mode 100644 index 0000000000000000000000000000000000000000..cfcb954c350f9d9ca9336db4c48a930e44466607 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download.py @@ -0,0 +1,100 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# This script is copied from +# https://github.com/activitynet/ActivityNet/blob/master/Crawler/Kinetics/download.py # noqa: E501 +# The code is licensed under the MIT licence. +import argparse +import os +import ssl +import subprocess + +import mmcv +from joblib import Parallel, delayed + +ssl._create_default_https_context = ssl._create_unverified_context + + +def download(video_identifier, + output_filename, + num_attempts=5, + url_base='https://www.youtube.com/watch?v='): + """Download a video from YouTube if it exists and is not blocked. + arguments: + --------- + video_identifier: str + Unique YouTube video identifier (11 characters) + output_filename: str + File path where the video will be stored. + """ + # Defensive argument checking. 
+ assert isinstance(video_identifier, str), 'video_identifier must be string' + assert isinstance(output_filename, str), 'output_filename must be string' + assert len(video_identifier) == 11, 'video_identifier must have length 11' + + status = False + + if not os.path.exists(output_filename): + command = [ + 'youtube-dl', '--quiet', '--no-warnings', '--no-check-certificate', + '-f', 'mp4', '-o', + '"%s"' % output_filename, + '"%s"' % (url_base + video_identifier) + ] + command = ' '.join(command) + print(command) + attempts = 0 + while True: + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError: + attempts += 1 + if attempts == num_attempts: + return status, 'Fail' + else: + break + # Check if the video was successfully saved. + status = os.path.exists(output_filename) + return status, 'Downloaded' + + +def download_wrapper(youtube_id, output_dir): + """Wrapper for parallel processing purposes.""" + # we do this to align with names in annotations + output_filename = os.path.join(output_dir, youtube_id + '.mp4') + if os.path.exists(output_filename): + status = tuple([youtube_id, True, 'Exists']) + return status + + downloaded, log = download(youtube_id, output_filename) + status = tuple([youtube_id, downloaded, log]) + return status + + +def main(input, output_dir, num_jobs=24): + # Reading and parsing ActivityNet. + youtube_ids = mmcv.load(input).keys() + # Creates folders where videos will be saved later. + if not os.path.exists(output_dir): + os.makedirs(output_dir) + # Download all clips. + if num_jobs == 1: + status_list = [] + for index in youtube_ids: + status_list.append(download_wrapper(index, output_dir)) + else: + status_list = Parallel(n_jobs=num_jobs)( + delayed(download_wrapper)(index, output_dir) + for index in youtube_ids) + + # Save download report. 
+ mmcv.dump(status_list, 'download_report.json') + + +if __name__ == '__main__': + description = 'Helper script for downloading GYM videos.' + p = argparse.ArgumentParser(description=description) + p.add_argument('input', type=str, help='The gym annotation file') + p.add_argument( + 'output_dir', type=str, help='Output directory to save videos.') + p.add_argument('-n', '--num-jobs', type=int, default=24) + main(**vars(p.parse_args())) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..4922104995427de641a4fffdda565de6745a2d11 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download_annotations.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash + +set -e + +DATA_DIR="../../../data/gym/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +wget https://sdolivia.github.io/FineGym/resources/dataset/finegym_annotation_info_v1.0.json -O $DATA_DIR/annotation.json +wget https://sdolivia.github.io/FineGym/resources/dataset/gym99_train_element_v1.0.txt -O $DATA_DIR/gym99_train_org.txt +wget https://sdolivia.github.io/FineGym/resources/dataset/gym99_val_element.txt -O $DATA_DIR/gym99_val_org.txt diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..1e8fd995996fe1295debfe1ef03f4a885c6e77aa --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/download_videos.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash + +# set up environment +conda env create -f environment.yml +source activate gym +pip install mmcv +pip install --upgrade youtube-dl + +DATA_DIR="../../../data/gym" +ANNO_DIR="../../../data/gym/annotations" +python download.py ${ANNO_DIR}/annotation.json 
${DATA_DIR}/videos + +source deactivate gym +conda remove -n gym --all diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/environment.yml b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/environment.yml new file mode 100644 index 0000000000000000000000000000000000000000..88d8998513c8b0ddd32064883b0dd7917f15e216 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/environment.yml @@ -0,0 +1,36 @@ +name: gym +channels: + - anaconda + - menpo + - conda-forge + - defaults +dependencies: + - ca-certificates=2020.1.1 + - certifi=2020.4.5.1 + - ffmpeg=2.8.6 + - libcxx=10.0.0 + - libedit=3.1.20181209 + - libffi=3.3 + - ncurses=6.2 + - openssl=1.1.1g + - pip=20.0.2 + - python=3.7.7 + - readline=8.0 + - setuptools=46.4.0 + - sqlite=3.31.1 + - tk=8.6.8 + - wheel=0.34.2 + - xz=5.2.5 + - zlib=1.2.11 + - pip: + - decorator==4.4.2 + - intel-openmp==2019.0 + - joblib==0.15.1 + - mkl==2019.0 + - numpy==1.18.4 + - olefile==0.46 + - pandas==1.0.3 + - python-dateutil==2.8.1 + - pytz==2020.1 + - six==1.14.0 + - youtube-dl diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..cfcc8c044d4b459043c8de6da1dac8fa2855f0c1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/extract_frames.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/gym/subactions/ ../../data/gym/subaction_frames/ --level 1 --flow-type tvl1 --ext mp4 --task both --new-short 256 +echo "Raw frames (RGB and tv-l1) Generated" + +cd gym/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/generate_file_list.py b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/generate_file_list.py new file mode 100644 index 0000000000000000000000000000000000000000..5f4295d2ed42b3fc240941a3f8acdc94d6259429 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/generate_file_list.py @@ -0,0 
+1,49 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import os.path as osp + +annotation_root = '../../../data/gym/annotations' +data_root = '../../../data/gym/subactions' +frame_data_root = '../../../data/gym/subaction_frames' + +videos = os.listdir(data_root) +videos = set(videos) + +train_file_org = osp.join(annotation_root, 'gym99_train_org.txt') +val_file_org = osp.join(annotation_root, 'gym99_val_org.txt') +train_file = osp.join(annotation_root, 'gym99_train.txt') +val_file = osp.join(annotation_root, 'gym99_val.txt') +train_frame_file = osp.join(annotation_root, 'gym99_train_frame.txt') +val_frame_file = osp.join(annotation_root, 'gym99_val_frame.txt') + +train_org = open(train_file_org).readlines() +train_org = [x.strip().split() for x in train_org] +train = [x for x in train_org if x[0] + '.mp4' in videos] +if osp.exists(frame_data_root): + train_frames = [] + for line in train: + length = len(os.listdir(osp.join(frame_data_root, line[0]))) + train_frames.append([line[0], str(length // 3), line[1]]) + train_frames = [' '.join(x) for x in train_frames] + with open(train_frame_file, 'w') as fout: + fout.write('\n'.join(train_frames)) + +train = [x[0] + '.mp4 ' + x[1] for x in train] +with open(train_file, 'w') as fout: + fout.write('\n'.join(train)) + +val_org = open(val_file_org).readlines() +val_org = [x.strip().split() for x in val_org] +val = [x for x in val_org if x[0] + '.mp4' in videos] +if osp.exists(frame_data_root): + val_frames = [] + for line in val: + length = len(os.listdir(osp.join(frame_data_root, line[0]))) + val_frames.append([line[0], str(length // 3), line[1]]) + val_frames = [' '.join(x) for x in val_frames] + with open(val_frame_file, 'w') as fout: + fout.write('\n'.join(val_frames)) + +val = [x[0] + '.mp4 ' + x[1] for x in val] +with open(val_file, 'w') as fout: + fout.write('\n'.join(val)) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/label_map.txt 
b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..daca3aa7f7d41a3c43aea577331be266d6e0b275 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/label_map.txt @@ -0,0 +1,99 @@ +(VT) round-off, flic-flac with 0.5 turn on, stretched salto forward with 0.5 turn off +(VT) round-off, flic-flac on, stretched salto backward with 2 turn off +(VT) round-off, flic-flac on, stretched salto backward with 1 turn off +(VT) round-off, flic-flac on, stretched salto backward with 1.5 turn off +(VT) round-off, flic-flac on, stretched salto backward with 2.5 turn off +(VT) round-off, flic-flac on, stretched salto backward off +(FX) switch leap with 0.5 turn +(FX) switch leap with 1 turn +(FX) split leap with 1 turn +(FX) split leap with 1.5 turn or more +(FX) switch leap (leap forward with leg change to cross split) +(FX) split jump with 1 turn +(FX) split jump (leg separation 180 degree parallel to the floor) +(FX) johnson with additional 0.5 turn +(FX) straddle pike or side split jump with 1 turn +(FX) switch leap to ring position +(FX) stag jump +(FX) 2 turn with free leg held upward in 180 split position throughout turn +(FX) 2 turn in tuck stand on one leg, free leg straight throughout turn +(FX) 3 turn on one leg, free leg optional below horizontal +(FX) 2 turn on one leg, free leg optional below horizontal +(FX) 1 turn on one leg, free leg optional below horizontal +(FX) 2 turn or more with heel of free leg forward at horizontal throughout turn +(FX) 1 turn with heel of free leg forward at horizontal throughout turn +(FX) arabian double salto tucked +(FX) salto forward tucked +(FX) aerial walkover forward +(FX) salto forward stretched with 2 twist +(FX) salto forward stretched with 1 twist +(FX) salto forward stretched with 1.5 twist +(FX) salto forward stretched, feet land together +(FX) double salto backward stretched +(FX) salto backward stretched with 3 twist +(FX) salto backward 
stretched with 2 twist +(FX) salto backward stretched with 2.5 twist +(FX) salto backward stretched with 1.5 twist +(FX) double salto backward tucked with 2 twist +(FX) double salto backward tucked with 1 twist +(FX) double salto backward tucked +(FX) double salto backward piked with 1 twist +(FX) double salto backward piked +(BB) sissone (leg separation 180 degree on the diagonal to the floor, take off two feet, land on one foot) +(BB) split jump with 0.5 turn in side position +(BB) split jump +(BB) straddle pike jump or side split jump +(BB) split ring jump (ring jump with front leg horizontal to the floor) +(BB) switch leap with 0.5 turn +(BB) switch leap (leap forward with leg change) +(BB) split leap forward +(BB) johnson (leap forward with leg change and 0.25 turn to side split or straddle pike position) +(BB) switch leap to ring position +(BB) sheep jump (jump with upper back arch and head release with feet to head height/closed Ring) +(BB) wolf hop or jump (hip angle at 45, knees together) +(BB) 1 turn with heel of free leg forward at horizontal throughout turn +(BB) 2 turn on one leg, free leg optional below horizontal +(BB) 1 turn on one leg, free leg optional below horizontal +(BB) 2 turn in tuck stand on one leg, free leg optional +(BB) salto backward tucked with 1 twist +(BB) salto backward tucked +(BB) salto backward stretched-step out (feet land successively) +(BB) salto backward stretched with legs together +(BB) salto sideward tucked, take off from one leg to side stand +(BB) free aerial cartwheel landing in cross position +(BB) salto forward tucked to cross stand +(BB) free aerial walkover forward, landing on one or both feet +(BB) jump backward, flic-flac take-off with 0.5 twist through handstand to walkover forward, also with support on one arm +(BB) flic-flac to land on both feet +(BB) flic-flac with step-out, also with support on one arm +(BB) round-off +(BB) double salto backward tucked +(BB) salto backward tucked +(BB) double salto backward 
piked +(BB) salto backward stretched with 2 twist +(BB) salto backward stretched with 2.5 twist +(UB) pike sole circle backward with 1 turn to handstand +(UB) pike sole circle backward with 0.5 turn to handstand +(UB) pike sole circle backward to handstand +(UB) giant circle backward with 1 turn to handstand +(UB) giant circle backward with 0.5 turn to handstand +(UB) giant circle backward +(UB) giant circle forward with 1 turn on one arm before handstand phase +(UB) giant circle forward with 0.5 turn to handstand +(UB) giant circle forward +(UB) clear hip circle backward to handstand +(UB) clear pike circle backward with 1 turn to handstand +(UB) clear pike circle backward with 0.5 turn to handstand +(UB) clear pike circle backward to handstand +(UB) stalder backward with 1 turn to handstand +(UB) stalder backward to handstand +(UB) counter straddle over high bar to hang +(UB) counter piked over high bar to hang +(UB) (swing backward or front support) salto forward straddled to hang on high bar +(UB) (swing backward) salto forward piked to hang on high bar +(UB) (swing forward or hip circle backward) salto backward with 0.5 turn piked to hang on high bar +(UB) transition flight from high bar to low bar +(UB) transition flight from low bar to high bar +(UB) (swing forward) double salto backward tucked with 1 turn +(UB) (swing backward) double salto forward tucked +(UB) (swing forward) double salto backward stretched diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/trim_event.py b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/trim_event.py new file mode 100644 index 0000000000000000000000000000000000000000..bf1fc97ade9593db37c0aabf797564921db17c0a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/trim_event.py @@ -0,0 +1,58 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os +import os.path as osp +import subprocess + +import mmcv + +data_root = '../../../data/gym' +video_root = f'{data_root}/videos' +anno_root = f'{data_root}/annotations' +anno_file = f'{anno_root}/annotation.json' + +event_anno_file = f'{anno_root}/event_annotation.json' +event_root = f'{data_root}/events' + +videos = os.listdir(video_root) +videos = set(videos) +annotation = mmcv.load(anno_file) +event_annotation = {} + +mmcv.mkdir_or_exist(event_root) + +for k, v in annotation.items(): + if k + '.mp4' not in videos: + print(f'video {k} has not been downloaded') + continue + + video_path = osp.join(video_root, k + '.mp4') + + for event_id, event_anno in v.items(): + timestamps = event_anno['timestamps'][0] + start_time, end_time = timestamps + event_name = k + '_' + event_id + + output_filename = event_name + '.mp4' + + command = [ + 'ffmpeg', '-i', + '"%s"' % video_path, '-ss', + str(start_time), '-t', + str(end_time - start_time), '-c:v', 'libx264', '-c:a', 'copy', + '-threads', '8', '-loglevel', 'panic', + '"%s"' % osp.join(event_root, output_filename) + ] + command = ' '.join(command) + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError: + print( + f'Trimming of the Event {event_name} of Video {k} Failed', + flush=True) + + segments = event_anno['segments'] + if segments is not None: + event_annotation[event_name] = segments + +mmcv.dump(event_annotation, event_anno_file) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/gym/trim_subaction.py b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/trim_subaction.py new file mode 100644 index 0000000000000000000000000000000000000000..bbff90a83992bf549d67002299aee233c0dd5a3b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/gym/trim_subaction.py @@ -0,0 +1,52 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os +import os.path as osp +import subprocess + +import mmcv + +data_root = '../../../data/gym' +anno_root = f'{data_root}/annotations' + +event_anno_file = f'{anno_root}/event_annotation.json' +event_root = f'{data_root}/events' +subaction_root = f'{data_root}/subactions' + +events = os.listdir(event_root) +events = set(events) +annotation = mmcv.load(event_anno_file) + +mmcv.mkdir_or_exist(subaction_root) + +for k, v in annotation.items(): + if k + '.mp4' not in events: + print(f'video {k[:11]} has not been downloaded ' + f'or the event clip {k} not generated') + continue + + video_path = osp.join(event_root, k + '.mp4') + + for subaction_id, subaction_anno in v.items(): + timestamps = subaction_anno['timestamps'] + start_time, end_time = timestamps[0][0], timestamps[-1][1] + subaction_name = k + '_' + subaction_id + + output_filename = subaction_name + '.mp4' + + command = [ + 'ffmpeg', '-i', + '"%s"' % video_path, '-ss', + str(start_time), '-t', + str(end_time - start_time), '-c:v', 'libx264', '-c:a', 'copy', + '-threads', '8', '-loglevel', 'panic', + '"%s"' % osp.join(subaction_root, output_filename) + ] + command = ' '.join(command) + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError: + print( + f'Trimming of the Subaction {subaction_name} of Event ' + f'{k} Failed', + flush=True) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/README.md new file mode 100644 index 0000000000000000000000000000000000000000..206b54876422b93ff03069939687909e6428d4a4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/README.md @@ -0,0 +1,125 @@ +# Preparing HMDB51 + +## Introduction + + + +```BibTeX +@article{Kuehne2011HMDBAL, + title={HMDB: A large video database for human motion recognition}, + author={Hilde Kuehne and Hueihan Jhuang and E. Garrote and T. 
Poggio and Thomas Serre}, + journal={2011 International Conference on Computer Vision}, + year={2011}, + pages={2556-2563} +} +``` + +For basic dataset information, you can refer to the dataset [website](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/). +Before we start, please make sure that the current working directory is `$MMACTION2/tools/data/hmdb51/`. + +To run the bash scripts below, you need to install `unrar`. You can install it with `sudo apt-get install unrar`, +or follow the usage notes in [this repo](https://github.com/innerlee/setup) and use the [`zzunrar.sh`](https://github.com/innerlee/setup/blob/master/zzunrar.sh) +script for easy installation without sudo. + +## Step 1. Prepare Annotations + +First of all, you can run the following script to prepare annotations. + +```shell +bash download_annotations.sh +``` + +## Step 2. Prepare Videos + +Then, you can run the following script to prepare videos. + +```shell +bash download_videos.sh +``` + +## Step 3. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. + +You can run the following commands to create a soft link to the SSD. + +```shell +# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/hmdb51_extracted/ +ln -s /mnt/SSD/hmdb51_extracted/ ../../../data/hmdb51/rawframes +``` + +If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. + +```shell +bash extract_rgb_frames.sh +``` + +If you didn't install denseflow, you can still extract RGB frames using OpenCV with the following script, but it will keep the original size of the images.
+ +```shell +bash extract_rgb_frames_opencv.sh +``` + +If both RGB frames and optical flow are required, run the following script to extract frames using the "tvl1" algorithm. + +```shell +bash extract_frames.sh +``` + +## Step 4. Generate File List + +You can run the following scripts to generate file lists in the rawframes and videos formats. + +```shell +bash generate_rawframes_filelist.sh +bash generate_videos_filelist.sh +``` + +## Step 5. Check Directory Structure + +After completing the whole data preparation process for HMDB51, +you will have the rawframes (RGB + Flow), videos and annotation files for HMDB51. + +In the context of the whole project (for HMDB51 only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── hmdb51 +│ │ ├── hmdb51_{train,val}_split_{1,2,3}_rawframes.txt +│ │ ├── hmdb51_{train,val}_split_{1,2,3}_videos.txt +│ │ ├── annotations +│ │ ├── videos +│ │ │ ├── brush_hair +│ │ │ │ ├── April_09_brush_hair_u_nm_np1_ba_goo_0.avi + +│ │ │ ├── wave +│ │ │ │ ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0.avi +│ │ ├── rawframes +│ │ │ ├── brush_hair +│ │ │ │ ├── April_09_brush_hair_u_nm_np1_ba_goo_0 +│ │ │ │ │ ├── img_00001.jpg +│ │ │ │ │ ├── img_00002.jpg +│ │ │ │ │ ├── ... +│ │ │ │ │ ├── flow_x_00001.jpg +│ │ │ │ │ ├── flow_x_00002.jpg +│ │ │ │ │ ├── ... +│ │ │ │ │ ├── flow_y_00001.jpg +│ │ │ │ │ ├── flow_y_00002.jpg +│ │ │ ├── ... +│ │ │ ├── wave +│ │ │ │ ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0 +│ │ │ │ ├── ... +│ │ │ │ ├── winKen_wave_u_cm_np1_ri_bad_1 + +``` + +For training and evaluating on HMDB51, please refer to [getting_started.md](/docs/getting_started.md).
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..a34c4b9ce91a1cfabb6eb12472e6f51746b36693 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/README_zh-CN.md @@ -0,0 +1,121 @@ +# 准备 HMDB51 + +## 简介 + + + +```BibTeX +@article{Kuehne2011HMDBAL, + title={HMDB: A large video database for human motion recognition}, + author={Hilde Kuehne and Hueihan Jhuang and E. Garrote and T. Poggio and Thomas Serre}, + journal={2011 International Conference on Computer Vision}, + year={2011}, + pages={2556-2563} +} +``` + +用户可以参照数据集 [官网](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/),获取数据集相关的基本信息。 +在准备数据集前,请确保命令行当前路径为 `$MMACTION2/tools/data/hmdb51/`。 + +为运行下面的 bash 脚本,需要安装 `unrar`。用户可运行 `sudo apt-get install unrar` 安装,或参照 [setup](https://github.com/innerlee/setup),运行 [`zzunrar.sh`](https://github.com/innerlee/setup/blob/master/zzunrar.sh) 脚本实现无管理员权限下的简易安装。 + +## 步骤 1. 下载标注文件 + +首先,用户可使用以下命令下载标注文件。 + +```shell +bash download_annotations.sh +``` + +## 步骤 2. 下载视频 + +之后,用户可使用以下指令下载视频 + +```shell +bash download_videos.sh +``` + +## 步骤 3. 抽取帧和光流 + +如果用户只想使用视频加载训练,则该部分是 **可选项**。 + +在抽取视频帧和光流之前,请参考 [安装指南](/docs_zh_CN/install.md) 安装 [denseflow](https://github.com/open-mmlab/denseflow)。 + +如果用户有大量的 SSD 存储空间,则推荐将抽取的帧存储至 I/O 性能更优秀的 SSD 上。 +用户可使用以下命令为 SSD 建立软链接。 + +```shell +# 执行这两行指令进行抽取(假设 SSD 挂载在 "/mnt/SSD/"上) +mkdir /mnt/SSD/hmdb51_extracted/ +ln -s /mnt/SSD/hmdb51_extracted/ ../../../data/hmdb51/rawframes +``` + +如果用户需要抽取 RGB 帧(因为抽取光流的过程十分耗时),可以考虑运行以下命令使用 denseflow **只抽取 RGB 帧**。 + +```shell +bash extract_rgb_frames.sh +``` + +如果用户没有安装 denseflow,则可以运行以下命令使用 OpenCV 抽取 RGB 帧。然而,该方法只能抽取与原始视频分辨率相同的帧。 + +```shell +bash extract_rgb_frames_opencv.sh +``` + +如果用户想抽取 RGB 帧和光流,则可以运行以下脚本,使用 "tvl1" 算法进行抽取。 + +```shell +bash extract_frames.sh +``` + +## 步骤 4. 
生成文件列表 + +用户可以通过运行以下命令生成帧和视频格式的文件列表。 + +```shell +bash generate_rawframes_filelist.sh +bash generate_videos_filelist.sh +``` + +## 步骤 5. 检查目录结构 + +在完成 HMDB51 数据集准备流程后,用户可以得到 HMDB51 的 RGB 帧 + 光流文件,视频文件以及标注文件。 + +在整个 MMAction2 文件夹下,HMDB51 的文件结构如下: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── hmdb51 +│ │ ├── hmdb51_{train,val}_split_{1,2,3}_rawframes.txt +│ │ ├── hmdb51_{train,val}_split_{1,2,3}_videos.txt +│ │ ├── annotations +│ │ ├── videos +│ │ │ ├── brush_hair +│ │ │ │ ├── April_09_brush_hair_u_nm_np1_ba_goo_0.avi + +│ │ │ ├── wave +│ │ │ │ ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0.avi +│ │ ├── rawframes +│ │ │ ├── brush_hair +│ │ │ │ ├── April_09_brush_hair_u_nm_np1_ba_goo_0 +│ │ │ │ │ ├── img_00001.jpg +│ │ │ │ │ ├── img_00002.jpg +│ │ │ │ │ ├── ... +│ │ │ │ │ ├── flow_x_00001.jpg +│ │ │ │ │ ├── flow_x_00002.jpg +│ │ │ │ │ ├── ... +│ │ │ │ │ ├── flow_y_00001.jpg +│ │ │ │ │ ├── flow_y_00002.jpg +│ │ │ ├── ... +│ │ │ ├── wave +│ │ │ │ ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0 +│ │ │ │ ├── ... +│ │ │ │ ├── winKen_wave_u_cm_np1_ri_bad_1 + +``` + +关于对 HMDB51 进行训练和验证,可以参照 [基础教程](/docs_zh_CN/getting_started.md)。 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..f168cb1ea4e9cfec824f4bb9485b861970e4b9b1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/download_annotations.sh @@ -0,0 +1,22 @@ +#!/usr/bin/env bash + +set -e + +DATA_DIR="../../../data/hmdb51/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. 
Creating"; + mkdir -p ${DATA_DIR} +fi + +cd ${DATA_DIR} +wget http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/test_train_splits.rar --no-check-certificate + +# sudo apt-get install unrar +unrar x test_train_splits.rar +rm test_train_splits.rar + +mv testTrainMulti_7030_splits/*.txt ./ +rmdir testTrainMulti_7030_splits + +cd - diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..ea5d90730f98dc4928dc3fa5d6d73e5e88a54ce9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/download_videos.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash + +set -e + +DATA_DIR="../../../data/hmdb51/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +cd ${DATA_DIR} + +mkdir -p ./videos +cd ./videos + +wget http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/hmdb51_org.rar --no-check-certificate + +# sudo apt-get install unrar +unrar x ./hmdb51_org.rar +rm ./hmdb51_org.rar + +# extract all rar files with full path +for file in *.rar; do unrar x $file; done + +rm ./*.rar +cd "../../../tools/data/hmdb51" diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..fb63c16b9142e5d1e014b935a957117b1678625b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_frames.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/hmdb51/videos/ ../../data/hmdb51/rawframes/ --task both --level 2 --flow-type tvl1 +echo "Raw frames (RGB and Flow) Generated" +cd hmdb51/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_rgb_frames.sh new file mode 
100644 index 0000000000000000000000000000000000000000..9e935b1f8dbcfcd4f2565aa3db28df9e54d4b0c9 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_rgb_frames.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/hmdb51/videos/ ../../data/hmdb51/rawframes/ --task rgb --level 2 --ext avi +echo "Raw frames (RGB only) generated" + +cd hmdb51/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_rgb_frames_opencv.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_rgb_frames_opencv.sh new file mode 100644 index 0000000000000000000000000000000000000000..91ff4f3254038b9600cba2d40d41296937733301 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/extract_rgb_frames_opencv.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/hmdb51/videos/ ../../data/hmdb51/rawframes/ --task rgb --level 2 --ext avi --use-opencv +echo "Raw frames (RGB only) generated" + +cd hmdb51/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..bc20187a9ea806c0cf8e442d71b7bdd2c784bde5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/generate_rawframes_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ + +PYTHONPATH=. python tools/data/build_file_list.py hmdb51 data/hmdb51/rawframes/ --level 2 --format rawframes --shuffle +echo "Filelist for rawframes generated."
+ +cd tools/data/hmdb51/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..4acd28f4ce2f4b8ceb5c67cc7fd9e7a3952e3e5d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/generate_videos_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ + +PYTHONPATH=. python tools/data/build_file_list.py hmdb51 data/hmdb51/videos/ --level 2 --format videos --shuffle +echo "Filelist for videos generated." + +cd tools/data/hmdb51/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..3217416f52436eb030736db7ae8ae86a23883bc1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hmdb51/label_map.txt @@ -0,0 +1,51 @@ +brush_hair +cartwheel +catch +chew +clap +climb +climb_stairs +dive +draw_sword +dribble +drink +eat +fall_floor +fencing +flic_flac +golf +handstand +hit +hug +jump +kick +kick_ball +kiss +laugh +pick +pour +pullup +punch +push +pushup +ride_bike +ride_horse +run +shake_hands +shoot_ball +shoot_bow +shoot_gun +sit +situp +smile +smoke +somersault +stand +swing_baseball +sword +sword_exercise +talk +throw +turn +walk +wave diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/README.md new file mode 100644 index 0000000000000000000000000000000000000000..6bcc73f862eaaedc222f3f2310cc3d0357971f03 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/README.md @@ -0,0 +1,123 @@ +# Preparing HVU + +## Introduction + + + +```BibTeX +@article{Diba2019LargeSH, + title={Large Scale Holistic Video Understanding}, + author={Ali Diba and M. Fayyaz and Vivek Sharma and Manohar Paluri and Jurgen Gall and R. Stiefelhagen and L. 
Gool}, + journal={arXiv: Computer Vision and Pattern Recognition}, + year={2019} +} +``` + +For basic dataset information, please refer to the official [project](https://github.com/holistic-video-understanding/HVU-Dataset/) and the [paper](https://arxiv.org/abs/1904.11451). +Before we start, please make sure that the current working directory is `$MMACTION2/tools/data/hvu/`. + +## Step 1. Prepare Annotations + +First of all, you can run the following script to prepare annotations. + +```shell +bash download_annotations.sh +``` + +Besides, you need to run the following command to parse the tag list of HVU. + +```shell +python parse_tag_list.py +``` + +## Step 2. Prepare Videos + +Then, you can run the following script to prepare videos. +The code is adapted from the [official crawler](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics). Note that this might take a long time. + +```shell +bash download_videos.sh +``` + +## Step 3. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +You can use the following script to extract both RGB and Flow frames. + +```shell +bash extract_frames.sh +``` + +By default, we generate frames with the short edge resized to 256. +More details can be found in [data_preparation](/docs/data_preparation.md). + +## Step 4. Generate File List + +You can run the following scripts to generate file lists in the videos and rawframes formats, respectively. + +```shell +bash generate_videos_filelist.sh +# execute the command below when rawframes are ready +bash generate_rawframes_filelist.sh +``` + +## Step 5. Generate File List for Each Individual Tag Category + +This part is **optional** if you don't want to train models on HVU for a specific tag category. + +The file list generated in step 4 contains labels of different categories.
These file lists can only be +handled with HVUDataset and used for multi-task learning of different tag categories. The component +`LoadHVULabel` is needed to load the multi-category tags, and the `HVULoss` should be used to train +the model. + +If you only want to train video recognition models for a specific tag category, e.g. you want to train +a recognition model on HVU which only handles tags in the category `action`, we recommend using +the following command to generate file lists for the specific tag category. The new list, which only +contains tags of a specific category, can be handled with `VideoDataset` or `RawframeDataset`. The +recognition models can be trained with `BCELossWithLogits`. + +The following command generates the file list for the tag category ${category}. Note that the tag category you +specify must be one of the 6 tag categories available in HVU: \['action', 'attribute', 'concept', 'event', +'object', 'scene'\]. + +```shell +python generate_sub_file_list.py path/to/filelist.json ${category} +``` + +The filename of the generated file list for ${category} is obtained by replacing `hvu` in the original +filename with `hvu_${category}`. For example, if the original filename is `hvu_train.json`, the filename +of the file list for action is `hvu_action_train.json`. + +## Step 6. Folder Structure + +After completing the whole data pipeline for HVU preparation, +you will have the rawframes (RGB + Flow), videos and annotation files for HVU. + +In the context of the whole project (for HVU only), the full folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── hvu +│ │ ├── hvu_train_video.json +│ │ ├── hvu_val_video.json +│ │ ├── hvu_train.json +│ │ ├── hvu_val.json +│ │ ├── annotations +│ │ ├── videos_train +│ │ │ ├── OLpWTpTC4P8_000570_000670.mp4 +│ │ │ ├── xsPKW4tZZBc_002330_002430.mp4 +│ │ │ ├── ...
+│ │ ├── videos_val +│ │ ├── rawframes_train +│ │ ├── rawframes_val + +``` + +For training and evaluating on HVU, please refer to [getting_started](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..5b3ffa1ea3a26fdf0cb49af72bdde0f10d1a9275 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/README_zh-CN.md @@ -0,0 +1,110 @@ +# Preparing HVU + +## Introduction + + + +```BibTeX +@article{Diba2019LargeSH, + title={Large Scale Holistic Video Understanding}, + author={Ali Diba and M. Fayyaz and Vivek Sharma and Manohar Paluri and Jurgen Gall and R. Stiefelhagen and L. Gool}, + journal={arXiv: Computer Vision and Pattern Recognition}, + year={2019} +} +``` + +For basic dataset information, please refer to the [official project](https://github.com/holistic-video-understanding/HVU-Dataset/) and the [paper](https://arxiv.org/abs/1904.11451). +Before you start, please make sure that the current directory is `$MMACTION2/tools/data/hvu/`. + +## 1. Prepare Annotations + +First, you can run the following script to download and preprocess the annotations: + +```shell +bash download_annotations.sh +``` + +In addition, you can run the following command to parse the tag list of HVU: + +```shell +python parse_tag_list.py +``` + +## 2. Prepare Videos + +You can run the following script to prepare the videos. The code is adapted from the [ActivityNet crawler](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics). +Note that this step may take a long time. + +```shell +bash download_videos.sh +``` + +## 3. Extract RGB Frames and Optical Flow + +You can skip this step if you only use the video loader. + +Before extracting, please refer to the [installation guide](/docs_zh_CN/install.md) to install [denseflow](https://github.com/open-mmlab/denseflow). + +You can run the following script to extract both RGB frames and optical flow: + +```shell +bash extract_frames.sh +``` + +By default, the script generates frames whose short edge is 256 pixels. See [data preparation](/docs_zh_CN/data_preparation.md) for more details. + +## 4. Generate File Lists + +You can run the following two scripts to generate file lists for the videos and the rawframe folders, respectively: + +```shell +bash generate_videos_filelist.sh +# generate file lists for the rawframe folders +bash generate_rawframes_filelist.sh +``` +
+## 5. Generate File Lists for Each Tag Category + +This step is needed if you want to train recognition models for each tag category of HVU. + +The file lists generated in step 4 contain tags of different categories and can only be used with `HVUDataset` for multi-task learning across tag categories. During data loading, the `LoadHVULabel` class is needed to load the multi-category tags, and `HVULoss` is used as the loss function during training. + +If you only need to train on a specific tag category, e.g. a recognition model for the tags in the `action` category of HVU, we recommend using the following script to generate file lists for that category. The new list contains only tags of the specified category, so it can be loaded with `VideoDataset` or `RawframeDataset`. `BCELossWithLogits` is used as the loss function during training. + +The following script generates a file list for the tags of category ${category}. Note that only the 6 tag categories in HVU are supported: action, attribute, concept, event, object, scene. + +```shell +python generate_sub_file_list.py path/to/filelist.json ${category} +``` + +For category ${category}, the filename of the generated file list replaces `hvu` with `hvu_${category}`. For example, if the original filename is `hvu_train.json`, the file list generated for category action is named `hvu_action_train.json`. + +## 6. Folder Structure + +After the whole HVU data pipeline is complete, you will have the rawframes (RGB and optical flow), videos and annotation files. + +In the context of the whole project (for HVU only), the full folder structure looks like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── hvu +│ │ ├── hvu_train_video.json +│ │ ├── hvu_val_video.json +│ │ ├── hvu_train.json +│ │ ├── hvu_val.json +│ │ ├── annotations +│ │ ├── videos_train +│ │ │ ├── OLpWTpTC4P8_000570_000670.mp4 +│ │ │ ├── xsPKW4tZZBc_002330_002430.mp4 +│ │ │ ├── ... +│ │ ├── videos_val +│ │ ├── rawframes_train +│ │ ├── rawframes_val + +``` + +For training and testing on HVU, please refer to [getting started](/docs_zh_CN/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download.py b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download.py new file mode 100644 index 0000000000000000000000000000000000000000..2ab18e843445fa9e3cc4446deda70d0e4c55b6a1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download.py @@ -0,0 +1,203 @@ +# ------------------------------------------------------------------------------ +# Adapted from https://github.com/activitynet/ActivityNet/ +# Original licence: Copyright (c) Microsoft, under the MIT License.
+# ------------------------------------------------------------------------------ + +import argparse +import glob +import os +import shutil +import ssl +import subprocess +import uuid + +import mmcv +from joblib import Parallel, delayed + +ssl._create_default_https_context = ssl._create_unverified_context +args = None + + +def create_video_folders(output_dir, tmp_dir): + if not os.path.exists(output_dir): + os.makedirs(output_dir) + if not os.path.exists(tmp_dir): + os.makedirs(tmp_dir) + + +def construct_video_filename(item, trim_format, output_dir): + """Given a dataset row, this function constructs the output filename for a + given video.""" + youtube_id, start_time, end_time = item + start_time, end_time = int(start_time * 10), int(end_time * 10) + basename = '%s_%s_%s.mp4' % (youtube_id, trim_format % start_time, + trim_format % end_time) + output_filename = os.path.join(output_dir, basename) + return output_filename + + +def download_clip(video_identifier, + output_filename, + start_time, + end_time, + tmp_dir='/tmp/hvu/.tmp_dir', + num_attempts=5, + url_base='https://www.youtube.com/watch?v='): + """Download a video from youtube if exists and is not blocked. + arguments: + --------- + video_identifier: str + Unique YouTube video identifier (11 characters) + output_filename: str + File path where the video will be stored. + start_time: float + Indicates the beginning time in seconds from where the video + will be trimmed. + end_time: float + Indicates the ending time in seconds of the trimmed video. + """ + # Defensive argument checking. 
+ assert isinstance(video_identifier, str), 'video_identifier must be string' + assert isinstance(output_filename, str), 'output_filename must be string' + assert len(video_identifier) == 11, 'video_identifier must have length 11' + + status = False + tmp_filename = os.path.join(tmp_dir, '%s.%%(ext)s' % uuid.uuid4()) + + if not os.path.exists(output_filename): + if not os.path.exists(tmp_filename): + command = [ + 'youtube-dl', '--quiet', '--no-warnings', + '--no-check-certificate', '-f', 'mp4', '-o', + '"%s"' % tmp_filename, + '"%s"' % (url_base + video_identifier) + ] + command = ' '.join(command) + print(command) + attempts = 0 + while True: + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError: + attempts += 1 + if attempts == num_attempts: + return status, 'Downloading Failed' + else: + break + + tmp_filename = glob.glob('%s*' % os.path.splitext(tmp_filename)[0])[0] + # Construct command to trim the videos (ffmpeg required). + command = [ + 'ffmpeg', '-i', + '"%s"' % tmp_filename, '-ss', + str(start_time), '-t', + str(end_time - start_time), '-c:v', 'libx264', '-c:a', 'copy', + '-threads', '1', '-loglevel', 'panic', + '"%s"' % output_filename + ] + command = ' '.join(command) + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError: + return status, 'Trimming Failed' + + # Check if the video was successfully saved.
+ status = os.path.exists(output_filename) + os.remove(tmp_filename) + return status, 'Downloaded' + + +def download_clip_wrapper(item, trim_format, tmp_dir, output_dir): + """Wrapper for parallel processing purposes.""" + output_filename = construct_video_filename(item, trim_format, output_dir) + clip_id = os.path.basename(output_filename).split('.mp4')[0] + if os.path.exists(output_filename): + status = tuple([clip_id, True, 'Exists']) + return status + + youtube_id, start_time, end_time = item + downloaded, log = download_clip( + youtube_id, output_filename, start_time, end_time, tmp_dir=tmp_dir) + + status = tuple([clip_id, downloaded, log]) + return status + + +def parse_hvu_annotations(input_csv): + """Returns the parsed annotation rows as a list of tuples. + arguments: + --------- + input_csv: str + Path to CSV file containing the following columns: + 'Tags, youtube_id, time_start, time_end' + returns: + ------- + dataset: List of tuples. Each tuple consists of + (youtube_id, time_start, time_end). The type of time is float. + """ + lines = open(input_csv).readlines() + lines = [x.strip().split(',')[1:] for x in lines[1:]] + + lines = [(x[0], float(x[1]), float(x[2])) for x in lines] + + return lines + + +def main(input_csv, + output_dir, + trim_format='%06d', + num_jobs=24, + tmp_dir='/tmp/hvu'): + + tmp_dir = os.path.join(tmp_dir, '.tmp_dir') + + # Reading and parsing HVU. + dataset = parse_hvu_annotations(input_csv) + + # Creates folders where videos will be saved later. + create_video_folders(output_dir, tmp_dir) + + # Download all clips. + if num_jobs == 1: + status_lst = [] + for item in dataset: + status_lst.append( + download_clip_wrapper(item, trim_format, tmp_dir, output_dir)) + else: + status_lst = Parallel(n_jobs=num_jobs)( + delayed(download_clip_wrapper)(item, trim_format, tmp_dir, + output_dir) for item in dataset) + + # Clean tmp dir. + shutil.rmtree(tmp_dir) + # Save download report.
+ mmcv.dump(status_lst, 'download_report.json') + + +if __name__ == '__main__': + description = 'Helper script for downloading and trimming HVU videos.' + p = argparse.ArgumentParser(description=description) + p.add_argument( + 'input_csv', + type=str, + help=('CSV file containing the following format: ' + 'Tags, youtube_id, time_start, time_end')) + p.add_argument( + 'output_dir', + type=str, + help='Output directory where videos will be saved.') + p.add_argument( + '-f', + '--trim-format', + type=str, + default='%06d', + help=('This will be the format for the ' + 'filename of trimmed videos: ' + 'videoid_%%0xd(start_time)_%%0xd(end_time).mp4. ' + 'Note that the start_time is multiplied by 10 since ' + 'the timestamps may contain decimals. ')) + p.add_argument('-n', '--num-jobs', type=int, default=24) + p.add_argument('-t', '--tmp-dir', type=str, default='/tmp/hvu') + main(**vars(p.parse_args())) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..d100a47598c50d95d7ef7c15bdece53ff28c52e8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download_annotations.sh @@ -0,0 +1,22 @@ +#!/usr/bin/env bash + +set -e + +DATA_DIR="../../../data/hvu/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +git clone https://github.com/holistic-video-understanding/HVU-Dataset.git + +cd HVU-Dataset +unzip -o HVU_Train_V1.0.zip +unzip -o HVU_Val_V1.0.zip +cd ..
+mv HVU-Dataset/HVU_Train_V1.0.csv ${DATA_DIR}/hvu_train.csv +mv HVU-Dataset/HVU_Val_V1.0.csv ${DATA_DIR}/hvu_val.csv +mv HVU-Dataset/HVU_Tags_Categories_V1.0.csv ${DATA_DIR}/hvu_categories.csv + +rm -rf HVU-Dataset diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..a4ce0d63f15099467cacd50f7c3e61d22d423065 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/download_videos.sh @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +# set up environment +conda env create -f environment.yml +source activate hvu +pip install mmcv +pip install --upgrade youtube-dl + +DATA_DIR="../../../data/hvu" +ANNO_DIR="../../../data/hvu/annotations" +python download.py ${ANNO_DIR}/hvu_train.csv ${DATA_DIR}/videos_train +python download.py ${ANNO_DIR}/hvu_val.csv ${DATA_DIR}/videos_val + +source deactivate hvu +conda remove -n hvu --all diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/environment.yml b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/environment.yml new file mode 100644 index 0000000000000000000000000000000000000000..bcee98f8779857a9d382f2dc85f37fbec81cbba0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/environment.yml @@ -0,0 +1,36 @@ +name: hvu +channels: + - anaconda + - menpo + - conda-forge + - defaults +dependencies: + - ca-certificates=2020.1.1 + - certifi=2020.4.5.1 + - ffmpeg=2.8.6 + - libcxx=10.0.0 + - libedit=3.1.20181209 + - libffi=3.3 + - ncurses=6.2 + - openssl=1.1.1g + - pip=20.0.2 + - python=3.7.7 + - readline=8.0 + - setuptools=46.4.0 + - sqlite=3.31.1 + - tk=8.6.8 + - wheel=0.34.2 + - xz=5.2.5 + - zlib=1.2.11 + - pip: + - decorator==4.4.2 + - intel-openmp==2019.0 + - joblib==0.15.1 + - mkl==2019.0 + - numpy==1.18.4 + - olefile==0.46 + - pandas==1.0.3 + - python-dateutil==2.8.1 + - pytz==2020.1 + - six==1.14.0 + - youtube-dl diff --git
a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..d50f1cf87b1df92bbc3395a54e0b61eb566b543d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/extract_frames.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/hvu/videos_train/ ../../data/hvu/rawframes_train/ --level 1 --flow-type tvl1 --ext mp4 --task both --new-short 256 +echo "Raw frames (RGB and tv-l1) Generated for train set" + +python build_rawframes.py ../../data/hvu/videos_val/ ../../data/hvu/rawframes_val/ --level 1 --flow-type tvl1 --ext mp4 --task both --new-short 256 +echo "Raw frames (RGB and tv-l1) Generated for val set" + +cd hvu/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_file_list.py b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_file_list.py new file mode 100644 index 0000000000000000000000000000000000000000..83e99b14820d3b4273dddb229bad561ab5e794a2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_file_list.py @@ -0,0 +1,152 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import fnmatch +import glob +import os +import os.path as osp + +import mmcv + +annotation_root = '../../data/hvu/annotations' +tag_file = 'hvu_tags.json' +args = None + + +def parse_directory(path, + rgb_prefix='img_', + flow_x_prefix='flow_x_', + flow_y_prefix='flow_y_', + level=1): + """Parse directories holding extracted frames from standard benchmarks. + + Args: + path (str): Directory path to parse frames. + rgb_prefix (str): Prefix of generated rgb frames name. + default: 'img_'. + flow_x_prefix (str): Prefix of generated flow x name. + default: `flow_x_`. + flow_y_prefix (str): Prefix of generated flow y name. + default: `flow_y_`. + level (int): Directory level for glob searching. Options are 1 and 2. + default: 1. 
+ + Returns: + dict: frame info dict with video id as key and tuple(path(str), + rgb_num(int), flow_x_num(int)) as value. + """ + print(f'parse frames under directory {path}') + if level == 1: + # Only search for one-level directory + def locate_directory(x): + return osp.basename(x) + + frame_dirs = glob.glob(osp.join(path, '*')) + + elif level == 2: + # search for two-level directory + def locate_directory(x): + return osp.join(osp.basename(osp.dirname(x)), osp.basename(x)) + + frame_dirs = glob.glob(osp.join(path, '*', '*')) + + else: + raise ValueError('level can be only 1 or 2') + + def count_files(directory, prefix_list): + """Count the files in a directory that match each given prefix. + + Args: + directory (str): Data directory to be searched. + prefix_list (list): List of prefixes. + + Returns: + list (int): Number of files matching each prefix. + """ + lst = os.listdir(directory) + cnt_list = [len(fnmatch.filter(lst, x + '*')) for x in prefix_list] + return cnt_list + + # count RGB and flow frames in each frame directory + frame_dict = {} + for i, frame_dir in enumerate(frame_dirs): + total_num = count_files(frame_dir, + (rgb_prefix, flow_x_prefix, flow_y_prefix)) + dir_name = locate_directory(frame_dir) + + num_x = total_num[1] + num_y = total_num[2] + if num_x != num_y: + raise ValueError(f'x and y direction have different number ' + f'of flow images in video directory: {frame_dir}') + if i % 200 == 0: + print(f'{i} videos parsed') + + frame_dict[dir_name] = (frame_dir, total_num[0], num_x) + + print('frame directory analysis done') + return frame_dict + + +def parse_args(): + parser = argparse.ArgumentParser(description='build file list for HVU') + parser.add_argument('--input_csv', type=str, help='path of input csv file') + parser.add_argument( + '--src_dir', type=str, help='source video / frames directory') + parser.add_argument( + '--output', + type=str, + help='output filename, should ' + 'end with .json') + parser.add_argument( + '--mode', + type=str, + choices=['frames', 'videos'], +
help='generate file list for frames or videos') + + args = parser.parse_args() + return args + + +if __name__ == '__main__': + args = parse_args() + tag_cates = mmcv.load(tag_file) + tag2category = {} + for k in tag_cates: + for tag in tag_cates[k]: + tag2category[tag] = k + + data_list = open(args.input_csv).readlines() + data_list = [x.strip().split(',') for x in data_list[1:]] + + if args.mode == 'videos': + downloaded = os.listdir(args.src_dir) + downloaded = [x.split('.')[0] for x in downloaded] + downloaded_set = set(downloaded) + else: + parse_result = parse_directory(args.src_dir) + downloaded_set = set(parse_result) + + def parse_line(line): + tags, youtube_id, start, end = line + start, end = int(float(start) * 10), int(float(end) * 10) + newname = f'{youtube_id}_{start:06d}_{end:06d}' + tags = tags.split('|') + all_tags = {} + for tag in tags: + category = tag2category[tag] + all_tags.setdefault(category, + []).append(tag_cates[category].index(tag)) + return newname, all_tags + + data_list = [parse_line(line) for line in data_list] + data_list = [line for line in data_list if line[0] in downloaded_set] + + if args.mode == 'frames': + result = [ + dict( + frame_dir=k[0], total_frames=parse_result[k[0]][1], label=k[1]) + for k in data_list + ] + elif args.mode == 'videos': + result = [dict(filename=k[0] + '.mp4', label=k[1]) for k in data_list] + mmcv.dump(result, args.output) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..59f3fa18bf954b3598bf08041504630e3cb387c6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_rawframes_filelist.sh @@ -0,0 +1,5 @@ +# to generate file list of frames +python generate_file_list.py --input_csv ../../../data/hvu/annotations/hvu_train.csv --src_dir ../../../data/hvu/rawframes_train \ + --output 
../../../data/hvu/hvu_train.json --mode frames +python generate_file_list.py --input_csv ../../../data/hvu/annotations/hvu_val.csv --src_dir ../../../data/hvu/rawframes_val \ + --output ../../../data/hvu/hvu_val.json --mode frames diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_sub_file_list.py b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_sub_file_list.py new file mode 100644 index 0000000000000000000000000000000000000000..8313a9b3c98c61634eb437e051172325ddc303c5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_sub_file_list.py @@ -0,0 +1,42 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os.path as osp + +import mmcv + + +def main(annotation_file, category): + assert category in [ + 'action', 'attribute', 'concept', 'event', 'object', 'scene' + ] + + data = mmcv.load(annotation_file) + basename = osp.basename(annotation_file) + dirname = osp.dirname(annotation_file) + basename = basename.replace('hvu', f'hvu_{category}') + + target_file = osp.join(dirname, basename) + + result = [] + for item in data: + label = item['label'] + if category in label: + item['label'] = label[category] + result.append(item) + + mmcv.dump(result, target_file) + + +if __name__ == '__main__': + description = 'Helper script for generating HVU per-category file list.'
+ p = argparse.ArgumentParser(description=description) + p.add_argument( + 'annotation_file', + type=str, + help=('The annotation file which contains tags of all categories.')) + p.add_argument( + 'category', + type=str, + choices=['action', 'attribute', 'concept', 'event', 'object', 'scene'], + help='The tag category that you want to generate file list for.') + main(**vars(p.parse_args())) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..deba7b74d8c08ead8b88e61903bc56725d386b22 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/generate_videos_filelist.sh @@ -0,0 +1,5 @@ +# to generate file lists of videos +python generate_file_list.py --input_csv ../../../data/hvu/annotations/hvu_train.csv --src_dir ../../../data/hvu/videos_train \ + --output ../../../data/hvu/hvu_train_video.json --mode videos +python generate_file_list.py --input_csv ../../../data/hvu/annotations/hvu_val.csv --src_dir ../../../data/hvu/videos_val \ + --output ../../../data/hvu/hvu_val_video.json --mode videos diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/label_map.json b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/label_map.json new file mode 100644 index 0000000000000000000000000000000000000000..a591a291db84a562438e68aab04dea8e8fa18d79 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/label_map.json @@ -0,0 +1 @@ +{"action": ["abseiling", "acrobatics", "acting_in_play", "adjusting_glasses", "air_drumming", "alligator_wrestling", "alpine_skiing", "american_football", "angling", "answering_questions", "applauding", "applying_cream", "archaeological_excavation", "archery", "arguing", "arm_wrestling", "arranging_flowers", "assembling_bicycle", "assembling_computer", "attending_conference", "auctioning", "auto_racing", "backflip_human_", "baking_cookies", "ball_game", 
"bandaging", "barbequing", "bartending", "base_jumping", "baseball", "basketball_moves", "bathing", "bathing_dog", "baton_twirling", "battle_rope_training", "beach_soccer", "beatboxing", "bee_keeping", "belly_dancing", "bench_pressing", "bending_back", "bending_metal", "biking_through_snow", "blasting_sand", "blowdrying_hair", "blowing_bubble_gum", "blowing_glass", "blowing_leaves", "blowing_nose", "blowing_out_candles", "bmx", "boating", "bobsledding", "bodybuilding", "bodysurfing", "bookbinding", "bottling", "bouldering", "bouncing_on_bouncy_castle", "bouncing_on_trampoline", "bowling", "boxing", "braiding_hair", "breading_or_breadcrumbing", "breakdancing", "breaking_boards", "breathing_fire", "brush_painting", "brushing_hair", "brushing_teeth", "building_cabinet", "building_lego", "building_sandcastle", "building_shed", "bull_fighting", "bulldozing", "bungee_jumping", "burping", "busking", "calculating", "calf_roping", "calligraphy", "canoeing_or_kayaking", "capoeira", "capsizing", "card_game", "card_stacking", "card_throwing", "carrying_baby", "cartwheeling", "carving_ice", "carving_pumpkin", "casting_fishing_line", "catching_fish", "catching_or_throwing_baseball", "catching_or_throwing_frisbee", "catching_or_throwing_softball", "caving", "celebrating", "changing_gear_in_car", "changing_oil", "changing_wheel_not_on_bike_", "checking_tires", "cheering", "cheerleading", "chewing_gum", "chiseling_stone", "chiseling_wood", "chopping_meat", "chopping_vegetables", "chopping_wood", "choreography", "clam_digging", "clapping", "clay_pottery_making", "clean_and_jerk", "cleaning_gutters", "cleaning_pool", "cleaning_shoes", "cleaning_toilet", "cleaning_windows", "climbing", "climbing_a_rope", "climbing_ladder", "climbing_tree", "clipping_cat_claws", "coloring_in", "combing_hair", "contact_juggling", "contorting", "control", "cooking", "cooking_egg", "cooking_on_campfire", "cooking_sausages_not_on_barbeque_", "cooking_scallops", "cosplaying", "counting_money", 
"country_line_dancing", "cracking_back", "cracking_knuckles", "cracking_neck", "craft", "crawling_baby", "crochet", "croquet", "cross", "cross_country_cycling", "crossing_eyes", "crossing_river", "crying", "cumbia", "curling_hair", "curling_sport_", "cutting_apple", "cutting_nails", "cutting_orange", "cutting_pineapple", "cutting_the_grass", "cutting_watermelon", "cycling", "dance", "dancing_ballet", "dancing_charleston", "dancing_gangnam_style", "dancing_macarena", "deadlifting", "decorating_the_christmas_tree", "delivering_mail", "dining", "directing_traffic", "disc_dog", "disc_golfing", "diving", "diving_cliff", "docking_boat", "dodgeball", "doing_a_powerbomb", "doing_aerobics", "doing_jigsaw_puzzle", "doing_karate", "doing_kickboxing", "doing_laundry", "doing_motocross", "doing_nails", "downhill_mountain_biking", "drawing", "dribbling_basketball", "drinking", "drinking_shots", "driving_car", "driving_tractor", "drooling", "drop_kicking", "drum_corps", "drumming_fingers", "dumpster_diving", "dunking_basketball", "dyeing_eyebrows", "dyeing_hair", "eating", "eating_burger", "eating_cake", "eating_carrots", "eating_chips", "eating_doughnuts", "eating_hotdog", "eating_ice_cream", "eating_spaghetti", "eating_watermelon", "egg_hunting", "embroidering", "equitation", "exercising_with_an_exercise_ball", "extinguishing_fire", "faceplanting", "falling_off_bike", "falling_off_chair", "feeding_birds", "feeding_fish", "feeding_goats", "fencing_sport_", "fidgeting", "fight", "figure_skating", "finger_snapping", "fishing", "fixing_bicycle", "fixing_hair", "fixing_the_roof", "flint_knapping", "flipping_pancake", "fly_casting", "fly_fishing", "fly_tying", "flying_kite", "folding_clothes", "folding_napkins", "folding_paper", "folk_dance", "front_raises", "frying", "frying_vegetables", "futsal", "gambling", "geocaching", "getting_a_haircut", "getting_a_piercing", "getting_a_tattoo", "giving_or_receiving_award", "gliding", "gold_panning", "golf", "golf_chipping", "golf_driving", 
"golf_putting", "gospel_singing_in_church", "grappling", "grilling", "grinding_meat", "grooming_dog", "grooming_horse", "gymnastics", "gymnastics_tumbling", "hammer_throw", "hand_car_wash", "hand_washing_clothes", "harvest", "head_stand", "headbanging", "headbutting", "high_jump", "high_kick", "historical_reenactment", "hitting_a_pinata", "hitting_baseball", "hockey_stop", "holding_snake", "home_roasting_coffee", "hopscotch", "hoverboarding", "huddling", "hugging_baby", "hugging_not_baby_", "hula_hooping", "hunt_seat", "hurdling", "hurling_sport_", "ice_climbing", "ice_fishing", "ice_skating", "ice_swimming", "inflating_balloons", "inline_skating", "installing_carpet", "ironing", "ironing_hair", "javelin_throw", "jaywalking", "jetskiing", "jogging", "juggling_balls", "juggling_fire", "juggling_soccer_ball", "jumping", "jumping_bicycle", "jumping_into_pool", "jumping_jacks", "jumpstyle_dancing", "karaoke", "kicking_field_goal", "kicking_soccer_ball", "kissing", "kitesurfing", "knitting", "krumping", "land_sailing", "laughing", "lawn_mower_racing", "laying_bricks", "laying_concrete", "laying_stone", "laying_tiles", "layup_drill_in_basketball", "learning", "leatherworking", "licking", "lifting_hat", "lighting_fire", "lock_picking", "logging", "long_jump", "longboarding", "looking_at_phone", "luge", "lunge", "making_a_cake", "making_a_lemonade", "making_a_sandwich", "making_an_omelette", "making_balloon_shapes", "making_bubbles", "making_cheese", "making_horseshoes", "making_jewelry", "making_paper_aeroplanes", "making_pizza", "making_snowman", "making_sushi", "making_tea", "making_the_bed", "marching", "marching_percussion", "marriage_proposal", "massaging_back", "massaging_feet", "massaging_legs", "massaging_neck", "massaging_person_s_head", "milking_cow", "modern_dance", "moon_walking", "mopping_floor", "mosh_pit_dancing", "motorcycling", "mountain_biking", "mountain_climber_exercise_", "moving_furniture", "mowing_lawn", "mushroom_foraging", "needle_felting", 
"needlework", "news_anchoring", "opening_bottle_not_wine_", "opening_door", "opening_present", "opening_refrigerator", "opening_wine_bottle", "origami", "outdoor_recreation", "packing", "painting_fence", "painting_furniture", "pan_frying", "parachuting", "paragliding", "parasailing", "parkour", "passing_american_football_in_game_", "passing_american_football_not_in_game_", "passing_soccer_ball", "peeling_apples", "peeling_potatoes", "percussion", "person_collecting_garbage", "petting_animal_not_cat_", "petting_cat", "photobombing", "photocopying", "photograph", "physical_exercise", "picking_fruit", "pillow_fight", "pinching", "pirouetting", "pitch", "planing_wood", "planting_trees", "plastering", "plataform_diving", "playing_accordion", "playing_badminton", "playing_bagpipes", "playing_basketball", "playing_bass_guitar", "playing_beer_pong", "playing_blackjack", "playing_cello", "playing_chess", "playing_clarinet", "playing_congas", "playing_controller", "playing_cricket", "playing_cymbals", "playing_darts", "playing_didgeridoo", "playing_dominoes", "playing_drums", "playing_field_hockey", "playing_flute", "playing_gong", "playing_guitar", "playing_hand_clapping_games", "playing_harmonica", "playing_harp", "playing_ice_hockey", "playing_keyboard", "playing_kickball", "playing_lacrosse", "playing_laser_tag", "playing_lute", "playing_maracas", "playing_marbles", "playing_monopoly", "playing_netball", "playing_ocarina", "playing_organ", "playing_paintball", "playing_pan_pipes", "playing_piano", "playing_pinball", "playing_ping_pong", "playing_poker", "playing_polo", "playing_recorder", "playing_rubiks_cube", "playing_saxophone", "playing_scrabble", "playing_squash_or_racquetball", "playing_ten_pins", "playing_tennis", "playing_trombone", "playing_trumpet", "playing_ukulele", "playing_violin", "playing_volleyball", "playing_water_polo", "playing_with_trains", "playing_xylophone", "poking_bellybutton", "pole_vault", "polishing_forniture", "polishing_metal", 
"popping_balloons", "pouring_beer", "powerbocking", "preparing_pasta", "preparing_salad", "presenting_weather_forecast", "print", "public_speaking", "pull_ups", "pumping_fist", "pumping_gas", "punch", "punching_bag", "punching_person_boxing_", "purl", "push_up", "pushing_car", "pushing_cart", "pushing_wheelbarrow", "pushing_wheelchair", "putting_in_contact_lenses", "putting_on_eyeliner", "putting_on_foundation", "putting_on_lipstick", "putting_on_mascara", "putting_on_sari", "putting_on_shoes", "rafting", "raising_eyebrows", "raking_leaves", "reading", "reading_book", "reading_newspaper", "recording_music", "recreation", "recreational_fishing", "removing_curlers", "repairing_puncture", "riding_a_bike", "riding_bumper_cars", "riding_camel", "riding_elephant", "riding_mechanical_bull", "riding_mower", "riding_mule", "riding_or_walking_with_horse", "riding_scooter", "riding_snow_blower", "riding_unicycle", "ripping_paper", "river_tubing", "roasting", "roasting_marshmallows", "roasting_pig", "robot_dancing", "rock_climbing", "rock_scissors_paper", "rodeo", "roller_skating", "rollerblading", "rolling_pastry", "roof_shingle_removal", "rope_pushdown", "running", "running_on_treadmill", "sailing", "salsa_dancing", "sanding_floor", "sausage_making", "sawing_wood", "scrambling_eggs", "scrapbooking", "scrubbing_face", "scuba_diving", "separating_eggs", "setting_table", "sewing", "shaking_hands", "shaking_head", "shaping_bread_dough", "sharpening_knives", "sharpening_pencil", "shaving_head", "shaving_legs", "shearing_sheep", "shining_flashlight", "shining_shoes", "shooting", "shooting_basketball", "shooting_goal_soccer_", "shopping", "shot_put", "shoveling_snow", "shucking_oysters", "shuffling_cards", "shuffling_feet", "side_kick", "sign_language_interpreting", "singing", "sipping_cup", "sitting", "situp", "skateboarding", "ski_jumping", "skiing", "skiing_crosscountry", "skiing_mono", "skiing_slalom", "skipping_rope", "skipping_stone", "skydiving", "slacklining", "slapping", 
"sled_dog_racing", "sledding", "sleeping", "smashing", "smelling_feet", "smile", "smoking", "smoking_hookah", "smoking_pipe", "snatch_weight_lifting", "sneezing", "snorkeling", "snow_tubing", "snowboarding", "snowkiting", "snowmobiling", "soccer", "softball", "somersaulting", "sparring", "spelunking", "spinning_poi", "sports_training", "spray_painting", "spread_mulch", "springboard_diving", "sprint", "square_dancing", "squat", "standing", "standing_on_hands", "staring", "steer_roping", "sticking_tongue_out", "stitch", "stomping_grapes", "stone_carving", "strength_training", "stretching_arm", "stretching_leg", "sucking_lolly", "surf_fishing", "surfing_crowd", "surfing_water", "sweeping_floor", "swimming", "swimming_backstroke", "swimming_breast_stroke", "swimming_butterfly_stroke", "swimming_front_crawl", "swing_dancing", "swinging_baseball_bat", "swinging_on_something", "sword_fighting", "sword_swallowing", "table_soccer", "tackling", "tagging_graffiti", "tai_chi", "talking_on_cell_phone", "tango_dancing", "tap_dancing", "tapping_guitar", "tapping_pen", "tasting_beer", "tasting_food", "tasting_wine", "testifying", "texting", "threading_needle", "throwing_axe", "throwing_ball_not_baseball_or_american_football_", "throwing_discus", "throwing_knife", "throwing_snowballs", "throwing_tantrum", "throwing_water_balloon", "tickling", "tie_dying", "tightrope_walking", "tiptoeing", "tobogganing", "tossing_coin", "track_and_field", "trail_riding", "training_dog", "trapezing", "trimming_or_shaving_beard", "trimming_shrubs", "trimming_trees", "triple_jump", "twiddling_fingers", "tying_bow_tie", "tying_knot_not_on_a_tie_", "tying_necktie", "tying_shoe_laces", "unboxing", "underwater_diving", "unloading_truck", "using_a_microscope", "using_a_paint_roller", "using_a_power_drill", "using_a_sledge_hammer", "using_a_wrench", "using_atm", "using_bagging_machine", "using_circular_saw", "using_inhaler", "using_puppets", "using_remote_controller_not_gaming_", "using_segway", 
"using_the_monkey_bar", "using_the_pommel_horse", "vacuuming_floor", "visiting_the_zoo", "wading_through_mud", "wading_through_water", "waiting_in_line", "waking_up", "walking", "walking_the_dog", "walking_through_snow", "washing_dishes", "washing_feet", "washing_hair", "washing_hands", "waste", "watching_tv", "water_skiing", "water_sliding", "watering_plants", "waving_hand", "waxing_back", "waxing_chest", "waxing_eyebrows", "waxing_legs", "weaving", "weaving_basket", "weaving_fabric", "welding", "whistling", "wicker_weaving", "windsurfing", "winking", "wood_burning_art_", "worship", "wrapping_present", "wrestling", "writing", "yarn_spinning", "yawning", "yoga", "zumba"], "attribute": ["afro", "aggression", "al_dente", "angora", "art_paper", "asphalt", "azure", "bangs", "barechestedness", "beauty", "beige", "black", "black_and_white", "black_hair", "blond", "blue", "bmw", "boiling", "brass", "bricks_and_mortar", "brown", "brown_hair", "caffeine", "calm", "camouflage", "caramel_color", "cardboard", "ceramic", "citric_acid", "classic", "clay", "cleft", "cobalt_blue", "coca_cola", "complexion", "concrete", "cool", "dairy", "darkness", "daytime", "deciduous", "denim", "drama", "elder", "electric_blue", "emerald", "evergreen", "explosive_material", "floating", "fluid", "flyweight", "forward", "freezing", "fun", "glitter", "gold", "granite", "green", "happy", "human_hair_color", "hunky", "inflatable", "iron", "laminate", "layered_hair", "leather", "leisure", "lilac", "long_hair", "magenta", "maroon", "metal", "metropolis", "military", "moist", "monochrome", "multimedia", "neon", "orange", "origami_paper", "paper", "patchwork", "peach", "pigtail", "pink", "plane", "plastic", "platinum_blond", "plush", "plywood", "polka_dot", "pompadour", "purple", "rapid", "red", "red_hair", "reflection", "satin", "shade", "silk", "silver", "sweetness", "symmetry", "synthetic_rubber", "teal", "transparency_and_translucency", "turquoise", "velvet", "violet", "white", "wood", "wool", 
"woolen", "woven_fabric", "wrinkle", "yellow", "youth"], "concept": ["aerial_photography", "agriculture", "air_force", "air_sports", "american_food", "ancient_history", "angle", "animal_migration", "animal_source_foods", "animal_sports", "arch", "architecture", "army", "art", "artistic_gymnastics", "asian_food", "athletics", "audience", "automotive_design", "automotive_exterior", "aviation", "baked_goods", "ball_over_a_net_games", "bat_and_ball_games", "benthos", "blessing", "boardsport", "brand", "business", "cable_management", "cellular_network", "choir", "circle", "circus", "class", "classic_car", "classical_music", "clergy", "clip_art", "close_up", "collaboration", "color_guard", "combat_sport", "comfort", "comfort_food", "commodity", "community", "computer_program", "concert_band", "confectionery", "construction", "contact_sport", "convenience_food", "costume_design", "court", "court_game", "crew", "crowd", "cube", "cuisine", "currency", "cycle_sport", "cylinder", "decor", "design", "dialog_box", "diet_food", "display_advertising", "dog_breed", "dog_sports", "doubles", "dressage", "east_asian_food", "ecosystem", "electrical_network", "electricity", "electronics", "emergency", "emergency_service", "emotion", "endurance_sports", "energy", "engineering", "ensemble", "entertainment", "equestrian_sport", "erg", "european_food", "extreme_sport", "facial_expression", "family", "fashion_design", "fast_food", "fauna", "fictional_character", "field_game", "film", "finger_food", "fixed_link", "floral_design", "floristry", "font", "fried_food", "friendship", "frozen_food", "games", "geological_phenomenon", "geology", "german_food", "golf_club", "graffito", "graphic_design", "graphics", "grilled_food", "hairstyle", "handwriting", "health_care", "heart", "heat", "herd", "history", "human_behavior", "individual_sports", "indoor_games_and_sports", "industry", "infrastructure", "interaction", "interior_design", "inventory", "italian_food", "japanese_cuisine", 
"japanese_martial_arts", "job", "junk_food", "kite_sports", "land_vehicle", "laser", "laughter", "law_enforcement", "light_commercial_vehicle", "lighting", "line", "line_art", "local_food", "lockstitch", "logo", "love", "luxury_vehicle", "luxury_yacht", "major_appliance", "male", "management", "map", "marching_band", "marine_mammal", "martial_arts", "mass_production", "match_play", "meal", "medal_play", "medical", "medicine", "memorial", "mesh", "meteorological_phenomenon", "mid_size_car", "military_officer", "military_organization", "military_rank", "mineral", "mixture", "mode_of_transport", "modern_art", "money", "monochrome_photography", "motorsport", "music", "musical_ensemble", "natural_foods", "nature", "news", "non_sporting_group", "number", "off_road", "official", "orchestra", "organism", "pachyderm", "packaging_and_labeling", "painting", "party_supply", "pattern", "people", "performance", "performing_arts", "physical_fitness", "pint_us", "plaid", "plant_community", "plaster", "police", "pollinator", "pollution", "pop_music", "primate", "public_transport", "public_utility", "pyramid", "racquet_sport", "rapid_transit", "real_estate", "recipe", "rectangle", "religion", "research", "rock", "roller_sport", "romance", "rose_order", "seafood", "security", "selfie", "service", "shadow", "shelving", "shoal", "shooting_sport", "side_dish", "silhouette", "singles", "skin_care", "social_group", "software", "song", "spanish_cuisine", "sphere", "spiral", "spoor", "sport", "spotlight", "spring_break", "square", "star", "stick_and_ball_games", "stick_and_ball_sports", "still_life", "still_life_photography", "stock_photography", "street_art", "street_food", "striking_combat_sports", "stucco", "superfood", "surface_water_sports", "symbol", "tartan", "taste", "team", "team_sport", "technology", "telephony", "television_program", "tool", "tourism", "towed_water_sport", "tradition", "traditional_sport", "traffic", "tread", "triangle", "tribe", "troop", "underwater", 
"vegetarian_food", "vegetation", "video_game_software", "visual_arts", "war", "waste_containment", "water_ball_sports", "water_sport", "water_transportation", "watercraft", "weapon", "weapon_combat_sports", "website", "whole_food", "wildlife", "wind", "windsports", "winter_sport"], "event": ["800_metres", "adventure", "air_travel", "art_exhibition", "auto_show", "autumn", "award_ceremony", "banquet", "bedtime", "breakfast", "broad_jump", "brunch", "carnival", "ceremony", "championship", "christmas", "competition", "concert", "conference", "convention", "conversation", "decathlon", "demonstration", "dinner", "disaster", "evening", "exhibition", "festival", "flight", "freight_transport", "general_aviation", "graduation", "halloween", "heptathlon", "holiday", "lecture", "lunch", "manicure", "marathon", "massage", "meeting", "morning", "multi_sport_event", "news_conference", "night", "parade", "party", "photo_shoot", "picnic", "presentation", "protest", "public_event", "race", "ritual", "road_trip", "rock_concert", "safari", "seminar", "ski_cross", "speech", "spring", "summer", "sunrise_and_sunset", "supper", "tournament", "vacation", "wedding", "wedding_reception", "winter"], "object": ["abdomen", "academic_dress", "accordion", "accordionist", "acoustic_electric_guitar", "acoustic_guitar", "acrylic_paint", "action_figure", "active_undergarment", "adding_machine", "aegean_cat", "aerialist", "african_elephant", "agaric", "agaricaceae", "agaricomycetes", "agaricus", "agricultural_machinery", "agriculturist", "aioli", "air_bubble", "air_gun", "aircraft", "airliner", "alaskan_malamute", "album_cover", "alcoholic_beverage", "ale", "algae", "all_terrain_vehicle", "all_xbox_accessory", "alligator", "alloy_wheel", "alpinist", "alto_horn", "american_alligator", "american_pit_bull_terrier", "amusement_ride", "ananas", "anchor", "angle_grinder", "animal_fat", "ankle", "annual_plant", "antique", "antique_car", "appetizer", "apple", "aqua", "aqualung", "aquanaut", "aquarium", 
"aquatic_plant", "aquifoliaceae", "arabian_camel", "arcade_game", "archer", "arecales", "arm", "artifact", "artificial_fly", "artificial_turf", "artisan", "artwork", "athlete", "athletic_shoe", "audio_engineer", "audio_equipment", "auto_part", "automaton", "automotive_engine_part", "automotive_exhaust", "automotive_lighting", "automotive_mirror", "automotive_tire", "automotive_wheel_system", "automotive_window_part", "ax", "ax_handle", "baby_buggy", "baby_carrier", "baby_products", "baby_toys", "back", "backboard", "backhoe", "backseat", "bag", "bagel", "baggage", "bagpipes", "bait", "baker", "balance_beam", "balcony", "ball", "ballet_dancer", "ballet_skirt", "balloon", "baluster", "bandage", "banderillero", "bandoneon", "banjo", "banner", "barbell", "barber", "baritone_saxophone", "barramundi", "barrel", "barrow", "bartender", "barware", "baseball_bat", "baseball_cap", "baseball_equipment", "baseball_player", "basket", "basketball_player", "bass", "bass_drum", "bass_fiddle", "bass_guitar", "bass_oboe", "bassinet", "bassist", "bassoon", "bathing_cap", "bathroom_accessory", "bathroom_sink", "bathtub", "batter", "bayonne_ham", "bead", "beak", "beam", "bean", "beanie", "beard", "bed", "bed_frame", "bed_sheet", "bedding", "bedrock", "bee", "beef", "beef_tenderloin", "beehive", "beekeeper", "beer", "beer_cocktail", "beer_glass", "belay_device", "bell_peppers_and_chili_peppers", "bench", "berry", "beyaz_peynir", "bib", "bichon", "bicycle", "bicycle_accessory", "bicycle_chain", "bicycle_drivetrain_part", "bicycle_frame", "bicycle_handlebar", "bicycle_helmet", "bicycle_part", "bicycle_saddle", "bicycle_tire", "bicycle_wheel", "bidet", "big_cats", "bikini", "billboard", "bin", "birch", "bird", "birthday_cake", "biscuit", "black_belt", "black_cat", "blackboard", "blacksmith", "blade", "blazer", "blender", "block", "blood", "blossom", "blouse", "blue_collar_worker", "bmx_bike", "boa_constrictor", "board_game", "boas", "boat", "boats_and_boating_equipment_and_supplies", 
"bobsled", "bocce_ball", "bodybuilder", "bolete", "bonfire", "bongo", "bony_fish", "book", "bookcase", "boot", "bottle", "bottled_water", "boulder", "bouquet", "bow_and_arrow", "bow_tie", "bowed_string_instrument", "bowie_knife", "bowl", "bowler", "bowling_ball", "bowling_equipment", "bowling_pin", "box", "boxing_equipment", "boxing_glove", "boy", "bracelet", "brake_disk", "branch", "brass_instrument", "brassiere", "bratwurst", "bread", "bread_dough", "brick", "bricklayer", "brickwork", "bridal_clothing", "bride", "bridle", "briefs", "broccoli", "brochette", "bromeliaceae", "broom", "broth", "brush", "bubble", "bubble_gum", "bucket", "bugle", "bull", "bulldozer", "bullfighter", "bumper", "bumper_car", "bun", "bungee", "buoyancy_compensator", "bus", "businessperson", "butcher", "buttercream", "button", "button_accordion", "cab", "cabin_cruiser", "cabinet", "cabinetry", "cable", "caesar_salad", "cage", "cake", "calf", "camel", "camera", "camera_accessory", "camera_lens", "camera_operator", "camgirl", "campfire", "candle", "cannon", "canoe", "cap", "car", "car_mirror", "car_seat", "car_seat_cover", "car_tire", "car_wheel", "carbonara", "carbonated_soft_drinks", "cardboard_box", "caricaturist", "carnivoran", "carpenter", "carpet", "carriage", "carrot", "cart", "carton", "cartoon", "carving", "cash", "cash_machine", "cat", "catamaran", "cattle_like_mammal", "ceiling", "celesta", "cellist", "cello", "cellular_telephone", "center_console", "central_processing_unit", "centrepiece", "chain", "chain_link_fencing", "chain_saw", "chair", "chalk", "champagne", "champagne_stemware", "charcoal", "charcuterie", "chariot", "chassis", "cheek", "cheerleader", "cheerleading_uniform", "cheese", "cheese_pizza", "cheeseburger", "chef", "cherry", "chess_master", "chessboard", "chessman", "chest", "chest_hair", "chest_of_drawers", "chicken", "chihuahua", "child", "chin", "chip", "chocolate", "chocolate_brownie", "chocolate_cake", "chocolate_chip_cookie", "chocolate_spread", 
"choreographer", "christmas_decoration", "christmas_lights", "christmas_tree", "chute", "circuit", "circuit_component", "circular_saw", "circus_acrobat", "citrullus", "citrus", "city_car", "clam", "clams_oysters_mussels_and_scallops", "clarinet", "clarinet_family", "clavier", "clementine", "climber", "climbing_frame", "climbing_harness", "closet", "clothes_closet", "clothes_dryer", "clothes_hamper", "clothing", "cloud", "clown", "coat", "cobblestone", "cockapoo", "cocktail", "cocktail_dress", "cocktail_garnish", "coconut", "cod", "coffee", "coffee_bean", "coffee_cup", "coffee_table", "coin", "cola", "colander", "cold_weapon", "collage", "collar", "collection", "collie", "color_television", "colt", "colubridae", "column", "comb", "comforter", "commercial_vehicle", "common_pet_parakeet", "communication_device", "commuter", "compact_car", "compact_van", "companion_dog", "composite_material", "compound_microscope", "computer", "computer_accessory", "computer_case", "computer_component", "computer_cooling", "computer_hardware", "computer_keyboard", "concert_grand", "concertina", "condiment", "conifer", "construction_equipment", "construction_worker", "convertible", "cookie", "cookie_sheet", "cookies_and_crackers", "cookware_accessory", "cookware_and_bakeware", "cor_anglais", "coral", "coral_reef_fish", "cornet", "cosmetics", "costume", "couch", "countertop", "coverall", "cow_goat_family", "cowbarn", "cowboy", "cowboy_hat", "craftsman", "crampon", "crane", "cravat", "cream", "cream_cheese", "cricket_bat", "cricketer", "crochet_needle", "crocodile", "crocodilia", "crop", "croquet_mallet", "crossword_puzzle", "cruciferous_vegetables", "crystal", "cuatro", "cucumber", "cucumber_gourd_and_melon_family", "cucumis", "cucurbita", "cumulus", "cup", "cupboard", "curbstone", "curd", "curtain", "customer", "cut_flowers", "cutlery", "cymbal", "dairy_cattle", "dairy_cow", "dairy_product", "dance_dress", "dancer", "dashboard", "data_storage_device", "date_palm", "defenseman", "desk", 
"desktop_computer", "dessert", "dhow", "diaper", "diatonic_button_accordion", "digital_clock", "dining_table", "dinnerware_set", "dip", "discinaceae", "dish", "dishware", "dishwasher", "disk_jockey", "display_case", "display_device", "display_window", "distilled_beverage", "divemaster", "diver", "diving_equipment", "diving_mask", "dobok", "document", "dog", "dog_sled", "doll", "dolphin", "dome", "domestic_rabbit", "donkey", "door", "door_handle", "double_bass", "dough", "drawer", "dress", "dress_shirt", "drill", "drink", "drinker", "drinking_water", "drinkware", "drop", "drum", "drumhead", "drummer", "drumstick", "dry_suit", "dryer", "duck", "ducks_geese_and_swans", "dumbbell", "dump_truck", "duplicator", "dustpan", "ear", "earl_grey_tea", "earrings", "eating_apple", "edger", "edible_mushroom", "egg", "egg_yolk", "electric_guitar", "electric_organ", "electric_piano", "electrical_supply", "electrical_wiring", "electronic_component", "electronic_device", "electronic_keyboard", "electronic_musical_instrument", "electronic_signage", "electronics_accessory", "elephant", "elliptical_trainer", "emblem", "emergency_vehicle", "engine", "engineer", "envelope", "epee", "equestrian", "espresso", "euphonium", "executive_car", "exercise_bike", "exercise_equipment", "exercise_machine", "exhaust_system", "eye", "eye_shadow", "eyebrow", "eyelash", "eyewear", "facade", "face", "facial_hair", "family_car", "fan", "farm_machine", "farmer", "farmworker", "fashion_accessory", "fashion_model", "faucet", "feather", "feather_boa", "feature_phone", "fedora", "fence", "fencing_sword", "fencing_weapon", "fern", "ferry", "fiddle", "field_hockey_ball", "figure_skater", "figurine", "fin", "finger", "finger_paint", "fipple_flute", "fir", "fire", "firearm", "firefighter", "fireplace", "fish", "fish_feeder", "fisherman", "fishing_bait", "fishing_lure", "fishing_rod", "fishing_vessel", "fitness_professional", "flag", "flag_of_the_united_states", "flagstone", "flashlight", "flat_panel_display", 
"flatbread", "flautist", "flightless_bird", "flooring", "florist", "flour", "flourless_chocolate_cake", "flower", "flower_bouquet", "flowering_plant", "flowerpot", "flush_toilet", "flute", "flutist", "fly", "foal", "foil", "folk_dancer", "folk_instrument", "fondant", "food", "food_processor", "foot", "football_equipment_and_supplies", "football_helmet", "football_player", "footwear", "forehead", "fork", "forklift_truck", "formal_wear", "fortepiano", "foundation", "fountain", "fountain_pen", "free_reed_aerophone", "french_fries", "fret", "fried_egg", "fried_rice", "frost", "frozen_dessert", "fruit", "fruit_tree", "frying_pan", "fuel", "full_size_car", "fungus", "fur", "fur_clothing", "furniture", "gadget", "galliformes", "game_controller", "garbage_heap", "garbage_man", "garbage_truck", "garden_roses", "gardener", "garmon", "garnish", "gas_burner", "gas_pump", "gas_ring", "gate", "gauge", "gazebo", "gear", "gearshift", "gemstone", "german_shepherd_dog", "german_spitz", "gift", "gin_and_tonic", "giraffe", "girl", "glass", "glassblower", "glasses", "glider", "glockenspiel", "glove", "glutinous_rice", "go_kart", "goal", "goat", "goat_antelope", "goggles", "golden_retriever", "goldfish", "golf_ball", "golf_equipment", "golfcart", "golfer", "gourd", "gown", "graffiti", "grand_piano", "grape", "grapevine_family", "grass", "gravel", "great_dane", "greek_salad", "green_algae", "green_bean", "greenland_dog", "grenadier", "greyhound", "griddle", "grocer", "groom", "groundcover", "guard_dog", "guard_rail", "guitar", "guitar_accessory", "guitarist", "gymnast", "hair", "hair_accessory", "hair_coloring", "hair_dryer", "hairbrush", "hairdresser", "halter", "hamburger", "hammer", "hand", "hand_calculator", "hand_drum", "hand_glass", "handbag", "handcart", "handlebar", "handrail", "hang_glider", "hard_hat", "hardware", "hardware_accessory", "harmonica", "harp", "harvester", "hat", "hatchback", "hatchet", "havanese", "hay", "head", "head_restraint", "headgear", "headphones", 
"headpiece", "hearth", "heat_sink", "hedge", "heel", "helmet", "herb", "high_heeled_footwear", "highchair", "hip", "hockey_protective_equipment", "hockey_stick", "home_accessories", "home_appliance", "home_door", "home_fencing", "home_game_console_accessory", "honey_bee", "honeycomb", "hood", "hoodie", "horizontal_bar", "horn", "hors_d_oeuvre", "horse", "horse_and_buggy", "horse_harness", "horse_like_mammal", "horse_supplies", "horse_tack", "horse_trainer", "horseman", "hospital_bed", "hot_air_balloon", "hot_pot", "hot_tub", "household_cleaning_supply", "houseplant", "hub_gear", "hubcap", "human", "human_body", "human_leg", "hunting_dog", "hurdle", "hybrid_bicycle", "ice", "ice_cream", "ice_cream_cone", "ice_lolly", "ice_skate", "iceberg", "icing", "illustration", "indian_elephant", "infant", "infant_bed", "infantry", "inflatable_boat", "ingredient", "input_device", "insect", "invertebrate", "io_card", "iris", "ivy", "jack_o_lantern", "jacket", "jasmine_rice", "javelin", "jaw", "jeans", "jersey", "jewellery", "jigsaw_puzzle", "jockey", "joint", "jointer", "journalist", "joystick", "juggler", "juice", "jungle_gym", "kayak", "kettle", "keyboard_instrument", "keyboard_player", "kielbasa", "kilt", "kisser", "kitchen_appliance", "kitchen_knife", "kite", "kitten", "knackwurst", "knee", "knife", "knit_cap", "knitting_needle", "knot", "koi", "konghou", "lab_coat", "label", "labrador_retriever", "lace", "lacrosse_stick", "lacrosse_training_equipment", "ladder", "lamp", "laptop", "lasso", "latch", "lathe", "laundry", "lawn", "lcd_tv", "lead_pencil", "leaf", "leaf_vegetable", "leash", "led_backlit_lcd_display", "leggings", "lemon", "lemonade", "lens", "leotard", "lettuce", "lever", "ligament", "light_bulb", "light_fixture", "light_microscope", "lighter", "lighting_accessory", "lineman", "linens", "lingerie", "lip", "lip_gloss", "lipstick", "liquor_shelf", "litter", "little_black_dress", "livestock", "lobe", "lock", "locker", "locomotive", "loggerhead", "lollipop", 
"longboard", "loom", "lotion", "loudspeaker", "lovebird", "loveseat", "lumber", "lute", "macaw", "machine", "machine_tool", "magazine", "maillot", "makeup", "mallet", "maltese", "mammal", "man", "mandarin_orange", "mandolin", "mane", "maraca", "marcher", "mare", "marimba", "marine_invertebrates", "marines", "mask", "mason_jar", "mast", "mat", "matador", "matsutake", "mattress", "mattress_pad", "mcintosh", "measuring_instrument", "meat", "meat_grinder", "mechanic", "media_player", "medical_assistant", "medical_equipment", "medical_glove", "medicine_ball", "melee_weapon", "mellophone", "melon", "membrane_winged_insect", "mender", "metal_lathe", "metalsmith", "microcontroller", "microphone", "microscope", "microwave_oven", "miler", "military_camouflage", "military_person", "military_uniform", "milk", "miniature_poodle", "minibus", "minivan", "mirror", "mixer", "mixing_bowl", "mixing_console", "mobile_device", "mobile_phone", "model", "monument", "moped", "moss", "motherboard", "motocross_bike", "motor_scooter", "motor_ship", "motor_vehicle", "motorboat", "motorcycle", "motorcycle_accessories", "motorcyclist", "motorized_wheelchair", "mountain_bike", "mountaineer", "moustache", "mouth", "mower", "mud", "mug", "mule", "mural", "muscle", "musher", "mushroom", "musical_instrument", "musical_instrument_accessory", "musical_keyboard", "musician", "musket", "nail", "nail_polish", "neck", "necklace", "necktie", "needle", "neon_lamp", "neon_sign", "net", "newscaster", "newspaper", "nib", "nightwear", "non_alcoholic_beverage", "non_commissioned_officer", "non_skin_percussion_instrument", "noodle", "nose", "numeric_keypad", "oars", "oboist", "ocarina", "off_road_vehicle", "office_equipment", "office_supplies", "oil_paint", "open_wheel_car", "optical_instrument", "orator", "organ", "organ_pipe", "organist", "outdoor_furniture", "outdoor_grill", "outdoor_play_equipment", "outdoor_power_equipment", "outdoor_shoe", "outdoor_structure", "outerwear", "output_device", 
"overhead_power_line", "ox", "oxygen_mask", "oyster", "oyster_mushroom", "oyster_shell", "pack_animal", "paddle", "padlock", "paintball_equipment", "paintball_gun", "palm_tree", "pan", "panelist", "pantyhose", "paper_product", "paper_towel", "parachute", "parakeet", "parallel_bars", "park_bench", "parquet", "parrot", "parsley", "passenger", "passenger_ship", "pasta", "pastry", "patient", "paving", "paw", "pawn", "pearl", "pebble", "pedestrian", "peel", "pen", "pencil", "pencil_sharpener", "pepperoni", "percussion_accessory", "percussion_instrument", "percussionist", "performance_car", "perico", "personal_computer", "personal_digital_assistant", "personal_flotation_device", "personal_protective_equipment", "petal", "pezizales", "photocopier", "physical_therapist", "physician", "pianet", "pianist", "piano", "piano_keyboard", "picador", "picket_fence", "pickup_truck", "picnic_boat", "pig", "pig_like_mammal", "pigeon", "pigeons_and_doves", "pillow", "pilot_boat", "pinata", "pinball_machine", "pine", "pine_family", "pineapple", "pinscher", "pint_glass", "pipe", "pizza", "pizza_cheese", "plant", "plant_stem", "plastic_bag", "plate", "platter", "play_vehicle", "player", "playground_slide", "playpen", "playstation_3_accessory", "playstation_accessory", "pliers", "plimsoll", "plucked_string_instruments", "plumbing", "plumbing_fixture", "pocket", "pointer", "pole", "police_officer", "polo_mallet", "polo_pony", "polo_shirt", "pomeranian", "pommel_horse", "pontoon", "pony", "poodle", "porcelain", "portable_communications_device", "portable_media_player", "portrait", "poster", "potato", "potato_and_tomato_genus", "pothole", "powdered_sugar", "power_drill", "power_mower", "power_shovel", "printer", "produce", "professional_golfer", "propeller", "protective_equipment_in_gridiron_football", "protective_gear_in_sports", "pug", "pumpkin", "pungsan_dog", "puppy", "putter", "puzzle", "queen", "quill", "rabbit", "race_car", "racer", "racing_bicycle", "racket", "radial", 
"random_orbital_sander", "ranged_weapon", "rear_view_mirror", "recycling_bin", "red_carpet", "red_meat", "red_wine", "redhead", "reed_instrument", "refrigerator", "rein", "remote_control", "reptile", "researcher", "retaining_wall", "retriever", "ribbon", "rice", "rifle", "rim", "ring", "road_bicycle", "roast_beef", "robot", "rock_climbing_equipment", "rock_star", "rodent", "roller_blades", "roller_skates", "rolling_pin", "roof", "root", "root_vegetable", "rope", "rose", "rose_family", "rotisserie", "royal_icing", "rubber_boot", "rubble", "runner", "running_shoe", "saddle", "safe", "safety_belt", "safety_bicycle", "safety_glove", "sail", "sailboat", "sailing_ship", "salad", "salmon", "samoyed", "sand", "sand_wedge", "sandal", "sandbox", "sandwich", "sapsali", "sari", "sarong", "sash_window", "sashimi", "saucer", "sauces", "sausage", "saw", "saxhorn", "saxophone", "saxophonist", "scaffolding", "scale_model", "scaled_reptile", "scanner", "scarf", "schipperke", "schnoodle", "schooner", "scientific_instrument", "scissors", "scooter", "scoreboard", "scow", "scrap", "screen", "scuba_diver", "sculptor", "sculpture", "sea_ice", "sea_kayak", "sea_turtle", "seabird", "seaplane", "seat_belt", "seaweed", "sedan", "seed", "segway", "senior_citizen", "serger", "serpent", "serveware", "sewing_machine", "sewing_machine_needle", "shaving_cream", "shed", "sheep", "shelf", "shih_tzu", "ship", "shipwreck", "shirt", "shoe", "shopkeeper", "shopping_basket", "shopping_cart", "shorts", "shoulder", "shovel", "shower_curtain", "shrimp", "shrub", "siberian_husky", "sicilian_pizza", "sideboard", "siding", "sign", "singer", "singlet", "sink", "skateboard", "skateboarder", "skateboarding_equipment_and_supplies", "sketch", "skewer", "ski", "ski_binding", "ski_equipment", "ski_pole", "skidder", "skiff", "skin", "skin_head_percussion_instrument", "skirt", "slate_roof", "sled", "sled_dog", "sleeper", "sleeve", "sloop", "slot", "slot_machine", "small_appliance", "smartphone", "smoke", "snack", 
"snake", "snare_drum", "sneakers", "snorkel", "snout", "snow_thrower", "snowboard", "snowmobile", "snowplow", "snowshoe", "snowsuit", "soccer_ball", "soccer_player", "sock", "soft_drink", "soil", "soup", "space_bar", "spaghetti", "spaniel", "spatula", "speaker", "speedometer", "speleothem", "spice", "spin_dryer", "spinach", "spinach_salad", "spindle", "spinet", "spinning_wheel", "spitz", "spoke", "spokesperson", "spoon", "sport_kite", "sport_utility_vehicle", "sports_car", "sports_equipment", "sports_uniform", "sportswear", "spring_greens", "sprinkler", "spruce", "spume", "square_dancer", "squash", "stairs", "stalagmite", "stall", "stallion", "standard_poodle", "statue", "steak", "steam_iron", "steamed_rice", "steel", "steel_drum", "steering_part", "steering_wheel", "stemware", "stew", "stick", "stock_car", "stock_dove", "stocking", "stomach", "stone_wall", "stony_coral", "storage_basket", "stout", "stove_and_oven", "strainer", "straw", "streamer_fly", "street_light", "string_instrument", "string_instrument_accessory", "stubble", "student", "stuffed_toy", "stuffing", "stunt_performer", "subcompact_car", "subwoofer", "sugar_cake", "sugar_paste", "suit", "sun", "sun_hat", "sunbather", "sunglasses", "sunlight", "supercar", "superhero", "surfboard", "surfing_equipment_and_supplies", "sushi", "swab", "swan", "sweater", "sweet_grass", "swimmer", "swimsuit_bottom", "swimwear", "swing", "switch", "synthesizer", "t_shirt", "tabby_cat", "table", "table_knife", "table_tennis_racket", "tablecloth", "tabletop_game", "tableware", "tachometer", "taglierini", "tail", "tall_ship", "tank", "tarpaulin", "tattoo", "tea", "teacher", "teapot", "teddy_bear", "telephone", "television_presenter", "television_reporter", "television_set", "tennis_equipment_and_supplies", "tennis_player", "tennis_pro", "tennis_racket", "tenor_saxophonist", "tent", "terrestrial_animal", "terrestrial_plant", "terrier", "text", "textile", "theater_curtain", "therapist", "thigh", "thorns_spines_and_prickles", 
"thread", "thumb", "tights", "tile", "tiple", "tire", "toast", "toddler", "toe", "toilet", "toilet_tissue", "tom_tom_drum", "tomahawk", "tomato", "tongue", "tooth", "toothbrush", "top", "toppings", "torch", "torso", "torte", "tower", "toy", "toy_box", "toy_poodle", "track_spikes", "tractor", "traffic_cop", "traffic_light", "trail_bike", "trailer", "trailer_truck", "train", "trampoline", "trapeze", "travel_trailer", "tree", "tricycle", "trigger", "trombone", "trousers", "trowel", "truck", "trumpet", "trumpeter", "tub", "tudung", "tusk", "tuxedo", "twig", "uke", "umbrella", "undergarment", "underpants", "uneven_parallel_bars", "unicycle", "unicyclist", "uniform", "urinal", "vacuum_cleaner", "van", "vascular_plant", "vase", "vaulter", "vegetable", "vehicle", "vehicle_brake", "vehicle_door", "vehicle_registration_plate", "venison", "vertebrate", "vibraphone", "video_game_console", "vigil_light", "vintage_car", "vintage_clothing", "violin", "violin_family", "violinist", "violist", "vitis", "vizsla", "volleyball_net", "volleyball_player", "wagon", "waist", "waiter", "walk_behind_mower", "walker", "walking_shoe", "wall", "wardrobe", "washbasin", "washing_machine", "waste_container", "watch", "water", "water_bird", "water_feature", "water_polo_cap", "water_ski", "watercolor_paint", "waterfowl", "watering_can", "watermelon", "wave", "wedding_ceremony_supply", "wedding_dress", "wedding_ring", "weightlifter", "weights", "welder", "west_highland_white_terrier", "wetsuit", "whaler", "whales_dolphins_and_porpoises", "wheat_beer", "wheel", "wheelchair", "whipped_cream", "whippet", "whisk", "whiskers", "whisky", "whistle", "white_coat", "white_collar_worker", "white_rice", "wicker_basket", "wicket", "wig", "wildflower", "wildlife_biologist", "wind_instrument", "wind_wave", "window", "window_blind", "window_covering", "window_screen", "window_treatment", "windshield", "windshield_wiper", "wine", "wine_glass", "wing", "winter_squash", "wiper", "wire", "wire_fencing", "wok", "woman", 
"wood_burning_stove", "wood_stain", "woodwind_instrument", "woody_plant", "workman", "wrench", "wrestler", "wrestling_mat", "wrestling_singlet", "wrist", "xylophone", "yacht", "yakitori", "yolk"], "scene": ["aeolian_landform", "aisle", "alley", "amusement_park", "animal_shelter", "apartment", "apiary", "archaeological_site", "arena", "arroyo", "attic", "auditorium", "automobile_repair_shop", "backyard", "badlands", "bakery", "ballpark", "ballroom", "bank", "bar", "barbershop", "barn", "baseball_field", "baseball_positions", "basement", "basketball_court", "bathroom", "batting_cage", "bay", "bayou", "bazaar", "beach", "beauty_salon", "bedroom", "boardwalk", "body_of_water", "boutique", "bowling_alley", "boxing_ring", "bridge", "building", "bullring", "butcher_shop", "canyon", "cape", "carport", "casino", "cave", "channel", "chapel", "cityscape", "cliff", "clinic", "coast", "coastal_and_oceanic_landforms", "cockpit", "cocktail_lounge", "concert_hall", "condominium", "conference_hall", "coral_reef", "courtyard", "creek", "day_nursery", "deck", "desert", "dining_room", "dock", "downtown", "dune", "ecoregion", "escarpment", "estate", "factory", "fair", "farm", "fault", "field", "field_lacrosse", "fire_department", "fish_pond", "floor", "fluvial_landforms_of_streams", "football_stadium", "forest", "formation", "foundry", "function_hall", "garage", "garden", "garden_buildings", "glacial_lake", "golf_course", "grassland", "grocery_store", "grove", "gym", "hall", "harbor", "haze", "headland", "highland", "hill", "historic_site", "home", "horizon", "hospital", "hot_spring", "hotel", "hotel_room", "house", "hut", "ice_hockey_position", "ice_hockey_rink", "ice_rink", "inlet", "intersection", "kindergarten", "kitchen", "laboratory", "lake", "land_lot", "landmark", "landscape", "lane", "lecture_room", "leisure_centre", "littoral", "living_room", "log_cabin", "marina", "market", "marsh", "massif", "meadow", "meander", "metropolitan_area", "mountain", "mountain_pass", 
"mountain_range", "mountainous_landforms", "music_venue", "musical_theatre", "national_park", "natural_resources", "nature_reserve", "neighbourhood", "nightclub", "office", "opera", "outcrop", "paddy_field", "palace", "panorama", "park", "parking", "pasture", "path", "patio", "pavilion", "pedestrian_crossing", "performing_arts_center", "piste", "place_of_worship", "plain", "plateau", "playground", "plaza", "pond", "port", "property", "public_space", "race_track", "ranch", "reef", "religious_institute", "reservoir", "residential_area", "resort", "restaurant", "restroom", "retail", "ridge", "riparian_zone", "river", "riverbed", "road", "road_highway", "room", "rural_area", "sandbank", "sandbar", "school", "sea", "seashore", "seaside", "shack", "shooting_range", "shopping_mall", "shore", "sidewalk", "ski_slope", "sky", "skyline", "skyscraper", "snow_covered_landscape", "sport_venue", "stable", "stadium", "stage", "strand", "stream", "stream_bed", "street", "suburb", "summit", "supermarket", "swamp", "swimming_pool", "tavern", "television_room", "tennis_camp", "tennis_court", "terrain", "theatre", "toolroom", "tourist_attraction", "tower_block", "town", "town_square", "track", "tropical_beach", "tropics", "tunnel", "urban_area", "urban_design", "valley", "village", "walkway", "warehouse", "watercourse", "waterfall", "waterway", "wetland", "wildlife_region", "workshop", "yard", "zoo"]} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/parse_tag_list.py b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/parse_tag_list.py new file mode 100644 index 0000000000000000000000000000000000000000..0871491ef8b852fe948a75c89310274727d1822d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/hvu/parse_tag_list.py @@ -0,0 +1,16 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import mmcv
+
+tag_list = '../../../data/hvu/annotations/hvu_categories.csv'
+
+lines = open(tag_list).readlines()
+lines = [x.strip().split(',') for x in lines[1:]]
+tag_categories = {}
+for line in lines:
+    tag, category = line
+    tag_categories.setdefault(category, []).append(tag)
+
+for k in tag_categories:
+    tag_categories[k].sort()
+
+mmcv.dump(tag_categories, 'hvu_tags.json')
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jester/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..2e054ab33db1cf75fdbe1940ddb0dd22942785ad
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/README.md
@@ -0,0 +1,143 @@
+# Preparing Jester
+
+## Introduction
+
+
+
+```BibTeX
+@InProceedings{Materzynska_2019_ICCV,
+  author = {Materzynska, Joanna and Berger, Guillaume and Bax, Ingo and Memisevic, Roland},
+  title = {The Jester Dataset: A Large-Scale Video Dataset of Human Gestures},
+  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
+  month = {Oct},
+  year = {2019}
+}
+```
+
+For basic dataset information, you can refer to the dataset [website](https://20bn.com/datasets/jester/v1).
+Before we start, please make sure that the current directory is `$MMACTION2/tools/data/jester/`.
+
+## Step 1. Prepare Annotations
+
+First, sign in on the official [website](https://20bn.com/datasets/jester/v1) and download the annotations to `$MMACTION2/data/jester/annotations`.
+
+## Step 2. Prepare RGB Frames
+
+Since the [jester website](https://20bn.com/datasets/jester/v1) doesn't provide the original video data and only extracted RGB frames are available, you have to download the RGB frames directly from the [jester website](https://20bn.com/datasets/jester/v1).
+ +You can download all RGB frame parts from the [jester website](https://20bn.com/datasets/jester/v1) to `$MMACTION2/data/jester/` and use the following command to extract them. + +```shell +cd $MMACTION2/data/jester/ +cat 20bn-jester-v1-?? | tar zx +cd $MMACTION2/tools/data/jester/ +``` + +If you only want to use RGB frames, you can skip to step 5 to generate file lists in the rawframes format. Since the official JPGs are named with the pattern "%05d.jpg" (e.g., "00001.jpg"), +we add `"filename_tmpl='{:05}.jpg'"` to the dict of `data.train`, `data.val` and `data.test` in the config files related to jester like this: + +``` +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +``` + +## Step 3. Extract Flow + +This part is **optional** if you only want to use RGB frames. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. + +You can run the following script to create a soft link to the SSD. + +```shell +# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/jester_extracted/ +ln -s /mnt/SSD/jester_extracted/ ../../../data/jester/rawframes +``` + +Then, you can run the following script to extract optical flow based on RGB frames. + +```shell +cd $MMACTION2/tools/data/jester/ +bash extract_flow.sh +``` + +## Step 4. Encode Videos + +This part is **optional** if you only want to use RGB frames. + +You can run the following script to encode videos. 
+ +```shell +cd $MMACTION2/tools/data/jester/ +bash encode_videos.sh +``` + +## Step 5. Generate File List + +You can run the following script to generate file lists in the format of rawframes and videos. + +```shell +cd $MMACTION2/tools/data/jester/ +bash generate_{rawframes, videos}_filelist.sh +``` + +## Step 6. Check Directory Structure + +After completing the whole data preparation process for Jester, +you will get the rawframes (RGB + Flow), videos and annotation files for Jester. + +In the context of the whole project (for Jester only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── jester +│ │ ├── jester_{train,val}_list_rawframes.txt +│ │ ├── jester_{train,val}_list_videos.txt +│ │ ├── annotations +│ | ├── videos +│ | | ├── 1.mp4 +│ | | ├── 2.mp4 +│ | | ├──... +│ | ├── rawframes +│ | | ├── 1 +│ | | | ├── 00001.jpg +│ | | | ├── 00002.jpg +│ | | | ├── ... +│ | | | ├── flow_x_00001.jpg +│ | | | ├── flow_x_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_y_00001.jpg +│ | | | ├── flow_y_00002.jpg +│ | | | ├── ... +│ | | ├── 2 +│ | | ├── ... + +``` + +For training and evaluating on Jester, please refer to [getting_started.md](/docs/getting_started.md). 
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jester/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..4b3fb17f0b6c7fff7c04ae5bfc2d6ca03b86a20b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/README_zh-CN.md @@ -0,0 +1,143 @@ +# 准备 Jester + +## 简介 + + + +```BibTeX +@InProceedings{Materzynska_2019_ICCV, + author = {Materzynska, Joanna and Berger, Guillaume and Bax, Ingo and Memisevic, Roland}, + title = {The Jester Dataset: A Large-Scale Video Dataset of Human Gestures}, + booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, + month = {Oct}, + year = {2019} +} +``` + +用户可以参照数据集 [官网](https://20bn.com/datasets/jester/v1),获取数据集相关的基本信息。 +在准备数据集前,请确保命令行当前路径为 `$MMACTION2/tools/data/jester/`。 + +## 步骤 1. 下载标注文件 + +首先,用户需要在 [官网](https://20bn.com/datasets/jester/v1) 完成注册,才能下载标注文件。下载好的标注文件需要放在 `$MMACTION2/data/jester/annotations` 文件夹下。 + +## 步骤 2. 准备 RGB 帧 + +[jester 官网](https://20bn.com/datasets/jester/v1) 并未提供原始视频文件,只提供了对原视频文件进行抽取得到的 RGB 帧,用户可在 [jester 官网](https://20bn.com/datasets/jester/v1) 直接下载。 + +将下载好的压缩文件放在 `$MMACTION2/data/jester/` 文件夹下,并使用以下脚本进行解压。 + +```shell +cd $MMACTION2/data/jester/ +cat 20bn-jester-v1-?? 
| tar zx +cd $MMACTION2/tools/data/jester/ +``` + +如果用户只想使用 RGB 帧,则可以跳过中间步骤至步骤 5 以直接生成视频帧的文件列表。 +由于官网的 JPG 文件名形如 "%05d.jpg" (比如,"00001.jpg"),需要在配置文件的 `data.train`, `data.val` 和 `data.test` 处添加 `"filename_tmpl='{:05}.jpg'"` 代码,以修改文件名模板。 + +```python +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +``` + +## 步骤 3. 抽取光流 + +如果用户只想使用 RGB 帧训练,则该部分是 **可选项**。 + +在抽取视频帧和光流之前,请参考 [安装指南](/docs_zh_CN/install.md) 安装 [denseflow](https://github.com/open-mmlab/denseflow)。 + +如果拥有大量的 SSD 存储空间,则推荐将抽取的帧存储至 I/O 性能更优秀的 SSD 中。 + +可以运行以下命令为 SSD 建立软链接。 + +```shell +# 执行这两行进行抽取(假设 SSD 挂载在 "/mnt/SSD/") +mkdir /mnt/SSD/jester_extracted/ +ln -s /mnt/SSD/jester_extracted/ ../../../data/jester/rawframes +``` + +如果想抽取光流,则可以运行以下脚本从 RGB 帧中抽取出光流。 + +```shell +cd $MMACTION2/tools/data/jester/ +bash extract_flow.sh +``` + +## 步骤 4: 编码视频 + +如果用户只想使用 RGB 帧训练,则该部分是 **可选项**。 + +用户可以运行以下命令进行视频编码。 + +```shell +cd $MMACTION2/tools/data/jester/ +bash encode_videos.sh +``` + +## 步骤 5. 生成文件列表 + +用户可以通过运行以下命令生成帧和视频格式的文件列表。 + +```shell +cd $MMACTION2/tools/data/jester/ +bash generate_{rawframes, videos}_filelist.sh +``` + +## 步骤 6. 检查文件夹结构 + +在完成所有 Jester 数据集准备流程后, +用户可以获得对应的 RGB + 光流文件,视频文件以及标注文件。 + +在整个 MMAction2 文件夹下,Jester 的文件结构如下: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── jester +│ │ ├── jester_{train,val}_list_rawframes.txt +│ │ ├── jester_{train,val}_list_videos.txt +│ │ ├── annotations +│ | ├── videos +│ | | ├── 1.mp4 +│ | | ├── 2.mp4 +│ | | ├──... +│ | ├── rawframes +│ | | ├── 1 +│ | | | ├── 00001.jpg +│ | | | ├── 00002.jpg +│ | | | ├── ... 
+│ | | | ├── flow_x_00001.jpg +│ | | | ├── flow_x_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_y_00001.jpg +│ | | | ├── flow_y_00002.jpg +│ | | | ├── ... +│ | | ├── 2 +│ | | ├── ... + +``` + +关于对 jester 进行训练和验证,可以参考 [基础教程](/docs_zh_CN/getting_started.md)。 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jester/encode_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/encode_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..c220424ab4eb678005d803f90fc4df8f910ee965 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/encode_videos.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_videos.py ../../data/jester/rawframes/ ../../data/jester/videos/ --fps 12 --level 1 --start-idx 1 --filename-tmpl '%05d' +echo "Encode videos" + +cd jester/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jester/extract_flow.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/extract_flow.sh new file mode 100644 index 0000000000000000000000000000000000000000..f6b509088aa81ec87d729d54327e78e180434be3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/extract_flow.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/jester/rawframes/ ../../data/jester/rawframes/ --task flow --level 1 --flow-type tvl1 --input-frames +echo "Flow (tv-l1) Generated" +cd jester/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jester/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..c92674aae92f1512a2a506add308429967fa318e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/generate_rawframes_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. 
python tools/data/build_file_list.py jester data/jester/rawframes/ --rgb-prefix '0' --num-split 1 --level 1 --subset train --format rawframes --shuffle +PYTHONPATH=. python tools/data/build_file_list.py jester data/jester/rawframes/ --rgb-prefix '0' --num-split 1 --level 1 --subset val --format rawframes --shuffle +echo "Filelist for rawframes generated." + +cd tools/data/jester/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jester/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..693849677914871e782702fe29a8b98197d7b1b0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/generate_videos_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py jester data/jester/videos/ --num-split 1 --level 1 --subset train --format videos --shuffle +PYTHONPATH=. python tools/data/build_file_list.py jester data/jester/videos/ --num-split 1 --level 1 --subset val --format videos --shuffle +echo "Filelist for videos generated." 
+ +cd tools/data/jester/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jester/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..577e5a22e11aa01806f7da10db417dad99c30c9c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jester/label_map.txt @@ -0,0 +1,27 @@ +Swiping Left +Swiping Right +Swiping Down +Swiping Up +Pushing Hand Away +Pulling Hand In +Sliding Two Fingers Left +Sliding Two Fingers Right +Sliding Two Fingers Down +Sliding Two Fingers Up +Pushing Two Fingers Away +Pulling Two Fingers In +Rolling Hand Forward +Rolling Hand Backward +Turning Hand Clockwise +Turning Hand Counterclockwise +Zooming In With Full Hand +Zooming Out With Full Hand +Zooming In With Two Fingers +Zooming Out With Two Fingers +Thumb Up +Thumb Down +Shaking Hand +Stop Sign +Drumming Fingers +No gesture +Doing other things diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jhmdb/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/jhmdb/README.md new file mode 100644 index 0000000000000000000000000000000000000000..6e2042c138286790b337797f73e3b4a50f327694 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jhmdb/README.md @@ -0,0 +1,101 @@ +# Preparing JHMDB + +## Introduction + + + +```BibTeX +@inproceedings{Jhuang:ICCV:2013, + title = {Towards understanding action recognition}, + author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black}, + booktitle = {International Conf. on Computer Vision (ICCV)}, + month = Dec, + pages = {3192-3199}, + year = {2013} +} +``` + +For basic dataset information, you can refer to the dataset [website](http://jhmdb.is.tue.mpg.de/). +Before we start, please make sure that the directory is located at `$MMACTION2/tools/data/jhmdb/`. 
+ +## Download and Extract + +You can download the RGB frames, optical flow and ground truth annotations from [google drive](https://drive.google.com/drive/folders/1BvGywlAGrACEqRyfYbz3wzlVV3cDFkct). +The data are provided by [MOC](https://github.com/MCG-NJU/MOC-Detector/blob/master/readme/Dataset.md), which is adapted from [act-detector](https://github.com/vkalogeiton/caffe/tree/act-detector). + +After downloading the `JHMDB.tar.gz` file and putting it in `$MMACTION2/tools/data/jhmdb/`, you can run the following command to extract it. + +```shell +tar -zxvf JHMDB.tar.gz +``` + +If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. + +You can run the following script to create a soft link to the SSD. + +```shell +# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/JHMDB/ +ln -s /mnt/SSD/JHMDB/ ../../../data/jhmdb +``` + +## Check Directory Structure + +After extracting, you will get the `FlowBrox04` directory, `Frames` directory and `JHMDB-GT.pkl` for JHMDB. + +In the context of the whole project (for JHMDB only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── jhmdb +│ | ├── FlowBrox04 +│ | | ├── brush_hair +│ | | | ├── April_09_brush_hair_u_nm_np1_ba_goo_0 +│ | | | | ├── 00001.jpg +│ | | | | ├── 00002.jpg +│ | | | | ├── ... +│ | | | | ├── 00039.jpg +│ | | | | ├── 00040.jpg +│ | | | ├── ... +│ | | | ├── Trannydude___Brushing_SyntheticHair___OhNOES!__those_fukin_knots!_brush_hair_u_nm_np1_fr_goo_2 +│ | | ├── ... +│ | | ├── wave +│ | | | ├── 21_wave_u_nm_np1_fr_goo_5 +│ | | | ├── ... +│ | | | ├── Wie_man_winkt!!_wave_u_cm_np1_fr_med_0 +│ | ├── Frames +│ | | ├── brush_hair +│ | | | ├── April_09_brush_hair_u_nm_np1_ba_goo_0 +│ | | | | ├── 00001.png +│ | | | | ├── 00002.png +│ | | | | ├── ... +│ | | | | ├── 00039.png +│ | | | | ├── 00040.png +│ | | | ├── ... 
+│ | | | ├── Trannydude___Brushing_SyntheticHair___OhNOES!__those_fukin_knots!_brush_hair_u_nm_np1_fr_goo_2 +│ | | ├── ... +│ | | ├── wave +│ | | | ├── 21_wave_u_nm_np1_fr_goo_5 +│ | | | ├── ... +│ | | | ├── Wie_man_winkt!!_wave_u_cm_np1_fr_med_0 +│ | ├── JHMDB-GT.pkl + +``` + +:::{note} +The `JHMDB-GT.pkl` exists as a cache. It contains 6 items as follows: + +1. `labels` (list): List of the 21 labels. +2. `gttubes` (dict): Dictionary that contains the ground truth tubes for each video. + A **gttube** is a dictionary that maps each label index to a list of tubes. + A **tube** is a numpy array with `nframes` rows and 5 columns, each col is in format like ` `. +3. `nframes` (dict): Dictionary that contains the number of frames for each video, like `'walk/Panic_in_the_Streets_walk_u_cm_np1_ba_med_5': 16`. +4. `train_videos` (list): A list with `nsplits=1` elements, each one containing the list of training videos. +5. `test_videos` (list): A list with `nsplits=1` elements, each one containing the list of testing videos. +6. `resolution` (dict): Dictionary that maps each video to a tuple (h,w) giving its resolution, like `'pour/Bartender_School_Students_Practice_pour_u_cm_np1_fr_med_1': (240, 320)`. + +::: diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/jhmdb/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/jhmdb/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..3e9fb638aed0bd87d9b1a8821c45a3200c797033 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/jhmdb/README_zh-CN.md @@ -0,0 +1,98 @@ +# 准备 JHMDB + +## 简介 + + + +```BibTeX +@inproceedings{Jhuang:ICCV:2013, + title = {Towards understanding action recognition}, + author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black}, + booktitle = {International Conf. 
on Computer Vision (ICCV)}, + month = Dec, + pages = {3192-3199}, + year = {2013} +} +``` + +用户可参考该数据集的 [官网](http://jhmdb.is.tue.mpg.de/),以获取数据集相关的基本信息。 +在数据集准备前,请确保命令行当前路径为 `$MMACTION2/tools/data/jhmdb/`。 + +## 下载和解压 + +用户可以从 [这里](https://drive.google.com/drive/folders/1BvGywlAGrACEqRyfYbz3wzlVV3cDFkct) 下载 RGB 帧,光流和真实标签文件。 +该数据由 [MOC](https://github.com/MCG-NJU/MOC-Detector/blob/master/readme/Dataset.md) 代码库提供,参考自 [act-detector](https://github.com/vkalogeiton/caffe/tree/act-detector)。 + +用户在下载 `JHMDB.tar.gz` 文件后,需将其放置在 `$MMACTION2/tools/data/jhmdb/` 目录下,并使用以下指令进行解压: + +```shell +tar -zxvf JHMDB.tar.gz +``` + +如果拥有大量的 SSD 存储空间,则推荐将抽取的帧存储至 I/O 性能更优秀的 SSD 中。 + +可以运行以下命令为 SSD 建立软链接。 + +```shell +# 执行这两行进行抽取(假设 SSD 挂载在 "/mnt/SSD/") +mkdir /mnt/SSD/JHMDB/ +ln -s /mnt/SSD/JHMDB/ ../../../data/jhmdb +``` + +## 检查文件夹结构 + +完成解压后,用户将得到 `FlowBrox04` 文件夹,`Frames` 文件夹和 `JHMDB-GT.pkl` 文件。 + +在整个 MMAction2 文件夹下,JHMDB 的文件结构如下: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── jhmdb +│ | ├── FlowBrox04 +│ | | ├── brush_hair +│ | | | ├── April_09_brush_hair_u_nm_np1_ba_goo_0 +│ | | | | ├── 00001.jpg +│ | | | | ├── 00002.jpg +│ | | | | ├── ... +│ | | | | ├── 00039.jpg +│ | | | | ├── 00040.jpg +│ | | | ├── ... +│ | | | ├── Trannydude___Brushing_SyntheticHair___OhNOES!__those_fukin_knots!_brush_hair_u_nm_np1_fr_goo_2 +│ | | ├── ... +│ | | ├── wave +│ | | | ├── 21_wave_u_nm_np1_fr_goo_5 +│ | | | ├── ... +│ | | | ├── Wie_man_winkt!!_wave_u_cm_np1_fr_med_0 +│ | ├── Frames +│ | | ├── brush_hair +│ | | | ├── April_09_brush_hair_u_nm_np1_ba_goo_0 +│ | | | | ├── 00001.png +│ | | | | ├── 00002.png +│ | | | | ├── ... +│ | | | | ├── 00039.png +│ | | | | ├── 00040.png +│ | | | ├── ... +│ | | | ├── Trannydude___Brushing_SyntheticHair___OhNOES!__those_fukin_knots!_brush_hair_u_nm_np1_fr_goo_2 +│ | | ├── ... +│ | | ├── wave +│ | | | ├── 21_wave_u_nm_np1_fr_goo_5 +│ | | | ├── ... 
+│ | | | ├── Wie_man_winkt!!_wave_u_cm_np1_fr_med_0 +│ | ├── JHMDB-GT.pkl + +``` + +**注意**:`JHMDB-GT.pkl` 作为一个缓存文件,它包含 6 个项目: + +1. `labels` (list):21 个行为类别名称组成的列表 +2. `gttubes` (dict):每个视频对应的基准 tubes 组成的字典 + **gttube** 是由标签索引和 tube 列表组成的字典 + **tube** 是一个 `nframes` 行和 5 列的 numpy array,每一列的形式如 ` ` +3. `nframes` (dict):用以表示每个视频对应的帧数,如 `'walk/Panic_in_the_Streets_walk_u_cm_np1_ba_med_5': 16` +4. `train_videos` (list):包含 `nsplits=1` 的元素,每一项都包含了训练视频的列表 +5. `test_videos` (list):包含 `nsplits=1` 的元素,每一项都包含了测试视频的列表 +6. `resolution` (dict):每个视频对应的分辨率(形如 (h,w)),如 `'pour/Bartender_School_Students_Practice_pour_u_cm_np1_fr_med_1': (240, 320)` diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4fc7b6bb1e99bbf81ce7f2bf143b4835d8473e24 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/README.md @@ -0,0 +1,150 @@ +# Preparing Kinetics-\[400/600/700\] + +## Introduction + + + +```BibTeX +@inproceedings{inproceedings, + author = {Carreira, J. and Zisserman, Andrew}, + year = {2017}, + month = {07}, + pages = {4724-4733}, + title = {Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset}, + doi = {10.1109/CVPR.2017.502} +} +``` + +For basic dataset information, please refer to the official [website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). The scripts can be used for preparing kinetics400, kinetics600, kinetics700. To prepare different version of kinetics, you need to replace `${DATASET}` in the following examples with the specific dataset name. The choices of dataset names are `kinetics400`, `kinetics600` and `kinetics700`. +Before we start, please make sure that the directory is located at `$MMACTION2/tools/data/${DATASET}/`. + +:::{note} +Because of the expirations of some YouTube links, the sizes of kinetics dataset copies may be different. 
Here are the sizes of our kinetics dataset copies that were used to train all checkpoints. + +| Dataset | training videos | validation videos | +| :---------: | :-------------: | :---------------: | +| kinetics400 | 240436 | 19796 | + +::: + +## Step 1. Prepare Annotations + +First of all, you can run the following script to prepare annotations by downloading from the official [website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). + +```shell +bash download_annotations.sh ${DATASET} +``` + +Since some video URLs are invalid, the current official annotations contain fewer video items than the original ones. +So we provide an alternative way to download the older annotations as a reference. +Among these, the annotation files of Kinetics400 and Kinetics600 are from the [official crawler](https://github.com/activitynet/ActivityNet/tree/199c9358907928a47cdfc81de4db788fddc2f91d/Crawler/Kinetics/data), +and the annotation files of Kinetics700 are from the [website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/), downloaded on 05/02/2021. + +```shell +bash download_backup_annotations.sh ${DATASET} +``` + +## Step 2. Prepare Videos + +Then, you can run the following script to prepare videos. +The code is adapted from the [official crawler](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics). Note that this might take a long time. 
+ +```shell +bash download_videos.sh ${DATASET} +``` + +**Important**: If you have already downloaded the video dataset using the download script above, +you must replace all whitespace in the class names for ease of processing by running + +```shell +bash rename_classnames.sh ${DATASET} +``` + +For better decoding speed, you can resize the original videos into a smaller, densely encoded version by running: + +```bash +python ../resize_videos.py ../../../data/${DATASET}/videos_train/ ../../../data/${DATASET}/videos_train_256p_dense_cache --dense --level 2 +``` + +You can also download the videos from [Academic Torrents](https://academictorrents.com/) ([kinetics400](https://academictorrents.com/details/184d11318372f70018cf9a72ef867e2fb9ce1d26) & [kinetics700](https://academictorrents.com/details/49f203189fb69ae96fb40a6d0e129949e1dfec98) with short edge 256 pixels are available) or from [cvdfoundation/kinetics-dataset](https://github.com/cvdfoundation/kinetics-dataset) (hosted by the Common Visual Data Foundation; Kinetics400/Kinetics600/Kinetics-700-2020 are available). + +## Step 3. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. You can run the following script to create soft links for the extracted frames. 
+ +```shell +# execute these lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/${DATASET}_extracted_train/ +ln -s /mnt/SSD/${DATASET}_extracted_train/ ../../../data/${DATASET}/rawframes_train/ +mkdir /mnt/SSD/${DATASET}_extracted_val/ +ln -s /mnt/SSD/${DATASET}_extracted_val/ ../../../data/${DATASET}/rawframes_val/ +``` + +If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. + +```shell +bash extract_rgb_frames.sh ${DATASET} +``` + +If you didn't install denseflow, you can still extract RGB frames using OpenCV with the following script, but it will keep the original size of the images. + +```shell +bash extract_rgb_frames_opencv.sh ${DATASET} +``` + +If both RGB frames and optical flow are required, run the following script to extract frames. + +```shell +bash extract_frames.sh ${DATASET} +``` + +The commands above generate images with a short edge of 256. If you want to generate images with a short edge of 320 (320p), or with a fixed size of 340x256, you can change the args `--new-short 256` to `--new-short 320` or `--new-width 340 --new-height 256`. +More details can be found in [data_preparation](/docs/data_preparation.md). + +## Step 4. Generate File List + +You can run the following scripts to generate file lists in the videos and rawframes formats, respectively. + +```shell +bash generate_videos_filelist.sh ${DATASET} +# execute the command below when rawframes are ready +bash generate_rawframes_filelist.sh ${DATASET} +``` + +## Step 5. Folder Structure + +After completing the whole data pipeline for Kinetics preparation, +you will get the rawframes (RGB + Flow), videos and annotation files for Kinetics. + +In the context of the whole project (for Kinetics only), the *minimal* folder structure will look like: +(*minimal* means that some data are not necessary: for example, you may want to evaluate kinetics using the original video format.) 
+ +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ${DATASET} +│ │ ├── ${DATASET}_train_list_videos.txt +│ │ ├── ${DATASET}_val_list_videos.txt +│ │ ├── annotations +│ │ ├── videos_train +│ │ ├── videos_val +│ │ │ ├── abseiling +│ │ │ │ ├── 0wR5jVB-WPk_000417_000427.mp4 +│ │ │ │ ├── ... +│ │ │ ├── ... +│ │ │ ├── wrapping_present +│ │ │ ├── ... +│ │ │ ├── zumba +│ │ ├── rawframes_train +│ │ ├── rawframes_val + +``` + +For training and evaluating on Kinetics, please refer to [getting_started](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..e307b9e7f577d412e60a9db03090fc34a3b795df --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/README_zh-CN.md @@ -0,0 +1,142 @@ +# 准备 Kinetics-\[400/600/700\] + +## 简介 + + + +```BibTeX +@inproceedings{inproceedings, + author = {Carreira, J. and Zisserman, Andrew}, + year = {2017}, + month = {07}, + pages = {4724-4733}, + title = {Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset}, + doi = {10.1109/CVPR.2017.502} +} +``` + +请参照 [官方网站](https://deepmind.com/research/open-source/open-source-datasets/kinetics/) 以获取数据集基本信息。此脚本用于准备数据集 kinetics400,kinetics600,kinetics700。为准备 kinetics 数据集的不同版本,用户需将脚本中的 `${DATASET}` 赋值为数据集对应版本名称,可选项为 `kinetics400`,`kinetics600`, `kinetics700`。 +在开始之前,用户需确保当前目录为 `$MMACTION2/tools/data/${DATASET}/`。 + +**注**:由于部分 YouTube 链接失效,爬取的 Kinetics 数据集大小可能与原版不同。以下是我们所使用 Kinetics 数据集的大小: + +| 数据集 | 训练视频 | 验证集视频 | +| :---------: | :------: | :--------: | +| kinetics400 | 240436 | 19796 | + +## 1. 
准备标注文件 + +首先,用户可以使用如下脚本从 [Kinetics 数据集官网](https://deepmind.com/research/open-source/open-source-datasets/kinetics/)下载标注文件并进行预处理: + +```shell +bash download_annotations.sh ${DATASET} +``` + +由于部分视频的 URL 不可用,当前官方标注中所含视频数量可能小于初始版本。所以 MMAction2 提供了另一种方式以获取初始版本标注作为参考。 +在这其中,Kinetics400 和 Kinetics600 的标注文件来自 [官方爬虫](https://github.com/activitynet/ActivityNet/tree/199c9358907928a47cdfc81de4db788fddc2f91d/Crawler/Kinetics/data), +Kinetics700 的标注文件于 05/02/2021 下载自 [网站](https://deepmind.com/research/open-source/open-source-datasets/kinetics/)。 + +```shell +bash download_backup_annotations.sh ${DATASET} +``` + +## 2. 准备视频 + +用户可以使用以下脚本准备视频,视频准备代码修改自 [官方爬虫](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics)。注意这一步骤将花费较长时间。 + +```shell +bash download_videos.sh ${DATASET} +``` + +**重要提示**:如果在此之前已下载好 Kinetics 数据集的视频,还需使用重命名脚本来替换掉类名中的空格: + +```shell +bash rename_classnames.sh ${DATASET} +``` + +为提升解码速度,用户可以使用以下脚本将原始视频缩放至更小的分辨率(利用稠密编码): + +```bash +python ../resize_videos.py ../../../data/${DATASET}/videos_train/ ../../../data/${DATASET}/videos_train_256p_dense_cache --dense --level 2 +``` + +也可以从 [Academic Torrents](https://academictorrents.com/) 中下载短边长度为 256 的 [kinetics400](https://academictorrents.com/details/184d11318372f70018cf9a72ef867e2fb9ce1d26) 和 [kinetics700](https://academictorrents.com/details/49f203189fb69ae96fb40a6d0e129949e1dfec98),或从 Common Visual Data Foundation 维护的 [cvdfoundation/kinetics-dataset](https://github.com/cvdfoundation/kinetics-dataset) 中下载 Kinetics400/Kinetics600/Kinetics-700-2020。 + +## 3. 
提取 RGB 帧和光流 + +如果用户仅使用 video loader,则可以跳过本步。 + +在提取之前,请参考 [安装教程](/docs_zh_CN/install.md) 安装 [denseflow](https://github.com/open-mmlab/denseflow)。 + +如果用户有足够的 SSD 空间,那么建议将视频抽取为 RGB 帧以提升 I/O 性能。用户可以使用以下脚本为抽取得到的帧文件夹建立软连接: + +```shell +# 执行以下脚本 (假设 SSD 被挂载在 "/mnt/SSD/") +mkdir /mnt/SSD/${DATASET}_extracted_train/ +ln -s /mnt/SSD/${DATASET}_extracted_train/ ../../../data/${DATASET}/rawframes_train/ +mkdir /mnt/SSD/${DATASET}_extracted_val/ +ln -s /mnt/SSD/${DATASET}_extracted_val/ ../../../data/${DATASET}/rawframes_val/ +``` + +如果用户只使用 RGB 帧(由于光流提取非常耗时),可以考虑执行以下脚本,仅用 denseflow 提取 RGB 帧: + +```shell +bash extract_rgb_frames.sh ${DATASET} +``` + +如果用户未安装 denseflow,以下脚本可以使用 OpenCV 进行 RGB 帧的提取,但视频原分辨率大小会被保留: + +```shell +bash extract_rgb_frames_opencv.sh ${DATASET} +``` + +如果同时需要 RGB 帧和光流,可使用如下脚本抽帧: + +```shell +bash extract_frames.sh ${DATASET} +``` + +以上的命令生成短边长度为 256 的 RGB 帧和光流帧。如果用户需要生成短边长度为 320 的帧 (320p),或是固定分辨率为 340 x 256 的帧,可改变参数 `--new-short 256` 为 `--new-short 320` 或 `--new-width 340 --new-height 256`。 +更多细节可以参考 [数据准备](/docs_zh_CN/data_preparation.md)。 + +## 4. 生成文件列表 + +用户可以使用以下两个脚本分别为视频和帧文件夹生成文件列表: + +```shell +bash generate_videos_filelist.sh ${DATASET} +# 为帧文件夹生成文件列表 +bash generate_rawframes_filelist.sh ${DATASET} +``` + +## 5. 目录结构 + +在完整完成 Kinetics 的数据处理后,将得到帧文件夹(RGB 帧和光流帧),视频以及标注文件。 + +在整个项目目录下(仅针对 Kinetics),*最简* 目录结构如下所示: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ${DATASET} +│ │ ├── ${DATASET}_train_list_videos.txt +│ │ ├── ${DATASET}_val_list_videos.txt +│ │ ├── annotations +│ │ ├── videos_train +│ │ ├── videos_val +│ │ │ ├── abseiling +│ │ │ │ ├── 0wR5jVB-WPk_000417_000427.mp4 +│ │ │ │ ├── ... +│ │ │ ├── ... +│ │ │ ├── wrapping_present +│ │ │ ├── ... 
+│ │ │ ├── zumba +│ │ ├── rawframes_train +│ │ ├── rawframes_val + +``` + +关于 Kinetics 数据集上的训练与测试,请参照 [基础教程](/docs_zh_CN/getting_started.md)。 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download.py b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download.py new file mode 100644 index 0000000000000000000000000000000000000000..b4e7e62a7ecf5251d1a35d86f7c15ae0ed97e731 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download.py @@ -0,0 +1,230 @@ +# ------------------------------------------------------------------------------ +# Adapted from https://github.com/activitynet/ActivityNet/ +# Original licence: Copyright (c) Microsoft, under the MIT License. +# ------------------------------------------------------------------------------ +import argparse +import glob +import json +import os +import shutil +import ssl +import subprocess +import uuid +from collections import OrderedDict + +import pandas as pd +from joblib import Parallel, delayed + +ssl._create_default_https_context = ssl._create_unverified_context + + +def create_video_folders(dataset, output_dir, tmp_dir): + """Creates a directory for each label name in the dataset.""" + if 'label-name' not in dataset.columns: + this_dir = os.path.join(output_dir, 'test') + if not os.path.exists(this_dir): + os.makedirs(this_dir) + # I should return a dict but ... 
+ return this_dir + if not os.path.exists(output_dir): + os.makedirs(output_dir) + if not os.path.exists(tmp_dir): + os.makedirs(tmp_dir) + + label_to_dir = {} + for label_name in dataset['label-name'].unique(): + this_dir = os.path.join(output_dir, label_name) + if not os.path.exists(this_dir): + os.makedirs(this_dir) + label_to_dir[label_name] = this_dir + return label_to_dir + + +def construct_video_filename(row, label_to_dir, trim_format='%06d'): + """Given a dataset row, this function constructs the output filename for a + given video.""" + basename = '%s_%s_%s.mp4' % (row['video-id'], + trim_format % row['start-time'], + trim_format % row['end-time']) + if not isinstance(label_to_dir, dict): + dirname = label_to_dir + else: + dirname = label_to_dir[row['label-name']] + output_filename = os.path.join(dirname, basename) + return output_filename + + +def download_clip(video_identifier, + output_filename, + start_time, + end_time, + tmp_dir='/tmp/kinetics/.tmp_dir', + num_attempts=5, + url_base='https://www.youtube.com/watch?v='): + """Download a video from youtube if exists and is not blocked. + arguments: + --------- + video_identifier: str + Unique YouTube video identifier (11 characters) + output_filename: str + File path where the video will be stored. + start_time: float + Indicates the beginning time in seconds from where the video + will be trimmed. + end_time: float + Indicates the ending time in seconds of the trimmed video. + """ + # Defensive argument checking. + assert isinstance(video_identifier, str), 'video_identifier must be string' + assert isinstance(output_filename, str), 'output_filename must be string' + assert len(video_identifier) == 11, 'video_identifier must have length 11' + + status = False + # Construct command line for getting the direct video link. 
+ tmp_filename = os.path.join(tmp_dir, '%s.%%(ext)s' % uuid.uuid4()) + + if not os.path.exists(output_filename): + if not os.path.exists(tmp_filename): + command = [ + 'youtube-dl', '--quiet', '--no-warnings', + '--no-check-certificate', '-f', 'mp4', '-o', + '"%s"' % tmp_filename, + '"%s"' % (url_base + video_identifier) + ] + command = ' '.join(command) + print(command) + attempts = 0 + while True: + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError as err: + attempts += 1 + if attempts == num_attempts: + return status, err.output + else: + break + + tmp_filename = glob.glob('%s*' % tmp_filename.rsplit('.', 1)[0])[0] + # Construct command to trim the videos (ffmpeg required). + command = [ + 'ffmpeg', '-i', + '"%s"' % tmp_filename, '-ss', + str(start_time), '-t', + str(end_time - start_time), '-c:v', 'libx264', '-c:a', 'copy', + '-threads', '1', '-loglevel', 'panic', + '"%s"' % output_filename + ] + command = ' '.join(command) + try: + subprocess.check_output( + command, shell=True, stderr=subprocess.STDOUT) + except subprocess.CalledProcessError as err: + return status, err.output + + # Check if the video was successfully saved. + status = os.path.exists(output_filename) + os.remove(tmp_filename) + return status, 'Downloaded' + + +def download_clip_wrapper(row, label_to_dir, trim_format, tmp_dir): + """Wrapper for parallel processing purposes.""" + output_filename = construct_video_filename(row, label_to_dir, trim_format) + clip_id = os.path.basename(output_filename).split('.mp4')[0] + if os.path.exists(output_filename): + status = tuple([clip_id, True, 'Exists']) + return status + + downloaded, log = download_clip( + row['video-id'], + output_filename, + row['start-time'], + row['end-time'], + tmp_dir=tmp_dir) + status = tuple([clip_id, downloaded, log]) + return status + + +def parse_kinetics_annotations(input_csv, ignore_is_cc=False): + """Returns a parsed DataFrame. 
+ arguments: + --------- + input_csv: str + Path to CSV file containing the following columns: + 'YouTube Identifier,Start time,End time,Class label' + returns: + ------- + dataset: DataFrame + Pandas with the following columns: + 'video-id', 'start-time', 'end-time', 'label-name' + """ + df = pd.read_csv(input_csv) + if 'youtube_id' in df.columns: + columns = OrderedDict([('youtube_id', 'video-id'), + ('time_start', 'start-time'), + ('time_end', 'end-time'), + ('label', 'label-name')]) + df.rename(columns=columns, inplace=True) + if ignore_is_cc: + df = df.loc[:, df.columns.tolist()[:-1]] + return df + + +def main(input_csv, + output_dir, + trim_format='%06d', + num_jobs=24, + tmp_dir='/tmp/kinetics'): + tmp_dir = os.path.join(tmp_dir, '.tmp_dir') + + # Reading and parsing Kinetics. + dataset = parse_kinetics_annotations(input_csv) + + # Creates folders where videos will be saved later. + label_to_dir = create_video_folders(dataset, output_dir, tmp_dir) + + # Download all clips. + if num_jobs == 1: + status_list = [] + for _, row in dataset.iterrows(): + status_list.append( + download_clip_wrapper(row, label_to_dir, trim_format, tmp_dir)) + else: + status_list = Parallel( + n_jobs=num_jobs)(delayed(download_clip_wrapper)( + row, label_to_dir, trim_format, tmp_dir) + for i, row in dataset.iterrows()) + + # Clean tmp dir. + shutil.rmtree(tmp_dir) + + # Save download report. + with open('download_report.json', 'w') as fobj: + fobj.write(json.dumps(status_list)) + + +if __name__ == '__main__': + description = 'Helper script for downloading and trimming kinetics videos.' 
+ p = argparse.ArgumentParser(description=description) + p.add_argument( + 'input_csv', + type=str, + help=('CSV file containing the following format: ' + 'YouTube Identifier,Start time,End time,Class label')) + p.add_argument( + 'output_dir', + type=str, + help='Output directory where videos will be saved.') + p.add_argument( + '-f', + '--trim-format', + type=str, + default='%06d', + help=('This will be the format for the ' + 'filename of trimmed videos: ' + 'videoid_%0xd(start_time)_%0xd(end_time).mp4')) + p.add_argument('-n', '--num-jobs', type=int, default=24) + p.add_argument('-t', '--tmp-dir', type=str, default='/tmp/kinetics') + # help='CSV file of the previous version of Kinetics.') + main(**vars(p.parse_args())) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..09e25b19575ccf6a5874b0285b4a30de333296d0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_annotations.sh @@ -0,0 +1,26 @@ +#!/usr/bin/env bash + +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +DATA_DIR="../../../data/${DATASET}/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. 
Creating"; + mkdir -p ${DATA_DIR} +fi + +wget https://storage.googleapis.com/deepmind-media/Datasets/${DATASET}.tar.gz + +tar -zxvf ${DATASET}.tar.gz --strip-components 1 -C ${DATA_DIR}/ +mv ${DATA_DIR}/train.csv ${DATA_DIR}/kinetics_train.csv +mv ${DATA_DIR}/validate.csv ${DATA_DIR}/kinetics_val.csv +mv ${DATA_DIR}/test.csv ${DATA_DIR}/kinetics_test.csv + +rm ${DATASET}.tar.gz +rm ${DATA_DIR}/*.json diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_backup_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_backup_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..8f22a74353c83caeb2b28512561a4f01d89f3265 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_backup_annotations.sh @@ -0,0 +1,25 @@ +#!/usr/bin/env bash + +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +DATA_DIR="../../../data/${DATASET}/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. 
Creating"; + mkdir -p ${DATA_DIR} +fi + + +wget https://download.openmmlab.com/mmaction/dataset/${DATASET}/annotations/kinetics_train.csv +wget https://download.openmmlab.com/mmaction/dataset/${DATASET}/annotations/kinetics_val.csv +wget https://download.openmmlab.com/mmaction/dataset/${DATASET}/annotations/kinetics_test.csv + +mv kinetics_train.csv ${DATA_DIR}/kinetics_train.csv +mv kinetics_val.csv ${DATA_DIR}/kinetics_val.csv +mv kinetics_test.csv ${DATA_DIR}/kinetics_test.csv diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..0f49ed5ef20d65b6267f4ffb961c5d91f9df1845 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/download_videos.sh @@ -0,0 +1,22 @@ +#!/usr/bin/env bash + +# set up environment +conda env create -f environment.yml +source activate kinetics +pip install --upgrade youtube-dl + +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +DATA_DIR="../../../data/${DATASET}" +ANNO_DIR="../../../data/${DATASET}/annotations" +python download.py ${ANNO_DIR}/kinetics_train.csv ${DATA_DIR}/videos_train +python download.py ${ANNO_DIR}/kinetics_val.csv ${DATA_DIR}/videos_val + +source deactivate kinetics +conda remove -n kinetics --all diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/environment.yml b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/environment.yml new file mode 100644 index 0000000000000000000000000000000000000000..bcee98f8779857a9d382f2dc85f37fbec81cbba0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/environment.yml @@ -0,0 +1,36 @@ +name: kinetics +channels: + - anaconda + - menpo + - conda-forge + - defaults 
+dependencies: + - ca-certificates=2020.1.1 + - certifi=2020.4.5.1 + - ffmpeg=2.8.6 + - libcxx=10.0.0 + - libedit=3.1.20181209 + - libffi=3.3 + - ncurses=6.2 + - openssl=1.1.1g + - pip=20.0.2 + - python=3.7.7 + - readline=8.0 + - setuptools=46.4.0 + - sqlite=3.31.1 + - tk=8.6.8 + - wheel=0.34.2 + - xz=5.2.5 + - zlib=1.2.11 + - pip: + - decorator==4.4.2 + - intel-openmp==2019.0 + - joblib==0.15.1 + - mkl==2019.0 + - numpy==1.18.4 + - olefile==0.46 + - pandas==1.0.3 + - python-dateutil==2.8.1 + - pytz==2020.1 + - six==1.14.0 + - youtube-dl diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..a3e346674b8d93d0334aa33460a5df4dc3718ee0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_frames.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash + +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +cd ../ +python build_rawframes.py ../../data/${DATASET}/videos_train/ ../../data/${DATASET}/rawframes_train/ --level 2 --flow-type tvl1 --ext mp4 --task both --new-short 256 +echo "Raw frames (RGB and tv-l1) Generated for train set" + +python build_rawframes.py ../../data/${DATASET}/videos_val/ ../../data/${DATASET}/rawframes_val/ --level 2 --flow-type tvl1 --ext mp4 --task both --new-short 256 +echo "Raw frames (RGB and tv-l1) Generated for val set" + +cd ${DATASET}/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_rgb_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..c83c2f58cf6b6f4cc4de2bf7e982c48ee773b43d --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_rgb_frames.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash + +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +cd ../ +python build_rawframes.py ../../data/${DATASET}/videos_train/ ../../data/${DATASET}/rawframes_train/ --level 2 --ext mp4 --task rgb --new-short 256 +echo "Raw frames (RGB only) generated for train set" + +python build_rawframes.py ../../data/${DATASET}/videos_val/ ../../data/${DATASET}/rawframes_val/ --level 2 --ext mp4 --task rgb --new-short 256 +echo "Raw frames (RGB only) generated for val set" + +cd ${DATASET}/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_rgb_frames_opencv.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_rgb_frames_opencv.sh new file mode 100644 index 0000000000000000000000000000000000000000..83d94a51fbc46a9c7b8c3cb6ea7cd96d37c28b99 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/extract_rgb_frames_opencv.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash + +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +cd ../ +python build_rawframes.py ../../data/${DATASET}/videos_train/ ../../data/${DATASET}/rawframes_train/ --level 2 --ext mp4 --task rgb --new-short 256 --use-opencv +echo "Raw frames (RGB only) generated for train set" + +python build_rawframes.py ../../data/${DATASET}/videos_val/ ../../data/${DATASET}/rawframes_val/ --level 2 --ext mp4 --task rgb --new-short 256 --use-opencv +echo "Raw frames (RGB only) generated for val set" + +cd ${DATASET}/ diff --git 
a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..22b2366e60e3327cc3cc74b0fc5dd45a26399bec --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/generate_rawframes_filelist.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash + +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py ${DATASET} data/${DATASET}/rawframes_train/ --level 2 --format rawframes --num-split 1 --subset train --shuffle +echo "Train filelist for rawframes generated." + +PYTHONPATH=. python tools/data/build_file_list.py ${DATASET} data/${DATASET}/rawframes_val/ --level 2 --format rawframes --num-split 1 --subset val --shuffle +echo "Val filelist for rawframes generated." +cd tools/data/${DATASET}/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..16db70cfb053d6ad57e0ba556990a35467e722c4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/generate_videos_filelist.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash + +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +cd ../../../ +PYTHONPATH=. 
python tools/data/build_file_list.py ${DATASET} data/${DATASET}/videos_train/ --level 2 --format videos --num-split 1 --subset train --shuffle +echo "Train filelist for video generated." + +PYTHONPATH=. python tools/data/build_file_list.py ${DATASET} data/${DATASET}/videos_val/ --level 2 --format videos --num-split 1 --subset val --shuffle +echo "Val filelist for video generated." +cd tools/data/kinetics/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k400.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k400.txt new file mode 100644 index 0000000000000000000000000000000000000000..cdaafcb1415eee46483561e78cb4747cce76a933 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k400.txt @@ -0,0 +1,400 @@ +abseiling +air drumming +answering questions +applauding +applying cream +archery +arm wrestling +arranging flowers +assembling computer +auctioning +baby waking up +baking cookies +balloon blowing +bandaging +barbequing +bartending +beatboxing +bee keeping +belly dancing +bench pressing +bending back +bending metal +biking through snow +blasting sand +blowing glass +blowing leaves +blowing nose +blowing out candles +bobsledding +bookbinding +bouncing on trampoline +bowling +braiding hair +breading or breadcrumbing +breakdancing +brush painting +brushing hair +brushing teeth +building cabinet +building shed +bungee jumping +busking +canoeing or kayaking +capoeira +carrying baby +cartwheeling +carving pumpkin +catching fish +catching or throwing baseball +catching or throwing frisbee +catching or throwing softball +celebrating +changing oil +changing wheel +checking tires +cheerleading +chopping wood +clapping +clay pottery making +clean and jerk +cleaning floor +cleaning gutters +cleaning pool +cleaning shoes +cleaning toilet +cleaning windows +climbing a rope +climbing ladder +climbing tree +contact juggling +cooking chicken +cooking egg +cooking on campfire +cooking sausages +counting money 
+country line dancing +cracking neck +crawling baby +crossing river +crying +curling hair +cutting nails +cutting pineapple +cutting watermelon +dancing ballet +dancing charleston +dancing gangnam style +dancing macarena +deadlifting +decorating the christmas tree +digging +dining +disc golfing +diving cliff +dodgeball +doing aerobics +doing laundry +doing nails +drawing +dribbling basketball +drinking +drinking beer +drinking shots +driving car +driving tractor +drop kicking +drumming fingers +dunking basketball +dying hair +eating burger +eating cake +eating carrots +eating chips +eating doughnuts +eating hotdog +eating ice cream +eating spaghetti +eating watermelon +egg hunting +exercising arm +exercising with an exercise ball +extinguishing fire +faceplanting +feeding birds +feeding fish +feeding goats +filling eyebrows +finger snapping +fixing hair +flipping pancake +flying kite +folding clothes +folding napkins +folding paper +front raises +frying vegetables +garbage collecting +gargling +getting a haircut +getting a tattoo +giving or receiving award +golf chipping +golf driving +golf putting +grinding meat +grooming dog +grooming horse +gymnastics tumbling +hammer throw +headbanging +headbutting +high jump +high kick +hitting baseball +hockey stop +holding snake +hopscotch +hoverboarding +hugging +hula hooping +hurdling +hurling (sport) +ice climbing +ice fishing +ice skating +ironing +javelin throw +jetskiing +jogging +juggling balls +juggling fire +juggling soccer ball +jumping into pool +jumpstyle dancing +kicking field goal +kicking soccer ball +kissing +kitesurfing +knitting +krumping +laughing +laying bricks +long jump +lunge +making a cake +making a sandwich +making bed +making jewelry +making pizza +making snowman +making sushi +making tea +marching +massaging back +massaging feet +massaging legs +massaging person's head +milking cow +mopping floor +motorcycling +moving furniture +mowing lawn +news anchoring +opening bottle +opening present 
+paragliding +parasailing +parkour +passing American football (in game) +passing American football (not in game) +peeling apples +peeling potatoes +petting animal (not cat) +petting cat +picking fruit +planting trees +plastering +playing accordion +playing badminton +playing bagpipes +playing basketball +playing bass guitar +playing cards +playing cello +playing chess +playing clarinet +playing controller +playing cricket +playing cymbals +playing didgeridoo +playing drums +playing flute +playing guitar +playing harmonica +playing harp +playing ice hockey +playing keyboard +playing kickball +playing monopoly +playing organ +playing paintball +playing piano +playing poker +playing recorder +playing saxophone +playing squash or racquetball +playing tennis +playing trombone +playing trumpet +playing ukulele +playing violin +playing volleyball +playing xylophone +pole vault +presenting weather forecast +pull ups +pumping fist +pumping gas +punching bag +punching person (boxing) +push up +pushing car +pushing cart +pushing wheelchair +reading book +reading newspaper +recording music +riding a bike +riding camel +riding elephant +riding mechanical bull +riding mountain bike +riding mule +riding or walking with horse +riding scooter +riding unicycle +ripping paper +robot dancing +rock climbing +rock scissors paper +roller skating +running on treadmill +sailing +salsa dancing +sanding floor +scrambling eggs +scuba diving +setting table +shaking hands +shaking head +sharpening knives +sharpening pencil +shaving head +shaving legs +shearing sheep +shining shoes +shooting basketball +shooting goal (soccer) +shot put +shoveling snow +shredding paper +shuffling cards +side kick +sign language interpreting +singing +situp +skateboarding +ski jumping +skiing (not slalom or crosscountry) +skiing crosscountry +skiing slalom +skipping rope +skydiving +slacklining +slapping +sled dog racing +smoking +smoking hookah +snatch weight lifting +sneezing +sniffing +snorkeling +snowboarding 
+snowkiting +snowmobiling +somersaulting +spinning poi +spray painting +spraying +springboard diving +squat +sticking tongue out +stomping grapes +stretching arm +stretching leg +strumming guitar +surfing crowd +surfing water +sweeping floor +swimming backstroke +swimming breast stroke +swimming butterfly stroke +swing dancing +swinging legs +swinging on something +sword fighting +tai chi +taking a shower +tango dancing +tap dancing +tapping guitar +tapping pen +tasting beer +tasting food +testifying +texting +throwing axe +throwing ball +throwing discus +tickling +tobogganing +tossing coin +tossing salad +training dog +trapezing +trimming or shaving beard +trimming trees +triple jump +tying bow tie +tying knot (not on a tie) +tying tie +unboxing +unloading truck +using computer +using remote controller (not gaming) +using segway +vault +waiting in line +walking the dog +washing dishes +washing feet +washing hair +washing hands +water skiing +water sliding +watering plants +waxing back +waxing chest +waxing eyebrows +waxing legs +weaving basket +welding +whistling +windsurfing +wrapping present +wrestling +writing +yawning +yoga +zumba diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k600.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k600.txt new file mode 100644 index 0000000000000000000000000000000000000000..639e9c91fa8a941ea57942872fae55628d590b42 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k600.txt @@ -0,0 +1,600 @@ +abseiling +acting in play +adjusting glasses +air drumming +alligator wrestling +answering questions +applauding +applying cream +archaeological excavation +archery +arguing +arm wrestling +arranging flowers +assembling bicycle +assembling computer +attending conference +auctioning +backflip (human) +baking cookies +bandaging +barbequing +bartending +base jumping +bathing dog +battle rope training +beatboxing +bee keeping +belly dancing +bench pressing +bending 
back +bending metal +biking through snow +blasting sand +blowdrying hair +blowing bubble gum +blowing glass +blowing leaves +blowing nose +blowing out candles +bobsledding +bodysurfing +bookbinding +bottling +bouncing on bouncy castle +bouncing on trampoline +bowling +braiding hair +breading or breadcrumbing +breakdancing +breaking boards +breathing fire +brush painting +brushing hair +brushing teeth +building cabinet +building lego +building sandcastle +building shed +bull fighting +bulldozing +bungee jumping +burping +busking +calculating +calligraphy +canoeing or kayaking +capoeira +capsizing +card stacking +card throwing +carrying baby +cartwheeling +carving ice +carving pumpkin +casting fishing line +catching fish +catching or throwing baseball +catching or throwing frisbee +catching or throwing softball +celebrating +changing gear in car +changing oil +changing wheel (not on bike) +checking tires +cheerleading +chewing gum +chiseling stone +chiseling wood +chopping meat +chopping vegetables +chopping wood +clam digging +clapping +clay pottery making +clean and jerk +cleaning gutters +cleaning pool +cleaning shoes +cleaning toilet +cleaning windows +climbing a rope +climbing ladder +climbing tree +coloring in +combing hair +contact juggling +contorting +cooking egg +cooking on campfire +cooking sausages (not on barbeque) +cooking scallops +cosplaying +counting money +country line dancing +cracking back +cracking knuckles +cracking neck +crawling baby +crossing eyes +crossing river +crying +cumbia +curling (sport) +curling hair +cutting apple +cutting nails +cutting orange +cutting pineapple +cutting watermelon +dancing ballet +dancing charleston +dancing gangnam style +dancing macarena +deadlifting +decorating the christmas tree +delivering mail +dining +directing traffic +disc golfing +diving cliff +docking boat +dodgeball +doing aerobics +doing jigsaw puzzle +doing laundry +doing nails +drawing +dribbling basketball +drinking shots +driving car +driving 
tractor +drooling +drop kicking +drumming fingers +dumpster diving +dunking basketball +dyeing eyebrows +dyeing hair +eating burger +eating cake +eating carrots +eating chips +eating doughnuts +eating hotdog +eating ice cream +eating spaghetti +eating watermelon +egg hunting +embroidering +exercising with an exercise ball +extinguishing fire +faceplanting +falling off bike +falling off chair +feeding birds +feeding fish +feeding goats +fencing (sport) +fidgeting +finger snapping +fixing bicycle +fixing hair +flint knapping +flipping pancake +fly tying +flying kite +folding clothes +folding napkins +folding paper +front raises +frying vegetables +geocaching +getting a haircut +getting a piercing +getting a tattoo +giving or receiving award +gold panning +golf chipping +golf driving +golf putting +gospel singing in church +grinding meat +grooming dog +grooming horse +gymnastics tumbling +hammer throw +hand washing clothes +head stand +headbanging +headbutting +high jump +high kick +historical reenactment +hitting baseball +hockey stop +holding snake +home roasting coffee +hopscotch +hoverboarding +huddling +hugging (not baby) +hugging baby +hula hooping +hurdling +hurling (sport) +ice climbing +ice fishing +ice skating +ice swimming +inflating balloons +installing carpet +ironing +ironing hair +javelin throw +jaywalking +jetskiing +jogging +juggling balls +juggling fire +juggling soccer ball +jumping bicycle +jumping into pool +jumping jacks +jumpstyle dancing +karaoke +kicking field goal +kicking soccer ball +kissing +kitesurfing +knitting +krumping +land sailing +laughing +lawn mower racing +laying bricks +laying concrete +laying stone +laying tiles +leatherworking +licking +lifting hat +lighting fire +lock picking +long jump +longboarding +looking at phone +luge +lunge +making a cake +making a sandwich +making balloon shapes +making bubbles +making cheese +making horseshoes +making jewelry +making paper aeroplanes +making pizza +making snowman +making sushi 
+making tea +making the bed +marching +marriage proposal +massaging back +massaging feet +massaging legs +massaging neck +massaging person's head +milking cow +moon walking +mopping floor +mosh pit dancing +motorcycling +mountain climber (exercise) +moving furniture +mowing lawn +mushroom foraging +needle felting +news anchoring +opening bottle (not wine) +opening door +opening present +opening refrigerator +opening wine bottle +packing +paragliding +parasailing +parkour +passing American football (in game) +passing american football (not in game) +passing soccer ball +peeling apples +peeling potatoes +person collecting garbage +petting animal (not cat) +petting cat +photobombing +photocopying +picking fruit +pillow fight +pinching +pirouetting +planing wood +planting trees +plastering +playing accordion +playing badminton +playing bagpipes +playing basketball +playing bass guitar +playing beer pong +playing blackjack +playing cello +playing chess +playing clarinet +playing controller +playing cricket +playing cymbals +playing darts +playing didgeridoo +playing dominoes +playing drums +playing field hockey +playing flute +playing gong +playing guitar +playing hand clapping games +playing harmonica +playing harp +playing ice hockey +playing keyboard +playing kickball +playing laser tag +playing lute +playing maracas +playing marbles +playing monopoly +playing netball +playing ocarina +playing organ +playing paintball +playing pan pipes +playing piano +playing pinball +playing ping pong +playing poker +playing polo +playing recorder +playing rubiks cube +playing saxophone +playing scrabble +playing squash or racquetball +playing tennis +playing trombone +playing trumpet +playing ukulele +playing violin +playing volleyball +playing with trains +playing xylophone +poking bellybutton +pole vault +polishing metal +popping balloons +pouring beer +preparing salad +presenting weather forecast +pull ups +pumping fist +pumping gas +punching bag +punching person (boxing) +push 
up +pushing car +pushing cart +pushing wheelbarrow +pushing wheelchair +putting in contact lenses +putting on eyeliner +putting on foundation +putting on lipstick +putting on mascara +putting on sari +putting on shoes +raising eyebrows +reading book +reading newspaper +recording music +repairing puncture +riding a bike +riding camel +riding elephant +riding mechanical bull +riding mule +riding or walking with horse +riding scooter +riding snow blower +riding unicycle +ripping paper +roasting marshmallows +roasting pig +robot dancing +rock climbing +rock scissors paper +roller skating +rolling pastry +rope pushdown +running on treadmill +sailing +salsa dancing +sanding floor +sausage making +sawing wood +scrambling eggs +scrapbooking +scrubbing face +scuba diving +separating eggs +setting table +sewing +shaking hands +shaking head +shaping bread dough +sharpening knives +sharpening pencil +shaving head +shaving legs +shearing sheep +shining flashlight +shining shoes +shooting basketball +shooting goal (soccer) +shopping +shot put +shoveling snow +shucking oysters +shuffling cards +shuffling feet +side kick +sign language interpreting +singing +sipping cup +situp +skateboarding +ski jumping +skiing crosscountry +skiing mono +skiing slalom +skipping rope +skipping stone +skydiving +slacklining +slapping +sled dog racing +sleeping +smashing +smelling feet +smoking +smoking hookah +smoking pipe +snatch weight lifting +sneezing +snorkeling +snowboarding +snowkiting +snowmobiling +somersaulting +spelunking +spinning poi +spray painting +springboard diving +square dancing +squat +standing on hands +staring +steer roping +sticking tongue out +stomping grapes +stretching arm +stretching leg +sucking lolly +surfing crowd +surfing water +sweeping floor +swimming backstroke +swimming breast stroke +swimming butterfly stroke +swimming front crawl +swing dancing +swinging baseball bat +swinging on something +sword fighting +sword swallowing +tackling +tagging graffiti +tai chi 
+talking on cell phone +tango dancing +tap dancing +tapping guitar +tapping pen +tasting beer +tasting food +tasting wine +testifying +texting +threading needle +throwing axe +throwing ball (not baseball or American football) +throwing discus +throwing knife +throwing snowballs +throwing tantrum +throwing water balloon +tickling +tie dying +tightrope walking +tiptoeing +tobogganing +tossing coin +training dog +trapezing +trimming or shaving beard +trimming shrubs +trimming trees +triple jump +twiddling fingers +tying bow tie +tying knot (not on a tie) +tying necktie +tying shoe laces +unboxing +unloading truck +using a microscope +using a paint roller +using a power drill +using a sledge hammer +using a wrench +using atm +using bagging machine +using circular saw +using inhaler +using puppets +using remote controller (not gaming) +using segway +vacuuming floor +visiting the zoo +wading through mud +wading through water +waiting in line +waking up +walking the dog +walking through snow +washing dishes +washing feet +washing hair +washing hands +watching tv +water skiing +water sliding +watering plants +waving hand +waxing back +waxing chest +waxing eyebrows +waxing legs +weaving basket +weaving fabric +welding +whistling +windsurfing +winking +wood burning (art) +wrapping present +wrestling +writing +yarn spinning +yawning +yoga +zumba diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k700.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k700.txt new file mode 100644 index 0000000000000000000000000000000000000000..2ce7e6fa5c0d4ae2d5f9ad464667bdb391307a8b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/label_map_k700.txt @@ -0,0 +1,700 @@ +abseiling +acting in play +adjusting glasses +air drumming +alligator wrestling +answering questions +applauding +applying cream +archaeological excavation +archery +arguing +arm wrestling +arranging flowers +arresting +assembling bicycle +assembling computer 
+attending conference +auctioning +baby waking up +backflip (human) +baking cookies +bandaging +barbequing +bartending +base jumping +bathing dog +battle rope training +beatboxing +bee keeping +being excited +being in zero gravity +belly dancing +bench pressing +bending back +bending metal +biking through snow +blasting sand +blending fruit +blowdrying hair +blowing bubble gum +blowing glass +blowing leaves +blowing nose +blowing out candles +bobsledding +bodysurfing +bookbinding +bottling +bouncing ball (not juggling) +bouncing on bouncy castle +bouncing on trampoline +bowling +braiding hair +breading or breadcrumbing +breakdancing +breaking boards +breaking glass +breathing fire +brush painting +brushing floor +brushing hair +brushing teeth +building cabinet +building lego +building sandcastle +building shed +bulldozing +bungee jumping +burping +busking +calculating +calligraphy +canoeing or kayaking +capoeira +capsizing +card stacking +card throwing +carrying baby +carrying weight +cartwheeling +carving ice +carving marble +carving pumpkin +carving wood with a knife +casting fishing line +catching fish +catching or throwing baseball +catching or throwing frisbee +catching or throwing softball +celebrating +changing gear in car +changing oil +changing wheel (not on bike) +chasing +checking tires +checking watch +cheerleading +chewing gum +chiseling stone +chiseling wood +chopping meat +chopping wood +clam digging +clapping +clay pottery making +clean and jerk +cleaning gutters +cleaning pool +cleaning shoes +cleaning toilet +cleaning windows +climbing a rope +climbing ladder +climbing tree +closing door +coloring in +combing hair +contact juggling +contorting +cooking chicken +cooking egg +cooking on campfire +cooking sausages (not on barbeque) +cooking scallops +cosplaying +coughing +counting money +country line dancing +cracking back +cracking knuckles +cracking neck +crawling baby +crocheting +crossing eyes +crossing river +crying +cumbia +curling (sport) 
+curling eyelashes +curling hair +cutting apple +cutting cake +cutting nails +cutting orange +cutting pineapple +cutting watermelon +dancing ballet +dancing charleston +dancing gangnam style +dancing macarena +deadlifting +dealing cards +decorating the christmas tree +decoupage +delivering mail +digging +dining +directing traffic +disc golfing +diving cliff +docking boat +dodgeball +doing aerobics +doing jigsaw puzzle +doing laundry +doing nails +doing sudoku +drawing +dribbling basketball +drinking shots +driving car +driving tractor +drooling +drop kicking +drumming fingers +dumpster diving +dunking basketball +dyeing eyebrows +dyeing hair +eating burger +eating cake +eating carrots +eating chips +eating doughnuts +eating hotdog +eating ice cream +eating nachos +eating spaghetti +eating watermelon +egg hunting +embroidering +entering church +exercising arm +exercising with an exercise ball +extinguishing fire +faceplanting +falling off bike +falling off chair +feeding birds +feeding fish +feeding goats +fencing (sport) +fidgeting +filling cake +filling eyebrows +finger snapping +fixing bicycle +fixing hair +flint knapping +flipping bottle +flipping pancake +fly tying +flying kite +folding clothes +folding napkins +folding paper +front raises +frying vegetables +gargling +geocaching +getting a haircut +getting a piercing +getting a tattoo +giving or receiving award +gold panning +golf chipping +golf driving +golf putting +gospel singing in church +grinding meat +grooming cat +grooming dog +grooming horse +gymnastics tumbling +hammer throw +hand washing clothes +head stand +headbanging +headbutting +helmet diving +herding cattle +high fiving +high jump +high kick +historical reenactment +hitting baseball +hockey stop +holding snake +home roasting coffee +hopscotch +hoverboarding +huddling +hugging (not baby) +hugging baby +hula hooping +hurdling +hurling (sport) +ice climbing +ice fishing +ice skating +ice swimming +inflating balloons +installing carpet +ironing 
+ironing hair +javelin throw +jaywalking +jetskiing +jogging +juggling balls +juggling fire +juggling soccer ball +jumping bicycle +jumping into pool +jumping jacks +jumping sofa +jumpstyle dancing +karaoke +kicking field goal +kicking soccer ball +kissing +kitesurfing +knitting +krumping +land sailing +laughing +lawn mower racing +laying bricks +laying concrete +laying decking +laying stone +laying tiles +leatherworking +letting go of balloon +licking +lifting hat +lighting candle +lighting fire +listening with headphones +lock picking +long jump +longboarding +looking at phone +looking in mirror +luge +lunge +making a cake +making a sandwich +making balloon shapes +making bubbles +making cheese +making horseshoes +making jewelry +making latte art +making paper aeroplanes +making pizza +making slime +making snowman +making sushi +making tea +making the bed +marching +marriage proposal +massaging back +massaging feet +massaging legs +massaging neck +massaging person's head +metal detecting +milking cow +milking goat +mixing colours +moon walking +mopping floor +mosh pit dancing +motorcycling +mountain climber (exercise) +moving baby +moving child +moving furniture +mowing lawn +mushroom foraging +needle felting +news anchoring +opening bottle (not wine) +opening coconuts +opening door +opening present +opening refrigerator +opening wine bottle +packing +paragliding +parasailing +parkour +passing American football (in game) +passing American football (not in game) +passing soccer ball +peeling apples +peeling banana +peeling potatoes +person collecting garbage +petting animal (not cat) +petting cat +petting horse +photobombing +photocopying +picking apples +picking blueberries +pillow fight +pinching +pirouetting +planing wood +planting trees +plastering +playing accordion +playing american football +playing badminton +playing bagpipes +playing basketball +playing bass guitar +playing beer pong +playing billiards +playing blackjack +playing cards +playing cello 
+playing checkers +playing chess +playing clarinet +playing controller +playing cricket +playing cymbals +playing darts +playing didgeridoo +playing dominoes +playing drums +playing field hockey +playing flute +playing gong +playing guitar +playing hand clapping games +playing harmonica +playing harp +playing ice hockey +playing keyboard +playing kickball +playing laser tag +playing lute +playing mahjong +playing maracas +playing marbles +playing monopoly +playing netball +playing nose flute +playing oboe +playing ocarina +playing organ +playing paintball +playing pan pipes +playing piano +playing piccolo +playing pinball +playing ping pong +playing poker +playing polo +playing recorder +playing road hockey +playing rounders +playing rubiks cube +playing saxophone +playing scrabble +playing shuffleboard +playing slot machine +playing squash or racquetball +playing tennis +playing trombone +playing trumpet +playing ukulele +playing violin +playing volleyball +playing with trains +playing xylophone +poaching eggs +poking bellybutton +pole vault +polishing furniture +polishing metal +popping balloons +pouring beer +pouring milk +pouring wine +preparing salad +presenting weather forecast +pretending to be a statue +pull ups +pulling espresso shot +pulling rope (game) +pumping fist +pumping gas +punching bag +punching person (boxing) +push up +pushing car +pushing cart +pushing wheelbarrow +pushing wheelchair +putting in contact lenses +putting on eyeliner +putting on foundation +putting on lipstick +putting on mascara +putting on sari +putting on shoes +putting wallpaper on wall +raising eyebrows +reading book +reading newspaper +recording music +repairing puncture +riding a bike +riding camel +riding elephant +riding mechanical bull +riding mule +riding or walking with horse +riding scooter +riding snow blower +riding unicycle +ripping paper +roasting marshmallows +roasting pig +robot dancing +rock climbing +rock scissors paper +roller skating +rolling eyes +rolling 
pastry +rope pushdown +running on treadmill +sailing +salsa dancing +saluting +sanding floor +sanding wood +sausage making +sawing wood +scrambling eggs +scrapbooking +scrubbing face +scuba diving +seasoning food +separating eggs +setting table +sewing +shaking hands +shaking head +shaping bread dough +sharpening knives +sharpening pencil +shaving head +shaving legs +shearing sheep +shining flashlight +shining shoes +shoot dance +shooting basketball +shooting goal (soccer) +shooting off fireworks +shopping +shot put +shouting +shoveling snow +shredding paper +shucking oysters +shuffling cards +shuffling feet +side kick +sieving +sign language interpreting +silent disco +singing +sipping cup +situp +skateboarding +ski ballet +ski jumping +skiing crosscountry +skiing mono +skiing slalom +skipping rope +skipping stone +skydiving +slacklining +slapping +sled dog racing +sleeping +slicing onion +smashing +smelling feet +smoking +smoking hookah +smoking pipe +snatch weight lifting +sneezing +snorkeling +snowboarding +snowkiting +snowmobiling +somersaulting +spelunking +spinning plates +spinning poi +splashing water +spray painting +spraying +springboard diving +square dancing +squat +squeezing orange +stacking cups +stacking dice +standing on hands +staring +steer roping +steering car +sticking tongue out +stomping grapes +stretching arm +stretching leg +sucking lolly +surfing crowd +surfing water +surveying +sweeping floor +swimming backstroke +swimming breast stroke +swimming butterfly stroke +swimming front crawl +swimming with dolphins +swimming with sharks +swing dancing +swinging baseball bat +swinging on something +sword fighting +sword swallowing +tackling +tagging graffiti +tai chi +taking photo +talking on cell phone +tango dancing +tap dancing +tapping guitar +tapping pen +tasting beer +tasting food +tasting wine +testifying +texting +threading needle +throwing axe +throwing ball (not baseball or American football) +throwing discus +throwing knife +throwing 
snowballs +throwing tantrum +throwing water balloon +tickling +tie dying +tightrope walking +tiptoeing +tobogganing +tossing coin +tossing salad +training dog +trapezing +treating wood +trimming or shaving beard +trimming shrubs +trimming trees +triple jump +twiddling fingers +tying bow tie +tying knot (not on a tie) +tying necktie +tying shoe laces +unboxing +uncorking champagne +unloading truck +using a microscope +using a paint roller +using a power drill +using a sledge hammer +using a wrench +using atm +using bagging machine +using circular saw +using inhaler +using megaphone +using puppets +using remote controller (not gaming) +using segway +vacuuming car +vacuuming floor +visiting the zoo +wading through mud +wading through water +waiting in line +waking up +walking on stilts +walking the dog +walking through snow +walking with crutches +washing dishes +washing feet +washing hair +washing hands +watching tv +water skiing +water sliding +watering plants +waving hand +waxing armpits +waxing back +waxing chest +waxing eyebrows +waxing legs +weaving basket +weaving fabric +welding +whistling +windsurfing +winking +wood burning (art) +wrapping present +wrestling +writing +yarn spinning +yawning +yoga +zumba diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/rename_classnames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/rename_classnames.sh new file mode 100644 index 0000000000000000000000000000000000000000..a2b7a1b405d014a9f25cc75d09ed639335d0b95e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/kinetics/rename_classnames.sh @@ -0,0 +1,29 @@ +#!/usr/bin/env bash + +# Rename classname for convenience +DATASET=$1 +if [ "$DATASET" == "kinetics400" ] || [ "$1" == "kinetics600" ] || [ "$1" == "kinetics700" ]; then + echo "We are processing $DATASET" +else + echo "Bad Argument, we only support kinetics400, kinetics600 or kinetics700" + exit 0 +fi + +cd ../../../data/${DATASET}/ +ls ./videos_train | while read class; do \ + 
newclass=$(echo "${class}" | tr " " "_"); + if [ "${class}" != "${newclass}" ] + then + mv "videos_train/${class}" "videos_train/${newclass}"; + fi +done + +ls ./videos_val | while read class; do \ + newclass=$(echo "${class}" | tr " " "_"); + if [ "${class}" != "${newclass}" ] + then + mv "videos_val/${class}" "videos_val/${newclass}"; + fi +done + +cd ../../tools/data/kinetics/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e67ca45335a98acf5e763224680521dddc16c532 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/README.md @@ -0,0 +1,128 @@ +# Preparing Moments in Time + +## Introduction + + + +```BibTeX +@article{monfortmoments, + title={Moments in Time Dataset: one million videos for event understanding}, + author={Monfort, Mathew and Andonian, Alex and Zhou, Bolei and Ramakrishnan, Kandan and Bargal, Sarah Adel and Yan, Tom and Brown, Lisa and Fan, Quanfu and Gutfreund, Dan and Vondrick, Carl and others}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + year={2019}, + issn={0162-8828}, + pages={1--8}, + numpages={8}, + doi={10.1109/TPAMI.2019.2901464}, +} +``` + +For basic dataset information, you can refer to the dataset [website](http://moments.csail.mit.edu/). +Before we start, please make sure that the current working directory is `$MMACTION2/tools/data/mit/`. + +## Step 1. Prepare Annotations and Videos + +First, visit the official [website](http://moments.csail.mit.edu/) and fill in the application form to download the dataset; you will then receive the download link. You can use `bash preprocess_data.sh` to prepare the annotations and videos. Note that the script does not include the download command, so download the dataset to the proper place beforehand, following the comment in the script.
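Because `preprocess_data.sh` expects the archive to be downloaded by hand, a small pre-flight check can report a missing file with a clear message before the script runs. The following is a minimal sketch and not part of MMAction2: the function name `check_mit_archive` is hypothetical, while the archive name `Moments_in_Time_Raw.zip` and the default data directory are the ones used in `preprocess_data.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MMAction2).
# Usage: check_mit_archive [DATA_DIR]
# Returns 0 if the manually downloaded archive is present, 1 otherwise.
check_mit_archive() {
    # Default matches DATA_DIR in preprocess_data.sh.
    local data_dir="${1:-../../../data/mit}"
    local archive="${data_dir}/Moments_in_Time_Raw.zip"
    if [ -f "${archive}" ]; then
        echo "Found ${archive}"
        return 0
    fi
    echo "Missing ${archive}: download it manually first" >&2
    return 1
}
```

One possible way to use it is `check_mit_archive && bash preprocess_data.sh`, so that the preparation script only starts once the archive is in place.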
+ +For better decoding speed, you can resize the original videos into a smaller, densely encoded version with: + +```shell +python ../resize_videos.py ../../../data/mit/videos/ ../../../data/mit/videos_256p_dense_cache --dense --level 2 +``` + +## Step 2. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, we recommend extracting frames there for better I/O performance. You can run the following script to soft link the extracted frames. + +```shell +# Execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/mit_extracted/ +ln -s /mnt/SSD/mit_extracted/ ../../../data/mit/rawframes +``` + +If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. + +```shell +bash extract_rgb_frames.sh +``` + +If you didn't install denseflow, you can still extract RGB frames using OpenCV with the following script, but it will keep the original size of the images. + +```shell +bash extract_rgb_frames_opencv.sh +``` + +If both are required, run the following script to extract frames. + +```shell +bash extract_frames.sh +``` + +## Step 3. Generate File List + +You can run the following scripts to generate file lists in the format of rawframes and videos. + +```shell +bash generate_rawframes_filelist.sh && bash generate_videos_filelist.sh +``` + +## Step 4. Check Directory Structure + +After the whole data process for Moments in Time preparation, +you will get the rawframes (RGB + Flow), videos and annotation files for Moments in Time.
+ +In the context of the whole project (for Moments in Time only), the folder structure will look like: + +``` +mmaction2 +├── data +│   └── mit +│   ├── annotations +│   │   ├── license.txt +│   │   ├── moments_categories.txt +│   │   ├── README.txt +│   │   ├── trainingSet.csv +│   │   └── validationSet.csv +│   ├── mit_train_rawframe_anno.txt +│   ├── mit_train_video_anno.txt +│   ├── mit_val_rawframe_anno.txt +│   ├── mit_val_video_anno.txt +│   ├── rawframes +│   │   ├── training +│   │   │   ├── adult+female+singing +│   │   │   │   ├── 0P3XG_vf91c_35 +│   │   │   │   │   ├── flow_x_00001.jpg +│   │   │   │   │   ├── flow_x_00002.jpg +│   │   │   │   │   ├── ... +│   │   │   │   │   ├── flow_y_00001.jpg +│   │   │   │   │   ├── flow_y_00002.jpg +│   │   │   │   │   ├── ... +│   │   │   │   │   ├── img_00001.jpg +│   │   │   │   │   └── img_00002.jpg +│   │   │   │   └── yt-zxQfALnTdfc_56 +│   │   │   │   │   ├── ... +│   │   │   └── yawning +│   │   │   ├── _8zmP1e-EjU_2 +│   │   │      │   ├── ... +│   │   └── validation +│   │   │   ├── ... +│   └── videos +│   ├── training +│   │   ├── adult+female+singing +│   │   │   ├── 0P3XG_vf91c_35.mp4 +│   │   │   ├── ... +│   │   │   └── yt-zxQfALnTdfc_56.mp4 +│   │   └── yawning +│   │   ├── ... +│   └── validation +│   │   ├── ... +└── mmaction +└── ... + +``` + +For training and evaluating on Moments in Time, please refer to [getting_started.md](/docs/getting_started.md). 
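The layout shown in the tree above can be checked with a short script before training starts. This is a minimal sketch under stated assumptions: the function name `check_mit_layout` and its messages are hypothetical and not part of MMAction2, and the checked entries are simply the top-level directories and file lists listed in the tree. If you only use the video loader, the `rawframes` entries can be dropped from the list.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MMAction2).
# Usage: check_mit_layout [ROOT]   (ROOT defaults to data/mit)
# Returns 0 if every expected directory and file list exists, 1 otherwise.
check_mit_layout() {
    local root="${1:-data/mit}"
    local missing=0
    local entry
    # Directories taken from the tree in this README.
    for entry in annotations rawframes/training rawframes/validation \
                 videos/training videos/validation; do
        if [ ! -d "${root}/${entry}" ]; then
            echo "missing directory: ${root}/${entry}" >&2
            missing=1
        fi
    done
    # File lists taken from the tree in this README.
    for entry in mit_train_rawframe_anno.txt mit_train_video_anno.txt \
                 mit_val_rawframe_anno.txt mit_val_video_anno.txt; do
        if [ ! -f "${root}/${entry}" ]; then
            echo "missing file list: ${root}/${entry}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Run from the `mmaction2` root, `check_mit_layout data/mit` prints every missing entry to stderr and returns a non-zero status if anything is absent. Because `-d` follows symbolic links, a soft-linked `rawframes` directory is accepted as well.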
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..74a3d0c247f89b193f1476dbfa89b8d9bd25ff16 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/README_zh-CN.md @@ -0,0 +1,130 @@ +# 准备 Moments in Time + +## 简介 + + + +```BibTeX +@article{monfortmoments, + title={Moments in Time Dataset: one million videos for event understanding}, + author={Monfort, Mathew and Andonian, Alex and Zhou, Bolei and Ramakrishnan, Kandan and Bargal, Sarah Adel and Yan, Tom and Brown, Lisa and Fan, Quanfu and Gutfruend, Dan and Vondrick, Carl and others}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + year={2019}, + issn={0162-8828}, + pages={1--8}, + numpages={8}, + doi={10.1109/TPAMI.2019.2901464}, +} +``` + +用户可以参照数据集 [官网](http://moments.csail.mit.edu/),获取数据集相关的基本信息。 +在准备数据集前,请确保命令行当前路径为 `$MMACTION2/tools/data/mit/`。 + +## 步骤 1. 准备标注文件和视频文件 + +首先,用户需要访问[官网](http://moments.csail.mit.edu/),填写申请表来下载数据集。 +在得到下载链接后,用户可以使用 `bash preprocess_data.sh` 来准备标注文件和视频。 +请注意此脚本并没有下载标注和视频文件,用户需要根据脚本文件中的注释,提前下载好数据集,并放/软链接到合适的位置。 + +为加快视频解码速度,用户需要缩小原视频的尺寸,可使用以下命令获取密集编码版视频: + +```shell +python ../resize_videos.py ../../../data/mit/videos/ ../../../data/mit/videos_256p_dense_cache --dense --level 2 +``` + +## Step 2. 
抽取帧和光流 + +如果用户只想使用视频加载训练,则该部分是 **可选项**。 + +在抽取视频帧和光流之前,请参考 [安装指南](/docs_zh_CN/install.md) 安装 [denseflow](https://github.com/open-mmlab/denseflow)。 + +如果用户有大量的 SSD 存储空间,则推荐将抽取的帧存储至 I/O 性能更优秀的 SSD 上。 +用户可使用以下命令为 SSD 建立软链接。 + +```shell +# 执行这两行指令进行抽取(假设 SSD 挂载在 "/mnt/SSD/"上) +mkdir /mnt/SSD/mit_extracted/ +ln -s /mnt/SSD/mit_extracted/ ../../../data/mit/rawframes +``` + +如果用户需要抽取 RGB 帧(因为抽取光流的过程十分耗时),可以考虑运行以下命令使用 denseflow **只抽取 RGB 帧**。 + +```shell +bash extract_rgb_frames.sh +``` + +如果用户没有安装 denseflow,则可以运行以下命令使用 OpenCV 抽取 RGB 帧。然而,该方法只能抽取与原始视频分辨率相同的帧。 + +```shell +bash extract_rgb_frames_opencv.sh +``` + +如果用户想抽取 RGB 帧和光流,则可以运行以下脚本进行抽取。 + +```shell +bash extract_frames.sh +``` + +## 步骤 3. 生成文件列表 + +用户可以通过运行以下命令生成帧和视频格式的文件列表。 + +```shell +bash generate_{rawframes, videos}_filelist.sh +``` + +## 步骤 4. 检查目录结构 + +在完成 Moments in Time 数据集准备流程后,用户可以得到 Moments in Time 的 RGB 帧 + 光流文件,视频文件以及标注文件。 + +在整个 MMAction2 文件夹下,Moments in Time 的文件结构如下: + +``` +mmaction2 +├── data +│   └── mit +│   ├── annotations +│   │   ├── license.txt +│   │   ├── moments_categories.txt +│   │   ├── README.txt +│   │   ├── trainingSet.csv +│   │   └── validationSet.csv +│   ├── mit_train_rawframe_anno.txt +│   ├── mit_train_video_anno.txt +│   ├── mit_val_rawframe_anno.txt +│   ├── mit_val_video_anno.txt +│   ├── rawframes +│   │   ├── training +│   │   │   ├── adult+female+singing +│   │   │   │   ├── 0P3XG_vf91c_35 +│   │   │   │   │   ├── flow_x_00001.jpg +│   │   │   │   │   ├── flow_x_00002.jpg +│   │   │   │   │   ├── ... +│   │   │   │   │   ├── flow_y_00001.jpg +│   │   │   │   │   ├── flow_y_00002.jpg +│   │   │   │   │   ├── ... +│   │   │   │   │   ├── img_00001.jpg +│   │   │   │   │   └── img_00002.jpg +│   │   │   │   └── yt-zxQfALnTdfc_56 +│   │   │   │   │   ├── ... +│   │   │   └── yawning +│   │   │   ├── _8zmP1e-EjU_2 +│   │   │      │   ├── ... +│   │   └── validation +│   │   │   ├── ... 
+│   └── videos +│   ├── training +│   │   ├── adult+female+singing +│   │   │   ├── 0P3XG_vf91c_35.mp4 +│   │   │   ├── ... +│   │   │   └── yt-zxQfALnTdfc_56.mp4 +│   │   └── yawning +│   │   ├── ... +│   └── validation +│   │   ├── ... +└── mmaction +└── ... + +``` + +关于对 Moments in Time 进行训练和验证,可以参照 [基础教程](/docs_zh_CN/getting_started.md)。 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..477fb8d9a4650ad00ff2d94e09c83225fd0defbd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_frames.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/mit/videos/training ../../data/mit/rawframes/training/ --level 2 --flow-type tvl1 --ext mp4 --task both +echo "Raw frames (RGB and tv-l1) Generated for train set" + +python build_rawframes.py ../../data/mit/videos/validation/ ../../data/mit/rawframes/validation/ --level 2 --flow-type tvl1 --ext mp4 --task both +echo "Raw frames (RGB and tv-l1) Generated for val set" + +cd mit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_rgb_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..4b468b52e6fcbbedc1232f0e62eda32e3db194d2 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_rgb_frames.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/mit/videos/training ../../data/mit/rawframes/training/ --level 2 --ext mp4 --task rgb +echo "Raw frames (RGB only) generated for train set" + +python build_rawframes.py ../../data/mit/videos/validation ../../data/mit/rawframes/validation/ --level 2 --ext mp4 --task rgb +echo "Raw frames (RGB only) generated for val set" + +cd mit/ diff --git
a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_rgb_frames_opencv.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_rgb_frames_opencv.sh new file mode 100644 index 0000000000000000000000000000000000000000..004f6e4114b970fd41c175544f0aa544c3941ac0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/extract_rgb_frames_opencv.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/mit/videos/training ../../data/mit/rawframes/training/ --level 2 --ext mp4 --task rgb --use-opencv +echo "Raw frames (RGB only) generated for train set" + +python build_rawframes.py ../../data/mit/videos/validation ../../data/mit/rawframes/validation/ --level 2 --ext mp4 --task rgb --use-opencv +echo "Raw frames (RGB only) generated for val set" + +cd mit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..f53bcdebfa203d4dcadd276065f39ae39b549fb6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/generate_rawframes_filelist.sh @@ -0,0 +1,9 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py mit data/mit/rawframes/training/ --level 2 --format rawframes --num-split 1 --subset train --shuffle +echo "Train filelist for rawframes generated." + +PYTHONPATH=. python tools/data/build_file_list.py mit data/mit/rawframes/validation/ --level 2 --format rawframes --num-split 1 --subset val --shuffle +echo "Val filelist for rawframes generated." 
+cd tools/data/mit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..390d4eb7e6d3370f74fe1b4004897fc1530954f7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/generate_videos_filelist.sh @@ -0,0 +1,9 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py mit data/mit/videos/training/ --level 2 --format videos --num-split 1 --subset train --shuffle +echo "Train filelist for videos generated." + +PYTHONPATH=. python tools/data/build_file_list.py mit data/mit/videos/validation/ --level 2 --format videos --num-split 1 --subset val --shuffle +echo "Val filelist for videos generated." +cd tools/data/mit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..c1160edf2f932df741b620b36340d4c8afdb9f20 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/label_map.txt @@ -0,0 +1,339 @@ +clapping +praying +dropping +burying +covering +flooding +leaping +drinking +slapping +cuddling +sleeping +preaching +raining +stitching +spraying +twisting +coaching +submerging +breaking +tuning +boarding +running +destroying +competing +giggling +shoveling +chasing +flicking +pouring +buttoning +hammering +carrying +surfing +pulling +squatting +aiming +crouching +tapping +skipping +washing +winking +queuing +locking +stopping +sneezing +flipping +sewing +clipping +working +rocking +asking +playing+fun +camping +plugging +pedaling +constructing +slipping +sweeping +screwing +shrugging +hitchhiking +cracking +scratching +trimming +selling +marching +stirring +kissing +jumping +starting +clinging +socializing +picking +splashing +licking +kicking +sliding +filming +driving +handwriting +steering 
+filling +crashing +stealing +pressing +shouting +hiking +vacuuming +pointing +giving +diving +hugging +building +swerving +dining +floating +cheerleading +leaning +sailing +singing +playing +hitting +bubbling +joining +bathing +raising +sitting +drawing +protesting +rinsing +coughing +smashing +slicing +balancing +rafting +kneeling +dunking +brushing +crushing +rubbing +punting +watering +playing+music +removing +tearing +imitating +teaching +cooking +reaching +studying +serving +bulldozing +shaking +discussing +dragging +gardening +performing +officiating +photographing +sowing +dripping +writing +clawing +bending +boxing +mopping +gripping +flowing +digging +tripping +cheering +buying +bicycling +feeding +emptying +unpacking +sketching +standing +weeding +stacking +drying +crying +spinning +frying +cutting +paying +eating +lecturing +dancing +adult+female+speaking +boiling +peeling +wrapping +wetting +attacking +welding +putting +swinging +carving +walking +dressing +inflating +climbing +shredding +reading +sanding +frowning +closing +hunting +clearing +launching +packaging +fishing +spilling +leaking +knitting +boating +sprinkling +baptizing +playing+sports +rolling +spitting +dipping +riding +chopping +extinguishing +applauding +calling +talking +adult+male+speaking +snowing +shaving +marrying +rising +laughing +crawling +flying +assembling +injecting +landing +operating +packing +descending +falling +entering +pushing +sawing +smelling +overflowing +fighting +waking +barbecuing +skating +painting +drilling +punching +tying +manicuring +plunging +grilling +pitching +towing +telephoning +crafting +knocking +playing+videogames +storming +placing +turning +barking +child+singing +opening +waxing +juggling +mowing +shooting +sniffing +interviewing +stomping +chewing +arresting +grooming +rowing +bowing +gambling +saluting +fueling +autographing +throwing +drenching +waving +signing +repairing +baking +smoking +skiing +drumming +child+speaking +blowing +cleaning 
+combing +spreading +racing +combusting +adult+female+singing +fencing +swimming +adult+male+singing +snuggling +shopping +bouncing +dusting +stroking +snapping +biting +roaring +guarding +unloading +lifting +instructing +folding +measuring +whistling +exiting +stretching +taping +squinting +catching +draining +massaging +scrubbing +handcuffing +celebrating +jogging +colliding +bowling +resting +blocking +smiling +tattooing +erupting +howling +parading +grinning +sprinting +hanging +planting +speaking +ascending +yawning +cramming +burning +wrestling +poking +tickling +exercising +loading +piloting +typing diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mit/preprocess_data.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/preprocess_data.sh new file mode 100644 index 0000000000000000000000000000000000000000..f1194273def9dcbfb8cc1bb82004e22ea35ba7a7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mit/preprocess_data.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash + +DATA_DIR="../../../data/mit/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p "${DATA_DIR}" +fi + +cd "${DATA_DIR}" + +# Download the Moments_in_Time_Raw.zip here manually +unzip Moments_in_Time_Raw.zip +rm Moments_in_Time_Raw.zip + +if [ ! -d "./videos" ]; then + mkdir ./videos +fi +mv ./training ./videos && mv ./validation ./videos + +if [ !
-d "./annotations" ]; then + mkdir ./annotations +fi + +mv *.txt annotations && mv *.csv annotations + +cd "../../tools/data/mit" diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5deedf71d05a0382a5daf80260645c28b03b7030 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/README.md @@ -0,0 +1,113 @@ +# Preparing Multi-Moments in Time + +## Introduction + + + +```BibTeX +@misc{monfort2019multimoments, + title={Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding}, + author={Mathew Monfort and Kandan Ramakrishnan and Alex Andonian and Barry A McNamara and Alex Lascelles and Bowen Pan and Quanfu Fan and Dan Gutfreund and Rogerio Feris and Aude Oliva}, + year={2019}, + eprint={1911.00232}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + +For basic dataset information, you can refer to the dataset [website](http://moments.csail.mit.edu). +Before we start, please make sure that the current working directory is `$MMACTION2/tools/data/mmit/`. + +## Step 1. Prepare Annotations and Videos + +First, visit the official [website](http://moments.csail.mit.edu/) and fill in the application form to download the dataset; you will then receive the download link. You can use `bash preprocess_data.sh` to prepare the annotations and videos. Note that the script does not include the download command, so download the dataset to the proper place beforehand, following the comment in the script. + +For better decoding speed, you can resize the original videos into a smaller, densely encoded version with: + +```shell +python ../resize_videos.py ../../../data/mmit/videos/ ../../../data/mmit/videos_256p_dense_cache --dense --level 2 +``` + +## Step 2. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader.
+ +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +First, you can run the following script to soft link the extracted frames to an SSD. + +```shell +# Execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/mmit_extracted/ +ln -s /mnt/SSD/mmit_extracted/ ../../../data/mmit/rawframes +``` + +If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. + +```shell +bash extract_rgb_frames.sh +``` + +If you didn't install denseflow, you can still extract RGB frames using OpenCV with the following script, but it will keep the original size of the images. + +```shell +bash extract_rgb_frames_opencv.sh +``` + +If both are required, run the following script to extract frames using the "tvl1" algorithm. + +```shell +bash extract_frames.sh +``` + +## Step 3. Generate File List + +You can run the following scripts to generate file lists in the format of rawframes or videos. + +```shell +bash generate_rawframes_filelist.sh +bash generate_videos_filelist.sh +``` + +## Step 4. Check Directory Structure + +After the whole data process for Multi-Moments in Time preparation, +you will get the rawframes (RGB + Flow), videos and annotation files for Multi-Moments in Time. + +In the context of the whole project (for Multi-Moments in Time only), the folder structure will look like: + +``` +mmaction2/ +└── data + └── mmit + ├── annotations + │   ├── moments_categories.txt + │   ├── trainingSet.txt + │   └── validationSet.txt + ├── mmit_train_rawframes.txt + ├── mmit_train_videos.txt + ├── mmit_val_rawframes.txt + ├── mmit_val_videos.txt + ├── rawframes + │   ├── 0-3-6-2-9-1-2-6-14603629126_5 + │   │   ├── flow_x_00001.jpg + │   │   ├── flow_x_00002.jpg + │   │   ├── ... + │   │   ├── flow_y_00001.jpg + │   │   ├── flow_y_00002.jpg + │   │   ├── ...
+ │   │   ├── img_00001.jpg + │   │   └── img_00002.jpg + │   │   ├── ... + │   └── yt-zxQfALnTdfc_56 + │   │   ├── ... + │   └── ... + + └── videos + └── adult+female+singing + ├── 0-3-6-2-9-1-2-6-14603629126_5.mp4 + └── yt-zxQfALnTdfc_56.mp4 + └── ... +``` + +For training and evaluating on Multi-Moments in Time, please refer to [getting_started.md](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..e070505e343384a9d39de86bf1078477efe07c31 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/README_zh-CN.md @@ -0,0 +1,115 @@ +# 准备 Multi-Moments in Time + +## 简介 + + + +```BibTeX +@misc{monfort2019multimoments, + title={Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding}, + author={Mathew Monfort and Kandan Ramakrishnan and Alex Andonian and Barry A McNamara and Alex Lascelles, Bowen Pan, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva}, + year={2019}, + eprint={1911.00232}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + +用户可以参照数据集 [官网](http://moments.csail.mit.edu/),获取数据集相关的基本信息。 +在准备数据集前,请确保命令行当前路径为 `$MMACTION2/tools/data/mmit/`。 + +## 步骤 1. Prepare Annotations and Videos + +首先,用户需要访问[官网](http://moments.csail.mit.edu/),填写申请表来下载数据集。 +在得到下载链接后,用户可以使用 `bash preprocess_data.sh` 来准备标注文件和视频。 +请注意此脚本并没有下载标注和视频文件,用户需要根据脚本文件中的注释,提前下载好数据集,并放/软链接到合适的位置。 + +为加快视频解码速度,用户需要缩小原视频的尺寸,可使用以下命令获取密集编码版视频: + +``` +python ../resize_videos.py ../../../data/mmit/videos/ ../../../data/mmit/videos_256p_dense_cache --dense --level 2 +``` + +## Step 2. 
Extract RGB and Flow + +This part is **optional** if you only want to train with the video loader. + +Before extracting frames and optical flow, please refer to the [installation guide](/docs_zh_CN/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, it is recommended to store the extracted frames on the SSD for better I/O performance. +You can run the following commands to create a soft link to the SSD. + +```shell +# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/mmit_extracted/ +ln -s /mnt/SSD/mmit_extracted/ ../../../data/mmit/rawframes +``` + +If you only need RGB frames (since extracting optical flow is time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. + +```shell +bash extract_rgb_frames.sh +``` + +If you did not install denseflow, you can still run the following script to extract RGB frames with OpenCV. However, this method can only extract frames at the original video resolution. + +```shell +bash extract_rgb_frames_opencv.sh +``` + +If you want to extract both RGB frames and optical flow, run the following script. + +```shell +bash extract_frames.sh +``` + +## Step 3. Generate File List + +You can run the following scripts to generate file lists in the rawframes and videos formats. + +```shell +bash generate_rawframes_filelist.sh +bash generate_videos_filelist.sh +``` + +## Step 4. Check Directory Structure + +After the whole data preparation process for Multi-Moments in Time, you will get the rawframes (RGB + Flow), videos and annotation files for Multi-Moments in Time. + +In the context of the whole MMAction2 project, the folder structure for Multi-Moments in Time looks like: + +``` +mmaction2/ +└── data + └── mmit + ├── annotations + │   ├── moments_categories.txt + │   ├── trainingSet.txt + │   └── validationSet.txt + ├── mmit_train_rawframes.txt + ├── mmit_train_videos.txt + ├── mmit_val_rawframes.txt + ├── mmit_val_videos.txt + ├── rawframes + │   ├── 0-3-6-2-9-1-2-6-14603629126_5 + │   │   ├── flow_x_00001.jpg + │   │   ├── flow_x_00002.jpg + │   │   ├── ... + │   │   ├── flow_y_00001.jpg + │   │   ├── flow_y_00002.jpg + │   │   ├── ... + │   │   ├── img_00001.jpg + │   │   └── img_00002.jpg + │   │   ├── ... + │   └── yt-zxQfALnTdfc_56 + │   │   ├── ... + │   └── ... + + └── videos + └── adult+female+singing + ├── 0-3-6-2-9-1-2-6-14603629126_5.mp4 + └── yt-zxQfALnTdfc_56.mp4 + └── ... 
+``` + +For training and evaluating on Multi-Moments in Time, please refer to [getting started](/docs_zh_CN/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..548f8625db30dc96105abb96f8ab846e8580c9f3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_frames.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/mmit/videos/ ../../data/mmit/rawframes/ --task both --level 2 --flow-type tvl1 --ext mp4 +echo "Raw frames (RGB and Flow) Generated" +cd mmit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_rgb_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..869b7093c3c456d7ef20d8870dcd580cca1ad6d3 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_rgb_frames.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/mmit/videos/ ../../data/mmit/rawframes/ --task rgb --level 2 --ext mp4 + +echo "Raw frames (RGB only) generated" + +cd mmit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_rgb_frames_opencv.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_rgb_frames_opencv.sh new file mode 100644 index 0000000000000000000000000000000000000000..5d09f05a6f352684b71338d0c38d0546c8a5eaa5 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/extract_rgb_frames_opencv.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/mmit/videos/ ../../data/mmit/rawframes/ --task rgb --level 2 --ext mp4 --use-opencv + +echo "Raw frames (RGB only) generated" + +cd mmit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/generate_rawframes_filelist.sh new 
file mode 100644 index 0000000000000000000000000000000000000000..2e745c3d03dd105a52dd034808b8f470c7612328 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/generate_rawframes_filelist.sh @@ -0,0 +1,9 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py mmit data/mmit/rawframes/ --level 2 --format rawframes --num-split 1 --subset train --shuffle +echo "Train filelist for rawframes generated." + +PYTHONPATH=. python tools/data/build_file_list.py mmit data/mmit/rawframes/ --level 2 --format rawframes --num-split 1 --subset val --shuffle +echo "Val filelist for rawframes generated." +cd tools/data/mmit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..1fa1f3f06564d85f0b406c6725cc3bee3e3904e4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/generate_videos_filelist.sh @@ -0,0 +1,9 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py mmit data/mmit/videos/ --level 2 --format videos --num-split 1 --subset train --shuffle +echo "Train filelist for videos generated." + +PYTHONPATH=. python tools/data/build_file_list.py mmit data/mmit/videos/ --level 2 --format videos --num-split 1 --subset val --shuffle +echo "Val filelist for videos generated." 
+cd tools/data/mmit/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..ae89927a8b7c0cf953c1b6bb265fcca49edcf6bf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/label_map.txt @@ -0,0 +1,313 @@ +crafting +paddling +raining +weightlifting +clawing +hitchhiking +autographing +cooking +gripping +swerving +frowning +giving +tattooing +dipping +leaking +plunging +barking +stroking/petting +piloting +camping +towing +loading +parading +submerging +squeezing +sculpting +stomping +punting +kissing +smoking +pouring +texting +adult+male+speaking +adult+female+speaking +crying +unpacking +pointing +boating +landing +ironing +crouching +slapping +typing +ice+skating +boiling +chopping +bowling +fighting/attacking +tapping +applauding +driving +sprinting +slicing +approaching +waving +dusting +wrapping +knocking +snapping +gardening +combing +tickling +carving +smashing +smiling/grinning +dressing +pressing +lecturing +telephoning +exercising +riding +draining +flying +wrestling +boxing +rinsing +overflowing +inflating +picking +sowing +shaving +baking +shaking +running +throwing +stacking/piling +buttoning +leaping +fueling +pitching +child+speaking +breaking/destroying +lifting +filming/photographing +singing +reading +chewing +operating +bubbling +waxing +cleaning/washing +scooping +erasing +steering +playing+videogames +crashing +constructing/assembling +flooding +drinking +praying +shouting +winking +dining +repairing +tying +juggling +rolling +studying +marching +socializing +ascending/rising +arresting +cracking +laying +clinging +frying +vacuuming +combusting/burning +filling +standing +howling +dunking +spraying +bandaging +shivering +slipping +racing +roaring +planting +yawning +grilling +squinting +skiing +taping +trimming +preaching +resting +descending/lowering +clearing +screwing +chasing 
+speaking +manicuring +tripping +performing +teaching/instructing +blowing +painting +sneezing +packaging +punching +clapping +rotating/spinning +skating +cheerleading +balancing +child+singing +covering +snuggling/cuddling/hugging +bulldozing +jumping +sliding +barbecuing +weeding +swimming +shooting +dialing +measuring +pulling +celebrating +playing+fun +knitting +spreading +erupting +snowboarding +swinging +protesting +sitting +inserting +bouncing +surfing +extinguishing +unloading +aiming +bathing +hammering +fishing +opening +biting +packing +saluting +rafting +laughing +bicycling +rocking +storming +wetting +shrugging +handwriting +gambling +writing +skipping +dragging +unplugging +kicking +sawing +grooming +whistling +floating +diving +rubbing +bending +shoveling/digging +peeling +catching +closing +eating/feeding +falling +discussing +sweeping +massaging +locking +dancing +mowing +clipping +hanging +burying +reaching +kayaking +snowing +sleeping +climbing +flipping +tearing/ripping +folding +signing +cutting +stretching +stirring +licking +kneeling +sewing +dripping +queuing +pushing +pedaling +flossing +buying/selling/shopping +smelling/sniffing +emptying +sanding +smacking +carrying +adult+male+singing +poking +brushing +adult+female+singing +scratching +welding +crawling +skateboarding +turning +dropping +hunting +cheering +drawing +sprinkling +spitting +competing +bowing +hiking +drying +launching +twisting +crushing +hitting/colliding +shredding +plugging +gasping +rowing +calling +drumming +walking +removing +waking +stitching +coughing +playing+music +playing+sports +interviewing +scrubbing +splashing +officiating +mopping +flowing +sailing +drilling +squatting +handcuffing +spilling +marrying +injecting +jogging diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/preprocess_data.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/preprocess_data.sh new file mode 100644 index 
0000000000000000000000000000000000000000..5fbf25a4bdd891c65b66ad051a8508603761406c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/mmit/preprocess_data.sh @@ -0,0 +1,20 @@ +DATA_DIR="../../../data/mmit/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p "${DATA_DIR}" +fi + +cd "${DATA_DIR}" + +# Download the Multi_Moments_in_Time_Raw.zip here manually +unzip Multi_Moments_in_Time_Raw.zip +rm Multi_Moments_in_Time_Raw.zip + +if [ ! -d "./annotations" ]; then + mkdir ./annotations +fi + +mv *.txt annotations && mv *.csv annotations + +cd - diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ef3ea7e442df8d9eecd5d34fbc4c6829359185f4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/README.md @@ -0,0 +1,150 @@ +# Preparing OmniSource + +## Introduction + + + +```BibTeX +@article{duan2020omni, + title={Omni-sourced Webly-supervised Learning for Video Recognition}, + author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua}, + journal={arXiv preprint arXiv:2003.13042}, + year={2020} +} +``` + +We release a subset of the OmniSource web dataset used in the paper [Omni-sourced Webly-supervised Learning for Video Recognition](https://arxiv.org/abs/2003.13042). Since all web datasets in OmniSource are built on the Kinetics-400 taxonomy, we select the web data related to the 200 classes in the Mini-Kinetics subset (proposed in [Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification](https://arxiv.org/pdf/1712.04851.pdf)). + +We provide data from all sources that are related to the 200 classes in Mini-Kinetics (including Kinetics trimmed clips, Kinetics untrimmed videos, images from Google and Instagram, video clips from Instagram). 
To obtain this dataset, please first fill in the [request form](https://docs.google.com/forms/d/e/1FAIpQLSd8_GlmHzG8FcDbW-OEu__G7qLgOSYZpH-i5vYVJcu7wcb_TQ/viewform?usp=sf_link). We will share the download link with you after your request is received. Since we release all data crawled from the web without any filtering, the dataset is large and may take some time to download. The sizes of the datasets are listed in the following table: + +| Dataset Name | #samples | Size | Teacher Model | #samples after filtering | #samples similar to k200_val | +| :-------------: | :------: | :-----: | :--------------: | :----------------------: | :--------------------------: | +| k200_train | 76030 | 45.6G | N/A | N/A | N/A | +| k200_val | 4838 | 2.9G | N/A | N/A | N/A | +| googleimage_200 | 3050880 | 265.5G | TSN-R50-8seg | 1188695 | 967 | +| insimage_200 | 3654650 | 224.4G | TSN-R50-8seg | 879726 | 116 | +| insvideo_200 | 732855 | 1487.6G | SlowOnly-8x8-R50 | 330680 | 956 | +| k200_raw_train | 76027 | 963.5G | SlowOnly-8x8-R50 | N/A | N/A | + +The file structure of our uploaded OmniSource dataset looks like: + +``` +OmniSource/ +├── annotations +│ ├── googleimage_200 +│ │ ├── googleimage_200.txt File list of all valid images crawled from Google. +│ │ ├── tsn_8seg_googleimage_200_duplicate.txt Positive file list of images crawled from Google that are similar to a validation example. +│ │ ├── tsn_8seg_googleimage_200.txt Positive file list of images crawled from Google, filtered by the teacher model. +│ │ └── tsn_8seg_googleimage_200_wodup.txt Positive file list of images crawled from Google, filtered by the teacher model, after de-duplication. 
+│ ├── insimage_200 +│ │ ├── insimage_200.txt +│ │ ├── tsn_8seg_insimage_200_duplicate.txt +│ │ ├── tsn_8seg_insimage_200.txt +│ │ └── tsn_8seg_insimage_200_wodup.txt +│ ├── insvideo_200 +│ │ ├── insvideo_200.txt +│ │ ├── slowonly_8x8_insvideo_200_duplicate.txt +│ │ ├── slowonly_8x8_insvideo_200.txt +│ │ └── slowonly_8x8_insvideo_200_wodup.txt +│ ├── k200_actions.txt The list of action names of the 200 classes in MiniKinetics. +│ ├── K400_to_MiniKinetics_classidx_mapping.json The index mapping from Kinetics-400 to MiniKinetics. +│ ├── kinetics_200 +│ │ ├── k200_train.txt +│ │ └── k200_val.txt +│ ├── kinetics_raw_200 +│ │ └── slowonly_8x8_kinetics_raw_200.json Kinetics Raw Clips filtered by the teacher model. +│ └── webimage_200 +│ └── tsn_8seg_webimage_200_wodup.txt The union of `tsn_8seg_googleimage_200_wodup.txt` and `tsn_8seg_insimage_200_wodup.txt` +├── googleimage_200 (10 volumes) +│ ├── vol_0.tar +│ ├── ... +│ └── vol_9.tar +├── insimage_200 (10 volumes) +│ ├── vol_0.tar +│ ├── ... +│ └── vol_9.tar +├── insvideo_200 (20 volumes) +│ ├── vol_00.tar +│ ├── ... +│ └── vol_19.tar +├── kinetics_200_train +│ └── kinetics_200_train.tar +├── kinetics_200_val +│ └── kinetics_200_val.tar +└── kinetics_raw_200_train (16 volumes) + ├── vol_0.tar + ├── ... + └── vol_15.tar +``` + +## Data Preparation + +For data preparation, you first need to download the data. For `kinetics_200` and the 3 web datasets (`googleimage_200`, `insimage_200` and `insvideo_200`), you just need to extract each volume and merge their contents. + +For Kinetics raw videos, since loading long videos is very expensive, you need to first trim them into clips. Here we provide a script named `trim_raw_video.py`. It trims a long video into 10-second clips and removes the original raw video. You can use it to trim the Kinetics raw videos. + +The data should be placed in `data/OmniSource/`. 
When data preparation is finished, the folder structure of `data/OmniSource` looks like this (files not needed for training and testing are omitted for simplicity): + +``` +data/OmniSource/ +├── annotations +│ ├── googleimage_200 +│ │ └── tsn_8seg_googleimage_200_wodup.txt Positive file list of images crawled from Google, filtered by the teacher model, after de-duplication. +│ ├── insimage_200 +│ │ └── tsn_8seg_insimage_200_wodup.txt +│ ├── insvideo_200 +│ │ └── slowonly_8x8_insvideo_200_wodup.txt +│ ├── kinetics_200 +│ │ ├── k200_train.txt +│ │ └── k200_val.txt +│ ├── kinetics_raw_200 +│ │ └── slowonly_8x8_kinetics_raw_200.json Kinetics Raw Clips filtered by the teacher model. +│ └── webimage_200 +│ └── tsn_8seg_webimage_200_wodup.txt The union of `tsn_8seg_googleimage_200_wodup.txt` and `tsn_8seg_insimage_200_wodup.txt` +├── googleimage_200 +│ ├── 000 +| │ ├── 00 +| │ │ ├── 000001.jpg +| │ │ ├── ... +| │ │ └── 000901.jpg +| │ ├── ... +| │ ├── 19 +│ ├── ... +│ └── 199 +├── insimage_200 +│ ├── 000 +| │ ├── abseil +| │ │ ├── 1J9tKWCNgV_0.jpg +| │ │ ├── ... +| │ │ └── 1J9tKWCNgV_0.jpg +| │ ├── abseiling +│ ├── ... +│ └── 199 +├── insvideo_200 +│ ├── 000 +| │ ├── abseil +| │ │ ├── B00arxogubl.mp4 +| │ │ ├── ... +| │ │ └── BzYsP0HIvbt.mp4 +| │ ├── abseiling +│ ├── ... +│ └── 199 +├── kinetics_200_train +│ ├── 0074cdXclLU.mp4 +| ├── ... +| ├── zzzlyL61Fyo.mp4 +├── kinetics_200_val +│ ├── 01fAWEHzudA.mp4 +| ├── ... +| ├── zymA_6jZIz4.mp4 +└── kinetics_raw_200_train +│ ├── pref_ +│ | ├── ___dTOdxzXY +| │ │ ├── part_0.mp4 +| │ │ ├── ... +| │ │ ├── part_6.mp4 +│ | ├── ... +│ | └── _zygwGDE2EM +│ ├── ... 
+│ └── prefZ +``` diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..90aea5f4d171baef1dc8a9515690b98d4863cbea --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/README_zh-CN.md @@ -0,0 +1,149 @@ +# Preparing OmniSource + +## Introduction + + + +```BibTeX +@article{duan2020omni, + title={Omni-sourced Webly-supervised Learning for Video Recognition}, + author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua}, + journal={arXiv preprint arXiv:2003.13042}, + year={2020} +} +``` + +MMAction2 releases a subset of the OmniSource web dataset (from the paper [Omni-sourced Webly-supervised Learning for Video Recognition](https://arxiv.org/abs/2003.13042)). +All classes in the OmniSource dataset come from Kinetics-400. The subset provided by MMAction2 contains web data for the 200 action classes of the Mini-Kinetics dataset (Mini-Kinetics was proposed in [Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification](https://arxiv.org/pdf/1712.04851.pdf)). + +MMAction2 provides data from all sources related to the 200 classes in Mini-Kinetics: the Kinetics dataset, the raw Kinetics dataset (untrimmed long videos), web images from Google and Instagram, and web videos from Instagram. To obtain this dataset, please first fill in the [request form](https://docs.google.com/forms/d/e/1FAIpQLSd8_GlmHzG8FcDbW-OEu__G7qLgOSYZpH-i5vYVJcu7wcb_TQ/viewform?usp=sf_link). The download link will be sent to your email after the request is received. Since the released data is raw crawled data, the dataset is large and downloading takes some time. The table below gives statistics for each part of the OmniSource dataset. + +| Dataset Name | #samples | Size | Teacher Model | #samples after filtering | #samples similar to k200_val (suspected duplicates) | +| :-------------: | :------: | :------: | :---------------------: | :--------------: | :------------------------------------------: | +| k200_train | 76030 | 45.6G | N/A | N/A | N/A | +| k200_val | 4838 | 2.9G | N/A | N/A | N/A | +| googleimage_200 | 3050880 | 265.5G | TSN-R50-8seg | 1188695 | 967 | +| insimage_200 | 3654650 | 224.4G | TSN-R50-8seg | 879726 | 116 | +| insvideo_200 | 732855 | 1487.6G | SlowOnly-8x8-R50 | 330680 | 956 | +| k200_raw_train | 
76027 | 963.5G | SlowOnly-8x8-R50 | N/A | N/A | + +The file structure of the OmniSource dataset released by MMAction2 looks like: + +``` +OmniSource/ +├── annotations +│ ├── googleimage_200 +│ │ ├── googleimage_200.txt File list of all images crawled from Google +│ │ ├── tsn_8seg_googleimage_200_duplicate.txt Positive file list of images crawled from Google that are suspected duplicates of k200-val samples +│ │ ├── tsn_8seg_googleimage_200.txt Positive file list of images crawled from Google, filtered by the teacher model +│ │ └── tsn_8seg_googleimage_200_wodup.txt Positive file list of images crawled from Google, filtered by the teacher model, after de-duplication +│ ├── insimage_200 +│ │ ├── insimage_200.txt +│ │ ├── tsn_8seg_insimage_200_duplicate.txt +│ │ ├── tsn_8seg_insimage_200.txt +│ │ └── tsn_8seg_insimage_200_wodup.txt +│ ├── insvideo_200 +│ │ ├── insvideo_200.txt +│ │ ├── slowonly_8x8_insvideo_200_duplicate.txt +│ │ ├── slowonly_8x8_insvideo_200.txt +│ │ └── slowonly_8x8_insvideo_200_wodup.txt +│ ├── k200_actions.txt The action names of the 200 classes in MiniKinetics +│ ├── K400_to_MiniKinetics_classidx_mapping.json The class index mapping from Kinetics to MiniKinetics +│ ├── kinetics_200 +│ │ ├── k200_train.txt +│ │ └── k200_val.txt +│ └── kinetics_raw_200 +│ └── slowonly_8x8_kinetics_raw_200.json Kinetics raw clips filtered by the teacher model +├── googleimage_200 (10 volumes) +│ ├── vol_0.tar +│ ├── ... +│ └── vol_9.tar +├── insimage_200 (10 volumes) +│ ├── vol_0.tar +│ ├── ... +│ └── vol_9.tar +├── insvideo_200 (20 volumes) +│ ├── vol_00.tar +│ ├── ... +│ └── vol_19.tar +├── kinetics_200_train +│ └── kinetics_200_train.tar +├── kinetics_200_val +│ └── kinetics_200_val.tar +└── kinetics_raw_200_train (16 volumes) + ├── vol_0.tar + ├── ... 
+ └── vol_15.tar +``` + +## Data Preparation + +You need to download the data first. For `kinetics_200` and the three web datasets `googleimage_200`, `insimage_200` and `insvideo_200`, you only need to extract each volume and merge the contents. + +For the Kinetics raw videos, loading long videos directly is very slow, so you need to trim them into short clips first. MMAction2 provides a script named `trim_raw_video.py` that trims a long video into 10-second clips (and removes the long video afterwards). You can use this script to trim the long videos. + +All data should be placed in `data/OmniSource/`. When data preparation is finished, the folder structure of `data/OmniSource/` should look like this (files not used in training and testing are omitted for brevity): + +``` +data/OmniSource/ +├── annotations +│ ├── googleimage_200 +│ │ └── tsn_8seg_googleimage_200_wodup.txt Positive file list of images crawled from Google, filtered by the teacher model, after de-duplication. +│ ├── insimage_200 +│ │ └── tsn_8seg_insimage_200_wodup.txt +│ ├── insvideo_200 +│ │ └── slowonly_8x8_insvideo_200_wodup.txt +│ ├── kinetics_200 +│ │ ├── k200_train.txt +│ │ └── k200_val.txt +│ ├── kinetics_raw_200 +│ │ └── slowonly_8x8_kinetics_raw_200.json Kinetics Raw Clips filtered by the teacher model. +│ └── webimage_200 +│ └── tsn_8seg_webimage_200_wodup.txt The union of `tsn_8seg_googleimage_200_wodup.txt` and `tsn_8seg_insimage_200_wodup.txt` +├── googleimage_200 +│ ├── 000 +| │ ├── 00 +| │ │ ├── 000001.jpg +| │ │ ├── ... +| │ │ └── 000901.jpg +| │ ├── ... +| │ ├── 19 +│ ├── ... +│ └── 199 +├── insimage_200 +│ ├── 000 +| │ ├── abseil +| │ │ ├── 1J9tKWCNgV_0.jpg +| │ │ ├── ... +| │ │ └── 1J9tKWCNgV_0.jpg +| │ ├── abseiling +│ ├── ... +│ └── 199 +├── insvideo_200 +│ ├── 000 +| │ ├── abseil +| │ │ ├── B00arxogubl.mp4 +| │ │ ├── ... +| │ │ └── BzYsP0HIvbt.mp4 +| │ ├── abseiling +│ ├── ... +│ └── 199 +├── kinetics_200_train +│ ├── 0074cdXclLU.mp4 +| ├── ... +| ├── zzzlyL61Fyo.mp4 +├── kinetics_200_val +│ ├── 01fAWEHzudA.mp4 +| ├── ... +| ├── zymA_6jZIz4.mp4 +└── kinetics_raw_200_train +│ ├── pref_ +│ | ├── ___dTOdxzXY +| │ │ ├── part_0.mp4 +| │ │ ├── ... +| │ │ ├── part_6.mp4 +│ | ├── ... +│ | └── _zygwGDE2EM +│ ├── ... 
+│ └── prefZ +``` diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/trim_raw_video.py b/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/trim_raw_video.py new file mode 100644 index 0000000000000000000000000000000000000000..81aef771402a3785ae58b77a44a86b54f37c0397 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/omnisource/trim_raw_video.py @@ -0,0 +1,45 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import os.path as osp +import sys +from subprocess import check_output + +import mmcv + + +def get_duration(vid_name): + command = f'ffprobe -i {vid_name} 2>&1 | grep "Duration"' + output = str(check_output(command, shell=True)) + output = output.split(',')[0].split('Duration:')[1].strip() + h, m, s = output.split(':') + duration = int(h) * 3600 + int(m) * 60 + float(s) + return duration + + +def trim(vid_name): + try: + lt = get_duration(vid_name) + except Exception: + print(f'get_duration failed for video {vid_name}', flush=True) + return + + i = 0 + name, _ = osp.splitext(vid_name) + + # We output 10-second clips into the folder `name` + dest = name + mmcv.mkdir_or_exist(dest) + + command_tmpl = ('ffmpeg -y -loglevel error -i {} -ss {} -t {} -crf 18 ' + '-c:v libx264 {}/part_{}.mp4') + while i * 10 < lt: + os.system(command_tmpl.format(vid_name, i * 10, 10, dest, i)) + i += 1 + + # remove a raw video after decomposing it into 10-second clips to save space + os.remove(vid_name) + + +if __name__ == '__main__': + vid_name = sys.argv[1] + trim(vid_name) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/parse_file_list.py b/openmmlab_test/mmaction2-0.24.1/tools/data/parse_file_list.py new file mode 100644 index 0000000000000000000000000000000000000000..a87073efa63e9de6a0e2e6d3aa86fc804dbca91e --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/parse_file_list.py @@ -0,0 +1,535 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import csv +import fnmatch +import glob +import json +import os +import os.path as osp + + +def parse_directory(path, + rgb_prefix='img_', + flow_x_prefix='flow_x_', + flow_y_prefix='flow_y_', + level=1): + """Parse directories holding extracted frames from standard benchmarks. + + Args: + path (str): Directory path to parse frames. + rgb_prefix (str): Prefix of generated rgb frames name. + default: 'img_'. + flow_x_prefix (str): Prefix of generated flow x name. + default: `flow_x_`. + flow_y_prefix (str): Prefix of generated flow y name. + default: `flow_y_`. + level (int): Directory level for glob searching. Options are 1 and 2. + default: 1. + + Returns: + dict: frame info dict with video id as key and tuple(path(str), + rgb_num(int), flow_x_num(int)) as value. + """ + print(f'parse frames under directory {path}') + if level == 1: + # Only search for one-level directory + def locate_directory(x): + return osp.basename(x) + + frame_dirs = glob.glob(osp.join(path, '*')) + + elif level == 2: + # search for two-level directory + def locate_directory(x): + return osp.join(osp.basename(osp.dirname(x)), osp.basename(x)) + + frame_dirs = glob.glob(osp.join(path, '*', '*')) + + else: + raise ValueError('level can be only 1 or 2') + + def count_files(directory, prefix_list): + """Count file number with a given directory and prefix. + + Args: + directory (str): Data directory to be searched. + prefix_list (list): List of prefixes. + + Returns: + list (int): Number list of the file with the prefix. 
+ """ + lst = os.listdir(directory) + cnt_list = [len(fnmatch.filter(lst, x + '*')) for x in prefix_list] + return cnt_list + + # check RGB + frame_dict = {} + for i, frame_dir in enumerate(frame_dirs): + total_num = count_files(frame_dir, + (rgb_prefix, flow_x_prefix, flow_y_prefix)) + dir_name = locate_directory(frame_dir) + + num_x = total_num[1] + num_y = total_num[2] + if num_x != num_y: + raise ValueError(f'x and y direction have different number ' + f'of flow images in video directory: {frame_dir}') + if i % 200 == 0: + print(f'{i} videos parsed') + frame_dict[dir_name] = (frame_dir, total_num[0], num_x) + + print('frame directory analysis done') + return frame_dict + + +def parse_ucf101_splits(level): + """Parse UCF-101 dataset into "train", "val", "test" splits. + + Args: + level (int): Directory level of data. 1 for the single-level directory, + 2 for the two-level directory. + + Returns: + list: "train", "val", "test" splits of UCF-101. + """ + class_index_file = 'data/ucf101/annotations/classInd.txt' + train_file_template = 'data/ucf101/annotations/trainlist{:02d}.txt' + test_file_template = 'data/ucf101/annotations/testlist{:02d}.txt' + + with open(class_index_file, 'r') as fin: + class_index = [x.strip().split() for x in fin] + class_mapping = {x[1]: int(x[0]) - 1 for x in class_index} + + def line_to_map(line): + """A function to map line string to video and label. + + Args: + line (str): A long directory path, which is a text path. + + Returns: + tuple[str, str]: (video, label), video is the video id, + label is the video label. 
+ """ + items = line.strip().split() + video = osp.splitext(items[0])[0] + if level == 1: + video = osp.basename(video) + label = items[0] + elif level == 2: + video = osp.join( + osp.basename(osp.dirname(video)), osp.basename(video)) + label = class_mapping[osp.dirname(items[0])] + return video, label + + splits = [] + for i in range(1, 4): + with open(train_file_template.format(i), 'r') as fin: + train_list = [line_to_map(x) for x in fin] + + with open(test_file_template.format(i), 'r') as fin: + test_list = [line_to_map(x) for x in fin] + splits.append((train_list, test_list)) + + return splits + + +def parse_jester_splits(level): + """Parse Jester into "train", "val" splits. + + Args: + level (int): Directory level of data. 1 for the single-level directory, + 2 for the two-level directory. + + Returns: + list: "train", "val", "test" splits of Jester dataset. + """ + # Read the annotations + class_index_file = 'data/jester/annotations/jester-v1-labels.csv' + train_file = 'data/jester/annotations/jester-v1-train.csv' + val_file = 'data/jester/annotations/jester-v1-validation.csv' + test_file = 'data/jester/annotations/jester-v1-test.csv' + + with open(class_index_file, 'r') as fin: + class_index = [x.strip() for x in fin] + class_mapping = {class_index[idx]: idx for idx in range(len(class_index))} + + def line_to_map(line, test_mode=False): + items = line.strip().split(';') + video = items[0] + if level == 1: + video = osp.basename(video) + elif level == 2: + video = osp.join( + osp.basename(osp.dirname(video)), osp.basename(video)) + if test_mode: + return video + + label = class_mapping[items[1]] + return video, label + + with open(train_file, 'r') as fin: + train_list = [line_to_map(x) for x in fin] + + with open(val_file, 'r') as fin: + val_list = [line_to_map(x) for x in fin] + + with open(test_file, 'r') as fin: + test_list = [line_to_map(x, test_mode=True) for x in fin] + + splits = ((train_list, val_list, test_list), ) + return splits + + +def 
parse_sthv1_splits(level): + """Parse Something-Something dataset V1 into "train", "val" splits. + + Args: + level (int): Directory level of data. 1 for the single-level directory, + 2 for the two-level directory. + + Returns: + list: "train", "val", "test" splits of Something-Something V1 dataset. + """ + # Read the annotations + # yapf: disable + class_index_file = 'data/sthv1/annotations/something-something-v1-labels.csv' # noqa + # yapf: enable + train_file = 'data/sthv1/annotations/something-something-v1-train.csv' + val_file = 'data/sthv1/annotations/something-something-v1-validation.csv' + test_file = 'data/sthv1/annotations/something-something-v1-test.csv' + + with open(class_index_file, 'r') as fin: + class_index = [x.strip() for x in fin] + class_mapping = {class_index[idx]: idx for idx in range(len(class_index))} + + def line_to_map(line, test_mode=False): + items = line.strip().split(';') + video = items[0] + if level == 1: + video = osp.basename(video) + elif level == 2: + video = osp.join( + osp.basename(osp.dirname(video)), osp.basename(video)) + if test_mode: + return video + + label = class_mapping[items[1]] + return video, label + + with open(train_file, 'r') as fin: + train_list = [line_to_map(x) for x in fin] + + with open(val_file, 'r') as fin: + val_list = [line_to_map(x) for x in fin] + + with open(test_file, 'r') as fin: + test_list = [line_to_map(x, test_mode=True) for x in fin] + + splits = ((train_list, val_list, test_list), ) + return splits + + +def parse_sthv2_splits(level): + """Parse Something-Something dataset V2 into "train", "val" splits. + + Args: + level (int): Directory level of data. 1 for the single-level directory, + 2 for the two-level directory. + + Returns: + list: "train", "val", "test" splits of Something-Something V2 dataset. 
+ """ + # Read the annotations + # yapf: disable + class_index_file = 'data/sthv2/annotations/something-something-v2-labels.json' # noqa + # yapf: enable + train_file = 'data/sthv2/annotations/something-something-v2-train.json' + val_file = 'data/sthv2/annotations/something-something-v2-validation.json' + test_file = 'data/sthv2/annotations/something-something-v2-test.json' + + with open(class_index_file, 'r') as fin: + class_mapping = json.loads(fin.read()) + + def line_to_map(item, test_mode=False): + video = item['id'] + if level == 1: + video = osp.basename(video) + elif level == 2: + video = osp.join( + osp.basename(osp.dirname(video)), osp.basename(video)) + if test_mode: + return video + + template = item['template'].replace('[', '') + template = template.replace(']', '') + label = int(class_mapping[template]) + return video, label + + with open(train_file, 'r') as fin: + items = json.loads(fin.read()) + train_list = [line_to_map(item) for item in items] + + with open(val_file, 'r') as fin: + items = json.loads(fin.read()) + val_list = [line_to_map(item) for item in items] + + with open(test_file, 'r') as fin: + items = json.loads(fin.read()) + test_list = [line_to_map(item, test_mode=True) for item in items] + + splits = ((train_list, val_list, test_list), ) + return splits + + +def parse_mmit_splits(): + """Parse Multi-Moments in Time dataset into "train", "val" splits. + + Returns: + list: "train", "val", "test" splits of Multi-Moments in Time. 
+ """ + + # Read the annotations + def line_to_map(x): + video = osp.splitext(x[0])[0] + labels = [int(digit) for digit in x[1:]] + return video, labels + + csv_reader = csv.reader(open('data/mmit/annotations/trainingSet.csv')) + train_list = [line_to_map(x) for x in csv_reader] + + csv_reader = csv.reader(open('data/mmit/annotations/validationSet.csv')) + val_list = [line_to_map(x) for x in csv_reader] + + test_list = val_list # not test for mit + + splits = ((train_list, val_list, test_list), ) + return splits + + +def parse_kinetics_splits(level, dataset): + """Parse Kinetics dataset into "train", "val", "test" splits. + + Args: + level (int): Directory level of data. 1 for the single-level directory, + 2 for the two-level directory. + dataset (str): Denotes the version of Kinetics that needs to be parsed, + choices are "kinetics400", "kinetics600" and "kinetics700". + + Returns: + list: "train", "val", "test" splits of Kinetics. + """ + + def convert_label(s, keep_whitespaces=False): + """Convert label name to a formal string. + + Remove redundant '"' and convert whitespace to '_'. + + Args: + s (str): String to be converted. + keep_whitespaces(bool): Whether to keep whitespace. Default: False. + + Returns: + str: Converted string. + """ + if not keep_whitespaces: + return s.replace('"', '').replace(' ', '_') + + return s.replace('"', '') + + def line_to_map(x, test=False): + """A function to map line string to video and label. + + Args: + x (str): A single line from Kinetics csv file. + test (bool): Indicate whether the line comes from test + annotation file. + + Returns: + tuple[str, str]: (video, label), video is the video id, + label is the video label. 
+ """ + if test: + # video = f'{x[0]}_{int(x[1]):06d}_{int(x[2]):06d}' + video = f'{x[1]}_{int(float(x[2])):06d}_{int(float(x[3])):06d}' + label = -1 # label unknown + return video, label + + video = f'{x[1]}_{int(float(x[2])):06d}_{int(float(x[3])):06d}' + if level == 2: + video = f'{convert_label(x[0])}/{video}' + else: + assert level == 1 + label = class_mapping[convert_label(x[0])] + return video, label + + train_file = f'data/{dataset}/annotations/kinetics_train.csv' + val_file = f'data/{dataset}/annotations/kinetics_val.csv' + test_file = f'data/{dataset}/annotations/kinetics_test.csv' + + csv_reader = csv.reader(open(train_file)) + # skip the first line + next(csv_reader) + + labels_sorted = sorted({convert_label(row[0]) for row in csv_reader}) + class_mapping = {label: i for i, label in enumerate(labels_sorted)} + + csv_reader = csv.reader(open(train_file)) + next(csv_reader) + train_list = [line_to_map(x) for x in csv_reader] + + csv_reader = csv.reader(open(val_file)) + next(csv_reader) + val_list = [line_to_map(x) for x in csv_reader] + + csv_reader = csv.reader(open(test_file)) + next(csv_reader) + test_list = [line_to_map(x, test=True) for x in csv_reader] + + splits = ((train_list, val_list, test_list), ) + return splits + + +def parse_mit_splits(): + """Parse Moments in Time dataset into "train", "val" splits. + + Returns: + list: "train", "val", "test" splits of Moments in Time. 
+ """ + # Read the annotations + class_mapping = {} + with open('data/mit/annotations/moments_categories.txt') as f_cat: + for line in f_cat.readlines(): + cat, digit = line.rstrip().split(',') + class_mapping[cat] = int(digit) + + def line_to_map(x): + video = osp.splitext(x[0])[0] + label = class_mapping[osp.dirname(x[0])] + return video, label + + csv_reader = csv.reader(open('data/mit/annotations/trainingSet.csv')) + train_list = [line_to_map(x) for x in csv_reader] + + csv_reader = csv.reader(open('data/mit/annotations/validationSet.csv')) + val_list = [line_to_map(x) for x in csv_reader] + + test_list = val_list # no test for mit + + splits = ((train_list, val_list, test_list), ) + return splits + + +def parse_hmdb51_split(level): + train_file_template = 'data/hmdb51/annotations/trainlist{:02d}.txt' + test_file_template = 'data/hmdb51/annotations/testlist{:02d}.txt' + class_index_file = 'data/hmdb51/annotations/classInd.txt' + + def generate_class_index_file(): + """This function will generate a `ClassInd.txt` for HMDB51 in a format + like UCF101, where class id starts with 1.""" + video_path = 'data/hmdb51/videos' + annotation_dir = 'data/hmdb51/annotations' + + class_list = sorted(os.listdir(video_path)) + class_dict = dict() + if not osp.exists(class_index_file): + with open(class_index_file, 'w') as f: + content = [] + for class_id, class_name in enumerate(class_list): + # like `ClassInd.txt` in UCF-101, + # the class_id begins with 1 + class_dict[class_name] = class_id + 1 + cur_line = ' '.join([str(class_id + 1), class_name]) + content.append(cur_line) + content = '\n'.join(content) + f.write(content) + else: + print(f'{class_index_file} has been generated before.') + class_dict = { + class_name: class_id + 1 + for class_id, class_name in enumerate(class_list) + } + + for i in range(1, 4): + train_content = [] + test_content = [] + for class_name in class_dict: + filename = class_name + f'_test_split{i}.txt' + filename_path = osp.join(annotation_dir, 
filename) + with open(filename_path, 'r') as fin: + for line in fin: + video_info = line.strip().split() + video_name = video_info[0] + if video_info[1] == '1': + target_line = ' '.join([ + osp.join(class_name, video_name), + str(class_dict[class_name]) + ]) + train_content.append(target_line) + elif video_info[1] == '2': + target_line = ' '.join([ + osp.join(class_name, video_name), + str(class_dict[class_name]) + ]) + test_content.append(target_line) + train_content = '\n'.join(train_content) + test_content = '\n'.join(test_content) + with open(train_file_template.format(i), 'w') as fout: + fout.write(train_content) + with open(test_file_template.format(i), 'w') as fout: + fout.write(test_content) + + generate_class_index_file() + + with open(class_index_file, 'r') as fin: + class_index = [x.strip().split() for x in fin] + class_mapping = {x[1]: int(x[0]) - 1 for x in class_index} + + def line_to_map(line): + items = line.strip().split() + video = osp.splitext(items[0])[0] + if level == 1: + video = osp.basename(video) + elif level == 2: + video = osp.join( + osp.basename(osp.dirname(video)), osp.basename(video)) + label = class_mapping[osp.dirname(items[0])] + return video, label + + splits = [] + for i in range(1, 4): + with open(train_file_template.format(i), 'r') as fin: + train_list = [line_to_map(x) for x in fin] + + with open(test_file_template.format(i), 'r') as fin: + test_list = [line_to_map(x) for x in fin] + splits.append((train_list, test_list)) + + return splits + + +def parse_diving48_splits(): + + train_file = 'data/diving48/annotations/Diving48_V2_train.json' + test_file = 'data/diving48/annotations/Diving48_V2_test.json' + + train = json.load(open(train_file)) + test = json.load(open(test_file)) + + # class_index_file = 'data/diving48/annotations/Diving48_vocab.json' + # class_list = json.load(open(class_index_file)) + + train_list = [] + test_list = [] + + for item in train: + vid_name = item['vid_name'] + label = item['label'] + 
train_list.append((vid_name, label)) + + for item in test: + vid_name = item['vid_name'] + label = item['label'] + test_list.append((vid_name, label)) + + splits = ((train_list, test_list), ) + return splits diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/resize_videos.py b/openmmlab_test/mmaction2-0.24.1/tools/data/resize_videos.py new file mode 100644 index 0000000000000000000000000000000000000000..8f6695a6f0d9babcaf9763d8c828c821ed800e20 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/resize_videos.py @@ -0,0 +1,126 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import glob +import os +import os.path as osp +import sys +from multiprocessing import Pool + + +def resize_videos(vid_item, args): + """Generate resized video cache. + + Args: + vid_item (list): Video item containing video full path, + video relative path. + Returns: + bool: Whether generate video cache successfully. + """ + full_path, vid_path = vid_item + # Change the output video extension to .mp4 if '--to-mp4' flag is set + if args.to_mp4: + vid_path = vid_path.split('.') + assert len(vid_path) == 2, \ + f"Video path '{vid_path}' contain more than one dot" + vid_path = vid_path[0] + '.mp4' + out_full_path = osp.join(args.out_dir, vid_path) + dir_name = osp.dirname(vid_path) + out_dir = osp.join(args.out_dir, dir_name) + if not osp.exists(out_dir): + os.makedirs(out_dir) + result = os.popen( + f'ffprobe -hide_banner -loglevel error -select_streams v:0 -show_entries stream=width,height -of csv=p=0 {full_path}' # noqa:E501 + ) + w, h = [int(d) for d in result.readline().rstrip().split(',')] + if w > h: + cmd = (f'ffmpeg -hide_banner -loglevel error -i {full_path} ' + f'-vf {"mpdecimate," if args.remove_dup else ""}' + f'scale=-2:{args.scale} ' + f'{"-vsync vfr" if args.remove_dup else ""} ' + f'-c:v libx264 {"-g 16" if args.dense else ""} ' + f'-an {out_full_path} -y') + else: + cmd = (f'ffmpeg -hide_banner -loglevel error -i {full_path} ' + f'-vf 
{"mpdecimate," if args.remove_dup else ""}' + f'scale={args.scale}:-2 ' + f'{"-vsync vfr" if args.remove_dup else ""} ' + f'-c:v libx264 {"-g 16" if args.dense else ""} ' + f'-an {out_full_path} -y') + os.popen(cmd) + print(f'{vid_path} done') + sys.stdout.flush() + return True + + +def run_with_args(item): + vid_item, args = item + return resize_videos(vid_item, args) + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Generate the resized cache of original videos') + parser.add_argument('src_dir', type=str, help='source video directory') + parser.add_argument('out_dir', type=str, help='output video directory') + parser.add_argument( + '--dense', + action='store_true', + help='whether to generate a faster cache') + parser.add_argument( + '--level', + type=int, + choices=[1, 2], + default=2, + help='directory level of data') + parser.add_argument( + '--remove-dup', + action='store_true', + help='whether to remove duplicated frames') + parser.add_argument( + '--ext', + type=str, + default='mp4', + choices=['avi', 'mp4', 'webm', 'mkv'], + help='video file extensions') + parser.add_argument( + '--to-mp4', + action='store_true', + help='whether to output videos in mp4 format') + parser.add_argument( + '--scale', + type=int, + default=256, + help='resize image short side length keeping ratio') + parser.add_argument( + '--num-worker', type=int, default=8, help='number of workers') + args = parser.parse_args() + + return args + + +if __name__ == '__main__': + args = parse_args() + + if not osp.isdir(args.out_dir): + print(f'Creating folder: {args.out_dir}') + os.makedirs(args.out_dir) + + print('Reading videos from folder: ', args.src_dir) + print('Extension of videos: ', args.ext) + fullpath_list = glob.glob(args.src_dir + '/*' * args.level + '.' 
+ + args.ext) + done_fullpath_list = glob.glob(args.out_dir + '/*' * args.level + args.ext) + print('Total number of videos found: ', len(fullpath_list)) + print('Total number of videos transfer finished: ', + len(done_fullpath_list)) + if args.level == 2: + vid_list = list( + map( + lambda p: osp.join( + osp.basename(osp.dirname(p)), osp.basename(p)), + fullpath_list)) + elif args.level == 1: + vid_list = list(map(osp.basename, fullpath_list)) + pool = Pool(args.num_worker) + vid_items = zip(fullpath_list, vid_list) + pool.map(run_with_args, [(item, args) for item in vid_items]) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/NTU_RGBD120_samples_with_missing_skeletons.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/NTU_RGBD120_samples_with_missing_skeletons.txt new file mode 100644 index 0000000000000000000000000000000000000000..e37c94eb4227dc64477644be54336f6a28a651e4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/NTU_RGBD120_samples_with_missing_skeletons.txtdiff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/NTU_RGBD_samples_with_missing_skeletons.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/NTU_RGBD_samples_with_missing_skeletons.txt new file mode 100644 index 0000000000000000000000000000000000000000..5ad472e40453a985d8651444162c8c821011dd16 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/NTU_RGBD_samples_with_missing_skeletons.txtdiff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/README.md new file mode 100644 index 0000000000000000000000000000000000000000..25c7f628929ee6b37fb18af328b89c3a16aa7b85 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/README.md @@ -0,0 +1,131 @@ +# Preparing Skeleton Dataset + + + +```BibTeX +@misc{duan2021revisiting, + title={Revisiting Skeleton-based Action Recognition}, + author={Haodong Duan and Yue Zhao and Kai Chen and Dian 
Shao and Dahua Lin and Bo Dai},
+  year={2021},
+  eprint={2104.13586},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV}
+}
+```
+
+## Introduction
+
+We release the skeleton annotations used in [Revisiting Skeleton-based Action Recognition](https://arxiv.org/abs/2104.13586). By default, we use [Faster-RCNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-person.py) with a ResNet50 backbone for human detection and [HRNet-w32](https://github.com/open-mmlab/mmpose/blob/master/configs/top_down/hrnet/coco/hrnet_w32_coco_256x192.py) for single-person pose estimation. For FineGYM, we use ground-truth bounding boxes for the athlete instead of detection bounding boxes. Currently, we release the skeleton annotations for FineGYM and the NTURGB+D Xsub split. Other annotations will be released soon.
+
+## Prepare Annotations
+
+Currently, we support HMDB51, UCF101, FineGYM and NTURGB+D. For FineGYM, you can run the following script to prepare the annotations.
+
+```shell
+bash download_annotations.sh ${DATASET}
+```
+
+Due to the [Conditions of Use](http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp) of the NTURGB+D dataset, we cannot directly release the annotations used in our experiments. Instead, we provide a script that generates the pose annotations for a video in the NTURGB+D datasets: it builds a dictionary and saves it as a single pickle file. You can then collect the annotation dictionaries of all videos in a split into a list and save that list as a pickle file. This gives you the `ntu60_xsub_train.pkl`, `ntu60_xsub_val.pkl`, `ntu120_xsub_train.pkl` and `ntu120_xsub_val.pkl` files that we used in training.
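The gathering step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the per-video files are assumed to sit in one directory with a `.pkl` extension, each holding a single annotation dict. The helper name `gather_annotations` and the directory layout are hypothetical and are not part of MMAction2.

```python
import glob
import os.path as osp
import pickle


def gather_annotations(pkl_dir, out_file):
    """Collect per-video annotation dicts into one list and pickle it.

    Hypothetical helper: assumes every ``*.pkl`` file in ``pkl_dir`` holds
    exactly one annotation dict for one video.
    """
    annotations = []
    # Sort the file names so the output order is reproducible.
    for path in sorted(glob.glob(osp.join(pkl_dir, '*.pkl'))):
        with open(path, 'rb') as f:
            annotations.append(pickle.load(f))
    # Save the whole list as a single pickle file, e.g. ntu60_xsub_val.pkl.
    with open(out_file, 'wb') as f:
        pickle.dump(annotations, f)
    return annotations
```

Writing `out_file` outside `pkl_dir` is the safer choice here, since a second run would otherwise pick up the gathered file as if it were a per-video annotation.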
+
+For those who do not have enough computing resources for pose extraction, we provide the outputs of the above pipeline here, covering 4 splits of the NTURGB+D datasets as well as HMDB51 and UCF101:
+
+- ntu60_xsub_train: https://download.openmmlab.com/mmaction/posec3d/ntu60_xsub_train.pkl
+- ntu60_xsub_val: https://download.openmmlab.com/mmaction/posec3d/ntu60_xsub_val.pkl
+- ntu120_xsub_train: https://download.openmmlab.com/mmaction/posec3d/ntu120_xsub_train.pkl
+- ntu120_xsub_val: https://download.openmmlab.com/mmaction/posec3d/ntu120_xsub_val.pkl
+- hmdb51: https://download.openmmlab.com/mmaction/posec3d/hmdb51.pkl
+- ucf101: https://download.openmmlab.com/mmaction/posec3d/ucf101.pkl
+
+To generate 2D pose annotations for a single video, you first need to install mmdetection and mmpose from source. After that, replace the placeholders `mmdet_root` and `mmpose_root` in `ntu_pose_extraction.py` with your installation paths. Then you can use the following command for NTURGB+D video pose extraction:
+
+```shell
+python ntu_pose_extraction.py S001C001P001R001A001_rgb.avi S001C001P001R001A001.pkl
+```
+
+After you get the pose annotations for all videos in a dataset split, such as `ntu60_xsub_val`, you can gather them into a single list and save the list as `ntu60_xsub_val.pkl`. You can use those larger pickle files for training and testing.
+
+## The Format of PoseC3D Annotations
+
+Here we briefly introduce the format of PoseC3D annotations, taking `gym_train.pkl` as an example: the content of `gym_train.pkl` is a list of length 20484, and each item is a dictionary holding the skeleton annotation of one video. Each dictionary has the following fields:
+
+- keypoint: The keypoint coordinates, a numpy array of shape N (#person) x T (temporal length) x K (#keypoints, 17 in our case) x 2 (x, y coordinate).
+- keypoint_score: The keypoint confidence scores, a numpy array of shape N (#person) x T (temporal length) x K (#keypoints, 17 in our case).
+- frame_dir: The corresponding video name.
+- label: The action category.
+- img_shape: The image shape of each frame.
+- original_shape: Same as `img_shape`.
+- total_frames: The temporal length of the video.
+
+For training with your custom dataset, you can refer to [Custom Dataset Training](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md).
+
+## Visualization
+
+To visualize skeleton data, you also need to prepare the RGB videos. Please refer to [visualize_heatmap_volume](/demo/visualize_heatmap_volume.ipynb) for the detailed process. Here we provide some visualization examples from NTU-60 and FineGYM.
+
+[Figures omitted: Pose Estimation Results; Keypoint Heatmap Volume Visualization; Limb Heatmap Volume Visualization]
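Before visualizing or training, it may help to sanity-check an annotation against the field list in "The Format of PoseC3D Annotations". The sketch below is a hypothetical helper (`check_posec3d_annotation` is not an MMAction2 function). It only assumes that `keypoint` and `keypoint_score` expose a NumPy-style `.shape` tuple, and that the T axis equals `total_frames`, which the README implies but does not state outright.

```python
REQUIRED_KEYS = ('keypoint', 'keypoint_score', 'frame_dir', 'label',
                 'img_shape', 'original_shape', 'total_frames')


def check_posec3d_annotation(anno, num_keypoints=17):
    """Return a list of problems found in one annotation dict.

    Hypothetical helper; an empty list means the dict matches the
    documented layout.
    """
    problems = [f'missing field: {k}' for k in REQUIRED_KEYS if k not in anno]
    if problems:
        return problems

    kp_shape = tuple(anno['keypoint'].shape)
    score_shape = tuple(anno['keypoint_score'].shape)
    # keypoint: N (persons) x T (frames) x K (keypoints) x 2 (x, y)
    if len(kp_shape) != 4 or kp_shape[2] != num_keypoints or kp_shape[3] != 2:
        problems.append(f'unexpected keypoint shape: {kp_shape}')
    # keypoint_score: N x T x K, i.e. the keypoint shape without its last axis
    if score_shape != kp_shape[:3]:
        problems.append(f'unexpected keypoint_score shape: {score_shape}')
    # Assumption: the temporal axis agrees with total_frames.
    if len(kp_shape) > 1 and kp_shape[1] != anno['total_frames']:
        problems.append('total_frames does not match the T axis of keypoint')
    return problems
```

With real data, `anno` would be one element of the list loaded from a file such as `gym_train.pkl`.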
+
+## Convert the NTU RGB+D raw skeleton data to our format (only applicable to GCN backbones)
+
+We also provide a script for converting the NTU RGB+D raw skeleton data to our format.
+First, download the raw skeleton data of NTU-RGBD 60 and NTU-RGBD 120 from https://github.com/shahroudy/NTURGB-D.
+
+For NTU-RGBD 60, preprocess the data and convert the data format with
+
+```shell
+python gen_ntu_rgbd_raw.py --data-path your_raw_nturgbd60_skeleton_path --ignored-sample-path NTU_RGBD_samples_with_missing_skeletons.txt --out-folder your_nturgbd60_output_path --task ntu60
+```
+
+For NTU-RGBD 120, preprocess the data and convert the data format with
+
+```shell
+python gen_ntu_rgbd_raw.py --data-path your_raw_nturgbd120_skeleton_path --ignored-sample-path NTU_RGBD120_samples_with_missing_skeletons.txt --out-folder your_nturgbd120_output_path --task ntu120
+```
+
+## Convert annotations from third-party projects
+
+We provide scripts to convert skeleton annotations from third-party projects to the MMAction2 format:
+
+- BABEL: `babel2mma2.py`
+
+**TODO**:
+
+- [x] FineGYM
+- [x] NTU60_XSub
+- [x] NTU120_XSub
+- [x] NTU60_XView
+- [x] NTU120_XSet
+- [x] UCF101
+- [x] HMDB51
+- [ ] Kinetics
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/README_zh-CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..fb6de5925a59a38c1407f4a7c142aba1febb3e5c
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/README_zh-CN.md
@@ -0,0 +1,135 @@
+# Preparing Skeleton Dataset
+
+```BibTeX
+@misc{duan2021revisiting,
+  title={Revisiting Skeleton-based Action Recognition},
+  author={Haodong Duan and Yue Zhao and Kai Chen and Dian Shao and Dahua Lin and Bo Dai},
+  year={2021},
+  eprint={2104.13586},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV}
+}
+```
+
+## Introduction
+
+MMAction2 releases the skeleton annotations used in the paper [Revisiting Skeleton-based Action Recognition](https://arxiv.org/abs/2104.13586).
+By default, [Faster-RCNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-person.py) is used as the human detector,
+and [HRNet-w32](https://github.com/open-mmlab/mmpose/blob/master/configs/top_down/hrnet/coco/hrnet_w32_coco_256x192.py) is used as the single-person pose estimation model.
+For the FineGYM dataset, MMAction2 uses the ground-truth bounding boxes of the athletes rather than boxes produced by the detector. Currently, MMAction2 has released the skeleton annotations for FineGYM and the NTURGB+D Xsub split; annotations for other datasets will be released soon.
+
+## Prepare Annotations
+
+Currently, MMAction2 supports the HMDB51, UCF101, FineGYM and NTURGB+D datasets. For FineGYM, users can download the annotation files with the following script.
+
+```shell
+bash download_annotations.sh ${DATASET}
+```
+
+Due to the [Conditions of Use](http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp) of the NTURGB+D dataset, MMAction2 does not directly release the annotation files used in the experiments.
+Instead, a script is provided to generate the pose annotations for videos in the NTURGB+D dataset; it produces a dict and saves it as a pickle file.
+Users can build a list containing the dicts of the corresponding videos and save it as a pickle file.
+This yields the `ntu60_xsub_train.pkl`, `ntu60_xsub_val.pkl`, `ntu120_xsub_train.pkl` and `ntu120_xsub_val.pkl` files used for training.
+
+For users who are unable to run pose extraction, the outputs of the above pipeline are provided here, covering 4 splits of the NTURGB+D dataset as well as HMDB51 and UCF101:
+
+- ntu60_xsub_train: https://download.openmmlab.com/mmaction/posec3d/ntu60_xsub_train.pkl
+- ntu60_xsub_val: https://download.openmmlab.com/mmaction/posec3d/ntu60_xsub_val.pkl
+- ntu120_xsub_train: https://download.openmmlab.com/mmaction/posec3d/ntu120_xsub_train.pkl
+- ntu120_xsub_val: https://download.openmmlab.com/mmaction/posec3d/ntu120_xsub_val.pkl
+- hmdb51: https://download.openmmlab.com/mmaction/posec3d/hmdb51.pkl
+- ucf101: https://download.openmmlab.com/mmaction/posec3d/ucf101.pkl
+
+To generate 2D pose annotations for a single video, users first need to install mmdetection and mmpose from source. Next, set the `mmdet_root` and `mmpose_root` variables in `ntu_pose_extraction.py`.
+Finally, run the following command to extract poses from an NTURGB+D video:
+
+```shell
+python ntu_pose_extraction.py S001C001P001R001A001_rgb.avi S001C001P001R001A001.pkl
+```
+
+After obtaining the pose annotations for all videos in a dataset split (e.g. `ntu60_xsub_val`), users can gather them into a list and save it as `ntu60_xsub_val.pkl`. These larger pickle files can then be used for training and testing.
+
+## The Format of PoseC3D Annotations
+
+This section briefly introduces the PoseC3D annotation format. Taking `gym_train.pkl` as an example: `gym_train.pkl` stores a list of length 20484, and each item of the list is the skeleton annotation dict of a single video. Each dict has the following fields:
+
+- keypoint: The keypoint coordinates, a numpy array of shape N (#person) x T (temporal length) x K (#keypoints, 17 here) x 2 (x, y coordinate)
+- keypoint_score: The keypoint confidence scores, a numpy array of shape N (#person) x T (temporal length) x K (#keypoints, 17 here)
+- frame_dir: The corresponding video name
+- label: The action category
+- img_shape: The image size of each frame
+- original_shape: Same as `img_shape`
+- total_frames: The temporal length of the video
+
+To train PoseC3D on a custom dataset, refer to [Custom Dataset Training](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md).
+
+## Visualization
+
+To visualize skeleton data, users also need to prepare the RGB videos. See [visualize_heatmap_volume](/demo/visualize_heatmap_volume.ipynb) for details. Some examples from NTU-60 and FineGYM are provided here.
+
+[Figures omitted: Pose Estimation Results; Keypoint Heatmap Volume Visualization; Limb Heatmap Volume Visualization]
+
+## Convert the NTU RGB+D raw data to MMAction2 format (the converted annotations currently apply only to GCN models)
+
+This section describes how to convert the NTU RGB+D raw data to the MMAction2 format. First, download the raw skeleton data of the NTU-RGBD 60 and NTU-RGBD 120 datasets from https://github.com/shahroudy/NTURGB-D.
+
+For the NTU-RGBD 60 dataset, use the following command
+
+```shell
+python gen_ntu_rgbd_raw.py --data-path your_raw_nturgbd60_skeleton_path --ignored-sample-path NTU_RGBD_samples_with_missing_skeletons.txt --out-folder your_nturgbd60_output_path --task ntu60
+```
+
+For the NTU-RGBD 120 dataset, use the following command
+
+```shell
+python gen_ntu_rgbd_raw.py --data-path your_raw_nturgbd120_skeleton_path --ignored-sample-path NTU_RGBD120_samples_with_missing_skeletons.txt --out-folder your_nturgbd120_output_path --task ntu120
+```
+
+## Convert annotations from third-party projects
+
+MMAction2 provides scripts to convert skeleton annotations from third-party projects to the MMAction2 format, for example:
+
+- BABEL: `babel2mma2.py`
+
+**TODO**:
+
+- [x] FineGYM
+- [x] NTU60_XSub
+- [x] NTU120_XSub
+- [x] NTU60_XView
+- [x] NTU120_XSet
+- [x] UCF101
+- [x] HMDB51
+- [ ] Kinetics
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/S001C001P001R001A001_rgb.avi b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/S001C001P001R001A001_rgb.avi
new file mode 100644
index 0000000000000000000000000000000000000000..0ea54177e04e0654267aba7b77c936f8fe477658
Binary files /dev/null and b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/S001C001P001R001A001_rgb.avi differ
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/babel2mma2.py b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/babel2mma2.py
new file mode 100644
index 0000000000000000000000000000000000000000..3dedc1b31eb316d00722709aa1f2e9e27f419c4d
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/babel2mma2.py
@@ -0,0 +1,25 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+# In this example, we convert babel120_train to MMAction2 format.
+# The required files can be downloaded from the homepage of the BABEL project.
+import numpy as np
+from mmcv import dump, load
+
+
+def gen_babel(x, y):
+    data = []
+    for i, xx in enumerate(x):
+        sample = dict()
+        sample['keypoint'] = xx.transpose(3, 1, 2, 0).astype(np.float16)
+        sample['label'] = y[1][0][i]
+        names = [y[0][i], y[1][1][i], y[1][2][i], y[1][3][i]]
+        sample['frame_dir'] = '_'.join([str(k) for k in names])
+        sample['total_frames'] = 150
+        data.append(sample)
+    return data
+
+
+x = np.load('train_ntu_sk_120.npy')
+y = load('train_label_120.pkl')
+
+data = gen_babel(x, y)
+dump(data, 'babel120_train.pkl')
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/download_annotations.sh
new file mode 100644
index 0000000000000000000000000000000000000000..d57efbceaced402cb1ba25e9bf0f922a1817ba6e
--- /dev/null
+++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/download_annotations.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+
+DATASET=$1
+if [ "$DATASET" == "gym" ]; then
+  echo "We are processing $DATASET"
+else
+  echo "Bad Argument, we only support gym now."
+  exit 1
+fi
+
+DATA_DIR="../../../data/posec3d/"
+
+if [[ ! -d "${DATA_DIR}" ]]; then
+  echo "${DATA_DIR} does not exist.
Creating"; + mkdir -p ${DATA_DIR} +fi + +wget https://download.openmmlab.com/mmaction/posec3d/${DATASET}_train.pkl +wget https://download.openmmlab.com/mmaction/posec3d/${DATASET}_val.pkl + +mv ${DATASET}_train.pkl ${DATA_DIR} +mv ${DATASET}_val.pkl ${DATA_DIR} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/gen_ntu_rgbd_raw.py b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/gen_ntu_rgbd_raw.py new file mode 100644 index 0000000000000000000000000000000000000000..5ca73bf8f12f814f1cd2ac2e7297125df40957dd --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/gen_ntu_rgbd_raw.py @@ -0,0 +1,355 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import math +import os +import os.path as osp + +import mmcv +import numpy as np + +training_subjects_60 = [ + 1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19, 25, 27, 28, 31, 34, 35, 38 +] +training_cameras_60 = [2, 3] +training_subjects_120 = [ + 1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19, 25, 27, 28, 31, 34, 35, 38, + 45, 46, 47, 49, 50, 52, 53, 54, 55, 56, 57, 58, 59, 70, 74, 78, 80, 81, 82, + 83, 84, 85, 86, 89, 91, 92, 93, 94, 95, 97, 98, 100, 103 +] +training_setups_120 = [ + 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32 +] +max_body_true = 2 +max_body_kinect = 4 +num_joint = 25 +max_frame = 300 + + +def unit_vector(vector): + """Returns the unit vector of the vector.""" + return vector / np.linalg.norm(vector) + + +def angle_between(v1, v2): + """Returns the angle in radians between vectors 'v1' and 'v2':: + + >>> angle_between((1, 0, 0), (0, 1, 0)) + 1.5707963267948966 + >>> angle_between((1, 0, 0), (1, 0, 0)) + 0.0 + >>> angle_between((1, 0, 0), (-1, 0, 0)) + 3.141592653589793 + """ + if np.abs(v1).sum() < 1e-6 or np.abs(v2).sum() < 1e-6: + return 0 + v1_u = unit_vector(v1) + v2_u = unit_vector(v2) + return np.arccos(np.clip(np.dot(v1_u, v2_u), -1.0, 1.0)) + + +def rotation_matrix(axis, theta): + """Return the rotation matrix associated with 
counterclockwise rotation + about the given axis by theta radians.""" + if np.abs(axis).sum() < 1e-6 or np.abs(theta) < 1e-6: + return np.eye(3) + axis = np.asarray(axis) + axis = axis / math.sqrt(np.dot(axis, axis)) + a = math.cos(theta / 2.0) + b, c, d = -axis * math.sin(theta / 2.0) + aa, bb, cc, dd = a * a, b * b, c * c, d * d + bc, ad, ac, ab, bd, cd = b * c, a * d, a * c, a * b, b * d, c * d + return np.array([[aa + bb - cc - dd, 2 * (bc + ad), 2 * (bd - ac)], + [2 * (bc - ad), aa + cc - bb - dd, 2 * (cd + ab)], + [2 * (bd + ac), 2 * (cd - ab), aa + dd - bb - cc]]) + + +def pre_normalization(data, zaxis=[0, 1], xaxis=[8, 4]): + N, C, T, V, M = data.shape + s = np.transpose(data, [0, 4, 2, 3, 1]) # N C T V M -> N M T V C + + print('pad the null frames with the previous frames') + prog_bar = mmcv.ProgressBar(len(s)) + for i_s, skeleton in enumerate(s): + if skeleton.sum() == 0: + print(i_s, ' has no skeleton') + for i_p, person in enumerate(skeleton): + if person.sum() == 0: + continue + if person[0].sum() == 0: + index = (person.sum(-1).sum(-1) != 0) + tmp = person[index].copy() + person *= 0 + person[:len(tmp)] = tmp + + for i_f, frame in enumerate(person): + if frame.sum() == 0: + if person[i_f:].sum() == 0: + rest = len(person) - i_f + num = int(np.ceil(rest / i_f)) + pad = np.concatenate( + [person[0:i_f] for _ in range(num)], 0)[:rest] + s[i_s, i_p, i_f:] = pad + break + prog_bar.update() + + print('sub the center joint #1 (spine joint in ntu and ' + 'neck joint in kinetics)') + prog_bar = mmcv.ProgressBar(len(s)) + for i_s, skeleton in enumerate(s): + if skeleton.sum() == 0: + continue + main_body_center = skeleton[0][:, 1:2, :].copy() + for i_p, person in enumerate(skeleton): + if person.sum() == 0: + continue + mask = (person.sum(-1) != 0).reshape(T, V, 1) + s[i_s, i_p] = (s[i_s, i_p] - main_body_center) * mask + prog_bar.update() + + print('parallel the bone between hip(jpt 0) and ' + 'spine(jpt 1) of the first person to the z axis') + prog_bar = 
mmcv.ProgressBar(len(s)) + for i_s, skeleton in enumerate(s): + if skeleton.sum() == 0: + continue + joint_bottom = skeleton[0, 0, zaxis[0]] + joint_top = skeleton[0, 0, zaxis[1]] + axis = np.cross(joint_top - joint_bottom, [0, 0, 1]) + angle = angle_between(joint_top - joint_bottom, [0, 0, 1]) + matrix_z = rotation_matrix(axis, angle) + for i_p, person in enumerate(skeleton): + if person.sum() == 0: + continue + for i_f, frame in enumerate(person): + if frame.sum() == 0: + continue + for i_j, joint in enumerate(frame): + s[i_s, i_p, i_f, i_j] = np.dot(matrix_z, joint) + prog_bar.update() + + print('parallel the bone between right shoulder(jpt 8) and ' + 'left shoulder(jpt 4) of the first person to the x axis') + prog_bar = mmcv.ProgressBar(len(s)) + for i_s, skeleton in enumerate(s): + if skeleton.sum() == 0: + continue + joint_rshoulder = skeleton[0, 0, xaxis[0]] + joint_lshoulder = skeleton[0, 0, xaxis[1]] + axis = np.cross(joint_rshoulder - joint_lshoulder, [1, 0, 0]) + angle = angle_between(joint_rshoulder - joint_lshoulder, [1, 0, 0]) + matrix_x = rotation_matrix(axis, angle) + for i_p, person in enumerate(skeleton): + if person.sum() == 0: + continue + for i_f, frame in enumerate(person): + if frame.sum() == 0: + continue + for i_j, joint in enumerate(frame): + s[i_s, i_p, i_f, i_j] = np.dot(matrix_x, joint) + prog_bar.update() + + data = np.transpose(s, [0, 4, 2, 3, 1]) + return data + + +def read_skeleton_filter(file): + with open(file, 'r') as f: + skeleton_sequence = {} + skeleton_sequence['num_frame'] = int(f.readline()) + skeleton_sequence['frameInfo'] = [] + + for t in range(skeleton_sequence['num_frame']): + frame_info = {} + frame_info['numBody'] = int(f.readline()) + frame_info['bodyInfo'] = [] + + for m in range(frame_info['numBody']): + body_info = {} + body_info_key = [ + 'bodyID', 'clipedEdges', 'handLeftConfidence', + 'handLeftState', 'handRightConfidence', 'handRightState', + 'isResticted', 'leanX', 'leanY', 'trackingState' + ] + body_info = 
{ + k: float(v) + for k, v in zip(body_info_key, + f.readline().split()) + } + body_info['numJoint'] = int(f.readline()) + body_info['jointInfo'] = [] + for v in range(body_info['numJoint']): + joint_info_key = [ + 'x', 'y', 'z', 'depthX', 'depthY', 'colorX', 'colorY', + 'orientationW', 'orientationX', 'orientationY', + 'orientationZ', 'trackingState' + ] + joint_info = { + k: float(v) + for k, v in zip(joint_info_key, + f.readline().split()) + } + body_info['jointInfo'].append(joint_info) + frame_info['bodyInfo'].append(body_info) + skeleton_sequence['frameInfo'].append(frame_info) + + return skeleton_sequence + + +def get_nonzero_std(s): # T V C + index = s.sum(-1).sum(-1) != 0 + s = s[index] + if len(s) != 0: + s = s[:, :, 0].std() + s[:, :, 1].std() + s[:, :, + 2].std() # three channels + else: + s = 0 + return s + + +def read_xyz(file, max_body=2, num_joint=25): + seq_info = read_skeleton_filter(file) + # num_frame = seq_info['num_frame'] + data = np.zeros((max_body, seq_info['num_frame'], num_joint, 3)) + for n, f in enumerate(seq_info['frameInfo']): + for m, b in enumerate(f['bodyInfo']): + for j, v in enumerate(b['jointInfo']): + if m < max_body and j < num_joint: + data[m, n, j, :] = [v['x'], v['y'], v['z']] + else: + pass + + # select two max energy body + energy = np.array([get_nonzero_std(x) for x in data]) + index = energy.argsort()[::-1][0:max_body_true] + data = data[index] + data = data.transpose(3, 1, 2, 0) + return data + + +def gendata(data_path, + out_path, + ignored_sample_path=None, + task='ntu60', + benchmark='xsub', + part='train', + pre_norm=True): + if ignored_sample_path is not None: + with open(ignored_sample_path, 'r') as f: + ignored_samples = [ + line.strip() + '.skeleton' for line in f.readlines() + ] + else: + ignored_samples = [] + + sample_name = [] + sample_label = [] + total_frames = [] + results = [] + + for filename in os.listdir(data_path): + if filename in ignored_samples: + continue + + setup_number = 
int(filename[filename.find('S') + 1:filename.find('S') + + 4]) + action_class = int(filename[filename.find('A') + 1:filename.find('A') + + 4]) + subject_id = int(filename[filename.find('P') + 1:filename.find('P') + + 4]) + camera_id = int(filename[filename.find('C') + 1:filename.find('C') + + 4]) + + if benchmark == 'xsub': + if task == 'ntu60': + istraining = (subject_id in training_subjects_60) + else: + istraining = (subject_id in training_subjects_120) + elif benchmark == 'xview': + istraining = (camera_id in training_cameras_60) + elif benchmark == 'xsetup': + istraining = (setup_number in training_setups_120) + else: + raise ValueError() + + if part == 'train': + issample = istraining + elif part == 'val': + issample = not (istraining) + else: + raise ValueError() + + if issample: + sample_name.append(filename) + sample_label.append(action_class - 1) + + fp = np.zeros((len(sample_label), 3, max_frame, num_joint, max_body_true), + dtype=np.float32) + prog_bar = mmcv.ProgressBar(len(sample_name)) + for i, s in enumerate(sample_name): + data = read_xyz( + osp.join(data_path, s), + max_body=max_body_kinect, + num_joint=num_joint).astype(np.float32) + fp[i, :, 0:data.shape[1], :, :] = data + total_frames.append(data.shape[1]) + prog_bar.update() + + if pre_norm: + fp = pre_normalization(fp) + + prog_bar = mmcv.ProgressBar(len(sample_name)) + for i, s in enumerate(sample_name): + anno = dict() + anno['total_frames'] = total_frames[i] + anno['keypoint'] = fp[i, :, 0:total_frames[i], :, :].transpose( + 3, 1, 2, 0) # C T V M -> M T V C + anno['frame_dir'] = osp.splitext(s)[0] + anno['img_shape'] = (1080, 1920) + anno['original_shape'] = (1080, 1920) + anno['label'] = sample_label[i] + + results.append(anno) + prog_bar.update() + + output_path = '{}/{}.pkl'.format(out_path, part) + mmcv.dump(results, output_path) + print(f'{benchmark}-{part} finish~!') + + +if __name__ == '__main__': + parser = argparse.ArgumentParser( + description='Generate Pose Annotation for 
NTURGB-D raw skeleton data') + parser.add_argument( + '--data-path', + type=str, + help='raw skeleton data path', + default='data/ntu/nturgb+d_skeletons_60/') + parser.add_argument( + '--ignored-sample-path', + type=str, + default='NTU_RGBD_samples_with_missing_skeletons.txt') + parser.add_argument( + '--out-folder', type=str, default='data/ntu/nturgb+d_skeletons_60_3d') + parser.add_argument('--task', type=str, default='ntu60') + args = parser.parse_args() + + assert args.task in ['ntu60', 'ntu120'] + + if args.task == 'ntu60': + benchmark = ['xsub', 'xview'] + else: + benchmark = ['xsub', 'xsetup'] + part = ['train', 'val'] + + for b in benchmark: + for p in part: + out_path = osp.join(args.out_folder, b) + if not osp.exists(out_path): + os.makedirs(out_path) + gendata( + args.data_path, + out_path, + args.ignored_sample_path, + args.task, + benchmark=b, + part=p) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/label_map_gym99.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/label_map_gym99.txt new file mode 100644 index 0000000000000000000000000000000000000000..daca3aa7f7d41a3c43aea577331be266d6e0b275 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/label_map_gym99.txt @@ -0,0 +1,99 @@ +(VT) round-off, flic-flac with 0.5 turn on, stretched salto forward with 0.5 turn off +(VT) round-off, flic-flac on, stretched salto backward with 2 turn off +(VT) round-off, flic-flac on, stretched salto backward with 1 turn off +(VT) round-off, flic-flac on, stretched salto backward with 1.5 turn off +(VT) round-off, flic-flac on, stretched salto backward with 2.5 turn off +(VT) round-off, flic-flac on, stretched salto backward off +(FX) switch leap with 0.5 turn +(FX) switch leap with 1 turn +(FX) split leap with 1 turn +(FX) split leap with 1.5 turn or more +(FX) switch leap (leap forward with leg change to cross split) +(FX) split jump with 1 turn +(FX) split jump (leg separation 180 degree parallel to the floor) +(FX) johnson 
with additional 0.5 turn +(FX) straddle pike or side split jump with 1 turn +(FX) switch leap to ring position +(FX) stag jump +(FX) 2 turn with free leg held upward in 180 split position throughout turn +(FX) 2 turn in tuck stand on one leg, free leg straight throughout turn +(FX) 3 turn on one leg, free leg optional below horizontal +(FX) 2 turn on one leg, free leg optional below horizontal +(FX) 1 turn on one leg, free leg optional below horizontal +(FX) 2 turn or more with heel of free leg forward at horizontal throughout turn +(FX) 1 turn with heel of free leg forward at horizontal throughout turn +(FX) arabian double salto tucked +(FX) salto forward tucked +(FX) aerial walkover forward +(FX) salto forward stretched with 2 twist +(FX) salto forward stretched with 1 twist +(FX) salto forward stretched with 1.5 twist +(FX) salto forward stretched, feet land together +(FX) double salto backward stretched +(FX) salto backward stretched with 3 twist +(FX) salto backward stretched with 2 twist +(FX) salto backward stretched with 2.5 twist +(FX) salto backward stretched with 1.5 twist +(FX) double salto backward tucked with 2 twist +(FX) double salto backward tucked with 1 twist +(FX) double salto backward tucked +(FX) double salto backward piked with 1 twist +(FX) double salto backward piked +(BB) sissone (leg separation 180 degree on the diagonal to the floor, take off two feet, land on one foot) +(BB) split jump with 0.5 turn in side position +(BB) split jump +(BB) straddle pike jump or side split jump +(BB) split ring jump (ring jump with front leg horizontal to the floor) +(BB) switch leap with 0.5 turn +(BB) switch leap (leap forward with leg change) +(BB) split leap forward +(BB) johnson (leap forward with leg change and 0.25 turn to side split or straddle pike position) +(BB) switch leap to ring position +(BB) sheep jump (jump with upper back arch and head release with feet to head height/closed Ring) +(BB) wolf hop or jump (hip angle at 45, knees together) 
+(BB) 1 turn with heel of free leg forward at horizontal throughout turn +(BB) 2 turn on one leg, free leg optional below horizontal +(BB) 1 turn on one leg, free leg optional below horizontal +(BB) 2 turn in tuck stand on one leg, free leg optional +(BB) salto backward tucked with 1 twist +(BB) salto backward tucked +(BB) salto backward stretched-step out (feet land successively) +(BB) salto backward stretched with legs together +(BB) salto sideward tucked, take off from one leg to side stand +(BB) free aerial cartwheel landing in cross position +(BB) salto forward tucked to cross stand +(BB) free aerial walkover forward, landing on one or both feet +(BB) jump backward, flic-flac take-off with 0.5 twist through handstand to walkover forward, also with support on one arm +(BB) flic-flac to land on both feet +(BB) flic-flac with step-out, also with support on one arm +(BB) round-off +(BB) double salto backward tucked +(BB) salto backward tucked +(BB) double salto backward piked +(BB) salto backward stretched with 2 twist +(BB) salto backward stretched with 2.5 twist +(UB) pike sole circle backward with 1 turn to handstand +(UB) pike sole circle backward with 0.5 turn to handstand +(UB) pike sole circle backward to handstand +(UB) giant circle backward with 1 turn to handstand +(UB) giant circle backward with 0.5 turn to handstand +(UB) giant circle backward +(UB) giant circle forward with 1 turn on one arm before handstand phase +(UB) giant circle forward with 0.5 turn to handstand +(UB) giant circle forward +(UB) clear hip circle backward to handstand +(UB) clear pike circle backward with 1 turn to handstand +(UB) clear pike circle backward with 0.5 turn to handstand +(UB) clear pike circle backward to handstand +(UB) stalder backward with 1 turn to handstand +(UB) stalder backward to handstand +(UB) counter straddle over high bar to hang +(UB) counter piked over high bar to hang +(UB) (swing backward or front support) salto forward straddled to hang on high bar 
+(UB) (swing backward) salto forward piked to hang on high bar +(UB) (swing forward or hip circle backward) salto backward with 0.5 turn piked to hang on high bar +(UB) transition flight from high bar to low bar +(UB) transition flight from low bar to high bar +(UB) (swing forward) double salto backward tucked with 1 turn +(UB) (swing backward) double salto forward tucked +(UB) (swing forward) double salto backward stretched diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/label_map_ntu120.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/label_map_ntu120.txt new file mode 100644 index 0000000000000000000000000000000000000000..69826dfebf9cd33b982c12663c56ac05ae5add97 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/label_map_ntu120.txt @@ -0,0 +1,120 @@ +drink water +eat meal/snack +brushing teeth +brushing hair +drop +pickup +throw +sitting down +standing up (from sitting position) +clapping +reading +writing +tear up paper +wear jacket +take off jacket +wear a shoe +take off a shoe +wear on glasses +take off glasses +put on a hat/cap +take off a hat/cap +cheer up +hand waving +kicking something +reach into pocket +hopping (one foot jumping) +jump up +make a phone call/answer phone +playing with phone/tablet +typing on a keyboard +pointing to something with finger +taking a selfie +check time (from watch) +rub two hands together +nod head/bow +shake head +wipe face +salute +put the palms together +cross hands in front (say stop) +sneeze/cough +staggering +falling +touch head (headache) +touch chest (stomachache/heart pain) +touch back (backache) +touch neck (neckache) +nausea or vomiting condition +use a fan (with hand or paper)/feeling warm +punching/slapping other person +kicking other person +pushing other person +pat on back of other person +point finger at the other person +hugging other person +giving something to other person +touch other person's pocket +handshaking +walking towards each other +walking apart 
from each other +put on headphone +take off headphone +shoot at the basket +bounce ball +tennis bat swing +juggling table tennis balls +hush (quite) +flick hair +thumb up +thumb down +make ok sign +make victory sign +staple book +counting money +cutting nails +cutting paper (using scissors) +snapping fingers +open bottle +sniff (smell) +squat down +toss a coin +fold paper +ball up paper +play magic cube +apply cream on face +apply cream on hand back +put on bag +take off bag +put something into a bag +take something out of a bag +open a box +move heavy objects +shake fist +throw up cap/hat +hands up (both hands) +cross arms +arm circles +arm swings +running on the spot +butt kicks (kick backward) +cross toe touch +side kick +yawn +stretch oneself +blow nose +hit other person with something +wield knife towards other person +knock over other person (hit with body) +grab other person’s stuff +shoot at other person with a gun +step on foot +high-five +cheers and drink +carry something with other person +take a photo of other person +follow other person +whisper in other person’s ear +exchange things with other person +support somebody with hand +finger-guessing game (playing rock-paper-scissors) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/ntu_pose_extraction.py b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/ntu_pose_extraction.py new file mode 100644 index 0000000000000000000000000000000000000000..7ce37cedb25a270f2997b2cc00338226ed1c675d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/skeleton/ntu_pose_extraction.py @@ -0,0 +1,347 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import abc +import argparse +import os +import os.path as osp +import random as rd +import shutil +import string +from collections import defaultdict + +import cv2 +import mmcv +import numpy as np + +try: + from mmdet.apis import inference_detector, init_detector +except (ImportError, ModuleNotFoundError): + raise ImportError('Failed to import `inference_detector` and ' + '`init_detector` from `mmdet.apis`. These apis are ' + 'required in this script! ') + +try: + from mmpose.apis import inference_top_down_pose_model, init_pose_model +except (ImportError, ModuleNotFoundError): + raise ImportError('Failed to import `inference_top_down_pose_model` and ' + '`init_pose_model` from `mmpose.apis`. These apis are ' + 'required in this script! ') + +mmdet_root = '' +mmpose_root = '' + +args = abc.abstractproperty() +args.det_config = f'{mmdet_root}/configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-person.py' # noqa: E501 +args.det_checkpoint = 'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco-person/faster_rcnn_r50_fpn_1x_coco-person_20201216_175929-d022e227.pth' # noqa: E501 +args.det_score_thr = 0.5 +args.pose_config = f'{mmpose_root}/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w32_coco_256x192.py' # noqa: E501 +args.pose_checkpoint = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth' # noqa: E501 + + +def gen_id(size=8): + chars = string.ascii_uppercase + string.digits + return ''.join(rd.choice(chars) for _ in range(size)) + + +def extract_frame(video_path): + dname = gen_id() + os.makedirs(dname, exist_ok=True) + frame_tmpl = osp.join(dname, 'img_{:05d}.jpg') + vid = cv2.VideoCapture(video_path) + frame_paths = [] + flag, frame = vid.read() + cnt = 0 + while flag: + frame_path = frame_tmpl.format(cnt + 1) + frame_paths.append(frame_path) + + cv2.imwrite(frame_path, frame) + cnt += 1 + flag, frame = vid.read() + + return frame_paths + + +def 
detection_inference(args, frame_paths): + model = init_detector(args.det_config, args.det_checkpoint, args.device) + assert model.CLASSES[0] == 'person', ('We require you to use a detector ' + 'trained on COCO') + results = [] + print('Performing Human Detection for each frame') + prog_bar = mmcv.ProgressBar(len(frame_paths)) + for frame_path in frame_paths: + result = inference_detector(model, frame_path) + # We only keep human detections with score larger than det_score_thr + result = result[0][result[0][:, 4] >= args.det_score_thr] + results.append(result) + prog_bar.update() + return results + + +def intersection(b0, b1): + l, r = max(b0[0], b1[0]), min(b0[2], b1[2]) + u, d = max(b0[1], b1[1]), min(b0[3], b1[3]) + return max(0, r - l) * max(0, d - u) + + +def iou(b0, b1): + i = intersection(b0, b1) + u = area(b0) + area(b1) - i + return i / u + + +def area(b): + return (b[2] - b[0]) * (b[3] - b[1]) + + +def removedup(bbox): + + def inside(box0, box1, thre=0.8): + return intersection(box0, box1) / area(box0) > thre + + num_bboxes = bbox.shape[0] + if num_bboxes == 1 or num_bboxes == 0: + return bbox + valid = [] + for i in range(num_bboxes): + flag = True + for j in range(num_bboxes): + if i != j and inside(bbox[i], + bbox[j]) and bbox[i][4] <= bbox[j][4]: + flag = False + break + if flag: + valid.append(i) + return bbox[valid] + + +def is_easy_example(det_results, num_person): + threshold = 0.95 + + def thre_bbox(bboxes, thre=threshold): + shape = [sum(bbox[:, -1] > thre) for bbox in bboxes] + ret = np.all(np.array(shape) == shape[0]) + return shape[0] if ret else -1 + + if thre_bbox(det_results) == num_person: + det_results = [x[x[..., -1] > 0.95] for x in det_results] + return True, np.stack(det_results) + return False, thre_bbox(det_results) + + +def bbox2tracklet(bbox): + iou_thre = 0.6 + tracklet_id = -1 + tracklet_st_frame = {} + tracklets = defaultdict(list) + for t, box in enumerate(bbox): + for idx in range(box.shape[0]): + matched = False + for 
tlet_id in range(tracklet_id, -1, -1): + cond1 = iou(tracklets[tlet_id][-1][-1], box[idx]) >= iou_thre + cond2 = ( + t - tracklet_st_frame[tlet_id] - len(tracklets[tlet_id]) < + 10) + cond3 = tracklets[tlet_id][-1][0] != t + if cond1 and cond2 and cond3: + matched = True + tracklets[tlet_id].append((t, box[idx])) + break + if not matched: + tracklet_id += 1 + tracklet_st_frame[tracklet_id] = t + tracklets[tracklet_id].append((t, box[idx])) + return tracklets + + +def drop_tracklet(tracklet): + tracklet = {k: v for k, v in tracklet.items() if len(v) > 5} + + def meanarea(track): + boxes = np.stack([x[1] for x in track]).astype(np.float32) + areas = (boxes[..., 2] - boxes[..., 0]) * ( + boxes[..., 3] - boxes[..., 1]) + return np.mean(areas) + + tracklet = {k: v for k, v in tracklet.items() if meanarea(v) > 5000} + return tracklet + + +def distance_tracklet(tracklet): + dists = {} + for k, v in tracklet.items(): + bboxes = np.stack([x[1] for x in v]) + c_x = (bboxes[..., 2] + bboxes[..., 0]) / 2. + c_y = (bboxes[..., 3] + bboxes[..., 1]) / 2. 
+ c_x -= 480 + c_y -= 270 + c = np.concatenate([c_x[..., None], c_y[..., None]], axis=1) + dist = np.linalg.norm(c, axis=1) + dists[k] = np.mean(dist) + return dists + + +def tracklet2bbox(track, num_frame): + # assign_prev + bbox = np.zeros((num_frame, 5)) + trackd = {} + for k, v in track: + bbox[k] = v + trackd[k] = v + for i in range(num_frame): + if bbox[i][-1] <= 0.5: + mind = np.inf + for k in trackd: + if np.abs(k - i) < mind: + mind = np.abs(k - i) + bbox[i] = bbox[k] + return bbox + + +def tracklets2bbox(tracklet, num_frame): + dists = distance_tracklet(tracklet) + sorted_inds = sorted(dists, key=lambda x: dists[x]) + dist_thre = np.inf + for i in sorted_inds: + if len(tracklet[i]) >= num_frame / 2: + dist_thre = 2 * dists[i] + break + + dist_thre = max(50, dist_thre) + + bbox = np.zeros((num_frame, 5)) + bboxd = {} + for idx in sorted_inds: + if dists[idx] < dist_thre: + for k, v in tracklet[idx]: + if bbox[k][-1] < 0.01: + bbox[k] = v + bboxd[k] = v + bad = 0 + for idx in range(num_frame): + if bbox[idx][-1] < 0.01: + bad += 1 + mind = np.inf + mink = None + for k in bboxd: + if np.abs(k - idx) < mind: + mind = np.abs(k - idx) + mink = k + bbox[idx] = bboxd[mink] + return bad, bbox + + +def bboxes2bbox(bbox, num_frame): + ret = np.zeros((num_frame, 2, 5)) + for t, item in enumerate(bbox): + if item.shape[0] <= 2: + ret[t, :item.shape[0]] = item + else: + inds = sorted( + list(range(item.shape[0])), key=lambda x: -item[x, -1]) + ret[t] = item[inds[:2]] + for t in range(num_frame): + if ret[t, 0, -1] <= 0.01: + ret[t] = ret[t - 1] + elif ret[t, 1, -1] <= 0.01: + if t: + if ret[t - 1, 0, -1] > 0.01 and ret[t - 1, 1, -1] > 0.01: + if iou(ret[t, 0], ret[t - 1, 0]) > iou( + ret[t, 0], ret[t - 1, 1]): + ret[t, 1] = ret[t - 1, 1] + else: + ret[t, 1] = ret[t - 1, 0] + return ret + + +def ntu_det_postproc(vid, det_results): + det_results = [removedup(x) for x in det_results] + label = int(vid.split('/')[-1].split('A')[1][:3]) + mpaction = list(range(50, 61)) + 
list(range(106, 121)) + n_person = 2 if label in mpaction else 1 + is_easy, bboxes = is_easy_example(det_results, n_person) + if is_easy: + print('\nEasy Example') + return bboxes + + tracklets = bbox2tracklet(det_results) + tracklets = drop_tracklet(tracklets) + + print(f'\nHard {n_person}-person Example, found {len(tracklets)} tracklet') + if n_person == 1: + if len(tracklets) == 1: + tracklet = list(tracklets.values())[0] + det_results = tracklet2bbox(tracklet, len(det_results)) + return np.stack(det_results) + else: + bad, det_results = tracklets2bbox(tracklets, len(det_results)) + return det_results + # n_person is 2 + if len(tracklets) <= 2: + tracklets = list(tracklets.values()) + bboxes = [] + for tracklet in tracklets: + bboxes.append(tracklet2bbox(tracklet, len(det_results))[:, None]) + bbox = np.concatenate(bboxes, axis=1) + return bbox + else: + return bboxes2bbox(det_results, len(det_results)) + + +def pose_inference(args, frame_paths, det_results): + model = init_pose_model(args.pose_config, args.pose_checkpoint, + args.device) + print('Performing Human Pose Estimation for each frame') + prog_bar = mmcv.ProgressBar(len(frame_paths)) + + num_frame = len(det_results) + num_person = max([len(x) for x in det_results]) + kp = np.zeros((num_person, num_frame, 17, 3), dtype=np.float32) + + for i, (f, d) in enumerate(zip(frame_paths, det_results)): + # Align input format + d = [dict(bbox=x) for x in list(d) if x[-1] > 0.5] + pose = inference_top_down_pose_model(model, f, d, format='xyxy')[0] + for j, item in enumerate(pose): + kp[j, i] = item['keypoints'] + prog_bar.update() + return kp + + +def ntu_pose_extraction(vid, skip_postproc=False): + frame_paths = extract_frame(vid) + det_results = detection_inference(args, frame_paths) + if not skip_postproc: + det_results = ntu_det_postproc(vid, det_results) + pose_results = pose_inference(args, frame_paths, det_results) + anno = dict() + anno['keypoint'] = pose_results[..., :2] + anno['keypoint_score'] = 
pose_results[..., 2] + anno['frame_dir'] = osp.splitext(osp.basename(vid))[0] + anno['img_shape'] = (1080, 1920) + anno['original_shape'] = (1080, 1920) + anno['total_frames'] = pose_results.shape[1] + anno['label'] = int(osp.basename(vid).split('A')[1][:3]) - 1 + shutil.rmtree(osp.dirname(frame_paths[0])) + + return anno + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Generate Pose Annotation for a single NTURGB-D video') + parser.add_argument('video', type=str, help='source video') + parser.add_argument('output', type=str, help='output pickle name') + parser.add_argument('--device', type=str, default='cuda:0') + parser.add_argument('--skip-postproc', action='store_true') + args = parser.parse_args() + return args + + +if __name__ == '__main__': + global_args = parse_args() + args.device = global_args.device + args.video = global_args.video + args.output = global_args.output + args.skip_postproc = global_args.skip_postproc + anno = ntu_pose_extraction(args.video, args.skip_postproc) + mmcv.dump(anno, args.output) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/README.md new file mode 100644 index 0000000000000000000000000000000000000000..75f4c111343764160dc3065935fd7853df113bdc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/README.md @@ -0,0 +1,144 @@ +# Preparing Something-Something V1 + +## Introduction + + + +```BibTeX +@misc{goyal2017something, + title={The "something something" video database for learning and evaluating visual common sense}, + author={Raghav Goyal and Samira Ebrahimi Kahou and Vincent Michalski and Joanna Materzyńska and Susanne Westphal and Heuna Kim and Valentin Haenel and Ingo Fruend and Peter Yianilos and Moritz Mueller-Freitag and Florian Hoppe and Christian Thurau and Ingo Bax and Roland Memisevic}, + year={2017}, + eprint={1706.04261}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + +For basic dataset 
information, you can refer to the dataset [website](https://20bn.com/datasets/something-something/v1). +Before we start, please make sure that the current working directory is `$MMACTION2/tools/data/sthv1/`. + +## Step 1. Prepare Annotations + +First of all, you have to sign in on the official [website](https://20bn.com/datasets/something-something/v1) and download the annotations to `$MMACTION2/data/sthv1/annotations`. + +## Step 2. Prepare RGB Frames + +Since the [sthv1 website](https://20bn.com/datasets/something-something/v1) doesn't provide the original video data and only extracted RGB frames are available, you have to download the RGB frames directly from the [sthv1 website](https://20bn.com/datasets/something-something/v1). + +You can download all compressed file parts from the [sthv1 website](https://20bn.com/datasets/something-something/v1) to `$MMACTION2/data/sthv1/` and use the following commands to decompress them. + +```shell +cd $MMACTION2/data/sthv1/ +cat 20bn-something-something-v1-?? | tar zx +cd $MMACTION2/tools/data/sthv1/ +``` + +Users who only want to use RGB frames can skip to Step 5 to generate file lists in the rawframes format. +Since the official JPG filenames follow the pattern "%05d.jpg" (e.g., "00001.jpg"), users need to add `"filename_tmpl='{:05}.jpg'"` to the dict of `data.train`, `data.val` and `data.test` in the config files related to sthv1, like this: + +``` +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +``` + +## Step 3. Extract Flow + +This part is **optional** if you only want to use RGB frames. 
+ +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, we recommend extracting frames there for better I/O performance. + +You can run the following commands to create a soft link to the SSD. + +```shell +# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/sthv1_extracted/ +ln -s /mnt/SSD/sthv1_extracted/ ../../../data/sthv1/rawframes +``` + +Then, you can run the following script to extract optical flow from the RGB frames. + +```shell +cd $MMACTION2/tools/data/sthv1/ +bash extract_flow.sh +``` + +## Step 4. Encode Videos + +This part is **optional** if you only want to use RGB frames. + +You can run the following script to encode videos. + +```shell +cd $MMACTION2/tools/data/sthv1/ +bash encode_videos.sh +``` + +## Step 5. Generate File List + +You can run the following scripts to generate file lists in the rawframes and videos formats. + +```shell +cd $MMACTION2/tools/data/sthv1/ +bash generate_{rawframes, videos}_filelist.sh +``` + +## Step 6. Check Directory Structure + +After completing the whole data preparation process for Something-Something V1, +you will get the rawframes (RGB + Flow), videos and annotation files for Something-Something V1. + +In the context of the whole project (for Something-Something V1 only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── sthv1 +│ │ ├── sthv1_{train,val}_list_rawframes.txt +│ │ ├── sthv1_{train,val}_list_videos.txt +│ │ ├── annotations +│ | ├── videos +│ | | ├── 1.mp4 +│ | | ├── 2.mp4 +│ | | ├──... +│ | ├── rawframes +│ | | ├── 1 +│ | | | ├── 00001.jpg +│ | | | ├── 00002.jpg +│ | | | ├── ... +│ | | | ├── flow_x_00001.jpg +│ | | | ├── flow_x_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_y_00001.jpg +│ | | | ├── flow_y_00002.jpg +│ | | | ├── ... +│ | | ├── 2 +│ | | ├── ... 
+ +``` + +For training and evaluating on Something-Something V1, please refer to [getting_started.md](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..11cc9318beb9eda322243958ae500ca0d3706f90 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/README_zh-CN.md @@ -0,0 +1,142 @@ +# Preparing Something-Something V1 + +## Introduction + +``` +@misc{goyal2017something, + title={The "something something" video database for learning and evaluating visual common sense}, + author={Raghav Goyal and Samira Ebrahimi Kahou and Vincent Michalski and Joanna Materzyńska and Susanne Westphal and Heuna Kim and Valentin Haenel and Ingo Fruend and Peter Yianilos and Moritz Mueller-Freitag and Florian Hoppe and Christian Thurau and Ingo Bax and Roland Memisevic}, + year={2017}, + eprint={1706.04261}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + +For basic information about the dataset, users can refer to the official [website](https://20bn.com/datasets/something-something/v1). +Before preparing the dataset, please make sure that the current working directory is `$MMACTION2/tools/data/sthv1/`. + +## Step 1. Download Annotations + +First, users need to register on the official [website](https://20bn.com/datasets/something-something/v1) before they can download the annotation files. The downloaded annotation files should be placed in the `$MMACTION2/data/sthv1/annotations` folder. + +## Step 2. Prepare RGB Frames + +The official [website](https://20bn.com/datasets/something-something/v1) does not provide the original video files, only the RGB frames extracted from them, which users can download directly from the official [website](https://20bn.com/datasets/something-something/v1). + +Place the downloaded compressed files in the `$MMACTION2/data/sthv1/` folder and decompress them with the following commands. + +```shell +cd $MMACTION2/data/sthv1/ +cat 20bn-something-something-v1-?? 
| tar zx +cd $MMACTION2/tools/data/sthv1/ +``` + +Users who only want to use RGB frames can skip the intermediate steps and go to Step 5 to generate the rawframes file list directly. +Since the official JPG filenames follow the pattern "%05d.jpg" (e.g., "00001.jpg"), `"filename_tmpl='{:05}.jpg'"` needs to be added to `data.train`, `data.val` and `data.test` in the config file to change the filename template. + +``` +data = dict( + videos_per_gpu=16, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=ann_file_train, + data_prefix=data_root, + filename_tmpl='{:05}.jpg', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=val_pipeline), + test=dict( + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + filename_tmpl='{:05}.jpg', + pipeline=test_pipeline)) +``` + +## Step 3. Extract Flow + +This part is **optional** if you only want to train with the original RGB frames. + +Before extracting frames and optical flow, please refer to the [installation guide](/docs_zh_CN/install.md) to install [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, we recommend storing the extracted frames on the SSD for better I/O performance. + +You can run the following commands to create a soft link to the SSD. + +```shell +# Execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/sthv1_extracted/ +ln -s /mnt/SSD/sthv1_extracted/ ../../../data/sthv1/rawframes +``` + +To extract optical flow, you can run the following script, which computes it from the RGB frames. + +```shell +cd $MMACTION2/tools/data/sthv1/ +bash extract_flow.sh +``` + +## Step 4. Encode Videos + +This part is **optional** if you only want to train with RGB frames. + +You can run the following commands to encode videos. + +```shell +cd $MMACTION2/tools/data/sthv1/ +bash encode_videos.sh +``` + +## Step 5. Generate File List + +You can run the following commands to generate file lists in the rawframes and videos formats. + +```shell +cd $MMACTION2/tools/data/sthv1/ +bash generate_{rawframes, videos}_filelist.sh +``` + +## Step 6. Check Directory Structure + +After completing the whole data preparation process for Something-Something V1, +you will get the corresponding RGB + optical flow files, video files and annotation files. + +Within the whole MMAction2 folder, the file structure for Something-Something V1 is as follows: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── sthv1 +│ │ ├── sthv1_{train,val}_list_rawframes.txt +│ │ ├── sthv1_{train,val}_list_videos.txt +│ │ ├── annotations +│ | ├── videos +│ | | ├── 1.mp4 +│ | | ├── 2.mp4 +│ | | ├──... 
+│ | ├── rawframes +│ | | ├── 1 +│ | | | ├── 00001.jpg +│ | | | ├── 00002.jpg +│ | | | ├── ... +│ | | | ├── flow_x_00001.jpg +│ | | | ├── flow_x_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_y_00001.jpg +│ | | | ├── flow_y_00002.jpg +│ | | | ├── ... +│ | | ├── 2 +│ | | ├── ... + +``` + +For training and evaluating on Something-Something V1, please refer to the [getting started guide](/docs_zh_CN/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/encode_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/encode_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..79a49ab3c48341e492b3281428cd471c51ec7be1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/encode_videos.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_videos.py ../../data/sthv1/rawframes/ ../../data/sthv1/videos/ --fps 12 --level 1 --start-idx 1 --filename-tmpl '%05d' +echo "Encode videos" + +cd sthv1/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/extract_flow.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/extract_flow.sh new file mode 100644 index 0000000000000000000000000000000000000000..25a66883a2dc18d2d83fc0c7d8ef0f4ed3ec6fdf --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/extract_flow.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/sthv1/rawframes/ ../../data/sthv1/rawframes/ --task flow --level 1 --flow-type tvl1 --input-frames +echo "Flow (tv-l1) Generated" +cd sthv1/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..b6a3935ae54a2ea63d7537800b0b68b9f154d20d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/generate_rawframes_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. 
python tools/data/build_file_list.py sthv1 data/sthv1/rawframes/ --rgb-prefix '0' --num-split 1 --level 1 --subset train --format rawframes --shuffle +PYTHONPATH=. python tools/data/build_file_list.py sthv1 data/sthv1/rawframes/ --rgb-prefix '0' --num-split 1 --level 1 --subset val --format rawframes --shuffle +echo "Filelist for rawframes generated." + +cd tools/data/sthv1/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..4da50b93169451b20251a2daabf28fa76a441ea8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/generate_videos_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py sthv1 data/sthv1/videos/ --num-split 1 --level 1 --subset train --format videos --shuffle +PYTHONPATH=. python tools/data/build_file_list.py sthv1 data/sthv1/videos/ --num-split 1 --level 1 --subset val --format videos --shuffle +echo "Filelist for videos generated." 
+ +cd tools/data/sthv1/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..8e07166d8beddc06f052c145bd576d918ebf3bc1 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv1/label_map.txt @@ -0,0 +1,174 @@ +Holding something +Turning something upside down +Turning the camera left while filming something +Stacking number of something +Turning the camera right while filming something +Opening something +Approaching something with your camera +Picking something up +Pushing something so that it almost falls off but doesn't +Folding something +Moving something away from the camera +Closing something +Moving away from something with your camera +Turning the camera downwards while filming something +Pushing something so that it slightly moves +Turning the camera upwards while filming something +Pretending to pick something up +Showing something to the camera +Moving something up +Plugging something into something +Unfolding something +Putting something onto something +Showing that something is empty +Pretending to put something on a surface +Taking something from somewhere +Putting something next to something +Moving something towards the camera +Showing a photo of something to the camera +Pushing something with something +Throwing something +Pushing something from left to right +Something falling like a feather or paper +Throwing something in the air and letting it fall +Throwing something against something +Lifting something with something on it +Taking one of many similar things on the table +Showing something behind something +Putting something into something +Tearing something just a little bit +Moving something away from something +Tearing something into two pieces +Pushing something from right to left +Holding something next to something +Putting something, something and something on the table 
+Pretending to take something from somewhere +Moving something closer to something +Pretending to put something next to something +Uncovering something +Something falling like a rock +Putting something and something on the table +Pouring something into something +Moving something down +Pulling something from right to left +Throwing something in the air and catching it +Tilting something with something on it until it falls off +Putting something in front of something +Pretending to turn something upside down +Putting something on a surface +Pretending to throw something +Showing something on top of something +Covering something with something +Squeezing something +Putting something similar to other things that are already on the table +Lifting up one end of something, then letting it drop down +Taking something out of something +Moving part of something +Pulling something from left to right +Lifting something up completely without letting it drop down +Attaching something to something +Putting something behind something +Moving something and something closer to each other +Holding something in front of something +Pushing something so that it falls off the table +Holding something over something +Pretending to open something without actually opening it +Removing something, revealing something behind +Hitting something with something +Moving something and something away from each other +Touching (without moving) part of something +Pretending to put something into something +Showing that something is inside something +Lifting something up completely, then letting it drop down +Pretending to take something out of something +Holding something behind something +Laying something on the table on its side, not upright +Poking something so it slightly moves +Pretending to close something without actually closing it +Putting something upright on the table +Dropping something in front of something +Dropping something behind something +Lifting up one end of something without 
letting it drop down +Rolling something on a flat surface +Throwing something onto a surface +Showing something next to something +Dropping something onto something +Stuffing something into something +Dropping something into something +Piling something up +Letting something roll along a flat surface +Twisting something +Spinning something that quickly stops spinning +Putting number of something onto something +Putting something underneath something +Moving something across a surface without it falling down +Plugging something into something but pulling it right out as you remove your hand +Dropping something next to something +Poking something so that it falls over +Spinning something so it continues spinning +Poking something so lightly that it doesn't or almost doesn't move +Wiping something off of something +Moving something across a surface until it falls down +Pretending to poke something +Putting something that cannot actually stand upright upright on the table, so it falls on its side +Pulling something out of something +Scooping something up with something +Pretending to be tearing something that is not tearable +Burying something in something +Tipping something over +Tilting something with something on it slightly so it doesn't fall down +Pretending to put something onto something +Bending something until it breaks +Letting something roll down a slanted surface +Trying to bend something unbendable so nothing happens +Bending something so that it deforms +Digging something out of something +Pretending to put something underneath something +Putting something on a flat surface without letting it roll +Putting something on the edge of something so it is not supported and falls down +Spreading something onto something +Pretending to put something behind something +Sprinkling something onto something +Something colliding with something and both come to a halt +Pushing something off of something +Putting something that can't roll onto a slanted surface, so it 
stays where it is +Lifting a surface with something on it until it starts sliding down +Pretending or failing to wipe something off of something +Trying but failing to attach something to something because it doesn't stick +Pulling something from behind of something +Pushing something so it spins +Pouring something onto something +Pulling two ends of something but nothing happens +Moving something and something so they pass each other +Pretending to sprinkle air onto something +Putting something that can't roll onto a slanted surface, so it slides down +Something colliding with something and both are being deflected +Pretending to squeeze something +Pulling something onto something +Putting something onto something else that cannot support it so it falls down +Lifting a surface with something on it but not enough for it to slide down +Pouring something out of something +Moving something and something so they collide with each other +Tipping something with something in it over, so something in it falls out +Letting something roll up a slanted surface, so it rolls back down +Pretending to scoop something up with something +Pretending to pour something out of something, but something is empty +Pulling two ends of something so that it gets stretched +Failing to put something into something because something does not fit +Pretending or trying and failing to twist something +Trying to pour something into something, but missing so it spills next to it +Something being deflected from something +Poking a stack of something so the stack collapses +Spilling something onto something +Pulling two ends of something so that it separates into two pieces +Pouring something into something until it overflows +Pretending to spread air onto something +Twisting (wringing) something wet until water comes out +Poking a hole into something soft +Spilling something next to something +Poking a stack of something without the stack collapsing +Putting something onto a slanted surface but it 
doesn't glide down +Pushing something onto something +Poking something so that it spins around +Spilling something behind something +Poking a hole into some substance diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/README.md new file mode 100644 index 0000000000000000000000000000000000000000..af112872da3177407d23c335acc85b37ba5f90d8 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/README.md @@ -0,0 +1,118 @@ +# Preparing Something-Something V2 + +## Introduction + + + +```BibTeX +@misc{goyal2017something, + title={The "something something" video database for learning and evaluating visual common sense}, + author={Raghav Goyal and Samira Ebrahimi Kahou and Vincent Michalski and Joanna Materzyńska and Susanne Westphal and Heuna Kim and Valentin Haenel and Ingo Fruend and Peter Yianilos and Moritz Mueller-Freitag and Florian Hoppe and Christian Thurau and Ingo Bax and Roland Memisevic}, + year={2017}, + eprint={1706.04261}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + +For basic dataset information, you can refer to the dataset [website](https://20bn.com/datasets/something-something/v2). +Before we start, please make sure that the directory is located at `$MMACTION2/tools/data/sthv2/`. + +## Step 1. Prepare Annotations + +First of all, you have to sign in and download annotations to `$MMACTION2/data/sthv2/annotations` on the official [website](https://20bn.com/datasets/something-something/v2). + +## Step 2. Prepare Videos + +Then, you can download all data parts to `$MMACTION2/data/sthv2/` and use the following command to uncompress. + +```shell +cd $MMACTION2/data/sthv2/ +cat 20bn-something-something-v2-?? | tar zx +cd $MMACTION2/tools/data/sthv2/ +``` + +## Step 3. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader. 
+ +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. + +You can run the following commands to create a soft link to the SSD. + +```shell +# Execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/sthv2_extracted/ +ln -s /mnt/SSD/sthv2_extracted/ ../../../data/sthv2/rawframes +``` + +If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. + +```shell +cd $MMACTION2/tools/data/sthv2/ +bash extract_rgb_frames.sh +``` + +If you didn't install denseflow, you can still extract RGB frames using OpenCV with the following script, but it will keep the original size of the images. + +```shell +cd $MMACTION2/tools/data/sthv2/ +bash extract_rgb_frames_opencv.sh +``` + +If both are required, run the following script to extract frames. + +```shell +cd $MMACTION2/tools/data/sthv2/ +bash extract_frames.sh +``` + +## Step 4. Generate File List + +You can run the following scripts to generate file lists in the rawframes and videos formats. + +```shell +cd $MMACTION2/tools/data/sthv2/ +bash generate_{rawframes, videos}_filelist.sh +``` + +## Step 5. Check Directory Structure + +After the whole data process for Something-Something V2 preparation, +you will get the rawframes (RGB + Flow), videos and annotation files for Something-Something V2. + +In the context of the whole project (for Something-Something V2 only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── sthv2 +│ │ ├── sthv2_{train,val}_list_rawframes.txt +│ │ ├── sthv2_{train,val}_list_videos.txt +│ │ ├── annotations +│ | ├── videos +│ | | ├── 1.mp4 +│ | | ├── 2.mp4 +│ | | ├──...
+│ | ├── rawframes +│ | | ├── 1 +│ | | | ├── img_00001.jpg +│ | | | ├── img_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_x_00001.jpg +│ | | | ├── flow_x_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_y_00001.jpg +│ | | | ├── flow_y_00002.jpg +│ | | | ├── ... +│ | | ├── 2 +│ | | ├── ... + +``` + +For training and evaluating on Something-Something V2, please refer to [getting_started.md](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..7d8080c5a48702555291ef9314ce3f5694d0e90b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/README_zh-CN.md @@ -0,0 +1,118 @@ +# Preparing Something-Something V2 + +## Introduction + + + +```BibTeX +@misc{goyal2017something, + title={The "something something" video database for learning and evaluating visual common sense}, + author={Raghav Goyal and Samira Ebrahimi Kahou and Vincent Michalski and Joanna Materzyńska and Susanne Westphal and Heuna Kim and Valentin Haenel and Ingo Fruend and Peter Yianilos and Moritz Mueller-Freitag and Florian Hoppe and Christian Thurau and Ingo Bax and Roland Memisevic}, + year={2017}, + eprint={1706.04261}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + +For basic dataset information, refer to the dataset's [official website](https://20bn.com/datasets/something-something/v2). +Before preparing the dataset, make sure the current working directory is `$MMACTION2/tools/data/sthv2/`. + +## Step 1. Download Annotations + +First, you need to register on the [official website](https://20bn.com/datasets/something-something/v2) before you can download the annotations. Place the downloaded annotation files in `$MMACTION2/data/sthv2/annotations`. + +## Step 2. Prepare Videos + +Then, place the downloaded archive parts in `$MMACTION2/data/sthv2/` and uncompress them with the following commands. + +```shell +cd $MMACTION2/data/sthv2/ +cat 20bn-something-something-v2-?? | tar zx +cd $MMACTION2/tools/data/sthv2/ +``` + +## Step 3.
Extract RGB Frames and Optical Flow + +This part is **optional** if you only want to train with the video loader. + +Before extracting frames and optical flow, please refer to the [installation guide](/docs_zh_CN/install.md) to install [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, we recommend storing the extracted frames on the SSD for better I/O performance. + +You can run the following commands to create a soft link to the SSD. + +```shell +# Execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/sthv2_extracted/ +ln -s /mnt/SSD/sthv2_extracted/ ../../../data/sthv2/rawframes +``` + +If you only need RGB frames (since extracting optical flow can be very time-consuming), consider running the following commands to extract **RGB frames only** using denseflow. + +```shell +cd $MMACTION2/tools/data/sthv2/ +bash extract_rgb_frames.sh +``` + +If you have not installed denseflow, you can run the following commands to extract RGB frames using OpenCV. However, this method can only extract frames at the original video resolution. + +```shell +cd $MMACTION2/tools/data/sthv2/ +bash extract_rgb_frames_opencv.sh +``` + +If you want to extract both RGB frames and optical flow, run the following script. + +```shell +cd $MMACTION2/tools/data/sthv2/ +bash extract_frames.sh +``` + +## Step 4. Generate File List + +You can run the following commands to generate file lists in the rawframes and videos formats. + +```shell +cd $MMACTION2/tools/data/sthv2/ +bash generate_{rawframes, videos}_filelist.sh +``` + +## Step 5. Check Directory Structure + +After completing the whole Something-Something V2 preparation process, +you will have the corresponding raw frames (RGB + optical flow), videos, and annotation files. + +Within the MMAction2 directory, the Something-Something V2 file structure is as follows: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── sthv2 +│ │ ├── sthv2_{train,val}_list_rawframes.txt +│ │ ├── sthv2_{train,val}_list_videos.txt +│ │ ├── annotations +│ | ├── videos +│ | | ├── 1.mp4 +│ | | ├── 2.mp4 +│ | | ├──... +│ | ├── rawframes +│ | | ├── 1 +│ | | | ├── img_00001.jpg +│ | | | ├── img_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_x_00001.jpg +│ | | | ├── flow_x_00002.jpg +│ | | | ├── ... +│ | | | ├── flow_y_00001.jpg +│ | | | ├── flow_y_00002.jpg +│ | | | ├── ... +│ | | ├── 2 +│ | | ├── ...
+ +``` + +For training and evaluating on Something-Something V2, please refer to the [getting started guide](/docs_zh_CN/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..22bc4360565f875766c1cd72288638cba051c67c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_frames.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/sthv2/videos/ ../../data/sthv2/rawframes/ --task both --level 1 --flow-type tvl1 --ext webm +echo "Raw frames (RGB and tv-l1) Generated" +cd sthv2/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_rgb_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..5b6d58d098b16d6afcb12aa66da97f239be62779 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_rgb_frames.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/sthv2/videos/ ../../data/sthv2/rawframes/ --task rgb --level 1 --ext webm +echo "Generated raw frames (RGB only)" + +cd sthv2/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_rgb_frames_opencv.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_rgb_frames_opencv.sh new file mode 100644 index 0000000000000000000000000000000000000000..5805d6185b354ad63929717f5c51a671ccb73b7b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/extract_rgb_frames_opencv.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/sthv2/videos/ ../../data/sthv2/rawframes/ --task rgb --level 1 --ext webm --use-opencv +echo "Generated raw frames (RGB only)" + +cd sthv2/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/generate_rawframes_filelist.sh
b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..da6f938b3e8f5275e001157e355e49abd7c252c0 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/generate_rawframes_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py sthv2 data/sthv2/rawframes/ --num-split 1 --level 1 --subset train --format rawframes --shuffle +PYTHONPATH=. python tools/data/build_file_list.py sthv2 data/sthv2/rawframes/ --num-split 1 --level 1 --subset val --format rawframes --shuffle +echo "Filelist for rawframes generated." + +cd tools/data/sthv2/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..bef255446ce182bbad3605e67d1c855f80623ea4 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/generate_videos_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/build_file_list.py sthv2 data/sthv2/videos/ --num-split 1 --level 1 --subset train --format videos --shuffle +PYTHONPATH=. python tools/data/build_file_list.py sthv2 data/sthv2/videos/ --num-split 1 --level 1 --subset val --format videos --shuffle +echo "Filelist for videos generated." 
+ +cd tools/data/sthv2/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..7dbb309b3434494b76a73362f85a4e82c4a7e81a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/sthv2/label_map.txt @@ -0,0 +1,174 @@ +Approaching something with your camera +Attaching something to something +Bending something so that it deforms +Bending something until it breaks +Burying something in something +Closing something +Covering something with something +Digging something out of something +Dropping something behind something +Dropping something in front of something +Dropping something into something +Dropping something next to something +Dropping something onto something +Failing to put something into something because something does not fit +Folding something +Hitting something with something +Holding something +Holding something behind something +Holding something in front of something +Holding something next to something +Holding something over something +Laying something on the table on its side, not upright +Letting something roll along a flat surface +Letting something roll down a slanted surface +Letting something roll up a slanted surface, so it rolls back down +Lifting a surface with something on it but not enough for it to slide down +Lifting a surface with something on it until it starts sliding down +Lifting something up completely without letting it drop down +Lifting something up completely, then letting it drop down +Lifting something with something on it +Lifting up one end of something without letting it drop down +Lifting up one end of something, then letting it drop down +Moving away from something with your camera +Moving part of something +Moving something across a surface until it falls down +Moving something across a surface without it falling down +Moving something and something away from 
each other +Moving something and something closer to each other +Moving something and something so they collide with each other +Moving something and something so they pass each other +Moving something away from something +Moving something away from the camera +Moving something closer to something +Moving something down +Moving something towards the camera +Moving something up +Opening something +Picking something up +Piling something up +Plugging something into something +Plugging something into something but pulling it right out as you remove your hand +Poking a hole into some substance +Poking a hole into something soft +Poking a stack of something so the stack collapses +Poking a stack of something without the stack collapsing +Poking something so it slightly moves +Poking something so lightly that it doesn't or almost doesn't move +Poking something so that it falls over +Poking something so that it spins around +Pouring something into something +Pouring something into something until it overflows +Pouring something onto something +Pouring something out of something +Pretending or failing to wipe something off of something +Pretending or trying and failing to twist something +Pretending to be tearing something that is not tearable +Pretending to close something without actually closing it +Pretending to open something without actually opening it +Pretending to pick something up +Pretending to poke something +Pretending to pour something out of something, but something is empty +Pretending to put something behind something +Pretending to put something into something +Pretending to put something next to something +Pretending to put something on a surface +Pretending to put something onto something +Pretending to put something underneath something +Pretending to scoop something up with something +Pretending to spread air onto something +Pretending to sprinkle air onto something +Pretending to squeeze something +Pretending to take something from somewhere 
+Pretending to take something out of something +Pretending to throw something +Pretending to turn something upside down +Pulling something from behind of something +Pulling something from left to right +Pulling something from right to left +Pulling something onto something +Pulling something out of something +Pulling two ends of something but nothing happens +Pulling two ends of something so that it gets stretched +Pulling two ends of something so that it separates into two pieces +Pushing something from left to right +Pushing something from right to left +Pushing something off of something +Pushing something onto something +Pushing something so it spins +Pushing something so that it almost falls off but doesn't +Pushing something so that it falls off the table +Pushing something so that it slightly moves +Pushing something with something +Putting number of something onto something +Putting something and something on the table +Putting something behind something +Putting something in front of something +Putting something into something +Putting something next to something +Putting something on a flat surface without letting it roll +Putting something on a surface +Putting something on the edge of something so it is not supported and falls down +Putting something onto a slanted surface but it doesn't glide down +Putting something onto something +Putting something onto something else that cannot support it so it falls down +Putting something similar to other things that are already on the table +Putting something that can't roll onto a slanted surface, so it slides down +Putting something that can't roll onto a slanted surface, so it stays where it is +Putting something that cannot actually stand upright upright on the table, so it falls on its side +Putting something underneath something +Putting something upright on the table +Putting something, something and something on the table +Removing something, revealing something behind +Rolling something on a flat surface 
+Scooping something up with something +Showing a photo of something to the camera +Showing something behind something +Showing something next to something +Showing something on top of something +Showing something to the camera +Showing that something is empty +Showing that something is inside something +Something being deflected from something +Something colliding with something and both are being deflected +Something colliding with something and both come to a halt +Something falling like a feather or paper +Something falling like a rock +Spilling something behind something +Spilling something next to something +Spilling something onto something +Spinning something so it continues spinning +Spinning something that quickly stops spinning +Spreading something onto something +Sprinkling something onto something +Squeezing something +Stacking number of something +Stuffing something into something +Taking one of many similar things on the table +Taking something from somewhere +Taking something out of something +Tearing something into two pieces +Tearing something just a little bit +Throwing something +Throwing something against something +Throwing something in the air and catching it +Throwing something in the air and letting it fall +Throwing something onto a surface +Tilting something with something on it slightly so it doesn't fall down +Tilting something with something on it until it falls off +Tipping something over +Tipping something with something in it over, so something in it falls out +Touching (without moving) part of something +Trying but failing to attach something to something because it doesn't stick +Trying to bend something unbendable so nothing happens +Trying to pour something into something, but missing so it spills next to it +Turning something upside down +Turning the camera downwards while filming something +Turning the camera left while filming something +Turning the camera right while filming something +Turning the camera upwards while filming 
something +Twisting (wringing) something wet until water comes out +Twisting something +Uncovering something +Unfolding something +Wiping something off of something diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/README.md new file mode 100644 index 0000000000000000000000000000000000000000..eaddb60cbebe7e1ea2c6dc202a5e00875f00df3d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/README.md @@ -0,0 +1,142 @@ +# Preparing THUMOS'14 + +## Introduction + + + +```BibTeX +@misc{THUMOS14, + author = {Jiang, Y.-G. and Liu, J. and Roshan Zamir, A. and Toderici, G. and Laptev, + I. and Shah, M. and Sukthankar, R.}, + title = {{THUMOS} Challenge: Action Recognition with a Large + Number of Classes}, + howpublished = "\url{http://crcv.ucf.edu/THUMOS14/}", + Year = {2014} +} +``` + +For basic dataset information, you can refer to the dataset [website](https://www.crcv.ucf.edu/THUMOS14/download.html). +Before we start, please make sure that the directory is located at `$MMACTION2/tools/data/thumos14/`. + +## Step 1. Prepare Annotations + +First of all, run the following script to prepare annotations. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash download_annotations.sh +``` + +## Step 2. Prepare Videos + +Then, you can run the following script to prepare videos. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash download_videos.sh +``` + +## Step 3. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. + +You can run the following script to soft link SSD. 
+ +```shell +# Execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/thumos14_extracted/ +ln -s /mnt/SSD/thumos14_extracted/ ../../../data/thumos14/rawframes +``` + +If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash extract_rgb_frames.sh +``` + +If you didn't install denseflow, you can still extract RGB frames using OpenCV with the following script, but it will keep the original size of the images. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash extract_rgb_frames_opencv.sh +``` + +If both are required, run the following script to extract frames. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash extract_frames.sh tvl1 +``` + +## Step 4. Fetch File List + +This part is **optional** if you do not use the SSN model. + +You can run the following script to fetch pre-computed tag proposals. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash fetch_tag_proposals.sh +``` + +## Step 5. Denormalize Proposal File + +This part is **optional** if you do not use the SSN model. + +You can run the following script to denormalize pre-computed tag proposals according to +the actual number of local rawframes. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash denormalize_proposal_file.sh +``` + +## Step 6. Check Directory Structure + +After the whole data process for THUMOS'14 preparation, +you will get the rawframes (RGB + Flow), videos and annotation files for THUMOS'14.
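Before comparing a local checkout against the expected layout, it can help to see which directories the preparation steps actually produced. The following is a minimal sketch, not part of MMAction2: the function name `check_thumos14_layout` and its default root are hypothetical, and the sub-directory list is taken from the steps in this README (the `rawframes` and `proposals` entries exist only if the optional steps were run).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MMAction2): report which of the
# THUMOS'14 sub-directories produced by the preparation steps exist.
# rawframes/* and proposals only appear if the optional steps were run.
check_thumos14_layout() {
    root="${1:-data/thumos14}"   # assumed default, relative to $MMACTION2
    missing=0
    for sub in annotations_val annotations_test \
               videos/val videos/test \
               rawframes/val rawframes/test proposals; do
        if [ -d "${root}/${sub}" ]; then
            echo "ok ${sub}"
        else
            echo "missing ${sub}"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every listed directory is present.
    [ "${missing}" -eq 0 ]
}
```

For example, `check_thumos14_layout "$MMACTION2/data/thumos14"` prints one line per directory and returns a non-zero status if any of them is missing.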
+ +In the context of the whole project (for THUMOS'14 only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── thumos14 +│ │ ├── proposals +│ │ | ├── thumos14_tag_val_normalized_proposal_list.txt +│ │ | ├── thumos14_tag_test_normalized_proposal_list.txt +│ │ ├── annotations_val +│ │ ├── annotations_test +│ │ ├── videos +│ │ │ ├── val +│ │ │ | ├── video_validation_0000001.mp4 +│ │ │ | ├── ... +│ │ | ├── test +│ │ │ | ├── video_test_0000001.mp4 +│ │ │ | ├── ... +│ │ ├── rawframes +│ │ │ ├── val +│ │ │ | ├── video_validation_0000001 +| │ │ | │ ├── img_00001.jpg +| │ │ | │ ├── img_00002.jpg +| │ │ | │ ├── ... +| │ │ | │ ├── flow_x_00001.jpg +| │ │ | │ ├── flow_x_00002.jpg +| │ │ | │ ├── ... +| │ │ | │ ├── flow_y_00001.jpg +| │ │ | │ ├── flow_y_00002.jpg +| │ │ | │ ├── ... +│ │ │ | ├── ... +│ │ | ├── test +│ │ │ | ├── video_test_0000001 +``` + +For training and evaluating on THUMOS'14, please refer to [getting_started.md](/docs/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..fb7140a24e60bb1e48824e24cd45f4b726d5c730 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/README_zh-CN.md @@ -0,0 +1,139 @@ +# Preparing THUMOS'14 + +## Introduction + + + +```BibTex +@misc{THUMOS14, + author = {Jiang, Y.-G. and Liu, J. and Roshan Zamir, A. and Toderici, G. and Laptev, + I. and Shah, M. and Sukthankar, R.}, + title = {{THUMOS} Challenge: Action Recognition with a Large + Number of Classes}, + howpublished = "\url{http://crcv.ucf.edu/THUMOS14/}", + Year = {2014} +} +``` + +For basic dataset information, refer to the dataset's [official website](https://www.crcv.ucf.edu/THUMOS14/download.html). +Before preparing the dataset, make sure the current working directory is `$MMACTION2/tools/data/thumos14/`. + +## Step 1.
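Download Annotations + +First, run the following commands to download the annotations. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash download_annotations.sh +``` + +## Step 2. Download Videos + +Then, run the following commands to download the videos. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash download_videos.sh +``` + +## Step 3. Extract Frames and Optical Flow + +This part is **optional** if you only want to train with the video loader. + +Before extracting frames and optical flow, please refer to the [installation guide](/docs_zh_CN/install.md) to install [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, we recommend storing the extracted frames on the SSD for better I/O performance. +You can run the following commands to create a soft link to the SSD. + +```shell +# Execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/thumos14_extracted/ +ln -s /mnt/SSD/thumos14_extracted/ ../../../data/thumos14/rawframes +``` + +If you only need RGB frames (since extracting optical flow can be very time-consuming), consider running the following commands to extract **RGB frames only** using denseflow. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash extract_rgb_frames.sh +``` + +If you have not installed denseflow, you can run the following commands to extract RGB frames using OpenCV. However, this method can only extract frames at the original video resolution. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash extract_rgb_frames_opencv.sh +``` + +If you want to extract both RGB frames and optical flow, run the following script. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash extract_frames.sh tvl1 +``` + +## Step 4. Generate File List + +This part is **optional** if you do not use the SSN model. + +You can run the following script to download the pre-computed tag proposals. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash fetch_tag_proposals.sh +``` + +## Step 5. Denormalize Proposal File + +This part is **optional** if you do not use the SSN model. + +You can run the following script to denormalize the pre-computed tag proposals according to the actual number of local raw frames. + +```shell +cd $MMACTION2/tools/data/thumos14/ +bash denormalize_proposal_file.sh +``` + +## Step 6. Check Directory Structure + +After completing the THUMOS'14 preparation process, you will have the raw frames (RGB + optical flow), videos, and annotation files for THUMOS'14. + +Within the MMAction2 directory, the THUMOS'14 file structure is as follows: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── thumos14 +│ │ ├── proposals +│ │ | ├── thumos14_tag_val_normalized_proposal_list.txt +│ │ | ├── thumos14_tag_test_normalized_proposal_list.txt +│ │ ├── annotations_val +│ │ ├── annotations_test +│ │ ├── videos +│ │ │ ├── val +│ │ │ | ├── video_validation_0000001.mp4 +│ │ │ | ├── ... +│ │ | ├── test +│ │ │ | ├── video_test_0000001.mp4 +│ │ │ | ├── ...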
+│ │ ├── rawframes +│ │ │ ├── val +│ │ │ | ├── video_validation_0000001 +| │ │ | │ ├── img_00001.jpg +| │ │ | │ ├── img_00002.jpg +| │ │ | │ ├── ... +| │ │ | │ ├── flow_x_00001.jpg +| │ │ | │ ├── flow_x_00002.jpg +| │ │ | │ ├── ... +| │ │ | │ ├── flow_y_00001.jpg +| │ │ | │ ├── flow_y_00002.jpg +| │ │ | │ ├── ... +│ │ │ | ├── ... +│ │ | ├── test +│ │ │ | ├── video_test_0000001 +``` + +For training and evaluating on THUMOS'14, please refer to the [getting started guide](/docs_zh_CN/getting_started.md). diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/denormalize_proposal_file.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/denormalize_proposal_file.sh new file mode 100644 index 0000000000000000000000000000000000000000..92f561a40c84385bd9a202666a7b467d3fa79c1f --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/denormalize_proposal_file.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +cd ../../../ +PYTHONPATH=. python tools/data/denormalize_proposal_file.py thumos14 --norm-proposal-file data/thumos14/proposals/thumos14_tag_val_normalized_proposal_list.txt --data-prefix data/thumos14/rawframes/val/ +echo "Proposal file denormalized for val set" + +PYTHONPATH=. python tools/data/denormalize_proposal_file.py thumos14 --norm-proposal-file data/thumos14/proposals/thumos14_tag_test_normalized_proposal_list.txt --data-prefix data/thumos14/rawframes/test/ +echo "Proposal file denormalized for test set" + +cd tools/data/thumos14/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..fc8473f695af726cd4cccce3097ff17834c9f328 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/download_annotations.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash + +DATA_DIR="../../../data/thumos14/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist.
Creating"; + mkdir -p ${DATA_DIR} +fi +cd ${DATA_DIR} + +wget http://crcv.ucf.edu/THUMOS14/Validation_set/TH14_Temporal_annotations_validation.zip --no-check-certificate +wget http://crcv.ucf.edu/THUMOS14/test_set/TH14_Temporal_annotations_test.zip --no-check-certificate + +if [ ! -d "./annotations_val" ]; then + mkdir ./annotations_val +fi +unzip -j TH14_Temporal_annotations_validation.zip -d annotations_val + +if [ ! -d "./annotations_test" ]; then + mkdir ./annotations_test +fi +unzip -j TH14_Temporal_annotations_test.zip -d annotations_test + +rm TH14_Temporal_annotations_validation.zip +rm TH14_Temporal_annotations_test.zip + +cd "../../tools/data/thumos14/" diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..e987bb4e98b310ad1d590f79b395b44e29bdc905 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/download_videos.sh @@ -0,0 +1,25 @@ +#!/usr/bin/env bash + +DATA_DIR="../../../data/thumos14/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +cd ${DATA_DIR} + +wget https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip +wget https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip + +if [ ! -d "./videos/val" ]; then + mkdir -p ./videos/val +fi +unzip -j TH14_validation_set_mp4.zip -d videos/val + +if [ ! 
-d "./videos/test" ]; then + mkdir -p ./videos/test +fi +unzip -P "THUMOS14_REGISTERED" -j TH14_Test_set_mp4.zip -d videos/test + +cd "../../tools/data/thumos14/" diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..edf6be2b80580ef9f590e33f7048ef1cb4c4622d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_frames.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/thumos14/videos/val/ ../../data/thumos14/rawframes/val/ --level 1 --flow-type tvl1 --ext mp4 --task both +echo "Raw frames (RGB and tv-l1) Generated for val set" + +python build_rawframes.py ../../data/thumos14/videos/test/ ../../data/thumos14/rawframes/test/ --level 1 --flow-type tvl1 --ext mp4 --task both +echo "Raw frames (RGB and tv-l1) Generated for test set" + +cd thumos14/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_rgb_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..776575f9430e2c73b65edc737ee8b31963d04030 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_rgb_frames.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/thumos14/videos/val/ ../../data/thumos14/rawframes/val/ --level 1 --ext mp4 --task rgb +echo "Raw frames (RGB only) generated for val set" + +python build_rawframes.py ../../data/thumos14/videos/test/ ../../data/thumos14/rawframes/test/ --level 1 --ext mp4 --task rgb +echo "Raw frames (RGB only) generated for test set" + +cd thumos14/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_rgb_frames_opencv.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_rgb_frames_opencv.sh new file mode 100644 index 
0000000000000000000000000000000000000000..d4fad08d800556b36bdbaca912e49606fbbccf7a --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/extract_rgb_frames_opencv.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/thumos14/videos/val/ ../../data/thumos14/rawframes/val/ --level 1 --ext mp4 --task rgb --use-opencv +echo "Raw frames (RGB only) generated for val set" + +python build_rawframes.py ../../data/thumos14/videos/test/ ../../data/thumos14/rawframes/test/ --level 1 --ext mp4 --task rgb --use-opencv +echo "Raw frames (RGB only) generated for test set" + +cd thumos14/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/fetch_tag_proposals.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/fetch_tag_proposals.sh new file mode 100644 index 0000000000000000000000000000000000000000..39f05fd1e4e9071cfc040b822120ce331d8e1f5d --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/thumos14/fetch_tag_proposals.sh @@ -0,0 +1,11 @@ +#!/usr/bin/env bash + +PROP_DIR="../../../data/thumos14/proposals" + +if [[ ! -d "${PROP_DIR}" ]]; then + echo "${PROP_DIR} does not exist. Creating"; + mkdir -p ${PROP_DIR} +fi + +wget https://download.openmmlab.com/mmaction/dataset/thumos14/thumos14_tag_val_normalized_proposal_list.txt -P ${PROP_DIR} +wget https://download.openmmlab.com/mmaction/dataset/thumos14/thumos14_tag_test_normalized_proposal_list.txt -P ${PROP_DIR} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9abaff1b9091e82441c9e8b716da6f97530a0d5c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/README.md @@ -0,0 +1,127 @@ +# Preparing UCF-101 + +## Introduction + + + +```BibTeX +@article{Soomro2012UCF101AD, + title={UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild}, + author={K. Soomro and A. 
Zamir and M. Shah}, + journal={ArXiv}, + year={2012}, + volume={abs/1212.0402} +} +``` + +For basic dataset information, you can refer to the dataset [website](https://www.crcv.ucf.edu/research/data-sets/ucf101/). +Before we start, please make sure that your current working directory is `$MMACTION2/tools/data/ucf101/`. + +## Step 1. Prepare Annotations + +First of all, you can run the following script to prepare annotations. + +```shell +bash download_annotations.sh +``` + +## Step 2. Prepare Videos + +Then, you can run the following script to prepare videos. + +```shell +bash download_videos.sh +``` + +For better decoding speed, you can resize the original videos into a smaller, densely encoded version with: + +``` +python ../resize_videos.py ../../../data/ucf101/videos/ ../../../data/ucf101/videos_256p_dense_cache --dense --level 2 --ext avi +``` + +## Step 3. Extract RGB and Flow + +This part is **optional** if you only want to use the video loader. + +Before extracting, please refer to [install.md](/docs/install.md) for installing [denseflow](https://github.com/open-mmlab/denseflow). + +If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. The extracted frames (RGB + Flow) will take up about 100 GB. + +You can run the following commands to create a soft link to the SSD. + +```shell +# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/") +mkdir /mnt/SSD/ucf101_extracted/ +ln -s /mnt/SSD/ucf101_extracted/ ../../../data/ucf101/rawframes +``` + +If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract **RGB-only** frames using denseflow. + +```shell +bash extract_rgb_frames.sh +``` + +If you have not installed denseflow, you can still extract RGB frames with OpenCV using the following script, but the frames will keep the original video resolution. 
+ +```shell +bash extract_rgb_frames_opencv.sh +``` + +If optical flow is also required, run the following script to extract flow using the "tvl1" algorithm. + +```shell +bash extract_frames.sh +``` + +## Step 4. Generate File List + +You can run the following scripts to generate file lists in the rawframes and videos formats. + +```shell +bash generate_videos_filelist.sh +bash generate_rawframes_filelist.sh +``` + +## Step 5. Check Directory Structure + +After completing the whole UCF-101 preparation process, +you will get the rawframes (RGB + Flow), videos and annotation files for UCF-101. + +In the context of the whole project (for UCF-101 only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ucf101 +│ │ ├── ucf101_{train,val}_split_{1,2,3}_rawframes.txt +│ │ ├── ucf101_{train,val}_split_{1,2,3}_videos.txt +│ │ ├── annotations +│ │ ├── videos +│ │ │ ├── ApplyEyeMakeup +│ │ │ │ ├── v_ApplyEyeMakeup_g01_c01.avi + +│ │ │ ├── YoYo +│ │ │ │ ├── v_YoYo_g25_c05.avi +│ │ ├── rawframes +│ │ │ ├── ApplyEyeMakeup +│ │ │ │ ├── v_ApplyEyeMakeup_g01_c01 +│ │ │ │ │ ├── img_00001.jpg +│ │ │ │ │ ├── img_00002.jpg +│ │ │ │ │ ├── ... +│ │ │ │ │ ├── flow_x_00001.jpg +│ │ │ │ │ ├── flow_x_00002.jpg +│ │ │ │ │ ├── ... +│ │ │ │ │ ├── flow_y_00001.jpg +│ │ │ │ │ ├── flow_y_00002.jpg +│ │ │ ├── ... +│ │ │ ├── YoYo +│ │ │ │ ├── v_YoYo_g01_c01 +│ │ │ │ ├── ... +│ │ │ │ ├── v_YoYo_g25_c05 + +``` + +For training and evaluating on UCF-101, please refer to [getting_started.md](/docs/getting_started.md). 
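As a rough sanity check for the UCF-101 layout described in Step 5, the following sketch reports which expected entries are absent under `data/ucf101`. It is not part of MMAction2: the helper names `expected_filelists` and `missing_ucf101_entries` are hypothetical, and the list of expected entries is an assumption read off the directory tree above (three top-level folders plus the `ucf101_{train,val}_split_{1,2,3}_{rawframes,videos}.txt` file lists).

```python
from pathlib import Path

# Assumption: these three folders are the ones shown in the Step 5 tree.
EXPECTED_DIRS = ("annotations", "videos", "rawframes")


def expected_filelists():
    """Yield names matching ucf101_{train,val}_split_{1,2,3}_{fmt}.txt."""
    for fmt in ("rawframes", "videos"):
        for subset in ("train", "val"):
            for split in (1, 2, 3):
                yield f"ucf101_{subset}_split_{split}_{fmt}.txt"


def missing_ucf101_entries(root):
    """Return a sorted list of expected entries that do not exist under `root`."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in expected_filelists() if not (root / f).is_file()]
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical invocation from the MMAction2 repository root.
    for name in missing_ucf101_entries("data/ucf101"):
        print(f"missing: {name}")
```

If you skipped the optional frame extraction in Step 3, the `rawframes` folder and the `*_rawframes.txt` lists are expected to show up as missing; only the video-related entries matter in that case.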
diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..96e9453ff4f93d282563a90c28c3f8e1ad45e8b6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/README_zh-CN.md @@ -0,0 +1,125 @@ +# 准备 UCF-101 + +## 简介 + +```BibTeX +@article{Soomro2012UCF101AD, + title={UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild}, + author={K. Soomro and A. Zamir and M. Shah}, + journal={ArXiv}, + year={2012}, + volume={abs/1212.0402} +} +``` + +用户可参考该数据集的 [官网](https://www.crcv.ucf.edu/research/data-sets/ucf101/),以获取数据集相关的基本信息。 +在数据集准备前,请确保命令行当前路径为 `$MMACTION2/tools/data/ucf101/`。 + +## 步骤 1. 下载标注文件 + +首先,用户可运行以下脚本下载标注文件。 + +```shell +bash download_annotations.sh +``` + +## 步骤 2. 准备视频文件 + +之后,用户可运行以下脚本准备视频文件。 + +```shell +bash download_videos.sh +``` + +用户可使用以下脚本,对原视频进行裁剪,得到密集编码且更小尺寸的视频。 + +``` +python ../resize_videos.py ../../../data/ucf101/videos/ ../../../data/ucf101/videos_256p_dense_cache --dense --level 2 --ext avi +``` + +## 步骤 3. 抽取视频帧和光流 + +如果用户只想使用视频加载训练,则该部分是 **可选项**。 + +在抽取视频帧和光流之前,请参考 [安装指南](/docs_zh_CN/install.md) 安装 [denseflow](https://github.com/open-mmlab/denseflow)。 + +如果拥有大量的 SSD 存储空间,则推荐将抽取的帧存储至 I/O 性能更优秀的 SSD 中。所抽取的视频帧和光流约占据 100 GB 的存储空间。 + +可以运行以下命令为 SSD 建立软链接。 + +```shell +# 执行这两行进行抽取(假设 SSD 挂载在 "/mnt/SSD/") +mkdir /mnt/SSD/ucf101_extracted/ +ln -s /mnt/SSD/ucf101_extracted/ ../../../data/ucf101/rawframes +``` + +如果用户需要抽取 RGB 帧(因为抽取光流的过程十分耗时),可以考虑运行以下命令使用 denseflow **只抽取 RGB 帧**。 + +```shell +bash extract_rgb_frames.sh +``` + +如果用户没有安装 denseflow,则可以运行以下命令使用 OpenCV 抽取 RGB 帧。然而,该方法只能抽取与原始视频分辨率相同的帧。 + +```shell +bash extract_rgb_frames_opencv.sh +``` + +如果用户想抽取 RGB 帧和光流,则可以运行以下脚本使用 "tvl1" 算法进行抽取。 + +```shell +bash extract_frames.sh +``` + +## 步骤 4. 
生成文件列表 + +用户可以通过运行以下命令生成帧和视频格式的文件列表。 + +```shell +bash generate_videos_filelist.sh +bash generate_rawframes_filelist.sh +``` + +## 步骤 5. 检查文件夹结构 + +在完成所有 UCF-101 数据集准备流程后, +用户可以获得对应的 RGB + 光流文件,视频文件以及标注文件。 + +在整个 MMAction2 文件夹下,UCF-101 的文件结构如下: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ucf101 +│ │ ├── ucf101_{train,val}_split_{1,2,3}_rawframes.txt +│ │ ├── ucf101_{train,val}_split_{1,2,3}_videos.txt +│ │ ├── annotations +│ │ ├── videos +│ │ │ ├── ApplyEyeMakeup +│ │ │ │ ├── v_ApplyEyeMakeup_g01_c01.avi + +│ │ │ ├── YoYo +│ │ │ │ ├── v_YoYo_g25_c05.avi +│ │ ├── rawframes +│ │ │ ├── ApplyEyeMakeup +│ │ │ │ ├── v_ApplyEyeMakeup_g01_c01 +│ │ │ │ │ ├── img_00001.jpg +│ │ │ │ │ ├── img_00002.jpg +│ │ │ │ │ ├── ... +│ │ │ │ │ ├── flow_x_00001.jpg +│ │ │ │ │ ├── flow_x_00002.jpg +│ │ │ │ │ ├── ... +│ │ │ │ │ ├── flow_y_00001.jpg +│ │ │ │ │ ├── flow_y_00002.jpg +│ │ │ ├── ... +│ │ │ ├── YoYo +│ │ │ │ ├── v_YoYo_g01_c01 +│ │ │ │ ├── ... +│ │ │ │ ├── v_YoYo_g25_c05 + +``` + +关于对 UCF-101 进行训练和验证,可以参考 [基础教程](/docs_zh_CN/getting_started.md)。 diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/download_annotations.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/download_annotations.sh new file mode 100644 index 0000000000000000000000000000000000000000..7a8822b3573be77e3b97a9a77b20750e5b710171 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/download_annotations.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +DATA_DIR="../../../data/ucf101/annotations" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. 
Creating"; + mkdir -p ${DATA_DIR} +fi + +wget https://www.crcv.ucf.edu/wp-content/uploads/2019/03/UCF101TrainTestSplits-RecognitionTask.zip --no-check-certificate + +unzip -j UCF101TrainTestSplits-RecognitionTask.zip -d ${DATA_DIR}/ +rm UCF101TrainTestSplits-RecognitionTask.zip diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/download_videos.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/download_videos.sh new file mode 100644 index 0000000000000000000000000000000000000000..6efd98e199655170d2c41590e84f85e4b151c2c7 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/download_videos.sh @@ -0,0 +1,16 @@ +#!/usr/bin/env bash + +DATA_DIR="../../../data/ucf101/" + +if [[ ! -d "${DATA_DIR}" ]]; then + echo "${DATA_DIR} does not exist. Creating"; + mkdir -p ${DATA_DIR} +fi + +cd ${DATA_DIR} + +wget https://www.crcv.ucf.edu/datasets/human-actions/ucf101/UCF101.rar --no-check-certificate +unrar x UCF101.rar +mv ./UCF-101 ./videos + +cd "../../tools/data/ucf101" diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..b549b6e6fe9ea24ff5244769374ff0a361b990ce --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_frames.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/ucf101/videos/ ../../data/ucf101/rawframes/ --task both --level 2 --flow-type tvl1 +echo "Raw frames (RGB and Flow) Generated" +cd ucf101/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_rgb_frames.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_rgb_frames.sh new file mode 100644 index 0000000000000000000000000000000000000000..b39df7c6d09a727fc1eda6f8f3876f3fdf8c50fc --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_rgb_frames.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python 
build_rawframes.py ../../data/ucf101/videos/ ../../data/ucf101/rawframes/ --task rgb --level 2 --ext avi +echo "Raw frames (RGB only) generated" + +cd ucf101/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_rgb_frames_opencv.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_rgb_frames_opencv.sh new file mode 100644 index 0000000000000000000000000000000000000000..50d1ac326fdc1883955683568f1f2f5ea10d9375 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/extract_rgb_frames_opencv.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +cd ../ +python build_rawframes.py ../../data/ucf101/videos/ ../../data/ucf101/rawframes/ --task rgb --level 2 --ext avi --use-opencv +echo "Raw frames (RGB only) generated" + +cd ucf101/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/generate_rawframes_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/generate_rawframes_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..9b9ed9937d36fba444fa08a30ca8e60998d91981 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/generate_rawframes_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ + +PYTHONPATH=. python tools/data/build_file_list.py ucf101 data/ucf101/rawframes/ --level 2 --format rawframes --shuffle +echo "Filelist for rawframes generated." + +cd tools/data/ucf101/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/generate_videos_filelist.sh b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/generate_videos_filelist.sh new file mode 100644 index 0000000000000000000000000000000000000000..5f391437d2fe63f6dff30173c73b6fe957c90667 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/generate_videos_filelist.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +cd ../../../ + +PYTHONPATH=. python tools/data/build_file_list.py ucf101 data/ucf101/videos/ --level 2 --format videos --shuffle +echo "Filelist for videos generated." 
+ +cd tools/data/ucf101/ diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/label_map.txt b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/label_map.txt new file mode 100644 index 0000000000000000000000000000000000000000..dd41d095c7c503e2ca102aa077bfaaebe63a1008 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101/label_map.txt @@ -0,0 +1,101 @@ +ApplyEyeMakeup +ApplyLipstick +Archery +BabyCrawling +BalanceBeam +BandMarching +BaseballPitch +Basketball +BasketballDunk +BenchPress +Biking +Billiards +BlowDryHair +BlowingCandles +BodyWeightSquats +Bowling +BoxingPunchingBag +BoxingSpeedBag +BreastStroke +BrushingTeeth +CleanAndJerk +CliffDiving +CricketBowling +CricketShot +CuttingInKitchen +Diving +Drumming +Fencing +FieldHockeyPenalty +FloorGymnastics +FrisbeeCatch +FrontCrawl +GolfSwing +Haircut +Hammering +HammerThrow +HandstandPushups +HandstandWalking +HeadMassage +HighJump +HorseRace +HorseRiding +HulaHoop +IceDancing +JavelinThrow +JugglingBalls +JumpingJack +JumpRope +Kayaking +Knitting +LongJump +Lunges +MilitaryParade +Mixing +MoppingFloor +Nunchucks +ParallelBars +PizzaTossing +PlayingCello +PlayingDaf +PlayingDhol +PlayingFlute +PlayingGuitar +PlayingPiano +PlayingSitar +PlayingTabla +PlayingViolin +PoleVault +PommelHorse +PullUps +Punch +PushUps +Rafting +RockClimbingIndoor +RopeClimbing +Rowing +SalsaSpin +ShavingBeard +Shotput +SkateBoarding +Skiing +Skijet +SkyDiving +SoccerJuggling +SoccerPenalty +StillRings +SumoWrestling +Surfing +Swing +TableTennisShot +TaiChi +TennisSwing +ThrowDiscus +TrampolineJumping +Typing +UnevenBars +VolleyballSpiking +WalkingWithDog +WallPushups +WritingOnBoard +YoYo diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101_24/README.md b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101_24/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8d637965b8fa4526cc38c71ac78952cd8220d912 --- /dev/null +++ 
b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101_24/README.md @@ -0,0 +1,89 @@ +# Preparing UCF101-24 + +## Introduction + + + +```BibTeX +@article{Soomro2012UCF101AD, + title={UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild}, + author={K. Soomro and A. Zamir and M. Shah}, + journal={ArXiv}, + year={2012}, + volume={abs/1212.0402} +} +``` + +For basic dataset information, you can refer to the dataset [website](http://www.thumos.info/download.html). +Before we start, please make sure that your current working directory is `$MMACTION2/tools/data/ucf101_24/`. + +## Download and Extract + +You can download the RGB frames, optical flow and ground truth annotations from [Google Drive](https://drive.google.com/drive/folders/1BvGywlAGrACEqRyfYbz3wzlVV3cDFkct). +The data are provided by [MOC](https://github.com/MCG-NJU/MOC-Detector/blob/master/readme/Dataset.md), which adapted them from [act-detector](https://github.com/vkalogeiton/caffe/tree/act-detector) and [corrected-UCF101-Annots](https://github.com/gurkirt/corrected-UCF101-Annots). + +:::{note} +The annotations for this version of UCF101-24 come from [here](https://github.com/gurkirt/corrected-UCF101-Annots), which is more accurate than other available annotations. +::: + +After downloading the `UCF101_v2.tar.gz` file and placing it in `$MMACTION2/tools/data/ucf101_24/`, you can run the following command to extract it. + +```shell +tar -zxvf UCF101_v2.tar.gz +``` + +## Check Directory Structure + +After extraction, you will get the `rgb-images` directory, the `brox-images` directory and `UCF101v2-GT.pkl` for UCF101-24. + +In the context of the whole project (for UCF101-24 only), the folder structure will look like: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ucf101_24 +│ | ├── brox-images +│ | | ├── Basketball +│ | | | ├── v_Basketball_g01_c01 +│ | | | | ├── 00001.jpg +│ | | | | ├── 00002.jpg +│ | | | | ├── ... 
+│ | | | | ├── 00140.jpg +│ | | | | ├── 00141.jpg +│ | | ├── ... 
+│ | | ├── WalkingWithDog +│ | | | ├── v_WalkingWithDog_g01_c01 +│ | | | ├── ... +│ | | | ├── v_WalkingWithDog_g25_c04 +│ | ├── rgb-images +│ | | ├── Basketball +│ | | | ├── v_Basketball_g01_c01 +│ | | | | ├── 00001.jpg +│ | | | | ├── 00002.jpg +│ | | | | ├── ... +│ | | | | ├── 00140.jpg +│ | | | | ├── 00141.jpg +│ | | ├── ... +│ | | ├── WalkingWithDog +│ | | | ├── v_WalkingWithDog_g01_c01 +│ | | | ├── ... +│ | | | ├── v_WalkingWithDog_g25_c04 +│ | ├── UCF101v2-GT.pkl + +``` + +:::{note} +`UCF101v2-GT.pkl` serves as a cache; it contains the following 6 items: +::: + +1. `labels` (list): List of the 24 labels. +2. `gttubes` (dict): Dictionary that contains the ground truth tubes for each video. + A **gttube** is a dictionary that maps each label index to a list of tubes. + A **tube** is a numpy array with `nframes` rows and 5 columns; each row has the format `<frame index> <x1> <y1> <x2> <y2>`. +3. `nframes` (dict): Dictionary that contains the number of frames for each video, like `'HorseRiding/v_HorseRiding_g05_c02': 151`. +4. `train_videos` (list): A list with `nsplits=1` elements, each one containing the list of training videos. +5. `test_videos` (list): A list with `nsplits=1` elements, each one containing the list of testing videos. +6. `resolution` (dict): Dictionary that maps each video to a tuple (h,w) giving its resolution, like `'FloorGymnastics/v_FloorGymnastics_g09_c03': (240, 320)`. diff --git a/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101_24/README_zh-CN.md b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101_24/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..1e91b2518b05c20c531e66c8155ec1af5b28bded --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/data/ucf101_24/README_zh-CN.md @@ -0,0 +1,84 @@ +# 准备 UCF101-24 + +## 简介 + +```BibTeX +@article{Soomro2012UCF101AD, + title={UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild}, + author={K. Soomro and A. Zamir and M. 
Shah}, + journal={ArXiv}, + year={2012}, + volume={abs/1212.0402} +} +``` + +用户可参考该数据集的 [官网](http://www.thumos.info/download.html),以获取数据集相关的基本信息。 +在数据集准备前,请确保命令行当前路径为 `$MMACTION2/tools/data/ucf101_24/`。 + +## 下载和解压 + +用户可以从 [这里](https://drive.google.com/drive/folders/1BvGywlAGrACEqRyfYbz3wzlVV3cDFkct) 下载 RGB 帧,光流和标注文件。 +该数据由 [MOC](https://github.com/MCG-NJU/MOC-Detector/blob/master/readme/Dataset.md) 代码库提供, +参考自 [act-detector](https://github.com/vkalogeiton/caffe/tree/act-detector) 和 [corrected-UCF101-Annots](https://github.com/gurkirt/corrected-UCF101-Annots)。 + +**注意**:UCF101-24 的标注文件来自于 [这里](https://github.com/gurkirt/corrected-UCF101-Annots),该标注文件相对于其他标注文件更加准确。 + +用户在下载 `UCF101_v2.tar.gz` 文件后,需将其放置在 `$MMACTION2/tools/data/ucf101_24/` 目录下,并使用以下指令进行解压: + +```shell +tar -zxvf UCF101_v2.tar.gz +``` + +## 检查文件夹结构 + +经过解压后,用户将得到 `rgb-images` 文件夹,`brox-images` 文件夹和 `UCF101v2-GT.pkl` 文件。 + +在整个 MMAction2 文件夹下,UCF101_24 的文件结构如下: + +``` +mmaction2 +├── mmaction +├── tools +├── configs +├── data +│ ├── ucf101_24 +│ | ├── brox-images +│ | | ├── Basketball +│ | | | ├── v_Basketball_g01_c01 +│ | | | | ├── 00001.jpg +│ | | | | ├── 00002.jpg +│ | | | | ├── ... +│ | | | | ├── 00140.jpg +│ | | | | ├── 00141.jpg +│ | | ├── ... +│ | | ├── WalkingWithDog +│ | | | ├── v_WalkingWithDog_g01_c01 +│ | | | ├── ... +│ | | | ├── v_WalkingWithDog_g25_c04 +│ | ├── rgb-images +│ | | ├── Basketball +│ | | | ├── v_Basketball_g01_c01 +│ | | | | ├── 00001.jpg +│ | | | | ├── 00002.jpg +│ | | | | ├── ... +│ | | | | ├── 00140.jpg +│ | | | | ├── 00141.jpg +│ | | ├── ... +│ | | ├── WalkingWithDog +│ | | | ├── v_WalkingWithDog_g01_c01 +│ | | | ├── ... +│ | | | ├── v_WalkingWithDog_g25_c04 +│ | ├── UCF101v2-GT.pkl + +``` + +**注意**:`UCF101v2-GT.pkl` 作为一个缓存文件,它包含 6 个项目: + +1. `labels` (list):24 个行为类别名称组成的列表 +2. `gttubes` (dict):每个视频对应的基准 tubes 组成的字典 + **gttube** 是由标签索引和 tube 列表组成的字典 + **tube** 是一个 `nframes` 行和 5 列的 numpy array,每一列的形式如 ` ` +3. 
`nframes` (dict):用以表示每个视频对应的帧数,如 `'HorseRiding/v_HorseRiding_g05_c02': 151` +4. `train_videos` (list):包含 `nsplits=1` 的元素,每一项都包含了训练视频的列表 +5. `test_videos` (list):包含 `nsplits=1` 的元素,每一项都包含了测试视频的列表 +6. `resolution` (dict):每个视频对应的分辨率(形如 (h,w)),如 `'FloorGymnastics/v_FloorGymnastics_g09_c03': (240, 320)` diff --git a/openmmlab_test/mmaction2-0.24.1/tools/deployment/mmaction2torchserve.py b/openmmlab_test/mmaction2-0.24.1/tools/deployment/mmaction2torchserve.py new file mode 100644 index 0000000000000000000000000000000000000000..d491ac7b364f7ed3f8e4b0eebc2b6c9c58a13123 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/deployment/mmaction2torchserve.py @@ -0,0 +1,109 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import shutil +from argparse import ArgumentParser, Namespace +from pathlib import Path +from tempfile import TemporaryDirectory + +import mmcv + +try: + from model_archiver.model_packaging import package_model + from model_archiver.model_packaging_utils import ModelExportUtils +except ImportError: + raise ImportError('`torch-model-archiver` is required.' + 'Try: pip install torch-model-archiver') + + +def mmaction2torchserve( + config_file: str, + checkpoint_file: str, + output_folder: str, + model_name: str, + label_file: str, + model_version: str = '1.0', + force: bool = False, +): + """Converts MMAction2 model (config + checkpoint) to TorchServe `.mar`. + + Args: + config_file (str): In MMAction2 config format. + checkpoint_file (str): In MMAction2 checkpoint format. + output_folder (str): Folder where `{model_name}.mar` will be created. + The file created will be in TorchServe archive format. + label_file (str): A txt file which contains the action category names. + model_name (str | None): If not None, used for naming the + `{model_name}.mar` file that will be created under `output_folder`. + If None, `{Path(checkpoint_file).stem}` will be used. + model_version (str): Model's version. 
+ force (bool): If True, an existing `{model_name}.mar` file + under `output_folder` will be overwritten. + """ + mmcv.mkdir_or_exist(output_folder) + + config = mmcv.Config.fromfile(config_file) + + with TemporaryDirectory() as tmpdir: + config.dump(f'{tmpdir}/config.py') + shutil.copy(label_file, f'{tmpdir}/label_map.txt') + + args = Namespace( + **{ + 'model_file': f'{tmpdir}/config.py', + 'serialized_file': checkpoint_file, + 'handler': f'{Path(__file__).parent}/mmaction_handler.py', + 'model_name': model_name or Path(checkpoint_file).stem, + 'version': model_version, + 'export_path': output_folder, + 'force': force, + 'requirements_file': None, + 'extra_files': f'{tmpdir}/label_map.txt', + 'runtime': 'python', + 'archive_format': 'default' + }) + manifest = ModelExportUtils.generate_manifest_json(args) + package_model(args, manifest) + + +def parse_args(): + parser = ArgumentParser( + description='Convert MMAction2 models to TorchServe `.mar` format.') + parser.add_argument('config', type=str, help='config file path') + parser.add_argument('checkpoint', type=str, help='checkpoint file path') + parser.add_argument( + '--output-folder', + type=str, + required=True, + help='Folder where `{model_name}.mar` will be created.') + parser.add_argument( + '--model-name', + type=str, + default=None, + help='If not None, used for naming the `{model_name}.mar` ' + 'file that will be created under `output_folder`. ' + 'If None, `{Path(checkpoint_file).stem}` will be used.') + parser.add_argument( + '--label-file', + type=str, + default=None, + help='A txt file which contains the action category names. 
') + parser.add_argument( + '--model-version', + type=str, + default='1.0', + help='Number used for versioning.') + parser.add_argument( + '-f', + '--force', + action='store_true', + help='overwrite the existing `{model_name}.mar`') + args = parser.parse_args() + + return args + + +if __name__ == '__main__': + args = parse_args() + + mmaction2torchserve(args.config, args.checkpoint, args.output_folder, + args.model_name, args.label_file, args.model_version, + args.force) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/deployment/mmaction_handler.py b/openmmlab_test/mmaction2-0.24.1/tools/deployment/mmaction_handler.py new file mode 100644 index 0000000000000000000000000000000000000000..10626d15c391d7fdee95c91597272d7730b78959 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/deployment/mmaction_handler.py @@ -0,0 +1,79 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import base64 +import os +import os.path as osp +import warnings + +import decord +import numpy as np +import torch + +from mmaction.apis import inference_recognizer, init_recognizer # noqa: F401 + +try: + from ts.torch_handler.base_handler import BaseHandler +except ImportError: + raise ImportError('`ts` is required. Try: pip install torchserve') + + +class MMActionHandler(BaseHandler): + + def initialize(self, context): + properties = context.system_properties + self.map_location = 'cuda' if torch.cuda.is_available() else 'cpu' + self.device = torch.device(self.map_location + ':' + + str(properties.get('gpu_id')) if torch.cuda. 
+ is_available() else self.map_location) + self.manifest = context.manifest + + model_dir = properties.get('model_dir') + serialized_file = self.manifest['model']['serializedFile'] + checkpoint = os.path.join(model_dir, serialized_file) + self.config_file = os.path.join(model_dir, 'config.py') + + mapping_file_path = osp.join(model_dir, 'label_map.txt') + if not os.path.isfile(mapping_file_path): + warnings.warn('Missing the label_map.txt file. 
' + 'Inference output will not include class name.') + self.mapping = None + else: + lines = open(mapping_file_path).readlines() + self.mapping = [x.strip() for x in lines] + + self.model = init_recognizer(self.config_file, checkpoint, self.device) + self.initialized = True + + def preprocess(self, data): + videos = [] + + for row in data: + video = row.get('data') or row.get('body') + if isinstance(video, str): + video = base64.b64decode(video) + # First save the bytes as a tmp file + with open('/tmp/tmp.mp4', 'wb') as fout: + fout.write(video) + + video = decord.VideoReader('/tmp/tmp.mp4') + frames = [x.asnumpy() for x in video] + videos.append(np.stack(frames)) + + return videos + + def inference(self, data, *args, **kwargs): + results = [inference_recognizer(self.model, item) for item in data] + return results + + def postprocess(self, data): + # Format output following the example ObjectDetectionHandler format + output = [] + for video_idx, video_result in enumerate(data): + output.append([]) + assert isinstance(video_result, list) + + output[video_idx] = { + self.mapping[x[0]] if self.mapping else x[0]: float(x[1]) + for x in video_result + } + + return output diff --git a/openmmlab_test/mmaction2-0.24.1/tools/deployment/publish_model.py b/openmmlab_test/mmaction2-0.24.1/tools/deployment/publish_model.py new file mode 100644 index 0000000000000000000000000000000000000000..1c59508ce3db5fe815cb3e7067fec24fd5c92053 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/deployment/publish_model.py @@ -0,0 +1,47 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +import os +import platform +import subprocess + +import torch + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Process a checkpoint to be published') + parser.add_argument('in_file', help='input checkpoint filename') + parser.add_argument('out_file', help='output checkpoint filename') + args = parser.parse_args() + return args + + +def process_checkpoint(in_file, out_file): + checkpoint = torch.load(in_file, map_location='cpu') + # remove optimizer for smaller file size + if 'optimizer' in checkpoint: + del checkpoint['optimizer'] + # if it is necessary to remove some sensitive data in checkpoint['meta'], + # add the code here. + torch.save(checkpoint, out_file) + if platform.system() == 'Windows': + sha = subprocess.check_output( + ['certutil', '-hashfile', out_file, 'SHA256']) + sha = str(sha).split('\\r\\n')[1] + else: + sha = subprocess.check_output(['sha256sum', out_file]).decode() + if out_file.endswith('.pth'): + out_file_name = out_file[:-4] + else: + out_file_name = out_file + final_file = out_file_name + f'-{sha[:8]}.pth' + os.rename(out_file, final_file) + + +def main(): + args = parse_args() + process_checkpoint(args.in_file, args.out_file) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/deployment/pytorch2onnx.py b/openmmlab_test/mmaction2-0.24.1/tools/deployment/pytorch2onnx.py new file mode 100644 index 0000000000000000000000000000000000000000..9b4cf5ca2dcd17e58012bdd214a64d8c92bf57db --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/deployment/pytorch2onnx.py @@ -0,0 +1,183 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import warnings + +import mmcv +import numpy as np +import torch +from mmcv.runner import load_checkpoint + +from mmaction.models import build_model + +try: + import onnx + import onnxruntime as rt +except ImportError as e: + raise ImportError(f'Please install onnx and onnxruntime first. 
{e}') + +try: + from mmcv.onnx.symbolic import register_extra_symbolics +except ModuleNotFoundError: + raise NotImplementedError('please update mmcv to version>=1.0.4') + + +def _convert_batchnorm(module): + """Convert the syncBNs into normal BN3ds.""" + module_output = module + if isinstance(module, torch.nn.SyncBatchNorm): + module_output = torch.nn.BatchNorm3d(module.num_features, module.eps, + module.momentum, module.affine, + module.track_running_stats) + if module.affine: + module_output.weight.data = module.weight.data.clone().detach() + module_output.bias.data = module.bias.data.clone().detach() + # keep requires_grad unchanged + module_output.weight.requires_grad = module.weight.requires_grad + module_output.bias.requires_grad = module.bias.requires_grad + module_output.running_mean = module.running_mean + module_output.running_var = module.running_var + module_output.num_batches_tracked = module.num_batches_tracked + for name, child in module.named_children(): + module_output.add_module(name, _convert_batchnorm(child)) + del module + return module_output + + +def pytorch2onnx(model, + input_shape, + opset_version=11, + show=False, + output_file='tmp.onnx', + verify=False): + """Convert pytorch model to onnx model. + + Args: + model (:obj:`nn.Module`): The pytorch model to be exported. + input_shape (tuple[int]): The input tensor shape of the model. + opset_version (int): Opset version of onnx used. Default: 11. + show (bool): Determines whether to print the onnx model architecture. + Default: False. + output_file (str): Output onnx model name. Default: 'tmp.onnx'. + verify (bool): Determines whether to verify the onnx model. + Default: False. 
+ """ + model.cpu().eval() + + input_tensor = torch.randn(input_shape) + + register_extra_symbolics(opset_version) + torch.onnx.export( + model, + input_tensor, + output_file, + export_params=True, + keep_initializers_as_inputs=True, + verbose=show, + opset_version=opset_version) + + print(f'Successfully exported ONNX model: {output_file}') + if verify: + # check by onnx + onnx_model = onnx.load(output_file) + onnx.checker.check_model(onnx_model) + + # check the numerical value + # get pytorch output + pytorch_result = model(input_tensor)[0].detach().numpy() + + # get onnx output + input_all = [node.name for node in onnx_model.graph.input] + input_initializer = [ + node.name for node in onnx_model.graph.initializer + ] + net_feed_input = list(set(input_all) - set(input_initializer)) + assert len(net_feed_input) == 1 + sess = rt.InferenceSession(output_file) + onnx_result = sess.run( + None, {net_feed_input[0]: input_tensor.detach().numpy()})[0] + # only compare part of results + random_class = np.random.randint(pytorch_result.shape[1]) + assert np.allclose( + pytorch_result[:, random_class], onnx_result[:, random_class] + ), 'The outputs are different between Pytorch and ONNX' + print('The numerical values are same between Pytorch and ONNX') + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Convert MMAction2 models to ONNX') + parser.add_argument('config', help='test config file path') + parser.add_argument('checkpoint', help='checkpoint file') + parser.add_argument('--show', action='store_true', help='show onnx graph') + parser.add_argument('--output-file', type=str, default='tmp.onnx') + parser.add_argument('--opset-version', type=int, default=11) + parser.add_argument( + '--verify', + action='store_true', + help='verify the onnx model output against pytorch output') + parser.add_argument( + '--is-localizer', + action='store_true', + help='whether it is a localizer') + parser.add_argument( + '--shape', + type=int, + nargs='+', + 
default=[1, 3, 8, 224, 224], + help='input video size') + parser.add_argument( + '--softmax', + action='store_true', + help='whether to add softmax layer at the end of recognizers') + args = parser.parse_args() + return args + + +if __name__ == '__main__': + args = parse_args() + + assert args.opset_version == 11, 'MMAction2 only supports opset 11 now' + + cfg = mmcv.Config.fromfile(args.config) + # import modules from string list. + + if not args.is_localizer: + cfg.model.backbone.pretrained = None + + # build the model + model = build_model( + cfg.model, train_cfg=None, test_cfg=cfg.get('test_cfg')) + model = _convert_batchnorm(model) + + # onnx.export does not support kwargs + if hasattr(model, 'forward_dummy'): + from functools import partial + model.forward = partial(model.forward_dummy, softmax=args.softmax) + elif hasattr(model, '_forward') and args.is_localizer: + model.forward = model._forward + else: + raise NotImplementedError( + 'Please implement the forward method for exporting.') + + checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu') + + # convert model to onnx file + pytorch2onnx( + model, + args.shape, + opset_version=args.opset_version, + show=args.show, + output_file=args.output_file, + verify=args.verify) + + # Following strings of text style are from colorama package + bright_style, reset_style = '\x1b[1m', '\x1b[0m' + red_text, blue_text = '\x1b[31m', '\x1b[34m' + white_background = '\x1b[107m' + + msg = white_background + bright_style + red_text + msg += 'DeprecationWarning: This tool will be deprecated in future. 
' + msg += blue_text + 'Welcome to use the unified model deployment toolbox ' + msg += 'MMDeploy: https://github.com/open-mmlab/mmdeploy' + msg += reset_style + warnings.warn(msg) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/dist_test.sh b/openmmlab_test/mmaction2-0.24.1/tools/dist_test.sh new file mode 100644 index 0000000000000000000000000000000000000000..4e90525c092e09e71d3070a1475f4fa5496abce6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/dist_test.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash + +NNODES=${NNODES:-1} +NODE_RANK=${NODE_RANK:-0} +MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} +CONFIG=$1 +CHECKPOINT=$2 +GPUS=$3 +PORT=${PORT:-29500} + +# Arguments starting from the fourth one are captured by ${@:4} +PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ +python -m torch.distributed.launch --nnodes=$NNODES --node_rank=$NODE_RANK --master_addr=$MASTER_ADDR \ + --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/test.py $CONFIG $CHECKPOINT --launcher pytorch ${@:4} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/dist_train.sh b/openmmlab_test/mmaction2-0.24.1/tools/dist_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..8944199038b4b53cc396511f2e6851eb8b6525ca --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/dist_train.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +NNODES=${NNODES:-1} +NODE_RANK=${NODE_RANK:-0} +MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} +CONFIG=$1 +GPUS=$2 +PORT=${PORT:-29500} + +PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ +python -m torch.distributed.launch --nnodes=$NNODES --node_rank=$NODE_RANK --master_addr=$MASTER_ADDR \ + --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py $CONFIG --launcher pytorch ${@:3} +# Any arguments from the third one are captured by ${@:3} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/misc/bsn_proposal_generation.py b/openmmlab_test/mmaction2-0.24.1/tools/misc/bsn_proposal_generation.py new file mode 100644 index 
0000000000000000000000000000000000000000..04e3cc72443d0c5cb63ea3320ae8e85aa7075879 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/misc/bsn_proposal_generation.py @@ -0,0 +1,198 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import os.path as osp + +import mmcv +import numpy as np +import torch.multiprocessing as mp + +from mmaction.localization import (generate_bsp_feature, + generate_candidate_proposals) + + +def load_video_infos(ann_file): + """Load the video annotations. + + Args: + ann_file (str): A json file path of the annotation file. + + Returns: + list[dict]: A list containing annotations for videos. + """ + video_infos = [] + anno_database = mmcv.load(ann_file) + for video_name in anno_database: + video_info = anno_database[video_name] + video_info['video_name'] = video_name + video_infos.append(video_info) + return video_infos + + +def generate_proposals(ann_file, tem_results_dir, pgm_proposals_dir, + pgm_proposals_thread, **kwargs): + """Generate proposals using multi-process. + + Args: + ann_file (str): A json file path of the annotation file for + all videos to be processed. + tem_results_dir (str): Directory to read tem results + pgm_proposals_dir (str): Directory to save generated proposals. + pgm_proposals_thread (int): Total number of threads. + kwargs (dict): Keyword arguments for "generate_candidate_proposals". 
+ """ + video_infos = load_video_infos(ann_file) + num_videos = len(video_infos) + num_videos_per_thread = num_videos // pgm_proposals_thread + processes = [] + manager = mp.Manager() + result_dict = manager.dict() + kwargs['result_dict'] = result_dict + for tid in range(pgm_proposals_thread - 1): + tmp_video_list = range(tid * num_videos_per_thread, + (tid + 1) * num_videos_per_thread) + p = mp.Process( + target=generate_candidate_proposals, + args=( + tmp_video_list, + video_infos, + tem_results_dir, + ), + kwargs=kwargs) + p.start() + processes.append(p) + + tmp_video_list = range((pgm_proposals_thread - 1) * num_videos_per_thread, + num_videos) + p = mp.Process( + target=generate_candidate_proposals, + args=( + tmp_video_list, + video_infos, + tem_results_dir, + ), + kwargs=kwargs) + p.start() + processes.append(p) + + for p in processes: + p.join() + + # save results + os.makedirs(pgm_proposals_dir, exist_ok=True) + prog_bar = mmcv.ProgressBar(num_videos) + header = 'tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa' + for video_name in result_dict: + proposals = result_dict[video_name] + proposal_path = osp.join(pgm_proposals_dir, video_name + '.csv') + np.savetxt( + proposal_path, + proposals, + header=header, + delimiter=',', + comments='') + prog_bar.update() + + +def generate_features(ann_file, tem_results_dir, pgm_proposals_dir, + pgm_features_dir, pgm_features_thread, **kwargs): + """Generate proposals features using multi-process. + + Args: + ann_file (str): A json file path of the annotation file for + all videos to be processed. + tem_results_dir (str): Directory to read tem results. + pgm_proposals_dir (str): Directory to read generated proposals. + pgm_features_dir (str): Directory to save generated features. + pgm_features_thread (int): Total number of threads. + kwargs (dict): Keyword arguments for "generate_bsp_feature". 
+ """ + video_infos = load_video_infos(ann_file) + num_videos = len(video_infos) + num_videos_per_thread = num_videos // pgm_features_thread + processes = [] + manager = mp.Manager() + feature_return_dict = manager.dict() + kwargs['result_dict'] = feature_return_dict + for tid in range(pgm_features_thread - 1): + tmp_video_list = range(tid * num_videos_per_thread, + (tid + 1) * num_videos_per_thread) + p = mp.Process( + target=generate_bsp_feature, + args=( + tmp_video_list, + video_infos, + tem_results_dir, + pgm_proposals_dir, + ), + kwargs=kwargs) + p.start() + processes.append(p) + tmp_video_list = range((pgm_features_thread - 1) * num_videos_per_thread, + num_videos) + p = mp.Process( + target=generate_bsp_feature, + args=( + tmp_video_list, + video_infos, + tem_results_dir, + pgm_proposals_dir, + ), + kwargs=kwargs) + p.start() + processes.append(p) + + for p in processes: + p.join() + + # save results + os.makedirs(pgm_features_dir, exist_ok=True) + prog_bar = mmcv.ProgressBar(num_videos) + for video_name in feature_return_dict.keys(): + bsp_feature = feature_return_dict[video_name] + feature_path = osp.join(pgm_features_dir, video_name + '.npy') + np.save(feature_path, bsp_feature) + prog_bar.update() + + +def parse_args(): + parser = argparse.ArgumentParser(description='Proposal generation module') + parser.add_argument('config', help='test config file path') + parser.add_argument( + '--mode', + choices=['train', 'test'], + default='test', + help='train or test') + args = parser.parse_args() + return args + + +def main(): + print('Begin Proposal Generation Module') + args = parse_args() + cfg = mmcv.Config.fromfile(args.config) + tem_results_dir = cfg.tem_results_dir + pgm_proposals_dir = cfg.pgm_proposals_dir + pgm_features_dir = cfg.pgm_features_dir + if args.mode == 'test': + generate_proposals(cfg.ann_file_val, tem_results_dir, + pgm_proposals_dir, **cfg.pgm_proposals_cfg) + print('\nFinish proposal generation') + generate_features(cfg.ann_file_val, 
tem_results_dir, pgm_proposals_dir, + pgm_features_dir, **cfg.pgm_features_test_cfg) + print('\nFinish feature generation') + + elif args.mode == 'train': + generate_proposals(cfg.ann_file_train, tem_results_dir, + pgm_proposals_dir, **cfg.pgm_proposals_cfg) + print('\nFinish proposal generation') + generate_features(cfg.ann_file_train, tem_results_dir, + pgm_proposals_dir, pgm_features_dir, + **cfg.pgm_features_train_cfg) + print('\nFinish feature generation') + + print('Finish Proposal Generation Module') + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/misc/clip_feature_extraction.py b/openmmlab_test/mmaction2-0.24.1/tools/misc/clip_feature_extraction.py new file mode 100644 index 0000000000000000000000000000000000000000..1829bf9b5c36b099ae8c9886f641aced2972affe --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/misc/clip_feature_extraction.py @@ -0,0 +1,229 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import os.path as osp +import warnings +from datetime import datetime + +import mmcv +import numpy as np +import torch +import torch.distributed as dist +from mmcv import Config, DictAction +from mmcv.cnn import fuse_conv_bn +from mmcv.fileio.io import file_handlers +from mmcv.parallel import MMDataParallel, MMDistributedDataParallel +from mmcv.runner import get_dist_info, init_dist, load_checkpoint +from mmcv.runner.fp16_utils import wrap_fp16_model + +from mmaction.apis import multi_gpu_test, single_gpu_test +from mmaction.datasets import build_dataloader, build_dataset +from mmaction.models import build_model +from mmaction.utils import register_module_hooks + + +def parse_args(): + parser = argparse.ArgumentParser( + description='MMAction2 clip-level feature extraction') + parser.add_argument('config', help='test config file path') + parser.add_argument('checkpoint', help='checkpoint file') + parser.add_argument('--video-list', help='video file list') + 
parser.add_argument('--video-root', help='video root directory') + parser.add_argument( + '--out', + default=None, + help='output result file in pkl/yaml/json format') + parser.add_argument( + '--fuse-conv-bn', + action='store_true', + help='Whether to fuse conv and bn, this will slightly increase ' + 'the inference speed') + parser.add_argument( + '--gpu-collect', + action='store_true', + help='whether to use gpu to collect results') + parser.add_argument( + '--tmpdir', + help='tmp directory used for collecting results from multiple ' + 'workers, available when gpu-collect is not specified') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + parser.add_argument( + '--launcher', + choices=['none', 'pytorch', 'slurm', 'mpi'], + default='none', + help='job launcher') + parser.add_argument('--local_rank', type=int, default=0) + args = parser.parse_args() + if 'LOCAL_RANK' not in os.environ: + os.environ['LOCAL_RANK'] = str(args.local_rank) + + return args + + +def turn_off_pretrained(cfg): + # recursively find all pretrained in the model config, + # and set them None to avoid redundant pretrain steps for testing + if 'pretrained' in cfg: + cfg.pretrained = None + + # recursively turn off pretrained value + for sub_cfg in cfg.values(): + if isinstance(sub_cfg, dict): + turn_off_pretrained(sub_cfg) + + +def text2tensor(text, size=256): + nums = [ord(x) for x in text] + assert len(nums) < size + nums.extend([0] * (size - len(nums))) + nums = np.array(nums, dtype=np.uint8) + return torch.from_numpy(nums) + + +def tensor2text(tensor): + # 0 may not occur in a string + chars = [chr(x) for x in tensor if x != 0] + return ''.join(chars) + + +def inference_pytorch(args, cfg, distributed, data_loader): + """Get predictions by 
pytorch models.""" + # remove redundant pretrain steps for testing + turn_off_pretrained(cfg.model) + + # build the model and load checkpoint + model = build_model( + cfg.model, train_cfg=None, test_cfg=cfg.get('test_cfg')) + + if len(cfg.module_hooks) > 0: + register_module_hooks(model, cfg.module_hooks) + + fp16_cfg = cfg.get('fp16', None) + if fp16_cfg is not None: + wrap_fp16_model(model) + load_checkpoint(model, args.checkpoint, map_location='cpu') + + if args.fuse_conv_bn: + model = fuse_conv_bn(model) + + if not distributed: + model = MMDataParallel(model, device_ids=[0]) + outputs = single_gpu_test(model, data_loader) + else: + model = MMDistributedDataParallel( + model.cuda(), + device_ids=[torch.cuda.current_device()], + broadcast_buffers=False) + outputs = multi_gpu_test(model, data_loader, args.tmpdir, + args.gpu_collect) + + return outputs + + +def main(): + args = parse_args() + + cfg = Config.fromfile(args.config) + + cfg.merge_from_dict(args.cfg_options) + + if cfg.model['test_cfg'] is None: + cfg.model['test_cfg'] = dict(feature_extraction=True) + else: + cfg.model['test_cfg']['feature_extraction'] = True + + # Load output_config from cfg + output_config = cfg.get('output_config', {}) + if args.out: + # Overwrite output_config from args.out + output_config = Config._merge_a_into_b( + dict(out=args.out), output_config) + + assert output_config, 'Please specify output filename with --out.' 
+ + dataset_type = cfg.data.test.type + if output_config.get('out', None): + if 'output_format' in output_config: + # ugly workaround to make recognition and localization the same + warnings.warn( + 'Skip checking `output_format` in localization task.') + else: + out = output_config['out'] + # make sure the dirname of the output path exists + mmcv.mkdir_or_exist(osp.dirname(out)) + _, suffix = osp.splitext(out) + assert dataset_type == 'VideoDataset' + + assert suffix[1:] in file_handlers, ( + 'The format of the output ' + 'file should be json, pickle or yaml') + + # set cudnn benchmark + if cfg.get('cudnn_benchmark', False): + torch.backends.cudnn.benchmark = True + cfg.data.test.test_mode = True + cfg.data.test.data_prefix = args.video_root + + # init distributed env first, since logger depends on the dist info. + if args.launcher == 'none': + distributed = False + else: + distributed = True + init_dist(args.launcher, **cfg.dist_params) + + rank, _ = get_dist_info() + + size = 256 + fname_tensor = torch.zeros(size, dtype=torch.uint8).cuda() + if rank == 0: + videos = open(args.video_list).readlines() + videos = [x.strip() for x in videos] + + timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') + fake_anno = f'fake_anno_{timestamp}.txt' + with open(fake_anno, 'w') as fout: + lines = [x + ' 0' for x in videos] + fout.write('\n'.join(lines)) + fname_tensor = text2tensor(fake_anno, size).cuda() + + if distributed: + dist.broadcast(fname_tensor.cuda(), src=0) + + fname = tensor2text(fname_tensor) + cfg.data.test.ann_file = fname + + # The flag is used to register module's hooks + cfg.setdefault('module_hooks', []) + + # build the dataloader + dataset = build_dataset(cfg.data.test, dict(test_mode=True)) + dataloader_setting = dict( + videos_per_gpu=cfg.data.get('videos_per_gpu', 1), + workers_per_gpu=cfg.data.get('workers_per_gpu', 1), + dist=distributed, + shuffle=False) + + dataloader_setting = dict(dataloader_setting, + **cfg.data.get('test_dataloader', {})) + 
data_loader = build_dataloader(dataset, **dataloader_setting) + + outputs = inference_pytorch(args, cfg, distributed, data_loader) + + if rank == 0: + if output_config.get('out', None): + out = output_config['out'] + print(f'\nwriting results to {out}') + dataset.dump_results(outputs, **output_config) + # remove the temporary file + os.remove(fake_anno) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/misc/dist_clip_feature_extraction.sh b/openmmlab_test/mmaction2-0.24.1/tools/misc/dist_clip_feature_extraction.sh new file mode 100644 index 0000000000000000000000000000000000000000..f5c7a1a607abe4d18a30fe719bb034434e1b2637 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/misc/dist_clip_feature_extraction.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +CONFIG=$1 +CHECKPOINT=$2 +GPUS=$3 +PORT=${PORT:-29500} + +# Arguments starting from the fourth one are captured by ${@:4} +PYTHONPATH="$(dirname $0)/../..":$PYTHONPATH \ +python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \ + $(dirname "$0")/clip_feature_extraction.py $CONFIG $CHECKPOINT \ + --launcher pytorch ${@:4} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/misc/flow_extraction.py b/openmmlab_test/mmaction2-0.24.1/tools/misc/flow_extraction.py new file mode 100644 index 0000000000000000000000000000000000000000..b8763430b52ba9983cd5fea2ff6b4ecb4605bb7c --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/misc/flow_extraction.py @@ -0,0 +1,187 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import os.path as osp + +import cv2 +import numpy as np + + +def flow_to_img(raw_flow, bound=20.): + """Convert flow to gray image. + + Args: + raw_flow (np.ndarray[float]): Estimated flow with the shape (w, h). + bound (float): Bound for the flow-to-image normalization. Default: 20. + + Returns: + np.ndarray[uint8]: The result list of np.ndarray[uint8], with shape + (w, h). 
+ """ + flow = np.clip(raw_flow, -bound, bound) + flow += bound + flow *= (255 / float(2 * bound)) + flow = flow.astype(np.uint8) + return flow + + +def generate_flow(frames, method='tvl1'): + """Estimate flow with given frames. + + Args: + frames (list[np.ndarray[uint8]]): List of rgb frames, with shape + (w, h, 3). + method (str): Use which method to generate flow. Options are 'tvl1' + and 'farneback'. Default: 'tvl1'. + + Returns: + list[np.ndarray[float]]: The result list of np.ndarray[float], with + shape (w, h, 2). + """ + assert method in ['tvl1', 'farneback'] + gray_frames = [cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) for frame in frames] + + if method == 'tvl1': + tvl1 = cv2.optflow.DualTVL1OpticalFlow_create() + + def op(x, y): + return tvl1.calc(x, y, None) + elif method == 'farneback': + + def op(x, y): + return cv2.calcOpticalFlowFarneback(x, y, None, 0.5, 3, 15, 3, 5, + 1.2, 0) + + gray_st = gray_frames[:-1] + gray_ed = gray_frames[1:] + + flow = [op(x, y) for x, y in zip(gray_st, gray_ed)] + return flow + + +def extract_dense_flow(path, + dest, + bound=20., + save_rgb=False, + start_idx=0, + rgb_tmpl='img_{:05d}.jpg', + flow_tmpl='{}_{:05d}.jpg', + method='tvl1'): + """Extract dense flow given video or frames, save them as gray-scale + images. + + Args: + path (str): Location of the input video. + dest (str): The directory to store the extracted flow images. + bound (float): Bound for the flow-to-image normalization. Default: 20. + save_rgb (bool): Save extracted RGB frames. Default: False. + start_idx (int): The starting frame index if use frames as input, the + first image is path.format(start_idx). Default: 0. + rgb_tmpl (str): The template of RGB frame names, Default: + 'img_{:05d}.jpg'. + flow_tmpl (str): The template of Flow frame names, Default: + '{}_{:05d}.jpg'. + method (str): Use which method to generate flow. Options are 'tvl1' + and 'farneback'. Default: 'tvl1'. 
+ """ + + frames = [] + assert osp.exists(path) + video = cv2.VideoCapture(path) + flag, f = video.read() + while flag: + frames.append(f) + flag, f = video.read() + + flow = generate_flow(frames, method=method) + + flow_x = [flow_to_img(x[:, :, 0], bound) for x in flow] + flow_y = [flow_to_img(x[:, :, 1], bound) for x in flow] + + if not osp.exists(dest): + os.makedirs(dest) + flow_x_names = [ + osp.join(dest, flow_tmpl.format('x', ind + start_idx)) + for ind in range(len(flow_x)) + ] + flow_y_names = [ + osp.join(dest, flow_tmpl.format('y', ind + start_idx)) + for ind in range(len(flow_y)) + ] + + num_frames = len(flow) + for i in range(num_frames): + cv2.imwrite(flow_x_names[i], flow_x[i]) + cv2.imwrite(flow_y_names[i], flow_y[i]) + + if save_rgb: + img_names = [ + osp.join(dest, rgb_tmpl.format(ind + start_idx)) + for ind in range(len(frames)) + ] + for frame, name in zip(frames, img_names): + cv2.imwrite(name, frame) + + +def parse_args(): + parser = argparse.ArgumentParser(description='Extract flow and RGB images') + parser.add_argument( + '--input', + help='videos for frame extraction, can be a ' + 'single video or a video list, the video list should be a txt file ' + 'and just consists of filenames without directories') + parser.add_argument( + '--prefix', + default='', + help='the prefix of input ' + 'videos, used when input is a video list') + parser.add_argument( + '--dest', + default='', + help='the destination to save ' + 'extracted frames') + parser.add_argument( + '--save-rgb', action='store_true', help='also save ' + 'rgb frames') + parser.add_argument( + '--rgb-tmpl', + default='img_{:05d}.jpg', + help='template filename of rgb frames') + parser.add_argument( + '--flow-tmpl', + default='{}_{:05d}.jpg', + help='template filename of flow frames') + parser.add_argument( + '--start-idx', + type=int, + default=1, + help='the start ' + 'index of extracted frames') + parser.add_argument( + '--method', + default='tvl1', + help='use which method to 
' + 'generate flow') + parser.add_argument( + '--bound', type=float, default=20, help='maximum of ' + 'optical flow') + + args = parser.parse_args() + return args + + +if __name__ == '__main__': + args = parse_args() + if args.input.endswith('.txt'): + lines = open(args.input).readlines() + lines = [x.strip() for x in lines] + videos = [osp.join(args.prefix, x) for x in lines] + dests = [osp.join(args.dest, x.split('.')[0]) for x in lines] + for video, dest in zip(videos, dests): + extract_dense_flow(video, dest, args.bound, args.save_rgb, + args.start_idx, args.rgb_tmpl, args.flow_tmpl, + args.method) + else: + extract_dense_flow(args.input, args.dest, args.bound, args.save_rgb, + args.start_idx, args.rgb_tmpl, args.flow_tmpl, + args.method) diff --git a/openmmlab_test/mmaction2-0.24.1/tools/slurm_test.sh b/openmmlab_test/mmaction2-0.24.1/tools/slurm_test.sh new file mode 100644 index 0000000000000000000000000000000000000000..fdea5da7e697752c61612d29646f6fe912b90c1b --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/slurm_test.sh @@ -0,0 +1,24 @@ +#!/usr/bin/env bash + +set -x + +PARTITION=$1 +JOB_NAME=$2 +CONFIG=$3 +CHECKPOINT=$4 +GPUS=${GPUS:-8} +GPUS_PER_NODE=${GPUS_PER_NODE:-8} +CPUS_PER_TASK=${CPUS_PER_TASK:-5} +PY_ARGS=${@:5} # Arguments starting from the fifth one are captured +SRUN_ARGS=${SRUN_ARGS:-""} + +PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ +srun -p ${PARTITION} \ + --job-name=${JOB_NAME} \ + --gres=gpu:${GPUS_PER_NODE} \ + --ntasks=${GPUS} \ + --ntasks-per-node=${GPUS_PER_NODE} \ + --cpus-per-task=${CPUS_PER_TASK} \ + --kill-on-bad-exit=1 \ + ${SRUN_ARGS} \ + python -u tools/test.py ${CONFIG} ${CHECKPOINT} --launcher="slurm" ${PY_ARGS} diff --git a/openmmlab_test/mmaction2-0.24.1/tools/slurm_train b/openmmlab_test/mmaction2-0.24.1/tools/slurm_train new file mode 100644 index 0000000000000000000000000000000000000000..15e77f0408d409399ca3006cb3f0d5013586dda6 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/slurm_train @@ -0,0 +1,56 
@@ +#!/usr/bin/env bash + +ARGPARSE_DESCRIPTION="Train recognizer on slurm cluster" +source $(dirname $0)/argparse.bash || exit 1 +argparse "$@" < 0: + register_module_hooks(model, cfg.module_hooks) + + fp16_cfg = cfg.get('fp16', None) + if fp16_cfg is not None: + wrap_fp16_model(model) + load_checkpoint(model, args.checkpoint, map_location='cpu') + + if args.fuse_conv_bn: + model = fuse_conv_bn(model) + + if not distributed: + model = build_dp( + model, default_device, default_args=dict(device_ids=cfg.gpu_ids)) + outputs = single_gpu_test(model, data_loader) + else: + model = build_ddp( + model, + default_device, + default_args=dict( + device_ids=[int(os.environ['LOCAL_RANK'])], + broadcast_buffers=False)) + outputs = multi_gpu_test(model, data_loader, args.tmpdir, + args.gpu_collect) + + return outputs + + +def inference_tensorrt(ckpt_path, distributed, data_loader, batch_size): + """Get predictions by TensorRT engine. + + For now, multi-gpu mode and dynamic tensor shape are not supported. + """ + assert not distributed, \ + 'TensorRT engine inference only supports single gpu mode.' 
+ import tensorrt as trt + from mmcv.tensorrt.tensorrt_utils import (torch_device_from_trt, + torch_dtype_from_trt) + + # load engine + with trt.Logger() as logger, trt.Runtime(logger) as runtime: + with open(ckpt_path, mode='rb') as f: + engine_bytes = f.read() + engine = runtime.deserialize_cuda_engine(engine_bytes) + + # For now, only support fixed input tensor + cur_batch_size = engine.get_binding_shape(0)[0] + assert batch_size == cur_batch_size, \ + ('Dataset and TensorRT model should share the same batch size, ' + f'but get {batch_size} and {cur_batch_size}') + + context = engine.create_execution_context() + + # get output tensor + dtype = torch_dtype_from_trt(engine.get_binding_dtype(1)) + shape = tuple(context.get_binding_shape(1)) + device = torch_device_from_trt(engine.get_location(1)) + output = torch.empty( + size=shape, dtype=dtype, device=device, requires_grad=False) + + # get predictions + results = [] + dataset = data_loader.dataset + prog_bar = mmcv.ProgressBar(len(dataset)) + for data in data_loader: + bindings = [ + data['imgs'].contiguous().data_ptr(), + output.contiguous().data_ptr() + ] + context.execute_async_v2(bindings, + torch.cuda.current_stream().cuda_stream) + results.extend(output.cpu().numpy()) + batch_size = len(next(iter(data.values()))) + for _ in range(batch_size): + prog_bar.update() + return results + + +def inference_onnx(ckpt_path, distributed, data_loader, batch_size): + """Get predictions by ONNX. + + For now, multi-gpu mode and dynamic tensor shape are not supported. + """ + assert not distributed, 'ONNX inference only supports single gpu mode.' 
+ + import onnx + import onnxruntime as rt + + # get input tensor name + onnx_model = onnx.load(ckpt_path) + input_all = [node.name for node in onnx_model.graph.input] + input_initializer = [node.name for node in onnx_model.graph.initializer] + net_feed_input = list(set(input_all) - set(input_initializer)) + assert len(net_feed_input) == 1 + + # For now, only support fixed tensor shape + input_tensor = None + for tensor in onnx_model.graph.input: + if tensor.name == net_feed_input[0]: + input_tensor = tensor + break + cur_batch_size = input_tensor.type.tensor_type.shape.dim[0].dim_value + assert batch_size == cur_batch_size, \ + ('Dataset and ONNX model should share the same batch size, ' + f'but get {batch_size} and {cur_batch_size}') + + # get predictions + sess = rt.InferenceSession(ckpt_path) + results = [] + dataset = data_loader.dataset + prog_bar = mmcv.ProgressBar(len(dataset)) + for data in data_loader: + imgs = data['imgs'].cpu().numpy() + onnx_result = sess.run(None, {net_feed_input[0]: imgs})[0] + results.extend(onnx_result) + batch_size = len(next(iter(data.values()))) + for _ in range(batch_size): + prog_bar.update() + return results + + +def main(): + args = parse_args() + + if args.tensorrt and args.onnx: + raise ValueError( + 'Cannot set onnx mode and tensorrt mode at the same time.') + + cfg = Config.fromfile(args.config) + + cfg.merge_from_dict(args.cfg_options) + + # set multi-process settings + setup_multi_processes(cfg) + + # Load output_config from cfg + output_config = cfg.get('output_config', {}) + if args.out: + # Overwrite output_config from args.out + output_config = Config._merge_a_into_b( + dict(out=args.out), output_config) + + # Load eval_config from cfg + eval_config = cfg.get('eval_config', {}) + if args.eval: + # Overwrite eval_config from args.eval + eval_config = Config._merge_a_into_b( + dict(metrics=args.eval), eval_config) + if args.eval_options: + # Add options from args.eval_options + eval_config = 
Config._merge_a_into_b(args.eval_options, eval_config) + + assert output_config or eval_config, \ + ('Please specify at least one operation (save or eval the ' + 'results) with the argument "--out" or "--eval"') + + dataset_type = cfg.data.test.type + if output_config.get('out', None): + if 'output_format' in output_config: + # ugly workaround to make recognition and localization the same + warnings.warn( + 'Skip checking `output_format` in localization task.') + else: + out = output_config['out'] + # make sure the dirname of the output path exists + mmcv.mkdir_or_exist(osp.dirname(out)) + _, suffix = osp.splitext(out) + if dataset_type == 'AVADataset': + assert suffix[1:] == 'csv', ('For AVADataset, the format of ' + 'the output file should be csv') + else: + assert suffix[1:] in file_handlers, ( + 'The format of the output ' + 'file should be json, pickle or yaml') + + # set cudnn benchmark + if cfg.get('cudnn_benchmark', False): + torch.backends.cudnn.benchmark = True + cfg.data.test.test_mode = True + + # init distributed env first, since logger depends on the dist info.
+ if args.launcher == 'none': + distributed = False + else: + distributed = True + init_dist(args.launcher, **cfg.dist_params) + + # The flag is used to register module's hooks + cfg.setdefault('module_hooks', []) + + # build the dataloader + dataset = build_dataset(cfg.data.test, dict(test_mode=True)) + dataloader_setting = dict( + videos_per_gpu=cfg.data.get('videos_per_gpu', 1), + workers_per_gpu=cfg.data.get('workers_per_gpu', 1), + dist=distributed, + shuffle=False) + dataloader_setting = dict(dataloader_setting, + **cfg.data.get('test_dataloader', {})) + data_loader = build_dataloader(dataset, **dataloader_setting) + + if args.tensorrt: + outputs = inference_tensorrt(args.checkpoint, distributed, data_loader, + dataloader_setting['videos_per_gpu']) + elif args.onnx: + outputs = inference_onnx(args.checkpoint, distributed, data_loader, + dataloader_setting['videos_per_gpu']) + else: + outputs = inference_pytorch(args, cfg, distributed, data_loader) + + rank, _ = get_dist_info() + if rank == 0: + if output_config.get('out', None): + out = output_config['out'] + print(f'\nwriting results to {out}') + dataset.dump_results(outputs, **output_config) + if eval_config: + eval_res = dataset.evaluate(outputs, **eval_config) + for name, val in eval_res.items(): + print(f'{name}: {val:.04f}') + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/tools/train.py b/openmmlab_test/mmaction2-0.24.1/tools/train.py new file mode 100644 index 0000000000000000000000000000000000000000..d4049804645551f02bbbd6572a508a94b93548ae --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/tools/train.py @@ -0,0 +1,222 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +import copy +import os +import os.path as osp +import time +import warnings + +import mmcv +import torch +import torch.distributed as dist +from mmcv import Config, DictAction +from mmcv.runner import get_dist_info, init_dist, set_random_seed +from mmcv.utils import get_git_hash + +from mmaction import __version__ +from mmaction.apis import init_random_seed, train_model +from mmaction.datasets import build_dataset +from mmaction.models import build_model +from mmaction.utils import (collect_env, get_root_logger, + register_module_hooks, setup_multi_processes) + + +def parse_args(): + parser = argparse.ArgumentParser(description='Train a recognizer') + parser.add_argument('config', help='train config file path') + parser.add_argument('--work-dir', help='the dir to save logs and models') + parser.add_argument( + '--resume-from', help='the checkpoint file to resume from') + parser.add_argument( + '--validate', + action='store_true', + help='whether to evaluate the checkpoint during training') + parser.add_argument( + '--test-last', + action='store_true', + help='whether to test the checkpoint after training') + parser.add_argument( + '--test-best', + action='store_true', + help=('whether to test the best checkpoint (if applicable) after ' + 'training')) + group_gpus = parser.add_mutually_exclusive_group() + group_gpus.add_argument( + '--gpus', + type=int, + help='number of gpus to use ' + '(only applicable to non-distributed training)') + group_gpus.add_argument( + '--gpu-ids', + type=int, + nargs='+', + help='ids of gpus to use ' + '(only applicable to non-distributed training)') + parser.add_argument('--seed', type=int, default=None, help='random seed') + parser.add_argument( + '--diff-seed', + action='store_true', + help='Whether or not set different seeds for different ranks') + parser.add_argument( + '--deterministic', + action='store_true', + help='whether to set deterministic options for CUDNN backend.') + parser.add_argument( + 
'--cfg-options', + nargs='+', + action=DictAction, + default={}, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. For example, ' + "'--cfg-options model.backbone.depth=18 model.backbone.with_cp=True'") + parser.add_argument( + '--launcher', + choices=['none', 'pytorch', 'slurm', 'mpi'], + default='none', + help='job launcher') + parser.add_argument('--local_rank', type=int, default=0) + args = parser.parse_args() + if 'LOCAL_RANK' not in os.environ: + os.environ['LOCAL_RANK'] = str(args.local_rank) + + return args + + +def main(): + args = parse_args() + + cfg = Config.fromfile(args.config) + + cfg.merge_from_dict(args.cfg_options) + + # set multi-process settings + setup_multi_processes(cfg) + + # set cudnn_benchmark + if cfg.get('cudnn_benchmark', False): + torch.backends.cudnn.benchmark = True + + # work_dir is determined in this priority: + # CLI > config file > default (base filename) + if args.work_dir is not None: + # update configs according to CLI args if args.work_dir is not None + cfg.work_dir = args.work_dir + elif cfg.get('work_dir', None) is None: + # use config filename as default work_dir if cfg.work_dir is None + cfg.work_dir = osp.join('./work_dirs', + osp.splitext(osp.basename(args.config))[0]) + if args.resume_from is not None: + cfg.resume_from = args.resume_from + + if args.gpu_ids is not None or args.gpus is not None: + warnings.warn( + 'The Args `gpu_ids` and `gpus` are only used in non-distributed ' + 'mode and we highly encourage you to use distributed mode, i.e., ' + 'launch training with dist_train.sh. The two args will be ' + 'deprecated.') + if args.gpu_ids is not None: + warnings.warn( + 'Non-distributed training can only use 1 gpu now. We will ' + 'use the 1st one in gpu_ids. ') + cfg.gpu_ids = [args.gpu_ids[0]] + elif args.gpus is not None: + warnings.warn('Non-distributed training can only use 1 gpu now.
') + cfg.gpu_ids = range(1) + + # init distributed env first, since logger depends on the dist info. + if args.launcher == 'none': + distributed = False + else: + distributed = True + init_dist(args.launcher, **cfg.dist_params) + _, world_size = get_dist_info() + cfg.gpu_ids = range(world_size) + + # The flag is used to determine whether it is omnisource training + cfg.setdefault('omnisource', False) + + # The flag is used to register module's hooks + cfg.setdefault('module_hooks', []) + + # create work_dir + mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir)) + # dump config + cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config))) + # init logger before other steps + timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) + log_file = osp.join(cfg.work_dir, f'{timestamp}.log') + logger = get_root_logger(log_file=log_file, log_level=cfg.log_level) + + # init the meta dict to record some important information such as + # environment info and seed, which will be logged + meta = dict() + # log env info + env_info_dict = collect_env() + env_info = '\n'.join([f'{k}: {v}' for k, v in env_info_dict.items()]) + dash_line = '-' * 60 + '\n' + logger.info('Environment info:\n' + dash_line + env_info + '\n' + + dash_line) + meta['env_info'] = env_info + + # log some basic info + logger.info(f'Distributed training: {distributed}') + logger.info(f'Config: {cfg.pretty_text}') + + # set random seeds + seed = init_random_seed(args.seed, distributed=distributed) + seed = seed + dist.get_rank() if args.diff_seed else seed + logger.info(f'Set random seed to {seed}, ' + f'deterministic: {args.deterministic}') + set_random_seed(seed, deterministic=args.deterministic) + + cfg.seed = seed + meta['seed'] = seed + meta['config_name'] = osp.basename(args.config) + meta['work_dir'] = osp.basename(cfg.work_dir.rstrip('/\\')) + + model = build_model( + cfg.model, + train_cfg=cfg.get('train_cfg'), + test_cfg=cfg.get('test_cfg')) + + if len(cfg.module_hooks) > 0: + 
register_module_hooks(model, cfg.module_hooks) + + if cfg.omnisource: + # If omnisource flag is set, cfg.data.train should be a list + assert isinstance(cfg.data.train, list) + datasets = [build_dataset(dataset) for dataset in cfg.data.train] + else: + datasets = [build_dataset(cfg.data.train)] + + if len(cfg.workflow) == 2: + # For simplicity, omnisource is not compatible with val workflow, + # we recommend you to use `--validate` + assert not cfg.omnisource + if args.validate: + warnings.warn('val workflow is duplicated with `--validate`, ' + 'it is recommended to use `--validate`. see ' + 'https://github.com/open-mmlab/mmaction2/pull/123') + val_dataset = copy.deepcopy(cfg.data.val) + datasets.append(build_dataset(val_dataset)) + if cfg.checkpoint_config is not None: + # save mmaction version, config file content and class names in + # checkpoints as meta data + cfg.checkpoint_config.meta = dict( + mmaction_version=__version__ + get_git_hash(digits=7), + config=cfg.pretty_text) + + test_option = dict(test_last=args.test_last, test_best=args.test_best) + train_model( + model, + datasets, + cfg, + distributed=distributed, + validate=args.validate, + test=test_option, + timestamp=timestamp, + meta=meta) + + +if __name__ == '__main__': + main() diff --git a/openmmlab_test/mmaction2-0.24.1/train.md b/openmmlab_test/mmaction2-0.24.1/train.md new file mode 100644 index 0000000000000000000000000000000000000000..0014a32e7746a350255c2209ddf3896fb9a90b96 --- /dev/null +++ b/openmmlab_test/mmaction2-0.24.1/train.md @@ -0,0 +1,65 @@ +# MMaction2 Benchmark Tests + +## Pre-Test Preparation + +### Environment Setup + +```shell +yum install python3 +yum install libquadmath +yum install numactl +yum install openmpi3 +yum install glog +yum install lmdb-libs +yum install opencv-core +yum install opencv +yum install openblas-serial +pip3 install --upgrade pip +pip3 install opencv-python +``` + +### Install Python Dependencies + +```shell +pip3 install torch-1.10.0a0+git2040069.dtk2210-cp37-cp37m-manylinux2014_x86_64.whl -i
https://pypi.tuna.tsinghua.edu.cn/simple +pip3 install torchvision-0.10.0a0+e04d001.dtk2210-cp37-cp37m-manylinux2014_x86_64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple +pip3 install mmcv_full-1.6.1+gitdebbc80.dtk2210-cp37-cp37m-manylinux2014_x86_64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple +# Install mmaction2: +cd mmaction2-0.24.1 +pip3 install -e . +``` + +Note: when testing a different DTK version, install the whl packages built for that version. + +## ST-GCN Test +### Single-Precision Test + +### Single-Card Test (Single Precision) + +```shell +export ROCBLAS_ATOMICS_MOD=1 +./sing_test.sh configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py +``` +#### Parameter Notes + +In configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py, batch_size = videos_per_gpu * number of cards. Performance is computed as batch_size / time. + +#### Performance metric to watch: time + +### Multi-Card Test (Single Precision) +#### Single-Node Multi-Card Training + +1. PyTorch single-node multi-card training + +```shell +export ROCBLAS_ATOMICS_MOD=1 +./tools/dist_train.sh configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py $GPUS +``` +#### Multi-Node Multi-Card Training + +1. PyTorch multi-node multi-card training +On the first machine: +`export ROCBLAS_ATOMICS_MOD=1 && NODES=2 NODE_RANK=0 PORT=12345 MASTER_ADDR=10.1.3.56 sh tools/dist_train.sh configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py $GPUS` +On the second machine: +`export ROCBLAS_ATOMICS_MOD=1 && NODES=2 NODE_RANK=1 PORT=12345 MASTER_ADDR=10.1.3.56 sh tools/dist_train.sh configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py $GPUS` +
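
The ST-GCN parameter notes define performance as batch_size / time, with batch_size = videos_per_gpu * number of cards. The arithmetic can be sketched in Python as below. This is only an illustration: the function names and the sample numbers are hypothetical, and it assumes `time` is the per-iteration time in seconds reported in the training log.

```python
def global_batch_size(videos_per_gpu: int, num_cards: int) -> int:
    """Total samples per iteration: videos_per_gpu * number of cards."""
    if videos_per_gpu <= 0 or num_cards <= 0:
        raise ValueError('videos_per_gpu and num_cards must be positive')
    return videos_per_gpu * num_cards


def throughput(videos_per_gpu: int, num_cards: int, iter_time: float) -> float:
    """Samples per second, assuming iter_time is seconds per iteration."""
    if iter_time <= 0:
        raise ValueError('iter_time must be positive')
    return global_batch_size(videos_per_gpu, num_cards) / iter_time


if __name__ == '__main__':
    # Hypothetical numbers: 16 videos per card, 4 cards, 0.5 s per iteration.
    print(throughput(16, 4, 0.5))  # 64 / 0.5 = 128.0
```

At a fixed batch size, a lower `time` gives a proportionally higher throughput, which is why `time` is the value to watch when comparing runs.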