init

Commit e0a11e60 authored Aug 21, 2024 by luopl
20 changed files
--- a/.gitignore
+++ b/.gitignore
+# output dir
+output
+output*
+instant_test_output
+inference_test_output
+
+
+*.json
+*.diff
+
+# compilation and distribution
+__pycache__
+_ext
+*.pyc
+*.pyd
+*.so
+detectron2.egg-info/
+build/
+dist/
+wheels/
+
+# pytorch/python/numpy formats
+*.pth
+*.pkl
+*.npy
+
+# ipython/jupyter notebooks
+*.ipynb
+**/.ipynb_checkpoints/
+
+# Editor temporaries
+*.swn
+*.swo
+*.swp
+*~
+
+# editor settings
+.idea
+.vscode
+_darcs
+
+# project dirs
+/detectron2/model_zoo/configs
+/datasets/*
+!/datasets/*.*
+/projects/*/datasets
+
--- a/GETTING_STARTED.md
+++ b/GETTING_STARTED.md
+## Getting Started with DiffusionDet
+
+
+
+### Installation
+
+The codebase is built on top of [Detectron2](https://github.com/facebookresearch/detectron2), [Sparse R-CNN](https://github.com/PeizeSun/SparseR-CNN), and [denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
+Many thanks to their authors.
+
+#### Requirements
+- Linux or macOS with Python ≥ 3.6
+- PyTorch ≥ 1.9.0 and [torchvision](https://github.com/pytorch/vision/) that matches the PyTorch installation.
+  You can install them together at [pytorch.org](https://pytorch.org) to make sure of this
+- OpenCV is optional and needed by demo and visualization
+
+#### Steps
+1. Install Detectron2 following https://github.com/facebookresearch/detectron2/blob/main/INSTALL.md#installation.
+
+2. Prepare datasets
+```
+mkdir -p datasets/coco
+mkdir -p datasets/lvis
+
+ln -s /path_to_coco_dataset/annotations datasets/coco/annotations
+ln -s /path_to_coco_dataset/train2017 datasets/coco/train2017
+ln -s /path_to_coco_dataset/val2017 datasets/coco/val2017
+
+ln -s /path_to_lvis_dataset/lvis_v1_train.json datasets/lvis/lvis_v1_train.json
+ln -s /path_to_lvis_dataset/lvis_v1_val.json datasets/lvis/lvis_v1_val.json
+```
+
+3. Prepare pretrained models
+
+DiffusionDet supports three backbones: ResNet-50, ResNet-101, and Swin-Base. The pretrained ResNet-50 model can be
+downloaded automatically by Detectron2. We also provide pretrained
+[ResNet-101](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/torchvision-R-101.pkl) and
+[Swin-Base](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/swin_base_patch4_window7_224_22k.pkl) which are compatible with
+Detectron2. Please download them to `DiffusionDet_ROOT/models/` before training:
+
+```bash
+mkdir models
+cd models
+# ResNet-101
+wget https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/torchvision-R-101.pkl
+
+# Swin-Base
+wget https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/swin_base_patch4_window7_224_22k.pkl
+
+cd ..
+```
+
+Thanks to the authors of the model conversion scripts for [ResNet-101](https://github.com/PeizeSun/SparseR-CNN/blob/main/tools/convert-torchvision-to-d2.py)
+and [Swin-Base](https://github.com/facebookresearch/Detic/blob/main/tools/convert-thirdparty-pretrained-model-to-d2.py).
+
+4. Train DiffusionDet
+```
+python train_net.py --num-gpus 8 \
+    --config-file configs/diffdet.coco.res50.yaml
+```
+
+5. Evaluate DiffusionDet
+```
+python train_net.py --num-gpus 8 \
+    --config-file configs/diffdet.coco.res50.yaml \
+    --eval-only MODEL.WEIGHTS path/to/model.pth
+```
+
+* Evaluate with an arbitrary number (e.g., 300) of boxes by setting `MODEL.DiffusionDet.NUM_PROPOSALS 300`.
+* Evaluate with 4 refinement steps by setting `MODEL.DiffusionDet.SAMPLE_STEP 4`.
+
+
+We also provide the [pretrained model](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_coco_res50_300boxes.pth)
+of [DiffusionDet-300boxes](configs/diffdet.coco.res50.300boxes.yaml) that is used for ablation study.
+
+
+### Inference Demo with Pre-trained Models
+We provide a command line tool to run a simple demo following [Detectron2](https://github.com/facebookresearch/detectron2/tree/main/demo#detectron2-demo).
+
+```bash
+python demo.py --config-file configs/diffdet.coco.res50.yaml \
+    --input image.jpg --opts MODEL.WEIGHTS diffdet_coco_res50.pth
+```
+
+`MODEL.WEIGHTS` must point to a model from the model zoo.
+This command runs inference and shows the visualizations in an OpenCV window.
+
+For details of the command line arguments, see `demo.py -h` or look at its source code
+to understand its behavior. Some common arguments are:
+* To run __on your webcam__, replace `--input image.jpg` with `--webcam`.
+* To run __on a video__, replace `--input image.jpg` with `--video-input video.mp4`.
+* To run __on cpu__, add `MODEL.DEVICE cpu` after `--opts`.
+* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
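The evaluation options described in GETTING_STARTED.md above (`--eval-only`, `MODEL.WEIGHTS`, `MODEL.DiffusionDet.NUM_PROPOSALS`, `MODEL.DiffusionDet.SAMPLE_STEP`) can also be assembled programmatically. The following is a minimal sketch under stated assumptions: the helper name `build_eval_command` and its parameters are hypothetical and do not exist in the repository, and the function only builds the argument list without launching anything.

```python
import shlex
from typing import List, Optional


def build_eval_command(
    config_file: str,
    weights: str,
    num_gpus: int = 8,
    num_proposals: Optional[int] = None,
    sample_step: Optional[int] = None,
) -> List[str]:
    """Build the argv for an evaluation run of train_net.py (hypothetical helper)."""
    cmd = [
        "python", "train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
        "--eval-only",
        "MODEL.WEIGHTS", weights,
    ]
    # Optional config overrides are appended as KEY VALUE pairs,
    # mirroring the options listed in the evaluation step.
    if num_proposals is not None:
        cmd += ["MODEL.DiffusionDet.NUM_PROPOSALS", str(num_proposals)]
    if sample_step is not None:
        cmd += ["MODEL.DiffusionDet.SAMPLE_STEP", str(sample_step)]
    return cmd


if __name__ == "__main__":
    argv = build_eval_command(
        "configs/diffdet.coco.res50.yaml",
        "path/to/model.pth",
        num_proposals=300,
        sample_step=4,
    )
    # Print a shell-quoted command line that can be copied into a terminal.
    print(shlex.join(argv))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.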
--- a/LICENSE
+++ b/LICENSE
+
+Attribution-NonCommercial 4.0 International
+
+=======================================================================
+
+Creative Commons Corporation ("Creative Commons") is not a law firm and
+does not provide legal services or legal advice. Distribution of
+Creative Commons public licenses does not create a lawyer-client or
+other relationship. Creative Commons makes its licenses and related
+information available on an "as-is" basis. Creative Commons gives no
+warranties regarding its licenses, any material licensed under their
+terms and conditions, or any related information. Creative Commons
+disclaims all liability for damages resulting from their use to the
+fullest extent possible.
+
+Using Creative Commons Public Licenses
+
+Creative Commons public licenses provide a standard set of terms and
+conditions that creators and other rights holders may use to share
+original works of authorship and other material subject to copyright
+and certain other rights specified in the public license below. The
+following considerations are for informational purposes only, are not
+exhaustive, and do not form part of our licenses.
+
+     Considerations for licensors: Our public licenses are
+     intended for use by those authorized to give the public
+     permission to use material in ways otherwise restricted by
+     copyright and certain other rights. Our licenses are
+     irrevocable. Licensors should read and understand the terms
+     and conditions of the license they choose before applying it.
+     Licensors should also secure all rights necessary before
+     applying our licenses so that the public can reuse the
+     material as expected. Licensors should clearly mark any
+     material not subject to the license. This includes other CC-
+     licensed material, or material used under an exception or
+     limitation to copyright. More considerations for licensors:
+   wiki.creativecommons.org/Considerations_for_licensors
+
+     Considerations for the public: By using one of our public
+     licenses, a licensor grants the public permission to use the
+     licensed material under specified terms and conditions. If
+     the licensor's permission is not necessary for any reason--for
+     example, because of any applicable exception or limitation to
+     copyright--then that use is not regulated by the license. Our
+     licenses grant only permissions under copyright and certain
+     other rights that a licensor has authority to grant. Use of
+     the licensed material may still be restricted for other
+     reasons, including because others have copyright or other
+     rights in the material. A licensor may make special requests,
+     such as asking that all changes be marked or described.
+     Although not required by our licenses, you are encouraged to
+     respect those requests where reasonable. More_considerations
+     for the public: 
+   wiki.creativecommons.org/Considerations_for_licensees
+
+=======================================================================
+
+Creative Commons Attribution-NonCommercial 4.0 International Public
+License
+
+By exercising the Licensed Rights (defined below), You accept and agree
+to be bound by the terms and conditions of this Creative Commons
+Attribution-NonCommercial 4.0 International Public License ("Public
+License"). To the extent this Public License may be interpreted as a
+contract, You are granted the Licensed Rights in consideration of Your
+acceptance of these terms and conditions, and the Licensor grants You
+such rights in consideration of benefits the Licensor receives from
+making the Licensed Material available under these terms and
+conditions.
+
+Section 1 -- Definitions.
+
+  a. Adapted Material means material subject to Copyright and Similar
+     Rights that is derived from or based upon the Licensed Material
+     and in which the Licensed Material is translated, altered,
+     arranged, transformed, or otherwise modified in a manner requiring
+     permission under the Copyright and Similar Rights held by the
+     Licensor. For purposes of this Public License, where the Licensed
+     Material is a musical work, performance, or sound recording,
+     Adapted Material is always produced where the Licensed Material is
+     synched in timed relation with a moving image.
+
+  b. Adapter's License means the license You apply to Your Copyright
+     and Similar Rights in Your contributions to Adapted Material in
+     accordance with the terms and conditions of this Public License.
+
+  c. Copyright and Similar Rights means copyright and/or similar rights
+     closely related to copyright including, without limitation,
+     performance, broadcast, sound recording, and Sui Generis Database
+     Rights, without regard to how the rights are labeled or
+     categorized. For purposes of this Public License, the rights
+     specified in Section 2(b)(1)-(2) are not Copyright and Similar
+     Rights.
+  d. Effective Technological Measures means those measures that, in the
+     absence of proper authority, may not be circumvented under laws
+     fulfilling obligations under Article 11 of the WIPO Copyright
+     Treaty adopted on December 20, 1996, and/or similar international
+     agreements.
+
+  e. Exceptions and Limitations means fair use, fair dealing, and/or
+     any other exception or limitation to Copyright and Similar Rights
+     that applies to Your use of the Licensed Material.
+
+  f. Licensed Material means the artistic or literary work, database,
+     or other material to which the Licensor applied this Public
+     License.
+
+  g. Licensed Rights means the rights granted to You subject to the
+     terms and conditions of this Public License, which are limited to
+     all Copyright and Similar Rights that apply to Your use of the
+     Licensed Material and that the Licensor has authority to license.
+
+  h. Licensor means the individual(s) or entity(ies) granting rights
+     under this Public License.
+
+  i. NonCommercial means not primarily intended for or directed towards
+     commercial advantage or monetary compensation. For purposes of
+     this Public License, the exchange of the Licensed Material for
+     other material subject to Copyright and Similar Rights by digital
+     file-sharing or similar means is NonCommercial provided there is
+     no payment of monetary compensation in connection with the
+     exchange.
+
+  j. Share means to provide material to the public by any means or
+     process that requires permission under the Licensed Rights, such
+     as reproduction, public display, public performance, distribution,
+     dissemination, communication, or importation, and to make material
+     available to the public including in ways that members of the
+     public may access the material from a place and at a time
+     individually chosen by them.
+
+  k. Sui Generis Database Rights means rights other than copyright
+     resulting from Directive 96/9/EC of the European Parliament and of
+     the Council of 11 March 1996 on the legal protection of databases,
+     as amended and/or succeeded, as well as other essentially
+     equivalent rights anywhere in the world.
+
+  l. You means the individual or entity exercising the Licensed Rights
+     under this Public License. Your has a corresponding meaning.
+
+Section 2 -- Scope.
+
+  a. License grant.
+
+       1. Subject to the terms and conditions of this Public License,
+          the Licensor hereby grants You a worldwide, royalty-free,
+          non-sublicensable, non-exclusive, irrevocable license to
+          exercise the Licensed Rights in the Licensed Material to:
+
+            a. reproduce and Share the Licensed Material, in whole or
+               in part, for NonCommercial purposes only; and
+
+            b. produce, reproduce, and Share Adapted Material for
+               NonCommercial purposes only.
+
+       2. Exceptions and Limitations. For the avoidance of doubt, where
+          Exceptions and Limitations apply to Your use, this Public
+          License does not apply, and You do not need to comply with
+          its terms and conditions.
+
+       3. Term. The term of this Public License is specified in Section
+          6(a).
+
+       4. Media and formats; technical modifications allowed. The
+          Licensor authorizes You to exercise the Licensed Rights in
+          all media and formats whether now known or hereafter created,
+          and to make technical modifications necessary to do so. The
+          Licensor waives and/or agrees not to assert any right or
+          authority to forbid You from making technical modifications
+          necessary to exercise the Licensed Rights, including
+          technical modifications necessary to circumvent Effective
+          Technological Measures. For purposes of this Public License,
+          simply making modifications authorized by this Section 2(a)
+          (4) never produces Adapted Material.
+
+       5. Downstream recipients.
+
+            a. Offer from the Licensor -- Licensed Material. Every
+               recipient of the Licensed Material automatically
+               receives an offer from the Licensor to exercise the
+               Licensed Rights under the terms and conditions of this
+               Public License.
+
+            b. No downstream restrictions. You may not offer or impose
+               any additional or different terms or conditions on, or
+               apply any Effective Technological Measures to, the
+               Licensed Material if doing so restricts exercise of the
+               Licensed Rights by any recipient of the Licensed
+               Material.
+
+       6. No endorsement. Nothing in this Public License constitutes or
+          may be construed as permission to assert or imply that You
+          are, or that Your use of the Licensed Material is, connected
+          with, or sponsored, endorsed, or granted official status by,
+          the Licensor or others designated to receive attribution as
+          provided in Section 3(a)(1)(A)(i).
+
+  b. Other rights.
+
+       1. Moral rights, such as the right of integrity, are not
+          licensed under this Public License, nor are publicity,
+          privacy, and/or other similar personality rights; however, to
+          the extent possible, the Licensor waives and/or agrees not to
+          assert any such rights held by the Licensor to the limited
+          extent necessary to allow You to exercise the Licensed
+          Rights, but not otherwise.
+
+       2. Patent and trademark rights are not licensed under this
+          Public License.
+
+       3. To the extent possible, the Licensor waives any right to
+          collect royalties from You for the exercise of the Licensed
+          Rights, whether directly or through a collecting society
+          under any voluntary or waivable statutory or compulsory
+          licensing scheme. In all other cases the Licensor expressly
+          reserves any right to collect such royalties, including when
+          the Licensed Material is used other than for NonCommercial
+          purposes.
+
+Section 3 -- License Conditions.
+
+Your exercise of the Licensed Rights is expressly made subject to the
+following conditions.
+
+  a. Attribution.
+
+       1. If You Share the Licensed Material (including in modified
+          form), You must:
+
+            a. retain the following if it is supplied by the Licensor
+               with the Licensed Material:
+
+                 i. identification of the creator(s) of the Licensed
+                    Material and any others designated to receive
+                    attribution, in any reasonable manner requested by
+                    the Licensor (including by pseudonym if
+                    designated);
+
+                ii. a copyright notice;
+
+               iii. a notice that refers to this Public License;
+
+                iv. a notice that refers to the disclaimer of
+                    warranties;
+
+                 v. a URI or hyperlink to the Licensed Material to the
+                    extent reasonably practicable;
+
+            b. indicate if You modified the Licensed Material and
+               retain an indication of any previous modifications; and
+
+            c. indicate the Licensed Material is licensed under this
+               Public License, and include the text of, or the URI or
+               hyperlink to, this Public License.
+
+       2. You may satisfy the conditions in Section 3(a)(1) in any
+          reasonable manner based on the medium, means, and context in
+          which You Share the Licensed Material. For example, it may be
+          reasonable to satisfy the conditions by providing a URI or
+          hyperlink to a resource that includes the required
+          information.
+
+       3. If requested by the Licensor, You must remove any of the
+          information required by Section 3(a)(1)(A) to the extent
+          reasonably practicable.
+
+       4. If You Share Adapted Material You produce, the Adapter's
+          License You apply must not prevent recipients of the Adapted
+          Material from complying with this Public License.
+
+Section 4 -- Sui Generis Database Rights.
+
+Where the Licensed Rights include Sui Generis Database Rights that
+apply to Your use of the Licensed Material:
+
+  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
+     to extract, reuse, reproduce, and Share all or a substantial
+     portion of the contents of the database for NonCommercial purposes
+     only;
+
+  b. if You include all or a substantial portion of the database
+     contents in a database in which You have Sui Generis Database
+     Rights, then the database in which You have Sui Generis Database
+     Rights (but not its individual contents) is Adapted Material; and
+
+  c. You must comply with the conditions in Section 3(a) if You Share
+     all or a substantial portion of the contents of the database.
+
+For the avoidance of doubt, this Section 4 supplements and does not
+replace Your obligations under this Public License where the Licensed
+Rights include other Copyright and Similar Rights.
+
+Section 5 -- Disclaimer of Warranties and Limitation of Liability.
+
+  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
+     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
+     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
+     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
+     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
+     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
+     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
+     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
+     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
+     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
+
+  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
+     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
+     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
+     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
+     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
+     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
+     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
+     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
+     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
+
+  c. The disclaimer of warranties and limitation of liability provided
+     above shall be interpreted in a manner that, to the extent
+     possible, most closely approximates an absolute disclaimer and
+     waiver of all liability.
+
+Section 6 -- Term and Termination.
+
+  a. This Public License applies for the term of the Copyright and
+     Similar Rights licensed here. However, if You fail to comply with
+     this Public License, then Your rights under this Public License
+     terminate automatically.
+
+  b. Where Your right to use the Licensed Material has terminated under
+     Section 6(a), it reinstates:
+
+       1. automatically as of the date the violation is cured, provided
+          it is cured within 30 days of Your discovery of the
+          violation; or
+
+       2. upon express reinstatement by the Licensor.
+
+     For the avoidance of doubt, this Section 6(b) does not affect any
+     right the Licensor may have to seek remedies for Your violations
+     of this Public License.
+
+  c. For the avoidance of doubt, the Licensor may also offer the
+     Licensed Material under separate terms or conditions or stop
+     distributing the Licensed Material at any time; however, doing so
+     will not terminate this Public License.
+
+  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
+     License.
+
+Section 7 -- Other Terms and Conditions.
+
+  a. The Licensor shall not be bound by any additional or different
+     terms or conditions communicated by You unless expressly agreed.
+
+  b. Any arrangements, understandings, or agreements regarding the
+     Licensed Material not stated herein are separate from and
+     independent of the terms and conditions of this Public License.
+
+Section 8 -- Interpretation.
+
+  a. For the avoidance of doubt, this Public License does not, and
+     shall not be interpreted to, reduce, limit, restrict, or impose
+     conditions on any use of the Licensed Material that could lawfully
+     be made without permission under this Public License.
+
+  b. To the extent possible, if any provision of this Public License is
+     deemed unenforceable, it shall be automatically reformed to the
+     minimum extent necessary to make it enforceable. If the provision
+     cannot be reformed, it shall be severed from this Public License
+     without affecting the enforceability of the remaining terms and
+     conditions.
+
+  c. No term or condition of this Public License will be waived and no
+     failure to comply consented to unless expressly agreed to by the
+     Licensor.
+
+  d. Nothing in this Public License constitutes or may be interpreted
+     as a limitation upon, or waiver of, any privileges and immunities
+     that apply to the Licensor or You, including from the legal
+     processes of any jurisdiction or authority.
+
+=======================================================================
+
+Creative Commons is not a party to its public
+licenses. Notwithstanding, Creative Commons may elect to apply one of
+its public licenses to material it publishes and in those instances
+will be considered the “Licensor.” The text of the Creative Commons
+public licenses is dedicated to the public domain under the CC0 Public
+Domain Dedication. Except for the limited purpose of indicating that
+material is shared under a Creative Commons public license or as
+otherwise permitted by the Creative Commons policies published at
+creativecommons.org/policies, Creative Commons does not authorize the
+use of the trademark "Creative Commons" or any other trademark or logo
+of Creative Commons without its prior written consent including,
+without limitation, in connection with any unauthorized modifications
+to any of its public licenses or any other arrangements,
+understandings, or agreements concerning use of licensed material. For
+the avoidance of doubt, this paragraph does not form part of the
+public licenses.
+
+Creative Commons may be contacted at creativecommons.org.
\ No newline at end of file
--- a/README.md
+++ b/README.md
+# DiffusionDet
+## Paper
+`DiffusionDet: Diffusion Model for Object Detection`
+- https://arxiv.org/abs/2211.09788
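+## Model Architecture
+ Diffusion models have achieved great success in many generative tasks and are starting to be explored for perception tasks such as image segmentation. However, to the best of the authors' knowledge, no prior work had successfully applied them to object detection.
+ DiffusionDet is a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes.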
+
+<div align=center>
+    <img src="./assets/teaser.png"/>
+</div>
+
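+## Algorithm
+The DiffusionDet framework is shown below. (a) The image encoder extracts feature representations from the input image. The detection decoder takes noisy boxes as input and predicts class labels and box coordinates.
+(b) The detection decoder has 6 stages in one detection head, following the design of DETR and Sparse R-CNN. In addition, DiffusionDet can reuse this detection head (with its 6 stages) multiple times, which is called "iterative evaluation".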
+<div align=center>
+    <img src="./assets/framework.png"/>
+</div>
+
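+## Environment Setup
+### Docker (Method 1)
+This section gives the address for pulling the Docker image from [光源 (Guangyuan)](https://www.sourcefind.cn/#/service-details) and the steps for using it, along with the download address for the deep learning libraries in the [光合 (Guanghe)](https://developer.hpccube.com/tool/) developer community.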
+```
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310 
+docker run -it --shm-size=128G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name diffusiondet_pytorch  <your IMAGE ID> bash # replace <your IMAGE ID> with the ID of the Docker image pulled above; for this image it is c85ed27005f2
+cd /path/your_code_data/diffusiondet_pytorch
+pip install mmcv-2.0.1_das1.0+gitc0ccf15.abi0.dtk2404.torch2.1.-cp310-cp310-manylinux2014_x86_64.whl
+pip install wheel -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com --no-deps
+pip install timm -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
+git clone https://github.com/facebookresearch/detectron2.git
+cd detectron2
+pip install -e . --no-build-isolation -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
+
+```
+### Dockerfile (Method 2)
+This section shows how to use the Dockerfile.
+```
+docker build --no-cache -t diffusiondet:latest .
+docker run -it --shm-size=128G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name diffusiondet_pytorch  diffusiondet  bash
+cd /path/your_code_data/diffusiondet_pytorch
+pip install mmcv-2.0.1_das1.0+gitc0ccf15.abi0.dtk2404.torch2.1.-cp310-cp310-manylinux2014_x86_64.whl
+pip install wheel -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com --no-deps
+pip install timm -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
+git clone https://github.com/facebookresearch/detectron2.git
+cd detectron2
+pip install -e . --no-build-isolation -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
+```
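+### Anaconda (Method 3)
+This section gives detailed steps for local configuration and compilation, for example:
+
+The special deep learning libraries that this project requires for DCU cards can be downloaded and installed from the [光合 (Guanghe)](https://developer.hpccube.com/tool/) developer community.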
+```
+# DTK driver: dtk24.04
+# python: python3.10
+# torch: 2.1.0
+# torchvision: 0.16.0
+conda create -n diffusiondet python=3.10
+conda activate diffusiondet
+pip install torch-2.1.0+das1.0+git00661e0.abi0.dtk2404-cp310-cp310-manylinux2014_x86_64.whl
+pip install torchvision-0.16.0+das1.0+gitc9e7141.abi0.dtk2404.torch2.1-cp310-cp310-manylinux2014_x86_64.whl
+pip install mmcv-2.0.1_das1.0+gitc0ccf15.abi0.dtk2404.torch2.1.-cp310-cp310-manylinux2014_x86_64.whl
+
+```
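+`Tip: the versions of the DCU-related tools above (DTK driver, Python, torch, etc.) must match one another exactly.`
+
+Install the other dependencies as follows: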
+```
+cd /path/your_code_data/sed
+git clone https://github.com/facebookresearch/detectron2.git
+cd detectron2
+pip install -e . --no-build-isolation -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
+pip install timm -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
+```
+## Dataset
+
+SCNet fast download links for the datasets:
+
+[coco](http://113.200.138.88:18080/aidatasets/coco2017)
+
+[lvis](http://113.200.138.88:18080/aidatasets/lvis)
+
+The dataset directory structure is as follows:
+
+```
+ ── dataset
+│   ├── coco
+│   │  ├── annotations
+│   │  ├── train2017
+│   │  └── val2017
+│   ├── lvis
+│   │  ├── lvis_v1_train.json
+│   │  └── lvis_v1_val.json
+```
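+See dataset/readme.md for details on data preparation.
+
+## Training
+First, download the model files:
+
+SCNet fast download link for the model files: [pkl files](http://113.200.138.88:18080/aimodels/diffusiondet_models)
+After downloading, place them in the /path/your_code_data/diffusiondet_pytorch/ folder.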
+```
+mkdir models
+cd models
+# ResNet-101
+wget https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/torchvision-R-101.pkl
+
+# Swin-Base
+wget https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/swin_base_patch4_window7_224_22k.pkl
+
+cd ..
+
+
+```
+
+### Single node, single card
+```
+python train_net.py --config-file configs/diffdet.coco.res50.yaml
+```
+### Single node, multiple cards
+```
+python train_net.py --num-gpus 4 --config-file configs/diffdet.coco.res50.yaml
+```
+
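+## Inference
+The download table for the model weight files is given below; place the files in the weights folder:
+
+Note: the model config file, clip file, and weight file must correspond to one another.
+
+### Single-card inference
+
+Inference Demo
+
+To save outputs to a directory, use `--output`.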
+```
+python demo.py --config-file configs/diffdet.coco.res50.yaml \
+    --input demo.jpg --opts MODEL.WEIGHTS diffdet_coco_res50.pth 
+```
+
+Evaluate DiffusionDet
+```
+python train_net.py \
+    --config-file configs/diffdet.coco.res50.yaml \
+    --eval-only MODEL.WEIGHTS path/to/model.pth
+```
+
+
+### Multi-card inference
+
+```
+python train_net.py --num-gpus 4 \
+    --config-file configs/diffdet.coco.res50.yaml \
+    --eval-only MODEL.WEIGHTS path/to/model.pth
+
+# Evaluate with an arbitrary number (e.g., 300) of boxes by setting MODEL.DiffusionDet.NUM_PROPOSALS 300.
+# Evaluate with 4 refinement steps by setting MODEL.DiffusionDet.SAMPLE_STEP 4.
+```
+
+## Results
+Inference Demo
+
+<div align=center>
+    <img src="./assets/demo.jpg"/>
+</div>
+
+
+
+
+### Accuracy
+Inference was run on four DCU-K100 AI cards.
+
+| Method | Box AP (1 step) | Box AP (4 step) |
+|:---:|:---:|:---:|
+| [COCO-Res50](configs/diffdet.coco.res50.yaml) | 45.7 | 46.1 |
+| [COCO-Res101](configs/diffdet.coco.res101.yaml) | 46.6 | 46.9 |
+| [COCO-SwinBase](configs/diffdet.coco.swinbase.yaml) | 52.3 | 52.7 |
+| [LVIS-Res50](configs/diffdet.lvis.res50.yaml) | 30.4 | 31.8 |
+| [LVIS-Res101](configs/diffdet.lvis.res101.yaml) | 31.9 | 32.9 |
+| [LVIS-SwinBase](configs/diffdet.lvis.swinbase.yaml) | 40.6 | 41.9 |
+
+
+
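+## Application Scenarios
+### Algorithm Category
+`Object detection`
+### Key Application Industries
+`Scientific research, manufacturing, healthcare, smart home, education`
+## Source Repository and Issue Feedback
+- https://developer.hpccube.com/codes/modelzoo/diffusiondet-pytorch
+## References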
+- https://github.com/ShoufaChen/DiffusionDet
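The dataset layout shown in the README above (a `coco` folder with `annotations`, `train2017`, and `val2017`, plus an `lvis` folder with the two `lvis_v1_*.json` files) can be checked before a training run is launched. The following is a minimal sketch, assuming the layout is exactly as listed; the names `EXPECTED_ENTRIES` and `missing_dataset_entries` are hypothetical, and the root directory is a parameter because the README tree uses `dataset` while GETTING_STARTED.md uses `datasets`.

```python
from pathlib import Path
from typing import List

# Entries taken from the directory tree in the README; paths are relative
# to the dataset root directory.
EXPECTED_ENTRIES = [
    "coco/annotations",
    "coco/train2017",
    "coco/val2017",
    "lvis/lvis_v1_train.json",
    "lvis/lvis_v1_val.json",
]


def missing_dataset_entries(root: str) -> List[str]:
    """Return the expected entries that do not exist under `root` (hypothetical helper)."""
    base = Path(root)
    # Path.exists() follows symlinks, so a dangling `ln -s` target counts as missing.
    return [rel for rel in EXPECTED_ENTRIES if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_dataset_entries("datasets")
    if missing:
        print("Missing dataset entries: " + ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Because the check follows symbolic links, it also catches links created with `ln -s` whose source path was mistyped.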
+
--- a/README_ori.md
+++ b/README_ori.md
+## DiffusionDet: Diffusion Model for Object Detection
+
+**DiffusionDet is the first work to apply a diffusion model to object detection.**
+
+![](teaser.png)
+
+
+> [**DiffusionDet: Diffusion Model for Object Detection**](https://arxiv.org/abs/2211.09788)               
+> [Shoufa Chen](https://www.shoufachen.com/), [Peize Sun](https://peizesun.github.io/), [Yibing Song](https://ybsong00.github.io/), [Ping Luo](http://luoping.me/)                 
+> *[arXiv 2211.09788](https://arxiv.org/abs/2211.09788)* 
+
+## Updates
+- (11/2022) Code is released.
+
+## Models
+Method | Box AP (1 step) | Box AP (4 step) | Download
+--- |:---:|:---:|:---:
+[COCO-Res50](configs/diffdet.coco.res50.yaml) | 45.5 | 46.1 | [model](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_coco_res50.pth)
+[COCO-Res101](configs/diffdet.coco.res101.yaml) | 46.6 | 46.9 | [model](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_coco_res101.pth)
+[COCO-SwinBase](configs/diffdet.coco.swinbase.yaml) | 52.3 | 52.7 | [model](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_coco_swinbase.pth)
+[LVIS-Res50](configs/diffdet.lvis.res50.yaml) | 30.4 | 31.8 | [model](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_lvis_res50.pth)
+[LVIS-Res101](configs/diffdet.lvis.res101.yaml) | 31.9 | 32.9 | [model](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_lvis_res101.pth)
+[LVIS-SwinBase](configs/diffdet.lvis.swinbase.yaml) | 40.6 | 41.9 | [model](https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_lvis_swinbase.pth)
+
+
+## Getting Started
+
+Installation instructions and usage are in [Getting Started with DiffusionDet](GETTING_STARTED.md).
+
+
+## License
+
+This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.
+
+
+## Citing DiffusionDet
+
+If you use DiffusionDet in your research or wish to refer to the baseline results published here, please use the following BibTeX entry.
+
+```BibTeX
+@article{chen2022diffusiondet,
+      title={DiffusionDet: Diffusion Model for Object Detection},
+      author={Chen, Shoufa and Sun, Peize and Song, Yibing and Luo, Ping},
+      journal={arXiv preprint arXiv:2211.09788},
+      year={2022}
+}
+```
\ No newline at end of file
--- a/assets/demo.jpg
+++ b/assets/demo.jpg
--- a/assets/framework.png
+++ b/assets/framework.png
--- a/assets/teaser.png
+++ b/assets/teaser.png
--- a/configs/Base-DiffusionDet.yaml
+++ b/configs/Base-DiffusionDet.yaml
+MODEL:
+  META_ARCHITECTURE: "DiffusionDet"
+  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
+  PIXEL_MEAN: [123.675, 116.280, 103.530]
+  PIXEL_STD: [58.395, 57.120, 57.375]
+  BACKBONE:
+    NAME: "build_resnet_fpn_backbone"
+  RESNETS:
+    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
+  FPN:
+    IN_FEATURES: ["res2", "res3", "res4", "res5"]
+  ROI_HEADS:
+    IN_FEATURES: ["p2", "p3", "p4", "p5"]
+  ROI_BOX_HEAD:
+    POOLER_TYPE: "ROIAlignV2"
+    POOLER_RESOLUTION: 7
+    POOLER_SAMPLING_RATIO: 2
+SOLVER:
+  IMS_PER_BATCH: 16
+  BASE_LR: 0.000025
+  STEPS: (210000, 250000)
+  MAX_ITER: 270000
+  WARMUP_FACTOR: 0.01
+  WARMUP_ITERS: 1000
+  WEIGHT_DECAY: 0.0001
+  OPTIMIZER: "ADAMW"
+  BACKBONE_MULTIPLIER: 1.0  # the backbone uses the same learning rate as BASE_LR.
+  CLIP_GRADIENTS:
+    ENABLED: True
+    CLIP_TYPE: "full_model"
+    CLIP_VALUE: 1.0
+    NORM_TYPE: 2.0
+SEED: 40244023
+INPUT:
+  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
+  CROP:
+    ENABLED: False
+    TYPE: "absolute_range"
+    SIZE: (384, 600)
+  FORMAT: "RGB"
+TEST:
+  EVAL_PERIOD: 7330
+DATALOADER:
+  FILTER_EMPTY_ANNOTATIONS: False
+  NUM_WORKERS: 4
+VERSION: 2
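The `SOLVER` block above fixes the base learning rate, the warmup length, and the two decay milestones. The following is a minimal sketch of the resulting schedule, assuming Detectron2's default linear warmup and a decay factor (`GAMMA`) of 0.1. Neither is set in this file, so both are assumptions, and `lr_at` is a hypothetical helper rather than part of the repository.

```python
from bisect import bisect_right

# Values copied from the SOLVER block of Base-DiffusionDet.yaml.
BASE_LR = 0.000025
STEPS = (210000, 250000)
WARMUP_FACTOR = 0.01
WARMUP_ITERS = 1000

# Assumption: not set in the config; Detectron2's default decay factor.
GAMMA = 0.1


def lr_at(iteration):
    """Learning rate at a given iteration under linear warmup + step decay."""
    if iteration < WARMUP_ITERS:
        # Linear ramp from WARMUP_FACTOR up to 1.0 over WARMUP_ITERS iterations.
        alpha = iteration / WARMUP_ITERS
        warmup = WARMUP_FACTOR * (1 - alpha) + alpha
    else:
        warmup = 1.0
    # Multiply by GAMMA once for every milestone already reached.
    return BASE_LR * warmup * GAMMA ** bisect_right(STEPS, iteration)


for it in (0, 500, 1000, 210000, 250000):
    print(it, lr_at(it))
```

The COCO configs override `STEPS` and `MAX_ITER`, so for those runs the milestones in this sketch would be `(350000, 420000)` instead.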
--- a/configs/diffdet.coco.res101.yaml
+++ b/configs/diffdet.coco.res101.yaml
+_BASE_: "Base-DiffusionDet.yaml"
+MODEL:
+  WEIGHTS: "models/torchvision-R-101.pkl"
+  RESNETS:
+    DEPTH: 101
+    STRIDE_IN_1X1: False
+  DiffusionDet:
+    NUM_PROPOSALS: 500
+    NUM_CLASSES: 80
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST:  ("coco_2017_val",)
+SOLVER:
+  STEPS: (350000, 420000)
+  MAX_ITER: 450000
+INPUT:
+  CROP:
+    ENABLED: True
+  FORMAT: "RGB"
--- a/configs/diffdet.coco.res50.300boxes.yaml
+++ b/configs/diffdet.coco.res50.300boxes.yaml
+_BASE_: "Base-DiffusionDet.yaml"
+MODEL:
+  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
+  RESNETS:
+    DEPTH: 50
+    STRIDE_IN_1X1: False
+  DiffusionDet:
+    NUM_PROPOSALS: 300
+    NUM_CLASSES: 80
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST:  ("coco_2017_val",)
+SOLVER:
+  STEPS: (350000, 420000)
+  MAX_ITER: 450000
+INPUT:
+  CROP:
+    ENABLED: True
+  FORMAT: "RGB"
--- a/configs/diffdet.coco.res50.yaml
+++ b/configs/diffdet.coco.res50.yaml
+_BASE_: "Base-DiffusionDet.yaml"
+MODEL:
+  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
+  RESNETS:
+    DEPTH: 50
+    STRIDE_IN_1X1: False
+  DiffusionDet:
+    NUM_PROPOSALS: 500
+    NUM_CLASSES: 80
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST:  ("coco_2017_val",)
+SOLVER:
+  STEPS: (350000, 420000)
+  MAX_ITER: 450000
+INPUT:
+  CROP:
+    ENABLED: True
+  FORMAT: "RGB"
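Each model config names `Base-DiffusionDet.yaml` in `_BASE_` and restates only the keys it changes. The sketch below illustrates that overlay with plain dictionaries: the parent is loaded first, then the child's keys are merged in recursively. `merge` is a hypothetical stand-in for the config loader, and only a subset of keys from the two files is reproduced.

```python
def merge(base, override):
    """Recursively overlay `override` on `base` without mutating either."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Subset of Base-DiffusionDet.yaml.
base = {
    "SOLVER": {"BASE_LR": 0.000025, "STEPS": (210000, 250000), "MAX_ITER": 270000},
    "INPUT": {
        "CROP": {"ENABLED": False, "TYPE": "absolute_range", "SIZE": (384, 600)},
        "FORMAT": "RGB",
    },
}

# Subset of diffdet.coco.res50.yaml.
child = {
    "SOLVER": {"STEPS": (350000, 420000), "MAX_ITER": 450000},
    "INPUT": {"CROP": {"ENABLED": True}, "FORMAT": "RGB"},
}

effective = merge(base, child)
print(effective["SOLVER"])
print(effective["INPUT"]["CROP"])
```

Under this reading, the COCO ResNet-50 run keeps the base learning rate and crop size, lengthens the schedule to 450000 iterations, and turns cropping on.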
--- a/configs/diffdet.coco.swinbase.yaml
+++ b/configs/diffdet.coco.swinbase.yaml
+_BASE_: "Base-DiffusionDet.yaml"
+MODEL:
+  WEIGHTS: "models/swin_base_patch4_window7_224_22k.pkl"
+  BACKBONE:
+    NAME: build_swintransformer_fpn_backbone
+  SWIN:
+    SIZE: B-22k
+  FPN:
+    IN_FEATURES: ["swin0", "swin1", "swin2", "swin3" ]
+  DiffusionDet:
+    NUM_PROPOSALS: 500
+    NUM_CLASSES: 80
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST:  ("coco_2017_val",)
+SOLVER:
+  STEPS: (350000, 420000)
+  MAX_ITER: 450000
+INPUT:
+  CROP:
+    ENABLED: True
+  FORMAT: "RGB"
--- a/configs/diffdet.lvis.res101.yaml
+++ b/configs/diffdet.lvis.res101.yaml
+_BASE_: "Base-DiffusionDet.yaml"
+MODEL:
+  WEIGHTS: "models/torchvision-R-101.pkl"
+  RESNETS:
+    DEPTH: 101
+    STRIDE_IN_1X1: False
+  ROI_HEADS:
+    NUM_CLASSES: 1203  # LVIS
+  DiffusionDet:
+    NUM_PROPOSALS: 500
+    NUM_CLASSES: 1203  # LVIS
+    USE_FED_LOSS: True  # LVIS
+DATASETS:  # LVIS
+  TRAIN: ("lvis_v1_train",)
+  TEST: ("lvis_v1_val",)
+DATALOADER:
+  SAMPLER_TRAIN: "RepeatFactorTrainingSampler"
+  REPEAT_THRESHOLD: 0.001
+SOLVER:
+  STEPS: (210000, 250000)
+  MAX_ITER: 270000
+INPUT:
+  CROP:
+    ENABLED: True
+  FORMAT: "RGB"
+TEST:  # LVIS
+  EVAL_PERIOD: 0  # disable evaluation during training because it takes a long time
--- a/configs/diffdet.lvis.res50.yaml
+++ b/configs/diffdet.lvis.res50.yaml
+_BASE_: "Base-DiffusionDet.yaml"
+MODEL:
+  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
+  RESNETS:
+    DEPTH: 50
+    STRIDE_IN_1X1: False
+  ROI_HEADS:
+    NUM_CLASSES: 1203  # LVIS
+  DiffusionDet:
+    NUM_PROPOSALS: 500
+    NUM_CLASSES: 1203  # LVIS
+    USE_FED_LOSS: True  # LVIS
+DATASETS:  # LVIS
+  TRAIN: ("lvis_v1_train",)
+  TEST: ("lvis_v1_val",)
+DATALOADER:
+  SAMPLER_TRAIN: "RepeatFactorTrainingSampler"
+  REPEAT_THRESHOLD: 0.001
+SOLVER:
+  STEPS: (210000, 250000)
+  MAX_ITER: 270000
+INPUT:
+  CROP:
+    ENABLED: True
+  FORMAT: "RGB"
+TEST:  # LVIS
+  EVAL_PERIOD: 0  # disable evaluation during training because it takes a long time
--- a/configs/diffdet.lvis.swinbase.yaml
+++ b/configs/diffdet.lvis.swinbase.yaml
+_BASE_: "Base-DiffusionDet.yaml"
+MODEL:
+  WEIGHTS: "models/swin_base_patch4_window7_224_22k.pkl"
+  BACKBONE:
+    NAME: build_swintransformer_fpn_backbone
+  SWIN:
+    SIZE: B-22k
+  FPN:
+    IN_FEATURES: [ "swin0", "swin1", "swin2", "swin3" ]
+  ROI_HEADS:
+    NUM_CLASSES: 1203  # LVIS
+  DiffusionDet:
+    NUM_PROPOSALS: 500
+    NUM_CLASSES: 1203  # LVIS
+    USE_FED_LOSS: True  # LVIS
+DATASETS:  # LVIS
+  TRAIN: ("lvis_v1_train",)
+  TEST: ("lvis_v1_val",)
+DATALOADER:
+  SAMPLER_TRAIN: "RepeatFactorTrainingSampler"
+  REPEAT_THRESHOLD: 0.001
+SOLVER:
+  STEPS: (210000, 250000)
+  MAX_ITER: 270000
+INPUT:
+  CROP:
+    ENABLED: True
+  FORMAT: "RGB"
+TEST:  # LVIS
+  EVAL_PERIOD: 0  # disable evaluation during training because it takes a long time
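The LVIS configs select `RepeatFactorTrainingSampler` with `REPEAT_THRESHOLD: 0.001`. The sketch below shows the repeat-factor rule that this sampler is understood to follow: a category that appears in a fraction f of the images gets the factor max(1, sqrt(t / f)), and an image takes the largest factor among its categories. The function name and the toy dataset are hypothetical, and the stochastic rounding of fractional factors at sampling time is left out.

```python
import math
from collections import Counter

REPEAT_THRESHOLD = 0.001  # value used by the LVIS configs


def image_repeat_factors(image_categories, threshold=REPEAT_THRESHOLD):
    """Return one repeat factor per image from its list of category labels."""
    num_images = len(image_categories)
    # Count the number of images (not instances) that contain each category.
    counts = Counter(c for cats in image_categories for c in set(cats))
    category_rep = {
        c: max(1.0, math.sqrt(threshold / (n / num_images)))
        for c, n in counts.items()
    }
    # An image is repeated as often as its rarest category requires.
    return [
        max((category_rep[c] for c in set(cats)), default=1.0)
        for cats in image_categories
    ]


# Toy dataset: "common" is in every image, "rare" in one image out of 10000.
images = [["common"]] * 9999 + [["common", "rare"]]
factors = image_repeat_factors(images)
print(factors[0], factors[-1])
```

Categories that occur in more than 0.1% of the images keep a factor of 1, so only the long tail of LVIS is oversampled.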
--- a/demo.jpg
+++ b/demo.jpg
--- a/demo.py
+++ b/demo.py
+# Copyright (c) Facebook, Inc. and its affiliates.
+import argparse
+import glob
+import multiprocessing as mp
+import numpy as np
+import os
+import tempfile
+import time
+import warnings
+import cv2
+import tqdm
+
+from detectron2.config import get_cfg
+from detectron2.data.detection_utils import read_image
+from detectron2.utils.logger import setup_logger
+
+from diffusiondet.predictor import VisualizationDemo
+from diffusiondet import DiffusionDetDatasetMapper, add_diffusiondet_config, DiffusionDetWithTTA
+from diffusiondet.util.model_ema import add_model_ema_configs, may_build_model_ema, may_get_ema_checkpointer, EMAHook, \
+    apply_model_ema_and_restore, EMADetectionCheckpointer
+
+# constants
+WINDOW_NAME = "COCO detections"
+
+
+def setup_cfg(args):
+    # load config from file and command-line arguments
+    cfg = get_cfg()
+    # To use demo for Panoptic-DeepLab, please uncomment the following two lines.
+    # from detectron2.projects.panoptic_deeplab import add_panoptic_deeplab_config  # noqa
+    # add_panoptic_deeplab_config(cfg)
+    add_diffusiondet_config(cfg)
+    add_model_ema_configs(cfg)
+    cfg.merge_from_file(args.config_file)
+    cfg.merge_from_list(args.opts)
+    # Set score_threshold for builtin models
+    cfg.MODEL.RETINANET.SCORE_THRESH_TEST = args.confidence_threshold
+    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = args.confidence_threshold
+    cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = args.confidence_threshold
+    cfg.freeze()
+    return cfg
+
+
+def get_parser():
+    parser = argparse.ArgumentParser(description="Detectron2 demo for builtin configs")
+    parser.add_argument(
+        "--config-file",
+        default="configs/quick_schedules/mask_rcnn_R_50_FPN_inference_acc_test.yaml",
+        metavar="FILE",
+        help="path to config file",
+    )
+    parser.add_argument("--webcam", action="store_true", help="Take inputs from webcam.")
+    parser.add_argument("--video-input", help="Path to video file.")
+    parser.add_argument(
+        "--input",
+        nargs="+",
+        help="A list of space separated input images; "
+        "or a single glob pattern such as 'directory/*.jpg'",
+    )
+    parser.add_argument(
+        "--output",
+        help="A file or directory to save output visualizations. "
+        "If not given, will show output in an OpenCV window.",
+    )
+
+    parser.add_argument(
+        "--confidence-threshold",
+        type=float,
+        default=0.5,
+        help="Minimum score for instance predictions to be shown",
+    )
+    parser.add_argument(
+        "--opts",
+        help="Modify config options using the command-line 'KEY VALUE' pairs",
+        default=[],
+        nargs=argparse.REMAINDER,
+    )
+    return parser
+
+
+def test_opencv_video_format(codec, file_ext):
+    with tempfile.TemporaryDirectory(prefix="video_format_test") as dir:
+        filename = os.path.join(dir, "test_file" + file_ext)
+        writer = cv2.VideoWriter(
+            filename=filename,
+            fourcc=cv2.VideoWriter_fourcc(*codec),
+            fps=float(30),
+            frameSize=(10, 10),
+            isColor=True,
+        )
+        for _ in range(30):
+            writer.write(np.zeros((10, 10, 3), np.uint8))
+        writer.release()
+        return os.path.isfile(filename)
+
+
+if __name__ == "__main__":
+    mp.set_start_method("spawn", force=True)
+    args = get_parser().parse_args()
+    setup_logger(name="fvcore")
+    logger = setup_logger()
+    logger.info("Arguments: " + str(args))
+
+    cfg = setup_cfg(args)
+
+    demo = VisualizationDemo(cfg)
+
+    if args.input:
+        if len(args.input) == 1:
+            args.input = glob.glob(os.path.expanduser(args.input[0]))
+            assert args.input, "The input path(s) was not found"
+        for path in tqdm.tqdm(args.input, disable=not args.output):
+            # use PIL, to be consistent with evaluation
+            img = read_image(path, format="BGR")
+            start_time = time.time()
+            predictions, visualized_output = demo.run_on_image(img)
+            logger.info(
+                "{}: {} in {:.2f}s".format(
+                    path,
+                    "detected {} instances".format(len(predictions["instances"]))
+                    if "instances" in predictions
+                    else "finished",
+                    time.time() - start_time,
+                )
+            )
+
+            if args.output:
+                if os.path.isdir(args.output):
+                    out_filename = os.path.join(args.output, os.path.basename(path))
+                else:
+                    assert len(args.input) == 1, "Please specify a directory with args.output"
+                    out_filename = args.output
+                visualized_output.save(out_filename)
+            else:
+                cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
+                cv2.imshow(WINDOW_NAME, visualized_output.get_image()[:, :, ::-1])
+                if cv2.waitKey(0) == 27:
+                    break  # esc to quit
+    elif args.webcam:
+        assert args.input is None, "Cannot have both --input and --webcam!"
+        assert args.output is None, "output not yet supported with --webcam!"
+        cam = cv2.VideoCapture(0)
+        for vis in tqdm.tqdm(demo.run_on_video(cam)):
+            cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
+            cv2.imshow(WINDOW_NAME, vis)
+            if cv2.waitKey(1) == 27:
+                break  # esc to quit
+        cam.release()
+        cv2.destroyAllWindows()
+    elif args.video_input:
+        video = cv2.VideoCapture(args.video_input)
+        width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
+        height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
+        frames_per_second = video.get(cv2.CAP_PROP_FPS)
+        num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
+        basename = os.path.basename(args.video_input)
+        codec, file_ext = (
+            ("x264", ".mkv") if test_opencv_video_format("x264", ".mkv") else ("mp4v", ".mp4")
+        )
+        if codec == "mp4v":
+            warnings.warn("x264 codec not available, switching to mp4v")
+        if args.output:
+            if os.path.isdir(args.output):
+                output_fname = os.path.join(args.output, basename)
+                output_fname = os.path.splitext(output_fname)[0] + file_ext
+            else:
+                output_fname = args.output
+            assert not os.path.isfile(output_fname), output_fname
+            output_file = cv2.VideoWriter(
+                filename=output_fname,
+                # some installations of OpenCV may not support x264 (due to its license);
+                # you can try another format (e.g. MPEG)
+                fourcc=cv2.VideoWriter_fourcc(*codec),
+                fps=float(frames_per_second),
+                frameSize=(width, height),
+                isColor=True,
+            )
+        assert os.path.isfile(args.video_input)
+        for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):
+            if args.output:
+                output_file.write(vis_frame)
+            else:
+                cv2.namedWindow(basename, cv2.WINDOW_NORMAL)
+                cv2.imshow(basename, vis_frame)
+                if cv2.waitKey(1) == 27:
+                    break  # esc to quit
+        video.release()
+        if args.output:
+            output_file.release()
+        else:
+            cv2.destroyAllWindows()
--- a/diffusiondet/__init__.py
+++ b/diffusiondet/__init__.py
+# ========================================
+# Modified by Shoufa Chen
+# ========================================
+# Modified by Peize Sun, Rufeng Zhang
+# Contact: {sunpeize, cxrfzhang}@foxmail.com
+#
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+from .config import add_diffusiondet_config
+from .detector import DiffusionDet
+from .dataset_mapper import DiffusionDetDatasetMapper
+from .test_time_augmentation import DiffusionDetWithTTA
+from .swintransformer import build_swintransformer_fpn_backbone
--- a/diffusiondet/config.py
+++ b/diffusiondet/config.py
+# ========================================
+# Modified by Shoufa Chen
+# ========================================
+# Modified by Peize Sun, Rufeng Zhang
+# Contact: {sunpeize, cxrfzhang}@foxmail.com
+#
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+from detectron2.config import CfgNode as CN
+
+
+def add_diffusiondet_config(cfg):
+    """
+    Add config for DiffusionDet
+    """
+    cfg.MODEL.DiffusionDet = CN()
+    cfg.MODEL.DiffusionDet.NUM_CLASSES = 80
+    cfg.MODEL.DiffusionDet.NUM_PROPOSALS = 300
+
+    # RCNN Head.
+    cfg.MODEL.DiffusionDet.NHEADS = 8
+    cfg.MODEL.DiffusionDet.DROPOUT = 0.0
+    cfg.MODEL.DiffusionDet.DIM_FEEDFORWARD = 2048
+    cfg.MODEL.DiffusionDet.ACTIVATION = 'relu'
+    cfg.MODEL.DiffusionDet.HIDDEN_DIM = 256
+    cfg.MODEL.DiffusionDet.NUM_CLS = 1
+    cfg.MODEL.DiffusionDet.NUM_REG = 3
+    cfg.MODEL.DiffusionDet.NUM_HEADS = 6
+
+    # Dynamic Conv.
+    cfg.MODEL.DiffusionDet.NUM_DYNAMIC = 2
+    cfg.MODEL.DiffusionDet.DIM_DYNAMIC = 64
+
+    # Loss.
+    cfg.MODEL.DiffusionDet.CLASS_WEIGHT = 2.0
+    cfg.MODEL.DiffusionDet.GIOU_WEIGHT = 2.0
+    cfg.MODEL.DiffusionDet.L1_WEIGHT = 5.0
+    cfg.MODEL.DiffusionDet.DEEP_SUPERVISION = True
+    cfg.MODEL.DiffusionDet.NO_OBJECT_WEIGHT = 0.1
+
+    # Focal Loss.
+    cfg.MODEL.DiffusionDet.USE_FOCAL = True
+    cfg.MODEL.DiffusionDet.USE_FED_LOSS = False
+    cfg.MODEL.DiffusionDet.ALPHA = 0.25
+    cfg.MODEL.DiffusionDet.GAMMA = 2.0
+    cfg.MODEL.DiffusionDet.PRIOR_PROB = 0.01
+
+    # Dynamic K
+    cfg.MODEL.DiffusionDet.OTA_K = 5
+
+    # Diffusion
+    cfg.MODEL.DiffusionDet.SNR_SCALE = 2.0
+    cfg.MODEL.DiffusionDet.SAMPLE_STEP = 1
+
+    # Inference
+    cfg.MODEL.DiffusionDet.USE_NMS = True
+
+    # Swin Backbones
+    cfg.MODEL.SWIN = CN()
+    cfg.MODEL.SWIN.SIZE = 'B'  # 'T', 'S', 'B'
+    cfg.MODEL.SWIN.USE_CHECKPOINT = False
+    cfg.MODEL.SWIN.OUT_FEATURES = (0, 1, 2, 3)  # modify
+
+    # Optimizer.
+    cfg.SOLVER.OPTIMIZER = "ADAMW"
+    cfg.SOLVER.BACKBONE_MULTIPLIER = 1.0
+
+    # TTA.
+    cfg.TEST.AUG.MIN_SIZES = (400, 500, 600, 640, 700, 900, 1000, 1100, 1200, 1300, 1400, 1800, 800)
+    cfg.TEST.AUG.CVPODS_TTA = True
+    cfg.TEST.AUG.SCALE_FILTER = True
+    cfg.TEST.AUG.SCALE_RANGES = ([96, 10000], [96, 10000], 
+                                 [64, 10000], [64, 10000],
+                                 [64, 10000], [0, 10000],
+                                 [0, 10000], [0, 256],
+                                 [0, 256], [0, 192],
+                                 [0, 192], [0, 96],
+                                 [0, 10000])