"tests/python/common/test_batch-heterograph.py" did not exist on "09642d7c21c5930ae678d8b7c348be2302225a2a"
Commit f8772570 authored by bailuo's avatar bailuo
Browse files

init

parents
MIT License
Copyright (c) 2022 Bin Yan
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Unicorn
Towards the grand unification of object tracking.
## Paper
`Towards Grand Unification of Object Tracking`
- https://arxiv.org/abs/2207.07078
- ECCV 2022
## Model Architecture
Unicorn's unification lies in solving four tracking problems (SOT, MOT, VOS, MOTS) simultaneously with a single network using the same model parameters.
<div align=center>
<img src="./doc/unicorn.png"/>
</div>
## Algorithm
Unicorn consists of three components: unified inputs and backbone, unified embedding, and unified head. They are responsible for obtaining strong visual representations, establishing accurate correspondences, and detecting diverse tracking targets, respectively.
- Unified inputs and backbone\
To localize multiple potential targets effectively, Unicorn takes the whole image (reference frame and current frame) rather than a local search region as input. During feature extraction, the reference frame and the current frame pass through a weight-sharing backbone to obtain feature pyramid (FPN) representations. To preserve important details while keeping the cost of computing correspondences manageable, the feature map with stride 16 is chosen as the input to the subsequent embedding module. The corresponding features of the reference frame and the current frame are denoted F_ref and F_cur.
- Unified embedding\
The core task of object tracking is establishing accurate correspondences between frames of a video. For SOT and VOS, pixel-wise correspondence propagates the user-specified target from the reference frame (usually the 1st frame) to the t-th frame, providing a strong prior for the final box or mask prediction. For MOT and MOTS, instance-level correspondence helps associate the instances detected in the t-th frame with the existing tracks in the reference frame (usually the (t-1)-th frame).
- Unified head\
Another important and challenging step towards the grand unification of object tracking is designing a single head for all four tracking tasks. Specifically, MOT detects objects of predefined classes, while SOT must detect whatever target is specified in the reference frame. To bridge this gap, Unicorn introduces an extra input to the original detector head, called the target prior. Without any further modification, this unified head can then detect the diverse targets required by all four tasks. A minimal illustrative sketch of these two ideas is given after the figure below.
<div align=center>
<img src="./doc/Unicorn_components.png"/>
</div>
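The sketch below is a minimal, illustrative PyTorch rendering of these two ideas: pixel-wise correspondence propagates the reference-frame target to the current frame, and the resulting target prior is fused into an otherwise ordinary detection head. All names, shapes, and the fusion scheme are assumptions made for illustration; this is not the actual Unicorn implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def propagate_target(emb_ref, emb_cur, mask_ref):
    """Propagate a reference-frame target mask to the current frame via
    pixel-wise correspondence. emb_ref, emb_cur: (B, C, H, W); mask_ref: (B, 1, H, W)."""
    B, C, H, W = emb_ref.shape
    ref = F.normalize(emb_ref.flatten(2), dim=1)       # (B, C, H*W)
    cur = F.normalize(emb_cur.flatten(2), dim=1)       # (B, C, H*W)
    corr = torch.einsum("bci,bcj->bij", ref, cur)      # similarity of every ref/cur pixel pair
    attn = corr.softmax(dim=1)                         # normalize over reference pixels
    prior = torch.einsum("bij,bi->bj", attn, mask_ref.flatten(1))
    return prior.view(B, 1, H, W)                      # soft target prior on the current frame

class UnifiedHead(nn.Module):
    """One detection head for all four tasks; the target prior is the only extra input."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.prior_proj = nn.Conv2d(1, channels, kernel_size=1)
        self.cls_branch = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.reg_branch = nn.Conv2d(channels, 4, kernel_size=3, padding=1)

    def forward(self, feat_cur, target_prior):
        fused = feat_cur + self.prior_proj(target_prior)
        return self.cls_branch(fused), self.reg_branch(fused)

if __name__ == "__main__":
    B, C, H, W = 1, 256, 40, 64
    emb_ref, emb_cur = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
    mask_ref = torch.zeros(B, 1, H, W)
    mask_ref[:, :, 10:20, 20:30] = 1.0                 # user-given target on the reference frame
    prior = propagate_target(emb_ref, emb_cur, mask_ref)   # SOT/VOS-style prior
    head = UnifiedHead(C)
    cls_logits, box_deltas = head(emb_cur, prior)      # MOT/MOTS would pass an all-ones prior
    print(prior.shape, cls_logits.shape, box_deltas.shape)
```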
## Environment Setup
```
mv unicorn_pytorch unicorn # drop the framework suffix from the directory name
# adjust the -v paths, docker_name, and imageID below to your environment
```
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-23.04-py37-latest # the imageID of this image is 7c32d5b3e7d1
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/unicorn
pip3 install -U pip && pip3 install -r requirements.txt
pip3 install -v -e . # or python3 setup.py develop
# Install Deformable Attention
cd unicorn/models/ops
bash make.sh
cd ../../..
# Install mmcv, mmdet
cd external/qdtrack
# make sure mmcv_full==1.6.1+git0d20119.abi0.dtk2304.torch1.10 is installed in the environment
pip3 install --user mmdet==2.26
# Install bdd100k
cd bdd100k
python3 setup.py develop --user
pip3 uninstall -y scalabel
pip3 install --user git+https://github.com/scalabel/scalabel.git
cd ../../..
```
### Dockerfile (Option 2)
```
cd /your_code_path/unicorn/docker
docker build --no-cache -t codestral:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name codestral:latest bash
cd /your_code_path/unicorn
pip install -r requirements.txt
```
### Anaconda (Option 3)
The DCU-specific deep learning libraries required by this project can be downloaded from the [光合 developer community](https://developer.hpccube.com/tool/).
```
DTK driver: dtk23.04
python: python3.7
pytorch: 1.10.0
```
`Tips: the DTK driver, python, pytorch, and other DCU-related tool versions above must correspond to each other exactly.`
Install the remaining (non-deep-learning) dependencies from requirements.txt:
```
pip install -r requirements.txt
```
## Dataset
The datasets include COCO, LaSOT, GOT-10K, TrackingNet, DAVIS, Youtube-VOS 2018, MOT17, CrowdHuman, ETHZ, etc.\
For the SOT task, `GOT-10K` is provided here for training and testing.
- http://got-10k.aitestunion.com/
- Baidu Netdisk link: https://pan.baidu.com/s/1JYdcw41kDnbdu09itdJwbA extraction code: kwyc
The training data directory structure is shown below; a complete dataset for regular training should be prepared according to this layout:
```
├──GOT10k
├──train
├──sequence_1
├──sequence_2
├──...
├──val
├──test
```
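Before launching training, a quick sanity check such as the sketch below (an illustrative snippet, not part of the repository; adjust `ROOT` to the actual dataset location) can confirm that each GOT-10K split exists and contains sequences:
```python
import os

# Hypothetical helper: verify the GOT10k layout shown above.
ROOT = "datasets/GOT10K"  # adjust to where the dataset actually lives

for split in ("train", "val", "test"):
    split_dir = os.path.join(ROOT, split)
    if not os.path.isdir(split_dir):
        print(f"missing split directory: {split_dir}")
        continue
    sequences = [d for d in os.listdir(split_dir)
                 if os.path.isdir(os.path.join(split_dir, d))]
    print(f"{split}: {len(sequences)} sequences")
```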
## Training
For the SOT task, only part of the data (`GOT-10K`) is used; the setup is kept identical on GPU and DCU.
### Single Node, Single Card
```
cd /your_code_path/unicorn
HIP_VISIBLE_DEVICES=0 python3 launch_uni.py --name unicorn_track_tiny_sot_only_dcu --nproc_per_node 1 --batch 2 --mode multiple
# This fine-tunes from the pretrained weights unicorn_det_convnext_tiny_800x1280; choose other weights as needed.
```
## Testing
```
HIP_VISIBLE_DEVICES=0 python3 tools/test.py unicorn_sot unicorn_track_tiny_sot_only_dcu --dataset got10k_val --threads 8
python3 tools/analysis_results.py --name unicorn_track_tiny_sot_only_dcu/got10k
```
### Accuracy
GPU: A800, DCU: Z100L
| GOT10K | AUC | OP50 | OP75 | Pre | Norm Pre |
| :------: | :------: | :------: | :------: |:------: |:------: |
| DCU | 78.97 | 88.64 | 78.13 | 74.11 | 87.86 |
| GPU | 78.86 | 88.72 | 78.30 | 73.49 | 88.03 |
## Application Scenarios
### Algorithm Category
`Object Tracking`
### Key Application Industries
`Manufacturing, E-commerce, Healthcare, Education`
## Source Repository & Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/unicorn_pytorch
## References
- https://github.com/MasterBin-IIAU/Unicorn
## Unicorn :unicorn: : Towards Grand Unification of Object Tracking
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-grand-unification-of-object-tracking/multiple-object-tracking-on-bdd100k)](https://paperswithcode.com/sota/multiple-object-tracking-on-bdd100k?p=towards-grand-unification-of-object-tracking)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-grand-unification-of-object-tracking/multi-object-tracking-and-segmentation-on-2)](https://paperswithcode.com/sota/multi-object-tracking-and-segmentation-on-2?p=towards-grand-unification-of-object-tracking)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-grand-unification-of-object-tracking/multi-object-tracking-on-mots20)](https://paperswithcode.com/sota/multi-object-tracking-on-mots20?p=towards-grand-unification-of-object-tracking)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-grand-unification-of-object-tracking/visual-object-tracking-on-lasot)](https://paperswithcode.com/sota/visual-object-tracking-on-lasot?p=towards-grand-unification-of-object-tracking)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-grand-unification-of-object-tracking/visual-object-tracking-on-trackingnet)](https://paperswithcode.com/sota/visual-object-tracking-on-trackingnet?p=towards-grand-unification-of-object-tracking)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-grand-unification-of-object-tracking/multi-object-tracking-on-mot17)](https://paperswithcode.com/sota/multi-object-tracking-on-mot17?p=towards-grand-unification-of-object-tracking)
[![Models on Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Hub-blue)](https://huggingface.co/models?arxiv=arxiv:2111.12085)
![Unicorn](assets/Unicorn.png)
This repository is the project page for the paper [Towards Grand Unification of Object Tracking](https://arxiv.org/abs/2207.07078)
## Highlight
- Unicorn is accepted to ECCV 2022 as an **oral presentation**!
- Unicorn first demonstrates grand unification for **four object-tracking tasks**.
- Unicorn achieves strong performance in **eight tracking benchmarks**.
## Introduction
- The object tracking field mainly consists of four sub-tasks: Single Object Tracking (SOT), Multiple Object Tracking (MOT), Video Object Segmentation (VOS), and Multi-Object Tracking and Segmentation (MOTS). Most previous approaches are developed for only one, or a subset, of these sub-tasks.
- For the first time, Unicorn accomplishes the great unification of the network architecture and the learning paradigm for **four tracking tasks**. Besides, Unicorn sets new state-of-the-art performance on many challenging tracking benchmarks **using the same model parameters**.
This repository supports the following tasks:
**Image-level**
- Object Detection
- Instance Segmentation
**Video-level**
- Single Object Tracking (SOT)
- Multiple Object Tracking (MOT)
- Video Object Segmentation (VOS)
- Multi-Object Tracking and Segmentation (MOTS)
## Demo
Unicorn conquers four tracking tasks (SOT, MOT, VOS, MOTS) using **the same network** with **the same parameters**.
https://user-images.githubusercontent.com/6366788/180479685-c2f4bf3e-3faf-4abe-b401-80150877348d.mp4
## Results
### SOT
<div align="center">
<img src="assets/SOT.png" width="600pix"/>
</div>
### MOT (MOT17)
<div align="center">
<img src="assets/MOT.png" width="600pix"/>
</div>
### MOT (BDD100K)
<div align="center">
<img src="assets/MOT-BDD.png" width="600pix"/>
</div>
### VOS
<div align="center">
<img src="assets/VOS.png" width="600pix"/>
</div>
### MOTS (MOTS Challenge)
<div align="center">
<img src="assets/MOTS.png" width="600pix"/>
</div>
### MOTS (BDD100K MOTS)
<div align="center">
<img src="assets/MOTS-BDD.png" width="600pix"/>
</div>
## Getting started
1. Installation: Please refer to [install.md](assets/install.md) for more details.
2. Data preparation: Please refer to [data.md](assets/data.md) for more details.
3. Training: Please refer to [train.md](assets/train.md) for more details.
4. Testing: Please refer to [test.md](assets/test.md) for more details.
5. Model zoo: Please refer to [model_zoo.md](assets/model_zoo.md) for more details.
## Citing Unicorn
If you find Unicorn useful in your research, please consider citing:
```bibtex
@inproceedings{unicorn,
title={Towards Grand Unification of Object Tracking},
author={Yan, Bin and Jiang, Yi and Sun, Peize and Wang, Dong and Yuan, Zehuan and Luo, Ping and Lu, Huchuan},
booktitle={ECCV},
year={2022}
}
```
## Acknowledgments
- Thanks [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) and [CondInst](https://github.com/aim-uofa/AdelaiDet) for providing strong baselines for object detection and instance segmentation.
- Thanks [STARK](https://github.com/researchmm/Stark) and [PyTracking](https://github.com/visionml/pytracking) for providing useful inference and evaluation toolkits for SOT and VOS.
- Thanks [ByteTrack](https://github.com/ifzhang/ByteTrack), [QDTrack](https://github.com/SysCV/qdtrack) and [PCAN](https://github.com/SysCV/pcan/) for providing useful data-processing scripts and evaluation code for MOT and MOTS.
# Data Preparation
We put the pretrained backbone weights under ${UNICORN_ROOT} and all data under the `datasets` folder. The complete directory structure looks like this.
```
${UNICORN_ROOT}
-- convnext_tiny_1k_224_ema.pth
-- convnext_large_22k_224.pth
-- datasets
-- bdd
-- images
-- 10k
-- 100k
-- seg_track_20
-- track
-- labels
-- box_track_20
-- det_20
-- ins_seg
-- seg_track_20
-- Cityscapes
-- annotations
-- images
-- labels_with_ids
-- COCO
-- annotations
-- train2017
-- val2017
-- crowdhuman
-- annotations
-- CrowdHuman_train
-- CrowdHuman_val
-- annotation_train.odgt
-- annotation_val.odgt
-- DAVIS
-- Annotations
-- ImageSets
-- JPEGImages
-- README.md
-- SOURCES.md
-- ETHZ
-- annotations
-- eth01
-- eth02
-- eth03
-- eth05
-- eth07
-- GOT10K
-- test
-- GOT-10k_Test_000001
-- ...
-- train
-- GOT-10k_Train_000001
-- ...
-- LaSOT
-- airplane
-- basketball
-- ...
-- mot
-- annotations
-- test
-- train
-- MOTS
-- annotations
-- test
-- train
-- saliency
-- image
-- mask
-- TrackingNet
-- TEST
-- TRAIN_0
-- TRAIN_1
-- TRAIN_2
-- TRAIN_3
-- ytbvos18
-- train
-- val
```
## Pretrained backbone weights
Unicorn uses [ConvNeXt](https://arxiv.org/abs/2201.03545) as the backbone by default. The pretrained backbone weights can be downloaded by the following commands.
```
wget -c https://dl.fbaipublicfiles.com/convnext/convnext_tiny_1k_224_ema.pth # convnext-tiny
wget -c https://dl.fbaipublicfiles.com/convnext/convnext_large_22k_224.pth # convnext-large
```
## Data
For users who are only interested in part of the tasks, there is no need to download all of the datasets. The following list shows the datasets needed for different tasks.
- Object detection & instance segmentation: COCO
- SOT: COCO, LaSOT, GOT-10K, TrackingNet
- VOS: DAVIS, Youtube-VOS 2018, COCO, Saliency
- MOT & MOTS (MOT Challenge 17, MOTS Challenge): MOT17, CrowdHuman, ETHZ, CityPerson, COCO, MOTS
- MOT & MOTS (BDD100K): BDD100K
### Object Detection & Instance Segmentation
Please download [COCO](https://cocodataset.org/#home) from the official website. We use [train2017.zip](http://images.cocodataset.org/zips/train2017.zip), [val2017.zip](http://images.cocodataset.org/zips/val2017.zip) & [annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip). We expect that the data is organized as below.
```
${UNICORN_ROOT}
-- datasets
-- COCO
-- annotations
-- train2017
-- val2017
```
### SOT
Please download [COCO](https://cocodataset.org/#home), [LaSOT](http://vision.cs.stonybrook.edu/~lasot/download.html), [GOT-10K](http://got-10k.aitestunion.com/downloads) and [TrackingNet](https://tracking-net.org/). Since TrackingNet is very large and hard to download, we only use the first 4 splits (TRAIN_0.zip, TRAIN_1.zip, TRAIN_2.zip, TRAIN_3.zip) rather than the complete 12 splits for the training set. The original TrackingNet zips (put under `datasets`) can be unzipped with the following command.
```
python3 tools/process_trackingnet.py
```
We expect that the data is organized as below.
```
${UNICORN_ROOT}
-- datasets
-- COCO
-- annotations
-- train2017
-- val2017
-- GOT10K
-- test
-- GOT-10k_Test_000001
-- ...
-- train
-- GOT-10k_Train_000001
-- ...
-- LaSOT
-- airplane
-- basketball
-- ...
-- TrackingNet
-- TEST
-- TRAIN_0
-- TRAIN_1
-- TRAIN_2
-- TRAIN_3
```
### VOS
Please download [DAVIS](https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip), [Youtube-VOS 2018](https://youtube-vos.org/dataset/), [COCO](https://cocodataset.org/#home), [Saliency](https://drive.google.com/file/d/1qgjvIbeMIBSWfRu6iDrCnY--CbUUhHOb/view?usp=sharing).
The saliency dataset is constructed from [DUTS](http://saliencydetection.net/duts/), [DUT-OMRON](http://saliencydetection.net/dut-omron/), etc.
The downloaded youtube-vos zips can be processed using the following commands.
```
unzip -qq ytbvos18_train.zip
unzip -qq ytbvos18_val.zip
mkdir ytbvos18
mv train ytbvos18/train
mv valid ytbvos18/val
rm -rf ytbvos18_train.zip
rm -rf ytbvos18_val.zip
mv ytbvos18 datasets
```
We expect that the data is organized as below.
```
${UNICORN_ROOT}
-- datasets
-- COCO
-- annotations
-- train2017
-- val2017
-- DAVIS
-- Annotations
-- ImageSets
-- JPEGImages
-- README.md
-- SOURCES.md
-- saliency
-- image
-- mask
-- ytbvos18
-- train
-- val
```
### MOT & MOTS (MOT Challenge)
Download [MOT17](https://motchallenge.net/), [CrowdHuman](https://www.crowdhuman.org/), [Cityperson](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md), [ETHZ](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md), [MOTS](https://motchallenge.net/) and put them under `datasets` in the following structure:
```
${UNICORN_ROOT}
-- datasets
-- Cityscapes
-- annotations
-- images
-- labels_with_ids
-- COCO
-- annotations
-- train2017
-- val2017
-- crowdhuman
-- annotations
-- CrowdHuman_train
-- CrowdHuman_val
-- annotation_train.odgt
-- annotation_val.odgt
-- ETHZ
-- annotations
-- eth01
-- eth02
-- eth03
-- eth05
-- eth07
-- mot
-- annotations
-- test
-- train
-- MOTS
-- annotations
-- test
-- train
```
Unzip the CityPersons dataset with
```
cat Citypersons.z01 Citypersons.z02 Citypersons.z03 Citypersons.zip > c.zip
zip -FF Citypersons.zip --out c.zip
unzip -qq c.zip
```
Unzip the CrowdHuman dataset with
```
# unzip the train split
unzip -qq CrowdHuman_train01.zip
unzip -qq CrowdHuman_train02.zip
unzip -qq CrowdHuman_train03.zip
mv Images CrowdHuman_train
# unzip the val split
unzip -qq CrowdHuman_val.zip
mv Images CrowdHuman_val
```
Then, convert the datasets to COCO format:
```shell
python3 tools/convert_mot17_to_coco.py
python3 tools/convert_mot17_to_omni.py --dataset_name mot
python3 tools/convert_crowdhuman_to_coco.py
python3 tools/convert_cityperson_to_coco.py
python3 tools/convert_ethz_to_coco.py
python3 tools/convert_mots_to_coco.py
```
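After conversion, a quick look at one of the generated annotation files can catch path or format problems early. The snippet below is only a sketch; the annotation path used here is a placeholder and should be adjusted to whatever the conversion scripts actually write.
```python
import json

# Placeholder path; point this at an annotation file produced by the scripts above.
ann_file = "datasets/mot/annotations/train.json"
with open(ann_file) as f:
    coco = json.load(f)

print("images:", len(coco.get("images", [])))
print("annotations:", len(coco.get("annotations", [])))
print("categories:", [c["name"] for c in coco.get("categories", [])])
```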
### MOT & MOTS (BDD100K)
We need to download the `detection` set, `tracking` set, `instance seg` set and `tracking & seg` set for training and validation.
For more details about the dataset, please refer to the [official documentation](https://doc.bdd100k.com/download.html).
We provide the following commands to download and process BDD100K datasets in parallel.
```
cd external/qdtrack
python3 download_bdd100k.py # replace save_dir to your path
bash prepare_bdd100k.sh # replace paths to yours
ln -s <UNICORN_ROOT>/external/qdtrack/data/bdd <UNICORN_ROOT>/datasets/bdd
```
We expect that the data is organized as below.
```
${UNICORN_ROOT}
-- datasets
-- bdd
-- images
-- 10k
-- 100k
-- seg_track_20
-- track
-- labels
-- box_track_20
-- det_20
-- ins_seg
-- seg_track_20
```
# Install
## Requirements
We test the code in the following environment; other versions may also be compatible, but the PyTorch version should be >= 1.7.
- CUDA 11.3
- Python 3.7
- PyTorch 1.10.0
- Torchvision 0.11.1
## Install environment for Unicorn
```
# Pytorch and Torchvision
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
# YOLOX and some other packages
pip3 install -U pip && pip3 install -r requirements.txt
pip3 install -v -e . # or python3 setup.py develop
# Install Deformable Attention
cd unicorn/models/ops
bash make.sh
cd ../../..
# Install mmcv, mmdet, bdd100k
cd external/qdtrack
wget -c https://download.openmmlab.com/mmcv/dist/cu113/torch1.10.0/mmcv_full-1.4.6-cp37-cp37m-manylinux1_x86_64.whl # This should change according to cuda version and pytorch version
pip3 install --user mmcv_full-1.4.6-cp37-cp37m-manylinux1_x86_64.whl
pip3 install --user mmdet
git clone https://github.com/bdd100k/bdd100k.git
cd bdd100k
python3 setup.py develop --user
pip3 uninstall -y scalabel
pip3 install --user git+https://github.com/scalabel/scalabel.git
cd ../../..
```
# Unicorn Model Zoo
Here we provide the performance of Unicorn on multiple tasks (Object Detection, Instance Segmentation, and Object Tracking).
The complete model weights and the corresponding training logs are available via the links below.
## Object Detection
The object detector of Unicorn is pretrained and evaluated on COCO. In this step, there is no segmentation head and the network is trained only using box-level annotations.
<table>
<tr>
<th>Experiment
<th>Backbone</th>
<th>Box AP</th>
<th>Model</th>
<th>Log</th>
</tr>
<tr>
<td>unicorn_det_convnext_large_800x1280</td>
<td>ConvNext-Large</td>
<td>53.7</td>
<td><a href="https://drive.google.com/file/d/1kET9m1BV9f6agv5EY0oNHinJH00PKn_Q/view?usp=sharing">model</a></td>
<td><a href="https://drive.google.com/file/d/1QMzcK0bnPE3fcyRLHFp0W6hJdHVfgsUS/view?usp=sharing">log</a></td>
</tr>
<tr>
<td>unicorn_det_convnext_tiny_800x1280</td>
<td>ConvNext-Tiny</td>
<td>53.1</td>
<td><a href="https://drive.google.com/file/d/11kLsIOp6jQEEM0ZmOvvsJW_RgjgCxuYZ/view?usp=sharing">model</a></td>
<td><a href="https://drive.google.com/file/d/1GezYXWtUStUf01oeDvkVOFFpJ7CVh2dk/view?usp=sharing">log</a></td>
</tr>
<tr>
<td>unicorn_det_r50_800x1280</td>
<td>ResNet-50</td>
<td>51.7</td>
<td><a href="https://drive.google.com/file/d/13wJ8lRrIrhixDYv7zgbQ6KEhIwQH15aQ/view?usp=sharing">model</a></td>
<td><a href="https://drive.google.com/file/d/1E8XJHsKj5fjGTZ9Y3hMYLJKk9tQmC2pU/view?usp=sharing">log</a></td>
</tr>
</table>
## Instance Segmentation (Optional)
Please note that this part is optional: the training of the downstream tracking tasks does not rely on it, so feel free to skip it unless you are interested in instance segmentation on COCO. In this step, a segmentation head is appended to the pretrained object detector; the detector parameters are then frozen and only the segmentation head is optimized, so the box AP stays the same as in the previous stage. Here we provide the results of the model with the ConvNeXt-Tiny backbone. A minimal sketch of this freeze-then-train scheme follows the table below.
<table>
<tr>
<th>Experiment
<th>Backbone</th>
<th>Mask AP</th>
<th>Model</th>
<th>Log</th>
</tr>
<tr>
<td>unicorn_inst_convnext_tiny_800x1280</td>
<td>ConvNext-Tiny</td>
<td>43.2</td>
<td><a href="https://drive.google.com/file/d/1S7wG5dzmjyeyl6gZzgJvd9EBWGO-QU2E/view?usp=sharing">model</a></td>
<td><a href="https://drive.google.com/file/d/1TpdECG_Vt7zaAEAkhS_l_uFEGEVeU_L0/view?usp=sharing">log</a></td>
</tr>
</table>
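The following is a minimal sketch of the freeze-then-train scheme described above; the `model.seg_head` attribute and the optimizer settings are assumptions made for illustration, not the repository's actual training code.
```python
import torch

def freeze_all_but_seg_head(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Freeze the pretrained detector and optimize only the segmentation head.

    Assumes the model exposes its mask branch as `model.seg_head`; the real
    attribute name depends on the implementation.
    """
    for p in model.parameters():
        p.requires_grad = False            # freeze everything, including the detector
    for p in model.seg_head.parameters():
        p.requires_grad = True             # un-freeze only the segmentation head
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=0.01, momentum=0.9)
```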
## Object Tracking
There are some inherent conflicts among existing MOT benchmarks.
- Different benchmarks focus on different object classes. For example, MOT Challenge, BDD100K, and TAO include 1, 8, and 800+ object classes, respectively.
- Different benchmarks have different labeling rules. For example, the MOT challenge always annotates the whole person, even when the person is heavily occluded or cut by the image boundary. However, the other benchmarks do not share the same rule.
These factors make it difficult to train one unified model for different MOT benchmarks. To deal with this problem, Unicorn trains two unified models. To be specific, the first model can simultaneously deal with SOT, BDD100K, VOS, and BDD100K MOTS. The second model can simultaneously deal with SOT, MOT17, VOS, and MOTS Challenge. The results of SOT and VOS are reported using the first model.
The results of the first group of models are shown below.
<table>
<tr>
<th>Experiment</th>
<th>Input Size</th>
<th>LaSOT<br>AUC (%)</th>
<th>BDD100K<br>mMOTA (%)</th>
<th>DAVIS17<br>J&F (%)</th>
<th>BDD100K MOTS<br>mMOTSA (%)</th>
<th>Model</th>
<th>Log<br>Stage1</th>
<th>Log<br>Stage2</th>
</tr>
<tr>
<td>unicorn_track_large_mask</td>
<td>800x1280</td>
<td>68.5</td>
<td>41.2</td>
<td>69.2</td>
<td>29.6</td>
<td><a href="https://huggingface.co/NimaBoscarino/unicorn_track_large_mask">model</a></td>
<td><a href="https://drive.google.com/file/d/1GZwqWsMgx8H3VYPZcDwk_4XxTSJxEyEf/view?usp=sharing">log1</a></td>
<td><a href="https://drive.google.com/file/d/1eWLNiOyKFX8Tu0Xfp2whR1g7n8CBEMQ7/view?usp=sharing">log2</a></td>
</tr>
<tr>
<td>unicorn_track_tiny_mask</td>
<td>800x1280</td>
<td>67.7</td>
<td>39.9</td>
<td>68.0</td>
<td>29.7</td>
<td><a href="https://huggingface.co/NimaBoscarino/unicorn_track_tiny_mask">model</a></td>
<td><a href="https://drive.google.com/file/d/1BQPi5e_iOCQBKYj55U0Um7Y2NI69_R5z/view?usp=sharing">log1</a></td>
<td><a href="https://drive.google.com/file/d/1dgTiATiVFyZT4xYkvxHO6kgjNSCzfd5m/view?usp=sharing">log2</a></td>
</tr>
<tr>
<td>unicorn_track_tiny_rt_mask</td>
<td>640x1024</td>
<td>67.1</td>
<td>37.5</td>
<td>66.8</td>
<td>26.2</td>
<td><a href="https://huggingface.co/NimaBoscarino/unicorn_track_tiny_rt_mask">model</a></td>
<td><a href="https://drive.google.com/file/d/1ObMKqOr46AKmAcIC6-pTez6s0mxgTAqy/view?usp=sharing">log1</a></td>
<td><a href="https://drive.google.com/file/d/1HdRj5ME157hDO84k6lxnA6gz1Tbe5EdQ/view?usp=sharing">log2</a></td>
</tr>
<tr>
<td>unicorn_track_r50_mask</td>
<td>800x1280</td>
<td>65.3</td>
<td>35.1</td>
<td>66.2</td>
<td>30.8</td>
<td><a href="https://huggingface.co/NimaBoscarino/unicorn_track_r50_mask">model</a></td>
<td><a href="https://drive.google.com/file/d/13tDFEjFbYYZAYvDYkOXoKThDyfRT7-53/view?usp=sharing">log1</a></td>
<td><a href="https://drive.google.com/file/d/1Qh45-TW4Nw9qx7Gkk4L6uDKGXomnnuKy/view?usp=sharing">log2</a></td>
</tr>
</table>
The results of the second group of models are shown below.
<table>
<tr>
<th>Experiment</th>
<th>Input Size</th>
<th>MOT17<br>MOTA (%)</th>
<th>MOTS<br>sMOTSA (%)</th>
<th>Model</th>
<th>Log<br>Stage1</th>
<th>Log<br>Stage2</th>
</tr>
<tr>
<td>unicorn_track_large_mot_challenge_mask</td>
<td>800x1280</td>
<td>77.2</td>
<td>65.3</td>
<td><a href="https://drive.google.com/file/d/1tktJbsdA3peX9i8tAcDGdxwxMit0rPs0/view?usp=sharing">model</a></td>
<td><a href="https://drive.google.com/file/d/1NFcEkOarlhLI6jxoibKwWVqJ9jj6NE-L/view?usp=sharing">log1</a></td>
<td><a href="https://drive.google.com/file/d/18R_IUi8ooq4ZKah0DV0ajY7Y1GO5DYvB/view?usp=sharing">log2</a></td>
</tr>
</table>
We also provide task-specific models for users who are only interested in a subset of the tasks.
<table>
<tr>
<th>Experiment</th>
<th>Input Size</th>
<th>LaSOT<br>AUC (%)</th>
<th>BDD100K<br>mMOTA (%)</th>
<th>DAVIS17<br>J&F (%)</th>
<th>BDD100K MOTS<br>mMOTSA (%)</th>
<th>Model</th>
<th>Log<br>Stage1</th>
<th>Log<br>Stage2</th>
</tr>
<tr>
<td>unicorn_track_tiny_sot_only</td>
<td>800x1280</td>
<td>67.5</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td><a href="https://drive.google.com/file/d/1NcMsWML-1-zr0SWXUOiRPNZRs-VnAWG5/view?usp=sharing">model</a></td>
<td><a href="https://drive.google.com/file/d/1TiKopPT93v6JDYrrKMihlBV44h6VugqS/view?usp=sharing">log1</a></td>
<td>-</td>
</tr>
<tr>
<td>unicorn_track_tiny_mot_only</td>
<td>800x1280</td>
<td>-</td>
<td>39.6</td>
<td>-</td>
<td>-</td>
<td><a href="https://drive.google.com/file/d/1T0DxX-d_qeHvVlZ7IIbdNqtsHCQANKlQ/view?usp=sharing">model</a></td>
<td><a href="https://drive.google.com/file/d/1Fx8jataBKFH2c-q1uNVEFomQgm9252QX/view?usp=sharing">log1</a></td>
<td>-</td>
</tr>
<tr>
<td>unicorn_track_tiny_vos_only</td>
<td>800x1280</td>
<td>-</td>
<td>-</td>
<td>68.4</td>
<td>-</td>
<td><a href="https://drive.google.com/file/d/12T7XodWFuwSFAv5oBcDTIBJT4INVpQ94/view?usp=sharing">model</a></td>
<td>-</td>
<td><a href="https://drive.google.com/file/d/1jxbAZEVgvD2pZko9jce6pJYnzYP866PQ/view?usp=sharing">log2</a></td>
</tr>
<tr>
<td>unicorn_track_tiny_mots_only</td>
<td>800x1280</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>28.1</td>
<td><a href="https://drive.google.com/file/d/13D0rH3i0n5d_W8ead41zFySDwXAePxgv/view?usp=sharing">model</a></td>
<td>-</td>
<td><a href="https://drive.google.com/file/d/1P0GacRmGLwFw72rpaorBZ3vI63S70IeS/view?usp=sharing">log2</a></td>
</tr>
</table>
## Structure
The downloaded checkpoints should be organized in the following structure
```
${UNICORN_ROOT}
-- Unicorn_outputs
-- unicorn_det_convnext_large_800x1280
-- best_ckpt.pth
-- unicorn_det_convnext_tiny_800x1280
-- best_ckpt.pth
-- unicorn_det_r50_800x1280
-- best_ckpt.pth
-- unicorn_track_large_mask
-- latest_ckpt.pth
-- unicorn_track_tiny_mask
-- latest_ckpt.pth
-- unicorn_track_r50_mask
-- latest_ckpt.pth
...
```
# Object Tracking Inference
## Box-level Tracking
For box-level tracking tasks like SOT and MOT, ${exp_name} should **NOT** contain `mask`. Specifically, legal ${exp_name} values for box-level tracking include `unicorn_track_large`, `unicorn_track_large_mot_challenge`, `unicorn_track_tiny`, `unicorn_track_tiny_rt`, and `unicorn_track_r50`. Their corresponding weights are exactly the same as those of the experiments ending with `mask`. For example, if you want to run experiments with `unicorn_track_tiny`, you should first copy `Unicorn_outputs/unicorn_track_tiny_mask` to `Unicorn_outputs/unicorn_track_tiny`, as in the snippet below.
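A one-off copy is enough; the snippet below is just an illustration (plain `cp -r` works equally well):
```python
import shutil

# Reuse the mask-model weights under the corresponding box-level experiment name.
shutil.copytree("Unicorn_outputs/unicorn_track_tiny_mask",
                "Unicorn_outputs/unicorn_track_tiny")
```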
**SOT**
- LaSOT
```
python3 tools/test.py unicorn_sot ${exp_name} --dataset lasot --threads 32
python3 tools/analysis_results.py --name ${exp_name}
```
- TrackingNet
```
python3 tools/test.py unicorn_sot ${exp_name} --dataset trackingnet --threads 32
python3 external/lib/test/utils/transform_trackingnet.py --tracker_name unicorn_sot --cfg_name ${exp_name}
```
**MOT**
- BDD100K
```
cd external/qdtrack
# track
bash tools/dist_test_omni.sh configs/bdd100k/unicorn.py ../../Unicorn_outputs/${exp_name}/latest_ckpt.pth 8 ${exp_name} --eval track
# bbox
python3 tools/eval.py configs/bdd100k/unicorn.py result_omni.pkl --eval bbox
```
- MOT Challenge 17
```
python3 tools/track.py -f exps/default/${exp_name} -c <ckpt path> -b 1 -d 1 # using the association strategy in ByteTrack
python3 tools/track_omni.py -f exps/default/${exp_name} -c <ckpt path> -b 1 -d 1 # using the association strategy in QDTrack
python3 tools/interpolation.py # need to change some paths
```
## Mask-level Tracking
For Mask-level tracking tasks like VOS and MOTS, ${exp_name} should contain `mask`. Specifically, legal ${exp_name} for mask-level tracking include `unicorn_track_large_mask`, `unicorn_track_large_mot_challenge_mask`, `unicorn_track_tiny_mask`, `unicorn_track_tiny_rt_mask`, `unicorn_track_r50_mask`.
**VOS**
- DAVIS-2016
```
python3 tools/test.py unicorn_vos ${exp_name} --dataset dv2016_val --threads 20
cd external/PyDavis16EvalToolbox
python3 eval.py --name_list_path ../../datasets/DAVIS/ImageSets/2016/val.txt --mask_root ../../datasets/DAVIS/Annotations/480p --pred_path ../../test/segmentation_results/unicorn_vos/${exp_name}/ --save_path ../../result.pkl
```
- DAVIS-2017
```
python3 tools/test.py unicorn_vos ${exp_name} --dataset dv2017_val --threads 30
cd external/davis2017-evaluation
python3 evaluation_method.py --task semi-supervised --results_path ../../test/segmentation_results/unicorn_vos/${exp_name} --davis_path ../../datasets/DAVIS
```
**MOTS**
- MOTSChallenge
```
python3 tools/track_omni.py -f <exp file path> -c <ckpt path> -b 1 -d 1 --mots --mask_thres 0.3
# for train split
cp Unicorn_outputs/${exp_name}/track_results/* ../MOTChallengeEvalKit/res/MOTSres
cd ../MOTChallengeEvalKit
python MOTS/evalMOTS.py
```
- BDD100K MOTS
```
cd external/qdtrack
# track
bash tools/dist_test_omni.sh configs/bdd100k_mots/segtrack-frcnn_r50_fpn_12e_bdd10k_fixed_pcan.py ../../Unicorn_outputs/${exp_name}/latest_ckpt.pth 8 ${exp_name} --eval segm --mots
# convert to BDD100K format (bitmask)
python3 tools/to_bdd100k.py configs/bdd100k_mots/segtrack-frcnn_r50_fpn_12e_bdd10k_fixed_pcan.py --res result_omni.pkl --task seg_track --bdd-dir . --nproc 32
# evaluate
bash eval_bdd_submit.sh
```
# Tutorial for Training
Every experiment is defined by a Python file under the `exps/default` folder. Experiments for object detection and object tracking start with `unicorn_det` and `unicorn_track` respectively. In the following paragraphs, ${exp_name} should be replaced with a specific filename (without .py). For example, if you want to train Unicorn with the ConvNeXt-Tiny backbone for object tracking, replace ${exp_name} with `unicorn_track_tiny`.
## Detection & Instance Segmentation
**Single-node Training**
On a single node with 8 GPUs, run
```
python3 launch_uni.py --name ${exp_name} --nproc_per_node 8 --batch 64 --mode multiple --fp16 0
```
**Multiple-node Training**
On the master node, run
```
python3 launch_uni.py --name ${exp_name} --nproc_per_node 8 --batch 128 --mode distribute --fp16 0 --nnodes 2 --master_address ${master_address} --node_rank 0
```
On the second node, run
```
python3 launch_uni.py --name ${exp_name} --nproc_per_node 8 --batch 128 --mode distribute --fp16 0 --nnodes 2 --master_address ${master_address} --node_rank 1
```
**Testing (Instance Segmentation)**
```
python3 tools/eval.py -f exps/default/${exp_name}.py -c Unicorn_outputs/${exp_name}/latest_ckpt.pth -b 64 -d 8 --conf 0.001 --mask_thres 0.3
```
## Unified Tracking (SOT, MOT, VOS, MOTS)
**Single-node Training**
On a single node with 8 GPUs, run
```
python3 launch_uni.py --name ${exp_name} --nproc_per_node 8 --batch 16 --mode multiple
```
**Multiple-node Training**
On the master node, run
```
python3 launch_uni.py --name ${exp_name} --nproc_per_node 8 --batch 32 --mode distribute --nnodes 2 --master_address ${master_address} --node_rank 0
```
On the second node, run
```
python3 launch_uni.py --name ${exp_name} --nproc_per_node 8 --batch 32 --mode distribute --nnodes 2 --master_address ${master_address} --node_rank 1
```