# Unicorn Model Zoo
Here we provide the performance of Unicorn on multiple tasks (Object Detection, Instance Segmentation, and Object Tracking).
The complete model weights and the corresponding training logs are available via the links below.

## Object Detection
The object detector of Unicorn is pretrained and evaluated on COCO. In this step, there is no segmentation head and the network is trained using only box-level annotations.
<table>
  <tr>
    <th>Experiment</th>
    <th>Backbone</th>
    <th>Box AP</th>
    <th>Model</th>
    <th>Log</th>
  </tr>
  <tr>
    <td>unicorn_det_convnext_large_800x1280</td>
    <td>ConvNext-Large</td>
    <td>53.7</td>
    <td><a href="https://drive.google.com/file/d/1kET9m1BV9f6agv5EY0oNHinJH00PKn_Q/view?usp=sharing">model</a></td>
    <td><a href="https://drive.google.com/file/d/1QMzcK0bnPE3fcyRLHFp0W6hJdHVfgsUS/view?usp=sharing">log</a></td>
  </tr>
  <tr>
    <td>unicorn_det_convnext_tiny_800x1280</td>
    <td>ConvNext-Tiny</td>
    <td>53.1</td>
    <td><a href="https://drive.google.com/file/d/11kLsIOp6jQEEM0ZmOvvsJW_RgjgCxuYZ/view?usp=sharing">model</a></td>
    <td><a href="https://drive.google.com/file/d/1GezYXWtUStUf01oeDvkVOFFpJ7CVh2dk/view?usp=sharing">log</a></td>
  </tr>
  <tr>
    <td>unicorn_det_r50_800x1280</td>
    <td>ResNet-50</td>
    <td>51.7</td>
    <td><a href="https://drive.google.com/file/d/13wJ8lRrIrhixDYv7zgbQ6KEhIwQH15aQ/view?usp=sharing">model</a></td>
    <td><a href="https://drive.google.com/file/d/1E8XJHsKj5fjGTZ9Y3hMYLJKk9tQmC2pU/view?usp=sharing">log</a></td>
  </tr>

</table>
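Once downloaded, a checkpoint can be inspected with plain PyTorch. Below is a minimal sketch, assuming the file path from the Structure section at the bottom of this page; the top-level keys depend on the training code.

```python
import torch

# Hedged sketch: load a downloaded detection checkpoint on CPU and
# list its top-level entries. The path follows the "Structure" section
# below; the exact keys depend on the training code.
ckpt = torch.load(
    "Unicorn_outputs/unicorn_det_convnext_tiny_800x1280/best_ckpt.pth",
    map_location="cpu",
)
print(list(ckpt.keys()))  # commonly includes a "model" entry with the state dict
```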

## Instance Segmentation (Optional)
Please note that this part is optional: training of the downstream tracking tasks does not rely on it, so feel free to skip it unless you are interested in instance segmentation on COCO. In this step, a segmentation head is appended to the pretrained object detector. The parameters of the object detector are then frozen and only the segmentation head is optimized, so the box AP stays the same as in the previous stage. Here we provide the results of the model with the ConvNext-Tiny backbone.
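This freeze-then-train scheme is easy to express in PyTorch. Below is a minimal sketch, not the repository's actual training code: the two modules are hypothetical placeholders standing in for the real detector and segmentation head.

```python
import torch
from torch import nn

# Minimal sketch of the scheme described above: freeze the pretrained
# detector and optimize only the appended segmentation head. Both
# modules here are placeholders, not Unicorn's real ones.
detector = nn.Conv2d(3, 256, 3, padding=1)  # stands in for the pretrained detector
seg_head = nn.Conv2d(256, 80, 1)            # stands in for the segmentation head

for p in detector.parameters():
    p.requires_grad = False                 # detector weights stay fixed

# Only the segmentation head's parameters reach the optimizer.
optimizer = torch.optim.SGD(seg_head.parameters(), lr=0.01)

with torch.no_grad():
    feats = detector(torch.randn(1, 3, 64, 64))  # frozen features
masks = seg_head(feats)                          # only this path is trained
```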

<table>
  <tr>
    <th>Experiment</th>
    <th>Backbone</th>
    <th>Mask AP</th>
    <th>Model</th>
    <th>Log</th>
  </tr>
  <tr>
    <td>unicorn_inst_convnext_tiny_800x1280</td>
    <td>ConvNext-Tiny</td>
    <td>43.2</td>
    <td><a href="https://drive.google.com/file/d/1S7wG5dzmjyeyl6gZzgJvd9EBWGO-QU2E/view?usp=sharing">model</a></td>
    <td><a href="https://drive.google.com/file/d/1TpdECG_Vt7zaAEAkhS_l_uFEGEVeU_L0/view?usp=sharing">log</a></td>
  </tr>

</table>


## Object Tracking
There are some inherent conflicts among existing MOT benchmarks.
- Different benchmarks focus on different object classes. For example, MOT Challenge, BDD100K, and TAO contain 1, 8, and 800+ object classes, respectively.
- Different benchmarks have different labeling rules. For example, MOT Challenge always annotates the whole person, even when the person is heavily occluded or cut off by the image boundary, while the other benchmarks do not share this rule.

These factors make it difficult to train one unified model for the different MOT benchmarks. To deal with this problem, Unicorn trains two unified models. Specifically, the first model simultaneously deals with SOT, BDD100K, VOS, and BDD100K MOTS; the second simultaneously deals with SOT, MOT17, VOS, and MOTS Challenge. The SOT and VOS results are reported using the first model.

The results of the first group of models are shown below.
<table>
  <tr>
    <th>Experiment</th>
    <th>Input Size</th>
    <th>LaSOT<br>AUC (%)</th>
    <th>BDD100K<br>mMOTA (%)</th>
    <th>DAVIS17<br>J&F (%)</th>
    <th>BDD100K MOTS<br>mMOTSA (%)</th>
    <th>Model</th>
    <th>Log<br>Stage1</th>
    <th>Log<br>Stage2</th>
  </tr>
  <tr>
    <td>unicorn_track_large_mask</td>
    <td>800x1280</td>
    <td>68.5</td>
    <td>41.2</td>
    <td>69.2</td>
    <td>29.6</td>
    <td><a href="https://huggingface.co/NimaBoscarino/unicorn_track_large_mask">model</a></td>
    <td><a href="https://drive.google.com/file/d/1GZwqWsMgx8H3VYPZcDwk_4XxTSJxEyEf/view?usp=sharing">log1</a></td>
    <td><a href="https://drive.google.com/file/d/1eWLNiOyKFX8Tu0Xfp2whR1g7n8CBEMQ7/view?usp=sharing">log2</a></td>
  </tr>
  <tr>
    <td>unicorn_track_tiny_mask</td>
    <td>800x1280</td>
    <td>67.7</td>
    <td>39.9</td>
    <td>68.0</td>
    <td>29.7</td>
    <td><a href="https://huggingface.co/NimaBoscarino/unicorn_track_tiny_mask">model</a></td>
    <td><a href="https://drive.google.com/file/d/1BQPi5e_iOCQBKYj55U0Um7Y2NI69_R5z/view?usp=sharing">log1</a></td>
    <td><a href="https://drive.google.com/file/d/1dgTiATiVFyZT4xYkvxHO6kgjNSCzfd5m/view?usp=sharing">log2</a></td>
  </tr>
  <tr>
    <td>unicorn_track_tiny_rt_mask</td>
    <td>640x1024</td>
    <td>67.1</td>
    <td>37.5</td>
    <td>66.8</td>
    <td>26.2</td>
    <td><a href="https://huggingface.co/NimaBoscarino/unicorn_track_tiny_rt_mask">model</a></td>
    <td><a href="https://drive.google.com/file/d/1ObMKqOr46AKmAcIC6-pTez6s0mxgTAqy/view?usp=sharing">log1</a></td>
    <td><a href="https://drive.google.com/file/d/1HdRj5ME157hDO84k6lxnA6gz1Tbe5EdQ/view?usp=sharing">log2</a></td>
  </tr>
  <tr>
    <td>unicorn_track_r50_mask</td>
    <td>800x1280</td>
    <td>65.3</td>
    <td>35.1</td>
    <td>66.2</td>
    <td>30.8</td>
    <td><a href="https://huggingface.co/NimaBoscarino/unicorn_track_r50_mask">model</a></td>
    <td><a href="https://drive.google.com/file/d/13tDFEjFbYYZAYvDYkOXoKThDyfRT7-53/view?usp=sharing">log1</a></td>
    <td><a href="https://drive.google.com/file/d/1Qh45-TW4Nw9qx7Gkk4L6uDKGXomnnuKy/view?usp=sharing">log2</a></td>
  </tr>

</table>
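The tracking checkpoints linked above are hosted on Hugging Face, so they can also be fetched programmatically. Below is a minimal sketch using the `huggingface_hub` client; the checkpoint filename `latest_ckpt.pth` is an assumption inferred from the directory layout in the Structure section at the bottom of this page.

```python
from huggingface_hub import hf_hub_download

# Hedged sketch: download a tracking checkpoint from its Hugging Face
# repository. The filename is assumed from the "Structure" section
# below; adjust it if the repository stores a different name.
path = hf_hub_download(
    repo_id="NimaBoscarino/unicorn_track_large_mask",
    filename="latest_ckpt.pth",
)
print(path)  # local cache path of the downloaded file
```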

The results of the second group of models are shown below.
<table>
  <tr>
    <th>Experiment</th>
    <th>Input Size</th>
    <th>MOT17<br>MOTA (%)</th>
    <th>MOTS<br>sMOTSA (%)</th>
    <th>Model</th>
    <th>Log<br>Stage1</th>
    <th>Log<br>Stage2</th>
  </tr>
  <tr>
    <td>unicorn_track_large_mot_challenge_mask</td>
    <td>800x1280</td>
    <td>77.2</td>
    <td>65.3</td>
    <td><a href="https://drive.google.com/file/d/1tktJbsdA3peX9i8tAcDGdxwxMit0rPs0/view?usp=sharing">model</a></td>
    <td><a href="https://drive.google.com/file/d/1NFcEkOarlhLI6jxoibKwWVqJ9jj6NE-L/view?usp=sharing">log1</a></td>
    <td><a href="https://drive.google.com/file/d/18R_IUi8ooq4ZKah0DV0ajY7Y1GO5DYvB/view?usp=sharing">log2</a></td>
  </tr>

</table>

We also provide task-specific models for users who are interested in only a subset of the tasks.
<table>
  <tr>
    <th>Experiment</th>
    <th>Input Size</th>
    <th>LaSOT<br>AUC (%)</th>
    <th>BDD100K<br>mMOTA (%)</th>
    <th>DAVIS17<br>J&F (%)</th>
    <th>BDD100K MOTS<br>mMOTSA (%)</th>
    <th>Model</th>
    <th>Log<br>Stage1</th>
    <th>Log<br>Stage2</th>
  </tr>
  <tr>
    <td>unicorn_track_tiny_sot_only</td>
    <td>800x1280</td>
    <td>67.5</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
    <td><a href="https://drive.google.com/file/d/1NcMsWML-1-zr0SWXUOiRPNZRs-VnAWG5/view?usp=sharing">model</a></td>
    <td><a href="https://drive.google.com/file/d/1TiKopPT93v6JDYrrKMihlBV44h6VugqS/view?usp=sharing">log1</a></td>
    <td>-</td>
  </tr>
  <tr>
    <td>unicorn_track_tiny_mot_only</td>
    <td>800x1280</td>
    <td>-</td>
    <td>39.6</td>
    <td>-</td>
    <td>-</td>
    <td><a href="https://drive.google.com/file/d/1T0DxX-d_qeHvVlZ7IIbdNqtsHCQANKlQ/view?usp=sharing">model</a></td>
    <td><a href="https://drive.google.com/file/d/1Fx8jataBKFH2c-q1uNVEFomQgm9252QX/view?usp=sharing">log1</a></td>
    <td>-</td>
  </tr>
  <tr>
    <td>unicorn_track_tiny_vos_only</td>
    <td>800x1280</td>
    <td>-</td>
    <td>-</td>
    <td>68.4</td>
    <td>-</td>
    <td><a href="https://drive.google.com/file/d/12T7XodWFuwSFAv5oBcDTIBJT4INVpQ94/view?usp=sharing">model</a></td>
    <td>-</td>
    <td><a href="https://drive.google.com/file/d/1jxbAZEVgvD2pZko9jce6pJYnzYP866PQ/view?usp=sharing">log2</a></td>
  </tr>
  <tr>
    <td>unicorn_track_tiny_mots_only</td>
    <td>800x1280</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
    <td>28.1</td>
    <td><a href="https://drive.google.com/file/d/13D0rH3i0n5d_W8ead41zFySDwXAePxgv/view?usp=sharing">model</a></td>
    <td>-</td>
    <td><a href="https://drive.google.com/file/d/1P0GacRmGLwFw72rpaorBZ3vI63S70IeS/view?usp=sharing">log2</a></td>
  </tr>

</table>

## Structure
The downloaded checkpoints should be organized in the following structure:
   ```
   ${UNICORN_ROOT}
    -- Unicorn_outputs
        -- unicorn_det_convnext_large_800x1280
            -- best_ckpt.pth
        -- unicorn_det_convnext_tiny_800x1280
            -- best_ckpt.pth
        -- unicorn_det_r50_800x1280
            -- best_ckpt.pth
        -- unicorn_track_large_mask
            -- latest_ckpt.pth
        -- unicorn_track_tiny_mask
            -- latest_ckpt.pth
        -- unicorn_track_r50_mask
            -- latest_ckpt.pth
        ...
   ```
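To catch misplaced files early, a small helper (not part of the codebase) can verify this layout. It assumes it is run from `${UNICORN_ROOT}`; the expected filenames follow the tree above.

```python
import os

# Hedged helper, not part of the Unicorn codebase: check that each
# downloaded checkpoint sits where the tree above expects it.
# Assumes the script runs from ${UNICORN_ROOT}.
EXPECTED = {
    "unicorn_det_convnext_large_800x1280": "best_ckpt.pth",
    "unicorn_det_convnext_tiny_800x1280": "best_ckpt.pth",
    "unicorn_det_r50_800x1280": "best_ckpt.pth",
    "unicorn_track_large_mask": "latest_ckpt.pth",
    "unicorn_track_tiny_mask": "latest_ckpt.pth",
    "unicorn_track_r50_mask": "latest_ckpt.pth",
}

for exp, ckpt in EXPECTED.items():
    path = os.path.join("Unicorn_outputs", exp, ckpt)
    status = "ok" if os.path.isfile(path) else "MISSING"
    print(f"{status:7s} {path}")
```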