This file documents a large collection of baselines trained
with detectron2 in Sep-Oct, 2019.
All numbers were obtained on [Big Basin](https://engineering.fb.com/data-center-engineering/introducing-big-basin-our-next-generation-ai-hardware/)
servers with 8 NVIDIA V100 GPUs & NVLink. The software in use were PyTorch 1.3, CUDA 9.2, cuDNN 7.4.2 or 7.6.3.
You can access these models from code using [detectron2.model_zoo](https://detectron2.readthedocs.io/modules/model_zoo.html) APIs.
In addition to these official baseline models, you can find more models in [projects/](projects/).
#### How to Read the Tables
* The "Name" column contains a link to the config file. Running `tools/train_net.py` with this config file
and 8 GPUs will reproduce the model.
* Training speed is averaged across the entire training.
We keep updating the speed with latest version of detectron2/pytorch/etc.,
so they might be different from the `metrics` file.
Training speed for multi-machine jobs is not provided.
* Inference speed is measured by `tools/train_net.py --eval-only`, or [inference_on_dataset()](https://detectron2.readthedocs.io/modules/evaluation.html#detectron2.evaluation.inference_on_dataset),
with batch size 1 in detectron2 directly.
Measuring it with your own code will likely introduce other overhead.
Actual deployment in production should in general be faster than the given inference
speed due to more optimizations.
* The *model id* column is provided for ease of reference.
To check downloaded file integrity, any model on this page contains its md5 prefix in its file name.
* Training curves and other statistics can be found in `metrics` for each model.
#### Common Settings for COCO Models
* All COCO models were trained on `train2017` and evaluated on `val2017`.
* The default settings are __not directly comparable__ with Detectron's standard settings.
For example, our default training data augmentation uses scale jittering in addition to horizontal flipping.
To make fair comparisons with Detectron's settings, see
[Detectron1-Comparisons](configs/Detectron1-Comparisons/) for accuracy comparison,
and [benchmarks](https://detectron2.readthedocs.io/notes/benchmarks.html)
for speed comparison.
* For Faster/Mask R-CNN, we provide baselines based on __3 different backbone combinations__:
* __FPN__: Use a ResNet+FPN backbone with standard conv and FC heads for mask and box prediction,
respectively. It obtains the best
speed/accuracy tradeoff, but the other two are still useful for research.
* __C4__: Use a ResNet conv4 backbone with conv5 head. The original baseline in the Faster R-CNN paper.
* __DC5__ (Dilated-C5): Use a ResNet conv5 backbone with dilations in conv5, and standard conv and FC heads
for mask and box prediction, respectively.
This is used by the Deformable ConvNet paper.
* Most models are trained with the 3x schedule (~37 COCO epochs).
Although 1x models are heavily under-trained, we provide some ResNet-50 models with the 1x (~12 COCO epochs)
training schedule for comparison when doing quick research iteration.
#### ImageNet Pretrained Models
We provide backbone models pretrained on ImageNet-1k dataset.
These models have __different__ format from those provided in Detectron: we do not fuse BatchNorm into an affine layer.
*[R-50.pkl](https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-50.pkl): converted copy of [MSRA's original ResNet-50](https://github.com/KaimingHe/deep-residual-networks) model.
*[R-101.pkl](https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-101.pkl): converted copy of [MSRA's original ResNet-101](https://github.com/KaimingHe/deep-residual-networks) model.
*[X-101-32x8d.pkl](https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/FAIR/X-101-32x8d.pkl): ResNeXt-101-32x8d model trained with Caffe2 at FB.
Pretrained models in Detectron's format can still be used. For example:
Ablations for normalization methods, and a few models trained from scratch following [Rethinking ImageNet Pre-training](https://arxiv.org/abs/1811.08883).
(Note: The baseline uses `2fc` head while the others use [`4conv1fc` head](https://arxiv.org/abs/1803.08494))
* It is powered by the [PyTorch](https://pytorch.org) deep learning framework.
* Includes more features such as panoptic segmentation, densepose, Cascade R-CNN, rotated bounding boxes, etc.
* Can be used as a library to support [different projects](projects/) on top of it.
We'll open source more research projects in this way.
* It [trains much faster](https://detectron2.readthedocs.io/notes/benchmarks.html).
See our [blog post](https://ai.facebook.com/blog/-detectron2-a-pytorch-based-modular-object-detection-library-/)
to see more demos and learn about detectron2.
## Installation
See [INSTALL.md](INSTALL.md).
## Quick Start
See [GETTING_STARTED.md](GETTING_STARTED.md),
or the [Colab Notebook](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5).
Learn more at our [documentation](https://detectron2.readthedocs.org).
And see [projects/](projects/) for some projects that are built on top of detectron2.
## Model Zoo and Baselines
We provide a large set of baseline results and trained models available for download in the [Detectron2 Model Zoo](MODEL_ZOO.md).
## License
Detectron2 is released under the [Apache 2.0 license](LICENSE).
## Citing Detectron2
If you use Detectron2 in your research or wish to refer to the baseline results published in the [Model Zoo](MODEL_ZOO.md), please use the following BibTeX entry.
```BibTeX
@misc{wu2019detectron2,
author = {Yuxin Wu and Alexander Kirillov and Francisco Massa and