projects_oss/detr/detr/models/deformable_detr.py · aea87f6c0721bdace72261977a50f4d8f399afaf · OpenDAS / d2go

Zhicheng Yan authored Aug 24, 2021

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/106

# 2-stage DF-DETR

DF-DETR supports 2-stage detection. In the 1st stage, we detect class-agnostic boxes using the feature pyramid (a.k.a. `memory` in the code) computed by the encoder.

The current implementation has a few flaws:
- In `setcriterion.py`, when computing the loss for the encoder's 1st-stage predictions, `num_boxes` should be reduced across GPUs and clamped to a positive integer. The current implementation divides by zero and produces NaN when `num_boxes` is zero (e.g. when the cropped input image has no box annotations).
- `gen_encoder_output_proposals()` manually fills in `float("inf")` at invalid spatial positions outside the actual image size. However, nothing guarantees that those positions won't be selected as top-scored positions, and `float("inf")` can easily cause the affected parameters to be updated to NaN.
- `class_embed` for the encoder should have 1 channel rather than num_class channels, because we only need to predict the probability of being a foreground box.
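
The `num_boxes` normalization described in the first item can be sketched as follows. This is a minimal illustration, not the code from this diff: the function name `normalized_num_boxes`, the `targets` layout (a list of dicts with a `"labels"` entry), and the choice to average over the world size after the reduction are all assumptions.

```python
import torch
import torch.distributed as dist


def normalized_num_boxes(targets, device="cpu"):
    """Hypothetical helper: box count used to normalize the loss.

    Assumes `targets` is a list of dicts, each with a "labels" entry
    holding one label per annotated box.
    """
    # Count the annotated boxes on this process.
    num_boxes = sum(len(t["labels"]) for t in targets)
    num_boxes = torch.as_tensor([num_boxes], dtype=torch.float, device=device)

    world_size = 1
    # Sum the counts across processes only when distributed training is active.
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(num_boxes)
        world_size = dist.get_world_size()

    # Average over processes (an assumption) and clamp to at least 1
    # so that dividing the loss by this value never divides by zero.
    return torch.clamp(num_boxes / world_size, min=1).item()
```

With an image that has no annotations, e.g. `normalized_num_boxes([{"labels": torch.zeros(0)}])`, the helper returns a positive normalizer instead of zero, so the loss stays finite.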

This diff fixes the issues above.
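
The commit message does not say how the `float("inf")` issue is fixed. One possible approach, sketched below under that caveat, is to keep the proposals finite and exclude invalid positions from the top-k selection by masking their scores. The function name `select_topk_valid`, the tensor shapes, and the placeholder value `0.0` are assumptions made for illustration.

```python
import torch


def select_topk_valid(scores, proposals, invalid_mask, k):
    """Hypothetical sketch: pick top-k proposals while skipping invalid positions.

    scores:       (N, L) foreground scores per spatial position
    proposals:    (N, L, 4) box proposals per spatial position
    invalid_mask: (N, L) bool, True at positions outside the actual image
    """
    # Give invalid positions the lowest finite score so top-k avoids them
    # whenever at least k valid positions exist.
    lowest = torch.finfo(scores.dtype).min
    masked_scores = scores.masked_fill(invalid_mask, lowest)

    # Use a finite placeholder (assumption: 0.0) instead of float("inf"),
    # so a selected invalid position cannot inject inf into the gradients.
    safe_proposals = proposals.masked_fill(invalid_mask.unsqueeze(-1), 0.0)

    topk_idx = masked_scores.topk(k, dim=1).indices
    gather_idx = topk_idx.unsqueeze(-1).expand(-1, -1, proposals.shape[-1])
    picked = torch.gather(safe_proposals, 1, gather_idx)
    return picked, topk_idx
```

Even if fewer than `k` valid positions exist and an invalid one is picked, the gathered proposal is finite, which is the property the second flaw above is concerned with.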

# Gradient blocking in decoder

Currently, the gradient of the reference points is blocked at each decoding layer to improve numerical stability during training.
This diff adds an option `MODEL.DETR.DECODER_BLOCK_GRAD`. When it is False, the gradient is NOT blocked. Empirically, we find this leads to better box AP.
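
To illustrate what blocking the gradient means here, the following is a minimal sketch of iterative reference refinement with an optional `detach()`. It is not the decoder from this diff: `refine_boxes`, `inverse_sigmoid`, and the `block_grad` argument are hypothetical names, with `block_grad` standing in for the `MODEL.DETR.DECODER_BLOCK_GRAD` option.

```python
import torch


def inverse_sigmoid(x, eps=1e-5):
    # Map normalized coordinates in (0, 1) back to logit space.
    x = x.clamp(min=eps, max=1 - eps)
    return torch.log(x / (1 - x))


def refine_boxes(init_reference, deltas, block_grad=True):
    """Hypothetical sketch of per-layer reference refinement.

    init_reference: (Q, 4) normalized reference boxes
    deltas:         list of (Q, 4) per-layer offsets predicted in logit space
    block_grad:     if True, detach the reference passed to the next layer
    """
    reference = init_reference
    boxes = []
    for delta in deltas:
        new_reference = torch.sigmoid(inverse_sigmoid(reference) + delta)
        boxes.append(new_reference)
        # With block_grad=True, later layers cannot backpropagate
        # into the offsets predicted by earlier layers.
        reference = new_reference.detach() if block_grad else new_reference
    return boxes
```

With `block_grad=True`, a loss on the last layer's boxes only updates that layer's offsets; with `block_grad=False`, the same loss also sends gradient to the offsets of all earlier layers.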

Reviewed By: zhanghang1989

Differential Revision: D30325396

fbshipit-source-id: 7d7add1e05888adda6e46cc6886117170daa22d4
