fix two-stage DF-DETR
Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/106

# 2-stage DF-DETR

DF-DETR supports 2-stage detection. In the 1st stage, we detect class-agnostic boxes using the feature pyramid (a.k.a. `memory` in the code) computed by the encoder.

The current implementation has a few flaws:
- In `setcriterion.py`, when computing the loss for the encoder's 1st-stage predictions, `num_boxes` should be reduced across GPUs and also clamped to a positive integer. The current implementation divides by zero and produces NaN when `num_boxes` is zero (e.g. no box annotation in the cropped input image).
- `gen_encoder_output_proposals()` manually fills in `float("inf")` at invalid spatial positions outside of the actual image size. However, it is not guaranteed that those positions won't be selected as top-scored positions, and `float("inf")` can easily cause the affected parameters to be updated to NaN.
- `class_embed` for the encoder should have 1 channel rather than `num_class` channels, because we only need to predict the probability of being a foreground box.

This diff fixes the issues above.

# Gradient blocking in decoder

Currently, the gradient of the reference point is blocked at each decoding layer to improve numerical stability during training. This diff adds an option `MODEL.DETR.DECODER_BLOCK_GRAD`. When it is False, we do NOT block the gradient. Empirically, we find this leads to better box AP.

Reviewed By: zhanghang1989

Differential Revision: D30325396

fbshipit-source-id: 7d7add1e05888adda6e46cc6886117170daa22d4
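
As a rough illustration of the `num_boxes` issue, the sketch below uses plain Python rather than the actual d2go code. The helper names `normalize_num_boxes` and `normalized_loss` are hypothetical, and the list of per-worker counts stands in for a cross-GPU sum reduction. It only shows why clamping the denominator avoids the divide-by-zero case, and why an `inf` value can turn into NaN once it enters arithmetic.

```python
import math


def normalize_num_boxes(per_worker_counts):
    """Average the box count over workers and clamp it to at least 1.

    Hypothetical helper: the list stands in for a sum all-reduce across
    GPUs followed by a division by the world size.
    """
    world_size = len(per_worker_counts)
    mean_count = sum(per_worker_counts) / world_size
    # Clamp so that a batch with no annotated boxes cannot divide by zero.
    return max(mean_count, 1.0)


def normalized_loss(loss_sum, per_worker_counts):
    """Divide a summed loss by the clamped, cross-worker box count."""
    return loss_sum / normalize_num_boxes(per_worker_counts)


# No annotated boxes on any worker: the denominator is clamped to 1,
# so the division is well defined.
print(normalized_loss(0.0, [0, 0]))

# An infinite value becomes NaN as soon as it is multiplied by zero,
# which is one way inf-filled positions can poison later computations.
print(math.isnan(float("inf") * 0.0))
```

Without the clamp, `[0, 0]` would give a zero denominator; without the cross-worker average, each GPU would normalize its loss by a different count.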