    fix two-stage DF-DETR · aea87f6c
    Zhicheng Yan authored
    Summary:
    Pull Request resolved: https://github.com/facebookresearch/d2go/pull/106
    
    # 2-stage DF-DETR
    
    DF-DETR supports 2-stage detection. In the 1st stage, we detect class-agnostic boxes using the feature pyramid (a.k.a. `memory` in the code) computed by the encoder.
    
    The current implementation has a few flaws:
    - In `setcriterion.py`, when computing the loss for the encoder's 1st-stage predictions, `num_boxes` should be reduced across GPUs and clamped to a positive integer. The current implementation divides by zero and produces NaN when `num_boxes` is zero (e.g. no box annotation in the cropped input image).
    - `gen_encoder_output_proposals()` manually fills in `float("inf")` at invalid spatial positions outside the actual image size. However, it is not guaranteed that those positions won't be selected as top-scored positions, and `float("inf")` can easily cause the affected parameters to be updated to NaN.
    - `class_embed` for the encoder should have 1 channel rather than num_class channels, because we only need to predict the probability of being a foreground box.
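
The `num_boxes` normalization in the first bullet can be sketched as follows. This is a minimal, framework-free illustration and not the code in `setcriterion.py`: the helper names `normalized_num_boxes` and `normalized_loss` are hypothetical, the list of per-worker counts stands in for an all-reduce across GPUs, and averaging by world size before clamping is an assumption about the reduction.

```python
def normalized_num_boxes(per_worker_counts):
    """Average the box count over all workers and clamp it to at least 1.

    `per_worker_counts` is a hypothetical stand-in for the values that an
    all-reduce (sum) across GPUs would combine.
    """
    world_size = len(per_worker_counts)
    total = sum(per_worker_counts)  # emulates the cross-GPU sum
    return max(total / world_size, 1.0)  # the clamp prevents division by zero


def normalized_loss(loss_sum, per_worker_counts):
    """Divide a summed loss by the clamped, averaged box count."""
    return loss_sum / normalized_num_boxes(per_worker_counts)
```

With every worker reporting zero boxes, the divisor is 1 rather than 0, so an empty crop yields a finite loss instead of NaN.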
    
    This diff fixes the issues above.
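
To see why an infinite fill value is fragile, the sketch below shows the failure mode in plain Python. It only illustrates how `float("inf")` turns into NaN under ordinary arithmetic; it is not the code in `gen_encoder_output_proposals()` and does not claim to show how this diff fixes it. The helper `weighted_sum` is hypothetical.

```python
import math

INF = float("inf")


def weighted_sum(values, weights):
    """Sum of element-wise products, as in a dot product or a gradient step."""
    return sum(v * w for v, w in zip(values, weights))


# An infinite entry poisons the result even when its weight is zero,
# because inf * 0 evaluates to NaN.
poisoned = weighted_sum([0.2, INF], [1.0, 0.0])

# A finite entry at the same position contributes nothing once its weight is zero.
clean = weighted_sum([0.2, 0.0], [1.0, 0.0])

print(math.isnan(poisoned), clean)
```

Once a NaN like this reaches a gradient, every parameter it touches becomes NaN on the next update.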
    
    # Gradient blocking in decoder
    
    Currently, the gradient of the reference points is blocked at each decoding layer to improve numerical stability during training.
    This diff adds an option, `MODEL.DETR.DECODER_BLOCK_GRAD`. When it is False, the gradient is NOT blocked. Empirically, we find this leads to better box AP.
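
The effect of such a switch can be sketched with PyTorch's `Tensor.detach()`. This is a minimal illustration under stated assumptions, not the decoder code in `deformable_detr.py`: `refine_reference` and `grad_wrt_upstream` are hypothetical helpers, and the `block_grad` argument is assumed to mirror `MODEL.DETR.DECODER_BLOCK_GRAD`.

```python
import torch


def refine_reference(reference, delta, block_grad=True):
    """Add a predicted offset to a reference point.

    Assumption: `block_grad` plays the role of MODEL.DETR.DECODER_BLOCK_GRAD.
    """
    if block_grad:
        # Cut the graph so no gradient flows back into earlier layers.
        reference = reference.detach()
    return reference + delta


def grad_wrt_upstream(block_grad):
    """Return the gradient that reaches a parameter of an earlier layer."""
    w = torch.tensor(0.5, requires_grad=True)  # stands in for an upstream parameter
    reference = w * 2.0                        # reference point from the previous layer
    delta = torch.tensor(0.1, requires_grad=True)
    out = refine_reference(reference, delta, block_grad=block_grad)
    out.backward()
    return w.grad
```

With `block_grad=True` the upstream parameter receives no gradient at all (`w.grad` stays `None`). With `block_grad=False` the gradient flows through the reference point into the earlier layer.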
    
    Reviewed By: zhanghang1989
    
    Differential Revision: D30325396
    
    fbshipit-source-id: 7d7add1e05888adda6e46cc6886117170daa22d4
deformable_detr.py 15.6 KB