clamp reference point max to 1.0 to avoid NaN in regressed bbox
Summary:
When training DF-DETR with a Swin Transformer backbone, which uses a large size_divisibility of 224 (= 32 * 7) and therefore potentially more zero-padding, we find that the regressed boxes can contain NaN values and fail the assertion here (https://fburl.com/code/p27ztcce). We identified two potential causes:

- Fix 1. In the DF-DETR encoder, the reference points prepared by `get_reference_points()` can contain normalized x, y coordinates larger than 1 because of rounding during mask interpolation across feature scales (specific examples can be given upon request). We therefore clamp the x, y coordinates to a maximum of 1.0.
- Fix 2. The MLP used in the bbox_embed heads contains 3 FC layers, which might be too many. We introduce an argument `BBOX_EMBED_NUM_LAYERS` so users can configure the number of FC layers. This change is backward-compatible.

Reviewed By: zhanghang1989

Differential Revision: D30661167

fbshipit-source-id: c7e94983bf1ec07426fdf1b9d363e5163637f21a
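To illustrate Fix 1, here is a simplified, one-dimensional sketch in plain Python. It is not the actual `get_reference_points()`; the helpers `valid_ratio` and `normalized_reference_x` are hypothetical. It assumes the DETR mask convention (True marks padding) and assumes reference points are pixel centers divided by the valid extent of the level, which is how values above 1 can appear when the interpolated mask marks more columns as padding than the feature map actually covers.

```python
from typing import List, Sequence


def valid_ratio(mask_row: Sequence[bool]) -> float:
    """Fraction of valid (non-padded) positions along one axis.

    Assumes the DETR mask convention: True marks padding, False marks image.
    """
    return sum(1 for is_pad in mask_row if not is_pad) / len(mask_row)


def normalized_reference_x(
    width: int, ratio: float, clamp_max: float = 1.0
) -> List[float]:
    """Pixel-center x coordinates normalized by the valid width, capped at clamp_max.

    Centers (col + 0.5) are divided by ratio * width. Columns past the valid
    region (padding, or a mask that rounded down after interpolation) map to
    values > 1, which min() caps at clamp_max.
    """
    valid_width = ratio * width
    return [min((col + 0.5) / valid_width, clamp_max) for col in range(width)]


# A 4-column level whose interpolated mask marks the last column as padding.
mask = [False, False, False, True]
ratio = valid_ratio(mask)              # 0.75, so the valid width is 3.0
xs = normalized_reference_x(4, ratio)  # last center 3.5 / 3.0 would exceed 1; it is clamped to 1.0
print(xs)
```

Only the upper bound needs clamping in this sketch: pixel centers are always positive, so the normalized coordinates never fall below 0.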
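Fix 2 can be pictured as a bbox_embed MLP whose depth comes from configuration instead of being hard-coded. The sketch below only computes the (in, out) sizes of each FC layer, in the style of DETR's `MLP` helper, so it runs without PyTorch. The flat dict config, the `bbox_embed_dims` helper, and the output size of 4 (box cx, cy, w, h) are assumptions for illustration; the default of 3 layers reflects the back-compatibility stated above.

```python
from typing import Dict, List, Tuple

# Assumed default: keeps the previously hard-coded 3 FC layers (back-compatible).
DEFAULT_BBOX_EMBED_NUM_LAYERS = 3


def mlp_layer_dims(
    input_dim: int, hidden_dim: int, output_dim: int, num_layers: int
) -> List[Tuple[int, int]]:
    """(in_features, out_features) for each FC layer of a DETR-style MLP."""
    if num_layers < 1:
        raise ValueError("num_layers must be >= 1")
    hidden = [hidden_dim] * (num_layers - 1)
    return list(zip([input_dim] + hidden, hidden + [output_dim]))


def bbox_embed_dims(cfg: Dict[str, int], hidden_dim: int) -> List[Tuple[int, int]]:
    """Layer sizes for a hypothetical bbox_embed head predicting 4 box values."""
    num_layers = cfg.get("BBOX_EMBED_NUM_LAYERS", DEFAULT_BBOX_EMBED_NUM_LAYERS)
    return mlp_layer_dims(hidden_dim, hidden_dim, 4, num_layers)


print(bbox_embed_dims({}, 256))                            # default depth
print(bbox_embed_dims({"BBOX_EMBED_NUM_LAYERS": 2}, 256))  # shallower head
```

Leaving the argument unset yields the original three-layer head, so existing configs behave as before while new configs can opt into a shallower head.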