Add an option to specify the period of metric gathering and writing in Trainer
Summary:
X-link: https://github.com/fairinternal/detectron2/pull/591
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/469
X-link: https://github.com/facebookresearch/detectron2/pull/4785

Add an option to specify the period of metric gathering and writing in Trainer.

This feature is needed to speed up large-scale training jobs such as generative AI. The all_gather call made during metric writing at every iteration is time-consuming when hundreds of GPUs are used, and accounts for ~10% of the total training time. With this option, the metric writing period can be set to the same value as the writer period (cfg.WRITER_PERIOD=20). This reduces training time while keeping the metric logs that users see unchanged.

Reviewed By: miqueljubert, wat3rBro

Differential Revision: D43098985

Privacy Context Container: 2011691122555468

fbshipit-source-id: 63c93a7331aa63badce5125e5240d2d5f7e61b74
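A minimal, self-contained sketch of the idea, assuming a trainer whose per-step metric writing performs one collective all_gather. It does not use detectron2 or torch.distributed: `FakeCommBackend`, `PeriodicMetricTrainer`, and `gather_metric_period` are hypothetical names chosen for illustration, and the collective is simulated by a counter so the effect of the period on the number of calls is visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeCommBackend:
    """Hypothetical stand-in for a distributed backend; counts all_gather calls."""
    world_size: int = 4
    num_all_gather_calls: int = 0

    def all_gather(self, obj: Dict[str, float]) -> List[Dict[str, float]]:
        # A real backend would exchange `obj` across all ranks (the expensive
        # part at scale); here every rank is simulated with an identical copy.
        self.num_all_gather_calls += 1
        return [dict(obj) for _ in range(self.world_size)]


@dataclass
class PeriodicMetricTrainer:
    """Hypothetical trainer that gathers/writes metrics every `gather_metric_period` iterations."""
    comm: FakeCommBackend
    gather_metric_period: int = 1
    written: List[Tuple[int, float]] = field(default_factory=list)

    def run_step(self, it: int) -> None:
        loss_dict = {"loss": 1.0 / (it + 1)}  # placeholder per-iteration loss
        # Only pay for the collective on iterations whose metrics get written.
        if (it + 1) % self.gather_metric_period == 0:
            all_losses = self.comm.all_gather(loss_dict)
            mean_loss = sum(d["loss"] for d in all_losses) / len(all_losses)
            self.write_metrics(it, mean_loss)

    def write_metrics(self, it: int, mean_loss: float) -> None:
        # Stand-in for writing to a metric storage / logger.
        self.written.append((it, mean_loss))

    def train(self, max_iter: int) -> None:
        for it in range(max_iter):
            self.run_step(it)


every_iter = PeriodicMetricTrainer(FakeCommBackend(), gather_metric_period=1)
every_iter.train(100)
periodic = PeriodicMetricTrainer(FakeCommBackend(), gather_metric_period=20)
periodic.train(100)
print(every_iter.comm.num_all_gather_calls, periodic.comm.num_all_gather_calls)  # 100 5
```

When the gather period matches a writer period of 20, the iterations that reach the logs (19, 39, 59, ...) are the ones a writer with that period would report anyway, while the number of collective calls drops by a factor of the period.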