1. 26 Jun, 2023 1 commit
  2. 23 Jun, 2023 4 commits
    • disable FSDP mixed precision for model buffers · b0abd7aa
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/585
      
      Disable FSDP mixed precision for model buffers. Buffers are usually small, so enabling mixed precision for them yields very little performance gain. In addition, components such as BatchNorm layers and diffusion models are very sensitive to buffer precision. We therefore keep buffers in full precision under FSDP.
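
A minimal sketch of this idea using PyTorch's public `MixedPrecision` config. The helper name `build_mixed_precision` and the half-precision dtypes are illustrative assumptions, not the code from this commit. The import is guarded so the snippet still loads where PyTorch is not installed.

```python
try:
    import torch
    from torch.distributed.fsdp import MixedPrecision
except ImportError:  # PyTorch (or its FSDP module) is unavailable
    torch = None


def build_mixed_precision():
    """Hypothetical helper: low precision for params and gradients, fp32 for buffers."""
    if torch is None:
        return None
    return MixedPrecision(
        param_dtype=torch.float16,   # parameters are cast to half precision for compute
        reduce_dtype=torch.float16,  # gradients are reduced in half precision
        buffer_dtype=torch.float32,  # buffers (e.g. BatchNorm running stats) stay in fp32
    )


mixed_precision = build_mixed_precision()
```

The resulting object would be passed as the `mixed_precision` argument of the FSDP constructor.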
      
      Reviewed By: wat3rBro
      
      Differential Revision: D46951673
      
      fbshipit-source-id: 12bb1a47fbd8b3dd85c7f781bab707206044af15
    • update INJECTED_COCO_DATASETS_LUT when registering AdhocCOCODataset · be8a6324
      Zhicheng Yan authored
      Summary:
      When registering AdhocCOCODataset, INJECTED_COCO_DATASETS_LUT needs to be updated as well.
      For example, if a dataset uses a custom registration function, it can only be retrieved from INJECTED_COCO_DATASETS_LUT.
      Otherwise, it uses the default registration function, as in the `register_dataset_split` branch.
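
The lookup behaviour described above can be sketched with a toy table. Only the name `INJECTED_COCO_DATASETS_LUT` comes from the commit message; its structure here, and the helpers `default_register`, `register_adhoc_dataset`, and `load`, are hypothetical stand-ins rather than the d2go implementation.

```python
from typing import Callable, Dict

# Toy stand-in for the table named in the commit; the real d2go structure may differ.
INJECTED_COCO_DATASETS_LUT: Dict[str, Callable[[str], str]] = {}


def default_register(name: str) -> str:
    """Hypothetical default registration function."""
    return f"default:{name}"


def register_adhoc_dataset(src_name: str, adhoc_name: str) -> None:
    """Give the ad hoc copy the same custom function as its source, if it has one."""
    if src_name in INJECTED_COCO_DATASETS_LUT:
        INJECTED_COCO_DATASETS_LUT[adhoc_name] = INJECTED_COCO_DATASETS_LUT[src_name]


def load(name: str) -> str:
    """Use the custom function when the dataset is in the table, else the default."""
    register = INJECTED_COCO_DATASETS_LUT.get(name, default_register)
    return register(name)


INJECTED_COCO_DATASETS_LUT["my_coco"] = lambda name: f"custom:{name}"
register_adhoc_dataset("my_coco", "my_coco@adhoc")
result_adhoc = load("my_coco@adhoc")  # found in the table, so the custom function runs
result_other = load("plain_coco")     # not in the table, so the default runs
```

Without the table update in `register_adhoc_dataset`, the ad hoc copy would silently fall back to the default function.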
      
      Reviewed By: antonrigner
      
      Differential Revision: D46826507
      
      fbshipit-source-id: 9170c5b57f3935875b899ab7f93c3c57e77eb28c
    • remove AC prefix from EMA to make it compatible with loading · 5c23bee8
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/578
      
      # Problem:
      d2go EMA uses `named_parameters()` to traverse model states and save EMA checkpoints, while using `state_dict()` to save model checkpoints. This is a brittle practice because `named_parameters()` and `state_dict()` call two different sets of Python APIs and can return different things.
      In the case of Activation Checkpointing (AC), we don't want the AC wrapper to affect checkpoint names. Thus, `state_dict()` is overridden by PyTorch to remove the prefix "_checkpoint_wrapped_module" from the FQN. However, `named_parameters()` has no such support, so the prefix remains. If we change the AC wrapping strategy (very common for optimization), we will not be able to load the previous EMA state back into the model. The same problem also occurs with FSDP.
      
      # Short-term hack:
      This diff adds a short-term hack that manually removes the AC prefix in EMA. We can expand `IGNORED_FQN_PREFIX` to support more use cases.
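
A minimal sketch of the prefix removal, assuming `IGNORED_FQN_PREFIX` is a tuple of wrapper names that appear as dotted components of a fully qualified name. The helpers `clean_fqn` and `clean_state_keys` are hypothetical and not taken from the diff.

```python
from typing import Dict, Tuple

# Prefix named in the commit message; storing it in a tuple is an assumption.
IGNORED_FQN_PREFIX: Tuple[str, ...] = ("_checkpoint_wrapped_module",)


def clean_fqn(name: str) -> str:
    """Drop wrapper components from a dotted fully qualified name."""
    return ".".join(part for part in name.split(".") if part not in IGNORED_FQN_PREFIX)


def clean_state_keys(state: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the state whose keys no longer contain wrapper components."""
    return {clean_fqn(key): value for key, value in state.items()}


ema_state = {
    "backbone._checkpoint_wrapped_module.conv.weight": 0.5,  # wrapped by AC
    "head.bias": 0.1,                                        # not wrapped
}
cleaned = clean_state_keys(ema_state)
```

With the wrapper component removed, the EMA keys match the names produced by `state_dict()` regardless of which modules are wrapped.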
      
      Reviewed By: wat3rBro
      
      Differential Revision: D46815031
      
      fbshipit-source-id: 29b6ea444ed2ef90b8741fccdcb2b62625933e7f
    • disable memory profiler by default + remove force disable + add logging · c0a84df5
      Anthony Chen authored
      Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/581
      
      Reviewed By: wat3rBro
      
      Differential Revision: D46913792
      
      fbshipit-source-id: cf3c3812c455091fbf63842443644d2571976017
  3. 22 Jun, 2023 3 commits
    • expose use_orig_params to d2go config · 7f17bbf0
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/582
      
      Expose the FSDP constructor's use_orig_params argument in the d2go config. Read more about it in the docstring of torch.distributed.fsdp.fully_sharded_data_parallel.
      
      use_orig_params=False (default) uses FlatParameters to store flattened parameters, which saves memory by avoiding fragmentation. However, use_orig_params=True is essential for models that are partly frozen, because FlatParameters can only accept a uniform requires_grad across the whole model.
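
A toy illustration of the constraint described above, not FSDP code: a flattened parameter carries a single `requires_grad` flag, so every parameter folded into it must agree. The functions `can_flatten` and `choose_use_orig_params` are hypothetical names for a rule of thumb.

```python
from typing import Dict


def can_flatten(requires_grad_by_param: Dict[str, bool]) -> bool:
    """True when all parameters share one requires_grad value."""
    return len(set(requires_grad_by_param.values())) <= 1


def choose_use_orig_params(requires_grad_by_param: Dict[str, bool]) -> bool:
    """Hypothetical rule of thumb: keep original parameters for partly frozen models."""
    return not can_flatten(requires_grad_by_param)


fully_trainable = {"backbone.weight": True, "head.weight": True}
partly_frozen = {"backbone.weight": False, "head.weight": True}

use_orig_for_trainable = choose_use_orig_params(fully_trainable)
use_orig_for_frozen = choose_use_orig_params(partly_frozen)
```

Here the partly frozen model is the case that needs `use_orig_params=True`.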
      
      Reviewed By: wat3rBro
      
      Differential Revision: D46917757
      
      fbshipit-source-id: 12ebe83e6de456e37d89eaf8b257f23925a6786d
    • Add MAST support for eval · 60b6995d
      Francisc Bungiu authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/583
      
      Extend MAST support to the evaluator binary.
      
      Reviewed By: miqueljubert
      
      Differential Revision: D46762473
      
      fbshipit-source-id: 62ac68f195c89924abf71c9b6a9715d60ffcbf9b
    • clean up all __init__.py · 955e53f6
      Yanghan Wang authored
      Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/580
      
      Reviewed By: ajinkya-deogade
      
      Differential Revision: D46875151
      
      fbshipit-source-id: e19d9ac79c0a4ad1b1ab49112e36f80c55062ea4
  4. 21 Jun, 2023 1 commit
  5. 19 Jun, 2023 1 commit
  6. 16 Jun, 2023 2 commits
  7. 14 Jun, 2023 1 commit
  8. 13 Jun, 2023 2 commits
    • delete loaded ckpt after use to save memory · 3fce52cf
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/574
      
      Currently, the d2go runner doesn't delete the checkpoint after loading it. This is fine if we run with `resume=True`, because all the model/optimizer/EMA state in the checkpoint is loaded into the corresponding training components. However, with `resume=False`, only the model state is loaded, and the optimizer/EMA state stays in memory until the end of training. This can cause OOM if the checkpoint is large.
      
      This diff deletes the loaded ckpt after use to save memory and avoid potential OOM issues.
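
The effect can be sketched in plain Python, assuming a checkpoint is a dict with `model` and `optimizer` entries. The `Blob` class and `load_model_only` helper are hypothetical stand-ins for a large state object and the runner's loading step.

```python
import gc
import weakref


class Blob:
    """Stand-in for a large optimizer/EMA state (hypothetical)."""


def load_model_only(model_state: dict, checkpoint: dict) -> None:
    """Copy only the model entry, mirroring the resume=False case."""
    model_state.update(checkpoint["model"])


optimizer_state = Blob()
tracker = weakref.ref(optimizer_state)  # lets us observe when the blob is freed
checkpoint = {"model": {"w": 1.0}, "optimizer": optimizer_state}
del optimizer_state  # the checkpoint dict now holds the only reference

model_state = {}
load_model_only(model_state, checkpoint)
alive_before_delete = tracker() is not None  # unused state is still in memory

del checkpoint  # release the unused optimizer/EMA state
gc.collect()
freed_after_delete = tracker() is None
```

The model state survives because it was copied out before the checkpoint was dropped.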
      
      Reviewed By: tglik
      
      Differential Revision: D46674618
      
      fbshipit-source-id: 2b70a8e46c7f2a309f83cc4deefe5d7a14783734
    • move detectron2 related .autodeps.toml to detectron2 · a879c1b4
      Yanghan Wang authored
      Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/572
      
      Reviewed By: ajinkya-deogade
      
      Differential Revision: D46664313
      
      fbshipit-source-id: acb1876c92c3907eb185dd144782495bda593d23
  9. 12 Jun, 2023 1 commit
    • fix d2go.config · bcad53f6
      Yanghan Wang authored
      Summary:
      I think the main issue is that we import `reroute_config_path` from `d2go.config.config` in `__init__.py`, but it's actually in `d2go.config.utils`. After fixing this, the namespace forward also works, see `scripts/wangyanghan/autodeps_testbed/d2go_config/TARGETS`
      
      Update all TARGETS:
      ```
      fbgs -l "d2go/config:" | xargs printf -- '/data/sandcastle/boxes/%s\n' | xargs arc lint -a
      ```
      
      For reviewers: only `.autodeps.toml` and the files in `d2go/d2go/config/` and `scripts/wangyanghan/autodeps_testbed/d2go_config/` are manually changed; all other files are auto-modified.
      
      Reviewed By: ajinkya-deogade
      
      Differential Revision: D46582416
      
      fbshipit-source-id: 0be0bebedd1aad5b67a746c75db3c6b81bcfecee
  10. 08 Jun, 2023 1 commit
  11. 07 Jun, 2023 1 commit
  12. 06 Jun, 2023 1 commit
  13. 03 Jun, 2023 1 commit
  14. 02 Jun, 2023 1 commit
  15. 01 Jun, 2023 1 commit
  16. 29 May, 2023 2 commits
  17. 27 May, 2023 8 commits
  18. 26 May, 2023 4 commits
    • Quantization: create a separate buck target · 1581776b
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/552
      
      This diff breaks down the TARGETS for the `quantization` dir.
      Apart from creating the TARGETS, the diff temporarily copies the function `_convert_to_d2` from `d2go/runner/lightning_task.py` to avoid circular dependencies. The change is reverted in diff D46096373.
      
      Reviewed By: tglik
      
      Differential Revision: D45912067
      
      fbshipit-source-id: b430b2abd129690f8c56479bb75819940fde4e3b
    • Utils part 1: create a separate buck target · 77dfafa2
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/551
      
      The `utils` dir is broken down in two steps to deal with circular dependencies while keeping the diffs atomic. This diff creates TARGETS for the dirs `utils` (except `demo_predictor.py`) and `utils/fb`. The TARGETS for `utils/testing` and `utils/demo_predictor.py` are introduced further down the stack, in diff D46096376.
      
      Reviewed By: tglik
      
      Differential Revision: D45912077
      
      fbshipit-source-id: fb01969c5f5df97de8afaa24bee8492591059b4d
    • Optimizer: create a separate buck target · bc1939eb
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/550
      
      Create buck target for the optimizer.
      Other than creating TARGETS, this diff replaces `from d2go.optimizer import build_optimizer_mapper` with `from d2go.optimizer.build import build_optimizer_mapper`.
      
      Reviewed By: tglik
      
      Differential Revision: D45912075
      
      fbshipit-source-id: e478783a9ec16d4573d6365e5567e8d2ed72eb06
    • Move iterate_module_named_parameters to utils · 1950242a
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/549
      
      `iterate_module_named_parameters` is used by both `optimizer` and `quantization`.
      This diff moves it to a shared location, `utils`, to break the circular dependencies for the following diffs in the stack.
      
      Reviewed By: tglik
      
      Differential Revision: D45912066
      
      fbshipit-source-id: bce5c5db3bbc1866f4da8662f7bd5908bfe30aad
  19. 25 May, 2023 4 commits
    • Generic Reproducibility · edcdb731
      Jiaxu Zhu authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/548
      
      As the title says, by setting
      ```
      SOLVER.DETERMINISTIC = True
      SEED = 42 # or other values
      ```
      training results are reproducible.
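
The commit message does not say what these settings do internally. As a hedged sketch, the usual measures behind such flags look like the following; the function `seed_everything` is a hypothetical name, and the NumPy and PyTorch steps run only when those libraries are installed.

```python
import random


def seed_everything(seed: int, deterministic: bool = True) -> None:
    """Seed Python's RNG and, when available, NumPy and PyTorch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    if deterministic:
        torch.backends.cudnn.benchmark = False    # avoid autotuned, run-dependent kernels
        torch.use_deterministic_algorithms(True)  # raise on known nondeterministic ops


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
```

Reseeding with the same value replays the same random sequence, which is the property the two config options aim for across a whole training run.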
      
      Reviewed By: wat3rBro, rkaarimi
      
      Differential Revision: D46174626
      
      fbshipit-source-id: d6665b777376a176bd46a1286c3199ed0da26ae6
    • Config and Registry: create a separate buck target · 1accd414
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/546
      
      Here we start modularizing the targets. I had to introduce some temporary hacks to break the circular dependency while keeping the diff atomic. There are some TODOs left at the end of the stack that are still WIP.
      
      Reviewed By: tglik
      
      Differential Revision: D45912076
      
      fbshipit-source-id: 375f579fe749dd4a588908cdca7b76ba68f1048f
    • Resolve relative import for modeldef · 34823153
      Ajinkya Deogade authored
      Summary:
      There is an issue with the relative import in the `__init__` file of modeldef that causes tests on GitHub CI to fail.
      Specifically, the `FBNetV2ModelArch` is not correctly populated.
      The internal CI does not detect such failures because we use the buck build system.
      This diff fixes it.
      
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/547
      
      Reviewed By: patricksnape
      
      Differential Revision: D46177424
      
      fbshipit-source-id: 06b23b9b221c990cd15a2debff6def8cfb99743b
    • fix attribute mismatch for memory profiler · 99c65490
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/544
      
      The previous memory profiler diff, D45673764, didn't pick up a config key name change, which causes an attribute-not-found error. This diff fixes it and adds two unit tests (one with GPU, one without) for using the memory profiler in the runner.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D46114730
      
      fbshipit-source-id: d066d435021983d90f4a75e0c88798a3aedcaf92