1. 22 Jun, 2023 3 commits
    • expose use_orig_params to d2go config · 7f17bbf0
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/582
      
      Expose use_orig_params for FSDP constructor to d2go config. Read more about it in the docstring of torch.distributed.fsdp.fully_sharded_data_parallel.
      
      use_orig_params=False (default) uses FlatParameters to store flattened parameters, which saves memory by avoiding fragmentation. However, use_orig_params=True is essential for models that are partly frozen, because FlatParameters can only accept a uniform requires_grad across the whole model.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D46917757
      
      fbshipit-source-id: 12ebe83e6de456e37d89eaf8b257f23925a6786d
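A minimal sketch of the decision described in the commit above: set `use_orig_params=True` only when a model mixes frozen and trainable parameters. The helper names (`has_mixed_requires_grad`, `choose_use_orig_params`) and the `FakeParam` stand-in are hypothetical and not part of d2go or PyTorch; they only illustrate the check.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasRequiresGrad(Protocol):
    """Anything exposing a boolean requires_grad attribute."""

    requires_grad: bool


def has_mixed_requires_grad(params: Iterable[HasRequiresGrad]) -> bool:
    """Return True if some parameters are frozen and others are trainable."""
    flags = {bool(p.requires_grad) for p in params}
    return len(flags) > 1


def choose_use_orig_params(params: Iterable[HasRequiresGrad]) -> bool:
    # Hypothetical policy: keep the default (False) unless the model is
    # partly frozen, in which case use_orig_params=True is needed.
    return has_mixed_requires_grad(params)


@dataclass
class FakeParam:
    # Stand-in for torch.nn.Parameter so the sketch runs without PyTorch.
    requires_grad: bool


partly_frozen = [FakeParam(False), FakeParam(True)]
fully_trainable = [FakeParam(True), FakeParam(True)]

print(choose_use_orig_params(partly_frozen))    # True
print(choose_use_orig_params(fully_trainable))  # False
```

With a real model, the same check could be run over `model.parameters()` and the result forwarded as the `use_orig_params` argument of the FSDP constructor.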
    • Add MAST support for eval · 60b6995d
      Francisc Bungiu authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/583
      
      Extend support to MAST for evaluator binary.
      
      Reviewed By: miqueljubert
      
      Differential Revision: D46762473
      
      fbshipit-source-id: 62ac68f195c89924abf71c9b6a9715d60ffcbf9b
    • clean up all __init__.py · 955e53f6
      Yanghan Wang authored
      Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/580
      
      Reviewed By: ajinkya-deogade
      
      Differential Revision: D46875151
      
      fbshipit-source-id: e19d9ac79c0a4ad1b1ab49112e36f80c55062ea4
  2. 21 Jun, 2023 1 commit
  3. 19 Jun, 2023 1 commit
  4. 16 Jun, 2023 2 commits
  5. 14 Jun, 2023 1 commit
  6. 13 Jun, 2023 2 commits
    • delete loaded ckpt after use to save memory · 3fce52cf
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/574
      
      Currently, the d2go runner doesn't delete the checkpoint after loading. This is fine if we run with `resume=True`, because all the model/optimizer/ema state in the checkpoint is loaded into the corresponding training components. However, with `resume=False`, only the model state is loaded, and the optimizer/ema state is left in memory until the end of training. This could cause OOM if the checkpoint is large.
      
      This diff deletes the loaded ckpt after use to save memory and avoid potential OOM issues.
      
      Reviewed By: tglik
      
      Differential Revision: D46674618
      
      fbshipit-source-id: 2b70a8e46c7f2a309f83cc4deefe5d7a14783734
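The idea in the commit above can be sketched as follows: copy the needed state out of the loaded checkpoint, then drop the last reference so the unused optimizer/ema state can be freed. The checkpoint layout, the `State` class, and the function names are assumptions made for illustration; the actual d2go runner code is not shown in this log.

```python
import gc
import weakref


class State(dict):
    """dict subclass so the demo can take weak references to its values."""


def load_checkpoint():
    # Hypothetical checkpoint layout; the keys are illustrative only.
    return {
        "model": State(w=[1.0, 2.0]),
        "optimizer": State(momentum=[0.0] * 1000),
        "ema_state": State(w=[1.0, 2.0]),
    }


def load_model_weights_only():
    """Mimic resume=False: use only the model state, then free the rest."""
    checkpoint = load_checkpoint()
    # Weak reference used only to observe whether the optimizer state is freed.
    optimizer_ref = weakref.ref(checkpoint["optimizer"])
    model_state = dict(checkpoint["model"])  # copy out what is needed
    del checkpoint  # drop the last reference to the full checkpoint
    gc.collect()    # not strictly required in CPython, but makes intent clear
    return model_state, optimizer_ref


model_state, optimizer_ref = load_model_weights_only()
print(model_state)              # {'w': [1.0, 2.0]}
print(optimizer_ref() is None)  # True: optimizer state was released
```

Without the `del`, a checkpoint held by a long-lived object (for example an attribute on a runner) would keep the optimizer and ema state alive for the whole training run.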
    • move detectron2 related .autodeps.toml to detectron2 · a879c1b4
      Yanghan Wang authored
      Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/572
      
      Reviewed By: ajinkya-deogade
      
      Differential Revision: D46664313
      
      fbshipit-source-id: acb1876c92c3907eb185dd144782495bda593d23
  7. 12 Jun, 2023 1 commit
    • fix d2go.config · bcad53f6
      Yanghan Wang authored
      Summary:
      I think the main issue is that we import `reroute_config_path` from `d2go.config.config` in `__init__.py`, but it's actually in `d2go.config.utils`. After fixing this, the namespace forward also works, see `scripts/wangyanghan/autodeps_testbed/d2go_config/TARGETS`
      
      Update all TARGETS:
      ```
      fbgs -l "d2go/config:" | xargs printf -- '/data/sandcastle/boxes/%s\n' | xargs arc lint -a
      ```
      
      For reviewers: only `.autodeps.toml` and the files in `d2go/d2go/config/` and `scripts/wangyanghan/autodeps_testbed/d2go_config/` are manually changed; all other files are auto-modified.
      
      Reviewed By: ajinkya-deogade
      
      Differential Revision: D46582416
      
      fbshipit-source-id: 0be0bebedd1aad5b67a746c75db3c6b81bcfecee
  8. 08 Jun, 2023 1 commit
  9. 07 Jun, 2023 1 commit
  10. 06 Jun, 2023 1 commit
  11. 03 Jun, 2023 1 commit
  12. 02 Jun, 2023 1 commit
  13. 01 Jun, 2023 1 commit
  14. 29 May, 2023 2 commits
  15. 27 May, 2023 8 commits
  16. 26 May, 2023 4 commits
    • Quantization: create a separate buck target · 1581776b
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/552
      
      This diff breaks down the TARGETS for dir `quantization`.
      Apart from creating the TARGETS, the diff temporarily copies the function `_convert_to_d2` from `d2go/runner/lightning_task.py` to avoid circular dependencies. The change is reverted in diff D46096373.
      
      Reviewed By: tglik
      
      Differential Revision: D45912067
      
      fbshipit-source-id: b430b2abd129690f8c56479bb75819940fde4e3b
    • Utils part 1: create a separate buck target · 77dfafa2
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/551
      
      The `utils` dir is broken down in two steps to deal with circular dependencies while keeping the diffs atomic. This diff creates TARGETS for the dirs `utils` (except `demo_predictor.py`) and `utils/fb`. The TARGETS for `utils/testing` and `utils/demo_predictor.py` are introduced further down the stack in diff D46096376.
      
      Reviewed By: tglik
      
      Differential Revision: D45912077
      
      fbshipit-source-id: fb01969c5f5df97de8afaa24bee8492591059b4d
    • Optimizer: create a separate buck target · bc1939eb
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/550
      
      Create buck target for the optimizer.
      Other than creating TARGETS, replaced `from d2go.optimizer import build_optimizer_mapper` with `from d2go.optimizer.build import build_optimizer_mapper`
      
      Reviewed By: tglik
      
      Differential Revision: D45912075
      
      fbshipit-source-id: e478783a9ec16d4573d6365e5567e8d2ed72eb06
    • Move iterate_module_named_parameters to utils · 1950242a
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/549
      
      `iterate_module_named_parameters` is used by both `optimizer` and `quantization`.
      This diff moves it to the shared location `utils` to break the circular dependencies for the following diffs in the stack.
      
      Reviewed By: tglik
      
      Differential Revision: D45912066
      
      fbshipit-source-id: bce5c5db3bbc1866f4da8662f7bd5908bfe30aad
  17. 25 May, 2023 4 commits
    • Generic Reproducibility · edcdb731
      Jiaxu Zhu authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/548
      
      As the title says, setting
      ```
      SOLVER.DETERMINISTIC = True
      SEED = 42 # or other values
      ```
      makes training results reproducible.
      
      Reviewed By: wat3rBro, rkaarimi
      
      Differential Revision: D46174626
      
      fbshipit-source-id: d6665b777376a176bd46a1286c3199ed0da26ae6
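A minimal sketch of what config-driven seeding means in practice, using only the standard-library RNG. The config keys `SEED` and `SOLVER.DETERMINISTIC` come from the commit message above; `seed_from_config`, `fake_training_run`, and the `SimpleNamespace` config are hypothetical stand-ins, and this is not the d2go implementation. A real setup would presumably also need to seed NumPy and PyTorch and enable deterministic kernels.

```python
import random
from types import SimpleNamespace


def seed_from_config(cfg) -> None:
    """Seed the stdlib RNG when the config asks for deterministic runs."""
    if cfg.SOLVER.DETERMINISTIC:
        random.seed(cfg.SEED)


def fake_training_run(cfg, steps: int = 3):
    # Stand-in for training: the "result" is just a few random draws.
    seed_from_config(cfg)
    return [random.random() for _ in range(steps)]


# Hypothetical config object mirroring the two keys from the commit message.
cfg = SimpleNamespace(SEED=42, SOLVER=SimpleNamespace(DETERMINISTIC=True))

first = fake_training_run(cfg)
second = fake_training_run(cfg)
print(first == second)  # True: same seed, same sequence
```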
    • Config and Registry: create a separate buck target · 1accd414
      Ajinkya Deogade authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/546
      
      Here we start modularizing the targets. I had to introduce some temporary hacks to break the circular dependency while keeping the diff atomic. There are some TODOs left at the end of the stack that are still WIP.
      
      Reviewed By: tglik
      
      Differential Revision: D45912076
      
      fbshipit-source-id: 375f579fe749dd4a588908cdca7b76ba68f1048f
    • Resolve relative import for modeldef · 34823153
      Ajinkya Deogade authored
      Summary:
      There is an issue with the relative import in the `__init__` file of modeldef that causes tests on GitHub CI to fail.
      Specifically, the `FBNetV2ModelArch` is not correctly populated.
      The internal CI does not detect such failures because we use the buck build system.
      This diff fixes it.
      
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/547
      
      Reviewed By: patricksnape
      
      Differential Revision: D46177424
      
      fbshipit-source-id: 06b23b9b221c990cd15a2debff6def8cfb99743b
    • fix attribute mismatch for memory profiler · 99c65490
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/544
      
      The previous memory profiler diff, D45673764, didn't pick up a config key name change, which causes an attribute-not-found error. This diff fixes it and adds two unit tests (one with GPU, one without) for using the memory profiler in the runner.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D46114730
      
      fbshipit-source-id: d066d435021983d90f4a75e0c88798a3aedcaf92
  18. 24 May, 2023 1 commit
  19. 22 May, 2023 1 commit
    • Add a GPU memory snapshot profiler in d2go · 20e18edc
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/542
      
      ## Overview
      Add an option to enable the GPU memory snapshot profiler in d2go. The profiler is natively supported by PyTorch and can record the stack traces associated with all CUDA memory allocation/free events, allowing users to understand which parts of the code contribute to the memory bottleneck. It also provides a powerful interactive web tool to visualize memory utilization ordered by time:
      {F978609840}
      Each colored block represents an allocated CUDA memory block. Users can click on a block to see the corresponding Python stack trace that allocated it.
      
      ## d2go integration
      This diff integrates the profiler as a hook controlled by the config key `USE_MEMORY_PROFILER`. The profiler logs snapshots and web tools to the output directory. Logging can happen at three points: at the start of training, during training, and on OOM. Please read the docstring of `D2GoGpuMemorySnapshot` for more information.
      
      Reviewed By: tglik, jaconey
      
      Differential Revision: D45673764
      
      fbshipit-source-id: 8900484a2266d94421fe3ee7a85a4dea3a9f6b72
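The three logging points named in the commit above (start of training, during training, and OOM) can be sketched as a training hook. Everything here is hypothetical: `SnapshotHookSketch`, its method names, the `dump` callback, and the use of `MemoryError` as a stand-in for a CUDA out-of-memory error are illustrative assumptions and do not reflect the actual `D2GoGpuMemorySnapshot` class or any PyTorch API.

```python
from typing import Callable, List


class SnapshotHookSketch:
    """Calls `dump(tag)` at the start of training, periodically, and on OOM."""

    def __init__(self, dump: Callable[[str], None], log_every_n_iters: int):
        self.dump = dump
        self.log_every_n_iters = log_every_n_iters

    def before_train(self) -> None:
        self.dump("start")

    def after_step(self, iteration: int) -> None:
        n = self.log_every_n_iters
        if n > 0 and (iteration + 1) % n == 0:
            self.dump(f"iter{iteration}")

    def on_oom(self) -> None:
        self.dump("oom")


def train(hook: SnapshotHookSketch, num_iters: int, step: Callable[[int], None]) -> None:
    hook.before_train()
    for i in range(num_iters):
        try:
            step(i)
        except MemoryError:  # stand-in for a CUDA OOM error
            hook.on_oom()
            raise
        hook.after_step(i)


events: List[str] = []


def flaky_step(i: int) -> None:
    # Simulate running out of memory on the fourth iteration.
    if i == 3:
        raise MemoryError("simulated OOM")


try:
    train(SnapshotHookSketch(events.append, log_every_n_iters=2), 10, flaky_step)
except MemoryError:
    pass

print(events)  # ['start', 'iter1', 'oom']
```

In a real integration the `dump` callback would write a memory snapshot to the output directory instead of appending a tag to a list.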
  20. 19 May, 2023 1 commit
    • another implementation of log_interval · 876c6756
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/543
      
      The previous implementation:
      > the problem is the ContextDecorator somehow swallows the exception in the wrapped function and just returns None.
      
      This diff adds a test that the previous implementation would fail:
      ```
      ======================================================================
      FAIL: test_log_interval_error_prop (d2go.tests.fb.test_utils_logging.TestUtilsLogging)
      Make sure the log_interval can handle error propagation.
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/mobile-vision/d2go/tests/__init_tests__/init_tests#link-tree/d2go/tests/fb/test_utils_logging.py", line 152, in test_log_interval_error_prop
          foo(-1)
      AssertionError: ValueError not raised
      
      ----------------------------------------------------------------------
      Ran 1 test in 0.098s
      ```
      
      The new version seems easier to understand and doesn't have the error swallowing.
      
      Reviewed By: jaconey
      
      Differential Revision: D46009938
      
      fbshipit-source-id: 6b632deb513ab47c4d760f796bf49fc45eae3005
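One way the symptom described above (the exception disappears and the wrapped function returns `None`) can arise with `contextlib.ContextDecorator` is an `__exit__` that returns a truthy value. The sketch below demonstrates that general Python behavior; it is not the d2go `log_interval` code, and the class and function names are hypothetical.

```python
from contextlib import ContextDecorator


class SwallowingContext(ContextDecorator):
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # A truthy return value tells Python to suppress the exception.
        return True


class PropagatingContext(ContextDecorator):
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # A falsy return value lets the exception propagate to the caller.
        return False


def check(x):
    if x < 0:
        raise ValueError("negative input")
    return x


swallowed = SwallowingContext()(check)
propagated = PropagatingContext()(check)

print(swallowed(-1))  # None: the ValueError was silently swallowed

try:
    propagated(-1)
except ValueError as err:
    print("raised:", err)  # raised: negative input
```

A test in the spirit of `test_log_interval_error_prop` would assert that calling the decorated function with a bad input raises `ValueError`, which fails for the swallowing variant and passes for the propagating one.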
  21. 18 May, 2023 1 commit
  22. 16 May, 2023 1 commit