- 22 Jun, 2023 3 commits
-
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/582 Expose use_orig_params for the FSDP constructor in the d2go config. Read more about it in the docstring of torch.distributed.fsdp.fully_sharded_data_parallel. use_orig_params=False (default) uses FlatParameters to store flattened parameters, which saves memory by avoiding fragmentation. However, use_orig_params=True is essential for models that are partly frozen, because FlatParameters can only accept a uniform requires_grad across the whole model. Reviewed By: wat3rBro Differential Revision: D46917757 fbshipit-source-id: 12ebe83e6de456e37d89eaf8b257f23925a6786d
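The rule described in this commit (partly frozen models need `use_orig_params=True`) can be sketched as a small check. This is an illustration only, not code from the diff: the helper name `has_mixed_requires_grad` and the parameter names are hypothetical, and plain `(name, requires_grad)` pairs stand in for real model parameters so the sketch runs without PyTorch.

```python
from typing import Iterable, Tuple


def has_mixed_requires_grad(params: Iterable[Tuple[str, bool]]) -> bool:
    """Return True if some parameters are trainable and others are frozen."""
    flags = {requires_grad for _, requires_grad in params}
    return len(flags) > 1


# Hypothetical model with a frozen backbone and a trainable head.
partly_frozen = [("backbone.conv1.weight", False), ("head.fc.weight", True)]
# Hypothetical model in which every parameter is trainable.
fully_trainable = [("backbone.conv1.weight", True), ("head.fc.weight", True)]

# Mixed flags mean the flattened-parameter path cannot be used as-is.
use_orig_params_a = has_mixed_requires_grad(partly_frozen)
use_orig_params_b = has_mixed_requires_grad(fully_trainable)
```

With a real model, the pairs could be built from `model.named_parameters()` as `(name, p.requires_grad)`, and the result passed as the `use_orig_params` argument of the FSDP constructor.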
-
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/583 Extend support to MAST for evaluator binary. Reviewed By: miqueljubert Differential Revision: D46762473 fbshipit-source-id: 62ac68f195c89924abf71c9b6a9715d60ffcbf9b
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/580 Reviewed By: ajinkya-deogade Differential Revision: D46875151 fbshipit-source-id: e19d9ac79c0a4ad1b1ab49112e36f80c55062ea4
-
- 21 Jun, 2023 1 commit
-
-
Devin Zhou authored
Summary: This diff enables category and dataset weight balancing at the same time by declaring "WeightedCategoryTrainingSampler" under "SAMPLER_TRAIN" in the config file. X-link: https://github.com/facebookresearch/detectron2/pull/4995 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/570 Reviewed By: jiaxuzhu92, shiyud Differential Revision: D46377371 fbshipit-source-id: 4e8bdf6a7e5d40b04072cb99637d13d85b2e0fce
-
- 19 Jun, 2023 1 commit
-
-
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/579 The current code assumes that training runs on only one node, so that there is always a global rank 0 on each node. This assumption fails in multi-node training, resulting in a key 0 error. Reviewed By: crassirostris Differential Revision: D46841286 fbshipit-source-id: d57919239fa5042de795d74c9c2013b07c9a0a48
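The failure mode in this commit can be reproduced in a few lines. The sketch below is not the actual fix, which the commit message does not show. It assumes per-node state is stored in a dict keyed by global rank, and the helpers `ranks_on_node` and `node_leader` are hypothetical names.

```python
from typing import Dict, List


def ranks_on_node(node_index: int, procs_per_node: int) -> List[int]:
    """Global ranks hosted on a node, assuming contiguous rank assignment."""
    start = node_index * procs_per_node
    return list(range(start, start + procs_per_node))


def node_leader(ranks: List[int]) -> int:
    """Pick the smallest global rank on the node instead of hard-coding 0."""
    return min(ranks)


# Per-node state on the second node (node_index=1) of an 8-GPU-per-node job.
node1_state: Dict[int, str] = {
    rank: f"worker-{rank}" for rank in ranks_on_node(1, 8)
}

# The single-node assumption: look up global rank 0 on every node.
try:
    node1_state[0]
    lookup_failed = False
except KeyError:
    # Global rank 0 lives on node 0 only, so this lookup fails on node 1.
    lookup_failed = True

# A lookup that does not assume rank 0 is present on this node.
leader_state = node1_state[node_leader(list(node1_state))]
```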
-
- 16 Jun, 2023 2 commits
-
-
Miquel Jubert Hermoso authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/577 Reviewed By: seijiyamamoto Differential Revision: D46798443 fbshipit-source-id: 21e66cc26d98e866d34c92fa86b26b977c02925d
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/575 ez Reviewed By: ajinkya-deogade Differential Revision: D46773836 fbshipit-source-id: 8cbfbfac6a60cab26ee1975ce0b876738711c160
-
- 14 Jun, 2023 1 commit
-
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/573 Enable Activation Checkpointing from PyTorch Distributed in d2go. Reviewed By: rohan-varma Differential Revision: D45681009 fbshipit-source-id: c03f27af61e0374b9e5991d82070edbe41edde6d
-
- 13 Jun, 2023 2 commits
-
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/574 Currently, the d2go runner doesn't delete the checkpoint after loading. This is fine if we run with `resume=True`, because all the model/optimizer/ema state in the checkpoint will be loaded into the corresponding training components. However, in the case of `resume=False`, only the model state will be loaded, and the optimizer/ema state will be left in memory until the end of training. This could potentially cause OOM if the checkpoint size is large. This diff deletes the loaded ckpt after use to save memory and avoid potential OOM issues. Reviewed By: tglik Differential Revision: D46674618 fbshipit-source-id: 2b70a8e46c7f2a309f83cc4deefe5d7a14783734
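The memory effect described in this commit can be shown with a small, PyTorch-free sketch. It is not the runner's code: `Blob`, `load_checkpoint`, and `load_model_only` are hypothetical stand-ins, and the example relies on CPython releasing objects once their last reference is dropped.

```python
import gc
import weakref


class Blob:
    """Stand-in for a large tensor; plain objects support weak references."""

    def __init__(self, name: str) -> None:
        self.name = name


def load_checkpoint() -> dict:
    """Hypothetical loader returning model, optimizer, and EMA state."""
    return {"model": Blob("model"), "optimizer": Blob("optimizer"), "ema": Blob("ema")}


def load_model_only(model_holder: dict, checkpoint: dict) -> None:
    """Mimics resume=False: only the model state is consumed."""
    model_holder["state"] = checkpoint["model"]


checkpoint = load_checkpoint()
optimizer_ref = weakref.ref(checkpoint["optimizer"])

model_holder: dict = {}
load_model_only(model_holder, checkpoint)

# The unused optimizer state is still reachable through the checkpoint dict.
still_alive_before = optimizer_ref() is not None

# Dropping the checkpoint after use releases everything that was not consumed.
del checkpoint
gc.collect()
still_alive_after = optimizer_ref() is not None
```

The model state survives because `model_holder` still references it; only the unused optimizer and EMA entries are freed.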
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/572 Reviewed By: ajinkya-deogade Differential Revision: D46664313 fbshipit-source-id: acb1876c92c3907eb185dd144782495bda593d23
-
- 12 Jun, 2023 1 commit
-
-
Yanghan Wang authored
Summary: I think the main issue is that we import `reroute_config_path` from `d2go.config.config` in `__init__.py`, but it's actually in `d2go.config.utils`. After fixing this, the namespace forward also works, see `scripts/wangyanghan/autodeps_testbed/d2go_config/TARGETS` Update all TARGETS: ``` fbgs -l "d2go/config:" | xargs printf -- '/data/sandcastle/boxes/%s\n' | xargs arc lint -a ``` For reviewers, only `.autodeps.toml` and files in `d2go/d2go/config/` and `scripts/wangyanghan/autodeps_testbed/d2go_config/` are manually changed, other files are auto modified. Reviewed By: ajinkya-deogade Differential Revision: D46582416 fbshipit-source-id: 0be0bebedd1aad5b67a746c75db3c6b81bcfecee
-
- 08 Jun, 2023 1 commit
-
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/567 As title. Reviewed By: tglik Differential Revision: D46383823 fbshipit-source-id: b5f80f55eb37ddc4e0918a349840b451f2b4b094
-
- 07 Jun, 2023 1 commit
-
-
Jessica Zhong authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/569 Reviewed By: wat3rBro Differential Revision: D46498855 fbshipit-source-id: 99888f6a36a0f69155c3447cc080392ae9886539
-
- 06 Jun, 2023 1 commit
-
-
Jessica Zhong authored
Reviewed By: wat3rBro Differential Revision: D46460305 fbshipit-source-id: e91d9312c5d81ef1ba64ab169380329c8ad05f7c
-
- 03 Jun, 2023 1 commit
-
-
Jiaxu Zhu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/564 As title, as we need `ai_factory.quantization.convert.convert_eager` for Stinson models. This diff renames `get_convert_fx_fn` to `get_convert_fn` and includes eager mode convert functions as well. Reviewed By: wat3rBro Differential Revision: D46368438 fbshipit-source-id: 5ebea1f05b43b476a14ab1091f6ce39bffe614d3
-
- 02 Jun, 2023 1 commit
-
-
Jessica Zhong authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/566 Reviewed By: wat3rBro Differential Revision: D45829249 fbshipit-source-id: 4e70bed0e85179b49b4e2358be3d937cfbf474d4
-
- 01 Jun, 2023 1 commit
-
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/539 Print out the parameter names in each parameter group to a separate file (instead of writing them to the main log file). This is useful for checking which param group a specific parameter is assigned to. Reviewed By: wat3rBro Differential Revision: D45855436 fbshipit-source-id: 1e1db4cf079802fc20fe3e3d0a931d8c44721d6c
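A minimal sketch of the idea, writing the group-to-parameter assignment to its own file, might look like the following. The function name `dump_param_groups`, the JSON layout, and the `param_names` key on each group are assumptions made for illustration; the commit message does not describe the file format actually used.

```python
import json
import os
import tempfile
from typing import Dict, List


def dump_param_groups(param_groups: List[Dict], path: str) -> List[Dict]:
    """Write one record per parameter group to a separate JSON file."""
    summary = [
        {
            "group": index,
            "lr": group.get("lr"),
            "weight_decay": group.get("weight_decay"),
            # Assumed key: each group carries the names of its parameters.
            "param_names": list(group["param_names"]),
        }
        for index, group in enumerate(param_groups)
    ]
    with open(path, "w") as f:
        json.dump(summary, f, indent=2)
    return summary


# Hypothetical groups: normalization weights without weight decay.
groups = [
    {"lr": 0.01, "weight_decay": 1e-4, "param_names": ["backbone.conv1.weight"]},
    {"lr": 0.01, "weight_decay": 0.0, "param_names": ["backbone.bn1.weight"]},
]

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "param_groups.json")
written = dump_param_groups(groups, out_path)

with open(out_path) as f:
    read_back = json.load(f)
```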
-
- 29 May, 2023 2 commits
-
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/562 Reverting the changes introduced in the diff D46096375 to restore the state before modularization. Reviewed By: tglik Differential Revision: D46145093 fbshipit-source-id: 9897640ec00331fc6ea2817fa46b2272fc33cb8d
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/561 This is the continuation from the part 1 D45912069 where we had not defined the TARGETS for the lightning trainer. As the circular deps have been resolved, we can define the targets for `d2go/trainer/lightning` and move the other TARGETS inside `d2go/trainer`. Reviewed By: tglik Differential Revision: D46096373 fbshipit-source-id: 6efc13eb9ab343d11028fb238e6e3f0c64a03e09
-
- 27 May, 2023 8 commits
-
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/560 This is the continuation from the part 1 D45912077. As the dependencies have been resolved, we can define the targets inside the dir `d2go/utils` Reviewed By: wat3rBro Differential Revision: D46096376 fbshipit-source-id: ab674d382162a4d7e5ee944b2a649e23278ca79f
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/559 Create modular TARGETS for files inside `runner`. Reviewed By: wat3rBro Differential Revision: D45854271 fbshipit-source-id: a15ef475f72685ae8c3c73e0a83cf136a7285d3e
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/558 Presently, `D2GoSharedContext` is only imported in `mobile-vision/d2go/d2go/runner/default_runner.py` for type annotations. Unfortunately, this causes a circular dependency, as the TARGET for `d2go.distributed` does not exist. Remove it temporarily and reintroduce it at the top of the stack once all the TARGETS have been introduced. Reviewed By: wat3rBro Differential Revision: D46096375 fbshipit-source-id: d8633ac755d39b807c18967f35a087178afc9787
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/557 To avoid circular dependencies, move the function `add_distillation_configs` that defines the default config for a `runner` making use of distillation from `mobile-vision/d2go/d2go/modeling/distillation.py` to `mobile-vision/d2go/d2go/runner/config_defaults.py`. Reviewed By: wat3rBro Differential Revision: D46096374 fbshipit-source-id: eb85d91b5239e7ab10809a9bf84c869d05d32401
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/556 The TARGETS for the files inside the directory `checkpoint` are tackled in two parts: 1. This diff creates TARGETS for the files inside `checkpoint` i.e. except `checkpoint/fb/tests` 2. The diff D46096372 creates TARGETS for files inside `checkpoint/fb/tests` Reviewed By: tglik, wat3rBro Differential Revision: D45912080 fbshipit-source-id: 04ab44e015a9d89d18e31c854600df05d35539d1
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/555 The TARGETS for the files inside the directory `trainer` are tackled in two parts. 1. This diff creates TARGETS for the files inside `trainer` i.e. except `trainer/lightning` 2. The diff D46096373 creates TARGETS for files inside `trainer/lightning` Reviewed By: tglik, wat3rBro Differential Revision: D45912069 fbshipit-source-id: 3026250a49978f1b8e7a48aeebe1914d8a0a692b
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/554 Create separate buck TARGET for the files in the directory `data`. Reviewed By: wat3rBro Differential Revision: D45912070 fbshipit-source-id: 6785f623343ac826b01fab4ac187b928462a45dc
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/553 Move `inplace_delegate` from `utils` to `modeling` to break the circular dependency. Reviewed By: tglik, wat3rBro Differential Revision: D45912068 fbshipit-source-id: f9f8b1be866ea4d793f4afcd019f16dec3d2f147
-
- 26 May, 2023 4 commits
-
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/552 This diff breaks down the TARGETS for dir `quantization`. Apart from creating the TARGETS the diff temporarily copies the function `_convert_to_d2` from `d2go/runner/lightning_task.py` to avoid circular dependencies. The change is reverted in the diff D46096373. Reviewed By: tglik Differential Revision: D45912067 fbshipit-source-id: b430b2abd129690f8c56479bb75819940fde4e3b
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/551 The `utils` dir is broken down in two steps to deal with circular dependencies while keeping the diffs atomic. This diff creates TARGETS for the dirs `utils` (except `demo_predictor.py`) and `utils/fb`. The TARGETS for `utils/testing` and `utils/demo_predictor.py` are introduced further down the stack in the diff D46096376. Reviewed By: tglik Differential Revision: D45912077 fbshipit-source-id: fb01969c5f5df97de8afaa24bee8492591059b4d
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/550 Create buck target for the optimizer. Other than creating TARGETS, replaced `from d2go.optimizer import build_optimizer_mapper` with `from d2go.optimizer.build import build_optimizer_mapper` Reviewed By: tglik Differential Revision: D45912075 fbshipit-source-id: e478783a9ec16d4573d6365e5567e8d2ed72eb06
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/549 The `iterate_module_named_parameters` is used by the `optimizer` and `quantization`. Let's move the `iterate_module_named_parameters` to a shared location `utils` to break the circular dependencies for the following diffs in the stack. Reviewed By: tglik Differential Revision: D45912066 fbshipit-source-id: bce5c5db3bbc1866f4da8662f7bd5908bfe30aad
-
- 25 May, 2023 4 commits
-
-
Jiaxu Zhu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/548 As title, by setting
```
SOLVER.DETERMINISTIC = True
SEED = 42  # or other values
```
training results are reproducible. Reviewed By: wat3rBro, rkaarimi Differential Revision: D46174626 fbshipit-source-id: d6665b777376a176bd46a1286c3199ed0da26ae6
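The principle behind a fixed `SEED` can be illustrated with Python's standard `random` module as a stand-in. This is a toy sketch only: `run_training` is a hypothetical function, and the d2go implementation (which this commit message does not show) would also have to control framework-level sources of nondeterminism.

```python
import random
from typing import List


def run_training(seed: int, steps: int = 5) -> List[float]:
    """Toy training loop whose only randomness comes from a seeded RNG."""
    rng = random.Random(seed)
    weight = 0.0
    history = []
    for _ in range(steps):
        # Stand-in for a noisy gradient update.
        weight += rng.uniform(-1.0, 1.0)
        history.append(weight)
    return history


# Same seed, same config: the two runs should match step for step.
run_a = run_training(seed=42)
run_b = run_training(seed=42)
# A different seed gives a different trajectory.
run_c = run_training(seed=43)
```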
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/546 Here we start modularizing the targets. I had to introduce some temporary hacks to break the circular dependency while keeping the diff atomic. There are some TODOs left at the end of the stack that are still WIP. Reviewed By: tglik Differential Revision: D45912076 fbshipit-source-id: 375f579fe749dd4a588908cdca7b76ba68f1048f
-
Ajinkya Deogade authored
Summary: There is an issue with the relative import in the `__init__` file of modeldef that causes tests on GitHub CI to fail. Specifically, the `FBNetV2ModelArch` is not correctly populated. The internal CI does not detect such failures because we use the buck build system. This diff fixes it. Pull Request resolved: https://github.com/facebookresearch/d2go/pull/547 Reviewed By: patricksnape Differential Revision: D46177424 fbshipit-source-id: 06b23b9b221c990cd15a2debff6def8cfb99743b
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/544 The previous diff on the memory profiler, D45673764, didn't pick up a config key name change, which causes an attribute-not-found error. This diff fixes it and adds two unit tests (one with GPU, one without) for using the memory profiler in the runner. Reviewed By: wat3rBro Differential Revision: D46114730 fbshipit-source-id: d066d435021983d90f4a75e0c88798a3aedcaf92
-
- 24 May, 2023 1 commit
-
-
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/545 Expanding the relative imports to absolute ones helps the autodeps down the stack. Reviewed By: tglik Differential Revision: D45912074 fbshipit-source-id: d42c9756dde731504ee6fd0f93cf549d71157489
-
- 22 May, 2023 1 commit
-
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/542 ## Overview Add an option to enable the GPU memory snapshot profiler in d2go. The profiler is natively supported by PyTorch and is able to record stack traces associated with all CUDA memory allocation/free events, allowing users to understand which parts of the code contribute to the memory bottleneck. It also provides a powerful interactive web tool to visualize memory utilization ordered by time. Each colored block represents an allocated CUDA memory block. Users can click on a block to see the corresponding Python stack trace that allocates the block. ## d2go integration This diff integrates the profiler as a hook controlled by config key `USE_MEMORY_PROFILER`. The profiler will log snapshots and web tools to the output directory. There are three places where logging can happen: start of training, during training, and OOM. Please read the docstring of `D2GoGpuMemorySnapshot` for more information. Reviewed By: tglik, jaconey Differential Revision: D45673764 fbshipit-source-id: 8900484a2266d94421fe3ee7a85a4dea3a9f6b72
-
- 19 May, 2023 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/543 The previous implementation: > the problem is the ContextDecorator somehow swallows the exception in the wrapped function and just returns None. This diff adds a test that the previous implementation would fail:
```
======================================================================
FAIL: test_log_interval_error_prop (d2go.tests.fb.test_utils_logging.TestUtilsLogging)
Make sure the log_interval can handle error propagation.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/mobile-vision/d2go/tests/__init_tests__/init_tests#link-tree/d2go/tests/fb/test_utils_logging.py", line 152, in test_log_interval_error_prop
    foo(-1)
AssertionError: ValueError not raised
----------------------------------------------------------------------
Ran 1 test in 0.098s
```
The new version seems easier to understand and doesn't have the error swallowing. Reviewed By: jaconey Differential Revision: D46009938 fbshipit-source-id: 6b632deb513ab47c4d760f796bf49fc45eae3005
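The symptom described in this commit (an exception disappears and the wrapped function returns None) matches what `contextlib.ContextDecorator` does when `__exit__` returns a truthy value. The commit message does not say what caused it in the old `log_interval`, so the sketch below only shows one common way this behavior arises. The class names `swallowing_timer` and `propagating_timer` are hypothetical.

```python
from contextlib import ContextDecorator


class swallowing_timer(ContextDecorator):
    """Buggy variant: a truthy return from __exit__ suppresses exceptions."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return True  # suppresses any exception raised in the wrapped call


class propagating_timer(ContextDecorator):
    """Fixed variant: a falsy return lets exceptions propagate."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


@swallowing_timer()
def foo_swallowed(x: int) -> int:
    if x < 0:
        raise ValueError("x must be non-negative")
    return x


@propagating_timer()
def foo_propagated(x: int) -> int:
    if x < 0:
        raise ValueError("x must be non-negative")
    return x


# The error vanishes and the caller silently receives None.
swallowed_result = foo_swallowed(-1)

# The error reaches the caller, as the test in the commit expects.
try:
    foo_propagated(-1)
    error_propagated = False
except ValueError:
    error_propagated = True
```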
-
- 18 May, 2023 1 commit
-
-
Jiaxu Zhu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/541 The issue post https://fb.workplace.com/groups/277527419809135/permalink/1303604910534709/ The fix was suggested by the MV folks. Reviewed By: dilinwang820, wat3rBro Differential Revision: D45881863 fbshipit-source-id: b33345c4230067b78f27e7deb038c095d55f1360
-
- 16 May, 2023 1 commit
-
-
Jiaxu Zhu authored
Summary: X-link: https://github.com/facebookresearch/detectron2/pull/4955 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/540 Allow users to launch deterministic training jobs. That is, using the same training config, users can get identical training results. Reviewed By: dilinwang820 Differential Revision: D45370627 fbshipit-source-id: 88db388c992500b0d789b8341952502cd1f8f995
-