- 12 Jul, 2023 1 commit
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/591 We previously added reply files for train_net, but not the other relevant binaries with MAST support: evaluator and lightning. Adding support here by extracting the common bits into a separate module and wrapping the functions to reuse the functionality. Differential Revision: D47293689 fbshipit-source-id: 70630a471c0cf037d180c9edfb57a4db4fdf7bde
- 05 Jul, 2023 1 commit
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/589 Allow attaching GPU profiler to lightning d2go tasks. Reviewed By: miqueljubert Differential Revision: D47190798 fbshipit-source-id: b10269d25de6b5f977633796e77b0d6d912a873a
- 28 Jun, 2023 2 commits
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/588 Enable autodeps for the d2go test to unblock the next diff. In the future, we may break it into smaller pieces so that tests build and run faster. Reviewed By: ajinkya-deogade Differential Revision: D47080563 fbshipit-source-id: 9d8ee2a13f91a34c79aa13f2b8165c615643b87d
Francisc Bungiu authored
Summary: Deprecate prepare_fb_model_for_eval(). Reviewed By: miqueljubert Differential Revision: D47085783 fbshipit-source-id: 34b7e822e9baa1f9f77a11d3497df7fb0463c955
- 26 Jun, 2023 1 commit
Ayushi Dalmia authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/586 Adding additional parameters for observers Reviewed By: navsud Differential Revision: D46136523 fbshipit-source-id: ce44d4cdfcd4ef8524f85eb148ee789137fa8abf
- 23 Jun, 2023 4 commits
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/585 Disable FSDP mixed precision for model buffers. Buffers are usually small, so enabling mixed precision for them yields very limited performance gain. Moreover, components such as BatchNorm layers and applications such as diffusion models are very sensitive to the precision of buffers. Thus, we keep buffers in full precision in FSDP. Reviewed By: wat3rBro Differential Revision: D46951673 fbshipit-source-id: 12bb1a47fbd8b3dd85c7f781bab707206044af15
Zhicheng Yan authored
Summary: When registering AdhocCOCODataset, INJECTED_COCO_DATASETS_LUT needs to be updated as well. For example, if a dataset uses a custom registering function, that function can only be retrieved from INJECTED_COCO_DATASETS_LUT; otherwise, the default registering function is used, as in the `register_dataset_split` branch. Reviewed By: antonrigner Differential Revision: D46826507 fbshipit-source-id: 9170c5b57f3935875b899ab7f93c3c57e77eb28c
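To make the lookup concrete, here is a minimal sketch that assumes a plain dict stands in for `INJECTED_COCO_DATASETS_LUT` and that a registering function takes only a dataset name. `default_register`, `get_register_fn`, `register_adhoc_dataset`, and `custom_register` are hypothetical names, not d2go's actual API.

```python
from typing import Callable, Dict

# Hypothetical stand-in for INJECTED_COCO_DATASETS_LUT:
# dataset name -> function used to register that dataset.
INJECTED_COCO_DATASETS_LUT: Dict[str, Callable[[str], str]] = {}


def default_register(name: str) -> str:
    # Stand-in for the default path (the `register_dataset_split` branch).
    return f"default:{name}"


def get_register_fn(name: str) -> Callable[[str], str]:
    # A custom registering function is only reachable through the LUT;
    # any dataset missing from it falls back to the default function.
    return INJECTED_COCO_DATASETS_LUT.get(name, default_register)


def register_adhoc_dataset(adhoc_name: str, src_name: str) -> None:
    # When an ad-hoc dataset is derived from `src_name`, also record the
    # source's registering function in the LUT under the new name.
    INJECTED_COCO_DATASETS_LUT[adhoc_name] = get_register_fn(src_name)


# Usage: a source dataset with a custom registering function.
def custom_register(name: str) -> str:
    return f"custom:{name}"


INJECTED_COCO_DATASETS_LUT["my_coco"] = custom_register
register_adhoc_dataset("my_coco_adhoc", "my_coco")
print(get_register_fn("my_coco_adhoc")("my_coco_adhoc"))  # custom:my_coco_adhoc
```

Without the LUT update in `register_adhoc_dataset`, the ad-hoc copy would silently fall back to `default_register`.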
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/578 # Problem: d2go EMA uses `named_parameters()` to traverse model states and save EMA checkpoints, while using `state_dict()` to save model checkpoints. This is a brittle practice because `named_parameters()` and `state_dict()` go through two different sets of Python APIs and can return different names. In the case of Activation Checkpointing (AC), we don't want the AC wrapper to affect checkpoint names. Thus, PyTorch overrides `state_dict()` to remove the prefix "_checkpoint_wrapped_module" from the FQN. However, `named_parameters()` has no such support, so the prefix remains. If we change the AC wrapping strategy (very common during optimization), we will not be able to load the previous EMA state back into the model. The same problem also happens with FSDP. # Short-term hack: This diff adds a short-term hack that manually removes the AC prefix in EMA. We can expand `IGNORED_FQN_PREFIX` to support more use cases. Reviewed By: wat3rBro Differential Revision: D46815031 fbshipit-source-id: 29b6ea444ed2ef90b8741fccdcb2b62625933e7f
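The short-term hack amounts to normalizing names before they are used as EMA keys. A minimal sketch, assuming `IGNORED_FQN_PREFIX` is a sequence of substrings to strip (its real type and contents in d2go are not shown here) and using floats in place of tensors; `normalize_fqn` and `ema_state_from_named_params` are hypothetical helpers.

```python
from typing import Dict, Iterable, Tuple

# Substrings to drop from fully qualified names (FQNs). The name mirrors the
# commit message; the exact contents used by d2go are an assumption here.
IGNORED_FQN_PREFIX: Tuple[str, ...] = ("_checkpoint_wrapped_module.",)


def normalize_fqn(name: str, ignored: Iterable[str] = IGNORED_FQN_PREFIX) -> str:
    # Strip every occurrence of each ignored prefix, since AC wrappers can
    # appear at any depth of the module tree.
    for prefix in ignored:
        name = name.replace(prefix, "")
    return name


def ema_state_from_named_params(
    named_params: Iterable[Tuple[str, float]]
) -> Dict[str, float]:
    # Floats stand in for tensors; after normalization the keys match what
    # `state_dict()` reports for the same AC-wrapped model.
    return {normalize_fqn(name): value for name, value in named_params}


named = [
    ("backbone._checkpoint_wrapped_module.conv.weight", 0.5),
    ("head.bias", 0.1),
]
ema_state = ema_state_from_named_params(named)
print(sorted(ema_state))  # ['backbone.conv.weight', 'head.bias']
```

Because the keys no longer depend on where AC wrappers sit, an EMA state saved under one wrapping strategy can be loaded under another.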
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/581 Reviewed By: wat3rBro Differential Revision: D46913792 fbshipit-source-id: cf3c3812c455091fbf63842443644d2571976017
- 22 Jun, 2023 3 commits
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/582 Expose use_orig_params for the FSDP constructor in the d2go config. Read more about it in the docstring of torch.distributed.fsdp.fully_sharded_data_parallel. use_orig_params=False (default) uses FlatParameters to store flattened parameters, which saves memory by avoiding fragmentation. However, use_orig_params=True is essential for models that are partly frozen, because a FlatParameter only accepts a uniform requires_grad across all the parameters it flattens. Reviewed By: wat3rBro Differential Revision: D46917757 fbshipit-source-id: 12ebe83e6de456e37d89eaf8b257f23925a6786d
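The requires_grad constraint can be phrased as a simple check. This sketch assumes all parameters passed in are flattened into the same FlatParameter (in practice FSDP builds one per wrapped unit); `needs_use_orig_params` is a hypothetical helper, not part of d2go or PyTorch.

```python
from typing import Iterable, Tuple


def needs_use_orig_params(params: Iterable[Tuple[str, bool]]) -> bool:
    """Return True if (name, requires_grad) pairs mix trainable and frozen params.

    Hypothetical helper: with use_orig_params=False, the group would be
    flattened into one FlatParameter, which needs a single requires_grad value.
    """
    flags = {requires_grad for _, requires_grad in params}
    return len(flags) > 1


# Fully trainable model: the default FlatParameter path works.
print(needs_use_orig_params([("backbone.w", True), ("head.w", True)]))   # False
# Frozen backbone with a trainable head: needs use_orig_params=True.
print(needs_use_orig_params([("backbone.w", False), ("head.w", True)]))  # True
```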
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/583 Extend support to MAST for evaluator binary. Reviewed By: miqueljubert Differential Revision: D46762473 fbshipit-source-id: 62ac68f195c89924abf71c9b6a9715d60ffcbf9b
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/580 Reviewed By: ajinkya-deogade Differential Revision: D46875151 fbshipit-source-id: e19d9ac79c0a4ad1b1ab49112e36f80c55062ea4
- 21 Jun, 2023 1 commit
Devin Zhou authored
Summary: This diff enables category and dataset weight balancing at the same time, by setting "SAMPLER_TRAIN" to "WeightedCategoryTrainingSampler" in the config file. X-link: https://github.com/facebookresearch/detectron2/pull/4995 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/570 Reviewed By: jiaxuzhu92, shiyud Differential Revision: D46377371 fbshipit-source-id: 4e8bdf6a7e5d40b04072cb99637d13d85b2e0fce
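One way to picture combining the two schemes is through per-image repeat factors. The sketch below uses the category-frequency rule from detectron2's `RepeatFactorTrainingSampler` and, as an assumption, multiplies it by a per-dataset weight; it is not the actual `WeightedCategoryTrainingSampler` implementation, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def category_repeat_factors(
    image_categories: Sequence[Sequence[int]], threshold: float
) -> List[float]:
    # LVIS-style rule (as in detectron2's RepeatFactorTrainingSampler):
    # f(c) = fraction of images containing category c,
    # r(c) = max(1, sqrt(threshold / f(c))), and each image takes the
    # largest r(c) among its categories (1.0 if it has none).
    num_images = len(image_categories)
    counts = Counter(c for cats in image_categories for c in set(cats))
    cat_rf = {
        c: max(1.0, math.sqrt(threshold / (n / num_images)))
        for c, n in counts.items()
    }
    return [
        max((cat_rf[c] for c in set(cats)), default=1.0)
        for cats in image_categories
    ]


def combined_repeat_factors(
    image_categories: Sequence[Sequence[int]],
    image_datasets: Sequence[str],
    dataset_weights: Dict[str, float],
    threshold: float,
) -> List[float]:
    # Assumption: dataset weights scale the category-based factor
    # multiplicatively, so both kinds of balancing apply at once.
    cat_rf = category_repeat_factors(image_categories, threshold)
    return [rf * dataset_weights[ds] for rf, ds in zip(cat_rf, image_datasets)]


factors = combined_repeat_factors(
    image_categories=[[0], [0], [0], [1]],
    image_datasets=["a", "a", "b", "b"],
    dataset_weights={"a": 1.0, "b": 2.0},
    threshold=0.5,
)
print(factors)  # roughly [1.0, 1.0, 2.0, 2.83]
```

A sampler would then turn these fractional factors into integer repeat counts per epoch; detectron2 does this by stochastically rounding the fractional part.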
- 19 Jun, 2023 1 commit
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/579 The current code assumed that training runs on a single node, so that global rank 0 is always present on each node. This assumption fails in multi-node training, resulting in a `KeyError: 0`. Reviewed By: crassirostris Differential Revision: D46841286 fbshipit-source-id: d57919239fa5042de795d74c9c2013b07c9a0a48
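A hypothetical illustration of this failure mode, assuming per-node results are stored in a dict keyed by global rank and that every node runs the same number of processes; none of these names come from d2go.

```python
from typing import Dict


def collect_node_outputs(node_rank: int, gpus_per_node: int) -> Dict[int, str]:
    # Hypothetical per-node mapping keyed by *global* rank, where
    # global_rank = node_rank * gpus_per_node + local_rank.
    return {
        node_rank * gpus_per_node + local_rank: f"worker{local_rank}"
        for local_rank in range(gpus_per_node)
    }


def node_leader_buggy(outputs: Dict[int, str]) -> str:
    # Assumes global rank 0 exists on every node: true only on node 0,
    # so on any other node this raises KeyError: 0.
    return outputs[0]


def node_leader_fixed(outputs: Dict[int, str], node_rank: int, gpus_per_node: int) -> str:
    # Look up the node's local rank 0, i.e. its lowest global rank.
    return outputs[node_rank * gpus_per_node]


node1 = collect_node_outputs(node_rank=1, gpus_per_node=8)  # keys 8..15
print(node_leader_fixed(node1, node_rank=1, gpus_per_node=8))  # worker0
print(0 in node1)  # False, so node_leader_buggy(node1) would raise KeyError
```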
- 16 Jun, 2023 2 commits
Miquel Jubert Hermoso authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/577 Reviewed By: seijiyamamoto Differential Revision: D46798443 fbshipit-source-id: 21e66cc26d98e866d34c92fa86b26b977c02925d
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/575 ez Reviewed By: ajinkya-deogade Differential Revision: D46773836 fbshipit-source-id: 8cbfbfac6a60cab26ee1975ce0b876738711c160
- 14 Jun, 2023 1 commit
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/573 Enable Activation Checkpointing from PyTorch Distributed in d2go. Reviewed By: rohan-varma Differential Revision: D45681009 fbshipit-source-id: c03f27af61e0374b9e5991d82070edbe41edde6d
- 13 Jun, 2023 2 commits
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/574 Currently, the d2go runner doesn't delete the checkpoint after loading it. This is fine if we run with `resume=True`, because all the model/optimizer/EMA state in the checkpoint is loaded into the corresponding training components. However, with `resume=False`, only the model state is loaded, and the optimizer/EMA state is left in memory until the end of training. This could cause OOM if the checkpoint is large. This diff deletes the loaded checkpoint after use to save memory and avoid potential OOM issues. Reviewed By: tglik Differential Revision: D46674618 fbshipit-source-id: 2b70a8e46c7f2a309f83cc4deefe5d7a14783734
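The memory effect can be shown without PyTorch. A minimal sketch, assuming the checkpoint is a dict with `model` and `optimizer` entries and using a plain `Blob` class as a stand-in for large tensors; `run_training`, `load_checkpoint`, and `train_loop` are hypothetical and not d2go's runner API.

```python
import gc
import weakref
from typing import Any, Callable, Dict, List, Tuple


class Blob:
    """Stand-in for a large tensor (hypothetical; real state holds torch tensors)."""


def run_training(
    load_checkpoint: Callable[[], Dict[str, Any]],
    resume: bool,
    train_loop: Callable[[Dict[str, Any]], Any],
) -> Any:
    checkpoint = load_checkpoint()
    if resume:
        # resume=True: model, optimizer and EMA state are all consumed.
        state = dict(checkpoint)
    else:
        # resume=False: only the model weights are used.
        state = {"model": checkpoint["model"]}
    # Drop the last reference to the loaded checkpoint before the long-running
    # loop, so unused optimizer/EMA state can be freed now instead of at the end.
    del checkpoint
    gc.collect()
    return train_loop(state)


optimizer_refs: List[Any] = []  # weak references to each loaded optimizer state


def load_checkpoint() -> Dict[str, Any]:
    optimizer_state = Blob()
    optimizer_refs.append(weakref.ref(optimizer_state))
    return {"model": Blob(), "optimizer": optimizer_state}


def train_loop(state: Dict[str, Any]) -> Tuple[List[str], bool]:
    # Report which state is kept, and whether the optimizer state was freed.
    return sorted(state), optimizer_refs[-1]() is None


print(run_training(load_checkpoint, resume=False, train_loop=train_loop))
# (['model'], True)
```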
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/572 Reviewed By: ajinkya-deogade Differential Revision: D46664313 fbshipit-source-id: acb1876c92c3907eb185dd144782495bda593d23
- 12 Jun, 2023 1 commit
Yanghan Wang authored
Summary: I think the main issue is that we import `reroute_config_path` from `d2go.config.config` in `__init__.py`, but it's actually in `d2go.config.utils`. After fixing this, the namespace forward also works; see `scripts/wangyanghan/autodeps_testbed/d2go_config/TARGETS`. Update all TARGETS:
```
fbgs -l "d2go/config:" | xargs printf -- '/data/sandcastle/boxes/%s\n' | xargs arc lint -a
```
For reviewers, only `.autodeps.toml` and files in `d2go/d2go/config/` and `scripts/wangyanghan/autodeps_testbed/d2go_config/` are manually changed; other files are auto-modified. Reviewed By: ajinkya-deogade Differential Revision: D46582416 fbshipit-source-id: 0be0bebedd1aad5b67a746c75db3c6b81bcfecee
- 08 Jun, 2023 1 commit
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/567 As title. Reviewed By: tglik Differential Revision: D46383823 fbshipit-source-id: b5f80f55eb37ddc4e0918a349840b451f2b4b094
- 07 Jun, 2023 1 commit
Jessica Zhong authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/569 Reviewed By: wat3rBro Differential Revision: D46498855 fbshipit-source-id: 99888f6a36a0f69155c3447cc080392ae9886539
- 06 Jun, 2023 1 commit
Jessica Zhong authored
Reviewed By: wat3rBro Differential Revision: D46460305 fbshipit-source-id: e91d9312c5d81ef1ba64ab169380329c8ad05f7c
- 03 Jun, 2023 1 commit
Jiaxu Zhu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/564 As the title says: we need `ai_factory.quantization.convert.convert_eager` for Stinson models. This diff renames `get_convert_fx_fn` to `get_convert_fn` and includes eager-mode convert functions as well. Reviewed By: wat3rBro Differential Revision: D46368438 fbshipit-source-id: 5ebea1f05b43b476a14ab1091f6ce39bffe614d3
- 02 Jun, 2023 1 commit
Jessica Zhong authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/566 Reviewed By: wat3rBro Differential Revision: D45829249 fbshipit-source-id: 4e70bed0e85179b49b4e2358be3d937cfbf474d4
- 01 Jun, 2023 1 commit
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/539 Print out the parameter names in each parameter group to a separate file (instead of writing them to the main log file). This is useful for knowing which parameters are assigned to which param group. Reviewed By: wat3rBro Differential Revision: D45855436 fbshipit-source-id: 1e1db4cf079802fc20fe3e3d0a931d8c44721d6c
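A minimal sketch of such a dump, assuming param groups follow the `torch.optim` convention of dicts with a `"params"` list plus per-group hyperparameters. The output format, the file name, and `write_param_group_names` are hypothetical, and plain `object()` instances stand in for tensors.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, List, Tuple


def write_param_group_names(
    param_groups: List[Dict[str, Any]],
    named_params: Iterable[Tuple[str, Any]],
    path: str,
) -> None:
    # Optimizer param groups hold parameter objects, not names, so map
    # object identity back to the name reported by the model.
    id_to_name = {id(p): name for name, p in named_params}
    with open(path, "w") as f:
        for i, group in enumerate(param_groups):
            # Everything except "params" is a per-group hyperparameter.
            hparams = {k: v for k, v in group.items() if k != "params"}
            f.write(f"group {i} {json.dumps(hparams, sort_keys=True)}\n")
            for p in group["params"]:
                f.write(f"  {id_to_name.get(id(p), '<unnamed>')}\n")


weight, bias = object(), object()  # stand-ins for parameter tensors
groups = [
    {"params": [weight], "lr": 0.1},
    {"params": [bias], "lr": 0.2, "weight_decay": 0.0},
]
out_path = os.path.join(tempfile.mkdtemp(), "param_groups.txt")
write_param_group_names(groups, [("conv.weight", weight), ("conv.bias", bias)], out_path)
with open(out_path) as f:
    print(f.read())
```

Keeping this in its own file leaves the main log readable while still recording the exact parameter-to-group assignment.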
- 29 May, 2023 2 commits
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/562 Reverting the changes introduced in the diff D46096375 to restore the state before modularization. Reviewed By: tglik Differential Revision: D46145093 fbshipit-source-id: 9897640ec00331fc6ea2817fa46b2272fc33cb8d
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/561 This is the continuation of part 1 (D45912069), where we had not defined the TARGETS for the lightning trainer. Now that the circular deps have been resolved, we can define the targets for `d2go/trainer/lightning` and move the other TARGETS inside `d2go/trainer`. Reviewed By: tglik Differential Revision: D46096373 fbshipit-source-id: 6efc13eb9ab343d11028fb238e6e3f0c64a03e09
- 27 May, 2023 8 commits
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/560 This is the continuation of part 1 (D45912077). Now that the dependencies have been resolved, we can define the targets inside the dir `d2go/utils`. Reviewed By: wat3rBro Differential Revision: D46096376 fbshipit-source-id: ab674d382162a4d7e5ee944b2a649e23278ca79f
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/559 Create modular TARGETS for files inside `runner`. Reviewed By: wat3rBro Differential Revision: D45854271 fbshipit-source-id: a15ef475f72685ae8c3c73e0a83cf136a7285d3e
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/558 Presently, `D2GoSharedContext` is only imported in `mobile-vision/d2go/d2go/runner/default_runner.py` for type annotations. Unfortunately, this causes a circular dependency because the TARGET for `d2go.distributed` does not exist. Remove it temporarily and reintroduce it at the top of the stack once all the TARGETS have been introduced. Reviewed By: wat3rBro Differential Revision: D46096375 fbshipit-source-id: d8633ac755d39b807c18967f35a087178afc9787
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/557 To avoid circular dependencies, move the function `add_distillation_configs` that defines the default config for a `runner` making use of distillation from `mobile-vision/d2go/d2go/modeling/distillation.py` to `mobile-vision/d2go/d2go/runner/config_defaults.py`. Reviewed By: wat3rBro Differential Revision: D46096374 fbshipit-source-id: eb85d91b5239e7ab10809a9bf84c869d05d32401
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/556 The TARGETS for the files inside the directory `checkpoint` are tackled in two parts: 1. This diff creates TARGETS for the files inside `checkpoint`, except `checkpoint/fb/tests`. 2. The diff D46096372 creates TARGETS for the files inside `checkpoint/fb/tests`. Reviewed By: tglik, wat3rBro Differential Revision: D45912080 fbshipit-source-id: 04ab44e015a9d89d18e31c854600df05d35539d1
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/555 The TARGETS for the files inside the directory `trainer` are tackled in two parts: 1. This diff creates TARGETS for the files inside `trainer`, except `trainer/lightning`. 2. The diff D46096373 creates TARGETS for the files inside `trainer/lightning`. Reviewed By: tglik, wat3rBro Differential Revision: D45912069 fbshipit-source-id: 3026250a49978f1b8e7a48aeebe1914d8a0a692b
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/554 Create a separate buck TARGET for the files in the `data` directory. Reviewed By: wat3rBro Differential Revision: D45912070 fbshipit-source-id: 6785f623343ac826b01fab4ac187b928462a45dc
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/553 Move `inplace_delegate` from `utils` to `modeling` to break the circular dependency. Reviewed By: tglik, wat3rBro Differential Revision: D45912068 fbshipit-source-id: f9f8b1be866ea4d793f4afcd019f16dec3d2f147
- 26 May, 2023 4 commits
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/552 This diff breaks down the TARGETS for dir `quantization`. Apart from creating the TARGETS the diff temporarily copies the function `_convert_to_d2` from `d2go/runner/lightning_task.py` to avoid circular dependencies. The change is reverted in the diff D46096373. Reviewed By: tglik Differential Revision: D45912067 fbshipit-source-id: b430b2abd129690f8c56479bb75819940fde4e3b
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/551 The TARGETS for the `utils` dir are created in two steps to deal with circular dependencies while keeping the diffs atomic. This diff creates TARGETS for the dirs `utils` (except `demo_predictor.py`) and `utils/fb`. The TARGETS for `utils/testing` and `utils/demo_predictor.py` are introduced further down the stack in the diff D46096376. Reviewed By: tglik Differential Revision: D45912077 fbshipit-source-id: fb01969c5f5df97de8afaa24bee8492591059b4d
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/550 Create a buck target for the optimizer. In addition to creating TARGETS, this diff replaces `from d2go.optimizer import build_optimizer_mapper` with `from d2go.optimizer.build import build_optimizer_mapper`. Reviewed By: tglik Differential Revision: D45912075 fbshipit-source-id: e478783a9ec16d4573d6365e5567e8d2ed72eb06
Ajinkya Deogade authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/549 The `iterate_module_named_parameters` is used by the `optimizer` and `quantization`. Let's move the `iterate_module_named_parameters` to a shared location `utils` to break the circular dependencies for the following diffs in the stack. Reviewed By: tglik Differential Revision: D45912066 fbshipit-source-id: bce5c5db3bbc1866f4da8662f7bd5908bfe30aad