- 28 Sep, 2022 1 commit
-
-
Artsiom Sanakoyeu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/374 AMP trained with mixed precision is implemented for the Native d2go Runner, but not for Lightning Tasks. Now we pass params SOLVER.AMP* and SOLVER.CLIP_GRADIENTS* to the lightning Trainer as well. Reviewed By: wat3rBro Differential Revision: D39798007 fbshipit-source-id: e48560a91d37c21c56d953eed141876d8c759329
-
- 09 Aug, 2022 1 commit
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/357 This change makes it possible to unpickle TrainNetOutput which is currently cannot be unpickled because it's a part of main module which can be different for the binary that's unpickling this dataclass. Reviewed By: miqueljubert Differential Revision: D38536040 fbshipit-source-id: 856594251b2eca7630d69c7917bc4746859dab9f
-
- 22 Jul, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/340 Reviewed By: miqueljubert Differential Revision: D37968017 fbshipit-source-id: a3953fdbb2c48ceaffcf94df081c0b3253d247d5
-
- 30 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/319 follow up on D37500599 (https://github.com/facebookresearch/d2go/commit/668b7ac29b0afb55d5923e72fe4f6428e5c85cbd), move lightning_train_net part of D37367360 to this diff. Reviewed By: sstsai-adl Differential Revision: D37534370 fbshipit-source-id: 7f48942a14ce16a9a9540b189441b540ce4f4b25
-
- 29 Jun, 2022 1 commit
-
-
Sam Tsai authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/317 1. Add eval-only only option in similar fashion with train_net 2. Use output_dir from config is not specified via command line Reviewed By: wenliangzhao2018 Differential Revision: D37500599 fbshipit-source-id: 00c5804d08a449def3cc15fff49e27066d01f229
-
- 24 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/312 As discussed, we decided to not use runner instance outside of `main`, previous diffs already solved the prerequisites, this diff mainly does the renaming. - Use runner name (str) in the fblearner, ML pipeline. - Use runner name (str) in FBL operator, MAST and binary operator. - Use runner class as the interface of main, it can be either the name of class (str) or actual class. The main usage should be using `str`, so that the importing of class happens inside `main`. But it's also a common use case to import runner class and call `main` for things like ad-hoc scripts or tests, supporting actual class makes it easier modify code for those cases (eg. some local test class doesn't have a name, so it's not feasible to use runner name). Reviewed By: newstzpz Differential Revision: D37060338 fbshipit-source-id: 879852d41902b87d6db6cb9d7b3e8dc55dc4b976
-
- 16 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/237 Reviewed By: tglik Differential Revision: D35954531 fbshipit-source-id: b69c8065928fe385d29f20f2c2460d60d63fca00
-
- 14 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/285 `setup_after_launch` can now take `DefaultTask` as well (the `runner_or_task` can still be `None`, for runner-less train_net). Reviewed By: tglik Differential Revision: D37011560 fbshipit-source-id: ce8a88242df0a16de8da97d94e8eb7def524c69c
-
- 09 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/274 X-link: https://github.com/facebookresearch/mobile-vision/pull/76 TLDR: this diff consolidate the `distributed_helper` of `mobile_cv`, it (together with `mobile_cv`'s `comm` module) should be the TOGO library for dealing with DDP. D2 (https://github.com/facebookresearch/d2go/commit/87374efb134e539090e0b5c476809dc35bf6aedb)Go's `distributed` is now built on-top of `mobile_cv`'s `distributed_helper`. Reviewed By: newstzpz Differential Revision: D36787336 fbshipit-source-id: 640c9dcff5eec534e7894c75cfdf0a12d21c297e
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best-effort to move associated comments to match merged elements, but there are known limitations due to the diynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402205 fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366
-
- 14 May, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/242 Reviewed By: newstzpz Differential Revision: D36297282 fbshipit-source-id: 8efb19b3186f6978283f4e17e0628b55c2ec816e
-
- 10 Mar, 2022 1 commit
-
-
Haroun Habeeb authored
Summary: see https://fb.workplace.com/notes/3006074566389155 ---- did the integration test not catch this? Reviewed By: ananthsub, tangbinh Differential Revision: D34665501 fbshipit-source-id: ff2cbfa9462f131455dce46a0c413c4c69105f48
-
- 08 Jan, 2022 1 commit
-
-
Binh Tang authored
Summary: ### New commit log messages 4eede7c30 Add deprecation path for renamed training type plugins (#11227) Reviewed By: edward-io, daniellepintz Differential Revision: D33409991 fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0
-
- 06 Jan, 2022 1 commit
-
-
Binh Tang authored
Summary: ### New commit log messages b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142) Reviewed By: jjenniferdai Differential Revision: D33259306 fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0
-
- 29 Dec, 2021 1 commit
-
-
Yanghan Wang authored
Summary: DDPPlugin has been renamed to DDPStrategy (as part of https://github.com/PyTorchLightning/pytorch-lightning/issues/10549), causing oss CI to fail. Simply skipping the import to unblock CI since DDP feature is not used in test. Reviewed By: kazhang Differential Revision: D33351636 fbshipit-source-id: 7a1881c8cd48d9ff17edd41137d27a976103fdde
-
- 24 Sep, 2021 1 commit
-
-
Lei Tian authored
Summary: deprecate terminate_on_nan in pytorch lightning's default trainer config Reviewed By: kazhang, wat3rBro Differential Revision: D30910709 fbshipit-source-id: cb22c1f5f1cf3a3236333f21be87756d3f657f78
-
- 09 Sep, 2021 1 commit
-
-
Yanghan Wang authored
Summary: https://fb.workplace.com/groups/pythonfoundation/posts/2990917737888352 Remove `mobile-vision` from opt-out list; leaving `mobile-vision/SNPE` opted out because of 3rd-party code. arc lint --take BLACK --apply-patches --paths-cmd 'hg files mobile-vision' allow-large-files Reviewed By: sstsai-adl Differential Revision: D30721093 fbshipit-source-id: 9e5c16d988b315b93a28038443ecfb92efd18ef8
-
- 07 Jul, 2021 1 commit
-
-
Daniel Li (AI) authored
Summary: Set find_unused_parameters according to DDP_FIND_UNUSED_PARAMETERS with DDPPlugin Reviewed By: kazhang Differential Revision: D29567013 fbshipit-source-id: f3ffac566a2ff046f55e692b3b24f9531913d4d4
-
- 07 Jun, 2021 1 commit
-
-
Kai Zhang authored
Summary: Detectron2 and D2 (https://github.com/facebookresearch/d2go/commit/81ab967feb650145d3a5904f20fdddd28be83445)Go use custom sampler, we don't need Lightning to add distributed sampler. Reviewed By: ananthsub Differential Revision: D28921092 fbshipit-source-id: ec8f310d0590ed92227935b979d59a06d7fb7a69
-
- 25 May, 2021 2 commits
-
-
Kai Zhang authored
Summary: Currently we are checking if MODEL.DEVICE is "gpu", but actually we DEVICE could also be "cuda". This diff checks if device is "cpu" instead. Reviewed By: wat3rBro Differential Revision: D28689547 fbshipit-source-id: 7512d32b7c08b0dcdc6487c6c2f1703655e64b19
-
Kai Zhang authored
Summary: Currently when launching a training flow, we read number of processes from resources.num_gpus. To be backward compatible with existing D2 (https://github.com/facebookresearch/d2go/commit/f82d44d3c33e6c781a3c6f2b27b376fdfbaeda53)Go training config, this diff changes to dist_config.num_processes_per_machine instead. Reviewed By: wat3rBro Differential Revision: D28630334 fbshipit-source-id: 3c684cd56e5d2e247c7b82e1d1eeff0f39e59ee4
-
- 13 May, 2021 1 commit
-
-
Kai Zhang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/62 Lightning trainer set max step to cfg.SOLVER.MAX_ITER. However, this is the max iteration for all nodes, in multi-node training, we need to scale it down, as well as eval period and other configs. This diff calls `auto_scale_world_size` before passing the config to trainer. Reviewed By: wat3rBro Differential Revision: D28140877 fbshipit-source-id: 2639ae58773a4ec2a0cc59dfefb2f5d9b1afe1a8
-
- 17 Apr, 2021 2 commits
-
-
Kai Zhang authored
Summary: Delegate FX quantization callback's customization to model. Reviewed By: wat3rBro Differential Revision: D27669212 fbshipit-source-id: 2715546cf03134896da6f95ecddaf8503ff95d0b
-
Kai Zhang authored
Summary: As per title and sanity test E2E QAT workflow on Lightning Trainer. - add `post_training_opts`. This is required to use `all_steps_qat.json` with Lightning. We don't actually support the post_training_opts in this diff though - we leave it part of T83437359. - Update .yaml to specify the Quantize-able modules. - Update `lightning_train_net.py` to use the QuantizationAwareTraining callback. Reviewed By: kandluis Differential Revision: D26304879 fbshipit-source-id: 948bef4817d385d8a0969e4990d7f17ecd6994b7
-
- 09 Apr, 2021 1 commit
-
-
Ananth Subramaniam authored
Summary: Before: this test would assume only 2 checkpoints were stored: `last.ckpt`, and `FINAL_MODEL_CKPT` Now: this test asserts that at least these 2 checkpoints are stored. In case the config specifies `save_top_k=-1` for instance, we'd save more checkpoints, causing this test to fail Since this test is only loading the last and the final outputs, I'm changing the behavior to assert that these checkpoints must be saved and ignoring other checkpoint files that could be generated. Reviewed By: kazhang Differential Revision: D27671284 fbshipit-source-id: 0419fb46856d048e7b6eba3ff1dc65b7280a9a90
-
- 24 Mar, 2021 2 commits
-
-
Kai Zhang authored
Summary: Evaluate the predictor generated by previous step. This diff modify the lightning_train_net to reuse the evaluation logic by adding a `predictor_path` param. This diff also makes Lightning training backend depends on `cfg.MODEL.DEVICE` so that in evaluate_predictor step, user could set backend by changing model device. This is useful for evaluating int8 quantized model. Reviewed By: newstzpz Differential Revision: D27150609 fbshipit-source-id: fb72da3e81db932c0fa479350150720143e09a3e
-
Kai Zhang authored
Summary: Given that the way to create D2 (https://github.com/facebookresearch/d2go/commit/465cdb842513eb910aa20fcedea1d2edd15dc7b7)go runner and Lightning task are different, get_class was introduced so that in application we could do: ``` if is Lightning: task_cls = get_class(classname) task = task_cls(cfg) else: runner = create_runner(classname) ``` It turns out that we could need to do that in many places: workflow, binaries. This diff revert `get_class` and return class in `create_runner` if the class is a Lightning module. Reviewed By: newstzpz Differential Revision: D26676595 fbshipit-source-id: c3ce2016d09fe073af4c2dd9f98eea4e59ca621b
-
- 03 Mar, 2021 2 commits
-
-
Kai Zhang authored
Summary: As titled. The OSS version only use PyTorch Lightning while internal version leverages some features(e.g. Manifold integration, every_n_step checkpointing). This diff splits train_net.main into smaller functions so that they could be shared across OSS and internal versions. Reviewed By: zhanghang1989 Differential Revision: D26752701 fbshipit-source-id: 7f68e2a81e78193e117517a0ff668ab14b76ea65
-
facebook-github-bot authored
fbshipit-source-id: f4a8ba78691d8cf46e003ef0bd2e95f170932778
-