- 22 Jul, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/340 Reviewed By: miqueljubert Differential Revision: D37968017 fbshipit-source-id: a3953fdbb2c48ceaffcf94df081c0b3253d247d5
-
- 30 Jun, 2022 2 commits
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/320 MCV/D2 (https://github.com/facebookresearch/d2go/commit/87374efb134e539090e0b5c476809dc35bf6aedb)Go's `launch` now supports `kwargs`, which matches elastic launch. Let's always use `args=(cfg, output_dir, runner_name)` for all the binaries, and use `kwargs` for remaining binary arguments (which matches the `extra_args` in FBL's OperatorArgument). Reviewed By: sstsai-adl Differential Revision: D37535145 fbshipit-source-id: 9767e8d71421d2262aee1fd4b9019758aa4a6bbd
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/319 follow up on D37500599 (https://github.com/facebookresearch/d2go/commit/668b7ac29b0afb55d5923e72fe4f6428e5c85cbd), move lightning_train_net part of D37367360 to this diff. Reviewed By: sstsai-adl Differential Revision: D37534370 fbshipit-source-id: 7f48942a14ce16a9a9540b189441b540ce4f4b25
-
- 29 Jun, 2022 1 commit
-
-
Sam Tsai authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/317 1. Add eval-only only option in similar fashion with train_net 2. Use output_dir from config is not specified via command line Reviewed By: wenliangzhao2018 Differential Revision: D37500599 fbshipit-source-id: 00c5804d08a449def3cc15fff49e27066d01f229
-
- 24 Jun, 2022 2 commits
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/309 Right now multiple machines can try to write to the same output file, since they get the same argument. Additionally, on the same machine, several outputs can be saved which requires unncessary unpacking. This change makes train_net only write output of the rank 0 trainer. Reviewed By: wat3rBro Differential Revision: D37310084 fbshipit-source-id: 9d5352a274e8fb1d2043393b12896d402333c17b
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/312 As discussed, we decided to not use runner instance outside of `main`, previous diffs already solved the prerequisites, this diff mainly does the renaming. - Use runner name (str) in the fblearner, ML pipeline. - Use runner name (str) in FBL operator, MAST and binary operator. - Use runner class as the interface of main, it can be either the name of class (str) or actual class. The main usage should be using `str`, so that the importing of class happens inside `main`. But it's also a common use case to import runner class and call `main` for things like ad-hoc scripts or tests, supporting actual class makes it easier modify code for those cases (eg. some local test class doesn't have a name, so it's not feasible to use runner name). Reviewed By: newstzpz Differential Revision: D37060338 fbshipit-source-id: 879852d41902b87d6db6cb9d7b3e8dc55dc4b976
-
- 18 Jun, 2022 2 commits
-
-
Tsahi Glik authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/297 X-link: https://github.com/facebookresearch/mobile-vision/pull/84 Add command line arg to specify whether and where to save results. This is useful where binaries are being launched from another process, or remotely on another machine. Reviewed By: wat3rBro Differential Revision: D37157955 fbshipit-source-id: 2a48cf967f6cf928049f2be41952834e1dd2a04d
-
Tsahi Glik authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/302 Fixing issue introduced in D35035813 (https://github.com/facebookresearch/d2go/commit/744d72d73b7103b8dd9ca69372a179b44ad7d733) that break the OSS cli tools defined in https://github.com/facebookresearch/d2go/blob/8098d160c0b38b796a2c164719650a50238a0f89/setup.py#L87-L92. The cli alias in setup need a function without any args to call. So creating a new main_cli function Reviewed By: wat3rBro Differential Revision: D37210948 fbshipit-source-id: efb3df15e9933c617414a727e5b53553db170622
-
- 16 Jun, 2022 2 commits
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/237 Reviewed By: tglik Differential Revision: D35954531 fbshipit-source-id: b69c8065928fe385d29f20f2c2460d60d63fca00
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/301 This is a follow-up of earlier work to extract part responsible for the centrally defined parameters from the helper in train_net closer to where the parameters are defined. Reviewed By: tglik Differential Revision: D37176212 fbshipit-source-id: 226415f36f4872ac3d9ba41541b4389a18cc11e6
-
- 15 Jun, 2022 1 commit
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/290 When running through torchx, converting from arguments to CLI arguments is necessary. Reviewed By: wat3rBro Differential Revision: D37086938 fbshipit-source-id: d17c4e36bece8eb02955263181789b71e3483a40
-
- 14 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/285 `setup_after_launch` can now take `DefaultTask` as well (the `runner_or_task` can still be `None`, for runner-less train_net). Reviewed By: tglik Differential Revision: D37011560 fbshipit-source-id: ce8a88242df0a16de8da97d94e8eb7def524c69c
-
- 09 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/274 X-link: https://github.com/facebookresearch/mobile-vision/pull/76 TLDR: this diff consolidate the `distributed_helper` of `mobile_cv`, it (together with `mobile_cv`'s `comm` module) should be the TOGO library for dealing with DDP. D2 (https://github.com/facebookresearch/d2go/commit/87374efb134e539090e0b5c476809dc35bf6aedb)Go's `distributed` is now built on-top of `mobile_cv`'s `distributed_helper`. Reviewed By: newstzpz Differential Revision: D36787336 fbshipit-source-id: 640c9dcff5eec534e7894c75cfdf0a12d21c297e
-
- 02 Jun, 2022 1 commit
-
-
Miquel Jubert Hermoso authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/238 *This diff is part of a stack which has the goal of "buckifying" D2 (https://github.com/facebookresearch/d2go/commit/87374efb134e539090e0b5c476809dc35bf6aedb)Go core and enabling autodeps and other tooling. The last diff in the stack introduces the TARGETS. The diffs earlier in the stack are resolving circular dependencies and other issues which prevent the buckification from occurring.* Following the comments in an abandoned diff, split the export code into two files, which will have their corresponding dependencies: exporter and api. api.py contains the components which have little dependencies, so it can be imported basically anywhere without circular dependencies. exporter.py contains the utilities, which are use for export operations, for example in the exporter binary. Reviewed By: mcimpoi Differential Revision: D36166603 fbshipit-source-id: 25ded0b3925464c05be4048472a4c2ddcdb17ecf
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best-effort to move associated comments to match merged elements, but there are known limitations due to the diynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402205 fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366
-
- 14 May, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/242 Reviewed By: newstzpz Differential Revision: D36297282 fbshipit-source-id: 8efb19b3186f6978283f4e17e0628b55c2ec816e
-
- 24 Mar, 2022 1 commit
-
-
Tsahi Glik authored
Summary: Tweak exporter and evaluator cli entry point func to support calling it as a module with args from custom launching code. Reviewed By: sstsai-adl Differential Revision: D35035813 fbshipit-source-id: c8b24099e94ccc58c184f8aac95b2a24a137e86a
-
- 10 Mar, 2022 1 commit
-
-
Haroun Habeeb authored
Summary: see https://fb.workplace.com/notes/3006074566389155 ---- did the integration test not catch this? Reviewed By: ananthsub, tangbinh Differential Revision: D34665501 fbshipit-source-id: ff2cbfa9462f131455dce46a0c413c4c69105f48
-
- 05 Mar, 2022 1 commit
-
-
Yanghan Wang authored
Summary: fix D34540275 (https://github.com/facebookresearch/d2go/commit/d8bdc633ec66e6ce73076d027f8e777791c2e067) Reviewed By: tglik Differential Revision: D34662745 fbshipit-source-id: 6fd67db041fab6f5810763702e4cc3f16a08c5df
-
- 03 Mar, 2022 1 commit
-
-
Tsahi Glik authored
Summary: Add support in d2go.distributed for `env://` init method. Use env variables as specified in https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization for initialized distributed params. Also change train_net cli function signature to accept args list instead of only using `sys.argv`. To allow calling this function from AIEnv launcher. Differential Revision: D34540275 fbshipit-source-id: 7f718aed4c010b0ac8347d43b5ca5b401210756c
-
- 14 Feb, 2022 1 commit
-
-
Tugrul Savran authored
Summary: Currently, the exporter method takes in a compare_accuracy parameter, which after all the compute (exporting etc.) raises an exception if it is set to True. This looks like an antipattern, and causes a waste of compute. Therefore, I am proposing to raise the exception at the very beginning of method call to let the client know in advance that this argument's functionality isn't implemented yet. NOTE: We might also choose to get rid of the entire parameter. I am open for suggestions. Differential Revision: D34186578 fbshipit-source-id: d7fbe7589dfe2d2f688b870885ca61e6829c9329
-
- 08 Jan, 2022 1 commit
-
-
Binh Tang authored
Summary: ### New commit log messages 4eede7c30 Add deprecation path for renamed training type plugins (#11227) Reviewed By: edward-io, daniellepintz Differential Revision: D33409991 fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0
-
- 06 Jan, 2022 1 commit
-
-
Binh Tang authored
Summary: ### New commit log messages b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142) Reviewed By: jjenniferdai Differential Revision: D33259306 fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0
-
- 29 Dec, 2021 1 commit
-
-
Yanghan Wang authored
Summary: DDPPlugin has been renamed to DDPStrategy (as part of https://github.com/PyTorchLightning/pytorch-lightning/issues/10549), causing oss CI to fail. Simply skipping the import to unblock CI since DDP feature is not used in test. Reviewed By: kazhang Differential Revision: D33351636 fbshipit-source-id: 7a1881c8cd48d9ff17edd41137d27a976103fdde
-
- 25 Nov, 2021 1 commit
-
-
Yuxin Wu authored
Summary: make it an option Differential Revision: D32601981 fbshipit-source-id: 308a0c49939531d840914aa8e256aae6db463929
-
- 24 Sep, 2021 1 commit
-
-
Lei Tian authored
Summary: deprecate terminate_on_nan in pytorch lightning's default trainer config Reviewed By: kazhang, wat3rBro Differential Revision: D30910709 fbshipit-source-id: cb22c1f5f1cf3a3236333f21be87756d3f657f78
-
- 18 Sep, 2021 1 commit
-
-
Yuxin Wu authored
Differential Revision: D30973518 fbshipit-source-id: fbdfb862ab23d5141553499471f92d2218addf91
-
- 09 Sep, 2021 1 commit
-
-
Yanghan Wang authored
Summary: https://fb.workplace.com/groups/pythonfoundation/posts/2990917737888352 Remove `mobile-vision` from opt-out list; leaving `mobile-vision/SNPE` opted out because of 3rd-party code. arc lint --take BLACK --apply-patches --paths-cmd 'hg files mobile-vision' allow-large-files Reviewed By: sstsai-adl Differential Revision: D30721093 fbshipit-source-id: 9e5c16d988b315b93a28038443ecfb92efd18ef8
-
- 09 Jul, 2021 1 commit
-
-
Mircea Cimpoi authored
Summary: Added predictor_type `boltnn_int8` to export to BoltNN via torch delegate. - `int8` needs to be in the name, otherwise the post-train quantization won't happen; ``` cfg.QUANTIZATION.BACKEND = "qnnpack" // cfg.QUANTIZATION.CUSTOM_QSCHEME = "per_tensor_affine" ``` Seems that ` QUANTIZATION.CUSTOM_QSCHEME per_tensor_affine` is not needed - likely covered by "qnnpack". Reviewed By: wat3rBro Differential Revision: D29106043 fbshipit-source-id: 865ac5af86919fe7b4530b48433a1bd11e295bf4
-
- 07 Jul, 2021 1 commit
-
-
Daniel Li (AI) authored
Summary: Set find_unused_parameters according to DDP_FIND_UNUSED_PARAMETERS with DDPPlugin Reviewed By: kazhang Differential Revision: D29567013 fbshipit-source-id: f3ffac566a2ff046f55e692b3b24f9531913d4d4
-
- 09 Jun, 2021 1 commit
-
-
Sam Tsai authored
Summary: Use all training dataset for export instead of just first. This is to support use cases where there is only a small amount of images per jsons but a number of jsons. Since calibration uses the first dataset, it is limited by the number of images in a single dataset. Reviewed By: ppwwyyxx Differential Revision: D28902673 fbshipit-source-id: f80146b02d2d1bc04703fbb21ef410f5e26ba64c
-
- 07 Jun, 2021 1 commit
-
-
Kai Zhang authored
Summary: Detectron2 and D2 (https://github.com/facebookresearch/d2go/commit/81ab967feb650145d3a5904f20fdddd28be83445)Go use custom sampler, we don't need Lightning to add distributed sampler. Reviewed By: ananthsub Differential Revision: D28921092 fbshipit-source-id: ec8f310d0590ed92227935b979d59a06d7fb7a69
-
- 25 May, 2021 2 commits
-
-
Kai Zhang authored
Summary: Currently we are checking if MODEL.DEVICE is "gpu", but actually we DEVICE could also be "cuda". This diff checks if device is "cpu" instead. Reviewed By: wat3rBro Differential Revision: D28689547 fbshipit-source-id: 7512d32b7c08b0dcdc6487c6c2f1703655e64b19
-
Kai Zhang authored
Summary: Currently when launching a training flow, we read number of processes from resources.num_gpus. To be backward compatible with existing D2 (https://github.com/facebookresearch/d2go/commit/f82d44d3c33e6c781a3c6f2b27b376fdfbaeda53)Go training config, this diff changes to dist_config.num_processes_per_machine instead. Reviewed By: wat3rBro Differential Revision: D28630334 fbshipit-source-id: 3c684cd56e5d2e247c7b82e1d1eeff0f39e59ee4
-
- 22 May, 2021 2 commits
-
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/70 DDP supports an fp16_compress_hook which compresses the gradient to FP16 before communication. This can result in a significant speed up. Add one argument `_C.MODEL.DDP_FP16_GRAD_COMPRESS` to trigger it. Reviewed By: zhanghang1989 Differential Revision: D28467701 fbshipit-source-id: 3c80865222f48eb8fe6947ea972448c445ee3ef3
-
Yanghan Wang authored
Differential Revision: D27881742 (https://github.com/facebookresearch/d2go/commit/90aff5daf608473dd312b300db8615326fa40a37) Original commit changeset: 34a3ab7a88f4 fbshipit-source-id: 42c03b4f2b69c656b26774a4665b84b832262650
-
- 21 May, 2021 1 commit
-
-
Sanjeev Kumar authored
Summary: - Enable sdk inference config specification in export step. This enables adding the sdk configuration as part of model file in the export step. The sdk config can be specified as infernece_config.yaml and is zipped together with torchscript model. The main goal of sdk configuration is to control the model inference behavior with model. - SDK inference config design doc: https://docs.google.com/document/d/1j5qx8IrnFg1DJFzTnu4W8WmXFYJ-AgCDfSQHb2ACJsk/edit - One click fblearner pipeline is in next diff on the stack Differential Revision: D27881742 fbshipit-source-id: 34a3ab7a88f456b74841cf671ea1b3f678cdb733
-
- 13 May, 2021 1 commit
-
-
Kai Zhang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/62 Lightning trainer set max step to cfg.SOLVER.MAX_ITER. However, this is the max iteration for all nodes, in multi-node training, we need to scale it down, as well as eval period and other configs. This diff calls `auto_scale_world_size` before passing the config to trainer. Reviewed By: wat3rBro Differential Revision: D28140877 fbshipit-source-id: 2639ae58773a4ec2a0cc59dfefb2f5d9b1afe1a8
-
- 17 Apr, 2021 2 commits
-
-
Kai Zhang authored
Summary: Delegate FX quantization callback's customization to model. Reviewed By: wat3rBro Differential Revision: D27669212 fbshipit-source-id: 2715546cf03134896da6f95ecddaf8503ff95d0b
-
Kai Zhang authored
Summary: As per title and sanity test E2E QAT workflow on Lightning Trainer. - add `post_training_opts`. This is required to use `all_steps_qat.json` with Lightning. We don't actually support the post_training_opts in this diff though - we leave it part of T83437359. - Update .yaml to specify the Quantize-able modules. - Update `lightning_train_net.py` to use the QuantizationAwareTraining callback. Reviewed By: kandluis Differential Revision: D26304879 fbshipit-source-id: 948bef4817d385d8a0969e4990d7f17ecd6994b7
-