- 14 Nov, 2022 1 commit
-
-
Miquel Jubert Hermoso authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/388 Reviewed By: wat3rBro Differential Revision: D40377653 fbshipit-source-id: 3f99d30480a801c794665e67bb2b0d28c7c5b0e5
-
- 11 Nov, 2022 1 commit
-
-
Anthony Chen authored
Summary: X-link: https://github.com/facebookresearch/detectron2/pull/4654 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/412 Support custom precision dtype [float16, bfloat16] for AMP training on the D2 backend. There's an old config key `SOLVER.AMP.PRECISION` that only works on the lightning backend. This diff enables this config key on the D2 backend (train_net binary) as well. Reviewed By: tax313, wat3rBro Differential Revision: D40811604 fbshipit-source-id: 58da17ae1519a54243b5295eb4253c297e4d9296
-
- 27 Oct, 2022 1 commit
-
-
Tsahi Glik authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/401 As a follow-up to D40001329 (https://github.com/facebookresearch/d2go/commit/69bf820c64cd0ffb6a84f465199c9134814cf58e): the export runs the main func without launching distributed workers, so it needs to set the shared context explicitly. Reviewed By: wat3rBro Differential Revision: D40708631 fbshipit-source-id: 7689a45dff383ba2cce01d33d3be95d612269fbe
-
- 23 Oct, 2022 1 commit
-
-
Tsahi Glik authored
Summary: X-link: https://github.com/facebookresearch/mobile-vision/pull/116 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/398 D2Go doesn't have a per-node initialization API, only per-worker initialization that happens in each subprocess. Some projects (like IOBT) need a way to do shared initialization before spawning all the workers in subprocesses and to pass this initialized shared context to the workers. This diff adds an API to create a shared context object before launching workers, which the runners inside the workers then use after launch. Reviewed By: wat3rBro Differential Revision: D40001329 fbshipit-source-id: 231a4e7e4da7b5db50849176c58b104c4565306a
-
- 05 Oct, 2022 1 commit
-
-
Artsiom Sanakoyeu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/381 Introduce an extra parameter SOLVER.AMP.PRECISION which can be used to control mixed precision training when the lightning backend is used. The previous value `precision: "mixed"` was wrong and the training failed (See screenshot below) {F777576618} I had to make AMP.PRECISION a string and make sure that it can work with two values: "float16" and "bfloat16". Before feeding it to the Trainer we convert the "float16" string to the integer value 16. Such a workaround was unavoidable because D2Go's config value cannot be of int and str at the same time. Reviewed By: wat3rBro Differential Revision: D40035367 fbshipit-source-id: ed4f615ab29a2258164cbe179a9adba11559d804
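A minimal sketch of the string-to-Trainer conversion this commit describes. The helper name `to_lightning_precision` is hypothetical, and the `"bf16"` value for bfloat16 is an assumption (the commit message only states the `"float16"` to `16` conversion); the actual d2go code may differ.

```python
from typing import Union


def to_lightning_precision(amp_precision: str) -> Union[int, str]:
    """Map the string config value to a value a Trainer could consume.

    Hypothetical helper: "float16" -> 16 follows the commit message;
    "bfloat16" -> "bf16" is an assumption made for illustration.
    """
    mapping = {"float16": 16, "bfloat16": "bf16"}
    if amp_precision not in mapping:
        # Reject values such as "mixed" early instead of failing mid-training.
        raise ValueError(
            f"Unsupported SOLVER.AMP.PRECISION: {amp_precision!r}; "
            f"expected one of {sorted(mapping)}"
        )
    return mapping[amp_precision]
```

Keeping the config value a plain string and converting at the last moment sidesteps the int-or-str typing restriction mentioned in the commit.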
-
- 28 Sep, 2022 1 commit
-
-
Artsiom Sanakoyeu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/374 AMP training with mixed precision is implemented for the native d2go Runner, but not for Lightning Tasks. Now we pass the params SOLVER.AMP* and SOLVER.CLIP_GRADIENTS* to the lightning Trainer as well. Reviewed By: wat3rBro Differential Revision: D39798007 fbshipit-source-id: e48560a91d37c21c56d953eed141876d8c759329
-
- 10 Sep, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/367 EZ Reviewed By: xiecong Differential Revision: D39407416 fbshipit-source-id: d0e6fa09ff926780e98c210bfce955e6b8eec7f6
-
- 09 Aug, 2022 2 commits
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/357 This change makes it possible to unpickle TrainNetOutput, which currently cannot be unpickled because it's part of the main module, which can be different for the binary that's unpickling this dataclass. Reviewed By: miqueljubert Differential Revision: D38536040 fbshipit-source-id: 856594251b2eca7630d69c7917bc4746859dab9f
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/356 Attaching PDB on failure does not work when running in a distributed environment. This change allows disabling this behavior by passing a command line argument. Reviewed By: miqueljubert Differential Revision: D38514736 fbshipit-source-id: 2e0008d6fbc6a4518a605debe67d76f8354364fc
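A rough sketch of how such an opt-out could look. The flag name `--disable-post-mortem` and both function names are hypothetical; the commit message does not say what the real argument is called.

```python
import argparse
import pdb
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name, chosen for illustration only.
    parser.add_argument(
        "--disable-post-mortem",
        action="store_true",
        help="Do not attach PDB when the main function raises.",
    )
    return parser


def run_with_optional_pdb(fn, disable_post_mortem: bool):
    """Run fn(); on failure attach PDB unless disabled, then re-raise."""
    try:
        return fn()
    except Exception:
        if not disable_post_mortem:
            # Interactive debugging: unsuitable for distributed workers.
            pdb.post_mortem(sys.exc_info()[2])
        raise
```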
-
- 28 Jul, 2022 1 commit
-
-
Mircea Cimpoi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/349 This is to allow None, meaning model_configs is not used. Added tasks for the other TODO. Reviewed By: wat3rBro Differential Revision: D38199075 fbshipit-source-id: 774ca42a82a972b7e4c642cc4306aec39e2c2f7f
-
- 27 Jul, 2022 1 commit
-
-
Peizhao Zhang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/278 Allow skipping do_test after do_train. Reviewed By: wat3rBro Differential Revision: D36786790 fbshipit-source-id: 785556b5743ee9af2abfe6c0e9e78c7055697048
-
- 25 Jul, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/343 Reviewed By: miqueljubert Differential Revision: D38077850 fbshipit-source-id: a79541d899ce2b49a30c7f2a81a616f76321026f
-
- 22 Jul, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/340 Reviewed By: miqueljubert Differential Revision: D37968017 fbshipit-source-id: a3953fdbb2c48ceaffcf94df081c0b3253d247d5
-
- 30 Jun, 2022 2 commits
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/320 MCV/D2Go's `launch` now supports `kwargs`, which matches elastic launch. Let's always use `args=(cfg, output_dir, runner_name)` for all the binaries, and use `kwargs` for remaining binary arguments (which matches the `extra_args` in FBL's OperatorArgument). Reviewed By: sstsai-adl Differential Revision: D37535145 fbshipit-source-id: 9767e8d71421d2262aee1fd4b9019758aa4a6bbd
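The calling convention can be sketched with a simplified, single-process stand-in for `launch`. Everything below is hypothetical illustration: `fake_launch` and `binary_main` are not the real `mobile_cv`/D2Go functions, which also handle process spawning and distributed setup.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def fake_launch(
    main_func: Callable[..., Any],
    args: Tuple[Any, ...] = (),
    kwargs: Optional[Dict[str, Any]] = None,
) -> Any:
    """Stand-in for a launcher: forwards args and kwargs to main_func."""
    return main_func(*args, **(kwargs or {}))


def binary_main(cfg, output_dir, runner_name, eval_only=False):
    """Hypothetical binary entry point following the fixed positional triple."""
    return {
        "cfg": cfg,
        "output_dir": output_dir,
        "runner_name": runner_name,
        "eval_only": eval_only,
    }


result = fake_launch(
    binary_main,
    args=({"SOLVER": {}}, "/tmp/out", "my_pkg.MyRunner"),  # common to all binaries
    kwargs={"eval_only": True},  # binary-specific extras
)
```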
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/319 follow up on D37500599 (https://github.com/facebookresearch/d2go/commit/668b7ac29b0afb55d5923e72fe4f6428e5c85cbd), move lightning_train_net part of D37367360 to this diff. Reviewed By: sstsai-adl Differential Revision: D37534370 fbshipit-source-id: 7f48942a14ce16a9a9540b189441b540ce4f4b25
-
- 29 Jun, 2022 1 commit
-
-
Sam Tsai authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/317 1. Add an eval-only option in similar fashion to train_net 2. Use output_dir from config if not specified via command line Reviewed By: wenliangzhao2018 Differential Revision: D37500599 fbshipit-source-id: 00c5804d08a449def3cc15fff49e27066d01f229
-
- 24 Jun, 2022 2 commits
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/309 Right now multiple machines can try to write to the same output file, since they get the same argument. Additionally, on the same machine, several outputs can be saved, which requires unnecessary unpacking. This change makes train_net write only the output of the rank 0 trainer. Reviewed By: wat3rBro Differential Revision: D37310084 fbshipit-source-id: 9d5352a274e8fb1d2043393b12896d402333c17b
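The rank-0-only write can be sketched as follows. The function name and the JSON format are assumptions for illustration; the commit message does not describe how train_net serializes its output.

```python
import json


def save_output_on_rank_zero(rank: int, output: dict, path: str) -> bool:
    """Write output to path only on global rank 0.

    Hypothetical helper: returns True if the file was written. All other
    ranks skip the write, so workers never race on the same file.
    """
    if rank != 0:
        return False
    with open(path, "w") as f:
        json.dump(output, f)
    return True
```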
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/312 As discussed, we decided not to use a runner instance outside of `main`. Previous diffs already solved the prerequisites; this diff mainly does the renaming. - Use runner name (str) in the fblearner, ML pipeline. - Use runner name (str) in FBL operator, MAST and binary operator. - Use runner class as the interface of main, it can be either the name of class (str) or actual class. The main usage should be using `str`, so that the importing of class happens inside `main`. But it's also a common use case to import runner class and call `main` for things like ad-hoc scripts or tests; supporting actual class makes it easier to modify code for those cases (e.g. some local test class doesn't have a name, so it's not feasible to use runner name). Reviewed By: newstzpz Differential Revision: D37060338 fbshipit-source-id: 879852d41902b87d6db6cb9d7b3e8dc55dc4b976
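One way to accept either a class name or an actual class is sketched below. `resolve_runner_class` is a hypothetical helper that assumes a fully qualified `module.ClassName` string; the real D2Go code may resolve names differently (for example through a registry).

```python
import importlib
from typing import Union


def resolve_runner_class(runner: Union[str, type]) -> type:
    """Return the runner class, importing it lazily if given as a string."""
    if isinstance(runner, type):
        # Ad-hoc scripts and tests can pass the class object directly.
        return runner
    module_name, _, class_name = runner.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {runner!r}")
    # The import happens here, i.e. inside main, not at the call site.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```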
-
- 18 Jun, 2022 2 commits
-
-
Tsahi Glik authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/297 X-link: https://github.com/facebookresearch/mobile-vision/pull/84 Add command line arg to specify whether and where to save results. This is useful where binaries are being launched from another process, or remotely on another machine. Reviewed By: wat3rBro Differential Revision: D37157955 fbshipit-source-id: 2a48cf967f6cf928049f2be41952834e1dd2a04d
-
Tsahi Glik authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/302 Fixing an issue introduced in D35035813 (https://github.com/facebookresearch/d2go/commit/744d72d73b7103b8dd9ca69372a179b44ad7d733) that breaks the OSS cli tools defined in https://github.com/facebookresearch/d2go/blob/8098d160c0b38b796a2c164719650a50238a0f89/setup.py#L87-L92. The cli alias in setup needs a function without any args to call, so this creates a new main_cli function. Reviewed By: wat3rBro Differential Revision: D37210948 fbshipit-source-id: efb3df15e9933c617414a727e5b53553db170622
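The pattern can be sketched as a zero-argument wrapper around an args-taking entry point. Only the name `main_cli` comes from the commit message; the body of `main`, the `--output-dir` argument, and the entry-point string in the comment are hypothetical.

```python
import argparse
import sys
from typing import Dict, List


def main(args: List[str]) -> Dict[str, str]:
    """Entry point that takes an explicit args list (callable from other code)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-dir", default="./output")  # illustrative argument
    parsed = parser.parse_args(args)
    return {"output_dir": parsed.output_dir}


def main_cli() -> None:
    """Zero-argument wrapper, as a console_scripts entry point requires.

    A setup.py alias would point here, e.g. (hypothetical names):
        "my-tool = my_pkg.tool:main_cli"
    """
    print(main(sys.argv[1:]))
```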
-
- 16 Jun, 2022 2 commits
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/237 Reviewed By: tglik Differential Revision: D35954531 fbshipit-source-id: b69c8065928fe385d29f20f2c2460d60d63fca00
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/301 This is a follow-up of earlier work to extract part responsible for the centrally defined parameters from the helper in train_net closer to where the parameters are defined. Reviewed By: tglik Differential Revision: D37176212 fbshipit-source-id: 226415f36f4872ac3d9ba41541b4389a18cc11e6
-
- 15 Jun, 2022 1 commit
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/290 When running through torchx, converting from arguments to CLI arguments is necessary. Reviewed By: wat3rBro Differential Revision: D37086938 fbshipit-source-id: d17c4e36bece8eb02955263181789b71e3483a40
-
- 14 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/285 `setup_after_launch` can now take `DefaultTask` as well (the `runner_or_task` can still be `None`, for runner-less train_net). Reviewed By: tglik Differential Revision: D37011560 fbshipit-source-id: ce8a88242df0a16de8da97d94e8eb7def524c69c
-
- 09 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/274 X-link: https://github.com/facebookresearch/mobile-vision/pull/76 TLDR: this diff consolidates the `distributed_helper` of `mobile_cv`; it (together with `mobile_cv`'s `comm` module) should be the go-to library for dealing with DDP. D2Go's `distributed` is now built on top of `mobile_cv`'s `distributed_helper`. Reviewed By: newstzpz Differential Revision: D36787336 fbshipit-source-id: 640c9dcff5eec534e7894c75cfdf0a12d21c297e
-
- 02 Jun, 2022 1 commit
-
-
Miquel Jubert Hermoso authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/238 *This diff is part of a stack which has the goal of "buckifying" D2Go core and enabling autodeps and other tooling. The last diff in the stack introduces the TARGETS. The diffs earlier in the stack are resolving circular dependencies and other issues which prevent the buckification from occurring.* Following the comments in an abandoned diff, split the export code into two files, which will have their corresponding dependencies: exporter and api. api.py contains the components which have few dependencies, so it can be imported basically anywhere without circular dependencies. exporter.py contains the utilities, which are used for export operations, for example in the exporter binary. Reviewed By: mcimpoi Differential Revision: D36166603 fbshipit-source-id: 25ded0b3925464c05be4048472a4c2ddcdb17ecf
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best-effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402205 fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366
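The difference between case-sensitive and case-insensitive ordering can be illustrated with plain Python sorting. This only demonstrates the ordering idea; it is not µsort's implementation, and the sample names are made up.

```python
names = ["Zebra", "frog", "apple", "FROG", "Frog"]

# Default sort compares code points, so uppercase sorts before lowercase
# and the three "frog" variants end up scattered.
case_sensitive = sorted(names)

# Case-insensitive key keeps the variants adjacent; Python's sort is stable,
# so ties keep their original relative order.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)    # ['FROG', 'Frog', 'Zebra', 'apple', 'frog']
print(case_insensitive)  # ['apple', 'frog', 'FROG', 'Frog', 'Zebra']
```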
-
- 14 May, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/242 Reviewed By: newstzpz Differential Revision: D36297282 fbshipit-source-id: 8efb19b3186f6978283f4e17e0628b55c2ec816e
-
- 24 Mar, 2022 1 commit
-
-
Tsahi Glik authored
Summary: Tweak exporter and evaluator cli entry point func to support calling it as a module with args from custom launching code. Reviewed By: sstsai-adl Differential Revision: D35035813 fbshipit-source-id: c8b24099e94ccc58c184f8aac95b2a24a137e86a
-
- 10 Mar, 2022 1 commit
-
-
Haroun Habeeb authored
Summary: see https://fb.workplace.com/notes/3006074566389155 ---- did the integration test not catch this? Reviewed By: ananthsub, tangbinh Differential Revision: D34665501 fbshipit-source-id: ff2cbfa9462f131455dce46a0c413c4c69105f48
-
- 05 Mar, 2022 1 commit
-
-
Yanghan Wang authored
Summary: fix D34540275 (https://github.com/facebookresearch/d2go/commit/d8bdc633ec66e6ce73076d027f8e777791c2e067) Reviewed By: tglik Differential Revision: D34662745 fbshipit-source-id: 6fd67db041fab6f5810763702e4cc3f16a08c5df
-
- 03 Mar, 2022 1 commit
-
-
Tsahi Glik authored
Summary: Add support in d2go.distributed for `env://` init method. Use env variables as specified in https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization to initialize distributed params. Also change the train_net cli function signature to accept an args list instead of only using `sys.argv`, to allow calling this function from the AIEnv launcher. Differential Revision: D34540275 fbshipit-source-id: 7f718aed4c010b0ac8347d43b5ca5b401210756c
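A minimal sketch of reading the distributed parameters from the environment. The variable names `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, and `RANK` are the ones PyTorch documents for `env://` initialization; the dataclass and function names are hypothetical and not taken from d2go.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class EnvDistParams:
    master_addr: str
    master_port: int
    world_size: int
    rank: int


def read_env_dist_params(env: Mapping[str, str] = os.environ) -> EnvDistParams:
    """Collect the env:// rendezvous parameters (raises KeyError if one is missing)."""
    return EnvDistParams(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        world_size=int(env["WORLD_SIZE"]),
        rank=int(env["RANK"]),
    )
```

Passing the mapping explicitly, rather than always reading `os.environ`, makes the helper easy to exercise from a custom launcher or a test.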
-
- 14 Feb, 2022 1 commit
-
-
Tugrul Savran authored
Summary: Currently, the exporter method takes in a compare_accuracy parameter, which after all the compute (exporting etc.) raises an exception if it is set to True. This looks like an antipattern and wastes compute. Therefore, I am proposing to raise the exception at the very beginning of the method call to let the client know in advance that this argument's functionality isn't implemented yet. NOTE: We might also choose to get rid of the entire parameter. I am open to suggestions. Differential Revision: D34186578 fbshipit-source-id: d7fbe7589dfe2d2f688b870885ca61e6829c9329
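The fail-fast idea can be sketched as below. The function `export_model` and its body are hypothetical stand-ins; only the `compare_accuracy` parameter name comes from the commit message.

```python
def export_model(model_name: str, compare_accuracy: bool = False) -> str:
    """Hypothetical exporter that validates arguments before doing any work."""
    if compare_accuracy:
        # Reject the unimplemented option up front, before spending compute.
        raise NotImplementedError("compare_accuracy is not implemented yet")

    # Stand-in for the expensive export step.
    return f"exported:{model_name}"
```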
-
- 08 Jan, 2022 1 commit
-
-
Binh Tang authored
Summary: ### New commit log messages 4eede7c30 Add deprecation path for renamed training type plugins (#11227) Reviewed By: edward-io, daniellepintz Differential Revision: D33409991 fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0
-
- 06 Jan, 2022 1 commit
-
-
Binh Tang authored
Summary: ### New commit log messages b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142) Reviewed By: jjenniferdai Differential Revision: D33259306 fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0
-
- 29 Dec, 2021 1 commit
-
-
Yanghan Wang authored
Summary: DDPPlugin has been renamed to DDPStrategy (as part of https://github.com/PyTorchLightning/pytorch-lightning/issues/10549), causing oss CI to fail. Simply skipping the import to unblock CI since DDP feature is not used in test. Reviewed By: kazhang Differential Revision: D33351636 fbshipit-source-id: 7a1881c8cd48d9ff17edd41137d27a976103fdde
-
- 25 Nov, 2021 1 commit
-
-
Yuxin Wu authored
Summary: make it an option Differential Revision: D32601981 fbshipit-source-id: 308a0c49939531d840914aa8e256aae6db463929
-
- 24 Sep, 2021 1 commit
-
-
Lei Tian authored
Summary: deprecate terminate_on_nan in pytorch lightning's default trainer config Reviewed By: kazhang, wat3rBro Differential Revision: D30910709 fbshipit-source-id: cb22c1f5f1cf3a3236333f21be87756d3f657f78
-
- 18 Sep, 2021 1 commit
-
-
Yuxin Wu authored
Differential Revision: D30973518 fbshipit-source-id: fbdfb862ab23d5141553499471f92d2218addf91
-
- 09 Sep, 2021 1 commit
-
-
Yanghan Wang authored
Summary: https://fb.workplace.com/groups/pythonfoundation/posts/2990917737888352 Remove `mobile-vision` from opt-out list; leaving `mobile-vision/SNPE` opted out because of 3rd-party code. arc lint --take BLACK --apply-patches --paths-cmd 'hg files mobile-vision' allow-large-files Reviewed By: sstsai-adl Differential Revision: D30721093 fbshipit-source-id: 9e5c16d988b315b93a28038443ecfb92efd18ef8
-