- 14 Nov, 2022 1 commit
-
-
Miquel Jubert Hermoso authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/388 Reviewed By: wat3rBro Differential Revision: D40377653 fbshipit-source-id: 3f99d30480a801c794665e67bb2b0d28c7c5b0e5
-
- 23 Oct, 2022 1 commit
-
-
Tsahi Glik authored
Summary: X-link: https://github.com/facebookresearch/mobile-vision/pull/116 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/398 D2Go doesn't have a per-node initialization API, only per-worker initialization that happens in each subprocess. Some projects (like IOBT) need a way to do shared initialization before spawning the worker subprocesses and to pass the initialized shared context to the workers. This diff adds an API to create a shared context object before launching workers; the runners inside the workers then use this shared context after launch. Reviewed By: wat3rBro Differential Revision: D40001329 fbshipit-source-id: 231a4e7e4da7b5db50849176c58b104c4565306a
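The shared-context flow in this summary can be sketched roughly as follows. This is only an illustration, not D2Go's actual API: `SharedContext`, `create_shared_context`, `worker_main`, and `launch_workers` are hypothetical names, and threads stand in for the worker subprocesses so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SharedContext:
    """Hypothetical container for state that is initialized once per node."""
    values: Dict[str, str] = field(default_factory=dict)


def create_shared_context() -> SharedContext:
    # Runs once in the parent, before any worker is started.
    return SharedContext(values={"dataset_root": "/tmp/data"})


def worker_main(rank: int, shared_context: SharedContext) -> str:
    # Each worker reads the context that the parent prepared.
    return f"rank {rank} uses {shared_context.values['dataset_root']}"


def launch_workers(
    main_func: Callable[[int, SharedContext], str],
    num_workers: int,
    shared_context: SharedContext,
) -> List[str]:
    # Threads are a stand-in for subprocesses (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(main_func, rank, shared_context)
            for rank in range(num_workers)
        ]
        return [future.result() for future in futures]


results = launch_workers(worker_main, 2, create_shared_context())
```

The point of the design is that the expensive or node-wide setup happens exactly once, while every worker still receives the same initialized object.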
-
- 09 Aug, 2022 2 commits
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/357 This change makes it possible to unpickle TrainNetOutput, which currently cannot be unpickled because it is defined in the main module, and the main module can be different in the binary that unpickles this dataclass. Reviewed By: miqueljubert Differential Revision: D38536040 fbshipit-source-id: 856594251b2eca7630d69c7917bc4746859dab9f
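For background, `pickle` stores a class by reference (its module path and name), not by value, so a class recorded under `__main__` in one binary cannot be resolved by a binary with a different main module. The sketch below only inspects what a payload refers to; `referenced_globals` is a hypothetical helper, and `fractions.Fraction` is a stand-in for the dataclass discussed here.

```python
import pickle
import pickletools
from fractions import Fraction
from typing import List


def referenced_globals(payload: bytes) -> List[str]:
    """Return the "module name" pairs that a pickle payload refers to."""
    return [
        arg
        for opcode, arg, _pos in pickletools.genops(payload)
        if opcode.name == "GLOBAL"
    ]


# Protocol 2 emits an explicit GLOBAL opcode, which is easy to inspect.
payload = pickle.dumps(Fraction(1, 2), protocol=2)
refs = referenced_globals(payload)
```

Because the payload names the module, moving a class out of the main module into an importable module gives every binary the same reference to resolve.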
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/356 Attaching PDB on failure does not work when running in a distributed environment. This change allows disabling this behavior by passing a command line argument. Reviewed By: miqueljubert Differential Revision: D38514736 fbshipit-source-id: 2e0008d6fbc6a4518a605debe67d76f8354364fc
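A minimal sketch of gating a post-mortem debugger behind a command line argument is shown below. The flag name `--disable-post-mortem` and the helper `maybe_post_mortem` are assumptions made for illustration; the summary does not name the actual argument.

```python
import argparse
import pdb
import sys
from contextlib import contextmanager


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the real argument is not given in the summary.
    parser.add_argument("--disable-post-mortem", action="store_true")
    return parser


@contextmanager
def maybe_post_mortem(enabled: bool, debugger=pdb.post_mortem):
    """Open a post-mortem debugger on failure only when enabled."""
    try:
        yield
    except Exception:
        if enabled:
            # Pass the traceback of the exception being handled.
            debugger(sys.exc_info()[2])
        raise


args = build_parser().parse_args(["--disable-post-mortem"])
```

The `debugger` parameter is injectable so that the behavior can be checked without opening an interactive session.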
-
- 28 Jul, 2022 1 commit
-
-
Mircea Cimpoi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/349 This is to allow None, meaning model_configs is not used. Added tasks for the other TODO. Reviewed By: wat3rBro Differential Revision: D38199075 fbshipit-source-id: 774ca42a82a972b7e4c642cc4306aec39e2c2f7f
-
- 27 Jul, 2022 1 commit
-
-
Peizhao Zhang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/278 Allow skipping do_test after do_train. Reviewed By: wat3rBro Differential Revision: D36786790 fbshipit-source-id: 785556b5743ee9af2abfe6c0e9e78c7055697048
-
- 25 Jul, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/343 Reviewed By: miqueljubert Differential Revision: D38077850 fbshipit-source-id: a79541d899ce2b49a30c7f2a81a616f76321026f
-
- 22 Jul, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/340 Reviewed By: miqueljubert Differential Revision: D37968017 fbshipit-source-id: a3953fdbb2c48ceaffcf94df081c0b3253d247d5
-
- 30 Jun, 2022 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/320 MCV/D2Go's `launch` now supports `kwargs`, which matches elastic launch. Let's always use `args=(cfg, output_dir, runner_name)` for all the binaries, and use `kwargs` for the remaining binary arguments (which matches the `extra_args` in FBL's OperatorArgument). Reviewed By: sstsai-adl Differential Revision: D37535145 fbshipit-source-id: 9767e8d71421d2262aee1fd4b9019758aa4a6bbd
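The calling convention described here might look like the sketch below. The `launch` function is a hypothetical stand-in that simply forwards its arguments (the real launcher also handles process spawning), and `eval_only` is an invented binary-specific argument.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def launch(
    main_func: Callable[..., Any],
    args: Tuple[Any, ...] = (),
    kwargs: Optional[Dict[str, Any]] = None,
) -> Any:
    """Stand-in launcher: forwards positional and keyword arguments."""
    return main_func(*args, **(kwargs or {}))


def main(cfg: dict, output_dir: str, runner_name: str, eval_only: bool = False) -> dict:
    # The first three parameters are common to all binaries; the rest
    # are binary-specific and arrive through kwargs.
    return {"output_dir": output_dir, "runner": runner_name, "eval_only": eval_only}


result = launch(
    main,
    args=({"SOLVER": {}}, "/tmp/out", "MyRunner"),
    kwargs={"eval_only": True},
)
```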
-
- 24 Jun, 2022 2 commits
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/309 Right now multiple machines can try to write to the same output file, since they get the same argument. Additionally, on the same machine, several outputs can be saved, which requires unnecessary unpacking. This change makes train_net write only the output of the rank 0 trainer. Reviewed By: wat3rBro Differential Revision: D37310084 fbshipit-source-id: 9d5352a274e8fb1d2043393b12896d402333c17b
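The rank 0 guard can be sketched as below. `save_output` is a hypothetical helper, the rank is passed in explicitly instead of being read from a distributed runtime, and JSON is used only to keep the example small.

```python
import json
import os
import tempfile


def save_output(output: dict, path: str, rank: int) -> bool:
    """Write the output file only from the rank 0 trainer."""
    if rank != 0:
        return False
    with open(path, "w") as f:
        json.dump(output, f)
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.json")
    # Simulate four trainers that were all given the same output path.
    wrote = [save_output({"rank": rank}, path, rank) for rank in range(4)]
    with open(path) as f:
        saved = json.load(f)
```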
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/312 As discussed, we decided not to use a runner instance outside of `main`. Previous diffs already solved the prerequisites; this diff mainly does the renaming. - Use runner name (str) in the fblearner, ML pipeline. - Use runner name (str) in FBL operator, MAST and binary operator. - Use runner class as the interface of main; it can be either the name of the class (str) or the actual class. The main usage should be `str`, so that the class is imported inside `main`. But it's also a common use case to import the runner class and call `main` for things like ad-hoc scripts or tests, and supporting the actual class makes it easier to modify code for those cases (e.g. some local test class doesn't have a name, so it's not feasible to use a runner name). Reviewed By: newstzpz Differential Revision: D37060338 fbshipit-source-id: 879852d41902b87d6db6cb9d7b3e8dc55dc4b976
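Accepting either a class name or an actual class can be sketched as follows. `resolve_runner_class` is a hypothetical helper, it assumes the name is a dotted import path, and `collections.OrderedDict` stands in for a runner class.

```python
import importlib
from collections import OrderedDict
from typing import Type, Union


def resolve_runner_class(runner: Union[str, Type]) -> Type:
    """Return the class for a dotted name, or the class itself if given one."""
    if isinstance(runner, str):
        module_name, _, class_name = runner.rpartition(".")
        # Importing here keeps the import inside the caller (e.g. main).
        return getattr(importlib.import_module(module_name), class_name)
    return runner


from_name = resolve_runner_class("collections.OrderedDict")
from_class = resolve_runner_class(OrderedDict)
```

Passing a string defers the import until the function runs, while passing the class directly suits ad-hoc scripts and tests.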
-
- 18 Jun, 2022 2 commits
-
-
Tsahi Glik authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/297 X-link: https://github.com/facebookresearch/mobile-vision/pull/84 Add command line arg to specify whether and where to save results. This is useful where binaries are being launched from another process, or remotely on another machine. Reviewed By: wat3rBro Differential Revision: D37157955 fbshipit-source-id: 2a48cf967f6cf928049f2be41952834e1dd2a04d
-
Tsahi Glik authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/302 Fixes an issue introduced in D35035813 (https://github.com/facebookresearch/d2go/commit/744d72d73b7103b8dd9ca69372a179b44ad7d733) that breaks the OSS CLI tools defined in https://github.com/facebookresearch/d2go/blob/8098d160c0b38b796a2c164719650a50238a0f89/setup.py#L87-L92. The CLI alias in setup needs a function that can be called without any args, so this creates a new main_cli function. Reviewed By: wat3rBro Differential Revision: D37210948 fbshipit-source-id: efb3df15e9933c617414a727e5b53553db170622
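Console-script entry points are invoked with no arguments, which is why a zero-argument wrapper is needed. The sketch below shows the general pattern only; the body of `main` and the `--config-file` option are illustrative assumptions, not the actual d2go code.

```python
import argparse
import sys
from typing import List


def main(argv: List[str]) -> str:
    """Takes an explicit argument list, so other code can call it directly."""
    parser = argparse.ArgumentParser(prog="tool")
    parser.add_argument("--config-file", default="")
    args = parser.parse_args(argv)
    return args.config_file


def main_cli() -> str:
    # Entry points call this with no arguments, so read sys.argv here.
    return main(sys.argv[1:])
```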
-
- 16 Jun, 2022 1 commit
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/301 This is a follow-up to earlier work: it moves the part responsible for the centrally defined parameters out of the helper in train_net, closer to where the parameters are defined. Reviewed By: tglik Differential Revision: D37176212 fbshipit-source-id: 226415f36f4872ac3d9ba41541b4389a18cc11e6
-
- 15 Jun, 2022 1 commit
-
-
Mik Vyatskov authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/290 When running through torchx, converting from arguments to CLI arguments is necessary. Reviewed By: wat3rBro Differential Revision: D37086938 fbshipit-source-id: d17c4e36bece8eb02955263181789b71e3483a40
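One way to picture the conversion from arguments to CLI arguments is the sketch below. `to_cli_args` and its flag-naming rules are assumptions for illustration and do not reflect the actual helper or torchx's API.

```python
from typing import Any, Dict, List


def to_cli_args(params: Dict[str, Any]) -> List[str]:
    """Turn a mapping of parameters into a flat list of CLI tokens."""
    cli: List[str] = []
    for name, value in params.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            # Booleans become bare flags and are omitted when False.
            if value:
                cli.append(flag)
        elif value is not None:
            cli.extend([flag, str(value)])
    return cli


cli_args = to_cli_args(
    {
        "config_file": "cfg.yaml",
        "num_processes": 8,
        "eval_only": True,
        "resume": False,
        "dist_url": None,
    }
)
```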
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best-effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402205 fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366
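The effect of case-insensitive ordering can be seen with plain Python sorting. This only illustrates the ordering principle using `str.casefold` as the key; it is not µsort's implementation, which also handles import categories and merging.

```python
names = ["frog", "Zebra", "FROG", "apple", "Frog"]

# Default ordering compares code points, so uppercase sorts before lowercase.
case_sensitive = sorted(names)

# A case-insensitive key keeps all spellings of "frog" adjacent; ties keep
# their input order because Python's sort is stable.
case_insensitive = sorted(names, key=str.casefold)
```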
-
- 05 Mar, 2022 1 commit
-
-
Yanghan Wang authored
Summary: fix D34540275 (https://github.com/facebookresearch/d2go/commit/d8bdc633ec66e6ce73076d027f8e777791c2e067) Reviewed By: tglik Differential Revision: D34662745 fbshipit-source-id: 6fd67db041fab6f5810763702e4cc3f16a08c5df
-
- 03 Mar, 2022 1 commit
-
-
Tsahi Glik authored
Summary: Add support in d2go.distributed for the `env://` init method. Use env variables as specified in https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization to initialize distributed params. Also change the train_net cli function signature to accept an args list instead of only using `sys.argv`, to allow calling this function from the AIEnv launcher. Differential Revision: D34540275 fbshipit-source-id: 7f718aed4c010b0ac8347d43b5ca5b401210756c
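With `env://` initialization, PyTorch reads `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, and `RANK` from the environment. The sketch below shows how such variables could be collected into one object; `DistributedParams` and `params_from_env` are hypothetical names, not part of d2go.distributed or PyTorch.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class DistributedParams:
    master_addr: str
    master_port: int
    world_size: int
    rank: int


def params_from_env(env: Mapping[str, str] = os.environ) -> DistributedParams:
    """Read the variables used by env:// initialization from a mapping."""
    return DistributedParams(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        world_size=int(env["WORLD_SIZE"]),
        rank=int(env["RANK"]),
    )


# An explicit mapping is used here so the example does not depend on the
# real process environment.
params = params_from_env(
    {"MASTER_ADDR": "127.0.0.1", "MASTER_PORT": "29500", "WORLD_SIZE": "2", "RANK": "1"}
)
```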
-
- 22 May, 2021 1 commit
-
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/70 DDP supports an fp16_compress_hook which compresses the gradient to FP16 before communication. This can result in a significant speedup. Add one argument `_C.MODEL.DDP_FP16_GRAD_COMPRESS` to trigger it. Reviewed By: zhanghang1989 Differential Revision: D28467701 fbshipit-source-id: 3c80865222f48eb8fe6947ea972448c445ee3ef3
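The idea behind FP16 gradient compression, halving the bytes sent per value, can be illustrated with the standard library alone. This is not the DDP hook itself: the helper names are hypothetical, and the sample values were chosen because they are exactly representable in half precision (in general, FP16 loses precision).

```python
import struct
from typing import List


def pack_fp32(values: List[float]) -> bytes:
    # 4 bytes per value (single precision).
    return struct.pack(f"<{len(values)}f", *values)


def pack_fp16(values: List[float]) -> bytes:
    # 2 bytes per value (half precision).
    return struct.pack(f"<{len(values)}e", *values)


def unpack_fp16(payload: bytes) -> List[float]:
    return list(struct.unpack(f"<{len(payload) // 2}e", payload))


grads = [0.5, -1.25, 3.0, 0.0]
full = pack_fp32(grads)
compressed = pack_fp16(grads)
restored = unpack_fp16(compressed)
```

The communication payload shrinks from 16 to 8 bytes for these four values, which is the source of the bandwidth savings the summary refers to.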
-
- 03 Mar, 2021 1 commit
-
-
facebook-github-bot authored
fbshipit-source-id: f4a8ba78691d8cf46e003ef0bd2e95f170932778
-