"server/vscode:/vscode.git/clone" did not exist on "b2554455572b28c0e18423d6fe6896cf7137dbd6"
  1. 30 Jun, 2022 1 commit
  2. 29 Jun, 2022 1 commit
  3. 24 Jun, 2022 1 commit
    • use runner class instead of instance outside of main · 8051775c
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/312
      
      As discussed, we decided not to use a runner instance outside of `main`. Previous diffs already solved the prerequisites; this diff mainly does the renaming.
      - Use the runner name (str) in the fblearner ML pipeline.
      - Use the runner name (str) in the FBL operator, MAST, and binary operator.
      - Use the runner class as the interface of `main`; it can be either the name of the class (str) or the actual class. The main usage should be a `str`, so that importing the class happens inside `main`. But it is also a common use case to import the runner class and call `main` directly, e.g. for ad-hoc scripts or tests; supporting the actual class makes it easier to modify code for those cases (e.g. a local test class may not have a name, so using a runner name is not feasible). A minimal sketch of the interface follows below.
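      Illustrative sketch only (the helper and signature below are assumptions, not the exact d2go code):
      ```
      import importlib
      from typing import Type, Union

      def _locate(dotted_name: str) -> type:
          # "pkg.module.Class" -> the class object, imported lazily inside main
          module_name, _, cls_name = dotted_name.rpartition(".")
          return getattr(importlib.import_module(module_name), cls_name)

      def main(cfg, runner_class: Union[str, Type]) -> None:
          # Common path: a str is passed, so the import happens here.
          # Tests and ad-hoc scripts may pass the class object directly.
          if isinstance(runner_class, str):
              runner_class = _locate(runner_class)
          runner = runner_class()  # the instance never leaves main
          ...
      ```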
      
      Reviewed By: newstzpz
      
      Differential Revision: D37060338
      
      fbshipit-source-id: 879852d41902b87d6db6cb9d7b3e8dc55dc4b976
  4. 16 Jun, 2022 1 commit
  5. 14 Jun, 2022 1 commit
  6. 09 Jun, 2022 1 commit
  7. 15 May, 2022 1 commit
    • apply import merging for fbcode (7 of 11) · b3a9204c
      John Reese authored
      Summary:
      Applies new import merging and sorting from µsort v1.0.
      
      When merging imports, µsort will make a best effort to move associated
      comments to match merged elements, but there are known limitations due to
      the dynamic nature of Python and developer tooling. These changes should
      not produce any dangerous runtime changes, but may require touch-ups to
      satisfy linters and other tooling.
      
      Note that µsort uses case-insensitive, lexicographical sorting, which
      results in a different ordering compared to isort. This provides a more
      consistent sorting order, matching the case-insensitive order used when
      sorting import statements by module name, and ensures that "frog", "FROG",
      and "Frog" always sort next to each other.
      
      For details on µsort's sorting and merging semantics, see the user guide:
      https://usort.readthedocs.io/en/stable/guide.html#sorting
      
      Reviewed By: lisroach
      
      Differential Revision: D36402205
      
      fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366
  8. 14 May, 2022 1 commit
  9. 10 Mar, 2022 1 commit
  10. 08 Jan, 2022 1 commit
    • Add deprecation path for renamed training type plugins (#11227) · fcd51171
      Binh Tang authored
      Summary:
      ### New commit log messages
        4eede7c30 Add deprecation path for renamed training type plugins (#11227)
      
      Reviewed By: edward-io, daniellepintz
      
      Differential Revision: D33409991
      
      fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0
  11. 06 Jan, 2022 1 commit
    • Rename `DDPPlugin` to `DDPStrategy` (#11142) · aeb15613
      Binh Tang authored
      Summary:
      ### New commit log messages
        b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142)
      
      Reviewed By: jjenniferdai
      
      Differential Revision: D33259306
      
      fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0
  12. 29 Dec, 2021 1 commit
  13. 24 Sep, 2021 1 commit
  14. 09 Sep, 2021 1 commit
  15. 07 Jul, 2021 1 commit
  16. 07 Jun, 2021 1 commit
  17. 25 May, 2021 2 commits
  18. 13 May, 2021 1 commit
    • Auto scale config for multi-node training · e87ed5f0
      Kai Zhang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/62
      
      The Lightning trainer sets the max step count to cfg.SOLVER.MAX_ITER. However, this value covers all nodes; in multi-node training we need to scale it down, along with the eval period and other configs.
      This diff calls `auto_scale_world_size` before passing the config to the trainer. A rough sketch of the scaling idea is shown below.
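      The sketch below is an approximation under stated assumptions (the scaling rule and keys other than SOLVER.MAX_ITER are not taken from the actual `auto_scale_world_size`):
      ```
      def scale_cfg_for_world_size(cfg, new_world_size: int, reference_world_size: int):
          # If the config was written for `reference_world_size` processes, running
          # on more processes means each process should take proportionally fewer steps.
          factor = reference_world_size / new_world_size
          cfg.SOLVER.MAX_ITER = max(1, int(cfg.SOLVER.MAX_ITER * factor))
          cfg.TEST.EVAL_PERIOD = max(1, int(cfg.TEST.EVAL_PERIOD * factor))
          return cfg
      ```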
      
      Reviewed By: wat3rBro
      
      Differential Revision: D28140877
      
      fbshipit-source-id: 2639ae58773a4ec2a0cc59dfefb2f5d9b1afe1a8
  19. 17 Apr, 2021 2 commits
    • Delegate to model's customization · aeb24a92
      Kai Zhang authored
      Summary: Delegate the FX quantization callback's customization to the model.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D27669212
      
      fbshipit-source-id: 2715546cf03134896da6f95ecddaf8503ff95d0b
    • E2E QAT Workflow on Lightning · 845d0b2c
      Kai Zhang authored
      Summary:
      As per the title: sanity-test the E2E QAT workflow on the Lightning Trainer.
      
      - Add `post_training_opts`. This is required to use `all_steps_qat.json` with Lightning. We don't actually support `post_training_opts` in this diff though; that part is left to T83437359.
      - Update the .yaml to specify the quantizable modules.
      - Update `lightning_train_net.py` to use the QuantizationAwareTraining callback (see the sketch after this list).
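      A hedged wiring sketch (the config key and the way the callback class is obtained are placeholders, not the exact d2go names):
      ```
      import pytorch_lightning as pl

      def build_trainer(cfg, qat_callback_cls, callbacks=None):
          callbacks = list(callbacks or [])
          if cfg.QUANTIZATION.QAT.ENABLED:  # assumed config key
              # e.g. the QuantizationAwareTraining callback mentioned above
              callbacks.append(qat_callback_cls(cfg))
          return pl.Trainer(max_steps=cfg.SOLVER.MAX_ITER, callbacks=callbacks)
      ```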
      
      Reviewed By: kandluis
      
      Differential Revision: D26304879
      
      fbshipit-source-id: 948bef4817d385d8a0969e4990d7f17ecd6994b7
  20. 09 Apr, 2021 1 commit
    • Make checkpointing tests slightly less restrictive · fc5616c8
      Ananth Subramaniam authored
      Summary:
      Before: this test assumed that exactly two checkpoints were stored: `last.ckpt` and `FINAL_MODEL_CKPT`.
      Now: this test asserts that at least these two checkpoints are stored. If the config specifies `save_top_k=-1`, for instance, more checkpoints are saved, which previously caused this test to fail.

      Since this test only loads the last and final outputs, I'm changing it to assert that these checkpoints must be saved while ignoring any other checkpoint files that may be generated. A sketch of the relaxed assertion is shown below.
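      A sketch of the relaxed check (file names here are illustrative stand-ins for `last.ckpt` and the `FINAL_MODEL_CKPT` constant):
      ```
      import os

      def assert_required_checkpoints(output_dir, required=("last.ckpt", "model_final.ckpt")):
          saved = set(os.listdir(output_dir))
          # Require the checkpoints the test actually loads; ignore extras
          # produced by settings such as save_top_k=-1.
          missing = set(required) - saved
          assert not missing, f"missing checkpoints: {missing}"
      ```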
      
      Reviewed By: kazhang
      
      Differential Revision: D27671284
      
      fbshipit-source-id: 0419fb46856d048e7b6eba3ff1dc65b7280a9a90
  21. 24 Mar, 2021 2 commits
    • Support evaluate predictor · 6aec097e
      Kai Zhang authored
      Summary:
      Evaluate the predictor generated by the previous step.
      This diff modifies lightning_train_net to reuse the evaluation logic by adding a `predictor_path` param.
      This diff also makes the Lightning training backend depend on `cfg.MODEL.DEVICE`, so that in the evaluate-predictor step the user can select the backend by changing the model device. This is useful for evaluating an int8 quantized model. A rough sketch of the control flow is shown below.
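      A hedged sketch of the added control flow (parameter and helper names are assumptions, not the exact lightning_train_net code):
      ```
      def main(cfg, task_cls, load_predictor, evaluate, predictor_path=None):
          # `load_predictor` and `evaluate` stand in for existing helpers.
          if predictor_path is not None:
              # Evaluate the exported predictor from a previous step; the backend
              # follows cfg.MODEL.DEVICE, so an int8 predictor can be evaluated
              # by setting the model device accordingly.
              model = load_predictor(predictor_path, device=cfg.MODEL.DEVICE)
              return evaluate(cfg, model)
          task = task_cls(cfg)
          # ... regular training path continues here ...
      ```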
      
      Reviewed By: newstzpz
      
      Differential Revision: D27150609
      
      fbshipit-source-id: fb72da3e81db932c0fa479350150720143e09a3e
    • Simplify Lightning task and model creation · 9051f71a
      Kai Zhang authored
      Summary:
      Given that the ways to create a D2Go runner (https://github.com/facebookresearch/d2go/commit/465cdb842513eb910aa20fcedea1d2edd15dc7b7) and a Lightning task are different, get_class was introduced so that in the application we could do:
      ```
      if is_lightning:
          task_cls = get_class(classname)
          task = task_cls(cfg)
      else:
          runner = create_runner(classname)
      ```
      It turns out that we would need to do that in many places: workflows, binaries.
      This diff reverts `get_class` and returns the class from `create_runner` if the class is a Lightning module. A rough sketch of the resulting behavior is shown below.
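      A rough sketch of the resulting `create_runner` behavior (illustrative, not the exact d2go code):
      ```
      import importlib

      import pytorch_lightning as pl

      def create_runner(class_full_name: str):
          module_name, _, cls_name = class_full_name.rpartition(".")
          cls = getattr(importlib.import_module(module_name), cls_name)
          if issubclass(cls, pl.LightningModule):
              return cls    # Lightning tasks: caller instantiates with task_cls(cfg)
          return cls()      # classic D2Go runners: instantiated here as before
      ```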
      
      Reviewed By: newstzpz
      
      Differential Revision: D26676595
      
      fbshipit-source-id: c3ce2016d09fe073af4c2dd9f98eea4e59ca621b
  22. 03 Mar, 2021 2 commits
    • Split lightning_train_net into OSS and internal · 857195d8
      Kai Zhang authored
      Summary:
      As titled. The OSS version only uses PyTorch Lightning, while the internal version leverages some internal features (e.g. Manifold integration, every_n_step checkpointing).
      This diff splits train_net.main into smaller functions so that they can be shared across the OSS and internal versions.
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D26752701
      
      fbshipit-source-id: 7f68e2a81e78193e117517a0ff668ab14b76ea65
    • Initial commit · f23248c0
      facebook-github-bot authored
      fbshipit-source-id: f4a8ba78691d8cf46e003ef0bd2e95f170932778