1. 05 Mar, 2022 1 commit
  2. 03 Mar, 2022 1 commit
  3. 14 Feb, 2022 1 commit
    • D2Go Fail Fast: Move exception coming from not implemented "compare accuracy" feature to the top. · eee4dfc1
      Tugrul Savran authored
      Summary:
      Currently, the exporter method takes a compare_accuracy parameter; when it is set to True, an exception is raised only after all the compute (exporting etc.) has already happened.

      This looks like an antipattern and wastes compute.

      Therefore, I am proposing to raise the exception at the very beginning of the method call, so the client knows in advance that this argument's functionality isn't implemented yet.

      NOTE: We might also choose to get rid of the entire parameter. I am open to suggestions.
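      A minimal sketch of the proposed fail-fast shape (the function name and other parameters are hypothetical; only `compare_accuracy` comes from this commit):

      ```python
      def export_model(model, compare_accuracy=False):
          # Fail fast: reject the unimplemented option before any expensive work.
          if compare_accuracy:
              raise NotImplementedError("compare_accuracy is not implemented yet")
          # ... expensive export compute (tracing, serialization, etc.) would follow ...
          return "exported"
      ```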
      
      Differential Revision: D34186578
      
      fbshipit-source-id: d7fbe7589dfe2d2f688b870885ca61e6829c9329
  4. 08 Jan, 2022 1 commit
    • Add deprecation path for renamed training type plugins (#11227) · fcd51171
      Binh Tang authored
      Summary:
      ### New commit log messages
        4eede7c30 Add deprecation path for renamed training type plugins (#11227)
      
      Reviewed By: edward-io, daniellepintz
      
      Differential Revision: D33409991
      
      fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0
  5. 06 Jan, 2022 1 commit
    • Rename `DDPPlugin` to `DDPStrategy` (#11142) · aeb15613
      Binh Tang authored
      Summary:
      ### New commit log messages
        b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142)
      
      Reviewed By: jjenniferdai
      
      Differential Revision: D33259306
      
      fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0
  6. 29 Dec, 2021 1 commit
  7. 25 Nov, 2021 1 commit
  8. 24 Sep, 2021 1 commit
  9. 18 Sep, 2021 1 commit
  10. 09 Sep, 2021 1 commit
  11. 09 Jul, 2021 1 commit
    • Add BoltNN conversion to d2go exporter · ecf832da
      Mircea Cimpoi authored
      Summary:
      Added predictor_type `boltnn_int8` to export to BoltNN via torch delegate.
      
      - `int8` needs to be in the name; otherwise, post-training quantization won't happen:
      
      ```
      cfg.QUANTIZATION.BACKEND = "qnnpack"
      # cfg.QUANTIZATION.CUSTOM_QSCHEME = "per_tensor_affine"
      ```
      
      It seems that `QUANTIZATION.CUSTOM_QSCHEME per_tensor_affine` is not needed; it is likely already covered by "qnnpack".
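      The name-based gating mentioned above can be sketched as follows (the helper is hypothetical; only the `int8`-in-the-name convention comes from this summary):

      ```python
      def should_post_quantize(predictor_type: str) -> bool:
          # Post-training quantization only runs when "int8" appears in the
          # predictor type name, e.g. "boltnn_int8".
          return "int8" in predictor_type
      ```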
      
      Reviewed By: wat3rBro
      
      Differential Revision: D29106043
      
      fbshipit-source-id: 865ac5af86919fe7b4530b48433a1bd11e295bf4
  12. 07 Jul, 2021 1 commit
  13. 09 Jun, 2021 1 commit
    • allow for multiple datasets for test data loader creation · fc690b45
      Sam Tsai authored
      Summary: Use all training datasets for export instead of just the first. This supports use cases where each json contains only a small number of images but there are many jsons. Since calibration previously used only the first dataset, it was limited by the number of images in a single dataset.
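      A rough sketch of the change, assuming datasets are simple lists of samples (the helper name is hypothetical):

      ```python
      def build_calibration_set(datasets):
          # Before this diff, calibration drew from datasets[0] only, capping
          # the number of available images at the size of a single dataset.
          # After: concatenate every configured training dataset.
          combined = []
          for dataset in datasets:
              combined.extend(dataset)
          return combined
      ```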
      
      Reviewed By: ppwwyyxx
      
      Differential Revision: D28902673
      
      fbshipit-source-id: f80146b02d2d1bc04703fbb21ef410f5e26ba64c
  14. 07 Jun, 2021 1 commit
  15. 25 May, 2021 2 commits
  16. 22 May, 2021 2 commits
  17. 21 May, 2021 1 commit
    • Enable inference config in export step · 90aff5da
      Sanjeev Kumar authored
      Summary:
      - Enable SDK inference config specification in the export step. This adds the SDK configuration to the model file produced by export: the SDK config can be specified as inference_config.yaml and is zipped together with the TorchScript model. The main goal of the SDK configuration is to ship inference-behavior controls together with the model.
      - SDK inference config design doc: https://docs.google.com/document/d/1j5qx8IrnFg1DJFzTnu4W8WmXFYJ-AgCDfSQHb2ACJsk/edit
      - The one-click fblearner pipeline is in the next diff on the stack.
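      The packaging step could look roughly like this (the helper name and archive entry names are assumptions; only the idea of zipping inference_config.yaml together with the TorchScript model comes from the summary):

      ```python
      import zipfile

      def package_model(model_path, config_path, out_path):
          # Bundle the TorchScript model with its SDK inference config so the
          # config travels with the model artifact.
          with zipfile.ZipFile(out_path, "w") as zf:
              zf.write(model_path, arcname="model.jit")
              zf.write(config_path, arcname="inference_config.yaml")
          return out_path
      ```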
      
      Differential Revision: D27881742
      
      fbshipit-source-id: 34a3ab7a88f456b74841cf671ea1b3f678cdb733
  18. 13 May, 2021 1 commit
    • Auto scale config for multi-node training · e87ed5f0
      Kai Zhang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/62
      
      The Lightning trainer sets its max steps to cfg.SOLVER.MAX_ITER. However, this is the max iteration across all nodes; in multi-node training we need to scale it down, along with the eval period and other configs.
      This diff calls `auto_scale_world_size` before passing the config to the trainer.
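      A simplified stand-in for what the scaling does here, assuming a linear scaling rule (the exact fields and rule used by `auto_scale_world_size` in d2go may differ):

      ```python
      def auto_scale_config(max_iter, eval_period, num_nodes):
          # cfg.SOLVER.MAX_ITER is the iteration budget across all nodes; with
          # N nodes training in parallel, each run's step counts scale down by N.
          return max_iter // num_nodes, eval_period // num_nodes
      ```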
      
      Reviewed By: wat3rBro
      
      Differential Revision: D28140877
      
      fbshipit-source-id: 2639ae58773a4ec2a0cc59dfefb2f5d9b1afe1a8
  19. 17 Apr, 2021 2 commits
    • Delegate to model's customization · aeb24a92
      Kai Zhang authored
      Summary: Delegate FX quantization callback's customization to model.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D27669212
      
      fbshipit-source-id: 2715546cf03134896da6f95ecddaf8503ff95d0b
    • E2E QAT Workflow on Lightning · 845d0b2c
      Kai Zhang authored
      Summary:
      As per the title: sanity-test the E2E QAT workflow on the Lightning Trainer.

      - Add `post_training_opts`. This is required to use `all_steps_qat.json` with Lightning. We don't actually support post_training_opts in this diff; that part is left to T83437359.
      - Update the .yaml to specify the quantize-able modules.
      - Update `lightning_train_net.py` to use the QuantizationAwareTraining callback.
      
      Reviewed By: kandluis
      
      Differential Revision: D26304879
      
      fbshipit-source-id: 948bef4817d385d8a0969e4990d7f17ecd6994b7
  20. 15 Apr, 2021 1 commit
  21. 09 Apr, 2021 1 commit
    • Make checkpointing tests slightly less restrictive · fc5616c8
      Ananth Subramaniam authored
      Summary:
      Before: this test assumed that exactly 2 checkpoints were stored: `last.ckpt` and `FINAL_MODEL_CKPT`. If the config specified, for instance, `save_top_k=-1`, more checkpoints would be saved, causing the test to fail.
      Now: the test asserts that at least these 2 checkpoints are stored.

      Since this test only loads the last and final outputs, I'm changing the behavior to assert that these checkpoints must be saved, while ignoring any other checkpoint files that may be generated.
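      The relaxed assertion amounts to a subset check instead of an equality check (the file names below are stand-ins for `last.ckpt` and `FINAL_MODEL_CKPT`):

      ```python
      REQUIRED = {"last.ckpt", "final_model.ckpt"}  # stand-in names

      def checkpoints_ok(found):
          # Before: set(found) == REQUIRED, which breaks when save_top_k=-1
          # produces extra checkpoints. After: only require that the
          # checkpoints the test actually loads are present.
          return REQUIRED <= set(found)
      ```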
      
      Reviewed By: kazhang
      
      Differential Revision: D27671284
      
      fbshipit-source-id: 0419fb46856d048e7b6eba3ff1dc65b7280a9a90
  22. 24 Mar, 2021 2 commits
    • Support evaluate predictor · 6aec097e
      Kai Zhang authored
      Summary:
      Evaluate the predictor generated by the previous step.
      This diff modifies lightning_train_net to reuse the evaluation logic by adding a `predictor_path` param.
      It also makes the Lightning training backend depend on `cfg.MODEL.DEVICE`, so that in the evaluate_predictor step users can choose the backend by changing the model device. This is useful for evaluating int8 quantized models.
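      One plausible reading of the device-driven backend choice (this mapping is an assumption, not taken from the diff): CUDA devices select NCCL, while CPU falls back to Gloo, so setting the model device to cpu lets an int8 model be evaluated without GPUs.

      ```python
      def pick_backend(device: str) -> str:
          # Hypothetical device-to-backend mapping: CUDA devices use NCCL,
          # anything else (e.g. "cpu" for int8 evaluation) uses Gloo.
          return "nccl" if device.startswith("cuda") else "gloo"
      ```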
      
      Reviewed By: newstzpz
      
      Differential Revision: D27150609
      
      fbshipit-source-id: fb72da3e81db932c0fa479350150720143e09a3e
    • Simplify Lightning task and model creation · 9051f71a
      Kai Zhang authored
      Summary:
      Given that the ways to create a D2Go runner (https://github.com/facebookresearch/d2go/commit/465cdb842513eb910aa20fcedea1d2edd15dc7b7) and a Lightning task are different, `get_class` was introduced so that in an application we could do:
      ```
      if is_lightning:
          task_cls = get_class(classname)
          task = task_cls(cfg)
      else:
          runner = create_runner(classname)
      ```
      It turns out that we would need to do this in many places: workflows, binaries.
      This diff reverts `get_class` and instead returns the class from `create_runner` when the class is a Lightning module.
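      The unified behavior after this diff can be sketched as follows (both classes and the simplified `create_runner` signature are stand-ins; only the "return the class when it is a Lightning module" rule comes from the summary):

      ```python
      class LightningTaskBase:
          """Stand-in for pytorch_lightning.LightningModule."""

      class ClassicRunner:
          """Stand-in for a classic D2Go runner."""

      def create_runner(cls):
          # Lightning tasks: return the class itself (the caller instantiates
          # it with cfg). Classic runners: return an instance, as before.
          if isinstance(cls, type) and issubclass(cls, LightningTaskBase):
              return cls
          return cls()
      ```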
      
      Reviewed By: newstzpz
      
      Differential Revision: D26676595
      
      fbshipit-source-id: c3ce2016d09fe073af4c2dd9f98eea4e59ca621b
  23. 17 Mar, 2021 1 commit
  24. 09 Mar, 2021 1 commit
    • add benchmark_data binary · 66b7c7c8
      Yanghan Wang authored
      Reviewed By: newstzpz
      
      Differential Revision: D26072333
      
      fbshipit-source-id: 6727b34458d410e904045aa58f81c3e09111882a
  25. 07 Mar, 2021 1 commit
  26. 04 Mar, 2021 1 commit
    • Typo fixes · 5bf4cc7d
      RangiLyu authored
      Summary:
      Change depoyment to deployment in README.md.
      Change datasest to datasets in tools/exporter.py.
      
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/7
      
      Reviewed By: newstzpz
      
      Differential Revision: D26821039
      
      Pulled By: zhanghang1989
      
      fbshipit-source-id: 5056d15c877c4b3d771d33267139e73f1527da21
  27. 03 Mar, 2021 2 commits
    • Split lightning_train_net into OSS and internal · 857195d8
      Kai Zhang authored
      Summary:
      As titled. The OSS version only uses PyTorch Lightning, while the internal version leverages additional internal features (e.g. Manifold integration, every_n_step checkpointing).
      This diff splits train_net.main into smaller functions so that they can be shared between the OSS and internal versions.
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D26752701
      
      fbshipit-source-id: 7f68e2a81e78193e117517a0ff668ab14b76ea65
    • Initial commit · f23248c0
      facebook-github-bot authored
      fbshipit-source-id: f4a8ba78691d8cf46e003ef0bd2e95f170932778