Commits · 5c16a4eae60441bdd2946f32ffba2a528b5b4c5e · OpenDAS / d2go

22 Jul, 2022 1 commit

use dataclass to annotate the output of main & operator · 5c16a4ea

Yanghan Wang authored Jul 22, 2022

Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/340

Reviewed By: miqueljubert

Differential Revision: D37968017

fbshipit-source-id: a3953fdbb2c48ceaffcf94df081c0b3253d247d5


30 Jun, 2022 2 commits

use kwargs for extra args in launch · 4397dcbe

Yanghan Wang authored Jun 30, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/320

MCV/D2Go's `launch` now supports `kwargs`, which matches elastic launch. Let's always use `args=(cfg, output_dir, runner_name)` for all the binaries, and use `kwargs` for remaining binary arguments (which matches the `extra_args` in FBL's OperatorArgument).
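
The calling convention above can be sketched as follows. This is only an illustration, not D2Go's actual `launch`: `launch_sketch` and `main_func` are hypothetical stand-ins, and only the split into `args=(cfg, output_dir, runner_name)` plus `kwargs` comes from the commit message.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def launch_sketch(
    main_func: Callable[..., Any],
    args: Tuple[Any, ...] = (),
    kwargs: Optional[Dict[str, Any]] = None,
) -> Any:
    """Hypothetical launcher stand-in: forwards args and kwargs to main_func."""
    return main_func(*args, **(kwargs or {}))


def main_func(cfg, output_dir, runner_name, eval_only=False, resume=False):
    # Hypothetical binary entry point: three fixed positionals,
    # every other binary argument arrives as a keyword.
    return {
        "cfg": cfg,
        "output_dir": output_dir,
        "runner_name": runner_name,
        "eval_only": eval_only,
        "resume": resume,
    }


result = launch_sketch(
    main_func,
    args=({"MAX_ITER": 10}, "/tmp/out", "my_pkg.MyRunner"),
    kwargs={"eval_only": True},
)
```

Keeping the positional tuple identical across binaries means only the `kwargs` dictionary differs from one binary to the next.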

Reviewed By: sstsai-adl

Differential Revision: D37535145

fbshipit-source-id: 9767e8d71421d2262aee1fd4b9019758aa4a6bbd


use the same prepare_for_launch for lightning · d353b5af

Yanghan Wang authored Jun 30, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/319

follow up on D37500599 (https://github.com/facebookresearch/d2go/commit/668b7ac29b0afb55d5923e72fe4f6428e5c85cbd), move lightning_train_net part of D37367360 to this diff.

Reviewed By: sstsai-adl

Differential Revision: D37534370

fbshipit-source-id: 7f48942a14ce16a9a9540b189441b540ce4f4b25


29 Jun, 2022 1 commit

update for using lightning trainer binary · 668b7ac2

Sam Tsai authored Jun 28, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/317

1. Add an eval-only option, similar to train_net
2. Use output_dir from the config if it is not specified via the command line

Reviewed By: wenliangzhao2018

Differential Revision: D37500599

fbshipit-source-id: 00c5804d08a449def3cc15fff49e27066d01f229


24 Jun, 2022 2 commits

Only save results to file from rank 0 · f0297b81

Mik Vyatskov authored Jun 24, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/309

Right now multiple machines can try to write to the same output file,
since they get the same argument. Additionally, on the same machine, several
outputs can be saved, which requires unnecessary unpacking. This change makes
train_net only write the output of the rank 0 trainer.
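
A minimal sketch of the rank-0 guard described above. The helper name `save_results_on_rank_zero` is hypothetical, and the rank is passed in explicitly here; the real change presumably obtains it from the distributed runtime.

```python
import json
import os
import tempfile


def save_results_on_rank_zero(results, path, rank):
    """Write results to path only when rank == 0; return whether a file was written."""
    if rank != 0:
        return False
    with open(path, "w") as f:
        json.dump(results, f)
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "results.json")
    # Simulate four trainers that all received the same output path.
    written = [
        save_results_on_rank_zero({"accuracy": 0.5}, out_path, rank=r)
        for r in range(4)
    ]
    with open(out_path) as f:
        saved = json.load(f)
```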

Reviewed By: wat3rBro

Differential Revision: D37310084

fbshipit-source-id: 9d5352a274e8fb1d2043393b12896d402333c17b


use runner class instead of instance outside of main · 8051775c

Yanghan Wang authored Jun 23, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/312

As discussed, we decided not to use a runner instance outside of `main`. Previous diffs already solved the prerequisites; this diff mainly does the renaming.
- Use runner name (str) in the fblearner, ML pipeline.
- Use runner name (str) in FBL operator, MAST and binary operator.
- Use runner class as the interface of main; it can be either the name of the class (str) or the actual class. The main usage should be `str`, so that the importing of the class happens inside `main`. But it's also a common use case to import the runner class and call `main` for things like ad-hoc scripts or tests; supporting the actual class makes it easier to modify code for those cases (e.g. some local test class doesn't have a name, so it's not feasible to use a runner name).
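
The "name or class" interface can be sketched roughly as below. `resolve_runner_class` and this `main` are hypothetical illustrations, not the code from the diff; the sketch assumes runner names are dotted `module.ClassName` strings.

```python
import importlib
from typing import Type, Union


def resolve_runner_class(runner: Union[str, Type]) -> Type:
    """Return a class from either a dotted 'module.ClassName' string or a class object."""
    if isinstance(runner, str):
        module_name, _, class_name = runner.rpartition(".")
        # The import happens lazily, only once main() is running.
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    return runner


def main(runner_class: Union[str, Type]) -> str:
    cls = resolve_runner_class(runner_class)
    return cls.__name__


class LocalTestRunner:
    """Ad-hoc class with no importable name; passed to main() directly."""


# A standard-library class stands in for a real runner name here.
name_from_str = main("collections.OrderedDict")
name_from_cls = main(LocalTestRunner)
```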

Reviewed By: newstzpz

Differential Revision: D37060338

fbshipit-source-id: 879852d41902b87d6db6cb9d7b3e8dc55dc4b976


18 Jun, 2022 2 commits

Support saving results in d2go tools · b57fde40

Tsahi Glik authored Jun 18, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/297

X-link: https://github.com/facebookresearch/mobile-vision/pull/84

Add command line arg to specify whether and where to save results.
This is useful where binaries are being launched from another process, or remotely on another machine.

Reviewed By: wat3rBro

Differential Revision: D37157955

fbshipit-source-id: 2a48cf967f6cf928049f2be41952834e1dd2a04d


fix OSS CLI tools · da04300b

Tsahi Glik authored Jun 18, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/302

Fixes an issue introduced in D35035813 (https://github.com/facebookresearch/d2go/commit/744d72d73b7103b8dd9ca69372a179b44ad7d733) that breaks the OSS CLI tools defined in https://github.com/facebookresearch/d2go/blob/8098d160c0b38b796a2c164719650a50238a0f89/setup.py#L87-L92.
The CLI alias in setup needs a function without any args to call, so this creates a new `main_cli` function.
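
The wrapper pattern can be sketched like this. The parser, the `--config-file` flag, and the `hypothetical-tool` name are illustrative assumptions; only the idea of a zero-argument `main_cli` delegating to an args-taking `main` comes from the commit message.

```python
import argparse
import sys
from typing import List, Optional


def main(args: Optional[List[str]] = None) -> argparse.Namespace:
    """Entry point that custom launching code can call with an explicit args list."""
    parser = argparse.ArgumentParser(prog="hypothetical-tool")
    parser.add_argument("--config-file", default="")
    return parser.parse_args(args)


def main_cli() -> None:
    """Zero-argument wrapper suitable as a console-script target."""
    main(sys.argv[1:])


parsed = main(["--config-file", "cfg.yaml"])
```

A console-script entry point calls its target with no arguments, which is why the alias has to point at `main_cli` rather than at `main`.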

Reviewed By: wat3rBro

Differential Revision: D37210948

fbshipit-source-id: efb3df15e9933c617414a727e5b53553db170622


16 Jun, 2022 2 commits

restructure lightning related code · 318a3d79

Yanghan Wang authored Jun 16, 2022

Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/237

Reviewed By: tglik

Differential Revision: D35954531

fbshipit-source-id: b69c8065928fe385d29f20f2c2460d60d63fca00


Implement a central helper for converting arguments to CLI args · 3b64e76a

Mik Vyatskov authored Jun 16, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/301

This is a follow-up to earlier work: it extracts the part responsible for the
centrally defined parameters from the helper in train_net and moves it closer
to where the parameters are defined.

Reviewed By: tglik

Differential Revision: D37176212

fbshipit-source-id: 226415f36f4872ac3d9ba41541b4389a18cc11e6


15 Jun, 2022 1 commit

Introduce a helper to convert args to CLI args · 7e436109

Mik Vyatskov authored Jun 15, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/290

When running through torchx, converting from arguments to CLI arguments is necessary.
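
As a rough illustration of such a conversion, the sketch below turns a dictionary of parameters into a flat list of CLI tokens. `to_cli_args` and its rules (booleans become bare flags, `None` is skipped, underscores become dashes) are assumptions made for this example, not the helper introduced by the diff.

```python
from typing import Any, Dict, List


def to_cli_args(params: Dict[str, Any]) -> List[str]:
    """Hypothetical converter from keyword-style parameters to CLI tokens."""
    cli_args: List[str] = []
    for name, value in params.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            # Booleans become bare flags and are emitted only when True.
            if value:
                cli_args.append(flag)
        elif value is not None:
            cli_args.extend([flag, str(value)])
    return cli_args


cli = to_cli_args(
    {
        "config_file": "cfg.yaml",
        "num_processes": 4,
        "eval_only": True,
        "resume": False,
        "output_dir": None,
    }
)
```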

Reviewed By: wat3rBro

Differential Revision: D37086938

fbshipit-source-id: d17c4e36bece8eb02955263181789b71e3483a40


14 Jun, 2022 1 commit

support diff config for lightning_train_net · 8cf2b879

Yanghan Wang authored Jun 14, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/285

`setup_after_launch` can now take `DefaultTask` as well (the `runner_or_task` can still be `None`, for runner-less train_net).

Reviewed By: tglik

Differential Revision: D37011560

fbshipit-source-id: ce8a88242df0a16de8da97d94e8eb7def524c69c


09 Jun, 2022 1 commit

unify DDP launcher for elastic and non-elastic (support elastic launch correctly) · 94dc481a

Yanghan Wang authored Jun 08, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/274

X-link: https://github.com/facebookresearch/mobile-vision/pull/76

TLDR: this diff consolidates the `distributed_helper` of `mobile_cv`; it (together with `mobile_cv`'s `comm` module) should be the go-to library for dealing with DDP. D2Go's `distributed` is now built on top of `mobile_cv`'s `distributed_helper`.

Reviewed By: newstzpz

Differential Revision: D36787336

fbshipit-source-id: 640c9dcff5eec534e7894c75cfdf0a12d21c297e


02 Jun, 2022 1 commit

Separate into API and Exporter · 24da990f

Miquel Jubert Hermoso authored Jun 02, 2022

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/238

*This diff is part of a stack which has the goal of "buckifying" D2Go core and enabling autodeps and other tooling. The last diff in the stack introduces the TARGETS. The diffs earlier in the stack are resolving circular dependencies and other issues which prevent the buckification from occurring.*

Following the comments in an abandoned diff, split the export code into two files, which will have their corresponding dependencies: exporter and api. api.py contains the components which have few dependencies, so it can be imported basically anywhere without circular dependencies.

exporter.py contains the utilities which are used for export operations, for example in the exporter binary.

Reviewed By: mcimpoi

Differential Revision: D36166603

fbshipit-source-id: 25ded0b3925464c05be4048472a4c2ddcdb17ecf


15 May, 2022 1 commit

apply import merging for fbcode (7 of 11) · b3a9204c

John Reese authored May 15, 2022

Summary:
Applies new import merging and sorting from µsort v1.0.

When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the dynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.

Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.

For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting

Reviewed By: lisroach

Differential Revision: D36402205

fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366


14 May, 2022 1 commit

refactor setup for lightning_train_net · 6e8e4256

Yanghan Wang authored May 13, 2022

Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/242

Reviewed By: newstzpz

Differential Revision: D36297282

fbshipit-source-id: 8efb19b3186f6978283f4e17e0628b55c2ec816e


24 Mar, 2022 1 commit

refactor exporter and eval command line tools · 744d72d7

Tsahi Glik authored Mar 24, 2022

Summary: Tweak exporter and evaluator cli entry point func to support calling it as a module with args from custom launching code.

Reviewed By: sstsai-adl

Differential Revision: D35035813

fbshipit-source-id: c8b24099e94ccc58c184f8aac95b2a24a137e86a


10 Mar, 2022 1 commit

upgrade lightning's API for Trainer in d2go · 82c6a50b

Haroun Habeeb authored Mar 09, 2022

Summary:
see https://fb.workplace.com/notes/3006074566389155

----
did the integration test not catch this?

Reviewed By: ananthsub, tangbinh

Differential Revision: D34665501

fbshipit-source-id: ff2cbfa9462f131455dce46a0c413c4c69105f48


05 Mar, 2022 1 commit

fix cli arg parsing · a578044f

Yanghan Wang authored Mar 04, 2022

Summary: fix D34540275 (https://github.com/facebookresearch/d2go/commit/d8bdc633ec66e6ce73076d027f8e777791c2e067)

Reviewed By: tglik

Differential Revision: D34662745

fbshipit-source-id: 6fd67db041fab6f5810763702e4cc3f16a08c5df


03 Mar, 2022 1 commit

Integrate AIEnv with D2Go train_net · d8bdc633

Tsahi Glik authored Mar 02, 2022

Summary:
Add support in d2go.distributed for the `env://` init method. Use env variables as specified in https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization to initialize distributed params.

Also change the train_net cli function signature to accept an args list instead of only using `sys.argv`, to allow calling this function from the AIEnv launcher.
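
For reference, the `env://` method relies on the `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, and `RANK` environment variables listed in the PyTorch documentation linked above. The sketch below only shows reading them; `DistEnv` and `read_dist_env` are hypothetical names, not part of d2go.distributed.

```python
import os
from typing import Mapping, NamedTuple


class DistEnv(NamedTuple):
    master_addr: str
    master_port: int
    world_size: int
    rank: int


def read_dist_env(environ: Mapping[str, str] = os.environ) -> DistEnv:
    """Collect the variables that env:// initialization expects."""
    return DistEnv(
        master_addr=environ["MASTER_ADDR"],
        master_port=int(environ["MASTER_PORT"]),
        world_size=int(environ["WORLD_SIZE"]),
        rank=int(environ["RANK"]),
    )


# A plain dict stands in for the process environment in this example.
example = read_dist_env(
    {
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": "29500",
        "WORLD_SIZE": "2",
        "RANK": "1",
    }
)
```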

Differential Revision: D34540275

fbshipit-source-id: 7f718aed4c010b0ac8347d43b5ca5b401210756c


14 Feb, 2022 1 commit

D2Go Fail Fast: Move exception coming from not implemented "compare accuracy" feature to the top. · eee4dfc1

Tugrul Savran authored Feb 14, 2022

Summary:
Currently, the exporter method takes in a compare_accuracy parameter, which after all the compute (exporting etc.) raises an exception if it is set to True.

This looks like an antipattern, and causes a waste of compute.

Therefore, I am proposing to raise the exception at the very beginning of method call to let the client know in advance that this argument's functionality isn't implemented yet.

NOTE: We might also choose to get rid of the entire parameter. I am open to suggestions.
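
The fail-fast idea can be sketched as below. `export_model` and its `log` argument are hypothetical; the appended `"export"` entry merely stands in for the expensive export work.

```python
def export_model(model, compare_accuracy=False, log=None):
    """Hypothetical exporter that validates its arguments before doing any work."""
    log = log if log is not None else []
    # Fail fast: reject the unimplemented option before any expensive step runs.
    if compare_accuracy:
        raise NotImplementedError("compare_accuracy is not implemented yet")
    log.append("export")  # stands in for the expensive export work
    return log


steps = []
try:
    export_model("model", compare_accuracy=True, log=steps)
except NotImplementedError as err:
    message = str(err)
```

Because the check runs first, `steps` stays empty when the exception is raised, so no compute is spent before the caller learns the option is unsupported.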

Differential Revision: D34186578

fbshipit-source-id: d7fbe7589dfe2d2f688b870885ca61e6829c9329


08 Jan, 2022 1 commit

Add deprecation path for renamed training type plugins (#11227) · fcd51171

Binh Tang authored Jan 08, 2022

Summary:
### New commit log messages
  4eede7c30 Add deprecation path for renamed training type plugins (#11227)

Reviewed By: edward-io, daniellepintz

Differential Revision: D33409991

fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0


06 Jan, 2022 1 commit

Rename `DDPPlugin` to `DDPStrategy` (#11142) · aeb15613

Binh Tang authored Jan 05, 2022

Summary:
### New commit log messages
  b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142)

Reviewed By: jjenniferdai

Differential Revision: D33259306

fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0


29 Dec, 2021 1 commit

fix import error for DDPPlugin in oss · 62a97445

Yanghan Wang authored Dec 29, 2021

Summary: DDPPlugin has been renamed to DDPStrategy (as part of https://github.com/PyTorchLightning/pytorch-lightning/issues/10549), causing oss CI to fail. Simply skipping the import to unblock CI since DDP feature is not used in test.

Reviewed By: kazhang

Differential Revision: D33351636

fbshipit-source-id: 7a1881c8cd48d9ff17edd41137d27a976103fdde


25 Nov, 2021 1 commit

don't hard code device=cpu in exporter · 6d3a5fdb

Yuxin Wu authored Nov 25, 2021

Summary: make it an option

Differential Revision: D32601981

fbshipit-source-id: 308a0c49939531d840914aa8e256aae6db463929


24 Sep, 2021 1 commit

deprecate terminate_on_nan in pytorch lightning's default trainer config · 1ce9e124

Lei Tian authored Sep 24, 2021

Summary: deprecate terminate_on_nan in pytorch lightning's default trainer config

Reviewed By: kazhang, wat3rBro

Differential Revision: D30910709

fbshipit-source-id: cb22c1f5f1cf3a3236333f21be87756d3f657f78


18 Sep, 2021 1 commit

show stack trace when export errors happen · 81328bf2

Yuxin Wu authored Sep 17, 2021

Differential Revision: D30973518

fbshipit-source-id: fbdfb862ab23d5141553499471f92d2218addf91


09 Sep, 2021 1 commit

enable black for mobile-vision · 82295dbf

Yanghan Wang authored Sep 08, 2021

Summary:
https://fb.workplace.com/groups/pythonfoundation/posts/2990917737888352

Remove `mobile-vision` from opt-out list; leaving `mobile-vision/SNPE` opted out because of 3rd-party code.

arc lint --take BLACK --apply-patches --paths-cmd 'hg files mobile-vision'

allow-large-files

Reviewed By: sstsai-adl

Differential Revision: D30721093

fbshipit-source-id: 9e5c16d988b315b93a28038443ecfb92efd18ef8


09 Jul, 2021 1 commit

Add BoltNN conversion to d2go exporter · ecf832da

Mircea Cimpoi authored Jul 09, 2021

Summary:
Added predictor_type `boltnn_int8` to export to BoltNN via torch delegate.

- `int8` needs to be in the name, otherwise the post-train quantization won't happen;

```
cfg.QUANTIZATION.BACKEND = "qnnpack"
# cfg.QUANTIZATION.CUSTOM_QSCHEME = "per_tensor_affine"
```

It seems that `QUANTIZATION.CUSTOM_QSCHEME per_tensor_affine` is not needed; it is likely covered by "qnnpack".

Reviewed By: wat3rBro

Differential Revision: D29106043

fbshipit-source-id: 865ac5af86919fe7b4530b48433a1bd11e295bf4


07 Jul, 2021 1 commit

Set find_unused_parameters according to DDP_FIND_UNUSED_PARAMETERS · 236b15cd

Daniel Li (AI) authored Jul 07, 2021

Summary: Set find_unused_parameters according to DDP_FIND_UNUSED_PARAMETERS with DDPPlugin

Reviewed By: kazhang

Differential Revision: D29567013

fbshipit-source-id: f3ffac566a2ff046f55e692b3b24f9531913d4d4


09 Jun, 2021 1 commit

allow for multiple datasets for test data loader creation · fc690b45

Sam Tsai authored Jun 09, 2021

Summary: Use all training datasets for export instead of just the first. This supports use cases where there are only a small number of images per JSON but many JSONs. Since calibration uses the first dataset, it is limited by the number of images in a single dataset.

Reviewed By: ppwwyyxx

Differential Revision: D28902673

fbshipit-source-id: f80146b02d2d1bc04703fbb21ef410f5e26ba64c


07 Jun, 2021 1 commit

Disable replace_sampler_ddp · 20347488

Kai Zhang authored Jun 07, 2021

Summary: Detectron2 and D2Go use a custom sampler, so we don't need Lightning to add a distributed sampler.

Reviewed By: ananthsub

Differential Revision: D28921092

fbshipit-source-id: ec8f310d0590ed92227935b979d59a06d7fb7a69


25 May, 2021 2 commits

fix for checking device type · bf395ce5

Kai Zhang authored May 25, 2021

Summary: Currently we are checking if MODEL.DEVICE is "gpu", but DEVICE could also be "cuda". This diff checks if the device is "cpu" instead.
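
The difference between the two checks can be illustrated with a small sketch. Both function names are hypothetical, and the strings are plain device names rather than values read from a real config.

```python
def runs_on_accelerator(device: str) -> bool:
    """New-style check: anything other than "cpu" counts as an accelerator."""
    return device.lower() != "cpu"


def runs_on_accelerator_old(device: str) -> bool:
    """Old-style check: only recognizes the literal string "gpu"."""
    return device.lower() == "gpu"


checks = {d: runs_on_accelerator(d) for d in ("cpu", "cuda", "gpu")}
```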

Reviewed By: wat3rBro

Differential Revision: D28689547

fbshipit-source-id: 7512d32b7c08b0dcdc6487c6c2f1703655e64b19


Read number of processes from dist_config · 29b57165

Kai Zhang authored May 24, 2021

Summary: Currently when launching a training flow, we read the number of processes from resources.num_gpus. To be backward compatible with existing D2Go training config, this diff changes it to dist_config.num_processes_per_machine instead.

Reviewed By: wat3rBro

Differential Revision: D28630334

fbshipit-source-id: 3c684cd56e5d2e247c7b82e1d1eeff0f39e59ee4


22 May, 2021 2 commits

support FP16 gradient compression · 57809b0f

Zhicheng Yan authored May 21, 2021

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/70

DDP supports an fp16_compress_hook which compresses the gradient to FP16 before communication. This can result in a significant speed up.

Add one argument `_C.MODEL.DDP_FP16_GRAD_COMPRESS` to trigger it.

Reviewed By: zhanghang1989

Differential Revision: D28467701

fbshipit-source-id: 3c80865222f48eb8fe6947ea972448c445ee3ef3


Revert D27881742: Enable inference config in export step · daf37a84

Yanghan Wang authored May 21, 2021

Differential Revision:
D27881742 (https://github.com/facebookresearch/d2go/commit/90aff5daf608473dd312b300db8615326fa40a37)

Original commit changeset: 34a3ab7a88f4

fbshipit-source-id: 42c03b4f2b69c656b26774a4665b84b832262650


21 May, 2021 1 commit

Enable inference config in export step · 90aff5da

Sanjeev Kumar authored May 21, 2021

Summary:
- Enable sdk inference config specification in export step. This enables adding the sdk configuration as part of model file in the export step. The sdk config can be specified as infernece_config.yaml and is zipped together with torchscript model. The main goal of sdk configuration is to control the model inference behavior with model.
- SDK inference config design doc: https://docs.google.com/document/d/1j5qx8IrnFg1DJFzTnu4W8WmXFYJ-AgCDfSQHb2ACJsk/edit
- One click fblearner pipeline is in next diff on the stack

Differential Revision: D27881742

fbshipit-source-id: 34a3ab7a88f456b74841cf671ea1b3f678cdb733


13 May, 2021 1 commit

Auto scale config for multi-node training · e87ed5f0

Kai Zhang authored May 13, 2021

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/62

Lightning trainer sets max step to cfg.SOLVER.MAX_ITER. However, this is the max iteration for all nodes; in multi-node training, we need to scale it down, as well as the eval period and other configs.
This diff calls `auto_scale_world_size` before passing the config to the trainer.
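
To make the scaling concrete, here is a simplified sketch. `scale_for_world_size`, the nested-dict config, and the idea of a reference world size are assumptions for illustration; the real `auto_scale_world_size` may use different rules and adjust other fields.

```python
import copy


def scale_for_world_size(cfg, reference_world_size, new_world_size):
    """Hypothetical helper: shrink iteration-based settings as the world size grows."""
    scale = new_world_size / reference_world_size
    scaled = copy.deepcopy(cfg)  # leave the caller's config untouched
    scaled["SOLVER"]["MAX_ITER"] = int(round(cfg["SOLVER"]["MAX_ITER"] / scale))
    scaled["TEST"]["EVAL_PERIOD"] = int(round(cfg["TEST"]["EVAL_PERIOD"] / scale))
    return scaled


cfg = {"SOLVER": {"MAX_ITER": 90000}, "TEST": {"EVAL_PERIOD": 5000}}
# Going from 8 to 32 processes gives a scale factor of 4.
scaled_cfg = scale_for_world_size(cfg, reference_world_size=8, new_world_size=32)
```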

Reviewed By: wat3rBro

Differential Revision: D28140877

fbshipit-source-id: 2639ae58773a4ec2a0cc59dfefb2f5d9b1afe1a8


17 Apr, 2021 2 commits

Delegate to model's customization · aeb24a92

Kai Zhang authored Apr 17, 2021

Summary: Delegate FX quantization callback's customization to model.

Reviewed By: wat3rBro

Differential Revision: D27669212

fbshipit-source-id: 2715546cf03134896da6f95ecddaf8503ff95d0b


E2E QAT Workflow on Lightning · 845d0b2c

Kai Zhang authored Apr 17, 2021

Summary:
As per title, and sanity-test the E2E QAT workflow on Lightning Trainer.

- add `post_training_opts`. This is required to use `all_steps_qat.json` with Lightning. We don't actually support the post_training_opts in this diff though; we leave it as part of T83437359.
- Update .yaml to specify the Quantize-able modules.
- Update `lightning_train_net.py` to use the QuantizationAwareTraining callback.

Reviewed By: kandluis

Differential Revision: D26304879

fbshipit-source-id: 948bef4817d385d8a0969e4990d7f17ecd6994b7
