- 14 Feb, 2022 1 commit

Tugrul Savran authored

Summary: Currently, the exporter method takes a compare_accuracy parameter and, only after all the compute (exporting, etc.) is done, raises an exception if it is set to True. This is an antipattern and wastes compute. I am therefore proposing to raise the exception at the very beginning of the method call, so the client knows up front that this argument's functionality isn't implemented yet.

NOTE: We might also choose to get rid of the entire parameter. I am open to suggestions.

Differential Revision: D34186578

fbshipit-source-id: d7fbe7589dfe2d2f688b870885ca61e6829c9329

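The fail-fast guard described above is a small pattern worth showing; a minimal sketch, assuming an exporter entry point shaped roughly like this (all names other than compare_accuracy are hypothetical):

```
def export_model(model, output_dir, compare_accuracy=False):
    # Fail fast: reject the unimplemented option before any expensive
    # work (tracing, serialization, etc.) happens.
    if compare_accuracy:
        raise NotImplementedError("compare_accuracy is not implemented yet")
    # ... the actual (expensive) export work would follow here ...
```
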
- 08 Jan, 2022 1 commit

Binh Tang authored

Summary:

### New commit log messages

4eede7c30 Add deprecation path for renamed training type plugins (#11227)

Reviewed By: edward-io, daniellepintz

Differential Revision: D33409991

fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0

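A deprecation path for a renamed class usually keeps the old name importable as a thin alias that warns; a generic sketch of the pattern (not PyTorch Lightning's actual implementation):

```
import warnings

class DDPStrategy:
    """New, preferred name."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

class DDPPlugin(DDPStrategy):
    """Deprecated alias kept so existing imports keep working."""

    def __init__(self, **kwargs):
        warnings.warn(
            "`DDPPlugin` is deprecated, use `DDPStrategy` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(**kwargs)
```
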
- 06 Jan, 2022 1 commit

Binh Tang authored

Summary:

### New commit log messages

b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142)

Reviewed By: jjenniferdai

Differential Revision: D33259306

fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0

- 29 Dec, 2021 1 commit

Yanghan Wang authored

Summary: DDPPlugin has been renamed to DDPStrategy (as part of https://github.com/PyTorchLightning/pytorch-lightning/issues/10549), causing the OSS CI to fail. Simply skip the import to unblock CI, since the DDP feature is not used in the test.

Reviewed By: kazhang

Differential Revision: D33351636

fbshipit-source-id: 7a1881c8cd48d9ff17edd41137d27a976103fdde

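A common shape for skipping a test when an import is missing or was renamed in the installed version; the test and class names here are assumptions, not the actual d2go test:

```
import unittest

try:
    from pytorch_lightning.plugins import DDPPlugin  # renamed upstream
except ImportError:
    DDPPlugin = None

class TestLightningTrain(unittest.TestCase):
    @unittest.skipIf(DDPPlugin is None, "DDPPlugin not available")
    def test_ddp_training(self):
        ...
```
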
- 25 Nov, 2021 1 commit

Yuxin Wu authored

Summary: Make it an option.

Differential Revision: D32601981

fbshipit-source-id: 308a0c49939531d840914aa8e256aae6db463929

- 24 Sep, 2021 1 commit

Lei Tian authored

Summary: Deprecate `terminate_on_nan` in PyTorch Lightning's default trainer config.

Reviewed By: kazhang, wat3rBro

Differential Revision: D30910709

fbshipit-source-id: cb22c1f5f1cf3a3236333f21be87756d3f657f78

- 18 Sep, 2021 1 commit

Yuxin Wu authored

Differential Revision: D30973518

fbshipit-source-id: fbdfb862ab23d5141553499471f92d2218addf91

- 09 Sep, 2021 1 commit

Yanghan Wang authored

Summary: https://fb.workplace.com/groups/pythonfoundation/posts/2990917737888352

Remove `mobile-vision` from the opt-out list, leaving `mobile-vision/SNPE` opted out because of 3rd-party code.

arc lint --take BLACK --apply-patches --paths-cmd 'hg files mobile-vision'

allow-large-files

Reviewed By: sstsai-adl

Differential Revision: D30721093

fbshipit-source-id: 9e5c16d988b315b93a28038443ecfb92efd18ef8

- 09 Jul, 2021 1 commit

Mircea Cimpoi authored

Summary: Added predictor_type `boltnn_int8` to export to BoltNN via torch delegate.

- `int8` needs to be in the name, otherwise the post-train quantization won't happen;

```
cfg.QUANTIZATION.BACKEND = "qnnpack"
# cfg.QUANTIZATION.CUSTOM_QSCHEME = "per_tensor_affine"
```

Seems that `QUANTIZATION.CUSTOM_QSCHEME per_tensor_affine` is not needed - likely covered by "qnnpack".

Reviewed By: wat3rBro

Differential Revision: D29106043

fbshipit-source-id: 865ac5af86919fe7b4530b48433a1bd11e295bf4

- 07 Jul, 2021 1 commit

Daniel Li (AI) authored

Summary: Set find_unused_parameters according to DDP_FIND_UNUSED_PARAMETERS when using DDPPlugin.

Reviewed By: kazhang

Differential Revision: D29567013

fbshipit-source-id: f3ffac566a2ff046f55e692b3b24f9531913d4d4

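Roughly what wiring the config flag into the plugin looks like; a sketch assuming the flag lives at cfg.MODEL.DDP_FIND_UNUSED_PARAMETERS and that the rest of the trainer setup happens elsewhere:

```
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

def build_trainer(cfg):
    # Forward the config flag to DDP instead of hard-coding it; DDPPlugin
    # passes the kwarg through to torch's DistributedDataParallel.
    plugin = DDPPlugin(
        find_unused_parameters=bool(cfg.MODEL.DDP_FIND_UNUSED_PARAMETERS)
    )
    return Trainer(plugins=[plugin])
```
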
- 09 Jun, 2021 1 commit

Sam Tsai authored

Summary: Use all training datasets for export instead of just the first one. This supports use cases where there are only a few images per json but a large number of jsons. Since calibration uses the first dataset, it is otherwise limited by the number of images in a single dataset.

Reviewed By: ppwwyyxx

Differential Revision: D28902673

fbshipit-source-id: f80146b02d2d1bc04703fbb21ef410f5e26ba64c

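The gist of the change as a sketch; build_loader is a hypothetical stand-in for d2go's data-loading helpers, and the point is iterating every entry of cfg.DATASETS.TRAIN rather than indexing the first:

```
from itertools import chain

def build_calibration_data(cfg, build_loader):
    # Before: calibration drew only from the first training dataset,
    # capping it at a single json's worth of images.
    # loader = build_loader(cfg.DATASETS.TRAIN[0])

    # After: chain all training datasets so calibration can draw
    # images from every json.
    loaders = [build_loader(name) for name in cfg.DATASETS.TRAIN]
    return chain.from_iterable(loaders)
```
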
- 07 Jun, 2021 1 commit

Kai Zhang authored

Summary: Detectron2 and D2Go use custom samplers, so we don't need Lightning to add a distributed sampler.

Reviewed By: ananthsub

Differential Revision: D28921092

fbshipit-source-id: ec8f310d0590ed92227935b979d59a06d7fb7a69

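In Lightning this is a single Trainer flag; a minimal sketch using the flag name from that era (it was renamed in later Lightning releases):

```
from pytorch_lightning import Trainer

# d2go builds its own distributed-aware samplers, so tell Lightning not
# to wrap the dataloaders with its own DistributedSampler.
trainer = Trainer(replace_sampler_ddp=False)
```
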
- 25 May, 2021 2 commits

Kai Zhang authored

Summary: Currently we check whether MODEL.DEVICE is "gpu", but the device could also be "cuda". This diff checks whether the device is "cpu" instead.

Reviewed By: wat3rBro

Differential Revision: D28689547

fbshipit-source-id: 7512d32b7c08b0dcdc6487c6c2f1703655e64b19

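The inverted check in sketch form (the helper name is hypothetical):

```
def use_accelerator(cfg):
    # Testing for "gpu" misses configs that say "cuda" (or "cuda:0");
    # testing for "cpu" instead covers every non-CPU spelling.
    return cfg.MODEL.DEVICE.lower() != "cpu"
```
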
Kai Zhang authored

Summary: Currently, when launching a training flow, we read the number of processes from resources.num_gpus. To be backward compatible with existing D2Go training configs, this diff reads dist_config.num_processes_per_machine instead.

Reviewed By: wat3rBro

Differential Revision: D28630334

fbshipit-source-id: 3c684cd56e5d2e247c7b82e1d1eeff0f39e59ee4

- 22 May, 2021 2 commits

Zhicheng Yan authored

Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/70

DDP supports an fp16_compress_hook which compresses gradients to FP16 before communication. This can result in a significant speedup. Add one argument `_C.MODEL.DDP_FP16_GRAD_COMPRESS` to trigger it.

Reviewed By: zhanghang1989

Differential Revision: D28467701

fbshipit-source-id: 3c80865222f48eb8fe6947ea972448c445ee3ef3

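The hook itself is a stock PyTorch comm hook; a sketch of gating its registration on the new config flag (the wrapper function is hypothetical, the hook and registration API are torch.distributed's own):

```
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
from torch.nn.parallel import DistributedDataParallel

def maybe_enable_fp16_compression(cfg, ddp_model: DistributedDataParallel):
    # When enabled, gradients are cast to FP16 for the all-reduce and
    # decompressed afterwards, halving communication volume.
    if cfg.MODEL.DDP_FP16_GRAD_COMPRESS:
        ddp_model.register_comm_hook(
            state=None, hook=default_hooks.fp16_compress_hook
        )
```
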
Yanghan Wang authored

Differential Revision: D27881742 (https://github.com/facebookresearch/d2go/commit/90aff5daf608473dd312b300db8615326fa40a37)

Original commit changeset: 34a3ab7a88f4

fbshipit-source-id: 42c03b4f2b69c656b26774a4665b84b832262650

- 21 May, 2021 1 commit

Sanjeev Kumar authored

Summary:

- Enable SDK inference config specification in the export step. This adds the SDK configuration to the model file during export. The SDK config can be specified as inference_config.yaml and is zipped together with the torchscript model. The main goal of the SDK configuration is to control the model's inference behavior alongside the model itself.
- SDK inference config design doc: https://docs.google.com/document/d/1j5qx8IrnFg1DJFzTnu4W8WmXFYJ-AgCDfSQHb2ACJsk/edit
- The one-click fblearner pipeline is in the next diff on the stack.

Differential Revision: D27881742

fbshipit-source-id: 34a3ab7a88f456b74841cf671ea1b3f678cdb733

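One way to ship a side-car config inside a TorchScript file is the `_extra_files` mechanism; a sketch under the assumption that this (rather than an external zip) is the packaging used, with a hypothetical function name:

```
import torch

def export_with_inference_config(scripted_model, yaml_text, out_path):
    # Bundle the SDK inference config into the TorchScript zip so the
    # model file stays self-describing.
    torch.jit.save(
        scripted_model, out_path,
        _extra_files={"inference_config.yaml": yaml_text},
    )

# On load, the same dict is filled in place:
# extra = {"inference_config.yaml": ""}
# model = torch.jit.load(out_path, _extra_files=extra)
```
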
- 13 May, 2021 1 commit

Kai Zhang authored

Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/62

The Lightning trainer sets max steps to cfg.SOLVER.MAX_ITER. However, this is the max iteration count across all nodes; in multi-node training we need to scale it down, along with the eval period and other configs. This diff calls `auto_scale_world_size` before passing the config to the trainer.

Reviewed By: wat3rBro

Differential Revision: D28140877

fbshipit-source-id: 2639ae58773a4ec2a0cc59dfefb2f5d9b1afe1a8

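The idea in a much-simplified sketch (the real auto_scale_world_size also handles a reference world size, learning rate, warmup, and more; a yacs-style cfg is assumed):

```
def scale_config_for_world_size(cfg, world_size):
    # MAX_ITER and EVAL_PERIOD are global budgets; with `world_size`
    # processes stepping in lockstep, each process should run a
    # 1/world_size share of them.
    cfg.defrost()
    cfg.SOLVER.MAX_ITER //= world_size
    cfg.TEST.EVAL_PERIOD //= world_size
    cfg.freeze()
    return cfg
```
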
- 17 Apr, 2021 2 commits

Kai Zhang authored

Summary: Delegate the FX quantization callback's customization to the model.

Reviewed By: wat3rBro

Differential Revision: D27669212

fbshipit-source-id: 2715546cf03134896da6f95ecddaf8503ff95d0b

Kai Zhang authored

Summary: As per title, and sanity-test the E2E QAT workflow on the Lightning Trainer.

- Add `post_training_opts`. This is required to use `all_steps_qat.json` with Lightning. We don't actually support post_training_opts in this diff, though - we leave it as part of T83437359.
- Update the .yaml to specify the quantizable modules.
- Update `lightning_train_net.py` to use the QuantizationAwareTraining callback.

Reviewed By: kandluis

Differential Revision: D26304879

fbshipit-source-id: 948bef4817d385d8a0969e4990d7f17ecd6994b7

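Lightning shipped a QuantizationAwareTraining callback at the time; a sketch of hooking it into a trainer (the qconfig value is an assumption, matching the qnnpack backend used elsewhere in this log):

```
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import QuantizationAwareTraining

# Insert fake-quantization observers during training so the fitted
# model can be converted to int8 afterwards.
qat = QuantizationAwareTraining(qconfig="qnnpack")
trainer = Trainer(callbacks=[qat])
```
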
- 15 Apr, 2021 1 commit

Alexander Pivovarov authored

Summary: Fix typos in exporter.

Pull Request resolved: https://github.com/facebookresearch/d2go/pull/45

Reviewed By: wat3rBro

Differential Revision: D27779963

Pulled By: zhanghang1989

fbshipit-source-id: bcf7922afe6d4cccc074615069538eb5a6098b98

- 09 Apr, 2021 1 commit

Ananth Subramaniam authored

Summary: Before: this test assumed that only 2 checkpoints were stored: `last.ckpt` and `FINAL_MODEL_CKPT`. Now: this test asserts that at least these 2 checkpoints are stored.

If the config specifies `save_top_k=-1`, for instance, we save more checkpoints, which caused this test to fail. Since the test only loads the last and final outputs, I'm changing it to assert that these two checkpoints must be saved, while ignoring any other checkpoint files that may be generated.

Reviewed By: kazhang

Differential Revision: D27671284

fbshipit-source-id: 0419fb46856d048e7b6eba3ff1dc65b7280a9a90

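The relaxed assertion amounts to a subset check; a sketch with hypothetical names for the output directory and the final-checkpoint constant:

```
import os

FINAL_MODEL_CKPT = "model_final.ckpt"  # assumed name, for illustration

def assert_required_checkpoints(output_dir):
    produced = set(os.listdir(output_dir))
    required = {"last.ckpt", FINAL_MODEL_CKPT}
    # At least these two must exist; extra checkpoints (e.g. from
    # save_top_k=-1) are tolerated rather than failing the test.
    assert required <= produced, f"missing checkpoints: {required - produced}"
```
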
- 24 Mar, 2021 2 commits

Kai Zhang authored

Summary: Evaluate the predictor generated by the previous step. This diff modifies lightning_train_net to reuse the evaluation logic by adding a `predictor_path` param. It also makes the Lightning training backend depend on `cfg.MODEL.DEVICE`, so that in the evaluate_predictor step the user can choose the backend by changing the model device. This is useful for evaluating int8 quantized models.

Reviewed By: newstzpz

Differential Revision: D27150609

fbshipit-source-id: fb72da3e81db932c0fa479350150720143e09a3e

Kai Zhang authored

Summary: Given that the ways to create a D2Go runner and a Lightning task are different, get_class was introduced so that in applications we could do:

```
if is_lightning:
    task_cls = get_class(classname)
    task = task_cls(cfg)
else:
    runner = create_runner(classname)
```

It turns out we would need to do that in many places: workflows, binaries. This diff reverts `get_class` and returns the class from `create_runner` if the class is a Lightning module.

Reviewed By: newstzpz

Differential Revision: D26676595

fbshipit-source-id: c3ce2016d09fe073af4c2dd9f98eea4e59ca621b

- 17 Mar, 2021 1 commit

Hang Zhang authored

Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/24

Reviewed By: wat3rBro

Differential Revision: D27127642

Pulled By: zhanghang1989

fbshipit-source-id: 18bc3c2fa05232cacc778925db6b7dcea99b108c

- 09 Mar, 2021 1 commit

Yanghan Wang authored

Reviewed By: newstzpz

Differential Revision: D26072333

fbshipit-source-id: 6727b34458d410e904045aa58f81c3e09111882a

- 07 Mar, 2021 1 commit

Hang Zhang authored

Summary: Fixes https://github.com/facebookresearch/d2go/issues/9

Pull Request resolved: https://github.com/facebookresearch/d2go/pull/13

Reviewed By: wat3rBro

Differential Revision: D26870048

Pulled By: zhanghang1989

fbshipit-source-id: 29298bca7a59aad214976aaa37461e3d316132d8

- 04 Mar, 2021 1 commit

RangiLyu authored

Summary: Change "depoyment" to "deployment" in README.md. Change "datasest" to "datasets" in tools/exporter.py.

Pull Request resolved: https://github.com/facebookresearch/d2go/pull/7

Reviewed By: newstzpz

Differential Revision: D26821039

Pulled By: zhanghang1989

fbshipit-source-id: 5056d15c877c4b3d771d33267139e73f1527da21

- 03 Mar, 2021 2 commits

Kai Zhang authored

Summary: As titled. The OSS version only uses PyTorch Lightning, while the internal version leverages some extra features (e.g., Manifold integration, every_n_step checkpointing). This diff splits train_net.main into smaller functions so that they can be shared between the OSS and internal versions.

Reviewed By: zhanghang1989

Differential Revision: D26752701

fbshipit-source-id: 7f68e2a81e78193e117517a0ff668ab14b76ea65

facebook-github-bot authored

fbshipit-source-id: f4a8ba78691d8cf46e003ef0bd2e95f170932778