- 29 Jun, 2021 3 commits
-
-
Arman Kapbasov authored
Summary: Updated the load_from_checkpoint method call inside lightning_task.py to include an extra 'strict' keyword parameter. Reviewed By: kazhang Differential Revision: D29446372 fbshipit-source-id: b14bc13db551f0876ca78d3ea164cfb08e71a757
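As a hedged sketch (the task class and checkpoint path below are placeholders, not the actual D2Go code), forwarding the flag looks roughly like this:

```python
import pytorch_lightning as pl


class MyLightningTask(pl.LightningModule):
    ...


# Hypothetical usage: pass `strict` through to Lightning's checkpoint loader so that
# missing/unexpected state-dict keys can be tolerated when strict=False.
task = MyLightningTask.load_from_checkpoint("path/to/last.ckpt", strict=False)
```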
-
Kai Zhang authored
Summary: A Lightning task for training StyleGAN2. Reviewed By: tax313 Differential Revision: D28922408 fbshipit-source-id: bdc9e7370de1b7b7ca9086bc6c0acbe66810d5f8
-
Kai Zhang authored
Summary: This diff introduces the D2Go GANs Lightning task for migrating D2Go's GANsRunner to a Lightning-based workflow. The Lightning task can directly work with the D2Go e2e workflow. Reviewed By: tax313 Differential Revision: D28165835 fbshipit-source-id: 4d9d679e188f9d5f9a46f01f7d34a8f30c3e170b
-
- 27 Jun, 2021 2 commits
-
-
Kai Zhang authored
Summary: Currently we move EMA weights to the expected device right after loading them from a checkpoint. However, by the time the on_load_checkpoint hook is called, the current GPU device has not been assigned yet. This can lead to EMA weights on cuda:0 while the model is on cuda:1. This diff moves the EMA weights to the device in `on_pretrain_routine_end` instead. Reviewed By: zhanghang1989 Differential Revision: D28429843 fbshipit-source-id: d864fb3687eb6958872300c5ec0af7ce90591f83
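A rough sketch of the idea (the class and attribute names below are illustrative, not the actual D2Go implementation), assuming an EMA state dict stored on the Lightning module:

```python
import torch
import pytorch_lightning as pl


class TaskWithEMA(pl.LightningModule):
    """Illustrative sketch of deferring the EMA device move."""

    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(4, 4)
        self.ema_state = None

    def on_load_checkpoint(self, checkpoint):
        # Too early to move EMA weights to the GPU: the per-process device has not
        # been assigned yet, so a .to("cuda") here would land everything on cuda:0.
        self.ema_state = checkpoint.get("ema_state")

    def on_pretrain_routine_end(self):
        # By now the module sits on its final device (e.g. cuda:1); move EMA here.
        if self.ema_state is not None:
            self.ema_state = {k: v.to(self.device) for k, v in self.ema_state.items()}
```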
-
Yuxin Wu authored
Reviewed By: zhanghang1989 Differential Revision: D29379832 fbshipit-source-id: 9283a8796a1dbee81b51611407c22f7d5a2069dc
-
- 26 Jun, 2021 1 commit
-
-
Kai Zhang authored
Summary: # Context In the post-training quantization callback, we make a deepcopy of the Lightning module before validation starts and prepare the copy with the FX quantization API. The callback keeps the prepared model inside it. # The problem The second time we run the validation epoch, we try to make a copy of the Lightning module, which has a reference to the trainer, which has a reference to the quantization callback, which holds a prepared model that is not deep-copyable. # Mitigation Delete the trainer before making a deepcopy. We're already doing that in stl/callbacks/quantization, but the changes were not ported into D2Go. Reviewed By: zhanghang1989 Differential Revision: D29409085 fbshipit-source-id: 24550124181673b2e567b2a04563bcdfb440e145
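A hedged sketch of the mitigation (the function name and trainer attribute handling are illustrative, not the actual D2Go code):

```python
import copy


def copy_for_quantization(pl_module):
    """Illustrative sketch: drop the trainer reference before deep-copying, then
    restore it, so the copy does not pull in the quantization callback's prepared
    (non-deep-copyable) model through the module -> trainer -> callback chain."""
    trainer = getattr(pl_module, "trainer", None)
    pl_module.trainer = None          # break the module -> trainer -> callback cycle
    try:
        module_copy = copy.deepcopy(pl_module)
    finally:
        pl_module.trainer = trainer   # restore the original reference
    return module_copy
```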
-
- 25 Jun, 2021 3 commits
-
-
Haricharan Lakshman authored
Summary: Convert the batchnorm layers that match the specified regular expressions to FrozenBatchNorm2d. If the module itself is a batchnorm instance and matches the regular expressions, return a new FrozenBatchNorm2d module. Otherwise, convert the matching batchnorm child modules to FrozenBatchNorm2d in place and return the main module. Reviewed By: ppwwyyxx Differential Revision: D29286500 fbshipit-source-id: 3a20f5eeff59ddff50c42fe297eedf0ce2b909bc
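A rough sketch of that conversion logic, assuming detectron2's FrozenBatchNorm2d (the helper name and traversal details are illustrative):

```python
import re

import torch.nn as nn
from detectron2.layers import FrozenBatchNorm2d

BN_TYPES = (nn.BatchNorm2d, nn.SyncBatchNorm)


def freeze_matching_bn(module: nn.Module, patterns, prefix: str = "") -> nn.Module:
    """Illustrative sketch: convert batchnorm (sub)modules whose qualified name
    matches any regex in `patterns` to FrozenBatchNorm2d."""
    if isinstance(module, BN_TYPES) and any(re.search(p, prefix) for p in patterns):
        # convert_frozen_batchnorm returns a new module when given a single BN layer.
        return FrozenBatchNorm2d.convert_frozen_batchnorm(module)
    for name, child in module.named_children():
        child_prefix = f"{prefix}.{name}" if prefix else name
        new_child = freeze_matching_bn(child, patterns, child_prefix)
        if new_child is not child:
            setattr(module, name, new_child)  # in-place replacement on the parent
    return module
```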
-
Luming Ma authored
Summary: Some annotations use XYXY_ABS bbox mode, so many images were incorrectly filtered out under the assumption of XYWH_ABS mode. This diff reads bbox_mode from each annotation and converts the bbox to XYWH_ABS before checking for invalid bboxes. Differential Revision: D29365700 fbshipit-source-id: 355346b6826f401f504691090631997e169ead4a
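A minimal sketch of that check using detectron2's BoxMode (the validity condition and defaults are illustrative):

```python
from detectron2.structures import BoxMode


def is_valid_annotation(anno, min_size: float = 1e-5) -> bool:
    """Illustrative sketch: normalize the bbox to XYWH_ABS before validating it,
    instead of assuming every annotation is already in XYWH_ABS."""
    bbox_mode = anno.get("bbox_mode", BoxMode.XYWH_ABS)
    x, y, w, h = BoxMode.convert(anno["bbox"], bbox_mode, BoxMode.XYWH_ABS)
    return w > min_size and h > min_size
```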
-
Sam Tsai authored
Summary: "@ [0-9]classes" is appended to datasets to mark whether it is a derived class of the original one and saved as a config. When reloading the config, the derived class name will be used as the source instead of the original source. Adding a check to remove the derived suffix. Reviewed By: wat3rBro Differential Revision: D29315132 fbshipit-source-id: 0cc204d305d2da6c9f1817aaf631270bd874f90d
-
- 24 Jun, 2021 1 commit
-
-
Zhicheng Yan authored
Summary: Major changes - As described in detail in appendix A.4 of the Deformable DETR paper (https://arxiv.org/abs/2010.04159), gradient back-propagation is blocked at inverse_sigmoid(bounding box x/y/w/h from the last decoder layer). This can be implemented by detaching the tensor from the compute graph in PyTorch. However, we currently detach the wrong tensor, which prevents updates to the layers that predict delta x/y/w/h. Fix this bug. - Add more comments annotating data types and tensor shapes in the code. This should NOT affect the actual implementation. Reviewed By: zhanghang1989 Differential Revision: D29048363 fbshipit-source-id: c5b5e89793c86d530b077a7b999769881f441b69
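A simplified schematic of the intended gradient blocking (not the actual Deformable DETR code; shapes and the inverse_sigmoid helper are placeholders):

```python
import torch


def inverse_sigmoid(x, eps: float = 1e-5):
    x = x.clamp(min=eps, max=1 - eps)
    return torch.log(x / (1 - x))


# Boxes predicted by the previous decoder layer (values in [0, 1]); block gradients
# through them, but NOT through the delta predicted by the current layer's bbox head.
prev_boxes = torch.rand(2, 100, 4)
delta = torch.randn(2, 100, 4, requires_grad=True)  # stands in for bbox_head(hs)

new_boxes = (inverse_sigmoid(prev_boxes.detach()) + delta).sigmoid()
new_boxes.sum().backward()
assert delta.grad is not None  # the delta-predicting layers still receive gradients
```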
-
- 23 Jun, 2021 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/90 Reviewed By: zhanghang1989 Differential Revision: D29279123 fbshipit-source-id: d94cea65bd439d54fd14afded0dba066799cedca
-
- 21 Jun, 2021 1 commit
-
-
Yuxin Wu authored
Summary: 1. Save 3 versions of the flop count, using both mobile_cv's flop counter and fvcore's flop counter 2. Print only a short summary table in the terminal, and save the rest to files. The `print_flops` function does not appear to be used anywhere, so this diff simply replaces it. TODO: enable this feature automatically for train/eval workflows in the next diff Reviewed By: zhanghang1989 Differential Revision: D29182412 fbshipit-source-id: bfa1dfad41b99fcda06b96c4732237b5e753f1bb
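For reference, a minimal example of the fvcore side of such a count (the model and input below are placeholders):

```python
import torch
from fvcore.nn import FlopCountAnalysis, flop_count_table

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
inputs = (torch.randn(1, 3, 224, 224),)

flops = FlopCountAnalysis(model, inputs)
print(flops.total())             # total count for the given input
print(flop_count_table(flops))   # short per-module table, suitable for the terminal
```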
-
- 20 Jun, 2021 1 commit
-
-
Albert Pumarola authored
Summary: Add create and train unit tests to OSS runner Reviewed By: zhanghang1989 Differential Revision: D29254417 fbshipit-source-id: f7c52b90b2bc7afa83a204895be149664c675e52
-
- 19 Jun, 2021 2 commits
-
-
Yanghan Wang authored
Reviewed By: leitian Differential Revision: D28363172 fbshipit-source-id: e69a71e6525dc9b76171b0cdc5f55ee8d188d6cc
-
Fu-Chen Chen authored
Summary: The dict `record` might not have the keys `"width"` or `"height"`. This diff checks whether `"width"` and `"height"` are in the dict `record` before getting the values. Reviewed By: sstsai-adl Differential Revision: D29243341 fbshipit-source-id: a1e0e343dd1afcced834c3732e64bb6f372fbd1a
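The guard is roughly of this shape (sketch only; the function name and fallback behavior are illustrative):

```python
def get_image_size(record):
    """Illustrative sketch: only read "width"/"height" when they are present."""
    if "width" in record and "height" in record:
        return record["width"], record["height"]
    return None  # caller decides how to handle records without size info
```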
-
- 16 Jun, 2021 4 commits
-
-
Luis Perez authored
Synchronize PyTorchLightning/pytorch-lightning (revision f7459f53@master) to github/third-party/PyTorchLightning/pytorch-lightning Summary: ## OSS Note these issues are being solved in OSS here: https://github.com/PyTorchLightning/pytorch-lightning/pull/7994/files# ## Manual - `speed_monitor.py` - `Result.unpack_batch_size` has been removed, moved to new implementation. - `fully_sharded.py` - There was a refactor for plugins, so updated corresponding function to keep reduced memory usage. - `hive_writing_classy.py`, `hive_writing_faim.py`, `hive_writing_xrayvideo.py` - Same as `speed_monitor.py`. - [Temporary] Uncommented misconfiguration exception. See https://github.com/PyTorchLightning/pytorch-lightning/pull/7882#pullrequestreview-683282719. - Update `TestModel` to detach appropriately. - Manually `detach` metrics stored in ResultStore. ## Automatic ### New commit log messages f7459f53 DeepSpeed Infinity Update (#7234) 03e7bdf8 Improve `LightningModule` hook tests (#7944) 3a0ed02b Properly handle parent modules w/ parameters in `BaseFinetuning` callback (#7931) ce93d8bc Handle errors due to uninitailized parameters (#7642) cca0e753 remove parsing comments (#7958) 898fb56b added on_test_start() documentation (#7962) 22d82661 Seed all workers when using DDP (#7942) 436fc53c Improve `LightningDataModule` hook test and fix `dataloader_idx` argument (#7941) 6b7b4047 deprecate hpc_load() and integrate it with restore() (#7955) 20a5e09e fix myst-parser warning blocking docs ci (#7967) f15ea601 update chlog + legacy chpt (#7954) 59d0c656 Add dataclass support to `apply_to_collection` (#7935) cdd01f32 LightningCLI support for argument links applied on instantiation (#7895) 6856cced Remove rank_zero_only on DataModule prepare_data (#7945) 96433d03 IPU Integration 5/5 (#7867) 42c7f272 refactor checkpoint loading for training type plugins (#7928) ac4eb0a0 `is_overridden` improvements (#7918) 9e932f4d Delete `on_after_backward` unused argument (#7925) 8b738693 Deprecate the default `EarlyStopping` callback monitor value (#7907) c1eac483 split `restore_training_state` into logical parts [2 / 2] (#7900) d209b689 split `restore_training_state` into logical parts [1 / 2] (#7901) 111287b4 add pre-commit hooks (#7906) 839019a3 Remove legacy teardown check in train loop (#7917) b45a89a2 Clean-up after logger connector redesign 2/2 (#7631) 07b69231 Remove fn check for ipu output (#7915) 580a3b5e Remove dead code (#7910) df812398 Clean-up after logger connector redesign 1/2 (#7909) ec4f8856 Enable logger connector re-design (#7891) 15be9865 add logger to __all__ (#6854) 6fee9262 Deprecate `LightningDataModule` lifecycle properties (#7657) 764d2c77 refactor CheckpointConnector.restore_weights (#7862) 7f4ef6d1 Fix logs overwriting issue for remote fs (#7889) c310ce66 Logger connector re-design `_Metadata.reduce_fx` fixes. (#7890) b214442e New logger connector code (#7882) Reviewed By: yifuwang Differential Revision: D29105294 fbshipit-source-id: 990b2a4a7333908d676de193f5ec930cb50b8a19
-
Kai Zhang authored
Summary: This diff logs D2Go model instantiation events to the table scuba_caffe2_pytorch_usage_stats, so that we can track model usage in fblearner, bento, local scripts, etc. Reviewed By: zhanghang1989 Differential Revision: D28986723 fbshipit-source-id: 3e865354e5884c9e82bd1b08819cc10d349f93bd
-
Sam Tsai authored
Summary: 1. Circular pattern segmentation points 2. Use circular pattern for kp patterns Reviewed By: wat3rBro Differential Revision: D29069224 fbshipit-source-id: c4c01d6d93de5abbdfceae07f1cd48fb56e05f57
-
Sam Tsai authored
Summary: Checks for invalid bounding boxes and excludes them from being included. Reviewed By: wat3rBro Differential Revision: D28902711 fbshipit-source-id: 1f017d6ccf5c959059bcb94a09ddd81de868feed
-
- 15 Jun, 2021 1 commit
-
-
Kai Zhang authored
Summary: As titled. Reviewed By: zhanghang1989 Differential Revision: D29075952 fbshipit-source-id: 6ef3dc35cd436c1fffb031ea59f20ca23afc5368
-
- 14 Jun, 2021 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/83 - Implement `prepare_for_export` for `SemanticSegmentor` - Add unit test comparing numerical matching Reviewed By: zhanghang1989 Differential Revision: D29088421 fbshipit-source-id: ccb86ac4b4b90a63eeebdbf76b2bf31c1da65a8b
-
- 12 Jun, 2021 1 commit
-
-
Zhicheng Yan authored
Summary: Major changes - Add a new runner `EgoDETRRunner` which inherits from the existing `DETRRunner` in the D2Go repo. - Add a new data mapper `EgoDETRDatasetMapper` which has a custom crop transform generator and supports generic data augmentation. Reviewed By: zhanghang1989 Differential Revision: D28895225 fbshipit-source-id: 4181ff8fce81df22a01d355fdff7e81e83d69e64
-
- 09 Jun, 2021 2 commits
-
-
Yanghan Wang authored
Summary: EZ Reviewed By: zhanghang1989 Differential Revision: D29000628 fbshipit-source-id: f954214dfe3a989fc145663f8bb1870812e78ce7
-
Sam Tsai authored
Summary: Use all training datasets for export instead of just the first. This supports use cases where there are only a few images per JSON but many JSONs: since calibration uses only the first dataset, it is otherwise limited by the number of images in a single dataset. Reviewed By: ppwwyyxx Differential Revision: D28902673 fbshipit-source-id: f80146b02d2d1bc04703fbb21ef410f5e26ba64c
-
- 07 Jun, 2021 1 commit
-
-
Kai Zhang authored
Summary: Detectron2 and D2Go use custom samplers, so we don't need Lightning to add a distributed sampler. Reviewed By: ananthsub Differential Revision: D28921092 fbshipit-source-id: ec8f310d0590ed92227935b979d59a06d7fb7a69
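In the Lightning versions of that era, this corresponds to the `replace_sampler_ddp` trainer flag; a hedged example (trainer arguments here are placeholders, not the actual D2Go setup):

```python
import pytorch_lightning as pl

# Detectron2 / D2Go build their own samplers, so keep Lightning from wrapping the
# dataloader's sampler in a DistributedSampler.
trainer = pl.Trainer(gpus=2, accelerator="ddp", replace_sampler_ddp=False)
```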
-
- 01 Jun, 2021 2 commits
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/77 - Reimplement `get_cfg_diff_table` by reusing other utils - Add a `reorder` option for `flatten_config_dict` - Remove the legacy BC support for `ARCH_DEF`, including `str_wrap_fbnet_arch_def` and the customized `merge_from_other_cfg`. - Move `temp_defrost` from `utils.py` to `config.py`, so there is no more namespace forwarding for `utils.py` - Merge `test_config_utils.py` and `test_configs.py` Reviewed By: zhanghang1989 Differential Revision: D28734493 fbshipit-source-id: 925f5944cf0e9019e4c54462e851ea16a5c94b8c
-
Yanghan Wang authored
Reviewed By: sanjeevk42 Differential Revision: D28346869 fbshipit-source-id: b226acf5ee5d90be4ea183dc7de92133db4d5717
-
- 27 May, 2021 1 commit
-
-
Tao Xu authored
Summary: Add an option to set the number of test images. Thus, during finetuning, we can set a small number of test images (for visualization purposes only) to save evaluation time. Reviewed By: leehomyc Differential Revision: D28720086 fbshipit-source-id: 8085be6a0f4f8742784e3dafe255716f3ae02acb
-
- 25 May, 2021 3 commits
-
-
Kai Zhang authored
Summary: Currently we check whether MODEL.DEVICE is "gpu", but the DEVICE could also be "cuda". This diff checks whether the device is "cpu" instead. Reviewed By: wat3rBro Differential Revision: D28689547 fbshipit-source-id: 7512d32b7c08b0dcdc6487c6c2f1703655e64b19
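The safer check tests against "cpu" rather than enumerating GPU spellings; a small sketch (the helper name is illustrative):

```python
def uses_gpu(device: str) -> bool:
    """Illustrative sketch: treat anything that is not "cpu" as a GPU device, instead
    of testing for the literal string "gpu" (which misses "cuda", "cuda:1", ...)."""
    return device.split(":")[0] != "cpu"
```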
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/75 Refactor the base test case - make test_dir valid throughout the test (rather than only within a local context), so individual tests can load back the exported model - refactor `custom_setup_test` for easier overriding - move parameterized into the base class to avoid copying the naming function Reviewed By: zhanghang1989 Differential Revision: D28651067 fbshipit-source-id: c59a311564f6114039e20ed3a23e5dd9c84f4ae4
-
Kai Zhang authored
Summary: Currently, when launching a training flow, we read the number of processes from resources.num_gpus. To be backward compatible with existing D2Go training configs, this diff changes that to dist_config.num_processes_per_machine instead. Reviewed By: wat3rBro Differential Revision: D28630334 fbshipit-source-id: 3c684cd56e5d2e247c7b82e1d1eeff0f39e59ee4
-
- 24 May, 2021 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/76 Detectron2GoRunner doesn't contain the configs for exporting RCNN models; use GeneralizedRCNNRunner instead Reviewed By: zhanghang1989 Differential Revision: D28652627 fbshipit-source-id: 6f324f608d8b2abdf98179a36e4b79837f135340
-
- 22 May, 2021 2 commits
-
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/70 DDP supports an fp16_compress_hook which compresses the gradient to FP16 before communication. This can result in a significant speed up. Add one argument `_C.MODEL.DDP_FP16_GRAD_COMPRESS` to trigger it. Reviewed By: zhanghang1989 Differential Revision: D28467701 fbshipit-source-id: 3c80865222f48eb8fe6947ea972448c445ee3ef3
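The underlying PyTorch hook it wires up looks roughly like this (the model and rank below are placeholders; this assumes the process group is already initialized):

```python
import torch
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks as default
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder setup: one process per GPU, torch.distributed already initialized.
local_rank = 0
model = torch.nn.Linear(8, 8).cuda(local_rank)

ddp_model = DDP(model, device_ids=[local_rank])
# Compress gradients to FP16 before all-reduce; they are decompressed after
# communication, which can significantly speed up gradient synchronization.
ddp_model.register_comm_hook(state=None, hook=default.fp16_compress_hook)
```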
-
Yanghan Wang authored
Differential Revision: D27881742 (https://github.com/facebookresearch/d2go/commit/90aff5daf608473dd312b300db8615326fa40a37) Original commit changeset: 34a3ab7a88f4 fbshipit-source-id: 42c03b4f2b69c656b26774a4665b84b832262650
-
- 21 May, 2021 3 commits
-
-
Ioannis Gatopoulos authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/73 Reviewed By: kazhang Differential Revision: D28610607 Pulled By: zhanghang1989 fbshipit-source-id: c079bc6fff64cd452db6750b3e984546c47fdfe7
-
Sanjeev Kumar authored
Summary: - Enable SDK inference config specification in the export step. This allows adding the SDK configuration as part of the model file during export. The SDK config can be specified as inference_config.yaml and is zipped together with the torchscript model. The main goal of the SDK configuration is to control the model's inference behavior alongside the model. - SDK inference config design doc: https://docs.google.com/document/d/1j5qx8IrnFg1DJFzTnu4W8WmXFYJ-AgCDfSQHb2ACJsk/edit - The one-click fblearner pipeline is in the next diff on the stack Differential Revision: D27881742 fbshipit-source-id: 34a3ab7a88f456b74841cf671ea1b3f678cdb733
-
Sam Tsai authored
Summary: Option to change only bounding boxes, others remain the same. Differential Revision: D28339388 fbshipit-source-id: 7a6d4c5153cf10c473992119f4c684e0b9159b44
-
- 17 May, 2021 2 commits
-
-
Kai Zhang authored
Summary: Add dataset visualization so that we could visualize test results in Tensorboard. Reviewed By: zhanghang1989 Differential Revision: D28457363 fbshipit-source-id: 4c2fd9dce349c6fb9e1cec51c9138cf0abb45d7b
-
Jacob Szwejbka authored
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58344 Remove a helper function that's more trouble than it's worth. ghstack-source-id: 129131889 Reviewed By: dhruvbird Differential Revision: D28460607 fbshipit-source-id: 31bd6c1cc169785bb360e3113d258b612cad47fc
-
- 16 May, 2021 1 commit
-
-
Zhicheng Yan authored
Summary: Create new CfgNode that is consistent with the parent node. Reviewed By: zhanghang1989 Differential Revision: D28318466 fbshipit-source-id: 38cb84de6bdfec2b283c4d9a1090cad47c118c9c
-