1. 13 Aug, 2021 1 commit
    • Reduce number of parameter groups to make optimizer more efficient · 737d099b
      Valentin Andrei authored
      Summary:
      `torch.optim._multi_tensor` provides faster optimizer implementations because it uses foreach APIs. We can enable it by changing `OPTIMIZER: "ADAMW"` to `OPTIMIZER: "ADAMW_MT"` in the config file.
      
      To benefit from the speedup, we need to reduce the number of parameter groups, as suggested in this post: https://fb.workplace.com/groups/1405155842844877/permalink/4971600462867046/
      
      The current implementation uses one parameter group per parameter, which is not optimal. The proposed change groups parameters by their (learning rate, weight decay) combination.
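
      For illustration, a minimal sketch of the grouping idea (the function and variable names here are hypothetical, not the actual d2go implementation):

      ```
      from collections import defaultdict

      def merge_param_groups(per_param_groups):
          # Merge one-group-per-parameter lists into one group per (lr, weight_decay),
          # so the multi-tensor optimizer can apply foreach kernels to larger groups.
          merged = defaultdict(list)
          for group in per_param_groups:
              merged[(group["lr"], group["weight_decay"])].extend(group["params"])
          return [
              {"params": params, "lr": lr, "weight_decay": wd}
              for (lr, wd), params in merged.items()
          ]
      ```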
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D30272112
      
      fbshipit-source-id: d8d24298a59b52c2fc2930f7d614a0c6380a432f
  2. 11 Aug, 2021 3 commits
  3. 06 Aug, 2021 2 commits
  4. 05 Aug, 2021 2 commits
    • Clarifying the use of do_test function · 610d2d03
      Abduallah Mohamed authored
      Summary: The `do_test` method can be used to perform testing outside the training process. One might expect it to load model weights before testing, as the `do_train` method does, but it does not. This diff adds a comment to clarify this.
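
      For illustration, a hedged sketch of what a caller is expected to do (the exact `do_test` call shape varies by runner; `runner`, `cfg`, and `model` are assumed to exist):

      ```
      from detectron2.checkpoint import DetectionCheckpointer

      # do_test does NOT load checkpoint weights for you (unlike the do_train path),
      # so load them explicitly before running evaluation.
      DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
      results = runner.do_test(cfg, model)
      ```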
      
      Reviewed By: ppwwyyxx
      
      Differential Revision: D29082338
      
      fbshipit-source-id: 6ec7d7f7f243503414fa904f4eb8856e62e9ed6d
    • avoid warnings of NCCL · 30d5ca55
      Yuxin Wu authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/detectron2/pull/3322
      
      avoid warnings like the following:
      ```
      [W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by
      this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is
      incorrect. Specify device_ids in barrier() to force use of a particular device.
      ```
      
      This may fix the hang reported in https://github.com/facebookresearch/detectron2/issues/3319.
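
      A minimal sketch of the fix idea, assuming a per-process `local_rank` is available:

      ```
      import torch
      import torch.distributed as dist

      def barrier_on_local_device(local_rank: int):
          # Pin this process to its GPU and tell NCCL which device to use for the
          # barrier, instead of letting it fall back to the best-guess GPU 0.
          torch.cuda.set_device(local_rank)
          if dist.get_backend() == dist.Backend.NCCL:
              dist.barrier(device_ids=[local_rank])
          else:
              dist.barrier()
      ```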
      
      Reviewed By: vaibhava0
      
      Differential Revision: D30077957
      
      fbshipit-source-id: b8827e66c5eecc06b650acde2e7ff44106327f69
  5. 04 Aug, 2021 1 commit
  6. 03 Aug, 2021 3 commits
  7. 01 Aug, 2021 1 commit
    • stabilize deformable DETR training · a4f06b88
      Zhicheng Yan authored
      Summary:
      Deformable DETR training can be unstable due to iterative box refinement in the transformer decoder. To stabilize training, this diff introduces two changes:
      - Remove the unnecessary use of inverse sigmoid. It is possible to avoid inverse sigmoid entirely when box refinement is turned on.
      - In the `DeformableTransformer` class, detach `init_reference_out` before passing it into the decoder to update memory and compute per-decoder-layer reference points.
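
      A simplified sketch of the refinement loop after the change (hypothetical names; the real decoder also handles attention, masks, etc.):

      ```
      def refine_reference_points(decoder_layers, memory, reference_points, bbox_embeds):
          # Iterative box refinement without the inverse_sigmoid round trip: each
          # layer's delta is applied to a detached copy of the previous reference
          # points, so gradients do not flow back through earlier layers' boxes.
          outputs = []
          for layer, bbox_embed in zip(decoder_layers, bbox_embeds):
              reference_points = reference_points.detach()
              hidden = layer(memory, reference_points)
              delta = bbox_embed(hidden)
              reference_points = (reference_points + delta).clamp(min=0.0, max=1.0)
              outputs.append((hidden, reference_points))
          return outputs
      ```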
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D29903599
      
      fbshipit-source-id: a374ba161be0d7bcdfb42553044c4c6700e92623
  8. 29 Jul, 2021 1 commit
  9. 21 Jul, 2021 1 commit
    • fix bug in valid_bbox check · b4d9aad9
      Xi Yin authored
      Summary: If the height/width is None, the original version crashes, so this diff adds an additional check to handle that case.
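
      A hedged sketch of the check (illustrative only; the actual helper and field names in d2go may differ):

      ```
      def has_valid_bbox(annotation, height=None, width=None):
          # If the image size is unknown, skip the size-dependent check instead of
          # crashing on a None comparison.
          if height is None or width is None:
              return True
          x, y, w, h = annotation["bbox"]  # assumes XYWH_ABS here
          return 0 <= x < width and 0 <= y < height and w > 0 and h > 0
      ```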
      
      Reviewed By: ppwwyyxx
      
      Differential Revision: D29807853
      
      fbshipit-source-id: b2b1a7edb52b7911da79a11329d4cf93f343c279
  10. 14 Jul, 2021 1 commit
  11. 09 Jul, 2021 2 commits
    • Add tests for exporter / boltnn export via torch delegate · d0c38c43
      Mircea Cimpoi authored
      Summary:
      Add a test for the previous diff.
      The BoltNN backend is only supported on device, so this test only checks that the conversion runs and that the output file is present.
      
      Differential Revision: D29589245
      
      fbshipit-source-id: ba66a733295304531d177086ce6459a50cfbaa07
    • Add BoltNN conversion to d2go exporter · ecf832da
      Mircea Cimpoi authored
      Summary:
      Added predictor_type `boltnn_int8` to export to BoltNN via torch delegate.
      
      - `int8` needs to be in the name, otherwise post-training quantization won't happen;
      
      ```
      cfg.QUANTIZATION.BACKEND = "qnnpack"
      // cfg.QUANTIZATION.CUSTOM_QSCHEME = "per_tensor_affine"
      ```
      
      It seems that `QUANTIZATION.CUSTOM_QSCHEME = "per_tensor_affine"` is not needed; it is likely covered by the "qnnpack" backend.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D29106043
      
      fbshipit-source-id: 865ac5af86919fe7b4530b48433a1bd11e295bf4
  12. 08 Jul, 2021 3 commits
    • fix a bug in D2GoDatasetMapper · abf2f327
      Zhicheng Yan authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/101
      
      In `D2GoDatasetMapper`, when the crop transform is applied to the image, `inputs` should be updated to use the cropped image before the other transforms are applied.
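
      A simplified sketch of the fix (hypothetical names; the real mapper builds its transform list from the config):

      ```
      def apply_transforms(inputs, crop_tfm, other_tfms):
          image = inputs["image"]
          if crop_tfm is not None:
              image = crop_tfm.apply_image(image)
              # The fix: write the cropped image back so later transforms (and any
              # annotation handling keyed off `inputs`) see the cropped version.
              inputs["image"] = image
          for tfm in other_tfms:
              image = tfm.apply_image(image)
          inputs["image"] = image
          return inputs
      ```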
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D29551488
      
      fbshipit-source-id: 48917ffc91c8a80286d61ba3ae8391541ec2c930
    • remove redundant build_optimizer() · b1e2cc56
      Zhicheng Yan authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/96
      
      In `DETRRunner`, the `build_optimizer` method customizes the following logic, which is redundant with the parent class implementation and can be removed.
      - Discount the LR for certain modules, such as those named `reference_points`, `backbone`, and `sampling_offsets`.
        - This can be achieved with `SOLVER.LR_MULTIPLIER_OVERWRITE` after we update `get_default_optimizer_params` in `mobile-vision/d2go/d2go/optimizer/build.py` (see the config sketch after this list).
      - Full-model gradient clipping.
        - This is also implemented in `mobile-vision/d2go/d2go/optimizer/build.py`.
      
      The custom method also has a minor issue:
      - It ignores `SOLVER.WEIGHT_DECAY_NORM`, which can set a different weight decay for the affine parameters in norm modules.
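
      For reference, a hedged config sketch of the LR-discount replacement, in the style of the other config snippets in this log (the exact schema of `SOLVER.LR_MULTIPLIER_OVERWRITE` and the multiplier values are assumed here):

      ```
      # Hypothetical values; module-name patterns map to LR multipliers.
      cfg.SOLVER.LR_MULTIPLIER_OVERWRITE = [
          {"backbone": 0.1, "reference_points": 0.1, "sampling_offsets": 0.1}
      ]
      ```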
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D29420642
      
      fbshipit-source-id: deeb9348c9d282231c540dde6161acedd8e3a119
    • fix extended coco load missing comma · 4f3f3401
      Sam Tsai authored
      Summary: Fix a missing comma in the extended COCO loader, which caused the bbox_mode and keypoints fields to be ignored.
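
      As an illustration of the bug class (not the actual d2go code): in Python, a missing comma between adjacent string literals silently concatenates them, so the affected field names never match anything.

      ```
      # Buggy: "bbox_mode" "keypoints" becomes the single string "bbox_modekeypoints".
      fields = ["iscrowd", "bbox", "category_id", "bbox_mode" "keypoints"]
      # Fixed:
      fields = ["iscrowd", "bbox", "category_id", "bbox_mode", "keypoints"]
      ```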
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D29608815
      
      fbshipit-source-id: 8c737df1dfef7f88494f7de25e06b0c37742ac30
  13. 07 Jul, 2021 1 commit
  14. 06 Jul, 2021 1 commit
    • Add the fields which will be used in point-based modeling. · 80c18641
      Cheng-Yang Fu authored
      Summary:
      Add the fields which will be used in point-based modeling.
      - `point_coords`: the point coordinates in the image.
      - `point_labels`: indicates whether each point is a foreground or background point.
      
      Differential Revision: D29532103
      
      fbshipit-source-id: 9af6c9b049e1d05fd0d77909b09de1feec391ce9
  15. 02 Jul, 2021 1 commit
    • revert D29048363 · e69e0ffe
      Zhicheng Yan authored
      Summary:
      In D29048363 (https://github.com/facebookresearch/d2go/commit/c480d4e4e213a850cced7758f7b62c20caad8820) we moved the detaching of `reference_points` earlier in the hope of allowing more gradient flow to update the weights in `self.bbox_embed`.
      In this diff, we revert that change because i) it does not improve box AP and ii) it may potentially cause unstable optimization when iterative box refinement is turned on.
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D29530735
      
      fbshipit-source-id: 3217c863343836e129d53e07c0eedb2db8164fe6
  16. 01 Jul, 2021 1 commit
  17. 30 Jun, 2021 3 commits
  18. 29 Jun, 2021 3 commits
  19. 27 Jun, 2021 2 commits
    • Move EMA weights to current device before training · 9d9f438b
      Kai Zhang authored
      Summary:
      Currently we move the EMA weights to the expected device right after loading them from the checkpoint.
      However, by the time the on_load_checkpoint hook is called, the current GPU device has not yet been assigned. This can leave the EMA weights on cuda:0 while the model is on cuda:1.
      This diff moves the EMA weights to the device in `on_pretrain_routine_end` instead.
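
      A hedged sketch of the hook placement (PyTorch Lightning 1.x callback API; `ema_state` and its `to()` method are assumed):

      ```
      from pytorch_lightning import Callback

      class EMACallback(Callback):
          def __init__(self, ema_state):
              self.ema_state = ema_state

          def on_pretrain_routine_end(self, trainer, pl_module):
              # By this point the trainer has assigned the process's device, so the
              # EMA weights can be moved to match the model (e.g. cuda:1, not cuda:0).
              self.ema_state.to(pl_module.device)
      ```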
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D28429843
      
      fbshipit-source-id: d864fb3687eb6958872300c5ec0af7ce90591f83
    • enable flop printing & logging at the beginning of train & test · 5509a138
      Yuxin Wu authored
      Reviewed By: zhanghang1989
      
      Differential Revision: D29379832
      
      fbshipit-source-id: 9283a8796a1dbee81b51611407c22f7d5a2069dc
  20. 26 Jun, 2021 1 commit
    • Fix quantization test failure · 1894f8a3
      Kai Zhang authored
      Summary:
      # Context
      In the post-training quantization callback, we make a deepcopy of the Lightning module before validation starts and prepare the copy with the FX quantization API. The callback keeps the prepared model inside it.
      
      # The problem
      By the second time we run the validation epoch, we try to copy the Lightning module, which holds a reference to the trainer, which holds a reference to the quantization callback, which holds the prepared model, which is not deep-copyable.
      
      # Mitigation
      Delete the trainer reference before making the deepcopy.
      We are already doing this in stl/callbacks/quantization, but the change was not ported into D2Go.
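
      A hedged sketch of the mitigation (names assumed; Lightning versions differ in how the trainer reference is stored):

      ```
      import copy

      def copy_module_without_trainer(pl_module):
          # Drop the trainer reference so the deepcopy does not drag along the
          # quantization callback and its non-copyable prepared model.
          trainer = getattr(pl_module, "trainer", None)
          pl_module.trainer = None
          try:
              return copy.deepcopy(pl_module)
          finally:
              pl_module.trainer = trainer  # restore the original reference
      ```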
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D29409085
      
      fbshipit-source-id: 24550124181673b2e567b2a04563bcdfb440e145
  21. 25 Jun, 2021 3 commits
    • Freeze matched bn layers · 4169abc1
      Haricharan Lakshman authored
      Summary:
      Convert the batchnorm layers that match the specified regular expressions to FrozenBatchNorm2d.
      
      If the module is an instance of batchnorm and matches the regular expressions, return a new FrozenBatchNorm2d module.
      Otherwise, convert the matching batchnorm child modules to FrozenBatchNorm2d in place and return the main module.
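
      A hedged sketch of the behavior described above, using detectron2's `FrozenBatchNorm2d` (the function name and traversal details here are assumed, not the actual d2go code):

      ```
      import re
      import torch.nn as nn
      from detectron2.layers import FrozenBatchNorm2d

      def freeze_matched_bn(module, patterns, prefix=""):
          # If the module itself is a matching batchnorm, return a frozen replacement.
          if isinstance(module, (nn.BatchNorm2d, nn.SyncBatchNorm)) and any(
              re.search(p, prefix) for p in patterns
          ):
              return FrozenBatchNorm2d.convert_frozen_batchnorm(module)
          # Otherwise, convert matching batchnorm children in place and return module.
          for name, child in module.named_children():
              full_name = f"{prefix}.{name}" if prefix else name
              new_child = freeze_matched_bn(child, patterns, full_name)
              if new_child is not child:
                  setattr(module, name, new_child)
          return module
      ```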
      
      Reviewed By: ppwwyyxx
      
      Differential Revision: D29286500
      
      fbshipit-source-id: 3a20f5eeff59ddff50c42fe297eedf0ce2b909bc
    • read "bbox_mode" from annotation when filtering out images with invalid bbox · 77ef0db7
      Luming Ma authored
      Summary: Some annotations use XYXY_ABS as the bbox mode, so many images were incorrectly filtered out under the assumption of XYWH_ABS. This diff reads bbox_mode from the annotation and converts the bbox to XYWH_ABS before checking for invalid bboxes.
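
      A hedged sketch of the check using detectron2's `BoxMode` (field handling and the validity criterion are assumed here):

      ```
      from detectron2.structures import BoxMode

      def bbox_is_valid(annotation):
          # Convert using the annotation's own bbox_mode instead of assuming XYWH_ABS.
          mode = annotation.get("bbox_mode", BoxMode.XYWH_ABS)
          x, y, w, h = BoxMode.convert(annotation["bbox"], mode, BoxMode.XYWH_ABS)
          return w > 0 and h > 0
      ```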
      
      Differential Revision: D29365700
      
      fbshipit-source-id: 355346b6826f401f504691090631997e169ead4a
    • use src dataset name instead of the derived class name · d4aedb83
      Sam Tsai authored
      Summary: "@ [0-9]classes" is appended to datasets to mark whether it is a derived class of the original one and saved as a config. When reloading the config, the derived class name will be used as the source instead of the original source. Adding a check to remove the derived suffix.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D29315132
      
      fbshipit-source-id: 0cc204d305d2da6c9f1817aaf631270bd874f90d
  22. 24 Jun, 2021 1 commit
    • stabilize the training of deformable DETR with box refinement · c480d4e4
      Zhicheng Yan authored
      Summary:
      Major changes
      - As described in detail in appendix A.4 of the deformable DETR paper (https://arxiv.org/abs/2010.04159), gradient back-propagation is blocked at inverse_sigmoid(bounding box x/y/w/h from the last decoder layer). This can be implemented by detaching the tensor from the compute graph in PyTorch. However, we currently detach the wrong tensor, which prevents updating the layers that predict delta x/y/w/h. Fix this bug.
      - Add more comments to annotate data types and tensor shapes in the code. This should NOT affect the actual implementation.
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D29048363
      
      fbshipit-source-id: c5b5e89793c86d530b077a7b999769881f441b69
  23. 23 Jun, 2021 1 commit
  24. 21 Jun, 2021 1 commit
    • additional flop counting using fvcore's flop counter · bc9d5070
      Yuxin Wu authored
      Summary:
      1. Save three versions of the flop count, using both mobile_cv's flop counter and fvcore's flop counter.
      2. Print only a short summary table in the terminal, and save the rest to files.
      
      The `print_flops` function does not seem to be used anywhere, so this diff simply replaces it.
      
      TODO: enable this feature automatically for train/eval workflows in the next diff
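
      A hedged sketch of the fvcore side (the actual diff also records mobile_cv's counter and writes the detailed breakdowns to files; the helper name is hypothetical):

      ```
      from fvcore.nn import FlopCountAnalysis, flop_count_table

      def log_flops(model, sample_inputs):
          flops = FlopCountAnalysis(model, sample_inputs)
          # Short table in the terminal; deeper per-module breakdowns go to files.
          print(f"Total GFlops: {flops.total() / 1e9:.2f}")
          print(flop_count_table(flops, max_depth=2))
      ```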
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D29182412
      
      fbshipit-source-id: bfa1dfad41b99fcda06b96c4732237b5e753f1bb