- 17 Feb, 2023 2 commits
-
-
Anthony Chen authored
Summary: X-link: https://github.com/facebookresearch/mobile-vision/pull/138 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/477 Interleave FSDP checkpointing to avoid excessive reading/writing patterns that may cause Manifold quota-exceeded errors. Reviewed By: wat3rBro Differential Revision: D43266742 fbshipit-source-id: 85549c3b10413e0ffad2f3ec8e198d8c77486478
-
Paul Deveau authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/476 Removing unused List import from typing Reviewed By: wat3rBro Differential Revision: D43358109 fbshipit-source-id: 10d4b2289957657fd17f62b8fea073bb1db6dc10
-
- 16 Feb, 2023 4 commits
-
-
Anthony Chen authored
Summary: X-link: https://github.com/fairinternal/detectron2/pull/591 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/469 X-link: https://github.com/facebookresearch/detectron2/pull/4785 Add an option to specify the period of metric gathering and writing in Trainer. This feature is needed to optimize training speed for large-scale training jobs such as generative AI, because the all_gather call in metric writing at every iteration is time-consuming when hundreds of GPUs are used, taking ~10% of the total training time. With this feature we can set the metric writing period to match cfg.WRITER_PERIOD=20, reducing training time while keeping metric logging unchanged for users. Reviewed By: miqueljubert, wat3rBro Differential Revision: D43098985 Privacy Context Container: 2011691122555468 fbshipit-source-id: 63c93a7331aa63badce5125e5240d2d5f7e61b74
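A minimal sketch of the gating idea described above, assuming an initialized process group; the function and argument names (`maybe_write_metrics`, `write_period`) are hypothetical, not the actual d2go Trainer API:
```
# Illustrative only: skip the expensive cross-rank gather on most iterations.
import torch.distributed as dist

def maybe_write_metrics(iteration: int, local_metrics: dict, write_period: int = 20):
    if (iteration + 1) % write_period != 0:
        return
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_metrics)  # costly with hundreds of GPUs
    if dist.get_rank() == 0:
        for per_rank in gathered:
            print(per_rank)  # stand-in for the actual metric writer
```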
-
Sudarshan Raghunathan authored
Summary: This diff contains a minimal set of changes to support returning reply files to MAST. There are three parts:
1. First, we have a try..except in the main function to catch all the "catchable" Python exceptions. Exceptions from C++ code or segfaults will not be handled here.
2. Each exception is then written to a per-process JSON reply file.
3. At the end, all per-process files are stat-ed and the earliest file is copied to a location specified by MAST.

# Limitations
1. This only works when local processes are launched using multiprocessing (which is the default).
2. If any error happens in C++ code, it will likely not be caught in Python and the reply file might not have the correct logs.

Differential Revision: D43097683 fbshipit-source-id: 0eaf4f19f6199a9c77f2ce4c7d2bbc2a2078be99
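A rough sketch of the catch-and-record pattern above; the paths, env-var names, and `run_with_reply_file` helper are assumptions, not the actual d2go/MAST integration:
```
import glob
import json
import os
import shutil
import traceback

def run_with_reply_file(main_fn, reply_dir="/tmp/reply_files", final_path="/tmp/mast_reply.json"):
    os.makedirs(reply_dir, exist_ok=True)
    rank = int(os.environ.get("LOCAL_RANK", "0"))
    try:
        main_fn()
    except Exception as e:  # C++ aborts / segfaults never reach this handler
        with open(os.path.join(reply_dir, f"reply_{rank}.json"), "w") as f:
            json.dump({"rank": rank, "error": repr(e), "traceback": traceback.format_exc()}, f)
        raise
    finally:
        if rank == 0:
            files = glob.glob(os.path.join(reply_dir, "reply_*.json"))
            if files:
                # Copy the earliest reply file (by mtime) to the MAST location.
                earliest = min(files, key=lambda p: os.stat(p).st_mtime)
                shutil.copy(earliest, final_path)
```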
-
Tao Xu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/473 As shown in the attached image and TB visualization, some of our jobs fail to save results to TensorBoard; there should be messages between the circled lines of the screenshot if the images were added to TensorBoard. One possible reason is that the TensorBoard visualization evaluator is only added for the rank-0 GPU, so it may fail to fetch any data during evaluation of the diffusion model, which only does 1 batch of inference during validation. To resolve this issue, we add the visualization evaluator to all GPU ranks, gather their results, and finally add the results with the largest batch size to TensorBoard for visualization. The screenshot is from f410204704 (https://www.internalfb.com/manifold/explorer/mobile_vision_workflows/tree/workflows/xutao/20230211/latest_train/dalle2_decoder.SIULDLpgix/e2e_train/log.txt) Refactored default_runner.py to add a new function _create_evaluators that creates all evaluators, so we no longer need to override the whole _do_test function in the runner to add the visualization evaluator for all ranks. (Note: this ignores all push blocking failures!) Reviewed By: YanjunChen329 Differential Revision: D43263543 fbshipit-source-id: eca2259277584819dcc5400d47fa4fb142f2ed9b
-
Yanghan Wang authored
Summary: X-link: https://github.com/facebookresearch/mobile-vision/pull/137 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/475 Reviewed By: YanjunChen329 Differential Revision: D42148563 fbshipit-source-id: 76b794988bda7f773a734838c79d2de087d7ce94
-
- 14 Feb, 2023 3 commits
-
-
Fei Sun authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/472 Add NUMA binding to d2go. It distributes the GPUs equally across the CPU sockets so that CPU traffic and GPU-to-CPU traffic are balanced. It helps diffusion model training, but it is a general technique that can be applied to all models. We still want to enable it manually in each case, until we are confident that it gives better performance and can set it as a default. NUMA binding is based on jspark1105's work in D42827082; full credit goes to him. This diff does not enable the feature. Reviewed By: newstzpz Differential Revision: D43036817 fbshipit-source-id: fe67fd656ed3980f04bc81909cae7ba2527346fd
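A generic sketch of NUMA binding, assuming a 2-socket host with 8 GPUs; the socket layout, rank-to-socket mapping, and function name are assumptions and the actual d2go implementation may differ:
```
import os

def bind_process_to_numa_node(local_rank: int, gpus_per_node: int = 8, num_sockets: int = 2):
    # Pin this GPU worker to the CPU cores of one socket so host-side traffic
    # stays local to that NUMA node.
    cpus_per_socket = os.cpu_count() // num_sockets
    socket_id = local_rank * num_sockets // gpus_per_node  # GPUs 0-3 -> socket 0, 4-7 -> socket 1
    cpus = range(socket_id * cpus_per_socket, (socket_id + 1) * cpus_per_socket)
    os.sched_setaffinity(0, cpus)  # Linux-only; 0 means "this process"
```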
-
Fei Sun authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/471 AdamW recently added an option to use a fused optimizer, which may give better performance than the foreach argument. However, we cannot enable it by default, since it requires all parameters to be on CUDA and may have other restrictions, so we enable it on a per-project basis. On DALLE2, it is about 23ms faster. Reviewed By: newstzpz Differential Revision: D43027327 fbshipit-source-id: 82c6855116094e86386ad2edeea3a74f9e555174
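For reference, the underlying PyTorch knob (available in recent PyTorch releases) is the `fused` argument of `torch.optim.AdamW`; the toy model below is only for illustration:
```
import torch

model = torch.nn.Linear(128, 128).cuda()
# fused=True requires all parameters to live on CUDA.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, fused=True)
```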
-
Fei Sun authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/470 Enable ignoring modules in FSDP: those modules will not be put in FSDP. This is useful in the diffusion model, where the CLIP model is not used in training, so it is OK to have a separate copy on each GPU. It reduces the CLIP execution time from 63ms to 48ms (15ms reduction), mostly because CLIP is a CPU-bound module and FSDP injects code into each wrapped block. In addition, it also reduces the FSDP all-gather time before the CLIP execution from 56ms to 7ms (49ms reduction). In total, this change may reduce the CLIP runtime from 119ms to 64ms (63ms reduction). This feature is controlled by this flag: IGNORED_MODULES: ["clip_model"] Reviewed By: newstzpz Differential Revision: D42910383 fbshipit-source-id: dc4c12254d45ac45d88329feb63a26ec4ae04aef
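The config flag above presumably maps onto PyTorch FSDP's `ignored_modules` argument; the toy model and `clip_model` attribute name below are assumptions for illustration:
```
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class ToyDiffusionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.clip_model = nn.Linear(8, 8)  # frozen conditioning encoder
        self.unet = nn.Linear(8, 8)        # the part we actually train

model = ToyDiffusionModel()
# Assumes torch.distributed is already initialized and a GPU is available.
wrapped = FSDP(model, ignored_modules=[model.clip_model])
# clip_model keeps a full replica on every rank and skips FSDP all-gathers.
```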
-
- 05 Feb, 2023 1 commit
-
-
Maayan Frid-Adar authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/465 Training visualization was effectively active only for the first training iterations when TRAIN_LOADER_VIS_MAX_IMAGES and TRAIN_LOADER_VIS_WRITE_PERIOD were set to > 0, because MAX_IMAGES was treated both as the number of samples to log per write and as the total number of samples allowed overall. So after the first log to TB it dropped to 0 and visualization was not activated for later training steps (ignoring WRITE_PERIOD). I've added a TRAIN_LOADER_VIS_MAX_BATCH_IMAGES parameter to set the number of samples to visualize in each write period, up to the maximum defined by TRAIN_LOADER_VIS_MAX_IMAGES. Reviewed By: tglik Differential Revision: D42832903 fbshipit-source-id: 02a0d9aa4ea6d0ee725120916d26b77843a3e8ab
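A sketch of the intended gating logic, with variable names mirroring the config keys above; the surrounding hook and this helper function are hypothetical:
```
def num_images_to_visualize(iteration, logged_so_far, write_period, max_batch_images, max_images):
    # Only visualize on write-period boundaries.
    if write_period <= 0 or iteration % write_period != 0:
        return 0
    # Log up to max_batch_images this period, never exceeding max_images overall.
    return max(0, min(max_batch_images, max_images - logged_so_far))
```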
-
- 04 Feb, 2023 1 commit
-
-
Mircea Cimpoi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/464 Allow multiple and nested transforms to be parsed. Reviewed By: wat3rBro Differential Revision: D42997149 fbshipit-source-id: 317a27351342f44facab947ca0cba74fbc6c94bb
-
- 03 Feb, 2023 1 commit
-
-
Fei Sun authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/463 Enable HSDP when training models. Reviewed By: wat3rBro Differential Revision: D42658128 fbshipit-source-id: 3c37c3b6c4abaa54d677447ee704f2e18c9d3b26
-
- 01 Feb, 2023 2 commits
-
-
Licheng Yu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/462 Fix errors in `_convert_to_d2`: sometimes the keys are missing, so we don't need to remove them. {F860805441} Reviewed By: newstzpz Differential Revision: D42929485 fbshipit-source-id: 8584879df5a07cbe5a864b4f170eef3d5f34dd6c
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/461 There is a need to extend trainer parameters that are not in (or conflict with) the base d2go config; this diff adds a way to inject those configs without touching the base d2go config.
- In `get_trainer_params`, it simply checks `LIGHTNING_TRAINER` and uses whatever configs are under it.
- Adds `GeneralizedRCNNTaskNoDefaultConfig`, which allows specifying the default config via a yaml file for `GeneralizedRCNNTask` (also makes some prerequisite changes).
- (next diff) Users can add their own config updater by registering it in `CONFIG_UPDATER_REGISTRY`.
Differential Revision: D42928992 fbshipit-source-id: f2a1d8a3f2bec9908bb1af03928611d963b92c0e
-
- 23 Jan, 2023 1 commit
-
-
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/451 Tracing d2go runners using the AdamW optimizer showed many small operators being executed in the EMA code. They can be fused together by using the multi-tensor API. Reviewed By: tglik Differential Revision: D42098310 fbshipit-source-id: 544d7e214964530ec03674986827410b0f60951f
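A minimal sketch of a multi-tensor EMA update using PyTorch's (private, underscore-prefixed) `_foreach_*` kernels, which is the kind of fusion referred to above; not necessarily the exact d2go implementation:
```
import torch

@torch.no_grad()
def ema_update(ema_params, model_params, decay=0.999):
    # ema_params / model_params are lists of tensors; the _foreach_* ops replace
    # a Python loop of tiny per-parameter kernels with a few fused ones.
    torch._foreach_mul_(ema_params, decay)
    torch._foreach_add_(ema_params, model_params, alpha=1.0 - decay)
```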
-
- 16 Jan, 2023 1 commit
-
-
Anastasia Tkach authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/460 Moving add_cfg_nodes helper from config.py to runner.py Reviewed By: wat3rBro Differential Revision: D42209435 fbshipit-source-id: 6eac4987c57df148307911e4fe87d99d8590d4ac
-
- 14 Jan, 2023 1 commit
-
-
Salil Desai authored
Summary: X-link: https://github.com/pytorch/pytorch/pull/92081 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/459 Reland of D41690203 (https://github.com/facebookresearch/d2go/commit/18de6ffb19037e8ea11057515cc3d6d966a6e799)
Remove MobileOptimizerType and all rewrite flags from torch.X and torch._C.X to clean up the torch.X and torch._C.X namespaces. The affected rewrite flags are:
- CONV_BN_FUSION
- FUSE_ADD_RELU
- HOIST_CONV_PACKED_PARAMS
- INSERT_FOLD_PREPACK_OPS
- REMOVE_DROPOUT
- VULKAN_AUTOMATIC_GPU_TRANSFER
Bc-Breaking Change: Before this change, the rewrite flags were accessible through all of
1. torch.utils.mobile_optimizer.MobileOptimizerType.X
2. torch._C.MobileOptimizerType.X
3. torch.X
4. torch.MobileOptimizerType.X
5. torch._C.X
But after this change, only torch.utils.mobile_optimizer.MobileOptimizerType.X (option 1 above) and the newly added torch._C._MobileOptimizerType.X remain. Corresponding updates to PyTorch Tutorial Docs are in https://github.com/pytorch/tutorials/pull/2163
Reviewed By: SS-JIA Differential Revision: D42442395 fbshipit-source-id: 14500b3667f541fd1ec85b1624125120176c6fd0
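For illustration, the access path that remains supported after this change (the toy module and chosen blocklist flag are arbitrary):
```
import torch
from torch.utils.mobile_optimizer import MobileOptimizerType, optimize_for_mobile

scripted = torch.jit.script(torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()))
# Only torch.utils.mobile_optimizer.MobileOptimizerType.X remains public.
optimized = optimize_for_mobile(
    scripted,
    optimization_blocklist={MobileOptimizerType.REMOVE_DROPOUT},
)
```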
-
- 13 Jan, 2023 4 commits
-
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/458 Make AMP compatible with FSDP. FSDP does not depend on the torch AMP module and implements its own MixedPrecision module, which directly saves an additional copy of the weights in lower precision and runs these tensors in mixed-precision training. This is very different from AMP, which automatically casts tensors to lower precision upon tensor operations. This diff solves some compatibility bugs between AMP and FSDP with 2 changes:
1. Use "never_wrap_policy" as the default dummy auto-wrap policy. FSDP MixedPrecision doesn't work with BatchNorm layers, because FSDP and other resources like NVIDIA apex highly discourage running BatchNorm in lower precision: https://github.com/pytorch/pytorch/issues/75478. We need to use some auto-wrap policy in order to let FSDP skip BatchNorm layers when constructing mixed precision.
2. Wrap FSDPWrapper.forward() with autocast(). FSDP MixedPrecision uses lower-precision tensors in computation, which could raise a type mismatch error when amp.autocast() is not enabled, as in eval. Thus, we wrap FSDP forward() with autocast().
Reviewed By: wat3rBro Differential Revision: D41328834 fbshipit-source-id: 18cf94c4ad8d9422ffd3bb335873cd29ac987ae9
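A minimal sketch of change (2) above; `AutocastFSDP` is a stand-in name, not the actual d2go `FSDPWrapper` class:
```
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class AutocastFSDP(FSDP):
    """Keep autocast active around forward() (stand-in for d2go's FSDPWrapper)."""

    def forward(self, *args, **kwargs):
        # FSDP MixedPrecision keeps low-precision parameters; running them
        # outside autocast (e.g. in eval) can raise dtype-mismatch errors.
        with torch.cuda.amp.autocast(enabled=True):
            return super().forward(*args, **kwargs)
```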
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/446
## Design
Following D41861308, local checkpoints need to be converted to global ones before being loaded and used in non-FSDP-wrapped models. This diff implements such conversion at the d2go checkpointer level to allow automatic conversion with minimal user intervention and no new config key. In the previous diff, `FSDPWrapper` has 2 loading modes and 2 saving modes: it uses `load_local_state_dict` to determine whether the ckpt we want to load is local or global, and uses `use_local_state_dict` to decide whether to save new ckpts as local or global. Thus, there are 4 combinations of loading/saving modes:
1. load local + save local
2. load local + save global
3. load global + save local
4. load global + save global
The local-to-global checkpoint conversion maps to mode 2: load local + save global. Thus, when the checkpointer is in mode 2, it automatically saves the model to a global ckpt right after it loads the local ckpt. Because this happens at the checkpointer level, normal training/eval can resume after ckpt conversion. This gives users a consistent and seamless experience with normal training/eval, while also providing a separate ckpt conversion feature via eval-only.
## Usage
Suppose we want to convert the local checkpoint `/tmp/model_final`; the user can run the same training command with extra args: `MODEL.WEIGHTS=/tmp/model_final` and `FSDP.USE_LOCAL_STATE_DICT=False`
Wiki: https://www.internalfb.com/intern/wiki/Mobile_Vision/Detectron2Go/D2Go_Tutorials/Diffusion_Pipeline/Diffusion_Model_Inference/#using-checkpoints-traine
Reviewed By: wat3rBro Differential Revision: D41926662 fbshipit-source-id: 18a62607a79b0e917d929e9ea85ac1658fb895ca
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/457
## Context
The PyTorch FSDP (Fully Sharded Data Parallel) backend supports two checkpointing modes. The first one is full_state_dict mode, where each FSDP worker summons parameters from other workers to produce a global state dict that can be loaded by non-FSDP models. This is the desired mode for checkpointing because checkpoint structures and key names follow the default convention; it's already supported in D39228316 (https://github.com/facebookresearch/d2go/commit/02625ff83207b836df349eadc4a61eb3d4a5810c). However, when the model is too large to fit into a single GPU's memory, this approach fails because a worker's GPU can't hold all the summoned parameters during checkpoint saving. The rescue is the second checkpointing mode: local_state_dict. This mode saves the sharded parameters of each GPU process locally. It can only be loaded by FSDP-wrapped models with the same distributed training settings (i.e. number of processes), but it removes the need for summoning parameters and greatly reduces peak GPU memory during training. This diff enables local state dict checkpointing in d2go.
## API
This diff supports both **saving** local state and **loading** a state dict that is locally sharded. Whether to save local state is controlled by `FSDP.USE_LOCAL_STATE`. If `FSDP.USE_LOCAL_STATE=True` and we want to save `output/model_0000001.pth` as in the old pattern, the local checkpoints will be saved as:
```
- output
  - model_0000001
    - rank0.pth
    - rank1.pth
    - rank2.pth
    - rank3.pth
```
Whether to load local state, on the other hand, is controlled by the path of the checkpoint to load. If the path is a file, i.e. `output/model_final.pth`, the file will be loaded as a full state dict by all GPU processes like before. If the path is a directory, i.e. `output/model_final`, the checkpointer will attempt to load `output/model_final/rankX.pth` for rank X. This API design enables all combinations of loading local/full states and saving local/full states.
## Conversion to full state dict [Temporary]
Conversion from local state dict to full state dict is needed during an e2e workflow. This will be implemented in another diff.
Reviewed By: wat3rBro Differential Revision: D41861308 fbshipit-source-id: 2e01b601683d06b46f0c5517c6cff30bbcffa8f7
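A sketch of the two PyTorch FSDP state-dict modes this feature builds on; the `rankX.pth` layout follows the description above, while the `save_checkpoint` helper and its exact integration into the d2go checkpointer are assumptions:
```
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

def save_checkpoint(model: FSDP, out_dir: str, use_local_state: bool):
    if use_local_state:
        with FSDP.state_dict_type(model, StateDictType.LOCAL_STATE_DICT):
            state = model.state_dict()  # only this rank's shards
        os.makedirs(out_dir, exist_ok=True)
        torch.save(state, os.path.join(out_dir, f"rank{dist.get_rank()}.pth"))
    else:
        with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
            state = model.state_dict()  # summons the full parameters
        if dist.get_rank() == 0:
            torch.save(state, out_dir + ".pth")
```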
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/440 Move FSDP wrapping to runner.build_model by rewriting it as a modeling hook.
**Motivation**
When a model is too large to run inference on a single GPU, it requires using FSDP with local checkpointing mode to save peak GPU memory. However, in the eval_pytorch workflow (train_net with eval-only), models are evaluated without being wrapped by FSDP, which may cause OOM errors for the reasons above. Thus, it is better practice to wrap the model with FSDP during `runner.build_model(cfg)`, so evaluation can also run in the same FSDP setting as training. This diff moves FSDP wrapping to `runner.build_model(cfg)` by rewriting it as a modeling hook.
**API changes**
* Users need to append `"FSDPModelingHook"` to `MODEL.MODELING_HOOKS` to enable FSDP.
* `FSDP.ALGORITHM` can only be `full` or `grad_optim`.
**Note**
It's not possible to unwrap an FSDP model back to the normal model, so FSDPModelingHook.unapply() can't be implemented.
Reviewed By: wat3rBro Differential Revision: D41416917 fbshipit-source-id: f3fc72d574cc6ccbe0d238e48c575926ba5b4d06
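An illustration of the API change above using a stand-in yacs config built inline (d2go's real config defaults are not reproduced here):
```
from yacs.config import CfgNode as CN

cfg = CN()
cfg.MODEL = CN()
cfg.MODEL.MODELING_HOOKS = []
cfg.FSDP = CN()
cfg.FSDP.ALGORITHM = "full"

# Enable FSDP via the modeling hook, as described above.
cfg.MODEL.MODELING_HOOKS = list(cfg.MODEL.MODELING_HOOKS) + ["FSDPModelingHook"]
cfg.FSDP.ALGORITHM = "grad_optim"  # only "full" or "grad_optim" are valid
```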
-
- 10 Jan, 2023 2 commits
-
-
Danylo Baibak authored
Differential Revision: D41690203 (https://github.com/facebookresearch/d2go/commit/18de6ffb19037e8ea11057515cc3d6d966a6e799) Original commit changeset: d901bdcbd16a Original Phabricator Diff: D41690203 (https://github.com/facebookresearch/d2go/commit/18de6ffb19037e8ea11057515cc3d6d966a6e799) fbshipit-source-id: bba9a29d5c2b6b69423160726d29c44843023fb0
-
Salil Desai authored
Summary: X-link: https://github.com/pytorch/pytorch/pull/91600 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/452 bypass-github-export-checks
Remove MobileOptimizerType and all rewrite flags from torch.X and torch._C.X to clean up the torch.X and torch._C.X namespaces. The affected rewrite flags are:
- CONV_BN_FUSION
- FUSE_ADD_RELU
- HOIST_CONV_PACKED_PARAMS
- INSERT_FOLD_PREPACK_OPS
- REMOVE_DROPOUT
- VULKAN_AUTOMATIC_GPU_TRANSFER
Bc-Breaking Change: Before this change, the rewrite flags were accessible through all of
1. torch.utils.mobile_optimizer.MobileOptimizerType.X
2. torch._C.MobileOptimizerType.X
3. torch.X
4. torch.MobileOptimizerType.X
5. torch._C.X
But after this change, only torch.utils.mobile_optimizer.MobileOptimizerType.X (option 1 above) and the newly added torch._C._MobileOptimizerType.X remain. Corresponding updates to PyTorch Tutorial Docs are in https://github.com/pytorch/tutorials/pull/2163
Reviewed By: kimishpatel Differential Revision: D41690203 fbshipit-source-id: d901bdcbd16a594c3268e09b57c61b38c33a562f
-
- 05 Jan, 2023 3 commits
-
-
Anthony Chen authored
Summary: X-link: https://github.com/facebookresearch/detectron2/pull/4667 X-link: https://github.com/fairinternal/detectron2/pull/578 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/411 Add config option `cfg.LOAD_CKPT_TO_GPU` to load checkpoints to the worker's current GPU. Previously, d2go mapped checkpoints to CPU before loading them into the model. In large-scale distributed training, many GPU processes may be used to train a model, which means each process will load the model checkpoint to a single CPU, causing the same model checkpoint to be loaded many times. This causes CPU OOM issues when the model checkpoint is large. There are two solutions to this problem: one is to load checkpoints to GPU; the other is to use shared memory for the checkpoint between different GPU processes. This diff implements the first solution, which supports cases where model size + model checkpoint size is smaller than the total GPU memory. The second solution may be revisited for large models that need to offload checkpoints to CPU. Reference diff: D40789062 Reviewed By: mcimpoi Differential Revision: D41063306 fbshipit-source-id: edcfd390a25582fffb2f1a6a7fc22917874ee2fc
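A sketch of the map_location difference behind this option; the `load_ckpt` helper is hypothetical and the real integration lives in the d2go/fvcore checkpointer:
```
import torch

def load_ckpt(path: str, load_to_gpu: bool, local_rank: int = 0):
    if load_to_gpu:
        # Each worker deserializes straight onto its own GPU, avoiding
        # world_size copies of the checkpoint in host RAM.
        return torch.load(path, map_location=f"cuda:{local_rank}")
    return torch.load(path, map_location="cpu")
```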
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/456 The `3.20.1` version has a security issue (see https://github.com/facebookresearch/d2go/security/dependabot/1), and `3.20.x+` has a previously known compatibility issue, so pin the version to `3.20.2`. Reviewed By: YanjunChen329 Differential Revision: D42353218 fbshipit-source-id: 172d3a2d53f0c46326e0ae730ad0a96b33b8bade
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/455 The test can be flaky due to numerical mismatch when using `self.assertEqual`, e.g. https://www.internalfb.com/intern/testinfra/diagnostics/1688850007977704.562950031998292.1672749571/
```
Traceback (most recent call last):
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/buck-out/v2/gen/fbcode/104a4d5c3a690252/mobile-vision/d2go/tests/__modeling_test_modeling_distillation__/modeling_test_modeling_distillation#link-tree/d2go/tests/modeling/test_modeling_distillation.py", line 674, in test_da_train
    self.assertEqual(
AssertionError: {'rea[14 chars]2894], grad_fn=<MulBackward0>), 'synthetic': t[85 chars]d0>)} != {'rea[14 chars]2894]), 'synthetic': tensor([1.4532]), 'add': [13 chars]64])}
- {'add': tensor([18.0064], grad_fn=<MulBackward0>),
-  'real': tensor([0.2894], grad_fn=<MulBackward0>),
-  'synthetic': tensor([1.4532], grad_fn=<MulBackward0>)}
+ {'add': tensor([18.0064]),
+  'real': tensor([0.2894]),
+  'synthetic': tensor([1.4532])}
```
Change to use `torch.testing.assert_close` instead for tensor comparison.
Reviewed By: YanjunChen329 Differential Revision: D42352509 fbshipit-source-id: 8a647685d1347a9bd493f2faed7e066eb9159e14
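For illustration, `torch.testing.assert_close` compares tensor values within rtol/atol and recurses into containers such as dicts, so small numerical noise and attached grad_fn no longer make the test flaky (the values below echo the traceback above):
```
import torch

actual = {"add": torch.tensor([18.0064]) * 1.0, "real": torch.tensor([0.2894])}
expected = {"add": torch.tensor([18.0064]), "real": torch.tensor([0.2894])}
# Passes: values match within tolerance, dict structure is compared recursively.
torch.testing.assert_close(actual, expected)
```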
-
- 04 Jan, 2023 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/453 Previous diffs updated the LRScheduler to the public version (e.g. https://github.com/facebookresearch/detectron2/pull/4709), which also requires a newer version of pytorch-lightning. This diff upgrades the lightning version to 1.8.6 and fixes some deprecated call sites from older lightning versions:
- `deepcopy` seems to be supported now, so remove `_deepcopy` (it is now not allowed to access the `trainer` attribute when it is `None`).
- `dataloader_idx` is removed from `on_train_batch_start`.
- Stop using `_accelerator_connector` (the AcceleratorConnector doesn't have those attributes anymore).
- Deprecated `on_pretrain_routine_end` -> `on_fit_start`.
Reviewed By: YanjunChen329 Differential Revision: D42319019 fbshipit-source-id: ba46abbd98da96783e15d187a361fda47dc7d4d6
-
- 20 Dec, 2022 1 commit
-
-
Manuel Lopez Antequera authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/447 The first instance of each toy dataset was ignored when running cocoeval (null instance ids in the gt are not counted). See https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L39 Reviewed By: wat3rBro Differential Revision: D42080839 fbshipit-source-id: df5c758ba0a858a514c6d4a3c68d659b5b7220e5
-
- 19 Dec, 2022 4 commits
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/450 Move checking logic into `import_runner`, simplify `_is_lightning_task`. Reviewed By: mcimpoi Differential Revision: D42105853 fbshipit-source-id: 5fd51865a01f2cbac38aaedcac49207c26172ab9
-
Haroun Habeeb authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/438 Adding new fields to a config is only allowed if `new_allowed=True`. yacs `CfgNode` provides a `set_new_allowed(value: bool)` function. We create a context manager like `temp_defrost` but for new_allowed, and add a unit test for it. Reviewed By: yanglinfang, newstzpz, wat3rBro Differential Revision: D41748992 fbshipit-source-id: 71d048511476001ca96e6b36dde4d177b11268d7
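A minimal sketch of a `temp_defrost`-style context manager over yacs' `set_new_allowed`/`is_new_allowed`; the name `temp_new_allowed` is assumed and may differ from the actual d2go helper:
```
from contextlib import contextmanager

from yacs.config import CfgNode

@contextmanager
def temp_new_allowed(cfg: CfgNode, value: bool = True):
    # Temporarily allow (or forbid) adding new keys, then restore the old flag.
    old = cfg.is_new_allowed()
    cfg.set_new_allowed(value)
    try:
        yield cfg
    finally:
        cfg.set_new_allowed(old)
```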
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/449 Separate TestNetOutput and TrainNetOutput:
- update d2go binaries
- update operators / workflows
Reviewed By: mcimpoi Differential Revision: D42103714 fbshipit-source-id: 53f318c79d7339fb6fcfc3486e8b9cf249a598bf
-
Anton Rigner authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/437
# Problem
- We use `TRAIN_CATEGORIES` to override the classes for convenient experimentation, so we don't have to re-map the JSON file.
- But it's not possible to use the WeightedTrainingSampler with specified repeat factors (`DATASETS.TRAIN_REPEAT_FACTOR`) while also overriding the classes to use for training (ad-hoc datasets), because the underlying dataset name doesn't match the datasets specified in the `TRAIN_REPEAT_FACTOR` pairs (mapping between <dataset_name, repeat_factor>).
# Fix
- Update the dataset names for the REPEAT_FACTORS mapping as well, if we have enabled the `WeightedTrainingSampler` and use ad-hoc datasets.
Reviewed By: wat3rBro Differential Revision: D41765638 fbshipit-source-id: 51dad484e4d715d2de900b5d0b7c7caa19903fb7
-
- 16 Dec, 2022 1 commit
-
-
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/448 Tracing d2go runners using the AdamW optimizer showed many small operators being executed in the optimizer code. They can be fused together by using the foreach version. QPS gain is ~4.5%. Reviewed By: miqueljubert Differential Revision: D42004110 fbshipit-source-id: 807e0a297bb0b4272f67cc4348389294145a20eb
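For reference, the underlying PyTorch knob is the `foreach` argument of `torch.optim.AdamW`, which batches the per-parameter updates into multi-tensor kernels (and, unlike the later `fused=True` option, does not require CUDA parameters); the toy model is only for illustration:
```
import torch

model = torch.nn.Linear(64, 64)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, foreach=True)
```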
-
- 12 Dec, 2022 2 commits
-
-
Olga Gerasimova authored
Summary: X-link: https://github.com/fairinternal/detectron2/pull/589 X-link: https://github.com/facebookresearch/detectron2/pull/4702 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/443 Add the MinIoURandomCrop augmentation and compare model performance. Example of the aug: {F822053068}{F822053066}{F822053051} Data overhead is around 0.04 sec. Without aug: {F822053812}; with aug: {F822053818} Reviewed By: zechenghe Differential Revision: D41804643 fbshipit-source-id: 8f13f98fa8132378a803534b59e892fbc1b3058c
-
Anastasia Tkach authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/442 Reviewed By: tglik Differential Revision: D41714933 fbshipit-source-id: 5b2b3610af554f6082a4025af0673b4bc34b17ca
-
- 09 Dec, 2022 1 commit
-
-
Mircea Cimpoi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/436 Renaming `model_ema.py` to `ema.py` (as `modeling` is already in the folder name). Fixing dependencies after the rename. Reviewed By: wat3rBro Differential Revision: D41685115 fbshipit-source-id: 006999a020a901ea8be4b71e072d688bd36cdce2
-
- 08 Dec, 2022 1 commit
-
-
Siddharth Shah authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/439 As title Reviewed By: mattcyu1 Differential Revision: D41759804 fbshipit-source-id: 929efa960be570f0fe8543600e012d1bf037ab3b
-
- 30 Nov, 2022 3 commits
-
-
Matthew Yu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/432 We support caching of tuples since they behave similarly to lists Reviewed By: XiaoliangDai Differential Revision: D41483876 fbshipit-source-id: 9d741074f8e2335ddd737ae3f1bdb288910f5564
-
Matthew Yu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/431 Add a generic domain adaptation algorithm. This algorithm:
* gets domain0 data out of the dataloader
* runs domain0 data through the model and saves the target layer output
* gets domain1 data out of the dataloader
* runs domain1 data through the model and saves the target layer output
* runs the domain adaptation loss on the domain0 and domain1 outputs
* combines losses using the model training iteration
This diff adds `get_preprocess_domain0_input` and `get_preprocess_domain1_input` to the distillation helper. These are functions that the user can use to convert the dataloader output to something that will be used by the model (e.g., pull the domain0 or domain1 key out of a dataloader that returns a dict). Differential Revision: D40970724 fbshipit-source-id: fff050fbe864654fa6cb0df927f6843855ec1c14
-
Matthew Yu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/430 We add losses in distillation by instantiating them in the distillation algorithm's init and then running them during the forward pass. However, this has some issues:
* the losses are not registered as modules in the model, since we organize them as a list of LayerLossMetadata => things like AMP do not behave as expected
* the losses are not on the same device as the rest of the model, since they are potentially created after the model is moved to a new device
This diff solves both issues with a helper function that registers the losses and moves them to the same device as the model. `register_layer_losses_and_to_device` takes as input `List[LayerLossMetadata]`, moves the losses to the same device as the model, and then registers these losses on the model. Differential Revision: D41296932 fbshipit-source-id: ae7ae0847bce1b5cc481d838b9cae69cea424f25
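A sketch of what such a helper boils down to with plain nn.Module APIs; the `LayerLossMetadata` fields (`name`, `loss`) are assumptions and the actual d2go implementation may differ:
```
import torch.nn as nn

def register_layer_losses_and_to_device(model: nn.Module, layer_losses):
    device = next(model.parameters()).device
    for ll in layer_losses:
        ll.loss.to(device)                            # match the model's device
        model.add_module(f"{ll.name}_loss", ll.loss)  # now visible to AMP, .to(), etc.
    return layer_losses
```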
-