1. 30 Mar, 2023 1 commit
  2. 11 Mar, 2023 1 commit
  3. 09 Mar, 2023 1 commit
  4. 25 Feb, 2023 1 commit
  5. 23 Feb, 2023 1 commit
  6. 16 Feb, 2023 2 commits
  7. 14 Feb, 2023 1 commit
    • Add NUMA binding · 07ddd262
      Fei Sun authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/472
      
      Add NUMA binding to d2go. It distributes the GPUs evenly across the CPU sockets so that CPU traffic and GPU-to-CPU traffic are balanced. It helps diffusion model training, but it is a general technique that can be applied to all models. We still want to enable it manually in each case until we are confident that it improves performance, at which point we can make it the default.
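
      To illustrate the general idea (a minimal sketch, not the d2go implementation): bind each training process to the CPUs of one NUMA node, spreading GPUs evenly across nodes. The sysfs paths are Linux-specific, and `local_rank`/`num_gpus` are assumed to come from the launcher:

      ```
      import os

      def _parse_cpulist(text):
          # "0-23,48-71" -> {0, 1, ..., 23, 48, ..., 71}
          cpus = set()
          for part in text.strip().split(","):
              lo, _, hi = part.partition("-")
              cpus.update(range(int(lo), int(hi or lo) + 1))
          return cpus

      def bind_to_numa_node(local_rank, num_gpus):
          node_dir = "/sys/devices/system/node"
          nodes = sorted(d for d in os.listdir(node_dir) if d.startswith("node") and d[4:].isdigit())
          # spread GPUs evenly across NUMA nodes, e.g. 8 GPUs over 2 sockets -> 4 GPUs per socket
          node = nodes[local_rank * len(nodes) // num_gpus]
          with open(os.path.join(node_dir, node, "cpulist")) as f:
              cpus = _parse_cpulist(f.read())
          os.sched_setaffinity(0, cpus)  # pin this process to the CPUs of that NUMA node
      ```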
      
      NUMA binding is based on jspark1105's work D42827082. Full credit goes to him.
      
      This diff does not enable the feature.
      
      Reviewed By: newstzpz
      
      Differential Revision: D43036817
      
      fbshipit-source-id: fe67fd656ed3980f04bc81909cae7ba2527346fd
  8. 13 Jan, 2023 1 commit
    • Rewrite FSDP wrapping as modeling hook · dc6fac12
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/440
      
      Move FSDP wrapping to runner.build_model by rewriting it as a modeling hook
      
      **Motivation**
      When a model is too large to run inference on a single GPU, it requires FSDP with local checkpointing mode to reduce peak GPU memory. However, in the eval_pytorch workflow (train_net with eval-only), models are evaluated without being wrapped by FSDP, which can cause OOM errors for the reason above. Thus, it is better practice to wrap the model with FSDP during `runner.build_model(cfg)`, so evaluation runs in the same FSDP setting as training.
      
      This diff moves FSDP wrapping to `runner.build_model(cfg)` by rewriting it as a modeling hook.
      
      **API changes**
      * Users need to append `"FSDPModelingHook"` to `MODEL.MODELING_HOOKS` to enable FSDP.
      * `FSDP.ALGORITHM` can only be `full` or `grad_optim` (see the config sketch below).
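
      A minimal sketch of these config changes (`cfg` is assumed to be the yacs `CfgNode` returned by the runner's `get_default_cfg()`):

      ```
      # enable the FSDP modeling hook and pick a sharding algorithm
      cfg.MODEL.MODELING_HOOKS = list(cfg.MODEL.MODELING_HOOKS) + ["FSDPModelingHook"]
      cfg.FSDP.ALGORITHM = "grad_optim"  # or "full"
      ```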
      
      **Note**
      It's not possible to unwrap an FSDP model back to the original model, so `FSDPModelingHook.unapply()` can't be implemented.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D41416917
      
      fbshipit-source-id: f3fc72d574cc6ccbe0d238e48c575926ba5b4d06
  9. 05 Jan, 2023 1 commit
  10. 09 Dec, 2022 1 commit
  11. 28 Nov, 2022 1 commit
  12. 17 Nov, 2022 1 commit
    • Integrate PyTorch Fully Sharded Data Parallel (FSDP) · 02625ff8
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/396
      
      Integrate PyTorch FSDP, which supports two sharding modes: 1. gradient + optimizer state sharding; 2. full model sharding (parameters + gradients + optimizer state). This feature is enabled in the `train_net.py` code path.
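
      A rough illustration of the two modes with the plain PyTorch FSDP API (not the d2go wrapping code; assumes the process group is already initialized and the model sits on the local GPU):

      ```
      import torch.nn as nn
      from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

      model = nn.Linear(1024, 1024).cuda()  # stand-in model on the local GPU

      # 2. full model sharding: parameters + gradients + optimizer state
      strategy = ShardingStrategy.FULL_SHARD
      # 1. gradient + optimizer state sharding only (parameters stay replicated):
      # strategy = ShardingStrategy.SHARD_GRAD_OP
      wrapped = FSDP(model, sharding_strategy=strategy)
      ```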
      
      Sources
      * Integration follows this tutorial: https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html
      
      API changes
      * Add new config keys to support the new feature. Refer to mobile-vision/d2go/d2go/trainer/fsdp.py for the full list of config options.
      * Add `FSDPCheckpointer` as a subclass of `QATCheckpointer` to support the special loading/saving logic for FSDP models.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D39228316
      
      fbshipit-source-id: 342ecb3bcbce748453c3fba2d6e1b7b7e478473c
  13. 11 Nov, 2022 1 commit
  14. 03 Nov, 2022 1 commit
    • use SharedList as offload backend of DatasetFromList by default · 01c351bc
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/405
      
      - Use the non-hacky way (added in D40818736, https://github.com/facebookresearch/detectron2/pull/4626) to customize the offload backend for `DatasetFromList`.
      - In D2Go, switch to `SharedList` (added in D40789062, https://github.com/facebookresearch/mobile-vision/pull/120) by default to save RAM, and optionally use `DiskCachedList` to save even more RAM.
      
      Local benchmark results in dev mode, using a ~2.4 GiB dataset:
      | RAM usage (RES, SHR) | No-dataset | Naive | NumpySerializedList | SharedList | DiskCachedList |
      | -- | -- | -- | -- | -- | -- |
      | Master GPU worker | 8.0g, 2.8g | 21.4g, 2.8g | 11.6g, 2.8g | 11.5g, 5.2g | -- |
      | Non-master GPU worker | 7.5g, 2.8g | 21.0g, 2.8g | 11.5g, 2.8g | 8.0g, 2.8g | -- |
      | Per data loader worker | 2.0g, 1.0g | 14.0g, 1.0g | 4.4g, 1.0g | 2.1g, 1.0g | -- |
      
      - The memory usage (RES, SHR) comes from the `top` command: `RES` is the total memory used by a process; `SHR` is how much of `RES` can be shared with other processes.
      - Experiments use 2 GPUs and 2 data loader workers per GPU, so there are 6 processes in total; the **numbers are per-process**.
      - `No-dataset`: the same job run with a tiny dataset (only 4.47 MiB after serialization); since its RAM usage is negligible, it shows the floor RAM usage.
      - The other experiments use a dataset of **2413.57 MiB** after serialization.
        - `Naive`: the vanilla version, where the dataset is not offloaded to other storage.
        - `NumpySerializedList`: this optimization was added long ago in D19896490. I recall that the RAM was indeed shared with the data loader workers, but there seems to have been a regression; now basically every process holds its own copy of the data.
        - `SharedList`: enabled in this diff. It shows that only the master GPU worker needs extra RAM. Interestingly, it uses 3.5 GB more RAM than the other ranks while the data itself is 2.4 GB; I'm not sure whether this is overhead of the storage itself or of sharing it with other processes. Since a non-master GPU worker using `NumpySerializedList` also uses 11.5 GB, we probably don't need to worry too much about it.
        - `DiskCachedList`: not benchmarked; it should have no extra RAM usage.
      
      Using the numbers above for a typical 8-GPU, 4-worker training, and assuming the OS and other programs take 20-30 GB of RAM, the current training uses `11.6g * 8 + 4.4g * 8 * 4 = 233.6g` of RAM, on the edge of causing OOM on a 256 GB machine. This aligns with our experience that it supports a ~2 GB dataset. After the change, the training uses only `(11.5g + 8.0g * 7) + 2.1g * 8 * 4 = 134.7g` of RAM, which gives much more headroom; we can thus train with a much larger dataset (e.g. 20 GB) or use more data loader workers (e.g. 8).
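
      The same estimate as a quick sanity check, plugging the per-process RES numbers from the table into the 8-GPU, 4-worker setup:

      ```
      gpus, workers_per_gpu = 8, 4
      # before: every GPU worker and every data loader worker holds its own copy
      numpy_serialized = 11.6 * gpus + 4.4 * gpus * workers_per_gpu           # ~233.6 GB
      # after: only the master GPU worker pays the extra RAM for SharedList
      shared_list = 11.5 + 8.0 * (gpus - 1) + 2.1 * gpus * workers_per_gpu    # ~134.7 GB
      print(numpy_serialized, shared_list)
      ```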
      
      Reviewed By: sstsai-adl
      
      Differential Revision: D40819959
      
      fbshipit-source-id: fbdc9d2d1d440e14ae8496be65979a09f3ed3638
  15. 31 Oct, 2022 1 commit
  16. 26 Oct, 2022 1 commit
    • swap the order of qat and layer freezing to preserve checkpoint values · 13b2fe71
      Matthew Yu authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/399
      
      Freezing the model before running quantization causes an issue when loading a saved checkpoint, because fusing does not support `FrozenBatchNorm2d` (which means the checkpoint could have a fused weight `conv.bn.weight` whereas the model would have an unfused weight `bn.weight`). The longer-term solution is to add `FrozenBatchNorm2d` to the fusing support, but there are some subtle issues there that will take some time to fix:
      * We need to move `FrozenBatchNorm2d` out of D2 and into the mobile_cv lib.
      * The current fuser has options to add new BN ops (e.g., `FrozenBatchNorm2d`), which we use with ops like SyncBN, but this is currently only tested for inference, so we need to write additional checks for training.
      
      The swap makes freezing compatible with QAT and should still work with standard models. One subtle potential issue is that the current BN swap assumes BN is a leaf node. If a user runs QAT without fusing BN, the BN is no longer a leaf node, since it gains an activation_post_process module to record its output; the result is that BN will not be frozen in this specific case. This should not normally happen because BN is usually fused. A small adjustment would be to swap the BN regardless of whether it is a leaf node (though we would have to check that the activation_post_process module is retained). Another long-term consideration is moving both freezing and quantization to modeling hooks so the user can decide the order.
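
      A conceptual sketch of the intended order with plain PyTorch (not the d2go hooks): fuse Conv+BN first, then freeze, so the parameter names line up with a checkpoint saved from the fused model:

      ```
      import torch.nn as nn
      from torch.ao.quantization import fuse_modules

      model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
      model.eval()
      fused = fuse_modules(model, [["0", "1", "2"]])  # Conv+BN+ReLU folded into module "0"
      for p in fused.parameters():  # freeze only after fusing, so names match the fused checkpoint
          p.requires_grad = False
      ```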
      
      Reviewed By: wat3rBro
      
      Differential Revision: D40496052
      
      fbshipit-source-id: 0d7e467b833821f7952cd2fce459ae1f76e1fa3b
  17. 23 Oct, 2022 1 commit
  18. 03 Oct, 2022 1 commit
  19. 29 Sep, 2022 1 commit
  20. 31 Aug, 2022 1 commit
  21. 20 Aug, 2022 1 commit
  22. 27 Jul, 2022 2 commits
  23. 29 Jun, 2022 1 commit
  24. 24 Jun, 2022 1 commit
  25. 20 Jun, 2022 1 commit
  26. 14 Jun, 2022 1 commit
    • make get_default_cfg a classmethod · 65dad512
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/293
      
      In order to pass the runner around the workflow by "runner name" instead of as a runner instance, we need to make sure `get_default_cfg` is not an instance method. It could be either a staticmethod or a classmethod; I chose classmethod for better inheritance.
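
      The change in a nutshell (a sketch; real runners build a much larger config, and the dict below is just a stand-in for the CfgNode):

      ```
      class MyRunner:
          # before: `def get_default_cfg(self)`, which required constructing a runner instance first
          @classmethod
          def get_default_cfg(cls):
              return {"SOLVER": {"MAX_ITER": 90000}}  # stand-in for the real default CfgNode

      cfg = MyRunner.get_default_cfg()  # usable from the class (or runner name) alone, no instance needed
      ```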
      
      Codemod done using the following script:
      ```
      #!/usr/bin/env python3
      
      import json
      import os
      import subprocess
      
      result = subprocess.check_output("fbgs --json 'def get_default_cfg('", shell=True)
      fbgs = json.loads(result)
      fbsource_root = os.path.expanduser("~")
      
      def _indent(s):
          return len(s) - len(s.lstrip())
      
      def resolve_instance_method(content):
          lines = content.split("\n")
          for idx, line in enumerate(lines):
              if "def get_default_cfg(self" in line:
                  indent = _indent(line)
                  # find the class
                  for j in range(idx, 0, -1):
                      if lines[j].startswith(" " * (indent - 4) + "class "):
                          class_line = lines[j]
                          break
                  else:
                      raise RuntimeError("Can't find class")
                  print("class_line: ", class_line)
                  if "Runner" in class_line:
                      # check self if not used
                      for j in range(idx + 1, len(lines)):
                          if _indent(lines[j]) < indent:
                              break
                          assert "self" not in lines[j], (j, lines[j])
                      # update the content
                      assert "def get_default_cfg(self)" in line
                      lines[idx] = lines[idx].replace(
                          "def get_default_cfg(self)", "def get_default_cfg(cls)"
                      )
                      lines.insert(idx, " " * indent + "@classmethod")
                      return "\n".join(lines)
          return content
      
      def resolve_static_method(content):
          lines = content.split("\n")
          for idx, line in enumerate(lines):
              if "def get_default_cfg()" in line:
                  indent = _indent(line)
                  # find the class
                  for j in range(idx, 0, -1):
                      if "class " in lines[j]:
                          class_line = lines[j]
                          break
                  else:
                      print("[WARNING] Can't find class!!!")
                      continue
                  if "Runner" in class_line:
                      # check staticmethod is used
                      for j in range(idx, 0, -1):
                          if lines[j] == " " * indent + "@staticmethod":
                              staticmethod_line_idx = j
                              break
                      else:
                          raise RuntimeError("Can't find staticmethod")
                      # update the content
                      lines[idx] = lines[idx].replace(
                          "def get_default_cfg()", "def get_default_cfg(cls)"
                      )
                      lines[staticmethod_line_idx] = " " * indent + "@classmethod"
                      return "\n".join(lines)
          return content
      
      for result in fbgs["results"]:
          filename = os.path.join(fbsource_root, result["file_name"])
          print(f"processing: {filename}")
          with open(filename) as f:
              content = f.read()
          orig_content = content
          while True:
              old_content = content
              content = resolve_instance_method(content)
              content = resolve_static_method(content)
              if content == old_content:
                  break
          if content != orig_content:
              print("Updating ...")
              with open(filename, "w") as f:
                  f.write(content)
      ```
      
      Reviewed By: tglik
      
      Differential Revision: D37059264
      
      fbshipit-source-id: b09d5518f4232de95d8313621468905cf10a731c
  27. 26 May, 2022 1 commit
  28. 25 May, 2022 2 commits
  29. 21 May, 2022 1 commit
  30. 20 May, 2022 1 commit
  31. 17 May, 2022 1 commit
  32. 15 May, 2022 1 commit
    • apply import merging for fbcode (7 of 11) · b3a9204c
      John Reese authored
      Summary:
      Applies new import merging and sorting from µsort v1.0.
      
      When merging imports, µsort will make a best-effort to move associated
      comments to match merged elements, but there are known limitations due to
      the dynamic nature of Python and developer tooling. These changes should
      not produce any dangerous runtime changes, but may require touch-ups to
      satisfy linters and other tooling.
      
      Note that µsort uses case-insensitive, lexicographical sorting, which
      results in a different ordering compared to isort. This provides a more
      consistent sorting order, matching the case-insensitive order used when
      sorting import statements by module name, and ensures that "frog", "FROG",
      and "Frog" always sort next to each other.
      
      For details on µsort's sorting and merging semantics, see the user guide:
      https://usort.readthedocs.io/en/stable/guide.html#sorting
      
      Reviewed By: lisroach
      
      Differential Revision: D36402205
      
      fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366
  33. 26 Apr, 2022 1 commit
  34. 12 Apr, 2022 1 commit
  35. 05 Apr, 2022 1 commit
    • support do_postprocess when tracing rcnn model in D2 style · 647a3fdf
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/200
      
      Currently, when exporting the RCNN model, we call it with `self.model.inference(inputs, do_postprocess=False)[0]`, so the output of the exported model is not post-processed, e.g. the mask is in its squared shape. This diff adds an option to include the post-processing in the exported model.
      
      Worth noting: since the input is a single tensor, the post-processing doesn't resize the output to the original resolution, and we can't apply the post-processing a second time in the Predictor's PostProcessFunc to do that resizing, so an assertion is added to raise an error in this case. This is fine for most production use cases, where the input is not resized.
      
      Set `RCNN_EXPORT.INCLUDE_POSTPROCESS` to `True` to enable this.
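
      A minimal sketch of enabling it (`cfg` is assumed to be the d2go config used for export):

      ```
      # include the post-processing step in the exported model's graph
      cfg.RCNN_EXPORT.INCLUDE_POSTPROCESS = True
      ```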
      
      Reviewed By: tglik
      
      Differential Revision: D34904058
      
      fbshipit-source-id: 65f120eadc9747e9918d26ce0bd7dd265931cfb5
  36. 31 Mar, 2022 1 commit
  37. 01 Mar, 2022 1 commit
    • Allow Users to Disable the Evaluation after the Last Training Iteration · f16cc060
      Tong Xiao authored
      Summary:
      `Detectron2GoRunner` triggers an evaluation right after the last iteration of `runner.do_train` by default. This is sometimes unnecessary, because there is a `runner.do_test` at the end of training anyway.
      
      It can also have side effects. For example, it causes the training and test data loaders to be present at the same time, which led to an OOM issue in our use case.
      
      In this diff, we add an `eval_after_train` option to the `EvalHook` to allow users to disable the evaluation after the last training iteration.
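
      A hedged sketch of the new option (the hook is normally constructed inside the runner; the import path and the eval callable here are assumptions for illustration):

      ```
      from detectron2.engine.hooks import EvalHook  # assumed hook class; d2go builds this inside the runner

      hook = EvalHook(
          cfg.TEST.EVAL_PERIOD,                 # assumed eval period from the config
          lambda: runner.do_test(cfg, model),   # hypothetical eval callable
          eval_after_train=False,               # skip the extra eval right after the last iteration
      )
      ```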
      
      Reviewed By: wat3rBro
      
      Differential Revision: D34295685
      
      fbshipit-source-id: 3612eb649bb50145346c56c072ae9ca91cb199f5