Commits · 3ecf880600d87e050d70e01e66b3da5b1b711d08 · OpenDAS / d2go

07 Jun, 2023 1 commit

Convert GPU to CPU if CUDA not available · 3ecf8806

Jessica Zhong authored Jun 06, 2023

Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/569

Reviewed By: wat3rBro

Differential Revision: D46498855

fbshipit-source-id: 99888f6a36a0f69155c3447cc080392ae9886539
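The title describes falling back from GPU to CPU when CUDA is unavailable. Below is a minimal sketch of that fallback rule. It assumes the requested device is a string such as `"cuda"` or `"cuda:0"`. The helper name `resolve_device` is hypothetical. In real code the `cuda_available` flag would typically come from `torch.cuda.is_available()`.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return the device to use, falling back to CPU when CUDA is missing.

    Hypothetical helper: `cuda_available` is passed in (e.g. the result of
    torch.cuda.is_available()) so the rule can be tested without a GPU.
    """
    if requested.startswith("cuda") and not cuda_available:
        logger.warning("CUDA is not available; using CPU instead of %s", requested)
        return "cpu"
    return requested
```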

06 Jun, 2023 1 commit

added logging and command line flag `--use_elastic` to enable torch elastic · f6afd9a9

Jessica Zhong authored Jun 06, 2023

Reviewed By: wat3rBro

Differential Revision: D46460305

fbshipit-source-id: e91d9312c5d81ef1ba64ab169380329c8ad05f7c

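Below is a minimal sketch of how a launcher flag like `--use_elastic` might switch to reading distributed settings from the environment. This is an assumption, not d2go's implementation. It relies only on the fact that `torchrun` (torch elastic) exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to every worker. The function names are hypothetical.

```python
import argparse
import logging
from typing import Dict, Mapping

logger = logging.getLogger(__name__)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical training launcher")
    parser.add_argument(
        "--use_elastic",
        action="store_true",
        help="Read rank/world size from the env vars set by torchrun (torch elastic)",
    )
    return parser


def distributed_info(args: argparse.Namespace, env: Mapping[str, str]) -> Dict[str, int]:
    """Return rank information, taken from torchrun's env vars when elastic is on."""
    if args.use_elastic:
        info = {
            "rank": int(env["RANK"]),
            "local_rank": int(env["LOCAL_RANK"]),
            "world_size": int(env["WORLD_SIZE"]),
        }
        logger.info("Using torch elastic launch: %s", info)
        return info
    # Without elastic, fall back to a single-process job.
    return {"rank": 0, "local_rank": 0, "world_size": 1}
```

In a real launcher, `env` would be `os.environ`. It is passed in explicitly here so the logic can be exercised without `torchrun`.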
03 Jun, 2023 1 commit

use `get_convert_fx_fn` for eager mode convert · 3ba489fa

Jiaxu Zhu authored Jun 02, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/564

As the title says, we need `ai_factory.quantization.convert.convert_eager` for Stinson models. This diff renames `get_convert_fx_fn` to `get_convert_fn` and adds the eager-mode convert functions as well.

Reviewed By: wat3rBro

Differential Revision: D46368438

fbshipit-source-id: 5ebea1f05b43b476a14ab1091f6ce39bffe614d3

02 Jun, 2023 1 commit

Enable Torch Elastic Launch on Mast in D2go · 7d35bae7

Jessica Zhong authored Jun 02, 2023

Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/566

Reviewed By: wat3rBro

Differential Revision: D45829249

fbshipit-source-id: 4e70bed0e85179b49b4e2358be3d937cfbf474d4

01 Jun, 2023 1 commit

print parameter names in individual param groups · 87956d50

Zhicheng Yan authored May 31, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/539

Print the parameter names in each parameter group to a separate file (instead of writing them to the main log file).
This is useful for knowing which param group a specific parameter is assigned to.
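Below is a minimal sketch of the idea. Each param group's parameters are mapped back to their names by object identity, and the result is written to a dedicated file. The function names and the file layout are hypothetical, not d2go's implementation. With PyTorch, `named_params` would be `model.named_parameters()` and `param_groups` would be `optimizer.param_groups`, a list of dicts with a `"params"` key.

```python
from typing import Any, Dict, Iterable, List, Tuple


def param_group_names(
    named_params: Iterable[Tuple[str, Any]],
    param_groups: List[Dict[str, Any]],
) -> List[List[str]]:
    """Return, for each param group, the names of the parameters it contains.

    Parameters are matched by object identity, which works for torch
    Parameters as well as for plain Python objects.
    """
    id_to_name = {id(p): name for name, p in named_params}
    return [
        [id_to_name.get(id(p), "<unnamed>") for p in group["params"]]
        for group in param_groups
    ]


def dump_param_group_names(groups: List[List[str]], path: str) -> None:
    """Write one section per param group to a separate file, not the main log."""
    with open(path, "w") as f:
        for i, names in enumerate(groups):
            f.write(f"param_group {i} ({len(names)} params):\n")
            for name in names:
                f.write(f"  {name}\n")
```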

Reviewed By: wat3rBro

Differential Revision: D45855436

fbshipit-source-id: 1e1db4cf079802fc20fe3e3d0a931d8c44721d6c

29 May, 2023 2 commits

Put back typing for Base Runner create_shared_context · 17672daa

Ajinkya Deogade authored May 28, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/562

Reverting the changes introduced in the diff D46096375 to restore the state before modularization.

Reviewed By: tglik

Differential Revision: D46145093

fbshipit-source-id: 9897640ec00331fc6ea2817fa46b2272fc33cb8d

Trainer part 2: Create a separate TARGET for lightning trainer · d06a8fb1

Ajinkya Deogade authored May 28, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/561

This continues part 1 (D45912069), which did not define the TARGETS for the lightning trainer.
Now that the circular deps have been resolved, we can define the targets for `d2go/trainer/lightning` and move the other TARGETS inside `d2go/trainer`.

Reviewed By: tglik

Differential Revision: D46096373

fbshipit-source-id: 6efc13eb9ab343d11028fb238e6e3f0c64a03e09

27 May, 2023 8 commits

Utils part 2: create a separate buck target · 0cde431c

Ajinkya Deogade authored May 27, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/560

This continues part 1 (D45912077).
Now that the dependencies have been resolved, we can define the targets inside the dir `d2go/utils`.

Reviewed By: wat3rBro

Differential Revision: D46096376

fbshipit-source-id: ab674d382162a4d7e5ee944b2a649e23278ca79f

Runner: create a separate buck target · 00208026

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/559

Create modular TARGETS for files inside `runner`.

Reviewed By: wat3rBro

Differential Revision: D45854271

fbshipit-source-id: a15ef475f72685ae8c3c73e0a83cf136a7285d3e

Temp remove output typing for Base Runner create_shared_context · 03156ce7

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/558

Presently, `D2GoSharedContext` is imported in `mobile-vision/d2go/d2go/runner/default_runner.py` only for type annotations. Unfortunately, this causes a circular dependency because the TARGET for `d2go.distributed` does not exist yet. Remove it temporarily and reintroduce it at the top of the stack once all the TARGETS have been introduced.

Reviewed By: wat3rBro

Differential Revision: D46096375

fbshipit-source-id: d8633ac755d39b807c18967f35a087178afc9787

Move distillation config to runner default configs · 64c467e2

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/557

To avoid circular dependencies, move the function `add_distillation_configs`, which defines the default config for a `runner` that uses distillation, from `mobile-vision/d2go/d2go/modeling/distillation.py` to `mobile-vision/d2go/d2go/runner/config_defaults.py`.

Reviewed By: wat3rBro

Differential Revision: D46096374

fbshipit-source-id: eb85d91b5239e7ab10809a9bf84c869d05d32401

Checkpoint part 1: create a separate buck target · f5072d01

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/556

The TARGETS for the files inside the directory `checkpoint` are tackled in two parts:

1. This diff creates TARGETS for the files inside `checkpoint`, except those in `checkpoint/fb/tests`
2. The diff D46096372 creates TARGETS for files inside `checkpoint/fb/tests`

Reviewed By: tglik, wat3rBro

Differential Revision: D45912080

fbshipit-source-id: 04ab44e015a9d89d18e31c854600df05d35539d1

Trainer except lightning: create a separate buck target · b6efc047

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/555

The TARGETS for the files inside the directory `trainer` are tackled in two parts:
1. This diff creates TARGETS for the files inside `trainer`, except those in `trainer/lightning`
2. The diff D46096373 creates TARGETS for files inside `trainer/lightning`

Reviewed By: tglik, wat3rBro

Differential Revision: D45912069

fbshipit-source-id: 3026250a49978f1b8e7a48aeebe1914d8a0a692b

Data: create a separate buck target · 363dffb1

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/554

Create separate buck TARGET for the files in the directory `data`.

Reviewed By: wat3rBro

Differential Revision: D45912070

fbshipit-source-id: 6785f623343ac826b01fab4ac187b928462a45dc

Move inplace_delegate from utils to modeling · 818a8c23

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/553

Move `inplace_delegate` from `utils` to `modeling` to break the circular dependency.

Reviewed By: tglik, wat3rBro

Differential Revision: D45912068

fbshipit-source-id: f9f8b1be866ea4d793f4afcd019f16dec3d2f147

26 May, 2023 4 commits

Quantization: create a separate buck target · 1581776b

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/552

This diff breaks down the TARGETS for the `quantization` dir.
Apart from creating the TARGETS, the diff temporarily copies the function `_convert_to_d2` from `d2go/runner/lightning_task.py` to avoid circular dependencies. The change is reverted in D46096373.

Reviewed By: tglik

Differential Revision: D45912067

fbshipit-source-id: b430b2abd129690f8c56479bb75819940fde4e3b

Utils part 1: create a separate buck target · 77dfafa2

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/551

The `utils` dir is broken down in two steps to deal with circular dependencies while keeping the diffs atomic. This diff creates TARGETS for the dirs `utils` (except `demo_predictor.py`) and `utils/fb`. The TARGETS for `utils/testing` and `utils/demo_predictor.py` are introduced further down the stack in D46096376.

Reviewed By: tglik

Differential Revision: D45912077

fbshipit-source-id: fb01969c5f5df97de8afaa24bee8492591059b4d

Optimizer: create a separate buck target · bc1939eb

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/550

Create a buck target for the optimizer.
Besides creating TARGETS, this diff replaces `from d2go.optimizer import build_optimizer_mapper` with `from d2go.optimizer.build import build_optimizer_mapper`.

Reviewed By: tglik

Differential Revision: D45912075

fbshipit-source-id: e478783a9ec16d4573d6365e5567e8d2ed72eb06

Move iterate_module_named_parameters to utils · 1950242a

Ajinkya Deogade authored May 26, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/549

`iterate_module_named_parameters` is used by both `optimizer` and `quantization`.
Move it to the shared location `utils` to break the circular dependencies for the following diffs in the stack.

Reviewed By: tglik

Differential Revision: D45912066

fbshipit-source-id: bce5c5db3bbc1866f4da8662f7bd5908bfe30aad

25 May, 2023 4 commits

Generic Reproducibility · edcdb731

Jiaxu Zhu authored May 25, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/548

As the title says, setting
```
SOLVER.DETERMINISTIC = True
SEED = 42 # or other values
```
makes training results reproducible.
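As a rough illustration of what `SOLVER.DETERMINISTIC` and `SEED` typically have to control in a PyTorch training job, here is a minimal sketch. The helper `seed_everything` is hypothetical and is not d2go's implementation. It only uses standard Python, NumPy, and PyTorch seeding and determinism switches. NumPy and PyTorch are treated as optional so the sketch runs without them.

```python
import os
import random


def seed_everything(seed: int, deterministic: bool = True) -> None:
    """Seed Python/NumPy/PyTorch RNGs and optionally request deterministic kernels.

    Hypothetical helper for illustration; not d2go's implementation.
    """
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # seeds the RNGs of all devices, including CUDA
    if deterministic:
        # cuBLAS needs this for determinism on CUDA >= 10.2; it only takes
        # effect if set before the first cuBLAS call in the process.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # Ops without a deterministic implementation will raise a RuntimeError.
        torch.use_deterministic_algorithms(True)
```

Deterministic kernels can be slower than the defaults. Data-loader workers and multi-process runs need their own per-worker seeding on top of this.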

Reviewed By: wat3rBro, rkaarimi

Differential Revision: D46174626

fbshipit-source-id: d6665b777376a176bd46a1286c3199ed0da26ae6

Config and Registry: create a separate buck target · 1accd414

Ajinkya Deogade authored May 25, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/546

Here we start modularizing the targets. I had to introduce some temporary hacks to break the circular dependency while keeping the diff atomic. There are some TODOs left at the end of the stack that are still WIP.

Reviewed By: tglik

Differential Revision: D45912076

fbshipit-source-id: 375f579fe749dd4a588908cdca7b76ba68f1048f

Resolve relative import for modeldef · 34823153

Ajinkya Deogade authored May 25, 2023

Summary:
There is an issue with the relative import in the `__init__` file of modeldef that causes tests on GitHub CI to fail.
Specifically, the `FBNetV2ModelArch` is not correctly populated.
The internal CI does not detect such failures because we use the buck build system.
This diff fixes it.

Pull Request resolved: https://github.com/facebookresearch/d2go/pull/547

Reviewed By: patricksnape

Differential Revision: D46177424

fbshipit-source-id: 06b23b9b221c990cd15a2debff6def8cfb99743b

fix attribute mismatch for memory profiler · 99c65490

Anthony Chen authored May 24, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/544

The previous memory profiler diff (D45673764) missed a config key name change, which causes an attribute-not-found error. This diff fixes it and adds two unit tests (one with GPU, one without) for using the memory profiler in the runner.

Reviewed By: wat3rBro

Differential Revision: D46114730

fbshipit-source-id: d066d435021983d90f4a75e0c88798a3aedcaf92

24 May, 2023 1 commit

Expand relative imports to absolute versions · 2526b053

Ajinkya Deogade authored May 24, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/545

Expanding the relative imports to absolute ones helps the autodeps down the stack.

Reviewed By: tglik

Differential Revision: D45912074

fbshipit-source-id: d42c9756dde731504ee6fd0f93cf549d71157489

22 May, 2023 1 commit

Add a GPU memory snapshot profiler in d2go · 20e18edc

Anthony Chen authored May 22, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/542

## Overview
Add an option to enable the GPU memory snapshot profiler in d2go. The profiler is natively supported by PyTorch and records stack traces for all CUDA memory allocation/free events, allowing users to understand which parts of the code contribute to the memory bottleneck. It also provides a powerful interactive web tool that visualizes memory utilization over time. Each colored block represents an allocated CUDA memory block; users can click a block to see the Python stack trace that allocated it.

## d2go integration
This diff integrates the profiler as a hook controlled by the config key `USE_MEMORY_PROFILER`. The profiler logs snapshots and web tools to the output directory. Logging can happen at three points: at the start of training, during training, and on OOM. Please read the docstring of `D2GoGpuMemorySnapshot` for more information.

Reviewed By: tglik, jaconey

Differential Revision: D45673764

fbshipit-source-id: 8900484a2266d94421fe3ee7a85a4dea3a9f6b72

19 May, 2023 1 commit

another implementation of log_interval · 876c6756

Yanghan Wang authored May 19, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/543

The problem with the previous implementation:
> the problem is the ContextDecorator somehow swallows the exception in the wrapped function and just returns None.

This diff adds a test that the previous implementation fails:
This diff adds a test such that previous implementation would fail:
```
======================================================================
FAIL: test_log_interval_error_prop (d2go.tests.fb.test_utils_logging.TestUtilsLogging)
Make sure the log_interval can handle error propagation.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/mobile-vision/d2go/tests/__init_tests__/init_tests#link-tree/d2go/tests/fb/test_utils_logging.py", line 152, in test_log_interval_error_prop
    foo(-1)
AssertionError: ValueError not raised

----------------------------------------------------------------------
Ran 1 test in 0.098s
```

The new version seems easier to understand and does not swallow errors.
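Below is a minimal sketch of a decorator-based `log_interval` that keeps error propagation intact. It uses only the standard library and is not the actual d2go implementation. The `try`/`finally` logs the duration on both success and failure without catching the exception. For comparison, a context manager used as a `ContextDecorator` suppresses an exception whenever its `__exit__` returns a truthy value, which is one common way such swallowing happens.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def log_interval(func):
    """Log how long `func` takes to run; exceptions propagate unchanged."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on exception, but never suppresses it.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", func.__qualname__, elapsed)

    return wrapper
```

A function decorated this way mirrors the new test: `foo(-1)` still raises `ValueError`, and normal calls still return their value.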

Reviewed By: jaconey

Differential Revision: D46009938

fbshipit-source-id: 6b632deb513ab47c4d760f796bf49fc45eae3005

18 May, 2023 1 commit

synchronize before dumping model configs · 55319e5d

Jiaxu Zhu authored May 17, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/541

Issue post: https://fb.workplace.com/groups/277527419809135/permalink/1303604910534709/

The fix was suggested by the MV folks.

Reviewed By: dilinwang820, wat3rBro

Differential Revision: D45881863

fbshipit-source-id: b33345c4230067b78f27e7deb038c095d55f1360

16 May, 2023 1 commit

Training Reproducibility · c37ecd66

Jiaxu Zhu authored May 16, 2023

Summary:
X-link: https://github.com/facebookresearch/detectron2/pull/4955

Pull Request resolved: https://github.com/facebookresearch/d2go/pull/540

Allow users to launch deterministic training jobs. That is, using the same training config, users can get identical training results.

Reviewed By: dilinwang820

Differential Revision: D45370627

fbshipit-source-id: 88db388c992500b0d789b8341952502cd1f8f995

12 May, 2023 1 commit

Add @log_interval to log function duration · 64a0e9a7

Jack Zhang authored May 12, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/538

We want to log the time interval to measure a function's execution time.

Reviewed By: wat3rBro

Differential Revision: D45751279

fbshipit-source-id: fe25d3fedd32f61b64e978881b6547d3bc1acb22

10 May, 2023 1 commit

Move print replacement to module level · 67aeb618

Mik Vyatskov authored May 10, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/537

For some reason, numba does not work when `print` is overridden by a local variable. However, when the override is a module attribute, it seems to work.

Reviewed By: navsud

Differential Revision: D45730776

fbshipit-source-id: fee1288b1adb43f69fe7c4e43f4a8a750f0b98b4

08 May, 2023 1 commit

Quantize FBS model with 16bit FX Quantization · e3642005

Jiaxu Zhu authored May 08, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/531

As the title says, enable mixed-precision FX quantization for the FBS model.

This diff includes
1. Add `custom_prepare_fx` to the FBS d2go model to enable the FX quantization.
2. Add two new d2go config params, `QUANTIZATION.ACT_BITS` and `QUANTIZATION.WEIGHTS`.
3. Pass `backend_config` and `qconfig_mapping` to the d2go convert function calls.
4. Add an example FBS FX QAT config.
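For intuition about what a bit-width setting such as `QUANTIZATION.ACT_BITS` controls, the sketch below computes affine (asymmetric) quantization parameters for an arbitrary bit width and fake-quantizes a value. It is a generic illustration under stated assumptions (unsigned integer range, min/max calibration) and is not d2go's or PyTorch's implementation. The function names are hypothetical. For the same value range, going from 8 to 16 bits shrinks the step size `scale` by a factor of 257 (65535 / 255), so the rounding error bound of `scale / 2` shrinks by the same factor.

```python
from typing import Tuple


def affine_qparams(x_min: float, x_max: float, num_bits: int) -> Tuple[float, int, int, int]:
    """Return (scale, zero_point, qmin, qmax) for unsigned asymmetric quantization."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point, qmin, qmax


def fake_quantize(x: float, scale: float, zero_point: int, qmin: int, qmax: int) -> float:
    """Quantize then dequantize `x`, as fake-quant modules do during QAT."""
    q = max(qmin, min(qmax, round(x / scale) + zero_point))
    return (q - zero_point) * scale
```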

Reviewed By: ayushidalmia

Differential Revision: D45252545

fbshipit-source-id: 813b192fcdd66c17629490b8908ce8cd8534506a

07 May, 2023 1 commit

Instrument checkpoints for FSDPCheckpointer · 859f0bb9

John Lee authored May 07, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/536

This diff instruments FSDPCheckpointer checkpoints using signpost, with D44278485 as a reference.

Reviewed By: miqueljubert

Differential Revision: D45524792

fbshipit-source-id: 9b7e004e6853141ee26d65ae11f79b1f5f5db0e6

02 May, 2023 1 commit

Use FSDP.STATE_DICT_TYPE = SHARDED_STATE_DICT by default · 5ecbb174

Anthony Chen authored May 02, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/535

Use `FSDP.STATE_DICT_TYPE = SHARDED_STATE_DICT` for FSDP checkpointing by default. `FSDP.USE_LOCAL_STATE_DICT` will be deprecated in the future.

# Note
After this change, setting `FSDP.USE_LOCAL_STATE_DICT` in a config will not be picked up by the code; it is superseded by the default value of `FSDP.STATE_DICT_TYPE` instead.
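Because the old flag is silently ignored after this change, a user might want to migrate their own configs explicitly. Below is a minimal sketch of such a migration on a plain dict. The helper name is hypothetical. The value strings `"LOCAL_STATE_DICT"` and `"SHARDED_STATE_DICT"` are assumptions modeled on the member names of `torch.distributed.fsdp.StateDictType`. d2go itself does not perform this translation.

```python
import warnings
from typing import Any, Dict


def migrate_fsdp_cfg(fsdp_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the deprecated USE_LOCAL_STATE_DICT flag into STATE_DICT_TYPE.

    Hypothetical helper; operates on a copy of a plain dict of FSDP settings.
    """
    cfg = dict(fsdp_cfg)
    use_local = cfg.pop("USE_LOCAL_STATE_DICT", None)
    if use_local is not None:
        warnings.warn(
            "FSDP.USE_LOCAL_STATE_DICT is no longer read; set FSDP.STATE_DICT_TYPE instead",
            DeprecationWarning,
        )
        # An explicit STATE_DICT_TYPE always wins over the legacy flag.
        if use_local and "STATE_DICT_TYPE" not in cfg:
            cfg["STATE_DICT_TYPE"] = "LOCAL_STATE_DICT"
    cfg.setdefault("STATE_DICT_TYPE", "SHARDED_STATE_DICT")
    return cfg
```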

Reviewed By: tglik

Differential Revision: D45413143

fbshipit-source-id: e7bc2d5dc04ac09004cb89353333be020a9c80b5

01 May, 2023 3 commits

Replace hasattr with getattr in mobile-vision/d2go/d2go/utils/abnormal_checker.py · bbb792d3

Richard Barnes authored Apr 30, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/533

The pattern
```
X.Y if hasattr(X, "Y") else Z
```
can be replaced with
```
getattr(X, "Y", Z)
```

The [getattr](https://www.w3schools.com/python/ref_func_getattr.asp) function gives more succinct code than the [hasattr](https://www.w3schools.com/python/ref_func_hasattr.asp) function. Please use it when appropriate.
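Below is a small self-contained check that the two forms agree, using a toy class whose names are illustrative only. In Python 3, both `hasattr` and `getattr` with a default treat only `AttributeError` as "missing", so the rewrite preserves behavior. The one observable difference is that the `hasattr` form looks the attribute up twice, which matters only for properties with side effects.

```python
class Config:
    """Toy object with a single attribute set."""

    def __init__(self) -> None:
        self.enabled = True


def read_with_hasattr(obj: object, default: bool) -> bool:
    # Original pattern: two lookups of the same attribute.
    return obj.enabled if hasattr(obj, "enabled") else default


def read_with_getattr(obj: object, default: bool) -> bool:
    # Suggested replacement: a single lookup with a fallback value.
    return getattr(obj, "enabled", default)
```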

**This diff is very low risk. Green tests indicate that you can safely Accept & Ship.**

Differential Revision: D44886687

fbshipit-source-id: f3f0265251bf8008ae927b767da5749bf6828c2c

support visualizing panoptic segmentation prediction · 18bd89b2

Zhicheng Yan authored Apr 30, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/532

Enable the visualization of panoptic segmentation.

Reviewed By: tglik

Differential Revision: D45334039

fbshipit-source-id: eebd9316d56d8132a5d3c166058ae18a0e88e928

Add logging for checkpointer type, distributed mode, and checkpointing mode in d2go · a536c85b

Anthony Chen authored Apr 30, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/534

Currently, d2go supports 2 checkpointers, 2 distributed modes and 3 checkpointing modes. The many options make it hard to maintain and manage all use cases. For example, after the recent migration to FSDP sharded_state_dict, it's hard to understand and trace down the usage of the deprecated version.

Per crassirostris and wat3rBro's advice, this diff adds API logging to better keep track of checkpointer usage in d2go.

## Appendix
- 2 checkpointers: FSDPCheckpointer, AIInfraCheckpointer
- 2 distributed modes: ddp, fsdp
- 3 checkpointing modes (fsdp only): local_state_dict, sharded_state_dict, full_state_dict

Reviewed By: tglik

Differential Revision: D45385021

fbshipit-source-id: 5d2cb115ed0fdada254b819793e376e410ecd97d

21 Apr, 2023 1 commit

enable the diffusion visualization evaluators to run on multiple datasets · c7bd7dfe

Tao Xu authored Apr 21, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/527

- Add `model.reset_generation_counter()` to enable the diffusion visualization evaluators to run on multiple test datasets.
  - Before this fix, the visualization evaluators only ran on the 1st test dataset, because `self.generation_counter` is set to a value < 0 after running on the 1st test dataset. The visualization evaluators then skip all the other test sets, since `self.generation_counter < 0`.
- Use DDIM for the upsampler by default for better results.
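Below is a toy reproduction of the failure mode and the fix described above. Only the names `generation_counter` and `reset_generation_counter()` come from the commit. The class, the countdown budget semantics, and `run_visualization` are hypothetical.

```python
from typing import List


class ToyDiffusionModel:
    """Toy model with a per-dataset visualization budget (illustration only)."""

    def __init__(self, max_generations: int) -> None:
        self.max_generations = max_generations
        self.reset_generation_counter()

    def reset_generation_counter(self) -> None:
        # Counts down; a negative value means the budget is exhausted.
        self.generation_counter = self.max_generations - 1

    def visualize(self, num_batches: int) -> int:
        """Return how many batches were visualized for one dataset."""
        generated = 0
        for _ in range(num_batches):
            if self.generation_counter < 0:
                break  # evaluator skips once the counter drops below 0
            generated += 1
            self.generation_counter -= 1
        return generated


def run_visualization(
    model: ToyDiffusionModel, dataset_sizes: List[int], reset_between_datasets: bool
) -> List[int]:
    counts = []
    for num_batches in dataset_sizes:
        if reset_between_datasets:
            model.reset_generation_counter()
        counts.append(model.visualize(num_batches))
    return counts
```

Without the reset, the second dataset gets no visualizations at all. Resetting before each dataset gives every dataset the full budget.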

Reviewed By: zechenghe

Differential Revision: D45058672

fbshipit-source-id: 2f7919bf6ecd2e5f6f242ce3e7891cb3dc8d6af4

20 Apr, 2023 2 commits

add options to exclude buffers and frozen parameters in EMA · d032c02c

Anthony Chen authored Apr 20, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/530

Add options to include/exclude model buffers and frozen parameters in EMA state via two new config keys `MODEL_EMA.INCLUDE_FROZEN` and `MODEL_EMA.INCLUDE_BUFFER`
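Below is a minimal sketch of how such flags might decide which state entries EMA tracks. It uses plain floats instead of tensors so it runs without PyTorch. The flag semantics follow the key names `MODEL_EMA.INCLUDE_FROZEN` and `MODEL_EMA.INCLUDE_BUFFER`. Everything else (the `StateEntry` type, the function names, the update rule's placement) is an assumption, not d2go's implementation. With PyTorch, buffers would come from `model.named_buffers()`, and frozen parameters are those with `requires_grad == False`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StateEntry:
    value: float
    is_buffer: bool = False     # e.g. BatchNorm running statistics
    requires_grad: bool = True  # False for frozen parameters


def select_ema_keys(
    state: Dict[str, StateEntry], include_frozen: bool, include_buffer: bool
) -> List[str]:
    """Return the names of the state entries that EMA should track."""
    keys = []
    for name, entry in state.items():
        if entry.is_buffer:
            if include_buffer:
                keys.append(name)
        elif entry.requires_grad or include_frozen:
            keys.append(name)
    return keys


def ema_update(
    ema_state: Dict[str, float], state: Dict[str, StateEntry], keys: List[str], decay: float
) -> None:
    for k in keys:
        v = state[k].value
        # The first update initializes the copy; then ema = decay * ema + (1 - decay) * v
        ema_state[k] = v if k not in ema_state else decay * ema_state[k] + (1 - decay) * v
```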

Reviewed By: tglik

Differential Revision: D45129625

fbshipit-source-id: 895ebe7e4f8e15566c3c3bddd852dd98c40a27b1

Enable configuration of async write metrics · 3639b43c

Tsahi Glik authored Apr 20, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/529

Add a config param to enable the async metrics writing added in D44305165.

Use it in the LDM Pokemon config as the first use case.
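The summary does not describe how the async writer works. As a generic illustration of the idea (an assumption, not the implementation from D44305165), metrics can be handed to a background thread so the training loop does not block on I/O. The class and method names are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Dict


class AsyncMetricWriter:
    """Offload metric writes to a background thread (hypothetical sketch).

    The training loop only enqueues; the potentially slow `write_fn`
    (e.g. a remote logging call) runs on the worker thread, in order.
    """

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self, write_fn: Callable[[Dict[str, Any]], None]) -> None:
        self._write_fn = write_fn
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, metrics: Dict[str, Any]) -> None:
        # Copy so later mutation by the caller is not seen by the worker.
        self._queue.put(dict(metrics))

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._write_fn(item)

    def close(self) -> None:
        """Flush all pending writes, then stop the worker thread."""
        self._queue.put(self._STOP)
        self._thread.join()
```

Calling `close()` at the end of training matters. Without it, metrics still in the queue can be lost when the daemon thread is killed at interpreter exit.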

Reviewed By: sf-wind

Differential Revision: D44335491

fbshipit-source-id: b000502e6ed0e19a10d6fe3a7470bcd3045e7717

18 Apr, 2023 1 commit

Add the missing optimizer argument · feb74214

Chien-Chin Huang authored Apr 18, 2023

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/528

Calling `shard_full_optim_state_dict()` without passing the optimizer object is being deprecated. This diff passes the optimizer to `shard_full_optim_state_dict()`.

Reviewed By: YanjunChen329

Differential Revision: D45065185

fbshipit-source-id: 0abec3eeff6e7c626eefc432c73e38779a6f02d9
