1. 03 Jun, 2020 1 commit
    • torchvision QAT tutorial: update for QAT with DDP (#2280) · 39021408
      Vasiliy Kuznetsov authored
      Summary:
      
      We've made two recent changes to QAT in PyTorch core:
      1. added support for SyncBatchNorm
      2. made the eager-mode QAT prepare scripts respect device affinity
      
      This PR updates the torchvision QAT reference script to take
      advantage of both of these (a sketch of the resulting flow follows).
      It should land after https://github.com/pytorch/pytorch/pull/39337
      (the last PyTorch fix) to avoid compatibility issues.
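      
      A minimal sketch of the resulting prepare -> SyncBatchNorm -> DDP ordering, assuming
      the structure of the torchvision quantization reference script; `args` and `device`
      here are placeholders, not the script's exact names:
      
      ```
      import torch
      import torchvision
      
      model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=False)
      model.fuse_model()
      model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
      torch.quantization.prepare_qat(model, inplace=True)  # observers now respect device affinity
      
      if args.sync_bn:
          # relies on the new SyncBatchNorm support for QAT-prepared models
          model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
      
      model.to(device)
      model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
      ```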
      
      Test Plan:
      
      ```
      python -m torch.distributed.launch \
        --nproc_per_node 8 \
        --use_env \
        references/classification/train_quantization.py \
        --data-path {imagenet1k_subset} \
        --output-dir {tmp} \
        --sync-bn
      ```
      
  2. 18 May, 2020 1 commit
    • vision classification QAT tutorial: fix for DDP (redo) (#2230) · 7ed3950e
      Vasiliy Kuznetsov authored
      Summary:
      
      Redo of https://github.com/pytorch/vision/pull/2191
      
      Makes the classification QAT tutorial stop crashing when used
      with DDP. There were two issues:
      
      1. The model was moved to the GPU before the observers were added, but observers
      are created on the CPU. In the context of this repo, the fix is to finalize
      the model before moving it to the GPU. We can potentially follow up with a
      better error message in a separate PR.
      2. QAT conversion was running on the DDP-wrapped model, which caused various
      problems. The fix is to unwrap the model from DDP before cloning it for
      evaluation (see the sketch below).
      
      There is still work to do to verify that BN behaves correctly with
      QAT + DDP; that is left for a separate PR.
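      
      A minimal sketch of fix (2), assuming the reference script clones the QAT-prepared
      model for evaluation; the variable names are placeholders:
      
      ```
      import copy
      import torch
      
      # convert the underlying module, not the DDP wrapper
      model_without_ddp = model.module if isinstance(
          model, torch.nn.parallel.DistributedDataParallel) else model
      
      quantized_eval_model = copy.deepcopy(model_without_ddp)
      quantized_eval_model.eval()
      quantized_eval_model.to(torch.device('cpu'))
      torch.quantization.convert(quantized_eval_model, inplace=True)
      ```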
      
      Test Plan:
      
      ```
      python -m torch.distributed.launch --use_env \
        references/classification/train_quantization.py \
        --data-path {path_to_imagenet_1k} \
        --output-dir {output_dir}
      ```
      
  3. 31 Mar, 2020 1 commit
  4. 20 Mar, 2020 1 commit
  5. 13 Mar, 2020 1 commit
  6. 10 Mar, 2020 1 commit
  7. 04 Nov, 2019 1 commit
  8. 30 Oct, 2019 1 commit
  9. 26 Oct, 2019 2 commits
    • Quantizable resnet and mobilenet models (#1471) · b4cb5765
      raghuramank100 authored
      * add quantized models
      
      * Modify mobilenet.py documentation and clean up comments
      
      * Move fuse_model method to QuantizableInvertedResidual and clean up args documentation
      
      * Restore relu settings to default in resnet.py
      
      * Fix missing return in forward
      
      * Fix missing return in forwards
      
      * Change pretrained -> pretrained_float_models
        Replace InvertedResidual with block
      
      * Update tests to follow similar structure to test_models.py, allowing for modular testing
      
      * Replace forward method with simple function assignment
      
      * Fix error in arguments for resnet18
      
      * Add missing pretrained_float_model argument for mobilenet
      
      * Reference script for quantization aware training and post training quantization
      
      * Set pretrained_float_model to False and explicitly provide the float model
      
      * Address review comments:
        1. Replace forward with _forward
        2. Use pretrained models in reference train/eval script
        3. Modify test to skip if fbgemm is not supported
      
      * Fix lint errors.
        Use _forward for common code between float and quantized models
        Clean up linting for reference train scripts
        Test over all quantizable models
      
      * Update default values for args in quantization/train.py
      
      * Update models to conform to new API with quantize argument
        Remove apex in training script, add post training quant as an option
        Add support for separate calibration data set
      
      * Fix minor errors in train_quantization.py
      
      * Remove duplicate file
      
      * Bugfix
      
      * Minor improvements on the models
      
      * Expose print_freq to evaluate
      
      * Minor improvements on train_quantization.py
      
      * Ensure that quantized models are created and run on the specified backends
        Fix errors in test-only mode
      
      * Add model urls
      
      * Fix errors in quantized model tests.
        Speed up creation of random quantized models by removing histogram observers
      
      * Move setting qengine prior to convert
      
      * Fix lint error
      
      * Add readme.md
      
      * Readme.md
      
      * Fix lint
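      
      A short usage sketch of the quantizable models this PR adds, assuming the
      torchvision.models.quantization namespace with a quantize flag and a fuse_model() method:
      
      ```
      import torch
      import torchvision
      
      # pre-quantized model for inference on the fbgemm backend
      torch.backends.quantized.engine = 'fbgemm'
      qmodel = torchvision.models.quantization.resnet18(pretrained=True, quantize=True)
      qmodel.eval()
      
      # float model prepared manually for post-training quantization (calibration not shown)
      fmodel = torchvision.models.quantization.resnet18(pretrained=True, quantize=False)
      fmodel.fuse_model()
      fmodel.qconfig = torch.quantization.get_default_qconfig('fbgemm')
      torch.quantization.prepare(fmodel, inplace=True)
      # ... run calibration batches through fmodel ...
      torch.quantization.convert(fmodel, inplace=True)
      ```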
    • [WIP] Add commands for model training (#1203) · 9e27356f
      Francisco Massa authored
      * Initial version of README for classification reference scripts
      
      * More context
  10. 19 Jul, 2019 1 commit
    • Fix apex distributed training (#1124) · c187c2b1
      Vinh Nguyen authored
      * adding mixed precision training with Apex
      
      * fix APEX default optimization level
      
      * adding python version check for apex
      
      * fix LINT errors and raise exceptions if apex not available
      
      * fixing apex distributed training
      
      * fix throughput calculation: include forward pass
      
      * remove torch.cuda.set_device(args.gpu) as it's already called in init_distributed_mode
      
      * fix linter: new line
      
      * move Apex initialization code back to the beginning of main
      
      * move apex initialization to before the lr_scheduler is created, for peace of mind; doing apex initialization after the lr_scheduler seems to work fine as well (see the sketch below)
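      
      A rough ordering sketch of the apex initialization described above, assuming `model` and
      `optimizer` are already built; the `args.*` flag names are assumptions, not necessarily
      the script's actual ones:
      
      ```
      import torch
      import apex
      from apex import amp
      
      # initialize amp before creating the lr_scheduler and before DDP wrapping
      model, optimizer = amp.initialize(model, optimizer, opt_level=args.apex_opt_level)
      
      lr_scheduler = torch.optim.lr_scheduler.StepLR(
          optimizer, step_size=args.lr_step_size, gamma=args.lr_gamma)
      
      model_without_ddp = model
      if args.distributed:
          model = apex.parallel.DistributedDataParallel(model)
          model_without_ddp = model.module
      ```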
  11. 14 Jun, 2019 1 commit
    • utils.py in references can't work with pytorch-cpu (#1023) · 250bac89
      LXYTSOS authored
      * can't work with pytorch-cpu fixed
      
        utils.py can't work with pytorch-cpu because of this line of code:
        `memory=torch.cuda.max_memory_allocated()`
        (a sketch of a possible guard follows)
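      
      A minimal sketch of the kind of guard that avoids the CUDA call on CPU-only builds;
      the surrounding logging code is assumed:
      
      ```
      import torch
      
      if torch.cuda.is_available():
          # only query GPU memory stats when CUDA is actually present
          max_mem_mb = torch.cuda.max_memory_allocated() / (1024.0 * 1024.0)
          print('max mem: {:.0f} MB'.format(max_mem_mb))
      ```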
  12. 06 Jun, 2019 1 commit
  13. 21 May, 2019 1 commit
  14. 19 May, 2019 1 commit
  15. 08 May, 2019 1 commit
  16. 02 Apr, 2019 2 commits
  17. 28 Mar, 2019 1 commit
    • Initial version of classification reference scripts (#819) · 27ff89f6
      Francisco Massa authored
      * Initial version of classification reference training script
      
      * Updates
      
      * Minor updates
      
      * Expose a few more options
      
      * Load optimizer and lr_scheduler when resuming
      
      Also log the learning rate
      
      * Evaluation-only mode and minor improvements
      
        Identified a bug in the reporting of results: they need to be reduced
        across all processes (see the sketch after this list)
      
      * Address Soumith's comment
      
      * Fix some approximations in the evaluation metric
      
      * Flake8
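      
      A minimal sketch of reducing evaluation results across processes, as the bug note above
      describes; the helper and variable names are assumptions:
      
      ```
      import torch
      import torch.distributed as dist
      
      def reduce_across_processes(value, device):
          # sum a scalar metric (e.g. number of correct predictions) over all ranks
          t = torch.tensor(value, dtype=torch.float64, device=device)
          dist.barrier()
          dist.all_reduce(t)
          return t.item()
      
      # correct, total = ...  # computed locally on each rank
      # acc = reduce_across_processes(correct, device) / reduce_across_processes(total, device)
      ```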