"vscode:/vscode.git/clone" did not exist on "6495927e7ccf4a67720fd81fe5013f752a71f45c"
Unverified Commit 604f8431 authored by Houwen Peng, committed by GitHub

Request for Integrating the new NAS algorithm: Cream (#2705)

parent cda02aff
@@ -135,6 +135,7 @@ Within the following table, we summarized the current NNI capabilities, we are g
<li><a href="docs/en_US/NAS/Proxylessnas.md">ProxylessNAS</a></li> <li><a href="docs/en_US/NAS/Proxylessnas.md">ProxylessNAS</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a></li> <li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a></li>
<li><a href="docs/en_US/NAS/TextNAS.md">TextNAS</a></li> <li><a href="docs/en_US/NAS/TextNAS.md">TextNAS</a></li>
<li><a href="docs/en_US/NAS/Cream.md">Cream</a></li>
</ul>
</ul>
<a href="docs/en_US/Compression/Overview.md">Model Compression</a>
......
# CDARTS
## Introduction
[CDARTS](https://arxiv.org/pdf/2006.10724.pdf) builds a cyclic feedback mechanism between the search and evaluation networks. First, the search network generates an initial topology for evaluation, so that the weights of the evaluation network can be optimized. Second, the architecture topology in the search network is further optimized by the label supervision from classification, as well as by the regularization from the evaluation network through feature distillation. Repeating the above cycle results in a joint optimization of the search and evaluation networks, and thus enables the evolution of the topology to fit the final evaluation network.
In the implementation of `CdartsTrainer`, it first instantiates two models and two mutators (one for each). The first model is the so-called "search network", which is mutated with a `RegularizedDartsMutator` -- a mutator with subtle differences from `DartsMutator`. The second model is the "evaluation network", which is mutated with a discrete mutator that leverages the previous search network mutator to sample a single path each time. The trainer optimizes the models and mutators alternately. Users can refer to the [paper](https://arxiv.org/pdf/2006.10724.pdf) for more details on these trainers and mutators.
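For intuition, here is a minimal sketch of one such cycle; the `export_topology`/`load_topology` methods, the `(logits, features)` model outputs, and the optimizer objects are hypothetical stand-ins for illustration, not the actual `CdartsTrainer` internals.

```python
import torch
import torch.nn.functional as F

def cdarts_cycle(search_net, eval_net, weight_opt, arch_opt,
                 train_loader, valid_loader, distill_weight=1.0):
    # 1. The search network proposes a discrete topology (assumed interface).
    eval_net.load_topology(search_net.export_topology())
    # 2. Optimize the evaluation network's weights on the training set.
    for x, y in train_loader:
        weight_opt.zero_grad()
        logits, _ = eval_net(x)
        F.cross_entropy(logits, y).backward()
        weight_opt.step()
    # 3. Optimize the search network's architecture parameters with label
    #    supervision plus feature distillation from the evaluation network.
    for x, y in valid_loader:
        arch_opt.zero_grad()
        logits, feat_s = search_net(x)
        with torch.no_grad():
            _, feat_e = eval_net(x)
        loss = F.cross_entropy(logits, y) + \
            distill_weight * F.mse_loss(feat_s, feat_e)
        loss.backward()
        arch_opt.step()
```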
## Reproduction Results
@@ -19,7 +20,7 @@ This is CDARTS based on the NNI platform, which currently supports CIFAR10 searc
## Examples
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/cdarts)
```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
@@ -55,3 +56,4 @@ bash run_retrain_cifar.sh
.. autoclass:: nni.algorithms.nas.pytorch.cdarts.RegularizedMutatorParallel
    :members:
```
# Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
**[[Paper]](https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf) [[Models-Google Drive]](https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing) [[Models-Baidu Disk (PWD: wqw6)]](https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g) [[BibTex]](https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja)** <br/>
In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of the prioritized path, which refers to architecture candidates that exhibit superior performance during training. Distilling knowledge from the prioritized paths can boost the training of subnetworks. Since the prioritized paths change on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. The discovered architectures achieve superior performance compared to the recent [MobileNetV3](https://arxiv.org/abs/1905.02244) and [EfficientNet](https://arxiv.org/abs/1905.11946) families under aligned settings.
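To make the mechanism concrete, here is a toy sketch of a prioritized-path pool; the class and its scoring interface are illustrative assumptions rather than the released implementation (the real code exposes related knobs such as `SUPERNET.POOL_SIZE` and `SUPERNET.PICK_METHOD` in the configs later in this commit).

```python
import random

class PathBoard:
    """Toy prioritized-path board: keep the best subnetwork paths seen
    so far and hand one out as a distillation teacher."""

    def __init__(self, pool_size=10):
        self.pool_size = pool_size
        self.pool = []  # list of (score, path) pairs

    def update(self, path, score):
        # Insert the new path, then keep only the top pool_size entries.
        self.pool.append((score, path))
        self.pool.sort(key=lambda p: p[0], reverse=True)
        del self.pool[self.pool_size:]  # the cream of the crop

    def pick_teacher(self):
        # The paper picks teachers via a meta-learned matching score;
        # uniform random choice is a deliberate simplification here.
        return random.choice(self.pool)[1] if self.pool else None
```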
<div >
<img src="https://github.com/microsoft/Cream/blob/main/demo/intro.jpg" width="800"/>
</div>
## Reproduced Results
Top-1 accuracy on ImageNet. The top-1 accuracy of the Cream search algorithm surpasses MobileNetV3 and EfficientNet-B0/B1 on ImageNet. Training with 16 GPUs is slightly better than with 8 GPUs, as shown below.
| Model (M FLOPs) | 8 GPUs | 16 GPUs |
| ---- |:-------------:| :-----:|
| 14M | 53.7 | 53.8 |
| 43M | 65.8 | 66.5 |
| 114M | 72.1 | 72.8 |
| 287M | 76.7 | 77.6 |
| 481M | 78.9 | 79.2 |
| 604M | 79.4 | 80.0 |
<table style="border: none">
<th><img src="./../../img/cream_flops100.jpg" alt="drawing" width="400"/></th>
<th><img src="./../../img/cream_flops600.jpg" alt="drawing" width="400"/></th>
</table>
## Examples
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/cream)
Please run the following scripts in the example folder.
## Data Preparation
You need to first download [ImageNet-2012](http://www.image-net.org/) to the folder `./data/imagenet` and move the validation set to the subfolder `./data/imagenet/val`. To move the validation set, you could use the following script: <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh>.
Put the ImageNet data in `./data`. The layout should look like the following:
```
./data/imagenet/train
./data/imagenet/val
...
```
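Before launching a long job, it may help to sanity-check this layout; a minimal sketch using `torchvision` (assuming the paths above):

```python
import torchvision.datasets as datasets

# Verify that both ImageNet splits load and report their class counts.
for split in ('train', 'val'):
    ds = datasets.ImageFolder(f'./data/imagenet/{split}')
    print(f'{split}: {len(ds)} images, {len(ds.classes)} classes')
```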
## Quick Start
### I. Search
First, set up the environment for searching.
```bash
pip install -r ./requirements
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cpp_ext --cuda_ext
```
To search for an architecture, you need to configure the parameters `FLOPS_MINIMUM` and `FLOPS_MAXIMUM` to specify the desired model FLOPs range, such as [0, 600] M FLOPs. You can specify the FLOPs interval by changing these two parameters in `./configs/train.yaml`:
```
FLOPS_MINIMUM: 0 # Minimum FLOPs of the architecture
FLOPS_MAXIMUM: 600 # Maximum FLOPs of the architecture
```
For example, if you expect to search for an architecture with model FLOPs <= 200M, set `FLOPS_MINIMUM` to `0` and `FLOPS_MAXIMUM` to `200`.
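Since the configs are `yacs` nodes (see the config module later in this commit, which exposes a global `cfg`), the same interval can also be set programmatically; a minimal sketch, assuming the module lives at `lib/config.py`:

```python
from lib.config import cfg  # module path assumed; it exposes the yacs `cfg`

# Load the YAML defaults, then narrow the search to <= 200M FLOPs.
# merge_from_file / merge_from_list are standard yacs CfgNode methods.
cfg.merge_from_file('./configs/train.yaml')
cfg.merge_from_list(['SUPERNET.FLOPS_MINIMUM', 0,
                     'SUPERNET.FLOPS_MAXIMUM', 200])
```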
After you specify the FLOPs range of the architectures you would like to search, you can start the search by running:
```bash
python -m torch.distributed.launch --nproc_per_node=8 ./train.py --cfg ./configs/train.yaml
```
The searched architectures need to be retrained to obtain the final model, which is saved in `.pth.tar` format (see the Retrain section below).
### II. Retrain
To train the searched architectures, you need to configure the parameter `MODEL_SELECTION` to specify the model FLOPs. To specify which model to train, add `MODEL_SELECTION` to `./configs/retrain.yaml`. You can select one from [14, 43, 114, 287, 481, 604], which stand for the different FLOPs (M).
```
MODEL_SELECTION: 43 # Retrain 43m model
MODEL_SELECTION: 481 # Retrain 481m model
......
```
To train random architectures, you need to set `MODEL_SELECTION` to `-1` and configure the parameter `INPUT_ARCH`:
```
MODEL_SELECTION: -1 # Train random architectures
INPUT_ARCH: [[0], [3], [3, 3], [3, 1, 3], [3, 3, 3, 3], [3, 3, 3], [0]] # Random Architectures
......
```
After adding `MODEL_SELECTION` to `./configs/retrain.yaml`, use the following command to train the model:
```bash
python -m torch.distributed.launch --nproc_per_node=8 ./retrain.py --cfg ./configs/retrain.yaml
```
### III. Test
To test the trained models, you need to use `MODEL_SELECTION` in `./configs/test.yaml` to specify which model to test.
```
MODEL_SELECTION: 43 # Test 43m model
MODEL_SELECTION: 481 # Test 481m model
......
```
After specifying the FLOPs of the model, you need to write the path of the checkpoint to resume from into `./configs/test.yaml`:
```
RESUME_PATH: './43.pth.tar'
RESUME_PATH: './481.pth.tar'
......
```
We provide 14M/43M/114M/287M/481M/604M pretrained models on [Google Drive](https://drive.google.com/drive/folders/1CQjyBryZ4F20Rutj7coF8HWFcedApUn2) or [Baidu Disk (password: wqw6)](https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g).
After downloading the pretrained models and setting `MODEL_SELECTION` and `RESUME_PATH` in `./configs/test.yaml`, use the following command to test the model:
```bash
python -m torch.distributed.launch --nproc_per_node=8 ./test.py --cfg ./configs/test.yaml
```
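If resuming fails, it may help to inspect a downloaded checkpoint first; a minimal sketch (the checkpoint layout is not documented in this commit, so the printed keys are whatever the file actually contains):

```python
import torch

# Load the checkpoint on CPU and list its top-level keys; .pth.tar files
# saved by timm-style savers are usually dicts with 'state_dict' etc.
ckpt = torch.load('./43.pth.tar', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```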
@@ -15,3 +15,4 @@ One-shot NAS algorithms leverage weight sharing among models in neural architect
CDARTS <CDARTS>
ProxylessNAS <Proxylessnas>
TextNAS <TextNAS>
Cream <Cream>
[Documentation](https://nni.readthedocs.io/en/latest/NAS/Cream.html)
# ./configs/retrain.yaml
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '604m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 4
NUM_GPU: 2
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 32 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.0
SELECTION: 42
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9998
OPT: 'sgd'
OPT_EPS: 1e-2
MOMENTUM: 0.9
DECAY_RATE: 0.1
SCHED: 'sgd'
LR_NOISE: None
LR_NOISE_PCT: 0.67
LR_NOISE_STD: 1.0
WARMUP_LR: 1e-4
MIN_LR: 1e-5
EPOCHS: 200
START_EPOCH: None
DECAY_EPOCHS: 30.0
WARMUP_EPOCHS: 3
COOLDOWN_EPOCHS: 10
PATIENCE_EPOCHS: 10
LR: 1e-2
# ./configs/test.yaml
AUTO_RESUME: True
DATA_DIR: './data/imagenet'
MODEL: 'Childnet_Testing'
RESUME_PATH: './experiments/workspace/ckps/42.pth.tar'
SAVE_PATH: './'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 4
NUM_GPU: 2
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'bilinear' # Image resize interpolation type
BATCH_SIZE: 32 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.0
SELECTION: 42
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9998
OPTIMIZER:
MOMENTUM: 0.9
WEIGHT_DECAY: 1e-3
# ./configs/train.yaml
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: 'Supernet_Training'
RESUME_PATH: './experiments/workspace/train/resume.pth.tar'
SAVE_PATH: './'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'bilinear' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NET:
GP: 'avg'
DROPOUT_RATE: 0.0
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9998
OPT: 'sgd'
LR: 1.0
EPOCHS: 120
META_LR: 1e-4
BATCHNORM:
SYNC_BN: False
SUPERNET:
UPDATE_ITER: 200
SLICE: 4
POOL_SIZE: 10
RESUNIT: False
DIL_CONV: False
UPDATE_2ND: True
FLOPS_MINIMUM: 0
FLOPS_MAXIMUM: 600
PICK_METHOD: 'meta'
META_STA_EPOCH: 20
HOW_TO_PROB: 'pre_prob'
PRE_PROB: (0.05,0.2,0.05,0.5,0.05,0.15)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# Written by Hao Du and Houwen Peng
# email: haodu8-c@my.cityu.edu.hk and houwen.peng@microsoft.com
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from yacs.config import CfgNode as CN
DEFAULT_CROP_PCT = 0.875
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)
__C = CN()
cfg = __C
__C.AUTO_RESUME = True
__C.DATA_DIR = './data/imagenet'
__C.MODEL = 'cream'
__C.RESUME_PATH = './experiments/ckps/resume.pth.tar'
__C.SAVE_PATH = './experiments/ckps/'
__C.SEED = 42
__C.LOG_INTERVAL = 50
__C.RECOVERY_INTERVAL = 0
__C.WORKERS = 4
__C.NUM_GPU = 1
__C.SAVE_IMAGES = False
__C.AMP = False
__C.ACC_GAP = 5
__C.OUTPUT = 'output/path/'
__C.EVAL_METRICS = 'prec1'
__C.TTA = 0 # Test or inference time augmentation
__C.LOCAL_RANK = 0
__C.VERBOSE = False
# dataset configs
__C.DATASET = CN()
__C.DATASET.NUM_CLASSES = 1000
__C.DATASET.IMAGE_SIZE = 224 # image patch size
__C.DATASET.INTERPOLATION = 'bilinear' # Image resize interpolation type
__C.DATASET.BATCH_SIZE = 32 # batch size
__C.DATASET.NO_PREFECHTER = False
__C.DATASET.PIN_MEM = True
__C.DATASET.VAL_BATCH_MUL = 4
# model configs
__C.NET = CN()
__C.NET.SELECTION = 14
__C.NET.GP = 'avg' # type of global pool ["avg", "max", "avgmax", "avgmaxc"]
__C.NET.DROPOUT_RATE = 0.0 # dropout rate
__C.NET.INPUT_ARCH = [[0], [3], [3, 3], [3, 1, 3], [3, 3, 3, 3], [3, 3, 3], [0]]
# model ema parameters
__C.NET.EMA = CN()
__C.NET.EMA.USE = True
__C.NET.EMA.FORCE_CPU = False # force model ema to be tracked on CPU
__C.NET.EMA.DECAY = 0.9998
# optimizer configs
__C.OPT = 'sgd'
__C.OPT_EPS = 1e-2
__C.MOMENTUM = 0.9
__C.WEIGHT_DECAY = 1e-4
__C.OPTIMIZER = CN()
__C.OPTIMIZER.NAME = 'sgd'
__C.OPTIMIZER.MOMENTUM = 0.9
__C.OPTIMIZER.WEIGHT_DECAY = 1e-3
# scheduler configs
__C.SCHED = 'sgd'
__C.LR_NOISE = None
__C.LR_NOISE_PCT = 0.67
__C.LR_NOISE_STD = 1.0
__C.WARMUP_LR = 1e-4
__C.MIN_LR = 1e-5
__C.EPOCHS = 200
__C.START_EPOCH = None
__C.DECAY_EPOCHS = 30.0
__C.WARMUP_EPOCHS = 3
__C.COOLDOWN_EPOCHS = 10
__C.PATIENCE_EPOCHS = 10
__C.DECAY_RATE = 0.1
__C.LR = 1e-2
__C.META_LR = 1e-4
# data augmentation parameters
__C.AUGMENTATION = CN()
__C.AUGMENTATION.AA = 'rand-m9-mstd0.5'
__C.AUGMENTATION.COLOR_JITTER = 0.4
__C.AUGMENTATION.RE_PROB = 0.2 # random erase prob
__C.AUGMENTATION.RE_MODE = 'pixel' # random erase mode
__C.AUGMENTATION.MIXUP = 0.0 # mixup alpha
__C.AUGMENTATION.MIXUP_OFF_EPOCH = 0 # turn off mixup after this epoch
__C.AUGMENTATION.SMOOTHING = 0.1 # label smoothing parameters
# batch norm parameters (only works with gen_efficientnet based models
# currently)
__C.BATCHNORM = CN()
__C.BATCHNORM.SYNC_BN = True
__C.BATCHNORM.BN_TF = False
__C.BATCHNORM.BN_MOMENTUM = 0.1 # batchnorm momentum override
__C.BATCHNORM.BN_EPS = 1e-5 # batchnorm eps override
# supernet training hyperparameters
__C.SUPERNET = CN()
__C.SUPERNET.UPDATE_ITER = 1300
__C.SUPERNET.SLICE = 4
__C.SUPERNET.POOL_SIZE = 10
__C.SUPERNET.RESUNIT = False
__C.SUPERNET.DIL_CONV = False
__C.SUPERNET.UPDATE_2ND = True
__C.SUPERNET.FLOPS_MAXIMUM = 600
__C.SUPERNET.FLOPS_MINIMUM = 0
__C.SUPERNET.PICK_METHOD = 'meta' # pick teacher method
__C.SUPERNET.META_STA_EPOCH = 20 # start using meta picking method
__C.SUPERNET.HOW_TO_PROB = 'pre_prob' # sample method
__C.SUPERNET.PRE_PROB = (0.05, 0.2, 0.05, 0.5, 0.05,
0.15) # sample prob in 'pre_prob'
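When `HOW_TO_PROB` is set to `'pre_prob'`, paths are sampled under the fixed prior `PRE_PROB` above; a minimal sketch of such sampling (the mapping from indices to concrete operators is an assumption, not shown in this commit):

```python
import random

# Draw one candidate-op index per layer according to the PRE_PROB prior.
PRE_PROB = (0.05, 0.2, 0.05, 0.5, 0.05, 0.15)
op_indices = random.choices(range(len(PRE_PROB)), weights=PRE_PROB, k=8)
print(op_indices)  # e.g. [3, 1, 3, 3, 5, 3, 1, 3]
```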
import os
import time
import torch
import torchvision
from collections import OrderedDict
from lib.utils.util import AverageMeter, accuracy, reduce_tensor
def train_epoch(
epoch, model, loader, optimizer, loss_fn, cfg,
lr_scheduler=None, saver=None, output_dir='', use_amp=False,
model_ema=None, logger=None, writer=None, local_rank=0):
batch_time_m = AverageMeter()
data_time_m = AverageMeter()
losses_m = AverageMeter()
prec1_m = AverageMeter()
prec5_m = AverageMeter()
model.train()
end = time.time()
last_idx = len(loader) - 1
num_updates = epoch * len(loader)
optimizer.zero_grad()
for batch_idx, (input, target) in enumerate(loader):
last_batch = batch_idx == last_idx
data_time_m.update(time.time() - end)
input = input.cuda()
target = target.cuda()
output = model(input)
loss = loss_fn(output, target)
prec1, prec5 = accuracy(output, target, topk=(1, 5))
if cfg.NUM_GPU > 1:
reduced_loss = reduce_tensor(loss.data, cfg.NUM_GPU)
prec1 = reduce_tensor(prec1, cfg.NUM_GPU)
prec5 = reduce_tensor(prec5, cfg.NUM_GPU)
else:
reduced_loss = loss.data
optimizer.zero_grad()
loss.backward()
optimizer.step()
torch.cuda.synchronize()
losses_m.update(reduced_loss.item(), input.size(0))
prec1_m.update(prec1.item(), output.size(0))
prec5_m.update(prec5.item(), output.size(0))
if model_ema is not None:
model_ema.update(model)
num_updates += 1
batch_time_m.update(time.time() - end)
if last_batch or batch_idx % cfg.LOG_INTERVAL == 0:
lrl = [param_group['lr'] for param_group in optimizer.param_groups]
lr = sum(lrl) / len(lrl)
if local_rank == 0:
logger.info(
'Train: {} [{:>4d}/{}] '
'Loss: {loss.val:>9.6f} ({loss.avg:>6.4f}) '
'Prec@1: {top1.val:>7.4f} ({top1.avg:>7.4f}) '
'Prec@5: {top5.val:>7.4f} ({top5.avg:>7.4f}) '
'Time: {batch_time.val:.3f}s, {rate:>7.2f}/s '
'({batch_time.avg:.3f}s, {rate_avg:>7.2f}/s) '
                    'LR: {lr:.3e} '
'Data: {data_time.val:.3f} ({data_time.avg:.3f})'.format(
epoch,
batch_idx,
len(loader),
loss=losses_m,
top1=prec1_m,
top5=prec5_m,
batch_time=batch_time_m,
rate=input.size(0) *
cfg.NUM_GPU /
batch_time_m.val,
rate_avg=input.size(0) *
cfg.NUM_GPU /
batch_time_m.avg,
lr=lr,
data_time=data_time_m))
                # TensorBoard scalars for the training curves
                writer.add_scalar(
                    'Loss/train',
                    losses_m.avg,
                    epoch * len(loader) + batch_idx)
                writer.add_scalar(
                    'Accuracy/train',
                    prec1_m.avg,
                    epoch * len(loader) + batch_idx)
writer.add_scalar(
'Learning_Rate',
optimizer.param_groups[0]['lr'],
epoch * len(loader) + batch_idx)
if cfg.SAVE_IMAGES and output_dir:
torchvision.utils.save_image(
input, os.path.join(
output_dir, 'train-batch-%d.jpg' %
batch_idx), padding=0, normalize=True)
if saver is not None and cfg.RECOVERY_INTERVAL and (
last_batch or (batch_idx + 1) % cfg.RECOVERY_INTERVAL == 0):
saver.save_recovery(
model,
optimizer,
cfg,
epoch,
model_ema=model_ema,
use_amp=use_amp,
batch_idx=batch_idx)
if lr_scheduler is not None:
lr_scheduler.step_update(
num_updates=num_updates,
metric=losses_m.avg)
end = time.time()
# end for
if hasattr(optimizer, 'sync_lookahead'):
optimizer.sync_lookahead()
return OrderedDict([('loss', losses_m.avg)])
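`reduce_tensor` is imported from `lib.utils.util` but not shown in this commit; the conventional implementation averages a metric across distributed workers, roughly as follows (a sketch of the standard pattern, not necessarily the file's exact contents):

```python
import torch.distributed as dist

def reduce_tensor(tensor, n):
    # Sum the tensor across all n processes, then divide, so that every
    # rank logs the same averaged loss/accuracy value.
    rt = tensor.clone()
    dist.all_reduce(rt, op=dist.ReduceOp.SUM)
    rt /= n
    return rt
```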
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# Written by Hao Du and Houwen Peng
# email: haodu8-c@my.cityu.edu.hk and houwen.peng@microsoft.com
import time
import torch
from collections import OrderedDict
from lib.utils.util import AverageMeter, accuracy, reduce_tensor
def validate(epoch, model, loader, loss_fn, cfg, log_suffix='', logger=None, writer=None, local_rank=0):
batch_time_m = AverageMeter()
losses_m = AverageMeter()
prec1_m = AverageMeter()
prec5_m = AverageMeter()
model.eval()
end = time.time()
last_idx = len(loader) - 1
with torch.no_grad():
for batch_idx, (input, target) in enumerate(loader):
last_batch = batch_idx == last_idx
output = model(input)
if isinstance(output, (tuple, list)):
output = output[0]
# augmentation reduction
reduce_factor = cfg.TTA
if reduce_factor > 1:
output = output.unfold(
0,
reduce_factor,
reduce_factor).mean(
dim=2)
target = target[0:target.size(0):reduce_factor]
loss = loss_fn(output, target)
prec1, prec5 = accuracy(output, target, topk=(1, 5))
if cfg.NUM_GPU > 1:
reduced_loss = reduce_tensor(loss.data, cfg.NUM_GPU)
prec1 = reduce_tensor(prec1, cfg.NUM_GPU)
prec5 = reduce_tensor(prec5, cfg.NUM_GPU)
else:
reduced_loss = loss.data
torch.cuda.synchronize()
losses_m.update(reduced_loss.item(), input.size(0))
prec1_m.update(prec1.item(), output.size(0))
prec5_m.update(prec5.item(), output.size(0))
batch_time_m.update(time.time() - end)
end = time.time()
if local_rank == 0 and (last_batch or batch_idx % cfg.LOG_INTERVAL == 0):
log_name = 'Test' + log_suffix
logger.info(
'{0}: [{1:>4d}/{2}] '
'Time: {batch_time.val:.3f} ({batch_time.avg:.3f}) '
'Loss: {loss.val:>7.4f} ({loss.avg:>6.4f}) '
'Prec@1: {top1.val:>7.4f} ({top1.avg:>7.4f}) '
'Prec@5: {top5.val:>7.4f} ({top5.avg:>7.4f})'.format(
log_name, batch_idx, last_idx,
batch_time=batch_time_m, loss=losses_m,
top1=prec1_m, top5=prec5_m))
                # TensorBoard scalars for the validation curves
                writer.add_scalar(
                    'Loss' + log_suffix + '/valid',
                    losses_m.avg,
                    epoch * len(loader) + batch_idx)
                writer.add_scalar(
                    'Accuracy' + log_suffix + '/valid',
                    prec1_m.avg,
                    epoch * len(loader) + batch_idx)
metrics = OrderedDict(
[('loss', losses_m.avg), ('prec1', prec1_m.avg), ('prec5', prec5_m.avg)])
return metrics
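The `cfg.TTA` branch above folds groups of augmented copies back into one prediction per image; a small standalone demonstration of the `unfold(...).mean(dim=2)` reduction:

```python
import torch

# With TTA = 2 the loader yields two augmented copies per image, so a
# batch of 4 logit rows covers 2 underlying samples; unfold groups each
# consecutive pair and mean(dim=2) averages their logits.
tta = 2
output = torch.randn(4, 1000)                # 2 images x 2 copies
reduced = output.unfold(0, tta, tta).mean(dim=2)
print(reduced.shape)                         # torch.Size([2, 1000])
```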
from lib.models.blocks.residual_block import get_Bottleneck, get_BasicBlock
from lib.models.blocks.inverted_residual_block import InvertedResidual
# This file is downloaded from
# https://github.com/rwightman/pytorch-image-models
import torch.nn as nn
from timm.models.layers import create_conv2d
from timm.models.efficientnet_blocks import make_divisible, resolve_se_args, \
SqueezeExcite, drop_path
class InvertedResidual(nn.Module):
""" Inverted residual block w/ optional SE and CondConv routing"""
def __init__(
self,
in_chs,
out_chs,
dw_kernel_size=3,
stride=1,
dilation=1,
pad_type='',
act_layer=nn.ReLU,
noskip=False,
exp_ratio=1.0,
exp_kernel_size=1,
pw_kernel_size=1,
se_ratio=0.,
se_kwargs=None,
norm_layer=nn.BatchNorm2d,
norm_kwargs=None,
conv_kwargs=None,
drop_path_rate=0.):
super(InvertedResidual, self).__init__()
norm_kwargs = norm_kwargs or {}
conv_kwargs = conv_kwargs or {}
mid_chs = make_divisible(in_chs * exp_ratio)
has_se = se_ratio is not None and se_ratio > 0.
self.has_residual = (in_chs == out_chs and stride == 1) and not noskip
self.drop_path_rate = drop_path_rate
# Point-wise expansion
self.conv_pw = create_conv2d(
in_chs,
mid_chs,
exp_kernel_size,
padding=pad_type,
**conv_kwargs)
self.bn1 = norm_layer(mid_chs, **norm_kwargs)
self.act1 = act_layer(inplace=True)
# Depth-wise convolution
self.conv_dw = create_conv2d(
mid_chs, mid_chs, dw_kernel_size, stride=stride, dilation=dilation,
padding=pad_type, depthwise=True, **conv_kwargs)
self.bn2 = norm_layer(mid_chs, **norm_kwargs)
self.act2 = act_layer(inplace=True)
# Squeeze-and-excitation
if has_se:
se_kwargs = resolve_se_args(se_kwargs, in_chs, act_layer)
self.se = SqueezeExcite(mid_chs, se_ratio=se_ratio, **se_kwargs)
else:
self.se = None
# Point-wise linear projection
self.conv_pwl = create_conv2d(
mid_chs,
out_chs,
pw_kernel_size,
padding=pad_type,
**conv_kwargs)
self.bn3 = norm_layer(out_chs, **norm_kwargs)
def feature_info(self, location):
if location == 'expansion': # after SE, input to PWL
info = dict(
module='conv_pwl',
hook_type='forward_pre',
num_chs=self.conv_pwl.in_channels)
else: # location == 'bottleneck', block output
info = dict(
module='',
hook_type='',
num_chs=self.conv_pwl.out_channels)
return info
def forward(self, x):
residual = x
# Point-wise expansion
x = self.conv_pw(x)
x = self.bn1(x)
x = self.act1(x)
# Depth-wise convolution
x = self.conv_dw(x)
x = self.bn2(x)
x = self.act2(x)
# Squeeze-and-excitation
if self.se is not None:
x = self.se(x)
# Point-wise linear projection
x = self.conv_pwl(x)
x = self.bn3(x)
if self.has_residual:
if self.drop_path_rate > 0.:
x = drop_path(x, self.drop_path_rate, self.training)
x += residual
return x
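A quick shape check of the block above (requires a `timm` version providing the imports at the top of this file; the channel numbers are arbitrary):

```python
import torch

# Stride 1 with in_chs == out_chs enables the residual skip; 6x
# point-wise expansion and SE ratio 0.25 mirror MobileNetV3-style blocks.
block = InvertedResidual(in_chs=32, out_chs=32, dw_kernel_size=3,
                         stride=1, exp_ratio=6.0, se_ratio=0.25)
x = torch.randn(2, 32, 56, 56)
print(block(x).shape)  # torch.Size([2, 32, 56, 56])
```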
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# Written by Hao Du and Houwen Peng
# email: haodu8-c@my.cityu.edu.hk and houwen.peng@microsoft.com
import torch
import torch.nn as nn
import torch.nn.functional as F
def conv3x3(in_planes, out_planes, stride=1):
"3x3 convolution with padding"
return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
padding=1, bias=True)
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = nn.BatchNorm2d(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = nn.BatchNorm2d(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class Bottleneck(nn.Module):
def __init__(self, inplanes, planes, stride=1, expansion=4):
super(Bottleneck, self).__init__()
planes = int(planes / expansion)
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=True)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
padding=1, bias=True)
self.bn2 = nn.BatchNorm2d(planes)
self.conv3 = nn.Conv2d(
planes,
planes * expansion,
kernel_size=1,
bias=True)
self.bn3 = nn.BatchNorm2d(planes * expansion)
self.relu = nn.ReLU(inplace=True)
self.stride = stride
self.expansion = expansion
        if inplanes != planes * self.expansion or stride != 1:
self.downsample = nn.Sequential(
nn.Conv2d(inplanes, planes * self.expansion,
kernel_size=1, stride=stride, bias=True),
nn.BatchNorm2d(planes * self.expansion),
)
else:
self.downsample = None
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
def get_Bottleneck(in_c, out_c, stride):
return Bottleneck(in_c, out_c, stride=stride)
def get_BasicBlock(in_c, out_c, stride):
return BasicBlock(in_c, out_c, stride=stride)
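These helpers build standard ResNet-style residual units; for example, a quick shape check:

```python
import torch

# get_Bottleneck(64, 256, 1): 1x1 reduce -> 3x3 -> 1x1 expand back to
# 256 channels, with a projection shortcut since 64 != 256.
block = get_Bottleneck(64, 256, stride=1)
x = torch.randn(2, 64, 56, 56)
print(block(x).shape)  # torch.Size([2, 256, 56, 56])
```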