"vscode:/vscode.git/clone" did not exist on "f8ba4017007dd189c8a0b9968a1f84b32e61a839"
Unverified Commit 5fe29b06 authored by Jiahang Xu, committed by GitHub

Evaluate current SPOS support (#4322)

parent cf7032a5
@@ -6,24 +6,23 @@ Introduction
`Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ proposes a one-shot NAS method that addresses the difficulties of training one-shot NAS models: it constructs a simplified supernet trained with a uniform path sampling method, so that all underlying architectures (and their weights) are trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine-tuning.
The implementation on NNI is based on the `official repo <https://github.com/megvii-model/SinglePathOneShot>`__. We implement a trainer that trains the supernet and an evolution tuner that leverages the NNI framework to speed up the evolutionary search phase.
Examples
--------
Here is a use case based on the search space in the paper. However, we apply a latency limit instead of a FLOPs limit in the architecture search phase.
:githublink:`Example code <examples/nas/oneshot/spos>`

Requirements
^^^^^^^^^^^^
Prepare ImageNet in the standard format (follow the script `here <https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4>`__\ ). Linking it to ``data/imagenet`` will be more convenient.

Download the checkpoint file from `here <https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN>`__ (maintained by `Megvii <https://github.com/megvii-model>`__\ ) if you don't want to retrain the supernet. Put ``checkpoint-150000.pth.tar`` under the ``data`` directory.
After preparation, it's expected to have the following code structure:
@@ -32,19 +31,16 @@ After preparation, it's expected to have the following code structure:
spos
├── architecture_final.json
├── blocks.py
├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val
│   └── checkpoint-150000.pth.tar
├── network.py
├── readme.md
├── supernet.py
├── evaluation.py
├── search.py
└── utils.py
Step 1. Train Supernet
@@ -61,30 +57,18 @@ NOTE: The data loading used in the official repo is `slightly different from usu
Step 2. Evolution Search
^^^^^^^^^^^^^^^^^^^^^^^^
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. In the paper, the search module, which is responsible for testing each sampled architecture, recalculates all batch-norm statistics on a subset of training images and evaluates the architecture on the full validation set.
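The following is a minimal sketch of that batch-norm recalibration (a sketch only, assuming a CUDA model and a standard PyTorch loader; it mirrors the ``retrain_bn`` helper in the legacy ``tester.py`` shown later in this diff):

.. code-block:: python

   import torch
   import torch.nn as nn

   def recalibrate_bn(model, loader, max_iters=200):
       # Reset the running statistics of every BatchNorm2d layer.
       for m in model.modules():
           if isinstance(m, nn.BatchNorm2d):
               m.running_mean = torch.zeros_like(m.running_mean)
               m.running_var = torch.ones_like(m.running_var)
       # Forward passes in train mode re-accumulate the statistics;
       # no_grad avoids building the autograd graph.
       model.train()
       with torch.no_grad():
           for step, (x, _) in enumerate(loader):
               model(x.cuda())
               if step + 1 >= max_iters:
                   break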
In this example, we provide an incomplete implementation of the evolution search: it only supports training from scratch, and inheriting weights from a pretrained supernet is not supported yet. To search with the regularized evolution strategy, run
.. code-block:: bash

   python search.py
The final architecture exported from every epoch of evolution can be found in ``trials`` under the working directory of your tuner, which, by default, is ``$HOME/nni-experiments/your_experiment_id/trials``.
Step 3. Train for Evaluation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash
@@ -106,7 +90,7 @@ Known Limitations
* Block search only. Channel search is not supported yet.
* In the search phase, training from scratch is required. Inheriting weights from the supernet is not supported yet.
Current Reproduction Results
----------------------------
{
    "LayerChoice1": "2",
    "LayerChoice2": "1",
    "LayerChoice3": "0",
    "LayerChoice4": "1",
    "LayerChoice5": "2",
    "LayerChoice6": "0",
    "LayerChoice7": "2",
    "LayerChoice8": "0",
    "LayerChoice9": "2",
    "LayerChoice10": "0",
    "LayerChoice11": "2",
    "LayerChoice12": "3",
    "LayerChoice13": "0",
    "LayerChoice14": "0",
    "LayerChoice15": "0",
    "LayerChoice16": "0",
    "LayerChoice17": "3",
    "LayerChoice18": "2",
    "LayerChoice19": "3",
    "LayerChoice20": "3"
}
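This exported file can be fed back into the supernet definition to instantiate the selected architecture for evaluation. A minimal sketch using the Retiarii API adopted by this commit (the file path is illustrative):

.. code-block:: python

   from nni.retiarii import fixed_arch
   from network import ShuffleNetV2OneShot

   # Every LayerChoice is fixed to the exported candidate index,
   # e.g. "2" selects the third block in the choice list.
   with fixed_arch("architecture_final.json"):
       model = ShuffleNetV2OneShot(affine=True)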
authorName: unknown
experimentName: SPOS Search
trialConcurrency: 4
maxExecDuration: 7d
maxTrialNum: 99999
trainingServicePlatform: local
searchSpacePath: nni_auto_gen_search_space.json
useAnnotation: false
tuner:
  codeDir: .
  classFileName: tuner.py
  className: EvolutionWithFlops
trial:
  command: python tester.py --spos-prep
  codeDir: .
  gpuNum: 1
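For reference, this legacy config was driven through nnictl, as the pre-commit documentation described: the serialized search space was generated first, then the experiment launched.

.. code-block:: bash

   nnictl ss_gen -t "python tester.py"
   nnictl create --config config_search.yml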
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import os

import nvidia.dali.ops as ops
import nvidia.dali.types as types
import torch.utils.data
from nvidia.dali.pipeline import Pipeline
from nvidia.dali.plugin.pytorch import DALIClassificationIterator


class HybridTrainPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data_dir, crop, seed=12, local_rank=0, world_size=1,
                 spos_pre=False):
        super(HybridTrainPipe, self).__init__(batch_size, num_threads, device_id, seed=seed + device_id)
        color_space_type = types.BGR if spos_pre else types.RGB
        self.input = ops.FileReader(file_root=data_dir, shard_id=local_rank, num_shards=world_size, random_shuffle=True)
        self.decode = ops.ImageDecoder(device="mixed", output_type=color_space_type)
        self.res = ops.RandomResizedCrop(device="gpu", size=crop,
                                         interp_type=types.INTERP_LINEAR if spos_pre else types.INTERP_TRIANGULAR)
        self.twist = ops.ColorTwist(device="gpu")
        self.jitter_rng = ops.Uniform(range=[0.6, 1.4])
        self.cmnp = ops.CropMirrorNormalize(device="gpu",
                                            output_dtype=types.FLOAT,
                                            output_layout=types.NCHW,
                                            image_type=color_space_type,
                                            mean=0. if spos_pre else [0.485 * 255, 0.456 * 255, 0.406 * 255],
                                            std=1. if spos_pre else [0.229 * 255, 0.224 * 255, 0.225 * 255])
        self.coin = ops.CoinFlip(probability=0.5)

    def define_graph(self):
        rng = self.coin()
        self.jpegs, self.labels = self.input(name="Reader")
        images = self.decode(self.jpegs)
        images = self.res(images)
        images = self.twist(images, saturation=self.jitter_rng(),
                            contrast=self.jitter_rng(), brightness=self.jitter_rng())
        output = self.cmnp(images, mirror=rng)
        return [output, self.labels]


class HybridValPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data_dir, crop, size, seed=12, local_rank=0, world_size=1,
                 spos_pre=False, shuffle=False):
        super(HybridValPipe, self).__init__(batch_size, num_threads, device_id, seed=seed + device_id)
        color_space_type = types.BGR if spos_pre else types.RGB
        self.input = ops.FileReader(file_root=data_dir, shard_id=local_rank, num_shards=world_size,
                                    random_shuffle=shuffle)
        self.decode = ops.ImageDecoder(device="mixed", output_type=color_space_type)
        self.res = ops.Resize(device="gpu", resize_shorter=size,
                              interp_type=types.INTERP_LINEAR if spos_pre else types.INTERP_TRIANGULAR)
        self.cmnp = ops.CropMirrorNormalize(device="gpu",
                                            output_dtype=types.FLOAT,
                                            output_layout=types.NCHW,
                                            crop=(crop, crop),
                                            image_type=color_space_type,
                                            mean=0. if spos_pre else [0.485 * 255, 0.456 * 255, 0.406 * 255],
                                            std=1. if spos_pre else [0.229 * 255, 0.224 * 255, 0.225 * 255])

    def define_graph(self):
        self.jpegs, self.labels = self.input(name="Reader")
        images = self.decode(self.jpegs)
        images = self.res(images)
        output = self.cmnp(images)
        return [output, self.labels]


class ClassificationWrapper:
    def __init__(self, loader, size):
        self.loader = loader
        self.size = size

    def __iter__(self):
        return self

    def __next__(self):
        data = next(self.loader)
        return data[0]["data"], data[0]["label"].view(-1).long().cuda(non_blocking=True)

    def __len__(self):
        return self.size


def get_imagenet_iter_dali(split, image_dir, batch_size, num_threads, crop=224, val_size=256,
                           spos_preprocessing=False, seed=12, shuffle=False, device_id=None):
    world_size, local_rank = 1, 0
    if device_id is None:
        device_id = torch.cuda.device_count() - 1  # use last gpu
    if split == "train":
        pipeline = HybridTrainPipe(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
                                   data_dir=os.path.join(image_dir, "train"), seed=seed,
                                   crop=crop, world_size=world_size, local_rank=local_rank,
                                   spos_pre=spos_preprocessing)
    elif split == "val":
        pipeline = HybridValPipe(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
                                 data_dir=os.path.join(image_dir, "val"), seed=seed,
                                 crop=crop, size=val_size, world_size=world_size, local_rank=local_rank,
                                 spos_pre=spos_preprocessing, shuffle=shuffle)
    else:
        raise AssertionError
    pipeline.build()
    num_samples = pipeline.epoch_size("Reader")
    return ClassificationWrapper(
        DALIClassificationIterator(pipeline, size=num_samples, fill_last_batch=split == "train",
                                   auto_reset=True), (num_samples + batch_size - 1) // batch_size)
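A minimal usage sketch of this loader (assuming ImageNet under ``data/imagenet`` and at least one CUDA device; batch size and thread count are illustrative):

.. code-block:: python

   # Build a DALI-backed training iterator with SPOS preprocessing
   # (BGR channel order, 0-255 pixel range) and fetch one batch.
   train_loader = get_imagenet_iter_dali("train", "data/imagenet", batch_size=128,
                                         num_threads=4, spos_preprocessing=True)
   images, labels = next(iter(train_loader))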
@@ -9,13 +9,14 @@ import random
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from nni.retiarii import fixed_arch
from nni.retiarii.oneshot.pytorch.utils import AverageMeterGroup
from torch.utils.tensorboard import SummaryWriter

from network import ShuffleNetV2OneShot
from utils import CrossEntropyLabelSmooth, accuracy, ToBGRTensor

logger = logging.getLogger("nni.spos.scratch")
@@ -26,6 +27,7 @@ def train(epoch, model, criterion, optimizer, loader, writer, args):
    cur_lr = optimizer.param_groups[0]["lr"]
    for step, (x, y) in enumerate(loader):
        x, y = x.to('cuda'), y.to('cuda')
        cur_step = len(loader) * epoch + step
        optimizer.zero_grad()
        logits = model(x)
@@ -54,6 +56,7 @@ def validate(epoch, model, criterion, loader, writer, args):
    meters = AverageMeterGroup()
    with torch.no_grad():
        for step, (x, y) in enumerate(loader):
            x, y = x.to('cuda'), y.to('cuda')
            logits = model(x)
            loss = criterion(logits, y)
            metrics = accuracy(logits, y)
@@ -109,9 +112,9 @@ if __name__ == "__main__":
    random.seed(args.seed)
    torch.backends.cudnn.deterministic = True

    with fixed_arch(args.architecture):
        model = ShuffleNetV2OneShot(affine=True)
    model.cuda()
    if torch.cuda.device_count() > 1:  # exclude last gpu, saving for data preprocessing on gpu
        model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
    criterion = CrossEntropyLabelSmooth(1000, args.label_smoothing)
@@ -128,14 +131,25 @@ if __name__ == "__main__":
        raise ValueError("'%s' not supported." % args.lr_decay)
    writer = SummaryWriter(log_dir=args.tb_dir)

    if args.spos_preprocessing:
        trans = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
            transforms.RandomHorizontalFlip(0.5),
            ToBGRTensor(),
        ])
    else:
        trans = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ToTensor()
        ])
    train_dataset = datasets.ImageNet(args.imagenet_dir, split='train', transform=trans)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=args.batch_size, num_workers=args.workers)
    val_dataset = datasets.ImageNet(args.imagenet_dir, split='val', transform=trans)
    valid_loader = torch.utils.data.DataLoader(val_dataset, batch_size=args.batch_size, num_workers=args.workers)

    for epoch in range(args.epochs):
        train(epoch, model, criterion, optimizer, train_loader, writer, args)
        validate(epoch, model, criterion, valid_loader, writer, args)
        scheduler.step()
        dump_checkpoint(model, epoch, "scratch_checkpoints")
@@ -6,8 +6,8 @@ import pickle
import re

import torch
import nni.retiarii.nn.pytorch as nn
from nni.retiarii.nn.pytorch import LayerChoice

from blocks import ShuffleNetBlock, ShuffleXceptionBlock
@@ -20,23 +20,20 @@ class ShuffleNetV2OneShot(nn.Module):
        'xception_3x3',
    ]

    def __init__(self, input_size=224, first_conv_channels=16, last_conv_channels=1024,
                 n_classes=1000, affine=False):
        super().__init__()

        assert input_size % 32 == 0
        self.stage_blocks = [4, 4, 8, 4]
        self.stage_channels = [64, 160, 320, 640]
        self._input_size = input_size
        self._feature_map_size = input_size
        self._first_conv_channels = first_conv_channels
        self._last_conv_channels = last_conv_channels
        self._n_classes = n_classes
        self._affine = affine
        self._layerchoice_count = 0

        # building first layer
        self.first_conv = nn.Sequential(
@@ -75,19 +72,15 @@ class ShuffleNetV2OneShot(nn.Module):
            base_mid_channels = channels // 2
            mid_channels = int(base_mid_channels)  # prepare for scale
            self._layerchoice_count += 1
            choice_block = LayerChoice([
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=3, stride=stride, affine=self._affine),
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=5, stride=stride, affine=self._affine),
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=7, stride=stride, affine=self._affine),
                ShuffleXceptionBlock(inp, oup, mid_channels=mid_channels, stride=stride, affine=self._affine)
            ], label="LayerChoice" + str(self._layerchoice_count))
            result.append(choice_block)
            if stride == 2:
                self._feature_map_size //= 2
        return result
@@ -104,46 +97,30 @@ class ShuffleNetV2OneShot(nn.Module):
        x = self.classifier(x)
        return x
    def _initialize_weights(self):
        for name, m in self.named_modules():
            if isinstance(m, nn.Conv2d):
                if 'first' in name:
                    torch.nn.init.normal_(m.weight, 0, 0.01)
                else:
                    torch.nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1])
                if m.bias is not None:
                    torch.nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                if m.weight is not None:
                    torch.nn.init.constant_(m.weight, 1)
                if m.bias is not None:
                    torch.nn.init.constant_(m.bias, 0.0001)
                torch.nn.init.constant_(m.running_mean, 0)
            elif isinstance(m, nn.BatchNorm1d):
                torch.nn.init.constant_(m.weight, 1)
                if m.bias is not None:
                    torch.nn.init.constant_(m.bias, 0.0001)
                torch.nn.init.constant_(m.running_mean, 0)
            elif isinstance(m, nn.Linear):
                torch.nn.init.normal_(m.weight, 0, 0.01)
                if m.bias is not None:
                    torch.nn.init.constant_(m.bias, 0)
def load_and_parse_state_dict(filepath="./data/checkpoint-150000.pth.tar"):
# This file demos the usage of multi-trial NAS on the SPOS search space.
import click
import json
import nni.retiarii.evaluator.pytorch as pl
import nni.retiarii.strategy as strategy
from nni.retiarii import serialize
from nni.retiarii.experiment.pytorch import RetiariiExeConfig, RetiariiExperiment
from torchvision import transforms
from torchvision.datasets import CIFAR10

from nn_meter import load_latency_predictor
from network import ShuffleNetV2OneShot
from utils import get_archchoice_by_model
class LatencyFilter:
    def __init__(self, threshold, predictor, predictor_version=None, reverse=False):
        """
        Filter the models according to predicted latency.

        Parameters
        ----------
@@ -140,7 +27,7 @@ class LatencyFilter:
            determine the targeted device
        reverse: `bool`
            if reverse is `False`, then the model returns `True` when `latency < threshold`,
            and the reverse otherwise
        """
        self.predictors = load_latency_predictor(predictor, predictor_version)
        self.threshold = threshold
@@ -153,7 +40,7 @@ class LatencyFilter:
@click.command()
@click.option('--port', default=8081, help='On which port the experiment is run.')
def _main(port):
    base_model = ShuffleNetV2OneShot(32)
    base_predictor = 'cortexA76cpu_tflite21'
    transf = [
        transforms.RandomCrop(32, padding=4),
@@ -170,13 +57,12 @@ def _main(port):
                              val_dataloaders=pl.DataLoader(test_dataset, batch_size=64),
                              max_epochs=2, gpus=1)

    simple_strategy = strategy.RegularizedEvolution(
        model_filter=LatencyFilter(threshold=100, predictor=base_predictor),
        population_size=2, cycles=2)

    exp = RetiariiExperiment(base_model, trainer, strategy=simple_strategy)

    exp_config = RetiariiExeConfig('local')
    exp_config.trial_concurrency = 2
    # exp_config.max_trial_number = 2
    exp_config.trial_gpu_number = 1
    exp_config.training_service.use_active_gpu = False
    exp_config.execution_engine = 'base'
@@ -185,8 +71,10 @@ def _main(port):
    exp.run(exp_config, port)
    print('Exported models:')
    for i, model in enumerate(exp.export_top_models(formatter='dict')):
        print(model)
        with open(f'architecture_final_{i}.json', 'w') as f:
            json.dump(get_archchoice_by_model(model), f, indent=4)


if __name__ == '__main__':
@@ -8,13 +8,12 @@ import random
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from nni.retiarii.oneshot.pytorch import SinglePathTrainer

from network import ShuffleNetV2OneShot, load_and_parse_state_dict
from utils import CrossEntropyLabelSmooth, accuracy, ToBGRTensor

logger = logging.getLogger("nni.spos.supernet")
@@ -45,16 +44,17 @@ if __name__ == "__main__":
    torch.backends.cudnn.deterministic = True

    model = ShuffleNetV2OneShot()
    if args.load_checkpoint:
        if not args.spos_preprocessing:
            logger.warning("You might want to use SPOS preprocessing if you are loading their checkpoints.")
        # load the supernet checkpoint and merge it into the model's current state dict
        model_dict = model.state_dict()
        model_dict.update(load_and_parse_state_dict())
        model.load_state_dict(model_dict)
        logger.info('Model loaded from ./data/checkpoint-150000.pth.tar')
    model.cuda()
    if torch.cuda.device_count() > 1:  # exclude last gpu, saving for data preprocessing on gpu
        model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
    criterion = CrossEntropyLabelSmooth(1000, args.label_smoothing)
    optimizer = torch.optim.SGD(model.parameters(), lr=args.learning_rate,
                                momentum=args.momentum, weight_decay=args.weight_decay)
@@ -62,14 +62,22 @@ if __name__ == "__main__":
                                                  lambda step: (1.0 - step / args.epochs)
                                                  if step <= args.epochs else 0,
                                                  last_epoch=-1)

    if args.spos_preprocessing:
        trans = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
            transforms.RandomHorizontalFlip(0.5),
            ToBGRTensor(),
        ])
    else:
        trans = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ToTensor()
        ])
    train_dataset = datasets.ImageNet(args.imagenet_dir, split='train', transform=trans)
    val_dataset = datasets.ImageNet(args.imagenet_dir, split='val', transform=trans)
    trainer = SinglePathTrainer(model, criterion, accuracy, optimizer,
                                args.epochs, train_dataset, val_dataset,
                                batch_size=args.batch_size,
                                log_frequency=args.log_frequency, workers=args.workers)
    trainer.fit()
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import argparse
import logging
import random
import time
from itertools import cycle

import nni
import numpy as np
import torch
import torch.nn as nn
from nni.algorithms.nas.pytorch.classic_nas import get_and_apply_next_architecture
from nni.nas.pytorch.utils import AverageMeterGroup

from dataloader import get_imagenet_iter_dali
from network import ShuffleNetV2OneShot, load_and_parse_state_dict
from utils import CrossEntropyLabelSmooth, accuracy

logger = logging.getLogger("nni.spos.tester")


def retrain_bn(model, criterion, max_iters, log_freq, loader):
    with torch.no_grad():
        logger.info("Clear BN statistics...")
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.running_mean = torch.zeros_like(m.running_mean)
                m.running_var = torch.ones_like(m.running_var)

        logger.info("Train BN with training set (BN sanitize)...")
        model.train()
        meters = AverageMeterGroup()
        for step in range(max_iters):
            inputs, targets = next(loader)
            logits = model(inputs)
            loss = criterion(logits, targets)
            metrics = accuracy(logits, targets)
            metrics["loss"] = loss.item()
            meters.update(metrics)
            if step % log_freq == 0 or step + 1 == max_iters:
                logger.info("Train Step [%d/%d] %s", step + 1, max_iters, meters)


def test_acc(model, criterion, log_freq, loader):
    logger.info("Start testing...")
    model.eval()
    meters = AverageMeterGroup()
    start_time = time.time()
    with torch.no_grad():
        for step, (inputs, targets) in enumerate(loader):
            logits = model(inputs)
            loss = criterion(logits, targets)
            metrics = accuracy(logits, targets)
            metrics["loss"] = loss.item()
            meters.update(metrics)
            if step % log_freq == 0 or step + 1 == len(loader):
                logger.info("Valid Step [%d/%d] time %.3fs acc1 %.4f acc5 %.4f loss %.4f",
                            step + 1, len(loader), time.time() - start_time,
                            meters.acc1.avg, meters.acc5.avg, meters.loss.avg)
    return meters.acc1.avg


def evaluate_acc(model, criterion, args, loader_train, loader_test):
    acc_before = test_acc(model, criterion, args.log_frequency, loader_test)
    nni.report_intermediate_result(acc_before)

    retrain_bn(model, criterion, args.train_iters, args.log_frequency, loader_train)
    acc = test_acc(model, criterion, args.log_frequency, loader_test)
    assert isinstance(acc, float)
    nni.report_intermediate_result(acc)
    nni.report_final_result(acc)


if __name__ == "__main__":
    parser = argparse.ArgumentParser("SPOS Candidate Tester")
    parser.add_argument("--imagenet-dir", type=str, default="./data/imagenet")
    parser.add_argument("--checkpoint", type=str, default="./data/checkpoint-150000.pth.tar")
    parser.add_argument("--spos-preprocessing", action="store_true", default=False,
                        help="When true, image values will range from 0 to 255 and use BGR "
                             "(as in original repo).")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--workers", type=int, default=6)
    parser.add_argument("--train-batch-size", type=int, default=128)
    parser.add_argument("--train-iters", type=int, default=200)
    parser.add_argument("--test-batch-size", type=int, default=512)
    parser.add_argument("--log-frequency", type=int, default=10)
    args = parser.parse_args()

    # using a fixed set of images (fixed seed) improves reproducibility and performance
    torch.manual_seed(args.seed)
    torch.cuda.manual_seed_all(args.seed)
    np.random.seed(args.seed)
    random.seed(args.seed)
    torch.backends.cudnn.deterministic = True

    assert torch.cuda.is_available()

    model = ShuffleNetV2OneShot()
    criterion = CrossEntropyLabelSmooth(1000, 0.1)
    get_and_apply_next_architecture(model)
    model.load_state_dict(load_and_parse_state_dict(filepath=args.checkpoint))
    model.cuda()

    train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.train_batch_size, args.workers,
                                          spos_preprocessing=args.spos_preprocessing,
                                          seed=args.seed, device_id=0)
    val_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.test_batch_size, args.workers,
                                        spos_preprocessing=args.spos_preprocessing, shuffle=True,
                                        seed=args.seed, device_id=0)
    train_loader = cycle(train_loader)
    evaluate_acc(model, criterion, args, train_loader, val_loader)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from nni.nas.pytorch.spos import SPOSEvolution

from network import ShuffleNetV2OneShot


class EvolutionWithFlops(SPOSEvolution):
    """
    This tuner extends the evolution tuner by limiting the flops of the
    architectures it generates. Needs a function to examine the flops.
    """

    def __init__(self, flops_limit=330E6, **kwargs):
        super().__init__(**kwargs)
        self.model = ShuffleNetV2OneShot()
        self.flops_limit = flops_limit

    def _is_legal(self, cand):
        if not super()._is_legal(cand):
            return False
        if self.model.get_candidate_flops(cand) > self.flops_limit:
            return False
        return True
@@ -3,6 +3,8 @@
import torch
import torch.nn as nn
import numpy as np
import PIL


class CrossEntropyLabelSmooth(nn.Module):
@@ -39,3 +41,24 @@ def accuracy(output, target, topk=(1, 5)):
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res["acc{}".format(k)] = correct_k.mul_(1.0 / batch_size).item()
    return res
class ToBGRTensor(object):

    def __call__(self, img):
        assert isinstance(img, (np.ndarray, PIL.Image.Image))
        if isinstance(img, PIL.Image.Image):
            img = np.asarray(img)
        img = img[:, :, ::-1]  # RGB to BGR
        img = np.transpose(img, [2, 0, 1])  # HWC to (3, H, W)
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).float()
        return img
def get_archchoice_by_model(model):
    result = {}
    for k, v in model.items():
        assert k in v
        result[k] = model[k].split("_")[-1]
    return result
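A small usage sketch of ``get_archchoice_by_model``. It assumes the exported dict values embed the choice key plus an index suffix (hence the ``assert k in v``); the concrete values below are illustrative:

.. code-block:: python

   exported = {"LayerChoice1": "LayerChoice1_2", "LayerChoice2": "LayerChoice2_0"}
   print(get_archchoice_by_model(exported))
   # {"LayerChoice1": "2", "LayerChoice2": "0"}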
@@ -10,7 +10,7 @@ import time
from ..execution import query_available_resources, submit_models
from ..graph import ModelStatus
from .base import BaseStrategy
from .utils import dry_run_for_search_space, get_targeted_model, filter_model

_logger = logging.getLogger(__name__)
@@ -47,10 +47,12 @@ class RegularizedEvolution(BaseStrategy):
Can be one of "ignore" and "worst". If "ignore", simply give up the model and find a new one. Can be one of "ignore" and "worst". If "ignore", simply give up the model and find a new one.
If "worst", mark the model as -inf (if maximize, inf if minimize), so that the algorithm "learns" to avoid such model. If "worst", mark the model as -inf (if maximize, inf if minimize), so that the algorithm "learns" to avoid such model.
Default: ignore. Default: ignore.
model_filter: Callable[[Model], bool]
Feed the model and return a bool. This will filter the models in search space and select which to submit.
""" """
    def __init__(self, optimize_mode='maximize', population_size=100, sample_size=25, cycles=20000,
                 mutation_prob=0.05, on_failure='ignore', model_filter=None):
        assert optimize_mode in ['maximize', 'minimize']
        assert on_failure in ['ignore', 'worst']
        assert sample_size < population_size
@@ -67,6 +69,7 @@ class RegularizedEvolution(BaseStrategy):
        self._population = collections.deque()
        self._running_models = []
        self._polling_interval = 2.
        self.filter = model_filter

    def random(self, search_space):
        return {k: random.choice(v) for k, v in search_space.items()}
@@ -127,6 +130,11 @@ class RegularizedEvolution(BaseStrategy):
    def _submit_config(self, config, base_model, mutators):
        _logger.debug('Model submitted to running queue: %s', config)
        model = get_targeted_model(base_model, mutators, config)
        if not filter_model(self.filter, model):
            if self.on_failure == "worst":
                model.status = ModelStatus.Failed
                self._running_models.append((config, model))
        else:
            submit_models(model)
            self._running_models.append((config, model))
        return model
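``model_filter`` can be any callable taking a model and returning ``bool``; the ``LatencyFilter`` defined in the example above is one such filter. A trivial sketch (the acceptance rule here is illustrative):

.. code-block:: python

   import nni.retiarii.strategy as strategy

   # Accept every sampled candidate; a real filter would inspect the
   # model, e.g. predict its latency and compare against a threshold.
   accept_all = lambda model: True
   evo = strategy.RegularizedEvolution(model_filter=accept_all,
                                       population_size=100, sample_size=25)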