"...git@developer.sourcefind.cn:SIYIXNI/vllm.git" did not exist on "e86717833da1216222cf0d490c2e3ba198610b13"
Unverified commit 5fe29b06, authored by Jiahang Xu, committed by GitHub

Evaluate current SPOS support (#4322)

parent cf7032a5
......@@ -6,24 +6,23 @@ Introduction
Single Path One-Shot (SPOS), proposed in `Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__, is a one-shot NAS method that addresses the difficulty of training one-shot NAS models by constructing a simplified supernet trained with a uniform path sampling method, so that all underlying architectures (and their weights) are trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine-tuning.
Implementation on NNI is based on the `official repo <https://github.com/megvii-model/SinglePathOneShot>`__. We implement a trainer that trains the supernet and an evolution tuner that leverages the NNI framework to speed up the evolutionary search phase.
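To make "uniform path sampling" concrete, here is a minimal illustrative sketch (not NNI's actual API): every layer choice independently picks one of its candidate operations uniformly at random, and only that single path is trained in each step. The names mirror the 20 four-way ``LayerChoice`` modules used in this example.

.. code-block:: python

   import random

   def sample_single_path(num_layer_choices=20, num_candidate_ops=4):
       # One uniform, independent pick per LayerChoice -- the "single path".
       return {"LayerChoice{}".format(i + 1): random.randrange(num_candidate_ops)
               for i in range(num_layer_choices)}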
Examples
--------
Here is a use case using the search space from the paper, with a flops limit to perform uniform sampling.
Here is a use case using the search space from the paper. However, we apply a latency limit instead of a flops limit in the architecture search phase.
:githublink:`Example code <examples/nas/oneshot/spos>`
Requirements
^^^^^^^^^^^^
NVIDIA DALI >= 0.16 is required, as we use DALI to accelerate ImageNet data loading. `Installation guide <https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/installation.html>`__
Prepare ImageNet in the standard format (follow the script `here <https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4>`__\ ). Linking it to ``data/imagenet`` will be more convenient.
Download the flops lookup table from `here <https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN>`__ (maintained by `Megvii <https://github.com/megvii-model>`__\ ).
Put ``op_flops_dict.pkl`` and ``checkpoint-150000.pth.tar`` (if you don't want to retrain the supernet) under the ``data`` directory.
Download the checkpoint file from `here <https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN>`__ (maintained by `Megvii <https://github.com/megvii-model>`__\ ) if you don't want to retrain the supernet.
Put ``checkpoint-150000.pth.tar`` under the ``data`` directory.
Prepare ImageNet in the standard format (follow the script `here <https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4>`__\ ). Linking it to ``data/imagenet`` will be more convenient.
After preparation, the following code structure is expected:
......@@ -32,19 +31,16 @@ After preparation, it's expected to have the following code structure:
spos
├── architecture_final.json
├── blocks.py
├── config_search.yml
├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val
│   └── op_flops_dict.pkl
├── dataloader.py
│   └── checkpoint-150000.pth.tar
├── network.py
├── readme.md
├── scratch.py
├── supernet.py
├── tester.py
├── tuner.py
├── evaluation.py
├── search.py
└── utils.py
Step 1. Train Supernet
......@@ -61,30 +57,18 @@ NOTE: The data loading used in the official repo is `slightly different from usu
Step 2. Evolution Search
^^^^^^^^^^^^^^^^^^^^^^^^
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing a sampled architecture, recalculates all the batch norm statistics on a subset of training images and evaluates the architecture on the full validation set.
To make the tuner aware of the flops limit and able to calculate the flops, we created a new tuner called ``EvolutionWithFlops`` in ``tuner.py``\ , inheriting from the evolution tuner in the SDK.
To prepare a search space for the NNI framework, first run
.. code-block:: bash

   nnictl ss_gen -t "python tester.py"
This will generate a file called ``nni_auto_gen_search_space.json``\ , which is a serialized representation of your search space.
By default, it will use ``checkpoint-150000.pth.tar`` downloaded previously. If you want to use a checkpoint you trained yourself in the last step, specify ``--checkpoint`` in the command in ``config_search.yml``.
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. In the paper, the search module, which is responsible for testing a sampled architecture, recalculates all the batch norm statistics on a subset of training images and evaluates the architecture on the full validation set.
Then, search with the evolution tuner.
In this example, the implementation of the evolution search is incomplete: it only supports training each sampled architecture from scratch, and inheriting weights from a pretrained supernet is not supported yet. To search with the regularized evolution strategy, run
.. code-block:: bash

   nnictl create --config config_search.yml
   python search.py
The final architecture exported from every epoch of evolution can be found in ``checkpoints`` under the working directory of your tuner, which, by default, is ``$HOME/nni-experiments/your_experiment_id/log``.
The final architecture exported from every epoch of evolution can be found in ``trials`` under the working directory of your tuner, which, by default, is ``$HOME/nni-experiments/your_experiment_id/trials``.
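Once an architecture JSON has been exported, it can be applied to the model for final training, which is what ``scratch.py`` does. A minimal sketch (the file name is taken from this example):

.. code-block:: python

   from nni.retiarii import fixed_arch
   from network import ShuffleNetV2OneShot

   # Fix every LayerChoice to the operation recorded in the exported JSON.
   with fixed_arch('architecture_final.json'):
       model = ShuffleNetV2OneShot(affine=True)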
Step 3. Train from Scratch
^^^^^^^^^^^^^^^^^^^^^^^^^^
Step 3. Train for Evaluation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
......@@ -106,7 +90,7 @@ Known Limitations
* Block search only. Channel search is not supported yet.
* Only GPU version is provided here.
* In the search phase, training from scratch is required. Inheriting weights from the supernet is not supported yet.
Current Reproduction Results
----------------------------
......
[Documentation](https://nni.readthedocs.io/zh/latest/NAS/SPOS.html)
{
"LayerChoice1": [false, false, true, false],
"LayerChoice2": [false, true, false, false],
"LayerChoice3": [true, false, false, false],
"LayerChoice4": [false, true, false, false],
"LayerChoice5": [false, false, true, false],
"LayerChoice6": [true, false, false, false],
"LayerChoice7": [false, false, true, false],
"LayerChoice8": [true, false, false, false],
"LayerChoice9": [false, false, true, false],
"LayerChoice10": [true, false, false, false],
"LayerChoice11": [false, false, true, false],
"LayerChoice12": [false, false, false, true],
"LayerChoice13": [true, false, false, false],
"LayerChoice14": [true, false, false, false],
"LayerChoice15": [true, false, false, false],
"LayerChoice16": [true, false, false, false],
"LayerChoice17": [false, false, false, true],
"LayerChoice18": [false, false, true, false],
"LayerChoice19": [false, false, false, true],
"LayerChoice20": [false, false, false, true]
}
"LayerChoice1": "2",
"LayerChoice2": "1",
"LayerChoice3": "0",
"LayerChoice4": "1",
"LayerChoice5": "2",
"LayerChoice6": "0",
"LayerChoice7": "2",
"LayerChoice8": "0",
"LayerChoice9": "2",
"LayerChoice10": "0",
"LayerChoice11": "2",
"LayerChoice12": "3",
"LayerChoice13": "0",
"LayerChoice14": "0",
"LayerChoice15": "0",
"LayerChoice16": "0",
"LayerChoice17": "3",
"LayerChoice18": "2",
"LayerChoice19": "3",
"LayerChoice20": "3"
}
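The two files above encode the same choices in two formats: the old one-hot lists and the new string indices. A minimal conversion sketch (assuming the files are loaded with ``json.load``; the helper name is hypothetical):

# e.g. {"LayerChoice1": [False, False, True, False]} -> {"LayerChoice1": "2"}
def onehot_to_index(arch):
    return {k: str(v.index(True)) for k, v in arch.items()}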
authorName: unknown
experimentName: SPOS Search
trialConcurrency: 4
maxExecDuration: 7d
maxTrialNum: 99999
trainingServicePlatform: local
searchSpacePath: nni_auto_gen_search_space.json
useAnnotation: false
tuner:
  codeDir: .
  classFileName: tuner.py
  className: EvolutionWithFlops
trial:
  command: python tester.py --spos-prep
  codeDir: .
  gpuNum: 1
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import torch.utils.data
from nvidia.dali.pipeline import Pipeline
from nvidia.dali.plugin.pytorch import DALIClassificationIterator
class HybridTrainPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data_dir, crop, seed=12, local_rank=0, world_size=1,
                 spos_pre=False):
        super(HybridTrainPipe, self).__init__(batch_size, num_threads, device_id, seed=seed + device_id)
        color_space_type = types.BGR if spos_pre else types.RGB
        self.input = ops.FileReader(file_root=data_dir, shard_id=local_rank, num_shards=world_size, random_shuffle=True)
        self.decode = ops.ImageDecoder(device="mixed", output_type=color_space_type)
        self.res = ops.RandomResizedCrop(device="gpu", size=crop,
                                         interp_type=types.INTERP_LINEAR if spos_pre else types.INTERP_TRIANGULAR)
        self.twist = ops.ColorTwist(device="gpu")
        self.jitter_rng = ops.Uniform(range=[0.6, 1.4])
        self.cmnp = ops.CropMirrorNormalize(device="gpu",
                                            output_dtype=types.FLOAT,
                                            output_layout=types.NCHW,
                                            image_type=color_space_type,
                                            mean=0. if spos_pre else [0.485 * 255, 0.456 * 255, 0.406 * 255],
                                            std=1. if spos_pre else [0.229 * 255, 0.224 * 255, 0.225 * 255])
        self.coin = ops.CoinFlip(probability=0.5)

    def define_graph(self):
        rng = self.coin()
        self.jpegs, self.labels = self.input(name="Reader")
        images = self.decode(self.jpegs)
        images = self.res(images)
        images = self.twist(images, saturation=self.jitter_rng(),
                            contrast=self.jitter_rng(), brightness=self.jitter_rng())
        output = self.cmnp(images, mirror=rng)
        return [output, self.labels]
class HybridValPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data_dir, crop, size, seed=12, local_rank=0, world_size=1,
                 spos_pre=False, shuffle=False):
        super(HybridValPipe, self).__init__(batch_size, num_threads, device_id, seed=seed + device_id)
        color_space_type = types.BGR if spos_pre else types.RGB
        self.input = ops.FileReader(file_root=data_dir, shard_id=local_rank, num_shards=world_size,
                                    random_shuffle=shuffle)
        self.decode = ops.ImageDecoder(device="mixed", output_type=color_space_type)
        self.res = ops.Resize(device="gpu", resize_shorter=size,
                              interp_type=types.INTERP_LINEAR if spos_pre else types.INTERP_TRIANGULAR)
        self.cmnp = ops.CropMirrorNormalize(device="gpu",
                                            output_dtype=types.FLOAT,
                                            output_layout=types.NCHW,
                                            crop=(crop, crop),
                                            image_type=color_space_type,
                                            mean=0. if spos_pre else [0.485 * 255, 0.456 * 255, 0.406 * 255],
                                            std=1. if spos_pre else [0.229 * 255, 0.224 * 255, 0.225 * 255])

    def define_graph(self):
        self.jpegs, self.labels = self.input(name="Reader")
        images = self.decode(self.jpegs)
        images = self.res(images)
        output = self.cmnp(images)
        return [output, self.labels]
class ClassificationWrapper:
    def __init__(self, loader, size):
        self.loader = loader
        self.size = size

    def __iter__(self):
        return self

    def __next__(self):
        data = next(self.loader)
        return data[0]["data"], data[0]["label"].view(-1).long().cuda(non_blocking=True)

    def __len__(self):
        return self.size
def get_imagenet_iter_dali(split, image_dir, batch_size, num_threads, crop=224, val_size=256,
                           spos_preprocessing=False, seed=12, shuffle=False, device_id=None):
    world_size, local_rank = 1, 0
    if device_id is None:
        device_id = torch.cuda.device_count() - 1  # use last gpu
    if split == "train":
        pipeline = HybridTrainPipe(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
                                   data_dir=os.path.join(image_dir, "train"), seed=seed,
                                   crop=crop, world_size=world_size, local_rank=local_rank,
                                   spos_pre=spos_preprocessing)
    elif split == "val":
        pipeline = HybridValPipe(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
                                 data_dir=os.path.join(image_dir, "val"), seed=seed,
                                 crop=crop, size=val_size, world_size=world_size, local_rank=local_rank,
                                 spos_pre=spos_preprocessing, shuffle=shuffle)
    else:
        raise AssertionError
    pipeline.build()
    num_samples = pipeline.epoch_size("Reader")
    return ClassificationWrapper(
        DALIClassificationIterator(pipeline, size=num_samples, fill_last_batch=split == "train",
                                   auto_reset=True), (num_samples + batch_size - 1) // batch_size)
......@@ -9,13 +9,14 @@ import random
import numpy as np
import torch
import torch.nn as nn
from dataloader import get_imagenet_iter_dali
from nni.nas.pytorch.fixed import apply_fixed_architecture
from nni.nas.pytorch.utils import AverageMeterGroup
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from nni.retiarii import fixed_arch
from nni.retiarii.oneshot.pytorch.utils import AverageMeterGroup
from torch.utils.tensorboard import SummaryWriter
from network import ShuffleNetV2OneShot
from utils import CrossEntropyLabelSmooth, accuracy
from utils import CrossEntropyLabelSmooth, accuracy, ToBGRTensor
logger = logging.getLogger("nni.spos.scratch")
......@@ -26,6 +27,7 @@ def train(epoch, model, criterion, optimizer, loader, writer, args):
    cur_lr = optimizer.param_groups[0]["lr"]

    for step, (x, y) in enumerate(loader):
        x, y = x.to('cuda'), y.to('cuda')
        cur_step = len(loader) * epoch + step
        optimizer.zero_grad()
        logits = model(x)
......@@ -54,6 +56,7 @@ def validate(epoch, model, criterion, loader, writer, args):
    meters = AverageMeterGroup()
    with torch.no_grad():
        for step, (x, y) in enumerate(loader):
            x, y = x.to('cuda'), y.to('cuda')
            logits = model(x)
            loss = criterion(logits, y)
            metrics = accuracy(logits, y)
......@@ -109,9 +112,9 @@ if __name__ == "__main__":
    random.seed(args.seed)
    torch.backends.cudnn.deterministic = True

    model = ShuffleNetV2OneShot(affine=True)
    with fixed_arch(args.architecture):
        model = ShuffleNetV2OneShot(affine=True)
    model.cuda()
    apply_fixed_architecture(model, args.architecture)
    if torch.cuda.device_count() > 1:  # exclude last gpu, saving for data preprocessing on gpu
        model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
    criterion = CrossEntropyLabelSmooth(1000, args.label_smoothing)
......@@ -128,14 +131,25 @@ if __name__ == "__main__":
        raise ValueError("'%s' not supported." % args.lr_decay)
    writer = SummaryWriter(log_dir=args.tb_dir)

    train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.batch_size, args.workers,
                                          spos_preprocessing=args.spos_preprocessing)
    val_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.batch_size, args.workers,
                                        spos_preprocessing=args.spos_preprocessing)
    if args.spos_preprocessing:
        trans = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
            transforms.RandomHorizontalFlip(0.5),
            ToBGRTensor(),
        ])
    else:
        trans = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ToTensor()
        ])
    train_dataset = datasets.ImageNet(args.imagenet_dir, split='train', transform=trans)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=args.batch_size, num_workers=args.workers)
    val_dataset = datasets.ImageNet(args.imagenet_dir, split='val', transform=trans)
    valid_loader = torch.utils.data.DataLoader(val_dataset, batch_size=args.batch_size, num_workers=args.workers)

    for epoch in range(args.epochs):
        train(epoch, model, criterion, optimizer, train_loader, writer, args)
        validate(epoch, model, criterion, val_loader, writer, args)
        validate(epoch, model, criterion, valid_loader, writer, args)
        scheduler.step()
        dump_checkpoint(model, epoch, "scratch_checkpoints")
......
......@@ -6,8 +6,8 @@ import pickle
import re
import torch
import torch.nn as nn
from nni.nas.pytorch import mutables
import nni.retiarii.nn.pytorch as nn
from nni.retiarii.nn.pytorch import LayerChoice
from blocks import ShuffleNetBlock, ShuffleXceptionBlock
......@@ -20,23 +20,20 @@ class ShuffleNetV2OneShot(nn.Module):
        'xception_3x3',
    ]

    def __init__(self, input_size=224, first_conv_channels=16, last_conv_channels=1024, n_classes=1000,
                 op_flops_path="./data/op_flops_dict.pkl", affine=False):
    def __init__(self, input_size=224, first_conv_channels=16, last_conv_channels=1024,
                 n_classes=1000, affine=False):
        super().__init__()

        assert input_size % 32 == 0
        with open(os.path.join(os.path.dirname(__file__), op_flops_path), "rb") as fp:
            self._op_flops_dict = pickle.load(fp)
        self.stage_blocks = [4, 4, 8, 4]
        self.stage_channels = [64, 160, 320, 640]
        self._parsed_flops = dict()
        self._input_size = input_size
        self._feature_map_size = input_size
        self._first_conv_channels = first_conv_channels
        self._last_conv_channels = last_conv_channels
        self._n_classes = n_classes
        self._affine = affine
        self._layerchoice_count = 0

        # building first layer
        self.first_conv = nn.Sequential(
......@@ -75,19 +72,15 @@ class ShuffleNetV2OneShot(nn.Module):
            base_mid_channels = channels // 2
            mid_channels = int(base_mid_channels)  # prepare for scale
            choice_block = mutables.LayerChoice([
            self._layerchoice_count += 1
            choice_block = LayerChoice([
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=3, stride=stride, affine=self._affine),
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=5, stride=stride, affine=self._affine),
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=7, stride=stride, affine=self._affine),
                ShuffleXceptionBlock(inp, oup, mid_channels=mid_channels, stride=stride, affine=self._affine)
            ])
            ], label="LayerChoice" + str(self._layerchoice_count))
            result.append(choice_block)

            # find the corresponding flops
            flop_key = (inp, oup, mid_channels, self._feature_map_size, self._feature_map_size, stride)
            self._parsed_flops[choice_block.key] = [
                self._op_flops_dict["{}_stride_{}".format(k, stride)][flop_key] for k in self.block_keys
            ]
            if stride == 2:
                self._feature_map_size //= 2
        return result
......@@ -104,46 +97,30 @@ class ShuffleNetV2OneShot(nn.Module):
        x = self.classifier(x)
        return x

    def get_candidate_flops(self, candidate):
        conv1_flops = self._op_flops_dict["conv1"][(3, self._first_conv_channels,
                                                    self._input_size, self._input_size, 2)]
        # Should use `last_conv_channels` here, but megvii insists that it's `n_classes`. Keeping it.
        # https://github.com/megvii-model/SinglePathOneShot/blob/36eed6cf083497ffa9cfe7b8da25bb0b6ba5a452/src/Supernet/flops.py#L313
        rest_flops = self._op_flops_dict["rest_operation"][(self.stage_channels[-1], self._n_classes,
                                                            self._feature_map_size, self._feature_map_size, 1)]
        total_flops = conv1_flops + rest_flops
        for k, m in candidate.items():
            parsed_flops_dict = self._parsed_flops[k]
            if isinstance(m, dict):  # to be compatible with classical nas format
                total_flops += parsed_flops_dict[m["_idx"]]
            else:
                total_flops += parsed_flops_dict[torch.max(m, 0)[1]]
        return total_flops
    def _initialize_weights(self):
        for name, m in self.named_modules():
            if isinstance(m, nn.Conv2d):
                if 'first' in name:
                    nn.init.normal_(m.weight, 0, 0.01)
                    torch.nn.init.normal_(m.weight, 0, 0.01)
                else:
                    nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1])
                    torch.nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1])
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
                    torch.nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                if m.weight is not None:
                    nn.init.constant_(m.weight, 1)
                    torch.nn.init.constant_(m.weight, 1)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0.0001)
                    nn.init.constant_(m.running_mean, 0)
                    torch.nn.init.constant_(m.bias, 0.0001)
                    torch.nn.init.constant_(m.running_mean, 0)
            elif isinstance(m, nn.BatchNorm1d):
                nn.init.constant_(m.weight, 1)
                torch.nn.init.constant_(m.weight, 1)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0.0001)
                    nn.init.constant_(m.running_mean, 0)
                    torch.nn.init.constant_(m.bias, 0.0001)
                    torch.nn.init.constant_(m.running_mean, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                torch.nn.init.normal_(m.weight, 0, 0.01)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
                    torch.nn.init.constant_(m.bias, 0)
def load_and_parse_state_dict(filepath="./data/checkpoint-150000.pth.tar"):
......
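The supernet above stacks ``stage_blocks = [4, 4, 8, 4]`` choice blocks, i.e. 20 ``LayerChoice`` modules with 4 candidates each, matching the ``LayerChoice1`` ... ``LayerChoice20`` keys in ``architecture_final.json``. A quick sanity-check sketch:

from network import ShuffleNetV2OneShot

model = ShuffleNetV2OneShot()
assert sum(model.stage_blocks) == 20  # one LayerChoice per block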
# This file demos the usage of multi-trial NAS on the SPOS search space.
import click
import json
import nni.retiarii.evaluator.pytorch as pl
import nni.retiarii.nn.pytorch as nn
import nni.retiarii.strategy as strategy
import torch
from nni.retiarii import serialize
from nni.retiarii.nn.pytorch import LayerChoice
from nni.retiarii.experiment.pytorch import RetiariiExeConfig, RetiariiExperiment
from torchvision import transforms
from torchvision.datasets import CIFAR10
from blocks import ShuffleNetBlock, ShuffleXceptionBlock
from nn_meter import load_latency_predictor
class ShuffleNetV2(nn.Module):
    block_keys = [
        'shufflenet_3x3',
        'shufflenet_5x5',
        'shufflenet_7x7',
        'xception_3x3',
    ]
    def __init__(self, input_size=224, first_conv_channels=16, last_conv_channels=1024, n_classes=1000, affine=False):
        super().__init__()

        assert input_size % 32 == 0
        self.stage_blocks = [4, 4, 8, 4]
        self.stage_channels = [64, 160, 320, 640]
        self._parsed_flops = dict()
        self._input_size = input_size
        self._feature_map_size = input_size
        self._first_conv_channels = first_conv_channels
        self._last_conv_channels = last_conv_channels
        self._n_classes = n_classes
        self._affine = affine

        # building first layer
        self.first_conv = nn.Sequential(
            nn.Conv2d(3, first_conv_channels, 3, 2, 1, bias=False),
            nn.BatchNorm2d(first_conv_channels, affine=affine),
            nn.ReLU(inplace=True),
        )
        self._feature_map_size //= 2

        p_channels = first_conv_channels
        features = []
        for num_blocks, channels in zip(self.stage_blocks, self.stage_channels):
            features.extend(self._make_blocks(num_blocks, p_channels, channels))
            p_channels = channels
        self.features = nn.Sequential(*features)

        self.conv_last = nn.Sequential(
            nn.Conv2d(p_channels, last_conv_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(last_conv_channels, affine=affine),
            nn.ReLU(inplace=True),
        )
        self.globalpool = nn.AvgPool2d(self._feature_map_size)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Sequential(
            nn.Linear(last_conv_channels, n_classes, bias=False),
        )
        self._initialize_weights()
    def _make_blocks(self, blocks, in_channels, channels):
        result = []
        for i in range(blocks):
            stride = 2 if i == 0 else 1
            inp = in_channels if i == 0 else channels
            oup = channels

            base_mid_channels = channels // 2
            mid_channels = int(base_mid_channels)  # prepare for scale
            choice_block = LayerChoice([
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=3, stride=stride, affine=self._affine),
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=5, stride=stride, affine=self._affine),
                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=7, stride=stride, affine=self._affine),
                ShuffleXceptionBlock(inp, oup, mid_channels=mid_channels, stride=stride, affine=self._affine)
            ])
            result.append(choice_block)

            if stride == 2:
                self._feature_map_size //= 2
        return result
    def forward(self, x):
        bs = x.size(0)
        x = self.first_conv(x)
        x = self.features(x)
        x = self.conv_last(x)
        x = self.globalpool(x)

        x = self.dropout(x)
        x = x.contiguous().view(bs, -1)
        x = self.classifier(x)
        return x
    def _initialize_weights(self):
        # FIXME this won't work in base engine
        for name, m in self.named_modules():
            if isinstance(m, nn.Conv2d):
                if 'first' in name:
                    torch.nn.init.normal_(m.weight, 0, 0.01)
                else:
                    torch.nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1])
                if m.bias is not None:
                    torch.nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                if m.weight is not None:
                    torch.nn.init.constant_(m.weight, 1)
                if m.bias is not None:
                    torch.nn.init.constant_(m.bias, 0.0001)
                    torch.nn.init.constant_(m.running_mean, 0)
            elif isinstance(m, nn.BatchNorm1d):
                torch.nn.init.constant_(m.weight, 1)
                if m.bias is not None:
                    torch.nn.init.constant_(m.bias, 0.0001)
                    torch.nn.init.constant_(m.running_mean, 0)
            elif isinstance(m, nn.Linear):
                torch.nn.init.normal_(m.weight, 0, 0.01)
                if m.bias is not None:
                    torch.nn.init.constant_(m.bias, 0)
from network import ShuffleNetV2OneShot
from utils import get_archchoice_by_model
class LatencyFilter:
    def __init__(self, threshold, predictor, predictor_version=None, reverse=False):
        """
        Filter the models according to predcted latency.
        Filter the models according to predicted latency.

        Parameters
        ----------
......@@ -140,7 +27,7 @@ class LatencyFilter:
            determine the targeted device
        reverse: `bool`
            if reverse is `False`, then the model returns `True` when `latency < threshold`,
            else otherwisse
            else otherwise
        """
        self.predictors = load_latency_predictor(predictor, predictor_version)
        self.threshold = threshold
......@@ -153,7 +40,7 @@ class LatencyFilter:
@click.command()
@click.option('--port', default=8081, help='On which port the experiment is run.')
def _main(port):
    base_model = ShuffleNetV2(32)
    base_model = ShuffleNetV2OneShot(32)
    base_predictor = 'cortexA76cpu_tflite21'
    transf = [
        transforms.RandomCrop(32, padding=4),
......@@ -170,13 +57,12 @@ def _main(port):
                              val_dataloaders=pl.DataLoader(test_dataset, batch_size=64),
                              max_epochs=2, gpus=1)
    simple_strategy = strategy.Random(model_filter=LatencyFilter(threshold=100, predictor=base_predictor))
    simple_strategy = strategy.RegularizedEvolution(model_filter=LatencyFilter(threshold=100, predictor=base_predictor), population_size=2, cycles=2)

    exp = RetiariiExperiment(base_model, trainer, strategy=simple_strategy)

    exp_config = RetiariiExeConfig('local')
    exp_config.trial_concurrency = 2
    exp_config.max_trial_number = 2
    # exp_config.max_trial_number = 2
    exp_config.trial_gpu_number = 1
    exp_config.training_service.use_active_gpu = False
    exp_config.execution_engine = 'base'
......@@ -185,8 +71,10 @@ def _main(port):
    exp.run(exp_config, port)

    print('Exported models:')
    for model in exp.export_top_models(formatter='dict'):
    for i, model in enumerate(exp.export_top_models(formatter='dict')):
        print(model)
        with open(f'architecture_final_{i}.json', 'w') as f:
            json.dump(get_archchoice_by_model(model), f, indent=4)
if __name__ == '__main__':
......
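The latency predictor used above comes from nn-Meter. A minimal sketch of loading it directly (the predictor name is taken from this example; consult the nn-Meter documentation for the exact prediction call):

from nn_meter import load_latency_predictor

predictor = load_latency_predictor('cortexA76cpu_tflite21')
# latency = predictor.predict(model, ...)  # see nn-Meter docs for the precise signature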
......@@ -8,13 +8,12 @@ import random
import numpy as np
import torch
import torch.nn as nn
from nni.nas.pytorch.callbacks import LRSchedulerCallback
from nni.nas.pytorch.callbacks import ModelCheckpoint
from nni.algorithms.nas.pytorch.spos import SPOSSupernetTrainingMutator, SPOSSupernetTrainer
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from nni.retiarii.oneshot.pytorch import SinglePathTrainer
from dataloader import get_imagenet_iter_dali
from network import ShuffleNetV2OneShot, load_and_parse_state_dict
from utils import CrossEntropyLabelSmooth, accuracy
from utils import CrossEntropyLabelSmooth, accuracy, ToBGRTensor
logger = logging.getLogger("nni.spos.supernet")
......@@ -45,16 +44,17 @@ if __name__ == "__main__":
    torch.backends.cudnn.deterministic = True

    model = ShuffleNetV2OneShot()
    flops_func = model.get_candidate_flops
    if args.load_checkpoint:
        if not args.spos_preprocessing:
            logger.warning("You might want to use SPOS preprocessing if you are loading their checkpoints.")
        model.load_state_dict(load_and_parse_state_dict())
        # load state_dict and merge it into the current model's state dict
        model_dict = model.state_dict()
        model_dict.update(load_and_parse_state_dict())
        model.load_state_dict(model_dict)
        logger.info(f'Model loaded from ./data/checkpoint-150000.pth.tar')
    model.cuda()
    if torch.cuda.device_count() > 1:  # exclude last gpu, saving for data preprocessing on gpu
        model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
    mutator = SPOSSupernetTrainingMutator(model, flops_func=flops_func,
                                          flops_lb=290E6, flops_ub=360E6)
    criterion = CrossEntropyLabelSmooth(1000, args.label_smoothing)
    optimizer = torch.optim.SGD(model.parameters(), lr=args.learning_rate,
                                momentum=args.momentum, weight_decay=args.weight_decay)
......@@ -62,14 +62,22 @@ if __name__ == "__main__":
                                                  lambda step: (1.0 - step / args.epochs)
                                                  if step <= args.epochs else 0,
                                                  last_epoch=-1)
    train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.batch_size, args.workers,
                                          spos_preprocessing=args.spos_preprocessing)
    valid_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.batch_size, args.workers,
                                          spos_preprocessing=args.spos_preprocessing)
    trainer = SPOSSupernetTrainer(model, criterion, accuracy, optimizer,
                                  args.epochs, train_loader, valid_loader,
                                  mutator=mutator, batch_size=args.batch_size,
                                  log_frequency=args.log_frequency, workers=args.workers,
                                  callbacks=[LRSchedulerCallback(scheduler),
                                             ModelCheckpoint("./checkpoints")])
    trainer.train()
    if args.spos_preprocessing:
        trans = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
            transforms.RandomHorizontalFlip(0.5),
            ToBGRTensor(),
        ])
    else:
        trans = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ToTensor()
        ])
    train_dataset = datasets.ImageNet(args.imagenet_dir, split='train', transform=trans)
    val_dataset = datasets.ImageNet(args.imagenet_dir, split='val', transform=trans)
    trainer = SinglePathTrainer(model, criterion, accuracy, optimizer,
                                args.epochs, train_dataset, val_dataset,
                                batch_size=args.batch_size,
                                log_frequency=args.log_frequency, workers=args.workers)
    trainer.fit()
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import argparse
import logging
import random
import time
from itertools import cycle
import nni
import numpy as np
import torch
import torch.nn as nn
from nni.algorithms.nas.pytorch.classic_nas import get_and_apply_next_architecture
from nni.nas.pytorch.utils import AverageMeterGroup
from dataloader import get_imagenet_iter_dali
from network import ShuffleNetV2OneShot, load_and_parse_state_dict
from utils import CrossEntropyLabelSmooth, accuracy
logger = logging.getLogger("nni.spos.tester")
def retrain_bn(model, criterion, max_iters, log_freq, loader):
    with torch.no_grad():
        logger.info("Clear BN statistics...")
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.running_mean = torch.zeros_like(m.running_mean)
                m.running_var = torch.ones_like(m.running_var)

        logger.info("Train BN with training set (BN sanitize)...")
        model.train()
        meters = AverageMeterGroup()
        for step in range(max_iters):
            inputs, targets = next(loader)
            logits = model(inputs)
            loss = criterion(logits, targets)
            metrics = accuracy(logits, targets)
            metrics["loss"] = loss.item()
            meters.update(metrics)
            if step % log_freq == 0 or step + 1 == max_iters:
                logger.info("Train Step [%d/%d] %s", step + 1, max_iters, meters)
def test_acc(model, criterion, log_freq, loader):
    logger.info("Start testing...")
    model.eval()
    meters = AverageMeterGroup()
    start_time = time.time()
    with torch.no_grad():
        for step, (inputs, targets) in enumerate(loader):
            logits = model(inputs)
            loss = criterion(logits, targets)
            metrics = accuracy(logits, targets)
            metrics["loss"] = loss.item()
            meters.update(metrics)
            if step % log_freq == 0 or step + 1 == len(loader):
                logger.info("Valid Step [%d/%d] time %.3fs acc1 %.4f acc5 %.4f loss %.4f",
                            step + 1, len(loader), time.time() - start_time,
                            meters.acc1.avg, meters.acc5.avg, meters.loss.avg)
    return meters.acc1.avg
def evaluate_acc(model, criterion, args, loader_train, loader_test):
    acc_before = test_acc(model, criterion, args.log_frequency, loader_test)
    nni.report_intermediate_result(acc_before)

    retrain_bn(model, criterion, args.train_iters, args.log_frequency, loader_train)
    acc = test_acc(model, criterion, args.log_frequency, loader_test)
    assert isinstance(acc, float)
    nni.report_intermediate_result(acc)
    nni.report_final_result(acc)
if __name__ == "__main__":
parser = argparse.ArgumentParser("SPOS Candidate Tester")
parser.add_argument("--imagenet-dir", type=str, default="./data/imagenet")
parser.add_argument("--checkpoint", type=str, default="./data/checkpoint-150000.pth.tar")
parser.add_argument("--spos-preprocessing", action="store_true", default=False,
help="When true, image values will range from 0 to 255 and use BGR "
"(as in original repo).")
parser.add_argument("--seed", type=int, default=42)
parser.add_argument("--workers", type=int, default=6)
parser.add_argument("--train-batch-size", type=int, default=128)
parser.add_argument("--train-iters", type=int, default=200)
parser.add_argument("--test-batch-size", type=int, default=512)
parser.add_argument("--log-frequency", type=int, default=10)
args = parser.parse_args()
# use a fixed set of image will improve the performance
torch.manual_seed(args.seed)
torch.cuda.manual_seed_all(args.seed)
np.random.seed(args.seed)
random.seed(args.seed)
torch.backends.cudnn.deterministic = True
assert torch.cuda.is_available()
model = ShuffleNetV2OneShot()
criterion = CrossEntropyLabelSmooth(1000, 0.1)
get_and_apply_next_architecture(model)
model.load_state_dict(load_and_parse_state_dict(filepath=args.checkpoint))
model.cuda()
train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.train_batch_size, args.workers,
spos_preprocessing=args.spos_preprocessing,
seed=args.seed, device_id=0)
val_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.test_batch_size, args.workers,
spos_preprocessing=args.spos_preprocessing, shuffle=True,
seed=args.seed, device_id=0)
train_loader = cycle(train_loader)
evaluate_acc(model, criterion, args, train_loader, val_loader)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from nni.nas.pytorch.spos import SPOSEvolution
from network import ShuffleNetV2OneShot
class EvolutionWithFlops(SPOSEvolution):
    """
    This tuner extends the evolution tuner by limiting the flops of the architectures it generates.
    Needs a function to examine the flops.
    """

    def __init__(self, flops_limit=330E6, **kwargs):
        super().__init__(**kwargs)
        self.model = ShuffleNetV2OneShot()
        self.flops_limit = flops_limit

    def _is_legal(self, cand):
        if not super()._is_legal(cand):
            return False
        if self.model.get_candidate_flops(cand) > self.flops_limit:
            return False
        return True
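A sketch of instantiating the tuner above with a custom budget (the argument is the one defined in ``__init__``); candidates whose estimated flops exceed the limit are rejected by ``_is_legal``:

tuner = EvolutionWithFlops(flops_limit=330E6)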
......@@ -3,6 +3,8 @@
import torch
import torch.nn as nn
import numpy as np
import PIL
class CrossEntropyLabelSmooth(nn.Module):
......@@ -39,3 +41,24 @@ def accuracy(output, target, topk=(1, 5)):
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res["acc{}".format(k)] = correct_k.mul_(1.0 / batch_size).item()
    return res
class ToBGRTensor(object):

    def __call__(self, img):
        assert isinstance(img, (np.ndarray, PIL.Image.Image))
        if isinstance(img, PIL.Image.Image):
            img = np.asarray(img)
        img = img[:, :, ::-1]  # to BGR
        img = np.transpose(img, [2, 0, 1])  # to (3, H, W)
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).float()
        return img
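# ``ToBGRTensor`` above reproduces the SPOS preprocessing: BGR channel order and
# raw 0-255 pixel values. A usage sketch mirroring scratch.py (assuming
# torchvision.transforms is imported as transforms):
#
#     trans = transforms.Compose([
#         transforms.RandomResizedCrop(224),
#         ToBGRTensor(),  # replaces transforms.ToTensor() for SPOS-style input
#     ])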
def get_archchoice_by_model(model):
    result = {}
    for k, v in model.items():
        assert k in v
        result[k] = model[k].split("_")[-1]
    return result
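The ``assert k in v`` above implies each exported choice value embeds its own key, e.g. ``"LayerChoice1_2"``; the helper then keeps only the part after the last underscore. An illustrative (assumed) input/output pair:

# get_archchoice_by_model({'LayerChoice1': 'LayerChoice1_2'}) -> {'LayerChoice1': '2'}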
......@@ -10,7 +10,7 @@ import time
from ..execution import query_available_resources, submit_models
from ..graph import ModelStatus
from .base import BaseStrategy
from .utils import dry_run_for_search_space, get_targeted_model
from .utils import dry_run_for_search_space, get_targeted_model, filter_model
_logger = logging.getLogger(__name__)
......@@ -47,10 +47,12 @@ class RegularizedEvolution(BaseStrategy):
Can be one of "ignore" and "worst". If "ignore", simply give up the model and find a new one.
If "worst", mark the model as -inf (if maximize, inf if minimize), so that the algorithm "learns" to avoid such model.
Default: ignore.
model_filter: Callable[[Model], bool]
Feed the model and return a bool. This will filter the models in search space and select which to submit.
"""
    def __init__(self, optimize_mode='maximize', population_size=100, sample_size=25, cycles=20000,
                 mutation_prob=0.05, on_failure='ignore'):
                 mutation_prob=0.05, on_failure='ignore', model_filter=None):
        assert optimize_mode in ['maximize', 'minimize']
        assert on_failure in ['ignore', 'worst']
        assert sample_size < population_size
......@@ -67,6 +69,7 @@ class RegularizedEvolution(BaseStrategy):
        self._population = collections.deque()
        self._running_models = []
        self._polling_interval = 2.
        self.filter = model_filter

    def random(self, search_space):
        return {k: random.choice(v) for k, v in search_space.items()}
......@@ -127,8 +130,13 @@ class RegularizedEvolution(BaseStrategy):
    def _submit_config(self, config, base_model, mutators):
        _logger.debug('Model submitted to running queue: %s', config)
        model = get_targeted_model(base_model, mutators, config)
        submit_models(model)
        self._running_models.append((config, model))
        if not filter_model(self.filter, model):
            if self.on_failure == "worst":
                model.status = ModelStatus.Failed
                self._running_models.append((config, model))
        else:
            submit_models(model)
            self._running_models.append((config, model))
        return model

    def _move_succeeded_models_to_population(self):
......
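With the ``model_filter`` hook above, any callable that takes a model and returns ``bool`` can veto candidates before submission, as ``search.py`` does with ``LatencyFilter``. A minimal sketch (the trivial lambda is illustrative):

strategy = RegularizedEvolution(model_filter=lambda model: True,  # accept everything
                                population_size=100, sample_size=25)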