"...composable_kernel_onnxruntime.git" did not exist on "716f1c7fb172733d7ec9330b75aece8bad10a423"
Unverified commit 53b565e4, authored by Houwen Peng, committed by GitHub

Request for Updating Cream NAS algorithm (#3228)

parent fbbe14d8
@@ -5,7 +5,7 @@
 Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
 =======================================================================================
-**`[Paper] <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__ `[Models-Google Drive] <https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing>`__\ `[Models-Baidu Disk (PWD: wqw6)] <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__ `[BibTex] <https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja>`__** :raw-html:`<br/>`
+`[Paper] <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__ `[Models-Google Drive] <https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing>`__ `[Models-Baidu Disk (PWD: wqw6)] <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__ `[BibTex] <https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja>`__ :raw-html:`<br/>`
 In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. The discovered architectures achieve superior performance compared to the recent `MobileNetV3 <https://arxiv.org/abs/1905.02244>`__ and `EfficientNet <https://arxiv.org/abs/1905.11946>`__ families under aligned settings.
......
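The prioritized-path idea described in the documentation above can be sketched in a few lines. This is a minimal, illustrative Python version of a prioritized-path board that keeps the best candidates seen so far; the class and method names are ours, not the NNI Cream API:

```python
class PathBoard:
    """Keeps the top-k candidate paths seen so far, ranked by validation score."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.board = []  # list of (score, path) tuples

    def update(self, path, score):
        """Record a freshly evaluated path; evict the weakest if over capacity."""
        self.board.append((score, path))
        # Keep only the best `capacity` paths -- the "cream of the crop".
        self.board.sort(key=lambda item: item[0], reverse=True)
        self.board = self.board[: self.capacity]

    def best_teacher(self):
        """Return the strongest path to distill from, if any."""
        return self.board[0][1] if self.board else None
```

In the actual algorithm the board is refreshed on the fly during supernet training, and subnetworks distill knowledge from a matched board member rather than always from the single best path.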
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '112m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999
LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5
AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
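The retrain configs added by this commit differ mainly in `MODEL` and `NET.SELECTION`; the shared hyperparameters match the block above. A hedged sketch of generating such a config programmatically (`make_retrain_cfg` and `BASE` are illustrative helpers, not part of the repo):

```python
# Shared defaults, abbreviated from the YAML above.
BASE = {
    "DATA_DIR": "./data/imagenet",
    "SAVE_PATH": "./experiments/workspace/retrain",
    "LR": 0.064,
    "EPOCHS": 500,
    "OPT": "rmsproptf",
    "SCHED": "cosine",
}

def make_retrain_cfg(model_name, selection):
    """Build one per-model retrain config on top of the shared defaults."""
    cfg = dict(BASE)  # shallow copy keeps BASE itself untouched
    cfg["MODEL"] = model_name
    cfg["NET"] = {"GP": "avg", "DROPOUT_RATE": 0.2, "SELECTION": selection}
    return cfg
```

In the repo itself the YAML files are merged onto the yacs defaults defined in `config.py` (the `__C = CN()` tree visible further down in this diff).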
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '14m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999
LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5
AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
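The `EMA` block in these configs tracks an exponential moving average of the model weights with decay 0.9999. The update rule, sketched on plain floats (a real implementation applies it to every parameter tensor, optionally on CPU when `FORCE_CPU` is set):

```python
def ema_update(ema_value, new_value, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * new."""
    return decay * ema_value + (1.0 - decay) * new_value
```

With decay 0.9999 the average moves very slowly, which smooths out per-step noise over the 500-epoch retraining run.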
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '23m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999
LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5
AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
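The learning-rate settings above (`LR`, `WARMUP_LR`, `WARMUP_EPOCHS`, `SCHED: 'cosine'`) imply a warmup-then-cosine schedule. A simplified sketch of the epoch-wise rate, under the assumption of linear warmup and plain cosine decay (timm's actual scheduler has more knobs, e.g. noise and cycle options):

```python
import math

def lr_at(epoch, base_lr=0.064, warmup_lr=1e-6, warmup_epochs=3, total_epochs=500):
    """Learning rate at a given epoch: linear warmup, then cosine decay."""
    if epoch < warmup_epochs:
        # Linear warmup from warmup_lr up to base_lr.
        t = epoch / warmup_epochs
        return warmup_lr + t * (base_lr - warmup_lr)
    # Cosine decay from base_lr toward 0 over the remaining epochs.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))
```

Note that `DECAY_EPOCHS`/`DECAY_RATE` only matter for step-style schedules and are inert when `SCHED` is `'cosine'`.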
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '287m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999
LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5
AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '43m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 43
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999
LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5
AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '481m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 481
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999
LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5
AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
@@ -2,12 +2,12 @@ AUTO_RESUME: False
 DATA_DIR: './data/imagenet'
 MODEL: '604m_retrain'
 RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
-SAVE_PATH: './'
+SAVE_PATH: './experiments/workspace/retrain'
 SEED: 42
 LOG_INTERVAL: 50
 RECOVERY_INTERVAL: 0
-WORKERS: 4
-NUM_GPU: 2
+WORKERS: 8
+NUM_GPU: 8
 SAVE_IMAGES: False
 AMP: False
 OUTPUT: 'None'
@@ -19,34 +19,33 @@ DATASET:
 NUM_CLASSES: 1000
 IMAGE_SIZE: 224 # image patch size
 INTERPOLATION: 'random' # Image resize interpolation type
-BATCH_SIZE: 32 # batch size
+BATCH_SIZE: 128 # batch size
 NO_PREFECHTER: False
 NET:
 GP: 'avg'
-DROPOUT_RATE: 0.0
-SELECTION: 42
+DROPOUT_RATE: 0.2
+SELECTION: 604
 EMA:
 USE: True
 FORCE_CPU: False # force model ema to be tracked on CPU
-DECAY: 0.9998
-OPT: 'sgd'
-OPT_EPS: 1e-2
-MOMENTUM: 0.9
-DECAY_RATE: 0.1
-SCHED: 'sgd'
-LR_NOISE: None
-LR_NOISE_PCT: 0.67
-LR_NOISE_STD: 1.0
-WARMUP_LR: 1e-4
-MIN_LR: 1e-5
-EPOCHS: 200
-START_EPOCH: None
-DECAY_EPOCHS: 30.0
+DECAY: 0.9999
+LR: 0.064
+EPOCHS: 500
+OPT_EPS: 1e-3
+SCHED: 'cosine'
+OPT: 'rmsproptf'
+WARMUP_LR: 1e-6
+DECAY_EPOCHS: 2.4
+DECAY_RATE: 0.973
 WARMUP_EPOCHS: 3
-COOLDOWN_EPOCHS: 10
-PATIENCE_EPOCHS: 10
-LR: 1e-2
\ No newline at end of file
+WEIGHT_DECAY: 1e-5
+AUGMENTATION:
+AA: 'rand-m9-mstd0.5'
+RE_PROB: 0.2 # random erase prob
+RE_MODE: 'pixel' # random erase mode
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '72m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0
DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False
NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470
EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999
LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5
AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
@@ -101,7 +101,7 @@ __C.AUGMENTATION.SMOOTHING = 0.1 # label smoothing parameters
 # batch norm parameters (only works with gen_efficientnet based models
 # currently)
 __C.BATCHNORM = CN()
-__C.BATCHNORM.SYNC_BN = True
+__C.BATCHNORM.SYNC_BN = False
 __C.BATCHNORM.BN_TF = False
 __C.BATCHNORM.BN_MOMENTUM = 0.1 # batchnorm momentum override
 __C.BATCHNORM.BN_EPS = 1e-5 # batchnorm eps override
......
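The hunk above flips the default `SYNC_BN` from `True` to `False`, i.e. each GPU normalizes with its own batch statistics instead of synchronizing across devices. The behavioral difference can be illustrated with plain Python on two "device" shards (the shard values are made up for illustration):

```python
def mean(xs):
    return sum(xs) / len(xs)

shard_a = [1.0, 3.0]   # batch slice on "GPU 0"
shard_b = [5.0, 7.0]   # batch slice on "GPU 1"

# Per-device BatchNorm: each shard is normalized with its own statistics...
per_device_means = [mean(shard_a), mean(shard_b)]

# ...while synchronized BN uses the statistics of the global batch.
synced_mean = mean(shard_a + shard_b)
```

Per-device BN is cheaper (no cross-GPU communication) and is usually adequate when the per-GPU batch is large, as with `BATCH_SIZE: 128` here.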
@@ -139,8 +139,8 @@ def main():
 '[Model-{}] Flops: {} Params: {}'.format(cfg.NET.SELECTION, macs, params))
 # create optimizer
-optimizer = create_optimizer(cfg, model)
 model = model.cuda()
+optimizer = create_optimizer(cfg, model)
 # optionally resume from a checkpoint
 resume_state, resume_epoch = {}, None
......
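The hunk above moves `create_optimizer()` to after `model.cuda()`. In current PyTorch, `Module.cuda()` moves parameter storage in place, so existing optimizer references usually stay valid; still, building the optimizer after the device move is the conventionally safe order, because a move that *replaces* parameter objects would leave the optimizer holding stale references. A toy stand-in (`Param` and `Model` are illustrative, not PyTorch classes) makes the failure mode concrete:

```python
class Param:
    """Stand-in for a tensor parameter pinned to a device."""
    def __init__(self, value, device="cpu"):
        self.value, self.device = value, device

class Model:
    def __init__(self):
        self.params = [Param(1.0), Param(2.0)]

    def cuda(self):
        # Replace each parameter with a copy on the "cuda" device.
        self.params = [Param(p.value, device="cuda") for p in self.params]
        return self

model = Model()
stale = list(model.params)   # an optimizer built here would capture these
model = model.cuda()
fresh = list(model.params)   # an optimizer built here sees the moved params
```

An optimizer stepping over `stale` would update CPU-side copies that the moved model never reads.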