"docs/archive_en_US/Tutorial/FAQ.md" did not exist on "fa6a23cf70d9fca7dab88d7d4cdf00c805b3f608"
Commit 53b565e4 authored by Houwen Peng, committed by GitHub

Request for Updating Cream NAS algorithm (#3228)

parent fbbe14d8
@@ -5,7 +5,7 @@
 Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
 =======================================================================================
-**`[Paper] <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__ `[Models-Google Drive] <https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing>`__\ `[Models-Baidu Disk (PWD: wqw6)] <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__ `[BibTex] <https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja>`__** :raw-html:`<br/>`
+`[Paper] <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__ `[Models-Google Drive] <https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing>`__ `[Models-Baidu Disk (PWD: wqw6)] <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__ `[BibTex] <https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja>`__ :raw-html:`<br/>`
 In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of the prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths boosts the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. The discovered architectures achieve superior performance compared to the recent `MobileNetV3 <https://arxiv.org/abs/1905.02244>`__ and `EfficientNet <https://arxiv.org/abs/1905.11946>`__ families under aligned settings.
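In code, one simple form of this idea is a per-step loss that combines the usual cross-entropy with a soft distillation term against the best path on the prioritized board. The sketch below is illustrative only, not the repo's actual training loop: `supernet(images, path)`, `prioritized_board`, and its entries' `.arch`/`.score` attributes are all hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

def distill_step(supernet, path, prioritized_board, images, labels, temp=1.0):
    """One training step with prioritized-path distillation (illustrative only).

    `supernet(images, path)` is a hypothetical call that runs the sampled
    subnetwork; `prioritized_board` is assumed to hold the best paths so far.
    """
    logits = supernet(images, path)
    loss = F.cross_entropy(logits, labels)  # supervised loss on hard labels

    if len(prioritized_board) > 0:
        # pick the current best candidate as the teacher
        teacher_path = max(prioritized_board, key=lambda p: p.score)
        with torch.no_grad():  # the prioritized path acts as a frozen teacher
            teacher_logits = supernet(images, teacher_path.arch)
        # logit-level KD: KL divergence between teacher and student distributions
        kd = F.kl_div(
            F.log_softmax(logits / temp, dim=1),
            F.softmax(teacher_logits / temp, dim=1),
            reduction='batchmean',
        ) * temp ** 2
        loss = loss + kd
    return loss
```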
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '112m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
  NUM_CLASSES: 1000
  IMAGE_SIZE: 224 # image patch size
  INTERPOLATION: 'random' # Image resize interpolation type
  BATCH_SIZE: 128 # batch size
  NO_PREFECHTER: False

NET:
  GP: 'avg'
  DROPOUT_RATE: 0.2
  SELECTION: 470
  EMA:
    USE: True
    FORCE_CPU: False # force model ema to be tracked on CPU
    DECAY: 0.9999

LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
  AA: 'rand-m9-mstd0.5'
  RE_PROB: 0.2 # random erase prob
  RE_MODE: 'pixel' # random erase mode
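These retrain YAMLs (one per target model size; more follow below) are consumed through a yacs `CfgNode` — the `__C = CN()` defaults live in the `config.py` patched further down. A minimal, self-contained sketch of the merge mechanics, using a pared-down stand-in file rather than the real config:

```python
import tempfile
from yacs.config import CfgNode as CN

# Defaults: every key the YAML may set must already exist here.
cfg = CN()
cfg.MODEL = ''
cfg.LR = 0.1
cfg.NET = CN()
cfg.NET.SELECTION = 14

# A pared-down stand-in for one of the retrain files above.
yaml_text = "MODEL: '112m_retrain'\nLR: 0.064\nNET:\n  SELECTION: 470\n"
with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as f:
    f.write(yaml_text)

cfg.merge_from_file(f.name)  # YAML values override the defaults
cfg.freeze()                 # make the config read-only
print(cfg.MODEL, cfg.LR, cfg.NET.SELECTION)  # -> 112m_retrain 0.064 470
```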
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '14m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
  NUM_CLASSES: 1000
  IMAGE_SIZE: 224 # image patch size
  INTERPOLATION: 'random' # Image resize interpolation type
  BATCH_SIZE: 128 # batch size
  NO_PREFECHTER: False

NET:
  GP: 'avg'
  DROPOUT_RATE: 0.2
  SELECTION: 470
  EMA:
    USE: True
    FORCE_CPU: False # force model ema to be tracked on CPU
    DECAY: 0.9999

LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
  AA: 'rand-m9-mstd0.5'
  RE_PROB: 0.2 # random erase prob
  RE_MODE: 'pixel' # random erase mode
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '23m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
  NUM_CLASSES: 1000
  IMAGE_SIZE: 224 # image patch size
  INTERPOLATION: 'random' # Image resize interpolation type
  BATCH_SIZE: 128 # batch size
  NO_PREFECHTER: False

NET:
  GP: 'avg'
  DROPOUT_RATE: 0.2
  SELECTION: 470
  EMA:
    USE: True
    FORCE_CPU: False # force model ema to be tracked on CPU
    DECAY: 0.9999

LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
  AA: 'rand-m9-mstd0.5'
  RE_PROB: 0.2 # random erase prob
  RE_MODE: 'pixel' # random erase mode
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '287m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
  NUM_CLASSES: 1000
  IMAGE_SIZE: 224 # image patch size
  INTERPOLATION: 'random' # Image resize interpolation type
  BATCH_SIZE: 128 # batch size
  NO_PREFECHTER: False

NET:
  GP: 'avg'
  DROPOUT_RATE: 0.2
  SELECTION: 470
  EMA:
    USE: True
    FORCE_CPU: False # force model ema to be tracked on CPU
    DECAY: 0.9999

LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
  AA: 'rand-m9-mstd0.5'
  RE_PROB: 0.2 # random erase prob
  RE_MODE: 'pixel' # random erase mode
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '43m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
  NUM_CLASSES: 1000
  IMAGE_SIZE: 224 # image patch size
  INTERPOLATION: 'random' # Image resize interpolation type
  BATCH_SIZE: 128 # batch size
  NO_PREFECHTER: False

NET:
  GP: 'avg'
  DROPOUT_RATE: 0.2
  SELECTION: 43
  EMA:
    USE: True
    FORCE_CPU: False # force model ema to be tracked on CPU
    DECAY: 0.9999

LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
  AA: 'rand-m9-mstd0.5'
  RE_PROB: 0.2 # random erase prob
  RE_MODE: 'pixel' # random erase mode
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '481m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
  NUM_CLASSES: 1000
  IMAGE_SIZE: 224 # image patch size
  INTERPOLATION: 'random' # Image resize interpolation type
  BATCH_SIZE: 128 # batch size
  NO_PREFECHTER: False

NET:
  GP: 'avg'
  DROPOUT_RATE: 0.2
  SELECTION: 481
  EMA:
    USE: True
    FORCE_CPU: False # force model ema to be tracked on CPU
    DECAY: 0.9999

LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
  AA: 'rand-m9-mstd0.5'
  RE_PROB: 0.2 # random erase prob
  RE_MODE: 'pixel' # random erase mode
@@ -2,12 +2,12 @@ AUTO_RESUME: False
 DATA_DIR: './data/imagenet'
 MODEL: '604m_retrain'
 RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
-SAVE_PATH: './'
+SAVE_PATH: './experiments/workspace/retrain'
 SEED: 42
 LOG_INTERVAL: 50
 RECOVERY_INTERVAL: 0
-WORKERS: 4
-NUM_GPU: 2
+WORKERS: 8
+NUM_GPU: 8
 SAVE_IMAGES: False
 AMP: False
 OUTPUT: 'None'
@@ -19,34 +19,33 @@ DATASET:
   NUM_CLASSES: 1000
   IMAGE_SIZE: 224 # image patch size
   INTERPOLATION: 'random' # Image resize interpolation type
-  BATCH_SIZE: 32 # batch size
+  BATCH_SIZE: 128 # batch size
   NO_PREFECHTER: False
 NET:
   GP: 'avg'
-  DROPOUT_RATE: 0.0
-  SELECTION: 42
+  DROPOUT_RATE: 0.2
+  SELECTION: 604
   EMA:
     USE: True
     FORCE_CPU: False # force model ema to be tracked on CPU
-    DECAY: 0.9998
+    DECAY: 0.9999
-OPT: 'sgd'
-OPT_EPS: 1e-2
-MOMENTUM: 0.9
-DECAY_RATE: 0.1
-SCHED: 'sgd'
-LR_NOISE: None
-LR_NOISE_PCT: 0.67
-LR_NOISE_STD: 1.0
-WARMUP_LR: 1e-4
-MIN_LR: 1e-5
-EPOCHS: 200
-START_EPOCH: None
-DECAY_EPOCHS: 30.0
+LR: 0.064
+EPOCHS: 500
+OPT_EPS: 1e-3
+SCHED: 'cosine'
+OPT: 'rmsproptf'
+WARMUP_LR: 1e-6
+DECAY_EPOCHS: 2.4
+DECAY_RATE: 0.973
 WARMUP_EPOCHS: 3
-COOLDOWN_EPOCHS: 10
-PATIENCE_EPOCHS: 10
-LR: 1e-2
\ No newline at end of file
+WEIGHT_DECAY: 1e-5
+AUGMENTATION:
+  AA: 'rand-m9-mstd0.5'
+  RE_PROB: 0.2 # random erase prob
+  RE_MODE: 'pixel' # random erase mode
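The hunk above swaps the old step-decay SGD recipe for the RMSpropTF + cosine setup used by the other configs: LR 0.064, a 3-epoch warmup starting at 1e-6, and 500 epochs total. A back-of-the-envelope sketch of that schedule, assuming a plain linear warmup into cosine decay (timm's actual `CosineLRScheduler` has more knobs, such as cycle limits and LR noise):

```python
import math

def lr_at_epoch(epoch, base_lr=0.064, warmup_lr=1e-6,
                warmup_epochs=3, total_epochs=500, min_lr=0.0):
    """Linear warmup followed by cosine decay (illustrative, not timm's exact code)."""
    if epoch < warmup_epochs:
        # linear ramp from warmup_lr up to base_lr
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # cosine anneal from base_lr down to min_lr over the remaining epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

for e in (0, 3, 250, 499):
    print(e, round(lr_at_epoch(e), 6))
# 0 -> 1e-06 (warmup start), 3 -> 0.064 (warmup done), 250 -> ~0.032, 499 -> ~0.0
```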
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '72m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
  NUM_CLASSES: 1000
  IMAGE_SIZE: 224 # image patch size
  INTERPOLATION: 'random' # Image resize interpolation type
  BATCH_SIZE: 128 # batch size
  NO_PREFECHTER: False

NET:
  GP: 'avg'
  DROPOUT_RATE: 0.2
  SELECTION: 470
  EMA:
    USE: True
    FORCE_CPU: False # force model ema to be tracked on CPU
    DECAY: 0.9999

LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
  AA: 'rand-m9-mstd0.5'
  RE_PROB: 0.2 # random erase prob
  RE_MODE: 'pixel' # random erase mode
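All of the configs above keep an exponential moving average of the weights (`EMA: USE: True`, `DECAY: 0.9999`), and in timm-style setups it is typically the EMA weights that are evaluated at the end of training. A minimal sketch of such a tracker; the real `ModelEma` in timm additionally handles buffers (e.g. BN statistics), checkpoint resume, and optional CPU placement (`FORCE_CPU`):

```python
import copy
import torch

class SimpleEma:
    """Shadow copy of a model updated as ema = decay * ema + (1 - decay) * w.

    Parameters only; a production version would also track buffers.
    """

    def __init__(self, model, decay=0.9999):
        self.ema = copy.deepcopy(model).eval()  # frozen shadow model
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # in-place exponential moving average of each parameter
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

# usage after each optimizer step:
# ema = SimpleEma(model, decay=0.9999)
# ... loss.backward(); optimizer.step(); ema.update(model)
```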
@@ -101,7 +101,7 @@ __C.AUGMENTATION.SMOOTHING = 0.1 # label smoothing parameters
 # batch norm parameters (only works with gen_efficientnet based models
 # currently)
 __C.BATCHNORM = CN()
-__C.BATCHNORM.SYNC_BN = True
+__C.BATCHNORM.SYNC_BN = False
 __C.BATCHNORM.BN_TF = False
 __C.BATCHNORM.BN_MOMENTUM = 0.1 # batchnorm momentum override
 __C.BATCHNORM.BN_EPS = 1e-5 # batchnorm eps override
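The `SYNC_BN` default flips from `True` to `False` here. When the flag is on, the usual PyTorch pattern is to convert `BatchNorm` layers to `SyncBatchNorm` before wrapping the model in DDP, so batch statistics are aggregated across all processes rather than computed per GPU. A hedged sketch of that pattern (not necessarily Cream's exact call site):

```python
import torch.nn as nn

def maybe_sync_bn(model: nn.Module, sync_bn: bool) -> nn.Module:
    """Convert BatchNorm layers to SyncBatchNorm when sync_bn is set.

    Synchronizing BN statistics helps when the per-GPU batch is small,
    at the cost of extra communication; it requires an initialized
    torch.distributed process group.
    """
    if sync_bn:
        model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    return model

# typical call site, before DistributedDataParallel:
# model = maybe_sync_bn(model, cfg.BATCHNORM.SYNC_BN)
# model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[local_rank])
```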
@@ -139,8 +139,8 @@ def main():
         '[Model-{}] Flops: {} Params: {}'.format(cfg.NET.SELECTION, macs, params))
     # create optimizer
-    optimizer = create_optimizer(cfg, model)
     model = model.cuda()
+    optimizer = create_optimizer(cfg, model)
     # optionally resume from a checkpoint
     resume_state, resume_epoch = {}, None
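The reorder above matches the standard PyTorch guidance to move a model to the GPU before constructing its optimizer, so the optimizer is created over the parameters in their final device placement (fused and apex-style optimizers in particular expect CUDA tensors at construction time). In sketch form:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)

# move to device first ...
model = model.cuda() if torch.cuda.is_available() else model
# ... then build the optimizer over the (possibly moved) parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.064)
```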