
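# 1. Introduction

This script is a functional test case based on the SSD_ResNet34 object detection model and follows the MLPerf project. When the mAP reaches 0.23, the model is considered converged and the job finishes successfully.

# 2. Running

## Install dependencies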
    Cython==0.28.4
    mlperf-compliance==0.0.10
    cycler==0.10.0
    kiwisolver==1.0.1
    matplotlib==2.2.2
    numpy==1.14.5
    Pillow==5.2.0
    pyparsing==2.2.0
    python-dateutil==2.7.3
    pytz==2018.5
    six==1.11.0
    torchvision (skip if already installed)
    apex (skip if already installed)
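
To confirm that an environment matches these pins before starting a long job, a small check along the following lines may help. This is a minimal sketch, not part of the repository: `PINNED` and `check_pins` are illustrative names, and it assumes `importlib.metadata` (Python 3.8+) or the `importlib_metadata` backport is available. torchvision and apex are left out because the list above does not pin a version for them.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older interpreters need the importlib_metadata backport
    import importlib_metadata as metadata

# Version pins copied from the dependency list above.
PINNED = {
    "Cython": "0.28.4",
    "mlperf-compliance": "0.0.10",
    "cycler": "0.10.0",
    "kiwisolver": "1.0.1",
    "matplotlib": "2.2.2",
    "numpy": "1.14.5",
    "Pillow": "5.2.0",
    "pyparsing": "2.2.0",
    "python-dateutil": "2.7.3",
    "pytz": "2018.5",
    "six": "1.11.0",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for missing or mismatched packages."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version.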


## Download the dataset
   
    bash download_dataset.sh


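## Run the training scripts
* For a single node, the environment and system hyperparameters are set in config_singlenode.sh; adjust it to your setup as needed.
* For multiple nodes, the environment and system hyperparameters are set in config_multinode.sh; adjust it to your setup as needed.
### Single node, single GPU (FP32)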
    python3 train_fp32.py \
                      --epochs "${NUMEPOCHS}" \
                      --warmup-factor 0 \
                      --lr "${LR}" \
                      --no-save \
                      --threshold=0.23 \
                      --data ${DATASET_DIR} \
                      --batch-size ${BATCH_SIZE} \
                      --warmup ${WARMUP}
### Single node, multiple GPUs (FP32)

    python3 -m bind_launch --nsockets_per_node ${NSOCKET} \
                      --ncores_per_socket ${SOCKETCORES} \
                      --nproc_per_node ${NTASKS_PER_NODE} \
                      --no_hyperthreads \
                      --no_membind \
                      train_fp32.py \
                      --epochs "${NUMEPOCHS}" \
                      --warmup-factor 0 \
                      --lr "${LR}" \
                      --no-save \
                      --threshold=0.23 \
                      --data ${DATASET_DIR} \
                      --batch-size ${BATCH_SIZE} \
                      --warmup ${WARMUP}
* See the job submission script run_fp32_single.sh for reference.

### Multiple nodes, multiple GPUs (FP32)
   
    sh run_fp32_multi.sh

* See the run_fp32_multi.sh script. The hostfile it uses has the following format:
        
        node1 slots=4  
        node2 slots=4
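
To sanity-check a hostfile before submitting a multi-node job, it can be parsed and its slots summed. The sketch below is illustrative only: `parse_hostfile` is a hypothetical helper, it assumes each `slots=N` value is the number of processes to launch on that host, and it treats a host without a `slots=` entry as one slot, which is also an assumption.

```python
def parse_hostfile(text):
    """Parse 'host slots=N' lines into a {host: slots} dict."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        name, *options = line.split()
        slots = 1  # assumption: a host without slots= contributes one slot
        for option in options:
            key, _, value = option.partition("=")
            if key == "slots":
                slots = int(value)
        hosts[name] = slots
    return hosts


example = """\
node1 slots=4
node2 slots=4
"""
hosts = parse_hostfile(example)
print(hosts)                # {'node1': 4, 'node2': 4}
print(sum(hosts.values()))  # 8
```

With the two-node example above, the total comes to 8 slots.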
    
### Single node, single GPU (FP16)
    python3 train_fp16.py \
                      --epochs "${NUMEPOCHS}" \
                      --warmup-factor 0 \
                      --lr "${LR}" \
                      --no-save \
                      --threshold=0.23 \
                      --data ${DATASET_DIR} \
                      --opt-level O3 --loss-scale="dynamic" --keep-batchnorm-fp32 True \
                      --batch-size 180 \
                      --warmup ${WARMUP}

### Single node, multiple GPUs (FP16)
    python3 -m bind_launch --nsockets_per_node ${NSOCKET} \
                      --ncores_per_socket ${SOCKETCORES} \
                      --nproc_per_node ${NTASKS_PER_NODE} \
                      --no_hyperthreads \
                      --no_membind \
                      train_fp16.py \
                      --epochs "${NUMEPOCHS}" \
                      --warmup-factor 0 \
                      --lr "${LR}" \
                      --no-save \
                      --threshold=0.23 \
                      --data ${DATASET_DIR} \
                      --opt-level O3 --loss-scale="dynamic" --keep-batchnorm-fp32 True \
                      --batch-size 180 \
                      --warmup ${WARMUP}
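* See the job submission script run_fp16_single.sh for reference.

### Multiple nodes, multiple GPUs (FP16)
    sh run_fp16_multi.sh
* Similarly, set up the hostfile as described in the multi-node FP32 section above.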


# 3. Dataset


### Publication/Attribution
Microsoft COCO: Common Objects in Context. 2017.

### Training and test data separation
Train on 2017 COCO train data set, compute mAP on 2017 COCO val data set.

# 4. Model
### Publication/Attribution
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. In the Proceedings of the European Conference on Computer Vision (ECCV), 2016.

The backbone is ResNet34 pretrained on ILSVRC 2012 (from torchvision). Modifications to the backbone network: remove the conv_5x residual blocks, change the first 3x3 convolution of the conv_4x block from stride 2 to stride 1 (this increases the resolution of the feature map to which detector heads are attached), and attach all 6 detector heads to the output of the last conv_4x residual block. Thus detections are attached to 38x38, 19x19, 10x10, 5x5, 3x3, and 1x1 feature maps.
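
The feature map sizes listed above determine how many default boxes the detector predicts per image. The sketch below counts them. It assumes the standard SSD300 setting from the SSD paper of 4, 6, 6, 6, 4, and 4 default boxes per location; this README does not state that setting, so treat it as an assumption rather than a description of this repository's code.

```python
# Feature map side lengths, taken from the text above.
FEATURE_MAP_SIZES = [38, 19, 10, 5, 3, 1]

# Assumption: default boxes per location as in the SSD300 configuration
# of the SSD paper; not stated in this README.
BOXES_PER_LOCATION = [4, 6, 6, 6, 4, 4]


def count_default_boxes(sizes, boxes_per_location):
    """Return the number of default boxes contributed by each feature map."""
    return [side * side * k for side, k in zip(sizes, boxes_per_location)]


per_map = count_default_boxes(FEATURE_MAP_SIZES, BOXES_PER_LOCATION)
print(per_map)       # [5776, 2166, 600, 150, 36, 4]
print(sum(per_map))  # 8732
```

Under that assumption, most of the boxes come from the 38x38 map, which is why the higher-resolution feature map matters for small objects.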

# 5. Evaluation metrics
### Quality metric
Metric is COCO box mAP (averaged over IoU of 0.5:0.95), computed over 2017 COCO val data.
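
The notation 0.5:0.95 refers to ten IoU thresholds from 0.50 to 0.95 in steps of 0.05; average precision is computed at each threshold and then averaged. The sketch below only illustrates the IoU computation and the threshold list, with hypothetical helper names `box_iou` and `matched_thresholds`. It does not reproduce the full COCO evaluation, which also ranks detections by score and averages over recall levels and categories.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, gt):
    """Thresholds at which `pred` would count as a match for `gt`."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


pred = (0, 0, 10, 10)
gt = (1, 1, 10, 10)
print(box_iou(pred, gt))                  # 0.81
print(len(matched_thresholds(pred, gt)))  # 7
```

A prediction with IoU 0.81 counts as correct at the seven thresholds up to 0.80 and as a miss at 0.85, 0.90, and 0.95, so tighter localization raises the averaged metric.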

### Quality target
mAP of 0.23
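
The stopping rule implied by this target (and by the `--threshold=0.23` flag in the training commands) can be sketched as follows. `first_converged_epoch` is a hypothetical helper, not a function from the training scripts, and the mAP values are made up for illustration; they are not measured results.

```python
def first_converged_epoch(map_by_epoch, threshold=0.23):
    """Return the first evaluated epoch whose mAP reaches the threshold, or None."""
    for epoch in sorted(map_by_epoch):
        if map_by_epoch[epoch] >= threshold:
            return epoch
    return None


# Made-up evaluation history for illustration only, not measured results.
history = {5: 0.08, 10: 0.15, 15: 0.21, 20: 0.235, 25: 0.24}
print(first_converged_epoch(history))  # 20
```

A return value of `None` would mean the run used up its epoch budget without reaching the target.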

### Evaluation frequency

### Evaluation thoroughness
All the images in COCO 2017 val data set.

# 6. References
[https://github.com/mlperf/training/tree/master/single_stage_detector/ssd](https://github.com/mlperf/training/tree/master/single_stage_detector/ssd)