"vscode:/vscode.git/clone" did not exist on "b816ff86c923e0290f58f2275e831fc17c29ba37"
Commit e129194a authored by Sugon_ldc

add new model resnet50v1.5

Pipeline #571 failed with stages in 0 seconds
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.03-py3
FROM ${FROM_IMAGE_NAME}
ADD requirements.txt /workspace/
WORKDIR /workspace/
RUN pip install --no-cache-dir -r requirements.txt
ADD . /workspace/rn50
WORKDIR /workspace/rn50
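# Illustrative usage (image tag and ImageNet path are placeholders):
#   docker build . -t nvidia_rn50
#   docker run --gpus all --rm -it --ipc=host -v <path/to/imagenet>:/imagenet nvidia_rn50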
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Convolutional Networks for Image Classification in PyTorch
In this repository you will find implementations of various image classification models.
Detailed information on each model can be found in its README, linked in the [Models](#models) table below.
## Table Of Contents
* [Models](#models)
* [Validation accuracy results](#validation-accuracy-results)
* [Training performance results](#training-performance-results)
* [Training performance: NVIDIA DGX A100 (8x A100 80GB)](#training-performance-nvidia-dgx-a100-8x-a100-80gb)
* [Training performance: NVIDIA DGX-1 16GB (8x V100 16GB)](#training-performance-nvidia-dgx-1-16gb-8x-v100-16gb)
* [Training performance: NVIDIA DGX-2 (16x V100 32GB)](#training-performance-nvidia-dgx-2-16x-v100-32gb)
* [Model comparison](#model-comparison)
* [Accuracy vs FLOPS](#accuracy-vs-flops)
* [Latency vs Throughput on different batch sizes](#latency-vs-throughput-on-different-batch-sizes)
## Models
The following table provides links to where you can find additional information on each model:
| **Model** | **Link**|
|:-:|:-:|
| resnet50 | [README](./resnet50v1.5/README.md) |
| resnext101-32x4d | [README](./resnext101-32x4d/README.md) |
| se-resnext101-32x4d | [README](./se-resnext101-32x4d/README.md) |
| EfficientNet | [README](./efficientnet/README.md) |
## Validation accuracy results
Our results were obtained by running the applicable
training scripts in the 20.12 PyTorch NGC container
on NVIDIA DGX-1 with 8x V100 16GB GPUs.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the validation accuracy results of
the classification models side by side.
| **Model** | **Mixed Precision Top1** | **Mixed Precision Top5** | **32 bit Top1** | **32 bit Top5** |
|:----------------------:|:------------------------:|:------------------------:|:---------------:|:---------------:|
| efficientnet-b0 | 77.63 | 93.82 | 77.31 | 93.76 |
| efficientnet-b4 | 82.98 | 96.44 | 82.92 | 96.43 |
| efficientnet-widese-b0 | 77.89 | 94.00 | 77.97 | 94.05 |
| efficientnet-widese-b4 | 83.28 | 96.45 | 83.30 | 96.47 |
| resnet50 | 78.60 | 94.19 | 78.69 | 94.16 |
| resnext101-32x4d | 80.43 | 95.06 | 80.40 | 95.04 |
| se-resnext101-32x4d | 81.00 | 95.48 | 81.09 | 95.45 |
## Training performance results
### Training performance: NVIDIA DGX A100 (8x A100 80GB)
Our results were obtained by running the applicable
training scripts in the 21.03 PyTorch NGC container
on NVIDIA DGX A100 with 8x A100 80GB GPUs.
Performance numbers (in images per second)
were averaged over an entire training epoch.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the training performance results of
all the classification models side by side.
| **Model** | **Mixed Precision** | **TF32** | **Mixed Precision Speedup** |
|:----------------------:|:-------------------:|:----------:|:---------------------------:|
| efficientnet-b0 | 16652 img/s | 8193 img/s | 2.03 x |
| efficientnet-b4 | 2570 img/s | 1223 img/s | 2.1 x |
| efficientnet-widese-b0 | 16368 img/s | 8244 img/s | 1.98 x |
| efficientnet-widese-b4 | 2585 img/s | 1223 img/s | 2.11 x |
| resnet50 | 16621 img/s | 7248 img/s | 2.29 x |
| resnext101-32x4d | 7925 img/s | 3471 img/s | 2.28 x |
| se-resnext101-32x4d | 5779 img/s | 2991 img/s | 1.93 x |
### Training performance: NVIDIA DGX-1 16GB (8x V100 16GB)
Our results were obtained by running the applicable
training scripts in the 21.03 PyTorch NGC container
on NVIDIA DGX-1 with 8x V100 16GB GPUs.
Performance numbers (in images per second)
were averaged over an entire training epoch.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the training performance results of all the
classification models side by side.
| **Model** | **Mixed Precision** | **FP32** | **Mixed Precision Speedup** |
|:----------------------:|:-------------------:|:----------:|:---------------------------:|
| efficientnet-b0 | 7789 img/s | 4672 img/s | 1.66 x |
| efficientnet-b4 | 1366 img/s | 616 img/s | 2.21 x |
| efficientnet-widese-b0 | 7875 img/s | 4592 img/s | 1.71 x |
| efficientnet-widese-b4 | 1356 img/s | 612 img/s | 2.21 x |
| resnet50 | 8322 img/s | 2855 img/s | 2.91 x |
| resnext101-32x4d | 4065 img/s | 1133 img/s | 3.58 x |
| se-resnext101-32x4d | 2971 img/s | 1004 img/s | 2.95 x |
## Model comparison
### Accuracy vs FLOPS
![ACCvsFLOPS](./img/ACCvsFLOPS.png)
The plot shows the relationship between validation accuracy and the number of
floating-point operations needed to compute a forward pass on a 224x224 image
for each of the implemented models.
Dot size indicates the number of trainable parameters.
### Latency vs Throughput on different batch sizes
![LATvsTHR](./img/LATvsTHR.png)
The plot shows the relationship between
inference latency and throughput at different batch sizes
for the implemented models.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the BSD 3-Clause License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import torch
def add_parser_arguments(parser):
parser.add_argument(
"--checkpoint-path", metavar="<path>", help="checkpoint filename"
)
parser.add_argument(
"--weight-path", metavar="<path>", help="name of file in which to store weights"
)
parser.add_argument("--ema", action="store_true", default=False)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="PyTorch ImageNet Training")
add_parser_arguments(parser)
args = parser.parse_args()
checkpoint = torch.load(args.checkpoint_path, map_location=torch.device("cpu"))
key = "state_dict" if not args.ema else "ema_state_dict"
model_state_dict = {
k[len("module.") :] if "module." in k else k: v
for k, v in checkpoint["state_dict"].items()
}
print(f"Loaded model, acc : {checkpoint['best_prec1']}")
torch.save(model_state_dict, args.weight_path)
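# Illustrative follow-up (file name is a placeholder): the file written above holds
# a plain state dict, so it can be restored into a model with
#   weights = torch.load("resnet50_weights.pth", map_location="cpu")
#   model.load_state_dict(weights)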
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the BSD 3-Clause License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from PIL import Image
import argparse
import numpy as np
import json
import torch
from torch.cuda.amp import autocast
import torch.backends.cudnn as cudnn
from image_classification import models
import torchvision.transforms as transforms
from image_classification.models import (
resnet50,
resnext101_32x4d,
se_resnext101_32x4d,
efficientnet_b0,
efficientnet_b4,
efficientnet_widese_b0,
efficientnet_widese_b4,
efficientnet_quant_b0,
efficientnet_quant_b4,
)
def available_models():
models = {
m.name: m
for m in [
resnet50,
resnext101_32x4d,
se_resnext101_32x4d,
efficientnet_b0,
efficientnet_b4,
efficientnet_widese_b0,
efficientnet_widese_b4,
efficientnet_quant_b0,
efficientnet_quant_b4,
]
}
return models
def add_parser_arguments(parser):
model_names = available_models().keys()
parser.add_argument("--image-size", default="224", type=int)
parser.add_argument(
"--arch",
"-a",
metavar="ARCH",
default="resnet50",
choices=model_names,
help="model architecture: " + " | ".join(model_names) + " (default: resnet50)",
)
parser.add_argument(
"--precision", metavar="PREC", default="AMP", choices=["AMP", "FP32"]
)
parser.add_argument("--cpu", action="store_true", help="perform inference on CPU")
parser.add_argument("--image", metavar="<path>", help="path to classified image")
def load_jpeg_from_file(path, image_size, cuda=True):
img_transforms = transforms.Compose(
[
transforms.Resize(image_size + 32),
transforms.CenterCrop(image_size),
transforms.ToTensor(),
]
)
img = img_transforms(Image.open(path))
with torch.no_grad():
# mean and std are not multiplied by 255 as they are in training script
# torch dataloader reads data into bytes whereas loading directly
# through PIL creates a tensor with floats in [0,1] range
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
if cuda:
mean = mean.cuda()
std = std.cuda()
img = img.cuda()
img = img.float()
input = img.unsqueeze(0).sub_(mean).div_(std)
return input
def check_quant_weight_correctness(checkpoint_path, model):
state_dict = torch.load(checkpoint_path, map_location=torch.device("cpu"))
state_dict = {
k[len("module.") :] if k.startswith("module.") else k: v
for k, v in state_dict.items()
}
quantizers_sd_keys = {
f"{n[0]}._amax" for n in model.named_modules() if "quantizer" in n[0]
}
sd_all_keys = quantizers_sd_keys | set(model.state_dict().keys())
assert set(state_dict.keys()) == sd_all_keys, (
f"Passed quantized architecture, but following keys are missing in "
f"checkpoint: {list(sd_all_keys - set(state_dict.keys()))}"
)
def main(args, model_args):
imgnet_classes = np.array(json.load(open("./LOC_synset_mapping.json", "r")))
try:
model = available_models()[args.arch](**model_args.__dict__)
except RuntimeError as e:
print_in_box(
"Error when creating model, did you forget to run checkpoint2model script?"
)
raise e
if args.arch in ["efficientnet-quant-b0", "efficientnet-quant-b4"]:
check_quant_weight_correctness(model_args.pretrained_from_file, model)
if not args.cpu:
model = model.cuda()
model.eval()
input = load_jpeg_from_file(args.image, args.image_size, cuda=not args.cpu)
with torch.no_grad(), autocast(enabled=args.precision == "AMP"):
output = torch.nn.functional.softmax(model(input), dim=1)
output = output.float().cpu().view(-1).numpy()
top5 = np.argsort(output)[-5:][::-1]
print(args.image)
for c, v in zip(imgnet_classes[top5], output[top5]):
print(f"{c}: {100*v:.1f}%")
def print_in_box(msg):
print("#" * (len(msg) + 10))
print(f"#### {msg} ####")
print("#" * (len(msg) + 10))
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="PyTorch ImageNet Classification")
add_parser_arguments(parser)
args, rest = parser.parse_known_args()
model_args, rest = available_models()[args.arch].parser().parse_known_args(rest)
assert len(rest) == 0, f"Unknown args passed: {rest}"
cudnn.benchmark = True
main(args, model_args)
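# Illustrative invocation (script and file names are placeholders; the weights file
# is the one produced by the checkpoint-to-model script, and --pretrained-from-file
# is assumed to be exposed by the selected model's own argument parser):
#   python classify.py --arch resnet50 --precision AMP \
#       --image ./img/dog.jpg --pretrained-from-file ./resnet50_weights.pth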
precision:
AMP:
static_loss_scale: 128
amp: True
FP32:
amp: False
TF32:
amp: False
platform:
DGX1V-16G:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGX1V-32G:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
T4:
workers: 8
DGX1V:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGX2V:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGXA100:
workers: 10
prefetch: 4
gpu_affinity: socket_unique_contiguous
Z100L:
workers: 8
prefetch: 4
gpu_affinity: none
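# Platform entries above configure per-GPU data loading: "workers" is the number of
# DataLoader worker processes and "prefetch" how many batches each worker loads in
# advance (presumed to map to the DataLoader prefetch factor), while "gpu_affinity"
# names the CPU-binding policy applied to each GPU process ("none", as for Z100L,
# leaves affinity untouched).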
mode:
benchmark_training: &benchmark_training
print_freq: 1
epochs: 3
training_only: True
evaluate: False
save_checkpoints: False
benchmark_training_short:
<<: *benchmark_training
epochs: 1
data_backend: synthetic
prof: 100
benchmark_inference: &benchmark_inference
print_freq: 1
epochs: 1
training_only: False
evaluate: True
save_checkpoints: False
convergence:
print_freq: 20
training_only: False
evaluate: False
save_checkpoints: True
evaluate:
print_freq: 20
training_only: False
evaluate: True
epochs: 1
save_checkpoints: False
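# The mode presets above select what a run does: benchmark_training times
# training-only epochs, benchmark_training_short additionally switches to a
# synthetic data backend ("prof: 100" appears to cap the run at 100 iterations),
# benchmark_inference runs evaluation only, convergence trains the full epoch
# budget with checkpointing, and evaluate performs a single validation pass.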
anchors:
# ResNet_like params: {{{
resnet_params: &resnet_params
label_smoothing: 0.1
mixup: 0.2
lr_schedule: cosine
momentum: 0.875
warmup: 8
epochs: 250
data_backend: pytorch
num_classes: 1000
image_size: 224
interpolation: bilinear
resnet_params_896: &resnet_params_896
<<: *resnet_params
optimizer_batch_size: 896
lr: 0.896
weight_decay: 6.103515625e-05
resnet_params_1k: &resnet_params_1k
<<: *resnet_params
optimizer_batch_size: 1024
lr: 1.024
weight_decay: 6.103515625e-05
resnet_params_2k: &resnet_params_2k
<<: *resnet_params
optimizer_batch_size: 2048
lr: 2.048
weight_decay: 3.0517578125e-05
resnet_params_4k: &resnet_params_4k
<<: *resnet_params
optimizer_batch_size: 4096
lr: 4.096
weight_decay: 3.0517578125e-05
# }}}
# EfficientNet Params: {{{
efficientnet_params: &efficientnet_params
optimizer: rmsprop
rmsprop_alpha: 0.9
rmsprop_eps: 0.01
print_freq: 100
label_smoothing: 0.1
mixup: 0.2
lr_schedule: cosine
momentum: 0.9
warmup: 16
epochs: 400
data_backend: pytorch
augmentation: autoaugment
num_classes: 1000
interpolation: bicubic
efficientnet_b0_params_4k: &efficientnet_b0_params_4k
<<: *efficientnet_params
optimizer_batch_size: 4096
lr: 0.08
weight_decay: 1e-05
image_size: 224
efficientnet_b4_params_4k: &efficientnet_b4_params_4k
<<: *efficientnet_params
optimizer_batch_size: 4096
lr: 0.16
weight_decay: 5e-06
image_size: 380
# }}}
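# The model sections below compose their hyperparameters from the anchors above via
# YAML merge keys: for example, resnet50 / DGX1V / AMP merges *resnet_params_2k
# (cosine LR schedule, lr 2.048 for an optimizer batch size of 2048, 250 epochs, ...)
# and then overrides arch, batch_size and memory_format for that platform/precision.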
models:
resnet50: # {{{
DGX1V: &RN50_DGX1V
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_896
batch_size: 112
DGX1V-16G:
<<: *RN50_DGX1V
DGX1V-32G:
<<: *RN50_DGX1V
DGX2V:
AMP:
<<: *resnet_params_4k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_4k
arch: resnet50
batch_size: 256
DGXA100:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
TF32:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
T4:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_2k
batch_size: 128
Z100L:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_2k
batch_size: 128
# }}}
resnext101-32x4d: # {{{
DGX1V: &RNXT_DGX1V
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 64
DGX1V-16G:
<<: *RNXT_DGX1V
DGX1V-32G:
<<: *RNXT_DGX1V
DGXA100:
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
TF32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
T4:
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 64
# }}}
se-resnext101-32x4d: # {{{
DGX1V: &SERNXT_DGX1V
AMP:
<<: *resnet_params_896
arch: se-resnext101-32x4d
batch_size: 112
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 64
DGX1V-16G:
<<: *SERNXT_DGX1V
DGX1V-32G:
<<: *SERNXT_DGX1V
DGXA100:
AMP:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
memory_format: nhwc
TF32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
T4:
AMP:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 64
# }}}
efficientnet-widese-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
# }}}
efficientnet-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
# }}}
efficientnet-quant-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
# }}}
efficientnet-widese-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 64
# }}}
efficientnet-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 64
# }}}
efficientnet-quant-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 64
# }}}
precision:
AMP:
static_loss_scale: 128
amp: True
FP32:
amp: False
TF32:
amp: False
platform:
DGX1V-16G:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGX1V-32G:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
T4:
workers: 8
DGX1V:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGX2V:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGXA100:
workers: 10
prefetch: 4
gpu_affinity: socket_unique_contiguous
mode:
benchmark_training: &benchmark_training
print_freq: 1
epochs: 3
training_only: True
evaluate: False
save_checkpoints: False
benchmark_training_short:
<<: *benchmark_training
epochs: 1
data_backend: synthetic
prof: 100
benchmark_inference: &benchmark_inference
print_freq: 1
epochs: 1
training_only: False
evaluate: True
save_checkpoints: False
convergence:
print_freq: 20
training_only: False
evaluate: False
save_checkpoints: True
evaluate:
print_freq: 20
training_only: False
evaluate: True
epochs: 1
save_checkpoints: False
anchors:
# ResNet_like params: {{{
resnet_params: &resnet_params
label_smoothing: 0.1
mixup: 0.2
lr_schedule: cosine
momentum: 0.875
warmup: 8
epochs: 250
data_backend: pytorch
num_classes: 1000
image_size: 224
interpolation: bilinear
resnet_params_896: &resnet_params_896
<<: *resnet_params
optimizer_batch_size: 896
lr: 0.896
weight_decay: 6.103515625e-05
resnet_params_1k: &resnet_params_1k
<<: *resnet_params
optimizer_batch_size: 1024
lr: 1.024
weight_decay: 6.103515625e-05
resnet_params_2k: &resnet_params_2k
<<: *resnet_params
optimizer_batch_size: 2048
lr: 2.048
weight_decay: 3.0517578125e-05
resnet_params_4k: &resnet_params_4k
<<: *resnet_params
optimizer_batch_size: 4096
lr: 4.096
weight_decay: 3.0517578125e-05
# }}}
# EfficientNet Params: {{{
efficientnet_params: &efficientnet_params
optimizer: rmsprop
rmsprop_alpha: 0.9
rmsprop_eps: 0.01
print_freq: 100
label_smoothing: 0.1
mixup: 0.2
lr_schedule: cosine
momentum: 0.9
warmup: 16
epochs: 400
data_backend: pytorch
augmentation: autoaugment
num_classes: 1000
interpolation: bicubic
efficientnet_b0_params_4k: &efficientnet_b0_params_4k
<<: *efficientnet_params
optimizer_batch_size: 4096
lr: 0.08
weight_decay: 1e-05
image_size: 224
efficientnet_b4_params_4k: &efficientnet_b4_params_4k
<<: *efficientnet_params
optimizer_batch_size: 4096
lr: 0.16
weight_decay: 5e-06
image_size: 380
# }}}
models:
resnet50: # {{{
DGX1V: &RN50_DGX1V
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_896
batch_size: 112
DGX1V-16G:
<<: *RN50_DGX1V
DGX1V-32G:
<<: *RN50_DGX1V
DGX2V:
AMP:
<<: *resnet_params_4k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_4k
arch: resnet50
batch_size: 256
DGXA100:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
TF32:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
T4:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_2k
batch_size: 128
# }}}
resnext101-32x4d: # {{{
DGX1V: &RNXT_DGX1V
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 64
DGX1V-16G:
<<: *RNXT_DGX1V
DGX1V-32G:
<<: *RNXT_DGX1V
DGXA100:
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
TF32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
T4:
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 64
# }}}
se-resnext101-32x4d: # {{{
DGX1V: &SERNXT_DGX1V
AMP:
<<: *resnet_params_896
arch: se-resnext101-32x4d
batch_size: 112
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 64
DGX1V-16G:
<<: *SERNXT_DGX1V
DGX1V-32G:
<<: *SERNXT_DGX1V
DGXA100:
AMP:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
memory_format: nhwc
TF32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
T4:
AMP:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 64
# }}}
efficientnet-widese-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
# }}}
efficientnet-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
# }}}
efficientnet-quant-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
# }}}
efficientnet-widese-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 64
# }}}
efficientnet-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 64
# }}}
efficientnet-quant-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 64
# }}}
dllogger-v1.0.0 @ 0540a439
Subproject commit 0540a43971f4a8a16693a9de9de73c1072020769
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .logger import (
Backend,
Verbosity,
Logger,
default_step_format,
default_metric_format,
StdOutBackend,
JSONStreamBackend,
)
__version__ = "1.0.0"
class DLLoggerNotInitialized(Exception):
pass
class DLLLoggerAlreadyInitialized(Exception):
pass
class NotInitializedObject(object):
def __getattribute__(self, name):
raise DLLoggerNotInitialized(
"DLLogger not initialized. Initialize DLLogger with init(backends) function"
)
GLOBAL_LOGGER = NotInitializedObject()
def log(step, data, verbosity=Verbosity.DEFAULT):
GLOBAL_LOGGER.log(step, data, verbosity=verbosity)
def metadata(metric, metadata):
GLOBAL_LOGGER.metadata(metric, metadata)
def flush():
GLOBAL_LOGGER.flush()
def init(backends):
global GLOBAL_LOGGER
try:
if isinstance(GLOBAL_LOGGER, Logger):
raise DLLLoggerAlreadyInitialized()
except DLLoggerNotInitialized:
GLOBAL_LOGGER = Logger(backends)
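# Illustrative usage of the module-level API above (epoch, iteration and loss_value
# are placeholders):
#   import dllogger
#   from dllogger import StdOutBackend, JSONStreamBackend, Verbosity
#   dllogger.init(backends=[StdOutBackend(Verbosity.DEFAULT),
#                           JSONStreamBackend(Verbosity.VERBOSE, "train_log.json")])
#   dllogger.metadata("loss", {"unit": "", "format": ":.5f"})
#   dllogger.log(step=(epoch, iteration), data={"loss": loss_value})
#   dllogger.flush()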
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from abc import ABC, abstractmethod
from collections import defaultdict
from datetime import datetime
import json
import atexit
class Backend(ABC):
def __init__(self, verbosity):
self._verbosity = verbosity
@property
def verbosity(self):
return self._verbosity
@abstractmethod
def log(self, timestamp, elapsedtime, step, data):
pass
@abstractmethod
def metadata(self, timestamp, elapsedtime, metric, metadata):
pass
class Verbosity:
OFF = -1
DEFAULT = 0
VERBOSE = 1
class Logger:
def __init__(self, backends):
self.backends = backends
atexit.register(self.flush)
self.starttime = datetime.now()
def metadata(self, metric, metadata):
timestamp = datetime.now()
elapsedtime = (timestamp - self.starttime).total_seconds()
for b in self.backends:
b.metadata(timestamp, elapsedtime, metric, metadata)
def log(self, step, data, verbosity=1):
timestamp = datetime.now()
elapsedtime = (timestamp - self.starttime).total_seconds()
for b in self.backends:
if b.verbosity >= verbosity:
b.log(timestamp, elapsedtime, step, data)
def flush(self):
for b in self.backends:
b.flush()
def default_step_format(step):
return str(step)
def default_metric_format(metric, metadata, value):
unit = metadata["unit"] if "unit" in metadata.keys() else ""
format = "{" + metadata["format"] + "}" if "format" in metadata.keys() else "{}"
return "{} : {} {}".format(
metric, format.format(value) if value is not None else value, unit
)
def default_prefix_format(timestamp):
return "DLL {} - ".format(timestamp)
class StdOutBackend(Backend):
def __init__(
self,
verbosity,
step_format=default_step_format,
metric_format=default_metric_format,
prefix_format=default_prefix_format,
):
super().__init__(verbosity=verbosity)
self._metadata = defaultdict(dict)
self.step_format = step_format
self.metric_format = metric_format
self.prefix_format = prefix_format
def metadata(self, timestamp, elapsedtime, metric, metadata):
self._metadata[metric].update(metadata)
def log(self, timestamp, elapsedtime, step, data):
print(
"{}{} {}".format(
self.prefix_format(timestamp),
self.step_format(step),
" ".join(
[
self.metric_format(m, self._metadata[m], v)
for m, v in data.items()
]
),
)
)
def flush(self):
pass
class JSONStreamBackend(Backend):
def __init__(self, verbosity, filename, append=False):
super().__init__(verbosity=verbosity)
self._filename = filename
self.file = open(filename, "a" if append else "w")
atexit.register(self.file.close)
def metadata(self, timestamp, elapsedtime, metric, metadata):
self.file.write(
"DLLL {}\n".format(
json.dumps(
dict(
timestamp=str(timestamp.timestamp()),
elapsedtime=str(elapsedtime),
datetime=str(timestamp),
type="METADATA",
metric=metric,
metadata=metadata,
)
)
)
)
def log(self, timestamp, elapsedtime, step, data):
self.file.write(
"DLLL {}\n".format(
json.dumps(
dict(
timestamp=str(timestamp.timestamp()),
datetime=str(timestamp),
elapsedtime=str(elapsedtime),
type="LOG",
step=step,
data=data,
)
)
)
)
def flush(self):
self.file.flush()
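# Each metadata/log call above appends one line to the target file, e.g.
#   DLLL {"timestamp": "...", "datetime": "...", "elapsedtime": "...", "type": "LOG", "step": ..., "data": {...}}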
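# Multi-GPU inference benchmark sweep: EfficientNet-B0 with AMP on the DGXA100
# platform, batch sizes 1 through 256 on 8 GPUs, writing one raport_<batch>.json
# report per run; the first positional argument (default ./) selects the workspace.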
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 1 --workspace ${1:-./} --raport-file raport_1.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 2 --workspace ${1:-./} --raport-file raport_2.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 4 --workspace ${1:-./} --raport-file raport_4.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 8 --workspace ${1:-./} --raport-file raport_8.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 16 --workspace ${1:-./} --raport-file raport_16.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 32 --workspace ${1:-./} --raport-file raport_32.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 64 --workspace ${1:-./} --raport-file raport_64.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 128 --workspace ${1:-./} --raport-file raport_128.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 256 --workspace ${1:-./} --raport-file raport_256.json