"vscode:/vscode.git/clone" did not exist on "b816ff86c923e0290f58f2275e831fc17c29ba37"
Commit e129194a authored by Sugon_ldc

add new model resnet50v1.5

Pipeline #571 failed with stages in 0 seconds
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.03-py3
FROM ${FROM_IMAGE_NAME}
ADD requirements.txt /workspace/
WORKDIR /workspace/
RUN pip install --no-cache-dir -r requirements.txt
ADD . /workspace/rn50
WORKDIR /workspace/rn50
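# Illustrative usage (image tag and ImageNet path are placeholders):
#   docker build . -t nvidia_rn50
#   docker run --gpus all --rm -it --ipc=host -v <path/to/imagenet>:/imagenet nvidia_rn50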
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Convolutional Networks for Image Classification in PyTorch
In this repository you will find implementations of various image classification models.
Detailed information on each model can be found in its README, linked in the [Models](#models) table below.
## Table Of Contents
* [Models](#models)
* [Validation accuracy results](#validation-accuracy-results)
* [Training performance results](#training-performance-results)
* [Training performance: NVIDIA DGX A100 (8x A100 80GB)](#training-performance-nvidia-dgx-a100-8x-a100-80gb)
* [Training performance: NVIDIA DGX-1 16GB (8x V100 16GB)](#training-performance-nvidia-dgx-1-16gb-8x-v100-16gb)
* [Training performance: NVIDIA DGX-2 (16x V100 32GB)](#training-performance-nvidia-dgx-2-16x-v100-32gb)
* [Model comparison](#model-comparison)
* [Accuracy vs FLOPS](#accuracy-vs-flops)
* [Latency vs Throughput on different batch sizes](#latency-vs-throughput-on-different-batch-sizes)
## Models
The following table provides links to where you can find additional information on each model:
| **Model** | **Link**|
|:-:|:-:|
| resnet50 | [README](./resnet50v1.5/README.md) |
| resnext101-32x4d | [README](./resnext101-32x4d/README.md) |
| se-resnext101-32x4d | [README](./se-resnext101-32x4d/README.md) |
| EfficientNet | [README](./efficientnet/README.md) |
## Validation accuracy results
Our results were obtained by running the applicable
training scripts in the 20.12 PyTorch NGC container
on NVIDIA DGX-1 with 8x V100 16GB GPUs.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the validation accuracy results of
the classification models side by side.
| **Model** | **Mixed Precision Top1** | **Mixed Precision Top5** | **32 bit Top1** | **32 bit Top5** |
|:----------------------:|:------------------------:|:------------------------:|:---------------:|:---------------:|
| efficientnet-b0 | 77.63 | 93.82 | 77.31 | 93.76 |
| efficientnet-b4 | 82.98 | 96.44 | 82.92 | 96.43 |
| efficientnet-widese-b0 | 77.89 | 94.00 | 77.97 | 94.05 |
| efficientnet-widese-b4 | 83.28 | 96.45 | 83.30 | 96.47 |
| resnet50 | 78.60 | 94.19 | 78.69 | 94.16 |
| resnext101-32x4d | 80.43 | 95.06 | 80.40 | 95.04 |
| se-resnext101-32x4d | 81.00 | 95.48 | 81.09 | 95.45 |
## Training performance results
### Training performance: NVIDIA DGX A100 (8x A100 80GB)
Our results were obtained by running the applicable
training scripts in the 21.03 PyTorch NGC container
on NVIDIA DGX A100 with 8x A100 80GB GPUs.
Performance numbers (in images per second)
were averaged over an entire training epoch.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the training performance results of
all the classification models side by side.
| **Model** | **Mixed Precision** | **TF32** | **Mixed Precision Speedup** |
|:----------------------:|:-------------------:|:----------:|:---------------------------:|
| efficientnet-b0 | 16652 img/s | 8193 img/s | 2.03 x |
| efficientnet-b4 | 2570 img/s | 1223 img/s | 2.1 x |
| efficientnet-widese-b0 | 16368 img/s | 8244 img/s | 1.98 x |
| efficientnet-widese-b4 | 2585 img/s | 1223 img/s | 2.11 x |
| resnet50 | 16621 img/s | 7248 img/s | 2.29 x |
| resnext101-32x4d | 7925 img/s | 3471 img/s | 2.28 x |
| se-resnext101-32x4d | 5779 img/s | 2991 img/s | 1.93 x |
### Training performance: NVIDIA DGX-1 16GB (8x V100 16GB)
Our results were obtained by running the applicable
training scripts in the 21.03 PyTorch NGC container
on NVIDIA DGX-1 with 8x V100 16GB GPUs.
Performance numbers (in images per second)
were averaged over an entire training epoch.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the training performance results of all the
classification models side by side.
| **Model** | **Mixed Precision** | **FP32** | **Mixed Precision Speedup** |
|:----------------------:|:-------------------:|:----------:|:---------------------------:|
| efficientnet-b0 | 7789 img/s | 4672 img/s | 1.66 x |
| efficientnet-b4 | 1366 img/s | 616 img/s | 2.21 x |
| efficientnet-widese-b0 | 7875 img/s | 4592 img/s | 1.71 x |
| efficientnet-widese-b4 | 1356 img/s | 612 img/s | 2.21 x |
| resnet50 | 8322 img/s | 2855 img/s | 2.91 x |
| resnext101-32x4d | 4065 img/s | 1133 img/s | 3.58 x |
| se-resnext101-32x4d | 2971 img/s | 1004 img/s | 2.95 x |
## Model comparison
### Accuracy vs FLOPS
![ACCvsFLOPS](./img/ACCvsFLOPS.png)
The plot shows the relationship between validation accuracy and the number of
floating-point operations needed to compute a forward pass on a 224x224 image
for each of the implemented models.
Dot size indicates the number of trainable parameters.
### Latency vs Throughput on different batch sizes
![LATvsTHR](./img/LATvsTHR.png)
The plot shows the relationship between
inference latency and throughput at different batch sizes
for the implemented models.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the BSD 3-Clause License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import torch
def add_parser_arguments(parser):
parser.add_argument(
"--checkpoint-path", metavar="<path>", help="checkpoint filename"
)
parser.add_argument(
"--weight-path", metavar="<path>", help="name of file in which to store weights"
)
parser.add_argument("--ema", action="store_true", default=False)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="PyTorch ImageNet Training")
add_parser_arguments(parser)
args = parser.parse_args()
checkpoint = torch.load(args.checkpoint_path, map_location=torch.device("cpu"))
key = "state_dict" if not args.ema else "ema_state_dict"
model_state_dict = {
k[len("module.") :] if "module." in k else k: v
for k, v in checkpoint["state_dict"].items()
}
print(f"Loaded model, acc : {checkpoint['best_prec1']}")
torch.save(model_state_dict, args.weight_path)
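# Illustrative follow-up (file name is a placeholder): the file written above holds
# a plain state dict, so it can be restored into a model with
#   weights = torch.load("resnet50_weights.pth", map_location="cpu")
#   model.load_state_dict(weights)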
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the BSD 3-Clause License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from PIL import Image
import argparse
import numpy as np
import json
import torch
from torch.cuda.amp import autocast
import torch.backends.cudnn as cudnn
from image_classification import models
import torchvision.transforms as transforms
from image_classification.models import (
resnet50,
resnext101_32x4d,
se_resnext101_32x4d,
efficientnet_b0,
efficientnet_b4,
efficientnet_widese_b0,
efficientnet_widese_b4,
efficientnet_quant_b0,
efficientnet_quant_b4,
)
def available_models():
models = {
m.name: m
for m in [
resnet50,
resnext101_32x4d,
se_resnext101_32x4d,
efficientnet_b0,
efficientnet_b4,
efficientnet_widese_b0,
efficientnet_widese_b4,
efficientnet_quant_b0,
efficientnet_quant_b4,
]
}
return models
def add_parser_arguments(parser):
model_names = available_models().keys()
parser.add_argument("--image-size", default="224", type=int)
parser.add_argument(
"--arch",
"-a",
metavar="ARCH",
default="resnet50",
choices=model_names,
help="model architecture: " + " | ".join(model_names) + " (default: resnet50)",
)
parser.add_argument(
"--precision", metavar="PREC", default="AMP", choices=["AMP", "FP32"]
)
parser.add_argument("--cpu", action="store_true", help="perform inference on CPU")
parser.add_argument("--image", metavar="<path>", help="path to classified image")
def load_jpeg_from_file(path, image_size, cuda=True):
img_transforms = transforms.Compose(
[
transforms.Resize(image_size + 32),
transforms.CenterCrop(image_size),
transforms.ToTensor(),
]
)
img = img_transforms(Image.open(path))
with torch.no_grad():
# mean and std are not multiplied by 255 as they are in training script
# torch dataloader reads data into bytes whereas loading directly
# through PIL creates a tensor with floats in [0,1] range
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
if cuda:
mean = mean.cuda()
std = std.cuda()
img = img.cuda()
img = img.float()
input = img.unsqueeze(0).sub_(mean).div_(std)
return input
def check_quant_weight_correctness(checkpoint_path, model):
state_dict = torch.load(checkpoint_path, map_location=torch.device("cpu"))
state_dict = {
k[len("module.") :] if k.startswith("module.") else k: v
for k, v in state_dict.items()
}
quantizers_sd_keys = {
f"{n[0]}._amax" for n in model.named_modules() if "quantizer" in n[0]
}
sd_all_keys = quantizers_sd_keys | set(model.state_dict().keys())
assert set(state_dict.keys()) == sd_all_keys, (
f"Passed quantized architecture, but following keys are missing in "
f"checkpoint: {list(sd_all_keys - set(state_dict.keys()))}"
)
def main(args, model_args):
imgnet_classes = np.array(json.load(open("./LOC_synset_mapping.json", "r")))
try:
model = available_models()[args.arch](**model_args.__dict__)
except RuntimeError as e:
print_in_box(
"Error when creating model, did you forget to run checkpoint2model script?"
)
raise e
if args.arch in ["efficientnet-quant-b0", "efficientnet-quant-b4"]:
check_quant_weight_correctness(model_args.pretrained_from_file, model)
if not args.cpu:
model = model.cuda()
model.eval()
input = load_jpeg_from_file(args.image, args.image_size, cuda=not args.cpu)
with torch.no_grad(), autocast(enabled=args.precision == "AMP"):
output = torch.nn.functional.softmax(model(input), dim=1)
output = output.float().cpu().view(-1).numpy()
top5 = np.argsort(output)[-5:][::-1]
print(args.image)
for c, v in zip(imgnet_classes[top5], output[top5]):
print(f"{c}: {100*v:.1f}%")
def print_in_box(msg):
print("#" * (len(msg) + 10))
print(f"#### {msg} ####")
print("#" * (len(msg) + 10))
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="PyTorch ImageNet Classification")
add_parser_arguments(parser)
args, rest = parser.parse_known_args()
model_args, rest = available_models()[args.arch].parser().parse_known_args(rest)
assert len(rest) == 0, f"Unknown args passed: {rest}"
cudnn.benchmark = True
main(args, model_args)
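# Illustrative invocation (script and file names are placeholders; the weights file
# is the one produced by the checkpoint-to-model script, and --pretrained-from-file
# is assumed to be exposed by the selected model's own argument parser):
#   python classify.py --arch resnet50 --precision AMP \
#       --image ./img/dog.jpg --pretrained-from-file ./resnet50_weights.pth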
precision:
AMP:
static_loss_scale: 128
amp: True
FP32:
amp: False
TF32:
amp: False
platform:
DGX1V-16G:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGX1V-32G:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
T4:
workers: 8
DGX1V:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGX2V:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGXA100:
workers: 10
prefetch: 4
gpu_affinity: socket_unique_contiguous
Z100L:
workers: 8
prefetch: 4
gpu_affinity: none
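# Platform entries above configure per-GPU data loading: "workers" is the number of
# DataLoader worker processes and "prefetch" how many batches each worker loads in
# advance (presumed to map to the DataLoader prefetch factor), while "gpu_affinity"
# names the CPU-binding policy applied to each GPU process ("none", as for Z100L,
# leaves affinity untouched).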
mode:
benchmark_training: &benchmark_training
print_freq: 1
epochs: 3
training_only: True
evaluate: False
save_checkpoints: False
benchmark_training_short:
<<: *benchmark_training
epochs: 1
data_backend: synthetic
prof: 100
benchmark_inference: &benchmark_inference
print_freq: 1
epochs: 1
training_only: False
evaluate: True
save_checkpoints: False
convergence:
print_freq: 20
training_only: False
evaluate: False
save_checkpoints: True
evaluate:
print_freq: 20
training_only: False
evaluate: True
epochs: 1
save_checkpoints: False
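# The mode presets above select what a run does: benchmark_training times
# training-only epochs, benchmark_training_short additionally switches to a
# synthetic data backend ("prof: 100" appears to cap the run at 100 iterations),
# benchmark_inference runs evaluation only, convergence trains the full epoch
# budget with checkpointing, and evaluate performs a single validation pass.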
anchors:
# ResNet_like params: {{{
resnet_params: &resnet_params
label_smoothing: 0.1
mixup: 0.2
lr_schedule: cosine
momentum: 0.875
warmup: 8
epochs: 250
data_backend: pytorch
num_classes: 1000
image_size: 224
interpolation: bilinear
resnet_params_896: &resnet_params_896
<<: *resnet_params
optimizer_batch_size: 896
lr: 0.896
weight_decay: 6.103515625e-05
resnet_params_1k: &resnet_params_1k
<<: *resnet_params
optimizer_batch_size: 1024
lr: 1.024
weight_decay: 6.103515625e-05
resnet_params_2k: &resnet_params_2k
<<: *resnet_params
optimizer_batch_size: 2048
lr: 2.048
weight_decay: 3.0517578125e-05
resnet_params_4k: &resnet_params_4k
<<: *resnet_params
optimizer_batch_size: 4096
lr: 4.096
weight_decay: 3.0517578125e-05
# }}}
# EfficientNet Params: {{{
efficientnet_params: &efficientnet_params
optimizer: rmsprop
rmsprop_alpha: 0.9
rmsprop_eps: 0.01
print_freq: 100
label_smoothing: 0.1
mixup: 0.2
lr_schedule: cosine
momentum: 0.9
warmup: 16
epochs: 400
data_backend: pytorch
augmentation: autoaugment
num_classes: 1000
interpolation: bicubic
efficientnet_b0_params_4k: &efficientnet_b0_params_4k
<<: *efficientnet_params
optimizer_batch_size: 4096
lr: 0.08
weight_decay: 1e-05
image_size: 224
efficientnet_b4_params_4k: &efficientnet_b4_params_4k
<<: *efficientnet_params
optimizer_batch_size: 4096
lr: 0.16
weight_decay: 5e-06
image_size: 380
# }}}
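# The model sections below compose their hyperparameters from the anchors above via
# YAML merge keys: for example, resnet50 / DGX1V / AMP merges *resnet_params_2k
# (cosine LR schedule, lr 2.048 for an optimizer batch size of 2048, 250 epochs, ...)
# and then overrides arch, batch_size and memory_format for that platform/precision.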
models:
resnet50: # {{{
DGX1V: &RN50_DGX1V
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_896
batch_size: 112
DGX1V-16G:
<<: *RN50_DGX1V
DGX1V-32G:
<<: *RN50_DGX1V
DGX2V:
AMP:
<<: *resnet_params_4k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_4k
arch: resnet50
batch_size: 256
DGXA100:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
TF32:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
T4:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_2k
batch_size: 128
Z100L:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_2k
batch_size: 128
# }}}
resnext101-32x4d: # {{{
DGX1V: &RNXT_DGX1V
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 64
DGX1V-16G:
<<: *RNXT_DGX1V
DGX1V-32G:
<<: *RNXT_DGX1V
DGXA100:
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
TF32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
T4:
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 64
# }}}
se-resnext101-32x4d: # {{{
DGX1V: &SERNXT_DGX1V
AMP:
<<: *resnet_params_896
arch: se-resnext101-32x4d
batch_size: 112
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 64
DGX1V-16G:
<<: *SERNXT_DGX1V
DGX1V-32G:
<<: *SERNXT_DGX1V
DGXA100:
AMP:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
memory_format: nhwc
TF32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
T4:
AMP:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 64
# }}}
efficientnet-widese-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
# }}}
efficientnet-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
# }}}
efficientnet-quant-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
# }}}
efficientnet-widese-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 64
# }}}
efficientnet-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 64
# }}}
efficientnet-quant-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 64
# }}}
precision:
AMP:
static_loss_scale: 128
amp: True
FP32:
amp: False
TF32:
amp: False
platform:
DGX1V-16G:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGX1V-32G:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
T4:
workers: 8
DGX1V:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGX2V:
workers: 8
prefetch: 4
gpu_affinity: socket_unique_contiguous
DGXA100:
workers: 10
prefetch: 4
gpu_affinity: socket_unique_contiguous
mode:
benchmark_training: &benchmark_training
print_freq: 1
epochs: 3
training_only: True
evaluate: False
save_checkpoints: False
benchmark_training_short:
<<: *benchmark_training
epochs: 1
data_backend: synthetic
prof: 100
benchmark_inference: &benchmark_inference
print_freq: 1
epochs: 1
training_only: False
evaluate: True
save_checkpoints: False
convergence:
print_freq: 20
training_only: False
evaluate: False
save_checkpoints: True
evaluate:
print_freq: 20
training_only: False
evaluate: True
epochs: 1
save_checkpoints: False
anchors:
# ResNet_like params: {{{
resnet_params: &resnet_params
label_smoothing: 0.1
mixup: 0.2
lr_schedule: cosine
momentum: 0.875
warmup: 8
epochs: 250
data_backend: pytorch
num_classes: 1000
image_size: 224
interpolation: bilinear
resnet_params_896: &resnet_params_896
<<: *resnet_params
optimizer_batch_size: 896
lr: 0.896
weight_decay: 6.103515625e-05
resnet_params_1k: &resnet_params_1k
<<: *resnet_params
optimizer_batch_size: 1024
lr: 1.024
weight_decay: 6.103515625e-05
resnet_params_2k: &resnet_params_2k
<<: *resnet_params
optimizer_batch_size: 2048
lr: 2.048
weight_decay: 3.0517578125e-05
resnet_params_4k: &resnet_params_4k
<<: *resnet_params
optimizer_batch_size: 4096
lr: 4.096
weight_decay: 3.0517578125e-05
# }}}
# EfficientNet Params: {{{
efficientnet_params: &efficientnet_params
optimizer: rmsprop
rmsprop_alpha: 0.9
rmsprop_eps: 0.01
print_freq: 100
label_smoothing: 0.1
mixup: 0.2
lr_schedule: cosine
momentum: 0.9
warmup: 16
epochs: 400
data_backend: pytorch
augmentation: autoaugment
num_classes: 1000
interpolation: bicubic
efficientnet_b0_params_4k: &efficientnet_b0_params_4k
<<: *efficientnet_params
optimizer_batch_size: 4096
lr: 0.08
weight_decay: 1e-05
image_size: 224
efficientnet_b4_params_4k: &efficientnet_b4_params_4k
<<: *efficientnet_params
optimizer_batch_size: 4096
lr: 0.16
weight_decay: 5e-06
image_size: 380
# }}}
models:
resnet50: # {{{
DGX1V: &RN50_DGX1V
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_896
batch_size: 112
DGX1V-16G:
<<: *RN50_DGX1V
DGX1V-32G:
<<: *RN50_DGX1V
DGX2V:
AMP:
<<: *resnet_params_4k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_4k
arch: resnet50
batch_size: 256
DGXA100:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
TF32:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
T4:
AMP:
<<: *resnet_params_2k
arch: resnet50
batch_size: 256
memory_format: nhwc
FP32:
<<: *resnet_params_2k
batch_size: 128
# }}}
resnext101-32x4d: # {{{
DGX1V: &RNXT_DGX1V
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 64
DGX1V-16G:
<<: *RNXT_DGX1V
DGX1V-32G:
<<: *RNXT_DGX1V
DGXA100:
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
TF32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
T4:
AMP:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: resnext101-32x4d
batch_size: 64
# }}}
se-resnext101-32x4d: # {{{
DGX1V: &SERNXT_DGX1V
AMP:
<<: *resnet_params_896
arch: se-resnext101-32x4d
batch_size: 112
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 64
DGX1V-16G:
<<: *SERNXT_DGX1V
DGX1V-32G:
<<: *SERNXT_DGX1V
DGXA100:
AMP:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
memory_format: nhwc
TF32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
T4:
AMP:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 128
memory_format: nhwc
FP32:
<<: *resnet_params_1k
arch: se-resnext101-32x4d
batch_size: 64
# }}}
efficientnet-widese-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-widese-b0
batch_size: 256
# }}}
efficientnet-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-b0
batch_size: 256
# }}}
efficientnet-quant-b0: # {{{
T4:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 64
DGX1V-16G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 64
DGX1V-32G:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
memory_format: nhwc
FP32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 128
DGXA100:
AMP:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
memory_format: nhwc
TF32:
<<: *efficientnet_b0_params_4k
arch: efficientnet-quant-b0
batch_size: 256
# }}}
efficientnet-widese-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-widese-b4
batch_size: 64
# }}}
efficientnet-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-b4
batch_size: 64
# }}}
efficientnet-quant-b4: # {{{
T4:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 16
DGX1V-16G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 16
DGX1V-32G:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 64
memory_format: nhwc
FP32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 32
DGXA100:
AMP:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 128
memory_format: nhwc
TF32:
<<: *efficientnet_b4_params_4k
arch: efficientnet-quant-b4
batch_size: 64
# }}}
dllogger-v1.0.0 @ 0540a439
Subproject commit 0540a43971f4a8a16693a9de9de73c1072020769
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .logger import (
Backend,
Verbosity,
Logger,
default_step_format,
default_metric_format,
StdOutBackend,
JSONStreamBackend,
)
__version__ = "1.0.0"
class DLLoggerNotInitialized(Exception):
pass
class DLLLoggerAlreadyInitialized(Exception):
pass
class NotInitializedObject(object):
def __getattribute__(self, name):
raise DLLoggerNotInitialized(
"DLLogger not initialized. Initialize DLLogger with init(backends) function"
)
GLOBAL_LOGGER = NotInitializedObject()
def log(step, data, verbosity=Verbosity.DEFAULT):
GLOBAL_LOGGER.log(step, data, verbosity=verbosity)
def metadata(metric, metadata):
GLOBAL_LOGGER.metadata(metric, metadata)
def flush():
GLOBAL_LOGGER.flush()
def init(backends):
global GLOBAL_LOGGER
try:
if isinstance(GLOBAL_LOGGER, Logger):
raise DLLLoggerAlreadyInitialized()
except DLLoggerNotInitialized:
GLOBAL_LOGGER = Logger(backends)
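# Illustrative usage of the module-level API above (epoch, iteration and loss_value
# are placeholders):
#   import dllogger
#   from dllogger import StdOutBackend, JSONStreamBackend, Verbosity
#   dllogger.init(backends=[StdOutBackend(Verbosity.DEFAULT),
#                           JSONStreamBackend(Verbosity.VERBOSE, "train_log.json")])
#   dllogger.metadata("loss", {"unit": "", "format": ":.5f"})
#   dllogger.log(step=(epoch, iteration), data={"loss": loss_value})
#   dllogger.flush()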
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from abc import ABC, abstractmethod
from collections import defaultdict
from datetime import datetime
import json
import atexit
class Backend(ABC):
def __init__(self, verbosity):
self._verbosity = verbosity
@property
def verbosity(self):
return self._verbosity
@abstractmethod
def log(self, timestamp, elapsedtime, step, data):
pass
@abstractmethod
def metadata(self, timestamp, elapsedtime, metric, metadata):
pass
class Verbosity:
OFF = -1
DEFAULT = 0
VERBOSE = 1
class Logger:
def __init__(self, backends):
self.backends = backends
atexit.register(self.flush)
self.starttime = datetime.now()
def metadata(self, metric, metadata):
timestamp = datetime.now()
elapsedtime = (timestamp - self.starttime).total_seconds()
for b in self.backends:
b.metadata(timestamp, elapsedtime, metric, metadata)
def log(self, step, data, verbosity=1):
timestamp = datetime.now()
elapsedtime = (timestamp - self.starttime).total_seconds()
for b in self.backends:
if b.verbosity >= verbosity:
b.log(timestamp, elapsedtime, step, data)
def flush(self):
for b in self.backends:
b.flush()
def default_step_format(step):
return str(step)
def default_metric_format(metric, metadata, value):
unit = metadata["unit"] if "unit" in metadata.keys() else ""
format = "{" + metadata["format"] + "}" if "format" in metadata.keys() else "{}"
return "{} : {} {}".format(
metric, format.format(value) if value is not None else value, unit
)
def default_prefix_format(timestamp):
return "DLL {} - ".format(timestamp)
class StdOutBackend(Backend):
def __init__(
self,
verbosity,
step_format=default_step_format,
metric_format=default_metric_format,
prefix_format=default_prefix_format,
):
super().__init__(verbosity=verbosity)
self._metadata = defaultdict(dict)
self.step_format = step_format
self.metric_format = metric_format
self.prefix_format = prefix_format
def metadata(self, timestamp, elapsedtime, metric, metadata):
self._metadata[metric].update(metadata)
def log(self, timestamp, elapsedtime, step, data):
print(
"{}{} {}".format(
self.prefix_format(timestamp),
self.step_format(step),
" ".join(
[
self.metric_format(m, self._metadata[m], v)
for m, v in data.items()
]
),
)
)
def flush(self):
pass
class JSONStreamBackend(Backend):
def __init__(self, verbosity, filename, append=False):
super().__init__(verbosity=verbosity)
self._filename = filename
self.file = open(filename, "a" if append else "w")
atexit.register(self.file.close)
def metadata(self, timestamp, elapsedtime, metric, metadata):
self.file.write(
"DLLL {}\n".format(
json.dumps(
dict(
timestamp=str(timestamp.timestamp()),
elapsedtime=str(elapsedtime),
datetime=str(timestamp),
type="METADATA",
metric=metric,
metadata=metadata,
)
)
)
)
def log(self, timestamp, elapsedtime, step, data):
self.file.write(
"DLLL {}\n".format(
json.dumps(
dict(
timestamp=str(timestamp.timestamp()),
datetime=str(timestamp),
elapsedtime=str(elapsedtime),
type="LOG",
step=step,
data=data,
)
)
)
)
def flush(self):
self.file.flush()
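# Each metadata/log call above appends one line to the target file, e.g.
#   DLLL {"timestamp": "...", "datetime": "...", "elapsedtime": "...", "type": "LOG", "step": ..., "data": {...}}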
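# Multi-GPU inference benchmark sweep: EfficientNet-B0 with AMP on the DGXA100
# platform, batch sizes 1 through 256 on 8 GPUs, writing one raport_<batch>.json
# report per run; the first positional argument (default ./) selects the workspace.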
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 1 --workspace ${1:-./} --raport-file raport_1.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 2 --workspace ${1:-./} --raport-file raport_2.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 4 --workspace ${1:-./} --raport-file raport_4.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 8 --workspace ${1:-./} --raport-file raport_8.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 16 --workspace ${1:-./} --raport-file raport_16.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 32 --workspace ${1:-./} --raport-file raport_32.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 64 --workspace ${1:-./} --raport-file raport_64.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 128 --workspace ${1:-./} --raport-file raport_128.json
python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-b0 --precision AMP --mode benchmark_inference --platform DGXA100 /imagenet -b 256 --workspace ${1:-./} --raport-file raport_256.json