Unverified Commit 11bd2eaa authored by Vasilis Vryniotis, committed by GitHub

Port Multi-weight support from prototype to main (#5618)
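At a glance: model builders no longer take a `pretrained` boolean; they take a `weights` enum entry that identifies a specific checkpoint and bundles the matching inference transforms. A minimal sketch of the pattern (not part of the diff below; it only reuses the ResNet-18 names that appear in it):

```
import torch
from torchvision.models import resnet18, ResNet18_Weights

# Old API (removed throughout this PR): resnet18(pretrained=True)

# New API: pick an explicit checkpoint; DEFAULT points to the best available one.
weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()

# Every weights entry ships the preprocessing preset it was trained with.
preprocess = weights.transforms()
batch = preprocess(torch.rand(1, 3, 256, 256))  # dummy image, values in [0, 1]
prediction = model(batch)
```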



* Moving base files outside of prototype and porting AlexNet, ConvNeXt, DenseNet and EfficientNet.

* Porting googlenet

* Porting inception

* Porting mnasnet

* Porting mobilenetv2

* Porting mobilenetv3

* Porting regnet

* Porting resnet

* Porting shufflenetv2

* Porting squeezenet

* Porting vgg

* Porting vit

* Fix docstrings

* Fixing imports

* Adding missing import

* Fix mobilenet imports

* Fix tests

* Fix prototype tests

* Exclude get_weight from the model listing in tests

* Fix init files

* Porting googlenet

* Porting inception

* porting mobilenetv2

* porting mobilenetv3

* porting resnet

* porting shufflenetv2

* Fix test and linter

* Fixing docs.

* Porting Detection models (#5617)

* fix inits

* fix docs

* Port faster_rcnn

* Port fcos

* Port keypoint_rcnn

* Port mask_rcnn

* Port retinanet

* Port ssd

* Port ssdlite

* Fix linter

* Fixing tests

* Fixing tests

* Fixing vgg test

* Porting Optical Flow, Segmentation, Video models (#5619)

* Porting raft

* Porting video resnet

* Porting deeplabv3

* Porting fcn and lraspp

* Fixing the tests and linter

* Porting docs, examples, tutorials and galleries (#5620)

* Fix examples, tutorials and gallery

* Update gallery/plot_optical_flow.py
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

* Fix import

* Revert hardcoded normalization

* fix uncommitted changes

* Fix bug

* Fix more bugs

* Making resize optional for segmentation

* Fixing preset

* Fix mypy

* Fixing documentation strings

* Fix flake8

* minor refactoring
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

* Resolve conflict

* Porting model tests (#5622)

* Porting tests

* Remove unnecessary variable

* Fix linter

* Move prototype to extended tests

* Fix download models job

* Update CI on Multiweight branch to use the new weight download approach (#5628)

* port Pad to prototype transforms (#5621)

* port Pad to prototype transforms

* use literal

* Bump up LibTorchvision version number for Podspec to release Cocoapods (#5624)
Co-authored-by: Anton Thomma <anton@pri.co.nz>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>

* pre-download model weights in CI docs build (#5625)

* pre-download model weights in CI docs build

* move changes into template

* change docs image

* Regenerated config.yml
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Anton Thomma <11010310+thommaa@users.noreply.github.com>
Co-authored-by: Anton Thomma <anton@pri.co.nz>

* Porting reference scripts and updating presets (#5629)

* Making the _presets.py presets classes

* Remove support for targets in presets.

* Rewriting the video preset

* Adding tests to check that the bundled transforms are JIT scriptable

* Rename all presets from *Eval to *Inference

* Minor refactoring

* Remove --prototype and --pretrained from reference scripts

* Remove pretrained_backbone refs

* Corrections and simplifications

* Fixing bug

* Fixing linter

* Fix flake8

* restore documentation example

* minor fixes

* fix optical flow missing param

* Fixing commands

* Adding weights_backbone support in detection and segmentation (see the sketch just before the diff below)

* Updating the commands for InceptionV3

* Setting `weights_backbone` to its fully backwards-compatible (BC) value (#5653)

* Replace default `weights_backbone=None` with its BC values.

* Fixing tests

* Fix linter

* Update docs.

* Update preprocessing on reference scripts.

* Change qat/ptq to their full values.

* Refactoring preprocessing

* Fix video preset

* No initialization on VGG if pretrained

* Fix warning messages for backbone utils.

* Adding star to all preset constructors.

* Fix mypy.
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Anton Thomma <11010310+thommaa@users.noreply.github.com>
Co-authored-by: Anton Thomma <anton@pri.co.nz>
parent 375e4ab2
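The detection builders additionally take a separate `weights_backbone` argument, exposed by the reference scripts as `--weights-backbone`. A minimal sketch of the two call patterns, reusing only enum names that appear in the diff below (Faster R-CNN / ResNet-50); treat it as illustrative rather than as part of the commit:

```
from torchvision.models import ResNet50_Weights
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

# Fully pre-trained detector: the checkpoint already contains the backbone weights.
model = fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT)

# Fresh detection head on an ImageNet pre-trained backbone, which is what the
# reference scripts request via --weights-backbone when training from scratch.
model = fasterrcnn_resnet50_fpn(
    weights=None,
    weights_backbone=ResNet50_Weights.IMAGENET1K_V1,
    num_classes=91,
)
```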
...@@ -366,19 +366,28 @@ jobs:
    resource_class: xlarge
    steps:
      - checkout
-      - download_model_weights:
-          extract_roots: torchvision/prototype/models
      - install_torchvision
      - install_prototype_dependencies
      - pip_install:
          args: scipy pycocotools h5py
          descr: Install optional dependencies
-      - run:
-          name: Enable prototype tests
-          command: echo 'export PYTORCH_TEST_WITH_PROTOTYPE=1' >> $BASH_ENV
      - run_tests_selective:
          file_or_dir: test/test_prototype_*.py
+  unittest_extended:
+    docker:
+      - image: circleci/python:3.7
+    resource_class: xlarge
+    steps:
+      - checkout
+      - download_model_weights
+      - install_torchvision
+      - run:
+          name: Enable extended tests
+          command: echo 'export PYTORCH_TEST_WITH_EXTENDED=1' >> $BASH_ENV
+      - run_tests_selective:
+          file_or_dir: test/test_extended_*.py
  binary_linux_wheel:
    <<: *binary_common
    docker:
...@@ -1629,6 +1638,7 @@ workflows:
      - unittest_torchhub
      - unittest_onnx
      - unittest_prototype
+      - unittest_extended
      - unittest_linux_cpu:
          cu_version: cpu
          name: unittest_linux_cpu_py3.7
...
...@@ -366,19 +366,28 @@ jobs:
    resource_class: xlarge
    steps:
      - checkout
-      - download_model_weights:
-          extract_roots: torchvision/prototype/models
      - install_torchvision
      - install_prototype_dependencies
      - pip_install:
          args: scipy pycocotools h5py
          descr: Install optional dependencies
-      - run:
-          name: Enable prototype tests
-          command: echo 'export PYTORCH_TEST_WITH_PROTOTYPE=1' >> $BASH_ENV
      - run_tests_selective:
          file_or_dir: test/test_prototype_*.py
+  unittest_extended:
+    docker:
+      - image: circleci/python:3.7
+    resource_class: xlarge
+    steps:
+      - checkout
+      - download_model_weights
+      - install_torchvision
+      - run:
+          name: Enable extended tests
+          command: echo 'export PYTORCH_TEST_WITH_EXTENDED=1' >> $BASH_ENV
+      - run_tests_selective:
+          file_or_dir: test/test_extended_*.py
  binary_linux_wheel:
    <<: *binary_common
    docker:
...@@ -1115,6 +1124,7 @@ workflows:
      - unittest_torchhub
      - unittest_onnx
      - unittest_prototype
+      - unittest_extended
      {{ unittest_workflows() }}
  cmake:
...
import torch
-import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile
+from torchvision.models.detection import (
+    fasterrcnn_mobilenet_v3_large_320_fpn,
+    FasterRCNN_MobileNet_V3_Large_320_FPN_Weights,
+)

print(torch.__version__)
-model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(
-    pretrained=True, box_score_thresh=0.7, rpn_post_nms_top_n_test=100, rpn_score_thresh=0.4, rpn_pre_nms_top_n_test=150
+model = fasterrcnn_mobilenet_v3_large_320_fpn(
+    weights=FasterRCNN_MobileNet_V3_Large_320_FPN_Weights.DEFAULT,
+    box_score_thresh=0.7,
+    rpn_post_nms_top_n_test=100,
+    rpn_score_thresh=0.4,
+    rpn_pre_nms_top_n_test=150,
)
model.eval()
...
...@@ -98,58 +98,6 @@ You can construct a model with random weights by calling its constructor:
    convnext_large = models.convnext_large()

We provide pre-trained models, using the PyTorch :mod:`torch.utils.model_zoo`.
-These can be constructed by passing ``pretrained=True``:
-
-.. code:: python
-
-    import torchvision.models as models
-    resnet18 = models.resnet18(pretrained=True)
-    alexnet = models.alexnet(pretrained=True)
-    squeezenet = models.squeezenet1_0(pretrained=True)
-    vgg16 = models.vgg16(pretrained=True)
-    densenet = models.densenet161(pretrained=True)
-    inception = models.inception_v3(pretrained=True)
-    googlenet = models.googlenet(pretrained=True)
-    shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
-    mobilenet_v2 = models.mobilenet_v2(pretrained=True)
-    mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
-    mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
-    resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
-    wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
-    mnasnet = models.mnasnet1_0(pretrained=True)
-    efficientnet_b0 = models.efficientnet_b0(pretrained=True)
-    efficientnet_b1 = models.efficientnet_b1(pretrained=True)
-    efficientnet_b2 = models.efficientnet_b2(pretrained=True)
-    efficientnet_b3 = models.efficientnet_b3(pretrained=True)
-    efficientnet_b4 = models.efficientnet_b4(pretrained=True)
-    efficientnet_b5 = models.efficientnet_b5(pretrained=True)
-    efficientnet_b6 = models.efficientnet_b6(pretrained=True)
-    efficientnet_b7 = models.efficientnet_b7(pretrained=True)
-    efficientnet_v2_s = models.efficientnet_v2_s(pretrained=True)
-    efficientnet_v2_m = models.efficientnet_v2_m(pretrained=True)
-    efficientnet_v2_l = models.efficientnet_v2_l(pretrained=True)
-    regnet_y_400mf = models.regnet_y_400mf(pretrained=True)
-    regnet_y_800mf = models.regnet_y_800mf(pretrained=True)
-    regnet_y_1_6gf = models.regnet_y_1_6gf(pretrained=True)
-    regnet_y_3_2gf = models.regnet_y_3_2gf(pretrained=True)
-    regnet_y_8gf = models.regnet_y_8gf(pretrained=True)
-    regnet_y_16gf = models.regnet_y_16gf(pretrained=True)
-    regnet_y_32gf = models.regnet_y_32gf(pretrained=True)
-    regnet_x_400mf = models.regnet_x_400mf(pretrained=True)
-    regnet_x_800mf = models.regnet_x_800mf(pretrained=True)
-    regnet_x_1_6gf = models.regnet_x_1_6gf(pretrained=True)
-    regnet_x_3_2gf = models.regnet_x_3_2gf(pretrained=True)
-    regnet_x_8gf = models.regnet_x_8gf(pretrained=True)
-    regnet_x_16gf = models.regnet_x_16gf(pretrained=True)
-    regnet_x_32gf = models.regnet_x_32gf(pretrained=True)
-    vit_b_16 = models.vit_b_16(pretrained=True)
-    vit_b_32 = models.vit_b_32(pretrained=True)
-    vit_l_16 = models.vit_l_16(pretrained=True)
-    vit_l_32 = models.vit_l_32(pretrained=True)
-    convnext_tiny = models.convnext_tiny(pretrained=True)
-    convnext_small = models.convnext_small(pretrained=True)
-    convnext_base = models.convnext_base(pretrained=True)
-    convnext_large = models.convnext_large(pretrained=True)

Instancing a pre-trained model will download its weights to a cache directory.
This directory can be set using the `TORCH_HOME` environment variable. See
...@@ -525,7 +473,7 @@ Obtaining a pre-trained quantized model can be done with a few lines of code:
.. code:: python

    import torchvision.models as models
-    model = models.quantization.mobilenet_v2(pretrained=True, quantize=True)
+    model = models.quantization.mobilenet_v2(weights=MobileNet_V2_QuantizedWeights.IMAGENET1K_QNNPACK_V1, quantize=True)
    model.eval()
    # run the model with quantized inputs and weights
    out = model(torch.rand(1, 3, 224, 224))
...
...@@ -6,7 +6,7 @@ import torchvision
HERE = osp.dirname(osp.abspath(__file__))
ASSETS = osp.dirname(osp.dirname(HERE))

-model = torchvision.models.resnet18(pretrained=False)
+model = torchvision.models.resnet18()
model.eval()

traced_model = torch.jit.script(model)
...
...@@ -19,7 +19,6 @@ import numpy as np
import torch
import matplotlib.pyplot as plt
import torchvision.transforms.functional as F
-import torchvision.transforms as T

plt.rcParams["savefig.bbox"] = "tight"
...@@ -88,24 +87,19 @@ plot(img1_batch)
# reduce the image sizes for the example to run faster. Image dimension must be
# divisible by 8.
+from torchvision.models.optical_flow import Raft_Large_Weights

-def preprocess(batch):
-    transforms = T.Compose(
-        [
-            T.ConvertImageDtype(torch.float32),
-            T.Normalize(mean=0.5, std=0.5),  # map [0, 1] into [-1, 1]
-            T.Resize(size=(520, 960)),
-        ]
-    )
-    batch = transforms(batch)
-    return batch
+weights = Raft_Large_Weights.DEFAULT
+transforms = weights.transforms()

-# If you can, run this example on a GPU, it will be a lot faster.
-device = "cuda" if torch.cuda.is_available() else "cpu"
+def preprocess(img1_batch, img2_batch):
+    img1_batch = F.resize(img1_batch, size=[520, 960])
+    img2_batch = F.resize(img2_batch, size=[520, 960])
+    return transforms(img1_batch, img2_batch)

-img1_batch = preprocess(img1_batch).to(device)
-img2_batch = preprocess(img2_batch).to(device)
+img1_batch, img2_batch = preprocess(img1_batch, img2_batch)

print(f"shape = {img1_batch.shape}, dtype = {img1_batch.dtype}")
...@@ -121,7 +115,10 @@ print(f"shape = {img1_batch.shape}, dtype = {img1_batch.dtype}")
from torchvision.models.optical_flow import raft_large

-model = raft_large(pretrained=True, progress=False).to(device)
+# If you can, run this example on a GPU, it will be a lot faster.
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+model = raft_large(weights=Raft_Large_Weights.DEFAULT, progress=False).to(device)
model = model.eval()

list_of_flows = model(img1_batch.to(device), img2_batch.to(device))
...@@ -182,10 +179,9 @@ plot(grid)
# from torchvision.io import write_jpeg
# for i, (img1, img2) in enumerate(zip(frames, frames[1:])):
#     # Note: it would be faster to predict batches of flows instead of individual flows
-#     img1 = preprocess(img1[None]).to(device)
-#     img2 = preprocess(img2[None]).to(device)
-#     list_of_flows = model(img1_batch, img2_batch)
+#     img1, img2 = preprocess(img1, img2)
+#     list_of_flows = model(img1.to(device), img2.to(device))
#     predicted_flow = list_of_flows[-1][0]
#     flow_img = flow_to_image(predicted_flow).to("cpu")
#     output_folder = "/tmp/"  # Update this to the folder of your choice
...
...@@ -139,12 +139,14 @@ show(drawn_boxes)
# Here is demo with a Faster R-CNN model loaded from
# :func:`~torchvision.models.detection.fasterrcnn_resnet50_fpn`

-from torchvision.models.detection import fasterrcnn_resnet50_fpn
+from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights

-model = fasterrcnn_resnet50_fpn(pretrained=True, progress=False)
+weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
+model = fasterrcnn_resnet50_fpn(weights=weights, progress=False)
print(img.size())

-img = F.convert_image_dtype(img, torch.float)
+transforms = weights.transforms()
+img = transforms(img)

target = {}
target["boxes"] = boxes
target["labels"] = labels = torch.ones((masks.size(0),), dtype=torch.int64)
...
...@@ -85,20 +85,16 @@ show([transformed_dog1, transformed_dog2])
# Let's define a ``Predictor`` module that transforms the input tensor and then
# applies an ImageNet model on it.

-from torchvision.models import resnet18
+from torchvision.models import resnet18, ResNet18_Weights


class Predictor(nn.Module):

    def __init__(self):
        super().__init__()
-        self.resnet18 = resnet18(pretrained=True, progress=False).eval()
-        self.transforms = nn.Sequential(
-            T.Resize([256, ]),  # We use single int value inside a list due to torchscript type restrictions
-            T.CenterCrop(224),
-            T.ConvertImageDtype(torch.float),
-            T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
-        )
+        weights = ResNet18_Weights.DEFAULT
+        self.resnet18 = resnet18(weights=weights, progress=False).eval()
+        self.transforms = weights.transforms()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
...
...@@ -73,14 +73,17 @@ show(result)
# :func:`~torchvision.models.detection.ssd300_vgg16`. For more details
# on the output of such models, you may refer to :ref:`instance_seg_output`.

-from torchvision.models.detection import fasterrcnn_resnet50_fpn
-from torchvision.transforms.functional import convert_image_dtype
+from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights


batch_int = torch.stack([dog1_int, dog2_int])
-batch = convert_image_dtype(batch_int, dtype=torch.float)

-model = fasterrcnn_resnet50_fpn(pretrained=True, progress=False)
+weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
+transforms = weights.transforms()
+
+batch = transforms(batch_int)
+
+model = fasterrcnn_resnet50_fpn(weights=weights, progress=False)
model = model.eval()

outputs = model(batch)
...@@ -120,13 +123,15 @@ show(dogs_with_boxes)
# images must be normalized before they're passed to a semantic segmentation
# model.

-from torchvision.models.segmentation import fcn_resnet50
+from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights

+weights = FCN_ResNet50_Weights.DEFAULT
+transforms = weights.transforms(resize_size=None)

-model = fcn_resnet50(pretrained=True, progress=False)
+model = fcn_resnet50(weights=weights, progress=False)
model = model.eval()

-normalized_batch = F.normalize(batch, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
+normalized_batch = transforms(batch)
output = model(normalized_batch)['out']
print(output.shape, output.min().item(), output.max().item())
...@@ -262,8 +267,14 @@ show(dogs_with_masks)
# of them may not have masks, like
# :func:`~torchvision.models.detection.fasterrcnn_resnet50_fpn`.

-from torchvision.models.detection import maskrcnn_resnet50_fpn
-model = maskrcnn_resnet50_fpn(pretrained=True, progress=False)
+from torchvision.models.detection import maskrcnn_resnet50_fpn, MaskRCNN_ResNet50_FPN_Weights
+
+weights = MaskRCNN_ResNet50_FPN_Weights.DEFAULT
+transforms = weights.transforms()
+
+batch = transforms(batch_int)
+
+model = maskrcnn_resnet50_fpn(weights=weights, progress=False)
model = model.eval()

output = model(batch)
...@@ -378,13 +389,17 @@ show(dogs_with_masks)
# Note that the keypoint detection model does not need normalized images.
#

-from torchvision.models.detection import keypointrcnn_resnet50_fpn
+from torchvision.models.detection import keypointrcnn_resnet50_fpn, KeypointRCNN_ResNet50_FPN_Weights
from torchvision.io import read_image

person_int = read_image(str(Path("assets") / "person1.jpg"))
-person_float = convert_image_dtype(person_int, dtype=torch.float)

-model = keypointrcnn_resnet50_fpn(pretrained=True, progress=False)
+weights = KeypointRCNN_ResNet50_FPN_Weights.DEFAULT
+transforms = weights.transforms()
+
+person_float = transforms(person_int)
+
+model = keypointrcnn_resnet50_fpn(weights=weights, progress=False)
model = model.eval()

outputs = model([person_float])
...
import torch
-import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile
+from torchvision.models.detection import (
+    fasterrcnn_mobilenet_v3_large_320_fpn,
+    FasterRCNN_MobileNet_V3_Large_320_FPN_Weights,
+)

print(torch.__version__)
-model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(
-    pretrained=True, box_score_thresh=0.7, rpn_post_nms_top_n_test=100, rpn_score_thresh=0.4, rpn_pre_nms_top_n_test=150
+model = fasterrcnn_mobilenet_v3_large_320_fpn(
+    weights=FasterRCNN_MobileNet_V3_Large_320_FPN_Weights.DEFAULT,
+    box_score_thresh=0.7,
+    rpn_post_nms_top_n_test=100,
+    rpn_score_thresh=0.4,
+    rpn_pre_nms_top_n_test=150,
)
model.eval()
...
...@@ -43,7 +43,7 @@ Since it expects tensors with a size of N x 3 x 299 x 299, to validate the model
```
torchrun --nproc_per_node=8 train.py --model inception_v3\
-    --val-resize-size 342 --val-crop-size 299 --train-crop-size 299 --test-only --pretrained
+    --test-only --weights Inception_V3_Weights.IMAGENET1K_V1
```

### ResNet
...@@ -96,22 +96,14 @@ The weights of the B5-B7 variants are ported from Luke Melas' [EfficientNet-PyTo
All models were trained using Bicubic interpolation and each have custom crop and resize sizes. To validate the models use the following commands:
```
-torchrun --nproc_per_node=8 train.py --model efficientnet_b0 --interpolation bicubic\
-    --val-resize-size 256 --val-crop-size 224 --train-crop-size 224 --test-only --pretrained
-torchrun --nproc_per_node=8 train.py --model efficientnet_b1 --interpolation bicubic\
-    --val-resize-size 256 --val-crop-size 240 --train-crop-size 240 --test-only --pretrained
-torchrun --nproc_per_node=8 train.py --model efficientnet_b2 --interpolation bicubic\
-    --val-resize-size 288 --val-crop-size 288 --train-crop-size 288 --test-only --pretrained
-torchrun --nproc_per_node=8 train.py --model efficientnet_b3 --interpolation bicubic\
-    --val-resize-size 320 --val-crop-size 300 --train-crop-size 300 --test-only --pretrained
-torchrun --nproc_per_node=8 train.py --model efficientnet_b4 --interpolation bicubic\
-    --val-resize-size 384 --val-crop-size 380 --train-crop-size 380 --test-only --pretrained
-torchrun --nproc_per_node=8 train.py --model efficientnet_b5 --interpolation bicubic\
-    --val-resize-size 456 --val-crop-size 456 --train-crop-size 456 --test-only --pretrained
-torchrun --nproc_per_node=8 train.py --model efficientnet_b6 --interpolation bicubic\
-    --val-resize-size 528 --val-crop-size 528 --train-crop-size 528 --test-only --pretrained
-torchrun --nproc_per_node=8 train.py --model efficientnet_b7 --interpolation bicubic\
-    --val-resize-size 600 --val-crop-size 600 --train-crop-size 600 --test-only --pretrained
+torchrun --nproc_per_node=8 train.py --model efficientnet_b0 --test-only --weights EfficientNet_B0_Weights.IMAGENET1K_V1
+torchrun --nproc_per_node=8 train.py --model efficientnet_b1 --test-only --weights EfficientNet_B1_Weights.IMAGENET1K_V1
+torchrun --nproc_per_node=8 train.py --model efficientnet_b2 --test-only --weights EfficientNet_B2_Weights.IMAGENET1K_V1
+torchrun --nproc_per_node=8 train.py --model efficientnet_b3 --test-only --weights EfficientNet_B3_Weights.IMAGENET1K_V1
+torchrun --nproc_per_node=8 train.py --model efficientnet_b4 --test-only --weights EfficientNet_B4_Weights.IMAGENET1K_V1
+torchrun --nproc_per_node=8 train.py --model efficientnet_b5 --test-only --weights EfficientNet_B5_Weights.IMAGENET1K_V1
+torchrun --nproc_per_node=8 train.py --model efficientnet_b6 --test-only --weights EfficientNet_B6_Weights.IMAGENET1K_V1
+torchrun --nproc_per_node=8 train.py --model efficientnet_b7 --test-only --weights EfficientNet_B7_Weights.IMAGENET1K_V1
```
...
...@@ -6,6 +6,7 @@ from torchvision.transforms.functional import InterpolationMode
class ClassificationPresetTrain:
    def __init__(
        self,
+        *,
        crop_size,
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
...@@ -46,6 +47,7 @@ class ClassificationPresetTrain:
class ClassificationPresetEval:
    def __init__(
        self,
+        *,
        crop_size,
        resize_size=256,
        mean=(0.485, 0.456, 0.406),
...
...@@ -15,12 +15,6 @@ from torch.utils.data.dataloader import default_collate
from torchvision.transforms.functional import InterpolationMode

-try:
-    from torchvision import prototype
-except ImportError:
-    prototype = None
-

def train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args, model_ema=None, scaler=None):
    model.train()
    metric_logger = utils.MetricLogger(delimiter=" ")
...@@ -154,16 +148,11 @@ def load_data(traindir, valdir, args):
        print(f"Loading dataset_test from {cache_path}")
        dataset_test, _ = torch.load(cache_path)
    else:
-        if not args.prototype:
-            preprocessing = presets.ClassificationPresetEval(
-                crop_size=val_crop_size, resize_size=val_resize_size, interpolation=interpolation
-            )
-        else:
-            if args.weights:
-                weights = prototype.models.get_weight(args.weights)
-                preprocessing = weights.transforms()
-            else:
-                preprocessing = prototype.transforms.ImageClassificationEval(
-                    crop_size=val_crop_size, resize_size=val_resize_size, interpolation=interpolation
-                )
+        if args.weights and args.test_only:
+            weights = torchvision.models.get_weight(args.weights)
+            preprocessing = weights.transforms()
+        else:
+            preprocessing = presets.ClassificationPresetEval(
+                crop_size=val_crop_size, resize_size=val_resize_size, interpolation=interpolation
+            )
...@@ -191,10 +180,6 @@ def load_data(traindir, valdir, args):

def main(args):
-    if args.prototype and prototype is None:
-        raise ImportError("The prototype module couldn't be found. Please install the latest torchvision nightly.")
-    if not args.prototype and args.weights:
-        raise ValueError("The weights parameter works only in prototype mode. Please pass the --prototype argument.")
    if args.output_dir:
        utils.mkdir(args.output_dir)
...@@ -236,10 +221,7 @@ def main(args):
    )

    print("Creating model")
-    if not args.prototype:
-        model = torchvision.models.__dict__[args.model](pretrained=args.pretrained, num_classes=num_classes)
-    else:
-        model = prototype.models.__dict__[args.model](weights=args.weights, num_classes=num_classes)
+    model = torchvision.models.__dict__[args.model](weights=args.weights, num_classes=num_classes)
    model.to(device)

    if args.distributed and args.sync_bn:
...@@ -446,12 +428,6 @@ def get_args_parser(add_help=True):
        help="Only test the model",
        action="store_true",
    )
-    parser.add_argument(
-        "--pretrained",
-        dest="pretrained",
-        help="Use pre-trained models from the modelzoo",
-        action="store_true",
-    )
    parser.add_argument("--auto-augment", default=None, type=str, help="auto augment policy (default: None)")
    parser.add_argument("--random-erase", default=0.0, type=float, help="random erasing probability (default: 0.0)")
...@@ -496,14 +472,6 @@ def get_args_parser(add_help=True):
    parser.add_argument(
        "--ra-reps", default=3, type=int, help="number of repetitions for Repeated Augmentation (default: 3)"
    )
-    # Prototype models only
-    parser.add_argument(
-        "--prototype",
-        dest="prototype",
-        help="Use prototype model builders instead those from main area",
-        action="store_true",
-    )
    parser.add_argument("--weights", default=None, type=str, help="the weights enum name to load")

    return parser
...
...@@ -12,17 +12,7 @@ from torch import nn
from train import train_one_epoch, evaluate, load_data

-try:
-    from torchvision import prototype
-except ImportError:
-    prototype = None
-

def main(args):
-    if args.prototype and prototype is None:
-        raise ImportError("The prototype module couldn't be found. Please install the latest torchvision nightly.")
-    if not args.prototype and args.weights:
-        raise ValueError("The weights parameter works only in prototype mode. Please pass the --prototype argument.")
    if args.output_dir:
        utils.mkdir(args.output_dir)
...@@ -56,10 +46,7 @@ def main(args):
    print("Creating model", args.model)
    # when training quantized models, we always start from a pre-trained fp32 reference model
-    if not args.prototype:
-        model = torchvision.models.quantization.__dict__[args.model](pretrained=True, quantize=args.test_only)
-    else:
-        model = prototype.models.quantization.__dict__[args.model](weights=args.weights, quantize=args.test_only)
+    model = torchvision.models.quantization.__dict__[args.model](weights=args.weights, quantize=args.test_only)
    model.to(device)

    if not (args.test_only or args.post_training_quantize):
...@@ -264,14 +251,6 @@ def get_args_parser(add_help=True):
        "--train-crop-size", default=224, type=int, help="the random crop size used for training (default: 224)"
    )
    parser.add_argument("--clip-grad-norm", default=None, type=float, help="the maximum gradient norm (default None)")
-    # Prototype models only
-    parser.add_argument(
-        "--prototype",
-        dest="prototype",
-        help="Use prototype model builders instead those from main area",
-        action="store_true",
-    )
    parser.add_argument("--weights", default=None, type=str, help="the weights enum name to load")

    return parser
...
...@@ -330,22 +330,22 @@ def store_model_weights(model, checkpoint_path, checkpoint_key="model", strict=T
        from torchvision import models as M

        # Classification
-        model = M.mobilenet_v3_large(pretrained=False)
+        model = M.mobilenet_v3_large(weights=None)
        print(store_model_weights(model, './class.pth'))

        # Quantized Classification
-        model = M.quantization.mobilenet_v3_large(pretrained=False, quantize=False)
+        model = M.quantization.mobilenet_v3_large(weights=None, quantize=False)
        model.fuse_model(is_qat=True)
        model.qconfig = torch.ao.quantization.get_default_qat_qconfig('qnnpack')
        _ = torch.ao.quantization.prepare_qat(model, inplace=True)
        print(store_model_weights(model, './qat.pth'))

        # Object Detection
-        model = M.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=False, pretrained_backbone=False)
+        model = M.detection.fasterrcnn_mobilenet_v3_large_fpn(weights=None, weights_backbone=None)
        print(store_model_weights(model, './obj.pth'))

        # Segmentation
-        model = M.segmentation.deeplabv3_mobilenet_v3_large(pretrained=False, pretrained_backbone=False, aux_loss=True)
+        model = M.segmentation.deeplabv3_mobilenet_v3_large(weights=None, weights_backbone=None, aux_loss=True)
        print(store_model_weights(model, './segm.pth', strict=False))

    Args:
...
...@@ -24,35 +24,35 @@ Except otherwise noted, all models have been trained on 8x V100 GPUs.
```
torchrun --nproc_per_node=8 train.py\
    --dataset coco --model fasterrcnn_resnet50_fpn --epochs 26\
-    --lr-steps 16 22 --aspect-ratio-group-factor 3
+    --lr-steps 16 22 --aspect-ratio-group-factor 3 --weights-backbone ResNet50_Weights.IMAGENET1K_V1
```

### Faster R-CNN MobileNetV3-Large FPN
```
torchrun --nproc_per_node=8 train.py\
    --dataset coco --model fasterrcnn_mobilenet_v3_large_fpn --epochs 26\
-    --lr-steps 16 22 --aspect-ratio-group-factor 3
+    --lr-steps 16 22 --aspect-ratio-group-factor 3 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
```

### Faster R-CNN MobileNetV3-Large 320 FPN
```
torchrun --nproc_per_node=8 train.py\
    --dataset coco --model fasterrcnn_mobilenet_v3_large_320_fpn --epochs 26\
-    --lr-steps 16 22 --aspect-ratio-group-factor 3
+    --lr-steps 16 22 --aspect-ratio-group-factor 3 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
```

### FCOS ResNet-50 FPN
```
torchrun --nproc_per_node=8 train.py\
    --dataset coco --model fcos_resnet50_fpn --epochs 26\
-    --lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.01 --amp
+    --lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.01 --amp --weights-backbone ResNet50_Weights.IMAGENET1K_V1
```

### RetinaNet
```
torchrun --nproc_per_node=8 train.py\
    --dataset coco --model retinanet_resnet50_fpn --epochs 26\
-    --lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.01
+    --lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.01 --weights-backbone ResNet50_Weights.IMAGENET1K_V1
```

### SSD300 VGG16
...@@ -60,7 +60,7 @@ torchrun --nproc_per_node=8 train.py\
torchrun --nproc_per_node=8 train.py\
    --dataset coco --model ssd300_vgg16 --epochs 120\
    --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4\
-    --weight-decay 0.0005 --data-augmentation ssd
+    --weight-decay 0.0005 --data-augmentation ssd --weights-backbone VGG16_Weights.IMAGENET1K_FEATURES
```

### SSDlite320 MobileNetV3-Large
...@@ -68,7 +68,7 @@ torchrun --nproc_per_node=8 train.py\
torchrun --nproc_per_node=8 train.py\
    --dataset coco --model ssdlite320_mobilenet_v3_large --epochs 660\
    --aspect-ratio-group-factor 3 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24\
-    --weight-decay 0.00004 --data-augmentation ssdlite
+    --weight-decay 0.00004 --data-augmentation ssdlite --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
```

...@@ -76,7 +76,7 @@ torchrun --nproc_per_node=8 train.py\
```
torchrun --nproc_per_node=8 train.py\
    --dataset coco --model maskrcnn_resnet50_fpn --epochs 26\
-    --lr-steps 16 22 --aspect-ratio-group-factor 3
+    --lr-steps 16 22 --aspect-ratio-group-factor 3 --weights-backbone ResNet50_Weights.IMAGENET1K_V1
```

...@@ -84,5 +84,5 @@ torchrun --nproc_per_node=8 train.py\
```
torchrun --nproc_per_node=8 train.py\
    --dataset coco_kp --model keypointrcnn_resnet50_fpn --epochs 46\
-    --lr-steps 36 43 --aspect-ratio-group-factor 3
+    --lr-steps 36 43 --aspect-ratio-group-factor 3 --weights-backbone ResNet50_Weights.IMAGENET1K_V1
```
...@@ -33,12 +33,6 @@ from engine import train_one_epoch, evaluate
from group_by_aspect_ratio import GroupedBatchSampler, create_aspect_ratio_groups

-try:
-    from torchvision import prototype
-except ImportError:
-    prototype = None
-

def get_dataset(name, image_set, transform, data_path):
    paths = {"coco": (data_path, get_coco, 91), "coco_kp": (data_path, get_coco_kp, 2)}
    p, ds_fn, num_classes = paths[name]
...@@ -49,15 +43,13 @@ def get_dataset(name, image_set, transform, data_path):

def get_transform(train, args):
    if train:
-        return presets.DetectionPresetTrain(args.data_augmentation)
-    elif not args.prototype:
-        return presets.DetectionPresetEval()
-    else:
-        if args.weights:
-            weights = prototype.models.get_weight(args.weights)
-            return weights.transforms()
-        else:
-            return prototype.transforms.ObjectDetectionEval()
+        return presets.DetectionPresetTrain(data_augmentation=args.data_augmentation)
+    elif args.weights and args.test_only:
+        weights = torchvision.models.get_weight(args.weights)
+        trans = weights.transforms()
+        return lambda img, target: (trans(img), target)
+    else:
+        return presets.DetectionPresetEval()


def get_args_parser(add_help=True):
...@@ -132,25 +124,12 @@ def get_args_parser(add_help=True):
        help="Only test the model",
        action="store_true",
    )
-    parser.add_argument(
-        "--pretrained",
-        dest="pretrained",
-        help="Use pre-trained models from the modelzoo",
-        action="store_true",
-    )

    # distributed training parameters
    parser.add_argument("--world-size", default=1, type=int, help="number of distributed processes")
    parser.add_argument("--dist-url", default="env://", type=str, help="url used to set up distributed training")
-    # Prototype models only
-    parser.add_argument(
-        "--prototype",
-        dest="prototype",
-        help="Use prototype model builders instead those from main area",
-        action="store_true",
-    )
    parser.add_argument("--weights", default=None, type=str, help="the weights enum name to load")
+    parser.add_argument("--weights-backbone", default=None, type=str, help="the backbone weights enum name to load")

    # Mixed precision training parameters
    parser.add_argument("--amp", action="store_true", help="Use torch.cuda.amp for mixed precision training")
...@@ -159,10 +138,6 @@ def get_args_parser(add_help=True):

def main(args):
-    if args.prototype and prototype is None:
-        raise ImportError("The prototype module couldn't be found. Please install the latest torchvision nightly.")
-    if not args.prototype and args.weights:
-        raise ValueError("The weights parameter works only in prototype mode. Please pass the --prototype argument.")
    if args.output_dir:
        utils.mkdir(args.output_dir)
...@@ -204,12 +179,9 @@ def main(args):
    if "rcnn" in args.model:
        if args.rpn_score_thresh is not None:
            kwargs["rpn_score_thresh"] = args.rpn_score_thresh
-    if not args.prototype:
-        model = torchvision.models.detection.__dict__[args.model](
-            pretrained=args.pretrained, num_classes=num_classes, **kwargs
-        )
-    else:
-        model = prototype.models.detection.__dict__[args.model](weights=args.weights, num_classes=num_classes, **kwargs)
+    model = torchvision.models.detection.__dict__[args.model](
+        weights=args.weights, weights_backbone=args.weights_backbone, num_classes=num_classes, **kwargs
+    )
    model.to(device)
    if args.distributed and args.sync_bn:
        model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
...
...@@ -51,7 +51,7 @@ torchrun --nproc_per_node 8 --nnodes 1 train.py \
### Evaluation

```
-torchrun --nproc_per_node 1 --nnodes 1 train.py --val-dataset sintel --batch-size 1 --dataset-root $dataset_root --model raft_large --pretrained
+torchrun --nproc_per_node 1 --nnodes 1 train.py --val-dataset sintel --batch-size 1 --dataset-root $dataset_root --model raft_large --weights Raft_Large_Weights.C_T_SKHT_V2
```

This should give an epe of about 1.3822 on the clean pass and 2.7161 on the
...@@ -67,6 +67,6 @@ Sintel val final epe: 2.7161 1px: 0.8528 3px: 0.9204 5px: 0.9392 per_image_epe:
You can also evaluate on Kitti train:

```
-torchrun --nproc_per_node 1 --nnodes 1 train.py --val-dataset kitti --batch-size 1 --dataset-root $dataset_root --model raft_large --pretrained
+torchrun --nproc_per_node 1 --nnodes 1 train.py --val-dataset kitti --batch-size 1 --dataset-root $dataset_root --model raft_large --weights Raft_Large_Weights.C_T_SKHT_V2
Kitti val epe: 4.7968 1px: 0.6388 3px: 0.8197 5px: 0.8661 per_image_epe: 4.5118 f1: 16.0679
```
...@@ -22,6 +22,7 @@ class OpticalFlowPresetEval(torch.nn.Module):
class OpticalFlowPresetTrain(torch.nn.Module):
    def __init__(
        self,
+        *,
        # RandomResizeAndCrop params
        crop_size,
        min_scale=-0.2,
...
...@@ -9,11 +9,6 @@ import utils
from presets import OpticalFlowPresetTrain, OpticalFlowPresetEval
from torchvision.datasets import KittiFlow, FlyingChairs, FlyingThings3D, Sintel, HD1K

-try:
-    from torchvision import prototype
-except ImportError:
-    prototype = None
-

def get_train_dataset(stage, dataset_root):
    if stage == "chairs":
...@@ -138,12 +133,18 @@ def _evaluate(model, args, val_dataset, *, padder_mode, num_flow_updates=None, b
def evaluate(model, args):
    val_datasets = args.val_dataset or []

-    if args.prototype:
-        if args.weights:
-            weights = prototype.models.get_weight(args.weights)
-            preprocessing = weights.transforms()
-        else:
-            preprocessing = prototype.transforms.OpticalFlowEval()
+    if args.weights and args.test_only:
+        weights = torchvision.models.get_weight(args.weights)
+        trans = weights.transforms()
+
+        def preprocessing(img1, img2, flow, valid_flow_mask):
+            img1, img2 = trans(img1, img2)
+            if flow is not None and not isinstance(flow, torch.Tensor):
+                flow = torch.from_numpy(flow)
+            if valid_flow_mask is not None and not isinstance(valid_flow_mask, torch.Tensor):
+                valid_flow_mask = torch.from_numpy(valid_flow_mask)
+            return img1, img2, flow, valid_flow_mask
    else:
        preprocessing = OpticalFlowPresetEval()
...@@ -201,20 +202,14 @@ def train_one_epoch(model, optimizer, scheduler, train_loader, logger, args):

def main(args):
-    if args.prototype and prototype is None:
-        raise ImportError("The prototype module couldn't be found. Please install the latest torchvision nightly.")
-    if not args.prototype and args.weights:
-        raise ValueError("The weights parameter works only in prototype mode. Please pass the --prototype argument.")
    utils.setup_ddp(args)
+    args.test_only = args.train_dataset is None

    if args.distributed and args.device == "cpu":
        raise ValueError("The device must be cuda if we want to run in distributed mode using torchrun")
    device = torch.device(args.device)

-    if args.prototype:
-        model = prototype.models.optical_flow.__dict__[args.model](weights=args.weights)
-    else:
-        model = torchvision.models.optical_flow.__dict__[args.model](pretrained=args.pretrained)
+    model = torchvision.models.optical_flow.__dict__[args.model](weights=args.weights)

    if args.distributed:
        model = model.to(args.local_rank)
...@@ -228,7 +223,7 @@ def main(args):
        checkpoint = torch.load(args.resume, map_location="cpu")
        model_without_ddp.load_state_dict(checkpoint["model"])

-    if args.train_dataset is None:
+    if args.test_only:
        # Set deterministic CUDNN algorithms, since they can affect epe a fair bit.
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
...@@ -356,8 +351,7 @@ def get_args_parser(add_help=True):
    parser.add_argument(
        "--model", type=str, default="raft_large", help="The name of the model to use - either raft_large or raft_small"
    )
-    # TODO: resume, pretrained, and weights should be in an exclusive arg group
-    parser.add_argument("--pretrained", action="store_true", help="Whether to use pretrained weights")
+    # TODO: resume and weights should be in an exclusive arg group
    parser.add_argument(
        "--num_flow_updates",
...@@ -376,13 +370,6 @@ def get_args_parser(add_help=True):
        required=True,
    )

-    # Prototype models only
-    parser.add_argument(
-        "--prototype",
-        dest="prototype",
-        help="Use prototype model builders instead those from main area",
-        action="store_true",
-    )
    parser.add_argument("--weights", default=None, type=str, help="the weights enum name to load.")
    parser.add_argument("--device", default="cuda", type=str, help="device (Use cuda or cpu, Default: cuda)")
...