Unverified commit 7dc5e5bd, authored by Philip Meier, committed by GitHub

Fix typos and grammar errors (#7065)

* fix typos throughout the code base

* fix grammar

* revert formatting changes to gallery

* revert 'an uXX'

* remove 'number of the best'
parent ed2a0adb
@@ -69,7 +69,7 @@ If you plan to modify the code or documentation, please follow the steps below:
 For more details about pull requests,
 please read [GitHub's guides](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request).
-If you would like to contribute a new model, please see [here](#New-model).
+If you would like to contribute a new model, please see [here](#New-architecture-or-improved-model-weights).
 If you would like to contribute a new dataset, please see [here](#New-dataset).

@@ -198,7 +198,7 @@ it in an issue as, most likely, it will not be accepted.
 ### Pull Request
 If all previous checks (flake8, mypy, unit tests) are passing, please send a PR. Submitted PR will pass other tests on
-different operation systems, python versions and hardwares.
+different operating systems, python versions and hardware.
 For more details about pull requests workflow,
 please read [GitHub's guides](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request).
@@ -20,13 +20,13 @@ So, before starting any work and submitting a PR there are a few critical things
 ### 1. Preparation work
-- Start by looking into this [issue](https://github.com/pytorch/vision/issues/2707) in order to have an idea of the models that are being considered, express your willingness to add a new model and discuss with the community whether or not this model should be included in TorchVision. It is very important at this stage to make sure that there is an agreement on the value of having this model in TorchVision and there is no one else already working on it.
+- Start by looking into this [issue](https://github.com/pytorch/vision/issues/2707) in order to have an idea of the models that are being considered, express your willingness to add a new model and discuss with the community whether this model should be included in TorchVision. It is very important at this stage to make sure that there is an agreement on the value of having this model in TorchVision and there is no one else already working on it.
 - If the decision is to include the new model, then please create a new ticket which will be used for all design and implementation discussions prior to the PR. One of the TorchVision maintainers will reach out at this stage and this will be your POC from this point onwards in order to provide support, guidance and regular feedback.
 ### 2. Implement the model
-Please take a look at existing models in TorchVision to get familiar with the idioms. Also please look at recent contributions for new models. If in doubt about any design decisions you can ask for feedback on the issue created in step 1. Example of things to take into account:
+Please take a look at existing models in TorchVision to get familiar with the idioms. Also, please look at recent contributions for new models. If in doubt about any design decisions you can ask for feedback on the issue created in step 1. Example of things to take into account:
 - The implementation should be as close as possible to the canonical implementation/paper
 - The PR must include the code implementation, documentation and tests

@@ -34,7 +34,7 @@ Please take a look at existing models in TorchVision to get familiar with the id
 - The weights need to reproduce closely the results of the paper in terms of accuracy, even though the final weights to be deployed will be those trained by the TorchVision maintainers
 - The PR description should include commands/configuration used to train the model, so that the TorchVision maintainers can easily run them to verify the implementation and generate the final model to be released
 - Make sure we re-use existing components as much as possible (inheritance)
-- New primitives (transforms, losses, etc) can be added if necessary, but the final location will be determined after discussion with the dedicated maintainer
+- New primitives (transforms, losses, etc.) can be added if necessary, but the final location will be determined after discussion with the dedicated maintainer
 - Please take a look at the detailed [implementation and documentation guidelines](https://github.com/pytorch/vision/issues/5319) for a fine grain list of things not to be missed
 ### 3. Train the model with reference scripts
@@ -331,7 +331,7 @@ def inject_weight_metadata(app, what, name, obj, options, lines):
     ]
     if obj.__doc__ != "An enumeration.":
-        # We only show the custom enum doc if it was overriden. The default one from Python is "An enumeration"
+        # We only show the custom enum doc if it was overridden. The default one from Python is "An enumeration"
         lines.append("")
         lines.append(obj.__doc__)
@@ -14,7 +14,7 @@ and is based on `One weird trick for parallelizing convolutional neural networks
 Model builders
 --------------
-The following model builders can be used to instanciate an AlexNet model, with or
+The following model builders can be used to instantiate an AlexNet model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.alexnet.AlexNet`` base class. Please refer to the `source
 code
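As a side note for readers of this hunk: the "model builder" pattern described above is the multi-weight API from recent torchvision releases. A minimal sketch, not part of the diff, of instantiating AlexNet with and without pre-trained weights:

```python
import torch
from torchvision.models import alexnet, AlexNet_Weights

# Build AlexNet with pre-trained ImageNet weights (downloaded on first use).
model = alexnet(weights=AlexNet_Weights.IMAGENET1K_V1)
model.eval()

# Passing weights=None gives a randomly initialized model instead.
scratch = alexnet(weights=None)

# Dummy forward pass on a single 224x224 RGB image.
with torch.no_grad():
    logits = model(torch.rand(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```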
@@ -10,7 +10,7 @@ paper.
 Model builders
 --------------
-The following model builders can be used to instanciate an EfficientNet model, with or
+The following model builders can be used to instantiate an EfficientNet model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.efficientnet.EfficientNet`` base class. Please refer to the `source
 code

@@ -10,7 +10,7 @@ paper.
 Model builders
 --------------
-The following model builders can be used to instanciate an EfficientNetV2 model, with or
+The following model builders can be used to instantiate an EfficientNetV2 model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.efficientnet.EfficientNet`` base class. Please refer to the `source
 code

@@ -10,7 +10,7 @@ paper.
 Model builders
 --------------
-The following model builders can be used to instanciate a GoogLeNet model, with or
+The following model builders can be used to instantiate a GoogLeNet model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.googlenet.GoogLeNet`` base class. Please refer to the `source
 code
@@ -10,7 +10,7 @@ paper.
 Model builders
 --------------
-The following model builders can be used to instanciate a quantized GoogLeNet
+The following model builders can be used to instantiate a quantized GoogLeNet
 model, with or without pre-trained weights. All the model builders internally
 rely on the ``torchvision.models.quantization.googlenet.QuantizableGoogLeNet``
 base class. Please refer to the `source code
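The quantized builders referenced here follow the same pattern as the float ones but add a `quantize` flag. A hedged sketch (the exact weight enum name is from memory and worth double-checking against the docs):

```python
import torch
from torchvision.models.quantization import googlenet, GoogLeNet_QuantizedWeights

# quantize=True returns an int8 model intended for CPU inference;
# quantize=False returns the float QuantizableGoogLeNet instead.
model = googlenet(weights=GoogLeNet_QuantizedWeights.IMAGENET1K_FBGEMM_V1, quantize=True)
model.eval()

with torch.no_grad():
    logits = model(torch.rand(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```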
@@ -10,7 +10,7 @@ Computer Vision <https://arxiv.org/abs/1512.00567>`__ paper.
 Model builders
 --------------
-The following model builders can be used to instanciate an InceptionV3 model, with or
+The following model builders can be used to instantiate an InceptionV3 model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.inception.Inception3`` base class. Please refer to the `source
 code <https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py>`_ for

@@ -10,7 +10,7 @@ Computer Vision <https://arxiv.org/abs/1512.00567>`__ paper.
 Model builders
 --------------
-The following model builders can be used to instanciate a quantized Inception
+The following model builders can be used to instantiate a quantized Inception
 model, with or without pre-trained weights. All the model builders internally
 rely on the ``torchvision.models.quantization.inception.QuantizableInception3``
 base class. Please refer to the `source code

@@ -11,7 +11,7 @@ Search for Mobile <https://arxiv.org/pdf/1807.11626.pdf>`__ paper.
 Model builders
 --------------
-The following model builders can be used to instanciate an MNASNet model.
+The following model builders can be used to instantiate an MNASNet model.
 All the model builders internally rely on the
 ``torchvision.models.mnasnet.MNASNet`` base class. Please refer to the `source
 code
@@ -12,7 +12,7 @@ The SSD model is based on the `SSD: Single Shot MultiBox Detector
 Model builders
 --------------
-The following model builders can be used to instanciate a SSD model, with or
+The following model builders can be used to instantiate a SSD model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.detection.SSD`` base class. Please refer to the `source
 code
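Detection builders such as the SSD ones differ from the classification builders in their input/output convention. A minimal sketch, assuming the `ssd300_vgg16` builder and its `SSD300_VGG16_Weights` enum:

```python
import torch
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights

model = ssd300_vgg16(weights=SSD300_VGG16_Weights.DEFAULT)
model.eval()

# Detection models take a list of 3-channel float images (possibly of different
# sizes) and return one dict per image with 'boxes', 'labels' and 'scores'.
images = [torch.rand(3, 480, 640)]
with torch.no_grad():
    predictions = model(images)
print(predictions[0]["boxes"].shape, predictions[0]["scores"].shape)
```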
@@ -4,7 +4,7 @@ Utils
 =====
 The ``torchvision.utils`` module contains various utilities, mostly :ref:`for
-vizualization <sphx_glr_auto_examples_plot_visualization_utils.py>`.
+visualization <sphx_glr_auto_examples_plot_visualization_utils.py>`.
 .. currentmodule:: torchvision.utils
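For context on what `torchvision.utils` provides, a small hedged example of the drawing utilities (the image, box and output filename below are purely illustrative):

```python
import torch
from torchvision.utils import draw_bounding_boxes, save_image

# draw_bounding_boxes expects a uint8 image of shape (C, H, W) and boxes
# given as (xmin, ymin, xmax, ymax).
image = torch.zeros((3, 240, 320), dtype=torch.uint8)
boxes = torch.tensor([[40.0, 60.0, 200.0, 180.0]])
annotated = draw_bounding_boxes(image, boxes, labels=["example"], colors="red", width=3)

# save_image works on float tensors in [0, 1].
save_image(annotated.float() / 255, "annotated.png")
```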
@@ -10,7 +10,7 @@ them using JIT compilation.
 Prior to v0.8.0, transforms in torchvision have traditionally been PIL-centric
 and presented multiple limitations due to that. Now, since v0.8.0, transforms
-implementations are Tensor and PIL compatible and we can achieve the following
+implementations are Tensor and PIL compatible, and we can achieve the following
 new features:
 - transform multi-band torch tensor images (with more than 3-4 channels)
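To make the "multi-band tensor" point above concrete, a minimal sketch of running the tensor-compatible transforms on a 6-channel image (no PIL involved):

```python
import torch
import torchvision.transforms as T

# A 6-band image stored as a plain channels-first tensor.
multiband = torch.rand(6, 128, 128)

# Geometric transforms such as Resize and CenterCrop work for any channel count.
pipeline = T.Compose([
    T.Resize(64),
    T.CenterCrop(56),
])
out = pipeline(multiband)
print(out.shape)  # torch.Size([6, 56, 56])
```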
@@ -188,7 +188,7 @@ show(dogs_with_masks)
 # We can plot more than one mask per image! Remember that the model returned as
 # many masks as there are classes. Let's ask the same query as above, but this
 # time for *all* classes, not just the dog class: "For each pixel and each class
-# C, is class C the most most likely class?"
+# C, is class C the most likely class?"
 #
 # This one is a bit more involved, so we'll first show how to do it with a
 # single image, and then we'll generalize to the batch
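The query in the comment above boils down to an argmax over the class dimension, compared against every class index. A sketch assuming `normalized_masks` has shape `(batch, num_classes, H, W)` as in the surrounding tutorial (the shapes below are made up):

```python
import torch

# Stand-in for the tutorial's normalized_masks: softmax scores per class and pixel.
normalized_masks = torch.rand(2, 21, 160, 240).softmax(dim=1)
num_classes = normalized_masks.shape[1]

# For each image, class and pixel: True where that class is the argmax.
class_dim = 1
all_classes_masks = (
    normalized_masks.argmax(class_dim) == torch.arange(num_classes)[:, None, None, None]
)
# The broadcast yields (num_classes, batch, H, W); swap to (batch, num_classes, H, W).
all_classes_masks = all_classes_masks.swapaxes(0, 1)
print(all_classes_masks.shape)  # torch.Size([2, 21, 160, 240])
```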
@@ -317,7 +317,7 @@ show(draw_segmentation_masks(dog1_int, dog1_bool_masks, alpha=0.9))
 #####################################
 # The model seems to have properly detected the dog, but it also confused trees
-# with people. Looking more closely at the scores will help us plotting more
+# with people. Looking more closely at the scores will help us plot more
 # relevant masks:
 print(dog1_output['scores'])

@@ -343,7 +343,7 @@ show(dogs_with_masks)
 #####################################
 # The two 'people' masks in the first image where not selected because they have
-# a lower score than the score threshold. Similarly in the second image, the
+# a lower score than the score threshold. Similarly, in the second image, the
 # instance with class 15 (which corresponds to 'bench') was not selected.
 #####################################
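The score-based filtering discussed in these two hunks amounts to a boolean mask over instances followed by binarizing their soft masks. A hedged sketch with made-up tensors standing in for the tutorial's model output:

```python
import torch
from torchvision.utils import draw_segmentation_masks

image_int = torch.zeros((3, 320, 480), dtype=torch.uint8)  # placeholder uint8 image
output = {
    "masks": torch.rand(4, 1, 320, 480),                   # soft instance masks in [0, 1]
    "scores": torch.tensor([0.98, 0.91, 0.45, 0.12]),
}

score_threshold = 0.75
proba_threshold = 0.5

# Keep only confident instances, then binarize their soft masks.
keep = output["scores"] > score_threshold
bool_masks = output["masks"][keep] > proba_threshold

# draw_segmentation_masks expects boolean masks of shape (num_masks, H, W).
result = draw_segmentation_masks(image_int, bool_masks.squeeze(1), alpha=0.9)
print(result.shape)  # torch.Size([3, 320, 480])
```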
@@ -298,7 +298,7 @@ Here `$MODEL` is one of `googlenet`, `inception_v3`, `resnet18`, `resnet50`, `re
 ### Quantized ShuffleNet V2
-Here are commands that we use to quantized the `shufflenet_v2_x1_5` and `shufflenet_v2_x2_0` models.
+Here are commands that we use to quantize the `shufflenet_v2_x1_5` and `shufflenet_v2_x2_0` models.
 ```
 # For shufflenet_v2_x1_5
 python train_quantization.py --device='cpu' --post-training-quantize --backend='fbgemm' \
@@ -314,11 +314,11 @@ def main(args):
     model_ema = None
     if args.model_ema:
-        # Decay adjustment that aims to keep the decay independent from other hyper-parameters originally proposed at:
+        # Decay adjustment that aims to keep the decay independent of other hyper-parameters originally proposed at:
         # https://github.com/facebookresearch/pycls/blob/f8cd9627/pycls/core/net.py#L123
         #
         # total_ema_updates = (Dataset_size / n_GPUs) * epochs / (batch_size_per_gpu * EMA_steps)
-        # We consider constant = Dataset_size for a given dataset/setup and ommit it. Thus:
+        # We consider constant = Dataset_size for a given dataset/setup and omit it. Thus:
         # adjust = 1 / total_ema_updates ~= n_GPUs * batch_size_per_gpu * EMA_steps / epochs
         adjust = args.world_size * args.batch_size * args.model_ema_steps / args.epochs
         alpha = 1.0 - args.model_ema_decay
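The comment block in this hunk is easier to follow with numbers plugged in. A worked sketch with made-up hyper-parameters (not the reference defaults):

```python
# Hypothetical values, chosen only to illustrate the arithmetic.
world_size = 8           # n_GPUs
batch_size = 32          # batch size per GPU
model_ema_steps = 32     # EMA update every 32 optimizer steps
epochs = 600
model_ema_decay = 0.99998

# adjust = 1 / total_ema_updates ~= n_GPUs * batch_size_per_gpu * EMA_steps / epochs
adjust = world_size * batch_size * model_ema_steps / epochs  # ~13.65
alpha = 1.0 - model_ema_decay                                # 2e-05

# From memory, the surrounding code then clamps the product at 1.0 and uses it
# as the effective per-update EMA weight: decay = 1 - min(1.0, alpha * adjust).
decay = 1.0 - min(1.0, alpha * adjust)                       # ~0.99973 here
print(adjust, alpha, decay)
```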
@@ -365,12 +365,12 @@ def store_model_weights(model, checkpoint_path, checkpoint_key="model", strict=T
     checkpoint_path = os.path.abspath(checkpoint_path)
     output_dir = os.path.dirname(checkpoint_path)
-    # Deep copy to avoid side-effects on the model object.
+    # Deep copy to avoid side effects on the model object.
     model = copy.deepcopy(model)
     checkpoint = torch.load(checkpoint_path, map_location="cpu")
     # Load the weights to the model to validate that everything works
-    # and remove unnecessary weights (such as auxiliaries, etc)
+    # and remove unnecessary weights (such as auxiliaries, etc.)
     if checkpoint_key == "model_ema":
         del checkpoint[checkpoint_key]["n_averaged"]
         torch.nn.modules.utils.consume_prefix_in_state_dict_if_present(checkpoint[checkpoint_key], "module.")
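A hedged usage sketch for `store_model_weights`, based only on the signature visible in the hunk header; the import path and checkpoint filename are assumptions, since this page does not show file names:

```python
import torchvision
from utils import store_model_weights  # assumed module name for this helper

# Rebuild the architecture that produced the checkpoint; weights stay random here.
model = torchvision.models.resnet50(weights=None)

# Extract clean, publishable weights from a hypothetical training checkpoint.
# For EMA-trained checkpoints, checkpoint_key="model_ema" selects the averaged
# weights, which is the branch the diff above touches.
store_model_weights(model, "./checkpoint.pth", checkpoint_key="model_ema")
```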
@@ -12,8 +12,8 @@ A ratio of **88-6-6** was used in order to train a baseline weight set. We provi
 Both used 8 A100 GPUs and a batch size of 2 (so effective batch size is 16). The
 rest of the hyper-parameters loosely follow the recipe from https://github.com/megvii-research/CREStereo.
 The original recipe trains for **300000** updates (or steps) on the dataset mixture. We modify the learning rate
-schedule to one that starts decaying the weight much sooner. Throughout experiments we found that this reduces overfitting
-during evaluation time and gradient clip help stabilize the loss during a pre-mature learning rate change.
+schedule to one that starts decaying the weight much sooner. Throughout the experiments we found that this reduces
+overfitting during evaluation time and gradient clip help stabilize the loss during a pre-mature learning rate change.
 ```
 torchrun --nproc_per_node 8 --nnodes 1 train.py \

@@ -31,7 +31,7 @@ torchrun --nproc_per_node 8 --nnodes 1 train.py \
 --clip-grad-norm 1.0 \
 ```
-We employ a multi-set fine-tuning stage where we uniformly sample from multiple datasets. Given hat some of these datasets have extremely large images (``2048x2048`` or more) we opt for a very aggresive scale-range ``[0.2 - 0.8]`` such that as much of the original frame composition is captured inside the ``384x512`` crop.
+We employ a multi-set fine-tuning stage where we uniformly sample from multiple datasets. Given hat some of these datasets have extremely large images (``2048x2048`` or more) we opt for a very aggressive scale-range ``[0.2 - 0.8]`` such that as much of the original frame composition is captured inside the ``384x512`` crop.
 ```
 torchrun --nproc_per_node 8 --nnodes 1 train.py \

@@ -59,7 +59,7 @@ Evaluating the base weights
 torchrun --nproc_per_node 1 --nnodes 1 cascade_evaluation.py --dataset middlebury2014-train --batch-size 1 --dataset-root $dataset_root --model crestereo_base --weights CREStereo_Base_Weights.CRESTEREO_ETH_MBL_V1
 ```
-This should give an **mae of about 1.416** on the train set of `Middlebury2014`. Results may vary slightly depending on the batch size and the number of GPUs. For the most accurate resuts use 1 GPU and `--batch-size 1`. The created log file should look like this, where the first key is the number of cascades and the nested key is the number of recursive iterations:
+This should give an **mae of about 1.416** on the train set of `Middlebury2014`. Results may vary slightly depending on the batch size and the number of GPUs. For the most accurate results use 1 GPU and `--batch-size 1`. The created log file should look like this, where the first key is the number of cascades and the nested key is the number of recursive iterations:
 ```
 Dataset: middlebury2014-train @size: [384, 512]:

@@ -135,7 +135,7 @@ Dataset: middlebury2014-train @size: [384, 512]:
 # Concerns when training
-We encourage users to be aware of the **aspect-ratio** and **disparity scale** they are targetting when doing any sort of training or fine-tuning. The model is highly sensitive to these two factors, as a consequence with naive multi-set fine-tuning one can achieve `0.2 mae` relatively fast. We recommend that users pay close attention to how they **balance dataset sizing** when training such networks.
+We encourage users to be aware of the **aspect-ratio** and **disparity scale** they are targeting when doing any sort of training or fine-tuning. The model is highly sensitive to these two factors, as a consequence of naive multi-set fine-tuning one can achieve `0.2 mae` relatively fast. We recommend that users pay close attention to how they **balance dataset sizing** when training such networks.
 Ideally, dataset scaling should be trated at an individual level and a thorough **EDA** of the disparity distribution in random crops at the desired training / inference size should be performed prior to any large compute investments.

@@ -146,14 +146,14 @@ We encourage users to be aware of the **aspect-ratio** and **disparity scale** t
 ![Disparity1](assets/disparity-domain-drift.jpg)
-From left to right (`left_image`, `right_image`, `valid_mask`, `valid_mask & ground_truth`, `prediction`). **Darker is further away, lighter is closer**. In the case of `Sintel` which is more closely aligned to the original distribution of `CREStereo` we notice that the model accurately predicts the background scale whereas in the case of `Middlebury2014` it cannot correcly estimate the continous disparity. Notice that the frame composition is similar for both examples. The blue skybox in the `Sintel` scene behaves similarly to the `Middlebury` black background. However, because the `Middlebury` samples comes from an extremly large scene the crop size of `384x512` does not correctly capture the general training distribution.
+From left to right (`left_image`, `right_image`, `valid_mask`, `valid_mask & ground_truth`, `prediction`). **Darker is further away, lighter is closer**. In the case of `Sintel` which is more closely aligned to the original distribution of `CREStereo` we notice that the model accurately predicts the background scale whereas in the case of `Middlebury2014` it cannot correctly estimate the continuous disparity. Notice that the frame composition is similar for both examples. The blue skybox in the `Sintel` scene behaves similarly to the `Middlebury` black background. However, because the `Middlebury` samples comes from an extremely large scene the crop size of `384x512` does not correctly capture the general training distribution.
 ##### Sample B
-The top row contains a scene from `Sceneflow` using the `Monkaa` split whilst the bottom row is a scene from `Middlebury`. This sample exhibits the same issues when it comes to **background estimation**. Given the exagerated size of the `Middlebury` samples the model **colapses the smooth background** of the sample to what it considers to be a mean background disparity value.
+The top row contains a scene from `Sceneflow` using the `Monkaa` split whilst the bottom row is a scene from `Middlebury`. This sample exhibits the same issues when it comes to **background estimation**. Given the exaggerated size of the `Middlebury` samples the model **colapses the smooth background** of the sample to what it considers to be a mean background disparity value.
 ![Disparity2](assets/disparity-background-mode-collapse.jpg)
@@ -9,7 +9,7 @@ from torch.nn import functional as F
 from train import make_eval_loader
 from utils.metrics import AVAILABLE_METRICS
-from vizualization import make_prediction_image_side_to_side
+from visualization import make_prediction_image_side_to_side
 def get_args_parser(add_help=True):

@@ -113,7 +113,7 @@ def _evaluate(
     *,
     padder_mode,
     print_freq=10,
-    writter=None,
+    writer=None,
     step=None,
     iterations=10,
     cascades=1,

@@ -180,10 +180,10 @@ def _evaluate(
             "the dataset is not divisible by the batch size. Try lowering the batch size for more accurate results."
         )
-    if writter is not None and args.rank == 0:
+    if writer is not None and args.rank == 0:
         for meter_name, meter_value in logger.meters.items():
             scalar_name = f"{meter_name} {header}"
-            writter.add_scalar(scalar_name, meter_value.avg, step)
+            writer.add_scalar(scalar_name, meter_value.avg, step)
     logger.synchronize_between_processes()
     print(header, logger)

@@ -192,7 +192,7 @@ def _evaluate(
     return logger_metrics
-def evaluate(model, loader, args, writter=None, step=None):
+def evaluate(model, loader, args, writer=None, step=None):
     os.makedirs(args.img_folder, exist_ok=True)
     checkpoint_name = os.path.basename(args.checkpoint) or args.weights
     image_checkpoint_folder = os.path.join(args.img_folder, checkpoint_name)

@@ -215,7 +215,7 @@ def evaluate(model, loader, args, writter=None, step=None):
         padder_mode=args.padder_type,
         header=f"{args.dataset} evaluation@ size:{args.eval_size} n_cascades:{n_cascades} n_iters:{n_iters}",
         batch_size=args.batch_size,
-        writter=writter,
+        writer=writer,
         step=step,
         iterations=n_iters,
         cascades=n_cascades,

@@ -271,7 +271,7 @@ def load_checkpoint(args):
     model = torchvision.prototype.models.depth.stereo.__dict__[args.model](weights=None)
     model.load_state_dict(checkpoint)
-    # set the appropiate devices
+    # set the appropriate devices
     if args.distributed and args.device == "cpu":
         raise ValueError("The device must be cuda if we want to run in distributed mode using torchrun")
     device = torch.device(args.device)