"...unconditional_image_generation/train_unconditional.py" did not exist on "848c86ca0a2e321cda294c3cd3e8a3572281c2f3"
Commit b634945d authored by limm

support v0.6

parent 5b3792fc
# Copyright (c) Facebook, Inc. and its affiliates.
# This file defines a container that compiles the C++ examples of detectron2.
# See docker/README.md for usage.
# Depends on the image produced by "./Dockerfile"
FROM detectron2:v0
USER appuser
ENV HOME=/home/appuser
WORKDIR $HOME
# Let torchvision find libtorch
ENV CMAKE_PREFIX_PATH=$HOME/.local/lib/python3.6/site-packages/torch/
RUN sudo apt-get update && sudo apt-get install libgflags-dev libgoogle-glog-dev libopencv-dev --yes
RUN pip install mkl-include
# Install the correct version of protobuf (find it at torch/caffe2/proto/caffe2.pb.h after installing pytorch):
RUN wget https://github.com/protocolbuffers/protobuf/releases/download/v3.13.0/protobuf-cpp-3.13.0.tar.gz && tar xf protobuf-cpp-3.13.0.tar.gz
RUN export CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=$(python3 -c 'import torch; print(int(torch.compiled_with_cxx11_abi()))'); \
cd protobuf-3.13.0 && \
./configure --prefix=$HOME/.local && make -j && make install
# install libtorchvision
RUN git clone --branch v0.11.1 https://github.com/pytorch/vision/
RUN mkdir vision/build && cd vision/build && \
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/.local -DCMAKE_BUILD_TYPE=Release -DWITH_CUDA=on -DTORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST && \
make -j && make install
# make our installation take effect
ENV CPATH=$HOME/.local/include \
LIBRARY_PATH=$HOME/.local/lib \
LD_LIBRARY_PATH=$HOME/.local/lib
# build C++ examples of detectron2
RUN cd detectron2_repo/tools/deploy && mkdir build && cd build && \
cmake -DTORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST .. && make
# binaries will be available under tools/deploy/build
version: "2.3"
services:
detectron2:
build:
context: .
dockerfile: Dockerfile
args:
USER_ID: ${USER_ID:-1000}
deploy:
resources:
reservations:
devices:
- capabilities:
- gpu
shm_size: "8gb"
ulimits:
memlock: -1
stack: 67108864
volumes:
- /tmp/.X11-unix:/tmp/.X11-unix:ro
environment:
- DISPLAY=$DISPLAY
- NVIDIA_VISIBLE_DEVICES=all
# Uncomment with proper source to access webcam from docker
# devices:
# - /dev/video0:/dev/video0
# DeepLab in Detectron2
In this repository, we implement DeepLabV3 and DeepLabV3+ in Detectron2.
## Installation
Install Detectron2 following [the instructions](https://detectron2.readthedocs.io/tutorials/install.html).
## Training
To train a model with 8 GPUs, run:
```bash
cd /path/to/detectron2/projects/DeepLab
python train_net.py --config-file configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml --num-gpus 8
```
## Evaluation
Model evaluation can be done similarly:
```bash
cd /path/to/detectron2/projects/DeepLab
python train_net.py --config-file configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint
```
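For quick single-image inference outside `train_net.py`, the sketch below uses detectron2's `DefaultPredictor` together with `add_deeplab_config` (as done in `train_net.py`). It assumes a GPU is available, the checkpoint and image paths are placeholders, and the `"sem_seg"` output key follows detectron2's `SemanticSegmentor`:
```python
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.deeplab import add_deeplab_config

cfg = get_cfg()
add_deeplab_config(cfg)
cfg.merge_from_file(
    "configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml"
)
cfg.MODEL.WEIGHTS = "/path/to/model_checkpoint"  # placeholder checkpoint path

predictor = DefaultPredictor(cfg)         # handles resizing and RGB/BGR conversion
image = cv2.imread("/path/to/image.png")  # BGR image, as DefaultPredictor expects
sem_seg = predictor(image)["sem_seg"]     # (num_classes, H, W) logits
labels = sem_seg.argmax(dim=0)            # per-pixel Cityscapes class ids
```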
## Cityscapes Semantic Segmentation
Cityscapes models are trained with ImageNet pretraining.
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Method</th>
<th valign="bottom">Backbone</th>
<th valign="bottom">Output<br/>resolution</th>
<th valign="bottom">mIoU</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<tr><td align="left">DeepLabV3</td>
<td align="center">R101-DC5</td>
<td align="center">1024&times;2048</td>
<td align="center"> 76.7 </td>
<td align="center"> - </td>
<td align="center"> - &nbsp;|&nbsp; - </td>
</tr>
<tr><td align="left"><a href="configs/Cityscapes-SemanticSegmentation/deeplab_v3_R_103_os16_mg124_poly_90k_bs16.yaml">DeepLabV3</a></td>
<td align="center">R103-DC5</td>
<td align="center">1024&times;2048</td>
<td align="center"> 78.5 </td>
<td align="center"> 28041665 </td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/DeepLab/Cityscapes-SemanticSegmentation/deeplab_v3_R_103_os16_mg124_poly_90k_bs16/28041665/model_final_0dff1b.pkl
">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/DeepLab/Cityscapes-SemanticSegmentation/deeplab_v3_R_103_os16_mg124_poly_90k_bs16/28041665/metrics.json
">metrics</a></td>
</tr>
<tr><td align="left">DeepLabV3+</td>
<td align="center">R101-DC5</td>
<td align="center">1024&times;2048</td>
<td align="center"> 78.1 </td>
<td align="center"> - </td>
<td align="center"> - &nbsp;|&nbsp; - </td>
</tr>
<tr><td align="left"><a href="configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml">DeepLabV3+</a></td>
<td align="center">R103-DC5</td>
<td align="center">1024&times;2048</td>
<td align="center"> 80.0 </td>
<td align="center">28054032</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/DeepLab/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16/28054032/model_final_a8a355.pkl
">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/DeepLab/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16/28054032/metrics.json
">metrics</a></td>
</tr>
</tbody></table>
Note:
- [R103](https://dl.fbaipublicfiles.com/detectron2/DeepLab/R-103.pkl): a ResNet-101 with its first 7x7 convolution replaced by three 3x3 convolutions.
This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of [pytorch examples](https://github.com/pytorch/examples/tree/master/imagenet).
- DC5 means using dilated convolution in `res5`.
## <a name="CitingDeepLab"></a>Citing DeepLab
If you use DeepLab, please use the following BibTeX entries.
* DeepLabv3+:
```
@inproceedings{deeplabv3plus2018,
title={Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation},
author={Liang-Chieh Chen and Yukun Zhu and George Papandreou and Florian Schroff and Hartwig Adam},
booktitle={ECCV},
year={2018}
}
```
* DeepLabv3:
```
@article{deeplabv32018,
title={Rethinking atrous convolution for semantic image segmentation},
author={Chen, Liang-Chieh and Papandreou, George and Schroff, Florian and Adam, Hartwig},
journal={arXiv:1706.05587},
year={2017}
}
```
_BASE_: "../../../../configs/Base-RCNN-DilatedC5.yaml"
MODEL:
META_ARCHITECTURE: "SemanticSegmentor"
BACKBONE:
FREEZE_AT: 0
SEM_SEG_HEAD:
NAME: "DeepLabV3Head"
IN_FEATURES: ["res5"]
ASPP_CHANNELS: 256
ASPP_DILATIONS: [6, 12, 18]
ASPP_DROPOUT: 0.1
CONVS_DIM: 256
COMMON_STRIDE: 16
NUM_CLASSES: 19
LOSS_TYPE: "hard_pixel_mining"
DATASETS:
TRAIN: ("cityscapes_fine_sem_seg_train",)
TEST: ("cityscapes_fine_sem_seg_val",)
SOLVER:
BASE_LR: 0.01
MAX_ITER: 90000
LR_SCHEDULER_NAME: "WarmupPolyLR"
IMS_PER_BATCH: 16
INPUT:
MIN_SIZE_TRAIN: (512, 768, 1024, 1280, 1536, 1792, 2048)
MIN_SIZE_TRAIN_SAMPLING: "choice"
MIN_SIZE_TEST: 1024
MAX_SIZE_TRAIN: 4096
MAX_SIZE_TEST: 2048
CROP:
ENABLED: True
TYPE: "absolute"
SIZE: (512, 1024)
SINGLE_CATEGORY_MAX_AREA: 1.0
DATALOADER:
NUM_WORKERS: 10
_BASE_: Base-DeepLabV3-OS16-Semantic.yaml
MODEL:
WEIGHTS: "detectron2://DeepLab/R-103.pkl"
PIXEL_MEAN: [123.675, 116.280, 103.530]
PIXEL_STD: [58.395, 57.120, 57.375]
BACKBONE:
NAME: "build_resnet_deeplab_backbone"
RESNETS:
DEPTH: 101
NORM: "SyncBN"
RES5_MULTI_GRID: [1, 2, 4]
STEM_TYPE: "deeplab"
STEM_OUT_CHANNELS: 128
STRIDE_IN_1X1: False
SEM_SEG_HEAD:
NAME: "DeepLabV3Head"
NORM: "SyncBN"
INPUT:
FORMAT: "RGB"
_BASE_: Base-DeepLabV3-OS16-Semantic.yaml
MODEL:
WEIGHTS: "detectron2://DeepLab/R-103.pkl"
PIXEL_MEAN: [123.675, 116.280, 103.530]
PIXEL_STD: [58.395, 57.120, 57.375]
BACKBONE:
NAME: "build_resnet_deeplab_backbone"
RESNETS:
DEPTH: 101
NORM: "SyncBN"
OUT_FEATURES: ["res2", "res5"]
RES5_MULTI_GRID: [1, 2, 4]
STEM_TYPE: "deeplab"
STEM_OUT_CHANNELS: 128
STRIDE_IN_1X1: False
SEM_SEG_HEAD:
NAME: "DeepLabV3PlusHead"
IN_FEATURES: ["res2", "res5"]
PROJECT_FEATURES: ["res2"]
PROJECT_CHANNELS: [48]
NORM: "SyncBN"
COMMON_STRIDE: 4
INPUT:
FORMAT: "RGB"
# Copyright (c) Facebook, Inc. and its affiliates.
from .build_solver import build_lr_scheduler
from .config import add_deeplab_config
from .resnet import build_resnet_deeplab_backbone
from .semantic_seg import DeepLabV3Head, DeepLabV3PlusHead
# Copyright (c) Facebook, Inc. and its affiliates.
import torch
from detectron2.config import CfgNode
from detectron2.solver import build_lr_scheduler as build_d2_lr_scheduler
from .lr_scheduler import WarmupPolyLR
def build_lr_scheduler(
cfg: CfgNode, optimizer: torch.optim.Optimizer
) -> torch.optim.lr_scheduler._LRScheduler:
"""
Build a LR scheduler from config.
"""
name = cfg.SOLVER.LR_SCHEDULER_NAME
if name == "WarmupPolyLR":
return WarmupPolyLR(
optimizer,
cfg.SOLVER.MAX_ITER,
warmup_factor=cfg.SOLVER.WARMUP_FACTOR,
warmup_iters=cfg.SOLVER.WARMUP_ITERS,
warmup_method=cfg.SOLVER.WARMUP_METHOD,
power=cfg.SOLVER.POLY_LR_POWER,
constant_ending=cfg.SOLVER.POLY_LR_CONSTANT_ENDING,
)
else:
return build_d2_lr_scheduler(cfg, optimizer)
# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates.
def add_deeplab_config(cfg):
"""
Add config for DeepLab.
"""
# We retry random cropping until no single category in semantic segmentation GT occupies more
# than `SINGLE_CATEGORY_MAX_AREA` part of the crop.
cfg.INPUT.CROP.SINGLE_CATEGORY_MAX_AREA = 1.0
# Used for `poly` learning rate schedule.
cfg.SOLVER.POLY_LR_POWER = 0.9
cfg.SOLVER.POLY_LR_CONSTANT_ENDING = 0.0
# Loss type, choose from `cross_entropy`, `hard_pixel_mining`.
cfg.MODEL.SEM_SEG_HEAD.LOSS_TYPE = "hard_pixel_mining"
# DeepLab settings
cfg.MODEL.SEM_SEG_HEAD.PROJECT_FEATURES = ["res2"]
cfg.MODEL.SEM_SEG_HEAD.PROJECT_CHANNELS = [48]
cfg.MODEL.SEM_SEG_HEAD.ASPP_CHANNELS = 256
cfg.MODEL.SEM_SEG_HEAD.ASPP_DILATIONS = [6, 12, 18]
cfg.MODEL.SEM_SEG_HEAD.ASPP_DROPOUT = 0.1
cfg.MODEL.SEM_SEG_HEAD.USE_DEPTHWISE_SEPARABLE_CONV = False
# Backbone new configs
cfg.MODEL.RESNETS.RES4_DILATION = 1
cfg.MODEL.RESNETS.RES5_MULTI_GRID = [1, 2, 4]
# ResNet stem type from: `basic`, `deeplab`
cfg.MODEL.RESNETS.STEM_TYPE = "deeplab"
# Copyright (c) Facebook, Inc. and its affiliates.
import torch
import torch.nn as nn
class DeepLabCE(nn.Module):
"""
Hard pixel mining with cross entropy loss, for semantic segmentation.
This is used in TensorFlow DeepLab frameworks.
Paper: DeeperLab: Single-Shot Image Parser
Reference: https://github.com/tensorflow/models/blob/bd488858d610e44df69da6f89277e9de8a03722c/research/deeplab/utils/train_utils.py#L33 # noqa
Arguments:
ignore_label: Integer, label to ignore.
top_k_percent_pixels: Float, the value lies in [0.0, 1.0]. When its
value < 1.0, only compute the loss for the top k percent pixels
(e.g., the top 20% pixels). This is useful for hard pixel mining.
weight: Tensor, a manual rescaling weight given to each class.
"""
def __init__(self, ignore_label=-1, top_k_percent_pixels=1.0, weight=None):
super(DeepLabCE, self).__init__()
self.top_k_percent_pixels = top_k_percent_pixels
self.ignore_label = ignore_label
self.criterion = nn.CrossEntropyLoss(
weight=weight, ignore_index=ignore_label, reduction="none"
)
def forward(self, logits, labels, weights=None):
if weights is None:
pixel_losses = self.criterion(logits, labels).contiguous().view(-1)
else:
# Apply per-pixel loss weights.
pixel_losses = self.criterion(logits, labels) * weights
pixel_losses = pixel_losses.contiguous().view(-1)
if self.top_k_percent_pixels == 1.0:
return pixel_losses.mean()
top_k_pixels = int(self.top_k_percent_pixels * pixel_losses.numel())
pixel_losses, _ = torch.topk(pixel_losses, top_k_pixels)
return pixel_losses.mean()
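A minimal sketch (not part of this commit) exercising `DeepLabCE` on random tensors; the import path assumes the package layout used by `semantic_seg.py` (`from .loss import DeepLabCE`), i.e. `detectron2.projects.deeplab.loss`:
```python
import torch
from detectron2.projects.deeplab.loss import DeepLabCE

# Keep only the 20% hardest pixels, as the heads in this project do.
criterion = DeepLabCE(ignore_label=255, top_k_percent_pixels=0.2)
logits = torch.randn(2, 19, 64, 128)          # (N, C, H, W) class scores
labels = torch.randint(0, 19, (2, 64, 128))   # (N, H, W) ground-truth ids
loss = criterion(logits, labels)              # mean over the hardest pixels only
print(loss.item())
```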
# Copyright (c) Facebook, Inc. and its affiliates.
import math
from typing import List
import torch
from detectron2.solver.lr_scheduler import _get_warmup_factor_at_iter
# NOTE: PyTorch's LR scheduler interface uses names that assume the LR changes
# only on epoch boundaries. We typically use iteration based schedules instead.
# As a result, "epoch" (e.g., as in self.last_epoch) should be understood to mean
# "iteration" instead.
# FIXME: ideally this would be achieved with a CombinedLRScheduler, separating
# MultiStepLR with WarmupLR but the current LRScheduler design doesn't allow it.
class WarmupPolyLR(torch.optim.lr_scheduler._LRScheduler):
"""
Poly learning rate schedule used to train DeepLab.
Paper: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets,
Atrous Convolution, and Fully Connected CRFs.
Reference: https://github.com/tensorflow/models/blob/21b73d22f3ed05b650e85ac50849408dd36de32e/research/deeplab/utils/train_utils.py#L337 # noqa
"""
def __init__(
self,
optimizer: torch.optim.Optimizer,
max_iters: int,
warmup_factor: float = 0.001,
warmup_iters: int = 1000,
warmup_method: str = "linear",
last_epoch: int = -1,
power: float = 0.9,
constant_ending: float = 0.0,
):
self.max_iters = max_iters
self.warmup_factor = warmup_factor
self.warmup_iters = warmup_iters
self.warmup_method = warmup_method
self.power = power
self.constant_ending = constant_ending
super().__init__(optimizer, last_epoch)
def get_lr(self) -> List[float]:
warmup_factor = _get_warmup_factor_at_iter(
self.warmup_method, self.last_epoch, self.warmup_iters, self.warmup_factor
)
if self.constant_ending > 0 and warmup_factor == 1.0:
# Constant ending lr.
if (
math.pow((1.0 - self.last_epoch / self.max_iters), self.power)
< self.constant_ending
):
return [base_lr * self.constant_ending for base_lr in self.base_lrs]
return [
base_lr * warmup_factor * math.pow((1.0 - self.last_epoch / self.max_iters), self.power)
for base_lr in self.base_lrs
]
def _compute_values(self) -> List[float]:
# The new interface
return self.get_lr()
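A small usage sketch (assumptions: the module is importable as `detectron2.projects.deeplab.lr_scheduler`, and the scheduler is stepped once per iteration, as the note above explains):
```python
import torch
from detectron2.projects.deeplab.lr_scheduler import WarmupPolyLR

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = WarmupPolyLR(optimizer, max_iters=90000, warmup_iters=1000, power=0.9)

for iteration in range(10):
    optimizer.step()       # forward/backward omitted in this sketch
    scheduler.step()       # "epoch" here means iteration (see note above)
print(scheduler.get_last_lr())   # lr during the linear warmup ramp
```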
# Copyright (c) Facebook, Inc. and its affiliates.
import fvcore.nn.weight_init as weight_init
import torch.nn.functional as F
from detectron2.layers import CNNBlockBase, Conv2d, get_norm
from detectron2.modeling import BACKBONE_REGISTRY
from detectron2.modeling.backbone.resnet import (
BasicStem,
BottleneckBlock,
DeformBottleneckBlock,
ResNet,
)
class DeepLabStem(CNNBlockBase):
"""
The DeepLab ResNet stem (layers before the first residual block).
"""
def __init__(self, in_channels=3, out_channels=128, norm="BN"):
"""
Args:
norm (str or callable): norm after the first conv layer.
See :func:`layers.get_norm` for supported format.
"""
super().__init__(in_channels, out_channels, 4)
self.in_channels = in_channels
self.conv1 = Conv2d(
in_channels,
out_channels // 2,
kernel_size=3,
stride=2,
padding=1,
bias=False,
norm=get_norm(norm, out_channels // 2),
)
self.conv2 = Conv2d(
out_channels // 2,
out_channels // 2,
kernel_size=3,
stride=1,
padding=1,
bias=False,
norm=get_norm(norm, out_channels // 2),
)
self.conv3 = Conv2d(
out_channels // 2,
out_channels,
kernel_size=3,
stride=1,
padding=1,
bias=False,
norm=get_norm(norm, out_channels),
)
weight_init.c2_msra_fill(self.conv1)
weight_init.c2_msra_fill(self.conv2)
weight_init.c2_msra_fill(self.conv3)
def forward(self, x):
x = self.conv1(x)
x = F.relu_(x)
x = self.conv2(x)
x = F.relu_(x)
x = self.conv3(x)
x = F.relu_(x)
x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
return x
@BACKBONE_REGISTRY.register()
def build_resnet_deeplab_backbone(cfg, input_shape):
"""
Create a ResNet instance from config.
Returns:
ResNet: a :class:`ResNet` instance.
"""
# need registration of new blocks/stems?
norm = cfg.MODEL.RESNETS.NORM
if cfg.MODEL.RESNETS.STEM_TYPE == "basic":
stem = BasicStem(
in_channels=input_shape.channels,
out_channels=cfg.MODEL.RESNETS.STEM_OUT_CHANNELS,
norm=norm,
)
elif cfg.MODEL.RESNETS.STEM_TYPE == "deeplab":
stem = DeepLabStem(
in_channels=input_shape.channels,
out_channels=cfg.MODEL.RESNETS.STEM_OUT_CHANNELS,
norm=norm,
)
else:
raise ValueError("Unknown stem type: {}".format(cfg.MODEL.RESNETS.STEM_TYPE))
# fmt: off
freeze_at = cfg.MODEL.BACKBONE.FREEZE_AT
out_features = cfg.MODEL.RESNETS.OUT_FEATURES
depth = cfg.MODEL.RESNETS.DEPTH
num_groups = cfg.MODEL.RESNETS.NUM_GROUPS
width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP
bottleneck_channels = num_groups * width_per_group
in_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS
out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS
stride_in_1x1 = cfg.MODEL.RESNETS.STRIDE_IN_1X1
res4_dilation = cfg.MODEL.RESNETS.RES4_DILATION
res5_dilation = cfg.MODEL.RESNETS.RES5_DILATION
deform_on_per_stage = cfg.MODEL.RESNETS.DEFORM_ON_PER_STAGE
deform_modulated = cfg.MODEL.RESNETS.DEFORM_MODULATED
deform_num_groups = cfg.MODEL.RESNETS.DEFORM_NUM_GROUPS
res5_multi_grid = cfg.MODEL.RESNETS.RES5_MULTI_GRID
# fmt: on
assert res4_dilation in {1, 2}, "res4_dilation cannot be {}.".format(res4_dilation)
assert res5_dilation in {1, 2, 4}, "res5_dilation cannot be {}.".format(res5_dilation)
if res4_dilation == 2:
# Always dilate res5 if res4 is dilated.
assert res5_dilation == 4
num_blocks_per_stage = {50: [3, 4, 6, 3], 101: [3, 4, 23, 3], 152: [3, 8, 36, 3]}[depth]
stages = []
# Avoid creating variables without gradients
# It consumes extra memory and may cause allreduce to fail
out_stage_idx = [{"res2": 2, "res3": 3, "res4": 4, "res5": 5}[f] for f in out_features]
max_stage_idx = max(out_stage_idx)
for idx, stage_idx in enumerate(range(2, max_stage_idx + 1)):
if stage_idx == 4:
dilation = res4_dilation
elif stage_idx == 5:
dilation = res5_dilation
else:
dilation = 1
first_stride = 1 if idx == 0 or dilation > 1 else 2
stage_kargs = {
"num_blocks": num_blocks_per_stage[idx],
"stride_per_block": [first_stride] + [1] * (num_blocks_per_stage[idx] - 1),
"in_channels": in_channels,
"out_channels": out_channels,
"norm": norm,
}
stage_kargs["bottleneck_channels"] = bottleneck_channels
stage_kargs["stride_in_1x1"] = stride_in_1x1
stage_kargs["dilation"] = dilation
stage_kargs["num_groups"] = num_groups
if deform_on_per_stage[idx]:
stage_kargs["block_class"] = DeformBottleneckBlock
stage_kargs["deform_modulated"] = deform_modulated
stage_kargs["deform_num_groups"] = deform_num_groups
else:
stage_kargs["block_class"] = BottleneckBlock
if stage_idx == 5:
stage_kargs.pop("dilation")
stage_kargs["dilation_per_block"] = [dilation * mg for mg in res5_multi_grid]
blocks = ResNet.make_stage(**stage_kargs)
in_channels = out_channels
out_channels *= 2
bottleneck_channels *= 2
stages.append(blocks)
return ResNet(stem, stages, out_features=out_features).freeze(freeze_at)
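As a small sanity-check sketch (assuming the module is importable as `detectron2.projects.deeplab.resnet`), the DeepLab stem keeps the usual 4x downsampling of the standard ResNet stem while replacing the single 7x7 convolution with three 3x3 convolutions, as noted in the README above:
```python
import torch
from detectron2.projects.deeplab.resnet import DeepLabStem

stem = DeepLabStem(in_channels=3, out_channels=128, norm="BN").eval()
x = torch.randn(1, 3, 512, 1024)
with torch.no_grad():
    y = stem(x)
print(y.shape)   # expected: torch.Size([1, 128, 128, 256]), i.e. stride 4
```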
# Copyright (c) Facebook, Inc. and its affiliates.
from typing import Callable, Dict, List, Optional, Tuple, Union
import fvcore.nn.weight_init as weight_init
import torch
from torch import nn
from torch.nn import functional as F
from detectron2.config import configurable
from detectron2.layers import ASPP, Conv2d, DepthwiseSeparableConv2d, ShapeSpec, get_norm
from detectron2.modeling import SEM_SEG_HEADS_REGISTRY
from .loss import DeepLabCE
@SEM_SEG_HEADS_REGISTRY.register()
class DeepLabV3PlusHead(nn.Module):
"""
A semantic segmentation head described in :paper:`DeepLabV3+`.
"""
@configurable
def __init__(
self,
input_shape: Dict[str, ShapeSpec],
*,
project_channels: List[int],
aspp_dilations: List[int],
aspp_dropout: float,
decoder_channels: List[int],
common_stride: int,
norm: Union[str, Callable],
train_size: Optional[Tuple],
loss_weight: float = 1.0,
loss_type: str = "cross_entropy",
ignore_value: int = -1,
num_classes: Optional[int] = None,
use_depthwise_separable_conv: bool = False,
):
"""
NOTE: this interface is experimental.
Args:
input_shape: shape of the input features. They will be ordered by stride
and the last one (with largest stride) is used as the input to the
decoder (i.e. the ASPP module); the rest are low-level feature for
the intermediate levels of decoder.
project_channels (list[int]): a list of low-level feature channels.
The length should be len(in_features) - 1.
aspp_dilations (list(int)): a list of 3 dilations in ASPP.
aspp_dropout (float): apply dropout on the output of ASPP.
decoder_channels (list[int]): a list of output channels of each
decoder stage. It should have the same length as "in_features"
(each element in "in_features" corresponds to one decoder stage).
common_stride (int): output stride of decoder.
norm (str or callable): normalization for all conv layers.
train_size (tuple): (height, width) of training images.
loss_weight (float): loss weight.
loss_type (str): type of loss function, two options:
(1) "cross_entropy" is the standard cross entropy loss.
(2) "hard_pixel_mining" is the loss in DeepLab that samples
top k% hardest pixels.
ignore_value (int): category to be ignored during training.
num_classes (int): number of classes; if set to None, the decoder
will not construct a predictor.
use_depthwise_separable_conv (bool): use DepthwiseSeparableConv2d
in ASPP and decoder.
"""
super().__init__()
input_shape = sorted(input_shape.items(), key=lambda x: x[1].stride)
# fmt: off
self.in_features = [k for k, v in input_shape] # starting from "res2" to "res5"
in_channels = [x[1].channels for x in input_shape]
in_strides = [x[1].stride for x in input_shape]
aspp_channels = decoder_channels[-1]
self.ignore_value = ignore_value
self.common_stride = common_stride # output stride
self.loss_weight = loss_weight
self.loss_type = loss_type
self.decoder_only = num_classes is None
self.use_depthwise_separable_conv = use_depthwise_separable_conv
# fmt: on
assert (
len(project_channels) == len(self.in_features) - 1
), "Expected {} project_channels, got {}".format(
len(self.in_features) - 1, len(project_channels)
)
assert len(decoder_channels) == len(
self.in_features
), "Expected {} decoder_channels, got {}".format(
len(self.in_features), len(decoder_channels)
)
self.decoder = nn.ModuleDict()
use_bias = norm == ""
for idx, in_channel in enumerate(in_channels):
decoder_stage = nn.ModuleDict()
if idx == len(self.in_features) - 1:
# ASPP module
if train_size is not None:
train_h, train_w = train_size
encoder_stride = in_strides[-1]
if train_h % encoder_stride or train_w % encoder_stride:
raise ValueError("Crop size need to be divisible by encoder stride.")
pool_h = train_h // encoder_stride
pool_w = train_w // encoder_stride
pool_kernel_size = (pool_h, pool_w)
else:
pool_kernel_size = None
project_conv = ASPP(
in_channel,
aspp_channels,
aspp_dilations,
norm=norm,
activation=F.relu,
pool_kernel_size=pool_kernel_size,
dropout=aspp_dropout,
use_depthwise_separable_conv=use_depthwise_separable_conv,
)
fuse_conv = None
else:
project_conv = Conv2d(
in_channel,
project_channels[idx],
kernel_size=1,
bias=use_bias,
norm=get_norm(norm, project_channels[idx]),
activation=F.relu,
)
weight_init.c2_xavier_fill(project_conv)
if use_depthwise_separable_conv:
# We use a single 5x5 DepthwiseSeparableConv2d to replace
# 2 3x3 Conv2d since they have the same receptive field,
# proposed in :paper:`Panoptic-DeepLab`.
fuse_conv = DepthwiseSeparableConv2d(
project_channels[idx] + decoder_channels[idx + 1],
decoder_channels[idx],
kernel_size=5,
padding=2,
norm1=norm,
activation1=F.relu,
norm2=norm,
activation2=F.relu,
)
else:
fuse_conv = nn.Sequential(
Conv2d(
project_channels[idx] + decoder_channels[idx + 1],
decoder_channels[idx],
kernel_size=3,
padding=1,
bias=use_bias,
norm=get_norm(norm, decoder_channels[idx]),
activation=F.relu,
),
Conv2d(
decoder_channels[idx],
decoder_channels[idx],
kernel_size=3,
padding=1,
bias=use_bias,
norm=get_norm(norm, decoder_channels[idx]),
activation=F.relu,
),
)
weight_init.c2_xavier_fill(fuse_conv[0])
weight_init.c2_xavier_fill(fuse_conv[1])
decoder_stage["project_conv"] = project_conv
decoder_stage["fuse_conv"] = fuse_conv
self.decoder[self.in_features[idx]] = decoder_stage
if not self.decoder_only:
self.predictor = Conv2d(
decoder_channels[0], num_classes, kernel_size=1, stride=1, padding=0
)
nn.init.normal_(self.predictor.weight, 0, 0.001)
nn.init.constant_(self.predictor.bias, 0)
if self.loss_type == "cross_entropy":
self.loss = nn.CrossEntropyLoss(reduction="mean", ignore_index=self.ignore_value)
elif self.loss_type == "hard_pixel_mining":
self.loss = DeepLabCE(ignore_label=self.ignore_value, top_k_percent_pixels=0.2)
else:
raise ValueError("Unexpected loss type: %s" % self.loss_type)
@classmethod
def from_config(cls, cfg, input_shape):
if cfg.INPUT.CROP.ENABLED:
assert cfg.INPUT.CROP.TYPE == "absolute"
train_size = cfg.INPUT.CROP.SIZE
else:
train_size = None
decoder_channels = [cfg.MODEL.SEM_SEG_HEAD.CONVS_DIM] * (
len(cfg.MODEL.SEM_SEG_HEAD.IN_FEATURES) - 1
) + [cfg.MODEL.SEM_SEG_HEAD.ASPP_CHANNELS]
ret = dict(
input_shape={
k: v for k, v in input_shape.items() if k in cfg.MODEL.SEM_SEG_HEAD.IN_FEATURES
},
project_channels=cfg.MODEL.SEM_SEG_HEAD.PROJECT_CHANNELS,
aspp_dilations=cfg.MODEL.SEM_SEG_HEAD.ASPP_DILATIONS,
aspp_dropout=cfg.MODEL.SEM_SEG_HEAD.ASPP_DROPOUT,
decoder_channels=decoder_channels,
common_stride=cfg.MODEL.SEM_SEG_HEAD.COMMON_STRIDE,
norm=cfg.MODEL.SEM_SEG_HEAD.NORM,
train_size=train_size,
loss_weight=cfg.MODEL.SEM_SEG_HEAD.LOSS_WEIGHT,
loss_type=cfg.MODEL.SEM_SEG_HEAD.LOSS_TYPE,
ignore_value=cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE,
num_classes=cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES,
use_depthwise_separable_conv=cfg.MODEL.SEM_SEG_HEAD.USE_DEPTHWISE_SEPARABLE_CONV,
)
return ret
def forward(self, features, targets=None):
"""
Returns:
In training, returns (None, dict of losses)
In inference, returns (CxHxW logits, {})
"""
y = self.layers(features)
if self.decoder_only:
# Output from self.layers() only contains decoder feature.
return y
if self.training:
return None, self.losses(y, targets)
else:
y = F.interpolate(
y, scale_factor=self.common_stride, mode="bilinear", align_corners=False
)
return y, {}
def layers(self, features):
# Reverse feature maps into top-down order (from low to high resolution)
for f in self.in_features[::-1]:
x = features[f]
proj_x = self.decoder[f]["project_conv"](x)
if self.decoder[f]["fuse_conv"] is None:
# This is the ASPP module
y = proj_x
else:
# Upsample y
y = F.interpolate(y, size=proj_x.size()[2:], mode="bilinear", align_corners=False)
y = torch.cat([proj_x, y], dim=1)
y = self.decoder[f]["fuse_conv"](y)
if not self.decoder_only:
y = self.predictor(y)
return y
def losses(self, predictions, targets):
predictions = F.interpolate(
predictions, scale_factor=self.common_stride, mode="bilinear", align_corners=False
)
loss = self.loss(predictions, targets)
losses = {"loss_sem_seg": loss * self.loss_weight}
return losses
@SEM_SEG_HEADS_REGISTRY.register()
class DeepLabV3Head(nn.Module):
"""
A semantic segmentation head described in :paper:`DeepLabV3`.
"""
def __init__(self, cfg, input_shape: Dict[str, ShapeSpec]):
super().__init__()
# fmt: off
self.in_features = cfg.MODEL.SEM_SEG_HEAD.IN_FEATURES
in_channels = [input_shape[f].channels for f in self.in_features]
aspp_channels = cfg.MODEL.SEM_SEG_HEAD.ASPP_CHANNELS
aspp_dilations = cfg.MODEL.SEM_SEG_HEAD.ASPP_DILATIONS
self.ignore_value = cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE
num_classes = cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES
conv_dims = cfg.MODEL.SEM_SEG_HEAD.CONVS_DIM
self.common_stride = cfg.MODEL.SEM_SEG_HEAD.COMMON_STRIDE # output stride
norm = cfg.MODEL.SEM_SEG_HEAD.NORM
self.loss_weight = cfg.MODEL.SEM_SEG_HEAD.LOSS_WEIGHT
self.loss_type = cfg.MODEL.SEM_SEG_HEAD.LOSS_TYPE
train_crop_size = cfg.INPUT.CROP.SIZE
aspp_dropout = cfg.MODEL.SEM_SEG_HEAD.ASPP_DROPOUT
use_depthwise_separable_conv = cfg.MODEL.SEM_SEG_HEAD.USE_DEPTHWISE_SEPARABLE_CONV
# fmt: on
assert len(self.in_features) == 1
assert len(in_channels) == 1
# ASPP module
if cfg.INPUT.CROP.ENABLED:
assert cfg.INPUT.CROP.TYPE == "absolute"
train_crop_h, train_crop_w = train_crop_size
if train_crop_h % self.common_stride or train_crop_w % self.common_stride:
raise ValueError("Crop size need to be divisible by output stride.")
pool_h = train_crop_h // self.common_stride
pool_w = train_crop_w // self.common_stride
pool_kernel_size = (pool_h, pool_w)
else:
pool_kernel_size = None
self.aspp = ASPP(
in_channels[0],
aspp_channels,
aspp_dilations,
norm=norm,
activation=F.relu,
pool_kernel_size=pool_kernel_size,
dropout=aspp_dropout,
use_depthwise_separable_conv=use_depthwise_separable_conv,
)
self.predictor = Conv2d(conv_dims, num_classes, kernel_size=1, stride=1, padding=0)
nn.init.normal_(self.predictor.weight, 0, 0.001)
nn.init.constant_(self.predictor.bias, 0)
if self.loss_type == "cross_entropy":
self.loss = nn.CrossEntropyLoss(reduction="mean", ignore_index=self.ignore_value)
elif self.loss_type == "hard_pixel_mining":
self.loss = DeepLabCE(ignore_label=self.ignore_value, top_k_percent_pixels=0.2)
else:
raise ValueError("Unexpected loss type: %s" % self.loss_type)
def forward(self, features, targets=None):
"""
Returns:
In training, returns (None, dict of losses)
In inference, returns (CxHxW logits, {})
"""
x = features[self.in_features[0]]
x = self.aspp(x)
x = self.predictor(x)
if self.training:
return None, self.losses(x, targets)
else:
x = F.interpolate(
x, scale_factor=self.common_stride, mode="bilinear", align_corners=False
)
return x, {}
def losses(self, predictions, targets):
predictions = F.interpolate(
predictions, scale_factor=self.common_stride, mode="bilinear", align_corners=False
)
loss = self.loss(predictions, targets)
losses = {"loss_sem_seg": loss * self.loss_weight}
return losses
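For illustration only, a hedged sketch of constructing `DeepLabV3PlusHead` directly with explicit arguments (bypassing the config system); the channel counts mirror the R103 config above, the feature shapes are arbitrary placeholders, and the import path assumes the package layout used elsewhere in this commit:
```python
import torch
from detectron2.layers import ShapeSpec
from detectron2.projects.deeplab.semantic_seg import DeepLabV3PlusHead

head = DeepLabV3PlusHead(
    input_shape={
        "res2": ShapeSpec(channels=256, stride=4),    # low-level feature
        "res5": ShapeSpec(channels=2048, stride=16),  # input to ASPP
    },
    project_channels=[48],
    aspp_dilations=[6, 12, 18],
    aspp_dropout=0.1,
    decoder_channels=[256, 256],  # [CONVS_DIM, ASPP_CHANNELS], as in from_config
    common_stride=4,
    norm="BN",
    train_size=None,
    num_classes=19,
).eval()  # inference mode: forward returns upsampled logits instead of losses

features = {
    "res2": torch.randn(1, 256, 128, 256),
    "res5": torch.randn(1, 2048, 32, 64),
}
with torch.no_grad():
    logits, _ = head(features)    # inference path upsamples by common_stride
print(logits.shape)               # expected: (1, 19, 512, 1024)
```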
#!/usr/bin/env python3
# Copyright (c) Facebook, Inc. and its affiliates.
"""
DeepLab Training Script.
This script is a simplified version of the training script in detectron2/tools.
"""
import os
import torch
import detectron2.data.transforms as T
import detectron2.utils.comm as comm
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.data import DatasetMapper, MetadataCatalog, build_detection_train_loader
from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, launch
from detectron2.evaluation import CityscapesSemSegEvaluator, DatasetEvaluators, SemSegEvaluator
from detectron2.projects.deeplab import add_deeplab_config, build_lr_scheduler
def build_sem_seg_train_aug(cfg):
augs = [
T.ResizeShortestEdge(
cfg.INPUT.MIN_SIZE_TRAIN, cfg.INPUT.MAX_SIZE_TRAIN, cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING
)
]
if cfg.INPUT.CROP.ENABLED:
augs.append(
T.RandomCrop_CategoryAreaConstraint(
cfg.INPUT.CROP.TYPE,
cfg.INPUT.CROP.SIZE,
cfg.INPUT.CROP.SINGLE_CATEGORY_MAX_AREA,
cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE,
)
)
augs.append(T.RandomFlip())
return augs
class Trainer(DefaultTrainer):
"""
We use the "DefaultTrainer" which contains a number pre-defined logic for
standard training workflow. They may not work for you, especially if you
are working on a new research project. In that case you can use the cleaner
"SimpleTrainer", or write your own training loop.
"""
@classmethod
def build_evaluator(cls, cfg, dataset_name, output_folder=None):
"""
Create evaluator(s) for a given dataset.
This uses the special metadata "evaluator_type" associated with each builtin dataset.
For your own dataset, you can simply create an evaluator manually in your
script and do not have to worry about the hacky if-else logic here.
"""
if output_folder is None:
output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
evaluator_list = []
evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
if evaluator_type == "sem_seg":
return SemSegEvaluator(
dataset_name,
distributed=True,
output_dir=output_folder,
)
if evaluator_type == "cityscapes_sem_seg":
assert (
torch.cuda.device_count() > comm.get_rank()
), "CityscapesEvaluator currently do not work with multiple machines."
return CityscapesSemSegEvaluator(dataset_name)
if len(evaluator_list) == 0:
raise NotImplementedError(
"no Evaluator for the dataset {} with the type {}".format(
dataset_name, evaluator_type
)
)
if len(evaluator_list) == 1:
return evaluator_list[0]
return DatasetEvaluators(evaluator_list)
@classmethod
def build_train_loader(cls, cfg):
if "SemanticSegmentor" in cfg.MODEL.META_ARCHITECTURE:
mapper = DatasetMapper(cfg, is_train=True, augmentations=build_sem_seg_train_aug(cfg))
else:
mapper = None
return build_detection_train_loader(cfg, mapper=mapper)
@classmethod
def build_lr_scheduler(cls, cfg, optimizer):
"""
It now calls this project's :func:`build_lr_scheduler` (which falls back to
:func:`detectron2.solver.build_lr_scheduler` for standard schedulers).
Overwrite it if you'd like a different scheduler.
"""
return build_lr_scheduler(cfg, optimizer)
def setup(args):
"""
Create configs and perform basic setups.
"""
cfg = get_cfg()
add_deeplab_config(cfg)
cfg.merge_from_file(args.config_file)
cfg.merge_from_list(args.opts)
cfg.freeze()
default_setup(cfg, args)
return cfg
def main(args):
cfg = setup(args)
if args.eval_only:
model = Trainer.build_model(cfg)
DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
cfg.MODEL.WEIGHTS, resume=args.resume
)
res = Trainer.test(cfg, model)
return res
trainer = Trainer(cfg)
trainer.resume_or_load(resume=args.resume)
return trainer.train()
if __name__ == "__main__":
args = default_argument_parser().parse_args()
print("Command Line Args:", args)
launch(
main,
args.num_gpus,
num_machines=args.num_machines,
machine_rank=args.machine_rank,
dist_url=args.dist_url,
args=(args,),
)
# DensePose in Detectron2
DensePose aims at learning and establishing dense correspondences between image pixels
and 3D object geometry for deformable objects, such as humans or animals.
In this repository, we provide the code to train and evaluate DensePose R-CNN and
various tools to visualize DensePose annotations and results.
There are two main paradigms used within the DensePose project.
## [Chart-based Dense Pose Estimation for Humans and Animals](doc/DENSEPOSE_IUV.md)
<div align="center">
<img src="https://dl.fbaipublicfiles.com/densepose/web/densepose_teaser_compressed_25.gif" width="700px" />
</div>
For chart-based estimation, the 3D object mesh is split into charts, and
for each pixel the model estimates a chart index `I` and local chart coordinates `(U, V)`.
Please follow the link above to find a [detailed overview](doc/DENSEPOSE_IUV.md#Overview)
of the method, links to trained models along with their performance evaluation in the
[Model Zoo](doc/DENSEPOSE_IUV.md#ModelZoo) and
[references](doc/DENSEPOSE_IUV.md#References) to the corresponding papers.
## [Continuous Surface Embeddings for Dense Pose Estimation for Humans and Animals](doc/DENSEPOSE_CSE.md)
<div align="center">
<img src="https://dl.fbaipublicfiles.com/densepose/web/densepose_cse_teaser.png" width="700px" />
</div>
To establish continuous surface embeddings, the model simultaneously learns
descriptors for mesh vertices and for image pixels.
The embeddings are put into correspondence, so that the location
of each pixel on the 3D model can be derived.
Please follow the link above to find a [detailed overview](doc/DENSEPOSE_CSE.md#Overview)
of the method, links to trained models along with their performance evaluation in the
[Model Zoo](doc/DENSEPOSE_CSE.md#ModelZoo) and
[references](doc/DENSEPOSE_CSE.md#References) to the corresponding papers.
# Quick Start
See [Getting Started](doc/GETTING_STARTED.md).
# Model Zoo
Please check the dedicated pages
for [chart-based model zoo](doc/DENSEPOSE_IUV.md#ModelZoo)
and for [continuous surface embeddings model zoo](doc/DENSEPOSE_CSE.md#ModelZoo).
# What's New
* June 2021: [DensePose CSE with Cycle Losses](doc/RELEASE_2021_06.md)
* March 2021: [DensePose CSE (a framework to extend DensePose to various categories using 3D models)
and DensePose Evolution (a framework to bootstrap DensePose on unlabeled data) released](doc/RELEASE_2021_03.md)
* April 2020: [DensePose Confidence Estimation and Model Zoo Improvements](doc/RELEASE_2020_04.md)
# License
Detectron2 is released under the [Apache 2.0 license](../../LICENSE).
## <a name="CitingDensePose"></a>Citing DensePose
If you use DensePose, please refer to the BibTeX entries
for [chart-based models](doc/DENSEPOSE_IUV.md#References)
and for [continuous surface embeddings](doc/DENSEPOSE_CSE.md#References).
#!/usr/bin/env python3
# Copyright (c) Facebook, Inc. and its affiliates.
import argparse
import glob
import logging
import os
import pickle
import sys
from typing import Any, ClassVar, Dict, List
import torch
from detectron2.config import CfgNode, get_cfg
from detectron2.data.detection_utils import read_image
from detectron2.engine.defaults import DefaultPredictor
from detectron2.structures.instances import Instances
from detectron2.utils.logger import setup_logger
from densepose import add_densepose_config
from densepose.structures import DensePoseChartPredictorOutput, DensePoseEmbeddingPredictorOutput
from densepose.utils.logger import verbosity_to_level
from densepose.vis.base import CompoundVisualizer
from densepose.vis.bounding_box import ScoredBoundingBoxVisualizer
from densepose.vis.densepose_outputs_vertex import (
DensePoseOutputsTextureVisualizer,
DensePoseOutputsVertexVisualizer,
get_texture_atlases,
)
from densepose.vis.densepose_results import (
DensePoseResultsContourVisualizer,
DensePoseResultsFineSegmentationVisualizer,
DensePoseResultsUVisualizer,
DensePoseResultsVVisualizer,
)
from densepose.vis.densepose_results_textures import (
DensePoseResultsVisualizerWithTexture,
get_texture_atlas,
)
from densepose.vis.extractor import (
CompoundExtractor,
DensePoseOutputsExtractor,
DensePoseResultExtractor,
create_extractor,
)
DOC = """Apply Net - a tool to print / visualize DensePose results
"""
LOGGER_NAME = "apply_net"
logger = logging.getLogger(LOGGER_NAME)
_ACTION_REGISTRY: Dict[str, "Action"] = {}
class Action(object):
@classmethod
def add_arguments(cls: type, parser: argparse.ArgumentParser):
parser.add_argument(
"-v",
"--verbosity",
action="count",
help="Verbose mode. Multiple -v options increase the verbosity.",
)
def register_action(cls: type):
"""
Decorator for action classes to automate action registration
"""
global _ACTION_REGISTRY
_ACTION_REGISTRY[cls.COMMAND] = cls
return cls
class InferenceAction(Action):
@classmethod
def add_arguments(cls: type, parser: argparse.ArgumentParser):
super(InferenceAction, cls).add_arguments(parser)
parser.add_argument("cfg", metavar="<config>", help="Config file")
parser.add_argument("model", metavar="<model>", help="Model file")
parser.add_argument("input", metavar="<input>", help="Input data")
parser.add_argument(
"--opts",
help="Modify config options using the command-line 'KEY VALUE' pairs",
default=[],
nargs=argparse.REMAINDER,
)
@classmethod
def execute(cls: type, args: argparse.Namespace):
logger.info(f"Loading config from {args.cfg}")
opts = []
cfg = cls.setup_config(args.cfg, args.model, args, opts)
logger.info(f"Loading model from {args.model}")
predictor = DefaultPredictor(cfg)
logger.info(f"Loading data from {args.input}")
file_list = cls._get_input_file_list(args.input)
if len(file_list) == 0:
logger.warning(f"No input images for {args.input}")
return
context = cls.create_context(args, cfg)
for file_name in file_list:
img = read_image(file_name, format="BGR") # predictor expects BGR image.
with torch.no_grad():
outputs = predictor(img)["instances"]
cls.execute_on_outputs(context, {"file_name": file_name, "image": img}, outputs)
cls.postexecute(context)
@classmethod
def setup_config(
cls: type, config_fpath: str, model_fpath: str, args: argparse.Namespace, opts: List[str]
):
cfg = get_cfg()
add_densepose_config(cfg)
cfg.merge_from_file(config_fpath)
cfg.merge_from_list(args.opts)
if opts:
cfg.merge_from_list(opts)
cfg.MODEL.WEIGHTS = model_fpath
cfg.freeze()
return cfg
@classmethod
def _get_input_file_list(cls: type, input_spec: str):
if os.path.isdir(input_spec):
file_list = [
os.path.join(input_spec, fname)
for fname in os.listdir(input_spec)
if os.path.isfile(os.path.join(input_spec, fname))
]
elif os.path.isfile(input_spec):
file_list = [input_spec]
else:
file_list = glob.glob(input_spec)
return file_list
@register_action
class DumpAction(InferenceAction):
"""
Dump action that outputs results to a pickle file
"""
COMMAND: ClassVar[str] = "dump"
@classmethod
def add_parser(cls: type, subparsers: argparse._SubParsersAction):
parser = subparsers.add_parser(cls.COMMAND, help="Dump model outputs to a file.")
cls.add_arguments(parser)
parser.set_defaults(func=cls.execute)
@classmethod
def add_arguments(cls: type, parser: argparse.ArgumentParser):
super(DumpAction, cls).add_arguments(parser)
parser.add_argument(
"--output",
metavar="<dump_file>",
default="results.pkl",
help="File name to save dump to",
)
@classmethod
def execute_on_outputs(
cls: type, context: Dict[str, Any], entry: Dict[str, Any], outputs: Instances
):
image_fpath = entry["file_name"]
logger.info(f"Processing {image_fpath}")
result = {"file_name": image_fpath}
if outputs.has("scores"):
result["scores"] = outputs.get("scores").cpu()
if outputs.has("pred_boxes"):
result["pred_boxes_XYXY"] = outputs.get("pred_boxes").tensor.cpu()
if outputs.has("pred_densepose"):
if isinstance(outputs.pred_densepose, DensePoseChartPredictorOutput):
extractor = DensePoseResultExtractor()
elif isinstance(outputs.pred_densepose, DensePoseEmbeddingPredictorOutput):
extractor = DensePoseOutputsExtractor()
result["pred_densepose"] = extractor(outputs)[0]
context["results"].append(result)
@classmethod
def create_context(cls: type, args: argparse.Namespace, cfg: CfgNode):
context = {"results": [], "out_fname": args.output}
return context
@classmethod
def postexecute(cls: type, context: Dict[str, Any]):
out_fname = context["out_fname"]
out_dir = os.path.dirname(out_fname)
if len(out_dir) > 0 and not os.path.exists(out_dir):
os.makedirs(out_dir)
with open(out_fname, "wb") as hFile:
pickle.dump(context["results"], hFile)
logger.info(f"Output saved to {out_fname}")
@register_action
class ShowAction(InferenceAction):
"""
Show action that visualizes selected entries on an image
"""
COMMAND: ClassVar[str] = "show"
VISUALIZERS: ClassVar[Dict[str, object]] = {
"dp_contour": DensePoseResultsContourVisualizer,
"dp_segm": DensePoseResultsFineSegmentationVisualizer,
"dp_u": DensePoseResultsUVisualizer,
"dp_v": DensePoseResultsVVisualizer,
"dp_iuv_texture": DensePoseResultsVisualizerWithTexture,
"dp_cse_texture": DensePoseOutputsTextureVisualizer,
"dp_vertex": DensePoseOutputsVertexVisualizer,
"bbox": ScoredBoundingBoxVisualizer,
}
@classmethod
def add_parser(cls: type, subparsers: argparse._SubParsersAction):
parser = subparsers.add_parser(cls.COMMAND, help="Visualize selected entries")
cls.add_arguments(parser)
parser.set_defaults(func=cls.execute)
@classmethod
def add_arguments(cls: type, parser: argparse.ArgumentParser):
super(ShowAction, cls).add_arguments(parser)
parser.add_argument(
"visualizations",
metavar="<visualizations>",
help="Comma separated list of visualizations, possible values: "
"[{}]".format(",".join(sorted(cls.VISUALIZERS.keys()))),
)
parser.add_argument(
"--min_score",
metavar="<score>",
default=0.8,
type=float,
help="Minimum detection score to visualize",
)
parser.add_argument(
"--nms_thresh", metavar="<threshold>", default=None, type=float, help="NMS threshold"
)
parser.add_argument(
"--texture_atlas",
metavar="<texture_atlas>",
default=None,
help="Texture atlas file (for IUV texture transfer)",
)
parser.add_argument(
"--texture_atlases_map",
metavar="<texture_atlases_map>",
default=None,
help="JSON string of a dict containing texture atlas files for each mesh",
)
parser.add_argument(
"--output",
metavar="<image_file>",
default="outputres.png",
help="File name to save output to",
)
@classmethod
def setup_config(
cls: type, config_fpath: str, model_fpath: str, args: argparse.Namespace, opts: List[str]
):
opts.append("MODEL.ROI_HEADS.SCORE_THRESH_TEST")
opts.append(str(args.min_score))
if args.nms_thresh is not None:
opts.append("MODEL.ROI_HEADS.NMS_THRESH_TEST")
opts.append(str(args.nms_thresh))
cfg = super(ShowAction, cls).setup_config(config_fpath, model_fpath, args, opts)
return cfg
@classmethod
def execute_on_outputs(
cls: type, context: Dict[str, Any], entry: Dict[str, Any], outputs: Instances
):
import cv2
import numpy as np
visualizer = context["visualizer"]
extractor = context["extractor"]
image_fpath = entry["file_name"]
logger.info(f"Processing {image_fpath}")
image = cv2.cvtColor(entry["image"], cv2.COLOR_BGR2GRAY)
image = np.tile(image[:, :, np.newaxis], [1, 1, 3])
data = extractor(outputs)
image_vis = visualizer.visualize(image, data)
entry_idx = context["entry_idx"] + 1
out_fname = cls._get_out_fname(entry_idx, context["out_fname"])
out_dir = os.path.dirname(out_fname)
if len(out_dir) > 0 and not os.path.exists(out_dir):
os.makedirs(out_dir)
cv2.imwrite(out_fname, image_vis)
logger.info(f"Output saved to {out_fname}")
context["entry_idx"] += 1
@classmethod
def postexecute(cls: type, context: Dict[str, Any]):
pass
@classmethod
def _get_out_fname(cls: type, entry_idx: int, fname_base: str):
base, ext = os.path.splitext(fname_base)
return base + ".{0:04d}".format(entry_idx) + ext
@classmethod
def create_context(cls: type, args: argparse.Namespace, cfg: CfgNode) -> Dict[str, Any]:
vis_specs = args.visualizations.split(",")
visualizers = []
extractors = []
for vis_spec in vis_specs:
texture_atlas = get_texture_atlas(args.texture_atlas)
texture_atlases_dict = get_texture_atlases(args.texture_atlases_map)
vis = cls.VISUALIZERS[vis_spec](
cfg=cfg,
texture_atlas=texture_atlas,
texture_atlases_dict=texture_atlases_dict,
)
visualizers.append(vis)
extractor = create_extractor(vis)
extractors.append(extractor)
visualizer = CompoundVisualizer(visualizers)
extractor = CompoundExtractor(extractors)
context = {
"extractor": extractor,
"visualizer": visualizer,
"out_fname": args.output,
"entry_idx": 0,
}
return context
def create_argument_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description=DOC,
formatter_class=lambda prog: argparse.HelpFormatter(prog, max_help_position=120),
)
parser.set_defaults(func=lambda _: parser.print_help(sys.stdout))
subparsers = parser.add_subparsers(title="Actions")
for _, action in _ACTION_REGISTRY.items():
action.add_parser(subparsers)
return parser
def main():
parser = create_argument_parser()
args = parser.parse_args()
verbosity = args.verbosity if hasattr(args, "verbosity") else None
global logger
logger = setup_logger(name=LOGGER_NAME)
logger.setLevel(verbosity_to_level(verbosity))
args.func(args)
if __name__ == "__main__":
main()
VERSION: 2
MODEL:
META_ARCHITECTURE: "GeneralizedRCNN"
BACKBONE:
NAME: "build_resnet_fpn_backbone"
RESNETS:
OUT_FEATURES: ["res2", "res3", "res4", "res5"]
FPN:
IN_FEATURES: ["res2", "res3", "res4", "res5"]
ANCHOR_GENERATOR:
SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map
ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps)
RPN:
IN_FEATURES: ["p2", "p3", "p4", "p5", "p6"]
PRE_NMS_TOPK_TRAIN: 2000 # Per FPN level
PRE_NMS_TOPK_TEST: 1000 # Per FPN level
# Detectron1 uses 2000 proposals per-batch,
# (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue)
# which is approximately 1000 proposals per-image since the default batch size for FPN is 2.
POST_NMS_TOPK_TRAIN: 1000
POST_NMS_TOPK_TEST: 1000
DENSEPOSE_ON: True
ROI_HEADS:
NAME: "DensePoseROIHeads"
IN_FEATURES: ["p2", "p3", "p4", "p5"]
NUM_CLASSES: 1
ROI_BOX_HEAD:
NAME: "FastRCNNConvFCHead"
NUM_FC: 2
POOLER_RESOLUTION: 7
POOLER_SAMPLING_RATIO: 2
POOLER_TYPE: "ROIAlign"
ROI_DENSEPOSE_HEAD:
NAME: "DensePoseV1ConvXHead"
POOLER_TYPE: "ROIAlign"
NUM_COARSE_SEGM_CHANNELS: 2
DATASETS:
TRAIN: ("densepose_coco_2014_train", "densepose_coco_2014_valminusminival")
TEST: ("densepose_coco_2014_minival",)
SOLVER:
IMS_PER_BATCH: 16
BASE_LR: 0.01
STEPS: (60000, 80000)
MAX_ITER: 90000
WARMUP_FACTOR: 0.1
INPUT:
MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
_BASE_: "../Base-DensePose-RCNN-FPN.yaml"
MODEL:
WEIGHTS: "https://1drv.ms/u/s!Aus8VCZ_C_33dYBMemi9xOUFR0w"
BACKBONE:
NAME: "build_hrfpn_backbone"
RPN:
IN_FEATURES: ['p1', 'p2', 'p3', 'p4', 'p5']
ROI_HEADS:
IN_FEATURES: ['p1', 'p2', 'p3', 'p4', 'p5']
SOLVER:
MAX_ITER: 130000
STEPS: (100000, 120000)
CLIP_GRADIENTS:
ENABLED: True
CLIP_TYPE: "norm"
BASE_LR: 0.03
_BASE_: "../Base-DensePose-RCNN-FPN.yaml"
MODEL:
WEIGHTS: "https://1drv.ms/u/s!Aus8VCZ_C_33ck0gvo5jfoWBOPo"
BACKBONE:
NAME: "build_hrfpn_backbone"
RPN:
IN_FEATURES: ['p1', 'p2', 'p3', 'p4', 'p5']
ROI_HEADS:
IN_FEATURES: ['p1', 'p2', 'p3', 'p4', 'p5']
HRNET:
STAGE2:
NUM_CHANNELS: [40, 80]
STAGE3:
NUM_CHANNELS: [40, 80, 160]
STAGE4:
NUM_CHANNELS: [40, 80, 160, 320]
SOLVER:
MAX_ITER: 130000
STEPS: (100000, 120000)
CLIP_GRADIENTS:
ENABLED: True
CLIP_TYPE: "norm"
BASE_LR: 0.03
_BASE_: "../Base-DensePose-RCNN-FPN.yaml"
MODEL:
WEIGHTS: "https://1drv.ms/u/s!Aus8VCZ_C_33dKvqI6pBZlifgJk"
BACKBONE:
NAME: "build_hrfpn_backbone"
RPN:
IN_FEATURES: ['p1', 'p2', 'p3', 'p4', 'p5']
ROI_HEADS:
IN_FEATURES: ['p1', 'p2', 'p3', 'p4', 'p5']
HRNET:
STAGE2:
NUM_CHANNELS: [48, 96]
STAGE3:
NUM_CHANNELS: [48, 96, 192]
STAGE4:
NUM_CHANNELS: [48, 96, 192, 384]
SOLVER:
MAX_ITER: 130000
STEPS: (100000, 120000)
CLIP_GRADIENTS:
ENABLED: True
CLIP_TYPE: "norm"
BASE_LR: 0.03