# Quick Start
## Environment Setup
#### 1. Install PaddlePaddle
Requirements:
* PaddlePaddle >= 2.0.2
* Python >= 3.7
Since image matting models are computationally expensive, the GPU version of PaddlePaddle is recommended.
CUDA 10.0 or later is recommended. See the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) for the installation tutorial.
#### 2. Download the PaddleSeg repository
```shell
git clone https://github.com/PaddlePaddle/PaddleSeg
```
#### 3. Installation
```shell
cd PaddleSeg/Matting
pip install -r requirements.txt
```
## Download the Pre-trained Model
Download a pre-trained model from the [model zoo](../README_CN.md/#模型库) and place it under the `pretrained_models` directory. Here we take PP-MattingV2 as an example.
```shell
mkdir pretrained_models && cd pretrained_models
wget https://paddleseg.bj.bcebos.com/matting/models/ppmattingv2-stdc1-human_512.pdparams
cd ..
```
## Prediction
```shell
export CUDA_VISIBLE_DEVICES=0
python tools/predict.py \
--config configs/ppmattingv2/ppmattingv2-stdc1-human_512.yml \
--model_path pretrained_models/ppmattingv2-stdc1-human_512.pdparams \
--image_path demo/human.jpg \
--save_dir ./output/results \
--fg_estimate True
```
The prediction results are as follows:
<div align="center">
<img src="https://user-images.githubusercontent.com/30919197/201861635-0d139592-7da5-44b1-9bfa-7502d9643320.png" width = "90%" />
</div>
**Note**: `--config` must match `--model_path`.
## Background Replacement
```shell
export CUDA_VISIBLE_DEVICES=0
python tools/bg_replace.py \
--config configs/ppmattingv2/ppmattingv2-stdc1-human_512.yml \
--model_path pretrained_models/ppmattingv2-stdc1-human_512.pdparams \
--image_path demo/human.jpg \
--background 'g' \
--save_dir ./output/results \
--fg_estimate True
```
The background replacement result is as follows:
<div align="center">
<img src="https://user-images.githubusercontent.com/30919197/201861644-15dd5ccf-fb6e-4440-a731-8e7c1d464699.png" width = "90%" />
</div>
**Notes:**
* `--image_path` must be the path of a single image.
* `--config` must match `--model_path`.
* `--background` accepts either a background image path or one of ('r', 'g', 'b', 'w'), representing a red, green, blue, or white background; green is used if not provided.
# Quick Start
## Installation
#### 1. Install PaddlePaddle
Requirements:
* PaddlePaddle >= 2.0.2
* Python >= 3.7
Due to the high computational cost of matting models, the GPU version of PaddlePaddle is recommended.
CUDA 10.0 or later is recommended. See the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) for the installation tutorial.
#### 2. Download the PaddleSeg repository
```shell
git clone https://github.com/PaddlePaddle/PaddleSeg
```
#### 3. Installation
```shell
cd PaddleSeg/Matting
pip install -r requirements.txt
```
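Optionally, verify that PaddlePaddle is installed correctly before continuing:
```python
import paddle

paddle.utils.run_check()   # runs a quick self-test of the installation
print(paddle.__version__)  # should print 2.0.2 or later
```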
## Download the Pre-trained Model
Download a pre-trained model from [Models](../README.md/#Models) and place it under the `pretrained_models` directory. Here we take PP-MattingV2 as an example.
```shell
mkdir pretrained_models && cd pretrained_models
wget https://paddleseg.bj.bcebos.com/matting/models/ppmattingv2-stdc1-human_512.pdparams
cd ..
```
## Prediction
```shell
export CUDA_VISIBLE_DEVICES=0
python tools/predict.py \
--config configs/ppmattingv2/ppmattingv2-stdc1-human_512.yml \
--model_path pretrained_models/ppmattingv2-stdc1-human_512.pdparams \
--image_path demo/human.jpg \
--save_dir ./output/results \
--fg_estimate True
```
Prediction results are as follows:
<div align="center">
<img src="https://user-images.githubusercontent.com/30919197/201861635-0d139592-7da5-44b1-9bfa-7502d9643320.png" width = "90%" />
</div>
**Note**: `--config` needs to match `--model_path`.
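For each input image, the script writes an alpha matte (`*_alpha.png`) and an RGBA cutout (`*_rgba.png`) under `--save_dir` (naming follows `save_result` in `ppmatting/core/predict.py`, included later in this commit). A minimal sketch for inspecting them, assuming the demo image above:
```python
import cv2

# File names assume the demo image `human.jpg`; see save_result() below.
alpha = cv2.imread("output/results/human_alpha.png", cv2.IMREAD_GRAYSCALE)
rgba = cv2.imread("output/results/human_rgba.png", cv2.IMREAD_UNCHANGED)
print(alpha.shape, rgba.shape)  # (H, W) and (H, W, 4)
```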
## Background Replacement
```shell
export CUDA_VISIBLE_DEVICES=0
python tools/bg_replace.py \
--config configs/ppmattingv2/ppmattingv2-stdc1-human_512.yml \
--model_path pretrained_models/ppmattingv2-stdc1-human_512.pdparams \
--image_path demo/human.jpg \
--background 'g' \
--save_dir ./output/results \
--fg_estimate True
```
The background replacement effect is as follows:
<div align="center">
<img src="https://user-images.githubusercontent.com/30919197/201861644-15dd5ccf-fb6e-4440-a731-8e7c1d464699.png" width = "90%" />
</div>
**Notes:**
* `--image_path` must be the path of a single image.
* `--config` needs to match `--model_path`.
* `--background` accepts either a background image path or one of ('r','g','b','w'), representing a red, green, blue, or white background; green is used if not provided.
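Under the hood, background replacement is plain alpha compositing, `out = alpha * fg + (1 - alpha) * bg`. A minimal NumPy sketch of the same operation on the prediction outputs (file names are assumptions based on the demo above; the script does this for you):
```python
import cv2
import numpy as np

# Assumed inputs: the RGBA cutout produced by tools/predict.py and any background image.
rgba = cv2.imread("output/results/human_rgba.png", cv2.IMREAD_UNCHANGED)
bg = cv2.imread("my_background.jpg")
bg = cv2.resize(bg, (rgba.shape[1], rgba.shape[0]))

alpha = rgba[:, :, 3:4].astype(np.float32) / 255.0  # (H, W, 1), broadcasts over BGR
fg = rgba[:, :, :3].astype(np.float32)
out = alpha * fg + (1.0 - alpha) * bg.astype(np.float32)
cv2.imwrite("composited.png", out.astype(np.uint8))
```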
from . import ml, metrics, transforms, datasets, models, utils
from .val import evaluate
from .val_ml import evaluate_ml
from .train import train
from .predict import predict
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import math
import time
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
from paddleseg import utils
from paddleseg.core import infer
from paddleseg.utils import logger, progbar, TimeAverager
from ppmatting.utils import mkdir, estimate_foreground_ml
def partition_list(arr, m):
"""split the list 'arr' into m pieces"""
n = int(math.ceil(len(arr) / float(m)))
return [arr[i:i + n] for i in range(0, len(arr), n)]
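# For example (illustrative): partition_list([0, 1, 2, 3, 4], 2) returns
# [[0, 1, 2], [3, 4]], since each piece holds ceil(5 / 2) = 3 items.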
def save_result(alpha, path, im_path, trimap=None, fg_estimate=True):
"""
    The value of alpha is in the range [0, 255]; the shape should be [h, w].
"""
dirname = os.path.dirname(path)
if not os.path.exists(dirname):
os.makedirs(dirname)
basename = os.path.basename(path)
name = os.path.splitext(basename)[0]
alpha_save_path = os.path.join(dirname, name + '_alpha.png')
rgba_save_path = os.path.join(dirname, name + '_rgba.png')
# save alpha matte
if trimap is not None:
trimap = cv2.imread(trimap, 0)
alpha[trimap == 0] = 0
alpha[trimap == 255] = 255
alpha = (alpha).astype('uint8')
cv2.imwrite(alpha_save_path, alpha)
# save rgba
im = cv2.imread(im_path)
if fg_estimate:
fg = estimate_foreground_ml(im / 255.0, alpha / 255.0) * 255
else:
fg = im
fg = fg.astype('uint8')
alpha = alpha[:, :, np.newaxis]
rgba = np.concatenate((fg, alpha), axis=-1)
cv2.imwrite(rgba_save_path, rgba)
return fg
def reverse_transform(alpha, trans_info):
"""recover pred to origin shape"""
for item in trans_info[::-1]:
if item[0] == 'resize':
h, w = item[1][0], item[1][1]
alpha = F.interpolate(alpha, [h, w], mode='bilinear')
elif item[0] == 'padding':
h, w = item[1][0], item[1][1]
alpha = alpha[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
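# Illustrative trans_info, as recorded by the transforms (assumed format):
#     [('resize', (h0, w0)), ('padding', (h1, w1))]
# The entries are undone in reverse order to map the prediction back to the
# original image size.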
def preprocess(img, transforms, trimap=None):
data = {}
data['img'] = img
if trimap is not None:
data['trimap'] = trimap
data['gt_fields'] = ['trimap']
data['trans_info'] = []
data = transforms(data)
data['img'] = paddle.to_tensor(data['img'])
data['img'] = data['img'].unsqueeze(0)
if trimap is not None:
data['trimap'] = paddle.to_tensor(data['trimap'])
data['trimap'] = data['trimap'].unsqueeze((0, 1))
return data
def predict(model,
model_path,
transforms,
image_list,
image_dir=None,
trimap_list=None,
save_dir='output',
fg_estimate=True):
"""
predict and visualize the image_list.
Args:
model (nn.Layer): Used to predict for input image.
model_path (str): The path of pretrained model.
transforms (transforms.Compose): Preprocess for input image.
image_list (list): A list of image path to be predicted.
image_dir (str, optional): The root directory of the images predicted. Default: None.
trimap_list (list, optional): A list of trimap of image_list. Default: None.
save_dir (str, optional): The directory to save the visualized results. Default: 'output'.
"""
utils.utils.load_entire_model(model, model_path)
model.eval()
nranks = paddle.distributed.get_world_size()
local_rank = paddle.distributed.get_rank()
if nranks > 1:
img_lists = partition_list(image_list, nranks)
trimap_lists = partition_list(
trimap_list, nranks) if trimap_list is not None else None
else:
img_lists = [image_list]
trimap_lists = [trimap_list] if trimap_list is not None else None
logger.info("Start to predict...")
progbar_pred = progbar.Progbar(target=len(img_lists[0]), verbose=1)
preprocess_cost_averager = TimeAverager()
infer_cost_averager = TimeAverager()
postprocess_cost_averager = TimeAverager()
batch_start = time.time()
with paddle.no_grad():
for i, im_path in enumerate(img_lists[local_rank]):
preprocess_start = time.time()
trimap = trimap_lists[local_rank][
i] if trimap_list is not None else None
data = preprocess(img=im_path, transforms=transforms, trimap=trimap)
preprocess_cost_averager.record(time.time() - preprocess_start)
infer_start = time.time()
alpha_pred = model(data)
infer_cost_averager.record(time.time() - infer_start)
postprocess_start = time.time()
alpha_pred = reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = (alpha_pred.numpy()).squeeze()
alpha_pred = (alpha_pred * 255).astype('uint8')
# get the saved name
if image_dir is not None:
im_file = im_path.replace(image_dir, '')
else:
im_file = os.path.basename(im_path)
if im_file[0] == '/' or im_file[0] == '\\':
im_file = im_file[1:]
save_path = os.path.join(save_dir, im_file)
mkdir(save_path)
fg = save_result(
alpha_pred,
save_path,
im_path=im_path,
trimap=trimap,
fg_estimate=fg_estimate)
postprocess_cost_averager.record(time.time() - postprocess_start)
preprocess_cost = preprocess_cost_averager.get_average()
infer_cost = infer_cost_averager.get_average()
postprocess_cost = postprocess_cost_averager.get_average()
if local_rank == 0:
progbar_pred.update(i + 1,
[('preprocess_cost', preprocess_cost),
                                     ('infer_cost', infer_cost),
('postprocess_cost', postprocess_cost)])
preprocess_cost_averager.reset()
infer_cost_averager.reset()
postprocess_cost_averager.reset()
return alpha_pred, fg
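# Minimal usage sketch (assumed setup; in practice tools/predict.py builds
# these objects from a config file):
#     model = ppmatting.models.PPMattingV2(...)  # hypothetical constructor args
#     transforms = ppmatting.transforms.Compose([...])
#     alpha, fg = predict(model, 'pretrained_models/ppmattingv2-stdc1-human_512.pdparams',
#                         transforms, image_list=['demo/human.jpg'])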
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
from collections import deque, defaultdict
import pickle
import shutil
import numpy as np
import paddle
import paddle.nn.functional as F
from paddleseg.utils import TimeAverager, calculate_eta, resume, logger, train_profiler
from .val import evaluate
def visual_in_training(log_writer, vis_dict, step):
    """
    Visualize tensors in VisualDL during training.
    Args:
        log_writer (LogWriter): The log writer of vdl.
        vis_dict (dict): Dict of tensors. The shape of each tensor is (C, H, W).
    """
for key, value in vis_dict.items():
value_shape = value.shape
if value_shape[0] not in [1, 3]:
value = value[0]
value = value.unsqueeze(0)
value = paddle.transpose(value, (1, 2, 0))
min_v = paddle.min(value)
max_v = paddle.max(value)
if (min_v > 0) and (max_v < 1):
value = value * 255
elif (min_v < 0 and min_v >= -1) and (max_v <= 1):
value = (1 + value) / 2 * 255
else:
value = (value - min_v) / (max_v - min_v) * 255
value = value.astype('uint8')
value = value.numpy()
log_writer.add_image(tag=key, img=value, step=step)
def save_best(best_model_dir, metrics_data, iter):
with open(os.path.join(best_model_dir, 'best_metrics.txt'), 'w') as f:
for key, value in metrics_data.items():
line = key + ' ' + str(value) + '\n'
f.write(line)
f.write('iter' + ' ' + str(iter) + '\n')
def get_best(best_file, metrics, resume_model=None):
'''Get best metrics and iter from file'''
best_metrics_data = {}
if os.path.exists(best_file) and (resume_model is not None):
values = []
with open(best_file, 'r') as f:
lines = f.readlines()
for line in lines:
line = line.strip()
key, value = line.split(' ')
best_metrics_data[key] = eval(value)
if key == 'iter':
best_iter = eval(value)
else:
for key in metrics:
best_metrics_data[key] = np.inf
best_iter = -1
return best_metrics_data, best_iter
def train(model,
train_dataset,
val_dataset=None,
optimizer=None,
save_dir='output',
iters=10000,
batch_size=2,
resume_model=None,
save_interval=1000,
log_iters=10,
log_image_iters=1000,
num_workers=0,
use_vdl=False,
losses=None,
keep_checkpoint_max=5,
eval_begin_iters=None,
metrics='sad',
precision='fp32',
amp_level='O1',
profiler_options=None):
"""
Launch training.
Args:
model(nn.Layer): A matting model.
train_dataset (paddle.io.Dataset): Used to read and process training datasets.
val_dataset (paddle.io.Dataset, optional): Used to read and process validation datasets.
optimizer (paddle.optimizer.Optimizer): The optimizer.
save_dir (str, optional): The directory for saving the model snapshot. Default: 'output'.
        iters (int, optional): How many iters to train the model. Default: 10000.
batch_size (int, optional): Mini batch size of one gpu or cpu. Default: 2.
resume_model (str, optional): The path of resume model.
save_interval (int, optional): How many iters to save a model snapshot once during training. Default: 1000.
log_iters (int, optional): Display logging information at every log_iters. Default: 10.
log_image_iters (int, optional): Log image to vdl. Default: 1000.
num_workers (int, optional): Num workers for data loader. Default: 0.
use_vdl (bool, optional): Whether to record the data to VisualDL during training. Default: False.
losses (dict, optional): A dict of loss, refer to the loss function of the model for details. Default: None.
keep_checkpoint_max (int, optional): Maximum number of checkpoints to save. Default: 5.
        eval_begin_iters (int, optional): The iteration at which evaluation begins. If None, evaluation begins at iters/2. Default: None.
        metrics (str|list, optional): The metrics to evaluate; any combination of ("sad", "mse", "grad", "conn").
        precision (str, optional): Use AMP if precision='fp16'. If precision='fp32', the training is normal.
        amp_level (str, optional): Auto mixed precision level. Accepted values are 'O1' and 'O2': O1 means mixed precision, where the input data type of each operator is casted according to the white_list and black_list; O2 means pure fp16, where all operator parameters and input data are casted to fp16, except for operators in the black_list, operators without fp16 kernel support, and batch norm. Default: 'O1' (amp).
profiler_options (str, optional): The option of train profiler.
"""
model.train()
nranks = paddle.distributed.ParallelEnv().nranks
local_rank = paddle.distributed.ParallelEnv().local_rank
start_iter = 0
if resume_model is not None:
start_iter = resume(model, optimizer, resume_model)
if not os.path.isdir(save_dir):
if os.path.exists(save_dir):
os.remove(save_dir)
os.makedirs(save_dir)
# Use amp
if precision == 'fp16':
logger.info('use AMP to train. AMP level = {}'.format(amp_level))
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
if amp_level == 'O2':
model, optimizer = paddle.amp.decorate(
models=model,
optimizers=optimizer,
level='O2',
save_dtype='float32')
if nranks > 1:
# Initialize parallel environment if not done.
if not paddle.distributed.parallel.parallel_helper._is_parallel_ctx_initialized(
):
paddle.distributed.init_parallel_env()
ddp_model = paddle.DataParallel(model)
else:
ddp_model = paddle.DataParallel(model)
batch_sampler = paddle.io.DistributedBatchSampler(
train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
loader = paddle.io.DataLoader(
train_dataset,
batch_sampler=batch_sampler,
num_workers=num_workers,
return_list=True, )
if use_vdl:
from visualdl import LogWriter
log_writer = LogWriter(save_dir)
if isinstance(metrics, str):
metrics = [metrics]
elif not isinstance(metrics, list):
metrics = ['sad']
best_metrics_data, best_iter = get_best(
os.path.join(save_dir, 'best_model', 'best_metrics.txt'),
metrics,
resume_model=resume_model)
avg_loss = defaultdict(float)
iters_per_epoch = len(batch_sampler)
reader_cost_averager = TimeAverager()
batch_cost_averager = TimeAverager()
save_models = deque()
batch_start = time.time()
iter = start_iter
while iter < iters:
for data in loader:
iter += 1
if iter > iters:
break
reader_cost_averager.record(time.time() - batch_start)
if precision == 'fp16':
with paddle.amp.auto_cast(
level=amp_level,
enable=True,
custom_white_list={
"elementwise_add", "batch_norm", "sync_batch_norm"
},
custom_black_list={'bilinear_interp_v2', 'pad3d'}):
logit_dict, loss_dict = ddp_model(
data) if nranks > 1 else model(data)
scaled = scaler.scale(loss_dict['all']) # scale the loss
scaled.backward() # do backward
scaler.minimize(optimizer, scaled) # update parameters
else:
logit_dict, loss_dict = ddp_model(
data) if nranks > 1 else model(data)
loss_dict['all'].backward()
optimizer.step()
lr = optimizer.get_lr()
if isinstance(optimizer._learning_rate,
paddle.optimizer.lr.LRScheduler):
optimizer._learning_rate.step()
train_profiler.add_profiler_step(profiler_options)
model.clear_gradients()
for key, value in loss_dict.items():
avg_loss[key] += float(value)
batch_cost_averager.record(
time.time() - batch_start, num_samples=batch_size)
if (iter) % log_iters == 0 and local_rank == 0:
for key, value in avg_loss.items():
avg_loss[key] = value / log_iters
remain_iters = iters - iter
avg_train_batch_cost = batch_cost_averager.get_average()
avg_train_reader_cost = reader_cost_averager.get_average()
eta = calculate_eta(remain_iters, avg_train_batch_cost)
# loss info
                loss_str = ' ' * 26 + '\t[LOSSES]'
for key, value in avg_loss.items():
if key != 'all':
loss_str = loss_str + ' ' + key + '={:.4f}'.format(
value)
logger.info(
"[TRAIN] epoch={}, iter={}/{}, loss={:.4f}, lr={:.6f}, batch_cost={:.4f}, reader_cost={:.5f}, ips={:.4f} samples/sec | ETA {}\n{}\n"
.format((iter - 1) // iters_per_epoch + 1, iter, iters,
avg_loss['all'], lr, avg_train_batch_cost,
avg_train_reader_cost,
batch_cost_averager.get_ips_average(
), eta, loss_str))
if use_vdl:
for key, value in avg_loss.items():
log_tag = 'Train/' + key
log_writer.add_scalar(log_tag, value, iter)
log_writer.add_scalar('Train/lr', lr, iter)
log_writer.add_scalar('Train/batch_cost',
avg_train_batch_cost, iter)
log_writer.add_scalar('Train/reader_cost',
avg_train_reader_cost, iter)
if iter % log_image_iters == 0:
vis_dict = {}
# ground truth
vis_dict['ground truth/img'] = data['img'][0]
for key in data['gt_fields']:
key = key[0]
vis_dict['/'.join(['ground truth', key])] = data[
key][0]
# predict
for key, value in logit_dict.items():
vis_dict['/'.join(['predict', key])] = logit_dict[
key][0]
                        visual_in_training(
                            log_writer=log_writer, vis_dict=vis_dict, step=iter)
for key in avg_loss.keys():
avg_loss[key] = 0.
reader_cost_averager.reset()
batch_cost_averager.reset()
# save model
if (iter % save_interval == 0 or iter == iters) and local_rank == 0:
current_save_dir = os.path.join(save_dir,
"iter_{}".format(iter))
if not os.path.isdir(current_save_dir):
os.makedirs(current_save_dir)
paddle.save(model.state_dict(),
os.path.join(current_save_dir, 'model.pdparams'))
paddle.save(optimizer.state_dict(),
os.path.join(current_save_dir, 'model.pdopt'))
save_models.append(current_save_dir)
if len(save_models) > keep_checkpoint_max > 0:
model_to_remove = save_models.popleft()
shutil.rmtree(model_to_remove)
# eval model
if eval_begin_iters is None:
eval_begin_iters = iters // 2
if (iter % save_interval == 0 or iter == iters) and (
val_dataset is not None
) and local_rank == 0 and iter >= eval_begin_iters:
num_workers = 1 if num_workers > 0 else 0
metrics_data = evaluate(
model,
val_dataset,
                num_workers=num_workers,
print_detail=True,
save_results=False,
metrics=metrics,
precision=precision,
amp_level=amp_level)
model.train()
# save best model and add evaluation results to vdl
if (iter % save_interval == 0 or iter == iters) and local_rank == 0:
if val_dataset is not None and iter >= eval_begin_iters:
if metrics_data[metrics[0]] < best_metrics_data[metrics[0]]:
best_iter = iter
best_metrics_data = metrics_data.copy()
best_model_dir = os.path.join(save_dir, "best_model")
paddle.save(
model.state_dict(),
os.path.join(best_model_dir, 'model.pdparams'))
save_best(best_model_dir, best_metrics_data, iter)
show_list = []
for key, value in best_metrics_data.items():
show_list.append((key, value))
log_str = '[EVAL] The model with the best validation {} ({:.4f}) was saved at iter {}.'.format(
show_list[0][0], show_list[0][1], best_iter)
if len(show_list) > 1:
log_str += " While"
for i in range(1, len(show_list)):
log_str = log_str + ' {}: {:.4f},'.format(
show_list[i][0], show_list[i][1])
log_str = log_str[:-1]
logger.info(log_str)
if use_vdl:
for key, value in metrics_data.items():
log_writer.add_scalar('Evaluate/' + key, value,
iter)
batch_start = time.time()
# Sleep for half a second to let dataloader release resources.
time.sleep(0.5)
if use_vdl:
log_writer.close()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import time
import paddle
import paddle.nn.functional as F
from paddleseg.utils import TimeAverager, calculate_eta, logger, progbar
from ppmatting.metrics import metrics_class_dict
np.set_printoptions(suppress=True)
def save_alpha_pred(alpha, path):
"""
    The value of alpha is in the range [0, 255]; the shape should be [h, w].
"""
dirname = os.path.dirname(path)
if not os.path.exists(dirname):
os.makedirs(dirname)
alpha = (alpha).astype('uint8')
cv2.imwrite(path, alpha)
def reverse_transform(alpha, trans_info):
"""recover pred to origin shape"""
for item in trans_info[::-1]:
if item[0][0] == 'resize':
h, w = item[1][0], item[1][1]
alpha = F.interpolate(alpha, [h, w], mode='bilinear')
elif item[0][0] == 'padding':
h, w = item[1][0], item[1][1]
alpha = alpha[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
def evaluate(model,
eval_dataset,
num_workers=0,
print_detail=True,
save_dir='output/results',
save_results=True,
metrics='sad',
precision='fp32',
amp_level='O1'):
model.eval()
nranks = paddle.distributed.ParallelEnv().nranks
local_rank = paddle.distributed.ParallelEnv().local_rank
if nranks > 1:
# Initialize parallel environment if not done.
if not paddle.distributed.parallel.parallel_helper._is_parallel_ctx_initialized(
):
paddle.distributed.init_parallel_env()
loader = paddle.io.DataLoader(
eval_dataset,
batch_size=1,
drop_last=False,
num_workers=num_workers,
return_list=True, )
total_iters = len(loader)
# Get metric instances and data saving
metrics_ins = {}
metrics_data = {}
if isinstance(metrics, str):
metrics = [metrics]
elif not isinstance(metrics, list):
metrics = ['sad']
for key in metrics:
key = key.lower()
metrics_ins[key] = metrics_class_dict[key]()
metrics_data[key] = None
if print_detail:
logger.info("Start evaluating (total_samples: {}, total_iters: {})...".
format(len(eval_dataset), total_iters))
progbar_val = progbar.Progbar(
target=total_iters, verbose=1 if nranks < 2 else 2)
reader_cost_averager = TimeAverager()
batch_cost_averager = TimeAverager()
batch_start = time.time()
img_name = ''
i = 0
with paddle.no_grad():
for iter, data in enumerate(loader):
reader_cost_averager.record(time.time() - batch_start)
if precision == 'fp16':
with paddle.amp.auto_cast(
level=amp_level,
enable=True,
custom_white_list={
"elementwise_add", "batch_norm", "sync_batch_norm"
},
custom_black_list={'bilinear_interp_v2', 'pad3d'}):
alpha_pred = model(data)
alpha_pred = reverse_transform(alpha_pred,
data['trans_info'])
else:
alpha_pred = model(data)
alpha_pred = reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = alpha_pred.numpy()
alpha_gt = data['alpha'].numpy() * 255
trimap = data.get('ori_trimap')
if trimap is not None:
trimap = trimap.numpy().astype('uint8')
alpha_pred = np.round(alpha_pred * 255)
for key in metrics_ins.keys():
metrics_data[key] = metrics_ins[key].update(alpha_pred,
alpha_gt, trimap)
if save_results:
alpha_pred_one = alpha_pred[0].squeeze()
if trimap is not None:
trimap = trimap.squeeze().astype('uint8')
alpha_pred_one[trimap == 255] = 255
alpha_pred_one[trimap == 0] = 0
save_name = data['img_name'][0]
name, ext = os.path.splitext(save_name)
if save_name == img_name:
save_name = name + '_' + str(i) + ext
i += 1
else:
img_name = save_name
save_name = name + '_' + str(i) + ext
i = 1
save_alpha_pred(alpha_pred_one,
os.path.join(save_dir, save_name))
batch_cost_averager.record(
time.time() - batch_start, num_samples=len(alpha_gt))
batch_cost = batch_cost_averager.get_average()
reader_cost = reader_cost_averager.get_average()
if local_rank == 0 and print_detail:
show_list = [(k, v) for k, v in metrics_data.items()]
show_list = show_list + [('batch_cost', batch_cost),
('reader cost', reader_cost)]
progbar_val.update(iter + 1, show_list)
reader_cost_averager.reset()
batch_cost_averager.reset()
batch_start = time.time()
for key in metrics_ins.keys():
metrics_data[key] = metrics_ins[key].evaluate()
log_str = '[EVAL] '
for key, value in metrics_data.items():
log_str = log_str + key + ': {:.4f}, '.format(value)
log_str = log_str[:-2]
logger.info(log_str)
return metrics_data
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import time
import paddle
import paddle.nn.functional as F
from paddleseg.utils import TimeAverager, calculate_eta, logger, progbar
from ppmatting.metrics import metric
from pymatting.util.util import load_image, save_image, stack_images
from pymatting.foreground.estimate_foreground_ml import estimate_foreground_ml
np.set_printoptions(suppress=True)
def save_alpha_pred(alpha, path):
"""
    The value of alpha is in the range [0, 255]; the shape should be [h, w].
"""
dirname = os.path.dirname(path)
if not os.path.exists(dirname):
os.makedirs(dirname)
alpha = (alpha).astype('uint8')
cv2.imwrite(path, alpha)
def reverse_transform(alpha, trans_info):
"""recover pred to origin shape"""
for item in trans_info[::-1]:
if item[0][0] == 'resize':
h, w = int(item[1][0]), int(item[1][1])
alpha = cv2.resize(alpha, dsize=(w, h))
elif item[0][0] == 'padding':
h, w = int(item[1][0]), int(item[1][1])
alpha = alpha[0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
def evaluate_ml(model,
eval_dataset,
num_workers=0,
print_detail=True,
save_dir='output/results',
save_results=True):
loader = paddle.io.DataLoader(
eval_dataset,
batch_size=1,
drop_last=False,
num_workers=num_workers,
return_list=True, )
total_iters = len(loader)
mse_metric = metric.MSE()
sad_metric = metric.SAD()
grad_metric = metric.Grad()
conn_metric = metric.Conn()
if print_detail:
logger.info("Start evaluating (total_samples: {}, total_iters: {})...".
format(len(eval_dataset), total_iters))
progbar_val = progbar.Progbar(target=total_iters, verbose=1)
reader_cost_averager = TimeAverager()
batch_cost_averager = TimeAverager()
batch_start = time.time()
img_name = ''
i = 0
ignore_cnt = 0
for iter, data in enumerate(loader):
reader_cost_averager.record(time.time() - batch_start)
image_rgb_chw = data['img'].numpy()[0]
image_rgb_hwc = np.transpose(image_rgb_chw, (1, 2, 0))
trimap = data['trimap'].numpy().squeeze() / 255.0
image = image_rgb_hwc * 0.5 + 0.5 # reverse normalize (x/255 - mean) / std
is_fg = trimap >= 0.9
is_bg = trimap <= 0.1
if is_fg.sum() == 0 or is_bg.sum() == 0:
ignore_cnt += 1
            logger.info('Skip sample {}: trimap has no foreground or background.'.format(iter))
continue
alpha_pred = model(image, trimap)
alpha_pred = reverse_transform(alpha_pred, data['trans_info'])
alpha_gt = data['alpha'].numpy().squeeze() * 255
trimap = data['ori_trimap'].numpy().squeeze()
alpha_pred = np.round(alpha_pred * 255)
mse = mse_metric.update(alpha_pred, alpha_gt, trimap)
sad = sad_metric.update(alpha_pred, alpha_gt, trimap)
grad = grad_metric.update(alpha_pred, alpha_gt, trimap)
conn = conn_metric.update(alpha_pred, alpha_gt, trimap)
if sad > 1000:
print(data['img_name'][0])
if save_results:
alpha_pred_one = alpha_pred
alpha_pred_one[trimap == 255] = 255
alpha_pred_one[trimap == 0] = 0
save_name = data['img_name'][0]
name, ext = os.path.splitext(save_name)
if save_name == img_name:
save_name = name + '_' + str(i) + ext
i += 1
else:
img_name = save_name
save_name = name + '_' + str(0) + ext
i = 1
save_alpha_pred(alpha_pred_one, os.path.join(save_dir, save_name))
batch_cost_averager.record(
time.time() - batch_start, num_samples=len(alpha_gt))
batch_cost = batch_cost_averager.get_average()
reader_cost = reader_cost_averager.get_average()
if print_detail:
progbar_val.update(iter + 1,
[('SAD', sad), ('MSE', mse), ('Grad', grad),
('Conn', conn), ('batch_cost', batch_cost),
('reader cost', reader_cost)])
reader_cost_averager.reset()
batch_cost_averager.reset()
batch_start = time.time()
mse = mse_metric.evaluate()
sad = sad_metric.evaluate()
grad = grad_metric.evaluate()
conn = conn_metric.evaluate()
logger.info('[EVAL] SAD: {:.4f}, MSE: {:.4f}, Grad: {:.4f}, Conn: {:.4f}'.
format(sad, mse, grad, conn))
    logger.info('Ignored samples: {}'.format(ignore_cnt))
return sad, mse, grad, conn
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .matting_dataset import MattingDataset
from .composition_1k import Composition1K
from .distinctions_646 import Distinctions646
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import math
import cv2
import numpy as np
import random
import paddle
from paddleseg.cvlibs import manager
import ppmatting.transforms as T
from ppmatting.datasets.matting_dataset import MattingDataset
@manager.DATASETS.add_component
class Composition1K(MattingDataset):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import math
import cv2
import numpy as np
import random
import paddle
from paddleseg.cvlibs import manager
import ppmatting.transforms as T
from ppmatting.datasets.matting_dataset import MattingDataset
@manager.DATASETS.add_component
class Distinctions646(MattingDataset):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import math
import cv2
import numpy as np
import random
import paddle
from paddleseg.cvlibs import manager
import ppmatting.transforms as T
@manager.DATASETS.add_component
class MattingDataset(paddle.io.Dataset):
"""
    Pass in a dataset that conforms to the following format.
matting_dataset/
|--bg/
|
|--train/
| |--fg/
| |--alpha/
|
|--val/
| |--fg/
| |--alpha/
| |--trimap/ (if existing)
|
|--train.txt
|
|--val.txt
    See README.md for more information about the dataset.
    Args:
        dataset_root (str): The root path of the dataset.
        transforms (list): Transforms for the image.
        mode (str, optional): Which part of the dataset to use. It is one of ('train', 'val', 'trainval'). Default: 'train'.
        train_file (str|list, optional): The file list used for training. Each line should be `foreground_image.png background_image.png`
            or `foreground_image.png`. It should be provided if mode is 'train'. Default: None.
        val_file (str|list, optional): The file list used for evaluation. Each line should be `foreground_image.png background_image.png`,
            `foreground_image.png`, or `foreground_image.png background_image.png trimap_image.png`.
            It should be provided if mode is 'val'. Default: None.
        get_trimap (bool, optional): Whether to generate a trimap. Default: True.
        separator (str, optional): The separator used in train_file or val_file. If the file names contain spaces, '|' may be a better choice. Default: ' '.
        key_del (tuple|list, optional): Keys that are not needed will be deleted to speed up the data reader. Default: None.
        if_rssn (bool, optional): Whether to use RSSN when compositing images, including denoising and blurring. Default: False.
"""
def __init__(self,
dataset_root,
transforms,
mode='train',
train_file=None,
val_file=None,
get_trimap=True,
separator=' ',
key_del=None,
if_rssn=False):
super().__init__()
self.dataset_root = dataset_root
self.transforms = T.Compose(transforms)
self.mode = mode
self.get_trimap = get_trimap
self.separator = separator
self.key_del = key_del
self.if_rssn = if_rssn
# check file
if mode == 'train' or mode == 'trainval':
if train_file is None:
raise ValueError(
"When `mode` is 'train' or 'trainval', `train_file must be provided!"
)
if isinstance(train_file, str):
train_file = [train_file]
file_list = train_file
if mode == 'val' or mode == 'trainval':
if val_file is None:
raise ValueError(
"When `mode` is 'val' or 'trainval', `val_file must be provided!"
)
if isinstance(val_file, str):
val_file = [val_file]
file_list = val_file
if mode == 'trainval':
file_list = train_file + val_file
# read file
self.fg_bg_list = []
for file in file_list:
file = os.path.join(dataset_root, file)
with open(file, 'r') as f:
lines = f.readlines()
for line in lines:
line = line.strip()
self.fg_bg_list.append(line)
if mode != 'val':
random.shuffle(self.fg_bg_list)
def __getitem__(self, idx):
data = {}
fg_bg_file = self.fg_bg_list[idx]
fg_bg_file = fg_bg_file.split(self.separator)
        data['img_name'] = fg_bg_file[0]  # used when saving prediction results
fg_file = os.path.join(self.dataset_root, fg_bg_file[0])
alpha_file = fg_file.replace('/fg', '/alpha')
fg = cv2.imread(fg_file)
alpha = cv2.imread(alpha_file, 0)
data['alpha'] = alpha
data['gt_fields'] = []
# line is: fg [bg] [trimap]
if len(fg_bg_file) >= 2:
bg_file = os.path.join(self.dataset_root, fg_bg_file[1])
bg = cv2.imread(bg_file)
data['img'], data['fg'], data['bg'] = self.composite(fg, alpha, bg)
if self.mode in ['train', 'trainval']:
data['gt_fields'].append('fg')
data['gt_fields'].append('bg')
data['gt_fields'].append('alpha')
if len(fg_bg_file) == 3 and self.get_trimap:
if self.mode == 'val':
trimap_path = os.path.join(self.dataset_root, fg_bg_file[2])
if os.path.exists(trimap_path):
data['trimap'] = trimap_path
data['gt_fields'].append('trimap')
data['ori_trimap'] = cv2.imread(trimap_path, 0)
else:
raise FileNotFoundError(
'trimap is not Found: {}'.format(fg_bg_file[2]))
else:
data['img'] = fg
if self.mode in ['train', 'trainval']:
data['fg'] = fg.copy()
data['bg'] = fg.copy()
data['gt_fields'].append('fg')
data['gt_fields'].append('bg')
data['gt_fields'].append('alpha')
data['trans_info'] = [] # Record shape change information
# Generate trimap from alpha if no trimap file provided
if self.get_trimap:
if 'trimap' not in data:
data['trimap'] = self.gen_trimap(
data['alpha'], mode=self.mode).astype('float32')
data['gt_fields'].append('trimap')
if self.mode == 'val':
data['ori_trimap'] = data['trimap'].copy()
        # Delete keys that are not needed
if self.key_del is not None:
for key in self.key_del:
if key in data.keys():
data.pop(key)
if key in data['gt_fields']:
data['gt_fields'].remove(key)
data = self.transforms(data)
        # During evaluation, gt should not be transformed.
if self.mode == 'val':
data['gt_fields'].append('alpha')
data['img'] = data['img'].astype('float32')
for key in data.get('gt_fields', []):
data[key] = data[key].astype('float32')
if 'trimap' in data:
data['trimap'] = data['trimap'][np.newaxis, :, :]
if 'ori_trimap' in data:
data['ori_trimap'] = data['ori_trimap'][np.newaxis, :, :]
data['alpha'] = data['alpha'][np.newaxis, :, :] / 255.
return data
def __len__(self):
return len(self.fg_bg_list)
def composite(self, fg, alpha, ori_bg):
if self.if_rssn:
if np.random.rand() < 0.5:
fg = cv2.fastNlMeansDenoisingColored(fg, None, 3, 3, 7, 21)
ori_bg = cv2.fastNlMeansDenoisingColored(ori_bg, None, 3, 3, 7,
21)
if np.random.rand() < 0.5:
radius = np.random.choice([19, 29, 39, 49, 59])
ori_bg = cv2.GaussianBlur(ori_bg, (radius, radius), 0, 0)
fg_h, fg_w = fg.shape[:2]
ori_bg_h, ori_bg_w = ori_bg.shape[:2]
wratio = fg_w / ori_bg_w
hratio = fg_h / ori_bg_h
ratio = wratio if wratio > hratio else hratio
# Resize ori_bg if it is smaller than fg.
if ratio > 1:
resize_h = math.ceil(ori_bg_h * ratio)
resize_w = math.ceil(ori_bg_w * ratio)
bg = cv2.resize(
ori_bg, (resize_w, resize_h), interpolation=cv2.INTER_LINEAR)
else:
bg = ori_bg
bg = bg[0:fg_h, 0:fg_w, :]
alpha = alpha / 255
alpha = np.expand_dims(alpha, axis=2)
image = alpha * fg + (1 - alpha) * bg
image = image.astype(np.uint8)
return image, fg, bg
@staticmethod
def gen_trimap(alpha, mode='train', eval_kernel=25):
if mode == 'train':
k_size = random.choice(range(2, 5))
iterations = np.random.randint(5, 15)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
(k_size, k_size))
dilated = cv2.dilate(alpha, kernel, iterations=iterations)
eroded = cv2.erode(alpha, kernel, iterations=iterations)
trimap = np.zeros(alpha.shape)
trimap.fill(128)
trimap[eroded > 254.5] = 255
trimap[dilated < 0.5] = 0
else:
k_size = eval_kernel
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
(k_size, k_size))
dilated = cv2.dilate(alpha, kernel)
eroded = cv2.erode(alpha, kernel)
trimap = np.zeros(alpha.shape)
trimap.fill(128)
trimap[eroded > 254.5] = 255
trimap[dilated < 0.5] = 0
return trimap
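# Illustrative: the generated trimap takes values in {0, 128, 255}: 0 for
# definite background, 255 for definite foreground, and 128 for the unknown
# band created by eroding/dilating the alpha matte.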
from .metric import MSE, SAD, Grad, Conn
metrics_class_dict = {'sad': SAD, 'mse': MSE, 'grad': Grad, 'conn': Conn}
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Grad and Conn refer to https://github.com/yucornetto/MGMatting/blob/main/code-base/utils/evaluate.py
# The output of `Grad` is slightly different from the MATLAB version provided by Adobe (less than 0.1%).
# The output of `Conn` is smaller than the MATLAB version (~5%; MATLAB may use a different algorithm).
# So do not report results calculated by these functions in your paper.
# Evaluate your inference with the MATLAB file `DIM_evaluation_code/evaluate.m`.
import cv2
import numpy as np
from scipy.ndimage.filters import convolve
from scipy.special import gamma
from skimage.measure import label
class MSE:
"""
Only calculate the unknown region if trimap provided.
"""
def __init__(self):
self.mse_diffs = 0
self.count = 0
def update(self, pred, gt, trimap=None):
"""
update metric.
Args:
pred (np.ndarray): The value range is [0., 255.].
gt (np.ndarray): The value range is [0, 255].
            trimap (np.ndarray, optional): The value is in {0, 128, 255}. Default: None.
"""
if trimap is None:
trimap = np.ones_like(gt) * 128
if not (pred.shape == gt.shape == trimap.shape):
raise ValueError(
'The shape of `pred`, `gt` and `trimap` should be equal. '
'but they are {}, {} and {}'.format(pred.shape, gt.shape,
trimap.shape))
pred[trimap == 0] = 0
pred[trimap == 255] = 255
mask = trimap == 128
pixels = float(mask.sum())
pred = pred / 255.
gt = gt / 255.
diff = (pred - gt) * mask
mse_diff = (diff**2).sum() / pixels if pixels > 0 else 0
self.mse_diffs += mse_diff
self.count += 1
return mse_diff
def evaluate(self):
mse = self.mse_diffs / self.count if self.count > 0 else 0
return mse
class SAD:
"""
Only calculate the unknown region if trimap provided.
"""
def __init__(self):
self.sad_diffs = 0
self.count = 0
def update(self, pred, gt, trimap=None):
"""
update metric.
Args:
pred (np.ndarray): The value range is [0., 255.].
gt (np.ndarray): The value range is [0., 255.].
            trimap (np.ndarray, optional): The value is in {0, 128, 255}. Default: None.
"""
if trimap is None:
trimap = np.ones_like(gt) * 128
if not (pred.shape == gt.shape == trimap.shape):
raise ValueError(
'The shape of `pred`, `gt` and `trimap` should be equal. '
'but they are {}, {} and {}'.format(pred.shape, gt.shape,
trimap.shape))
pred[trimap == 0] = 0
pred[trimap == 255] = 255
mask = trimap == 128
pred = pred / 255.
gt = gt / 255.
diff = (pred - gt) * mask
sad_diff = (np.abs(diff)).sum()
sad_diff /= 1000
self.sad_diffs += sad_diff
self.count += 1
return sad_diff
def evaluate(self):
sad = self.sad_diffs / self.count if self.count > 0 else 0
return sad
class Grad:
"""
Only calculate the unknown region if trimap provided.
    Refer to: https://github.com/open-mmlab/mmediting/blob/master/mmedit/core/evaluation/metrics.py
"""
def __init__(self):
self.grad_diffs = 0
self.count = 0
def gaussian(self, x, sigma):
return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
def dgaussian(self, x, sigma):
return -x * self.gaussian(x, sigma) / sigma**2
def gauss_filter(self, sigma, epsilon=1e-2):
half_size = np.ceil(
sigma * np.sqrt(-2 * np.log(np.sqrt(2 * np.pi) * sigma * epsilon)))
size = int(2 * half_size + 1)
# create filter in x axis
filter_x = np.zeros((size, size))
for i in range(size):
for j in range(size):
filter_x[i, j] = self.gaussian(
i - half_size, sigma) * self.dgaussian(j - half_size, sigma)
# normalize filter
norm = np.sqrt((filter_x**2).sum())
filter_x = filter_x / norm
filter_y = np.transpose(filter_x)
return filter_x, filter_y
def gauss_gradient(self, img, sigma):
filter_x, filter_y = self.gauss_filter(sigma)
img_filtered_x = cv2.filter2D(
img, -1, filter_x, borderType=cv2.BORDER_REPLICATE)
img_filtered_y = cv2.filter2D(
img, -1, filter_y, borderType=cv2.BORDER_REPLICATE)
return np.sqrt(img_filtered_x**2 + img_filtered_y**2)
def update(self, pred, gt, trimap=None, sigma=1.4):
"""
update metric.
Args:
            pred (np.ndarray): The value range is [0., 255.].
            gt (np.ndarray): The value range is [0, 255].
            trimap (np.ndarray, optional): The value is in {0, 128, 255}. Default: None.
sigma (float, optional): Standard deviation of the gaussian kernel. Default: 1.4.
"""
if trimap is None:
trimap = np.ones_like(gt) * 128
if not (pred.shape == gt.shape == trimap.shape):
raise ValueError(
'The shape of `pred`, `gt` and `trimap` should be equal. '
'but they are {}, {} and {}'.format(pred.shape, gt.shape,
trimap.shape))
pred[trimap == 0] = 0
pred[trimap == 255] = 255
gt = gt.squeeze()
pred = pred.squeeze()
gt = gt.astype(np.float64)
pred = pred.astype(np.float64)
gt_normed = np.zeros_like(gt)
pred_normed = np.zeros_like(pred)
cv2.normalize(gt, gt_normed, 1., 0., cv2.NORM_MINMAX)
cv2.normalize(pred, pred_normed, 1., 0., cv2.NORM_MINMAX)
gt_grad = self.gauss_gradient(gt_normed, sigma).astype(np.float32)
pred_grad = self.gauss_gradient(pred_normed, sigma).astype(np.float32)
grad_diff = ((gt_grad - pred_grad)**2 * (trimap == 128)).sum()
grad_diff /= 1000
self.grad_diffs += grad_diff
self.count += 1
return grad_diff
def evaluate(self):
grad = self.grad_diffs / self.count if self.count > 0 else 0
return grad
class Conn:
"""
Only calculate the unknown region if trimap provided.
    Refer to: https://github.com/open-mmlab/mmediting/blob/master/mmedit/core/evaluation/metrics.py
"""
def __init__(self):
self.conn_diffs = 0
self.count = 0
def update(self, pred, gt, trimap=None, step=0.1):
"""
update metric.
Args:
            pred (np.ndarray): The value range is [0., 255.].
            gt (np.ndarray): The value range is [0, 255].
            trimap (np.ndarray, optional): The value is in {0, 128, 255}. Default: None.
step (float, optional): Step of threshold when computing intersection between
`gt` and `pred`. Default: 0.1.
"""
if trimap is None:
trimap = np.ones_like(gt) * 128
if not (pred.shape == gt.shape == trimap.shape):
raise ValueError(
'The shape of `pred`, `gt` and `trimap` should be equal. '
'but they are {}, {} and {}'.format(pred.shape, gt.shape,
trimap.shape))
pred[trimap == 0] = 0
pred[trimap == 255] = 255
gt = gt.squeeze()
pred = pred.squeeze()
gt = gt.astype(np.float32) / 255
pred = pred.astype(np.float32) / 255
thresh_steps = np.arange(0, 1 + step, step)
round_down_map = -np.ones_like(gt)
for i in range(1, len(thresh_steps)):
gt_thresh = gt >= thresh_steps[i]
pred_thresh = pred >= thresh_steps[i]
intersection = (gt_thresh & pred_thresh).astype(np.uint8)
# connected components
_, output, stats, _ = cv2.connectedComponentsWithStats(
intersection, connectivity=4)
# start from 1 in dim 0 to exclude background
size = stats[1:, -1]
# largest connected component of the intersection
omega = np.zeros_like(gt)
if len(size) != 0:
max_id = np.argmax(size)
# plus one to include background
omega[output == max_id + 1] = 1
mask = (round_down_map == -1) & (omega == 0)
round_down_map[mask] = thresh_steps[i - 1]
round_down_map[round_down_map == -1] = 1
gt_diff = gt - round_down_map
pred_diff = pred - round_down_map
# only calculate difference larger than or equal to 0.15
gt_phi = 1 - gt_diff * (gt_diff >= 0.15)
pred_phi = 1 - pred_diff * (pred_diff >= 0.15)
conn_diff = np.sum(np.abs(gt_phi - pred_phi) * (trimap == 128))
conn_diff /= 1000
self.conn_diffs += conn_diff
self.count += 1
return conn_diff
def evaluate(self):
conn = self.conn_diffs / self.count if self.count > 0 else 0
return conn
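# Minimal usage sketch (hypothetical arrays): every metric accumulates over
# update() calls and evaluate() returns the average across images.
#     sad = SAD()
#     sad.update(pred, gt, trimap)  # pred/gt in [0, 255], trimap in {0, 128, 255}
#     mean_sad = sad.evaluate()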
from .methods import CloseFormMatting, KNNMatting, LearningBasedMatting, FastMatting, RandomWalksMatting
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import pymatting
from paddleseg.cvlibs import manager
class BaseMLMatting(object):
def __init__(self, alpha_estimator, **kargs):
self.alpha_estimator = alpha_estimator
self.kargs = kargs
def __call__(self, image, trimap):
image = self.__to_float64(image)
trimap = self.__to_float64(trimap)
alpha_matte = self.alpha_estimator(image, trimap, **self.kargs)
return alpha_matte
def __to_float64(self, x):
x_dtype = x.dtype
assert x_dtype in ["float32", "float64"]
x = x.astype("float64")
return x
@manager.MODELS.add_component
class CloseFormMatting(BaseMLMatting):
def __init__(self, **kargs):
cf_alpha_estimator = pymatting.estimate_alpha_cf
super().__init__(cf_alpha_estimator, **kargs)
@manager.MODELS.add_component
class KNNMatting(BaseMLMatting):
def __init__(self, **kargs):
knn_alpha_estimator = pymatting.estimate_alpha_knn
super().__init__(knn_alpha_estimator, **kargs)
@manager.MODELS.add_component
class LearningBasedMatting(BaseMLMatting):
def __init__(self, **kargs):
lbdm_alpha_estimator = pymatting.estimate_alpha_lbdm
super().__init__(lbdm_alpha_estimator, **kargs)
@manager.MODELS.add_component
class FastMatting(BaseMLMatting):
def __init__(self, **kargs):
lkm_alpha_estimator = pymatting.estimate_alpha_lkm
super().__init__(lkm_alpha_estimator, **kargs)
@manager.MODELS.add_component
class RandomWalksMatting(BaseMLMatting):
def __init__(self, **kargs):
rw_alpha_estimator = pymatting.estimate_alpha_rw
super().__init__(rw_alpha_estimator, **kargs)
if __name__ == "__main__":
from pymatting.util.util import load_image, save_image, stack_images
from pymatting.foreground.estimate_foreground_ml import estimate_foreground_ml
import cv2
root = "/mnt/liuyi22/PaddlePaddle/PaddleSeg/Matting/data/examples/"
image_path = root + "lemur.png"
trimap_path = root + "lemur_trimap.png"
cutout_path = root + "lemur_cutout.png"
image = cv2.cvtColor(
cv2.imread(image_path).astype("float64"), cv2.COLOR_BGR2RGB) / 255.0
cv2.imwrite("image.png", (image * 255).astype('uint8'))
trimap = load_image(trimap_path, "GRAY")
print(image.shape, trimap.shape)
print(image.dtype, trimap.dtype)
cf = CloseFormMatting()
alpha = cf(image, trimap)
# alpha = pymatting.estimate_alpha_lkm(image, trimap)
foreground = estimate_foreground_ml(image, alpha)
cutout = stack_images(foreground, alpha)
save_image(cutout_path, cutout)
from .backbone import *
from .losses import *
from .modnet import MODNet
from .human_matting import HumanMatting
from .dim import DIM
from .ppmatting import PPMatting
from .gca import GCABaseline, GCA
from .ppmattingv2 import PPMattingV2
from .mobilenet_v2 import *
from .hrnet import *
from .resnet_vd import *
from .vgg import *
from .gca_enc import *
from .stdcnet import *
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# The gca code was heavily based on https://github.com/Yaoyi-Li/GCA-Matting
# and https://github.com/open-mmlab/mmediting
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.cvlibs import manager, param_init
from paddleseg.utils import utils
from ppmatting.models.layers import GuidedCxtAtten
class ResNet_D(nn.Layer):
def __init__(self,
input_channels,
layers,
late_downsample=False,
pretrained=None):
super().__init__()
self.pretrained = pretrained
self._norm_layer = nn.BatchNorm
self.inplanes = 64
self.late_downsample = late_downsample
self.midplanes = 64 if late_downsample else 32
self.start_stride = [1, 2, 1, 2] if late_downsample else [2, 1, 2, 1]
self.conv1 = nn.utils.spectral_norm(
nn.Conv2D(
input_channels,
32,
kernel_size=3,
stride=self.start_stride[0],
padding=1,
bias_attr=False))
self.conv2 = nn.utils.spectral_norm(
nn.Conv2D(
32,
self.midplanes,
kernel_size=3,
stride=self.start_stride[1],
padding=1,
bias_attr=False))
self.conv3 = nn.utils.spectral_norm(
nn.Conv2D(
self.midplanes,
self.inplanes,
kernel_size=3,
stride=self.start_stride[2],
padding=1,
bias_attr=False))
self.bn1 = self._norm_layer(32)
self.bn2 = self._norm_layer(self.midplanes)
self.bn3 = self._norm_layer(self.inplanes)
self.activation = nn.ReLU()
self.layer1 = self._make_layer(
BasicBlock, 64, layers[0], stride=self.start_stride[3])
self.layer2 = self._make_layer(BasicBlock, 128, layers[1], stride=2)
self.layer3 = self._make_layer(BasicBlock, 256, layers[2], stride=2)
self.layer_bottleneck = self._make_layer(
BasicBlock, 512, layers[3], stride=2)
self.init_weight()
def _make_layer(self, block, planes, block_num, stride=1):
if block_num == 0:
return nn.Sequential(nn.Identity())
norm_layer = self._norm_layer
downsample = None
if stride != 1:
downsample = nn.Sequential(
nn.AvgPool2D(2, stride),
nn.utils.spectral_norm(
conv1x1(self.inplanes, planes * block.expansion)),
norm_layer(planes * block.expansion), )
elif self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.utils.spectral_norm(
conv1x1(self.inplanes, planes * block.expansion, stride)),
norm_layer(planes * block.expansion), )
layers = [block(self.inplanes, planes, stride, downsample, norm_layer)]
self.inplanes = planes * block.expansion
for _ in range(1, block_num):
layers.append(block(self.inplanes, planes, norm_layer=norm_layer))
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.activation(x)
x = self.conv2(x)
x = self.bn2(x)
x1 = self.activation(x) # N x 32 x 256 x 256
x = self.conv3(x1)
x = self.bn3(x)
x2 = self.activation(x) # N x 64 x 128 x 128
x3 = self.layer1(x2) # N x 64 x 128 x 128
x4 = self.layer2(x3) # N x 128 x 64 x 64
x5 = self.layer3(x4) # N x 256 x 32 x 32
x = self.layer_bottleneck(x5) # N x 512 x 16 x 16
return x, (x1, x2, x3, x4, x5)
def init_weight(self):
for layer in self.sublayers():
if isinstance(layer, nn.Conv2D):
if hasattr(layer, "weight_orig"):
param = layer.weight_orig
else:
param = layer.weight
param_init.xavier_uniform(param)
elif isinstance(layer, (nn.BatchNorm, nn.SyncBatchNorm)):
param_init.constant_init(layer.weight, value=1.0)
param_init.constant_init(layer.bias, value=0.0)
elif isinstance(layer, BasicBlock):
param_init.constant_init(layer.bn2.weight, value=0.0)
if self.pretrained is not None:
utils.load_pretrained_model(self, self.pretrained)
@manager.MODELS.add_component
class ResShortCut_D(ResNet_D):
def __init__(self,
input_channels,
layers,
late_downsample=False,
pretrained=None):
super().__init__(
input_channels,
layers,
late_downsample=late_downsample,
pretrained=pretrained)
self.shortcut_inplane = [input_channels, self.midplanes, 64, 128, 256]
self.shortcut_plane = [32, self.midplanes, 64, 128, 256]
self.shortcut = nn.LayerList()
for stage, inplane in enumerate(self.shortcut_inplane):
self.shortcut.append(
self._make_shortcut(inplane, self.shortcut_plane[stage]))
def _make_shortcut(self, inplane, planes):
return nn.Sequential(
nn.utils.spectral_norm(
nn.Conv2D(
inplane, planes, kernel_size=3, padding=1,
bias_attr=False)),
nn.ReLU(),
self._norm_layer(planes),
nn.utils.spectral_norm(
nn.Conv2D(
planes, planes, kernel_size=3, padding=1, bias_attr=False)),
nn.ReLU(),
self._norm_layer(planes))
def forward(self, x):
out = self.conv1(x)
out = self.bn1(out)
out = self.activation(out)
out = self.conv2(out)
out = self.bn2(out)
x1 = self.activation(out) # N x 32 x 256 x 256
out = self.conv3(x1)
out = self.bn3(out)
out = self.activation(out)
x2 = self.layer1(out) # N x 64 x 128 x 128
x3 = self.layer2(x2) # N x 128 x 64 x 64
x4 = self.layer3(x3) # N x 256 x 32 x 32
out = self.layer_bottleneck(x4) # N x 512 x 16 x 16
fea1 = self.shortcut[0](x) # input image and trimap
fea2 = self.shortcut[1](x1)
fea3 = self.shortcut[2](x2)
fea4 = self.shortcut[3](x3)
fea5 = self.shortcut[4](x4)
return out, {
'shortcut': (fea1, fea2, fea3, fea4, fea5),
'image': x[:, :3, ...]
}
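# ResGuidedCxtAtten adds a guidance head (three stride-2 convs, i.e. an 8x
# downsample of the RGB input) and a GuidedCxtAtten module applied at the
# 1/8-resolution stage, letting features flow from known trimap regions into
# the unknown region before the deeper layers.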
@manager.MODELS.add_component
class ResGuidedCxtAtten(ResNet_D):
def __init__(self,
input_channels,
layers,
late_downsample=False,
pretrained=None):
super().__init__(
input_channels,
layers,
late_downsample=late_downsample,
pretrained=pretrained)
self.input_channels = input_channels
self.shortcut_inplane = [input_channels, self.midplanes, 64, 128, 256]
self.shortcut_plane = [32, self.midplanes, 64, 128, 256]
self.shortcut = nn.LayerList()
for stage, inplane in enumerate(self.shortcut_inplane):
self.shortcut.append(
self._make_shortcut(inplane, self.shortcut_plane[stage]))
self.guidance_head = nn.Sequential(
nn.Pad2D(
1, mode="reflect"),
nn.utils.spectral_norm(
nn.Conv2D(
3, 16, kernel_size=3, padding=0, stride=2,
bias_attr=False)),
nn.ReLU(),
self._norm_layer(16),
nn.Pad2D(
1, mode="reflect"),
nn.utils.spectral_norm(
nn.Conv2D(
16, 32, kernel_size=3, padding=0, stride=2,
bias_attr=False)),
nn.ReLU(),
self._norm_layer(32),
nn.Pad2D(
1, mode="reflect"),
nn.utils.spectral_norm(
nn.Conv2D(
32,
128,
kernel_size=3,
padding=0,
stride=2,
bias_attr=False)),
nn.ReLU(),
self._norm_layer(128))
self.gca = GuidedCxtAtten(128, 128)
self.init_weight()
def init_weight(self):
for layer in self.sublayers():
if isinstance(layer, nn.Conv2D):
initializer = nn.initializer.XavierUniform()
if hasattr(layer, "weight_orig"):
param = layer.weight_orig
else:
param = layer.weight
initializer(param, param.block)
elif isinstance(layer, (nn.BatchNorm, nn.SyncBatchNorm)):
param_init.constant_init(layer.weight, value=1.0)
param_init.constant_init(layer.bias, value=0.0)
elif isinstance(layer, BasicBlock):
param_init.constant_init(layer.bn2.weight, value=0.0)
if self.pretrained is not None:
utils.load_pretrained_model(self, self.pretrained)
def _make_shortcut(self, inplane, planes):
return nn.Sequential(
nn.utils.spectral_norm(
nn.Conv2D(
inplane, planes, kernel_size=3, padding=1,
bias_attr=False)),
nn.ReLU(),
self._norm_layer(planes),
nn.utils.spectral_norm(
nn.Conv2D(
planes, planes, kernel_size=3, padding=1, bias_attr=False)),
nn.ReLU(),
self._norm_layer(planes))
def forward(self, x):
out = self.conv1(x)
out = self.bn1(out)
out = self.activation(out)
out = self.conv2(out)
out = self.bn2(out)
x1 = self.activation(out) # N x 32 x 256 x 256
out = self.conv3(x1)
out = self.bn3(out)
out = self.activation(out)
im_fea = self.guidance_head(
x[:, :3, ...])  # downsample the original image and extract guidance features
if self.input_channels == 6:
unknown = F.interpolate(
x[:, 4:5, ...], scale_factor=1 / 8, mode='nearest')
else:
unknown = x[:, 3:, ...].equal(paddle.to_tensor([1.]))
unknown = paddle.cast(unknown, dtype='float32')
unknown = F.interpolate(unknown, scale_factor=1 / 8, mode='nearest')
x2 = self.layer1(out) # N x 64 x 128 x 128
x3 = self.layer2(x2) # N x 128 x 64 x 64
x3 = self.gca(im_fea, x3, unknown) # contextual attention
x4 = self.layer3(x3) # N x 256 x 32 x 32
out = self.layer_bottleneck(x4) # N x 512 x 16 x 16
fea1 = self.shortcut[0](x) # input image and trimap
fea2 = self.shortcut[1](x1)
fea3 = self.shortcut[2](x2)
fea4 = self.shortcut[3](x3)
fea5 = self.shortcut[4](x4)
return out, {
'shortcut': (fea1, fea2, fea3, fea4, fea5),
'image_fea': im_fea,
'unknown': unknown,
}
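# Standard residual basic block with spectral normalization on both 3x3
# convolutions; `downsample`, when provided, projects the identity branch to
# the output shape.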
class BasicBlock(nn.Layer):
expansion = 1
def __init__(self,
inplanes,
planes,
stride=1,
downsample=None,
norm_layer=None):
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm
# Both self.conv1 and self.downsample layers downsample the input when stride != 1
self.conv1 = nn.utils.spectral_norm(conv3x3(inplanes, planes, stride))
self.bn1 = norm_layer(planes)
self.activation = nn.ReLU()
self.conv2 = nn.utils.spectral_norm(conv3x3(planes, planes))
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.activation(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.activation(out)
return out
def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
"""3x3 convolution with padding"""
return nn.Conv2D(
in_planes,
out_planes,
kernel_size=3,
stride=stride,
padding=dilation,
groups=groups,
bias_attr=False,
dilation=dilation)
def conv1x1(in_planes, out_planes, stride=1):
"""1x1 convolution"""
return nn.Conv2D(
in_planes, out_planes, kernel_size=1, stride=stride, bias_attr=False)
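# Illustrative usage sketch (not part of the original file; the `layers`
# spec is an assumed ResNet34-D-style configuration):
#
#     import paddle
#     encoder = ResShortCut_D(input_channels=6, layers=[3, 4, 4, 2])
#     x = paddle.randn([1, 6, 512, 512])  # RGB image + one-hot trimap
#     out, mid_fea = encoder(x)           # out: N x 512 x 16 x 16 bottleneck
#     shortcuts = mid_fea['shortcut']     # five feature maps for the decoder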
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.cvlibs import manager, param_init
from paddleseg.models import layers
import ppmatting
__all__ = [
"HRNet_W18_Small_V1", "HRNet_W18_Small_V2", "HRNet_W18", "HRNet_W30",
"HRNet_W32", "HRNet_W40", "HRNet_W44", "HRNet_W48", "HRNet_W60", "HRNet_W64"
]
class HRNet(nn.Layer):
"""
The HRNet implementation based on PaddlePaddle.
The original article refers to
Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
(https://arxiv.org/pdf/1908.07919.pdf).
Args:
input_channels (int, optional): Number of channels of the input image. Default 3.
pretrained (str, optional): The path of the pretrained model.
stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4, ).
stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64, ).
stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4).
stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36).
stage3_num_modules (int, optional): Number of modules for stage3. Default 4.
stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4).
stage3_num_channels (list, optional): Number of channels per branch for stage3. Default (18, 36, 72).
stage4_num_modules (int, optional): Number of modules for stage4. Default 3.
stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4).
stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72, 144).
has_se (bool, optional): Whether to use the Squeeze-and-Excitation module. Default False.
align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
padding_same (bool, optional): Whether to use 'same' padding for the 3x3 convolutions. Default True.
"""
def __init__(self,
input_channels=3,
pretrained=None,
stage1_num_modules=1,
stage1_num_blocks=(4, ),
stage1_num_channels=(64, ),
stage2_num_modules=1,
stage2_num_blocks=(4, 4),
stage2_num_channels=(18, 36),
stage3_num_modules=4,
stage3_num_blocks=(4, 4, 4),
stage3_num_channels=(18, 36, 72),
stage4_num_modules=3,
stage4_num_blocks=(4, 4, 4, 4),
stage4_num_channels=(18, 36, 72, 144),
has_se=False,
align_corners=False,
padding_same=True):
super(HRNet, self).__init__()
self.pretrained = pretrained
self.stage1_num_modules = stage1_num_modules
self.stage1_num_blocks = stage1_num_blocks
self.stage1_num_channels = stage1_num_channels
self.stage2_num_modules = stage2_num_modules
self.stage2_num_blocks = stage2_num_blocks
self.stage2_num_channels = stage2_num_channels
self.stage3_num_modules = stage3_num_modules
self.stage3_num_blocks = stage3_num_blocks
self.stage3_num_channels = stage3_num_channels
self.stage4_num_modules = stage4_num_modules
self.stage4_num_blocks = stage4_num_blocks
self.stage4_num_channels = stage4_num_channels
self.has_se = has_se
self.align_corners = align_corners
self.feat_channels = [64] + list(stage4_num_channels)
self.conv_layer1_1 = layers.ConvBNReLU(
in_channels=input_channels,
out_channels=64,
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False)
self.conv_layer1_2 = layers.ConvBNReLU(
in_channels=64,
out_channels=64,
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False)
self.la1 = Layer1(
num_channels=64,
num_blocks=self.stage1_num_blocks[0],
num_filters=self.stage1_num_channels[0],
has_se=has_se,
name="layer2",
padding_same=padding_same)
self.tr1 = TransitionLayer(
in_channels=[self.stage1_num_channels[0] * 4],
out_channels=self.stage2_num_channels,
name="tr1",
padding_same=padding_same)
self.st2 = Stage(
num_channels=self.stage2_num_channels,
num_modules=self.stage2_num_modules,
num_blocks=self.stage2_num_blocks,
num_filters=self.stage2_num_channels,
has_se=self.has_se,
name="st2",
align_corners=align_corners,
padding_same=padding_same)
self.tr2 = TransitionLayer(
in_channels=self.stage2_num_channels,
out_channels=self.stage3_num_channels,
name="tr2",
padding_same=padding_same)
self.st3 = Stage(
num_channels=self.stage3_num_channels,
num_modules=self.stage3_num_modules,
num_blocks=self.stage3_num_blocks,
num_filters=self.stage3_num_channels,
has_se=self.has_se,
name="st3",
align_corners=align_corners,
padding_same=padding_same)
self.tr3 = TransitionLayer(
in_channels=self.stage3_num_channels,
out_channels=self.stage4_num_channels,
name="tr3",
padding_same=padding_same)
self.st4 = Stage(
num_channels=self.stage4_num_channels,
num_modules=self.stage4_num_modules,
num_blocks=self.stage4_num_blocks,
num_filters=self.stage4_num_channels,
has_se=self.has_se,
name="st4",
align_corners=align_corners,
padding_same=padding_same)
self.init_weight()
def forward(self, x):
feat_list = []
conv1 = self.conv_layer1_1(x)
feat_list.append(conv1)
conv2 = self.conv_layer1_2(conv1)
la1 = self.la1(conv2)
tr1 = self.tr1([la1])
st2 = self.st2(tr1)
tr2 = self.tr2(st2)
st3 = self.st3(tr2)
tr3 = self.tr3(st3)
st4 = self.st4(tr3)
feat_list = feat_list + st4
return feat_list
def init_weight(self):
for layer in self.sublayers():
if isinstance(layer, nn.Conv2D):
param_init.normal_init(layer.weight, std=0.001)
elif isinstance(layer, (nn.BatchNorm, nn.SyncBatchNorm)):
param_init.constant_init(layer.weight, value=1.0)
param_init.constant_init(layer.bias, value=0.0)
if self.pretrained is not None:
ppmatting.utils.load_pretrained_model(self, self.pretrained)
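# Layer1 is the HRNet stem stage: a single-resolution stack of
# BottleneckBlocks at 1/4 scale; only the first block uses a projection
# shortcut (downsample=True) to widen the channels to num_filters * 4.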
class Layer1(nn.Layer):
def __init__(self,
num_channels,
num_filters,
num_blocks,
has_se=False,
name=None,
padding_same=True):
super(Layer1, self).__init__()
self.bottleneck_block_list = []
for i in range(num_blocks):
bottleneck_block = self.add_sublayer(
"bb_{}_{}".format(name, i + 1),
BottleneckBlock(
num_channels=num_channels if i == 0 else num_filters * 4,
num_filters=num_filters,
has_se=has_se,
stride=1,
downsample=True if i == 0 else False,
name=name + '_' + str(i + 1),
padding_same=padding_same))
self.bottleneck_block_list.append(bottleneck_block)
def forward(self, x):
conv = x
for block_func in self.bottleneck_block_list:
conv = block_func(conv)
return conv
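# TransitionLayer adapts the branches between stages: an existing branch gets
# a 3x3 conv only if its channel width changes (otherwise it passes through
# unchanged), and each newly created lower-resolution branch is produced by a
# stride-2 3x3 conv from the last existing branch.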
class TransitionLayer(nn.Layer):
def __init__(self, in_channels, out_channels, name=None, padding_same=True):
super(TransitionLayer, self).__init__()
num_in = len(in_channels)
num_out = len(out_channels)
self.conv_bn_func_list = []
for i in range(num_out):
residual = None
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self.add_sublayer(
"transition_{}_layer_{}".format(name, i + 1),
layers.ConvBNReLU(
in_channels=in_channels[i],
out_channels=out_channels[i],
kernel_size=3,
padding=1 if not padding_same else 'same',
bias_attr=False))
else:
residual = self.add_sublayer(
"transition_{}_layer_{}".format(name, i + 1),
layers.ConvBNReLU(
in_channels=in_channels[-1],
out_channels=out_channels[i],
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False))
self.conv_bn_func_list.append(residual)
def forward(self, x):
outs = []
for idx, conv_bn_func in enumerate(self.conv_bn_func_list):
if conv_bn_func is None:
outs.append(x[idx])
else:
if idx < len(x):
outs.append(conv_bn_func(x[idx]))
else:
outs.append(conv_bn_func(x[-1]))
return outs
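# Branches runs one independent sequence of BasicBlocks per resolution
# branch, preserving each branch's channel width and spatial size.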
class Branches(nn.Layer):
def __init__(self,
num_blocks,
in_channels,
out_channels,
has_se=False,
name=None,
padding_same=True):
super(Branches, self).__init__()
self.basic_block_list = []
for i in range(len(out_channels)):
self.basic_block_list.append([])
for j in range(num_blocks[i]):
in_ch = in_channels[i] if j == 0 else out_channels[i]
basic_block_func = self.add_sublayer(
"bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1),
BasicBlock(
num_channels=in_ch,
num_filters=out_channels[i],
has_se=has_se,
name=name + '_branch_layer_' + str(i + 1) + '_' +
str(j + 1),
padding_same=padding_same))
self.basic_block_list[i].append(basic_block_func)
def forward(self, x):
outs = []
for idx, input in enumerate(x):
conv = input
for basic_block_func in self.basic_block_list[idx]:
conv = basic_block_func(conv)
outs.append(conv)
return outs
class BottleneckBlock(nn.Layer):
def __init__(self,
num_channels,
num_filters,
has_se,
stride=1,
downsample=False,
name=None,
padding_same=True):
super(BottleneckBlock, self).__init__()
self.has_se = has_se
self.downsample = downsample
self.conv1 = layers.ConvBNReLU(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=1,
bias_attr=False)
self.conv2 = layers.ConvBNReLU(
in_channels=num_filters,
out_channels=num_filters,
kernel_size=3,
stride=stride,
padding=1 if not padding_same else 'same',
bias_attr=False)
self.conv3 = layers.ConvBN(
in_channels=num_filters,
out_channels=num_filters * 4,
kernel_size=1,
bias_attr=False)
if self.downsample:
self.conv_down = layers.ConvBN(
in_channels=num_channels,
out_channels=num_filters * 4,
kernel_size=1,
bias_attr=False)
if self.has_se:
self.se = SELayer(
num_channels=num_filters * 4,
num_filters=num_filters * 4,
reduction_ratio=16,
name=name + '_fc')
self.add = layers.Add()
self.relu = layers.Activation("relu")
def forward(self, x):
residual = x
conv1 = self.conv1(x)
conv2 = self.conv2(conv1)
conv3 = self.conv3(conv2)
if self.downsample:
residual = self.conv_down(x)
if self.has_se:
conv3 = self.se(conv3)
y = self.add(conv3, residual)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
num_channels,
num_filters,
stride=1,
has_se=False,
downsample=False,
name=None,
padding_same=True):
super(BasicBlock, self).__init__()
self.has_se = has_se
self.downsample = downsample
self.conv1 = layers.ConvBNReLU(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=3,
stride=stride,
padding=1 if not padding_same else 'same',
bias_attr=False)
self.conv2 = layers.ConvBN(
in_channels=num_filters,
out_channels=num_filters,
kernel_size=3,
padding=1 if not padding_same else 'same',
bias_attr=False)
if self.downsample:
self.conv_down = layers.ConvBNReLU(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=1,
bias_attr=False)
if self.has_se:
self.se = SELayer(
num_channels=num_filters,
num_filters=num_filters,
reduction_ratio=16,
name=name + '_fc')
self.add = layers.Add()
self.relu = layers.Activation("relu")
def forward(self, x):
residual = x
conv1 = self.conv1(x)
conv2 = self.conv2(conv1)
if self.downsample:
residual = self.conv_down(x)
if self.has_se:
conv2 = self.se(conv2)
y = self.add(conv2, residual)
y = self.relu(y)
return y
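# SELayer: squeeze-and-excitation. Global average pooling squeezes each map
# to a scalar, two linear layers (ReLU, then sigmoid) compute per-channel
# gates, and the input is rescaled by those gates.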
class SELayer(nn.Layer):
def __init__(self, num_channels, num_filters, reduction_ratio, name=None):
super(SELayer, self).__init__()
self.pool2d_gap = nn.AdaptiveAvgPool2D(1)
self._num_channels = num_channels
med_ch = int(num_channels / reduction_ratio)
stdv = 1.0 / math.sqrt(num_channels * 1.0)
self.squeeze = nn.Linear(
num_channels,
med_ch,
weight_attr=paddle.ParamAttr(
initializer=nn.initializer.Uniform(-stdv, stdv)))
stdv = 1.0 / math.sqrt(med_ch * 1.0)
self.excitation = nn.Linear(
med_ch,
num_filters,
weight_attr=paddle.ParamAttr(
initializer=nn.initializer.Uniform(-stdv, stdv)))
def forward(self, x):
pool = self.pool2d_gap(x)
pool = paddle.reshape(pool, shape=[-1, self._num_channels])
squeeze = self.squeeze(pool)
squeeze = F.relu(squeeze)
excitation = self.excitation(squeeze)
excitation = F.sigmoid(excitation)
excitation = paddle.reshape(
excitation, shape=[-1, self._num_channels, 1, 1])
out = x * excitation
return out
class Stage(nn.Layer):
def __init__(self,
num_channels,
num_modules,
num_blocks,
num_filters,
has_se=False,
multi_scale_output=True,
name=None,
align_corners=False,
padding_same=True):
super(Stage, self).__init__()
self._num_modules = num_modules
self.stage_func_list = []
for i in range(num_modules):
if i == num_modules - 1 and not multi_scale_output:
stage_func = self.add_sublayer(
"stage_{}_{}".format(name, i + 1),
HighResolutionModule(
num_channels=num_channels,
num_blocks=num_blocks,
num_filters=num_filters,
has_se=has_se,
multi_scale_output=False,
name=name + '_' + str(i + 1),
align_corners=align_corners,
padding_same=padding_same))
else:
stage_func = self.add_sublayer(
"stage_{}_{}".format(name, i + 1),
HighResolutionModule(
num_channels=num_channels,
num_blocks=num_blocks,
num_filters=num_filters,
has_se=has_se,
name=name + '_' + str(i + 1),
align_corners=align_corners,
padding_same=padding_same))
self.stage_func_list.append(stage_func)
def forward(self, x):
out = x
for idx in range(self._num_modules):
out = self.stage_func_list[idx](out)
return out
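# HighResolutionModule = parallel branches followed by a fuse step, so each
# output resolution aggregates information from every other resolution.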
class HighResolutionModule(nn.Layer):
def __init__(self,
num_channels,
num_blocks,
num_filters,
has_se=False,
multi_scale_output=True,
name=None,
align_corners=False,
padding_same=True):
super(HighResolutionModule, self).__init__()
self.branches_func = Branches(
num_blocks=num_blocks,
in_channels=num_channels,
out_channels=num_filters,
has_se=has_se,
name=name,
padding_same=padding_same)
self.fuse_func = FuseLayers(
in_channels=num_filters,
out_channels=num_filters,
multi_scale_output=multi_scale_output,
name=name,
align_corners=align_corners,
padding_same=padding_same)
def forward(self, x):
out = self.branches_func(x)
out = self.fuse_func(out)
return out
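# FuseLayers: for output branch i, lower-resolution inputs (j > i) are
# projected with 1x1 convs and bilinearly upsampled to branch i's size, while
# higher-resolution inputs (j < i) are downsampled with a chain of stride-2
# 3x3 convs; all contributions are summed and passed through ReLU.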
class FuseLayers(nn.Layer):
def __init__(self,
in_channels,
out_channels,
multi_scale_output=True,
name=None,
align_corners=False,
padding_same=True):
super(FuseLayers, self).__init__()
self._actual_ch = len(in_channels) if multi_scale_output else 1
self._in_channels = in_channels
self.align_corners = align_corners
self.residual_func_list = []
for i in range(self._actual_ch):
for j in range(len(in_channels)):
if j > i:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}".format(name, i + 1, j + 1),
layers.ConvBN(
in_channels=in_channels[j],
out_channels=out_channels[i],
kernel_size=1,
bias_attr=False))
self.residual_func_list.append(residual_func)
elif j < i:
pre_num_filters = in_channels[j]
for k in range(i - j):
if k == i - j - 1:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}_{}".format(
name, i + 1, j + 1, k + 1),
layers.ConvBN(
in_channels=pre_num_filters,
out_channels=out_channels[i],
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False))
pre_num_filters = out_channels[i]
else:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}_{}".format(
name, i + 1, j + 1, k + 1),
layers.ConvBNReLU(
in_channels=pre_num_filters,
out_channels=out_channels[j],
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False))
pre_num_filters = out_channels[j]
self.residual_func_list.append(residual_func)
def forward(self, x):
outs = []
residual_func_idx = 0
for i in range(self._actual_ch):
residual = x[i]
residual_shape = paddle.shape(residual)[-2:]
for j in range(len(self._in_channels)):
if j > i:
y = self.residual_func_list[residual_func_idx](x[j])
residual_func_idx += 1
y = F.interpolate(
y,
residual_shape,
mode='bilinear',
align_corners=self.align_corners)
residual = residual + y
elif j < i:
y = x[j]
for k in range(i - j):
y = self.residual_func_list[residual_func_idx](y)
residual_func_idx += 1
residual = residual + y
residual = F.relu(residual)
outs.append(residual)
return outs
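# Registered HRNet variants. For the full-size models the W<k> suffix is the
# channel width of the highest-resolution branch, and each lower-resolution
# branch doubles that width; the Small_V1/V2 variants use fewer blocks and
# modules per stage.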
@manager.BACKBONES.add_component
def HRNet_W18_Small_V1(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[1],
stage1_num_channels=[32],
stage2_num_modules=1,
stage2_num_blocks=[2, 2],
stage2_num_channels=[16, 32],
stage3_num_modules=1,
stage3_num_blocks=[2, 2, 2],
stage3_num_channels=[16, 32, 64],
stage4_num_modules=1,
stage4_num_blocks=[2, 2, 2, 2],
stage4_num_channels=[16, 32, 64, 128],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W18_Small_V2(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[2],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[2, 2],
stage2_num_channels=[18, 36],
stage3_num_modules=3,
stage3_num_blocks=[2, 2, 2],
stage3_num_channels=[18, 36, 72],
stage4_num_modules=2,
stage4_num_blocks=[2, 2, 2, 2],
stage4_num_channels=[18, 36, 72, 144],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W18(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[18, 36],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[18, 36, 72],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[18, 36, 72, 144],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W30(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[30, 60],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[30, 60, 120],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[30, 60, 120, 240],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W32(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[32, 64],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[32, 64, 128],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[32, 64, 128, 256],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W40(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[40, 80],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[40, 80, 160],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[40, 80, 160, 320],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W44(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[44, 88],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[44, 88, 176],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[44, 88, 176, 352],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W48(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[48, 96],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[48, 96, 192],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[48, 96, 192, 384],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W60(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[60, 120],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[60, 120, 240],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[60, 120, 240, 480],
**kwargs)
return model
@manager.BACKBONES.add_component
def HRNet_W64(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[64, 128],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[64, 128, 256],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[64, 128, 256, 512],
**kwargs)
return model
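# Illustrative usage sketch (not part of the original file):
#
#     import paddle
#     backbone = HRNet_W18(input_channels=3)
#     x = paddle.randn([1, 3, 512, 512])
#     feats = backbone(x)  # 5 tensors: the first stem conv output (64 ch)
#                          # plus the four stage-4 branches (18/36/72/144 ch)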