GroundingDINO @ 856dde20
Subproject commit 856dde20aee659246248e20734ef9ba5214f5e44
MIT License
Copyright (c) 2023 Hust Vision Lab
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Matte-Anything
Matte-Anything is an interactive natural image matting model.
## Paper
`Matte Anything: Interactive Natural Image Matting with Segment Anything Model`
- https://arxiv.org/abs/2306.04121
- Image and Vision Computing (Vol. 147, July 2024)
## Model Architecture
<!-- A one-sentence description of the model architecture goes here -->
<div align=center>
<img src="./doc/architecture.png"/>
<div >Matte-Anything</div>
</div>
## Algorithm
Matte-Anything leverages vision foundation models such as Segment Anything Models and Open Vocabulary Detection Models to achieve simple-to-use, high-quality natural image matting.\
The key idea is to generate a pseudo trimap from the segmentation and transparency information produced by these foundation models, treating image matting as a downstream task of image segmentation and transparent-object detection. Matte-Anything uses task-specific vision models to enhance natural image matting. First, the Segment Anything Model (SAM) produces a high-quality mask for the target instance. Next, an open-vocabulary object detection model, GroundingDINO, detects common transparent objects. A pseudo trimap is then generated from the segmentation and transparent-object detection results and fed into the natural image matting model, ViTMatte. ViTMatte is a state-of-the-art class-agnostic matting method built by efficiently adapting a pretrained ViT, with strong generalization ability. The three components, SAM, GroundingDINO, and ViTMatte, are decoupled and require no additional training.
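The pseudo-trimap idea can be sketched in a few lines. The snippet below is a minimal illustration, not the repository's implementation: it assumes a binary mask (e.g. from SAM) and optional transparent-object boxes (e.g. from GroundingDINO), and uses erosion/dilation to carve out the unknown band, mirroring the `erode_kernel_size`/`dilate_kernel_size` controls exposed in the web UI.
```
import cv2
import numpy as np

def pseudo_trimap(mask, transparent_boxes=None,
                  erode_kernel_size=10, dilate_kernel_size=10):
    """Build a pseudo trimap (0=background, 128=unknown, 255=foreground)."""
    kernel_e = np.ones((erode_kernel_size, erode_kernel_size), np.uint8)
    kernel_d = np.ones((dilate_kernel_size, dilate_kernel_size), np.uint8)
    eroded = cv2.erode(mask, kernel_e, iterations=1)    # confident foreground core
    dilated = cv2.dilate(mask, kernel_d, iterations=1)  # foreground plus a margin
    trimap = np.full_like(mask, 128)                    # everything starts as unknown
    trimap[dilated == 0] = 0                            # far from the mask -> background
    trimap[eroded == 1] = 255                           # deep inside the mask -> foreground
    for x1, y1, x2, y2 in (transparent_boxes or []):    # detected transparent objects
        trimap[y1:y2, x1:x2] = 128                      # keep them unknown for the matting model
    return trimap

# Toy usage: a 64x64 binary mask with a square "object"
mask = np.zeros((64, 64), np.uint8)
mask[16:48, 16:48] = 1
print(np.unique(pseudo_trimap(mask)))  # -> [  0 128 255]
```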
## Environment Setup
```
mv matte-anything_pytorch matte-anything # drop the framework-name suffix from the directory
# Adjust the docker -v paths, docker_name, and imageID below to your environment
# If pip downloads are slow, try a different package index mirror
```
### Docker (Method 1)
<!-- Provide the address and steps for pulling the docker image from [光源](https://www.sourcefind.cn/#/service-details) here -->
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10 # the imageID of this image is 2f1f619d0182
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --network=host --name docker_name imageID bash
cd /your_code_path/matte-anything
# Install segment-anything
pip install git+https://github.com/facebookresearch/segment-anything.git
# Install ViTMatte
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
# Install GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
### Dockerfile (Method 2)
<!-- Provide the steps for using the dockerfile here -->
```
cd /your_code_path/matte-anything/docker
docker build --no-cache -t matte-anything:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --network=host --name docker_name matte-anything:latest bash
cd /your_code_path/matte-anything
# Install segment-anything
pip install git+https://github.com/facebookresearch/segment-anything.git
# Install ViTMatte
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
# Install GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
### Anaconda (Method 3)
<!-- Provide detailed local setup and build steps here, for example: -->
The DCU-specific deep learning libraries required by this project can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```
DTK driver: dtk24.04.2
python: python3.10
pytorch: 2.1.0
```
`Tip: the DTK driver, python, and pytorch versions above must match exactly.`
Install the remaining (non deep-learning) libraries according to requirements.txt:
```
# Install segment-anything
pip install git+https://github.com/facebookresearch/segment-anything.git
# Install ViTMatte
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
# Install GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
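After installation, a quick way to confirm that this PyTorch build can see the DCU device (a generic PyTorch check, not specific to this repository):
```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```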
## Dataset
None.
## Training
None.
## Inference
Download [sam_vit_h_4b8939.pth](https://github.com/facebookresearch/segment-anything?tab=readme-ov-file#model-checkpoints)
or download it quickly from [SCNet](http://113.200.138.88:18080/aimodels/findsource-dependency/sam_vit_h_4b8939);
Download [GroundingDINO-T](https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth) or download it quickly from [SCNet]();
Download [ViTMatte_vit_b](https://drive.google.com/file/d/1d97oKuITCeWgai2Tf3iNilt6rMSSYzkW/view?usp=sharing) or download it quickly from [SCNet]().
Place them under ./pretrained.
Run web UI inference with visualization:
```
python matte_anything.py --listen
```
<div align=center>
<img src="./doc/webui.png" width=600/>
<div >Web UI</div>
</div>
1. Upload an image;\
2. Select the target object or region, either by clicking points or by text input;\
3. Run.\
Note: avoid clicking repeatedly in quick succession; adjust the parameters as needed.
## Result
<!-- Put algorithm result images here (including input and output) -->
<div align=center>
<img src="./doc/demo.png" width=600/>
<div >Input</div>
</div>
<div align=center>
<img src="./doc/Refined.png" width=600/>
<div >Matting result</div>
</div>
<div align=center>
<img src="./doc/new_background.png" width=600/>
<div >Background replacement</div>
</div>
### Accuracy
None.
<!-- | Accelerator | lpips | clip sim |
| :-----| :----- | :---- |
| K100_AI | 0.115 | 0.977 | -->
<!-- | cell | cell | cell | -->
## Application Scenarios
### Algorithm Category
<!-- For categories beyond the list above, names may also follow those used at https://huggingface.co/ \ -->
`AIGC`
### Key Application Industries
<!-- Filling in application industries requires substantial research to give users professional and comprehensive recommendations; except for special algorithms, the recommended number is usually >= 3. -->
`Retail, Manufacturing, E-commerce, Healthcare, Education`
## Source Repository and Issue Feedback
<!-- Put this project's gitlab address here -->
- https://developer.sourcefind.cn/codes/modelzoo/matte-anything_pytorch
## References
- https://github.com/hustvl/Matte-Anything
<div align="center">
<h1>Matte Anything!🐒</h1>
<h3> Interactive Natural Image Matting with Segment Anything Models </h3>
Authors: [Jingfeng Yao](https://github.com/JingfengYao), [Xinggang Wang](https://scholar.google.com/citations?user=qNCTLV0AAAAJ&hl=zh-CN)<sup>:email:</sup>, [Lang Ye](https://github.com/YeL6), [Wenyu Liu](http://eic.hust.edu.cn/professor/liuwenyu/)
Institute: School of EIC, HUST
(<sup>:email:</sup>) corresponding author
[![arxiv paper](https://img.shields.io/badge/arxiv-paper-orange)](https://arxiv.org/abs/2306.04121)
[![video](https://img.shields.io/badge/Demo-Video-blue)](https://github.com/hustvl/Matte-Anything/assets/74295796/dfe051c2-b5d1-442d-9eff-cd1fcfd1f51b)
[![license](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
[![authors](https://img.shields.io/badge/by-hustvl-green)](https://github.com/hustvl)
</div>
![demo](https://github.com/hustvl/Matte-Anything/assets/74295796/d947f59d-b0c1-4c22-9967-d8f2bf633879)
#
## 📢 News
* **`2024/05/04`** Matte Anything has been accepted by the Journal of Image and Vision Computing!
* **`2024/01/02`** Now you can get alpha materials with MatAny! Check it now!
* **`2023/07/01`** We release a new version that enables text input and transparency correction!
* **`2023/06/08`** We release arxiv tech report!
* **`2023/06/08`** We release source codes of Matte Anything!
The program is still in progress. You can try the early version first! Thanks for your attention. If you like Matte Anything, you may also like its previous foundation work [ViTMatte](https://github.com/hustvl/ViTMatte).
#
## 🔥 Transparent Materials of Yourself !
MatAny can export matting results as *RGBA images*. They are saved to ```your_demos/your_demo.png```; you can specify the file name in our UI. We hope MatAny helps.
<p align="center">
<img src="figs/materials.png" width="70%">
</p>
## 📜 Introduction
We propose Matte Anything (MatAny), an interactive natural image matting model. It could produce high-quality alpha-matte with various simple hints. The key insight of MatAny is to generate pseudo trimap automatically with contour and transparency prediction. We leverage task-specific vision models to enhance the performance of natural image matting.
![web_ui](figs/first.png)
## 🌞 Features
* Matte Anything with Simple Interaction
* High Quality Matting Results
* Ability to Process Transparent Objects
## 🎮 Quick Start
Try our Matte Anything with our web-ui!
![web_ui](figs/web_ui.gif)
### Quick Installation
Install [Segment Anything Models](https://github.com/facebookresearch/segment-anything) as following:
```
pip install git+https://github.com/facebookresearch/segment-anything.git
```
Install [ViTMatte](https://github.com/hustvl/ViTMatte) as following:
```
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
```
Install [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) as following:
```
cd Matte-Anything
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
Download pretrained models [SAM_vit_h](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth), [ViTMatte_vit_b](https://drive.google.com/file/d/1d97oKuITCeWgai2Tf3iNilt6rMSSYzkW/view?usp=sharing), and [GroundingDINO-T](https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth). Put them in ``./pretrained``.
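Optionally, a quick check that the weights are in place before launching the web UI (a minimal sketch; the ViTMatte file name below is a placeholder, substitute the name of the checkpoint you downloaded):
```
from pathlib import Path

pretrained = Path("./pretrained")
checkpoints = [
    "sam_vit_h_4b8939.pth",         # SAM_vit_h
    "groundingdino_swint_ogc.pth",  # GroundingDINO-T
    "ViTMatte_B.pth",               # placeholder name for the ViTMatte_vit_b weights
]
for name in checkpoints:
    print(name, "ok" if (pretrained / name).is_file() else "MISSING")
```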
### Run our web-ui!
```
python matte_anything.py
```
### How to use
1. Upload the image and click on it (default: ``foreground point``).
2. Click ``Start!``.
3. Modify ``erode_kernel_size`` and ``dilate_kernel_size`` for a better trimap (optional).
## 🎬 Demo
https://github.com/hustvl/Matte-Anything/assets/74295796/dfe051c2-b5d1-442d-9eff-cd1fcfd1f51b
Visualization of SAM and MatAny on real-world data from [AM-2K](https://github.com/JizhiziLi/GFM) and [P3M-500](https://github.com/JizhiziLi/P3M).
![web_ui](figs/demo1.png)
Visualization of SAM and MatAny on [Composition-1k](https://arxiv.org/pdf/1703.03872v3.pdf)
![web_ui](figs/demo2.png)
## 📋 Todo List
- [x] adjustable trimap generation
- [x] arxiv tech report
- [x] support user transparency correction
- [x] support text input
- [ ] add example data
- [ ] finetune ViTMatte for better performance
## 🤝Acknowledgement
Our repo is built upon [Segment Anything](https://github.com/facebookresearch/segment-anything), [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO), and [ViTMatte](https://github.com/hustvl/ViTMatte). Thanks to their work.
## Citation
```
@article{yao2024matte,
title={Matte anything: Interactive natural image matting with segment anything model},
author={Yao, Jingfeng and Wang, Xinggang and Ye, Lang and Liu, Wenyu},
journal={Image and Vision Computing},
pages={105067},
year={2024},
publisher={Elsevier}
}
```
from omegaconf import OmegaConf
from torch.utils.data import DataLoader
from detectron2.config import LazyCall as L
from torch.utils.data.distributed import DistributedSampler

from data import ImageFileTrain, DataGenerator

# Dataloader
train_dataset = DataGenerator(
    data=ImageFileTrain(
        alpha_dir='path/to/alpha',
        fg_dir='path/to/alpha',
        bg_dir='path/to/alpha',
        root='path/to/Adobe_Image_Matting'
    ),
    phase='train'
)

dataloader = OmegaConf.create()
dataloader.train = L(DataLoader)(
    dataset=train_dataset,
    batch_size=15,
    shuffle=False,
    num_workers=4,
    pin_memory=True,
    sampler=L(DistributedSampler)(
        dataset=train_dataset,
    ),
    drop_last=True
)
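# Note (added): L(DataLoader)(...) from detectron2.config only records the target
# class and its arguments; the real object is built later with
# detectron2.config.instantiate, e.g.
#   from detectron2.config import instantiate
#   train_loader = instantiate(dataloader.train)
# The DistributedSampler above additionally requires torch.distributed to be
# initialized before instantiation.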
import torch.nn as nn
from functools import partial
from detectron2.config import LazyCall as L

from modeling import ViTMatte, MattingCriterion, Detail_Capture, ViT

# Base
embed_dim, num_heads = 384, 6

model = L(ViTMatte)(
    backbone=L(ViT)(  # Single-scale ViT backbone
        in_chans=4,
        img_size=512,
        patch_size=16,
        embed_dim=embed_dim,
        depth=12,
        num_heads=num_heads,
        drop_path_rate=0,
        window_size=14,
        mlp_ratio=4,
        qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6),
        # blocks 2, 5, 8, 11 use global attention; the rest use windowed attention
        window_block_indexes=[0, 1, 3, 4, 6, 7, 9, 10],
        residual_block_indexes=[2, 5, 8, 11],
        use_rel_pos=True,
        out_feature="last_feat",
    ),
    criterion=L(MattingCriterion)(
        losses=['unknown_l1_loss', 'known_l1_loss', 'loss_pha_laplacian', 'loss_gradient_penalty']
    ),
    pixel_mean=[123.675 / 255., 116.280 / 255., 103.530 / 255.],
    pixel_std=[58.395 / 255., 57.120 / 255., 57.375 / 255.],
    input_format="RGB",
    size_divisibility=32,
    decoder=L(Detail_Capture)(),
)
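# Note (added): embed_dim=384 / num_heads=6 correspond to a ViT-S sized backbone;
# the ViTMatte_B_100ep config below overrides them to 768 / 12 (ViT-B) and widens
# the decoder input channels to match.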
from detectron2 import model_zoo
from functools import partial


def get_vit_lr_decay_rate(name, lr_decay_rate=1.0, num_layers=12):
    """
    Calculate lr decay rate for different ViT blocks.

    Args:
        name (string): parameter name.
        lr_decay_rate (float): base lr decay rate.
        num_layers (int): number of ViT blocks.

    Returns:
        lr decay rate for the given parameter.
    """
    layer_id = num_layers + 1
    if name.startswith("backbone"):
        if ".pos_embed" in name or ".patch_embed" in name:
            layer_id = 0
        elif ".blocks." in name and ".residual." not in name:
            layer_id = int(name[name.find(".blocks.") :].split(".")[2]) + 1
    return lr_decay_rate ** (num_layers + 1 - layer_id)


# Optimizer
optimizer = model_zoo.get_config("common/optim.py").AdamW
optimizer.params.lr_factor_func = partial(get_vit_lr_decay_rate, num_layers=12, lr_decay_rate=0.65)
optimizer.params.overrides = {"pos_embed": {"weight_decay": 0.0}}
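# Worked example (added): with lr_decay_rate=0.65 and num_layers=12,
#   "backbone.patch_embed.proj.weight"  -> layer_id 0  -> factor 0.65 ** 13
#   "backbone.blocks.0.attn.qkv.weight" -> layer_id 1  -> factor 0.65 ** 12
#   "backbone.blocks.11.mlp.fc2.weight" -> layer_id 12 -> factor 0.65
#   any non-backbone parameter          -> layer_id 13 -> factor 1.0
# so shallower backbone layers train with smaller learning rates than deeper ones.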
from detectron2.config import LazyCall as L
from detectron2.solver import WarmupParamScheduler
from fvcore.common.param_scheduler import MultiStepParamScheduler

lr_multiplier = L(WarmupParamScheduler)(
    scheduler=L(MultiStepParamScheduler)(
        values=[1.0, 0.1, 0.01],
        milestones=[96778, 103579],
        num_updates=100,
    ),
    warmup_length=250 / 100,
    warmup_factor=0.001,
)
train = dict(
    output_dir="./output",
    init_checkpoint="",
    max_iter=90000,
    amp=dict(enabled=False),  # options for Automatic Mixed Precision
    ddp=dict(  # options for DistributedDataParallel
        broadcast_buffers=True,
        find_unused_parameters=False,
        fp16_compression=True,
    ),
    checkpointer=dict(period=5000, max_to_keep=100),  # options for PeriodicCheckpointer
    eval_period=5000,
    log_period=20,
    device="cuda",
    # ...
)
from .common.train import train
from .common.model import model
from .common.optimizer import optimizer
from .common.scheduler import lr_multiplier

model.backbone.embed_dim = 768
model.backbone.num_heads = 12
model.decoder.in_chans = 768

train.max_iter = int(43100 / 10 / 2 * 100)
train.checkpointer.period = int(43100 / 10 / 2 * 10)

optimizer.lr = 5e-4

lr_multiplier.scheduler.values = [1.0, 0.1, 0.05]
lr_multiplier.scheduler.milestones = [int(43100 / 10 / 2 * 30), int(43100 / 10 / 2 * 90)]
lr_multiplier.scheduler.num_updates = train.max_iter
lr_multiplier.warmup_length = 250 / train.max_iter

train.init_checkpoint = './pretrained/mae_vit_b_fna.pth'
train.output_dir = './output_of_train/ViTMatte_B_100ep'
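# Note (added, assuming Composition-1k style training data): 43100 composites at
# an effective batch size of 20 (e.g. 10 per device on 2 devices) give
# 43100 / 10 / 2 = 2155 iterations per epoch, so max_iter corresponds to roughly
# 100 epochs and the LR milestones to epochs 30 and 90.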
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10
ENV DEBIAN_FRONTEND=noninteractive
# COPY requirements.txt requirements.txt
# RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
from .mattingtrainer import MattingTrainer
from detectron2.engine import AMPTrainer
import torch
import time


def cycle(iterable):
    # Repeat the data loader forever so training length is governed by max_iter only.
    while True:
        for x in iterable:
            yield x


class MattingTrainer(AMPTrainer):
    def __init__(self, model, data_loader, optimizer, grad_scaler=None):
        super().__init__(model, data_loader, optimizer, grad_scaler=grad_scaler)
        self.data_loader_iter = iter(cycle(self.data_loader))

    def run_step(self):
        """
        Implement the AMP training logic.
        """
        assert self.model.training, "[AMPTrainer] model was changed to eval mode!"
        assert torch.cuda.is_available(), "[AMPTrainer] CUDA is required for AMP training!"
        from torch.cuda.amp import autocast

        # matting pass
        start = time.perf_counter()
        data = next(self.data_loader_iter)
        data_time = time.perf_counter() - start

        with autocast():
            loss_dict = self.model(data)
            if isinstance(loss_dict, torch.Tensor):
                losses = loss_dict
                loss_dict = {"total_loss": loss_dict}
            else:
                losses = sum(loss_dict.values())

        self.optimizer.zero_grad()
        self.grad_scaler.scale(losses).backward()

        self._write_metrics(loss_dict, data_time)

        self.grad_scaler.step(self.optimizer)
        self.grad_scaler.update()
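# Usage sketch (added, not part of the original file; follows detectron2's
# lazy-config training pattern with the configs shown above; variable names
# here are illustrative):
#   from detectron2.config import instantiate
#   model = instantiate(model_cfg)            # e.g. the L(ViTMatte)(...) config
#   optimizer_cfg.params.model = model
#   optim = instantiate(optimizer_cfg)        # the AdamW config with lr_factor_func
#   loader = instantiate(dataloader.train)    # the L(DataLoader)(...) config
#   MattingTrainer(model, loader, optim).train(0, max_iter)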