GroundingDINO @ 856dde20
Subproject commit 856dde20aee659246248e20734ef9ba5214f5e44
MIT License
Copyright (c) 2023 Hust Vision Lab
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Matte-Anything
Matte-Anything is an interactive natural image matting model.
## Paper
`Matte Anything: Interactive Natural Image Matting with Segment Anything Model`
- https://arxiv.org/abs/2306.04121
- Image and Vision Computing (Vol. 147, July 2024)
## Model Architecture
<!-- A one-sentence description of the model architecture goes here -->
<div align=center>
<img src="./doc/architecture.png"/>
<div >Matte-Anything</div>
</div>
## Algorithm
Matte-Anything leverages vision foundation models such as Segment Anything Models and Open Vocabulary Detection Models to achieve simple-to-use, high-quality natural image matting.\
The key idea is to generate a pseudo trimap from the segmentation and transparency information produced by these foundation models, treating image matting as a downstream task of image segmentation and transparent-object detection. Matte-Anything uses task-specific vision models to enhance natural image matting. First, the Segment Anything Model (SAM) produces a high-quality mask for the target instance. Next, an open-vocabulary object detection model, GroundingDINO, detects common transparent objects. A pseudo trimap is then generated from the segmentation and transparent-object detection results and fed into the natural image matting model, ViTMatte. ViTMatte is a state-of-the-art class-agnostic matting method built by efficiently adapting a pretrained ViT, with strong generalization ability. The three components, SAM, GroundingDINO, and ViTMatte, are decoupled and require no additional training.
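The pseudo-trimap idea can be sketched in a few lines. The snippet below is a minimal illustration, not the repository's implementation: it assumes a binary mask (e.g. from SAM) and optional transparent-object boxes (e.g. from GroundingDINO), and uses erosion/dilation to carve out the unknown band, mirroring the `erode_kernel_size`/`dilate_kernel_size` controls exposed in the web UI.
```
import cv2
import numpy as np

def pseudo_trimap(mask, transparent_boxes=None,
                  erode_kernel_size=10, dilate_kernel_size=10):
    """Build a pseudo trimap (0=background, 128=unknown, 255=foreground)."""
    kernel_e = np.ones((erode_kernel_size, erode_kernel_size), np.uint8)
    kernel_d = np.ones((dilate_kernel_size, dilate_kernel_size), np.uint8)
    eroded = cv2.erode(mask, kernel_e, iterations=1)    # confident foreground core
    dilated = cv2.dilate(mask, kernel_d, iterations=1)  # foreground plus a margin
    trimap = np.full_like(mask, 128)                    # everything starts as unknown
    trimap[dilated == 0] = 0                            # far from the mask -> background
    trimap[eroded == 1] = 255                           # deep inside the mask -> foreground
    for x1, y1, x2, y2 in (transparent_boxes or []):    # detected transparent objects
        trimap[y1:y2, x1:x2] = 128                      # keep them unknown for the matting model
    return trimap

# Toy usage: a 64x64 binary mask with a square "object"
mask = np.zeros((64, 64), np.uint8)
mask[16:48, 16:48] = 1
print(np.unique(pseudo_trimap(mask)))  # -> [  0 128 255]
```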
## Environment Setup
```
mv matte-anything_pytorch matte-anything # drop the framework-name suffix from the directory
# Adjust the docker -v paths, docker_name, and imageID below to your environment
# If pip downloads are slow, try a different package index mirror
```
### Docker (Method 1)
<!-- Provide the address and steps for pulling the docker image from [光源](https://www.sourcefind.cn/#/service-details) here -->
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10 # the imageID of this image is 2f1f619d0182
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --network=host --name docker_name imageID bash
cd /your_code_path/matte-anything
# Install segment-anything
pip install git+https://github.com/facebookresearch/segment-anything.git
# Install ViTMatte
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
# Install GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
### Dockerfile (Method 2)
<!-- Provide the steps for using the dockerfile here -->
```
cd /your_code_path/matte-anything/docker
docker build --no-cache -t matte-anything:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --network=host --name docker_name matte-anything:latest bash
cd /your_code_path/matte-anything
# Install segment-anything
pip install git+https://github.com/facebookresearch/segment-anything.git
# Install ViTMatte
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
# Install GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
### Anaconda (Method 3)
<!-- Provide detailed local setup and build steps here, for example: -->
The DCU-specific deep learning libraries required by this project can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```
DTK driver: dtk24.04.2
python: python3.10
pytorch: 2.1.0
```
`Tip: the DTK driver, python, and pytorch versions above must match exactly.`
Install the remaining (non deep-learning) libraries according to requirements.txt:
```
# Install segment-anything
pip install git+https://github.com/facebookresearch/segment-anything.git
# Install ViTMatte
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
# Install GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
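After installation, a quick way to confirm that this PyTorch build can see the DCU device (a generic PyTorch check, not specific to this repository):
```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```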
## Dataset
None.
## Training
None.
## Inference
Download [sam_vit_h_4b8939.pth](https://github.com/facebookresearch/segment-anything?tab=readme-ov-file#model-checkpoints)
or download it quickly from [SCNet](http://113.200.138.88:18080/aimodels/findsource-dependency/sam_vit_h_4b8939);
Download [GroundingDINO-T](https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth) or download it quickly from [SCNet]();
Download [ViTMatte_vit_b](https://drive.google.com/file/d/1d97oKuITCeWgai2Tf3iNilt6rMSSYzkW/view?usp=sharing) or download it quickly from [SCNet]().
Place them under ./pretrained.
Run web UI inference with visualization:
```
python matte_anything.py --listen
```
<div align=center>
<img src="./doc/webui.png" width=600/>
<div >Web UI</div>
</div>
1. Upload an image;\
2. Select the target object or region, either by clicking points or by text input;\
3. Run.\
Note: avoid clicking repeatedly in quick succession; adjust the parameters as needed.
## Result
<!-- Put algorithm result images here (including input and output) -->
<div align=center>
<img src="./doc/demo.png" width=600/>
<div >Input</div>
</div>
<div align=center>
<img src="./doc/Refined.png" width=600/>
<div >Matting result</div>
</div>
<div align=center>
<img src="./doc/new_background.png" width=600/>
<div >Background replacement</div>
</div>
### Accuracy
None.
<!-- | Accelerator | lpips | clip sim |
| :-----| :----- | :---- |
| K100_AI | 0.115 | 0.977 | -->
<!-- | cell | cell | cell | -->
## Application Scenarios
### Algorithm Category
<!-- For categories beyond the list above, names may also follow those used at https://huggingface.co/ \ -->
`AIGC`
### Key Application Industries
<!-- Filling in application industries requires substantial research to give users professional and comprehensive recommendations; except for special algorithms, the recommended number is usually >= 3. -->
`Retail, Manufacturing, E-commerce, Healthcare, Education`
## Source Repository and Issue Feedback
<!-- Put this project's gitlab address here -->
- https://developer.sourcefind.cn/codes/modelzoo/matte-anything_pytorch
## References
- https://github.com/hustvl/Matte-Anything
<div align="center">
<h1>Matte Anything!🐒</h1>
<h3> Interactive Natural Image Matting with Segment Anything Models </h3>
Authors: [Jingfeng Yao](https://github.com/JingfengYao), [Xinggang Wang](https://scholar.google.com/citations?user=qNCTLV0AAAAJ&hl=zh-CN)<sup>:email:</sup>, [Lang Ye](https://github.com/YeL6), [Wenyu Liu](http://eic.hust.edu.cn/professor/liuwenyu/)
Institute: School of EIC, HUST
(<sup>:email:</sup>) corresponding author
[![arxiv paper](https://img.shields.io/badge/arxiv-paper-orange)](https://arxiv.org/abs/2306.04121)
[![video](https://img.shields.io/badge/Demo-Video-blue)](https://github.com/hustvl/Matte-Anything/assets/74295796/dfe051c2-b5d1-442d-9eff-cd1fcfd1f51b)
[![license](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
[![authors](https://img.shields.io/badge/by-hustvl-green)](https://github.com/hustvl)
</div>
![demo](https://github.com/hustvl/Matte-Anything/assets/74295796/d947f59d-b0c1-4c22-9967-d8f2bf633879)
#
## 📢 News
* **`2024/05/04`** Matte Anything has been accepted by the Journal of Image and Vision Computing!
* **`2024/01/02`** Now you can get alpha materials with MatAny! Check it now!
* **`2023/07/01`** We release a new version that enables text input and transparency correction!
* **`2023/06/08`** We release arxiv tech report!
* **`2023/06/08`** We release source codes of Matte Anything!
The program is still in progress. You can try the early version first! Thanks for your attention. If you like Matte Anything, you may also like its previous foundation work [ViTMatte](https://github.com/hustvl/ViTMatte).
#
## 🔥 Transparent Materials of Yourself !
MatAny can export matting results as *RGBA images*. They are saved to ```your_demos/your_demo.png```; you can specify the file name in our UI. We hope MatAny helps.
<p align="center">
<img src="figs/materials.png" width="70%">
</p>
## 📜 Introduction
We propose Matte Anything (MatAny), an interactive natural image matting model. It could produce high-quality alpha-matte with various simple hints. The key insight of MatAny is to generate pseudo trimap automatically with contour and transparency prediction. We leverage task-specific vision models to enhance the performance of natural image matting.
![web_ui](figs/first.png)
## 🌞 Features
* Matte Anything with Simple Interaction
* High Quality Matting Results
* Ability to Process Transparent Objects
## 🎮 Quick Start
Try our Matte Anything with our web-ui!
![web_ui](figs/web_ui.gif)
### Quick Installation
Install [Segment Anything Models](https://github.com/facebookresearch/segment-anything) as following:
```
pip install git+https://github.com/facebookresearch/segment-anything.git
```
Install [ViTMatte](https://github.com/hustvl/ViTMatte) as following:
```
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
```
Install [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) as following:
```
cd Matte-Anything
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
Download pretrained models [SAM_vit_h](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth), [ViTMatte_vit_b](https://drive.google.com/file/d/1d97oKuITCeWgai2Tf3iNilt6rMSSYzkW/view?usp=sharing), and [GroundingDINO-T](https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth). Put them in ``./pretrained``.
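Optionally, a quick check that the weights are in place before launching the web UI (a minimal sketch; the ViTMatte file name below is a placeholder, substitute the name of the checkpoint you downloaded):
```
from pathlib import Path

pretrained = Path("./pretrained")
checkpoints = [
    "sam_vit_h_4b8939.pth",         # SAM_vit_h
    "groundingdino_swint_ogc.pth",  # GroundingDINO-T
    "ViTMatte_B.pth",               # placeholder name for the ViTMatte_vit_b weights
]
for name in checkpoints:
    print(name, "ok" if (pretrained / name).is_file() else "MISSING")
```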
### Run our web-ui!
```
python matte_anything.py
```
### How to use
1. Upload the image and click on it (default: ``foreground point``).
2. Click ``Start!``.
3. Modify ``erode_kernel_size`` and ``dilate_kernel_size`` for a better trimap (optional).
## 🎬 Demo
https://github.com/hustvl/Matte-Anything/assets/74295796/dfe051c2-b5d1-442d-9eff-cd1fcfd1f51b
Visualization of SAM and MatAny on real-world data from [AM-2K](https://github.com/JizhiziLi/GFM) and [P3M-500](https://github.com/JizhiziLi/P3M).
![web_ui](figs/demo1.png)
Visualization of SAM and MatAny on [Composition-1k](https://arxiv.org/pdf/1703.03872v3.pdf)
![web_ui](figs/demo2.png)
## 📋 Todo List
- [x] adjustable trimap generation
- [x] arxiv tech report
- [x] support user transparency correction
- [x] support text input
- [ ] add example data
- [ ] finetune ViTMatte for better performance
## 🤝Acknowledgement
Our repo is built upon [Segment Anything](https://github.com/facebookresearch/segment-anything), [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO), and [ViTMatte](https://github.com/hustvl/ViTMatte). Thanks to their work.
## Citation
```
@article{yao2024matte,
title={Matte anything: Interactive natural image matting with segment anything model},
author={Yao, Jingfeng and Wang, Xinggang and Ye, Lang and Liu, Wenyu},
journal={Image and Vision Computing},
pages={105067},
year={2024},
publisher={Elsevier}
}
```
from omegaconf import OmegaConf
from torch.utils.data import DataLoader
from detectron2.config import LazyCall as L
from torch.utils.data.distributed import DistributedSampler

from data import ImageFileTrain, DataGenerator

# Dataloader
train_dataset = DataGenerator(
    data=ImageFileTrain(
        alpha_dir='path/to/alpha',
        fg_dir='path/to/alpha',
        bg_dir='path/to/alpha',
        root='path/to/Adobe_Image_Matting'
    ),
    phase='train'
)

dataloader = OmegaConf.create()
dataloader.train = L(DataLoader)(
    dataset=train_dataset,
    batch_size=15,
    shuffle=False,
    num_workers=4,
    pin_memory=True,
    sampler=L(DistributedSampler)(
        dataset=train_dataset,
    ),
    drop_last=True
)
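# Note (added): L(DataLoader)(...) from detectron2.config only records the target
# class and its arguments; the real object is built later with
# detectron2.config.instantiate, e.g.
#   from detectron2.config import instantiate
#   train_loader = instantiate(dataloader.train)
# The DistributedSampler above additionally requires torch.distributed to be
# initialized before instantiation.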
import torch.nn as nn
from functools import partial
from detectron2.config import LazyCall as L

from modeling import ViTMatte, MattingCriterion, Detail_Capture, ViT

# Base
embed_dim, num_heads = 384, 6

model = L(ViTMatte)(
    backbone=L(ViT)(  # Single-scale ViT backbone
        in_chans=4,
        img_size=512,
        patch_size=16,
        embed_dim=embed_dim,
        depth=12,
        num_heads=num_heads,
        drop_path_rate=0,
        window_size=14,
        mlp_ratio=4,
        qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6),
        # blocks 2, 5, 8, 11 use global attention; the rest use windowed attention
        window_block_indexes=[0, 1, 3, 4, 6, 7, 9, 10],
        residual_block_indexes=[2, 5, 8, 11],
        use_rel_pos=True,
        out_feature="last_feat",
    ),
    criterion=L(MattingCriterion)(
        losses=['unknown_l1_loss', 'known_l1_loss', 'loss_pha_laplacian', 'loss_gradient_penalty']
    ),
    pixel_mean=[123.675 / 255., 116.280 / 255., 103.530 / 255.],
    pixel_std=[58.395 / 255., 57.120 / 255., 57.375 / 255.],
    input_format="RGB",
    size_divisibility=32,
    decoder=L(Detail_Capture)(),
)
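# Note (added): embed_dim=384 / num_heads=6 correspond to a ViT-S sized backbone;
# the ViTMatte_B_100ep config below overrides them to 768 / 12 (ViT-B) and widens
# the decoder input channels to match.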
from detectron2 import model_zoo
from functools import partial


def get_vit_lr_decay_rate(name, lr_decay_rate=1.0, num_layers=12):
    """
    Calculate lr decay rate for different ViT blocks.

    Args:
        name (string): parameter name.
        lr_decay_rate (float): base lr decay rate.
        num_layers (int): number of ViT blocks.

    Returns:
        lr decay rate for the given parameter.
    """
    layer_id = num_layers + 1
    if name.startswith("backbone"):
        if ".pos_embed" in name or ".patch_embed" in name:
            layer_id = 0
        elif ".blocks." in name and ".residual." not in name:
            layer_id = int(name[name.find(".blocks.") :].split(".")[2]) + 1
    return lr_decay_rate ** (num_layers + 1 - layer_id)


# Optimizer
optimizer = model_zoo.get_config("common/optim.py").AdamW
optimizer.params.lr_factor_func = partial(get_vit_lr_decay_rate, num_layers=12, lr_decay_rate=0.65)
optimizer.params.overrides = {"pos_embed": {"weight_decay": 0.0}}
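# Worked example (added): with lr_decay_rate=0.65 and num_layers=12,
#   "backbone.patch_embed.proj.weight"  -> layer_id 0  -> factor 0.65 ** 13
#   "backbone.blocks.0.attn.qkv.weight" -> layer_id 1  -> factor 0.65 ** 12
#   "backbone.blocks.11.mlp.fc2.weight" -> layer_id 12 -> factor 0.65
#   any non-backbone parameter          -> layer_id 13 -> factor 1.0
# so shallower backbone layers train with smaller learning rates than deeper ones.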
from detectron2.config import LazyCall as L
from detectron2.solver import WarmupParamScheduler
from fvcore.common.param_scheduler import MultiStepParamScheduler

lr_multiplier = L(WarmupParamScheduler)(
    scheduler=L(MultiStepParamScheduler)(
        values=[1.0, 0.1, 0.01],
        milestones=[96778, 103579],
        num_updates=100,
    ),
    warmup_length=250 / 100,
    warmup_factor=0.001,
)
train = dict(
    output_dir="./output",
    init_checkpoint="",
    max_iter=90000,
    amp=dict(enabled=False),  # options for Automatic Mixed Precision
    ddp=dict(  # options for DistributedDataParallel
        broadcast_buffers=True,
        find_unused_parameters=False,
        fp16_compression=True,
    ),
    checkpointer=dict(period=5000, max_to_keep=100),  # options for PeriodicCheckpointer
    eval_period=5000,
    log_period=20,
    device="cuda",
    # ...
)
from .common.train import train
from .common.model import model
from .common.optimizer import optimizer
from .common.scheduler import lr_multiplier

model.backbone.embed_dim = 768
model.backbone.num_heads = 12
model.decoder.in_chans = 768

train.max_iter = int(43100 / 10 / 2 * 100)
train.checkpointer.period = int(43100 / 10 / 2 * 10)

optimizer.lr = 5e-4

lr_multiplier.scheduler.values = [1.0, 0.1, 0.05]
lr_multiplier.scheduler.milestones = [int(43100 / 10 / 2 * 30), int(43100 / 10 / 2 * 90)]
lr_multiplier.scheduler.num_updates = train.max_iter
lr_multiplier.warmup_length = 250 / train.max_iter

train.init_checkpoint = './pretrained/mae_vit_b_fna.pth'
train.output_dir = './output_of_train/ViTMatte_B_100ep'
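# Note (added, assuming Composition-1k style training data): 43100 composites at
# an effective batch size of 20 (e.g. 10 per device on 2 devices) give
# 43100 / 10 / 2 = 2155 iterations per epoch, so max_iter corresponds to roughly
# 100 epochs and the LR milestones to epochs 30 and 90.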
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10
ENV DEBIAN_FRONTEND=noninteractive
# COPY requirements.txt requirements.txt
# RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
from .mattingtrainer import MattingTrainer
from detectron2.engine import AMPTrainer
import torch
import time


def cycle(iterable):
    # Repeat the data loader forever so training length is governed by max_iter only.
    while True:
        for x in iterable:
            yield x


class MattingTrainer(AMPTrainer):
    def __init__(self, model, data_loader, optimizer, grad_scaler=None):
        super().__init__(model, data_loader, optimizer, grad_scaler=grad_scaler)
        self.data_loader_iter = iter(cycle(self.data_loader))

    def run_step(self):
        """
        Implement the AMP training logic.
        """
        assert self.model.training, "[AMPTrainer] model was changed to eval mode!"
        assert torch.cuda.is_available(), "[AMPTrainer] CUDA is required for AMP training!"
        from torch.cuda.amp import autocast

        # matting pass
        start = time.perf_counter()
        data = next(self.data_loader_iter)
        data_time = time.perf_counter() - start

        with autocast():
            loss_dict = self.model(data)
            if isinstance(loss_dict, torch.Tensor):
                losses = loss_dict
                loss_dict = {"total_loss": loss_dict}
            else:
                losses = sum(loss_dict.values())

        self.optimizer.zero_grad()
        self.grad_scaler.scale(losses).backward()

        self._write_metrics(loss_dict, data_time)

        self.grad_scaler.step(self.optimizer)
        self.grad_scaler.update()
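# Usage sketch (added, not part of the original file; follows detectron2's
# lazy-config training pattern with the configs shown above; variable names
# here are illustrative):
#   from detectron2.config import instantiate
#   model = instantiate(model_cfg)            # e.g. the L(ViTMatte)(...) config
#   optimizer_cfg.params.model = model
#   optim = instantiate(optimizer_cfg)        # the AdamW config with lr_factor_func
#   loader = instantiate(dataloader.train)    # the L(DataLoader)(...) config
#   MattingTrainer(model, loader, optim).train(0, max_iter)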