Commit 778f4319 authored by mashun1's avatar mashun1

sdxl

Pipeline #1370 canceled with stages
__pycache__
results
pretrained_models
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
# stable-diffusion-xl_pytorch
## Paper
**SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis**
* https://arxiv.org/abs/2307.01952
## Model Architecture
SDXL builds on `stable diffusion` with a larger `UNet` backbone and two `text-encoder`s for processing the prompt. It consists of two models, `Base` and `Refiner`: `Base` can be used on its own to generate images, or `Base` and `Refiner` can be used together to produce higher-resolution images.
![alt text](readme_imgs/mr.png)
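When `Base` and `Refiner` are combined, the base model denoises only the first part of the schedule and hands its latents to the refiner, which finishes the remaining steps (the inference script in this repo uses a handoff fraction of `denoising_end=0.8`). A minimal arithmetic sketch of that split, using a hypothetical helper name not found in the repo:

```python
def split_denoising(num_steps: int, handoff: float):
    """Split a diffusion schedule at `handoff` (a fraction in [0, 1]):
    the base model runs the first part, the refiner the remainder."""
    base_steps = int(num_steps * handoff)
    refiner_steps = num_steps - base_steps
    return base_steps, refiner_steps

print(split_denoising(40, 0.8))  # (32, 8): base runs 32 steps, refiner 8
```

Because the base hands over raw latents (`output_type="latent"`), the refiner continues denoising the same trajectory rather than starting from scratch.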
## Algorithm
SDXL follows the `DDPM` training objective, training the network through a diffusion-then-reconstruction process.
![alt text](readme_imgs/alg.png)
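The DDPM objective amounts to: noise a latent, then train the UNet to predict the noise that was added. In common notation (a sketch of the standard simplified loss, not copied from this repo):

```latex
L_{\text{simple}} = \mathbb{E}_{z_0,\; \epsilon \sim \mathcal{N}(0, I),\; t}
\left[ \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\; t \right) \right\|^2 \right]
```

Here $z_0$ is the clean image latent, $\bar{\alpha}_t$ the cumulative noise schedule, and $\epsilon_\theta$ the UNet's noise prediction.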
## Environment Setup
### Docker (Method 1)
```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run --shm-size 50g --network=host --name=sdxl --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <project path (absolute)>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install -r requirements.txt
```
### Dockerfile (Method 2)
```shell
docker build -t <IMAGE_NAME>:<TAG> .
docker run --shm-size 50g --network=host --name=sdxl --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <project path (absolute)>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install -r requirements.txt
```
### Anaconda (Method 3)
1. The special deep-learning libraries this project needs for DCU GPUs can be downloaded from the Guanghe developer community (光合开发者社区):
https://developer.hpccube.com/tool/
- DTK driver: dtk24.04.1
- python: 3.10
- torch: 2.1.0
- torchvision: 0.16.0

Tips: the DTK driver, python, torch, and other DCU-related tool versions above must match each other exactly.

2. Install the remaining ordinary dependencies from requirements.txt:
```shell
pip install -r requirements.txt
```
## Dataset
## Training
## Inference
```shell
python inference_diffusers.py --mode t2i --prompt <your prompt>
```
The full list of parameters:
|key|value|description|
|:---:|:---:|:---:|
|--mode|t2i/i2i/inpainting/t2i_wr/inpainting_wr|pipeline mode; the `_wr` suffix means "with refiner"|
|--base_path|/path/to/sdxl-base|path to the SDXL base model|
|--prompt|your prompt|text prompt|
|--refiner_path|/path/to/sdxl-refiner|path to the SDXL refiner model|
|--img_path||path to the input image (i2i/inpainting)|
|--mask_path||path to the mask image (inpainting)|
|--save_root||output folder for generated images|
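The script dispatches on the `--mode` string by substring matching, so the order of checks matters: `"i2i"` is a substring of `"t2i"`, and `t2i` must be tested first. A small self-contained sketch of that dispatch logic (the `parse_mode` helper is hypothetical, not part of the repo):

```python
def parse_mode(mode: str):
    """Return (task, with_refiner) for a mode string such as 't2i_wr'.

    't2i' is checked before 'i2i' because 'i2i' is a substring of 't2i'.
    """
    with_refiner = "wr" in mode
    if "t2i" in mode:
        task = "t2i"
    elif "i2i" in mode:
        task = "i2i"
    elif "inpainting" in mode:
        task = "inpainting"
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return task, with_refiner

print(parse_mode("t2i_wr"))      # ('t2i', True)
print(parse_mode("inpainting"))  # ('inpainting', False)
```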
Note: this script only covers basic SDXL functionality. For more advanced features, see the `References` section, or use [ComfyUI](https://github.com/comfyanonymous/ComfyUI) or [Fooocus](https://github.com/lllyasviel/Fooocus) for a better experience.
## Results
prompt: a panda is playing a ball.
||t2i|i2i|inpainting|
|---|:---:|:---:|:---:|
|img||![alt txt](readme_imgs/i2i_input.png)|![alt txt](readme_imgs/inpainting_input.png)|
|mask|||![alt txt](readme_imgs/inpainting_mask.png)|
|output|![alt txt](readme_imgs/t2i.png)|![alt txt](readme_imgs/i2i.png)|![alt txt](readme_imgs/inpainting.png)|
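Conceptually, inpainting repaints only the region where the mask is white (1) and keeps the original image elsewhere. A toy sketch of that compositing rule on flat pixel lists (illustrative only; the real pipeline conditions on the mask in latent space):

```python
def composite(init_pixels, gen_pixels, mask):
    """Keep original pixels where mask == 0; take generated pixels where mask == 1."""
    return [g if m else i for i, g, m in zip(init_pixels, gen_pixels, mask)]

print(composite([0, 10, 20], [5, 6, 7], [0, 1, 0]))  # [0, 6, 20]
```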
### Accuracy
## Application Scenarios
### Algorithm Category
`AIGC`
### Key Application Industries
`Retail, Media, Education`
## Pretrained Weights
|base-model|refiner-model|
|:---:|:---:|
|[huggingface](https://hf-mirror.com/stabilityai/stable-diffusion-xl-base-1.0) / [SCNet](http://113.200.138.88:18080/aimodels/stable-diffusion-xl-base-1.0) fast mirror|[huggingface](https://hf-mirror.com/stabilityai/stable-diffusion-xl-refiner-1.0) / [SCNet](http://113.200.138.88:18080/aimodels/stable-diffusion-xl-base-1.0) fast mirror|
Weight file layout:
```
pretrained_models/
├── stable-diffusion-xl-base-1.0
│   ├── 01.png
│   ├── comparison.png
│   ├── LICENSE.md
│   ├── model_index.json
│   ├── pipeline.png
│   ├── README.md
│   ├── scheduler
│   │   └── scheduler_config.json
│   ├── sd_xl_base_1.0_0.9vae.safetensors
│   ├── sd_xl_base_1.0.safetensors
│   ├── sd_xl_offset_example-lora_1.0.safetensors
│   ├── text_encoder
│   │   ├── config.json
│   │   ├── flax_model.msgpack
│   │   ├── model.fp16.safetensors
│   │   ├── model.onnx
│   │   ├── model.safetensors
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── text_encoder_2
│   │   ├── config.json
│   │   ├── flax_model.msgpack
│   │   ├── model.fp16.safetensors
│   │   ├── model.onnx
│   │   ├── model.onnx_data
│   │   ├── model.safetensors
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── tokenizer
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── tokenizer_2
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── unet
│   │   ├── config.json
│   │   ├── diffusion_flax_model.msgpack
│   │   ├── diffusion_pytorch_model.fp16.safetensors
│   │   ├── diffusion_pytorch_model.safetensors
│   │   ├── model.onnx
│   │   ├── model.onnx_data
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── vae
│   │   ├── config.json
│   │   ├── diffusion_flax_model.msgpack
│   │   ├── diffusion_pytorch_model.fp16.safetensors
│   │   └── diffusion_pytorch_model.safetensors
│   ├── vae_1_0
│   │   ├── config.json
│   │   ├── diffusion_pytorch_model.fp16.safetensors
│   │   └── diffusion_pytorch_model.safetensors
│   ├── vae_decoder
│   │   ├── config.json
│   │   ├── model.onnx
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   └── vae_encoder
│       ├── config.json
│       ├── model.onnx
│       ├── openvino_model.bin
│       └── openvino_model.xml
└── stable-diffusion-xl-refiner-1.0
    ├── 01.png
    ├── comparison.png
    ├── LICENSE.md
    ├── model_index.json
    ├── pipeline.png
    ├── README.md
    ├── scheduler
    │   └── scheduler_config.json
    ├── sd_xl_refiner_1.0_0.9vae.safetensors
    ├── sd_xl_refiner_1.0.safetensors
    ├── text_encoder_2
    │   ├── config.json
    │   ├── model.fp16.safetensors
    │   └── model.safetensors
    ├── tokenizer_2
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── unet
    │   ├── config.json
    │   ├── diffusion_pytorch_model.fp16.safetensors
    │   └── diffusion_pytorch_model.safetensors
    ├── vae
    │   ├── config.json
    │   ├── diffusion_pytorch_model.fp16.safetensors
    │   └── diffusion_pytorch_model.safetensors
    └── vae_1_0
        ├── config.json
        ├── diffusion_pytorch_model.fp16.safetensors
        └── diffusion_pytorch_model.safetensors
```
## Source Repository & Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/stable-diffusion-xl_pytorch
## References
* https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0
* https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
* https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/sdxl.md
import torch
import time
import os
from typing import Optional
from pathlib import Path
from diffusers.utils import load_image
from diffusers import AutoPipelineForText2Image, AutoPipelineForInpainting, \
    StableDiffusionXLInpaintPipeline, AutoPipelineForImage2Image, \
    StableDiffusionXLImg2ImgPipeline, DiffusionPipeline


def get_pipeline(base_path: Optional[str] = None,
                 refiner_path: Optional[str] = None,
                 mode: str = "t2i"):
    # mode is one of: t2i, i2i, inpainting, t2i_wr, inpainting_wr
    # ("_wr" = with refiner). "t2i" must be checked before "i2i"
    # because "i2i" is a substring of "t2i".
    if "t2i" in mode:
        if "wr" in mode:
            base_pipeline = DiffusionPipeline.from_pretrained(
                base_path,
                torch_dtype=torch.float16,
                variant="fp16",
                use_safetensors=True
            ).to("cuda")
            # The refiner shares the base model's second text encoder and VAE.
            refiner_pipeline = DiffusionPipeline.from_pretrained(
                refiner_path,
                text_encoder_2=base_pipeline.text_encoder_2,
                vae=base_pipeline.vae,
                torch_dtype=torch.float16,
                use_safetensors=True,
                variant="fp16",
            ).to("cuda")
            pipelines = [base_pipeline, refiner_pipeline]
        else:
            base_pipeline = AutoPipelineForText2Image.from_pretrained(
                base_path,
                torch_dtype=torch.float16,
                variant="fp16",
                use_safetensors=True
            ).to("cuda")
            pipelines = [base_pipeline]
    elif "i2i" in mode:
        base_pipeline = AutoPipelineForImage2Image.from_pretrained(
            base_path,
            torch_dtype=torch.float16,
            variant="fp16",
            use_safetensors=True
        ).to("cuda")
        pipelines = [base_pipeline]
    elif "inpainting" in mode:
        if "wr" in mode:
            base_pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
                base_path,
                torch_dtype=torch.float16,
                variant="fp16",
                use_safetensors=True
            ).to("cuda")
            refiner_pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
                refiner_path,
                text_encoder_2=base_pipeline.text_encoder_2,
                vae=base_pipeline.vae,
                torch_dtype=torch.float16,
                use_safetensors=True,
                variant="fp16"
            ).to("cuda")
            pipelines = [base_pipeline, refiner_pipeline]
        else:
            base_pipeline = AutoPipelineForInpainting.from_pretrained(
                base_path,
                torch_dtype=torch.float16,
                variant="fp16",
                use_safetensors=True
            ).to("cuda")
            pipelines = [base_pipeline]
    else:
        raise NotImplementedError(f"unsupported mode: {mode}")
    return pipelines


def save_img(img, save_root, mode):
    # Save under <save_root>/<mode>/<timestamp>.png
    save_root = os.path.join(save_root, mode)
    os.makedirs(save_root, exist_ok=True)
    save_path = os.path.join(save_root, str(time.time()) + ".png")
    img.save(save_path)


def t2i(pipelines: list, prompt: str):
    if len(pipelines) == 1:  # base only
        image = pipelines[0](prompt=prompt).images[0]
    else:
        # Base denoises the first 80% of steps and outputs raw latents;
        # the refiner picks up from there and finishes the trajectory.
        image = pipelines[0](prompt=prompt,
                             num_inference_steps=40,
                             denoising_end=0.8,
                             output_type="latent").images
        image = pipelines[1](prompt=prompt,
                             num_inference_steps=40,
                             denoising_start=0.8,
                             image=image).images[0]
    return image


def i2i(pipelines: list, prompt: str, img_path: str):
    if len(pipelines) == 1:
        init_image = load_image(img_path)
        image = pipelines[0](prompt=prompt,
                             image=init_image,
                             strength=0.8,
                             guidance_scale=10.5).images[0]
    else:
        # i2i with refiner is not supported by this script.
        raise NotImplementedError
    return image


def inpainting(pipelines: list, prompt: str, img_path: str, mask_path: str):
    init_image = load_image(img_path)
    mask_image = load_image(mask_path)
    if len(pipelines) == 1:
        image = pipelines[0](prompt=prompt,
                             image=init_image,
                             mask_image=mask_image,
                             strength=0.85,
                             guidance_scale=12.5).images[0]
    else:
        image = pipelines[0](prompt=prompt,
                             image=init_image,
                             mask_image=mask_image,
                             num_inference_steps=75,
                             denoising_end=0.7,
                             output_type="latent").images
        image = pipelines[1](prompt=prompt,
                             image=image,
                             mask_image=mask_image,
                             num_inference_steps=75,
                             denoising_start=0.7).images[0]
    return image


def inference(args):
    pipelines = get_pipeline(args.base_path,
                             args.refiner_path,
                             args.mode)
    if "t2i" in args.mode:
        img = t2i(pipelines, args.prompt)
    elif "i2i" in args.mode:
        img = i2i(pipelines, args.prompt, args.img_path)
    elif "inpainting" in args.mode:
        img = inpainting(pipelines, args.prompt, args.img_path, args.mask_path)
    else:
        raise NotImplementedError(f"unsupported mode: {args.mode}")
    save_img(img, args.save_root, args.mode)


if __name__ == "__main__":
    from argparse import ArgumentParser

    default_base_path = str(Path(__file__).resolve().parent / "pretrained_models" / "stable-diffusion-xl-base-1.0")
    default_refiner_path = str(Path(__file__).resolve().parent / "pretrained_models" / "stable-diffusion-xl-refiner-1.0")

    parser = ArgumentParser()
    parser.add_argument("--base_path", default=default_base_path, type=str)
    parser.add_argument("--refiner_path", default=default_refiner_path, type=str)
    parser.add_argument("--prompt", default="a panda is playing a ball", type=str)
    parser.add_argument("--img_path", type=str, default="")
    parser.add_argument("--mask_path", type=str, default="")
    parser.add_argument("--save_root", default="./results", type=str)
    parser.add_argument("--mode", type=str, required=True)
    args = parser.parse_args()
    inference(args)
# Unique model identifier
modelCode=794
# Model name
modelName=stable-diffusion-xl_pytorch
# Model description
modelDescription=SDXL can generate high-quality images.
# Application scenarios
appScenario=Inference,AIGC,Retail,Media,Education
# Framework type
frameType=pytorch