Commit 426c3736 authored by lijian6

Update README.md


Signed-off-by: lijian <lijian6@sugon.com>
parent 264b1b7a
---
license: other
license_name: stabilityai-nc-research-community
license_link: LICENSE
tags:
- text-to-image
- stable-diffusion
- diffusers
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Stability AI: checkbox
language:
- en
pipeline_tag: text-to-image
---
# Stable Diffusion 3 Medium
![sd3 demo images](sd3demo.jpg)
## Model
![mmdit](mmdit.png)
[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
For more technical details, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper).
Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License visit Stability.ai or [contact us](https://stability.ai/license) for commercial licensing details.
### Model Description
- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model Description:** This is a model that can be used to generate images based on text prompts. It is a Multimodal Diffusion Transformer (https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main) and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl))
## Paper
`Scaling Rectified Flow Transformers for High-Resolution Image Synthesis`
https://arxiv.org/abs/2403.03206
## Model architecture
Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with significantly improved image quality, typography, complex prompt understanding, and resource efficiency.
This project focuses on optimizing the inference performance of Stable Diffusion 3 Medium on the DCU platform, with the goal of fast image generation on DCU.
![img](docs/mmdit.png)
## Algorithm principle
SD3 processes the text input and the visual latent features as sequences of embeddings. Positional encoding is applied to 2x2 patches of the latent features, which are then flattened into a sequence of patch embeddings. This sequence, together with the text feature sequence, is fed into the MMDiT blocks. The two sequences are projected to the same feature dimension, concatenated, and passed through a series of attention modules and multilayer perceptrons (MLPs).
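As a rough illustration of the sequence construction described in the algorithm section (2x2 latent patches, separate projections for text and image tokens, then one attention pass over the concatenated sequence), here is a minimal pure-Python sketch. It is not the SD3 implementation: the function names (`patchify`, `joint_attention`), the random weights, and the toy sizes are all assumptions made for illustration, and positional encoding, multiple heads, and the MLP are omitted.

```python
import math
import random

def patchify(latent, patch=2):
    """Split a C x H x W latent (nested lists) into flattened patch tokens."""
    channels, height, width = len(latent), len(latent[0]), len(latent[0][0])
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([latent[c][top + dy][left + dx]
                           for c in range(channels)
                           for dy in range(patch)
                           for dx in range(patch)])
    return tokens  # (H/patch * W/patch) tokens of length C * patch * patch

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]

def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (hypothetical, randomly initialised)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(text_tokens, image_tokens, dim, rng):
    """Project each modality with its own weights, then attend over both."""
    def project_qkv(tokens):
        in_dim = len(tokens[0])
        return [matmul(tokens, random_matrix(in_dim, dim, rng)) for _ in range(3)]

    # Two separate sets of weights: one for text, one for image tokens.
    text_q, text_k, text_v = project_qkv(text_tokens)
    image_q, image_k, image_v = project_qkv(image_tokens)

    # Concatenate along the sequence axis so every token can attend to every other.
    q, k, v = text_q + image_q, text_k + image_k, text_v + image_v
    scores = matmul(q, list(zip(*k)))  # q @ k^T
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    mixed = matmul(weights, v)

    # Split the joint sequence back into its text and image parts.
    n_text = len(text_tokens)
    return mixed[:n_text], mixed[n_text:]

rng = random.Random(0)
latent = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(2)]
image_tokens = patchify(latent)  # 2 x 4 x 4 latent -> 4 tokens of length 8
text_tokens = [[rng.random() for _ in range(6)] for _ in range(3)]
text_out, image_out = joint_attention(text_tokens, image_tokens, dim=5, rng=rng)
print(len(image_tokens), len(text_out), len(image_out))
```

Because the attention runs over the concatenated sequence, text tokens are updated from image tokens and vice versa, which is the bidirectional flow the section contrasts with cross-attention on fixed text features.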
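To account for the differences between the two modalities, the MMDiT blocks use two separate sets of weights to project the text and image sequences to the shared feature dimension. The two sequences are then merged before the attention operation. This design lets each representation work in its own feature space while still extracting useful information from the other through attention [1]. This bidirectional flow of information between text and image differs from earlier text-to-image models, which feed text into the model through cross-attention; there, the text features supplied to every layer are the text encoder's output and do not change with depth.
### License
- **Non-commercial Use:** Stable Diffusion 3 Medium is released under the [Stability AI Non-Commercial Research Community License](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE). The model is free to use for non-commercial purposes such as academic research.
- **Commercial Use**: This model is not available for commercial use without a separate commercial license from Stability. We encourage professional artists, designers, and creators to use our Creator License. Please visit https://stability.ai/license to learn more.
## Environment setup
A Docker image for inference can be pulled from [光源](https://www.sourcefind.cn/#/service-details):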
### Model Sources
For local or self-hosted use, we recommend [ComfyUI](https://github.com/comfyanonymous/ComfyUI) for inference.
Stable Diffusion 3 Medium is available on our [Stability API Platform](https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1sd3/post).
Stable Diffusion 3 models and workflows are available on [Stable Assistant](https://stability.ai/stable-assistant) and on Discord via [Stable Artisan](https://stability.ai/stable-artisan).
- **ComfyUI:** https://github.com/comfyanonymous/ComfyUI
- **StableSwarmUI:** https://github.com/Stability-AI/StableSwarmUI
- **Tech report:** https://stability.ai/news/stable-diffusion-3-research-paper
- **Demo:** https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium
- **Diffusers support:** https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers
## Training Dataset
We used synthetic data and filtered publicly available data to train our models. The model was pre-trained on 1 billion images. The fine-tuning data includes 30M high-quality aesthetic images focused on specific visual content and style, as well as 3M preference data images.
## File Structure
```
├── comfy_example_workflows/
│   ├── sd3_medium_example_workflow_basic.json
│   ├── sd3_medium_example_workflow_multi_prompt.json
│   └── sd3_medium_example_workflow_upscaling.json
├── text_encoders/
│   ├── README.md
│   ├── clip_g.safetensors
│   ├── clip_l.safetensors
│   ├── t5xxl_fp16.safetensors
│   └── t5xxl_fp8_e4m3fn.safetensors
├── LICENSE
├── sd3_medium.safetensors
├── sd3_medium_incl_clips.safetensors
├── sd3_medium_incl_clips_t5xxlfp8.safetensors
└── sd3_medium_incl_clips_t5xxlfp16.safetensors
```
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:stablediffusion_v2-1_dtk24.04_xformers0.0.25_py310
# Replace <Image ID> with the ID of the Docker image pulled above
# <Host Path>: path on the host
# <Container Path>: path it is mapped to inside the container
docker run -it --name sd3 --shm-size=1024G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
```
Image version dependencies:
* DTK driver: dtk24.04
* PyTorch: 2.1.0
* Python: 3.10
We have prepared three packaging variants of the SD3 Medium model, each equipped with the same set of MMDiT & VAE weights, for user convenience.
* `sd3_medium.safetensors` includes the MMDiT and VAE weights but does not include any text encoders.
* `sd3_medium_incl_clips_t5xxlfp16.safetensors` contains all necessary weights, including fp16 version of the T5XXL text encoder.
* `sd3_medium_incl_clips_t5xxlfp8.safetensors` contains all necessary weights, including fp8 version of the T5XXL text encoder, offering a balance between quality and resource requirements.
* `sd3_medium_incl_clips.safetensors` includes all necessary weights except for the T5XXL text encoder. It requires minimal resources, but the model's performance will differ without the T5XXL text encoder.
* The `text_encoders` folder contains three text encoders and their original model card links for user convenience. All components within the text_encoders folder (and their equivalents embedded in other packings) are subject to their respective original licenses.
* The `comfy_example_workflows` folder contains example ComfyUI workflows.
## Dataset
## Inference
### Install diffusers and dependencies
## Using with Diffusers
Make sure you upgrade to the latest version of diffusers: `pip install -U diffusers`. Then you can run:
```python
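import torch
from diffusers import StableDiffusion3Pipeline

# Load the SD3 Medium pipeline in half precision to reduce memory use
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate one image from a text prompt
image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image  # displays the image in a notebook; call image.save(...) in a script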
```
Refer to [the documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) for more details on optimization and image-to-image support.
```
git clone http://developer.hpccube.com/codes/modelzoo/stable-diffusion-3-medium_diffusers.git
cd stable-diffusion-3-medium_diffusers
git submodule init && git submodule update
cd diffusers
python3 setup.py install
cd ..
```
## Uses
### Intended Uses
Intended uses include the following:
* Generation of artworks and use in design and other artistic processes.
* Applications in educational or creative tools.
* Research on generative models, including understanding the limitations of generative models.
All uses of the model should be in accordance with our [Acceptable Use Policy](https://stability.ai/use-policy).
### Out-of-Scope Uses
The model was not trained to be factual or true representations of people or events. As such, using the model to generate such content is out-of-scope of the abilities of this model.
## Safety
As part of our safety-by-design and responsible AI deployment approach, we implement safety measures throughout the development of our models, from the time we begin pre-training a model to the ongoing development, fine-tuning, and deployment of each model. We have implemented a number of safety mitigations that are intended to reduce the risk of severe harms; however, we recommend that developers conduct their own testing and apply additional mitigations based on their specific use cases.
For more about our approach to Safety, please visit our [Safety page](https://stability.ai/safety).
### Evaluation Approach
Our evaluation methods include structured evaluations and internal and external red-teaming testing for specific, severe harms such as child sexual abuse and exploitation, extreme violence, and gore, sexually explicit content, and non-consensual nudity. Testing was conducted primarily in English and may not cover all possible harms. As with any model, the model may, at times, produce inaccurate, biased or objectionable responses to user prompts.
### Risks identified and mitigations:
* Harmful content: We have used filtered data sets when training our models and implemented safeguards that attempt to strike the right balance between usefulness and preventing harm. However, this does not guarantee that all possible harmful content has been removed. The model may, at times, generate toxic or biased content. All developers and deployers should exercise caution and implement content safety guardrails based on their specific product policies and application use cases.
* Misuse: Technical limitations and developer and end-user education can help mitigate against malicious applications of models. All users are required to adhere to our Acceptable Use Policy, including when applying fine-tuning and prompt engineering mechanisms. Please reference the Stability AI Acceptable Use Policy for information on violative uses of our products.
* Privacy violations: Developers and deployers are encouraged to adhere to privacy regulations with techniques that respect data privacy.
### Contact
Please report any issues with the model or contact us:
* Safety issues: safety@stability.ai
* Security issues: security@stability.ai
* Privacy issues: privacy@stability.ai
* License and general: https://stability.ai/license
* Enterprise license: https://stability.ai/enterprise
### Model download
[stable-diffusion-3-medium-diffusers](https://modelscope.cn/models/AI-ModelScope/stable-diffusion-3-medium-diffusers)
### Run stable-diffusion-3-medium
```
python test_diffusers.py
```
## Result
![img](./doc/result.png)
### Accuracy
## Application scenarios
### Algorithm category
`Text-to-image`
### Popular application industries
`Painting, anime, media`
## Source repository and issue feedback
http://developer.hpccube.com/codes/modelzoo/stable-diffusion-3-medium_diffusers.git
## References
https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers
\ No newline at end of file
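# Unique model identifier
modelCode = 724
# Model name
modelName=stable-diffusion-3-medium_diffusers
# Model description
modelDescription=stable-diffusion-3-medium is a multimodal text-to-image model based on the Diffusion Transformer
# Application scenarios
appScenario=inference,text-to-image,painting,anime,media
# Framework type
frameType=diffusers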
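For tooling that needs to read these metadata fields, a minimal sketch of a parser for this `key=value` layout might look like the following. The helper name `parse_properties` is hypothetical, and the sketch assumes only what is visible above: `#` comment lines, optional spaces around `=`, and one field per line.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields

sample = """\
# comment lines are ignored
modelCode = 724
modelName=stable-diffusion-3-medium_diffusers
frameType=diffusers
"""
fields = parse_properties(sample)
print(fields["modelName"], fields["frameType"])
```

All values are returned as strings; converting `modelCode` to an integer is left to the caller.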