Commit bc281f4d authored by chenzk

v1.0

---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: text-to-image
tags:
- Text-to-Image
- FLUX.1-dev
- image-generation
- Diffusion-Transformer
- subject-personalization
base_model: black-forest-labs/FLUX.1-dev
library_name: infinite-you
---
# InfiniteYou Model Card
<div style="display:flex;justify-content: center">
<a href="https://bytedance.github.io/InfiniteYou"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=blue&logo=github-pages"></a> &ensp;
<a href="https://arxiv.org/abs/2503.16418"><img src="https://img.shields.io/static/v1?label=Arxiv&message=InfiniteYou&color=darkred&logo=arxiv"></a> &ensp;
<a href="https://github.com/bytedance/InfiniteYou"><img src="https://img.shields.io/static/v1?label=GitHub&message=Code&color=green&logo=github"></a> &ensp;
<a href="https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Demo&color=orange"></a> &ensp;
</div>
![teaser](./assets/teaser.jpg)
This repository provides the official models for the following paper:
[**InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity**](https://arxiv.org/abs/2503.16418)<br />
[Liming Jiang](https://liming-jiang.com/),
[Qing Yan](https://scholar.google.com/citations?user=0TIYjPAAAAAJ),
[Yumin Jia](https://www.linkedin.com/in/yuminjia/),
[Zichuan Liu](https://scholar.google.com/citations?user=-H18WY8AAAAJ),
[Hao Kang](https://scholar.google.com/citations?user=VeTCSyEAAAAJ),
[Xin Lu](https://scholar.google.com/citations?user=mFC0wp8AAAAJ)<br />
ByteDance Intelligent Creation
> **Abstract:** Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce **InfiniteYou (InfU)**, one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
## 🔧 Installation and Usage
Please clone our [GitHub code repository](https://github.com/bytedance/InfiniteYou) and follow the [detailed instructions](https://github.com/bytedance/InfiniteYou#-requirements-and-installation) to install and use the released models for local inference.
We appreciate the GPU grant from the Hugging Face team.
You can also try our [InfiniteYou-FLUX Hugging Face demo](https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX) online.
## 💡 Important Usage Tips
- We released two model variants of InfiniteYou-FLUX v1.0: [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) and [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1). The `aes_stage2` is our model after stage-2 SFT, which is used by default for better text-image alignment and aesthetics. If you wish to achieve higher ID similarity, please try `sim_stage1`.
- To better fit specific personal needs, we find that two arguments are highly useful to adjust in our [code](https://github.com/bytedance/InfiniteYou): `--infusenet_conditioning_scale` (default: `1.0`) and `--infusenet_guidance_start` (default: `0.0`). Usually, you may NOT need to adjust them. If necessary, start by trying a slightly larger `--infusenet_guidance_start` (*e.g.*, `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (*e.g.*, `0.9`).
- We also provided two LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)) to enable additional usage flexibility. If needed, try `Realism` only first. They are *entirely optional*: they are examples to try and are NOT used in our paper.
- If the generated gender is not preferred, try adding specific words in the text prompt, such as 'a man', 'a woman', *etc*. We encourage using inclusive and respectful language.
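The tuning order described in the tips above can be sketched as a small helper that builds the command lines to try, one after another. This is only an illustration: the function names are hypothetical, the flag names are the ones documented for the `test.py` script in the code repository, and keeping `--infusenet_guidance_start` at `0.1` during the third attempt is an assumption, since the tips do not say whether to reset it.

```python
import shlex


def build_command(id_image, prompt, guidance_start=0.0,
                  conditioning_scale=1.0, model_version="aes_stage2"):
    """Return the argv list for one run of the repository's test.py script."""
    return [
        "python", "test.py",
        "--id_image", id_image,
        "--prompt", prompt,
        "--model_version", model_version,
        "--infusenet_guidance_start", str(guidance_start),
        "--infusenet_conditioning_scale", str(conditioning_scale),
    ]


def tuning_schedule(id_image, prompt, model_version="aes_stage2"):
    """Yield commands in the order suggested by the usage tips."""
    settings = [
        (0.0, 1.0),  # 1. defaults, usually sufficient
        (0.1, 1.0),  # 2. slightly larger guidance start only
        (0.1, 0.9),  # 3. also a slightly smaller conditioning scale (assumed)
    ]
    for guidance_start, conditioning_scale in settings:
        yield build_command(id_image, prompt, guidance_start,
                            conditioning_scale, model_version)


# Print the three candidate commands without running them.
for cmd in tuning_schedule("./assets/examples/man.jpg",
                           "A man, portrait, cinematic"):
    print(shlex.join(cmd))
```

Stopping at the first setting that gives a satisfactory image follows the advice that these arguments usually do not need to be changed at all.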
## 🏰 Model Zoo
| InfiniteYou Version | Model Version | Base Model Trained with | Description |
| :---: | :---: | :---: | :---: |
| [InfiniteYou-FLUX v1.0](https://huggingface.co/ByteDance/InfiniteYou) | [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) | [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | Stage-2 model after SFT. Better text-image alignment and aesthetics. |
| [InfiniteYou-FLUX v1.0](https://huggingface.co/ByteDance/InfiniteYou) | [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1) | [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | Stage-1 model before SFT. Higher identity similarity. |
## 🆚 Comparison with State-of-the-Art Relevant Methods
![comparative_results](./assets/comparative_results.jpg)
Qualitative comparison results of InfU with the state-of-the-art baselines, FLUX.1-dev IP-Adapter and PuLID-FLUX. The identity similarity and text-image alignment of the results generated by FLUX.1-dev IP-Adapter (IPA) are inadequate. PuLID-FLUX generates images with decent identity similarity. However, it suffers from poor text-image alignment (Columns 1, 2, 4), and the image quality (e.g., bad hands in Column 5) and aesthetic appeal are degraded. In addition, the face copy-paste issue of PuLID-FLUX is evident (Column 5). In comparison, the proposed InfU outperforms the baselines across all dimensions.
## ⚙️ Plug-and-Play Property with Off-the-Shelf Popular Approaches
![plug_and_play](./assets/plug_and_play.jpg)
InfU features a desirable plug-and-play design, compatible with many existing methods. It naturally supports base model replacement with any variants of FLUX.1-dev, such as FLUX.1-schnell for more efficient generation (e.g., in 4 steps). The compatibility with ControlNets and LoRAs provides more controllability and flexibility for customized tasks. Notably, the compatibility with OminiControl extends our potential for multi-concept personalization, such as interacted identity (ID) and object personalized generation. InfU is also compatible with IP-Adapter (IPA) for stylization of personalized images, producing decent results when injecting style references via IPA. Our plug-and-play feature may extend to even more approaches, providing valuable contributions to the broader community.
## 📜 Disclaimer and Licenses
The images used in this repository and related demos are sourced from consented subjects or generated by the models.
These pictures are intended solely to showcase the capabilities of our research. If you have any concerns, please feel free to contact us, and we will promptly remove any inappropriate content.
Our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](./LICENSE) for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface), the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)), *etc.*, must follow their original licenses and be used only for academic research purposes.
This research aims to positively impact the field of Generative AI. Any usage of this method must be responsible and comply with local laws. The developers do not assume any responsibility for any potential misuse.
## 📖 Citation
If you find InfiniteYou useful for your research or applications, please cite our paper:
```bibtex
@article{jiang2025infiniteyou,
title={{InfiniteYou}: Flexible Photo Recrafting While Preserving Your Identity},
author={Jiang, Liming and Yan, Qing and Jia, Yumin and Liu, Zichuan and Kang, Hao and Lu, Xin},
journal={arXiv preprint},
volume={arXiv:2503.16418},
year={2025}
}
```
We also appreciate it if you could give a star ⭐ to our [GitHub repository](https://github.com/bytedance/InfiniteYou). Thanks a lot!
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# InfiniteYou
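InfiniteYou flexibly changes the scene and content of a photo while accurately preserving your identity features; it is more than a simple face swap.
## Paper
`InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity`
- https://arxiv.org/pdf/2503.16418
## Model Architecture
InfuseNet injects identity features into the DiT base model through residual connections, which improves identity similarity while preserving the base model's generation capabilities.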
<div align=center>
<img src="./doc/structure.png"/>
</div>
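## Algorithm Principles
- InfuseNet: InfuseNet is the core component of InfiniteYou. Similar to ControlNet, it injects identity features into the diffusion model (such as FLUX). The identity features are injected through residual connections rather than by directly modifying the attention layers, which reduces the negative impact on the base model's generation capabilities.
- Pretraining stage: the model is pretrained on real single-person-single-sample (SPSS) data to learn to reconstruct identity images.
- Supervised fine-tuning stage: the model is fine-tuned on synthetic single-person-multiple-sample (SPMS) data to improve text-image alignment, image quality, and aesthetics.
- Diffusion Transformers: an advanced Diffusion Transformer (such as FLUX) serves as the base model. Such models perform well at image generation and can produce high-quality, high-resolution images, which provides a strong foundation for identity-preserved image generation.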
<div align=center>
<img src="./doc/algorithm.png"/>
</div>
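The residual injection idea can be illustrated with a toy sketch. It is not the InfuseNet implementation: the "blocks" are plain Python callables, the hidden state is a list of floats, and the names `run_blocks`, `residuals`, and `conditioning_scale` are hypothetical stand-ins. The only point is that the identity signal is added to each block's output and the blocks themselves are left unmodified.

```python
def run_blocks(hidden, blocks, residuals, conditioning_scale=1.0):
    """Run base-model blocks and add one scaled identity residual after each.

    hidden:    list of floats standing in for the hidden state
    blocks:    callables standing in for the base model's transformer blocks
    residuals: one list of floats per block, standing in for the outputs of
               an identity-injection network
    """
    for block, residual in zip(blocks, residuals):
        hidden = block(hidden)  # the base block runs unchanged
        # Residual connection: the identity signal is added on top.
        hidden = [h + conditioning_scale * r for h, r in zip(hidden, residual)]
    return hidden


def identity_block(x):
    """Trivial stand-in for a transformer block."""
    return list(x)


print(run_blocks([1.0, 2.0], [identity_block], [[0.5, 0.5]]))
```

With `conditioning_scale=0.0` the residual term vanishes and the output equals what the base blocks produce on their own, which is why this style of injection leaves the base model's behavior recoverable.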
## Environment Setup
```
mv InfiniteYou_pytorch InfiniteYou # drop the framework-name suffix
```
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
# Replace <your IMAGE ID> with the ID of the Docker image pulled above; for this image it is e77c15729879
docker run -it --shm-size=64G -v $PWD/InfiniteYou:/home/InfiniteYou -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name iy <your IMAGE ID> bash
cd /home/InfiniteYou
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
### Dockerfile (Option 2)
```
cd /home/InfiniteYou/docker
docker build --no-cache -t iy:latest .
docker run --shm-size=64G --name iy -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../InfiniteYou:/home/InfiniteYou -it iy bash
# If building the environment from the Dockerfile takes too long, comment out the pip install step in it and install the Python libraries after starting the container: pip install -r requirements.txt
```
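### Anaconda (Option 3)
1. The DCU-specific deep learning libraries this project requires can be downloaded and installed from the Guanghe developer community (光合开发者社区):
- https://developer.hpccube.com/tool/
```
DTK driver: dtk2504
python: python3.10
torch: 2.4.1
torchvision: 0.19.1
triton: 3.0.0
vllm: 0.6.2
flash-attn: 2.6.1
deepspeed: 0.14.2
apex: 1.4.0
transformers: 4.48.0
```
`Tips: the versions of the DTK driver, Python, torch, and the other DCU-related tools listed above must match each other exactly.`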
2. Install the remaining, non-DCU-specific libraries from requirements.txt:
```
cd /home/InfiniteYou
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
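Because the versions listed in step 1 have to match exactly, a quick check after installation can save debugging time. The sketch below uses the standard `importlib.metadata` module; the helper names are hypothetical, and accepting a local build suffix after `+` (for example a vendor build tag) is an assumption about how DCU wheels are versioned, not something this README states.

```python
from importlib import metadata

# Python package versions copied from the list in step 1.
EXPECTED = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "triton": "3.0.0",
    "vllm": "0.6.2",
    "flash-attn": "2.6.1",
    "deepspeed": "0.14.2",
    "apex": "1.4.0",
    "transformers": "4.48.0",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {name: (expected, found)} for every package that differs.

    A found version passes if it equals the expected one or extends it
    with a local build label such as "2.4.1+<build tag>" (assumption).
    """
    mismatches = {}
    for name, want in expected.items():
        got = installed.get(name)
        if got is None or not (got == want or got.startswith(want + "+")):
            mismatches[name] = (want, got)
    return mismatches


for name, (want, got) in find_mismatches(
        EXPECTED, installed_versions(EXPECTED)).items():
    print(f"{name}: expected {want}, found {got}")
```

The script prints nothing when every listed package matches. The DTK driver and the Python interpreter version are not pip distributions, so they are outside what this check covers.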
## Dataset
`None`
## Training
## Inference
Directory layout for the pretrained weights:
```
/home/InfiniteYou
├── ByteDance/InfiniteYou
├── black-forest-labs/FLUX.1-dev
└── recognition_arcface_ir_se50.pth
mv recognition_arcface_ir_se50.pth /usr/local/lib/python3.10/dist-packages/facexlib/weights/ # move the recognition_arcface_ir_se50 weights into the weights directory of the facexlib library
```
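Before running inference, it can help to confirm that the weights are where the layout above expects them. The following is a minimal sketch: `missing_weights` is a hypothetical helper, and the default paths are simply the ones shown above, after the `mv` step has moved the ArcFace file into the facexlib weights directory.

```python
from pathlib import Path


def missing_weights(
    root="/home/InfiniteYou",
    facexlib_weights="/usr/local/lib/python3.10/dist-packages/facexlib/weights",
):
    """Return the expected weight paths that do not exist yet."""
    root = Path(root)
    required = [
        root / "ByteDance" / "InfiniteYou",
        root / "black-forest-labs" / "FLUX.1-dev",
        Path(facexlib_weights) / "recognition_arcface_ir_se50.pth",
    ]
    return [str(path) for path in required if not path.exists()]


for path in missing_weights():
    print("missing:", path)
```

An empty result only means the paths exist; it does not verify that the downloads are complete.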
### Single Node, Multiple GPUs
```
cd /home/InfiniteYou
python test.py --id_image ./assets/examples/man.jpg --prompt "A man, portrait, cinematic" --out_results_dir ./results
```
For more information, see [`README_origin`](./README_origin.md) from the source project.
## Result
`Input:`
```
./assets/examples/man.jpg
```
<div align=center>
<img src="./doc/input.png"/>
</div>
`Output:`
```
results/'00000_man_A man, portrait, cinematic_seed876627650.png'
```
<div align=center>
<img src="./doc/result.png"/>
</div>
### Accuracy
Accuracy on DCU matches that on GPU. Inference framework: PyTorch.
## Application Scenarios
### Algorithm Category
`AIGC`
### Key Application Industries
`Retail, manufacturing, e-commerce, healthcare, education`
## Pretrained Weights
Fast download center for pretrained weights: [SCNet AIModels](https://www.scnet.cn/ui/aihub/models). The pretrained weights used in this project can be downloaded through the fast download channel: [ByteDance/InfiniteYou](https://gitlab.scnet.cn:9002/model/sugon_scnet/InfiniteYou.git), [black-forest-labs/FLUX.1-dev](https://gitlab.scnet.cn:9002/model/icszy_zs_ai/FLUX.1-dev.git)
The Hugging Face/GitHub download links are: [ByteDance/InfiniteYou](https://huggingface.co/ByteDance/InfiniteYou), [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev), [facexlib-recognition_arcface_ir_se50](https://github.com/xinntao/facexlib/releases/download/v0.1.0/recognition_arcface_ir_se50.pth)
## Source Repository and Issue Feedback
- http://developer.sourcefind.cn/codes/modelzoo/InfiniteYou_pytorch.git
## References
- https://github.com/bytedance/InfiniteYou.git
<div align="center">
## InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
[**Liming Jiang**](https://liming-jiang.com/)&nbsp;&nbsp;&nbsp;&nbsp;
[**Qing Yan**](https://scholar.google.com/citations?user=0TIYjPAAAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
[**Yumin Jia**](https://www.linkedin.com/in/yuminjia/)&nbsp;&nbsp;&nbsp;&nbsp;
[**Zichuan Liu**](https://scholar.google.com/citations?user=-H18WY8AAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
[**Hao Kang**](https://scholar.google.com/citations?user=VeTCSyEAAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
[**Xin Lu**](https://scholar.google.com/citations?user=mFC0wp8AAAAJ)<br />
ByteDance Intelligent Creation
<a href="https://bytedance.github.io/InfiniteYou"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=blue&logo=github-pages"></a> &ensp;
<a href="https://arxiv.org/abs/2503.16418"><img src="https://img.shields.io/static/v1?label=Arxiv&message=InfiniteYou&color=darkred&logo=arxiv"></a> &ensp;
<a href="https://arxiv.org/pdf/2503.16418"><img src="https://img.shields.io/static/v1?label=%F0%9F%93%96%20Paper&message=PDF&color=green"></a> &ensp;
<a href="https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Demo&color=orange"></a> &ensp;
</div>
![teaser](./assets/teaser.jpg)
> **Abstract:** *Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce **InfiniteYou (InfU)**, one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.*
## 🔥 News
- [03/2025] 🔥 The [code](https://github.com/bytedance/InfiniteYou), [model](https://huggingface.co/ByteDance/InfiniteYou), and [demo](https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX) of InfiniteYou-FLUX v1.0 are released.
- [03/2025] 🔥 The [project page](https://bytedance.github.io/InfiniteYou) of InfiniteYou is created.
- [03/2025] 🔥 The [paper](https://arxiv.org/abs/2503.16418) of InfiniteYou is released on arXiv.
## 💡 Important Usage Tips
- We released two model variants of InfiniteYou-FLUX v1.0: [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) and [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1). The `aes_stage2` is our model after SFT, which is used by default for better text-image alignment and aesthetics. For higher ID similarity, please try `sim_stage1` (using `--model_version` to switch). More details can be found in our [paper](https://arxiv.org/abs/2503.16418).
- To better fit specific personal needs, we find that two arguments are highly useful to adjust: <br />`--infusenet_conditioning_scale` (default: `1.0`) and `--infusenet_guidance_start` (default: `0.0`). Usually, you may NOT need to adjust them. If necessary, start by trying a slightly larger <br />`--infusenet_guidance_start` (*e.g.*, `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (*e.g.*, `0.9`).
- We also provided two LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)) to enable additional usage flexibility. If needed, try `Realism` only first. They are *entirely optional*: they are examples to try and are NOT used in our paper.
- If the generated gender does not align with your preferences, try adding specific words in the text prompt, such as 'a man', 'a woman', *etc*. We encourage users to use inclusive and respectful language.
## :european_castle: Model Zoo
| InfiniteYou Version | Model Version | Base Model Trained with | Description |
| :---: | :---: | :---: | :---: |
| [InfiniteYou-FLUX v1.0](https://huggingface.co/ByteDance/InfiniteYou) | [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) | [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | Stage-2 model after SFT. Better text-image alignment and aesthetics. |
| [InfiniteYou-FLUX v1.0](https://huggingface.co/ByteDance/InfiniteYou) | [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1) | [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | Stage-1 model before SFT. Higher identity similarity. |
## 🔧 Requirements and Installation
### Dependencies
Simply run this one-line command to install (feel free to create a `python3` virtual environment before you run):
```bash
pip install -r requirements.txt
```
### Memory Requirements
Please note that the current full-performance `bf16` model inference requires a **peak VRAM** of around **43GB**. **We are trying to reduce memory usage and will post an update soon.** Community contributions are welcome.
If you want to use our models ASAP but do not have a GPU with sufficient VRAM, please follow [Diffusers memory reduction tips](https://huggingface.co/docs/diffusers/en/optimization/memory) first, where some offloading strategies may be helpful.
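A rough pre-flight check can tell you whether the memory reduction tips are likely to be needed on your machine. This is a sketch under assumptions: the helper names are hypothetical, "43GB" is interpreted as 43 GiB, and the check compares total device memory only, so a GPU that barely passes may still run out of memory in practice.

```python
REQUIRED_GB = 43  # approximate peak VRAM quoted above for bf16 inference


def needs_memory_reduction(total_vram_bytes, required_gb=REQUIRED_GB):
    """True when the GPU's total memory is below the quoted peak usage."""
    return total_vram_bytes < required_gb * 1024 ** 3


def detect_total_vram(device=0):
    """Return total memory of a CUDA device in bytes, or None if unknown."""
    try:
        import torch  # imported lazily so the sketch runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory


total = detect_total_vram()
if total is None:
    print("No CUDA device detected; cannot estimate VRAM.")
elif needs_memory_reduction(total):
    print("Less VRAM than the quoted peak: consider the offloading tips.")
else:
    print("Total VRAM is at or above the quoted peak.")
```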
## ⚡️ Quick Inference
### Local Inference Script
```bash
python test.py --id_image ./assets/examples/man.jpg --prompt "A man, portrait, cinematic" --out_results_dir ./results
```
<details>
<summary style='font-size:20px'><b><i>Explanation of all the arguments (click to expand!)</i></b></summary>
- Input and output:
  - `--id_image (str)`: The path to the input identity (ID) image. Default: `./assets/examples/man.jpg`.
  - `--prompt (str)`: The text prompt for image generation. Default: `A man, portrait, cinematic`.
  - `--out_results_dir (str)`: The path to the output directory to save the generated results. Default: `./results`.
  - `--control_image (str or None)`: The path to the control image \[*optional*\] to extract five facial keypoints to control the generation. Default: `None`.
  - `--base_model_path (str)`: The Hugging Face or local path to the base model. Default: `black-forest-labs/FLUX.1-dev`.
  - `--model_dir (str)`: The path to the InfiniteYou model directory. Default: `ByteDance/InfiniteYou`.
- Version control:
  - `--infu_flux_version (str)`: InfiniteYou-FLUX version: currently only `v1.0` is supported. Default: `v1.0`.
  - `--model_version (str)`: The model variant to use: `aes_stage2` | `sim_stage1`. Default: `aes_stage2`.
- General inference arguments:
  - `--cuda_device (int)`: The cuda device ID to use. Default: `0`.
  - `--seed (int)`: The seed for reproducibility (0 for random). Default: `0`.
  - `--guideance_scale (float)`: The guidance scale for the diffusion process. Default: `3.5`.
  - `--num_steps (int)`: The number of inference steps. Default: `30`.
- InfiniteYou-specific arguments:
  - `--infusenet_conditioning_scale (float)`: The scale for the InfuseNet conditioning. Default: `1.0`.
  - `--infusenet_guidance_start (float)`: The start point for the InfuseNet guidance injection. Default: `0.0`.
  - `--infusenet_guidance_end (float)`: The end point for the InfuseNet guidance injection. Default: `1.0`.
- Optional LoRAs:
  - `--enable_realism_lora (store_true)`: Whether to enable the Realism LoRA. Default: `False`.
  - `--enable_anti_blur_lora (store_true)`: Whether to enable the Anti-blur LoRA. Default: `False`.
</details>
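To make the argument list concrete, here is a minimal `argparse` sketch that mirrors a subset of the documented options and their defaults. It is not the repository's `test.py`: `build_parser` and `resolve_seed` are hypothetical names, and the way a seed of `0` is turned into a random seed (including the range) is an assumption based only on the "0 for random" note.

```python
import argparse
import random


def build_parser():
    """Parser mirroring a subset of the documented options and defaults."""
    p = argparse.ArgumentParser(description="Sketch of documented options")
    # Input and output
    p.add_argument("--id_image", default="./assets/examples/man.jpg")
    p.add_argument("--prompt", default="A man, portrait, cinematic")
    p.add_argument("--out_results_dir", default="./results")
    p.add_argument("--control_image", default=None)
    # Version control
    p.add_argument("--model_version", default="aes_stage2",
                   choices=["aes_stage2", "sim_stage1"])
    # General inference arguments
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--num_steps", type=int, default=30)
    # InfiniteYou-specific arguments
    p.add_argument("--infusenet_conditioning_scale", type=float, default=1.0)
    p.add_argument("--infusenet_guidance_start", type=float, default=0.0)
    p.add_argument("--infusenet_guidance_end", type=float, default=1.0)
    # Optional LoRAs: store_true flags default to False
    p.add_argument("--enable_realism_lora", action="store_true")
    p.add_argument("--enable_anti_blur_lora", action="store_true")
    return p


def resolve_seed(seed):
    """Treat 0 as 'pick a random seed' (assumed behavior and range)."""
    return seed if seed != 0 else random.randint(1, 2 ** 32 - 1)


args = build_parser().parse_args(
    ["--model_version", "sim_stage1", "--infusenet_guidance_start", "0.1"])
print(args.model_version, args.infusenet_guidance_start,
      resolve_seed(args.seed))
```

The example invocation switches to `sim_stage1` and raises the guidance start slightly, the combination the usage tips describe as especially helpful for that variant.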
### Local Gradio Demo
```bash
python app.py
```
### Online Hugging Face Demo
We appreciate the GPU grant from the Hugging Face team.
You can also try our [InfiniteYou-FLUX Hugging Face demo](https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX) online.
## 🆚 Comparison with State-of-the-Art Relevant Methods
![comparative_results](./assets/comparative_results.jpg)
Qualitative comparison results of InfU with the state-of-the-art baselines, FLUX.1-dev IP-Adapter and PuLID-FLUX. The identity similarity and text-image alignment of the results generated by FLUX.1-dev IP-Adapter (IPA) are inadequate. PuLID-FLUX generates images with decent identity similarity. However, it suffers from poor text-image alignment (Columns 1, 2, 4), and the image quality (e.g., bad hands in Column 5) and aesthetic appeal are degraded. In addition, the face copy-paste issue of PuLID-FLUX is evident (Column 5). In comparison, the proposed InfU outperforms the baselines across all dimensions.
## ⚙️ Plug-and-Play Property with Off-the-Shelf Popular Approaches
![plug_and_play](./assets/plug_and_play.jpg)
InfU features a desirable plug-and-play design, compatible with many existing methods. It naturally supports base model replacement with any variants of FLUX.1-dev, such as FLUX.1-schnell for more efficient generation (e.g., in 4 steps). The compatibility with ControlNets and LoRAs provides more controllability and flexibility for customized tasks. Notably, the compatibility with OminiControl extends our potential for multi-concept personalization, such as interacted identity (ID) and object personalized generation. InfU is also compatible with IP-Adapter (IPA) for stylization of personalized images, producing decent results when injecting style references via IPA. Our plug-and-play feature may extend to even more approaches, providing valuable contributions to the broader community.
## 📜 Disclaimer and Licenses
The images used in this repository and related demos are sourced from consented subjects or generated by the models. These pictures are intended solely to showcase the capabilities of our research. If you have any concerns, please feel free to contact us, and we will promptly remove any inappropriate content.
The use of the released code, model, and demo must strictly adhere to the respective licenses. Our code is released under the [Apache 2.0 License](./LICENSE), and our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](https://huggingface.co/ByteDance/InfiniteYou/blob/main/LICENSE) for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface), the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)), *etc.*, must follow their original licenses and be used only for academic research purposes.
This research aims to positively impact the field of Generative AI. Any usage of this method must be responsible and comply with local laws. The developers do not assume any responsibility for any potential misuse.
## 🤗 Acknowledgments
We sincerely acknowledge the insightful discussions with Stathi Fotiadis, Min Jin Chong, Xiao Yang, Tiancheng Zhi, Jing Liu, and Xiaohui Shen. We genuinely appreciate the help from Jincheng Liang and Lu Guo with our user study and qualitative evaluation.
## 📖 Citation
If you find InfiniteYou useful for your research or applications, please cite our paper:
```bibtex
@article{jiang2025infiniteyou,
title={{InfiniteYou}: Flexible Photo Recrafting While Preserving Your Identity},
author={Jiang, Liming and Yan, Qing and Jia, Yumin and Liu, Zichuan and Kang, Hao and Lu, Xin},
journal={arXiv preprint},
volume={arXiv:2503.16418},
year={2025}
}
```
We also appreciate it if you could give a star :star: to this repository. Thanks a lot!
# Copyright (c) 2025 Bytedance Ltd. and/or its affiliates. All rights reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import gradio as gr
import pillow_avif
import torch
from huggingface_hub import snapshot_download
from pillow_heif import register_heif_opener
from pipelines.pipeline_infu_flux import InfUFluxPipeline
# Register HEIF support for Pillow
register_heif_opener()


class ModelVersion:
    STAGE_1 = "sim_stage1"
    STAGE_2 = "aes_stage2"

    DEFAULT_VERSION = STAGE_2


ENABLE_ANTI_BLUR_DEFAULT = False
ENABLE_REALISM_DEFAULT = False

pipeline = None
loaded_pipeline_config = {
    "model_version": "aes_stage2",
    "enable_realism": False,
    "enable_anti_blur": False,
}


def download_models():
    snapshot_download(repo_id='ByteDance/InfiniteYou', local_dir='./models/InfiniteYou', local_dir_use_symlinks=False)
    try:
        snapshot_download(repo_id='black-forest-labs/FLUX.1-dev', local_dir='./models/FLUX.1-dev', local_dir_use_symlinks=False)
    except Exception as e:
        print(e)
        print('\nFailed to download `black-forest-labs/FLUX.1-dev` to `./models/FLUX.1-dev`. '
              'Please accept the agreement and obtain access at https://huggingface.co/black-forest-labs/FLUX.1-dev. '
              'Then, use `huggingface-cli login` and your access tokens at https://huggingface.co/settings/tokens to authenticate. '
              'After that, run the code again.')
        print('\nYou can also download it manually from Hugging Face and put it in `./models/FLUX.1-dev`, '
              'or you can modify `base_model_path` in `app.py` to specify the correct path.')
        exit()


def prepare_pipeline(model_version, enable_realism, enable_anti_blur):
    global pipeline

    # Nothing to do if the requested configuration is already loaded.
    if (
        pipeline
        and loaded_pipeline_config["enable_realism"] == enable_realism
        and loaded_pipeline_config["enable_anti_blur"] == enable_anti_blur
        and model_version == loaded_pipeline_config["model_version"]
    ):
        return

    loaded_pipeline_config["enable_realism"] = enable_realism
    loaded_pipeline_config["enable_anti_blur"] = enable_anti_blur
    loaded_pipeline_config["model_version"] = model_version

    # Rebuild the pipeline only when the model version changes.
    if pipeline is None or pipeline.model_version != model_version:
        del pipeline

        model_path = f'./models/InfiniteYou/infu_flux_v1.0/{model_version}'
        print(f'loading model from {model_path}')

        pipeline = InfUFluxPipeline(
            base_model_path='./models/FLUX.1-dev',
            infu_model_path=model_path,
            insightface_root_path='./models/InfiniteYou/supports/insightface',
            image_proj_num_tokens=8,
            infu_flux_version='v1.0',
            model_version=model_version,
        )

    # Reset the optional LoRAs, then load only the ones that are enabled.
    pipeline.pipe.delete_adapters(['realism', 'anti_blur'])
    loras = []
    if enable_realism:
        loras.append(['./models/InfiniteYou/supports/optional_loras/flux_realism_lora.safetensors', 'realism', 1.0])
    if enable_anti_blur:
        loras.append(['./models/InfiniteYou/supports/optional_loras/flux_anti_blur_lora.safetensors', 'anti_blur', 1.0])
    pipeline.load_loras(loras)


def generate_image(
    input_image,
    control_image,
    prompt,
    seed,
    width,
    height,
    guidance_scale,
    num_steps,
    infusenet_conditioning_scale,
    infusenet_guidance_start,
    infusenet_guidance_end,
    enable_realism,
    enable_anti_blur,
    model_version
):
    global pipeline

    prepare_pipeline(model_version=model_version, enable_realism=enable_realism, enable_anti_blur=enable_anti_blur)

    # A seed of 0 means "random": draw a fresh 32-bit seed.
    if seed == 0:
        seed = torch.seed() & 0xFFFFFFFF

    try:
        image = pipeline(
            id_image=input_image,
            prompt=prompt,
            control_image=control_image,
            seed=seed,
            width=width,
            height=height,
            guidance_scale=guidance_scale,
            num_steps=num_steps,
            infusenet_conditioning_scale=infusenet_conditioning_scale,
            infusenet_guidance_start=infusenet_guidance_start,
            infusenet_guidance_end=infusenet_guidance_end,
        )
    except Exception as e:
        print(e)
        # gr.Error only reaches the UI when raised; gr.Warning shows the
        # message while keeping the original control flow (no exception).
        gr.Warning(f"An error occurred: {e}")
        return gr.update()

    return gr.update(value=image, label=f"Generated Image, seed = {seed}")


def generate_examples(id_image, control_image, prompt_text, seed, enable_realism, enable_anti_blur, model_version):
    # Examples use the demo defaults: 864x1152, guidance 3.5, 30 steps, full InfuseNet conditioning.
    return generate_image(id_image, control_image, prompt_text, seed, 864, 1152, 3.5, 30, 1.0, 0.0, 1.0, enable_realism, enable_anti_blur, model_version)


sample_list = [
['./assets/examples/man.jpg', None, 'A sophisticated gentleman exuding confidence. He is dressed in a 1990s brown plaid jacket with a high collar, paired with a dark grey turtleneck. His trousers are tailored and charcoal in color, complemented by a sleek leather belt. The background showcases an elegant library with bookshelves, a marble fireplace, and warm lighting, creating a refined and cozy atmosphere. His relaxed posture and casual hand-in-pocket stance add to his composed and stylish demeanor', 666, False, False, 'aes_stage2'],
['./assets/examples/man.jpg', './assets/examples/man_pose.jpg', 'A man, portrait, cinematic', 42, True, False, 'aes_stage2'],
['./assets/examples/man.jpg', None, 'A man, portrait, cinematic', 12345, False, False, 'sim_stage1'],
['./assets/examples/woman.jpg', './assets/examples/woman.jpg', 'A woman, portrait, cinematic', 1621695706, False, False, 'sim_stage1'],
['./assets/examples/woman.jpg', None, 'A young woman holding a sign with the text "InfiniteYou", "Infinite" in black and "You" in red, pure background', 3724009365, False, False, 'aes_stage2'],
['./assets/examples/woman.jpg', None, 'A photo of an elegant Javanese bride in traditional attire, with long hair styled into intricate a braid made of many fresh flowers, wearing a delicate headdress made from sequins and beads. She\'s holding flowers, light smiling at the camera, against a backdrop adorned with orchid blooms. The scene captures her grace as she stands amidst soft pastel colors, adding to its dreamy atmosphere', 42, True, False, 'aes_stage2'],
['./assets/examples/woman.jpg', None, 'A photo of an elegant Javanese bride in traditional attire, with long hair styled into intricate a braid made of many fresh flowers, wearing a delicate headdress made from sequins and beads. She\'s holding flowers, light smiling at the camera, against a backdrop adorned with orchid blooms. The scene captures her grace as she stands amidst soft pastel colors, adding to its dreamy atmosphere', 42, False, False, 'sim_stage1'],
]


with gr.Blocks() as demo:
    session_state = gr.State({})
    default_model_version = "v1.0"

    gr.HTML("""
    <div style="text-align: center; max-width: 900px; margin: 0 auto;">
    <h1 style="font-size: 1.5rem; font-weight: 700; display: block;">InfiniteYou-FLUX</h1>
    <h2 style="font-size: 1.2rem; font-weight: 300; margin-bottom: 1rem; display: block;">Official Gradio Demo for <a href="https://arxiv.org/abs/2503.16418">InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity</a></h2>
    <a href="https://bytedance.github.io/InfiniteYou">[Project Page]</a>&ensp;
    <a href="https://arxiv.org/abs/2503.16418">[Paper]</a>&ensp;
    <a href="https://github.com/bytedance/InfiniteYou">[Code]</a>&ensp;
    <a href="https://huggingface.co/ByteDance/InfiniteYou">[Model]</a>
    </div>
    """)

    gr.Markdown("""
    ### 💡 How to Use This Demo:
    1. **Upload an identity (ID) image containing a human face.** For multiple faces, only the largest face will be detected. The face should ideally be clear and large enough, without significant occlusions or blur.
    2. **Enter the text prompt to describe the generated image and select the model version.** Please refer to **important usage tips** under the Generated Image field.
    3. *[Optional] Upload a control image containing a human face.* Only five facial keypoints will be extracted to control the generation. If not provided, we use a black control image, indicating no control.
    4. *[Optional] Adjust advanced hyperparameters or apply optional LoRAs to meet personal needs.* Please refer to **important usage tips** under the Generated Image field.
    5. **Click the "Generate" button to generate an image.** Enjoy!
    """)

    with gr.Row():
        with gr.Column(scale=3):
            with gr.Row():
                ui_id_image = gr.Image(label="Identity Image", type="pil", scale=3, height=370, min_width=100)

                with gr.Column(scale=2, min_width=100):
                    ui_control_image = gr.Image(label="Control Image [Optional]", type="pil", height=370, min_width=100)

            ui_prompt_text = gr.Textbox(label="Prompt", value="Portrait, 4K, high quality, cinematic")
            ui_model_version = gr.Dropdown(
                label="Model Version",
                choices=[ModelVersion.STAGE_1, ModelVersion.STAGE_2],
                value=ModelVersion.DEFAULT_VERSION,
            )

            ui_btn_generate = gr.Button("Generate")
            with gr.Accordion("Advanced", open=False):
                with gr.Row():
                    ui_num_steps = gr.Number(label="num steps", value=30)
                    ui_seed = gr.Number(label="seed (0 for random)", value=0)
                with gr.Row():
                    ui_width = gr.Number(label="width", value=864)
                    ui_height = gr.Number(label="height", value=1152)
                ui_guidance_scale = gr.Number(label="guidance scale", value=3.5, step=0.5)
                ui_infusenet_conditioning_scale = gr.Slider(minimum=0.0, maximum=1.0, value=1.0, step=0.05, label="infusenet conditioning scale")
                with gr.Row():
                    ui_infusenet_guidance_start = gr.Slider(minimum=0.0, maximum=1.0, value=0.0, step=0.05, label="infusenet guidance start")
                    ui_infusenet_guidance_end = gr.Slider(minimum=0.0, maximum=1.0, value=1.0, step=0.05, label="infusenet guidance end")

            with gr.Accordion("LoRAs [Optional]", open=True):
                with gr.Row():
                    ui_enable_realism = gr.Checkbox(label="Enable realism LoRA", value=ENABLE_REALISM_DEFAULT)
                    ui_enable_anti_blur = gr.Checkbox(label="Enable anti-blur LoRA", value=ENABLE_ANTI_BLUR_DEFAULT)

        with gr.Column(scale=2):
            image_output = gr.Image(label="Generated Image", interactive=False, height=550, format='png')
            gr.Markdown(
                """
                ### ❗️ Important Usage Tips:
                - **Model Version**: `aes_stage2` is used by default for better text-image alignment and aesthetics. For higher ID similarity, try `sim_stage1`.
                - **Useful Hyperparameters**: Usually, there is NO need to adjust too much. If necessary, try a slightly larger `--infusenet_guidance_start` (*e.g.*, `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (*e.g.*, `0.9`).
                - **Optional LoRAs**: `realism` and `anti-blur`. To enable them, please check the corresponding boxes. If needed, try `realism` only first. They are optional and were NOT used in our paper.
                - **Gender Prompt**: If the generated gender is not preferred, add specific words in the prompt, such as 'a man', 'a woman', *etc*. We encourage using inclusive and respectful language.
                """
            )

    gr.Examples(
        sample_list,
        inputs=[ui_id_image, ui_control_image, ui_prompt_text, ui_seed, ui_enable_realism, ui_enable_anti_blur, ui_model_version],
        outputs=[image_output],
        fn=generate_examples,
        cache_examples=True,
    )

    ui_btn_generate.click(
        generate_image,
        inputs=[
            ui_id_image,
            ui_control_image,
            ui_prompt_text,
            ui_seed,
            ui_width,
            ui_height,
            ui_guidance_scale,
            ui_num_steps,
            ui_infusenet_conditioning_scale,
            ui_infusenet_guidance_start,
            ui_infusenet_guidance_end,
            ui_enable_realism,
            ui_enable_anti_blur,
            ui_model_version
        ],
        outputs=[image_output],
        concurrency_id="gpu"
    )

    with gr.Accordion("Local Gradio Demo for Developers", open=False):
        gr.Markdown(
            'Please refer to our GitHub repository to [run the InfiniteYou-FLUX gradio demo locally](https://github.com/bytedance/InfiniteYou#local-gradio-demo).'
        )

    gr.Markdown(
        """
        ---
        ### 📜 Disclaimer and Licenses
        The images used in this demo are sourced from consented subjects or generated by the models. These pictures are intended solely to show the capabilities of our research. If you have any concerns, please contact us, and we will promptly remove any inappropriate content.

        The use of the released code, model, and demo must strictly adhere to the respective licenses.
        Our code is released under the [Apache 2.0 License](https://github.com/bytedance/InfiniteYou/blob/main/LICENSE),
        and our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](https://huggingface.co/ByteDance/InfiniteYou/blob/main/LICENSE)
        for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface),
        the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs, *etc.*, must follow their original licenses and be used only for academic research purposes.

        This research aims to positively impact the field of Generative AI. Any usage of this method must be responsible and comply with local laws. The developers do not assume any responsibility for any potential misuse.
        """
    )

    gr.Markdown(
        """
        ### 📖 Citation
        If you find InfiniteYou useful for your research or applications, please cite our paper:
        ```bibtex
        @article{jiang2025infiniteyou,
        title={{InfiniteYou}: Flexible Photo Recrafting While Preserving Your Identity},
        author={Jiang, Liming and Yan, Qing and Jia, Yumin and Liu, Zichuan and Kang, Hao and Lu, Xin},
        journal={arXiv preprint},
        volume={arXiv:2503.16418},
        year={2025}
        }
        ```
        We also appreciate it if you could give a star ⭐ to our [GitHub repository](https://github.com/bytedance/InfiniteYou). Thanks a lot!
        """
    )

download_models()
prepare_pipeline(model_version=ModelVersion.DEFAULT_VERSION, enable_realism=ENABLE_REALISM_DEFAULT, enable_anti_blur=ENABLE_ANTI_BLUR_DEFAULT)
demo.queue()
demo.launch(server_name='0.0.0.0') # IPv4
# demo.launch(server_name='[::]') # IPv6
---
language:
- en
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
extra_gated_prompt: By clicking "Agree", you agree to the [FluxDev Non-Commercial License Agreement](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
and acknowledge the [Acceptable Use Policy](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/POLICY.md).
tags:
- text-to-image
- image-generation
- flux
---
![FLUX.1 [dev] Grid](./dev_grid.jpg)
`FLUX.1 [dev]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
For more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).
# Key Features
1. Cutting-edge output quality, second only to our state-of-the-art model `FLUX.1 [pro]`.
2. Competitive prompt following, matching the performance of closed-source alternatives.
3. Trained using guidance distillation, making `FLUX.1 [dev]` more efficient.
4. Open weights to drive new scientific research, and empower artists to develop innovative workflows.
5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the [flux-1-dev-non-commercial-license](./LICENSE.md).
# Usage
We provide a reference implementation of `FLUX.1 [dev]`, as well as sampling code, in a dedicated [github repository](https://github.com/black-forest-labs/flux).
Developers and creatives looking to build on top of `FLUX.1 [dev]` are encouraged to use this as a starting point.
## API Endpoints
The FLUX.1 models are also available via API from the following sources:
1. [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)
2. [replicate.com](https://replicate.com/collections/flux)
3. [fal.ai](https://fal.ai/models/fal-ai/flux/dev)
## ComfyUI
`FLUX.1 [dev]` is also available in [Comfy UI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.
## Diffusers
To use `FLUX.1 [dev]` with the 🧨 diffusers Python library, first install or upgrade diffusers:
```shell
pip install -U diffusers
```
Then you can use `FluxPipeline` to run the model:
```python
import torch
from diffusers import FluxPipeline
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Save some VRAM by offloading the model to CPU. Remove this if you have enough GPU memory
prompt = "A cat holding a sign that says hello world"
image = pipe(
prompt,
height=1024,
width=1024,
guidance_scale=3.5,
num_inference_steps=50,
max_sequence_length=512,
generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-dev.png")
```
To learn more, check out the [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) documentation.
---
# Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate output that matches the prompts.
- Prompt following is heavily influenced by the prompting style.
# Out-of-Scope Use
The model and its derivatives may not be used:
- In any way that violates any applicable national, federal, state, local or international law or regulation.
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.
- To generate or disseminate verifiably false information and/or content with the purpose of harming others.
- To generate or disseminate personally identifiable information that can be used to harm an individual.
- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.
- To create non-consensual nudity or illegal pornographic content.
- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.
- To generate or facilitate large-scale disinformation campaigns.
# License
This model falls under the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
# RUN source /opt/dtk-dtk25.04/env.sh
# # Install pip dependencies
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
accelerate==1.0.1
diffusers==0.31.0
facexlib==0.3.0
gradio==5.21.0
httpcore==1.0.7
httpx==0.28.1
huggingface-hub==0.28.1
insightface==0.7.3
numpy==1.26.4
onnxruntime==1.19.2
opencv-python==4.11.0.86
pillow==10.4.0
pillow-avif-plugin==1.5.0
pillow-heif==0.21.0
sentencepiece==0.2.0
# torch==2.2.1
# torchvision==0.17.1
transformers==4.48.0
peft==0.14.0
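# Model code
modelCode=1482
# Model name
modelName=InfiniteYou_pytorch
# Model description
modelDescription=Flexibly transforms scenes and content while accurately preserving your identity features; more than a simple face swap.
# Application scenarios
appScenario=inference,AIGC,retail,manufacturing,e-commerce,healthcare,education
# Framework type
frameType=pytorch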