v1.0

Commit bc281f4d authored Apr 07, 2025 by chenzk
20 changed files
--- a/ByteDance/InfiniteYou/README.md
+++ b/ByteDance/InfiniteYou/README.md
+---
+license: cc-by-nc-4.0
+language:
+- en
+pipeline_tag: text-to-image
+tags:
+- Text-to-Image
+- FLUX.1-dev
+- image-generation
+- Diffusion-Transformer
+- subject-personalization
+base_model: black-forest-labs/FLUX.1-dev
+library_name: infinite-you
+---
+# InfiniteYou Model Card
+<div style="display:flex;justify-content: center">
+<a href="https://bytedance.github.io/InfiniteYou"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=blue&logo=github-pages"></a> &ensp;
+<a href="https://arxiv.org/abs/2503.16418"><img src="https://img.shields.io/static/v1?label=Arxiv&message=InfiniteYou&color=darkred&logo=arxiv"></a> &ensp;
+<a href="https://github.com/bytedance/InfiniteYou"><img src="https://img.shields.io/static/v1?label=GitHub&message=Code&color=green&logo=github"></a> &ensp;
+<a href="https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Demo&color=orange"></a> &ensp;
+</div>
+![teaser](./assets/teaser.jpg)
+This repository provides the official models for the following paper:
+[**InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity**](https://arxiv.org/abs/2503.16418)<br />
+[Liming Jiang](https://liming-jiang.com/), 
+[Qing Yan](https://scholar.google.com/citations?user=0TIYjPAAAAAJ), 
+[Yumin Jia](https://www.linkedin.com/in/yuminjia/), 
+[Zichuan Liu](https://scholar.google.com/citations?user=-H18WY8AAAAJ), 
+[Hao Kang](https://scholar.google.com/citations?user=VeTCSyEAAAAJ), 
+[Xin Lu](https://scholar.google.com/citations?user=mFC0wp8AAAAJ)<br />
+ByteDance Intelligent Creation
+> **Abstract:** Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce **InfiniteYou (InfU)**, one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
+## 🔧 Installation and Usage
+Please clone our [GitHub code repository](https://github.com/bytedance/InfiniteYou) and follow the [detailed instructions](https://github.com/bytedance/InfiniteYou#-requirements-and-installation) to install and use the released models for local inference.
+We appreciate the GPU grant from the Hugging Face team. 
+You can also try our [InfiniteYou-FLUX Hugging Face demo](https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX) online.
+## 💡 Important Usage Tips
+- We released two model variants of InfiniteYou-FLUX v1.0: [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) and [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1). `aes_stage2` is our model after stage-2 SFT; it is used by default for better text-image alignment and aesthetics. For higher ID similarity, please try `sim_stage1`.
+- To better fit specific personal needs, we find that two arguments are highly useful to adjust in our [code](https://github.com/bytedance/InfiniteYou): `--infusenet_conditioning_scale` (default: `1.0`) and `--infusenet_guidance_start` (default: `0.0`). Usually, you may NOT need to adjust them. If necessary, start by trying a slightly larger `--infusenet_guidance_start` (*e.g.*, `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (*e.g.*, `0.9`).
+- We also provided two LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)) for additional flexibility. If needed, try `Realism` alone first. Both are *entirely optional*: they are examples to try and are NOT used in our paper.
+- If the generated gender is not preferred, try adding specific words in the text prompt, such as 'a man', 'a woman', *etc*. We encourage using inclusive and respectful language.
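To make the two tuning knobs concrete, here is a minimal sketch of how a start/end guidance window and a conditioning scale typically gate a ControlNet-style branch across denoising steps. It assumes InfuseNet follows the diffusers ControlNet gating convention (step `i` is skipped if `i / num_steps < start` or `(i + 1) / num_steps > end`); the function `infusenet_step_scales` is hypothetical and not part of the InfiniteYou code.

```python
from typing import List


def infusenet_step_scales(num_steps: int, conditioning_scale: float = 1.0,
                          guidance_start: float = 0.0,
                          guidance_end: float = 1.0) -> List[float]:
    """Return the effective InfuseNet scale applied at each denoising step.

    Assumption: mirrors the diffusers ControlNet rule, where step i (0-based)
    is skipped if i / num_steps < guidance_start or
    (i + 1) / num_steps > guidance_end.
    """
    scales = []
    for i in range(num_steps):
        skipped = (i / num_steps < guidance_start
                   or (i + 1) / num_steps > guidance_end)
        keep = 0.0 if skipped else 1.0
        scales.append(keep * conditioning_scale)
    return scales


# Defaults: identity features are injected at every one of 30 steps.
default = infusenet_step_scales(30)
# First tip: a slightly larger guidance start (0.1) skips the earliest steps.
later_start = infusenet_step_scales(30, guidance_start=0.1)
# Second tip: additionally lower the conditioning scale to 0.9.
weaker = infusenet_step_scales(30, conditioning_scale=0.9, guidance_start=0.1)

print(sum(1 for s in default if s > 0))      # 30
print(sum(1 for s in later_start if s > 0))  # 27
print(weaker[:4])                            # [0.0, 0.0, 0.0, 0.9]
```

Under this assumption, with the default 30 steps, `--infusenet_guidance_start 0.1` leaves the first 3 steps without identity injection, and `--infusenet_conditioning_scale 0.9` weakens the injection by 10% on the remaining steps.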
+## 🏰 Model Zoo
+| InfiniteYou Version | Model Version | Base Model Trained with | Description |  
+| :---: | :---: | :---: | :---: |
+| [InfiniteYou-FLUX v1.0](https://huggingface.co/ByteDance/InfiniteYou) | [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) | [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | Stage-2 model after SFT. Better text-image alignment and aesthetics. |
+| [InfiniteYou-FLUX v1.0](https://huggingface.co/ByteDance/InfiniteYou) | [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1) | [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | Stage-1 model before SFT. Higher identity similarity. |
+## 🆚 Comparison with State-of-the-Art Relevant Methods
+![comparative_results](./assets/comparative_results.jpg)
+Qualitative comparison results of InfU with the state-of-the-art baselines, FLUX.1-dev IP-Adapter and PuLID-FLUX. The identity similarity and text-image alignment of the results generated by FLUX.1-dev IP-Adapter (IPA) are inadequate. PuLID-FLUX generates images with decent identity similarity. However, it suffers from poor text-image alignment (Columns 1, 2, 4), and the image quality (e.g., bad hands in Column 5) and aesthetic appeal are degraded. In addition, the face copy-paste issue of PuLID-FLUX is evident (Column 5). In comparison, the proposed InfU outperforms the baselines across all dimensions.
+## ⚙️ Plug-and-Play Property with Off-the-Shelf Popular Approaches
+![plug_and_play](./assets/plug_and_play.jpg)
+InfU features a desirable plug-and-play design, compatible with many existing methods. It naturally supports replacing the base model with any variant of FLUX.1-dev, such as FLUX.1-schnell for more efficient generation (e.g., in 4 steps). Compatibility with ControlNets and LoRAs provides more controllability and flexibility for customized tasks. Notably, compatibility with OminiControl extends InfU's potential to multi-concept personalization, such as personalized generation of an identity (ID) interacting with an object. InfU is also compatible with IP-Adapter (IPA) for stylizing personalized images, producing decent results when style references are injected via IPA. The plug-and-play property may extend to even more approaches, providing valuable contributions to the broader community.
+## 📜 Disclaimer and Licenses
+The images used in this repository and related demos are sourced from consented subjects or generated by the models. 
+These pictures are intended solely to showcase the capabilities of our research. If you have any concerns, please feel free to contact us, and we will promptly remove any inappropriate content.
+Our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](./LICENSE) for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface), the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)), *etc.*, must follow their original licenses and be used only for academic research purposes.
+This research aims to positively impact the field of Generative AI. Any usage of this method must be responsible and comply with local laws. The developers do not assume any responsibility for any potential misuse.
+## 📖 Citation
+If you find InfiniteYou useful for your research or applications, please cite our paper:
+```bibtex
+@article{jiang2025infiniteyou,
+  title={{InfiniteYou}: Flexible Photo Recrafting While Preserving Your Identity},
+  author={Jiang, Liming and Yan, Qing and Jia, Yumin and Liu, Zichuan and Kang, Hao and Lu, Xin},
+  journal={arXiv preprint},
+  volume={arXiv:2503.16418},
+  year={2025}
+}
+```
+We would also appreciate it if you could give a star ⭐ to our [GitHub repository](https://github.com/bytedance/InfiniteYou). Thanks a lot!
\ No newline at end of file
--- a/LICENSE
+++ b/LICENSE
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright [yyyy] [name of copyright owner]
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
\ No newline at end of file
--- a/README.md
+++ b/README.md
+# InfiniteYou
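+Recraft photos with flexible changes of scene and content while precisely preserving your identity features: much more than a simple face swap.
+## Paper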
+`InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity`
+- https://arxiv.org/pdf/2503.16418
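+## Model Architecture
+InfuseNet injects identity features into the DiT base model through residual connections, improving identity similarity while preserving generation capability.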
+<div align=center>
+    <img src="./doc/structure.png"/>
+</div>
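+## Algorithm
+- **InfuseNet**: InfuseNet is the core component of InfiniteYou. Similar to ControlNet, it injects identity features into a diffusion model (such as FLUX). The identity features enter the diffusion model through residual connections instead of direct modifications to the attention layers, which reduces the negative impact on the base model's generation capability.
+- **Pretraining stage**: the model is pretrained on real single-person-single-sample (SPSS) data to learn to reconstruct the identity image.
+- **Supervised fine-tuning (SFT) stage**: the model is fine-tuned on synthetic single-person-multiple-sample (SPMS) data to improve text-image alignment, image quality, and aesthetics.
+- **Diffusion Transformers (DiTs)**: an advanced DiT (such as FLUX) serves as the base model and performs strongly at image generation. DiTs can generate high-quality, high-resolution images, providing a strong foundation for identity-preserving image generation.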
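The residual injection described above can be illustrated with a toy, framework-free sketch: each base DiT block runs unchanged, and a per-block identity residual from an InfuseNet-like branch is added to its output, scaled by a conditioning scale. Plain Python lists stand in for tensors, the design is simplified to one residual per block, and all names (`dit_forward`, the toy blocks and residuals) are hypothetical; the real model operates on image tokens with learned networks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def add(a: Sequence[float], b: Sequence[float], scale: float = 1.0) -> Vector:
    """Element-wise a + scale * b."""
    return [x + scale * y for x, y in zip(a, b)]


def dit_forward(hidden: Vector, blocks: Sequence[Block],
                residuals: Sequence[Vector],
                conditioning_scale: float = 1.0) -> Vector:
    """Run the base blocks, adding one scaled identity residual after each.

    The blocks themselves (including any attention inside them) are never
    modified; identity information enters only through the residual sum.
    """
    if len(blocks) != len(residuals):
        raise ValueError("expected one residual per block")
    for block, residual in zip(blocks, residuals):
        hidden = block(hidden)
        hidden = add(hidden, residual, conditioning_scale)
    return hidden


# Toy "DiT blocks": deterministic transforms of a 3-dim hidden state.
blocks: List[Block] = [lambda h: [2 * x for x in h], lambda h: [x + 1 for x in h]]
# Toy identity residuals, one per block (produced by InfuseNet in the real model).
residuals = [[0.5, 0.0, -0.5], [0.1, 0.1, 0.1]]

x = [1.0, 2.0, 3.0]
base_only = dit_forward(x, blocks, residuals, conditioning_scale=0.0)
with_id = dit_forward(x, blocks, residuals, conditioning_scale=1.0)
print(base_only)  # [3.0, 5.0, 7.0]
print(with_id)
```

In this toy, a scale of 0 recovers exactly the base-only output, which is one way to see why injecting through residuals, rather than editing attention layers, leaves the base model's own computation intact.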
+<div align=center>
+    <img src="./doc/algorithm.png"/>
+</div>
+## Environment Setup
+```
+mv InfiniteYou_pytorch InfiniteYou # drop the framework-name suffix
+```
+### Docker (Option 1)
+```
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
+# Replace <your IMAGE ID> with the ID of the image pulled above; for this image it is e77c15729879
+docker run -it --shm-size=64G -v $PWD/InfiniteYou:/home/InfiniteYou -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name iy <your IMAGE ID> bash
+cd /home/InfiniteYou
+pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
+```
+### Dockerfile (Option 2)
+```
+cd /home/InfiniteYou/docker
+docker build --no-cache -t iy:latest .
+docker run --shm-size=64G --name iy -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../InfiniteYou:/home/InfiniteYou -it iy bash
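+# If building the environment through the Dockerfile takes too long, comment out its pip install step and install the Python libraries after the container starts: pip install -r requirements.txt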
+```
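+### Anaconda (Option 3)
+1. The special deep learning libraries this project needs for DCU accelerators can be downloaded and installed from the Guanghe (光合) developer community: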
+- https://developer.hpccube.com/tool/
+```
+DTK driver:dtk2504
+python:python3.10
+torch:2.4.1
+torchvision:0.19.1
+triton:3.0.0
+vllm:0.6.2
+flash-attn:2.6.1
+deepspeed:0.14.2
+apex:1.4.0
+transformers:4.48.0
+```
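+`Tip: the versions of the DCU-related tools above (DTK driver, python, torch, etc.) must match one another exactly as listed.`
+2. Install the remaining, non-special libraries from requirements.txt: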
+```
+cd /home/InfiniteYou
+pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
+```
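Because the versions above must match exactly, a quick pre-flight check can save a failed run. The sketch below compares installed distributions against the pins using the standard-library `importlib.metadata`. The distribution names (notably `flash-attn` and `apex`) are assumptions and may differ for the DCU builds, and the function names `installed_version` and `check_pins` are hypothetical. The DTK driver and Python version are not checked here.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Pins copied from the version list above (DTK driver and Python excluded).
PINS: Dict[str, str] = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "triton": "3.0.0",
    "vllm": "0.6.2",
    "flash-attn": "2.6.1",
    "deepspeed": "0.14.2",
    "apex": "1.4.0",
    "transformers": "4.48.0",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """Return human-readable mismatches; an empty list means every pin matches.

    Only the public version (the part before '+') is compared, because vendor
    builds may append a PEP 440 local version label after a '+'.
    """
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif found.split("+", 1)[0] != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pinned versions match"]:
        print(line)
```

The `lookup` parameter is injectable so the comparison logic can be exercised without the DCU stack installed.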
+## Dataset
+`None`
+## Training
+None
+## Inference
+Pretrained weight directory layout:
+```
+/home/InfiniteYou
+    ├── ByteDance/InfiniteYou
+    ├── black-forest-labs/FLUX.1-dev
+    └── recognition_arcface_ir_se50.pth
+mv recognition_arcface_ir_se50.pth /usr/local/lib/python3.10/dist-packages/facexlib/weights/ # move the recognition_arcface_ir_se50 weights into the weights directory of the facexlib library
+``` 
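The `mv` target above hard-codes the container's `site-packages` path. The sketch below, under the assumption that facexlib is an importable package whose `weights/` folder sits next to its `__init__.py` (as the path above suggests), locates that folder in whatever environment is active and checks the directory layout first. All function names are hypothetical helpers, not part of the project.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import List

# Entries expected under the project root, per the layout above.
EXPECTED = ["ByteDance/InfiniteYou", "black-forest-labs/FLUX.1-dev"]
ARCFACE = "recognition_arcface_ir_se50.pth"


def missing_entries(root: Path, expected: List[str] = EXPECTED) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    return [rel for rel in expected if not (root / rel).exists()]


def facexlib_weights_dir() -> Path:
    """Locate facexlib's weights/ folder without importing the package."""
    spec = importlib.util.find_spec("facexlib")
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError("facexlib is not installed")
    return Path(spec.origin).parent / "weights"


def install_arcface(src: Path, weights_dir: Path) -> Path:
    """Move the ArcFace checkpoint into weights_dir, creating it if needed."""
    weights_dir.mkdir(parents=True, exist_ok=True)
    dest = weights_dir / src.name
    shutil.move(str(src), str(dest))
    return dest


if __name__ == "__main__":
    root = Path("/home/InfiniteYou")  # project root used in this README
    print("missing:", missing_entries(root))
    src = root / ARCFACE
    if src.exists():
        print("moved to", install_arcface(src, facexlib_weights_dir()))
```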
+### Single Machine, Multiple Cards
+```
+cd /home/InfiniteYou
+python test.py --id_image ./assets/examples/man.jpg --prompt "A man, portrait, cinematic" --out_results_dir ./results
+```
+For more information, see [`README_origin`](./README_origin.md) from the original project.
+## Result
+`Input:`
+```
+./assets/examples/man.jpg
+```
+<div align=center>
+    <img src="./doc/input.png"/>
+</div>
+`Output:`
+```
+results/'00000_man_A man, portrait, cinematic_seed876627650.png'
+```
+<div align=center>
+    <img src="./doc/result.png"/>
+</div>
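Judging only from the single example output above, the file name seems to combine a zero-padded index, the ID image stem, the prompt, and the seed. The hypothetical helpers below reproduce and parse that pattern, for instance to recover the seed of a good result and rerun it with `--seed`. The pattern is inferred from one sample, not taken from `test.py`, and parsing assumes the ID image stem contains no underscore.

```python
import re
from pathlib import Path
from typing import Dict, Optional, Union

# Inferred from the example: <index:05d>_<id stem>_<prompt>_seed<seed>.png
PATTERN = re.compile(
    r"^(?P<index>\d{5,})_(?P<stem>[^_]+)_(?P<prompt>.*)_seed(?P<seed>\d+)\.png$"
)


def result_filename(index: int, id_image: str, prompt: str, seed: int) -> str:
    """Build an output name following the inferred pattern."""
    return f"{index:05d}_{Path(id_image).stem}_{prompt}_seed{seed}.png"


def parse_result_filename(name: str) -> Optional[Dict[str, Union[int, str]]]:
    """Split an output name into its parts, or return None if it does not match."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return {
        "index": int(m["index"]),
        "stem": m["stem"],
        "prompt": m["prompt"],
        "seed": int(m["seed"]),
    }


name = result_filename(0, "./assets/examples/man.jpg",
                       "A man, portrait, cinematic", 876627650)
print(name)  # 00000_man_A man, portrait, cinematic_seed876627650.png
print(parse_result_filename(name)["seed"])  # 876627650
```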
+### Accuracy
+Accuracy on DCU is consistent with that on GPU. Inference framework: PyTorch.
+## Application Scenarios
+### Algorithm Category
+`AIGC`
+### Key Application Industries
+`Retail, Manufacturing, E-commerce, Healthcare, Education`
+## Pretrained Weights
+Fast download center for pretrained weights: [SCNet AIModels](https://www.scnet.cn/ui/aihub/models). The pretrained weights used in this project can be downloaded through the fast download channels: [ByteDance/InfiniteYou](https://gitlab.scnet.cn:9002/model/sugon_scnet/InfiniteYou.git), [black-forest-labs/FLUX.1-dev](https://gitlab.scnet.cn:9002/model/icszy_zs_ai/FLUX.1-dev.git)
+Hugging Face/GitHub download links: [ByteDance/InfiniteYou](https://huggingface.co/ByteDance/InfiniteYou), [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev), [facexlib-recognition_arcface_ir_se50](https://github.com/xinntao/facexlib/releases/download/v0.1.0/recognition_arcface_ir_se50.pth)
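For the Hugging Face route, a minimal sketch using `huggingface_hub.snapshot_download` can place both repositories into the directory layout shown under Inference and fetch the ArcFace checkpoint from the GitHub link. It assumes the project root is `/home/InfiniteYou` and that `huggingface_hub` is installed; FLUX.1-dev is a gated repository, so its license must be accepted on the Hub and you must be logged in (for example with `huggingface-cli login`) before downloading. The helper names are hypothetical.

```python
import urllib.request
from pathlib import Path
from typing import Dict

# Hugging Face repo ids from the links above; each is mirrored under root/<repo id>.
REPO_IDS = ["ByteDance/InfiniteYou", "black-forest-labs/FLUX.1-dev"]
ARCFACE_URL = ("https://github.com/xinntao/facexlib/releases/download/"
               "v0.1.0/recognition_arcface_ir_se50.pth")


def download_plan(root: Path) -> Dict[str, Path]:
    """Map each repo id to the local directory it should be downloaded into."""
    return {repo_id: root / repo_id for repo_id in REPO_IDS}


def download_all(root: Path) -> None:
    """Fetch both model repos and the ArcFace checkpoint into root."""
    # Imported lazily so download_plan() works without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in download_plan(root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    urllib.request.urlretrieve(ARCFACE_URL, str(root / ARCFACE_URL.rsplit("/", 1)[-1]))
```

Calling `download_all(Path("/home/InfiniteYou"))` downloads several tens of gigabytes; the ArcFace checkpoint still has to be moved into facexlib's `weights` directory afterwards, as described under Inference.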
+## Source Repository and Issue Feedback
+- http://developer.sourcefind.cn/codes/modelzoo/InfiniteYou_pytorch.git
+## References
+- https://github.com/bytedance/InfiniteYou.git
--- a/README_origin.md
+++ b/README_origin.md
+<div align="center">
+## InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
+[**Liming Jiang**](https://liming-jiang.com/)&nbsp;&nbsp;&nbsp;&nbsp;
+[**Qing Yan**](https://scholar.google.com/citations?user=0TIYjPAAAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
+[**Yumin Jia**](https://www.linkedin.com/in/yuminjia/)&nbsp;&nbsp;&nbsp;&nbsp;
+[**Zichuan Liu**](https://scholar.google.com/citations?user=-H18WY8AAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
+[**Hao Kang**](https://scholar.google.com/citations?user=VeTCSyEAAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
+[**Xin Lu**](https://scholar.google.com/citations?user=mFC0wp8AAAAJ)<br />
+ByteDance Intelligent Creation
+<a href="https://bytedance.github.io/InfiniteYou"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=blue&logo=github-pages"></a> &ensp;
+<a href="https://arxiv.org/abs/2503.16418"><img src="https://img.shields.io/static/v1?label=Arxiv&message=InfiniteYou&color=darkred&logo=arxiv"></a> &ensp;
+<a href="https://arxiv.org/pdf/2503.16418"><img src="https://img.shields.io/static/v1?label=%F0%9F%93%96%20Paper&message=PDF&color=green"></a> &ensp;
+<a href="https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Demo&color=orange"></a> &ensp;
+</div>
+![teaser](./assets/teaser.jpg)
+> **Abstract:** *Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce **InfiniteYou (InfU)**, one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.*
+## 🔥 News
+- [03/2025] 🔥 The [code](https://github.com/bytedance/InfiniteYou), [model](https://huggingface.co/ByteDance/InfiniteYou), and [demo](https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX) of InfiniteYou-FLUX v1.0 are released.
+- [03/2025] 🔥 The [project page](https://bytedance.github.io/InfiniteYou) of InfiniteYou is created.
+- [03/2025] 🔥 The [paper](https://arxiv.org/abs/2503.16418) of InfiniteYou is released on arXiv.
+## 💡 Important Usage Tips
+- We released two model variants of InfiniteYou-FLUX v1.0: [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) and [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1). `aes_stage2` is our model after SFT and is used by default for better text-image alignment and aesthetics. For higher ID similarity, please try `sim_stage1` (switch with `--model_version`). More details can be found in our [paper](https://arxiv.org/abs/2503.16418).
+- To better fit specific personal needs, we find that two arguments are highly useful to adjust: <br />`--infusenet_conditioning_scale` (default: `1.0`) and `--infusenet_guidance_start` (default: `0.0`). Usually, you may NOT need to adjust them. If necessary, start by trying a slightly larger <br />`--infusenet_guidance_start` (*e.g.*, `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (*e.g.*, `0.9`).
+- We also provided two LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)) for additional flexibility. If needed, try `Realism` alone first. Both are *entirely optional*: they are examples to try and are NOT used in our paper.
+- If the generated gender does not align with your preferences, try adding specific words in the text prompt, such as 'a man', 'a woman', *etc*. We encourage users to use inclusive and respectful language.
+## :european_castle: Model Zoo
+| InfiniteYou Version | Model Version | Base Model Trained with | Description |  
+| :---: | :---: | :---: | :---: |
+| [InfiniteYou-FLUX v1.0](https://huggingface.co/ByteDance/InfiniteYou) | [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) | [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | Stage-2 model after SFT. Better text-image alignment and aesthetics. |
+| [InfiniteYou-FLUX v1.0](https://huggingface.co/ByteDance/InfiniteYou) | [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1) | [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | Stage-1 model before SFT. Higher identity similarity. |
+## 🔧 Requirements and Installation
+### Dependencies
+Simply run this one-line command to install (feel free to create a `python3` virtual environment before you run):
+```bash
+pip install -r requirements.txt
+```
+### Memory Requirements 
+Please note that the current full-performance `bf16` model inference requires a **peak VRAM** of around **43GB**. **We are trying to reduce memory usage and will post an update soon.** Community contributions are welcome.
+If you want to use our models ASAP but do not have a GPU with sufficient VRAM, please follow [Diffusers memory reduction tips](https://huggingface.co/docs/diffusers/en/optimization/memory) first, where some offloading strategies may be helpful.
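A quick way to decide whether offloading is needed is to compare free device memory with the roughly 43GB peak quoted above. This sketch treats that figure as GiB and uses `torch.cuda.mem_get_info`, which returns `(free, total)` in bytes; the helper names are hypothetical. If offloading is needed, the linked Diffusers guide describes options such as `enable_model_cpu_offload()` on Diffusers pipelines; whether the InfiniteYou pipeline exposes them directly is an assumption to verify against its code.

```python
PEAK_VRAM_GIB = 43  # approximate peak for full-performance bf16 inference


def needs_offloading(free_bytes: int, peak_gib: float = PEAK_VRAM_GIB) -> bool:
    """True if free device memory is below the quoted peak requirement."""
    return free_bytes < peak_gib * 1024**3


def free_device_bytes(device: int = 0) -> int:
    """Query free memory on a CUDA device (requires PyTorch with CUDA)."""
    import torch  # imported lazily so needs_offloading() works without torch

    free, _total = torch.cuda.mem_get_info(device)
    return free
```

On a machine with PyTorch and a visible GPU, `needs_offloading(free_device_bytes(0))` gives a first answer; actual peak usage can vary with resolution, step count, and enabled LoRAs.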
+## ⚡️ Quick Inference
+### Local Inference Script
+```bash
+python test.py --id_image ./assets/examples/man.jpg --prompt "A man, portrait, cinematic" --out_results_dir ./results
+```
+<details>
+<summary style='font-size:20px'><b><i>Explanation of all the arguments (click to expand!)</i></b></summary>
+- Input and output:
+  - `--id_image (str)`: The path to the input identity (ID) image. Default: `./assets/examples/man.jpg`.
+  - `--prompt (str)`: The text prompt for image generation. Default: `A man, portrait, cinematic`.
+  - `--out_results_dir (str)`: The path to the output directory to save the generated results. Default: `./results`.
+  - `--control_image (str or None)`: The path to the control image \[*optional*\] from which five facial keypoints are extracted to control the generation. Default: `None`.
+  - `--base_model_path (str)`: The Hugging Face or local path to the base model. Default: `black-forest-labs/FLUX.1-dev`.
+  - `--model_dir (str)`: The path to the InfiniteYou model directory. Default: `ByteDance/InfiniteYou`.
+- Version control:
+  - `--infu_flux_version (str)`: InfiniteYou-FLUX version: currently only `v1.0` is supported. Default: `v1.0`.
+  - `--model_version (str)`: The model variant to use: `aes_stage2` | `sim_stage1`. Default: `aes_stage2`.
+- General inference arguments:
+  - `--cuda_device (int)`: The CUDA device ID to use. Default: `0`.
+  - `--seed (int)`: The seed for reproducibility (0 for random). Default: `0`.
+  - `--guidance_scale (float)`: The guidance scale for the diffusion process. Default: `3.5`.
+  - `--num_steps (int)`: The number of inference steps. Default: `30`.
+- InfiniteYou-specific arguments:
+  - `--infusenet_conditioning_scale (float)`: The scale for the InfuseNet conditioning. Default: `1.0`.
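+  - `--infusenet_guidance_start (float)`: The point in the denoising process, as a fraction from `0.0` to `1.0`, at which InfuseNet guidance injection starts. Default: `0.0`.
+  - `--infusenet_guidance_end (float)`: The point in the denoising process, as a fraction from `0.0` to `1.0`, at which InfuseNet guidance injection ends. Default: `1.0`.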
+- Optional LoRAs:
+  - `--enable_realism_lora (store_true)`: Whether to enable the Realism LoRA. Default: `False`.
+  - `--enable_anti_blur_lora (store_true)`: Whether to enable the Anti-blur LoRA. Default: `False`.
+</details>
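`--infusenet_guidance_start` and `--infusenet_guidance_end` select a window of the denoising schedule. Assuming they follow the same convention as the `control_guidance_start`/`control_guidance_end` arguments of diffusers ControlNet pipelines (an assumption; check `pipelines/pipeline_infu_flux.py` to confirm), the sketch below computes which of the `--num_steps` steps receive InfuseNet guidance. The function name `infusenet_active_steps` is hypothetical.

```python
def infusenet_active_steps(num_steps, start=0.0, end=1.0):
    """Return indices of denoising steps where InfuseNet guidance is applied.

    Assumes the diffusers ControlNet convention: step i covers the fraction
    [i / num_steps, (i + 1) / num_steps] of the schedule and is kept only if
    that interval lies inside [start, end].
    """
    return [
        i for i in range(num_steps)
        if not (i / num_steps < start or (i + 1) / num_steps > end)
    ]


if __name__ == "__main__":
    # Default 30 steps with a slightly larger start, as suggested in the usage tips.
    steps = infusenet_active_steps(30, start=0.1)
    print(len(steps), steps[0])  # prints "27 3"
```

Under this assumption, raising the start from `0.0` to `0.1` with 30 steps skips InfuseNet for the first three steps only, so it is a gentle adjustment.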
+### Local Gradio Demo
+```bash
+python app.py
+```
+### Online Hugging Face Demo
+We appreciate the GPU grant from the Hugging Face team. 
+You can also try our [InfiniteYou-FLUX Hugging Face demo](https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX) online.
+## 🆚 Comparison with State-of-the-Art Relevant Methods
+![comparative_results](./assets/comparative_results.jpg)
+Qualitative comparison results of InfU with the state-of-the-art baselines, FLUX.1-dev IP-Adapter and PuLID-FLUX. The identity similarity and text-image alignment of the results generated by FLUX.1-dev IP-Adapter (IPA) are inadequate. PuLID-FLUX generates images with decent identity similarity. However, it suffers from poor text-image alignment (Columns 1, 2, 4), and the image quality (e.g., bad hands in Column 5) and aesthetic appeal are degraded. In addition, the face copy-paste issue of PuLID-FLUX is evident (Column 5). In comparison, the proposed InfU outperforms the baselines across all dimensions.
+## ⚙️ Plug-and-Play Property with Off-the-Shelf Popular Approaches
+![plug_and_play](./assets/plug_and_play.jpg)
+InfU features a desirable plug-and-play design, compatible with many existing methods. It naturally supports base model replacement with any variant of FLUX.1-dev, such as FLUX.1-schnell for more efficient generation (e.g., in 4 steps). Compatibility with ControlNets and LoRAs provides more controllability and flexibility for customized tasks. Notably, compatibility with OminiControl extends InfU to multi-concept personalization, such as personalized generation of an identity (ID) interacting with an object. InfU is also compatible with IP-Adapter (IPA) for stylizing personalized images, producing decent results when style references are injected via IPA. This plug-and-play property may extend to even more approaches, providing valuable contributions to the broader community.
+## 📜 Disclaimer and Licenses
+The images used in this repository and related demos are sourced from consented subjects or generated by the models. These pictures are intended solely to showcase the capabilities of our research. If you have any concerns, please feel free to contact us, and we will promptly remove any inappropriate content.
+The use of the released code, model, and demo must strictly adhere to the respective licenses. Our code is released under the [Apache 2.0 License](./LICENSE), and our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](https://huggingface.co/ByteDance/InfiniteYou/blob/main/LICENSE) for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface), the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)), *etc.*, must follow their original licenses and be used only for academic research purposes.
+This research aims to positively impact the field of Generative AI. Any usage of this method must be responsible and comply with local laws. The developers do not assume any responsibility for any potential misuse.
+## 🤗 Acknowledgments
+We sincerely acknowledge the insightful discussions from Stathi Fotiadis, Min Jin Chong, Xiao Yang, Tiancheng Zhi, Jing Liu, and Xiaohui Shen. We genuinely appreciate the help from Jincheng Liang and Lu Guo with our user study and qualitative evaluation.
+## 📖 Citation
+If you find InfiniteYou useful for your research or applications, please cite our paper:
+```bibtex
+@article{jiang2025infiniteyou,
+  title={{InfiniteYou}: Flexible Photo Recrafting While Preserving Your Identity},
+  author={Jiang, Liming and Yan, Qing and Jia, Yumin and Liu, Zichuan and Kang, Hao and Lu, Xin},
+  journal={arXiv preprint},
+  volume={arXiv:2503.16418},
+  year={2025}
+}
+```
+We also appreciate it if you could give a star :star: to this repository. Thanks a lot!
--- a/app.py
+++ b/app.py
+# Copyright (c) 2025 Bytedance Ltd. and/or its affiliates. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import gradio as gr
+import pillow_avif  # noqa: F401  (imported for its side effect: registers AVIF support for Pillow)
+import torch
+from huggingface_hub import snapshot_download
+from pillow_heif import register_heif_opener
+from pipelines.pipeline_infu_flux import InfUFluxPipeline
+# Register HEIF support for Pillow
+register_heif_opener()
+class ModelVersion:
+    STAGE_1 = "sim_stage1"
+    STAGE_2 = "aes_stage2"
+    DEFAULT_VERSION = STAGE_2
+ENABLE_ANTI_BLUR_DEFAULT = False
+ENABLE_REALISM_DEFAULT = False
+pipeline = None
+loaded_pipeline_config = {
+    "model_version": "aes_stage2",
+    "enable_realism": False,
+    "enable_anti_blur": False,
+}
+def download_models():
+    """Download the InfiniteYou weights and the gated FLUX.1-dev base model into ./models."""
+    snapshot_download(repo_id='ByteDance/InfiniteYou', local_dir='./models/InfiniteYou')
+    try:
+        snapshot_download(repo_id='black-forest-labs/FLUX.1-dev', local_dir='./models/FLUX.1-dev')
+    except Exception as e:
+        print(e)
+        print('\nFailed to download `black-forest-labs/FLUX.1-dev` to `./models/FLUX.1-dev`. '
+              'Please accept the agreement and obtain access at https://huggingface.co/black-forest-labs/FLUX.1-dev. '
+              'Then, use `huggingface-cli login` and your access tokens at https://huggingface.co/settings/tokens to authenticate. '
+              'After that, run the code again.')
+        print('\nYou can also download it manually from Hugging Face and put it in `./models/FLUX.1-dev`, '
+              'or you can modify `base_model_path` in `app.py` to specify the correct path.')
+        raise SystemExit(1)
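+def prepare_pipeline(model_version, enable_realism, enable_anti_blur):
+    """Load (or reuse) the global pipeline for `model_version` and apply the requested optional LoRAs."""
+    global pipeline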
+    if (
+        pipeline 
+        and loaded_pipeline_config["enable_realism"] == enable_realism 
+        and loaded_pipeline_config["enable_anti_blur"] == enable_anti_blur
+        and model_version == loaded_pipeline_config["model_version"]
+    ):
+        return
+    loaded_pipeline_config["enable_realism"] = enable_realism
+    loaded_pipeline_config["enable_anti_blur"] = enable_anti_blur
+    loaded_pipeline_config["model_version"] = model_version
+    if pipeline is None or pipeline.model_version != model_version:
+        del pipeline
+        model_path = f'./models/InfiniteYou/infu_flux_v1.0/{model_version}'
+        print(f'loading model from {model_path}')
+        pipeline = InfUFluxPipeline(
+            base_model_path='./models/FLUX.1-dev',
+            infu_model_path=model_path,
+            insightface_root_path='./models/InfiniteYou/supports/insightface',
+            image_proj_num_tokens=8,
+            infu_flux_version='v1.0',
+            model_version=model_version,
+        )
+    pipeline.pipe.delete_adapters(['realism', 'anti_blur'])
+    loras = []
+    if enable_realism:
+        loras.append(['./models/InfiniteYou/supports/optional_loras/flux_realism_lora.safetensors', 'realism', 1.0])
+    if enable_anti_blur:
+        loras.append(['./models/InfiniteYou/supports/optional_loras/flux_anti_blur_lora.safetensors', 'anti_blur', 1.0])
+    pipeline.load_loras(loras)
+def generate_image(
+    input_image, 
+    control_image, 
+    prompt, 
+    seed, 
+    width,
+    height,
+    guidance_scale, 
+    num_steps, 
+    infusenet_conditioning_scale, 
+    infusenet_guidance_start,
+    infusenet_guidance_end,
+    enable_realism,
+    enable_anti_blur,
+    model_version
+):
+    global pipeline
+    prepare_pipeline(model_version=model_version, enable_realism=enable_realism, enable_anti_blur=enable_anti_blur)
+    if seed == 0:
+        seed = torch.seed() & 0xFFFFFFFF
+    try:
+        image = pipeline(
+            id_image=input_image,
+            prompt=prompt,
+            control_image=control_image,
+            seed=seed,
+            width=width,
+            height=height,
+            guidance_scale=guidance_scale,
+            num_steps=num_steps,
+            infusenet_conditioning_scale=infusenet_conditioning_scale,
+            infusenet_guidance_start=infusenet_guidance_start,
+            infusenet_guidance_end=infusenet_guidance_end,
+        )
+    except Exception as e:
+        print(e)
+        # gr.Error is only shown in the UI when raised, not when merely constructed.
+        raise gr.Error(f"An error occurred: {e}") from e
+    return gr.update(value=image, label=f"Generated Image, seed = {seed}")
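+def generate_examples(id_image, control_image, prompt_text, seed, enable_realism, enable_anti_blur, model_version):
+    # Run an example with the default advanced settings shown in the UI.
+    return generate_image(
+        input_image=id_image,
+        control_image=control_image,
+        prompt=prompt_text,
+        seed=seed,
+        width=864,
+        height=1152,
+        guidance_scale=3.5,
+        num_steps=30,
+        infusenet_conditioning_scale=1.0,
+        infusenet_guidance_start=0.0,
+        infusenet_guidance_end=1.0,
+        enable_realism=enable_realism,
+        enable_anti_blur=enable_anti_blur,
+        model_version=model_version,
+    )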
+sample_list = [
+    ['./assets/examples/man.jpg', None, 'A sophisticated gentleman exuding confidence. He is dressed in a 1990s brown plaid jacket with a high collar, paired with a dark grey turtleneck. His trousers are tailored and charcoal in color, complemented by a sleek leather belt. The background showcases an elegant library with bookshelves, a marble fireplace, and warm lighting, creating a refined and cozy atmosphere. His relaxed posture and casual hand-in-pocket stance add to his composed and stylish demeanor', 666, False, False, 'aes_stage2'],
+    ['./assets/examples/man.jpg', './assets/examples/man_pose.jpg', 'A man, portrait, cinematic', 42, True, False, 'aes_stage2'],
+    ['./assets/examples/man.jpg', None, 'A man, portrait, cinematic', 12345, False, False, 'sim_stage1'],
+    ['./assets/examples/woman.jpg', './assets/examples/woman.jpg', 'A woman, portrait, cinematic', 1621695706, False, False, 'sim_stage1'],
+    ['./assets/examples/woman.jpg', None, 'A young woman holding a sign with the text "InfiniteYou", "Infinite" in black and "You" in red, pure background', 3724009365, False, False, 'aes_stage2'],
+    ['./assets/examples/woman.jpg', None, 'A photo of an elegant Javanese bride in traditional attire, with long hair styled into intricate a braid made of many fresh flowers, wearing a delicate headdress made from sequins and beads. She\'s holding flowers, light smiling at the camera, against a backdrop adorned with orchid blooms. The scene captures her grace as she stands amidst soft pastel colors, adding to its dreamy atmosphere', 42, True, False, 'aes_stage2'],
+    ['./assets/examples/woman.jpg', None, 'A photo of an elegant Javanese bride in traditional attire, with long hair styled into intricate a braid made of many fresh flowers, wearing a delicate headdress made from sequins and beads. She\'s holding flowers, light smiling at the camera, against a backdrop adorned with orchid blooms. The scene captures her grace as she stands amidst soft pastel colors, adding to its dreamy atmosphere', 42, False, False, 'sim_stage1'],
+]
+with gr.Blocks() as demo:
+    session_state = gr.State({})
+    gr.HTML("""
+    <div style="text-align: center; max-width: 900px; margin: 0 auto;">
+        <h1 style="font-size: 1.5rem; font-weight: 700; display: block;">InfiniteYou-FLUX</h1>
+        <h2 style="font-size: 1.2rem; font-weight: 300; margin-bottom: 1rem; display: block;">Official Gradio Demo for <a href="https://arxiv.org/abs/2503.16418">InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity</a></h2>
+        <a href="https://bytedance.github.io/InfiniteYou">[Project Page]</a>&ensp;
+        <a href="https://arxiv.org/abs/2503.16418">[Paper]</a>&ensp;
+        <a href="https://github.com/bytedance/InfiniteYou">[Code]</a>&ensp;
+        <a href="https://huggingface.co/ByteDance/InfiniteYou">[Model]</a>
+    </div>
+    """)
+    gr.Markdown("""
+    ### 💡 How to Use This Demo:
+    1. **Upload an identity (ID) image containing a human face.** For multiple faces, only the largest face will be detected. The face should ideally be clear and large enough, without significant occlusions or blur.
+    2. **Enter the text prompt to describe the generated image and select the model version.** Please refer to **important usage tips** under the Generated Image field.
+    3. *[Optional] Upload a control image containing a human face.* Only five facial keypoints will be extracted to control the generation. If not provided, we use a black control image, indicating no control.
+    4. *[Optional] Adjust advanced hyperparameters or apply optional LoRAs to meet personal needs.* Please refer to **important usage tips** under the Generated Image field.
+    5. **Click the "Generate" button to generate an image.** Enjoy!
+    """)
+    with gr.Row():
+        with gr.Column(scale=3):
+            with gr.Row():
+                ui_id_image = gr.Image(label="Identity Image", type="pil", scale=3, height=370, min_width=100)
+                with gr.Column(scale=2, min_width=100):
+                    ui_control_image = gr.Image(label="Control Image [Optional]", type="pil", height=370, min_width=100)
+            ui_prompt_text = gr.Textbox(label="Prompt", value="Portrait, 4K, high quality, cinematic")
+            ui_model_version = gr.Dropdown(
+                label="Model Version",
+                choices=[ModelVersion.STAGE_1, ModelVersion.STAGE_2],
+                value=ModelVersion.DEFAULT_VERSION,
+            )
+            ui_btn_generate = gr.Button("Generate")
+            with gr.Accordion("Advanced", open=False):
+                with gr.Row():
+                    ui_num_steps = gr.Number(label="num steps", value=30, precision=0)
+                    ui_seed = gr.Number(label="seed (0 for random)", value=0, precision=0)
+                with gr.Row():
+                    ui_width = gr.Number(label="width", value=864, precision=0)
+                    ui_height = gr.Number(label="height", value=1152, precision=0)
+                ui_guidance_scale = gr.Number(label="guidance scale", value=3.5, step=0.5)
+                ui_infusenet_conditioning_scale = gr.Slider(minimum=0.0, maximum=1.0, value=1.0, step=0.05, label="infusenet conditioning scale")
+                with gr.Row():
+                    ui_infusenet_guidance_start = gr.Slider(minimum=0.0, maximum=1.0, value=0.0, step=0.05, label="infusenet guidance start")
+                    ui_infusenet_guidance_end = gr.Slider(minimum=0.0, maximum=1.0, value=1.0, step=0.05, label="infusenet guidance end")
+            with gr.Accordion("LoRAs [Optional]", open=True):
+                with gr.Row():
+                    ui_enable_realism = gr.Checkbox(label="Enable realism LoRA", value=ENABLE_REALISM_DEFAULT)
+                    ui_enable_anti_blur = gr.Checkbox(label="Enable anti-blur LoRA", value=ENABLE_ANTI_BLUR_DEFAULT)
+        with gr.Column(scale=2):
+            image_output = gr.Image(label="Generated Image", interactive=False, height=550, format='png')
+            gr.Markdown(
+                """
+                ### ❗️ Important Usage Tips:
+                - **Model Version**: `aes_stage2` is used by default for better text-image alignment and aesthetics. For higher ID similarity, try `sim_stage1`.
+                - **Useful Hyperparameters**: Usually, there is NO need to adjust too much. If necessary, try a slightly larger `--infusenet_guidance_start` (*e.g.*, `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (*e.g.*, `0.9`).
+                - **Optional LoRAs**: `realism` and `anti-blur`. To enable them, please check the corresponding boxes. If needed, try `realism` only first. They are optional and were NOT used in our paper.
+                - **Gender Prompt**: If the generated gender is not preferred, add specific words in the prompt, such as 'a man', 'a woman', *etc*. We encourage using inclusive and respectful language.
+                """
+            )
+    gr.Examples(
+        sample_list,
+        inputs=[ui_id_image, ui_control_image, ui_prompt_text, ui_seed, ui_enable_realism, ui_enable_anti_blur, ui_model_version],
+        outputs=[image_output],
+        fn=generate_examples,
+        cache_examples=True,
+    )
+    ui_btn_generate.click(
+        generate_image, 
+        inputs=[
+            ui_id_image, 
+            ui_control_image, 
+            ui_prompt_text, 
+            ui_seed, 
+            ui_width,
+            ui_height,
+            ui_guidance_scale, 
+            ui_num_steps, 
+            ui_infusenet_conditioning_scale, 
+            ui_infusenet_guidance_start, 
+            ui_infusenet_guidance_end,
+            ui_enable_realism,
+            ui_enable_anti_blur,
+            ui_model_version
+        ], 
+        outputs=[image_output], 
+        concurrency_id="gpu"
+    )
+    with gr.Accordion("Local Gradio Demo for Developers", open=False):
+        gr.Markdown(
+            'Please refer to our GitHub repository to [run the InfiniteYou-FLUX gradio demo locally](https://github.com/bytedance/InfiniteYou#local-gradio-demo).'
+        )
+    gr.Markdown(
+        """
+        ---
+        ### 📜 Disclaimer and Licenses 
+        The images used in this demo are sourced from consented subjects or generated by the models. These pictures are intended solely to show the capabilities of our research. If you have any concerns, please contact us, and we will promptly remove any inappropriate content.
+        The use of the released code, model, and demo must strictly adhere to the respective licenses. 
+        Our code is released under the [Apache 2.0 License](https://github.com/bytedance/InfiniteYou/blob/main/LICENSE), 
+        and our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](https://huggingface.co/ByteDance/InfiniteYou/blob/main/LICENSE) 
+        for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface), 
+        the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs, *etc.*, must follow their original licenses and be used only for academic research purposes.
+        This research aims to positively impact the field of Generative AI. Any usage of this method must be responsible and comply with local laws. The developers do not assume any responsibility for any potential misuse.
+        """
+    )    
+    gr.Markdown(
+        """
+        ### 📖 Citation
+        If you find InfiniteYou useful for your research or applications, please cite our paper:
+        ```bibtex
+        @article{jiang2025infiniteyou,
+          title={{InfiniteYou}: Flexible Photo Recrafting While Preserving Your Identity},
+          author={Jiang, Liming and Yan, Qing and Jia, Yumin and Liu, Zichuan and Kang, Hao and Lu, Xin},
+          journal={arXiv preprint},
+          volume={arXiv:2503.16418},
+          year={2025}
+        }
+        ```
+        We also appreciate it if you could give a star ⭐ to our [GitHub repository](https://github.com/bytedance/InfiniteYou). Thanks a lot!
+        """
+    )
+download_models()
+prepare_pipeline(model_version=ModelVersion.DEFAULT_VERSION, enable_realism=ENABLE_REALISM_DEFAULT, enable_anti_blur=ENABLE_ANTI_BLUR_DEFAULT)
+demo.queue()
+demo.launch(server_name='0.0.0.0')  # IPv4
+# demo.launch(server_name='[::]')  # IPv6
--- a/assets/comparative_results.jpg
+++ b/assets/comparative_results.jpg
--- a/assets/examples/man.jpg
+++ b/assets/examples/man.jpg
--- a/assets/examples/man_pose.jpg
+++ b/assets/examples/man_pose.jpg
--- a/assets/examples/woman.jpg
+++ b/assets/examples/woman.jpg
--- a/assets/plug_and_play.jpg
+++ b/assets/plug_and_play.jpg
--- a/assets/teaser.jpg
+++ b/assets/teaser.jpg
--- a/black-forest-labs/FLUX.1-dev/README.md
+++ b/black-forest-labs/FLUX.1-dev/README.md
+---
+language:
+- en
+license: other
+license_name: flux-1-dev-non-commercial-license
+license_link: LICENSE.md
+extra_gated_prompt: By clicking "Agree", you agree to the [FluxDev Non-Commercial License Agreement](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
+  and acknowledge the [Acceptable Use Policy](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/POLICY.md).
+tags:
+- text-to-image
+- image-generation
+- flux
+---
+![FLUX.1 [dev] Grid](./dev_grid.jpg)
+`FLUX.1 [dev]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
+For more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).
+# Key Features
+1. Cutting-edge output quality, second only to our state-of-the-art model `FLUX.1 [pro]`.
+2. Competitive prompt following, matching the performance of closed-source alternatives.
+3. Trained using guidance distillation, making `FLUX.1 [dev]` more efficient.
+4. Open weights to drive new scientific research, and empower artists to develop innovative workflows.
+5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the [flux-1-dev-non-commercial-license](./LICENSE.md).
+# Usage
+We provide a reference implementation of `FLUX.1 [dev]`, as well as sampling code, in a dedicated [GitHub repository](https://github.com/black-forest-labs/flux).
+Developers and creatives looking to build on top of `FLUX.1 [dev]` are encouraged to use this as a starting point.
+## API Endpoints
+The FLUX.1 models are also available via API from the following sources:
+1. [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)
+2. [replicate.com](https://replicate.com/collections/flux)
+3. [fal.ai](https://fal.ai/models/fal-ai/flux/dev)
+## ComfyUI
+`FLUX.1 [dev]` is also available in [ComfyUI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.
+## Diffusers
+To use `FLUX.1 [dev]` with the 🧨 diffusers Python library, first install or upgrade diffusers:
+```shell
+pip install -U diffusers
+```
+Then you can use `FluxPipeline` to run the model:
+```python
+import torch
+from diffusers import FluxPipeline
+pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
+pipe.enable_model_cpu_offload()  # Save some VRAM by offloading the model to the CPU. Remove this if you have enough GPU memory.
+prompt = "A cat holding a sign that says hello world"
+image = pipe(
+    prompt,
+    height=1024,
+    width=1024,
+    guidance_scale=3.5,
+    num_inference_steps=50,
+    max_sequence_length=512,
+    generator=torch.Generator("cpu").manual_seed(0)
+).images[0]
+image.save("flux-dev.png")
+```
+To learn more, check out the [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) documentation.
+---
+# Limitations
+- This model is not intended or able to provide factual information.
+- As a statistical model this checkpoint might amplify existing societal biases.
+- The model may fail to generate output that matches the prompts.
+- Prompt following is heavily influenced by the prompting-style.
+# Out-of-Scope Use
+The model and its derivatives may not be used:
+- In any way that violates any applicable national, federal, state, local or international law or regulation.
+- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.
+- To generate or disseminate verifiably false information and/or content with the purpose of harming others.
+- To generate or disseminate personal identifiable information that can be used to harm an individual.
+- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.
+- To create non-consensual nudity or illegal pornographic content.
+- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.
+- Generating or facilitating large-scale disinformation campaigns.
+# License
+This model falls under the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).
\ No newline at end of file
--- a/doc/algorithm.png
+++ b/doc/algorithm.png
--- a/doc/input.png
+++ b/doc/input.png
--- a/doc/result.png
+++ b/doc/result.png
--- a/doc/structure.png
+++ b/doc/structure.png
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
+FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
+ENV DEBIAN_FRONTEND=noninteractive
+# RUN yum update && yum install -y git cmake wget build-essential
+# RUN source /opt/dtk-dtk25.04/env.sh
+# # Install pip dependencies
+COPY requirements.txt requirements.txt
+RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
--- a/docker/requirements.txt
+++ b/docker/requirements.txt
+accelerate==1.0.1
+diffusers==0.31.0
+facexlib==0.3.0
+gradio==5.21.0
+httpcore==1.0.7
+httpx==0.28.1
+huggingface-hub==0.28.1
+insightface==0.7.3
+numpy==1.26.4
+onnxruntime==1.19.2
+opencv-python==4.11.0.86
+pillow==10.4.0
+pillow-avif-plugin==1.5.0
+pillow-heif==0.21.0
+sentencepiece==0.2.0
+# torch==2.2.1
+# torchvision==0.17.1
+transformers==4.48.0
+peft==0.14.0
--- a/icon.png
+++ b/icon.png
--- a/model.properties
+++ b/model.properties
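+# Model code
+modelCode=1482
+# Model name
+modelName=InfiniteYou_pytorch
+# Model description
+modelDescription=Precisely preserves your identity while flexibly changing the scene and content; more than a simple face swap.
+# Application scenarios
+appScenario=Inference,AIGC,Retail,Manufacturing,E-commerce,Healthcare,Education
+# Framework type
+frameType=pytorch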