"mmdet3d/vscode:/vscode.git/clone" did not exist on "f584b9700ade8e05214d593f3cfbc0eff55a19bf"
Commit e1deac83 authored by chenpangpang's avatar chenpangpang
Browse files

feat: 更新kolors到新版本,带参考图版本

parents 0f1bd6af 1b51b523
Pipeline #1592 canceled with stages
...@@ -4,3 +4,4 @@ weights/ ...@@ -4,3 +4,4 @@ weights/
*.safetensors *.safetensors
.* .*
!.gitignore !.gitignore
__pycache__
FROM image.sourcefind.cn:5000/dcu/admin/base/jupyterlab-pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 as base FROM image.sourcefind.cn:5000/dcu/admin/base/jupyterlab-pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 as base
RUN cd /root && git clone -b dcu http://developer.hpccube.com/codes/chenpangpang/kolors.git ARG IMAGE=kolors
WORKDIR /root/kolors/Kolors ARG IMAGE_UPPER=Kolors
RUN pip install -r requirements.txt && python3 setup.py install ARG BRANCH=dcu
RUN rm -r build dist kolors.egg-info RUN cd /root && git clone -b $BRANCH http://developer.hpccube.com/codes/chenpangpang/$IMAGE.git
WORKDIR /root/$IMAGE/$IMAGE_UPPER
RUN pip install -r requirements.txt
######### #########
# Prod # # Prod #
######### #########
FROM image.sourcefind.cn:5000/dcu/admin/base/jupyterlab-pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 FROM image.sourcefind.cn:5000/dcu/admin/base/jupyterlab-pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
ARG IMAGE=kolors
ARG IMAGE_UPPER=Kolors
COPY chenyh/$IMAGE/frpc_linux_amd64_v0.2 /opt/conda/lib/python3.10/site-packages/gradio/
RUN chmod +x /opt/conda/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2
COPY chenyh/$IMAGE/Kwai-Kolors/Kolors /root/$IMAGE_UPPER/Kwai-Kolors/Kolors
COPY chenyh/$IMAGE/Kwai-Kolors/Kolors-IP-Adapter-Plus /root/$IMAGE_UPPER/Kwai-Kolors/Kolors-IP-Adapter-Plus
COPY --from=base /opt/conda/lib/python3.10/site-packages /opt/conda/lib/python3.10/site-packages COPY --from=base /opt/conda/lib/python3.10/site-packages /opt/conda/lib/python3.10/site-packages
COPY --from=base /root/kolors/Kolors /root/Kolors COPY --from=base /root/$IMAGE/$IMAGE_UPPER /root/$IMAGE_UPPER
COPY --from=base /root/kolors/启动器.ipynb /root/kolors/run.sh /root/ COPY --from=base /root/$IMAGE/启动器.ipynb /root/$IMAGE/start.sh /root/
RUN mkdir /root/Kolors/weights COPY --from=base /root/$IMAGE/assets /root/assets
# 模型仓库http://113.200.138.88:18080/aimodels/Kolors.git,放置在/public/home/openaimodels/dockerFileTemp/chenyh
COPY chenyh/Kolors /root/Kolors/weights/Kolors
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
模型许可协议
模型发布日期:2024/7/6
通过点击同意或使用、复制、修改、分发、表演或展示模型作品的任何部分或元素,您将被视为已承认并接受本协议的内容,本协议立即生效。
1.定义。
a. “协议”指本协议中所规定的使用、复制、分发、修改、表演和展示模型作品或其任何部分或元素的条款和条件。
b. “材料”是指根据本协议提供的专有的模型和文档(及其任何部分)的统称。
c. “模型”指大型语言模型、图像/视频/音频/3D 生成模型、多模态大型语言模型及其软件和算法,包括训练后的模型权重、参数(包括优化器状态)、机器学习模型代码、推理支持代码、训练支持代码、微调支持代码以及我们公开提供的前述其他元素。
d. “输出”是指通过操作或以其他方式使用模型或模型衍生品而产生的模型或模型衍生品的信息和/或内容输出。
e. “模型衍生品”包括:(i)对模型或任何模型衍生物的修改;(ii)基于模型的任何模型衍生物的作品;或(iii)通过将模型或模型的任何模型衍生物的权重、参数、操作或输出的模式转移到该模型而创建的任何其他机器学习模型,以使该模型的性能类似于模型或模型衍生物。为清楚起见,输出本身不被视为模型衍生物。
f. “模型作品”包括:(i)材料;(ii)模型衍生品;及(iii)其所有衍生作品。
g. “许可人”或“我们”指作品所有者或作品所有者授权的授予许可的实体,包括可能对模型和/或分发模型拥有权利的个人或实体。
h.“被许可人”、“您”或“您的”是指行使本协议授予的权利和/或为任何目的和在任何使用领域使用模型作品的自然人或法人实体。
i.“第三方”是指不受我们或您共同控制的个人或法人实体。
2. 许可内容。
a.我们授予您非排他性的、全球性的、不可转让的、免版税的许可(在我们的知识产权或我们拥有的体现在材料中或利用材料的其他权利的范围内),允许您仅根据本协议的条款使用、复制、分发、创作衍生作品(包括模型衍生品)和对材料进行修改,并且您不得违反(或鼓励、或允许任何其他人违反)本协议的任何条款。
b.在遵守本协议的前提下,您可以分发或向第三方提供模型作品,您须满足以下条件:
(i)您必须向所有该模型作品或使用该作品的产品或服务的任何第三方接收者提供模型作品的来源和本协议的副本;
(ii)您必须在任何修改过的文档上附加明显的声明,说明您更改了这些文档;
(iii)您可以在您的修改中添加您自己的版权声明,并且,在您对该作品的使用、复制、修改、分发、表演和展示符合本协议的条款和条件的前提下,您可以为您的修改或任何此类模型衍生品的使用、复制或分发提供额外或不同的许可条款和条件。
c. 附加商业条款: 若您希望将模型及模型衍生品用作商业用途,则您必须向许可人申请许可,许可人可自行决定向您授予许可。除非许可人另行明确授予您该等权利,否则您无权行使本协议项下的任何权利。
3.使用限制。
a. 您对本模型作品的使用必须遵守适用法律法规(包括贸易合规法律法规),并遵守《服务协议》(https://kolors.kuaishou.com/agreement)。您必须将本第 3(a) 和 3(b) 条中提及的使用限制作为可执行条款纳入任何规范本模型作品使用和/或分发的协议(例如许可协议、使用条款等),并且您必须向您分发的后续用户发出通知,告知其本模型作品受本第 3(a) 和 3(b) 条中的使用限制约束。
b. 您不得使用本模型作品或本模型作品的任何输出或成果来改进任何其他模型(本模型或其模型衍生品除外)。
4.知识产权。
a. 我们保留模型的所有权及其相关知识产权。在遵守本协议条款和条件的前提下,对于您制作的材料的任何衍生作品和修改,您是且将是此类衍生作品和修改的所有者。
b. 本协议不授予任何商标、商号、服务标记或产品名称的标识许可,除非出于描述和分发本模型作品的合理和惯常用途。
c. 如果您对我们或任何个人或实体提起诉讼或其他程序(包括诉讼中的交叉索赔或反索赔),声称材料或任何输出或任何上述内容的任何部分侵犯您拥有或可许可的任何知识产权或其他权利,则根据本协议授予您的所有许可应于提起此类诉讼或其他程序之日起终止。
5. 免责声明和责任限制。
a. 本模型作品及其任何输出和结果按“原样”提供,不作任何明示或暗示的保证,包括适销性、非侵权性或适用于特定用途的保证。我们不对材料及其任何输出的安全性或稳定性作任何保证,也不承担任何责任。
b. 在任何情况下,我们均不对您承担任何损害赔偿责任,包括但不限于因您使用或无法使用材料或其任何输出而造成的任何直接、间接、特殊或后果性损害赔偿责任,无论该损害赔偿责任是如何造成的。
6. 存续和终止。
a. 本协议期限自您接受本协议或访问材料之日起开始,并将持续完全有效,直至根据本协议条款和条件终止。
b. 如果您违反本协议的任何条款或条件,我们可终止本协议。本协议终止后,您必须立即删除并停止使用本模型作品。第 4(a)、4(c)、5和 7 条在本协议终止后仍然有效。
7. 适用法律和管辖权。
a. 本协议及由本协议引起的或与本协议有关的任何争议均受中华人民共和国大陆地区(仅为本协议目的,不包括香港、澳门和台湾)法律管辖,并排除冲突法的适用,且《联合国国际货物销售合同公约》不适用于本协议。
b. 因本协议引起或与本协议有关的任何争议,由许可人住所地人民法院管辖。
请注意,许可证可能会更新到更全面的版本。 有关许可和版权的任何问题,请通过 kwai-kolors@kuaishou.com 与我们联系。
英文版
MODEL LICENSE AGREEMENT
Release Date: 2024/7/6
By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Model Works, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
1. DEFINITIONS.
a. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of the Model Works or any portion or element thereof set forth herein.
b. “Materials” shall mean, collectively, Us proprietary the Model and Documentation (and any portion thereof) as made available by Us under this Agreement.
c. “Model” shall mean the large language models, image/video/audio/3D generation models, and multimodal large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us .
d. “Output” shall mean the information and/or content output of Model or a Model Derivative that results from operating or otherwise using Model or a Model Derivative.
e. “Model Derivatives” shall mean all: (i) modifications to the Model or any Model Derivative; (ii) works based on the Model or any Model Derivative; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of the Model or any Model Derivative, to that model in order to cause that model to perform similarly to the Model or a Model Derivative, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs or a Model Derivative for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
f. “Model Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
g. “Licensor” , “We” or “Us” shall mean the copyright owner or entity authorized by the copyright owner that is granting the License, including the persons or entities that may have rights in the Model and/or distributing the Model.
h. “Licensee”, “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Model Works for any purpose and in any field of use.
i. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
2. LICENSE CONTENT.
a. We grant You a non-exclusive, worldwide, non-transferable and royalty-free limited license under the intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
b. You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Model Works, provided that You meet all of the following conditions:
(i) You must provide all such Third Party recipients of the Model Works or products or services using them the source of the Model and a copy of this Agreement;
(ii) You must cause any modified documents to carry prominent notices stating that You changed the documents;
(iii) You may add Your own copyright statement to Your modifications and, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement.
c. additional commercial terms: If, on the Model version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, or, Licensee is a cloud computing platform vendor, You must request a license from licensor, which the licensor may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until We otherwise expressly grants You such rights.
3. LICENSE RESTRICITIONS.
a. Your use of the Model Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Service Agreement. You must include the use restrictions referenced in these Sections 3(a) and 3(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Model Works and You must provide notice to subsequent users to whom You distribute that Model Works are subject to the use restrictions in these Sections 3(a) and 3(b).
b. You must not use the Model Works or any Output or results of the Model Works to improve any other large model (other than Model or Model Derivatives thereof).
4. INTELLECTUAL PROPERTY.
a. We retain ownership of all intellectual property rights in and to the Model and derivatives. Conditioned upon compliance with the terms and conditions of this Agreement, with respect to any derivative works and modifications of the Materials that are made by you, you are and will be the owner of such derivative works and modifications.
b. No trademark license is granted to use the trade names, trademarks, service marks, or product names of Us, except as required to fulfill notice requirements under this Agreement or as required for reasonable and customary use in describing and redistributing the Materials.
c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed.
5. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
a. THE MODEL WORKS AND ANY OUTPUT AND RESULTS THERE FROM ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. WE MAKE NO WARRANTY AND ASSUME NO RESPONSIBILITY FOR THE SAFETY OR STABILITY OF THE MATERIALS AND ANY OUTPUT THEREFROM.
b. IN NO EVENT SHALL WE BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE MATERIALS OR ANY OUTPUT OF IT, NO MATTER HOW IT’S CAUSED.
c. You will defend, indemnify and hold harmless Us from and against any claim by any third party arising out of or related to your use or distribution of the Materials.
6. SURVIVAL AND TERMINATION.
a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Model Works. Sections 4(a), 4(c), 5 and 7 shall survive the termination of this Agreement.
7. GOVERNING LAW AND JURISDICTION.
a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of China (for the purpose of this agreement only, excluding Hong Kong, Macau, and Taiwan), without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
b. Any disputes arising from or related to this Agreement shall be under the jurisdiction of the People's Court where the Licensor is located.
Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at kwai-kolors@kuaishou.com.
\ No newline at end of file
<p align="left"> ---
English</a>&nbsp | &nbsp<a href="README_CN.md">中文</a>&nbsp title: Kolors
</p> emoji: 🖼
<br><br> colorFrom: purple
colorTo: red
<p align="center"> sdk: gradio
<img src="imgs/logo.png" width="400"/> sdk_version: 4.38.1
<p> app_file: app.py
<br> pinned: false
license: apache-2.0
---
<div align="center">
<a href='https://huggingface.co/Kwai-Kolors/Kolors'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-HF-yellow'></a> &ensp; Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
<a href="https://github.com/Kwai-Kolors/Kolors"><img src="https://img.shields.io/static/v1?label=Kolors Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
<a href="https://kwai-kolors.github.io/"><img src="https://img.shields.io/static/v1?label=Team%20Page&message=Page&color=green"></a> &ensp;
<a href="https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv:Kolors&color=red&logo=arxiv"></a> &ensp;
<a href="https://kolors.kuaishou.com/"><img src="https://img.shields.io/static/v1?label=Official Website&message=Page&color=green"></a> &ensp;
</div>
# Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis
<figure>
<img src="imgs/head_final3.png">
</figure>
<br><br>
## Contents
- [🎉 News](#News)
- [📑 Open-source Plan](#open-source-plan)
- [📖 Introduction](#Introduction)
- [📊 Evaluation 🥇🥇🔥🔥](#Evaluation)
- [🎥 Visualization](#Visualization)
- [🛠️ Usage](#Usage)
- [📜 License & Citation & Acknowledgments](#License)
<br><br>
## <a name="News"></a>🎉 News
* 2024.07.09 💥 Kolors supports [ComfyUI](https://github.com/comfyanonymous/ComfyUI#manual-install-windows-linux). Thanks to [@kijai](https://github.com/kijai/ComfyUI-KwaiKolorsWrapper) with his great work.
* 2024.07.06 🔥🔥🔥 We release **Kolors**, a large text-to-image model trained on billions of text-image pairs. This model is bilingual in both Chinese and English, and supports a context length of 256 tokens. For more technical details, please refer to [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).
* 2024.07.03 📊 Kolors won the second place on [FlagEval Multimodal Text-to-Image Leaderboard](https://flageval.baai.ac.cn/#/leaderboard/multimodal?kind=t2i), excelling particularly in the Chinese and English subjective quality assessment where Kolors took the first place.
* 2024.07.02 🎉 Congratulations! Our paper on controllable video generation, [DragAnything: Motion Control for Anything using Entity Representation](https://arxiv.org/abs/2403.07420), have been accepted by ECCV 2024.
* 2024.02.08 🎉 Congratulations! Our paper on generative model evaluation, [Learning Multi-dimensional Human Preference for Text-to-Image Generation](https://wangbohan97.github.io/MPS/), have been accepted by CVPR 2024.
<br><br>
## <a name="open-source-plan"></a>📑 Open-source Plan
- Kolors (Text-to-Image Model)
- [x] Inference
- [x] Checkpoints
- [ ] LoRA
- [ ] ControlNet (Pose, Canny, Depth)
- [ ] IP-Adapter
- [x] ComfyUI
- [x] Gradio
- [ ] Diffusers
<br><br>
##
## <a name="Introduction"></a>📖 Introduction
Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this <a href="https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf">technical report</a></b>.
<br><br>
## <a name="Evaluation"></a>📊 Evaluation
We have collected a comprehensive text-to-image evaluation dataset named KolorsPrompts to compare Kolors with other state-of-the-art open models and closed-source models. KolorsPrompts includes over 1,000 prompts across 14 catagories and 12 evaluation dimensions. The evaluation process incorporates both human and machine assessments. In relevant benchmark evaluations, Kolors demonstrated highly competitive performance, achieving industry-leading standards.
<br><br>
### Human Assessment
For the human evaluation, we invited 50 imagery experts to conduct comparative evaluations of the results generated by different models. The experts rated the generated images based on three criteria: visual appeal, text faithfulness, and overall satisfaction. In the evaluation, Kolors achieved the highest overall satisfaction score and significantly led in visual appeal compared to other models.
| Model | Average Overall Satisfaction | Average Visual Appeal | Average Text Faithfulness |
| :--------------: | :--------: | :--------: | :--------: |
| Adobe-Firefly | 3.03 | 3.46 | 3.84 |
| Stable Diffusion 3 | 3.26 | 3.50 | 4.20 |
| DALL-E 3 | 3.32 | 3.54 | 4.22 |
| Midjourney-v5 | 3.32 | 3.68 | 4.02 |
| Playground-v2.5 | 3.37 | 3.73 | 4.04 |
| Midjourney-v6 | 3.58 | 3.92 | 4.18 |
| **Kolors** | **3.59** | **3.99** | **4.17** |
------
<div style="color: gray; font-size: small;">
**All model results are tested with the April 2024 product versions**
</div>
<br>
### Machine Assessment
We used [MPS](https://arxiv.org/abs/2405.14705) (Multi-dimensional Human Preference Score) on KolorsPrompts as the evaluation metric for machine assessment. Kolors achieved the highest MPS score, which is consistent with the results of the human evaluations.
<div style="text-align:center">
| Models | Overall MPS |
|:-------------------:|:-------------:|
| Adobe-Firefly | 8.5 |
| Stable Diffusion 3 | 8.9 |
| DALL-E 3 | 9.0 |
| Midjourney-v5 | 9.4 |
| Playground-v2.5 | 9.8 |
| Midjourney-v6 | 10.2 |
| **Kolors** | **10.3** |
</div>
<br>
For more experimental results and details, please refer to our [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).
<br><br>
## <a name="Visualization"></a>🎥 Visualization
* **High-quality Portrait**
<div style="display: flex; justify-content: space-between;">
<img src="imgs/zl8.png" />
</div>
<br>
* **Chinese Elements Generation**
<div style="display: flex; justify-content: space-between;">
<img src="imgs/cn_all.png"/>
</div>
<br>
* **Complex Semantic Understanding**
<div style="display: flex; justify-content: space-between;">
<img src="imgs/fz_all.png"/>
</div>
<br>
* **Text Rendering**
<div style="display: flex; justify-content: space-between;">
<img src="imgs/wz_all.png" />
</div>
<br>
</div>
The visualized case prompts mentioned above can be accessed [here](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/prompt_vis.txt).
<br><br>
## <a name="Usage"></a>🛠️ Usage
### Requirements
* Python 3.8 or later
* PyTorch 1.13.1 or later
* Transformers 4.26.1 or later
* Recommended: CUDA 11.7 or later
<br>
1. Repository Cloning and Dependency Installation
```bash
apt-get install git-lfs
git clone https://github.com/Kwai-Kolors/Kolors
cd Kolors
conda create --name kolors python=3.8
conda activate kolors
pip install -r requirements.txt
python3 setup.py install
```
2. Weights download([link](https://huggingface.co/Kwai-Kolors/Kolors)):
```bash
huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors
```
or
```bash
git lfs clone https://huggingface.co/Kwai-Kolors/Kolors weights/Kolors
```
3. Inference:
```bash
python3 scripts/sample.py "一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着“可图”"
# The image will be saved to "scripts/outputs/sample_text.jpg"
```
4. Web demo:
```bash
python3 scripts/sampleui.py
```
<br><br>
## <a name="License"></a>📜 License & Citation & Acknowledgments
### License
Kolors are fully open-sourced for academic research. For commercial use, please fill out this [questionnaire](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/可图KOLORS模型商业授权申请书.docx) and sent it to kwai-kolors@kuaishou.com to obtain the agreement for free use.
We open-source Kolors to promote the development of large text-to-image models in collaboration with the open-source community. The code of this project is open-sourced under the Apache-2.0 license. We sincerely urge all developers and users to strictly adhere to the [open-source license](MODEL_LICENSE), avoiding the use of the open-source model, code, and its derivatives for any purposes that may harm the country and society or for any services not evaluated and registered for safety. Note that despite our best efforts to ensure the compliance, accuracy, and safety of the data during training, due to the diversity and combinability of generated content and the probabilistic randomness affecting the model, we cannot guarantee the accuracy and safety of the output content, and the model is susceptible to misleading. This project does not assume any legal responsibility for any data security issues, public opinion risks, or risks and liabilities arising from the model being misled, abused, misused, or improperly utilized due to the use of the open-source model and code.
### Citation
If you find our work helpful, please cite it!
```
@article{kolors,
title={Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis},
author={Kolors Team},
journal={arXiv preprint},
year={2024}
}
```
### Acknowledgments
- Thanks to [Diffusers](https://github.com/huggingface/diffusers) for providing the codebase.
- Thanks to [ChatGLM3](https://github.com/THUDM/ChatGLM3) for providing the powerful Chinese language model.
<br>
### Contact Us
If you want to leave a message for our R&D team and product team, feel free to join our [WeChat group](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/wechat.png). You can also contact us via email (kwai-kolors@kuaishou.com).
[![Star History Chart](https://api.star-history.com/svg?repos=Kwai-Kolors/Kolors&type=Date)](https://star-history.com/#Kwai-Kolors/Kolors&Date)
<p align="left">
中文</a>&nbsp | &nbsp<a href="README.md">English</a>&nbsp
</p>
<!-- <br><br> -->
<p align="center">
<img src="imgs/logo.png" width="400"/>
<p>
<br>
<!-- <div align="center">
<a href='https://kwai-kolors.github.io/'><img src='https://img.shields.io/badge/Team-Page-green'></a> <a href=''><img src='https://img.shields.io/badge/Technique-Report-red'></a> [![Teampage](https://img.shields.io/badge/Website-Page-blue)](https://kolors.kuaishou.com/)
</div> -->
<div align="center">
<a href='https://huggingface.co/Kwai-Kolors/Kolors'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-HF-yellow'></a> &ensp;
<a href="https://github.com/Kwai-Kolors/Kolors"><img src="https://img.shields.io/static/v1?label=Kolors Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
<a href="https://kwai-kolors.github.io/"><img src="https://img.shields.io/static/v1?label=Team%20Page&message=Page&color=green"></a> &ensp;
<a href="https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv:Kolors&color=red&logo=arxiv"></a> &ensp;
<a href="https://kolors.kuaishou.com/"><img src="https://img.shields.io/static/v1?label=Official Website&message=Page&color=green"></a> &ensp;
</div>
</p>
# Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis
<figure>
<img src="imgs/head_final3.png">
</figure>
<br><br>
## 目录
- [🎉 新闻](#新闻)
- [📑 开源计划](#开源计划)
- [📖 模型介绍](#模型介绍)
- [📊 评测表现 🥇🥇🔥🔥](#评测表现)
- [🎥 可视化](#可视化)
- [🛠️ 快速使用](#快速使用)
- [📜 协议、引用、致谢](#协议引用)
<br><br>
## <a name="新闻"></a>🎉 新闻
* 2024.07.09 💥 Kolors 支持了 [ComfyUI](https://github.com/comfyanonymous/ComfyUI#manual-install-windows-linux),感谢 [@kijai](https://github.com/kijai/ComfyUI-KwaiKolorsWrapper) 的工作。
* 2024.07.06 🔥🔥🔥 我们开源了基于隐空间扩散的文生图大模型 **Kolors** ,该模型基于数十亿图文对进行训练,支持256的上下文token数,支持中英双语,技术细节参考[技术报告](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf)
* 2024.07.03 📊 Kolors 在智源研究院 [FlagEval 多模态文生图](https://flageval.baai.ac.cn/#/leaderboard/multimodal?kind=t2i)评测中取得第二名,其中中文主观质量、英文主观质量两个单项排名第一。
* 2024.07.02 🎉 祝贺,可图项目组提出的可控视频生成方法 [DragAnything: Motion Control for Anything using Entity Representation](https://arxiv.org/abs/2403.07420) 被 ECCV 2024 接收。
* 2024.02.08 🎉 祝贺,可图项目组提出的生成模型评估方法 [Learning Multi-dimensional Human Preference for Text-to-Image Generation](https://wangbohan97.github.io/MPS/) 被 CVPR 2024 接收。
<br><br>
## <a name="开源计划"></a>📑 开源计划
- Kolors (Text-to-Image Model)
- [x] Inference
- [x] Checkpoints
- [ ] LoRA
- [ ] ControlNet (Pose, Canny, Depth)
- [ ] IP-Adapter
- [x] ComfyUI
- [x] Gradio
- [ ] Diffusers
<br><br>
## <a name="模型介绍"></a>📖 模型介绍
可图大模型是由快手可图团队开发的基于潜在扩散的大规模文本到图像生成模型。Kolors 在数十亿图文对下进行训练,在视觉质量、复杂语义理解、文字生成(中英文字符)等方面,相比于开源/闭源模型,都展示出了巨大的优势。同时,Kolors 支持中英双语,在中文特色内容理解方面更具竞争力。更多的实验结果和细节请查看我们的<a href="https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf">技术报告</a></b>
<br><br>
## <a name="评测表现"></a>📊 评测表现
为了全面比较 Kolors 与其他模型的生成能力,我们构建了包含人工评估、机器评估的全面评测内容。
在相关基准评测中,Kolors 具有非常有竞争力的表现,达到业界领先水平。我们构建了一个包含14种垂类,12个挑战项,总数量为一千多个 prompt 的文生图评估集 KolorsPrompts。在 KolorsPrompts 上,我们收集了 Kolors 与市面上常见的 SOTA 级别的开源/闭源系统的文生图结果,并进行了人工评测和机器评测。
<br><br>
### 人工评测
我们邀请了50个具有图像领域知识的专业评估人员对不同模型的生成结果进行对比评估,为生成图像打分,衡量维度为:画面质量、图文相关性、整体满意度三个方面。
Kolors 在整体满意度方面处于最优水平,其中画面质量显著领先其他模型。
<div style="text-align: center;">
| 模型 | 整体满意度平均分 | 画面质量平均分 | 图文相关性平均分 |
| :--------------: | :--------: | :--------: | :--------: |
| Adobe-Firefly | 3.03 | 3.46 | 3.84 |
| Stable Diffusion 3 | 3.26 | 3.50 | 4.20 |
| DALL-E 3 | 3.32 | 3.54 | 4.22 |
| Midjourney-v5 | 3.32 | 3.68 | 4.02 |
| Playground-v2.5 | 3.37 | 3.73 | 4.04 |
| Midjourney-v6 | 3.58 | 3.92 | 4.18 |
| **Kolors** | **3.59** | **3.99** | **4.17** |
</div>
<div style="color: gray; font-size: small;">
**所有模型结果取自 2024.04 的产品版本**
</div>
<br>
### 机器评测
我们采用 [MPS](https://arxiv.org/abs/2405.14705) (Multi-dimensional Human preference Score) 来评估上述模型。
我们以 KolorsPrompts 作为基础评估数据集,计算多个模型的 MPS 指标。Kolors 实现了最高的MPS 指标,这与人工评估的指标一致。
<div style="text-align:center">
| 模型 | MPS综合得分 |
|-------------------|-------------|
| Adobe-Firefly | 8.5 |
| Stable Diffusion 3 | 8.9 |
| DALL-E 3 | 9.0 |
| Midjourney-v5 | 9.4 |
| Playground-v2.5 | 9.8 |
| Midjourney-v6 | 10.2 |
| **Kolors** | **10.3** |
</div>
<br>
更多的实验结果和细节请查看我们的技术报告。点击[技术报告](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf)
<br><br>
## <a name="可视化"></a>🎥 可视化
* **高质量人像**
<div style="display: flex; justify-content: space-between;">
<img src="imgs/zl8.png" />
</div>
<br>
* **中国元素**
<div style="display: flex; justify-content: space-between;">
<img src="imgs/cn_all.png"/>
</div>
<br>
* **复杂语义理解**
<div style="display: flex; justify-content: space-between;">
<img src="imgs/fz_all.png"/>
</div>
<br>
* **文字绘制**
<div style="display: flex; justify-content: space-between;">
<img src="imgs/wz_all.png" />
</div>
<br>
</div>
上述可视化 case,可以点击[可视化prompts](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/prompt_vis.txt) 获取
<br><br>
## <a name="快速使用"></a>🛠️ 快速使用
### 要求
* python 3.8及以上版本
* pytorch 1.13.1及以上版本
* transformers 4.26.1及以上版本
* 建议使用CUDA 11.7及以上
<br>
1、仓库克隆及依赖安装
```bash
apt-get install git-lfs
git clone https://github.com/Kwai-Kolors/Kolors
cd Kolors
conda create --name kolors python=3.8
conda activate kolors
pip install -r requirements.txt
python3 setup.py install
```
2、模型权重下载([链接](https://huggingface.co/Kwai-Kolors/Kolors)):
```bash
huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors
```
或者
```bash
git lfs clone https://huggingface.co/Kwai-Kolors/Kolors weights/Kolors
```
3、模型推理:
```bash
python3 scripts/sample.py "一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着“可图”"
# The image will be saved to "scripts/outputs/sample_text.jpg"
```
4、 Web demo:
```bash
python3 scripts/sampleui.py
```
<br><br>
## <a name="协议引用"></a>📜协议、引用、致谢
### 协议
**Kolors**(可图) 权重对学术研究完全开放,如需商用请填写[问卷](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/可图KOLORS模型商业授权申请书.docx),发送问卷至 kwai-kolors@kuaishou.com 进行登记获取许可后免费使用。
本开源模型旨在与开源社区共同推进文生图大模型技术的发展。本项目代码依照 Apache-2.0 协议开源,模型权重需要遵循本《模型许可协议》,我们恳请所有开发者和用户严格遵守[开源协议](MODEL_LICENSE),避免将开源模型、代码及其衍生物用于任何可能对国家和社会造成危害的用途,或用于任何未经安全评估和备案的服务。需要注意,尽管模型在训练中我们尽力确保数据的合规性、准确性和安全性,但由于视觉生成模型存在生成多样性和可组合性等特点,以及生成模型受概率随机性因素的影响,模型无法保证输出内容的准确性和安全性,且模型易被误导。本项目不对因使用开源模型和代码而导致的任何数据安全问题、舆情风险或因模型被误导、滥用、传播、不当利用而产生的风险和责任承担任何法律责任。
<br>
### 引用
如果你觉得我们的工作对你有帮助,欢迎引用!
```
@article{kolors,
title={Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis},
author={Kolors Team},
journal={arXiv preprint},
year={2024}
}
```
<br>
### 致谢
- 感谢 [Diffusers](https://github.com/huggingface/diffusers) 提供的codebase
- 感谢 [ChatGLM3](https://github.com/THUDM/ChatGLM3) 提供的强大中文语言模型
<br>
### 联系我们
如果你想给我们的研发团队和产品团队留言,欢迎加入我们的[微信群](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/wechat.png)。当然也可以通过邮件(kwai-kolors@kuaishou.com)联系我们。
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=Kwai-Kolors/Kolors&type=Date)](https://star-history.com/#Kwai-Kolors/Kolors&Date)
import random
import torch
from huggingface_hub import snapshot_download
from transformers import CLIPVisionModelWithProjection, CLIPImageProcessor
from kolors.pipelines import pipeline_stable_diffusion_xl_chatglm_256_ipadapter, \
pipeline_stable_diffusion_xl_chatglm_256
from kolors.models.modeling_chatglm import ChatGLMModel
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.models import unet_2d_condition
from diffusers import AutoencoderKL, EulerDiscreteScheduler, UNet2DConditionModel
import gradio as gr
import numpy as np
device = "cuda"
ckpt_dir = "Kwai-Kolors/Kolors"
ckpt_IPA_dir = "Kwai-Kolors/Kolors-IP-Adapter-Plus"
text_encoder = ChatGLMModel.from_pretrained(f'{ckpt_dir}/text_encoder', torch_dtype=torch.float16).half().to(device)
tokenizer = ChatGLMTokenizer.from_pretrained(f'{ckpt_dir}/text_encoder')
vae = AutoencoderKL.from_pretrained(f"{ckpt_dir}/vae", revision=None).half().to(device)
scheduler = EulerDiscreteScheduler.from_pretrained(f"{ckpt_dir}/scheduler")
unet_t2i = UNet2DConditionModel.from_pretrained(f"{ckpt_dir}/unet", revision=None).half().to(device)
unet_i2i = unet_2d_condition.UNet2DConditionModel.from_pretrained(f"{ckpt_dir}/unet", revision=None).half().to(device)
image_encoder = CLIPVisionModelWithProjection.from_pretrained(f'{ckpt_IPA_dir}/image_encoder',
ignore_mismatched_sizes=True).to(dtype=torch.float16,
device=device)
ip_img_size = 336
clip_image_processor = CLIPImageProcessor(size=ip_img_size, crop_size=ip_img_size)
pipe_t2i = pipeline_stable_diffusion_xl_chatglm_256.StableDiffusionXLPipeline(
vae=vae,
text_encoder=text_encoder,
tokenizer=tokenizer,
unet=unet_t2i,
scheduler=scheduler,
force_zeros_for_empty_prompt=False
).to(device)
pipe_i2i = pipeline_stable_diffusion_xl_chatglm_256_ipadapter.StableDiffusionXLPipeline(
vae=vae,
text_encoder=text_encoder,
tokenizer=tokenizer,
unet=unet_i2i,
scheduler=scheduler,
image_encoder=image_encoder,
feature_extractor=clip_image_processor,
force_zeros_for_empty_prompt=False
).to(device)
if hasattr(pipe_i2i.unet, 'encoder_hid_proj'):
pipe_i2i.unet.text_encoder_hid_proj = pipe_i2i.unet.encoder_hid_proj
pipe_i2i.load_ip_adapter(f'{ckpt_IPA_dir}', subfolder="", weight_name=["ip_adapter_plus_general.bin"])
MAX_SEED = np.iinfo(np.int32).max
MAX_IMAGE_SIZE = 1024
def infer(prompt,
ip_adapter_image=None,
ip_adapter_scale=0.5,
negative_prompt="",
seed=0,
randomize_seed=False,
width=1024,
height=1024,
guidance_scale=5.0,
num_inference_steps=25
):
if randomize_seed:
seed = random.randint(0, MAX_SEED)
generator = torch.Generator().manual_seed(seed)
if ip_adapter_image is None:
pipe_t2i.to(device)
image = pipe_t2i(
prompt=prompt,
negative_prompt=negative_prompt,
guidance_scale=guidance_scale,
num_inference_steps=num_inference_steps,
width=width,
height=height,
generator=generator
).images[0]
return image
else:
pipe_i2i.to(device)
image_encoder.to(device)
pipe_i2i.image_encoder = image_encoder
pipe_i2i.set_ip_adapter_scale([ip_adapter_scale])
image = pipe_i2i(
prompt=prompt,
ip_adapter_image=[ip_adapter_image],
negative_prompt=negative_prompt,
height=height,
width=width,
num_inference_steps=num_inference_steps,
guidance_scale=guidance_scale,
num_images_per_prompt=1,
generator=generator
).images[0]
return image
examples = [
["穿着黑色T恤衫,上面中文绿色大字写着“可图”", "image/test_ip.jpg", 0.5],
["A cute dog is running", "image/test_ip2.png", 0.5]
]
css = """
#col-left {
margin: 0 auto;
max-width: 600px;
}
#col-right {
margin: 0 auto;
max-width: 750px;
}
"""
def load_description(fp):
with open(fp, 'r', encoding='utf-8') as f:
content = f.read()
return content
with gr.Blocks(css=css) as Kolors:
gr.HTML(load_description("assets/title.md"))
with gr.Row():
with gr.Column(elem_id="col-left"):
with gr.Row():
prompt = gr.Textbox(
label="Prompt",
placeholder="Enter your prompt",
lines=2
)
with gr.Row():
ip_adapter_image = gr.Image(label="Image Prompt (optional)", type="pil")
with gr.Accordion("Advanced Settings", open=False):
negative_prompt = gr.Textbox(
label="Negative prompt",
placeholder="Enter a negative prompt",
visible=True,
)
seed = gr.Slider(
label="Seed",
minimum=0,
maximum=MAX_SEED,
step=1,
value=0,
)
randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
with gr.Row():
width = gr.Slider(
label="Width",
minimum=256,
maximum=MAX_IMAGE_SIZE,
step=32,
value=1024,
)
height = gr.Slider(
label="Height",
minimum=256,
maximum=MAX_IMAGE_SIZE,
step=32,
value=1024,
)
with gr.Row():
guidance_scale = gr.Slider(
label="Guidance scale",
minimum=0.0,
maximum=10.0,
step=0.1,
value=5.0,
)
num_inference_steps = gr.Slider(
label="Number of inference steps",
minimum=10,
maximum=50,
step=1,
value=25,
)
with gr.Row():
ip_adapter_scale = gr.Slider(
label="Image influence scale",
info="Use 1 for creating variations",
minimum=0.0,
maximum=1.0,
step=0.05,
value=0.5,
)
with gr.Row():
run_button = gr.Button("Run")
with gr.Column(elem_id="col-right"):
result = gr.Image(label="Result", show_label=False)
with gr.Row():
gr.Examples(
fn=infer,
examples=examples,
inputs=[prompt, ip_adapter_image, ip_adapter_scale],
outputs=[result]
)
run_button.click(
fn=infer,
inputs=[prompt, ip_adapter_image, ip_adapter_scale, negative_prompt, seed, randomize_seed, width, height,
guidance_scale, num_inference_steps],
outputs=[result]
)
Kolors.queue().launch(server_name="0.0.0.0", share=True)
<div style="display: flex; justify-content: center; align-items: center; text-align: center;">
<div>
<h1>Kolors</h1>
<span>Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis</span>
<br>
<div style="display: flex; justify-content: center; align-items: center; text-align: center;">
<a href="https://github.com/Kwai-Kolors/Kolors"><img src="https://img.shields.io/static/v1?label=Kolors Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
<a href="https://kwai-kolors.github.io/"><img src="https://img.shields.io/static/v1?label=Team%20Page&message=Page&color=green"></a> &ensp;
<a href="https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Kolors&color=red"></a> &ensp;
<a href="https://klingai.kuaishou.com/"><img src="https://img.shields.io/static/v1?label=Official Website&message=Page&color=green"></a>
</div>
</div>
</div>
\ No newline at end of file
This image diff could not be displayed because it is too large. You can view the blob instead.
1、视觉质量:
- 一对年轻的中国情侣,皮肤白皙,穿着时尚的运动装,背景是现代的北京城市天际线。面部细节,清晰的毛孔,使用最新款的相机拍摄,特写镜头,超高画质,8K,视觉盛宴
2、中国元素
- 万里长城,蜿蜒
- 一张北京国家体育场(俗称“鸟巢”)的高度细节图片。图片应展示体育场复杂的钢结构,强调其独特的建筑设计。场景设定在白天,天空晴朗,突显出体育场的宏伟规模和现代感。包括周围的奥林匹克公园和一些游客,以增加场景的背景和生气。
- 上海外滩
3、复杂语义理解
- 满月下的街道,熙熙攘攘的行人正在享受繁华夜生活。街角摊位上,一位有着火红头发、穿着标志性天鹅绒斗篷的年轻女子,正在和脾气暴躁的老小贩讨价还价。这个脾气暴躁的小贩身材高大、老道,身着一套整洁西装,留着小胡子,用他那部蒸汽朋克式的电话兴致勃勃地交谈
- 画面有四只神兽:朱雀、玄武、青龙、白虎。朱雀位于画面上方,羽毛鲜红如火,尾羽如凤凰般绚丽,翅膀展开时似燃烧的火焰。玄武居于下方,是龟蛇相缠的形象,巨龟背上盘绕着一条黑色巨蛇,龟甲上有古老的符文,蛇眼冰冷锐利。青龙位于右方,长身盘旋在天际,龙鳞碧绿如翡翠,龙须飘逸,龙角如鹿,口吐云雾。白虎居于左方,体态威猛,白色的皮毛上有黑色斑纹,双眼炯炯有神,尖牙利爪,周围是苍茫的群山和草原。
- 一张高对比度的照片,熊猫骑在马上,戴着巫师帽,正在看书,马站在土墙旁的街道上,有绿草从街道的裂缝中长出来。
4、文字绘制
- 一张瓢虫的照片,微距,变焦,高质量,电影,瓢虫拿着一个木牌,上面写着“我爱世界” 的文字
- 一只小橘猫在弹钢琴,钢琴是黑色的牌子是“KOLORS”,猫的身影清晰的映照在钢琴上
- 街边的路牌,上面写着“天道酬勤”
\ No newline at end of file
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment