Initial commit

Commit 0513d03d authored Feb 03, 2026 by jerrrrry
20 changed files
--- a/.gitignore
+++ b/.gitignore
+__pycache__
+/ckpts/**/
\ No newline at end of file
--- a/LICENSE.txt
+++ b/LICENSE.txt
+TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT
+Tencent HunyuanVideo Release Date: December 3, 2024
+THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
+By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent Hunyuan Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
+1.	DEFINITIONS.
+a.	“Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A.
+b.	“Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of Tencent Hunyuan Works or any portion or element thereof set forth herein.
+c.	“Documentation” shall mean the specifications, manuals and documentation for Tencent Hunyuan made publicly available by Tencent.
+d.	“Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means.
+e.	“Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent Hunyuan Works for any purpose and in any field of use.
+f.	“Materials” shall mean, collectively, Tencent’s proprietary Tencent Hunyuan and Documentation (and any portion thereof) as made available by Tencent under this Agreement.
+g.	“Model Derivatives” shall mean all: (i) modifications to Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; (ii) works based on Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent Hunyuan or any Model Derivative of Tencent Hunyuan, to that model in order to cause that model to perform similarly to Tencent Hunyuan or a Model Derivative of Tencent Hunyuan, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent Hunyuan or a Model Derivative of Tencent Hunyuan for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
+h.	“Output” shall mean the information and/or content output of Tencent Hunyuan or a Model Derivative that results from operating or otherwise using Tencent Hunyuan or a Model Derivative, including via a Hosted Service.
+i.	“Tencent,” “We” or “Us” shall mean THL A29 Limited.
+j.	“Tencent Hunyuan” shall mean the large language models, text/image/video/audio/3D generation models, and multimodal large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us, including, without limitation to, Tencent HunyuanVideo released at [https://github.com/Tencent/HunyuanVideo].
+k.	“Tencent Hunyuan Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
+l.	“Territory” shall mean the worldwide territory, excluding the territory of the European Union, United Kingdom and South Korea. 
+m.	“Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
+n.	“including” shall mean including but not limited to.
+2.	GRANT OF RIGHTS.
+We grant You, for the Territory only, a non-exclusive, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
+3.	DISTRIBUTION.
+You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent Hunyuan Works, exclusively in the Territory, provided that You meet all of the following conditions:
+a.	You must provide all such Third Party recipients of the Tencent Hunyuan Works or products or services using them a copy of this Agreement;
+b.	You must cause any modified files to carry prominent notices stating that You changed the files;
+c.	You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent Hunyuan Works; and (ii) mark the products or services developed by using the Tencent Hunyuan Works to indicate that the product/service is “Powered by Tencent Hunyuan”; and
+d.	All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent Hunyuan is licensed under the Tencent Hunyuan Community License Agreement, Copyright © 2024 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.”
+You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement (including as regards the Territory). If You receive Tencent Hunyuan Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You.
+4.	ADDITIONAL COMMERCIAL TERMS.
+If, on the Tencent Hunyuan version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.
+5.	RULES OF USE.
+a.	Your use of the Tencent Hunyuan Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent Hunyuan Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Tencent Hunyuan Works and You must provide notice to subsequent users to whom You distribute that Tencent Hunyuan Works are subject to the use restrictions in these Sections 5(a) and 5(b).
+b.	You must not use the Tencent Hunyuan Works or any Output or results of the Tencent Hunyuan Works to improve any other AI model (other than Tencent Hunyuan or Model Derivatives thereof).
+c.	You must not use, reproduce, modify, distribute, or display the Tencent Hunyuan Works, Output or results of the Tencent Hunyuan Works outside the Territory. Any such use outside the Territory is unlicensed and unauthorized under this Agreement.
+6.	INTELLECTUAL PROPERTY.
+a.	Subject to Tencent’s ownership of Tencent Hunyuan Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You.
+b.	No trademark licenses are granted under this Agreement, and in connection with the Tencent Hunyuan Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent Hunyuan Works. Tencent hereby grants You a license to use “Tencent Hunyuan” (the “Mark”) in the Territory solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent.
+c.	If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent Hunyuan Works.
+d.	Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses.
+7.	DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
+a.	We are not obligated to support, update, provide training for, or develop any further version of the Tencent Hunyuan Works or to grant any license thereto.
+b.	UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HUNYUAN WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.
+c.	TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+8.	SURVIVAL AND TERMINATION.
+a.	The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
+b.	We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent Hunyuan Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement.
+9.	GOVERNING LAW AND JURISDICTION.
+a.	This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
+b.	Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute.
+ 
+EXHIBIT A
+ACCEPTABLE USE POLICY
+
+Tencent reserves the right to update this Acceptable Use Policy from time to time.
+Last modified: November 5, 2024
+
+Tencent endeavors to promote safe and fair use of its tools and features, including Tencent Hunyuan. You agree not to use Tencent Hunyuan or Model Derivatives:
+1.	Outside the Territory;
+2.	In any way that violates any applicable national, federal, state, local, international or any other law or regulation;
+3.	To harm Yourself or others;
+4.	To repurpose or distribute output from Tencent Hunyuan or any Model Derivatives to harm Yourself or others; 
+5.	To override or circumvent the safety guardrails and safeguards We have put in place;
+6.	For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
+7.	To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections;
+8.	To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;
+9.	To intentionally defame, disparage or otherwise harass others;
+10.	To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems;
+11.	To generate or disseminate personal identifiable information with the purpose of harming others;
+12.	To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated;
+13.	To impersonate another individual without consent, authorization, or legal right;
+14.	To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance);
+15.	In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;
+16.	To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism;
+17.	For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics;
+18.	To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
+19.	For military purposes;
+20.	To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices.
--- a/Notice
+++ b/Notice
+Usage and Legal Notices:
+
+Tencent is pleased to support the open source community by making Tencent HunyuanVideo available.
+
+Copyright (C) 2024 THL A29 Limited, a Tencent company.  All rights reserved. The below software and/or models in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) THL A29 Limited.
+
+Tencent HunyuanVideo is licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT except for the third-party components listed below. Tencent HunyuanVideo does not impose any additional limitations beyond what is outlined in the respective licenses of these third-party components. Users must comply with all terms and conditions of original licenses of these third-party components and must ensure that the usage of the third party components adheres to all relevant laws and regulations. 
+
+For avoidance of doubts, Tencent HunyuanVideo means the large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing may be made publicly available by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
+
+
+Other dependencies and licenses:
+
+
+Open Source Model Licensed under the Apache License Version 2.0:
+The below software in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
+--------------------------------------------------------------------
+1. diffusers
+Copyright (c) diffusers original author and authors
+Please note this software has been modified by Tencent in this distribution.
+
+2. transformers
+Copyright (c) transformers original author and authors
+
+3. safetensors
+Copyright (c) safetensors original author and authors
+
+4. flux
+Copyright (c) flux original author and authors
+
+
+Terms of the Apache License Version 2.0:
+--------------------------------------------------------------------
+Apache License 
+
+Version 2.0, January 2004
+
+http://www.apache.org/licenses/ 
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+1. Definitions.
+
+"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
+
+"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
+
+"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
+
+"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
+
+"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
+
+"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
+
+"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
+
+"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
+
+"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
+
+"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+
+You must give any other recipients of the Work or Derivative Works a copy of this License; and 
+
+You must cause any modified files to carry prominent notices stating that You changed the files; and 
+
+You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and 
+
+If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. 
+
+You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 
+
+5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+
+
+Open Source Software Licensed under the BSD 2-Clause License:
+--------------------------------------------------------------------
+1. imageio
+Copyright (c) 2014-2022, imageio developers
+All rights reserved.
+
+
+Terms of the BSD 2-Clause License:
+--------------------------------------------------------------------
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+
+Open Source Software Licensed under the BSD 3-Clause License:
+--------------------------------------------------------------------
+1. torchvision
+Copyright (c) Soumith Chintala 2016, 
+All rights reserved.
+
+2. flash-attn
+Copyright (c) 2022, the respective contributors, as shown by the AUTHORS file.
+All rights reserved.
+
+
+Terms of the BSD 3-Clause License:
+--------------------------------------------------------------------
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+
+Open Source Software Licensed under the BSD 3-Clause License and Other Licenses of the Third-Party Components therein:
+--------------------------------------------------------------------
+1. torch
+Copyright (c) 2016-     Facebook, Inc            (Adam Paszke)
+Copyright (c) 2014-     Facebook, Inc            (Soumith Chintala)
+Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
+Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
+Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
+Copyright (c) 2011-2013 NYU                      (Clement Farabet)
+Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
+Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
+Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
+
+
+A copy of the BSD 3-Clause is included in this file.
+
+For the license of other third party components, please refer to the following URL:
+https://github.com/pytorch/pytorch/tree/v2.1.1/third_party
+
+
+Open Source Software Licensed under the BSD 3-Clause License and Other Licenses of the Third-Party Components therein:
+--------------------------------------------------------------------
+1. pandas
+Copyright (c) 2008-2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
+All rights reserved.
+
+Copyright (c) 2011-2023, Open source contributors.
+
+
+A copy of the BSD 3-Clause is included in this file.
+
+For the license of other third party components, please refer to the following URL:
+https://github.com/pandas-dev/pandas/tree/v2.0.3/LICENSES
+
+
+Open Source Software Licensed under the BSD 3-Clause License and Other Licenses of the Third-Party Components therein:
+--------------------------------------------------------------------
+1. numpy
+Copyright (c) 2005-2022, NumPy Developers.
+All rights reserved.
+
+
+A copy of the BSD 3-Clause is included in this file.
+
+For the license of other third party components, please refer to the following URL:
+https://github.com/numpy/numpy/blob/v1.24.4/LICENSES_bundled.txt
+
+
+Open Source Software Licensed under the MIT License:
+--------------------------------------------------------------------
+1. einops
+Copyright (c) 2018 Alex Rogozhnikov
+
+2. loguru
+Copyright (c) 2017
+
+
+Terms of the MIT License:
+--------------------------------------------------------------------
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+
+
+Open Source Software Licensed under the MIT License and Other Licenses of the Third-Party Components therein:
+--------------------------------------------------------------------
+1. tqdm
+Copyright (c) 2013 noamraph
+
+
+A copy of the MIT is included in this file.
+
+For the license of other third party components, please refer to the following URL:
+https://github.com/tqdm/tqdm/blob/v4.66.2/LICENCE
+
+
+
+Open Source Model Licensed under the MIT License:
+--------------------------------------------------------------------
+1. clip-large
+Copyright (c) 2021 OpenAI
+
+
+A copy of the MIT is included in this file.
+
+
+--------------------------------------------------------------------
+We may also use other third-party components:
+
+1. llava-llama3
+
+Copyright (c) llava-llama3 original author and authors
+
+URL: https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers#model
\ No newline at end of file
--- a/README.md
+++ b/README.md
+<!-- ## **HunyuanVideo** -->
+
+[中文阅读](./README_zh.md)
+
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/logo.png"  height=100>
+</p>
+
+# HunyuanVideo: A Systematic Framework For Large Video Generation Model
+
+<div align="center">
+  <a href="https://github.com/Tencent-Hunyuan/HunyuanVideo"><img src="https://img.shields.io/static/v1?label=HunyuanVideo Code&message=Github&color=blue"></a> &ensp;
+  <a href="https://aivideo.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Web&color=green"></a> &ensp;
+  <a href="https://video.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Playground&message=Web&color=green"></a>
+</div>
+<div align="center">
+  <a href="https://arxiv.org/abs/2412.03603"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red"></a> &ensp;
+  <a href="https://aivideo.hunyuan.tencent.com/hunyuanvideo.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=High-Quality Version (~350M)&color=red"></a>
+</div>
+<div align="center">
+  <a href="https://huggingface.co/tencent/HunyuanVideo"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=HuggingFace&color=yellow"></a> &ensp;
+  <a href="https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=Diffusers&color=yellow"></a> &ensp;
+  <a href="https://huggingface.co/tencent/HunyuanVideo-PromptRewrite"><img src="https://img.shields.io/static/v1?label=HunyuanVideo-PromptRewrite&message=HuggingFace&color=yellow"></a>
+
+
+ [![Replicate](https://replicate.com/zsxkib/hunyuan-video/badge)](https://replicate.com/zsxkib/hunyuan-video)
+</div>
+
+<p align="center">
+    👋 Join our <a href="assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/tv7FkG4Nwf" target="_blank">Discord</a> 
+</p>
+
+-----
+
+This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring HunyuanVideo. You can find more visualizations on our [project page](https://aivideo.hunyuan.tencent.com).
+
+> [**HunyuanVideo: A Systematic Framework For Large Video Generation Model**](https://arxiv.org/abs/2412.03603) <br>
+
+
+
+## 🔥🔥🔥 News!!
+
+* May 28, 2025: 💃 We release [HunyuanVideo-Avatar](https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar), an audio-driven human animation model based on HunyuanVideo.
+* May 09, 2025: 🙆 We release the [HunyuanCustom](https://github.com/Tencent-Hunyuan/HunyuanCustom), a multimodal-driven architecture for customized video generation based on HunyuanVideo.
+* Mar 06, 2025: 🌅 We release the [HunyuanVideo-I2V](https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V), an image-to-video model based on HunyuanVideo.
+* Jan 13, 2025: 📈 We release the [Penguin Video Benchmark](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/assets/PenguinVideoBenchmark.csv).
+* Dec 18, 2024: 🏃‍♂️ We release the [FP8 model weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) of HunyuanVideo to save more GPU memory.
+* Dec 17, 2024: 🤗 HunyuanVideo has been integrated into [Diffusers](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video).
+* Dec 7, 2024: 🚀 We release the parallel inference code for HunyuanVideo powered by [xDiT](https://github.com/xdit-project/xDiT).
+* Dec 3, 2024: 👋 We release the inference code and model weights of HunyuanVideo. [Download](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md).
+
+
+
+## 🎥 Demo
+
+<div align="center">
+  <video src="https://github.com/user-attachments/assets/22440764-0d7e-438e-a44d-d0dad1006d3d" width="70%" poster="./assets/video_poster.png"> </video>
+</div>
+
+
+## 🧩 Community Contributions
+
+If you develop with or use HunyuanVideo in your projects, please let us know.
+
+- ComfyUI-Kijai (FP8 Inference, V2V and IP2V Generation): [ComfyUI-HunyuanVideoWrapper](https://github.com/kijai/ComfyUI-HunyuanVideoWrapper) by [Kijai](https://github.com/kijai)
+- ComfyUI-Native (Native Support): [ComfyUI-HunyuanVideo](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/) by [ComfyUI Official](https://github.com/comfyanonymous/ComfyUI)
+
+- FastVideo (Consistency Distilled Model and Sliding Tile Attention): [FastVideo](https://github.com/hao-ai-lab/FastVideo) and [Sliding Tile Attention](https://hao-ai-lab.github.io/blogs/sta/) by [Hao AI Lab](https://hao-ai-lab.github.io/)
+- HunyuanVideo-gguf (GGUF Version and Quantization): [HunyuanVideo-gguf](https://huggingface.co/city96/HunyuanVideo-gguf) by [city96](https://huggingface.co/city96)
+- Enhance-A-Video (Better Generated Video for Free): [Enhance-A-Video](https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video) by [NUS-HPC-AI-Lab](https://ai.comp.nus.edu.sg/)
+- TeaCache (Cache-based Acceleration): [TeaCache](https://github.com/LiewFeng/TeaCache) by [Feng Liu](https://github.com/LiewFeng)
+- HunyuanVideoGP (GPU Poor version): [HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP) by [DeepBeepMeep](https://github.com/deepbeepmeep)
+- RIFLEx (Video Length Extrapolation): [RIFLEx](https://riflex-video.github.io/) by [Tsinghua University](https://riflex-video.github.io/)
+- HunyuanVideo Keyframe Control Lora: [hunyuan-video-keyframe-control-lora](https://github.com/dashtoon/hunyuan-video-keyframe-control-lora) by [dashtoon](https://github.com/dashtoon)
+- Sparse-VideoGen (Accelerate Video Generation with High Pixel-level Fidelity): [Sparse-VideoGen](https://github.com/svg-project/Sparse-VideoGen) by [University of California, Berkeley](https://svg-project.github.io/)
+- FramePack (Packing Input Frame Context in Next-Frame Prediction Models for Video Generation): [FramePack](https://github.com/lllyasviel/FramePack) by [Lvmin Zhang](https://github.com/lllyasviel) 
+- Jenga (Training-Free Efficient Video Generation via Dynamic Token Carving): [Jenga](https://github.com/dvlab-research/Jenga) by [DV Lab](https://github.com/dvlab-research)
+- DCM (Dual-Expert Consistency Model for Efficient and High-Quality Video Generation): [DCM](https://github.com/Vchitect/DCM) by [Vchitect](https://github.com/Vchitect/DCM)
+
+
+## 📑 Open-source Plan
+
+- HunyuanVideo (Text-to-Video Model)
+  - [x] Inference 
+  - [x] Checkpoints
+  - [x] Multi-GPU sequence-parallel inference (faster inference on more GPUs)
+  - [x] Web Demo (Gradio)
+  - [x] Diffusers
+  - [x] FP8 quantized weights
+  - [x] Penguin Video Benchmark
+  - [x] ComfyUI
+- [HunyuanVideo (Image-to-Video Model)](https://github.com/Tencent/HunyuanVideo-I2V)
+  - [x] Inference
+  - [x] Checkpoints
+
+
+
+## Contents
+- [HunyuanVideo: A Systematic Framework For Large Video Generation Model](#hunyuanvideo-a-systematic-framework-for-large-video-generation-model)
+  - [🔥🔥🔥 News!!](#-news)
+  - [🎥 Demo](#-demo)
+  - [🧩 Community Contributions](#-community-contributions)
+  - [📑 Open-source Plan](#-open-source-plan)
+  - [Contents](#contents)
+  - [**Abstract**](#abstract)
+  - [**HunyuanVideo Overall Architecture**](#hunyuanvideo-overall-architecture)
+  - [🎉 **HunyuanVideo Key Features**](#-hunyuanvideo-key-features)
+    - [**Unified Image and Video Generative Architecture**](#unified-image-and-video-generative-architecture)
+    - [**MLLM Text Encoder**](#mllm-text-encoder)
+    - [**3D VAE**](#3d-vae)
+    - [**Prompt Rewrite**](#prompt-rewrite)
+  - [📈 Comparisons](#-comparisons)
+  - [📜 Requirements](#-requirements)
+  - [🛠️ Dependencies and Installation](#️-dependencies-and-installation)
+    - [Installation Guide for Linux](#installation-guide-for-linux)
+  - [🧱 Download Pretrained Models](#-download-pretrained-models)
+  - [🔑 Single-gpu Inference](#-single-gpu-inference)
+    - [Using Command Line](#using-command-line)
+    - [Run a Gradio Server](#run-a-gradio-server)
+    - [More Configurations](#more-configurations)
+  - [🚀 Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit)
+    - [Using Command Line](#using-command-line-1)
+  - [🚀  FP8 Inference](#--fp8-inference)
+    - [Using Command Line](#using-command-line-2)
+  - [🔗 BibTeX](#-bibtex)
+  - [Acknowledgements](#acknowledgements)
+  - [Github Star History](#github-star-history)
+---
+
+## **Abstract**
+We present HunyuanVideo, a novel open-source video foundation model whose video generation performance is comparable to, if not superior to, that of leading closed-source models. To train the HunyuanVideo model, we adopt several key technologies for model learning, including data curation, image-video joint model training, and an efficient infrastructure designed to facilitate large-scale model training and inference. Additionally, through an effective strategy for scaling model architecture and dataset, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. 
+
+We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion diversity, text-video alignment, and generation stability. According to professional human evaluation results, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and 3 top-performing Chinese video generative models. By releasing the code and weights of the foundation model and its applications, we aim to bridge the gap between closed-source and open-source video foundation models. This initiative will empower everyone in the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. 
+
+
+
+## **HunyuanVideo Overall Architecture**
+
+HunyuanVideo is trained on a spatio-temporally compressed latent space produced by a Causal 3D VAE. Text prompts are encoded with a large language model and used as conditions. Taking Gaussian noise and these conditions as input, the generative model produces an output latent, which the 3D VAE decoder then decodes into images or videos.
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/overall.png"  height=300>
+</p>
+
+
+## 🎉 **HunyuanVideo Key Features**
+
+### **Unified Image and Video Generative Architecture**
+HunyuanVideo adopts a Transformer design and employs a full attention mechanism for unified image and video generation. Specifically, we use a "dual-stream to single-stream" hybrid model design for video generation. In the dual-stream phase, video and text tokens are processed independently through multiple Transformer blocks, enabling each modality to learn its own appropriate modulation mechanisms without interference. In the single-stream phase, we concatenate the video and text tokens and feed them into subsequent Transformer blocks for effective multimodal information fusion. This design captures complex interactions between visual and semantic information, enhancing overall model performance.
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/backbone.png"  height=350>
+</p>
+
+### **MLLM Text Encoder**
+Previous text-to-video models typically use pre-trained CLIP and T5-XXL as text encoders, where CLIP uses a Transformer encoder and T5 uses an encoder-decoder structure. In contrast, we use a pre-trained Multimodal Large Language Model (MLLM) with a decoder-only structure as our text encoder, which has the following advantages: (i) compared with T5, an MLLM after visual instruction fine-tuning has better image-text alignment in the feature space, which eases instruction following in diffusion models; (ii) compared with CLIP, an MLLM has demonstrated superior ability in image detail description and complex reasoning; (iii) an MLLM can act as a zero-shot learner by following system instructions prepended to user prompts, helping text features pay more attention to key information. However, the MLLM is based on causal attention, whereas T5-XXL uses bidirectional attention, which produces better text guidance for diffusion models. We therefore introduce an extra bidirectional token refiner to enhance the text features.
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/text_encoder.png"  height=275>
+</p>
+
+### **3D VAE**
+HunyuanVideo trains a 3D VAE with CausalConv3D to compress pixel-space videos and images into a compact latent space. We set the compression ratios of video length, space, and channel to 4, 8, and 16 respectively. This significantly reduces the number of tokens for the subsequent diffusion transformer model, allowing us to train on videos at the original resolution and frame rate.
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/3dvae.png"  height=150>
+</p>
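
As a rough illustration of these ratios, the sketch below estimates the latent shape for a given clip. It assumes, based on the causal design, that the first frame is encoded on its own and the remaining frames are grouped in fours, so `T` frames map to `(T - 1) / 4 + 1` latent frames, and it reads "16" as the number of latent channels. `latent_shape` is a hypothetical helper, not part of the repository.

```python
def latent_shape(frames, height, width, t_ratio=4, s_ratio=8, channels=16):
    """Estimate (channels, latent_frames, latent_h, latent_w) for a clip.

    Assumption: causal temporal compression, i.e. the first frame is kept
    separately and every further group of `t_ratio` frames becomes one
    latent frame.
    """
    if (frames - 1) % t_ratio != 0:
        raise ValueError("frames - 1 must be divisible by the temporal ratio")
    if height % s_ratio != 0 or width % s_ratio != 0:
        raise ValueError("height and width must be divisible by the spatial ratio")
    return (channels, (frames - 1) // t_ratio + 1, height // s_ratio, width // s_ratio)


print(latent_shape(129, 720, 1280))  # (16, 33, 90, 160)
print(latent_shape(129, 544, 960))   # (16, 33, 68, 120)
```

Under these assumptions, a 129-frame 720x1280 clip shrinks from 129 x 720 x 1280 pixel positions to 33 x 90 x 160 latent positions, which is where the token savings for the diffusion transformer come from.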
+
+### **Prompt Rewrite**
+To address the variability in linguistic style and length of user-provided prompts, we fine-tune the [Hunyuan-Large model](https://github.com/Tencent/Tencent-Hunyuan-Large) as our prompt rewrite model to adapt the original user prompt to a model-preferred prompt.
+
+We provide two rewrite modes: Normal mode and Master mode, which can be called using different prompts. The prompts are shown [here](hyvideo/prompt_rewrite.py). The Normal mode is designed to enhance the video generation model's comprehension of user intent, facilitating a more accurate interpretation of the instructions provided. The Master mode enhances the description of aspects such as composition, lighting, and camera movement, which leans towards generating videos with a higher visual quality. However, this emphasis may occasionally result in the loss of some semantic details. 
+
+The Prompt Rewrite Model can be deployed and run for inference directly with the [Hunyuan-Large original code](https://github.com/Tencent/Tencent-Hunyuan-Large). We release the weights of the Prompt Rewrite Model [here](https://huggingface.co/Tencent/HunyuanVideo-PromptRewrite).
+
+
+
+## 📈 Comparisons
+
+To evaluate the performance of HunyuanVideo, we selected five strong baselines from closed-source video generation models. In total, we used 1,533 text prompts, generating an equal number of video samples with HunyuanVideo in a single run. For a fair comparison, we ran inference only once, avoiding any cherry-picking of results. When comparing with the baseline methods, we kept the default settings for all selected models, ensuring consistent video resolution. Videos were assessed on three criteria: Text Alignment, Motion Quality, and Visual Quality. More than 60 professional evaluators performed the evaluation. Notably, HunyuanVideo demonstrated the best overall performance, particularly excelling in motion quality. Please note that the evaluation is based on HunyuanVideo's high-quality version, which differs from the currently released fast version.
+
+<p align="center">
+<table> 
+<thead> 
+<tr> 
+    <th>Model</th> <th>Open Source</th> <th>Duration</th> <th>Text Alignment</th> <th>Motion Quality</th> <th>Visual Quality</th> <th>Overall</th> <th>Ranking</th>
+</tr> 
+</thead> 
+<tbody> 
+<tr> 
+    <td>HunyuanVideo (Ours)</td> <td> ✔ </td> <td>5s</td> <td>61.8%</td> <td>66.5%</td> <td>95.7%</td> <td>41.3%</td> <td>1</td>
+</tr> 
+<tr> 
+    <td>CNTopA (API)</td> <td> &#10008; </td> <td>5s</td> <td>62.6%</td> <td>61.7%</td> <td>95.6%</td> <td>37.7%</td> <td>2</td>
+</tr> 
+<tr> 
+    <td>CNTopB (Web)</td> <td> &#10008; </td> <td>5s</td> <td>60.1%</td> <td>62.9%</td> <td>97.7%</td> <td>37.5%</td> <td>3</td>
+</tr> 
+<tr> 
+    <td>GEN-3 alpha (Web)</td> <td> &#10008; </td> <td>6s</td> <td>47.7%</td> <td>54.7%</td> <td>97.5%</td> <td>27.4%</td> <td>4</td> 
+</tr> 
+<tr> 
+    <td>Luma1.6 (API)</td> <td> &#10008; </td> <td>5s</td> <td>57.6%</td> <td>44.2%</td> <td>94.1%</td> <td>24.8%</td> <td>5</td>
+</tr>
+<tr> 
+    <td>CNTopC (Web)</td> <td> &#10008; </td> <td>5s</td> <td>48.4%</td> <td>47.2%</td> <td>96.3%</td> <td>24.6%</td> <td>6</td>
+</tr> 
+</tbody>
+</table>
+</p>
+
+
+## 📜 Requirements
+
+The following table shows the requirements for running the HunyuanVideo model (batch size = 1) to generate videos:
+
+|     Model    |  Setting<br/>(height/width/frame) | GPU Peak Memory  |
+|:------------:|:--------------------------------:|:----------------:|
+| HunyuanVideo   |        720px1280px129f          |       60GB        |
+| HunyuanVideo   |        544px960px129f           |       45GB        |
+
+* An NVIDIA GPU with CUDA support is required. 
+  * The model is tested on a single 80GB GPU.
+  * **Minimum**: The minimum GPU memory required is 60GB for 720px1280px129f and 45GB for 544px960px129f.
+  * **Recommended**: We recommend using a GPU with 80GB of memory for better generation quality.
+* Tested operating system: Linux
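
As a quick way to read the table, the sketch below checks which of the listed settings fit a given amount of GPU memory. The figures are copied from the table above; `settings_that_fit` is a hypothetical helper, and it deliberately ignores memory-saving options such as CPU offload or FP8 weights.

```python
# Peak-memory figures copied from the requirements table (batch size 1).
PEAK_MEMORY_GB = {
    (720, 1280, 129): 60,
    (544, 960, 129): 45,
}


def settings_that_fit(gpu_memory_gb):
    """Return the listed (height, width, frames) settings whose peak memory fits."""
    return sorted(s for s, need in PEAK_MEMORY_GB.items() if need <= gpu_memory_gb)


print(settings_that_fit(80))  # [(544, 960, 129), (720, 1280, 129)]
print(settings_that_fit(48))  # [(544, 960, 129)]
print(settings_that_fit(24))  # []
```

The total memory of the installed GPU can be read with `nvidia-smi --query-gpu=memory.total --format=csv`.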
+
+
+
+## 🛠️ Dependencies and Installation
+
+Begin by cloning the repository:
+```shell
+git clone https://github.com/Tencent-Hunyuan/HunyuanVideo
+cd HunyuanVideo
+```
+
+### Installation Guide for Linux
+
+We recommend CUDA versions 12.4 or 11.8 for the manual installation.
+
+Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
+
+```shell
+# 1. Create conda environment
+conda create -n HunyuanVideo python==3.10.9
+
+# 2. Activate the environment
+conda activate HunyuanVideo
+
+# 3. Install PyTorch and other dependencies using conda
+# For CUDA 11.8
+conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
+# For CUDA 12.4
+conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
+
+# 4. Install pip dependencies
+python -m pip install -r requirements.txt
+
+# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
+python -m pip install ninja
+python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+
+# 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
+python -m pip install xfuser==0.4.0
+```
+
+If you run into a floating point exception (core dump) on specific GPU types, you may try the following solutions:
+
+```shell
+# Option 1: Make sure you have installed CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 (or simply use our CUDA 12 docker image).
+pip install nvidia-cublas-cu12==12.4.5.8
+export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
+
+# Option 2: Force the use of the CUDA 11.8 build of PyTorch and all the other packages
+pip uninstall -r requirements.txt  # uninstall all packages
+pip uninstall -y xfuser
+pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
+pip install -r requirements.txt
+pip install ninja
+pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+pip install xfuser==0.4.0
+```
+
+HunyuanVideo also provides a pre-built Docker image. Use the following commands to pull and run it.
+
+```shell
+# For CUDA 12.4 (updated to avoid the floating point exception)
+docker pull hunyuanvideo/hunyuanvideo:cuda_12
+docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12
+
+# For CUDA 11.8
+docker pull hunyuanvideo/hunyuanvideo:cuda_11
+docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
+```
+
+
+## 🧱 Download Pretrained Models
+
+Details on downloading the pretrained models are available [here](ckpts/README.md).
+
+
+
+## 🔑 Single-gpu Inference
+
+We list the height/width/frame settings we support in the following table.
+
+|      Resolution       |           h/w=9:16           |    h/w=16:9     |     h/w=4:3     |     h/w=3:4     |     h/w=1:1     |
+|:---------------------:|:----------------------------:|:---------------:|:---------------:|:---------------:|:---------------:|
+|         540p          |        544px960px129f        |  960px544px129f | 832px624px129f  |  624px832px129f |  720px720px129f |
+| 720p (recommended)    |       720px1280px129f        | 1280px720px129f | 1104px832px129f | 832px1104px129f | 960px960px129f  |
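
Each table entry packs height, width, and frame count into one string. The sketch below splits such an entry and turns it into the matching `sample_video.py` flags. It assumes `--video-size` takes height first and then width, which is consistent with the single-GPU command in this README; `parse_setting` and `to_cli_args` are hypothetical helpers.

```python
import re


def parse_setting(setting):
    """Split a table entry such as '720px1280px129f' into (height, width, frames)."""
    match = re.fullmatch(r"(\d+)px(\d+)px(\d+)f", setting)
    if match is None:
        raise ValueError(f"unrecognized setting: {setting!r}")
    height, width, frames = (int(g) for g in match.groups())
    return height, width, frames


def to_cli_args(setting):
    """Build the size/length flags for one table entry (height before width)."""
    height, width, frames = parse_setting(setting)
    return ["--video-size", str(height), str(width), "--video-length", str(frames)]


print(to_cli_args("720px1280px129f"))
# ['--video-size', '720', '1280', '--video-length', '129']
```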
+
+### Using Command Line
+
+```bash
+cd HunyuanVideo
+
+python3 sample_video.py \
+    --video-size 720 1280 \
+    --video-length 129 \
+    --infer-steps 50 \
+    --prompt "A cat walks on the grass, realistic style." \
+    --flow-reverse \
+    --use-cpu-offload \
+    --save-path ./results
+```
+
+### Run a Gradio Server
+```bash
+python3 gradio_server.py --flow-reverse
+
+# set SERVER_NAME and SERVER_PORT manually
+# SERVER_NAME=0.0.0.0 SERVER_PORT=8081 python3 gradio_server.py --flow-reverse
+```
+
+### More Configurations
+
+We list some more useful configurations for easy usage:
+
+|        Argument        |  Default  |                Description                |
+|:----------------------:|:---------:|:-----------------------------------------:|
+|       `--prompt`       |   None    |   The text prompt for video generation    |
+|     `--video-size`     | 720 1280  |      The size of the generated video      |
+|    `--video-length`    |    129    |     The length of the generated video     |
+|    `--infer-steps`     |    50     |     The number of steps for sampling      |
+| `--embedded-cfg-scale` |    6.0    |    Embedded classifier-free guidance scale       |
+|     `--flow-shift`     |    7.0    | Shift factor for flow matching schedulers |
+|     `--flow-reverse`   |    False  | If set, learning/sampling runs from t=1 to t=0 |
+|        `--seed`        |     None  |   The random seed for video generation; if None, a random seed is used    |
+|  `--use-cpu-offload`   |   False   |    Offload model weights to the CPU to save GPU memory; necessary for high-resolution video generation    |
+|     `--save-path`      | ./results |     Path to save the generated video      |
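
To give a feel for what `--flow-shift` and `--flow-reverse` control, here is a minimal sketch of a shifted flow-matching schedule. It assumes the SD3-style time shift `t' = s * t / (1 + (s - 1) * t)`, a common choice for flow-matching samplers. This README does not spell out the formula, so treat both the formula and the helper name `shifted_schedule` as assumptions rather than the repository's implementation.

```python
def shifted_schedule(num_steps, shift=7.0, reverse=True):
    """Return num_steps + 1 time points, shifted towards the noisy end (t=1).

    Assumption: SD3-style shift t' = shift * t / (1 + (shift - 1) * t).
    With reverse=True the points run from t=1 (noise) down to t=0 (data).
    """
    points = [i / num_steps for i in range(num_steps + 1)]  # uniform grid on [0, 1]
    shifted = [shift * t / (1 + (shift - 1) * t) for t in points]
    return shifted[::-1] if reverse else shifted


schedule = shifted_schedule(num_steps=4, shift=7.0)
print([round(t, 3) for t in schedule])  # [1.0, 0.955, 0.875, 0.7, 0.0]
```

With a shift above 1, more of the sampling steps are spent at high noise levels; a shift of 1 leaves the uniform grid unchanged.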
+
+
+
+## 🚀 Parallel Inference on Multiple GPUs by xDiT
+
+[xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
+It has provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, and SD3. This repo adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model.
+
+### Using Command Line
+
+For example, to generate a video with 8 GPUs, you can use the following command:
+
+```bash
+cd HunyuanVideo
+
+torchrun --nproc_per_node=8 sample_video.py \
+    --video-size 1280 720 \
+    --video-length 129 \
+    --infer-steps 50 \
+    --prompt "A cat walks on the grass, realistic style." \
+    --flow-reverse \
+    --seed 42 \
+    --ulysses-degree 8 \
+    --ring-degree 1 \
+    --save-path ./results
+```
+
+You can change the `--ulysses-degree` and `--ring-degree` to control the parallel configurations for the best performance. The valid parallel configurations are shown in the following table.
+
+<details>
+<summary>Supported Parallel Configurations (Click to expand)</summary>
+
+|     --video-size     | --video-length | --ulysses-degree x --ring-degree | --nproc_per_node |
+|----------------------|----------------|----------------------------------|------------------|
+| 1280 720 or 720 1280 | 129            | 8x1,4x2,2x4,1x8                  | 8                |
+| 1280 720 or 720 1280 | 129            | 1x5                              | 5                |
+| 1280 720 or 720 1280 | 129            | 4x1,2x2,1x4                      | 4                |
+| 1280 720 or 720 1280 | 129            | 3x1,1x3                          | 3                |
+| 1280 720 or 720 1280 | 129            | 2x1,1x2                          | 2                |
+| 1104 832 or 832 1104 | 129            | 4x1,2x2,1x4                      | 4                |
+| 1104 832 or 832 1104 | 129            | 3x1,1x3                          | 3                |
+| 1104 832 or 832 1104 | 129            | 2x1,1x2                          | 2                |
+| 960 960              | 129            | 6x1,3x2,2x3,1x6                  | 6                |
+| 960 960              | 129            | 4x1,2x2,1x4                      | 4                |
+| 960 960              | 129            | 3x1,1x3                          | 3                |
+| 960 960              | 129            | 1x2,2x1                          | 2                |
+| 960 544 or 544 960   | 129            | 6x1,3x2,2x3,1x6                  | 6                |
+| 960 544 or 544 960   | 129            | 4x1,2x2,1x4                      | 4                |
+| 960 544 or 544 960   | 129            | 3x1,1x3                          | 3                |
+| 960 544 or 544 960   | 129            | 1x2,2x1                          | 2                |
+| 832 624 or 624 832   | 129            | 4x1,2x2,1x4                      | 4                |
+| 832 624 or 624 832   | 129            | 3x1,1x3                          | 3                |
+| 832 624 or 624 832   | 129            | 2x1,1x2                          | 2                |
+| 720 720              | 129            | 1x5                              | 5                |
+| 720 720              | 129            | 3x1,1x3                          | 3                |
+
+</details>
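
In every row of the table, `--ulysses-degree` times `--ring-degree` equals `--nproc_per_node`. The sketch below enumerates such pairs. As an assumption that is not stated in this README, it also requires the Ulysses degree to divide the number of attention heads (24 is assumed here), which would explain why only `1x5` appears for 5 GPUs. The table is additionally restricted per resolution, which this hypothetical helper does not model, so the table remains the authoritative list.

```python
def candidate_configs(nproc_per_node, num_heads=24):
    """List (ulysses_degree, ring_degree) pairs whose product is nproc_per_node.

    Assumption: the Ulysses degree must divide the attention-head count
    (num_heads=24 is an assumed value, not taken from the README).
    """
    pairs = []
    for ulysses in range(nproc_per_node, 0, -1):
        if nproc_per_node % ulysses == 0 and num_heads % ulysses == 0:
            pairs.append((ulysses, nproc_per_node // ulysses))
    return pairs


for n in (8, 5, 3):
    print(n, ",".join(f"{u}x{r}" for u, r in candidate_configs(n)))
# 8 8x1,4x2,2x4,1x8
# 5 1x5
# 3 3x1,1x3
```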
+
+
+<p align="center">
+<table align="center">
+<thead>
+<tr>
+    <th colspan="4">Latency (sec) for 1280x720 (129 frames, 50 steps) on 8xGPU, by number of GPUs used</th>
+</tr>
+<tr>
+    <th>1</th>
+    <th>2</th>
+    <th>4</th>
+    <th>8</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+    <td>1904.08</td>
+    <td>934.09 (2.04x)</td>
+    <td>514.08 (3.70x)</td>
+    <td>337.58 (5.64x)</td>
+</tr>
+
+</tbody>
+</table>
+</p>
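
The speedups in parentheses follow directly from the latencies. The short sketch below recomputes them and adds the parallel efficiency (speedup divided by GPU count), reading the column headers 1, 2, 4, and 8 as the number of GPUs used, which is an interpretation of the table. `scaling` is a hypothetical helper.

```python
# Latencies in seconds, copied from the table above (1280x720, 129 frames, 50 steps).
LATENCY_SEC = {1: 1904.08, 2: 934.09, 4: 514.08, 8: 337.58}


def scaling(latency_sec):
    """Return {gpus: (speedup, efficiency)} relative to the single-GPU latency."""
    baseline = latency_sec[1]
    return {n: (baseline / t, baseline / t / n) for n, t in latency_sec.items()}


for gpus, (speedup, efficiency) in scaling(LATENCY_SEC).items():
    print(f"{gpus} GPU(s): {speedup:.2f}x speedup, {efficiency:.1%} efficiency")
```

Efficiency below 100% at 4 and 8 GPUs reflects the communication overhead of sequence parallelism, so the best GPU count depends on whether latency or total GPU-hours matters more.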
+
+
+
+## 🚀  FP8 Inference
+
+Running HunyuanVideo with FP8 quantized weights saves about 10GB of GPU memory. You can download the [weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) and [weight scales](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8_map.pt) from Hugging Face.
+
+### Using Command Line
+
+Here, you must explicitly specify the FP8 weight path. For example, to generate a video with fp8 weights, you can use the following command:
+
+```bash
+cd HunyuanVideo
+
+DIT_CKPT_PATH={PATH_TO_FP8_WEIGHTS}/{WEIGHT_NAME}_fp8.pt
+
+python3 sample_video.py \
+    --dit-weight ${DIT_CKPT_PATH} \
+    --video-size 1280 720 \
+    --video-length 129 \
+    --infer-steps 50 \
+    --prompt "A cat walks on the grass, realistic style." \
+    --seed 42 \
+    --embedded-cfg-scale 6.0 \
+    --flow-shift 7.0 \
+    --flow-reverse \
+    --use-cpu-offload \
+    --use-fp8 \
+    --save-path ./results
+```
+
+
+
+## 🔗 BibTeX
+
+If you find [HunyuanVideo](https://arxiv.org/abs/2412.03603) useful for your research and applications, please cite using this BibTeX:
+
+```BibTeX
+@article{kong2024hunyuanvideo,
+  title={Hunyuanvideo: A systematic framework for large video generative models},
+  author={Kong, Weijie and Tian, Qi and Zhang, Zijian and Min, Rox and Dai, Zuozhuo and Zhou, Jin and Xiong, Jiangfeng and Li, Xin and Wu, Bo and Zhang, Jianwei and others},
+  journal={arXiv preprint arXiv:2412.03603},
+  year={2024}
+}
+```
+
+
+
+## Acknowledgements
+
+We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [FLUX](https://github.com/black-forest-labs/flux), [Llama](https://github.com/meta-llama/llama), [LLaVA](https://github.com/haotian-liu/LLaVA), [Xtuner](https://github.com/InternLM/xtuner), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories, for their open research and exploration.
+We also thank the Tencent Hunyuan Multimodal team for their help with the text encoder. 
+
+
+## Github Star History
+<a href="https://star-history.com/#Tencent-Hunyuan/HunyuanVideo&Date">
+ <picture>
+   <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date&theme=dark" />
+   <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date" />
+   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date" />
+ </picture>
+</a>
--- a/README_zh.md
+++ b/README_zh.md
+<!-- ## **HunyuanVideo** -->
+
+[English](./README.md)
+
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/logo.png"  height=100>
+</p>
+
+# HunyuanVideo: A Systematic Framework For Large Video Generation Model
+
+<div align="center">
+  <a href="https://github.com/Tencent-Hunyuan/HunyuanVideo"><img src="https://img.shields.io/static/v1?label=HunyuanVideo Code&message=Github&color=blue"></a> &ensp;
+  <a href="https://aivideo.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Web&color=green"></a> &ensp;
+  <a href="https://video.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Playground&message=Web&color=green"></a>
+</div>
+<div align="center">
+  <a href="https://arxiv.org/abs/2412.03603"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red"></a> &ensp;
+  <a href="https://aivideo.hunyuan.tencent.com/hunyuanvideo.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=High-Quality Version (~350M)&color=red"></a>
+</div>
+<div align="center">
+  <a href="https://huggingface.co/tencent/HunyuanVideo"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=HuggingFace&color=yellow"></a> &ensp;
+  <a href="https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=Diffusers&color=yellow"></a> &ensp;
+  <a href="https://huggingface.co/tencent/HunyuanVideo-PromptRewrite"><img src="https://img.shields.io/static/v1?label=HunyuanVideo-PromptRewrite&message=HuggingFace&color=yellow"></a>
+
+
+ [![Replicate](https://replicate.com/zsxkib/hunyuan-video/badge)](https://replicate.com/zsxkib/hunyuan-video)
+</div>
+
+
+<p align="center">
+    👋 Join our <a href="assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/tv7FkG4Nwf" target="_blank">Discord</a>
+</p>
+
+
+
+-----
+
+This repository contains the PyTorch model definitions, pretrained weights, and inference/sampling code for the HunyuanVideo project. See our [project page](https://aivideo.hunyuan.tencent.com) for more.
+
+> [**HunyuanVideo: A Systematic Framework For Large Video Generation Model**](https://arxiv.org/abs/2412.03603) <br>
+
+
+
+## 🔥🔥🔥 News!!
+
+* May 28, 2025: 💃 We release [HunyuanVideo-Avatar](https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar), Tencent Hunyuan's audio-driven digital human model.
+* May 09, 2025: 🙆 We release [HunyuanCustom](https://github.com/Tencent-Hunyuan/HunyuanCustom), Tencent Hunyuan's consistent video generation model.
+* Mar 06, 2025: 🌅 We release [HunyuanVideo-I2V](https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V), which supports high-quality image-to-video generation.
+* Jan 13, 2025: 📈 We release the Penguin Video [benchmark](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/assets/PenguinVideoBenchmark.csv).
+* Dec 18, 2024: 🏃‍♂️ We release the [FP8 model weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) of HunyuanVideo to save more GPU memory.
+* Dec 17, 2024: 🤗 HunyuanVideo has been integrated into [Diffusers](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video).
+* Dec 03, 2024: 🚀 We release the multi-GPU parallel inference code for HunyuanVideo, powered by [xDiT](https://github.com/xdit-project/xDiT).
+* Dec 03, 2024: 👋 We release the inference code and model weights of HunyuanVideo text-to-video.
+
+
+
+## 🎥 Demo
+
+<div align="center">
+  <video width="70%" src="https://github.com/user-attachments/assets/22440764-0d7e-438e-a44d-d0dad1006d3d" poster="./assets/video_poster.png"> </video>
+</div>
+
+
+## 🧩 Community Contributions
+
+If you develop or use HunyuanVideo in your projects, please let us know.
+
+- ComfyUI (with support for FP8 inference, V2V and IP2V generation): [ComfyUI-HunyuanVideoWrapper](https://github.com/kijai/ComfyUI-HunyuanVideoWrapper) by [Kijai](https://github.com/kijai)
+
+- ComfyUI-Native (official native support in ComfyUI): [ComfyUI-HunyuanVideo](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/) by [ComfyUI Official](https://github.com/comfyanonymous/ComfyUI)
+
+- FastVideo (consistency-distilled model, Sliding Tile Attention): [FastVideo](https://github.com/hao-ai-lab/FastVideo) and [Sliding Tile Attention](https://hao-ai-lab.github.io/blogs/sta/) by [Hao AI Lab](https://hao-ai-lab.github.io/)
+
+- HunyuanVideo-gguf (GGUF, quantization): [HunyuanVideo-gguf](https://huggingface.co/city96/HunyuanVideo-gguf) by [city96](https://huggingface.co/city96)
+
+- Enhance-A-Video (generates higher-quality videos): [Enhance-A-Video](https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video) by [NUS-HPC-AI-Lab](https://ai.comp.nus.edu.sg/)
+
+- TeaCache (cache-based accelerated sampling): [TeaCache](https://github.com/LiewFeng/TeaCache) by [Feng Liu](https://github.com/LiewFeng)
+
+- HunyuanVideoGP (a version for low-end GPUs): [HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP) by [DeepBeepMeep](https://github.com/deepbeepmeep)
+
+- RIFLEx (video length extrapolation): [RIFLEx](https://riflex-video.github.io/) by [Tsinghua University](https://riflex-video.github.io/)
+- HunyuanVideo Keyframe Control Lora (keyframe control LoRA for video): [hunyuan-video-keyframe-control-lora](https://github.com/dashtoon/hunyuan-video-keyframe-control-lora) by [dashtoon](https://github.com/dashtoon)
+- Sparse-VideoGen (accelerated video generation with high pixel-level fidelity): [Sparse-VideoGen](https://github.com/svg-project/Sparse-VideoGen) by [University of California, Berkeley](https://svg-project.github.io/)
+- FramePack (packs input frame context into next-frame prediction models for video generation): [FramePack](https://github.com/lllyasviel/FramePack) by [Lvmin Zhang](https://github.com/lllyasviel)
+- Jenga (accelerated sampling): [Jenga](https://github.com/dvlab-research/Jenga) by [DV Lab](https://github.com/dvlab-research)
+- DCM (dual-expert consistency model for efficient, high-quality video generation): [DCM](https://github.com/Vchitect/DCM) by [Vchitect](https://github.com/Vchitect/DCM)
+
+
+
+
+## 📑 Open-Source Plan
+
+- HunyuanVideo (text-to-video model)
+  - [x] Inference code
+  - [x] Model weights
+  - [x] Multi-GPU sequence-parallel inference (the more GPUs, the faster the inference)
+  - [x] Web Demo (Gradio)
+  - [x] Diffusers
+  - [x] FP8 quantized version
+  - [x] Penguin Video benchmark
+  - [x] ComfyUI
+- [HunyuanVideo (image-to-video model)](https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V)
+  - [x] Inference code
+  - [x] Model weights
+
+
+
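+## Contents
+
+- [HunyuanVideo: A Systematic Framework For Large Video Generation Model](#hunyuanvideo-a-systematic-framework-for-large-video-generation-model)
+  - [🎥 Demo](#-demo)
+  - [🔥🔥🔥 News!!](#-news)
+  - [🧩 Community Contributions](#-community-contributions)
+  - [📑 Open-Source Plan](#-open-source-plan)
+  - [Contents](#contents)
+  - [**Abstract**](#abstract)
+  - [**HunyuanVideo Architecture**](#hunyuanvideo-architecture)
+  - [🎉 **Highlights**](#-highlights)
+    - [**Unified Image and Video Generation Architecture**](#unified-image-and-video-generation-architecture)
+    - [**MLLM Text Encoder**](#mllm-text-encoder)
+    - [**3D VAE**](#3d-vae)
+    - [**Prompt Rewrite**](#prompt-rewrite)
+  - [📈 Evaluation](#-evaluation)
+  - [📜 Requirements](#-requirements)
+  - [🛠️ Installation and Dependencies](#️-installation-and-dependencies)
+    - [Installation Guide for Linux](#installation-guide-for-linux)
+  - [🧱 Download Pretrained Models](#-download-pretrained-models)
+  - [🔑 Single-GPU Inference](#-single-gpu-inference)
+    - [Using Command Line](#using-command-line)
+    - [Run a Gradio Server](#run-a-gradio-server)
+    - [More Configurations](#more-configurations)
+  - [🚀 Parallel Inference on Multiple GPUs with xDiT](#-parallel-inference-on-multiple-gpus-with-xdit)
+    - [Using Command Line](#using-command-line-1)
+  - [🚀   FP8 Inference](#---fp8-inference)
+    - [Using Command Line](#using-command-line-2)
+  - [🔗 BibTeX](#-bibtex)
+  - [Acknowledgements](#acknowledgements)
+  - [Star History](#star-history)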
+---
+
+
+
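+## **Abstract**
+
+HunyuanVideo is a new open-source video generation foundation model whose video generation performance is comparable to, or even better than, that of leading closed-source models. To train HunyuanVideo, we adopt a comprehensive framework that integrates data curation, joint image-video model training, and an efficient infrastructure that supports large-scale model training and inference. In addition, through effective strategies for scaling the model architecture and the dataset, we successfully trained a video generation model with over 13 billion parameters, making it one of the largest open-source video generation models.
+
+We ran extensive experiments on the model design to ensure high visual quality, diverse motion, text-video alignment, and generation stability. According to professional human evaluation, HunyuanVideo outperforms previous state-of-the-art models on the overall metric, including Runway Gen-3, Luma 1.6, and 3 top-performing video generation models from the Chinese community. **By releasing the code and weights of the foundation model and its applications, we aim to bridge the gap between closed-source and open-source video foundation models, enable everyone in the community to experiment with their own ideas, and foster a more dynamic and vibrant video generation ecosystem.**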
+
+
+
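+## **HunyuanVideo Architecture**
+
+HunyuanVideo is a latent-space model: during training it uses a 3D VAE to compress features along the temporal and spatial dimensions. Text prompts are encoded by a large language model and fed to the model as conditions, guiding it to denoise Gaussian noise over multiple steps into the latent representation of a video. Finally, at inference time, the 3D VAE decoder decodes the latent representation into a video.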
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/overall.png"  height=300>
+</p>
+
+
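+## 🎉 **Highlights**
+
+### **Unified Image and Video Generation Architecture**
+
+HunyuanVideo adopts a Transformer design with full attention for video generation. Specifically, we use a hybrid "dual-stream to single-stream" model design. In the dual-stream phase, video and text tokens are processed independently by parallel Transformer blocks, so that each modality can learn its own modulation mechanism without interference from the other. In the single-stream phase, we concatenate the video and text tokens and feed them into the subsequent Transformer blocks for effective multimodal information fusion. This design captures the complex interactions between visual and semantic information and improves overall model performance.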
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/backbone.png"  height=350>
+</p>
+
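+### **MLLM Text Encoder**
+Earlier video generation models typically use pretrained CLIP and T5-XXL as text encoders, where CLIP uses a Transformer encoder and T5 uses an encoder-decoder structure. HunyuanVideo instead uses a pretrained Multimodal Large Language Model (MLLM) as its text encoder, which has the following advantages:
+* Compared with T5, an MLLM that has been instruction-tuned on image-text data has better image-text alignment in the feature space, which eases image-text alignment in the diffusion model;
+* Compared with CLIP, MLLM shows stronger ability in detailed image description and complex reasoning;
+* MLLM can perform zero-shot generation by following system instructions, helping the text features focus more on key information.
+
+However, MLLM is based on causal attention, whereas T5-XXL uses bidirectional attention, which provides better text guidance for diffusion models. We therefore introduce an additional token refiner to enhance the text features.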
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/text_encoder.png"  height=275>
+</p>
+
+### **3D VAE**
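+Our VAE uses CausalConv3D in its encoder and decoder to compress videos along the temporal and spatial dimensions: the temporal dimension is compressed 4x, the spatial dimensions 8x, and the latent has 16 channels. This significantly reduces the number of tokens for the subsequent Transformer model and lets us train video generation models at the original resolution and frame rate.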
+<p align="center">
+  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanVideo/refs/heads/main/assets/3dvae.png"  height=150>
+</p>
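
As a rough illustration of the compression ratios above, the sketch below estimates the latent shape and the reduction in element count for a 129-frame 720x1280 clip. The helper `latent_shape` is hypothetical and not part of the repository. The `(frames - 1) / 4 + 1` rule for causal temporal compression (the first frame is encoded on its own) and the 3 input channels are assumptions, not taken from this README.

```python
def latent_shape(frames, height, width, t_ratio=4, s_ratio=8, channels=16):
    """Estimate the VAE latent shape as (channels, frames, height, width).

    Assumption: a causal VAE keeps the first frame and compresses the
    remaining frames by t_ratio, giving (frames - 1) // t_ratio + 1.
    """
    if (frames - 1) % t_ratio or height % s_ratio or width % s_ratio:
        raise ValueError("size is not aligned with the compression ratios")
    return (channels, (frames - 1) // t_ratio + 1, height // s_ratio, width // s_ratio)


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


pixels = (3, 129, 720, 1280)            # RGB video, 129 frames at 720x1280
latents = latent_shape(129, 720, 1280)  # latent tensor seen by the Transformer
ratio = numel(pixels) / numel(latents)
print(latents, f"about {ratio:.0f}x fewer elements than the pixel-space video")
```

Under these assumptions, the frame counts used throughout this README (129 frames) and the listed resolutions all divide evenly by the compression ratios, which is why the helper rejects unaligned sizes instead of rounding.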
+
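+### **Prompt Rewrite**
+To cope with the diversity and inconsistency of user-provided text prompts, we fine-tune the [Hunyuan-Large model](https://github.com/Tencent/Tencent-Hunyuan-Large) as our prompt rewrite model, which rewrites user prompts into a form the model prefers.
+
+We provide two rewrite modes: Normal mode and Director mode. The prompts for both modes are available [here](hyvideo/prompt_rewrite.py). Normal mode is designed to improve the video generation model's understanding of user intent, so that the given instructions are interpreted more accurately. Director mode enriches the description of aspects such as composition, lighting, and camera movement, and tends to produce videos with higher visual quality. Note that this enrichment can sometimes cause some semantic details to be lost.
+
+The prompt rewrite model can be deployed and run directly with [Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large). We have released the weights of the prompt rewrite model [here](https://huggingface.co/Tencent/HunyuanVideo-PromptRewrite).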
+
+
+
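+## 📈 Evaluation
+
+To evaluate HunyuanVideo, we selected four closed-source video generation models as baselines. We used 1,533 prompts in total, and each prompt generated the same number of video samples in a single inference run. For a fair comparison, we ran inference only once to avoid any cherry-picking. When comparing with other methods, we kept the default settings of all selected models and ensured a consistent video resolution. Videos were assessed on three criteria: text alignment, motion quality, and visual quality. After evaluation by more than 60 professional evaluators, HunyuanVideo achieved the best overall score and stood out in motion quality in particular.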
+
+<p align="center">
+<table> 
+<thead> 
+<tr> 
+    <th rowspan="2">Model</th> <th rowspan="2">Open Source</th> <th>Duration</th> <th>Text Alignment</th> <th>Motion Quality</th> <th rowspan="2">Visual Quality</th> <th rowspan="2">Overall</th>  <th rowspan="2">Ranking</th>
+</tr> 
+</thead> 
+<tbody> 
+<tr> 
+    <td>HunyuanVideo (Ours)</td> <td> ✔ </td> <td>5s</td> <td>61.8%</td> <td>66.5%</td> <td>95.7%</td> <td>41.3%</td> <td>1</td>
+</tr> 
+<tr> 
+    <td>Chinese model A (API)</td> <td> &#10008; </td> <td>5s</td> <td>62.6%</td> <td>61.7%</td> <td>95.6%</td> <td>37.7%</td> <td>2</td>
+</tr> 
+<tr> 
+    <td>Chinese model B (Web)</td> <td> &#10008;</td> <td>5s</td> <td>60.1%</td> <td>62.9%</td> <td>97.7%</td> <td>37.5%</td> <td>3</td>
+</tr> 
+<tr> 
+    <td>GEN-3 alpha (Web)</td> <td>&#10008;</td> <td>6s</td> <td>47.7%</td> <td>54.7%</td> <td>97.5%</td> <td>27.4%</td> <td>4</td> 
+</tr> 
+<tr> 
+    <td>Luma1.6 (API)</td><td>&#10008;</td> <td>5s</td> <td>57.6%</td> <td>44.2%</td> <td>94.1%</td> <td>24.8%</td> <td>5</td>
+</tr>
+</tbody>
+</table>
+</p>
+
+
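+## 📜 Requirements
+
+The following table lists the recommended configuration for running the HunyuanVideo model for text-to-video generation (batch size = 1):
+
+|     Model      | Resolution<br/>(height/width/frame) | GPU Peak Memory  |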
+|:--------------:|:--------------------------------:|:----------------:|
+| HunyuanVideo   |         720px1280px129f          |       60G        |
+| HunyuanVideo   |          544px960px129f          |       45G        |
+
+* An NVIDIA GPU with CUDA support is required.
+  * The model is tested on a single 80G GPU.
+  * The minimum GPU memory required is 60GB for 720px1280px129f and 45GB for 544px960px129f.
+* Tested operating system: Linux
+
+
+
+## 🛠️ Installation and Dependencies
+
+Begin by cloning the git repository:
+```shell
+git clone https://github.com/Tencent-Hunyuan/HunyuanVideo
+cd HunyuanVideo
+```
+
+### Installation Guide for Linux
+
+We recommend CUDA 12.4 or 11.8.
+
+Conda installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
+
+```shell
+# 1. Create conda environment
+conda create -n HunyuanVideo python==3.10.9
+
+# 2. Activate the environment
+conda activate HunyuanVideo
+
+# 3. Install PyTorch and other dependencies using conda
+# For CUDA 11.8
+conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
+# For CUDA 12.4
+conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
+
+# 4. Install pip dependencies
+python -m pip install -r requirements.txt
+
+# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
+python -m pip install ninja
+python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+
+# 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
+python -m pip install xfuser==0.4.0
+```
+
+If you run into a floating point exception (core dump) on specific GPU models, try one of the following fixes:
+
+```shell
+# Option 1: make sure CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 are installed correctly (or simply use our CUDA 12 Docker image)
+pip install nvidia-cublas-cu12==12.4.5.8
+export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
+
+# Option 2: force the explicit use of the CUDA 11.8 build of PyTorch and all other packages
+pip uninstall -r requirements.txt  # make sure all dependencies are uninstalled
+pip uninstall -y xfuser
+pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
+pip install -r requirements.txt
+pip install ninja
+pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+pip install xfuser==0.4.0
+```
+
+We also provide a prebuilt Docker image, which can be pulled and run with the following commands.
+```shell
+# For CUDA 12.4 (updated to avoid the floating point exception)
+docker pull hunyuanvideo/hunyuanvideo:cuda_12
+docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12
+
+# For CUDA 11.8
+docker pull hunyuanvideo/hunyuanvideo:cuda_11
+docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
+```
+
+## 🧱 Download Pretrained Models
+
+Instructions for downloading the pretrained models are available [here](ckpts/README.md).
+
+
+
+## 🔑 Single-GPU Inference
+
+The following table lists the supported height/width/frame settings.
+
+|      Resolution       |           h/w=9:16           |    h/w=16:9     |     h/w=4:3     |     h/w=3:4     |     h/w=1:1     |
+|:---------------------:|:----------------------------:|:---------------:|:---------------:|:---------------:|:---------------:|
+|         540p          |        544px960px129f        |  960px544px129f | 624px832px129f  |  832px624px129f |  720px720px129f |
+| 720p (recommended)    |       720px1280px129f        | 1280px720px129f | 1104px832px129f | 832px1104px129f | 960px960px129f  |
+
+### Using Command Line
+
+```bash
+cd HunyuanVideo
+
+python3 sample_video.py \
+    --video-size 720 1280 \
+    --video-length 129 \
+    --infer-steps 50 \
+    --prompt "A cat walks on the grass, realistic style." \
+    --flow-reverse \
+    --use-cpu-offload \
+    --save-path ./results
+```
+
+### Run a Gradio Server
+```bash
+python3 gradio_server.py --flow-reverse
+
+# set SERVER_NAME and SERVER_PORT manually
+# SERVER_NAME=0.0.0.0 SERVER_PORT=8081 python3 gradio_server.py --flow-reverse
+```
+
+### More Configurations
+
+More key configuration options are listed below:
+
+|        Argument        |  Default  |                Description                |
+|:----------------------:|:---------:|:-----------------------------------------:|
+|       `--prompt`       |   None    |   The text prompt for video generation    |
+|     `--video-size`     | 720 1280  |      Height and width of the generated video      |
+|    `--video-length`    |    129    |     Number of frames in the generated video     |
+|    `--infer-steps`     |    50     |     Number of sampling steps      |
+| `--embedded-cfg-scale` |    6.0    |    Strength of the text guidance       |
+|     `--flow-shift`     |    7.0    | Shift factor for the timesteps at inference; larger values put more sampling steps in the high-noise region |
+|     `--flow-reverse`   |    False  | If reverse, learning/sampling from t=1 -> t=0 |
+|     `--neg-prompt`     |   None    | Negative prompt  |
+|        `--seed`        |     0     |   Random seed    |
+|  `--use-cpu-offload`   |   False   |    Enable CPU offload to save GPU memory    |
+|     `--save-path`      | ./results |     Save path      |
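
To make the effect of `--flow-shift` concrete, the sketch below counts how many of 50 sampling steps fall in the high-noise half of the schedule for two shift values. It assumes an SD3-style time shift, `shift * s / (1 + (shift - 1) * s)`, applied to evenly spaced noise levels; the scheduler in this repository may differ in detail, and `shifted_sigmas` is a hypothetical helper written only for this illustration.

```python
def shifted_sigmas(num_steps, shift):
    """Evenly spaced noise levels in (0, 1], warped by an SD3-style shift.

    Assumption: this mirrors the common flow-matching time shift; it is not
    copied from the repository's scheduler.
    """
    sigmas = [1 - i / num_steps for i in range(num_steps)]  # 1.0 down toward 0
    return [shift * s / (1 + (shift - 1) * s) for s in sigmas]


for shift in (1.0, 7.0):
    # Count the steps whose (shifted) noise level is above 0.5.
    high_noise = sum(s > 0.5 for s in shifted_sigmas(50, shift))
    print(f"shift={shift}: {high_noise} of 50 steps have sigma > 0.5")
```

With a shift of 1.0 the schedule is unchanged, so the steps split evenly around 0.5; a larger shift moves more of them toward the noisy end, which matches the description in the table.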
+
+
+
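+## 🚀 Parallel Inference on Multiple GPUs with xDiT
+
+[xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
+It has provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, and SD3. This repository adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model.
+
+### Using Command Line
+
+For example, to run inference on 8 GPUs, use the following command: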
+
+```bash
+cd HunyuanVideo
+
+torchrun --nproc_per_node=8 sample_video_parallel.py \
+    --video-size 1280 720 \
+    --video-length 129 \
+    --infer-steps 50 \
+    --prompt "A cat walks on the grass, realistic style." \
+    --flow-reverse \
+    --seed 42 \
+    --ulysses-degree 8 \
+    --ring-degree 1 \
+    --save-path ./results
+```
+
+Use `--ulysses-degree` and `--ring-degree` to control the parallel configuration; the supported options are listed below.
+
+<details>
+<summary>Supported parallel configurations (click to expand)</summary>
+
+|     --video-size     | --video-length | --ulysses-degree x --ring-degree | --nproc_per_node |
+|----------------------|----------------|----------------------------------|------------------|
+| 1280 720 or 720 1280 | 129            | 8x1,4x2,2x4,1x8                  | 8                |
+| 1280 720 or 720 1280 | 129            | 1x5                              | 5                |
+| 1280 720 or 720 1280 | 129            | 4x1,2x2,1x4                      | 4                |
+| 1280 720 or 720 1280 | 129            | 3x1,1x3                          | 3                |
+| 1280 720 or 720 1280 | 129            | 2x1,1x2                          | 2                |
+| 1104 832 or 832 1104 | 129            | 4x1,2x2,1x4                      | 4                |
+| 1104 832 or 832 1104 | 129            | 3x1,1x3                          | 3                |
+| 1104 832 or 832 1104 | 129            | 2x1,1x2                          | 2                |
+| 960 960              | 129            | 6x1,3x2,2x3,1x6                  | 6                |
+| 960 960              | 129            | 4x1,2x2,1x4                      | 4                |
+| 960 960              | 129            | 3x1,1x3                          | 3                |
+| 960 960              | 129            | 1x2,2x1                          | 2                |
+| 960 544 or 544 960   | 129            | 6x1,3x2,2x3,1x6                  | 6                |
+| 960 544 or 544 960   | 129            | 4x1,2x2,1x4                      | 4                |
+| 960 544 or 544 960   | 129            | 3x1,1x3                          | 3                |
+| 960 544 or 544 960   | 129            | 1x2,2x1                          | 2                |
+| 832 624 or 624 832   | 129            | 4x1,2x2,1x4                      | 4                |
+| 832 624 or 624 832   | 129            | 3x1,1x3                          | 3                |
+| 832 624 or 624 832   | 129            | 2x1,1x2                          | 2                |
+| 720 720              | 129            | 1x5                              | 5                |
+| 720 720              | 129            | 3x1,1x3                          | 3                |
+
+</details>
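
Every row in the table above follows one rule: the Ulysses degree multiplied by the ring degree equals `--nproc_per_node`. The minimal sketch below enumerates the pairs that satisfy that rule. `degree_pairs` is a hypothetical helper, not part of this repository or of xDiT, and it returns a superset of the table: some factor pairs (for example 5x1) are not listed as supported, so treat the table as authoritative.

```python
def degree_pairs(nproc):
    """Return all (ulysses_degree, ring_degree) pairs whose product is nproc."""
    return [(u, nproc // u) for u in range(nproc, 0, -1) if nproc % u == 0]


for nproc in (8, 6, 4):
    pairs = ",".join(f"{u}x{r}" for u, r in degree_pairs(nproc))
    print(f"nproc_per_node={nproc}: {pairs}")
```

A helper like this is mainly useful for sanity-checking a launch script before calling `torchrun`, since a mismatch between the two degrees and the process count is easy to introduce by hand.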
+
+<p align="center">
+<table align="center">
+<thead>
+<tr>
+    <th colspan="4">Latency (sec) for 1280x720 (129 frames, 50 steps) on 8xGPU</th>
+</tr>
+<tr>
+    <th>1</th>
+    <th>2</th>
+    <th>4</th>
+    <th>8</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+    <th>1904.08</th>
+    <th>934.09 (2.04x)</th>
+    <th>514.08 (3.70x)</th>
+    <th>337.58 (5.64x)</th>
+</tr>
+
+</tbody>
+</table>
+</p>
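
The speedup factors in the table above are the single-GPU latency divided by the multi-GPU latency. The short sketch below recomputes them from the reported timings and also derives parallel efficiency (speedup divided by GPU count), which the table does not report; the function names are illustrative only.

```python
# Latencies in seconds, copied from the table above (GPU count -> time).
latency = {1: 1904.08, 2: 934.09, 4: 514.08, 8: 337.58}


def speedup(gpus):
    """Speedup relative to the single-GPU run."""
    return latency[1] / latency[gpus]


def efficiency(gpus):
    """Fraction of ideal linear scaling achieved with this many GPUs."""
    return speedup(gpus) / gpus


for gpus in latency:
    print(f"{gpus} GPU(s): {speedup(gpus):.2f}x speedup, {efficiency(gpus):.0%} efficiency")
```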
+
+
+
+## 🚀   FP8 Inference
+
+The FP8-quantized HunyuanVideo model saves about 10GB of GPU memory. Before using it, download the [FP8 weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) and the per-layer quantization [scale parameters](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8_map.pt) from Hugging Face.
+
+### Using Command Line
+
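+Here, you must explicitly specify the path to the FP8 weights. For example, to run inference with the FP8 model, use the following command: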
+
+```bash
+cd HunyuanVideo
+
+DIT_CKPT_PATH={PATH_TO_FP8_WEIGHTS}/{WEIGHT_NAME}_fp8.pt
+
+python3 sample_video.py \
+    --dit-weight ${DIT_CKPT_PATH} \
+    --video-size 1280 720 \
+    --video-length 129 \
+    --infer-steps 50 \
+    --prompt "A cat walks on the grass, realistic style." \
+    --seed 42 \
+    --embedded-cfg-scale 6.0 \
+    --flow-shift 7.0 \
+    --flow-reverse \
+    --use-cpu-offload \
+    --use-fp8 \
+    --save-path ./results
+```
+
+
+
+## 🔗 BibTeX
+
+If you find [HunyuanVideo](https://arxiv.org/abs/2412.03603) useful for your research and applications, please cite it using this BibTeX:
+
+
+```BibTeX
+@article{kong2024hunyuanvideo,
+  title={Hunyuanvideo: A systematic framework for large video generative models},
+  author={Kong, Weijie and Tian, Qi and Zhang, Zijian and Min, Rox and Dai, Zuozhuo and Zhou, Jin and Xiong, Jiangfeng and Li, Xin and Wu, Bo and Zhang, Jianwei and others},
+  journal={arXiv preprint arXiv:2412.03603},
+  year={2024}
+}
+```
+
+
+
+## Acknowledgements
+
+The open-source release of HunyuanVideo builds on many open-source projects. We especially thank [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [FLUX](https://github.com/black-forest-labs/flux), [Llama](https://github.com/meta-llama/llama), [LLaVA](https://github.com/haotian-liu/LLaVA), [Xtuner](https://github.com/InternLM/xtuner), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) for their open research and exploration. We also thank the Tencent Hunyuan Multimodal team for their support in adapting HunyuanVideo to multiple text encoders.
+
+
+
+## Star History
+
+<a href="https://star-history.com/#Tencent-Hunyuan/HunyuanVideo&Date">
+ <picture>
+   <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date&theme=dark" />
+   <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date" />
+   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date" />
+ </picture>
+</a>
--- a/assets/3dvae.png
+++ b/assets/3dvae.png
--- a/assets/PenguinVideoBenchmark.csv
+++ b/assets/PenguinVideoBenchmark.csv
+,prompt
+0,"In the large cage, two puppies were wagging their tails at each other."
+1,"A flock of bats flies over the village, captured in medium long shot."
+2,"Above the sea, a school of silver flying fish leaped out of the water."
+3,"In the early morning park, a bee is collecting pollen on a flower, in anime style."
+4,Two dolphins are swimming in the blue sea.
+5,"Several ducks are lying in the mud pit, occasionally preening their feathers leisurely, and sometimes probing for food in the muddy water with their beaks."
+6,"Under the azure sky, a polar bear stands in the snow, turning its head to look at its cub behind him."
+7,A butterfly is fluttering.
+8,A woodpecker is pecking holes in the tree.The video is presented in a realistic style.
+9,"Several swallows are carrying mud to build nests under the eaves, low angle shot."
+10,"A white dove is flapping its wings, flying freely in the sky, in anime style."
+11,A flying wild goose captured from a low-angle shot.
+12,Two crows have landed on a branch.
+13,The panda in the zoo is eating bamboo. The video is in black and white style.
+14,"Tracking shot of a wolf slowly moving through a tranquil snowy landscape, leaving footprints with each step. Suddenly, the wolf accelerates into a run and leaps. At the moment of the jump, the camera cuts to reveal the wolf landing in the midst of a lush, verdant tropical rainforest."
+15,"A small shrimp is holding food with its claws, nibbling at it bit by bit."
+16,"In the aquarium tank, a group of tropical fish swim among the colorful coral reefs. The video captured in stunning 4K clarity."
+17,"A goldfish swimming at the bottom of the water, with several aquatic plants nearby. The video is in HD quality."
+18,"In the lush green fields, herds of cows and sheep graze leisurely."
+19,"a Antelope is drinking water by the creek, the camera moves vertically from top to bottom."
+20,A pelican is drinking water by the lake.
+21,Jellyfish dancing in the sea.The video is presented in a realistic style.
+22,The seaweed sways in the water.
+23,A golden koi swims gracefully in the clear pond.
+24,"Under the warm sunshine, a little dog is eating slowly."
+25,"A small quail forages in the field, looking left and right cautiously."
+26,"A hyena is looking for food, walking while lowering its head to sniff the scent on the ground."
+27,"In a field of green grass, a small grey rabbit was busily munching on fresh blades of grass, occasionally lifting its head to cautiously scan its surroundings."
+28,Three little fishes are swimming around in the fish tank
+29,"A pod of dolphins playfully leaped out of the water in the wake of the ship, as if they were enjoying a game with the vessel."
+30,"A wild duck glides leisurely across the crystal-clear lake, leaving ripples in its wake, captured from a high angle shot."
+31,"Under the rocks in the water, there are many small shells hidden, with numerous clownfish swimming nearby."
+32,"A squirrel is busily jumping on a tree trunk, looking for food to prepare for winter."
+33,"A whale leaps out of the sea, in anime style, captured from a bird's-eye view."
+34,"A grey chubby cat walked on the lush green grass, then suddenly stopped and looked up at the sky."
+35,"A clear lake bottom, where a school of fish leisurely swims."
+36,"By the pond, red-crowned cranes are foraging.Medium shot."
+37,A hummingbird flaps its wings and hovers in front of a flower.
+38,"A magpie takes off with a twig in its beak, in anime style."
+39,"A snake slithers smoothly along the ground, weaving agilely through the grass and fallen leaves, tracking shot."
+40,"In the blue sky, wild geese are flying"
+41,"In the ocean, a large sea turtle covered with green algae on its shell swims in the sea. The video is captured in 4K clarity."
+42,"In the garden, a little bee was fluttering and then landed on a flower."
+43,"Under the setting sun, a little dog is digging a hole."
+44,"In the darkness of night, a white cat treads along the rooftop, captured in a wide shot."
+45,"On a vast expanse of snow, an Arctic fox is running lightly. Its ears are pointed and alertly turning, capturing any sound around."
+46,"An albatross perches on a floating log on the sea surface, with its wingtips and tail being gray-brown. The video is taken with an arc shot."
+47,"A dragonfly hovers, its transparent wings vibrating, captured in high angle shot."
+48,Two clownfish are feeding next to the underwater seaweed.
+49,"A deer is running, in a realistic style."
+50,"On the shimmering surface of the water, a large fish poked its head out."
+51,Two mandarin ducks are playing in the water.
+52,"On a sunny winter day, two little kittens play with a ball of yarn by the windowsill, long shot."
+53,"A gray little rabbit is eating grass, with its mouth chewing. Medium close-up shot."
+54,"Under the morning sunlight, two gray kittens are climbing trees."
+55,"A monkey is sitting in front of a computer screen, laughing heartily."
+56,Several goldfish are swimming slowly in the fish tank.
+57,A giant panda is walking.
+58,"A turtle cautiously forages among the dense aquatic plants in the pond. Sensing danger nearby, it quickly retracts its head into its hard shell."
+59,Three green parrots in the cage are drinking water.Blue tint for the picture
+60,"Schools of clownfish, with their vibrant tails swaying, busily weave through the seaweed in search of food."
+61,High-speed shot of a Fly's Wingbeats
+62,"A wizard is waving a magic wand, chanting spells, controlling an apple to fly into the air. The color of the apple changes from green to bright red."
+63,"On the barren surface of a planet, ancient and huge mechanical statues stand tall, surrounded by scattered wreckage of damaged spaceships. White fog floats in the sky, and peculiar crystals emit light. The entire video presents a sci-fi art style."
+64,"Under the starry sky, trees sway gently, and fireflies dot the night sky. A few people are sitting around a bonfire, chatting and laughing. The entire video presents a joyful atmosphere."
+65,"At the porch, a loyal dog lies on the welcome mat wagging its tail. Above it is a row of hooks with the owner's coat and hat hanging on them."
+66,"Silver skyscrapers in the city, with streams of people passing by below. The camera moves vertically from top to bottom during filming."
+67,"A computer is placed on the table, with fingers rapidly tapping on the keyboard. At the same time, the printer next to it makes a buzzing sound, and paper smoothly slides out from the paper outlet. There is also a cup of steaming coffee on the table."
+68,"In the jungle, a lush tree stands quietly. Under the tree, a cute rabbit is leisurely eating grass. Behind the rabbit, a majestic tiger quietly approaches."
+69,"In New York's Times Square, colorful neon lights flash incessantly, like stars dotting the sky, illuminating the entire block. The crowd bustles about, hurrying through the area. The camera switches to a quiet corner of a library in the city. Here, a reader is engrossed in reading a book in hand. His eyes are fixed on the pages, and his fingers gently turn each page."
+70,"There is a mirror on the pink dressing table, and below the mirror, an ant is slowly crawling on the desktop."
+71,"By a fog-enshrouded lakeside, an old boat gently sways, reeds by the lake sway with the wind, and unidentified objects float on the lake surface. The entire video presents a suspenseful atmosphere."
+72,"In the yard, there is an apple tree laden with apples. Grandpa sits in the rocking chair under the tree, fanning himself with a bamboo fan. A child is running and playing around the apple tree."
+73,"Inside the music room, a woman is practicing playing the piano. A cello is placed to the right of the piano, accompanied by a music stand, a violin, and a chair, captured in a wide-angle shot."
+74,"A man in a gray suit stands on the balcony drinking. At his feet lies a black puppy, depicted in 3D cartoon style."
+75,"A motorcycle lies overturned in a weedy ditch, where a crocodile is swimming back and forth. The entire video presents a suspenseful atmosphere."
+76,"On the ancient battlefield, warriors bravely charged forward, wielding long spears and shields. Horses neighed, carrying knights and galloping across the battlefield."
+77,"A stapler is placed to the left of a small round mirror. To the left of the stapler, there are also a pair of sunglasses and a key. The camera moves horizontally from right to left."
+78,"The veterinarian is holding a stethoscope and conducting a thorough physical examination of the kitten in front of him. The kitten appears very scared, constantly dodging the doctor's touch. The pet owner beside them watches this scene anxiously, scratching their head in worry."
+79,"A vast expanse of grassland teems with herds of cattle and sheep, as herders gallop across the plains on horseback."
+80,"By the azure sea, a little girl is picking up seashells, followed by a husky, rendered in pixel art style."
+81,"A tourist dressed in a lightweight adventure suit, with a bulging backpack on his back, is slowly walking towards the mountain top. By the roadside, a little monkey curiously stares at him, jumping around on the rocks, trying to get close to the tourist."
+82,"In the evening in the desert, guided by the shepherd gently, a flock of sheep is walking towards the distant water source."
+83,"A carpet covers the living room floor, and a puppy is lying on it playing with a kitten. To the right of the carpet, there is a toy ball."
+84,"As the sun sets, the reeds sway gently in the breeze, casting their reflections on the water. Beneath the reeds, a frog intently watches a dragonfly that has just landed on one of the leaves."
+85,"A computer is placed on the computer desk, playing a movie. Under the computer desk, there is a black dog lying down, gnawing on a bone.The video is in black and white style."
+86,"After jumping on the sofa, the kitten lies down quietly on the sofa. The desk lamp beside it emits yellow light."
+87,"In the afternoon, a man was focused on typing at his desk. A curious dog jumped onto the table, trying to touch the keyboard. The man gently stopped him and carried him back to the ground."
+88,"In the sky above the forest, several crows are soaring. The setting sun casts its glow across the horizon, making the distant mountains clearly visible. The camera tilts upwards."
+89,"In the underwater world, colorful schools of fish swim freely in the water, their tails swaying happily, sometimes gathering in groups, and sometimes scattering. The camera switches to the bustling beach, where a group of people are sweating profusely and striving hard on the beach volleyball court. They jump and spike, every movement filled with strength and passion."
+90,"In the deep sea, there is a large rock covered with a thin layer of algae. To the right of the rock is a large octopus, its eight tentacles sometimes stretching flexibly, sometimes wrapping around the rock."
+91,"Across the vast expanse of snow, numerous deep crevasses are scattered. A polar bear leisurely strolls across the icy terrain, its white fur blending seamlessly with the surrounding frozen landscape."
+92,"Beneath the azure sky, mountains encircle a lake, whose surface is as calm as a mirror, reflecting the surroundings. Trees lushly line the lakeside, where several squirrels frolic and leap among the branches."
+93,Two great white sharks are preying on a school of coral fish. The school of coral fish is scattering in all directions.
+94,"A single light boat slowly progresses through the river water, with continuous undulating mountains behind it. These mountains are layered and lush, resembling a vivid landscape scroll. Sunlight filters through the thin mist onto the water surface, creating a sparkling effect. The scene is depicted in the style of Zhang Daqian."
+95,"In the vast desert, a person leads a camel walking slowly. The camel carries a bundle on its back. Aerial shot."
+96,"In the garden the fountain gushes merrily. To the left of the fountain is placed a bench, and to the left of the bench is a flowerbed blooming with various colors of flowers. The camera pans to the left."
+97,"It's snowing heavily in the sky, and a thick layer of snow quickly covers the stone table in the yard. The camera switches to inside the house, where adults are wrapping dumplings, each with a happy smile. The entire video presents a joyful atmosphere."
+98,"Under the dim streetlights, fallen leaves flutter. A man walks alone on the street. He is dressed as a clown and holds a balloon in his hand. The entire video presents a melancholic atmosphere."
+99,"On a sunny afternoon, the clouds in the sky were as soft as cotton candy, drifting slowly. An elderly man with white hair rode a motorcycle through the country road. He wore a helmet and had a contented smile on his face, enjoying the caress of the wind and the warmth of the sunshine. The entire video is in a sci-fi art style with 4K ultra-high-definition quality."
+100,"At the party, colorful lights flickered, tables were laden with cakes and desserts, and people held drinks in their hands, dancing to the rhythm."
+101,"A slow-motion shot of coffee splashing, with the coffee forming an arc as it rises, and the word ""Energy"" appearing in bold letters, entirely made up of coffee."
+102,"In the fish tank, there are three small goldfish swimming, and a piece of seaweed is gently swaying."
+103,"The little boy carefully placed the pen on the document on the office desk. To the right of the document was a red telephone, and next to the telephone, the curtain swayed with the wind."
+104,"On an open grassland, a man is intently controlling a drone in his hands. The man is wearing a simple T-shirt and jeans, with a baseball cap on his head. He is skillfully pressing the buttons on the remote control. Under his manipulation, the drone is circling in the air at times and rising at other times."
+105,"On the grass, a picnic blanket is spread with bananas and apples. A rabbit hops onto the blanket and starts nibbling on an apple. The camera zooms out, revealing a 2K high-definition scene."
+106,"In the deep and mysterious sea, an ancient shipwreck lies quietly there, its hull eroded by seawater and mottled. Inside the cabin, several crabs shuttle back and forth."
+107,"On the autumn lake, the fiery red maple trees are reflected, their leaves gently swaying in the wind. Occasionally, a few maple leaves fall, floating softly on the water's surface. A small boat is drifting on the lake. The video is in anime style."
+108,"In the park, a lame golden retriever is leisurely strolling. Its right front leg is wrapped with bandages, and each step seems particularly careful, but it still maintains an optimistic attitude, wagging its tail. Behind it, a little boy is sitting in a wheelchair, skillfully turning the wheels with his hands."
+109,"On the beach, the waves gently lap against the shore. Some people are sunbathing, surfers are sliding on the waves, and children are building sandcastles. The entire video presents a joyful atmosphere."
+110,"In a lab filled with a sense of technology, an advanced robot is busily working. Its body is flexible and its movements are precise, showing amazing technical strength. At the same time, a man stands by, watching the robot's operation with focused attention. He is dressed in a white coat and holds a notebook in his hand, occasionally recording the robot's operating data."
+111,"Shot using an over-the-shoulder angle, a boy explores a dark cave, guided by the light from his flashlight. As the boy progresses deeper, the light at the end of the cave intensifies. The camera then cuts to reveal a magnificent waterfall and a verdant valley."
+112,"At dawn by the seaside, gentle waves lap at the shore while seagulls circle above. Fishermen bustle about the pier, prepping their boats for a day at sea. In the distance, a lighthouse stands out prominently against the morning light, casting a golden sheen across the water, in the 80s vaporwave style."
+113,"In the cold Arctic, the night sky is filled with dazzling auroras that dance like a dreamy scroll, their colorful lights twinkling in the darkness. The camera switches to a cozy cabin, where a man is sitting by the fireplace, smoking a pipe."
+114,"At the hotel reception, there is a computer playing a movie. To the left of the computer is a printer, and to the right is a candy box filled with various colored candies. The camera pans from left to right."
+115,"Lotus flowers bloom in the pond. By the pond, a calligrapher is wielding his brush, with ink, paper, and inkstone neatly arranged."
+116,"A dancer with orange hair, sitting on a blue mushroom with a pink wand in her hand and green fireflies flying around her, 3D cartoon."
+117,"A woman is lying on the sofa watching TV, enjoying the twists and turns of the plot while leisurely eating potato chips. A monkey sneaks in quietly and tiptoes past behind the sofa."
+118,"The butterfly lands on the railing, and the camera pans to the right to capture the tulip field next to it. The entire video is presented in anime style."
+119,"On the Shonan coast of Kanagawa Prefecture, the tide rises and falls, and a few seabirds stay on the shore."
+120,"A shark preys on an octopus at the bottom of the sea, with the octopus fleeing. Slow motion shot."
+121,A green garbage truck is parked by the roadside. Several sanitation workers are cleaning the road behind the garbage truck.
+122,"A female art teacher is drawing in the classroom, with a box of colored pencils on the desk, in an animation style."
+123,"In the winter ski resort, as large snowflakes descend from the sky, skis are neatly arrayed outside the equipment rental shop. Two tourists are meticulously selecting their skis."
+124,"In the forest, the trees are lush, the creek is gurgling, and the deer, rabbits, zebras, and foxes are running and playing freely."
+125,"There are five people in the conference room having a meeting, and one of them is speaking in front of the conference table."
+126,Two girls are sitting on a park bench crying. The entire scene creates a sad atmosphere.
+127,A worried man was pacing back and forth in front of the hospital room door.
+128,Two chubby little boys are playing with a ball.
+129,A little girl with braided pigtails is enjoying an ice cream.
+130,"A tracking shot follows an African skateboarder weaving through the city park, capturing his smooth movements and the crowd he passes by."
+131,Two elderly men in their 70s are playing chess in the park. The video is presented in black and white style.
+132,"A thin and small man hides under the bed, closing his eyes, in the style of Alfred Hitchcock."
+133,"A little girl, adorned with a red hairpin in her hair, smiles broadly, her happiness evident."
+134,A middle-aged couple is riding a tandem bicycle in the park. The video is in standard definition.
+135,"A man in a black suit looks into the distance and cries silently, his eyes filled with confusion. The entire scene presents a sorrowful atmosphere."
+136,A group of twenty-something youths are participating in a running race. The entire video creates a joyful atmosphere.
+137,"A group of 6-7 year old children holding tickets, walking towards the Ferris wheel. In anime style."
+138,"A little boy, with a white backpack, hopped along the road."
+139,"The landlord with dark skin, sporting a bouffant hairstyle, is hanging laundry on the balcony. A low angle shot."
+140,"In the pet hospital, a male doctor is treating the wound on the cat's paw, graphic novel style."
+141,"In the nighttime park, two young girls sit on a bench, chatting. The scene then shifts to a bustling night market, where a BBQ stall owner is busily flipping skewers, in 4K HD."
+142,"A man in a suit slowly ties his tie, presented in the style of the movie ""The Godfather""."
+143,"A man stands on a bustling street making a phone call, dressed in a gray suit, holding a briefcase, pacing as he speaks on the phone. To his left stands a luxury car. Captured in a full shot."
+144,"There are three children on the beach building sandcastles with sand, all wearing blue school uniforms. The video is shot from a low-angle perspective."
+145,A white man is running on the beach.
+146,"A woman dressed in a red gown is playing the piano, enveloped by candlelight and flowers. The video exudes a romantic ambiance."
+147,"A man wearing headphones is listening to music, shaking his head. Medium close-up shot, neutral tones for the picture."
+148,An African person places a basket of bananas on her head.
+149,An elderly grandmother with white hair and a slightly hunched back walks slowly on the street.
+150,The camera captures five women sitting around a dining table sharing dinner. The warm candlelight illuminates their smiling faces.
+151,"In the kitchen, a young man dressed in a white chef's coat and hat is whisking eggs in a glass bowl, painted in a watercolor paint style."
+152,A lady with long curly hair is sadly wiping the tears from the corners of her eyes with a handkerchief in her hand.
+153,"A hunter is aiming at a nearby elk, in a realistic style."
+154,"The keeper is feeding two sea lions on the shore, busily walking back and forth."
+155,"On the lake, two people leisurely row a small boat. The entire scene exudes a romantic atmosphere."
+156,The little boy rode his bicycle down the road.
+157,"A gray-haired old man washes clothes by the river, pixel art style."
+158,"The short-haired little girl sat on the bench, slowly lifting her head to watch the sunset."
+159,"Under the afternoon sun, an elderly man wearing a straw hat strolled leisurely along the edge of the golden wheat field, in animation style."
+160,"Tears streamed down the girl's cheeks, leaving trails. The video presents a sorrowful atmosphere."
+161,"A man is walking towards an ancient building, seen from a 45-degree high angle shot, blue tint for the picture."
+162,"A worker on a factory assembly line is skillfully labeling goods as they pass by. The camera switches to the factory office, where the boss stands with one hand on his hip and the other pointing in front of another worker, seemingly scolding them. The entire video presents a serious atmosphere."
+163,Create a pixel art video of a man running on a park trail in a blue tracksuit and black running shoes.
+164,"In the tea plantation, farmers are busy picking tea leaves. The camera moves horizontally from right to left."
+165,A female student in a gray coat slowly stands up in the rain. The entire video presents a melancholic atmosphere.
+166,"An 8-year-old little girl is brushing her teeth, suddenly getting toothpaste on the mirror."
+167,"A man exposes his arm, showing off the muscle lines on his arm, close-up."
+168,"An explorer walks alone on a mountain path, captured in a long shot."
+169,A man crying with his face in his hands by the roadside.
+170,"Inside the library, a short-haired girl is searching for books on the shelf in front of her."
+171,"Under the dense big tree, two grandfathers over 60 years old are playing chess, with the camera tilting downward."
+172,"In a warm cabin, the warm light of winter fills the entire room.
+A kind old woman is knitting a blanket by the fireplace while a small child curls up beside her with a book. The camera captures the intimate, peaceful atmosphere, focusing on the gentle expressions of both characters. The animation includes subtle, soft movements. The flames flicker and the blanket rustles slightly, evoking a feeling of comfort and warmth."
+173,"In the study, a man is typing on his keyboard, deeply focused on his work. The scene shifts to the bedroom, where under the warm yellow light, his wife is sitting by the bed, reading a storybook and gently patting their child's chest, lulling the child to sleep. The atmosphere is cozy and heartwarming."
+174,"In the park, a little girl in a pink dress is on the swing, in a full shot."
+175,A sanitation worker trims the flowers along the edge of the flower bed with scissors.
+176,"A woman in a dress lost her balance and fell on the steps, with the camera tilting down."
+177,"On the park grassland, several Asians are chatting together in the campground."
+178,"A Frenchman in a black suit, with spectacles and a hat on his head, holding a cane in his right hand and picking up a pocket watch in his left to check the time. Medium long shot."
+179,"Amidst the vast expanse of the grassland, a man in Martin boots strides forward. In one hand, he holds a whip, while the other leads a magnificent white horse. The scene is shown in a full shot."
+180,"Under the banyan tree, there sits a couple in their 80s, chatting."
+181,"The camera follows the back of a woman with long hair as she runs, capturing the strands as they whip and flow dynamically in the wind."
+182,"In front of the mountaintop temple, a monk in his robes bows in prayer, with the camera shooting from a low angle upwards."
+183,"In the corner of the library, a man is flipping through a newspaper, while a woman next to him is working on a laptop. The entire video is presented in a sticker style."
+184,"Workers are building a house on a construction site, and time-lapse footage records the process of the house being built from the beginning to its completion."
+185,"In the bustling square at night, a woman around 60 years old is happily dancing the square dance. The entire scene creates a joyful atmosphere."
+186,"In the library, several students are whispering to each other."
+187,A girl is making crafts with leaves in a craft store. The video is presented in 3D animation style.
+188,"Two people on the bridge are looking down at the fish in the water. One of them is pointing into the water, describing the fish to the other person."
+189,The little girl pouted angrily.
+190,Two hikers make their way through the muddy terrain of a tropical rainforest.
+191,"In the early morning bamboo forest, three people leisurely stroll along a path strewn with fallen leaves, captured in a full shot."
+192,"The soccer player sits alone on the ground, slowly wrapping his arms around his knees, with a desolate expression. The entire video presents a sorrowful atmosphere."
+193,A slim Asian woman is taking off from the springboard.
+194,"A young woman with an oval face is making dumplings, in a medium close-up shot."
+195,"On the rooftop, there's an open-air basketball court where six boys are playing basketball."
+196,"In an opulently decorated fine dining restaurant, Audrey Hepburn is seated at a table, elegantly dressed in a simple yet refined white gown. Exquisite tableware, each piece exuding a gentle sheen, is laid out in front of her. Hepburn moves with grace and poise as she savors the delicacies on her plate. To her left, a formally dressed waiter is slightly bent over, politely inquiring if she requires any additional service."
+197,"An Indian person sits at the dining table, raising his left hand to reach for the plate."
+198,The waitress is cleaning the table. The video is in an anime style.
+199,"It started to rain heavily in the desert, and three travelers took shelter under rocks to avoid the rain."
+200,"A few children, around six or seven years old, were playing and laughing under the tree."
+201,The brown-haired woman's face turned red with anger as she clenched her fists.
+202,A centenarian looks at the silent night sky and murmurs to himself. The entire video presents a sad atmosphere.
+203,"A 2-year-old little girl stands in front of a pink mushroom-shaped house, holding a lollipop in her left hand, eating it with relish. Her right hand is holding a small dog, in 3D cartoon style."
+204,"The strong man clenched his fists tightly, his knuckles turning white from the effort, in a close-up shot."
+205,"In the corner of the café, a couple in their 70s are chatting while sipping hot coffee, presented in a realistic style."
+206,A man with yellow skin is putting on glasses.
+207,"A side-tracking shot follows an elderly woman with a kind face, wearing a worn, cozy shawl and a straw hat, as she gently walks through a bustling, vibrant market. She carries a basket filled with freshly baked bread, the warm steam wafting up into the crisp morning air. Stalls filled with colorful fruits, vegetables, and handmade trinkets line the path, and cheerful market-goers, including children and shopkeepers, greet her with smiles. The camera captures the detailed textures of cobblestone streets and fluttering market banners, evoking a sense of nostalgia, warmth, and a lively, enchanting atmosphere."
+208,A girl in a red dress is swinging in the park. Long shot.
+209,"In a large classroom, a teacher is imparting knowledge to the students, high angle shot."
+210,"At the entrance of the village at night, five farmers are warming themselves by the fire."
+211,An elderly man in his 60s is singing and playing the guitar in the square. The entire video presents a melancholic style.
+212,A little girl is holding a balloon and running slowly forward.
+213,"A couple snuggle up next to each other sitting by the fireplace, the warm glow of the flames reflecting on their faces. The entire scene exudes a romantic atmosphere."
+214,"An elderly person with graying hair stands by the sea, slowly raising his head to gaze at the boundless ocean. The entire video presents a melancholic atmosphere."
+215,"In the garden early in the morning, two elderly people are practicing Tai Chi. Eye level shot."
+216,"A girl stands beside a car with a broken door, crying. The entire video presents a sad atmosphere."
+217,Four pedestrians are hurrying along under the streetlights at night.
+218,"On the street side, three people are chatting. Shot in medium shot."
+219,A black man won a gold medal and danced with joy. The video presents a cheerful atmosphere.
+220,"In the rain, a couple stands facing each other under a streetlight. The man reaches out to hold the woman's hand, but she pulls it away. The entire scene creates a sad atmosphere."
+221,A young writer side-on to the camera picks up a fountain pen and signs his name on the cover of a book handed by a fan.
+222,"In the aquarium, two girls jump up happily, long shot."
+223,"In the train station waiting room, there are passengers waiting for their trains and ticket inspectors checking tickets, with people coming and going. The camera switches to a carriage where a girl is quietly reading a book with headphones on."
+224,"In front of the aquarium exhibition tank, a group of children around 10 years old are watching the fish in the tank."
+225,"A woman in a white dress is walking up the stairs, with the camera shooting from a low angle. The image is in 4K resolution."
+226,A Latino man with brown skin is throwing punches in the boxing ring.
+227,A little boy is doing his homework.
+228,"On the ancient city tower, a woman in red slowly unfolds her oil-paper umbrella, then looks towards the distance. The entire scene presents a calm atmosphere."
+229,"In front of the milk tea shop, there are three boys wearing sportswear, one of whom is spinning a basketball on his finger. The video is in black and white."
+230,Two kids are clapping their hands.
+231,A Russian white-collar worker is drinking coffee leisurely.
+232,A couple is watching the aurora. They occasionally look up to admire the aurora and frequently exchange glances. The video presents a romantic atmosphere.
+233,"At dusk, an elderly couple holding hands stroll along the beach, with the waves gently lapping at the shore and the afterglow of the setting sun shining on them. The entire scene creates a tranquil atmosphere."
+234,"In the park, two elderly gentlemen are engaged in a game of chess, with a crowd of onlookers animatedly discussing the moves. The video is in 3D cartoon style."
+235,"In the laboratory, Thomas Edison is using a pen to record his experimental data."
+236,"Beads of sweat slowly rolled down the boy's forehead, close-up shot."
+237,"On the street, pedestrians hurried by, and a boy drenched by rain hid under the eaves, wiping away tears. The entire scene created a melancholic atmosphere."
+238,A young mother carefully lifts the baby bottle to feed the infant.
+239,"Under the ancient tree of the temple, three worshippers are devoutly kneeling to pray for blessings."
+240,"A curly-haired black woman is clearing the bar in a high jump competition, high saturation for the picture."
+241,"An 8-year-old girl playing in a water park, in 3D cartoon style."
+242,"In the classroom, the male teacher, holding a piece of chalk, writes math problems on the blackboard while the students attentively watch from their desks."
+243,"A Chinese man wearing glasses sits on a bench reading a newspaper. He occasionally flips through the pages, pausing at times to read intently."
+244,"On a frigid winter night, a homeless person is huddled in a street corner, trembling from the cold. His clothing is worn and insufficient. The camera is positioned at the knee level of a passerby, shooting horizontally."
+245,"An Indian male hands a drink to a girl sitting next to him, green tint overlay for the video."
+246,"In the classroom, the teacher points at a student, and the student stands up."
+247,The bald man angrily smashed a glass.
+248,"Outside the office building, there is a ground parking lot with several black cars parked. A bus slowly drives into the parking lot."
+249,"Tall chimneys billowing thick smoke, in a low angle shot."
+250,"In the movie theater, the big screen is playing a suspenseful film. The camera shoots from behind the audience, slowly moving forward along the aisle."
+251,"In the ancient courtyard, there is a huge rockery stone. The camera tilts downward."
+252,"In the shopping mall, with people coming and going, there are various kinds of shops, captured with a wide shot."
+253,"In the park, the sun is shining brightly, the green trees provide a cool shade, flowers are blooming, and children are chasing colorful balloons. The entire video presents a joyful atmosphere."
+254,"A girl is searching for books in an ancient library. After she gently pushes aside a row of bookshelves, the scene switches to a well-lit laboratory filled with modern technological ambiance, where instruments are flashing. The video is in ultra-high-definition quality."
+255,"The white marble facade of Milan Cathedral shines brilliantly in the sunlight, with its slender spires and flying buttresses stretching towards the sky. Visitors stop here, gazing up at the exquisite carvings and sculptures."
+256,Three paintings hang on the blue wall. The camera moves horizontally from left to right during shooting.
+257,The worn waterwheel turns slowly by the riverbank.
+258,"On the green grass, the white-walled Leaning Tower of Pisa stands tall. The camera moves vertically from top to bottom during filming."
+259,The morning market in the small town is bustling with crowds and shrouded in mist.
+260,"In the factory, there are two forklifts, one large and one small, and many wooden boxes. The camera pans to the right."
+261,"Shot from a first-person perspective, the camera passes through the terracotta warrior pits, where rows of pottery figurines stand quietly in formation."
+262,"The industrial park under construction is bustling with construction vehicles operating intensively, and workers are installing equipment."
+263,"Filmed from a first-person perspective, the camera passes through the graffiti alley in Melbourne, Australia, where the graffiti walls are covered with artwork from many artists."
+264,"Aerial shot of a busy medieval market, then cuts to a modern cityscape of towering skyscrapers and neon lights."
+265,"In a dilapidated house, dust and cobwebs are scattered across the floor, and the door is suddenly blown open by the wind. The entire video presents a suspenseful atmosphere."
+266,"In the early morning park, the sky is tinged with light blue and purple, with the first light of dawn appearing. On the park paths, people who have risen early are already walking, and the flowers in the flower beds bloom with dew."
+267,"The ancient Roman Colosseum, with the camera panning from left to right."
+268,"In the quiet library, books of various categories are neatly arranged on the shelves, and several readers are sitting or standing, attentively flipping through the pages in their hands, immersed in the ocean of knowledge."
+269,A magnificent bridge with a constant stream of traffic spans over a river flowing ceaselessly below. The camera moves horizontally from left to right.
+270,"On the ancient city walls, ancient cannons are placed, and visitors pick up cameras to take photos while walking. The entire video is in black and white style."
+271,"The chimney of the power plant is slowly emitting black smoke, which has dyed the sky gray. Shot with a wide-angle lens."
+272,"A golden wheat field, with wheat swaying gently in the breeze, next to which there is a dilapidated wooden cabin."
+273,"Inside the lecture hall, a bottle of mineral water is placed on each seat for the convenience of the audience to use at any time. Various documents and materials are neatly stacked on the podium, waiting for the lecturer to consult and display at any time. The camera pans from left to right."
+274,"The dome of Hagia Sophia stands solemnly under the blue sky, with its four minarets positioned at the corners of the building. The camera shoots from a top-down view and continuously zooms out."
+275,"The Golden Gate Bridge glows with a warm halo in the sunset's afterglow, standing majestically against the sea breeze with ships slowly passing beneath. The camera moves vertically from top to bottom during filming."
+276,"The Vatican's St. Peter's Basilica under the blue sky, with birds flying over the roof, and the camera moving vertically upward."
+277,"Steam rises from the hot dogs sizzling on the grill at a New York street vendor's cart, with condiment bottles neatly lined up."
+278,"Under the cherry blossom trees in Kyoto, there are two stone lions and two wooden stools. A gust of wind blew, and the cherry blossom petals fell one after another."
+279,"The Sagrada Família stands in the heart of Barcelona, resembling a mysterious megalithic structure. As evening sunlight spills over its intricate facade, the eighteen spires seem like fingers pointing to the heavens, shimmering with golden light. The camera tilts upwards."
+280,"Beside the makeup bag lie a white cotton pad, an eyebrow pencil, and a pink lipstick, as the camera moves horizontally to the right."
+281,"The plate is placed on the dining table, with a spoon next to it. The camera pans to the right, and a chair is positioned on the right side of the dining table."
+282,"An old, rundown factory with piles of industrial waste at the entrance. The camera zooms in."
+283,The towering Leshan Giant Buddha statue. A full shot with the camera tilting upwards.
+284,"As night falls, on Michigan Avenue in Chicago, the towering buildings are decorated with colorful neon lights on their exterior walls, forming a brilliant ocean of light that contrasts sharply with the dark night sky."
+285,"In the abandoned factory, a cold wind blows through, lifting discarded papers on the ground. The entire video creates a suspenseful atmosphere."
+286,The ancient Acropolis of Athens looks even more majestic under the glow of the setting sun. The camera pans from left to right.
+287,"On a country path lined with small trees, two elderly people are strolling down the middle. The video is in standard definition."
+288,"In the center of the room, there is a bed, above which hangs a ceiling fan that is rotating."
+289,"In the rural wheat fields, farmers are harvesting wheat. Aerial shot."
+290,"Skyscrapers tower along Michigan Avenue in Chicago, with pedestrians weaving through the streets."
+291,"As night falls, the lights on the Eiffel Tower begin to twinkle."
+292,"In the dawn, wisps of smoke drifted from the roof of the farmhouse. The video is in black and white style."
+293,"Under the old locust tree in the countryside, there is a stone table and three wooden chairs. The video is taken with a blurred background to highlight the table and chairs. The camera pans from left to right."
+294,"A full shot captures the Arc de Triomphe in France, showcasing its majestic and spectacular architectural style. Then, the camera slowly pushes in, focusing on the bas-reliefs on the Arc de Triomphe."
+295,"The yellow cabinet is full of hanging clothes and trousers. The camera pans from left to right."
+296,"An empty classroom, with the breeze gently moving the curtains. The entire video exudes a tranquil atmosphere."
+297,"In the abandoned hospital corridor, the wall paint has peeled off, and discarded syringes and medicine bottles are visible everywhere on the ground. The camera slowly pushes in, creating a suspenseful atmosphere throughout the video."
+298,"In the spacious greenhouse of the farm, the left side showcases a vibrant vegetable garden with lush green leafy vegetables, while the right side features a fruitful orchard with various fruits hanging heavily on the branches. The camera slowly moves from left to right."
+299,"The door of a sealed room slowly opens, revealing a bed, chairs, a portrait on the wall, and an old carpet with some stains. The entire video presents a suspenseful atmosphere."
+300,"On the bed in the bedroom, a puppy is playing with a slipper in its mouth."
+301,"Deep in the dense forest, a lighthouse stands alone, its light at the top flickering on and off. The entire video presents a suspenseful atmosphere."
+302,There is a green mailbox on the street. To the right of the mailbox is a flower shop. The camera moves horizontally from left to right.
+303,"In the center of the square is an exquisite fountain, with water jets dancing to the rhythm of the music, sparkling with countless dazzling points of light under the sunlight. The video is in 4K resolution."
+304,"Night view of Canton Tower. Full shot, camera tilts upwards."
+305,Raindrops pattered against the tightly closed windows.
+306,"On the right side of the art gallery is a series of modern paintings, while on the left are classical sculptures, filling the entire space with an artistic aura. The camera pans from right to left."
+307,"In the deserts of Egypt, the ancient pyramids stand tall and majestic, with the afterglow of the setting sun casting a golden hue on the stone bricks, giving them a layer of gold. The camera moves vertically from bottom to top during filming."
+308,"The majestic palaces of the Forbidden City stand tall and imposing, with the glazed tiles shimmering in the sunlight. The camera tilts downwards, revealing the vibrant and profound red walls of the Forbidden City."
+309,"In the bright meeting room, various tables, chairs, and meeting equipment are neatly arranged. The camera pans from left to right, presenting a solemn atmosphere throughout the entire scene."
+310,"Inside the cabin, passengers lean back in their seats, resting with closed eyes. As the camera pushes closer to the windows, it switches to the departure hall where crowds are surging and passengers are rushing about."
+311,"The wind blew through the broken window, scattering the papers on the table. The video is in black and white style."
+312,"In the chemistry laboratory, various reagent bottles are neatly arranged on the shelves along the wall, and test tubes and beakers are placed on the central long table. The liquid in a beaker is gently boiling, emitting wisps of steam."
+313,"The fields are full of wheat, with waves of golden yellow swaying in the wind."
+314,"In the gymnasium, there is no one around, and the lights flicker on and off. A basketball gently bounces on the floor."
+315,"On the night street, the streetlights illuminate with a pale yellow light."
+316,"A newspaper is placed on the garden bench, and a gust of wind blows it onto the ground. The entire scene is in watercolor style, captured with a close-up shot."
+317,"There is a row of electric poles by the roadside, standing there quietly. The electric poles are connected by neat wires, swaying gently with the breeze. The camera slowly tilts upwards."
+318,"On the streets of Hong Kong, the outdoor seating at coffee shops is filled with patrons enjoying their afternoon tea."
+319,"At night, the Oriental Pearl Tower emits colorful lights, with surrounding buildings towering into the clouds. The camera moves vertically from bottom to top."
+320,"The Louvre is bathed in a golden glow under the setting sun, the camera pans to the left."
+321,"At Shibuya Crossing in Tokyo, the crowds surge, and the neon lights flash. The entire video is in high-definition quality."
+322,"The Potala Palace covered in snow, with the camera moving vertically from top to bottom."
+323,"In the living room at home, a couple is watching TV and eating popcorn."
+324,The kite drifts through the sky.
+325,A black sports watch with the hands constantly turning.
+326,Hot dog sausages are steaming continuously in the air fryer.
+327,"A colorful dress hangs on the balcony, swaying gently with the wind."
+328,"The refrigerator is placed in a corner of the kitchen, its white shell appearing particularly bright under the light. Various sticky notes and children's drawings are attached to the refrigerator door. The camera zooms in, focusing on showcasing the sticky notes on the refrigerator door."
+329,A black badminton racket with purple tape wrapped around the handle. The video is taken with an arc shot circling around the racket.
+330,"The climbing wall is covered with colorful handholds of various shapes, and the camera tilts upwards."
+331,A kite is flying in the sky.
+332,The street food stall is filled with steaming fried skewers.
+333,"In the bedroom, there is an infant bed, and the camera zooms in to display the small baby pillow on the bed."
+334,"Circular cleaning robots are neatly arranged on the shelves in the mall, with the ""Xiaomi"" logo printed on them. The camera slowly pushes in along the shelves."
+335,"A Scottish kilt flutters gently in the breeze, and the camera zooms in to reveal the plaid pattern on the kilt."
+336,"A strawberry drops into a cocktail glass, causing a splash, captured with high-speed shot, in 2K quality."
+337,"A cage of steamed dumplings was placed on the table, steaming hot."
+338,Close-up shot captures the steam gently rising from a coffee cup.
+339,A green apple was placed on the table and then cut in half.
+340,"Two flower-adorned cars drove side by side through the bustling, crowded streets, in a realistic style."
+341,The teapot is on the left side of the teacup. The camera moves horizontally from right to left during shooting.
+342,"A skateboard is sliding, with a blue baseball cap placed on it. In front of the skateboard is an empty street with a smooth and flat road surface. The video is in high-definition quality."
+343,"A black Nikon camera is placed on a stable tripod, and the shutter button is rapidly and continuously pressed by a finger."
+344,"The iMac computer screen and camera light up, captured in a medium close-up shot."
+345,"A floral-patterned evening gown, with the camera moving vertically from top to bottom in a close-up shot."
+346,"A black sedan speeds through a puddle-slicked road, splashing water, presented in high-speed shot."
+347,A man-made satellite is orbiting the Earth.
+348,"A silver necklace lies quietly in an exquisite box, shining brightly. The camera slowly zooms out, gradually revealing the outline of the entire box and its surroundings. The box is placed on a soft velvet pad."
+349,"In the living room, there is a sofa and a TV playing programs. The camera pans from right to left. The entire video is presented in black and white style."
+350,"A pair of nylon-made black sports pants with conspicuous reflective strips on the sides, flashing in the night with the wearer's steps, like a flowing ribbon of light."
+351,A pair of brand-new basketball shoes. The video is taken with a curved motion shot circling around the shoes.
+352,"A drone descends from the sky to the ground, tracking shot."
+353,"A red scarf is blown by the strong wind, dancing wildly in the air like a burning flame."
+354,A blue long dress fell from the balcony clothes rack and dropped into the water on the ground.
+355,"The calligraphy brush tip is dancing smoothly on the white paper, with ink slowly seeping out, captured in a close-up shot."
+356,The silk dress on the clothesline fluttered in the wind.
+357,A white robot vacuum is cleaning the floor along the edge of the wall.
+358,A treadmill. The video is taken with an arc shot.
+359,"The camera pushes in toward the vintage and exquisite bedroom lamp, with a lamp base crafted by hand soldering, featuring a colorful glass shade with varied textures."
+360,A circular digital watch is flashing numbers. The video is presented in a realistic style.
+361,"In the double-flavor hot pot, the left side is a tomato pot, and the right side is a spicy pot. The spicy pot is filled with floating Sichuan peppercorns, chili peppers, and other seasonings, accompanied by beef and vegetables, with continuous bubbles rising from the pot."
+362,"In the rain, there is a telephone booth, with rainwater sliding down the edges, forming curtains of water. As the camera zooms out, it reveals the telephone booth standing alone in a street soaked by the rain."
+363,"On the dining table, there is a steaming hot and spicy hot pot with bright red chili oil, broccoli, mushrooms, and sesame seeds sprinkled on top. The video is taken with an arc shot."
+364,A set of kimono with red and blue floral patterns. The camera zooms in to highlight the patterns on the kimono.
+365,A red sports car is drifting quickly around the corner.
+366,A pair of black and white soccer shoes with protruding studs on the sole. Close-up shot. The video is in black and white style.
+367,"A giant truck speeds down the highway, tracking shot."
+368,"Several pristine white nurse uniforms are hanging on the clothesline, with water droplets continuously falling from their edges."
+369,"A coffee machine is making coffee, and the coffee is slowly flowing from the outlet into the cup."
+370,"A desk lamp is lit, emitting yellow light. The video is taken with an arc shot surrounding the lamp."
+371,The embroidered shoes are adorned with plum blossoms. The camera zooms in to reveal the details of the plum blossoms. The image adopts a subtle green-blue hue.
+372,"The brushes lie across the palette, and the palette is placed on the art table. On the right side of the art table are paints. The camera moves horizontally from left to right."
+373,"A bowl of seafood soup with shrimp, tomatoes, and lemon slices, the camera zooms in to highlight the small bubbles on the edges of the lemon slices."
+374,"The fan spins, in a realistic style."
+375,The display screen of the scale keeps flashing.
+376,A white weighing scale. The video is shot with an arc shot.
+377,"Beneath the mouse lies a blue mouse pad, which a hand slowly draws away."
+378,"A red and yellow bus slowly makes its way along a mountain road, captured in a long shot."
+379,"The ""A Thousand Li of Rivers and Mountains"" painting depicts continuous mountain ranges, meandering rivers, and fine lines. The camera moves horizontally from left to right."
+380,Next to the matte-textured e-reader is a pink-covered book. The camera moves horizontally from left to right.
+381,"An electronic blood pressure monitor is at work, displaying numbers on its screen that are continuously changing."
+382,"A plate of stir-fried shrimp is placed on the table, with the shrimp paired with peas and corn inside the plate, steaming hot."
+383,A speedboat is making a turn on the river.
+384,"The balloon is ascending, captured in a slow-motion shot."
+385,"A khaki-colored fisherman's hat made of canvas, with a wide, round brim, is hanging on a coat rack behind the door. The camera zooms in to highlight the small daisy pattern embellished on the hat."
+386,"Eyes, noses, mouths and teeth are carved on the surface of the pumpkin lantern, and the candle flame in the pumpkin lantern sways in the wind."
+387,"In the evening at the harbor, ships sail along the river. Captured in a time-lapse shot, neutral tones for the picture"
+388,"The washing machine in the corner of the room is shaking, and you can see the drum spinning."
+389,"The treadmill belt is running at a steady speed, propelling the runner forward continuously. The camera shoots horizontally from the ground level."
+390,A blue cocktail with ice cubes floating at the top of the glass.
+391,A pajama was thrown onto the bed. This pajama has delicate lace trim and thin shoulder straps.
+392,There is a fishing boat speeding on the sea surface.
+393,The light blue towel hanging on the clothesline was blown by the wind and fell down.
+394,"The oven in the kitchen is toasting two slices of bread, which are gradually taking on a deeper hue. The video is in anime style"
+395,"A sushi platter, with sushi wrapped in nori containing rice and raw fish slices, accompanied by a small dish of soy sauce and wasabi. The camera pans to the right."
+396,A painting hangs on the wall of the bedroom. The camera rolls counterclockwise during shooting.
+397,"Two tablets are placed on the table, with the camera panning horizontally from right to left."
+398,"Sparks fly from a grinding wheel, creating a shower of light against a dark, industrial background, with the camera capturing the vivid colors and slow-motion effect."
+399,A pair of gray dumbbells falls heavily to the ground.
+400,"A red electric iron rests on a coat, with wisps of light smoke drifting around it."
+401,"An ink bottle has been knocked over on the table, and the blue ink inside is flowing out."
+402,"A game controller is tightly held by a pair of hands, with fingers skillfully pressing each button."
+403,"A rotating plate holds a pizza covered with a thick layer of cheese. The pizza is topped with a variety of ingredients, including green vegetables, yellow pineapples, and pink shrimp."
+404,"There is a spherical hammock chair on the balcony, with a pure white fluffy cushion on the chair. The camera moves vertically downward from above."
+405,A yellow football tracksuit was thrown onto the stands.
+406,"The strings of the guitar are plucked, and an enchanting melody ensues."
+407,"As evening falls, a Tesla sedan speeds along the mountain road, captured in realistic style."
+408,Ball number 14 slowly rolls on the billiard table and then falls into the corner pocket.
+409,"A huge cargo ship is sailing on the turbulent sea, carrying containers. The video is in black and white style."
+410,"The orange curtains in the living room are slowly opened, and a beam of light comes through."
+411,"On the bed, there was a mobile phone on the right side of the pillow, and a video was playing on the phone."
+412,"The flag flutters in the wind, with the camera capturing it from a low angle."
+413,An apple is placed in front of the monitor. The camera moves horizontally from right to left during shooting.
+414,"At the coal mining site, a large red truck is seen transporting coal out, medium close-up shot."
+415,A helicopter is hovering in the sky.
+416,"There is a computer on the table, and the computer screen shows programs that are running."
+417,"A black and white silk shirt hangs on the balcony, with water dripping from the garment."
+418,"The chicken soup bubbles away in the pot, infused with goji berries and red dates."
+419,"Milk and cucumbers are placed on the dining table. The entire video is shot with a medium close-up shot, and the camera moves horizontally from left to right."
+420,"In the center of the living room, there is a sofa, and to the right of the sofa is a massage chair. The camera moves horizontally from left to right."
+421,"In space, two satellites collide, sending fragments flying in all directions, as the camera tilts up."
+422,"The firecracker exploded, sending shards of its wrapping paper flying in all directions, captured by high-speed shot."
+423,"In the bedroom, the ceiling fan is spinning rapidly."
+424,"Smoke billows through the air, captured in slow motion."
+425,"The parachute gently descends from the sky, low-angle shot."
+426,"There is a broken plate on the table inside the house, and the stove fire next to it is burning, flickering with flames."
+427,"A bachelor's gown. The video is taken with a 360-degree arc shot circling around the gown, presented in a realistic style."
+428,A colorful dragon dancing in the air
+429,"A magnificent palace emerges gradually from the shimmering sea, as if a mysterious force from the ocean depths gently lifts it. The palace's spires gleam with golden light in the setting sun, resembling a mirage, and is breathtakingly beautiful."
+430,"A deserted alien colony, with buildings eroded by sandstorms and abandoned high-tech equipment scattered across the desolate surface, in a realistic style. The camera pans from right to left."
+431,"In the story of Jack and the Beanstalk, a giant beanstalk pierces through the clouds, leading to a castle in the sky. The camera tilts upwards."
+432,"In a river flowing with diamonds, a group of little elves sit on a boat. The entire video presents a fantastical atmosphere, with low contrast."
+433,"A volcano made up of letters, with its body constructed from densely packed English letters, forming a unique volcanic landscape. The crater continuously emits steam clouds composed of letters."
+434,A building constructed from flowing code. The video is shot with an arc shot.
+435,"The black hole is located in the center of the galaxy, currently engulfing the surrounding matter."
+436,"Along the banks of a winding river, cherry blossom trees sway in the wind, and petals drift with the water, forming a pink river. The entire video presents a fantastical atmosphere."
+437,"In a three-dimensional space, colorful lights crisscross in all directions, constantly changing hues."
+438,"A woman in a red suit stands in front of the mirror, elegantly adjusting her collar. Suddenly, a woman in a blue suit appears in the mirror, smiling and reaching out her hand to the woman outside the mirror, as if inviting her into an unknown world."
+439,A beehive hotel made of glass on the moon. The video is taken with a 360-degree arc shot surrounding the hotel.
+440,"A vast grassland where every blade of grass is made up of geometric shapes. With the breeze blowing, the geometric grass leaves sway gently."
+441,"A library in a fairyland, with bookshelves filled with ancient books and magical potions. On the right side of the bookshelf is a door inlaid with gems. The camera moves horizontally from left to right, from displaying books to the door."
+442,"A massive spaceship slowly glided above the Earth, casting a long shadow through the clouds. The video is in a realistic style."
+443,"After opening the book, the words on the pages turn into a swarm of colorful butterflies fluttering about."
+444,"As the spaceship ascends, shimmering auroras appear in the sky. After the spaceship moves away, the auroras gradually fade away, in a sci-fi art style."
+445,A high tower built of stacked books straight into the clouds. The camera moves vertically from bottom to top during shooting.
+446,"After the candle is lit, a door appears in the middle of the flame."
+447,A power station operating in space. The video is presented in 3D animation style.
+448,A house with a sense of technology is floating on the sea surface. The video is shot with an arc shot.
+449,"A broken bronze UFO, its surface already covered in rust, lies quietly on a desolate plain. The camera moves horizontally from right to left during shooting."
+450,"An ancient tree stands tall and proud, its branches entwined with purple vines. Moonlight gently falls, and the vines seem to emit tiny particles of magic, forming mysterious pathways between the moonlight and the ancient tree, as if the tree is absorbing the essence of the moonlight. The entire video presents a fantastical atmosphere."
+451,Giant biological footprints on the moon. The video is shot with an arc shot.
+452,"In a snow-covered landscape, snowflakes twirl gracefully in the air, transforming into the image of a princess."
+453,"In a forest filled with music, the branches and leaves of the trees sway gently with the wind, releasing melodious notes, and the creatures of the forest dance to this natural rhythm."
+454,"A desert covered with stars, every grain of sand twinkling with starlight. A stone falls, stirring up a shiny dust cloud on the sand."
+455,"The refrigerator door was slowly opened, revealing a maze of various vegetables."
+456,"On the computer keyboard, a mini puppy jumps on different letter keys."
+457,"The forest is shrouded in morning mist. As the sun rises, the fog gradually disperses. Sunlight filters through the treetops, forming beams of light that outline the shape of a door, as if leading to the entrance of a mysterious realm."
+458,Icebergs on Uranus. The video is captured with an arc shot.
+459,"In a bowl of white rice, a black kitten is struggling to climb over the grains of rice."
+460,"In the canyon of Mars, a high-tech alien research station sits quietly there. Many Mars exploration vehicles collect Martian rocks around the research station and send them back to the research station."
+461,"In the desolate alien mining site, huge excavation machinery and transport vehicles are scattered around the vast mine pits. Some small reconnaissance machines are flying around in the air."
+462,"A palace made of ice crystals, emitting a faint blue glow under the aurora"
+463,"Individual dazzling diamond-shaped color blocks, like falling petals, pile up into a dreamlike sea of colors."
+464,"A blue tractor with the words ""Dongfanghong"" on it is parked on a cloud, with its exhaust pipe puffing out candy. Dark blue tint for the picture"
+465,"White clouds floated in the sky, and with a gust of wind, they transformed into the letters ""OK."""
+466,"A streamlined interstellar spaceship glides slowly through the brilliance of a nebula, its surface adorned with smooth metal armor and twinkling navigation lights. The entire image presents a sci-fi art style."
+467,A plus sign is circling around the letter Z. The entire video is in anime style.
+468,"Eggs are dancing in the pot, while apples watch from the side. The entire video presents a joyful atmosphere."
+469,"Various colored desk lamps, slowly lowering their heads. Shot from high angle."
+470,A sweater with a big mouth is slowly closing its mouth.
+471,A plus sign jumped out of the book.
+472,Three mermaids are swimming beside the coral reef.
+473,"Tang Sanzang, draped in his monk's robe and holding a staff, walked forward slowly."
+474,"A wide shot of a unicorn peacefully grazing in a lush, enchanted forest. The camera captures the unicorn’s shimmering coat, the magical glow of the surrounding flora, and the serene atmosphere of the scene."
+475,A pillow is flying with jet propulsion.
+476,"Hou Yi, carrying a bow and arrows, shuttles through the mountains and forests."
+477,The golden dragon is coiling around the top of the palace. Close-up shot.
+478,The nine-tailed fox strolls in the forest.
+479,"On the grass, a crimson demon with sharp horns is striding, each step causing the ground to tremble slightly."
+480,"In the underwater world, a mermaid swims past colorful coral reefs, with the camera moving vertically from top to bottom during filming."
+481,"Mickey Mouse from Disney clad in a suit, is conducting the orchestra on stage with elegant hand gestures."
+482,A piece of paper filled with mathematical formulas. The camera moves horizontally from left to right.
+483,"A mechanical bear walks through the white snow, leaving deep footprints. The video is presented in a sci-fi art style."
+484,"A crystal snake, with its tongue flicking out, slithers quietly through a dim cave."
+485,"In the sky, a fairy in a white dress dances on the clouds. The entire video presents a fantastical atmosphere."
+486,A gray mechanical dolphin jumps on the water surface.
+487,"An exclamation mark ""!"" is pasted on the laboratory door as a warning as the camera tilts upwards."
+488,Generate a video in which a circle gradually emerges.
+489,A mechanical dinosaur is swishing its tail.
+490,Kamen Rider was speeding down the road on his motorcycle.
+491,"The three-headed dragon slowly lies down, in anime style."
+492,"The lotus has emerald green leaves, the leaves are floating on the water, the petals of the lotus are as white as snow, medium close-up shot"
+493,"Ivy cascades down like a waterfall from above, the camera tilts downwards."
+494,The palm leaves sway in the wind with a blurred background.
+495,"Lotus flowers float on the water surface. The camera moves horizontally from left to right. The image is in cyan tones, with a clay animation style."
+496,"The hornwort stands still in the water, its tiny leaves and red stems looking particularly vibrant in the sunlight, with the water's surface sparkling with light patches under the sun."
+497,"Duckweed spreads densely on the water surface, with water droplets covering the leaves, and the camera pans to the right."
+498,A piece of kelp is gently swaying with the current at the bottom of the sea. Its dark green blades are just like an underwater forest.
+499,A rolling watermelon. The video is in black and white style.
+500,"A clump of Schoenoplectus tabernaemontani blooms with white flowers, swaying with the current on the water surface."
+501,"Spring has arrived, and peach trees are blooming with beautiful peach blossoms. The camera tilts upwards. Blue tint for the picture"
+502,"In the early morning, the leaves of the lotus are swaying slightly, with a few drops of dew on the leaves. The video is in black and white style."
+503,"A tranquil lake surface, with lotus flowers blooming and lotus leaves swaying in the wind. The camera moves horizontally from right to left during shooting."
+504,A medium close-up shot captures the white daffodil flowers gently drifting atop the water's surface.
+505,"In the center of the pond, three water lilies are slowly blooming, and the lily pads are oval-shaped, time-lapse shot."
+506,"Tender green leaves. Close-up shot showing clear leaf veins, the camera moves horizontally from right to left."
+507,"In the garden, the rose vines are entwined around the wall, lush and vibrant, captured by a camera tilted upwards."
+508,"On the grassland, patches of dandelion seeds flutter in the wind."
+509,"On the left side of the lake, a cluster of water lilies blooms quietly, their pink petals vibrant in the clear water."
+510,"The tulips in the flower pot fell to the ground, with the camera tilting downward, in HD quality."
+511,The seaweed in the water is slowly swaying.
+512,"There is a pine tree on the right side of the road, swaying gently in the wind, and its needles reflecting the light when the sun shines."
+513,"Amid the majestic scenery of the Three Gorges of the Yangtze River, the water flows gently while misty clouds envelop the layers of mountains on both sides."
+514,"Under the polar night sky, the stars shine brilliantly and the Milky Way is clearly visible. The glaciers glisten with a cold radiance in the moonlight, while the Northern Lights dance across the horizon."
+515,The endless Great Wall weaves through the mountains like a dragon. The camera pans left horizontally.
+516,"The forest at night is silent and mysterious. Moonlight filters through the sparse leaves, casting dappled patterns of light and shadow on the ground. The shadows of the trees sway gently, and a light breeze blows, bringing the rustling sound of leaves."
+517,A meteor streaked across the night sky rapidly.
+518,"The countryside stream is crystal clear, and the willow trees by the bank sway gently in the wind, a slow-motion shot."
+519,"The blue sky meets the azure sea, with the surface of the ocean shimmering in the sunlight. In the distance, several sailboats are sailing slowly. The video is presented in a realistic style."
+520,"The river rushes through the valley, with turbulent and surging waters. The river water crashes against the rocks."
+521,"Behind the rubber tree, a cluster of wild mushrooms is sprouting quietly. Taken with a time-lapse shot, the process of mushrooms emerging from the soil and gradually unfolding their umbrella-shaped caps is captured."
+522,"In the early morning, the dewdrops hanging on the leaves sparkle under the sunlight."
+523,"As the sun slowly rose above the horizon, the snow on the hillside turned golden in the morning light, shimmering with a warm and dazzling glow."
+524,"The camera slowly pushes in, capturing a lush shrubbery on the sandy ground."
+525,"As the gentle breeze sweeps over Yamdrok Lake, ripples form and the water sparkles, medium close-up shot."
+526,A crystal-clear stream gently flows as fish swim within it.
+527,"In Antarctica, a desolate continent covered in ice, the glaciers glisten brightly under the golden sunlight. Penguins waddle across the ice sheet and seals play among the ice floes."
+528,"The clouds in the sky move rapidly under the influence of the wind, captured by time-lapse shot."
+529,Volcanic Eruption
+530,"The bright Polaris is hanging in the sky. Suddenly, a meteor streaked across beside it. The video is in a realistic style."
+531,"In the night sky, stunning ribbons of light emerged, with green, purple, and red interweaving as if dancing shadows."
+532,"Under the clear sky, the summit of Mount Fuji is gradually gently obscured by the drifting white clouds. The camera slowly zooms out, reflecting its magnificent mountain body in the calm Lake Kawaguchi."
+533,"In a vast tropical rainforest, towering trees reach up to the clouds, their dense canopies blocking most of the sunlight, casting the ground in a soft green hue. Various tropical plants compete to grow, and colorful birds sing happily on the branches."
+534,"The sunset stains the sky red, and the waves gently lap against the rocks. The entire video presents a romantic atmosphere."
+535,"A plane streaked across the sky, leaving a long contrail behind."
+536,"In the summer polar regions, under the sun's rays, the edges of the glaciers begin to melt. The meltwater flows along the ice crevices, exposing rocks and soil, infusing this icy realm with a hint of vitality."
+537,A stormy night with lightning and thunder. The video is in black and white style.
+538,"The avalanche came swiftly like a mountain collapse, with a large amount of snow blocks mixed with cold wind, pouring down the slope irresistibly."
+539,The aurora shines in the sky.
+540,"The stream water is crystal clear to the bottom, and the pebbles shine under the sunlight. Tracking shot along with the stream."
+541,"First-person perspective aerial shot, sunlight shining on the snow-capped mountain peak, clouds rolling and unfolding, creating a tranquil atmosphere."
+542,"The Milky Way spans across the sky, with stars twinkling with various shades of light."
+543,"The water in the river flowed slowly, and the reeds alongside the river swayed."
+544,"A small quail forages in the field, looking left and right cautiously."
+545,"As night falls, the stars in the sky begin to twinkle."
+546,"A serene night sky, with stars twinkling and the Milky Way clearly visible. The camera horizontally moves from right to left."
+547,Aerial view of winding rivers flowing into the sea
+548,A small bridge over flowing water in the countryside reflects the blue sky and drifting clouds in the clear river below.
+549,"In the rural stream, aquatic plants sway with the waves, and peach blossoms on both banks bloom in the spring breeze."
+550,The boundless ocean with its surging waves.
+551,"Under the sunset in the polar region, the sky takes on a warm, reddish-orange hue. The glaciers and snow-covered mountains appear even more serene and majestic under the sunlight. The camera slowly pulls back, revealing a vast expanse of the polar landscape in its entirety."
+552,"The stream flows gently, with pebbles scattered along its bed."
+553,Victoria Falls
+554,"By the seaside, waves crash against the rocks, sending up sprays of foam."
+555,"The rain drizzles down, soaking the seats by the roadside. The entire video presents a melancholic atmosphere."
+556,"The mountaintop of Huangshan is shrouded in mist, with birds flying through. The camera moves horizontally from right to left during shooting."
+557,The lotus flowers and leaves in the park's lake are being blown around by the strong wind.
+558,"Snowflakes gently falling from the sky, close-up shot."
+559,"The waterfall plunges down from the steep cliff, with a fierce current and a sound that shakes the heavens. The water of the waterfall draws beautiful arcs in the air and then falls into the deep pool, stirring up waves of white spray."
+560,"In the glacier crevasses, one can see meltwater gently flowing, medium close-up shot."
+561,"The vast and boundless ocean, with its clear blue waters, waves endlessly crashing against the shore."
+562,"Clouds in the sky are changing shapes, sometimes like galloping steeds, and sometimes like white castles."
+563,"The river rushes through the canyon, splashing white water."
+564,"At the foot of the snowy mountain, a small river flows, next to which lies a skier's camp. The picture adopts a strong orange and red color tone."
+565,Raindrops pitter-patter against the window. The entire video presents a melancholic atmosphere.
+566,"The banks of the Thames, as the camera moves vertically from low to high."
+567,"Waves crash against the rock, slow motion shot."
+568,"On Kuta Beach in Bali, warm sunlight spills onto the soft white sand, and gentle waves lap against the shore. The camera slowly pulls back, revealing the entire beautiful beach."
+569,The brook flows gently through the countryside.
+570,"In the mysterious canyon, bizarre rock formations and striking peaks stand tall. A crystal-clear stream flows through the valley, flanked by lush vegetation. Birds fly through the sky. In the style of Paul Gauguin."
+571,"In the foggy weather, the mountains are faintly visible, blending with the sky and the earth as if it were a mysterious ink-wash painting. The camera tilts down."
+572,"At night, the desert is bathed in moonlight, creating patches of silver on the sand dunes. The camera pans from left to right."
+573,"The rain gradually stopped, and a rare double rainbow appeared in the sky. One rainbow had distinct color layers, while the other had softer colors."
+574,"The mountain stream is crystal clear, and small stones gently roll in the babbling water."
+575,"A desolate tropical island with clear waters, lush palm trees, and sandy beaches. The camera captures the scene with an aerial arc shot, circling around the island."
+576,"Several ice floes are floating on the Arctic Ocean, and one of them suddenly cracks open."
+577,"In the canyon, a meandering turquoise river flows quietly, wide shot."
+578,"Under the sunlight, the lake surface shimmered. The camera slowly pans from right to left."
+579,"The clear surface of the pond is like a mirror, reflecting the blue sky and white clouds. A gentle breeze blows, bringing a refreshing coolness, and the water surface ripples in waves."
+580,The turbulent river rushes forward. The video is in black and white style.
+581,"There is a stream in the lush forest, flowing slowly."
+582,"Waves pound the rocks, with the camera capturing the scene from a high angle shot."
+583,"A strong wind whips up dust and sand, turning the sky murky. The air is filled with particles of sand, and pedestrians cover their mouths and noses, struggling to move forward."
+584,"A full moon hangs high in the night sky, emitting a soft glow. A cloud slowly drifts over, covering half of the moon."
+585,"As the camera slowly pushes forward, passing over the rolling sand dunes, a vibrant oasis suddenly comes into view like a miraculous sight."
+586,"As the sand continues to accumulate, the height of the pile is slowly increasing."
+587,"Village in the valley, with wisps of cooking smoke rising. The camera tilts up."
+588,"The afterglow of the setting sun spills onto the lake surface, turning the water a golden yellow. The camera pans to the left."
+589,"At the break of dawn, the sky by the seaside began to lighten with a pale hue, and the sun slowly rose from the horizon, in a long shot."
+590,"The Himalayas, with their layered peaks and snow-covered expanses, shine with golden light where the sun illuminates the mountaintops. Clouds drift above the summit, gathering and dispersing intermittently."
+591,A tsunami is striking the city.
+592,"After the heavy snow, the hillside was left with a pristine world covered in a fresh layer of snow. The camera horizontally moves from left to right."
+593,"At the foot of the mountains, there is a clear lake, and two Africans are swimming in the lake."
+594,"A dazzling starry sky, with occasional meteors streaking across. Captured with a time-lapse camera, the image is of standard definition."
+595,"In the morning, when the sun is about to appear, the sky appears dark blue, and as the sun rises, the sky gradually becomes brighter. The video is in anime style."
+596,A mudslide roared down the hillside.
+597,"A violent earthquake caused the ground to shake violently, and subsequently, a huge crack appeared in the ground."
+598,"On a clear day, the surface of the Nile River is sparkling, and the water flows slowly. There are stretches of oases on the banks of the river, and stretches of desert in the distance."
+599,"The tornado moved over the lake, sucking the lake water into the storm, forming a waterspout."
--- a/assets/WECHAT.md
+++ b/assets/WECHAT.md
+<div align="center">
+<img src=wechat.jpg width="60%"/>
+
+<p> 扫码关注混元系列工作，加入「 Hunyuan Video 交流群」 </p>
+<p> Scan the QR code to join the "Hunyuan Discussion Group" </p>
+</div>
+
--- a/assets/backbone.png
+++ b/assets/backbone.png
--- a/assets/hunyuanvideo.pdf
+++ b/assets/hunyuanvideo.pdf
--- a/assets/logo.png
+++ b/assets/logo.png
--- a/assets/overall.png
+++ b/assets/overall.png
--- a/assets/text_encoder.png
+++ b/assets/text_encoder.png
--- a/assets/video_poster.png
+++ b/assets/video_poster.png
--- a/assets/wechat.jpg
+++ b/assets/wechat.jpg
--- a/ckpts
+++ b/ckpts
+/workspace/cicd/packages/hunyuan-video-t2v/ckpts
\ No newline at end of file
--- a/fix.sh
+++ b/fix.sh
+#!/bin/bash
+
+cp modified/config.py /usr/local/lib/python3.10/dist-packages/xfuser/config/
+cp modified/envs.py /usr/local/lib/python3.10/dist-packages/xfuser/
--- a/gradio_server.py
+++ b/gradio_server.py
+import os
+import time
+from pathlib import Path
+from loguru import logger
+from datetime import datetime
+import gradio as gr
+import random
+
+from hyvideo.utils.file_utils import save_videos_grid
+from hyvideo.config import parse_args
+from hyvideo.inference import HunyuanVideoSampler
+from hyvideo.constants import NEGATIVE_PROMPT
+
+def initialize_model(model_path):
+    args = parse_args()
+    models_root_path = Path(model_path)
+    if not models_root_path.exists():
+        raise ValueError(f"`models_root` does not exist: {models_root_path}")
+    
+    hunyuan_video_sampler = HunyuanVideoSampler.from_pretrained(models_root_path, args=args)
+    return hunyuan_video_sampler
+
+def generate_video(
+    model,
+    prompt,
+    resolution,
+    video_length,
+    seed,
+    num_inference_steps,
+    guidance_scale,
+    flow_shift,
+    embedded_guidance_scale
+):
+    seed = None if seed == -1 else int(seed)  # gr.Number may deliver a float
+    width, height = resolution.split("x")
+    width, height = int(width), int(height)
+    negative_prompt = "" # not applicable in the inference
+
+    outputs = model.predict(
+        prompt=prompt,
+        height=height,
+        width=width, 
+        video_length=video_length,
+        seed=seed,
+        negative_prompt=negative_prompt,
+        infer_steps=num_inference_steps,
+        guidance_scale=guidance_scale,
+        num_videos_per_prompt=1,
+        flow_shift=flow_shift,
+        batch_size=1,
+        embedded_guidance_scale=embedded_guidance_scale
+    )
+    
+    samples = outputs['samples']
+    sample = samples[0].unsqueeze(0)
+    
+    save_path = os.path.join(os.getcwd(), "gradio_outputs")
+    os.makedirs(save_path, exist_ok=True)
+    
+    time_flag = datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d-%H:%M:%S")
+    video_path = f"{save_path}/{time_flag}_seed{outputs['seeds'][0]}_{outputs['prompts'][0][:100].replace('/','')}.mp4"
+    save_videos_grid(sample, video_path, fps=24)
+    logger.info(f'Sample saved to: {video_path}')
+    
+    return video_path
+
+def create_demo(model_path, save_path):
+    model = initialize_model(model_path)
+    
+    with gr.Blocks() as demo:
+        gr.Markdown("# Hunyuan Video Generation")
+        
+        with gr.Row():
+            with gr.Column():
+                prompt = gr.Textbox(label="Prompt", value="A cat walks on the grass, realistic style.")
+                with gr.Row():
+                    resolution = gr.Dropdown(
+                        choices=[
+                            # 720p
+                            ("1280x720 (16:9, 720p)", "1280x720"),
+                            ("720x1280 (9:16, 720p)", "720x1280"), 
+                            ("1104x832 (4:3, 720p)", "1104x832"),
+                            ("832x1104 (3:4, 720p)", "832x1104"),
+                            ("960x960 (1:1, 720p)", "960x960"),
+                            # 540p
+                            ("960x544 (16:9, 540p)", "960x544"),
+                            ("544x960 (9:16, 540p)", "544x960"),
+                            ("832x624 (4:3, 540p)", "832x624"), 
+                            ("624x832 (3:4, 540p)", "624x832"),
+                            ("720x720 (1:1, 540p)", "720x720"),
+                        ],
+                        value="1280x720",
+                        label="Resolution"
+                    )
+                    video_length = gr.Dropdown(
+                        label="Video Length",
+                        choices=[
+                            ("2s(65f)", 65),
+                            ("5s(129f)", 129),
+                        ],
+                        value=129,
+                    )
+                num_inference_steps = gr.Slider(1, 100, value=50, step=1, label="Number of Inference Steps")
+                show_advanced = gr.Checkbox(label="Show Advanced Options", value=False)
+                with gr.Row(visible=False) as advanced_row:
+                    with gr.Column():
+                        seed = gr.Number(value=-1, label="Seed (-1 for random)")
+                        guidance_scale = gr.Slider(1.0, 20.0, value=1.0, step=0.5, label="Guidance Scale")
+                        flow_shift = gr.Slider(0.0, 10.0, value=7.0, step=0.1, label="Flow Shift") 
+                        embedded_guidance_scale = gr.Slider(1.0, 20.0, value=6.0, step=0.5, label="Embedded Guidance Scale")
+                show_advanced.change(fn=lambda x: gr.Row(visible=x), inputs=[show_advanced], outputs=[advanced_row])
+                generate_btn = gr.Button("Generate")
+            
+            with gr.Column():
+                output = gr.Video(label="Generated Video")
+        
+        generate_btn.click(
+            fn=lambda *inputs: generate_video(model, *inputs),
+            inputs=[
+                prompt,
+                resolution,
+                video_length,
+                seed,
+                num_inference_steps,
+                guidance_scale,
+                flow_shift,
+                embedded_guidance_scale
+            ],
+            outputs=output
+        )
+    
+    return demo
+
+if __name__ == "__main__":
+    os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"
+    server_name = os.getenv("SERVER_NAME", "0.0.0.0")
+    server_port = int(os.getenv("SERVER_PORT", "8081"))
+    args = parse_args()
+    print(args)
+    demo = create_demo(args.model_base, args.save_path)
+    demo.launch(server_name=server_name, server_port=server_port)
\ No newline at end of file
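
The input handling in `generate_video` above (splitting the `WIDTHxHEIGHT` dropdown value, mapping the `-1` seed sentinel to `None`, and stripping `/` from the prompt before it goes into a file name) can be sketched in isolation. This is a minimal sketch and not part of the commit: the helper names `parse_resolution`, `normalize_seed`, and `prompt_fragment` are hypothetical, and the `int()` coercion of the seed is an assumption about how a numeric UI field might need to be handled.

```python
from typing import Optional, Tuple, Union


def parse_resolution(resolution: str) -> Tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string into integer (width, height)."""
    width, height = resolution.split("x")
    return int(width), int(height)


def normalize_seed(seed: Union[int, float]) -> Optional[int]:
    """Map the UI sentinel -1 to None (random seed); otherwise coerce to int."""
    return None if seed == -1 else int(seed)


def prompt_fragment(prompt: str, max_len: int = 100) -> str:
    """Truncate the prompt and drop '/' so it cannot introduce a sub-directory."""
    return prompt[:max_len].replace("/", "")


if __name__ == "__main__":
    print(parse_resolution("1280x720"))
    print(normalize_seed(-1), normalize_seed(42.0))
    print(prompt_fragment("A cat walks on the grass/road, realistic style."))
```

Keeping these steps as small pure functions makes them testable without loading the model or starting the Gradio server.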
--- a/hyvideo/__init__.py
+++ b/hyvideo/__init__.py
--- a/hyvideo/config.py
+++ b/hyvideo/config.py
+import argparse
+from .constants import *
+import re
+from .modules.models import HUNYUAN_VIDEO_CONFIG
+
+
+def parse_args(namespace=None):
+    parser = argparse.ArgumentParser(description="HunyuanVideo inference script")
+
+    parser = add_network_args(parser)
+    parser = add_extra_models_args(parser)
+    parser = add_denoise_schedule_args(parser)
+    parser = add_inference_args(parser)
+    parser = add_parallel_args(parser)
+
+    args = parser.parse_args(namespace=namespace)
+    args = sanity_check_args(args)
+
+    return args
+
+
+def add_network_args(parser: argparse.ArgumentParser):
+    group = parser.add_argument_group(title="HunyuanVideo network args")
+
+    # Main model
+    group.add_argument(
+        "--model",
+        type=str,
+        choices=list(HUNYUAN_VIDEO_CONFIG.keys()),
+        default="HYVideo-T/2-cfgdistill",
+    )
+    group.add_argument(
+        "--latent-channels",
+        type=int,
+        default=16,
+        help="Number of latent channels of DiT. If None, it will be determined by `vae`. If provided, "
+        "it still needs to match the latent channels of the VAE model.",
+    )
+    group.add_argument(
+        "--precision",
+        type=str,
+        default="bf16",
+        choices=PRECISIONS,
+        help="Precision mode. Options: fp32, fp16, bf16. Applied to the backbone model and optimizer.",
+    )
+
+    # RoPE
+    group.add_argument(
+        "--rope-theta", type=int, default=256, help="Theta used in RoPE."
+    )
+    return parser
+
+
+def add_extra_models_args(parser: argparse.ArgumentParser):
+    group = parser.add_argument_group(
+        title="Extra models args (including VAE, text encoders and tokenizers)"
+    )
+
+    # - VAE
+    group.add_argument(
+        "--vae",
+        type=str,
+        default="884-16c-hy",
+        choices=list(VAE_PATH),
+        help="Name of the VAE model.",
+    )
+    group.add_argument(
+        "--vae-precision",
+        type=str,
+        default="fp16",
+        choices=PRECISIONS,
+        help="Precision mode for the VAE model.",
+    )
+    group.add_argument(
+        "--vae-tiling",
+        action="store_true",
+        help="Enable tiling for the VAE model to save GPU memory. Enabled by default.",
+    )
+    group.set_defaults(vae_tiling=True)
+
+    group.add_argument(
+        "--text-encoder",
+        type=str,
+        default="llm",
+        choices=list(TEXT_ENCODER_PATH),
+        help="Name of the text encoder model.",
+    )
+    group.add_argument(
+        "--text-encoder-precision",
+        type=str,
+        default="fp16",
+        choices=PRECISIONS,
+        help="Precision mode for the text encoder model.",
+    )
+    group.add_argument(
+        "--text-states-dim",
+        type=int,
+        default=4096,
+        help="Dimension of the text encoder hidden states.",
+    )
+    group.add_argument(
+        "--text-len", type=int, default=256, help="Maximum length of the text input."
+    )
+    group.add_argument(
+        "--tokenizer",
+        type=str,
+        default="llm",
+        choices=list(TOKENIZER_PATH),
+        help="Name of the tokenizer model.",
+    )
+    group.add_argument(
+        "--prompt-template",
+        type=str,
+        default="dit-llm-encode",
+        choices=PROMPT_TEMPLATE,
+        help="Image prompt template for the decoder-only text encoder model.",
+    )
+    group.add_argument(
+        "--prompt-template-video",
+        type=str,
+        default="dit-llm-encode-video",
+        choices=PROMPT_TEMPLATE,
+        help="Video prompt template for the decoder-only text encoder model.",
+    )
+    group.add_argument(
+        "--hidden-state-skip-layer",
+        type=int,
+        default=2,
+        help="Skip layer for hidden states.",
+    )
+    group.add_argument(
+        "--apply-final-norm",
+        action="store_true",
+        help="Apply final normalization to the used text encoder hidden states.",
+    )
+
+    # - CLIP
+    group.add_argument(
+        "--text-encoder-2",
+        type=str,
+        default="clipL",
+        choices=list(TEXT_ENCODER_PATH),
+        help="Name of the second text encoder model.",
+    )
+    group.add_argument(
+        "--text-encoder-precision-2",
+        type=str,
+        default="fp16",
+        choices=PRECISIONS,
+        help="Precision mode for the second text encoder model.",
+    )
+    group.add_argument(
+        "--text-states-dim-2",
+        type=int,
+        default=768,
+        help="Dimension of the second text encoder hidden states.",
+    )
+    group.add_argument(
+        "--tokenizer-2",
+        type=str,
+        default="clipL",
+        choices=list(TOKENIZER_PATH),
+        help="Name of the second tokenizer model.",
+    )
+    group.add_argument(
+        "--text-len-2",
+        type=int,
+        default=77,
+        help="Maximum length of the second text input.",
+    )
+
+    return parser
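
As written above, `--vae-tiling` is a `store_true` flag whose default is then set to `True`, so the parsed value is `True` whether or not the flag is passed. A minimal standalone sketch of that behaviour with plain `argparse` follows. The second parser, which uses `argparse.BooleanOptionalAction` (Python 3.9+) to accept `--no-vae-tiling`, is an illustrative alternative and not something the commit does; both function names are hypothetical.

```python
import argparse


def build_parser_as_in_diff() -> argparse.ArgumentParser:
    """Mirror the pattern in the diff: store_true plus set_defaults(True)."""
    parser = argparse.ArgumentParser()
    group = parser.add_argument_group(title="demo")
    group.add_argument("--vae-tiling", action="store_true")
    group.set_defaults(vae_tiling=True)
    return parser


def build_parser_with_toggle() -> argparse.ArgumentParser:
    """Alternative: generate both --vae-tiling and --no-vae-tiling."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--vae-tiling",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    return parser


if __name__ == "__main__":
    as_in_diff = build_parser_as_in_diff()
    print(as_in_diff.parse_args([]).vae_tiling)
    print(as_in_diff.parse_args(["--vae-tiling"]).vae_tiling)

    toggle = build_parser_with_toggle()
    print(toggle.parse_args(["--no-vae-tiling"]).vae_tiling)
```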
+
+
+def add_denoise_schedule_args(parser: argparse.ArgumentParser):
+    group = parser.add_argument_group(title="Denoise schedule args")
+
+    group.add_argument(
+        "--denoise-type",
+        type=str,
+        default="flow",
+        help="Denoise type for noised inputs.",
+    )
+
+    # Flow Matching
+    group.add_argument(
+        "--flow-shift",
+        type=float,
+        default=7.0,
+        help="Shift factor for flow matching schedulers.",
+    )
+    group.add_argument(
+        "--flow-reverse",
+        action="store_true",
+        help="If reverse, learning/sampling from t=1 -> t=0.",
+    )
+    group.add_argument(
+        "--flow-solver",
+        type=str,
+        default="euler",
+        help="Solver for flow matching.",
+    )
+    group.add_argument(
+        "--use-linear-quadratic-schedule",
+        action="store_true",
+        help="Use linear quadratic schedule for flow matching. "
+        "Following MovieGen (https://ai.meta.com/static-resource/movie-gen-research-paper)",
+    )
+    group.add_argument(
+        "--linear-schedule-end",
+        type=int,
+        default=25,
+        help="End step for linear quadratic schedule for flow matching.",
+    )
+
+    return parser
+
+
+def add_inference_args(parser: argparse.ArgumentParser):
+    group = parser.add_argument_group(title="Inference args")
+
+    # ======================== Model loads ========================
+    group.add_argument(
+        "--model-base",
+        type=str,
+        default="ckpts",
+        help="Root path of all the models, including t2v models and extra models.",
+    )
+    group.add_argument(
+        "--dit-weight",
+        type=str,
+        default="ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt",
+        help="Path to the HunyuanVideo model. If None, search for the model under `--model-base`. "
+        "1. If it is a file, load the model directly. "
+        "2. If it is a directory, search for the model in the directory. Two naming schemes are supported: "
+        "1) `pytorch_model_*.pt`; "
+        "2) `*_model_states.pt`, where * can be `mp_rank_00`.",
+    )
+    group.add_argument(
+        "--model-resolution",
+        type=str,
+        default="540p",
+        choices=["540p", "720p"],
+        help="Resolution of the model to load (540p or 720p).",
+    )
+    group.add_argument(
+        "--load-key",
+        type=str,
+        default="module",
+        help="Key to load the model states. 'module' for the main model, 'ema' for the EMA model.",
+    )
+    group.add_argument(
+        "--use-cpu-offload",
+        action="store_true",
+        help="Use CPU offload for the model load.",
+    )
+
+    # ======================== Inference general setting ========================
+    group.add_argument(
+        "--batch-size",
+        type=int,
+        default=1,
+        help="Batch size for inference and evaluation.",
+    )
+    group.add_argument(
+        "--infer-steps",
+        type=int,
+        default=50,
+        help="Number of denoising steps for inference.",
+    )
+    group.add_argument(
+        "--disable-autocast",
+        action="store_true",
+        help="Disable autocast for denoising loop and vae decoding in pipeline sampling.",
+    )
+    group.add_argument(
+        "--save-path",
+        type=str,
+        default="./results",
+        help="Path to save the generated samples.",
+    )
+    group.add_argument(
+        "--save-path-suffix",
+        type=str,
+        default="",
+        help="Suffix for the directory of saved samples.",
+    )
+    group.add_argument(
+        "--name-suffix",
+        type=str,
+        default="",
+        help="Suffix for the names of saved samples.",
+    )
+    group.add_argument(
+        "--num-videos",
+        type=int,
+        default=1,
+        help="Number of videos to generate for each prompt.",
+    )
+    # ---sample size---
+    group.add_argument(
+        "--video-size",
+        type=int,
+        nargs="+",
+        default=(720, 1280),
+        help="Video size for sampling. If a single value is provided, it will be used for both height "
+        "and width. If two values are provided, they will be used for height and width "
+        "respectively.",
+    )
+    group.add_argument(
+        "--video-length",
+        type=int,
+        default=129,
+        help="How many frames to sample from a video. If using a 3D VAE, the number should be 4n+1.",
+    )
+    # --- prompt ---
+    group.add_argument(
+        "--prompt",
+        type=str,
+        default=None,
+        help="Prompt for sampling during evaluation.",
+    )
+    group.add_argument(
+        "--seed-type",
+        type=str,
+        default="auto",
+        choices=["file", "random", "fixed", "auto"],
+        help="Seed type for evaluation. If file, use the seed from the CSV file. If random, generate a "
+        "random seed. If fixed, use the fixed seed given by `--seed`. If auto, `csv` will use the "
+        "seed column if available, otherwise use the fixed `seed` value. `prompt` will use the "
+        "fixed `seed` value.",
+    )
+    group.add_argument("--seed", type=int, default=None, help="Seed for evaluation.")
+
+    # Classifier-Free Guidance
+    group.add_argument(
+        "--neg-prompt", type=str, default=None, help="Negative prompt for sampling."
+    )
+    group.add_argument(
+        "--cfg-scale", type=float, default=1.0, help="Classifier free guidance scale."
+    )
+    group.add_argument(
+        "--embedded-cfg-scale",
+        type=float,
+        default=6.0,
+        help="Embedded classifier free guidance scale.",
+    )
+
+    group.add_argument(
+        "--use-fp8",
+        action="store_true",
+        help="Use fp8 for inference acceleration."
+    )
+
+    group.add_argument(
+        "--reproduce",
+        action="store_true",
+        help="Enable reproducibility by setting random seeds and deterministic algorithms.",
+    )
+
+    return parser
+
+
+def add_parallel_args(parser: argparse.ArgumentParser):
+    group = parser.add_argument_group(title="Parallel args")
+
+    # ======================== Parallelism ========================
+    group.add_argument(
+        "--ulysses-degree",
+        type=int,
+        default=1,
+        help="Ulysses degree.",
+    )
+    group.add_argument(
+        "--ring-degree",
+        type=int,
+        default=1,
+        help="Ring degree.",
+    )
+
+    return parser
+
+
+def sanity_check_args(args):
+    # VAE channels
+    vae_pattern = r"\d{2,3}-\d{1,2}c-\w+"
+    if not re.match(vae_pattern, args.vae):
+        raise ValueError(
+            f"Invalid VAE model: {args.vae}. Must be in the format of '{vae_pattern}'."
+        )
+    vae_channels = int(args.vae.split("-")[1][:-1])
+    if args.latent_channels is None:
+        args.latent_channels = vae_channels
+    if vae_channels != args.latent_channels:
+        raise ValueError(
+            f"Latent channels ({args.latent_channels}) must match the VAE channels ({vae_channels})."
+        )
+    return args
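
The check in `sanity_check_args` derives the channel count from the VAE name itself: the name must match `\d{2,3}-\d{1,2}c-\w+`, and the middle field without its trailing `c` is the number of latent channels. A minimal standalone sketch of that parsing follows; the helper name `vae_channels_from_name` is hypothetical and exists only for illustration.

```python
import re

# Same pattern as in sanity_check_args: e.g. "884-16c-hy".
VAE_PATTERN = r"\d{2,3}-\d{1,2}c-\w+"


def vae_channels_from_name(name: str) -> int:
    """Extract the latent channel count encoded in a VAE name."""
    if not re.match(VAE_PATTERN, name):
        raise ValueError(
            f"Invalid VAE model: {name}. Must be in the format of '{VAE_PATTERN}'."
        )
    # "884-16c-hy" -> ["884", "16c", "hy"] -> "16c" -> "16" -> 16
    return int(name.split("-")[1][:-1])


if __name__ == "__main__":
    print(vae_channels_from_name("884-16c-hy"))
```

Because the extracted value is an `int`, the `--latent-channels` argument needs to be parsed as an integer too for the equality check to be meaningful.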