<!-- ## **HunyuanVideo** -->
[中文阅读](./README_zh.md)
<p align="center">
<img src="./assets/logo.png" height=100>
</p>
# **HunyuanVideo-I2V** 🌅
<div align="center">
<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V"><img src="https://img.shields.io/static/v1?label=HunyuanVideo-I2V Code&message=Github&color=blue"></a> &ensp;
<a href="https://aivideo.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Web&color=green"></a> &ensp;
<a href="https://video.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Playground&message=Web&color=green"></a>
</div>
<div align="center">
<a href="https://arxiv.org/abs/2412.03603"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red"></a> &ensp;
<a href="https://aivideo.hunyuan.tencent.com/hunyuanvideo.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=High-Quality Version (~350M)&color=red"></a>
</div>
<div align="center">
<a href="https://huggingface.co/tencent/HunyuanVideo-I2V"><img src="https://img.shields.io/static/v1?label=HunyuanVideo-I2V&message=HuggingFace&color=yellow"></a> &ensp;
<!-- <a href="https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=Diffusers&color=yellow"></a> &ensp; -->
<!-- <a href="https://huggingface.co/tencent/HunyuanVideo-PromptRewrite"><img src="https://img.shields.io/static/v1?label=HunyuanVideo-PromptRewrite&message=HuggingFace&color=yellow"></a> -->
<!--
[![Replicate](https://replicate.com/zsxkib/hunyuan-video/badge)](https://replicate.com/zsxkib/hunyuan-video) -->
</div>
<p align="center">
👋 Join our <a href="assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/tv7FkG4Nwf" target="_blank">Discord</a>
</p>
-----
Following the great success of open-sourcing our [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo), we proudly present [HunyuanVideo-I2V](https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V), a new image-to-video generation framework to accelerate open-source community exploration!

This repo contains the official PyTorch model definitions, pre-trained weights, and inference/sampling code. You can find more visualizations on our [project page](https://aivideo.hunyuan.tencent.com). We have also released the LoRA training code for customizable special effects, which can be used to create more interesting video effects.

> [**HunyuanVideo: A Systematic Framework For Large Video Generation Model**](https://arxiv.org/abs/2412.03603) <br>
## 🔥🔥🔥 News!!
* Mar 13, 2025: 🚀 We release the parallel inference code for HunyuanVideo-I2V powered by [xDiT](https://github.com/xdit-project/xDiT).
* Mar 11, 2025: 🎉 We have updated the LoRA training and inference code after fixing a bug.
* Mar 07, 2025: 🔥 We have fixed the bug in our open-source version that caused ID changes. Please try the new model weights of [HunyuanVideo-I2V](https://huggingface.co/tencent/HunyuanVideo-I2V) to ensure full visual consistency in the first frame and produce higher quality videos.
* Mar 06, 2025: 👋 We release the inference code and model weights of HunyuanVideo-I2V. [Download](https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V/blob/main/ckpts/README.md).
## 🎥 Demo
### I2V Demo
<div align="center">
<video src="https://github.com/user-attachments/assets/442afb73-3092-454f-bc46-02361c285930" width="80%" poster="./assets/video_poster.jpg"> </video>
<p>Co-creator @D-aiY Director Ding Yi</p>
</div>
### First Frame Consistency Demo
| Reference Image | Generated Video |
|:----------------:|:----------------:|
| <img src="https://github.com/user-attachments/assets/83e7a097-ffca-40db-9c72-be01d866aa7d" width="80%"> | <video src="https://github.com/user-attachments/assets/f81d2c88-bb1a-43f8-b40f-1ccc20774563" width="100%"> </video> |
| <img src="https://github.com/user-attachments/assets/c385a11f-60c7-4919-b0f1-bc5e715f673c" width="80%"> | <video src="https://github.com/user-attachments/assets/0c29ede9-0481-4d40-9c67-a4b6267fdc2d" width="100%"> </video> |
| <img src="https://github.com/user-attachments/assets/5763f5eb-0be5-4b36-866a-5199e31c5802" width="95%"> | <video src="https://github.com/user-attachments/assets/a8da0a1b-ba7d-45a4-a901-5d213ceaf50e" width="100%"> </video> |
### Customizable I2V LoRA Demo
| I2V LoRA Effect | Reference Image | Generated Video |
|:---------------:|:--------------------------------:|:----------------:|
| Hair growth | <img src="./assets/demo/i2v_lora/imgs/hair_growth.png" width="40%"> | <video src="https://github.com/user-attachments/assets/06b998ae-bbde-4c1f-96cb-a25a9197d5cb" width="100%"> </video> |
| Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%" > </video> |
## 🧩 Community Contributions
If you develop or use HunyuanVideo-I2V in your projects, we would love to hear from you.
- ComfyUI-Kijai (FP8 Inference, V2V and IP2V Generation): [ComfyUI-HunyuanVideoWrapper](https://github.com/kijai/ComfyUI-HunyuanVideoWrapper) by [Kijai](https://github.com/kijai)
- HunyuanVideoGP (GPU Poor version): [HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP) by [DeepBeepMeep](https://github.com/deepbeepmeep)
- xDiT compatibility improvement: [xDiT compatibility improvement](https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V/issues/36#issuecomment-2728068507) by [pftq](https://github.com/pftq) and [xibosun](https://github.com/xibosun)
## 📑 Open-source Plan
- HunyuanVideo-I2V (Image-to-Video Model)
- [x] Inference
- [x] Checkpoints
- [x] ComfyUI
- [x] LoRA training scripts
- [x] Multi-GPU sequence parallel inference (faster inference speed on more GPUs)
## Contents
- [**HunyuanVideo-I2V** 🌅](#hunyuanvideo-i2v-)
- [🔥🔥🔥 News!!](#-news)
- [🎥 Demo](#-demo)
- [I2V Demo](#i2v-demo)
- [First Frame Consistency Demo](#first-frame-consistency-demo)
- [Customizable I2V LoRA Demo](#customizable-i2v-lora-demo)
- [🧩 Community Contributions](#-community-contributions)
- [📑 Open-source Plan](#-open-source-plan)
- [Contents](#contents)
- [**HunyuanVideo-I2V Overall Architecture**](#hunyuanvideo-i2v-overall-architecture)
- [📜 Requirements](#-requirements)
- [🛠️ Dependencies and Installation](#️-dependencies-and-installation)
- [Installation Guide for Linux](#installation-guide-for-linux)
- [🧱 Download Pretrained Models](#-download-pretrained-models)
- [🔑 Single-gpu Inference](#-single-gpu-inference)
- [Tips for Using Image-to-Video Models](#tips-for-using-image-to-video-models)
- [Using Command Line](#using-command-line)
- [More Configurations](#more-configurations)
- [🎉 Customizable I2V LoRA effects training](#-customizable-i2v-lora-effects-training)
- [Requirements](#requirements)
- [Environment](#environment)
- [Training data construction](#training-data-construction)
- [Training](#training)
- [Inference](#inference)
- [🚀 Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit)
- [Using Command Line](#using-command-line-1)
- [🔗 BibTeX](#-bibtex)
- [Acknowledgements](#acknowledgements)
---
## **HunyuanVideo-I2V Overall Architecture**
Leveraging the advanced video generation capabilities of [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo), we have extended its application to image-to-video generation tasks. To achieve this, we employ a token replace technique to effectively reconstruct and incorporate reference image information into the video generation process.
Since we utilize a pre-trained Multimodal Large Language Model (MLLM) with a decoder-only architecture as the text encoder, we can significantly enhance the model's ability to comprehend the semantic content of the input image and to seamlessly integrate information from both the image and its associated caption. Specifically, the input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data.
The overall architecture of our system is designed to maximize the synergy between image and text modalities, ensuring a robust and coherent generation of video content from static images. This integration not only improves the fidelity of the generated videos but also enhances the model's ability to interpret and utilize complex multimodal inputs. The overall architecture is as follows.
<p align="center">
<img src="./assets/backbone.png" height=300>
</p>
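As a rough illustration of the conditioning path described above, the sketch below shows one plausible wiring of the token-replace and concatenation steps. It is not code from this repository; `mllm_encode`, `vae_encode`, and `dit_blocks` are placeholder callables standing in for the MLLM text encoder, the VAE encoder, and the diffusion transformer blocks.

```python
import torch

def i2v_conditioning_sketch(ref_image, noisy_video_latents, text_tokens,
                            mllm_encode, vae_encode, dit_blocks):
    """Schematic only: image conditioning for image-to-video generation."""
    # Semantic image tokens produced by the decoder-only MLLM.
    image_tokens = mllm_encode(ref_image)                        # [B, N_img, D]

    # Token replace: the clean latent of the reference image overwrites the
    # first-frame slots of the noisy video latent, anchoring the first frame.
    first_frame_latent = vae_encode(ref_image)                   # [B, C, 1, h, w]
    video_latents = torch.cat(
        [first_frame_latent, noisy_video_latents[:, :, 1:]], dim=2)

    # Concatenate semantic image tokens with the caption tokens so that full
    # attention can mix image, text, and video information in every block.
    context = torch.cat([image_tokens, text_tokens], dim=1)      # [B, N_img+N_txt, D]
    return dit_blocks(video_latents, context)
```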
## hunyuan-I2V on DCU

hunyuan-I2V is an image-to-video generation project for multi-GPU (DCU) environments. It integrates several advanced models and supports high-resolution, high-quality video generation. The project ships a Docker containerized environment for quick deployment and testing, which makes it suitable for research, product prototyping, and teaching demos. See [Deployment and Testing on DCU](#deployment-and-testing-on-dcu) below for the container-based setup.
## 📜 Requirements
The following table shows the requirements for running the HunyuanVideo-I2V model (batch size = 1) to generate videos:
| Model | Resolution | GPU Peak Memory |
|:----------------:|:-----------:|:----------------:|
| HunyuanVideo-I2V | 720p | 60GB |
---
* An NVIDIA GPU with CUDA support is required.
* The model is tested on a single 80G GPU.
* **Minimum**: The minimum GPU memory required is 60GB for 720p (see the quick memory check sketched after this list).
* **Recommended**: We recommend using a GPU with 80GB of memory for better generation quality.
* Tested operating system: Linux
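As a quick sanity check (a small helper of our own, not part of this repository), you can verify that the visible GPU meets the 60 GB requirement before launching 720p inference:

```python
import torch

def check_gpu_memory(min_gb: float = 60.0, device_index: int = 0) -> None:
    """Warn if the selected GPU has less memory than the 720p requirement."""
    props = torch.cuda.get_device_properties(device_index)
    total_gb = props.total_memory / 1024**3
    print(f"{props.name}: {total_gb:.1f} GB total memory")
    if total_gb < min_gb:
        print(f"Warning: less than {min_gb:.0f} GB available; "
              "720p inference may fail or require CPU offload.")

check_gpu_memory()
```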
## 🛠️ Dependencies and Installation

Begin by cloning the repository:

```shell
git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V
cd HunyuanVideo-I2V
```
### Installation Guide for Linux

We recommend CUDA versions 12.4 or 11.8 for the manual installation.

Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).

```shell
# 1. Create conda environment
conda create -n HunyuanVideo-I2V python==3.11.9

# 2. Activate the environment
conda activate HunyuanVideo-I2V

# 3. Install PyTorch and other dependencies using conda
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt

# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3

# 6. Install xDiT for parallel inference (torch 2.4.0 and flash-attn 2.6.3 are recommended)
python -m pip install xfuser==0.4.0
```
In case of running into a floating point exception (core dump) on a specific GPU type, you may try the following solutions:
```shell
# Making sure you have installed CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 (or simply using our CUDA 12 docker image).
pip install nvidia-cublas-cu12==12.4.5.8
export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
```
Additionally, HunyuanVideo-I2V provides a pre-built Docker image. Use the following commands to pull and run it:

```shell
# For CUDA 12.4 (updated to avoid float point exception)
docker pull hunyuanvideo/hunyuanvideo-i2v:cuda12
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo-i2v --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo-i2v:cuda12
```

## Deployment and Testing on DCU

### Environment Deployment

#### 1. Pull the image

```bash
docker pull harbor.sourcefind.cn:5443/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.4-1230-py3.10-20260115
```

#### 2. Create the container

```bash
docker run -it \
  --network=host \
  --hostname=localhost \
  --name=hunyuan \
  -v /opt/hyhal:/opt/hyhal:ro \
  -v $PWD:/workspace \
  --ipc=host \
  --device=/dev/kfd \
  --device=/dev/mkfd \
  --device=/dev/dri \
  --shm-size=512G \
  --privileged \
  --group-add video \
  --cap-add=SYS_PTRACE \
  -u root \
  --security-opt seccomp=unconfined \
  harbor.sourcefind.cn:5443/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.4-1230-py3.10-20260115 \
  /bin/bash
```

### Test Steps

#### 1. Pull the code

```bash
git clone http://developer.sourcefind.cn/codes/bw-bestperf/hunyuanvideo-i2v.git
cd hunyuanvideo-i2v/
```

#### 2. Install dependencies

```bash
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install yunchang==0.6.0 xfuser==0.4.2
bash fix.sh  # patch for xfuser compatibility
```

#### 3. Download the models

Install ModelScope:

```bash
pip install modelscope
```

Download the required models:

```bash
mkdir ckpts
modelscope download --model Tencent-Hunyuan/HunyuanVideo --local_dir ./ckpts
modelscope download --model AI-ModelScope/HunyuanVideo-I2V --local_dir ./ckpts
modelscope download --model AI-ModelScope/clip-vit-large-patch14 --local_dir ckpts/text_encoder_2
modelscope download --model AI-ModelScope/llava-llama-3-8b-v1_1-transformers --local_dir ckpts/text_encoder_i2v
```

#### 4. Run the fix script

```bash
bash modified/fix.sh
```

#### 5. Set environment variables

Disable the HIP caching allocator to prevent out-of-memory errors:

```bash
export PYTORCH_NO_HIP_MEMORY_CACHING=1
```

The 4-GPU multi-GPU test example is shown in the [xDiT parallel inference section](#-parallel-inference-on-multiple-gpus-by-xdit) below.
## 🧱 Download Pretrained Models
Details on downloading the pretrained models are shown [here](ckpts/README.md).
## 🔑 Single-gpu Inference
Similar to [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo), HunyuanVideo-I2V supports high-resolution video generation, with resolution up to 720P and video length up to 129 frames (5 seconds).
### Tips for Using Image-to-Video Models
- **Use Concise Prompts**: To effectively guide the model's generation, keep your prompts short and to the point.
- **Include Key Elements**: A well-structured prompt should cover:
- **Main Subject**: Specify the primary focus of the video.
- **Action**: Describe the main movement or activity taking place.
- **Background (Optional)**: Set the scene for the video.
- **Camera Angle (Optional)**: Indicate the perspective or viewpoint.
- **Avoid Overly Detailed Prompts**: Lengthy or highly detailed prompts can lead to unnecessary transitions in the video output. An example prompt is sketched below.
<!-- **For image-to-video models, we recommend using concise prompts to guide the model's generation process. A good prompt should include elements such as background, main subject, action, and camera angle. Overly long or excessively detailed prompts may introduce unnecessary transitions.** -->
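As an illustration (our own example, not from the repository), a prompt that follows this structure could look like:

```python
# Illustrative prompt following the recommended structure.
prompt = (
    "A fluffy orange cat "                     # main subject
    "jumps onto a wooden kitchen table, "      # action
    "in a sunlit modern kitchen, "             # background (optional)
    "viewed from a low, static camera angle."  # camera angle (optional)
)
print(prompt)
```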
### Using Command Line
<!-- ### Run a Gradio Server
```bash
python3 gradio_server.py --flow-reverse

# set SERVER_NAME and SERVER_PORT manually
# SERVER_NAME=0.0.0.0 SERVER_PORT=8081 python3 gradio_server.py --flow-reverse
``` -->
If you want to generate a more **stable** video, you can set `--i2v-stability` and `--flow-shift 7.0`. Execute the following command:
```bash
cd HunyuanVideo-I2V
python3 sample_image2video.py \
--model HYVideo-T/2 \
--prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
--i2v-mode \
--i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
--i2v-resolution 720p \
--i2v-stability \
--infer-steps 50 \
--video-length 129 \
--flow-reverse \
--flow-shift 7.0 \
--seed 0 \
--embedded-cfg-scale 6.0 \
--use-cpu-offload \
--save-path ./results
```
If you want to generate a more **dynamic** video, you can **unset** `--i2v-stability` and set `--flow-shift 17.0`. Execute the following command:
```bash
cd HunyuanVideo-I2V
python3 sample_image2video.py \
--model HYVideo-T/2 \
--prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
--i2v-mode \
--i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
--i2v-resolution 720p \
--infer-steps 50 \
--video-length 129 \
--flow-reverse \
--flow-shift 17.0 \
--embedded-cfg-scale 6.0 \
--seed 0 \
--use-cpu-offload \
--save-path ./results
```
### More Configurations
We list some more useful configurations for easy usage:
| Argument | Default | Description |
|:----------------------:|:----------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| `--prompt` | None | The text prompt for video generation. |
| `--model` | HYVideo-T/2-cfgdistill | Here we use HYVideo-T/2 for I2V, HYVideo-T/2-cfgdistill is used for T2V mode. |
| `--i2v-mode` | False | Whether to open i2v mode. |
| `--i2v-image-path` | ./assets/demo/i2v/imgs/0.jpg | The reference image for video generation. |
| `--i2v-resolution` | 720p | The resolution for the generated video. |
| `--i2v-stability` | False | Whether to use stable mode for i2v inference. |
| `--video-length` | 129 | The length of the generated video. |
| `--infer-steps` | 50 | The number of steps for sampling. |
| `--flow-shift` | 7.0 | Shift factor for flow matching schedulers. We recommend 7 with `--i2v-stability` switch on for more stable video, 17 with `--i2v-stability` switch off for more dynamic video |
| `--flow-reverse` | False | If reverse, learning/sampling from t=1 -> t=0. |
| `--seed` | None | The random seed for generating video, if None, we init a random seed. |
| `--use-cpu-offload` | False | Use CPU offload for the model load to save more memory, necessary for high-res video generation. |
| `--save-path` | ./results | Path to save the generated video. |
## 🎉 Customizable I2V LoRA effects training

### Requirements

The following table shows the requirements for training the HunyuanVideo-I2V LoRA model (batch size = 1) to generate videos:

| Model | Resolution | GPU Peak Memory |
|:----------------:|:----------:|:---------------:|
| HunyuanVideo-I2V | 360p | 79GB |

* An NVIDIA GPU with CUDA support is required.
* The model is tested on a single 80G GPU.
* **Minimum**: The minimum GPU memory required is 79GB for 360p.
* **Recommended**: We recommend using a GPU with 80GB of memory for better generation quality.
* Tested operating system: Linux
* Note: You can train with 360p data and directly infer 720p videos.

### Environment

```
pip install -r requirements.txt
```
### Training data construction

Prompt description: the trigger word is written directly in the video caption. It is recommended to use a phrase or short sentence.

For example, for an AI hair growth effect, the trigger is `rapid_hair_growth, The hair of the characters in the video is growing rapidly.` followed by the original prompt.

Once you have the training video and prompt pairs, refer to [here](hyvideo/hyvae_extract/README.md) for training data construction.

### Training

```
cd HunyuanVideo-I2V
sh scripts/run_train_image2video_lora.sh
```
We list some training specific configurations for easy usage:
| Argument | Default | Description |
|:----------------:|:-------------------------------------------------------------:|:-----------------------------------------------------------:|
| `SAVE_BASE` | . | Root path for saving experimental results. |
| `EXP_NAME` | i2v_lora | Path suffix for saving experimental results. |
| `DATA_JSONS_DIR` | ./assets/demo/i2v_lora/train_dataset/processed_data/json_path | Data jsons dir generated by hyvideo/hyvae_extract/start.sh. |
| `CHIEF_IP` | 127.0.0.1 | Master node IP of the machine. |
After training, you can find `pytorch_lora_kohaya_weights.safetensors` under `{SAVE_BASE}/log_EXP/*_{EXP_NAME}/checkpoints/global_step{*}/` and pass it via `--lora-path` to perform inference.
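As a small convenience sketch (ours, assuming the default `SAVE_BASE=.` and `EXP_NAME=i2v_lora`), you can pick the most recent LoRA weight file like this and pass it to `--lora-path`:

```python
import glob
import os

# Match checkpoints written by the training script under the default paths.
pattern = ("./log_EXP/*_i2v_lora/checkpoints/global_step*/"
           "pytorch_lora_kohaya_weights.safetensors")
candidates = sorted(glob.glob(pattern), key=os.path.getmtime)
lora_path = candidates[-1] if candidates else None
print(lora_path)  # use this value for --lora-path during inference
```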
### Inference
```bash
cd HunyuanVideo-I2V
python3 sample_image2video.py \
--model HYVideo-T/2 \
--prompt "Two people hugged tightly, In the video, two people are standing apart from each other. They then move closer to each other and begin to hug tightly. The hug is very affectionate, with the two people holding each other tightly and looking into each other's eyes. The interaction is very emotional and heartwarming, with the two people expressing their love and affection for each other." \
--i2v-mode \
--i2v-image-path ./assets/demo/i2v_lora/imgs/embrace.png \
--i2v-resolution 720p \
--i2v-stability \
--infer-steps 50 \
--video-length 129 \
--flow-reverse \
--flow-shift 5.0 \
--embedded-cfg-scale 6.0 \
--seed 0 \
--use-cpu-offload \
--save-path ./results \
--use-lora \
--lora-scale 1.0 \
--lora-path ./ckpts/hunyuan-video-i2v-720p/lora/embrace_kohaya_weights.safetensors
```
We list some LoRA-specific configurations for easy usage:

| Argument | Default | Description |
|:-------------------:|:-------:|:----------------------------:|
| `--use-lora` | False | Whether to enable LoRA mode. |
| `--lora-scale` | 1.0 | Fusion scale for the LoRA model. |
| `--lora-path` | "" | Weight path for the LoRA model. |
---
## 🚀 Parallel Inference on Multiple GPUs by xDiT
[xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
It has successfully provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, SD3, etc. This repo adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo-I2V model.

### Using Command Line

For example, to generate a video with 8 GPUs, you can use the following command:

```bash
cd HunyuanVideo-I2V
ALLOW_RESIZE_FOR_SP=1 torchrun --nproc_per_node=8 \
    sample_image2video.py \
    --model HYVideo-T/2 \
    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
    --i2v-mode \
    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
    --i2v-resolution 720p \
    --infer-steps 50 \
    --video-length 129 \
    --flow-reverse \
    --flow-shift 7.0 \
    --seed 0 \
    --embedded-cfg-scale 6.0 \
    --save-path ./results \
    --ulysses-degree 8 \
    --ring-degree 1
```

### Test Example on DCU (4 GPUs)

Export the device environment and disable the HIP caching allocator:

```bash
export HIP_VISIBLE_DEVICES=4,5,6,7
export PYTORCH_NO_HIP_MEMORY_CACHING=1
```

Run multi-GPU image-to-video inference:

```bash
cd HunyuanVideo-I2V
ALLOW_RESIZE_FOR_SP=1 torchrun --nproc_per_node=4 \
    sample_image2video.py \
    --model HYVideo-T/2 \
    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
    --i2v-mode \
    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
    --i2v-resolution 720p \
    --infer-steps 50 \
    --video-length 129 \
    --flow-reverse \
    --flow-shift 17.0 \
    --seed 42 \
    --embedded-cfg-scale 6.0 \
    --save-path ./results \
    --ulysses-degree 4 \
    --ring-degree 1 \
    --num-videos 1 2>&1 | tee z_logs/bw_image2video_4ka.log
```
The number of GPUs equals the product of `--ulysses-degree` and `--ring-degree`. Feel free to adjust these parallel configurations to optimize performance.
xDiT divides the video in the latent space based on either the height or width dimension, depending on which one is divisible by the number of GPUs. Enabling `ALLOW_RESIZE_FOR_SP=1` permits xDiT to slightly adjust the input image size so that the height or width is divisible by the number of GPUs.
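As a quick sanity check on these settings (an illustration of the constraint, not repository code): `--nproc_per_node` must equal `--ulysses-degree` multiplied by `--ring-degree`, so the valid pairs for a given GPU count can be enumerated as follows.

```python
def valid_degree_pairs(num_gpus: int):
    """Enumerate (ulysses_degree, ring_degree) pairs whose product is num_gpus."""
    return [(u, num_gpus // u) for u in range(1, num_gpus + 1) if num_gpus % u == 0]

print(valid_degree_pairs(8))  # [(1, 8), (2, 4), (4, 2), (8, 1)]
print(valid_degree_pairs(4))  # [(1, 4), (2, 2), (4, 1)]
```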
The speedup of parallel inference is shown as follows.
<p align="center">
<table align="center">
<thead>
<tr>
<th colspan="4">Latency (Sec) for 1280x720 (129 frames 50 steps) on 8xGPU</th>
</tr>
<tr>
<th>1</th>
<th>2</th>
<th>4</th>
<th>8</th>
</tr>
</thead>
<tbody>
<tr>
<th>1904.08</th>
<th>934.09 (2.04x)</th>
<th>514.08 (3.70x)</th>
<th>337.58 (5.64x)</th>
</tr>
</tbody>
</table>
</p>
## 🔗 BibTeX
If you find [HunyuanVideo](https://arxiv.org/abs/2412.03603) useful for your research and applications, please cite using this BibTeX:
```BibTeX
@article{kong2024hunyuanvideo,
title={Hunyuanvideo: A systematic framework for large video generative models},
author={Kong, Weijie and Tian, Qi and Zhang, Zijian and Min, Rox and Dai, Zuozhuo and Zhou, Jin and Xiong, Jiangfeng and Li, Xin and Wu, Bo and Zhang, Jianwei and others},
journal={arXiv preprint arXiv:2412.03603},
year={2024}
}
```
---
## Configuration Options for the DCU Multi-GPU Test

| Argument | Description | Default / Example |
| -------------------- | ------------------------------ | ---------------------------------------- |
| `--model` | Name of the model to use | `HYVideo-T/2` |
| `--prompt` | Text prompt used to generate the video | `"An Asian man with short hair..."` |
| `--i2v-mode` | Enable image-to-video mode | |
| `--i2v-image-path` | Path to the input image | `./assets/demo/i2v/imgs/0.jpg` |
| `--i2v-resolution` | Output video resolution | `720p` |
| `--i2v-stability` | Enable the stability-enhanced i2v mode | |
| `--infer-steps` | Number of inference steps; affects generation quality and speed | `50` |
| `--video-length` | Length of the generated video (frames) | `129` |
| `--flow-reverse` | Reverse the flow (learning/sampling from t=1 to t=0) | |
| `--flow-shift` | Shift factor for the flow matching scheduler | `17.0` |
| `--seed` | Random seed for reproducible results | `42` |
| `--embedded-cfg-scale` | Embedded guidance (condition scaling) ratio | `6.0` |
| `--save-path` | Path where results are saved | `./results` |
| `--ulysses-degree` | Sequence-parallel (Ulysses) degree; the product of `--ulysses-degree` and `--ring-degree` must equal the number of GPUs | `4` |
| `--ring-degree` | Ring-attention parallel degree; see `--ulysses-degree` | `1` |
| `--num-videos` | Number of videos to generate | `1` |
---
## Contributing

Contributions to hunyuan-I2V are welcome! Please follow these steps:

1. Fork this repository and create a new branch for your feature or fix.
2. Write clear, well-structured commit messages.
3. Open a Pull Request that briefly describes what you changed and why.
4. Follow the project's coding conventions and testing standards.
5. Take part in code review and discuss improvements constructively.
## Acknowledgements
We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [FLUX](https://github.com/black-forest-labs/flux), [Llama](https://github.com/meta-llama/llama), [LLaVA](https://github.com/haotian-liu/LLaVA), [Xtuner](https://github.com/InternLM/xtuner), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories, for their open research and exploration.
Additionally, we also thank the Tencent Hunyuan Multimodal team for their help with the text encoder.
## License

This project is released under the MIT License. See the [LICENSE](./LICENSE) file for details.
---
<!-- ## Github Star History
<a href="https://star-history.com/#Tencent-Hunyuan/HunyuanVideo&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo&type=Date" />
</picture>
</a> -->
Thank you for your interest and support! If you have any questions, feel free to open an issue or contact the maintenance team.