*.pkl
*.pt
*.mov
*.pth
*.npz
*.npy
*.obj
*.onnx
*.tar
*.bin
cache*
.DS_Store
*DS_Store
outputs/
workspace/experiments/
nohup*.txt
models/
i2vgen-xl
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38
# i2vgen-xl
## Paper
**I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models**
* https://arxiv.org/abs/2311.04145
## Model Architecture
This is a two-stage video generation model whose main building blocks are `3D-UNet`s. The first stage generates a low-quality video; it includes `CLIP` for extracting high-level image information (such as semantic features), `D.Enc.` (the `Encoder` of a `VQGAN`) for image compression, and `G.Enc.` for extracting low-level features (such as fine details). The second stage produces the high-quality video: conditioned on the text, the output of the first stage is resized and fed into an LDM that performs the noising and denoising process, yielding the final high-definition video.
![Alt text](readme_imgs/image-1.png)
## Algorithm
The algorithm generates videos in a cascaded manner, splitting the task into two stages: one ensures the semantic coherence of the video, and the other enhances video detail and raises the resolution.
![alt text](readme_imgs/image-2.png)
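To make the cascade concrete, the following is a schematic sketch of the two-stage data flow only; every module, tensor shape, and variable name below is a placeholder for illustration and is not taken from the actual I2VGen-XL code.
```python
# Schematic sketch of the cascaded flow (placeholder modules, not the real model).
import torch
import torch.nn as nn

class Stage(nn.Module):
    """Stand-in for a 3D-UNet diffusion stage; a single conv replaces iterative denoising."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, latent, cond):
        # The real model would run conditioned denoising here; `cond` is ignored in this sketch.
        return self.net(latent)

# Stage 1: low-quality video conditioned on image semantics (CLIP) and details (G.Enc.).
image_latent = torch.randn(1, 4, 1, 32, 32)      # placeholder for the D.Enc./VQGAN-encoded image
semantic_cond = torch.randn(1, 77, 1024)         # placeholder for CLIP features
low_res_video = Stage()(image_latent.repeat(1, 1, 16, 1, 1), semantic_cond)

# Stage 2: resize the stage-1 output and refine it with a text-conditioned LDM.
text_cond = torch.randn(1, 77, 1024)             # placeholder for the text embedding
resized = nn.functional.interpolate(low_res_video, scale_factor=(1, 2, 2), mode="trilinear")
high_res_video = Stage()(resized, text_cond)
print(high_res_video.shape)                      # e.g. torch.Size([1, 4, 16, 64, 64])
```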
## Environment Setup
### Docker (Option 1)
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38
docker run --shm-size 10g --network=host --name=vgen --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install -r requirements.txt
pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl (from whl.zip)
pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl (download from the developer community)
cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh (from whl.zip)
# Install the following as needed
yum install epel-release -y
yum localinstall --nogpgcheck https://download1.rpmfusion.org/free/el/rpmfusion-free-release-7.noarch.rpm -y
yum install ffmpeg ffmpeg-devel libsm6 libxext6 -y
### Docker (Option 2)
# Run this in the directory containing the Dockerfile
docker build -t <IMAGE_NAME>:<TAG> .
docker run --shm-size 10g --network=host --name=vgen --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install -r requirements.txt
pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl (from whl.zip)
pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl (download from the developer community)
cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh (from whl.zip)
# Install the following as needed
yum install epel-release -y
yum localinstall --nogpgcheck https://download1.rpmfusion.org/free/el/rpmfusion-free-release-7.noarch.rpm -y
yum install ffmpeg ffmpeg-devel libsm6 libxext6 -y
### Anaconda (Option 3)
1. The special deep learning libraries required by this project for DCU GPUs can be downloaded and installed from the Guanghe developer community:
https://developer.hpccube.com/tool/
DTK driver: dtk23.10.1
python: 3.8
torch: 2.1.0
torchvision: 0.16.0
triton: 2.1.0
Note: the versions of the DTK driver, python, torch, and the other DCU-related tools above must correspond to each other exactly.
2. Install the remaining, non-special libraries according to requirements.txt:
pip install -r requirements.txt
pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl (from whl.zip)
cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh (from whl.zip)
# Install as needed
conda install -c conda-forge ffmpeg
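After installation, you can optionally verify that the environment matches the version table above; a minimal check, assuming the packages are importable under these names:
```python
# Optional sanity check of the toolchain versions listed above.
import torch
import torchvision
import triton

print("torch:", torch.__version__)              # expected 2.1.0
print("torchvision:", torchvision.__version__)  # expected 0.16.0
print("triton:", triton.__version__)            # expected 2.1.0
# python should be 3.8 and the DTK driver dtk23.10.1 (check with `python -V` and your DTK install).
```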
## Dataset
The authors have not released the training dataset, and the commonly used datasets cannot currently be downloaded.
## Inference
### Model Download
https://huggingface.co/ali-vilab/i2vgen-xl/tree/main
i2vgen-xl/
├── i2vgen_xl_00854500.pth
├── open_clip_pytorch_model.bin
├── stable_diffusion_image_key_temporal_attention_x1.json
└── v2-1_512-ema-pruned.ckpt
### Command Line
python inference.py --cfg configs/i2vgen_xl_infer.yaml
python inference.py --cfg configs/i2vgen_xl_infer.yaml test_list_path data/test_list_for_i2vgen.txt test_model i2vgen-xl/i2vgen_xl_00854500.pth
`test_list_path` specifies the input image paths and their corresponding captions; please follow the format and suggestions in the demo file data/test_list_for_i2vgen.txt. `test_model` is the path of the model to load.
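For reference, each line of the demo list pairs an image path with its caption using a `|||` separator, matching the demo data files shipped in this repository. The snippet below is a minimal sketch of reading such a list; adjust the parsing if your list uses a different layout.
```python
# Read a test list of the assumed form: <image_path>|||<caption>
with open("data/test_list_for_i2vgen.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        image_path, caption = line.split("|||", maxsplit=1)
        print(image_path, "->", caption)
```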
### Gradio App
python gradio_app.py
Note: the first time this command runs it downloads the default files; once the download finishes, you need to manually comment out the code in `~/.cache/modelscope/modelscope_modules/i2vgen-xl/ms_wrapper.py`.
![alt text](readme_imgs/image-3.png)
## Results
||Input|Output|
|:---|:---|:---|
|Image|![alt text](readme_imgs/img_0001.jpg)|![alt text](readme_imgs/r.gif)|
|Prompt|A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style.||
### Accuracy
## Application Scenarios
### Algorithm Category
`Video Generation`
### Key Application Industries
`Media, Research, Education`
## Source Repository and Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/i2vgen-xl_pytorch
## References
* https://github.com/ali-vilab/VGen
# VGen
![figure1](source/VGen.jpg "figure1")
VGen is an open-source video synthesis codebase developed by the Tongyi Lab of Alibaba Group, featuring state-of-the-art video generative models. This repository includes implementations of the following methods:
- [I2VGen-xl: High-quality image-to-video synthesis via cascaded diffusion models](https://i2vgen-xl.github.io)
- [VideoComposer: Compositional Video Synthesis with Motion Controllability](https://videocomposer.github.io)
- [Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation](https://higen-t2v.github.io)
- [A Recipe for Scaling up Text-to-Video Generation with Text-free Videos](https://tf-t2v.github.io)
- [InstructVideo: Instructing Video Diffusion Models with Human Feedback](https://instructvideo.github.io)
- [DreamVideo: Composing Your Dream Videos with Customized Subject and Motion](https://dreamvideo-t2v.github.io)
- [VideoLCM: Video Latent Consistency Model](https://arxiv.org/abs/2312.09109)
- [Modelscope text-to-video technical report](https://arxiv.org/abs/2308.06571)
VGen can produce high-quality videos from the input text, images, desired motion, desired subjects, and even the feedback signals provided. It also offers a variety of commonly used video generation tools such as visualization, sampling, training, inference, join training using images and videos, acceleration, and more.
<a href='https://i2vgen-xl.github.io/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2311.04145'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/damo-vilab/I2VGen-XL) [![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-sm-dark.svg)](https://huggingface.co/papers/2311.04145)
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-a-discussion-sm-dark.svg)](https://huggingface.co/spaces/damo-vilab/I2VGen-XL/discussions) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/XUi0y7dxqEQ) <a href='https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/441039979087.mp4'><img src='source/logo.png'></a>
[![Replicate](https://replicate.com/cjwbw/i2vgen-xl/badge)](https://replicate.com/cjwbw/i2vgen-xl/)
## 🔥News!!!
- __[2024.03]__ We release the code and model of HiGen!!
- __[2024.01]__ The Gradio demo of I2VGen-XL is now available on [HuggingFace](https://huggingface.co/spaces/damo-vilab/I2VGen-XL); thanks to our colleague @[Wenmeng Zhou](https://github.com/wenmengzhou) and @[AK](https://twitter.com/_akhaliq) for the support. Welcome to try it out.
- __[2024.01]__ We now support running the Gradio app locally; thanks to our colleague @[Wenmeng Zhou](https://github.com/wenmengzhou) for the support and @[AK](https://twitter.com/_akhaliq) for the suggestion. Welcome to give it a try.
- __[2024.01]__ Thanks @[Chenxi](https://chenxwh.github.io) for supporting the running of i2vgen-xl on [![Replicate](https://replicate.com/cjwbw/i2vgen-xl/badge)](https://replicate.com/cjwbw/i2vgen-xl/). Feel free to give it a try.
- __[2024.01]__ The Gradio demo of I2VGen-XL is now available on [Modelscope](https://modelscope.cn/studios/damo/I2VGen-XL/summary); welcome to try it out.
- __[2023.12]__ We have open-sourced the code and models for [DreamTalk](https://github.com/ali-vilab/dreamtalk), which can produce high-quality talking head videos across diverse speaking styles using diffusion models.
- __[2023.12]__ We release [TF-T2V](https://tf-t2v.github.io) that can scale up existing video generation techniques using text-free videos, significantly enhancing the performance of both [Modelscope-T2V](https://arxiv.org/abs/2308.06571) and [VideoComposer](https://videocomposer.github.io) at the same time.
- __[2023.12]__ We updated the codebase to support higher versions of xformers (0.0.22) and torch 2.0+, and removed the dependency on flash_attn.
- __[2023.12]__ We release [InstructVideo](https://instructvideo.github.io/), which can accept human feedback signals to improve the VLDM.
- __[2023.12]__ We release [DreamTalk](https://dreamtalk-project.github.io), a diffusion-based expressive talking-head generation method.
- __[2023.12]__ We release the high-efficiency video generation method [VideoLCM](https://arxiv.org/abs/2312.09109)
- __[2023.12]__ We release the code and model of [I2VGen-XL](https://i2vgen-xl.github.io) and the [ModelScope T2V](https://arxiv.org/abs/2308.06571)
- __[2023.12]__ We release the T2V method [HiGen](https://higen-t2v.github.io) and customizing T2V method [DreamVideo](https://dreamvideo-t2v.github.io).
- __[2023.12]__ We write an [introduction document](doc/introduction.pdf) for VGen and compare I2VGen-XL with SVD.
- __[2023.11]__ We release a high-quality I2VGen-XL model, please refer to the [Webpage](https://i2vgen-xl.github.io)
## TODO
- [x] Release the technical papers and webpage of [I2VGen-XL](doc/i2vgen-xl.md)
- [x] Release the code and pretrained models that can generate 1280x720 videos
- [x] Release the code and models of [DreamTalk](https://github.com/ali-vilab/dreamtalk) that can generate expressive talking heads
- [ ] Release the code and pretrained models of [HumanDiff]()
- [ ] Release models optimized specifically for the human body and faces
- [ ] Release an updated version that can fully maintain identity and capture large, accurate motions simultaneously
- [ ] Release other methods and the corresponding models
## Preparation
The main features of VGen are as follows:
- Expandability, allowing for easy management of your own experiments.
- Completeness, encompassing all common components for video generation.
- Excellent performance, featuring powerful pre-trained models in multiple tasks.
### Installation
```
conda create -n vgen python=3.8
conda activate vgen
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
You also need to ensure that your system has installed the `ffmpeg` command. If it is not installed, you can install it using the following command:
```
sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6 -y
```
### Datasets
We have provided a **demo dataset** that includes images and videos, along with their lists in ``data``.
*Please note that the demo images used here are for testing purposes and were not included in the training.*
### Clone the code
```
git clone https://github.com/ali-vilab/VGen.git
cd VGen
```
## Getting Started with VGen
### (1) Train your text-to-video model
Enabling distributed training is as simple as executing the following command.
```
python train_net.py --cfg configs/t2v_train.yaml
```
In the `t2v_train.yaml` configuration file, you can specify the training data, adjust the video-to-image ratio using `frame_lens`, validate your ideas with different diffusion settings, and so on.
- Before training, you can download any of our open-source models for initialization. Our codebase supports custom initialization and `grad_scale` settings, all of which are included in the `Pretrain` item in the yaml file (see the sketch after this list).
- During training, you can view the saved models and intermediate inference results in the `workspace/experiments/t2v_train` directory.
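If you prefer to prepare such a configuration programmatically rather than editing the yaml by hand, the sketch below shows one possible way. It assumes the config parses as standard YAML with PyYAML (the project may use its own config loader), and the checkpoint path is only an example.
```python
# Minimal sketch: load t2v_train.yaml, point initialization at a downloaded checkpoint,
# and write out a custom copy. Paths are examples, not fixed project paths.
import yaml

with open("configs/t2v_train.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["Pretrain"]["resume_checkpoint"] = "models/my_downloaded_model.pth"  # example path
cfg["Pretrain"]["grad_scale"] = 0.5
cfg["frame_lens"] = [1, 16, 16, 16, 16, 32, 32, 32]  # video-to-image ratio (default values)

with open("configs/t2v_train_custom.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```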
After the training is completed, you can perform inference on the model using the following command.
```
python inference.py --cfg configs/t2v_infer.yaml
```
Then you can find the videos you generated in the `workspace/experiments/test_img_01` directory. For specific configurations such as data, models, seed, etc., please refer to the `t2v_infer.yaml` file.
*If you want to directly load our previously open-sourced [Modelscope T2V model](https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis/tree/main), please refer to [this link](https://github.com/damo-vilab/i2vgen-xl/issues/31).*
<!-- <table>
<center>
<tr>
<td ><center>
<video muted="true" autoplay="true" loop="true" height="260" src="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/441754174077.mp4"></video>
</center></td>
<td ><center>
<video muted="true" autoplay="true" loop="true" height="260" src="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/441138824052.mp4"></video>
</center></td>
</tr>
</center>
</table>
</center> -->
### (2) Run the I2VGen-XL model
(i) Download model and test data:
```
!pip install modelscope
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('damo/I2VGen-XL', cache_dir='models/', revision='v1.0.0')
```
or you can also download it through HuggingFace (https://huggingface.co/damo-vilab/i2vgen-xl):
```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/damo-vilab/i2vgen-xl
```
(ii) Run the following command:
```
python inference.py --cfg configs/i2vgen_xl_infer.yaml
```
or you can run:
```
python inference.py --cfg configs/i2vgen_xl_infer.yaml test_list_path data/test_list_for_i2vgen.txt test_model models/i2vgen_xl_00854500.pth
```
The `test_list_path` represents the input image path and its corresponding caption. Please refer to the specific format and suggestions within demo file `data/test_list_for_i2vgen.txt`. `test_model` is the path for loading the model. In a few minutes, you can retrieve the high-definition video you wish to create from the `workspace/experiments/test_list_for_i2vgen` directory. At present, we find that the current model performs inadequately on **anime images** and **images with a black background** due to the lack of relevant training data. We are consistently working to optimize it.
(iii) Run the gradio app locally:
```
python gradio_app.py
```
(iv) Run the model on ModelScope and HuggingFace:
- [Modelscope](https://modelscope.cn/studios/damo/I2VGen-XL/summary)
- [HuggingFace](https://huggingface.co/spaces/damo-vilab/I2VGen-XL)
<span style="color:red">Due to the compression of our video quality in GIF format, please click 'HERE' below to view the original video.</span>
<center>
<table>
<center>
<tr>
<td ><center>
<image height="260" src="https://img.alicdn.com/imgextra/i1/O1CN01CCEq7K1ZeLpNQqrWu_!!6000000003219-0-tps-1280-720.jpg"></image>
</center></td>
<td ><center>
<!-- <video muted="true" autoplay="true" loop="true" height="260" src="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/442125067544.mp4"></video> -->
<image height="260" src="https://img.alicdn.com/imgextra/i4/O1CN01hIQcvG1spmQMLqBo0_!!6000000005816-1-tps-1280-704.gif"></image>
</center></td>
</tr>
<tr>
<td ><center>
<p>Input Image</p>
</center></td>
<td ><center>
<p>Click <a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/442125067544.mp4">HERE</a> to view the generated video.</p>
</center></td>
</tr>
<tr>
<td ><center>
<image height="260" src="https://img.alicdn.com/imgextra/i4/O1CN01ZXY7UN23K8q4oQ3uG_!!6000000007236-2-tps-1280-720.png"></image>
</center></td>
<td ><center>
<!-- <video muted="true" autoplay="true" loop="true" height="260" src="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/441385957074.mp4"></video> -->
<image height="260" src="https://img.alicdn.com/imgextra/i1/O1CN01iaSiiv1aJZURUEY53_!!6000000003309-1-tps-1280-704.gif"></image>
</center></td>
</tr>
<tr>
<td ><center>
<p>Input Image</p>
</center></td>
<td ><center>
<p>Click <a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/441385957074.mp4">HERE</a> to view the generated video.</p>
</center></td>
</tr>
<tr>
<td ><center>
<image height="260" src="https://img.alicdn.com/imgextra/i3/O1CN01NHpVGl1oat4H54Hjf_!!6000000005242-2-tps-1280-720.png"></image>
</center></td>
<td ><center>
<!-- <video muted="true" autoplay="true" loop="true" height="260" src="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/442102706767.mp4"></video> -->
<!-- <image muted="true" height="260" src="https://img.alicdn.com/imgextra/i4/O1CN01DgLj1T240jfpzKoaQ_!!6000000007329-1-tps-1280-704.gif"></image>
-->
<image height="260" src="https://img.alicdn.com/imgextra/i4/O1CN01DgLj1T240jfpzKoaQ_!!6000000007329-1-tps-1280-704.gif"></image>
</center></td>
</tr>
<tr>
<td ><center>
<p>Input Image</p>
</center></td>
<td ><center>
<p>Click <a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/442102706767.mp4">HERE</a> to view the generated video.</p>
</center></td>
</tr>
<tr>
<td ><center>
<image height="260" src="https://img.alicdn.com/imgextra/i1/O1CN01odS61s1WW9tXen21S_!!6000000002795-0-tps-1280-720.jpg"></image>
</center></td>
<td ><center>
<!-- <video muted="true" autoplay="true" loop="true" height="260" src="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/442163934688.mp4"></video> -->
<image height="260" src="https://img.alicdn.com/imgextra/i3/O1CN01Jyk1HT28JkZtpAtY6_!!6000000007912-1-tps-1280-704.gif"></image>
</center></td>
</tr>
<tr>
<td ><center>
<p>Input Image</p>
</center></td>
<td ><center>
<p>Click <a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/442163934688.mp4">HERE</a> to view the generated video.</p>
</center></td>
</tr>
</center>
</table>
</center>
### (3) Run the HiGen model
(i) Download model:
```
!pip install modelscope
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('iic/HiGen', cache_dir='models/')
```
Then you might need the following command to move the checkpoints to the "models/" directory:
```
mv ./models/iic/HiGen/* ./models/
```
(ii) Run the following command for text-to-video generation:
```
python inference.py --cfg configs/higen_infer.yaml
```
In a few minutes, you can retrieve the videos you wish to create from the `workspace/experiments/text_list_for_t2v_share` directory.
Then you can execute the following command to perform super-resolution on the generated videos:
```
python inference.py --cfg configs/sr600_infer.yaml
```
Finally, you can retrieve the high-definition video from the `workspace/experiments/text_list_for_t2v_share` directory.
<span style="color:red">Due to the compression of our video quality in GIF format, please click 'HERE' below to view the original video.</span>
<table>
<center>
<tr>
<td ><center>
<image height="260" src="source/duck.png"></image>
</center></td>
<td ><center>
<image height="260" src="source/bat_man.png"></image>
</center></td>
</tr>
<tr>
<td ><center>
<p>Click <a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/452227605224.mp4">HERE</a> to view the generated video.</p>
</center></td>
<td ><center>
<p>Click <a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/452015792863.mp4">HERE</a> to view the generated video.</p>
</center></td>
</tr>
</center>
</table>
</center>
### (4) Other methods
In preparation!!
## Customize your own approach
Our codebase essentially supports all the commonly used components in video generation. You can manage your experiments flexibly by adding corresponding registration classes, including `ENGINE, MODEL, DATASETS, EMBEDDER, AUTO_ENCODER, VISUAL, DIFFUSION, PRETRAIN`, and can be compatible with all our open-source algorithms according to your own needs. If you have any questions, feel free to give us your feedback at any time.
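As an illustration of the registration mechanism described above, here is a hypothetical sketch of a string-keyed registry that resolves a config entry's `type` field to a class; the names are placeholders, not VGen's actual API.
```python
# Hypothetical registry sketch (placeholder names, not VGen's actual implementation).
MODEL = {}

def register_model(name):
    """Register a model class under `name` so configs can refer to it by string."""
    def wrapper(cls):
        MODEL[name] = cls
        return cls
    return wrapper

@register_model("UNetSD_MyVariant")
class UNetSDMyVariant:
    def __init__(self, in_dim=4, out_dim=4):
        self.in_dim, self.out_dim = in_dim, out_dim

# A config entry such as {'type': 'UNetSD_MyVariant', 'in_dim': 4, ...} is then
# resolved to an instance by looking up the registered class:
cfg = {"type": "UNetSD_MyVariant", "in_dim": 4, "out_dim": 4}
model = MODEL[cfg.pop("type")](**cfg)
print(type(model).__name__)
```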
## BibTeX
If this repo is useful to you, please cite our corresponding technical papers.
```bibtex
@article{2023videocomposer,
title={VideoComposer: Compositional Video Synthesis with Motion Controllability},
author={Wang, Xiang and Yuan, Hangjie and Zhang, Shiwei and Chen, Dayou and Wang, Jiuniu and Zhang, Yingya and Shen, Yujun and Zhao, Deli and Zhou, Jingren},
journal={arXiv preprint arXiv:2306.02018},
year={2023}
}
@article{2023i2vgenxl,
title={I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models},
author={Zhang, Shiwei and Wang, Jiayu and Zhang, Yingya and Zhao, Kang and Yuan, Hangjie and Qing, Zhiwu and Wang, Xiang and Zhao, Deli and Zhou, Jingren},
journal={arXiv preprint arXiv:2311.04145},
year={2023}
}
@article{wang2023modelscope,
title={Modelscope text-to-video technical report},
author={Wang, Jiuniu and Yuan, Hangjie and Chen, Dayou and Zhang, Yingya and Wang, Xiang and Zhang, Shiwei},
journal={arXiv preprint arXiv:2308.06571},
year={2023}
}
@article{dreamvideo,
title={DreamVideo: Composing Your Dream Videos with Customized Subject and Motion},
author={Wei, Yujie and Zhang, Shiwei and Qing, Zhiwu and Yuan, Hangjie and Liu, Zhiheng and Liu, Yu and Zhang, Yingya and Zhou, Jingren and Shan, Hongming},
journal={arXiv preprint arXiv:2312.04433},
year={2023}
}
@article{qing2023higen,
title={Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation},
author={Qing, Zhiwu and Zhang, Shiwei and Wang, Jiayu and Wang, Xiang and Wei, Yujie and Zhang, Yingya and Gao, Changxin and Sang, Nong },
journal={arXiv preprint arXiv:2312.04483},
year={2023}
}
@article{wang2023videolcm,
title={VideoLCM: Video Latent Consistency Model},
author={Wang, Xiang and Zhang, Shiwei and Zhang, Han and Liu, Yu and Zhang, Yingya and Gao, Changxin and Sang, Nong },
journal={arXiv preprint arXiv:2312.09109},
year={2023}
}
@article{ma2023dreamtalk,
title={DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models},
author={Ma, Yifeng and Zhang, Shiwei and Wang, Jiayu and Wang, Xiang and Zhang, Yingya and Deng, Zhidong},
journal={arXiv preprint arXiv:2312.09767},
year={2023}
}
@article{2023InstructVideo,
title={InstructVideo: Instructing Video Diffusion Models with Human Feedback},
author={Yuan, Hangjie and Zhang, Shiwei and Wang, Xiang and Wei, Yujie and Feng, Tao and Pan, Yining and Zhang, Yingya and Liu, Ziwei and Albanie, Samuel and Ni, Dong},
journal={arXiv preprint arXiv:2312.12490},
year={2023}
}
@article{TFT2V,
title={A Recipe for Scaling up Text-to-Video Generation with Text-free Videos},
author={Wang, Xiang and Zhang, Shiwei and Yuan, Hangjie and Qing, Zhiwu and Gong, Biao and Zhang, Yingya and Shen, Yujun and Gao, Changxin and Sang, Nong},
journal={arXiv preprint arXiv:2312.15770},
year={2023}
}
```
## Acknowledgement
We would like to express our gratitude for the contributions of several previous works to the development of VGen. This includes, but is not limited to, [Composer](https://arxiv.org/abs/2302.09778), [ModelScopeT2V](https://modelscope.cn/models/damo/text-to-video-synthesis/summary), [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), [OpenCLIP](https://github.com/mlfoundations/open_clip), [WebVid-10M](https://m-bain.github.io/webvid-dataset/), [LAION-400M](https://laion.ai/blog/laion-400-open-dataset/), [Pidinet](https://github.com/zhuoinoulu/pidinet) and [MiDaS](https://github.com/isl-org/MiDaS). We are committed to building upon these foundations in a way that respects their original contributions.
## Disclaimer
This open-source model is trained using the [WebVid-10M](https://m-bain.github.io/webvid-dataset/) and [LAION-400M](https://laion.ai/blog/laion-400-open-dataset/) datasets and is intended for <strong>RESEARCH/NON-COMMERCIAL USE ONLY</strong>.
# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md
build:
gpu: true
system_packages:
- libgl1-mesa-glx
- libglib2.0-0
- ffmpeg
python_version: "3.11"
python_packages:
- torch==2.0.1
- torchvision==0.15.2
- easydict==1.10
- tokenizers==0.15.0
- ftfy==6.1.1
- transformers==4.36.2
- imageio==2.33.1
- fairscale==0.4.13
- open-clip-torch==2.23.0
- chardet==5.2.0
- torchdiffeq==0.2.3
- opencv-python==4.9.0.80
- opencv-python-headless==4.9.0.80
- torchsde==0.2.6
- simplejson==3.19.2
- scikit-learn==1.3.2
- scikit-image==0.22.0
- rotary-embedding-torch==0.5.3
- pynvml==11.5.0
- triton==2.0.0
- pytorch-lightning==2.1.3
- torchmetrics==1.2.1
- PyYAML==6.0.1
run:
- pip install -U xformers --index-url https://download.pytorch.org/whl/cu118
predict: "predict.py:Predictor"
ENABLE: true
DATASET: webvid10m
TASK_TYPE: inference_higen_entrance
use_fp16: True
guide_scale: 12.0
chunk_size: 2
decoder_bs: 2
max_frames: 32
target_fps: 16 # FPS Conditions, not the encoding fps
scale: 8
seed: 0
round: 1
batch_size: 1
# For important input
vldm_cfg: configs/higen_train.yaml
test_list_path: data/text_list_for_t2v_share.txt
test_model: models/cvpr2024.t2v.e003.non_ema_0725000.pth
motion_factor: 500
appearance_factor: 1.0
TASK_TYPE: train_t2v_higen_entrance
ENABLE: true
use_ema: true
num_workers: 6
frame_lens: [32, 32, 32, 32, 32, 32, 32, 32]
sample_fps: [8, 8, 8, 8, 8, 8, 8, 8]
resolution: [448, 256]
vit_resolution: [224, 224]
vid_dataset: {
'type': 'VideoDataset',
'data_list': ['data/vid_list.txt', ],
'data_dir_list': ['data/videos/', ],
'vit_resolution': [224, 224],
'resolution': [448, 256],
'get_first_frame': True,
'max_words': 1000,
}
img_dataset: {
'type': 'ImageDataset',
'data_list': ['data/img_list.txt', ],
'data_dir_list': ['data/images', ],
'vit_resolution': [224, 224],
'resolution': [448, 256],
'max_words': 1000
}
embedder: {
'type': 'FrozenOpenCLIPTextVisualEmbedder',
'layer': 'penultimate',
'vit_resolution': [224, 224],
'pretrained': 'models/open_clip_pytorch_model.bin'
}
UNet: {
'type': 'UNetSD_HiGen',
'in_dim': 4,
'y_dim': 1024,
'upper_len': 128,
'context_dim': 1024,
'concat_dim': 4,
'out_dim': 4,
'dim_mult': [1, 2, 4, 4],
'num_heads': 8,
'default_fps': 8,
'head_dim': 64,
'num_res_blocks': 2,
'dropout': 0.1,
'temporal_attention': True,
'temporal_attn_times': 1,
'use_checkpoint': True,
'use_fps_condition': False,
'use_sim_mask': False,
'context_embedding_depth': 2,
'num_tokens': 16
}
Diffusion: {
'type': 'DiffusionDDIM',
'schedule': 'linear_sd', # linear_sd
'schedule_param': {
'num_timesteps': 1000,
'zero_terminal_snr': True,
'init_beta': 0.00085,
'last_beta': 0.0120
},
'mean_type': 'v',
'loss_type': 'mse',
'var_type': 'fixed_small',
'rescale_timesteps': False,
'noise_strength': 0.1
}
batch_sizes: {
"1": 256,
"4": 96,
"8": 48,
"16": 32,
"24": 24,
"32": 10
}
visual_train: {
'type': 'VisualTrainTextImageToVideo',
'partial_keys': [
# ['y', 'local_image', 'fps'],
# ['image', 'local_image', 'fps'],
['y', 'image', 'local_image', 'fps']
],
'use_offset_noise': True,
'guide_scale': 9.0,
}
Pretrain: {
'type': pretrain_specific_strategies,
'fix_weight': False,
'grad_scale': 0.5,
'resume_checkpoint': 'models/i2vgen_xl_00854500.pth',
'sd_keys_path': 'models/stable_diffusion_image_key_temporal_attention_x1.json',
}
chunk_size: 4
decoder_bs: 4
lr: 0.00003
noise_strength: 0.1
# classifier-free guidance
p_zero: 0.0
guide_scale: 3.0
num_steps: 1000000
use_zero_infer: True
viz_interval: 50 # 200
save_ckp_interval: 50 # 500
# Log
log_dir: "workspace/experiments"
log_interval: 1
seed: 6666
TASK_TYPE: inference_i2vgen_entrance
use_fp16: True
guide_scale: 9.0
chunk_size: 2
decoder_bs: 2
max_frames: 16
target_fps: 16 # FPS Conditions, not the encoding fps
scale: 8
seed: 8888
round: 4
batch_size: 1
use_zero_infer: True
# For important input
vldm_cfg: configs/i2vgen_xl_train.yaml
test_list_path: data/test_list_for_i2vgen.txt
test_model: i2vgen-xl/i2vgen_xl_00854500.pth
TASK_TYPE: inference_i2vgen_entrance
use_fp16: True
guide_scale: 9.0
chunk_size: 2
decoder_bs: 2
max_frames: 16
target_fps: 16 # FPS Conditions
scale: 8
batch_size: 1
use_zero_infer: True
# For important input
round: 4
seed: 0
data_root: workspace/test_imgs/test_img_01
# test_list_path: workspace/test_imgs/test_img_01.txt
test_list_path: workspace/test_imgs/test_img_02.txt
cap_dict_path: workspace/test_imgs/cap_dict_01.json
vldm_cfg: configs/i2vgen_xl_train.yaml
test_model: i2vgen-xl/i2vgen_xl_person_00854500.pth
TASK_TYPE: train_i2v_vs_img_text_entrance
ENABLE: true
use_ema: true
num_workers: 6
frame_lens: [16, 16, 16, 16, 16, 32, 32, 32]
sample_fps: [8, 8, 16, 16, 16, 8, 16, 16]
resolution: [1280, 704]
vit_resolution: [224, 224]
vid_dataset: {
'type': 'VideoDataset',
'data_list': ['data/vid_list.txt', ],
'data_dir_list': ['data/videos/', ],
'vit_resolution': [224, 224],
'resolution': [1280, 704],
'get_first_frame': True,
'max_words': 1000,
}
img_dataset: {
'type': 'ImageDataset',
'data_list': ['data/img_list.txt', ],
'data_dir_list': ['data/images', ],
'vit_resolution': [224, 224],
'resolution': [1280, 704],
'max_words': 1000
}
embedder: {
'type': 'FrozenOpenCLIPTextVisualEmbedder',
'layer': 'penultimate',
'vit_resolution': [224, 224],
'pretrained': 'i2vgen-xl/open_clip_pytorch_model.bin'
}
UNet: {
'type': 'UNetSD_I2VGen',
'in_dim': 4,
'y_dim': 1024,
'upper_len': 128,
'context_dim': 1024,
'concat_dim': 4,
'out_dim': 4,
'dim_mult': [1, 2, 4, 4],
'num_heads': 8,
'default_fps': 8,
'head_dim': 64,
'num_res_blocks': 2,
'dropout': 0.1,
'temporal_attention': True,
'temporal_attn_times': 1,
'use_checkpoint': True,
'use_fps_condition': False,
'use_sim_mask': False
}
Diffusion: {
'type': 'DiffusionDDIM',
'schedule': 'cosine', # cosine
'schedule_param': {
'num_timesteps': 1000,
'cosine_s': 0.008,
'zero_terminal_snr': True,
},
'mean_type': 'v',
'loss_type': 'mse',
'var_type': 'fixed_small',
'rescale_timesteps': False,
'noise_strength': 0.1
}
batch_sizes: {
"1": 32,
"4": 8,
"8": 4,
"16": 2,
"32": 1,
}
visual_train: {
'type': 'VisualTrainTextImageToVideo',
'partial_keys': [
# ['y', 'local_image', 'fps'],
# ['image', 'local_image', 'fps'],
['y', 'image', 'local_image', 'fps']
],
'use_offset_noise': True,
'guide_scale': 9.0,
}
Pretrain: {
'type': pretrain_specific_strategies,
'fix_weight': False,
'grad_scale': 0.5,
'resume_checkpoint': 'i2vgen-xl/i2vgen_xl_00854500.pth',
'sd_keys_path': 'i2vgen-xl/stable_diffusion_image_key_temporal_attention_x1.json',
}
chunk_size: 4
decoder_bs: 4
lr: 0.00003
noise_strength: 0.1
# classifier-free guidance
p_zero: 0.0
guide_scale: 3.0
num_steps: 1000000
use_zero_infer: True
viz_interval: 50 # 200
save_ckp_interval: 50 # 500
# Log
log_dir: "workspace/experiments"
log_interval: 1
seed: 6666
TASK_TYPE: inference_sr600_entrance
use_fp16: True
vldm_cfg: ''
round: 1
batch_size: 1
# For important input
test_list_path: data/text_list_for_t2v_share.txt
test_model: models/sr_step_110000_ema.pth
embedder: {
'type': 'FrozenOpenCLIPTextVisualEmbedder',
'layer': 'penultimate',
'vit_resolution': [224, 224],
'pretrained': 'i2vgen-xl/models/open_clip_pytorch_model.bin',
'negative_prompt': 'worst quality, normal quality, low quality, low res, blurry, text, watermark, logo, banner, extra digits, cropped, jpeg artifacts, signature, username, error, sketch ,duplicate, ugly, monochrome, horror, geometry, mutation, disgusting',
'positive_prompt': ', cinematic, High Contrast, highly detailed, Unreal Engine 5, no blur, full length ultra-wide angle shot a cinematic scene, taken using a Canon EOS R camera, hyper detailed photo - realistic maximum detail, 32k, Color Grading, portrait Photography, ultra HD, extreme meticulous detailing, skin pore detailing, hyper sharpness, perfect without deformations, 4k render'
}
UNet: {
'type': 'UNetSD_SR600',
'in_dim': 4,
'dim': 320,
'y_dim': 1024,
'context_dim': 1024,
'out_dim': 4,
'dim_mult': [1, 2, 4, 4],
'num_heads': 8,
'head_dim': 64,
'num_res_blocks': 2,
'attn_scales' :[1, 0.5, 0.25],
'use_scale_shift_norm': True,
'dropout': 0.1,
'temporal_attn_times': 1,
'temporal_attention': True,
'use_checkpoint': True,
'use_image_dataset': False,
'use_sim_mask': False,
'inpainting': True
}
Diffusion: {
'type': 'DiffusionDDIMSR',
'reverse_diffusion': {
'schedule': 'cosine',
'mean_type': 'v',
'schedule_param':
{
'num_timesteps': 1000,
'zero_terminal_snr': True
}
},
'forward_diffusion': {
'schedule': 'logsnr_cosine_interp',
'mean_type': 'v',
'schedule_param':
{
'num_timesteps': 1000,
'zero_terminal_snr': True,
'scale_min': 2.0,
'scale_max': 4.0
}
}
}
batch_sizes: {
"1": 256,
"4": 96,
"8": 48,
"16": 32,
"24": 24,
"32": 10
}
visual_train: {
'type': 'VisualTrainTextImageToVideo',
'partial_keys': [
# ['y', 'local_image', 'fps'],
# ['image', 'local_image', 'fps'],
['y', 'image', 'local_image', 'fps']
],
'use_offset_noise': True,
'guide_scale': 9.0,
}
chunk_size: 4
decoder_bs: 4
lr: 0.00003
noise_strength: 0.1
# classifier-free guidance
p_zero: 0.0
guide_scale: 3.0
num_steps: 1000000
use_zero_infer: True
viz_interval: 50 # 200
save_ckp_interval: 50 # 500
# Log
log_dir: "workspace/experiments"
log_interval: 1
seed: 6666
total_noise_levels: 700
TASK_TYPE: inference_text2video_entrance
use_fp16: True
guide_scale: 9.0
chunk_size: 2
decoder_bs: 2
max_frames: 16
target_fps: 16 # FPS Conditions, not encoding fps
scale: 8
batch_size: 1
use_zero_infer: True
# For important input
round: 4
seed: 8888
test_list_path: data/text_img_for_t2v.txt
vldm_cfg: configs/t2v_train.yaml
test_model: workspace/model_bk/model_scope_0267000.pth
TASK_TYPE: train_t2v_entrance
ENABLE: true
use_ema: false
num_workers: 6
frame_lens: [1, 16, 16, 16, 16, 32, 32, 32]
sample_fps: [1, 8, 16, 16, 16, 8, 16, 16]
resolution: [448, 256]
vit_resolution: [224, 224]
vid_dataset: {
'type': 'VideoDataset',
'data_list': ['data/vid_list.txt', ],
'data_dir_list': ['data/videos/', ],
'vit_resolution': [224, 224],
'resolution': [448, 256],
'get_first_frame': True,
'max_words': 1000,
}
img_dataset: {
'type': 'ImageDataset',
'data_list': ['data/img_list.txt', ],
'data_dir_list': ['data/images', ],
'vit_resolution': [224, 224],
'resolution': [448, 256],
'max_words': 1000
}
embedder: {
'type': 'FrozenOpenCLIPTextVisualEmbedder',
'layer': 'penultimate',
'vit_resolution': [224, 224],
'pretrained': 'models/open_clip_pytorch_model.bin'
}
UNet: {
'type': 'UNetSD_T2VBase',
'in_dim': 4,
'y_dim': 1024,
'upper_len': 128,
'context_dim': 1024,
'out_dim': 4,
'dim_mult': [1, 2, 4, 4],
'num_heads': 8,
'default_fps': 8,
'head_dim': 64,
'num_res_blocks': 2,
'dropout': 0.1,
'misc_dropout': 0.4,
'temporal_attention': True,
'temporal_attn_times': 1,
'use_checkpoint': True,
'use_fps_condition': False,
'use_sim_mask': False
}
Diffusion: {
'type': 'DiffusionDDIM',
'schedule': 'cosine', # cosine
'schedule_param': {
'num_timesteps': 1000,
'cosine_s': 0.008,
'zero_terminal_snr': True,
},
'mean_type': 'v',
'loss_type': 'mse',
'var_type': 'fixed_small',
'rescale_timesteps': False,
'noise_strength': 0.1
}
batch_sizes: {
"1": 32,
"4": 8,
"8": 4,
"16": 4,
"32": 2
}
visual_train: {
'type': 'VisualTrainTextImageToVideo',
'partial_keys': [
['y', 'fps'],
],
'use_offset_noise': False,
'guide_scale': 9.0,
}
Pretrain: {
'type': pretrain_specific_strategies,
'fix_weight': False,
'grad_scale': 0.5,
'resume_checkpoint': 'workspace/model_bk/model_scope_0267000.pth',
'sd_keys_path': 'data/stable_diffusion_image_key_temporal_attention_x1.json',
}
chunk_size: 4
decoder_bs: 4
lr: 0.00003
noise_strength: 0.1
# classifier-free guidance
p_zero: 0.1
guide_scale: 3.0
num_steps: 1000000
use_zero_infer: True
viz_interval: 5 # 200
save_ckp_interval: 50 # 500
# Log
log_dir: "workspace/experiments"
log_interval: 1
seed: 8888
s09_009187_091873942.jpg|||FOTON 4x2 4x4 Right Hand Drive Mobile Outdoor Waterproof LED Advertising Truck Manufacturer
s09_006882_068827514.jpg|||China Electric Propulsion Outboards 6HP 10HP 20HP for high
s09_003750_037507367.jpg|||Fish Farming Use HDPE Net Cage in The Sea
s09_009187_091873942.jpg|||FOTON 4x2 4x4 Right Hand Drive Mobile Outdoor Waterproof LED Advertising Truck Manufacturer
s09_006882_068827514.jpg|||China Electric Propulsion Outboards 6HP 10HP 20HP for high
s09_003750_037507367.jpg|||Fish Farming Use HDPE Net Cage in The Sea
s09_009187_091873942.jpg|||FOTON 4x2 4x4 Right Hand Drive Mobile Outdoor Waterproof LED Advertising Truck Manufacturer
s09_006882_068827514.jpg|||China Electric Propulsion Outboards 6HP 10HP 20HP for high
s09_003750_037507367.jpg|||Fish Farming Use HDPE Net Cage in The Sea
s09_009187_091873942.jpg|||FOTON 4x2 4x4 Right Hand Drive Mobile Outdoor Waterproof LED Advertising Truck Manufacturer
s09_006882_068827514.jpg|||China Electric Propulsion Outboards 6HP 10HP 20HP for high
s09_003750_037507367.jpg|||Fish Farming Use HDPE Net Cage in The Sea
s09_009187_091873942.jpg|||FOTON 4x2 4x4 Right Hand Drive Mobile Outdoor Waterproof LED Advertising Truck Manufacturer
s09_006882_068827514.jpg|||China Electric Propulsion Outboards 6HP 10HP 20HP for high
s09_003750_037507367.jpg|||Fish Farming Use HDPE Net Cage in The Sea
s09_009187_091873942.jpg|||FOTON 4x2 4x4 Right Hand Drive Mobile Outdoor Waterproof LED Advertising Truck Manufacturer
s09_006882_068827514.jpg|||China Electric Propulsion Outboards 6HP 10HP 20HP for high
s09_003750_037507367.jpg|||Fish Farming Use HDPE Net Cage in The Sea
s09_009187_091873942.jpg|||FOTON 4x2 4x4 Right Hand Drive Mobile Outdoor Waterproof LED Advertising Truck Manufacturer
s09_006882_068827514.jpg|||China Electric Propulsion Outboards 6HP 10HP 20HP for high
s09_003750_037507367.jpg|||Fish Farming Use HDPE Net Cage in The Sea