Commit e2778d0d authored by litzh

Initial commit
---
name: Bug Report
about: Use this template to report bugs in the project.
title: "[Bug] "
labels: bug
assignees: ''
---
### Description
Briefly describe the bug you encountered.
### Steps to Reproduce
1. First step to reproduce the issue.
2. Second step.
3. ...
### Expected Result
Describe the normal behavior you expected.
### Actual Result
Describe the abnormal situation that actually occurred.
### Environment Information
- Operating System: [e.g., Ubuntu 22.04]
- Commit ID: [Version of the project]
### Log Information
Please provide relevant error logs or debugging information.
### Additional Information
If there is any other information that can help solve the problem, please add it here.
name: lint
on:
  pull_request:
  push:
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python 3.11
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Cache Python dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
      - name: Install pre-commit hook
        run: |
          pip install pre-commit ruff
      - name: Check pre-commit config file
        run: |
          if [ ! -f ".pre-commit-config.yaml" ]; then
            echo "Error: .pre-commit-config.yaml not found."
            exit 1
          fi
      - name: Linting
        run: |
          echo "Running pre-commit on all files..."
          pre-commit run --all-files || {
            echo "Linting failed. Please check the above output for details."
            exit 1
          }
*.pth
*.pt
*.onnx
*.pk
*.model
*.zip
*.tar
*.pyc
*.log
*.o
*.so
*.a
*.exe
*.out
.idea
**.DS_Store**
**/__pycache__/**
**.swp
.vscode/
.env
.log
*.pid
*.ipynb*
*.mp4
build/
dist/
.cache/
server_cache/
app/.gradio/
*.pkl
save_results/*
*.egg-info/
# Follow https://verdantfox.com/blog/how-to-use-git-pre-commit-hooks-the-hard-way-and-the-easy-way
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.11.0
    hooks:
      - id: ruff
        args: [--fix, --respect-gitignore, --config=pyproject.toml]
      - id: ruff-format
        args: [--config=pyproject.toml]
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
        args: ['--maxkb=3000'] # Allow files up to 3MB
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: debug-statements
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
<div align="center" style="font-family: charter;">
<h1>⚡️ LightX2V:<br> A Lightweight Video Generation Inference Framework</h1>
<img alt="logo" src="assets/img_lightx2v.png" width=75%></img>
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelTC/lightx2v)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://lightx2v-en.readthedocs.io/en/latest)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest)
[![Papers](https://img.shields.io/badge/论文集-中文-99cc2)](https://lightx2v-papers-zhcn.readthedocs.io/zh-cn/latest)
[![Docker](https://img.shields.io/badge/Docker-2496ED?style=flat&logo=docker&logoColor=white)](https://hub.docker.com/r/lightx2v/lightx2v/tags)
**\[ [English](README.md) | 中文 \]**
</div>
--------------------------------------------------------------------------------
**LightX2V** is an advanced, lightweight inference framework for image and video generation, designed to deliver efficient, high-performance generation solutions. This unified platform integrates multiple state-of-the-art generation techniques and supports a variety of tasks, including text-to-video (T2V), image-to-video (I2V), text-to-image (T2I), and image editing (I2I). **"X2V" means converting different input modalities (X, such as text or images) into visual output (Vision).**
> 🌐 **Try it online now!** Experience LightX2V without any installation: **[LightX2V Online Service](https://x2v.light-ai.top/login)** - a free, lightweight, and fast AI digital-human video generation platform.
> 👋 **Join our WeChat group; the LightX2V group bot's WeChat ID is: random42seed**
## 🧾 Community Contribution Guide
Before submitting, please make sure your code follows the project's formatting conventions. You can run the following commands to keep the code style consistent across the project.
```bash
pip install ruff pre-commit
pre-commit run --all-files
```
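To run these checks automatically on every commit, you can also install the hooks defined in `.pre-commit-config.yaml`; this is standard pre-commit usage, shown here as a minimal sketch rather than a project-specific requirement:
```bash
# Install the git hook scripts so the checks run on each `git commit`
pre-commit install
# Optionally run all hooks against the entire repository once
pre-commit run --all-files
```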
In addition to contributions from the LightX2V team, we have also received contributions from community developers, including but not limited to:
- [triple-Mu](https://github.com/triple-Mu)
- [vivienfanghuagood](https://github.com/vivienfanghuagood)
- [yeahdongcn](https://github.com/yeahdongcn)
- [kikidouloveme79](https://github.com/kikidouloveme79)
## :fire: Latest News
- **January 20, 2026:** 🚀 We added support for the [LTX-2](https://huggingface.co/Lightricks/LTX-2) audio-video generation model, with advanced features such as CFG parallelism, block-level offload, and FP8 per-tensor quantization. For usage examples, see [examples/ltx2](https://github.com/ModelTC/LightX2V/tree/main/examples/ltx2) and [scripts/ltx2](https://github.com/ModelTC/LightX2V/tree/main/scripts/ltx2).
- **January 6, 2026:** 🚀 We released 8-step CFG/step-distilled models for [Qwen-Image-2512](https://huggingface.co/Qwen/Qwen-Image-2512) and [Qwen/Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511). The corresponding weights can be downloaded from [Qwen-Image-Edit-2511-Lightning](https://huggingface.co/lightx2v/Qwen-Image-Edit-2511-Lightning) and [Qwen-Image-2512-Lightning](https://huggingface.co/lightx2v/Qwen-Image-2512-Lightning). See the tutorial [here](https://github.com/ModelTC/LightX2V/tree/main/examples/qwen_image).
- **January 6, 2026:** 🚀 Added deployment support for Enflame S60 (GCU).
- **December 31, 2025:** 🚀 We provided day-0 support for the [Qwen-Image-2512](https://huggingface.co/Qwen/Qwen-Image-2512) text-to-image model. Our [HuggingFace](https://huggingface.co/lightx2v/Qwen-Image-2512-Lightning) page has been updated with CFG/step-distilled LoRA weights. See the usage guide [here](https://github.com/ModelTC/LightX2V/tree/main/examples/qwen_image).
- **December 27, 2025:** 🚀 Added deployment support for Moore Threads MUSA.
- **December 25, 2025:** 🚀 Added deployment support for AMD ROCm and Ascend 910B.
- **December 23, 2025:** 🚀 We provided day-0 support for the [Qwen/Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511) image-editing model. On a single H100, LightX2V delivers roughly a 1.4x speedup and supports CFG parallelism / Ulysses parallelism, efficient offloading, and more. Our [HuggingFace](https://huggingface.co/lightx2v/Qwen-Image-Edit-2511-Lightning) page has been updated with CFG/step-distilled LoRA and FP8 weights; see the usage guide [here](https://github.com/ModelTC/LightX2V/tree/main/examples/qwen_image). Combining LightX2V with 4-step CFG/step distillation and the FP8 model yields up to a ~42x speedup. You can try it via the Qwen-Image-Edit-2511 image-to-image option of the [LightX2V Online Service](https://x2v.light-ai.top/login).
- **December 22, 2025:** 🚀 Added support for a **Wan2.1 NVFP4 quantization-aware 4-step distilled model**; the model and weights are published on HuggingFace: [Wan-NVFP4](https://huggingface.co/lightx2v/Wan-NVFP4).
- **December 15, 2025:** 🚀 Added deployment support for Hygon DCU hardware.
- **December 4, 2025:** 🚀 Added support for GGUF-format model inference, as well as deployment on Cambricon MLU590 and MetaX C500 hardware.
- **November 24, 2025:** 🚀 We released 4-step distilled models for HunyuanVideo-1.5! These models support **ultra-fast 4-step inference** with no CFG required, achieving roughly a **25x speedup** over standard 50-step inference. Base and FP8-quantized versions are now available: [Hy1.5-Distill-Models](https://huggingface.co/lightx2v/Hy1.5-Distill-Models).
- **November 21, 2025:** 🚀 We provided day-0 support for the [HunyuanVideo-1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) video generation model. With the same number of GPUs, LightX2V delivers more than a 2x speedup and also enables deployment on lower-VRAM GPUs (such as a 24GB RTX 4090). It supports CFG parallelism / Ulysses parallelism, efficient offloading, and TeaCache/MagCache, and runs on domestic chips such as MetaX and Cambricon. More models, including step-distilled and VAE-distilled variants, will be published soon on our [HuggingFace page](https://huggingface.co/lightx2v). Quantized models and a lightweight VAE are already available: [Hy1.5-Quantized-Models](https://huggingface.co/lightx2v/Hy1.5-Quantized-Models) for quantized inference and the [HunyuanVideo-1.5 lightweight TAE](https://huggingface.co/lightx2v/Autoencoders/blob/main/lighttaehy1_5.safetensors) for fast VAE decoding. See the tutorial [here](https://github.com/ModelTC/LightX2V/tree/main/scripts/hunyuan_video_15), or browse the [examples directory](https://github.com/ModelTC/LightX2V/tree/main/examples) for code samples.
## 🏆 Benchmark Results (updated 2025.12.01)
### 📊 Performance Comparison Across Inference Frameworks (H100)
| Framework | GPUs | Step Time | Speedup |
|-----------|---------|---------|---------|
| Diffusers | 1 | 9.77s/it | 1x |
| xDiT | 1 | 8.93s/it | 1.1x |
| FastVideo | 1 | 7.35s/it | 1.3x |
| SGL-Diffusion | 1 | 6.13s/it | 1.6x |
| **LightX2V** | 1 | **5.18s/it** | **1.9x** 🚀 |
| FastVideo | 8 | 2.94s/it | 1x |
| xDiT | 8 | 2.70s/it | 1.1x |
| SGL-Diffusion | 8 | 1.19s/it | 2.5x |
| **LightX2V** | 8 | **0.75s/it** | **3.9x** 🚀 |
### 📊 Performance Comparison Across Inference Frameworks (RTX 4090D)
| Framework | GPUs | Step Time | Speedup |
|-----------|---------|---------|---------|
| Diffusers | 1 | 30.50s/it | 1x |
| FastVideo | 1 | 22.66s/it | 1.3x |
| xDiT | 1 | OOM | OOM |
| SGL-Diffusion | 1 | OOM | OOM |
| **LightX2V** | 1 | **20.26s/it** | **1.5x** 🚀 |
| FastVideo | 8 | 15.48s/it | 1x |
| xDiT | 8 | OOM | OOM |
| SGL-Diffusion | 8 | OOM | OOM |
| **LightX2V** | 8 | **4.75s/it** | **3.3x** 🚀 |
### 📊 Performance Comparison Across LightX2V Configurations
| Framework | GPU | Configuration | Step Time | Speedup |
|-----------|-----|---------------|-----------|---------------|
| **LightX2V** | H100 | 8 GPUs + cfg | 0.75s/it | 1x |
| **LightX2V** | H100 | 8 GPUs + no cfg | 0.39s/it | 1.9x |
| **LightX2V** | H100 | **8 GPUs + no cfg + fp8** | **0.35s/it** | **2.1x** 🚀 |
| **LightX2V** | 4090D | 8 GPUs + cfg | 4.75s/it | 1x |
| **LightX2V** | 4090D | 8 GPUs + no cfg | 3.13s/it | 1.5x |
| **LightX2V** | 4090D | **8 GPUs + no cfg + fp8** | **2.35s/it** | **2.0x** 🚀 |
**Note**: All results above were measured on Wan2.1-I2V-14B-480P (40 steps, 81 frames). In addition, 4-step distilled models are available on our [HuggingFace page](https://huggingface.co/lightx2v).
## 💡 Quick Start
For detailed usage instructions, please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/) | [Chinese Docs](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**
**We strongly recommend using the Docker environment; it is the simplest and fastest way to get set up. See the Quick Start chapter of the documentation for details.**
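As a rough sketch of the Docker route (the image repository comes from the Docker Hub badge above; the `latest` tag is only an assumption, so check the tags page for the image that matches your environment):
```bash
# Pull the LightX2V image and start an interactive container with GPU access
docker pull lightx2v/lightx2v:latest
docker run --gpus all -it --rm lightx2v/lightx2v:latest
```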
### Install from Git
```bash
pip install -v git+https://github.com/ModelTC/LightX2V.git
```
### Build from Source
```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v . # pip install -v .
```
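To quickly confirm that the installation succeeded, you can try importing the pipeline class used in the example below:
```bash
python -c "from lightx2v import LightX2VPipeline; print('LightX2V import OK')"
```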
### (Optional) Install Attention/Quantization Kernels
For instructions on installing the attention kernels, please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/getting_started/quickstart.html#step-4-install-attention-operators) | [Chinese Docs](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html#id9)**
### Usage Example
```python
# examples/wan/wan_i2v.py
"""
Wan2.2 image-to-video generation example.
This example demonstrates how to use LightX2V with Wan2.2 model for I2V generation.
"""
from lightx2v import LightX2VPipeline

# Initialize pipeline for Wan2.2 I2V task
# For wan2.1, use model_cls="wan2.1"
pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.2-I2V-A14B",
    model_cls="wan2.2_moe",
    task="i2v",
)

# Alternative: create generator from config JSON file
# pipe.create_generator(
#     config_json="configs/wan22/wan_moe_i2v.json"
# )

# Enable offloading to significantly reduce VRAM usage with minimal speed impact
# Suitable for RTX 30/40/50 consumer GPUs
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",  # For Wan models, supports both "block" and "phase"
    text_encoder_offload=True,
    image_encoder_offload=False,
    vae_offload=False,
)

# Create generator manually with specified parameters
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=40,
    height=480,  # Can be set to 720 for higher resolution
    width=832,  # Can be set to 1280 for higher resolution
    num_frames=81,
    guidance_scale=[3.5, 3.5],  # For wan2.1, guidance_scale is a scalar (e.g., 5.0)
    sample_shift=5.0,
)

# Generation parameters
seed = 42
prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
negative_prompt = "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
image_path = "/path/to/img_0.jpg"
save_result_path = "/path/to/save_results/output.mp4"

# Generate video
pipe.generate(
    seed=seed,
    image_path=image_path,
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_result_path=save_result_path,
)
```
**NVFP4 (quantization-aware, 4-step) resources**
- Inference examples: `examples/wan/wan_i2v_nvfp4.py` (I2V), `examples/wan/wan_t2v_nvfp4.py` (T2V).
- NVFP4 kernel build/installation guide: see `lightx2v_kernel/README.md`.
> 💡 **More examples**: For more use cases, including advanced configurations such as quantization, offloading, and caching, see the [examples directory](https://github.com/ModelTC/LightX2V/tree/main/examples).
## 🤖 Supported Model Ecosystem
### Official Open-Source Models
- [LTX-2](https://huggingface.co/Lightricks/LTX-2)
- [HunyuanVideo-1.5](https://huggingface.co/tencent/HunyuanVideo-1.5)
- [Wan2.1 & Wan2.2](https://huggingface.co/Wan-AI/)
- [Qwen-Image](https://huggingface.co/Qwen/Qwen-Image)
- [Qwen-Image-Edit](https://huggingface.co/spaces/Qwen/Qwen-Image-Edit)
- [Qwen-Image-Edit-2509](https://huggingface.co/Qwen/Qwen-Image-Edit-2509)
- [Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511)
### Quantized and Distilled Models / LoRAs (**🚀 Recommended: 4-step inference**)
- [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-Distill-Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
- [Wan2.1-Distill-Loras](https://huggingface.co/lightx2v/Wan2.1-Distill-Loras)
- [Wan2.2-Distill-Loras](https://huggingface.co/lightx2v/Wan2.2-Distill-Loras)
- [Wan2.1-Distill-NVFP4](https://huggingface.co/lightx2v/Wan-NVFP4)
- [Qwen-Image-Edit-2511-Lightning](https://huggingface.co/lightx2v/Qwen-Image-Edit-2511-Lightning)
### Lightweight Autoencoder Models (**🚀 Recommended: fast inference + low memory footprint**)
- [Autoencoders](https://huggingface.co/lightx2v/Autoencoders)
### Autoregressive Models
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)
- [Self-Forcing](https://github.com/guandeh17/Self-Forcing)
- [Matrix-Game-2.0](https://huggingface.co/Skywork/Matrix-Game-2.0)
🔔 Follow our [HuggingFace page](https://huggingface.co/lightx2v) to stay up to date with our team's latest models.
💡 See the [model structure documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html) to get started with LightX2V quickly.
## 🚀 Frontend Demos
We provide several ways to deploy a frontend interface:
- **🎨 Gradio UI**: a clean, easy-to-use web interface, well suited for quick trials and prototyping
  - 📖 [Gradio deployment guide](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_gradio.html)
- **🎯 ComfyUI**: a powerful node-based workflow interface that supports complex video generation tasks
  - 📖 [ComfyUI deployment guide](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_comfyui.html)
- **🚀 Windows one-click deployment**: a convenient option for Windows users, with automatic environment configuration and intelligent parameter tuning
  - 📖 [Windows one-click deployment guide](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_local_windows.html)
**💡 Recommendations**:
- **First-time users**: the Windows one-click deployment is the easiest way to start
- **Advanced users**: ComfyUI offers the most customization options
- **Quick trial**: the Gradio UI provides the most intuitive experience
## 🚀 Key Features
### 🎯 **Extreme Performance Optimization**
- **🔥 SOTA inference speed**: up to a **20x** speedup on a single GPU through step distillation and system-level optimization
- **⚡️ Revolutionary 4-step distillation**: compresses the original 40-50 inference steps down to just 4, with no CFG required
- **🛠️ Advanced kernel support**: integrates leading kernels including [Sage Attention](https://github.com/thu-ml/SageAttention), [Flash Attention](https://github.com/Dao-AILab/flash-attention), [Radial Attention](https://github.com/mit-han-lab/radial-attention), [q8-kernel](https://github.com/KONAKONA666/q8_kernels), [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel), and [vllm](https://github.com/vllm-project/vllm)
### 💾 **Resource-Efficient Deployment**
- **💡 Breaking hardware limits**: run 14B models and generate 480P/720P video with **only 8GB VRAM + 16GB RAM**
- **🔧 Smart parameter offloading**: an advanced disk-CPU-GPU three-tier offload architecture with fine-grained phase- and block-level management
- **⚙️ Comprehensive quantization support**: multiple quantization strategies including `w8a8-int8`, `w8a8-fp8`, and `w4a4-nvfp4` (see the sketch below)
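A minimal sketch of a low-VRAM setup, using only the pipeline calls shown in the usage example above; the flag combination here is illustrative, not a tuned recipe:
```python
from lightx2v import LightX2VPipeline

# Same pipeline setup as in the usage example above
pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.2-I2V-A14B",
    model_cls="wan2.2_moe",
    task="i2v",
)
# Keep weights on CPU between uses; "phase" is the finer-grained alternative
# to "block" granularity for Wan models (see the example above)
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="phase",
    text_encoder_offload=True,
    image_encoder_offload=False,
    vae_offload=False,
)
```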
### 🎨 **Rich Feature Ecosystem**
- **📈 Smart feature caching**: intelligent caching eliminates redundant computation and improves efficiency
- **🔄 Parallel inference**: multi-GPU parallelism for significant performance gains
- **📱 Flexible deployment options**: Gradio, service-oriented deployment, ComfyUI, and more
- **🎛️ Dynamic-resolution inference**: adaptive resolution adjustment for better generation quality
- **🎞️ Video frame interpolation**: RIFE-based interpolation for smooth frame-rate upscaling
## 📚 Technical Documentation
### 📖 **Method Tutorials**
- [Model Quantization](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/quantization.html) - a comprehensive guide to quantization strategies
- [Feature Caching](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/cache.html) - the smart caching mechanism in detail
- [Attention Mechanisms](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/attention.html) - state-of-the-art attention kernels
- [Parameter Offloading](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/offload.html) - the three-tier storage architecture
- [Parallel Inference](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/parallel.html) - multi-GPU acceleration strategies
- [Changing-Resolution Inference](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/changing_resolution.html) - the U-shaped resolution strategy
- [Step Distillation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/step_distill.html) - 4-step inference techniques
- [Video Frame Interpolation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/video_frame_interpolation.html) - RIFE-based frame interpolation
### 🛠️ **Deployment Guides**
- [Low-Resource Deployment](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/for_low_resource.html) - an optimized 8GB-VRAM solution
- [Low-Latency Deployment](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/for_low_latency.html) - optimizations for the fastest inference
- [Gradio Deployment](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_gradio.html) - building a web interface
- [Service Deployment](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_service.html) - production-grade API services
- [LoRA Deployment](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/lora_deploy.html) - flexible LoRA deployment
## 🤝 Acknowledgements
We sincerely thank all the model repositories and research communities that have inspired and supported the development of LightX2V. This framework is built on the collective efforts of the open-source community, including but not limited to:
- [Tencent-Hunyuan](https://github.com/Tencent-Hunyuan)
- [Wan-Video](https://github.com/Wan-Video)
- [Qwen-Image](https://github.com/QwenLM/Qwen-Image)
- [LightLLM](https://github.com/ModelTC/LightLLM)
- [sglang](https://github.com/sgl-project/sglang)
- [vllm](https://github.com/vllm-project/vllm)
- [flash-attention](https://github.com/Dao-AILab/flash-attention)
- [SageAttention](https://github.com/thu-ml/SageAttention)
- [flashinfer](https://github.com/flashinfer-ai/flashinfer)
- [MagiAttention](https://github.com/SandAI-org/MagiAttention)
- [radial-attention](https://github.com/mit-han-lab/radial-attention)
- [xDiT](https://github.com/xdit-project/xDiT)
- [FastVideo](https://github.com/hao-ai-lab/FastVideo)
## 🌟 Star History
[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/lightx2v&type=Timeline)](https://star-history.com/#ModelTC/lightx2v&Timeline)
## ✏️ Citation
If you find LightX2V useful for your research, please consider citing our work:
```bibtex
@misc{lightx2v,
author = {LightX2V Contributors},
title = {LightX2V: Light Video Generation Inference Framework},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```
## 📞 Contact & Support
For questions, suggestions, or support, feel free to reach us through:
- 🐛 [GitHub Issues](https://github.com/ModelTC/lightx2v/issues) - bug reports and feature requests
---
<div align="center">
Built with ❤️ by the LightX2V team
</div>
# Gradio Demo
Please refer to our Gradio deployment docs:
[English doc: Gradio Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_gradio.html)
[中文文档: Gradio 部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_gradio.html)
## 🚀 Quick Start
For Windows users, we provide a convenient one-click deployment solution with automatic environment configuration and intelligent parameter optimization. Please refer to the [One-Click Gradio Launch](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_local_windows.html) section for detailed instructions.
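On Linux/macOS you can also launch the unified demo entry point directly; a minimal sketch (paths are placeholders, and the port and language flags follow the defaults defined in `gradio_demo.py` below):
```bash
python gradio_demo.py --model_path /path/to/models --server_port 7862 --lang en
```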
"""
重构后的 Gradio Demo 主入口文件
整合了所有模块,支持中英文切换
"""
import argparse
import gc
import json
import logging
import os
import warnings
import torch
from loguru import logger
from utils.i18n import DEFAULT_LANG, set_language
from utils.model_utils import cleanup_memory, extract_op_name, get_model_configs
from utils.ui_builder import build_ui, generate_unique_filename, get_auto_config_dict
from lightx2v.utils.input_info import init_empty_input_info, update_input_info_from_dict
from lightx2v.utils.set_config import get_default_config
warnings.filterwarnings("ignore", category=UserWarning, module="huggingface_hub")
warnings.filterwarnings("ignore", category=UserWarning, module="huggingface_hub.utils")
logging.getLogger("httpx").setLevel(logging.ERROR)
logging.getLogger("httpcore").setLevel(logging.ERROR)
logging.getLogger("urllib3").setLevel(logging.ERROR)
os.environ["PROFILING_DEBUG_LEVEL"] = "2"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["DTYPE"] = "BF16"
logger.add(
"inference_logs.log",
rotation="100 MB",
encoding="utf-8",
enqueue=True,
backtrace=True,
diagnose=True,
)
global_runner = None
current_config = None
cur_dit_path = None
def run_inference(
prompt="",
negative_prompt="",
save_result_path="",
infer_steps=4,
num_frames=81,
resolution="480p",
seed=42,
sample_shift=5,
cfg_scale=1,
fps=16,
model_path_input=None,
model_type_input="wan2.1",
task_type_input="i2v",
dit_path_input=None,
high_noise_path_input=None,
low_noise_path_input=None,
t5_path_input=None,
clip_path_input="",
image_path=None,
vae_path=None,
qwen_image_dit_path_input=None,
qwen_image_vae_path_input=None,
qwen_image_scheduler_path_input=None,
qwen25vl_encoder_path_input=None,
z_image_dit_path_input=None,
z_image_vae_path_input=None,
z_image_scheduler_path_input=None,
qwen3_encoder_path_input=None,
aspect_ratio="1:1",
use_lora=None,
lora_path=None,
lora_strength=None,
high_noise_lora_path=None,
low_noise_lora_path=None,
high_noise_lora_strength=None,
low_noise_lora_strength=None,
):
cleanup_memory()
auto_config = get_auto_config_dict(model_type=model_type_input, resolution=resolution, num_frames=num_frames, task_type=task_type_input)
# Read offload- and rope-related settings from auto_config
rope_chunk = auto_config["rope_chunk_val"]
rope_chunk_size = auto_config["rope_chunk_size_val"]
cpu_offload = auto_config["cpu_offload_val"]
offload_granularity = auto_config["offload_granularity_val"]
lazy_load = auto_config["lazy_load_val"]
t5_cpu_offload = auto_config["t5_cpu_offload_val"]
clip_cpu_offload = auto_config["clip_cpu_offload_val"]
vae_cpu_offload = auto_config["vae_cpu_offload_val"]
unload_modules = auto_config["unload_modules_val"]
attention_type = auto_config["attention_type_val"]
quant_op = auto_config["quant_op_val"]
use_tiling_vae = auto_config["use_tiling_vae_val"]
clean_cuda_cache = auto_config["clean_cuda_cache_val"]
quant_op = extract_op_name(quant_op)
attention_type = extract_op_name(attention_type)
task = task_type_input
is_image_output = task in ["i2i", "t2i"]
save_result_path = generate_unique_filename(output_dir, is_image=is_image_output)
# A CFG scale of 1 adds no guidance signal, so disable classifier-free guidance
enable_cfg = cfg_scale != 1
model_config = get_model_configs(
model_type_input,
model_path_input,
dit_path_input,
high_noise_path_input,
low_noise_path_input,
t5_path_input,
clip_path_input,
vae_path,
qwen_image_dit_path_input,
qwen_image_vae_path_input,
qwen_image_scheduler_path_input,
qwen25vl_encoder_path_input,
z_image_dit_path_input,
z_image_vae_path_input,
z_image_scheduler_path_input,
qwen3_encoder_path_input,
quant_op,
use_lora=use_lora,
lora_path=lora_path,
lora_strength=lora_strength,
high_noise_lora_path=high_noise_lora_path,
low_noise_lora_path=low_noise_lora_path,
high_noise_lora_strength=high_noise_lora_strength,
low_noise_lora_strength=low_noise_lora_strength,
)
model_cls = model_config["model_cls"]
model_path = model_config["model_path"]
global global_runner, current_config, cur_dit_path
logger.info(f"Auto-determined model_cls: {model_cls} (model type: {model_type_input})")
if model_cls.startswith("wan2.2"):
current_dit_path = f"{high_noise_path_input}|{low_noise_path_input}" if high_noise_path_input and low_noise_path_input else None
else:
current_dit_path = dit_path_input
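# Re-create the runner when lazy loading or module unloading is enabled, when no runner is cached yet, or when the DiT weights have changed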
needs_reinit = lazy_load or unload_modules or global_runner is None or cur_dit_path != current_dit_path
config_gradio = {
"infer_steps": infer_steps,
"target_video_length": num_frames,
"resolution": resolution,
"resize_mode": "adaptive",
"self_attn_1_type": attention_type,
"cross_attn_1_type": attention_type,
"cross_attn_2_type": attention_type,
"attn_type": attention_type,
"enable_cfg": enable_cfg,
"sample_guide_scale": cfg_scale,
"sample_shift": sample_shift,
"fps": fps,
"feature_caching": "NoCaching",
"do_mm_calib": False,
"parallel_attn_type": None,
"parallel_vae": False,
"max_area": False,
"vae_stride": (4, 8, 8),
"patch_size": (1, 2, 2),
"lora_path": None,
"strength_model": 1.0,
"use_prompt_enhancer": False,
"text_len": 512,
"denoising_step_list": [1000, 750, 500, 250],
"cpu_offload": True if "wan2.2" in model_cls else cpu_offload,
"offload_granularity": ("phase" if "wan2.2" in model_cls else offload_granularity),
"t5_cpu_offload": t5_cpu_offload,
"clip_cpu_offload": clip_cpu_offload,
"vae_cpu_offload": vae_cpu_offload,
"use_tiling_vae": use_tiling_vae,
"lazy_load": lazy_load,
"rope_chunk": rope_chunk,
"rope_chunk_size": rope_chunk_size,
"clean_cuda_cache": clean_cuda_cache,
"unload_modules": unload_modules,
"seq_parallel": False,
"warm_up_cpu_buffers": False,
"boundary_step_index": 2,
"boundary": 0.900,
"use_image_encoder": False if "wan2.2" in model_cls else True,
"rope_type": "torch",
"t5_lazy_load": lazy_load,
"bucket_shape": {
"0.667": [[480, 832], [544, 960], [720, 960]],
"1.500": [[832, 480], [960, 544], [960, 720]],
"1.000": [[480, 480], [576, 576], [720, 720]],
},
"aspect_ratio": aspect_ratio,
}
args = argparse.Namespace(
model_cls=model_cls,
seed=seed,
task=task,
model_path=model_path,
prompt_enhancer=None,
prompt=prompt,
negative_prompt=negative_prompt,
image_path=image_path,
save_result_path=save_result_path,
return_result_tensor=False,
aspect_ratio=aspect_ratio,
target_shape=[],
)
input_info = init_empty_input_info(args.task)
config = get_default_config()
config.update({k: v for k, v in vars(args).items()})
config.update(config_gradio)
config.update(model_config)
# If model_config contains lora_configs, apply LoRAs dynamically
if config.get("lora_configs"):
config["lora_dynamic_apply"] = True
logger.info(f"Using model: {model_path}")
logger.info(f"Inference config:\n{json.dumps(config, indent=4, ensure_ascii=False)}")
# Initialize a new runner or reuse the cached one
runner = global_runner
if needs_reinit:
if runner is not None:
del runner
torch.cuda.empty_cache()
gc.collect()
from lightx2v.infer import init_runner
runner = init_runner(config)
data = args.__dict__
update_input_info_from_dict(input_info, data)
current_config = config
cur_dit_path = current_dit_path
if not lazy_load:
global_runner = runner
else:
runner.config = config
data = args.__dict__
update_input_info_from_dict(input_info, data)
runner.run_pipeline(input_info)
cleanup_memory()
return save_result_path
def main(lang=DEFAULT_LANG):
"""主函数"""
set_language(lang)
demo = build_ui(
model_path=model_path,
output_dir=output_dir,
run_inference=run_inference,
lang=lang,
)
# Launch the Gradio app
demo.launch(
share=True,
server_port=args.server_port,
server_name=args.server_name,
inbrowser=True,
allowed_paths=[output_dir],
max_file_size="1gb",
)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Lightweight video generation")
parser.add_argument("--model_path", type=str, required=True, help="Path to the model directory")
parser.add_argument("--server_port", type=int, default=7862, help="Server port")
parser.add_argument("--server_name", type=str, default="0.0.0.0", help="Server IP address")
parser.add_argument("--output_dir", type=str, default="./outputs", help="Directory for saving generated results")
parser.add_argument("--lang", type=str, default="zh", choices=["zh", "en"], help="UI language")
args = parser.parse_args()
global model_path, model_cls, output_dir
model_path = args.model_path
model_cls = "wan2.1"
output_dir = args.output_dir
main(lang=args.lang)
#!/bin/bash
# Lightx2v Gradio Demo Startup Script
# Supports both Image-to-Video (i2v) and Text-to-Video (t2v) modes
# ==================== Configuration Area ====================
# ⚠️ Important: Please modify the following paths according to your actual environment
# 🚨 Storage Performance Tips 🚨
# 💾 Strongly recommend storing model files on SSD solid-state drives!
# 📈 SSD can significantly improve model loading speed and inference performance
# 🐌 Using mechanical hard drives (HDD) may cause slow model loading and affect overall experience
# Lightx2v project root directory path
# Example: /home/user/lightx2v or /data/video_gen/lightx2v
lightx2v_path=/data/video_gen/lightx2v_debug/LightX2V
# Model path configuration
# Example: /path/to/Wan2.1-I2V-14B-720P-Lightx2v
model_path=/models/
# Server configuration
server_name="0.0.0.0"
server_port=8036
# Output directory configuration
output_dir="./outputs"
# GPU configuration
gpu_id=0
# ==================== Environment Variables Setup ====================
export CUDA_VISIBLE_DEVICES=$gpu_id
export CUDA_LAUNCH_BLOCKING=1
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export PROFILING_DEBUG_LEVEL=2
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# ==================== Parameter Parsing ====================
# Default interface language
lang="zh"
# Parse command-line arguments
while [[ $# -gt 0 ]]; do
case $1 in
--lang)
lang="$2"
shift 2
;;
--port)
server_port="$2"
shift 2
;;
--gpu)
gpu_id="$2"
export CUDA_VISIBLE_DEVICES=$gpu_id
shift 2
;;
--output_dir)
output_dir="$2"
shift 2
;;
--model_path)
model_path="$2"
shift 2
;;
--help)
echo "🎬 Lightx2v Gradio Demo Startup Script"
echo "=========================================="
echo "Usage: $0 [options]"
echo ""
echo "📋 Available options:"
echo " --lang zh|en Interface language (default: zh)"
echo " zh: Chinese interface"
echo " en: English interface"
echo " --port PORT Server port (default: 8032)"
echo " --gpu GPU_ID GPU device ID (default: 0)"
echo " --model_path PATH Model path (default: configured in script)"
echo " --output_dir DIR Output video save directory (default: ./outputs)"
echo " --help Show this help message"
echo ""
echo "📝 Notes:"
echo " - Task type (i2v/t2v) and model type are selected in the web UI"
echo " - Model class is auto-detected based on selected diffusion model"
echo " - Edit script to configure model paths before first use"
echo " - Ensure required Python dependencies are installed"
echo " - Recommended to use GPU with 8GB+ VRAM"
echo " - 🚨 Strongly recommend storing models on SSD for better performance"
exit 0
;;
*)
echo "Unknown parameter: $1"
echo "Use --help to see help information"
exit 1
;;
esac
done
# ==================== Parameter Validation ====================
if [[ "$lang" != "zh" && "$lang" != "en" ]]; then
echo "Error: Language must be 'zh' or 'en'"
exit 1
fi
# Check if model path exists
if [[ ! -d "$model_path" ]]; then
echo "❌ Error: Model path does not exist"
echo "📁 Path: $model_path"
echo "🔧 Solutions:"
echo " 1. Check model path configuration in script"
echo " 2. Ensure model files are properly downloaded"
echo " 3. Verify path permissions are correct"
echo " 4. 💾 Recommend storing models on SSD for faster loading"
exit 1
fi
# Use the new unified entry file
demo_file="gradio_demo.py"
echo "🌏 Using unified interface (language: $lang)"
# Check if demo file exists
if [[ ! -f "$demo_file" ]]; then
echo "❌ Error: Demo file does not exist"
echo "📄 File: $demo_file"
echo "🔧 Solutions:"
echo " 1. Ensure script is run in the correct directory"
echo " 2. Check if file has been renamed or moved"
echo " 3. Re-clone or download project files"
exit 1
fi
# ==================== System Information Display ====================
echo "=========================================="
echo "🚀 Lightx2v Gradio Demo Starting..."
echo "=========================================="
echo "📁 Project path: $lightx2v_path"
echo "🤖 Model path: $model_path"
echo "🌏 Interface language: $lang"
echo "🖥️ GPU device: $gpu_id"
echo "🌐 Server address: $server_name:$server_port"
echo "📁 Output directory: $output_dir"
echo "📝 Note: Task type and model class are selected in web UI"
echo "=========================================="
# Display system resource information
echo "💻 System resource information:"
free -h | grep -E "Mem|Swap"
echo ""
# Display GPU information
if command -v nvidia-smi &> /dev/null; then
echo "🎮 GPU information:"
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits | head -1
echo ""
fi
# ==================== Start Demo ====================
echo "🎬 Starting Gradio demo..."
echo "📱 Please access in browser: http://$server_name:$server_port"
echo "⏹️ Press Ctrl+C to stop service"
echo "🔄 First startup may take several minutes to load resources..."
echo "=========================================="
# Start Python demo
if [[ "$demo_file" == "gradio_demo.py" ]]; then
python $demo_file \
--model_path "$model_path" \
--server_name "$server_name" \
--server_port "$server_port" \
--output_dir "$output_dir" \
--lang "$lang"
else
python $demo_file \
--model_path "$model_path" \
--server_name "$server_name" \
--server_port "$server_port" \
--output_dir "$output_dir"
fi
# Display final system resource usage
echo ""
echo "=========================================="
echo "📊 Final system resource usage:"
free -h | grep -E "Mem|Swap"
@echo off
chcp 65001 >nul
echo 🎬 LightX2V Gradio Windows Startup Script
echo ==========================================
REM ==================== Configuration Area ====================
REM ⚠️ Important: Please modify the following paths according to your actual environment
REM 🚨 Storage Performance Tips 🚨
REM 💾 Strongly recommend storing model files on SSD solid-state drives!
REM 📈 SSD can significantly improve model loading speed and inference performance
REM 🐌 Using mechanical hard drives (HDD) may cause slow model loading and affect overall experience
REM LightX2V project root directory path
REM Example: D:\LightX2V
set lightx2v_path=/path/to/LightX2V
REM Model path configuration
REM Model root directory path
REM Example: D:\models\LightX2V
set model_path=/path/to/LightX2V
REM Server configuration
set server_name=127.0.0.1
set server_port=8032
REM Output directory configuration
set output_dir=./outputs
REM GPU configuration
set gpu_id=0
REM ==================== Environment Variables Setup ====================
set CUDA_VISIBLE_DEVICES=%gpu_id%
set PYTHONPATH=%lightx2v_path%;%PYTHONPATH%
set PROFILING_DEBUG_LEVEL=2
set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
REM ==================== Parameter Parsing ====================
REM Default interface language
set lang=zh
REM Parse command line arguments
:parse_args
if "%1"=="" goto :end_parse
if "%1"=="--lang" (
set lang=%2
shift
shift
goto :parse_args
)
if "%1"=="--port" (
set server_port=%2
shift
shift
goto :parse_args
)
if "%1"=="--gpu" (
set gpu_id=%2
set CUDA_VISIBLE_DEVICES=%gpu_id%
shift
shift
goto :parse_args
)
if "%1"=="--output_dir" (
set output_dir=%2
shift
shift
goto :parse_args
)
if "%1"=="--help" (
echo 🎬 LightX2V Gradio Windows Startup Script
echo ==========================================
echo Usage: %0 [options]
echo.
echo 📋 Available options:
echo --lang zh^|en Interface language (default: zh)
echo zh: Chinese interface
echo en: English interface
echo --port PORT Server port (default: 8032)
echo --gpu GPU_ID GPU device ID (default: 0)
echo --output_dir OUTPUT_DIR
echo Output video save directory (default: ./outputs)
echo --help Show this help message
echo.
echo 🚀 Usage examples:
echo %0 # Default startup
echo %0 --lang zh --port 8032 # Start with specified parameters
echo %0 --lang en --port 7860 # English interface
echo %0 --gpu 1 --port 8032 # Use GPU 1
echo %0 --output_dir ./custom_output # Use custom output directory
echo.
echo 📝 Notes:
echo - Edit script to configure model path before first use
echo - Ensure required Python dependencies are installed
echo - Recommended to use GPU with 8GB+ VRAM
echo - 🚨 Strongly recommend storing models on SSD for better performance
pause
exit /b 0
)
echo Unknown parameter: %1
echo Use --help to see help information
pause
exit /b 1
:end_parse
REM ==================== Parameter Validation ====================
if "%lang%"=="zh" goto :valid_lang
if "%lang%"=="en" goto :valid_lang
echo Error: Language must be 'zh' or 'en'
pause
exit /b 1
:valid_lang
REM Check if model path exists
if not exist "%model_path%" (
echo ❌ Error: Model path does not exist
echo 📁 Path: %model_path%
echo 🔧 Solutions:
echo 1. Check model path configuration in script
echo 2. Ensure model files are properly downloaded
echo 3. Verify path permissions are correct
echo 4. 💾 Recommend storing models on SSD for faster loading
pause
exit /b 1
)
REM Select demo file based on language
if "%lang%"=="zh" (
set demo_file=gradio_demo_zh.py
echo 🌏 Using Chinese interface
) else (
set demo_file=gradio_demo.py
echo 🌏 Using English interface
)
REM Check if demo file exists
if not exist "%demo_file%" (
echo ❌ Error: Demo file does not exist
echo 📄 File: %demo_file%
echo 🔧 Solutions:
echo 1. Ensure script is run in the correct directory
echo 2. Check if file has been renamed or moved
echo 3. Re-clone or download project files
pause
exit /b 1
)
REM ==================== System Information Display ====================
echo ==========================================
echo 🚀 LightX2V Gradio Starting...
echo ==========================================
echo 📁 Project path: %lightx2v_path%
echo 🤖 Model path: %model_path%
echo 🌏 Interface language: %lang%
echo 🖥️ GPU device: %gpu_id%
echo 🌐 Server address: %server_name%:%server_port%
echo 📁 Output directory: %output_dir%
echo ==========================================
REM Display system resource information
echo 💻 System resource information:
wmic OS get TotalVisibleMemorySize,FreePhysicalMemory /format:table
REM Display GPU information
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits 2>nul
if errorlevel 1 (
echo 🎮 GPU information: Unable to get GPU info
) else (
echo 🎮 GPU information:
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits
)
REM ==================== Start Demo ====================
echo 🎬 Starting Gradio demo...
echo 📱 Please access in browser: http://%server_name%:%server_port%
echo ⏹️ Press Ctrl+C to stop service
echo 🔄 First startup may take several minutes to load resources...
echo ==========================================
REM Start Python demo
python %demo_file% ^
--model_path "%model_path%" ^
--server_name %server_name% ^
--server_port %server_port% ^
--output_dir "%output_dir%"
REM Display final system resource usage
echo.
echo ==========================================
echo 📊 Final system resource usage:
wmic OS get TotalVisibleMemorySize,FreePhysicalMemory /format:table
pause
"""国际化支持模块"""
import os
# Default language
DEFAULT_LANG = os.getenv("GRADIO_LANG", "zh")
# Translation dictionary
TRANSLATIONS = {
"zh": {
"title": "🎬 LightX2V 图片/视频生成器",
"model_config": "🗂️ 模型配置",
"model_config_hint": "💡 **提示**:请确保以下每个模型选项至少有一个已下载✅的模型可用,否则可能无法正常生成视频。",
"fp8_not_supported": "⚠️ **您的设备不支持fp8推理**,已自动隐藏包含fp8的模型选项。",
"model_type": "模型类型",
"model_type_info": "Wan2.2 需要分别指定高噪模型和低噪模型; Qwen-Image-Edit-2511 用于图片编辑(i2i); Qwen-Image-2512 用于文本生成图片(t2i); Z-Image-Turbo 用于文本生成图片(t2i)",
"qwen3_encoder": "📝 Qwen3 编码器",
"scheduler": "⏱️ 调度器",
"qwen25vl_encoder": "📝 Qwen25-VL 编码器",
"task_type": "任务类型",
"task_type_info": "I2V: 图生视频, T2V: 文生视频, T2I: 文生图, I2I: 图片编辑",
"download_source": "📥 下载源",
"download_source_info": "选择模型下载源",
"diffusion_model": "🎨 Diffusion模型",
"high_noise_model": "🔊 高噪模型",
"low_noise_model": "🔇 低噪模型",
"text_encoder": "📝 文本编码器",
"text_encoder_tokenizer": "📝 文本编码器 Tokenizer",
"image_encoder": "🖼️ 图像编码器",
"image_encoder_tokenizer": "🖼️ 图像编码器 Tokenizer",
"vae": "🎞️ VAE编码/解码器",
"attention_operator": "⚡ 注意力算子",
"attention_operator_info": "使用适当的注意力算子加速推理",
"quant_operator": "⚡矩阵乘法算子",
"quant_operator_info": "选择低精度矩阵乘法算子以加速推理",
"input_params": "📥 输入参数",
"input_image": "输入图像(可拖入多张图片)",
"image_preview": "已上传的图片预览",
"image_path": "图片路径",
"prompt": "提示词",
"prompt_placeholder": "描述视频/图片内容...",
"negative_prompt": "负向提示词",
"negative_prompt_placeholder": "不希望出现在视频/图片中的内容...",
"max_resolution": "最大分辨率",
"max_resolution_info": "如果显存不足,可调低分辨率",
"random_seed": "随机种子",
"infer_steps": "推理步数",
"infer_steps_distill": "蒸馏模型推理步数默认为4。",
"infer_steps_info": "视频生成的推理步数。增加步数可能提高质量但降低速度。",
"sample_shift": "分布偏移",
"sample_shift_info": "控制样本分布偏移的程度。值越大表示偏移越明显。",
"cfg_scale": "CFG缩放因子",
"cfg_scale_info": "控制提示词的影响强度。值越高,提示词的影响越大。当值为1时,自动禁用CFG。",
"enable_cfg": "启用无分类器引导",
"fps": "每秒帧数(FPS)",
"fps_info": "视频的每秒帧数。较高的FPS会产生更流畅的视频。",
"num_frames": "总帧数",
"num_frames_info": "视频中的总帧数。更多帧数会产生更长的视频。",
"video_duration": "视频时长(秒)",
"video_duration_info": "视频的时长(秒)。实际帧数 = 时长 × FPS。",
"output_path": "输出视频路径",
"output_path_info": "必须包含.mp4扩展名。如果留空或使用默认值,将自动生成唯一文件名。",
"output_image_path": "输出图片路径",
"output_image_path_info": "必须包含.png扩展名。如果留空或使用默认值,将自动生成唯一文件名。",
"output_result": "📤 生成的结果",
"output_image": "输出图片",
"generate_video": "🎬 生成视频",
"generate_image": "🖼️ 生成图片",
"infer_steps_image_info": "图片编辑的推理步数,默认为8。",
"aspect_ratio": "宽高比",
"aspect_ratio_info": "选择生成图片的宽高比",
"model_config_hint_image": "💡 **提示**:请确保以下每个模型选项至少有一个已下载✅的模型可用,否则可能无法正常生成图片。",
"download": "📥 下载",
"downloaded": "✅ 已下载",
"not_downloaded": "❌ 未下载",
"download_complete": "✅ {model_name} 下载完成",
"download_start": "开始从 {source} 下载 {model_name}...",
"please_select_model": "请先选择模型",
"loading_models": "正在加载 Hugging Face 模型列表缓存...",
"models_loaded": "模型列表缓存加载完成",
"use_lora": "使用 LoRA",
"lora": "🎨 LoRA",
"lora_info": "选择要使用的 LoRA 模型",
"lora_strength": "LoRA 强度",
"lora_strength_info": "控制 LoRA 的影响强度,范围 0-10",
"high_noise_lora": "🔊 高噪模型 LoRA",
"high_noise_lora_info": "选择高噪模型使用的 LoRA",
"high_noise_lora_strength": "高噪模型 LoRA 强度",
"high_noise_lora_strength_info": "控制高噪模型 LoRA 的影响强度,范围 0-10",
"low_noise_lora": "🔇 低噪模型 LoRA",
"low_noise_lora_info": "选择低噪模型使用的 LoRA",
"low_noise_lora_strength": "低噪模型 LoRA 强度",
"low_noise_lora_strength_info": "控制低噪模型 LoRA 的影响强度,范围 0-10",
},
"en": {
"title": "🎬 LightX2V Image/Video Generator",
"model_config": "🗂️ Model Configuration",
"model_config_hint": "💡 **Tip**: Please ensure at least one downloaded ✅ model is available for each model option below, otherwise video generation may fail.",
"fp8_not_supported": "⚠️ **Your device does not support fp8 inference**, fp8 model options have been automatically hidden.",
"model_type": "Model Type",
"model_type_info": "Wan2.2 requires separate high-noise and low-noise models; Qwen-Image-Edit-2511 is for image editing (i2i); Qwen-Image-2512 is for text-to-image (t2i); Z-Image-Turbo is for text-to-image (t2i)",
"qwen3_encoder": "📝 Qwen3 Encoder",
"scheduler": "⏱️ Scheduler",
"qwen25vl_encoder": "📝 Qwen25-VL Encoder",
"task_type": "Task Type",
"task_type_info": "I2V: Image-to-Video, T2V: Text-to-Video, T2I: Text-to-Image, I2I: Image Editing",
"download_source": "📥 Download Source",
"download_source_info": "Select model download source",
"diffusion_model": "🎨 Diffusion Model",
"high_noise_model": "🔊 High Noise Model",
"low_noise_model": "🔇 Low Noise Model",
"text_encoder": "📝 Text Encoder",
"text_encoder_tokenizer": "📝 Text Encoder Tokenizer",
"image_encoder": "🖼️ Image Encoder",
"image_encoder_tokenizer": "🖼️ Image Encoder Tokenizer",
"vae": "🎞️ VAE Encoder/Decoder",
"attention_operator": "⚡ Attention Operator",
"attention_operator_info": "Use appropriate attention operator to accelerate inference",
"quant_operator": "⚡ Matrix Multiplication Operator",
"quant_operator_info": "Select low-precision matrix multiplication operator to accelerate inference",
"input_params": "📥 Input Parameters",
"input_image": "Input Image (drag multiple images)",
"image_preview": "Uploaded Image Preview",
"image_path": "Image Path",
"prompt": "Prompt",
"prompt_placeholder": "Describe video/image content...",
"negative_prompt": "Negative Prompt",
"negative_prompt_placeholder": "Content you don't want in the video/image...",
"max_resolution": "Max Resolution",
"max_resolution_info": "Reduce resolution if VRAM is insufficient",
"random_seed": "Random Seed",
"infer_steps": "Inference Steps",
"infer_steps_distill": "Distill model inference steps default to 4.",
"infer_steps_info": "Number of inference steps for video generation. More steps may improve quality but reduce speed.",
"sample_shift": "Sample Shift",
"sample_shift_info": "Control the degree of sample distribution shift. Higher values indicate more obvious shift.",
"cfg_scale": "CFG Scale",
"cfg_scale_info": "Control the influence strength of prompts. Higher values mean stronger prompt influence. When value is 1, CFG is automatically disabled.",
"enable_cfg": "Enable Classifier-Free Guidance",
"fps": "Frames Per Second (FPS)",
"fps_info": "Frames per second of the video. Higher FPS produces smoother videos.",
"num_frames": "Total Frames",
"num_frames_info": "Total number of frames in the video. More frames produce longer videos.",
"video_duration": "Video Duration (seconds)",
"video_duration_info": "Duration of the video in seconds. Actual frames = duration × FPS.",
"output_path": "Output Video Path",
"output_path_info": "Must include .mp4 extension. If left empty or using default value, a unique filename will be automatically generated.",
"output_image_path": "Output Image Path",
"output_image_path_info": "Must include .png extension. If left empty or using default value, a unique filename will be automatically generated.",
"output_result": "📤 Generated Result",
"output_image": "Output Image",
"generate_video": "🎬 Generate Video",
"generate_image": "🖼️ Generate Image",
"infer_steps_image_info": "Number of inference steps for image editing, default is 8.",
"aspect_ratio": "Aspect Ratio",
"aspect_ratio_info": "Select the aspect ratio for generated images",
"model_config_hint_image": "💡 **Tip**: Please ensure at least one downloaded ✅ model is available for each model option below, otherwise image generation may fail.",
"download": "📥 Download",
"downloaded": "✅ Downloaded",
"not_downloaded": "❌ Not Downloaded",
"download_complete": "✅ {model_name} download complete",
"download_start": "Starting to download {model_name} from {source}...",
"please_select_model": "Please select a model first",
"loading_models": "Loading Hugging Face model list cache...",
"models_loaded": "Model list cache loaded",
"use_lora": "Use LoRA",
"lora": "🎨 LoRA",
"lora_info": "Select LoRA model to use",
"lora_strength": "LoRA Strength",
"lora_strength_info": "Control LoRA influence strength, range 0-10",
"high_noise_lora": "🔊 High Noise Model LoRA",
"high_noise_lora_info": "Select high noise model LoRA to use",
"high_noise_lora_strength": "High Noise Model LoRA Strength",
"high_noise_lora_strength_info": "Control high noise model LoRA influence strength, range 0-10",
"low_noise_lora": "🔇 Low Noise Model LoRA",
"low_noise_lora_info": "Select low noise model LoRA to use",
"low_noise_lora_strength": "Low Noise Model LoRA Strength",
"low_noise_lora_strength_info": "Control low noise model LoRA influence strength, range 0-10",
},
}
def t(key: str, lang: str = None) -> str:
"""获取翻译文本"""
if lang is None:
lang = DEFAULT_LANG
if lang not in TRANSLATIONS:
lang = "zh"
return TRANSLATIONS[lang].get(key, key)
def set_language(lang: str):
"""设置语言"""
global DEFAULT_LANG
if lang in TRANSLATIONS:
DEFAULT_LANG = lang
os.environ["GRADIO_LANG"] = lang