The official ComfyUI integration nodes for LightX2V are now available in a dedicated repository, providing a complete modular configuration system and optimization features.
Lightx2v is a lightweight video inference and generation engine that provides a web interface based on Gradio, supporting both Image-to-Video and Text-to-Video generation modes.
This project contains two main demo files:
- `gradio_demo.py` - English interface version
- `gradio_demo_zh.py` - Chinese interface version
## 🚀 Quick Start
### System Requirements
- Python 3.10+ (recommended)
- CUDA 12.4+ (recommended)
- At least 8GB GPU VRAM
- At least 16GB system memory (preferably at least 32GB)
- At least 128GB of SSD storage (**💾 Strongly recommended: store model files on an SSD! With "lazy loading" enabled at startup, an SSD significantly improves model loading speed and inference performance**)
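To confirm that your machine meets the requirements above, a few standard commands are enough. The sketch below assumes a Linux environment with NVIDIA drivers installed; adjust for your platform as needed.

```bash
# Python version (3.10+ recommended)
python --version

# GPU model, driver version, and the highest CUDA version the driver supports
nvidia-smi

# System memory (16GB minimum, 32GB+ preferred)
free -h

# Free space on the drive that will hold the model files
df -h .
```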
#### 🖼️ Image-to-Video Models

| Model Name | Resolution | Parameters | Features | Recommended Use |
|------------|------------|------------|----------|-----------------|
| ✅ [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v) | 720p | 14B | HD distilled version | High quality + fast inference |
#### 📝 Text-to-Video Models
| Model Name | Parameters | Features | Recommended Use |
|------------|------------|----------|-----------------|
**💡 Tip**: Generally, after enabling "Auto-configure Inference Options", the system will automatically optimize parameter settings based on your hardware configuration, and performance issues usually won't occur. If you encounter problems, please refer to the following solutions:
1. **Insufficient CUDA Memory**
   - Enable CPU offloading
   - Reduce the resolution
   - Enable quantization options
2. **Insufficient System Memory**
   - Enable CPU offloading
   - Enable the lazy loading option
   - Enable quantization options
3. **Slow Generation Speed**
   - Reduce the number of inference steps
   - Enable auto-configuration
   - Use lightweight models
   - Enable Tea Cache
   - Use quantized operators
   - 💾 **Check whether models are stored on an SSD**
4. **Slow Model Loading**
   - 💾 **Migrate models to SSD storage**
   - Enable the lazy loading option
   - Check disk I/O performance (see the disk check sketch after this list)
   - Consider using an NVMe SSD
5. **Poor Video Quality**
   - Increase the number of inference steps
   - Increase the CFG scale factor
   - Use 14B models
   - Optimize prompts
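For the SSD-related checks in items 3 and 4, the following Linux commands give a quick answer. The device name `/dev/nvme0n1` is only an example and should be replaced with the drive that actually stores your models.

```bash
# ROTA = 0 means the drive is an SSD, 1 means a spinning disk
lsblk -d -o NAME,ROTA,MODEL

# Rough sequential read benchmark (requires root; replace the device name with yours)
sudo hdparm -t /dev/nvme0n1
```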
### Log Viewing
```bash
# View inference logs
tail -f inference_logs.log
# View GPU usage
nvidia-smi
# View system resources
htop
```
**Note**: Please comply with relevant laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.
After installation, we recommend running a verification script to ensure proper functionality:
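For example, a minimal import check (assuming PyTorch and SageAttention were installed into the active environment under their usual package names):

```bash
# Verify that PyTorch sees the GPU and that SageAttention imports cleanly
python -c "import torch; print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())"
python -c "import sageattention; print('sageattention imported successfully')"
```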
> 📝 **Testing**: You can also run the [official test script](https://github.com/woct0rdho/SageAttention/blob/main/tests/test_sageattn.py) for more detailed functionality verification.
### Step 6: Get LightX2V Project Code
Clone the LightX2V project from GitHub and install Windows-specific dependencies:
```bash
# Clone project code
git clone https://github.com/ModelTC/LightX2V.git
# Enter project directory
cd LightX2V
# Install Windows-specific dependencies
pip install -r requirements_win.txt
```
> 🔍 **Note**: We use `requirements_win.txt` instead of the standard `requirements.txt` because Windows environments may require specific package versions or additional dependencies.
## Troubleshooting
### 1. CUDA Version Mismatch
**Symptoms**: CUDA-related errors occur
**Solutions**:
- Verify GPU driver supports required CUDA version
- Re-download matching wheel packages
- Use `nvidia-smi` to check maximum supported CUDA version
### 2. Dependency Conflicts
**Symptoms**: Package version conflicts or import errors
**Solutions**:
- Recreate the environment and install dependencies strictly according to the version requirements
- Use virtual environments to isolate dependencies for different projects
### 3. Wheel Package Download Issues
**Symptoms**: Slow download speeds or connection failures
**Solutions**:
- Use a download manager or browser to download the wheels directly
- Use a mirror source closer to your region
- Check network connections and firewall settings
## Next Steps
After completing the environment setup, you can:
- 📚 Check the [Quick Start Guide](../getting_started/quickstart.md) (skip environment installation steps)
- 🌐 Use the [Gradio Web Interface](./deploy_gradio.md) for visual operations (skip environment installation steps)
## Version Compatibility Reference
| Component | Recommended Version |
|-----------|-------------------|
| Python | 3.12 |
| PyTorch | 2.6.0+cu124 |
| vLLM | 0.9.1+cu124 |
| SageAttention | 2.1.1+cu126torch2.6.0 |
| CUDA | 12.4+ |
---
💡 **Pro Tip**: If you encounter other issues, we recommend first checking whether all component versions match properly, as most problems stem from version incompatibilities.
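For reference, the commands below print the versions actually installed in the active environment so they can be compared against the table above (`pip show` simply reports nothing for a package that is missing):

```bash
# Installed PyTorch build and the CUDA version it was compiled against
python -c "import torch; print('torch', torch.__version__, '| built for CUDA', torch.version.cuda)"

# Installed vLLM and SageAttention versions
pip show vllm sageattention | grep -E "^(Name|Version)"
```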
This guide is specifically designed for hardware resource-constrained environments, particularly configurations with **8GB VRAM + 16/32GB RAM**, providing detailed instructions on how to successfully run Lightx2v 14B models for 480p and 720p video generation.
Lightx2v is a powerful video generation model, but it requires careful optimization to run smoothly in resource-constrained environments. This guide provides a complete solution from hardware selection to software configuration, ensuring you can achieve the best video generation experience under limited hardware conditions.
## 🎯 Target Hardware Configuration
### Recommended Hardware Specifications
**GPU Requirements**:
- **VRAM**: 8GB (RTX 3060/3070/4060/4060 Ti, etc.)
- **Architecture**: NVIDIA graphics card with CUDA support

**System Memory**:
- **Minimum**: 16GB DDR4
- **Recommended**: 32GB DDR4/DDR5
- **Memory Speed**: 3200MHz or higher recommended

**Storage Requirements**:
- **Type**: NVMe SSD strongly recommended
- **Capacity**: At least 50GB of available space
- **Speed**: Read speed of 3000MB/s or higher recommended

**CPU Requirements**:
- **Cores**: 8 cores or more recommended
- **Frequency**: 3.0GHz or higher recommended
- **Architecture**: AVX2 instruction set support
## ⚙️ Core Optimization Strategies
### 1. Environment Optimization
Before running Lightx2v, it's recommended to set the following environment variables to optimize performance:
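The exact variables depend on your setup and are not reproduced here; the block below is a generic example of settings commonly used for memory-constrained PyTorch workloads, not an official Lightx2v list.

```bash
# Generic PyTorch/CUDA settings often used on memory-constrained machines (adjust to taste)
export CUDA_VISIBLE_DEVICES=0                            # pin inference to a single GPU
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True  # reduce VRAM fragmentation
export TOKENIZERS_PARALLELISM=false                      # silence HuggingFace tokenizer fork warnings
export OMP_NUM_THREADS=8                                 # match your physical core count
```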
- **[14B Model 480p Video Generation Configuration](https://github.com/ModelTC/lightx2v/tree/main/configs/offload/disk/wan_i2v_phase_lazy_load_480p.json)**
- **[14B Model 720p Video Generation Configuration](https://github.com/ModelTC/lightx2v/tree/main/configs/offload/disk/wan_i2v_phase_lazy_load_720p.json)**
- **[1.3B Model 720p Video Generation Configuration](https://github.com/ModelTC/LightX2V/tree/main/configs/offload/block/wan_t2v_1_3b.json)**
- The inference bottleneck for 1.3B models is the T5 encoder, so the configuration file specifically optimizes for T5
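Each of these configuration files is selected through the `--config_json` argument used by the run scripts (see the script usage later in this document). A minimal sketch, assuming an I2V run script exists under `scripts/wan/` (the exact script name may differ in your checkout):

```bash
# Inside the run script, point --config_json at the desired offload config, e.g.:
#   --config_json ${lightx2v_path}/configs/offload/disk/wan_i2v_phase_lazy_load_480p.json
# then launch it as usual (script name assumed; check scripts/wan/ for the actual file):
bash scripts/wan/run_wan_i2v.sh
```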
LoRA (Low-Rank Adaptation) is an efficient model fine-tuning technique that significantly reduces the number of trainable parameters through low-rank matrix decomposition. LightX2V fully supports LoRA technology, including LoRA inference, LoRA extraction, and LoRA merging functions.
## 🎯 LoRA Technical Features
- **Efficient Fine-tuning**: Dramatically reduces the number of trainable parameters through low-rank adaptation
- **Flexible Deployment**: Supports dynamic loading and removal of LoRA weights
- **Multiple Formats**: Supports various LoRA weight formats and naming conventions
- **Comprehensive Tools**: Provides a complete LoRA extraction and merging toolchain
Specify the LoRA through the [config file](wan_t2v_distill_4step_cfg_lora.json) and modify the startup command in [scripts/server/start_server.sh](https://github.com/ModelTC/lightx2v/blob/main/scripts/server/start_server.sh):
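The exact contents of `start_server.sh` are not reproduced here; conceptually, the modification amounts to pointing the server's `--config_json` argument at the LoRA config file, for example:

```bash
# Inside scripts/server/start_server.sh, set --config_json to the LoRA config
# (use the actual path of wan_t2v_distill_4step_cfg_lora.json in your checkout):
#   --config_json /path/to/wan_t2v_distill_4step_cfg_lora.json
bash scripts/server/start_server.sh
```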
This document introduces the model directory structure of the Lightx2v project, helping users correctly organize model files for a convenient user experience. Through proper directory organization, users can enjoy the convenience of "one-click startup" without manually configuring complex path parameters.
## 🗂️ Model Directory Structure
### Lightx2v Official Model List
View all available models: [Lightx2v Official Model Repository](https://huggingface.co/lightx2v)
### Standard Directory Structure
Using `Wan2.1-I2V-14B-480P-Lightx2v` as an example:
When starting with a configuration file such as [this one](https://github.com/ModelTC/LightX2V/tree/main/configs/offload/disk/wan_i2v_phase_lazy_load_480p.json), the following path settings can be omitted:
- `dit_quantized_ckpt`: No need to specify; the code will automatically search the model directory
- `tiny_vae_path`: No need to specify; the code will automatically search the model directory
- `clip_quantized_ckpt`: No need to specify; the code will automatically search the model directory
- `t5_quantized_ckpt`: No need to specify; the code will automatically search the model directory
**💡 Simplified Configuration**: After organizing model files according to the recommended directory structure, most path configurations can be omitted as the code will handle them automatically.
### Manual Download
1. Visit the [Hugging Face Model Page](https://huggingface.co/lightx2v)
2. Select the required model version
3. Download all files to the corresponding directory
**💡 Download Recommendations**: It is recommended to use SSD storage and ensure stable network connection. For large files, you can use `git lfs` or download tools such as `aria2c`.
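For example, either of the following approaches pulls a full model repository; the target directory is only an illustration, and the `huggingface-cli` tool ships with the `huggingface_hub` package:

```bash
# Option 1: huggingface-cli (pip install huggingface_hub)
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-Lightx2v \
    --local-dir ./models/Wan2.1-I2V-14B-480P-Lightx2v

# Option 2: git with LFS support
git lfs install
git clone https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v ./models/Wan2.1-I2V-14B-480P-Lightx2v
```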
## 💡 Best Practices
- **Use SSD Storage**: Significantly improves model loading speed and inference performance
- **Unified Directory Structure**: Makes it easier to manage and switch between different model versions
- **Reserve Sufficient Space**: Ensure adequate storage space (at least 200GB recommended)
- **Regular Cleanup**: Delete unneeded model versions to save space
- **Network Optimization**: Use stable network connections and download tools
## 🚨 Common Issues
### Q: Model files are too large and download is slow?
A: Use a mirror source closer to your region, a download tool such as `aria2c`, or a cloud storage service
### Q: Model path not found when starting?
A: Check if the model has been downloaded correctly and verify the path configuration
### Q: How to switch between different model versions?
A: Modify the model path parameter in the startup command; multiple model instances can also run simultaneously. A sketch is shown below.
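A minimal sketch, assuming the run script exposes the model directory through a `model_path` variable (check your script for the exact name) and that the example paths and script name below are adapted to your local layout:

```bash
# Edit the model path in the run script, e.g.:
#   model_path=/data/models/Wan2.1-I2V-14B-480P-Lightx2v
# Independent instances with different models can run in parallel on separate GPUs:
CUDA_VISIBLE_DEVICES=0 bash scripts/wan/run_wan_i2v.sh &   # script name assumed
CUDA_VISIBLE_DEVICES=1 bash scripts/wan/run_wan_i2v.sh &
```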
### Q: Model loading is very slow?
A: Ensure models are stored on SSD, enable lazy loading, and use quantized version models
### Q: How to set paths in configuration files?
A: After organizing according to the recommended directory structure, most path configurations can be omitted as the code will handle them automatically
## 📚 Related Links
- [Lightx2v Official Model Repository](https://huggingface.co/lightx2v)
- [Gradio Deployment Guide](./deploy_gradio.md)
---
Through proper model file organization, users can enjoy the convenience of "one-click startup" without manually configuring complex path parameters. It is recommended to organize model files according to the structure recommended in this document and fully utilize the advantages of SSD storage.
For video playback demonstrations and detailed performance comparisons, see this [🔗 page](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/benchmark_source.md).
- **Wan2.1 Official (baseline)**: Baseline implementation based on the [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1)
- **LightX2V_1**: Replaces the native attention mechanism with SageAttention2 and uses DIT BF16+FP32 mixed precision (FP32 for sensitive layers), improving computational efficiency while maintaining precision
- **LightX2V_2**: Unified BF16 precision computation to further reduce memory usage and computational overhead while maintaining generation quality
- **LightX2V_3**: Quantization optimization introducing FP8 quantization to significantly reduce computational precision requirements, combined with Tiling VAE to optimize memory usage
- **LightX2V_4**: Ultimate optimization adding TeaCache (teacache_thresh=0.2) cache-reuse technology on top of LightX2V_3, achieving maximum acceleration by intelligently skipping redundant computations
- **LightX2V_4-Distill**: Builds on LightX2V_4 with a 4-step distilled model ([Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v))
```shell
# Modify the path in the script
bash scripts/wan/run_wan_t2v.sh
```
In addition to the existing input arguments in the script, there are also some necessary parameters in the `wan_t2v.json` file specified by `--config_json`. You can modify them as needed.
<strong>LightX2V: Light Video Generation Inference Framework</strong>
LightX2V is a lightweight video generation inference framework designed to provide an inference tool that leverages multiple advanced video generation inference techniques. As a unified inference platform, this framework supports various generation tasks such as text-to-video (T2V) and image-to-video (I2V) across different models. X2V means transforming different input modalities (such as text or images) to video output.
GitHub: https://github.com/ModelTC/lightx2v
HuggingFace: https://huggingface.co/lightx2v
Documentation
-------------
...
...
:caption: Quick Start
Quick Start <getting_started/quickstart.md>
Benchmark <getting_started/benchmark.md>
.. toctree::
:maxdepth: 1
...
...
Model Quantization <method_tutorials/quantization.md>
The DiT model in `LightX2V` currently uses three types of attention mechanisms. Each type of attention can be configured with a specific backend library.
## Attention Mechanisms Supported by LightX2V
---
## Attention Usage Locations
1. **Self-Attention on the image**
   - Configuration key: `self_attn_1_type`
2. **Cross-Attention between the image and the prompt text**
   - Configuration key: `cross_attn_1_type`
3. **Cross-Attention between the image and the reference image (in I2V mode)**
Tip: `radial_attn` can only be used for self-attention due to the limitations of its sparse algorithm.
For further customization of attention mechanism behavior, please refer to the official documentation or implementation code of each attention library.
Autoregressive distillation is a technical exploration in LightX2V. By training distilled models, it reduces inference steps from the original 40-50 steps to **8 steps**, achieving inference acceleration while enabling infinite-length video generation through KV Cache technology.
> ⚠️ Warning: Currently, autoregressive distillation has mediocre effects and the acceleration improvement has not met expectations, but it can serve as a long-term research project. Currently, LightX2V only supports autoregressive models for T2V.
## 🔍 Technical Principle
Autoregressive distillation is implemented through [CausVid](https://github.com/tianweiy/CausVid) technology. CausVid performs step distillation and CFG distillation on 1.3B autoregressive models. LightX2V extends it with a series of enhancements:
1. **Larger Models**: Supports autoregressive distillation training for 14B models;
2. **More Complete Data Processing Pipeline**: Generates a training dataset of 50,000 prompt-video pairs;
For detailed implementation, refer to [CausVid-Plus](https://github.com/GoatWu/CausVid-Plus).
## 🛠️ Configuration Files
### Configuration File
Configuration options are provided in the [configs/causvid/](https://github.com/ModelTC/lightx2v/tree/main/configs/causvid) directory:
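As with the other configuration files in this document, a CausVid config is selected through `--config_json`. A minimal sketch, assuming the standard T2V run script is used (take the exact script and config file names from your checkout):

```bash
# Inside the run script, point --config_json at one of the CausVid configs, e.g.:
#   --config_json ${lightx2v_path}/configs/causvid/<chosen_config>.json
bash scripts/wan/run_wan_t2v.sh
```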