Commit 1292c53b authored by gushiqiao

Update benchmark

parent 927ce73a
@@ -49,7 +49,7 @@ For comprehensive usage instructions, please refer to our documentation: **[Engl
## 🚀 Core Features

### 🎯 **Ultimate Performance Optimization**

- **🔥 SOTA Inference Speed**: Achieve **~20x** acceleration via step distillation and system optimization (single GPU)
- **⚡️ Revolutionary 4-Step Distillation**: Compress the original 40-50 step inference to just 4 steps, with no CFG pass required
- **🛠️ Advanced Operator Support**: Integrated with cutting-edge operators including [Sage Attention](https://github.com/thu-ml/SageAttention), [Flash Attention](https://github.com/Dao-AILab/flash-attention), [Radial Attention](https://github.com/mit-han-lab/radial-attention), [q8-kernel](https://github.com/KONAKONA666/q8_kernels), [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel), [vllm](https://github.com/vllm-project/vllm)
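To make the step-distillation claim concrete, here is a rough sketch of the schedule change it implies. This is illustrative plain Python, not LightX2V's actual scheduler API; the function name and the even spacing are assumptions.

```python
# Illustrative sketch of the schedule change behind 4-step distillation.
# Names and the even-spacing assumption are ours, not LightX2V's API.

def make_timesteps(num_steps: int, num_train_timesteps: int = 1000) -> list:
    """Evenly spaced denoising timesteps, highest noise first."""
    stride = num_train_timesteps / num_steps
    return [round(num_train_timesteps - i * stride) for i in range(num_steps)]

full = make_timesteps(40)      # original schedule: 40 model calls (x2 with CFG)
distilled = make_timesteps(4)  # distilled schedule: 4 calls, no CFG pass

print(distilled)  # [1000, 750, 500, 250]
```

Since classifier-free guidance doubles the forward passes per step, 40 steps with CFG versus 4 steps without is a 20x reduction in model calls, which lines up with the ~20x figure above.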
@@ -47,7 +47,7 @@

## 🚀 Core Features

### 🎯 **Ultimate Performance Optimization**

- **🔥 SOTA Inference Speed**: Achieve a **20x** acceleration via step distillation and system optimization (single GPU)
- **⚡️ Revolutionary 4-Step Distillation**: Compress the original 40-50 step inference to just 4 steps, with no CFG configuration required
- **🛠️ Advanced Operator Support**: Integrated with cutting-edge operators, including [Sage Attention](https://github.com/thu-ml/SageAttention), [Flash Attention](https://github.com/Dao-AILab/flash-attention), [Radial Attention](https://github.com/mit-han-lab/radial-attention), [q8-kernel](https://github.com/KONAKONA666/q8_kernels), [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel), [vllm](https://github.com/vllm-project/vllm)
@@ -5,11 +5,11 @@
## H200 (~140GB VRAM)

**Software Environment:**

- **Python**: 3.11
- **PyTorch**: 2.7.1+cu128
- **SageAttention**: 2.2.0
- **vLLM**: 0.9.2
- **sgl-kernel**: 0.1.8
### 480P 5s Video
@@ -19,14 +19,15 @@
#### Performance Comparison

| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 366 | 71 | 1.0x | <video src="https://github.com/user-attachments/assets/24fb112e-c868-4484-b7f0-d9542979c2c3" width="200px"></video> |
| **FastVideo** | 292 | 26 | **1.25x** | <video src="" width="200px"></video> |
| **LightX2V_1** | 250 | 53 | **1.46x** | <video src="https://github.com/user-attachments/assets/7bffe48f-e433-430b-91dc-ac745908ba3a" width="200px"></video> |
| **LightX2V_2** | 216 | 50 | **1.70x** | <video src="https://github.com/user-attachments/assets/0a24ca47-c466-433e-8a53-96f259d19841" width="200px"></video> |
| **LightX2V_3** | 191 | 35 | **1.92x** | <video src="https://github.com/user-attachments/assets/970c73d3-1d60-444e-b64d-9bf8af9b19f1" width="200px"></video> |
| **LightX2V_3-Distill** | 14 | 35 | **🏆 20.85x** | <video src="" width="200px"></video> |
| **LightX2V_4** | 107 | 35 | **3.41x** | <video src="https://github.com/user-attachments/assets/49cd2760-4be2-432c-bf4e-01af9a1303dd" width="200px"></video> |
### 720P 5s Video
@@ -34,7 +35,17 @@
- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: infer_steps=40, seed=42, enable_cfg=True

#### Performance Comparison
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 974 | 81 | 1.0x | <video src="" width="200px"></video> |
| **FastVideo** | 914 | 40 | **1.07x** | <video src="" width="200px"></video> |
| **LightX2V_1** | 807 | 65 | **1.21x** | <video src="" width="200px"></video> |
| **LightX2V_2** | 751 | 57 | **1.30x** | <video src="" width="200px"></video> |
| **LightX2V_3** | 671 | 43 | **1.45x** | <video src="" width="200px"></video> |
| **LightX2V_3-Distill** | 44 | 43 | **🏆 22.14x** | <video src="" width="200px"></video> |
| **LightX2V_4** | 344 | 46 | **2.83x** | <video src="" width="200px"></video> |
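The Speedup column in these tables is simply the baseline inference time divided by each configuration's inference time. Checking a few of the 720P rows:

```python
# Reproduce the 720P Speedup column: baseline time / configuration time.
baseline = 974  # Wan2.1 Official inference time (s), from the table above
times = {"FastVideo": 914, "LightX2V_1": 807, "LightX2V_3-Distill": 44}
speedups = {name: round(baseline / t, 2) for name, t in times.items()}
print(speedups)  # {'FastVideo': 1.07, 'LightX2V_1': 1.21, 'LightX2V_3-Distill': 22.14}
```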
---

@@ -50,11 +61,12 @@

---

## Configuration Descriptions
- **Wan2.1 Official**: Based on the [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1)
- **FastVideo**: Based on the [FastVideo official repository](https://github.com/hao-ai-lab/FastVideo), using the SageAttention backend
- **LightX2V_1**: Replaces the native attention mechanism with SageAttention2 and uses DIT BF16+FP32 mixed precision (for a few sensitive layers), improving computational efficiency while maintaining precision
- **LightX2V_2**: Unified BF16 precision computation, further reducing memory usage and computational overhead while maintaining generation quality
- **LightX2V_3**: Introduces FP8 quantization to significantly lower compute precision requirements, combined with Tiling VAE to optimize memory usage
- **LightX2V_3-Distill**: Based on LightX2V_3, uses the 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality
- **LightX2V_4**: Based on LightX2V_3, adds TeaCache (teacache_thresh=0.2) cache-reuse technology, accelerating inference by intelligently skipping redundant computation
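The cache-reuse idea behind LightX2V_4 can be sketched roughly as follows. This is illustrative pure-Python logic, not the actual TeaCache implementation; only the 0.2 threshold comes from the configuration above.

```python
# Rough sketch of a TeaCache-style skip decision: accumulate the relative
# change of the model input across denoising steps, and reuse the cached
# output while the accumulated change stays below a threshold. All names
# are illustrative; only teacache_thresh=0.2 comes from the benchmark.

def should_skip(prev_feat, cur_feat, accumulated, thresh=0.2):
    num = sum(abs(c - p) for c, p in zip(cur_feat, prev_feat))
    den = sum(abs(p) for p in prev_feat) or 1.0
    accumulated += num / den
    if accumulated < thresh:
        return True, accumulated   # input barely moved: reuse cached residual
    return False, 0.0              # recompute this step, reset the accumulator
```

A higher threshold skips more steps (faster but lossier); 0.2 is the setting benchmarked here.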
@@ -4,12 +4,12 @@
## H200 (~140GB VRAM)

**Software Environment:**

- **Python**: 3.11
- **PyTorch**: 2.7.1+cu128
- **SageAttention**: 2.2.0
- **vLLM**: 0.9.2
- **sgl-kernel**: 0.1.8
### 480P 5s Video
@@ -19,23 +19,34 @@
#### Performance Comparison

| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 366 | 71 | 1.0x | <video src="https://github.com/user-attachments/assets/24fb112e-c868-4484-b7f0-d9542979c2c3" width="200px"></video> |
| **FastVideo** | 292 | 26 | **1.25x** | <video src="" width="200px"></video> |
| **LightX2V_1** | 250 | 53 | **1.46x** | <video src="https://github.com/user-attachments/assets/7bffe48f-e433-430b-91dc-ac745908ba3a" width="200px"></video> |
| **LightX2V_2** | 216 | 50 | **1.70x** | <video src="https://github.com/user-attachments/assets/0a24ca47-c466-433e-8a53-96f259d19841" width="200px"></video> |
| **LightX2V_3** | 191 | 35 | **1.92x** | <video src="https://github.com/user-attachments/assets/970c73d3-1d60-444e-b64d-9bf8af9b19f1" width="200px"></video> |
| **LightX2V_3-Distill** | 14 | 35 | **🏆 20.85x** | <video src="" width="200px"></video> |
| **LightX2V_4** | 107 | 35 | **3.41x** | <video src="https://github.com/user-attachments/assets/49cd2760-4be2-432c-bf4e-01af9a1303dd" width="200px"></video> |
### 720P 5s Video

**Test Configuration:**

- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: infer_steps=40, seed=42, enable_cfg=True

#### Performance Comparison
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 974 | 81 | 1.0x | <video src="" width="200px"></video> |
| **FastVideo** | 914 | 40 | **1.07x** | <video src="" width="200px"></video> |
| **LightX2V_1** | 807 | 65 | **1.21x** | <video src="" width="200px"></video> |
| **LightX2V_2** | 751 | 57 | **1.30x** | <video src="" width="200px"></video> |
| **LightX2V_3** | 671 | 43 | **1.45x** | <video src="" width="200px"></video> |
| **LightX2V_3-Distill** | 44 | 43 | **🏆 22.14x** | <video src="" width="200px"></video> |
| **LightX2V_4** | 344 | 46 | **2.83x** | <video src="" width="200px"></video> |
---

@@ -51,12 +62,12 @@

---

## Configuration Descriptions
- **Wan2.1 Official**: Based on the [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1)
- **FastVideo**: Based on the [FastVideo official repository](https://github.com/hao-ai-lab/FastVideo), using the SageAttention backend
- **LightX2V_1**: Replaces the native attention mechanism with SageAttention2 and uses DIT BF16+FP32 mixed precision (for a few sensitive layers), improving computational efficiency while maintaining precision
- **LightX2V_2**: Unified BF16 precision computation, further reducing memory usage and computational overhead while maintaining generation quality
- **LightX2V_3**: Introduces FP8 quantization to significantly lower compute precision requirements, combined with Tiling VAE to optimize memory usage
- **LightX2V_3-Distill**: Based on LightX2V_3, uses the 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality
- **LightX2V_4**: Based on LightX2V_3, adds TeaCache (teacache_thresh=0.2) cache-reuse technology, accelerating inference by intelligently skipping redundant computation
@@ -96,7 +96,7 @@ class WanModel:
        return weight_dict

    def _load_quant_ckpt(self, use_bf16, skip_bf16):
        ckpt_path = self.dit_quantized_ckpt
        logger.info(f"Loading quant dit model from {ckpt_path}")
        index_files = [f for f in os.listdir(ckpt_path) if f.endswith(".index.json")]
@@ -126,7 +126,7 @@ class WanModel:
        return weight_dict

    def _load_quant_split_ckpt(self, use_bf16, skip_bf16):
        lazy_load_model_path = self.dit_quantized_ckpt
        logger.info(f"Loading split quant model from {lazy_load_model_path}")
        pre_post_weight_dict = {}
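For context on what the `.index.json` lookup in `_load_quant_ckpt` is doing, here is a self-contained sketch. It assumes the index follows the Hugging-Face-style sharded-checkpoint convention (a `weight_map` from weight names to shard files); the file names in the comments are made up for illustration.

```python
import json
import os

# Sketch of resolving a sharded checkpoint via its .index.json, as the
# loaders above do: the index maps each weight name to the shard file
# that stores it, so shards can be opened lazily.

def resolve_weight_map(ckpt_path):
    index_files = [f for f in os.listdir(ckpt_path) if f.endswith(".index.json")]
    if not index_files:
        raise FileNotFoundError(f"no .index.json under {ckpt_path}")
    with open(os.path.join(ckpt_path, index_files[0])) as fh:
        index = json.load(fh)
    # e.g. {"blocks.0.attn.q.weight": "model-00001-of-00002.safetensors", ...}
    return index["weight_map"]
```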