Commit 8ec850b8 authored by helloyongyang's avatar helloyongyang
Browse files

update readme and docs

parent 31654ae2
......@@ -13,7 +13,7 @@
--------------------------------------------------------------------------------
LightX2V is a lightweight video generation inference framework designed to provide an inference tool that leverages multiple advanced video generation inference techniques. As a unified inference platform, this framework supports various generation tasks such as text-to-video (T2V) and image-to-video (I2V) across different models. X2V means transforming different input modalities (such as text or images) to video output.
**LightX2V** is a lightweight video generation inference framework designed to provide an inference tool that leverages multiple advanced video generation inference techniques. As a unified inference platform, this framework supports various generation tasks such as text-to-video (T2V) and image-to-video (I2V) across different models. X2V means transforming different input modalities (such as text or images) to video output.
## How to Start
......
......@@ -6,7 +6,7 @@
- The key to the algorithm is how to decide at which time steps to perform cache reuse, usually based on dynamic judgment of model state changes or error thresholds.
- During inference, key content such as intermediate features, residuals, and attention outputs need to be cached. When entering reusable time steps, directly use the cached content and reconstruct the current output through approximation methods like Taylor expansion, thereby reducing repetitive calculations and achieving efficient inference.
## TeaCache
### TeaCache
The core idea of `TeaCache` is to accumulate the **relative L1** distance between adjacent time step inputs, and when the cumulative distance reaches a set threshold, determine that the current time step can perform cache reuse.
- Specifically, the algorithm calculates the relative L1 distance between the current input and the previous step input at each inference step, and accumulates it.
- When the cumulative distance exceeds the threshold, indicating that the model state has changed sufficiently, it directly reuses the most recently cached content, skipping some redundant computations. This can significantly reduce the number of forward computations of the model and improve inference speed.
......@@ -20,7 +20,7 @@ In actual effect, TeaCache achieves significant acceleration while ensuring gene
- Speedup ratio: **3.24**
- Reference paper: [https://arxiv.org/abs/2411.19108](https://arxiv.org/abs/2411.19108)
## TaylorSeer Cache
### TaylorSeer Cache
The core of `TaylorSeer Cache` lies in using Taylor formula to recalculate cached content as residual compensation for cache reuse time steps. The specific approach is that at cache reuse time steps, not only simply reuse historical cache, but also approximate reconstruction of current output through Taylor expansion. This can further improve output accuracy while reducing computational load. Taylor expansion can effectively capture subtle changes in model state, compensating for errors brought by cache reuse, thereby ensuring generation quality while accelerating. `TaylorSeer Cache` is suitable for scenarios with high requirements for output precision, and can further improve model inference performance on the basis of cache reuse.
| Before Acceleration | After Acceleration |
......@@ -30,7 +30,7 @@ The core of `TaylorSeer Cache` lies in using Taylor formula to recalculate cache
- Speedup ratio: **1.39**
- Reference paper: [https://arxiv.org/abs/2503.06923](https://arxiv.org/abs/2503.06923)
## AdaCache
### AdaCache
The core idea of `AdaCache` is to dynamically adjust the step size of cache reuse based on partial cached content in specified block chunks.
- The algorithm analyzes feature differences between two adjacent time steps within specific blocks, and adaptively decides the next cache reuse time step interval based on the difference magnitude.
- When model state changes are small, the step size automatically increases, reducing cache update frequency; when state changes are large, the step size decreases to ensure output quality.
......@@ -44,7 +44,7 @@ This allows flexible adjustment of cache strategies based on dynamic changes in
- Speedup ratio: **2.73**
- Reference paper: [https://arxiv.org/abs/2411.02397](https://arxiv.org/abs/2411.02397)
## CustomCache
### CustomCache
`CustomCache` combines the advantages of `TeaCache` and `TaylorSeer Cache`.
- It combines the real-time and rationality of `TeaCache` in cache decision-making, determining when to perform cache reuse through dynamic thresholds.
- At the same time, it utilizes `TaylorSeer`'s Taylor expansion method to make use of cached content.
......@@ -56,3 +56,12 @@ This not only efficiently determines the timing of cache reuse, but also maximiz
| Single H200 inference time: 57.9s | Single H200 inference time: 16.6s |
| ![Effect before acceleration](../../../../assets/gifs/7.gif) | ![Effect after acceleration](../../../../assets/gifs/8.gif) |
- Speedup ratio: **3.49**
## How to Run
The config files for feature caching are available [here](https://github.com/ModelTC/lightx2v/tree/main/configs/caching)
By specifying --config_json to the specific config file, you can test different cache algorithms.
[Here](https://github.com/ModelTC/lightx2v/tree/main/scripts/cache) are some running scripts for use.
......@@ -6,7 +6,7 @@
- 算法的关键在于如何决策在哪些时间步进行缓存复用,通常基于模型状态变化或误差阈值动态判断。
- 在推理过程中,需要缓存如中间特征、残差、注意力输出等关键内容。当进入可复用时间步时,直接利用已缓存的内容,通过泰勒展开等近似方法重构当前输出,从而减少重复计算,实现高效推理。
## TeaCache
### TeaCache
`TeaCache`的核心思想是通过对相邻时间步输入的**相对L1**距离进行累加,当累计距离达到设定阈值时,判定当前时间步可以进行缓存复用。
- 具体来说,算法在每一步推理时计算当前输入与上一步输入的相对L1距离,并将其累加。
- 当累计距离超过阈值,说明模型状态发生了足够的变化,则直接复用最近一次缓存的内容,跳过部分冗余计算。这样可以显著减少模型的前向计算次数,提高推理速度。
......@@ -20,7 +20,7 @@
- 加速比为:**3.24**
- 参考论文:[https://arxiv.org/abs/2411.19108](https://arxiv.org/abs/2411.19108)
## TaylorSeer Cache
### TaylorSeer Cache
`TaylorSeer Cache`的核心在于利用泰勒公式对缓存内容进行再次计算,作为缓存复用时间步的残差补偿。具体做法是在缓存复用的时间步,不仅简单地复用历史缓存,还通过泰勒展开对当前输出进行近似重构。这样可以在减少计算量的同时,进一步提升输出的准确性。泰勒展开能够有效捕捉模型状态的微小变化,使得缓存复用带来的误差得到补偿,从而在加速的同时保证生成质量。`TaylorSeer Cache`适用于对输出精度要求较高的场景,能够在缓存复用的基础上进一步提升模型推理的表现。
| 加速前 | 加速后 |
......@@ -30,7 +30,7 @@
- 加速比为:**1.39**
- 参考论文:[https://arxiv.org/abs/2503.06923](https://arxiv.org/abs/2503.06923)
## AdaCache
### AdaCache
`AdaCache`的核心思想是根据指定block块中的部分缓存内容,动态调整缓存复用的步长。
- 算法会分析相邻两个时间步在特定 block 内的特征差异,根据差异大小自适应地决定下一个缓存复用的时间步间隔。
- 当模型状态变化较小时,步长自动加大,减少缓存更新频率;当状态变化较大时,步长缩小,保证输出质量。
......@@ -44,7 +44,7 @@
- 加速比为:**2.73**
- 参考论文:[https://arxiv.org/abs/2411.02397](https://arxiv.org/abs/2411.02397)
## CustomCache
### CustomCache
`CustomCache`综合了`TeaCache``TaylorSeer Cache`的优势。
- 它结合了`TeaCache`在缓存决策上的实时性和合理性,通过动态阈值判断何时进行缓存复用.
- 同时利用`TaylorSeer`的泰勒展开方法对已缓存内容进行利用。
......@@ -56,3 +56,12 @@
| 单卡H200推理耗时:57.9s | 单卡H200推理耗时:16.6s |
| ![加速前效果](../../../../assets/gifs/7.gif) | ![加速后效果](../../../../assets/gifs/8.gif) |
- 加速比为:**3.49**
## 使用方式
特征缓存的config文件在[这里](https://github.com/ModelTC/lightx2v/tree/main/configs/caching)
通过指定--config_json到具体的config文件,即可以测试不同的cache算法
[这里](https://github.com/ModelTC/lightx2v/tree/main/scripts/cache)有一些运行脚本供使用。
# Cache
## 缓存加速算法
- 在扩散模型的推理过程中,缓存复用是一种重要的加速算法。
- 其核心思想是在部分时间步跳过冗余计算,通过复用历史缓存结果提升推理效率。
- 算法的关键在于如何决策在哪些时间步进行缓存复用,通常基于模型状态变化或误差阈值动态判断。
- 在推理过程中,需要缓存如中间特征、残差、注意力输出等关键内容。当进入可复用时间步时,直接利用已缓存的内容,通过泰勒展开等近似方法重构当前输出,从而减少重复计算,实现高效推理。
# Feature Caching
## TeaCache
`TeaCache`的核心思想是通过对相邻时间步输入的**相对L1**距离进行累加,当累计距离达到设定阈值时,判定当前时间步可以进行缓存复用。
- 具体来说,算法在每一步推理时计算当前输入与上一步输入的相对L1距离,并将其累加。
- 当累计距离超过阈值,说明模型状态发生了足够的变化,则直接复用最近一次缓存的内容,跳过部分冗余计算。这样可以显著减少模型的前向计算次数,提高推理速度。
The config files for feature caching are available [here](https://github.com/ModelTC/lightx2v/tree/main/configs/caching)
实际效果上,TeaCache 在保证生成质量的前提下,实现了明显的加速。加速前后的视频对比如下:  
By specifying --config_json to the specific config file, you can test different cache algorithms.
| 加速前 | 加速后 |
|:------:|:------:|
| 单卡H200推理耗时:58s | 单卡H200推理耗时:17.9s |
| ![加速前效果](../../assets/gifs/1.gif) | ![加速后效果](../../assets/gifs/2.gif) |
- 加速比为:**3.24**
- 参考论文:[https://arxiv.org/abs/2411.19108](https://arxiv.org/abs/2411.19108)
Please refer our feature caching doc:
## TaylorSeer Cache
`TaylorSeer Cache`的核心在于利用泰勒公式对缓存内容进行再次计算,作为缓存复用时间步的残差补偿。具体做法是在缓存复用的时间步,不仅简单地复用历史缓存,还通过泰勒展开对当前输出进行近似重构。这样可以在减少计算量的同时,进一步提升输出的准确性。泰勒展开能够有效捕捉模型状态的微小变化,使得缓存复用带来的误差得到补偿,从而在加速的同时保证生成质量。`TaylorSeer Cache`适用于对输出精度要求较高的场景,能够在缓存复用的基础上进一步提升模型推理的表现。
[English doc: Feature Caching](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/cache.html)
| 加速前 | 加速后 |
|:------:|:------:|
| 单卡H200推理耗时:57.7s | 单卡H200推理耗时:41.3s |
| ![加速前效果](../../assets/gifs/3.gif) | ![加速后效果](../../assets/gifs/4.gif) |
- 加速比为:**1.39**
- 参考论文:[https://arxiv.org/abs/2503.06923](https://arxiv.org/abs/2503.06923)
## AdaCache
`AdaCache`的核心思想是根据指定block块中的部分缓存内容,动态调整缓存复用的步长。
- 算法会分析相邻两个时间步在特定 block 内的特征差异,根据差异大小自适应地决定下一个缓存复用的时间步间隔。
- 当模型状态变化较小时,步长自动加大,减少缓存更新频率;当状态变化较大时,步长缩小,保证输出质量。
这样可以根据实际推理过程中的动态变化,灵活调整缓存策略,实现更高效的加速和更优的生成效果。AdaCache 适合对推理速度和生成质量都有较高要求的应用场景。
| 加速前 | 加速后 |
|:------:|:------:|
| 单卡H200推理耗时:227s | 单卡H200推理耗时:83s |
| ![加速前效果](../../assets/gifs/5.gif) | ![加速后效果](../../assets/gifs/6.gif) |
- 加速比为:**2.73**
- 参考论文:[https://arxiv.org/abs/2411.02397](https://arxiv.org/abs/2411.02397)
## CustomCache
`CustomCache`综合了`TeaCache``TaylorSeer Cache`的优势。
- 它结合了`TeaCache`在缓存决策上的实时性和合理性,通过动态阈值判断何时进行缓存复用.
- 同时利用`TaylorSeer`的泰勒展开方法对已缓存内容进行利用。
这样不仅能够高效地决定缓存复用的时机,还能最大程度地利用缓存内容,提升输出的准确性和生成质量。实际测试表明,`CustomCache`在多个内容生成任务上,生成的视频质量优于单独使用`TeaCache、TaylorSeer Cache``AdaCache`的方案,是目前综合性能最优的缓存加速算法之一。
| 加速前 | 加速后 |
|:------:|:------:|
| 单卡H200推理耗时:57.9s | 单卡H200推理耗时:16.6s |
| ![加速前效果](../../assets/gifs/7.gif) | ![加速后效果](../../assets/gifs/8.gif) |
- 加速比为:**3.49**
[中文文档: 特征缓存](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/cache.html)
#!/bin/bash
# set path and first
lightx2v_path=
model_path=
# check section
if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
cuda_devices=0
echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
export CUDA_VISIBLE_DEVICES=${cuda_devices}
fi
if [ -z "${lightx2v_path}" ]; then
echo "Error: lightx2v_path is not set. Please set this variable first."
exit 1
fi
if [ -z "${model_path}" ]; then
echo "Error: model_path is not set. Please set this variable first."
exit 1
fi
export TOKENIZERS_PARALLELISM=false
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export DTYPE=BF16
export ENABLE_PROFILING_DEBUG=true
export ENABLE_GRAPH_MODE=false
python -m lightx2v.infer \
--model_cls wan2.1 \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/caching/adacache/wan_i2v_ada.json \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--image_path ${lightx2v_path}/assets/inputs/imgs/img_0.jpg \
--negative_prompt "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_ada.mp4
#!/bin/bash
# set path and first
lightx2v_path=
model_path=
# check section
if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
cuda_devices=0
echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
export CUDA_VISIBLE_DEVICES=${cuda_devices}
fi
if [ -z "${lightx2v_path}" ]; then
echo "Error: lightx2v_path is not set. Please set this variable first."
exit 1
fi
if [ -z "${model_path}" ]; then
echo "Error: model_path is not set. Please set this variable first."
exit 1
fi
export TOKENIZERS_PARALLELISM=false
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export DTYPE=BF16
export ENABLE_PROFILING_DEBUG=true
export ENABLE_GRAPH_MODE=false
python -m lightx2v.infer \
--model_cls wan2.1 \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/caching/custom/wan_i2v_custom_480p.json \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--image_path ${lightx2v_path}/assets/inputs/imgs/img_0.jpg \
--negative_prompt "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_custom.mp4
#!/bin/bash
# set path and first
lightx2v_path=
model_path=
# check section
if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
cuda_devices=0
echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
export CUDA_VISIBLE_DEVICES=${cuda_devices}
fi
if [ -z "${lightx2v_path}" ]; then
echo "Error: lightx2v_path is not set. Please set this variable first."
exit 1
fi
if [ -z "${model_path}" ]; then
echo "Error: model_path is not set. Please set this variable first."
exit 1
fi
export TOKENIZERS_PARALLELISM=false
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export DTYPE=BF16
export ENABLE_PROFILING_DEBUG=true
export ENABLE_GRAPH_MODE=false
python -m lightx2v.infer \
--model_cls wan2.1 \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/caching/taylorseer/wan_i2v_tea_480p.json \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--negative_prompt "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" \
--image_path ${lightx2v_path}/assets/inputs/imgs/img_0.jpg \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_taylor.mp4
#!/bin/bash
# set path and first
lightx2v_path=
model_path=
# check section
if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
cuda_devices=0
echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
export CUDA_VISIBLE_DEVICES=${cuda_devices}
fi
if [ -z "${lightx2v_path}" ]; then
echo "Error: lightx2v_path is not set. Please set this variable first."
exit 1
fi
if [ -z "${model_path}" ]; then
echo "Error: model_path is not set. Please set this variable first."
exit 1
fi
export TOKENIZERS_PARALLELISM=false
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export DTYPE=BF16
export ENABLE_PROFILING_DEBUG=true
export ENABLE_GRAPH_MODE=false
python -m lightx2v.infer \
--model_cls wan2.1 \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/caching/adacache/wan_t2v_ada.json \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--negative_prompt "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_t2v_ada.mp4
#!/bin/bash
# set path and first
lightx2v_path=
model_path=
# check section
if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
cuda_devices=0
echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
export CUDA_VISIBLE_DEVICES=${cuda_devices}
fi
if [ -z "${lightx2v_path}" ]; then
echo "Error: lightx2v_path is not set. Please set this variable first."
exit 1
fi
if [ -z "${model_path}" ]; then
echo "Error: model_path is not set. Please set this variable first."
exit 1
fi
export TOKENIZERS_PARALLELISM=false
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export DTYPE=BF16
export ENABLE_PROFILING_DEBUG=true
export ENABLE_GRAPH_MODE=false
python -m lightx2v.infer \
--model_cls wan2.1 \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/caching/custom/wan_t2v_custom_1_3b.json \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--negative_prompt "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_t2v_custom.mp4
#!/bin/bash
# set path and first
lightx2v_path=
model_path=
# check section
if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
cuda_devices=0
echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
export CUDA_VISIBLE_DEVICES=${cuda_devices}
fi
if [ -z "${lightx2v_path}" ]; then
echo "Error: lightx2v_path is not set. Please set this variable first."
exit 1
fi
if [ -z "${model_path}" ]; then
echo "Error: model_path is not set. Please set this variable first."
exit 1
fi
export TOKENIZERS_PARALLELISM=false
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export DTYPE=BF16
export ENABLE_PROFILING_DEBUG=true
export ENABLE_GRAPH_MODE=false
python -m lightx2v.infer \
--model_cls wan2.1 \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/caching/taylorseer/wan_t2v_taylorseer.json \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--negative_prompt "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_t2v_taylor.mp4
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment