Cache readme (#90)

* add readme * modify readme

Commit 7a8951ba authored Jul 05, 2025 by Yang Rongjin Committed by GitHub Jul 05, 2025
18 changed files
--- a/configs/caching/adacache/readme.md
+++ b/configs/caching/adacache/readme.md
-## TODO
--- a/configs/caching/custom/readme.md
+++ b/configs/caching/custom/readme.md
-## TODO
--- a/configs/caching/custom/wan_i2v_custom_480p.json
+++ b/configs/caching/custom/wan_i2v_custom_480p.json
@@ -16,6 +16,6 @@
        [2.57151496e05, -3.54229917e04, 1.40286849e03, -1.35890334e01, 1.32517977e-01],
        [-3.02331670e02, 2.23948934e02, -5.25463970e01, 5.87348440e00, -2.01973289e-01]
    ],
-    "use_ret_steps": true,
+    "use_ret_steps": false,
    "teacache_thresh": 0.26
 }
--- a/configs/caching/custom/wan_i2v_custom_720p.json
+++ b/configs/caching/custom/wan_i2v_custom_720p.json
@@ -16,6 +16,6 @@
        [8.10705460e03, 2.13393892e03, -3.72934672e02, 1.66203073e01, -4.17769401e-02],
        [-114.36346466, 65.26524496, -18.82220707, 4.91518089, -0.23412683]
    ],
-    "use_ret_steps": true,
+    "use_ret_steps": false,
    "teacache_thresh": 0.26
 }
--- a/configs/caching/custom/wan_t2v_custom_14b.json
+++ b/configs/caching/custom/wan_t2v_custom_14b.json
@@ -17,6 +17,6 @@
        [-3.03318725e05, 4.90537029e04, -2.65530556e03, 5.87365115e01, -3.15583525e-01],
        [-5784.54975374, 5449.50911966, -1811.16591783, 256.27178429, -13.02252404]
    ],
-    "use_ret_steps": true,
+    "use_ret_steps": false,
    "teacache_thresh": 0.26
 }
--- a/configs/caching/custom/wan_t2v_custom_1_3b.json
+++ b/configs/caching/custom/wan_t2v_custom_1_3b.json
@@ -17,6 +17,6 @@
        [-5.21862437e04, 9.23041404e03, -5.28275948e02, 1.36987616e01, -4.99875664e-02],
        [2.39676752e03, -1.31110545e03, 2.01331979e02, -8.29855975e00, 1.37887774e-01]
    ],
-    "use_ret_steps": true,
+    "use_ret_steps": false,
    "teacache_thresh": 0.26
 }
--- a/configs/caching/taylorseer/readme.md
+++ b/configs/caching/taylorseer/readme.md
-## TODO
--- a/configs/caching/teacache/readme.md
+++ b/configs/caching/teacache/readme.md
-## TODO
--- a/lightx2v/models/networks/wan/infer/feature_caching/transformer_infer.py
+++ b/lightx2v/models/networks/wan/infer/feature_caching/transformer_infer.py
@@ -252,7 +252,7 @@ class WanTransformerInferTaylorCaching(WanTransformerInfer, BaseTaylorCachingTra
    # 1. taylor using caching
    def infer_block(self, weights, grid_sizes, embed, x, embed0, seq_lens, freqs, context, i):
        # 1. shift, scale, gate
-        _, _, gate_msa, _, _, c_gate_msa = self.infer_modulation(weights, embed0)
+        _, _, gate_msa, _, _, c_gate_msa = self.infer_modulation(weights.compute_phases[0], embed0)

        # 2. residual and taylor
        if self.infer_conditional:

--- a/scripts/cache/readme.md
+++ b/scripts/cache/readme.md
+# Cache
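+## Cache Acceleration Algorithms
+- Cache reuse is an important acceleration technique for diffusion-model inference.
+- The core idea is to skip redundant computation at some timesteps and reuse previously cached results to improve inference efficiency.
+- The key question is how to decide at which timesteps to reuse the cache. The decision is usually made dynamically, based on changes in model state or on an error threshold.
+- During inference, key content such as intermediate features, residuals, and attention outputs is cached. At a reusable timestep, the cached content is used directly and the current output is reconstructed with an approximation such as a Taylor expansion, which reduces repeated computation.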
+
+## TeaCache
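+`TeaCache` accumulates the **relative L1** distance between the inputs of adjacent timesteps and compares the accumulated distance with a preset threshold to decide whether the current timestep can reuse the cache.
+- At each inference step, the algorithm computes the relative L1 distance between the current input and the previous step's input and adds it to a running total.
+- While the running total stays below the threshold, the model state has changed little, so the most recently cached content is reused and the redundant computation is skipped. Once the total reaches the threshold, the model state has changed enough that the step is computed in full, the cache is refreshed, and the total is reset. This significantly reduces the number of forward passes and speeds up inference.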
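The gating rule can be illustrated with a minimal sketch. It follows the rule from the TeaCache paper: the cache is reused while the accumulated relative L1 distance is below the threshold, and a full computation happens (and the accumulator resets) once the threshold is reached. `TeaCacheGate` and `relative_l1` are hypothetical names for illustration, not the LightX2V API; inputs are plain Python lists rather than tensors, and the polynomial rescaling that the reference method applies to the distance is omitted.

```python
def relative_l1(curr, prev):
    """Relative L1 distance: sum|curr - prev| / sum|prev|."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev)
    return num / den if den else float("inf")


class TeaCacheGate:
    """Hypothetical helper: decides per timestep whether to compute or reuse."""

    def __init__(self, thresh):
        self.thresh = thresh      # plays the role of "teacache_thresh"
        self.prev_input = None    # input seen at the previous timestep
        self.accumulated = 0.0    # running sum of relative L1 distances

    def should_compute(self, x):
        if self.prev_input is None:
            compute = True        # nothing cached yet, must compute
        else:
            self.accumulated += relative_l1(x, self.prev_input)
            compute = self.accumulated >= self.thresh
        if compute:
            self.accumulated = 0.0  # cache will be refreshed, restart the sum
        self.prev_input = list(x)
        return compute


gate = TeaCacheGate(thresh=0.26)
steps = [[1.0, 1.0], [1.1, 1.1], [1.2, 1.2], [1.3, 1.3]]
print([gate.should_compute(x) for x in steps])  # [True, False, False, True]
```

In this toy run the second and third steps reuse the cache, and the fourth triggers a full computation because the accumulated distance (about 0.27) has crossed the threshold.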
+
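+In practice, TeaCache delivers a clear speedup while preserving generation quality. The videos before and after acceleration compare as follows:
+
+| Before acceleration | After acceleration |
+|:------:|:------:|
+| Single H200 inference time: 58s | Single H200 inference time: 17.9s |
+| ![Before acceleration](../../assets/gifs/1.gif) | ![After acceleration](../../assets/gifs/2.gif) |
+- Speedup: **3.24**
+- Reference paper: [https://arxiv.org/abs/2411.19108](https://arxiv.org/abs/2411.19108)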
+
+## TaylorSeer Cache
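+`TaylorSeer Cache` applies Taylor's formula to the cached content and uses the result as residual compensation at cache-reuse timesteps. At a reuse timestep it does not simply replay the historical cache; it reconstructs an approximation of the current output through a Taylor expansion. This reduces computation while improving output accuracy. The Taylor expansion captures small changes in model state, so the error introduced by cache reuse is compensated and generation quality is maintained under acceleration. `TaylorSeer Cache` suits scenarios that demand high output accuracy, improving on plain cache reuse.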
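The Taylor-expansion idea can be sketched as follows. At fully computed steps the sketch stores the feature together with backward finite-difference estimates of its derivatives; at reuse steps it extrapolates from the last computed step. `TaylorCache` is a hypothetical helper written for illustration, features are scalars rather than tensors, and the finite-difference scheme is an assumption, not necessarily what LightX2V implements.

```python
from math import factorial


class TaylorCache:
    """Hypothetical helper: Taylor extrapolation from cached features."""

    def __init__(self, order=2):
        self.order = order      # highest derivative order to keep
        self.derivs = []        # derivs[k] approximates the k-th derivative
        self.last_step = None   # step index of the last full computation

    def update(self, step, feature):
        """Call at a fully computed step to refresh the cache."""
        new = [feature]
        if self.last_step is not None:
            dt = step - self.last_step
            # Backward finite differences, one order at a time.
            for k in range(min(self.order, len(self.derivs))):
                new.append((new[k] - self.derivs[k]) / dt)
        self.derivs = new
        self.last_step = step

    def predict(self, step):
        """Call at a reuse step: sum_k derivs[k] * dt^k / k!."""
        dt = step - self.last_step
        return sum(d * dt**k / factorial(k) for k, d in enumerate(self.derivs))


cache = TaylorCache(order=2)
for step in (0, 2, 4):                    # steps that are computed in full
    cache.update(step, 3.0 * step + 1.0)  # stand-in for a model feature
print(cache.predict(5))                   # 16.0
```

Plain cache reuse would return the value stored at step 4 (13.0) for step 5; the first-order term recovers the exact value 16.0 for this linear toy feature. For non-linear features the extrapolation is only approximate.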
+
+| Before acceleration | After acceleration |
+|:------:|:------:|
+| Single H200 inference time: 57.7s | Single H200 inference time: 41.3s |
+| ![Before acceleration](../../assets/gifs/3.gif) | ![After acceleration](../../assets/gifs/4.gif) |
+- Speedup: **1.39**
+- Reference paper: [https://arxiv.org/abs/2503.06923](https://arxiv.org/abs/2503.06923)
+
+## AdaCache
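+`AdaCache` dynamically adjusts the cache-reuse step length based on part of the cached content in a specified block.
+- The algorithm measures the feature difference between two adjacent timesteps within that block and adaptively chooses the interval to the next cache-reuse timestep from the size of the difference.
+- When the model state changes little, the step length grows automatically and the cache is refreshed less often; when the state changes a lot, the step length shrinks to protect output quality.
+
+The caching strategy therefore follows the dynamics of the actual inference run, giving more efficient acceleration and better generation results. AdaCache suits applications that demand both high inference speed and high generation quality.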
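A minimal sketch of the adaptive step length, assuming a lookup table that maps the measured feature difference to the number of upcoming steps that may reuse the cache. The names `feature_distance`, `next_skip`, and `CODEBOOK`, as well as the bounds and skip counts, are hypothetical values chosen for illustration; they are not taken from the LightX2V configs.

```python
def feature_distance(curr, prev):
    """Mean absolute difference between two feature vectors of one block."""
    return sum(abs(c - p) for c, p in zip(curr, prev)) / len(curr)


# Hypothetical table: (upper bound on distance, steps that may reuse the cache).
# Smaller differences allow longer reuse intervals.
CODEBOOK = [(0.05, 4), (0.10, 2), (0.20, 1)]


def next_skip(distance, codebook=CODEBOOK):
    """Return how many upcoming steps can reuse the cache."""
    for bound, skip in codebook:
        if distance < bound:
            return skip
    return 0  # large change: recompute at the very next step


prev_block = [1.00, 2.00, 3.00]
curr_block = [1.02, 2.01, 3.03]
d = feature_distance(curr_block, prev_block)  # roughly 0.02
print(next_skip(d))                            # 4
```

The table makes the trade-off explicit: widening the bounds or raising the skip counts gives more speed at the cost of larger approximation error.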
+
+| Before acceleration | After acceleration |
+|:------:|:------:|
+| Single H200 inference time: 227s | Single H200 inference time: 83s |
+| ![Before acceleration](../../assets/gifs/5.gif) | ![After acceleration](../../assets/gifs/6.gif) |
+- Speedup: **2.73**
+- Reference paper: [https://arxiv.org/abs/2411.02397](https://arxiv.org/abs/2411.02397)
+
+## CustomCache
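+`CustomCache` combines the strengths of `TeaCache` and `TaylorSeer Cache`.
+- It takes the timely and well-founded caching decision of `TeaCache`, using a dynamic threshold to decide when to reuse the cache.
+- It uses the Taylor expansion of `TaylorSeer` to make use of the cached content.
+
+It can therefore decide efficiently when to reuse the cache while getting the most out of the cached content, which improves output accuracy and generation quality. Tests show that, across several content-generation tasks, `CustomCache` produces better video quality than `TeaCache`, `TaylorSeer Cache`, or `AdaCache` used alone, making it one of the best-performing cache acceleration algorithms overall at present.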
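The combination can be sketched in a few lines: a TeaCache-style accumulated relative L1 gate decides when to run the model, and a first-order Taylor extrapolation fills in the skipped steps. `run_custom_cache` and its parameters are hypothetical, inputs and outputs are scalars, and only the first-order term is kept; this is an illustration under those assumptions, not the LightX2V implementation.

```python
def run_custom_cache(inputs, model, thresh):
    """Return (outputs, computed) for a sequence of scalar per-step inputs."""
    outputs, computed = [], []
    accumulated = 0.0            # TeaCache-style running distance
    prev_x = None
    last_step, last_out = None, None
    slope = 0.0                  # first-derivative estimate for Taylor reuse

    for step, x in enumerate(inputs):
        if prev_x is None:
            compute = True       # first step: nothing to reuse
        else:
            denom = abs(prev_x)
            accumulated += abs(x - prev_x) / denom if denom else float("inf")
            compute = accumulated >= thresh

        if compute:
            out = model(x)       # expensive forward pass
            if last_step is not None:
                slope = (out - last_out) / (step - last_step)
            last_step, last_out = step, out
            accumulated = 0.0
        else:
            # Reuse step: extrapolate from the last computed output.
            out = last_out + slope * (step - last_step)

        prev_x = x
        outputs.append(out)
        computed.append(compute)
    return outputs, computed


outs, flags = run_custom_cache(
    [1.0, 1.1, 1.2, 1.3, 1.4, 1.5], model=lambda x: 10 * x, thresh=0.15
)
print(flags)  # [True, False, True, False, True, False]
```

In this toy run half of the steps skip the model call. Step 1 falls back to plain reuse because no slope estimate exists yet; steps 3 and 5 are extrapolated with the slope learned from earlier computed steps.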
+
+| Before acceleration | After acceleration |
+|:------:|:------:|
+| Single H200 inference time: 57.9s | Single H200 inference time: 16.6s |
+| ![Before acceleration](../../assets/gifs/7.gif) | ![After acceleration](../../assets/gifs/8.gif) |
+- Speedup: **3.49**
--- a/scripts/cache/run_wan_i2v_ada.sh
+++ b/scripts/cache/run_wan_i2v_ada.sh
+#!/bin/bash
+
+# set these paths first
+lightx2v_path=
+model_path=
+
+# check section
+if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
+    cuda_devices=0
+    echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
+    export CUDA_VISIBLE_DEVICES=${cuda_devices}
+fi
+
+if [ -z "${lightx2v_path}" ]; then
+    echo "Error: lightx2v_path is not set. Please set this variable first."
+    exit 1
+fi
+
+if [ -z "${model_path}" ]; then
+    echo "Error: model_path is not set. Please set this variable first."
+    exit 1
+fi
+
+export TOKENIZERS_PARALLELISM=false
+
+export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
+export DTYPE=BF16
+export ENABLE_PROFILING_DEBUG=true
+export ENABLE_GRAPH_MODE=false
+
+python -m lightx2v.infer \
+--model_cls wan2.1 \
+--task i2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/caching/adacache/wan_i2v_ada.json \
+--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
+--image_path ${lightx2v_path}/assets/inputs/imgs/img_0.jpg \
+--negative_prompt "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_ada.mp4
--- a/scripts/cache/run_wan_i2v_custom.sh
+++ b/scripts/cache/run_wan_i2v_custom.sh
+#!/bin/bash
+
+# set these paths first
+lightx2v_path=
+model_path=
+
+# check section
+if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
+    cuda_devices=0
+    echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
+    export CUDA_VISIBLE_DEVICES=${cuda_devices}
+fi
+
+if [ -z "${lightx2v_path}" ]; then
+    echo "Error: lightx2v_path is not set. Please set this variable first."
+    exit 1
+fi
+
+if [ -z "${model_path}" ]; then
+    echo "Error: model_path is not set. Please set this variable first."
+    exit 1
+fi
+
+export TOKENIZERS_PARALLELISM=false
+
+export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
+export DTYPE=BF16
+export ENABLE_PROFILING_DEBUG=true
+export ENABLE_GRAPH_MODE=false
+
+python -m lightx2v.infer \
+--model_cls wan2.1 \
+--task i2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/caching/custom/wan_i2v_custom_480p.json \
+--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
+--image_path ${lightx2v_path}/assets/inputs/imgs/img_0.jpg \
+--negative_prompt "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_custom.mp4
--- a/scripts/cache/run_wan_i2v_taylor.sh
+++ b/scripts/cache/run_wan_i2v_taylor.sh
+#!/bin/bash
+
+# set these paths first
+lightx2v_path=
+model_path=
+
+# check section
+if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
+    cuda_devices=0
+    echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
+    export CUDA_VISIBLE_DEVICES=${cuda_devices}
+fi
+
+if [ -z "${lightx2v_path}" ]; then
+    echo "Error: lightx2v_path is not set. Please set this variable first."
+    exit 1
+fi
+
+if [ -z "${model_path}" ]; then
+    echo "Error: model_path is not set. Please set this variable first."
+    exit 1
+fi
+
+export TOKENIZERS_PARALLELISM=false
+
+export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
+export DTYPE=BF16
+export ENABLE_PROFILING_DEBUG=true
+export ENABLE_GRAPH_MODE=false
+
+python -m lightx2v.infer \
+--model_cls wan2.1 \
+--task i2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/caching/taylorseer/wan_i2v_tea_480p.json \
+--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
+--negative_prompt "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" \
+--image_path ${lightx2v_path}/assets/inputs/imgs/img_0.jpg \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_taylor.mp4
--- a/scripts/cache/run_wan_i2v_tea.sh
+++ b/scripts/cache/run_wan_i2v_tea.sh
+#!/bin/bash
+
+# set these paths first
+lightx2v_path=
+model_path=
+
+# check section
+if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
+    cuda_devices=0
+    echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
+    export CUDA_VISIBLE_DEVICES=${cuda_devices}
+fi
+
+if [ -z "${lightx2v_path}" ]; then
+    echo "Error: lightx2v_path is not set. Please set this variable first."
+    exit 1
+fi
+
+if [ -z "${model_path}" ]; then
+    echo "Error: model_path is not set. Please set this variable first."
+    exit 1
+fi
+
+export TOKENIZERS_PARALLELISM=false
+
+export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
+export DTYPE=BF16
+export ENABLE_PROFILING_DEBUG=true
+export ENABLE_GRAPH_MODE=false
+
+python -m lightx2v.infer \
+--model_cls wan2.1 \
+--task i2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/caching/teacache/wan_i2v_tea_480p.json \
+--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
+--negative_prompt "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" \
+--image_path ${lightx2v_path}/assets/inputs/imgs/img_0.jpg \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_tea.mp4
--- a/scripts/cache/run_wan_t2v_ada.sh
+++ b/scripts/cache/run_wan_t2v_ada.sh
+#!/bin/bash
+
+# set these paths first
+lightx2v_path=
+model_path=
+
+# check section
+if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
+    cuda_devices=0
+    echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
+    export CUDA_VISIBLE_DEVICES=${cuda_devices}
+fi
+
+if [ -z "${lightx2v_path}" ]; then
+    echo "Error: lightx2v_path is not set. Please set this variable first."
+    exit 1
+fi
+
+if [ -z "${model_path}" ]; then
+    echo "Error: model_path is not set. Please set this variable first."
+    exit 1
+fi
+
+export TOKENIZERS_PARALLELISM=false
+
+export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
+export DTYPE=BF16
+export ENABLE_PROFILING_DEBUG=true
+export ENABLE_GRAPH_MODE=false
+
+python -m lightx2v.infer \
+--model_cls wan2.1 \
+--task t2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/caching/adacache/wan_t2v_ada.json \
+--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
+--negative_prompt "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_t2v_ada.mp4
--- a/scripts/cache/run_wan_t2v_custom.sh
+++ b/scripts/cache/run_wan_t2v_custom.sh
+#!/bin/bash
+
+# set these paths first
+lightx2v_path=
+model_path=
+
+# check section
+if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
+    cuda_devices=0
+    echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
+    export CUDA_VISIBLE_DEVICES=${cuda_devices}
+fi
+
+if [ -z "${lightx2v_path}" ]; then
+    echo "Error: lightx2v_path is not set. Please set this variable first."
+    exit 1
+fi
+
+if [ -z "${model_path}" ]; then
+    echo "Error: model_path is not set. Please set this variable first."
+    exit 1
+fi
+
+export TOKENIZERS_PARALLELISM=false
+
+export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
+export DTYPE=BF16
+export ENABLE_PROFILING_DEBUG=true
+export ENABLE_GRAPH_MODE=false
+
+python -m lightx2v.infer \
+--model_cls wan2.1 \
+--task t2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/caching/custom/wan_t2v_custom_1_3b.json \
+--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
+--negative_prompt "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_t2v_custom.mp4
--- a/scripts/cache/run_wan_t2v_taylor.sh
+++ b/scripts/cache/run_wan_t2v_taylor.sh
+#!/bin/bash
+
+# set these paths first
+lightx2v_path=
+model_path=
+
+# check section
+if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
+    cuda_devices=0
+    echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
+    export CUDA_VISIBLE_DEVICES=${cuda_devices}
+fi
+
+if [ -z "${lightx2v_path}" ]; then
+    echo "Error: lightx2v_path is not set. Please set this variable first."
+    exit 1
+fi
+
+if [ -z "${model_path}" ]; then
+    echo "Error: model_path is not set. Please set this variable first."
+    exit 1
+fi
+
+export TOKENIZERS_PARALLELISM=false
+
+export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
+export DTYPE=BF16
+export ENABLE_PROFILING_DEBUG=true
+export ENABLE_GRAPH_MODE=false
+
+python -m lightx2v.infer \
+--model_cls wan2.1 \
+--task t2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/caching/taylorseer/wan_t2v_taylorseer.json \
+--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
+--negative_prompt "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_t2v_taylor.mp4
--- a/scripts/cache/run_wan_t2v_tea.sh
+++ b/scripts/cache/run_wan_t2v_tea.sh
+#!/bin/bash
+
+# set these paths first
+lightx2v_path=
+model_path=
+
+# check section
+if [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
+    cuda_devices=0
+    echo "Warn: CUDA_VISIBLE_DEVICES is not set, using default value: ${cuda_devices}, change at shell script or set env variable."
+    export CUDA_VISIBLE_DEVICES=${cuda_devices}
+fi
+
+if [ -z "${lightx2v_path}" ]; then
+    echo "Error: lightx2v_path is not set. Please set this variable first."
+    exit 1
+fi
+
+if [ -z "${model_path}" ]; then
+    echo "Error: model_path is not set. Please set this variable first."
+    exit 1
+fi
+
+export TOKENIZERS_PARALLELISM=false
+
+export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
+export DTYPE=BF16
+export ENABLE_PROFILING_DEBUG=true
+export ENABLE_GRAPH_MODE=false
+
+python -m lightx2v.infer \
+--model_cls wan2.1 \
+--task t2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/caching/teacache/wan_t2v_1_3b.json \
+--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
+--negative_prompt "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_t2v_tea.mp4