The core idea of `TeaCache` is to accumulate the **relative L1** distance between the inputs of adjacent time steps and use a set threshold to decide whether the current time step can reuse the cache.
- Specifically, the algorithm calculates the relative L1 distance between the current input and the previous step input at each inference step, and accumulates it.
- While the cumulative distance stays below the threshold, the model state has not changed significantly, so the most recently cached content is reused directly and the redundant computation is skipped; once the cumulative distance exceeds the threshold, the model performs a full forward pass again and the accumulator is reset. This significantly reduces the number of forward computations and improves inference speed (see the sketch below).
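For illustration, here is a minimal sketch of this thresholding logic in PyTorch. The names (`TeaCacheState`, `rel_l1_thresh`, `compute_block`, etc.) are hypothetical and not lightx2v's actual API; the real implementation lives in the lightx2v codebase.

```python
import torch


class TeaCacheState:
    """Minimal sketch of TeaCache-style cache reuse (illustrative only)."""

    def __init__(self, rel_l1_thresh: float = 0.1):
        self.rel_l1_thresh = rel_l1_thresh
        self.accumulated_distance = 0.0
        self.prev_input = None
        self.cached_output = None

    def forward(self, x: torch.Tensor, compute_block) -> torch.Tensor:
        if self.prev_input is not None:
            # Relative L1 distance between the current and previous step inputs.
            rel_l1 = ((x - self.prev_input).abs().mean()
                      / (self.prev_input.abs().mean() + 1e-8)).item()
            self.accumulated_distance += rel_l1
        self.prev_input = x

        if self.cached_output is not None and self.accumulated_distance < self.rel_l1_thresh:
            # Inputs have barely drifted since the last full computation: reuse the cache.
            return self.cached_output

        # Otherwise recompute, refresh the cache, and reset the accumulator.
        self.cached_output = compute_block(x)
        self.accumulated_distance = 0.0
        return self.cached_output
```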
In practice, TeaCache achieves significant acceleration while preserving generation quality. The video comparison before and after acceleration is shown below:
lightx2v supports quantized inference for the linear layers in **DiT**, enabling `w8a8-int8`, `w8a8-fp8`, `w8a8-fp8block`, `w8a8-mxfp8`, and `w4a4-nvfp4` matrix multiplication.
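To illustrate what a `w8a8-int8` matmul means conceptually, the sketch below emulates a per-channel symmetric weight / dynamic per-token activation int8 matmul in plain PyTorch. The function names are illustrative, and real deployments dispatch the integer GEMM to optimized kernels rather than the float emulation shown here.

```python
import torch


def quantize_per_channel_int8(w: torch.Tensor):
    """Symmetric per-output-channel int8 quantization of a weight matrix (illustrative)."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0          # one scale per output channel
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale


def w8a8_int8_matmul(x: torch.Tensor, w_q: torch.Tensor, w_scale: torch.Tensor) -> torch.Tensor:
    """W8A8 sketch: quantize activations dynamically, multiply, then dequantize."""
    x_scale = x.abs().amax(dim=-1, keepdim=True) / 127.0        # dynamic per-token scale
    x_q = torch.clamp((x / x_scale).round(), -127, 127).to(torch.int8)
    # An integer GEMM would run here on real hardware; float emulation keeps the sketch simple.
    acc = x_q.float() @ w_q.float().t()
    return acc * x_scale * w_scale.t()


# Usage: y approximates x @ w.t() with int8 weights and activations.
w = torch.randn(4096, 4096)
x = torch.randn(2, 4096)
w_q, w_scale = quantize_per_channel_int8(w)
y = w8a8_int8_matmul(x, w_q, w_scale)
```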
## Generating Quantized Models
### Automatic Quantization
lightx2v supports automatic weight quantization during inference. Refer to the [configuration file](https://github.com/ModelTC/lightx2v/tree/main/configs/quantization/wan_i2v_quant_auto.json).
**Key configuration**:
Set `"mm_config": {"mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm", "weight_auto_quant": true}`.
- `mm_type`: specifies the quantized matmul operator
- `weight_auto_quant: true`: enables automatic quantization of the model weights
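A minimal excerpt of how this looks in the configuration file is shown below; only the `mm_config` block is taken from the setting above, and any other keys needed for a run can be found in the linked `wan_i2v_quant_auto.json`.

```json
{
    "mm_config": {
        "mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm",
        "weight_auto_quant": true
    }
}
```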