Commit 144c10f3 authored by helloyongyang

update docs

parent 0862ec5b
# Deployment for Low Latency Scenarios
In low-latency scenarios, we prioritize speed and set aside concerns such as GPU memory and RAM overhead. We provide two solutions:
## 💡 Solution 1: Inference with a Step-Distilled Model
For details, see the [Step Distillation Documentation](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/step_distill.html).
🧠 **Step Distillation** is a very direct way to accelerate inference for video generation models: distilling from 50 steps down to 4 cuts inference time to roughly 4/50 of the original. A step-distilled model can still be combined with the following techniques (a configuration sketch follows the list):
1. [Efficient Attention Mechanism Solution](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/attention.html)
2. [Model Quantization](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/quantization.html)
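As a concrete illustration, here is a minimal sketch of what such a combined setup could look like, written as a Python config dict in the spirit of lightx2v's JSON config files. Every key name below (`infer_steps`, `attn_type`, `quant`) is an illustrative assumption rather than the project's confirmed schema; consult the linked documents for the real option names.

```python
# Hypothetical config sketch: a 4-step distilled model combined with an
# efficient attention backend and quantized Linear layers. All key names
# are illustrative assumptions; see the linked docs for the real schema.
config = {
    "infer_steps": 4,           # step-distilled model: 4 denoising steps instead of 50
    "attn_type": "sage_attn2",  # assumed name for an efficient attention kernel
    "quant": {
        "enable": True,         # assumed switch for quantizing Linear layers
        "scheme": "w8a8",       # e.g. 8-bit weights and 8-bit activations
    },
}
```

Since per-step compute dominates end-to-end latency, cutting 50 steps to 4 alone yields roughly a 12.5x speedup; the attention and quantization options then shrink the cost of each remaining step.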
## 💡 Solution 2: Inference with a Non-Distilled Model
Step distillation requires substantial training resources, and the distilled model may show a degraded video dynamic range.
For the original model without step distillation, the following techniques can be used individually or in combination to accelerate inference:
1. [Parallel Inference](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/parallel.html) for multi-GPU parallel acceleration.
2. [Feature Caching](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/cache.html) to reduce the actual number of inference steps (a minimal caching sketch follows this list).
3. [Efficient Attention Mechanism Solution](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/attention.html) to accelerate Attention inference.
4. [Model Quantization](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/quantization.html) to accelerate Linear layer inference.
5. [Variable Resolution Inference](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/changing_resolution.html) to reduce the resolution of intermediate inference steps.
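To make the caching idea in item 2 concrete, here is a minimal sketch of one common feature-caching scheme (in the spirit of TeaCache-style methods): when the model input changes little between adjacent denoising steps, a cached transformer residual is reused instead of recomputed. The class, threshold, and reuse rule are illustrative assumptions, not lightx2v's actual implementation.

```python
import torch

class FeatureCache:
    """Sketch of step skipping via cached residuals (TeaCache-style idea)."""

    def __init__(self, threshold: float = 0.05):
        self.threshold = threshold   # max relative input change before recomputing
        self.prev_input = None
        self.cached_residual = None

    def should_recompute(self, x: torch.Tensor) -> bool:
        if self.cached_residual is None:
            return True              # nothing cached yet
        # Relative L1 change of the input since the last real forward pass.
        rel = ((x - self.prev_input).abs().mean() / self.prev_input.abs().mean()).item()
        return rel > self.threshold

    def step(self, x: torch.Tensor, transformer) -> torch.Tensor:
        if self.should_recompute(x):
            self.cached_residual = transformer(x) - x  # cache output minus input
            self.prev_input = x
        return x + self.cached_residual  # cheap reuse when the input barely moved
```

Each denoising step then calls `cache.step(latents, transformer)` instead of `transformer(latents)`, so steps whose inputs barely changed skip the expensive forward pass entirely.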
## ⚠️ Note
Some acceleration solutions currently cannot be used together, and we are working to resolve this issue.
If you have any questions, feel free to report bugs or request features in [🐛 GitHub Issues](https://github.com/ModelTC/lightx2v/issues).
# Deployment for Low Latency Scenarios
In low-latency scenarios, we prioritize speed and set aside concerns such as GPU memory and RAM overhead. We provide two solutions:
## 💡 Solution 1: Inference with a Step-Distilled Model
For details, see the [Step Distillation Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/step_distill.html).
🧠 **Step Distillation** is a very direct way to accelerate inference for video generation models: distilling from 50 steps down to 4 cuts inference time to roughly 4/50 of the original. A step-distilled model can still be combined with the following techniques:
1. [Efficient Attention Mechanisms](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/attention.html)
2. [Model Quantization](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/quantization.html)
## 💡 Solution 2: Inference with a Non-Distilled Model
Step distillation requires substantial training resources, and the distilled model may show a degraded video dynamic range.
For the original model without step distillation, the following techniques can be used individually or in combination to accelerate inference:
1. [Parallel Inference](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/parallel.html) for multi-GPU parallel acceleration.
2. [Feature Caching](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/cache.html) to reduce the actual number of inference steps.
3. [Efficient Attention Mechanisms](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/attention.html) to accelerate attention inference.
4. [Model Quantization](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/quantization.html) to accelerate Linear-layer inference.
5. [Variable-Resolution Inference](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/changing_resolution.html) to lower the resolution of intermediate inference steps.
## ⚠️ Note
Some acceleration techniques cannot currently be combined with each other; we are working to resolve this.
If you have any questions, feel free to file bug reports or feature requests in [🐛 GitHub Issues](https://github.com/ModelTC/lightx2v/issues).
@@ -28,7 +28,7 @@
### U-Shaped Resolution Strategy
If the resolution is lowered in the very first denoising steps, the final generated video may differ noticeably from one produced by normal inference. A U-shaped resolution strategy can therefore be used: keep the original resolution for the first few steps, then switch to a lower resolution for inference.
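A minimal sketch of this schedule follows, assuming a 50-step denoising loop; the step counts and resolutions are illustrative values, not numbers taken from this documentation.

```python
# Hypothetical U-shaped resolution schedule: full resolution for the first and
# last few denoising steps, a cheaper low resolution for the middle steps.
def u_shaped_resolution(step: int, total_steps: int,
                        full_res=(720, 1280), low_res=(480, 832),
                        head: int = 5, tail: int = 5):
    """Return the (height, width) to infer at for a given denoising step."""
    if step < head or step >= total_steps - tail:
        return full_res  # endpoints keep the original resolution
    return low_res       # intermediate steps run at reduced cost

schedule = [u_shaped_resolution(s, 50) for s in range(50)]
```

The intuition is that the early full-resolution steps fix the global structure of the video (avoiding the drift described above), while the final full-resolution steps restore fine detail.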
## Usage