Commit 144c10f3 authored by helloyongyang

update docs

parent 0862ec5b
# Deployment for Low Latency Scenarios
In low-latency scenarios, we prioritize speed and set aside concerns such as GPU memory and RAM overhead. We provide two solutions:
## 💡 Solution 1: Inference with a Step-Distilled Model
For details, see the [Step Distillation Documentation](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/step_distill.html).
🧠 **Step Distillation** is a very direct way to accelerate inference for video generation models: distilling from 50 steps down to 4 cuts inference time to roughly 4/50 of the original. A step-distilled model can still be combined with the following techniques (a configuration sketch follows the list):
1. [Efficient Attention Mechanism Solution](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/attention.html)
2. [Model Quantization](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/quantization.html)
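As a concrete illustration, here is a minimal sketch of what such a combined setup could look like, written as a Python config dict in the spirit of lightx2v's JSON config files. Every key name below (`infer_steps`, `attn_type`, `quant`) is an illustrative assumption rather than the project's confirmed schema; consult the linked documents for the real option names.

```python
# Hypothetical config sketch: a 4-step distilled model combined with an
# efficient attention backend and quantized Linear layers. All key names
# are illustrative assumptions; see the linked docs for the real schema.
config = {
    "infer_steps": 4,           # step-distilled model: 4 denoising steps instead of 50
    "attn_type": "sage_attn2",  # assumed name for an efficient attention kernel
    "quant": {
        "enable": True,         # assumed switch for quantizing Linear layers
        "scheme": "w8a8",       # e.g. 8-bit weights and 8-bit activations
    },
}
```

Since per-step compute dominates end-to-end latency, cutting 50 steps to 4 alone yields roughly a 12.5x speedup; the attention and quantization options then shrink the cost of each remaining step.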
## 💡 Solution 2: Inference with a Non-Distilled Model
Step distillation requires substantial training resources, and the distilled model may show a degraded video dynamic range.
For the original model without step distillation, the following techniques can be used individually or in combination to accelerate inference:
1. [Parallel Inference](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/parallel.html) for multi-GPU parallel acceleration.
2. [Feature Caching](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/cache.html) to reduce the actual number of inference steps (a minimal caching sketch follows this list).
3. [Efficient Attention Mechanism Solution](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/attention.html) to accelerate Attention inference.
4. [Model Quantization](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/quantization.html) to accelerate Linear layer inference.
5. [Variable Resolution Inference](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/changing_resolution.html) to reduce the resolution of intermediate inference steps.
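To make the caching idea in item 2 concrete, here is a minimal sketch of one common feature-caching scheme (in the spirit of TeaCache-style methods): when the model input changes little between adjacent denoising steps, a cached transformer residual is reused instead of recomputed. The class, threshold, and reuse rule are illustrative assumptions, not lightx2v's actual implementation.

```python
import torch

class FeatureCache:
    """Sketch of step skipping via cached residuals (TeaCache-style idea)."""

    def __init__(self, threshold: float = 0.05):
        self.threshold = threshold   # max relative input change before recomputing
        self.prev_input = None
        self.cached_residual = None

    def should_recompute(self, x: torch.Tensor) -> bool:
        if self.cached_residual is None:
            return True              # nothing cached yet
        # Relative L1 change of the input since the last real forward pass.
        rel = ((x - self.prev_input).abs().mean() / self.prev_input.abs().mean()).item()
        return rel > self.threshold

    def step(self, x: torch.Tensor, transformer) -> torch.Tensor:
        if self.should_recompute(x):
            self.cached_residual = transformer(x) - x  # cache output minus input
            self.prev_input = x
        return x + self.cached_residual  # cheap reuse when the input barely moved
```

Each denoising step then calls `cache.step(latents, transformer)` instead of `transformer(latents)`, so steps whose inputs barely changed skip the expensive forward pass entirely.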
## ⚠️ Note
Some acceleration solutions currently cannot be used together, and we are working to resolve this issue.
If you have any questions, feel free to report bugs or request features in [🐛 GitHub Issues](https://github.com/ModelTC/lightx2v/issues).
# Deployment for Low Latency Scenarios
In low-latency scenarios, we prioritize speed and set aside concerns such as GPU memory and RAM overhead. We provide two solutions:
## 💡 Solution 1: Inference with a Step-Distilled Model
For details, see the [Step Distillation Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/step_distill.html).
🧠 **Step Distillation** is a very direct way to accelerate inference for video generation models: distilling from 50 steps down to 4 cuts inference time to roughly 4/50 of the original. A step-distilled model can still be combined with the following techniques:
1. [Efficient Attention Mechanisms](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/attention.html)
2. [Model Quantization](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/quantization.html)
## 💡 Solution 2: Inference with a Non-Distilled Model
Step distillation requires substantial training resources, and the distilled model may show a degraded video dynamic range.
For the original model without step distillation, the following techniques can be used individually or in combination to accelerate inference:
1. [Parallel Inference](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/parallel.html) for multi-GPU parallel acceleration.
2. [Feature Caching](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/cache.html) to reduce the actual number of inference steps.
3. [Efficient Attention Mechanisms](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/attention.html) to accelerate attention inference.
4. [Model Quantization](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/quantization.html) to accelerate Linear-layer inference.
5. [Variable-Resolution Inference](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/changing_resolution.html) to lower the resolution of intermediate inference steps.
## ⚠️ Note
Some acceleration techniques cannot currently be combined with each other; we are working to resolve this.
If you have any questions, feel free to file bug reports or feature requests in [🐛 GitHub Issues](https://github.com/ModelTC/lightx2v/issues).
@@ -28,7 +28,7 @@
### U-Shaped Resolution Strategy
If the resolution is lowered in the very first denoising steps, the final generated video may differ noticeably from one produced by normal inference. A U-shaped resolution strategy can therefore be used: keep the original resolution for the first few steps, then switch to a lower resolution for inference.
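A minimal sketch of this schedule follows, assuming a 50-step denoising loop; the step counts and resolutions are illustrative values, not numbers taken from this documentation.

```python
# Hypothetical U-shaped resolution schedule: full resolution for the first and
# last few denoising steps, a cheaper low resolution for the middle steps.
def u_shaped_resolution(step: int, total_steps: int,
                        full_res=(720, 1280), low_res=(480, 832),
                        head: int = 5, tail: int = 5):
    """Return the (height, width) to infer at for a given denoising step."""
    if step < head or step >= total_steps - tail:
        return full_res  # endpoints keep the original resolution
    return low_res       # intermediate steps run at reduced cost

schedule = [u_shaped_resolution(s, 50) for s in range(50)]
```

The intuition is that the early full-resolution steps fix the global structure of the video (avoiding the drift described above), while the final full-resolution steps restore fine detail.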
## Usage