xuwx1 / LightX2V · Commits

Commit 65dfa2f7, authored Jul 28, 2025 by gushiqiao
Update quantization docs
Parent: e53d3cb4
Changes (2 files, +16 −22):

- docs/EN/source/method_tutorials/quantization.md (+0 −3)
- docs/ZH_CN/source/method_tutorials/quantization.md (+16 −19)
docs/EN/source/method_tutorials/quantization.md
@@ -169,9 +169,6 @@ LightX2V supports custom quantization kernels that can be extended in the following ways:
1. **Hardware Requirements**: FP8 quantization requires FP8-capable GPUs (such as H100, RTX 40 series)
2. **Precision Impact**: Quantization introduces some precision loss, which must be weighed against the application scenario
3. **Model Compatibility**: Ensure quantized models are compatible with the inference code version
4. **Memory Management**: Pay attention to memory usage when loading quantized models
5. **Quantization Calibration**: Using a representative dataset for calibration is recommended to achieve the best results
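Note 1 can be turned into a quick runtime check. Below is a minimal, framework-agnostic sketch (not part of LightX2V), assuming FP8 tensor-core support starts at CUDA compute capability 8.9 (Ada / RTX 40 series) and includes 9.0+ (Hopper / H100); in practice the capability tuple would come from `torch.cuda.get_device_capability()`:

```python
def supports_fp8(compute_capability):
    """Return True if a GPU with the given (major, minor) CUDA compute
    capability has FP8 tensor-core support.

    Assumption: FP8 support starts at 8.9 (Ada / RTX 40 series) and
    includes 9.0+ (Hopper / H100). Tuples compare lexicographically,
    so (9, 0) >= (8, 9) behaves as expected.
    """
    return tuple(compute_capability) >= (8, 9)


# e.g. cap = torch.cuda.get_device_capability()
print(supports_fp8((9, 0)))  # H100 → True
print(supports_fp8((8, 0)))  # A100: no native FP8 tensor cores → False
```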
## 📚 Related Resources
docs/ZH_CN/source/method_tutorials/quantization.md
@@ -10,15 +10,15 @@ LightX2V supports multiple DIT matrix-multiplication quantization schemes, selected via the `mm_type` field in the config file
#### Supported mm_type values

Before (the table removed by this commit, with a "Use case" column):

| mm_type | Weight quantization | Activation quantization | Compute kernel | Use case |
|---------|---------------------|--------------------------|----------------|----------|
| `Default` | None | None | PyTorch | Precision first |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | VLLM | High performance on H100/A100 |
| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm` | INT8 per-channel symmetric | INT8 per-channel dynamic symmetric | VLLM | General GPU compatibility |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Q8F` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | Q8F | High-performance inference |
| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Q8F` | INT8 per-channel symmetric | INT8 per-channel dynamic symmetric | Q8F | High-performance inference |
| `W-fp8-block128-sym-A-fp8-channel-group128-sym-dynamic-Deepgemm` | FP8 block-wise symmetric | FP8 channel-group symmetric | DeepGEMM | Large-model optimization |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Sgl` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | SGL | Streaming inference |
After (the same table with the "Use case" column dropped):

| mm_type | Weight quantization | Activation quantization | Compute kernel |
|---------|---------------------|--------------------------|----------------|
| `Default` | None | None | PyTorch |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | VLLM |
| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm` | INT8 per-channel symmetric | INT8 per-channel dynamic symmetric | VLLM |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Q8F` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | Q8F |
| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Q8F` | INT8 per-channel symmetric | INT8 per-channel dynamic symmetric | Q8F |
| `W-fp8-block128-sym-A-fp8-channel-group128-sym-dynamic-Deepgemm` | FP8 block-wise symmetric | FP8 channel-group symmetric | DeepGEMM |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Sgl` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | SGL |
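The per-channel symmetric (`channel-sym`) weight quantization that most of these schemes share can be illustrated in a few lines. This is a hedged sketch of the general technique, not LightX2V's actual kernel code: each output channel gets its own scale `max(|w|) / 127`, and values are rounded into the signed INT8 range.

```python
def quantize_per_channel_sym_int8(weights):
    """Per-channel symmetric INT8 quantization sketch.

    `weights` is a list of rows, one row per output channel.
    Symmetric means the zero-point is 0; each channel's scale is
    max(|w|) / 127, so dequantization is simply q * scale.
    """
    quantized, scales = [], []
    for row in weights:
        # Guard against all-zero channels to avoid division by zero.
        scale = max(abs(w) for w in row) / 127 or 1.0
        scales.append(scale)
        quantized.append(
            [max(-127, min(127, round(w / scale))) for w in row]
        )
    return quantized, scales


q, s = quantize_per_channel_sym_int8([[0.5, -1.0], [2.0, 0.0]])
# Each channel keeps its own dynamic range: the first row is scaled by
# 1.0/127 and the second by 2.0/127, so q[i][j] * s[i] approximates
# the original value.
```

The "dynamic" in the activation column means activation scales are computed at runtime from the current batch, rather than pre-computed as for weights.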
#### Detailed description of quantization schemes
@@ -45,13 +45,13 @@ The T5 encoder supports the following quantization schemes:
#### Supported quant_scheme values

Before (the table removed by this commit, with a "Use case" column):

| quant_scheme | Quantization precision | Compute kernel | Use case |
|--------------|------------------------|----------------|----------|
| `int8` | INT8 | VLLM | General GPUs |
| `fp8` | FP8 | VLLM | H100/A100 GPUs |
| `int8-torchao` | INT8 | TorchAO | Compatibility first |
| `int8-q8f` | INT8 | Q8F | High-performance inference |
| `fp8-q8f` | FP8 | Q8F | High-performance inference |
After (the same table with the "Use case" column dropped):

| quant_scheme | Quantization precision | Compute kernel |
|--------------|------------------------|----------------|
| `int8` | INT8 | VLLM |
| `fp8` | FP8 | VLLM |
| `int8-torchao` | INT8 | TorchAO |
| `int8-q8f` | INT8 | Q8F |
| `fp8-q8f` | FP8 | Q8F |
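How the `quant_scheme` value is wired in depends on LightX2V's actual configuration schema, which this diff does not show. The fragment below is only a hypothetical illustration that validates a choice against the values in the table above; the function name and config shape are assumptions, not the project's real API:

```python
# Values taken from the quant_scheme table above;
# everything else here is hypothetical.
VALID_QUANT_SCHEMES = {"int8", "fp8", "int8-torchao", "int8-q8f", "fp8-q8f"}


def make_t5_config(quant_scheme="fp8"):
    """Build a (hypothetical) T5 encoder config, rejecting unknown schemes."""
    if quant_scheme not in VALID_QUANT_SCHEMES:
        raise ValueError(f"unknown quant_scheme: {quant_scheme!r}")
    return {"quant_scheme": quant_scheme}


config = make_t5_config("int8-torchao")
```

Validating against a fixed allow-list like this fails fast on typos such as `"int4"` instead of silently falling back to an unquantized path.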
#### T5 quantization features
@@ -184,9 +184,6 @@ LightX2V supports custom quantization kernels, which can be extended in the following ways:
1. **Hardware Requirements**: FP8 quantization requires FP8-capable GPUs (such as H100, RTX 40 series)
2. **Precision Impact**: Quantization introduces some precision loss, which must be weighed against the application scenario
3. **Model Compatibility**: Ensure quantized models are compatible with the inference code version
4. **Memory Management**: Pay attention to memory usage when loading quantized models
5. **Quantization Calibration**: Using a representative dataset for calibration is recommended to achieve the best results
## 📚 Related Resources