xuwx1 / LightX2V / Commits / 464d2424

Commit 464d2424, authored Jul 28, 2025 by gushiqiao; committed by GitHub on Jul 28, 2025

    Update quantization docs

Parents: 9f86d927, 65dfa2f7

Changes: 2 changed files, with 16 additions and 22 deletions (+16 -22)

- docs/EN/source/method_tutorials/quantization.md (+0 -3)
- docs/ZH_CN/source/method_tutorials/quantization.md (+16 -19)
docs/EN/source/method_tutorials/quantization.md (view file @ 464d2424)

@@ -169,9 +169,6 @@ LightX2V supports custom quantization kernels that can be extended in the follow
 1. **Hardware Requirements**: FP8 quantization requires an FP8-capable GPU (such as H100 or the RTX 40 series)
 2. **Precision Impact**: Quantization introduces some precision loss, which must be weighed against the application scenario
-3. **Model Compatibility**: Ensure quantized models are compatible with the inference code version
-4. **Memory Management**: Watch memory usage when loading quantized models
-5. **Quantization Calibration**: Use a representative dataset for quantization calibration to achieve the best results
 ## 📚 Related Resources
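The hardware requirement in note 1 can be verified at runtime. A best-effort sketch, assuming the published NVIDIA thresholds for FP8 tensor-core support (compute capability 8.9 for Ada / RTX 40 series, 9.0 for Hopper / H100); this helper is illustrative and not part of LightX2V:

```python
def supports_fp8() -> bool:
    """Best-effort check for an FP8-capable NVIDIA GPU (sketch).

    FP8 tensor-core support begins at compute capability 8.9
    (Ada / RTX 40 series) and 9.0 (Hopper / H100); these thresholds
    are an assumption taken from NVIDIA documentation.
    """
    try:
        import torch
    except ImportError:
        # No PyTorch available: cannot run FP8 kernels anyway.
        return False
    if not torch.cuda.is_available():
        return False
    # (major, minor) tuple compares lexicographically, so >= (8, 9)
    # covers both 8.9 (Ada) and 9.x (Hopper).
    return torch.cuda.get_device_capability() >= (8, 9)
```

On machines without a supported GPU the function simply returns `False`, so it is safe to call before choosing an FP8 `mm_type`.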
docs/ZH_CN/source/method_tutorials/quantization.md (view file @ 464d2424)

@@ -10,15 +10,15 @@ LightX2V supports multiple DIT matrix-multiplication quantization schemes, selected via the config file's `mm_ty
 #### Supported mm_type values
-| mm_type | Weight Quantization | Activation Quantization | Compute Kernel | Use Case |
-|---------|----------|----------|----------|----------|
-| `Default` | None | None | PyTorch | Precision-first |
-| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | VLLM | High performance on H100/A100 |
-| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm` | INT8 per-channel symmetric | INT8 per-channel dynamic symmetric | VLLM | General GPU compatibility |
-| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Q8F` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | Q8F | High-performance inference |
-| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Q8F` | INT8 per-channel symmetric | INT8 per-channel dynamic symmetric | Q8F | High-performance inference |
-| `W-fp8-block128-sym-A-fp8-channel-group128-sym-dynamic-Deepgemm` | FP8 block-wise symmetric | FP8 channel-group symmetric | DeepGEMM | Large-model optimization |
-| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Sgl` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | SGL | Streaming inference |
+| mm_type | Weight Quantization | Activation Quantization | Compute Kernel |
+|---------|----------|----------|----------|
+| `Default` | None | None | PyTorch |
+| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | VLLM |
+| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm` | INT8 per-channel symmetric | INT8 per-channel dynamic symmetric | VLLM |
+| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Q8F` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | Q8F |
+| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Q8F` | INT8 per-channel symmetric | INT8 per-channel dynamic symmetric | Q8F |
+| `W-fp8-block128-sym-A-fp8-channel-group128-sym-dynamic-Deepgemm` | FP8 block-wise symmetric | FP8 channel-group symmetric | DeepGEMM |
+| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Sgl` | FP8 per-channel symmetric | FP8 per-channel dynamic symmetric | SGL |
 #### Detailed description of the quantization schemes
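The "per-channel symmetric" weight quantization that the `mm_type` names refer to can be illustrated with a small NumPy sketch; this is an illustration of the general technique, not LightX2V's actual kernel:

```python
import numpy as np

def quantize_per_channel_sym_int8(w: np.ndarray):
    """Symmetric per-channel INT8 quantization of a weight matrix.

    One scale per output channel (row); the zero-point is fixed at 0,
    which is what "channel-sym" denotes in the mm_type names.
    """
    # Max |w| per row, guarded against division by zero.
    amax = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8)
    scale = amax / 127.0  # map [-amax, amax] onto [-127, 127]
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_per_channel_sym_int8(w)
err = np.abs(dequantize(q, scale) - w).max()  # small rounding error
```

The FP8 variants follow the same scale-per-channel idea but round to an 8-bit floating-point format instead of an integer grid, and the "dynamic" activation schemes recompute the scales on the fly at inference time.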
@@ -45,13 +45,13 @@ The T5 encoder supports the following quantization schemes:
 #### Supported quant_scheme values
-| quant_scheme | Quantization Precision | Compute Kernel | Use Case |
-|--------------|----------|----------|----------|
-| `int8` | INT8 | VLLM | General GPUs |
-| `fp8` | FP8 | VLLM | H100/A100 GPUs |
-| `int8-torchao` | INT8 | TorchAO | Compatibility-first |
-| `int8-q8f` | INT8 | Q8F | High-performance inference |
-| `fp8-q8f` | FP8 | Q8F | High-performance inference |
+| quant_scheme | Quantization Precision | Compute Kernel |
+|--------------|----------|----------|
+| `int8` | INT8 | VLLM |
+| `fp8` | FP8 | VLLM |
+| `int8-torchao` | INT8 | TorchAO |
+| `int8-q8f` | INT8 | Q8F |
+| `fp8-q8f` | FP8 | Q8F |
 #### T5 quantization features
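Both tables describe string-valued fields in the configuration file mentioned at the top of the document. A minimal sketch of writing such a config; the file name `quant_config.json` and the flat key placement are assumptions for illustration, not confirmed by this diff:

```python
import json

# Hypothetical LightX2V config fragment: pick one DIT matmul scheme
# (mm_type) and one T5 encoder scheme (quant_scheme) from the tables.
# Flat key placement and the file name are assumptions.
config = {
    "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm",  # DIT matmuls
    "quant_scheme": "fp8",                                          # T5 encoder
}

with open("quant_config.json", "w") as f:
    json.dump(config, f, indent=2)
```

Per the hardware note below, the FP8 choices shown here assume an FP8-capable GPU; on other hardware the INT8 variants are the compatible fallback.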
@@ -184,9 +184,6 @@ LightX2V supports custom quantization kernels, which can be extended as follows:
 1. **Hardware Requirements**: FP8 quantization requires an FP8-capable GPU (such as H100 or the RTX 40 series)
 2. **Precision Impact**: Quantization introduces some precision loss, which must be weighed against the application scenario
-3. **Model Compatibility**: Ensure quantized models are compatible with the inference code version
-4. **Memory Management**: Watch memory usage when loading quantized models
-5. **Quantization Calibration**: Use a representative dataset for quantization calibration to achieve the best results
 ## 📚 Related Resources