Commit d013cac7 authored by helloyongyang

update doc

parent 6a09ef10
......@@ -6,9 +6,13 @@
Target data format reference: [MX-Formats](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf). Note that we do not need to pack raw data and scale factors together here.
Source data format: fp16/bf16
Target data format: mxfp4/6/8
Quantization factor data format: E8M0. *Per-row/per-column quantization typically stores quantization factors in fp32, and E8M0 has the same numerical range as fp32, so after rounding to a power of two the quantization factors can be stored directly. The trade-off is that the loss of mantissa bits may affect precision.*
Quantization factor data format: E8M0. Per-row/per-column quantization typically stores quantization factors in fp32, and E8M0 has the same numerical range as fp32, so after rounding to a power of two the quantization factors can be stored directly. The trade-off is that the loss of mantissa bits may affect precision.
Quantization granularity: \[1X32\]
Quantization dimension: Following Cutlass GEMM conventions, where M, N, and K denote the three dimensions of the matrix multiplication, quantization is applied along the K dimension.
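The settings above can be sketched in NumPy as follows. This is a minimal illustration, not the project's kernel: it assumes an mxfp4 (E2M1) target, derives the shared exponent as `floor(log2(amax)) - emax_elem` in the way the MX spec describes, and stores it with a bias of 127. The names `e8m0_block_scales`, `decode_e8m0`, `BLOCK`, and `FP4_EMAX` are hypothetical.

```python
import numpy as np

BLOCK = 32    # quantization granularity [1x32] along K
FP4_EMAX = 2  # assumed target: E2M1, whose largest value is 6 = 1.5 * 2**2


def e8m0_block_scales(x):
    """Return one E8M0-encoded scale (uint8) per 1x32 block along K (last axis)."""
    m, k = x.shape
    assert k % BLOCK == 0, "K must be a multiple of the block size"
    blocks = x.astype(np.float32).reshape(m, k // BLOCK, BLOCK)
    amax = np.abs(blocks).max(axis=-1)

    # frexp gives amax = mant * 2**e with mant in [0.5, 1),
    # so floor(log2(amax)) == e - 1 exactly, without log rounding issues.
    _, e = np.frexp(amax)
    shared_exp = (e - 1) - FP4_EMAX

    # All-zero blocks get the smallest representable scale.
    shared_exp = np.where(amax > 0, shared_exp, -127)

    # E8M0 stores the exponent with bias 127; 0xFF is reserved for NaN.
    return np.clip(shared_exp + 127, 0, 254).astype(np.uint8)


def decode_e8m0(biased):
    """Convert E8M0 codes back to fp32 power-of-two scales."""
    return np.exp2(biased.astype(np.float32) - 127.0)


x = np.full((2, 64), 3.0, dtype=np.float16)  # shape [M, K]
codes = e8m0_block_scales(x)                 # shape [M, K // 32]
scales = decode_e8m0(codes)
```

Because the scale is a power of two chosen with a floor, a scaled element can still exceed the largest value of the target format (for example, a block maximum of 7 yields a scale of 1), so the element conversion still needs rounding and clamping.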
### Rounding and Clamp
......
......@@ -6,9 +6,13 @@
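Target data format reference: [MX-Formats](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf). Note that here we do not need to pack the raw data and scale factors together.
Source data format: fp16/bf16
Target data format: mxfp4/6/8
Quantization factor data format: E8M0. *Per-row/per-column quantization generally stores quantization factors in fp32, and E8M0 has the same numerical range as fp32, so after rounding the quantization factors can be stored directly. The drawback is that the loss of mantissa bits affects precision.*
Quantization factor data format: E8M0. Per-row/per-column quantization generally stores quantization factors in fp32, and E8M0 has the same numerical range as fp32, so after rounding the quantization factors can be stored directly. The drawback is that the loss of mantissa bits affects precision.
Quantization granularity: \[1X32\]
Quantization dimension: Following Cutlass GEMM conventions, where M, N, and K denote the three dimensions of the matrix multiplication, quantization is applied along the K dimension.
### Rounding and Clamp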
......