Commits · 5e52952201f3515ba24d54458de2bddc656ba235 · OpenDAS / tilelang

13 Sep, 2025 1 commit
- [Lint] Add ruff config to check for useless spaces (#807) · 5e529522
  Yichen Yan authored Sep 13, 2025
```
* update lint config

* Remove whitespace from blank lines

* update
```
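
The commit does not list which rules the new config enables. Ruff covers this kind of whitespace lint with W291 (trailing-whitespace) and W293 (blank-line-with-whitespace), which can be run with `ruff check --select W291,W293`. The sketch below is a hypothetical pure-Python approximation of those two checks, meant only to show what they flag. It is not Ruff's implementation, and the function name is made up for illustration.

```python
def find_useless_spaces(source: str) -> list:
    """Return (line_no, code) pairs, roughly mimicking Ruff's W291/W293 checks.

    Hypothetical helper for illustration only; Ruff's real checks are more thorough.
    """
    issues = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip() == "":
            # A blank line that still contains spaces or tabs (W293-style).
            if line:
                issues.append((lineno, "W293"))
        elif line != line.rstrip():
            # A non-blank line that ends in whitespace (W291-style).
            issues.append((lineno, "W291"))
    return issues


print(find_useless_spaces("x = 1  \n\n    \ny = 2\n"))  # [(1, 'W291'), (3, 'W293')]
```
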
19 Aug, 2025 2 commits

Add docstrings to `mxfp4` (#732) · e3a80b70

coderabbitai[bot] authored Aug 19, 2025

* 📝 Add docstrings to `mxfp4`

Docstring generation was requested by @LeiWang1999.

* https://github.com/tile-ai/tilelang/pull/725#issuecomment-3191656561

The following files were modified:

* `examples/bitnet-1.58b/kernel_benchmark/tilelang_bitnet_158_int8xint2_prefill.py`
* `examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py`
* `examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py`
* `examples/dequantize_gemm/utils.py`
* `examples/gemm/example_gemm_autotune.py`
* `tilelang/intrinsics/utils.py`
* `tilelang/language/__init__.py`
* `tilelang/language/utils.py`
* `tilelang/quantize/mxfp.py`
* `tilelang/quantize/quantization.py`

* [Lint] More accurate docstring

* [Lint]

---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: tzj-fxz <tzjfxz@gmail.com>

[Feature] Low-bit twiddling dequantization and FP4 GEMM (#725) · 24603e4a

Zhengju Tang authored Aug 19, 2025

* [Dequant] Add bit-twiddling dequantize cuda for fp4-->bf16

* [Dequant] Add extern call and serial dequantization

* [Dequant] Parallel dequantization, pending fence debugging

* [Scale] Add scale matrix to mxfp4 gemm

* [Remove] Remove the example with the fence bug and some generated CUDA source code

* [MXFP4] Update initial version of MXFP4 GEMM

* [Scale] Add scale to latest mxfp4 gemm

* [Lint]

* [BugFix] Load Scale; disable TMA to recover performance

* [Lint]

* [Lint]

* [Scale] Hold Scale in L2 and enable TMA, which slightly boosts performance

* [Lint]

* Update example_dequant_gemm_bf16_fp4_hopper_serial.py

* Remove deprecated dequantization examples for BF16 and MXFP4 in the dequantize_gemm directory.

* Refactor dequantization examples for improved readability and consistency. Adjusted formatting in matmul function and added spacing for clarity. Updated function signatures and comments for better understanding.

* Refactor index_to_coordinates usage in bitnet example and update dequantization example configurations. Removed the custom index_to_coordinates function and replaced it with the built-in version. Adjusted block_K parameter in dequantization example for consistency.

* lint fix

* ci fix

* Remove non-existent example

* [BugFix] Add shared-memory (smem) swizzle to recover TMA performance

* [BugFix] Ensure the producer has enough registers when threads=512

---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
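
The first bullet of #725 names a bit-twiddling FP4-to-BF16 dequantization but does not spell out the trick. Below is a minimal Python model of one common approach, assuming the FP4 values are E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, as in the OCP MX spec) and that the low nibble of a packed byte holds the first element. It is not the CUDA code from #725, and all function names are hypothetical. The sign, exponent and mantissa bits are moved straight into the matching BF16 fields, and a single multiply by 2^126 (BF16 bias 127 minus FP4 bias 1) corrects the exponent, which also handles the FP4 subnormal 0.5 without a branch.

```python
import struct

# Hypothetical helpers: a Python model of the bit-twiddling idea, not the #725 kernel.


def fp4_e2m1_to_bf16_bits(nibble: int) -> int:
    """Move FP4 (E2M1) fields into a BF16 bit pattern with no branching.

    The resulting BF16 value equals fp4_value * 2**-126.
    """
    sign = (nibble >> 3) & 0x1
    exp = (nibble >> 1) & 0x3
    man = nibble & 0x1
    # BF16 layout: bit 15 sign | bits 14..7 exponent | bits 6..0 mantissa
    return (sign << 15) | (exp << 7) | (man << 6)


def bf16_bits_to_float(bits: int) -> float:
    # A BF16 value is the upper 16 bits of an IEEE-754 float32.
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


def dequant_fp4(nibble: int) -> float:
    # 2**126 == 2**(127 - 1): BF16 exponent bias minus the E2M1 exponent bias.
    return bf16_bits_to_float(fp4_e2m1_to_bf16_bits(nibble)) * 2.0 ** 126


def dequant_packed_byte(byte: int) -> tuple:
    # Assumption: two FP4 values per byte, low nibble first.
    return dequant_fp4(byte & 0xF), dequant_fp4((byte >> 4) & 0xF)


print([dequant_fp4(q) for q in range(8)])  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

On a GPU the same field moves would be done with integer shifts and masks on registers, usually on several nibbles at once; the Python version only checks that the arithmetic is right.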

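
Several #725 bullets add a scale matrix to the MXFP4 GEMM. In the MX formats defined by the OCP Microscaling spec, a block of 32 elements shares one E8M0 scale, an 8-bit power-of-two exponent with bias 127. The reference below is a hedged sketch of what such a scaled GEMM computes. The group size default, the N x K layout of B, and the shape of the scale tensor are assumptions, not read from the example files, and the helper names are hypothetical.

```python
# Hypothetical MXFP4 reference GEMM; layouts and names are assumptions for illustration.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fp4_value(nibble: int) -> float:
    # Bit 3 is the sign; bits 2..0 index the E2M1 magnitude table.
    magnitude = FP4_E2M1_MAGNITUDES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def dequant_mxfp4_row(nibbles, scale_exps, group_size=32):
    # Each group of `group_size` elements shares one E8M0 exponent (bias 127).
    return [
        fp4_value(q) * 2.0 ** (scale_exps[k // group_size] - 127)
        for k, q in enumerate(nibbles)
    ]


def mxfp4_gemm_reference(A, B_q, B_scale, group_size=32):
    """C = A @ dequant(B)^T.

    A: M x K floats, B_q: N x K FP4 nibbles,
    B_scale: N x ceil(K / group_size) E8M0 exponents.
    """
    B = [dequant_mxfp4_row(row, s, group_size) for row, s in zip(B_q, B_scale)]
    return [[sum(a * b for a, b in zip(a_row, b_row)) for b_row in B] for a_row in A]


# Tiny example with group_size=2: scales 2**1 and 2**-1 for the two groups.
print(mxfp4_gemm_reference([[1.0] * 4], [[0b0010, 0b0100, 0b1010, 0b0001]], [[128, 126]], group_size=2))
# [[5.75]]
```

Because an E8M0 scale is a pure power of two, applying it to a dequantized value is exact in floating point (barring overflow or underflow), so the scaling step introduces no rounding error of its own.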
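
One #725 bullet replaces a custom `index_to_coordinates` function in the bitnet example with the built-in version. Neither signature appears in this log. As a hedged illustration, such a helper conventionally unravels a flat index into per-dimension coordinates. The sketch below assumes row-major order and uses hypothetical names; it does not claim to match tilelang's built-in.

```python
# Hypothetical row-major index helpers; not tilelang's built-in API.


def index_to_coordinates_sketch(index: int, shape: tuple) -> tuple:
    """Unravel a flat row-major index into one coordinate per dimension."""
    coords = []
    for dim in reversed(shape):
        coords.append(index % dim)  # position within the innermost remaining dim
        index //= dim
    return tuple(reversed(coords))


def coordinates_to_index_sketch(coords: tuple, shape: tuple) -> int:
    """Inverse mapping: fold coordinates back into a flat row-major index."""
    index = 0
    for c, dim in zip(coords, shape):
        index = index * dim + c
    return index


print(index_to_coordinates_sketch(23, (2, 3, 4)))  # (1, 2, 3)
```
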