- 05 Nov, 2025 3 commits
Lei Wang authored
* [Feature] Add support for SM70 tensor core MMA instructions
  - Introduced a new intrinsic `ptx_mma_sm70` for Volta GPUs, enabling the m16n16k4 shape with FP16 inputs and FP16/FP32 accumulation.
  - Added a `GemmMMASm70` class for handling GEMM operations specific to the SM70 architecture.
  - Implemented layout functions for Volta swizzled layouts and updated the existing GEMM layout inference logic.
  - Updated `requirements-dev.txt` to include the `apache-tvm-ffi` dependency.
  - Added a correctness evaluation script for testing GEMM operations on SM70.
* [Refactor] Update formatting and installation commands in scripts
  - Modified `format.sh` to install `pre-commit` and `clang-tidy` with the `--user` flag for user-specific installations.
  - Improved readability in `correctness_evaluation_sm70.py` by adjusting the formatting of pytest parameters.
  - Cleaned up spacing and formatting in various C++ source files for better consistency and readability.
  - Removed unnecessary comments and improved layout function definitions in `mma_sm70_layout.py` and `mma_sm70_macro_generator.py` for clarity.
  - Ensured consistent formatting in layout initialization and swizzle functions.
* typo fix
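The "Volta swizzled layouts" above refer to shared-memory layouts that permute columns so that the strided accesses of MMA fragments land in different memory banks. A minimal Python sketch of one common XOR-based swizzle is shown below; the function name, parameters, and exact mapping are illustrative assumptions, not the actual definitions in `mma_sm70_layout.py`.

```python
def xor_swizzle(row: int, col: int, vec: int = 8) -> int:
    """Return the swizzled column for element (row, col) of a row-major tile.

    Columns are grouped into vectors of `vec` contiguous elements (8 halves
    = one 16-byte chunk). XOR-ing the vector index with the row permutes
    the chunks per row, spreading column accesses across banks.
    NOTE: illustrative sketch only; TileLang's real layout may differ.
    """
    chunk, offset = divmod(col, vec)      # which 8-element chunk, position inside it
    swizzled_chunk = chunk ^ (row % vec)  # per-row permutation of chunks
    return swizzled_chunk * vec + offset
```

Because XOR with a constant is a bijection, each row still sees a permutation of the columns, so no two elements collide.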
Lei Wang authored
[Refactor] Dynamic registration of FP8 data type for compatibility with older PyTorch versions (#1197)
Lei Wang authored
* fix
* lint fix
* fix
* lint fix
* fix
* upd
* support n>256
* Remove unnecessary pass configurations for fast math in the MHA forward BHSD latency script.
* lint fix
* lint fix
- 02 Nov, 2025 1 commit
Lei Wang authored
* fix
* lint fix
* fix
* lint fix
* fix
* upd