- 05 Nov, 2025 3 commits
Lei Wang authored
* [Feature] Add support for SM70 tensor core MMA instructions
  - Introduced a new intrinsic `ptx_mma_sm70` for Volta GPUs, enabling the m16n16k4 shape with FP16 inputs and FP16/FP32 accumulation.
  - Added a `GemmMMASm70` class for handling GEMM operations specific to the SM70 architecture.
  - Implemented layout functions for Volta swizzled layouts and updated the existing GEMM layout inference logic.
  - Updated `requirements-dev.txt` to include the `apache-tvm-ffi` dependency.
  - Added a correctness evaluation script for testing GEMM operations on SM70.
* [Refactor] Update formatting and installation commands in scripts
  - Modified `format.sh` to install `pre-commit` and `clang-tidy` with the `--user` flag for user-specific installations.
  - Improved readability in `correctness_evaluation_sm70.py` by adjusting the formatting of pytest parameters.
  - Cleaned up spacing and formatting in various C++ source files for better consistency and readability.
  - Removed unnecessary comments and improved layout function definitions in `mma_sm70_layout.py` and `mma_sm70_macro_generator.py` for clarity.
  - Ensured consistent formatting in layout initialization and swizzle functions.
* typo fix
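The "Volta swizzled layouts" above refer to shared-memory layouts that permute columns so that the strided accesses of MMA fragments land in different memory banks. A minimal Python sketch of one common XOR-based swizzle is shown below; the function name, parameters, and exact mapping are illustrative assumptions, not the actual definitions in `mma_sm70_layout.py`.

```python
def xor_swizzle(row: int, col: int, vec: int = 8) -> int:
    """Return the swizzled column for element (row, col) of a row-major tile.

    Columns are grouped into vectors of `vec` contiguous elements (8 halves
    = one 16-byte chunk). XOR-ing the vector index with the row permutes
    the chunks per row, spreading column accesses across banks.
    NOTE: illustrative sketch only; TileLang's real layout may differ.
    """
    chunk, offset = divmod(col, vec)      # which 8-element chunk, position inside it
    swizzled_chunk = chunk ^ (row % vec)  # per-row permutation of chunks
    return swizzled_chunk * vec + offset
```

Because XOR with a constant is a bijection, each row still sees a permutation of the columns, so no two elements collide.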
Lei Wang authored
[Refactor] Dynamic registration of FP8 data type for compatibility with older PyTorch versions (#1197)
Lei Wang authored
* fix
* lint fix
* fix
* lint fix
* fix
* upd
* support n>256
* Remove unnecessary pass configurations for fast math in the MHA forward BHSD latency script.
* lint fix
* lint fix
- 02 Nov, 2025 1 commit
Lei Wang authored
* fix
* lint fix
* fix
* lint fix
* fix
* upd