- 15 Nov, 2025 1 commit
-
-
Gabriel Wu authored
* [fix] NVRTC execution backend * [fmt] run pre-commit * [fix] coderabbit reviews * [test] add cuda-python to test dep * [fix] coderabbit reviews * [fix] CUDA 13 compatibility * [fix] sm90 * [fix] CUDA 13 compatibility * [fix] pre-commit * [fix] always use cuda::std::__atomic_ref_impl * [fix] restore to external API * Revert "[fix] restore to external API" This reverts commit 49bd875638fb631d270015f408991d38fd1e9a5d. * [fmt] use space instead tabs for py codegen * [fix] im2col API * [fix] revert atomic.h * [fix] dynamic shape * [refactor] extract common utils * [feat] support L2 persistent map * [fix] l2 persistent map * [fix] pre-commit * [fix] restore _TYPE_MAP * [fix] pre-commit * [fix] avoid duplicate TMA descs * [docs] add docstring * [fix] coderabbit * [fix] coderabbit * [fix] coderabbit * [fix] coderabbit
-
- 05 Nov, 2025 1 commit
-
-
Lei Wang authored
* [Feature] Add support for SM70 tensor core MMA instructions - Introduced new intrinsic `ptx_mma_sm70` for Volta GPUs, enabling m16n16k4 shape with FP16 inputs and FP16/FP32 accumulation. - Added `GemmMMASm70` class for handling GEMM operations specific to SM70 architecture. - Implemented layout functions for Volta swizzled layouts and updated existing GEMM layout inference logic. - Updated `requirements-dev.txt` to include `apache-tvm-ffi` dependency. - Added correctness evaluation script for testing GEMM operations on SM70. * [Refactor] Update formatting and installation commands in scripts - Modified `format.sh` to install `pre-commit` and `clang-tidy` with the `--user` flag for user-specific installations. - Improved readability in `correctness_evaluation_sm70.py` by adjusting the formatting of pytest parameters. - Cleaned up spacing and formatting in various C++ source files for better consistency and readability. - Removed unnecessary comments and improved layout function definitions in `mma_sm70_layout.py` and `mma_sm70_macro_generator.py` for clarity. - Ensured consistent formatting in layout initialization and swizzle functions. * typo fix
-