1. 05 Nov, 2025 1 commit
    • [SM70] Refactor and minor fix for SM70 (#1195) · 4a9cb470
      Lei Wang authored
      * [Feature] Add support for SM70 tensor core MMA instructions
      
      - Introduced new intrinsic `ptx_mma_sm70` for Volta GPUs, enabling m16n16k4 shape with FP16 inputs and FP16/FP32 accumulation.
      - Added `GemmMMASm70` class for handling GEMM operations specific to SM70 architecture.
      - Implemented layout functions for Volta swizzled layouts and updated existing GEMM layout inference logic.
      - Updated `requirements-dev.txt` to include `apache-tvm-ffi` dependency.
      - Added correctness evaluation script for testing GEMM operations on SM70.
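
As a rough illustration of the m16n16k4 shape mentioned above, the following NumPy sketch shows what a single tile-level multiply-accumulate computes with FP16 inputs and either FP16 or FP32 accumulation, and how a larger GEMM can be built by stepping over K in chunks of 4. The function names (`mma_m16n16k4_reference`, `gemm_by_tiles`) are hypothetical and are not the `ptx_mma_sm70` intrinsic or the `GemmMMASm70` class. The sketch models only the arithmetic, not the per-thread register layout or the Volta swizzled layouts.

```python
import numpy as np

# Tile shape named in the commit message: m16n16k4.
M, N, K = 16, 16, 4


def mma_m16n16k4_reference(a, b, c):
    """Hypothetical reference for one tile: D = A @ B + C.

    A is (16, 4) FP16, B is (4, 16) FP16, and C is (16, 16) in the
    accumulator dtype (FP16 or FP32). The result uses C's dtype.
    """
    assert a.shape == (M, K) and b.shape == (K, N) and c.shape == (M, N)
    assert a.dtype == np.float16 and b.dtype == np.float16
    assert c.dtype in (np.float16, np.float32)
    acc = c.dtype
    # Promote the inputs to the accumulator dtype before multiplying.
    return (a.astype(acc) @ b.astype(acc) + c).astype(acc)


def gemm_by_tiles(a, b, accum_dtype=np.float32):
    """Hypothetical GEMM that tiles the problem into m16n16k4 steps."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % M == 0 and n % N == 0 and k % K == 0
    out = np.zeros((m, n), dtype=accum_dtype)
    for i in range(0, m, M):
        for j in range(0, n, N):
            for p in range(0, k, K):
                # Accumulate one K-slice into the current output tile.
                out[i:i + M, j:j + N] = mma_m16n16k4_reference(
                    a[i:i + M, p:p + K],
                    b[p:p + K, j:j + N],
                    out[i:i + M, j:j + N],
                )
    return out
```

A correctness check in this style compares the tiled result against a plain FP32 matrix product within a tolerance, since the accumulation order and accumulator precision affect the low-order bits.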
      
      * [Refactor] Update formatting and installation commands in scripts
      
      - Modified `format.sh` to install `pre-commit` and `clang-tidy` with the `--user` flag for user-specific installations.
      - Improved readability in `correctness_evaluation_sm70.py` by adjusting the formatting of pytest parameters.
      - Cleaned up spacing and formatting in various C++ source files for better consistency and readability.
      - Removed unnecessary comments and improved layout function definitions in `mma_sm70_layout.py` and `mma_sm70_macro_generator.py` for clarity.
      - Ensured consistent formatting in layout initialization and swizzle functions.
      
      * typo fix