    [SM70] Refactor and minor fix for SM70 (#1195) · 4a9cb470
    Lei Wang authored
    * [Feature] Add support for SM70 tensor core MMA instructions
    
    - Introduced new intrinsic `ptx_mma_sm70` for Volta GPUs, enabling m16n16k4 shape with FP16 inputs and FP16/FP32 accumulation.
    - Added `GemmMMASm70` class for handling GEMM operations specific to SM70 architecture.
    - Implemented layout functions for Volta swizzled layouts and updated existing GEMM layout inference logic.
    - Updated `requirements-dev.txt` to include `apache-tvm-ffi` dependency.
    - Added correctness evaluation script for testing GEMM operations on SM70.
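
To make the "FP16 inputs and FP16/FP32 accumulation" wording above concrete, here is a minimal pure-Python sketch of the arithmetic for one m16n16k4 tile (D = A·B + C). It is not the `ptx_mma_sm70` intrinsic or the correctness script from this commit. The names `to_fp16`, `to_fp32`, and `mma_m16n16k4` are hypothetical, and rounding after every add is an assumption that may not match what the hardware does internally.

```python
import struct

def to_fp16(x):
    # Round to the nearest IEEE 754 half-precision value (ties to even).
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    # Round to the nearest IEEE 754 single-precision value.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_m16n16k4(a, b, c, accum="fp32"):
    # Hypothetical reference: A is 16x4, B is 4x16, C and D are 16x16.
    # Inputs are rounded to FP16; the accumulator is rounded to `accum`
    # after every add (an assumption made for illustration only).
    rnd = to_fp32 if accum == "fp32" else to_fp16
    M, N, K = 16, 16, 4
    d = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = rnd(c[i][j])
            for k in range(K):
                acc = rnd(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d

# Usage: with C = 2048, FP16 accumulation cannot represent 2049, so the
# four unit products are lost, while FP32 accumulation keeps them.
a = [[1.0] * 4 for _ in range(16)]
b = [[1.0] * 16 for _ in range(4)]
c = [[2048.0] * 16 for _ in range(16)]
d_fp32 = mma_m16n16k4(a, b, c, accum="fp32")
d_fp16 = mma_m16n16k4(a, b, c, accum="fp16")
```

A reference of this kind is one way a correctness test could compare kernel output per accumulation type, with looser tolerances for the FP16 accumulator.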
    
    * [Refactor] Update formatting and installation commands in scripts
    
    - Modified `format.sh` to install `pre-commit` and `clang-tidy` with the `--user` flag, so they are installed per user rather than system-wide.
    - Improved readability in `correctness_evaluation_sm70.py` by adjusting the formatting of pytest parameters.
    - Cleaned up spacing and formatting in various C++ source files for better consistency and readability.
    - Removed unnecessary comments and improved layout function definitions in `mma_sm70_layout.py` and `mma_sm70_macro_generator.py` for clarity.
    - Ensured consistent formatting in layout initialization and swizzle functions.
    
    * typo fix
codegen_cuda.h 6.33 KB