    [Refactor] Phase out tf32 Casting from GEMM Templates (#573) · 9ba8b480
    Lei Wang authored
    * [Feature] Add Quarter Bank Swizzle Layout and Update GEMM Layout Logic
    
    - Introduced a new `makeQuarterBankSwizzleLayout` function for 32-byte layout swizzling.
    - Updated `makeGemmABLayout` to include an `enable_padding` parameter, allowing for conditional layout selection between padded and quarter bank swizzle layouts.
    - Adjusted layout inference in GEMM operations to utilize the new quarter bank swizzle layout when appropriate.
    - Enhanced bulk copy operations to recognize and handle the new layout type, improving memory access patterns.
    
    * lint fix
    
    * [Refactor] Update GEMM Layout Functions and Inference Logic
    
    - Removed the `enable_padding` parameter from `makeGemmABLayout` to simplify its signature.
    - Introduced `makeGemmABLayoutHopper` for enhanced layout handling specific to Hopper architecture.
    - Updated layout inference in GEMM operations to utilize the new `makeGemmABLayoutHopper` function, improving clarity and maintainability in layout selection.
    - Adjusted related layout functions to ensure consistent behavior across different architectures.
    
    * [Refactor] Remove tf32 Casting Logic from GEMM Templates
    
    - Eliminated the `cast_float_to_tf32` function from `gemm_sm80`, `gemm_sm89`, and `gemm_sm90` templates to streamline the code.
    - Removed conditional casting logic for float32 to tfloat32 conversion, enhancing clarity and maintainability.
    - Updated relevant sections in GEMM operations to reflect the removal of casting, ensuring consistent behavior across templates.
    - Adjusted tensor view handling to improve performance and accuracy in matrix operations.
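
For readers unfamiliar with what a float32-to-tfloat32 cast does numerically, here is a minimal sketch. It is not the removed `cast_float_to_tf32` code; the helper names are hypothetical, and it assumes plain truncation (clearing the low 13 of float32's 23 mantissa bits to leave tf32's 10), whereas real hardware conversions may round instead.

```python
import struct


def float_to_tf32_bits(x: float) -> int:
    """Hypothetical helper: float32 bit pattern of x with the low 13 mantissa bits cleared."""
    # Reinterpret x as a 32-bit IEEE-754 pattern (this first rounds x to float32).
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # tf32 keeps the sign, 8 exponent bits, and the top 10 mantissa bits.
    return bits & 0xFFFFE000


def truncate_to_tf32(x: float) -> float:
    """Hypothetical helper: value of x after truncation to tf32 precision."""
    (out,) = struct.unpack("<f", struct.pack("<I", float_to_tf32_bits(x)))
    return out


if __name__ == "__main__":
    for value in (1.0, 1.0 + 2.0**-10, 1.0 + 2.0**-11):
        print(value, "->", truncate_to_tf32(value))
```

Under this truncation assumption, a value such as `1.0 + 2**-10` survives unchanged, while `1.0 + 2**-11` collapses to `1.0`, which illustrates the precision difference between float32 and tf32 operands.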
    
    * Update bulk_copy.cc
    
    * Fix profiler initialization in GEMM test by removing the `TensorSupplyType` argument for improved flexibility.
test_tilelang_kernel_gemm.py 11.1 KB