    [Feature] Implement Swizzle 32B (#566) · ae9668a8
    Lei Wang authored
    * [Feature] Add Quarter Bank Swizzle Layout and Update GEMM Layout Logic
    
    - Introduced a new `makeQuarterBankSwizzleLayout` function that builds a 32-byte (quarter bank) swizzled layout.
    - Updated `makeGemmABLayout` to include an `enable_padding` parameter, allowing for conditional layout selection between padded and quarter bank swizzle layouts.
    - Adjusted layout inference in GEMM operations to utilize the new quarter bank swizzle layout when appropriate.
    - Enhanced bulk copy operations to recognize and handle the new layout type, improving memory access patterns.
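
As a rough illustration of what a 32-byte swizzle does, the sketch below applies an XOR swizzle to a shared-memory byte offset. "Quarter bank" is read here as a quarter of the 128 bytes spanned by 32 four-byte banks. The function names `swizzle32b` and `check_swizzle32b` are hypothetical, and the bit positions (16-byte chunks, with offset bit 7 toggling bit 4) are an assumption based on the common 32B swizzle convention; this is not the body of `makeQuarterBankSwizzleLayout`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from layout.h.
// Assumption: data moves in 16-byte chunks and the pattern repeats every
// 256 bytes. Bit 7 of the byte offset (which 128-byte row we are in) is
// XORed into bit 4 (which 16-byte chunk of a 32-byte pair), so odd
// 128-byte rows swap the two chunks of each 32-byte pair.
constexpr std::uint32_t swizzle32b(std::uint32_t byte_offset) {
  return byte_offset ^ (((byte_offset >> 7) & 1u) << 4);
}

// Small self-check: the mapping leaves even rows untouched, swaps chunk
// pairs in odd rows, and is its own inverse.
inline void check_swizzle32b() {
  assert(swizzle32b(0) == 0);      // row 0, chunk 0: unchanged
  assert(swizzle32b(16) == 16);    // row 0, chunk 1: unchanged
  assert(swizzle32b(128) == 144);  // row 1, chunk 0 -> chunk 1
  assert(swizzle32b(144) == 128);  // row 1, chunk 1 -> chunk 0
  assert(swizzle32b(swizzle32b(200)) == 200);  // applying twice is identity
}
```

Because the transform only flips one bit based on another bit that it never changes, applying it twice returns the original offset, which is why the same function can serve for both writes and reads.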
    
    * lint fix
    
    * [Refactor] Update GEMM Layout Functions and Inference Logic
    
    - Removed the `enable_padding` parameter from `makeGemmABLayout` to simplify its signature.
    - Introduced `makeGemmABLayoutHopper` for enhanced layout handling specific to Hopper architecture.
    - Updated layout inference in GEMM operations to utilize the new `makeGemmABLayoutHopper` function, improving clarity and maintainability in layout selection.
    - Adjusted related layout functions to ensure consistent behavior across different architectures.
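
One plausible way to choose between swizzled and padded layouts is to look at how many bytes the contiguous dimension occupies. The sketch below is only an assumption about such a rule: `SwizzleKind` and `pick_swizzle` are hypothetical names, and the thresholds are not taken from `makeGemmABLayoutHopper`.

```cpp
#include <cassert>

// Hypothetical layout kinds, widest swizzle last.
enum class SwizzleKind { kPadded, kQuarterBank32B, kHalfBank64B, kFullBank128B };

// Assumed rule: use the widest swizzle whose byte span evenly divides the
// contiguous row, and fall back to a padded layout otherwise.
constexpr SwizzleKind pick_swizzle(int continuous_elems, int element_bits) {
  return (continuous_elems * element_bits / 8) % 128 == 0
             ? SwizzleKind::kFullBank128B
         : (continuous_elems * element_bits / 8) % 64 == 0
             ? SwizzleKind::kHalfBank64B
         : (continuous_elems * element_bits / 8) % 32 == 0
             ? SwizzleKind::kQuarterBank32B
             : SwizzleKind::kPadded;
}

// Compile-time checks for 16-bit elements.
static_assert(pick_swizzle(64, 16) == SwizzleKind::kFullBank128B, "128-byte rows");
static_assert(pick_swizzle(32, 16) == SwizzleKind::kHalfBank64B, "64-byte rows");
static_assert(pick_swizzle(16, 16) == SwizzleKind::kQuarterBank32B, "32-byte rows");
static_assert(pick_swizzle(24, 16) == SwizzleKind::kPadded, "48-byte rows");
```

Under this assumed rule, a 32-byte swizzle covers rows that are too narrow for the 64-byte or 128-byte variants, which is the case a padded layout would otherwise have to handle.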
    
    * Update bulk_copy.cc
    
    * Update __init__.py
layout.h 6.46 KB