    [Smem Reuse] Optimize to do memory alignment on identical buffers. (#693) · 17fafc1b
    Lei Wang authored
    * [Enhancement] Refactor GEMM operations for improved warp partitioning and target instruction handling
    
    - Introduced a new `GetGemmInst` method to determine the appropriate GEMM instruction based on block size and target architecture.
    - Updated `ComputeWarpPartition` to accept the GEMM instruction type, enhancing flexibility in warp partitioning logic.
    - Added `TargetGetWarpSize` utility to streamline warp size retrieval based on target architecture.
    - Refactored layout inference and lowering methods to utilize the new GEMM instruction handling, improving clarity and maintainability of the codebase.
    
    * bug fix
    
    * test fix
    
    * lint fix
    
    * phase out Canonicalize
    
    * add option --expt-relaxed-constexpr
    
    * [Enhancement] Introduce tilelang intrinsic operations for GEMM
    
    - Added `tl_gemm` and `tl_gemm_sp` built-in operations to support general and sparse matrix multiplication in tilelang.
    - Updated the lowering logic in `Gemm` and `GemmSP` to utilize the new tilelang operations.
    - Enhanced CUDA and HIP code generation to handle the new GEMM operations, ensuring proper argument validation and external call printing.
    - Implemented shared memory alignment planning for GEMM operations to optimize performance on supported architectures.
    
    * lint fix
    
    * lint fix
    
    * test fix
    
    * test fix
    
    * rebase
    
    * Update builtin.cc
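The first group of changes selects a GEMM instruction from the block size and target (`GetGemmInst`), looks up the warp size per target (`TargetGetWarpSize`), and passes the chosen instruction into `ComputeWarpPartition`. The C++ below is a minimal sketch of how those three pieces could fit together. It is not TileLang's implementation. Every name in it (`GemmInst`, `TargetDesc`, and the `*Sketch` functions) is hypothetical, and the selection rules are assumptions chosen for illustration:

- WGMMA is used on sm_90 when the warp count is a multiple of 4 (one warpgroup).
- MFMA is used on HIP.
- HIP GPUs are CDNA-class, with 64-wide wavefronts.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>
#include <utility>

// Hypothetical types for illustration; not TileLang's real definitions.
enum class GemmInst { kMMA, kWGMMA, kMFMA };

struct TargetDesc {
  std::string kind;  // "cuda" or "hip" (assumed encoding)
  int sm_version;    // CUDA compute capability, e.g. 80 or 90; unused for HIP
};

// Assumption: HIP targets are CDNA-class GPUs with 64-wide wavefronts.
int TargetGetWarpSizeSketch(const TargetDesc& t) {
  return t.kind == "hip" ? 64 : 32;
}

// Picks an instruction family from the thread-block size and the target.
GemmInst GetGemmInstSketch(int block_size, const TargetDesc& t) {
  const int warp_size = TargetGetWarpSizeSketch(t);
  assert(block_size % warp_size == 0 && "block size must be whole warps");
  const int num_warps = block_size / warp_size;
  if (t.kind == "hip") return GemmInst::kMFMA;
  // Assumed rule: WGMMA (sm_90) is issued per warpgroup of 4 warps.
  if (t.sm_version == 90 && num_warps % 4 == 0) return GemmInst::kWGMMA;
  return GemmInst::kMMA;
}

// Splits num_warps into (m_warps, n_warps) so that the per-warp tile
// (M / m_warps by N / n_warps) is as close to square as possible.
// For WGMMA, m_warps must be a multiple of 4 so that whole warpgroups
// tile the M dimension. Ties keep the smaller m_warps.
std::pair<int, int> ComputeWarpPartitionSketch(int num_warps, GemmInst inst,
                                               int M, int N) {
  const int m_unit = (inst == GemmInst::kWGMMA) ? 4 : 1;
  std::pair<int, int> best{-1, -1};
  double best_score = std::numeric_limits<double>::max();
  for (int m = m_unit; m <= num_warps; m += m_unit) {
    if (num_warps % m != 0) continue;
    const int n = num_warps / m;
    const double score = std::abs(static_cast<double>(M) / m -
                                  static_cast<double>(N) / n);
    if (score < best_score) {
      best_score = score;
      best = {m, n};
    }
  }
  assert(best.first > 0 && "no valid partition for this warp count");
  return best;
}
```

Under these assumptions, a 128-thread block on sm_90 selects WGMMA, and an 8-warp MMA partition of a 128x128 tile yields `{2, 4}`. Threading the instruction type into the partitioner lets instruction-specific constraints, such as warpgroup granularity, be enforced in one place instead of being re-derived during layout inference and lowering.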
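The "shared memory alignment planning for GEMM operations" change can be illustrated with a simple layout pass. Each buffer's start offset is rounded up to its required alignment before the buffer is placed. The C++ below is a minimal sketch under stated assumptions. It is not the planner added in this commit. `SmemBuffer`, `AlignUp`, and `PlanSmemOffsetsSketch` are hypothetical names. The 1024-byte alignment used for GEMM operands in the usage example is illustrative: it matches the alignment commonly required for 128-byte swizzled tiles on Hopper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical shared-memory buffer descriptor, for illustration only.
struct SmemBuffer {
  std::string name;
  int64_t size_bytes;
  int64_t alignment;   // required byte alignment; must be a power of two
  int64_t offset = -1; // filled in by the planner
};

// Rounds x up to the next multiple of a (a > 0).
constexpr int64_t AlignUp(int64_t x, int64_t a) {
  return (x + a - 1) / a * a;
}

// Places buffers back to back in declaration order, rounding each start
// offset up to that buffer's alignment. Returns the total bytes required.
int64_t PlanSmemOffsetsSketch(std::vector<SmemBuffer>& bufs) {
  int64_t cursor = 0;
  for (auto& b : bufs) {
    assert(b.alignment > 0 && (b.alignment & (b.alignment - 1)) == 0);
    cursor = AlignUp(cursor, b.alignment);
    b.offset = cursor;
    cursor += b.size_bytes;
  }
  return cursor;
}
```

Consider three buffers: `A_shared` (1000 bytes, 1024-byte alignment), `B_shared` (2048 bytes, 1024-byte alignment), and `C_shared` (100 bytes, 16-byte alignment). They land at offsets 0, 1024, and 3072, for a total of 3172 bytes. The 24 bytes of padding after `A_shared` are the cost of keeping `B_shared` on a 1024-byte boundary.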
builtin.h