    [Enhancement] Introduce a smarter warp partition strategy (#396) · ca730c0a
    Lei Wang authored
    * Make it Python 3.8 compatible
    
    * [Enhancement] Improve loop partitioning and vectorization logic in layout inference and loop vectorization
    
    - Enhanced the VisitStmt_ method to support local buffer handling in parallel loops, allowing for register usage without explicit thread binding.
    - Updated loop vectorization logic to simplify expressions and ensure accurate vector size calculations, improving performance and clarity in the vectorization process.
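The vector size calculation mentioned above can be illustrated with a minimal sketch. This is not the code from the commit: the function name `vector_size`, the 128-bit access width, and the rule itself (largest power-of-two lane count that divides the loop extent) are assumptions chosen for illustration.

```python
from math import gcd


def vector_size(extent: int, elem_bits: int, max_bits: int = 128) -> int:
    """Hypothetical helper: pick a vector width (in elements) for a loop.

    Assumes the widest memory access is `max_bits` wide and that the
    resulting lane count is a power of two.
    """
    if extent <= 0 or elem_bits <= 0 or max_bits % elem_bits:
        raise ValueError("extent and elem_bits must be positive and divide max_bits")
    max_lanes = max_bits // elem_bits
    if max_lanes & (max_lanes - 1):
        raise ValueError("max lane count must be a power of two")
    # With a power-of-two max_lanes, gcd gives the largest power of two
    # that divides the extent without exceeding max_lanes.
    return gcd(extent, max_lanes)
```

An extent that is not a multiple of the widest access falls back to a narrower vector, and an odd extent falls back to scalar access.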
    
    * lint fix
    
    * [Refactor] Update warp size checks and enhance warp partitioning logic in GEMM
    
    - Changed warp_n size check from 16 to 8 in gemm_layouts.cc to improve compatibility with specific configurations.
    - Refactored warp partitioning logic in gemm.cc to prioritize the N dimension based on the tile aspect ratio, for better performance.
    - Introduced a new CompileArgs dataclass in autotuner to streamline compile argument management and improve code clarity.
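One way to picture an aspect-ratio-driven warp partition that favors N is sketched below. It is a hedged illustration only, not the logic in gemm.cc: the function `partition_warps`, the 16-row minimum per warp, and the "closest to square, ties go to N" scoring are all assumptions. Only the 8-column minimum along N comes from the commit message.

```python
from typing import Optional, Tuple

M_PER_WARP = 16  # assumed minimum rows of the block tile handled by one warp
N_PER_WARP = 8   # minimum columns per warp, matching the relaxed warp_n check


def partition_warps(block_m: int, block_n: int, num_warps: int) -> Tuple[int, int]:
    """Hypothetical sketch: return (m_warp, n_warp) with m_warp * n_warp == num_warps."""
    best: Optional[Tuple[int, int]] = None
    best_score: Optional[float] = None
    # Walk n_warp from largest to smallest so that ties favor the N dimension.
    for n_warp in range(num_warps, 0, -1):
        if num_warps % n_warp:
            continue
        m_warp = num_warps // n_warp
        # Each warp must receive a whole number of minimum-size tiles.
        if block_m % (m_warp * M_PER_WARP) or block_n % (n_warp * N_PER_WARP):
            continue
        tile_m = block_m // m_warp
        tile_n = block_n // n_warp
        # Score by how far the per-warp tile is from square (1.0 is square).
        score = max(tile_m, tile_n) / min(tile_m, tile_n)
        if best_score is None or score < best_score:
            best, best_score = (m_warp, n_warp), score
    if best is None:
        raise ValueError("no valid warp partition for this block shape")
    return best
```

Under these assumptions a wide 64x256 block with 4 warps places all warps along N, while a square 128x128 block with 8 warps resolves the tie between 2x4 and 4x2 in favor of N.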
    
    * lint fix
    
    * [Enhancement] Initialize jit_compile in AutoTuner class
    
    - Added initialization for jit_compile attribute in the AutoTuner class to ensure it is set to None by default.
    - Updated the assignment logic for jit_compile to prevent overwriting an existing compile function, enhancing the flexibility of the AutoTuner's compilation process.
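The initialize-then-guard pattern described above can be sketched as follows. Everything here is a simplified stand-in rather than the real autotuner code: the `CompileArgs` fields, the `set_compile_args` method body, and the placeholder compile function are hypothetical, and the type hints use `typing.Optional` so the sketch stays Python 3.8 compatible.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class CompileArgs:
    """Hypothetical bundle of compile options; field names are illustrative."""
    target: str = "auto"
    out_idx: Optional[int] = None


class AutoTuner:
    """Simplified stand-in for the autotuner class."""

    def __init__(self, fn: Callable[..., Any]) -> None:
        self.fn = fn
        self.compile_args = CompileArgs()
        # Set to None up front so later code can test for it safely.
        self.jit_compile: Optional[Callable[..., Any]] = None

    def set_compile_args(self, **kwargs: Any) -> "AutoTuner":
        self.compile_args = CompileArgs(**kwargs)
        # Install a default only when the caller has not supplied a compile function.
        if self.jit_compile is None:
            args = self.compile_args

            def _default_compile(*a: Any, **kw: Any) -> Any:
                # Placeholder for a real compile step.
                return ("compiled", self.fn(*a, **kw), args.target)

            self.jit_compile = _default_compile
        return self
```

A caller that assigns its own `jit_compile` before calling `set_compile_args` keeps it; otherwise the default is installed.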
gemm.cc 11.1 KB