[Enhancement] Introduce a smarter warp partition strategy (#396)
* make it python 3.8- happy

* [Enhancement] Improve loop partitioning and vectorization logic in layout inference and loop vectorization

  - Enhanced the VisitStmt_ method to support local buffer handling in parallel loops, allowing for register usage without explicit thread binding.
  - Updated loop vectorization logic to simplify expressions and ensure accurate vector size calculations, improving performance and clarity in the vectorization process.

* lint fix

* [Refactor] Update warp size checks and enhance warp partitioning logic in GEMM

  - Changed warp_n size check from 16 to 8 in gemm_layouts.cc to improve compatibility with specific configurations.
  - Refactored warp partitioning logic in gemm.cc to prioritize N dimension for better performance based on aspect ratio.
  - Introduced a new CompileArgs dataclass in autotuner to streamline compile argument management and improve code clarity.

* lint fix

* [Enhancement] Initialize jit_compile in AutoTuner class

  - Added initialization for jit_compile attribute in the AutoTuner class to ensure it is set to None by default.
  - Updated the assignment logic for jit_compile to prevent overwriting an existing compile function, enhancing the flexibility of the AutoTuner's compilation process.
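
The `CompileArgs` and `jit_compile` items in the commit message can be pictured with a minimal Python sketch. This is not the project's code: the `CompileArgs` fields (`target`, `out_idx`, `pass_configs`), the `set_compile_args` method, and the stand-in default compile function are all hypothetical. Only the two ideas come from the commit message: a dataclass that groups compile arguments, and a `jit_compile` attribute that starts as `None` and is never overwritten once a compile function has been set. The typing imports keep the sketch compatible with Python 3.8.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class CompileArgs:
    """Hypothetical bundle of compile options; field names are illustrative."""
    target: str = "auto"
    out_idx: Optional[int] = None
    pass_configs: Dict[str, Any] = field(default_factory=dict)


class AutoTuner:
    """Sketch of the initialization and assignment pattern only."""

    def __init__(self, fn: Callable[..., Any]) -> None:
        self.fn = fn
        self.compile_args = CompileArgs()
        # Start as None so "no compile function yet" is an explicit state.
        self.jit_compile: Optional[Callable[..., Any]] = None

    def set_compile_args(self, **overrides: Any) -> "AutoTuner":
        self.compile_args = CompileArgs(**overrides)
        # Install a default only when nothing is set, so a compile
        # function supplied earlier by the user is left untouched.
        if self.jit_compile is None:
            def _default_compile(**config: Any) -> Any:
                # Stand-in for real compilation: run fn and report the target.
                return (self.compile_args.target, self.fn(**config))
            self.jit_compile = _default_compile
        return self


if __name__ == "__main__":
    tuner = AutoTuner(lambda block=1: block * 2).set_compile_args(target="cuda")
    print(tuner.jit_compile(block=4))
```

Checking `is None` before assigning means a caller can set a custom `jit_compile` first and still call `set_compile_args` afterwards without losing it.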