    [CostModel][Carver] Support Hint Recommend for Shared memory Kernel Fusion (#73) · 1ef644e7
    Lei Wang authored
    * [Enhancement] Add VectorizeLoop function and update imports for compatibility
    
    * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
    
    * lint fix
    
    * Fix incorrect module reference for VectorizeLoop transformation
    
    * Refactor vectorize_loop transformation by removing unused extent mutation logic
    
    * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
    
    * Fix formatting in CUDA FP8 header file for consistency
    
    * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
    
    * Update submodule 'tvm' to latest commit for improved functionality
    
    * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
    
    * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
    
    * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
    
    * Add CUDA requirements to FP8 test cases and update references for clarity
    
    * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
    
    * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
    
    * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
    
    * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
    
    * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
    
    * Add BF16 support to matrix multiplication and introduce corresponding test cases
    
    * Add a blank line for improved readability in BF16 GEMM test
    
    * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
    
    * enhance acknowledgement
    
    * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
    
    * Update subproject commit for TVM dependency
    
    * Update subproject commit for TVM dependency
    
    * Add int4_t type and functions for packing char values in CUDA common header
    
    * Add plot_layout example and implement GetForwardVars method in layout classes
    
    * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
    
    * Fix formatting by removing unnecessary line break in layout.h
    
    * Refactor make_int4 function for improved readability by adjusting parameter formatting
    
    * Add legend to plot_layout for improved clarity of thread and local IDs
    
    * Remove unnecessary dependencies from requirements files for cleaner setup
    
    * Remove flash_mha.py and add .gitkeep to deepseek_mla directory
    
    * Add build requirements and update installation scripts for improved setup
    
    * Introduce carver
    
    * Refactor imports and improve code formatting for consistency
    
    * Add unit tests for carver recommendation hints
    
    * lint fix
    
    * Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity
    
    * Refactor import statements and clean up whitespace in template files for improved readability
    
    * Add README.md for Carver framework with usage examples and architecture support
    
    * Refactor import statement in matmul_analysis.py for consistency
    
    * Refactor TileDict and TensorCorePolicy methods for improved clarity and functionality
    
    * Add tests for general matrix multiplication emit configurations
    
    * Refactor formatting in test_tilelang_carver_generate_hints.py for improved readability
    
    * Add FlashAttentionTemplate and related functionality for hint recommendations
    
    * Refactor whitespace in FlashAttentionTemplate and test_tilelang_carver_recommend_hints for improved readability
test_tilelang_carver_recommend_hints.py 4.74 KB