    [JIT] Enhance cython/ctypes wrapper for tma descriptor (#126) · 7b74bb01
    Lei Wang authored
    
    
    * Refactor code
    
    * Enhance tutorial
    
    * Enhance error handling and code generation in CUDA and TileLang components
    
    This commit introduces several improvements across multiple files:
    - Added more informative error messages in GEMM layout checks
    - Updated CUDA codegen to support more flexible function signature generation
    - Improved TMA descriptor initialization and kernel dispatch logic
    - Refined library generation and source code parsing utilities
    - Enhanced error handling in various adapter and wrapper classes
    
    * Add thread tag validation for warp specialization
    
    Introduce a ThreadTagChecker to validate that a PrimFunc only uses threadIdx.x before applying warp specialization. This prevents unintended transformations on kernels with complex thread binding and provides a clear warning to users about potential issues with warp specialization.
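
The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ThreadTagChecker implementation itself: the helper names `only_uses_thread_idx_x` and `maybe_warp_specialize` are hypothetical, thread bindings are modeled as plain tag strings, and the choice to ignore `blockIdx.*` tags is an assumption.

```python
import warnings
from typing import Callable, Iterable

ALLOWED_TAG = "threadIdx.x"


def only_uses_thread_idx_x(thread_tags: Iterable[str]) -> bool:
    """Return True when every threadIdx.* binding is threadIdx.x.

    Assumption: block-level tags such as blockIdx.x are ignored here.
    """
    for tag in thread_tags:
        if tag.startswith("threadIdx.") and tag != ALLOWED_TAG:
            return False
    return True


def maybe_warp_specialize(thread_tags: Iterable[str],
                          rewrite: Callable[[], None]) -> bool:
    """Run the (hypothetical) rewrite only if the thread tags are valid.

    Returns True if the rewrite ran, False if it was skipped with a warning.
    """
    if not only_uses_thread_idx_x(thread_tags):
        warnings.warn(
            "Warp specialization skipped: kernel binds thread tags other "
            "than threadIdx.x.")
        return False
    rewrite()
    return True
```

A caller would pass the tags collected from the kernel's thread bindings, for example `maybe_warp_specialize(["blockIdx.x", "threadIdx.x"], rewrite)`. A kernel that also binds `threadIdx.y` is left untouched and triggers the warning.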
    
    * Update TileLang Profiling and Compilation in Flash Decoding Examples
    
    Refactor the profiling and compilation workflow in two flash decoding example scripts:
    - Replace `tilelang.lower()` and `tilelang.Profiler()` with `tilelang.compile()`
    - Simplify profiler initialization using `get_profiler()`
    - Update method calls to use the new profiler and compiled kernel objects
    - Maintain existing performance benchmarking and validation logic
    
    * Refactor and clean up code formatting in TileLang testing and adapter modules
    
    This commit includes several code style and formatting improvements:
    - Adjust whitespace and line breaks in test files
    - Improve code formatting in CUDA source wrapper and adapter utilities
    - Enhance readability of function calls and argument handling
    - Remove unnecessary whitespace and standardize indentation
    - Simplify function signatures and argument parsing
    
    * Refactor CUDA codegen and improve code formatting
    
    This commit includes several improvements to CUDA code generation and formatting:
    - Enhance function signature generation in CodeGenTileLangCUDA
    - Improve code formatting and readability in CUDA-related files
    - Simplify parameter handling and type annotations
    - Clean up whitespace and line breaks in codegen and layout files
    
    ---------
    Co-authored-by: Ubuntu <dlisuser@h100testl730RPS.xu5snccwrbtejcqqalluoku5hb.xx.internal.cloudapp.net>
rt_mod_cuda.cc 3.27 KB