    [Dev][Bugfix] Add RMS Normalization Kernels and Fix Reduce Bug (#188) · fe0de672
    Yu Cheng authored
    * [Dev][Bugfix] Add RMS Normalization Kernels and Fix Reduce Bug
    
    - Implement two RMS normalization kernels in TileLang:
      * `rms_norm_splitk`: reduces each row in blocks along the K (reduction) dimension, intended for large matrices
      * `rms_norm`: reduces each full row in one pass, a simpler implementation
    - Add a PyTorch reference implementation for validation
    - Include performance benchmarks for both kernel variants
    - Demonstrate configurable block sizes and matrix sizes
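The relationship between the two kernel variants can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the TileLang code in rms_norm.py. It assumes normalization over the last dimension of a 2-D input with no learnable weight, and the function names, `block_k` and `eps` are hypothetical. Accumulating the sum of squares chunk by chunk (split-K) gives the same result, up to floating-point rounding, as reducing the whole row at once.

```python
import math


def rms_norm_full(x, eps=1e-12):
    """Hypothetical full-row variant: y[i][j] = x[i][j] / sqrt(mean_j(x[i][j]**2) + eps)."""
    out = []
    for row in x:
        mean_sq = sum(v * v for v in row) / len(row)  # reduce the whole row at once
        scale = 1.0 / math.sqrt(mean_sq + eps)
        out.append([v * scale for v in row])
    return out


def rms_norm_splitk(x, block_k, eps=1e-12):
    """Hypothetical split-K variant: the sum of squares is accumulated
    over column chunks of width block_k before the row is scaled."""
    out = []
    for row in x:
        n = len(row)
        acc = 0.0
        for start in range(0, n, block_k):
            chunk = row[start:start + block_k]
            acc += sum(v * v for v in chunk)  # partial reduction for this K-chunk
        scale = 1.0 / math.sqrt(acc / n + eps)
        out.append([v * scale for v in row])
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.5, 2.5, -3.5]]
    print(rms_norm_full(x))
    print(rms_norm_splitk(x, block_k=3))
```

The chunked loop is presumably why the split-K variant is described as suited to large matrices: a GPU block only needs one K-chunk of a row resident at a time, while the full-row variant is simpler when the whole row fits at once.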
    
    * [Examples] Simplify RMS Normalization Kernel Compilation
    
    - Remove commented-out code for split-K RMS normalization
    - Simplify kernel compilation by removing explicit TMA lowering configuration
    - Update copyright header to Tile-AI Corporation
    - Streamline the main script of the RMS normalization example
rms_norm.py 2.71 KB