1. 11 Mar, 2025 1 commit
    • [Dev][Bugfix] Add RMS Normalization Kernels and Fix Reduce Bug (#188) · fe0de672
      Yu Cheng authored
      * [Dev][Bugfix] Add RMS Normalization Kernels and Fix Reduce Bug
      
      - Add two RMS normalization kernels in TileLang:
        * `rms_norm_splitk`: split-K reduction approach for large matrices
        * `rms_norm`: full-reduction kernel with a simpler implementation
      - Add reference implementation using PyTorch for validation
      - Include performance benchmarking for both kernel variants
      - Demonstrate flexible block size and matrix size configurations
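
The difference between the two reduction strategies named above can be illustrated in plain Python. This is a minimal sketch, not the TileLang kernels or the PyTorch reference from this commit: the function names `rms_norm_row` and `rms_norm_row_splitk`, the `block_k` parameter, the `eps` value, and the absence of a learned weight are all assumptions made for illustration.

```python
import math


def rms_norm_row(row, eps=1e-12):
    """Full reduction: one pass over the whole row gives the sum of squares."""
    sum_sq = sum(x * x for x in row)
    inv_rms = 1.0 / math.sqrt(sum_sq / len(row) + eps)
    return [x * inv_rms for x in row]


def rms_norm_row_splitk(row, block_k, eps=1e-12):
    """Split-K: reduce the row in chunks of block_k columns, then combine."""
    # Each chunk yields a partial sum of squares (hypothetical block_k chunking).
    partials = [
        sum(x * x for x in row[k:k + block_k])
        for k in range(0, len(row), block_k)
    ]
    sum_sq = sum(partials)
    inv_rms = 1.0 / math.sqrt(sum_sq / len(row) + eps)
    return [x * inv_rms for x in row]


def rms_norm(matrix, block_k=None, eps=1e-12):
    """Normalize every row; use the split-K path when block_k is given."""
    if block_k is None:
        return [rms_norm_row(row, eps) for row in matrix]
    return [rms_norm_row_splitk(row, block_k, eps) for row in matrix]
```

Both paths compute the same result up to floating-point rounding; the split-K form only changes the order in which the squares are accumulated, which is what lets a kernel process a wide row in fixed-size tiles.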
      
      * [Examples] Simplify RMS Normalization Kernel Compilation
      
      - Remove commented-out code for split-K RMS normalization
      - Simplify kernel compilation by removing explicit TMA lowering configuration
      - Update copyright header to Tile-AI Corporation
      - Streamline main script for RMS normalization example