- 11 Mar, 2025 1 commit
-
-
Yu Cheng authored
* [Dev][Bugfix] Add RMS Normalization Kernels and Fix Reduce Bug - Implement two RMS normalization implementations in TileLang: * `rms_norm_splitk`: Split-K reduction approach for large matrices * `rms_norm`: Full reduction kernel with simplified implementation - Add reference implementation using PyTorch for validation - Include performance benchmarking for both kernel variants - Demonstrate flexible block size and matrix size configurations * [Examples] Simplify RMS Normalization Kernel Compilation - Remove commented-out code for split-K RMS normalization - Simplify kernel compilation by removing explicit TMA lowering configuration - Update copyright header to Tile-AI Corporation - Streamline main script for RMS normalization example
-