    [Example] Add Split-K and Stream-K Examples and move MLA from fld to mla (#110) · 5cea760c
    Lei Wang authored
    * Add DeepSeek MLA decode example with Flash Attention implementation
    
    * Add GEMM SplitK and StreamK example implementations
    
    This commit introduces two new example scripts demonstrating advanced GEMM (matrix multiplication) techniques:
    - `example_tilelang_gemm_splitk.py`: Implements a Split-K GEMM kernel using TileLang
    - `example_tilelang_gemm_streamk.py`: Implements a Stream-K GEMM kernel using TileLang
    
    The two examples showcase different parallelization strategies for matrix multiplication, and each is tested against a PyTorch reference implementation.
    
    * Refactor GEMM SplitK and StreamK example implementations
    
    Clean up and improve code formatting for the SplitK and StreamK GEMM example scripts:
    - Remove unused import (Profiler) in splitk example
    - Simplify line breaks and improve code readability
    - Standardize indentation and remove unnecessary whitespace
    - Simplify atomic add and copy operations for clarity
example_mla_decode.py 12.3 KB