1. 23 Feb, 2025 2 commits
    • Lei Wang's avatar
      [Example] Add Split-K and Stream-K Examples and move MLA from fld to mla (#110) · 5cea760c
      Lei Wang authored
      * Add DeepSeek MLA decode example with Flash Attention implementation
      
      * Add GEMM SplitK and StreamK example implementations
      
      This commit introduces two new example scripts demonstrating advanced GEMM (matrix multiplication) techniques:
      - `example_tilelang_gemm_splitk.py`: Implements a Split-K GEMM kernel using TileLang
      - `example_tilelang_gemm_streamk.py`: Implements a Stream-K GEMM kernel using TileLang
      
      Both examples showcase different parallel computation strategies for matrix multiplication, with comprehensive testing using PyTorch reference implementations.
      
      * Refactor GEMM SplitK and StreamK example implementations
      
      Clean up and improve code formatting for the SplitK and StreamK GEMM example scripts:
      - Remove unused import (Profiler) in splitk example
      - Simplify line breaks and improve code readability
      - Standardize indentation and remove unnecessary whitespace
      - Optimize atomic add and copy operations for better clarity
      5cea760c
    • Yu Cheng's avatar
      [Dev] Add MLA and GQA decode examples (#109) · 40faabb1
      Yu Cheng authored
      * [CI][Test] Add test cases for tilelang transform MultiVersionBuffer and WarpSpecialized
      
      * Relax the mismatch ratio restrictions in the flash_linear_attention and mha tests
      
      * [Dev] Add mha backward example
      
      * [Dev] Add mla decode example
      
      * bug fix
      
      * Add triton impl
      
      * Add gqa decode example
      
      * [Dev] Add GQA decode example
      
      * lint
      
      * delete unused triton example
      
      * set default profiler to 'auto'
      40faabb1