1. 25 Feb, 2025 1 commit
  2. 23 Feb, 2025 1 commit
    • [Dev] Add MLA and GQA decode examples (#109) · 40faabb1
      Yu Cheng authored
      * [CI][Test] Add test cases for tilelang transform MultiVersionBuffer and WarpSpecialized
      
      * Relax the mismatch ratio restrictions in the flash_linear_attention and mha tests
      
      * [Dev] Add mha backward example
      
      * [Dev] Add mla decode example
      
      * bug fix
      
      * Add triton impl
      
      * Add gqa decode example
      
      * [Dev] Add GQA decode example
      
      * lint
      
      * delete unused triton example
      
      * set default profiler to 'auto'
  3. 25 Jan, 2025 2 commits
    • [CI][Test] Add test cases for tilelang kernel FlashAttention (#54) · bedab1a0
      Yu Cheng authored
      * [Dev] Add FlashDecoding example
      
      * [CI][Test] Add test cases for tilelang kernel convolution
      
      * [CI][Test] Add test cases for tilelang kernel FlashAttention
      
      * Reduce the number of stages to ensure the shared memory allocation is valid
      
      * Temporarily remove the dim128 case
      
      * lint
      
      * update einops in requirements-dev.txt
      
      * update einops in requirements-test.txt
      
      * remove einops in requirements-dev.txt
    • [Doc] Remove unnecessary layout annotation (#49) · 47ecc791
      Lei Wang authored
      * [Doc] Update documentation structure and content: add overview section, revise project name, and change theme to Furo
      
      * [Feature] Add device-side debug printing functions and integrate into kernel interface
      
      * lint fix
      
      * remove debug print
      
      * implement test for debug
      
      * lint fix
      
      * add some comments
      
      * Enhance fragment design and assert fragment print
      
      * enhance debug print
      
      * add test for msg
      
      * lint fix
      
      * format
      
      * add flash decoding examples
      
      * remove comment
      
      * test simplified