1. 21 Oct, 2025 1 commit
  2. 28 Aug, 2025 1 commit
    • [Feature] Add 1D TMA support (#761) · 1774a1aa
      Zhengju Tang authored
      
      
      * [Feature] Add 1D TMA support
      - Check the contiguous conditions of 1D TMA copy
      - Add a new interface and parameter order for `tma_load` and `tma_store` calls
      - Add 1D `tma_store` interface in sm90 template
      - Add elementwise kernel for 1D TMA example
      
      * [Lint]
      
      * [BugFix] Add conditions for 1D TMA copy on non-swizzle shared tensors
      
      * [Lint]
      
      * [BugFix] 1D TMA load
      
      * [README] Update GDN README for clarity and add acknowledgements (#758)
      
      - Improved formatting and clarity of the GDN kernel implementation description.
      - Updated requirement section to list dependencies in a clearer format.
      - Added an acknowledgements section to credit the developers and the Xiaomi LLM-Core Team for their contributions.
      
      * cutlass v4.2.0 supporting cuda 13 (#760)
      
      * [Lint]
      
      * [Lint]
      
      * [MXFP4] Add test for bf16&mxfp4 gemm
      
      * [BugFix]
      
      * [Lint]
      
      ---------
      Co-authored-by: Yu Cheng <54519279+chengyupku@users.noreply.github.com>
      Co-authored-by: Johnny <johnnync13@gmail.com>
  3. 07 Aug, 2025 1 commit
    • Gated Delta Net (GDN) kernel implementation in TileLang (#695) · 6f59668d
      Zhengju Tang authored
      * [GDN] Add examples for GDN forward and backward kernels
      
      * [Refactor] Folder structure refactor for duplicated utils
      
      * [Test] Add test script for kernels
      
      * [Refactor] Rename examples to align with the repo
      
      * [Lint] Modify README
      
      * [Update] Modified README to align upstream repo
      
      * [BugFix] Path of FLA
      
      * [Fix] Copyright and test
      
      * [Lint]
      
      * [CI] Add GDN compilation test CI
      
      * [Lint]
      
      * [BugFix] Import error of fla