1. 21 Oct, 2024 5 commits
    • Add two pass pipeline · 93ec1681
      rocking authored
    • add n1536 · 4cef2fc5
      carlushuang authored
    • opt loading · 009cce41
      carlushuang authored
    • [CK_TILE] Optimize fmha splitkv & splitkv combine kernels (#1577) · 95e722a3
      Po Yen Chen authored
      * Use smaller width for lse_accum dist tensor
      
      * Update pipeline comment
      
      * Fix wrong distribution for lse_accum
      
      * Remove duplicate dim in lse_accum dist encoding
      
      * Decide fmha splitkv combine kernel kBlockSize by kM0
      
      * Remove assumption of MPerThread=1
      
      * Add log<4> & log<8> specialization
      
      * Enlarge occupancy array
      
      * Fix vector size for small tile
      
      * Add support for kMaxSplits=8
      
      * Re-format gemm.hpp
      
      * Use 16x16x16 warp gemm for fwd_splitkv
      
      * Centralize policy code changes
      
      * Leave fp8/bf8 tile settings unchanged
    • remove duplicated define · 24bebf15
      carlushuang authored
  2. 20 Oct, 2024 4 commits
  3. 16 Oct, 2024 6 commits
  4. 15 Oct, 2024 2 commits
  5. 14 Oct, 2024 3 commits
  6. 12 Oct, 2024 3 commits
  7. 10 Oct, 2024 1 commit
    • Ck tile gemm cshuffle & CK Tile GEMM restructure (#1535) · 6f27bc98
      Thomas Ning authored
      
      
      * Make the cshuffle compilable
      
      * modify the reference on gpu and cpu. Correct access of cshuffle
      
      * fix the cpu reference code
      
      * Complete the in tile shuffle logic
      
      * restructure the kernel template input
      
      * change the naming pattern of ck_tile gemm pipeline
      
      * Re-format files using remod.py
      
      * Solve the fmha conflict with gemm
      
      * Comment Addressed from Carlus
      
      ---------
      Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
  8. 08 Oct, 2024 2 commits
  9. 07 Oct, 2024 2 commits
  10. 04 Oct, 2024 1 commit
    • Adding seed and offset pointer support to the philox random number generator. (#1523) · c24fae23
      kylasa authored
      
      
      * Adding seed and offset pointer support to the philox random number generator.
      
      * Separating seed and offset pointer checks with different condition statements.
      
      * Changes include adding support for device seed and offset pointers; a union is used to store seed/offset values and device pointers to minimize device SGPRs.
      
      * Correcting a typo in the readme file
      
      * Re-format files using remod.py
      
      * Use STL type for API parameters
      
      * Use simpler struct design for drop_seed & drop_offset
      
      * Undo unnecessary changes
      
      * Sync kargs style for fmha_fwd.hpp/.cpp
      
      * Use templated union to reduce code
      
      * Use structured binding to make code more readable
      
      ---------
      Co-authored-by: Sudhir Kylasa <sukylasa@amd.com>
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
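The union-based storage described in the commit above (a seed/offset passed either as an immediate value or as a device pointer, sharing one slot) can be sketched roughly as follows. This is a minimal host-side illustration under stated assumptions, not the code from the commit: the names `ValueOrPointer`, `DropSeedOffset`, `make_from_values`, `make_from_pointers`, and `resolve` are hypothetical, and the single `is_pointer` flag is a simplification (the commit message mentions checking seed and offset pointers with separate conditions).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical names throughout; these are not the identifiers used in the commit.

// Stores either an immediate value or a pointer to one in the same storage,
// so only the larger of the two occupies kernel-argument space.
template <typename T>
union ValueOrPointer
{
    T val;
    const T* ptr;
};

struct DropSeedOffset
{
    ValueOrPointer<std::uint64_t> seed;
    ValueOrPointer<std::uint64_t> offset;
    bool is_pointer; // true: both unions hold pointers; false: immediate values
};

inline DropSeedOffset make_from_values(std::uint64_t seed, std::uint64_t offset)
{
    DropSeedOffset args{};
    args.seed.val   = seed;
    args.offset.val = offset;
    args.is_pointer = false;
    return args;
}

inline DropSeedOffset make_from_pointers(const std::uint64_t* seed, const std::uint64_t* offset)
{
    DropSeedOffset args{};
    args.seed.ptr   = seed;
    args.offset.ptr = offset;
    args.is_pointer = true;
    return args;
}

// Reads the active union member and returns the (seed, offset) pair.
inline std::pair<std::uint64_t, std::uint64_t> resolve(const DropSeedOffset& args)
{
    if(args.is_pointer)
    {
        return {*args.seed.ptr, *args.offset.ptr};
    }
    return {args.seed.val, args.offset.val};
}
```

With C++17, a caller could unpack the result as `auto [seed, offset] = resolve(args);`, in the spirit of the structured-binding cleanup listed in the commit message. The pointer form defers the read until `resolve` is called, so a value written to the pointed-to memory after the arguments are built is still picked up.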
  11. 01 Oct, 2024 2 commits
  12. 27 Sep, 2024 1 commit
  13. 26 Sep, 2024 1 commit
  14. 25 Sep, 2024 1 commit
  15. 22 Sep, 2024 1 commit
  16. 18 Sep, 2024 1 commit
  17. 14 Sep, 2024 1 commit
  18. 10 Sep, 2024 1 commit
  19. 07 Sep, 2024 1 commit
    • Ck tile gemm example (#1488) · caacd388
      Thomas Ning authored
      
      
      * Checkpoint: Finished with the tile example & kernel verification, working on the different matrix layout
      
      * Finished the Matrix Layout feature set up. Note: Need to modify the inner block to solve the shuffle problem in the future.
      
      * Fix: Clang Format, API fixed from fmha
      
      * fix with better naming convention
      
      * revert back the pipeline code of fmha
      
      * Fixed: Addressed the comments and merged the GEMM shape of GEMM Operator and FMHA Operator into one.
      
      * clang format with the reference_gemm file
      
      * convert the clang format with the remod.py
      
      * Changed the format and variable name of the kernel gemm_shape and partitioner
      
      ---------
      Co-authored-by: thomasning <thomasning@banff-cyxtera-s70-4.ctr.dcgpu>
  20. 30 Aug, 2024 1 commit