  1. 02 Jan, 2025 1 commit
  2. 01 Jan, 2025 1 commit
  3. 31 Dec, 2024 1 commit
  4. 12 Dec, 2024 2 commits
  5. 03 Dec, 2024 1 commit
  6. 28 Nov, 2024 1 commit
  7. 27 Nov, 2024 3 commits
  8. 26 Nov, 2024 6 commits
  9. 25 Nov, 2024 3 commits
  10. 22 Nov, 2024 1 commit
    • [CK_TILE] MakeKargs overloads for backward compatibility (#1681) · ff92222f
      schung-amd authored
      
      
      * Add overloads for MakeKargs
      
      Overload MakeKargs to accept std::tuple<uint64_t, uint64_t> and std::tuple<void*, void*>, so that existing code that passes list initializers or tuples keeps working.
      
      * Re-format files using ck_tile remod.py
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
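      To illustrate the backward-compatibility pattern described in this commit, here is a minimal C++ sketch. The names `ExampleKernel` and `ExampleKargs` and the field layout are hypothetical and not taken from ck_tile; the sketch only assumes that a primary `MakeKargs` takes pointers and strides as separate arguments, and that a compatibility overload unpacks the tuples and forwards to it.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <tuple>

      // Hypothetical kernel-argument struct; field names are illustrative only.
      struct ExampleKargs
      {
          const void* p_a;
          const void* p_b;
          uint64_t stride_a;
          uint64_t stride_b;
      };

      struct ExampleKernel
      {
          // Primary form (assumed): every argument is passed separately.
          static ExampleKargs
          MakeKargs(const void* p_a, const void* p_b, uint64_t stride_a, uint64_t stride_b)
          {
              return ExampleKargs{p_a, p_b, stride_a, stride_b};
          }

          // Compatibility overload: accepts pointers and strides grouped as tuples,
          // so older call sites such as MakeKargs({pa, pb}, {sa, sb}) still compile.
          static ExampleKargs MakeKargs(std::tuple<void*, void*> ptrs,
                                        std::tuple<uint64_t, uint64_t> strides)
          {
              // Unpack and forward to the primary overload.
              return MakeKargs(std::get<0>(ptrs),
                               std::get<1>(ptrs),
                               std::get<0>(strides),
                               std::get<1>(strides));
          }
      };
      ```

      With the tuple overload in place, `ExampleKernel::MakeKargs({pa, pb}, {sa, sb})` resolves to the tuple form (the four-argument overload is not viable for two arguments), so callers written against the older signature need no edits.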
  11. 21 Nov, 2024 2 commits
  12. 18 Nov, 2024 2 commits
  13. 14 Nov, 2024 1 commit
  14. 13 Nov, 2024 3 commits
  15. 12 Nov, 2024 1 commit
  16. 11 Nov, 2024 2 commits
  17. 09 Nov, 2024 1 commit
    • Ck tile/moe sorting (#1624) · bec6fbc6
      dummycoderfe authored
      
      
      * add moe_sorting & check ok
      
      * fix comments & typo
      
      * Run remod.py under include/ck_tile & example/ck_tile directories
      
      * format codes
      
      * fix output ci check bug
      
      * fix moe sorting readme and error commit file
      
      * use magic div to accelerate compute
      
      * add a loop unroll for moe lds ops
      
      * add extblocksnel to set zeros for moebufs
      
      * [Ck_tile] moe set zero run ok, add size check and fix ref check
      
      * [Ck_tile] fix moe_sorting fuse set_zero remod
      
      * [Ck_tile] change name style, fix zero buffer size err, change folder
      
      * [Ck_tile] moe_sorting: fix name style
      
      * [Ck_tile] moe_sorting, remove useless params in traits
      
      * [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
      
      ---------
      Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
      Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>
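      The "magic div" item refers to replacing integer division by a divisor that is fixed at launch time with a multiply-high, an add, and a shift. The host-side C++ sketch below shows one common 32-bit variant of that technique; `MagicDiv`, `make_magic_div`, and `magic_divide` are hypothetical names, and whether the moe_sorting kernel uses exactly this formulation is an assumption. It is exact for divisors in [1, 2^31 - 1] and dividends below 2^31.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Precomputed constants for dividing by one fixed divisor (hypothetical names).
      struct MagicDiv
      {
          uint32_t multiplier;
          uint32_t shift;
      };

      // Valid for 1 <= divisor <= 2^31 - 1.
      inline MagicDiv make_magic_div(uint32_t divisor)
      {
          assert(divisor >= 1 && divisor <= 0x7fffffffu);
          // shift = ceil(log2(divisor))
          uint32_t shift = 0;
          while((uint64_t{1} << shift) < divisor)
              ++shift;
          // multiplier = floor(2^32 * (2^shift - divisor) / divisor) + 1, always < 2^32
          const uint64_t numer      = ((uint64_t{1} << shift) - divisor) << 32;
          const uint32_t multiplier = static_cast<uint32_t>(numer / divisor + 1);
          return MagicDiv{multiplier, shift};
      }

      // Equivalent to dividend / divisor for 0 <= dividend < 2^31.
      inline uint32_t magic_divide(uint32_t dividend, MagicDiv md)
      {
          // High 32 bits of the 64-bit product (what __umulhi computes on a GPU).
          const uint32_t hi =
              static_cast<uint32_t>((uint64_t{dividend} * md.multiplier) >> 32);
          // hi < dividend < 2^31, so the sum cannot overflow 32 bits.
          return (hi + dividend) >> md.shift;
      }
      ```

      The multiplier and shift are computed once per divisor on the host, so the per-element cost on the device drops to one multiply-high, one add, and one shift instead of a full integer division.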
  18. 08 Nov, 2024 1 commit
  19. 07 Nov, 2024 1 commit
  20. 05 Nov, 2024 1 commit
  21. 02 Nov, 2024 1 commit
  22. 01 Nov, 2024 2 commits
    • [Ck_tile] smoothquant (#1617) · fbd65454
      rocking authored
      
      
      * fix compile error
      
      * fix typo of padding
      
      * Add smoothquant op
      
      * Add smoothquant instance library
      
      * refine type
      
      * add test script
      
      * Re-generate smoothquant.hpp
      
      * Always use 'current year' in copyright
      
      * use Generic2dBlockShape instead
      
      * Add vector = 8 instance back
      
      * Find exe path automatically
      
      * Simplify the api condition
      
      * Remove debugging code
      
      * update year
      
      * Add blank line between function declarations
      
      * explicitly cast return value to dim3
      
      * refine return value
      
      * Fix default warmup and repeat value
      
      * Add comment
      
      * refactor smoothquant CMake
      
      * Add README
      
      * Fix typo
      
      ---------
      Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
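      For readers unfamiliar with the smoothquant op added in this commit, the sketch below is a host-side C++ reference of the computation as we understand it: each column is multiplied by a per-channel smoothing factor `xscale`, then each row is quantized to int8 with a dynamic scale `yscale = max|y| / 127`. The function name `smoothquant_ref`, the round-to-nearest mode, and the guard for all-zero rows are assumptions, not details confirmed by the commit.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cmath>
      #include <cstdint>
      #include <vector>

      struct SmoothQuantResult
      {
          std::vector<float> yscale; // one dequantization scale per row (size m)
          std::vector<int8_t> qy;    // quantized output, row-major m x n
      };

      // Hypothetical reference: x is row-major m x n, xscale has n entries.
      inline SmoothQuantResult
      smoothquant_ref(const std::vector<float>& x, const std::vector<float>& xscale, int m, int n)
      {
          assert(static_cast<int>(x.size()) == m * n);
          assert(static_cast<int>(xscale.size()) == n);
          SmoothQuantResult r{std::vector<float>(m),
                              std::vector<int8_t>(static_cast<size_t>(m) * n)};
          std::vector<float> y(n);
          for(int i = 0; i < m; ++i)
          {
              // 1. Per-channel smoothing and row absmax.
              float amax = 0.f;
              for(int j = 0; j < n; ++j)
              {
                  y[j] = x[i * n + j] * xscale[j];
                  amax = std::max(amax, std::fabs(y[j]));
              }
              // 2. Dynamic per-row scale (guard for all-zero rows is our own choice).
              const float scale = amax > 0.f ? amax / 127.f : 1.f;
              r.yscale[i]       = scale;
              // 3. Quantize with round-to-nearest and saturate to [-127, 127].
              for(int j = 0; j < n; ++j)
              {
                  long q            = std::lround(y[j] / scale);
                  r.qy[i * n + j]   = static_cast<int8_t>(std::min(std::max(q, -127L), 127L));
              }
          }
          return r;
      }
      ```

      A consumer dequantizes element `(i, j)` as `qy[i * n + j] * yscale[i]`; the smoothing factors are expected to be folded into the following layer's weights, which is the usual motivation for SmoothQuant.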
    • [layernorm] hot fix (#1620) · 550248de
      carlushuang authored
      * hot fix ln
      
      * some rename
  23. 31 Oct, 2024 1 commit
    • [CK_TILE] layernorm support fused-quant/fused-add (#1604) · c3a4800c
      carlushuang authored
      * add prenorm/postnorm support, refactor using generate.py
      
      * update README
      
      * update README
      
      * fix format
      
      * update some description and fix format
      
      * update format
      
      * format
      
      * use non-raw for loading
      
      * format and update n4096
      
      * dynamic-quant ready
      
      * update readme
      
      * support fused dynamic-quant
      
      * update fused-quant, with smooth
      
      * update README
      
      * update args
      
      * update some based on comment
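      The fused-add and fused dynamic-quant options can be summarized with a per-row reference. The C++ sketch below is an illustration under stated assumptions, not ck_tile's kernel: it assumes the fused add is a residual add performed before normalization (keeping the sum for the next residual connection), applies layernorm with `gamma`/`beta`, optionally multiplies by a smoothing vector (the "with smooth" variant), and then quantizes the row to int8 with a dynamic absmax scale. `FusedLnRow`, `layernorm_add_dquant_row`, the rounding mode, and the zero-row guard are hypothetical.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cmath>
      #include <cstdint>
      #include <vector>

      struct FusedLnRow
      {
          std::vector<float> x_residual; // x + residual, kept for the next residual connection
          std::vector<int8_t> qy;        // quantized layernorm output
          float yscale;                  // per-row dequantization scale
      };

      // Hypothetical per-row reference: add -> layernorm -> optional smooth -> dynamic quant.
      inline FusedLnRow layernorm_add_dquant_row(const std::vector<float>& x,
                                                 const std::vector<float>& residual,
                                                 const std::vector<float>& gamma,
                                                 const std::vector<float>& beta,
                                                 const std::vector<float>* smooth, // nullptr: no smoothing
                                                 float eps)
      {
          const size_t n = x.size();
          FusedLnRow out{std::vector<float>(n), std::vector<int8_t>(n), 1.f};
          // 1. Fused residual add, accumulating the mean.
          double mean = 0.0;
          for(size_t j = 0; j < n; ++j)
          {
              out.x_residual[j] = x[j] + residual[j];
              mean += out.x_residual[j];
          }
          mean /= static_cast<double>(n);
          // 2. Population variance and inverse standard deviation.
          double var = 0.0;
          for(size_t j = 0; j < n; ++j)
          {
              const double d = out.x_residual[j] - mean;
              var += d * d;
          }
          var /= static_cast<double>(n);
          const double inv_std = 1.0 / std::sqrt(var + eps);
          // 3. Normalize, scale/shift, optionally smooth, and track the row absmax.
          std::vector<float> y(n);
          float amax = 0.f;
          for(size_t j = 0; j < n; ++j)
          {
              float v = static_cast<float>((out.x_residual[j] - mean) * inv_std) * gamma[j] + beta[j];
              if(smooth != nullptr)
                  v *= (*smooth)[j];
              y[j] = v;
              amax = std::max(amax, std::fabs(v));
          }
          // 4. Dynamic int8 quantization with round-to-nearest and saturation.
          out.yscale = amax > 0.f ? amax / 127.f : 1.f;
          for(size_t j = 0; j < n; ++j)
          {
              long q    = std::lround(y[j] / out.yscale);
              out.qy[j] = static_cast<int8_t>(std::min(std::max(q, -127L), 127L));
          }
          return out;
      }
      ```

      Passing `nullptr` for `smooth` gives plain dynamic quantization; passing a vector gives the smoothed variant. Fusing these steps into the layernorm kernel avoids writing the normalized row to memory only to read it back for quantization.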
  24. 30 Oct, 2024 1 commit