1. 21 Nov, 2024 2 commits
  2. 20 Nov, 2024 2 commits
    • Optimize docker file. (#1679) · d31e8249
      Illia Silin authored
      * reduce the docker image size and layers
      
      * clean up docker file
      
      * fix linker error for client example 24
      
      * install CK into the default /opt/rocm/ path
      
      * restore installing CK to alternative path in CI
      
      * add linking for utility lib
    • fix bug (#1680) · 81ec5eff
      Haocong WANG authored
  3. 19 Nov, 2024 2 commits
  4. 18 Nov, 2024 2 commits
  5. 15 Nov, 2024 3 commits
  6. 14 Nov, 2024 2 commits
  7. 13 Nov, 2024 3 commits
  8. 12 Nov, 2024 2 commits
  9. 11 Nov, 2024 3 commits
  10. 09 Nov, 2024 2 commits
    • Ck tile/moe sorting (#1624) · bec6fbc6
      dummycoderfe authored
      
      
      * add moe_sorting & check ok
      
      * fix comments & typo
      
      * Run remod.py under include/ck_tile & example/ck_tile directories
      
      * format codes
      
      * fix output ci check bug
      
      * fix moe sorting readme and error commit file
      
      * use magic div to accelerate compute
      
      * add a loop unroll for moe lds ops
      
      * add extblocksnel to set zeros for moebufs
      
      * [Ck_tile] moe set zero run ok, add size check and fix ref check
      
      * [Ck_tile] fix moe_sorting fuse set_zero remod
      
      * [Ck_tile] change name style, fix zero buffer size err, change folder
      
      * [Ck_tile] moe_sorting: fix name style
      
      * [Ck_tile] moe_sorting, remove useless params in traits
      
      * [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
      
      ---------
      Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
      Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>
  11. 08 Nov, 2024 2 commits
  12. 07 Nov, 2024 1 commit
  13. 06 Nov, 2024 2 commits
  14. 05 Nov, 2024 6 commits
  15. 04 Nov, 2024 1 commit
  16. 02 Nov, 2024 1 commit
  17. 01 Nov, 2024 3 commits
    • Reduce build time. (#1621) · 03c6448b
      Illia Silin authored
      * disable fp8 gemm_universal on gfx90a and gfx908 by default
      
      * fix cmake syntax
      
      * fix clang format
      
      * add ifdefs in amd_xdlops
      
      * disable fp8 gemm instances on gfx90a by default
      
      * update readme
    • [Ck_tile] smoothquant (#1617) · fbd65454
      rocking authored
      
      
      * fix compile error
      
      * fix typo of padding
      
      * Add smoothquant op
      
      * Add smoothquant instance library
      
      * refine type
      
      * add test script
      
      * Re-generate smoothquant.hpp
      
      * Always use 'current year' in copyright
      
      * use Generic2dBlockShape instead
      
      * Add vector = 8 instance back
      
      * Find exe path automatically
      
      * Simplify the api condition
      
      * Remove debugging code
      
      * update year
      
      * Add blank line between function declarations
      
      * explicitly cast return value to dim3
      
      * refine return value
      
      * Fix default warmup and repeat value
      
      * Add comment
      
      * refactor smoothquant cmake
      
      * Add README
      
      * Fix typo
      
      ---------
      Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
    • [layernorm] hot fix (#1620) · 550248de
      carlushuang authored
      * hot fix ln
      
      * some rename
  18. 31 Oct, 2024 1 commit
    • [CK_TILE] layernorm support fused-quant/fused-add (#1604) · c3a4800c
      carlushuang authored
      * add prenorm/postnorm support, refactor using generate.py
      
      * update README
      
      * update README
      
      * fix format
      
      * update some description and fix format
      
      * update format
      
      * format
      
      * use non-raw for loading
      
      * format and update n4096
      
      * dynamic-quant ready
      
      * update readme
      
      * support fused dynamic-quant
      
      * update fused-quant, with smooth
      
      * update README
      
      * update args
      
      * update some based on comment