1. 29 Oct, 2024 1 commit
  2. 14 Oct, 2024 1 commit
  3. 07 Oct, 2024 1 commit
  4. 27 Sep, 2024 1 commit
  5. 20 Sep, 2024 1 commit
  6. 17 Sep, 2024 1 commit
  7. 16 Sep, 2024 1 commit
  8. 13 Sep, 2024 1 commit
  9. 12 Sep, 2024 1 commit
  10. 11 Sep, 2024 1 commit
    • Rewrite pool2d fwd (#1462) · e8d2887c
      jakpiase authored
      
      
      * added pool2d fwd
      
      * add tests
      
      * add reviewers changes
      
      * Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"
      
      This reverts commit 6b2ba7ff8960b0a6ddbe30d8dac53eeb55a8597e, reversing
      changes made to 22c82bea0caf3e0f29399100c1bb67b8003fc042.
      
      * Revert "add reviewers changes"
      
      This reverts commit 22c82bea0caf3e0f29399100c1bb67b8003fc042.
      
      * added reviewers comments
      
      * revert some old files
      
      * add reviewers requests
      
      ---------
      Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
  11. 05 Sep, 2024 1 commit
  12. 04 Sep, 2024 1 commit
  13. 03 Sep, 2024 1 commit
  14. 26 Aug, 2024 1 commit
    • Enable daily ninja build traces. (#1487) · 19d22e60
      Illia Silin authored
      * add ninja trace to CI builds
      
      * fix ninja trace logic
      
      * update the ninja trace logic in jenkins file
      
      * limit the number of threads to run ninja build
      
      * use ninja for installation after build
      
      * update the path to ninjatracing tool
      
      * use ninja to run check when using build trace
      
      * fix jenkins logic
      
      * fix typos
      
      * set proper setup_args for all stages
      
      * fix ninja syntax
      
      * replace ninja check with ninja test
      
      * enable ninja tracing with mainline and staging compilers
  15. 22 Aug, 2024 1 commit
  16. 21 Aug, 2024 1 commit
    • Set RNE fp8 conversion as a default (#1458) · e20f20ef
      Rostyslav Geyyer authored
      * Set RNE fp8 conversion as a default
      
      * Update f8 tests
      
      * Disable failing test on gfx11
      
      * Update bf8 tests
      
      * Add a flag
      
      * Fix the flag
      
      * Raise flag for gfx10 as well
      
      * Temp commit for tolerance testing
      
      * Update tolerances
  17. 16 Aug, 2024 1 commit
  18. 14 Aug, 2024 1 commit
    • [GEMM] gemm_universal related optimization (#1453) · 3049b546
      Haocong WANG authored
      
      
      * replace buffer_atomic with global_atomic
      
      * fixed global_atomic_add
      
      * added bf16 atomic_add
      
      * format
      
      * clang-format-12
      
      * clean
      
      * clean
      
      * add guards
      
      * Update gtest.cmake
      
      * enabled splitk_gemm_multi_d
      
      * format
      
      * add ckProfiler
      
      * format
      
      * fixed naming
      
      * format
      
      * clean
      
      * clean
      
      * add guards
      
      * fix clang format
      
      * format
      
      * add kbatch printout
      
      * clean
      
      * Add rocm6.2 related gemm optimization
      
      * Limit bf16 atomic usage
      
      * remove redundant RCR gemm_universal instance
      
      * Add RRR fp8 gemm universal instance
      
      * Bug fix
      
      * Add GPU_TARGET guard to FP8/BF8 target
      
      * bug fix
      
      * update cmake
      
      * remove all fp8/bf8 examples if the arch does not support them
      
      * Enable fp8 RRR support in ckProfiler
      
      * limit greedy-reverse flag to gemm_universal in ckProfiler
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: Jing Zhang <jizhan@meta.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
  19. 12 Aug, 2024 2 commits
  20. 06 Aug, 2024 1 commit
    • Add Grouped Conv Fwd Large Tensor kernel (#1432) · 4ec5c52a
      Bartłomiej Kocot authored
      * Support 64 bit indexing
      
      * Add new grouped conv fwd kernel for large tensors
      
      * Add instances large tensor
      
      * Fixes for transform conv to gemm
      
      * Fixes
      
      * fixes
      
      * Remove unneeded instances
      
      * examples fixes
      
      * Remove unneeded ds arrays
      
      * Fix tests
      
      * Add 2GB check in gridwise dl
      
      * Fixes
  21. 01 Aug, 2024 1 commit
  22. 19 Jul, 2024 1 commit
    • [GEMM] F8 GEMM, performance optimized. (#1384) · 8c90f25b
      Haocong WANG authored
      
      
      * add ab_scale init support
      
      * enabled interwave
      
      * add scale type; update isSupport
      
      * adjust example
      
      * clean
      
      * enable f8 pure gemm rcr ckprofiler
      
      * Add gemm_multiply_multiply instances
      
      * clang format
      
      * Optimize for ScaleBlockMNK=128
      
      * enable abscale f8 gemm ck profiler
      
      * Add pure f8 gemm test suite
      
      * Reverting to the state of project at f60fd77
      
      * update copyright
      
      * clang format
      
      * update copyright
      
      ---------
      Co-authored-by: root <jizhan@amd.com>
  23. 12 Jul, 2024 1 commit
  24. 09 Jul, 2024 1 commit
  25. 04 Jul, 2024 1 commit
  26. 27 Jun, 2024 2 commits
  27. 21 Jun, 2024 1 commit
  28. 18 Jun, 2024 1 commit
  29. 14 Jun, 2024 1 commit
  30. 05 Jun, 2024 1 commit
  31. 28 May, 2024 1 commit
    • [CK_TILE] support group from cmdline (#1295) · 5055b3bd
      carlushuang authored
      * support cmdline seqlen decode
      
      * silent print
      
      * update readme
      
      * update kernel launch 3d
      
      * update tile partitioner
      
      * fix spill for bf16
      
      * modify based on comment
      
      * modify payload_t
      
      * fix bug for alibi mode
      
      * fix alibi test err
      
      * refactor kernel launch, support select timer
      
      * add missing file
      
      * remove useless code
      
      * add some comments
  32. 22 May, 2024 2 commits
  33. 15 May, 2024 1 commit
  34. 10 May, 2024 2 commits
  35. 07 May, 2024 1 commit
  36. 26 Apr, 2024 1 commit
    • [GEMM] UniversalGemm update (#1262) · 764164b4
      Haocong WANG authored
      
      
      * Add bf16 instances
      
      * Add bf16 gemm universal example
      
      * tempsave
      
      * Add guard to navi compilation
      
      * workaround for a specific mixed gemm instance (bring it back when the compiler fix is uploaded)
      
      * fix formatting condition statement issue
      
      * solve conflict
      
      ---------
      Co-authored-by: Jun Liu <Liu.Jun@amd.com>