1. 05 Aug, 2023 1 commit
  2. 04 Aug, 2023 2 commits
  3. 03 Aug, 2023 5 commits
  4. 02 Aug, 2023 1 commit
  5. 28 Jul, 2023 2 commits
  6. 27 Jul, 2023 1 commit
  7. 26 Jul, 2023 4 commits
  8. 25 Jul, 2023 2 commits
  9. 21 Jul, 2023 3 commits
  10. 18 Jul, 2023 3 commits
  11. 17 Jul, 2023 1 commit
  12. 15 Jul, 2023 1 commit
  13. 12 Jul, 2023 1 commit
  14. 07 Jul, 2023 1 commit
  15. 06 Jul, 2023 5 commits
    • Add basic setup for precommit (#749) (#764) · 237f9cd3
      Adam Osewski authored
      
      
      * Add basic setup for precommit
      
      * Update README.md with instructions on installing precommit hooks
      
      ---------
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>
    • Split GEMM instance library & enable pipeline v2 optimization (#783) · 850144a0
      Po Yen Chen authored
      * Move source file into sub-directories
      
      * Add missing include directive
      
      * Split DeviceGemmXdl<> fp16 instances
      
      * Fix format
      
      * Remove unnecessary CMakeLists.txt
      
      * Add macros to toggle new features
      
      * Remove debug message
      
      * Turn off GEMM v2 pipeline optimization by default
      
      * Fix format
      
      * Extract duplicated string as list
      
      * Enlarge indent in CMakeLists.txt
    • Batchnorm splitk single kernel (#771) · 8f5cafaf
      Qianfeng authored
      * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance]
      
      * Add CountDataType as template parameter in blockwise_welford
      
      * Add utility/get_shift.hpp
      
      * Add BatchNorm multiblock single-kernel implementation
      
      * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a
      
      * Renaming in device_batchnorm_forward_impl.hpp
      
      * Tiny fix in the batchnorm_fwd profiler
      
      * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a"
      
      This reverts commit d16d00919c43f10759e7b4e4d112125221ed9064.
      
      * Use the old two-kernel batchnorm multiblock method for gfx1030
      
      * Use the old two-kernel batchnorm multiblock method for gfx908
      
      * use the single-kernel batchnorm multiblock method only for gfx90a
      
      * Remove get_wave_id() from utility/get_id.hpp since it is not used
      
      * Set true for testing running mean/variance and saving mean/invvariance in the examples
      
      * Fix to copy-right words
      
      * Remove un-needed including in utility/get_id.hpp
      
      * Add comments to workgroup_synchronization.hpp
      
      * Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp
      
      * Renaming in the kernels
      
      * Remove un-used kernel file
    • Adam Osewski's avatar
      f4dfc060
    • Bartlomiej Kocot's avatar
      2b0b6d9f
  16. 05 Jul, 2023 2 commits
  17. 30 Jun, 2023 1 commit
  18. 28 Jun, 2023 1 commit
  19. 21 Jun, 2023 2 commits
  20. 20 Jun, 2023 1 commit