1. 09 Aug, 2023 5 commits
  2. 07 Aug, 2023 2 commits
    • Illia Silin's avatar
      Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769
      Illia Silin authored
      * properly split conv_nd_bwd_data instances
      
      * split conv2d_fwd instance data types
      
      * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm
      
      * split the tests by data types where possible
      
      * filter examples by DTYPES
      
      * split few remaining examples by DTYPES
      
      * filter most instances by DTYPES
      
      * add new lines at end of headers, fix grouped_gemm profiler
      
      * fix syntax
      
      * split the ckprofiler instances by DTYPES
      
      * split the conv2d and quantization DL and XDL instances
      
      * fix the splitting of conv2d DL instances
      
      * split softmax and pool_fwd tests for fp16 and fp32 types
      
      * fix syntax
      
      * fix the dl_int8 quantization instances isolation
      08eb1769
    • Bartłomiej Kocot's avatar
      Add wei_strides to grouped conv3d wei to keep consistency (#817) · 22443f7a
      Bartłomiej Kocot authored
      
      
      * Add wei_strides to grouped conv3d wei to keep consistency
      
      * Fix strides in client examples
      
      * Unify backward weight api with forward
      
      * Fix for example
      
      * Fixes for examples
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      22443f7a
  3. 03 Aug, 2023 4 commits
  4. 02 Aug, 2023 1 commit
  5. 27 Jul, 2023 1 commit
  6. 26 Jul, 2023 4 commits
  7. 25 Jul, 2023 2 commits
  8. 21 Jul, 2023 3 commits
  9. 18 Jul, 2023 3 commits
  10. 17 Jul, 2023 1 commit
  11. 15 Jul, 2023 1 commit
  12. 12 Jul, 2023 1 commit
  13. 07 Jul, 2023 1 commit
  14. 06 Jul, 2023 5 commits
    • Adam Osewski's avatar
      Add basic setup for precommit (#749) (#764) · 237f9cd3
      Adam Osewski authored
      
      
      * Add basic setup for precommit
      
      * Update README.md with instructions on installing precommit hooks
      
      ---------
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: default avatarBartlomiej Wroblewski <bwroblewski10@gmail.com>
      237f9cd3
    • Po Yen Chen's avatar
      Split GEMM instance library & enable pipeline v2 optimization (#783) · 850144a0
      Po Yen Chen authored
      * Move source file into sub-directories
      
      * Add missing include directive
      
      * Split DeviceGemmXdl<> fp16 instances
      
      * Fix format
      
      * Remove unnecessary CMakeLists.txt
      
      * Add macros to toggle new features
      
      * Remove debug message
      
      * Turn off GEMM v2 pipeline optimization by default
      
      * Fix format
      
      * Extract duplicated string as list
      
      * Enlarge indent in CMakeLists.txt
      850144a0
    • Qianfeng's avatar
      Batchnorm splitk single kernel (#771) · 8f5cafaf
      Qianfeng authored
      * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance]
      
      * Add CountDataType as template parameter in blockwise_welford
      
      * Add utility/get_shift.hpp
      
      * Add BatchNorm multiblock single-kernel implementation
      
      * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a
      
      * Renaming in device_batchnorm_forward_impl.hpp
      
      * Tiny fix in the batchnorm_fwd profiler
      
      * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a"
      
      This reverts commit d16d00919c43f10759e7b4e4d112125221ed9064.
      
      * Use the old two-kernel batchnorm multiblock method for gfx1030
      
      * Use the old two-kernel batchnorm multiblock method for gfx908
      
      * use the single-kernel batchnorm multiblock method only for gfx90a
      
      * Remove get_wave_id() from utility/get_id.hpp since it is not used
      
      * Set true for testing running mean/variance and saving mean/invvariance in the examples
      
      * Fix to copy-right words
      
      * Remove un-needed including in utility/get_id.hpp
      
      * Add comments to workgroup_synchronization.hpp
      
      * Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp
      
      * Renaming in the kernels
      
      * Remove un-used kernel file
      8f5cafaf
    • Adam Osewski's avatar
      f4dfc060
    • Bartlomiej Kocot's avatar
      2b0b6d9f
  15. 05 Jul, 2023 2 commits
  16. 30 Jun, 2023 1 commit
  17. 28 Jun, 2023 1 commit
  18. 21 Jun, 2023 2 commits