1. 28 Jul, 2023 1 commit
  2. 26 Jul, 2023 5 commits
  3. 25 Jul, 2023 6 commits
  4. 24 Jul, 2023 1 commit
  5. 18 Jul, 2023 1 commit
    • Illia Silin's avatar
      Add mechanism to build CK for select data types, add Navi3x CI. (#790) · 189ea3b9
      Illia Silin authored
      * allow building CK for specific data types
      
      * add CI build and test stage on Naiv3x without some int8 instances
      
      * add missing gemm fp16 instances
      
      * add the changes to the missed cmake file
      
      * add empty lines at end of source files
      
      * Do not build quantization client example on navi3 in CI
      
      * disable batched_gemm_multi_d_int8 instances with DTYPES
      
      * disable device_conv2d_bwd_data_instance with DTYPES
      
      * fix ckprofiler for conv_bwd_data for int8
      
      * properly isolate the conv_bwd_data int8 instances
      
      * remove empty line
      189ea3b9
  6. 17 Jul, 2023 1 commit
  7. 15 Jul, 2023 1 commit
  8. 14 Jul, 2023 2 commits
  9. 13 Jul, 2023 1 commit
  10. 12 Jul, 2023 1 commit
  11. 10 Jul, 2023 1 commit
  12. 07 Jul, 2023 2 commits
  13. 06 Jul, 2023 3 commits
    • Qianfeng's avatar
      Batchnorm splitk single kernel (#771) · 8f5cafaf
      Qianfeng authored
      * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance]
      
      * Add CountDataType as template parameter in blockwise_welford
      
      * Add utility/get_shift.hpp
      
      * Add BatchNorm multiblock single-kernel implementation
      
      * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a
      
      * Renaming in device_batchnorm_forward_impl.hpp
      
      * Tiny fix in the batchnorm_fwd profiler
      
      * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a"
      
      This reverts commit d16d00919c43f10759e7b4e4d112125221ed9064.
      
      * Use the old two-kernel batchnorm multiblock method for gfx1030
      
      * Use the old two-kernel batchnorm multiblock method for gfx908
      
      * use the single-kernel batchnorm multiblock method only for gfx90a
      
      * Remove get_wave_id() from utility/get_id.hpp since it is not used
      
      * Set true for testing running mean/variance and saving mean/invvariance in the examples
      
      * Fix to copy-right words
      
      * Remove un-needed including in utility/get_id.hpp
      
      * Add comments to workgroup_synchronization.hpp
      
      * Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp
      
      * Renaming in the kernels
      
      * Remove un-used kernel file
      8f5cafaf
    • Adam Osewski's avatar
      f4dfc060
    • ltqin's avatar
      change some functions name · dcfe312b
      ltqin authored
      dcfe312b
  14. 05 Jul, 2023 2 commits
  15. 04 Jul, 2023 1 commit
  16. 03 Jul, 2023 1 commit
    • ltqin's avatar
      first · 2416ddf7
      ltqin authored
      2416ddf7
  17. 30 Jun, 2023 1 commit
  18. 25 Jun, 2023 1 commit
  19. 19 Jun, 2023 4 commits
    • Illia Silin's avatar
      do not build gemm-gemm and conv-conv examples for gfx94* (#761) · 645eb2f2
      Illia Silin authored
      * do not build gemm-gemm and conv-conv examples for gfx94*
      
      * do not build gemm-gemm and conv-conv examples on navi
      645eb2f2
    • rocking's avatar
      Maxpool bwd (#750) · 341ad956
      rocking authored
      * Add maxpool f32 kernel and example
      
      * Revise copyright
      
      * Add device pool bwd device op
      
      * Support f16 and bf16
      
      * Add compute datatype for reference code.
      Prevent error in bf16
      
      * Fix type error
      
      * Remove layout
      
      * Fix bf16 error
      
      * Add f16 and bf16 example
      
      * Add more operations
      
      * Implement IsSupportedArgument
      
      * Add changelog
      
      * Add comment
      
      * Add comment
      
      * Remove useless header
      
      * Move initialize of workspace to the run
      
      * Move set din zero to the device operator
      
      * Save din_length_raw
      
      * Remove useless header
      
      * Calculate gridsize according to the number of CU
      
      * Calculate gridSize according to the number of CU.
      Remove useless header
      
      * Add put example
      
      * Remove useless header
      
      * Fix CI fail
      341ad956
    • danyao12's avatar
      rename all fwd related files · 6d63c311
      danyao12 authored
      6d63c311
    • danyao12's avatar
      rename all bwd related files · 656ebe9a
      danyao12 authored
      656ebe9a
  20. 16 Jun, 2023 4 commits