1. 23 Aug, 2023 1 commit
  2. 22 Aug, 2023 1 commit
  3. 14 Aug, 2023 2 commits
    • Bartlomiej Wroblewski's avatar
      d4c84256
    • rocking's avatar
      Refactor pool fwd (#815) · f60f0a5e
      rocking authored
      * Do not hardcode stride
      
      * devicePool2DFwd Inherit devicePool3DFwd
      
      * Move instance declaration out of common
      
      * Add dilation
      
      * use the pool3d rank, because pool2d inherit pooo3d
      
      * calculate Do Ho Wo for the dilation
      
      * Fix header name
      
      * Modify ckProfiler
      
      * Remove pool2d instance
      
      * Remove pool2d in profiler
      
      * Remove pool2d and add dilation
      
      * In to client example, this commit revise following:
      1. Add dilation.
      2. Use pool3d to implement pool2d
      
      * Refine naming and IsSupportedArgument()
      
      * Add dilation to maxpool bwd example
      
      * clang format
      
      * 1. Remove useless header
      2. Fix copyright
      3. Refine naming
      
      * Add layout parameter to pool fwd
      
      * clang format
      
      * Fix merge error
      
      * Fix compile error
      
      * Remove layout parameter in derived class
      
      * Refine changlog
      
      * Fix compile error
      
      * Fix compiler error
      
      * Add layout to external api and profiler
      f60f0a5e
  4. 10 Aug, 2023 2 commits
    • Jing Zhang's avatar
      fixed ci · 38a949c8
      Jing Zhang authored
      38a949c8
    • rocking's avatar
      Average pool backward deviceOP and example (#797) · 578142db
      rocking authored
      * Add avgpool bwd reference code
      
      * Refine naming
      
      * Fix invalid in_element op in ref_conv
      
      * Add example (only reference now)
      
      * Add the full example of avgpool bwd
      
      * Fix copyright
      
      * Imitate MakeDescriptor from  transform_conv_bwd_data_to_gemm_v1.hpp
      
      * rename channel to c from k
      
      * Arrange the code
      
      * Imitate the argument from conv bwd
      
      * Implement invoker
      
      * Fix order of parameter in example
      
      * Refactor reference code for different dimension
      
      * Support different stride
      
      * Check if argument is valid
      
      * Fix kernel parameter for NDHWC, fastest dimension C is not reduced
      
      * Add more data type in example
      
      * Fix bug in example
      
      * calculate Do Ho Wo according to the dilation
      
      * Remove useless header
      
      * Add comment in reference code
      
      * Add layout parameter
      
      * Remove layout in derived class
      
      * Refine reference comment
      578142db
  5. 09 Aug, 2023 1 commit
  6. 07 Aug, 2023 2 commits
    • Illia Silin's avatar
      Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769
      Illia Silin authored
      * properly split conv_nd_bwd_data instances
      
      * split conv2d_fwd instance data types
      
      * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm
      
      * split the tests by data types where possible
      
      * filter examples by DTYPES
      
      * split few remaining examples by DTYPES
      
      * filter most instances by DTYPES
      
      * add new lines at end of headers, fix grouped_gemm profiler
      
      * fix syntax
      
      * split the ckprofiler instances by DTYPES
      
      * split the conv2d and quantization DL and XDL instances
      
      * fix the splitting of conv2d DL instances
      
      * split softmax and pool_fwd tests for fp16 and fp32 types
      
      * fix syntax
      
      * fix the dl_int8 quantization instances isolation
      08eb1769
    • Bartłomiej Kocot's avatar
      Add wei_strides to grouped conv3d wei to keep consistency (#817) · 22443f7a
      Bartłomiej Kocot authored
      
      
      * Add wei_strides to grouped conv3d wei to keep consistency
      
      * Fix strides in client examples
      
      * Unify backward weight api with forward
      
      * Fix for example
      
      * Fixes for examples
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      22443f7a
  7. 05 Aug, 2023 1 commit
  8. 03 Aug, 2023 2 commits
  9. 02 Aug, 2023 1 commit
  10. 28 Jul, 2023 2 commits
  11. 27 Jul, 2023 4 commits
  12. 26 Jul, 2023 4 commits
  13. 25 Jul, 2023 1 commit
  14. 19 Jul, 2023 6 commits
  15. 18 Jul, 2023 3 commits
    • Jing Zhang's avatar
      dedicated fixed_nk solution · e87f7319
      Jing Zhang authored
      e87f7319
    • Jing Zhang's avatar
      add SetDeviceKernelArgs · 5a5468f4
      Jing Zhang authored
      5a5468f4
    • Illia Silin's avatar
      Add mechanism to build CK for select data types, add Navi3x CI. (#790) · 189ea3b9
      Illia Silin authored
      * allow building CK for specific data types
      
      * add CI build and test stage on Naiv3x without some int8 instances
      
      * add missing gemm fp16 instances
      
      * add the changes to the missed cmake file
      
      * add empty lines at end of source files
      
      * Do not build quantization client example on navi3 in CI
      
      * disable batched_gemm_multi_d_int8 instances with DTYPES
      
      * disable device_conv2d_bwd_data_instance with DTYPES
      
      * fix ckprofiler for conv_bwd_data for int8
      
      * properly isolate the conv_bwd_data int8 instances
      
      * remove empty line
      189ea3b9
  16. 17 Jul, 2023 2 commits
  17. 12 Jul, 2023 1 commit
  18. 06 Jul, 2023 2 commits
    • Qianfeng's avatar
      Batchnorm splitk single kernel (#771) · 8f5cafaf
      Qianfeng authored
      * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance]
      
      * Add CountDataType as template parameter in blockwise_welford
      
      * Add utility/get_shift.hpp
      
      * Add BatchNorm multiblock single-kernel implementation
      
      * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a
      
      * Renaming in device_batchnorm_forward_impl.hpp
      
      * Tiny fix in the batchnorm_fwd profiler
      
      * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a"
      
      This reverts commit d16d00919c43f10759e7b4e4d112125221ed9064.
      
      * Use the old two-kernel batchnorm multiblock method for gfx1030
      
      * Use the old two-kernel batchnorm multiblock method for gfx908
      
      * use the single-kernel batchnorm multiblock method only for gfx90a
      
      * Remove get_wave_id() from utility/get_id.hpp since it is not used
      
      * Set true for testing running mean/variance and saving mean/invvariance in the examples
      
      * Fix to copy-right words
      
      * Remove un-needed including in utility/get_id.hpp
      
      * Add comments to workgroup_synchronization.hpp
      
      * Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp
      
      * Renaming in the kernels
      
      * Remove un-used kernel file
      8f5cafaf
    • Adam Osewski's avatar
      f4dfc060
  19. 05 Jul, 2023 1 commit
  20. 19 Jun, 2023 1 commit