1. 14 Aug, 2023 2 commits
    • Bartlomiej Wroblewski authored
      d4c84256
    • Refactor pool fwd (#815) · f60f0a5e
      rocking authored
      * Do not hardcode stride
      
      * devicePool2DFwd inherits from devicePool3DFwd
      
      * Move instance declaration out of common
      
      * Add dilation
      
      * Use the pool3d rank, because pool2d inherits from pool3d
      
      * calculate Do Ho Wo for the dilation
      
      * Fix header name
      
      * Modify ckProfiler
      
      * Remove pool2d instance
      
      * Remove pool2d in profiler
      
      * Remove pool2d and add dilation
      
      * In the client example, this commit revises the following:
      1. Add dilation.
      2. Use pool3d to implement pool2d
      
      * Refine naming and IsSupportedArgument()
      
      * Add dilation to maxpool bwd example
      
      * clang format
      
      * 1. Remove useless header
      2. Fix copyright
      3. Refine naming
      
      * Add layout parameter to pool fwd
      
      * clang format
      
      * Fix merge error
      
      * Fix compile error
      
      * Remove layout parameter in derived class
      
      * Refine changelog
      
      * Fix compile error
      
      * Fix compiler error
      
      * Add layout to external api and profiler
  2. 10 Aug, 2023 2 commits
    • fixed ci · 38a949c8
      Jing Zhang authored
    • Average pool backward deviceOP and example (#797) · 578142db
      rocking authored
      * Add avgpool bwd reference code
      
      * Refine naming
      
      * Fix invalid in_element op in ref_conv
      
      * Add example (only reference now)
      
      * Add the full example of avgpool bwd
      
      * Fix copyright
      
      * Imitate MakeDescriptor from transform_conv_bwd_data_to_gemm_v1.hpp
      
      * rename channel to c from k
      
      * Arrange the code
      
      * Imitate the argument from conv bwd
      
      * Implement invoker
      
      * Fix order of parameter in example
      
      * Refactor reference code for different dimension
      
      * Support different stride
      
      * Check if argument is valid
      
      * Fix kernel parameter for NDHWC, fastest dimension C is not reduced
      
      * Add more data type in example
      
      * Fix bug in example
      
      * calculate Do Ho Wo according to the dilation
      
      * Remove useless header
      
      * Add comment in reference code
      
      * Add layout parameter
      
      * Remove layout in derived class
      
      * Refine reference comment
  3. 09 Aug, 2023 1 commit
  4. 07 Aug, 2023 2 commits
    • Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769
      Illia Silin authored
      * properly split conv_nd_bwd_data instances
      
      * split conv2d_fwd instance data types
      
      * split the gemm, conv2d_fwd and batched_gemm_softmax_gemm
      
      * split the tests by data types where possible
      
      * filter examples by DTYPES
      
      * split the few remaining examples by DTYPES
      
      * filter most instances by DTYPES
      
      * add new lines at end of headers, fix grouped_gemm profiler
      
      * fix syntax
      
      * split the ckprofiler instances by DTYPES
      
      * split the conv2d and quantization DL and XDL instances
      
      * fix the splitting of conv2d DL instances
      
      * split softmax and pool_fwd tests for fp16 and fp32 types
      
      * fix syntax
      
      * fix the dl_int8 quantization instances isolation
    • Add wei_strides to grouped conv3d wei to keep consistency (#817) · 22443f7a
      Bartłomiej Kocot authored
      
      
      * Add wei_strides to grouped conv3d wei to keep consistency
      
      * Fix strides in client examples
      
      * Unify backward weight api with forward
      
      * Fix for example
      
      * Fixes for examples
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  5. 05 Aug, 2023 1 commit
  6. 03 Aug, 2023 2 commits
  7. 02 Aug, 2023 1 commit
  8. 28 Jul, 2023 2 commits
  9. 27 Jul, 2023 4 commits
  10. 26 Jul, 2023 4 commits
  11. 25 Jul, 2023 1 commit
  12. 19 Jul, 2023 6 commits
  13. 18 Jul, 2023 3 commits
    • dedicated fixed_nk solution · e87f7319
      Jing Zhang authored
    • add SetDeviceKernelArgs · 5a5468f4
      Jing Zhang authored
    • Add mechanism to build CK for select data types, add Navi3x CI. (#790) · 189ea3b9
      Illia Silin authored
      * allow building CK for specific data types
      
      * add CI build and test stage on Navi3x without some int8 instances
      
      * add missing gemm fp16 instances
      
      * add the changes to the missed cmake file
      
      * add empty lines at end of source files
      
      * Do not build quantization client example on navi3 in CI
      
      * disable batched_gemm_multi_d_int8 instances with DTYPES
      
      * disable device_conv2d_bwd_data_instance with DTYPES
      
      * fix ckprofiler for conv_bwd_data for int8
      
      * properly isolate the conv_bwd_data int8 instances
      
      * remove empty line
  14. 17 Jul, 2023 2 commits
  15. 12 Jul, 2023 1 commit
  16. 06 Jul, 2023 2 commits
    • Batchnorm splitk single kernel (#771) · 8f5cafaf
      Qianfeng authored
      * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance]
      
      * Add CountDataType as template parameter in blockwise_welford
      
      * Add utility/get_shift.hpp
      
      * Add BatchNorm multiblock single-kernel implementation
      
      * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a
      
      * Renaming in device_batchnorm_forward_impl.hpp
      
      * Tiny fix in the batchnorm_fwd profiler
      
      * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a"
      
      This reverts commit d16d00919c43f10759e7b4e4d112125221ed9064.
      
      * Use the old two-kernel batchnorm multiblock method for gfx1030
      
      * Use the old two-kernel batchnorm multiblock method for gfx908
      
      * use the single-kernel batchnorm multiblock method only for gfx90a
      
      * Remove get_wave_id() from utility/get_id.hpp since it is not used
      
      * Set true for testing running mean/variance and saving mean/invvariance in the examples
      
      * Fix copyright wording
      
      * Remove unneeded include in utility/get_id.hpp
      
      * Add comments to workgroup_synchronization.hpp
      
      * Remove unused code in gridwise_multiblock_batchnorm_forward.hpp
      
      * Renaming in the kernels
      
      * Remove unused kernel file
    • Adam Osewski authored
      f4dfc060
  17. 05 Jul, 2023 1 commit
  18. 19 Jun, 2023 2 commits
    • do not build gemm-gemm and conv-conv examples for gfx94* (#761) · 645eb2f2
      Illia Silin authored
      * do not build gemm-gemm and conv-conv examples for gfx94*
      
      * do not build gemm-gemm and conv-conv examples on navi
    • Maxpool bwd (#750) · 341ad956
      rocking authored
      * Add maxpool f32 kernel and example
      
      * Revise copyright
      
      * Add device pool bwd device op
      
      * Support f16 and bf16
      
      * Add compute datatype for reference code.
      Prevent error in bf16
      
      * Fix type error
      
      * Remove layout
      
      * Fix bf16 error
      
      * Add f16 and bf16 example
      
      * Add more operations
      
      * Implement IsSupportedArgument
      
      * Add changelog
      
      * Add comment
      
      * Add comment
      
      * Remove useless header
      
      * Move initialize of workspace to the run
      
      * Move set din zero to the device operator
      
      * Save din_length_raw
      
      * Remove useless header
      
      * Calculate gridsize according to the number of CU
      
      * Calculate gridSize according to the number of CU.
      Remove useless header
      
      * Add put example
      
      * Remove useless header
      
      * Fix CI fail
  19. 15 Jun, 2023 1 commit
    • Enable gfx941 and gfx942 architectures. (#752) · 027e46ee
      Illia Silin authored
      * enable gfx941/942 targets
      
      * fix clang format
      
      * fix the cmake logic for multiple targets
      
      * fix cmake syntax for looping over targets
      
      * add gfx941/942 support for gemm_xdl instances