1. 30 May, 2023 1 commit
  2. 24 May, 2023 2 commits
    • Clean-up the headers (#713) · ac9e01e2
      Illia Silin authored
      
      
      * fix headers for gpu instances
      
      * remove unused headers
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • Pool3d fwd (#697) · 76ec0089
      rocking authored
      * Expand the base class of pool2d, prepare to share base class with pool3d
      
      * Add pool3d device op
      
      * Add pool3d f16 example
      
      * Refactor the base class. implement generic pooling in the future
      
      * clang format
      
      * get original index in max pooling
      
      * Add outputindex to base class
      
      * Fix dimension
      
      * Add pooling instance
      
      * Use indexType instead
      
      * Remove useless header
      
      * Extract IndexDataType to template
      
      * Extract pooling reference code
      
      * clang format
      
      * clang format
      
      * Fix typo
      
      * Add tensor stride
      
      * Add missing header
      
      * Add index stride and output stride
      
      * Refine naming
      
      * Add type to base class
      
      * Rename file
      
      * Use proper size
      
      * Fix typo
      
      * Refine naming
      
      * Modify the argument into vector.
      
      * Add max pool profiler
      
      * Refine naming
      
      * Support f32 pool
      
      * Fix typo
      
      * Add avg pool2d fwd in profiler
      
      * clang format
      
      * Rename AccDatatype to ComputeDatatype
      
      * Fix init
      
      * test pool
      
      * Extract variable
      
      * Add client example
      
      * Check the pooling dim
      
      * clang format
      
      * Connect argv and arg_parser
      
      * Add found check
      
      * Remove useless header
      
      * Refine naming
      
      * Adjust the order of device_pool_fwd
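The "get original index in max pooling" and "Add outputindex to base class" items in the Pool3d commit above describe a max pool that returns, next to each maximum, the position of that maximum in the input. A minimal host-side sketch of the idea, reduced to one dimension for brevity, might look like the following. The names `PoolResult` and `max_pool_1d_with_index` are hypothetical and are not taken from the library; the actual device op works on 3D windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: one value and one input index per window.
struct PoolResult
{
    std::vector<float> values;         // maximum of each window
    std::vector<std::int32_t> indices; // position of that maximum in the input
};

// 1D max pooling without padding. Windows that would run past the end of
// the input are skipped.
PoolResult max_pool_1d_with_index(const std::vector<float>& in,
                                  std::size_t window,
                                  std::size_t stride)
{
    assert(window > 0 && stride > 0);
    PoolResult out;
    for(std::size_t start = 0; start + window <= in.size(); start += stride)
    {
        std::size_t best = start; // index of the largest element seen so far
        for(std::size_t i = start + 1; i < start + window; ++i)
        {
            if(in[i] > in[best])
                best = i;
        }
        out.values.push_back(in[best]);
        out.indices.push_back(static_cast<std::int32_t>(best));
    }
    return out;
}
```

Keeping the index next to the value is what a later backward pass typically needs in order to route gradients to the winning element; whether the library uses it that way is not stated in the commit message.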
  3. 23 May, 2023 1 commit
    • Enable gemm_dl and other kernels on Navi3x. (#714) · d821d1e5
      Illia Silin authored
      * enable dl kernels on navi3
      
      * do not build xdl tests and examples on Navi
      
      * run tests before building everything on jenkins
      
      * disable gemm_bilinear on gfx1030
      
      * add gpu targets to installer on Navi
      
      * put tests in the same order as before
      
      * reduce the number of navi targets in CI
      
      * build CI installed for gfx940 as well
      
      * only build for MI300 during QA runs
  4. 18 May, 2023 1 commit
    • Documentation Updates (#710) · 3cff3404
      Sam Wu authored
      * update documentation dependencies
      
      add version number to docs
      
      rename doc config directories
      
      enable more doc formats on rtd
      
      add license section in docs
  5. 15 May, 2023 1 commit
    • Add contraction profiler and tests (#701) · 642d5e91
      Bartłomiej Kocot authored
      * Add contraction profiler and tests
      
      * Build and style fixes
      
      * Allow to use any elementwise operator for ref_contraction
      
      * Introduce profile_contraction_scale and profile_contraction_bilinear
      
      * Make ref_contraction generic and extend interface tests
      
      * Stylistic minor fixes
      
      * Extend test_contraction_interface
  6. 11 May, 2023 1 commit
  7. 04 May, 2023 1 commit
    • Optimize bf16 conversion (#664) · b076a02a
      Rostyslav Geyyer authored
      * Add TypeConvert class and start refactoring
      
      * Refactor TypeConvert as a struct
      
      * Get back to template functions type_convert
      
      * Add a type_convert_bf16_rtn, set rtz as default
      
      * Clean up
      
      * Add UnaryConvertPrecision struct for high-precision workloads
      
      * Format
      
      * Update type_convert to UnaryConvert on threadwise level
      
      * Update UnaryConvertPrecision
      
      * Format
      
      * Fix chmod
      
      * Add a flag to pick conversion method
      
      * Format
      
      * Remove the added flag
      
      * Merge elementwise op with type conversion
      
      * Move type_convert to elemwise op, update the op
      
      * Update type_convert_precision -> bf16_convert_rtn
      
      * Clean up
      
      * Update comments
      
      * Update the CK_WORKAROUND_DENORM_FIX flag handling
      
      * Update the unneeded op to work but warn user
      
      * Remove the message
      
      * Use a PassThrough instead of ConvertBF16RTN to calculate reference
      
      * Format
      
      * Add missing include
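The bf16 commit above distinguishes two float-to-bf16 conversions: round toward zero (rtz, set as the default) and round to nearest (rtn). The sketch below shows the usual bit-level difference between the two, assuming IEEE-754 `float` and ties-to-even for rtn. The function names are hypothetical and this is not the library's `type_convert` implementation.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit unsigned integer.
inline std::uint32_t float_bits(float x)
{
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof(u));
    return u;
}

// Round toward zero: keep the upper 16 bits and drop the rest.
inline std::uint16_t float_to_bf16_rtz(float x)
{
    return static_cast<std::uint16_t>(float_bits(x) >> 16);
}

// Round to nearest, ties to even (assumed tie rule for this sketch).
inline std::uint16_t float_to_bf16_rtn(float x)
{
    std::uint32_t u = float_bits(x);
    if((u & 0x7fffffffu) > 0x7f800000u)
    {
        // NaN: truncate and set a mantissa bit so the result stays NaN.
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    }
    // Adding 0x7fff plus the lowest kept bit rounds exact ties to even.
    u += 0x7fffu + ((u >> 16) & 1u);
    return static_cast<std::uint16_t>(u >> 16);
}

// Widen bf16 back to float by placing its bits in the upper half.
inline float bf16_to_float(std::uint16_t h)
{
    const std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &u, sizeof(x));
    return x;
}
```

Truncation is a single shift, which is why it is the cheaper choice; the rounding variant costs an extra add and a NaN check but halves the worst-case error.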
  8. 03 May, 2023 3 commits
  9. 02 May, 2023 1 commit
  10. 28 Apr, 2023 1 commit
  11. 26 Apr, 2023 2 commits
    • add vector load check (#680) · 54c90aae
      Haocong WANG authored
      
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • [CK] suppress unsafe buffer warn (#687) · 7613c1d9
      Jun Liu authored
      incomplete fix from https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/670
      
      The warning is triggered not only in gtest but also in CK code.
      
      These should be fixed as a quality improvement, but for now the warning is suppressed for the immediate releases:
      http://compiler-ci.amd.com/blue/rest/organizations/jenkins/pipelines/compiler-psdb-amd-stg-open/runs/2540/nodes/282/steps/3202/log/?start=0
      
      e.g.
      ```
      [2023-04-26T17:26:31.524Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-0f98035df1cc5ba3e90ab03187e672b426a25b00/include/ck/utility/generic_memory_space_atomic.hpp:52:19: error: unsafe pointer arithmetic [-Werror,-Wunsafe-buffer-usage]
      [2023-04-26T17:26:31.524Z]         atomicAdd(c_style_pointer_cast<float*>(p_dst) + 1, vx.template AsType<float>()[I1]);
      [2023-04-26T17:26:31.524Z]                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ```
      ```
      [2023-04-26T17:26:31.523Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-0f98035df1cc5ba3e90ab03187e672b426a25b00/include/ck/utility/amd_inline_asm.hpp:62:20: error: 'p_a_half2' is an unsafe pointer used for buffer access [-Werror,-Wunsafe-buffer-usage]
      [2023-04-26T17:26:31.523Z]     const half2_t* p_a_half2  = c_style_pointer_cast<const half2_t*>(&a);
      [2023-04-26T17:26:31.523Z]     ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ```
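The commit message above does not say how the `-Wunsafe-buffer-usage` warning is suppressed; it may simply be a compiler flag in the build configuration. As one possible illustration, assuming a source-level approach, Clang's diagnostic pragmas can silence the warning around a specific piece of pointer arithmetic. The function `sum_pair` is a hypothetical stand-in for the flagged code.

```cpp
// Scoped suppression using Clang diagnostic pragmas. The pragmas are
// guarded so that other compilers ignore them.
#if defined(__clang__)
#pragma clang diagnostic push
// Older Clang versions do not know the flag below; avoid a second warning.
#pragma clang diagnostic ignored "-Wunknown-warning-option"
#pragma clang diagnostic ignored "-Wunsafe-buffer-usage"
#endif

// Hypothetical example of the pattern the log complains about:
// raw pointer arithmetic on a buffer of at least two floats.
inline float sum_pair(const float* p)
{
    return *p + *(p + 1);
}

#if defined(__clang__)
#pragma clang diagnostic pop
#endif
```

The push/pop pair keeps the suppression local, so the warning stays active for the rest of the translation unit.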
  12. 24 Apr, 2023 3 commits
    • Grouped Gemm + SplitK + simplified Kernel Args (#669) · 8bb2bb4a
      Adam Osewski authored
      
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * B2C with 3D grid for KSplit
      
      * Remove unused code.
      
      * Use default B2C (3D grid) in grid gemm v2r4r2.
      
      * Device gemm splitk use B2C map.
      
      * Device GroupedGemmXdlSplitKCShuffle
      
      * Example for GroupedGemm Xdl SplitK
      
      * Introduce Device GroupedGemmSplitK
      
      * Fix updating kbatch size.
      
      * Add instance mk-nk-mn
      
      * Enable set kbatch in profiler.
      
      * Add GGemmSplitK mk-kn-mn instances
      
      * Add more instances & split into multiple files.
      
      * minor fix
      
      * tuning
      
      * clean
      
      * disabled failed instances
      
      * use pipe v2
      
      * Ignore arg on not supported arch.
      
      * fix warning
      
      ---------
      Co-authored-by: carlushuang <carlus.huang@amd.com>
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
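The split-K commit above refers to a `kbatch` size. In a split-K GEMM the reduction dimension K is cut into `kbatch` chunks, each chunk produces a partial product, and the partials are summed into the output. A host-side sketch of that idea, assuming row-major A and B, could look like this. The function `gemm_splitk` is hypothetical and does not model the grouped variant, the block-to-C-tile (B2C) mapping, or the kernel arguments mentioned in the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C (M x N) = A (M x K) * B (K x N), all row-major, by splitting
// the K dimension into `kbatch` chunks and accumulating partial results.
std::vector<float> gemm_splitk(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t M,
                               std::size_t N,
                               std::size_t K,
                               std::size_t kbatch)
{
    assert(kbatch > 0);
    assert(a.size() == M * K && b.size() == K * N);

    std::vector<float> c(M * N, 0.0f); // starts at zero: chunks accumulate
    const std::size_t k_per_batch = (K + kbatch - 1) / kbatch; // ceiling

    // On a GPU each chunk would typically run in its own workgroups and be
    // combined with atomic adds or a separate reduction; here the chunks
    // simply run one after another.
    for(std::size_t kb = 0; kb < kbatch; ++kb)
    {
        const std::size_t k_begin = kb * k_per_batch;
        const std::size_t k_end   = std::min(K, k_begin + k_per_batch);
        for(std::size_t m = 0; m < M; ++m)
        {
            for(std::size_t n = 0; n < N; ++n)
            {
                float partial = 0.0f;
                for(std::size_t k = k_begin; k < k_end; ++k)
                    partial += a[m * K + k] * b[k * N + n];
                c[m * N + n] += partial;
            }
        }
    }
    return c;
}
```

Splitting K exposes more parallelism when M and N are small relative to K, at the cost of an extra accumulation step into C.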
    • reduce initial number for half_t splitk (#685) · 8b9cbba8
      zjing14 authored
    • Revise layout of group convolution (#675) · 3eecbfb6
      rocking authored
      * [What] Remove pure conv int8 instance
      [Why] We will never use pure int8 conv in AI, use int8 quantization instead
      
      * Change layout
      
      * Share the kernel parameter
      
      * Support more type of NHWGC for group conv
      
      * Revise client example of conv 2d, use NHWGC layout
      
      * Add instance to cmake
      
      * Revise layout of group conv quantization instance
      
      * Revise layout of external api of group conv quantization
      
      * Revise layout of group conv quantization client example
      
      * Fix clang format
      
      * Add comment to describe meaning of each parameter
  13. 22 Apr, 2023 1 commit
  14. 21 Apr, 2023 2 commits
  15. 18 Apr, 2023 1 commit
    • Allow using ROCm release candidate compilers. (#679) · bb0b772d
      Illia Silin authored
      * enable use of rocm5.5 release candidate 4
      
      * upgrade to ROCM5.5 RC5
      
      * try fix the PUB_KEY error, remove the cmake-data package
      
      * upgrade to latest cmake version
      
      * use private dockerhub repo for rocm5.5 rc5
      
      * add missing bracket
  16. 17 Apr, 2023 1 commit
  17. 16 Apr, 2023 2 commits
  18. 11 Apr, 2023 5 commits
  19. 10 Apr, 2023 1 commit
    • Groupnorm + swish external api (#668) · ed3a2e52
      rocking5566 authored
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Refactor instance generation, split into multiple cpp files
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
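The groupnorm + swish commit above fuses a group normalization with a swish activation. As a rough host-side reference of what such a fused op computes, assuming a single sample stored as [channels][spatial], biased variance, and swish with beta = 1 (x * sigmoid(x)), one could write the following. The names `swish` and `groupnorm_swish`, and the memory layout, are assumptions made for illustration and are not the library's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Swish with beta = 1: x * sigmoid(x).
inline float swish(float x) { return x / (1.0f + std::exp(-x)); }

// Normalizes each group of channels with its own mean and variance, applies
// a per-channel scale (gamma) and shift (beta), then applies swish.
std::vector<float> groupnorm_swish(const std::vector<float>& x,
                                   std::size_t channels,
                                   std::size_t spatial,
                                   std::size_t groups,
                                   const std::vector<float>& gamma,
                                   const std::vector<float>& beta,
                                   float eps = 1e-5f)
{
    assert(groups > 0 && spatial > 0 && channels % groups == 0);
    assert(x.size() == channels * spatial);
    assert(gamma.size() == channels && beta.size() == channels);

    const std::size_t group_size = (channels / groups) * spatial;
    std::vector<float> y(x.size());

    for(std::size_t g = 0; g < groups; ++g)
    {
        const std::size_t begin = g * group_size;

        float mean = 0.0f;
        for(std::size_t i = 0; i < group_size; ++i)
            mean += x[begin + i];
        mean /= static_cast<float>(group_size);

        float var = 0.0f; // biased variance over the group
        for(std::size_t i = 0; i < group_size; ++i)
        {
            const float d = x[begin + i] - mean;
            var += d * d;
        }
        var /= static_cast<float>(group_size);

        const float inv_std = 1.0f / std::sqrt(var + eps);
        for(std::size_t i = 0; i < group_size; ++i)
        {
            const std::size_t c = (begin + i) / spatial; // channel of element
            const float normed  = (x[begin + i] - mean) * inv_std;
            y[begin + i]        = swish(gamma[c] * normed + beta[c]);
        }
    }
    return y;
}
```

A reference like this is mainly useful for checking a device kernel's output within a tolerance, since the device and host versions of `exp` may differ slightly.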
  20. 07 Apr, 2023 1 commit
  21. 30 Mar, 2023 3 commits
  22. 29 Mar, 2023 3 commits
  23. 27 Mar, 2023 1 commit
  24. 24 Mar, 2023 1 commit