"composable_kernel/include/utility/config.hpp" did not exist on "31ded4ac4bc524acdbf897ffff094d7e7cbed991"
  1. 20 May, 2022 1 commit
    • rocking5566's avatar
      Gemm reduce max (#209) · 0ffe956a
      rocking5566 authored
      
      
      * [What] Rename the example
      [Why] Prepare to add unary reduction
      
      * Add global oparation to the parameter
      
      * Add atomicmax
      
      * Fix compile error
      
      * Support atomicMax (hip library)
      
      * Rename the reduction example
      
      * Fix target name
      
      * use p_d1_grid as the indicator directly
      
      * Prevent performance issue. Let passthrough handle it.
      
      * Implement the function template the specialize the float2
      
      * No need to separate into two lines
      
      * Remove empty line
      
      * add comment
      
      * Fix compile error due to merge from develop
      
      * make the implementation of atomic_max / atomic_add explicit for each datatype
      
      * Refine typo
      
      * For future CI test
      
      * Fix compiler error in ckProfiler
      
      * Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe'
      
      * simply use remove_pointer
      
      * Rename type and var
      
      * Refine example
      
      * Modify reducemax example
      
      * Fix bug in reduction
      
      * Change initialize range
      
      * Implement F64 version of atomicMax
      
      * Move reduction  code together
      
      * Add buffer atomic_max
      
      * Fix coding style by clang-format
      
      * Integrate new api of DeviceGemmReduce_Xdl_CShuffle
      
      * Integrate Batch gemm reduction
      
      * Fix example
      
      * fix example
      
      * clean up
      
      * Fix batch gemm tensor operation
      
      * Fix coding style
      
      * Fix template augument
      
      * Fix clang format
      
      * Keep flexible of different stride for each D tensor
      
      * Fix compile error for ckProfiler
      
      * Fix typo
      
      * [What] Fix naming
      [Why] Prepare to add out elementop
      
      * Add DoutElementOp
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarrocking <chunylai@amd.com>
      0ffe956a
  2. 19 May, 2022 1 commit
    • rocking5566's avatar
      elementwise op (#238) · aafc3ac2
      rocking5566 authored
      
      
      * Add elementwise operation kernel and example
      
      * Add comment
      
      * Add template argument of dim . Prepare to support multiple dimension
      
      * Rename example
      
      * Support 1 dimension
      
      * Add static assert
      
      * Add comment
      
      * Extract pad
      
      * Remove redundant argument
      
      * Support any dimension for elementwise operation
      
      * Remove line
      
      * Let it be the multiple number of CU
      
      * Move thread per block to the parameter of constructor
      
      * rename threadPerBlock with blockSize
      
      * Support double
      
      * rename kernel function name
      
      * remove redundant include header
      
      * Refine type
      
      * Need to the final dimension
      
      * Refine variable name
      
      * Refine type
      
      * Use index_t instead of int in API
      Co-authored-by: default avatarrocking <chunylai@amd.com>
      aafc3ac2
  3. 13 May, 2022 1 commit
  4. 12 May, 2022 2 commits
    • JD's avatar
      Add host API (#220) · cec69bc3
      JD authored
      
      
      * Add host API
      
      * manually rebase on develop
      
      * clean
      
      * manually rebase on develop
      
      * exclude tests from all target
      
      * address review comments
      
      * update client app name
      
      * fix missing lib name
      
      * clang-format update
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix test issue
      
      * refactor
      
      * refactor
      
      * refactor
      
      * upate cmake and readme
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      cec69bc3
    • ltqin's avatar
      enable convnd bwd data test (#234) · 0f912e20
      ltqin authored
      0f912e20
  5. 11 May, 2022 1 commit
    • Anthony Chang's avatar
      Manual control of MAC cluster for improved interwave performance (#184) · 76764d8c
      Anthony Chang authored
      * manual control of MAC cluster for improved 2-wave performance
      
      ensure setprio's order; ensure inner loop size >= local read size
      
      synchronize when single mac cluster
      
      * format
      
      * use value field from ck::integral_constant
      
      * roll out inter-wave loop scheduler to c-shuffle gemm variants
      
      will gradually roll out to other applicable device ops when occasional reg spill is resolved
      
      * additional comments
      
      * format
      
      * fix mismatch between inter-wave pipeline and interwave blockwise gemm
      
      * address review feedback
      
      * amend
      76764d8c
  6. 10 May, 2022 1 commit
  7. 09 May, 2022 3 commits
    • myamlak's avatar
      Resolution of issue #153: Add compiler warning on comparing int and size_t (#212) · f03a1738
      myamlak authored
      
      
      * Turning compare warnings on
      
      * Cleaning part I
      
      * Cleaning part II
      
      * Explicit static_cast to ck::type_convert
      
      * Resolving large tensor size issue.
      
      * format
      
      * revert change to tensor descriptor; promote lementSpaceSize to 64bit
      
      * use integer value for GEMM test
      
      * Review remarks
      
      * Review remarks + issues with (un)signed arithmetic
      
      * Format fix
      
      * Format
      
      * Clang-format.
      
      * fix 2gb limit issue
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      f03a1738
    • Wen-Heng (Jack) Chung's avatar
      Update README.md (#228) · 968bd932
      Wen-Heng (Jack) Chung authored
      968bd932
    • Chao Liu's avatar
      Code refactor (#175) · ec7c2e91
      Chao Liu authored
      * format
      
      * improving pipeline
      
      * fix typo
      
      * format
      
      * adding thread group
      
      * adding thread group
      
      * adding thread group
      
      * adding gemm pipeline
      
      * tweak
      
      * refactor
      
      * refactor
      
      * add missing type convert
      
      * refactor
      
      * refactor
      
      * refactor
      
      * clean
      
      * fix build
      
      * refactor
      
      * format
      
      * clean up
      
      * use remove_cvref_t
      
      * clean
      
      * clean up
      
      * clean up
      
      * clean up
      ec7c2e91
  8. 08 May, 2022 1 commit
    • Illia Silin's avatar
      Add Benchmark test into CI (#226) · a3c910ac
      Illia Silin authored
      
      
      * add performance test to jenkins pipeline
      
      * fix typo
      
      * fix the syntax in conv_fwd_util.cpp
      
      * fix the error message syntax spacing
      
      * fix the error message syntax spacing again
      
      * run profile_gemm and archive results
      
      * fix typo
      
      * try to figure out the paths
      
      * try to figure out the paths one more time
      
      * skip the copying step
      
      * build ckProfiler release only once
      
      * change directory using dir
      
      * fix dir syntax
      
      * change the gemm parameters
      
      * do not pipe script output to file
      
      * try running ckProfiler directly
      
      * fix typo
      
      * use set +e
      
      * run profile_gemm.sh || true
      
      * run multiple gemms and parse results
      
      * fix typo in jenkinsfile
      
      * fix syntax
      
      * add new gemm sizes, update scripts
      
      * put all jenkins steps in original order
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarChao Liu <lc.roy86@gmail.com>
      a3c910ac
  9. 30 Apr, 2022 2 commits
  10. 29 Apr, 2022 3 commits
  11. 25 Apr, 2022 1 commit
  12. 22 Apr, 2022 3 commits
  13. 21 Apr, 2022 4 commits
    • Adam Osewski's avatar
      Convolution FWD profiler refactor. (#183) · 1a0cd5d1
      Adam Osewski authored
      
      
      * Convolution ND
      
      * Code unification across dimensions for generating tensor descriptors.
      * Example
      * Instances
      
      * Move convnd f32 instance file to comply with repo structure.
      
      * Conv 1D tensor layouts.
      
      * Formatting and use ReferenceConv
      
      * Reference ConvFwd supporting 1D and 2D convolution.
      
      * Debug printing TensorLayout name.
      
      * Conv fwd 1D instance f32
      
      * Refactor conv ND example.
      
      Needed to support various conv dimensio.
      
      Needed to support various conv dimensions
      
      * Rename conv nd example director to prevent conflicts.
      
      * Refactor some common utility to single file.
      
      Plus some tests.
      
      * Refactor GetHostTensorDescriptor + UT.
      
      * Add 1D test case.
      
      * Test reference convolution 1d/2d
      
      * Remove some leftovers.
      
      * Fix convolution example error for 1D
      
      * Refactor test check errors utility function.
      
      * Test Conv2D Fwd XDL
      
      * More UT for 1D case.
      
      * Parameterize input & weight initializers.
      
      * Rename example to prevent conflicts.
      
      * Split convnd instance into separate files for 1d/2d
      
      * Address review comments.
      
      * Fix data type for flops/gbytes calculations.
      
      * Assign example number 11.
      
      * 3D cases for convolution utility functions.
      
      * 3D reference convolution.
      
      * Add support for 3D convolution.
      
      * Check for inputs bigger than  2GB.
      
      * Formatting
      
      * Support for bf16/f16/f32/i8 - conv instances + UT.
      
      * Use check_err from test_util.hpp.
      
      * Split convnd test into separate files for each dim.
      
      * Fix data generation and use proper instances.
      
      * Formatting
      
      * Skip tensor initialization if not necessary.
      
      * Fix CMakefiles.
      
      * Remove redundant conv2d_fwd test.
      
      * Lower problem size for conv3D UT.
      
      * 3D case for convnd example.
      
      * Remove leftovers after merge.
      
      * Add Conv Specialization string to GetTypeString
      
      * Skip instance causing numerical errors.
      
      * Small fixes.
      
      * Remove redundant includes.
      
      * Fix namespace name error.
      
      * Script for automatic testing and logging convolution fwd UTs
      
      * Comment out numactl cmd.
      
      * Refine weights initalization and relax rtol for fp16
      
      * Move test_util.hpp to check_err.hpp
      
      * Refine weights initalization and relax rtol for fp16
      
      * Refactor common part of test conv utils.
      
      * Move utility function to single common place.
      
      * Add additional common functions to utility.
      
      * Refactor convnd_fwd_xdl examples.
      
      * Remove redundant files.
      * Unify structure.
      
      * Add constructor to ConvParams.
      
      * And add input parameters validation.
      
      * Modify conv examples to use single utility file.
      
      * Remove check_error from host_tensor.hpp
      
      * Get rid of check_indices function.
      
      * Remove bf16_to_f32 function overload for scalars.
      
      * Fix namespace.
      
      * Add half_float::half for check_err.
      
      * Fix conv params size in UT.
      
      * Fix weights initialization for int8.
      
      * Fix weights initialization for int8.
      
      * Add type_convert when store output in ref conv 1D.
      
      * Get back old conv2d_fwd_xdl operation.
      
      * Silence conv debug print.
      
      * format
      
      * clean
      
      * clean
      
      * Fix merge.
      
      * Fix namespace for check_err
      
      * Formatting.
      
      * Fix merge artifacts.
      
      * Remove deleted header.
      
      * Fix some includes and use ck::utils::check_err.
      
      * Remove unused check_indices restored by previous merge.
      
      * Fix namespaces after merge.
      
      * Fix compilation error.
      
      * Small fixes.
      
      * Use common functions.
      * Fix filename
      * Fix namespaces.
      
      * Fix merge artifact - retrieve removed by accident fun.
      
      * Fix ConvForwardSpecialization.
      
      * Working example of OpInstanceRunEngine for conv2dfwd UT.
      
      * Adhere to coding style rules.
      
      * Formatting and adhere to coding style rules.
      
      * Fix merge artifacts.
      
      * Utility for collecting conv fwd instances.
      
      + Plus commmon part for parsing cmdline params.
      
      * Refactor FillUniform because of segfault for int8_t.
      
      * Naming convention.
      
      * Elegant version of device mem allocation.
      
      * Use OpInstanceRunEngine in conv fwd nd tests.
      
      * Multiple refinements.
      
      * conditional init
      * don't run reference op if not provided.
      
      * Use OpInstanceRunEngine for ckProfiler conv_fwd
      
      * Refactor common tensor fill function to separate file.
      
      * Clean up unused functions.
      
      * Support different init methods.
      
      * Create CMake target for conv_fwd_util.
      
      * Add header for profile_convnd_fwd.cpp
      
      * Fix CMakefiles to link with conv_fwd_util where needed.
      
      * Fix some clutter.
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      1a0cd5d1
    • JD's avatar
      Fix `clang-format` (#189) · 7353ec0c
      JD authored
      * Fix clang-format filepath
      
      * update docker and fix format
      7353ec0c
    • zjing14's avatar
      removed unused lds loads (#196) · 860e291c
      zjing14 authored
      860e291c
    • Qianfeng's avatar
      Use ck::half_t for Host Reduction (#195) · c1ef7319
      Qianfeng authored
      * Add math functions for host
      
      * Change to host reduction to use ck::math:
      
      * Remove the using of half_float::half and half.hpp from reduction example/profiler/ctest
      c1ef7319
  14. 15 Apr, 2022 1 commit
    • Illia Silin's avatar
      Compile CK for all targets (#188) · 4221505d
      Illia Silin authored
      
      
      * compile ck for all targets
      
      * update the target criteria
      
      * change the target condition
      
      * fixed some typos
      
      * fixed missed file
      
      * revert changes in README
      
      * revert device_conv3d_fwd_xdl_...
      
      * update device_conv3d_fwd_xdl_...
      
      * update device_batched_gemm_reduce...
      
      * test the unused arguments fix
      
      * test the warning suppression
      
      * try suppress warnings in device_batched_gemm_reduce_xdl...
      
      * fix the last warnings
      
      * replace UNUSED with std::ignore
      
      * fix a typo
      
      * replaced std::ignore with ignore
      
      * add igonre header to common_header
      
      * refactor atomicAdd
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      4221505d
  15. 07 Apr, 2022 1 commit
  16. 05 Apr, 2022 4 commits
    • Adam Osewski's avatar
      Common forward convolution utility refactor. (#141) · abf4bdb9
      Adam Osewski authored
      
      
      * Convolution ND
      
      * Code unification across dimensions for generating tensor descriptors.
      * Example
      * Instances
      
      * Move convnd f32 instance file to comply with repo structure.
      
      * Conv 1D tensor layouts.
      
      * Formatting and use ReferenceConv
      
      * Reference ConvFwd supporting 1D and 2D convolution.
      
      * Debug printing TensorLayout name.
      
      * Conv fwd 1D instance f32
      
      * Refactor conv ND example.
      
      Needed to support various conv dimensio.
      
      Needed to support various conv dimensions
      
      * Rename conv nd example director to prevent conflicts.
      
      * Refactor some common utility to single file.
      
      Plus some tests.
      
      * Refactor GetHostTensorDescriptor + UT.
      
      * Add 1D test case.
      
      * Test reference convolution 1d/2d
      
      * Remove some leftovers.
      
      * Fix convolution example error for 1D
      
      * Refactor test check errors utility function.
      
      * Test Conv2D Fwd XDL
      
      * More UT for 1D case.
      
      * Parameterize input & weight initializers.
      
      * Rename example to prevent conflicts.
      
      * Split convnd instance into separate files for 1d/2d
      
      * Address review comments.
      
      * Fix data type for flops/gbytes calculations.
      
      * Assign example number 11.
      
      * 3D cases for convolution utility functions.
      
      * 3D reference convolution.
      
      * Add support for 3D convolution.
      
      * Check for inputs bigger than  2GB.
      
      * Formatting
      
      * Support for bf16/f16/f32/i8 - conv instances + UT.
      
      * Use check_err from test_util.hpp.
      
      * Split convnd test into separate files for each dim.
      
      * Fix data generation and use proper instances.
      
      * Formatting
      
      * Skip tensor initialization if not necessary.
      
      * Fix CMakefiles.
      
      * Remove redundant conv2d_fwd test.
      
      * Lower problem size for conv3D UT.
      
      * 3D case for convnd example.
      
      * Remove leftovers after merge.
      
      * Add Conv Specialization string to GetTypeString
      
      * Skip instance causing numerical errors.
      
      * Small fixes.
      
      * Remove redundant includes.
      
      * Fix namespace name error.
      
      * Script for automatic testing and logging convolution fwd UTs
      
      * Comment out numactl cmd.
      
      * Refine weights initalization and relax rtol for fp16
      
      * Move test_util.hpp to check_err.hpp
      
      * Refine weights initalization and relax rtol for fp16
      
      * Refactor common part of test conv utils.
      
      * Move utility function to single common place.
      
      * Add additional common functions to utility.
      
      * Refactor convnd_fwd_xdl examples.
      
      * Remove redundant files.
      * Unify structure.
      
      * Add constructor to ConvParams.
      
      * And add input parameters validation.
      
      * Modify conv examples to use single utility file.
      
      * Remove check_error from host_tensor.hpp
      
      * Get rid of check_indices function.
      
      * Remove bf16_to_f32 function overload for scalars.
      
      * Fix namespace.
      
      * Add half_float::half for check_err.
      
      * Fix conv params size in UT.
      
      * Fix weights initialization for int8.
      
      * Fix weights initialization for int8.
      
      * Add type_convert when store output in ref conv 1D.
      
      * Get back old conv2d_fwd_xdl operation.
      
      * Silence conv debug print.
      
      * format
      
      * clean
      
      * clean
      
      * Fix merge.
      
      * Fix namespace for check_err
      
      * Formatting.
      
      * Fix merge artifacts.
      
      * Remove deleted header.
      
      * Fix some includes and use ck::utils::check_err.
      
      * Remove unused check_indices restored by previous merge.
      
      * Fix namespaces after merge.
      
      * Fix compilation error.
      
      * Small fixes.
      
      * Use common functions.
      * Fix filename
      * Fix namespaces.
      
      * Fix merge artifact - retrieve removed by accident fun.
      
      * Fix ConvForwardSpecialization.
      
      * Adhere to coding style rules.
      
      * Fix merge artifacts.
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      abf4bdb9
    • ltqin's avatar
      Patch for bwd data comments (#174) · 6717168c
      ltqin authored
      * change function name and way to set input zero
      
      * change enable if
      6717168c
    • ltqin's avatar
      NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166) · 781cacd2
      ltqin authored
      * change backward weight name
      
      * start add bwd weight lib and profiler
      
      * change tuning paramter
      
      * change output info
      
      * add bwd weight test
      
      * change test info
      
      * using conv_util
      
      * change wgt to weight
      
      * add }
      
      * add fp32
      781cacd2
    • Qianfeng's avatar
      Improve Reduction kernel api (#152) · 82c8b9f8
      Qianfeng authored
      * Add ThreadwiseReduction functor as per-thread reduction api
      
      * Using ThreadwiseReduce api and some change in using PartitionedBlockwiseReduction api to simply the kernels
      
      * Add comments and remove useless declarations in the kernels
      
      * Tiny updates
      82c8b9f8
  17. 01 Apr, 2022 1 commit
  18. 31 Mar, 2022 7 commits
  19. 30 Mar, 2022 1 commit
    • Jianfeng Yan's avatar
      Batched gemm and reduction (#156) · 34c661e7
      Jianfeng Yan authored
      * adding batched_gemm_and_reduction
      
      * batched_gemm_reduce works with bactch_count=1
      
      * fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1
      
      * adding profiler for batched_gemm_fp16
      
      * fixed a bug in declaration of d1 and d0; both example and profiler work
      
      * clang-format
      
      * cleanup
      
      * batched_gemm_reduce: add test
      
      * minor change
      
      * fixed some typo in function names
      34c661e7
  20. 29 Mar, 2022 1 commit