1. 29 Nov, 2023 1 commit
    • arai713's avatar
      Disable transpose device op for MI300 (#1050) · a2969aa8
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unneccesary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      * disabled device op for MI300
      
      * skip  elementwise_permute_2d on gfx94x
      
      * Update CMakeLists.txt
      
      * fixing CMake - disabling some GPU targets
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      a2969aa8
  2. 13 Nov, 2023 1 commit
    • arai713's avatar
      Hip tensor permute (#1002) · 454cf7bd
      arai713 authored
      * adding files for F32 example
      
      * adding functioning implementation with scalar multiplication and unary operator support
      
      * added fp 16 type check in unary square
      
      * updating scalar multiplication as an operator
      
      * functioning version with scalar operator
      
      * changing strides for col major
      
      * updated column major implementation
      
      * working column major implementation
      
      * cleaned up comments, rearranged/renamed files
      454cf7bd
  3. 09 Nov, 2023 1 commit
    • arai713's avatar
      Transpose 3d (#984) · 3af8c81a
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unneccesary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      3af8c81a
  4. 21 Sep, 2023 1 commit
    • Illia Silin's avatar
      Refactoring cmake files to build data types separately. (#932) · bba085d2
      Illia Silin authored
      * refactor cmake files for the tests
      
      * refactor cmake files for examples
      
      * fix cmake for gemm example
      
      * fix the cmake file for all examples
      
      * add splitting by data types in gemm_splitk instance header
      
      * rename test to reflect only dl instances are used
      
      * clean up CI workspace, update cmake for instances
      
      * change the jenkinsfile syntax
      
      * build all instances except DL on gfx11
      
      * move workspace cleanup after stages
      
      * clean up workspace after every stage
      
      * isolate data types in grouped_conv_fwd header
      
      * isolate dl instances for grouped_conv2d_fwd
      
      * fix syntax
      
      * fix cmake and batchnorm instances
      
      * fix typo
      
      * fix reduction instances
      
      * fix grouped_conv headers
      
      * fix syntax
      
      * replace parsing logic for instances, replace bfp16 with bf16
      
      * fix the client examples build
      
      * clean up DTYPES from instances cmake files
      
      * update the parsing logic in cmake files
      
      * make an exception for reduction kernels
      
      * update few remaining cmake files to handle DTYPES
      
      * fix syntax
      
      * fix cmake conflicts
      
      * replace f8 with fp8 test name
      
      * resolve conflicts for dpp instances
      bba085d2
  5. 07 Aug, 2023 1 commit
    • Illia Silin's avatar
      Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769
      Illia Silin authored
      * properly split conv_nd_bwd_data instances
      
      * split conv2d_fwd instance data types
      
      * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm
      
      * split the tests by data types where possible
      
      * filter examples by DTYPES
      
      * split few remaining examples by DTYPES
      
      * filter most instances by DTYPES
      
      * add new lines at end of headers, fix grouped_gemm profiler
      
      * fix syntax
      
      * split the ckprofiler instances by DTYPES
      
      * split the conv2d and quantization DL and XDL instances
      
      * fix the splitting of conv2d DL instances
      
      * split softmax and pool_fwd tests for fp16 and fp32 types
      
      * fix syntax
      
      * fix the dl_int8 quantization instances isolation
      08eb1769
  6. 25 Jan, 2023 1 commit
    • Qianfeng's avatar
      Batchnorm inference instances, external API, client examples and gtests (#531) · a1b2441f
      Qianfeng authored
      * File renaming and class renaming for device element-wise operation
      
      * Add batchnorm-infer instances, external API and client example
      
      * Add batchnorm-infer profiler module and gtests
      
      * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp
      
      * Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer
      
      * Rename class and file due to conflict from device_elementwise_2d.hpp
      
      * Fix namespace in batcnnorm_infer_nhwc client example
      a1b2441f
  7. 12 Dec, 2022 1 commit
    • arai713's avatar
      Gridwise elementwise 2d (#466) · 0e5c264c
      arai713 authored
      
      
      * added 2d gridwise elementwise
      
      * added 2d version of device elementwise
      
      * added example file with updated device elementwise call
      
      * added Cmake file
      
      * changed NumDim into 2D
      
      * fixed compiler issues
      
      * fixed indexing for loop step
      
      * fixed NumDim dimension error
      
      * changed blockID to 2D
      
      * updated Grid Desc
      
      * updated kernel call
      
      * fixed 2d thread indexing
      
      * added dimensions for example file
      
      * commented out unused code
      
      * changed vector load
      
      * removed extra code
      
      * temporarily removing vector load on 2nd dim
      
      * changed vector load back, still causing errors
      
      * altered indexing
      
      * changed isSupportedArgument for 2D
      
      * changed indexing + do/while
      
      * fixed isSupportedArgument
      
      * changed dimension for debugging
      
      * fixed
      
      * added testing printouts
      
      * testing change
      
      * added variables to distribute threads through both dimensions
      
      * testing changes
      
      * integrated variable for thread distribution into device elementwise and added as parameter for gridwise elementwise
      
      * removed most of the extraneous code, testing with different dimensions
      
      * testing
      
      * removed debugging print statements
      
      * moved 2d elementwise permute into elementwise permute directory
      
      * fixed formatting
      
      * removed debugging comments from threadwise transfer
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      Co-authored-by: default avatarPo Yen Chen <PoYen.Chen@amd.com>
      0e5c264c
  8. 14 Nov, 2022 1 commit
    • Po Yen Chen's avatar
      Rangify STL algorithms (#438) · dc663fae
      Po Yen Chen authored
      * Rangify STL algorithms
      
      This commit adapts rangified std::copy(), std::fill() & std::transform()
      
      * Re-write more std::copy() calls
      
      * Re-write std::copy() calls in profiler
      dc663fae
  9. 19 Oct, 2022 1 commit