1. 20 Nov, 2023 1 commit
  2. 17 Nov, 2023 1 commit
  3. 13 Nov, 2023 1 commit
    • arai713's avatar
      Hip tensor permute (#1002) · 454cf7bd
      arai713 authored
      * adding files for F32 example
      
      * adding functioning implementation with scalar multiplication and unary operator support
      
      * added fp 16 type check in unary square
      
      * updating scalar multiplication as an operator
      
      * functioning version with scalar operator
      
      * changing strides for col major
      
      * updated column major implementation
      
      * working column major implementation
      
      * cleaned up comments, rearranged/renamed files
      454cf7bd
  4. 09 Nov, 2023 1 commit
    • arai713's avatar
      Transpose 3d (#984) · 3af8c81a
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unneccesary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      3af8c81a
  5. 08 Nov, 2023 1 commit
  6. 24 Oct, 2023 2 commits
  7. 23 Sep, 2023 1 commit
  8. 21 Sep, 2023 1 commit
    • Illia Silin's avatar
      Refactoring cmake files to build data types separately. (#932) · bba085d2
      Illia Silin authored
      * refactor cmake files for the tests
      
      * refactor cmake files for examples
      
      * fix cmake for gemm example
      
      * fix the cmake file for all examples
      
      * add splitting by data types in gemm_splitk instance header
      
      * rename test to reflect only dl instances are used
      
      * clean up CI workspace, update cmake for instances
      
      * change the jenkinsfile syntax
      
      * build all instances except DL on gfx11
      
      * move workspace cleanup after stages
      
      * clean up workspace after every stage
      
      * isolate data types in grouped_conv_fwd header
      
      * isolate dl instances for grouped_conv2d_fwd
      
      * fix syntax
      
      * fix cmake and batchnorm instances
      
      * fix typo
      
      * fix reduction instances
      
      * fix grouped_conv headers
      
      * fix syntax
      
      * replace parsing logic for instances, replace bfp16 with bf16
      
      * fix the client examples build
      
      * clean up DTYPES from instances cmake files
      
      * update the parsing logic in cmake files
      
      * make an exception for reduction kernels
      
      * update few remaining cmake files to handle DTYPES
      
      * fix syntax
      
      * fix cmake conflicts
      
      * replace f8 with fp8 test name
      
      * resolve conflicts for dpp instances
      bba085d2
  9. 20 Sep, 2023 1 commit
  10. 06 Sep, 2023 1 commit
  11. 29 Aug, 2023 1 commit
  12. 07 Aug, 2023 1 commit
    • Illia Silin's avatar
      Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769
      Illia Silin authored
      * properly split conv_nd_bwd_data instances
      
      * split conv2d_fwd instance data types
      
      * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm
      
      * split the tests by data types where possible
      
      * filter examples by DTYPES
      
      * split few remaining examples by DTYPES
      
      * filter most instances by DTYPES
      
      * add new lines at end of headers, fix grouped_gemm profiler
      
      * fix syntax
      
      * split the ckprofiler instances by DTYPES
      
      * split the conv2d and quantization DL and XDL instances
      
      * fix the splitting of conv2d DL instances
      
      * split softmax and pool_fwd tests for fp16 and fp32 types
      
      * fix syntax
      
      * fix the dl_int8 quantization instances isolation
      08eb1769
  13. 12 Dec, 2022 1 commit
    • arai713's avatar
      Gridwise elementwise 2d (#466) · 0e5c264c
      arai713 authored
      
      
      * added 2d gridwise elementwise
      
      * added 2d version of device elementwise
      
      * added example file with updated device elementwise call
      
      * added Cmake file
      
      * changed NumDim into 2D
      
      * fixed compiler issues
      
      * fixed indexing for loop step
      
      * fixed NumDim dimension error
      
      * changed blockID to 2D
      
      * updated Grid Desc
      
      * updated kernel call
      
      * fixed 2d thread indexing
      
      * added dimensions for example file
      
      * commented out unused code
      
      * changed vector load
      
      * removed extra code
      
      * temporarily removing vector load on 2nd dim
      
      * changed vector load back, still causing errors
      
      * altered indexing
      
      * changed isSupportedArgument for 2D
      
      * changed indexing + do/while
      
      * fixed isSupportedArgument
      
      * changed dimension for debugging
      
      * fixed
      
      * added testing printouts
      
      * testing change
      
      * added variables to distribute threads through both dimensions
      
      * testing changes
      
      * integrated variable for thread distribution into device elementwise and added as parameter for gridwise elementwise
      
      * removed most of the extraneous code, testing with different dimensions
      
      * testing
      
      * removed debugging print statements
      
      * moved 2d elementwise permute into elementwise permute directory
      
      * fixed formatting
      
      * removed debugging comments from threadwise transfer
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      Co-authored-by: default avatarPo Yen Chen <PoYen.Chen@amd.com>
      0e5c264c
  14. 19 Oct, 2022 1 commit
  15. 17 Oct, 2022 1 commit
    • arai713's avatar
      adding tensor_permutation example folder (#389) · cee440fe
      arai713 authored
      * adding tensor_permutation example folder
      
      * fixed formatting
      
      * adding tensor_permutation example folder
      
      * fixed formatting
      
      * changed deviceelementwise parameters for outscalar
      
      * removed .swo file
      
      * updated folder/file name
      
      * changed function call in verification for better consistency with hostelementwist parameters
      
      * formatted again
      
      * fixed shape in verification function call
      
      * changed verification function call, added definition for nhwc
      
      * added elementwise permute example
      
      * updated CMakeLists file in folder
      
      * Delete CmakeLists.txt
      
      * Delete tensor_permute.cpp
      
      * first version of 2d gridwise_elementwise kernel
      
      * temporary fix for stride problem
      
      * formatting
      
      * format
      
      * changed directory name
      
      * Delete gridwise_elementwise_2d.hpp
      
      * Delete CMakeLists.txt
      
      * Delete extra file
      
      * delete extra file
      
      * got rid of extraneous code
      
      * added 2d device elementwise file
      
      * deleted accidently added file
      
      * update
      
      * stride values generalized with equations
      
      * updated stride for output matrix
      
      * Update CMakeLists.txt
      
      * removed extraneous commented code
      
      * removed shape_nchw vector, replaced with GetLength for each dimension
      
      * changed vector load in kernel call
      
      * removed extra space in CMake
      cee440fe