1. 16 Feb, 2023 2 commits
  2. 14 Feb, 2023 1 commit
  3. 11 Feb, 2023 1 commit
  4. 09 Feb, 2023 1 commit
  5. 03 Feb, 2023 1 commit
  6. 30 Jan, 2023 1 commit
  7. 18 Jan, 2023 3 commits
  8. 16 Jan, 2023 1 commit
  9. 13 Jan, 2023 1 commit
  10. 12 Jan, 2023 2 commits
    • Add a flag to enable/disable debug output in many kernels. (#549) · 715e8dd2
      Illia Silin authored
      * add DEBUG_LOG macro to enable/disable debug output
      
      * fix syntax
      
      * fix syntax again
      
      * fix syntax one more time
      
      * remove blank spaces
      
      * use ifdefs
      
      * add the Print argument
      
      * move the definition of DEBUG_LOG to ck.hpp
      
      * add the missing argument to Print()
    • Remove including of cmath (#551) · a17b0414
      Qianfeng authored
      * Let cmath be included when compiling host code in math_v2.hpp
      
      * Remove including of cmath in device_base.hpp and device_permute.hpp
  11. 11 Jan, 2023 1 commit
  12. 15 Dec, 2022 6 commits
  13. 13 Dec, 2022 1 commit
  14. 12 Dec, 2022 2 commits
    • Gridwise elementwise 2d (#466) · 0e5c264c
      arai713 authored
      * added 2d gridwise elementwise
      
      * added 2d version of device elementwise
      
      * added example file with updated device elementwise call
      
      * added Cmake file
      
      * changed NumDim into 2D
      
      * fixed compiler issues
      
      * fixed indexing for loop step
      
      * fixed NumDim dimension error
      
      * changed blockID to 2D
      
      * updated Grid Desc
      
      * updated kernel call
      
      * fixed 2d thread indexing
      
      * added dimensions for example file
      
      * commented out unused code
      
      * changed vector load
      
      * removed extra code
      
      * temporarily removing vector load on 2nd dim
      
      * changed vector load back, still causing errors
      
      * altered indexing
      
      * changed isSupportedArgument for 2D
      
      * changed indexing + do/while
      
      * fixed isSupportedArgument
      
      * changed dimension for debugging
      
      * fixed
      
      * added testing printouts
      
      * testing change
      
      * added variables to distribute threads through both dimensions
      
      * testing changes
      
      * integrated variable for thread distribution into device elementwise and added as parameter for gridwise elementwise
      
      * removed most of the extraneous code, testing with different dimensions
      
      * testing
      
      * removed debugging print statements
      
      * moved 2d elementwise permute into elementwise permute directory
      
      * fixed formatting
      
      * removed debugging comments from threadwise transfer
      Co-authored-by: Jing Zhang <jizhan@amd.com>
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
    • temp save · 9739ede0
      aska-0096 authored
  15. 09 Dec, 2022 2 commits
  16. 08 Dec, 2022 1 commit
  17. 07 Dec, 2022 1 commit
  18. 05 Dec, 2022 1 commit
  19. 02 Dec, 2022 4 commits
  20. 01 Dec, 2022 1 commit
  21. 30 Nov, 2022 3 commits
    • gemm, conv perchannel quantization (#503) · ad541ad6
      rocking5566 authored
      * Use gemm_multiple_D instead
      
      * Add gemm bias relu quantization example
      
      * Add pure gemm quantization example
      
      * Add quantization of perchannel conv + bias + relu example
      
      * Refine the code
      
      * Rename multiplier to requant_scale
      
      * Rename the folder
      
      * Remove redundant comment
      
      * Rename the file. Prepare to add perchannel
      
      * Add conv perchannel instance
      
      * Move to quantization folder
      
      * Add conv perchannel client example
      
      * Apply Rangify constructor of HostTensorDescriptor & Tensor<>
      
      * Fix merge error
    • BatchNorm backward instance/external API/profiler/tests (#519) · 63af525c
      Qianfeng authored
      * Refine the device batchnorm-backward base API templates and data type assignments
      
      * Remove duplicated kernel file
      
      * Add batchnorm backward instances and external API
      
      * Add batchnorm-backward profiler and tests
      
      * Add client example which uses batchnorm backward external API
      
      * Merge test/batchnorm_fwd and test/batchnorm_bwd into one directory
      
      * Loosen the threshold for batchnorm-backward check_err()
    • runtime bug, cannot find symbol · 9adf2e60
      aska-0096 authored
  22. 29 Nov, 2022 2 commits
    • fix GetTypeString · 0e9c88ce
      fsx950223 authored
    • BatchNorm backward implementation (#461) · 44789d99
      Qianfeng authored
      * Implemented batchnorm-backward Blockwise and Multiblock kernels
      
      * Add batchnorm-backward device op
      
      * Add batchnorm-backward host-reference op
      
      * Add batchnorm-backward example
      
      * Parameters renaming in batchnorm backward kernels and device op
      
      * Change the example to loosen the threshold for ScaleDiff checking
      
      * Add comments to explain the implementation of batchnorm-backward
      
      * Parameters renaming again in batchnorm backward kernels
      
      * Improve the expression calculation for performance
      
      * Add batchnorm backward to README
      
      * Add comments to explain inv-variance in batchnorm forward and backward
      
      * Rename the batchnorm forward training and inference examples
      
      * Add/update the comments for batchnorm-backward kernels
      
      * Renaming again
      
      * Add block_sync_lds between two consecutive blockwise reductions
      
      * Move common expression 1/N out of the static_for loops
      
      * Add dy_elementwise_op
      
      * Renaming in backward example again
      
      * Add checking for reduceDims in reference_batchnorm_backward
      
      * Update comments and code format
      
      * Rename in the comments
      
      * Move common expression out of the loop in reference_batchnorm_backward_nhwc_c
      
      * Add block_sync_lds() between blockwise reduction again
      
      * Fix comments again
      
      * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
  23. 25 Nov, 2022 1 commit
    • BatchNorm forward instance/external api/profiler/tests/client example (#511) · 4e6a5575
      Qianfeng authored
      * Update the device_batchnorm_forward base class to include all template parameters for problem description
      
      * Add batchnorm forward instances and external api
      
      * Add batchnorm forward profiler module which uses the external api
      
      * Add some comments in batchnorm_forward example to explain the dimensions in lengths[]
      
      * Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward
      
      * Improve the batchnorm infer base API
      
      * Add batchnorm forward client example which shows using the batchnorm forward external API
      
      * Add test for batchnorm forward
      
      * Tune the batchnorm profiler's initial values and error threshold
      
      * Add support for bhalf_t in instances/external api/tests
      
      * Add support for int8_t in instances/external api/tests
      
      * Add support for double in instances/external api/tests
      
      * Let ScaleDataType and BiasDataType be same as XDataType and YDataType when creating instances
      
      * Add checking before running the best instance in batchnorm_fwd_nhwc client example
      
      * Add checking for YElementwiseOp in batchnorm_forward external API
      
      * Add more types in batchnorm forward profiler
      
      * Add more test lengths
      Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>