1. 06 Feb, 2023 1 commit
    • Fix CI issues. (#572) · f73574ff
      Illia Silin authored
      * switch to recent staging compiler as default for CI
      
      * fix the baseline query
      
      * roll back sqlalchemy to version 1.4.46
  2. 01 Feb, 2023 1 commit
  3. 31 Jan, 2023 1 commit
  4. 30 Jan, 2023 1 commit
  5. 26 Jan, 2023 1 commit
  6. 25 Jan, 2023 1 commit
    • Batchnorm inference instances, external API, client examples and gtests (#531) · a1b2441f
      Qianfeng authored
      * File renaming and class renaming for device element-wise operation
      
      * Add batchnorm-infer instances, external API and client example
      
      * Add batchnorm-infer profiler module and gtests
      
      * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp
      
      * Remove the use of class aliasing for DeviceElementwiseForBatchNormInfer
      
      * Rename class and file due to a conflict with device_elementwise_2d.hpp
      
      * Fix namespace in batchnorm_infer_nhwc client example
  7. 18 Jan, 2023 6 commits
    • Use double for all scaling values and floating-point constant values in the Device Op API (#557) · 52abc2f3
      Qianfeng authored
      * Use double as alpha/beta values type in reduce device op api
      
      * Use double as alpha/beta values type in softmax device op api
      
      * Use double as alpha/beta values type in multiple-reduce device op api
      
      * Use double as epsilon value type in normalization/elementwise-normalization device op api
    • Wavelet (inter-wave consumer-producer) GEMM (#310) · 1cfa8760
      Raman R jana authored
      
      
      * wavelet gemm programming model support for CK
      
      * GEMM pipeline update for wavelet programming model
      
      * Updated wavelet programming pipeline
      
      * fixes for global-write for math-wave
      
      * fixed bug in global writes
      
      * Updated comments for better readability
      
      * fixed clang format errors
      
      * added block_lds without barrier sync
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * refactor
      
      * prototype
      
      4 layouts
      
      fix default stride
      
      all problem sizes
      
      tidy
      
      move file
      
      update build script
      
      restore old file
      
      fix build
      
      * refactor standalone test to use gemm test harness
      
      * simplify gemm test
      
      * update build script
      
      * remove redundant
      
      * early return when cmd arg doesn't match
      
      * tidy
      
      * report failure when result not validated
      
      * tidy
      
      * Add comment depicting B2C mapping pattern.
      
      * Formatting & comments.
      
      * Comparison with custom B2C mapping pattern.
      
      * Example for wavelet gemm.
      
      * Add wavelet to Gemm standalone test.
      
      * Remove debug code.
      
      * Remove dangling #endif directive.
      
      Co-authored-by: root <Raman Jana>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: Anthony Chang <ac.chang@outlook.com>
      Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
    • Add multiD Gemm client APIs (#534) · d66421fe
      ltqin authored
      
      
      * start add example
      
      * fix config
      
      * fix showinfo bug
      
      * add an elementop
      
      * change to padding
      
      * add xdl example
      
      * change elementwiseop
      
      * add instance
      
      * add instance to profiler
      
      * change file name
      
      * fix device-not-supported issue
      
      * add client example
      
      * fix client gemm_add_multiply name
      
      * change AddMultiply elementwiseop
      
      * fix elementwiseop
      
      * fix client example
      
      * fix addmultiply op
      
      * fix comments and function name
      Co-authored-by: letaoqin <letaoqin@amd.com>
    • fix a bug for 6-dim kernels (#555) · 00ff30af
      Illia Silin authored
    • add multi embeddings support (#542) · 147b7db5
      who who who authored
      * add multi embeddings support
      
      * fix format
      
      * optimize sqrt
      
      * add reduce operation
      
      * change to elementwise op
      
      * fix name
      
      * rename
      
      * run ci cd
      
      * format example
      
      * format code
      
      * format code
    • Add client API/examples for 3xGemm+Bias+Add+Permute{0, 2, 3, 1} (#550) · 55236709
      ltqin authored
      * add example
      
      * fix example
      
      * add instance for gemm permute
      
      * add to client example
      
      * change configs
      
      * change instance file name
      
      * format
      
      * change client example file name and remove example
  8. 17 Jan, 2023 3 commits
    • Reduction external API and client examples (#493) · 80e05267
      Qianfeng authored
      
      
      * Change to the DeviceReduce base class template to include all problem description information
      
      * Add external api for reduction
      
      * Add client example to test the reduction external api
      
      * Spelling correction
      
      * Re-implement the host_reduction to follow the DeviceReduce base API format
      
      * Change the reduce profiler to call the external API for collecting device instances
      
      * Rename reduce client example directory from 08_reduce to 12_reduce
      
      * Remove (void) before the function call
      
      * Tiny update in reduce client example
      
      * Tiny update in profile_reduce_impl.hpp
      
      * Rename the reduce client example directory
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
    • Gemm layernorm welford (#413) · 7829d729
      rocking5566 authored
      
      
      * Add device op of gemm layernorm
      
      * [What] Rename F to H
      [Why] F and G prepare for welford tensor
      
      * Add gridwise gemm + welford
      
      * Extract template parameter
      
      * Rename kernel. Prepare to add second half kernel
      
      * Extract var
      
      * Add second kernel for gemm+layernorm
      
      * Move to the gemm_layernorm folder
      
      * Rename F and G to mean and var
      
      * Do not use snakeCurved; it makes determining the padding for welford difficult
      
      * Rewrite the device interface and rename some var
      
      * Add welford count
      
      * Update interface
      
      * Sync code, prepare to test on MI200
      
      * Clean the code
      
      * Implement layernorm
      
      * Add comment to mention hipFree
      
      * Write out e for debugging.
      This could be removed, and h used instead
      
      * 1. Allocate mean, var and count via SetWorkSpacePointer.
      2. Add GetWorkSpaceSize to calculate the space size
      
      * Add gemm layernorm host code
      
      * use reference layernorm
      
      * Fix bug of blockwise welford for first kernel
      
      * Fix bug of mean var padding for layernorm
      
      * Use sgpr for shuffleM_index
      
      * padding for GemmMeanVarCountGridDescriptor_M_NBlock
      
      * Add layout parameter
      
      * Check argument for gemm
      
      * calculate max count for tail block
      
      * Share E and H memory in device op
      
      * Hard code the vector dim
      
      * Refine the MakeDescriptor
      
      * 1. Remove E parameter, because E is inside of device op
      2. Check vector size
      
      * [What] Rename MakeMeanVarDescriptor_M_N
      [Why] Prepare to add count version of make descriptor
      
      * Use 1D global memory for count
      
      * Prevent redundant IO
      
      * Update parameter
      
      * Add pipeline v1/v2 selector
      
      * Rename the example name
      
      * Add base class for gemm layernorm
      
      * Refine naming to distinguish naive and welford
      
      * Add comment to explain in detail
      
      * We don't need to pad the N dimension in gemm for mean/var/count. Set NPerTile to 1
      
      * Rewrite the 2nd kernel; use multiple blocks along the N dimension in the layernorm kernel
      
      * Share the vector size
      
      * Refine var name
      
      * [What] Force LayernormThreadSliceSize_N = vector size.
      [Why] Memory coalesce
      
      * Add comment
      
      * Extract divisor out of the loop in reference layernorm
      
      * Pad different size for E and H in layernorm kernel according to different block tile
      
      * Refine naming
      
      * Refine naming
      
      * Prevent implicit cast
      
      * [What] use ck::math::sqrt instead of __builtin_amdgcn_sqrtf
      [Why] __builtin_amdgcn_sqrtf only supports float; double would cause casting
      
      * Cast only constant
      
      * Change of post shuffle thread descriptor
      
      * Add EMeanVarDataType parameter.
      
      * Merge the mean and var threadwise copy
      
      * Add missing index
      
      * Fix Typo
      
      * Sync the variable with previous if
      
      * 1. Declare e inside the host_gemm_layernorm()
      2. Prevent implicit cast in reference code
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
    • [Navi3x-LWPCK-545] Block-wise GEMM + Real GEMM_WMMA_FP16 (#541) · 919aeb1f
      Haocong WANG authored
      * wmma_op + unit test
      
      * add arch limitation to wmma test
      
      * change arch limitation
      
      * Refactor + add unit tests for all types (int4 compile failed)
      
      * Add f32_16x16x16_bf16 unit test
      
      * tempsave
      
      * tempsave
      
      * tempsave
      
      * runtime bug, cannot find symbol
      
      * workaround for incorrect HIP warpSize return value
      
      * debugging
      
      * tempsave
      
      * Correctness OK, waiting for optimization
      
      * Tidy up + format
      
      * temp save
      
      * temp save, reproduce the v_bfi_b32 issue
      
      * add inline asm for wmmaop test
      
      * tidy up
      
      * clean some debug purpose code
      
      * discard some codes
      
      * clang format
      
      * clang format
      
      * compiler issue fixed + increase tile size
  9. 12 Jan, 2023 2 commits
    • Add a flag to enable/disable debug output in many kernels. (#549) · 715e8dd2
      Illia Silin authored
      * add DEBUG_LOG macro to enable/disable debug output
      
      * fix syntax
      
      * fix syntax again
      
      * fix syntax one more time
      
      * remove blank spaces
      
      * use ifdefs
      
      * add the Print argument
      
      * move the definition of DEBUG_LOG to ck.hpp
      
      * add the missing argument to Print()
    • Remove including of cmath (#551) · a17b0414
      Qianfeng authored
      * Let cmath be included when compiling host code in math_v2.hpp
      
      * Remove including of cmath in device_base.hpp and device_permute.hpp
  10. 15 Dec, 2022 4 commits
  11. 14 Dec, 2022 1 commit
  12. 12 Dec, 2022 1 commit
    • Gridwise elementwise 2d (#466) · 0e5c264c
      arai713 authored
      
      
      * added 2d gridwise elementwise
      
      * added 2d version of device elementwise
      
      * added example file with updated device elementwise call
      
      * added CMake file
      
      * changed NumDim into 2D
      
      * fixed compiler issues
      
      * fixed indexing for loop step
      
      * fixed NumDim dimension error
      
      * changed blockID to 2D
      
      * updated Grid Desc
      
      * updated kernel call
      
      * fixed 2d thread indexing
      
      * added dimensions for example file
      
      * commented out unused code
      
      * changed vector load
      
      * removed extra code
      
      * temporarily removing vector load on 2nd dim
      
      * changed vector load back, still causing errors
      
      * altered indexing
      
      * changed isSupportedArgument for 2D
      
      * changed indexing + do/while
      
      * fixed isSupportedArgument
      
      * changed dimension for debugging
      
      * fixed
      
      * added testing printouts
      
      * testing change
      
      * added variables to distribute threads through both dimensions
      
      * testing changes
      
      * integrated variable for thread distribution into device elementwise and added as parameter for gridwise elementwise
      
      * removed most of the extraneous code, testing with different dimensions
      
      * testing
      
      * removed debugging print statements
      
      * moved 2d elementwise permute into elementwise permute directory
      
      * fixed formatting
      
      * removed debugging comments from threadwise transfer
      Co-authored-by: Jing Zhang <jizhan@amd.com>
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  13. 08 Dec, 2022 1 commit
  14. 07 Dec, 2022 3 commits
  15. 06 Dec, 2022 1 commit
    • Fix CI error. (#530) · d072790f
      Illia Silin authored
      * ignore .git folder when doing clang-format
      
      * fix syntax
      
      * add backslashes before quotes
      
      * add path filter for several extensions
  16. 02 Dec, 2022 3 commits
  17. 01 Dec, 2022 1 commit
    • Modularize ckProfiler operations (#514) · 8784a72e
      Po Yen Chen authored
      
      
      * Re-structure ckProfiler source files
      
      * Rename profiler.cpp to main.cpp
      
      * Modularize ckProfiler operations
      
      * Add description for profiler operations
      
      * Use longer name to avoid name collision
      
      * Use macro to delay expansion
      
      * Use std::move() to avoid object copying
      
      * Prohibit users from calling dtor
      
      * Use macro to eliminate redundant code
      
      * Make friend function hidden
      
      * Add missing include directive <iostream>
      
      * Fix wrong include directives
      
      * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
      Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
  18. 30 Nov, 2022 2 commits
    • gemm, conv perchannel quantization (#503) · ad541ad6
      rocking5566 authored
      * Use gemm_multiple_D instead
      
      * Add gemm bias relu quantization example
      
      * Add pure gemm quantization example
      
      * Add quantization of perchannel conv + bias + relu example
      
      * Refine the code
      
      * Rename multiplier to requant_scale
      
      * Rename the folder
      
      * Remove redundant comment
      
      * Rename the file. Prepare to add perchannel
      
      * Add conv perchannel instance
      
      * Move to quantization folder
      
      * Add conv perchannel client example
      
      * Apply Rangify constructor of HostTensorDescriptor & Tensor<>
      
      * Fix merge error
    • BatchNorm backward instance/external API/profiler/tests (#519) · 63af525c
      Qianfeng authored
      * Refine the device batchnorm-backward base API templates and data type assignments
      
      * Remove duplicated kernel file
      
      * Add batchnorm backward instances and external API
      
      * Add batchnorm-backward profiler and tests
      
      * Add client example which uses batchnorm backward external API
      
      * Merge test/batchnorm_fwd and test/batchnorm_bwd into one directory
      
      * Loosen the threshold for batchnorm-backward check_err()
  19. 29 Nov, 2022 3 commits
    • Fix split-k gemm test (#231) · 236bd148
      Anthony Chang authored
      
      
      * properly return error flag; reveals bug in split-k gemm
      
      * fix bug in split k
      
      * update split-k test case
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
    • fix GetTypeString · 0e9c88ce
      fsx950223 authored
    • BatchNorm backward implementation (#461) · 44789d99
      Qianfeng authored
      * Implemented batchnorm-backward Blockwise and Multiblock kernels
      
      * Add batchnorm-backward device op
      
      * Add batchnorm-backward host-reference op
      
      * Add batchnorm-backward example
      
      * Parameters renaming in batchnorm backward kernels and device op
      
      * Change in the example to loosen the threshold for ScaleDiff checking
      
      * Add comments to explain the implementation of batchnorm-backward
      
      * Parameters renaming again in batchnorm backward kernels
      
      * Improve the expression calculation for performance
      
      * Add batchnorm backward to README
      
      * Add comments to explain inv-variance in batchnorm forward and backward
      
      * Rename the batchnorm forward training and inference examples
      
      * Add/update the comments for batchnorm-backward kernels
      
      * Renaming again
      
      * Add block_sync_lds between two consecutive blockwise reductions
      
      * Move common expression 1/N out of the static_for loops
      
      * Add dy_elementwise_op
      
      * Renaming in backward example again
      
      * Add checking for reduceDims in reference_batchnorm_backward
      
      * Update to comments and codes format
      
      * Rename in the comments
      
      * Remove common expression out of the loop in reference_batchnorm_backward_nhwc_c
      
      * Add block_sync_lds() between blockwise reduction again
      
      * Fix comments again
      
      * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
  20. 28 Nov, 2022 1 commit
  21. 25 Nov, 2022 1 commit
    • BatchNorm forward instance/external api/profiler/tests/client example (#511) · 4e6a5575
      Qianfeng authored
      
      
      * Update to device_batchnorm_forward base class to include all template parameters for problem description
      
      * Add batchnorm forward instances and external api
      
      * Add batchnorm forward profiler module which uses the external api
      
      * Add some comments in batchnorm_forward example to explain the dimensions in lengths[]
      
      * Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward
      
      * Improvement to the batchnorm infer base API
      
      * Add batchnorm forward client example which shows using the batchnorm forward external API
      
      * Add test for batchnorm forward
      
      * Tuning the batchnorm profiler initialized values and error threshold
      
      * Add support for bhalf_t in instances/external api/tests
      
      * Add support for int8_t in instances/external api/tests
      
      * Add support for double in instances/external api/tests
      
      * Let ScaleDataType and BiasDataType be same as XDataType and YDataType when creating instances
      
      * Checking before running best instance in batchnorm_fwd_nhwc client example
      
      * Add checking for YElementwiseOp in batchnorm_forward external API
      
      * Add more types in batchnorm forward profiler
      
      * Add more test lengths
      Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
  22. 20 Nov, 2022 1 commit