1. 31 Aug, 2023 2 commits
    • zjing14's avatar
      Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0
      zjing14 authored
      
      
      * move all arguments into device
      
      * add b2c_tile_map
      
      * add examples
      
      * add SetDeviceKernelArgs
      
      * dedicated fixed_nk solution
      
      * init client api
      
      * add grouped_gemm_bias example
      
      * add a instance
      
      * add instances
      
      * formatting
      
      * fixed cmake
      
      * Update EnableCompilerWarnings.cmake
      
      * Update cmake-ck-dev.sh
      
      * clean; fixed comments
      
      * fixed comment
      
      * add instances for fp32 output
      
      * add instances for fp32 output
      
      * add fp32 out client example
      
      * fixed CI
      
      * init commit for kbatch
      
      * add splitk gridwise
      
      * format
      
      * fixed
      
      * clean deviceop
      
      * clean code
      
      * finish splitk
      
      * fixed instances
      
      * change m_loops to tile_loops
      
      * add setkbatch
      
      * clean code
      
      * add splitK+bias
      
      * add instances
      
      * opt mk_nk instances
      
      * clean examples
      
      * fixed CI
      
      * remove zero
      
      * finished non-zero
      
      * clean
      
      * clean code
      
      * optimized global_barrier
      
      * fixed ci
      
      * fixed CI
      
      * removed AddBias
      
      * format
      
      * fixed CI
      
      * fixed CI
      
      * move 20_grouped_gemm to 21_grouped_gemm
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      f5ec04f0
    • rocking's avatar
      MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861) · 866377de
      rocking authored
      * Add maxpool instances
      
      * Rename index pool to max pool.
      
      * Add maxpool bwd bf16 instances
      
      * Add avg pool bwd instances
      
      * Rename avgpool and maxpool to avg_pool3d and max_pool
      
      * Add bf16 pool fwd instances
      
      * Add max pool bwd to ckProfiler
      
      * Add avg pool3d bwd to ckProfiler
      
      * Add avg pool bwd test
      
      * Fix bug of reference pool fwd (dilation)
      
      * Fix bug of max pool bwd  (dilation and initZero)
      
      * Support bf16 compute data type
      
      * Force compute type be f32. Because atomicAdd only support f32
      
      * Add max pool bwd test
      
      * Rename folder
      
      * Rename pool
      
      * Add max pool bwd client example
      
      * Add avg pool bwd client example
      
      * Add missing workspace
      
      * clang format
      
      * Rename macro
      
      * remove useless header
      
      * remove useless layout
      866377de
  2. 28 Aug, 2023 1 commit
  3. 23 Aug, 2023 2 commits
    • Jun Liu's avatar
      [HotFix] add config and version files to pass on build info (#856) · c8a8385f
      Jun Liu authored
      * experiment with config file
      
      * experiment with version.h config
      
      * add more info to version.h
      
      * minor updates
      
      * minor updates
      
      * fix case where DTYPE is not used
      
      * large amount of files but minor changes
      
      * remove white space
      
      * minor changes to add more MACROs
      
      * fix cmakedefine01
      
      * fix issue with CK internal conflict
      
      * fix define and define value
      
      * fix clang-format
      
      * fix formatting issue
      
      * experiment with cmake
      
      * clang format v12 to be consistent with miopen
      
      * avoid clang-format for config file
      c8a8385f
    • zjing14's avatar
      add generic instances (#858) · 8ebea3a5
      zjing14 authored
      
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      8ebea3a5
  4. 22 Aug, 2023 2 commits
  5. 14 Aug, 2023 2 commits
    • Bartlomiej Wroblewski's avatar
      d4c84256
    • rocking's avatar
      Refactor pool fwd (#815) · f60f0a5e
      rocking authored
      * Do not hardcode stride
      
      * devicePool2DFwd Inherit devicePool3DFwd
      
      * Move instance declaration out of common
      
      * Add dilation
      
      * use the pool3d rank, because pool2d inherit pooo3d
      
      * calculate Do Ho Wo for the dilation
      
      * Fix header name
      
      * Modify ckProfiler
      
      * Remove pool2d instance
      
      * Remove pool2d in profiler
      
      * Remove pool2d and add dilation
      
      * In to client example, this commit revise following:
      1. Add dilation.
      2. Use pool3d to implement pool2d
      
      * Refine naming and IsSupportedArgument()
      
      * Add dilation to maxpool bwd example
      
      * clang format
      
      * 1. Remove useless header
      2. Fix copyright
      3. Refine naming
      
      * Add layout parameter to pool fwd
      
      * clang format
      
      * Fix merge error
      
      * Fix compile error
      
      * Remove layout parameter in derived class
      
      * Refine changlog
      
      * Fix compile error
      
      * Fix compiler error
      
      * Add layout to external api and profiler
      f60f0a5e
  6. 11 Aug, 2023 1 commit
  7. 09 Aug, 2023 1 commit
  8. 07 Aug, 2023 1 commit
    • Illia Silin's avatar
      Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769
      Illia Silin authored
      * properly split conv_nd_bwd_data instances
      
      * split conv2d_fwd instance data types
      
      * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm
      
      * split the tests by data types where possible
      
      * filter examples by DTYPES
      
      * split few remaining examples by DTYPES
      
      * filter most instances by DTYPES
      
      * add new lines at end of headers, fix grouped_gemm profiler
      
      * fix syntax
      
      * split the ckprofiler instances by DTYPES
      
      * split the conv2d and quantization DL and XDL instances
      
      * fix the splitting of conv2d DL instances
      
      * split softmax and pool_fwd tests for fp16 and fp32 types
      
      * fix syntax
      
      * fix the dl_int8 quantization instances isolation
      08eb1769
  9. 02 Aug, 2023 1 commit
  10. 26 Jul, 2023 2 commits
    • carlushuang's avatar
      initial stream-k implementation with example (#699) · e7dca79d
      carlushuang authored
      
      
      * initial stream-k implementation with example
      
      * fix unexpected change in err
      
      * improve a little bit performance by reorganize pipeline.
      
      * improve perf a little bit by swizzle block idx
      
      * add profiler
      
      * update example
      
      * fix spelling
      
      * shrink karg for streamk
      
      * support dynamic buffer using memory coherence glc_slc bit from template
      
      * control memory coherence while construct dynamic buffer
      
      * update reduction for streamk(not ready yet)
      
      * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting
      
      * fix build issue
      
      * fix several bug
      
      * now result is correct, everything works (but has scratch)
      
      * remove scratch by manually reset coordinate
      
      * update device code
      
      * fix a bug in final reduce
      
      * fix something in example
      
      * update async memset
      
      * fix enum as camel case
      
      * modify coherence enum name
      
      * clean code and use atomic streamk by default
      
      * remove unused var
      
      * throw exception if have empty pointer
      
      * fix format
      
      * fix CI warning
      
      * fix type in init
      
      * modify CI error
      
      * filter out on gfx10+
      
      * restore changed example code
      
      ---------
      Co-authored-by: default avatarQianfeng Zhang <Qianfeng.Zhang@amd.com>
      e7dca79d
    • Illia Silin's avatar
      Disable DL kernels by default. (#816) · 9195435c
      Illia Silin authored
      9195435c
  11. 25 Jul, 2023 2 commits
  12. 21 Jul, 2023 1 commit
  13. 18 Jul, 2023 2 commits
    • Bartłomiej Kocot's avatar
      Grouped 3d conv backward data support (#799) · 49180fd6
      Bartłomiej Kocot authored
      * Grouped 3d conv backward data support
      
      * Fix comments
      49180fd6
    • Illia Silin's avatar
      Add mechanism to build CK for select data types, add Navi3x CI. (#790) · 189ea3b9
      Illia Silin authored
      * allow building CK for specific data types
      
      * add CI build and test stage on Naiv3x without some int8 instances
      
      * add missing gemm fp16 instances
      
      * add the changes to the missed cmake file
      
      * add empty lines at end of source files
      
      * Do not build quantization client example on navi3 in CI
      
      * disable batched_gemm_multi_d_int8 instances with DTYPES
      
      * disable device_conv2d_bwd_data_instance with DTYPES
      
      * fix ckprofiler for conv_bwd_data for int8
      
      * properly isolate the conv_bwd_data int8 instances
      
      * remove empty line
      189ea3b9
  14. 12 Jul, 2023 1 commit
  15. 06 Jul, 2023 3 commits
  16. 21 Jun, 2023 1 commit
  17. 17 Jun, 2023 1 commit
    • Qianfeng's avatar
      Padded Generic Kernel Instance (#730) · 0d911822
      Qianfeng authored
      
      
      * Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting
      
      * Move the generic kernel instance to be the first of the instance list for elementwise op of normalization
      
      * Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax
      
      * Add testing of GetGenericInstance() in client_example of Softmax
      
      * Revert "Add testing of GetGenericInstance() in client_example of Softmax"
      
      This reverts commit f629cd9a93ce38dfed4886d849f3c38d2e5379c8.
      
      * Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax"
      
      This reverts commit a9f0d000eb9fd240404112a526ef125429a351df.
      
      * Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm
      
      * Move generic kernel instance to separate tuple for elementwise op of normalization
      
      * Remove un-used files for softmax instance
      
      * Store generic kernel instance to separate tuple for softmax
      
      * Add IsSupported checking for generic instance to client example of softmax
      
      * Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization
      
      * clang-format fix
      
      * Remove int8 from softmax instances
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      0d911822
  18. 15 Jun, 2023 1 commit
  19. 14 Jun, 2023 1 commit
    • Rostyslav Geyyer's avatar
      Add generic kernel instances for ck::tensor_operation::device::DeviceGemmMultipleD (#741) · 54b68eb3
      Rostyslav Geyyer authored
      * Add generic instance gemm_add_add_fastgelu
      
      * Add a client example for generic gemm_add_add_fastgelu
      
      * Update CMakeLists
      
      * Format
      
      * Format
      
      * Add generic instance gemm_add_fastgelu
      
      * Format
      
      * Add a gemm_add_fastgelu client example
      
      * Format
      
      * Add generic instance gemm_fastgelu
      
      * Format
      
      * Fix argument order
      
      * Add gemm_fastgelu client example
      
      * Add exceptions if argument is not supported
      54b68eb3
  20. 12 Jun, 2023 2 commits
  21. 31 May, 2023 1 commit
  22. 30 May, 2023 2 commits
    • Adam Osewski's avatar
      Multiple fixes to GroupedGemm+SplitK (#707) · 70e4eb56
      Adam Osewski authored
      
      
      * Add license header.
      
      * Reduce number of logged output. Add constant initialization.
      
      * Add functional tests for grouped_gemm with different kbatch value.
      
      * Add debug log informations + remove unused code.
      
      * Don't pass kbatch to CalculateKPadded.
      
      * Turn on logging in grouped gemm and gemm splitk profiler
      
      * Debug: limit number of test cases to run;
      
      * Log more information and initialize with constant value.
      
      * Turn on DEBUG_LOG
      
      * Add more debug log informations.
      
      * Limit the number of instances to compile.
      
      * Use GridwiseGemmPipeline
      
      * Use KBatch to calculate K0
      
      * Multiple DebugLog messages.
      
      * Unit tests for multiple KBatch values.
      
      * Refactoring
      
      * Disable logging
      * extract out of if statement KBatch update.
      
      * Uncomment instances.
      
      * Disable DebugLog.
      
      * Use Kbatch when calculate KPadded.
      
      * Fix CGridDesc padding.
      
      * Use available helper functions.
      
      * Uncomment code commented for debuggin.
      
      * Remove unnecessary debug log messages.
      
      * Uncomment previously commented code for debug purposes.
      
      * Add KBatch info to profiler output summary log.
      
      * Add gtests for gemm splitk using ckProfiler API.
      
      * Add more test-cases for different data layout.
      
      * Add more test cases for gemm splitk
      
      * Remove old test.
      
      * Unit tests for MKNK ggemm interface.
      
      * Fix and add more unit-tests.
      
      * Constepxr everything!
      
      * Increase error threshold for fp16 and splitk.
      
      Since we're using fp16 atomic add for splitk there's a
      known precision loss.
      
      ---------
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      70e4eb56
    • Bartłomiej Kocot's avatar
      Add instances for fp16/int8 Gemm kernels (Navi21) (#717) · c2d7a29d
      Bartłomiej Kocot authored
      * Add instances for fp16/int8 Gemm kernels (Navi21)
      
      * Extend instances with smaller tiles
      
      * Fix SrcVectorTensor for km_kn_mn int8
      c2d7a29d
  23. 24 May, 2023 1 commit
    • rocking's avatar
      Pool3d fwd (#697) · 76ec0089
      rocking authored
      * Expand the base class of pool2d, prepare to share base class with pool3d
      
      * Add pool3d device op
      
      * Add pool3d f16 example
      
      * Refactor the base class. implement generic pooling in the future
      
      * clang format
      
      * get original index in max pooling
      
      * Add outputindex to base class
      
      * Fix dimension
      
      * Add pooling instance
      
      * Use indexType instead
      
      * Remove useless header
      
      * Extract IndexDataType to template
      
      * Extract pooling reference code
      
      * clang format
      
      * clang format
      
      * Fix typo
      
      * Add tensor stride
      
      * Add missing header
      
      * Add index stride and output stride
      
      * Refine naming
      
      * Add type to base class
      
      * Rename file
      
      * Use proper size
      
      * Fix typo
      
      * Refine naming
      
      * Modify the argument into vector.
      
      * Add max pool profiler
      
      * Refine naming
      
      * Support f32 pool
      
      * Fix typo
      
      * Add avg pool2d fwd in profiler
      
      * clang format
      
      * Rename AccDatatype to ComputeDatatype
      
      * Fix init
      
      * test pool
      
      * Extract variable
      
      * Add client example
      
      * Check the pooling dim
      
      * clang format
      
      * Connect argv and arg_parser
      
      * Add found check
      
      * Remove useless header
      
      * Refine naming
      
      * Adjust the order of device_pool_fwd
      76ec0089
  24. 24 Apr, 2023 2 commits
    • Adam Osewski's avatar
      Grouped Gemm + SplitK + simplified Kernel Args (#669) · 8bb2bb4a
      Adam Osewski authored
      
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * B2C with 3D grid for KSplit
      
      * Remove unused code.
      
      * Use default B2C (3D grid) in grid gemm v2r4r2.
      
      * Device gemm splitk use B2C map.
      
      * Device GroupedGemmXdlSplitKCShuffle
      
      * Example for GroupedGemm Xdl SplitK
      
      * Introduce Device GroupedGemmSplitK
      
      * Fix updating kbatch size.
      
      * Add instance mk-nk-mn
      
      * Enable set kbatch in profiler.
      
      * Add GGemmSplitK mk-kn-mn instances
      
      * Add more instances & split into multiple files.
      
      * minor fix
      
      * tuning
      
      * clean
      
      * disabled failed instances
      
      * use pipe v2
      
      * Ignore arg on not supported arch.
      
      * fix warning
      
      ---------
      Co-authored-by: default avatarcarlushuang <carlus.huang@amd.com>
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      8bb2bb4a
    • rocking's avatar
      Revise layout of group convolution (#675) · 3eecbfb6
      rocking authored
      * [What] Remove pure conv int8 instance
      [Why] We will never use pure int8 conv in AI, use int8 quantization instead
      
      * Change layout
      
      * Share the kernel parameter
      
      * Support more type of NHWGC for group conv
      
      * Revise client example of conv 2d, use NHWGC layout
      
      * Add instance to cmake
      
      * Revise layout of group conv quantization instance
      
      * Revise layout of external api of group conv quantization
      
      * Revise layout of group conv quantization client example
      
      * Fix clang format
      
      * Add comment to describe meaning of each parameter
      3eecbfb6
  25. 22 Apr, 2023 1 commit
  26. 17 Apr, 2023 1 commit
  27. 10 Apr, 2023 1 commit
    • rocking5566's avatar
      Groupnorm + swish external api (#668) · ed3a2e52
      rocking5566 authored
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Ractor instance generation, split into multiple cpp file
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
      ed3a2e52
  28. 07 Apr, 2023 1 commit