1. 31 Aug, 2023 2 commits
    • zjing14's avatar
      Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0
      zjing14 authored
      
      
      * move all arguments into device
      
      * add b2c_tile_map
      
      * add examples
      
      * add SetDeviceKernelArgs
      
      * dedicated fixed_nk solution
      
      * init client api
      
      * add grouped_gemm_bias example
      
      * add a instance
      
      * add instances
      
      * formatting
      
      * fixed cmake
      
      * Update EnableCompilerWarnings.cmake
      
      * Update cmake-ck-dev.sh
      
      * clean; fixed comments
      
      * fixed comment
      
      * add instances for fp32 output
      
      * add instances for fp32 output
      
      * add fp32 out client example
      
      * fixed CI
      
      * init commit for kbatch
      
      * add splitk gridwise
      
      * format
      
      * fixed
      
      * clean deviceop
      
      * clean code
      
      * finish splitk
      
      * fixed instances
      
      * change m_loops to tile_loops
      
      * add setkbatch
      
      * clean code
      
      * add splitK+bias
      
      * add instances
      
      * opt mk_nk instances
      
      * clean examples
      
      * fixed CI
      
      * remove zero
      
      * finished non-zero
      
      * clean
      
      * clean code
      
      * optimized global_barrier
      
      * fixed ci
      
      * fixed CI
      
      * removed AddBias
      
      * format
      
      * fixed CI
      
      * fixed CI
      
      * move 20_grouped_gemm to 21_grouped_gemm
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      f5ec04f0
    • rocking's avatar
      MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861) · 866377de
      rocking authored
      * Add maxpool instances
      
      * Rename index pool to max pool.
      
      * Add maxpool bwd bf16 instances
      
      * Add avg pool bwd instances
      
      * Rename avgpool and maxpool to avg_pool3d and max_pool
      
      * Add bf16 pool fwd instances
      
      * Add max pool bwd to ckProfiler
      
      * Add avg pool3d bwd to ckProfiler
      
      * Add avg pool bwd test
      
      * Fix bug of reference pool fwd (dilation)
      
      * Fix bug of max pool bwd  (dilation and initZero)
      
      * Support bf16 compute data type
      
      * Force compute type be f32. Because atomicAdd only support f32
      
      * Add max pool bwd test
      
      * Rename folder
      
      * Rename pool
      
      * Add max pool bwd client example
      
      * Add avg pool bwd client example
      
      * Add missing workspace
      
      * clang format
      
      * Rename macro
      
      * remove useless header
      
      * remove useless layout
      866377de
  2. 30 Aug, 2023 2 commits
  3. 23 Aug, 2023 2 commits
  4. 22 Aug, 2023 1 commit
  5. 16 Aug, 2023 1 commit
  6. 14 Aug, 2023 1 commit
    • rocking's avatar
      Refactor pool fwd (#815) · f60f0a5e
      rocking authored
      * Do not hardcode stride
      
      * devicePool2DFwd Inherit devicePool3DFwd
      
      * Move instance declaration out of common
      
      * Add dilation
      
      * use the pool3d rank, because pool2d inherit pooo3d
      
      * calculate Do Ho Wo for the dilation
      
      * Fix header name
      
      * Modify ckProfiler
      
      * Remove pool2d instance
      
      * Remove pool2d in profiler
      
      * Remove pool2d and add dilation
      
      * In to client example, this commit revise following:
      1. Add dilation.
      2. Use pool3d to implement pool2d
      
      * Refine naming and IsSupportedArgument()
      
      * Add dilation to maxpool bwd example
      
      * clang format
      
      * 1. Remove useless header
      2. Fix copyright
      3. Refine naming
      
      * Add layout parameter to pool fwd
      
      * clang format
      
      * Fix merge error
      
      * Fix compile error
      
      * Remove layout parameter in derived class
      
      * Refine changlog
      
      * Fix compile error
      
      * Fix compiler error
      
      * Add layout to external api and profiler
      f60f0a5e
  7. 07 Aug, 2023 1 commit
  8. 03 Aug, 2023 2 commits
  9. 28 Jul, 2023 2 commits
  10. 21 Jul, 2023 3 commits
  11. 19 Jul, 2023 4 commits
  12. 18 Jul, 2023 4 commits
  13. 12 Jul, 2023 1 commit
  14. 17 Jun, 2023 1 commit
    • Qianfeng's avatar
      Padded Generic Kernel Instance (#730) · 0d911822
      Qianfeng authored
      
      
      * Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting
      
      * Move the generic kernel instance to be the first of the instance list for elementwise op of normalization
      
      * Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax
      
      * Add testing of GetGenericInstance() in client_example of Softmax
      
      * Revert "Add testing of GetGenericInstance() in client_example of Softmax"
      
      This reverts commit f629cd9a93ce38dfed4886d849f3c38d2e5379c8.
      
      * Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax"
      
      This reverts commit a9f0d000eb9fd240404112a526ef125429a351df.
      
      * Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm
      
      * Move generic kernel instance to separate tuple for elementwise op of normalization
      
      * Remove un-used files for softmax instance
      
      * Store generic kernel instance to separate tuple for softmax
      
      * Add IsSupported checking for generic instance to client example of softmax
      
      * Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization
      
      * clang-format fix
      
      * Remove int8 from softmax instances
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      0d911822
  15. 15 Jun, 2023 1 commit
  16. 14 Jun, 2023 1 commit
    • Rostyslav Geyyer's avatar
      Add generic kernel instances for ck::tensor_operation::device::DeviceGemmMultipleD (#741) · 54b68eb3
      Rostyslav Geyyer authored
      * Add generic instance gemm_add_add_fastgelu
      
      * Add a client example for generic gemm_add_add_fastgelu
      
      * Update CMakeLists
      
      * Format
      
      * Format
      
      * Add generic instance gemm_add_fastgelu
      
      * Format
      
      * Add a gemm_add_fastgelu client example
      
      * Format
      
      * Add generic instance gemm_fastgelu
      
      * Format
      
      * Fix argument order
      
      * Add gemm_fastgelu client example
      
      * Add exceptions if argument is not supported
      54b68eb3
  17. 12 Jun, 2023 1 commit
  18. 31 May, 2023 1 commit
  19. 24 May, 2023 1 commit
    • rocking's avatar
      Pool3d fwd (#697) · 76ec0089
      rocking authored
      * Expand the base class of pool2d, prepare to share base class with pool3d
      
      * Add pool3d device op
      
      * Add pool3d f16 example
      
      * Refactor the base class. implement generic pooling in the future
      
      * clang format
      
      * get original index in max pooling
      
      * Add outputindex to base class
      
      * Fix dimension
      
      * Add pooling instance
      
      * Use indexType instead
      
      * Remove useless header
      
      * Extract IndexDataType to template
      
      * Extract pooling reference code
      
      * clang format
      
      * clang format
      
      * Fix typo
      
      * Add tensor stride
      
      * Add missing header
      
      * Add index stride and output stride
      
      * Refine naming
      
      * Add type to base class
      
      * Rename file
      
      * Use proper size
      
      * Fix typo
      
      * Refine naming
      
      * Modify the argument into vector.
      
      * Add max pool profiler
      
      * Refine naming
      
      * Support f32 pool
      
      * Fix typo
      
      * Add avg pool2d fwd in profiler
      
      * clang format
      
      * Rename AccDatatype to ComputeDatatype
      
      * Fix init
      
      * test pool
      
      * Extract variable
      
      * Add client example
      
      * Check the pooling dim
      
      * clang format
      
      * Connect argv and arg_parser
      
      * Add found check
      
      * Remove useless header
      
      * Refine naming
      
      * Adjust the order of device_pool_fwd
      76ec0089
  20. 24 Apr, 2023 1 commit
    • rocking's avatar
      Revise layout of group convolution (#675) · 3eecbfb6
      rocking authored
      * [What] Remove pure conv int8 instance
      [Why] We will never use pure int8 conv in AI, use int8 quantization instead
      
      * Change layout
      
      * Share the kernel parameter
      
      * Support more type of NHWGC for group conv
      
      * Revise client example of conv 2d, use NHWGC layout
      
      * Add instance to cmake
      
      * Revise layout of group conv quantization instance
      
      * Revise layout of external api of group conv quantization
      
      * Revise layout of group conv quantization client example
      
      * Fix clang format
      
      * Add comment to describe meaning of each parameter
      3eecbfb6
  21. 17 Apr, 2023 1 commit
  22. 11 Apr, 2023 1 commit
  23. 10 Apr, 2023 1 commit
    • rocking5566's avatar
      Groupnorm + swish external api (#668) · ed3a2e52
      rocking5566 authored
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Ractor instance generation, split into multiple cpp file
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
      ed3a2e52
  24. 30 Mar, 2023 1 commit
  25. 29 Mar, 2023 1 commit
    • rocking5566's avatar
      Conv + quantization + tanh (#645) · 389e84a8
      rocking5566 authored
      
      
      * Rename file. Prepare to support another activation
      
      * Add comment for quantization
      
      * Extract out_elementop
      
      * Add tanh example
      
      * Add conv + bias + tanh quantization instance
      
      * Add missing parameter
      
      * Refine cmake
      
      * Add external api and client example
      
      * Extract variable in example
      
      * Fix the comment
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      389e84a8
  26. 15 Mar, 2023 1 commit
    • rocking5566's avatar
      gemm/Conv xdlops + dlops quantization (#625) · 16dc18e0
      rocking5566 authored
      * Add conv perlayer quantization
      
      * Add gemm_dlops quantization
      
      * Support int8 for innerproduct
      
      * Refine gemm dlops int8 kernel parameter
      
      * Support gfx908(MI100) and gfx90a(MI200)
      
      * clang-format
      
      * Rename example number
      
      * Support different layout for d tensor
      
      * Add conv dlops perchannel quantization example
      
      * Move to example 40
      
      * Extract the common code for different platform (dlops and xdlops)
      
      * Move ot subfolder. Prepare to add other op of quantization
      
      * Refine the quantization instance library
      
      * Add conv dl instances and client example
      
      * Remove unnecessary type
      
      * Add gemm quantization instance
      
      * Add external api and client example
      
      * Refine num_bytes
      
      * Separete different layout to different cpp
      
      * Add more xdl instances
      
      * Revert "Remove unnecessary type"
      
      This reverts commit 82086918
      
      .
      
      * Remove CShuffleDataType in dlops
      Let acc and CShuffleDataType be the same in xdlops
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      16dc18e0
  27. 08 Mar, 2023 1 commit
    • Adam Osewski's avatar
      GroupedGEMM + Gelu client example/instances/profiler (#614) · 9096b1c7
      Adam Osewski authored
      
      
      * Grouped gemm + Gelu instances.
      
      * Device Instance Factory for GroupedGemm+Gelu
      
      * Client example
      
      * Rangify fill helper functions.
      
      * Fix name clash.
      
      * Profiler for grouped_gemm+gelu
      
      * No need to use full namespace name.
      
      * Add check for MRaw divisible by vector load.
      
      * Ugly fix for big errors.
      
      * Add grouped_gemm+gelu to profiler CMakelists.
      
      * Store in argument additional info.
      
      * Information about Mraw, Nraw, Kraw values.
      
      * Use FastGelu instead of Gelu.
      
      * Change client ex to use FastGelu
      
      * Remove relaxed error precision.
      
      * Remove duplicate output elementwise-op
      
      ---------
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      9096b1c7