1. 20 Feb, 2024 1 commit
  2. 07 Feb, 2024 1 commit
  3. 25 Jan, 2024 1 commit
    • layernorm & groupnorm bwd gamma beta (#1133) · 28f68a5a
      rocking authored
      * Add layernorm bwd gamma beta external api
      
      * Add groupnorm external api
      
      * Add layernorm bwd gamma beta profiler
      
      * Add groupnorm bwd gamma beta ckProfiler
      
      * Add layernorm & groupnorm bwd gamma beta test
      
      * Fix groupnorm bwd gamma beta profiler bug
      
      * Layernorm bwd weight client example
      
      * Groupnorm bwd weight client example
      
      * clang format
      
      * Remove useless header
      
      * Let inv_std be positive
      
      * Rename to num_bytes and move this calculation outside the loop
  4. 24 Jan, 2024 1 commit
    • Fixing most of the cppcheck errors. (#1142) · 180e5720
      Illia Silin authored
      * fix cppcheck errors, first pass
      
      * fix format
      
      * fix returned value in examples
      
      * add macro definitions for cppcheck
      
      * fix the profile_gemm logic
      
      * update the gemm profiler logic
      
      * add more definitions to cppcheck, fix a couple more errors
      
      * replace runtime error with message in device function
      
      * fix a couple of int4 issues
      
      * no return for fill function
      
      * fix errors in data_types.hpp
      
      * fix format
      
      * fix few remaining errors
      
      * fix errors in data_types.hpp
      
      * fix last couple of errors in data_types.hpp
  5. 22 Jan, 2024 1 commit
  6. 19 Jan, 2024 1 commit
  7. 09 Jan, 2024 1 commit
  8. 04 Jan, 2024 1 commit
    • Transpose profiler fix (#1114) · aa3e2d79
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unnecessary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      * disabled device op for MI300
      
      * skip elementwise_permute_2d on gfx94x
      
      * Update CMakeLists.txt
      
      * fixing CMake - disabling some GPU targets
      
      * added transpose profiler to CMake
      
      * fixed transpose profiler errors
      
      * fixed instances for tests/profiler
      
      * cleaned up code in transpose profiler source code
      
      * added some comments, updated copyright
      
      * made function arguments const where possible
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  9. 20 Dec, 2023 1 commit
  10. 18 Dec, 2023 1 commit
    • layernorm and groupnorm backward data (#1083) · a69aa2a1
      rocking authored
      * rename folder
      
      * Add type string
      
      * Remove typo
      
      * Add deviceOp to backward x
      
      * Add comment to describe the behavior of backward normalization
      
      * Add kernel function, prepare to implement
      
      * implement generic kernel
      
      * Check vector size
      
      * Add sweep once pipeline for small reduce size
      
      * Fix bug of KRaw_ error
      
      * Fix bug of dx stride
      
      * sanity check for mean and rstd
      
      * backward x for groupnorm
      
      * Add bwd x instance
      
      * add layernorm 2d bwd gamma beta instances
      
      * Change save mean var type from f32 to f16 in f16 mode
      
      * Change the example to f16
      
      * Add groupnorm bwd gamma beta instance
      
      * Add groupnorm bwd x instance
      
      * Fix naming
      
      * Add layernorm bwd x ckprofiler
      
      * Add groupnorm bwd x profiler
      
      * clang format
      
      * Rename bwd x to bwd data
      
      * Fix bug of verification in profiler
      
      * Add test of layernorm and groupnorm bwd data
      
      * Add missing cmake
      
      * Add layernorm2d bwd data
      
      * rename fwd example
      
      * Add groupnorm client example
      
      * Fix typo. replace Invarient with Invariant
      
      * Add checking before running the best instance
  11. 07 Dec, 2023 1 commit
  12. 29 Nov, 2023 1 commit
    • Disable transpose device op for MI300 (#1050) · a2969aa8
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unnecessary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      * disabled device op for MI300
      
      * skip elementwise_permute_2d on gfx94x
      
      * Update CMakeLists.txt
      
      * fixing CMake - disabling some GPU targets
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  13. 28 Nov, 2023 1 commit
  14. 17 Nov, 2023 1 commit
  15. 16 Nov, 2023 1 commit
  16. 14 Nov, 2023 1 commit
  17. 11 Nov, 2023 1 commit
  18. 09 Nov, 2023 2 commits
    • Transpose 3d (#984) · 3af8c81a
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unnecessary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
    • Layernorm4d (#1022) · a3d9a2cd
      rocking authored
      
      
      * Rename folder
      
      * Add layernorm 4d fwd example
      
      * Rename original layernorm example
      
      * Add layernorm 4d f16 test
      
      * Add layernorm4d_fwd client example
      
      * Support layernorm4D in ckProfiler
      
      * Rename groupnorm to groupnorm fwd in example
      
      * Rename layernorm and group fwd in test
      
      * Rename normalization to normalization_fwd (instances)
      
      * Add fwd to DeviceNormalization
      
      * Rename external api header
      
      * Rename folder, because we can also add bwd in this folder
      
      * Add fwd in layernorm and groupnorm (profiler)
      
      * Fix compile error
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  19. 07 Nov, 2023 1 commit
  20. 02 Nov, 2023 1 commit
    • Add support for mixed precision in contraction scale and bilinear (#973) · 4ef704d8
      Bartlomiej Wroblewski authored
      
      
      * Add support for mixed precision in contraction scale and bilinear (#936)
      
      * Extract common functionality to separate files
      
      * Reference contraction: Remove incorrect consts from type_converts
      
      * Reference contraction: Add missing type_convert for dst value
      
      * Reference contraction: Fix incorrect order of B matrix dimensions
      
      * Add support for mixed precision in contraction scale and bilinear
      
      * Move using statements from instances to a common file
      
      * Move using statements from examples to a common file
      
      * Fix the order of B matrix dimensions across examples and profiler
      
      * Fix the computation of error threshold
      
      * Make ComputeDataType an optional argument
      
      * Include possible DataType -> ComputeDataType casting error in the threshold
      
      * Remove commented code
      
      * Make the ComputeDataType an optional argument in instance
      
      ---------
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
  21. 31 Oct, 2023 1 commit
  22. 28 Oct, 2023 1 commit
  23. 19 Oct, 2023 1 commit
  24. 18 Oct, 2023 2 commits
    • Layernorm and groupnorm support to save mean and inverse std in forward (#929) · 3696fe1c
      rocking authored
      * save mean and inverse std in normalization
      
      * Save mean and inverse std in splitK
      
      * Vector save mean and inv std
      
      * Modify instance for save mean and std
      
      * simplify the layernorm example
      
      * Save mean and std in groupnorm example
      
      * Save mean and inv std in ckProfiler and test
      
      * Remove compute data type from base class
      
      * Save mean and inv std in client example
      
      * Add changelog
      
      * clang format
      
      * Fix compile error
      
      * Refine naming
      
      * Avoid error in bf16
      
      * revert changelog
    • Clean DTYPES conditions in CMake (#974) · bf435140
      zjing14 authored
      
      
      * Add a condition to build fp8 instances
      
      * simplified buffer_load/store
      
      * add bfp8/fp8
      
      * fixed
      
      * remove all f8/bf8 condition include folder
      
      * fixed cmake conditions
      
      * fixed DTYPES=fp16/bfp16
      
      * fix
      
      * fixed buffer_load
      
      * fixed buffer_store
      
      * fix
      
      * clean example cmake files
      
      * fixed ci
      
      * fixed cit
      
      ---------
      Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: Jing Zhang <jizha@amd.com>
  25. 17 Oct, 2023 1 commit
  26. 13 Oct, 2023 1 commit
  27. 05 Oct, 2023 1 commit
  28. 04 Oct, 2023 1 commit
    • Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945) · 42facfc6
      Rostyslav Geyyer authored
      
      
      * Add f8 bf8 gemm example
      
      * Add element-wise ops
      
      * Add intrinsics
      
      * Update reference calculation
      
      * Add an additional type option for xdlops gemm
      
      * Fix build process
      
      * Add bf8 to buffer addressing
      
      * Update blockwise op, split typeA and typeB
      
      * Update for compatibility
      
      * Update naming to f8->fp8
      
      * Update naming
      
      * Format
      
      * Update naming (#937)
      
      * Add a client example
      
      * Add computetypes to device and gridwise ops
      
      * Add instances, update instance factory
      
      * Format
      
      * Fix a flag
      
      * Add ckProfiler mode
      
      * Fix typos
      
      * Add an example
      
      * Add bf8 generator
      
      * add bf8 mfma; fixed type_convert for bf8
      
      * move verification ahead of timing
      
      * Update reference calculation
      
      * Fix reference
      
      * Narrow down float init range
      
      * Fix bf8 bf8 mfma
      
      * Add bf8 @ fp8 mfma
      
      * Update example
      
      * Update instances
      
      * Update profiler api
      
      * Update for compatibility
      
      * Format
      
      * Remove extra example
      
      * Clean up
      
      * workaround convert
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
  29. 29 Sep, 2023 1 commit
    • Add support for mixed precision in contraction scale and bilinear (#936) · f0748506
      Bartlomiej Wroblewski authored
      * Extract common functionality to separate files
      
      * Reference contraction: Remove incorrect consts from type_converts
      
      * Reference contraction: Add missing type_convert for dst value
      
      * Reference contraction: Fix incorrect order of B matrix dimensions
      
      * Add support for mixed precision in contraction scale and bilinear
      
      * Move using statements from instances to a common file
      
      * Move using statements from examples to a common file
      
      * Fix the order of B matrix dimensions across examples and profiler
      
      * Fix the computation of error threshold
      
      * Make ComputeDataType an optional argument
      
      * Include possible DataType -> ComputeDataType casting error in the threshold
      
      * Remove commented code
  30. 27 Sep, 2023 1 commit
    • Add column to image kernel (#930) · e2243a4d
      Bartłomiej Kocot authored
      * Add column to image kernel
      
      * Minor fixes for dtypes and client examples
      
      * Disable tests for disabled dtypes
      
      * Disable add instances functions for disabled data types
      
      * Minor stylistic fixes
      
      * Revert "Disable add instances functions for disabled data types"
      
      This reverts commit 728b8695.
      
      * Instances reduction
      
      * Add comments in device_column_to_image_impl
      
      * Update changelog and Copyrights
      
      * Improve changelog
  31. 26 Sep, 2023 1 commit
  32. 13 Sep, 2023 1 commit
    • fixed fp8 issues (#894) · a66d14ed
      zjing14 authored
      
      
      * fixed fp8 init; and reference gemm
      
      * Update host_tensor_generator.hpp
      
      * fixed convert
      
      * fixed reference gemm
      
      * fixed comments
      
      * fixed comments
      
      * fixed ci
      
      * fixed computeType
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
  33. 12 Sep, 2023 1 commit
    • Refactor f8_t, add bf8_t (#792) · 62d4af74
      Rostyslav Geyyer authored
      * Refactor f8_t to add bf8_t
      
      * Add check_err impl for f8_t
      
      * Update fp8 test
      
      * Format
      
      * Revert the fix
      
      * Update vector_type implementation
      
      * Add bf8 test
      
      * Add bf8, use BitInt types
      
      * Add bf8 conversion methods
      
      * Update type_convert for fp8/bf8
      
      * Add check_err fp8/bf8 support
      
      * Add subnorm fp8 tests
      
      * Add subnorm bf8 tests
      
      * Fix conversion
      
      * Add bf8 cmake bindings
      
      * Add macros to enable build with disabled fp8/bf8
      
      * Remove is_native method
      
      * Update flag combination for mixed precision instances
      
      * Add more flag checks
      
      * Add another flag to a client example
      
      * Add type traits, decouple f8/bf8 casting
      
      * Clean up
      
      * Decouple fp8 and bf8 flags
      
      * Remove more redundant flags
      
      * Remove leftover comments
  34. 08 Sep, 2023 1 commit
  35. 05 Sep, 2023 1 commit
    • Add image to column kernel (#867) · 0077eeb3
      Bartłomiej Kocot authored
      * Add image to column kernel
      
      * Add instances, tests, profiler, example
      
      * Add client example
      
      * Several fixes of image to column
      
      * Fix variable name in device_image_to_column_impl
      
      * Several fixes of image to column profiler
      
      * Fix num_btype calculation
      
      * Make new measurements for correct bytes calculation
  36. 31 Aug, 2023 1 commit
    • MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861) · 866377de
      rocking authored
      * Add maxpool instances
      
      * Rename index pool to max pool.
      
      * Add maxpool bwd bf16 instances
      
      * Add avg pool bwd instances
      
      * Rename avgpool and maxpool to avg_pool3d and max_pool
      
      * Add bf16 pool fwd instances
      
      * Add max pool bwd to ckProfiler
      
      * Add avg pool3d bwd to ckProfiler
      
      * Add avg pool bwd test
      
      * Fix bug of reference pool fwd (dilation)
      
      * Fix bug of max pool bwd (dilation and initZero)
      
      * Support bf16 compute data type
      
      * Force compute type to be f32, because atomicAdd only supports f32
      
      * Add max pool bwd test
      
      * Rename folder
      
      * Rename pool
      
      * Add max pool bwd client example
      
      * Add avg pool bwd client example
      
      * Add missing workspace
      
      * clang format
      
      * Rename macro
      
      * remove useless header
      
      * remove useless layout
  37. 28 Aug, 2023 1 commit
  38. 23 Aug, 2023 1 commit
    • [HotFix] add config and version files to pass on build info (#856) · c8a8385f
      Jun Liu authored
      * experiment with config file
      
      * experiment with version.h config
      
      * add more info to version.h
      
      * minor updates
      
      * minor updates
      
      * fix case where DTYPE is not used
      
      * large amount of files but minor changes
      
      * remove white space
      
      * minor changes to add more MACROs
      
      * fix cmakedefine01
      
      * fix issue with CK internal conflict
      
      * fix define and define value
      
      * fix clang-format
      
      * fix formatting issue
      
      * experiment with cmake
      
      * clang format v12 to be consistent with miopen
      
      * avoid clang-format for config file