1. 29 Nov, 2023 1 commit
    • arai713's avatar
      Disable transpose device op for MI300 (#1050) · a2969aa8
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unneccesary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      * disabled device op for MI300
      
      * skip  elementwise_permute_2d on gfx94x
      
      * Update CMakeLists.txt
      
      * fixing CMake - disabling some GPU targets
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      a2969aa8
  2. 28 Nov, 2023 1 commit
    • Illia Silin's avatar
      Split the static library into several files. (#1044) · 7965d66a
      Illia Silin authored
      * spolit the static library into several
      
      * update lib paths and fix client example
      
      * do not use device_mha_operarions for client examples
      
      * use appropriate libs to link to client examples
      
      * remove the gpu/transpose path from the list
      
      * try fixing clinet examples 3,4,9
      
      * add necessary libs for client examples
      
      * fix the layernorm client example
      
      * fix the client examples 23 and 24
      
      * fix typo
      
      * add interface library and refresh clang format
      7965d66a
  3. 27 Nov, 2023 1 commit
  4. 25 Nov, 2023 1 commit
    • Bartlomiej Wroblewski's avatar
      Add basic support for direct loads from global to LDS (#999) · 627054b9
      Bartlomiej Wroblewski authored
      * Add basic support for direct loads from global to LDS
      
      * Clean the code and comments
      
      * Add support for fp16
      
      * Add comments
      
      * Add check for thread cluster lengths
      
      * Align non-direct-load fp16 example
      
      * Small fixes
      
      * Extend IsSupported to check for supported GPU gens
      
      * Build examples only on the supported HW
      
      * Do not throw when instance not supported in 04 example
      
      * Review: Apply review suggestions
      
      * Review: small fix
      
      * Review: small fix
      627054b9
  5. 21 Nov, 2023 1 commit
  6. 17 Nov, 2023 1 commit
  7. 15 Nov, 2023 1 commit
    • carlushuang's avatar
      Fmha pr 2 (#26) · 3753c4bc
      carlushuang authored
      * support hdim=64/128 in same example code
      
      * support v transpose
      
      * revert gemm.cpp, not intent to modify it
      
      * remove useless code
      
      * fix a bug for swizzle C encoding, no perf change
      
      * optimize LDS encoding
      
      * update LDS layout
      
      * clean up code
      3753c4bc
  8. 14 Nov, 2023 1 commit
  9. 13 Nov, 2023 1 commit
    • arai713's avatar
      Hip tensor permute (#1002) · 454cf7bd
      arai713 authored
      * adding files for F32 example
      
      * adding functioning implementation with scalar multiplication and unary operator support
      
      * added fp 16 type check in unary square
      
      * updating scalar multiplication as an operator
      
      * functioning version with scalar operator
      
      * changing strides for col major
      
      * updated column major implementation
      
      * working column major implementation
      
      * cleaned up comments, rearranged/renamed files
      454cf7bd
  10. 10 Nov, 2023 2 commits
    • Bartłomiej Kocot's avatar
      Support multi AB for grouped conv fwd xdl (#1027) · 49e52bb3
      Bartłomiej Kocot authored
      * Support multi AB for grouped conv fwd xdl
      
      * Add instances
      
      * Add client example
      
      * Add example
      
      * Add interface test
      
      * Minor fixes
      
      Minor fixes
      
      Minor fixes
      
      * Comment fixes
      
      * Fixes
      
      * Reference fix
      
      * Test xdl fixes
      
      * Improve multi_ab interface test
      49e52bb3
    • rocking's avatar
      Backward of gamma and beta for layernorm and groupnorm (#1013) · 1db75603
      rocking authored
      * Add layernorm backward reference code
      
      * Add groupnorm backward reference code
      
      * Add example
      
      * clang format
      
      * Fixc bug of reference layernorm and groupnorm
      
      * Fix naming
      
      * Refine naming
      
      * Add device op for normalization bwd gamma and beta
      
      * Refine template parameter
      
      * Add bwd gamma & beta of kernel
      
      * 1. Add groupnorm example
      2. Refine layernorm naming
      
      * Narrow down the static check for performance
      
      * Refine variable name
      1db75603
  11. 09 Nov, 2023 2 commits
    • arai713's avatar
      Transpose 3d (#984) · 3af8c81a
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unneccesary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      3af8c81a
    • rocking's avatar
      Layernorm4d (#1022) · a3d9a2cd
      rocking authored
      
      
      * Rename folder
      
      * Add layernorm 4d fwd example
      
      * Rename original layernorm example
      
      * Add layernorm 4d f16  test
      
      * Add layernorm4d_fwd client example
      
      * Support layernorm4D in ckProfiler
      
      * Rename groupnorm to groupnorm fwd in example
      
      * Rename layernorm and group fwd in test
      
      * Rename normalization to normalization_fwd (instances)
      
      * Add fwd to DeviceNormalization
      
      * Rename external api header
      
      * Rename folder, because we can also add bwd in this folder
      
      * Add fwd in layernorm and groupnorm (profiler
      
      * Fix compile error
      
      ---------
      Co-authored-by: default avatarPo Yen Chen <PoYen.Chen@amd.com>
      a3d9a2cd
  12. 03 Nov, 2023 3 commits
  13. 02 Nov, 2023 1 commit
    • Bartlomiej Wroblewski's avatar
      Add support for mixed precision in contraction scale and bilinear (#973) · 4ef704d8
      Bartlomiej Wroblewski authored
      
      
      * Add support for mixed precision in contraction scale and bilinear (#936)
      
      * Extract common functionality to separate files
      
      * Reference contraction: Remove incorrect consts from type_converts
      
      * Reference contraction: Add missing type_convert for dst value
      
      * Reference contraction: Fix incorrect order of B matrix dimensions
      
      * Add support for mixed precision in contraction scale and bilinear
      
      * Move using statements from instances to a common file
      
      * Move using statements from examples to a common file
      
      * Fix the order of B matrix dimensions across examples and profiler
      
      * Fix the computation of error threshold
      
      * Make ComputeDataType an optional argument
      
      * Include possible DataType -> ComputeDataType casting error in the threshold
      
      * Remove commented code
      
      * Make the ComputeDataType an optional argument in instance
      
      ---------
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      4ef704d8
  14. 01 Nov, 2023 1 commit
  15. 31 Oct, 2023 1 commit
  16. 30 Oct, 2023 1 commit
  17. 28 Oct, 2023 1 commit
  18. 27 Oct, 2023 1 commit
    • carlushuang's avatar
      support batch & nhead, and scale (#20) · 95889861
      carlushuang authored
      * support batch & nhead
      
      * support scale
      
      * tile scheduler
      
      * rename tile-scheduler to tile-partitioner
      
      * add some exp2 math
      
      * fix a bug when chaning tile size
      95889861
  19. 21 Oct, 2023 1 commit
    • Bartłomiej Kocot's avatar
      Fix cmake dtype check (#989) · ac0e0067
      Bartłomiej Kocot authored
      * Fix instances dtype check
      
      * Fix source dtypes seletor for examples and tests
      
      * Sync with new cmakefile changes
      
      * Remove not needed ifdefs
      
      * Remove not needed ifdefs
      ac0e0067
  20. 20 Oct, 2023 1 commit
  21. 19 Oct, 2023 5 commits
  22. 18 Oct, 2023 4 commits
    • rocking's avatar
      Layernorm and groupnorm support to save mean and inverse std in forward (#929) · 3696fe1c
      rocking authored
      * save mean and inverse std in normalization
      
      * Save mean and inverse std in splitK
      
      * Vector save mean and inv std
      
      * Modify instance for save mean and std
      
      * simplify the layernorm example
      
      * Save mean and std in groupnorm example
      
      * Save mean and inv std in ckProfiler and test
      
      * Remove compute data type from base class
      
      * Save mean and inv std in client example
      
      * Add changelog
      
      * clang format
      
      * Fix compile error
      
      * Refine naming
      
      * Avoid error in bf16
      
      * revert changelog
      3696fe1c
    • zjing14's avatar
      Clean DTYPES conditions in CMake (#974) · bf435140
      zjing14 authored
      
      
      * Add a condition to build fp8 instances
      
      * simplified buffer_load/store
      
      * add bfp8/fp8
      
      * fixed
      
      * remove all f8/bf8 condition include folder
      
      * fixed cmake conditions
      
      * fixed DTYPES=fp16/bfp16
      
      * fix
      
      * fixed buffer_load
      
      * fixed buffer_store
      
      * fix
      
      * clean example cmake files
      
      * fixed ci
      
      * fixed cit
      
      ---------
      Co-authored-by: default avatarRostyslav Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      bf435140
    • Po Yen Chen's avatar
      Pre-compute coordinates to speed up store_tile() for TileWindowWithStaticDistribution<> (#12) · 63bc96e3
      Po Yen Chen authored
      
      
      * Extract store_tile() logics as method
      
      * Extract load_tile() logics as method
      
      * Rename type alias
      
      * Extract common logics as traits
      
      * Remove unnecessary access specifier
      
      * Add ComputeMode for TileWindowWithStaticDistribution
      
      * Put field check into Traits
      
      * More definition of Traits types
      
      * Use more clear static_assert() message
      
      * Enable pre-compute coordinates in store_tile()
      
      * Re-formate static assert
      
      * Undo changes to the wrong method
      
      * Enable pre-compute coords for store_tile()
      
      * Remove static_vector usage
      
      * Add method to move non-member coordinates
      
      * Force using pre-computed coordinates in Store()
      
      * Fix wrong access for SFC_Ys
      
      * Change comment
      
      * Allow users to hint # access per coord
      
      * Add comment for noting remove data members later
      
      * Unify FIXME comments
      
      * Replace FIXME comments by TODO
      
      * Let user specify HintNumCoords
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * refactor load/store for window
      
      * clean
      
      * clean
      
      * bug fix for window; clean
      
      ---------
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      63bc96e3
    • zjing14's avatar
      Add contraction_multi_abd (#972) · 1cc36ba5
      zjing14 authored
      
      
      * add gridwise_multi_abd
      
      * move element_op into RunRead
      
      * merge element_wise op with data read
      
      * add multiABD example
      
      * allow packed elementwise_op
      
      * changed example
      
      * clean
      
      * clean
      
      * add is_detected
      
      * fix
      
      * minor fix
      
      * add scaleAdd_vec4 example
      
      * init commit for contraction_multi_ABD
      
      * add examples
      
      * add examples of multiA and broadcast
      
      * update example
      
      * fixed comments
      
      * Update cmake-ck-dev.sh
      
      * Update cmake-ck-dev.sh
      
      * Add comments into the example
      
      * Update CMakeLists.txt
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      1cc36ba5
  23. 17 Oct, 2023 1 commit
  24. 13 Oct, 2023 1 commit
  25. 12 Oct, 2023 2 commits
    • Chao Liu's avatar
      Refactor 1010 (#14) · 7337ec25
      Chao Liu authored
      * refactor
      
      * refactor
      
      * change load_tile, update block gemm
      
      * debug
      
      * clean
      
      * clean
      
      * experiment lod
      
      * workaround spilling issue
      
      * clean
      7337ec25
    • carlushuang's avatar
      slice kv, and use 3d padding LDS layout (#15) · 7b1a0b7f
      carlushuang authored
      * slice kv, and use 3d padding LDS layout
      
      * add missing sync
      
      * put sync to another poace
      
      * move sync place
      
      * revert to normal
      7b1a0b7f
  26. 10 Oct, 2023 1 commit
  27. 06 Oct, 2023 1 commit
    • carlushuang's avatar
      add tensor slicing API (#7) · 6491acda
      carlushuang authored
      
      
      * add tensor slicing API
      
      * remove redundant ck namespace
      
      * better gemm_gemm interface
      
      * modify gemm_gemm
      
      * add slice_tile api
      
      * fix merge bug
      
      * update to 3d padding, since we no longer need that much LDS size
      
      * clean
      
      * cleang
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      ---------
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      6491acda
  28. 05 Oct, 2023 1 commit