1. 04 Dec, 2023 1 commit
  2. 03 Dec, 2023 3 commits
  3. 30 Nov, 2023 3 commits
  4. 29 Nov, 2023 1 commit
    • Disable transpose device op for MI300 (#1050) · a2969aa8
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unnecessary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      * disabled device op for MI300
      
      * skip elementwise_permute_2d on gfx94x
      
      * Update CMakeLists.txt
      
      * fixing CMake - disabling some GPU targets
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  5. 28 Nov, 2023 3 commits
  6. 27 Nov, 2023 2 commits
  7. 25 Nov, 2023 1 commit
    • Add basic support for direct loads from global to LDS (#999) · 627054b9
      Bartlomiej Wroblewski authored
      * Add basic support for direct loads from global to LDS
      
      * Clean the code and comments
      
      * Add support for fp16
      
      * Add comments
      
      * Add check for thread cluster lengths
      
      * Align non-direct-load fp16 example
      
      * Small fixes
      
      * Extend IsSupported to check for supported GPU gens
      
      * Build examples only on the supported HW
      
      * Do not throw when instance not supported in 04 example
      
      * Review: Apply review suggestions
      
      * Review: small fix
      
      * Review: small fix
  8. 17 Nov, 2023 1 commit
  9. 16 Nov, 2023 2 commits
  10. 15 Nov, 2023 2 commits
  11. 14 Nov, 2023 1 commit
  12. 13 Nov, 2023 2 commits
    • Add conv bwd weight client example (#1005) · 5356c4a9
      Rostyslav Geyyer authored
      * Add conv bwd weight client example
      
      * Update instance selector
      
      * Fake the conversion
      
      * Bring the conversion back
    • Hip tensor permute (#1002) · 454cf7bd
      arai713 authored
      * adding files for F32 example
      
      * adding functioning implementation with scalar multiplication and unary operator support
      
      * added fp16 type check in unary square
      
      * updating scalar multiplication as an operator
      
      * functioning version with scalar operator
      
      * changing strides for col major
      
      * updated column major implementation
      
      * working column major implementation
      
      * cleaned up comments, rearranged/renamed files
  13. 11 Nov, 2023 1 commit
  14. 10 Nov, 2023 2 commits
    • Support multi AB for grouped conv fwd xdl (#1027) · 49e52bb3
      Bartłomiej Kocot authored
      * Support multi AB for grouped conv fwd xdl
      
      * Add instances
      
      * Add client example
      
      * Add example
      
      * Add interface test
      
      * Minor fixes
      
      * Comment fixes
      
      * Fixes
      
      * Reference fix
      
      * Test xdl fixes
      
      * Improve multi_ab interface test
    • Backward of gamma and beta for layernorm and groupnorm (#1013) · 1db75603
      rocking authored
      * Add layernorm backward reference code
      
      * Add groupnorm backward reference code
      
      * Add example
      
      * clang format
      
      * Fix bug in reference layernorm and groupnorm
      
      * Fix naming
      
      * Refine naming
      
      * Add device op for normalization bwd gamma and beta
      
      * Refine template parameter
      
      * Add bwd gamma & beta of kernel
      
      * 1. Add groupnorm example
      2. Refine layernorm naming
      
      * Narrow down the static check for performance
      
      * Refine variable name
  15. 09 Nov, 2023 3 commits
    • add linker script to QA builds (#1030) · 68f2b5e7
      Illia Silin authored
    • Transpose 3d (#984) · 3af8c81a
      arai713 authored
      
      
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unnecessary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
    • Layernorm4d (#1022) · a3d9a2cd
      rocking authored
      
      
      * Rename folder
      
      * Add layernorm 4d fwd example
      
      * Rename original layernorm example
      
      * Add layernorm 4d f16 test
      
      * Add layernorm4d_fwd client example
      
      * Support layernorm4D in ckProfiler
      
      * Rename groupnorm to groupnorm fwd in example
      
      * Rename layernorm and group fwd in test
      
      * Rename normalization to normalization_fwd (instances)
      
      * Add fwd to DeviceNormalization
      
      * Rename external api header
      
      * Rename folder, because we can also add bwd in this folder
      
      * Add fwd in layernorm and groupnorm (profiler)
      
      * Fix compile error
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  16. 08 Nov, 2023 1 commit
  17. 07 Nov, 2023 2 commits
  18. 03 Nov, 2023 2 commits
  19. 02 Nov, 2023 2 commits
  20. 01 Nov, 2023 2 commits
  21. 31 Oct, 2023 3 commits