  1. 22 Nov, 2023 2 commits
  2. 21 Nov, 2023 1 commit
  3. 17 Nov, 2023 1 commit
  4. 16 Nov, 2023 2 commits
  5. 15 Nov, 2023 2 commits
  6. 14 Nov, 2023 1 commit
  7. 13 Nov, 2023 2 commits
    • Add conv bwd weight client example (#1005) · 5356c4a9
      Rostyslav Geyyer authored
      * Add conv bwd weight client example
      
      * Update instance selector
      
      * Fake the conversion
      
      * Bring the conversion back
    • Hip tensor permute (#1002) · 454cf7bd
      arai713 authored
      * adding files for F32 example
      
      * adding functioning implementation with scalar multiplication and unary operator support
      
      * added fp16 type check in unary square
      
      * updating scalar multiplication as an operator
      
      * functioning version with scalar operator
      
      * changing strides for col major
      
      * updated column major implementation
      
      * working column major implementation
      
      * cleaned up comments, rearranged/renamed files
  8. 11 Nov, 2023 1 commit
  9. 10 Nov, 2023 3 commits
    • Support multi AB for grouped conv fwd xdl (#1027) · 49e52bb3
      Bartłomiej Kocot authored
      * Support multi AB for grouped conv fwd xdl
      
      * Add instances
      
      * Add client example
      
      * Add example
      
      * Add interface test
      
      * Minor fixes
      
      * Comment fixes
      
      * Fixes
      
      * Reference fix
      
      * Test xdl fixes
      
      * Improve multi_ab interface test
    • Backward of gamma and beta for layernorm and groupnorm (#1013) · 1db75603
      rocking authored
      * Add layernorm backward reference code
      
      * Add groupnorm backward reference code
      
      * Add example
      
      * clang format
      
      * Fix bug in reference layernorm and groupnorm
      
      * Fix naming
      
      * Refine naming
      
      * Add device op for normalization bwd gamma and beta
      
      * Refine template parameter
      
      * Add bwd gamma & beta of kernel
      
      * 1. Add groupnorm example
      2. Refine layernorm naming
      
      * Narrow down the static check for performance
      
      * Refine variable name
    • merge · 0c823497
      muozturk authored
  10. 09 Nov, 2023 3 commits
    • add linker script to QA builds (#1030) · 68f2b5e7
      Illia Silin authored
    • Transpose 3d (#984) · 3af8c81a
      arai713 authored
      * added working example for 5D input using 1D kernel
      
      * example with 5D input tensor and 2d kernel - not working: issues with arguments
      
      * added updated version of 3d device op - changed descriptors/dims
      
      * added example file to check kernel
      
      * fixed descriptor and isSupportedArgument stride problem
      
      * added and modified kernel for 3d - updated tids/loop
      
      * adding some more 5d example files
      
      * fixed some issues
      
      * changes made for testing
      
      * working version: fixed error in stride for A, still a bit inefficient
      
      * cleaned up formatting/comments
      
      * updating formatting
      
      * more formatting fixes
      
      * fixing cmake, adding back gpu targets in cmake script
      
      * adding client example
      
      * added instances for client example
      
      * fixed errors in client example
      
      * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
      
      * removed extra files
      
      * minor formatting and naming fixes
      
      * adding test files and profiler
      
      * fixing minor error
      
      * minor fix
      
      * removed unnecessary comments, renamed files
      
      * updated instance list for client example, added different layout example
      
      * removing instances
      
      * fixed error in instance generation
      
      * remove comments
      
      * update profiler and client example tensor layouts
      
      * fixed errors in test/profiler
      
      * updated vector dim access to enable vector load
      
      * updated test/profiler files
      
      * updated example with 1d kernel
      
      * updating profiler
      
      * renamed files
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
    • Layernorm4d (#1022) · a3d9a2cd
      rocking authored
      * Rename folder
      
      * Add layernorm 4d fwd example
      
      * Rename original layernorm example
      
      * Add layernorm 4d f16 test
      
      * Add layernorm4d_fwd client example
      
      * Support layernorm4D in ckProfiler
      
      * Rename groupnorm to groupnorm fwd in example
      
      * Rename layernorm and group fwd in test
      
      * Rename normalization to normalization_fwd (instances)
      
      * Add fwd to DeviceNormalization
      
      * Rename external api header
      
      * Rename folder, because we can also add bwd in this folder
      
      * Add fwd in layernorm and groupnorm (profiler)
      
      * Fix compile error
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  11. 08 Nov, 2023 1 commit
  12. 07 Nov, 2023 2 commits
  13. 06 Nov, 2023 1 commit
  14. 03 Nov, 2023 2 commits
  15. 02 Nov, 2023 2 commits
  16. 01 Nov, 2023 2 commits
  17. 31 Oct, 2023 3 commits
  18. 30 Oct, 2023 1 commit
    • Enable sccache in the default docker and CI. (#1009) · 4e44a9e8
      Illia Silin authored
      * replace ccache with sccache, pin package versions
      
      * put ccache back temporarily to avoid breaking other CI jobs
      
      * add sccache_wrapper.sh script
      
      * fix the package version syntax
      
      * fix the pymysql package issue
      
      * run sccache_wrapper before build if ccache server found
      
      * set the paths before calling the sccache_wrapper
      
      * use /tmp instead of /usr/local for cache
      
      * try using sccache --start-server instead of wrapper
      
      * try using redis server with sccache
      
      * define SCCACHE_REDIS
      
      * add redis and ping packages, and redis port
      
      * use the new sccache redis server
      
      * do not use sccache with staging compiler
      
      * fix the condition syntax
      
      * add stunnel to redis
      
      * add tunnel verification
      
      * separate caches for different architectures
      
      * fix syntax for the cache tag
      
      * use double brackets for conditions
      
      * add bash line to the script
      
      * add a switch for sccache and only use it in build stage
      
      * run check_host function when enabling sccache
      
      * fix the invocation tags for sccache
      
      * fix groovy syntax
      
      * set the invocation tag in groovy
      
      * disable sccache in clang-format stage
      
      * try another syntax for invocation tags
      
      * use local sccache server if can't connect to redis
      
      * fix script syntax
      
      * update README
      
      * refresh readme
      
      * readme updates
      
      * remove the timing and verification caveat from readme
      
      ---------
      Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
  19. 28 Oct, 2023 1 commit
  20. 27 Oct, 2023 2 commits
  21. 26 Oct, 2023 1 commit
  22. 23 Oct, 2023 1 commit
  23. 21 Oct, 2023 1 commit
    • Fix cmake dtype check (#989) · ac0e0067
      Bartłomiej Kocot authored
      * Fix instances dtype check
      
      * Fix source dtypes selector for examples and tests
      
      * Sync with new cmakefile changes
      
      * Remove unneeded ifdefs
  24. 20 Oct, 2023 1 commit
  25. 19 Oct, 2023 1 commit