1. 23 May, 2024 1 commit
  2. 22 May, 2024 3 commits
  3. 21 May, 2024 1 commit
  4. 20 May, 2024 1 commit
  5. 17 May, 2024 3 commits
  6. 15 May, 2024 3 commits
  7. 11 May, 2024 1 commit
  8. 10 May, 2024 3 commits
  9. 09 May, 2024 2 commits
  10. 08 May, 2024 2 commits
  11. 07 May, 2024 2 commits
  12. 06 May, 2024 1 commit
  13. 02 May, 2024 1 commit
  14. 01 May, 2024 4 commits
  15. 30 Apr, 2024 2 commits
  16. 29 Apr, 2024 1 commit
  17. 26 Apr, 2024 4 commits
    • [GEMM] UniversalGemm update (#1262) · 764164b4
      Haocong WANG authored
      
      
      * Add bf16 instances
      
      * Add bf16 gemm universal example
      
      * tempsave
      
      * Add guard to navi compilation
      
      * workaround for a specific mixed gemm instance (bring it back when the compiler fix is uploaded)
      
      * fix formatting condition statement issue
      
      * solve conflict
      
      ---------
      Co-authored-by: Jun Liu <Liu.Jun@amd.com>
    • Add element op (#1259) · f044ff71
      Rostyslav Geyyer authored
    • ggemm tile_loop multD bf16 int8 (#1258) · 5ae893c0
      zjing14 authored
      
      
      * Overload output stream operator for LoopScheduler and PipelineVersion
      
      * Add Run overload accepting grid descriptors MK.
      
      * Add __device__ keyword for CalculateGridSize
      
      * Create device op GroupedGemmMultipleD
      
      * Add GroupedGemm MultipleD Tile Loop implementation.
      
      * Add an example for GroupedGemm MultipleD tile loop.
      
      * Device Op GroupedGEMMTileLoop.
      
      * Bunch of small changes in example.
      
      * CkProfiler
      
      * Remove unused tparam.
      
      * changed the copy function to v7r2
      
      * adding multi_abd
      
      * in-progress
      
      * add post-load oob check
      
      * Fix include statement.
      
      * Fix output stream overloads.
      
      * Do not make descriptors and check validity until we find group.
      
      * Fix gemm desc initialization.
      
      * debugging
      
      * adjust instances
      
      * add run_lds
      
      * add elementwise_op
      
      * replace multi_abd_device with v3
      
      * clean up
      
      * clean
      
      * clean
      
      * Revert device op
      
      * Fix compilation for DTYPES=FP16
      
      * Validate tensor transfers parameters.
      
      * Added LDSType
      
      * profiling
      
      * adjust oobcheck
      
      * add missing file
      
      * Validate on host only NK dims if M is not known.
      
      * add
      
      * clean
      
      * refactor
      
      * clean
      
      * add examples
      
      * add fuse
      
      * add fusion and client example
      
      * Fix bug.
      
      * A convenient debug func for selecting threads.
      
      * Fix has main k block loop bug.
      
      * Make sure that b2c has up to date tile offset.
      
      * Output stream operator for Sequence type.
      
      * Cmake file formatting.
      
      * clean
      
      ---------
      Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
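This commit overloads the output stream operator for LoopScheduler and PipelineVersion so that kernel configurations can be printed by name. A minimal sketch of that pattern follows, assuming simplified stand-in enums: the enumerator names and the Describe helper are hypothetical and are not CK's actual definitions.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Illustrative stand-ins for the scheduler/pipeline enums; enumerator names are assumptions.
enum struct LoopScheduler { Default, Interwave };
enum struct PipelineVersion { v1, v2 };

// Stream a readable name instead of the underlying integer value.
inline std::ostream& operator<<(std::ostream& os, LoopScheduler s) {
    switch (s) {
        case LoopScheduler::Default: os << "Default"; break;
        case LoopScheduler::Interwave: os << "Interwave"; break;
    }
    return os;
}

inline std::ostream& operator<<(std::ostream& os, PipelineVersion v) {
    switch (v) {
        case PipelineVersion::v1: os << "v1"; break;
        case PipelineVersion::v2: os << "v2"; break;
    }
    return os;
}

// Hypothetical helper: build a one-line description of a kernel configuration.
inline std::string Describe(LoopScheduler s, PipelineVersion v) {
    std::ostringstream ss;
    ss << "LoopScheduler: " << s << ", PipelineVersion: " << v;
    return ss.str();
}
```

Because the overloads are found by argument-dependent lookup, any code that already streams a kernel's template parameters (for example a profiler's description string) picks up the readable names without further changes.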
    • bf16A_Int8B with fastgelu/bias (#1264) · 0d0150db
      zjing14 authored
      * changed the copy function to v7r2
      
      * adding multi_abd
      
      * in-progress
      
      * add post-load oob check
      
      * debugging
      
      * adjust instances
      
      * add run_lds
      
      * add elementwise_op
      
      * replace multi_abd_device with v3
      
      * clean up
      
      * clean
      
      * clean
      
      * Added LDSType
      
      * profiling
      
      * adjust oobcheck
      
      * add missing file
      
      * refactor
      
      * clean
      
      * add examples
  18. 25 Apr, 2024 2 commits
    • Grouped GEMM Multiple D tile loop. (#1247) · b4032629
      Adam Osewski authored
      * Overload output stream operator for LoopScheduler and PipelineVersion
      
      * Add Run overload accepting grid descriptors MK.
      
      * Add __device__ keyword for CalculateGridSize
      
      * Create device op GroupedGemmMultipleD
      
      * Add GroupedGemm MultipleD Tile Loop implementation.
      
      * Add an example for GroupedGemm MultipleD tile loop.
      
      * Device Op GroupedGEMMTileLoop.
      
      * Bunch of small changes in example.
      
      * CkProfiler
      
      * Remove unused tparam.
      
      * Fix include statement.
      
      * Fix output stream overloads.
      
      * Do not make descriptors and check validity until we find group.
      
      * Fix gemm desc initialization.
      
      * Revert device op
      
      * Fix compilation for DTYPES=FP16
      
      * Validate tensor transfers parameters.
      
      * Validate on host only NK dims if M is not known.
      
      * Fix bug.
      
      * A convenient debug func for selecting threads.
      
      * Fix has main k block loop bug.
      
      * Make sure that b2c has up to date tile offset.
      
      * Output stream operator for Sequence type.
      
      * Cmake file formatting.
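In a grouped GEMM tile loop, each workgroup walks over output tiles drawn from all groups and first has to work out which group a given tile belongs to; the commit notes that descriptors are only built once that group is found. The sketch below shows one straightforward way to do the lookup on the host, assuming a linear scan over per-group tile counts. GemmShape, TilesFor and FindGroup are hypothetical names, and the actual CK kernel may organize this differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-group problem size: output rows M and columns N.
struct GemmShape {
    int M;
    int N;
};

// Number of m_per_block x n_per_block output tiles one GEMM needs (ceiling division).
inline int TilesFor(const GemmShape& s, int m_per_block, int n_per_block) {
    assert(m_per_block > 0 && n_per_block > 0);
    return ((s.M + m_per_block - 1) / m_per_block) *
           ((s.N + n_per_block - 1) / n_per_block);
}

struct TileLocation {
    int group;       // index of the GEMM the tile belongs to, or -1 if past the end
    int local_tile;  // tile index inside that GEMM
};

// Map a flat tile id (counted across all groups) to its group and local tile.
// A persistent workgroup would call this for tile_id = block_id, block_id + grid_size, ...
inline TileLocation FindGroup(const std::vector<GemmShape>& groups, int tile_id,
                              int m_per_block, int n_per_block) {
    int offset = 0;  // total tiles in all groups before g
    for (int g = 0; g < static_cast<int>(groups.size()); ++g) {
        const int tiles = TilesFor(groups[g], m_per_block, n_per_block);
        if (tile_id < offset + tiles) {
            return {g, tile_id - offset};
        }
        offset += tiles;
    }
    return {-1, -1};
}
```

With 128 x 128 tiles, groups of 256 x 256, 100 x 64 and 64 x 300 need 4, 1 and 3 tiles respectively, so flat tile 4 is the first tile of group 1 and flat tile 7 is tile 2 of group 2.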
    • Universal gemm flush cache (#1251) · f448d179
      ltqin authored
      
      
      * add flush cache to device op
      
      * add flush cache parameter to ckProfiler
      
      * change the method for calculating the sizes of A and B
      
      * change evaluation time method from AVERAGE to MEDIAN
      
      * format code
      
      * adjust some code
      
      * fix core dump
      
      * remove loop call flush icache in kernel
      
      * remove loop(outer) call flush icache
      
      ---------
      Co-authored-by: letaoqin <letaoqin@amd.com>
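This commit switches the reported kernel time from AVERAGE to MEDIAN. The sketch below illustrates why that choice is more robust; it is not ckProfiler's code, and MeanTime and MedianTime are hypothetical helpers over a list of per-iteration kernel times in milliseconds. A single slow iteration (for example, one that runs against a cold cache) pulls the mean up noticeably but barely moves the median.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the recorded times (ms).
inline double MeanTime(const std::vector<double>& times) {
    assert(!times.empty());
    return std::accumulate(times.begin(), times.end(), 0.0) /
           static_cast<double>(times.size());
}

// Median of the recorded times (ms); takes a copy so the caller's order is preserved.
inline double MedianTime(std::vector<double> times) {
    assert(!times.empty());
    std::sort(times.begin(), times.end());
    const std::size_t n = times.size();
    // Odd count: middle element. Even count: average of the two middle elements.
    return n % 2 == 1 ? times[n / 2] : 0.5 * (times[n / 2 - 1] + times[n / 2]);
}
```

For times {1.0, 1.1, 0.9, 1.0, 9.0}, the median is 1.0 while the mean is about 2.6, so one outlier more than doubles the averaged result.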
  19. 23 Apr, 2024 1 commit
  20. 22 Apr, 2024 1 commit
  21. 19 Apr, 2024 1 commit
    • Refactor elementwise kernels (#1222) · ad1597c4
      Bartłomiej Kocot authored
      * Refactor elementwise kernels
      
      * Instances fixes
      
      * Fix cmake
      
      * Fix max pool bwd test
      
      * Update two stage gemm split k
      
      * Restore elementwise scale for hiptensor backward compatibility
      
      * Fix Acc data type check in conv fwd multiple abd
      
      * Disable conv fp64 fwd example
      
      * Update grouped conv weight multi d