  1. 28 May, 2024 3 commits
  2. 23 May, 2024 3 commits
  3. 22 May, 2024 3 commits
  4. 21 May, 2024 1 commit
  5. 20 May, 2024 1 commit
  6. 17 May, 2024 3 commits
  7. 15 May, 2024 3 commits
  8. 11 May, 2024 1 commit
  9. 10 May, 2024 3 commits
  10. 09 May, 2024 2 commits
  11. 08 May, 2024 2 commits
  12. 07 May, 2024 2 commits
  13. 06 May, 2024 1 commit
  14. 02 May, 2024 1 commit
  15. 01 May, 2024 4 commits
  16. 30 Apr, 2024 2 commits
  17. 29 Apr, 2024 1 commit
  18. 26 Apr, 2024 4 commits
    • [GEMM] UniversalGemm update (#1262) · 764164b4
      Haocong WANG authored
      
      
      * Add bf16 instances
      
      * Add bf16 gemm universal example
      
      * tempsave
      
      * Add guard to navi compilation
      
      * workaround for a specific mixed gemm instance (bring it back once the compiler fix is uploaded)
      
      * fix formatting condition statement issue
      
      * solve conflict
      
      ---------
      Co-authored-by: Jun Liu <Liu.Jun@amd.com>
    • Add element op (#1259) · f044ff71
      Rostyslav Geyyer authored
    • ggemm tile_loop multD bf16 int8 (#1258) · 5ae893c0
      zjing14 authored
      
      
      * Overload output stream operator for LoopScheduler and PipelineVersion
      
      * Add Run overload accepting grid descriptors MK.
      
      * Add __device__ keyword for CalculateGridSize
      
      * Create device op GroupedGemmMultipleD
      
      * Add GroupedGemm MultipleD Tile Loop implementation.
      
      * Add an example for GroupedGemm MultipleD tile loop.
      
      * Device Op GroupedGEMMTileLoop.
      
      * A bunch of small changes in the example.
      
      * CkProfiler
      
      * Remove unused tparam.
      
      * changed the copy function to v7r2
      
      * adding multi_abd
      
      * in-progress
      
      * add post-load oob check
      
      * Fix include statement.
      
      * Fix output stream overloads.
      
      * Do not make descriptors and check validity until we find the group.
      
      * Fix gemm desc initialization.
      
      * debugging
      
      * adjust instances
      
      * add run_lds
      
      * add elementwise_op
      
      * replace multi_abd_device with v3
      
      * clean up
      
      * clean
      
      * clean
      
      * Revert device op
      
      * Fix compilation for DTYPES=FP16
      
      * Validate tensor transfer parameters.
      
      * Added LDSType
      
      * profiling
      
      * adjust oobcheck
      
      * add missing file
      
      * Validate on host only NK dims if M is not known.
      
      * add
      
      * clean
      
      * refactor
      
      * clean
      
      * add examples
      
      * add fuse
      
      * add fusion and client example
      
      * Fix bug.
      
      * A convenient debug func for selecting threads.
      
      * Fix has main k block loop bug.
      
      * Make sure that b2c has up to date tile offset.
      
      * Output stream operator for Sequence type.
      
      * Cmake file formatting.
      
      * clean
      
      ---------
      Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
    • bf16A_Int8B with fastgelu/bias (#1264) · 0d0150db
      zjing14 authored
      * changed the copy function to v7r2
      
      * adding multi_abd
      
      * in-progress
      
      * add post-load oob check
      
      * debugging
      
      * adjust instances
      
      * add run_lds
      
      * add elementwise_op
      
      * replace multi_abd_device with v3
      
      * clean up
      
      * clean
      
      * clean
      
      * Added LDSType
      
      * profiling
      
      * adjust oobcheck
      
      * add missing file
      
      * refactor
      
      * clean
      
      * add examples
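The bf16 instances and the bf16 universal GEMM example added in #1262 depend on converting between float and bfloat16. The C++ sketch below shows one common host-side way to do that conversion, assuming IEEE-754 binary32 floats and round-to-nearest-even. It is an illustration only, not Composable Kernel's own bf16 type or conversion routine; the function names float_to_bf16_rne and bf16_to_float are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: float -> bfloat16 bit pattern with round-to-nearest-even.
// bf16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of a float.
inline std::uint16_t float_to_bf16_rne(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    if (std::isnan(f)) {
        // Force a quiet-NaN mantissa bit so truncation cannot turn a NaN into Inf.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Round to nearest even: add 0x7FFF plus the lowest retained bit, then truncate.
    std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen a bfloat16 bit pattern back to float (exact).
inline float bf16_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Ties go to the even bf16 value: 1.00390625f (exactly halfway between 1.0 and the next bf16) rounds down to 0x3F80, while 1.01171875f rounds up to 0x3F82.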
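In the grouped GEMM tile loop from #1258, a fixed set of workgroups walks over the output tiles of every GEMM in the group, and a group's descriptors are only built once the group owning a tile has been found. The host-side C++ sketch below simulates that mapping under a few assumptions: every group uses the same MPerBlock x NPerBlock tile size, only output shapes matter, and workgroups stride over a flat tile index. GemmShape, TileCoord, tile_offsets, locate_tile and schedule are hypothetical names, not CK's kernel or API.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical: output shape of one GEMM in the group.
struct GemmShape { int M; int N; };
// Hypothetical: which group a flat tile belongs to, and its tile row/column there.
struct TileCoord { int group; int tile_m; int tile_n; };

inline int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Exclusive prefix sum of per-group tile counts; back() is the total tile count.
std::vector<int> tile_offsets(const std::vector<GemmShape>& gemms, int m_per_block, int n_per_block) {
    std::vector<int> offsets{0};
    for (const auto& g : gemms)
        offsets.push_back(offsets.back() + ceil_div(g.M, m_per_block) * ceil_div(g.N, n_per_block));
    return offsets;
}

// Find the group first (upper_bound gives the first offset > tile_id, so the
// owning group is the entry before it; empty groups are skipped automatically),
// then derive the tile coordinates inside that group.
TileCoord locate_tile(int tile_id, const std::vector<int>& offsets,
                      const std::vector<GemmShape>& gemms, int m_per_block, int n_per_block) {
    auto it = std::upper_bound(offsets.begin(), offsets.end(), tile_id);
    int group = static_cast<int>(it - offsets.begin()) - 1;
    int local = tile_id - offsets[group];
    int n_tiles = ceil_div(gemms[group].N, n_per_block);
    return {group, local / n_tiles, local % n_tiles};
}

// Persistent-loop simulation: workgroup wg handles tiles wg, wg + W, wg + 2W, ...
std::vector<std::vector<TileCoord>> schedule(const std::vector<GemmShape>& gemms, int num_workgroups,
                                             int m_per_block, int n_per_block) {
    auto offsets = tile_offsets(gemms, m_per_block, n_per_block);
    std::vector<std::vector<TileCoord>> work(num_workgroups);
    for (int wg = 0; wg < num_workgroups; ++wg)
        for (int t = wg; t < offsets.back(); t += num_workgroups)
            work[wg].push_back(locate_tile(t, offsets, gemms, m_per_block, n_per_block));
    return work;
}
```

With two GEMMs of 256x128 and 128x256 and 128x128 tiles, each group has two tiles; three workgroups then take tiles {0, 3}, {1} and {2}, so the number of launched workgroups is independent of the number of groups.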
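#1264 pairs a bf16 A matrix with an int8 B matrix and applies a bias add followed by FastGelu. A plain C++ reference for that computation, of the kind useful for checking kernel output, might look like the sketch below. It assumes row-major A (M x K) and B (K x N), one bias value per output column, A values already widened from bf16 to float, and the standard tanh approximation of GELU. CK's own FastGelu element op may use a different but equivalent formulation, and fast_gelu and gemm_bias_fastgelu are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: tanh approximation of GELU,
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
inline float fast_gelu(float x) {
    const float k = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

// Hypothetical reference: C[m][n] = fast_gelu(sum_k A[m][k] * B[k][n] + bias[n]).
// A is float (bf16 values assumed already widened); int8 B is converted to float
// before the multiply, and accumulation is done in float.
std::vector<float> gemm_bias_fastgelu(const std::vector<float>& a, const std::vector<std::int8_t>& b,
                                      const std::vector<float>& bias, int M, int N, int K) {
    std::vector<float> c(static_cast<std::size_t>(M) * N);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += a[m * K + k] * static_cast<float>(b[k * N + n]);
            c[m * N + n] = fast_gelu(acc + bias[n]);  // bias first, then activation
        }
    }
    return c;
}
```

Applying the bias before the activation matters here: fast_gelu(acc) + bias and fast_gelu(acc + bias) differ whenever the bias is non-zero.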