1. 27 Jun, 2024 1 commit
  2. 18 Jun, 2024 1 commit
  3. 06 Jun, 2024 4 commits
  4. 05 Jun, 2024 3 commits
    • Code cleanup. · e628b162
      Adam Osewski authored
    • Fix GetNextKTiles. · 2a16c61c
      Adam Osewski authored
    • Add a scale op, related instances and examples (#1242) · cb0645be
      Rostyslav Geyyer authored
      
      
      * Add a scale op
      
      * Update the element op
      
      * Add instances
      
      * Add an example
      
      * Add a client example
      
      * Add a flag check
      
      * Revert flag check addition
      
      * Fix flag check
      
      * Update d strides in example
      
      * Update d strides in client example
      
      * Apply suggestions from code review
      
      Update copyright header
      Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
      
      * Move the example
      
      * Move the client example
      
      * Update element op
      
      * Update example with the new element op
      
      * Add scalar layout
      
      * Update example
      
      * Update kernel for scalar Ds
      
      * Revert kernel changes
      
      * Update element op
      
      * Update example to use scales' pointers
      
      * Format
      
      * Update instances
      
      * Update client example
      
      * Move element op to unary elements
      
      * Update element op to work with values instead of pointers
      
      * Update instances to take element op as an argument
      
      * Update examples to use random scale values
      
      ---------
      Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
  5. 04 Jun, 2024 2 commits
  6. 01 Jun, 2024 1 commit
    • Post-merge fix of PR 1300 (#1313) · 6fb1f4e0
      zjing14 authored
      * add f8 gemm with multiD for both row/col wise
      
      * change compute_type to fp8
      
      * changed tuning parameters in the example
      
      * add rcr example
      
      * post-merge fix
      
      * fix
      
      * reduce init range
  7. 28 May, 2024 1 commit
  8. 22 May, 2024 1 commit
  9. 21 May, 2024 3 commits
  10. 20 May, 2024 1 commit
  11. 17 May, 2024 1 commit
  12. 15 May, 2024 2 commits
  13. 14 May, 2024 3 commits
  14. 10 May, 2024 2 commits
  15. 09 May, 2024 2 commits
  16. 08 May, 2024 2 commits
  17. 07 May, 2024 1 commit
  18. 06 May, 2024 1 commit
    • Multiple fixes. · 160932b6
      Adam Osewski authored
      * fix Accumulation when there's only one workgroup per K dim.
      * Update occupancy values after KBatch update and fix its calculation.
  19. 02 May, 2024 1 commit
  20. 29 Apr, 2024 2 commits
  21. 26 Apr, 2024 3 commits
    • [GEMM] UniversalGemm update (#1262) · 764164b4
      Haocong WANG authored
      
      
      * Add bf16 instances
      
      * Add bf16 gemm universal example
      
      * tempsave
      
      * Add guard to navi compilation
      
      * work around a specific mixed gemm instance (bring it back once the compiler fix is uploaded)
      
      * fix formatting condition statement issue
      
      * solve conflict
      
      ---------
      Co-authored-by: Jun Liu <Liu.Jun@amd.com>
    • Add element op (#1259) · f044ff71
      Rostyslav Geyyer authored
    • bf16A_Int8B with fastgelu/bias (#1264) · 0d0150db
      zjing14 authored
      * changed the copy function to v7r2
      
      * adding multi_abd
      
      * in-progress
      
      * add post-load oob check
      
      * debugging
      
      * adjust instances
      
      * add run_lds
      
      * add elementwise_op
      
      * replace multi_abd_device with v3
      
      * clean up
      
      * clean
      
      * clean
      
      * Added LDSType
      
      * profiling
      
      * adjust oobcheck
      
      * add missing file
      
      * refactor
      
      * clean
      
      * add examples
  22. 25 Apr, 2024 2 commits
    • Grouped GEMM Multiple D tile loop. (#1247) · b4032629
      Adam Osewski authored
      * Overload output stream operator for LoopScheduler and PipelineVersion
      
      * Add Run overload accepting grid descriptors MK.
      
      * Add __device__ keyword for CalculateGridSize
      
      * Create device op GroupedGemmMultipleD
      
      * Add GroupedGemm MultipleD Tile Loop implementation.
      
      * Add an example for GroupedGemm MultipleD tile loop.
      
      * Device Op GroupedGEMMTileLoop.
      
      * Bunch of small changes in example.
      
      * CkProfiler
      
      * Remove unused tparam.
      
      * Fix include statement.
      
      * Fix output stream overloads.
      
      * Do not make descriptors and check validity until we find group.
      
      * Fix gemm desc initialization.
      
      * Revert device op
      
      * Fix compilation for DTYPES=FP16
      
      * Validate tensor transfers parameters.
      
      * Validate on host only NK dims if M is not known.
      
      * Fix bug.
      
      * A convenient debug func for selecting threads.
      
      * Fix has main k block loop bug.
      
      * Make sure that b2c has up to date tile offset.
      
      * Output stream operator for Sequence type.
      
      * Cmake file formatting.
    • Universal gemm flush cache (#1251) · f448d179
      ltqin authored
      
      
      * add flush cache to device op
      
      * add flush cache parameter to ckProfiler
      
      * change the method for calculating the sizes of a and b
      
      * change evaluation time method from AVERAGE to MEDIAN
      
      * format code
      
      * adjust some code
      
      * fix core dumped
      
      * remove loop call flush icache in kernel
      
      * remove loop(outer) call flush icache
      
      ---------
      Co-authored-by: letaoqin <letaoqin@amd.com>