1. 05 Nov, 2023 1 commit
  2. 01 Nov, 2023 1 commit
  3. 26 Oct, 2023 3 commits
  4. 25 Oct, 2023 1 commit
  5. 23 Oct, 2023 1 commit
  6. 21 Oct, 2023 1 commit
    • Fix cmake dtype check (#989) · ac0e0067
      Bartłomiej Kocot authored
      * Fix instances dtype check
      
      * Fix source dtypes selector for examples and tests
      
      * Sync with new cmakefile changes
      
      * Remove unneeded ifdefs
  7. 20 Oct, 2023 3 commits
  8. 19 Oct, 2023 8 commits
  9. 18 Oct, 2023 6 commits
  10. 17 Oct, 2023 2 commits
  11. 16 Oct, 2023 2 commits
    • workaround with float (#992) · 39430bfd
      zjing14 authored
      
      Co-authored-by: Jing Zhang <jizha@amd.com>
    • Add hipTensor build and test to CK CI. (#990) · 707ad002
      Illia Silin authored
      * add a hipTensor test to CI
      
      * use jenkins git plugin
      
      * change hipTensor folder location in CI
      
      * change the git method for hipTensor
      
      * run tests using ctest
      
      * check the hipTensor contents
      
      * only build hipTensor on MI100/200
      
      * pull hipTensor as zip archive
      
      * fix jenkins syntax
      
      * add path to the CK installation
      
      * combine build commands into one shell
      
      * change jenkins syntax for CK installer path
      
      * try different syntax
      
      * allow unzip overwrite
      
      * fix jenkins file syntax
      
      * remove any old versions of hipTensor before building
      
      * add option to select hipTensor branch for testing
  12. 13 Oct, 2023 2 commits
  13. 12 Oct, 2023 2 commits
  14. 11 Oct, 2023 4 commits
    • Revert "Grouped Gemm with looping over the tiles. (#788)" (#982) · c99323be
      zjing14 authored
      This reverts commit a4f72a31.
    • Grouped Gemm with looping over the tiles. (#788) · a4f72a31
      Adam Osewski authored
      
      
      * Introduce LocalBlockToCTileMap.
      
      * Change the signature of CalculateBottomIndex() so that it no longer
      accepts any arguments. The B2C map, which is already passed as an
      argument to the kernel Run function, now has the block's local ID
      computed outside, at the __global__ kernel entry point.
      The LocalB2C map stores the local block ID as a member.
      
      * Use LocalBlockToCTile map in device ops.
      
      * First draft of tile loop work distribution.
      
      * Fix typo.
      
      * Simplify kernel arguments.
      
      Calculate descriptors & B2C maps on the device.
      
      * Use looping kernel.
      
      * Fix B2C constructor.
      
      * Fix Navi21 errors.
      
      * Calculate tile start/end in device kernel.
      
      * Change Run API to accept user provided workspace buffer.
      
      * Add new line at EOF.
      
      * Move Gemm KernelArguments to device op interface.
      
      * Remove unused code.
      
      * Update API.
      
      * Launch with a grid size that is the minimum of occupancy and tile count.
      
      * Go back to using constant memory for gemm descriptors.
      
      * Remove unused code.
      
      * Add default virtual method implementation.
      
      * Update comments to conform with doxygen style.
      
      * Fix doc style and unused parameters.
      
      * Add thread cluster lengths to kernel name.
      
      * Remove old splitk impl and replace it with tile looping one.
      
      * Modify instances.
      
      * set KPerBlock to 64
      * maximize vector load size wherever possible.
      
      * Fix instances cluster lengths.
      
      * Change comment style.
      
      * Use 128b store where possible in instances.
      
      * Update test cases, since KPerBlock has doubled.
      
      * Update output stream operator for Sequence.
      
      * Add pipeline version to GroupedGEMM device op type string.
      
      * Fix pipeline version type logging.
      
      * Fix input tensors type after merge.
      
      * Fix compiler error.
      
      * Fix output stream operator for Pipeline version.
      
      * Store using 128b.
      
      * Set of instances with kpb 32/64
      
      * Limit number of instances
      
      * Remove commented out instances.
      
      * Fix function name.
      
      * Limit the number of instances.
      
      Add pipeline version to the regular instances
      
      * Change thread cluster layout for reading B tensor.
      
      * disabled failed instances
      
      ---------
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Jing Zhang <jizha@amd.com>
  15. 10 Oct, 2023 2 commits
  16. 05 Oct, 2023 1 commit