1. 29 Feb, 2024 1 commit
  2. 27 Feb, 2024 1 commit
  3. 26 Feb, 2024 1 commit
  4. 31 Jan, 2024 1 commit
  5. 24 Jan, 2024 1 commit
    • Fixing most of the cppcheck errors. (#1142) · 180e5720
      Illia Silin authored
      * fix cppcheck errors, first pass
      
      * fix format
      
      * fix returned value in examples
      
      * add macro definitions for cppcheck
      
      * fix the profile_gemm logic
      
      * update the gemm profiler logic
      
      * add more definitions to cppcheck, fix a couple more errors
      
      * replace runtime error with message in device function
      
      * fix a couple of int4 issues
      
      * no return for fill function
      
      * fix errors in data_types.hpp
      
      * fix format
      
      * fix few remaining errors
      
      * fix errors in data_types.hpp
      
      * fix last couple of errors in data_types.hpp
  6. 09 Nov, 2023 1 commit
  7. 07 Nov, 2023 1 commit
  8. 30 Oct, 2023 1 commit
    • Enable sccache in the default docker and CI. (#1009) · 4e44a9e8
      Illia Silin authored
      
      
      * replace ccache with sccache, pin package versions
      
      * put ccache back temporarily to avoid breaking other CI jobs
      
      * add sccache_wrapper.sh script
      
      * fix the package version syntax
      
      * fix the pymysql package issue
      
      * run sccache_wrapper before build if ccache server found
      
      * set the paths before calling the sccache_wrapper
      
      * use /tmp instead of /usr/local for cache
      
      * try using sccache --start-server instead of wrapper
      
      * try using redis server with sccache
      
      * define SCCACHE_REDIS
      
      * add redis and ping packages, and redis port
      
      * use the new sccache redis server
      
      * do not use sccache with staging compiler
      
      * fix the condition syntax
      
      * add stunnel to redis
      
      * add tunnel verification
      
      * separate caches for different architectures
      
      * fix syntax for the cache tag
      
      * use double brackets for conditions
      
      * add bash line to the script
      
      * add a switch for sccache and only use it in build stage
      
      * run check_host function when enabling sccache
      
      * fix the invocation tags for sccache
      
      * fix groovy syntax
      
      * set the invocation tag in groovy
      
      * disable sccache in clang-format stage
      
      * try another syntax for invocation tags
      
      * use local sccache server if can't connect to redis
      
      * fix script syntax
      
      * update README
      
      * refresh readme
      
      * readme updates
      
      * remove the timing and verification caveat from readme
      
      ---------
      Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
  9. 11 Oct, 2023 2 commits
    • Revert "Grouped Gemm with looping over the tiles. (#788)" (#982) · c99323be
      zjing14 authored
      This reverts commit a4f72a31.
    • Grouped Gemm with looping over the tiles. (#788) · a4f72a31
      Adam Osewski authored
      
      
      * Introduce LocalBlockToCTileMap.
      
      * Change the signature of CalculateBottomIndex() so that it no longer
      accepts any argument. The B2C map, which is already passed as an
      argument to the kernel Run function, now calculates the block's local ID
      outside, at the kernel entry point (the __global__ function).
      The LocalB2C map stores the local block ID as a member.
      
      * Use LocalBlockToCTile map in device ops.
      
      * First draft of tile loop work distribution.
      
      * Fix typo.
      
      * Simplify kernel arguments.
      
      Calculate descriptors & B2C maps on the device.
      
      * Use looping kernel.
      
      * Fix B2C constructor.
      
      * Fix Navi21 errors.
      
      * Calculate tile start/end in device kernel.
      
      * Change Run API to accept user provided workspace buffer.
      
      * Add new line at EOF.
      
      * Move Gemm KernelArguments to device op interface.
      
      * Remove unused code.
      
      * Update API.
      
      * Launch grid size which is min of occupancy vs tile count
      
      * Get back to use constant memory for gemm descriptors.
      
      * Remove unused code.
      
      * Add default virtual method implementation.
      
      * Update comments to conform with doxygen style.
      
      * Fix doc style and unused parameters.
      
      * Add thread cluster lengths to kernel name.
      
      * Remove old splitk impl and replace it with tile looping one.
      
      * Modify instances.
      
      * set KPerBlock to 64
      * maximize wherever possible vector load size.
      
      * Fix instances cluster lengths.
      
      * Change comment style.
      
      * Use 128b store where possible in instances.
      
      * Update test cases, since KPerBlock has doubled.
      
      * Update output stream operator for Sequence.
      
      * Add pipeline version to GroupedGEMM device op type string.
      
      * Fix pipeline version type logging.
      
      * Fix input tensors type after merge.
      
      * Fix compiler error.
      
      * Fix output stream operator for Pipeline version.
      
      * Store using 128b.
      
      * Set of instances with kpb 32/64
      
      * Limit number of instances
      
      * Remove commented out instances.
      
      * Fix function name.
      
      * Limit the number of instances.
      
      Add pipeline version to the regular instances
      
      * Change thread cluster layout for reading B tensor.
      
      * disabled failed instances
      
      ---------
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Jing Zhang <jizha@amd.com>
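The tile-loop work distribution named in the commit above (a grid size that is the minimum of occupancy and tile count, with each block computing its own tile start/end) can be sketched on the host side as follows. This is only an illustration under stated assumptions, not CK code: `TileRange`, `launch_grid_size`, and `block_tile_range` are hypothetical names, and the even-split policy is assumed rather than taken from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper type; not part of the CK API.
struct TileRange
{
    std::int64_t begin; // first tile index owned by the block
    std::int64_t end;   // one past the last tile index owned by the block
};

// Grid size = min(blocks the device can keep resident, number of output tiles).
constexpr std::int64_t launch_grid_size(std::int64_t max_resident_blocks,
                                        std::int64_t tile_count)
{
    return std::min(max_resident_blocks, tile_count);
}

// Assumed policy: split tile_count tiles across grid_size blocks as evenly as
// possible; the first (tile_count % grid_size) blocks take one extra tile.
// Requires grid_size > 0.
constexpr TileRange block_tile_range(std::int64_t block_id,
                                     std::int64_t grid_size,
                                     std::int64_t tile_count)
{
    const std::int64_t base  = tile_count / grid_size;
    const std::int64_t rem   = tile_count % grid_size;
    const std::int64_t begin = block_id * base + std::min(block_id, rem);
    const std::int64_t len   = base + (block_id < rem ? 1 : 0);
    return {begin, begin + len};
}
```

With 10 tiles and room for 4 resident blocks, `launch_grid_size(4, 10)` gives 4 and the blocks loop over tiles [0,3), [3,6), [6,8), and [8,10); with more resident blocks than tiles, every block gets exactly one tile.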
  10. 31 Aug, 2023 1 commit
    • Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0
      zjing14 authored
      
      
      * move all arguments into device
      
      * add b2c_tile_map
      
      * add examples
      
      * add SetDeviceKernelArgs
      
      * dedicated fixed_nk solution
      
      * init client api
      
      * add grouped_gemm_bias example
      
      * add an instance
      
      * add instances
      
      * formatting
      
      * fixed cmake
      
      * Update EnableCompilerWarnings.cmake
      
      * Update cmake-ck-dev.sh
      
      * clean; fixed comments
      
      * fixed comment
      
      * add instances for fp32 output
      
      * add instances for fp32 output
      
      * add fp32 out client example
      
      * fixed CI
      
      * init commit for kbatch
      
      * add splitk gridwise
      
      * format
      
      * fixed
      
      * clean deviceop
      
      * clean code
      
      * finish splitk
      
      * fixed instances
      
      * change m_loops to tile_loops
      
      * add setkbatch
      
      * clean code
      
      * add splitK+bias
      
      * add instances
      
      * opt mk_nk instances
      
      * clean examples
      
      * fixed CI
      
      * remove zero
      
      * finished non-zero
      
      * clean
      
      * clean code
      
      * optimized global_barrier
      
      * fixed ci
      
      * fixed CI
      
      * removed AddBias
      
      * format
      
      * fixed CI
      
      * fixed CI
      
      * move 20_grouped_gemm to 21_grouped_gemm
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
  11. 23 Aug, 2023 1 commit
    • [HotFix] add config and version files to pass on build info (#856) · c8a8385f
      Jun Liu authored
      * experiment with config file
      
      * experiment with version.h config
      
      * add more info to version.h
      
      * minor updates
      
      * minor updates
      
      * fix case where DTYPE is not used
      
      * large amount of files but minor changes
      
      * remove white space
      
      * minor changes to add more MACROs
      
      * fix cmakedefine01
      
      * fix issue with CK internal conflict
      
      * fix define and define value
      
      * fix clang-format
      
      * fix formatting issue
      
      * experiment with cmake
      
      * clang format v12 to be consistent with miopen
      
      * avoid clang-format for config file
  12. 03 Aug, 2023 2 commits
  13. 27 Jul, 2023 1 commit
  14. 06 Jul, 2023 1 commit
  15. 16 Jun, 2023 1 commit
  16. 15 Jun, 2023 1 commit
    • Enable gfx941 and gfx942 architectures. (#752) · 027e46ee
      Illia Silin authored
      * enable gfx941/942 targets
      
      * fix clang format
      
      * fix the cmake logic for multiple targets
      
      * fix cmake syntax for looping over targets
      
      * add gfx941/942 support for gemm_xdl instances
  17. 13 Jun, 2023 1 commit
    • AIT Attention API refactor (#8) · efee4541
      Haocong WANG authored
      * sanity pass
      
      * sanity pass 2
      
      * confirm significant performance regression.
      
      * turn on all instances
      
      * turn off instance format
      
      * Fix bug & tuning & format
      
      * DML meta, self_attn+cross_attn
      
      * sanity pass
      
      * remove useless flag
      
      * update tile and problem size used in AIT attention
      
      * bug fix in grouped conv supporting check
  18. 28 Apr, 2023 1 commit
  19. 27 Apr, 2023 2 commits
  20. 19 Apr, 2023 1 commit
    • Merge origin dev (#2) · cad3212d
      Haocong WANG authored
      
      
      * [Navi3x] Fix Gridwise_multiple_d operation (#649)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * fix bug
      
      * standardize docs (#655)
      
      * Separate bibtex requirement from rocm-docs-core (#656)
      
      * separate bibtex requirement from rocm-docs-core
      
      * point requirements to source rocm-docs-core repo
      
      * Add CMake Option "USE_OPT_NAVI3X" (#647)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * remove navi3x opt compile option from cmake script
      
      * Conv + quantization + tanh  (#645)
      
      * Rename file. Prepare to support another activation
      
      * Add comment for quantization
      
      * Extract out_elementop
      
      * Add tanh example
      
      * Add conv + bias + tanh quantization instance
      
      * Add missing parameter
      
      * Refine cmake
      
      * Add external api and client example
      
      * Extract variable in example
      
      * Fix the comment
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      
      * Add a denorm test fix (#603)
      
      * Add type_convert implementations for bf16
      
      * Add the fix for conv_fwd
      
      * Add the fix for conv_bwd_data
      
      * Add the fix for conv_bwd_weight
      
      * Format
      
      * Format
      
      * Another format
      
      * Add a macro to use workaround on MI200 only
      
      * Format
      
      ---------
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      
      * simplify karg in device/grid of split-k op (#644)
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * use name from tensor layout
      
      * fix 3rd dword of buffer source descriptor (#659)
      
      * add fp64 instances (#658)
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
      
      This reverts commit bb5530af.
      
      * Groupnorm + swish external api (#668)
      
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Refactor instance generation, split into multiple cpp files
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
      
      * add a macro to turn on/off denorm fix (off by default) (#673)

      * add a macro to turn off denorm fix by default

      * expose the macro
      
      ---------
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * fixed quant example (#672)
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * Add dependabot config and pin rocm-docs-core (#663)
      
      * [gtest] suppress unsafe buffer warn (#670)
      
      ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
      
      
      
      * Add memory index guard in wmma device ops (#667)
      
      * Add more macros to turn on/off denorm fix (#678)
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      
      * Fix a typo (#676)
      
      * Add (#677)
      
      * Allow using ROCm release candidate compilers. (#679)
      
      * enable use of rocm5.5 release candidate 4
      
      * upgrade to ROCM5.5 RC5
      
      * try fix the PUB_KEY error, remove the cmake-data package
      
      * upgrade to latest cmake version
      
      * use private dockerhub repo for rocm5.5 rc5
      
      * add missing bracket
      
      * add vector load check
      
      * solve conflicts
      
      ---------
      Co-authored-by: Sam Wu <sjwu@ualberta.ca>
      Co-authored-by: Sam Wu <sam.wu2@amd.com>
      Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      Co-authored-by: Jun Liu <Liu.Jun@amd.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
  21. 29 Mar, 2023 1 commit
  22. 15 Mar, 2023 1 commit
  23. 13 Mar, 2023 1 commit
  24. 06 Mar, 2023 1 commit
  25. 28 Feb, 2023 1 commit
  26. 24 Feb, 2023 1 commit
  27. 16 Feb, 2023 2 commits
  28. 15 Feb, 2023 1 commit
  29. 06 Feb, 2023 1 commit
    • Fix CI issues. (#572) · f73574ff
      Illia Silin authored
      * switch to recent staging compiler as default for CI
      
      * fix the baseline query
      
      * roll back sqlalchemy to version 1.4.46
  30. 31 Jan, 2023 1 commit
  31. 02 Nov, 2022 1 commit
    • Conv perlayer int8 quantization (#471) · 226bc02b
      rocking5566 authored
      * Add conv2d requant example
      
      * Fix bash error
      
      * Rename example
      
      * 1. Rename gemm quantization
      2. Share the requantization lambda function with conv
      
      * Refine declare type
      
      * Add conv bias relu quantization example
      
      * clang format
      
      * Fix compile error due to merge develop
      
      * Fix CI error
      
      * Extract quantization post operation into another file
      
      * Support quantization for non piecewise linear function
      
      * Add instance for conv quantization
      
      * Add convolution quantization factory
      
      * Add convolution quantization client example
      
      * Add more instances with different template parameters
      
      * clang format
      
      * Sync the naming with the develop
      226bc02b
  32. 26 Oct, 2022 1 commit
  33. 03 Oct, 2022 1 commit
    • update document: Readme, contributors, citation, (#463) · 473ba5bc
      Chao Liu authored
      * update cmake script
      
      * update readme
      
      * Update README.md
      
      * add citation
      
      * add images
      
      * Update README.md
      
      * update
      
      * Update README.md
      
      * Update CONTRIBUTORS.md
      
      * Update README.md
      
      * Update CITATION.cff
      
      * Update README.md
      
      * Update CITATION.cff
  34. 21 Sep, 2022 1 commit
    • Build the CK targets only once. (#433) · 85b0920d
      Illia Silin authored
      * build CK only once, use deb package in all subsequent stages
      
      * update jenkins file
      
      * change prefix for build_CK stage
      
      * update writing deb metadata to control file
      
      * update ubuntu source for docker, script syntax for deb package metadata
      
      * try different way to create deb metadata
      
      * clean up DEBIAN before creating one
      
      * fix the CI folder names, fix splitK qa
      
      * use correct docker in all stages, separate tests for splitK verification and performance
      
      * clean old comments, change dir before packaging
      
      * use different package syntax
      
      * change packaging syntax
      
      * package with cmake
      
      * remove unnecessary build prefix
      
      * get rid of unnecessary paths
      
      * change paths during unpacking
      
      * change script syntax while unpacking
      
      * get rid of unnecessary steps
      
      * get rid of comments in the scripts
      
      * use double quotes for scripts
      
      * add ccache during build, try dpkg -x
      
      * pull and install each package separately
      
      * use full package names
      
      * try to use stashing for packages
      
      * change stash/unstash syntax
      
      * move unstash out of shell, run tests on any gpu node
      
      * unpack each package separately
      
      * try re-using existing workspace
      
      * merge the build and test stages, only stash ckProfiler
      
      * merge the build and test stages, only stash zipped ckProfiler
      
      * fix syntax
      
      * add GPU check before build and test, rename docker to usual name
  35. 13 Sep, 2022 1 commit
    • Upgrade the OS and ROCM versions. (#411) · b22ebd44
      Illia Silin authored
      * upgrade the OS and ROCM versions in CK docker
      
      * add cxx flags to link code with rocm5.2 and ck-9110 compiler
      
      * rename the docker image
      
      * run ONNX gemms using init=1
  36. 07 Sep, 2022 1 commit