1. 04 Dec, 2023 1 commit
  2. 03 Dec, 2023 1 commit
  3. 09 Nov, 2023 1 commit
  4. 07 Nov, 2023 1 commit
  5. 30 Oct, 2023 1 commit
    • Illia Silin's avatar
      Enable sccache in the default docker and CI. (#1009) · 4e44a9e8
      Illia Silin authored
      
      
      * replace ccache with sccache, pin package versions
      
      * put ccache back temporarily to avoid breaking other CI jobs
      
      * add sccashe_wrapper.sh script
      
      * fix the package version syntax
      
      * fix the pymysql package issue
      
      * run sccache_wrapper before build if ccache server found
      
      * set the paths before calling the sccache_wrapper
      
      * use /tmp instead of /usr/local for cache
      
      * try using sccache --start-server instead of wrapper
      
      * try using redis server with sccache
      
      * define SCCACHE_REDIS
      
      * add redis and ping packages, and redis port
      
      * use the new sccache redis server
      
      * do not use sccache with staging compiler
      
      * fix the condition syntax
      
      * add stunnel to redis
      
      * add tunnel verification
      
      * separate caches for different architectures
      
      * fix syntax for the cache tag
      
      * quse double brackets for conditions
      
      * add bash line to the script
      
      * add a switch for sccache and only use it in build stage
      
      * run check_host function when enabling sccache
      
      * fix the invocation tags for sccache
      
      * fix groovy syntax
      
      * set the invocation tag in groovy
      
      * disable sccache in clang-format stage
      
      * try another syntax for invocation tags
      
      * use local sccache server if can't connect to redis
      
      * fix script syntax
      
      * update README
      
      * refresh readme
      
      * readme updates
      
      * remove the timing and verification caveat from readme
      
      ---------
      Co-authored-by: default avatarLisa Delaney <lisa.delaney@amd.com>
      4e44a9e8
  6. 11 Oct, 2023 2 commits
    • zjing14's avatar
      Revert "Grouped Gemm with looping over the tiles. (#788)" (#982) · c99323be
      zjing14 authored
      This reverts commit a4f72a31.
      c99323be
    • Adam Osewski's avatar
      Grouped Gemm with looping over the tiles. (#788) · a4f72a31
      Adam Osewski authored
      
      
      * Introduce LocalBlockToCTileMap.
      
      * Change the signature of CalculateBottomIndex() function which now does
      not accept any argument. The B2C map which is already passed as an
      argument to the kernel Run function is calculating block's local id
      already outside at kernel entry point __global__ function.
      The LocalB2C map stores as members local block ID.
      
      * Use LocalBlockToCTile map in device ops.
      
      * First draft of tile loop work distribution.
      
      * Fix typo.
      
      * Simplify kernel arguments.
      
      Calculate descriptors & B2C maps on the device.
      
      * Use looping kernel.
      
      * Fix B2C constructor.
      
      * Fix Navi21 errors.
      
      * Calculate tile start/end in device kernel.
      
      * Change Run API to accept user provided workspace buffer.
      
      * Add new line at EOF.
      
      * Move Gemm KernelArguments to device op interface.
      
      * Remove unused code.
      
      * Update API.
      
      * Launch grid size which is min of occupancy vs tile count
      
      * Get back to use constant memory for gemm descriptors.
      
      * Remove unused code.
      
      * Add default virtual method implementation.
      
      * Update comments to conform with doxygen style.
      
      * Fix doc style and unused parameters.
      
      * Add thread cluster lengths to kernel name.
      
      * Remove old splitk impl and replace it with tile looping one.
      
      * Modify instances.
      
      * set KPerBlock to 64
      * maximize wherever possible vector load size.
      
      * Fix instances cluster lengths.
      
      * Change comment style.
      
      * Use 128b store where possible in instances.
      
      * Update test cases, since KPerBlock has doubled.
      
      * Update output stream operator for Sequence.
      
      * Add pipeline version to GroupedGEMM device op type string.
      
      * Fix pipeline version type logging.
      
      * Fix input tensors type after merge.
      
      * Fix compiler error.
      
      * Fix output stream operator for Pipeline version.
      
      * Store using 128b.
      
      * Set of instances with kpb 32/64
      
      * Limit number of instances
      
      * Remove commented out instances.
      
      * Fix function name.
      
      * Limit the number of instances.
      
      Add pipline version to the regular instances
      
      * Change thr cluster layout for reading B tensor.
      
      * disabled failed instances
      
      ---------
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      a4f72a31
  7. 31 Aug, 2023 1 commit
    • zjing14's avatar
      Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0
      zjing14 authored
      
      
      * move all arguments into device
      
      * add b2c_tile_map
      
      * add examples
      
      * add SetDeviceKernelArgs
      
      * dedicated fixed_nk solution
      
      * init client api
      
      * add grouped_gemm_bias example
      
      * add a instance
      
      * add instances
      
      * formatting
      
      * fixed cmake
      
      * Update EnableCompilerWarnings.cmake
      
      * Update cmake-ck-dev.sh
      
      * clean; fixed comments
      
      * fixed comment
      
      * add instances for fp32 output
      
      * add instances for fp32 output
      
      * add fp32 out client example
      
      * fixed CI
      
      * init commit for kbatch
      
      * add splitk gridwise
      
      * format
      
      * fixed
      
      * clean deviceop
      
      * clean code
      
      * finish splitk
      
      * fixed instances
      
      * change m_loops to tile_loops
      
      * add setkbatch
      
      * clean code
      
      * add splitK+bias
      
      * add instances
      
      * opt mk_nk instances
      
      * clean examples
      
      * fixed CI
      
      * remove zero
      
      * finished non-zero
      
      * clean
      
      * clean code
      
      * optimized global_barrier
      
      * fixed ci
      
      * fixed CI
      
      * removed AddBias
      
      * format
      
      * fixed CI
      
      * fixed CI
      
      * move 20_grouped_gemm to 21_grouped_gemm
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      f5ec04f0
  8. 23 Aug, 2023 1 commit
    • Jun Liu's avatar
      [HotFix] add config and version files to pass on build info (#856) · c8a8385f
      Jun Liu authored
      * experiment with config file
      
      * experiment with version.h config
      
      * add more info to version.h
      
      * minor updates
      
      * minor updates
      
      * fix case where DTYPE is not used
      
      * large amount of files but minor changes
      
      * remove white space
      
      * minor changes to add more MACROs
      
      * fix cmakedefine01
      
      * fix issue with CK internal conflict
      
      * fix define and define value
      
      * fix clang-format
      
      * fix formatting issue
      
      * experiment with cmake
      
      * clang format v12 to be consistent with miopen
      
      * avoid clang-format for config file
      c8a8385f
  9. 27 Jul, 2023 1 commit
  10. 06 Jul, 2023 1 commit
  11. 16 Jun, 2023 1 commit
  12. 15 Jun, 2023 1 commit
    • Illia Silin's avatar
      Enable gfx941 and gfx942 architectures. (#752) · 027e46ee
      Illia Silin authored
      * enable gfx941/942 targets
      
      * fix clang format
      
      * fix the cmake logic for multiple targets
      
      * fix cmake syntax for looping over targets
      
      * add gfx941/942 support for gemm_xdl instances
      027e46ee
  13. 28 Apr, 2023 1 commit
  14. 29 Mar, 2023 1 commit
  15. 15 Mar, 2023 1 commit
  16. 24 Feb, 2023 1 commit
  17. 15 Feb, 2023 1 commit
  18. 06 Feb, 2023 1 commit
    • Illia Silin's avatar
      Fix CI issues. (#572) · f73574ff
      Illia Silin authored
      * switch to recent staging compiler as default for CI
      
      * fix the baseline query
      
      * roll back sqlalchemy to version 1.4.46
      f73574ff
  19. 02 Nov, 2022 1 commit
    • rocking5566's avatar
      Conv perlayer int8 quantization (#471) · 226bc02b
      rocking5566 authored
      * Add conv2d requant example
      
      * Fix bash error
      
      * Rename example
      
      * 1. Rename gemm quantization
      2. shares the requantization lambda function with conv
      
      * Refine declare type
      
      * Add conv bias relu quantization exmaple
      
      * clang format
      
      * Fix compile error due to merge develop
      
      * Fix CI error
      
      * Extract quantization post operation into another file
      
      * Support quantization for non piecewise linear function
      
      * Add instance for conv quantization
      
      * Add convolution quantization factory
      
      * Add convolution quantization client example
      
      * Add more instances with different template parameters
      
      * clang format
      
      * Sync the naming with the develop
      226bc02b
  20. 26 Oct, 2022 1 commit
  21. 03 Oct, 2022 1 commit
    • Chao Liu's avatar
      update document: Readme, contributors, citation, (#463) · 473ba5bc
      Chao Liu authored
      * update cmake script
      
      * update readme
      
      * Update README.md
      
      * add citation
      
      * add images
      
      * Update README.md
      
      * update
      
      * Update README.md
      
      * Update CONTRIBUTORS.md
      
      * Update README.md
      
      * Update CITATION.cff
      
      * Update README.md
      
      * Update CITATION.cff
      473ba5bc
  22. 21 Sep, 2022 1 commit
    • Illia Silin's avatar
      Build the CK targets only once. (#433) · 85b0920d
      Illia Silin authored
      * build CK only once, use deb package in all subsequent stages
      
      * update jenkins file
      
      * change prefix for build_CK stage
      
      * update writing deb metadata to control file
      
      * update ubuntu source for docker, script syntax for deb package metadata
      
      * try different way to create deb metadata
      
      * clean up DEBIAN before creating one
      
      * fix the CI folder names, fix splitK qa
      
      * use correct docker in all stages, separate tests for splitK verification and performance
      
      * clean old comments, change dir before packaging
      
      * use different package syntax
      
      * change packaging syntax
      
      * package with cmake
      
      * remove unnecessary build prefix
      
      * get rid of unnecessary paths
      
      * change paths during unpacking
      
      * change script syntax while unpacking
      
      * get rid of unneccesary steps
      
      * get rid of comments in the scripts
      
      * use double quotes for scripts
      
      * add ccache during build, try dpkg -x
      
      * pull and install each package separately
      
      * use full package names
      
      * try to use stashing for packages
      
      * change stash/unstash syntax
      
      * move unstash out of shell, run tests on any gpu node
      
      * unpack each package separately
      
      * try re-using existing workspace
      
      * merge the build and test stages, only stash ckProfiler
      
      * merge the build and test stages, only stash zipped ckProfiler
      
      * fix syntax
      
      * add GPU check before build and test, rename docker to usual name
      85b0920d
  23. 13 Sep, 2022 1 commit
    • Illia Silin's avatar
      Upgrade the OS and ROCM versions. (#411) · b22ebd44
      Illia Silin authored
      * upgrade the OS and ROCM versions in CK docker
      
      * add cxx flags to link code with rocm5.2 and ck-9110 compiler
      
      * rename the docker image
      
      * run ONNX gemms using init=1
      b22ebd44
  24. 07 Sep, 2022 1 commit
  25. 26 Aug, 2022 1 commit
  26. 25 Aug, 2022 2 commits
  27. 08 Aug, 2022 1 commit
    • Illia Silin's avatar
      Fix QA, allow switching compiler versions, fix google test compilation error. (#348) · aba7fefc
      Illia Silin authored
      * allow selecting compiler version
      
      * fix typo
      
      * add Wno-deprecated flag for google tests
      
      * change git repo, fix qa log files names
      
      * change the git clone syntax
      
      * use Omkar's git credentials
      
      * try to use jenkins as git user
      
      * try using illsilin username for gerrit repo with ssh key
      
      * try new gerrit authorization
      
      * change ssh key syntax
      
      * try another way of passing ssh key to docker
      
      * add mount ssh in dockerfile
      
      * create .ssh folder
      
      * move ssh-keyscan to later
      
      * get rid of npm call
      
      * build first docker image on master
      
      * check the contents of the .ssh folder
      
      * try replacing omkars creds with gerrit creds
      
      * use open repo, clean up changes
      
      * get rid of ssh default argument
      aba7fefc
  28. 02 Aug, 2022 1 commit
    • Illia Silin's avatar
      Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339) · 984b3722
      Illia Silin authored
      
      
      * turn on full qa only on gfx90a, use int initialization
      
      * change script syntax
      
      * update script parsing clinfo, throw exception if 0 devices
      
      * fix syntax
      
      * try using toBoolean for the QA conditions
      
      * run regular CI on MI100 only, use MI200 only for daily QA
      
      * evaluate when conditions before agent
      
      * launch QA on develop branch and update profile_reduce script
      
      * update test script
      
      * update script
      
      * remove false dependency from dockerfile
      
      * try removing rbuild completely
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarChao Liu <lc.roy86@gmail.com>
      984b3722
  29. 21 Jul, 2022 1 commit
    • Illia Silin's avatar
      Add full QA with verification option, few other changes. (#331) · d8415a96
      Illia Silin authored
      * add verify flag and update scripts
      
      * replace old check_error function with the new check_err
      
      * fix syntax
      
      * remove blank spaces
      
      * remove empty line
      
      * add check_err for tensors
      
      * fix syntax
      
      * replace tensors with vectors in check_err calls
      
      * fix syntax
      
      * remove blank spaces
      
      * fix syntax
      
      * add new line at end of file
      
      * disable conv2d_bwd_weight test, add gpu check
      
      * set check_gpu using export
      
      * check GPU using runShell
      
      * add definition of runShell
      
      * fix script syntax
      
      * reduce the number of threads, add full qa option
      
      * run processing scripts in bash
      
      * fix the branch and host names in performance scripts, add chronos
      
      * replace parameterizedCron with cron
      
      * archive the perf log files
      
      * try to fix git call
      
      * pass branch and host names as arguments into scripts
      
      * fix script arguments
      
      * fix script arguments
      
      * process results on master
      
      * fix pipeline
      
      * add definition of gpu_arch
      
      * run processing scripts in docker
      
      * fix the brackets
      
      * add agent master for the processing stage
      
      * get rid of show_node_info call on master
      
      * try using mici label instead of master, disable MI100 tests for now
      
      * fix syntax
      
      * simplify container for results processing
      
      * remove node(master) from the process_results stage
      
      * put all stages in original order
      
      * change the agent label from master to mici for gfx908
      d8415a96
  30. 13 Jul, 2022 1 commit
    • Illia Silin's avatar
      Add switch between compilers, make 9110 compiler default, add full QA scripts. (#322) · 39acaea3
      Illia Silin authored
      * adding scripts for full perf test suite
      
      * uncomment the sql queries
      
      * fix typo and chmod a+x for scripts
      
      * dos2unix for all new scripts
      
      * disable verification in full performance test
      
      * fix reduction scripts, add gfrouped_gemm hotfix
      
      * fix the grouped_gemm hotfix and only run reduction for fp16
      
      * change compiler flag syntax
      
      * fix syntax
      
      * add predefinition of dockerArgs
      
      * avoid redefinitions of dockerArgs
      
      * add blank space at the end of dockerArgs
      
      * try to build with release compiler
      
      * adding spaces inside if condition
      
      * limit the number of threads for building 9110 compiler
      
      * change the way HIP_CLANG_PATH is set
      
      * remove the export command
      
      * change the conditional ENV syntax
      
      * set HIP_CLANG_PATH at docker run time
      
      * update scripts for full qa
      
      * enable the sql write query
      
      * fix typo
      
      * remove a comment from a script
      39acaea3
  31. 01 Jul, 2022 1 commit
  32. 23 Jun, 2022 1 commit
    • Adam Osewski's avatar
      Testing all fwd convolution specializations. (#259) · a2edd7d8
      Adam Osewski authored
      
      
      * UniforFill with integer values.
      
      * Log tested instance type string.
      
      * Add UT for all convolution specializations.
      
      * debugging conv
      
      * Fix dangling reference bug.
      
      * Small refinements.
      
      * Fix call to error checking function.
      
      * Small refinements to tests.
      
      * Configure error tolerance
      * Change problem size.
      * Remove OddC case from types that do not support it.
      
      * Add helper traits for AccumulatorDataType.
      
      * Print first 5 errs in check_err for integral types.
      
      * Rename FillUniform to FillUniformDistribution
      
      * Refactor
      
      * Do not use typed tests.
      * Instead use plain fixture class with templatized member functions.
      * Initialize tensors with integer values.
      
      * Refine test instances.
      
      * Properly set accumulator data type.
      * Add another "big" instance.
      
      * Refactor convolution tests.
      
      * Revert "debugging conv"
      
      This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.
      
      * Add pragma once + format + small refinement.
      
      * Fix some unwanted changes.
      
      * Clang-format
      
      * Fix profile_convnd to use renamed tensor initializer.
      
      * Add instances for ConvFWDND kernel case 2D
      
      * Helpers to get ConvNDFwd 2D instances.
      
      * Refactoring.
      
      * Remove "small block" instance as it was generating compiler errors.
      * Remove default template parameters values.
      
      * Refine and fix test.
      
      * Fix problem with default template parameter types.
      * Adjust error thresholds for floating point values test.
      * Use integer values initialization for instances test.
      * Add tests for ConvNDFwd 2D case.
      
      * Remove AccumulatorDataType type trait.
      
      * Update unit-tests.
      
      * Remove operator<< overload.
      
      * Unlock conv1d/3d nd fwd instances.
      
      * Enable skipping calculating reference using flag.
      
      * Fix number of channels for first ResNet50 layer.
      
      * Clang-format.
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      a2edd7d8
  33. 21 Jun, 2022 1 commit
  34. 16 Jun, 2022 1 commit
    • Illia Silin's avatar
      Use new github credentials (#278) · fb9b6b1e
      Illia Silin authored
      * use pre-built docker instead of building a new one
      
      * try docker.image.pull
      
      * change syntax in docker.image()
      
      * add 30 min timeout
      
      * increase timeout to 3 hours
      
      * move performance tests to first stage for testing
      
      * set image variable to the new container name
      
      * update image name
      
      * check available images
      
      * check available images in both places
      
      * try different image name
      
      * use image ID to refer to image
      
      * run performance on gfx90a
      
      * fix the gpu_arch labeling, add parameter
      
      * move env vars out of stages
      
      * add stand-alone performance script, MI200 tests, CU numbers
      
      * dos2unix for run_perf_tests.sh
      
      * try the new git credentials
      
      * use env var for git credentials
      fb9b6b1e
  35. 10 Jun, 2022 1 commit
    • Illia Silin's avatar
      Add performance tests on MI200 in CI, reporting number of CUs, add stand-alone perf test. (#277) · 1ced00a5
      Illia Silin authored
      * use pre-built docker instead of building a new one
      
      * try docker.image.pull
      
      * change syntax in docker.image()
      
      * add 30 min timeout
      
      * increase timeout to 3 hours
      
      * move performance tests to first stage for testing
      
      * set image variable to the new container name
      
      * update image name
      
      * check available images
      
      * check available images in both places
      
      * try different image name
      
      * use image ID to refer to image
      
      * run performance on gfx90a
      
      * fix the gpu_arch labeling, add parameter
      
      * move env vars out of stages
      
      * add stand-alone performance script, MI200 tests, CU numbers
      1ced00a5
  36. 02 Jun, 2022 1 commit
    • Illia Silin's avatar
      Adding Resnet50 test to Performance tests (#268) · 1677cf70
      Illia Silin authored
      * add resnet50 test to performance tests
      
      * add blanks before gpu_arch in log files
      
      * add resnet50 test with N=4 and process its results
      
      * add ROCM and HIP versions to test tables
      
      * uncomment the sql queries
      
      * fix script syntax in jenkinsfile
      1677cf70
  37. 24 May, 2022 2 commits
    • Qianfeng's avatar
      Overhaul to Reducton and its dependants (#237) · 63eee2d9
      Qianfeng authored
      * Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type
      
      * Update to host layer and host reduction
      
      * Merge and remove reduction kernels
      
      * Merge and remove reduction device interfaces and update pooling device interface
      
      * Merge and remove useless reduction device instances
      
      * Update to reduction profiler and reduction ctests
      
      * Update to reduction and pooling examples and add one reduction example
      
      * Change to reduction examples to let them testable by ctest
      
      * Add explicit pass checking for reduction and pooling examples
      
      * Explicit assignment of tensor shapes in example reduce_blockwise_two_call
      
      * Use atomic_add to repace atomicAdd and add atomic_add for double type
      
      * Add reduce ctest support for double data type
      
      * Replace to_int_vector() by using c++ std::vector::assign()
      
      * Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise
      
      * Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock
      
      * Add GetAtomicOperationZeroValue() support for AtomicMax
      
      * Tiny change to reduce example README.md
      
      * Fix some tiny issues due to branch merging
      
      * Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t
      
      * Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64
      
      * Renaming
      
      * Clean the header includings in device_reduce instances header files
      63eee2d9
    • Illia Silin's avatar
      Add performance tests as a stage of CI. (#247) · 1085794d
      Illia Silin authored
      * modify ckProfiler_gemm output
      
      * fix syntax
      
      * change ckProfiler output and return 0
      
      * fix syntax
      
      * output datatype
      
      * fix syntax
      
      * output datatype in another way
      
      * fix syntax
      
      * fix syntax
      
      * test return values of ckProfiler
      
      * add layout info and tests, make sure ckprofiler returns 0
      
      * fix syntax
      
      * change layout output
      
      * fix syntax
      
      * fix syntax again
      
      * update script to process perf results
      
      * rearrange jenkins stages
      
      * fix typo
      
      * add python packages to Docker file
      
      * adding setuptools-rust package
      
      * modify parsing for new test parameters
      
      * test db credentials on jenkins
      
      * fix syntax
      
      * update python script to handle incomplete lines
      
      * ungrade python to 3.8 and write the gemm_params table
      
      * add sqlalchemy package to docker
      
      * move perf data processing to master node
      
      * move the master node inside a steps region
      
      * add new stage for result processing
      
      * move results processing to separate stage
      
      * reduce number of tests to speedup debugging
      
      * pass config to processPerfResults stage
      
      * run script on master in a docker container
      
      * replace show_node_info
      
      * try loading docker on master node again
      
      * use ansible node instead of master
      
      * get rid of pymysql package
      
      * try ssh connection using paramiko
      
      * put back pymysql
      
      * put the perf data processing back on the gpu node
      
      * put back artifact definition
      
      * archive the perf_log before parsing
      
      * clean up jenkinsfile, fix parsing
      
      * fix typo
      
      * enable all perf tests
      
      * put all stages in original order, finalize script
      
      * fix gpu_arch version
      
      * update parsing script
      
      * remove obsolete file causing merge conflict
      1085794d