- 09 Nov, 2023 1 commit
-
-
Illia Silin authored
-
- 07 Nov, 2023 1 commit
-
-
zjing14 authored
* improve kpad * more tuning parameters * f16_f8_fp16 * cut test time * add f16_f8_fp16 * add f16_f8_f16 * testing instances for skinny cases * format * clean * add fp16_f8_fp16 * clang-format * add grouped gemm instalces * fixed profile grouped_gemm * clean * clean * clean * clean * clean * add missing instance func * fixed inferface --------- Co-authored-by:
Jing Zhang <jizha@amd.com> Co-authored-by:
root <root@sh5-1e707-rc06-38.mkm.dcgpu>
-
- 30 Oct, 2023 1 commit
-
-
Illia Silin authored
* replace ccache with sccache, pin package versions * put ccache back temporarily to avoid breaking other CI jobs * add sccashe_wrapper.sh script * fix the package version syntax * fix the pymysql package issue * run sccache_wrapper before build if ccache server found * set the paths before calling the sccache_wrapper * use /tmp instead of /usr/local for cache * try using sccache --start-server instead of wrapper * try using redis server with sccache * define SCCACHE_REDIS * add redis and ping packages, and redis port * use the new sccache redis server * do not use sccache with staging compiler * fix the condition syntax * add stunnel to redis * add tunnel verification * separate caches for different architectures * fix syntax for the cache tag * quse double brackets for conditions * add bash line to the script * add a switch for sccache and only use it in build stage * run check_host function when enabling sccache * fix the invocation tags for sccache * fix groovy syntax * set the invocation tag in groovy * disable sccache in clang-format stage * try another syntax for invocation tags * use local sccache server if can't connect to redis * fix script syntax * update README * refresh readme * readme updates * remove the timing and verification caveat from readme --------- Co-authored-by:Lisa Delaney <lisa.delaney@amd.com>
-
- 11 Oct, 2023 2 commits
-
-
Adam Osewski authored
* Introduce LocalBlockToCTileMap. * Change the signature of CalculateBottomIndex() function which now does not accept any argument. The B2C map which is already passed as an argument to the kernel Run function is calculating block's local id already outside at kernel entry point __global__ function. The LocalB2C map stores as members local block ID. * Use LocalBlockToCTile map in device ops. * First draft of tile loop work distribution. * Fix typo. * Simplify kernel arguments. Calculate descriptors & B2C maps on the device. * Use looping kernel. * Fix B2C constructor. * Fix Navi21 errors. * Calculate tile start/end in device kernel. * Change Run API to accept user provided workspace buffer. * Add new line at EOF. * Move Gemm KernelArguments to device op interface. * Remove unused code. * Update API. * Launch grid size which is min of occupancy vs tile count * Get back to use constant memory for gemm descriptors. * Remove unused code. * Add default virtual method implementation. * Update comments to conform with doxygen style. * Fix doc style and unused parameters. * Add thread cluster lengths to kernel name. * Remove old splitk impl and replace it with tile looping one. * Modify instances. * set KPerBlock to 64 * maximize wherever possible vector load size. * Fix instances cluster lengths. * Change comment style. * Use 128b store where possible in instances. * Update test cases, since KPerBlock has doubled. * Update output stream operator for Sequence. * Add pipeline version to GroupedGEMM device op type string. * Fix pipeline version type logging. * Fix input tensors type after merge. * Fix compiler error. * Fix output stream operator for Pipeline version. * Store using 128b. * Set of instances with kpb 32/64 * Limit number of instances * Remove commented out instances. * Fix function name. * Limit the number of instances. Add pipline version to the regular instances * Change thr cluster layout for reading B tensor. * disabled failed instances --------- Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com> Co-authored-by:
Jing Zhang <jizha@amd.com>
- 31 Aug, 2023 1 commit
-
-
zjing14 authored
* move all arguments into device * add b2c_tile_map * add examples * add SetDeviceKernelArgs * dedicated fixed_nk solution * init client api * add grouped_gemm_bias example * add a instance * add instances * formatting * fixed cmake * Update EnableCompilerWarnings.cmake * Update cmake-ck-dev.sh * clean; fixed comments * fixed comment * add instances for fp32 output * add instances for fp32 output * add fp32 out client example * fixed CI * init commit for kbatch * add splitk gridwise * format * fixed * clean deviceop * clean code * finish splitk * fixed instances * change m_loops to tile_loops * add setkbatch * clean code * add splitK+bias * add instances * opt mk_nk instances * clean examples * fixed CI * remove zero * finished non-zero * clean * clean code * optimized global_barrier * fixed ci * fixed CI * removed AddBias * format * fixed CI * fixed CI * move 20_grouped_gemm to 21_grouped_gemm --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 23 Aug, 2023 1 commit
-
-
Jun Liu authored
* experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file
-
- 27 Jul, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add s_nops after v_dot to avoid hazard * Fix builtin for inner_produxt fp16 * Skip inline version to builtin * Add comments regarding isa * Fix comment regarding s_nop
-
- 06 Jul, 2023 1 commit
-
-
Adam Osewski authored
* Add basic setup for precommit * Update README.md with instructions on installing precommit hooks --------- Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by:
Bartlomiej Wroblewski <bwroblewski10@gmail.com>
-
- 16 Jun, 2023 1 commit
-
-
Illia Silin authored
-
- 15 Jun, 2023 1 commit
-
-
Illia Silin authored
* enable gfx941/942 targets * fix clang format * fix the cmake logic for multiple targets * fix cmake syntax for looping over targets * add gfx941/942 support for gemm_xdl instances
-
- 28 Apr, 2023 1 commit
-
-
Illia Silin authored
* enable gfx940 * switch between intrinsic mfma routines on mi100/200 and mi300 * fix mfma_int8 on MI300 * disable 2 int8 examples on MI300 * Update cmake-ck-dev.sh * restore gitignore file * modify Jenkinsfile to the internal repo --------- Co-authored-by:
Jing Zhang <jizha@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
- 29 Mar, 2023 1 commit
-
-
Haocong WANG authored
* Add CMake Option "USE_OPT_NAVI3X" * remove navi3x opt compile option from cmake script
-
- 15 Mar, 2023 1 commit
-
-
Rostyslav Geyyer authored
Co-authored-by:Rosty Geyyer <rosty.geyyer@amd.com>
-
- 24 Feb, 2023 1 commit
-
-
zjing14 authored
-
- 15 Feb, 2023 1 commit
-
-
zjing14 authored
* add contraction_bilinear * add contraction_scale_xdl_fp64 * reduce tile size to avoid register spill --------- Co-authored-by:root <root@ctr-ubbsmc16.amd.com>
-
- 06 Feb, 2023 1 commit
-
-
Illia Silin authored
* switch to recent staging compiler as default for CI * fix the baseline query * roll back sqlalchemy to version 1.4.46
-
- 02 Nov, 2022 1 commit
-
-
rocking5566 authored
* Add conv2d requant example * Fix bash error * Rename example * 1. Rename gemm quantization 2. shares the requantization lambda function with conv * Refine declare type * Add conv bias relu quantization exmaple * clang format * Fix compile error due to merge develop * Fix CI error * Extract quantization post operation into another file * Support quantization for non piecewise linear function * Add instance for conv quantization * Add convolution quantization factory * Add convolution quantization client example * Add more instances with different template parameters * clang format * Sync the naming with the develop
-
- 26 Oct, 2022 1 commit
-
-
Illia Silin authored
-
- 03 Oct, 2022 1 commit
-
-
Chao Liu authored
* update cmake script * update readme * Update README.md * add citation * add images * Update README.md * update * Update README.md * Update CONTRIBUTORS.md * Update README.md * Update CITATION.cff * Update README.md * Update CITATION.cff
-
- 21 Sep, 2022 1 commit
-
-
Illia Silin authored
* build CK only once, use deb package in all subsequent stages * update jenkins file * change prefix for build_CK stage * update writing deb metadata to control file * update ubuntu source for docker, script syntax for deb package metadata * try different way to create deb metadata * clean up DEBIAN before creating one * fix the CI folder names, fix splitK qa * use correct docker in all stages, separate tests for splitK verification and performance * clean old comments, change dir before packaging * use different package syntax * change packaging syntax * package with cmake * remove unnecessary build prefix * get rid of unnecessary paths * change paths during unpacking * change script syntax while unpacking * get rid of unneccesary steps * get rid of comments in the scripts * use double quotes for scripts * add ccache during build, try dpkg -x * pull and install each package separately * use full package names * try to use stashing for packages * change stash/unstash syntax * move unstash out of shell, run tests on any gpu node * unpack each package separately * try re-using existing workspace * merge the build and test stages, only stash ckProfiler * merge the build and test stages, only stash zipped ckProfiler * fix syntax * add GPU check before build and test, rename docker to usual name
-
- 13 Sep, 2022 1 commit
-
-
Illia Silin authored
* upgrade the OS and ROCM versions in CK docker * add cxx flags to link code with rocm5.2 and ck-9110 compiler * rename the docker image * run ONNX gemms using init=1
-
- 07 Sep, 2022 1 commit
-
-
Illia Silin authored
* add processing for the onng_gemm and splitK_gemm * add profile_onnx_gemm.sh * add stderr to logfiles, add splitK and onnx gemm parsing * enable splitK gemm wresults posting to db
-
- 26 Aug, 2022 1 commit
-
-
zjing14 authored
* add scripts * fixed splitK_gemm_fp32 * clean * clean
-
- 25 Aug, 2022 2 commits
-
-
Adam Osewski authored
* Grouped GEmm int4. * Formatting + fix K dimension for int8. * Batched Gemm int4 example. * CGEMM int4 example. * Include inc filese in clang-format. * SplitK int4 example * Refactoring of performance measurement. * Fix #ifdef statements. Co-authored-by:Adam Osewski <aosewski@amd.com>
-
zjing14 authored
-
- 08 Aug, 2022 1 commit
-
-
Illia Silin authored
* allow selecting compiler version * fix typo * add Wno-deprecated flag for google tests * change git repo, fix qa log files names * change the git clone syntax * use Omkar's git credentials * try to use jenkins as git user * try using illsilin username for gerrit repo with ssh key * try new gerrit authorization * change ssh key syntax * try another way of passing ssh key to docker * add mount ssh in dockerfile * create .ssh folder * move ssh-keyscan to later * get rid of npm call * build first docker image on master * check the contents of the .ssh folder * try replacing omkars creds with gerrit creds * use open repo, clean up changes * get rid of ssh default argument
-
- 02 Aug, 2022 1 commit
-
-
Illia Silin authored
* turn on full qa only on gfx90a, use int initialization * change script syntax * update script parsing clinfo, throw exception if 0 devices * fix syntax * try using toBoolean for the QA conditions * run regular CI on MI100 only, use MI200 only for daily QA * evaluate when conditions before agent * launch QA on develop branch and update profile_reduce script * update test script * update script * remove false dependency from dockerfile * try removing rbuild completely Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Chao Liu <lc.roy86@gmail.com>
-
- 21 Jul, 2022 1 commit
-
-
Illia Silin authored
* add verify flag and update scripts * replace old check_error function with the new check_err * fix syntax * remove blank spaces * remove empty line * add check_err for tensors * fix syntax * replace tensors with vectors in check_err calls * fix syntax * remove blank spaces * fix syntax * add new line at end of file * disable conv2d_bwd_weight test, add gpu check * set check_gpu using export * check GPU using runShell * add definition of runShell * fix script syntax * reduce the number of threads, add full qa option * run processing scripts in bash * fix the branch and host names in performance scripts, add chronos * replace parameterizedCron with cron * archive the perf log files * try to fix git call * pass branch and host names as arguments into scripts * fix script arguments * fix script arguments * process results on master * fix pipeline * add definition of gpu_arch * run processing scripts in docker * fix the brackets * add agent master for the processing stage * get rid of show_node_info call on master * try using mici label instead of master, disable MI100 tests for now * fix syntax * simplify container for results processing * remove node(master) from the process_results stage * put all stages in original order * change the agent label from master to mici for gfx908
-
- 13 Jul, 2022 1 commit
-
-
Illia Silin authored
* adding scripts for full perf test suite * uncomment the sql queries * fix typo and chmod a+x for scripts * dos2unix for all new scripts * disable verification in full performance test * fix reduction scripts, add gfrouped_gemm hotfix * fix the grouped_gemm hotfix and only run reduction for fp16 * change compiler flag syntax * fix syntax * add predefinition of dockerArgs * avoid redefinitions of dockerArgs * add blank space at the end of dockerArgs * try to build with release compiler * adding spaces inside if condition * limit the number of threads for building 9110 compiler * change the way HIP_CLANG_PATH is set * remove the export command * change the conditional ENV syntax * set HIP_CLANG_PATH at docker run time * update scripts for full qa * enable the sql write query * fix typo * remove a comment from a script
-
- 01 Jul, 2022 1 commit
-
-
Chao Liu authored
* interface for GEMM and GEMM+add+add+fastgelu * rename namespace * instance factory * fix build * fix build; add GEMM client example * clean
-
- 23 Jun, 2022 1 commit
-
-
Adam Osewski authored
* UniforFill with integer values. * Log tested instance type string. * Add UT for all convolution specializations. * debugging conv * Fix dangling reference bug. * Small refinements. * Fix call to error checking function. * Small refinements to tests. * Configure error tolerance * Change problem size. * Remove OddC case from types that do not support it. * Add helper traits for AccumulatorDataType. * Print first 5 errs in check_err for integral types. * Rename FillUniform to FillUniformDistribution * Refactor * Do not use typed tests. * Instead use plain fixture class with templatized member functions. * Initialize tensors with integer values. * Refine test instances. * Properly set accumulator data type. * Add another "big" instance. * Refactor convolution tests. * Revert "debugging conv" This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb. * Add pragma once + format + small refinement. * Fix some unwanted changes. * Clang-format * Fix profile_convnd to use renamed tensor initializer. * Add instances for ConvFWDND kernel case 2D * Helpers to get ConvNDFwd 2D instances. * Refactoring. * Remove "small block" instance as it was generating compiler errors. * Remove default template parameters values. * Refine and fix test. * Fix problem with default template parameter types. * Adjust error thresholds for floating point values test. * Use integer values initialization for instances test. * Add tests for ConvNDFwd 2D case. * Remove AccumulatorDataType type trait. * Update unit-tests. * Remove operator<< overload. * Unlock conv1d/3d nd fwd instances. * Enable skipping calculating reference using flag. * Fix number of channels for first ResNet50 layer. * Clang-format. Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 21 Jun, 2022 1 commit
-
-
Chao Liu authored
-
- 16 Jun, 2022 1 commit
-
-
Illia Silin authored
* use pre-built docker instead of building a new one * try docker.image.pull * change syntax in docker.image() * add 30 min timeout * increase timeout to 3 hours * move performance tests to first stage for testing * set image variable to the new container name * update image name * check available images * check available images in both places * try different image name * use image ID to refer to image * run performance on gfx90a * fix the gpu_arch labeling, add parameter * move env vars out of stages * add stand-alone performance script, MI200 tests, CU numbers * dos2unix for run_perf_tests.sh * try the new git credentials * use env var for git credentials
-
- 10 Jun, 2022 1 commit
-
-
Illia Silin authored
* use pre-built docker instead of building a new one * try docker.image.pull * change syntax in docker.image() * add 30 min timeout * increase timeout to 3 hours * move performance tests to first stage for testing * set image variable to the new container name * update image name * check available images * check available images in both places * try different image name * use image ID to refer to image * run performance on gfx90a * fix the gpu_arch labeling, add parameter * move env vars out of stages * add stand-alone performance script, MI200 tests, CU numbers
-
- 02 Jun, 2022 1 commit
-
-
Illia Silin authored
* add resnet50 test to performance tests * add blanks before gpu_arch in log files * add resnet50 test with N=4 and process its results * add ROCM and HIP versions to test tables * uncomment the sql queries * fix script syntax in jenkinsfile
-
- 24 May, 2022 2 commits
-
-
Qianfeng authored
* Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type * Update to host layer and host reduction * Merge and remove reduction kernels * Merge and remove reduction device interfaces and update pooling device interface * Merge and remove useless reduction device instances * Update to reduction profiler and reduction ctests * Update to reduction and pooling examples and add one reduction example * Change to reduction examples to let them testable by ctest * Add explicit pass checking for reduction and pooling examples * Explicit assignment of tensor shapes in example reduce_blockwise_two_call * Use atomic_add to repace atomicAdd and add atomic_add for double type * Add reduce ctest support for double data type * Replace to_int_vector() by using c++ std::vector::assign() * Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise * Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock * Add GetAtomicOperationZeroValue() support for AtomicMax * Tiny change to reduce example README.md * Fix some tiny issues due to branch merging * Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t * Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64 * Renaming * Clean the header includings in device_reduce instances header files
-
Illia Silin authored
* modify ckProfiler_gemm output * fix syntax * change ckProfiler output and return 0 * fix syntax * output datatype * fix syntax * output datatype in another way * fix syntax * fix syntax * test return values of ckProfiler * add layout info and tests, make sure ckprofiler returns 0 * fix syntax * change layout output * fix syntax * fix syntax again * update script to process perf results * rearrange jenkins stages * fix typo * add python packages to Docker file * adding setuptools-rust package * modify parsing for new test parameters * test db credentials on jenkins * fix syntax * update python script to handle incomplete lines * ungrade python to 3.8 and write the gemm_params table * add sqlalchemy package to docker * move perf data processing to master node * move the master node inside a steps region * add new stage for result processing * move results processing to separate stage * reduce number of tests to speedup debugging * pass config to processPerfResults stage * run script on master in a docker container * replace show_node_info * try loading docker on master node again * use ansible node instead of master * get rid of pymysql package * try ssh connection using paramiko * put back pymysql * put the perf data processing back on the gpu node * put back artifact definition * archive the perf_log before parsing * clean up jenkinsfile, fix parsing * fix typo * enable all perf tests * put all stages in original order, finalize script * fix gpu_arch version * update parsing script * remove obsolete file causing merge conflict
-
- 08 May, 2022 1 commit
-
-
Illia Silin authored
* add performance test to jenkins pipeline * fix typo * fix the syntax in conv_fwd_util.cpp * fix the error message syntax spacing * fix the error message syntax spacing again * run profile_gemm and archive results * fix typo * try to figure out the paths * try to figure out the paths one more time * skip the copying step * build ckProfiler release only once * change directory using dir * fix dir syntax * change the gemm parameters * do not pipe script output to file * try running ckProfiler directly * fix typo * use set +e * run profile_gemm.sh || true * run multiple gemms and parse results * fix typo in jenkinsfile * fix syntax * add new gemm sizes, update scripts * put all jenkins steps in original order Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Chao Liu <lc.roy86@gmail.com>
-
- 22 Apr, 2022 1 commit
-
-
Adam Osewski authored
-