- 15 Dec, 2023 1 commit
-
-
trixirt authored
Copied from the llvm-project LLVM_PARALLEL_*_JOBS Concurrent linking can break the build as well as having too many compile jobs for the avaiable memory. These options allow the user to fine tune the build to fit within their machines memory constraints. An example use on linux is COMPILE_JOBS=`cat /proc/cpuinfo | grep -m 1 'cpu cores' | awk '{ print $4 }'` if [ ${COMPILE_JOBS}x = x ]; then COMPILE_JOBS=1 fi BUILD_MEM=4 MEM_KB=0 MEM_KB=`cat /proc/meminfo | grep MemTotal | awk '{ print $2 }'` MEM_MB=`eval "expr ${MEM_KB} / 1024"` MEM_GB=`eval "expr ${MEM_MB} / 1024"` COMPILE_JOBS_MEM=`eval "expr 1 + ${MEM_GB} / ${BUILD_MEM}"` if [ "$COMPILE_JOBS_MEM" -lt "$COMPILE_JOBS" ]; then COMPILE_JOBS=$COMPILE_JOBS_MEM fi LINK_MEM=32 LINK_JOBS=`eval "expr 1 + ${MEM_GB} / ${LINK_MEM}"` cmake -G Ninja -DCK_PARALLEL_LINK_JOBS=$LINK_JOBS -DCK_PARALLEL_COMPILE_JOBS=$COMPILE_JOBS Signed-off-by:Tom Rix <trix@redhat.com>
-
- 05 Dec, 2023 1 commit
-
-
Illia Silin authored
* add daily build with mainline compiler * fix the compiler paths for ci * remove the -flto flag * build with clang by default
-
- 30 Oct, 2023 1 commit
-
-
Illia Silin authored
* replace ccache with sccache, pin package versions * put ccache back temporarily to avoid breaking other CI jobs * add sccashe_wrapper.sh script * fix the package version syntax * fix the pymysql package issue * run sccache_wrapper before build if ccache server found * set the paths before calling the sccache_wrapper * use /tmp instead of /usr/local for cache * try using sccache --start-server instead of wrapper * try using redis server with sccache * define SCCACHE_REDIS * add redis and ping packages, and redis port * use the new sccache redis server * do not use sccache with staging compiler * fix the condition syntax * add stunnel to redis * add tunnel verification * separate caches for different architectures * fix syntax for the cache tag * quse double brackets for conditions * add bash line to the script * add a switch for sccache and only use it in build stage * run check_host function when enabling sccache * fix the invocation tags for sccache * fix groovy syntax * set the invocation tag in groovy * disable sccache in clang-format stage * try another syntax for invocation tags * use local sccache server if can't connect to redis * fix script syntax * update README * refresh readme * readme updates * remove the timing and verification caveat from readme --------- Co-authored-by:Lisa Delaney <lisa.delaney@amd.com>
-
- 18 Oct, 2023 1 commit
-
-
zjing14 authored
* Add a condition to build fp8 instances * simplified buffer_load/store * add bfp8/fp8 * fixed * remove all f8/bf8 condition include folder * fixed cmake conditions * fixed DTYPES=fp16/bfp16 * fix * fixed buffer_load * fixed buffer_store * fix * clean example cmake files * fixed ci * fixed cit --------- Co-authored-by:
Rostyslav Geyyer <rosty.geyyer@amd.com> Co-authored-by:
Jing Zhang <jizha@amd.com>
-
- 02 Oct, 2023 1 commit
-
-
Illia Silin authored
-
- 27 Sep, 2023 1 commit
-
-
Illia Silin authored
* split ckProfiler gfx9 package into gfx90 and gfx94 * use lower case for package names
-
- 26 Sep, 2023 2 commits
-
-
Illia Silin authored
-
Illia Silin authored
* split the types in gemm_bilinear instances, add condition to cmake policy * fix syntax * split the data types in batchnorm examples * fix the batchnorm_bwd test * fix types in the batchnorm_bwd test
-
- 21 Sep, 2023 1 commit
-
-
Illia Silin authored
* refactor cmake files for the tests * refactor cmake files for examples * fix cmake for gemm example * fix the cmake file for all examples * add splitting by data types in gemm_splitk instance header * rename test to reflect only dl instances are used * clean up CI workspace, update cmake for instances * change the jenkinsfile syntax * build all instances except DL on gfx11 * move workspace cleanup after stages * clean up workspace after every stage * isolate data types in grouped_conv_fwd header * isolate dl instances for grouped_conv2d_fwd * fix syntax * fix cmake and batchnorm instances * fix typo * fix reduction instances * fix grouped_conv headers * fix syntax * replace parsing logic for instances, replace bfp16 with bf16 * fix the client examples build * clean up DTYPES from instances cmake files * update the parsing logic in cmake files * make an exception for reduction kernels * update few remaining cmake files to handle DTYPES * fix syntax * fix cmake conflicts * replace f8 with fp8 test name * resolve conflicts for dpp instances
-
- 19 Sep, 2023 1 commit
-
-
Illia Silin authored
-
- 13 Sep, 2023 1 commit
-
-
Jun Liu authored
-
- 12 Sep, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Refactor f8_t to add bf8_t * Add check_err impl for f8_t * Update fp8 test * Format * Revert the fix * Update vector_type implementation * Add bf8 test * Add bf8, use BitInt types * Add bf8 conversion methods * Update type_convert for fp8/bf8 * Add check_err fp8/bf8 support * Add subnorm fp8 tests * Add subnorm bf8 tests * Fix conversion * Add bf8 cmake bindings * Add macros to enable build with disabled fp8/bf8 * Remove is_native method * Update flag combination for mixed precision instances * Add more flag checks * Add another flag to a client example * Add type traits, decouple f8/bf8 casting * Clean up * Decouple fp8 and bf8 flags * Remove more redundant flags * Remove leftover comments
-
- 04 Sep, 2023 1 commit
-
-
Lauren Wrubleski authored
-
- 23 Aug, 2023 1 commit
-
-
Jun Liu authored
* experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file
-
- 09 Aug, 2023 2 commits
-
-
Illia Silin authored
* add fno-offload-uniform-block flag for rocm5.7 and up * add a comment and compiler ticket number * update the threshold rocm version
-
Illia Silin authored
* add -fno-offload-uniform-block flag for rocm5.7 and up * add a comment and compiler ticket number
-
- 03 Aug, 2023 1 commit
-
-
Illia Silin authored
-
- 26 Jul, 2023 1 commit
-
-
Illia Silin authored
-
- 21 Jul, 2023 1 commit
-
-
Illia Silin authored
-
- 18 Jul, 2023 1 commit
-
-
Illia Silin authored
* allow building CK for specific data types * add CI build and test stage on Naiv3x without some int8 instances * add missing gemm fp16 instances * add the changes to the missed cmake file * add empty lines at end of source files * Do not build quantization client example on navi3 in CI * disable batched_gemm_multi_d_int8 instances with DTYPES * disable device_conv2d_bwd_data_instance with DTYPES * fix ckprofiler for conv_bwd_data for int8 * properly isolate the conv_bwd_data int8 instances * remove empty line
-
- 17 Jul, 2023 1 commit
-
-
Illia Silin authored
* check if gpu_targets are supported by compiler * set default list of targets and filter for them
-
- 29 Mar, 2023 1 commit
-
-
Haocong WANG authored
* Add CMake Option "USE_OPT_NAVI3X" * remove navi3x opt compile option from cmake script
-
- 10 Nov, 2022 1 commit
-
-
Lauren Wrubleski authored
* Add packages for example and profiler * correct TEST_NAME -> EXAMPLE_NAME
-
- 27 Oct, 2022 1 commit
-
-
Illia Silin authored
* reduce the number of default targets * re-write the setting of target flags * move all options to one place * add new custom target instances for installing CK
-
- 26 Aug, 2022 1 commit
-
-
Illia Silin authored
* replace hipcc compiler with clang++ * build client app with hipcc * build client app with clang * add an option to build with hipcc ro clang * fix the environment for client app * fix setting up compiler in cmake_build * change the way the compiler is set
-
- 18 Aug, 2022 1 commit
-
-
Adam Osewski authored
* Introduce int4 data type. * Add unit-tests for int4 * Compile int4 UT only when int4 enabled. * clang-format Co-authored-by:Adam Osewski <aosewski@amd.com>
-
- 13 Aug, 2022 1 commit
-
-
Anthony Chang authored
* initial stub for gemm_gemm_xdl_cshuffle * set up example code * compiles * prevent integer overflow * harmonize interface between ref_gemm and ref_batched_gemm * batched_gemm_gemm * fix example * host tensor gen: diagonal pattern in lowest two-dimensions only * make c descriptors containing only integral constants * clean up * add BlockwiseGemmXdlops_v2 while exploring an unified approach * implement proper interface * tidy up example * fix compilation warnings * coarsely controlled 2nd gemm padding * remove rocm-cmake's hard requirement for certain revision * clang-format * resolve merge conflict * fix compilation error on gfx10 * adds acc0 elementwise op to interface * attention host validation * add blockwsie softmax v1 * iteratively update softmax+gemm * transpose both gemm0 and gemm1 xdl output so as to avoid broadcasting softmax max/sum * add init method for easier debugging * do away with manual thread cluster calculation * generalize blockwise softmax interface * row-wise softmax sum & max * format * rename to DeviceBatchedGemmSoftmaxGemm * add gemm_softmax_gemm instances and tests * comment Co-authored-by:
ltqin <letao.qin@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 30 Jun, 2022 1 commit
-
-
Liam Wrubleski authored
-
- 25 Jun, 2022 2 commits
-
-
Liam Wrubleski authored
* Switch to standard ROCm packaging * Revert .gitignore changes * install new rocm-cmake version * update readme Co-authored-by:
illsilin <Illia.Silin@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
Chao Liu authored
* ad gelu and fast_gelu * added GeLU and fast GeLU * clean up * add gemm+fastgelu example * add gemm+gelu instances * update profiler * clean up * clean up * adding gemm+bias+activation * clean * adding bias * clean * adding gemm multiple d * debugging * add gemm bias add fastgelu * rename, clean * refactoring; add readme * refactor * refactor * refactor * refactor * refactor * refactor * fix * fix * update example * update example * rename * update example * add ckProfiler * clean * clean * clean * clean * add client app example * update readme * delete obselete files * remove old client app * delete old file * cleaning * clean * remove half * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path for all examples * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * revert client app example * clean build * fix build * temporary disable client test on Jenkins * clean * clean * clean
-
- 20 May, 2022 1 commit
-
-
Chao Liu authored
-
- 13 May, 2022 1 commit
-
-
Anthony Chang authored
* validate examples in ctest runs * format * fix usage of check_err * amend * add example codes to custom target 'check' Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 12 May, 2022 1 commit
-
-
JD authored
* Add host API * manually rebase on develop * clean * manually rebase on develop * exclude tests from all target * address review comments * update client app name * fix missing lib name * clang-format update * refactor * refactor * refactor * refactor * refactor * fix test issue * refactor * refactor * refactor * upate cmake and readme Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 30 Apr, 2022 1 commit
-
-
Adam Osewski authored
* Use googletest for tests. Add conv2d_fwd UT. * Add conv1D/3D to gtest UT. * Fix: not duplicate test with CTest. * Convert more tests to googltests. * Fix: GIT_SHALLOW is not allowed for git commit hash. * Clang-format * use integer value for GEMM test Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Chao Liu <lc.roy86@gmail.com>
-
- 09 Mar, 2022 1 commit
-
-
Chao Liu authored
* delete obselete files * move files * build * update cmake * update cmake * fix build * reorg examples * update cmake for example and test
-
- 03 Mar, 2022 1 commit
-
-
JD authored
* add docker file and make default target buildable * add Jenkinsfile * remove empty env block * fix package stage * remove render group from docker run * clean up Jenkins file * add cppcheck as dev dependency * update cmake file * Add profiler build stage * add hip_version config file for reduction operator * correct jenkins var name * Build release instead of debug * Update test CMakeLists.txt reorg test dir add test stage * reduce compile threads to prevent compiler crash * add optional debug stage, update second test * remove old test target * fix tests to return proper results and self review * Fix package name and make test run without args * change Dockerfile to ues rocm4.3.1 * remove parallelism from build * Lower paralellism Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 19 Feb, 2022 1 commit
-
-
JD authored
* add docker file and make default target buildable * add Jenkinsfile * remove empty env block * fix package stage * remove render group from docker run * clean up Jenkins file * add cppcheck as dev dependency * update cmake file * Add profiler build stage * add hip_version config file for reduction operator * correct jenkins var name * Build release instead of debug * clean up Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 07 Feb, 2022 1 commit
-
-
Chao Liu authored
* tweak conv for odd C * update script * clean up elementwise op * fix build * clean up * added example for gemm+bias+relu+add * added example for gemm+bias+relu * add profiler for gemm_s_shuffle; re-org files * add profiler * fix build * clean up * clean up * clean up * fix build
-
- 30 Nov, 2021 1 commit
-
-
Chao Liu authored
-
- 14 Nov, 2021 1 commit
-
-
Chao Liu authored
* add DeviceGemmXdl * update script * fix naming issue * fix comment * output HostTensorDescriptor * rename * padded GEMM for fwd v4r4r4 nhwc * refactor * refactor * refactor * adding ckProfiler * adding ckProfiler * refactor * fix tuning parameter bug * add more gemm instances * add more fp16 GEMM instances * fix profiler driver * fix bug in tuning parameter * add fp32 gemm instances * small fix * refactor * rename * refactor gemm profiler; adding DeviceConv and conv profiler * refactor * fix * add conv profiler * refactor * adding more GEMM and Conv instance * Create README.md Add build instruction for ckProfiler * Create README.md Add Readme for gemm_xdl example * Update README.md Remove build instruction from top most folder * Update README.md * clean up
-