- 05 Oct, 2022 2 commits
- 04 Oct, 2022 4 commits
- 03 Oct, 2022 3 commits
-
-
Chao Liu authored
* update cmake script * update readme * Update README.md * add citation * add images * Update README.md * update * Update README.md * Update CONTRIBUTORS.md * Update README.md * Update CITATION.cff * Update README.md * Update CITATION.cff * update doc * Update CONTRIBUTORS.md * Update LICENSE * update
-
Chao Liu authored
* update cmake script * update readme * Update README.md * add citation * add images * Update README.md * update * Update README.md * Update CONTRIBUTORS.md * Update README.md * Update CITATION.cff * Update README.md * Update CITATION.cff * update doc * Update CONTRIBUTORS.md * Update LICENSE
-
Chao Liu authored
* update cmake script * update readme * Update README.md * add citation * add images * Update README.md * update * Update README.md * Update CONTRIBUTORS.md * Update README.md * Update CITATION.cff * Update README.md * Update CITATION.cff
-
- 01 Oct, 2022 1 commit
-
-
Illia Silin authored
* enable ccache and decouple it from MIOpen ccache use * fix the ccache check script * use another method to get server name * fix syntax * add quotes around the server name variable * use check_host as function * change syntax * fix syntax * test if server name is parsed correctly * try different syntax * check the env var value * test new check node function * add ROCMVERSION parameter and fix script syntax * fix script syntax * add missing instances of rocm version * install ccache in the docker image * do not check GPU in clang format stage, clean up old code * update defaults and clean up
-
- 28 Sep, 2022 3 commits
- 27 Sep, 2022 2 commits
-
-
Illia Silin authored
* add an option to select specific compiler commit * change the logic of forcing building a docker * add check for compiler commit in dockerfile * compiler check syntax fix * change compiler selection logic * fix the new compiler build issue * set new compiler as default, update dev-requirements * fix jenkins syntax * fix docker syntax * get rid of hipcc.pl editing in jenkinsfile * fix the hipcc.pl in both places * try to fix the 10738 compiler linking bug * fix syntax * use dockerhub to store images * use newer amd-stg-open commit as default
-
Astha Rai authored
-
- 26 Sep, 2022 3 commits
- 25 Sep, 2022 4 commits
- 23 Sep, 2022 1 commit
-
-
JD authored
* fix device instance library to add all instances * remove cppcheck from requirements.txt Co-authored-by:
Jun Liu <Liu.Jun@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 22 Sep, 2022 2 commits
-
-
Chao Liu authored
* fix * fix * add instance
-
Illia Silin authored
* replace obsolete offload-arch flags with GPU_TARGETS * fix a build error for client app * replace commma with semicolon in GPU_TARGETS
-
- 21 Sep, 2022 3 commits
-
-
Lixun Zhang authored
-
Illia Silin authored
* build CK only once, use deb package in all subsequent stages * update jenkins file * change prefix for build_CK stage * update writing deb metadata to control file * update ubuntu source for docker, script syntax for deb package metadata * try different way to create deb metadata * clean up DEBIAN before creating one * fix the CI folder names, fix splitK qa * use correct docker in all stages, separate tests for splitK verification and performance * clean old comments, change dir before packaging * use different package syntax * change packaging syntax * package with cmake * remove unnecessary build prefix * get rid of unnecessary paths * change paths during unpacking * change script syntax while unpacking * get rid of unneccesary steps * get rid of comments in the scripts * use double quotes for scripts * add ccache during build, try dpkg -x * pull and install each package separately * use full package names * try to use stashing for packages * change stash/unstash syntax * move unstash out of shell, run tests on any gpu node * unpack each package separately * try re-using existing workspace * merge the build and test stages, only stash ckProfiler * merge the build and test stages, only stash zipped ckProfiler * fix syntax * add GPU check before build and test, rename docker to usual name
-
zjing14 authored
-
- 20 Sep, 2022 6 commits
-
-
Chao Liu authored
* fix build * fix build
-
Shaojie WANG authored
* add lower triangle bmm * init code for tile skipping * functionality right with lower triangle mask * add decoder lower triangular mask calculation * use 7*13 group * fix n2 compute error * attention with lower triangle mask with tile skipping * add template to distinguish masking kernel * rename template and remove default template value * remove lower triangle gemm reference struct * add some comments on example * add 10 instance for masking bmm + scale + softmax + bmm + permute kernels * add test * add test file * add gtest for bmm masking scale softmax bmm permute * clang-format * fix compile error * check lef bottom corner for tile skipping * fix error: check left bottom corner for tile skipping * add k padding * add test and instance for MNK padding * passing a mask struct * fix instances * delete used comments * format Co-authored-by:
danyao12 <yaodan@dc-smc-13.amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
Illia Silin authored
-
rocking5566 authored
* Add groupnorm example by layernorm 1. Reference is not ready 2. shape of gamma and beta need to be fix * Let shape of gamma and beta can be same as x * Modify test, instance and client example * [What] Fix bug of layernorm for greater than 2 dimension. [Why] We need to get upper length from merge transform instead of embed transform. * Add reference for groupnorm * Fuse sigmoid after groupnorm * [What] Rename original layernorm into layernorm2d [Why] Prepare to add groupnorm using layernorm5d * clang-format * Add groupnorm test * Refine error message * Add groupnorm ckProfiler * Test groupnorm kernel from device_instance * update example * upadte profiler * Fix test naming * Fix argc number * Move descriptor and sweeponce to argument for quick debugging Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
Po Yen Chen authored
* Add example folder for 'DeviceElementwise' * Re-structure example files * Move common parts into common.hpp * Use more strict input * Add more helper methods in 'DeviceElementwise' * Use more specific method to write example * Allow specify problem through command line argument * Allow specify problem 'axes' through command line argument * Add check to template type argument * Add transpose_shape() to generalize shape permute * Generalize transpose utility functions * Use better name for tensor indices * Add checks in helper functions * Remove debug messages * Refine error message for check_err() * Generalize variable naming in example code * Add device op 'DevicePermute' This device op is clone of 'DeviceElementwise' * Use 'DevicePermute' device op in example * Remove 'elementwise' from identifiers * Remove 'elementwise' from file paths * Remove base class of 'DevicePermute' * Let 'DevicePermute' ...
-
Anthony Chang authored
* sanity check * add attribution * add irrgular k tile size for batched attention * format
-
- 19 Sep, 2022 3 commits
-
-
Anthony Chang authored
-
Anthony Chang authored
* grouped attn without batch validates; now move toward grouped batched attn * grouped batched attention * working * remove debug logging clean up clean up * reintroduce g_ prefix back to host tensor variables * format * rename file * restore old file * rename * consolidate padded/non-padded attention example * harmonize padding specialization in attn examples
-
Shaojie WANG authored
* init commit of convnd bwd data * begin compiling example * have a first version that produce a right result * refine device level launch kernel code * add more instances in example and get right results * clang-format * format example file * add more instances * fix instances * adding conv_bwd_data multile_d * adding conv_bwd_data multile_d * adding conv_bwd multiple d * adding conv_bwd multiple d * adding conv_bwd multiple d * refactor * refactor * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * refactor * update conv fwd's bias impl * refactor * reorg file * clean up cmake * clean * clean * clean Co-authored-by:
Chao Liu <lc.roy86@gmail.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 16 Sep, 2022 1 commit
-
-
Chao Liu authored
-
- 14 Sep, 2022 1 commit
-
-
ltqin authored
* refactor * start * add device gemm file * add BatchStrideD0 * add stridd0 * add gridwise file * add d0 parameters to gridwise gemm * add c layout transformer * add d0 threadwise copy * init kernel * init kernel * regular code * nm desc put to out * kernel parameter can not use reference * host add bias+gelu * run right for bias+gelu * change AddFastGelu into another file * interface add d1 bias parameters * add d1 parameter to argument * add d1 parameter to gridwise * first all code,not verify * gelu change to relu and GetElementSpaceSize bug * add instance * start add to ckprofiler * ckprofiler finish code * change input parameter for ckProfiler * fix host bias+gelu bug * show help for ckProfiler * fix bug for lunch kernel ignore parametes * add pad and fix about bug * mutiple d0 * add dynamic d0_element_op * change profiler and instance to mutiple d0 * example have 2 d0 * remove some comments not using * change 2 d0 have self parameters * change d element_op name * change class name(multiple_d) * fix bug * fix bug that don't find file * update profiler * refactor * update profiler * clean * revert example change * add gon layout * optimize parameter for gno * add gon to gemm+gemm * change helping input parameters * change to GemmPadder_v2 * using ForEach * fix gb_per_sec Co-authored-by:
Chao Liu <lc.roy86@gmail.com> Co-authored-by:
ltqin <letaoqin@amd.com>
-
- 13 Sep, 2022 1 commit
-
-
Illia Silin authored
* upgrade the OS and ROCM versions in CK docker * add cxx flags to link code with rocm5.2 and ck-9110 compiler * rename the docker image * run ONNX gemms using init=1
-