- 08 Nov, 2023 2 commits
- 07 Nov, 2023 5 commits
-
-
arai713 authored
-
zjing14 authored
* improve kpad
* more tuning parameters
* f16_f8_fp16
* cut test time
* add f16_f8_fp16
* add f16_f8_f16
* testing instances for skinny cases
* format
* clean
* add fp16_f8_fp16
* clang-format
* add grouped gemm instances
* fixed profile grouped_gemm
* clean
* clean
* clean
* clean
* clean
* add missing instance func
* fixed interface
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: root <root@sh5-1e707-rc06-38.mkm.dcgpu>
-
Astha Rai authored
-
Astha Rai authored
-
Daming Feng authored
* add compute type check for fp16 in forward convolution instances
* Add compute type check for default compute types
---------
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
-
- 03 Nov, 2023 2 commits
-
-
Illia Silin authored
-
Bartlomiej Wroblewski authored
-
- 02 Nov, 2023 2 commits
-
-
Bartlomiej Wroblewski authored
* Add support for mixed precision in contraction scale and bilinear (#936)
* Extract common functionality to separate files
* Reference contraction: Remove incorrect consts from type_converts
* Reference contraction: Add missing type_convert for dst value
* Reference contraction: Fix incorrect order of B matrix dimensions
* Add support for mixed precision in contraction scale and bilinear
* Move using statements from instances to a common file
* Move using statements from examples to a common file
* Fix the order of B matrix dimensions across examples and profiler
* Fix the computation of error threshold
* Make ComputeDataType an optional argument
* Include possible DataType -> ComputeDataType casting error in the threshold
* Remove commented code
* Make the ComputeDataType an optional argument in instance
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.24.0 to 0.26.0.
  - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
  - [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
  - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.24.0...v0.26.0)
---
updated-dependencies:
  - dependency-name: rocm-docs-core
    dependency-type: direct:production
    update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 01 Nov, 2023 3 commits
-
-
Astha Rai authored
-
Bartłomiej Kocot authored
* Add ScaleAddScaleAddRelu post op for conv fwd
* Fixes
* Fix instance file name
* Minor fix
-
Illia Silin authored
-
- 31 Oct, 2023 4 commits
-
-
Po Yen Chen authored
* Disable the SLP vectorizer to prevent unnecessary wait
* Add comment on the reason for adding the flag
* Fix wording
-
Po Yen Chen authored
* Enable gfx942 support for DeviceGemmXdl<> device op
* Enable gfx941 support for DeviceGemmXdl<> device op
-
Bartłomiej Kocot authored
* Add support for groups in Img2Col/Col2Img
* Fix interface test
* Fix interface test G to N
* Improve performance
* Change gemm layout to 3d
* Fixes
-
Astha Rai authored
-
- 30 Oct, 2023 1 commit
-
-
Illia Silin authored
* replace ccache with sccache, pin package versions
* put ccache back temporarily to avoid breaking other CI jobs
* add sccache_wrapper.sh script
* fix the package version syntax
* fix the pymysql package issue
* run sccache_wrapper before build if ccache server found
* set the paths before calling the sccache_wrapper
* use /tmp instead of /usr/local for cache
* try using sccache --start-server instead of wrapper
* try using redis server with sccache
* define SCCACHE_REDIS
* add redis and ping packages, and redis port
* use the new sccache redis server
* do not use sccache with staging compiler
* fix the condition syntax
* add stunnel to redis
* add tunnel verification
* separate caches for different architectures
* fix syntax for the cache tag
* use double brackets for conditions
* add bash line to the script
* add a switch for sccache and only use it in build stage
* run check_host function when enabling sccache
* fix the invocation tags for sccache
* fix groovy syntax
* set the invocation tag in groovy
* disable sccache in clang-format stage
* try another syntax for invocation tags
* use local sccache server if can't connect to redis
* fix script syntax
* update README
* refresh readme
* readme updates
* remove the timing and verification caveat from readme
---------
Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
-
- 29 Oct, 2023 1 commit
-
-
Astha Rai authored
-
- 28 Oct, 2023 1 commit
-
-
Illia Silin authored
* Fix the fp8 conversion
* Try clipping value before conversion
* Fix return
* Simplify with a const
* reduce the gemm input tensor values to reduce round-off error
* replace if-else with lambda
* fix syntax
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
-
- 26 Oct, 2023 2 commits
-
-
dependabot[bot] authored
Bumps [sphinxcontrib-bibtex](https://github.com/mcmtroffaes/sphinxcontrib-bibtex) from 2.5.0 to 2.6.1.
  - [Changelog](https://github.com/mcmtroffaes/sphinxcontrib-bibtex/blob/develop/CHANGELOG.rst)
  - [Commits](https://github.com/mcmtroffaes/sphinxcontrib-bibtex/compare/2.5.0...2.6.1)
---
updated-dependencies:
  - dependency-name: sphinxcontrib-bibtex
    dependency-type: direct:production
    update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
-
Astha Rai authored
-
- 25 Oct, 2023 2 commits
- 24 Oct, 2023 3 commits
- 23 Oct, 2023 1 commit
-
-
zjing14 authored
* add mnk padding for fp8
* add padding for row_col layout
* added padding for fp32
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
- 21 Oct, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Fix instances dtype check
* Fix source dtypes selector for examples and tests
* Sync with new cmakefile changes
* Remove not needed ifdefs
* Remove not needed ifdefs
-
- 20 Oct, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Fix the conversion
* Add bf8 functionality
* Enable example on MI200 as well
-
- 19 Oct, 2023 7 commits
-
-
Illia Silin authored
* apply the patch for dl kernels on gfx11
* build DL kernels on navi32 CI
-
Qianfeng authored
* reinterpret_cast to const char* in dumpBufferToFile to be compatible with both const and non-const input pointers
* Add seed input to GeneratorTensor_4 for normal_distribution generator
* Add GetTypeString() for DeviceElementwiseImpl
* Add HIP_CHECK_ERROR macro
-
Bartłomiej Kocot authored
* Extend available elementwise operations with conv examples
* Fixes
* Remove not needed convert
* Update CMakeFile and dir name
-
Po Yen Chen authored
* Avoid force setting ENABLE_PIPELINE_V2_OPT to OFF
* Remove compilation option variable MAX_ILP_OPTS
-
Astha Rai authored
-
Bartlomiej Wroblewski authored
-
Astha Rai authored
-
- 18 Oct, 2023 2 commits
-
-
rocking authored
* save mean and inverse std in normalization
* Save mean and inverse std in splitK
* Vector save mean and inv std
* Modify instance for save mean and std
* simplify the layernorm example
* Save mean and std in groupnorm example
* Save mean and inv std in ckProfiler and test
* Remove compute data type from base class
* Save mean and inv std in client example
* Add changelog
* clang format
* Fix compile error
* Refine naming
* Avoid error in bf16
* revert changelog
-
zjing14 authored
Co-authored-by: Jing Zhang <jizha@amd.com>
-