- 07 Oct, 2024 1 commit
-
-
Illia Silin authored
* update build logic with GPU_ARCHS * fix the GPU_ARCHS build for codegen * unset GPU_TARGETS when GPU_ARCHS are set
-
- 04 Oct, 2024 1 commit
-
-
Bartłomiej Kocot authored
-
- 02 Oct, 2024 1 commit
-
-
macurtis-amd authored
Without this change, the following diagnostic is generated: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] See C++17 spec [temp.names] p5.
-
- 25 Sep, 2024 1 commit
-
-
Illia Silin authored
* fix clang20 compilation errors for gfx90a * fix clang20 compilation errors for gfx11 targets
-
- 20 Sep, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Support NGCHW in grouped conv fwd * Remove not needed variable * Fixes
-
Adam Osewski authored
The dynamic buffer doesn't have support for fp8 in `Update` operation thus fp8 is not supporting `InMemoryDataOperation::Add`
-
- 13 Sep, 2024 1 commit
-
-
Jun Liu authored
* Legacy support: customized filesystem * Update cmakefile for python alternative path * fix build issues * CK has no boost dependency * More fixes to issues found on legay systems * fix clang format issue * Check if blob is correctly generated in cmake * fix the python issues * add a compiler flag for codegen when using alternative python * use target_link_options instead of target_compile_options --------- Co-authored-by:illsilin <Illia.Silin@amd.com>
-
- 12 Sep, 2024 1 commit
-
-
Mateusz Ozga authored
* Add pool2d instance BWD AVG * Add pool2d instance BWD MAX * Fix: avg review * Fix review: part2 * Fix - enable test when type is compiled * Fix review part3
-
- 11 Sep, 2024 2 commits
-
-
jakpiase authored
* added pool2d fwd * add tests * add reviewers changes * Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new" This reverts commit 6b2ba7ff8960b0a6ddbe30d8dac53eeb55a8597e, reversing changes made to 22c82bea0caf3e0f29399100c1bb67b8003fc042. * Revert "add reviewers changes" This reverts commit 22c82bea0caf3e0f29399100c1bb67b8003fc042. * added reviewers comments * revert some old files * add reviewers requests --------- Co-authored-by:Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
jakpiase authored
* Implemented smfmac xdlops * Added smfmac blockwise xdlops * fixes * add reviewers suggestions --------- Co-authored-by:Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
- 05 Sep, 2024 2 commits
-
-
M.Emin Ozturk authored
* issue fix, one line changed for tmp * clang --------- Co-authored-by:
Emin Ozturk <emin.ozturk@utah.edu> Co-authored-by:
Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>
-
Haocong WANG authored
* revert ckprofiler change * temp save * Add test and test pass * test pass * Fix bug inside rotating buffer when tensor is not packed * bug fix * clang format --------- Co-authored-by:Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 03 Sep, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Add support for NGCHW in grouped conv bwd wei * Comments fixes * navi fixes * Update function names
-
- 02 Sep, 2024 1 commit
-
-
Bartłomiej Kocot authored
Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)" (#1490) This reverts commit 5ff8eeeb.
-
- 21 Aug, 2024 2 commits
-
-
Andriy Roshchenko authored
* Enable CMakePresets build * Verify Convolution, Scaling and ReLU algorithms. * Add tensor element-wise scale and type cast operation. * Reduction implemented but does not work. * Exploration of Reduction functionality. * Completed example for Convolution scaled with ReLu activation and AMAX reduction. * WIP: Add required instances for convolution. * WIP: Create client example. Implement convolution stage. * Add elementwise instances. * Add elementwise scale + convert example. * Add reduction instances. * WIP: Client example for AMAX reduction. * WIP: Add instances for multistage reduction. * WIP: Implementation of multistage reduction. * Refactoring. * Clean up. * Add CMakePresets.json * Guard off FP8 instances when the data type is not available. * Add example for Scaled FP8 Convolution with AMAX reduction. * Refactor CombConvScaleRelu instances. * Add CombConvScale instances. * Add client example for Scaled FP8 Convolution with AMAX reduction. * Cleanup.
-
Rostyslav Geyyer authored
* Set RNE fp8 conversion as a default * Update f8 tests * Disable failing test on gfx11 * Update bf8 tests * Add a flag * Fix the flag * Raise flag for gfx10 as well * Temp commit for tolerance testing * Update tolerances
-
- 14 Aug, 2024 1 commit
-
-
Haocong WANG authored
* replace buffer_atomic with global_atomic * fixed global_atomic_add * added bf16 atomic_add * format * clang-format-12 * clean * clean * add guards * Update gtest.cmake * enabled splitk_gemm_multi_d * format * add ckProfiler * format * fixed naming * format * clean * clean * add guards * fix clang format * format * add kbatch printout * clean * Add rocm6.2 related gemm optimization * Limit bf16 atomic usage * remove redundant RCR gemm_universal instance * Add RRR fp8 gemm universal instance * Bug fix * Add GPU_TARGET guard to FP8/BF8 target * bug fix * update cmake * remove all fp8/bf8 example if arch not support * Enable fp8 RRR support in ckProfiler * limit greedy-reverse flag to gemm_universal in ckProfiler --------- Co-authored-by:
Jing Zhang <jizhan@fb.com> Co-authored-by:
Jing Zhang <jizhan@meta.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com> Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by:
illsilin <Illia.Silin@amd.com>
-
- 13 Aug, 2024 1 commit
-
-
Mateusz Ozga authored
-
- 10 Aug, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Fix typo in TransformConvFwdToGemm * Fix bug in n offset calculation
-
- 09 Aug, 2024 1 commit
-
- 07 Aug, 2024 1 commit
-
-
Juan Manuel Martinez Caamaño authored
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead. See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility Closes #1439 * fix clang format --------- Co-authored-by:
illsilin <Illia.Silin@amd.com>
-
- 06 Aug, 2024 3 commits
-
-
Juan Manuel Martinez Caamaño authored
-
jakpiase authored
* fix for beta!=0 in reduce * add reviewers suggestions
-
Bartłomiej Kocot authored
* Support 64 bit indexing * Add new grouped conv fwd kernel for large tensors * Add instances large tensor * Fixes for transform conv to gemm * Fixes * fixes * Remove not needed instances * examples fixes * Remove not need ds arrays * Fix tests * Add 2GB check in gridwise dl * Fixes
-
- 31 Jul, 2024 1 commit
-
-
arai713 authored
* added isSupportedArgument check into codegen device op * adding function call * remove commented code
-
- 30 Jul, 2024 1 commit
-
-
Bartłomiej Kocot authored
-
- 25 Jul, 2024 1 commit
-
-
zjing14 authored
* add rotating_buff for gemm_multi_d * format * Update flush_cache.hpp * Update gtest.cmake --------- Co-authored-by:
Jing Zhang <jizhan@fb.com> Co-authored-by:
Haocong WANG <haocwang@amd.com>
-
- 24 Jul, 2024 2 commits
-
-
Andriy Roshchenko authored
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412) * Add CMakePresets configurations. * Add binary elementwise ConvScaleAdd and an example. * Numerical verification of results. Observed significant irregularities in F8 to F32 type conversions: ```log ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000 ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000 ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000 ``` * Implemented ConvScaleAdd + Example. * Add ConvScale+Bias Instances * Add Client Example for ConvScale+Bias * Fix number of bytes in an example.. * Cleanup.
-
Bartłomiej Kocot authored
* Add support for half_t and bfloat to reduction operations * Fix bhalf convert * Next fix bf16
-
- 22 Jul, 2024 1 commit
-
-
Bartłomiej Kocot authored
-
- 19 Jul, 2024 3 commits
-
-
Haocong WANG authored
* add ab_scale init support * enabled interwave * add scale type; update isSupport * adjust example * clean * enable f8 pure gemm rcr ckprofiler * Add gemm_multiply_multiply instances * clang format * Optimize for ScaleBlockMNK=128 * enable abscale f8 gemm ck profiler * Add pure f8 gemm test suite * Reverting to the state of project at f60fd77 * update copyright * clang format * update copyright --------- Co-authored-by:root <jizhan@amd.com>
-
ltqin authored
* init for reduce_threadwise multi_d * add reduce_threadwise_multi_d * add reduce_multi_d * clean * start add an other splitk device op * add reduce template parameter to SplitKBatchOffset * add reduce c matrix * clean up code * change example data type to bf16 * add bf16Ai8B example * remove reduce template parameter * add splitk atomic status to v4 * example add multi d parameters * device op add multi-d parameters * add multi-d to reduce * fix kbach=1 bug * change B layout to col in bf16Ai8B example * remove float adding struct * change multi-d interface * change file and class name * remove multi-d of bf16Ai8B example * change IsReduce function to IsReduceAdd * change example layout to RRR from RCR * according layout to set ds stride * reset parameter layout * add gemm universal reduce instance * add reduce factory * add profile_gemm_universal_reduce * add reduce to profiler * fix reduce instance * fix profiler reduce compiling bug * format * format library instance code * add mem instance for reduce library * fix call instance names * add workspace for reduce in ckProfiler * format * add mnpading to reduce library instance * add fp16 instance to reduce of profiler * change copyright time * restore profiler cmake file * add reduce text to instances * add DsLayout and DsDataType to instances template parameter * fixed gemm_reduce_multi_d * add an example without multi_d * Update common.hpp * Update gtest.cmake * Update gemm_xdl_splitk_reduce_bf16.cpp * clean * Update gtest.cmake * format * fixe api * format * default parameter change to RRR * add vector_len for multi_d * format * Update gtest.cmake * fix bf16A iBB elementwiseop * add ReduceDataType * move ReduceDataType to end position * format * remove googletest git method address * fix copyright time * update init data --------- Co-authored-by:
root <jizhan@amd.com> Co-authored-by:
letaoqin <letaoqin@amd.com> Co-authored-by:
Jing Zhang <jizhan@meta.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
Bartłomiej Kocot authored
* Refactor transform conv to gemm fwd * fixes codegen * wmma fixes * fix wmma * Fix copyright
-
- 17 Jul, 2024 1 commit
-
-
Qianfeng authored
-
- 16 Jul, 2024 1 commit
-
-
Andriy Roshchenko authored
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale element-wise operation and ReLU activation. (#1386) * Add CMakePresets configurations. * Add ConvScale+ReLU Functor and an Example * Account for ReLU FLOPs. * Add instances of 3D convolutions with ConvscaleRelu operation. * Implement Client Example * Cleanup
-
- 12 Jul, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Support access per groups and filter3x3 in grouped conv fwd * Fixes for large cases * Fixes for large tensors
-
- 06 Jul, 2024 1 commit
-
-
Harisankar Sadasivan authored
* universal streamk with atomics with ckprofiler support. grid_size and streamk strategy are tunable. grid_size of -1 leads to #WGs = maximum occupancy X num_CUs. implementation supports many different streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. streamk strategy of -1 leads to default streamk policy (4-tile). * Update README.md * fixing clang-format issues * removed conflicts in struct members between streamk and universal streamk * corrected arg parsing for streamk and universal streamk * added stream-k policies for 3 tile and 4 tile * fixed argument type issue with parsing cmd args * changes suggested in PR review are made- removing comments and correcting copyright * file permissions updated * added default value support for grid_size and streamk-policy selection set to -1 * print messages for arguments * print messages for arguments * print messages for arguments1
-
- 04 Jul, 2024 2 commits
- 27 Jun, 2024 1 commit
-
-
jakpiase authored
* first version of smfmac test * add reviewer comments * add reviewer suggestions
-