- 02 Dec, 2024 1 commit
-
-
rtmadduri authored
* LWPCK-2429: Device grouped GEMM uses Async Memcpy Resolving merge conflicts
* reverting changes to profile_grouped_gemm
* revert date change
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 27 Nov, 2024 2 commits
-
-
Illia Silin authored
-
Adam Osewski authored
* Few small fixes.
* New GroupedGemm instances (BF16)
* Unify and refactor GroupedGEMM device API.
* Adapt changes to new API.
* Adapt grouped gemm profiler.
* Accept multiple kbatches for grouped gemm profiler.
  - delete obsolete two stage as it is now covered by grouped gemm
* Update unit test for grouped gemm.
* Fix thresholds for BF16 and F8. Unblock tests.
* Fix few instances.
* Multiple small fixes.
* Adapt to new API, check dynamic casting.
* Uncomment few data types in grouped gemm profiler.
* Fix call to SetDeviceArgs.
* Fix profile grouped gemm multiply tile loop.
* Fix grouped gemm tile loop kernel args in client examples.
* Review comments.
-
- 26 Nov, 2024 2 commits
-
-
Adam Osewski authored
* Fix loop order.
* Fix loop order in pipeline v4
-
jakpiase authored
* add check for bf16 splitk support for grouped gemm splitk
* Update if condition
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
- 21 Nov, 2024 1 commit
-
-
Harisankar Sadasivan authored
* universal streamk fp8 changes & ckprofiler instances
* revert strides to -1 and verification options
* fp8 exclusion on pre-gfx94 for universal_streamk
* PR review based revisions: permissions reverted, removed hip err checks
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 18 Nov, 2024 2 commits
-
-
Illia Silin authored
* add bf16 gemms for gfx11/gfx12
* reduce the input values in test_gemm
* add int8 wmma gemm instances for gfx11/gfx12
* add example gemm_wmma_int8
* fix bug in gemm_wmma_int8 test
* increase bf16 gemm test tolerance
* update the dates and clean-up commented-out instances
-
Bartłomiej Kocot authored
* Batched GEMM Multiple D based on Universal GEMM
  Co-authored-by: Jing Zhang <jizhan@fb.com>
* CI fixes
  Co-authored-by: Jing Zhang <jizhan@fb.com>
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
-
- 13 Nov, 2024 2 commits
-
-
Illia Silin authored
-
Taylor Ding authored
-
- 07 Nov, 2024 1 commit
-
-
Illia Silin authored
-
- 05 Nov, 2024 1 commit
-
-
darren-amd authored
* explicit cast ptr offset
* formatting change
-
- 30 Oct, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Remove virtual destructors from unary ops
* Fixes
* Fixes
* clang format fixes
-
- 29 Oct, 2024 1 commit
-
-
Illia Silin authored
-
- 26 Oct, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Add dynamic elementwise op
  Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
* CI issues fix
* Custom parameter value for dynamic functions - Comments addressed
---------
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
-
valarLip authored
* add int8 gemm multiply multiply a8w8
* uncomment
* clang-format-12
* Add example_gemm_multiply_multiply_xdl_int8
* Remove shell scripts
* update preprocess number for mi308; bring back printout in ckprofiler
* format
---------
Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
-
- 25 Oct, 2024 1 commit
-
-
aledudek authored
* Calculate generic relative threshold pool3dfwd
* Calculate absolute error threshold pool3d fwd
* Generic threshold calculation take max input for relative error pool3dfwd
* Remove max possible value for error calculation at runtime
* Remove debug print in pool3dfwd
* Pool3d fwd adjusted types in generic threshold calculation
* Generic threshold calculation take into account number of accumulations and accdatatype
* Generic threshold fix final error formula
* Generic threshold calculation - num of accs fix
* Generic threshold calculation - adjust absolute error
* Generic threshold calculation - OutDataType in absolute error
-
- 22 Oct, 2024 1 commit
-
-
Jatin Chaudhary authored
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 17 Oct, 2024 2 commits
-
-
Mirza Halilcevic authored
This reverts commit b5209eae.
-
Mirza Halilcevic authored
-
- 15 Oct, 2024 3 commits
-
-
Mirza Halilcevic authored
- Move descriptor for gemm_softmax_gemm to different branch
-
Mirza Halilcevic authored
-
illsilin authored
-
- 14 Oct, 2024 2 commits
-
-
Rostyslav Geyyer authored
* Add non_native_vector_type
* Add a test
* Add non-native vector type
* Fix CTOR
* Fix non-native vector type of 1
* Fix CTORs
* Use vector_type to cover non-native implementation as well
* Update the test
* Format
* Format
* Fix copyright years
* Remove BoolVecT so far
* Add AsType test cases
* Update assert error message
* Remove redundant type
* Update naming
* Add complex half type with tests
* Add tests for vector reshaping
* Add missing alignas
* Update test/data_type/test_custom_type.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Compare custom types to built-in types
* Add default constructor test
* Add an alignment test
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
Bartłomiej Kocot authored
* Add transpose scale amax example
* fixes
* Tune reduce instance
-
- 12 Oct, 2024 1 commit
-
-
Adam Osewski authored
-
- 09 Oct, 2024 2 commits
-
-
Mirza Halilcevic authored
-
Christopher Millette authored
-
- 08 Oct, 2024 1 commit
-
-
Mirza Halilcevic authored
tests.
-
- 07 Oct, 2024 1 commit
-
-
Illia Silin authored
* update build logic with GPU_ARCHS
* fix the GPU_ARCHS build for codegen
* unset GPU_TARGETS when GPU_ARCHS are set
-
- 04 Oct, 2024 1 commit
-
-
Bartłomiej Kocot authored
-
- 02 Oct, 2024 2 commits
-
-
macurtis-amd authored
Without this change, the following diagnostic is generated:
a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
See C++17 spec [temp.names] p5.
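For illustration only, here is a minimal sketch of the kind of code that triggers this diagnostic and two ways to satisfy [temp.names] p5. The names `Scaler`, `times`, `twice`, `scale_by_three`, and `double_it` are hypothetical and are not taken from the commit or from the library.

```cpp
#include <cassert>

// Hypothetical type with member function templates (not from the commit).
struct Scaler {
    template <int Factor>
    int times(int x) const { return Factor * x; }

    template <typename T>
    T twice(T x) const { return x + x; }
};

template <typename Op>
int scale_by_three(const Op& op, int x) {
    // `op` is type-dependent, so the `template` keyword is required here,
    // and it is followed by an explicit template argument list.
    return op.template times<3>(x);
}

template <typename Op>
int double_it(const Op& op, int x) {
    // Pattern that triggers the diagnostic (no <...> after the name):
    //     return op.template twice(x);
    // Fix 1: drop the keyword and let T be deduced from the argument.
    int deduced = op.twice(x);
    // Fix 2: keep the keyword and spell out the template argument list.
    int explicit_args = op.template twice<int>(x);
    return deduced == explicit_args ? deduced : -1;
}
```

Either fix keeps the call well-formed; which one the commit actually applied is not stated in the message.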
-
Mirza Halilcevic authored
-
- 25 Sep, 2024 2 commits
-
-
Illia Silin authored
* fix clang20 compilation errors for gfx90a
* fix clang20 compilation errors for gfx11 targets
-
Mirza Halilcevic authored
-
- 20 Sep, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Support NGCHW in grouped conv fwd
* Remove not needed variable
* Fixes
-
Adam Osewski authored
The dynamic buffer does not support fp8 in its `Update` operation, so fp8 does not support `InMemoryDataOperation::Add`.
-
- 13 Sep, 2024 1 commit
-
-
Jun Liu authored
* Legacy support: customized filesystem
* Update cmakefile for python alternative path
* fix build issues
* CK has no boost dependency
* More fixes to issues found on legacy systems
* fix clang format issue
* Check if blob is correctly generated in cmake
* fix the python issues
* add a compiler flag for codegen when using alternative python
* use target_link_options instead of target_compile_options
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
-
- 12 Sep, 2024 1 commit
-
-
Mateusz Ozga authored
* Add pool2d instance BWD AVG
* Add pool2d instance BWD MAX
* Fix: avg review
* Fix review: part2
* Fix - enable test when type is compiled
* Fix review part3
-
- 11 Sep, 2024 1 commit
-
-
jakpiase authored
* added pool2d fwd
* add tests
* add reviewers changes
* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"
  This reverts commit 6b2ba7ff8960b0a6ddbe30d8dac53eeb55a8597e, reversing changes made to 22c82bea0caf3e0f29399100c1bb67b8003fc042.
* Revert "add reviewers changes"
  This reverts commit 22c82bea0caf3e0f29399100c1bb67b8003fc042.
* added reviewers comments
* revert some old files
* add reviewers requests
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
-