- 02 Dec, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 22 Nov, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 21 Nov, 2024 2 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 20 Nov, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 15 Nov, 2024 1 commit
-
-
Illia Silin authored
-
- 14 Nov, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 12 Nov, 2024 1 commit
-
-
Thomas Ning authored
* Finished the feature
* Modified the test file
* Test case update
* address comment
* Addressed the review comment
* Fixed the CI error
-
- 07 Nov, 2024 1 commit
-
-
Illia Silin authored
-
- 06 Nov, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 05 Nov, 2024 2 commits
-
-
Illia Silin authored
* make sure cmake can handle xnack targets
* don't build xdl instances for gfx906:xnack-
* don't build xdl tests for gfx906:xnack-
-
Lin Sun authored
Add instances for int8 grouped conv2d fwd
---------
Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
-
- 04 Nov, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 01 Nov, 2024 1 commit
-
-
Illia Silin authored
* disable fp8 gemm_universal on gfx90a and gfx908 by default
* fix cmake syntax
* fix clang format
* add ifdefs in amd_xdlops
* disable fp8 gemm instances on gfx90a by default
* update readme
-
- 30 Oct, 2024 3 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Adam Osewski authored
* CK-Tile GEMM with memory bound pipeline.
* Memory bound gemm pipeline.
* Fix not closed namespace.
* Block gemm mem pipeline draft.
* Do not use ck_tile:: within ck_tile namespace.
* Refactoring & Move Layout info to pipeline problem.
* Get hot loop and TailNum information before launching kernel.
* Fixes in pipeline.
* Add comment to load_tile_raw and change variable naming style.
* Few small changes & formatting.
* Do not use macro.
* Add gtests.
* Use AccDataType for Output of MFMA instruction.
* Formatting.
* Refactor gemm examples.
* Switch over to current block gemm.
* Use currently available pipeline policy.
* Refactoring and review comments.
* Fixes after merge.
* Add missing include.
* Add load tile overload which accepts output tensor as parameter.
* This gives an 8% perf boost at the cost of using more registers.
* Rename example.
* Small changes.
* Fix compilation err and lower K.
* Support different layouts for A/B
* Fix vector size for different layouts.
* Rename Alignment into VectorSize
* Unblock tests.
-
- 29 Oct, 2024 1 commit
-
-
valarLip authored
-
- 21 Oct, 2024 1 commit
-
-
illsilin authored
-
- 16 Oct, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 15 Oct, 2024 3 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 14 Oct, 2024 1 commit
-
-
Rostyslav Geyyer authored
* Add non_native_vector_type
* Add a test
* Add non-native vector type
* Fix CTOR
* Fix non-native vector type of 1
* Fix CTORs
* Use vector_type to cover non-native implementation as well
* Update the test
* Format
* Format
* Fix copyright years
* Remove BoolVecT so far
* Add AsType test cases
* Update assert error message
* Remove redundant type
* Update naming
* Add complex half type with tests
* Add tests for vector reshaping
* Add missing alignas
* Update test/data_type/test_custom_type.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Compare custom types to built-in types
* Add default constructor test
* Add an alignment test
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 11 Oct, 2024 2 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 10 Oct, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 07 Oct, 2024 2 commits
-
-
Andriy Roshchenko authored
-
Illia Silin authored
* update build logic with GPU_ARCHS
* fix the GPU_ARCHS build for codegen
* unset GPU_TARGETS when GPU_ARCHS are set
-
- 27 Sep, 2024 1 commit
-
-
Bartłomiej Kocot authored
* [CK_TILE] Image to Column kernel
* Fixes
* Vector loads and stores
* Fixes
* Fixes
* change test dir name
-
- 20 Sep, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Support NGCHW in grouped conv fwd
* Remove not needed variable
* Fixes
-
- 17 Sep, 2024 1 commit
-
-
aledudek authored
* Extend pool3d fwd avg, max operations by f8_t, int8_t types
* Pack MaxPool3dFwd params together
* Fix MaxPool3dFwd AVG instances
* Decrease verification precision for bf16
* Adjust tests + review changes
* Adjust threshold for F8
* Adjusted compute types for MAX op instances
* Fix ComputeDataType mismatch in tests and profiler for AVG
* Fix naming from max_pool3d_fwd to pool3d_fwd
* Adjust CMakeLists
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
- 16 Sep, 2024 1 commit
-
-
Mateusz Ozga authored
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
- 13 Sep, 2024 1 commit
-
-
jakpiase authored
* add pool2d fp8 and int8
* minor fixes
* add formatting
* add reviewer suggestions
* add reviewer suggestions
-
- 12 Sep, 2024 1 commit
-
-
Mateusz Ozga authored
* Add pool2d instance BWD AVG
* Add pool2d instance BWD MAX
* Fix: avg review
* Fix review: part2
* Fix - enable test when type is compiled
* Fix review part3
-
- 11 Sep, 2024 1 commit
-
-
jakpiase authored
* added pool2d fwd
* add tests
* add reviewers changes
* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new" This reverts commit 6b2ba7ff8960b0a6ddbe30d8dac53eeb55a8597e, reversing changes made to 22c82bea0caf3e0f29399100c1bb67b8003fc042.
* Revert "add reviewers changes" This reverts commit 22c82bea0caf3e0f29399100c1bb67b8003fc042.
* added reviewers comments
* revert some old files
* add reviewers requests
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
- 05 Sep, 2024 1 commit
-
-
Haocong WANG authored
* revert ckprofiler change
* temp save
* Add test and test pass
* test pass
* Fix bug inside rotating buffer when tensor is not packed
* bug fix
* clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 04 Sep, 2024 1 commit
-
-
Rostyslav Geyyer authored
-
- 03 Sep, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Add support for NGCHW in grouped conv bwd wei
* Comments fixes
* navi fixes
* Update function names
-
- 26 Aug, 2024 1 commit
-
-
Illia Silin authored
* add ninja trace to CI builds
* fix ninja trace logic
* update the ninja trace logic in jenkins file
* limit the number of threads to run ninja build
* use ninja for installation after build
* update the path to ninjatracing tool
* use ninja to run check when using build trace
* fix jenkins logic
* fix typos
* set proper setup_args for all stages
* fix ninja syntax
* replace ninja check with ninja test
* enable ninja tracing with mainline and staging compilers
-