"test/vscode:/vscode.git/clone" did not exist on "d73c6d7cd12e752a96bdbee9260b2cdc1e3ea87e"
- 24 Oct, 2024 1 commit
-
-
Aleksander Dudek authored
-
- 22 Oct, 2024 1 commit
-
-
Aleksander Dudek authored
-
- 14 Oct, 2024 1 commit
-
-
Adam Osewski authored
* This gives an 8% perf boost at the cost of using more registers.
-
- 11 Oct, 2024 1 commit
-
-
Adam Osewski authored
-
- 10 Oct, 2024 7 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Thomas Ning authored
* Make the cshuffle compilable * modify the reference on gpu and cpu. Correct access of cshuffle * fix the cpu reference code * Complete the in tile shuffle logic * restructure the kernel template input * change the naming pattern of ck_tile gemm pipeline * Re-format files using remod.py * Solve the fmha conflict with gemm * Address comments from Carlus --------- Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
-
Adam Osewski authored
-
Adam Osewski authored
-
- 09 Oct, 2024 2 commits
-
-
Adam Osewski authored
-
Christopher Millette authored
-
- 08 Oct, 2024 2 commits
-
-
Po Yen Chen authored
* Fix text alignment of ArgParser::print() * Update example README files * Clarify make-ck-dev.sh <arch> usage * Only keep some of the arguments from '-?' output * Undo command line output changes in README * Only keep existing arguments in doc and update description * Fix text alignment * Make cmake-ck-*.sh compatible with 'sh' command
-
Qianfeng authored
* Simplify the code in splitkv_combine pipeline * Always set kPadSeqLenK=true for fmha splitkv kernels * Change Oacc Alignment and TileDistribution to be more adaptable to tile sizes --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 07 Oct, 2024 4 commits
-
-
Illia Silin authored
* update build logic with GPU_ARCHS * fix the GPU_ARCHS build for codegen * unset GPU_TARGETS when GPU_ARCHS are set
-
Bartłomiej Kocot authored
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
Adam Osewski authored
-
rocking authored
* Fix compile error * Add one pass pipeline * Extract creating tile_window to operator() * clang format * reduce duplicated code * do not hardcode * Support padding in layernorm --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 04 Oct, 2024 2 commits
-
-
kylasa authored
* Adding seed and offset pointer support to the philox random number generator. * Separating seed and offset pointer checks with different condition statements. * Changes include adding support for device seed and offset pointers; a union is used to store seed/offset values and device pointers to minimize device SGPRs. * Correcting a typo in the readme file * Re-format files using remod.py * Use STL type for API parameters * Use simpler struct design for drop_seed & drop_offset * Undo unnecessary changes * Sync kargs style for fmha_fwd.hpp/.cpp * Use templated union to reduce code * Use structured binding to make code more readable --------- Co-authored-by: Sudhir Kylasa <sukylasa@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
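The commit message above describes storing the seed/offset either as an immediate value or as a device pointer inside a union. A minimal host-side sketch of that idea follows. Every name in it (`ValueOrPointer`, `DropSeedOffset`, `resolve`) is hypothetical and not taken from the repository, and plain host pointers stand in for device pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical holder (not the repository's type): the same storage carries
// either an immediate value or a pointer to the value, plus a flag saying which.
template <typename T>
struct ValueOrPointer
{
    union
    {
        T value;      // used when the caller passes the number directly
        const T* ptr; // used when the number lives in a buffer
    };
    bool is_pointer;

    static ValueOrPointer from_value(T v)
    {
        ValueOrPointer r{};
        r.value      = v;
        r.is_pointer = false;
        return r;
    }

    static ValueOrPointer from_pointer(const T* p)
    {
        ValueOrPointer r{};
        r.ptr        = p;
        r.is_pointer = true;
        return r;
    }

    // Read the number, dereferencing only when a pointer was stored.
    T get() const { return is_pointer ? *ptr : value; }
};

// Hypothetical argument bundle for a Philox-style generator.
struct DropSeedOffset
{
    ValueOrPointer<std::uint64_t> seed;
    ValueOrPointer<std::uint64_t> offset;

    // Returns {seed, offset} with any pointers already dereferenced.
    std::pair<std::uint64_t, std::uint64_t> resolve() const
    {
        return {seed.get(), offset.get()};
    }
};
```

With C++17, a caller could unpack the result as `const auto [seed, offset] = args.resolve();`, which is one way to read the "structured binding" item in the message. Keeping a flag per field is an assumption of this sketch; the actual kernel arguments may encode the choice differently.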
-
Bartłomiej Kocot authored
-
- 02 Oct, 2024 2 commits
-
-
macurtis-amd authored
Without this change, the following diagnostic is generated: "a template argument list is expected after a name prefixed by the template keyword" [-Wmissing-template-arg-list-after-template-kw]. See the C++17 spec, [temp.names] p5.
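For illustration, a small sketch of the pattern this diagnostic targets. `Scaler` and `call_twice` are hypothetical names, not code from the commit; the flagged form is kept in a comment so the block compiles cleanly.

```cpp
#include <cassert>

// Hypothetical type with a member function template whose argument can be deduced.
struct Scaler
{
    template <typename U>
    U twice(U x) const { return x + x; }
};

template <typename T>
int call_twice(const T& s, int x)
{
    // Flagged form: `template` is followed by a name but no `<...>` list.
    //   return s.template twice(x);
    //
    // Accepted alternative 1: drop the keyword and let deduction work.
    //   return s.twice(x);
    //
    // Accepted alternative 2: keep the keyword and spell out the argument list.
    return s.template twice<int>(x);
}

// Small self-check: 21 + 21 == 42.
inline void demo()
{
    assert(call_twice(Scaler{}, 21) == 42);
}
```

Which of the two accepted forms the commit itself uses is not stated in the message; both satisfy [temp.names] p5.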
-
Adam Osewski authored
-
- 01 Oct, 2024 4 commits
-
-
Illia Silin authored
* add missing vector header * Re-format header using remod.py --------- Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
-
Adam Osewski authored
-
Adam Osewski authored
-
Po Yen Chen authored
* Use same layout for o_acc and o tensor * Use better param names in partitioner * Remove redundant kargs 'max_seqlen_q' * Use better param names in splitkv kernel * Add comment for additional kernel arguments * Sync empty loop early return logic between pipelines * Pass more arguments to cmake in scripts * Align backslashes * Fix wrong o_acc tensor view strides * Change o_acc layout if o_perm=0 * Handle whole row masked via attn_bias * Use vector width = 1 for o_acc * Use more even split sizes
-
- 30 Sep, 2024 1 commit
-
-
Adam Osewski authored
-
- 27 Sep, 2024 1 commit
-
-
Bartłomiej Kocot authored
* [CK_TILE] Image to Column kernel * Fixes * Vector loads and stores * Fixes * Fixes * change test dir name
-
- 26 Sep, 2024 1 commit
-
-
Dan Yao authored
* add barriers * tail bias barriers * adjust bf16/hd256 tol * continue adjust bf16/hd256 tol
-
- 25 Sep, 2024 4 commits
-
-
Illia Silin authored
* fix clang20 compilation errors for gfx90a * fix clang20 compilation errors for gfx11 targets
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
- 22 Sep, 2024 1 commit
-
-
Po Yen Chen authored
-
- 20 Sep, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Support NGCHW in grouped conv fwd * Remove not needed variable * Fixes
-
Adam Osewski authored
The dynamic buffer does not support fp8 in its `Update` operation, so fp8 cannot be used with `InMemoryDataOperation::Add`.
-
- 18 Sep, 2024 1 commit
-
-
Thomas Ning authored
* Support the N dimension padding * Finished the padding feature for different dimension of K
-
- 14 Sep, 2024 1 commit
-
-
Thomas Ning authored
* Finished the feature of gpu verification * Add the ck_tile_gemm test in the CI/CD * add the include of tensor_layout in reference_gemm * Comment Addressed * split ck_tile fmha and gemm tests into separate stages * restructure the reference gemm * restructure a new reference_gemm api that could read the device mem --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: illsilin <Illia.Silin@amd.com>
-
- 13 Sep, 2024 1 commit
-
-
Jun Liu authored
* Legacy support: customized filesystem * Update cmakefile for python alternative path * fix build issues * CK has no boost dependency * More fixes to issues found on legacy systems * fix clang format issue * Check if blob is correctly generated in cmake * fix the python issues * add a compiler flag for codegen when using alternative python * use target_link_options instead of target_compile_options --------- Co-authored-by: illsilin <Illia.Silin@amd.com>
-