- 22 Nov, 2024 4 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 21 Nov, 2024 9 commits
-
-
Andriy Roshchenko authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 20 Nov, 2024 2 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
Removed multiple negations in fail/pass logic to propagate `true` as the success indicator.
-
- 19 Nov, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 15 Nov, 2024 1 commit
-
-
Rostyslav Geyyer authored
-
- 14 Nov, 2024 2 commits
-
-
Andriy Roshchenko authored
-
feli authored
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
-
- 13 Nov, 2024 3 commits
-
-
Illia Silin authored
-
Taylor Ding authored
-
Bartłomiej Kocot authored
* [CK TILE] Update gemm universal pipeline
* Fixes
* fix
* Rebase
-
- 12 Nov, 2024 1 commit
-
-
Thomas Ning authored
* Finished the feature
* Modified the test file
* Test case update
* Addressed comment
* Addressed the review comment
* Fixed the CI error
-
- 11 Nov, 2024 2 commits
-
-
valarLip authored
* [CK_TILE] add more strides for layernorm to support non-contiguous tensors
* align CK coding style
* extend strides to layernorm example
* clang-format...
-
Po Yen Chen authored
-
- 09 Nov, 2024 1 commit
-
-
dummycoderfe authored
* add moe_sorting & check ok
* fix comments & typo
* Run remod.py under include/ck_tile & example/ck_tile directories
* format codes
* fix output ci check bug
* fix moe sorting readme and error commit file
* use magic div to accelerate compute
* add a loop unroll for moe lds ops
* add extblocksnel to set zeros for moebufs
* [Ck_tile] moe set zero run ok, add size check and fix ref check
* [Ck_tile] fix moe_sorting fuse set_zero remod
* [Ck_tile] change name style, fix zero buffer size err, change folder
* [Ck_tile] moe_sorting: fix name style
* [Ck_tile] moe_sorting, remove useless params in traits
* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
-
- 08 Nov, 2024 2 commits
-
-
Rostyslav Geyyer authored
-
dummycoderfe authored
* optimize small N case using vec io and using rcp div
* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass
* [Ck_tile] fix blockSize compute in Generic2dBlockShape
* [Ck_tile] fix kfastfdiv template style
* [Ck_tile] layernorm, fix style in review
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
-
- 07 Nov, 2024 2 commits
-
-
Illia Silin authored
-
Rostyslav Geyyer authored
-
- 06 Nov, 2024 3 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 05 Nov, 2024 1 commit
-
-
darren-amd authored
* explicit cast ptr offset
* formatting change
-
- 04 Nov, 2024 2 commits
-
-
Andriy Roshchenko authored
-
Rostyslav Geyyer authored
-
- 02 Nov, 2024 1 commit
-
-
carlushuang authored
* more accurate residual
* modify comment
* Fix literal case in README.md
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 01 Nov, 2024 2 commits
-
-
rocking authored
* fix compile error
* fix typo of padding
* Add smoothquant op
* Add smoothquant instance library
* refine type
* add test script
* Re-generate smoothquant.hpp
* Always use 'current year' in copyright
* use Generic2dBlockShape instead
* Add vector = 8 instance back
* Find exe path automatically
* Simplify the api condition
* Remove debugging code
* update year
* Add blank line between function declarations
* explicitly cast return value to dim3
* refine return value
* Fix default warmup and repeat value
* Add comment
* refactor smoothquant cmake
* Add README
* Fix typo
---------
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
-
carlushuang authored
* hot fix ln
* some rename
-
- 31 Oct, 2024 1 commit
-
-
carlushuang authored
* add prenorm/postnorm support, refactor using generate.py
* update README
* update README
* fix format
* update some description and fix format
* update format
* format
* use non-raw for loading
* format and update n4096
* dynamic-quant ready
* update readme
* support fused dynamic-quant
* update fused-quant, with smooth
* update README
* update args
* update some based on comment
-