- 14 Nov, 2024 1 commit
-
-
letaoqin authored
-
- 13 Nov, 2024 3 commits
-
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
- 12 Nov, 2024 3 commits
-
-
Illia Silin authored
-
carlushuang authored
-
Thomas Ning authored
* Finished the feature * Modified the test file * Test case update * Address comment * Addressed the review comment * Fixed the CI error
-
- 11 Nov, 2024 6 commits
-
-
Illia Silin authored
-
valarLip authored
* [CK_TILE] add more strides for layernorm to support non-contiguous Tensor * align CK coding style * extend strides to layernorm example * clang-format...
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
Po Yen Chen authored
-
- 09 Nov, 2024 2 commits
-
-
dummycoderfe authored
* add moe_sorting & check ok * fix comments & typo * Run remod.py under include/ck_tile & example/ck_tile directories * format codes * fix output ci check bug * fix moe sorting readme and error commit file * use magic div to accelerate compute * add a loop unroll for moe lds ops * add extblocksnel to set zeros for moebufs * [Ck_tile] moe set zero run ok, add size check and fix ref check * [Ck_tile] fix moe_sorting fuse set_zero remod * [Ck_tile] change name style, fix zero buffer size err, change folder * [Ck_tile] moe_sorting: fix name style * [Ck_tile] moe_sorting, remove useless params in traits * [Ck_tile] change outputtile cnt * unit_size; change output buf alloc --------- Co-authored-by:
dummycoderfe <noplydummmycoder@163.com> Co-authored-by:
Po Yen, Chen <PoYen.Chen@amd.com> Co-authored-by:
carlushuang <carlus.huang@amd.com>
-
Po Yen Chen authored
-
- 08 Nov, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Add generic instances for two stage conv bwd wei * Update layout prefix
-
dummycoderfe authored
* optimize small N case using vec io and using rcp div * [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass * [Ck_tile] fix blockSize compute in Generic2dBlockShape * [Ck_tile] fix kfastfdiv template style * [Ck_tile] layernorm, fix style in review --------- Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
-
- 07 Nov, 2024 4 commits
-
-
Illia Silin authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
- 06 Nov, 2024 6 commits
-
-
rocking authored
-
carlushuang authored
-
valarLip authored
-
aledudek authored
* Generic threshold calculation add passing num of accums * Generic threshold - after merge fixes * Fix cmakelists --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
carlushuang authored
-
valarLip authored
-
- 05 Nov, 2024 10 commits
-
-
Andriy Roshchenko authored
-
Illia Silin authored
-
darren-amd authored
* explicit cast ptr offset * formatting change
-
Illia Silin authored
* make sure cmake can handle xnack targets * don't build xdl instances for gfx906:xnack- * don't build xdl tests for gfx906:xnack-
-
carlushuang authored
-
Juan Manuel Martinez Caamaño authored
Previously, generate.py appended the list to the end of the output file. When the cmake configuration step was run multiple times on the examples, the blob list (such as fwd_blob_list.txt) grew with every configuration. `library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around this issue by removing the output file if it existed. Now, generate.py overwrites the content of the output file. The workaround in the CMakeLists.txt is no longer needed, and the issue is solved for the example projects too.
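The difference between the two behaviours can be illustrated with a minimal Python sketch. This is not the actual generate.py code: the function names `write_blob_list` and `configure_twice` and the blob file names are hypothetical, and only the append versus overwrite file modes reflect what the commit message describes.

```python
import os
import tempfile


def write_blob_list(path, blobs, overwrite=True):
    """Write one blob name per line (hypothetical helper, not from generate.py).

    overwrite=True  opens the file with mode "w" and truncates old content.
    overwrite=False opens the file with mode "a" and keeps old content.
    """
    mode = "w" if overwrite else "a"
    with open(path, mode) as f:
        for blob in blobs:
            f.write(f"{blob}\n")


def configure_twice(overwrite):
    """Simulate two configuration runs and return the resulting line count."""
    blobs = ["fmha_fwd_a.cpp", "fmha_fwd_b.cpp"]  # illustrative names only
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fwd_blob_list.txt")
        write_blob_list(path, blobs, overwrite)
        write_blob_list(path, blobs, overwrite)
        with open(path) as f:
            return len(f.read().splitlines())


if __name__ == "__main__":
    print("append mode lines:   ", configure_twice(overwrite=False))
    print("overwrite mode lines:", configure_twice(overwrite=True))
```

In append mode the list doubles on the second run, which is the growth the commit describes. In overwrite mode the file holds the same two entries no matter how often the configuration step is repeated, so no separate cleanup step is required.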
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
Lin Sun authored
Add instances for int8 grouped conv2d fwd --------- Co-authored-by:
root <root@dell300x-pla-t28-03.pla.dcgpu> Co-authored-by:
Bartłomiej Kocot <barkocot@amd.com>
-
- 04 Nov, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Temporarily disable part of dynamic op conv instances * fix
-
- 02 Nov, 2024 1 commit
-
-
carlushuang authored
* more accurate residual * modify comment * Fix literal case in README.md --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 01 Nov, 2024 1 commit
-
-
Illia Silin authored
* disable fp8 gemm_universal on gfx90a and gfx908 by default * fix cmake syntax * fix clang format * add ifdefs in amd_xdlops * disable fp8 gemm instances on gfx90a by default * update readme
-