- 22 Nov, 2024 2 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 21 Nov, 2024 8 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 19 Nov, 2024 1 commit
-
-
Rostyslav Geyyer authored
-
- 18 Nov, 2024 2 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 15 Nov, 2024 1 commit
-
-
Rostyslav Geyyer authored
-
- 08 Nov, 2024 2 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 07 Nov, 2024 2 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 06 Nov, 2024 6 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
illsilin authored
-
illsilin authored
-
Illia Silin authored
Merge from public
-
- 05 Nov, 2024 7 commits
-
-
illsilin authored
-
Andriy Roshchenko authored
-
Illia Silin authored
-
darren-amd authored
* explicit cast ptr offset
* formatting change
-
Illia Silin authored
* make sure cmake can handle xnack targets
* don't build xdl instances for gfx906:xnack-
* don't build xdl tests for gfx906:xnack-
-
Juan Manuel Martinez Caamaño authored
Previously, generate.py appended the blob list to the end of the output file, so running the cmake configuration step multiple times on the examples made the blob list (such as fwd_blob_list.txt) grow with every configuration. `library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around this by removing the output file if it existed. Now generate.py overwrites the content of the output file, so the workaround in that CMakeLists.txt is no longer needed, and the issue is fixed for the example projects too.
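The difference between the two behaviours described in this commit message can be illustrated with a minimal sketch. This is not the code from generate.py: the helper names `write_blob_list` and `simulate_configures` and the blob file names are hypothetical, and only the append-versus-overwrite file modes are taken from the description above.

```python
import tempfile
from pathlib import Path


def write_blob_list(path: Path, blobs: list[str], append: bool) -> None:
    """Write one blob name per line (hypothetical helper, not generate.py's API)."""
    # "a" keeps whatever earlier runs wrote; "w" truncates the file first.
    mode = "a" if append else "w"
    with path.open(mode) as f:
        for blob in blobs:
            f.write(blob + "\n")


def simulate_configures(append: bool, runs: int = 3) -> int:
    """Return how many lines the list holds after `runs` configuration passes."""
    blobs = ["blob_a.cpp", "blob_b.cpp"]  # made-up names for illustration
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "fwd_blob_list.txt"
        for _ in range(runs):
            write_blob_list(out, blobs, append)
        return len(out.read_text().splitlines())


if __name__ == "__main__":
    print(simulate_configures(append=True))   # list grows on every pass
    print(simulate_configures(append=False))  # list stays at the real size
```

With append mode the list accumulates duplicates on each pass, which is why a consumer had to delete the file beforehand. With overwrite mode the writer is idempotent and no cleanup step is needed.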
-
Lin Sun authored
Add instances for int8 grouped conv2d fwd
Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
-
- 04 Nov, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Temporarily disable part of dynamic op conv instances
* fix
-
Rostyslav Geyyer authored
-
- 02 Nov, 2024 1 commit
-
-
carlushuang authored
* more accurate residual
* modify comment
* Fix literal case in README.md
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 01 Nov, 2024 5 commits
-
-
Andriy Roshchenko authored
-
Illia Silin authored
* disable fp8 gemm_universal on gfx90a and gfx908 by default
* fix cmake syntax
* fix clang format
* add ifdefs in amd_xdlops
* disable fp8 gemm instances on gfx90a by default
* update readme
-
rocking authored
* fix compile error
* fix typo of padding
* Add smoothquant op
* Add smoothquant instance library
* refine type
* add test script
* Re-generate smoothquant.hpp
* Always use 'current year' in copyright
* use Generic2dBlockShape instead
* Add vector = 8 instance back
* Find exe path automatically
* Simplify the api condition
* Remove debugging code
* update year
* Add blank line between function declaration
* explicitly cast return value to dim3
* refine return value
* Fix default warmup and repeat value
* Add comment
* refactor smoothquant cmake
* Add README
* Fix typo
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
-
carlushuang authored
* hot fix ln
* some rename
-
Illia Silin authored
Update develop branch from public repository
-
- 31 Oct, 2024 1 commit
-
-
Andriy Roshchenko authored
-