- 14 Nov, 2024 1 commit
Andriy Roshchenko authored
- 12 Nov, 2024 2 commits
Andriy Roshchenko authored
The tests take too long to complete on the emulator. Need to check whether the scope of the testing can be reduced to just FP8 data types.
Andriy Roshchenko authored
- 08 Nov, 2024 2 commits
Andriy Roshchenko authored
Andriy Roshchenko authored
- 07 Nov, 2024 7 commits
Andriy Roshchenko authored
Andriy Roshchenko authored
Andriy Roshchenko authored
Andriy Roshchenko authored
splitk GEMM appears to lose precision vs. the reference implementation when FP numbers are involved.
Andriy Roshchenko authored
Andriy Roshchenko authored
Andriy Roshchenko authored
- 06 Nov, 2024 6 commits
Andriy Roshchenko authored
Andriy Roshchenko authored
illsilin authored
illsilin authored
Andriy Roshchenko authored
GPU verification takes too much time to complete on the emulator.
Illia Silin authored
Merge from public
- 05 Nov, 2024 10 commits
illsilin authored
Andriy Roshchenko authored
Illia Silin authored
Andriy Roshchenko authored
darren-amd authored
* explicit cast of ptr offset
* formatting change
Illia Silin authored
* make sure cmake can handle xnack targets
* don't build xdl instances for gfx906:xnack-
* don't build xdl tests for gfx906:xnack-
Juan Manuel Martinez Caamaño authored
Previously, generate.py appended the list to the end of the output file. When the cmake configuration step was run multiple times on the examples, the blob list (such as fwd_blob_list.txt) grew with every configuration. `library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around this issue by removing the output file if it existed. Now, generate.py overwrites the content of the output file, so the workaround in the CMakeLists.txt is no longer needed, and the issue is also solved for the example projects.
Andriy Roshchenko authored
Lin Sun authored
Add instances for int8 grouped conv2d fwd
---------
Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Andriy Roshchenko authored
- 04 Nov, 2024 3 commits
Andriy Roshchenko authored
Andriy Roshchenko authored
Bartłomiej Kocot authored
* Temporarily disable part of dynamic op conv instances
* fix
- 02 Nov, 2024 1 commit
carlushuang authored
* more accurate residual
* modify comment
* Fix literal case in README.md
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
- 01 Nov, 2024 7 commits
Andriy Roshchenko authored
Andriy Roshchenko authored
Illia Silin authored
* disable fp8 gemm_universal on gfx90a and gfx908 by default
* fix cmake syntax
* fix clang format
* add ifdefs in amd_xdlops
* disable fp8 gemm instances on gfx90a by default
* update readme
rocking authored
* fix compile error
* fix typo in padding
* Add smoothquant op
* Add smoothquant instance library
* refine type
* add test script
* Re-generate smoothquant.hpp
* Always use 'current year' in copyright
* use Generic2dBlockShape instead
* Add vector = 8 instance back
* Find exe path automatically
* Simplify the api condition
* Remove debugging code
* update year
* Add blank line between function declarations
* explicitly cast return value to dim3
* refine return value
* Fix default warmup and repeat value
* Add comment
* refactor smoothquant cmake
* Add README
* Fix typo
---------
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Andriy Roshchenko authored
carlushuang authored
* hotfix ln
* some renaming
Illia Silin authored
Update develop branch from public repository
- 31 Oct, 2024 1 commit
Andriy Roshchenko authored