- 04 Feb, 2025 5 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 03 Feb, 2025 1 commit
-
-
Andriy Roshchenko authored
-
- 01 Feb, 2025 1 commit
-
-
Andriy Roshchenko authored
-
- 31 Jan, 2025 4 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
Test the functionality of V_MFMA_F32_16X16X128_F8F6F4 and V_MFMA_F32_32X32X64_F8F6F4 instructions. (#293) * Introduced MFMA tests * Verified f8f6f4 MFMA Instructions
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 30 Jan, 2025 5 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 29 Jan, 2025 6 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 24 Jan, 2025 1 commit
-
-
Andriy Roshchenko authored
-
- 22 Jan, 2025 4 commits
-
-
Illia Silin authored
Fix build logic when building for multiple targets, including gfx950.
-
illsilin authored
-
illsilin authored
-
illsilin authored
-
- 21 Jan, 2025 2 commits
-
-
Andriy Roshchenko authored
-
illsilin authored
-
- 20 Jan, 2025 1 commit
-
-
illsilin authored
-
- 17 Jan, 2025 4 commits
-
-
illsilin authored
-
Illia Silin authored
Merge from public
-
illsilin authored
-
Aviral Goel authored
* smoke and regression targets working with tests * test filters work for both examples and test * removed unnecessary comments * added a missing comment * added a missing comment * fixed typo in the comments * updated README * Update PULL_REQUEST_TEMPLATE.md updating the template for future addition of test cases * Update PULL_REQUEST_TEMPLATE.md
-
- 16 Jan, 2025 2 commits
-
-
Bartłomiej Kocot authored
* Fix and optimize dynamic unary elementwise * fix
-
carlushuang authored
* fix mock token id * prepare host for g1u1 * reformat inline-asm * restructure uk_0 * restructure gate_up * done * change default to init=1 * update readme * fix a bug in interleave pipeline * rcp for silu
-
- 15 Jan, 2025 4 commits
-
-
Illia Silin authored
-
Bartłomiej Kocot authored
* Add rounding for float to bf16 conversion * Add bhalf test * Add inf test bhalf * Refactor * update cmake * Fixes
-
ruanjm authored
* Add shortcut to RMSNorm * Modify test for adding shortcut for RMSNorm * Add fused parameter into tests * 1. Add YDataType. 2. rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp * 1. Supports various stride and precisions. * Add support of Epilogue * Add fuse and epilogue support to rmsnorm ref * Modify rmsnorm example * Refactor tests/examples * Bug fix for newly added tests/examples * Bug fix for new tests 2 * Modify smoke test scripts remove dbg code * Supports non-smooth dynamic quant * Update Rmsnorm2dFwd::GetName() * rename xscale and prec_sx to smoothscale and prec_sm Bug fix after rename Remove files * change example_rmsnorm2d_fwd.cpp * update performance calculator * Fix issue in two-pass when fuse add is enabled * Remove comment of beta --------- Co-authored-by: rocking <ChunYu.Lai@amd.com>
-
Andriy Roshchenko authored
Temporarily uses the `DeviceGemmMultiD_ABScale_Xdl_CShuffle_V3` kernel and 128x128 scaling matrices. Must be modified to use an MX-native GEMM kernel with 16 or 32 component vectors per scale. Verified on the emulator.
-