- 06 Dec, 2024 1 commit
-
-
aska-0096 authored
-
- 05 Dec, 2024 2 commits
- 04 Dec, 2024 3 commits
- 30 Nov, 2024 1 commit
-
-
mtgu0705 authored
* Add int4 + scale support based on Jing Zhang's pk_i4 work. Compiles and passes the functional test.
* Modify the kernel to 128x128x128 and use mfma_32x32x4.
* Move the weight permute from host to device.
* Modify the scale init method.
* Modify the init method; the functional test fails and needs debugging.
* Add init method.
* Support group=128 for Llama2-7B-int4.
* Add ckProfiler for GEMM b scale (int4).
* Add reference function.
* Add pipeline v4 (2 LDS ping-pong).
* Add more int4-Gemm kernel profiling instances.
* Modify the int4-Gemm kernel instances.
* Move the pk_i4 permute into the kernel.
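For readers unfamiliar with packed int4 weights and per-group scales, the following is a minimal host-side sketch of the idea, not the code from this commit. The function names (`sign_extend_i4`, `dequantize_pk_i4`), the low-nibble-first packing order, and the signed two's-complement encoding are all assumptions made for illustration; the actual kernel additionally permutes the packed layout on the device.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sign-extend a 4-bit two's-complement value to int.
inline int sign_extend_i4(std::uint8_t nibble) {
    int v = nibble & 0x0F;
    return v >= 8 ? v - 16 : v;
}

// Hypothetical reference dequantizer (illustration only).
// Assumptions: two int4 values per byte, low nibble first, and one float
// scale shared by every `group_size` consecutive elements (e.g. 128).
std::vector<float> dequantize_pk_i4(const std::vector<std::uint8_t>& packed,
                                    const std::vector<float>& scales,
                                    std::size_t group_size) {
    std::vector<float> out(packed.size() * 2);
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::uint8_t byte = packed[i / 2];
        const std::uint8_t nibble = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        out[i] = static_cast<float>(sign_extend_i4(nibble)) * scales[i / group_size];
    }
    return out;
}
```

With `group_size` set to 128, each scale covers 128 consecutive weights, which matches the group=128 setting named in the commit message.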
-
- 27 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 24 Oct, 2024 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 23 Oct, 2024 6 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 22 Oct, 2024 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 21 Oct, 2024 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 20 Oct, 2024 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 18 Oct, 2024 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 16 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 15 Oct, 2024 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 14 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 13 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 11 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 09 Oct, 2024 1 commit
-
-
Christopher Millette authored
-
- 07 Oct, 2024 1 commit
-
-
Illia Silin authored
* update build logic with GPU_ARCHS
* fix the GPU_ARCHS build for codegen
* unset GPU_TARGETS when GPU_ARCHS are set
-
- 04 Oct, 2024 1 commit
-
-
Bartłomiej Kocot authored
-
- 02 Oct, 2024 1 commit
-
-
macurtis-amd authored
Without this change, the following diagnostic is generated: `a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]`. See the C++17 spec, [temp.names] p5.
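A minimal sketch of the pattern that triggers this diagnostic, assuming a recent clang. `Wrapper` and `read_as_int` are hypothetical names chosen for illustration and are not taken from the repository.

```cpp
// Hypothetical type with a member function template.
template <typename T>
struct Wrapper {
    T value;

    template <typename U = T>
    U get() const { return static_cast<U>(value); }
};

template <typename T>
int read_as_int(const Wrapper<T>& w) {
    // Flagged: the `template` keyword is not followed by a template
    // argument list.
    //   return w.template get();
    //
    // Accepted: the name after `template` is a template-id.
    return w.template get<int>();
}
```

The fix is either to supply the argument list after the name, as above, or to drop the `template` keyword where no argument list is needed.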
-
- 25 Sep, 2024 1 commit
-
-
Illia Silin authored
* fix clang20 compilation errors for gfx90a
* fix clang20 compilation errors for gfx11 targets
-
- 20 Sep, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Support NGCHW in grouped conv fwd
* Remove not needed variable
* Fixes
-
Adam Osewski authored
The dynamic buffer does not support fp8 in the `Update` operation, so fp8 does not support `InMemoryDataOperation::Add`.
-