- 04 Dec, 2024 1 commit
aska-0096 authored
- 30 Nov, 2024 1 commit
mtgu0705 authored
Add int4 + scale based on Zhang, Jing's pk_i4. Compiles and passes the functional check.
Modify the kernel to 128x128x128 and use mfma_32x32x4.
Move the weight permute from host to device.
Modify the scale init method.
Modify the init method; the functional check fails and needs debugging.
Add init method.
Support group=128 for Llama2-7B-int4.
Move the weight permute from host to device.
Add ckProfiler for GEMM b scale (int4).
Add reference function.
Add pipeline v4 (2 LDS ping-pong).
Add more int4-Gemm kernel profiling instances.
Modify the int4-Gemm kernel instances.
Move the pk_i4 permute into the kernel.