Commits · 03c2ba3ad9845ed08f1e3c73405f1afb6afe0c91 · gaoqiong / composable_kernel_ROCM

04 Dec, 2024 1 commit
- bug fix + performance opt + clangformat · 03c2ba3a
  aska-0096 authored Dec 04, 2024
  
30 Nov, 2024 1 commit

- Merge the int4 kernel and profiling in one commit for RTP. · 40054f53
  mtgu0705 authored Nov 02, 2024

Add int4+scale based on Zhang, Jing's pk_i4. Compiles and passes the functional test.
Modify the kernel to 128x128x128, and use mfma_32x32x4
Move the weight permute from host to device

Modified the scale init method.

Modified the init method; the functional test now fails and needs debugging.

Added init method

Support group=128 for Llama2-7B-int4

Move the weight permute from host to device

Add ckProfiler for GEMM b scale (int4)

Add reference function.

Add pipeline v4 (2 LDS pingpong)

Add more int4-Gemm kernel profiling instances.

Modify the int4-Gemm kernel instances

Move the pk_i4 permute into the kernel
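
For readers unfamiliar with what a "GEMM b scale (int4)" reference function has to compute, here is a minimal host-side sketch in Python. It is not the code from this commit. The packing layout (two signed 4-bit values per byte, low nibble first), the per-column storage of B, and the names `unpack_int4`, `pack_int4`, and `gemm_b_scale_reference` are all assumptions made for illustration; the actual pk_i4 layout and permutation in the repository may differ.

```python
def unpack_int4(byte):
    """Split one byte into two signed 4-bit values.

    Assumed layout (hypothetical): low nibble first, two's complement.
    """
    lo, hi = byte & 0x0F, (byte >> 4) & 0x0F
    # Sign-extend each nibble from 4 bits to a Python int.
    return (lo - 16 if lo >= 8 else lo, hi - 16 if hi >= 8 else hi)


def pack_int4(lo, hi):
    """Pack two signed 4-bit values into one byte (inverse of unpack_int4)."""
    return ((hi & 0x0F) << 4) | (lo & 0x0F)


def gemm_b_scale_reference(a, b_packed, scales, group_size):
    """Compute C[m][n] = sum_k A[m][k] * B[k][n] * scale[k // group_size][n].

    a:          M x K list of floats
    b_packed:   N rows; row n holds K/2 bytes, i.e. the K int4 values
                of column n of B (assumed storage order)
    scales:     (K / group_size) x N list of floats, one scale per
                group of `group_size` consecutive K indices per column
    """
    m_dim, k_dim, n_dim = len(a), len(a[0]), len(b_packed)
    c = [[0.0] * n_dim for _ in range(m_dim)]
    for n in range(n_dim):
        # Unpack and dequantize column n once, then reuse it for every row of A.
        col = []
        for byte in b_packed[n]:
            col.extend(unpack_int4(byte))
        deq = [col[k] * scales[k // group_size][n] for k in range(k_dim)]
        for m in range(m_dim):
            c[m][n] = sum(a[m][k] * deq[k] for k in range(k_dim))
    return c
```

With `group_size=128`, as in the group=128 case mentioned above, each scale would cover 128 consecutive K indices of one column of B.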
