    Merge the int4 kernel and profiling in one commit for RTP. · 40054f53
    mtgu0705 authored
    Add int4+scale based on Zhang, Jing's pk_i4. Compiles and passes the functional test.
    Modify the kernel to 128x128x128, and use mfma_32x32x4
    Move the weight permute from host to device
    
    Modified the scale init method.
    
    Modified the init method; the functional test fails and needs debugging.
    
    Added init method
    
    Support group=128 for Llama2-7B-int4
    
    Move the weight permute from host to device
    
    Add ckProfiler for GEMM b scale (int4)
    
    Add reference function.
    
    Add pipeline v4 (2 LDS pingpong)
    
    Add more int4-Gemm kernel profiling instances.
    
    Modify the int4-Gemm kernel instances
    
    Move the pk_i4 permute into the kernel
profile_gemm_b_scale.cpp 6.34 KB