    Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) · 1d8e4ec2
    Adam Osewski authored
    * add a prototype of int4
    
    * clean
    
    * debug
    
    * clean
    
    * clean
    
    * move packed into dynamic_buffer
    
    * fixed coord reset
    
    * add fast pki4 to half conversion
    
    * fix
    
    * fixed reference and host_tensor
    
    * fixed tensor init
    
    * format
    
    * debug i4_to_f16_convert
    
    * format
    
    * fixed splitk
    
    * weight permute
    
    * add b tile permute
    
    * clean
    
    * weight permute with splitk
    
    * format
    
    * improve weight layout
    
    * add and_or_b32
    
    * fixed splitk crash
    
    * add permute switch as a template
    
    * recover v3r1
    
    * clean
    
    * failure with intrawave v2
    
    * fixed
    
    * fixed
    
    * add ckProfiler
    
    * add bfp16 support
    
    * add bf16 example
    
    * fixed int4 to bhalf_t conversion
    
    * format
    
    * fixed int4 to bf16 conversion
    
    * clean
    
    * add instances for mem
    
    * clean
    
    * fixed host tensor size
    
    * fixed
    
    * debug
    
    * fixed
    
    * add pk_i4_t as a struct
    
    * fix
    
    * Update example/01_gemm/gemm_xdl_bf1...
profile_gemm_universal.cpp 8.32 KB