• [Navi3x-LWPCK-545] Block-wise GEMM + Real GEMM_WMMA_FP16 (#541) · 919aeb1f
    Haocong WANG authored
    * wmma_op + unit test
    
    * add arch limitation to wmma test
    
    * change arch limitation
    
    * Refactor + add unit tests for all types (int4 fails to compile)
    
    * Add f32_16x16x16_bf16 unit test
    
    * tempsave
    
    * tempsave
    
    * tempsave
    
    * runtime bug, cannot find symbol
    
    * workaround for incorrect HIP warpSize return value
    
    * debugging
    
    * tempsave
    
    * Correctness OK, waiting for optimization
    
    * Tidy up + format
    
    * temp save
    
    * temp save, reproduce the v_bfi_b32 issue
    
    * add inline asm for wmmaop test
    
    * tidy up
    
    * clean up some debug-only code

    * discard some code
    
    * clang format
    
    * clang format
    
    * compiler issue fixed + increase tile size
amd_inline_asm.hpp 15.4 KB