    Multi-kernel CGEMM (#230) · 7b1e2c37
    myamlak authored
    * Reference CGEMM + test stub
    
    * Format.
    
    * Incomplete simple implementation
    
    * Library instances
    
    * Sketch of tests
    
    * Test fixes.
    
    * Example added
    
    * Cosmetics
    
    * Add elementwise operation kernel and example
    
    * Add comment
    
    * Add template argument for dim. Prepare to support multiple dimensions
    
    * Rename example
    
    * Support 1 dimension
    
    * Add static assert
    
    * Add comment
    
    * Second auxiliary buffer added
    
    * Extract pad
    
    * Remove redundant argument
    
    * Support any dimension for elementwise operation
    
    * Remove line
    
    * Make it a multiple of the number of CUs
    
    * Move thread per block to the parameter of constructor
    
    * Consuming binary ops to do A+B / A-B
    
    * Fix + cosmetics + bf16 test commented out temporarily
    
    * Format
    
    * Enabling bf16 test
    
    * Revert "Enabling bf16 test"
    
    This reverts commit f497e2ba.
    
    * Fix + test reenabled
    
    * fix build
    
    * Revert "fix build"
    
    This reverts commit d7310238.
    
    * post PR #235 merge fix
    
    * amend
    
    * Single workspace for cgemm + helper
    
    * Perf calc fix
    
    * Review remarks: static_cast
    
    * Review remarks: binary ops templated
    
    * Cleaning
    
    * Removal of instances and their tests
    
    * Review remarks from aosew addressed
    
    * Review remark: unnecessary attribute
    
    * Post-merge fixes
    
    * Restrict 4gemm to PassThrough + bug fix
    
    * Review remarks
    
    * update licence
    
    * change cgemm example to fp16
    
    Co-authored-by: rocking <chunylai@amd.com>
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
    Co-authored-by: Anthony Chang <ac.chang@outlook.com>
CMakeLists.txt 2.52 KB