    Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945) · 42facfc6
    Rostyslav Geyyer authored
    
    
    * Add f8 bf8 gemm example
    
    * Add element-wise ops
    
    * Add intrinsics
    
    * Update reference calculation
    
    * Add an additional type option for xdlops gemm
    
    * Fix build process
    
    * Add bf8 to buffer addressing
    
    * Update blockwise op, split typeA and typeB
    
    * Update for compatibility
    
    * Update naming from f8 to fp8
    
    * Update naming
    
    * Format
    
    * Update naming (#937)
    
    * Add a client example
    
    * Add computetypes to device and gridwise ops
    
    * Add instances, update instance factory
    
    * Format
    
    * Fix a flag
    
    * Add ckProfiler mode
    
    * Fix typos
    
    * Add an example
    
    * Add bf8 generator
    
    * Add bf8 mfma; fix type_convert for bf8
    
    * Move verification ahead of timing
    
    * Update reference calculation
    
    * Fix reference
    
    * Narrow down float init range
    
    * Fix bf8 bf8 mfma
    
    * Add bf8 @ fp8 mfma
    
    * Update example
    
    * Update instances
    
    * Update profiler api
    
    * Update for compatibility
    
    * Format
    
    * Remove extra example
    
    * Clean up
    
    * Work around convert
    
    ---------
    Co-authored-by: Jing Zhang <jizha@amd.com>
profile_grouped_conv_bwd_weight.cpp 8.52 KB