    Optimize bf16 conversion (#664) · b076a02a
    Rostyslav Geyyer authored
    * Add TypeConvert class and start refactoring
    
    * Refactor TypeConvert as a struct
    
    * Get back to template functions type_convert
    
    * Add a type_convert_bf16_rtn, set rtz as default
    
    * Clean up
    
    * Add UnaryConvertPrecision struct for high-precision workloads
    
    * Format
    
    * Update type_convert to UnaryConvert on threadwise level
    
    * Update UnaryConvertPrecision
    
    * Format
    
    * Fix chmod
    
    * Add a flag to pick conversion method
    
    * Format
    
    * Remove the added flag
    
    * Merge elementwise op with type conversion
    
    * Move type_convert to elemwise op, update the op
    
    * Update type_convert_precision -> bf16_convert_rtn
    
    * Clean up
    
    * Update comments
    
    * Update the CK_WORKAROUND_DENORM_FIX flag handling
    
    * Update the unneeded op to work but warn user
    
    * Remove the message
    
    * Use a PassThrough instead of ConvertBF16RTN to calculate reference
    
    * Format
    
    * Add missing include
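
The commit message above mentions two bf16 conversion modes: round-toward-zero (rtz, set as the default) and round-to-nearest (rtn, kept for high-precision workloads). As a rough illustration of how the two usually differ at the bit level, here is a minimal standalone sketch. The function names `float_to_bf16_rtz`, `float_to_bf16_rtn`, and `bf16_to_float` are hypothetical and are not taken from this repository; the rtn variant assumes ties-to-even, which the commit message does not confirm.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not the library's actual API.

// Round toward zero: keep the upper 16 bits of the float and drop the rest.
inline std::uint16_t float_to_bf16_rtz(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Round to nearest, ties to even (assumed tie-breaking rule).
inline std::uint16_t float_to_bf16_rtn(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));

    // NaN: truncate and force a mantissa bit so the result stays a NaN.
    if((bits & 0x7fffffffu) > 0x7f800000u)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

    // Adding 0x7fff plus the lowest kept bit rounds halfway cases to even.
    const std::uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Widen bf16 back to float by placing the 16 bits in the upper half.
inline float bf16_to_float(std::uint16_t h)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```

In this sketch rtz is a single shift, which is presumably why a truncating mode is attractive as a fast default, while rtn costs an extra add and a NaN check but halves the worst-case rounding error.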
ck.hpp 7.17 KB