    [Feature] Enhance vectorized conversion support in CUDA codegen (#1095) · a148d62a
    Tong WU authored
    * [Feature] Add vectorized float16 and float32 conversion support in CUDA codegen
    
    * Implemented vectorized conversions between float16 and float32 using __half22float2 (half2 to float2) and __float22half2_rn (float2 to half2).
    * Extended the existing code to handle both conversion directions, with the code path selected by the lane count.
    * Improved overall type handling in the VisitExpr_ method for better compatibility with TileLang.
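
    The direction-dependent dispatch described above can be illustrated with a minimal C++ sketch that mimics what a codegen visitor might do: it returns the CUDA expression text for a 2-lane cast between float16 and float32. The function `EmitHalfFloatCast2` and its string-based interface are hypothetical and are not TileLang's actual `VisitExpr_` code. Only the intrinsics `__half22float2` and `__float22half2_rn` (declared in `cuda_fp16.h`) are real, and `src` is assumed to already have type `half2` or `float2`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not TileLang's API): returns the CUDA expression for a
// 2-lane cast between float16 and float32, or "" if the type pair is not handled.
// `src` is the CUDA expression of the source vector, assumed to be half2 or float2.
std::string EmitHalfFloatCast2(const std::string& from, const std::string& to,
                               const std::string& src) {
  if (from == "float16" && to == "float32") {
    // half2 -> float2 in a single intrinsic call.
    return "__half22float2(" + src + ")";
  }
  if (from == "float32" && to == "float16") {
    // float2 -> half2, rounding both lanes to nearest even.
    return "__float22half2_rn(" + src + ")";
  }
  // Not a float16 <-> float32 pair: the caller would fall back to a per-lane cast.
  return "";
}
```

    Returning an empty string for unsupported pairs lets a caller fall back to a scalar, per-lane cast; that fallback design is an assumption of this sketch.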
    
    * [Feature] Add float32 to float8 conversion support in CUDA codegen
    
    * Implemented handling for conversion from float32 to float8 (E4M3/E5M2) in the VisitExpr_ method.
    * Added vectorized float2 to fp8x2 conversion using __nv_cvt_float2_to_fp8x2.
    * Enhanced type handling for better compatibility with TileLang, particularly for float8 types.
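
    The float32 to float8 path can be sketched the same way. `__nv_cvt_float2_to_fp8x2` (from `cuda_fp8.h`) takes a `float2`, a saturation mode, and an FP8 interpretation (`__NV_E4M3` or `__NV_E5M2`), and returns the two packed FP8 values as `__nv_fp8x2_storage_t`. The helper name `EmitFloat2ToFp8x2` is hypothetical, and the choice of `__NV_SATFINITE` (clamp out-of-range values to the largest finite FP8 value) is an assumption; the commit does not state which saturation mode the generated code uses.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not TileLang's API): CUDA expression converting the float2
// expression `src` into two packed FP8 values (__nv_fp8x2_storage_t).
// Returns "" for formats other than E4M3 and E5M2.
std::string EmitFloat2ToFp8x2(const std::string& fp8_format, const std::string& src) {
  std::string interp;
  if (fp8_format == "e4m3") {
    interp = "__NV_E4M3";
  } else if (fp8_format == "e5m2") {
    interp = "__NV_E5M2";
  } else {
    return "";
  }
  // Assumption: saturate out-of-range inputs to the largest finite FP8 value.
  return "__nv_cvt_float2_to_fp8x2(" + src + ", __NV_SATFINITE, " + interp + ")";
}
```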
    
    * lint
    
    * fix a bug
    
* [Enhancement] Support lanes=4 cases and add a unit test for vectorized casts
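
    One plausible way to cover the lanes=4 case is to reuse the 2-lane intrinsics and emit one call per pair of lanes. The sketch below does this for float16 to float32, reinterpreting the source as an array of `half2` and the destination as an array of `float2`. The function `EmitHalfToFloatCast` and the pointer-cast pattern are hypothetical illustrations, not the code this commit adds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch (not TileLang's code): emit CUDA statements for a
// float16 -> float32 cast of `lanes` elements as one __half22float2 call per
// pair of lanes. `dst` and `src` must name addressable variables whose storage
// is suitably aligned for float2 / half2 access (e.g. float4 and uint2).
std::vector<std::string> EmitHalfToFloatCast(int lanes, const std::string& dst,
                                             const std::string& src) {
  assert(lanes == 2 || lanes == 4);  // this sketch only covers 2 and 4 lanes
  std::vector<std::string> stmts;
  for (int i = 0; i < lanes / 2; ++i) {
    const std::string idx = std::to_string(i);
    // Pair i of the destination receives the converted pair i of the source.
    stmts.push_back("((float2*)(&" + dst + "))[" + idx + "] = __half22float2(((half2*)(&" +
                    src + "))[" + idx + "]);");
  }
  return stmts;
}
```

    With `lanes = 4` the sketch emits two `__half22float2` statements, one per `half2` pair; any other lane count trips the `assert`, since this sketch only covers the 2- and 4-lane cases.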
    
    * lint
    
* [Feature] Refactor bf16 conversion operations and remove legacy compile flags
    
    * lint
codegen_cuda.cc 102 KB