    [Bugfix] Fix fp8 dtype for some cases (#1246) · 63bf1609
    Lei Wang authored
    * [Enhancement] Add FP8 support and reproducibility in lighting indexer
    
    * Introduced a manual seed in `test_fp8_lighting_indexer` so that test runs are reproducible.
    * Added specializations for `cute::float_e4m3_t` and `cute::float_e5m2_t` in `gemm_mma.h` to support FP8 across multiple CUDA architectures.
    
    * Fix typos in `fp8_lighting_indexer.py` and improve formatting in `gemm_mma.h`
    
    * Corrected a typo in the comment for `test_fp8_lighting_indexer` to enhance clarity.
    * Reformatted lines in `gemm_mma.h` for better readability by aligning template specializations across multiple CUDA architectures.
    
    * test fix
    
    * bug fix
fp8_lighting_indexer.py 10.7 KB