  • [Bugfix] Fix fp8 dtype for some cases (#1246) · 63bf1609
    Lei Wang authored
    * [Enhancement] Add FP8 support and reproducibility in lighting indexer
    
    * Introduced a manual seed in `test_fp8_lighting_indexer` to ensure reproducible results.
    * Added specializations for `cute::float_e4m3_t` and `cute::float_e5m2_t` in `gemm_mma.h` for enhanced FP8 support across multiple CUDA architectures, ensuring compatibility and improved functionality.
    
    * Fix typos in `fp8_lighting_indexer.py` and improve formatting in `gemm_mma.h`
    
    * Corrected a typo in the comment for `test_fp8_lighting_indexer` to enhance clarity.
    * Reformatted lines in `gemm_mma.h` for better readability by aligning template specializations across multiple CUDA architectures.
    
    * test fix
    
    * bug fix
gemm_mma.h 21.5 KB