"examples/vscode:/vscode.git/clone" did not exist on "a7730272e4aeeed198b855b7f36ef7ac88cdd76b"
    [Bugfix] Support larger than 256 box size tma copy (#413) · bf824406
    Lei Wang authored
    * [New Feature] Add FP8 Flash Attention Implementation (#412)
    
    * Introduce a new example script for FP8 Flash Attention in `example_mla_decode_kv_fp8.py`, showcasing the use of tilelang for efficient attention computation.
    * Implement the `flashattn` function with optimized memory management and kernel execution.
    * Include a reference program for comparison and performance evaluation.
    * Add command-line argument parsing for batch size, number of heads, and dimensions to facilitate testing and experimentation (a sketch of both patterns follows below).
    * Enhance the overall structure and readability of the code.
    
    This addition aims to improve the performance of attention mechanisms in deep learning models by leveraging FP8 precision and optimized kernel execution.
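
    As a rough illustration of the reference-program and argument-parsing pattern the bullets describe, here is a minimal, hypothetical PyTorch sketch. It is not the actual `example_mla_decode_kv_fp8.py`; the parameter names, defaults, and shapes are assumptions, and the float baseline stands in for the FP8 tilelang kernel's comparison target:
    
    ```python
    # Hypothetical sketch of a reference attention program; not the actual
    # example_mla_decode_kv_fp8.py. Parameter names and defaults are assumed.
    import argparse
    import torch
    
    def ref_program(q, k, v):
        # Plain scaled-dot-product attention computed in float32, usable as
        # a numerical baseline for an FP8 kernel's output.
        scale = q.shape[-1] ** -0.5
        scores = torch.einsum("bhqd,bhkd->bhqk", q.float(), k.float()) * scale
        probs = torch.softmax(scores, dim=-1)
        return torch.einsum("bhqk,bhkd->bhqd", probs, v.float())
    
    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--batch", type=int, default=1, help="batch size")
        parser.add_argument("--heads", type=int, default=32, help="number of heads")
        parser.add_argument("--seq_len", type=int, default=4096, help="KV cache length")
        parser.add_argument("--dim", type=int, default=128, help="head dimension")
        args = parser.parse_args()
    
        # Decode setting: a single query position attends over the full KV cache.
        q = torch.randn(args.batch, args.heads, 1, args.dim, dtype=torch.float16)
        k = torch.randn(args.batch, args.heads, args.seq_len, args.dim, dtype=torch.float16)
        v = torch.randn(args.batch, args.heads, args.seq_len, args.dim, dtype=torch.float16)
    
        out = ref_program(q, k, v)
        print("reference output shape:", tuple(out.shape))
    
    if __name__ == "__main__":
        main()
    ```
    
    A kernel output would then be checked against this baseline with something like `torch.testing.assert_close` at a loosened tolerance, since FP8 precision widens the acceptable error.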
    
    * lint fix
    
    * optimize quick start
    
    * lint fix