    [Bugfix] Fix layout conflict issue for gqa decoding examples (#314) · 0fd82ed5
    Lei Wang authored
    * Remove a logging statement from the `Substitute` method of `LoopVectorizerDynamic` for cleaner output.
    
    * Refactor flashattn example to improve CUDA configuration handling
    
    - Updated the `flashattn` function in `example_gqa_decode.py` to use a heuristic configuration based on CUDA device capabilities, improving compatibility across GPU architectures.
    - Replaced local variable allocations with more efficient constructs and removed unnecessary logging statements for cleaner output.
    - Adjusted the `do_bench` method call to streamline performance profiling.
    
    * Fix lint issues
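    The device-capability heuristic mentioned above can be pictured with a minimal sketch like the one below. It assumes the config is chosen from the CUDA compute capability `(major, minor)`. The function name `get_heuristic_config`, the dictionary keys, and every numeric value are hypothetical placeholders, not the values used in `example_gqa_decode.py`.

    ```python
    from typing import Dict, Tuple


    def get_heuristic_config(capability: Tuple[int, int]) -> Dict[str, int]:
        """Return an illustrative kernel config for a CUDA compute capability.

        Hypothetical sketch: the names and numbers are placeholders,
        not the configuration used in example_gqa_decode.py.
        """
        major, minor = capability
        sm_version = major * 10 + minor
        if sm_version >= 90:
            # Hopper-class GPUs: allow a deeper software pipeline.
            return {"block_N": 128, "block_H": 64, "num_stages": 2, "threads": 256}
        if sm_version == 89:
            # Ada-class GPUs have less shared memory per block than sm_80/sm_90,
            # so this sketch disables multi-stage pipelining.
            return {"block_N": 128, "block_H": 64, "num_stages": 0, "threads": 128}
        # Fallback for Ampere and other architectures.
        return {"block_N": 64, "block_H": 64, "num_stages": 2, "threads": 128}
    ```

    On a machine with PyTorch, the capability tuple could come from `torch.cuda.get_device_capability()`. Keying on the capability lets one script pick a config that fits each architecture instead of hard-coding values tuned for a single GPU.
    
    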
example_gqa_decode.py 21.6 KB