examples/flash_decoding/example_gqa_decode.py · 0fd82ed51f4da7533cfd0f251f1eecfde9d39477 · OpenDAS / tilelang

[Bugfix] Fix layout conflict issue for gqa decoding examples (#314) · 0fd82ed5

Lei Wang authored Apr 01, 2025

* Remove logging statement from LoopVectorizerDynamic Substitute method for cleaner output.

* Refactor flashattn example to improve CUDA configuration handling

- Updated the `flashattn` function in `example_gqa_decode.py` to use a heuristic configuration chosen from the CUDA device's capabilities, improving compatibility across GPU architectures.
- Replaced local variable allocations with more efficient constructs and removed unnecessary logging statements for cleaner output.
- Adjusted the `do_bench` method call to streamline performance profiling.

* lint fix


example_gqa_decode.py 21.6 KB
