"...composable_kernel_onnxruntime.git" did not exist on "71d6b19d18e267bb6b8e04711bc37e241aaed55e"
[BugFix] Fix precision issue in GQA decode when block_N exceeds seqlen/num_split (#575)
* [CI] Add flash_decoding example to CI
* Add output of ref latency
* format example_gqa_decode.py
* [BugFix] Fix precision issue in GQA decode when block_N exceeds seqlen/num_split
* format example_gqa_decode.py
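
The commit message does not say what caused the precision issue, so the following is only a hedged sketch of one plausible mechanism, not the actual fix. It assumes a split-KV decode in which each split owns `seqlen / num_split` keys and walks them in tiles of `block_n` positions. When `block_n` is larger than the split, a tile overruns the split boundary, and the overrunning positions have to be masked out of the softmax. All names here (`split_decode`, `mask_tail`, `reference_attention`) are hypothetical and written in plain Python for one query vector and one head; none of them come from `example_gqa_decode.py`.

```python
import math

NEG_INF = float("-inf")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reference_attention(q, keys, values):
    """Plain softmax attention for one query vector."""
    scores = [dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def split_decode(q, keys, values, num_split, block_n, mask_tail=True):
    """Hypothetical split-KV decode with an online softmax per split.

    With mask_tail=False, tile positions past the split boundary are
    processed anyway (zero-padded past the end of the sequence), which
    is the assumed failure mode being illustrated.
    """
    seqlen = len(keys)
    assert seqlen % num_split == 0, "sketch assumes an even split"
    split_len = seqlen // num_split
    k_dim, v_dim = len(keys[0]), len(values[0])

    partials = []
    for s in range(num_split):
        start, end = s * split_len, (s + 1) * split_len
        m, l, acc = NEG_INF, 0.0, [0.0] * v_dim  # running max, sum, output
        for tile_start in range(start, end, block_n):
            for i in range(tile_start, tile_start + block_n):
                if i >= end and mask_tail:
                    continue  # same effect as setting the score to -inf
                k = keys[i] if i < seqlen else [0.0] * k_dim
                v = values[i] if i < seqlen else [0.0] * v_dim
                score = dot(q, k)
                new_m = max(m, score)
                scale = math.exp(m - new_m)  # rescale earlier terms
                p = math.exp(score - new_m)
                l = l * scale + p
                acc = [a * scale + p * x for a, x in zip(acc, v)]
                m = new_m
        partials.append((m, l, acc))

    # Merge the per-split partial results using a shared maximum.
    top = max(m for m, _, _ in partials)
    denom = sum(math.exp(m - top) * l for m, l, _ in partials)
    return [sum(math.exp(m - top) * acc[d] for m, _, acc in partials) / denom
            for d in range(v_dim)]


q = [1.0, 0.5]
keys = [[0.1 * i, -0.05 * i] for i in range(8)]
values = [[float(i), 1.0] for i in range(8)]

ref = reference_attention(q, keys, values)
# block_n = 8 exceeds seqlen / num_split = 4, the case named in the commit title.
good = split_decode(q, keys, values, num_split=2, block_n=8)
bad = split_decode(q, keys, values, num_split=2, block_n=8, mask_tail=False)
print(ref, good, bad)
```

In this toy setup the masked run agrees with the reference, while the unmasked run counts keys from the neighbouring split and from padding, so its output drifts. Whether the real kernel failed in this way is an assumption.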