Grouped batched attention + permute (#412)
* grouped attn without batch validates; now move toward grouped batched attn
* grouped batched attention
* working
* remove debug logging; clean up
* reintroduce g_ prefix on host tensor variables
* format
* rename file
* restore old file
* rename
* consolidate padded/non-padded attention example
* harmonize padding specialization in attn examples
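For context, here is a minimal NumPy sketch of what "grouped batched attention" computes, assuming "grouped" carries the grouped-GEMM sense used elsewhere in this codebase (a list of independent attention problems, each with its own batch size and sequence lengths). All names, shapes, and the `grouped_batched_attention` helper are illustrative, not the kernel API added by this PR.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def batched_attention(q, k, v):
    # q: [B, H, M, K], k: [B, H, N, K], v: [B, H, N, O]
    scale = 1.0 / np.sqrt(q.shape[-1])
    s = np.einsum("bhmk,bhnk->bhmn", q, k) * scale  # attention scores
    p = softmax(s, axis=-1)                          # attention weights
    return np.einsum("bhmn,bhno->bhmo", p, v)        # [B, H, M, O]

def grouped_batched_attention(problems):
    # "Grouped": each entry is an independent (q, k, v) problem that may
    # have different batch/sequence sizes (assumption about terminology).
    return [batched_attention(q, k, v) for (q, k, v) in problems]

rng = np.random.default_rng(0)
group = [
    (rng.standard_normal((2, 4, 128, 64)),   # Q
     rng.standard_normal((2, 4, 256, 64)),   # K
     rng.standard_normal((2, 4, 256, 64)))   # V
    for _ in range(3)
]
outs = grouped_batched_attention(group)
print([o.shape for o in outs])  # each problem yields (2, 4, 128, 64)
```

A real kernel would launch all problems in one grid rather than looping on the host; the sketch only pins down the reference semantics.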