Attn bwd develop qloop (#720)
* fix decoder tensor transfer related issues
* prototype1 Q loop direction w/ layout change
* remove useless templates
* add OutputDataType&Deterministic for pt1q1
* add OutputDataType&Deterministic for pt1q2
---------
Co-authored-by: danyao12 <danyao12@amd.com>