Add load tile overload that accepts the output tensor as a parameter.
* This gives an 8% perf boost at the cost of using more registers.
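
For illustration, the difference between a value-returning load and an overload that writes into a caller-provided output could look roughly like the sketch below. This is not the code from this commit: `Tile`, both `load_tile` signatures, and the plain pointer source are hypothetical stand-ins, and the 8% figure applies to the real kernel, not to this host-side example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical tile type: a small fixed-size buffer standing in for the
// output tensor mentioned in the commit message.
template <typename T, std::size_t N>
struct Tile {
    std::array<T, N> data{};
};

// Value-returning form (assumed baseline): builds a new tile on every call.
template <typename T, std::size_t N>
Tile<T, N> load_tile(const T* src) {
    assert(src != nullptr);
    Tile<T, N> tile;
    for (std::size_t i = 0; i < N; ++i) {
        tile.data[i] = src[i];
    }
    return tile;
}

// Overload that accepts the output as a parameter (assumed shape): the
// caller owns the destination and the load writes straight into it.
template <typename T, std::size_t N>
void load_tile(Tile<T, N>& out, const T* src) {
    assert(src != nullptr);
    for (std::size_t i = 0; i < N; ++i) {
        out.data[i] = src[i];
    }
}
```

With this shape, a caller can declare the destination once and pass it to each load, for example `Tile<float, 4> t; load_tile(t, src);`, while the existing form stays available as `load_tile<float, 4>(src)`.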