Use ford/for instead of static_ford/static_for in threadwise copy; for reasons not yet understood, this greatly reduces register spilling on AMD
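
For readers unfamiliar with the distinction, the following is a minimal sketch of the two loop styles, not the repository's actual code. It assumes that static_for is a compile-time loop that hands each index to the body as a distinct type, and that the replacement is an ordinary runtime loop. The names `static_for_sketch`, `copy_unrolled`, and `copy_looped` are hypothetical and exist only for this illustration (C++17).

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a compile-time loop (not the library's static_for):
// expands into N separate calls, each receiving its index as a distinct
// std::integral_constant type.
template <class F, std::size_t... Is>
constexpr void static_for_sketch_impl(F& f, std::index_sequence<Is...>)
{
    (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t N, class F>
constexpr void static_for_sketch(F f)
{
    static_for_sketch_impl(f, std::make_index_sequence<N>{});
}

// Compile-time style: the body is instantiated once per index, so the
// compiler sees N independent statements with constant indices.
template <std::size_t N>
void copy_unrolled(const float* src, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    static_for_sketch<N>([&](auto i) {
        constexpr std::size_t idx = decltype(i)::value;
        dst[idx] = src[idx];
    });
}

// Runtime style: a plain loop; whether and how far to unroll is left
// to the compiler.
template <std::size_t N>
void copy_looped(const float* src, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < N; ++i)
    {
        dst[i] = src[i];
    }
}
```

Both functions produce the same result; they differ only in how the iteration is presented to the compiler. The commit reports that the runtime form spills far fewer registers on AMD and does not identify the cause.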