Change the kernel tile size to 128x128x64 and use mfma_32x32x4 instructions.
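The tile change can be checked with simple arithmetic: a 128x128x64 tile decomposes into a fixed number of mfma_32x32x4 issues. This is a minimal sketch, assuming the tile dimensions are M x N x K and that each instruction is a single-block variant producing one 32x32 output block with K = 4. Multi-block MFMA variants would change the count. All constant and function names are hypothetical.

```cpp
// Hypothetical names. Assumption: the tile is M x N x K = 128 x 128 x 64.
constexpr int kTileM = 128;
constexpr int kTileN = 128;
constexpr int kTileK = 64;

// Assumption: one mfma_32x32x4 issue computes a single 32x32 output block
// and consumes K = 4 (single-block variant).
constexpr int kMfmaM = 32;
constexpr int kMfmaN = 32;
constexpr int kMfmaK = 4;

static_assert(kTileM % kMfmaM == 0 && kTileN % kMfmaN == 0 && kTileK % kMfmaK == 0,
              "tile must be an exact multiple of the MFMA shape");

// Number of 32x32 output blocks covering the 128x128 output tile.
constexpr int output_blocks() { return (kTileM / kMfmaM) * (kTileN / kMfmaN); }

// MFMA issues needed to cover the full 128x128x64 tile,
// summed over all waves in the workgroup.
constexpr int mfma_per_tile() { return output_blocks() * (kTileK / kMfmaK); }
```

Under these assumptions the tile has 16 output blocks and needs 256 instruction issues (16 blocks times 64 / 4 = 16 K-steps).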
Add int4 + scale support based on Jing Zhang's pk_i4. Compilation passes; functional tests pass.
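The pk_i4 + scale idea can be illustrated on the host side: two 4-bit integers share one byte and are dequantized by multiplying each with a scale factor. This is a minimal sketch, not the kernel's implementation. The helper names, the low-nibble-first order, and the signed two's-complement nibble encoding are assumptions; the actual pk_i4 type may use a different (for example, biased) encoding.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helpers. Assumptions: element 0 sits in the low nibble,
// element 1 in the high nibble, and each nibble is a signed two's-complement
// value in [-8, 7].

// Pack two signed 4-bit values into one byte.
inline std::uint8_t pack_i4(int lo, int hi) {
    assert(lo >= -8 && lo <= 7 && hi >= -8 && hi <= 7);
    return static_cast<std::uint8_t>((lo & 0x0F) | ((hi & 0x0F) << 4));
}

// Interpret a 4-bit pattern as a signed value in [-8, 7].
inline int sign_extend_i4(std::uint8_t nibble) {
    int v = nibble & 0x0F;
    return v >= 8 ? v - 16 : v;
}

// Unpack both elements and multiply by the scale (dequantization).
inline std::array<float, 2> dequant_pk_i4(std::uint8_t packed, float scale) {
    return {static_cast<float>(sign_extend_i4(packed & 0x0F)) * scale,
            static_cast<float>(sign_extend_i4(packed >> 4)) * scale};
}
```

For example, `dequant_pk_i4(pack_i4(-3, 7), 0.5f)` yields `{-1.5f, 3.5f}`.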