cuda: optimize memory access
Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4
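As a rough illustration of the idea (not the kernel from this commit), the sketch below loads four packed bytes with one 32-bit read and then unpacks the eight 4-bit codes from that word. It is plain host-side C++ so it can be run without a GPU. The helper name `dot_fp4_packed`, the sequential low-nibble-first packing, the little-endian byte order, and the pre-decoded float `scale` are all assumptions made for the example; the real block layout may order the nibbles differently. The value table is the FP4 E2M1 encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// FP4 E2M1 code -> value; the top bit of the 4-bit code is the sign.
static const float kFp4Values[16] = {
     0.0f,  0.5f,  1.0f,  1.5f,  2.0f,  3.0f,  4.0f,  6.0f,
    -0.0f, -0.5f, -1.0f, -1.5f, -2.0f, -3.0f, -4.0f, -6.0f,
};

// Hypothetical helper: dot product of n packed FP4 values with n floats.
// Assumes element k lives in byte k/2, low nibble first, on a
// little-endian machine, and that the shared scale is already a float.
float dot_fp4_packed(const uint8_t *q, const float *x, int n, float scale) {
    assert(n % 8 == 0);  // each iteration consumes 8 elements (4 bytes)
    float sum = 0.0f;
    for (int i = 0; i < n; i += 8) {
        // One 32-bit load replaces four separate 8-bit loads.
        // memcpy avoids alignment and aliasing problems on the host.
        uint32_t w;
        std::memcpy(&w, q + i / 2, sizeof(w));
        for (int j = 0; j < 8; ++j) {
            const uint32_t code = (w >> (4 * j)) & 0xF;  // j-th 4-bit code
            sum += kFp4Values[code] * x[i + j];
        }
    }
    return sum * scale;
}
```

In a device kernel the same pattern would typically read the word straight from global memory, which cuts the number of load instructions per thread by a factor of four, provided the address is suitably aligned for a 32-bit access.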