fix mllama conversion (#10716)
cross attention Q and K projections need to have their heads swapped, similar to the non-cross-attention Q and K tensors
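For illustration, a minimal Python sketch of the kind of head-swap permutation conversion scripts typically apply to Q and K weights. The `permute` helper mirrors the one used by llama.cpp-style HF-to-GGUF converters; the `convert_tensor` wrapper and the cross-attention tensor-name suffixes are hypothetical stand-ins, not the actual code changed here:

```python
import numpy as np

def permute(weights: np.ndarray, n_head: int, n_head_kv: int | None = None) -> np.ndarray:
    # Swap the two halves of each attention head's rows so the
    # rotary-embedding layout matches what the runtime expects.
    if n_head_kv is not None and n_head != n_head_kv:
        n_head = n_head_kv
    return (weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:])
                   .swapaxes(1, 2)
                   .reshape(weights.shape))

def convert_tensor(name: str, data: np.ndarray, n_head: int, n_head_kv: int) -> np.ndarray:
    # The gist of the fix: apply the same permutation to the
    # cross-attention Q/K projections that is already applied to the
    # self-attention ones (tensor names here are illustrative).
    if name.endswith("cross_attn.q_proj.weight"):
        return permute(data, n_head)
    if name.endswith("cross_attn.k_proj.weight"):
        return permute(data, n_head, n_head_kv)
    return data
```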