Michael Yang authored
cross attention Q and K projections need to have their heads swapped, similar to non-cross attention Q and K tensors
55760195
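The commit message does not show the swap itself. As a rough sketch, assuming it is the usual per-head reordering applied to LLaMA-style Q and K weights during conversion (interleaving the two halves of each head's rows), the row permutation could look like the following. The function name `permute_head_rows` and the toy sizes are hypothetical and not taken from the commit.

```python
def permute_head_rows(rows, n_head):
    """Reorder the rows of a Q or K projection, one head at a time.

    Assumption: within each head, the first half and second half of the
    rows are interleaved. This is equivalent to reshaping the weight to
    (n_head, 2, head_dim // 2, ...) and swapping the two middle axes.
    """
    total = len(rows)
    if total % n_head != 0:
        raise ValueError("row count must be divisible by n_head")
    head_dim = total // n_head
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    half = head_dim // 2

    out = []
    for h in range(n_head):
        base = h * head_dim
        for i in range(half):
            out.append(rows[base + i])         # row from the first half
            out.append(rows[base + half + i])  # matching row from the second half
    return out


# Toy example: 2 heads with 4 rows each, rows labelled by their index.
permuted = permute_head_rows(list(range(8)), n_head=2)
print(permuted)  # [0, 2, 1, 3, 4, 6, 5, 7]
```

Under this assumption, the same reordering would be applied to the cross attention Q and K projections as to the self-attention ones, so both agree on the row layout within each head.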