"llama/patches/0021-decode-disable-output_all.patch" did not exist on "4987f13d345d77844b6737edadaa1f0432df004c"
-
Michael Yang authored (commit 55760195):
cross-attention Q and K projections need to have their heads swapped, similar to the non-cross-attention Q and K tensors
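
A minimal sketch of what such a head swap typically looks like in an HF-to-GGUF style weight conversion. Everything here is an illustrative assumption (the function name permute_qk, the use of NumPy, and the (n_heads * head_dim, hidden_dim) weight shape); it is not the commit's actual code, only the standard permutation commonly applied to Q/K projection weights so that rotary-embedding pairs land in the layout the runtime expects.

    import numpy as np

    def permute_qk(weights: np.ndarray, n_heads: int) -> np.ndarray:
        # Reorder the rows of each head: reshape into
        # (n_heads, 2, head_dim // 2, hidden_dim), then swap the
        # middle two axes so interleaved rotary pairs become the
        # split-half layout, and flatten back to the original shape.
        dim0 = weights.shape[0]
        return (weights
                .reshape(n_heads, 2, dim0 // n_heads // 2, *weights.shape[1:])
                .swapaxes(1, 2)
                .reshape(weights.shape))

    # Per the commit message, the same permutation applied to the
    # self-attention Q/K weights would also be applied to the
    # cross-attention Q/K projection weights.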