Refactor the attention functions.
There's no reason for the whole CrossAttention object to be repeated when only the operation in the middle changes.
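
A minimal sketch of the idea, not the actual code from this commit: one CrossAttention class holds the shared logic and takes the middle operation as a plain function. The names `attention_basic`, `attention_chunked`, and the `attention_fn` parameter are hypothetical, the projections are left out (treated as identity), and pure-Python lists stand in for tensors so the example runs without extra dependencies.

```python
import math
from typing import Callable, List, Optional

Matrix = List[List[float]]
AttentionFn = Callable[[Matrix, Matrix, Matrix], Matrix]


def _softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_basic(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Plain scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = _softmax(scores)
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out


def attention_chunked(q: Matrix, k: Matrix, v: Matrix, chunk_size: int = 2) -> Matrix:
    # Hypothetical variant: same result, but queries are processed in chunks.
    out: Matrix = []
    for start in range(0, len(q), chunk_size):
        out.extend(attention_basic(q[start:start + chunk_size], k, v))
    return out


class CrossAttention:
    # One class for every variant; only the injected function differs.
    def __init__(self, attention_fn: AttentionFn = attention_basic) -> None:
        self.attention_fn = attention_fn

    def forward(self, x: Matrix, context: Optional[Matrix] = None) -> Matrix:
        context = x if context is None else context
        # Assumption: q/k/v projections are omitted to keep the sketch short.
        q, k, v = x, context, context
        return self.attention_fn(q, k, v)
```

Swapping the implementation then means passing a different function, for example `CrossAttention(attention_chunked)`, instead of defining another copy of the class.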