[`GPTNeoX`] Faster rotary embedding for GPTNeoX (based on llama changes) (#25830)
* Faster rotary embedding for GPTNeoX
* there might be unnecessary moves from device
* fixup
* fix dtype issue
* add copied from statements
* fix copies
* oupsy
* add copied from Llama for scaled ones as well
* fixup
* fix
* fix copies
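
For readers unfamiliar with the technique, here is a minimal sketch of the general idea behind a cached rotary embedding: compute the cos/sin tables once, then look them up by position instead of recomputing them per call. This is not the code from this change. It is plain Python rather than PyTorch, and the names `build_rotary_cache`, `rotate_half`, and `apply_rotary` as well as the default `base=10000.0` are assumptions chosen for illustration.

```python
import math


def build_rotary_cache(dim, max_positions, base=10000.0):
    """Precompute cos/sin tables once; each table is [max_positions][dim]."""
    # One inverse frequency per pair of channels (hypothetical default base).
    inv_freq = [1.0 / (base ** (2 * i / dim)) for i in range(dim // 2)]
    cos_cache, sin_cache = [], []
    for pos in range(max_positions):
        angles = [pos * f for f in inv_freq]
        # Duplicate the angles so the table covers both halves of the vector.
        angles = angles + angles
        cos_cache.append([math.cos(a) for a in angles])
        sin_cache.append([math.sin(a) for a in angles])
    return cos_cache, sin_cache


def rotate_half(x):
    """Swap the two halves of x and negate the (originally) second half."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary(x, position, cos_cache, sin_cache):
    """Rotate x using the cached row for `position` (a lookup, no trig calls)."""
    cos, sin = cos_cache[position], sin_cache[position]
    rotated = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rotated, cos, sin)]


if __name__ == "__main__":
    cos_cache, sin_cache = build_rotary_cache(dim=4, max_positions=8)
    q = [1.0, 2.0, 3.0, 4.0]
    print(apply_rotary(q, 3, cos_cache, sin_cache))
```

In this sketch the per-call cost is a table lookup plus elementwise multiplies; the trigonometric work happens only when the cache is built. A tensor-based implementation would apply the same lookup to whole batches of positions at once.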