discover: CPU supports flash attention
We already run flash attention on the CPU when a model is partially offloaded, but we were disabling it when running purely on the CPU. That restriction is unnecessary, so the CPU is now reported as supporting flash attention.
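For illustration only, the intended behavior might be sketched roughly as follows. This is not the code from this change: the `Device` type, its fields, and `flashAttentionSupported` are hypothetical names invented for the example, and the per-GPU capability is reduced to a single boolean.

```go
package main

// Device is a hypothetical, simplified description of a compute device.
// It is an assumption made for this sketch, not a type from the project.
type Device struct {
	Library                   string // "cpu" for the CPU backend; any other value is treated as a GPU backend
	GPUSupportsFlashAttention bool   // only consulted for non-CPU devices
}

// flashAttentionSupported reports whether flash attention can be enabled
// for the given set of devices. CPU devices always pass, so a pure-CPU
// configuration is handled the same way as the CPU share of a partially
// offloaded model. An empty device list returns true.
func flashAttentionSupported(devices []Device) bool {
	for _, d := range devices {
		if d.Library == "cpu" {
			// Previously a pure-CPU setup was rejected; the CPU is now accepted.
			continue
		}
		if !d.GPUSupportsFlashAttention {
			return false
		}
	}
	return true
}
```

Under these assumptions, `flashAttentionSupported([]Device{{Library: "cpu"}})` returns `true`, matching the pure-CPU case described above, while a GPU without support still disables the feature.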