discover: CPU supports flash attention

We already run flash attention on the CPU when a model is partially offloaded, but we were disabling it when running purely on the CPU, which is unnecessary.

Commit 8f4ec9ab authored Aug 11, 2025 by Jesse Gross; committed by Jesse Gross Aug 11, 2025

Showing with 2 additions and 1 deletion

discover/types.go +2 -1

--- a/discover/types.go
+++ b/discover/types.go
@@ -171,7 +171,8 @@ func (si SystemInfo) GetOptimalThreadCount() int {
 // For each GPU, check if it does NOT support flash attention
 func (l GpuInfoList) FlashAttentionSupported() bool {
 	for _, gpu := range l {
-		supportsFA := gpu.Library == "metal" ||
+		supportsFA := gpu.Library == "cpu" ||
+			gpu.Library == "metal" ||
 			(gpu.Library == "cuda" && gpu.DriverMajor >= 7) ||
 			gpu.Library == "rocm"
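For illustration, here is a minimal, self-contained sketch of the check after this change. It is not the actual ollama source: the `GpuInfo` struct below is a simplified, hypothetical stand-in that only carries the two fields the check reads, and because the diff hunk is truncated, the tail of the loop (return false on the first unsupported device, true otherwise) is an assumption inferred from the "check if it does NOT support" comment.

```go
package main

import "fmt"

// GpuInfo is a hypothetical, simplified stand-in for the device records in
// the discover package; only the fields read by the check are included.
type GpuInfo struct {
	Library     string // backend name, e.g. "cpu", "metal", "cuda", "rocm"
	DriverMajor int    // major version consulted by the CUDA clause
}

// GpuInfoList mirrors the list type the diff attaches the method to.
type GpuInfoList []GpuInfo

// FlashAttentionSupported reports whether every device in the list can run
// flash attention. Assumption: one unsupported device disables it for all.
func (l GpuInfoList) FlashAttentionSupported() bool {
	for _, gpu := range l {
		supportsFA := gpu.Library == "cpu" || // clause added by this commit
			gpu.Library == "metal" ||
			(gpu.Library == "cuda" && gpu.DriverMajor >= 7) ||
			gpu.Library == "rocm"
		if !supportsFA {
			return false // assumed early exit on the first unsupported device
		}
	}
	return true
}

func main() {
	// Pure-CPU run: before this commit the "cpu" clause was absent,
	// so this list would have reported false.
	cpuOnly := GpuInfoList{{Library: "cpu"}}
	// A CUDA device below the version threshold still disables flash attention.
	mixed := GpuInfoList{{Library: "cpu"}, {Library: "cuda", DriverMajor: 6}}

	fmt.Println(cpuOnly.FlashAttentionSupported())
	fmt.Println(mixed.FlashAttentionSupported())
}
```

Under these assumptions the first call prints true and the second prints false: adding the CPU to the allowed libraries only changes the outcome when every other device in the list already qualifies.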