- 06 Dec, 2025 1 commit
Daniel Hiltgen authored
Follow up from #12992. Free all streams, and keep the alloc logic aligned across streams.
- 04 Dec, 2025 1 commit
Daniel Hiltgen authored
* Revert "vulkan: temporary cary of vulkan fixes (#12971)"
  This reverts commit 3a9e8e9f.
* ggml update to b7087
* fix argsort on metal
* update to b7108
* fix bakllava regression
  This model lacks the metadata for the projector type.
* update to b7209
* fix TopK perf
* only build arm code on arm
- 19 Nov, 2025 1 commit
Michael Yang authored
cuda panics on batches larger than 1024, so skip those and fall back to cpu