- 27 Aug, 2024 4 commits
-
-
Sean Khatiri authored
-
Patrick Devine authored
-
Patrick Devine authored
-
Jeffrey Morgan authored
-
- 25 Aug, 2024 1 commit
-
-
Daniel Hiltgen authored
The numa flag may be having a performance impact on multi-socket systems with GPU loads
-
- 23 Aug, 2024 6 commits
-
-
Daniel Hiltgen authored
The recent cuda variant changes uncovered a bug in ByLibrary which failed to group by common variant for GPU types.
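As a rough illustration of grouping by library and variant, the sketch below buckets GPUs under a key built from both fields. The `gpu` type, its fields, and `groupByLibrary` are hypothetical names for this example, not the project's actual definitions.

```go
package main

import "fmt"

// gpu is a hypothetical, simplified description of a detected GPU.
type gpu struct {
	ID      string
	Library string // e.g. "cuda" or "rocm"
	Variant string // e.g. "v11" or "v12"; may be empty
}

// groupByLibrary buckets GPUs that share both library and variant.
// Groups are returned in order of first appearance.
func groupByLibrary(gpus []gpu) [][]gpu {
	index := map[string]int{} // key -> position in groups
	var groups [][]gpu
	for _, g := range gpus {
		key := g.Library
		if g.Variant != "" {
			key += "_" + g.Variant
		}
		i, ok := index[key]
		if !ok {
			i = len(groups)
			index[key] = i
			groups = append(groups, nil)
		}
		groups[i] = append(groups[i], g)
	}
	return groups
}

func main() {
	groups := groupByLibrary([]gpu{
		{ID: "0", Library: "cuda", Variant: "v12"},
		{ID: "1", Library: "cuda", Variant: "v11"},
		{ID: "2", Library: "cuda", Variant: "v12"},
	})
	for _, grp := range groups {
		fmt.Println(len(grp), grp[0].Library, grp[0].Variant)
	}
}
```

Keying on the library alone would place the v11 and v12 devices in the same bucket, which is the kind of mismatch the commit message describes.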
-
Michael Yang authored
update faq
-
Michael Yang authored
-
Patrick Devine authored
-
Daniel Hiltgen authored
During rebasing, the ordering was inverted, which broke the cuda version selection logic: the driver version was incorrectly evaluated as zero, causing a downgrade to v11.
-
Daniel Hiltgen authored
The define changed recently, and this usage slipped through the cracks with the old name.
-
- 22 Aug, 2024 1 commit
-
-
Daniel Hiltgen authored
* Fix embeddings memory corruption: The patch was leading to a buffer overrun corruption. Once it was removed, though, parallelism in server.cpp led to hitting an assert due to slot/seq IDs being >= token count. To work around this, only use slot 0 for embeddings.
* Fix embed integration test assumption: The token eval count has changed with recent llama.cpp bumps (0.3.5+).
-
- 21 Aug, 2024 8 commits
-
-
Michael Yang authored
convert: update llama conversion for llama3.1
-
Michael Yang authored
-
Michael Yang authored
convert gemma2
-
Michael Yang authored
convert bert model from safetensors
-
Michael Yang authored
fix: chmod new layer to 0o644 when creating it
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 20 Aug, 2024 1 commit
-
-
Daniel Hiltgen authored
We're over budget for GitHub's maximum release artifact size with rocm + 2 cuda versions. This splits rocm back out as a discrete artifact, but keeps the layout so it can be extracted into the same location as the main bundle.
-
- 19 Aug, 2024 17 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Fix overlapping artifact name on CI
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Cuda v12
-
Daniel Hiltgen authored
Override numParallel in pickBestPartialFitByLibrary() only if unset.
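The "only if unset" pattern could look roughly like the sketch below. It assumes that an unset value is represented as zero or a negative number. `resolveParallel` and its parameters are hypothetical and do not reflect the real signature of `pickBestPartialFitByLibrary()`.

```go
package main

import "fmt"

// resolveParallel applies fallback only when the caller has not chosen
// a value. Assumption: "unset" is encoded as a value <= 0.
func resolveParallel(numParallel *int, fallback int) {
	if *numParallel <= 0 {
		*numParallel = fallback
	}
}

func main() {
	unset := 0
	resolveParallel(&unset, 4)
	fmt.Println(unset) // fallback applied

	explicit := 2
	resolveParallel(&explicit, 4)
	fmt.Println(explicit) // caller's choice preserved
}
```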
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Based on compute capability and driver version, pick v12 or v11 cuda variants.
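A selection rule of this kind might be sketched as follows. The `gpuInfo` type, the function name, and both thresholds are assumptions made for illustration. The commit message does not state the actual cutoffs.

```go
package main

import "fmt"

// gpuInfo is a hypothetical, simplified view of what GPU discovery reports.
type gpuInfo struct {
	ComputeMajor int // major compute capability
	DriverMajor  int // major driver version; 0 if unknown
}

// Illustrative thresholds only; the real cutoffs are not given in the log.
const (
	minComputeMajorForV12 = 6
	minDriverMajorForV12  = 12
)

// pickCudaVariant returns "v12" only when both the GPU and the driver
// meet the assumed minimums, and falls back to "v11" otherwise.
func pickCudaVariant(g gpuInfo) string {
	if g.ComputeMajor < minComputeMajorForV12 || g.DriverMajor < minDriverMajorForV12 {
		return "v11"
	}
	return "v12"
}

func main() {
	fmt.Println(pickCudaVariant(gpuInfo{ComputeMajor: 8, DriverMajor: 12}))
	fmt.Println(pickCudaVariant(gpuInfo{ComputeMajor: 5, DriverMajor: 12}))
}
```

With a rule shaped like this, a driver version that reads as zero always selects v11, so the driver version needs to be populated before the variant is chosen.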
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This adds new variants for arm64 specific to Jetson platforms
-
Daniel Hiltgen authored
This should help speed things up a little
-
Daniel Hiltgen authored
This adjusts linux to follow a similar model to windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model, where the go binary is fully self contained.
-
Jeffrey Morgan authored
-
- 18 Aug, 2024 2 commits
-
-
Richard Lyons authored
-
Richard Lyons authored
-