- 04 Apr, 2024 5 commits
Daniel Hiltgen authored
CI missing archive

Daniel Hiltgen authored

Daniel Hiltgen authored
CI subprocess path fix

Daniel Hiltgen authored

Daniel Hiltgen authored
Fix CI release glitches
- 03 Apr, 2024 8 commits
Daniel Hiltgen authored
The subprocess change moved the build directory; arm64 builds weren't setting cross-compilation flags when building on x86.

Michael Yang authored
update graph size estimate

Michael Yang authored

Jeffrey Morgan authored

Michael Yang authored
default head_kv to 1
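A minimal sketch of what such a default looks like, assuming (this is an inference, not stated in the commit) that downstream size estimates divide by the KV head count and must never see a zero; the key name and map shape are hypothetical, not Ollama's actual metadata API:

```go
package llm

// headCountKV falls back to 1 when the model metadata omits the KV head
// count. A missing entry would otherwise read as 0, and any estimate
// that divides by it (e.g. a grouped-query attention ratio) would
// divide by zero.
func headCountKV(kv map[string]uint64) uint64 {
	if n := kv["head_count_kv"]; n > 0 {
		return n
	}
	return 1 // safe default when the metadata omits the key
}
```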
Blake Mizerany authored
This also moves the checkServerHeartbeat call out of the "RunE" Cobra machinery to the call site, where it runs after the check for OLLAMA_MODELS. That lets the helpful error message print before the server heartbeat check, and arguably makes the code more readable without the magic/superfluous "pre" function caller.
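In Cobra terms the change looks roughly like the sketch below; checkServerHeartbeat comes from the commit message, while the command name and the other helpers are hypothetical stand-ins, not the actual Ollama code:

```go
package cmd

import "github.com/spf13/cobra"

// Hypothetical stubs standing in for the real functions.
func checkModelsDir() error                               { return nil }
func checkServerHeartbeat(*cobra.Command, []string) error { return nil }
func run(*cobra.Command, []string) error                  { return nil }

func newRunCmd() *cobra.Command {
	return &cobra.Command{
		Use: "run MODEL",
		// Before: PreRunE: checkServerHeartbeat, which fired ahead of any
		// other validation and hid the friendlier OLLAMA_MODELS error.
		RunE: func(cmd *cobra.Command, args []string) error {
			if err := checkModelsDir(); err != nil {
				return err // helpful OLLAMA_MODELS error prints first
			}
			if err := checkServerHeartbeat(cmd, args); err != nil {
				return err
			}
			return run(cmd, args)
		},
	}
}
```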
Daniel Hiltgen authored
Fix numgpu opt miscomparison

Pier Francesco Contino authored
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
- 02 Apr, 2024 8 commits
Daniel Hiltgen authored

Michael Yang authored

Michael Yang authored
fix metal gpu

Michael Yang authored

Daniel Hiltgen authored
Bump llama.cpp to b2581

Daniel Hiltgen authored

Daniel Hiltgen authored

Daniel Hiltgen authored
Switch back to subprocessing for llama.cpp
- 01 Apr, 2024 17 commits
Daniel Hiltgen authored

Daniel Hiltgen authored
Leaving the cudart library loaded kept ~30 MB of memory pinned in the GPU in the main process. This change ensures we don't hold GPU resources when idle.
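The pattern is roughly the one sketched below with plain dlopen/dlclose through cgo; the library name and the elided query are assumptions, not Ollama's actual discovery code. The point is that the handle is released before returning, so nothing stays pinned between queries:

```go
package gpu

/*
#cgo LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h>
*/
import "C"

import (
	"errors"
	"unsafe"
)

// queryVRAM loads libcudart just long enough to ask a question, then
// dlcloses it so no GPU allocation stays pinned in the main process.
func queryVRAM() error {
	name := C.CString("libcudart.so")
	defer C.free(unsafe.Pointer(name))

	handle := C.dlopen(name, C.RTLD_LAZY)
	if handle == nil {
		return errors.New("unable to load cudart")
	}
	defer C.dlclose(handle) // releasing the handle frees the pinned memory

	// ... dlsym the query functions and read free/total VRAM here ...
	return nil
}
```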
Daniel Hiltgen authored
We may have users who run into problems with our current payload model, so this gives us an escape valve.
Daniel Hiltgen authored
"cudart init failure: 35" isn't particularly helpful in the logs.
Daniel Hiltgen authored
Cleaner shutdown logic, a bit of response hardening
Daniel Hiltgen authored

Daniel Hiltgen authored
This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process, shut it down when idle, and gracefully restart it if it has problems. This also serves as a first step toward running multiple copies to support multiple models concurrently.
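The runner model implied here, as a rough sketch; the binary name, arguments, and backoff are assumptions, not the actual implementation. The server supervises llama.cpp as a child process: a crash is isolated from the main process and answered with a restart, and stopping the loop when idle releases everything the child held:

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

// superviseRunner keeps a llama.cpp child process alive, restarting it
// on crash; the done channel signals idle shutdown, which stops the
// loop before the next (re)start.
func superviseRunner(done <-chan struct{}) {
	for {
		select {
		case <-done: // server went idle; stop restarting the runner
			return
		default:
		}
		cmd := exec.Command("./llama-runner", "--port", "50051") // made-up path/args
		if err := cmd.Start(); err != nil {
			log.Printf("runner failed to start: %v", err)
			return
		}
		if err := cmd.Wait(); err != nil {
			log.Printf("runner exited: %v; restarting", err)
		}
		time.Sleep(time.Second) // simple backoff before the restart
	}
}
```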
Patrick Devine authored

Michael Yang authored
update memory estimations for gpu offloading

Michael Yang authored
refactor model parsing

Michael Yang authored
fix generate output

Michael Yang authored

Michael Yang authored
count each layer independently when deciding gpu offloading
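A minimal sketch of per-layer counting, with hypothetical sizes and VRAM budget: rather than assuming uniformly sized layers, walk the actual layer sizes and offload as many whole layers as fit in the remaining free VRAM:

```go
package main

import "fmt"

// layersThatFit counts, layer by layer, how many whole layers fit in
// the available VRAM. Counting each layer independently matters because
// layers are not uniformly sized, so a total-divided-by-layer-count
// average can over- or under-commit the GPU.
func layersThatFit(layerSizes []uint64, freeVRAM uint64) int {
	var used uint64
	for i, size := range layerSizes {
		if used+size > freeVRAM {
			return i // the first i layers fit
		}
		used += size
	}
	return len(layerSizes) // everything fits; fully offload
}

func main() {
	// Hypothetical per-layer sizes in bytes and a 2 GiB budget.
	sizes := []uint64{512 << 20, 256 << 20, 256 << 20, 1024 << 20}
	fmt.Println(layersThatFit(sizes, 2<<30)) // prints 3
}
```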
Michael Yang authored

Philipp Gillé authored

Saifeddine ALOUI authored

Jesse Zhang authored
Corrective Retrieval Augmented Generation Demo, powered by LangGraph and Streamlit
🤗 Supports Ollama and OpenAI APIs
- 31 Mar, 2024 2 commits
Yaroslav authored
Plugins list updated

sugarforever authored
* Community Integration: ChatOllama
* fixed typo