- 22 Jan, 2024 3 commits
-
Daniel Hiltgen authored
This wires up logging in llama.cpp to always go to stderr, and also increases verbosity when OLLAMA_DEBUG is set.
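A minimal Go sketch of the OLLAMA_DEBUG check, assuming an slog-style logger; the function name and wiring are illustrative, not the actual patch:

```go
package main

import (
	"log/slog"
	"os"
)

// newLogger always writes to stderr and raises the level to Debug when
// OLLAMA_DEBUG is set; a Go-side stand-in for where llama.cpp's log
// callback gets registered.
func newLogger() *slog.Logger {
	level := slog.LevelInfo
	if os.Getenv("OLLAMA_DEBUG") != "" {
		level = slog.LevelDebug
	}
	return slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: level}))
}

func main() {
	newLogger().Info("llama.cpp logging wired to stderr")
}
```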
-
Daniel Hiltgen authored
-
Jeffrey Morgan authored
-
- 21 Jan, 2024 3 commits
-
Daniel Hiltgen authored
Detect potential error scenarios so we can fall back to CPU mode without hitting asserts.
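A sketch of the shape of that fallback; probeGPU and the mode names are placeholders for the real checks:

```go
package main

import (
	"errors"
	"log/slog"
)

// probeGPU stands in for the real checks (driver present, enough VRAM,
// library loads cleanly); here it is a stub that always fails.
func probeGPU() error {
	return errors.New("no usable GPU detected")
}

// pickMode detects the error scenario up front and falls back to CPU mode
// rather than letting the native library hit an assert.
func pickMode() string {
	if err := probeGPU(); err != nil {
		slog.Warn("falling back to CPU mode", "error", err)
		return "cpu"
	}
	return "gpu"
}

func main() {
	slog.Info("selected runner", "mode", pickMode())
}
```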
-
Daniel Hiltgen authored
The Linux build now supports parallel CPU builds to speed things up. This also exposes AMD GPU targets as an optional setting for advanced users who want to alter our default set.
-
Jeffrey Morgan authored
-
- 20 Jan, 2024 1 commit
-
Jeffrey Morgan authored
-
- 19 Jan, 2024 4 commits
-
Daniel Hiltgen authored
-
Jeffrey Morgan authored
-
Self Denial authored
-
Self Denial authored
Update gpu.go initGPUHandles() to declare the gpuHandles variable before reading it. This resolves an "invalid memory address or nil pointer dereference" error. Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under __TERMUX__ (Android).
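For illustration, the class of bug the gpu.go change fixes, with simplified names:

```go
package main

import "fmt"

type handles struct {
	deviceCount int
}

// initGPUHandles illustrates the bug class: reading through a pointer that
// was never initialized panics with "invalid memory address or nil pointer
// dereference". Declaring and initializing the value before any field is
// read avoids the panic.
func initGPUHandles() *handles {
	h := &handles{} // initialized before it is read
	h.deviceCount = 0
	return h
}

func main() {
	fmt.Println(initGPUHandles().deviceCount)
}
```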
-
- 18 Jan, 2024 1 commit
-
Daniel Hiltgen authored
A few obvious levels were adjusted, but generally everything was mapped to the "info" level.
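A sketch of such a mapping; the level constants here are placeholders, not llama.cpp's actual enum values:

```go
package main

import (
	"fmt"
	"log/slog"
)

// Level names mirror llama.cpp's log levels; the numeric values are
// placeholders for this sketch, not the upstream enum.
const (
	llamaLogError = iota
	llamaLogWarn
	llamaLogInfo
)

// mapLevel adjusts the few obvious levels and sends everything else to Info.
func mapLevel(level int) slog.Level {
	switch level {
	case llamaLogError:
		return slog.LevelError
	case llamaLogWarn:
		return slog.LevelWarn
	default:
		return slog.LevelInfo
	}
}

func main() {
	fmt.Println(mapLevel(llamaLogWarn)) // WARN
}
```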
-
- 17 Jan, 2024 1 commit
-
Daniel Hiltgen authored
This also refines the build process for the ext_server build.
-
- 16 Jan, 2024 2 commits
-
Daniel Hiltgen authored
Upstream llama.cpp has added a new dependency on the NVIDIA CUDA driver library (libcuda.so). This library is part of the driver distribution, not the general CUDA libraries, and is not available as an archive, so we cannot statically link it. This may introduce some additional compatibility challenges which we'll need to keep an eye on.
-
Bruce MacDonald authored
- prompt cache causes inference to hang after some time
-
- 14 Jan, 2024 3 commits
-
Daniel Hiltgen authored
-
Alexander F. Rødseth authored
-
Jeffrey Morgan authored
-
- 13 Jan, 2024 3 commits
-
Daniel Hiltgen authored
Make sure we're building an x86 ext_server lib when cross-compiling
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 12 Jan, 2024 2 commits
-
Michael Yang authored
-
Fabian Preiss authored
-
- 11 Jan, 2024 9 commits
-
Daniel Hiltgen authored
The memory changes and multi-variant change had some merge glitches I missed. This fixes them so we actually get the CPU llm lib and the best variant for the given system.
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Daniel Hiltgen authored
This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform
-
Daniel Hiltgen authored
This reduces the built-in Linux version to not use any vector extensions, which enables the resulting builds to run under Rosetta on macOS in Docker. At runtime it then checks the actual CPU vector extensions and loads the best CPU library available.
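A sketch of the runtime selection using golang.org/x/sys/cpu for feature detection; the variant names are illustrative:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// pickCPUVariant sketches the selection: the built-in baseline uses no
// vector extensions (so it also runs under Rosetta), and a faster variant
// is loaded only when the host actually supports it.
func pickCPUVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu" // baseline, no vector extensions
	}
}

func main() {
	fmt.Println("loading CPU library variant:", pickCPUVariant())
}
```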
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
In some cases we may want multiple variants for a given GPU type or CPU. This adds logic to have an optional Variant which we can use to select an optimal library, but also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.
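A sketch of that selection loop; the type and field names are illustrative, not the actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// DynLib pairs a library type with an optional Variant, e.g. a ROCm lib in
// "v5" and "v6" flavors, or CPU libs per vector extension.
type DynLib struct {
	Library string
	Variant string
}

// loadBest walks the candidates in preference order, skipping any that fail
// to load, so an incompatible variant (say ROCm v6 on a v5 system) simply
// falls through to the next one.
func loadBest(candidates []DynLib, load func(DynLib) error) (DynLib, error) {
	var errs error
	for _, c := range candidates {
		if err := load(c); err != nil {
			errs = errors.Join(errs, fmt.Errorf("%s/%s: %w", c.Library, c.Variant, err))
			continue
		}
		return c, nil
	}
	return DynLib{}, errs
}

func main() {
	best, err := loadBest(
		[]DynLib{{"rocm", "v6"}, {"rocm", "v5"}, {"cpu", "avx2"}},
		func(l DynLib) error {
			if l.Variant == "v6" {
				return errors.New("incompatible driver")
			}
			return nil
		})
	fmt.Println(best, err)
}
```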
-
Jeffrey Morgan authored
* increase minimum cuda overhead and fix minimum overhead for multi-gpu
* fix multi gpu overhead
* limit overhead to 10% of all gpus
* better wording
* allocate fixed amount before layers
* fixed only includes graph alloc
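A sketch of the capping rule from the list above; the 10% figure comes from the commit message, while the names and the fixed per-GPU amount are placeholders:

```go
package main

import "fmt"

// reservedOverhead reserves a fixed amount per GPU up front (covering the
// graph allocation), but caps the total at 10% of the combined free GPU
// memory across all GPUs.
func reservedOverhead(gpuFreeBytes []uint64, perGPUFixed uint64) uint64 {
	var total, overhead uint64
	for _, free := range gpuFreeBytes {
		total += free
		overhead += perGPUFixed
	}
	if limit := total / 10; overhead > limit {
		overhead = limit
	}
	return overhead
}

func main() {
	// two 24 GiB GPUs, 3 GiB fixed each: 6 GiB requested,
	// capped at 10% of 48 GiB (~4.8 GiB)
	const gib = uint64(1) << 30
	fmt.Println(reservedOverhead([]uint64{24 * gib, 24 * gib}, 3*gib))
}
```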
-
- 10 Jan, 2024 3 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
-
Jeffrey Morgan authored
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
-
- 09 Jan, 2024 5 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-