- 11 Jan, 2024 10 commits
-
-
Daniel Hiltgen authored
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
-
Eduard van Valkenburg authored
-
Michael Yang authored
add lint and test on pull_request
-
Daniel Hiltgen authored
This switches darwin to dynamic loading and refactors the code now that no static linking of the library is used on any platform.
-
Daniel Hiltgen authored
This reduces the built-in Linux version to not use any vector extensions, which enables the resulting builds to run under Rosetta on macOS in Docker. At runtime it then checks for the actual CPU vector extensions and loads the best CPU library available.
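A minimal sketch of that runtime selection in Go, using the real `golang.org/x/sys/cpu` package for feature detection; the variant names and payload layout are assumptions, not the actual build output:

```go
package main

import (
	"fmt"
	"path/filepath"

	"golang.org/x/sys/cpu"
)

// bestCPUVariant picks the most capable CPU build the host actually supports,
// falling back to the plain build (no vector extensions), e.g. under Rosetta.
func bestCPUVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	// Hypothetical payload layout: one shared library per CPU variant.
	lib := filepath.Join("payloads", bestCPUVariant(), "libext_server.so")
	fmt.Println("would load:", lib)
}
```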
-
Fabian Preiß authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
In some cases we may want multiple variants for a given GPU type or CPU. This adds logic for an optional Variant which we can use to select an optimal library, and also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.
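A rough sketch of the fallback idea in Go; the type and function names here are hypothetical, not the actual implementation:

```go
package llm

import "fmt"

// libVariant describes one build of the LLM library, e.g. "rocm_v6",
// "rocm_v5", or a CPU variant such as "cpu_avx2".
type libVariant struct {
	Name string
	Path string
}

// loadFirstWorking tries each variant in priority order and returns the first
// one that loads without error, so a ROCm v6 library that fails on a v5-only
// system simply falls through to the next candidate.
func loadFirstWorking(variants []libVariant, load func(path string) error) (libVariant, error) {
	var lastErr error
	for _, v := range variants {
		if err := load(v.Path); err != nil {
			lastErr = err
			continue
		}
		return v, nil
	}
	return libVariant{}, fmt.Errorf("no usable library variant: %w", lastErr)
}
```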
-
Jeffrey Morgan authored
* increase minimum CUDA overhead and fix minimum overhead for multi-GPU
* fix multi-GPU overhead
* limit overhead to 10% of all GPUs
* better wording
* allocate fixed amount before layers
* fixed amount only includes graph alloc
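A hedged sketch of that bookkeeping in Go; the constants and the exact relationship between the per-GPU minimum and the 10% cap are assumptions, not the actual values:

```go
package llm

const (
	minOverheadPerGPU = 512 << 20 // assumed per-GPU minimum reserve, in bytes
	overheadFraction  = 0.10      // cap total overhead at 10% of combined VRAM
)

// reservedOverhead returns how much VRAM to hold back before placing layers:
// a fixed minimum per GPU, but never more than 10% of all GPU memory combined.
func reservedOverhead(gpuTotalBytes []uint64) uint64 {
	var total uint64
	for _, t := range gpuTotalBytes {
		total += t
	}
	overhead := uint64(len(gpuTotalBytes)) * minOverheadPerGPU
	if limit := uint64(float64(total) * overheadFraction); overhead > limit {
		overhead = limit
	}
	return overhead
}
```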
-
- 10 Jan, 2024 6 commits
-
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Smarter GPU Management library detection
-
Daniel Hiltgen authored
When there are multiple management libraries installed on a system, not every one will be compatible with the current driver. This change improves our management library detection: we build up a set of candidate libraries from glob patterns, then try each one until we're able to load one without error.
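A simplified Go sketch of the glob-then-probe approach; the patterns shown are illustrative, not the exact ones used:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// discoverLibs expands each glob pattern and collects every match, building
// the candidate set of management libraries to probe.
func discoverLibs(patterns []string) []string {
	var found []string
	for _, p := range patterns {
		matches, err := filepath.Glob(p)
		if err != nil {
			continue // malformed pattern; skip it
		}
		found = append(found, matches...)
	}
	return found
}

func main() {
	// Example patterns only; the real list covers the common install locations.
	candidates := discoverLibs([]string{
		"/usr/lib/x86_64-linux-gnu/libnvidia-ml.so*",
		"/usr/lib*/libnvidia-ml.so*",
	})
	// Each candidate would then be loaded in turn until one works cleanly.
	fmt.Println(candidates)
}
```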
-
Daniel Hiltgen authored
This can help speed up incremental builds when you're only testing one architecture, like amd64. E.g. `BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:`
-
Jeffrey Morgan authored
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
-
Jeffrey Morgan authored
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
-
- 09 Jan, 2024 24 commits
-
-
Jeffrey Morgan authored
-
Robin Glauser authored
Fixed assistant in the example response.
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Set correct CUDA minimum compute capability version
-
Daniel Hiltgen authored
If you attempt to run the current CUDA build on compute capability 5.2 cards, you'll hit the following failure: `cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported`
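For illustration, a small Go helper for the kind of minimum-capability gate this implies; the 6.0 floor is an assumption for the sketch, the commit only establishes that compute 5.2 cards must be rejected up front rather than failing later in cuBLAS:

```go
package gpu

// computeCapability is a CUDA device's major.minor capability, e.g. 5.2 or 8.6.
type computeCapability struct {
	Major, Minor int
}

// Assumed floor for illustration only.
var minimumCapability = computeCapability{Major: 6, Minor: 0}

// usable reports whether a device meets the minimum compute capability.
func usable(cc computeCapability) bool {
	if cc.Major != minimumCapability.Major {
		return cc.Major > minimumCapability.Major
	}
	return cc.Minor >= minimumCapability.Minor
}
```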
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
fix: set template without triple quotes
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-