- 17 Jan, 2024 1 commit
-
-
Michael Yang authored
-
- 16 Jan, 2024 4 commits
-
-
Michael Yang authored
remove client.py
-
Bruce MacDonald authored
- prompt cache causes inference to hang after some time
-
Patrick Devine authored
-
Michael Yang authored
fix: request retry with error
-
- 15 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
improve cuda detection (rel. issue #1704)
-
- 14 Jan, 2024 4 commits
-
-
Daniel Hiltgen authored
Fix typo in arm mac arch script
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Fix intel mac build
-
Jeffrey Morgan authored
-
- 13 Jan, 2024 3 commits
-
-
Daniel Hiltgen authored
Make sure we're building an x86 ext_server lib when cross-compiling
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 12 Jan, 2024 10 commits
-
-
Michael Yang authored
remove double newlines in /set parameter
-
Michael Yang authored
add max context length check
-
Michael Yang authored
-
Michael Yang authored
this fixes a subtle bug with makeRequestWithRetry where an HTTP status error on a retried request will potentially not return the right err
-
Fabian Preiss authored
-
Patrick Devine authored
-
Michael Yang authored
-
Michael Yang authored
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
Tristram Oaten authored
After executing the `userdel ollama` command, I saw this message:

```sh
$ sudo userdel ollama
userdel: group ollama not removed because it has other members.
```

Which reminded me that I had to remove the dangling group too. For completeness, the uninstall instructions should do this too.

Thanks!
-
Michael Yang authored
-
- 11 Jan, 2024 17 commits
-
-
Michael Yang authored
-
Daniel Hiltgen authored
Fix up the CPU fallback selection
-
Daniel Hiltgen authored
The memory changes and multi-variant change had some merge glitches I missed. This fixes them so we actually get the cpu llm lib and best variant for the given system.
-
Michael Yang authored
fix build and lint
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Daniel Hiltgen authored
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
-
Eduard van Valkenburg authored
-
Michael Yang authored
add lint and test on pull_request
-
Daniel Hiltgen authored
This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform
-
Daniel Hiltgen authored
This reduces the built-in Linux version to not use any vector extensions, which enables the resulting builds to run under Rosetta on macOS in Docker. At runtime it then checks for the actual CPU vector extensions and loads the best CPU library available.
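
A rough sketch of the runtime selection step follows. It is an illustration only: the `cpuFeatures` type, the `bestCPUVariant` function, and the variant names `cpu`, `cpu_avx`, and `cpu_avx2` are assumptions, not names taken from the code base. Feature detection itself is left out; the flags are passed in directly.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical summary of the vector extensions detected
// on the host CPU. How they are detected is outside this sketch.
type cpuFeatures struct {
	AVX  bool
	AVX2 bool
}

// bestCPUVariant returns the most capable library variant the CPU supports.
// "cpu" stands for the baseline build that uses no vector extensions, which
// is the one that can run under emulation.
func bestCPUVariant(f cpuFeatures) string {
	switch {
	case f.AVX2:
		return "cpu_avx2"
	case f.AVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	hosts := []cpuFeatures{
		{},                      // no vector extensions reported
		{AVX: true},             // AVX only
		{AVX: true, AVX2: true}, // AVX and AVX2
	}
	for _, h := range hosts {
		fmt.Printf("%+v -> %s\n", h, bestCPUVariant(h))
	}
}
```

Checking the most capable extension first keeps the baseline build as the default, so a host that reports nothing still gets a library it can run.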
-
Fabian Preiß authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
In some cases we may want multiple variants for a given GPU type or CPU. This adds logic to have an optional Variant which we can use to select an optimal library, but also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.
-
Jeffrey Morgan authored
* increase minimum cuda overhead and fix minimum overhead for multi-gpu
* fix multi gpu overhead
* limit overhead to 10% of all gpus
* better wording
* allocate fixed amount before layers
* fixed only includes graph alloc
-