- 11 Jan, 2024 4 commits
-
-
Michael Yang authored
add lint and test on pull_request
-
Fabian Preiß authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
* increase minimum cuda overhead and fix minimum overhead for multi-gpu
* fix multi gpu overhead
* limit overhead to 10% of all gpus
* better wording
* allocate fixed amount before layers
* fixed only includes graph alloc
-
- 10 Jan, 2024 6 commits
-
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Smarter GPU Management library detection
-
Daniel Hiltgen authored
When there are multiple management libraries installed on a system not every one will be compatible with the current driver. This change improves our management library algorithm to build up a set of discovered libraries based on glob patterns, and then try all of them until we're able to load one without error.
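The discover-then-try approach described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names `findLibs` and `loadFirst`, the example glob patterns, and the stub loader are all assumptions made for the sketch. A real loader would open the library through cgo rather than a Go callback.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// Hypothetical sentinel errors used only by this sketch.
var (
	errNoCandidates = errors.New("no management library candidates found")
	errIncompatible = errors.New("library incompatible with current driver")
)

// findLibs expands each glob pattern and returns the de-duplicated
// matches in the order they were discovered.
func findLibs(patterns []string) []string {
	seen := map[string]bool{}
	var libs []string
	for _, p := range patterns {
		matches, err := filepath.Glob(p)
		if err != nil {
			continue // malformed pattern: skip it and keep going
		}
		for _, m := range matches {
			if !seen[m] {
				seen[m] = true
				libs = append(libs, m)
			}
		}
	}
	return libs
}

// loadFirst calls tryLoad on each candidate and returns the first one
// that loads without error. If none load, it returns the last failure.
func loadFirst(candidates []string, tryLoad func(string) error) (string, error) {
	lastErr := errNoCandidates
	for _, c := range candidates {
		err := tryLoad(c)
		if err == nil {
			return c, nil
		}
		lastErr = fmt.Errorf("%s: %w", c, err)
	}
	return "", lastErr
}

func main() {
	// Illustrative patterns only; the commit message does not list the real ones.
	patterns := []string{"/usr/lib*/libnvidia-ml.so*", "/opt/*/lib/libnvidia-ml.so*"}
	// Stub loader (assumption): always reports an incompatible library.
	stub := func(path string) error { return errIncompatible }
	lib, err := loadFirst(findLibs(patterns), stub)
	fmt.Println(lib, err)
}
```

Passing the loader as a callback keeps the search logic testable without a GPU driver present.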
-
Daniel Hiltgen authored
This can help speed up incremental builds when you're only testing one architecture, such as amd64. E.g. `BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:`
-
Jeffrey Morgan authored
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
-
Jeffrey Morgan authored
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
-
- 09 Jan, 2024 25 commits
-
-
Jeffrey Morgan authored
-
Robin Glauser authored
Fixed assistant in the example response.
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Set correct CUDA minimum compute capability version
-
Daniel Hiltgen authored
If you attempt to run the current CUDA build on compute capability 5.2 cards, you'll hit the following failure: `cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported`
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
fix: set template without triple quotes
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 08 Jan, 2024 4 commits
-
-
Michael Yang authored
fix(cmd): history in alt prompt
-
Jeffrey Morgan authored
* select layers based on estimated model memory usage
* always account for scratch vram
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
* add overhead for cuda memory
* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* fix build error on linux
* address comments
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
-
Bruce MacDonald authored
-
Bruce MacDonald authored
-
- 07 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
Detect very old CUDA GPUs and fall back to CPU
-