- 11 Jan, 2024 2 commits
-
Daniel Hiltgen authored
In some cases we may want multiple variants for a given GPU type or CPU. This adds an optional Variant that we can use to select the optimal library, and also lets us try multiple variants in case some fail to load. This is useful for scenarios such as ROCm v5 vs. v6 incompatibility, and potentially for CPU feature levels.
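A minimal Go sketch of the fallback idea (the type and loader names below are illustrative, not ollama's actual API):

```go
package main

import (
	"fmt"
	"log"
)

// LibraryVariant pairs a base library type (cpu, cuda, rocm) with an optional
// variant tag, e.g. rocm "v5" vs "v6" or a CPU feature level. All names here
// are assumptions for illustration.
type LibraryVariant struct {
	Library string
	Variant string // empty means the default build
}

func (lv LibraryVariant) String() string {
	if lv.Variant == "" {
		return lv.Library
	}
	return lv.Library + "_" + lv.Variant
}

// loadBestVariant walks the candidates in priority order and returns the
// first one that loads, so a ROCm v6 library failing on a v5 system simply
// falls through to the next candidate.
func loadBestVariant(candidates []LibraryVariant) (LibraryVariant, error) {
	for _, c := range candidates {
		if err := loadDynamicLibrary(c.String()); err != nil {
			log.Printf("variant %s failed to load: %v", c, err)
			continue
		}
		return c, nil
	}
	return LibraryVariant{}, fmt.Errorf("no usable library variant")
}

// loadDynamicLibrary is a stub standing in for the platform's
// dlopen/LoadLibrary call on the named build output.
func loadDynamicLibrary(name string) error { return nil }

func main() {
	chosen, err := loadBestVariant([]LibraryVariant{
		{Library: "rocm", Variant: "v6"},
		{Library: "rocm", Variant: "v5"},
		{Library: "cpu"},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("using", chosen)
}
```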
-
Jeffrey Morgan authored
* increase minimum CUDA overhead and fix minimum overhead for multi-GPU
* fix multi-GPU overhead
* limit overhead to 10% of all GPUs
* better wording
* allocate fixed amount before layers
* fixed amount only includes the graph alloc
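One plausible reading of the overhead rule, as a Go sketch (the struct, field names, and the 512 MiB floor are assumptions; the commit only states the 10% cap):

```go
package main

import "fmt"

// Hypothetical per-GPU memory info; the field name is illustrative.
type gpuInfo struct {
	FreeMemory uint64 // bytes
}

// Assumed floor for the reservation; the commit doesn't state the real value.
const minCUDAOverhead uint64 = 512 * 1024 * 1024

// cudaOverhead reserves at least the fixed minimum, but never more than 10%
// of the free memory summed across all GPUs, per the commit description.
func cudaOverhead(gpus []gpuInfo) uint64 {
	var total uint64
	for _, g := range gpus {
		total += g.FreeMemory
	}
	overhead := minCUDAOverhead
	if limit := total / 10; overhead > limit {
		overhead = limit
	}
	return overhead
}

func main() {
	gpus := []gpuInfo{{FreeMemory: 8 << 30}, {FreeMemory: 8 << 30}}
	fmt.Printf("reserved: %d MiB\n", cudaOverhead(gpus)/(1<<20))
}
```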
-
- 10 Jan, 2024 3 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
-
Jeffrey Morgan authored
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
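The second item fixes a shutdown hang: a worker blocked on a condition variable must be woken when the server closes. The actual fix is in llama.cpp's C++ server; here is a Go analogue of the pattern (all names illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// queue models a worker loop waiting for items via a condition variable.
type queue struct {
	mu     sync.Mutex
	cond   *sync.Cond
	items  []string
	closed bool
}

func newQueue() *queue {
	q := &queue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// next blocks until an item arrives or the queue is closed.
func (q *queue) next() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 && !q.closed {
		q.cond.Wait()
	}
	if len(q.items) == 0 {
		return "", false // closed and drained
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

// close wakes every waiter so blocked workers observe closed and return;
// without the Broadcast, waiters would sleep forever and shutdown would hang.
func (q *queue) close() {
	q.mu.Lock()
	q.closed = true
	q.mu.Unlock()
	q.cond.Broadcast()
}

func main() {
	q := newQueue()
	done := make(chan struct{})
	go func() {
		for {
			item, ok := q.next()
			if !ok {
				break
			}
			fmt.Println("processed", item)
		}
		close(done)
	}()
	q.close()
	<-done
}
```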
-
- 09 Jan, 2024 6 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 08 Jan, 2024 1 commit
-
Jeffrey Morgan authored
* select layers based on estimated model memory usage
* always account for scratch VRAM
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
* Update llm/llm.go
* add overhead for CUDA memory
* fix build error on linux
* address comments

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
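A Go sketch of the layer-selection idea: reserve the graph/scratch allocation up front, then offload only as many whole layers as fit. The struct, field names, and numbers are assumptions; the real estimator uses model metadata:

```go
package main

import "fmt"

// Illustrative inputs, not the project's real types.
type modelEstimate struct {
	LayerSize uint64 // estimated bytes per layer
	NumLayers int
	GraphSize uint64 // scratch/graph allocation, reserved before any layers
}

// layersToOffload picks how many layers fit in free VRAM: subtract the fixed
// graph allocation first, then greedily assign whole layers, never exceeding
// the layer count the model actually has (no "+1 layers").
func layersToOffload(freeVRAM uint64, m modelEstimate) int {
	if freeVRAM <= m.GraphSize {
		return 0
	}
	avail := freeVRAM - m.GraphSize
	n := int(avail / m.LayerSize)
	if n > m.NumLayers {
		n = m.NumLayers
	}
	return n
}

func main() {
	m := modelEstimate{LayerSize: 220 << 20, NumLayers: 32, GraphSize: 1 << 30}
	fmt.Println("offload layers:", layersToOffload(8<<30, m))
}
```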
-
- 07 Jan, 2024 2 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 05 Jan, 2024 1 commit
-
Bruce MacDonald authored
-
- 04 Jan, 2024 9 commits
-
Daniel Hiltgen authored
If the tree has a stale submodule, make sure we clean it up first.
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
On Linux, we link the CPU library into the Go app and fall back to it when no GPU match is found. On Windows we do not link in the CPU library, so that we can better control our dependencies for the CLI. This fixes the logic so we correctly fall back to the dynamic CPU library on Windows.
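A sketch of that fallback in Go (library names and the stub loader are illustrative):

```go
package main

import (
	"fmt"
	"runtime"
)

// pickLibrary tries GPU libraries first. On Linux the CPU backend is linked
// into the binary, so no dynamic load is needed; on Windows the CPU backend
// is itself a dynamic library and must still be loaded from disk.
func pickLibrary(gpuLibs []string) string {
	for _, lib := range gpuLibs {
		if tryLoad(lib) {
			return lib
		}
	}
	if runtime.GOOS == "windows" {
		return "cpu_dynamic" // dynamically loaded CPU fallback
	}
	return "cpu_builtin" // statically linked CPU fallback
}

// tryLoad is a stub; real code attempts to dlopen/LoadLibrary the GPU library.
func tryLoad(lib string) bool { return false }

func main() {
	fmt.Println("selected:", pickLibrary([]string{"cuda", "rocm"}))
}
```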
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
* update cmake flags for intel macOS
* remove `LLAMA_K_QUANTS`
* put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`
-
Daniel Hiltgen authored
Go's embed directive fails to compile when a pattern matches no files, so put a dummy placeholder in to allow building without any GPU support. If no "server" library is found, it's safely ignored at runtime.
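The shape of the workaround in Go (the embed path and function below are illustrative, not the real layout):

```go
package llm

import "embed"

// //go:embed rejects patterns that match no files, so the build scripts drop
// an empty placeholder file into each output directory; builds without any
// GPU support then still compile.
//
//go:embed build/lib/*
var libEmbed embed.FS

// usable reports whether an embedded library is real: placeholder files are
// empty, so an unusable entry is simply skipped at runtime.
func usable(name string) bool {
	data, err := libEmbed.ReadFile(name)
	return err == nil && len(data) > 0
}
```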
-
- 03 Jan, 2024 2 commits
-
Daniel Hiltgen authored
This moves the supported AMD GPUs into a single list that is easier to maintain and update over time.
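A flat table like the sketch below is easier to review in diffs than scattered conditionals; the entries here are illustrative, not the project's actual list:

```go
package main

import "fmt"

// supportedAMDGPUs lists GFX architecture names in one place, so adding or
// removing a GPU is a one-line diff.
var supportedAMDGPUs = []string{
	"gfx900",
	"gfx906",
	"gfx908",
	"gfx90a",
	"gfx1030",
	"gfx1100",
}

func isSupported(gfx string) bool {
	for _, s := range supportedAMDGPUs {
		if s == gfx {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isSupported("gfx1030")) // true
}
```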
-
Bruce MacDonald authored
-
- 02 Jan, 2024 4 commits
-
Daniel Hiltgen authored
This one log line was triggering a single-line llama.log to be generated in the server's working directory.
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Refactor where we store build outputs, and support a fully dynamic loading model on Windows so the base executable has no special dependencies and thus doesn't require a special PATH.
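With fully dynamic loading, the base executable has no import-table dependency on any backend DLL, so it starts on any machine and locates backends at runtime. A Windows-only Go sketch (the DLL path and entry-point name are assumptions):

```go
//go:build windows

package main

import (
	"fmt"
	"syscall"
)

// loadServerLibrary loads a backend DLL explicitly at runtime instead of
// linking against it, and verifies it exports the expected entry point.
func loadServerLibrary(path string) error {
	dll, err := syscall.LoadDLL(path)
	if err != nil {
		return fmt.Errorf("load %s: %w", path, err)
	}
	if _, err := dll.FindProc("llama_server_init"); err != nil {
		dll.Release()
		return fmt.Errorf("missing entry point in %s: %w", path, err)
	}
	return nil
}

func main() {
	if err := loadServerLibrary("ext_server.dll"); err != nil {
		fmt.Println("falling back:", err)
	}
}
```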
-
Daniel Hiltgen authored
This changes how llama.cpp is included: instead of applying a patch, the C++ code now lives directly in the ollama tree, which should make it easier to refine and update over time.
-
- 27 Dec, 2023 1 commit
-
Jeffrey Morgan authored
-
- 22 Dec, 2023 4 commits
-
K0IN authored
-
Daniel Hiltgen authored
By default, builds will now produce non-debug and non-verbose binaries. To enable verbose logs in llama.cpp and debug symbols in the native code, set `CGO_CFLAGS=-g`.
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
The default thread count logic was broken and spawned 2x as many threads as it should on a hyperthreading CPU, resulting in thrashing and poor performance.
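The corrected default sizes the worker pool to physical cores rather than logical (SMT) processors, so two hyperthreads don't fight over one core's execution units. A Go sketch; Go only exposes the logical count, so halving under SMT is a simplification here, and the real fix reads the physical core count from the platform:

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultThreads approximates physical cores from the logical CPU count.
func defaultThreads(smtEnabled bool) int {
	n := runtime.NumCPU() // logical processors
	if smtEnabled {
		n /= 2 // approximate physical cores on 2-way SMT
	}
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println("threads:", defaultThreads(true))
}
```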
-
- 21 Dec, 2023 1 commit
-
Daniel Hiltgen authored
The Windows native setup still needs more work, but this gets it building again, and if you set the PATH properly, you can run the resulting exe on a CUDA system.
-
- 20 Dec, 2023 1 commit
-
Daniel Hiltgen authored
This switches the default llama.cpp build to be CPU-based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6, given that v5.7 builds don't work with the ROCm library that just shipped.
-
- 19 Dec, 2023 3 commits
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This allows the CPU-only builds to work on systems with Radeon cards.
-