- 04 Jan, 2024 1 commit
Daniel Hiltgen authored
Go embed doesn't like it when there are no matching files, so put a dummy placeholder in to allow building without any GPU support. If no "server" library is found, it's safely ignored at runtime.
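For context, Go's `//go:embed` directive fails the build outright when a pattern matches no files. A minimal sketch of the placeholder pattern, assuming a hypothetical directory layout (not ollama's actual paths):

```go
package llm

import (
	"embed"
	"io/fs"
)

// go:embed fails the build with "no matching files found" if the pattern
// matches nothing, so a dummy placeholder file is checked in to keep
// CPU-only builds (which produce no GPU libraries) compiling.
//
//go:embed gpu_libs/*
var libEmbed embed.FS

// hasServerLib reports whether a real server library (not just the
// placeholder) was embedded; if not, GPU support is skipped at runtime.
func hasServerLib(name string) bool {
	_, err := fs.Stat(libEmbed, "gpu_libs/"+name)
	return err == nil
}
```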
- 02 Jan, 2024 4 commits
Daniel Hiltgen authored
This one log line was triggering a single-line llama.log to be generated in the working directory of the server.
Daniel Hiltgen authored
Daniel Hiltgen authored
Refactor where we store build outputs, and support a fully dynamic loading model on Windows so the base executable has no special dependencies and thus doesn't require a special PATH.
Daniel Hiltgen authored
This changes the model for llama.cpp inclusion so we're not applying a patch, but instead have the C++ code directly in the ollama tree, which should make it easier to refine and update over time.
- 22 Dec, 2023 3 commits
Daniel Hiltgen authored
By default, builds will now produce non-debug and non-verbose binaries. To enable verbose logs in llama.cpp and debug symbols in the native code, set `CGO_CFLAGS=-g`.
Daniel Hiltgen authored
Daniel Hiltgen authored
The default thread count logic was broken and resulted in twice as many threads as it should on a hyperthreading CPU, causing thrashing and poor performance.
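A minimal sketch of the kind of fix this implies, assuming 2-way SMT (the function name is hypothetical; `runtime.NumCPU()` counts logical CPUs, i.e. hyperthreads):

```go
package llm

import "runtime"

// defaultThreadCount picks the worker-thread count for inference.
// runtime.NumCPU() reports logical CPUs, which on a CPU with 2-way
// hyperthreading is double the physical core count; spawning that many
// compute threads makes them contend for the same cores and thrash.
// Halving is a rough approximation of the physical core count (a real
// implementation would query the CPU topology, since not every machine
// has 2-way SMT).
func defaultThreadCount() int {
	n := runtime.NumCPU() / 2
	if n < 1 {
		n = 1
	}
	return n
}
```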
- 21 Dec, 2023 1 commit
Daniel Hiltgen authored
The Windows native setup still needs some more work, but this gets it building again, and if you set the PATH properly, you can run the resulting exe on a CUDA system.
- 20 Dec, 2023 1 commit
Daniel Hiltgen authored
This switches the default llama.cpp build to be CPU based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6, given 5.7 builds don't work on the latest ROCm library that just shipped.
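A rough sketch of what runtime variant selection can look like, with hypothetical library names and detection flags (the CPU build is the always-present fallback):

```go
package llm

import (
	"errors"
	"os"
	"path/filepath"
)

// pickServerLibrary chooses which dynamically loaded llama.cpp variant to
// use, preferring GPU builds and falling back to the CPU build, which is
// always shipped.
func pickServerLibrary(libDir string, hasCUDA, hasROCm bool) (string, error) {
	var candidates []string
	if hasCUDA {
		candidates = append(candidates, "libext_server_cuda.so")
	}
	if hasROCm {
		candidates = append(candidates, "libext_server_rocm.so")
	}
	candidates = append(candidates, "libext_server_cpu.so")

	for _, name := range candidates {
		p := filepath.Join(libDir, name)
		if _, err := os.Stat(p); err == nil {
			return p, nil // first candidate present on disk wins
		}
	}
	return "", errors.New("no llama.cpp server library found in " + libDir)
}
```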
- 19 Dec, 2023 8 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
If someone checks out the ollama repo and doesn't install the CUDA library, this will ensure they can build a CPU-only version.
Daniel Hiltgen authored
Daniel Hiltgen authored
This changes the container-based Linux build to use an older Ubuntu distro to improve our compatibility matrix for older user machines.
Daniel Hiltgen authored
65a authored
The build tags `rocm` or `cuda` must be specified to both `go generate` and `go build`. ROCm builds should have `ROCM_PATH` set (and the ROCm SDK present) as well as CLBlast installed (for GGML) and `CLBlast_DIR` set in the environment to the CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also used to switch VRAM detection between the cuda and rocm implementations, using added "accelerator_foo.go" files which contain architecture-specific functions and variables. accelerator_none is used when no tags are set, and a helper function addRunner ignores it if it is the chosen accelerator. Fix go generate commands; thanks @deadmeu for testing.
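A minimal sketch of the build-tag pattern described above (the accelerator_none file and the addRunner helper are named in the message; the function signature and error value here are hypothetical):

```go
// accelerator_none.go — selected when neither the rocm nor the cuda
// build tag is passed to go generate / go build.

//go:build !rocm && !cuda

package llm

import "errors"

// errNoAccelerator lets addRunner recognize and skip this stub when it
// ends up as the chosen accelerator.
var errNoAccelerator = errors.New("no accelerator support compiled in")

// acceleratorGetFreeVRAM is the stub VRAM probe; accelerator_cuda.go and
// accelerator_rocm.go provide the real probes behind their build tags.
func acceleratorGetFreeVRAM() (uint64, error) {
	return 0, errNoAccelerator
}
```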
Daniel Hiltgen authored
Run server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.
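A toy, self-contained illustration of the mechanism rather than the actual server.cpp shim: cgo runs natively compiled code inside the Go process, so no subprocess is needed.

```go
package main

/*
#include <stdio.h>

// Stand-in for llama.cpp's server entry point: with cgo, the native code
// runs inside the Go process instead of behind a subprocess boundary.
static void native_serve(void) {
    printf("serving from native code inside the Go process\n");
}
*/
import "C"

func main() {
	// Crossing into C directly; the real change drives server.cpp's
	// request loop the same way while keeping the Go LLM abstractions.
	C.native_serve()
}
```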
Bruce MacDonald authored
- remove ggml runner
- automatically pull gguf models when ggml is detected
- tell users to update to gguf in case the automatic pull fails

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
- 18 Dec, 2023 3 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
Jeffrey Morgan authored
- 13 Dec, 2023 1 commit
Jeffrey Morgan authored
- 04 Dec, 2023 1 commit
Michael Yang authored
- 26 Nov, 2023 2 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
- 24 Nov, 2023 2 commits
Jing Zhang authored
* Support CUDA build on Windows
* Enable dynamic NumGPU allocation for Windows
Jongwook Choi authored
When CUDA peer access is enabled, multi-GPU inference will produce garbage output. This is a known bug in llama.cpp (or NVIDIA). Until the upstream bug is fixed, we can disable CUDA peer access temporarily to ensure correct output. See #961.
- 22 Nov, 2023 1 commit
Jeffrey Morgan authored
- 21 Nov, 2023 1 commit
Michael Yang authored
- 20 Nov, 2023 1 commit
Jeffrey Morgan authored
- 17 Nov, 2023 1 commit
Jeffrey Morgan authored
- 27 Oct, 2023 1 commit
Jeffrey Morgan authored
- 24 Oct, 2023 3 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
Jeffrey Morgan authored
- 23 Oct, 2023 2 commits
Michael Yang authored
Pin to `9e70cc03229df19ca2d28ce23cc817198f897278` for now, since `438c2ca83045a00ef244093d27e9ed41a8cb4ea9` is breaking.
Michael Yang authored
- 17 Oct, 2023 1 commit
Bruce MacDonald authored
- 06 Oct, 2023 2 commits
Jeffrey Morgan authored
Bruce MacDonald authored
- this makes it easier to see that the subprocess is associated with ollama