- 04 Jan, 2024 1 commit
Daniel Hiltgen authored
Go embed doesn't like it when there are no matching files, so put a dummy placeholder in to allow building without any GPU support. If no "server" library is found, it's safely ignored at runtime.
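For context, Go's `//go:embed` directive fails the build outright when a pattern matches no files. A minimal sketch of the placeholder pattern, assuming a hypothetical directory layout (not ollama's actual paths):

```go
package llm

import (
	"embed"
	"io/fs"
)

// go:embed fails the build with "no matching files found" if the pattern
// matches nothing, so a dummy placeholder file is checked in to keep
// CPU-only builds (which produce no GPU libraries) compiling.
//
//go:embed gpu_libs/*
var libEmbed embed.FS

// hasServerLib reports whether a real server library (not just the
// placeholder) was embedded; if not, GPU support is skipped at runtime.
func hasServerLib(name string) bool {
	_, err := fs.Stat(libEmbed, "gpu_libs/"+name)
	return err == nil
}
```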
- 02 Jan, 2024 4 commits
Daniel Hiltgen authored
This one log line was triggering a single-line llama.log to be generated in the working directory of the server.
Daniel Hiltgen authored
Daniel Hiltgen authored
Refactor where we store build outputs, and support a fully dynamic loading model on Windows so the base executable has no special dependencies and thus doesn't require a special PATH.
Daniel Hiltgen authored
This changes the model for llama.cpp inclusion so we're not applying a patch, but instead have the C++ code directly in the ollama tree, which should make it easier to refine and update over time.
- 22 Dec, 2023 3 commits
Daniel Hiltgen authored
By default, builds will now produce non-debug and non-verbose binaries. To enable verbose logs in llama.cpp and debug symbols in the native code, set `CGO_CFLAGS=-g`.
Daniel Hiltgen authored
Daniel Hiltgen authored
The default thread count logic was broken and resulted in twice as many threads as it should on a hyperthreading CPU, causing thrashing and poor performance.
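A minimal sketch of the kind of fix this implies, assuming 2-way SMT (the function name is hypothetical; `runtime.NumCPU()` counts logical CPUs, i.e. hyperthreads):

```go
package llm

import "runtime"

// defaultThreadCount picks the worker-thread count for inference.
// runtime.NumCPU() reports logical CPUs, which on a CPU with 2-way
// hyperthreading is double the physical core count; spawning that many
// compute threads makes them contend for the same cores and thrash.
// Halving is a rough approximation of the physical core count (a real
// implementation would query the CPU topology, since not every machine
// has 2-way SMT).
func defaultThreadCount() int {
	n := runtime.NumCPU() / 2
	if n < 1 {
		n = 1
	}
	return n
}
```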
- 21 Dec, 2023 1 commit
Daniel Hiltgen authored
The Windows native setup still needs some more work, but this gets it building again, and if you set the PATH properly, you can run the resulting exe on a CUDA system.
- 20 Dec, 2023 1 commit
Daniel Hiltgen authored
This switches the default llama.cpp build to be CPU based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6, given 5.7 builds don't work on the latest ROCm library that just shipped.
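A rough sketch of what runtime variant selection can look like, with hypothetical library names and detection flags (the CPU build is the always-present fallback):

```go
package llm

import (
	"errors"
	"os"
	"path/filepath"
)

// pickServerLibrary chooses which dynamically loaded llama.cpp variant to
// use, preferring GPU builds and falling back to the CPU build, which is
// always shipped.
func pickServerLibrary(libDir string, hasCUDA, hasROCm bool) (string, error) {
	var candidates []string
	if hasCUDA {
		candidates = append(candidates, "libext_server_cuda.so")
	}
	if hasROCm {
		candidates = append(candidates, "libext_server_rocm.so")
	}
	candidates = append(candidates, "libext_server_cpu.so")

	for _, name := range candidates {
		p := filepath.Join(libDir, name)
		if _, err := os.Stat(p); err == nil {
			return p, nil // first candidate present on disk wins
		}
	}
	return "", errors.New("no llama.cpp server library found in " + libDir)
}
```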
- 19 Dec, 2023 8 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
If someone checks out the ollama repo and doesn't install the CUDA library, this will ensure they can build a CPU-only version.
Daniel Hiltgen authored
Daniel Hiltgen authored
This changes the container-based Linux build to use an older Ubuntu distro to improve our compatibility matrix for older user machines.
Daniel Hiltgen authored
65a authored
The build tags `rocm` or `cuda` must be specified to both `go generate` and `go build`. ROCm builds should have `ROCM_PATH` set (and the ROCm SDK present) as well as CLBlast installed (for GGML) and `CLBlast_DIR` set in the environment to the CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also used to switch VRAM detection between the cuda and rocm implementations, using added "accelerator_foo.go" files which contain architecture-specific functions and variables. accelerator_none is used when no tags are set, and a helper function addRunner ignores it if it is the chosen accelerator. Fix go generate commands; thanks @deadmeu for testing.
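A minimal sketch of the build-tag pattern described above (the accelerator_none file and the addRunner helper are named in the message; the function signature and error value here are hypothetical):

```go
// accelerator_none.go — selected when neither the rocm nor the cuda
// build tag is passed to go generate / go build.

//go:build !rocm && !cuda

package llm

import "errors"

// errNoAccelerator lets addRunner recognize and skip this stub when it
// ends up as the chosen accelerator.
var errNoAccelerator = errors.New("no accelerator support compiled in")

// acceleratorGetFreeVRAM is the stub VRAM probe; accelerator_cuda.go and
// accelerator_rocm.go provide the real probes behind their build tags.
func acceleratorGetFreeVRAM() (uint64, error) {
	return 0, errNoAccelerator
}
```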
Daniel Hiltgen authored
Run server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.
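A toy, self-contained illustration of the mechanism rather than the actual server.cpp shim: cgo runs natively compiled code inside the Go process, so no subprocess is needed.

```go
package main

/*
#include <stdio.h>

// Stand-in for llama.cpp's server entry point: with cgo, the native code
// runs inside the Go process instead of behind a subprocess boundary.
static void native_serve(void) {
    printf("serving from native code inside the Go process\n");
}
*/
import "C"

func main() {
	// Crossing into C directly; the real change drives server.cpp's
	// request loop the same way while keeping the Go LLM abstractions.
	C.native_serve()
}
```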
Bruce MacDonald authored
- remove ggml runner
- automatically pull gguf models when ggml is detected
- tell users to update to gguf in case the automatic pull fails

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
- 18 Dec, 2023 3 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
Jeffrey Morgan authored
- 13 Dec, 2023 1 commit
Jeffrey Morgan authored
- 04 Dec, 2023 1 commit
Michael Yang authored
- 26 Nov, 2023 2 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
- 24 Nov, 2023 2 commits
Jing Zhang authored
* Support CUDA build on Windows
* Enable dynamic NumGPU allocation for Windows
Jongwook Choi authored
When CUDA peer access is enabled, multi-GPU inference will produce garbage output. This is a known bug in llama.cpp (or NVIDIA). Until the upstream bug is fixed, we can disable CUDA peer access temporarily to ensure correct output. See #961.
- 22 Nov, 2023 1 commit
Jeffrey Morgan authored
- 21 Nov, 2023 1 commit
Michael Yang authored
- 20 Nov, 2023 1 commit
Jeffrey Morgan authored
- 17 Nov, 2023 1 commit
Jeffrey Morgan authored
- 27 Oct, 2023 1 commit
Jeffrey Morgan authored
- 24 Oct, 2023 3 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
Jeffrey Morgan authored
- 23 Oct, 2023 2 commits
Michael Yang authored
Pin to `9e70cc03229df19ca2d28ce23cc817198f897278` for now, since `438c2ca83045a00ef244093d27e9ed41a8cb4ea9` is breaking.
Michael Yang authored
- 17 Oct, 2023 1 commit
Bruce MacDonald authored
- 06 Oct, 2023 2 commits
Jeffrey Morgan authored
Bruce MacDonald authored
- this makes it easier to see that the subprocess is associated with ollama