Commits · 9a70aecccb7034f2d53b7b400f3929c6c199cb90 · OpenDAS / ollama

02 Jan, 2024 1 commit

Refactor how we augment llama.cpp · 9a70aecc

Daniel Hiltgen authored Dec 22, 2023

This changes the model for llama.cpp inclusion so we're not applying a patch,
but instead have the C++ code directly in the ollama tree, which should make it
easier to refine and update over time.

9a70aecc

27 Dec, 2023 1 commit
- enable `cache_prompt` by default · d4ebdadb
  Jeffrey Morgan authored Dec 27, 2023
  
  d4ebdadb
22 Dec, 2023 4 commits
- Add Cache flag to api (#1642) · 10da41d6
  K0IN authored Dec 22, 2023
  
  10da41d6
- Quiet down llama.cpp logging by default · e5202eb6
  Daniel Hiltgen authored Dec 22, 2023
```
By default builds will now produce non-debug and non-verbose binaries.
To enable verbose logs in llama.cpp and debug symbols in the
native code, set `CGO_CFLAGS=-g`
```
  e5202eb6
- Remove CPU build, fixup linux build script · fa24e73b
  Daniel Hiltgen authored Dec 21, 2023
  
  fa24e73b
- Fix CPU performance on hyperthreaded systems · 325d7498
  Daniel Hiltgen authored Dec 21, 2023
```
The default thread count logic was broken and produced twice as many
threads as it should on a hyperthreaded CPU,
resulting in thrashing and poor performance.
```
  325d7498
21 Dec, 2023 1 commit

Revive windows build · d9cd3d96

Daniel Hiltgen authored Dec 20, 2023

The Windows native setup still needs more work, but this gets it building
again, and if you set the PATH properly you can run the resulting exe on a CUDA system.

d9cd3d96

20 Dec, 2023 1 commit

Revamp the dynamic library shim · 7555ea44

Daniel Hiltgen authored Dec 20, 2023

This switches the default llama.cpp to be CPU-based, and builds the GPU variants
as dynamically loaded libraries which we can select at runtime.

This also bumps the ROCm library to version 6, because 5.7 builds don't work
on the latest ROCm library that just shipped.

7555ea44
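
The runtime selection described in 7555ea44 could look roughly like the sketch below. It is an assumption-laden illustration, not the shim from the commit: `pickVariant`, the variant names, and the availability map are all hypothetical, and the real code loads shared libraries rather than looking up strings.

```go
package main

import "fmt"

// pickVariant is a hypothetical helper. It returns the first preferred
// GPU variant that is available on this machine and falls back to the
// CPU build, which is always present.
func pickVariant(preferred []string, available map[string]bool) string {
	for _, v := range preferred {
		if available[v] {
			return v
		}
	}
	return "cpu"
}

func main() {
	// Assume only the ROCm library was found on this system.
	available := map[string]bool{"rocm": true}
	fmt.Println(pickVariant([]string{"cuda", "rocm"}, available))

	// With no GPU libraries present, the CPU default is used.
	fmt.Println(pickVariant([]string{"cuda", "rocm"}, map[string]bool{}))
}
```
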

19 Dec, 2023 10 commits

Fix darwin intel build · 6558f94e
Daniel Hiltgen authored Dec 19, 2023

6558f94e
Carry ggml-metal.metal as payload · 54dbfa4c
Daniel Hiltgen authored Dec 18, 2023

54dbfa4c
Refine handling of shim presence · 3269535a
Daniel Hiltgen authored Dec 15, 2023
```
This allows the CPU only builds to work on systems with Radeon cards
```
3269535a

Refine build to support CPU only · 1b991d0b

Daniel Hiltgen authored Dec 13, 2023

If someone checks out the ollama repo and doesn't install the CUDA
library, this will ensure they can build a CPU only version

1b991d0b

Bump llama.cpp to b1662 and set n_parallel=1 · 9adca7f7
Daniel Hiltgen authored Dec 14, 2023

9adca7f7

Build linux using ubuntu 20.04 · 89bbaafa

Daniel Hiltgen authored Dec 18, 2023

This changes the container-based linux build to use an older Ubuntu
distro to improve our compatibility matrix for older user machines

89bbaafa

Adapted rocm support to cgo based llama.cpp · 35934b2e
Daniel Hiltgen authored Nov 29, 2023

35934b2e

Use build tags to generate accelerated binaries for CUDA and ROCm on Linux. · f8ef4439

65a authored Oct 16, 2023

The build tags rocm or cuda must be specified to both go generate and go build.
ROCm builds should have ROCM_PATH set (and the ROCm SDK present), CLBlast
installed (for GGML), and CLBlast_DIR set in the environment to the
CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also
used to switch VRAM detection between the cuda and rocm implementations, using
added "accelerator_foo.go" files which contain architecture-specific functions
and variables. accelerator_none is used when no tags are set, and a helper
function addRunner will ignore it if it is the chosen accelerator. Fix go
generate commands, thanks @deadmeu for testing.

f8ef4439

Add cgo implementation for llama.cpp · d4cd6957

Daniel Hiltgen authored Nov 13, 2023

Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.

d4cd6957

deprecate ggml · 811b1f03

Bruce MacDonald authored Nov 24, 2023



- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf if the automatic pull fails

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>

811b1f03

18 Dec, 2023 3 commits
- update runner submodule · 6b5bdfa6
  Jeffrey Morgan authored Dec 18, 2023
  
  6b5bdfa6
- update runner submodule to fix hipblas build · c063ee4a
  Jeffrey Morgan authored Dec 18, 2023
  
  c063ee4a
- update runner submodule · b85982eb
  Jeffrey Morgan authored Dec 18, 2023
  
  b85982eb
14 Dec, 2023 1 commit

restore model load duration on generate response (#1524) · 6ee8c801

Bruce MacDonald authored Dec 14, 2023

* restore model load duration on generate response

- set model load duration on generate and chat done response
- calculate createAt time when response created

* remove checkpoints predict opts

* Update routes.go

6ee8c801

13 Dec, 2023 1 commit
- Update runner to support mixtral and mixture of experts (MoE) (#1475) · 31f0551d
  Jeffrey Morgan authored Dec 13, 2023
  
  31f0551d
12 Dec, 2023 2 commits
- exponential back-off (#1484) · 3144e2a4
  Bruce MacDonald authored Dec 12, 2023
  
  3144e2a4
- retry on concurrent request failure (#1483) · c0960e29
  Bruce MacDonald authored Dec 12, 2023
```
- remove parallel
```
  c0960e29
11 Dec, 2023 2 commits

Multimodal support (#1216) · 910e9401

Patrick Devine authored Dec 11, 2023




---------
Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>

910e9401

remove per-model types · 56ffc302

Michael Yang authored Dec 08, 2023

mostly replaced by decoding tensors except ggml models which only
support llama

56ffc302

10 Dec, 2023 4 commits
- fix model name returned by `/api/generate` being different than the model name provided · fa2f095b
  Jeffrey Morgan authored Dec 10, 2023
  
  fa2f095b
- seek to end of file when decoding older model formats · d9a250e9
  Jeffrey Morgan authored Dec 09, 2023
  
  d9a250e9
- seek to eof for older model binaries · 944519ed
  Jeffrey Morgan authored Dec 09, 2023
  
  944519ed
- do not use `--parallel 2` for old runners · 2dd040d0
  Jeffrey Morgan authored Dec 09, 2023
  
  2dd040d0
09 Dec, 2023 1 commit

fix: parallel queueing race condition caused silent failure (#1445) · bbe41ce4

Bruce MacDonald authored Dec 09, 2023

* fix: queued request failures

- increase parallel requests to 2 to complete queued requests; queueing is managed in ollama

* log stream errors

bbe41ce4

05 Dec, 2023 7 commits
- load projectors · b9495ea1
  Michael Yang authored Nov 30, 2023
  
  b9495ea1
- chat api endpoint (#1392) · 195e3d9d
  Bruce MacDonald authored Dec 05, 2023
  
  195e3d9d
- Revert "chat api (#991)" while context variable is fixed · 00d06619
  Jeffrey Morgan authored Dec 04, 2023
```
This reverts commit 7a0899d6.
```
  00d06619
- comments · 5a5dca13
  Michael Yang authored Nov 29, 2023
  
  5a5dca13
- seek instead of copyn · 72e7a49a
  Michael Yang authored Nov 29, 2023
  
  72e7a49a
- split from into one or more models · 2cb0fa7d
  Michael Yang authored Nov 24, 2023
  
  2cb0fa7d
- unnecessary ReadSeeker for DecodeGGML · b2816bca
  Michael Yang authored Nov 22, 2023
  
  b2816bca
04 Dec, 2023 1 commit

chat api (#991) · 7a0899d6

Bruce MacDonald authored Dec 04, 2023

- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history

7a0899d6