14 Feb, 2025 · 8 commits
    • ggml-backend: Close on nil should be a no-op · d223f3b6
      Jesse Gross authored
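
      A minimal sketch of the idea, assuming a hypothetical `Buffer` wrapper
      around a C-side allocation (illustrative names, not the actual types):

      ```go
      package main

      // Buffer is a stand-in wrapper around a C-side allocation; the real
      // type would hold a ggml buffer obtained through cgo.
      type Buffer struct {
          handle uintptr // C pointer; zero once closed
      }

      // Close releases the underlying allocation. A nil receiver or an
      // already-closed buffer is a no-op, mirroring free(NULL) semantics,
      // so callers can unconditionally defer Close.
      func (b *Buffer) Close() {
          if b == nil || b.handle == 0 {
              return
          }
          freeCBuffer(b.handle) // stand-in for the real cgo free call
          b.handle = 0
      }

      func freeCBuffer(uintptr) {} // placeholder for illustration

      func main() {
          var b *Buffer // nil
          b.Close()     // safe: no-op
      }
      ```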
    • ggml-backend: Ensure data is available after async computation · 60830695
      Jesse Gross authored
      We need to sync before retrieving data after async computation.
      It is also important to ensure that the Go buffer is not moved by
      the GC across function calls, so we do a synchronous copy; Go
      memory passed to C stays pinned only for the duration of a single
      cgo call.
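
      A sketch of the read-back path with illustrative stand-ins for the
      real Context and Tensor types; the essential points are the explicit
      synchronize and the synchronous copy:

      ```go
      package main

      import "fmt"

      // Context is an illustrative stand-in for the real compute context.
      type Context struct{ pending []func() }

      // Synchronize drains any queued asynchronous work.
      func (c *Context) Synchronize() {
          for _, f := range c.pending {
              f()
          }
          c.pending = nil
      }

      // Tensor's device field stands in for backend (e.g. GPU) memory.
      type Tensor struct{ device []float32 }

      // Floats synchronizes, then copies synchronously into a fresh Go
      // slice. An asynchronous copy into out could race with the GC
      // moving the slice after the call returns.
      func (t *Tensor) Floats(ctx *Context) []float32 {
          ctx.Synchronize()
          out := make([]float32, len(t.device))
          copy(out, t.device) // stands in for a synchronous cgo memcpy
          return out
      }

      func main() {
          ctx := &Context{}
          t := &Tensor{}
          ctx.pending = append(ctx.pending, func() { t.device = []float32{1, 2, 3} })
          fmt.Println(t.Floats(ctx)) // [1 2 3]
      }
      ```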
    • ggml-backend: Let GGML allocate context memory · 01d9a468
      Jesse Gross authored
      Passing in a Go buffer is not safe because the garbage collector
      could free or move the memory while the context is still open.
      However, if we pass in the size and a nil pointer, GGML will
      allocate the memory on the C side.
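
      A minimal cgo sketch of the pattern, assuming ggml.h is visible to
      cgo (build flags and error handling omitted):

      ```go
      package backend

      /*
      #include "ggml.h"
      */
      import "C"

      // newContext lets GGML allocate its own working memory: with a nil
      // mem_buffer, ggml_init mallocs mem_size bytes on the C side, which
      // the Go garbage collector can never free or move while the context
      // is still open.
      func newContext(size int) *C.struct_ggml_context {
          return C.ggml_init(C.struct_ggml_init_params{
              mem_size:   C.size_t(size),
              mem_buffer: nil,
              no_alloc:   false,
          })
      }
      ```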
    • backend: API to support full precision matmul · d773b7d6
      Jesse Gross authored
      Most tensor backends try to optimize performance by using a lower
      precision for matmuls. However, some operations (such as kq) in
      some models are sensitive to this and require full precision.
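
      One way to surface this in the API is a second operation alongside
      the default matmul; a sketch with hypothetical method names:

      ```go
      package ml

      // Tensor is a trimmed-down sketch of a tensor interface.
      type Tensor interface {
          // Mulmat may run at reduced precision (e.g. F16 accumulation)
          // for speed.
          Mulmat(ctx Context, t2 Tensor) Tensor
          // MulmatFullPrec forces F32 precision for operations, such as
          // kq in some models, that are sensitive to rounding.
          MulmatFullPrec(ctx Context, t2 Tensor) Tensor
      }

      // Context is a placeholder for the real compute context.
      type Context interface{}
      ```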
    • backend: Support graph computation that does not return an output · 4d4463b2
      Jesse Gross authored
      There are two cases where we may not have an output after computing:
       - Prompt processing where the length of the input exceeds the batch
         size
       - Internal memory management operations such as cache defrag and shift
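
      A sketch of a compute call that tolerates an empty output set
      (illustrative, not the actual API): the graph still runs for its
      side effects, such as filling the KV cache or moving cache entries.

      ```go
      package backend

      // Tensor is an illustrative stand-in for a graph output.
      type Tensor struct{ data []float32 }

      // Compute runs the graph for the requested outputs. Zero outputs is
      // valid: a prompt-processing batch that doesn't reach the final
      // token, or an internal operation such as cache defrag or shift,
      // runs purely for its side effects and reads nothing back.
      func Compute(outputs ...*Tensor) {
          runGraph() // execute the whole graph regardless of outputs
          for _, t := range outputs {
              readBack(t) // synchronize and copy only what was requested
          }
      }

      func runGraph()          {} // placeholder for graph execution
      func readBack(t *Tensor) {} // placeholder for device-to-host copy
      ```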
    • backend: Consistently use int (vs. int64) for tensor shapes · 0e38297f
      Jesse Gross authored
      Currently there is a mixture of int and int64 used when dealing
      with tensor dimensions and shapes, which causes unnecessary
      conversions; they should all be the same type.

      In general, most interfaces (such as PyTorch) use int64 for
      generality, but most implementations (such as CUDA) use int32 for
      performance. There isn't much benefit to being more flexible than
      the implementations we are likely to run on.

      In addition, as a practical matter, a model with a tensor dimension
      too large for 32 bits is unlikely to run on a 32-bit machine anyway.
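
      The resulting convention, sketched with illustrative accessors. Go's
      int is 64 bits on 64-bit platforms, and a single dimension needing
      more than 32 bits would not fit in a 32-bit address space anyway:

      ```go
      package ml

      // Shape accessors use plain int throughout, so call sites avoid
      // int/int64 conversions when indexing, looping, and allocating.
      type Tensor interface {
          Dim(n int) int // size of dimension n
          Shape() []int  // all dimensions
      }
      ```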
    • backend: Don't return an error on Close · 7e13f568
      Jesse Gross authored
      It is not common to return errors from close/free operations: most
      callers won't check the error, and even if they did there's probably
      not much they could do about it. It's better not to give
      implementations false expectations.
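
      In interface terms, the change is simply dropping the error from the
      signature (illustrative):

      ```go
      package ml

      // Close frees resources and deliberately reports nothing: callers
      // of free-like operations rarely check the error and couldn't
      // recover anyway, so implementations log failures internally if
      // needed.
      type Backend interface {
          Close()
      }
      ```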
    • next ollama runner (#7913) · 58245413
      Michael Yang authored

      feat: add new Ollama engine using ggml through cgo
      
      This change introduces a new way to run pretrained models. It adds three high-level interfaces and a number of smaller helper interfaces to facilitate this.
      
      - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`.
      - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
      - `ml.Tensor` defines the interface for a tensor and tensor operations.
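
      A condensed sketch of how the three interfaces fit together,
      collapsed into one package for brevity; names and method sets are
      illustrative, not the exact definitions:

      ```go
      package sketch

      // Backend loads a pretrained model into hardware (GPU, CPU, etc.)
      // and exposes the loaded weights as Tensors.
      type Backend interface {
          Get(name string) Tensor // look up a weight by name
          NewContext() Context    // context for building a compute graph
      }

      // Context accumulates tensor operations into a graph and executes it.
      type Context interface {
          Compute(outputs ...Tensor)
          Close()
      }

      // Tensor is a node in the compute graph; operations return new nodes.
      type Tensor interface {
          Shape() []int
          Mulmat(ctx Context, t2 Tensor) Tensor
          Add(ctx Context, t2 Tensor) Tensor
      }

      // Model is an architecture such as llama or mllama; Forward
      // implements the model's forward propagation and is called to
      // generate completions.
      type Model interface {
          Forward(ctx Context, batch Batch) (Tensor, error)
      }

      // Batch is a placeholder for per-step inputs (tokens, positions, ...).
      type Batch struct{}
      ```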
      
      This is the first implementation of the new engine. Follow-up PRs will implement more features:
      
      - non-greedy sampling (#8410)
      - integration with Ollama and KV caching (#8301)
      - more model support (#9080) with more coming soon
      Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>