Commits · 9950f6ec247952f0b6a7bec758d4484cb9d3d97b · OpenDAS / ollama

04 Aug, 2025 1 commit
- gpt-oss · 9950f6ec
  Michael Yang authored Jun 03, 2025
  
  9950f6ec
31 Jul, 2025 2 commits
- tests · f1c73840
  Michael Yang authored Jul 08, 2025
  
  f1c73840
- bf16 · 4a8fc3f9
  Michael Yang authored Jul 09, 2025
  
  4a8fc3f9
23 Jul, 2025 1 commit
- s#x/exp/maps#maps# (#11506) · 6c733bf0
  Michael Yang authored Jul 23, 2025
  
  6c733bf0
27 Jun, 2025 1 commit
- chore: cleanup comments + unused vars (#11225) · 4129af92
  Michael Yang authored Jun 27, 2025
  
  4129af92
26 Jun, 2025 1 commit

Michael Yang authored Jun 25, 2025

* update patches

* cherry pick metal mean kernel

* cherry pick cuda mean kernel

* gemma3n

73b642e6

20 Jun, 2025 1 commit
- convert: utility for merging tensors (#11069) · c088ac0e
  Michael Yang authored Jun 20, 2025
  
  c088ac0e
11 Jun, 2025 1 commit

feat: uneven splits (#11048) · 45f56355

Michael Yang authored Jun 11, 2025

The current splitDim function only operates on tensors that split evenly, which isn't always the case, e.g. a QKV tensor. This change allows the function to be used for arbitrary splits.

45f56355
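The idea of an uneven split can be sketched as follows. This is a minimal illustration, not the converter's actual splitDim: the helper name `splitRows`, the flat row-major layout, and the 4/2/2 row counts are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// splitRows is a hypothetical helper, not ollama's splitDim. It cuts a
// row-major matrix (rows x cols, stored flat) into consecutive row blocks
// whose heights are given by sizes. The sizes need not be equal.
func splitRows(data []float32, cols int, sizes []int) ([][]float32, error) {
	if cols <= 0 || len(data)%cols != 0 {
		return nil, errors.New("data length is not a multiple of cols")
	}
	rows := len(data) / cols
	total := 0
	for _, s := range sizes {
		if s < 0 {
			return nil, errors.New("negative split size")
		}
		total += s
	}
	if total != rows {
		return nil, fmt.Errorf("split sizes sum to %d rows, tensor has %d", total, rows)
	}

	parts := make([][]float32, 0, len(sizes))
	offset := 0
	for _, s := range sizes {
		// Each part is a view into data; no values are copied.
		parts = append(parts, data[offset*cols:(offset+s)*cols])
		offset += s
	}
	return parts, nil
}

func main() {
	// An illustrative fused QKV tensor: 4 query rows, 2 key rows and
	// 2 value rows, with 2 columns each.
	qkv := make([]float32, 8*2)
	for i := range qkv {
		qkv[i] = float32(i)
	}
	parts, err := splitRows(qkv, 2, []int{4, 2, 2})
	if err != nil {
		panic(err)
	}
	for i, p := range parts {
		fmt.Println("part", i, "has", len(p), "values")
	}
}
```

An even split is just the special case where every entry in `sizes` is the same, so a single code path can serve both.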

22 May, 2025 1 commit

fix: mllama quality (#10807) · adff143b

Michael Yang authored May 22, 2025

* fix mllama convert

- transform attn_gate and ffn_gate
- swap attention heads for vision models

* fix mllama

the MLP gate was applied in the wrong place

adff143b

19 May, 2025 1 commit

ggml: Separate tensor load from backend creation · 94ab428e

Jesse Gross authored Apr 17, 2025

Currently, when the backend is created, the tensors are loaded at the
same time, which is a slow operation. This separates them into two
steps:
 - Create the backend, including enumerating tensors and allocating memory
 - Load the tensor data

This allows more flexibility in managing model loading.

94ab428e
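The two-step shape described in this commit might look roughly like the sketch below. The types `tensorInfo` and `backend`, the functions `newBackend` and `load`, and the progress callback are hypothetical and do not mirror the real ggml backend code.

```go
package main

import "fmt"

// tensorInfo and backend are hypothetical types for illustration; they do
// not mirror ollama's ml.Backend.
type tensorInfo struct {
	name string
	size int // number of bytes to allocate
}

type backend struct {
	buffers map[string][]byte
	loaded  bool
}

// newBackend is step 1: enumerate tensors and allocate memory. No tensor
// data is read here, so it stays cheap.
func newBackend(infos []tensorInfo) *backend {
	b := &backend{buffers: make(map[string][]byte, len(infos))}
	for _, ti := range infos {
		b.buffers[ti.name] = make([]byte, ti.size)
	}
	return b
}

// load is step 2: fill the allocated buffers. read stands in for the slow
// file I/O; progress, if non-nil, is called after each tensor.
func (b *backend) load(read func(name string, dst []byte) error, progress func(done, total int)) error {
	done := 0
	for name, buf := range b.buffers {
		if err := read(name, buf); err != nil {
			return fmt.Errorf("load %s: %w", name, err)
		}
		done++
		if progress != nil {
			progress(done, len(b.buffers))
		}
	}
	b.loaded = true
	return nil
}

func main() {
	b := newBackend([]tensorInfo{{"token_embd.weight", 8}, {"output.weight", 4}})
	fmt.Println("allocated:", len(b.buffers), "loaded:", b.loaded)

	err := b.load(func(name string, dst []byte) error {
		for i := range dst {
			dst[i] = 1 // pretend to read from disk
		}
		return nil
	}, nil)
	fmt.Println("loaded:", b.loaded, "err:", err)
}
```

With the steps split, a caller can inspect memory requirements after step 1 and decide when, or whether, to pay for step 2.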

16 May, 2025 1 commit

model: handle multiple eos tokens (#10577) · 333e3604

Michael Yang authored May 16, 2025

* get eos_token_id from generation_config.json

* refactor

* include both ids and strings in trace

* comments

* remove special case for gemma3 special vocab (#10743)

333e3604

15 May, 2025 1 commit

fix mllama conversion (#10716) · 55760195

Michael Yang authored May 15, 2025

Cross-attention Q and K projections need to have their heads swapped, similar to non-cross-attention Q and K tensors.

55760195

14 May, 2025 2 commits
- model: add Qwen2.5-VL support (#10385) · 0aa8b371
  Bruce MacDonald authored May 13, 2025
  
  0aa8b371
- chore: update mllama to use ollama engine (#10637) · 23125648
  Michael Yang authored May 13, 2025
  
  23125648
08 May, 2025 1 commit
- chore: remove unused ZipReader type (#10621) · b585a581
  Michael Yang authored May 08, 2025
  
  b585a581
06 May, 2025 1 commit

Move quantization to new backend (#10363) · 42481045

Daniel Hiltgen authored May 06, 2025

* Move quantization logic to GGML via new backend

This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.

42481045

04 May, 2025 1 commit
- file close check and close. (#10554) · 7e5c8eee
  湛露先生 authored May 05, 2025
```
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
```
  7e5c8eee
25 Apr, 2025 5 commits
- fixes for maverick · 7ba9fa9c
  Michael Yang authored Apr 21, 2025
  
  7ba9fa9c
- chunked attention · 8bf11b84
  Michael Yang authored Apr 10, 2025
  
  8bf11b84
- llama4 · f0c66e6d
  Michael Yang authored Apr 03, 2025
  
  f0c66e6d
- convert: use -1 for read all · dc1e81f0
  Michael Yang authored Apr 23, 2025
  
  dc1e81f0
- convert: change to colmajor · 4892872c
  Michael Yang authored Apr 25, 2025
  
  4892872c
16 Apr, 2025 1 commit
- fix write gguf padding · 2fec73ee
  Michael Yang authored Apr 11, 2025
  
  2fec73ee
03 Apr, 2025 1 commit

model: support for mistral-small in the ollama runner · 6bd0a983

Bruce MacDonald authored Mar 14, 2025

Mistral is a popular research lab making open source models. This updates
the forward pass of llama architecture models to support both llama models
and mistral models by accounting for additional metadata present in mistral
models, and finding the correct dimensions for the output projection.

6bd0a983

02 Apr, 2025 1 commit

chore(all): replace instances of interface with any (#10067) · 9876c9fa

Bruce MacDonald authored Apr 02, 2025

Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.

9876c9fa
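A small check of the equivalence: because `any` is an alias rather than a new type, function values written with either spelling are interchangeable. The function names below are made up for the example.

```go
package main

import "fmt"

// describeOld uses the pre-1.18 spelling of the empty interface.
func describeOld(v interface{}) string { return fmt.Sprintf("%T", v) }

// describeNew uses the alias; its signature is the identical type.
func describeNew(v any) string { return fmt.Sprintf("%T", v) }

func main() {
	// A func(interface{}) string value is assignable to a func(any) string
	// variable because any and interface{} are the same type.
	var f func(any) string = describeOld
	fmt.Println(f(42), describeNew("hi")) // prints "int string"
}
```

Since the types are identical, the replacement is purely textual and does not change behavior.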

18 Mar, 2025 1 commit
- convert: return name of unsupported architecture (#9862) · 61a88252
  Bruce MacDonald authored Mar 18, 2025
```
When a model's architecture cannot be converted, return the name of the unsupported arch in the error message.
```
  61a88252
13 Mar, 2025 1 commit
- fix: change default context size for gemma3 (#9744) · 80c7ce38
  Patrick Devine authored Mar 13, 2025
  
  80c7ce38
11 Mar, 2025 12 commits
- all: address linter errors · 83f0ec82
  jmorganca authored Mar 11, 2025
  
  83f0ec82
- use 2d pooling · 63a39406
  Michael Yang authored Mar 11, 2025
  
  63a39406
- fix gemma3 1b conversion · 2e54d72f
  Patrick Devine authored Mar 10, 2025
  
  2e54d72f
- compat with upstream gguf · 6b32a2d5
  Michael Yang authored Mar 10, 2025
  
  6b32a2d5
- skip repacking vision tensors · d368c039
  Michael Yang authored Mar 09, 2025
  
  d368c039
- fix configs · 9b54267e
  Patrick Devine authored Mar 08, 2025
  
  9b54267e
- update model · 46bb0169
  Michael Yang authored Mar 08, 2025
  
  46bb0169
- fix conversion · c62861f4
  Patrick Devine authored Mar 07, 2025
  
  c62861f4
- set non-causal attention · 0df18004
  Michael Yang authored Mar 07, 2025
  
  0df18004
- temporary work around for converting spm · 631fecc6
  Patrick Devine authored Mar 07, 2025
  
  631fecc6
- add gemma vision encoder · 4b037a97
  Michael Yang authored Mar 06, 2025
  
  4b037a97
- gemma2 impl · 5f74d1fd
  Patrick Devine authored Feb 07, 2025
  
  5f74d1fd
14 Feb, 2025 1 commit

next ollama runner (#7913) · 58245413

Michael Yang authored Feb 14, 2025



feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces three high-level interfaces and several smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and tensor operations.

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

58245413
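How the three interfaces relate can be sketched with toy stand-ins. Everything below is a simplified assumption: the method sets, `vec`, `mapBackend`, and `biasModel` are invented for illustration, and the real interfaces in `ml/backend.go` and `model/model.go` are larger and have different signatures.

```go
package main

import "fmt"

// Tensor, Backend and Model are simplified stand-ins, not the real
// ml.Tensor, ml.Backend and model.Model interfaces.
type Tensor interface {
	Add(other Tensor) Tensor
	Floats() []float32
}

type Backend interface {
	// Get returns a loaded tensor by name, or nil if it does not exist.
	Get(name string) Tensor
}

type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// vec is a toy CPU tensor.
type vec []float32

// Add returns the element-wise sum; it assumes both tensors have equal length.
func (v vec) Add(other Tensor) Tensor {
	o := other.Floats()
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] + o[i]
	}
	return out
}

func (v vec) Floats() []float32 { return v }

// mapBackend serves tensors from memory instead of loading them onto hardware.
type mapBackend map[string]vec

func (m mapBackend) Get(name string) Tensor {
	if t, ok := m[name]; ok {
		return t
	}
	return nil
}

// biasModel is a toy architecture whose forward pass adds one bias tensor.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, fmt.Errorf("tensor %q not found", "bias")
	}
	return input.Add(bias), nil
}

func main() {
	var m Model = biasModel{}
	out, err := m.Forward(mapBackend{"bias": {1, 1}}, vec{2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Floats()) // prints "[3 4]"
}
```

The point of the layering is that a model only talks to `Backend` and `Tensor`, so the same `Forward` code could run against a different tensor library without changes.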