Commits · d2b63c19b326a5bfeacdfd407a95cf706927e1a0 · OpenDAS / ollama

20 Oct, 2025 1 commit
- fs(ggml): fill in arch prefix if necessary (#12646) · d2b63c19
  Michael Yang authored Oct 20, 2025
  
  d2b63c19
26 Aug, 2025 1 commit

convert: fix tensor sorting (#12015) · 86834a27

Michael Yang authored Aug 26, 2025

there are two bugs here.

1. the check for a layer id is incorrect and should be >= 0 since layer
   0 is valid
2. if both tensors have a layer identifier, it will only compare the
   layer id, which will return 0 if the tensors are in the same layer.
   instead it should fall back to comparing the full tensor name

86834a27

26 Jun, 2025 1 commit

add new gemma model (#11204) · 73b642e6

Michael Yang authored Jun 25, 2025

* update patches

* cherry pick metal mean kernel

* cherry pick cuda mean kernel

* gemma3n

73b642e6

16 Jun, 2025 1 commit
- gguf: fix write order (#11068) · a6fbfc88
  Michael Yang authored Jun 16, 2025
```
* ggml: test write gguf order
* ggml: fix write tensor order
```
  a6fbfc88
07 May, 2025 1 commit
- fix data race in WriteGGUF (#10598) · af31ccef
  Daniel Hiltgen authored May 06, 2025
```
err in the go routine should not be shared with the outer scope
```
  af31ccef
06 May, 2025 1 commit

Move quantization to new backend (#10363) · 42481045

Daniel Hiltgen authored May 06, 2025

* Move quantization logic to GGML via new backend

This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.

42481045

01 May, 2025 1 commit

fix: write gguf padding (#10510) · a7835c67

Michael Yang authored Apr 30, 2025

* add gguf_test

* fix padding

padding was being added to offset but not to the running count
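
As an illustration of the bug class described above, the sketch below keeps a single running offset and advances it by both the data size and the padding. The names `padding`, `span`, and `layout` and the alignment of 32 are assumptions for the example, not the code in the commit.

```go
package main

import "fmt"

// padding returns only the number of bytes needed to advance offset to the
// next multiple of align (0 when offset is already aligned).
func padding(offset, align int64) int64 {
	return (align - offset%align) % align
}

// span records where one tensor's data starts and how long it is.
type span struct{ offset, size int64 }

// layout assigns an aligned offset to each tensor and returns the total
// number of bytes written, padding included.
func layout(sizes []int64, align int64) ([]span, int64) {
	var spans []span
	var offset int64 // the running count
	for _, size := range sizes {
		spans = append(spans, span{offset: offset, size: size})
		offset += size
		// The padding must advance the running count too; otherwise the
		// next tensor's recorded offset drifts from where its data lands.
		offset += padding(offset, align)
	}
	return spans, offset
}

func main() {
	spans, total := layout([]int64{10, 32, 5}, 32)
	fmt.Println(spans, total)
}
```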

a7835c67

25 Apr, 2025 2 commits
- generic ggml.array · 5d027916
  Michael Yang authored Apr 23, 2025
  
  5d027916
- convert: change to colmajor · 4892872c
  Michael Yang authored Apr 25, 2025
  
  4892872c
16 Apr, 2025 1 commit
- fix write gguf padding · 2fec73ee
  Michael Yang authored Apr 11, 2025
  
  2fec73ee
14 Feb, 2025 1 commit

next ollama runner (#7913) · 58245413

Michael Yang authored Feb 14, 2025



feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces three high-level interfaces and a number of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and tensor operations.
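
To show how the three roles above could fit together, here is a toy sketch. The method sets are invented for illustration and are not the declarations in `model/model.go` or `ml/backend.go`; only the name `Forward` comes from the commit message, and `vec`, `memBackend`, and `biasModel` are hypothetical.

```go
package main

import "fmt"

// Tensor is a simplified stand-in for ml.Tensor (hypothetical method set).
type Tensor interface {
	Shape() []int
	Add(other Tensor) Tensor
}

// Backend is a simplified stand-in for ml.Backend: it hands out loaded tensors.
type Backend interface {
	Get(name string) Tensor
}

// Model is a simplified stand-in for model.Model.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// vec is a toy in-memory tensor.
type vec []float32

func (v vec) Shape() []int { return []int{len(v)} }

// Add assumes other is a vec of the same length; this toy does not check.
func (v vec) Add(other Tensor) Tensor {
	w := other.(vec)
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] + w[i]
	}
	return out
}

// memBackend is a toy backend that keeps tensors in a map.
type memBackend map[string]Tensor

func (b memBackend) Get(name string) Tensor { return b[name] }

// biasModel is a toy architecture whose forward pass adds a bias tensor.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, fmt.Errorf("tensor %q not loaded", "bias")
	}
	return input.Add(bias), nil
}

func main() {
	var m Model = biasModel{}
	out, err := m.Forward(memBackend{"bias": vec{1, 2}}, vec{10, 20})
	fmt.Println(out, err)
}
```

The point of the split is that the model code only talks to `Backend` and `Tensor`, so the tensor library behind them can change without touching the architecture.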

This is the first implementation of the new engine. Follow-up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

58245413

18 Oct, 2024 1 commit

image processing for llama3.2 (#6963) · c7cb0f06

Patrick Devine authored Oct 18, 2024


Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>

c7cb0f06

12 Aug, 2024 1 commit
- add conversion for microsoft phi 3 mini/medium 4k, 128 · 6ffb5cb0
  Michael Yang authored Jun 03, 2024
  
  6ffb5cb0
31 Jul, 2024 3 commits
- comments · df993fa3
  Michael Yang authored Jul 08, 2024
  
  df993fa3
- refactor convert · 5e9db9fb
  Michael Yang authored May 31, 2024
  
  5e9db9fb
- update convert test to check result data · 6b252918
  Michael Yang authored Jun 03, 2024
  
  6b252918
16 Jul, 2024 1 commit
- add chat and generate tests with mock runner · 4a565cbf
  Michael Yang authored Jul 13, 2024
  
  4a565cbf
25 Jun, 2024 1 commit

llm: speed up gguf decoding by a lot (#5246) · cb42e607

Blake Mizerany authored Jun 24, 2024

Previously, two costs made loading GGUF files, along with their metadata
and tensor information, VERY slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in an
    excessive number of syscalls and disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro
M3.

This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.

cb42e607

11 Jun, 2024 1 commit

Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" · 7bdcd1da

Michael Yang authored Jun 11, 2024

This reverts commit f5f245cc, reversing
changes made to 94d37fdc.

this change broke gguf v2, which was incorrectly detected as big endian

7bdcd1da

08 Jun, 2024 1 commit
- fix parsing big endian gguf · 620d5c56
  Michael Yang authored Jun 08, 2024
  
  620d5c56
07 Jun, 2024 1 commit
- fix create model when template detection errors · 030e765e
  Michael Yang authored Jun 07, 2024
  
  030e765e
04 Jun, 2024 1 commit
- lint · e40145a3
  Michael Yang authored May 21, 2024
  
  e40145a3
21 May, 2024 1 commit
- simplify safetensors reading · 171eb040
  Michael Yang authored May 20, 2024
  
  171eb040
20 May, 2024 3 commits
- cleanup · bbbd9f20
  Michael Yang authored May 15, 2024
  
  bbbd9f20
- bpe pretokenizer · 547132e8
  Michael Yang authored May 15, 2024
  
  547132e8
- llama3 conversion · c8cf0d94
  Patrick Devine authored Apr 28, 2024
  
  c8cf0d94
24 Apr, 2024 1 commit
- fixes for gguf (#3863) · 14476d48
  Patrick Devine authored Apr 23, 2024
  
  14476d48
16 Apr, 2024 2 commits
- fix padding to only return padding · e74163af
  Michael Yang authored Apr 15, 2024
  
  e74163af
- fix padding in decode · 969238b1
  Michael Yang authored Apr 15, 2024
```
TODO: update padding() to _only_ return the padding
```
  969238b1
15 Apr, 2024 1 commit
- Add llama2 / torch models for `ollama create` (#3607) · 9f8691c6
  Patrick Devine authored Apr 15, 2024
  
  9f8691c6
10 Apr, 2024 1 commit
- refactor tensor query · 8b2c1006
  Michael Yang authored Apr 03, 2024
  
  8b2c1006
01 Apr, 2024 1 commit
- refactor model parsing · d338d704
  Michael Yang authored Mar 13, 2024
  
  d338d704
29 Mar, 2024 1 commit
- Add gemma safetensors conversion (#3250) · 5a5efee4
  Patrick Devine authored Mar 28, 2024
```
Co-authored-by: Michael Yang <mxyng@pm.me>
```
  5a5efee4
26 Mar, 2024 1 commit
- change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) · 1b272d5b
  Patrick Devine authored Mar 26, 2024
  
  1b272d5b
15 Mar, 2024 1 commit

llm,readline: use errors.Is instead of simple == check (#3161) · 6ce37e4d

Blake Mizerany authored Mar 15, 2024



This replaces some brittle, simple equality checks with errors.Is. Since
go1.13, errors.Is is the idiomatic way to check for errors.

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

6ce37e4d

12 Mar, 2024 1 commit
- refactor readseeker · 00852979
  Michael Yang authored Mar 09, 2024
  
  00852979
08 Mar, 2024 1 commit
- decode ggla · 76bdebba
  Michael Yang authored Mar 08, 2024
  
  76bdebba
07 Mar, 2024 1 commit
- Convert Safetensors to an Ollama model (#2824) · 2c017ca4
  Patrick Devine authored Mar 06, 2024
  
  2c017ca4
21 Feb, 2024 1 commit
- add gguf file types (#2532) · 949d7b1c
  Michael Yang authored Feb 20, 2024
  
  949d7b1c
24 Jan, 2024 1 commit
- refactor tensor read · cd22855e
  Michael Yang authored Jan 24, 2024
  
  cd22855e