Commits · 33ee7168ba1e16c813b52dc2c9417efa1e2e9f20 · OpenDAS / ollama

09 Jan, 2026 1 commit

Add experimental MLX backend and engine with imagegen support (#13648) · 33ee7168

Daniel Hiltgen authored Jan 08, 2026



* WIP - MLX backend with gemma3

* MLX: add cmake and go tag build toggles

To build the new MLX backend code:
  cmake --preset MLX
  cmake --build --preset MLX --parallel
  cmake --install build --component MLX
  go build -tags mlx .

Note: the main.go entrypoint for the MLX engine will change in a follow-up commit.

* add experimental image generation runtime

* MLX: wire up cuda build for linux

* MLX: get dependencies correct and dedup

This is still too large for a unified GitHub artifact, but is now "correct" for the mlx_cuda_v13 directory.

* fix relative link bug in dedup

* Add darwin build and readme

* add go build tag for mlx dependent code and wire up build_darwin.sh

* lint cleanup

* macos: build mlx for x86

This will be CPU only.

* cuda build instructions and fix drift from mlx bump

* stale comment

* Delete agent helper doc

* Clean up readme.md

* Revise README for tokenizer clarity and details

Updated README to clarify tokenizer functionality and removed correctness section.

---------
Co-authored-by: jmorganca <jmorganca@gmail.com>


20 Oct, 2025 1 commit
- fs(ggml): fill in arch prefix if necessary (#12646) · d2b63c19
  Michael Yang authored Oct 20, 2025
23 Jul, 2025 1 commit
- s#x/exp/maps#maps# (#11506) · 6c733bf0
  Michael Yang authored Jul 23, 2025
19 May, 2025 1 commit

ggml: Separate tensor load from backend creation · 94ab428e

Jesse Gross authored Apr 17, 2025

Currently, when the backend is created, the tensors are loaded at the
same time, which is a slow operation. This separates them to be two
steps:
 - Create backend, including enumerating tensors and memory allocation
 - Loading tensor data

This allows more flexibility in managing model loading.


04 May, 2025 1 commit
- file close check and close. (#10554) · 7e5c8eee
  湛露先生 authored May 05, 2025
  Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
25 Apr, 2025 1 commit
- convert: use -1 for read all · dc1e81f0
  Michael Yang authored Apr 23, 2025
14 Feb, 2025 1 commit

next ollama runner (#7913) · 58245413

Michael Yang authored Feb 14, 2025



feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It adds three high-level interfaces, plus several smaller helper interfaces that support them.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model onto hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations.

This is the first implementation of the new engine. Follow-up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

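The three-way split described in this commit (a model architecture, a backend that owns loaded weights, and a tensor abstraction) can be illustrated with a toy example. The interfaces below are hypothetical and heavily simplified. They are not the real signatures of `model.Model`, `ml.Backend`, or `ml.Tensor`. They only show how the responsibilities separate: `Forward` depends on the interfaces and never on a concrete tensor library. `vec`, `mapBackend`, and `biasModel` are made-up toy types.

```go
package main

import "fmt"

// Tensor is a hypothetical, simplified tensor interface.
type Tensor interface {
	Shape() []int
	Floats() []float32
	Add(other Tensor) Tensor
}

// Backend is a hypothetical backend interface that hands loaded weights to models by name.
type Backend interface {
	Get(name string) Tensor
}

// Model is a hypothetical model-architecture interface.
type Model interface {
	Forward(b Backend, input Tensor) Tensor
}

// vec is a toy 1-D CPU tensor that implements Tensor.
type vec []float32

func (v vec) Shape() []int      { return []int{len(v)} }
func (v vec) Floats() []float32 { return v }

// Add returns the element-wise sum and panics on a shape mismatch.
func (v vec) Add(o Tensor) Tensor {
	w := o.Floats()
	if len(w) != len(v) {
		panic("shape mismatch")
	}
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] + w[i]
	}
	return out
}

// mapBackend is a toy Backend that holds "loaded" weights in a map.
type mapBackend map[string]Tensor

func (m mapBackend) Get(name string) Tensor { return m[name] }

// biasModel is a toy architecture whose forward pass adds a "bias" weight to its input.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) Tensor {
	return input.Add(b.Get("bias"))
}

func main() {
	var m Model = biasModel{}
	out := m.Forward(mapBackend{"bias": vec{1, 1, 1}}, vec{1, 2, 3})
	fmt.Println(out.Shape(), out.Floats()) // prints [3] [2 3 4]
}
```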

16 Jan, 2025 1 commit
- convert: import support for command-r models from safetensors (#6063) · 93a8daf2
  Josh authored Jan 15, 2025
  ---------
  Co-authored-by: Patrick Devine <patrick@infrahq.com>
14 Jan, 2025 1 commit

convert: qwen2 from safetensors (#8408) · f6f37130

Bruce MacDonald authored Jan 14, 2025

Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to GGUF format so that they can be run.


18 Oct, 2024 1 commit

image processing for llama3.2 (#6963) · c7cb0f06

Patrick Devine authored Oct 18, 2024


Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>


06 Sep, 2024 1 commit
- Fix gemma2 2b conversion (#6645) · 608e87bf
  Patrick Devine authored Sep 05, 2024
28 Aug, 2024 1 commit
- throw an error when encountering unsupported tensor sizes (#6538) · 6c1c1ad6
  Patrick Devine authored Aug 27, 2024
27 Aug, 2024 1 commit
- clean up convert tokenizer · eae3af68
  Michael Yang authored Aug 27, 2024
23 Aug, 2024 1 commit
- convert safetensor adapters into GGUF (#6327) · 0c819e16
  Patrick Devine authored Aug 23, 2024
21 Aug, 2024 3 commits
- llama3.1 · 77903ab8
  Michael Yang authored Jul 29, 2024
- convert gemma2 · 3546bbd0
  Michael Yang authored Jun 28, 2024
- bert · 5a28b9cf
  Michael Yang authored Jun 06, 2024
12 Aug, 2024 1 commit
- add conversion for microsoft phi 3 mini/medium 4k, 128 · 6ffb5cb0
  Michael Yang authored Jun 03, 2024
02 Aug, 2024 1 commit
- lint · b732beba
  Michael Yang authored Aug 01, 2024
31 Jul, 2024 3 commits
- convert: only extract large files · eafc607a
  Michael Yang authored Jun 29, 2024
- refactor convert · 5e9db9fb
  Michael Yang authored May 31, 2024
- update convert test to check result data · 6b252918
  Michael Yang authored Jun 03, 2024
21 May, 2024 1 commit
- add test · 3591bbe5
  Michael Yang authored May 21, 2024