1. 09 Jan, 2026 1 commit
    • Daniel Hiltgen's avatar
      Add experimental MLX backend and engine with imagegen support (#13648) · 33ee7168
      Daniel Hiltgen authored
      
      
      * WIP - MLX backend with gemma3
      
      * MLX: add cmake and go tag build toggles
      
      To build the new MLX backend code:
        cmake --preset MLX
        cmake --build --preset MLX --parallel
        cmake --install build --component MLX
        go build -tags mlx .
      
      Note: the main.go entrypoint for the MLX engine will change in a follow up commit.
      
      * add experimental image generation runtime
      
      * add experimental image generation runtime
      
      * MLX: wire up cuda build for linux
      
      * MLX: get dependencies correct and dedup
      
      This is still too large for a unified github artifact, but is now "correct" for the mlx_cuda_v13
      directory.
      
      * fix relative link bug in dedup
      
      * Add darwin build and readme
      
      * add go build tag for mlx dependent code and wire up build_darwin.sh
      
      * lint cleanup
      
      * macos: build mlx for x86
      
      This will be CPU only.
      
      * cuda build instructions and fix drift from mlx bump
      
      * stale comment
      
      * Delete agent helper doc
      
      * Clean up readme.md
      
      * Revise README for tokenizer clarity and details
      
      Updated README to clarify tokenizer functionality and removed correctness section.
      
      ---------
      Co-authored-by: default avatarjmorganca <jmorganca@gmail.com>
      33ee7168
  2. 06 May, 2025 1 commit
    • Daniel Hiltgen's avatar
      Move quantization to new backend (#10363) · 42481045
      Daniel Hiltgen authored
      * Move quantization logic to GGML via new backend
      
      This moves the model aware logic to Go code and calls GGMLs quantization code for model creation.
      
      * Remove "add model quantizations"
      
      This is no longer needed now that quantization is implemented in Go+GGML code directly.
      42481045
  3. 14 Feb, 2025 1 commit
    • Michael Yang's avatar
      next ollama runner (#7913) · 58245413
      Michael Yang authored
      
      
      feat: add new Ollama engine using ggml through cgo
      
      This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.
      
      - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
      - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
      - `ml.Tensor` defines the interface for a tensor and tensor operations
      
      This is the first implementation of the new engine. Follow up PRs will implement more features:
      
      - non-greedy sampling (#8410)
      - integration with Ollama and KV caching (#8301)
      - more model support (#9080) with more coming soon
      Co-authored-by: default avatarBruce MacDonald <brucewmacdonald@gmail.com>
      58245413
  4. 16 Jan, 2025 1 commit