Commits · 1a0cfd080a2d3e65519c241b7561bf5aa49468ff · OpenDAS / ollama

19 May, 2025 2 commits

avoid kv truncation during create (#10761) · 1a0cfd08
Daniel Hiltgen authored May 19, 2025


ggml: Separate tensor load from backend creation · 94ab428e

Jesse Gross authored Apr 17, 2025

Currently, when the backend is created, the tensors are loaded at the
same time, which is a slow operation. This separates them into two
steps:
 - Create the backend, including enumerating tensors and allocating memory
 - Load the tensor data

This allows more flexibility in managing model loading.
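
The two-step split described above can be sketched roughly as follows. This is a minimal illustration only: the `Backend` type, `NewBackend`, and `Load` are hypothetical names chosen for this example and are not the actual ollama `ml.Backend` API.

```go
package main

import "fmt"

// Backend is a hypothetical stand-in for a tensor backend.
// It is NOT the real ollama ml.Backend interface.
type Backend struct {
	names  []string          // tensors enumerated at creation time
	data   map[string][]byte // filled in only by Load
	loaded bool
}

// NewBackend is step one: enumerate tensors and reserve bookkeeping
// space, but read no tensor data yet.
func NewBackend(names []string) *Backend {
	return &Backend{names: names, data: make(map[string][]byte, len(names))}
}

// Load is step two: the slow part, pulling each tensor's bytes in.
// The read callback is an assumption standing in for file I/O.
func (b *Backend) Load(read func(name string) ([]byte, error)) error {
	for _, n := range b.names {
		buf, err := read(n)
		if err != nil {
			return fmt.Errorf("load %s: %w", n, err)
		}
		b.data[n] = buf
	}
	b.loaded = true
	return nil
}

func main() {
	b := NewBackend([]string{"tok_embd.weight", "output.weight"})
	fmt.Println("loaded after create:", b.loaded)

	err := b.Load(func(name string) ([]byte, error) { return []byte(name), nil })
	fmt.Println("loaded after Load:", b.loaded, "err:", err)
}
```

With creation and loading decoupled like this, a caller could inspect the enumerated tensors or decide on placement before paying the cost of reading the data.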


14 May, 2025 1 commit

fix crash in old clients with quantization progress (#10710) · ff80718e

Daniel Hiltgen authored May 14, 2025

Older clients assumed the digest was at least 19 characters long, so increase the size
of the dummy digest to avoid array out-of-bounds crashes.
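
A rough sketch of the failure mode, assuming the older client shortened a `sha256:<hex>` digest by slicing a fixed range. The `[7:19]` range and both function names below are assumptions made for illustration; the commit message only states the 19-character expectation.

```go
package main

import (
	"fmt"
	"strings"
)

// shortDigest mimics a client that slices a fixed range out of the digest.
// The [7:19] range is an assumption; it panics if len(digest) < 19.
func shortDigest(digest string) string {
	return digest[7:19]
}

// safeShortDigest is a defensive variant that falls back to the whole
// string when the digest is shorter than expected.
func safeShortDigest(digest string) string {
	if len(digest) < 19 {
		return digest
	}
	return digest[7:19]
}

func main() {
	// A placeholder digest padded to exactly 19 characters, so that
	// even the unchecked slice stays in bounds.
	dummy := "sha256:" + strings.Repeat("0", 12)
	fmt.Println(len(dummy), shortDigest(dummy))
	fmt.Println(safeShortDigest("short"))
}
```

Padding the server-side placeholder keeps already-deployed clients working, while a length check like the one in `safeShortDigest` is what a newer client could do on its own side.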


12 May, 2025 1 commit

convert: quantize from safetensors needs kv (#10675) · ad035ad5

Bruce MacDonald authored May 12, 2025

When creating a quantized model from safetensors, we
need the array KV values to be loaded. Changing this
value to -1 loads the KV values on the returned
layer so they can be used and saved during quantization.


06 May, 2025 1 commit

Move quantization to new backend (#10363) · 42481045

Daniel Hiltgen authored May 06, 2025

* Move quantization logic to GGML via new backend

This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.


25 Apr, 2025 1 commit

explicitly decode maxarraysize 1024 · 340448d2

Michael Yang authored Apr 25, 2025

19 Apr, 2025 1 commit

create tempdir in models directory · 88738b35

Michael Yang authored Apr 18, 2025

The models directory should have plenty of storage, and creating the temporary
directory there also ensures there's no cross-device copy.
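
The idea can be sketched as below: stage files in a temporary directory created inside the models directory, then rename them into place. The function names and the `create-*` prefix are hypothetical and chosen only for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// stageAndMove writes data into a temp dir created INSIDE modelsDir and
// then renames the file into place. Source and destination share a
// filesystem, so the rename does not involve a cross-device copy.
func stageAndMove(modelsDir, name string, data []byte) (string, error) {
	tmp, err := os.MkdirTemp(modelsDir, "create-*") // hypothetical prefix
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(tmp)

	staged := filepath.Join(tmp, name)
	if err := os.WriteFile(staged, data, 0o644); err != nil {
		return "", err
	}
	final := filepath.Join(modelsDir, name)
	if err := os.Rename(staged, final); err != nil {
		return "", err
	}
	return final, nil
}

// demo runs stageAndMove against a throwaway directory and returns
// the contents read back from the final location.
func demo() (string, error) {
	modelsDir, err := os.MkdirTemp("", "models-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(modelsDir)

	final, err := stageAndMove(modelsDir, "blob.bin", []byte("weights"))
	if err != nil {
		return "", err
	}
	got, err := os.ReadFile(final)
	return string(got), err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

Had the staging directory lived under the system temp location on a different device, `os.Rename` would fail and the data would have to be copied instead.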


01 Mar, 2025 1 commit

server: validate local path on safetensor create (#9379) · bebb6823

Bruce MacDonald authored Feb 28, 2025

More validation during the safetensor creation process.

- Properly handle relative paths (like ./model.safetensors) while rejecting absolute paths
- Add comprehensive test coverage for various paths
- No functionality changes for valid inputs - existing workflows remain unaffected
- Leverages Go 1.24's new os.Root functionality for secure containment
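
As a rough illustration of the accept/reject behavior described above, the sketch below uses `filepath.IsLocal` (available since Go 1.20) as a simpler, purely lexical stand-in. It is not the `os.Root` mechanism the commit refers to, and unlike `os.Root` it does not guard against symlinks that escape the directory. The `validateRel` name is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// validateRel accepts paths that are relative and stay within the
// directory they are evaluated in, and rejects absolute paths and
// paths that climb out with "..". This is a lexical check only.
func validateRel(p string) error {
	if !filepath.IsLocal(p) {
		return fmt.Errorf("path %q must be relative and stay inside the base directory", p)
	}
	return nil
}

func main() {
	for _, p := range []string{
		"./model.safetensors",
		"sub/model.safetensors",
		"/abs/model.safetensors",
		"../escape.safetensors",
	} {
		fmt.Printf("%-28s accepted=%v\n", p, validateRel(p) == nil)
	}
}
```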


14 Feb, 2025 1 commit

next ollama runner (#7913) · 58245413

Michael Yang authored Feb 14, 2025



feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces three high-level interfaces and several smaller helper interfaces to support this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations
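
How the three interfaces relate can be sketched with a toy example. The method sets below are heavily simplified assumptions made for illustration; the real definitions live in `model/model.go` and `ml/backend.go` and differ from these. The `vec`, `mapBackend`, and `biasModel` types are hypothetical.

```go
package main

import "fmt"

// Tensor: a tensor plus its operations (simplified, hypothetical).
type Tensor interface {
	Shape() []int
	Add(other Tensor) Tensor
}

// Backend: gives models access to loaded tensors (simplified, hypothetical).
type Backend interface {
	Get(name string) Tensor
}

// Model: an architecture whose Forward method computes the forward pass
// (simplified, hypothetical signature).
type Model interface {
	Forward(b Backend, input Tensor) Tensor
}

// vec is a toy one-dimensional Tensor.
type vec []float32

func (v vec) Shape() []int { return []int{len(v)} }

func (v vec) Add(o Tensor) Tensor {
	ov := o.(vec) // the toy backend only ever produces vec values
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] + ov[i]
	}
	return out
}

// mapBackend is a toy Backend that serves tensors from a map.
type mapBackend map[string]Tensor

func (m mapBackend) Get(name string) Tensor { return m[name] }

// biasModel is a toy Model: Forward adds a loaded "bias" tensor to the input.
type biasModel struct{}

func (biasModel) Forward(b Backend, in Tensor) Tensor {
	return in.Add(b.Get("bias"))
}

func main() {
	var m Model = biasModel{}
	backend := mapBackend{"bias": vec{1, 1}}
	fmt.Println(m.Forward(backend, vec{2, 3}))
}
```

The point of the layering is that a model is written only against `Tensor` and `Backend`, so the same `Forward` code does not need to know which tensor library or hardware sits underneath.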

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>


15 Jan, 2025 1 commit

Fix absolute path names + gguf detection (#8428) · 2539f2db

Patrick Devine authored Jan 14, 2025

09 Jan, 2025 1 commit

show a more descriptive error in the client if it is newer than the server (#8351) · 8bccae4f

Patrick Devine authored Jan 09, 2025

01 Jan, 2025 1 commit

Update the /api/create endpoint to use JSON (#7935) · 86a622cb

Patrick Devine authored Dec 31, 2024

Changes `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
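
A minimal sketch of what building a JSON body for `POST /api/create` might look like on the client side. The `createRequest` struct and its field names (`model`, `from`, `system`) are assumptions for illustration and should be checked against the API documentation for the server version in use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createRequest is an illustrative request body. The field names are
// assumptions, not a confirmed copy of the server's schema.
type createRequest struct {
	Model  string `json:"model"`
	From   string `json:"from,omitempty"`
	System string `json:"system,omitempty"`
}

// buildCreateBody marshals the request into the JSON that would be
// sent as the POST body, in place of Modelfile text.
func buildCreateBody(model, from, system string) ([]byte, error) {
	return json.Marshal(createRequest{Model: model, From: from, System: system})
}

func main() {
	body, err := buildCreateBody("mario", "llama3.2", "You are Mario.")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because the request shape changed, clients that still send a Modelfile string would need updating, which is why the commit flags this as breaking.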