- 20 Oct, 2025 1 commit
-
-
Michael Yang authored
-
- 23 Jul, 2025 1 commit
-
-
Michael Yang authored
-
- 19 May, 2025 1 commit
-
-
Jesse Gross authored
Currently, when the backend is created, the tensors are loaded at the same time, which is a slow operation. This separates them into two steps:
- Create the backend, including enumerating tensors and allocating memory
- Load tensor data

This allows more flexibility in managing model loading.
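The two-step split described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `tensorInfo`, `Backend`, `NewBackend`, `Load`, and the `read` callback are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// tensorInfo is a hypothetical description of one tensor in a model file.
type tensorInfo struct {
	Name string
	Size int // size in bytes
}

// Backend is a hypothetical stand-in for a backend; it is not Ollama's type.
type Backend struct {
	buffers map[string][]byte
	loaded  bool
}

// NewBackend is step one: enumerate the tensors and allocate memory for
// each of them. No tensor data is read here, so this step stays cheap.
func NewBackend(tensors []tensorInfo) *Backend {
	b := &Backend{buffers: make(map[string][]byte, len(tensors))}
	for _, t := range tensors {
		b.buffers[t.Name] = make([]byte, t.Size)
	}
	return b
}

// Load is step two: fill the previously allocated buffers with tensor data.
// read is a hypothetical callback that copies one tensor's bytes into dst.
func (b *Backend) Load(read func(name string, dst []byte) error) error {
	if b.loaded {
		return errors.New("backend already loaded")
	}
	for name, buf := range b.buffers {
		if err := read(name, buf); err != nil {
			return fmt.Errorf("load %s: %w", name, err)
		}
	}
	b.loaded = true
	return nil
}

func main() {
	// Step one: fast, only metadata and allocation.
	b := NewBackend([]tensorInfo{
		{Name: "tok_embd.weight", Size: 8},
		{Name: "output.weight", Size: 4},
	})
	fmt.Println("allocated tensors:", len(b.buffers))

	// Step two: slow, can be scheduled whenever the caller chooses.
	err := b.Load(func(name string, dst []byte) error {
		for i := range dst {
			dst[i] = 1 // placeholder data instead of a real file read
		}
		return nil
	})
	fmt.Println("loaded:", b.loaded, "err:", err)
}
```

Keeping allocation separate from loading lets a caller inspect memory requirements, or abandon the load, before paying for the slow data copy.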
-
- 04 May, 2025 1 commit
-
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
- 25 Apr, 2025 1 commit
-
-
Michael Yang authored
-
- 14 Feb, 2025 1 commit
-
-
Michael Yang authored
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.
- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:
- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
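The relationship between the three interfaces named in this commit message can be sketched as below. Only the names `Model`, `Backend`, `Tensor`, and `Forward` come from the message; every method signature, plus the toy types `sliceTensor`, `mapBackend`, and `biasModel`, is an assumption made for illustration and should not be read as the actual definitions in `model/model.go` or `ml/backend.go`.

```go
package main

import "fmt"

// Tensor is a hypothetical, minimal tensor interface.
type Tensor interface {
	Shape() []int
	Add(other Tensor) Tensor
}

// Backend is a hypothetical interface giving models access to loaded tensors.
type Backend interface {
	Get(name string) Tensor
}

// Model is a hypothetical interface for a model architecture; Forward runs
// the forward pass. The real signature may differ.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// sliceTensor is a toy one-dimensional tensor backed by a Go slice.
type sliceTensor []float32

func (t sliceTensor) Shape() []int { return []int{len(t)} }

func (t sliceTensor) Add(other Tensor) Tensor {
	o := other.(sliceTensor) // toy code: only one tensor type exists
	out := make(sliceTensor, len(t))
	for i := range t {
		out[i] = t[i] + o[i]
	}
	return out
}

// mapBackend is a toy backend that keeps "loaded" tensors in a map.
type mapBackend map[string]Tensor

func (m mapBackend) Get(name string) Tensor { return m[name] }

// biasModel is a toy model whose forward pass adds a stored bias tensor.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, fmt.Errorf("tensor %q not found in backend", "bias")
	}
	if bias.Shape()[0] != input.Shape()[0] {
		return nil, fmt.Errorf("shape mismatch: %v vs %v", bias.Shape(), input.Shape())
	}
	return input.Add(bias), nil
}

func main() {
	var m Model = biasModel{}
	backend := mapBackend{"bias": sliceTensor{0.5, 0.5}}

	out, err := m.Forward(backend, sliceTensor{1, 2})
	if err != nil {
		fmt.Println("forward failed:", err)
		return
	}
	fmt.Println("output shape:", out.Shape())
}
```

The point of the layering is that a model only talks to the `Backend` and `Tensor` interfaces, so the tensor library underneath can change without touching the model code.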
-
- 16 Jan, 2025 1 commit
-
-
Josh authored
Co-authored-by: Patrick Devine <patrick@infrahq.com>
-
- 14 Jan, 2025 1 commit
-
-
Bruce MacDonald authored
Add native support for converting Qwen2 family models (including Qwen2.5) from safetensors to GGUF format so that they can be run.
-
- 18 Oct, 2024 1 commit
-
-
Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
-
- 06 Sep, 2024 1 commit
-
-
Patrick Devine authored
-
- 28 Aug, 2024 1 commit
-
-
Patrick Devine authored
-
- 27 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 23 Aug, 2024 1 commit
-
-
Patrick Devine authored
-
- 21 Aug, 2024 3 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 12 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 02 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 31 Jul, 2024 3 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 21 May, 2024 1 commit
-
-
Michael Yang authored
-