- 03 Apr, 2025 1 commit
-
-
Bruce MacDonald authored
Mistral is a popular research lab making open source models. This updates the forward pass of llama-architecture models to support both Llama and Mistral models by accounting for the additional metadata present in Mistral models and by finding the correct dimensions for the output projection.
-
- 02 Apr, 2025 1 commit
-
-
Bruce MacDonald authored
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
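To illustrate the point in this commit message, here is a minimal sketch showing that `any` and `interface{}` are the same type. The function names `describeOld` and `describeNew` are hypothetical and exist only for this example; they are not taken from the repository.

```go
package main

import "fmt"

// describeOld (hypothetical) uses the pre-1.18 spelling of the empty interface.
func describeOld(v interface{}) string { return fmt.Sprintf("%T", v) }

// describeNew (hypothetical) uses the any alias; its parameter type is identical.
func describeNew(v any) string { return fmt.Sprintf("%T", v) }

func main() {
	// Because any is an alias for interface{}, a func(interface{}) string value
	// can be assigned to a func(any) string variable with no conversion.
	var f func(any) string = describeOld
	fmt.Println(f(42), describeNew("hi"))
}
```

Since the two spellings denote one type, replacing `interface{}` with `any` is a purely cosmetic change and does not alter behavior.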
-
- 18 Mar, 2025 1 commit
-
-
Bruce MacDonald authored
When a model's architecture cannot be converted, return the name of the unsupported architecture in the error message.
-
- 13 Mar, 2025 1 commit
-
-
Patrick Devine authored
-
- 11 Mar, 2025 12 commits
-
-
jmorganca authored
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
-
Patrick Devine authored
-
- 14 Feb, 2025 1 commit
-
-
Michael Yang authored
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
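The relationship between the three interfaces described in this commit can be sketched with a toy example. The method sets below are assumptions chosen for brevity, not the actual definitions in `model/model.go` or `ml/backend.go`; `vec`, `memBackend`, and `biasModel` are hypothetical types invented for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Tensor is a toy stand-in for ml.Tensor (method set assumed).
type Tensor interface {
	Floats() []float32
	Add(other Tensor) (Tensor, error)
}

// Backend is a toy stand-in for ml.Backend: it owns loaded tensors
// and lets models look them up by name (method set assumed).
type Backend interface {
	Get(name string) (Tensor, bool)
}

// Model is a toy stand-in for model.Model: Forward runs one forward pass.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// vec is a hypothetical 1-D tensor kept in ordinary memory.
type vec []float32

func (v vec) Floats() []float32 { return v }

func (v vec) Add(other Tensor) (Tensor, error) {
	o := other.Floats()
	if len(o) != len(v) {
		return nil, errors.New("shape mismatch")
	}
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] + o[i]
	}
	return out, nil
}

// memBackend is a hypothetical backend backed by a map.
type memBackend map[string]Tensor

func (m memBackend) Get(name string) (Tensor, bool) {
	t, ok := m[name]
	return t, ok
}

// biasModel is a hypothetical one-layer model: output = input + bias.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias, ok := b.Get("bias")
	if !ok {
		return nil, errors.New("missing tensor: bias")
	}
	return input.Add(bias)
}

func main() {
	var m Model = biasModel{}
	out, err := m.Forward(memBackend{"bias": vec{1, 1}}, vec{2, 3})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.Floats())
}
```

The point of the split is that the model only talks to the `Backend` and `Tensor` interfaces, so the same `Forward` logic could in principle run on a different tensor library without changes.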
-
- 16 Jan, 2025 1 commit
-
-
Josh authored
Co-authored-by: Patrick Devine <patrick@infrahq.com>
-
- 14 Jan, 2025 1 commit
-
-
Bruce MacDonald authored
Add native support for converting Qwen2 family models (including Qwen2.5) from safetensors to GGUF format so that they can be run.
-
- 10 Dec, 2024 1 commit
-
-
Stefan Weil authored
-
- 04 Dec, 2024 1 commit
-
-
Michael Yang authored
-
- 18 Oct, 2024 1 commit
-
-
Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
-
- 10 Sep, 2024 1 commit
-
-
Patrick Devine authored
-
- 06 Sep, 2024 1 commit
-
-
Patrick Devine authored
-
- 28 Aug, 2024 1 commit
-
-
Patrick Devine authored
-
- 27 Aug, 2024 3 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 23 Aug, 2024 1 commit
-
-
Patrick Devine authored
-
- 21 Aug, 2024 3 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 12 Aug, 2024 2 commits
-
-
Bruce MacDonald authored
-
Michael Yang authored
-
- 02 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 31 Jul, 2024 5 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
Michael Yang authored
-
Michael Yang authored
-