"vscode:/vscode.git/clone" did not exist on "070c45bbfa26d6e6c59dd24e5133082c1416d607"
- 14 May, 2025 2 commits
-
-
Bruce MacDonald authored
-
Michael Yang authored
-
- 06 May, 2025 1 commit
-
-
Daniel Hiltgen authored
* Move quantization logic to GGML via new backend

  This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

  This is no longer needed now that quantization is implemented in Go+GGML code directly.
-
- 25 Apr, 2025 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
- 03 Apr, 2025 1 commit
-
-
Bruce MacDonald authored
Mistral is a popular research lab making open source models. This updates the forward pass of llama architecture models to support both llama models and mistral models by accounting for additional metadata present in mistral models, and finding the correct dimensions for the output projection.
-
- 18 Mar, 2025 1 commit
-
-
Bruce MacDonald authored
When a model's architecture cannot be converted, return the name of the unsupported arch in the error message.
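The behavior described in this commit message can be sketched roughly as follows. This is an illustration only: `convertArch` and the `supported` set are hypothetical names invented for the example, not taken from the repository, and the exact wording of the real error message is not shown in this log.

```go
package main

import "fmt"

// supported is a hypothetical, illustrative set of architectures that the
// converter knows how to handle. The real list is not part of this log.
var supported = map[string]bool{
	"LlamaForCausalLM": true,
	"Qwen2ForCausalLM": true,
}

// convertArch is a hypothetical stand-in for the converter's entry point.
// When the architecture is not supported, the returned error names it so
// the user can see which architecture was rejected.
func convertArch(arch string) error {
	if !supported[arch] {
		return fmt.Errorf("unsupported architecture %q", arch)
	}
	return nil
}

func main() {
	if err := convertArch("FooForCausalLM"); err != nil {
		fmt.Println(err)
	}
}
```

Including the offending name in the error saves the user from inspecting the model's configuration to find out what was rejected.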
-
- 11 Mar, 2025 4 commits
-
-
jmorganca authored
-
Patrick Devine authored
-
Michael Yang authored
-
Patrick Devine authored
-
- 14 Feb, 2025 1 commit
-
-
Michael Yang authored
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
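The way the three interfaces relate can be sketched with a toy example. This is a heavily simplified illustration, not the definitions in `model/model.go` or `ml/backend.go`: the method sets, the `vec`, `memBackend`, and `biasModel` types, and the `"bias"` tensor name are all assumptions made up for the sketch.

```go
package main

import "fmt"

// Tensor is a simplified stand-in for ml.Tensor: a tensor plus operations.
type Tensor interface {
	Shape() []int
	Add(other Tensor) Tensor
}

// Backend is a simplified stand-in for ml.Backend: it holds loaded tensors
// and lets models look them up by name.
type Backend interface {
	Get(name string) Tensor
}

// Model is a simplified stand-in for model.Model: an architecture that
// implements its forward pass in Forward.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// vec is a toy 1-D tensor backed by a Go slice (hypothetical).
type vec []float32

func (v vec) Shape() []int { return []int{len(v)} }

// Add returns the element-wise sum. The toy version only accepts another
// vec of the same length and panics otherwise.
func (v vec) Add(other Tensor) Tensor {
	o, ok := other.(vec)
	if !ok || len(o) != len(v) {
		panic("vec.Add: operand must be a vec of the same length")
	}
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] + o[i]
	}
	return out
}

// memBackend is a toy backend that keeps tensors in a map (hypothetical).
type memBackend map[string]Tensor

func (b memBackend) Get(name string) Tensor { return b[name] }

// biasModel is a toy model whose forward pass adds a loaded "bias" tensor.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, fmt.Errorf("tensor %q not loaded", "bias")
	}
	return input.Add(bias), nil
}

func main() {
	backend := memBackend{"bias": vec{1, 1, 1}}
	var m Model = biasModel{}
	out, err := m.Forward(backend, vec{1, 2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out, out.Shape())
}
```

The point of the split is that a model only talks to `Backend` and `Tensor`, so the same `Forward` code could in principle run on a different tensor library without change.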
-
- 16 Jan, 2025 1 commit
-
-
Josh authored
Co-authored-by: Patrick Devine <patrick@infrahq.com>
-
- 14 Jan, 2025 1 commit
-
-
Bruce MacDonald authored
Add native support for converting Qwen2 family models (including Qwen2.5) from safetensors to gguf format so they can be run.
-
- 10 Sep, 2024 1 commit
-
-
Patrick Devine authored
-
- 23 Aug, 2024 1 commit
-
-
Patrick Devine authored
-
- 21 Aug, 2024 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
- 12 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 31 Jul, 2024 3 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 04 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 20 May, 2024 5 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
Patrick Devine authored
-
Patrick Devine authored
-
- 06 May, 2024 1 commit
-
-
Michael Yang authored
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
-
- 24 Apr, 2024 1 commit
-
-
Patrick Devine authored
-
- 15 Apr, 2024 1 commit
-
-
Patrick Devine authored
-
- 06 Apr, 2024 1 commit
-
-
Michael Yang authored
-
- 01 Apr, 2024 1 commit
-
-
Patrick Devine authored
-
- 29 Mar, 2024 1 commit
-
-
Patrick Devine authored
Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 26 Mar, 2024 1 commit
-
-
Patrick Devine authored
-
- 11 Mar, 2024 1 commit
-
-
Michael Yang authored
-
- 08 Mar, 2024 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
- 07 Mar, 2024 1 commit
-
-
Patrick Devine authored
-