- 14 Feb, 2025 1 commit
-
-
Michael Yang authored
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

  - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
  - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
  - `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

  - non-greedy sampling (#8410)
  - integration with Ollama and KV caching (#8301)
  - more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
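The relationship between the three interfaces named in this commit message can be illustrated with a small sketch. Everything below is hypothetical and heavily simplified: the method sets, the `sliceTensor`, `mapBackend`, and `biasModel` types, and the `"bias"` tensor name are assumptions made for illustration, not the actual definitions in `model/model.go` or `ml/backend.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// Tensor is a hypothetical, simplified stand-in for ml.Tensor.
type Tensor interface {
	Data() []float32
	Add(other Tensor) (Tensor, error)
}

// Backend is a hypothetical stand-in for ml.Backend: it owns the loaded
// weights and hands them out by name.
type Backend interface {
	Get(name string) (Tensor, bool)
}

// Model is a hypothetical stand-in for model.Model.
type Model interface {
	Forward(input Tensor) (Tensor, error)
}

// sliceTensor is a toy tensor backed by a plain Go slice.
type sliceTensor []float32

func (t sliceTensor) Data() []float32 { return t }

func (t sliceTensor) Add(other Tensor) (Tensor, error) {
	o := other.Data()
	if len(o) != len(t) {
		return nil, errors.New("shape mismatch")
	}
	out := make(sliceTensor, len(t))
	for i := range t {
		out[i] = t[i] + o[i]
	}
	return out, nil
}

// mapBackend is a toy backend that keeps "loaded" tensors in a map.
type mapBackend map[string]Tensor

func (b mapBackend) Get(name string) (Tensor, bool) {
	t, ok := b[name]
	return t, ok
}

// biasModel is a toy architecture whose forward pass adds one loaded tensor.
type biasModel struct{ backend Backend }

func (m biasModel) Forward(input Tensor) (Tensor, error) {
	bias, ok := m.backend.Get("bias")
	if !ok {
		return nil, errors.New(`tensor "bias" not loaded`)
	}
	return input.Add(bias)
}

func main() {
	var m Model = biasModel{backend: mapBackend{"bias": sliceTensor{1, 1}}}
	out, err := m.Forward(sliceTensor{2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Data())
}
```

The point of the split is that the model only talks to `Backend` and `Tensor`, so a different tensor library could in principle be swapped in without touching the architecture code.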
-
- 13 Feb, 2025 4 commits
-
-
Bùi Đức Nhật authored
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
Anuraag (Rag) Agrawal authored
-
Jeffrey Morgan authored
-
- 12 Feb, 2025 3 commits
-
-
Clinton authored
-
bloominstrong authored
Remove the channel tag from the URL so that it always goes to the current stable channel.
-
Hugues Chocart authored
-
- 11 Feb, 2025 2 commits
-
-
Michael Yang authored
* wrap ggml_backend_load_best in try/catch
* ignore non-ollama paths
-
Hugues Chocart authored
-
- 10 Feb, 2025 2 commits
-
-
Jeffrey Morgan authored
-
Hugues Chocart authored
-
- 08 Feb, 2025 4 commits
-
-
Michael Yang authored
Ollama requires vcruntime140_1.dll, which isn't found on the 2019 runner. Previously the job used the windows runner (2019) but explicitly installed 2022 to build the app. Since the sign job doesn't actually build anything, it can use the windows-2022 runner instead.
-
Qusai Ismael authored
-
DravenK authored
-
Jeffrey Morgan authored
-
- 07 Feb, 2025 6 commits
-
-
Guddu Kumar authored
-
Azis Alvriyanto authored
-
Michael Yang authored
-
Leisure Linux authored
-
annilq authored
-
CosmicEventHorizon authored
-
- 06 Feb, 2025 11 commits
-
-
Michael Yang authored
-
oslook authored
-
Michael Yang authored
-
Abhinav Pant authored
-
Michael Yang authored
-
Azis Alvriyanto authored
-
Michael Yang authored
The find returns intermediate directories, which pulls in the parent directories. It also omits files under lib/ollama. Switch back to globbing.
-
zyphixor authored
-
Diego Pereira authored
Shield the code processing the embedding result from subsequent calls that may overwrite the same buffer to process a second input when retrieving model embeddings.
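The hazard described in this commit message, a result buffer being overwritten by a later call, can be sketched as follows. This is an illustration only: the `runner` type and its methods are hypothetical, and copying the result is just one way to get the protection; the actual commit may use a different mechanism, such as locking.

```go
package main

import "fmt"

// runner is a hypothetical component that reuses one output buffer
// across calls.
type runner struct{ buf []float32 }

// embed writes a fake result for input into the shared buffer and
// returns a slice that aliases that buffer.
func (r *runner) embed(input string) []float32 {
	for i := range r.buf {
		r.buf[i] = float32(len(input) + i)
	}
	return r.buf
}

// embedCopy returns a private copy of the result, so later calls to
// embed cannot overwrite it.
func (r *runner) embedCopy(input string) []float32 {
	out := make([]float32, len(r.buf))
	copy(out, r.embed(input))
	return out
}

func main() {
	r := &runner{buf: make([]float32, 2)}
	aliased := r.embed("a")
	safe := r.embedCopy("a")
	r.embed("second") // overwrites the shared buffer
	fmt.Println("aliased:", aliased, "copied:", safe)
}
```

After the second call, the aliased slice reflects the second input while the copied slice still holds the first result.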
-
Michael Yang authored
* chore: update gitattributes
* chore: add build info source
-
Daniel Lok authored
-
- 05 Feb, 2025 7 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Azis Alvriyanto authored
Removed redundant checks and streamlined the switch-case structure. Added test cases for both HumanBytes and HumanBytes2 to cover a wide range of scenarios.
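For readers unfamiliar with this kind of helper, a byte-formatting function built around a single switch might look like the sketch below. The `humanBytes` function here is a hypothetical illustration, not the repository's `HumanBytes` or `HumanBytes2`; the decimal unit thresholds and the one-decimal output format are assumptions.

```go
package main

import "fmt"

// humanBytes is a hypothetical formatter using decimal units
// (1 KB = 1000 B). It is not the repository's implementation.
func humanBytes(b int64) string {
	const (
		kb = 1000
		mb = kb * 1000
		gb = mb * 1000
	)
	switch {
	case b >= gb:
		return fmt.Sprintf("%.1f GB", float64(b)/gb)
	case b >= mb:
		return fmt.Sprintf("%.1f MB", float64(b)/mb)
	case b >= kb:
		return fmt.Sprintf("%.1f KB", float64(b)/kb)
	default:
		return fmt.Sprintf("%d B", b)
	}
}

func main() {
	// A small table of inputs, in the spirit of table-driven tests.
	for _, n := range []int64{0, 999, 1500, 2_000_000, 3_500_000_000} {
		fmt.Println(n, "->", humanBytes(n))
	}
}
```

Ordering the cases from largest unit to smallest means each value matches exactly one branch, with no extra range checks.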
-
Jeffrey Morgan authored
-
Yashwanth A authored
In some cases, downloads slow down due to disk I/O or other factors, causing the download to restart a part. This makes the reported percent completion appear to "reverse". Increasing the timeout to 30s should make this happen less frequently.
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-