- 08 May, 2025 1 commit
-
-
Michael Yang authored
-
- 07 May, 2025 1 commit
-
-
Jeffrey Morgan authored
-
- 06 May, 2025 1 commit
-
-
Devon Rifkin authored
Fixes: #5483
-
- 30 Apr, 2025 1 commit
-
-
Devon Rifkin authored
* strip out thinking tags in message history for qwen3 & r1

  This is in advance of "proper" support where we'll make reasoning configurable and we'll parse out thinking/reasoning tags and provide them to the caller. These models expect there to be no thinking tags in the message history, so this should improve quality.

* parse model names instead of hacky prefix check
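
A minimal sketch of the tag stripping described in this commit, not the code from the commit itself. It assumes the reasoning is wrapped in `<think>...</think>`; the name `stripThinking` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkBlock matches one <think>...</think> span (assumed tag name),
// including newlines inside it and any whitespace that follows.
var thinkBlock = regexp.MustCompile(`(?s)<think>.*?</think>\s*`)

// stripThinking removes every thinking span from a message so that only
// the visible answer is replayed to the model as history.
func stripThinking(content string) string {
	return strings.TrimSpace(thinkBlock.ReplaceAllString(content, ""))
}

func main() {
	msg := "<think>\nThe user wants a greeting.\n</think>\n\nHello there!"
	fmt.Printf("%q\n", stripThinking(msg)) // "Hello there!"
}
```

The non-greedy `.*?` keeps two separate thinking spans in one message from being merged into a single match.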
-
- 03 Mar, 2025 1 commit
-
-
Blake Mizerany authored
Previously, using a Registry required a DiskCache to be passed in for use in various methods. This was a bit cumbersome, as the DiskCache is required for most operations, and the DefaultCache is used in most of those cases. This change makes the DiskCache an optional field on the Registry struct. This also changes DefaultCache to initialize on first use. This is to not burden clients with the cost of creating a new cache per use, or having to hold onto a cache for the lifetime of the Registry. Also, slip in some minor docs updates for Trace.
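
One way to picture the optional cache field and the first-use initialization is the sketch below. The type shapes, the `cache` helper, and the directory are assumptions made for illustration; the real `DiskCache` and `DefaultCache` may differ (for example, by returning an error).

```go
package main

import (
	"fmt"
	"sync"
)

// DiskCache is a stand-in for the real cache type; its field is hypothetical.
type DiskCache struct{ dir string }

var (
	defaultOnce  sync.Once
	defaultCache *DiskCache
)

// DefaultCache builds the shared cache on the first call and returns the
// same instance on every later call.
func DefaultCache() *DiskCache {
	defaultOnce.Do(func() {
		defaultCache = &DiskCache{dir: "/tmp/models"} // hypothetical location
	})
	return defaultCache
}

// Registry treats Cache as optional: nil means "use the default cache".
type Registry struct {
	Cache *DiskCache
}

// cache returns the configured cache, falling back to the lazy default.
func (r *Registry) cache() *DiskCache {
	if r.Cache != nil {
		return r.Cache
	}
	return DefaultCache()
}

func main() {
	var r Registry                      // the zero value is usable
	fmt.Println(r.cache() == r.cache()) // true: one shared instance
}
```

With this shape a caller that does not care about caching never constructs a cache, and callers that do can still inject their own.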
-
- 27 Feb, 2025 1 commit
-
-
Blake Mizerany authored
This commit introduces a new API implementation for handling interactions with the registry and the local model cache. The new API is located in server/internal/registry. The package name is "registry" and should be considered temporary; it is hidden and not bleeding outside of the server package. As the commits roll in, we'll start consuming more of the API and then let reverse osmosis take effect, at which point it will surface closer to the root level packages as much as needed.
-
- 14 Feb, 2025 1 commit
-
-
Michael Yang authored
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
-
- 01 Jan, 2025 1 commit
-
-
Patrick Devine authored
Changes `POST /api/create` to accept JSON instead of a Modelfile. This is a breaking change.
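
As a rough illustration of what a JSON body for this endpoint can look like, the sketch below encodes a small request. The struct is a local stand-in rather than the server's own type, and the field names (`model`, `from`, `system`) and model names are assumptions that should be checked against the API documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createRequest is a local stand-in for the request body; the field set
// shown here is an assumption, not the full API.
type createRequest struct {
	Model  string `json:"model"`
	From   string `json:"from,omitempty"`
	System string `json:"system,omitempty"`
}

// createBody encodes the fields that a Modelfile would previously have
// carried as FROM and SYSTEM lines.
func createBody(model, from, system string) (string, error) {
	b, err := json.Marshal(createRequest{Model: model, From: from, System: system})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := createBody("mario", "llama3.2", "You are Mario.")
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"model":"mario","from":"llama3.2","system":"You are Mario."}
}
```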
-
- 11 Dec, 2024 1 commit
-
-
Blake Mizerany authored
Fixes #7944
-
- 19 Nov, 2024 1 commit
-
-
Blake Mizerany authored
This change allows mixed-case model names to be pushed, pulled, copied, and created. This was previously disallowed because the Ollama registry was backed by a Docker registry that enforced a naming convention that disallowed mixed-case names; that is no longer the case. This does not break existing, intended behaviors. Also, make TestCase test a story of creating, updating, pulling, and copying a model with case variations, ensuring the model's manifest is updated correctly and not duplicated across different files with different case variations.
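
The "updated, not duplicated" property can be sketched with an in-memory store keyed by a case-folded name. This is an illustration only: the commit works with manifest files on disk, and the `store` type and its methods here are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// entry records a manifest digest together with the name as last written.
type entry struct {
	name   string // case preserved for display
	digest string
}

// store keeps at most one manifest per name, compared case-insensitively.
type store struct {
	manifests map[string]entry // key: lowercased name
}

func newStore() *store { return &store{manifests: make(map[string]entry)} }

// put creates or updates the manifest for name, ignoring case differences.
func (s *store) put(name, digest string) {
	s.manifests[strings.ToLower(name)] = entry{name: name, digest: digest}
}

// get looks a manifest up under any case variation of its name.
func (s *store) get(name string) (entry, bool) {
	e, ok := s.manifests[strings.ToLower(name)]
	return e, ok
}

func main() {
	s := newStore()
	s.put("Example/Model:latest", "sha256:aaa")
	s.put("example/model:LATEST", "sha256:bbb") // updates, does not duplicate
	e, _ := s.get("EXAMPLE/MODEL:latest")
	fmt.Println(len(s.manifests), e.digest) // 1 sha256:bbb
}
```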
-
- 18 Oct, 2024 1 commit
-
-
Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
-
- 01 Oct, 2024 1 commit
-
-
Alex Mavrogiannis authored
-
- 27 Aug, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 13 Aug, 2024 1 commit
-
-
royjhan authored
* load on empty input
* no load on invalid input
-
- 02 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 22 Jul, 2024 1 commit
-
-
Michael Yang authored
-
- 16 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
* server: return empty slice on empty `/api/embed` request
* fix tests
-
- 15 Jul, 2024 1 commit
-
-
royjhan authored
* Initial Batch Embedding
* Revert "Initial Batch Embedding"

  This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure"

  This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
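
Several of these steps concern normalizing embeddings in Go with `float32` values. The commit log does not say which norm is used; the sketch below assumes L2 (unit-length) normalization, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// normalize returns a copy of vec scaled to unit L2 length.
// A zero vector has no direction, so it is returned unchanged.
func normalize(vec []float32) []float32 {
	var sum float64
	for _, v := range vec {
		sum += float64(v) * float64(v) // accumulate in float64 for accuracy
	}
	if sum == 0 {
		return vec
	}
	norm := float32(math.Sqrt(sum))
	out := make([]float32, len(vec))
	for i, v := range vec {
		out[i] = v / norm
	}
	return out
}

func main() {
	fmt.Println(normalize([]float32{3, 4})) // [0.6 0.8]
}
```

With unit-length vectors, cosine similarity between two embeddings reduces to a plain dot product.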
-
- 02 Jul, 2024 1 commit
-
-
royjhan authored
* OpenAI v1 models
* Refactor Writers
* Add Test

  Co-Authored-By: Attila Kerekes
* Credit Co-Author

  Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name

  Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Add Middleware for Chat and List
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* OpenAI: /v1/models/{model} compatibility (#5028)
* Retrieve Model
* OpenAI Delete Model
* Retrieve Middleware
* Remove Delete from Branch
* Update Test
* Middleware Test File
* Function name
* Cleanup
* Test Update
* Test Update

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
- 19 Jun, 2024 1 commit
-
-
royjhan authored
* API Show Extended
* Initial Draft of Information

  Co-Authored-By: Patrick Devine <pdevine@sonic.net>
* Clean Up
* Descriptive arg error messages and other fixes
* Second Draft of Show with Projectors Included
* Remove Chat Template
* Touches
* Prevent wrapping from files
* Verbose functionality
* Docs
* Address Feedback
* Lint
* Resolve Conflicts
* Function Name
* Tests for api/show model info
* Show Test File
* Add Projector Test
* Clean routes
* Projector Check
* Move Show Test
* Touches
* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
-
- 13 Jun, 2024 1 commit
-
-
Patrick Devine authored
-
- 07 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 06 Jun, 2024 1 commit
-
-
royjhan authored
* Remove false time fields
* Struct Separation for List and Process
* Remove Marshaler
-
- 04 Jun, 2024 3 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 24 May, 2024 1 commit
-
-
Patrick Devine authored
-
- 20 May, 2024 1 commit
-
-
Patrick Devine authored
-
- 14 May, 2024 2 commits
-
-
Michael Yang authored
-
Ryo Machida authored
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
* Update server/routes.go

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
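
The `null` versus `[]` difference comes from how Go's `encoding/json` encodes a nil slice compared with an empty, non-nil one. The sketch below shows the behavior with a local stand-in type; `listResponse` and `encode` are hypothetical names, not the server's own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listResponse is a local stand-in for the response body of a list endpoint.
type listResponse struct {
	Models []string `json:"models"`
}

// encode marshals a response holding the given models.
func encode(models []string) string {
	b, err := json.Marshal(listResponse{Models: models})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var none []string               // nil slice: never initialized
	fmt.Println(encode(none))       // {"models":null}
	fmt.Println(encode([]string{})) // {"models":[]}
}
```

Initializing the slice before filling it, for example with `[]string{}` or `make([]string, 0)`, is enough to make an empty result encode as `[]`, which clients that iterate over the field without a null check generally expect.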
-
- 06 May, 2024 1 commit
-
-
Michael Yang authored
-
- 01 May, 2024 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
- 08 Apr, 2024 1 commit
-
-
Michael Yang authored
-
- 01 Apr, 2024 1 commit
-
-
Daniel Hiltgen authored
This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process, shut it down when idle, and gracefully restart it if it has problems. This also serves as a first step toward running multiple copies to support multiple models concurrently.
-
- 29 Mar, 2024 1 commit
-
-
Patrick Devine authored
Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 26 Mar, 2024 1 commit
-
-
Patrick Devine authored
-
- 09 Mar, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 12 Feb, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 01 Feb, 2024 1 commit
-
-
Michael Yang authored
-