- 18 Dec, 2025 3 commits
Jeffrey Morgan authored
Grace authored
- 17 Dec, 2025 1 commit
Parth Sareen authored
- 12 Dec, 2025 1 commit
Alexander Gusak authored
- 04 Dec, 2025 1 commit
Jeffrey Morgan authored
- 01 Dec, 2025 1 commit
Bruce MacDonald authored
While processing the response stream during a chat or generation, any error that occurs is parsed and returned to the user. The issue with the existing code is that it assumed the error response would be valid JSON, which is not a safe assumption and caused cryptic messages to be displayed when parsing failed: `invalid character 'i' looking for beginning of value`. This change updates the stream function to return the raw error string if it can't be parsed as JSON, which should help with debugging by making sure the actual error reaches the user.
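A minimal sketch of the fallback described above; the surrounding stream loop and the `statusError` shape are assumptions for illustration, not the exact client code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusError mirrors the kind of JSON error body the server normally
// sends; the field name here is illustrative.
type statusError struct {
	Error string `json:"error"`
}

// parseStreamError tries to decode an error body as JSON and falls back
// to the raw bytes when decoding fails, so the user sees the real error
// instead of "invalid character ... looking for beginning of value".
func parseStreamError(body []byte) error {
	var se statusError
	if err := json.Unmarshal(body, &se); err == nil && se.Error != "" {
		return fmt.Errorf("%s", se.Error)
	}
	// Not valid JSON (e.g. a plain-text proxy error): surface it as-is.
	return fmt.Errorf("%s", string(body))
}

func main() {
	fmt.Println(parseStreamError([]byte(`{"error":"model not found"}`)))
	fmt.Println(parseStreamError([]byte("internal server error")))
}
```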
- 13 Nov, 2025 1 commit
Parth Sareen authored
- 11 Nov, 2025 1 commit
Baptiste Jamin authored
Adds logprobs support to Ollama's API, including Ollama's OpenAI-compatible API. When the new `logprobs` boolean parameter is specified, Ollama returns the log probabilities for each generated token. An integer `top_logprobs` parameter (up to 20) can also be specified; when set, the API additionally returns that many of the most likely tokens at each token position.

Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
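A hedged sketch of how a client might request these fields against a local server; the parameter names come from the description above, while the model name and response handling are simplified assumptions:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Request log probabilities along with the top 5 alternatives per
	// token position, per the new `logprobs`/`top_logprobs` parameters.
	body, _ := json.Marshal(map[string]any{
		"model":        "llama3.2", // assumed model name for illustration
		"prompt":       "Why is the sky blue?",
		"stream":       false,
		"logprobs":     true,
		"top_logprobs": 5, // integer, capped at 20
	})
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```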
- 06 Nov, 2025 1 commit
Jeffrey Morgan authored
- 05 Nov, 2025 1 commit
Grace authored
routes/types: add tool call id

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
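A sketch of what the added field might look like on the API types; the struct and field names are assumptions for illustration, not the actual `routes/types` change:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall associates a model-emitted function call with an identifier,
// so a later tool-result message can reference the call it answers.
// Field names here are illustrative.
type ToolCall struct {
	ID       string           `json:"id,omitempty"` // the newly added tool call id
	Function ToolCallFunction `json:"function"`
}

type ToolCallFunction struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

func main() {
	b, _ := json.Marshal(ToolCall{
		ID: "call_0",
		Function: ToolCallFunction{
			Name:      "get_weather",
			Arguments: map[string]any{"city": "Toronto"},
		},
	})
	fmt.Println(string(b))
}
```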
- 15 Oct, 2025 1 commit
Parth Sareen authored
- 13 Oct, 2025 1 commit
Grace authored
* working (other than tool calls being in the incorrect order) for tool calls and tools
* tests work, other than image tags (tests do not go through the server) and tools (not in the correct order, but contents are the same)
* testing for the qwen3vl parser - tool parser is working
* made changes to the JSON tool parser: it wraps the ToolCallFunction with a ToolCall object
* working parser for thinking models - assumes a state of thinking, emits unambiguous content while thinking, does not emit tool calls while thinking (a sketch follows below)
* changed the parser to start with collecting content
* thinking prefill
* add hasThinkingSupport parameter to parser
* qwen3-vl -> qwen3-vl-instruct for renderer/parser
* add hasThinkingSupport=false to QwenVLParser

Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
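A minimal sketch of the kind of state machine such a parser implies, assuming the model opens in a thinking state and switches to content at a closing tag; the tag string and type names are assumptions, not the qwen3vl implementation:

```go
package main

import (
	"fmt"
	"strings"
)

const endThink = "</think>" // assumed closing tag for illustration

// thinkParser starts in a "thinking" state (matching the description
// above) and emits everything after the closing tag as content.
type thinkParser struct {
	thinking bool
	buf      strings.Builder
}

// feed consumes a streamed chunk and returns (thinking, content) text.
func (p *thinkParser) feed(chunk string) (thinking, content string) {
	if !p.thinking {
		return "", chunk
	}
	p.buf.WriteString(chunk)
	s := p.buf.String()
	if i := strings.Index(s, endThink); i >= 0 {
		p.thinking = false
		p.buf.Reset()
		return s[:i], s[i+len(endThink):]
	}
	// Emit only unambiguous thinking text: hold back a trailing run that
	// could be the start of the closing tag split across chunks.
	keep := 0
	for k := min(len(s), len(endThink)-1); k > 0; k-- {
		if strings.HasSuffix(s, endThink[:k]) {
			keep = k
			break
		}
	}
	p.buf.Reset()
	p.buf.WriteString(s[len(s)-keep:])
	return s[:len(s)-keep], ""
}

func main() {
	p := &thinkParser{thinking: true}
	for _, chunk := range []string{"let me reason", "...</think>the answer is 4"} {
		t, c := p.feed(chunk)
		fmt.Printf("thinking=%q content=%q\n", t, c)
	}
}
```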
- 11 Oct, 2025 1 commit
Jeffrey Morgan authored
- 09 Oct, 2025 2 commits
Jeffrey Morgan authored
This reverts commit 6a62b894.
Jeffrey Morgan authored
- 08 Oct, 2025 1 commit
Patrick Devine authored
- 23 Sep, 2025 1 commit
Patrick Devine authored
auth: fix problems with the ollama keypairs

This change adds several fixes, including:
- reading in the pubkey files correctly (see the sketch after this list)
- fixing the push unit test to create a keypair file in a temp directory
- not returning 500 errors for normal status errors
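As a hedged illustration of what reading the pubkey file involves, assuming Ollama's usual `~/.ollama/id_ed25519.pub` OpenSSH-format location; this is a sketch, not the patched code:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/crypto/ssh"
)

// readPubKey loads an OpenSSH-format public key (the format used by
// files like ~/.ollama/id_ed25519.pub) and parses it.
func readPubKey() (ssh.PublicKey, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	data, err := os.ReadFile(filepath.Join(home, ".ollama", "id_ed25519.pub"))
	if err != nil {
		return nil, err
	}
	// ParseAuthorizedKey handles the "ssh-ed25519 AAAA... comment" line
	// format, including trailing whitespace and comments.
	key, _, _, _, err := ssh.ParseAuthorizedKey(data)
	return key, err
}

func main() {
	key, err := readPubKey()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("key type:", key.Type())
}
```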
- 17 Sep, 2025 1 commit
Patrick Devine authored
- 15 Sep, 2025 1 commit
Devon Rifkin authored
The format qwen3-coder uses is relatively unique, both in rendering and in parsing. To implement parsing, I wrote a custom parser in a similar style to harmony. For the rendering, I found that the logic would be much more difficult to follow in a template, so I introduced the concept of a built-in renderer that uses Go code, rather than a template, to generate prompts.

I set us up for future built-in parsers and renderers by making it so they can be specified in a Modelfile like so:

```
RENDERER "qwen3-coder"
PARSER "qwen3-coder"
```

These need to be provided explicitly because the architecture alone is not enough to understand what format the model expects to receive and what format we expect it to output (e.g., qwen3-coder is `qwen3moe`, which includes other qwen3-family models as well).

I haven't converted harmony to be one of these "built-ins" yet, since some of it is in flux with the changes @ParthSareen has been making to move harmony to the runner. It is likely that many other built-ins will need to move to the runner as well, but I'm able to slightly defer that decision since qwen3-coder doesn't have thinking (and therefore doesn't need to be in the runner to make structured outputs work). I expect to unify harmony with this approach very soon.

Whether a particular model supports tools or thinking was previously inferred from templates, but without a template we now also use the parser itself to declare what it supports. If we have future models that re-use the same parsing format but have different capabilities, we'll want to parameterize them and give them different names to be specified as a `PARSER`.

Misc changes:
- I worked on the renderer by diffing outputs from the reference implementation and ours. To make it easier to do this, I extended <https://github.com/ollama/ollama/pull/11875> to also support returning the prompt via the openai compat layer
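The named `RENDERER`/`PARSER` mechanism above implies a registry keyed by name rather than by architecture. A minimal sketch, with assumed type names and signatures rather than Ollama's actual internals:

```go
package builtin

import "fmt"

// Renderer turns a list of messages into a prompt string; Parser is the
// inverse direction. Both are simplified placeholder signatures.
type (
	Renderer func(messages []string) (prompt string, err error)
	Parser   func(output string) (content string, err error)
)

var (
	renderers = map[string]Renderer{}
	parsers   = map[string]Parser{}
)

// Register wires up a named built-in, so a Modelfile line like
// `RENDERER "qwen3-coder"` can be resolved by name rather than by
// model architecture (which is ambiguous, e.g. qwen3moe).
func Register(name string, r Renderer, p Parser) {
	renderers[name] = r
	parsers[name] = p
}

// RendererByName resolves a built-in renderer declared in a Modelfile.
func RendererByName(name string) (Renderer, error) {
	r, ok := renderers[name]
	if !ok {
		return nil, fmt.Errorf("unknown renderer %q", name)
	}
	return r, nil
}
```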
- 11 Sep, 2025 1 commit
Michael Yang authored
* feat: add field to truncate embeddings
* add openai embeddings for dimensions
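A hedged sketch of what dimension truncation typically involves (truncate, then re-normalize); whether Ollama re-normalizes is an assumption of this sketch, not something stated above:

```go
package main

import (
	"fmt"
	"math"
)

// truncateEmbedding keeps the first dims components and re-normalizes
// to unit length, the usual behavior behind an OpenAI-style
// `dimensions` parameter. (Re-normalization is assumed here.)
func truncateEmbedding(v []float64, dims int) []float64 {
	if dims <= 0 || dims >= len(v) {
		return v
	}
	out := append([]float64(nil), v[:dims]...)
	var norm float64
	for _, x := range out {
		norm += x * x
	}
	if norm = math.Sqrt(norm); norm > 0 {
		for i := range out {
			out[i] /= norm
		}
	}
	return out
}

func main() {
	fmt.Println(truncateEmbedding([]float64{0.6, 0.8, 0.1, 0.2}, 2))
}
```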
- 27 Aug, 2025 1 commit
Michael Yang authored
- 22 Aug, 2025 1 commit
Jeffrey Morgan authored
- 15 Aug, 2025 1 commit
Devon Rifkin authored
- 12 Aug, 2025 1 commit
Michael Yang authored
- 05 Aug, 2025 2 commits
Devon Rifkin authored
As far as we know, gpt-oss is the first model that meaningfully transforms tool function definitions in its template. We found that relatively common definitions that include `anyOf` were not working because the template assumed types were always defined via a `type` field. `anyOf` allows for fully recursive types, so I exposed a `toTypeScriptType()` function to handle this recursive logic in Go and keep the templates cleaner. The gpt-oss templates will need to be updated to use this. We should keep building out our function definition support to more fully cover the parts of JSON Schema that make sense for this use case, but in the meantime this will unblock some users (e.g., Zed's ollama integration with gpt-oss). Probably the most urgent gap is proper array support.
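A sketch of the recursive conversion such a helper implies; the subset of JSON Schema handled here is an assumption for illustration, not the actual `toTypeScriptType()` implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// schema models the small subset of JSON Schema this sketch handles.
type schema struct {
	Type  string
	AnyOf []schema
	Items *schema
}

// toTypeScriptType recursively renders a schema as a TypeScript type,
// so `anyOf` becomes a union instead of requiring a `type` field.
func toTypeScriptType(s schema) string {
	if len(s.AnyOf) > 0 {
		parts := make([]string, len(s.AnyOf))
		for i, sub := range s.AnyOf {
			parts[i] = toTypeScriptType(sub)
		}
		return strings.Join(parts, " | ")
	}
	switch s.Type {
	case "string":
		return "string"
	case "number", "integer":
		return "number"
	case "boolean":
		return "boolean"
	case "array":
		if s.Items != nil {
			return toTypeScriptType(*s.Items) + "[]"
		}
		return "any[]"
	default:
		return "any"
	}
}

func main() {
	s := schema{AnyOf: []schema{
		{Type: "string"},
		{Type: "array", Items: &schema{Type: "integer"}},
	}}
	fmt.Println(toTypeScriptType(s)) // string | number[]
}
```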
Michael Yang authored
* bf16
* tests
* gpt-oss
* enable gptoss for engine
* rough estimate
* convert to mxfp4
* handle safetensors U8
* clamp glu/linear
* update tokenizer
* MXFP4 support (a format sketch follows below)

  This implements the Open Compute Microscaling (MX) FP4 format as a tensor type, with backend implementations focusing on mulmat and mulmatid on CPU, CUDA, and Metal.
* Unit tests for MXFP4 support

  This exercises various operations and shapes on both CPU and GPU (if detected on the system).
* cuda graph
* unit test adjustments
* cuda: optimize memory access

  Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4.
* mac: fix crash on old macos versions

  cblas_sgemm is only supported on v13.3 and up, but bf16 is only supported on v14+, so we were falling back to ggml-blas and crashing on bf16 tensors. Checking for the function being null seems to be the simplest way to conditionally avoid registering the backend.
* server: Minimum context length for gptoss

  This model requires a minimum context length of 8192 to function effectively. Users can set higher values through all normal mechanisms, but lower values will be silently reset.
* ggml: Multiply by numParallel for gptoss sliding window

  When computing the graph size estimate, the context size is already multiplied by numParallel so estimates reflect that. However, since sliding window models use a smaller, fixed context size, they need to manually take numParallel into account.
* gpt-oss integration

  Includes the harmony parser and thinking levels, etc.
* fix sync
* fix tests
* fix lint

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
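A hedged sketch of MXFP4 decoding per the OCP Microscaling spec (32-element blocks, one shared E8M0 power-of-two scale, FP4 E2M1 elements); this illustrates the format only, not Ollama's kernel code, and the nibble packing order is an assumption:

```go
package main

import (
	"fmt"
	"math"
)

// fp4e2m1 maps the 8 non-negative FP4 E2M1 codes to their values; the
// high bit of each 4-bit code is the sign.
var fp4e2m1 = [8]float32{0, 0.5, 1, 1.5, 2, 3, 4, 6}

func decodeFP4(code byte) float32 {
	v := fp4e2m1[code&0x07]
	if code&0x08 != 0 {
		return -v
	}
	return v
}

// decodeMXFP4Block expands one MXFP4 block: a shared E8M0 scale byte
// (value 2^(scale-127)) and 16 bytes holding 32 packed 4-bit elements.
// Low-nibble-first ordering is an assumption for this sketch.
func decodeMXFP4Block(scale byte, packed [16]byte) [32]float32 {
	s := float32(math.Exp2(float64(int(scale) - 127)))
	var out [32]float32
	for i, b := range packed {
		out[2*i] = decodeFP4(b&0x0f) * s
		out[2*i+1] = decodeFP4(b>>4) * s
	}
	return out
}

func main() {
	var packed [16]byte
	packed[0] = 0x92 // low nibble 0x2 (+1.0), high nibble 0x9 (-0.5)
	vals := decodeMXFP4Block(127, packed) // scale byte 127 => 2^0 = 1
	fmt.Println(vals[0], vals[1])         // 1 -0.5
}
```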
- 16 Jul, 2025 1 commit
Bruce MacDonald authored
StatusError was unreachable: the client always checked for error messages in the response body first, and the server always includes error messages alongside HTTP error status codes.
- 08 Jul, 2025 1 commit
Daniel Hiltgen authored
* API: expose context size of loaded models
* CLI: add context UX

This adds a column in the ps output to show the model's context size.
- 07 Jul, 2025 1 commit
Parth Sareen authored
- 07 Jun, 2025 1 commit
Jeffrey Morgan authored
This reverts commit 09430011.
- 04 Jun, 2025 1 commit
JasonHonKL authored
- 29 May, 2025 1 commit
Devon Rifkin authored
- Both `/api/generate` and `/api/chat` now accept a `"think"` option that allows specifying whether thinking mode should be on or not (a request sketch follows after this list)
- Templates get passed this new option so, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting
- Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking support is also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session they can use the commands `/se...`
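A minimal sketch of calling `/api/chat` with the new option; the model name is an assumption, and the response handling is simplified:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Opt in to thinking via the new "think" option; leaving it unset
	// keeps the pre-existing behavior.
	body, _ := json.Marshal(map[string]any{
		"model": "qwen3", // assumed model name for illustration
		"messages": []map[string]string{
			{"role": "user", "content": "How many r's are in strawberry?"},
		},
		"think":  true,
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/chat",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // thinking and content arrive as separate fields
}
```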
- 27 May, 2025 1 commit
Patrick Devine authored
If OLLAMA_AUTH is set, sign each request with a timestamp and pass the signature in the token header.
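A hedged sketch of timestamp-based request signing in general form; the header name, message layout, and use of an ed25519 key are assumptions for illustration, not the exact OLLAMA_AUTH scheme:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// signRequest signs "METHOD PATH TIMESTAMP" and attaches the signature,
// so the server can verify both freshness and integrity. The message
// format and header name are assumptions for this sketch.
func signRequest(req *http.Request, key ed25519.PrivateKey) {
	ts := strconv.FormatInt(time.Now().Unix(), 10)
	msg := req.Method + " " + req.URL.Path + " " + ts
	sig := ed25519.Sign(key, []byte(msg))
	req.Header.Set("Authorization",
		ts+":"+base64.StdEncoding.EncodeToString(sig))
}

func main() {
	_, priv, _ := ed25519.GenerateKey(rand.Reader)
	req, _ := http.NewRequest("GET", "http://localhost:11434/api/tags", nil)
	signRequest(req, priv)
	fmt.Println(req.Header.Get("Authorization"))
}
```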
- 08 May, 2025 2 commits
Michael Yang authored
Jeffrey Morgan authored
- 07 May, 2025 1 commit
Jeffrey Morgan authored
- 05 May, 2025 1 commit
Jeffrey Morgan authored
Some options listed in api/types.go are not supported in newer models or have been deprecated. This is the first in a series of PRs to clean up the API options.
- 24 Apr, 2025 1 commit
Adrien Duermael authored
- 10 Apr, 2025 1 commit
Tom Sheffler authored
Co-authored-by: Parth Sareen <parth.sareen@ollama.com>