Commits · 15c7d30d9a0732f19ec2eac137fb975bbc8f1966 · OpenDAS / ollama

28 Oct, 2025 1 commit
- embedding tests: added check against exact base64 string (#12790) · 15c7d30d
  nicole pardal authored Oct 28, 2025
  
  15c7d30d
22 Oct, 2025 1 commit
- embeddings: base64 encoding fix (#12715) · e0ead1ad
  nicole pardal authored Oct 22, 2025
  
  e0ead1ad
16 Oct, 2025 1 commit
- openai: make tool call conversion fns public · 160cecc8
  Devon Rifkin authored Oct 15, 2025
  
  160cecc8
13 Oct, 2025 1 commit

Qwen3VL Cloud Parser and Renderer (#12526) · 05982a95

Grace authored Oct 13, 2025



* working (other than tool call is the incorrect order) for tool calls and tools

* Tests work, other than image tags (tests do not go through server) and tools (not in the correct order, but contents are the same)

* testing for qwen3vl parser - toolparser is working

* made changes to JSON tool parser, wraps the TollCallFunction with a TollCall object

* Working parser for thinking models - assumes state of thinking, emits unambiguous content in thinking, does not call tool call in thinking

* changed the parser to start with collecting content

* thinking prefill

* add hasThinkingSupport parameter to parser

* qwen3-vl -> qwen3-vl-instruct for renderer/parser

* Add hasThinkingSupport=false to QwenVLParser

---------
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>

05982a95

09 Oct, 2025 1 commit
- openai: change the reasonin_effort field to also take none · 1b91d4dd
  Patrick Devine authored Oct 08, 2025
  
  1b91d4dd
08 Oct, 2025 1 commit
- thinking: turn on thinking mode for all reasoning models (#12533) · 90d429f5
  Patrick Devine authored Oct 08, 2025
  
  90d429f5
05 Oct, 2025 1 commit

openai: refactor to split compat layer and middleware · 2c2f4dea

Devon Rifkin authored Oct 05, 2025

This makes the core openai compat layer independent of the middleware
that adapts it to our particular gin routes

2c2f4dea

15 Sep, 2025 1 commit

add qwen3-coder tool support · 47991940

Devon Rifkin authored Sep 11, 2025

The format qwen3-coder uses is relatively unique, both in rendering and
in parsing. To implement parsing, I wrote a custom parser in similar
style to harmony. For the rendering, I found that the logic would be
much more difficult to follow in a template, so I introduced the concept
of a built-in renderer that uses go code, rather than a template to
generate prompts.

I set us up for future built-in parsers and renderers by making it so
they can be specified in a Modelfile like so:

```
RENDERER "qwen3-coder"
PARSER "qwen3-coder"
```

These need to be provided explicitly because the architecture alone is
not enough to understand what format the model expects to receive, and
what format we expect it to output (e.g., qwen3-coder is `qwen3moe`,
which includes other qwen3-family models as well)

I haven't converted harmony to be one of these "built-ins" yet, since
some of it is in flux with the changes @ParthSareen has been making to
move harmony to the runner. It is likely that many other built-ins will
need to move to the runner as well, but I'm able to slightly defer that
decision since qwen3-coder doesn't have thinking (and therefore doesn't
need to be in the runner to make structured outputs work). I expect to
unify harmony with this approach very soon.

Whether a particular model supports tools or thinking was previously
inferred from templates, but without a template we now also use the
parser itself to declare what it supports. If we have future models that
re-use the same parsing format, but have different capabilities, we'll
want to parameterize them and give them different names to be specified
as a `PARSER`.

Misc changes:

- I worked on the renderer by diffing outputs from the reference
  implementation and ours. To make it easier to do this, I extended
  <https://github.com/ollama/ollama/pull/11875> to also support
  returning the prompt via the openai compat layer

47991940

11 Sep, 2025 1 commit
- feat: add dimensions field to embed requests (#12242) · feb18cd7
  Michael Yang authored Sep 11, 2025
```
* feat: add field to truncate embeddings

* add openai embeddings for dimensions
```
  feb18cd7
20 Aug, 2025 1 commit
- openai: remove reasoning as an api.Options (#11993) · 91fc3c48
  Michael Yang authored Aug 20, 2025
  
  91fc3c48
12 Aug, 2025 1 commit
- fix(openai): handle reasoning_effort (#11868) · d0cf6c82
  Michael Yang authored Aug 12, 2025
  
  d0cf6c82
07 Aug, 2025 2 commits

openai: always provide reasoning · 735c41f9

Devon Rifkin authored Aug 06, 2025

We were missing passing along thinking if content was nil (as opposed
to empty string)

Also added a test for content not being passed, which was the real cause
of <https://github.com/ollama/ollama/issues/11704>, since with the way
`Content` is typed, not passing it and empty string are distinct

735c41f9

openai: when converting role=tool messages, propagate the tool name · 759dd78d

Devon Rifkin authored Aug 06, 2025

Added support for converting both `name` and `tool_call_id` fields,
which different clients might provide. `name` is a legacy field from the
OpenAI completions API. For `tool_call_id` we inspect previous messages
and look for a matching tool call ID and grab its name

Issue: https://github.com/ollama/ollama/issues/11704

759dd78d

06 Aug, 2025 1 commit

openai: allow for content _and_ tool calls in the same message · 203c1378

Devon Rifkin authored Aug 06, 2025

Previously our OpenAI chat completions compat layer assumed that tool
calls and content would never be provided together, but this is not a
correct assumption. Content is only optional when tool calls are
present, but tool calls and content can be provided together

Fixes: https://github.com/ollama/ollama/issues/11704

203c1378

05 Aug, 2025 2 commits

tools: support anyOf types · 30f8a68c

Devon Rifkin authored Aug 05, 2025

afaik gpt-oss is the first model that meaningfully transforms tool
function definitions in its template. We found that relatively common
definitions that include `anyOf` were not working because the template
was assuming that types were always defined via a `type` field.

anyOf allows for fully recursive types, so I exposed a
`toTypeScriptType()` function to handle this recursive logic in go and
keep the templates cleaner. The gpt-oss templates will need to be
updated to use this.

We should keep building out our function definition support to more
fully support the parts of json schema that make sense for this use
case, but in the meantime this will unblock some users (e.g., zed's
ollama integration w/ gpt-oss). Probably the most urgent is proper array
support

30f8a68c

gpt-oss (#11672) · fa7776fd

Michael Yang authored Aug 05, 2025



* bf16

* tests

* gpt-oss

* enable gptoss for engine

* rough estimate

* convert to mxfp4

* handle safetensors U8

* clamp glu/linear

* update tokenizer

* MXFP4 support

This implements the Open Compute Microscaling (MX) FP4 format
as a tensor type with backend implementations focusing
on mulmat and mulmatid on CPU, CUDA, and Metal.

* Unit tests for MXFP4 support

This exercises various operations and shapes on both CPU and GPU (if detected
on the system)

* cuda graph

* unit test adjustments

* cuda: optimize memory access

Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4

* mac: fix crash on old macos versions

cblas_sgemm is only supported on v13.3 and up, however bf16 is
only supported on v14+ so we were falling back to ggml-blas and
crashing on bf16 tensors.  Checking for the function being null
seems to be the simplest way to condittionally avoid registering the
backend.

* server: Minimum context length for gptoss

This model requires a minimum context length of 8192 to function
effectively. Users can set higher values through all normal mechanisms
but lower values will be silently reset.

* ggml: Multiply by numParallel for gptoss sliding window

When computing the graph size estimate, the context size is already
multiplied by numParallel so estimates reflect that. However, since
sliding window models use a smaller, fixed context size, they need
to manually take numParallel into account.

* gpt-oss integration

includes harmony parser and thinking levels, etc.

* fix sync

* fix tests

* fix lint

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>

fa7776fd

17 Jul, 2025 1 commit
- openai: allow openai endpoint to accept webp images (#11412) · 5e67f4f9
  frob authored Jul 17, 2025
```
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
```
  5e67f4f9
10 Apr, 2025 1 commit
- types: include the 'items' and '$defs' fields to properly handle "array" types (#10091) · ef65174d
  Tom Sheffler authored Apr 09, 2025
```
---------
Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
```
  ef65174d
08 Apr, 2025 1 commit
- types: add any type and validation for ToolFunction enum (#10166) · 6747099d
  Parth Sareen authored Apr 08, 2025
  
  6747099d
07 Apr, 2025 1 commit
- types: allow tool function parameters with a single type or an array of types (#9434) · 2f723ac2
  Alex Rozgo authored Apr 07, 2025
  
  2f723ac2
02 Apr, 2025 1 commit

chore(all): replace instances of interface with any (#10067) · 9876c9fa

Bruce MacDonald authored Apr 02, 2025

Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.

9876c9fa

13 Feb, 2025 1 commit
- openai: finish_reason as tool_calls for streaming with tools (#7963) · 10d59d5f
  Anuraag (Rag) Agrawal authored Feb 14, 2025
  
  10d59d5f
13 Dec, 2024 1 commit

openai: return usage as final chunk for streams (#6784) · e28f2d49

Anuraag (Rag) Agrawal authored Dec 13, 2024



* openai: return usage as final chunk for streams

---------
Co-authored-by: ParthSareen <parth.sareen@ollama.com>

e28f2d49

11 Dec, 2024 1 commit

llama: preserve field order in user-defined JSON schemas (#8002) · 9039c821

Blake Mizerany authored Dec 11, 2024

Previously we decoded and re-encoded JSON schemas during validation,
which served no purpose since json.RawMessage already validates JSON
syntax. Worse, the re-encoding lost field ordering from the original
schema, which affects inference quality during step-by-step reasoning.

While fixing this ordering issue by using json.RawMessage directly,
testing revealed that schema_to_grammar (from llama.cpp) also fails to
preserve field order during grammar generation. This appears to be the
root cause of inference degradation.

This change prevents us from mangling the user's original schema order,
but we still need to address the ordering issue in schema_to_grammar.
That will be a separate change.

Updates #7978

9039c821

05 Dec, 2024 1 commit

api: structured outputs - chat endpoint (#7900) · 630e7dc6

Parth Sareen authored Dec 04, 2024



Adds structured outputs to chat endpoint
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>

630e7dc6

30 Nov, 2024 1 commit
- Enable index tracking for tools - openai api support (#7888) · 5f805118
  Parth Sareen authored Nov 29, 2024
  
  5f805118
27 Nov, 2024 2 commits
- api: enable tool streaming (#7836) · ce7455a8
  Parth Sareen authored Nov 27, 2024
  
  ce7455a8
- openai: remove unused error code (#7850) · 940e6277
  Bruce MacDonald authored Nov 26, 2024
```
The writeError takes a code argument which is no longer used. Remove it for clarity.
```
  940e6277
07 Sep, 2024 2 commits
- openai: align chat temperature and frequency_penalty options with completion (#6688) · 06d4fba8
  frob authored Sep 07, 2024
  
  06d4fba8
- openai: don't scale temperature or frequency_penalty (#6514) · da915345
  Yaroslav authored Sep 07, 2024
  
  da915345
06 Sep, 2024 1 commit
- openai: fix "presence_penalty" typo and add test (#6665) · fe91d7ff
  frob authored Sep 06, 2024
  
  fe91d7ff
12 Aug, 2024 1 commit

OpenAI: Simplify input output in testing (#5858) · 01d544d3

royjhan authored Aug 12, 2024

* simplify input output

* direct comp

* in line image

* rm error pointer type

* update response testing

* lint

01d544d3

02 Aug, 2024 1 commit
- lint · b732beba
  Michael Yang authored Aug 01, 2024
  
  b732beba
01 Aug, 2024 1 commit

OpenAI: Add Usage to `v1/embeddings` (#5886) · 6f133a0b

royjhan authored Aug 01, 2024

* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* add tokens to v1/embeddings

* separate usage

6f133a0b

29 Jul, 2024 1 commit
- return tool calls finish reason for openai (#5995) · 365431d4
  royjhan authored Jul 29, 2024
```
* hot fix

* backend stream support

* clean up

* finish reason

* move to openai
```
  365431d4
19 Jul, 2024 2 commits
- OpenAI: Function Based Testing (#5752) · c57317cb
  royjhan authored Jul 19, 2024
```
* distinguish error forwarding

* more coverage

* rm comment
```
  c57317cb
- adjust openai chat msg processing (#5729) · 51b2fd29
  royjhan authored Jul 19, 2024
  
  51b2fd29
17 Jul, 2024 2 commits

OpenAI: Support Tools (#5614) · 154f6f45

royjhan authored Jul 16, 2024



* reopen pr

* tools

* remove tc from stream for now

* ID and Function

* openai expects arguments to be a string (#5739)

* mutually exclusive content and tool calls

* clean up

---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

154f6f45

OpenAI: Add Suffix to `v1/completions` (#5611) · 0d41623b

royjhan authored Jul 16, 2024

* add suffix

* remove todo

* remove TODO

* add to test

* rm outdated prompt tokens info md

* fix test

* fix test

0d41623b

16 Jul, 2024 1 commit

OpenAI: /v1/embeddings compatibility (#5285) · 987dbab0

royjhan authored Jul 16, 2024



* OpenAI v1 models

* Empty List Testing

* Add back envconfig

* v1/models docs

* Remove Docs

* OpenAI batch embed compatibility

* merge conflicts

* integrate with api/embed

* ep

* merge conflicts

* request tests

* rm resp test

* merge conflict

* merge conflict

* test fixes

* test fn renaming

* input validation for empty string

---------
Co-authored-by: jmorganca <jmorganca@gmail.com>

987dbab0