Commits · ec8bf5e6c57a0a2a78c41cea4e9619b384f2ea39 · OpenDAS / ollama

15 Aug, 2025 1 commit
- server: add debug option for printing out prompt instead of calling model · 8de1da47
  Devon Rifkin authored Aug 15, 2025
  
  8de1da47
12 Aug, 2025 1 commit
- fix(openai): handle reasoning_effort (#11868) · d0cf6c82
  Michael Yang authored Aug 12, 2025
  
  d0cf6c82
05 Aug, 2025 2 commits

Devon Rifkin authored Aug 05, 2025

afaik gpt-oss is the first model that meaningfully transforms tool
function definitions in its template. We found that relatively common
definitions that include `anyOf` were not working because the template
was assuming that types were always defined via a `type` field.

anyOf allows for fully recursive types, so I exposed a
`toTypeScriptType()` function to handle this recursive logic in go and
keep the templates cleaner. The gpt-oss templates will need to be
updated to use this.

We should keep building out our function definition support to more
fully support the parts of json schema that make sense for this use
case, but in the meantime this will unblock some users (e.g., zed's
ollama integration w/ gpt-oss). Probably the most urgent is proper array
support
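
As a rough illustration of the recursive conversion described above, here is a minimal sketch. Only the name `toTypeScriptType` comes from the commit message; the `Schema` struct, the function signature, and the exact output strings are assumptions made for this example and may differ from the real implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Schema is a hypothetical, pared-down JSON Schema node used only for this
// sketch; it is not Ollama's actual tool-parameter type.
type Schema struct {
	Type  string
	AnyOf []Schema
	Items *Schema
}

// toTypeScriptType renders a schema node as a TypeScript type expression.
// anyOf alternatives are handled recursively and joined into a union.
func toTypeScriptType(s Schema) string {
	if len(s.AnyOf) > 0 {
		parts := make([]string, 0, len(s.AnyOf))
		for _, alt := range s.AnyOf {
			parts = append(parts, toTypeScriptType(alt))
		}
		return strings.Join(parts, " | ")
	}
	switch s.Type {
	case "string", "boolean", "null":
		return s.Type
	case "number", "integer":
		return "number"
	case "array":
		if s.Items == nil {
			return "any[]"
		}
		inner := toTypeScriptType(*s.Items)
		// Parenthesize unions so that [] applies to the whole union.
		if strings.Contains(inner, " | ") {
			inner = "(" + inner + ")"
		}
		return inner + "[]"
	case "object":
		return "Record<string, any>"
	}
	// No usable type information: fall back to the widest type.
	return "any"
}

func main() {
	nullable := Schema{AnyOf: []Schema{{Type: "string"}, {Type: "null"}}}
	fmt.Println(toTypeScriptType(nullable)) // string | null

	list := Schema{Type: "array", Items: &Schema{AnyOf: []Schema{{Type: "string"}, {Type: "integer"}}}}
	fmt.Println(toTypeScriptType(list)) // (string | number)[]
}
```

Keeping the recursion in Go means a template only needs a single function call per parameter, which is the motivation the message gives for exposing the helper.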

30f8a68c

gpt-oss (#11672) · fa7776fd

Michael Yang authored Aug 05, 2025



* bf16

* tests

* gpt-oss

* enable gptoss for engine

* rough estimate

* convert to mxfp4

* handle safetensors U8

* clamp glu/linear

* update tokenizer

* MXFP4 support

This implements the Open Compute Microscaling (MX) FP4 format
as a tensor type with backend implementations focusing
on mulmat and mulmatid on CPU, CUDA, and Metal.

* Unit tests for MXFP4 support

This exercises various operations and shapes on both CPU and GPU (if detected
on the system)

* cuda graph

* unit test adjustments

* cuda: optimize memory access

Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4

* mac: fix crash on old macos versions

cblas_sgemm is only supported on v13.3 and up, however bf16 is
only supported on v14+ so we were falling back to ggml-blas and
crashing on bf16 tensors.  Checking for the function being null
seems to be the simplest way to conditionally avoid registering the
backend.

* server: Minimum context length for gptoss

This model requires a minimum context length of 8192 to function
effectively. Users can set higher values through all normal mechanisms
but lower values will be silently reset.

* ggml: Multiply by numParallel for gptoss sliding window

When computing the graph size estimate, the context size is already
multiplied by numParallel so estimates reflect that. However, since
sliding window models use a smaller, fixed context size, they need
to manually take numParallel into account.

* gpt-oss integration

includes harmony parser and thinking levels, etc.

* fix sync

* fix tests

* fix lint

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
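
For readers unfamiliar with the MXFP4 format mentioned in the message above, the following sketch dequantizes a single block. It assumes the layout from the OCP Microscaling specification: 32 elements per block, 4-bit E2M1 elements, and one shared 8-bit power-of-two scale with bias 127. The nibble order within each byte (low nibble first) and all names are assumptions for illustration; the actual ggml tensor layout may pack elements differently.

```go
package main

import (
	"fmt"
	"math"
)

// e2m1 lists the magnitudes a 4-bit E2M1 element can represent
// (1 sign bit, 2 exponent bits, 1 mantissa bit), indexed by the low 3 bits.
var e2m1 = [8]float32{0, 0.5, 1, 1.5, 2, 3, 4, 6}

// blockSize is the number of elements that share one scale.
const blockSize = 32

// decode converts one 4-bit code to its unscaled value; bit 3 is the sign.
func decode(nibble byte) float32 {
	v := e2m1[nibble&0x07]
	if nibble&0x08 != 0 {
		return -v
	}
	return v
}

// dequantizeBlock expands one block: every element is multiplied by
// 2^(scale-127). A scale byte of 0xFF encodes NaN.
// Assumption: the low nibble of each byte holds the earlier element.
func dequantizeBlock(scale uint8, packed [blockSize / 2]byte) [blockSize]float32 {
	s := float32(math.Ldexp(1, int(scale)-127))
	if scale == 0xFF {
		s = float32(math.NaN())
	}
	var out [blockSize]float32
	for i, b := range packed {
		out[2*i] = decode(b&0x0F) * s
		out[2*i+1] = decode(b>>4) * s
	}
	return out
}

func main() {
	var packed [blockSize / 2]byte
	packed[0] = 0xF3 // low nibble 0x3 -> 1.5, high nibble 0xF -> -6
	out := dequantizeBlock(128, packed) // scale 2^(128-127) = 2
	fmt.Println(out[0], out[1], out[2]) // 3 -12 0
}
```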

fa7776fd

08 Jul, 2025 1 commit

API/CLI context enhancements (#11331) · 34088dbc

Daniel Hiltgen authored Jul 08, 2025

* API: expose context size of loaded models

* CLI: add context UX

This adds a column in the ps output to show the model's context size.

34088dbc

07 Jul, 2025 1 commit
- template: add tool result compatibility (#11294) · 1f91cb0c
  Parth Sareen authored Jul 07, 2025
  
  1f91cb0c
07 Jun, 2025 1 commit
- Revert "server: add model capabilities to the list endpoint (#10174)" (#11004) · 09d308d6
  Jeffrey Morgan authored Jun 06, 2025
```
This reverts commit 09430011.
```
  09d308d6
04 Jun, 2025 1 commit
- server: add model capabilities to the list endpoint (#10174) · 09430011
  JasonHonKL authored Jun 05, 2025
  
  09430011
29 May, 2025 1 commit

add thinking support to the api and cli (#10584) · 5f57b0ef

Devon Rifkin authored May 28, 2025

- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models
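
The opt-in behavior described above can be sketched from the client side. This example only builds request bodies and does not contact a server. The `chatRequest` and `message` structs are hypothetical stand-ins defined for the example (not the types from Ollama's `api` package), and placing `"think"` as a top-level field is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message and chatRequest are hypothetical, minimal request types for this
// sketch; they are not the definitions from Ollama's api package.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
	// A pointer distinguishes "not specified" (nil, field omitted) from an
	// explicit true or false, which matches the opt-in design.
	Think *bool `json:"think,omitempty"`
}

// buildChatBody serializes a single-turn chat request.
func buildChatBody(model, prompt string, think *bool) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
		Think:    think,
	})
}

func main() {
	on := true
	body, err := buildChatBody("qwen3", "my question here", &on)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // includes "think":true

	legacy, err := buildChatBody("qwen3", "my question here", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(legacy)) // no "think" key, so behavior is unchanged
}
```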

5f57b0ef

08 May, 2025 1 commit
- api: remove unused sampling parameters (#10581) · fa9973cd
  Jeffrey Morgan authored May 08, 2025
  
  fa9973cd
07 May, 2025 1 commit
- api: remove unused RetrieveModelResponse type (#10603) · 392de840
  Jeffrey Morgan authored May 06, 2025
  
  392de840
05 May, 2025 1 commit

api: remove unused or unsupported api options (#10574) · 3b2d2c83

Jeffrey Morgan authored May 05, 2025

Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options

3b2d2c83

24 Apr, 2025 1 commit
- api: fix ImageData struct comment to expect raw image bytes (#10386) · 40b10eee
  Adrien Duermael authored Apr 23, 2025
  
  40b10eee
10 Apr, 2025 1 commit
- types: include the 'items' and '$defs' fields to properly handle "array" types (#10091) · ef65174d
  Tom Sheffler authored Apr 09, 2025
```
---------
Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
```
  ef65174d
08 Apr, 2025 1 commit
- types: add any type and validation for ToolFunction enum (#10166) · 6747099d
  Parth Sareen authored Apr 08, 2025
  
  6747099d
07 Apr, 2025 1 commit
- types: allow tool function parameters with a single type or an array of types (#9434) · 2f723ac2
  Alex Rozgo authored Apr 07, 2025
  
  2f723ac2
02 Apr, 2025 1 commit

chore(all): replace instances of interface with any (#10067) · 9876c9fa

Bruce MacDonald authored Apr 02, 2025

Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
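
A small sketch of why the replacement is purely cosmetic: because `any` is an alias, the two spellings denote the identical type and can be mixed freely without conversions. The function name `describe` is invented for the example.

```go
package main

import "fmt"

// describe accepts the empty interface spelled the pre-1.18 way.
func describe(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var a any = 42
	var b interface{} = a // no conversion needed: same type

	// A func(interface{}) value is assignable to a func(any) variable
	// because the two function types are identical.
	var f func(any) string = describe

	fmt.Println(f(a), describe(b), a == b) // int int true
}
```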

9876c9fa

01 Apr, 2025 1 commit

api: return model capabilities from the show endpoint (#10066) · e172f095

Bruce MacDonald authored Apr 01, 2025

With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint will allow clients to easily see what a model can do.

e172f095

13 Mar, 2025 1 commit

add verbose mode to the show command (#9640) · 4bed7392

Patrick Devine authored Mar 13, 2025

Add metadata and tensor information to the show command to be able to
see more information about a model. This outputs the same data as
shown on the model details page on ollama.com

4bed7392

05 Mar, 2025 1 commit

server/internal/registry: take over pulls from server package (#9485) · e2252d0f

Blake Mizerany authored Mar 05, 2025

This commit replaces the old pull implementation in the server package
with the new, faster, more robust pull implementation in the registry
package.

The new endpoint, and now the remove endpoint too, are behind the
feature gate "client2", enabled only by setting the OLLAMA_EXPERIMENT
environment variable to include "client2".
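
A feature gate of this kind might be checked as in the sketch below. The variable name `OLLAMA_EXPERIMENT` and the gate name `client2` come from the message; treating the value as a comma-separated list, and the helper name `experimentEnabled`, are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// experimentEnabled reports whether name appears as a whole entry in list.
// Assumption: the variable holds a comma-separated list of experiment names.
func experimentEnabled(list, name string) bool {
	for _, item := range strings.Split(list, ",") {
		if strings.TrimSpace(item) == name {
			return true
		}
	}
	return false
}

func main() {
	enabled := experimentEnabled(os.Getenv("OLLAMA_EXPERIMENT"), "client2")
	fmt.Println("client2 enabled:", enabled)
}
```

Matching whole entries rather than substrings avoids accidentally enabling the gate for a value such as `client20`.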

Currently, the progress indication is wired to perform the same as the
previous implementation to avoid making changes to the CLI, and because
the status reports happen at the start of the download, and the end of
the write to disk, the progress indication is not as smooth as it could
be. This is a known issue and will be addressed in a future change.

This implementation may be ~0.5-1.0% slower in rare cases, depending on
network and disk speed, but is generally MUCH faster and more robust
than its predecessor in all other cases.

e2252d0f

24 Feb, 2025 1 commit
- config: allow setting context length through env var (#8938) · 314573bf
  Parth Sareen authored Feb 24, 2025
```
* envconfig: allow setting context length through env var
```
  314573bf
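
A minimal sketch of reading a context length from the environment follows. The variable name `OLLAMA_CONTEXT_LENGTH`, the helper `contextLength`, and the fallback of 2048 are assumptions for this example, not values confirmed by the commit above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// contextLength parses raw as a positive integer and returns fallback when
// the value is empty, malformed, or not positive.
func contextLength(raw string, fallback int) int {
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

func main() {
	// Both the variable name and the 2048 fallback are placeholders.
	fmt.Println(contextLength(os.Getenv("OLLAMA_CONTEXT_LENGTH"), 2048))
}
```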
08 Jan, 2025 1 commit
- llama: update vendored code to commit 46e3556 (#8308) · 1deafd82
  Jeffrey Morgan authored Jan 08, 2025
  
  1deafd82
03 Jan, 2025 1 commit

api: remove unused create fields · 29a8975c

Bruce MacDonald authored Jan 03, 2025

These fields are deprecated, and specifying them has no effect. Remove them: the other deprecated fields still work, but these do not, so they don't match our existing pattern.

29a8975c

01 Jan, 2025 1 commit
- Update the /api/create endpoint to use JSON (#7935) · 86a622cb
  Patrick Devine authored Dec 31, 2024
```
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
```
  86a622cb
11 Dec, 2024 1 commit
- llama: update vendored code to commit 40c6d79f (#7875) · 527cc978
  Jeffrey Morgan authored Dec 10, 2024
  
  527cc978
05 Dec, 2024 2 commits
- api: add generate endpoint for structured outputs (#7939) · c6c52627
  Parth Sareen authored Dec 04, 2024
  
  c6c52627
- api: structured outputs - chat endpoint (#7900) · 630e7dc6
  Parth Sareen authored Dec 04, 2024
```
Adds structured outputs to chat endpoint
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>
```
  630e7dc6
30 Nov, 2024 1 commit
- Enable index tracking for tools - openai api support (#7888) · 5f805118
  Parth Sareen authored Nov 29, 2024
  
  5f805118
12 Nov, 2024 1 commit
- api: fix typos in Go Doc comments (#7620) · d48c1c5a
  Evan authored Nov 11, 2024
  
  d48c1c5a
06 Nov, 2024 1 commit

runner.go: Remove unused arguments · a9094176

Jesse Gross authored Oct 30, 2024

Now that server.cpp is gone, we don't need to keep passing arguments
that were only ignored and only kept for compatibility.

a9094176

28 Aug, 2024 1 commit
- update deprecated warnings · 8e6da3cb
  Michael Yang authored Aug 27, 2024
  
  8e6da3cb
06 Aug, 2024 1 commit
- Fixed invalid option provided not displaying the invalid option name problem. (#6202) · d4a7216c
  Chua Chee Seng authored Aug 07, 2024
  
  d4a7216c
05 Aug, 2024 1 commit

Implement linux NUMA detection · f457d634

Daniel Hiltgen authored Aug 05, 2024

If the system has multiple NUMA nodes, enable NUMA support in llama.cpp.
If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.
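
The decision described above can be sketched as follows. Counting `node*` directories under `/sys/devices/system/node` is an assumed detection method, and the function name `numaMode` is invented for the example; the commit's actual detection logic may differ.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// numaMode picks a NUMA strategy from the number of NUMA nodes and whether
// numactl is available. An empty string means NUMA support stays disabled.
func numaMode(nodes int, hasNumactl bool) string {
	if nodes <= 1 {
		return ""
	}
	if hasNumactl {
		return "numactl"
	}
	return "distribute"
}

func main() {
	// Assumption: Linux exposes one nodeN directory per NUMA node here.
	// On other systems the glob simply matches nothing.
	matches, _ := filepath.Glob("/sys/devices/system/node/node[0-9]*")

	// LookPath searches the directories listed in PATH.
	_, err := exec.LookPath("numactl")

	fmt.Printf("numa mode: %q\n", numaMode(len(matches), err == nil))
}
```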

f457d634

30 Jul, 2024 1 commit

Add Metrics to `api\embed` response (#5709) · 1b44d873

royjhan authored Jul 30, 2024

* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics

1b44d873

29 Jul, 2024 1 commit
- api: add stringifier for `Tool` (#5891) · 46e6327e
  Jeffrey Morgan authored Jul 29, 2024
  
  46e6327e
27 Jul, 2024 1 commit
- feat: add support for min_p (resolve #1142) (#1825) · f3d7a481
  Tibor Schmidt authored Jul 27, 2024
  
  f3d7a481
18 Jul, 2024 1 commit
- always provide content even if empty (#5778) · 84e5721f
  Jeffrey Morgan authored Jul 18, 2024
  
  84e5721f
17 Jul, 2024 1 commit
- marshal json automatically for some template values (#5758) · b2554455
  Michael Yang authored Jul 17, 2024
  
  b2554455
16 Jul, 2024 2 commits
- remove ToolCall from GenerateResponse · c279f963
  Michael Yang authored Jul 16, 2024
  
  c279f963
- add suffix support to generate endpoint · d290e875
  Michael Yang authored Jun 20, 2024
```
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
```
  d290e875