Commits · fa7776fd2458fc3a8aeb7f12e4bc65b439955319 · OpenDAS / ollama

05 Aug, 2025 1 commit

Michael Yang authored Aug 05, 2025



* bf16

* tests

* gpt-oss

* enable gptoss for engine

* rough estimate

* convert to mxfp4

* handle safetensors U8

* clamp glu/linear

* update tokenizer

* MXFP4 support

This implements the Open Compute Microscaling (MX) FP4 format
as a tensor type with backend implementations focusing
on mulmat and mulmatid on CPU, CUDA, and Metal.

* Unit tests for MXFP4 support

This exercises various operations and shapes on both CPU and GPU (if detected
on the system)

* cuda graph

* unit test adjustments

* cuda: optimize memory access

Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4

* mac: fix crash on old macos versions

cblas_sgemm is only supported on v13.3 and up, however bf16 is
only supported on v14+ so we were falling back to ggml-blas and
crashing on bf16 tensors.  Checking for the function being null
seems to be the simplest way to condittionally avoid registering the
backend.

* server: Minimum context length for gptoss

This model requires a minimum context length of 8192 to function
effectively. Users can set higher values through all normal mechanisms
but lower values will be silently reset.

* ggml: Multiply by numParallel for gptoss sliding window

When computing the graph size estimate, the context size is already
multiplied by numParallel so estimates reflect that. However, since
sliding window models use a smaller, fixed context size, they need
to manually take numParallel into account.

* gpt-oss integration

includes harmony parser and thinking levels, etc.

* fix sync

* fix tests

* fix lint

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>

fa7776fd

29 May, 2025 1 commit

add thinking support to the api and cli (#10584) · 5f57b0ef

Devon Rifkin authored May 28, 2025

- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models

5f57b0ef

14 May, 2025 1 commit
- chore: update mllama to use ollama engine (#10637) · 23125648
  Michael Yang authored May 13, 2025
  
  23125648
08 May, 2025 1 commit
- lint: enable usetesting, disable tenv (#10594) · 6e9a7a25
  Michael Yang authored May 08, 2025
  
  6e9a7a25
09 Dec, 2024 1 commit

prompt: Don't trim whitespace from prompts · 900f64e6

Jesse Gross authored Dec 06, 2024

New lines can be an important part of a user's prompt and trimming
it can alter the results. We previously only trimmed prompts with
images but refactoring brought this behavior to all prompts, where
it became more noticable.

The /generate endpoint adds less whitespace and therefore doesn't
need to trim it out - this brings the same behavior to /chat.

Thanks to @gabe-l-hart for spotting the issue!

Fixes #7795

900f64e6

17 Nov, 2024 1 commit
- server: fix warnings in prompt_test.go (#7710) · 8b4b243f
  Jeffrey Morgan authored Nov 17, 2024
  
  8b4b243f
30 Oct, 2024 1 commit

runner.go: Better abstract vision model integration · c826e574

Jesse Gross authored Oct 11, 2024



-Update mllama to take the cross attention state as embeddings in
a batch, more similar to how Llava handles it. This improves
integration with the input cache.
-Pass locations in a prompt for embeddings using tags similar to Llava.
-Abstract interface to vision models so the main runner accesses Clip
and Mllama similarly
Co-authored-by: Michael Yang <mxyng@pm.me>

c826e574

18 Oct, 2024 1 commit

image processing for llama3.2 (#6963) · c7cb0f06

Patrick Devine authored Oct 18, 2024


Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>

c7cb0f06

02 Aug, 2024 1 commit
- lint · b732beba
  Michael Yang authored Aug 01, 2024
  
  b732beba
16 Jul, 2024 1 commit
- add chat and generate tests with mock runner · 4a565cbf
  Michael Yang authored Jul 13, 2024
  
  4a565cbf
15 Jul, 2024 1 commit
- tools · d02bbebb
  Michael Yang authored Jun 20, 2024
  
  d02bbebb
13 Jul, 2024 1 commit

fix system prompt (#5662) · 22c5451f

Michael Yang authored Jul 12, 2024



* fix system prompt

* execute template when hitting previous roles

* fix tests

---------
Co-authored-by: jmorganca <jmorganca@gmail.com>

22c5451f

11 Jul, 2024 1 commit
- add system prompt to first legacy template · 41be2809
  Michael Yang authored Jul 10, 2024
  
  41be2809
05 Jul, 2024 2 commits
- comments · 2c3fe1fd
  Michael Yang authored Jun 20, 2024
  
  2c3fe1fd
- update message processing · 269ed6e6
  Michael Yang authored Jun 17, 2024
  
  269ed6e6
01 Jul, 2024 1 commit
- rename templates to template · 58e3fff3
  Michael Yang authored Jun 10, 2024
  
  58e3fff3
26 Mar, 2024 1 commit
- change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) · 1b272d5b
  Patrick Devine authored Mar 26, 2024
  
  1b272d5b
29 Feb, 2024 1 commit

prepend image tags (#2789) · 0e19476b

Michael Yang authored Feb 29, 2024

instead of appending image tags, prepend them - this generally produces better results

0e19476b

16 Feb, 2024 1 commit
- fix: chat system prompting overrides (#2542) · 88622847
  Bruce MacDonald authored Feb 16, 2024
  
  88622847
12 Feb, 2024 1 commit
- Fix issues with templating prompt in chat mode (#2460) · 48a273f8
  Jeffrey Morgan authored Feb 12, 2024
  
  48a273f8