- 06 Jan, 2026 1 commit
Parth Sareen authored
- 26 Sep, 2025 1 commit
Patrick Devine authored
There are two bugs when using `/load <model>` for a model that doesn't exist, namely:
1. it will not restore the current model settings if the current model is a thinking model; and
2. it will crash if the current model is a non-thinking model.
This fix saves the current runOptions and then restores them if the model load doesn't happen. It also fixes the crash for non-thinking models.
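A minimal Go sketch of the save-then-restore pattern described here, assuming hypothetical `runOptions` and `loadModel` stand-ins for the real types in ollama's cmd package:

```go
package main

import (
	"errors"
	"fmt"
)

// runOptions is a stand-in for the CLI's session settings; the real struct
// lives in ollama's cmd package and carries far more fields.
type runOptions struct {
	Model string
	Think bool
}

// loadModel is a hypothetical loader that fails for unknown models.
func loadModel(name string) error {
	if name == "no-such-model" {
		return errors.New("model not found")
	}
	return nil
}

// setModel applies the save-then-restore pattern from the fix: copy the
// current options up front and put them back if the load fails.
func setModel(opts *runOptions, name string) error {
	saved := *opts
	opts.Model = name
	if err := loadModel(name); err != nil {
		*opts = saved // restore the previous session settings
		return err
	}
	return nil
}

func main() {
	opts := runOptions{Model: "qwen3", Think: true}
	if err := setModel(&opts, "no-such-model"); err != nil {
		fmt.Println("load failed, still using:", opts.Model) // qwen3
	}
}
```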
- 05 Aug, 2025 1 commit
Michael Yang authored
* bf16
* tests
* gpt-oss
* enable gptoss for engine
* rough estimate
* convert to mxfp4
* handle safetensors U8
* clamp glu/linear
* update tokenizer
* MXFP4 support: implements the Open Compute Microscaling (MX) FP4 format as a tensor type, with backend implementations focusing on mulmat and mulmatid on CPU, CUDA, and Metal
* Unit tests for MXFP4 support: exercises various operations and shapes on both CPU and GPU (if detected on the system)
* cuda graph
* unit test adjustments
* cuda: optimize memory access. Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4
* mac: fix crash on old macOS versions. cblas_sgemm is only supported on v13.3 and up; however, bf16 is only supported on v14+, so we were falling back to ggml-blas and crashing on bf16 tensors. Checking for the function being null seems to be the simplest way to conditionally avoid registering the backend
* server: minimum context length for gptoss. This model requires a minimum context length of 8192 to function effectively. Users can set higher values through all normal mechanisms, but lower values will be silently reset
* ggml: multiply by numParallel for gptoss sliding window. When computing the graph size estimate, the context size is already multiplied by numParallel, so estimates reflect that. However, since sliding window models use a smaller, fixed context size, they need to manually take numParallel into account
* gpt-oss integration includes the harmony parser, thinking levels, etc.
* fix sync
* fix tests
* fix lint
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
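As a sketch of the minimum-context behavior described above: only the 8192 floor and the silent reset come from the commit message; the standalone `minContext` helper is illustrative, not the actual server code.

```go
package main

import "fmt"

// minContext applies the silent reset described for gpt-oss: requests below
// the model's floor are raised to it, larger values pass through unchanged.
func minContext(requested, floor int) int {
	if requested < floor {
		return floor
	}
	return requested
}

func main() {
	fmt.Println(minContext(2048, 8192))  // 8192: silently reset to the floor
	fmt.Println(minContext(16384, 8192)) // 16384: higher values are kept
}
```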
- 22 Jul, 2025 1 commit
Patrick Devine authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
- 29 May, 2025 1 commit
Devon Rifkin authored
- Both `/api/generate` and `/api/chat` now accept a `"think"` option that allows specifying whether thinking mode should be on or not
- Templates get passed this new option so, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting
- Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking support are also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming and non-streaming mode in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like `ollama run qwen3 --think --hidethinking "my question here"` where you just want to see the answer but still want the benefits of thinking models (see the sketch after this list)
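A hedged example of exercising the new option over HTTP: the endpoint and the `"think"` field follow the description above, while the model name, prompt, and use of non-streaming mode are placeholders.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Request body per the commit description: the usual generate fields
	// plus "think". Model and prompt are placeholders.
	body := []byte(`{"model": "qwen3", "prompt": "Why is the sky blue?", "think": true, "stream": false}`)
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	// With thinking enabled, the thinking block is parsed out of the
	// response rather than interleaved with the answer text.
	fmt.Println(string(out))
}
```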
- 13 May, 2025 1 commit
Jeffrey Morgan authored
- 10 May, 2025 1 commit
Bruce MacDonald authored
- 20 Apr, 2025 1 commit
greengrass821 authored
Co-authored-by: tooth paste <tooth_paste91@Poorneshwars-MacBook-Pro.local>
- 15 Mar, 2025 1 commit
Patrick Devine authored
This fixes the case where a FROM line in a previous modelfile points to a file which may or may not be present in a different ollama instance. We shouldn't rely on the filename; instead, check whether the FROM line is a valid model name and point to that.
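A sketch of the lookup order this implies, assuming a hypothetical `resolveFrom` helper; `isModel` stands in for however the real code checks for a known model name.

```go
package main

import (
	"fmt"
	"os"
)

// resolveFrom prefers treating the FROM value as a model name and only
// falls back to a local file path, per the fix described above.
func resolveFrom(from string, isModel func(string) bool) (string, error) {
	if isModel(from) {
		return from, nil // a valid model name; don't depend on the filename
	}
	if _, err := os.Stat(from); err == nil {
		return from, nil // a file that happens to exist in this instance
	}
	return "", fmt.Errorf("%s: neither a known model nor a local file", from)
}

func main() {
	known := map[string]bool{"llama3": true}
	src, err := resolveFrom("llama3", func(n string) bool { return known[n] })
	fmt.Println(src, err) // llama3 <nil>
}
```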
- 13 Mar, 2025 1 commit
Patrick Devine authored
Add metadata and tensor information to the show command to be able to see more information about a model. This outputs the same data as shown on the model details page on ollama.com
- 12 Mar, 2025 1 commit
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
- 01 Jan, 2025 1 commit
Patrick Devine authored
Changes `POST /api/create` to use JSON instead of a Modelfile. This is a breaking change.
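Since this is a breaking change for clients, a hedged sketch of the new request shape: the `from` and `system` fields follow ollama's published `/api/create` docs, and the model names are placeholders.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// A minimal create request in the new JSON form, replacing what was
	// previously sent as a Modelfile.
	body := []byte(`{"model": "mario", "from": "llama3.2", "system": "You are Mario from Super Mario Bros."}`)
	resp, err := http.Post("http://localhost:11434/api/create", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // the server streams status lines as it creates the model
}
```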
- 22 Dec, 2024 1 commit
Patrick Devine authored
- 26 Nov, 2024 1 commit
frob authored
- 21 Nov, 2024 1 commit
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
- 18 Oct, 2024 1 commit
Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
- 10 Oct, 2024 1 commit
Jesse Gross authored
Currently the CLI only sends images from the most recent image-containing message. This prevents doing things like sending one message with an image and then a follow-up message with a second image and asking for a comparison based on additional information not present in any text that was output. It's possible that some models have a problem with this, but the CLI is not the right place to handle it, since any adjustments are model-specific and should affect all clients. Both llava:34b and minicpm-v do reasonable things with multiple images in the history.
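A small sketch of the behavior change, with a hypothetical `message` type standing in for the CLI's chat history entries: images are now gathered from every message, not just the most recent image-bearing one.

```go
package main

import "fmt"

// message is a hypothetical stand-in for the CLI's chat history entries;
// Images holds the attachments in the real code.
type message struct {
	Role   string
	Images []string
}

// imagesInHistory collects images from every message in the history, the
// behavior this change moves to, instead of keeping only those attached
// to the most recent image-containing message.
func imagesInHistory(history []message) []string {
	var all []string
	for _, m := range history {
		all = append(all, m.Images...)
	}
	return all
}

func main() {
	history := []message{
		{Role: "user", Images: []string{"first.png"}},
		{Role: "assistant"},
		{Role: "user", Images: []string{"second.png"}},
	}
	fmt.Println(imagesInHistory(history)) // [first.png second.png]
}
```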
- 11 Sep, 2024 2 commits
Patrick Devine authored

Michael Yang authored
fixes line wrapping on long texts
- 02 Aug, 2024 1 commit
Michael Yang authored
- 27 Jul, 2024 1 commit
Tibor Schmidt authored
- 26 Jul, 2024 2 commits
Michael Yang authored

Michael Yang authored
The stop parameter is saved as a slice, which is incompatible with modelfile parsing.
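A sketch of what the fix implies, assuming a hypothetical `formatParameter` helper: a multi-valued parameter such as `stop` has to be written back as one `PARAMETER` line per value, since the modelfile parser cannot read a serialized slice.

```go
package main

import "fmt"

// formatParameter emits one PARAMETER line per value, the form the
// modelfile parser can actually read back. The helper is an illustration
// of the incompatibility, not the real serialization code.
func formatParameter(name string, values []string) string {
	out := ""
	for _, v := range values {
		out += fmt.Sprintf("PARAMETER %s %s\n", name, v)
	}
	return out
}

func main() {
	// A slice of stop sequences cannot be emitted as a single line;
	// each value gets its own PARAMETER entry.
	fmt.Print(formatParameter("stop", []string{"<|user|>", "<|assistant|>"}))
	// Output:
	// PARAMETER stop <|user|>
	// PARAMETER stop <|assistant|>
}
```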
- 22 Jul, 2024 1 commit
Michael Yang authored
- 14 Jul, 2024 1 commit
Patrick Devine authored
- 28 Jun, 2024 1 commit
royjhan authored
- 25 Jun, 2024 1 commit
Blake Mizerany authored
This commit changes the 'ollama run' command to defer fetching model information until it really needs it, that is, when in interactive mode. It also removes one case where the model information was fetched in duplicate: just before calling generateInteractive and then again, first thing, in generateInteractive. This positively impacts the performance of the command:

```
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?
./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
```
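A minimal sketch of the deferred-fetch pattern the commit describes, with a hypothetical `lazyInfo` memoizer standing in for the real CLI plumbing:

```go
package main

import "fmt"

// modelInfo is a stand-in for the data 'ollama run' fetches about a model.
type modelInfo struct{ Details string }

// lazyInfo memoizes a slow fetch so it runs at most once, and only when
// the result is actually needed (i.e., in interactive mode).
func lazyInfo(fetch func() modelInfo) func() modelInfo {
	var cached *modelInfo
	return func() modelInfo {
		if cached == nil {
			info := fetch() // deferred until first use
			cached = &info
		}
		return *cached
	}
}

func main() {
	calls := 0
	info := lazyInfo(func() modelInfo {
		calls++ // counts how many times the expensive fetch runs
		return modelInfo{Details: "llama3 8B"}
	})
	info()
	info()
	fmt.Println("fetches:", calls) // fetches: 1, not once per use
}
```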
- 04 Jun, 2024 1 commit
Michael Yang authored
- 24 May, 2024 1 commit
Patrick Devine authored
- 21 May, 2024 1 commit
Josh Yan authored
- 18 May, 2024 1 commit
Patrick Devine authored
- 14 May, 2024 3 commits
Patrick Devine authored

Patrick Devine authored

Patrick Devine authored
- 07 May, 2024 1 commit
Tobias Gårdhus authored
- 01 May, 2024 1 commit
Bryce Reitano authored
* Add a /clear command
* change help messages
Co-authored-by: Patrick Devine <patrick@infrahq.com>
- 23 Apr, 2024 1 commit
Bruce MacDonald authored
This reverts commit fad00a85.
- 22 Apr, 2024 1 commit
Bruce MacDonald authored
- 29 Mar, 2024 1 commit
Patrick Devine authored
Co-authored-by: Michael Yang <mxyng@pm.me>
- 26 Mar, 2024 1 commit
Patrick Devine authored