- 22 Jul, 2025 1 commit
  - ycomiti authored
- 17 Jul, 2025 1 commit
  - frob authored
- 16 Jul, 2025 1 commit
  - Marcelo Fornet authored
- 11 Jul, 2025 1 commit
- 08 Jul, 2025 2 commits
  - Daniel Hiltgen authored
    Also removes stale model dir instructions for Windows.
  - Daniel Hiltgen authored
    The current scheduler algorithm of picking parallelism based on available VRAM complicates the upcoming dynamic layer memory allocation algorithm. This changes the default to 1, with the intent that going forward parallelism is explicit and will no longer be dynamically determined. Removal of the dynamic logic will come in a follow-up.
- 07 Jul, 2025 2 commits
  - Parth Sareen authored
  - Parth Sareen authored
- 05 Jul, 2025 1 commit
  - Daniel Hiltgen authored
- 23 Jun, 2025 1 commit
  - Daniel Hiltgen authored
    * Re-remove cuda v11: revert the revert and drop v11 support, requiring drivers newer than Feb 2023. This reverts commit c6bcdc42.
    * Simplify layout: with only one version of the GPU libraries, we can simplify things somewhat. (Jetsons still require special handling.)
    * Distinct sbsa variant for linux arm64: this avoids accidentally trying to load the sbsa cuda libraries on a Jetson system, which results in crashes.
    * Temporarily prevent rocm+cuda mixed loading.
- 18 Jun, 2025 1 commit
  - Jeffrey Morgan authored
    Removes an unused test under benchmark/.
- 07 Jun, 2025 2 commits
  - Krzysztof Jeziorny authored
  - Jeffrey Morgan authored
    This reverts commit 09430011.
- 06 Jun, 2025 1 commit
  - Hunter Wittenborn authored
- 04 Jun, 2025 1 commit
  - JasonHonKL authored
- 29 May, 2025 1 commit
  - Devon Rifkin authored
    - Both `/api/generate` and `/api/chat` now accept a `"think"` option that allows specifying whether thinking mode should be on or not
    - Templates get passed this new option so, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting
    - Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking support are also automatically inferred from templates
    - Thinking control & parsing is opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama
    - Adds parsing for thinking blocks in both streaming and non-streaming mode in both `/generate` and `/chat`
    - Updates the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session use the commands `/set think` or `/set nothink`
    - A `--hidethinking` option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like `ollama run qwen3 --think --hidethinking "my question here"` where you just want to see the answer but still want the benefits of thinking models
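A minimal sketch of using the new option: the endpoint and the `"think"` field come from the commit above, while the model name and prompt are just example values:

```python
import json

# Request body for POST /api/generate with thinking enabled.
body = {
    "model": "qwen3",                 # example model with thinking support
    "prompt": "Why is the sky blue?",
    "think": True,                    # the new opt-in option; omit it to keep old behavior
    "stream": False,
}
payload = json.dumps(body)
```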
- 24 May, 2025 1 commit
  - frob authored
- 13 May, 2025 1 commit
  - Daniel Hiltgen authored
    Bring back v11 until we can better warn users that their driver is too old. This reverts commit fa393554.
- 12 May, 2025 1 commit
  - Daniel Hiltgen authored
    The quantization PR didn't block all unsupported file types, which this PR fixes. It also updates the API docs to reflect the now-reduced set of supported types.
- 08 May, 2025 1 commit
  - Jeffrey Morgan authored
- 07 May, 2025 1 commit
  - Daniel Hiltgen authored
    This reduces the size of our Windows installer payloads by ~256M by dropping support for nvidia drivers older than Feb 2023. Hardware support is unchanged. Linux default bundle sizes are reduced by ~600M to 1G.
- 05 May, 2025 1 commit
  - Jeffrey Morgan authored
    Some options listed in api/types.go are not supported in newer models or have been deprecated. This is the first of a series of PRs to clean up the API options.
- 29 Apr, 2025 1 commit
  - Devon Rifkin authored
- 28 Apr, 2025 1 commit
  - Devon Rifkin authored
    This reverts commit 424f6486.
- 22 Apr, 2025 1 commit
  - Devon Rifkin authored
    * Increase default context length to 4096: we lower the default numParallel from 4 to 2 and use these "savings" to double the default context length from 2048 to 4096. We're memory-neutral in cases where we previously would've used numParallel == 4, but we add the following mitigation for some cases where we would have previously fallen back to 1x2048 due to low VRAM: we decide between 2048 and 4096 using a runtime check, choosing 2048 if we're on a one-GPU system with total VRAM of <= 4 GB. We purposefully don't check the available VRAM because we don't want the context window size to change unexpectedly based on the available VRAM. We plan on making the default even larger, but this is a relatively low-risk change we can make to quickly double it.
    * Fix tests: add an explicit context length so they don't get truncated. The code that converts -1 from being a signal for doing a runtime check isn't running as part of these tests.
    * Tweak small-gpu message.
    * Clarify context length default; also make it actually show up in `ollama serve --help`.
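The runtime check described above can be sketched as a standalone function (a hypothetical helper, not the actual Go code):

```python
def default_context_length(gpu_count: int, total_vram_gb: float) -> int:
    """Sketch of the low-VRAM mitigation: choose 2048 only on a
    single-GPU system with <= 4 GB of total (not available) VRAM,
    otherwise use the new 4096 default."""
    if gpu_count == 1 and total_vram_gb <= 4:
        return 2048
    return 4096
```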
- 15 Apr, 2025 2 commits
  - Devon Rifkin authored
    In #8215 syntax highlighting was added to most of the blocks, but there were a couple that were still being rendered as plaintext.
  - Devon Rifkin authored
    This is to prevent rendering bright red comments indicating invalid JSON when the comments are just supposed to be explanatory.
- 08 Apr, 2025 1 commit
  - frob authored
    * cleanup: remove OLLAMA_TMPDIR
    * cleanup: ollama doesn't use temporary executables anymore
    Co-authored-by: Richard Lyons <frob@cloudstaff.com>
- 01 Apr, 2025 1 commit
  - Bruce MacDonald authored
    With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint will allow clients to easily see what a model can do.
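A sketch of how a client might consume the new field: the `"capabilities"` list on the show response comes from the commit above, while the specific capability names used here are illustrative assumptions:

```python
def has_capability(show_response: dict, capability: str) -> bool:
    # "capabilities" is the list the show endpoint now returns;
    # a missing or empty list means the client should assume nothing extra.
    return capability in show_response.get("capabilities", [])

# Example response fragment (field values assumed for illustration).
resp = {"capabilities": ["completion", "vision"]}
```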
- 27 Mar, 2025 1 commit
  - Parth Sareen authored
- 25 Mar, 2025 1 commit
  - copeland3300 authored
- 21 Mar, 2025 2 commits
  - Bruce MacDonald authored
  - Parth Sareen authored
- 13 Mar, 2025 1 commit
  - Bradley Erickson authored
- 10 Mar, 2025 1 commit
  - frob authored
- 07 Mar, 2025 1 commit
  - rekcäH nitraM authored
    The problem with default.target is that it always points to the target that is currently started, so if you boot into single-user or rescue mode, Ollama still tries to start. I noticed this because it tried (and failed) to start repeatedly during a system update, where Ollama is definitely not wanted.
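The fix amounts to changing the unit's install target; a sketch of the relevant fragment of the service unit (the surrounding file is assumed, not quoted from the commit):

```ini
[Install]
# multi-user.target is only reached in normal multi-user boots, so the
# service no longer starts in rescue or single-user mode the way it did
# when installed under default.target.
WantedBy=multi-user.target
```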
- 05 Mar, 2025 1 commit
  - Daniel Hiltgen authored
    To stay under the 2G GitHub artifact limit, we're splitting ROCm out like we do on Linux.
- 04 Mar, 2025 1 commit
  - Blake Mizerany authored
    Previously, developers without the synctest experiment enabled would see build failures when running tests in some server/internal/internal packages that use the synctest package. This change eases the transition to the package by guarding its use with build tags. synctest is enabled in CI, so if a change breaks a synctest package it will break in CI even if it does not break locally. The developer docs have been updated to help with any confusion about why package tests pass locally but fail in CI.
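The guard described above presumably takes the form of a Go build constraint at the top of each affected test file; this fragment is a sketch under that assumption, not the actual file:

```go
//go:build goexperiment.synctest

// This file compiles only when the toolchain runs with
// GOEXPERIMENT=synctest (as CI does), so developers without the
// experiment enabled no longer see build failures.
package server
```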
- 27 Feb, 2025 1 commit
  - Daniel Hiltgen authored
    * Windows ARM build: skip cmake, and note it's unused in the developer docs.
    * Win: only check for ninja when we need it. On Windows ARM, the cim lookup fails, but we don't need ninja anyway.
- 25 Feb, 2025 1 commit
  - Chuanhui Liu authored