- 23 Dec, 2025 2 commits
  - Vallabh Mahajan authored
  - Daniel Hiltgen authored
    On Linux, look at the GTT memory information for iGPUs.
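    For context on where such numbers come from: for AMD devices the amdgpu driver exposes GTT totals and usage through sysfs, and free memory falls out as the difference. A minimal sketch of that read (not the actual Ollama code; helper names are made up):

    ```go
    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "strconv"
        "strings"
    )

    // readUint parses a single integer from an amdgpu sysfs file.
    func readUint(path string) (uint64, error) {
        b, err := os.ReadFile(path)
        if err != nil {
            return 0, err
        }
        return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
    }

    // gttFree reports free GTT bytes for one device, e.g. "card0".
    func gttFree(card string) (uint64, error) {
        dev := filepath.Join("/sys/class/drm", card, "device")
        total, err := readUint(filepath.Join(dev, "mem_info_gtt_total"))
        if err != nil {
            return 0, err
        }
        used, err := readUint(filepath.Join(dev, "mem_info_gtt_used"))
        if err != nil {
            return 0, err
        }
        return total - used, nil
    }

    func main() {
        free, err := gttFree("card0")
        if err != nil {
            fmt.Println("no GTT info:", err)
            return
        }
        fmt.Printf("GTT free: %d MiB\n", free>>20)
    }
    ```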
- 19 Dec, 2025 1 commit
  - Jesse Gross authored
    On the llama engine, when we compute the memory layout, we reserve a buffer to allow some flexibility for incorrect estimates. This buffer is subtracted from GPU free memory, and on GPUs with limited memory the subtraction may underflow. Fixes #13494
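    The underflow here is the usual unsigned-subtraction hazard, and clamping at zero is the standard guard. A sketch of the shape of the fix (names and the 512 MiB value are illustrative, not the actual patch):

    ```go
    package llm // package name illustrative

    // reserveBuffer is a safety margin subtracted from free VRAM to absorb
    // estimation error in the computed memory layout (value illustrative).
    const reserveBuffer uint64 = 512 << 20

    // usableMemory clamps the subtraction at zero: with uint64 arithmetic,
    // free-reserveBuffer would wrap to a huge value whenever
    // free < reserveBuffer, which is the failure mode behind #13494.
    func usableMemory(free uint64) uint64 {
        if free < reserveBuffer {
            return 0
        }
        return free - reserveBuffer
    }
    ```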
- 18 Dec, 2025 4 commits
  - Jeffrey Morgan authored
  - Parth Sareen authored
  - Grace authored
- 17 Dec, 2025 3 commits
  - Daniel Hiltgen authored
    * Revert "add support for NVIDIA Nemotron 3 Nano"
      This reverts commit e7d2ae9d69421012e9a8765c06a3fdf0e45b12f3.
    * GGML update to 380b4c984
      Remove MaskBatchPadding as GGML_KQ_MASK_PAD is no longer present (no padding required)
    * update to c45f89d55
    * ec98e2002 solar pro needed more adjusting - needs verification
    * review comments
  - Parth Sareen authored
  - Grace authored
- 16 Dec, 2025 8 commits
  - Michael Yang authored
  - Bruce MacDonald authored
    Refactored the ConfigV2 and RootFS types from server/images.go into a new types/model/config.go file under the model package, and updated all references to use model.ConfigV2 and model.RootFS. This allows these types to be used in other projects without having to compile the C code in the llama package.
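    The practical shape of that move, roughly: a small, dependency-light package that other projects can import without touching the llama package's cgo build. A sketch of the types (field set abridged and partly assumed, not the verbatim definitions):

    ```go
    package model

    // ConfigV2 describes a model image's config object. The exact field
    // set shown here is an assumption for illustration.
    type ConfigV2 struct {
        ModelFormat   string   `json:"model_format"`
        ModelFamily   string   `json:"model_family"`
        ModelFamilies []string `json:"model_families"`
        ModelType     string   `json:"model_type"`
        FileType      string   `json:"file_type"`
        Architecture  string   `json:"architecture"`
        OS            string   `json:"os"`
        RootFS        RootFS   `json:"rootfs"`
    }

    // RootFS mirrors the OCI image-spec rootfs section.
    type RootFS struct {
        Type    string   `json:"type"`
        DiffIDs []string `json:"diff_ids"`
    }
    ```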
  - Michael Yang authored
    slog is already lazily evaluated, so this code is completely redundant.
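    The laziness in question: slog's logging methods check the handler's level before doing any work, and values implementing slog.LogValuer are only resolved if a record is actually emitted, so an explicit Enabled() guard buys nothing. A small demonstration (illustrative, not the removed code):

    ```go
    package main

    import (
        "context"
        "log/slog"
        "os"
    )

    // expensive defers costly formatting via slog.LogValuer: LogValue is
    // only called when the record passes the handler's level check.
    type expensive struct{}

    func (expensive) LogValue() slog.Value {
        return slog.StringValue(costlyDump())
    }

    func costlyDump() string { return "...big serialized state..." }

    func main() {
        log := slog.New(slog.NewTextHandler(os.Stderr,
            &slog.HandlerOptions{Level: slog.LevelInfo}))

        // Redundant guard: slog performs this same check internally.
        if log.Enabled(context.Background(), slog.LevelDebug) {
            log.Debug("state", "dump", expensive{})
        }

        // Equivalent, and costlyDump still never runs at Info level.
        log.Debug("state", "dump", expensive{})
    }
    ```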
  - Michael Yang authored
    register bpe tokenizer, which enables granite-embedding
  - Parth Sareen authored
  - Parth Sareen authored
    Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
  - Grace authored
  - Michael Yang authored
    * qwen25vl: bump max pixels
    * qwen25vl: mrope fix qwen2.5vl window
    * qwen25vl: vision rope
- 15 Dec, 2025 6 commits
  - Parth Sareen authored
  - Grace authored
  - Nhan Nguyen authored
    The ggml/src/CMakeLists.txt uses GGML_VERSION_MAJOR for the shared library SOVERSION property, but the GGML_VERSION_* variables were not defined when building from ollama's CMakeLists.txt. This caused libggml-base.so to be named with a literal "SOVERSION" suffix (libggml-base.so.SOVERSION) instead of the actual version number (libggml-base.so.0). The fix adds the required GGML_VERSION_* variables before including the ggml subdirectory. Fixes #13436
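    The shape of the fix, with illustrative version numbers (the real values track the vendored ggml): define the variables before add_subdirectory so the SOVERSION property expands to an actual number.

    ```cmake
    # Sketch of the top-level CMakeLists.txt change; values illustrative.
    set(GGML_VERSION_MAJOR 0)
    set(GGML_VERSION_MINOR 0)
    set(GGML_VERSION_PATCH 0)
    set(GGML_VERSION
        "${GGML_VERSION_MAJOR}.${GGML_VERSION_MINOR}.${GGML_VERSION_PATCH}")

    # ggml/src/CMakeLists.txt can now set SOVERSION correctly, producing
    # libggml-base.so.0 instead of libggml-base.so.SOVERSION.
    add_subdirectory(ggml)
    ```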
  - Parth Sareen authored
  - Eva H authored
  - Daniel Hiltgen authored
    This reverts commit 56f754f46b87749581f73ef3625314bb0e51bfed.
- 13 Dec, 2025 2 commits
  - Jeffrey Morgan authored
  - Jeffrey Morgan authored
- 12 Dec, 2025 10 commits
  - Daniel Hiltgen authored
    * flash attn: add auto mode for llama engine
      If the user does not specify fa in the environment, use auto mode.
    * review comments
    * ensure kv cache quantized types have FA explicitly enabled
    * additional review comments
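    A sketch of the resolution order this describes (the function and return values are assumptions, not the actual patch): unset means auto, and a quantized KV cache forces flash attention on.

    ```go
    package llm // package name illustrative

    // flashAttnMode resolves the llama engine's flash-attention setting.
    // envVal is the raw OLLAMA_FLASH_ATTENTION value ("" when unset) and
    // kvCacheType the configured KV cache type (e.g. "f16", "q8_0").
    func flashAttnMode(envVal, kvCacheType string) string {
        if envVal == "" {
            return "auto" // no user preference: let the engine decide
        }
        if kvCacheType != "" && kvCacheType != "f16" {
            return "on" // quantized KV cache types require FA enabled
        }
        if envVal == "1" || envVal == "true" {
            return "on"
        }
        return "off"
    }
    ```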
  - Jeffrey Morgan authored
  - Daniel Hiltgen authored
    This changes the default behavior to use the Ollama engine for supported models, while retaining the ability to disable the Ollama engine and fall back to the Llama engine. Models in the OllamaEngineRequired list will always run on the Ollama engine.
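    Roughly the selection logic described, as a sketch (names other than OllamaEngineRequired, and the map contents, are assumptions):

    ```go
    package server // package name illustrative

    // OllamaEngineRequired would list model architectures that only the
    // Ollama engine implements (contents here are placeholders).
    var OllamaEngineRequired = map[string]bool{
        "qwen25vl": true,
    }

    // pickEngine chooses the runner for a model: the Ollama engine by
    // default for supported models, the llama engine when the user has
    // opted out or the model is unsupported, and no fallback at all for
    // architectures in OllamaEngineRequired.
    func pickEngine(arch string, supported, userDisabled bool) string {
        if OllamaEngineRequired[arch] {
            return "ollama"
        }
        if userDisabled || !supported {
            return "llama"
        }
        return "ollama"
    }
    ```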
  - Eva H authored
  - Eva H authored
  - Devon Rifkin authored
    docs: add docs for v1/responses and rework openai compat section

    I reworked the examples to be separated by topic and to be fully runnable (i.e., they now log output instead of just suggesting how a call might be made). We now use `<CodeGroup>`s so that each example has a dropdown on the docs site for users to choose from, which makes the examples a lot more digestible (since you only see approx 1/3 of the code you used to). I also added a new tool to extract code examples into files so that it's easier to actually run them and check that they work.

    Example:

    ```shell
    go run docs/tools/extract-examples/main.go docs/api/openai-compatibility.mdx
    ```

    Output:

    ```
    Extracting code examples to: /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368
    - 01_basic.py
    - 01_basic.js
    - 01_basic.sh
    - 02_responses.py
    - 02_responses.js
    - 02_responses.sh
    - 03_vision.py
    - 03_vision.js
    - 03_vision.sh
    Extracted 9 file(s) to /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368

    To run examples:
    cd /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368
    npm install  # for JS examples
    then run individual files with `node file.js`, `python file.py`, `bash file.sh`
    ```

    In the future we should consider actually running the examples in CI and having some sort of acceptance test so we can automatically detect when our examples break. So this is just a start in that direction.

    * Update docs/api/openai-compatibility.mdx
      Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
    * Update docs/api/openai-compatibility.mdx
      Co-authored-by: Parth Sareen <parth.sareen@ollama.com>

    Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
  - Parth Sareen authored
    * openai: add tool call appending to previous asst message
    * add tests for thinking appending
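    The conversion rule under test, roughly (types simplified and assumed, not the actual converter): an assistant message that only carries tool calls gets folded into the preceding assistant message instead of starting a new one.

    ```go
    package openai // package name illustrative

    // Message is a simplified stand-in for the chat message type.
    type Message struct {
        Role      string
        Content   string
        ToolCalls []string // real type carries structured tool calls
    }

    // appendMessage merges an assistant message that only carries tool
    // calls into the previous assistant message, if there is one.
    func appendMessage(msgs []Message, m Message) []Message {
        n := len(msgs)
        if n > 0 && m.Role == "assistant" && m.Content == "" &&
            msgs[n-1].Role == "assistant" {
            msgs[n-1].ToolCalls = append(msgs[n-1].ToolCalls, m.ToolCalls...)
            return msgs
        }
        return append(msgs, m)
    }
    ```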
  - Alexander Gusak authored
  - JJ authored
    Correct Markdown syntax for Swollama GitHub and DocC documentation links
  - Jeffrey Morgan authored
- 11 Dec, 2025 4 commits
  - Devon Rifkin authored
    Only supporting the stateless part of the API. Doc updates to come once this is shipped. Closes: #9659
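    Stateless means each call carries its full input and the server keeps no conversation state between requests. A minimal request against the endpoint might look like this (the model name and response handling are illustrative):

    ```go
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        body, _ := json.Marshal(map[string]any{
            "model": "llama3.2", // illustrative model name
            "input": "Say hello in one word.",
        })
        resp, err := http.Post("http://localhost:11434/v1/responses",
            "application/json", bytes.NewReader(body))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        out, _ := io.ReadAll(resp.Body)
        fmt.Println(string(out)) // Responses-API-shaped JSON
    }
    ```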
  - nicole pardal authored
    This PR detects embedding models and sets batch_size = context_size so the full input fits in a single batch. Previously, if the batch size was smaller than the input, tokens could be split across batches and cause a SIGTRAP crash. This change ensures all tokens stay in one batch and prevents the crash. Fixes: #12938 #13054
    Co-authored-by: Jesse Gross <jesse@ollama.com>
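    The rule being applied, in sketch form (names assumed, not the actual patch):

    ```go
    package llm // package name illustrative

    // resolveBatchSize pins an embedding model's batch size to its context
    // length so the entire input lands in a single batch; splitting tokens
    // across batches triggered the SIGTRAP in #12938 and #13054.
    func resolveBatchSize(isEmbedding bool, requested, contextSize int) int {
        if isEmbedding {
            return contextSize
        }
        return requested
    }
    ```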
  - Jeffrey Morgan authored
  - Jeffrey Morgan authored