Commits · 3f3083673496adcc0429ff213dabb0c4fcbe21a2 · OpenDAS / ollama

03 Dec, 2025 1 commit

CUDA: filter devices on secondary discovery (#13317) · 3f308367

Daniel Hiltgen authored Dec 03, 2025

We now do a deeper probe of CUDA devices to verify that the library version has
the correct compute capability coverage for the device. Because ROCm also
interprets the CUDA env var to filter AMD devices, we try to avoid setting it,
since setting it leads to problems in mixed-vendor systems. However, without
setting it for this deeper probe, each CUDA library subprocess discovers all
CUDA GPUs, and on systems with many GPUs this can lead to timeouts. The fix is
to turn on the CUDA visibility env var just for this deeper probe use case.

3f308367

02 Dec, 2025 7 commits
- Update user message format for temperature query (#13256) · cc9555af
  Nathan Hook authored Dec 02, 2025
  
  cc9555af
- Add Vulkan GPU support instructions in development.md (#13265) · 20aee967
  hello_world authored Dec 03, 2025
```
Added Vulkan SDK installation instructions and environment variable setup for building with Vulkan support.
```
  20aee967
- test: avoid ministral tools test on low vram (#13302) · 18b5958d
  Daniel Hiltgen authored Dec 02, 2025
```
Avoid hitting test timeouts
```
  18b5958d
- llm: Don't always evict models on CPU-only systems · 5317202c
  Jesse Gross authored Nov 25, 2025
```
Model eviction happens when we have at least one other model
loaded and are unable to load all layers into VRAM. However, on
CPU-only systems we can never load layers into VRAM, so this
constantly triggered eviction.

Fixes #13227
```
  5317202c
- test: add ministral-3 (#13300) · d771043e
  Daniel Hiltgen authored Dec 02, 2025
  
  d771043e
- CUDA: verify CC is supported by target library (#13298) · f8f10718
  Daniel Hiltgen authored Dec 02, 2025
  
  f8f10718
- model: ministral w/ llama4 scaling (#13292) · d3e0a0de
  Patrick Devine authored Dec 01, 2025
```
This change:

* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling
* includes a new ministral parser for parsing reasoning and tool calling

---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
```
  d3e0a0de
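
The eviction rule from commit 5317202c in the list above ("llm: Don't always evict models on CPU-only systems") can be sketched as a single predicate. This is an illustration under assumptions: `shouldEvict` and its parameters are hypothetical names, and the real scheduler likely considers more state than shown here.

```go
package main

import "fmt"

// shouldEvict is a hypothetical helper, not Ollama's scheduler code.
// It reports whether loading a model should trigger eviction of others.
func shouldEvict(otherModelsLoaded, layersInVRAM, totalLayers int, hasGPU bool) bool {
	if !hasGPU {
		// CPU-only: no layer can ever be placed in VRAM, so a partial
		// load says nothing about memory pressure and eviction cannot help.
		return false
	}
	return otherModelsLoaded > 0 && layersInVRAM < totalLayers
}

func main() {
	fmt.Println(shouldEvict(1, 0, 32, false)) // false
	fmt.Println(shouldEvict(1, 20, 32, true)) // true
}
```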
01 Dec, 2025 3 commits

win: warn if ggml-base detected in PATH (#13289) · 55417275

Daniel Hiltgen authored Dec 01, 2025

If the user has somehow installed another GGML-based app which places a
ggml-base lib somewhere in their PATH, we can experience runtime problems
due to incompatibilities. This change adds a warning message if we detect
a ggml-base outside of our install location to aid in troubleshooting.

55417275

api/client: handle non-json streaming errors (#13007) · 5b6a8e60

Bruce MacDonald authored Dec 01, 2025

While processing the response stream during a chat or generation, if an error occurs it is parsed and returned to the user. The existing code assumed the response would be valid JSON, which is not a safe assumption and caused cryptic error messages to be displayed due to parsing failures:
`invalid character 'i' looking for beginning of value`

This change updates the stream function to return the raw error string if it can't be parsed as JSON. This should help with debugging by making sure the actual error reaches the user.

5b6a8e60
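
The fallback described here can be sketched in a few lines. This is a simplified illustration, not the client code itself: `streamError` is a hypothetical name, and the `{"error": "..."}` payload shape is an assumption about what a well-formed error looks like.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// streamError converts an error payload from the response stream into a Go
// error. If the payload is not JSON (or carries no error field), the raw
// text is returned instead of a JSON parsing failure.
func streamError(body []byte) error {
	var payload struct {
		Error string `json:"error"`
	}
	if err := json.Unmarshal(body, &payload); err != nil || payload.Error == "" {
		return errors.New(string(body)) // fall back to the raw text
	}
	return errors.New(payload.Error)
}

func main() {
	fmt.Println(streamError([]byte(`{"error":"model not found"}`))) // model not found
	fmt.Println(streamError([]byte("internal server error")))       // internal server error
}
```

Without the fallback, the second call would surface the `invalid character 'i' looking for beginning of value` message quoted above rather than the server's own text.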

jetpack: require exact match or skip cuda_jetpack* (#13288) · 467bbc0d

Daniel Hiltgen authored Dec 01, 2025

The cuda_jetpack libs will enumerate discrete GPUs on SBSA systems,
which leads to runtime failures from missing kernels. This fix
requires an exact match to enable jetpacks instead of relying on
enumeration to filter out supported libraries.

467bbc0d
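
One way to picture the "exact match or skip" rule is the predicate below. It is a sketch under assumptions: `useLibrary`, the `detected` parameter, and the directory names in `main` are illustrative and are not taken from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// useLibrary reports whether a bundled CUDA library directory should be
// considered during discovery. detected is the jetpack variant identified
// for this host, or "" when none was identified. Hypothetical helper.
func useLibrary(libDir, detected string) bool {
	if strings.HasPrefix(libDir, "cuda_jetpack") {
		return libDir == detected // exact match required, otherwise skip
	}
	return true // other libraries are still filtered by enumeration
}

func main() {
	// Directory names are placeholders for illustration.
	for _, dir := range []string{"cuda_v12", "cuda_jetpack5", "cuda_jetpack6"} {
		fmt.Println(dir, useLibrary(dir, "cuda_jetpack6"))
	}
}
```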

30 Nov, 2025 1 commit
- .gitattributes: add app/webview to linguist-vendored (#13274) · 6d9f9323
  Jeffrey Morgan authored Nov 29, 2025
  
  6d9f9323
29 Nov, 2025 1 commit
- docs: fix output formatting in faq.mdx (#13231) · 0c248960
  Ondrej Kokes authored Nov 29, 2025
```
There were a few Markdown typos in one FAQ answer. It now renders as a proper ASCII table.
```
  0c248960
26 Nov, 2025 1 commit
- docs: remove deprecated parameters (#13237) · 8b1b89a9
  EntropyYue authored Nov 26, 2025
  
  8b1b89a9
20 Nov, 2025 6 commits
- app/cmd: update ollama help to navigate to ollama doc instead of github page (#13174) · 47e272c3
  Eva H authored Nov 20, 2025
  
  47e272c3
- app: open app instead of always navigating to / on connect (#13164) · 417a81fd
  Jeffrey Morgan authored Nov 20, 2025
  
  417a81fd
- discovery: fix cuda overlap case (#13176) · dba62ff3
  Daniel Hiltgen authored Nov 20, 2025
```
Recent refactoring introduced a regression in filtering CUDA overlap to favor the newest supported version.
```
  dba62ff3
- Parser for Cogito v2 (#13145) · d70e9355
  Grace authored Nov 19, 2025
  
  d70e9355
- deepseek2: upgrade to run v3+ models (#13166) · 5c1063df
  Michael Yang authored Nov 19, 2025
```
The check for MLA omits v3 and r1, which should not return unsupported.
Instead, check the tokenizer for compatibility.
```
  5c1063df
- kvcache: Run tests both with and without PermutedV · cb485b20
  Jesse Gross authored Nov 19, 2025
```
The causal cache can store data differently depending on what is
best for the backend. We should run tests both ways.
```
  cb485b20
19 Nov, 2025 10 commits

nomic-embed: nomic-embed-text defaulted to ollama runner (#13144) · b2af5096
nicole pardal authored Nov 19, 2025

b2af5096
chore: mark vulkan shaders as vendored files · eac5b8bf
Michael Yang authored Nov 18, 2025

eac5b8bf
models: enable deepseek2 (deepseek v3.1 w/ MLA) on the new engine (#13151) · 604e43b2
Patrick Devine authored Nov 18, 2025

604e43b2

kvcache: Use SetRows to store cache data · 53985b3c

Jesse Gross authored Aug 18, 2025

We currently copy data into the KV cache in contiguous buffers using
ggml_cpy(). ggml_set_rows() was introduced to allow scatter operations
so that contiguous buffers are no longer required. The primary direct
benefit of this is that we no longer need to perform defragmentation.

However, GGML recently removed an optimization for ggml_cpy() and
we picked it up in 544b6739 "ggml update to b6840 (#12791)". This
caused a roughly 40% drop in token generation performance on CUDA
due to CUDA graphs no longer being used. By switching to
ggml_set_rows(), the original optimization is no longer necessary
and CUDA performance is restored.

Fixes #13112

53985b3c

ggml: Automatically make tensors contiguous on reshape · b6e02cbb

Jesse Gross authored Nov 18, 2025

GGML requires tensors to be contiguous for reshape; if this is
not the case, an assertion fails. Making a tensor contiguous is an
expensive operation, so it's best to do it lazily, when it is
actually required, rather than ahead of time when it may not
be needed.

b6e02cbb

Renderer for Cogito v2 (#13139) · 91935631
Grace authored Nov 18, 2025

91935631
nomic-embed-text model implementation (#13071) · 8de30b56
nicole pardal authored Nov 18, 2025

8de30b56

win: exit instead of abort (#13138) · 485da9fd

Daniel Hiltgen authored Nov 18, 2025

Calling abort on Windows triggers the C++ runtime to attempt a debugger
attach, which causes the crashed runners to hang instead of exiting, leading
to a timeout instead of a fast failure during discovery.

485da9fd

cuda: skip large batches · 0796d79d
Michael Yang authored Nov 18, 2025
```
CUDA panics on batches larger than 1024, so skip those and fall back to
CPU
```
0796d79d
deepseekocr · 92981ae3
Michael Yang authored Oct 31, 2025

92981ae3

18 Nov, 2025 7 commits
- docs: fix typo in vscode.mdx (#13116) · 8ed1adf3
  Lhiam Andrei Lingco authored Nov 19, 2025
  
  8ed1adf3
- fix(tokenizer): add special tokens to empty inputs (#13091) · 440a3823
  Michael Yang authored Nov 18, 2025
  
  440a3823
- migrate to golangci-lint v2 (#13109) · 718961de
  Michael Yang authored Nov 18, 2025
```
* migrate to golangci-lint v2
* copyloopvar
```
  718961de
- docs: add Void Editor to community integrations (#13124) · 330f62a7
  SamareshSingh authored Nov 17, 2025
```
Void is an open source AI code editor and Cursor alternative that supports
Ollama. It's built on VS Code and allows users to connect directly to Ollama
for private LLM usage without going through a middleman backend.

Key features:
- Open source Cursor alternative
- Direct Ollama integration
- VS Code fork with full compatibility
- Agent mode and MCP support
- Works with any open source model

Fixes #12919
Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
```
  330f62a7
- Add deepseek v3.1 (#13063) · 584e2d64
  Grace authored Nov 17, 2025
```
* Add mla for flash attention
* Revert to using chunks
```
  584e2d64
- app/cmd: restrict ollama:// URL scheme to supported paths (#13120) · 1fd4cb87
  Eva H authored Nov 17, 2025
  
  1fd4cb87
- discover: Support cgroups cores and memory limitations (#10292) · 4aba2e8b
  Cerussite authored Nov 18, 2025
```
* Add support for cgroups cores and memory limitations

* fix compile error and add logs

* remove cpu info log
```
  4aba2e8b
17 Nov, 2025 3 commits

bring back sysfs based VRAM information for AMD (#12871) · 2f36d769

Daniel Hiltgen authored Nov 17, 2025

* build: optimize dockerfile context for iterating

This moves the copy of the source into the layer AFTER
doing software installs so we don't have to go through
the RPM install for cuda, etc. every time you touch a
source file.

* amd: implement linux sysfs based VRAM lookup

This adds a C++ implementation of sysfs DRM VRAM discovery
for more accurate free VRAM data on linux for AMD GPUs.

2f36d769

ci: fix missing vulkan binaries in linux bundles (#13123) · 399eacf4
Daniel Hiltgen authored Nov 17, 2025

399eacf4
app/ui: fix to point ollama client to ui backend in dev mode (#13079) · 231cc878
Eva H authored Nov 17, 2025

231cc878