1. 19 Nov, 2025 3 commits
  2. 18 Nov, 2025 7 commits
  3. 17 Nov, 2025 4 commits
  4. 16 Nov, 2025 6 commits
  5. 14 Nov, 2025 2 commits
  6. 13 Nov, 2025 9 commits
  7. 12 Nov, 2025 3 commits
  8. 11 Nov, 2025 6 commits
    • docs/openapi: document that delete and copy responses are empty (#13055) · 15968714
      Bruce MacDonald authored
      Some route endpoints return an empty body with a 200 OK status. These should be documented in the OpenAPI doc. Note that the previously documented deletion response was incorrect.
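      This is documentation-only, but a minimal sketch of the behavior being described may help, assuming a Gin-style handler (Ollama's server uses Gin; the route wiring here is illustrative, not the actual implementation):

      ```go
      package main

      import (
          "net/http"

          "github.com/gin-gonic/gin"
      )

      func main() {
          r := gin.Default()
          // A delete-style endpoint that succeeds with 200 OK and no body;
          // its OpenAPI entry therefore needs a content-free 200 response.
          r.DELETE("/api/delete", func(c *gin.Context) {
              c.Status(http.StatusOK)
          })
          r.Run()
      }
      ```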
    • llm: Prefer dedicated GPUs over iGPUs when allocating memory · 8bf38552
      Jesse Gross authored
      We currently assign model layers to GPUs according to free VRAM,
      which assumes that GPU performance is roughly equal. This does not
      work well for mixed dGPU and iGPU systems, because iGPUs typically
      use system memory, which is plentiful but slow. This change instead
      assigns layers to dGPUs first and then to iGPUs.

      In the future, this could be generalized to a more fine-grained
      notion of GPU performance, but the dGPU vs. iGPU gap is the most
      extreme case.
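      Not the actual allocator, but a toy Go sketch of the ordering described above: dGPUs sort ahead of iGPUs and are filled first (the GPU struct, fixed per-layer size, and greedy fill are simplifying assumptions):

      ```go
      package main

      import (
          "fmt"
          "sort"
      )

      // GPU describes one device; Integrated marks an iGPU.
      type GPU struct {
          Name       string
          FreeMemory uint64 // free VRAM (or system memory for iGPUs), bytes
          Integrated bool
      }

      // assignLayers places layers onto dedicated GPUs before integrated
      // ones; within each class, devices with more free memory come first.
      func assignLayers(gpus []GPU, numLayers int, layerSize uint64) map[string]int {
          sorted := append([]GPU(nil), gpus...)
          sort.SliceStable(sorted, func(i, j int) bool {
              if sorted[i].Integrated != sorted[j].Integrated {
                  return !sorted[i].Integrated // dGPUs first
              }
              return sorted[i].FreeMemory > sorted[j].FreeMemory
          })

          assigned := make(map[string]int)
          remaining := numLayers
          for _, g := range sorted {
              if remaining == 0 {
                  break
              }
              fit := int(g.FreeMemory / layerSize)
              if fit > remaining {
                  fit = remaining
              }
              assigned[g.Name] = fit
              remaining -= fit
          }
          return assigned
      }

      func main() {
          gpus := []GPU{
              {Name: "igpu0", FreeMemory: 32 << 30, Integrated: true},
              {Name: "dgpu0", FreeMemory: 8 << 30},
          }
          // The iGPU has more free memory, but the dGPU is filled first:
          // map[dgpu0:16 igpu0:24]
          fmt.Println(assignLayers(gpus, 40, 512<<20))
      }
      ```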
    • llm: Separate llamaServer and ollamaServer code paths · b13fbad0
      Jesse Gross authored
      Originally, llamaServer represented the old memory estimates, which
      could be used with either the old or the new engine. ollamaServer
      was used only for the new estimates and the new engine. Since these
      implementations did not map directly to an engine, engine-specific
      code leaked into common code paths.

      Now that the new estimates are always used for the new engine,
      there is a direct mapping between server type and engine. This
      change moves most of the engine-specific code into the
      corresponding implementation, making things easier to understand.
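      A minimal Go sketch of the resulting shape, with illustrative names rather than Ollama's actual types: one common interface, one implementation per engine, and a single constructor, so shared code paths never branch on engine:

      ```go
      package main

      import "fmt"

      // server is the common surface used by shared code paths.
      type server interface {
          Load(model string) error
          Engine() string
      }

      // llamaServer wraps the legacy llama.cpp engine.
      type llamaServer struct{}

      func (llamaServer) Load(model string) error { return nil }
      func (llamaServer) Engine() string          { return "llama" }

      // ollamaServer wraps the new Ollama engine.
      type ollamaServer struct{}

      func (ollamaServer) Load(model string) error { return nil }
      func (ollamaServer) Engine() string          { return "ollama" }

      // newServer chooses the implementation once; callers never need
      // engine-specific branches afterwards.
      func newServer(newEngine bool) server {
          if newEngine {
              return ollamaServer{}
          }
          return llamaServer{}
      }

      func main() {
          s := newServer(true)
          fmt.Println(s.Engine()) // ollama
      }
      ```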
    • llm: Use Ollama engine memory layouts for both old and new engines · f560bd07
      Jesse Gross authored
      Currently, both the old and new engines have their own code to
      calculate how much memory a model requires and to lay out its
      layers onto GPUs. This change reuses the new engine's layout code
      for the old engine as well, bringing them closer together. The old
      engine continues to use its current method of estimating required
      memory.

      This reduces maintenance effort and improves consistency, as new
      features only need to be implemented in one place. The newer code
      is also more accurate, especially with multiple GPUs.
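      A toy Go sketch of the split described above, using assumed names and signatures rather than the real API: one shared layout routine, fed by engine-specific memory estimates:

      ```go
      package main

      import "fmt"

      type estimate struct {
          LayerSize uint64 // per-layer memory requirement, bytes
          NumLayers int
      }

      // layoutLayers stands in for the shared (new-engine) placement code:
      // it decides how many layers fit in each GPU's free VRAM, in order.
      func layoutLayers(e estimate, freeVRAM []uint64) []int {
          placed := make([]int, len(freeVRAM))
          remaining := e.NumLayers
          for i, free := range freeVRAM {
              fit := int(free / e.LayerSize)
              if fit > remaining {
                  fit = remaining
              }
              placed[i] = fit
              remaining -= fit
          }
          return placed
      }

      // Each engine keeps its own estimator; only the layout is shared.
      func oldEngineEstimate() estimate { return estimate{LayerSize: 600 << 20, NumLayers: 32} }
      func newEngineEstimate() estimate { return estimate{LayerSize: 512 << 20, NumLayers: 32} }

      func main() {
          vram := []uint64{8 << 30, 6 << 30}
          fmt.Println(layoutLayers(oldEngineEstimate(), vram)) // [13 10]
          fmt.Println(layoutLayers(newEngineEstimate(), vram)) // [16 12]
      }
      ```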