Commits · 53985b3c4d94f22517e4090696a5b8ecd06caedb · OpenDAS / ollama

19 Nov, 2025 7 commits

kvcache: Use SetRows to store cache data · 53985b3c

Jesse Gross authored Aug 18, 2025

We currently copy data into the KV cache in contiguous buffers using
ggml_cpy(). ggml_set_rows() was introduced to allow scatter operations
so that contiguous buffers are no longer required. The primary direct
benefit of this is that we no longer need to perform defragmentation.

However, GGML recently removed an optimization for ggml_cpy() and
we picked it up in 544b6739 "ggml update to b6840 (#12791)". This
caused a roughly 40% drop in token generation performance on CUDA
due to CUDA graphs no longer being used. By switching to
ggml_set_rows(), the original optimization is no longer necessary
and CUDA performance is restored.

Fixes #13112

53985b3c
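The difference between the two store strategies in 53985b3c can be sketched in plain Go. This is only an illustration of contiguous copy versus row scatter on a toy cache; `newCache`, `copyContiguous`, and `setRows` are hypothetical names, not the GGML or Ollama API.

```go
package main

import "fmt"

// newCache allocates a toy cache of n rows with dim values each.
func newCache(n, dim int) [][]float32 {
	c := make([][]float32, n)
	for i := range c {
		c[i] = make([]float32, dim)
	}
	return c
}

// copyContiguous mimics a ggml_cpy-style store: every new row must fit
// into one contiguous run of cache rows beginning at start.
func copyContiguous(cache [][]float32, start int, rows [][]float32) error {
	if start < 0 || start+len(rows) > len(cache) {
		return fmt.Errorf("no contiguous run of %d rows at %d", len(rows), start)
	}
	for i, r := range rows {
		copy(cache[start+i], r)
	}
	return nil
}

// setRows mimics a ggml_set_rows-style scatter: row i is written to cache
// row idx[i], so the free slots do not have to be adjacent.
func setRows(cache [][]float32, idx []int, rows [][]float32) error {
	if len(idx) != len(rows) {
		return fmt.Errorf("got %d indices for %d rows", len(idx), len(rows))
	}
	for i, r := range rows {
		if idx[i] < 0 || idx[i] >= len(cache) {
			return fmt.Errorf("index %d out of range", idx[i])
		}
		copy(cache[idx[i]], r)
	}
	return nil
}

func main() {
	rows := [][]float32{{1, 1}, {2, 2}}

	a := newCache(4, 2)
	if err := copyContiguous(a, 1, rows); err != nil {
		panic(err)
	}
	fmt.Println(a) // rows 1 and 2 hold the new data

	b := newCache(4, 2)
	if err := setRows(b, []int{0, 3}, rows); err != nil {
		panic(err)
	}
	fmt.Println(b) // rows 0 and 3 hold the new data
}
```

Because the scatter version accepts any set of free row indices, a fragmented cache can be filled without first compacting it, which is the property the commit message relies on.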

ggml: Automatically make tensors contiguous on reshape · b6e02cbb

Jesse Gross authored Nov 18, 2025

GGML requires tensors to be contiguous for reshape and if
this is not the case, it fails an assertion. Making a tensor
contiguous is an expensive operation, so it's best to do it lazily
when it is actually required rather than ahead of time when it may
not be needed.

b6e02cbb

Renderer for Cogito v2 (#13139) · 91935631

Grace authored Nov 18, 2025

91935631

nomic-embed-text model implementation (#13071) · 8de30b56

nicole pardal authored Nov 18, 2025

8de30b56

win: exit instead of abort (#13138) · 485da9fd

Daniel Hiltgen authored Nov 18, 2025

Calling abort on Windows triggers the C++ runtime to attempt a debugger
attach, which causes the crashed runners to hang instead of exit, leading
to a timeout instead of a fast failure during discovery.

485da9fd

cuda: skip large batches · 0796d79d
Michael Yang authored Nov 18, 2025
```
CUDA panics on batches larger than 1024, so skip those and fall back to
the CPU
```
0796d79d
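The fallback rule in 0796d79d amounts to a size check before dispatch. A minimal sketch, assuming a hypothetical `pickBackend` helper and string backend names; the real change lives in the CUDA backend code and is not reproduced here.

```go
package main

import "fmt"

// maxCUDABatch is the limit quoted in the commit message.
const maxCUDABatch = 1024

// pickBackend is a hypothetical helper: batches above the limit are
// skipped on CUDA and handled by the CPU instead.
func pickBackend(batchSize int, cudaAvailable bool) string {
	if cudaAvailable && batchSize <= maxCUDABatch {
		return "cuda"
	}
	return "cpu"
}

func main() {
	for _, n := range []int{512, 1024, 2048} {
		fmt.Printf("batch %d -> %s\n", n, pickBackend(n, true))
	}
}
```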
deepseekocr · 92981ae3
Michael Yang authored Oct 31, 2025

92981ae3

18 Nov, 2025 7 commits
- docs: fix typo in vscode.mdx (#13116) · 8ed1adf3
  Lhiam Andrei Lingco authored Nov 19, 2025
  
  8ed1adf3
- fix(tokenizer): add special tokens to empty inputs (#13091) · 440a3823
  Michael Yang authored Nov 18, 2025
  
  440a3823
- migrate to golangci-lint v2 (#13109) · 718961de
  Michael Yang authored Nov 18, 2025
```
* migrate to golangci-lint v2
* copyloopvar
```
  718961de
- docs: add Void Editor to community integrations (#13124) · 330f62a7
  SamareshSingh authored Nov 17, 2025
```
Void is an open source AI code editor and Cursor alternative that supports
Ollama. It's built on VS Code and allows users to connect directly to Ollama
for private LLM usage without going through a middleman backend.

Key features:
- Open source Cursor alternative
- Direct Ollama integration
- VS Code fork with full compatibility
- Agent mode and MCP support
- Works with any open source model

Fixes #12919
Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
```
  330f62a7
- Add deepseek v3.1 (#13063) · 584e2d64
  Grace authored Nov 17, 2025
```
* Add mla for flash attention
* Revert to using chunks
```
  584e2d64
- app/cmd: restrict ollama:// URL scheme to supported paths (#13120) · 1fd4cb87
  Eva H authored Nov 17, 2025
  
  1fd4cb87
- discover: Support cgroups cores and memory limitations (#10292) · 4aba2e8b
  Cerussite authored Nov 18, 2025
```
* Add support for cgroups cores and memory limitations

* fix compile error and add logs

* remove cpu info log
```
  4aba2e8b
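For the cgroups change in 4aba2e8b, the sketch below shows how limits can be read from the cgroup v2 files `cpu.max` and `memory.max`. It assumes cgroup v2 only and uses hypothetical helper names; whether the commit also handles cgroup v1, and how it rounds fractional cores, is not stated in the log.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseCPUMax parses the content of cgroup v2 cpu.max, which is either
// "<quota> <period>" or "max <period>". It returns the core limit rounded
// up, or 0 when no limit is set.
func parseCPUMax(s string) (int, error) {
	f := strings.Fields(s)
	if len(f) != 2 {
		return 0, fmt.Errorf("unexpected cpu.max content %q", s)
	}
	if f[0] == "max" {
		return 0, nil
	}
	quota, err := strconv.ParseFloat(f[0], 64)
	if err != nil {
		return 0, err
	}
	period, err := strconv.ParseFloat(f[1], 64)
	if err != nil {
		return 0, err
	}
	if period <= 0 {
		return 0, fmt.Errorf("invalid period in %q", s)
	}
	return int(math.Ceil(quota / period)), nil
}

// parseMemoryMax parses the content of cgroup v2 memory.max, which is
// either "max" or a byte count. It returns 0 when no limit is set.
func parseMemoryMax(s string) (uint64, error) {
	s = strings.TrimSpace(s)
	if s == "max" {
		return 0, nil
	}
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	// Sample file contents; on a real system these would be read from
	// /sys/fs/cgroup/cpu.max and /sys/fs/cgroup/memory.max.
	cores, _ := parseCPUMax("150000 100000\n")
	mem, _ := parseMemoryMax("2147483648\n")
	fmt.Println(cores, mem)
}
```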
17 Nov, 2025 4 commits
- bring back sysfs based VRAM information for AMD (#12871) · 2f36d769
  Daniel Hiltgen authored Nov 17, 2025
```
* build: optimize dockerfile context for iterating

This moves the copy of the source into the layer AFTER
doing software installs so we don't have to go through
the RPM install for cuda, etc. every time you touch a
source file.

* amd: implement linux sysfs based VRAM lookup

This adds a C++ implementation of sysfs DRM VRAM discovery
for more accurate free VRAM data on Linux for AMD GPUs.
```
  2f36d769
- ci: fix missing vulkan binaries in linux bundles (#13123) · 399eacf4
  Daniel Hiltgen authored Nov 17, 2025
  
  399eacf4
- app/ui: fix to point ollama client to ui backend in dev mode (#13079) · 231cc878
  Eva H authored Nov 17, 2025
  
  231cc878
- docs: link to ollama.com instead of hardcoding list of cloud models (#13110) · aa676b31
  Jeffrey Morgan authored Nov 16, 2025
  
  aa676b31
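The sysfs-based VRAM lookup from 2f36d769 above can be illustrated in Go, although the commit itself adds a C++ implementation that is not shown in this log. The sketch assumes the amdgpu driver's `mem_info_vram_total` and `mem_info_vram_used` files under `/sys/class/drm/card*/device`; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// computeFree takes the raw contents of mem_info_vram_total and
// mem_info_vram_used and returns total and free bytes.
func computeFree(totalRaw, usedRaw string) (total, free uint64, err error) {
	total, err = strconv.ParseUint(strings.TrimSpace(totalRaw), 10, 64)
	if err != nil {
		return 0, 0, err
	}
	used, err := strconv.ParseUint(strings.TrimSpace(usedRaw), 10, 64)
	if err != nil {
		return 0, 0, err
	}
	if used > total {
		return total, 0, nil // clamp instead of underflowing
	}
	return total, total - used, nil
}

func main() {
	dirs, _ := filepath.Glob("/sys/class/drm/card[0-9]*/device")
	for _, d := range dirs {
		totalRaw, err := os.ReadFile(filepath.Join(d, "mem_info_vram_total"))
		if err != nil {
			continue // not an amdgpu device, or the file is missing
		}
		usedRaw, err := os.ReadFile(filepath.Join(d, "mem_info_vram_used"))
		if err != nil {
			continue
		}
		total, free, err := computeFree(string(totalRaw), string(usedRaw))
		if err != nil {
			continue
		}
		fmt.Printf("%s: total=%d free=%d\n", d, total, free)
	}
}
```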
16 Nov, 2025 6 commits
- docs: fix typos in repository documentation (#10683) · dd0ed0ef
  omahs authored Nov 16, 2025
  
  dd0ed0ef
- readme: add Kdeps to community integrations (#11877) · d5649821
  Joel Bryan Juliano authored Nov 16, 2025
```
Kdeps is an AI framework for building Dockerized full-stack AI
applications declaratively. It uses Ollama LLM models on the
backend.
```
  d5649821
- server: clean up manifest documentation (#12995) · 4cea757e
  pierwill authored Nov 15, 2025
```
Co-authored-by: pierwill <pierwill@users.noreply.github.com>
```
  4cea757e
- llama: test case typo and readability improvements (#13078) · a751bc15
  Vignesh Skanda authored Nov 16, 2025
  
  a751bc15
- discover: fix typos in runner.go (#13096) · 5d31242f
  Laurențiu Nicola authored Nov 16, 2025
  
  5d31242f
- tests: basic benchmarking test framework (#12964) · d7fd7219
  Patrick Devine authored Nov 15, 2025
```
This change adds a basic benchmarking test framework for Ollama which can
be used to determine the prefill, eval, load duration, and total duration
for running a given model or models.
```
  d7fd7219
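The durations named in d7fd7219 match fields that Ollama's API responses report in nanoseconds (`total_duration`, `load_duration`, `prompt_eval_duration`, `eval_duration`, with the matching token counts). The sketch below derives throughput from them; the `metrics` struct, the `tokensPerSecond` helper, and the sample numbers are assumptions for illustration, not the framework's own code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// metrics mirrors the timing fields of an Ollama response. The API
// reports durations as integer nanoseconds, which decode directly into
// time.Duration.
type metrics struct {
	TotalDuration      time.Duration `json:"total_duration"`
	LoadDuration       time.Duration `json:"load_duration"`
	PromptEvalCount    int           `json:"prompt_eval_count"`
	PromptEvalDuration time.Duration `json:"prompt_eval_duration"`
	EvalCount          int           `json:"eval_count"`
	EvalDuration       time.Duration `json:"eval_duration"`
}

// tokensPerSecond returns 0 when the duration is missing or zero.
func tokensPerSecond(count int, d time.Duration) float64 {
	if d <= 0 {
		return 0
	}
	return float64(count) / d.Seconds()
}

func main() {
	// Made-up sample values, not measured results.
	raw := `{"total_duration":3500000000,"load_duration":500000000,
	         "prompt_eval_count":200,"prompt_eval_duration":1000000000,
	         "eval_count":100,"eval_duration":2000000000}`

	var m metrics
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		panic(err)
	}
	fmt.Printf("load=%v total=%v\n", m.LoadDuration, m.TotalDuration)
	fmt.Printf("prefill=%.1f tok/s eval=%.1f tok/s\n",
		tokensPerSecond(m.PromptEvalCount, m.PromptEvalDuration),
		tokensPerSecond(m.EvalCount, m.EvalDuration))
}
```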
14 Nov, 2025 2 commits
- log: warn if user overrides detected (#13088) · 72ff5b9d
  Daniel Hiltgen authored Nov 14, 2025
```
Many recent GPU discovery failures can be traced to incorrect override settings.
This extra logging should help quickly spot these and guide users to try unsetting them first.
```
  72ff5b9d
- docs: add logprobs to openapi (#13090) · ce29f695
  Parth Sareen authored Nov 14, 2025
  
  ce29f695
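The override warning from 72ff5b9d above could look roughly like the sketch below. The list of environment variables is an assumption (these are common GPU visibility and override variables, not necessarily the set the commit checks), and `detectOverrides` is a hypothetical helper.

```go
package main

import (
	"log"
	"os"
)

// overrideVars is an assumed list of user-settable GPU overrides.
var overrideVars = []string{
	"CUDA_VISIBLE_DEVICES",
	"HIP_VISIBLE_DEVICES",
	"ROCR_VISIBLE_DEVICES",
	"HSA_OVERRIDE_GFX_VERSION",
}

// detectOverrides returns every listed variable that has a non-empty
// value. getenv is injected so the function can be tested without
// touching the process environment.
func detectOverrides(getenv func(string) string, names []string) map[string]string {
	found := map[string]string{}
	for _, n := range names {
		if v := getenv(n); v != "" {
			found[n] = v
		}
	}
	return found
}

func main() {
	for name, value := range detectOverrides(os.Getenv, overrideVars) {
		log.Printf("warning: user override detected: %s=%s (try unsetting it if GPU discovery fails)", name, value)
	}
}
```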
13 Nov, 2025 9 commits
- fix tensor merge (#13053) · 12b174b1
  Michael Yang authored Nov 13, 2025
  
  12b174b1
- chore: update models to use slice/chunk/chunksections (#12934) · 333203d8
  Michael Yang authored Nov 13, 2025
```
* use slice/chunks

* bert

* llama4

* gemma3n

* gptoss

* mistral3

* qwen3vl

* qwen25vl

* deepseek2

* remove unused ops
```
  333203d8
- logprob: add bytes to logprobs (#13068) · c1149875
  Parth Sareen authored Nov 13, 2025
  
  c1149875
- ml: add slice operation (#12870) · b48083f3
  Michael Yang authored Nov 13, 2025
```
* slice

* chunk, chunksections
```
  b48083f3
- embeddings: added cli command to embedding docs (#12993) · 482bec82
  nicole pardal authored Nov 13, 2025
  
  482bec82
- docs: fix typo (VSCode -> VS Code) (#13072) · 684a9a8c
  Kowyo authored Nov 13, 2025
  
  684a9a8c
- app: remove source code for previous JavaScript-based macOS app (#13067) · 54a76d37
  Jeffrey Morgan authored Nov 12, 2025
```
The code in this directory has been replaced with the
new Go version in the 'app' directory.
```
  54a76d37
- readme: add AI UI to community integrations (#13035) · 8a75d8b0
  Radhi authored Nov 13, 2025
  
  8a75d8b0
- readme: fix incorrect header in community integrations (#13065) · f2063574
  Jeffrey Morgan authored Nov 12, 2025
  
  f2063574
12 Nov, 2025 3 commits

ci: fix win vulkan (#13062) · 8224cd90
Daniel Hiltgen authored Nov 12, 2025

8224cd90

Enable Vulkan with a temporary opt-in setting (#12931) · 6286d9a3

Daniel Hiltgen authored Nov 12, 2025

* docs: vulkan information

* Revert "CI: Set up temporary opt-out Vulkan support (#12614)"

This reverts commit 8b6e5bae.

* vulkan: temporary opt-in for Vulkan support

Revert this once we're ready to enable by default.

* win: add vulkan CI build

6286d9a3

vulkan: temporary carry of vulkan fixes (#12971) · 3a9e8e9f
Daniel Hiltgen authored Nov 12, 2025
```
This should be reverted once we update ggml past b6897
```
3a9e8e9f

11 Nov, 2025 2 commits
- docs: rename api-reference.md back to api.md since redirect stopped working (#13056) · cb1cb064
  Jeffrey Morgan authored Nov 11, 2025
  
  cb1cb064
- docs: fix openapi.yaml warnings, rename api.md to api-reference.md (#12904) · 2d5e066c
  Jeffrey Morgan authored Nov 11, 2025
  
  2d5e066c