Commits · 564b558c92973ae9eda4ad585359e7f39b2dbff2 · OpenDAS / ollama

17 Sep, 2025 1 commit
- fix(llama): other llama flavours (#12308) · 564b558c
  Michael Yang authored Sep 17, 2025
```
* fix(llama): rope scale

* spm llama

* skip moe models

* cleanup
```
16 Sep, 2025 1 commit
- use split activations when possible (#12293) · ad95d5b3
  Michael Yang authored Sep 16, 2025
```
* use ggml_*_split activations when possible

* forward qkv
```
15 Sep, 2025 2 commits

- model: implement bert in ollama engine (#9080) · 3f6642f6
  Michael Yang authored Sep 15, 2025

* fix truncate

* s/SentencePieceModel/SentencePiece/

* bert

* wordpiece

* refactor pooling

* more tokenizers

* normalize embeddings


- batch: use tensors for outputs (#12185) · 6f711714
  Michael Yang authored Sep 15, 2025
```
this cleans up the model interface slightly without too much impact in
other areas
```

29 Jul, 2025 1 commit

- Increase performance for Gemma3n models on NVGPUs by enabling CUDA Graph execution (#11525) · ea85e27b
  Oliver Simons authored Jul 29, 2025

* Enable CUDA Graphs for gemma3n.

Similar to
https://github.com/ggml-org/llama.cpp/pull/14741,
though ollama has a slightly different model graph
than llama.cpp which requires different workaround
checks.

* Remove residual check by reshaping differently in gemma3n model

This should make the heuristics more robust


27 Jun, 2025 1 commit
- chore: cleanup comments + unused vars (#11225) · 4129af92
  Michael Yang authored Jun 27, 2025
  
26 Jun, 2025 1 commit

- add new gemma model (#11204) · 73b642e6
  Michael Yang authored Jun 25, 2025

* update patches

* cherry pick metal mean kernel

* cherry pick cuda mean kernel

* gemma3n
