- 25 Feb, 2025 7 commits
-
-
José Pekkarinen authored
The centos-7 images have been deprecated upstream and replaced with almalinux-8 images, requiring some small extra work. Signed-off-by: José Pekkarinen <jose.pekkarinen@foxhound.fi>
-
Chuanhui Liu authored
-
Michael Yang authored
This was accidentally removed when moving fs/ggml from its previous location.
-
Pavol Rusnak authored
CUDA 12.x still supports Compute Capability 5.0, 5.2 and 5.3, so let's build for these architectures as well
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
Blake Mizerany authored
This commit copies (without history) the bmizerany/ollama-go repository, with the intention of integrating it into ollama as a replacement for pushing and pulling models, and for managing the cache they are pushed to and pulled from. New homes for these packages will be determined as they are integrated and we have a better understanding of proper package boundaries.
-
Parth Sareen authored
-
- 24 Feb, 2025 3 commits
-
-
Parth Sareen authored
* envconfig: allow setting context length through env var
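The env-var pattern described above can be sketched as follows. The variable name `OLLAMA_CONTEXT_LENGTH` and the default of 2048 are assumptions for illustration, not confirmed details of the envconfig change:

```python
import os

def context_length(default=2048):
    """Read a context length from an environment variable, falling back to a
    default on absence, non-numeric input, or non-positive values.

    OLLAMA_CONTEXT_LENGTH and the default are hypothetical names/values.
    """
    raw = os.environ.get("OLLAMA_CONTEXT_LENGTH", "")
    try:
        n = int(raw)
    except ValueError:
        return default
    return n if n > 0 else default
```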
-
Blake Mizerany authored
-
Jeffrey Morgan authored
-
- 22 Feb, 2025 2 commits
-
-
Jeffrey Morgan authored
-
Blake Mizerany authored
The route assembly in Handler lacked clear organization, making it difficult to scan for routes and their relationships to each other. This commit reorders the assembly of routes to group them by category and purpose. It is also more specific about what "config" refers to (it is about CORS, if you were wondering... I was).
-
- 21 Feb, 2025 3 commits
-
-
Jesse Gross authored
There are two benefits to doing this:
- Provides a library function that models can use, reducing code for each model implementation
- Enables a single place to drop in optimized implementations of attention based on the backend or other factors

One is provided for GGML. On CUDA this improves token generation rate by about 3%. It does not have a significant effect on Metal. Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
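The operation being centralized is ordinary scaled dot-product attention. As a generic illustration of what such a shared helper computes (this is a sketch over plain Python lists, not ollama's actual Go API or its GGML-optimized path):

```python
import math

def attention(q, k, v):
    """Single-head scaled dot-product attention.

    q, k, v are lists of equal-length float vectors. A generic sketch of the
    operation the commit centralizes; backends would substitute optimized
    implementations of exactly this computation.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # raw similarity of this query against every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # weighted sum of the value vectors
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out
```

With identical keys the softmax weights are uniform, so the output is just the mean of the values.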
-
Michael Yang authored
-
Junyan Qin (Chin) authored
-
- 20 Feb, 2025 9 commits
-
-
Jesse Gross authored
Currently Rows is called as the last step in a model computation to get the values for the output tokens. However, if we move it earlier in the process then we can trim out computations that never get used. This is similar to how models are defined in llama.cpp. Changing the model definition in this way improves token generation performance by approximately 8%.
-
Jesse Gross authored
We don't need to create and destroy the GGML scheduler for every context. This introduces extra CPU overhead for every forward pass and extra memory for contexts that don't actually get scheduled (for example, KV caches). We can instead just have one scheduler for the backend and reset it each time we call Compute. This improves token generation performance by 1-2% and removes scheduler create/destroy from profile traces.
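The reuse-with-reset pattern described above can be sketched with a toy scheduler (hypothetical names; not ollama's GGML bindings). The point is that construction happens once per backend, while each Compute call only pays for a cheap reset:

```python
class Scheduler:
    """Toy stand-in for a graph scheduler that is expensive to construct."""
    constructed = 0  # class-level counter: how often we paid construction cost

    def __init__(self):
        Scheduler.constructed += 1
        self.queue = []

    def schedule(self, op):
        self.queue.append(op)

    def reset(self):
        # cheap: clear state in place instead of rebuilding the object
        self.queue.clear()

class Backend:
    """One scheduler per backend, reset at the start of each compute call."""
    def __init__(self):
        self.sched = Scheduler()

    def compute(self, ops):
        self.sched.reset()
        for op in ops:
            self.sched.schedule(op)
        return list(self.sched.queue)
```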
-
Jesse Gross authored
Currently the following parameters are in the runner but not used:
- numGPULayers
- mainGPU
- threads
- tensorSplit

This passes them through to the backend, which is where they would actually get used. However, the GGML backend does not yet do anything with them.
-
Bruce MacDonald authored
Added unit tests to verify error handling behavior in the Client.stream and Client.do methods. Tests cover various error scenarios, including:
- Error responses with status codes >= 400
- Error messages with successful status codes
- Empty error messages
- Successful responses
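The error-handling contract being tested can be sketched generically: raise for HTTP statuses >= 400, and also raise when an otherwise-successful response carries an error message in its body. This is an illustrative stand-in, not ollama's Client code:

```python
import json

class ClientError(Exception):
    pass

def check_response(status, body):
    """Raise ClientError for >=400 statuses (falling back to a generic
    message when the body carries no error) and for error messages embedded
    in successful responses; otherwise return the body unchanged."""
    msg = ""
    try:
        msg = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        pass  # non-JSON or non-object body: no embedded error message
    if status >= 400:
        raise ClientError(msg or "HTTP %d" % status)
    if msg:
        raise ClientError(msg)
    return body
```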
-
Michael Yang authored
Clang outputs are faster. We were previously building with clang via the gcc wrapper in cgo, but this was missed during the build updates, so there was a drop in performance.
-
frob authored
-
danielekp authored
-
Lucas Hahn authored
-
Michael Yang authored
-
- 19 Feb, 2025 5 commits
-
-
Michael Yang authored
build: remove backend build for sapphirerapids
-
yuiseki authored
-
zyxucp authored
-
maninhill authored
-
Jeffrey Morgan authored
-
- 18 Feb, 2025 9 commits
-
-
Michael Yang authored
cmd: fix flickering in progress bar
-
Jeremy Schlatter authored
-
Michael Yang authored
Sapphire Rapids has AMX support, but it ends up having a negative performance impact. Emerald Rapids also has AMX support, with a positive performance impact; however, there's no reasonable way in GGML to differentiate between the two. Since the impact is small (~6%), disable AMX entirely for simplicity.
-
Michael Yang authored
-
Michael Yang authored
Set the owner and group when building the Linux tarball so extracted files are consistent. This matches the behaviour of release tarballs in version 0.5.7 and lower.
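The ownership normalization described above can be sketched with Python's `tarfile` module (the build itself likely uses tar directly; this just shows the effect of forcing consistent owner/group on every entry):

```python
import io
import tarfile

def normalize(info):
    # Force consistent ownership so extracted files don't inherit the
    # build host's user and group.
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info

def build_tarball(files):
    """files: dict of name -> bytes. Returns a gzipped tar archive in which
    every member is owned by root:root."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(normalize(info), io.BytesIO(data))
    return buf.getvalue()
```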
-
benhaotang authored
-
L. Jiang authored
-
innightwolfsleep authored
-
Jeremy Schlatter authored
-
- 17 Feb, 2025 2 commits
-
-
Jeremy Schlatter authored
The previous commit fixed flickering in the progress bar itself. Cursor flickering is harder to address: it could be fixed by hiding the cursor altogether while the progress bar is displayed, but the downside is that if the program is killed in a way that prevents it from cleaning up its state, it would leave the cursor invisible. Instead, this commit introduces an output buffer. All of the escape codes and content for a single progress update are written to a buffer, which is then flushed to the terminal all at once. This significantly decreases the time during which the terminal has seen the cursor-hiding code but has not yet seen the cursor-showing code, thus minimizing (but not 100% eliminating) cursor flickering. For more context, see: https://gitlab.gnome.org/GNOME/vte/-/issues/2837#note_2269501
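The buffering approach can be sketched as follows: assemble the cursor-hide code, the frame content, and the cursor-show code into one string, so the terminal receives them in a single write. This is a generic illustration (standard VT100/ANSI escape codes), not the Go implementation:

```python
import io

HIDE = "\x1b[?25l"  # DEC private mode: hide cursor
SHOW = "\x1b[?25h"  # DEC private mode: show cursor

def render_frame(lines):
    """Assemble one full progress update into a single string.

    Writing the result in one call minimizes the window between the
    cursor-hide and cursor-show codes reaching the terminal. Each line is
    overwritten in place, with EL (\\x1b[K) clearing any trailing remainder
    of a previously longer line.
    """
    buf = io.StringIO()
    buf.write(HIDE)
    for line in lines:
        buf.write("\r" + line + "\x1b[K\n")
    buf.write(SHOW)
    return buf.getvalue()
```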
-
Jeremy Schlatter authored
Previous code cleared the display before writing new content, creating a window where the terminal could (and in some cases did) render empty lines. Instead, we now write new content over the old content, only clearing the trailing end of lines for cases where the new line is shorter. Fixes #1664
-