- 20 Nov, 2024 11 commits
-
-
Jesse Gross authored
Fragmentation of the KV cache can occur due to cache shifting or different sequences getting processed. Decode uses a heuristic to decide whether it should defrag, but this heuristic isn't 100% accurate, so decoding can sometimes fail unexpectedly. For these cases, if decode indicates that there is no KV cache space, we should defrag and then try again.
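A minimal Go sketch of the retry-after-defrag idea described above; `decodeWithDefragRetry`, `errNoKVSpace`, and the callback shapes are hypothetical stand-ins, not the runner's real API.

```go
package runner

import "errors"

// errNoKVSpace is a hypothetical sentinel for "decode found no free KV cache slot".
var errNoKVSpace = errors.New("no KV cache space")

// decodeWithDefragRetry runs decode once; if it reports that the cache has no
// room (likely because it is fragmented), it defragments and retries a single
// time before giving up.
func decodeWithDefragRetry(decode func() error, defrag func()) error {
	err := decode()
	if errors.Is(err, errNoKVSpace) {
		defrag()
		err = decode()
	}
	return err
}
```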
-
Jesse Gross authored
This doesn't have any impact currently because NUM_PARALLEL is forced to 1 for embeddings, so both indices will always be 0.
-
Emir Sahin authored
-
Marcus Ziadé authored
-
thewh1teagle authored
-
Adarsh Mishra authored
-
rohitanshu authored
change 'containg' to 'containing'
-
Gordon Kamer authored
-
Jonathan Hecl authored
-
Daniel Hiltgen authored
Many model crashes are masked behind "An existing connection was forcibly closed by the remote host". This captures that common error message and wires in any detected errors from the log. This also adds the DeepSeek context shift error to the known errors we capture.
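Roughly how matching known error strings against captured runner logs can work; the package, function names, and map entries below are illustrative assumptions, not Ollama's actual error table.

```go
package logutil

import "strings"

// knownErrors maps substrings that show up in runner logs to friendlier
// explanations. The entries here are only examples.
var knownErrors = map[string]string{
	"out of memory":                      "the model ran out of memory",
	"unable to shift context":            "the model hit a context-shift limitation",
	"forcibly closed by the remote host": "the runner process crashed",
}

// explainCrash scans captured log lines for a known error and returns a
// human-readable summary, avoiding a round-trip asking the user for logs.
func explainCrash(logLines []string) (string, bool) {
	for _, line := range logLines {
		for needle, explanation := range knownErrors {
			if strings.Contains(line, needle) {
				return explanation, true
			}
		}
	}
	return "", false
}
```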
-
Daniel Hiltgen authored
Avoid a round-trip asking users for logs to see what went wrong.
-
- 19 Nov, 2024 5 commits
-
-
Gabe Goodhart authored
https://github.com/ollama/ollama/issues/7656
Branch: Granite3StoppingBug-7656
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
-
Blake Mizerany authored
This change allows mixed-case model names to be pushed, pulled, copied, and created. This was previously disallowed because the Ollama registry was backed by a Docker registry that enforced a naming convention forbidding mixed-case names, which is no longer the case. This does not break existing, intended behaviors. Also, make TestCase exercise a story of creating, updating, pulling, and copying a model with case variations, ensuring the model's manifest is updated correctly and not duplicated across different files with different case variations.
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
Patrick Devine authored
-
Patrick Sy authored
-
- 18 Nov, 2024 5 commits
-
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
Daniel Hiltgen authored
Enable both left and right click on the pop-up menu
-
Daniel Hiltgen authored
If the model doesn't fit any layers on Metal and we load zero layers, we would panic trying to look up the GPU size during scheduling ops.
-
Vinh Nguyen authored
-
Nicolas Bonamy authored
-
- 17 Nov, 2024 5 commits
-
-
Darius Kocar authored
-
Tushar Adhatrao authored
-
Vinh Nguyen authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 16 Nov, 2024 1 commit
-
-
Daniel Hiltgen authored
Follow up to #7217 - merge after release
-
- 15 Nov, 2024 3 commits
-
-
Jesse Gross authored
This is a partial revert of 8a35bb92 "runner.go: Increase survivability of main processing loop", removing the panic handler. Although we want to avoid errors taking down the runner, we should also make the user aware of problems when they happen. In the future, we can restructure things so both parts are true.
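For context, a sketch of the kind of recover-based panic handler this revert removes; the `withRecover` helper is hypothetical, not the runner's actual code.

```go
package runner

import "log"

// withRecover illustrates a panic handler around a processing step: it keeps
// a panic from killing the runner, but it also hides the failure from the
// user, which is why it is being removed for now.
func withRecover(step func()) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("recovered from panic in processing loop: %v", r)
		}
	}()
	step()
}
```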
-
Jesse Gross authored
Currently, if an error occurs during the prep stages (such as tokenizing) of a single request, it will only affect that request. However, if an error happens during decoding, it can take down the entire runner. Instead, it's better to drop the tokens that triggered the error and try to keep going. However, we also need to stop when we run out of tokens; otherwise, this just causes an infinite loop. This is likely the cause of at least some of the hanging issues that have been reported. Bug #7573
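A hedged sketch of the drop-and-continue loop with a hard stop once the input is consumed; `processTokens`, `decodeChunk`, and the chunking scheme are assumptions for illustration, not the runner's actual structure.

```go
package runner

import "log"

// processTokens drops a chunk of tokens when decoding it fails, instead of
// crashing the whole runner, but always consumes input so the loop cannot
// spin forever.
func processTokens(tokens []int, chunkSize int, decodeChunk func([]int) error) {
	for len(tokens) > 0 {
		n := min(chunkSize, len(tokens))
		if err := decodeChunk(tokens[:n]); err != nil {
			log.Printf("dropping %d tokens after decode error: %v", n, err)
		}
		tokens = tokens[n:] // always advance so the loop terminates
	}
}
```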
-
Daniel Hiltgen authored
Fix a rebase glitch from the old C++ runner build model
-
- 14 Nov, 2024 7 commits
-
-
Patrick Devine authored
-
Bruce MacDonald authored
- golang.org/x/sync v0.3.0 -> v0.9.0
- golang.org/x/image v0.14.0 -> v0.22.0
- golang.org/x/text v0.15.0 -> v0.20.0
-
Jesse Gross authored
It's possible to get prompts that consist entirely of whitespace - this is most likely to happen when generating embeddings. Currently, we will trim this away, leaving an empty prompt, which will then generate an error. Generating embeddings from whitespace should not trigger an error, as this may break pipelines. It's better to just leave the whitespace in place and process what we are given. This is consistent with past versions of Ollama. Bug #7578
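A small sketch of the trimming rule described above, assuming a hypothetical `normalizePrompt` helper; the real code path may differ.

```go
package runner

import "strings"

// normalizePrompt trims surrounding whitespace, but a prompt that is nothing
// except whitespace is returned unchanged so an embedding request made of
// whitespace still gets processed instead of erroring on an empty prompt.
func normalizePrompt(prompt string) string {
	trimmed := strings.TrimSpace(prompt)
	if trimmed == "" {
		return prompt
	}
	return trimmed
}
```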
-
Jesse Gross authored
NUM_PARALLEL is currently enforced by the Ollama server process - it will only issue requests to the runner if the maximum number of concurrent requests has not been exceeded. Although this should be sufficient, it is good for the runner to protect its own data structures. Currently, if too many requests get through to the runner, they will just get stuck and never return. This may help with reports of Ollama hanging, though it is unclear how it would actually occur. Bug #7573
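An illustrative sketch of a runner guarding its own slot table instead of trusting the server to enforce the limit; the `server` and `sequence` types and `addSequence` are simplified assumptions, not the actual runner types.

```go
package runner

import (
	"errors"
	"sync"
)

var errNoFreeSlot = errors.New("no free sequence slot")

// server owns a fixed number of sequence slots (one per allowed parallel
// request) and refuses extra work instead of letting requests queue up
// inside it and hang.
type server struct {
	mu   sync.Mutex
	seqs []*sequence // one entry per parallel slot; nil means the slot is free
}

type sequence struct{} // per-request state elided

// addSequence claims a free slot or returns an error so the caller can retry
// later rather than blocking forever.
func (s *server) addSequence(seq *sequence) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for i, slot := range s.seqs {
		if slot == nil {
			s.seqs[i] = seq
			return nil
		}
	}
	return errNoFreeSlot
}
```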
-
Michael Yang authored
fix(mllama): sync backend between batches
-
Blake Mizerany authored
-
Michael Yang authored
-
- 12 Nov, 2024 3 commits
-
-
Jesse Gross authored
-
Daniel Hiltgen authored
It looks like 8 minutes isn't quite enough, and we're seeing sporadic timeouts.
-
Daniel Hiltgen authored
This adds support for the Jetson JetPack variants to the Go runner.
-