1. 23 Nov, 2024 1 commit
    • runner.go: Fix deadlock with many concurrent requests · 3478b2cf
      Jesse Gross authored
      If there are no available slots for new sequences, a request
      will not be added to the processing queue but will continue on
      to wait for a response that never comes. Besides never giving a
      response to the request, this prevents the model from being
      unloaded due to the outstanding request.
      
      To prevent this, there are semaphores that cap the number of
      in-flight requests at the number of slots - one in the Ollama
      server and one in the runner.
       - The Ollama server's semaphore works, but it is not designed to
      protect the runner's internal data structures, and the runner can
      return a final response before clearing them.
       - The runner's internal semaphore has a similar problem: it can
       be released when a response is issued. This is wrong - it should
       only be released after the data structures have been cleared.
      
      In addition, we should return an error if a slot is not found
      rather than deadlocking in the event we ever get to this spot.
      
      Fixes #7779
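      A minimal sketch of the release-ordering fix, assuming a
      hypothetical Server with a weighted semaphore guarding slot state;
      the names and the findFreeSlot helper are illustrative, not the
      actual runner API:

          package runner

          import (
              "context"
              "fmt"

              "golang.org/x/sync/semaphore"
          )

          type Server struct {
              seqsSem *semaphore.Weighted // one permit per slot
              slots   []bool              // stand-in for per-slot state
          }

          func (s *Server) handle(ctx context.Context, respond func(string)) error {
              if err := s.seqsSem.Acquire(ctx, 1); err != nil {
                  return err
              }
              slot, ok := s.findFreeSlot()
              if !ok {
                  s.seqsSem.Release(1)
                  // Return an error instead of deadlocking if no slot is found.
                  return fmt.Errorf("no free slot despite semaphore permit")
              }
              s.slots[slot] = true

              respond("final response")

              // Clear internal state *before* releasing the permit; releasing
              // on response (the old behavior) lets a new request race the
              // cleanup and find no free slot.
              s.slots[slot] = false
              s.seqsSem.Release(1)
              return nil
          }

          func (s *Server) findFreeSlot() (int, bool) {
              for i, used := range s.slots {
                  if !used {
                      return i, true
                  }
              }
              return 0, false
          }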
  2. 22 Nov, 2024 8 commits
  3. 21 Nov, 2024 28 commits
  4. 20 Nov, 2024 3 commits
    • runner.go: Truncate inputs that exceed context rather than shifting · c4b34f2a
      Jesse Gross authored
      Previous versions of the runner would truncate inputs to the context
      window before beginning processing. The main processing loop relied
      on this behavior if the context needed to be shifted later (due to
      token generation). If truncation did not occur, invariants
      would be broken, causing crashes or infinite loops.
      
      Later versions attempted to fix these bugs and make the logic less
      subtle so that all inputs could be handled. Truncation was removed
      to make things consistent.
      
      However, truncation is much faster than processing and shifting, so
      removing it caused performance problems when the input vastly exceeded
      the context size. This restores the input truncation as a performance
      optimization while keeping the more robust processing logic.
      
      Fixes #7762
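      A rough sketch of what input truncation can look like, assuming
      the first numKeep tokens are preserved and the rest of the context
      window is filled from the tail of the input; the function name and
      trimming strategy are illustrative, not the runner's exact logic:

          package runner

          // truncateInputs keeps the first numKeep tokens and fills the
          // remainder of the context window from the tail of the input,
          // discarding the middle.
          func truncateInputs(inputs []int, numCtx, numKeep int) []int {
              if len(inputs) <= numCtx {
                  return inputs
              }
              if numKeep > numCtx {
                  numKeep = numCtx
              }
              out := make([]int, 0, numCtx)
              out = append(out, inputs[:numKeep]...)
              out = append(out, inputs[len(inputs)-(numCtx-numKeep):]...)
              return out
          }

      Truncation like this is a single O(n) copy, whereas decoding the
      excess tokens only to shift them out of the KV cache costs a
      forward pass per batch - which is why removing truncation hurt when
      the input vastly exceeded the context size.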
    • runner.go: Don't add inputs to cache view until actually processed · c3ff9164
      Jesse Gross authored
      We need to track which tokens are in the cache ourselves. We
      currently add tokens to the cache tracker when we add them to a
      batch, but they are not actually in the cache until we call Decode.
      This can cause confusion when we are shifting the cache.
      
      Avoids "could not find a KV slot for the batch" issues.
      
      Bug #7545
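      An illustrative sketch of the ordering change, assuming a
      hypothetical per-sequence tracker; decode stands in for the actual
      Decode call and the real types differ:

          package runner

          type sequence struct {
              cacheTokens []int // tokens we believe are in the KV cache
              pending     []int // tokens batched but not yet decoded
          }

          func (s *sequence) processBatch(decode func([]int) error) error {
              // Old behavior: append pending to cacheTokens here, before
              // Decode, so a shift triggered mid-batch saw tokens that
              // were not really cached ("could not find a KV slot for
              // the batch").
              if err := decode(s.pending); err != nil {
                  return err
              }
              // New behavior: tokens only count as cached once Decode has
              // actually placed them in the KV cache.
              s.cacheTokens = append(s.cacheTokens, s.pending...)
              s.pending = s.pending[:0]
              return nil
          }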
    • runner.go: Hard fail on errors rather than potentially infinite looping · 3fc1dc0e
      Jesse Gross authored
      We try to recover from errors by dropping the tokens that caused
      the problem and retrying. However, dropping the tokens is not
      correct, and continuing often leads to infinite loops. To avoid
      this, we end the sequence if such a condition is detected, which is
      also surprising.
      
      At this point, it is better to just report the error. This will make
      it easier to find problems and the alternatives are perhaps even more
      surprising to users.
      
      This is not a very satisfactory solution either - we should isolate
      the error and return it to the user without killing the whole
      process. However, this is an incremental step, and it is consistent
      with most other failures (which manifest as either abort() or
      panic).
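      A small sketch of the fail-fast policy, with made-up names; the
      real batch loop is more involved:

          package runner

          import "fmt"

          // processInputs decodes pending tokens in batches, surfacing
          // decode errors instead of dropping tokens and retrying.
          func processInputs(pending []int, batchSize int, decode func([]int) error) error {
              for start := 0; start < len(pending); start += batchSize {
                  end := start + batchSize
                  if end > len(pending) {
                      end = len(pending)
                  }
                  if err := decode(pending[start:end]); err != nil {
                      // Previously: drop pending[start:end] and continue,
                      // which broke invariants and could loop forever.
                      // Now: report the error and stop.
                      return fmt.Errorf("failed to decode batch: %w", err)
                  }
              }
              return nil
          }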