Commits · 8b4b243f5fd31000515548e52bf66bcdb72f70e5 · OpenDAS / ollama

17 Nov, 2024 2 commits
- server: fix warnings in prompt_test.go (#7710) · 8b4b243f
  Jeffrey Morgan authored Nov 17, 2024
  
  8b4b243f
- docs: add customization section in linux.md (#7709) · b42a5964
  Jeffrey Morgan authored Nov 17, 2024
  
  b42a5964
16 Nov, 2024 1 commit
- Install support for jetpacks (#7632) · 4759d879
  Daniel Hiltgen authored Nov 15, 2024
```
Follow up to #7217 - merge after release
```
  4759d879
15 Nov, 2024 3 commits

runner.go: Propagate panics back to the user. · d875e99e

Jesse Gross authored Nov 15, 2024

This is a partial revert of 8a35bb92
"runner.go: Increase survivability of main processing loop", removing
the panic handler.

Although we want to avoid errors taking down the runner, we also
should make the user aware of problems when they happen. In the
future, we can restructure things so both parts are true.

d875e99e

runner.go: Increase survivability of main processing loop · 8a35bb92

Jesse Gross authored Nov 14, 2024

Currently, if an error occurs during the prep stages (such as
tokenizing) of a single request, it will only affect that request.
However, if an error happens during decoding, it can take down the
entire runner.

Instead, it's better to drop the tokens that triggered the error and try to
keep going. However, we also need to stop when we run out of tokens;
otherwise, this just causes an infinite loop. This is likely the cause
of at least some of the hanging issues that have been reported.

Bug #7573

8a35bb92
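
The approach described in 8a35bb92 can be illustrated with a minimal sketch. This is not the runner's actual code: `decode`, `errBadToken`, and `processBatch` are hypothetical names, and "negative tokens fail" is an arbitrary stand-in for a real decode error. It shows a loop that drops a failing token, keeps going, and still terminates once the tokens run out.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadToken is a hypothetical error standing in for a decode failure.
var errBadToken = errors.New("decode failed")

// decode is a stand-in for the real decoding step. For illustration only,
// it rejects negative tokens.
func decode(tok int) error {
	if tok < 0 {
		return errBadToken
	}
	return nil
}

// processBatch decodes tokens in order. A token that fails to decode is
// dropped and processing continues. The loop always terminates because every
// iteration consumes one token, whether or not it decoded successfully.
func processBatch(tokens []int) (decoded []int, dropped int) {
	for len(tokens) > 0 {
		tok := tokens[0]
		tokens = tokens[1:]
		if err := decode(tok); err != nil {
			dropped++
			continue
		}
		decoded = append(decoded, tok)
	}
	return decoded, dropped
}

func main() {
	decoded, dropped := processBatch([]int{1, -2, 3})
	fmt.Println(decoded, dropped)
}
```

The key property is that a failure never leaves the loop retrying the same token, which is the infinite-loop risk the commit message mentions.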

build: fix arm container image (#7674) · a0ea067b
Daniel Hiltgen authored Nov 14, 2024
```
Fix a rebase glitch from the old C++ runner build model
```
a0ea067b

14 Nov, 2024 7 commits

add line numbers for parser errors (#7326) · 4efb98cb
Patrick Devine authored Nov 14, 2024

4efb98cb

chore(deps): bump golang.org/x dependencies (#7655) · 0679d491

Bruce MacDonald authored Nov 14, 2024

- golang.org/x/sync v0.3.0 -> v0.9.0
- golang.org/x/image v0.14.0 -> v0.22.0
- golang.org/x/text v0.15.0 -> v0.20.0

0679d491

runner.go: Don't trim whitespace from inputs · c25ffde9

Jesse Gross authored Nov 13, 2024

It's possible to get prompts that consist entirely of whitespace -
this is most likely to happen when generating embeddings. Currently,
we will trim this away, leaving an empty prompt, which will then
generate an error.

Generating embeddings from whitespace should not trigger an error,
as this may break pipelines. It's better to just leave the whitespace
in place and process what we are given. This is consistent with
past versions of Ollama.

Bug #7578

c25ffde9
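
A small sketch of the difference described in c25ffde9, assuming a hypothetical helper `preparePrompt` (not a function from the runner). With trimming enabled, a whitespace-only prompt collapses to an empty string, which is the case that previously led to an error. Without trimming, the input is passed through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// preparePrompt returns the prompt to process. When trim is true it mimics
// the behavior the commit removes and strips surrounding whitespace.
func preparePrompt(prompt string, trim bool) string {
	if trim {
		return strings.TrimSpace(prompt)
	}
	return prompt
}

func main() {
	prompt := " \n\t"
	fmt.Printf("trimmed: %q\n", preparePrompt(prompt, true))
	fmt.Printf("kept:    %q\n", preparePrompt(prompt, false))
}
```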

runner.go: Enforce NUM_PARALLEL directly in the runner · 17b386a8

Jesse Gross authored Nov 12, 2024

NUM_PARALLEL is currently enforced by the Ollama server process - it
will only issue requests to the runner if the maximum number of
concurrent requests has not been exceeded. Although this should
be sufficient, it is good for the runner to protect its own data
structures. Currently, if too many requests get through to the
runner, they will just get stuck and never return.

This may help with reports of Ollama hanging, though it is unclear
how it would actually occur.

Bug #7573

17b386a8
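
One way a runner could protect itself, sketched under stated assumptions: the `limiter` type and its methods are hypothetical, and rejecting excess requests immediately (rather than making them wait) is a design choice made for this example, not something the commit message specifies. A buffered channel serves as a counting semaphore sized to the parallel limit.

```go
package main

import (
	"errors"
	"fmt"
)

// errBusy is a hypothetical error returned when the limit is reached.
var errBusy = errors.New("too many concurrent requests")

// limiter caps the number of requests in flight, using a buffered channel as
// a counting semaphore.
type limiter struct {
	slots chan struct{}
}

// newLimiter creates a limiter that admits at most numParallel requests.
func newLimiter(numParallel int) *limiter {
	return &limiter{slots: make(chan struct{}, numParallel)}
}

// acquire reserves a slot, or fails immediately instead of leaving the
// request stuck.
func (l *limiter) acquire() error {
	select {
	case l.slots <- struct{}{}:
		return nil
	default:
		return errBusy
	}
}

// release frees a slot previously reserved by acquire.
func (l *limiter) release() {
	<-l.slots
}

func main() {
	l := newLimiter(2)
	fmt.Println(l.acquire(), l.acquire(), l.acquire())
	l.release()
	fmt.Println(l.acquire())
}
```

A blocking variant that waits for a free slot (for example with a context for cancellation) would be equally plausible. The point is only that the limit is checked where the protected data structures live.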

Merge pull request #7657 from ollama/mxyng/sync · 549c2bdf
Michael Yang authored Nov 14, 2024
```
fix(mllama): sync backend between batches
```
549c2bdf
cmd: preserve exact bytes when displaying template/system layers (#7586) · 67691e41
Blake Mizerany authored Nov 13, 2024

67691e41
fix(mllama): sync backend between batches · 5b3393b6
Michael Yang authored Nov 13, 2024

5b3393b6

12 Nov, 2024 8 commits

runner.go: Fix off-by-one for num predicted · d7eb05b9
Jesse Gross authored Nov 12, 2024

d7eb05b9
CI: give windows lint more time (#7635) · 636a743c
Daniel Hiltgen authored Nov 12, 2024
```
It looks like 8 minutes isn't quite enough and we're seeing sporadic timeouts
```
636a743c
Jetpack support for Go server (#7217) · df011054
Daniel Hiltgen authored Nov 12, 2024
```
This adds support for the Jetson JetPack variants into the Go runner
```
df011054

doc: capture numeric group requirement (#6941) · ac07160c

Daniel Hiltgen authored Nov 12, 2024

Docker uses the container filesystem for name resolution, so we can't guide users
to use the name of the host group. Instead, they must specify the numeric ID.

ac07160c

docs: Capture docker cgroup workaround (#7519) · 6606e424

Daniel Hiltgen authored Nov 12, 2024

GPU support can break on some systems after a while.  This captures a
known workaround to solve the problem.

6606e424

runner.go: Make KV entry accounting more robust · 65973ceb

Jesse Gross authored Nov 08, 2024

The structure of the accounting for KV cache shifting was carried
over from the old runner but it now doesn't feel natural with the new
runner. There are a number of invariants that should hold true but
are difficult to reason about. There is at least one bug report
that would imply that the invariants are not holding.

This reduces the number of implicit assumptions and is more forgiving
of unexpected situations. It also improves behavior around which input
tokens are kept when truncation occurs.

Bug #7545

65973ceb

readme: add aichat terminal app to community integrations (#7418) · bebef1e5
Joey Zheng authored Nov 12, 2024

bebef1e5
api: fix typos in Go Doc comments (#7620) · d48c1c5a
Evan authored Nov 11, 2024

d48c1c5a

11 Nov, 2024 4 commits
- readme: add GoLamify to community integrations (#7521) · 36a8372b
  Prasad Bhalerao authored Nov 11, 2024
  
  36a8372b
- readme: add browser extension that enables using Ollama for interacting with web pages (#5827) · 4e94227b
  Ivo Stoykov authored Nov 11, 2024
  
  4e94227b
- docs: add mentions of Llama 3.2 (#7517) · 479d5517
  frances720 authored Nov 10, 2024
  
  479d5517
- api: fix typo in python ClientFromEnvironment docs (#7604) · 76b2b723
  Evan authored Nov 10, 2024
  
  76b2b723
10 Nov, 2024 1 commit
- readme: add llama3.2-vision to model list (#7580) · b8d77cde
  Arhan Busam authored Nov 11, 2024
  
  b8d77cde
08 Nov, 2024 3 commits
- runner.go: Check for zero length images · c2e8cbaa
  Jesse Gross authored Nov 06, 2024
```
If we get a request with a zero length image, it will result in
an out-of-bounds error when we pass the data to the image encoder.
```
  c2e8cbaa
- docs: update langchainpy.md with proper model name (#7527) · 771fab1d
  Edward J. Schwartz authored Nov 08, 2024
  
  771fab1d
- Set macos min version for all architectures (#7579) · 3a5239e6
  Daniel Hiltgen authored Nov 08, 2024
  
  3a5239e6
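
For the zero-length image check in c2e8cbaa above, a minimal sketch of the guard. `encodeImage` and `errEmptyImage` are hypothetical names, and the "encoder" here only reads the first byte, which is enough to show the out-of-bounds access that an empty slice would otherwise trigger.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyImage is a hypothetical error for a zero-length image.
var errEmptyImage = errors.New("received zero length image")

// encodeImage is a stand-in for an image encoder. Without the length check,
// indexing data[0] on an empty slice would panic with an out-of-range error.
func encodeImage(data []byte) (byte, error) {
	if len(data) == 0 {
		return 0, errEmptyImage
	}
	return data[0], nil
}

func main() {
	_, err := encodeImage(nil)
	fmt.Println(err)
}
```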
07 Nov, 2024 5 commits
- win: remove preview title from installer (#7529) · 3d25e7bf
  Daniel Hiltgen authored Nov 07, 2024
```
This should have been in #7347 but was overlooked.
```
  3d25e7bf
- Workaround buggy P2P ROCm copy on windows (#7466) · 1618700c
  Daniel Hiltgen authored Nov 07, 2024
```
This enables the workaround code only on Windows, which should help Windows users with multiple AMD GPUs
```
  1618700c
- Debug logging for nvcuda init (#7532) · b111aa5a
  Daniel Hiltgen authored Nov 07, 2024
```
Some users are reporting crashes during nvcuda.dll initialization
on Windows. This should help narrow down where things are going wrong.
```
  b111aa5a
- Align rocm compiler flags (#7467) · 9e83e550
  Daniel Hiltgen authored Nov 07, 2024
```
Bring consistency with the old generate script behavior
```
  9e83e550
- Be explicit for gpu library link dir (#7560) · fc2a0715
  Daniel Hiltgen authored Nov 07, 2024
```
On Linux, nvcc isn't automatically linking to the same CUDA version.
```
  fc2a0715
06 Nov, 2024 3 commits

docs: OLLAMA_NEW_RUNNERS no longer exists · 3020d2dc
Jesse Gross authored Nov 06, 2024

3020d2dc

runner.go: Remove unused arguments · a9094176

Jesse Gross authored Oct 30, 2024

Now that server.cpp is gone, we don't need to keep passing arguments
that were ignored and kept only for compatibility.

a9094176

sched: Lift parallel restriction for multimodal models except mllama · 6cd56687

Jesse Gross authored Oct 30, 2024

The Go runner does not have a problem with supporting parallel
requests for most multimodal models. Now that we won't be potentially
falling back to server.cpp, this restriction can be lifted.

However, the new mllama model can't support parallel requests, so we
will need to keep a restriction for that.

6cd56687
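
The scheduling rule described in 6cd56687 amounts to a single special case. A sketch, assuming a hypothetical helper `effectiveParallel` and a hypothetical `isMllama` flag (the real scheduler's names and model detection are not shown in the commit message):

```go
package main

import "fmt"

// effectiveParallel returns how many parallel requests a model may serve.
// Only mllama models are forced down to a single request; other models,
// including other multimodal ones, keep the requested value.
func effectiveParallel(requested int, isMllama bool) int {
	if isMllama && requested != 1 {
		return 1
	}
	return requested
}

func main() {
	fmt.Println(effectiveParallel(4, false), effectiveParallel(4, true))
}
```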

05 Nov, 2024 3 commits

Update README.md (#7516) · 9d71bcc3

RAPID ARCHITECT authored Nov 05, 2024

Added Reddit Rate below Hexabot: Ollama-powered Reddit search and analysis, with Streamlit for the interface.

9d71bcc3

One corrupt manifest should not wedge model operations (#7515) · a4c70fe1

Daniel Hiltgen authored Nov 05, 2024

One potential failure mode is an empty file which bubbles up as an EOF error,
leading to all pulls and listing operations failing. Instead, continue and
warn about the corrupt manifest. This also allows re-pulling the corrupt
manifest to repair the system.

a4c70fe1
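
The "continue and warn" behavior in a4c70fe1 can be sketched as follows. The `manifest` type, the `loadManifests` helper, and the in-memory map of file contents are assumptions made for illustration and do not reflect the server's real types. An empty file fails to parse, is logged, and is skipped, so the remaining manifests still load.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// manifest is a hypothetical, simplified manifest with a single field.
type manifest struct {
	Name string `json:"name"`
}

// loadManifests parses each raw manifest. A manifest that fails to parse
// (for example, an empty file) is logged and skipped rather than aborting
// the whole operation.
func loadManifests(raw map[string][]byte) map[string]manifest {
	out := make(map[string]manifest)
	for path, data := range raw {
		var m manifest
		if err := json.Unmarshal(data, &m); err != nil {
			log.Printf("warning: skipping corrupt manifest %s: %v", path, err)
			continue
		}
		out[path] = m
	}
	return out
}

func main() {
	loaded := loadManifests(map[string][]byte{
		"good": []byte(`{"name":"llama"}`),
		"bad":  {},
	})
	fmt.Println(len(loaded))
}
```

Because the corrupt entry is skipped rather than treated as fatal, a later re-pull can overwrite it, which matches the repair path the commit message describes.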

prompt: Use a single token when estimating mllama context size · 34a75102

Jesse Gross authored Nov 04, 2024

Currently we assume that images take 768 tokens of context size for
the purposes of clipping old messages that exceed the context window.
However, our mllama implementation stores the full image embedding
in a single token. As a result, there is significant waste of context
space.

Ideally, we would handle this more generically and have the
implementation report the number of tokens. However, at the moment
this would just result in a similar set of 'if' conditions in the
runner plus APIs to report it back. So for now, we just keep this
simple.

34a75102
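
A worked example of the estimate described in 34a75102, using the two figures from the commit message (768 tokens per image by default, a single token per image for mllama). The function names and the `isMllama` flag are hypothetical. The sketch mirrors the "keep this simple" special case rather than a generic per-model report.

```go
package main

import "fmt"

// imageTokens returns how many context tokens one image is assumed to use.
func imageTokens(isMllama bool) int {
	if isMllama {
		return 1 // mllama stores the full image embedding in a single token
	}
	return 768 // default assumption for other models
}

// estimateContext returns the estimated context size of a message made of
// textTokens text tokens and numImages images.
func estimateContext(textTokens, numImages int, isMllama bool) int {
	return textTokens + numImages*imageTokens(isMllama)
}

func main() {
	fmt.Println(estimateContext(100, 2, false), estimateContext(100, 2, true))
}
```

With 100 text tokens and two images, the default assumption charges 100 + 2 × 768 = 1636 tokens, while the mllama case charges 100 + 2 × 1 = 102, which is the wasted context space the commit message refers to.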