Commits · c25ffde91d3d2f8913224ac9bbc28736a4981fa3 · OpenDAS / ollama

14 Nov, 2024 5 commits

runner.go: Don't trim whitespace from inputs · c25ffde9

Jesse Gross authored Nov 13, 2024

It's possible to get prompts that consist entirely of whitespace -
this is most likely to happen when generating embeddings. Currently,
we will trim this away, leaving an empty prompt, which will then
generate an error.

Generating embeddings from whitespace should not trigger an error,
as this may break pipelines. It's better to just leave the whitespace
in place and process what we are given. This is consistent with
past versions of Ollama.

Bug #7578

c25ffde9
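
As an illustration of the behavior change described in c25ffde9, the following is a minimal sketch and not the runner's actual code. The function names and the `errEmptyPrompt` error are hypothetical; the sketch only contrasts trimming before validation with passing the input through unchanged.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errEmptyPrompt is a hypothetical stand-in for whatever error an empty prompt produces.
var errEmptyPrompt = errors.New("empty prompt")

// prepareTrimmed mimics the behavior being removed: trim first, then reject empty input.
func prepareTrimmed(prompt string) (string, error) {
	p := strings.TrimSpace(prompt)
	if p == "" {
		return "", errEmptyPrompt
	}
	return p, nil
}

// prepareVerbatim mimics the behavior being restored: the input is processed as given.
func prepareVerbatim(prompt string) (string, error) {
	return prompt, nil
}

func main() {
	_, err := prepareTrimmed(" \n\t")
	fmt.Println("trimmed:", err)

	p, err := prepareVerbatim(" \n\t")
	fmt.Printf("verbatim: %q %v\n", p, err)
}
```

With trimming, a whitespace-only embedding request fails; without it, the same request reaches the model unchanged.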

runner.go: Enforce NUM_PARALLEL directly in the runner · 17b386a8

Jesse Gross authored Nov 12, 2024

NUM_PARALLEL is currently enforced by the Ollama server process - it
will only issue requests to the runner if the maximum number of
concurrent requests has not been exceeded. Although this should
be sufficient, it is good for the runner to protect its own data
structures. Currently, if too many requests get through to the
runner, they will just get stuck and never return.

This may help with reports of Ollama hanging, though it is unclear
how it would actually occur.

Bug #7573

17b386a8

Merge pull request #7657 from ollama/mxyng/sync · 549c2bdf
Michael Yang authored Nov 14, 2024
```
fix(mllama): sync backend between batches
```
549c2bdf
cmd: preserve exact bytes when displaying template/system layers (#7586) · 67691e41
Blake Mizerany authored Nov 13, 2024

67691e41
fix(mllama): sync backend between batches · 5b3393b6
Michael Yang authored Nov 13, 2024

5b3393b6

12 Nov, 2024 8 commits

runner.go: Fix off-by-one for num predicted · d7eb05b9
Jesse Gross authored Nov 12, 2024

d7eb05b9
CI: give windows lint more time (#7635) · 636a743c
Daniel Hiltgen authored Nov 12, 2024
```
It looks like 8 minutes isn't quite enough and we're seeing sporadic timeouts
```
636a743c
Jetpack support for Go server (#7217) · df011054
Daniel Hiltgen authored Nov 12, 2024
```
This adds support for the Jetson JetPack variants into the Go runner
```
df011054

doc: capture numeric group requirement (#6941) · ac07160c

Daniel Hiltgen authored Nov 12, 2024

Docker uses the container filesystem for name resolution, so we can't guide users
to use the name of the host group.  Instead they must specify the numeric ID.

ac07160c

docs: Capture docker cgroup workaround (#7519) · 6606e424

Daniel Hiltgen authored Nov 12, 2024

GPU support can break on some systems after a while.  This captures a
known workaround to solve the problem.

6606e424

runner.go: Make KV entry accounting more robust · 65973ceb

Jesse Gross authored Nov 08, 2024

The structure of the accounting for KV cache shifting was carried
over from the old runner but it now doesn't feel natural with the new
runner. There are a number of invariants that should hold true but
are difficult to reason about. There is at least one bug report
that would imply that the invariants are not holding.

This reduces the number of implicit assumptions and is more forgiving
of unexpected situations. It also improves behavior around which input
tokens are kept when truncation occurs.

Bug #7545

65973ceb
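
The idea of being "more forgiving of unexpected situations" can be sketched as clamping inputs rather than trusting invariants. The code below is a hypothetical illustration, not the runner's logic: it assumes a policy of keeping the first `numKeep` tokens and discarding the oldest half of the remainder, and both the policy and the names are assumptions made for the example.

```go
package main

import "fmt"

// clamp limits v to the range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// shift keeps the first numKeep tokens and drops the oldest half of the rest.
// The policy is assumed for illustration; numKeep is clamped so that
// out-of-range values cannot cause an out-of-bounds slice.
func shift(tokens []int, numKeep int) []int {
	numKeep = clamp(numKeep, 0, len(tokens))
	discard := (len(tokens) - numKeep) / 2
	if discard == 0 && len(tokens) > numKeep {
		discard = 1 // free at least one slot when anything can be dropped
	}
	out := make([]int, 0, len(tokens)-discard)
	out = append(out, tokens[:numKeep]...)
	return append(out, tokens[numKeep+discard:]...)
}

func main() {
	fmt.Println(shift([]int{0, 1, 2, 3, 4, 5, 6, 7}, 2)) // [0 1 5 6 7]
	fmt.Println(shift([]int{0, 1, 2}, 10))               // [0 1 2]
}
```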

readme: add aichat terminal app to community integrations (#7418) · bebef1e5
Joey Zheng authored Nov 12, 2024

bebef1e5
api: fix typos in Go Doc comments (#7620) · d48c1c5a
Evan authored Nov 11, 2024

d48c1c5a

11 Nov, 2024 4 commits
- readme: add GoLamify to community integrations (#7521) · 36a8372b
  Prasad Bhalerao authored Nov 11, 2024
  
  36a8372b
- readme: add browser extension that enables using Ollama for interacting with web pages (#5827) · 4e94227b
  Ivo Stoykov authored Nov 11, 2024
  
  4e94227b
- docs: add mentions of Llama 3.2 (#7517) · 479d5517
  frances720 authored Nov 10, 2024
  
  479d5517
- api: fix typo in python ClientFromEnvironment docs (#7604) · 76b2b723
  Evan authored Nov 10, 2024
  
  76b2b723
10 Nov, 2024 1 commit
- readme: add llama3.2-vision to model list (#7580) · b8d77cde
  Arhan Busam authored Nov 11, 2024
  
  b8d77cde
08 Nov, 2024 3 commits
- runner.go: Check for zero length images · c2e8cbaa
  Jesse Gross authored Nov 06, 2024
```
If we get a request with a zero length image, it will result in
an out-of-bounds error when we pass the data to the image encoder.
```
  c2e8cbaa
- docs: update langchainpy.md with proper model name (#7527) · 771fab1d
  Edward J. Schwartz authored Nov 08, 2024
  
  771fab1d
- Set macos min version for all architectures (#7579) · 3a5239e6
  Daniel Hiltgen authored Nov 08, 2024
  
  3a5239e6
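
For the zero-length image check in c2e8cbaa above, a guard of the following shape would reject the request before the data reaches an encoder. This is a minimal sketch; `validateImage` and `errEmptyImage` are hypothetical names, not identifiers from the runner.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyImage is a hypothetical error for an image payload with no bytes.
var errEmptyImage = errors.New("received zero length image")

// validateImage rejects empty payloads before they are handed to code
// that indexes into the data.
func validateImage(data []byte) error {
	if len(data) == 0 {
		return errEmptyImage
	}
	return nil
}

func main() {
	fmt.Println(validateImage(nil))
	fmt.Println(validateImage([]byte{0xFF, 0xD8}))
}
```
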
07 Nov, 2024 5 commits
- win: remove preview title from installer (#7529) · 3d25e7bf
  Daniel Hiltgen authored Nov 07, 2024
```
This should have been in #7347 but was overlooked.
```
  3d25e7bf
- Workaround buggy P2P ROCm copy on windows (#7466) · 1618700c
  Daniel Hiltgen authored Nov 07, 2024
```
This enables the workaround code only for windows, which should help windows users with multiple AMD GPUs
```
  1618700c
- Debug logging for nvcuda init (#7532) · b111aa5a
  Daniel Hiltgen authored Nov 07, 2024
```
Some users are reporting crashes during nvcuda.dll initialization
on windows.  This should help narrow down where things are going bad.
```
  b111aa5a
- Align rocm compiler flags (#7467) · 9e83e550
  Daniel Hiltgen authored Nov 07, 2024
```
Bring consistency with the old generate script behavior
```
  9e83e550
- Be explicit for gpu library link dir (#7560) · fc2a0715
  Daniel Hiltgen authored Nov 07, 2024
```
On linux nvcc isn't automatically linking to the same cuda version.
```
  fc2a0715
06 Nov, 2024 3 commits

docs: OLLAMA_NEW_RUNNERS no longer exists · 3020d2dc
Jesse Gross authored Nov 06, 2024

3020d2dc

runner.go: Remove unused arguments · a9094176

Jesse Gross authored Oct 30, 2024

Now that server.cpp is gone, we don't need to keep passing arguments
that were ignored and kept only for compatibility.

a9094176

sched: Lift parallel restriction for multimodal models except mllama · 6cd56687

Jesse Gross authored Oct 30, 2024

The Go runner does not have a problem with supporting parallel
requests for most multimodal models. Now that we won't be potentially
falling back to server.cpp, this restriction can be lifted.

However, the new mllama model can't support parallel requests, so we
will need to keep a restriction for that.

6cd56687
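
The resulting rule can be pictured as a small function. This is a sketch under assumptions: `effectiveParallel` and its parameters are hypothetical, and the real scheduler may decide this differently.

```go
package main

import "fmt"

// effectiveParallel returns how many parallel requests to allow for a model.
// Hypothetical helper: mllama is pinned to one request, everything else
// uses the requested value (with a floor of one).
func effectiveParallel(requested int, isMllama bool) int {
	if isMllama || requested < 1 {
		return 1
	}
	return requested
}

func main() {
	fmt.Println(effectiveParallel(4, false), effectiveParallel(4, true)) // 4 1
}
```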

05 Nov, 2024 4 commits

Update README.md (#7516) · 9d71bcc3

RAPID ARCHITECT authored Nov 05, 2024

Added Reddit Rate below Hexabot: Ollama-powered Reddit search and analysis, with Streamlit for the interface.

9d71bcc3

One corrupt manifest should not wedge model operations (#7515) · a4c70fe1

Daniel Hiltgen authored Nov 05, 2024

One potential failure mode is an empty file which bubbles up as an EOF error,
leading to all pulls and listing operations failing. Instead, continue and
warn about the corrupt manifest. This also allows re-pulling the corrupt
manifest to repair the system.

a4c70fe1
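
The "continue and warn" approach can be sketched as follows. The `manifest` struct is a heavily reduced, hypothetical shape and `loadManifests` is not the server's function; the point is only that a parse failure on one entry (an empty input yields `io.EOF` from `json.Decoder`) is logged and skipped instead of aborting the whole listing.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
)

// manifest is a hypothetical, reduced manifest shape for illustration.
type manifest struct {
	SchemaVersion int `json:"schemaVersion"`
}

// loadManifests parses every entry, warning about and skipping any that fail
// to decode, so one corrupt file does not block the rest.
func loadManifests(raw map[string][]byte) map[string]manifest {
	out := make(map[string]manifest)
	for name, data := range raw {
		var m manifest
		if err := json.NewDecoder(bytes.NewReader(data)).Decode(&m); err != nil {
			log.Printf("skipping corrupt manifest %s: %v", name, err)
			continue
		}
		out[name] = m
	}
	return out
}

func main() {
	raw := map[string][]byte{
		"good":  []byte(`{"schemaVersion":2}`),
		"empty": {}, // simulates an empty manifest file
	}
	fmt.Println(len(loadManifests(raw))) // 1
}
```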

prompt: Use a single token when estimating mllama context size · 34a75102

Jesse Gross authored Nov 04, 2024

Currently we assume that images take 768 tokens of context size for
the purposes of clipping old messages that exceed the context window.
However, our mllama implementation stores the full image embedding
in a single token. As a result, there is significant waste of context
space.

Ideally, we would handle this more generically and have the
implementation report the number of tokens. However, at the moment
this would just result in a similar set of 'if' conditions in the
runner plus APIs to report it back. So for now, we just keep this
simple.

34a75102
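
The arithmetic behind this change can be illustrated with a small sketch. The 768-token figure and the single-token mllama case come from the commit message; the `message` type and `estimateTokens` function are hypothetical and exist only for the example.

```go
package main

import "fmt"

// defaultImageTokens is the per-image estimate stated in the commit message.
const defaultImageTokens = 768

// message is a hypothetical chat message: a text token count plus attached images.
type message struct {
	textTokens int
	images     int
}

// estimateTokens returns the context cost of a message, charging one token
// per image for mllama and the default estimate otherwise.
func estimateTokens(m message, isMllama bool) int {
	perImage := defaultImageTokens
	if isMllama {
		perImage = 1
	}
	return m.textTokens + m.images*perImage
}

func main() {
	m := message{textTokens: 20, images: 2}
	fmt.Println(estimateTokens(m, false)) // 1556
	fmt.Println(estimateTokens(m, true))  // 22
}
```

In this example, two images would otherwise reserve 1536 tokens of context that the mllama implementation never uses.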

readme: add Hexabot to the list of community integrations · 4157d1f7
Med Marrouchi authored Nov 05, 2024

4157d1f7

04 Nov, 2024 6 commits
- Quiet down debug log of image payload (#7454) · 4ebfa2cb
  Daniel Hiltgen authored Nov 04, 2024
```
Avoid excessive log spew and make consistent with chat logging
```
  4ebfa2cb
- CI: Switch to v13 macos runner (#7498) · 046054fa
  Daniel Hiltgen authored Nov 04, 2024
  
  046054fa
- CI: matrix strategy fix (#7496) · 95483f34
  Daniel Hiltgen authored Nov 04, 2024
```
Github actions matrix strategy can't access env settings
```
  95483f34
- Merge pull request #7456 from ollama/mxyng/llama3.2-vision-mem · f247a623
  Michael Yang authored Nov 04, 2024
```
update llama3.2 vision memory estimation
```
  f247a623
- Sign windows arm64 official binaries (#7493) · 44bd9e59
  Daniel Hiltgen authored Nov 04, 2024
  
  44bd9e59
- readme: add TextCraft to community integrations (#7377) · 18237be9
  suncloudsmoon authored Nov 03, 2024
  
  18237be9
02 Nov, 2024 1 commit

nvidia libs have inconsistent ordering (#7473) · 29ab9fa7

Daniel Hiltgen authored Nov 02, 2024

The runtime and management libraries may not always have
identical ordering, so use the device UUID to correlate instead of ID.

29ab9fa7
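
Correlating by UUID instead of by index can be sketched as a map lookup. The `device` type, the `correlate` function, and the UUID strings below are hypothetical; the actual discovery code calls into the NVIDIA libraries, which this example does not attempt to reproduce.

```go
package main

import "fmt"

// device is a hypothetical record as reported by one of the two libraries.
type device struct {
	Index int
	UUID  string
}

// correlate maps each runtime device index to the management-library index
// that reports the same UUID. Devices without a match are omitted.
func correlate(runtime, mgmt []device) map[int]int {
	byUUID := make(map[string]int, len(mgmt))
	for _, d := range mgmt {
		byUUID[d.UUID] = d.Index
	}
	out := make(map[int]int, len(runtime))
	for _, d := range runtime {
		if idx, ok := byUUID[d.UUID]; ok {
			out[d.Index] = idx
		}
	}
	return out
}

func main() {
	runtime := []device{{0, "GPU-aaa"}, {1, "GPU-bbb"}}
	mgmt := []device{{0, "GPU-bbb"}, {1, "GPU-aaa"}} // same devices, different order
	fmt.Println(correlate(runtime, mgmt))            // map[0:1 1:0]
}
```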