Commits · d7eb05b9361febead29a74e71ddffc2ebeff5302 · OpenDAS / ollama

12 Nov, 2024 8 commits

runner.go: Fix off-by-one for num predicted · d7eb05b9
Jesse Gross authored Nov 12, 2024

CI: give windows lint more time (#7635) · 636a743c
Daniel Hiltgen authored Nov 12, 2024
```
It looks like 8 minutes isn't quite enough and we're seeing sporadic timeouts
```
Jetpack support for Go server (#7217) · df011054
Daniel Hiltgen authored Nov 12, 2024
```
This adds support for the Jetson JetPack variants into the Go runner
```

doc: capture numeric group requirement (#6941) · ac07160c

Daniel Hiltgen authored Nov 12, 2024

Docker uses the container filesystem for name resolution, so we can't guide users
to use the name of the host group.  Instead they must specify the numeric ID.

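A minimal sketch of the numeric-ID requirement described above. `groupAddArgs` is a hypothetical helper and `992` is an example value; only the `--group-add` flag itself is a real `docker run` option. The helper refuses group names, since a name would be resolved against the container's group file rather than the host's.

```go
package main

import (
	"fmt"
	"strconv"
)

// groupAddArgs is a hypothetical helper that builds the docker argument
// for a supplementary group. It accepts only numeric IDs, because a name
// would be looked up inside the container, not on the host.
func groupAddArgs(gid string) ([]string, error) {
	if _, err := strconv.ParseUint(gid, 10, 32); err != nil {
		return nil, fmt.Errorf("group %q is not a numeric ID: %w", gid, err)
	}
	return []string{"--group-add", gid}, nil
}

func main() {
	// "992" is an example GID; "render" shows the rejected name form.
	for _, g := range []string{"992", "render"} {
		args, err := groupAddArgs(g)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("docker run", args[0], args[1], "...")
	}
}
```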

docs: Capture docker cgroup workaround (#7519) · 6606e424

Daniel Hiltgen authored Nov 12, 2024

GPU support can break on some systems after a while.  This captures a
known workaround to solve the problem.


runner.go: Make KV entry accounting more robust · 65973ceb

Jesse Gross authored Nov 08, 2024

The structure of the accounting for KV cache shifting was carried
over from the old runner but it now doesn't feel natural with the new
runner. There are a number of invariants that should hold true but
are difficult to reason about. There is at least one bug report
that would imply that the invariants are not holding.

This reduces the number of implicit assumptions and is more forgiving
of unexpected situations. It also improves behavior around which input
tokens are kept when truncation occurs.

Bug #7545

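As a rough illustration of the accounting this commit describes, the sketch below trims a token cache by keeping a fixed prefix and discarding the oldest tokens after it. The name `shiftCache`, the `numKeep` parameter, and the discard-half policy are assumptions made for illustration, not the runner's actual code. The point is that out-of-range inputs are clamped rather than trusted.

```go
package main

import "fmt"

// shiftCache is a hypothetical helper: it keeps the first numKeep tokens,
// discards the older half of the remainder, and returns the new slice.
// numKeep is clamped instead of assuming callers always pass a valid value.
func shiftCache(tokens []int, numKeep int) []int {
	if numKeep < 0 {
		numKeep = 0
	}
	if numKeep > len(tokens) {
		numKeep = len(tokens)
	}
	discard := (len(tokens) - numKeep) / 2
	if discard == 0 {
		return tokens // nothing can be discarded
	}
	out := make([]int, 0, len(tokens)-discard)
	out = append(out, tokens[:numKeep]...)
	out = append(out, tokens[numKeep+discard:]...)
	return out
}

func main() {
	cache := []int{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(shiftCache(cache, 2))
}
```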

readme: add aichat terminal app to community integrations (#7418) · bebef1e5
Joey Zheng authored Nov 12, 2024

api: fix typos in Go Doc comments (#7620) · d48c1c5a
Evan authored Nov 11, 2024


11 Nov, 2024 4 commits
- readme: add GoLamify to community integrations (#7521) · 36a8372b
  Prasad Bhalerao authored Nov 11, 2024
  
- readme: add browser extension that enables using Ollama for interacting with web pages (#5827) · 4e94227b
  Ivo Stoykov authored Nov 11, 2024
  
- docs: add mentions of Llama 3.2 (#7517) · 479d5517
  frances720 authored Nov 10, 2024
  
- api: fix typo in python ClientFromEnvironment docs (#7604) · 76b2b723
  Evan authored Nov 10, 2024
  
10 Nov, 2024 1 commit
- readme: add llama3.2-vision to model list (#7580) · b8d77cde
  Arhan Busam authored Nov 11, 2024
  
08 Nov, 2024 3 commits
- runner.go: Check for zero length images · c2e8cbaa
  Jesse Gross authored Nov 06, 2024
```
If we get a request with a zero length image, it will result in
an out-of-bounds error when we pass the data to the image encoder.
```
- docs: update langchainpy.md with proper model name (#7527) · 771fab1d
  Edward J. Schwartz authored Nov 08, 2024
  
- Set macos min version for all architectures (#7579) · 3a5239e6
  Daniel Hiltgen authored Nov 08, 2024
  
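For the zero-length image check in c2e8cbaa listed above, a guard of the following shape would avoid the out-of-bounds access. This is a sketch only: `imageHeader` is a hypothetical stand-in for an encoder that reads the start of the payload, not the runner's real function.

```go
package main

import (
	"errors"
	"fmt"
)

var errEmptyImage = errors.New("image data is empty")

// imageHeader is a hypothetical stand-in for an encoder that reads the
// first byte of the payload. Without the length check, data[0] would
// panic on an empty slice.
func imageHeader(data []byte) (byte, error) {
	if len(data) == 0 {
		return 0, errEmptyImage
	}
	return data[0], nil
}

func main() {
	if _, err := imageHeader(nil); err != nil {
		fmt.Println("rejected:", err)
	}
	b, _ := imageHeader([]byte{0x89, 0x50})
	fmt.Printf("first byte: %#x\n", b)
}
```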
07 Nov, 2024 5 commits
- win: remove preview title from installer (#7529) · 3d25e7bf
  Daniel Hiltgen authored Nov 07, 2024
```
This should have been in #7347 but was overlooked.
```
- Workaround buggy P2P ROCm copy on windows (#7466) · 1618700c
  Daniel Hiltgen authored Nov 07, 2024
```
This enables the workaround code only for Windows, which should help Windows users with multiple AMD GPUs
```
- Debug logging for nvcuda init (#7532) · b111aa5a
  Daniel Hiltgen authored Nov 07, 2024
```
Some users are reporting crashes during nvcuda.dll initialization
on windows.  This should help narrow down where things are going bad.
```
- Align rocm compiler flags (#7467) · 9e83e550
  Daniel Hiltgen authored Nov 07, 2024
```
Bring consistency with the old generate script behavior
```
- Be explicit for gpu library link dir (#7560) · fc2a0715
  Daniel Hiltgen authored Nov 07, 2024
```
On Linux, nvcc isn't automatically linking to the same CUDA version.
```
06 Nov, 2024 3 commits

docs: OLLAMA_NEW_RUNNERS no longer exists · 3020d2dc
Jesse Gross authored Nov 06, 2024


runner.go: Remove unused arguments · a9094176

Jesse Gross authored Oct 30, 2024

Now that server.cpp is gone, we don't need to keep passing arguments
that were only ignored and only kept for compatibility.


sched: Lift parallel restriction for multimodal models except mllama · 6cd56687

Jesse Gross authored Oct 30, 2024

The Go runner does not have a problem with supporting parallel
requests for most multimodal models. Now that we won't be potentially
falling back to server.cpp, this restriction can be lifted.

However, the new mllama model can't support parallel requests, so we
will need to keep a restriction for that.

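The rule in this commit message can be pictured as a small decision function. The sketch below is illustrative only: `effectiveParallel` and the use of the string `"mllama"` as an architecture identifier are assumptions, not the scheduler's actual code.

```go
package main

import "fmt"

// effectiveParallel is a hypothetical helper: it returns how many parallel
// requests to allow for a model, forcing 1 for mllama and passing the
// requested value through for everything else.
func effectiveParallel(arch string, requested int) int {
	if requested < 1 {
		return 1
	}
	if arch == "mllama" { // assumed architecture identifier
		return 1
	}
	return requested
}

func main() {
	fmt.Println(effectiveParallel("llava", 4))
	fmt.Println(effectiveParallel("mllama", 4))
}
```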

05 Nov, 2024 4 commits

Update README.md (#7516) · 9d71bcc3

RAPID ARCHITECT authored Nov 05, 2024

Added Reddit Rate below Hexabot: Ollama-powered Reddit search and analysis, with Streamlit for the interface.


One corrupt manifest should not wedge model operations (#7515) · a4c70fe1

Daniel Hiltgen authored Nov 05, 2024

One potential failure mode is an empty file which bubbles up as an EOF error,
leading to all pulls and listing operations failing. Instead, continue and
warn about the corrupt manifest. This also allows re-pulling the corrupt
manifest to repair the system.

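A sketch of the "continue and warn" behavior described above, assuming manifests are JSON documents. The `manifest` type, the `loadManifests` function, and the in-memory map standing in for files on disk are all hypothetical. Decoding an empty payload with `json.Decoder` yields `io.EOF`, which is the failure mode the message mentions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
)

// manifest is a hypothetical, minimal manifest shape.
type manifest struct {
	SchemaVersion int `json:"schemaVersion"`
}

// loadManifests parses every entry, logging a warning for and skipping any
// entry that fails to decode, so one bad file does not fail the whole listing.
func loadManifests(raw map[string][]byte) map[string]manifest {
	out := make(map[string]manifest)
	for name, data := range raw {
		var m manifest
		if err := json.NewDecoder(bytes.NewReader(data)).Decode(&m); err != nil {
			log.Printf("warning: skipping corrupt manifest %s: %v", name, err)
			continue
		}
		out[name] = m
	}
	return out
}

func main() {
	raw := map[string][]byte{
		"good":  []byte(`{"schemaVersion": 2}`),
		"empty": {}, // decodes to io.EOF
	}
	fmt.Println(len(loadManifests(raw)), "usable manifest(s)")
}
```

Because the corrupt entry is skipped rather than returned as an error, a later pull of the same model can overwrite the bad file.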

prompt: Use a single token when estimating mllama context size · 34a75102

Jesse Gross authored Nov 04, 2024

Currently we assume that images take 768 tokens of context size for
the purposes of clipping old messages that exceed the context window.
However, our mllama implementation stores the full image embedding
in a single token. As a result, there is significant waste of context
space.

Ideally, we would handle this more generically and have the
implementation report the number of tokens. However, at the moment
this would just result in a similar set of 'if' conditions in the
runner plus APIs to report it back. So for now, we just keep this
simple.

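The effect of the change can be seen with a small estimate. The 768-token figure and the single-token mllama case come from the message above; the `message` type, the function names, and the `"mllama"` identifier are assumptions made for this sketch.

```go
package main

import "fmt"

// message is a hypothetical summary of one chat message.
type message struct {
	TextTokens int
	Images     int
}

// imageCost returns the assumed context cost of one image: a single token
// for mllama, 768 tokens otherwise.
func imageCost(arch string) int {
	if arch == "mllama" {
		return 1
	}
	return 768
}

// estimateContext sums text tokens plus the per-image cost over all messages.
func estimateContext(msgs []message, arch string) int {
	total := 0
	for _, m := range msgs {
		total += m.TextTokens + m.Images*imageCost(arch)
	}
	return total
}

func main() {
	msgs := []message{{TextTokens: 100, Images: 1}, {TextTokens: 50}}
	fmt.Println("generic estimate:", estimateContext(msgs, "llava"))
	fmt.Println("mllama estimate: ", estimateContext(msgs, "mllama"))
}
```

With the smaller estimate, fewer old messages need to be clipped to fit the same context window.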

readme: add Hexabot to the list of community integrations · 4157d1f7
Med Marrouchi authored Nov 05, 2024


04 Nov, 2024 6 commits
- Quiet down debug log of image payload (#7454) · 4ebfa2cb
  Daniel Hiltgen authored Nov 04, 2024
```
Avoid excessive log spew and make consistent with chat logging
```
- CI: Switch to v13 macos runner (#7498) · 046054fa
  Daniel Hiltgen authored Nov 04, 2024
  
- CI: matrix strategy fix (#7496) · 95483f34
  Daniel Hiltgen authored Nov 04, 2024
```
GitHub Actions matrix strategy can't access env settings
```
- Merge pull request #7456 from ollama/mxyng/llama3.2-vision-mem · f247a623
  Michael Yang authored Nov 04, 2024
```
update llama3.2 vision memory estimation
```
- Sign windows arm64 official binaries (#7493) · 44bd9e59
  Daniel Hiltgen authored Nov 04, 2024
  
- readme: add TextCraft to community integrations (#7377) · 18237be9
  suncloudsmoon authored Nov 03, 2024
  
02 Nov, 2024 4 commits

nvidia libs have inconsistent ordering (#7473) · 29ab9fa7

Daniel Hiltgen authored Nov 02, 2024

The runtime and management libraries may not always have
identical ordering, so use the device UUID to correlate instead of ID.

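Correlating by UUID rather than by index could look roughly like the following. The two device structs and the `correlate` function are hypothetical; they only illustrate joining two listings whose ordering differs.

```go
package main

import "fmt"

// runtimeDevice and mgmtDevice are hypothetical views of the same GPUs as
// reported by two different libraries, each with its own ordering.
type runtimeDevice struct {
	Index int
	UUID  string
}

type mgmtDevice struct {
	Index      int
	UUID       string
	FreeMemory uint64
}

// correlate maps each runtime index to the management entry that has the
// same UUID, ignoring the management library's own index.
func correlate(rt []runtimeDevice, mg []mgmtDevice) (map[int]mgmtDevice, error) {
	byUUID := make(map[string]mgmtDevice, len(mg))
	for _, d := range mg {
		byUUID[d.UUID] = d
	}
	out := make(map[int]mgmtDevice, len(rt))
	for _, d := range rt {
		m, ok := byUUID[d.UUID]
		if !ok {
			return nil, fmt.Errorf("no management entry for device %s", d.UUID)
		}
		out[d.Index] = m
	}
	return out, nil
}

func main() {
	rt := []runtimeDevice{{0, "GPU-aaa"}, {1, "GPU-bbb"}}
	mg := []mgmtDevice{{0, "GPU-bbb", 8 << 30}, {1, "GPU-aaa", 24 << 30}}
	m, err := correlate(rt, mg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("runtime device 0 free bytes:", m[0].FreeMemory)
}
```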

CI: omit unused tools for faster release builds (#7432) · b8d5036e

Daniel Hiltgen authored Nov 02, 2024

This leverages caching, and some reduced installer scope to try
to speed up builds. It also tidies up some windows build logic
that was only relevant for the older generate/cmake builds.


llama: Improve error handling · 312d9de1

Jesse Gross authored Nov 01, 2024

Check for NULL return values from llama.cpp in more places and
convert them into Go errors, which should make debugging easier
in the future rather than having hidden surprises in our data
structures.

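The pattern described here, checking a possibly-NULL result at the boundary and turning it into an error, is sketched below in pure Go. `loadRaw` is a hypothetical stand-in for a C call that can return NULL; the real code goes through cgo, which this sketch deliberately avoids so it stays self-contained.

```go
package main

import "fmt"

// model is a hypothetical handle type.
type model struct {
	path string
}

// loadRaw stands in for a C function that returns NULL (nil here) on failure.
func loadRaw(path string) *model {
	if path == "" {
		return nil
	}
	return &model{path: path}
}

// loadModel checks for the nil result immediately and reports it as a Go
// error, so a nil handle never ends up stored in other data structures.
func loadModel(path string) (*model, error) {
	m := loadRaw(path)
	if m == nil {
		return nil, fmt.Errorf("unable to load model: %q", path)
	}
	return m, nil
}

func main() {
	if _, err := loadModel(""); err != nil {
		fmt.Println("error:", err)
	}
}
```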

runner.go: Only allocate 1 element embedding batches for mllama · a103dae0

Jesse Gross authored Nov 01, 2024

Mllama has large embeddings (100 MB per image) and each embedding is
represented as 1 token when passed to llama.cpp. Batches are pre-
allocated for the size of the tokens times the batch size, so this
results in allocations of over 50 GB at the default batch size.
On some systems, these mallocs will fail.

Since an image is represented as a single token and mllama doesn't
support more than 1 image per request, we only need to allocate a
batch size of 1, which is much more reasonable. In addition, for
non-multimodal models, we don't need to allocate the embedding
batches at all.

Fixes #7464

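The arithmetic behind the "over 50 GB" figure can be checked with a short calculation. The 100 MB embedding size comes from the message above; the batch size of 512 is an assumption, since the message does not state the default, and decimal units (1 GB = 10^9 bytes) are used for simplicity.

```go
package main

import "fmt"

const (
	embeddingBytes   int64 = 100 * 1000 * 1000 // about 100 MB per image, from the commit message
	assumedBatchSize int64 = 512               // assumed default; not stated in the message
)

// batchBytes returns the memory needed to pre-allocate an embedding batch
// holding batchSize embeddings of embBytes each.
func batchBytes(embBytes, batchSize int64) int64 {
	return embBytes * batchSize
}

func main() {
	full := batchBytes(embeddingBytes, assumedBatchSize)
	single := batchBytes(embeddingBytes, 1)
	fmt.Printf("batch of %d: %.1f GB\n", assumedBatchSize, float64(full)/1e9)
	fmt.Printf("batch of 1: %.1f GB\n", float64(single)/1e9)
}
```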

01 Nov, 2024 2 commits
- refactor kv estimation · d07cf41a
  Michael Yang authored Oct 31, 2024
  
- mllama cross attention · 8c238e70
  Michael Yang authored Oct 31, 2024
  