Commits · 76b2b723b2a365c7e9e66bf22492760e0bc4ff5a · OpenDAS / ollama

11 Nov, 2024 1 commit
- api: fix typo in python ClientFromEnvironment docs (#7604) · 76b2b723
  Evan authored Nov 10, 2024
  
  76b2b723
10 Nov, 2024 1 commit
- readme: add llama3.2-vision to model list (#7580) · b8d77cde
  Arhan Busam authored Nov 11, 2024
  
  b8d77cde
08 Nov, 2024 3 commits
- runner.go: Check for zero length images · c2e8cbaa
  Jesse Gross authored Nov 06, 2024
```
If we get a request with a zero length image, it will result in
an out-of-bounds error when we pass the data to the image encoder.
```
  c2e8cbaa
- docs: update langchainpy.md with proper model name (#7527) · 771fab1d
  Edward J. Schwartz authored Nov 08, 2024
  
  771fab1d
- Set macos min version for all architectures (#7579) · 3a5239e6
  Daniel Hiltgen authored Nov 08, 2024
  
  3a5239e6
07 Nov, 2024 5 commits
- win: remove preview title from installer (#7529) · 3d25e7bf
  Daniel Hiltgen authored Nov 07, 2024
```
This should have been in #7347 but was overlooked.
```
  3d25e7bf
- Workaround buggy P2P ROCm copy on windows (#7466) · 1618700c
  Daniel Hiltgen authored Nov 07, 2024
```
This enables the workaround code only on Windows, which should help Windows users with multiple AMD GPUs.
```
  1618700c
- Debug logging for nvcuda init (#7532) · b111aa5a
  Daniel Hiltgen authored Nov 07, 2024
```
Some users are reporting crashes during nvcuda.dll initialization
on windows.  This should help narrow down where things are going bad.
```
  b111aa5a
- Align rocm compiler flags (#7467) · 9e83e550
  Daniel Hiltgen authored Nov 07, 2024
```
Bring consistency with the old generate script behavior
```
  9e83e550
- Be explicit for gpu library link dir (#7560) · fc2a0715
  Daniel Hiltgen authored Nov 07, 2024
```
On linux nvcc isn't automatically linking to the same cuda version.
```
  fc2a0715
06 Nov, 2024 3 commits

docs: OLLAMA_NEW_RUNNERS no longer exists · 3020d2dc
Jesse Gross authored Nov 06, 2024

3020d2dc

runner.go: Remove unused arguments · a9094176

Jesse Gross authored Oct 30, 2024

Now that server.cpp is gone, we don't need to keep passing arguments
that were only ignored and only kept for compatibility.

a9094176

sched: Lift parallel restriction for multimodal models except mllama · 6cd56687

Jesse Gross authored Oct 30, 2024

The Go runner does not have a problem with supporting parallel
requests for most multimodal models. Now that we won't be potentially
falling back to server.cpp, this restriction can be lifted.

However, the new mllama model can't support parallel requests, so we
will need to keep a restriction for that.

6cd56687

05 Nov, 2024 4 commits

Update README.md (#7516) · 9d71bcc3

RAPID ARCHITECT authored Nov 05, 2024

Added Reddit Rate below Hexabot: Ollama-powered Reddit search and analysis, with Streamlit for the interface.

9d71bcc3

One corrupt manifest should not wedge model operations (#7515) · a4c70fe1

Daniel Hiltgen authored Nov 05, 2024

One potential failure mode is an empty file which bubbles up as an EOF error,
leading to all pulls and listing operations failing. Instead, continue and
warn about the corrupt manifest. This also allows re-pulling the corrupt
manifest to repair the system.

a4c70fe1
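
The "continue and warn" behaviour described in a4c70fe1 might look roughly like the sketch below. It is not the actual Ollama code: `Manifest` and `loadManifests` are hypothetical, and manifests are passed as in-memory bytes rather than read from disk. An empty file decodes to an EOF error, which is logged and skipped instead of failing the whole listing.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
)

// Manifest is a hypothetical, heavily simplified manifest.
type Manifest struct {
	SchemaVersion int `json:"schemaVersion"`
}

// loadManifests decodes every raw manifest. A manifest that fails to
// decode (for example an empty file, which yields io.EOF) is logged and
// skipped so that one bad entry cannot block all the others.
func loadManifests(raw map[string][]byte) map[string]Manifest {
	out := make(map[string]Manifest)
	for name, data := range raw {
		var m Manifest
		if err := json.NewDecoder(bytes.NewReader(data)).Decode(&m); err != nil {
			log.Printf("warning: skipping bad manifest %q: %v", name, err)
			continue
		}
		out[name] = m
	}
	return out
}

func main() {
	raw := map[string][]byte{
		"good":  []byte(`{"schemaVersion": 2}`),
		"empty": []byte{},
	}
	fmt.Println(len(loadManifests(raw)), "usable manifest(s)")
}
```

Because the corrupt entry is skipped rather than treated as fatal, a later pull can overwrite it and repair the store.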

prompt: Use a single token when estimating mllama context size · 34a75102

Jesse Gross authored Nov 04, 2024

Currently we assume that images take 768 tokens of context size for
the purposes of clipping old messages that exceed the context window.
However, our mllama implementation stores the full image embedding
in a single token. As a result, there is significant waste of context
space.

Ideally, we would handle this more generically and have the
implementation report the number of tokens. However, at the moment
this would just result in a similar set of 'if' conditions in the
runner plus APIs to report it back. So for now, we just keep this
simple.

34a75102
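
A minimal sketch of the estimate described in 34a75102, assuming a hypothetical `Message` type and helper names. The 768 and 1 token figures come from the commit message; everything else is illustrative.

```go
package main

import "fmt"

// Message is a hypothetical chat message: text tokens plus attached images.
type Message struct {
	TextTokens int
	Images     int
}

// imageTokens returns the context cost assumed per image: 768 tokens in
// general, but 1 for mllama, which stores the full image embedding in a
// single token.
func imageTokens(isMllama bool) int {
	if isMllama {
		return 1
	}
	return 768
}

// estimateTokens sums the estimated context usage of all messages.
func estimateTokens(msgs []Message, isMllama bool) int {
	total := 0
	for _, m := range msgs {
		total += m.TextTokens + m.Images*imageTokens(isMllama)
	}
	return total
}

func main() {
	msgs := []Message{{TextTokens: 100, Images: 1}, {TextTokens: 50}}
	fmt.Println(estimateTokens(msgs, false)) // 918
	fmt.Println(estimateTokens(msgs, true))  // 151
}
```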

readme: add Hexabot to the list of community integrations · 4157d1f7
Med Marrouchi authored Nov 05, 2024

4157d1f7

04 Nov, 2024 6 commits
- Quiet down debug log of image payload (#7454) · 4ebfa2cb
  Daniel Hiltgen authored Nov 04, 2024
```
Avoid excessive log spew and make consistent with chat logging
```
  4ebfa2cb
- CI: Switch to v13 macos runner (#7498) · 046054fa
  Daniel Hiltgen authored Nov 04, 2024
  
  046054fa
- CI: matrix strategy fix (#7496) · 95483f34
  Daniel Hiltgen authored Nov 04, 2024
```
GitHub Actions matrix strategy can't access env settings
```
  95483f34
- Merge pull request #7456 from ollama/mxyng/llama3.2-vision-mem · f247a623
  Michael Yang authored Nov 04, 2024
```
update llama3.2 vision memory estimation
```
  f247a623
- Sign windows arm64 official binaries (#7493) · 44bd9e59
  Daniel Hiltgen authored Nov 04, 2024
  
  44bd9e59
- readme: add TextCraft to community integrations (#7377) · 18237be9
  suncloudsmoon authored Nov 03, 2024
  
  18237be9
02 Nov, 2024 4 commits

nvidia libs have inconsistent ordering (#7473) · 29ab9fa7

Daniel Hiltgen authored Nov 02, 2024

The runtime and management libraries may not always have
identical ordering, so use the device UUID to correlate instead of ID.

29ab9fa7
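
Correlating by UUID rather than by ID, as described in 29ab9fa7, can be sketched as below. The `Device` type, `correlate` function, and UUID strings are hypothetical and do not reflect the real library bindings.

```go
package main

import "fmt"

// Device is a hypothetical record as reported by one of the two libraries.
type Device struct {
	Index int
	UUID  string
}

// correlate maps each runtime-library index to the management-library
// index of the same physical device, matching on UUID because the two
// libraries may enumerate devices in different orders.
func correlate(runtime, mgmt []Device) map[int]int {
	byUUID := make(map[string]int, len(mgmt))
	for _, d := range mgmt {
		byUUID[d.UUID] = d.Index
	}
	out := make(map[int]int, len(runtime))
	for _, d := range runtime {
		if idx, ok := byUUID[d.UUID]; ok {
			out[d.Index] = idx
		}
	}
	return out
}

func main() {
	runtime := []Device{{Index: 0, UUID: "GPU-aaa"}, {Index: 1, UUID: "GPU-bbb"}}
	mgmt := []Device{{Index: 0, UUID: "GPU-bbb"}, {Index: 1, UUID: "GPU-aaa"}}
	fmt.Println(correlate(runtime, mgmt)) // map[0:1 1:0]
}
```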

CI: omit unused tools for faster release builds (#7432) · b8d5036e

Daniel Hiltgen authored Nov 02, 2024

This leverages caching, and some reduced installer scope to try
to speed up builds. It also tidies up some windows build logic
that was only relevant for the older generate/cmake builds.

b8d5036e

llama: Improve error handling · 312d9de1

Jesse Gross authored Nov 01, 2024

Check for NULL return values from llama.cpp in more places and
convert them into Go errors, which should make debugging easier
in the future rather than having hidden surprises in our data
structures.

312d9de1

runner.go: Only allocate 1 element embedding batches for mllama · a103dae0

Jesse Gross authored Nov 01, 2024

Mllama has large embeddings (100 MB per image) and each embedding is
represented as 1 token when passed to llama.cpp. Batches are
pre-allocated for the size of the tokens times the batch size, so this
results in allocations of over 50 GB at the default batch size.
On some systems, these mallocs will fail.

Since an image is represented as a single token and mllama doesn't
support more than 1 image per request, we only need to allocate a
batch size of 1, which is much more reasonable. In addition, for
non-multimodal models, we don't need to allocate the embedding
batches at all.

Fixes #7464

a103dae0
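
The arithmetic behind a103dae0 can be worked through as a small sketch. The 100 MB figure is from the commit message; the batch size of 512 is an assumption about the default, and `embedBatchSize` is a hypothetical helper, not a function from runner.go.

```go
package main

import "fmt"

const (
	embeddingMB      = 100 // per-image embedding size, from the commit message
	assumedBatchSize = 512 // assumed default batch size (not stated in the commit)
)

// embedBatchSize returns how many embedding slots to pre-allocate:
// none for non-multimodal models, one for mllama (one image per request,
// one token per image), and the full batch size otherwise.
func embedBatchSize(multimodal, mllama bool, batchSize int) int {
	switch {
	case !multimodal:
		return 0
	case mllama:
		return 1
	default:
		return batchSize
	}
}

func main() {
	// Before the change: one embedding-sized slot per batch entry.
	fmt.Println(embeddingMB*assumedBatchSize, "MB") // 51200 MB
	// After the change: a single slot for mllama.
	fmt.Println(embeddingMB*embedBatchSize(true, true, assumedBatchSize), "MB") // 100 MB
}
```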

01 Nov, 2024 3 commits
- refactor kv estimation · d07cf41a
  Michael Yang authored Oct 31, 2024
  
  d07cf41a
- mllama cross attention · 8c238e70
  Michael Yang authored Oct 31, 2024
  
  8c238e70
- Add basic mllama integration tests (#7455) · 8a9bb0d0
  Daniel Hiltgen authored Oct 31, 2024
  
  8a9bb0d0
31 Oct, 2024 2 commits

runner.go: Don't set cross attention before sending embeddings · 26acdcf4

Jesse Gross authored Oct 31, 2024

Currently if an input has embeddings at any point then we will set
cross attention to true from the beginning. This means that any
tokens before the embeddings are sent will incorrectly have cross
attention layers applied.

This only sets cross attention when we have an embedding, either
previously in this sequence or in the cache. It also makes cross
attention capable of supporting parallelism at the runner level,
though the mllama implementation doesn't support that yet.

26acdcf4
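
The rule described in 26acdcf4 (enable cross attention only once an embedding has been seen, either earlier in the sequence or in the cache) can be sketched as below. `Input` and `crossAttnFlags` are hypothetical and much simpler than the runner's real data structures.

```go
package main

import "fmt"

// Input is a hypothetical prompt element: either a token or an embedding.
type Input struct {
	Token int
	Embed []float32
}

// crossAttnFlags reports, for each input in order, whether cross attention
// should be enabled when that input is processed. It becomes true only
// from the first embedding onward, or from the start if an embedding is
// already present in the cache.
func crossAttnFlags(inputs []Input, embedInCache bool) []bool {
	seen := embedInCache
	flags := make([]bool, len(inputs))
	for i, in := range inputs {
		if in.Embed != nil {
			seen = true
		}
		flags[i] = seen
	}
	return flags
}

func main() {
	inputs := []Input{{Token: 1}, {Embed: []float32{0.5}}, {Token: 2}}
	fmt.Println(crossAttnFlags(inputs, false)) // [false true true]
}
```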

Give unicode test more time to run (#7437) · 921779bb

Daniel Hiltgen authored Oct 31, 2024

* Give unicode test more time to run

Some slower GPUs (or partial CPU/GPU loads) can take more than the default 30s to complete this test

* Give more time for concurrency test

CPU inference can be very slow under stress

921779bb

30 Oct, 2024 6 commits

Refine default thread selection for NUMA systems (#7322) · 16f4eabe

Daniel Hiltgen authored Oct 30, 2024

Until we have full NUMA support, this adjusts the default thread selection
algorithm to count up the number of performance cores across all sockets.

16f4eabe
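
One way to read the thread selection in 16f4eabe is as a sum of performance cores over all sockets. The sketch below assumes a hypothetical `CPU` type in which performance cores are total cores minus efficiency cores; the real discovery code and field names may differ.

```go
package main

import "fmt"

// CPU is a hypothetical description of one socket.
type CPU struct {
	ID                  string
	CoreCount           int
	EfficiencyCoreCount int
}

// defaultThreads counts performance cores across all sockets, rather than
// looking at the first socket only.
func defaultThreads(cpus []CPU) int {
	threads := 0
	for _, c := range cpus {
		threads += c.CoreCount - c.EfficiencyCoreCount
	}
	return threads
}

func main() {
	cpus := []CPU{
		{ID: "0", CoreCount: 16},
		{ID: "1", CoreCount: 16},
	}
	fmt.Println(defaultThreads(cpus)) // 32
}
```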

runner.go: Better abstract vision model integration · c826e574

Jesse Gross authored Oct 11, 2024



- Update mllama to take the cross attention state as embeddings in
  a batch, more similar to how Llava handles it. This improves
  integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract interface to vision models so the main runner accesses Clip
  and Mllama similarly.

Co-authored-by: Michael Yang <mxyng@pm.me>

c826e574

Soften windows clang requirement (#7428) · 712e99d4

Daniel Hiltgen authored Oct 30, 2024

This will no longer error if built with regular gcc on windows.  To help
triage issues that may come in related to different compilers, the runner now
reports the compiler used by cgo.

712e99d4

Remove submodule and shift to Go server - 0.4.0 (#7157) · b754f5a6

Daniel Hiltgen authored Oct 30, 2024

* Remove llama.cpp submodule and shift new build to top

* CI: install msys and clang gcc on win

Needed for deepseek to work properly on windows

b754f5a6

Move windows app out of preview (#7347) · a805e594
Daniel Hiltgen authored Oct 30, 2024

a805e594

windows: Support alt install paths, fit and finish (#6967) · 91dfbb1b

Daniel Hiltgen authored Oct 30, 2024

* windows: Support alt install paths

Advanced users are leveraging innosetup's /DIR switch to target
an alternate location, but we get confused by things not existing in the LocalAppData dir.
This also hardens the server path lookup code for a future attempt to unify with a ./bin prefix

* Fit and finish improvements for windows app

Document alternate install location instructions for binaries and model.
Pop up progress UI for upgrades (automatic, with cancel button).
Expose non-default port in menu to disambiguate multiple instances.
Set minimum Windows version to 10 22H2

91dfbb1b

29 Oct, 2024 2 commits

add more tests for getting the optimal tiled canvas (#7411) · db1842b9
Patrick Devine authored Oct 29, 2024

db1842b9

Switch windows to clang (#7407) · c9ca3861

Daniel Hiltgen authored Oct 29, 2024

* Switch over to clang for deepseek on windows

The patch for deepseek requires clang on windows. gcc on windows
has a buggy c++ library and can't handle the unicode characters

* Fail fast with wrong compiler on windows

Avoid users mistakenly building with GCC when we need clang

c9ca3861