Commits · a4c70fe157477fc25940a0ff1f544632464f2e77 · OpenDAS / ollama

05 Nov, 2024 3 commits

One corrupt manifest should not wedge model operations (#7515) · a4c70fe1

Daniel Hiltgen authored Nov 05, 2024

One potential failure mode is an empty file which bubbles up as an EOF error,
leading to all pulls and listing operations failing. Instead, continue and
warn about the corrupt manifest. This also allows re-pulling the corrupt
manifest to repair the system.

a4c70fe1

prompt: Use a single token when estimating mllama context size · 34a75102

Jesse Gross authored Nov 04, 2024

Currently we assume that images take 768 tokens of context size for
the purposes of clipping old messages that exceed the context window.
However, our mllama implementation stores the full image embedding
in a single token. As a result, there is significant waste of context
space.

Ideally, we would handle this more generically and have the
implementation report the number of tokens. However, at the moment
this would just result in a similar set of 'if' conditions in the
runner plus APIs to report it back. So for now, we just keep this
simple.

34a75102

readme: add Hexabot to the list of community integrations · 4157d1f7
Med Marrouchi authored Nov 05, 2024

4157d1f7

04 Nov, 2024 6 commits
- Quiet down debug log of image payload (#7454) · 4ebfa2cb
  Daniel Hiltgen authored Nov 04, 2024
```
Avoid excessive log spew and make consistent with chat logging
```
  4ebfa2cb
- CI: Switch to v13 macos runner (#7498) · 046054fa
  Daniel Hiltgen authored Nov 04, 2024
  
  046054fa
- CI: matrix strategy fix (#7496) · 95483f34
  Daniel Hiltgen authored Nov 04, 2024
```
GitHub Actions matrix strategy can't access env settings
```
  95483f34
- Merge pull request #7456 from ollama/mxyng/llama3.2-vision-mem · f247a623
  Michael Yang authored Nov 04, 2024
```
update llama3.2 vision memory estimation
```
  f247a623
- Sign windows arm64 official binaries (#7493) · 44bd9e59
  Daniel Hiltgen authored Nov 04, 2024
  
  44bd9e59
- readme: add TextCraft to community integrations (#7377) · 18237be9
  suncloudsmoon authored Nov 03, 2024
  
  18237be9
02 Nov, 2024 4 commits

nvidia libs have inconsistent ordering (#7473) · 29ab9fa7

Daniel Hiltgen authored Nov 02, 2024

The runtime and management libraries may not always have
identical ordering, so use the device UUID to correlate instead of ID.

29ab9fa7
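
Correlating by UUID can be sketched as a map lookup. This assumes each library reports its own device list with a UUID per device; the `Device` type and `correlate` function are hypothetical and not part of the CUDA or NVML APIs.

```go
package main

import "fmt"

// Device is a hypothetical record reported by one library (runtime or
// management), each with its own index ordering.
type Device struct {
	Index int
	UUID  string
	Name  string
}

// correlate maps each runtime index to the management-library index of the
// device with the same UUID. Devices without a UUID match are left out
// rather than guessed by position.
func correlate(runtime, mgmt []Device) map[int]int {
	byUUID := make(map[string]int, len(mgmt))
	for _, d := range mgmt {
		byUUID[d.UUID] = d.Index
	}
	out := make(map[int]int, len(runtime))
	for _, d := range runtime {
		if idx, ok := byUUID[d.UUID]; ok {
			out[d.Index] = idx
		}
	}
	return out
}

func main() {
	runtime := []Device{{0, "GPU-aaa", "A100"}, {1, "GPU-bbb", "T4"}}
	mgmt := []Device{{0, "GPU-bbb", "T4"}, {1, "GPU-aaa", "A100"}}
	fmt.Println(correlate(runtime, mgmt)) // map[0:1 1:0]
}
```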

CI: omit unused tools for faster release builds (#7432) · b8d5036e

Daniel Hiltgen authored Nov 02, 2024

This leverages caching and a reduced installer scope to speed up
builds. It also tidies up Windows build logic that was only relevant
to the older generate/cmake builds.

b8d5036e

llama: Improve error handling · 312d9de1

Jesse Gross authored Nov 01, 2024

Check for NULL return values from llama.cpp in more places and
convert them into Go errors, which should make debugging easier
in the future rather than having hidden surprises in our data
structures.

312d9de1

runner.go: Only allocate 1 element embedding batches for mllama · a103dae0

Jesse Gross authored Nov 01, 2024

Mllama has large embeddings (100 MB per image) and each embedding is
represented as 1 token when passed to llama.cpp. Batches are
pre-allocated for the size of the tokens times the batch size, so this
results in allocations of over 50 GB at the default batch size.
On some systems, these mallocs will fail.

Since an image is represented as a single token and mllama doesn't
support more than 1 image per request, we only need to allocate a
batch size of 1, which is much more reasonable. In addition, for
non-multimodal models, we don't need to allocate the embedding
batches at all.

Fixes #7464

a103dae0
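
The arithmetic and the new sizing rule, sketched under two assumptions: a default batch size of 512, and the commit's figure of roughly 100 MB per mllama image embedding. `ModelKind` and `embedBatchSize` are hypothetical names.

```go
package main

import "fmt"

// ModelKind is a hypothetical classification of how a model consumes images.
type ModelKind int

const (
	TextOnly ModelKind = iota // no image embeddings at all
	Llava                     // an image expands to many embedding tokens
	Mllama                    // a whole image embedding is one token
)

// embedBatchSize returns how many embedding slots to pre-allocate per batch.
func embedBatchSize(kind ModelKind, batchSize int) int {
	switch kind {
	case TextOnly:
		return 0 // no embedding batch needed
	case Mllama:
		return 1 // one image per request, one token per image
	default:
		return batchSize
	}
}

func main() {
	const perImageMB = 100 // approximate mllama embedding size from the commit message
	const batchSize = 512  // assumed default batch size

	naive := batchSize * perImageMB
	fixed := embedBatchSize(Mllama, batchSize) * perImageMB
	fmt.Printf("naive: %d MB (~%.1f GB), fixed: %d MB\n", naive, float64(naive)/1000, fixed)
	// naive: 51200 MB (~51.2 GB), fixed: 100 MB
}
```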

01 Nov, 2024 3 commits
- refactor kv estimation · d07cf41a
  Michael Yang authored Oct 31, 2024
  
  d07cf41a
- mllama cross attention · 8c238e70
  Michael Yang authored Oct 31, 2024
  
  8c238e70
- Add basic mllama integration tests (#7455) · 8a9bb0d0
  Daniel Hiltgen authored Oct 31, 2024
  
  8a9bb0d0
31 Oct, 2024 2 commits

runner.go: Don't set cross attention before sending embeddings · 26acdcf4

Jesse Gross authored Oct 31, 2024

Currently if an input has embeddings at any point then we will set
cross attention to true from the beginning. This means that any
tokens before the embeddings are sent will incorrectly have cross
attention layers applied.

This only sets cross attention when we have an embedding, either
previously in this sequence or in the cache. It also makes cross
attention capable of supporting parallelism at the runner level,
though the mllama implementation doesn't support that yet.

26acdcf4
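
A sketch of the new rule, assuming each input is either a token or an embedding and that the cache can report whether it already holds an embedding. `Input` and `crossAttentionFlags` are hypothetical names.

```go
package main

import "fmt"

// Input is either a text token or an image embedding (hypothetical shape).
type Input struct {
	Token     int
	Embedding []float32
}

// crossAttentionFlags reports, per input, whether cross attention is enabled
// when that input is processed. It turns on only once an embedding has been
// seen, either already in the cache or earlier in this sequence, so tokens
// before the first image are unaffected.
func crossAttentionFlags(inputs []Input, cacheHasEmbedding bool) []bool {
	flags := make([]bool, len(inputs))
	seen := cacheHasEmbedding
	for i, in := range inputs {
		if in.Embedding != nil {
			seen = true
		}
		flags[i] = seen
	}
	return flags
}

func main() {
	inputs := []Input{{Token: 1}, {Token: 2}, {Embedding: []float32{0.1}}, {Token: 3}}
	fmt.Println(crossAttentionFlags(inputs, false)) // [false false true true]
	fmt.Println(crossAttentionFlags(inputs, true))  // [true true true true]
}
```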

Give unicode test more time to run (#7437) · 921779bb

Daniel Hiltgen authored Oct 31, 2024

* Give unicode test more time to run

Some slower GPUs (or partial CPU/GPU loads) can take more than the default 30s to complete this test

* Give more time for concurrency test

CPU inference can be very slow under stress

921779bb

30 Oct, 2024 6 commits

Refine default thread selection for NUMA systems (#7322) · 16f4eabe

Daniel Hiltgen authored Oct 30, 2024

Until we have full NUMA support, this adjusts the default thread selection
algorithm to count up the number of performance cores across all sockets.

16f4eabe
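
Counting performance cores across sockets might look like this, assuming per-socket core counts (including efficiency cores on hybrid CPUs) have already been discovered. `Socket` and `defaultThreads` are hypothetical, and the fallback to one thread is an illustrative choice.

```go
package main

import "fmt"

// Socket summarizes one CPU package (hypothetical shape).
type Socket struct {
	CoreCount           int // physical cores
	EfficiencyCoreCount int // E-cores on hybrid CPUs; 0 elsewhere
}

// defaultThreads sums performance cores over every socket rather than
// looking at only one. It falls back to 1 if nothing was detected.
func defaultThreads(sockets []Socket) int {
	total := 0
	for _, s := range sockets {
		total += s.CoreCount - s.EfficiencyCoreCount
	}
	if total < 1 {
		return 1
	}
	return total
}

func main() {
	dual := []Socket{{CoreCount: 32}, {CoreCount: 32}}
	hybrid := []Socket{{CoreCount: 16, EfficiencyCoreCount: 8}}
	fmt.Println(defaultThreads(dual), defaultThreads(hybrid)) // 64 8
}
```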

runner.go: Better abstract vision model integration · c826e574

Jesse Gross authored Oct 11, 2024

- Update mllama to take the cross attention state as embeddings in
  a batch, more similar to how Llava handles it. This improves
  integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract the interface to vision models so the main runner accesses Clip
  and Mllama similarly.

Co-authored-by: Michael Yang <mxyng@pm.me>

c826e574
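
Locating embeddings in a prompt can be sketched as splitting on image tags. This assumes Llava-style `[img-N]` tags, where N indexes the request's images; the tag syntax, the `Segment` type, and `splitPrompt` are illustrative assumptions rather than the runner's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Segment is either literal text or a reference to image number ImageID.
type Segment struct {
	Text    string
	ImageID int // -1 for text segments
}

var imgTag = regexp.MustCompile(`\[img-(\d+)\]`)

// splitPrompt breaks a prompt into ordered text and image segments, so the
// runner knows where each image embedding belongs among the text tokens.
func splitPrompt(prompt string) []Segment {
	var segs []Segment
	last := 0
	for _, m := range imgTag.FindAllStringSubmatchIndex(prompt, -1) {
		id, err := strconv.Atoi(prompt[m[2]:m[3]])
		if err != nil {
			continue // index overflows int: leave the tag as plain text
		}
		if m[0] > last {
			segs = append(segs, Segment{Text: prompt[last:m[0]], ImageID: -1})
		}
		segs = append(segs, Segment{ImageID: id})
		last = m[1]
	}
	if last < len(prompt) {
		segs = append(segs, Segment{Text: prompt[last:], ImageID: -1})
	}
	return segs
}

func main() {
	for _, s := range splitPrompt("[img-0]What is in this image?") {
		fmt.Printf("%q %d\n", s.Text, s.ImageID)
	}
	// "" 0
	// "What is in this image?" -1
}
```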

Soften windows clang requirement (#7428) · 712e99d4

Daniel Hiltgen authored Oct 30, 2024

This will no longer error if built with regular gcc on Windows. To help
triage issues that may come in related to different compilers, the runner now
reports the compiler used by cgo.

712e99d4

Remove submodule and shift to Go server - 0.4.0 (#7157) · b754f5a6

Daniel Hiltgen authored Oct 30, 2024

* Remove llama.cpp submodule and shift new build to top

* CI: install msys and clang gcc on win

Needed for deepseek to work properly on windows

b754f5a6

Move windows app out of preview (#7347) · a805e594
Daniel Hiltgen authored Oct 30, 2024

a805e594

windows: Support alt install paths, fit and finish (#6967) · 91dfbb1b

Daniel Hiltgen authored Oct 30, 2024

* windows: Support alt install paths

Advanced users are leveraging innosetup's /DIR switch to target
an alternate location, but we get confused when things don't exist in the LocalAppData dir.
This also hardens the server path lookup code for a future attempt to unify with a ./bin prefix.

* Fit and finish improvements for windows app

Document alternate install location instructions for binaries and model.
Pop up progress UI for upgrades (automatic, with cancel button).
Expose non-default port in menu to disambiguate multiple instances.
Set minimum Windows version to 10 22H2

91dfbb1b

29 Oct, 2024 4 commits

add more tests for getting the optimal tiled canvas (#7411) · db1842b9
Patrick Devine authored Oct 29, 2024

db1842b9

Switch windows to clang (#7407) · c9ca3861

Daniel Hiltgen authored Oct 29, 2024

* Switch over to clang for deepseek on windows

The patch for deepseek requires clang on windows. gcc on windows
has a buggy c++ library and can't handle the unicode characters

* Fail fast with wrong compiler on windows

Avoid users mistakenly building with GCC when we need clang

c9ca3861

tests: Add test for Unicode processing · 078f666f
Jesse Gross authored Oct 23, 2024

078f666f

runner.go: Better handle return NULL values from llama.cpp · de1557a0

Jesse Gross authored Oct 22, 2024

Llama.cpp sometimes returns NULL as a return value to report an
error. We should explicitly check for this and convert it to a Go
error rather than putting NULL in our data structures and waiting
for it to blow up later.

de1557a0

28 Oct, 2024 1 commit
- add mllama image processing to the generate handler (#7384) · 084929c2
  Patrick Devine authored Oct 28, 2024
  
  084929c2
27 Oct, 2024 1 commit
- Bump to latest Go 1.22 patch (#7379) · abd5dfd0
  Daniel Hiltgen authored Oct 26, 2024
  
  abd5dfd0
26 Oct, 2024 2 commits

Fix deepseek deseret regex (#7369) · 099f7077
Daniel Hiltgen authored Oct 26, 2024
```
On windows compiled with gcc the c++ regex library failed to handle
the characters
```
099f7077

Better support for AMD multi-GPU on linux (#7212) · d7c94e0c

Daniel Hiltgen authored Oct 26, 2024

* Better support for AMD multi-GPU

This resolves a number of problems related to AMD multi-GPU setups on linux.

The numeric IDs used by ROCm are not the same as the numeric IDs exposed in
sysfs, although the ordering is consistent. We have to count up from zero,
starting at the first valid gfx (major/minor/patch with non-zero values) we find.

There are 3 different env vars for selecting GPUs, and only ROCR_VISIBLE_DEVICES
supports UUID-based identification, so we should favor that one and try
to use UUIDs if detected to avoid potential ordering bugs with numeric IDs.

* ROCR_VISIBLE_DEVICES only works on linux

On Windows, use HIP_VISIBLE_DEVICES, which only supports numeric IDs.

d7c94e0c
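
Both rules can be sketched as follows. The environment variable names come from the commit message; the `Node` type, the helper names, and treating a gfx version of 0.0.0 (such as a CPU node) as invalid are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Node is a hypothetical sysfs topology node with its parsed gfx version.
type Node struct {
	SysfsID             int
	Major, Minor, Patch int
	UUID                string // empty if none was reported
}

// rocmIDs maps sysfs node IDs to ROCm device IDs. The ordering is shared,
// but ROCm numbers only valid GPUs, starting at zero, so nodes whose gfx
// version is all zeros are skipped and do not advance the counter.
func rocmIDs(nodes []Node) map[int]int {
	ids := make(map[int]int)
	next := 0
	for _, n := range nodes {
		if n.Major == 0 && n.Minor == 0 && n.Patch == 0 {
			continue
		}
		ids[n.SysfsID] = next
		next++
	}
	return ids
}

// visibleDevicesEnv picks the variable and value used to select GPUs.
// On Linux it prefers ROCR_VISIBLE_DEVICES with UUIDs (falling back to
// numeric IDs if any UUID is missing); on Windows it uses numeric IDs
// with HIP_VISIBLE_DEVICES.
func visibleDevicesEnv(goos string, selected []Node, ids map[int]int) (string, string) {
	useUUID := goos == "linux"
	for _, n := range selected {
		if n.UUID == "" {
			useUUID = false
		}
	}
	vals := make([]string, 0, len(selected))
	for _, n := range selected {
		if useUUID {
			vals = append(vals, n.UUID)
		} else {
			vals = append(vals, strconv.Itoa(ids[n.SysfsID]))
		}
	}
	name := "HIP_VISIBLE_DEVICES"
	if goos == "linux" {
		name = "ROCR_VISIBLE_DEVICES"
	}
	return name, strings.Join(vals, ",")
}

func main() {
	nodes := []Node{
		{SysfsID: 0}, // CPU node, gfx 0.0.0
		{SysfsID: 1, Major: 11, UUID: "GPU-a1"},
		{SysfsID: 2, Major: 9, Patch: 10, UUID: "GPU-b2"},
	}
	ids := rocmIDs(nodes)
	fmt.Println(ids)                                          // map[1:0 2:1]
	fmt.Println(visibleDevicesEnv("linux", nodes[1:], ids))   // ROCR_VISIBLE_DEVICES GPU-a1,GPU-b2
	fmt.Println(visibleDevicesEnv("windows", nodes[1:], ids)) // HIP_VISIBLE_DEVICES 0,1
}
```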

25 Oct, 2024 2 commits
- Fix unicode output on windows with redirect to file (#7358) · 35ec7f07
  Daniel Hiltgen authored Oct 25, 2024
```
If we're not writing out to a terminal, avoid setting the console mode
on windows, which corrupts the output file.
```
  35ec7f07
- Fix incremental build file deps (#7361) · 5231ae52
  Daniel Hiltgen authored Oct 25, 2024
```
The common src/hdr defs should be in the common definitions, not gpu specific.
```
  5231ae52
24 Oct, 2024 1 commit

Improve dependency gathering logic (#7345) · 3085c47b

Daniel Hiltgen authored Oct 24, 2024

This unifies the ROCm/CUDA dependency logic into the makefile
and fixes a missing define which broke Windows ROCm.

3085c47b

23 Oct, 2024 1 commit
- fix #7247 - invalid image input (#7249) · 0ccc7325
  Bill Wang authored Oct 24, 2024
```
Co-authored-by: Bill Wang <bill.wang@bill.wang>
```
  0ccc7325
22 Oct, 2024 4 commits

integration: harden embedding test (#7306) · dc6fe820
Daniel Hiltgen authored Oct 22, 2024
```
Use cosine similarity to make the embeddings tests more robust
```
dc6fe820
default to "FROM ." if a Modelfile isn't present (#7250) · d78fb620
Patrick Devine authored Oct 22, 2024

d78fb620

Fix rocm windows build and clean up dependency gathering (#7305) · 5c44461c

Daniel Hiltgen authored Oct 22, 2024

On windows ensure windows version define is properly set for rocm.
Remove duplicate rocm arch flags.
Resolve wildcards in the targets so parallel builds don't race.
Use readlink to resolve rocm dependencies since wildcards omit libelf
Keep windows rocm deps aligned with unified packaging model

5c44461c

runner.go: Merge partial unicode characters before sending · 03e40efa

Jesse Gross authored Oct 21, 2024

We check for partial unicode characters and accumulate them before
sending. However, when we did send, we still sent each individual piece
separately, leading to broken output. This combines everything into
a single group, which is also more efficient.

This also switches to the built-in check for valid unicode characters,
which is stricter. After this, we should never send back an invalid
sequence.

Fixes #7290

03e40efa
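
The accumulate-then-send idea can be sketched with `unicode/utf8`: buffer pieces until the whole buffer is valid UTF-8, then emit it as one string. The `accumulator` type is hypothetical and assumes pieces arrive as Go strings that may split a multi-byte character.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// accumulator buffers token pieces until they form valid UTF-8, then
// releases them together as a single string.
type accumulator struct {
	pending strings.Builder
}

// add appends a piece and returns the combined text once it is valid UTF-8.
// ok is false while a multi-byte character is still incomplete.
func (a *accumulator) add(piece string) (text string, ok bool) {
	a.pending.WriteString(piece)
	s := a.pending.String()
	if !utf8.ValidString(s) {
		return "", false
	}
	a.pending.Reset()
	return s, true
}

func main() {
	// "é" is 0xC3 0xA9; a tokenizer may emit those two bytes separately.
	pieces := []string{"caf", "\xc3", "\xa9", "!"}
	var acc accumulator
	for _, p := range pieces {
		if out, ok := acc.add(p); ok {
			fmt.Printf("%q\n", out)
		}
	}
	// "caf"
	// "é"
	// "!"
}
```

A production version would also need a way to flush or drop a buffer that never becomes valid; this sketch omits that.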