1. 22 Oct, 2024 1 commit
  2. 19 Oct, 2024 1 commit
  3. 18 Oct, 2024 1 commit
  4. 17 Oct, 2024 4 commits
    • llama: Decouple patching script from submodule (#7139) · bf4018b9
      Daniel Hiltgen authored
      * Refine llama.cpp vendoring workflow tools
      
      Switch from sync.sh to make-based tooling
      
      * Run new make sync and patch flow
    • llama: add compiler tags for cpu features (#7137) · f86d00cd
      Daniel Hiltgen authored
      This adds the ability to customize the default runner with user-specified flags
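
      As a rough illustration of the mechanism (the tag and function names below
      are hypothetical, not the ones this commit adds), Go build tags let a
      CPU-feature-specific file be compiled in only when requested:

        //go:build avx2

        // Compiled only when building with: go build -tags avx2
        // Tag and function names are illustrative, not Ollama's actual ones.
        package llama

        // dotAVX2 stands in for a routine that would use AVX2-specific code.
        func dotAVX2(a, b []float32) float32 {
            var s float32
            for i := range a {
                s += a[i] * b[i]
            }
            return s
        }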
    • IBM granite/granitemoe architecture support (#6760) · f2890a44
      Gabe Goodhart authored
      * fix(ext_server): Port llama.cpp sampling refactors to ext_server
      
      This was a fairly large changeset. I closely followed the changes here:
      https://github.com/ggerganov/llama.cpp/commit/df270ef74596da8f1178f08991f4c51f18c9ee82
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(server.cpp): Refactor server.cpp logging for llama.cpp overhaul
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * feat: Bump llama.cpp to the latest master with `granite` support
      
      This does not yet have granite MoE support, but that can come in a
      follow-up PR
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(patches): Update all patches (except solar-pro) to work with bumped llama.cpp
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(solar): Update solar patch for llama.cpp bump
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * feat(llama.cpp): Bump llama.cpp for granitemoe support
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * feat(llama.cpp): Bump llama.cpp for granitemoe support
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(solar): Update the solar-pro patch for latest llama.cpp bump
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * feat(llama.cpp): Bump to the latest master of llama.cpp
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(patches): Update all patches for latest bump
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * feat(llama): Always run sync.sh from the right directory
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama/patches): Update llama patches
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * feat(llama)!: Rough sync with llama.cpp submodule
      
      There are a number of changes that will need to be propagated to llama.go
      before any of this works!
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama/patches): Add a patch and update for missing ggml-impl.h include
      
      This include is where the ggml_cgraph struct is defined. It is included in
      many of the .c files to complete the forward declaration in ggml.h. It seems
      that with the subset of code included here, the include was somehow lost (or
      out of order) when building, so adding this include to llama.cpp fixes the
      missing definition.
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama/sync): Add missing ggml-cpu-impl.h copy-over in sync.sh
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama): Add missing log.cpp
      
      This was added as part of the logging overhaul done in llama.cpp
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama): Overhaul use of sampling module for llama.cpp changes
      
      The changes here reflect the changes made in the big llama.cpp sampling PR
      https://github.com/ggerganov/llama.cpp/pull/9294
      
      The sampling functionality is now broken into the base interface
      (llama_sampler) and the generation implementation (gpt_sampler). The
      changes here reflect that. Since the sampling.h/sampling.cpp code uses C++
      STL headers, the sampling_ext.[h|cpp] wrapper is maintained to give Go
      access to a pure-C interface (a minimal sketch of that pattern follows below).
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
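
      As a self-contained sketch of the wrapper pattern (the real sampling_ext.h
      API is not reproduced here; all names are illustrative): cgo cannot call
      C++ code that uses STL types, so a plain-C function is exposed to Go
      instead. The stand-in below implements greedy sampling in the C preamble:

        package main

        /*
        #include <stddef.h>

        // Stand-in for a pure-C wrapper function: return the index of the
        // largest logit (greedy sampling).
        static int sample_token_greedy(const float *logits, size_t n) {
            int best = 0;
            for (size_t i = 1; i < n; i++) {
                if (logits[i] > logits[best]) best = (int)i;
            }
            return best;
        }
        */
        import "C"

        import (
            "fmt"
            "unsafe"
        )

        func main() {
            logits := []float32{0.1, 2.5, -0.3, 1.7}
            tok := C.sample_token_greedy((*C.float)(unsafe.Pointer(&logits[0])),
                C.size_t(len(logits)))
            fmt.Println("greedy token:", int(tok)) // greedy token: 1
        }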
      
      * fix(llama): Fix the impl of SampleTokenGreedy for new sampling
      
      I don't think this method is currently used, so it could probably just be
      removed so that all sampling goes through the GPT interface, but in the
      interest of doing no harm, this should keep the method working as expected.
      
      Branch: IBMGraniteArchitectureSupport
      
      * fix(llama): Remove unused SampleTokenGreedy
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(sync): Remove bash-specific change to sync.sh
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * chore(gofumpt): Format on llama.go to pass linting
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llm): Fix missing <thread> include in ext_server
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama): Remove TODO about grammar_first
      
      This feature was not used or needed previously, so it should be fine
      without plumbing it through now.
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama): Better naming for sampling wrapper and args
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama): Fix patch 05 to use new wrapper api and re-sync
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * runner: Flush pending responses before returning
      
      If there are any pending responses (such as from potential stop
      tokens), then we should send them back before ending the sequence.
      Otherwise, we may be missing tokens at the end of a response.
      
      Fixes #6707
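
      A sketch of that flush-before-finish shape, under assumed names (the
      actual runner code differs): text held back while scanning for stop
      sequences is emitted before the output channel is closed.

        package runner

        // finishSequence flushes any held-back text before signaling
        // completion, so tokens buffered while checking for stop sequences
        // are not dropped.
        func finishSequence(pending string, out chan<- string) {
            if pending != "" {
                out <- pending // flush text withheld during stop matching
            }
            close(out) // only then end the sequence
        }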
      
      * fix(llama/sampling): Use gpt_sampler with a forward declaration
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llama): Remove unnecessary patch for gguf impl header
      
      This was caused by an earlier mistake in the embeddings patch that was
      dereferencing the pointer instead of using the wrapper API.
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      * fix(llm): Remove use of deprecated --log-disable flag
      
      Branch: IBMGraniteArchitectureSupport
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      
      ---------
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
    • Rename gpu package discover (#7143) · 05cd82ef
      Daniel Hiltgen authored
      Cleaning up Go package naming
  5. 16 Oct, 2024 1 commit
    • Move macos v11 support flags to build script (#7203) · 7d6eb0d4
      Daniel Hiltgen authored
      Having v11 support hard-coded into the cgo settings causes warnings
      for newer Xcode versions. This should help keep the build clean for users
      building from source with the latest tools, while still allowing us to
      target the older OS via our CI processes.
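
      Illustrative only (the exact flag is an assumption, not quoted from the
      commit): a deployment target hard-coded in a cgo directive applies to
      every local build, whereas exporting it from the CI build script confines
      it to release builds.

        // Hard-coding a minimum macOS version in source looks like this;
        // newer Xcode toolchains may warn about the old target on every build.
        package llama

        /*
        #cgo darwin CFLAGS: -mmacosx-version-min=11.0
        #cgo darwin LDFLAGS: -mmacosx-version-min=11.0
        */
        import "C"

      Moving the flag out means only CI sets something like
      CGO_CFLAGS=-mmacosx-version-min=11.0 before invoking the build.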
  6. 15 Oct, 2024 3 commits
  7. 14 Oct, 2024 1 commit
  8. 13 Oct, 2024 1 commit
  9. 12 Oct, 2024 1 commit
  10. 10 Oct, 2024 3 commits
    • cli: Send all images in conversation history · 7fe39025
      Jesse Gross authored
      Currently the CLI only sends images from the most recent image-
      containing message. This prevents doing things like sending
      one message with an image and then a follow-up message with a
      second image and asking for a comparison based on additional
      information not present in any text that was output.
      
      It's possible that some models have a problem with this, but the
      CLI is not the right place to work around it, since any adjustments
      are model-specific and should apply to all clients.
      
      Both llava:34b and minicpm-v do reasonable things with multiple
      images in the history.
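
      A sketch of the gathering logic under assumed types (not the CLI's actual
      message structs): collect images from every message rather than only from
      the most recent image-bearing one.

        package main

        import "fmt"

        // Message is an illustrative stand-in for a chat history entry.
        type Message struct {
            Role    string
            Content string
            Images  [][]byte
        }

        // allImages returns the images from every message in the history.
        func allImages(history []Message) [][]byte {
            var imgs [][]byte
            for _, m := range history {
                imgs = append(imgs, m.Images...)
            }
            return imgs
        }

        func main() {
            h := []Message{
                {Role: "user", Content: "here is one image", Images: [][]byte{{0x01}}},
                {Role: "user", Content: "and a second", Images: [][]byte{{0x02}}},
            }
            fmt.Println(len(allImages(h))) // 2: both images are sent
        }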
    • runner.go: Handle truncation of tokens for stop sequences · 0077e22d
      Jesse Gross authored
      When a single token contains both text to be returned and a stop
      sequence, this causes an out-of-bounds error when we update the
      cache to match our text. This is because we currently assume that
      removing the stop sequence will consume at least one token.
      
      This also inverts the logic to deal with positive numbers, rather
      than a value to be subtracted, which is easier to reason about.
      
      Fixes #7153
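
      A sketch of the positive-number formulation with assumed names (not the
      actual runner.go code): compute how much decoded text to keep rather than
      how much to subtract, so a stop sequence contained entirely within a
      single token cannot push an index below zero.

        package runner

        import "strings"

        // truncateAtStop returns the prefix of text to keep if a stop
        // sequence is found. Working with a non-negative "keep" length
        // avoids the out-of-bounds case where removing the stop sequence
        // consumes no whole token.
        func truncateAtStop(text, stop string) (string, bool) {
            if i := strings.Index(text, stop); i >= 0 {
                return text[:i], true // keep everything before the stop
            }
            return text, false
        }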
    • server: Don't clear cmd when closing a server · 03408f34
      Jesse Gross authored
      Close can be called on an LLM server if the runner subprocess dies.
      However, the Ollama scheduler code may not know about this yet and
      still try to access it. In this case, it is important that 'cmd'
      is still available as it is used to check on the status of the
      subprocess. If this happens, Kill may be called twice on the subprocess -
      that is fine.
      
      In addition, model unloading may race with new accesses, so we should
      hold a lock around this. This may result in the model being reloaded
      after the first close call - this is also fine as close will be called
      again later.
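
      A sketch of that guarded shape, with assumed field names (mu, cmd): Close
      holds a lock, kills the subprocess if present, and deliberately leaves
      cmd set so the scheduler can still inspect the process status.

        package llm

        import (
            "os/exec"
            "sync"
        )

        type llmServer struct {
            mu  sync.Mutex
            cmd *exec.Cmd
        }

        func (s *llmServer) Close() error {
            s.mu.Lock()
            defer s.mu.Unlock()
            if s.cmd != nil && s.cmd.Process != nil {
                _ = s.cmd.Process.Kill() // harmless if already killed or exited
            }
            // s.cmd is intentionally not cleared: the scheduler may still
            // check the subprocess status through it.
            return nil
        }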
  11. 09 Oct, 2024 2 commits
  12. 08 Oct, 2024 3 commits
    • Fix build leakages (#7141) · f9584deb
      Daniel Hiltgen authored
      The recent change to applying patches leaves the submodule dirty when
      "new commits" are present. This ensures we clean up so the tree no
      longer reports as dirty after a `go generate ./...` run.
      
      The Makefile was being a bit too aggressive in cleaning things up and would
      end up deleting the placeholder files, which someone might then accidentally
      commit.
    • Re-introduce the `llama` package (#5034) · 96efd905
      Jeffrey Morgan authored
      * Re-introduce the llama package
      
      This PR brings back the llama package, making it possible to call llama.cpp and
      ggml APIs from Go directly via CGo. This has a few advantages:
      
      - C APIs can be called directly from Go without needing to use the previous
        "server" REST API
      - On macOS and for CPU builds on Linux and Windows, Ollama can be built without
        a `go generate ./...` step, making it easy to get up and running to hack on
        parts of Ollama that don't require fast inference
      - Faster build times for AVX, AVX2, CUDA, and ROCm (a full build of all runners
        takes <5 min on a fast CPU)
      - No git submodule, making it easier to clone and build from source
      
      This is a big PR, but much of it is vendored code, except for:
      
      - llama.go CGo bindings
      - example/: a simple example of running inference
      - runner/: a subprocess server designed to replace the llm/ext_server package
      - Makefile: an as-minimal-as-possible Makefile to build the runner package for
        different...
  13. 05 Oct, 2024 1 commit
  14. 01 Oct, 2024 1 commit
  15. 29 Sep, 2024 1 commit
  16. 26 Sep, 2024 1 commit
    • server: close response body on error (#6986) · 03608cb4
      Blake Mizerany authored
      This change closes the response body when an error occurs in
      makeRequestWithRetry. Previously, the first non-200 response body was
      not closed before reattempting the request. This change ensures that
      the response body is closed in all cases where an error occurs,
      preventing leaks of file descriptors.
      
      Fixes #6974
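
      A sketch of the fix's shape (names assumed; not the actual
      makeRequestWithRetry code): the non-200 body is closed before the error
      return that triggers a retry, so retries no longer leak file descriptors.

        package server

        import (
            "fmt"
            "io"
            "net/http"
        )

        // doAttempt performs one request attempt, closing the response body
        // on every path, including the non-200 path that leads to a retry.
        func doAttempt(c *http.Client, req *http.Request) error {
            resp, err := c.Do(req)
            if err != nil {
                return err
            }
            defer resp.Body.Close()
            if resp.StatusCode != http.StatusOK {
                msg, _ := io.ReadAll(resp.Body)
                return fmt.Errorf("status %s: %s", resp.Status, msg)
            }
            // ... handle the successful response ...
            return nil
        }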
  17. 25 Sep, 2024 2 commits
  18. 24 Sep, 2024 3 commits
  19. 22 Sep, 2024 1 commit
  20. 21 Sep, 2024 3 commits
  21. 20 Sep, 2024 3 commits
  22. 18 Sep, 2024 2 commits