- 15 Oct, 2024 1 commit
-
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
- 14 Oct, 2024 1 commit
-
-
Daniel Hiltgen authored
* Expose GPU discovery failure information
* Remove exposed API for now
-
- 13 Oct, 2024 1 commit
-
-
Daniel Hiltgen authored
The new cgo compilation requires a flag to target older macOS versions
-
- 12 Oct, 2024 1 commit
-
-
Daniel Hiltgen authored
This workaround logic in llama.cpp is causing crashes for users with less system memory than VRAM.
-
- 10 Oct, 2024 3 commits
-
-
Jesse Gross authored
Currently the CLI only sends images from the most recent image-containing message. This prevents doing things like sending one message with an image and then a follow-up message with a second image and asking for a comparison based on additional information not present in any text that was output. It's possible that some models have a problem with this, but the CLI is not the right place to handle it, since any adjustments are model-specific and should affect all clients. Both llava:34b and minicpm-v do reasonable things with multiple images in the history.
-
Jesse Gross authored
When a single token contains both text to be returned and a stop sequence, this causes an out of bounds error when we update the cache to match our text. This is because we currently assume that removing the stop sequence will consume at least one token. This also inverts the logic to deal with positive numbers, rather than a value to be subtracted, which is easier to reason about. Fixes #7153
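The failure mode described in this commit message can be illustrated with a minimal Go sketch. It is not the code from the commit: `splitAtStop`, the piece strings, and the stand-in cache are hypothetical names chosen for illustration. The sketch returns a non-negative count of tokens that survive whole, so the cache can be sliced with a clamped, positive length even when removing the stop sequence consumes no whole token.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAtStop is a hypothetical helper (not taken from the commit). It joins
// the pending token pieces, cuts the text at the first occurrence of stop, and
// reports how many leading pieces were kept in full. The count is a positive
// number to keep, not a value to subtract.
func splitAtStop(pieces []string, stop string) (text string, wholeTokens int, found bool) {
	joined := strings.Join(pieces, "")
	idx := strings.Index(joined, stop)
	if idx < 0 {
		// No stop sequence: everything is kept.
		return joined, len(pieces), false
	}
	consumed := 0
	for _, p := range pieces {
		// Stop counting at the first piece that reaches past the cut point.
		if consumed+len(p) > idx {
			break
		}
		consumed += len(p)
		wholeTokens++
	}
	return joined[:idx], wholeTokens, true
}

func main() {
	// One token carries both returnable text and the stop sequence.
	pieces := []string{"Hello", " world</s>"}
	text, whole, found := splitAtStop(pieces, "</s>")

	// Stand-in cache with one entry per piece (an assumption of this sketch).
	cache := []int{101, 102}
	if whole > len(cache) {
		whole = len(cache)
	}
	cache = cache[:whole]

	fmt.Printf("text=%q found=%v cached tokens=%d\n", text, found, len(cache))
}
```

Whether a partially returned token should stay in the cache is a policy choice; this sketch drops it, which is only one possible reading of the commit message.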
-
Jesse Gross authored
Close can be called on an LLM server if the runner subprocess dies. However, the Ollama scheduler code may not know about this yet and still try to access it. In this case, it is important that 'cmd' is still available as it is used to check on the status of the subprocess. If this happens, Kill may be called twice on the subprocess - that is fine. In addition, model unloading may race with new accesses, so we should hold a lock around this. This may result in the model being reloaded after the first close call - this is also fine as close will be called again later.
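A minimal Go sketch of the pattern this commit message describes, under stated assumptions: `llmServer`, `killer`, and `fakeProc` are hypothetical names, and the real server holds more state than shown. The points illustrated are that Close takes a lock, tolerates being called twice, and leaves the process handle in place so later status checks still have something to inspect.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// killer is the small slice of a subprocess handle this sketch needs.
// *os.Process has a matching Kill method.
type killer interface {
	Kill() error
}

// llmServer is a hypothetical stand-in for a server owning a runner subprocess.
type llmServer struct {
	mu     sync.Mutex
	proc   killer // intentionally kept after Close so status checks still work
	loaded bool
}

// Close unloads the model and kills the subprocess. It is safe to call more
// than once and from several goroutines.
func (s *llmServer) Close() error {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.loaded = false
	if s.proc != nil {
		// A second Kill on a finished process returns an error; that is
		// expected here and deliberately ignored.
		_ = s.proc.Kill()
	}
	// s.proc is not set to nil: other code may still query the subprocess.
	return nil
}

// fakeProc counts Kill calls so the behavior can be observed without
// spawning a real process.
type fakeProc struct{ kills int }

func (p *fakeProc) Kill() error {
	p.kills++
	if p.kills > 1 {
		return errors.New("process already finished")
	}
	return nil
}

func main() {
	p := &fakeProc{}
	srv := &llmServer{proc: p, loaded: true}

	// Two racing Close calls, e.g. subprocess death and scheduler unload.
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = srv.Close()
		}()
	}
	wg.Wait()

	fmt.Println("kills:", p.kills, "handle kept:", srv.proc != nil, "loaded:", srv.loaded)
}
```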
-
- 09 Oct, 2024 2 commits
-
-
Daniel Hiltgen authored
Add missing metal files to vendoring list
-
Daniel Hiltgen authored
Expand out the file extensions for vendored code so git reports the status correctly
-
- 08 Oct, 2024 3 commits
-
-
Daniel Hiltgen authored
The recent change to applying patches leaves the submodule dirty based on "new commits" being present. This ensures we clean up so the tree no longer reports dirty after a `go generate ./...` run. The Makefile was being a bit too aggressive in cleaning things up and would result in deleting the placeholder files which someone might accidentally commit.
-
Jeffrey Morgan authored
* Re-introduce the llama package

This PR brings back the llama package, making it possible to call llama.cpp and ggml APIs from Go directly via CGo. This has a few advantages:

- C APIs can be called directly from Go without needing to use the previous "server" REST API
- On macOS and for CPU builds on Linux and Windows, Ollama can be built without a go generate ./... step, making it easy to get up and running to hack on parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA and ROCM (a full build of all runners takes <5 min on a fast CPU)
- No git submodule, making it easier to clone and build from source

This is a big PR, but much of it is vendor code except for:

- llama.go CGo bindings
- example/: a simple example of running inference
- runner/: a subprocess server designed to replace the llm/ext_server package
- Makefile: an as minimal as possible Makefile to build the runner package for different...
-
Shifra Goldstone authored
-
- 05 Oct, 2024 1 commit
-
-
hidden1nin authored
-
- 01 Oct, 2024 1 commit
-
-
Alex Mavrogiannis authored
-
- 29 Sep, 2024 1 commit
-
-
zmldndx authored
-
- 26 Sep, 2024 1 commit
-
-
Blake Mizerany authored
This change closes the response body when an error occurs in makeRequestWithRetry. Previously, the first, non-200 response body was not closed before reattempting the request. This change ensures that the response body is closed in all cases where an error occurs, preventing leaks of file descriptors. Fixes #6974
-
- 25 Sep, 2024 2 commits
-
-
Xe Iaso authored
-
Jeffrey Morgan authored
-
- 24 Sep, 2024 3 commits
-
-
Daniel Hiltgen authored
Write-Host in PowerShell writes directly to the console and will not be picked up by a pipe. Echo or Write-Output will.
-
Alex Yang authored
-
Deep Lakhani authored
-
- 22 Sep, 2024 1 commit
-
-
Mahesh Sathiamoorthy authored
This was causing an error since we depend on punkt_tab.
-
- 21 Sep, 2024 3 commits
-
-
Daniel Hiltgen authored
When running the subprocess as a background service, Windows may throttle it, which can lead to thrashing and a very poor token rate.
-
Daniel Hiltgen authored
GPUs handled the dependency path properly, but CPU runners didn't, which results in missing VC redist libraries on systems where the user didn't already have them installed from some other app.
-
Daniel Hiltgen authored
The upload artifact is missing the dist prefix since all payloads are in the same directory, so restore the prefix on download.
-
- 20 Sep, 2024 3 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
* Unified arm/x86 windows installer

This adjusts the installer payloads to be architecture aware so we can carry both amd64 and arm64 binaries in the installer, and install only the applicable architecture at install time.

* Include arm64 in official windows build
* Harden schedule test for slow windows timers

This test seems to be a bit flaky on windows, so give it more time to converge
-
- 18 Sep, 2024 4 commits
-
-
Patrick Devine authored
-
Ryan Marten authored
-
Michael Yang authored
-
Jeffrey Morgan authored
-
- 17 Sep, 2024 3 commits
-
-
Michael Yang authored
make patches git am-able
-
Michael Yang authored
raw diffs can be applied using `git apply` but not with `git am`. git patches, e.g. through `git format-patch` are both apply-able and am-able
-
Daniel Hiltgen authored
The new buildx based build no longer leaves the dist/linux-* directories around, so we don't have to clean them up before uploading.
-
- 16 Sep, 2024 5 commits
-
-
Daniel Hiltgen authored
The rocm CI step for RCs was incorrectly tagging them as the latest rocm build. The multiarch manifest was incorrectly tagged twice (with and without the prefix "v"). Static windows artifacts weren't being carried between build jobs. This also fixes the latest tagging script.
-
Daniel Hiltgen authored
The runners don't have emulation set up, so the default multi-platform build won't work.
-
Michael Yang authored
-
Patrick Devine authored
-
Pepo authored
-