- 28 Apr, 2024 1 commit
Daniel Hiltgen authored
Prior refactoring passes accidentally removed the logic that bypasses VRAM checks for CPU loads. This adds it back, along with test coverage. It also moves the unit test's access to the loaded map behind the mutex, which was likely the cause of various flakes in the tests.
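A minimal Go sketch of the two ideas in this change, using illustrative names rather than Ollama's actual scheduler types: CPU loads bypass the VRAM fit check entirely, and the shared map of loaded models is only touched while holding its mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// loadRequest is an illustrative stand-in for a model load request.
type loadRequest struct {
	library   string // "cpu", "cuda", "rocm", ...
	sizeBytes uint64
}

// fitsInVRAM sketches the bypass: CPU loads skip the VRAM check entirely,
// while GPU loads must fit within the reported free VRAM.
func fitsInVRAM(req loadRequest, freeVRAM uint64) bool {
	if req.library == "cpu" {
		return true // no VRAM constraint applies to CPU loads
	}
	return req.sizeBytes <= freeVRAM
}

// loadedModels mirrors the "map behind a mutex" fix: every access to the
// shared map, including from tests, goes through the lock.
type loadedModels struct {
	mu     sync.Mutex
	loaded map[string]loadRequest
}

func (l *loadedModels) count() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return len(l.loaded)
}

func main() {
	fmt.Println(fitsInVRAM(loadRequest{library: "cpu", sizeBytes: 8 << 30}, 0))      // true
	fmt.Println(fitsInVRAM(loadRequest{library: "cuda", sizeBytes: 8 << 30}, 4<<30)) // false
}
```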
- 26 Apr, 2024 2 commits
Jeffrey Morgan authored
Blake Mizerany authored
- 25 Apr, 2024 2 commits
Jeffrey Morgan authored
* reload the model if `num_gpu` changes
* don't reload on -1
* fix tests
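A small sketch of the reload rule described in these bullets, with illustrative names rather than the server's real option-handling code: an explicit change to `num_gpu` forces a reload, while -1 (let the server decide) does not.

```go
package main

import "fmt"

// needsReload sketches the rule: a running model is reloaded when the
// requested num_gpu differs from the value it was loaded with, except that
// -1 ("let the server decide") never forces a reload on its own.
func needsReload(loadedNumGPU, requestedNumGPU int) bool {
	if requestedNumGPU == -1 {
		return false
	}
	return requestedNumGPU != loadedNumGPU
}

func main() {
	fmt.Println(needsReload(33, 20)) // true: explicit change to the GPU layer count
	fmt.Println(needsReload(33, -1)) // false: -1 leaves the decision to the server
}
```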
Daniel Hiltgen authored
- 24 Apr, 2024 4 commits
Bryce Reitano authored
Bryce Reitano authored
Bryce Reitano authored
Michael Yang authored
- 23 Apr, 2024 2 commits
Daniel Hiltgen authored
Give the goroutine a moment to deliver the expired event
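A hedged sketch of the general test pattern this implies, not the repository's actual test code: wait on the expiry channel with a bounded timeout instead of asserting immediately.

```go
package main

import (
	"fmt"
	"time"
)

// waitForExpired waits a bounded amount of time for the background goroutine
// to deliver an expired event, instead of asserting immediately.
func waitForExpired(expired <-chan string, timeout time.Duration) (string, bool) {
	select {
	case name := <-expired:
		return name, true
	case <-time.After(timeout):
		return "", false
	}
}

func main() {
	expired := make(chan string, 1)
	go func() {
		time.Sleep(10 * time.Millisecond) // simulated delay before the expiry fires
		expired <- "example-model"
	}()
	name, ok := waitForExpired(expired, time.Second)
	fmt.Println(name, ok) // example-model true
}
```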
Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The defaults are currently 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
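A minimal sketch of how such settings could be read, assuming only the variable names and defaults stated above; the server's actual parsing may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads an integer setting from the environment, falling back to a
// default when the variable is unset or malformed.
func envInt(key string, fallback int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return fallback
}

func main() {
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)    // concurrent requests per model
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1) // models resident at once
	fmt.Println(numParallel, maxLoaded)
}
```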
- 21 Apr, 2024 1 commit
Cheng authored
- 15 Apr, 2024 2 commits
Patrick Devine authored
Jeffrey Morgan authored
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading
* use `unload` in signal handler
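A rough sketch of the shape of this fix using Go's standard signal handling; `unload` is a stub here, not Ollama's real function.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// unload is a stub for the cleanup path that stops the runner subprocess and
// frees its resources.
func unload() { fmt.Println("stopping runner subprocess and freeing resources") }

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

	go func() {
		<-sigs
		unload() // reuse the same cleanup path as a normal shutdown
		os.Exit(0)
	}()

	fmt.Println("loading model... press Ctrl-C to interrupt")
	time.Sleep(5 * time.Second) // stand-in for a long-running model load
}
```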
- 10 Apr, 2024 1 commit
Blake Mizerany authored
This is a quick fix to help users who are stuck on the "pull" step at 99%. In the near future we're introducing a new registry client that should handle this more robustly. In the meantime, this should unblock the users hitting issue #1736.
- 08 Apr, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 02 Apr, 2024 1 commit
Daniel Hiltgen authored
- 01 Apr, 2024 4 commits
Daniel Hiltgen authored
This should resolve a number of memory-leak and stability defects by isolating llama.cpp in a separate process that shuts down when idle and gracefully restarts if it has problems. This also serves as a first step toward running multiple copies to support multiple models concurrently.
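A hedged sketch of the process-isolation idea, with a placeholder binary name and flags rather than Ollama's real runner invocation: the runner lives in its own process and is restarted if it dies while still needed.

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

// superviseRunner keeps the runner (which wraps llama.cpp) in its own
// process, so crashes and leaks are isolated from the main server, and
// restarts it if it exits while still needed.
func superviseRunner(done <-chan struct{}) {
	for {
		select {
		case <-done:
			return // idle or shutting down: stop supervising
		default:
		}
		cmd := exec.Command("ollama-runner", "--model", "/path/to/model.gguf") // hypothetical invocation
		if err := cmd.Start(); err != nil {
			log.Printf("failed to start runner: %v", err)
		} else if err := cmd.Wait(); err != nil {
			log.Printf("runner exited with error, will restart: %v", err)
		}
		time.Sleep(time.Second) // brief backoff before the next attempt
	}
}

func main() {
	done := make(chan struct{})
	go superviseRunner(done)
	time.Sleep(100 * time.Millisecond) // let the supervisor run briefly
	close(done)                        // e.g. the server became idle
	time.Sleep(100 * time.Millisecond)
}
```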
Patrick Devine authored
Michael Yang authored
count each layer independently when deciding gpu offloading
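A small illustrative sketch of counting layers independently against free VRAM, rather than assuming uniform layer sizes; the numbers and names are made up.

```go
package main

import "fmt"

// layersThatFit walks the per-layer sizes and counts how many consecutive
// layers fit in the available VRAM, rather than assuming uniform layer sizes.
func layersThatFit(layerSizes []uint64, freeVRAM uint64) int {
	var used uint64
	for i, size := range layerSizes {
		if used+size > freeVRAM {
			return i
		}
		used += size
	}
	return len(layerSizes)
}

func main() {
	// A mix of small and large layers: counting independently yields 3 here,
	// where a uniform-size estimate could over- or under-count.
	sizes := []uint64{100 << 20, 400 << 20, 100 << 20, 600 << 20}
	fmt.Println(layersThatFit(sizes, 700<<20)) // 3
}
```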
Michael Yang authored
- 29 Mar, 2024 1 commit
Patrick Devine authored
Co-authored-by: Michael Yang <mxyng@pm.me>
- 27 Mar, 2024 1 commit
Michael Yang authored
- 26 Mar, 2024 1 commit
Patrick Devine authored
- 23 Mar, 2024 1 commit
Daniel Hiltgen authored
This uplevels the integration tests to run against the server, which allows testing an existing server or a remote server.
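A hedged sketch of how a harness might choose between starting its own server and targeting an existing or remote one; the `OLLAMA_TEST_EXISTING` variable name is hypothetical, used only for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// serverEndpoint decides whether the test harness should start its own
// server or target one that is already running, possibly on another machine.
// OLLAMA_TEST_EXISTING is a hypothetical variable name used for illustration.
func serverEndpoint() (host string, startLocal bool) {
	if h := os.Getenv("OLLAMA_TEST_EXISTING"); h != "" {
		return h, false // reuse an existing or remote server
	}
	return "http://127.0.0.1:11434", true // default endpoint; the harness starts the server itself
}

func main() {
	host, startLocal := serverEndpoint()
	fmt.Println(host, startLocal)
}
```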
- 15 Mar, 2024 1 commit
Blake Mizerany authored
This fixes issues where blob file names that contain ':' characters are rejected by file systems that do not support them.
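A minimal sketch of the renaming idea, assuming a simple ':' to '-' substitution; the registry client's exact mapping may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// blobFileName maps a digest such as "sha256:0a1b2c..." to a name that is
// safe on file systems that reject ':' (notably Windows) by swapping the
// separator before the blob is written to disk.
func blobFileName(digest string) string {
	return strings.ReplaceAll(digest, ":", "-")
}

func main() {
	fmt.Println(blobFileName("sha256:0a1b2c3d4e5f")) // sha256-0a1b2c3d4e5f
}
```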
- 13 Mar, 2024 1 commit
Patrick Devine authored
Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
- 09 Mar, 2024 5 commits
Daniel Hiltgen authored
The recent ROCm change partially removed idempotent payloads, but the ggml-metal.metal file for macOS was still handled idempotently. This finishes switching to always extract the payloads, and now that idempotency is gone, the version directory is no longer useful.
Jeffrey Morgan authored
Jeffrey Morgan authored
Jeffrey Morgan authored
Jeffrey Morgan authored
- 08 Mar, 2024 3 commits
Michael Yang authored
Bruce MacDonald authored
Jeffrey Morgan authored
- 07 Mar, 2024 2 commits
Daniel Hiltgen authored
This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, which defaults to `~/.ollama`. The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both Windows and Linux. Given the large size of ROCm's tensor files, we split the dependency out: it is bundled into the installer on Windows and is a separate download on Linux. The Linux install script now detects the presence of AMD GPUs, checks whether ROCm v6 is already present, and downloads our dependency tar file if it is not.

For Linux discovery, we now use sysfs and check each GPU against what ROCm supports, so we can degrade to CPU gracefully instead of having llama.cpp+ROCm assert or crash on us. For Windows, we now use Go's Windows dynamic library loading logic to access the amdhip64.dll APIs to query GPU information.
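A minimal sketch of the `OLLAMA_HOME` resolution described above, assuming only the default of `~/.ollama` stated in the message; error handling is simplified.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ollamaHome returns the directory the LLM libraries are extracted under:
// the OLLAMA_HOME environment variable when set, otherwise ~/.ollama.
func ollamaHome() string {
	if dir := os.Getenv("OLLAMA_HOME"); dir != "" {
		return dir
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return ".ollama" // last resort: relative to the working directory
	}
	return filepath.Join(home, ".ollama")
}

func main() {
	fmt.Println(ollamaHome())
}
```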
Patrick Devine authored
- 01 Mar, 2024 1 commit
Jeffrey Morgan authored
- 29 Feb, 2024 1 commit
Michael Yang authored
instead of appending image tags, prepend them - this generally produces better results
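A small illustrative sketch of prepending image placeholder tags to the prompt text; the `[img-N]` tag format is an assumption here, and the server's real template handling is more involved.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrompt places image placeholder tags in front of the user's text
// rather than after it.
func buildPrompt(text string, imageCount int) string {
	var b strings.Builder
	for i := 0; i < imageCount; i++ {
		fmt.Fprintf(&b, "[img-%d] ", i) // the tag format is an assumption for illustration
	}
	b.WriteString(text)
	return b.String()
}

func main() {
	fmt.Println(buildPrompt("What is in this picture?", 1))
	// [img-0] What is in this picture?
}
```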