Commits · dddb72e08451f18ff94bb4c74bf6ba2fd7894eda · OpenDAS / ollama

10 Sep, 2024 1 commit
- add *_proxy for debugging · dddb72e0
  Michael Yang authored Sep 10, 2024
  
  dddb72e0
05 Sep, 2024 2 commits

llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT · 67190976
Daniel Hiltgen authored Sep 05, 2024
```
With the new very large parameter models, some users are willing to wait for
a very long time for models to load.
```
67190976

Introduce GPU Overhead env var (#5922) · b05c9e83

Daniel Hiltgen authored Sep 05, 2024

Provide a mechanism for users to set aside an amount of VRAM on each GPU
to make room for other applications they want to start after Ollama, or to work
around memory prediction bugs.

b05c9e83

27 Aug, 2024 1 commit
- Move ollama executable out of bin dir (#6535) · 93ea9240
  Daniel Hiltgen authored Aug 27, 2024
  
  93ea9240
23 Aug, 2024 1 commit
- passthrough OLLAMA_HOST path to client · 386af6c1
  Michael Yang authored Aug 23, 2024
  
  386af6c1
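The title suggests that a path component in `OLLAMA_HOST` is now kept when the client builds its base URL. One use is a server exposed under a reverse-proxy prefix. The sketch below shows that parsing with defaults for scheme (`http`), host (`127.0.0.1`), and port (`11434`). These defaults are assumptions, not a copy of Ollama's code, and `parseHost` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// defaultPort is assumed here for illustration.
const defaultPort = "11434"

// parseHost turns an OLLAMA_HOST-style value into a base URL for the client.
// Scheme defaults to http, host to 127.0.0.1, port to defaultPort (all
// assumptions). Any path is kept, minus a trailing slash.
func parseHost(s string) (*url.URL, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		s = "127.0.0.1"
	}
	if !strings.Contains(s, "://") {
		s = "http://" + s
	}
	u, err := url.Parse(s)
	if err != nil {
		return nil, err
	}
	port := u.Port()
	if port == "" {
		port = defaultPort
	}
	// Hostname strips IPv6 brackets; JoinHostPort adds them back as needed.
	u.Host = net.JoinHostPort(u.Hostname(), port)
	u.Path = strings.TrimRight(u.Path, "/")
	u.RawPath = ""
	return u, nil
}

func main() {
	base, err := parseHost("example.com/ollama/")
	if err != nil {
		panic(err)
	}
	// API paths are joined onto the prefix instead of replacing it.
	fmt.Println(base.JoinPath("api", "tags"))
}
```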
19 Aug, 2024 2 commits

Adjust layout to bin+lib/ollama · 88bb9e33
Daniel Hiltgen authored Aug 14, 2024

88bb9e33

Refactor linux packaging · 74d45f01

Daniel Hiltgen authored Jul 08, 2024

This adjusts Linux to follow a similar model to Windows with a discrete archive
(zip/tgz) to carry the primary executable and dependent libraries. Runners are
still carried as payloads inside the main binary.

Darwin retains the payload model where the Go binary is fully self-contained.

74d45f01

22 Jul, 2024 10 commits
- comments · 85d9d73a
  Michael Yang authored Jul 08, 2024
  
  85d9d73a
- int · 0f191012
  Michael Yang authored Jul 03, 2024
  
  0f191012
- string · e2c3f6b3
  Michael Yang authored Jul 03, 2024
  
  e2c3f6b3
- keepalive · 8570c1c0
  Michael Yang authored Jul 03, 2024
  
  8570c1c0
- bool · 55cd3ddc
  Michael Yang authored Jul 03, 2024
  
  55cd3ddc
- models · 66fe77f0
  Michael Yang authored Jul 03, 2024
  
  66fe77f0
- origins · d1a5227c
  Michael Yang authored Jul 03, 2024
  
  d1a5227c
- host · 4f1afd57
  Michael Yang authored Jul 03, 2024
  
  4f1afd57
- rfc: dynamic environ lookup · 35b89b2e
  Michael Yang authored Jul 03, 2024
  
  35b89b2e
- Remove no longer supported max vram var · cc269ba0
  Daniel Hiltgen authored Jul 22, 2024
```
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios. With concurrency, this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups. Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
```
  cc269ba0
03 Jul, 2024 2 commits

fix: use `envconfig.ModelsDir` directly (#4821) · 0d16eb31

Anatoli Babenia authored Jul 04, 2024

* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org>
Co-authored-by: Maas Lalani <maas@lalani.dev>

0d16eb31

Only set default keep_alive on initial model load · 955f2a4e

Daniel Hiltgen authored Jul 02, 2024

This change fixes the handling of keep_alive so that if a client
request omits the setting, we only apply the default on initial load. Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was already there.

955f2a4e

01 Jul, 2024 1 commit
- Remove default auto from help message · 173b5504
  Daniel Hiltgen authored Jul 01, 2024
```
This may mislead users into thinking "auto" is an acceptable value; it must be numeric.
```
  173b5504
21 Jun, 2024 2 commits

Disable concurrency for AMD + Windows · 9929751c

Daniel Hiltgen authored Jun 19, 2024

Until ROCm v6.2 ships, we won't be able to get accurate free memory
reporting on Windows, which makes automatic concurrency too risky.
Users can still opt in, but they will need to pay attention to model sizes;
otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.

9929751c

Enable concurrency by default · 17b7186c

Daniel Hiltgen authored May 06, 2024

This adjusts our default settings to enable multiple models and parallel
requests to a single model. Users can still override these with the same
env var settings as before. Parallel has a direct impact on
num_ctx, which in turn can have a significant impact on GPUs with little VRAM,
so this change also refines the algorithm so that when parallel is not
explicitly set by the user, we try to find a reasonable default that fits
the model on their GPU(s). As before, multiple models will only load
concurrently if they fully fit in VRAM.

17b7186c
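This sketch of the default selection assumes that total context scales as num_ctx × parallel, which is one reading of the "direct impact" described above. The candidate values and the caller-supplied `fits` check are hypothetical stand-ins for the real memory estimate.

```go
package main

import "fmt"

// parallelCandidates are automatic defaults tried from largest to smallest.
// The values are assumptions for illustration.
var parallelCandidates = []int{4, 1}

// pickParallel honors an explicit user setting (userSet > 0). Otherwise it
// returns the largest candidate whose total context, numCtx * parallel,
// passes fits; if none does, it falls back to 1.
func pickParallel(userSet, numCtx int, fits func(totalCtx int) bool) int {
	if userSet > 0 {
		return userSet
	}
	for _, p := range parallelCandidates {
		if fits(numCtx * p) {
			return p
		}
	}
	return 1
}

func main() {
	// Hypothetical GPU that can hold context for 8192 tokens in total.
	fits := func(totalCtx int) bool { return totalCtx <= 8192 }
	fmt.Println(pickParallel(0, 2048, fits)) // 4: 2048*4 = 8192 fits
	fmt.Println(pickParallel(0, 4096, fits)) // 1: 4096*4 exceeds 8192
}
```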

19 Jun, 2024 2 commits
- Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"" · d34d88e4
  Daniel Hiltgen authored Jun 19, 2024
```
This reverts commit 755b4e4f.
```
  d34d88e4
- Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)" · 755b4e4f
  Wang,Zhe authored Jun 19, 2024
```
This reverts commit 163cd3e7.
```
  755b4e4f
17 Jun, 2024 1 commit
- gpu: add env var for detecting Intel oneapi gpus (#5076) · 163cd3e7
  Jeffrey Morgan authored Jun 16, 2024
```
* gpu: add env var for detecting intel oneapi gpus

* fix build error
```
  163cd3e7
14 Jun, 2024 2 commits

Centralize GPU configuration vars · 6be309e1

Daniel Hiltgen authored May 08, 2024

This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.

6be309e1

Support forced spreading for multi GPU · 5e8ff556

Daniel Hiltgen authored May 08, 2024

Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one.  This exposes that
tunable behavior.

5e8ff556
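The two behaviors described above reduce to a small placement choice. In this sketch, `spread` would be driven by an environment variable, assumed here to be `OLLAMA_SCHED_SPREAD`. The `gpu` type, `placeModel`, and the "most free memory" tie-break are hypothetical.

```go
package main

import "fmt"

// gpu is a hypothetical view of one device's free memory in bytes.
type gpu struct {
	id   string
	free uint64
}

// placeModel chooses GPUs for a model needing need bytes. Without spread it
// prefers the single GPU with the most free memory that can hold the whole
// model; with spread (or when no single GPU suffices) it uses all of them.
func placeModel(gpus []gpu, need uint64, spread bool) []gpu {
	if !spread {
		best := -1
		for i, g := range gpus {
			if g.free >= need && (best < 0 || g.free > gpus[best].free) {
				best = i
			}
		}
		if best >= 0 {
			return gpus[best : best+1]
		}
	}
	return gpus
}

func main() {
	const gib = 1 << 30
	gpus := []gpu{{"gpu0", 8 * gib}, {"gpu1", 16 * gib}}
	fmt.Println(len(placeModel(gpus, 10*gib, false))) // 1: fits on gpu1
	fmt.Println(len(placeModel(gpus, 10*gib, true)))  // 2: forced spread
}
```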

13 Jun, 2024 1 commit
- add OLLAMA_MODELS to envconfig (#5029) · 94618b23
  Patrick Devine authored Jun 13, 2024
  
  94618b23
12 Jun, 2024 1 commit
- move OLLAMA_HOST to envconfig (#5009) · c69bc19e
  Patrick Devine authored Jun 12, 2024
  
  c69bc19e
06 Jun, 2024 1 commit
- API app/browser access (#4879) · 1a29e9a8
  royjhan authored Jun 06, 2024
```
* API app/browser access

* Add tauri (resolves #2291, #4791, #3799, #4388)
```
  1a29e9a8
04 Jun, 2024 2 commits
- some gocritic · c895a7d1
  Michael Yang authored May 21, 2024
  
  c895a7d1
- nosprintfhostport · dad7a987
  Michael Yang authored May 21, 2024
  
  dad7a987
30 May, 2024 1 commit

Fix OLLAMA_LLM_LIBRARY with wrong map name and add more env vars to help message (#4663) · a03be181

Lei Jitang authored May 31, 2024

* envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY
Signed-off-by: Lei Jitang <leijitang@outlook.com>

* serve: Add more env to help message of ollama serve

Add more environment variables to `ollama serve --help`
to let users know what can be configured.
Signed-off-by: Lei Jitang <leijitang@outlook.com>

---------
Signed-off-by: Lei Jitang <leijitang@outlook.com>

a03be181

24 May, 2024 1 commit
- Move envconfig and consolidate env vars (#4608) · 4cc3be30
  Patrick Devine authored May 24, 2024
  
  4cc3be30
23 May, 2024 1 commit

Use flash attention flag for now (#4580) · 38255d2a

Jeffrey Morgan authored May 22, 2024

* put flash attention behind flag for now

* add test

* remove print

* up timeout for scheduler tests

38255d2a

05 May, 2024 1 commit

Centralize server config handling · f56aa200

Daniel Hiltgen authored May 04, 2024

This moves all the env var reading into one central module
and logs the loaded config once at startup, which should
help in troubleshooting user server logs.

f56aa200