Commits · 44b466eeb2e42e9ce2852c69d7cddb7ebac5daf8 · OpenDAS / ollama

29 Apr, 2025 1 commit
- config: update default context length to 4096 · 44b466ee
  Devon Rifkin authored Apr 28, 2025
  
28 Apr, 2025 1 commit
- Revert "increase default context length to 4096 (#10364)" · dd93e1af
  Devon Rifkin authored Apr 28, 2025
```
This reverts commit 424f6486.
```
22 Apr, 2025 1 commit

increase default context length to 4096 (#10364) · 424f6486

Devon Rifkin authored Apr 22, 2025

* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

Add an explicit context length so the test inputs don't get truncated. The
code that treats -1 as a signal to do a runtime check doesn't run as part
of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
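
A minimal sketch in Go of the runtime check described in this commit message. The type `gpuInfo`, the function `pickContextLength`, and the constants are hypothetical names, and reading "4 GB" as 4 GiB is an assumption; the commit only states the rule (2048 on a one-GPU system with total VRAM of <= 4 GB, otherwise 4096, with -1 meaning "decide at runtime").

```go
package main

// gpuInfo is a hypothetical stand-in for what the server knows about a GPU.
type gpuInfo struct {
	TotalVRAM uint64 // total (not free) memory in bytes
}

const (
	smallContext   = 2048
	defaultContext = 4096
	// Assumption: "4 GB" in the commit message is treated as 4 GiB here.
	lowVRAMLimit = 4 * 1024 * 1024 * 1024
)

// pickContextLength resolves the context length to use.
// A requested value of -1 means the user set nothing and the
// default should be chosen by a runtime check.
func pickContextLength(requested int, gpus []gpuInfo) int {
	if requested != -1 {
		return requested // an explicit setting always wins
	}
	// Only total VRAM is consulted, never available VRAM, so the
	// result does not change from run to run on the same machine.
	if len(gpus) == 1 && gpus[0].TotalVRAM <= lowVRAMLimit {
		return smallContext
	}
	return defaultContext
}
```

Because the check ignores available VRAM, the same machine always gets the same default, which is the stability property the message calls out.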


27 Feb, 2025 1 commit
- server: allow vscode-file origins (#9313) · dc13813a
  Eries Trisnadi authored Feb 28, 2025
  
24 Feb, 2025 1 commit
- config: allow setting context length through env var (#8938) · 314573bf
  Parth Sareen authored Feb 24, 2025
```
* envconfig: allow setting context length through env var
```
22 Feb, 2025 1 commit

server: group routes by category and purpose (#9270) · 68bac1e0

Blake Mizerany authored Feb 21, 2025

The route assembly in Handler lacked clear organization, making it
difficult to scan for routes and their relationships to each other. This
commit aims to fix that by reordering the assembly of routes to group
them by category and purpose.

Also, be more specific about what "config" refers to (it is about CORS
if you were wondering... I was.)


19 Oct, 2024 1 commit
- server: allow vscode-webview origin (#7273) · 48708ca0
  Jeffrey Morgan authored Oct 19, 2024
  
05 Sep, 2024 1 commit
- llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT · 67190976
  Daniel Hiltgen authored Sep 05, 2024
```
With the new very large parameter models, some users are willing to wait for
a very long time for models to load.
```
23 Aug, 2024 1 commit
- passthrough OLLAMA_HOST path to client · 386af6c1
  Michael Yang authored Aug 23, 2024
  
22 Jul, 2024 7 commits
- comments · 85d9d73a
  Michael Yang authored Jul 08, 2024
  
- cleanup tests · 78140a71
  Michael Yang authored Jul 05, 2024
  
- keepalive · 8570c1c0
  Michael Yang authored Jul 03, 2024
  
- bool · 55cd3ddc
  Michael Yang authored Jul 03, 2024
  
- origins · d1a5227c
  Michael Yang authored Jul 03, 2024
  
- host · 4f1afd57
  Michael Yang authored Jul 03, 2024
  
- rfc: dynamic environ lookup · 35b89b2e
  Michael Yang authored Jul 03, 2024
  
03 Jul, 2024 1 commit

Only set default keep_alive on initial model load · 955f2a4e

Daniel Hiltgen authored Jul 02, 2024

This change fixes the handling of keep_alive so that if a client
request omits the setting, we only apply the default on initial load.
Once the model is loaded, if new requests leave it unset, we keep
whatever keep_alive was already in effect.


12 Jun, 2024 1 commit
- move OLLAMA_HOST to envconfig (#5009) · c69bc19e
  Patrick Devine authored Jun 12, 2024
  
24 May, 2024 1 commit
- Move envconfig and consolidate env vars (#4608) · 4cc3be30
  Patrick Devine authored May 24, 2024
  
23 May, 2024 1 commit

Use flash attention flag for now (#4580) · 38255d2a

Jeffrey Morgan authored May 22, 2024

* put flash attention behind flag for now

* add test

* remove print

* up timeout for scheduler tests


10 May, 2024 1 commit
- Fix envconfig unit test · 824ee544
  Daniel Hiltgen authored May 10, 2024
  
05 May, 2024 1 commit

Centralize server config handling · f56aa200

Daniel Hiltgen authored May 04, 2024

This moves all the env var reading into one central module
and logs the loaded config once at startup, which should
help when troubleshooting users' server logs.
