- 06 Oct, 2023 1 commit
-
Bruce MacDonald authored
this makes it easier to see that the subprocess is associated with ollama
-
- 05 Oct, 2023 1 commit
-
Bruce MacDonald authored
-
- 04 Oct, 2023 1 commit
-
Bruce MacDonald authored
-
- 03 Oct, 2023 1 commit
-
Michael Yang authored
-
- 02 Oct, 2023 2 commits
-
Bruce MacDonald authored
-
Bruce MacDonald authored
* include seed in params for llama.cpp server and remove empty filter for temp
* relay default predict options to llama.cpp
  - reorganize options to match predict request for readability
* omit empty stop

---------

Co-authored-by: hallh <hallh@users.noreply.github.com>
-
- 29 Sep, 2023 1 commit
-
Bruce MacDonald authored
-
- 28 Sep, 2023 1 commit
-
Michael Yang authored
-
- 25 Sep, 2023 1 commit
-
Bruce MacDonald authored
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 21 Sep, 2023 3 commits
-
Michael Yang authored
-
Michael Yang authored
-
Bruce MacDonald authored
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir
* check file exists before creating

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
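A minimal sketch of what the temp-dir cleanup described in this commit could look like, assuming a single top-level ollama directory under the system temp dir; all names and paths here are illustrative, not the actual server code:

```go
// Hypothetical sketch only: remove run directories left behind by previous
// servers, keeping everything under one top-level temp dir.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cleanupStaleRunDirs removes every entry under $TMPDIR/ollama except the
// directory used by the current server.
func cleanupStaleRunDirs(current string) error {
	root := filepath.Join(os.TempDir(), "ollama")

	entries, err := os.ReadDir(root)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing to clean up
		}
		return err
	}

	for _, e := range entries {
		path := filepath.Join(root, e.Name())
		if path == current {
			continue // skip the directory in use by this process
		}
		if err := os.RemoveAll(path); err != nil {
			fmt.Fprintln(os.Stderr, "cleanup:", err)
		}
	}
	return nil
}

func main() {
	if err := cleanupStaleRunDirs(""); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```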
-
- 20 Sep, 2023 6 commits
-
Michael Yang authored
-
Michael Yang authored
-
Bruce MacDonald authored
-
Bruce MacDonald authored
-
Bruce MacDonald authored
-
Bruce MacDonald authored
-
- 18 Sep, 2023 1 commit
-
Bruce MacDonald authored
* subprocess improvements
  - increase start-up timeout
  - when a runner fails to start, fail rather than timing out
  - try runners in order rather than choosing 1 runner
  - embed metal runner in metal dir rather than gpu
  - refactor logging and error messages
* Update llama.go
* Update llama.go
* simplify by using glob
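As a rough illustration of the "try runners in order" change, here is a hypothetical Go sketch; the function, binary names, and arguments are made up for illustration and are not the actual llama.go code:

```go
// Hypothetical sketch only: start the first runner binary that launches
// successfully instead of committing to a single runner up front.
package main

import (
	"fmt"
	"os/exec"
)

func startFirstRunner(candidates []string, args ...string) (*exec.Cmd, error) {
	var lastErr error
	for _, bin := range candidates {
		cmd := exec.Command(bin, args...)
		if err := cmd.Start(); err != nil {
			// This runner failed to start: remember the error and move on
			// to the next candidate instead of waiting for a timeout.
			lastErr = err
			continue
		}
		return cmd, nil
	}
	return nil, fmt.Errorf("no runner could be started: %w", lastErr)
}

func main() {
	cmd, err := startFirstRunner([]string{"./runner-metal", "./runner-cpu"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("started runner, pid", cmd.Process.Pid)
}
```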
-
- 14 Sep, 2023 1 commit
-
Bruce MacDonald authored
* enable packaging multiple cuda versions
* use nvcc cuda version if available

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 13 Sep, 2023 1 commit
-
Michael Yang authored
-
- 12 Sep, 2023 4 commits
-
Michael Yang authored
-
Bruce MacDonald authored
-
Michael Yang authored
get model and file type from bin file
-
Bruce MacDonald authored
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488)

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 07 Sep, 2023 1 commit
-
Bruce MacDonald authored
-
- 06 Sep, 2023 3 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 05 Sep, 2023 3 commits
-
Bruce MacDonald authored
-
Michael Yang authored
-
Jeffrey Morgan authored
-
- 03 Sep, 2023 2 commits
-
Michael Yang authored
-
Michael Yang authored
-
- 30 Aug, 2023 3 commits
-
Bruce MacDonald authored
-
Bruce MacDonald authored
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
-
Quinn Slack authored
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list. Fixes https://github.com/jmorganca/ollama/issues/295.
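To illustrate the substring behavior described above, here is a minimal hypothetical Go sketch; the function and names are illustrative and not taken from the ollama codebase:

```go
// Hypothetical sketch only: treat "stop" entries as plain substrings of the
// generated text rather than exact LLM tokens, and trim at the first match.
package main

import (
	"fmt"
	"strings"
)

// truncateAtStop cuts generated text at the earliest occurrence of any stop
// sequence and reports whether a stop sequence was found.
func truncateAtStop(generated string, stop []string) (string, bool) {
	cut := -1
	for _, s := range stop {
		if s == "" {
			continue
		}
		if i := strings.Index(generated, s); i >= 0 && (cut == -1 || i < cut) {
			cut = i
		}
	}
	if cut == -1 {
		return generated, false
	}
	return generated[:cut], true
}

func main() {
	// A stop of "\n" fires even though the model may emit the newline as
	// part of a larger token such as "line\n".
	out, stopped := truncateAtStop("first line\nsecond line", []string{"\n"})
	fmt.Printf("%q stopped=%v\n", out, stopped) // "first line" stopped=true
}
```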
-
- 26 Aug, 2023 3 commits
-
Michael Yang authored
warning: F16 uses significantly more memory than a quantized model, so the standard requirements don't apply.
-
Michael Yang authored
-
Jeffrey Morgan authored
-