"src/diffusers/quantizers/quanto/__init__.py" did not exist on "9a1810f0de807f936ac3cf344d6e1e2851af723a"
- 20 Sep, 2023 (6 commits)
  - Michael Yang authored
  - Michael Yang authored
  - Bruce MacDonald authored
  - Bruce MacDonald authored
  - Bruce MacDonald authored
  - Bruce MacDonald authored
- 18 Sep, 2023 (1 commit)
  - Bruce MacDonald authored
    * subprocess improvements
      - increase start-up timeout
      - when the runner fails to start, fail rather than timing out
      - try runners in order rather than choosing 1 runner
      - embed metal runner in metal dir rather than gpu
      - refactor logging and error messages
    * Update llama.go
    * Update llama.go
    * simplify by using glob
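The "try runners in order" and "simplify by using glob" points above fit a short illustration. A minimal Go sketch, assuming a hypothetical layout where each runner binary sits in its own subdirectory and can be probed with `--version`; the paths and probe are illustrative, not ollama's actual code:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"path/filepath"
)

// findRunner discovers every built runner with a glob and probes them in
// order, returning the first one that starts, instead of committing to a
// single runner up front. Directory layout and --version probe are assumptions.
func findRunner(baseDir string) (string, error) {
	candidates, err := filepath.Glob(filepath.Join(baseDir, "*", "runner"))
	if err != nil {
		return "", err
	}
	for _, path := range candidates {
		if err := exec.Command(path, "--version").Run(); err != nil {
			log.Printf("runner %s failed to start: %v", path, err)
			continue // try the next runner rather than giving up
		}
		return path, nil
	}
	return "", fmt.Errorf("no usable runner found under %s", baseDir)
}

func main() {
	path, err := findRunner("llm/runners") // hypothetical directory
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("using runner:", path)
}
```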
- 14 Sep, 2023 (1 commit)
  - Bruce MacDonald authored
    * enable packaging multiple cuda versions
    * use nvcc cuda version if available
    Co-authored-by: Michael Yang <mxyng@pm.me>
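A rough sketch of "use nvcc cuda version if available": shell out to `nvcc --version` and parse the release number from its output. The regex and fallback behaviour are assumptions for illustration, not the project's build scripts:

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

// cudaVersion runs `nvcc --version` and extracts the release number, e.g.
// "11.8" from "Cuda compilation tools, release 11.8, V11.8.89".
func cudaVersion() (string, error) {
	out, err := exec.Command("nvcc", "--version").Output()
	if err != nil {
		return "", fmt.Errorf("nvcc not available: %w", err)
	}
	m := regexp.MustCompile(`release (\d+\.\d+)`).FindSubmatch(out)
	if m == nil {
		return "", fmt.Errorf("could not parse nvcc output")
	}
	return string(m[1]), nil
}

func main() {
	if v, err := cudaVersion(); err == nil {
		fmt.Println("packaging for CUDA", v)
	} else {
		fmt.Println("falling back to a default CUDA version:", err)
	}
}
```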
- 13 Sep, 2023 (1 commit)
  - Michael Yang authored
- 12 Sep, 2023 (4 commits)
  - Michael Yang authored
  - Bruce MacDonald authored
  - Michael Yang authored
    get model and file type from bin file
  - Bruce MacDonald authored
    * linux gpu support
    * handle multiple gpus
    * add cuda docker image (#488)
    Co-authored-by: Michael Yang <mxyng@pm.me>
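How "handle multiple gpus" detects devices isn't shown in this log. One hedged way to illustrate the idea is to list GPUs with `nvidia-smi -L` and count them; this is an assumption, not necessarily what ollama does:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// countGPUs lists devices with `nvidia-smi -L`, which prints one line per
// GPU ("GPU 0: ...", "GPU 1: ..."). Illustrative detection only.
func countGPUs() (int, error) {
	out, err := exec.Command("nvidia-smi", "-L").Output()
	if err != nil {
		return 0, err // no driver or no NVIDIA GPU: fall back to CPU
	}
	n := 0
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if strings.HasPrefix(line, "GPU ") {
			n++
		}
	}
	return n, nil
}

func main() {
	n, err := countGPUs()
	if err != nil {
		fmt.Println("running on CPU:", err)
		return
	}
	fmt.Printf("found %d GPU(s)\n", n)
}
```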
- 07 Sep, 2023 (1 commit)
  - Bruce MacDonald authored
- 06 Sep, 2023 (3 commits)
  - Jeffrey Morgan authored
  - Jeffrey Morgan authored
  - Jeffrey Morgan authored
- 05 Sep, 2023 (3 commits)
  - Bruce MacDonald authored
  - Michael Yang authored
  - Jeffrey Morgan authored
- 03 Sep, 2023 (2 commits)
  - Michael Yang authored
  - Michael Yang authored
- 30 Aug, 2023 (3 commits)
  - Bruce MacDonald authored
  - Bruce MacDonald authored
    * remove c code
    * pack llama.cpp
    * use request context for llama_cpp
    * let llama_cpp decide the number of threads to use
    * stop llama runner when app stops
    * remove sample count and duration metrics
    * use go generate to get libraries
    * tmp dir for running llm
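One bullet above is "use request context for llama_cpp". A minimal Go sketch of that pattern, where generation stops as soon as the request's context is cancelled; `nextToken` and the timings are stand-ins, not ollama's real API:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// generate stops producing tokens as soon as the request's context is
// cancelled, so an abandoned request also stops the runner.
// nextToken is a placeholder for the real llama.cpp call.
func generate(ctx context.Context, nextToken func() (string, bool)) ([]string, error) {
	var tokens []string
	for {
		select {
		case <-ctx.Done():
			return tokens, ctx.Err() // caller went away or timed out
		default:
		}
		tok, more := nextToken()
		tokens = append(tokens, tok)
		if !more {
			return tokens, nil
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 35*time.Millisecond)
	defer cancel()

	i := 0
	toks, err := generate(ctx, func() (string, bool) {
		time.Sleep(10 * time.Millisecond)
		i++
		return fmt.Sprintf("tok%d", i), i < 100
	})
	fmt.Println(toks, err) // a few tokens, then context.DeadlineExceeded
}
```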
  - Quinn Slack authored
    The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list. Fixes https://github.com/jmorganca/ollama/issues/295.
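A simplified Go sketch of the behaviour described above: stop when the accumulated output contains any stop sequence and trim from that point, rather than comparing individual tokens for exact matches. The helper name is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// checkStop stops as soon as the accumulated output *contains* any stop
// sequence (not only when a token equals it exactly) and trims the stop
// sequence and everything after it from the result.
func checkStop(output string, stop []string) (string, bool) {
	for _, s := range stop {
		if i := strings.Index(output, s); i >= 0 {
			return output[:i], true
		}
	}
	return output, false
}

func main() {
	stop := []string{"\n"}
	var sb strings.Builder

	// tokens from the model may bundle "\n" with other characters
	for _, tok := range []string{"Hello", ", world", ".\nMore", " text"} {
		sb.WriteString(tok)
		if out, done := checkStop(sb.String(), stop); done {
			fmt.Printf("%q\n", out) // "Hello, world."
			return
		}
	}
	fmt.Printf("%q\n", sb.String())
}
```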
- 26 Aug, 2023 (3 commits)
  - Michael Yang authored
    Warning: F16 uses significantly more memory than a quantized model, so the standard requirements don't apply.
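As a hedged illustration of the warning above, a check might skip the hard memory requirement for F16 models and only log a warning; the file-type constants and numbers here are made up for the sketch, not ollama's actual values:

```go
package main

import (
	"fmt"
	"log"
)

// fileType and the threshold below are stand-ins for illustration.
type fileType int

const (
	fileTypeF16 fileType = iota
	fileTypeQ4_0
)

// checkMemory applies the usual minimum-memory requirement only to
// quantized models; for F16 it just warns, because F16 needs significantly
// more memory than the quantized baseline the requirement was written for.
func checkMemory(ft fileType, availableGB, requiredGB float64) error {
	if ft == fileTypeF16 {
		log.Printf("warning: F16 model, memory requirement of %.0f GB does not apply", requiredGB)
		return nil
	}
	if availableGB < requiredGB {
		return fmt.Errorf("model requires at least %.0f GB of memory", requiredGB)
	}
	return nil
}

func main() {
	fmt.Println(checkMemory(fileTypeQ4_0, 4, 8)) // error: not enough memory
	fmt.Println(checkMemory(fileTypeF16, 4, 8))  // only a warning
}
```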
  - Michael Yang authored
  - Jeffrey Morgan authored
- 25 Aug, 2023 (1 commit)
  - Michael Yang authored
- 24 Aug, 2023 (1 commit)
  - Michael Yang authored
- 18 Aug, 2023 (1 commit)
  - Michael Yang authored
- 17 Aug, 2023 (1 commit)
  - Michael Yang authored
- 14 Aug, 2023 (4 commits)
  - Michael Yang authored
  - Michael Yang authored
  - Bruce MacDonald authored
  - Bruce MacDonald authored
- 13 Aug, 2023 (1 commit)
  - Jeffrey Morgan authored
- 11 Aug, 2023 (1 commit)
  - Michael Yang authored
    remove unused Unknown FileType
- 10 Aug, 2023 (2 commits)
  - Michael Yang authored
  - Michael Yang authored