Commits · 9ef2fce33a4625da88d4201cdb5e3a074e268def · OpenDAS / ollama

13 Oct, 2023 4 commits
- only check system memory on macos · 36fe2dee
  Michael Yang authored Oct 13, 2023
- check total (system + video) memory · 4a8931f6
  Michael Yang authored Oct 12, 2023
- refactor memory check · bd6e38fb
  Michael Yang authored Oct 12, 2023
- fix memory check · 92189a58
  Michael Yang authored Oct 12, 2023
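The memory-check commits describe a pre-load check that compares a model's requirement against available memory, counting system plus video memory except on macOS, where only system memory is counted. A minimal Go sketch of that rule follows. `MemInfo`, `availableMemory`, and `checkMemory` are hypothetical names, the required size is assumed to be computed elsewhere, and the macOS rationale (GPU memory shared with system RAM) is an assumption, not something these commits state.

```go
package main

import "fmt"

// MemInfo is a hypothetical snapshot of the memory visible to the server.
type MemInfo struct {
	System uint64 // bytes of system RAM
	Video  uint64 // bytes of dedicated GPU memory, 0 if there is no GPU
}

// availableMemory returns the budget a model has to fit into. On macOS only
// system memory is counted (assumption: the GPU shares system RAM there, so
// adding video memory would double count); elsewhere both are summed.
func availableMemory(info MemInfo, goos string) uint64 {
	if goos == "darwin" {
		return info.System
	}
	return info.System + info.Video
}

// checkMemory fails fast, before any weights are loaded, when the model's
// estimated requirement exceeds the available budget.
func checkMemory(required uint64, info MemInfo, goos string) error {
	if avail := availableMemory(info, goos); required > avail {
		return fmt.Errorf("model requires %d bytes of memory, but only %d bytes are available", required, avail)
	}
	return nil
}

func main() {
	const gib = 1 << 30
	info := MemInfo{System: 8 * gib, Video: 8 * gib}
	// 12 GiB fits in 8 GiB RAM plus 8 GiB VRAM on Linux...
	fmt.Println(checkMemory(12*gib, info, "linux"))
	// ...but not in 8 GiB of RAM alone on macOS.
	fmt.Println(checkMemory(12*gib, info, "darwin"))
}
```

In a real server `goos` would come from `runtime.GOOS`; passing it in as a parameter keeps the check easy to test on any platform.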
11 Oct, 2023 1 commit
- add format bytes · b599946b
  Michael Yang authored Oct 11, 2023
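The `add format bytes` commit title does not show the helper itself. Below is a minimal Go sketch of what such a helper can look like, assuming decimal (1000-based) units and one decimal place; `formatBytes` and its unit labels are hypothetical, not the repository's API.

```go
package main

import "fmt"

// Decimal (SI) units; using 1000 rather than 1024 is an assumption here.
const (
	kiloByte = 1000
	megaByte = kiloByte * 1000
	gigaByte = megaByte * 1000
)

// formatBytes is a hypothetical helper that renders a byte count using the
// largest decimal unit it reaches, with one decimal place.
func formatBytes(b uint64) string {
	switch {
	case b >= gigaByte:
		return fmt.Sprintf("%.1f GB", float64(b)/gigaByte)
	case b >= megaByte:
		return fmt.Sprintf("%.1f MB", float64(b)/megaByte)
	case b >= kiloByte:
		return fmt.Sprintf("%.1f KB", float64(b)/kiloByte)
	default:
		return fmt.Sprintf("%d B", b)
	}
}

func main() {
	fmt.Println(formatBytes(512))           // 512 B
	fmt.Println(formatBytes(1_500_000))     // 1.5 MB
	fmt.Println(formatBytes(3_825_819_519)) // 3.8 GB
}
```

Switching the constants to 1024-based values (and the labels to KiB, MiB, GiB) gives binary units instead; which convention a user-facing tool should show is a design choice.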
05 Oct, 2023 1 commit
- enable q8, q5, q5_1, and f32 for linux gpu (#699) · d06bc0cb
  Bruce MacDonald authored Oct 05, 2023
25 Sep, 2023 1 commit
- unbound max num gpu layers (#591) · 86279f4a
  Bruce MacDonald authored Sep 25, 2023
```
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
```
21 Sep, 2023 1 commit
- remove tmp directories created by previous servers (#559) · 4cba75ef
  Bruce MacDonald authored Sep 21, 2023

  * remove tmp directories created by previous servers
  * clean up on server stop
  * Update routes.go
  * Update server/routes.go
    Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
  * create top-level temp ollama dir
  * check file exists before creating

  ---------
  Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
  Co-authored-by: Michael Yang <mxyng@pm.me>

12 Sep, 2023 1 commit
- fix falcon decode · 7dee25a0
  Michael Yang authored Sep 12, 2023
```
get model and file type from bin file
```
07 Sep, 2023 1 commit
- GGUF support (#441) · 09dd2aef
  Bruce MacDonald authored Sep 07, 2023
30 Aug, 2023 1 commit
- subprocess llama.cpp server (#401) · 42998d79
  Bruce MacDonald authored Aug 30, 2023

  * remove c code
  * pack llama.cpp
  * use request context for llama_cpp
  * let llama_cpp decide the number of threads to use
  * stop llama runner when app stops
  * remove sample count and duration metrics
  * use go generate to get libraries
  * tmp dir for running llm
26 Aug, 2023 2 commits
- allow F16 to use metal · b25dd179
  Michael Yang authored Aug 26, 2023
```
warning: F16 uses significantly more memory than quantized models, so the
standard requirements don't apply.
```
- add 34b to mem check · 304f2b6c
  Michael Yang authored Aug 26, 2023
17 Aug, 2023 1 commit
- model and file type as strings · a894cc79
  Michael Yang authored Aug 17, 2023
14 Aug, 2023 1 commit
- close open files · e26085b9
  Michael Yang authored Aug 14, 2023
10 Aug, 2023 4 commits
- implement loading ggml lora adapters through the modelfile · 6de5d032
  Michael Yang authored Aug 03, 2023
- check memory requirements before loading · d791df75
  Michael Yang authored Aug 03, 2023
- disable gpu for q5_0, q5_1, q8_0 quants · 020a3b35
  Michael Yang authored Aug 03, 2023
- partial decode ggml bin for more info · fccf8d17
  Michael Yang authored Jul 21, 2023