- 17 Apr, 2024 1 commit

Michael Yang authored
- 16 Apr, 2024 2 commits

Michael Yang authored

Michael Yang authored
- 15 Apr, 2024 1 commit

Jeffrey Morgan authored

* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading
* use `unload` in signal handler
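
A minimal Go sketch of the pattern described above, assuming a hypothetical `unload` helper and placeholder runner binary and flags (not the repository's actual code):

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// unload kills the runner subprocess if it is still alive.
// (hypothetical helper; stands in for the real unload logic)
func unload(cmd *exec.Cmd) {
	if cmd != nil && cmd.Process != nil {
		cmd.Process.Kill()
	}
}

func main() {
	// Start the llama.cpp runner as a subprocess (binary and flags are
	// placeholders for illustration).
	cmd := exec.Command("./llama-server", "--model", "model.gguf")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// If SIGINT or SIGTERM arrives while the model is still loading,
	// unload the subprocess instead of leaving it orphaned.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-sig
		unload(cmd)
		os.Exit(1)
	}()

	if err := cmd.Wait(); err != nil {
		log.Printf("runner exited: %v", err)
	}
}
```
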
- 10 Apr, 2024 2 commits

Michael Yang authored

Michael Yang authored
- 09 Apr, 2024 1 commit

Daniel Hiltgen authored

During testing, we're seeing some models take over 3 minutes.
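
A minimal sketch of what that implies for a load timeout, assuming a hypothetical `loadTimeout` constant and `waitForReady` probe (not the actual change in this commit):

```go
package runner

import (
	"context"
	"fmt"
	"time"
)

// Hypothetical load budget: testing shows some models take over 3 minutes,
// so the timeout must comfortably exceed that.
const loadTimeout = 5 * time.Minute

// waitForReady polls a readiness probe until the model has loaded or the
// budget is exhausted.
func waitForReady(ctx context.Context, ready func() bool) error {
	ctx, cancel := context.WithTimeout(ctx, loadTimeout)
	defer cancel()

	tick := time.NewTicker(time.Second)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("model did not load within %v", loadTimeout)
		case <-tick.C:
			if ready() {
				return nil
			}
		}
	}
}
```
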
- 06 Apr, 2024 1 commit

Michael Yang authored
- 03 Apr, 2024 1 commit

Michael Yang authored
- 02 Apr, 2024 2 commits

Daniel Hiltgen authored

Michael Yang authored
- 01 Apr, 2024 1 commit

Daniel Hiltgen authored

This should resolve a number of memory-leak and stability defects by isolating llama.cpp in a separate process that shuts down when idle and restarts gracefully if it runs into problems. It is also a first step toward running multiple copies to support multiple models concurrently.
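
A rough Go sketch of that supervision pattern, with hypothetical names, binary path, and flags (not the code from this change): a supervisor starts the runner on demand, reaps it when idle, and restarts it transparently after a crash.

```go
package runner

import (
	"log"
	"os/exec"
	"sync"
	"time"
)

// Runner supervises a llama.cpp subprocess so that leaks or crashes in the
// engine are isolated from the main server process.
type Runner struct {
	mu       sync.Mutex
	cmd      *exec.Cmd
	lastUsed time.Time
}

// EnsureStarted launches the subprocess if it is not already running; a
// crashed subprocess is restarted transparently on the next call.
func (r *Runner) EnsureStarted() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.lastUsed = time.Now()
	if r.cmd != nil {
		return nil // already running
	}
	cmd := exec.Command("./llama-server", "--model", "model.gguf")
	if err := cmd.Start(); err != nil {
		return err
	}
	r.cmd = cmd
	go func() {
		// When the subprocess exits (idle shutdown or crash), clear the
		// handle so the next request starts a fresh one.
		err := cmd.Wait()
		log.Printf("runner exited: %v", err)
		r.mu.Lock()
		if r.cmd == cmd {
			r.cmd = nil
		}
		r.mu.Unlock()
	}()
	return nil
}

// ReapIdle periodically stops the subprocess once it has been unused for
// idleTimeout, releasing its memory between requests.
func (r *Runner) ReapIdle(idleTimeout time.Duration) {
	for range time.Tick(idleTimeout / 2) {
		r.mu.Lock()
		if r.cmd != nil && time.Since(r.lastUsed) > idleTimeout {
			r.cmd.Process.Kill()
		}
		r.mu.Unlock()
	}
}
```

Killing from the reaper while letting the `Wait` goroutine clear the handle keeps restart logic in one place, and a crash in llama.cpp then costs only a subprocess restart instead of taking down the whole server.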