Commits · 34b9db5afc43b352c5ef04fe6ef52684bfdd57b5 · OpenDAS / ollama

23 Apr, 2024 1 commit

Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.


10 Apr, 2024 1 commit
- partial offloading · 7e33a017
  Michael Yang authored Apr 05, 2024
  
01 Apr, 2024 1 commit
- update memory calculations · 91b3e4d2
  Michael Yang authored Mar 18, 2024
  
  count each layer independently when deciding gpu offloading
28 Nov, 2023 1 commit
- progress: fix bar rate · 424d53ac
  Michael Yang authored Nov 18, 2023
  
20 Nov, 2023 1 commit
- only show decimal points for smaller file size numbers · 93a10821
  Jeffrey Morgan authored Nov 20, 2023
  
17 Nov, 2023 1 commit
- format bytes · 9f04e5a8
  Michael Yang authored Nov 14, 2023
  
14 Nov, 2023 1 commit
- replace go-humanize with format.HumanBytes · 01ea6002
  Michael Yang authored Nov 14, 2023
  
13 Oct, 2023 1 commit
- fix memory check · 92189a58
  Michael Yang authored Oct 12, 2023
  
11 Oct, 2023 1 commit
- add format bytes · b599946b
  Michael Yang authored Oct 11, 2023
  
  b599946b
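
Several of the commits listed above concern byte formatting ("add format bytes", "replace go-humanize with format.HumanBytes", "only show decimal points for smaller file size numbers"). The sketch below shows one way such a formatter might look. It is not the repository's `format.HumanBytes`: the function name `humanBytes`, the decimal (1000-based) units, and the "one decimal place only below 10" rule are all assumptions made for illustration.

```go
package main

import "fmt"

// humanBytes is a hypothetical formatter, not the repository's format.HumanBytes.
// Assumptions: decimal units (1 KB = 1000 B) and one decimal place only
// when the scaled value is below 10.
func humanBytes(b int64) string {
	const (
		KB = 1000
		MB = KB * 1000
		GB = MB * 1000
	)

	var value float64
	var unit string
	switch {
	case b >= GB:
		value, unit = float64(b)/GB, "GB"
	case b >= MB:
		value, unit = float64(b)/MB, "MB"
	case b >= KB:
		value, unit = float64(b)/KB, "KB"
	default:
		// Plain byte counts are printed as integers.
		return fmt.Sprintf("%d B", b)
	}

	// Smaller scaled values keep a decimal; larger ones are rounded.
	if value < 10 {
		return fmt.Sprintf("%.1f %s", value, unit)
	}
	return fmt.Sprintf("%.0f %s", value, unit)
}

func main() {
	for _, n := range []int64{512, 1500, 42_000_000, 3_800_000_000} {
		fmt.Println(humanBytes(n))
	}
}
```

Keeping the decimal only for small values avoids noisy output such as "42.0 MB" while still distinguishing, say, 1.5 KB from 1 KB.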