- 23 Apr, 2024 1 commit
Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as for loading multiple models at once by spawning multiple runners. The defaults are currently 1 concurrent request per model and only 1 loaded model at a time, but both can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
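A minimal sketch of how such settings could be read at startup, assuming a small envInt helper (only the variable names and the defaults of 1 come from the commit message; the helper itself is illustrative, not ollama's actual parsing code):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads a positive integer from the environment, falling back
// to a default when the variable is unset or invalid. Illustrative
// helper only.
func envInt(key string, fallback int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return fallback
}

func main() {
	// Defaults of 1 match the commit message: one concurrent request
	// per model, and one loaded model at a time.
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1)

	fmt.Printf("parallel requests per model: %d, max loaded models: %d\n",
		numParallel, maxLoaded)
}
```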
- 10 Apr, 2024 1 commit
Michael Yang authored
- 01 Apr, 2024 1 commit
Michael Yang authored
count each layer independently when deciding GPU offloading
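A hedged sketch of that idea: rather than assuming a uniform per-layer cost, walk the actual per-layer sizes and offload layers until free GPU memory runs out (the function and sizes below are illustrative, not ollama's real accounting):

```go
package main

import "fmt"

// offloadableLayers returns how many layers fit in freeVRAM when each
// layer's real size is counted independently, instead of dividing the
// model size evenly across layers.
func offloadableLayers(layerSizes []uint64, freeVRAM uint64) int {
	var used uint64
	count := 0
	for _, size := range layerSizes {
		if used+size > freeVRAM {
			break
		}
		used += size
		count++
	}
	return count
}

func main() {
	// Sizes in bytes; real layers are rarely uniform (e.g. the first
	// layer may be much larger than the rest).
	layers := []uint64{900 << 20, 400 << 20, 400 << 20, 400 << 20}
	fmt.Println(offloadableLayers(layers, 1800<<20)) // 3
}
```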
- 24 Feb, 2024 1 commit
Michael Yang authored
this is unnecessary now that x/crypto/ssh.MarshalPrivateKey has been added
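For reference, a minimal example of the x/crypto/ssh API the message cites, marshaling an ed25519 key to OpenSSH PEM (the key type and comment are arbitrary choices for illustration):

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/pem"
	"fmt"
	"log"

	"golang.org/x/crypto/ssh"
)

func main() {
	// Generate a throwaway ed25519 key pair.
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		log.Fatal(err)
	}

	// MarshalPrivateKey returns a *pem.Block in OpenSSH private key
	// format, removing the need for hand-rolled serialization.
	block, err := ssh.MarshalPrivateKey(priv, "example comment")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Print(string(pem.EncodeToMemory(block)))
}
```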
- 28 Nov, 2023 1 commit
Michael Yang authored
- 20 Nov, 2023 1 commit
Jeffrey Morgan authored
- 17 Nov, 2023 1 commit
Michael Yang authored
- 14 Nov, 2023 1 commit
Michael Yang authored
- 09 Nov, 2023 1 commit
Michael Yang authored
instead of a static number of parameters for each model family, get the real number from the tensors (#1022)
* parse tensor info
* refactor decoder
* return actual parameter count
* explicit rounding
* s/Human/HumanNumber/
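The core of the change is that a model's parameter count is the sum, over all tensors, of the product of each tensor's dimensions; a sketch under that assumption (the Tensor type and humanNumber formatter below are illustrative reconstructions, not the exact code):

```go
package main

import "fmt"

// Tensor is an illustrative stand-in for a decoded tensor header.
type Tensor struct {
	Name  string
	Shape []uint64
}

// parameterCount sums element counts across all tensors, rather than
// using a static per-family figure.
func parameterCount(tensors []Tensor) uint64 {
	var total uint64
	for _, t := range tensors {
		n := uint64(1)
		for _, d := range t.Shape {
			n *= d
		}
		total += n
	}
	return total
}

// humanNumber renders a count like 6738415616 as "6.7B", with the
// explicit rounding the commit message mentions.
func humanNumber(n uint64) string {
	switch {
	case n >= 1e9:
		return fmt.Sprintf("%.1fB", float64(n)/1e9)
	case n >= 1e6:
		return fmt.Sprintf("%.0fM", float64(n)/1e6)
	default:
		return fmt.Sprintf("%d", n)
	}
}

func main() {
	tensors := []Tensor{
		{Name: "tok_embeddings.weight", Shape: []uint64{4096, 32000}},
		{Name: "layers.0.attention.wq.weight", Shape: []uint64{4096, 4096}},
	}
	fmt.Println(humanNumber(parameterCount(tensors))) // 148M
}
```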
- 19 Oct, 2023 1 commit
Michael Yang authored
- 13 Oct, 2023 1 commit
Michael Yang authored
- 11 Oct, 2023 2 commits
Michael Yang authored
Michael Yang authored
- 06 Sep, 2023 1 commit
Michael Yang authored
- 11 Aug, 2023 1 commit
Patrick Devine authored
- 18 Jul, 2023 1 commit
Patrick Devine authored