- 05 Feb, 2025 1 commit
Azis Alvriyanto authored
Removed redundant checks and streamlined the switch-case structure. Added test cases for both HumanBytes and HumanBytes2 to cover a wide range of scenarios.
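The table-driven tests described above might look like the sketch below. `humanBytes` here is a hypothetical stand-in for the `HumanBytes` helper (the real implementation lives in Ollama's `format` package); the unit boundaries and cases are illustrative only.

```go
package main

import "fmt"

// humanBytes is a hypothetical stand-in for the HumanBytes helper the
// commit tests: it formats a byte count using decimal (SI) units.
func humanBytes(b int64) string {
	const unit = 1000
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := int64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	units := []string{"KB", "MB", "GB", "TB", "PB"}
	return fmt.Sprintf("%.0f %s", float64(b)/float64(div), units[exp])
}

func main() {
	// Table-driven cases covering a range of magnitudes, in the spirit
	// of the tests the commit adds.
	cases := []struct {
		in   int64
		want string
	}{
		{0, "0 B"},
		{999, "999 B"},
		{1000, "1 KB"},
		{5000000000, "5 GB"},
	}
	for _, c := range cases {
		got := humanBytes(c.in)
		fmt.Printf("%d -> %s (want %s)\n", c.in, got, c.want)
	}
}
```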
- 08 May, 2024 1 commit
Daniel Hiltgen authored
This records more GPU usage information for eventual UX inclusion.
- 23 Apr, 2024 1 commit
Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently set at 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
- 10 Apr, 2024 1 commit
Michael Yang authored
- 01 Apr, 2024 1 commit
Michael Yang authored
Count each layer independently when deciding GPU offloading.
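The per-layer accounting idea can be illustrated as below. This is a hypothetical sketch, not Ollama's implementation: it walks the layers one at a time, adds each layer's size independently, and offloads only as many layers as fit in free VRAM.

```go
package main

import "fmt"

// countOffloadLayers is a hypothetical sketch of per-layer accounting:
// each layer's size is considered independently, and we stop as soon as
// the next layer would exceed the available VRAM budget.
func countOffloadLayers(layerSizes []uint64, freeVRAM uint64) int {
	var used uint64
	n := 0
	for _, size := range layerSizes {
		if used+size > freeVRAM {
			break
		}
		used += size
		n++
	}
	return n
}

func main() {
	// Per-layer sizes in MB (illustrative values only).
	layers := []uint64{300, 300, 300, 400}
	fmt.Println(countOffloadLayers(layers, 1000)) // only the first three fit
}
```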
- 28 Nov, 2023 1 commit
Michael Yang authored
- 20 Nov, 2023 1 commit
Jeffrey Morgan authored
- 17 Nov, 2023 1 commit
Michael Yang authored
- 14 Nov, 2023 1 commit
Michael Yang authored
- 13 Oct, 2023 1 commit
Michael Yang authored
- 11 Oct, 2023 1 commit
Michael Yang authored