- 12 Dec, 2023 2 commits
-
-
Bruce MacDonald authored
-
Bruce MacDonald authored
- remove parallel
-
- 11 Dec, 2023 1 commit
-
-
Patrick Devine authored
Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
-
- 10 Dec, 2023 4 commits
-
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 09 Dec, 2023 1 commit
-
-
Bruce MacDonald authored
* fix: queued request failures - increase parallel requests to 2 to complete queued requests; queueing is managed in ollama * log stream errors
-
- 05 Dec, 2023 7 commits
-
-
Michael Yang authored
-
Bruce MacDonald authored
-
Jeffrey Morgan authored
This reverts commit 7a0899d6.
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 04 Dec, 2023 2 commits
-
-
Bruce MacDonald authored
- update chat docs - add messages chat endpoint - remove deprecated context and template generate parameters from docs - context and template are still supported for the time being and will continue to work as expected - add partial response to chat history
-
Michael Yang authored
-
- 26 Nov, 2023 2 commits
-
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 24 Nov, 2023 2 commits
-
-
Jing Zhang authored
* Support CUDA build on Windows * Enable dynamic NumGPU allocation for Windows
-
Jongwook Choi authored
When CUDA peer access is enabled, multi-GPU inference produces garbage output. This is a known bug in llama.cpp (or NVIDIA). Until the upstream bug is fixed, we can temporarily disable CUDA peer access to ensure correct output. See #961.
-
- 22 Nov, 2023 2 commits
-
-
Jeffrey Morgan authored
-
Michael Yang authored
-
- 21 Nov, 2023 2 commits
-
-
Michael Yang authored
-
Jeffrey Morgan authored
-
- 20 Nov, 2023 3 commits
-
-
Michael Yang authored
-
Purinda Gunasekara authored
-
Jeffrey Morgan authored
-
- 19 Nov, 2023 2 commits
-
-
Jeffrey Morgan authored
-
Bruce MacDonald authored
-
- 17 Nov, 2023 1 commit
-
-
Jeffrey Morgan authored
-
- 10 Nov, 2023 1 commit
-
-
Jeffrey Morgan authored
* add `"format": "json"` as an API parameter
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
-
- 09 Nov, 2023 2 commits
-
-
Bruce MacDonald authored
-
Michael Yang authored
Instead of a static number of parameters for each model family, get the real number from the tensors (#1022) * parse tensor info * refactor decoder * return actual parameter count * explicit rounding * s/Human/HumanNumber/
-
- 04 Nov, 2023 1 commit
-
-
Jeffrey Morgan authored
-
- 03 Nov, 2023 1 commit
-
-
Jeffrey Morgan authored
-
- 02 Nov, 2023 1 commit
-
-
Jeffrey Morgan authored
-
- 31 Oct, 2023 1 commit
-
-
Michael Yang authored
-
- 27 Oct, 2023 2 commits
-
-
Jeffrey Morgan authored
-
Bruce MacDonald authored
-