- 19 Dec, 2023 (2 commits)

Daniel Hiltgen authored
Run server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.
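As a rough sketch of what calling C++ inference code from the Go runtime via cgo can look like (illustrative only: the `server_start` entry point and build setup are assumptions, not the actual Ollama bindings, and the snippet will not link without the corresponding C++ object code):

```go
package llm

/*
// Hypothetical preamble: the real change compiles server.cpp and its llama.cpp
// dependencies into the binary; this only shows the cgo calling shape.
#include <stdlib.h>
extern int server_start(const char* model_path);
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// startServer hands a model path to the embedded C++ server and reports
// failure via the C return code.
func startServer(modelPath string) error {
	cPath := C.CString(modelPath)
	defer C.free(unsafe.Pointer(cPath))
	if rc := C.server_start(cPath); rc != 0 {
		return fmt.Errorf("server_start failed with code %d", int(rc))
	}
	return nil
}
```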

Bruce MacDonald authored
- remove the ggml runner
- automatically pull gguf models when ggml is detected
- tell users to update to gguf if the automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>

- 18 Dec, 2023 (3 commits)

Jeffrey Morgan authored

Jeffrey Morgan authored

Jeffrey Morgan authored

- 14 Dec, 2023 (1 commit)

Bruce MacDonald authored
* restore model load duration on generate response
  - set model load duration on the generate and chat done responses
  - calculate the createdAt time when the response is created
* remove checkpoints predict opts
* Update routes.go
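A minimal sketch of the timing idea, assuming illustrative types and a hypothetical `loadModel` helper rather than the actual Ollama route handlers:

```go
package llm

import "time"

// GenerateResponse sketches only the fields relevant here; real response types differ.
type GenerateResponse struct {
	CreatedAt    time.Time     `json:"created_at"`
	Done         bool          `json:"done"`
	LoadDuration time.Duration `json:"load_duration,omitempty"`
}

// loadModel is a stub standing in for the actual model loading path.
func loadModel(path string) error { return nil }

func generate(modelPath string) (*GenerateResponse, error) {
	loadStart := time.Now()
	if err := loadModel(modelPath); err != nil {
		return nil, err
	}
	// ... run the prediction ...
	return &GenerateResponse{
		CreatedAt:    time.Now(),             // stamped when the response is created
		Done:         true,
		LoadDuration: time.Since(loadStart), // load duration reported on the done response
	}, nil
}
```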

- 13 Dec, 2023 (1 commit)

Jeffrey Morgan authored

- 12 Dec, 2023 (2 commits)

Bruce MacDonald authored

Bruce MacDonald authored
- remove parallel

- 11 Dec, 2023 (2 commits)

Patrick Devine authored
Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>

Michael Yang authored
Mostly replaced by decoding tensors, except for ggml models, which only support llama.

- 10 Dec, 2023 (4 commits)

Jeffrey Morgan authored

Jeffrey Morgan authored

Jeffrey Morgan authored

Jeffrey Morgan authored

- 09 Dec, 2023 (1 commit)

Bruce MacDonald authored
* fix: queued request failures
  - increase parallel requests to 2 so queued requests can complete; queueing is managed in ollama
* log stream errors

- 05 Dec, 2023 (7 commits)

Michael Yang authored

Bruce MacDonald authored

Jeffrey Morgan authored
This reverts commit 7a0899d6.

Michael Yang authored

Michael Yang authored

Michael Yang authored

Michael Yang authored

- 04 Dec, 2023 (2 commits)

Bruce MacDonald authored
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
  - context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
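For context, a request to a messages-based chat endpoint looks roughly like the sketch below (assumes a local server on Ollama's default port 11434; the model name is only an example):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Illustrative chat request built around the messages field.
	body, _ := json.Marshal(map[string]any{
		"model": "llama2", // example model name
		"messages": []map[string]string{
			{"role": "user", "content": "Why is the sky blue?"},
		},
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // streamed responses arrive as one JSON object per line
}
```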

Michael Yang authored

- 26 Nov, 2023 (2 commits)

Jeffrey Morgan authored

Jeffrey Morgan authored

- 24 Nov, 2023 (2 commits)

Jing Zhang authored
* Support CUDA builds on Windows
* Enable dynamic NumGPU allocation for Windows

Jongwook Choi authored
When CUDA peer access is enabled, multi-GPU inference produces garbage output. This is a known bug in llama.cpp (or NVIDIA). Until the upstream bug is fixed, we can disable CUDA peer access temporarily to ensure correct output. See #961.

- 22 Nov, 2023 (2 commits)

Jeffrey Morgan authored

Michael Yang authored

- 21 Nov, 2023 (2 commits)

Michael Yang authored

Jeffrey Morgan authored

- 20 Nov, 2023 (3 commits)

Michael Yang authored

Purinda Gunasekara authored

Jeffrey Morgan authored

- 19 Nov, 2023 (2 commits)

Jeffrey Morgan authored

Bruce MacDonald authored

- 17 Nov, 2023 (1 commit)

Jeffrey Morgan authored

- 10 Nov, 2023 (1 commit)

Jeffrey Morgan authored
* add `"format": "json"` as an API parameter
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>