- 12 Dec, 2023 (1 commit)
  - Patrick Devine authored

- 11 Dec, 2023 (1 commit)
  - Patrick Devine authored
    Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>

- 09 Dec, 2023 (1 commit)
  - Jeffrey Morgan authored

- 05 Dec, 2023 (3 commits)
  - Michael Yang authored
  - Bruce MacDonald authored
  - Jeffrey Morgan authored
    This reverts commit 7a0899d6.

- 04 Dec, 2023 (1 commit)
  - Bruce MacDonald authored
    - update chat docs
    - add messages chat endpoint
    - remove deprecated context and template generate parameters from docs
    - context and template are still supported for the time being and will continue to work as expected
    - add partial response to chat history
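
As a hedged illustration only (not part of the commit), a minimal Go sketch of calling the new messages chat endpoint; the default local server address, the model name, and the prompt are assumptions:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumed: a local Ollama server on its default port with a "llama2" model pulled.
	payload := []byte(`{
	  "model": "llama2",
	  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
	  "stream": false
	}`)
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // the reply carries a "message" object, mirroring the request's history format
}
```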

- 29 Nov, 2023 (1 commit)
  - Patrick Devine authored

- 15 Nov, 2023 (3 commits)
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored

- 10 Nov, 2023 (1 commit)
  - Jeffrey Morgan authored
    - add `"format": "json"` as an API parameter
    Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
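
A minimal sketch of the `"format": "json"` parameter on the generate endpoint, under the same local-server assumptions as above; model and prompt are illustrative:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// "format": "json" constrains the model's output to valid JSON.
	payload := []byte(`{
	  "model": "llama2",
	  "prompt": "List three primary colors as a JSON array.",
	  "format": "json",
	  "stream": false
	}`)
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```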

- 09 Nov, 2023 (1 commit)
  - Bruce MacDonald authored

- 08 Nov, 2023 (1 commit)
  - Bruce MacDonald authored
    - add the optional `raw` generate request parameter to bypass prompt formatting and response context
    - add raw request to docs
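
A sketch of the `raw` parameter, which skips the model's prompt template so the caller supplies the fully formatted prompt; the instruction tags below assume a llama-2-style chat model and are not from the commit:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// With "raw": true the prompt is passed through verbatim, so any
	// instruction formatting (here, llama-2-style [INST] tags) is up to the caller.
	payload := []byte(`{
	  "model": "llama2",
	  "prompt": "[INST] Why is the sky blue? [/INST]",
	  "raw": true,
	  "stream": false
	}`)
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```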

- 03 Nov, 2023 (2 commits)
  - Jeffrey Morgan authored
  - Jeffrey Morgan authored

- 02 Nov, 2023 (1 commit)
  - Michael Yang authored

- 19 Oct, 2023 (1 commit)
  - Bruce MacDonald authored
    - only reload the running llm if the model has changed, or the options for loading the running model have changed
    - rename loaded llm to runner to differentiate from loaded model image
    - remove logic which keeps the first system prompt in the generation context

- 13 Oct, 2023 (1 commit)
  - Bruce MacDonald authored
    - remove new lines from llama.cpp error messages relayed to client
    - check api option types and return error on wrong type
    - change num layers from 95% VRAM to 92% VRAM

- 12 Oct, 2023 (1 commit)
  - Bruce MacDonald authored

- 11 Oct, 2023 (1 commit)
  - Bruce MacDonald authored
    - update streaming request accept header
    - add optional stream param to request bodies
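
A sketch of the optional stream parameter: when it is omitted the server streams newline-delimited JSON chunks, while `"stream": false` returns a single object. Server address and model name are assumptions:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Default behavior (stream omitted): the response body is a stream of
	// newline-delimited JSON chunks; "stream": false would collapse it to one object.
	payload := []byte(`{"model": "llama2", "prompt": "Why is the sky blue?"}`)
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		fmt.Println(scanner.Text()) // one JSON chunk per line
	}
}
```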

- 05 Oct, 2023 (1 commit)
  - Bruce MacDonald authored

- 02 Oct, 2023 (1 commit)
  - Bruce MacDonald authored
    - include seed in params for llama.cpp server and remove empty filter for temp
    - relay default predict options to llama.cpp
    - reorganize options to match predict request for readability
    - omit empty stop
    Co-authored-by: hallh <hallh@users.noreply.github.com>

- 29 Sep, 2023 (1 commit)
  - Bruce MacDonald authored

- 28 Sep, 2023 (1 commit)
  - Michael Yang authored

- 12 Sep, 2023 (1 commit)
  - Bruce MacDonald authored
    - linux gpu support
    - handle multiple gpus
    - add cuda docker image (#488)
    Co-authored-by: Michael Yang <mxyng@pm.me>

- 06 Sep, 2023 (1 commit)
  - Patrick Devine authored

- 31 Aug, 2023 (1 commit)
  - Michael Yang authored

- 30 Aug, 2023 (1 commit)
  - Bruce MacDonald authored
    - remove c code
    - pack llama.cpp
    - use request context for llama_cpp
    - let llama_cpp decide the number of threads to use
    - stop llama runner when app stops
    - remove sample count and duration metrics
    - use go generate to get libraries
    - tmp dir for running llm

- 29 Aug, 2023 (1 commit)
  - Patrick Devine authored

- 17 Aug, 2023 (1 commit)
  - Michael Yang authored

- 10 Aug, 2023 (4 commits)
  - Michael Yang authored
  - Michael Yang authored
  - Patrick Devine authored
  - Bruce MacDonald authored
    Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>

- 08 Aug, 2023 (2 commits)
  - Bruce MacDonald authored
    - default to embeddings enabled
    - move embedding logic for loaded model to request
    - allow embedding full directory
    - close llm on reload
    (a usage sketch of the embeddings endpoint follows this group)
  - Jeffrey Morgan authored
    Fixes #297 and #296.
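
For orientation on the embeddings commit above, a minimal Go sketch of requesting an embedding; the endpoint shape (POST /api/embeddings with a prompt field), server address, and model name are assumptions, not details from the commit:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumed endpoint shape: POST /api/embeddings with "model" and "prompt".
	payload := []byte(`{"model": "llama2", "prompt": "Llamas are members of the camelid family."}`)
	resp, err := http.Post("http://localhost:11434/api/embeddings", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // expected shape: {"embedding": [ ...floats... ]}
}
```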

- 07 Aug, 2023 (1 commit)
  - Michael Yang authored
    num_keep defines how many tokens to keep in the context when truncating inputs. If left at its default value of -1, the server calculates num_keep so that the system instructions are kept when the input is truncated.
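
A sketch of overriding num_keep explicitly through the options object rather than relying on the -1 default; the value 24, model, and prompt are arbitrary illustrations:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// "num_keep": 24 pins the first 24 tokens of the prompt when the
	// context window fills and older tokens are truncated.
	payload := []byte(`{
	  "model": "llama2",
	  "prompt": "Why is the sky blue?",
	  "options": {"num_keep": 24},
	  "stream": false
	}`)
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```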

- 04 Aug, 2023 (1 commit)
  - Michael Yang authored

- 01 Aug, 2023 (1 commit)
  - Bruce MacDonald authored
    - read runner options from map to see what was specified explicitly and overwrite zero values