- 06 Oct, 2023 1 commit
-
Bruce MacDonald authored
this makes it easier to see that the subprocess is associated with ollama
-
- 05 Oct, 2023 1 commit
-
Bruce MacDonald authored
-
- 04 Oct, 2023 1 commit
-
Bruce MacDonald authored
-
- 03 Oct, 2023 1 commit
-
Michael Yang authored
-
- 02 Oct, 2023 2 commits
-
Bruce MacDonald authored
-
Bruce MacDonald authored
* include seed in params for llama.cpp server and remove empty filter for temp
* relay default predict options to llama.cpp
  - reorganize options to match predict request for readability
* omit empty stop

---------

Co-authored-by: hallh <hallh@users.noreply.github.com>
-
- 29 Sep, 2023 1 commit
-
Bruce MacDonald authored
-
- 28 Sep, 2023 1 commit
-
Michael Yang authored
-
- 25 Sep, 2023 1 commit
-
Bruce MacDonald authored
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 21 Sep, 2023 3 commits
-
Michael Yang authored
-
Michael Yang authored
-
Bruce MacDonald authored
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir
* check file exists before creating

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
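A minimal sketch of what the temp-dir cleanup described in this commit could look like, assuming a single top-level ollama directory under the system temp dir; all names and paths here are illustrative, not the actual server code:

```go
// Hypothetical sketch only: remove run directories left behind by previous
// servers, keeping everything under one top-level temp dir.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cleanupStaleRunDirs removes every entry under $TMPDIR/ollama except the
// directory used by the current server.
func cleanupStaleRunDirs(current string) error {
	root := filepath.Join(os.TempDir(), "ollama")

	entries, err := os.ReadDir(root)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing to clean up
		}
		return err
	}

	for _, e := range entries {
		path := filepath.Join(root, e.Name())
		if path == current {
			continue // skip the directory in use by this process
		}
		if err := os.RemoveAll(path); err != nil {
			fmt.Fprintln(os.Stderr, "cleanup:", err)
		}
	}
	return nil
}

func main() {
	if err := cleanupStaleRunDirs(""); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```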
-
- 20 Sep, 2023 6 commits
-
Michael Yang authored
-
Michael Yang authored
-
Bruce MacDonald authored
-
Bruce MacDonald authored
-
Bruce MacDonald authored
-
Bruce MacDonald authored
-
- 18 Sep, 2023 1 commit
-
Bruce MacDonald authored
* subprocess improvements
  - increase start-up timeout
  - when a runner fails to start, fail rather than timing out
  - try runners in order rather than choosing 1 runner
  - embed metal runner in metal dir rather than gpu
  - refactor logging and error messages
* Update llama.go
* Update llama.go
* simplify by using glob
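As a rough illustration of the "try runners in order" change, here is a hypothetical Go sketch; the function, binary names, and arguments are made up for illustration and are not the actual llama.go code:

```go
// Hypothetical sketch only: start the first runner binary that launches
// successfully instead of committing to a single runner up front.
package main

import (
	"fmt"
	"os/exec"
)

func startFirstRunner(candidates []string, args ...string) (*exec.Cmd, error) {
	var lastErr error
	for _, bin := range candidates {
		cmd := exec.Command(bin, args...)
		if err := cmd.Start(); err != nil {
			// This runner failed to start: remember the error and move on
			// to the next candidate instead of waiting for a timeout.
			lastErr = err
			continue
		}
		return cmd, nil
	}
	return nil, fmt.Errorf("no runner could be started: %w", lastErr)
}

func main() {
	cmd, err := startFirstRunner([]string{"./runner-metal", "./runner-cpu"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("started runner, pid", cmd.Process.Pid)
}
```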
-
- 14 Sep, 2023 1 commit
-
Bruce MacDonald authored
* enable packaging multiple cuda versions
* use nvcc cuda version if available

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 13 Sep, 2023 1 commit
-
Michael Yang authored
-
- 12 Sep, 2023 4 commits
-
Michael Yang authored
-
Bruce MacDonald authored
-
Michael Yang authored
get model and file type from bin file
-
Bruce MacDonald authored
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488)

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 07 Sep, 2023 1 commit
-
Bruce MacDonald authored
-
- 06 Sep, 2023 3 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 05 Sep, 2023 3 commits
-
Bruce MacDonald authored
-
Michael Yang authored
-
Jeffrey Morgan authored
-
- 03 Sep, 2023 2 commits
-
Michael Yang authored
-
Michael Yang authored
-
- 30 Aug, 2023 3 commits
-
Bruce MacDonald authored
-
Bruce MacDonald authored
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
-
Quinn Slack authored
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list. Fixes https://github.com/jmorganca/ollama/issues/295.
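To illustrate the substring behavior described above, here is a minimal hypothetical Go sketch; the function and names are illustrative and not taken from the ollama codebase:

```go
// Hypothetical sketch only: treat "stop" entries as plain substrings of the
// generated text rather than exact LLM tokens, and trim at the first match.
package main

import (
	"fmt"
	"strings"
)

// truncateAtStop cuts generated text at the earliest occurrence of any stop
// sequence and reports whether a stop sequence was found.
func truncateAtStop(generated string, stop []string) (string, bool) {
	cut := -1
	for _, s := range stop {
		if s == "" {
			continue
		}
		if i := strings.Index(generated, s); i >= 0 && (cut == -1 || i < cut) {
			cut = i
		}
	}
	if cut == -1 {
		return generated, false
	}
	return generated[:cut], true
}

func main() {
	// A stop of "\n" fires even though the model may emit the newline as
	// part of a larger token such as "line\n".
	out, stopped := truncateAtStop("first line\nsecond line", []string{"\n"})
	fmt.Printf("%q stopped=%v\n", out, stopped) // "first line" stopped=true
}
```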
-
- 26 Aug, 2023 3 commits
-
Michael Yang authored
warning: F16 uses significantly more memory than a quantized model, so the standard requirements don't apply.
-
Michael Yang authored
-
Jeffrey Morgan authored
-