- 21 Nov, 2024 17 commits
-
-
Marcin Szczygliński authored
-
Michael authored
-
Jakub Burkiewicz authored
-
Dezoito authored
-
Franco Lombardo authored
-
Aarushi authored
-
Kevin Brake authored
-
chyok authored
-
Nico authored
-
Laurent Eschenauer authored
-
Andy Gill authored
Haverscript uses classical functional programming techniques to provide a composable interface for interacting with ollama-hosted LLMs.
-
drunkwcodes authored
-
boessu authored
-
奶茶叔叔 authored
-
Alexander F. Rødseth authored
-
Nikita Ganzikov authored
-
Daniel Hiltgen authored
-
- 20 Nov, 2024 14 commits
-
-
Jesse Gross authored
Previous versions of the runner would truncate inputs to the context window before beginning processing. The main processing loop relied on this behavior if the context needed to be shifted later (due to token generation). If truncation did not occur then invariants would be broken, causing crashes or infinite loops. Later versions attempted to fix these bugs and make the logic less subtle so that all inputs could be handled. Truncation was removed to make things consistent. However, truncation is much faster than processing and shifting, so removing it caused performance problems when the input vastly exceeded the context size. This restores the input truncation as a performance optimization while keeping the more robust processing logic. Fixes #7762
-
Jesse Gross authored
We need to track which tokens are in the cache ourselves. We currently add tokens to the cache tracker when we add them to the batch, but they are not actually in the cache until we call Decode. This can cause confusion when we are shifting the cache. Avoids "could not find a KV slot for the batch" issues. Bug #7545
-
Jesse Gross authored
We try to recover from errors by dropping the tokens that caused the problem and re-trying. However, dropping the tokens is not correct and continuing often leads to infinite loops. To avoid this, we end the sequence if such a condition is detected, which is also surprising. At this point, it is better to just report the error. This will make it easier to find problems and the alternatives are perhaps even more surprising to users. This is not a very satisfactory solution either - we should isolate the error and return it to the user without killing the whole process. However, this is an incremental step and consistent with most other failures (which either manifest as abort() or panic).
-
Jesse Gross authored
Fragmentation of the KV cache can occur due to cache shifting or different sequences getting processed. Decode uses a heuristic to decide if it should defrag. However, this heuristic isn't 100% accurate, so decoding can sometimes fail by surprise. For these cases, if decode indicates that there is no KV cache space, we should defrag and then try again.
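The defrag-and-retry pattern described above might look roughly like this. The `decoder` interface, the `errNoKVSlot` sentinel, and `fakeCache` are all hypothetical stand-ins for illustration, not the runner's API.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoKVSlot is a hypothetical sentinel for "no KV cache space".
var errNoKVSlot = errors.New("could not find a KV slot for the batch")

// decoder is a hypothetical interface standing in for the real context.
type decoder interface {
	Decode() error
	Defrag()
}

// decodeWithDefrag tries Decode once; if it fails for lack of cache
// space, it defragments and retries exactly once.
func decodeWithDefrag(d decoder) error {
	err := d.Decode()
	if errors.Is(err, errNoKVSlot) {
		d.Defrag()
		err = d.Decode()
	}
	return err
}

// fakeCache simulates a cache that fails until it has been defragmented.
type fakeCache struct {
	fragmented bool
	defrags    int
}

func (f *fakeCache) Decode() error {
	if f.fragmented {
		return errNoKVSlot
	}
	return nil
}

func (f *fakeCache) Defrag() {
	f.fragmented = false
	f.defrags++
}

func main() {
	c := &fakeCache{fragmented: true}
	fmt.Println(decodeWithDefrag(c), c.defrags) // prints <nil> 1
}
```

Retrying only once keeps the failure mode bounded: if decoding still fails after a defrag, the error is returned rather than looping.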
-
Jesse Gross authored
This doesn't have any impact currently because NUM_PARALLEL is forced to 1 for embeddings, so both indices will always be 0.
-
Emir Sahin authored
-
Marcus Ziadé authored
-
thewh1teagle authored
-
Adarsh Mishra authored
-
rohitanshu authored
change 'containg' to 'containing'
-
Gordon Kamer authored
-
Jonathan Hecl authored
-
Daniel Hiltgen authored
Many model crashes are masked behind "An existing connection was forcibly closed by the remote host". This captures that common error message and wires in any detected errors from the log. This also adds the deepseek context shift error to the known errors we capture.
-
Daniel Hiltgen authored
Avoid a round-trip asking users for logs to see what went wrong.
-
- 19 Nov, 2024 5 commits
-
-
Gabe Goodhart authored
https://github.com/ollama/ollama/issues/7656 Branch: Granite3StoppingBug-7656 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
-
Blake Mizerany authored
This change allows mixed-case model names to be pushed, pulled, copied, and created. This was previously disallowed because the Ollama registry was backed by a Docker registry that enforced a naming convention prohibiting mixed-case names; that is no longer the case. This does not break existing, intended behaviors. Also, make TestCase test a story of creating, updating, pulling, and copying a model with case variations, ensuring the model's manifest is updated correctly and not duplicated across different files with different case variations.
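One way to avoid duplicate manifests for names that differ only in case is a case-insensitive lookup before writing. The sketch below assumes this approach; the commit message does not say how the change is implemented, and `findExisting` and the example names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// findExisting is a hypothetical helper: it returns the stored name that
// matches name when case is ignored, so callers can update that entry
// instead of creating a second one with different casing.
func findExisting(existing []string, name string) (string, bool) {
	for _, e := range existing {
		if strings.EqualFold(e, name) {
			return e, true
		}
	}
	return "", false
}

func main() {
	stored := []string{"library/MyModel:latest"}
	match, ok := findExisting(stored, "library/mymodel:latest")
	fmt.Println(match, ok) // prints library/MyModel:latest true
}
```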
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
Patrick Devine authored
-
Patrick Sy authored
-
- 18 Nov, 2024 4 commits
-
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
Daniel Hiltgen authored
Enable both left and right click on the pop-up menu
-
Daniel Hiltgen authored
If the model doesn't fit any layers on Metal and we load zero layers, we would panic trying to look up the GPU size during scheduling ops.
-
Vinh Nguyen authored
-