- 16 Jul, 2024 7 commits
-
-
Michael Yang authored
this change is triggered by the presence of "suffix", particularly useful for code completion tasks
-
royjhan authored
* OpenAI v1 models * Empty List Testing * Add back envconfig * v1/models docs * Remove Docs * OpenAI batch embed compatibility * merge conflicts * integrate with api/embed * ep * merge conflicts * request tests * rm resp test * merge conflict * merge conflict * test fixes * test fn renaming * input validation for empty string --------- Co-authored-by:jmorganca <jmorganca@gmail.com>
-
Michael Yang authored
-
Jeffrey Morgan authored
-
Michael Yang authored
-
Jeffrey Morgan authored
* server: return empty slice on empty `/api/embed` request * fix tests
-
Michael Yang authored
-
- 15 Jul, 2024 2 commits
-
-
Michael Yang authored
-
royjhan authored
* Initial Batch Embedding * Revert "Initial Batch Embedding" This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29. * Initial Draft * mock up notes * api/embed draft * add server function * check normalization * clean up * normalization * playing around with truncate stuff * Truncation * Truncation * move normalization to go * Integration Test Template * Truncation Integration Tests * Clean up * use float32 * move normalize * move normalize test * refactoring * integration float32 * input handling and handler testing * Refactoring of legacy and new * clear comments * merge conflicts * touches * embedding type 64 * merge conflicts * fix hanging on single string * refactoring * test values * set context length * clean up * testing clean up * testing clean up * remove function closure * Revert "remove function closure" This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787. * remove function closure * remove redundant error check * clean up * more clean up * clean up
-
- 14 Jul, 2024 1 commit
-
-
Patrick Devine authored
-
- 13 Jul, 2024 3 commits
-
-
jmorganca authored
-
Jeffrey Morgan authored
* server: fix `contet`, `load_duration` and `total_duration` fields * Update server/routes.go
-
Michael Yang authored
* fix system prompt * execute template when hitting previous roles * fix tests --------- Co-authored-by:jmorganca <jmorganca@gmail.com>
-
- 11 Jul, 2024 3 commits
-
-
Michael Yang authored
This reverts commit 19753c18. for compat. messages will be added at a later date
-
Jeffrey Morgan authored
-
Michael Yang authored
-
- 09 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL` * remove whitespace change * undo some changes
-
- 07 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 05 Jul, 2024 4 commits
-
-
Michael Yang authored
-
Michael Yang authored
ensure runtime model changes (template, system prompt, messages, options) are captured on model updates without needing to reload the server
-
Michael Yang authored
-
Michael Yang authored
-
- 03 Jul, 2024 3 commits
-
-
Anatoli Babenia authored
* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org> Co-authored-by:Maas Lalani <maas@lalani.dev>
-
Daniel Hiltgen authored
This change fixes the handling of keep_alive so that if client request omits the setting, we only set this on initial load. Once the model is loaded, if new requests leave this unset, we'll keep whatever keep_alive was there.
-
Daniel Hiltgen authored
Users may not realize the siny new model they're trying to load fits on their disk, but can't load into system+GPU memory. Today we crash, but with this fix, we'll give them a better error message before even trying to load it.
-
- 02 Jul, 2024 3 commits
-
-
Michael Yang authored
-
royjhan authored
* OpenAI v1 models * Refactor Writers * Add Test Co-Authored-By: Attila Kerekes * Credit Co-Author Co-Authored-By:
Attila Kerekes <439392+keriati@users.noreply.github.com> * Empty List Testing * Use Namespace for Ownedby * Update Test * Add back envconfig * v1/models docs * Use ModelName Parser * Test Names * Remove Docs * Clean Up * Test name Co-authored-by:
Jeffrey Morgan <jmorganca@gmail.com> * Add Middleware for Chat and List * Completions Endpoint * Testing Cleanup * Test with Fatal * Add functionality to chat test * Rename function * float types * type cleanup * cleaning * more cleaning * Extra test cases * merge conflicts * merge conflicts * merge conflicts * merge conflicts * cleaning * cleaning --------- Co-authored-by:
Attila Kerekes <439392+keriati@users.noreply.github.com> Co-authored-by:
Jeffrey Morgan <jmorganca@gmail.com>
-
royjhan authored
* OpenAI v1 models * Refactor Writers * Add Test Co-Authored-By: Attila Kerekes * Credit Co-Author Co-Authored-By:
Attila Kerekes <439392+keriati@users.noreply.github.com> * Empty List Testing * Use Namespace for Ownedby * Update Test * Add back envconfig * v1/models docs * Use ModelName Parser * Test Names * Remove Docs * Clean Up * Test name Co-authored-by:
Jeffrey Morgan <jmorganca@gmail.com> * Add Middleware for Chat and List * Testing Cleanup * Test with Fatal * Add functionality to chat test * OpenAI: /v1/models/{model} compatibility (#5028) * Retrieve Model * OpenAI Delete Model * Retrieve Middleware * Remove Delete from Branch * Update Test * Middleware Test File * Function name * Cleanup * Test Update * Test Update --------- Co-authored-by:
Attila Kerekes <439392+keriati@users.noreply.github.com> Co-authored-by:
Jeffrey Morgan <jmorganca@gmail.com>
-
- 01 Jul, 2024 6 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Daniel Hiltgen authored
-
- 27 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 25 Jun, 2024 1 commit
-
-
Blake Mizerany authored
Previously, some costly things were causing the loading of GGUF files and their metadata and tensor information to be VERY slow: * Too many allocations when decoding strings * Hitting disk for each read of each key and value, resulting in a not-okay amount of syscalls/disk I/O. The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro m3. This commit also prevents collecting large arrays of values when decoding GGUFs (if desired). When such keys are encountered, their values are null, and are encoded as such in JSON. Also, this fixes a broken test that was not encoding valid GGUF.
-
- 21 Jun, 2024 4 commits
-
-
Daniel Hiltgen authored
Provide consistent ordering for the ps command - longest duration listed first
-
Daniel Hiltgen authored
Until ROCm v6.2 ships, we wont be able to get accurate free memory reporting on windows, which makes automatic concurrency too risky. Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs have accurate VRAM reporting wired up now, so we can turn on concurrency by default.
-
Daniel Hiltgen authored
This adjusts our default settings to enable multiple models and parallel requests to a single model. Users can still override these by the same env var settings as before. Parallel has a direct impact on num_ctx, which in turn can have a significant impact on small VRAM GPUs so this change also refines the algorithm so that when parallel is not explicitly set by the user, we try to find a reasonable default that fits the model on their GPU(s). As before, multiple models will only load concurrently if they fully fit in VRAM.
-
Michael Yang authored
-