- 08 Dec, 2025 1 commit
nicole pardal authored
This PR consolidates all embedding prompt-length checking, truncation, and prompt token counting into the runner to ensure a single source of truth.
- 28 Oct, 2025 2 commits
Patrick Devine authored
Patrick Devine authored
This reverts commit 5d347f6d.
- 27 Oct, 2025 1 commit
nicole pardal authored
Currently, checking the length of embedding prompts to ensure they fit in the context window (and truncating them if necessary) happens in two places: the Ollama server and the runner. This can lead to inconsistencies in both the checks and the reported number of tokens processed. Since we have to do this processing in the runner anyway, this change consolidates all of the logic there.
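In sketch form, the consolidated runner-side check looks roughly like this; the function and parameter names are illustrative, not the actual Ollama runner API:

```go
package runner

import "fmt"

// truncatePrompt is an illustrative sketch of a single-source-of-truth
// check: the runner validates the tokenized prompt against the context
// window, optionally truncates it, and the resulting length is what
// gets reported as the prompt token count.
func truncatePrompt(tokens []int32, numCtx int, truncate bool) ([]int32, error) {
	if len(tokens) <= numCtx {
		return tokens, nil
	}
	if !truncate {
		return nil, fmt.Errorf("input length %d exceeds context length %d", len(tokens), numCtx)
	}
	return tokens[:numCtx], nil // reported token count is now numCtx
}
```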
- 20 Oct, 2025 1 commit
Jeffrey Morgan authored
- 18 Sep, 2025 1 commit
Michael Yang authored
- 09 Sep, 2025 1 commit
Daniel Hiltgen authored
* tests: reduce stress on CPU to 2 models. This should avoid flakes due to systems getting overloaded with 3 (or more) models running concurrently.
* tests: allow slow systems to pass on timeout. If a slow system is still streaming a response, and the response will pass validation, don't fail just because the system is slow.
* test: unload embedding models more quickly
- 29 Apr, 2025 1 commit
Daniel Hiltgen authored
The cleanup routine from InitServerConnection should run in the defer of the test case so that failures are properly detected and the server logs are reported.
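The pattern being fixed, sketched with a hypothetical startServer helper standing in for the real InitServerConnection:

```go
package integration

import (
	"context"
	"testing"
)

// startServer is a hypothetical stand-in for the real test helper; it
// returns a cleanup func that can observe the test outcome and dump
// server logs.
func startServer(ctx context.Context, t *testing.T) func() {
	return func() {
		if t.Failed() {
			t.Log("server logs would be reported here")
		}
	}
}

func TestServerCleanup(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Deferring the cleanup inside the test case (not inside a helper)
	// means it runs after the test body, sees any failure, and can
	// report the logs.
	cleanup := startServer(ctx, t)
	defer cleanup()

	// ... test body ...
}
```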
- 22 Oct, 2024 1 commit
Daniel Hiltgen authored
Use cosine similarity to make the embeddings tests more robust
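Cosine similarity compares the direction of two vectors rather than their exact components, so small numeric drift between runs or backends doesn't fail the test. A minimal sketch (names are illustrative):

```go
package integration

import "math"

// cosineSimilarity returns dot(a, b) / (|a| * |b|), which is 1.0 for
// vectors pointing in exactly the same direction. Tests can assert the
// result stays above a threshold (e.g. 0.99) instead of comparing
// components exactly.
func cosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}
```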
- 22 Aug, 2024 1 commit
Daniel Hiltgen authored
* Fix embeddings memory corruption. The patch was leading to a buffer overrun. Once removed, though, parallelism in server.cpp led to hitting an assert due to slot/seq IDs being >= token count. To work around this, only use slot 0 for embeddings.
* Fix embed integration test assumption. The token eval count has changed with recent llama.cpp bumps (0.3.5+).
- 30 Jul, 2024 1 commit
royjhan authored
* add prompt tokens to embed response
* rm slog
* metrics
* types
* prompt n
* clean up
* reset submodule
* update tests
* test name
* list metrics
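In effect, the /api/embed response gained a prompt token count alongside the embeddings. A sketch of the response shape (the field layout here is an approximation of Ollama's public API, not a verbatim copy):

```go
package api

import "time"

// EmbedResponse sketches the response after this change: in addition
// to the embeddings, the server reports how many prompt tokens were
// processed, plus basic duration metrics.
type EmbedResponse struct {
	Model           string        `json:"model"`
	Embeddings      [][]float32   `json:"embeddings"`
	TotalDuration   time.Duration `json:"total_duration,omitempty"`
	LoadDuration    time.Duration `json:"load_duration,omitempty"`
	PromptEvalCount int           `json:"prompt_eval_count,omitempty"`
}
```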
- 24 Jul, 2024 1 commit
royjhan authored
* float cmp
* increase tolerance
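The usual shape of such a fix is an approximate comparison instead of exact equality; a minimal sketch with an illustrative helper name:

```go
package integration

import "math"

// floatsMatch reports whether two embedding components agree within a
// tolerance, avoiding flaky exact comparisons of float32 results.
func floatsMatch(a, b float32, tolerance float64) bool {
	return math.Abs(float64(a)-float64(b)) <= tolerance
}
```

Increasing the tolerance then just means passing a larger value at the call site.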
- 15 Jul, 2024 1 commit
royjhan authored
* Initial Batch Embedding
* Revert "Initial Batch Embedding". This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure". This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
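Of these steps, "move normalization to go" is the most load-bearing: embeddings are L2-normalized server-side so they compare as unit vectors. A sketch of that normalization (the helper name is an assumption):

```go
package server

import "math"

// normalize scales an embedding so the sum of its squared components
// is 1 (L2 normalization); for unit vectors, a plain dot product
// equals cosine similarity.
func normalize(vec []float32) []float32 {
	var sum float64
	for _, v := range vec {
		sum += float64(v) * float64(v)
	}
	var norm float32
	if sum > 0 {
		norm = float32(1.0 / math.Sqrt(sum))
	}
	out := make([]float32, len(vec))
	for i, v := range vec {
		out[i] = v * norm
	}
	return out
}
```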