1. 08 Dec, 2025 1 commit
  2. 28 Oct, 2025 2 commits
  3. 27 Oct, 2025 1 commit
      server: Consolidate embedding truncation in runner (#12730) · 5d347f6d
      nicole pardal authored
      Currently, checking the length of prompts for embeddings to ensure
      they fit in the context window (and possibly truncating them) occurs
      in two places: the Ollama server and the runner. This can lead to
      inconsistencies in both the checks and the reported number of tokens
      processed. Since we have to do this processing in the runner, this
      consolidates all of the logic there.
  4. 20 Oct, 2025 1 commit
  5. 18 Sep, 2025 1 commit
  6. 09 Sep, 2025 1 commit
      tests: reduce stress on CPU to 2 models (#12161) · 67451828
      Daniel Hiltgen authored
      * tests: reduce stress on CPU to 2 models
      
      This should avoid flakes due to systems getting overloaded with 3 (or more) models running concurrently.
      
      * tests: allow slow systems to pass on timeout
      
      If a slow system is still streaming a response, and the response
      will pass validation, don't fail just because the system is slow.
      
      * test: unload embedding models more quickly
  7. 29 Apr, 2025 1 commit
  8. 22 Oct, 2024 1 commit
  9. 22 Aug, 2024 1 commit
      Fix embeddings memory corruption (#6467) · 90ca8417
      Daniel Hiltgen authored
      * Fix embeddings memory corruption
      
      The patch was leading to a buffer overrun corruption.  Once removed though, parallelism
      in server.cpp led to hitting an assert due to slot/seq IDs being >= token count.  To
      work around this, only use slot 0 for embeddings.
      
      * Fix embed integration test assumption
      
      The token eval count has changed with recent llama.cpp bumps (0.3.5+).
  10. 30 Jul, 2024 1 commit
  11. 24 Jul, 2024 1 commit
  12. 15 Jul, 2024 1 commit
      Introduce `/api/embed` endpoint supporting batch embedding (#5127) · b9f5e16c
      royjhan authored
      * Initial Batch Embedding
      
      * Revert "Initial Batch Embedding"
      
      This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
      
      * Initial Draft
      
      * mock up notes
      
      * api/embed draft
      
      * add server function
      
      * check normalization
      
      * clean up
      
      * normalization
      
      * playing around with truncate stuff
      
      * Truncation
      
      * Truncation
      
      * move normalization to go
      
      * Integration Test Template
      
      * Truncation Integration Tests
      
      * Clean up
      
      * use float32
      
      * move normalize
      
      * move normalize test
      
      * refactoring
      
      * integration float32
      
      * input handling and handler testing
      
      * Refactoring of legacy and new
      
      * clear comments
      
      * merge conflicts
      
      * touches
      
      * embedding type 64
      
      * merge conflicts
      
      * fix hanging on single string
      
      * refactoring
      
      * test values
      
      * set context length
      
      * clean up
      
      * testing clean up
      
      * testing clean up
      
      * remove function closure
      
      * Revert "remove function closure"
      
      This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
      
      * remove function closure
      
      * remove redundant error check
      
      * clean up
      
      * more clean up
      
      * clean up