1. 11 Sep, 2024 1 commit
    • runner: Flush pending responses before returning · 93ac3760
      Jesse Gross authored
      If there are any pending responses (such as from potential stop
      tokens), then we should send them back before ending the sequence.
      Otherwise, tokens can be missing from the end of a response.
      
      Fixes #6707
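      A minimal sketch of the behavior described above, assuming a hypothetical Sequence type with a pendingResponses buffer (the runner's real types and field names may differ): anything buffered while checking for stop tokens is sent to the client before the sequence ends.

        package runner

        // Sequence is a pared-down stand-in for the runner's per-request state;
        // the field names here are illustrative only.
        type Sequence struct {
            pendingResponses []string      // responses held back while checking stop tokens
            quit             chan struct{} // closed when the client disconnects
        }

        // flushPending sends any buffered responses before the sequence ends,
        // so trailing tokens are not dropped.
        func flushPending(seq *Sequence, responses chan<- string) bool {
            for _, r := range seq.pendingResponses {
                select {
                case responses <- r:
                case <-seq.quit:
                    return false // client went away; drop the rest
                }
            }
            seq.pendingResponses = seq.pendingResponses[:0]
            return true
        }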
  2. 04 Sep, 2024 1 commit
  3. 03 Sep, 2024 1 commit
    • Fix sprintf to snprintf (#5664) · 94fff580
      FellowTraveler authored
      /Users/au/src/ollama/llm/ext_server/server.cpp:289:9: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.
  4. 22 Aug, 2024 1 commit
    • Fix embeddings memory corruption (#6467) · 90ca8417
      Daniel Hiltgen authored
      * Fix embeddings memory corruption
      
      The patch was leading to a buffer overrun corruption. Once it was removed, though,
      parallelism in server.cpp led to hitting an assert due to slot/seq IDs being >= the
      token count. To work around this, only use slot 0 for embeddings.
      
      * Fix embed integration test assumption
      
      The token eval count has changed with recent llama.cpp bumps (0.3.5+)
  5. 11 Aug, 2024 1 commit
  6. 06 Aug, 2024 1 commit
  7. 05 Aug, 2024 1 commit
  8. 30 Jul, 2024 1 commit
  9. 29 Jul, 2024 1 commit
  10. 22 Jul, 2024 1 commit
    • Enable windows error dialog for subprocess startup · e12fff88
      Daniel Hiltgen authored
      Make sure that if something goes wrong spawning the process, the user gets
      enough information to try to self-correct, or at least file a bug with details
      so we can fix it. Once the process starts, we immediately change back to the
      recommended setting to prevent the blocking dialog. This ensures that if the
      model fails to load (OOM, unsupported model type, etc.) the process exits
      quickly and we can scan the subprocess's stdout/stderr for the reason to
      report via the API.
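      A sketch of the toggle described above, using SetErrorMode from kernel32; the helper name and surrounding structure are illustrative, not the project's actual code. Clearing the error mode before spawning lets Windows show a diagnostic dialog if the subprocess cannot start; restoring the quiet mode right afterwards keeps later failures non-blocking.

        //go:build windows

        package llm

        import (
            "os/exec"

            "golang.org/x/sys/windows"
        )

        const semFailCriticalErrors = 0x0001 // SEM_FAILCRITICALERRORS

        var (
            kernel32         = windows.NewLazySystemDLL("kernel32.dll")
            procSetErrorMode = kernel32.NewProc("SetErrorMode")
        )

        // startWithErrorDialog clears the process error mode so startup failures
        // surface as a visible dialog, then restores the recommended quiet mode
        // as soon as the subprocess has been launched.
        func startWithErrorDialog(cmd *exec.Cmd) error {
            prev, _, _ := procSetErrorMode.Call(0) // 0 = allow error dialogs
            err := cmd.Start()
            procSetErrorMode.Call(prev | semFailCriticalErrors) // back to quiet mode
            return err
        }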
  11. 15 Jul, 2024 1 commit
    • Introduce `/api/embed` endpoint supporting batch embedding (#5127) · b9f5e16c
      royjhan authored
      * Initial Batch Embedding
      * Revert "Initial Batch Embedding" (reverts commit c22d54895a280b54c727279d85a5fc94defb5a29)
      * Initial Draft
      * mock up notes
      * api/embed draft
      * add server function
      * check normalization
      * clean up
      * normalization
      * playing around with truncate stuff
      * Truncation
      * Truncation
      * move normalization to go
      * Integration Test Template
      * Truncation Integration Tests
      * Clean up
      * use float32
      * move normalize
      * move normalize test
      * refactoring
      * integration float32
      * input handling and handler testing
      * Refactoring of legacy and new
      * clear comments
      * merge conflicts
      * touches
      * embedding type 64
      * merge conflicts
      * fix hanging on single string
      * refactoring
      * test values
      * set context length
      * clean up
      * testing clean up
      * testing clean up
      * remove function closure
      * Revert "remove function closure" (reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787)
      * remove function closure
      * remove redundant error check
      * clean up
      * more clean up
      * clean up
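      One piece of this change that is easy to show in isolation is the normalization the PR moves to the Go side: each returned embedding is scaled to unit (L2) length. The function below is an illustrative sketch, not the exact code from the PR.

        package api

        import "math"

        // normalize scales an embedding to unit L2 length; a zero vector is
        // returned as all zeros to avoid dividing by zero.
        func normalize(vec []float32) []float32 {
            var sum float64
            for _, v := range vec {
                sum += float64(v) * float64(v)
            }
            out := make([]float32, len(vec))
            if sum == 0 {
                return out
            }
            inv := float32(1 / math.Sqrt(sum))
            for i, v := range vec {
                out[i] = v * inv
            }
            return out
        }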
  12. 07 Jul, 2024 2 commits
  13. 05 Jul, 2024 1 commit
  14. 03 Jul, 2024 1 commit
  15. 29 Jun, 2024 1 commit
  16. 19 Jun, 2024 1 commit
  17. 14 Jun, 2024 1 commit
  18. 11 Jun, 2024 1 commit
  19. 09 Jun, 2024 1 commit
  20. 01 Jun, 2024 1 commit
  21. 29 May, 2024 3 commits
  22. 23 May, 2024 2 commits
  23. 20 May, 2024 1 commit
    • feat: add support for flash_attn (#4120) · e15307fd
      Sam authored
      * feat: enable flash attention if supported
      * feat: enable flash attention if supported
      * feat: enable flash attention if supported
      * feat: add flash_attn support
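      A hedged sketch of how such a switch could be wired up on the Go side: append a flash-attention flag to the llama.cpp server invocation only when the caller has determined it is supported. The flag name and helper are assumptions for illustration, not the project's actual code.

        package llm

        import "os/exec"

        // buildServerCmd shows one way to opt into flash attention when spawning
        // the llama.cpp server; it assumes the binary accepts a --flash-attn flag.
        func buildServerCmd(binary, modelPath string, flashAttn bool) *exec.Cmd {
            args := []string{"--model", modelPath}
            if flashAttn { // e.g. GPU and model both support it
                args = append(args, "--flash-attn")
            }
            return exec.Command(binary, args...)
        }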
  24. 09 May, 2024 1 commit
  25. 04 May, 2024 1 commit
  26. 30 Apr, 2024 3 commits
  27. 17 Apr, 2024 1 commit
  28. 16 Apr, 2024 1 commit
  29. 01 Apr, 2024 2 commits
    • Apply 01-cache.diff · 0a0e9f3e
      Daniel Hiltgen authored
    • Switch back to subprocessing for llama.cpp · 58d95cc9
      Daniel Hiltgen authored
      This should resolve a number of memory-leak and stability defects by allowing
      us to isolate llama.cpp in a separate process, shut it down when idle, and
      gracefully restart it if it has problems. This also serves as a first step
      toward running multiple copies to support multiple models concurrently.
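      A simplified sketch of the subprocess model described above (names and structure are illustrative, not the actual implementation): the llama.cpp server runs in its own process, is killed when the caller asks it to stop, and is restarted if it exits unexpectedly, so a crash in native code cannot take down the main process.

        package llm

        import (
            "log"
            "os/exec"
            "time"
        )

        // superviseServer keeps a llama.cpp server subprocess running until stop
        // is closed, restarting it with a short backoff if it exits on its own.
        func superviseServer(binary string, args []string, stop <-chan struct{}) {
            for {
                cmd := exec.Command(binary, args...)
                if err := cmd.Start(); err != nil {
                    log.Printf("failed to start %s: %v", binary, err)
                    return
                }
                done := make(chan error, 1)
                go func() { done <- cmd.Wait() }()
                select {
                case <-stop:
                    cmd.Process.Kill() // idle or shutting down
                    <-done             // reap the child before returning
                    return
                case err := <-done:
                    log.Printf("llama.cpp server exited: %v; restarting", err)
                    time.Sleep(time.Second) // brief backoff before restart
                }
            }
        }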
  30. 26 Mar, 2024 1 commit
  31. 23 Mar, 2024 1 commit
  32. 16 Mar, 2024 1 commit
  33. 12 Mar, 2024 1 commit