1. 11 Sep, 2024 1 commit
    • runner: Flush pending responses before returning · 93ac3760
      Jesse Gross authored
      If there are any pending responses (such as from potential stop
      tokens), we should send them back before ending the sequence.
      Otherwise, tokens at the end of a response can be missing.
      
      Fixes #6707
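The actual fix above is in Ollama's Go runner, but the idea is language-independent and can be sketched in C++ (all names here are illustrative, not the runner's real types): text held back while checking for stop tokens must be flushed to the output before the sequence ends, or trailing tokens are silently dropped.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the fix: a sequence buffers text in `pending`
// while it checks whether a stop token is forming. When generation ends,
// anything still pending must be sent before returning.
struct Sequence {
    std::string pending;              // held back while matching stop tokens
    std::vector<std::string> output;  // responses already sent to the client

    void send(const std::string &piece) { output.push_back(piece); }

    // Called when generation ends (EOS, length limit, ...).
    void finish() {
        if (!pending.empty()) {
            send(pending);  // the fix: flush pending text before returning
            pending.clear();
        }
    }
};
```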
  2. 10 Sep, 2024 1 commit
  3. 06 Sep, 2024 1 commit
  4. 05 Sep, 2024 2 commits
  5. 04 Sep, 2024 2 commits
  6. 03 Sep, 2024 2 commits
    • Log system memory at info (#6617) · 037a4d10
      Daniel Hiltgen authored
      On systems with low system memory, we can hit allocation failures that are difficult to diagnose
      without debug logs. Logging system memory at info level makes these failures easier to spot.
    • Fix sprintf to snprintf (#5664) · 94fff580
      FellowTraveler authored
      /Users/au/src/ollama/llm/ext_server/server.cpp:289:9: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.
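The warning above flags an unbounded `sprintf` into a fixed-size buffer. The fix replaces it with `snprintf`, which takes the buffer size, never writes past it, and always NUL-terminates. A minimal sketch of the pattern (the helper name and format string are illustrative, not the actual server.cpp code):

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper illustrating the sprintf -> snprintf change.
std::string format_slot(int slot_id) {
    char buf[32];
    // Before: sprintf(buf, "slot %d", slot_id);   // unbounded, can overrun buf
    // After:  snprintf bounds the write to sizeof(buf) and NUL-terminates;
    //         its return value is the length the untruncated output would have.
    snprintf(buf, sizeof(buf), "slot %d", slot_id);
    return std::string(buf);
}
```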
  7. 29 Aug, 2024 1 commit
  8. 27 Aug, 2024 1 commit
  9. 25 Aug, 2024 1 commit
  10. 23 Aug, 2024 2 commits
  11. 22 Aug, 2024 1 commit
    • Fix embeddings memory corruption (#6467) · 90ca8417
      Daniel Hiltgen authored
      * Fix embeddings memory corruption
      
      The patch was leading to a buffer-overrun corruption. Once it was removed, though, parallelism
      in server.cpp led to hitting an assert because slot/seq IDs could be >= the token count. To
      work around this, only slot 0 is used for embeddings.
      
      * Fix embed integration test assumption
      
      The token eval count has changed with recent llama.cpp bumps (0.3.5+).
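The workaround described above can be sketched as a slot-assignment policy (names are illustrative, not the actual server.cpp code): embedding requests are pinned to slot 0 so their seq ID stays safely below the token count, while completion requests continue to be spread across slots.

```cpp
#include <cstddef>

// Illustrative sketch of the workaround: embeddings always use slot 0;
// completions are assigned round-robin across all slots.
struct SlotPicker {
    std::size_t n_slots;
    std::size_t next = 0;

    std::size_t pick(bool is_embedding) {
        if (is_embedding) return 0;     // the workaround: always slot 0
        std::size_t s = next;
        next = (next + 1) % n_slots;    // round-robin for completions
        return s;
    }
};
```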
  12. 21 Aug, 2024 1 commit
  13. 20 Aug, 2024 1 commit
    • Split rocm back out of bundle (#6432) · a017cf2f
      Daniel Hiltgen authored
      We're over budget for GitHub's maximum release artifact size with ROCm plus two CUDA
      versions. This splits ROCm back out as a discrete artifact but keeps the layout so it can
      be extracted into the same location as the main bundle.
  14. 19 Aug, 2024 6 commits
  15. 12 Aug, 2024 1 commit
  16. 11 Aug, 2024 2 commits
  17. 08 Aug, 2024 1 commit
  18. 07 Aug, 2024 1 commit
  19. 06 Aug, 2024 1 commit
  20. 05 Aug, 2024 4 commits
  21. 02 Aug, 2024 1 commit
  22. 31 Jul, 2024 5 commits
  23. 30 Jul, 2024 1 commit