1. 12 Sep, 2024 1 commit
    • Optimize container images for startup (#6547) · cd5c8f64
      Daniel Hiltgen authored
      * Optimize container images for startup
      
      This change adjusts how runner payloads are handled to support
      container builds where we keep them extracted in the filesystem.
      This makes it easier to optimize the cpu/cuda vs cpu/rocm images for
      size, and should result in faster startup times for container images.
      
      * Refactor payload logic and add buildx support for faster builds
      
      * Move payloads around
      
      * Review comments
      
      * Converge to buildx based helper scripts
      
      * Use docker buildx action for release
  2. 11 Sep, 2024 1 commit
    • runner: Flush pending responses before returning · 93ac3760
      Jesse Gross authored
      If there are any pending responses (such as from potential stop
      tokens) then we should send them back before ending the sequence.
      Otherwise, tokens can be missing from the end of a response.
      
      Fixes #6707
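The behavior described in this commit can be illustrated with a minimal C++ sketch. It is not the runner's code: the `Sequence` struct and the `add_piece` and `finish` functions are hypothetical names introduced here, and the stop handling is simplified to a single stop string that is matched only when the held-back text is exactly a prefix of it. The point is the last step: text held back as a possible stop string is sent before the sequence ends.

```cpp
#include <string>

// Hypothetical sketch; none of these names come from the runner.
struct Sequence {
    std::string stop;     // single stop string (simplifying assumption)
    std::string pending;  // held-back text that could still become the stop string
    std::string sent;     // text already delivered to the client
};

// True if `text` is a non-empty proper prefix of `stop`.
static bool is_partial_stop(const std::string& text, const std::string& stop) {
    return !text.empty() && text.size() < stop.size() &&
           stop.compare(0, text.size(), text) == 0;
}

// Adds a generated piece. Returns false once the stop string is complete.
bool add_piece(Sequence& seq, const std::string& piece) {
    seq.pending += piece;
    if (seq.pending == seq.stop) {
        seq.pending.clear();  // the stop string itself is never sent
        return false;
    }
    if (is_partial_stop(seq.pending, seq.stop)) {
        return true;          // keep holding: might still become the stop string
    }
    seq.sent += seq.pending;  // cannot be the stop string, so deliver it
    seq.pending.clear();
    return true;
}

// Ends the sequence for another reason (for example a token limit):
// flush whatever is still held back so the response does not lose its tail.
void finish(Sequence& seq) {
    seq.sent += seq.pending;
    seq.pending.clear();
}
```

If `finish` skipped the flush, a response that ended while a partial stop match was still held back would silently drop that trailing text.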
  3. 10 Sep, 2024 1 commit
  4. 06 Sep, 2024 1 commit
  5. 05 Sep, 2024 2 commits
  6. 04 Sep, 2024 2 commits
  7. 03 Sep, 2024 2 commits
    • Log system memory at info (#6617) · 037a4d10
      Daniel Hiltgen authored
      On systems with low system memory, we can hit allocation failures that are difficult to diagnose
      without debug logs. Logging system memory at info level makes these easier to spot.
    • Fix sprintf to snprintf (#5664) · 94fff580
      FellowTraveler authored
      /Users/au/src/ollama/llm/ext_server/server.cpp:289:9: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.
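As a rough illustration of the kind of change this warning asks for, the sketch below formats into a fixed-size buffer with `snprintf`, which takes the buffer size and never writes past it. The helper name `format_byte_hex` and the format string are assumptions made for this example, not the code at server.cpp:289.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from server.cpp: renders a byte as "\xNN".
std::string format_byte_hex(unsigned char byte) {
    char buf[8];
    // snprintf writes at most sizeof(buf) bytes, including the terminating NUL,
    // and returns the length the full output would have had.
    int n = std::snprintf(buf, sizeof(buf), "\\x%02x", byte);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof(buf)) {
        return std::string();  // encoding error or truncated output
    }
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Checking the return value against the buffer size is what distinguishes a complete result from a truncated one; `sprintf` offers no equivalent bound.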
  8. 29 Aug, 2024 1 commit
  9. 27 Aug, 2024 1 commit
  10. 25 Aug, 2024 1 commit
  11. 23 Aug, 2024 2 commits
  12. 22 Aug, 2024 1 commit
    • Fix embeddings memory corruption (#6467) · 90ca8417
      Daniel Hiltgen authored
      * Fix embeddings memory corruption
      
      The patch was causing buffer-overrun corruption.  Once it was removed, though, parallelism
      in server.cpp led to hitting an assert due to slot/seq IDs being >= token count.  To
      work around this, only use slot 0 for embeddings.
      
      * Fix embed integration test assumption
      
      The token eval count has changed with recent llama.cpp bumps (0.3.5+)
  13. 21 Aug, 2024 1 commit
  14. 20 Aug, 2024 1 commit
    • Split rocm back out of bundle (#6432) · a017cf2f
      Daniel Hiltgen authored
      We're over budget for GitHub's maximum release artifact size with rocm + 2 cuda
      versions.  This splits rocm back out as a discrete artifact, but keeps the layout so it can
      be extracted into the same location as the main bundle.
  15. 19 Aug, 2024 6 commits
  16. 12 Aug, 2024 1 commit
  17. 11 Aug, 2024 2 commits
  18. 08 Aug, 2024 1 commit
  19. 07 Aug, 2024 1 commit
  20. 06 Aug, 2024 1 commit
  21. 05 Aug, 2024 4 commits
  22. 02 Aug, 2024 1 commit
  23. 31 Jul, 2024 5 commits