1. 15 Oct, 2024 1 commit
  2. 14 Oct, 2024 1 commit
  3. 13 Oct, 2024 1 commit
  4. 12 Oct, 2024 1 commit
  5. 10 Oct, 2024 3 commits
    • cli: Send all images in conversation history · 7fe39025
      Jesse Gross authored
      Currently the CLI only sends images from the most recent image-
      containing message. This prevents doing things like sending
      one message with an image and then a follow-up message with a
      second image and asking for a comparison based on additional
      information not present in any text that was output.
      
      It's possible that some models have a problem with this, but the
      CLI is not the right place to work around it since any adjustments
      are model-specific and should affect all clients.
      
      Both llava:34b and minicpm-v do reasonable things with multiple
      images in the history.
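
      As a rough illustration of the change described above, the sketch
      below contrasts keeping images only on the most recent image-containing
      message with sending the history unchanged. The `message` type and
      both helpers are hypothetical and are not the CLI's actual code.

      ```go
      package main

      import "fmt"

      // message is a hypothetical stand-in for a chat message with attached images.
      type message struct {
      	Role    string
      	Content string
      	Images  [][]byte
      }

      // onlyLatestImages models the behavior the commit describes as the old one:
      // it returns a copy of the history in which images survive only on the most
      // recent message that has any.
      func onlyLatestImages(history []message) []message {
      	out := make([]message, len(history))
      	copy(out, history)
      	seen := false
      	for i := len(out) - 1; i >= 0; i-- {
      		if len(out[i].Images) == 0 {
      			continue
      		}
      		if seen {
      			out[i].Images = nil // older images are dropped
      		} else {
      			seen = true
      		}
      	}
      	return out
      }

      // countImages totals the images attached across the whole history.
      func countImages(history []message) int {
      	n := 0
      	for _, m := range history {
      		n += len(m.Images)
      	}
      	return n
      }

      func main() {
      	history := []message{
      		{Role: "user", Content: "Here is the first picture.", Images: [][]byte{{0x01}}},
      		{Role: "assistant", Content: "I see it."},
      		{Role: "user", Content: "Compare it with this one.", Images: [][]byte{{0x02}}},
      	}
      	fmt.Println("images sent, latest only:", countImages(onlyLatestImages(history)))
      	fmt.Println("images sent, full history:", countImages(history))
      }
      ```

      With only the latest images kept, the model never receives the first
      picture, so a comparison request cannot be answered from the images.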
    • runner.go: Handle truncation of tokens for stop sequences · 0077e22d
      Jesse Gross authored
      When a single token contains both text to be returned and a stop
      sequence, this causes an out of bounds error when we update the
      cache to match our text. This is because we currently assume that
      removing the stop sequence will consume at least one token.
      
      This also inverts the logic to deal with positive numbers, rather
      than a value to be subtracted, which is easier to reason about.
      
      Fixes #7153
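
      The case described above can be sketched as follows. This is a
      simplified model, not the code in runner.go: `truncateStop` and
      `tokensToKeep` are hypothetical helpers, and the pending output is
      assumed to be a slice with one string per token. The point is that
      cutting a stop sequence out of the middle of a token removes no
      whole token, so the count of tokens to keep is computed directly as
      a non-negative number.

      ```go
      package main

      import (
      	"fmt"
      	"strings"
      )

      // truncateStop (hypothetical) removes the stop sequence and everything
      // after it from the pending token pieces. The boolean reports whether the
      // last surviving piece was cut in the middle of a token.
      func truncateStop(pieces []string, stop string) ([]string, bool) {
      	joined := strings.Join(pieces, "")
      	idx := strings.Index(joined, stop)
      	if idx < 0 {
      		return pieces, false
      	}

      	remaining := idx // bytes of text that precede the stop sequence
      	var out []string
      	truncated := false
      	for _, p := range pieces {
      		if remaining <= 0 {
      			break
      		}
      		if len(p) > remaining {
      			p = p[:remaining]
      			truncated = true
      		}
      		out = append(out, p)
      		remaining -= len(p)
      	}
      	return out, truncated
      }

      // tokensToKeep (hypothetical) returns how many pending tokens are still
      // fully represented in the returned text. A partially returned token is
      // not counted, and the result never drops below zero.
      func tokensToKeep(kept []string, truncated bool) int {
      	n := len(kept)
      	if truncated && n > 0 {
      		n--
      	}
      	return n
      }

      func main() {
      	pieces := []string{"Hello", " world</s>"}
      	kept, truncated := truncateStop(pieces, "</s>")
      	fmt.Printf("kept=%q truncated=%v tokens=%d\n", kept, truncated, tokensToKeep(kept, truncated))
      }
      ```

      Here the number of pieces is the same before and after truncation,
      which is the situation where assuming "at least one token consumed"
      goes wrong.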
    • server: Don't clear cmd when closing a server · 03408f34
      Jesse Gross authored
      Close can be called on an LLM server if the runner subprocess dies.
      However, the Ollama scheduler code may not know about this yet and
      still try to access it. In this case, it is important that 'cmd'
      is still available as it is used to check on the status of the
      subprocess. If this happens, Kill may be called twice on the subprocess -
      that is fine.
      
      In addition, model unloading may race with new accesses, so we should
      hold a lock around this. This may result in the model being reloaded
      after the first close call - this is also fine as close will be called
      again later.
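
      A minimal sketch of the pattern described above, assuming a
      hypothetical `server` type that wraps an `*exec.Cmd`. It is not the
      Ollama server code. Close takes a lock, kills the subprocess, and
      leaves `cmd` set so that later status checks still have something
      to inspect. Calling Close twice is harmless.

      ```go
      package main

      import (
      	"fmt"
      	"os/exec"
      	"sync"
      )

      // server is a hypothetical wrapper around a runner subprocess.
      type server struct {
      	mu  sync.Mutex
      	cmd *exec.Cmd
      }

      // newServer builds a server for the given command without starting it.
      func newServer(name string, args ...string) *server {
      	return &server{cmd: exec.Command(name, args...)}
      }

      // Close stops the subprocess if it was started. It deliberately does not
      // set s.cmd to nil, so callers that have not yet noticed the shutdown can
      // still query the process. Errors from a repeated Kill or Wait are ignored.
      func (s *server) Close() error {
      	s.mu.Lock()
      	defer s.mu.Unlock()
      	if s.cmd != nil && s.cmd.Process != nil {
      		_ = s.cmd.Process.Kill() // returns an error if the process is already done
      		_ = s.cmd.Wait()         // returns an error if Wait was already called
      	}
      	return nil
      }

      // Exited reports whether the subprocess has been waited on and has exited.
      func (s *server) Exited() bool {
      	s.mu.Lock()
      	defer s.mu.Unlock()
      	return s.cmd != nil && s.cmd.ProcessState != nil
      }

      func main() {
      	// "sleep" is only an example command and may not exist on every system.
      	s := newServer("sleep", "60")
      	if err := s.cmd.Start(); err != nil {
      		fmt.Println("could not start subprocess:", err)
      	}
      	s.Close()
      	s.Close() // second call is safe
      	fmt.Println("cmd still set:", s.cmd != nil, "exited:", s.Exited())
      }
      ```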
  6. 09 Oct, 2024 2 commits
  7. 08 Oct, 2024 3 commits
    • Fix build leakages (#7141) · f9584deb
      Daniel Hiltgen authored
      The recent change to applying patches leaves the submodule dirty based on
      "new commits" being present.  This ensures we clean up so the tree no longer
      reports dirty after a `go generate ./...` run.
      
      The Makefile was being a bit too aggressive in cleaning things up and would delete the placeholder files, a deletion that someone might accidentally commit.
    • Re-introduce the `llama` package (#5034) · 96efd905
      Jeffrey Morgan authored
      * Re-introduce the llama package
      
      This PR brings back the llama package, making it possible to call llama.cpp and
      ggml APIs from Go directly via CGo. This has a few advantages:
      
      - C APIs can be called directly from Go without needing to use the previous
        "server" REST API
      - On macOS and for CPU builds on Linux and Windows, Ollama can be built without
        a go generate ./... step, making it easy to get up and running to hack on
        parts of Ollama that don't require fast inference
      - Faster build times for AVX, AVX2, CUDA and ROCm (a full build of all runners
        takes <5 min on a fast CPU)
      - No git submodule, making it easier to clone and build from source
      
      This is a big PR, but much of it is vendor code except for:
      
      - llama.go: CGo bindings
      - example/: a simple example of running inference
      - runner/: a subprocess server designed to replace the llm/ext_server package
      - Makefile: an as-minimal-as-possible Makefile to build the runner package for
        different...
    • Shifra Goldstone's avatar
  8. 05 Oct, 2024 1 commit
  9. 01 Oct, 2024 1 commit
  10. 29 Sep, 2024 1 commit
  11. 26 Sep, 2024 1 commit
    • server: close response body on error (#6986) · 03608cb4
      Blake Mizerany authored
      This change closes the response body when an error occurs in
      makeRequestWithRetry. Previously, the body of the first non-200
      response was not closed before the request was retried. This change
      ensures that the response body is closed in all cases where an error
      occurs, preventing file descriptor leaks.
      
      Fixes #6974
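
      The general pattern can be sketched as below. This is not the body
      of makeRequestWithRetry: `getWithRetry` and `flakyServer` are
      hypothetical names used only for illustration. Each non-200
      response is drained and closed before the next attempt, so a
      retry never leaves a body open.

      ```go
      package main

      import (
      	"errors"
      	"fmt"
      	"io"
      	"net/http"
      	"net/http/httptest"
      	"sync/atomic"
      )

      // getWithRetry (hypothetical) issues a GET up to `attempts` times. On
      // success the caller owns resp.Body and must close it. On any non-200
      // response the body is drained and closed before retrying.
      func getWithRetry(client *http.Client, url string, attempts int) (*http.Response, error) {
      	lastErr := errors.New("no attempts made")
      	for i := 0; i < attempts; i++ {
      		resp, err := client.Get(url)
      		if err != nil {
      			lastErr = err
      			continue
      		}
      		if resp.StatusCode == http.StatusOK {
      			return resp, nil
      		}
      		// Drain and close so the connection and its descriptor are released.
      		_, _ = io.Copy(io.Discard, resp.Body)
      		resp.Body.Close()
      		lastErr = fmt.Errorf("unexpected status: %s", resp.Status)
      	}
      	return nil, lastErr
      }

      // flakyServer (hypothetical) starts a test server that answers 503 for the
      // first `failures` requests and 200 afterwards. It also returns a counter
      // of the requests it has seen.
      func flakyServer(failures int) (*httptest.Server, *atomic.Int32) {
      	calls := new(atomic.Int32)
      	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
      		if calls.Add(1) <= int32(failures) {
      			http.Error(w, "try again", http.StatusServiceUnavailable)
      			return
      		}
      		fmt.Fprint(w, "ok")
      	}))
      	return srv, calls
      }

      func main() {
      	srv, calls := flakyServer(1)
      	defer srv.Close()

      	resp, err := getWithRetry(srv.Client(), srv.URL, 3)
      	if err != nil {
      		fmt.Println("request failed:", err)
      		return
      	}
      	defer resp.Body.Close()
      	fmt.Println("status:", resp.Status, "requests made:", calls.Load())
      }
      ```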
  12. 25 Sep, 2024 2 commits
  13. 24 Sep, 2024 3 commits
  14. 22 Sep, 2024 1 commit
  15. 21 Sep, 2024 3 commits
  16. 20 Sep, 2024 3 commits
  17. 18 Sep, 2024 4 commits
  18. 17 Sep, 2024 3 commits
  19. 16 Sep, 2024 5 commits