1. 23 Apr, 2024 1 commit
    • Daniel Hiltgen's avatar
      Request and model concurrency · 34b9db5a
      Daniel Hiltgen authored
      This change adds support for multiple concurrent requests, as well as
      loading multiple models by spawning multiple runners. The default
      settings are currently set at 1 concurrent request per model and only 1
      loaded model at a time, but these can be adjusted by setting
      OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
      34b9db5a
  2. 11 Jan, 2024 2 commits
  3. 09 Jan, 2024 1 commit
  4. 03 Jan, 2024 1 commit
  5. 02 Jan, 2024 1 commit
    • Daniel Hiltgen's avatar
      Switch windows build to fully dynamic · d966b730
      Daniel Hiltgen authored
      Refactor where we store build outputs, and support a fully dynamic loading
      model on windows so the base executable has no special dependencies thus
      doesn't require a special PATH.
      d966b730
  6. 19 Dec, 2023 1 commit