1. 16 Apr, 2025 1 commit
  2. 02 Apr, 2025 1 commit
  3. 14 Mar, 2025 1 commit
    • ml: Allow models to constrain inputs to a single batch · 9679f401
      Jesse Gross authored
      Models may require that a set of inputs all be processed as part
      of the same batch. For example, if an image has multiple patches
      with fully connected attention between them, we should not split
      the batch in the middle of an image.
      
      Fixes #9697
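      The constraint described above reads naturally as a batching rule: a group of inputs (an image and all of its patches) is indivisible, so the runner should start a new batch rather than cut through the group. The Go sketch below is a hypothetical illustration of that rule only; the Input type, its SameBatch field, and splitBatches are invented here and are not the repository's actual API.

```go
// Hypothetical sketch, not the actual ollama runner code: Input,
// SameBatch, and splitBatches are invented here to illustrate the
// "keep an image's patches in one batch" constraint.
package main

import "fmt"

// Input is one model input. SameBatch > 0 means the next SameBatch
// inputs (e.g. the remaining patches of an image) must be scheduled
// in the same batch as this one.
type Input struct {
	ID        int
	SameBatch int
}

// splitBatches packs inputs into batches of at most batchSize,
// flushing early rather than splitting an indivisible group.
func splitBatches(inputs []Input, batchSize int) [][]Input {
	var batches [][]Input
	var cur []Input
	for i := 0; i < len(inputs); {
		group := 1 + inputs[i].SameBatch
		if i+group > len(inputs) {
			group = len(inputs) - i
		}
		if group > batchSize {
			group = batchSize // a real runner would reject an oversized group
		}
		if len(cur) > 0 && len(cur)+group > batchSize {
			batches = append(batches, cur) // start a new batch instead of splitting
			cur = nil
		}
		cur = append(cur, inputs[i:i+group]...)
		i += group
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// One text token followed by an image made of four patches.
	inputs := []Input{{ID: 0}, {ID: 1, SameBatch: 3}, {ID: 2}, {ID: 3}, {ID: 4}}
	for i, b := range splitBatches(inputs, 4) {
		fmt.Printf("batch %d: %d inputs\n", i, len(b))
	}
}
```

      With a batch size of 4, the sketch emits a batch holding only the lone text token, then a second batch holding the complete four-patch image, rather than splitting the image across batches.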
  4. 01 Nov, 2024 1 commit
  5. 14 Jun, 2024 1 commit
  6. 23 Apr, 2024 1 commit
    • Request and model concurrency · 34b9db5a
      Daniel Hiltgen authored
      This change adds support for multiple concurrent requests, as well as
      loading multiple models, by spawning multiple runners. The defaults are
      currently 1 concurrent request per model and 1 loaded model at a time,
      but both limits can be adjusted by setting OLLAMA_NUM_PARALLEL and
      OLLAMA_MAX_LOADED_MODELS; a sketch of how they could be applied follows.
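      Below is a minimal Go sketch of how those two settings could be consumed, assuming a simplified scheduler that models each limit as a counting semaphore. The envInt helper and the channel-based slots are invented for illustration and are not the server's actual code.

```go
// Hypothetical sketch, not the server's actual scheduler: it only shows
// how the two environment variables could be read and turned into
// counting semaphores with the defaults described above.
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt returns the value of an environment variable as a positive
// integer, falling back to def when unset or invalid.
func envInt(key string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(key)); err == nil && v > 0 {
		return v
	}
	return def
}

func main() {
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)    // concurrent requests per loaded model
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1) // models resident at once

	// Buffered channels used as counting semaphores: acquiring a slot
	// (sending into the channel) blocks once the limit is reached.
	modelSlots := make(chan struct{}, maxLoaded)
	requestSlots := make(chan struct{}, numParallel)

	fmt.Println("max loaded models:", cap(modelSlots))
	fmt.Println("parallel requests per model:", cap(requestSlots))
}
```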
  7. 26 Mar, 2024 1 commit
  8. 25 Mar, 2024 1 commit
  9. 23 Mar, 2024 1 commit
  10. 23 Dec, 2023 1 commit
  11. 19 Dec, 2023 1 commit