1. 28 Apr, 2024 1 commit
      Fix concurrency for CPU mode · d6e3b645
      Daniel Hiltgen authored
      Prior refactoring passes accidentally removed the logic to bypass VRAM
      checks for CPU loads.  This adds that back, along with test coverage.
      
      This also moves access to the loaded map in the unit test behind the mutex;
      the unguarded access was likely the cause of various flakes in the tests.
  2. 25 Apr, 2024 1 commit
  3. 24 Apr, 2024 3 commits
  4. 23 Apr, 2024 2 commits
      Harden sched TestLoad · d8851cb7
      Daniel Hiltgen authored
      Give the goroutine a moment to deliver the expired event
      Request and model concurrency · 34b9db5a
      Daniel Hiltgen authored
      This change adds support for multiple concurrent requests, as well as
      loading multiple models by spawning multiple runners. The default
      settings are currently set at 1 concurrent request per model and only 1
      loaded model at a time, but these can be adjusted by setting
      OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.