1. 16 Apr, 2025 1 commit
  2. 02 Apr, 2025 1 commit
  3. 31 Oct, 2024 1 commit
    • Give unicode test more time to run (#7437) · 921779bb
      Daniel Hiltgen authored
      * Give unicode test more time to run
      
      Some slower GPUs (or partial CPU/GPU loads) can take more than the default 30s to complete this test
      
      * Give more time for concurrency test
      
      CPU inference can be very slow under stress
  4. 29 Oct, 2024 1 commit
  5. 22 Oct, 2024 1 commit
    • runner.go: Merge partial unicode characters before sending · 03e40efa
      Jesse Gross authored
      We check for partial unicode characters and accumulate them before
      sending. However, when we did send, we still sent each individual piece
      separately, leading to broken output. This combines everything into
      a single group, which is also more efficient.
      
      This also switches to the built-in check for valid unicode characters,
      which is stricter. After this, we should never send back an invalid
      sequence.
      
      Fixes #7290
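
The accumulate-then-send behavior described in this commit message can be illustrated with a minimal Go sketch. It is not the code from runner.go: `mergePieces` and its slice-based interface are hypothetical names introduced here, and the only library call relied on is the standard `utf8.ValidString`, which is assumed to be the "built-in check" the message refers to.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// mergePieces is a hypothetical helper, not the function in runner.go.
// It buffers incoming pieces until the buffered text is valid UTF-8,
// then emits the whole buffer as a single string instead of emitting
// each partial piece separately.
func mergePieces(pieces []string) []string {
	var out []string
	var pending strings.Builder
	for _, p := range pieces {
		pending.WriteString(p)
		// Hold back the buffer while it ends in a partial character.
		if !utf8.ValidString(pending.String()) {
			continue
		}
		out = append(out, pending.String())
		pending.Reset()
	}
	// Assumption: bytes still pending at the end are dropped rather
	// than sent as an invalid sequence. A byte that can never become
	// valid would stall this sketch; real code needs a policy for that.
	return out
}

func main() {
	// The euro sign is encoded as the three bytes E2 82 AC,
	// split here across three separate pieces.
	pieces := []string{"price: ", "\xe2", "\x82", "\xac", "5"}
	for _, s := range mergePieces(pieces) {
		fmt.Printf("%q\n", s)
	}
}
```

In this sketch the three byte-sized pieces are sent as one string, so a consumer never sees an invalid sequence.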
  6. 22 Jul, 2024 1 commit
  7. 23 Apr, 2024 2 commits
    • Local unicode test case · f2ea8470
      Daniel Hiltgen authored
    • Request and model concurrency · 34b9db5a
      Daniel Hiltgen authored
      This change adds support for multiple concurrent requests, as well as
      loading multiple models by spawning multiple runners. The defaults
      are currently 1 concurrent request per model and only 1 loaded model
      at a time, but these can be adjusted by setting
      OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
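
As an illustration of how two such settings might be read with a default of 1, here is a minimal Go sketch. Only the two environment variable names and the defaults come from the commit message. The `intSetting` helper, its lookup-function parameter, and the choice to fall back to the default on an unparsable or non-positive value are assumptions made for this example, not the project's actual parsing logic.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// intSetting is a hypothetical helper. It reads an integer setting via
// the supplied lookup function (os.LookupEnv in main) and returns def
// when the variable is unset, not a number, or less than 1.
// The fallback behavior is an assumption of this sketch.
func intSetting(lookup func(string) (string, bool), name string, def int) int {
	v, ok := lookup(name)
	if !ok {
		return def
	}
	n, err := strconv.Atoi(v)
	if err != nil || n < 1 {
		return def
	}
	return n
}

func main() {
	// Defaults of 1 match the values stated in the commit message.
	numParallel := intSetting(os.LookupEnv, "OLLAMA_NUM_PARALLEL", 1)
	maxLoaded := intSetting(os.LookupEnv, "OLLAMA_MAX_LOADED_MODELS", 1)
	fmt.Printf("parallel requests per model: %d\n", numParallel)
	fmt.Printf("max loaded models: %d\n", maxLoaded)
}
```

Passing the lookup function as a parameter keeps the helper testable without modifying the process environment.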
  8. 01 Apr, 2024 1 commit
  9. 26 Mar, 2024 1 commit
  10. 25 Mar, 2024 1 commit
  11. 23 Mar, 2024 1 commit