1. 10 Sep, 2024 1 commit
  2. 05 Sep, 2024 2 commits
  3. 27 Aug, 2024 1 commit
  4. 23 Aug, 2024 1 commit
  5. 19 Aug, 2024 2 commits
    • Adjust layout to bin+lib/ollama · 88bb9e33
      Daniel Hiltgen authored
      88bb9e33
    • Refactor linux packaging · 74d45f01
      Daniel Hiltgen authored
      This adjusts Linux to follow a similar model to Windows, with a discrete archive
      (zip/tgz) to carry the primary executable and dependent libraries. Runners are
      still carried as payloads inside the main binary.
      
      Darwin retains the payload model, where the Go binary is fully self-contained.
      74d45f01
  6. 22 Jul, 2024 10 commits
  7. 03 Jul, 2024 2 commits
  8. 01 Jul, 2024 1 commit
  9. 21 Jun, 2024 2 commits
    • Disable concurrency for AMD + Windows · 9929751c
      Daniel Hiltgen authored
      Until ROCm v6.2 ships, we won't be able to get accurate free memory
      reporting on Windows, which makes automatic concurrency too risky.
      Users can still opt in, but will need to pay attention to model sizes;
      otherwise they may thrash/page VRAM or cause OOM crashes.
      All other platforms and GPUs have accurate VRAM reporting wired
      up now, so we can turn on concurrency by default.
      9929751c
    • Enable concurrency by default · 17b7186c
      Daniel Hiltgen authored
      This adjusts our default settings to enable multiple models and parallel
      requests to a single model. Users can still override these with the same
      env var settings as before. Parallel has a direct impact on
      num_ctx, which in turn can have a significant impact on GPUs with little
      VRAM, so this change also refines the algorithm: when parallel is not
      explicitly set by the user, we try to find a reasonable default that fits
      the model on their GPU(s). As before, multiple models will only load
      concurrently if they fully fit in VRAM.
      17b7186c
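The default selection described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the project's actual scheduler: the `modelEstimate` type, the `pickParallel` function, the candidate values 4 and 2, and the linear memory model are all assumptions made for the example.

```go
package main

import "fmt"

// modelEstimate is a hypothetical, simplified memory model: a fixed weight
// size plus a per-token cost for the context.
type modelEstimate struct {
	weightBytes   uint64
	bytesPerToken uint64
}

// required returns the estimated VRAM needed for a given total context size.
func (m modelEstimate) required(numCtx int) uint64 {
	return m.weightBytes + m.bytesPerToken*uint64(numCtx)
}

// pickParallel returns the user's value when one was set (> 0). Otherwise it
// tries candidate values from largest to smallest and returns the first whose
// scaled context fits in free VRAM, falling back to 1.
func pickParallel(userParallel, ctxPerRequest int, m modelEstimate, freeVRAM uint64) int {
	if userParallel > 0 {
		return userParallel
	}
	for _, p := range []int{4, 2} { // assumed candidate defaults
		if m.required(ctxPerRequest*p) <= freeVRAM {
			return p
		}
	}
	return 1
}

func main() {
	const GiB = 1 << 30
	m := modelEstimate{weightBytes: 4 * GiB, bytesPerToken: 512 * 1024}
	fmt.Println(pickParallel(0, 2048, m, 24*GiB)) // plenty of VRAM
	fmt.Println(pickParallel(0, 2048, m, 5*GiB))  // small GPU
	fmt.Println(pickParallel(8, 2048, m, 5*GiB))  // explicit user setting wins
}
```

The point of the sketch is that the total context grows with the parallel count, so a default that is safe on a large GPU may not fit on a small one. An explicit user setting is never second-guessed.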
  10. 19 Jun, 2024 2 commits
  11. 17 Jun, 2024 1 commit
  12. 14 Jun, 2024 2 commits
    • Centralize GPU configuration vars · 6be309e1
      Daniel Hiltgen authored
      This should aid in troubleshooting by capturing and reporting the GPU
      settings at startup in the logs along with all the other server settings.
      6be309e1
    • Support forced spreading for multi GPU · 5e8ff556
      Daniel Hiltgen authored
      Our default behavior today is to try to fit into a single GPU if possible.
      Some users would prefer the old behavior of always spreading across
      multiple GPUs even if the model can fit into one. This exposes that
      behavior as a tunable.
      5e8ff556
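The placement choice described in this commit message might look like the sketch below. The `gpu` type, the `chooseGPUs` function, and the `forceSpread` flag are hypothetical names for illustration; the real scheduler is likely to weigh more factors than free memory alone.

```go
package main

import "fmt"

// gpu is a hypothetical, simplified view of one device.
type gpu struct {
	id        string
	freeBytes uint64
}

// chooseGPUs returns the devices a model should be loaded on.
// Without forceSpread it prefers the first single GPU that can hold the
// whole model. Otherwise, or if no single GPU is big enough, it uses all
// GPUs, provided their combined free memory is sufficient.
// A nil result means the model does not fit.
func chooseGPUs(gpus []gpu, required uint64, forceSpread bool) []gpu {
	if !forceSpread {
		for _, g := range gpus {
			if g.freeBytes >= required {
				return []gpu{g}
			}
		}
	}
	var total uint64
	for _, g := range gpus {
		total += g.freeBytes
	}
	if len(gpus) > 0 && total >= required {
		return gpus
	}
	return nil
}

func main() {
	const GiB = 1 << 30
	gpus := []gpu{{"gpu0", 12 * GiB}, {"gpu1", 12 * GiB}}
	fmt.Println(len(chooseGPUs(gpus, 8*GiB, false)))  // default: single GPU
	fmt.Println(len(chooseGPUs(gpus, 8*GiB, true)))   // forced spread: all GPUs
	fmt.Println(len(chooseGPUs(gpus, 30*GiB, false))) // does not fit anywhere
}
```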
  13. 13 Jun, 2024 1 commit
  14. 12 Jun, 2024 1 commit
  15. 06 Jun, 2024 1 commit
  16. 04 Jun, 2024 2 commits
  17. 30 May, 2024 1 commit
  18. 24 May, 2024 1 commit
  19. 23 May, 2024 1 commit
  20. 05 May, 2024 1 commit
    • Centralize server config handling · f56aa200
      Daniel Hiltgen authored
      This moves all the env var reading into one central module
      and logs the loaded config once at startup, which should
      help in troubleshooting user server logs.
      f56aa200
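A centralized configuration module of the kind this commit message describes could be sketched as below. The `Config` struct, the `loadConfig` function, the `EXAMPLE_*` variable names, and the default values are all assumptions for illustration, not the project's real settings.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds every setting the server reads from the environment.
// The field and variable names are hypothetical.
type Config struct {
	Host        string
	NumParallel int
	Debug       bool
}

// loadConfig reads all settings through getenv in one place, keeping the
// defaults when a variable is unset or cannot be parsed.
func loadConfig(getenv func(string) string) Config {
	cfg := Config{Host: "127.0.0.1:8080", NumParallel: 1}
	if v := getenv("EXAMPLE_HOST"); v != "" {
		cfg.Host = v
	}
	if n, err := strconv.Atoi(getenv("EXAMPLE_NUM_PARALLEL")); err == nil && n > 0 {
		cfg.NumParallel = n
	}
	if b, err := strconv.ParseBool(getenv("EXAMPLE_DEBUG")); err == nil {
		cfg.Debug = b
	}
	return cfg
}

func main() {
	cfg := loadConfig(os.Getenv)
	// Log the effective configuration once at startup.
	fmt.Printf("server config: %+v\n", cfg)
}
```

Passing `getenv` as a parameter keeps the loader testable with a fake environment. The single log line at startup gives anyone reading a user's server log the effective settings in one place.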