  1. 22 Jul, 2024 8 commits
  2. 03 Jul, 2024 2 commits
  3. 01 Jul, 2024 1 commit
  4. 21 Jun, 2024 2 commits
    • Disable concurrency for AMD + Windows · 9929751c
      Daniel Hiltgen authored
      Until ROCm v6.2 ships, we won't be able to get accurate free-memory
      reporting on Windows, which makes automatic concurrency too risky.
      Users can still opt in, but will need to pay attention to model sizes;
      otherwise they may thrash/page VRAM or cause OOM crashes. All other
      platforms and GPUs now have accurate VRAM reporting wired up, so we
      can turn on concurrency by default there.
    • Enable concurrency by default · 17b7186c
      Daniel Hiltgen authored
      This adjusts our default settings to enable multiple models and
      parallel requests to a single model. Users can still override these
      via the same env var settings as before. Parallel has a direct impact
      on num_ctx, which in turn can have a significant impact on small-VRAM
      GPUs, so this change also refines the algorithm: when parallel is not
      explicitly set by the user, we try to find a reasonable default that
      fits the model on their GPU(s). As before, multiple models will only
      load concurrently if they fully fit in VRAM.
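The overrides the commit message refers to can be sketched as environment variables set before starting the server. This is a hedged config sketch: `OLLAMA_NUM_PARALLEL` and `OLLAMA_MAX_LOADED_MODELS` are Ollama's documented settings, but the example values are illustrative, not the defaults this commit chose.

```shell
# Override Ollama's concurrency defaults (takes effect on server start).

# Parallel requests per loaded model; when unset, the server picks a
# value that still fits the model's context in available VRAM.
export OLLAMA_NUM_PARALLEL=4

# How many models may be resident at once; models still only load
# concurrently if they fully fit in VRAM.
export OLLAMA_MAX_LOADED_MODELS=2

ollama serve
```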
  5. 19 Jun, 2024 2 commits
  6. 17 Jun, 2024 1 commit
  7. 14 Jun, 2024 2 commits
    • Centralize GPU configuration vars · 6be309e1
      Daniel Hiltgen authored
      This should aid in troubleshooting by capturing and reporting the GPU
      settings at startup in the logs, along with all the other server settings.
    • Support forced spreading for multi GPU · 5e8ff556
      Daniel Hiltgen authored
      Our default behavior today is to try to fit the model onto a single
      GPU if possible. Some users would prefer the old behavior of always
      spreading across multiple GPUs, even if the model can fit onto one.
      This exposes that as a tunable.
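The forced-spreading tunable above can be sketched the same way. Note the variable name here is an assumption based on this commit's description; check the server's startup log or the Ollama docs for the exact name.

```shell
# Assumed knob for the forced-spreading behavior described above:
# ask the scheduler to spread a model across all visible GPUs even
# when it would fit on a single one (the old default behavior).
export OLLAMA_SCHED_SPREAD=1

ollama serve
```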
  8. 13 Jun, 2024 1 commit
  9. 12 Jun, 2024 1 commit
  10. 06 Jun, 2024 1 commit
  11. 04 Jun, 2024 2 commits
  12. 30 May, 2024 1 commit
  13. 24 May, 2024 1 commit