  1. 27 Aug, 2024 1 commit
  2. 23 Aug, 2024 2 commits
  3. 19 Aug, 2024 6 commits
  4. 09 Aug, 2024 1 commit
  5. 05 Aug, 2024 3 commits
  6. 02 Aug, 2024 1 commit
  7. 24 Jul, 2024 1 commit
  8. 22 Jul, 2024 3 commits
  9. 20 Jul, 2024 1 commit
    • Adjust windows ROCm discovery · 283948c8
      Daniel Hiltgen authored
      The v5 HIP library returns unsupported GPUs, which won't enumerate at
      inference time in the runner, so this makes sure we align discovery. The
      gfx906 cards are no longer supported, so we shouldn't compile with that
      GPU type, as it won't enumerate at runtime.
  10. 11 Jul, 2024 1 commit
  11. 10 Jul, 2024 1 commit
    • Bump ROCm on windows to 6.1.2 · 1f50356e
      Daniel Hiltgen authored
      This also adjusts our algorithm to favor our bundled ROCm.
      I've confirmed VRAM reporting still doesn't work properly so we
      can't yet enable concurrency by default.
  12. 09 Jul, 2024 1 commit
    • Detect CUDA OS Overhead · f6f759fc
      Daniel Hiltgen authored
      This adds logic to detect skew between the driver and the
      management library that can be attributed to OS overhead,
      and records it so we can adjust subsequent management
      library free VRAM updates and avoid OOM scenarios.
  13. 06 Jul, 2024 1 commit
  14. 03 Jul, 2024 1 commit
    • Better nvidia GPU discovery logging · ef757da2
      Daniel Hiltgen authored
      Refine the way we log GPU discovery to improve the non-debug
      output, and report more actionable log messages when possible
      to help users troubleshoot on their own.
  15. 21 Jun, 2024 1 commit
    • Disable concurrency for AMD + Windows · 9929751c
      Daniel Hiltgen authored
      Until ROCm v6.2 ships, we won't be able to get accurate free memory
      reporting on Windows, which makes automatic concurrency too risky.
      Users can still opt in, but will need to pay attention to model sizes;
      otherwise they may thrash/page VRAM or cause OOM crashes.
      All other platforms and GPUs have accurate VRAM reporting wired
      up now, so we can turn on concurrency by default.
  16. 20 Jun, 2024 3 commits
  17. 19 Jun, 2024 4 commits
  18. 18 Jun, 2024 1 commit
  19. 17 Jun, 2024 3 commits
  20. 16 Jun, 2024 1 commit
  21. 15 Jun, 2024 1 commit
  22. 14 Jun, 2024 2 commits
    • Centralize GPU configuration vars · 6be309e1
      Daniel Hiltgen authored
      This should aid in troubleshooting by capturing and reporting the GPU
      settings at startup in the logs along with all the other server settings.
    • Workaround gfx900 SDMA bugs · da3bf233
      Daniel Hiltgen authored
      Implement support for GPU env var workarounds, and leverage
      this for the RX Vega 56, which needs
      HSA_ENABLE_SDMA=0 set to work properly.
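
The GPU env var workaround described in da3bf233 can be pictured roughly as follows. This is a minimal sketch, not the code from that commit: the table `gpuEnvWorkarounds` and the function `applyWorkarounds` are hypothetical names, and the choice to leave a variable alone when the user has already set it is an assumption. Only the gfx900 target and the `HSA_ENABLE_SDMA=0` setting come from the commit message.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// gpuEnvWorkarounds maps a GPU target to the environment variables a runner
// process should be started with. Hypothetical table: only the gfx900 entry
// is taken from the commit message.
var gpuEnvWorkarounds = map[string]map[string]string{
	"gfx900": {"HSA_ENABLE_SDMA": "0"},
}

// applyWorkarounds returns a copy of env with the workaround variables for
// target appended. Variables already present in env are left untouched
// (an assumed policy, so an explicit user setting wins).
func applyWorkarounds(env []string, target string) []string {
	out := append([]string(nil), env...)

	// Record which variable names the caller has already set.
	present := make(map[string]bool, len(env))
	for _, kv := range env {
		if i := strings.IndexByte(kv, '='); i > 0 {
			present[kv[:i]] = true
		}
	}

	// Sort the keys so the result is deterministic.
	workarounds := gpuEnvWorkarounds[target]
	keys := make([]string, 0, len(workarounds))
	for k := range workarounds {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		if !present[k] {
			out = append(out, k+"="+workarounds[k])
		}
	}
	return out
}

func main() {
	env := applyWorkarounds([]string{"PATH=/usr/bin"}, "gfx900")
	fmt.Println(env) // prints [PATH=/usr/bin HSA_ENABLE_SDMA=0]
}
```

The returned slice could be assigned to the `Env` field of an `exec.Cmd` before the runner is started, so the workaround applies only to that subprocess.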