  1. 15 Oct, 2024 1 commit
  2. 14 Oct, 2024 1 commit
  3. 21 Sep, 2024 1 commit
  4. 12 Sep, 2024 1 commit
    • Optimize container images for startup (#6547) · cd5c8f64
      Daniel Hiltgen authored
      * Optimize container images for startup
      
      This change adjusts how to handle runner payloads to support
      container builds where we keep them extracted in the filesystem.
      This makes it easier to optimize the cpu/cuda vs cpu/rocm images for
      size, and should result in faster startup times for container images.
      
      * Refactor payload logic and add buildx support for faster builds
      
      * Move payloads around
      
      * Review comments
      
      * Converge to buildx based helper scripts
      
      * Use docker buildx action for release
  5. 11 Sep, 2024 1 commit
    • Verify permissions for AMD GPU (#6736) · 9246e6dd
      Daniel Hiltgen authored
      This restores a check, lost many releases ago, that verifies /dev/kfd permissions.
      When those permissions are missing, the failure mode is confusing:
        "rocBLAS error: Could not initialize Tensile host: No devices found"
      
      This implementation does not hard-fail the serve command; instead it falls back to CPU
      and logs an error.  In the future we can include this in the GPU discovery UX to show
      devices that were detected but are unsupported.
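The permission check and CPU fallback described in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper names `checkDeviceAccess` and `useAMDGPU` are hypothetical, and it assumes that opening the device node read/write is a sufficient probe.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"log"
	"os"
)

// checkDeviceAccess is a hypothetical helper. It tries to open the device
// node read/write and returns a descriptive error if that fails.
func checkDeviceAccess(path string) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		if errors.Is(err, fs.ErrPermission) {
			return fmt.Errorf("permission denied opening %s: %w", path, err)
		}
		return err
	}
	return f.Close()
}

// useAMDGPU reports whether the GPU path should be used. On failure it logs
// an error and returns false so the caller can fall back to CPU instead of
// aborting.
func useAMDGPU(devicePath string) bool {
	if err := checkDeviceAccess(devicePath); err != nil {
		log.Printf("ERROR: AMD GPU not usable, falling back to CPU: %v", err)
		return false
	}
	return true
}

func main() {
	fmt.Println("use AMD GPU:", useAMDGPU("/dev/kfd"))
}
```

Returning a boolean rather than an error keeps the fallback decision in one place, which mirrors the commit's choice not to hard-fail the serve command.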
  6. 04 Sep, 2024 1 commit
  7. 27 Aug, 2024 1 commit
  8. 23 Aug, 2024 2 commits
  9. 19 Aug, 2024 6 commits
  10. 09 Aug, 2024 1 commit
  11. 05 Aug, 2024 3 commits
  12. 02 Aug, 2024 1 commit
  13. 24 Jul, 2024 1 commit
  14. 22 Jul, 2024 3 commits
  15. 20 Jul, 2024 1 commit
    • Adjust Windows ROCm discovery · 283948c8
      Daniel Hiltgen authored
      The v5 HIP library returns unsupported GPUs which won't enumerate at
      inference time in the runner, so this makes sure we align discovery.  The
      gfx906 cards are no longer supported, so we shouldn't compile with that
      GPU type as it won't enumerate at runtime.
  16. 11 Jul, 2024 1 commit
  17. 10 Jul, 2024 1 commit
    • Bump ROCm on Windows to 6.1.2 · 1f50356e
      Daniel Hiltgen authored
      This also adjusts our algorithm to favor our bundled ROCm.
      I've confirmed VRAM reporting still doesn't work properly so we
      can't yet enable concurrency by default.
  18. 09 Jul, 2024 1 commit
    • Detect CUDA OS Overhead · f6f759fc
      Daniel Hiltgen authored
      This adds logic to detect skew between the driver and
      management library which can be attributed to OS overhead
      and records that so we can adjust subsequent management
      library free VRAM updates and avoid OOM scenarios.
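The overhead adjustment described in this commit can be sketched roughly as below. This is an illustration under stated assumptions, not the commit's implementation: the type `gpuMemory` and its methods are hypothetical, and it assumes the skew shows up as the management library reporting more free VRAM than the driver does.

```go
package main

import "fmt"

// gpuMemory is a hypothetical record of the skew observed at discovery time.
type gpuMemory struct {
	osOverhead uint64 // bytes attributed to OS overhead
}

// recordOverhead stores the skew between the two free-VRAM readings.
// Assumption: only a management-library reading higher than the driver's
// counts as overhead; otherwise the overhead is treated as zero.
func (g *gpuMemory) recordOverhead(driverFree, mgmtFree uint64) {
	if mgmtFree > driverFree {
		g.osOverhead = mgmtFree - driverFree
	} else {
		g.osOverhead = 0
	}
}

// adjustedFree subtracts the recorded overhead from a later management
// library reading, clamping at zero to avoid unsigned underflow.
func (g *gpuMemory) adjustedFree(mgmtFree uint64) uint64 {
	if mgmtFree < g.osOverhead {
		return 0
	}
	return mgmtFree - g.osOverhead
}

func main() {
	const MiB = 1 << 20
	var g gpuMemory
	g.recordOverhead(7600*MiB, 8000*MiB)       // 400 MiB of skew recorded
	fmt.Println(g.adjustedFree(6000*MiB) / MiB) // prints 5600
}
```

Scheduling against the adjusted figure rather than the raw reading is what keeps later allocations from overshooting and hitting OOM.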
  19. 06 Jul, 2024 1 commit
  20. 03 Jul, 2024 1 commit
    • Better NVIDIA GPU discovery logging · ef757da2
      Daniel Hiltgen authored
      Refine the way we log GPU discovery to improve the non-debug
      output, and report more actionable log messages when possible
      to help users troubleshoot on their own.
  21. 21 Jun, 2024 1 commit
    • Disable concurrency for AMD + Windows · 9929751c
      Daniel Hiltgen authored
      Until ROCm v6.2 ships, we won't be able to get accurate free memory
      reporting on Windows, which makes automatic concurrency too risky.
      Users can still opt in, but will need to pay attention to model sizes;
      otherwise they may thrash/page VRAM or cause OOM crashes.
      All other platforms and GPUs have accurate VRAM reporting wired
      up now, so we can turn on concurrency by default.
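The default described in this commit amounts to a small decision rule, sketched below. The function name `concurrencyDefault` and its string parameters are hypothetical, chosen only for illustration; how the real code detects the platform or the user's opt-in is not shown in the commit message.

```go
package main

import "fmt"

// concurrencyDefault is a hypothetical helper: concurrency is on by default
// everywhere except AMD GPUs on Windows, where free-VRAM reporting is not yet
// reliable. An explicit user opt-in always wins.
func concurrencyDefault(goos, gpuVendor string, userOptIn bool) bool {
	if userOptIn {
		return true
	}
	if goos == "windows" && gpuVendor == "amd" {
		return false
	}
	return true
}

func main() {
	fmt.Println(concurrencyDefault("windows", "amd", false)) // false
	fmt.Println(concurrencyDefault("linux", "amd", false))   // true
	fmt.Println(concurrencyDefault("windows", "amd", true))  // true
}
```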
  22. 20 Jun, 2024 3 commits
  23. 19 Jun, 2024 4 commits
  24. 18 Jun, 2024 1 commit
  25. 17 Jun, 2024 1 commit
    • Move libraries out of users path · b2799f11
      Daniel Hiltgen authored
      We update the PATH on Windows to get the CLI mapped, but this has
      the unintended side effect of terminating other apps that may use our
      bundled DLLs when we upgrade.