1. 23 Oct, 2025 1 commit
    • Daniel Hiltgen's avatar
      DRY out the runner lifecycle code (#12540) · 3258a89b
      Daniel Hiltgen authored
      * DRY out the runner lifecycle code
      
      Now that discovery uses the runners as well, this unifies the runner spawning code
      into a single place.  This also unifies GPU discovery types with the newer ml.DeviceInfo
      
      * win: make incremental builds better
      
      Place build artifacts in discrete directories so incremental builds don't have to start fresh
      
      * Adjust sort order to consider iGPUs
      
      * handle cpu inference oom scenarios
      
      * review comments
      3258a89b
  2. 17 Oct, 2025 1 commit
    • Daniel Hiltgen's avatar
      test: harden scheduler tests (#12662) · 68e04c7f
      Daniel Hiltgen authored
      * test: harden scheduler tests
      
      This removes reschedDelay which was stale code, and adds
      a new configurable timeout for the waitForVRAMRecovery so
      tests can now set the timeout to be very short to avoid the
      scheduler getting stuck and hitting a test timeout.
      
      * test: tune tests for partial loads
      
      Give stress tests more time when the model is split between CPU/GPU
      68e04c7f
  3. 11 Oct, 2025 1 commit
    • Devon Rifkin's avatar
      routes: fix built-in renderers for `api/generate` · 6db8da99
      Devon Rifkin authored
      Made it so when api/generate builds up a message array and generates the
      prompt it now goes through the same function as `api/chat` for
      consistency. This is where we hook the optional built-in renderers to
      bypass templates, which was missing for `api/generate` before this
      change.
      
      Closes: #12578
      6db8da99