1. 11 Sep, 2025 2 commits
  2. 10 Sep, 2025 5 commits
  3. 09 Sep, 2025 4 commits
  4. 08 Sep, 2025 4 commits
  5. 05 Sep, 2025 1 commit
  6. 04 Sep, 2025 2 commits
  7. 02 Sep, 2025 3 commits
  8. 31 Aug, 2025 2 commits
  9. 29 Aug, 2025 2 commits
    • perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd
      Daniel Hiltgen authored
      * perf: build graph for next batch in parallel to keep GPU busy
      
      This refactors the main run loop of the ollama runner to perform the main
      GPU-intensive tasks (Compute+Floats) in a goroutine so we can prepare the next
      batch in parallel, reducing the amount of time the GPU stalls waiting for the
      next batch of work.
      
      * tests: tune integration tests for ollama engine
      
      This tunes the integration tests to focus more on models supported
      by the new engine.
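The overlap described in the commit message above can be illustrated with a minimal Go sketch. This is not the ollama runner's code: `batch`, `prepare`, `compute`, and `runPipelined` are hypothetical names chosen for illustration, and the "GPU work" is simulated with plain arithmetic. It only shows the general pattern of computing batch i in a goroutine while the main loop prepares batch i+1.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for a prepared compute graph.
type batch struct{ id int }

// prepare stands in for the CPU-side work of building the next batch.
func prepare(id int) batch { return batch{id: id} }

// compute stands in for the GPU-bound step (Compute+Floats in the commit message).
func compute(b batch) int { return b.id * 2 }

// runPipelined processes n batches, preparing batch i+1 on the calling
// goroutine while batch i is being computed in a separate goroutine.
func runPipelined(n int) []int {
	results := make([]int, 0, n)
	if n == 0 {
		return results
	}
	next := prepare(0)
	for i := 0; i < n; i++ {
		cur := next
		done := make(chan int, 1)
		// Start the compute step for the current batch in the background.
		go func() { done <- compute(cur) }()
		// Meanwhile, build the following batch so it is ready when compute ends.
		if i+1 < n {
			next = prepare(i + 1)
		}
		// Wait for the current batch before launching the next compute.
		results = append(results, <-done)
	}
	return results
}

func main() {
	fmt.Println(runPipelined(4)) // prints [0 2 4 6]
}
```

In this sketch only one compute step is in flight at a time, so results stay in order; the gain comes purely from hiding preparation time behind compute time.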
    • Always filter devices (#12108) · ead4a9a1
      Daniel Hiltgen authored
      * Always filter devices
      
      Avoid crashing on unsupported AMD iGPUs
      
      * Remove cuda device filtering
      
      This interferes with mixed setups
  10. 28 Aug, 2025 1 commit
  11. 27 Aug, 2025 2 commits
    • ggml: Avoid allocating CUDA primary context on unused GPUs · 9d97e6a9
      Jesse Gross authored
      The recent memory management changes caused all GPUs to be visible
      to the runner, regardless of whether they are ultimately used. This
      caused CUDA devices to allocate a primary context (~300 MB VRAM) on
      each GPU, for each model. This is unnecessary, so we can both avoid
      touching GPUs that we exclude in the early stage of allocation and
      free the memory for any that we touch but don't use.
      
      The issue will continue to exist for the old engine, since it touches
      all devices during initialization.
    • fix keep alive (#12041) · 10815324
      Michael Yang authored
  12. 26 Aug, 2025 3 commits
  13. 25 Aug, 2025 1 commit
  14. 22 Aug, 2025 6 commits
  15. 21 Aug, 2025 1 commit
  16. 20 Aug, 2025 1 commit