1. 10 Sep, 2025 2 commits
    • Add v12 + v13 cuda support (#12000) · 17a023f3
      Daniel Hiltgen authored
      * Add support for upcoming NVIDIA Jetsons
      
      The latest Jetsons with JetPack 7 are moving to an SBSA-compatible model and
      will not require building a JetPack-specific variant.
      
      * cuda: bring back dual versions
      
      This adds back dual CUDA versions for our releases,
      with v12 and v13 to cover a broad set of GPUs and
      driver versions.
      
      * win: break up native builds in build_windows.ps1
      
      * v12 build working on windows and linux
      
      * switch to cuda v12.8 not JIT
      
      * Set CUDA compression to size
      
      * enhance manual install linux docs
    • Parth Sareen · 8d6fffae
  2. 09 Sep, 2025 4 commits
  3. 08 Sep, 2025 4 commits
  4. 05 Sep, 2025 1 commit
  5. 04 Sep, 2025 2 commits
  6. 02 Sep, 2025 3 commits
  7. 31 Aug, 2025 2 commits
  8. 29 Aug, 2025 2 commits
    • perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd
      Daniel Hiltgen authored
      * perf: build graph for next batch in parallel to keep GPU busy
      
      This refactors the main run loop of the ollama runner to perform the
      GPU-intensive tasks (Compute+Floats) in a goroutine so we can prepare the
      next batch in parallel, reducing the amount of time the GPU stalls waiting
      for its next batch of work.
      
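The overlap described above can be sketched in Go. This is an illustrative model only, not the runner's actual code: `batch` and `runPipeline` are hypothetical stand-ins, and the "GPU" step is simulated by a goroutine sending on a channel.

```go
package main

import "fmt"

// batch stands in for a prepared unit of GPU work (hypothetical).
type batch struct{ id int }

// runPipeline prepares batch i+1 on the CPU while batch i "computes"
// asynchronously, mirroring the overlap the commit describes. It returns
// the order in which batches finished computing.
func runPipeline(n int) []int {
	order := make([]int, 0, n)
	done := make(chan int, 1)
	done <- -1 // sentinel: no previous compute outstanding

	for i := 0; i < n; i++ {
		b := batch{id: i} // prepare the next batch (CPU-side work)
		if prev := <-done; prev >= 0 {
			order = append(order, prev) // previous compute finished
		}
		go func(b batch) { // GPU-intensive step runs async
			done <- b.id
		}(b)
	}
	order = append(order, <-done) // drain the final batch
	return order
}

func main() {
	fmt.Println(runPipeline(4)) // batches complete in submission order
}
```

Because each compute must signal `done` before the next one launches, batches still complete in order; the win is that preparation of batch i+1 no longer waits for batch i's compute to finish.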
      * tests: tune integration tests for ollama engine
      
      This tunes the integration tests to focus more on models supported
      by the new engine.
    • Always filter devices (#12108) · ead4a9a1
      Daniel Hiltgen authored
      * Always filter devices
      
      Avoid crashing on unsupported AMD iGPUs
      
      * Remove CUDA device filtering
      
      This interferes with mixed setups
  9. 28 Aug, 2025 1 commit
  10. 27 Aug, 2025 2 commits
    • ggml: Avoid allocating CUDA primary context on unused GPUs · 9d97e6a9
      Jesse Gross authored
      The recent memory management changes caused all GPUs to be visible
      to the runner, regardless of whether they are ultimately used. This
      caused CUDA devices to allocate a primary context (~300 MB VRAM) on
      each GPU, for each model. This is unnecessary, so we can both avoid
      touching GPUs that we exclude in the early stage of allocation and
      free the memory for any that we touch but don't use.
      
      The issue will continue to exist for the old engine, since it touches
      all devices during initialization.
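The avoid-or-free pattern can be sketched abstractly in Go. This is a hedged model, not ggml's or CUDA's actual API: `device`, `retain`, `release`, and `planAllocation` are hypothetical names standing in for primary-context management.

```go
package main

import "fmt"

// device models a GPU whose primary context costs VRAM once retained
// (~300 MB in the real CUDA case; hypothetical here).
type device struct {
	id       int
	retained bool
}

func (d *device) retain()  { d.retained = true }  // allocates the primary context
func (d *device) release() { d.retained = false } // frees it again

// planAllocation touches only non-excluded candidate devices, then releases
// any candidate that the final placement did not actually use.
func planAllocation(devs []*device, excluded, used map[int]bool) {
	for _, d := range devs {
		if excluded[d.id] {
			continue // excluded early: never touch this GPU at all
		}
		d.retain() // touched while planning the allocation
	}
	for _, d := range devs {
		if d.retained && !used[d.id] {
			d.release() // touched but unused: give the VRAM back
		}
	}
}

func main() {
	devs := []*device{{id: 0}, {id: 1}, {id: 2}}
	// device 2 is excluded up front; only device 0 ends up used.
	planAllocation(devs, map[int]bool{2: true}, map[int]bool{0: true})
	for _, d := range devs {
		fmt.Println(d.id, d.retained)
	}
}
```

Only the device that the model is actually placed on keeps its context; excluded devices are never touched, and touched-but-unused devices are released, which is the two-part fix the commit describes.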
    • fix keep alive (#12041) · 10815324
      Michael Yang authored
  11. 26 Aug, 2025 3 commits
  12. 25 Aug, 2025 1 commit
  13. 22 Aug, 2025 6 commits
  14. 21 Aug, 2025 1 commit
  15. 20 Aug, 2025 6 commits