1. 06 Nov, 2025 2 commits
  2. 14 Oct, 2025 1 commit
    •
      logs: fix bogus "0 MiB free" log line (#12590) · 850da848
      Daniel Hiltgen authored
      On the llama runner, after the recent GGML bump, a new log line reported an
      incorrect "0 MiB free" following our patch that removed memory information from
      the device props. This adjusts the llama.cpp code to fetch the actual free
      memory of the active device.
  3. 13 Oct, 2025 1 commit
  4. 02 Oct, 2025 1 commit
    •
      Update GGML to b6646 (#12245) · c68f367e
      Daniel Hiltgen authored
      Notable EOLs with this change:
      - macOS v12 and v13 are no longer supported (v14+ required)
      - AMD gfx900 and gfx906 are no longer supported
  5. 27 Aug, 2025 1 commit
    •
      ggml: Avoid allocating CUDA primary context on unused GPUs · 9d97e6a9
      Jesse Gross authored
      The recent memory management changes made all GPUs visible to the
      runner, regardless of whether they are ultimately used. This caused
      CUDA devices to allocate a primary context (~300 MB of VRAM) on each
      GPU, for each model. This is unnecessary, so we now both avoid
      touching GPUs that we exclude in the early stage of allocation and
      free the memory for any that we touch but don't use.
      
      The issue will continue to exist for the old engine, since it touches
      all devices during initialization.
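The two-part strategy in this commit message can be sketched as below. This is a hedged illustration, not the actual runner code: `Gpu`, `touch`, `release_primary_ctx`, and `plan` are hypothetical helpers. On real CUDA hardware, releasing the ~300 MB primary context would go through the driver API (`cuDevicePrimaryCtxRelease`); here we only track the state so the sketch runs anywhere.

```cpp
#include <set>
#include <vector>

// Hypothetical device bookkeeping for the sketch.
struct Gpu {
    int  id;
    bool primary_ctx_allocated = false;
};

// Under the CUDA runtime API, merely touching a device (querying it
// while planning allocations) implicitly allocates its primary context.
static void touch(Gpu &g)               { g.primary_ctx_allocated = true; }
static void release_primary_ctx(Gpu &g) { g.primary_ctx_allocated = false; }

// Plan memory across devices:
//  1. never touch GPUs excluded in the early stage of allocation, and
//  2. release the primary context on any GPU touched but not used.
static void plan(std::vector<Gpu> &gpus,
                 const std::set<int> &excluded,
                 const std::set<int> &used) {
    for (Gpu &g : gpus) {
        if (excluded.count(g.id)) continue;  // excluded: never touched at all
        touch(g);                            // touched to size the plan
    }
    for (Gpu &g : gpus) {
        if (g.primary_ctx_allocated && !used.count(g.id))
            release_primary_ctx(g);          // touched but unused: free VRAM
    }
}
```

With GPUs {0, 1, 2}, GPU 2 excluded up front, and only GPU 0 used, GPU 2 is never touched and GPU 1's context is released after planning; only GPU 0 keeps its primary context. As the message notes, the old engine touches every device during initialization, so it still pays this cost.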