1. 19 Nov, 2025 3 commits
  2. 18 Nov, 2025 7 commits
  3. 17 Nov, 2025 4 commits
  4. 16 Nov, 2025 6 commits
  5. 14 Nov, 2025 2 commits
  6. 13 Nov, 2025 9 commits
  7. 12 Nov, 2025 3 commits
  8. 11 Nov, 2025 6 commits
    • docs/openapi: document that delete and copy responses are empty (#13055) · 15968714
      Bruce MacDonald authored
      Some route endpoints return an empty body with a 200 OK status. These should be documented in the OpenAPI doc. Note that the previously documented deletion response was incorrect.
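      This is documentation-only, but a minimal sketch of the behavior being described may help, assuming a Gin-style handler (Ollama's server uses Gin; the route wiring here is illustrative, not the actual implementation):

      ```go
      package main

      import (
          "net/http"

          "github.com/gin-gonic/gin"
      )

      func main() {
          r := gin.Default()
          // A delete-style endpoint that succeeds with 200 OK and no body;
          // its OpenAPI entry therefore needs a content-free 200 response.
          r.DELETE("/api/delete", func(c *gin.Context) {
              c.Status(http.StatusOK)
          })
          r.Run()
      }
      ```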
    • llm: Prefer dedicated GPUs over iGPUs when allocating memory · 8bf38552
      Jesse Gross authored
      We currently assign model layers to GPUs according to free VRAM,
      which assumes that GPU performance is roughly equal. This does not
      work well for mixed dGPU and iGPU systems, because iGPUs typically
      use system memory, which is plentiful but slow. This change instead
      assigns layers to dGPUs first and then to iGPUs.

      In the future, this could be generalized to a more fine-grained
      notion of GPU performance, but the dGPU vs. iGPU gap is the most
      extreme case.
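      Not the actual allocator, but a toy Go sketch of the ordering described above: dGPUs sort ahead of iGPUs and are filled first (the GPU struct, fixed per-layer size, and greedy fill are simplifying assumptions):

      ```go
      package main

      import (
          "fmt"
          "sort"
      )

      // GPU describes one device; Integrated marks an iGPU.
      type GPU struct {
          Name       string
          FreeMemory uint64 // free VRAM (or system memory for iGPUs), bytes
          Integrated bool
      }

      // assignLayers places layers onto dedicated GPUs before integrated
      // ones; within each class, devices with more free memory come first.
      func assignLayers(gpus []GPU, numLayers int, layerSize uint64) map[string]int {
          sorted := append([]GPU(nil), gpus...)
          sort.SliceStable(sorted, func(i, j int) bool {
              if sorted[i].Integrated != sorted[j].Integrated {
                  return !sorted[i].Integrated // dGPUs first
              }
              return sorted[i].FreeMemory > sorted[j].FreeMemory
          })

          assigned := make(map[string]int)
          remaining := numLayers
          for _, g := range sorted {
              if remaining == 0 {
                  break
              }
              fit := int(g.FreeMemory / layerSize)
              if fit > remaining {
                  fit = remaining
              }
              assigned[g.Name] = fit
              remaining -= fit
          }
          return assigned
      }

      func main() {
          gpus := []GPU{
              {Name: "igpu0", FreeMemory: 32 << 30, Integrated: true},
              {Name: "dgpu0", FreeMemory: 8 << 30},
          }
          // The iGPU has more free memory, but the dGPU is filled first:
          // map[dgpu0:16 igpu0:24]
          fmt.Println(assignLayers(gpus, 40, 512<<20))
      }
      ```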
    • llm: Separate llamaServer and ollamaServer code paths · b13fbad0
      Jesse Gross authored
      Originally, llamaServer represented the old memory estimates, which
      could be used with either the old or the new engine. ollamaServer
      was used only for the new estimates and the new engine. Since these
      implementations did not map directly to an engine, engine-specific
      code leaked into common code paths.

      Now that the new estimates are always used for the new engine,
      there is a direct mapping between server type and engine. This
      change moves most of the engine-specific code into the
      corresponding implementation, making things easier to understand.
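      A minimal Go sketch of the resulting shape, with illustrative names rather than Ollama's actual types: one common interface, one implementation per engine, and a single constructor, so shared code paths never branch on engine:

      ```go
      package main

      import "fmt"

      // server is the common surface used by shared code paths.
      type server interface {
          Load(model string) error
          Engine() string
      }

      // llamaServer wraps the legacy llama.cpp engine.
      type llamaServer struct{}

      func (llamaServer) Load(model string) error { return nil }
      func (llamaServer) Engine() string          { return "llama" }

      // ollamaServer wraps the new Ollama engine.
      type ollamaServer struct{}

      func (ollamaServer) Load(model string) error { return nil }
      func (ollamaServer) Engine() string          { return "ollama" }

      // newServer chooses the implementation once; callers never need
      // engine-specific branches afterwards.
      func newServer(newEngine bool) server {
          if newEngine {
              return ollamaServer{}
          }
          return llamaServer{}
      }

      func main() {
          s := newServer(true)
          fmt.Println(s.Engine()) // ollama
      }
      ```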
    • llm: Use Ollama engine memory layouts for both old and new engines · f560bd07
      Jesse Gross authored
      Currently, both the old and new engines have their own code to
      calculate how much memory a model requires and to lay out its
      layers onto GPUs. This change reuses the new engine's layout code
      for the old engine as well, bringing them closer together. The old
      engine continues to use its current method of estimating required
      memory.

      This reduces maintenance effort and improves consistency, as new
      features only need to be implemented in one place. The newer code
      is also more accurate, especially with multiple GPUs.
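      A toy Go sketch of the split described above, using assumed names and signatures rather than the real API: one shared layout routine, fed by engine-specific memory estimates:

      ```go
      package main

      import "fmt"

      type estimate struct {
          LayerSize uint64 // per-layer memory requirement, bytes
          NumLayers int
      }

      // layoutLayers stands in for the shared (new-engine) placement code:
      // it decides how many layers fit in each GPU's free VRAM, in order.
      func layoutLayers(e estimate, freeVRAM []uint64) []int {
          placed := make([]int, len(freeVRAM))
          remaining := e.NumLayers
          for i, free := range freeVRAM {
              fit := int(free / e.LayerSize)
              if fit > remaining {
                  fit = remaining
              }
              placed[i] = fit
              remaining -= fit
          }
          return placed
      }

      // Each engine keeps its own estimator; only the layout is shared.
      func oldEngineEstimate() estimate { return estimate{LayerSize: 600 << 20, NumLayers: 32} }
      func newEngineEstimate() estimate { return estimate{LayerSize: 512 << 20, NumLayers: 32} }

      func main() {
          vram := []uint64{8 << 30, 6 << 30}
          fmt.Println(layoutLayers(oldEngineEstimate(), vram)) // [13 10]
          fmt.Println(layoutLayers(newEngineEstimate(), vram)) // [16 12]
      }
      ```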