1. 11 Nov, 2025 11 commits
    • llm: Prefer dedicated GPUs over iGPUs when allocating memory · 8bf38552
      Jesse Gross authored
      We currently assign model layers to GPUs according to free VRAM,
      which assumes that GPU performance is roughly equal. This does not
      work well for mixed dGPU and iGPU systems: iGPUs typically use
      system memory, which is plentiful, but their performance is much
      lower. This change assigns layers to dGPUs first and then to iGPUs.
      
      In the future, this could be generalized to a finer-grained notion
      of GPU performance, but the dGPU vs. iGPU gap is the most extreme
      case.
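      The dGPU-first policy described above can be sketched as a device ordering followed by greedy layer assignment. This is an illustrative sketch, not Ollama's actual scheduler; the `gpu` type and `orderForOffload` helper are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu describes a device as seen by the scheduler. The field names are
// illustrative, not Ollama's real types.
type gpu struct {
	name       string
	integrated bool
	freeVRAM   uint64 // bytes
}

// orderForOffload sorts devices so that dedicated GPUs come before
// integrated ones; within each class, larger free VRAM comes first.
// Layers would then be assigned greedily in this order.
func orderForOffload(gpus []gpu) []gpu {
	out := append([]gpu(nil), gpus...)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].integrated != out[j].integrated {
			return !out[i].integrated // dGPUs first
		}
		return out[i].freeVRAM > out[j].freeVRAM
	})
	return out
}

func main() {
	devs := []gpu{
		{"igpu0", true, 32 << 30},  // large but slow system memory
		{"dgpu0", false, 8 << 30},
		{"dgpu1", false, 12 << 30},
	}
	for _, d := range orderForOffload(devs) {
		fmt.Println(d.name)
	}
}
```

      Note that under a free-VRAM-only policy the iGPU's 32 GiB would have put it first; the dGPU-first ordering avoids that.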
    • llm: Separate llamaServer and ollamaServer code paths · b13fbad0
      Jesse Gross authored
      Originally, llamaServer represented the old memory estimates, which
      could be used with either the old or the new engine. ollamaServer
      was used only for the new estimates and the new engine. Since these
      implementations did not map directly to an engine, engine-specific
      code leaked into common code paths.
      
      Now that new estimates are always used for the new engine, there is
      a direct mapping between server type and engine. This separates out
      most of the engine-specific code into the correct implementation
      to make things easier to understand.
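      The direct server-type-to-engine mapping described above can be sketched as a small factory. This is an illustrative sketch only; the interface and constructor names are hypothetical stand-ins for Ollama's actual types.

```go
package main

import "fmt"

// server is a stand-in for the runner interface; llamaServer and
// ollamaServer are the two implementations named in the commit.
type server interface{ Engine() string }

type llamaServer struct{}  // old engine path
type ollamaServer struct{} // new engine path

func (llamaServer) Engine() string  { return "llama" }
func (ollamaServer) Engine() string { return "ollama" }

// newServer illustrates the one-to-one mapping: each engine gets its own
// implementation, so common code paths need no engine-specific branches.
func newServer(useNewEngine bool) server {
	if useNewEngine {
		return ollamaServer{}
	}
	return llamaServer{}
}

func main() {
	fmt.Println(newServer(true).Engine())
	fmt.Println(newServer(false).Engine())
}
```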
    • llm: Use Ollama engine memory layouts for both old and new engines · f560bd07
      Jesse Gross authored
      Currently, for both the old and new engines, there is code to
      calculate how much memory a model requires and to lay out the
      layers onto GPUs. This change reuses the new engine's layout code
      for the old engine as well, bringing them closer together. The
      old engine continues to use its current method of estimating
      required memory.
      
      This reduces maintenance effort and improves consistency, as new
      features only need to be implemented in one place. The newer code
      is also more accurate, especially with multiple GPUs.
    • llamarunner: Respect device ordering for offloaded layers · 4372d0bf
      Jesse Gross authored
      We used to control the way that llama.cpp saw devices using
      CUDA_VISIBLE_DEVICES or similar. This would ensure that the layers
      offloaded to a device were actually the ones intended. This is
      particularly important because we might reorder devices based on
      free memory or performance.
      
      When we started explicitly scheduling layers, this logic went away,
      but the llamarunner had no way to set the correct device order.
      As a result, the correct number of layers would be assigned to each
      device, but not necessarily the layers that were expected. This
      change sets up the devices correctly based on the offload
      information.
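      The older mechanism mentioned above worked by presenting devices to llama.cpp in the scheduler's intended order via CUDA_VISIBLE_DEVICES. The sketch below shows only that underlying mechanism; `buildVisibleDevices` is a hypothetical helper, not Ollama's code, and the real runner now wires ordering through its own device setup instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildVisibleDevices renders a device ordering as a CUDA_VISIBLE_DEVICES
// value. CUDA enumerates devices in the order listed, so "1,0" makes
// physical device 1 appear as device 0 to the process.
func buildVisibleDevices(order []int) string {
	parts := make([]string, len(order))
	for i, id := range order {
		parts[i] = strconv.Itoa(id)
	}
	return strings.Join(parts, ",")
}

func main() {
	// Suppose the scheduler wants device 1 (e.g. more free VRAM) first.
	fmt.Println("CUDA_VISIBLE_DEVICES=" + buildVisibleDevices([]int{1, 0}))
}
```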
    • server: add logprobs and top_logprobs support to Ollama's API (#12899) · 59241c5b
      Baptiste Jamin authored
      Adds logprobs support to Ollama's API, including Ollama's
      OpenAI-compatible API. When the new 'logprobs' boolean parameter is
      set, Ollama returns the log probability of each generated token.
      An integer 'top_logprobs' parameter (up to 20) can also be
      specified; when set, the API additionally returns that many of the
      most likely tokens at each token position.
      Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
    • address comment · 2a9b61f0
      Eva Ho authored
    • docs: fix metal gpu section header (#13045) · 6df42088
      Sheikh authored
    • fix test · 9d615cda
      Eva Ho authored
    • clean up · 6a818b8a
      Eva Ho authored
    • 2aaf29ac
  2. 10 Nov, 2025 1 commit
  3. 08 Nov, 2025 3 commits
  4. 07 Nov, 2025 2 commits
  5. 06 Nov, 2025 15 commits
  6. 05 Nov, 2025 8 commits