  1. 10 May, 2025 1 commit
  2. 08 May, 2025 5 commits
  3. 07 May, 2025 5 commits
  4. 06 May, 2025 5 commits
  5. 05 May, 2025 7 commits
  6. 04 May, 2025 1 commit
  7. 03 May, 2025 3 commits
  8. 02 May, 2025 4 commits
    • ggml: Fix race that resulted in "context canceled" when loading · a6ef73f4
      Jesse Gross authored
      Successfully completing processing with an errgroup cancels the
      associated context. However, we also have a goroutine that is checking
      for cancelation of the context. As a result, there is a race where
      the goroutine can pick up the cancelation and report an error,
      replacing the successful (nil) result with a "context canceled" error.
      
      To avoid that, this replaces the goroutine with a cancelation check
      when we are reading files. This also has the advantage of stopping
      all reads relatively quickly on error and also ensuring that there are
      no outstanding I/O operations when we return in this case.
      
      The downside is that if a file read blocks forever (for example, over
      the network) then cancelation of the context effectively won't be
      honored. However, this is also true for other smaller files we read
      and the tensors are read in small chunks (128K), so it's consistent
      and better on balance overall.
    • ollamarunner: Re-enable worst case graph preallocation. · c2f5d666
      Jesse Gross authored
      Worst case graph preallocation was disabled by a27462b7
      "ollamarunner: Temporarily disable worst case graph preallocation"
      since it caused crashes with large batches when not using the GPU.
      
      This backports upstream llama.cpp commit f057808
      "ggml: Don't assert fail when tensor data changes (#13222)", which
      fixes the underlying bug and allows reverting the previous workaround.
    • llama: update to commit e1e8e099 (#10513) · 8dd12c87
      Jeffrey Morgan authored
  9. 01 May, 2025 6 commits
  10. 30 Apr, 2025 3 commits
    • strip out thinking tags in message history for qwen3 & r1 (#10490) · ad3c7c9b
      Devon Rifkin authored
      * strip out thinking tags in message history for qwen3 & r1
      
      This is in advance of "proper" support where we'll make reasoning
      configurable and we'll parse out thinking/reasoning tags and provide
      them to the caller. These models expect there to be no thinking tags in
      the message history, so this should improve quality
      
      * parse model names instead of hacky prefix check
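      A minimal sketch of the stripping step, assuming these models emit
      `<think>...</think>` blocks (the exact tag handling and function names in
      ollama may differ):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkRE matches a <think>...</think> block, including newlines inside it
// ((?s) makes . match newlines). The tag name is an assumption based on the
// qwen3 / r1 output style.
var thinkRE = regexp.MustCompile(`(?s)<think>.*?</think>`)

// stripThinking removes reasoning blocks from a history message so the
// model never sees thinking tags in prior turns.
func stripThinking(content string) string {
	return strings.TrimSpace(thinkRE.ReplaceAllString(content, ""))
}

func main() {
	fmt.Println(stripThinking("<think>let me reason...</think>The answer is 4."))
	// → The answer is 4.
}
```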
    • Fix "Stopping..." scheduler hang (#10487) · 415c8fcc
      Daniel Hiltgen authored
      * Adjust initial scheduler refCount
      
      Ensure we only set the refCount on success
      
      * sched: fix lock order inversion deadlock
      
      Under certain race conditions, the scheduler could deadlock when it
      tried to update free space information at the same time a model was
      trying to unload.
    • Narrow set of paths we load GGML from (#10485) · 718eda1b
      Daniel Hiltgen authored
      Users may have other, incompatible GGML installs on their systems.
      This prevents us from trying to load those libraries from the system
      search path.