1. 02 Apr, 2025 (1 commit)
  2. 01 Apr, 2025 (1 commit)
  3. 26 Mar, 2025 (5 commits)
    • e5d84fb9
      molbal authored
    • docs: add ollamb to community projects · dd66712e
      Hengky Steen authored
    • ggml: Support heterogeneous KV cache layer sizes in memory estimation · f66216e3
      Jesse Gross authored
      Gemma3 uses sliding windows for its context on 5/6 layers, significantly
      reducing memory usage but leading to uneven usage across layers,
      which makes allocation to the correct GPU difficult. We currently
      estimate very conservatively by assuming all layers are consistent
      at the max size.
      
      Llama3.2-vision is also inconsistent between self attention and cross
      attention layers - at the moment, we calculate the correct total size
      and then average it across layers. In some cases, this may lead
      to crashes if a large layer is placed on a GPU sized by the average.

      This change allows memory estimation to calculate per-layer KV cache
      sizes and take them into account when placing layers onto GPUs. We
      already do this for weights that vary per-tensor, so this is a
      logical extension.
      
      Fixes #9730
      Fixes #9890
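      A rough sketch of the idea in Go (all names here are illustrative, not
      Ollama's actual types or functions): estimate the KV cache size of each
      layer individually, then pack layers onto GPUs using those per-layer
      figures instead of a single worst-case or averaged size.

      ```go
      package main

      import "fmt"

      // layerKVBytes estimates the KV cache cost of each layer. Sliding-window
      // layers only cache windowSize tokens; full-attention layers cache the
      // whole context. (Hypothetical helper, shown for illustration.)
      func layerKVBytes(numLayers int, ctxLen, windowSize, bytesPerToken uint64, isSlidingWindow func(i int) bool) []uint64 {
      	sizes := make([]uint64, numLayers)
      	for i := range sizes {
      		tokens := ctxLen
      		if isSlidingWindow(i) && windowSize < ctxLen {
      			tokens = windowSize
      		}
      		sizes[i] = tokens * bytesPerToken
      	}
      	return sizes
      }

      // placeLayers greedily assigns layers to GPUs using per-layer sizes, so a
      // large full-attention layer is never sized by an average across layers.
      func placeLayers(sizes []uint64, gpuFree []uint64) map[int][]int {
      	placement := map[int][]int{}
      	gpu := 0
      	for layer, need := range sizes {
      		for gpu < len(gpuFree) && gpuFree[gpu] < need {
      			gpu++ // this GPU is full, try the next one
      		}
      		if gpu == len(gpuFree) {
      			placement[-1] = append(placement[-1], layer) // -1: fall back to CPU
      			continue
      		}
      		gpuFree[gpu] -= need
      		placement[gpu] = append(placement[gpu], layer)
      	}
      	return placement
      }

      func main() {
      	// Gemma3-like pattern: 5 of every 6 layers use a sliding window.
      	sizes := layerKVBytes(12, 8192, 1024, 64*1024, func(i int) bool { return i%6 != 5 })
      	fmt.Println(placeLayers(sizes, []uint64{1 << 30, 1 << 30}))
      }
      ```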
    • llm: Fix debug logging for memory estimates · f4f0992b
      Jesse Gross authored
    • kvcache: Sliding window cache only needs a single batch total · 1feff619
      Jesse Gross authored
      When computing the size of the cache for sliding window attention,
      we don't need to multiply the batch size by the number of parallel
      sequences - the batch size is constant.

      This also simplifies the check for whether to allocate the cache
      size based on capacity or window size, as the batch size is already
      incorporated into the capacity when handled by the runner.
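      The sizing rule this describes can be sketched as follows (hypothetical
      names; a minimal illustration, not the runner's actual code): the
      sliding-window cache needs the window plus a single batch of new tokens,
      and that total does not grow with the number of parallel sequences,
      while the full-capacity path already has the batch folded in.

      ```go
      package main

      import "fmt"

      // slidingWindowCacheSlots returns how many token slots a sliding-window
      // KV cache needs. Hypothetical helper, shown only to illustrate the rule.
      func slidingWindowCacheSlots(capacity, windowSize, batchSize, numParallel int) int {
      	_ = numParallel // the batch is shared, so one batch total is enough
      	need := windowSize + batchSize
      	if need > capacity {
      		// capacity already has the batch size folded in by the runner,
      		// so it is always a sufficient upper bound.
      		return capacity
      	}
      	return need
      }

      func main() {
      	// 1024-token window, 512-token batch, 4 parallel sequences:
      	// the result does not change with the parallel count.
      	fmt.Println(slidingWindowCacheSlots(8192, 1024, 512, 4)) // 1536
      }
      ```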
  4. 25 Mar, 2025 (1 commit)
  5. 24 Mar, 2025 (1 commit)
  6. 21 Mar, 2025 (12 commits)
  7. 20 Mar, 2025 (6 commits)
  8. 19 Mar, 2025 (2 commits)
  9. 18 Mar, 2025 (2 commits)
  10. 17 Mar, 2025 (9 commits)