1. 26 Mar, 2025 4 commits
    • docs: add ollamb to community projects · dd66712e
      Hengky Steen authored
    • ggml: Support heterogeneous KV cache layer sizes in memory estimation · f66216e3
      Jesse Gross authored
      Gemma3 uses sliding windows for its context on 5/6 layers, significantly
      reducing memory usage but leading to uneven usage across layers,
      which makes allocation to the correct GPU difficult. We currently
      estimate very conservatively by assuming all layers are consistent
      at the max size.
      
      Llama3.2-vision is also inconsistent between self attention and cross
      attention layers - at the moment, we calculate the correct total size
      and then average it across layers. In some cases, this may lead
      to crashes if a large layer is placed on a GPU sized by the average.
      
      This allows memory estimation to calculate per-layer KV cache sizes
      and take them into account when placing layers onto GPUs. We already do
      this for weights that vary per-tensor, so this is a logical extension.
      
      Fixes #9730
      Fixes #9890
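
      A minimal sketch of that idea in Go, assuming a hypothetical placeLayers helper and made-up layer sizes (this is not Ollama's actual scheduler code): per-layer KV cache estimates are placed greedily onto GPUs instead of budgeting every layer at the maximum size.

      package main

      import "fmt"

      // placeLayers is a hypothetical illustration: given the estimated KV cache
      // size of each layer in bytes, assign layers to GPUs greedily. Sliding-window
      // layers need far less memory than full-context layers, so sizing every layer
      // at the maximum wastes the budget and can force layers off the GPU.
      func placeLayers(layerKV []uint64, gpuFree []uint64) []int {
          placement := make([]int, len(layerKV)) // layer index -> GPU index, -1 = CPU
          for i := range placement {
              placement[i] = -1
          }
          gpu := 0
          for i, size := range layerKV {
              // Move on once the current GPU cannot hold this layer.
              for gpu < len(gpuFree) && gpuFree[gpu] < size {
                  gpu++
              }
              if gpu == len(gpuFree) {
                  break // remaining layers stay on the CPU
              }
              gpuFree[gpu] -= size
              placement[i] = gpu
          }
          return placement
      }

      func main() {
          // One full-attention layer plus five sliding-window layers (made-up sizes).
          layers := []uint64{512 << 20, 64 << 20, 64 << 20, 64 << 20, 64 << 20, 64 << 20}
          fmt.Println(placeLayers(layers, []uint64{600 << 20, 600 << 20})) // [0 0 1 1 1 1]
      }
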
    • llm: Fix debug logging for memory estimates · f4f0992b
      Jesse Gross authored
    • kvcache: Sliding window cache only needs a single batch total · 1feff619
      Jesse Gross authored
      When computing the size of the cache for sliding window attention,
      we don't need to multiply the batch size by the number of parallel
      sequences - the batch size is constant.
      
      This also simplifies the check for whether to allocate the cache
      size based on capacity or window size as the batch size is already
      incorporated into the capacity when handled by the runner.
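
      A rough Go illustration of that sizing rule, using hypothetical names and numbers rather than the real kvcache API: the window-based size adds a single batch on top of the window, and the full capacity (which already includes the batch) is used only when it is smaller.

      package main

      import "fmt"

      // slidingWindowCacheSize is a hypothetical sketch: size the cache from the
      // window plus one batch, unless the overall capacity (which already has the
      // batch folded in by the runner) is smaller. No multiplication by the number
      // of parallel sequences is needed, because the batch size is constant.
      func slidingWindowCacheSize(capacity, windowSize, batchSize int) int {
          if windowSize+batchSize >= capacity {
              return capacity
          }
          return windowSize + batchSize
      }

      func main() {
          fmt.Println(slidingWindowCacheSize(8192, 1024, 512)) // 1536
          fmt.Println(slidingWindowCacheSize(1024, 1024, 512)) // 1024
      }
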
  2. 25 Mar, 2025 1 commit
  3. 24 Mar, 2025 1 commit
  4. 21 Mar, 2025 12 commits
  5. 20 Mar, 2025 6 commits
  6. 19 Mar, 2025 2 commits
  7. 18 Mar, 2025 2 commits
  8. 17 Mar, 2025 9 commits
  9. 15 Mar, 2025 3 commits
    • fix: correctly save in interactive mode (#9788) · 2c8b4846
      Patrick Devine authored
      This fixes the case where a FROM line in a previous modelfile points to a
      file which may or may not be present in a different ollama instance. We
      shouldn't rely on the filename; instead, we should check whether the
      FROM line is a valid model name and point to that.
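
      A hedged Go sketch of that resolution order; resolveFrom and looksLikeModelName are hypothetical helpers, not the actual code behind #9788: the FROM value is treated as a model name first and only falls back to a local file.

      package main

      import (
          "fmt"
          "os"
          "strings"
      )

      // resolveFrom is a hypothetical illustration: prefer treating the FROM value
      // as a model name and only fall back to a file present on this instance.
      func resolveFrom(from string) (string, error) {
          if looksLikeModelName(from) {
              return from, nil // refer to the model by name, not by filename
          }
          if _, err := os.Stat(from); err == nil {
              return from, nil // a real file on this instance
          }
          return "", fmt.Errorf("FROM %q is neither a known model name nor a local file", from)
      }

      // looksLikeModelName is a made-up heuristic: blob files from another instance
      // show up as filesystem paths, while model names do not.
      func looksLikeModelName(s string) bool {
          return !strings.HasPrefix(s, "/") && !strings.HasPrefix(s, "./") && !strings.HasPrefix(s, "~")
      }

      func main() {
          fmt.Println(resolveFrom("llama3.2:latest"))
      }
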
    • server/internal/client/ollama: set User-Agent for registry client (#9775) · 82946761
      Blake Mizerany authored
      This sets the agent header in DefaultRegistry to include the version of
      the client, OS, and architecture in the previous format, with a minor
      twist.
      
      Note: The version is obtained from the build info instead of the
      version in version.Version, which should no longer be necessary but
      which we can remove in a future commit. Using the build info is more
      accurate and also provides extra build information when the build is
      not tagged or is "dirty". Previously, the version was just "0.0.0"
      with no other helpful information. The ollama.com registry and others
      handle this swimmingly.
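
      A small Go sketch of reading the version from the build info rather than a hard-coded constant; the exact User-Agent format below is an assumption for illustration, not necessarily what DefaultRegistry sends.

      package main

      import (
          "fmt"
          "runtime"
          "runtime/debug"
      )

      // userAgent assembles an agent string from the module's build info instead of
      // a hard-coded version constant. The format here is illustrative only.
      func userAgent() string {
          version := "unknown"
          if bi, ok := debug.ReadBuildInfo(); ok && bi.Main.Version != "" {
              // Untagged builds report a pseudo-version or "(devel)"; commit and
              // dirty-flag details are available in bi.Settings if needed.
              version = bi.Main.Version
          }
          return fmt.Sprintf("ollama/%s (%s %s) Go/%s",
              version, runtime.GOARCH, runtime.GOOS, runtime.Version())
      }

      func main() {
          fmt.Println(userAgent())
      }
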
    • gemma3 quantization (#9776) · ef378ad6
      Patrick Devine authored