  1. 19 Apr, 2025 1 commit
    • server/internal/registry: make pull send errors with Error field (#10326) · 4e535e61
      Blake Mizerany authored
      Previously, the pull handler would send an error message in the Status
      field, which prevented the client from using the message as a signal to
      stop. In the case of the "run" command, it would follow the pull with a
      "show" which would print a nearly identical "not found" message for
      unresolved models.
      
      Fixes #10307
  2. 17 Apr, 2025 1 commit
  3. 16 Apr, 2025 4 commits
    • server/internal/registry: remove superfluous progress bar flush (#10303) · 369de832
      Blake Mizerany authored
      This removes the extra flushProgress() at the end of handlePull. It is
      unnecessary because final progress updates are flushed in all cases of
      the main select loop.
    • server/internal/client/ollama: cleanup use of multiple counters (#10304) · 3457a315
      Blake Mizerany authored
      The completed and received counters must work in tandem and the code
      should better reflect that. Previously, the act of updating them was 2-3
      lines of code duplicated in multiple places. This consolidates them into
      a single update closure for easy reading and maintenance.
      
      This also simplifies error handling in places where we can use a return
      parameter and defer to handle the error case for updates.
      
      Also, remove the old Layer field from the trackingReader struct.
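
      The pattern described above can be sketched roughly as follows. The
      progress type, the download function, and the rollback-on-error
      behavior are assumptions made for illustration; the commit only states
      that the two counters are updated through a single closure and that a
      named return plus defer handles the error case.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// progress holds two counters that must always move together.
// The names are illustrative, not the actual fields in the client.
type progress struct {
	completed atomic.Int64
	received  atomic.Int64
}

// download simulates receiving chunks and routes every counter change
// through one update closure.
func download(chunks []int64, p *progress) (err error) {
	update := func(n int64) {
		p.received.Add(n)
		p.completed.Add(n)
	}

	var done int64
	// The named return lets one deferred function handle every error path.
	// Rolling back is an assumed policy chosen for this sketch.
	defer func() {
		if err != nil {
			update(-done)
		}
	}()

	for _, n := range chunks {
		if n < 0 {
			return fmt.Errorf("invalid chunk size %d", n)
		}
		update(n)
		done += n
	}
	return nil
}

func main() {
	p := &progress{}
	fmt.Println(download([]int64{3, 4}, p), p.completed.Load(), p.received.Load())
	fmt.Println(download([]int64{5, -1}, p), p.completed.Load(), p.received.Load())
}
```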
    • Give tests more time to run (#10306) · 56dc316a
      Daniel Hiltgen authored
      Fix flake failures on windows
    • cmd: add retry/backoff (#10069) · 1e7f62cb
      Blake Mizerany authored
      This commit adds retry/backoff to the registry client for pull requests.
      
      Also, revert progress indication to match original client's until we can
      "get it right."
      
      Also, make WithTrace wrap existing traces instead of clobbering them.
      This allows clients to compose traces.
  4. 14 Apr, 2025 1 commit
  5. 10 Apr, 2025 1 commit
  6. 09 Apr, 2025 1 commit
  7. 08 Apr, 2025 1 commit
  8. 07 Apr, 2025 1 commit
  9. 03 Apr, 2025 1 commit
    • llm: set done reason at server level (#9830) · e53b3cbd
      Bruce MacDonald authored
      No functional change. Many different done reasons can be set at the runner
      level, so rather than obscuring them we should return them to the server
      process and let it choose what to do with the done reason. This separates
      the API concerns from the runner.
  10. 02 Apr, 2025 1 commit
  11. 01 Apr, 2025 1 commit
  12. 31 Mar, 2025 1 commit
    • server/internal/client/ollama: cache completed chunks (#9933) · ef27d52e
      Blake Mizerany authored
      This change adds tracking of download chunks during the pull process so
      that subsequent pulls can skip downloading already completed chunks.
      This works across restarts of ollama.
      
      Currently, download state will be lost if a prune is triggered during a
      pull (e.g. restart or remove). This issue should be addressed in a
      follow-up PR.
  13. 28 Mar, 2025 1 commit
  14. 26 Mar, 2025 1 commit
    • ggml: Support heterogeneous KV cache layer sizes in memory estimation · f66216e3
      Jesse Gross authored
      Gemma3 uses sliding windows for its context on 5/6 layers, significantly
      reducing memory usage but leading to uneven usage across layers,
      which makes allocation to the correct GPU difficult. We currently
      estimate very conservatively by assuming all layers are consistent
      at the max size.
      
      Llama3.2-vision is also inconsistent between self attention and cross
      attention layers. At the moment, we calculate the correct total size
      and then average this across layers. In some cases, this may lead
      to crashes if a large layer is placed on a GPU sized by the average.
      
      This allows memory estimation to calculate per-layer KV cache size
      and take this into account when placing layers onto GPUs. We already do
      this for weights that vary per-tensor, so this is a logical extension.
      
      Fixes #9730
      Fixes #9890
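
      To illustrate why per-layer sizes matter, here is a toy placement
      routine. The placeLayers function, the greedy in-order strategy, and
      the unitless sizes are all assumptions for this sketch; the actual
      estimator is more involved.

```go
package main

import "fmt"

// placeLayers assigns layers to GPUs in order, moving to the next GPU when
// the current one cannot hold the next layer. It returns the GPU index
// chosen for each layer, or an error if the layers do not fit.
func placeLayers(layerBytes, gpuFree []uint64) ([]int, error) {
	free := append([]uint64(nil), gpuFree...) // do not mutate the caller's slice
	placement := make([]int, len(layerBytes))
	g := 0
	for i, need := range layerBytes {
		for g < len(free) && free[g] < need {
			g++
		}
		if g == len(free) {
			return nil, fmt.Errorf("layer %d (%d bytes) does not fit", i, need)
		}
		free[g] -= need
		placement[i] = g
	}
	return placement, nil
}

func main() {
	// Five small sliding-window layers and one full-size layer.
	actual := []uint64{1, 1, 1, 1, 1, 8}
	// The same total spread evenly (rounded down), as an average would do.
	averaged := []uint64{2, 2, 2, 2, 2, 2}
	gpus := []uint64{5, 8}

	fmt.Println(placeLayers(actual, gpus))
	fmt.Println(placeLayers(averaged, gpus))
}
```

      With the actual sizes, the large layer lands alone on the second GPU.
      With the averaged sizes, four layers including the large one are
      assigned to the second GPU, whose real usage (1+1+1+8) would exceed its
      8 units of free memory.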
  15. 21 Mar, 2025 2 commits
  16. 20 Mar, 2025 1 commit
  17. 19 Mar, 2025 1 commit
    • server/internal/client/ollama: confirm all chunksums were received (#9893) · 2ddacd75
      Blake Mizerany authored
      If the chunksums response is missing a chunk, the client should fail
      the download. This changes the client to check that all bytes are
      accounted for in the chunksums response.
      
      It is possible there are overlaps or gaps in the chunksums response and
      so the size is not the only thing left to check, but this provides
      enough coverage for now. We may want to check that chunks are contiguous
      later.
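
      The size check described above might look something like this. The
      chunk type with inclusive Start/End offsets and the checkCoverage
      function are hypothetical; the shape of the real chunksums response may
      differ.

```go
package main

import "fmt"

// chunk is a hypothetical byte range with inclusive bounds.
type chunk struct {
	Start, End int64
}

// checkCoverage verifies that the chunk sizes add up to the blob size.
// As the commit message notes, this does not catch overlaps and gaps that
// happen to cancel each other out.
func checkCoverage(chunks []chunk, blobSize int64) error {
	var total int64
	for _, c := range chunks {
		if c.End < c.Start {
			return fmt.Errorf("invalid chunk %d-%d", c.Start, c.End)
		}
		total += c.End - c.Start + 1
	}
	if total != blobSize {
		return fmt.Errorf("chunks cover %d bytes, blob is %d bytes", total, blobSize)
	}
	return nil
}

func main() {
	complete := []chunk{{0, 3}, {4, 9}}
	missing := []chunk{{0, 3}}
	fmt.Println(checkCoverage(complete, 10))
	fmt.Println(checkCoverage(missing, 10))
}
```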
  18. 15 Mar, 2025 1 commit
    • server/internal/client/ollama: set User-Agent for registry client (#9775) · 82946761
      Blake Mizerany authored
      This sets the agent header in DefaultRegistry to include the version of
      the client, OS, and architecture in the previous format, with a minor
      twist.
      
      Note: The version is obtained from the build info, instead of from
      version.Version, which should no longer be necessary and which we
      can remove in a future commit. Using the build info is more accurate and
      also provides extra build information if the build is not tagged, and if
      it is "dirty". Previously, the version was just "0.0.0" with no other
      helpful information. The ollama.com registry and others handle this
      swimmingly.
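
      A rough sketch of building such a header from the build info, using
      the standard runtime/debug package. The userAgent function, the
      "example-client" product name, and the exact layout of the string are
      assumptions; the commit does not spell out the format.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// userAgent builds a header value from the module version recorded in the
// binary, plus the OS and architecture. The layout is illustrative only.
func userAgent() string {
	version := "unknown"
	if bi, ok := debug.ReadBuildInfo(); ok && bi.Main.Version != "" {
		// For untagged or modified builds this is typically "(devel)" or a
		// pseudo-version rather than a release tag.
		version = bi.Main.Version
	}
	return fmt.Sprintf("example-client/%s (%s %s) Go/%s",
		version, runtime.GOARCH, runtime.GOOS, runtime.Version())
}

func main() {
	fmt.Println(userAgent())
}
```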
  19. 14 Mar, 2025 3 commits
    • gemma3: Allow multiple image in a single input · 7bf793a6
      Jesse Gross authored
      Previously processing multiple images in a batch would trigger
      segfaults so sending images together was disabled as a way to
      mitigate this. The trigger was processing one image on the CPU
      and one on the GPU.
      
      This can no longer happen:
       - The vision encoder is now on the GPU so both images would be
         processed on the GPU.
       - We require images to be fully contained in a batch and each
         image including its special tokens is over half the batch size.
         As a result, we will never get two images in the same batch.
      
      Fixes #9731
    • Blake Mizerany's avatar
      4e320b8b
    • server/internal/client: use chunksums for concurrent blob verification (#9746) · eb2b22b0
      Blake Mizerany authored
      Replace large-chunk blob downloads with parallel small-chunk
      verification to solve timeout and performance issues. Registry users
      experienced progressively slowing download speeds as large-chunk
      transfers aged, often timing out completely.
      
      The previous approach downloaded blobs in a few large chunks but
      required a separate, single-threaded pass to read the entire blob back
      from disk for verification after download completion.
      
      This change uses the new chunksums API to fetch many smaller
      chunk+digest pairs, allowing concurrent downloads and immediate
      verification as each chunk arrives. Chunks are written directly to their
      final positions, eliminating the entire separate verification pass.
      
      The result is more reliable downloads that maintain speed throughout the
      transfer process and significantly faster overall completion, especially
      over unstable connections or with large blobs.
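
      The approach can be sketched as follows: fetch chunks concurrently,
      verify each against its digest on arrival, and write it straight to its
      final offset. The chunk type, memBlob, split, and downloadChunks are
      hypothetical stand-ins; the fetch callback replaces what would be an
      HTTP range request, and SHA-256 is an assumed digest algorithm.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
	"sync"
)

// chunk is a hypothetical chunk+digest pair.
type chunk struct {
	Offset, Size int64
	Digest       [sha256.Size]byte
}

// memBlob is an in-memory io.WriterAt standing in for the blob file on disk.
type memBlob struct {
	mu sync.Mutex
	b  []byte
}

func (m *memBlob) WriteAt(p []byte, off int64) (int, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if off < 0 || off+int64(len(p)) > int64(len(m.b)) {
		return 0, io.ErrShortWrite
	}
	return copy(m.b[off:], p), nil
}

// split cuts src into chunks of at most n bytes and records each digest.
func split(src []byte, n int) []chunk {
	var out []chunk
	for off := 0; off < len(src); off += n {
		end := off + n
		if end > len(src) {
			end = len(src)
		}
		out = append(out, chunk{int64(off), int64(end - off), sha256.Sum256(src[off:end])})
	}
	return out
}

// downloadChunks fetches all chunks concurrently, verifies each one as it
// arrives, and writes verified data directly to its final position.
func downloadChunks(dst io.WriterAt, chunks []chunk, fetch func(chunk) ([]byte, error)) error {
	var wg sync.WaitGroup
	errs := make([]error, len(chunks)) // one slot per goroutine, no locking needed
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c chunk) {
			defer wg.Done()
			data, err := fetch(c)
			if err != nil {
				errs[i] = err
				return
			}
			if sha256.Sum256(data) != c.Digest {
				errs[i] = fmt.Errorf("chunk at offset %d: digest mismatch", c.Offset)
				return
			}
			_, errs[i] = dst.WriteAt(data, c.Offset)
		}(i, c)
	}
	wg.Wait()
	return errors.Join(errs...)
}

func main() {
	src := []byte("the quick brown fox jumps over the lazy dog")
	fetch := func(c chunk) ([]byte, error) { return src[c.Offset : c.Offset+c.Size], nil }
	dst := &memBlob{b: make([]byte, len(src))}
	if err := downloadChunks(dst, split(src, 8), fetch); err != nil {
		fmt.Println("download failed:", err)
		return
	}
	fmt.Println(string(dst.b))
}
```

      Because every chunk is checked before it is written, no second pass
      over the finished file is needed. A real client would also bound the
      number of concurrent requests.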
  20. 13 Mar, 2025 2 commits
  21. 11 Mar, 2025 3 commits
  22. 05 Mar, 2025 1 commit
    • server/internal/registry: take over pulls from server package (#9485) · e2252d0f
      Blake Mizerany authored
      This commit replaces the old pull implementation in the server package
      with the new, faster, more robust pull implementation in the registry
      package.
      
      The new endpoint, and now the remove endpoint too, are behind the
      feature gate "client2" enabled only by setting the OLLAMA_EXPERIMENT
      environment variable to include "client2".
      
      Currently, the progress indication is wired to perform the same as the
      previous implementation to avoid making changes to the CLI, and because
      the status reports happen at the start of the download, and the end of
      the write to disk, the progress indication is not as smooth as it could
      be. This is a known issue and will be addressed in a future change.
      
      This implementation may be ~0.5-1.0% slower in rare cases, depending on
      network and disk speed, but is generally MUCH faster and more robust
      than its predecessor in all other cases.
  23. 04 Mar, 2025 3 commits
    • New engine: vision models and auto-fallback (#9113) · 1fdb351c
      Daniel Hiltgen authored
      * Include unified vision layers in memory prediction
      
      For newer vision models with a single gguf, include
      the projection estimates.
      
      * Adjust CLI to handle both styles of vision model metadata
      
      * Wire up new tokenizers for new engine
      
      If we're loading the new engine, utilize the new model
      text processor instead of calling into cgo wrappers for
      llama.cpp.  This also cleans up some tech debt from the
      older tokenization flow for the C++ server which was
      no longer used.
      
      This also adjusts the grammar handling logic to pass
      through to the new engine instead of utilizing the cgo
      schema to grammar call.
      
      * Lay foundation for auto selection of new engine
    • server/internal/registry: reintroduce pruning on model deletion (#9489) · 7a01ad76
      Blake Mizerany authored
      This reintroduces aggressive pruning on model deletion as a temporary
      measure until a more controlled garbage collection (GC) mechanism is
      implemented.
      
      Issues with the current approach:
      
      1. Users may accidentally delete a model (`ollama rm llama3.3` instead
         of `ollama rm llama3.2`), requiring a full re-download unless another
         model references the same blobs.
      
      2. Users may assume a deleted model is still referenced elsewhere, but
         due to prior updates or deletions, the references no longer exist,
         leading to unnecessary re-downloads.
      
      Soon, we should implement a structured GC mechanism to retain
      unreferenced blobs for a configurable period before removal, which will
      run on "ollama rm" and other commands we deem appropriate.
      
      Users that want to immediately remove unreferenced blobs can use a new
      prune command that will allow them to specify the age and class of blobs
      to remove.
      
      Example usage:
      
          # Run basic blob GC
          $ ollama prune
      
          # Remove unreferenced blobs older than 7 days
          $ ollama prune --age 7d
      
          # Remove all blobs, referenced or not, older than 7 days (and their manifests?)
          $ ollama prune --age 7d --all
      
          # Remove all unreferenced blobs immediately
          $ ollama prune --age 0 --all
      
          # Remove all blobs
          $ ollama prune --age 0 --all
      
      This should provide a safer and more predictable cleanup process.
    • server/.../backoff,syncs: don't break builds without synctest (#9484) · 55ab9f37
      Blake Mizerany authored
      Previously, developers without the synctest experiment enabled would see
      build failures when running tests in some server/internal/internal
      packages using the synctest package. This change makes the transition to
      use of the package less painful by guarding the use of the synctest
      package with build tags.
      
      synctest is enabled in CI. If a new change will break a synctest
      package, it will break in CI, even if it does not break locally.
      
      The developer docs have been updated to help with any confusion about
      why package tests pass locally but fail in CI.
  24. 03 Mar, 2025 1 commit
    • server/internal/client/ollama: hold DiskCache on Registry (#9463) · 3519dd1c
      Blake Mizerany authored
      Previously, using a Registry required a DiskCache to be passed in for
      use in various methods. This was a bit cumbersome, as the DiskCache is
      required for most operations, and the DefaultCache is used in most of
      those cases. This change makes the DiskCache an optional field on the
      Registry struct.
      
      This also changes DefaultCache to initialize on first use. This is to
      not burden clients with the cost of creating a new cache per use, or
      having to hold onto a cache for the lifetime of the Registry.
      
      Also, slip in some minor docs updates for Trace.
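
      One way to get "optional field, default initialized on first use" in
      Go is sync.OnceValues. This is only a sketch: DiskCache and Registry
      here are simplified stand-ins, and the Cache field name, the cache
      method, the temp directory, and the opens counter are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// DiskCache is a simplified stand-in for the real cache type.
type DiskCache struct{ dir string }

// opens counts how many times the default cache is actually created.
var opens int

// defaultCache creates the shared cache on first call and returns the same
// result on every later call.
var defaultCache = sync.OnceValues(func() (*DiskCache, error) {
	opens++
	dir := filepath.Join(os.TempDir(), "example-cache")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, err
	}
	return &DiskCache{dir: dir}, nil
})

// Registry is a simplified stand-in. Cache is optional; nil means the
// default cache is used.
type Registry struct {
	Cache *DiskCache
}

// cache returns the explicitly configured cache or falls back to the
// lazily created default.
func (r *Registry) cache() (*DiskCache, error) {
	if r.Cache != nil {
		return r.Cache, nil
	}
	return defaultCache()
}

func main() {
	a, errA := (&Registry{}).cache()
	b, errB := (&Registry{}).cache()
	fmt.Println(errA, errB, a == b, opens)
}
```

      Callers that never touch the default pay nothing for it, and callers
      that do all share one instance without having to pass it around.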
  25. 02 Mar, 2025 1 commit
    • server/internal/client/ollama: handle extended names in client/ollama (#9454) · ee048b76
      Blake Mizerany authored
      The extended name format is a superset of the name format that only the
      client needs to know about, not the server or other dependents of the
      name package, so move the split logic into the client package.
      
      Also, take advantage of knowing about the extended name format to allow
      the client to use the extended name format when unlinking to verify they
      are unlinking the manifest with the content they intend.
  26. 01 Mar, 2025 2 commits
    • server/internal/internal/names: validate names (#9400) · cda6f5c6
      Blake Mizerany authored
      This commit is a step towards a goal to make names less ceremonial
      outside of the registry client. Clients of the registry package can
      treat names as opaque strings, and the registry package will handle
      parsing, validating, and normalizing names.
      
      Ideally we end up with the names package tucked away in an internal
      package for good. We'll see how things go.
      
      Also, this package name is not permanent. This is another step in the
      on-going process of refactoring the server code, and at some point it
      will most likely be renamed/moved.
    • server: validate local path on safetensor create (#9379) · bebb6823
      Bruce MacDonald authored
      More validation during the safetensor creation process:
      
      * Properly handle relative paths (like ./model.safetensors) while rejecting absolute paths
      * Add comprehensive test coverage for various paths
      * No functionality changes for valid inputs; existing workflows remain unaffected
      * Leverage Go 1.24's new os.Root functionality for secure containment
  27. 28 Feb, 2025 1 commit
  28. 27 Feb, 2025 1 commit