1. 13 Mar, 2025 2 commits
  2. 11 Mar, 2025 3 commits
  3. 05 Mar, 2025 1 commit
    • server/internal/registry: take over pulls from server package (#9485) · e2252d0f
      Blake Mizerany authored
      This commit replaces the old pull implementation in the server package
      with the new, faster, more robust pull implementation in the registry
      package.
      
      The new endpoint, and now the remove endpoint too, are behind the
      feature gate "client2", which is enabled only by setting the
      OLLAMA_EXPERIMENT environment variable to include "client2".
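A minimal sketch of how such a gate could be checked. The commit only says the variable must "include" the name, so the comma-separated list format and the helper name `experimentEnabled` are assumptions for illustration, not the server's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// experimentEnabled reports whether name appears in list.
// Assumption: the list is comma-separated; the real parsing may differ.
func experimentEnabled(list, name string) bool {
	for _, e := range strings.Split(list, ",") {
		if strings.TrimSpace(e) == name {
			return true
		}
	}
	return false
}

func main() {
	// Read the gate from the environment, as the commit message describes.
	enabled := experimentEnabled(os.Getenv("OLLAMA_EXPERIMENT"), "client2")
	fmt.Println("client2 enabled:", enabled)
}
```

Matching whole entries rather than substrings keeps a value such as "client22" from enabling the gate by accident.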
      
      Currently, the progress indication is wired to behave the same as in
      the previous implementation to avoid making changes to the CLI. Because
      the status reports happen only at the start of the download and at the
      end of the write to disk, the progress indication is not as smooth as
      it could be. This is a known issue and will be addressed in a future
      change.
      
      This implementation may be ~0.5-1.0% slower in rare cases, depending on
      network and disk speed, but is generally MUCH faster and more robust
      than its predecessor in all other cases.
  4. 04 Mar, 2025 3 commits
    • New engine: vision models and auto-fallback (#9113) · 1fdb351c
      Daniel Hiltgen authored
      * Include unified vision layers in memory prediction
      
      For newer vision models with a single gguf, include
      the projection estimates.
      
      * Adjust CLI to handle both styles of vision model metadata
      
      * Wire up new tokenizers for new engine
      
      If we're loading the new engine, utilize the new model
      text processor instead of calling into cgo wrappers for
      llama.cpp.  This also cleans up some tech debt from the
      older tokenization flow for the C++ server which was
      no longer used.
      
      This also adjusts the grammar handling logic to pass
      through to the new engine instead of utilizing the cgo
      schema to grammar call.
      
      * Lay foundation for auto selection of new engine
    • server/internal/registry: reintroduce pruning on model deletion (#9489) · 7a01ad76
      Blake Mizerany authored
      This reintroduces aggressive pruning on model deletion as a temporary
      measure until a more controlled garbage collection (GC) mechanism is
      implemented.
      
      Issues with the current approach:
      
      1. Users may accidentally delete a model (`ollama rm llama3.3` instead
         of `ollama rm llama3.2`), requiring a full re-download unless another
         model references the same blobs.
      
      2. Users may assume a deleted model is still referenced elsewhere, but
         due to prior updates or deletions, the references no longer exist,
         leading to unnecessary re-downloads.
      
      Soon, we should implement a structured GC mechanism to retain
      unreferenced blobs for a configurable period before removal, which will
      run on "ollama rm" and other commands we deem appropriate.
      
      Users that want to immediately remove unreferenced blobs can use a new
      prune command that will allow them to specify the age and class of blobs
      to remove.
      
      Example usage:
      
          # Run basic blob GC
          $ ollama prune
      
          # Remove unreferenced blobs older than 7 days
          $ ollama prune --age 7d
      
          # Remove all blobs, referenced or not, older than 7 days (and their manifests?)
          $ ollama prune --age 7d --all
      
          # Remove all unreferenced blobs immediately
          $ ollama prune --age 0
      
          # Remove all blobs
          $ ollama prune --age 0 --all
      
      This should provide a safer and more predictable cleanup process.
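The age and class selection described above can be sketched as follows. This is an illustration only: the `blob` record, the `prunable` helper, and the sample data are hypothetical, and the prune command itself is still a proposal in this commit message.

```go
package main

import (
	"fmt"
	"time"
)

// blob is a hypothetical record of one cached blob.
type blob struct {
	Digest     string
	ModTime    time.Time
	Referenced bool // true if some manifest still points at the blob
}

const week = 7 * 24 * time.Hour

// demoNow is a fixed clock so the example is deterministic.
var demoNow = time.Date(2025, time.March, 4, 0, 0, 0, 0, time.UTC)

// prunable returns the digests of blobs that are at least minAge old.
// Referenced blobs are skipped unless all is true.
func prunable(blobs []blob, now time.Time, minAge time.Duration, all bool) []string {
	var out []string
	for _, b := range blobs {
		if b.Referenced && !all {
			continue
		}
		if now.Sub(b.ModTime) >= minAge {
			out = append(out, b.Digest)
		}
	}
	return out
}

// sampleBlobs returns one old unreferenced, one fresh unreferenced,
// and one old referenced blob.
func sampleBlobs() []blob {
	return []blob{
		{Digest: "sha256:aaa", ModTime: demoNow.Add(-10 * 24 * time.Hour)},
		{Digest: "sha256:bbb", ModTime: demoNow.Add(-1 * time.Hour)},
		{Digest: "sha256:ccc", ModTime: demoNow.Add(-30 * 24 * time.Hour), Referenced: true},
	}
}

func main() {
	fmt.Println(prunable(sampleBlobs(), demoNow, week, false)) // like --age 7d
	fmt.Println(prunable(sampleBlobs(), demoNow, 0, false))    // like --age 0
	fmt.Println(prunable(sampleBlobs(), demoNow, week, true))  // like --age 7d --all
}
```

Keeping the selection separate from the deletion makes it easy to show the user what would be removed before anything is touched.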
    • server/.../backoff,syncs: don't break builds without synctest (#9484) · 55ab9f37
      Blake Mizerany authored
      Previously, developers without the synctest experiment enabled would see
      build failures when running tests in some server/internal/internal
      packages using the synctest package. This change makes the transition to
      the package less painful by guarding its use with build tags.
      
      synctest is enabled in CI. If a new change will break a synctest
      package, it will break in CI, even if it does not break locally.
      
      The developer docs have been updated to help with any confusion about
      why package tests pass locally but fail in CI.
  5. 03 Mar, 2025 1 commit
    • server/internal/client/ollama: hold DiskCache on Registry (#9463) · 3519dd1c
      Blake Mizerany authored
      Previously, using a Registry required a DiskCache to be passed in for
      use in various methods. This was a bit cumbersome, as the DiskCache is
      required for most operations, and the DefaultCache is used in most of
      those cases. This change makes the DiskCache an optional field on the
      Registry struct.
      
      This also changes DefaultCache to initialize on first use. This is to
      not burden clients with the cost of creating a new cache per use, or
      having to hold onto a cache for the lifetime of the Registry.
      
      Also, slip in some minor docs updates for Trace.
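The optional field with a lazily created default can be sketched like this. The type and field names mirror the message, but the bodies, the `cache` helper, the `opens` counter, and the directory are stand-ins assumed for illustration, not the package's real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// DiskCache is a stand-in for the real cache type.
type DiskCache struct{ Dir string }

// opens counts how many times the default cache was created.
var opens int

// defaultCache creates the shared cache on the first call only;
// later calls return the same value. The directory is made up.
var defaultCache = sync.OnceValues(func() (*DiskCache, error) {
	opens++
	return &DiskCache{Dir: "/tmp/models"}, nil
})

// Registry holds an optional cache. A nil Cache means "use the default".
type Registry struct {
	Cache *DiskCache
}

// cache returns the configured cache, falling back to the lazy default.
func (r *Registry) cache() (*DiskCache, error) {
	if r.Cache != nil {
		return r.Cache, nil
	}
	return defaultCache()
}

func main() {
	var a, b Registry
	c1, _ := a.cache()
	c2, _ := b.cache()
	fmt.Println("same default cache:", c1 == c2)
	fmt.Println("default created", opens, "time(s)")
}
```

With this shape the zero value of `Registry` is usable, and callers pay for cache creation only if they actually reach a method that needs it.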
  6. 02 Mar, 2025 1 commit
    • server/internal/client/ollama: handle extended names in client/ollama (#9454) · ee048b76
      Blake Mizerany authored
      The extended name format is a superset of the name format that only the
      client needs to know about, not the server or other dependents of the
      name package, so move the split logic into the client package.
      
      Also, take advantage of knowing about the extended name format to allow
      the client to use the extended name format when unlinking to verify they
      are unlinking the manifest with the content they intend.
  7. 01 Mar, 2025 2 commits
    • server/internal/internal/names: validate names (#9400) · cda6f5c6
      Blake Mizerany authored
      This commit is a step towards a goal to make names less ceremonial
      outside of the registry client. Clients of the registry package can
      treat names as opaque strings, and the registry package will handle
      parsing, validating, and normalizing names.
      
      Ideally we end up with the names package tucked away in an internal
      package for good. We'll see how things go.
      
      Also, this package name is not permanent. This is another step in the
      ongoing process of refactoring the server code, and at some point it
      will most likely be renamed/moved.
    • server: validate local path on safetensor create (#9379) · bebb6823
      Bruce MacDonald authored
      More validation during the safetensor creation process:
      - Properly handle relative paths (like ./model.safetensors) while rejecting absolute paths
      - Add comprehensive test coverage for various paths
      - No functionality changes for valid inputs - existing workflows remain unaffected
      - Leverages Go 1.24's new os.Root functionality for secure containment
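A small sketch of the containment idea using os.Root, which requires Go 1.24 or newer. The helper names `checkContained` and `demo` and the file layout are hypothetical; the server's real validation is not shown in this message.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkContained tries to open name inside dir through an os.Root.
// It returns an error if name resolves to a location outside dir.
func checkContained(dir, name string) error {
	root, err := os.OpenRoot(dir)
	if err != nil {
		return err
	}
	defer root.Close()
	f, err := root.Open(name)
	if err != nil {
		return err
	}
	return f.Close()
}

// demo builds a temporary layout with one file inside the root and one
// file next to it, then checks a contained path and an escaping path.
func demo() (insideErr, escapeErr error) {
	tmp, err := os.MkdirTemp("", "rootdemo")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(tmp)

	dir := filepath.Join(tmp, "model")
	if err := os.Mkdir(dir, 0o755); err != nil {
		return err, err
	}
	if err := os.WriteFile(filepath.Join(dir, "model.safetensors"), []byte("weights"), 0o644); err != nil {
		return err, err
	}
	if err := os.WriteFile(filepath.Join(tmp, "secret.txt"), []byte("secret"), 0o644); err != nil {
		return err, err
	}

	insideErr = checkContained(dir, "./model.safetensors")
	escapeErr = checkContained(dir, "../secret.txt")
	return insideErr, escapeErr
}

func main() {
	insideErr, escapeErr := demo()
	fmt.Println("relative path inside root:", insideErr)
	fmt.Println("path escaping root:", escapeErr)
}
```

The escaping path is refused even though the target file exists, because the check is enforced by the root handle rather than by string inspection of the path.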
  8. 28 Feb, 2025 1 commit
  9. 27 Feb, 2025 2 commits
    • server/internal/registry: implement CloseNotify and Flush (for now) (#9402) · 41dc2804
      Blake Mizerany authored
      This fixes panics introduced in 2412adf4
      when Gin ungracefully assumes that the http.ResponseWriter implements
      http.CloseNotifier and http.Flusher, which our new statusCodeRecorder
      does not. This is a temporary fix until we can pour the rest of the Gin
      out.
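The kind of wrapper involved can be sketched as below. This is an assumed shape: the real statusCodeRecorder is not shown in the message, so the type here is named `statusRecorder` and its forwarding behavior is illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusRecorder wraps a ResponseWriter and remembers the status code.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

// Compile-time checks that the wrapper satisfies both optional interfaces.
var (
	_ http.Flusher       = (*statusRecorder)(nil)
	_ http.CloseNotifier = (*statusRecorder)(nil)
)

// WriteHeader records the code before passing it on.
func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// Flush forwards to the wrapped writer when it supports flushing.
func (r *statusRecorder) Flush() {
	if f, ok := r.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

// CloseNotify forwards when supported; otherwise it returns a channel
// that never fires, so callers that select on it simply never see a close.
func (r *statusRecorder) CloseNotify() <-chan bool {
	if cn, ok := r.ResponseWriter.(http.CloseNotifier); ok {
		return cn.CloseNotify()
	}
	return make(chan bool)
}

// demo exercises the wrapper the way a framework might: through type
// assertions on a plain http.ResponseWriter.
func demo() (status int, flushed bool) {
	rec := httptest.NewRecorder()
	w := &statusRecorder{ResponseWriter: rec}
	var rw http.ResponseWriter = w
	rw.WriteHeader(http.StatusTeapot)
	rw.(http.Flusher).Flush()
	_ = rw.(http.CloseNotifier).CloseNotify()
	return w.status, rec.Flushed
}

func main() {
	status, flushed := demo()
	fmt.Println(status, flushed)
}
```

An unchecked type assertion such as `rw.(http.Flusher)` panics when the wrapper lacks the method, which matches the failure this commit describes.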
    • server/internal: replace model delete API with new registry handler. (#9347) · 2412adf4
      Blake Mizerany authored
      This commit introduces a new API implementation for handling
      interactions with the registry and the local model cache. The new API is
      located in server/internal/registry. The package name is "registry" and
      should be considered temporary; it is hidden and not bleeding outside of
      the server package. As the commits roll in, we'll start consuming more
      of the API and then let reverse osmosis take effect, at which point it
      will surface closer to the root level packages as much as needed.
  10. 25 Feb, 2025 1 commit
    • server/internal: copy bmizerany/ollama-go to internal package (#9294) · 348b3e09
      Blake Mizerany authored
      This commit copies (without history) the bmizerany/ollama-go repository
      with the intention of integrating it into ollama as a replacement for
      the pushing and pulling of models, and for the management of the cache
      they are pushed and pulled from.
      
      New homes for these packages will be determined as they are integrated
      and we have a better understanding of proper package boundaries.
  11. 22 Feb, 2025 1 commit
    • server: group routes by category and purpose (#9270) · 68bac1e0
      Blake Mizerany authored
      The route assembly in Handler lacked clear organization, making it
      difficult to scan for routes and their relationships to each other.
      This commit aims to fix that by reordering the assembly of routes to
      group them by category and purpose.
      
      Also, be more specific about what "config" refers to (it is about CORS
      if you were wondering... I was.)
  12. 20 Feb, 2025 2 commits
  13. 14 Feb, 2025 3 commits
    • Runner for Ollama engine · ed443a03
      Jesse Gross authored
      This provides integration with the new Ollama engine
      (58245413 next ollama runner (#7913)) and the rest of the Ollama
      infrastructure such as the runner and Ollama server.
      
      In addition, it also builds out the KV cache infrastructure to
      support requirements of how Ollama runs models such as:
       - Parallel processing
       - Memory management for defragmentation and shifting
       - Multi-modal models
      
      Both old and new engines continue to be supported. By default, only
      the old engine is used. To enable the new engine:
      
      Start the server with the OLLAMA_NEW_ENGINE environment variable set:
      OLLAMA_NEW_ENGINE=1 ./ollama serve
      
      Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
      ./ollama run jessegross/llama3.1
    • models: Move model into their own directory · 6945617a
      Jesse Gross authored
      This allows there to be a file that is a list of models that is
      not mixed into the runner code.
    • next ollama runner (#7913) · 58245413
      Michael Yang authored
      
      
      feat: add new Ollama engine using ggml through cgo
      
      This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.
      
      - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
      - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
      - `ml.Tensor` defines the interface for a tensor and tensor operations
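How the three interfaces relate can be sketched with toy stand-ins. The method sets below are heavily simplified assumptions and do not match the real signatures in `model/model.go` or `ml/backend.go`; `sliceTensor`, `mapBackend`, and `biasModel` are hypothetical.

```go
package main

import "fmt"

// Tensor is a toy stand-in for ml.Tensor: a value plus operations on it.
type Tensor interface {
	Add(other Tensor) Tensor
	Floats() []float32
}

// Backend is a toy stand-in for ml.Backend: it hands out loaded tensors.
type Backend interface {
	Get(name string) Tensor
}

// Model is a toy stand-in for model.Model: it defines the forward pass.
type Model interface {
	Forward(input Tensor) Tensor
}

// sliceTensor implements Tensor on top of a plain slice.
type sliceTensor []float32

// Add returns the element-wise sum; both tensors must have equal length.
func (t sliceTensor) Add(other Tensor) Tensor {
	o := other.Floats()
	out := make(sliceTensor, len(t))
	for i := range t {
		out[i] = t[i] + o[i]
	}
	return out
}

func (t sliceTensor) Floats() []float32 { return t }

// mapBackend implements Backend with tensors held in memory.
type mapBackend map[string]Tensor

func (b mapBackend) Get(name string) Tensor { return b[name] }

// biasModel is a one-layer "architecture" that adds a loaded bias tensor.
type biasModel struct{ bias Tensor }

func newBiasModel(b Backend) Model { return biasModel{bias: b.Get("bias")} }

func (m biasModel) Forward(input Tensor) Tensor { return input.Add(m.bias) }

// demo loads a model from a backend and runs one forward pass.
func demo() []float32 {
	backend := mapBackend{"bias": sliceTensor{10, 20}}
	m := newBiasModel(backend)
	return m.Forward(sliceTensor{1, 2}).Floats()
}

func main() {
	fmt.Println(demo())
}
```

The point of the split is that the model code only talks to `Tensor` and `Backend`, so a different tensor library could be swapped in behind the same interfaces.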
      
      This is the first implementation of the new engine. Follow up PRs will implement more features:
      
      - non-greedy sampling (#8410)
      - integration with Ollama and KV caching (#8301)
      - more model support (#9080) with more coming soon
      Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
  14. 05 Feb, 2025 3 commits
  15. 29 Jan, 2025 1 commit
    • next build (#8539) · dcfb7a10
      Michael Yang authored
      
      
      * add build to .dockerignore
      
      * test: only build one arch
      
      * add build to .gitignore
      
      * fix ccache path
      
      * filter amdgpu targets
      
      * only filter if autodetecting
      
      * Don't clobber gpu list for default runner
      
      This ensures the GPU specific environment variables are set properly
      
      * explicitly set CXX compiler for HIP
      
      * Update build_windows.ps1
      
      This isn't complete, but is close.  Dependencies are missing, and it only builds the "default" preset.
      
      * build: add ollama subdir
      
      * add .git to .dockerignore
      
      * docs: update development.md
      
      * update build_darwin.sh
      
      * remove unused scripts
      
      * llm: add cwd and build/lib/ollama to library paths
      
      * default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
      
      * add additional cmake output vars for msvc
      
      * interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12
      
      * remove unnecessary filepath.Dir, cleanup
      
      * add hardware-specific directory to path
      
      * use absolute server path
      
      * build: linux arm
      
      * cmake install targets
      
      * remove unused files
      
      * ml: visit each library path once
      
      * build: skip cpu variants on arm
      
      * build: install cpu targets
      
      * build: fix workflow
      
      * shorter names
      
      * fix rocblas install
      
      * docs: clean up development.md
      
      * consistent build dir removal in development.md
      
      * silence -Wimplicit-function-declaration build warnings in ggml-cpu
      
      * update readme
      
      * update development readme
      
      * llm: update library lookup logic now that there is one runner (#8587)
      
      * tweak development.md
      
      * update docs
      
      * add windows cuda/rocm tests
      
      ---------
      Co-authored-by: jmorganca <jmorganca@gmail.com>
      Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
  16. 15 Jan, 2025 1 commit
  17. 09 Jan, 2025 1 commit
  18. 08 Jan, 2025 1 commit
  19. 01 Jan, 2025 1 commit
  20. 23 Dec, 2024 1 commit
  21. 15 Dec, 2024 1 commit
  22. 11 Dec, 2024 1 commit
  23. 10 Dec, 2024 3 commits
    • frob's avatar
      757eeacc
    • Stefan Weil's avatar
    • build: Make target improvements (#7499) · 4879a234
      Daniel Hiltgen authored
      * llama: wire up builtin runner
      
      This adds a new entrypoint into the ollama CLI to run the cgo built runner.
      On Mac arm64, this will have GPU support, but on all other platforms it will
      be the lowest common denominator CPU build.  After we fully transition
      to the new Go runners more tech-debt can be removed and we can stop building
      the "default" runner via make and rely on the builtin always.
      
      * build: Make target improvements
      
      Add a few new targets and help for building locally.
      This also adjusts the runner lookup to favor local builds, then
      runners relative to the executable, and finally payloads.
      
      * Support customized CPU flags for runners
      
      This implements a simplified custom CPU flags pattern for the runners.
      When built without overrides, the runner name contains the vector flag
      we check for (AVX) to ensure we don't try to run on unsupported systems
      and crash.  If the user builds a customized set, we omit the naming
      scheme and don't check for compatibility.  This avoids checking
      requirements at runtime, so that logic has been removed as well.  This
      can be used to build GPU runners with no vector flags, or CPU/GPU
      runners with additional flags (e.g. AVX512) enabled.
      
      * Use relative paths
      
      If the user checks out the repo in a path that contains spaces, make gets
      really confused so use relative paths for everything in-repo to avoid breakage.
      
      * Remove payloads from main binary
      
      * install: clean up prior libraries
      
      This removes support for v0.3.6 and older versions (before the tar bundle)
      and ensures we clean up prior libraries before extracting the bundle(s).
      Without this change, runners and dependent libraries could leak when we
      update and lead to subtle runtime errors.
  24. 09 Dec, 2024 1 commit
    • prompt: Don't trim whitespace from prompts · 900f64e6
      Jesse Gross authored
      Newlines can be an important part of a user's prompt, and trimming
      them can alter the results. We previously only trimmed prompts with
      images, but refactoring brought this behavior to all prompts, where
      it became more noticeable.
      
      The /generate endpoint adds less whitespace and therefore doesn't
      need to trim it out - this brings the same behavior to /chat.
      
      Thanks to @gabe-l-hart for spotting the issue!
      
      Fixes #7795
  25. 05 Dec, 2024 2 commits