1. 01 Mar, 2025 1 commit
    • server: validate local path on safetensor create (#9379) · bebb6823
      Bruce MacDonald authored
      More validation during the safetensor creation process:
      - Properly handle relative paths (like ./model.safetensors) while rejecting absolute paths
      - Add comprehensive test coverage for various paths
      - No functional changes for valid inputs; existing workflows remain unaffected
      - Leverage Go 1.24's new os.Root functionality for secure containment
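      As a hedged illustration (not the server's actual code; openSafetensor,
      modelDir, and userPath are hypothetical names), the containment pattern
      with os.Root looks roughly like this:

          import "os"

          // openSafetensor opens the models directory as an os.Root (Go 1.24+)
          // and resolves the user-supplied path inside it, so escapes such as
          // "../x" or absolute paths are rejected by the Root itself.
          func openSafetensor(modelDir, userPath string) (*os.File, error) {
              root, err := os.OpenRoot(modelDir)
              if err != nil {
                  return nil, err
              }
              defer root.Close() // the returned file remains valid after this
              return root.Open(userPath) // "./model.safetensors" is fine; "/etc/passwd" errors
          }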
  2. 28 Feb, 2025 1 commit
  3. 27 Feb, 2025 2 commits
    • server/internal/registry: implement CloseNotify and Flush (for now) (#9402) · 41dc2804
      Blake Mizerany authored
      This fixes panics introduced in 2412adf4
      when Gin ungracefully assumes that the http.ResponseWriter implements
      http.CloseNotifier and http.Flusher, which our new statusCodeRecorder
      does not. This is a temporary fix until we can pour the rest of the Gin
      out.
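      The shape of the fix, as a sketch (the real statusCodeRecorder's fields
      differ): forward Flush and CloseNotify to the wrapped ResponseWriter
      when it supports them, so Gin's type assertions stop panicking.

          import "net/http"

          type statusCodeRecorder struct {
              http.ResponseWriter     // the wrapped writer
              status int              // assumed field for the recorded status
          }

          func (r *statusCodeRecorder) Flush() {
              if f, ok := r.ResponseWriter.(http.Flusher); ok {
                  f.Flush()
              }
          }

          func (r *statusCodeRecorder) CloseNotify() <-chan bool {
              if cn, ok := r.ResponseWriter.(http.CloseNotifier); ok {
                  return cn.CloseNotify()
              }
              return make(chan bool) // never fires; a safe stand-in
          }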
    • server/internal: replace model delete API with new registry handler. (#9347) · 2412adf4
      Blake Mizerany authored
      This commit introduces a new API implementation for handling
      interactions with the registry and the local model cache. The new API is
      located in server/internal/registry. The package name is "registry" and
      should be considered temporary; it is hidden and not bleeding outside of
      the server package. As the commits roll in, we'll start consuming more
      of the API and then let reverse osmosis take effect, at which point it
      will surface closer to the root level packages as much as needed.
  4. 25 Feb, 2025 1 commit
    • server/internal: copy bmizerany/ollama-go to internal package (#9294) · 348b3e09
      Blake Mizerany authored
      This commit copies (without history) the bmizerany/ollama-go repository
      with the intention of integrating it into Ollama as a replacement for
      pushing and pulling models, and for managing the cache they are pushed
      to and pulled from.
      
      New homes for these packages will be determined as they are integrated
      and we have a better understanding of proper package boundaries.
  5. 22 Feb, 2025 1 commit
    • server: group routes by category and purpose (#9270) · 68bac1e0
      Blake Mizerany authored
      The route assembly in Handler lacked clear organization, making it
      difficult to scan for routes and their relationships to each other. This
      commit aims to fix that by reordering the assembly of routes to group
      them by category and purpose.
      
      Also, be more specific about what "config" refers to (it is about CORS
      if you were wondering... I was.)
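      In that spirit, grouped registration might read like the sketch below;
      the handler names are placeholders and the paths are only a small subset
      of Ollama's API.

          import "github.com/gin-gonic/gin"

          func stub(c *gin.Context) {} // placeholder handler

          func routes() *gin.Engine {
              r := gin.New()

              // Model management
              r.POST("/api/create", stub)
              r.POST("/api/pull", stub)
              r.DELETE("/api/delete", stub)

              // Inference
              r.POST("/api/generate", stub)
              r.POST("/api/chat", stub)

              return r
          }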
  6. 20 Feb, 2025 2 commits
  7. 14 Feb, 2025 3 commits
    • Runner for Ollama engine · ed443a03
      Jesse Gross authored
      This provides integration with the new Ollama engine
      (58245413 next ollama runner (#7913)) and the rest of the Ollama
      infrastructure such as the runner and Ollama server.
      
      In addition, it also builds out the KV cache infrastructure to
      support the requirements of how Ollama runs models, such as:
       - Parallel processing
       - Memory management for defragmentation and shifting
       - Multi-modal models
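      Purely as a hypothetical sketch of what those requirements imply for the
      cache's surface (the engine's real interface differs in detail):

          // Cache is an illustrative KV cache interface, not the real one.
          type Cache interface {
              // StartForward reserves slots for a forward pass over one of
              // several parallel sequences.
              StartForward(seq, numTokens int) error
              // Remove discards a token range, shifting later positions down
              // when the context window slides.
              Remove(seq, beginIndex, endIndex int) error
              // Defrag compacts fragmented slots so large batches fit again.
              Defrag()
          }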
      
      Both old and new engines continue to be supported. By default, only
      the old engine is used. To enable the new engine:
      
      Start the server with the OLLAMA_NEW_ENGINE environment variable set:
      OLLAMA_NEW_ENGINE=1 ./ollama serve
      
      Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
      ./ollama run jessegross/llama3.1
    • models: Move model into their own directory · 6945617a
      Jesse Gross authored
      This allows the list of models to live in its own file rather than
      being mixed into the runner code.
    • next ollama runner (#7913) · 58245413
      Michael Yang authored
      
      
      feat: add new Ollama engine using ggml through cgo
      
      This change introduces a new way to run pretrained models. It introduces three high-level interfaces and a number of smaller helper interfaces to facilitate this.
      
      - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
      - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
      - `ml.Tensor` defines the interface for a tensor and tensor operations
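      Condensed, illustrative forms of the three interfaces (the real
      definitions in model/model.go and ml/backend.go carry more methods;
      Context is a stand-in for the backend's compute context):

          type Context interface{} // stand-in for the backend compute context

          type Tensor interface {
              Shape() []int
              Mulmat(ctx Context, t2 Tensor) Tensor // one of many tensor ops
          }

          type Backend interface {
              Get(name string) Tensor // fetch a loaded, pretrained weight by name
          }

          type Model interface {
              Forward(ctx Context, inputs Tensor) (Tensor, error) // generates completions
          }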
      
      This is the first implementation of the new engine. Follow up PRs will implement more features:
      
      - non-greedy sampling (#8410)
      - integration with Ollama and KV caching (#8301)
      - more model support (#9080) with more coming soon
      Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
  8. 05 Feb, 2025 3 commits
  9. 29 Jan, 2025 1 commit
    • next build (#8539) · dcfb7a10
      Michael Yang authored
      
      
      * add build to .dockerignore
      
      * test: only build one arch
      
      * add build to .gitignore
      
      * fix ccache path
      
      * filter amdgpu targets
      
      * only filter if autodetecting
      
      * Don't clobber gpu list for default runner
      
      This ensures the GPU specific environment variables are set properly
      
      * explicitly set CXX compiler for HIP
      
      * Update build_windows.ps1
      
      This isn't complete, but is close.  Dependencies are missing, and it only builds the "default" preset.
      
      * build: add ollama subdir
      
      * add .git to .dockerignore
      
      * docs: update development.md
      
      * update build_darwin.sh
      
      * remove unused scripts
      
      * llm: add cwd and build/lib/ollama to library paths
      
      * default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
      
      * add additional cmake output vars for msvc
      
      * interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12
      
      * remove unnecessary filepath.Dir, cleanup
      
      * add hardware-specific directory to path
      
      * use absolute server path
      
      * build: linux arm
      
      * cmake install targets
      
      * remove unused files
      
      * ml: visit each library path once
      
      * build: skip cpu variants on arm
      
      * build: install cpu targets
      
      * build: fix workflow
      
      * shorter names
      
      * fix rocblas install
      
      * docs: clean up development.md
      
      * consistent build dir removal in development.md
      
      * silence -Wimplicit-function-declaration build warnings in ggml-cpu
      
      * update readme
      
      * update development readme
      
      * llm: update library lookup logic now that there is one runner (#8587)
      
      * tweak development.md
      
      * update docs
      
      * add windows cuda/rocm tests
      
      ---------
      Co-authored-by: jmorganca <jmorganca@gmail.com>
      Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
  10. 15 Jan, 2025 1 commit
  11. 09 Jan, 2025 1 commit
  12. 08 Jan, 2025 1 commit
  13. 01 Jan, 2025 1 commit
  14. 23 Dec, 2024 1 commit
  15. 15 Dec, 2024 1 commit
  16. 11 Dec, 2024 1 commit
  17. 10 Dec, 2024 3 commits
    • 757eeacc
      frob authored
    • build: Make target improvements (#7499) · 4879a234
      Daniel Hiltgen authored
      * llama: wire up builtin runner
      
      This adds a new entrypoint into the ollama CLI to run the cgo-built runner.
      On Mac arm64, this will have GPU support, but on all other platforms it will
      be the lowest common denominator CPU build. After we fully transition
      to the new Go runners, more tech debt can be removed and we can stop building
      the "default" runner via make and rely on the builtin always.
      
      * build: Make target improvements
      
      Add a few new targets and help for building locally.
      This also adjusts the runner lookup to favor local builds, then
      runners relative to the executable, and finally payloads.
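      That lookup order, as a sketch (directory names are assumptions, not the
      Makefile's actual layout):

          import (
              "os"
              "path/filepath"
          )

          // findRunners probes candidate directories in the order described
          // above and returns the first one that exists.
          func findRunners(exeDir, payloadDir string) (string, bool) {
              candidates := []string{
                  "build/runners",                  // 1. local builds in the source tree
                  filepath.Join(exeDir, "runners"), // 2. relative to the executable
                  payloadDir,                       // 3. extracted payloads, last
              }
              for _, dir := range candidates {
                  if info, err := os.Stat(dir); err == nil && info.IsDir() {
                      return dir, true
                  }
              }
              return "", false
          }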
      
      * Support customized CPU flags for runners
      
      This implements a simplified custom CPU flags pattern for the runners.
      When built without overrides, the runner name contains the vector flag
      we check for (AVX) to ensure we don't try to run on unsupported systems
      and crash.  If the user builds a customized set, we omit the naming
      scheme and don't check for compatibility.  This avoids checking
      requirements at runtime, so that logic has been removed as well.  This
      can be used to build GPU runners with no vector flags, or CPU/GPU
      runners with additional flags (e.g. AVX512) enabled.
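      A hypothetical sketch of that naming-scheme check: if a runner's name
      advertises a vector extension, verify the host CPU supports it before
      selecting the runner.

          import (
              "strings"

              "golang.org/x/sys/cpu"
          )

          func runnerUsable(name string) bool {
              if strings.Contains(name, "avx2") { // test avx2 before avx ("avx2" contains "avx")
                  return cpu.X86.HasAVX2
              }
              if strings.Contains(name, "avx") {
                  return cpu.X86.HasAVX
              }
              return true // custom builds omit the naming scheme, so no check
          }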
      
      * Use relative paths
      
      If the user checks out the repo in a path that contains spaces, make gets
      really confused so use relative paths for everything in-repo to avoid breakage.
      
      * Remove payloads from main binary
      
      * install: clean up prior libraries
      
      This removes support for v0.3.6 and older versions (before the tar bundle)
      and ensures we clean up prior libraries before extracting the bundle(s).
      Without this change, runners and dependent libraries could leak when we
      update and lead to subtle runtime errors.
  18. 09 Dec, 2024 1 commit
    • prompt: Don't trim whitespace from prompts · 900f64e6
      Jesse Gross authored
      Newlines can be an important part of a user's prompt, and trimming
      them can alter the results. We previously only trimmed prompts with
      images, but refactoring brought this behavior to all prompts, where
      it became more noticeable.
      
      The /generate endpoint adds less whitespace and therefore doesn't
      need to trim it out - this brings the same behavior to /chat.
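      A tiny illustration of why trimming hurts (not the server's code):

          prompt := "Complete the list:\n1. "
          trimmed := strings.TrimSpace(prompt)
          // trimmed == "Complete the list:\n1." -- the trailing space after
          // "1." is gone, changing what the model is asked to continue.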
      
      Thanks to @gabe-l-hart for spotting the issue!
      
      Fixes #7795
  19. 05 Dec, 2024 2 commits
  20. 30 Nov, 2024 2 commits
  21. 27 Nov, 2024 1 commit
  22. 25 Nov, 2024 1 commit
    • server: fix Transport override (#7834) · 2b7ed61c
      Blake Mizerany authored
      This changes makeRequest to update the http client Transport if and only
      if testMakeRequestDialContext is set. This avoids overriding the
      default Transport when testMakeRequestDialContext is nil, which broke
      existing behavior, including proxies, timeouts, and other settings.
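      The guard described, as a simplified sketch (imports elided):

          // testMakeRequestDialContext is a test hook; it is nil outside of tests.
          var testMakeRequestDialContext func(ctx context.Context, network, addr string) (net.Conn, error)

          func requestClient() *http.Client {
              if testMakeRequestDialContext == nil {
                  return http.DefaultClient // keep default proxy, timeout, etc. behavior
              }
              return &http.Client{
                  Transport: &http.Transport{DialContext: testMakeRequestDialContext},
              }
          }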
      
      Fixes #7829
      Fixes #7788
  23. 23 Nov, 2024 1 commit
  24. 22 Nov, 2024 1 commit
    • server: remove out of date anonymous access check (#7785) · 7b5585b9
      Bruce MacDonald authored
      In the past the ollama.com server would return a JWT that contained
      information about the user being authenticated. This was used to return
      different error messages to the user. That is no longer possible, since
      the token used to authenticate no longer contains information about the
      user, so this removes the code that no longer works.

      Follow-up changes will improve the error messages returned here, but it
      is good to clean up first.
  25. 20 Nov, 2024 1 commit
  26. 19 Nov, 2024 1 commit
    • server: allow mixed-case model names on push, pull, cp, and create (#7676) · 4b8a2e34
      Blake Mizerany authored
      This change allows mixed-case model names to be pushed, pulled,
      copied, and created. This was previously disallowed because the Ollama
      registry was backed by a Docker registry that enforced a naming
      convention forbidding mixed-case names, which is no longer the case.

      This does not break existing, intended behaviors.

      Also, TestCase now tests a story of creating, updating, pulling, and
      copying a model with case variations, ensuring the model's manifest is
      updated correctly and not duplicated across different files with
      different case variations.
  27. 17 Nov, 2024 1 commit
  28. 06 Nov, 2024 1 commit
    • sched: Lift parallel restriction for multimodal models except mllama · 6cd56687
      Jesse Gross authored
      The Go runner does not have a problem with supporting parallel
      requests for most multimodal models. Now that we won't be potentially
      falling back to server.cpp, this restriction can be lifted.
      
      However, the new mllama model can't support parallel requests, so we
      will need to keep a restriction for that.
  29. 05 Nov, 2024 2 commits
    • One corrupt manifest should not wedge model operations (#7515) · a4c70fe1
      Daniel Hiltgen authored
      One potential failure mode is an empty file, which bubbles up as an EOF
      error and causes all pulls and listing operations to fail. Instead,
      continue and warn about the corrupt manifest. This also allows the
      corrupt manifest to be re-pulled to repair the system.
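      The shape of the fix, as a sketch (Manifest, readManifest, and
      listModels are hypothetical stand-ins):

          func listModels(paths []string) []Manifest {
              var models []Manifest
              for _, p := range paths {
                  m, err := readManifest(p) // an empty file surfaces as an EOF error here
                  if err != nil {
                      slog.Warn("skipping corrupt manifest", "path", p, "error", err)
                      continue // re-pulling the model repairs it
                  }
                  models = append(models, m)
              }
              return models
          }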
    • prompt: Use a single token when estimating mllama context size · 34a75102
      Jesse Gross authored
      Currently we assume that images take 768 tokens of context size for
      the purposes of clipping old messages that exceed the context window.
      However, our mllama implementation stores the full image embedding
      in a single token. As a result, there is significant waste of context
      space.
      
      Ideally, we would handle this more generically and have the
      implementation report the number of tokens. However, at the moment
      this would just result in a similar set of 'if' conditions in the
      runner plus APIs to report it back. So for now, we just keep this
      simple.
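      The estimate described, as a sketch (names are illustrative):

          func imageContextTokens(isMllama bool, numImages int) int {
              perImage := 768 // generic assumption used when clipping old messages
              if isMllama {
                  perImage = 1 // mllama stores the full image embedding in one token
              }
              return numImages * perImage
          }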