1. 22 Jul, 2025 1 commit
  2. 17 Jul, 2025 1 commit
  3. 16 Jul, 2025 1 commit
  4. 11 Jul, 2025 1 commit
  5. 08 Jul, 2025 2 commits
    • doc: add MacOS docs (#11334) · 66fb8575
      Daniel Hiltgen authored
      Also removes stale model directory instructions for Windows.
    • Reduce default parallelism to 1 (#11330) · 20c3266e
      Daniel Hiltgen authored
      The current scheduler algorithm of picking the parallelism based on available
      VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
      changes the default to 1, with the intent that, going forward, parallelism is
      explicit and will no longer be dynamically determined.  Removal of the dynamic
      logic will come in a follow-up.
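      A minimal sketch of opting back into concurrency once parallelism is
      explicit, assuming the existing OLLAMA_NUM_PARALLEL environment
      variable keeps its current meaning:

          # Default is now 1 concurrent request per loaded model.
          # Opt into higher parallelism explicitly:
          OLLAMA_NUM_PARALLEL=4 ollama serve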
  6. 07 Jul, 2025 2 commits
  7. 05 Jul, 2025 1 commit
  8. 23 Jun, 2025 1 commit
    • Re-remove cuda v11 (#10694) · 1c6669e6
      Daniel Hiltgen authored
      * Re-remove cuda v11
      
      Revert the revert - drop v11 support, requiring drivers newer than Feb 2023
      
      This reverts commit c6bcdc42.
      
      * Simplify layout
      
      With only one version of the GPU libraries, we can simplify things somewhat.  (Jetsons still require special handling.)
      
      * distinct sbsa variant for linux arm64
      
      This avoids accidentally trying to load the sbsa cuda libraries on
      a Jetson system, which results in crashes.
      
      * temporarily prevent ROCm+CUDA mixed loading
  9. 18 Jun, 2025 1 commit
  10. 07 Jun, 2025 2 commits
  11. 06 Jun, 2025 1 commit
  12. 04 Jun, 2025 1 commit
  13. 29 May, 2025 1 commit
    • add thinking support to the api and cli (#10584) · 5f57b0ef
      Devon Rifkin authored
      - Both `/api/generate` and `/api/chat` now accept a `"think"`
        option that specifies whether thinking mode should be enabled
        (see the request sketch after this list)
      - Templates get passed this new option so, e.g., qwen3's template can
        put `/think` or `/no_think` in the system prompt depending on the
        value of the setting
      - Models' thinking support is inferred by inspecting model templates.
        The prefix and suffix the parser uses to identify thinking support
        are also automatically inferred from templates
      - Thinking control & parsing is opt-in via the API to prevent breaking
        existing API consumers. If the `"think"` option is not specified, the
        behavior is unchanged from previous versions of ollama
      - Add parsing for thinking blocks in both streaming and non-streaming
        modes in both `/generate` and `/chat`
      - Update the CLI to make use of these changes. Users can pass `--think`
        or `--think=false` to control thinking, or during an interactive
        session they can use the commands `/set think` or `/set nothink`
      - A `--hidethinking` option has also been added to the CLI. This makes
        it easy to use thinking in scripting scenarios like
        `ollama run qwen3 --think --hidethinking "my question here"` where you
        just want to see the answer but still want the benefits of thinking
        models
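      A hedged sketch of the new option from a client's perspective: a
      non-streaming `/api/chat` request with `"think": true`. The `thinking`
      field on the response message is an assumption based on this commit's
      description of parsed thinking blocks, not a confirmed field name.

          package main

          import (
              "bytes"
              "encoding/json"
              "fmt"
              "net/http"
          )

          func main() {
              // Opt in to thinking; omitting "think" keeps the old behavior.
              body, _ := json.Marshal(map[string]any{
                  "model": "qwen3",
                  "messages": []map[string]string{
                      {"role": "user", "content": "my question here"},
                  },
                  "think":  true,
                  "stream": false,
              })
              resp, err := http.Post("http://localhost:11434/api/chat",
                  "application/json", bytes.NewReader(body))
              if err != nil {
                  panic(err)
              }
              defer resp.Body.Close()

              var out struct {
                  Message struct {
                      Thinking string `json:"thinking"` // parsed thinking block (assumed name)
                      Content  string `json:"content"`  // the final answer
                  } `json:"message"`
              }
              if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
                  panic(err)
              }
              fmt.Println(out.Message.Content)
          }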
  14. 24 May, 2025 1 commit
  15. 13 May, 2025 1 commit
  16. 12 May, 2025 1 commit
    • Follow up to #10363 (#10647) · 9d6df908
      Daniel Hiltgen authored
      The quantization PR didn't block all unsupported file types,
      which this PR fixes.  It also updates the API docs to reflect
      the now-reduced set of supported types.
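      For reference, quantizing at create time looks like the following;
      q4_K_M is an assumption about what remains in the supported set:

          # Quantize a model while creating it (q4_K_M assumed supported):
          ollama create mymodel -f Modelfile --quantize q4_K_M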
  17. 08 May, 2025 1 commit
  18. 07 May, 2025 1 commit
    • remove cuda v11 (#10569) · fa393554
      Daniel Hiltgen authored
      This reduces the size of our Windows installer payloads by ~256M by dropping
      support for nvidia drivers older than Feb 2023.  Hardware support is unchanged.
      
      Default Linux bundle sizes are reduced by ~600M to ~1G.
  19. 05 May, 2025 1 commit
  20. 29 Apr, 2025 1 commit
  21. 28 Apr, 2025 1 commit
  22. 22 Apr, 2025 1 commit
    • increase default context length to 4096 (#10364) · 424f6486
      Devon Rifkin authored
      * increase default context length to 4096
      
      We lower the default numParallel from 4 to 2 and use these "savings" to
      double the default context length from 2048 to 4096.
      
      We're memory-neutral in cases where we previously would've used
      numParallel == 4, but we add the following mitigation to handle some
      cases where we would have previously fallen back to 1x2048 due to low
      VRAM: we decide between 2048 and 4096 using a runtime check (sketched
      at the end of this entry), choosing 2048 if we're on a single-GPU
      system with total VRAM of <= 4 GB. We purposefully don't check the
      available VRAM because we don't want the context window size to change
      unexpectedly based on the available VRAM.
      
      We plan on making the default even larger, but this is a relatively
      low-risk change we can make to quickly double it.
      
      * fix tests
      
      add an explicit context length so they don't get truncated. The code
      that resolves -1 (the signal for doing a runtime check) isn't running
      as part of these tests.
      
      * tweak small gpu message
      
      * clarify context length default
      
      also make it actually show up in `ollama serve --help`
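      A hedged sketch of the runtime check described above; the identifiers
      are illustrative assumptions, not Ollama's actual ones:

          package main

          import "fmt"

          // pickDefaultContextLength picks the default based on *total*
          // (not available) VRAM, so the default never shifts with load.
          func pickDefaultContextLength(gpuCount int, totalVRAMGB float64) int {
              if gpuCount == 1 && totalVRAMGB <= 4 {
                  return 2048
              }
              return 4096
          }

          func main() {
              fmt.Println(pickDefaultContextLength(1, 4))  // 2048: small single-GPU box
              fmt.Println(pickDefaultContextLength(2, 48)) // 4096 otherwise
          }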
  23. 15 Apr, 2025 2 commits
  24. 08 Apr, 2025 1 commit
  25. 01 Apr, 2025 1 commit
  26. 27 Mar, 2025 1 commit
  27. 25 Mar, 2025 1 commit
  28. 21 Mar, 2025 2 commits
  29. 13 Mar, 2025 1 commit
  30. 10 Mar, 2025 1 commit
  31. 07 Mar, 2025 1 commit
    • Better WantedBy declaration · 25248f4b
      Martin Häcker authored
      The problem with default.target is that it always points to the target that is currently started. So if you boot into single-user mode or rescue mode, Ollama still tries to start.
      
      I noticed this because Ollama tried (and failed) to start all the time during a system update, where it definitely is not wanted.
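      A minimal sketch of the resulting unit change, assuming the fix moves
      the install target from default.target to multi-user.target:

          # ollama.service (excerpt)
          [Install]
          # Before: tied to whatever target the system boots into,
          # including rescue and single-user mode:
          #WantedBy=default.target
          # After: only started as part of a normal multi-user boot:
          WantedBy=multi-user.target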
  32. 05 Mar, 2025 1 commit
  33. 04 Mar, 2025 1 commit
    • server/.../backoff,syncs: don't break builds without synctest (#9484) · 55ab9f37
      Blake Mizerany authored
      Previously, developers without the synctest experiment enabled would see
      build failures when running tests in some server/internal/internal
      packages that use the synctest package. This change makes the transition
      to the package less painful by guarding use of synctest with build
      tags.
      
      synctest is enabled in CI. If a new change will break a synctest
      package, it will break in CI, even if it does not break locally.
      
      The developer docs have been updated to help with any confusion about
      why package tests pass locally but fail in CI.
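      A hedged sketch of the guard, assuming the goexperiment.synctest build
      tag that the Go toolchain sets when GOEXPERIMENT=synctest is enabled
      (the package and test name here are illustrative):

          //go:build goexperiment.synctest

          package backoff

          import (
              "testing"
              "testing/synctest"
          )

          // This file only compiles when the synctest experiment is on, so
          // developers without GOEXPERIMENT=synctest still get clean builds.
          func TestBackoffVirtualTime(t *testing.T) {
              synctest.Run(func() {
                  // ... test body running against synctest's fake clock ...
              })
          }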
  34. 27 Feb, 2025 1 commit
    • Windows ARM build (#9120) · 688925ac
      Daniel Hiltgen authored
      * Windows ARM build
      
      Skip cmake, and note in the developer docs that it's unused.
      
      * Win: only check for ninja when we need it
      
      On Windows ARM, the CIM lookup fails, but we don't need ninja anyway.
  35. 25 Feb, 2025 1 commit