1. 18 Dec, 2025 3 commits
  2. 17 Dec, 2025 1 commit
  3. 12 Dec, 2025 1 commit
  4. 04 Dec, 2025 1 commit
  5. 01 Dec, 2025 1 commit
    • Bruce MacDonald's avatar
      api/client: handle non-json streaming errors (#13007) · 5b6a8e60
      Bruce MacDonald authored
      While processing the response stream during a chat or generation if an error is occurred it is parsed and returned to the user. The issue with the existing code is that this assumed the response would be valid JSON, which is not a safe assumption and caused cryptic error messages to be displayed due to parsing failures:
      `invalid character 'i' looking for beginning of value`
      
      This change updates the stream function to return the raw error string if it cant be parsed as JSON. This should help with debugging issues by making sure the actual error reaches the user.
      5b6a8e60
  6. 13 Nov, 2025 1 commit
  7. 11 Nov, 2025 1 commit
    • Baptiste Jamin's avatar
      server: add logprobs and top_logprobs support to Ollama's API (#12899) · 59241c5b
      Baptiste Jamin authored
      
      
      Adds logprobs support to Ollama's API including support for Ollama's
      OpenAI-compatible API. By specifying the new 'logprobs' boolean parameter
      in the API, Ollama will return the log probabilities for each token generated.
      'top_logprobs', an integer value can also be specified up to the value 20.
      When specified, the API will also provide the number of most likely tokens to
      return at each token position
      Co-authored-by: default avatarBaptiste Jamin <baptiste@crisp.chat>
      59241c5b
  8. 06 Nov, 2025 1 commit
  9. 05 Nov, 2025 1 commit
  10. 15 Oct, 2025 1 commit
  11. 13 Oct, 2025 1 commit
    • Grace's avatar
      Qwen3VL Cloud Parser and Renderer (#12526) · 05982a95
      Grace authored
      
      
      * working (other than tool call is the incorrect order) for tool calls and tools
      
      * Tests work, other than image tags (tests do not go through server) and tools (not in the correct order, but contents are the same)
      
      * testing for qwen3vl parser - toolparser is working
      
      * made changes to JSON tool parser, wraps the TollCallFunction with a TollCall object
      
      * Working parser for thinking models - assumes state of thinking, emits unambiguous content in thinking, does not call tool call in thinking
      
      * changed the parser to start with collecting content
      
      * thinking prefill
      
      * add hasThinkingSupport parameter to parser
      
      * qwen3-vl -> qwen3-vl-instruct for renderer/parser
      
      * Add hasThinkingSupport=false to QwenVLParser
      
      ---------
      Co-authored-by: default avatarDevon Rifkin <drifkin@drifkin.net>
      05982a95
  12. 11 Oct, 2025 1 commit
  13. 09 Oct, 2025 2 commits
  14. 08 Oct, 2025 1 commit
  15. 23 Sep, 2025 1 commit
    • Patrick Devine's avatar
      auth: fix problems with the ollama keypairs (#12373) · 64883e3c
      Patrick Devine authored
      * auth: fix problems with the ollama keypairs
      
      This change adds several fixes including:
        - reading in the pubkey files correctly
        - fixing the push unit test to create a keypair file in a temp directory
        - not return 500 errors for normal status error
      64883e3c
  16. 17 Sep, 2025 1 commit
  17. 15 Sep, 2025 1 commit
    • Devon Rifkin's avatar
      add qwen3-coder tool support · 47991940
      Devon Rifkin authored
      The format qwen3-coder uses is relatively unique, both in rendering and
      in parsing. To implement parsing, I wrote a custom parser in similar
      style to harmony. For the rendering, I found that the logic would be
      much more difficult to follow in a template, so I introduced the concept
      of a built-in renderer that uses go code, rather than a template to
      generate prompts.
      
      I set us up for future built-in parsers and renderers by making it so
      they can be specified in a Modelfile like so:
      
      ```
      RENDERER "qwen3-coder"
      PARSER "qwen3-coder"
      ```
      
      These need to be provided explicitly because the architecture alone is
      not enough to understand what format the model expects to receive, and
      what format we expect it to output (e.g., qwen3-coder is `qwen3moe`,
      which includes other qwen3-family models as well)
      
      I haven't converted harmony to be one of these "built-ins" yet, since
      some of it is in flux with the changes @ParthSareen has been making to
      move harmony to the runner. It is likely that many other built-ins will
      need to move to the runner as well, but I'm able to slightly defer that
      decision since qwen3-coder doesn't have thinking (and therefore doesn't
      need to be in the runner to make structured outputs work). I expect to
      unify harmony with this approach very soon.
      
      Whether a particular model supports tools or thinking was previously
      inferred from templates, but without a template we now also use the
      parser itself to declare what it supports. If we have future models that
      re-use the same parsing format, but have different capabilities, we'll
      want to parameterize them and give them different names to be specified
      as a `PARSER`.
      
      Misc changes:
      
      - I worked on the renderer by diffing outputs from the reference
        implementation and ours. To make it easier to do this, I extended
        <https://github.com/ollama/ollama/pull/11875> to also support
        returning the prompt via the openai compat layer
      47991940
  18. 11 Sep, 2025 1 commit
  19. 27 Aug, 2025 1 commit
  20. 22 Aug, 2025 1 commit
  21. 15 Aug, 2025 1 commit
  22. 12 Aug, 2025 1 commit
  23. 05 Aug, 2025 2 commits
    • Devon Rifkin's avatar
      tools: support anyOf types · 30f8a68c
      Devon Rifkin authored
      afaik gpt-oss is the first model that meaningfully transforms tool
      function definitions in its template. We found that relatively common
      definitions that include `anyOf` were not working because the template
      was assuming that types were always defined via a `type` field.
      
      anyOf allows for fully recursive types, so I exposed a
      `toTypeScriptType()` function to handle this recursive logic in go and
      keep the templates cleaner. The gpt-oss templates will need to be
      updated to use this.
      
      We should keep building out our function definition support to more
      fully support the parts of json schema that make sense for this use
      case, but in the meantime this will unblock some users (e.g., zed's
      ollama integration w/ gpt-oss). Probably the most urgent is proper array
      support
      30f8a68c
    • Michael Yang's avatar
      gpt-oss (#11672) · fa7776fd
      Michael Yang authored
      
      
      * bf16
      
      * tests
      
      * gpt-oss
      
      * enable gptoss for engine
      
      * rough estimate
      
      * convert to mxfp4
      
      * handle safetensors U8
      
      * clamp glu/linear
      
      * update tokenizer
      
      * MXFP4 support
      
      This implements the Open Compute Microscaling (MX) FP4 format
      as a tensor type with backend implementations focusing
      on mulmat and mulmatid on CPU, CUDA, and Metal.
      
      * Unit tests for MXFP4 support
      
      This exercises various operations and shapes on both CPU and GPU (if detected
      on the system)
      
      * cuda graph
      
      * unit test adjustments
      
      * cuda: optimize memory access
      
      Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4
      
      * mac: fix crash on old macos versions
      
      cblas_sgemm is only supported on v13.3 and up, however bf16 is
      only supported on v14+ so we were falling back to ggml-blas and
      crashing on bf16 tensors.  Checking for the function being null
      seems to be the simplest way to condittionally avoid registering the
      backend.
      
      * server: Minimum context length for gptoss
      
      This model requires a minimum context length of 8192 to function
      effectively. Users can set higher values through all normal mechanisms
      but lower values will be silently reset.
      
      * ggml: Multiply by numParallel for gptoss sliding window
      
      When computing the graph size estimate, the context size is already
      multiplied by numParallel so estimates reflect that. However, since
      sliding window models use a smaller, fixed context size, they need
      to manually take numParallel into account.
      
      * gpt-oss integration
      
      includes harmony parser and thinking levels, etc.
      
      * fix sync
      
      * fix tests
      
      * fix lint
      
      ---------
      Co-authored-by: default avatarDaniel Hiltgen <daniel@ollama.com>
      Co-authored-by: default avatarJesse Gross <jesse@ollama.com>
      Co-authored-by: default avatarDevon Rifkin <drifkin@drifkin.net>
      fa7776fd
  24. 16 Jul, 2025 1 commit
    • Bruce MacDonald's avatar
      api: fix unreachable status err (#11423) · 92c2e8a5
      Bruce MacDonald authored
      StatusError was unreachable, the client always checked for error messages in the response body first, and the server always includes error messages with HTTP error status codes.
      92c2e8a5
  25. 08 Jul, 2025 1 commit
  26. 07 Jul, 2025 1 commit
  27. 07 Jun, 2025 1 commit
  28. 04 Jun, 2025 1 commit
  29. 29 May, 2025 1 commit
    • Devon Rifkin's avatar
      add thinking support to the api and cli (#10584) · 5f57b0ef
      Devon Rifkin authored
      - Both `/api/generate` and `/api/chat` now accept a `"think"`
        option that allows specifying whether thinking mode should be on or
        not
      - Templates get passed this new option so, e.g., qwen3's template can
        put `/think` or `/no_think` in the system prompt depending on the
        value of the setting
      - Models' thinking support is inferred by inspecting model templates.
        The prefix and suffix the parser uses to identify thinking support is
        also automatically inferred from templates
      - Thinking control & parsing is opt-in via the API to prevent breaking
        existing API consumers. If the `"think"` option is not specified, the
        behavior is unchanged from previous versions of ollama
      - Add parsing for thinking blocks in both streaming/non-streaming mode
        in both `/generate` and `/chat`
      - Update the CLI to make use of these changes. Users can pass `--think`
        or `--think=false` to control thinking, or during an interactive
        session they can use the commands `/se...
      5f57b0ef
  30. 27 May, 2025 1 commit
  31. 08 May, 2025 2 commits
  32. 07 May, 2025 1 commit
  33. 05 May, 2025 1 commit
  34. 24 Apr, 2025 1 commit
  35. 10 Apr, 2025 1 commit