1. 22 Aug, 2025 1 commit
    • thinking: fix double emit when no opening tag · 2cb0a580
      Devon Rifkin authored
      The thinking parser automatically becomes a pass-through if it sees
      non-whitespace before an opening tag. However, we weren't clearing the
      buffer after the first non-whitespace input, so in practice the first
      token was emitted twice.
      
      Added a test that demonstrated this, and then fixed the bug.
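
The behavior described above can be illustrated with a minimal sketch. This is not the actual ollama parser: the `Parser` type, its `Add` method, and the `overlap` helper are hypothetical names chosen for illustration, and details such as whitespace trimming around the closing tag are left out. The line marked as the fix is the one whose absence would cause the first token to be emitted twice.

```go
package main

import (
	"fmt"
	"strings"
)

type state int

const (
	lookingForOpening state = iota // nothing but whitespace or a partial tag seen so far
	thinking                       // inside the thinking block
	passThrough                    // everything is ordinary content
)

// Parser is a hypothetical, simplified stand-in for a thinking parser.
type Parser struct {
	OpeningTag string
	ClosingTag string
	state      state
	buf        string
}

// overlap returns the length of the longest suffix of s that is a
// proper prefix of tag, i.e. a possibly incomplete tag at the end of s.
func overlap(s, tag string) int {
	limit := len(tag) - 1
	if len(s) < limit {
		limit = len(s)
	}
	for n := limit; n > 0; n-- {
		if strings.HasSuffix(s, tag[:n]) {
			return n
		}
	}
	return 0
}

// Add feeds one streamed chunk and returns the newly available
// thinking text and content text.
func (p *Parser) Add(chunk string) (thought, content string) {
	p.buf += chunk
	for {
		switch p.state {
		case lookingForOpening:
			trimmed := strings.TrimLeft(p.buf, " \t\r\n")
			switch {
			case strings.HasPrefix(trimmed, p.OpeningTag):
				p.buf = strings.TrimPrefix(trimmed, p.OpeningTag)
				p.state = thinking
				continue
			case strings.HasPrefix(p.OpeningTag, trimmed):
				// Only whitespace or a partial opening tag: wait for more input.
				return thought, content
			default:
				// Non-whitespace before any opening tag: become a pass-through.
				p.state = passThrough
				continue
			}
		case thinking:
			if i := strings.Index(p.buf, p.ClosingTag); i >= 0 {
				thought += p.buf[:i]
				p.buf = p.buf[i+len(p.ClosingTag):]
				p.state = passThrough
				continue
			}
			// Hold back a possible partial closing tag at the end.
			keep := overlap(p.buf, p.ClosingTag)
			thought += p.buf[:len(p.buf)-keep]
			p.buf = p.buf[len(p.buf)-keep:]
			return thought, content
		case passThrough:
			content += p.buf
			p.buf = "" // the fix: without this, buffered text is emitted again on the next call
			return thought, content
		}
	}
}

func main() {
	p := &Parser{OpeningTag: "<think>", ClosingTag: "</think>"}
	for _, chunk := range []string{"Hello", ", world"} {
		thought, content := p.Add(chunk)
		fmt.Printf("thought=%q content=%q\n", thought, content)
	}
}
```

With the buffer cleared, the two calls yield `"Hello"` and `", world"` as content. If the reset were removed, the second call would return `"Hello, world"`, repeating the first token.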
  2. 06 Jun, 2025 1 commit
  3. 05 Jun, 2025 1 commit
  4. 29 May, 2025 1 commit
    • add thinking support to the api and cli (#10584) · 5f57b0ef
      Devon Rifkin authored
      - Both `/api/generate` and `/api/chat` now accept a `"think"`
        option that allows specifying whether thinking mode should be on or
        not
      - Templates get passed this new option so, e.g., qwen3's template can
        put `/think` or `/no_think` in the system prompt depending on the
        value of the setting
      - Models' thinking support is inferred by inspecting model templates.
        The prefix and suffix the parser uses to identify thinking support are
        also automatically inferred from templates
      - Thinking control & parsing is opt-in via the API to prevent breaking
        existing API consumers. If the `"think"` option is not specified, the
        behavior is unchanged from previous versions of ollama
      - Add parsing for thinking blocks in both streaming/non-streaming mode
        in both `/generate` and `/chat`
      - Update the CLI to make use of these changes. Users can pass `--think`
        or `--think=false` to control thinking, or during an interactive
        session they can use the commands `/se...
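
As a rough illustration of the opt-in `"think"` option described in this commit, the sketch below builds a JSON body for `/api/chat`. The `ChatRequest` and `Message` types and the `buildChatBody` helper are hypothetical and are not ollama's own Go types; the model name `qwen3` and the prompt are placeholders. Using a `*bool` with `omitempty` is one way to leave the field out entirely, which per the commit message leaves behavior unchanged from previous versions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a hypothetical minimal chat message.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest is a hypothetical minimal request body for /api/chat.
// Think is a pointer so that "not specified" differs from false.
type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Think    *bool     `json:"think,omitempty"`
}

// buildChatBody serializes a single-message chat request.
// Passing think == nil omits the "think" field altogether.
func buildChatBody(model, prompt string, think *bool) (string, error) {
	req := ChatRequest{
		Model:    model,
		Messages: []Message{{Role: "user", Content: prompt}},
		Think:    think,
	}
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	on := true
	for _, think := range []*bool{nil, &on} {
		body, err := buildChatBody("qwen3", "Why is the sky blue?", think)
		if err != nil {
			fmt.Println("marshal error:", err)
			return
		}
		fmt.Println(body)
	}
}
```

The first printed body has no `"think"` key, and the second ends with `"think":true`. Sending either body to a running server is left out so the sketch stays runnable without one.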