    add thinking support to the api and cli (#10584) · 5f57b0ef
    Devon Rifkin authored
    - Both `/api/generate` and `/api/chat` now accept a `"think"`
      option that allows specifying whether thinking mode should be on or
      not
    - Templates get passed this new option so, e.g., qwen3's template can
      put `/think` or `/no_think` in the system prompt depending on the
      value of the setting
    - Models' thinking support is inferred by inspecting model templates.
      The prefix and suffix the parser uses to identify thinking support are
      also automatically inferred from templates
    - Thinking control & parsing is opt-in via the API to prevent breaking
      existing API consumers. If the `"think"` option is not specified, the
      behavior is unchanged from previous versions of ollama
    - Add parsing for thinking blocks, in streaming and non-streaming modes,
      to both `/api/generate` and `/api/chat`
    - Update the CLI to make use of these changes. Users can pass `--think`
      or `--think=false` to control thinking, or during an interactive
      session they can use the commands `/set think` or `/set nothink`
    - A `--hidethinking` option has also been added to the CLI. This makes
      it easy to use thinking in scripting scenarios like
      `ollama run qwen3 --think --hidethinking "my question here"` where you
      just want to see the answer but still want the benefits of thinking
      models
routes_generate_test.go 27.7 KB