"pytorch/cuda/moe_comm_kernel.cu" did not exist on "969ef607227740bd034a678a8855b90bf7ba15aa"
- 07 Jan, 2026 1 commit
Parth Sareen authored
- 29 May, 2025 1 commit
Devon Rifkin authored
- Both `/api/generate` and `/api/chat` now accept a `"think"` option that allows specifying whether thinking mode should be on or not
- Templates get passed this new option so, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting
- Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking support is also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like `ollama run qwen3 --think --hidethinking "my question here"` where you just want to see the answer but still want the benefits of thinking models
- 13 Aug, 2024 1 commit
Michael Yang authored
- fixes printf: non-constant format string in call to fmt.Printf
- fixes SA1032: arguments have the wrong order
- disables testifylint
- 26 Nov, 2023 1 commit
Jeffrey Morgan authored
Co-authored-by: Wen Sun <iwendellsun@gmail.com>
- 28 Oct, 2023 1 commit
Jeffrey Morgan authored
* don't quit ioloop on 0 rune
* check for closed channel
* remove unused error on `Close()`
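The "check for closed channel" point refers to the standard comma-ok receive idiom: a receive from a closed rune channel yields the zero rune, so a loop must test the `ok` flag rather than the value. A hedged sketch of the idiom (not the actual ollama readline loop):

```go
package main

import "fmt"

// readRunes drains a channel of runes, stopping only when the channel is
// closed, never when a zero-value rune happens to arrive.
func readRunes(ch <-chan rune) []rune {
	var out []rune
	for {
		r, ok := <-ch
		if !ok { // channel closed: exit cleanly
			return out
		}
		out = append(out, r) // a 0 rune by itself does not end the loop
	}
}

func main() {
	ch := make(chan rune, 3)
	ch <- 'h'
	ch <- 0 // must not terminate the loop
	ch <- 'i'
	close(ch)
	fmt.Printf("%d runes\n", len(readRunes(ch))) // prints "3 runes"
}
```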
- 26 Oct, 2023 1 commit
Patrick Devine authored
- 25 Oct, 2023 1 commit
Patrick Devine authored