  1. 27 Oct, 2025 1 commit
    • create: inherit FROM model's renderer/parser · 1bdd8169
      Devon Rifkin authored
      On main, the `RENDERER` and `PARSER` fields from the `Modelfile` don't
      get propagated to a new model created with a `req.From` parameter. This
      is easily triggered via `ollama run qwen3-coder`, then running some save
      command like `/save qwen3-coder-custom`.
      
      Added a regression test for this, and then opened the config for the
      "from" model in order to use its renderer/parser as a default for the
      new model. This will fix the CLI and also API-based creates.
      
      Fixes: https://github.com/ollama/ollama/issues/12792
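
The defaulting described above can be sketched as follows. This is only an illustration under assumptions: `ModelConfig` and `inheritRendererParser` are hypothetical names, not the identifiers used in the actual change, and the real config carries many more fields.

```go
package main

// ModelConfig is a hypothetical stand-in for a stored model config.
// Only the two fields discussed in this commit are modeled.
type ModelConfig struct {
	Renderer string
	Parser   string
}

// inheritRendererParser uses the "from" model's renderer and parser as
// defaults, keeping any value the new model already set explicitly.
func inheritRendererParser(newCfg, fromCfg ModelConfig) ModelConfig {
	if newCfg.Renderer == "" {
		newCfg.Renderer = fromCfg.Renderer
	}
	if newCfg.Parser == "" {
		newCfg.Parser = fromCfg.Parser
	}
	return newCfg
}
```

With this shape, a create that only names a "from" model ends up with that model's renderer and parser, while an explicit `RENDERER` or `PARSER` still wins.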
  2. 23 Oct, 2025 1 commit
    • DRY out the runner lifecycle code (#12540) · 3258a89b
      Daniel Hiltgen authored
      * DRY out the runner lifecycle code
      
      Now that discovery uses the runners as well, this unifies the runner spawning code
      into a single place.  This also unifies GPU discovery types with the newer ml.DeviceInfo.
      
      * win: make incremental builds better
      
      Place build artifacts in discrete directories so incremental builds don't have to start fresh
      
      * Adjust sort order to consider iGPUs
      
      * handle cpu inference oom scenarios
      
      * review comments
  3. 22 Oct, 2025 1 commit
  4. 20 Oct, 2025 1 commit
  5. 17 Oct, 2025 1 commit
    • test: harden scheduler tests (#12662) · 68e04c7f
      Daniel Hiltgen authored
      * test: harden scheduler tests
      
      This removes reschedDelay which was stale code, and adds
      a new configurable timeout for the waitForVRAMRecovery so
      tests can now set the timeout to be very short to avoid the
      scheduler getting stuck and hitting a test timeout.
      
      * test: tune tests for partial loads
      
      Give stress tests more time when the model is split between CPU/GPU
  6. 16 Oct, 2025 1 commit
  7. 14 Oct, 2025 1 commit
  8. 13 Oct, 2025 1 commit
    • Qwen3VL Cloud Parser and Renderer (#12526) · 05982a95
      Grace authored
      
      
      * working for tool calls and tools (other than tool calls being in the incorrect order)
      
      * Tests work, other than image tags (tests do not go through server) and tools (not in the correct order, but contents are the same)
      
      * testing for qwen3vl parser - toolparser is working
      
      * made changes to JSON tool parser, wraps the ToolCallFunction with a ToolCall object
      
      * Working parser for thinking models - assumes state of thinking, emits unambiguous content in thinking, does not call tool call in thinking
      
      * changed the parser to start with collecting content
      
      * thinking prefill
      
      * add hasThinkingSupport parameter to parser
      
      * qwen3-vl -> qwen3-vl-instruct for renderer/parser
      
      * Add hasThinkingSupport=false to QwenVLParser
      
      ---------
      Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
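
The wrapping step mentioned in the JSON tool parser bullet can be sketched roughly as below. The struct shapes, the JSON layout (`name` plus `arguments`), and `parseJSONToolCall` are assumptions made for illustration; they are simplified stand-ins, not the types or parser from this commit.

```go
package main

import "encoding/json"

// ToolCallFunction is a simplified, hypothetical mirror of the function
// payload a model emits for a tool call.
type ToolCallFunction struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// ToolCall wraps the function payload, which is the shape callers expect.
type ToolCall struct {
	Function ToolCallFunction `json:"function"`
}

// parseJSONToolCall decodes the raw JSON body of a tool call and wraps the
// resulting function in a ToolCall object.
func parseJSONToolCall(raw string) (ToolCall, error) {
	var fn ToolCallFunction
	if err := json.Unmarshal([]byte(raw), &fn); err != nil {
		return ToolCall{}, err
	}
	return ToolCall{Function: fn}, nil
}
```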
  9. 11 Oct, 2025 2 commits
  10. 10 Oct, 2025 2 commits
  11. 09 Oct, 2025 4 commits
  12. 08 Oct, 2025 1 commit
  13. 05 Oct, 2025 1 commit
  14. 03 Oct, 2025 1 commit
  15. 01 Oct, 2025 2 commits
    • Use runners for GPU discovery (#12090) · bc8909fb
      Daniel Hiltgen authored
      This revamps how we discover GPUs in the system by leveraging the Ollama
      runner.  This should eliminate inconsistency between our GPU discovery and the
      runner's capabilities at runtime, particularly for cases where we try to filter
      out unsupported GPUs.  Now the runner does that implicitly based on the actual
      device list.  In some cases free VRAM reporting can be unreliable which can
      lead to scheduling mistakes, so this also includes a patch to leverage more
      reliable VRAM reporting libraries if available.
      
      Automatic workarounds have been removed as only one GPU leveraged this, which
      is now documented. This GPU will soon fall off the support matrix with the next
      ROCm bump.
      
      Additional cleanup of the scheduler and discovery packages can be done in the
      future once we have switched on the new memory management code, and removed
      support for the llama runner.
    • fix keep alive · 35ac4eb1
      Michael Yang authored
      this reference to keep alive was missed in #12041 so chat has a
      different behaviour than generate
  16. 23 Sep, 2025 1 commit
    • auth: fix problems with the ollama keypairs (#12373) · 64883e3c
      Patrick Devine authored
      * auth: fix problems with the ollama keypairs
      
      This change adds several fixes including:
        - reading in the pubkey files correctly
        - fixing the push unit test to create a keypair file in a temp directory
        - not returning 500 errors for normal status errors
  17. 18 Sep, 2025 3 commits
  18. 17 Sep, 2025 3 commits
  19. 15 Sep, 2025 3 commits
    • model: implement bert in ollama engine (#9080) · 3f6642f6
      Michael Yang authored
      * fix truncate
      
      * s/SentencePieceModel/SentencePiece/
      
      * bert
      
      * wordpiece
      
      * refactor pooling
      
      * more tokenizers
      
      * normalize embeddings
    • address comments · 472feec2
      Devon Rifkin authored
    • add qwen3-coder tool support · 47991940
      Devon Rifkin authored
      The format qwen3-coder uses is relatively unique, both in rendering and
      in parsing. To implement parsing, I wrote a custom parser in similar
      style to harmony. For the rendering, I found that the logic would be
      much more difficult to follow in a template, so I introduced the concept
      of a built-in renderer that uses Go code, rather than a template, to
      generate prompts.
      
      I set us up for future built-in parsers and renderers by making it so
      they can be specified in a Modelfile like so:
      
      ```
      RENDERER "qwen3-coder"
      PARSER "qwen3-coder"
      ```
      
      These need to be provided explicitly because the architecture alone is
      not enough to understand what format the model expects to receive, and
      what format we expect it to output (e.g., qwen3-coder is `qwen3moe`,
      which includes other qwen3-family models as well).
      
      I haven't converted harmony to be one of these "built-ins" yet, since
      some of it is in flux with the changes @ParthSareen has been making to
      move harmony to the runner. It is likely that many other built-ins will
      need to move to the runner as well, but I'm able to slightly defer that
      decision since qwen3-coder doesn't have thinking (and therefore doesn't
      need to be in the runner to make structured outputs work). I expect to
      unify harmony with this approach very soon.
      
      Whether a particular model supports tools or thinking was previously
      inferred from templates, but without a template we now also use the
      parser itself to declare what it supports. If we have future models that
      re-use the same parsing format, but have different capabilities, we'll
      want to parameterize them and give them different names to be specified
      as a `PARSER`.
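
One way to picture a parser declaring its own capabilities is the sketch below. Everything in it is hypothetical: the `Parser` interface, `parserForName`, and `capabilities` are illustrative names, and combining the template-inferred flags with the parser's flags via OR is an assumption about how "also use the parser" could work.

```go
package main

// Parser is a hypothetical interface for a built-in parser that reports
// which features its output format supports.
type Parser interface {
	HasToolSupport() bool
	HasThinkingSupport() bool
}

// qwen3CoderParser models the case from this commit: tools, no thinking.
type qwen3CoderParser struct{}

func (qwen3CoderParser) HasToolSupport() bool     { return true }
func (qwen3CoderParser) HasThinkingSupport() bool { return false }

// parserForName resolves the name given to PARSER in a Modelfile.
func parserForName(name string) (Parser, bool) {
	switch name {
	case "qwen3-coder":
		return qwen3CoderParser{}, true
	default:
		return nil, false
	}
}

// capabilities starts from what the template implies and adds whatever the
// named built-in parser declares.
func capabilities(parserName string, templateTools, templateThinking bool) (tools, thinking bool) {
	tools, thinking = templateTools, templateThinking
	if p, ok := parserForName(parserName); ok {
		tools = tools || p.HasToolSupport()
		thinking = thinking || p.HasThinkingSupport()
	}
	return tools, thinking
}
```

Under this sketch, a future model that reuses the same format with different capabilities would get its own name in the `switch`, matching the note above about giving such parsers different names.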
      
      Misc changes:
      
      - I worked on the renderer by diffing outputs from the reference
        implementation and ours. To make it easier to do this, I extended
        <https://github.com/ollama/ollama/pull/11875> to also support
        returning the prompt via the openai compat layer.
  20. 12 Sep, 2025 2 commits
  21. 11 Sep, 2025 1 commit
  22. 10 Sep, 2025 1 commit
  23. 08 Sep, 2025 1 commit
  24. 27 Aug, 2025 1 commit
  25. 22 Aug, 2025 1 commit
  26. 21 Aug, 2025 1 commit
  27. 18 Aug, 2025 1 commit
    • harmony: convert fn names to be valid ts identifiers · 048bd447
      Devon Rifkin authored
      In <https://github.com/ollama/ollama/issues/11704#issuecomment-3177380197>
      I noticed that hyphens in function names could possibly cause the model
      to become confused. Later in that issue I found other explanations, but
      at a minimum tool names with spaces in them are confusing to the model
      because of the prompt format.
      
      In this change I create a mapper that converts arbitrary tool names into
      valid typescript identifiers. It's a little overly strict in that it
      doesn't allow all unicode characters that might be valid in ts
      identifiers, but it's still very permissive. Since mappings aren't
      reversible, we must temporarily store this mapping in order to unmap it
      if the model comes back with a call. We also handle the case where
      multiple mappings collide into the same mapping and append a counter to
      the end to make them unique.
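
The mapping scheme described above can be sketched as follows. This is a minimal illustration, not the code from the change: `NameMapper`, `sanitize`, the underscore replacement character, and the `_2`, `_3` counter suffix format are all assumptions, and reserved words are not handled.

```go
package main

import (
	"strconv"
	"strings"
	"unicode"
)

// NameMapper is a hypothetical mapper from arbitrary tool names to
// identifier-safe names, remembering each mapping so it can be undone.
type NameMapper struct {
	toMapped   map[string]string // original tool name -> identifier
	toOriginal map[string]string // identifier -> original tool name
}

func NewNameMapper() *NameMapper {
	return &NameMapper{toMapped: map[string]string{}, toOriginal: map[string]string{}}
}

// sanitize replaces characters that cannot appear in an identifier with '_'
// and prefixes '_' when the name would otherwise start with a digit.
func sanitize(name string) string {
	var b strings.Builder
	for i, r := range name {
		switch {
		case r == '_' || r == '$' || unicode.IsLetter(r):
			b.WriteRune(r)
		case unicode.IsDigit(r):
			if i == 0 {
				b.WriteByte('_')
			}
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	if b.Len() == 0 {
		return "_"
	}
	return b.String()
}

// Map returns a unique identifier for the tool name, appending a counter
// when two different names sanitize to the same identifier.
func (m *NameMapper) Map(original string) string {
	if mapped, ok := m.toMapped[original]; ok {
		return mapped
	}
	base := sanitize(original)
	candidate := base
	for n := 2; ; n++ {
		if _, taken := m.toOriginal[candidate]; !taken {
			break
		}
		candidate = base + "_" + strconv.Itoa(n)
	}
	m.toMapped[original] = candidate
	m.toOriginal[candidate] = original
	return candidate
}

// Unmap recovers the original tool name when the model calls an identifier.
func (m *NameMapper) Unmap(mapped string) (string, bool) {
	original, ok := m.toOriginal[mapped]
	return original, ok
}
```

Because sanitizing loses information (a space and a hyphen both become an underscore here), the stored reverse map is what makes it possible to route a model's call back to the tool the caller actually registered.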