1. 18 Dec, 2025 3 commits
  2. 16 Dec, 2025 1 commit
    • types: ConfigV2 and RootFS (#13504) · 45c47393
      Bruce MacDonald authored
      Moved the ConfigV2 and RootFS types from server/images.go into a new types/model/config.go file under the model package, and updated all references to use model.ConfigV2 and model.RootFS. This lets other projects use these types without having to compile the C code in the llama package.
  3. 11 Dec, 2025 2 commits
  4. 08 Dec, 2025 1 commit
  5. 05 Dec, 2025 1 commit
  6. 20 Nov, 2025 1 commit
  7. 18 Nov, 2025 1 commit
  8. 16 Nov, 2025 2 commits
  9. 13 Nov, 2025 1 commit
  10. 11 Nov, 2025 2 commits
    • llm: Use Ollama engine memory layouts for both old and new engines · f560bd07
      Jesse Gross authored
      Currently, both the old and new engines contain code to calculate
      how much memory a model requires and to lay out its layers across
      GPUs. This change reuses the new engine's layout code for the old
      engine as well, bringing the two closer together. The old engine
      continues to use its current method of estimating required memory.
      
      This reduces maintenance effort and improves consistency, since new
      features only need to be implemented in one place. The newer code
      is also more accurate, especially with multiple GPUs.
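The commit describes shared layout code that places model layers onto GPUs according to available memory. A minimal greedy sketch of that idea, under the simplifying assumption that every layer costs the same number of bytes (the real code accounts for per-layer weights, KV cache, and graph memory in far more detail):

```go
package main

import "fmt"

// layoutLayers assigns each layer to the GPU with the most free memory
// that can still hold it, stopping (CPU fallback) when no GPU fits.
// This is an illustrative sketch, not Ollama's actual algorithm.
func layoutLayers(gpuFree []uint64, layerSize uint64, nLayers int) []int {
	assign := make([]int, 0, nLayers)
	free := append([]uint64(nil), gpuFree...)
	for l := 0; l < nLayers; l++ {
		best := -1
		for g := range free {
			if free[g] >= layerSize && (best < 0 || free[g] > free[best]) {
				best = g
			}
		}
		if best < 0 {
			break // remaining layers fall back to CPU
		}
		free[best] -= layerSize
		assign = append(assign, best)
	}
	return assign
}

func main() {
	// Two GPUs with 8 and 4 units free, five layers of size 2.
	fmt.Println(layoutLayers([]uint64{8, 4}, 2, 5))
}
```

With these inputs the larger GPU absorbs most layers and one spills to the second GPU, which is why a single shared layout pass matters most in multi-GPU setups.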
    • server: add logprobs and top_logprobs support to Ollama's API (#12899) · 59241c5b
      Baptiste Jamin authored
      
      
      Adds logprobs support to Ollama's API, including its
      OpenAI-compatible API. When the new 'logprobs' boolean parameter is
      set, Ollama returns the log probability of each generated token. An
      integer 'top_logprobs' parameter (up to 20) can also be specified;
      when set, the API additionally returns that many most-likely tokens
      at each token position.
      Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
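A minimal sketch of what the new parameters could look like on a request type, with a range check for the documented 0–20 bound (the struct and function names here are modeled on the commit text, not copied from Ollama's api package):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GenerateRequest is a hypothetical request shape carrying the two new
// fields named in the commit message.
type GenerateRequest struct {
	Model       string `json:"model"`
	Prompt      string `json:"prompt"`
	Logprobs    bool   `json:"logprobs,omitempty"`
	TopLogprobs int    `json:"top_logprobs,omitempty"`
}

// validateTopLogprobs enforces the 0..20 range described above.
func validateTopLogprobs(n int) error {
	if n < 0 || n > 20 {
		return fmt.Errorf("top_logprobs must be between 0 and 20, got %d", n)
	}
	return nil
}

func main() {
	req := GenerateRequest{
		Model:       "llama3.2",
		Prompt:      "why is the sky blue?",
		Logprobs:    true,
		TopLogprobs: 5,
	}
	if err := validateTopLogprobs(req.TopLogprobs); err != nil {
		panic(err)
	}
	b, _ := json.Marshal(req)
	fmt.Println(string(b))
}
```

The omitempty tags keep the wire format backward compatible: clients that never set logprobs produce the same JSON as before.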
  11. 06 Nov, 2025 2 commits
  12. 05 Nov, 2025 2 commits
  13. 04 Nov, 2025 1 commit
  14. 29 Oct, 2025 1 commit
  15. 28 Oct, 2025 1 commit
  16. 27 Oct, 2025 2 commits
    • create: inherit FROM model's renderer/parser · 1bdd8169
      Devon Rifkin authored
      On main, the `RENDERER` and `PARSER` fields from the `Modelfile` don't
      get propagated to a new model created with a `req.From` parameter. This
      is easily triggered via `ollama run qwen3-coder`, then running some save
      command like `/save qwen3-coder-custom`.
      
      Added a regression test for this, and the fix now opens the config
      of the "from" model and uses its renderer/parser as defaults for
      the new model. This fixes both CLI and API-based creates.
      
      Fixes: https://github.com/ollama/ollama/issues/12792
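The defaulting logic described above can be sketched in a few lines (modelConfig and inheritFrom are illustrative names, not Ollama's actual types):

```go
package main

import "fmt"

// modelConfig is a hypothetical subset of a model's stored config.
type modelConfig struct {
	Renderer string
	Parser   string
}

// inheritFrom fills any field the new Modelfile left empty with the
// corresponding value from the FROM model's config.
func inheritFrom(req, from modelConfig) modelConfig {
	if req.Renderer == "" {
		req.Renderer = from.Renderer
	}
	if req.Parser == "" {
		req.Parser = from.Parser
	}
	return req
}

func main() {
	base := modelConfig{Renderer: "qwen3-coder", Parser: "qwen3-coder"}
	// /save produces a create request with no RENDERER/PARSER of its own.
	custom := inheritFrom(modelConfig{}, base)
	fmt.Println(custom.Renderer, custom.Parser)
}
```

Explicit values in the new Modelfile still win, so the inheritance only applies as a fallback.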
    • server: Consolidate embedding truncation in runner (#12730) · 5d347f6d
      nicole pardal authored
      Currently, checking that embedding prompts fit in the context
      window (and truncating them if necessary) happens in two places:
      the Ollama server and the runner. This can lead to inconsistencies
      in both the checks and the reported number of tokens processed.
      Since this processing must happen in the runner anyway, this change
      consolidates all of the logic there.
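A minimal sketch of what a single consolidated check in the runner might look like (the function name and signature are assumptions for illustration):

```go
package main

import "fmt"

// truncateTokens is the one place that decides whether an embedding
// prompt fits the context window, truncating when the caller allows it
// and erroring otherwise. Keeping this in the runner means the token
// count it reports always matches what was actually processed.
func truncateTokens(tokens []int, numCtx int, allowTruncate bool) ([]int, error) {
	if len(tokens) <= numCtx {
		return tokens, nil
	}
	if !allowTruncate {
		return nil, fmt.Errorf("input length %d exceeds context length %d", len(tokens), numCtx)
	}
	return tokens[:numCtx], nil
}

func main() {
	out, err := truncateTokens([]int{1, 2, 3, 4, 5}, 3, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out)) // tokens actually processed
}
```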
  17. 25 Oct, 2025 1 commit
  18. 23 Oct, 2025 1 commit
    • DRY out the runner lifecycle code (#12540) · 3258a89b
      Daniel Hiltgen authored
      * DRY out the runner lifecycle code
      
      Now that discovery uses the runners as well, this unifies the
      runner-spawning code into a single place. It also unifies the GPU
      discovery types with the newer ml.DeviceInfo.
      
      * win: make incremental builds better
      
      Place build artifacts in discrete directories so incremental builds don't have to start fresh
      
      * Adjust sort order to consider iGPUs
      
      * handle cpu inference oom scenarios
      
      * review comments
  19. 22 Oct, 2025 1 commit
  20. 20 Oct, 2025 1 commit
  21. 17 Oct, 2025 1 commit
    • test: harden scheduler tests (#12662) · 68e04c7f
      Daniel Hiltgen authored
      * test: harden scheduler tests
      
      This removes reschedDelay, which was stale code, and adds a
      configurable timeout for waitForVRAMRecovery, so tests can set a
      very short timeout and avoid the scheduler getting stuck until the
      test itself times out.
      
      * test: tune tests for partial loads
      
      Give stress tests more time when the model is split between CPU/GPU
  22. 16 Oct, 2025 1 commit
  23. 14 Oct, 2025 1 commit
  24. 13 Oct, 2025 1 commit
    • Qwen3VL Cloud Parser and Renderer (#12526) · 05982a95
      Grace authored
      
      
      * working for tool calls and tools (other than tool calls being in
      the incorrect order)
      
      * tests work, other than image tags (tests do not go through the
      server) and tools (not in the correct order, but contents are the
      same)
      
      * testing for the qwen3vl parser - the tool parser is working
      
      * made changes to the JSON tool parser: it now wraps the
      ToolCallFunction with a ToolCall object
      
      * working parser for thinking models - assumes a thinking state,
      emits unambiguous content while thinking, and does not emit tool
      calls while thinking
      
      * changed the parser to start by collecting content
      
      * thinking prefill
      
      * add hasThinkingSupport parameter to the parser
      
      * qwen3-vl -> qwen3-vl-instruct for renderer/parser
      
      * add hasThinkingSupport=false to QwenVLParser
      
      ---------
      Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
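The bullets above describe a streaming parser that assumes it starts in a thinking state and only emits content once thinking ends, never surfacing tool calls while thinking. A minimal sketch of that state machine (the "</think>" delimiter and the type names are assumptions for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// parseState tracks which part of the response the parser is in.
type parseState int

const (
	stateThinking parseState = iota
	stateContent
)

// qwenParser is a hypothetical streaming parser: it begins in the
// thinking state and flips to content when the closing tag arrives.
type qwenParser struct {
	state parseState
}

// feed splits an incoming chunk into thinking and content portions
// based on the current state.
func (p *qwenParser) feed(chunk string) (thinking, content string) {
	if p.state == stateThinking {
		if i := strings.Index(chunk, "</think>"); i >= 0 {
			p.state = stateContent
			return chunk[:i], chunk[i+len("</think>"):]
		}
		return chunk, ""
	}
	return "", chunk
}

func main() {
	p := &qwenParser{}
	th, c := p.feed("let me reason</think>the answer")
	fmt.Printf("thinking=%q content=%q\n", th, c)
}
```

A production streaming parser also has to handle the delimiter being split across two chunks, which this sketch deliberately ignores.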
  25. 11 Oct, 2025 2 commits
  26. 10 Oct, 2025 2 commits
  27. 09 Oct, 2025 4 commits