1. 08 Dec, 2025 1 commit
    • refactor rope · 603ceefa
      Michael Yang authored
      change to a flatter directory structure and group the options with the
      function
      
      update models to call rope in one place
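      A minimal sketch of the "group the options with the function" shape using Go functional options; the package layout and names here (rope.Options, rope.WithBase) are illustrative assumptions, not ollama's exact API:

      ```
      package rope

      // Options groups everything a rope call needs with the function itself.
      type Options struct {
          Dims  int
          Base  float32
          Scale float32
      }

      // Option lets call sites set only the fields they care about.
      type Option func(*Options)

      func WithBase(b float32) Option  { return func(o *Options) { o.Base = b } }
      func WithScale(s float32) Option { return func(o *Options) { o.Scale = s } }

      // Apply is the single place models call rope from, per the refactor.
      func Apply(dims int, opts ...Option) Options {
          o := Options{Dims: dims, Base: 10000, Scale: 1}
          for _, opt := range opts {
              opt(&o)
          }
          return o
      }
      ```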
  2. 02 Dec, 2025 1 commit
  3. 20 Nov, 2025 2 commits
  4. 19 Nov, 2025 4 commits
  5. 18 Nov, 2025 2 commits
  6. 13 Nov, 2025 1 commit
  7. 06 Nov, 2025 1 commit
  8. 03 Nov, 2025 1 commit
  9. 30 Oct, 2025 2 commits
  10. 29 Oct, 2025 2 commits
  11. 28 Oct, 2025 2 commits
  12. 20 Oct, 2025 1 commit
  13. 18 Oct, 2025 1 commit
  14. 16 Oct, 2025 2 commits
    • renderers: add global flag for setting [img] tags (#12669) · 65fb3ff4
      Jeffrey Morgan authored
      Adds a temporary global flag that causes renderers to always render
      images as [img]. In a follow-up change, we will consider making this
      the default, at which point the flag could be removed.
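      A hedged sketch of what such a temporary global flag could look like; the package and names are assumptions for illustration:

      ```
      package renderers

      import "sync/atomic"

      // renderImgTags, when set, makes every renderer emit images as [img].
      var renderImgTags atomic.Bool

      // SetRenderImgTags toggles the temporary global behavior.
      func SetRenderImgTags(v bool) { renderImgTags.Store(v) }

      func renderImage(placeholder string) string {
          if renderImgTags.Load() {
              return "[img]"
          }
          return placeholder
      }
      ```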
    • Grace/qwen3 thinking (#12647) · e2a0b244
      Grace authored
      * change the initial status to take prefill into consideration
      
      * Add separate strings for the content and thinking builders (sketched after this list)
      
      * thinking tests
      
      * remove white space from string before closing think tag
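      A sketch of the separate-builders idea from the list above, including the whitespace trim before the closing think tag; the names are assumptions, and it ignores tags split across chunk boundaries:

      ```
      import "strings"

      type qwen3Parser struct {
          inThinking bool
          content    strings.Builder // user-visible output
          thinking   strings.Builder // text inside <think>...</think>
      }

      func (p *qwen3Parser) addChunk(s string) {
          if p.inThinking {
              i := strings.Index(s, "</think>")
              if i < 0 {
                  p.thinking.WriteString(s)
                  return
              }
              // remove whitespace from the string before the closing think tag
              p.thinking.WriteString(strings.TrimRight(s[:i], " \t\r\n"))
              p.inThinking = false
              s = s[i+len("</think>"):]
          }
          p.content.WriteString(s)
      }
      ```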
  15. 14 Oct, 2025 2 commits
  16. 13 Oct, 2025 2 commits
    • Qwen3VL Cloud Parser and Renderer (#12526) · 05982a95
      Grace authored
      
      
      * working for tool calls and tools (other than tool calls being in the incorrect order)
      
      * Tests work, other than image tags (tests do not go through the server) and tools (not in the correct order, but the contents are the same)
      
      * testing for qwen3vl parser - toolparser is working
      
      * made changes to the JSON tool parser: wrap the ToolCallFunction with a ToolCall object (see the sketch after this list)
      
      * Working parser for thinking models - assumes a thinking state, emits unambiguous content while thinking, does not emit tool calls while thinking
      
      * changed the parser to start with collecting content
      
      * thinking prefill
      
      * add hasThinkingSupport parameter to parser
      
      * qwen3-vl -> qwen3-vl-instruct for renderer/parser
      
      * Add hasThinkingSupport=false to QwenVLParser
      
      ---------
      Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
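      A sketch of the ToolCall wrapping mentioned in the list above; the field names mirror ollama's api types but are stated here as assumptions:

      ```
      type ToolCallFunction struct {
          Name      string         `json:"name"`
          Arguments map[string]any `json:"arguments"`
      }

      // ToolCall wraps the function payload, per the JSON tool parser change.
      type ToolCall struct {
          Function ToolCallFunction `json:"function"`
      }

      func wrapToolCall(fn ToolCallFunction) ToolCall {
          return ToolCall{Function: fn}
      }
      ```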
    • fix(qwen3): deepseek distill · 6c833d5f
      Michael Yang authored
      deepseek's qwen3 distill uses a different rope scheme, so support both
  17. 10 Oct, 2025 1 commit
  18. 09 Oct, 2025 2 commits
  19. 03 Oct, 2025 1 commit
  20. 30 Sep, 2025 1 commit
  21. 25 Sep, 2025 1 commit
    • parsers: fix unicode handling for qwen3-coder · 05ba4ca1
      Devon Rifkin authored
      When trimming whitespace at the end of every chunk, we were iterating
      backwards over the string byte-by-byte instead of rune-by-rune.
      
      As an example of how this can cause corruption, suppose we have the
      multi-byte character ✅ (`"\u2705"`), which is represented in utf-8 as
      the three bytes `0xE2 0x9C 0x85`. It happens that `0x85` is NEL, which
      passes `unicode.IsSpace()`. Because we were iterating byte-by-byte, this
      caused us to mistakenly slice in the middle of the rune, removing `0x85`
      and leaving `0xE2 0x9C`, which beyond being the incorrect place to
      slice, is not even a valid utf-8 character.
      
      `trailingWhitespaceLen()` was modified to count from the end in a
      rune-aware way. Tests with various multibyte unicode characters were
      also added.
      
      
      Fixes: #12414
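      A rune-aware sketch of the fix described above; the function name matches the message, the body is illustrative:

      ```
      import (
          "unicode"
          "unicode/utf8"
      )

      // trailingWhitespaceLen counts the bytes of trailing whitespace in s,
      // walking backwards rune-by-rune so multi-byte characters like "\u2705"
      // are never split mid-rune.
      func trailingWhitespaceLen(s string) int {
          n := 0
          for len(s) > 0 {
              r, size := utf8.DecodeLastRuneInString(s)
              if !unicode.IsSpace(r) {
                  break
              }
              n += size
              s = s[:len(s)-size]
          }
          return n
      }
      ```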
  22. 24 Sep, 2025 2 commits
    • Grace/deepseek v3 migration (#12385) · fbd82ba5
      Grace authored
      
      
      * init deepseek model file
      
      * temp removal of flash attention implementation
      
      * shapes are proper, can make a pass
      
      * query, key, value have good cosine similarity, but the max diff is a bit high
      
      * Attention block is working! ** with eager for now, have not added the mask line
      
      * Attention block is working! ** with eager for now, have not added the mask line
      
      * working MoE at around 0.95 cosine sim
      
      * added cosine similarity function (see the sketch after this list)
      
      * Starting end to end structure
      
      * Trying (and failing) to get rope to work, going to test full thing on tater
      
      * running on tater36... just not the right outputs
      
      * we have the right values for rope... but it's still not working?
      
      * change Extrapolation Factor to 1
      
      * removed adding residuals twice, removed normalization from the shared expert, refactored the norms (Attention, MLP) to live in the Transformer block instead of inside the (Attention, MLP) blocks, added cache setLayer
      
      * Temporary modelfiles for cpu
      
      * change kpass intermediate step to kv, two layer outputs [0,1] look fine
      
      * this calls for 16 chicken nuggets
      
      * whoops
      
      * cleaning up code
      
      * delete stuff we don't need
      
      * getting rid of debug statements for llama cpp
      
      * working with long contexts
      
      * fix long context view error
      
      * reverting some changes I made to files that are not a part of this PR
      
      * Added proper tokenizer for deepseek3
      
      * clean up model and go test
      
      * remove Modelfile
      
      * not passing the tests
      
      * whoops
      
      * how to pass the ci tests
      
      * resolving some of the comments
      
      * rename
      
      * linted and renamed deepseek3 -> deepseek2
      
      * remove name go
      
      * addressed changes - main change was adopting qwen3 naming scheme
      
      * I cannot with linters
      
      * clean up logs
      
      * clean up logs
      
      ---------
      Co-authored-by: Grace Guo <graceguo@Graces-MBP.localdomain>
      Co-authored-by: Grace Guo <graceguo@Graces-MacBook-Pro.local>
      Co-authored-by: graceguo <graceguo@tater36.localdomain>
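      A minimal sketch of the cosine similarity helper mentioned in the list above, as used to compare layer outputs against a reference implementation; it assumes equal-length inputs:

      ```
      import "math"

      // cosineSimilarity returns dot(a,b) / (|a| * |b|), or 0 for a zero vector.
      func cosineSimilarity(a, b []float32) float64 {
          var dot, na, nb float64
          for i := range a {
              dot += float64(a[i]) * float64(b[i])
              na += float64(a[i]) * float64(a[i])
              nb += float64(b[i]) * float64(b[i])
          }
          if na == 0 || nb == 0 {
              return 0
          }
          return dot / (math.Sqrt(na) * math.Sqrt(nb))
      }
      ```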
    • fix: leaf alt name (#12390) · e1979c57
      Michael Yang authored
      a leaf node with an alternative name gets all of its alternative names
      added into the same branch rather than having them create branches of their own
  23. 23 Sep, 2025 2 commits
  24. 20 Sep, 2025 1 commit
    • parsers: fix `&`s in qwen3coder parameter values · 242df70a
      Devon Rifkin authored
      In <https://github.com/ollama/ollama/issues/12357> we saw that the model
      will output tool calls such as
      
      ```
      <function=shell>
      <parameter=command>
      pwd && ls -la
      </parameter>
      </function>
      ```
      
      We parse this by transforming it into valid xml and then using an xml
      parser. While we do transform the function and parameter
      names, we weren't escaping the parameter values (which in this example
      are invalid since `pwd && ls -la` contains unescaped ampersands).
      
      This has been fixed by first transforming the tags in the same way, and
      then walking the transformed string and escaping the text in between the
      tags. This also fixes a case where `<` in the middle of a parameter
      value would cause an xml parse failure.
      
      Fixes: #12357
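      A sketch of the escape-between-tags walk described above; the tag pattern and function name are assumptions, with encoding/xml's EscapeText doing the actual escaping of `&` and `<`:

      ```
      import (
          "encoding/xml"
          "regexp"
          "strings"
      )

      var tagRE = regexp.MustCompile(`</?(?:function|parameter)[^>]*>`)

      // escapeBetweenTags keeps the transformed tags verbatim and escapes
      // everything between them, so values like `pwd && ls -la` parse cleanly.
      func escapeBetweenTags(s string) (string, error) {
          var out strings.Builder
          last := 0
          for _, loc := range tagRE.FindAllStringIndex(s, -1) {
              if err := xml.EscapeText(&out, []byte(s[last:loc[0]])); err != nil {
                  return "", err
              }
              out.WriteString(s[loc[0]:loc[1]]) // the tag itself, unescaped
              last = loc[1]
          }
          if err := xml.EscapeText(&out, []byte(s[last:])); err != nil {
              return "", err
          }
          return out.String(), nil
      }
      ```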
  25. 19 Sep, 2025 1 commit
  26. 18 Sep, 2025 1 commit
    • fix: model load for unsupported embedding models (#12311) · 9f3a37fd
      Michael Yang authored
      with #12181, there's now support for embeddings in the ollama engine.
      this is done by mutating the architecture and adding _embed when it
      detects an embedding model. however, this introduced a bug: if an
      embedding model was run based on an existing ollama engine model
      without an embedding implementation, e.g. llama4, it would pass the
      initial arch support check but fail when actually loaded.
      
      there were previously two entrypoints to creating a model. the
      second entrypoint was necessary because calling model.New would also
      load the model. since #11818, this is no longer the case, so merge them
      to reduce complexity
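      A hedged sketch of the merged entrypoint's shape: mutate the architecture for embedding models, then fail fast if nothing is registered for it; the names (registry, newModel) are hypothetical:

      ```
      import "fmt"

      type Model interface{}

      // registry maps architecture names to constructors; illustrative only.
      var registry = map[string]func() (Model, error){}

      func newModel(arch string, isEmbedding bool) (Model, error) {
          if isEmbedding {
              arch += "_embed" // mutated arch for embedding variants
          }
          ctor, ok := registry[arch]
          if !ok {
              // e.g. llama4 without an embedding implementation fails here,
              // at the arch check, instead of later during load
              return nil, fmt.Errorf("unsupported architecture %q", arch)
          }
          return ctor()
      }
      ```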