1. 06 Jan, 2026 1 commit
    • preserve tool definition and call JSON ordering (#13525) · e51dead6
      Devon Rifkin authored
      * preserve tool definition and call JSON ordering
      
      This is another iteration of
      <https://github.com/ollama/ollama/pull/12518>, but this time we've
      simplified things by relaxing the competing requirements of being
      compatible AND order-preserving with templates (vs. renderers). We
      maintain backwards compatibility at the cost of not guaranteeing order
      for templates. We plan to move more and more models to renderers,
      which have been updated to use these new data types. Additionally,
      we could add an opt-in way for templates to receive an order-preserved
      list (e.g., via sibling template vars)
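
      Go's built-in map[string]any does not preserve key order when
      marshaled, which is the core problem this commit addresses. Below is
      a minimal, hypothetical sketch of the general technique — a
      slice-backed ordered map with a custom MarshalJSON — not ollama's
      actual orderedmap implementation:

      ```go
      // Hypothetical sketch of an insertion-order-preserving JSON map.
      // Names and structure here are illustrative, not ollama's real code.
      package main

      import (
      	"bytes"
      	"encoding/json"
      	"fmt"
      )

      type pair struct {
      	Key   string
      	Value any
      }

      // OrderedMap remembers the order in which keys were set.
      type OrderedMap struct {
      	pairs []pair
      }

      func (m *OrderedMap) Set(k string, v any) {
      	m.pairs = append(m.pairs, pair{Key: k, Value: v})
      }

      // MarshalJSON emits keys in insertion order, unlike map[string]any.
      func (m *OrderedMap) MarshalJSON() ([]byte, error) {
      	var buf bytes.Buffer
      	buf.WriteByte('{')
      	for i, p := range m.pairs {
      		if i > 0 {
      			buf.WriteByte(',')
      		}
      		k, err := json.Marshal(p.Key)
      		if err != nil {
      			return nil, err
      		}
      		buf.Write(k)
      		buf.WriteByte(':')
      		v, err := json.Marshal(p.Value)
      		if err != nil {
      			return nil, err
      		}
      		buf.Write(v)
      	}
      	buf.WriteByte('}')
      	return buf.Bytes(), nil
      }

      func main() {
      	m := &OrderedMap{}
      	m.Set("location", "Paris")
      	m.Set("unit", "celsius")
      	out, _ := json.Marshal(m)
      	fmt.Println(string(out)) // keys stay in insertion order
      }
      ```

      For tool calls, this matters because some models are sensitive to the
      argument order they were trained on, and a plain Go map would
      re-serialize arguments in an arbitrary order.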
      
      * orderedmap_test: remove testify
  2. 22 Oct, 2025 1 commit
  3. 25 Sep, 2025 1 commit
  4. 22 Aug, 2025 1 commit
  5. 05 Aug, 2025 1 commit
    • gpt-oss (#11672) · fa7776fd
      Michael Yang authored
      
      
      * bf16
      
      * tests
      
      * gpt-oss
      
      * enable gptoss for engine
      
      * rough estimate
      
      * convert to mxfp4
      
      * handle safetensors U8
      
      * clamp glu/linear
      
      * update tokenizer
      
      * MXFP4 support
      
      This implements the Open Compute Microscaling (MX) FP4 format
      as a tensor type, with backend implementations focusing
      on mul_mat and mul_mat_id on CPU, CUDA, and Metal.
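
      For reference, the MX spec packs FP4 in 32-element blocks: one shared
      E8M0 scale byte plus 16 bytes of E2M1 nibbles. The sketch below
      decodes one such block; it is an illustration of the format, not the
      ggml kernel, and the low-nibble-first ordering is an assumption:

      ```go
      // Illustrative MXFP4 block decode (not ollama/ggml's actual kernel).
      // A block is 1 E8M0 scale byte + 16 bytes holding 32 E2M1 nibbles.
      package main

      import (
      	"fmt"
      	"math"
      )

      // e2m1 maps the 3 magnitude bits of an FP4 E2M1 code to its value.
      var e2m1 = [8]float32{0, 0.5, 1, 1.5, 2, 3, 4, 6}

      // decodeBlock expands one MXFP4 block into 32 float32 values.
      func decodeBlock(scale byte, packed [16]byte) [32]float32 {
      	// E8M0 shared scale: a biased exponent, value = 2^(scale-127).
      	s := float32(math.Exp2(float64(int(scale) - 127)))
      	var out [32]float32
      	for i, b := range packed {
      		// Assumption: low nibble first within each byte.
      		for j, nib := range [2]byte{b & 0x0f, b >> 4} {
      			v := e2m1[nib&0x07]
      			if nib&0x08 != 0 { // top bit of the nibble is the sign
      				v = -v
      			}
      			out[2*i+j] = v * s
      		}
      	}
      	return out
      }

      func main() {
      	var packed [16]byte
      	packed[0] = 0x1f // low nibble 0xf (-6), high nibble 0x1 (+0.5)
      	vals := decodeBlock(127, packed) // scale byte 127 -> 2^0 = 1
      	fmt.Println(vals[0], vals[1])
      }
      ```

      The shared per-block scale is what lets MXFP4 cover a wide dynamic
      range while spending only 4 bits per element.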
      
      * Unit tests for MXFP4 support
      
      This exercises various operations and shapes on both CPU and GPU
      (if one is detected on the system).
      
      * cuda graph
      
      * unit test adjustments
      
      * cuda: optimize memory access
      
      Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4
      
      * mac: fix crash on old macos versions
      
      cblas_sgemm is only supported on macOS 13.3 and up; however, bf16 is
      only supported on macOS 14+, so we were falling back to ggml-blas and
      crashing on bf16 tensors. Checking whether the function is null
      seems to be the simplest way to conditionally avoid registering the
      backend.
      
      * server: Minimum context length for gptoss
      
      This model requires a minimum context length of 8192 to function
      effectively. Users can set higher values through all normal mechanisms
      but lower values will be silently reset.
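
      A minimal sketch of the silent-reset behavior described above —
      the constant and function names are illustrative, not ollama's
      actual API:

      ```go
      // Sketch: clamp a requested context length up to a model minimum.
      // Names here are assumptions for illustration only.
      package main

      import "fmt"

      const gptossMinContext = 8192

      // effectiveContext silently raises too-small values to the minimum;
      // larger user-requested values pass through unchanged.
      func effectiveContext(requested int) int {
      	if requested < gptossMinContext {
      		return gptossMinContext
      	}
      	return requested
      }

      func main() {
      	fmt.Println(effectiveContext(2048))  // below minimum: reset
      	fmt.Println(effectiveContext(16384)) // above minimum: kept
      }
      ```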
      
      * ggml: Multiply by numParallel for gptoss sliding window
      
      When computing the graph size estimate, the context size is already
      multiplied by numParallel so estimates reflect that. However, since
      sliding window models use a smaller, fixed context size, they need
      to manually take numParallel into account.
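
      The fix can be sketched as follows, assuming hypothetical names: the
      full context arrives already multiplied by numParallel, so the
      sliding window must be scaled the same way before taking the
      minimum:

      ```go
      // Illustrative sketch of the estimate fix (not ollama's real code):
      // scale the fixed sliding window by numParallel so it is comparable
      // to the already-scaled full context size.
      package main

      import "fmt"

      func kvCacheEntries(contextLen, slidingWindow, numParallel int) int {
      	total := contextLen * numParallel // already scaled upstream
      	if slidingWindow > 0 {
      		windowed := slidingWindow * numParallel // the fix: scale too
      		if windowed < total {
      			return windowed
      		}
      	}
      	return total
      }

      func main() {
      	// 8192-token context, 128-token window, 4 parallel sequences.
      	fmt.Println(kvCacheEntries(8192, 128, 4))
      }
      ```

      Without the numParallel factor on the window, the estimate would
      undercount KV cache memory by a factor of numParallel for
      sliding-window models.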
      
      * gpt-oss integration
      
      includes harmony parser and thinking levels, etc.
      
      * fix sync
      
      * fix tests
      
      * fix lint
      
      ---------
      Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
      Co-authored-by: Jesse Gross <jesse@ollama.com>
      Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
  6. 24 Jul, 2025 1 commit
  7. 20 Jul, 2025 1 commit
  8. 30 Jun, 2025 1 commit
  9. 18 Jun, 2025 1 commit
  10. 17 Jun, 2025 1 commit
  11. 12 Jun, 2025 1 commit
  12. 27 May, 2025 2 commits
  13. 23 May, 2025 1 commit