  1. 05 Aug, 2025 1 commit
      gpt-oss (#11672) · fa7776fd
      Michael Yang authored
      
      
      * bf16
      
      * tests
      
      * gpt-oss
      
      * enable gptoss for engine
      
      * rough estimate
      
      * convert to mxfp4
      
      * handle safetensors U8
      
      * clamp glu/linear
      
      * update tokenizer
      
      * MXFP4 support
      
      This implements the Open Compute Microscaling (MX) FP4 format
      as a tensor type with backend implementations focusing
      on mulmat and mulmatid on CPU, CUDA, and Metal.
      
      * Unit tests for MXFP4 support
      
      This exercises various operations and shapes on both CPU and GPU (if one
      is detected on the system).
      
      * cuda graph
      
      * unit test adjustments
      
      * cuda: optimize memory access
      
      Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4
      
      * mac: fix crash on old macos versions
      
      cblas_sgemm is only supported on v13.3 and up; however, bf16 is
      only supported on v14+, so we were falling back to ggml-blas and
      crashing on bf16 tensors. Checking for the function being null
      seems to be the simplest way to conditionally avoid registering the
      backend.
      
      * server: Minimum context length for gptoss
      
      This model requires a minimum context length of 8192 to function
      effectively. Users can set higher values through all normal mechanisms
      but lower values will be silently reset.
      
      * ggml: Multiply by numParallel for gptoss sliding window
      
      When computing the graph size estimate, the context size is already
      multiplied by numParallel so estimates reflect that. However, since
      sliding window models use a smaller, fixed context size, they need
      to manually take numParallel into account.
      
      * gpt-oss integration
      
      includes harmony parser and thinking levels, etc.
      
      * fix sync
      
      * fix tests
      
      * fix lint
      
      ---------
      Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
      Co-authored-by: Jesse Gross <jesse@ollama.com>
      Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
  2. 17 Jul, 2025 1 commit
  3. 02 Apr, 2025 1 commit
  4. 13 Feb, 2025 1 commit
  5. 13 Dec, 2024 1 commit
  6. 11 Dec, 2024 1 commit
      llama: preserve field order in user-defined JSON schemas (#8002) · 9039c821
      Blake Mizerany authored
      Previously we decoded and re-encoded JSON schemas during validation,
      which served no purpose since json.RawMessage already validates JSON
      syntax. Worse, the re-encoding lost field ordering from the original
      schema, which affects inference quality during step-by-step reasoning.
      
      While fixing this ordering issue by using json.RawMessage directly,
      testing revealed that schema_to_grammar (from llama.cpp) also fails to
      preserve field order during grammar generation. This appears to be the
      root cause of inference degradation.
      
      This change prevents us from mangling the user's original schema order,
      but we still need to address the ordering issue in schema_to_grammar.
      That will be a separate change.
      
      Updates #7978
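The ordering problem described in the commit above can be reproduced with a small sketch. It is an illustration and not the code from the change: `roundTrip` and `passThrough` are hypothetical names, and the explicit `json.Valid` call stands in for whatever validation the real request path performs. The one fact it relies on is that Go's `encoding/json` sorts map keys when marshaling, so decoding a schema into a map and re-encoding it discards the author's field order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip decodes the schema into a map and re-encodes it. Go sorts map
// keys on Marshal, so the original field order is lost.
func roundTrip(schema []byte) (string, error) {
	var m map[string]any
	if err := json.Unmarshal(schema, &m); err != nil {
		return "", err
	}
	out, err := json.Marshal(m)
	return string(out), err
}

// passThrough checks syntax only and keeps the original bytes, so the
// field order the user wrote is preserved.
func passThrough(schema []byte) (string, error) {
	raw := json.RawMessage(schema)
	if !json.Valid(raw) {
		return "", fmt.Errorf("invalid JSON schema")
	}
	return string(raw), nil
}

func main() {
	schema := []byte(`{"reasoning":{"type":"string"},"answer":{"type":"string"}}`)

	a, _ := roundTrip(schema)
	b, _ := passThrough(schema)

	fmt.Println(a) // prints {"answer":{"type":"string"},"reasoning":{"type":"string"}}
	fmt.Println(b) // prints {"reasoning":{"type":"string"},"answer":{"type":"string"}}
}
```

In the example schema, "reasoning" is meant to be generated before "answer"; the round trip silently swaps them, which is the kind of reordering the commit message ties to degraded step-by-step output.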
  7. 05 Dec, 2024 1 commit
  8. 30 Nov, 2024 1 commit
  9. 27 Nov, 2024 2 commits
  10. 07 Sep, 2024 2 commits
  11. 06 Sep, 2024 1 commit
  12. 02 Aug, 2024 1 commit
  13. 01 Aug, 2024 1 commit
  14. 29 Jul, 2024 1 commit
  15. 19 Jul, 2024 2 commits
  16. 17 Jul, 2024 2 commits
  17. 16 Jul, 2024 1 commit
      OpenAI: /v1/embeddings compatibility (#5285) · 987dbab0
      royjhan authored
      
      
      * OpenAI v1 models
      
      * Empty List Testing
      
      * Add back envconfig
      
      * v1/models docs
      
      * Remove Docs
      
      * OpenAI batch embed compatibility
      
      * merge conflicts
      
      * integrate with api/embed
      
      * ep
      
      * merge conflicts
      
      * request tests
      
      * rm resp test
      
      * merge conflict
      
      * merge conflict
      
      * test fixes
      
      * test fn renaming
      
      * input validation for empty string
      
      ---------
      Co-authored-by: jmorganca <jmorganca@gmail.com>
  18. 14 Jul, 2024 1 commit
  19. 09 Jul, 2024 1 commit
  20. 02 Jul, 2024 2 commits
  21. 14 Jun, 2024 1 commit
  22. 04 Jun, 2024 1 commit
  23. 11 May, 2024 1 commit
  24. 09 May, 2024 1 commit
  25. 26 Mar, 2024 1 commit
  26. 07 Feb, 2024 1 commit