  1. 06 Jan, 2026 3 commits
  2. 03 Jan, 2026 5 commits
  3. 23 Dec, 2025 2 commits
  4. 19 Dec, 2025 1 commit
    • llm: Avoid integer underflow on llama engine memory layout · 172b5924
      Jesse Gross authored
      On the llama engine, when we compute the memory layout, we reserve
      a buffer to allow some flexibility for incorrect estimates. This
      buffer is subtracted from GPU free memory, and on GPUs with limited
      memory the subtraction may underflow.
      
      Fixes #13494
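      
      A minimal Go sketch of the guard implied by a fix like this. The function name, types, and constants are illustrative assumptions, not the actual ollama code:
      
      ```go
      package main
      
      import "fmt"
      
      // availableAfterReserve returns the GPU memory left after setting aside a
      // reserve buffer. With unsigned arithmetic, a plain subtraction wraps
      // around when free memory is smaller than the reserve, making a nearly
      // full GPU look like it has ~16 EiB available.
      func availableAfterReserve(freeMemory, reserve uint64) uint64 {
          if freeMemory < reserve {
              return 0 // clamp instead of underflowing
          }
          return freeMemory - reserve
      }
      
      func main() {
          fmt.Println(availableAfterReserve(512<<20, 1<<30)) // 0, not a huge wrapped value
          fmt.Println(availableAfterReserve(8<<30, 1<<30))   // 7 GiB, in bytes
      }
      ```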
  5. 18 Dec, 2025 4 commits
  6. 17 Dec, 2025 3 commits
  7. 16 Dec, 2025 8 commits
  8. 15 Dec, 2025 6 commits
  9. 13 Dec, 2025 2 commits
  10. 12 Dec, 2025 6 commits
    • flash attn: add auto mode for llama engine (#13052) · bd6c1d6b
      Daniel Hiltgen authored
      * flash attn: add auto mode for llama engine
      
      If the user does not set flash attention (fa) in the environment, use auto mode (see the sketch after these notes).
      
      * review comments
      
      * ensure kv cache quantized types have FA explicitly enabled
      
      additional review comments
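      
      A hedged Go sketch of the auto-mode decision these notes describe; the function shape and parameters are assumptions for illustration, not the real implementation:
      
      ```go
      package main
      
      import (
          "fmt"
          "os"
      )
      
      // flashAttention decides whether flash attention should be on. This is an
      // illustrative sketch based on the commit notes, not the actual ollama code.
      func flashAttention(kvCacheQuantized, gpuSupportsFA bool) bool {
          // An explicit user setting always wins.
          if v, ok := os.LookupEnv("OLLAMA_FLASH_ATTENTION"); ok {
              return v == "1" || v == "true"
          }
          // Quantized KV cache types only work with flash attention, so they
          // get it explicitly enabled rather than left to auto detection.
          if kvCacheQuantized {
              return true
          }
          // Auto mode: enable when the hardware and model support it.
          return gpuSupportsFA
      }
      
      func main() {
          fmt.Println(flashAttention(false, true))
      }
      ```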
    • Jeffrey Morgan · 3af5d3b7
    • Enable Ollama engine by default (#13443) · 77308951
      Daniel Hiltgen authored
      This changes the default behavior to use the Ollama engine for supported
      models, while retaining the ability to disable the Ollama engine and
      fall back to the Llama engine. Models in the OllamaEngineRequired list
      will always run on the Ollama engine.
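      
      A short Go sketch of the selection behavior described above. Only the OllamaEngineRequired name comes from the commit; the list contents and function signature are illustrative assumptions:
      
      ```go
      package main
      
      import "fmt"
      
      // ollamaEngineRequired stands in for the OllamaEngineRequired list named
      // in the commit; the entry here is a placeholder, not the real list.
      var ollamaEngineRequired = map[string]bool{
          "some-model": true,
      }
      
      // selectEngine: the Ollama engine is the default for supported models,
      // the user may opt back into the llama engine, but models on the
      // required list never fall back.
      func selectEngine(model string, supported, ollamaEngineDisabled bool) string {
          if ollamaEngineRequired[model] {
              return "ollama"
          }
          if supported && !ollamaEngineDisabled {
              return "ollama"
          }
          return "llama"
      }
      
      func main() {
          fmt.Println(selectEngine("some-model", true, true))  // ollama: required list wins
          fmt.Println(selectEngine("other-model", true, true)) // llama: user disabled the default
      }
      ```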
    • tidy up lint warnings on windows (#13430) · de9ecfd0
      Eva H authored
    • Eva H · 95fdd8d6
    • docs: add docs for v1/responses and rework openai compat section (#13416) · 9f782285
      Devon Rifkin authored
      * docs: add docs for v1/responses and rework openai compat section
      
      I reworked the examples to be separated by topic and to be fully
      runnable (i.e., they now log output instead of just suggesting how a
      call might be made).
      
      We now use `<CodeGroup>`s so that each example has a dropdown on the
      docs site for users to choose, which makes the examples a lot more
      digestible (since you only see approx 1/3 of the code you used to).
      
      I also added a new tool to extract code examples into files so that it's
      easier to actually run them and check that they work.
      
      ## Example
      
      ```shell
      go run docs/tools/extract-examples/main.go docs/api/openai-compatibility.mdx
      ```
      
      Output:
      
      ```
      Extracting code examples to: /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368
      
        - 01_basic.py
        - 01_basic.js
        - 01_basic.sh
        - 02_responses.py
        - 02_responses.js
        - 02_responses.sh
        - 03_vision.py
        - 03_vision.js
        - 03_vision.sh
      
      Extracted 9 file(s) to /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368
      
      To run examples:
      
        cd /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368
        npm install   # for JS examples
      
      then run individual files with `node file.js`, `python file.py`, `bash file.sh`
      ```
      
      In the future we should consider actually running the examples in CI and
      having some sort of acceptance test so we can automatically detect when
      our examples break. So this is just a start in that direction.
      
      * Update docs/api/openai-compatibility.mdx
      Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
      
      * Update docs/api/openai-compatibility.mdx
      Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
      
      ---------
      Co-authored-by: Parth Sareen <parth.sareen@ollama.com>