1. 03 Jan, 2026 2 commits
  2. 23 Dec, 2025 2 commits
  3. 19 Dec, 2025 1 commit
    • llm: Avoid integer underflow on llama engine memory layout · 172b5924
      Jesse Gross authored
      On the llama engine, when we compute the memory layout, we reserve
      a buffer to allow for some flexibility for incorrect estimates.
      This is subtracted from GPU free memory and on GPUs with limited
      memory, it may underflow.
      
      Fixes #13494
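
      The underflow described in this commit message can be illustrated with a minimal Go sketch. It is not the actual patch: the function name `usableMemory` and the 512 MiB reserve are hypothetical, and it assumes free memory and the reserve are both held as unsigned 64-bit byte counts. Subtracting a reserve larger than the free amount wraps around to a very large value, so the sketch clamps the result at zero instead.

```go
package main

import "fmt"

// usableMemory is a hypothetical helper: it returns free minus reserve,
// clamped at zero so the unsigned subtraction can never wrap around.
func usableMemory(free, reserve uint64) uint64 {
	if reserve >= free {
		return 0
	}
	return free - reserve
}

func main() {
	// Assumed reserve size, chosen only for illustration.
	const reserve uint64 = 512 << 20 // 512 MiB

	// A GPU with plenty of free memory: the reserve is simply subtracted.
	fmt.Println(usableMemory(8<<30, reserve))

	// A GPU with less free memory than the reserve: clamped to zero.
	fmt.Println(usableMemory(256<<20, reserve))

	// The unguarded subtraction wraps to a huge value instead.
	var free uint64 = 256 << 20
	fmt.Println(free - reserve)
}
```

      With the guard in place, a small GPU reports zero usable bytes rather than an enormous bogus figure that would make every layer appear to fit.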
  4. 18 Dec, 2025 4 commits
  5. 17 Dec, 2025 3 commits
  6. 16 Dec, 2025 8 commits
  7. 15 Dec, 2025 6 commits
  8. 13 Dec, 2025 2 commits
  9. 12 Dec, 2025 10 commits
  10. 11 Dec, 2025 2 commits
    • openai: add v1/responses support (#13351) · 1eb5e759
      Devon Rifkin authored
      Only supporting the stateless part of the API.
      
      Doc updates to come once this is shipped.
      
      Closes: #9659
    • embeddings: modified batch size (#13429) · 3475d915
      nicole pardal authored
      This PR detects embedding models and sets batch_size = context_size so the full input fits in a single batch.
      Previously, if batch size was smaller than the input, tokens could be split across batches and cause a SIGTRAP crash.
      This change ensures all tokens stay in one batch and prevents crashes.
      Fixes: #12938 #13054
      Co-authored-by: Jesse Gross <jesse@ollama.com>
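
      The batching rule described in this commit message can be sketched in Go as follows. This is an illustration under stated assumptions rather than the code from the PR: `chooseBatchSize`, `numBatches`, and the sizes used in `main` are all hypothetical names and values. The idea shown is that an embedding model gets a batch as large as its context, so any input that fits the context is processed as a single batch.

```go
package main

import "fmt"

// chooseBatchSize is a hypothetical helper: embedding models use the full
// context size as their batch size; other models keep the requested size.
func chooseBatchSize(isEmbedding bool, contextSize, requestedBatch int) int {
	if isEmbedding {
		return contextSize
	}
	return requestedBatch
}

// numBatches returns how many batches are needed for the given token count
// (ceiling division).
func numBatches(tokens, batchSize int) int {
	return (tokens + batchSize - 1) / batchSize
}

func main() {
	// Assumed values, chosen only for illustration.
	const contextSize, requestedBatch, inputTokens = 8192, 512, 2000

	// Without the special case, the input is split across several batches.
	fmt.Println(numBatches(inputTokens, chooseBatchSize(false, contextSize, requestedBatch))) // prints 4

	// With batch size equal to context size, the whole input fits in one batch.
	fmt.Println(numBatches(inputTokens, chooseBatchSize(true, contextSize, requestedBatch))) // prints 1
}
```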