1. 06 Nov, 2025 1 commit
  2. 02 Oct, 2025 1 commit
    • Daniel Hiltgen's avatar
      Update GGML to b6646 (#12245) · c68f367e
      Daniel Hiltgen authored
      Notable EOLs with this change:
      - MacOS v12 and v13 are no longer supported (v14+ required)
      - AMD gfx900 and gfx906 are no longer supported
      c68f367e
  3. 14 Aug, 2025 2 commits
    • Jesse Gross's avatar
      llm: New memory management · d5a0d8d9
      Jesse Gross authored
      This changes the memory allocation strategy from upfront estimation to
      tracking actual allocations done by the engine and reacting to that. The
      goal is avoid issues caused by both under-estimation (crashing) and
      over-estimation (low performance due to under-utilized GPUs).
      
      It is currently opt-in and can be enabled for models running on the
      Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other
      cases is unchanged and will continue to use the existing estimates.
      d5a0d8d9
    • Michael Yang's avatar
      update vendored llama.cpp and ggml (#11823) · 1a19df1f
      Michael Yang authored
      * TEMPORARY: Update the llama.cpp upstream to my fork's Granite Four branch
      
      This will be redone once my branch is merged upstream in llama.cpp
      
      * feat: Update all patches
      
      There are a number that are no longer needed at all:
      
      - 0003-embeddings: Embeddings entirely overhauled on master
      - 0008-ensure-KV-cache-is-fully-defragmented: KV caching entirely
          overhauled on master
      - 0019-metal-add-mean-kernel-14267: Merged upstream
      - 0020-CUDA-add-mean-operation-14313: Merged upstream
      
      * feat: Sync llama.cpp and ggml
      
      * fix: Update rsync-filter for all moved/new/removed files
      
      * fix: Add files missing from sync
      
      * fix: Update ggml rsync-filter for new ggml-cpu/arch subdirs
      
      * fix: Add ggml files missing from sync
      
      * fix: Narrow llama.cpp rsync-filter to not include mtmd main tool cpp files
      
      * fix: Remove mtmd main cpp files
      
      * fix: Add missing include in sampling_ext.cpp
      
      * fix: Update llama.go to use mtmd instead of clip/llava
      
      * fix: Add patch for mtmd_input_text
      
      *...
      1a19df1f
  4. 05 Aug, 2025 1 commit
    • Michael Yang's avatar
      gpt-oss (#11672) · fa7776fd
      Michael Yang authored
      * bf16
      
      * tests
      
      * gpt-oss
      
      * enable gptoss for engine
      
      * rough estimate
      
      * convert to mxfp4
      
      * handle safetensors U8
      
      * clamp glu/linear
      
      * update tokenizer
      
      * MXFP4 support
      
      This implements the Open Compute Microscaling (MX) FP4 format
      as a tensor type with backend implementations focusing
      on mulmat and mulmatid on CPU, CUDA, and Metal.
      
      * Unit tests for MXFP4 support
      
      This exercises various operations and shapes on both CPU and GPU (if detected
      on the system)
      
      * cuda graph
      
      * unit test adjustments
      
      * cuda: optimize memory access
      
      Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4
      
      * mac: fix crash on old macos versions
      
      cblas_sgemm is only supported on v13.3 and up, however bf16 is
      only supported on v14+ so we were falling back to ggml-blas and
      crashing on bf16 tensors.  Checking for the function being null
      seems to be the simplest way to condittionally avoid registering the
      backend.
      
      * server: Minimum context length for gptoss
      
      This model requires a minimum context ...
      fa7776fd