Commits · 2717dce6fe1bb4eab80abd5fbbd713211a7fc276 · OpenDAS / ollama

18 Sep, 2025 1 commit

convert: convert bf16 vision weights to fp16 (#12324) · 2717dce6

Patrick Devine authored Sep 17, 2025

This change moves back to converting bf16 vision weights to fp16,
specifically if they start with the name "v." (such as v.blk.0.attn_k.weight).

This fixes a bug where converted images are failing because they are trying
to call `im2col` which doesn't have a bf16 kernel in ggml.
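The name-based rule described above can be sketched in Go as follows. This is a minimal illustration, not the converter's actual code: `targetKind` and `bf16ToFloat32` are hypothetical helpers, and the type names `"BF16"` and `"F16"` are assumed labels. The final narrowing from float32 to fp16 is omitted.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// targetKind is a hypothetical helper, not ollama's API. It returns the
// output type for a tensor given its name and source type.
func targetKind(name, srcKind string) string {
	// Vision tensors (names starting with "v.") are written as F16 so that
	// operations without a bf16 kernel, such as im2col, can run on them.
	if srcKind == "BF16" && strings.HasPrefix(name, "v.") {
		return "F16"
	}
	return srcKind
}

// bf16ToFloat32 widens a bfloat16 bit pattern to float32. A bf16 value is the
// upper 16 bits of an IEEE 754 float32, so widening is a 16-bit left shift.
func bf16ToFloat32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	fmt.Println(targetKind("v.blk.0.attn_k.weight", "BF16"))
	fmt.Println(targetKind("blk.0.attn_k.weight", "BF16"))
	fmt.Println(bf16ToFloat32(0x3F80))
}
```

Only tensors that are both bf16 and prefixed with "v." change type in this sketch; everything else keeps its source type.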

26 Aug, 2025 1 commit
- convert(gptoss): mxfp4 to ggml layout to avoid jit conversion (#12018) · 59412fbb
  Michael Yang authored Aug 26, 2025
```
* convert: return bytes written

* ggml flavor mxfp4

* simplify jit conversion

* comment
```

14 Aug, 2025 1 commit

convert: skip reading into memory when possible (#11507) · ef7d26ba

Michael Yang authored Aug 14, 2025

if there's no transformation to the tensor and the input and output
types match, copy directly into the writer. also read from a bufio with
a 32K buffer
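A rough Go sketch of the fast path described above, under stated assumptions: the `tensor` type, its fields, and the `writeString` helper are hypothetical and exist only for illustration. When there is no transformation and the types match, the bytes stream straight from a 32 KiB `bufio.Reader` into the writer. Otherwise the tensor is read fully into memory first.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// tensor is a hypothetical stand-in for a source tensor being converted.
type tensor struct {
	srcKind, dstKind string
	convert          func([]byte) []byte // nil means no transformation
	data             io.Reader
}

// WriteTo writes the tensor to w and returns the number of bytes written.
func (t tensor) WriteTo(w io.Writer) (int64, error) {
	r := bufio.NewReaderSize(t.data, 32<<10) // 32 KiB read buffer

	// Fast path: nothing to transform and the types match, so stream the
	// bytes directly without holding the whole tensor in memory.
	if t.convert == nil && t.srcKind == t.dstKind {
		return io.Copy(w, r)
	}

	// Slow path: read everything, transform if needed, then write.
	buf, err := io.ReadAll(r)
	if err != nil {
		return 0, err
	}
	if t.convert != nil {
		buf = t.convert(buf)
	}
	n, err := w.Write(buf)
	return int64(n), err
}

// writeString runs a tensor over an in-memory payload (demo helper only).
func writeString(payload, srcKind, dstKind string, convert func([]byte) []byte) (string, int64, error) {
	var out bytes.Buffer
	t := tensor{srcKind: srcKind, dstKind: dstKind, convert: convert, data: strings.NewReader(payload)}
	n, err := t.WriteTo(&out)
	return out.String(), n, err
}

func main() {
	s, n, err := writeString("abcd", "F16", "F16", nil)
	fmt.Println(s, n, err)
}
```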

05 Aug, 2025 1 commit

gpt-oss (#11672) · fa7776fd

Michael Yang authored Aug 05, 2025



* bf16

* tests

* gpt-oss

* enable gptoss for engine

* rough estimate

* convert to mxfp4

* handle safetensors U8

* clamp glu/linear

* update tokenizer

* MXFP4 support

This implements the Open Compute Microscaling (MX) FP4 format
as a tensor type with backend implementations focusing
on mulmat and mulmatid on CPU, CUDA, and Metal.
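For readers unfamiliar with the format, here is a minimal Go sketch of decoding one MXFP4 block as defined by the OCP Microscaling specification: 32 FP4 (E2M1) elements share one E8M0 scale byte. The function names and the nibble order (low nibble first) are assumptions made for illustration. The ggml tensor layout may order elements differently.

```go
package main

import (
	"fmt"
	"math"
)

// e2m1 lists the eight non-negative magnitudes representable in FP4 (E2M1).
var e2m1 = [8]float32{0, 0.5, 1, 1.5, 2, 3, 4, 6}

// decodeFP4 decodes a 4-bit code: bit 3 is the sign, bits 0-2 the magnitude.
func decodeFP4(code byte) float32 {
	v := e2m1[code&0x7]
	if code&0x8 != 0 {
		return -v
	}
	return v
}

// decodeBlock expands one block of 32 elements. scale is an E8M0 exponent
// (value 2^(scale-127), with 0xFF reserved for NaN). packed holds two 4-bit
// codes per byte. Assumption: the low nibble is the earlier element.
func decodeBlock(scale byte, packed [16]byte) [32]float32 {
	s := float32(math.Ldexp(1, int(scale)-127))
	if scale == 0xFF {
		s = float32(math.NaN())
	}
	var out [32]float32
	for i, b := range packed {
		out[2*i] = decodeFP4(b&0x0F) * s
		out[2*i+1] = decodeFP4(b>>4) * s
	}
	return out
}

func main() {
	var packed [16]byte
	packed[0] = 0x72 // codes 0x2 (1.0) and 0x7 (6.0)
	out := decodeBlock(128, packed)
	fmt.Println(out[0], out[1])
}
```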

* Unit tests for MXFP4 support

This exercises various operations and shapes on both CPU and GPU (if detected
on the system)

* cuda graph

* unit test adjustments

* cuda: optimize memory access

Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4

* mac: fix crash on old macos versions

cblas_sgemm is only supported on v13.3 and up, however bf16 is
only supported on v14+ so we were falling back to ggml-blas and
crashing on bf16 tensors.  Checking for the function being null
seems to be the simplest way to conditionally avoid registering the
backend.

* server: Minimum context length for gptoss

This model requires a minimum context length of 8192 to function
effectively. Users can set higher values through all normal mechanisms
but lower values will be silently reset.
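The reset behaviour described above amounts to a lower clamp on the requested context length. A minimal Go sketch, where `effectiveContext` and `gptossMinContext` are hypothetical names and matching on the architecture string `"gptoss"` is an assumption:

```go
package main

import "fmt"

// gptossMinContext is the minimum context length stated in the commit message.
const gptossMinContext = 8192

// effectiveContext is a hypothetical helper. It raises requests below the
// minimum for gptoss without reporting an error, and leaves other
// architectures and larger requests untouched.
func effectiveContext(arch string, requested int) int {
	if arch == "gptoss" && requested < gptossMinContext {
		return gptossMinContext
	}
	return requested
}

func main() {
	fmt.Println(effectiveContext("gptoss", 4096))
	fmt.Println(effectiveContext("gptoss", 16384))
	fmt.Println(effectiveContext("llama", 4096))
}
```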

* ggml: Multiply by numParallel for gptoss sliding window

When computing the graph size estimate, the context size is already
multiplied by numParallel so estimates reflect that. However, since
sliding window models use a smaller, fixed context size, they need
to manually take numParallel into account.
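The arithmetic can be illustrated with a small Go sketch. It assumes a simplified model in which the estimate is a token count: `cacheTokens` is a hypothetical helper, the window size of 128 is an example value, and no cap against the total context is applied.

```go
package main

import "fmt"

// cacheTokens is a hypothetical estimate helper. totalContext is the
// per-sequence context length already multiplied by numParallel. window is
// the sliding-window size in tokens, or 0 for a full-attention layer.
func cacheTokens(totalContext, numParallel, window int) int {
	if window <= 0 {
		// Full attention: the parallel factor is already included.
		return totalContext
	}
	// Sliding window: the size is fixed per sequence, so the parallel
	// factor has to be applied explicitly.
	return window * numParallel
}

func main() {
	numCtx, numParallel := 8192, 4
	total := numCtx * numParallel
	fmt.Println(cacheTokens(total, numParallel, 0))
	fmt.Println(cacheTokens(total, numParallel, 128))
}
```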

* gpt-oss integration

includes harmony parser and thinking levels, etc.

* fix sync

* fix tests

* fix lint

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>

23 Jul, 2025 1 commit
- s#x/exp/maps#maps# (#11506) · 6c733bf0
  Michael Yang authored Jul 23, 2025

25 Apr, 2025 1 commit
- llama4 · f0c66e6d
  Michael Yang authored Apr 03, 2025

06 Sep, 2024 1 commit
- Fix gemma2 2b conversion (#6645) · 608e87bf
  Patrick Devine authored Sep 05, 2024

28 Aug, 2024 1 commit
- throw an error when encountering unsupported tensor sizes (#6538) · 6c1c1ad6
  Patrick Devine authored Aug 27, 2024

21 Aug, 2024 1 commit
- convert gemma2 · 3546bbd0
  Michael Yang authored Jun 28, 2024

02 Aug, 2024 1 commit
- lint · b732beba
  Michael Yang authored Aug 01, 2024

31 Jul, 2024 4 commits
- convert: only extract large files · eafc607a
  Michael Yang authored Jun 29, 2024
- Update convert/reader_safetensors.go · 781fc2d5
  Michael Yang authored Jul 31, 2024
```
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
```
- comments · df993fa3
  Michael Yang authored Jul 08, 2024
- refactor convert · 5e9db9fb
  Michael Yang authored May 31, 2024