    gpt-oss (#11672) · fa7776fd
    Michael Yang authored
    
    
    * bf16
    
    * tests
    
    * gpt-oss
    
    * enable gptoss for engine
    
    * rough estimate
    
    * convert to mxfp4
    
    * handle safetensors U8
    
    * clamp glu/linear
    
    * update tokenizer
    
    * MXFP4 support
    
    This implements the Open Compute Project (OCP) Microscaling (MX) FP4
    format (MXFP4) as a tensor type, with backend implementations focusing
    on the mulmat and mulmatid operations on CPU, CUDA, and Metal.
    
    * Unit tests for MXFP4 support
    
    These exercise various operations and shapes on the CPU and, when one
    is detected on the system, on the GPU.
    
    * cuda graph
    
    * unit test adjustments
    
    * cuda: optimize memory access
    
    Read 4 bytes (8 MXFP4 elements) at a time in mul_mat_vec_mxfp4.
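    A sketch of the idea in Go rather than CUDA: a single 32-bit load yields eight 4-bit codes, which are then extracted with shifts and masks instead of issuing four separate byte loads. It assumes a hypothetical packing in which element 2i sits in the low nibble of byte i, so the codes come out in order from a little-endian word; the function name unpack8 is illustrative.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// unpack8 reads 4 packed bytes as one little-endian 32-bit word and extracts
// the eight 4-bit codes it holds. packed must contain at least 4 bytes.
func unpack8(packed []byte) [8]uint8 {
	w := binary.LittleEndian.Uint32(packed[:4]) // one 32-bit load instead of four 8-bit loads
	var codes [8]uint8
	for k := 0; k < 8; k++ {
		codes[k] = uint8((w >> (4 * uint(k))) & 0xF) // nibble k of the word
	}
	return codes
}

func main() {
	fmt.Println(unpack8([]byte{0x10, 0x32, 0x54, 0x76})) // [0 1 2 3 4 5 6 7]
}
```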
    
    * mac: fix crash on old macos versions
    
    cblas_sgemm is only available on macOS 13.3 and later; however, bf16
    is only supported on macOS 14 and later, so we were falling back to
    ggml-blas and crashing on bf16 tensors. Checking whether the function
    is null seems to be the simplest way to conditionally avoid
    registering the backend.
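    The pattern can be sketched in Go with a nil function value standing in for a weakly linked symbol that did not resolve. The types candidate and sgemmFunc and the function registerAvailable are hypothetical; they only illustrate the check, not the actual registration code.

```go
package main

import "fmt"

// sgemmFunc stands in for a dynamically resolved cblas_sgemm symbol.
type sgemmFunc func(m, n, k int, a, b, c []float32)

// candidate is a hypothetical backend entry whose sgemm is nil when the
// underlying symbol is unavailable (e.g. a weakly linked function on an older OS).
type candidate struct {
	name  string
	sgemm sgemmFunc
}

// registerAvailable keeps only backends whose required function resolved.
func registerAvailable(cands []candidate) []string {
	var registered []string
	for _, c := range cands {
		if c.sgemm == nil {
			continue // skip instead of registering a backend that would crash when called
		}
		registered = append(registered, c.name)
	}
	return registered
}

func main() {
	stub := func(m, n, k int, a, b, c []float32) {} // stub implementation for the example
	fmt.Println(registerAvailable([]candidate{{"blas", nil}, {"cpu", stub}})) // [cpu]
}
```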
    
    * server: Minimum context length for gptoss
    
    This model requires a minimum context length of 8192 to function
    effectively. Users can set higher values through any of the usual
    mechanisms, but lower values are silently raised to 8192.
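    A minimal sketch of that clamping behavior; the architecture string "gptoss", the constant minCtxGptoss, and the function effectiveNumCtx are hypothetical names.

```go
package main

import "fmt"

// minCtxGptoss is the 8192-token minimum described above.
const minCtxGptoss = 8192

// effectiveNumCtx returns the context length actually used: for this model,
// values below the minimum are raised without an error or warning, and
// higher values pass through unchanged.
func effectiveNumCtx(arch string, requested int) int {
	if arch == "gptoss" && requested < minCtxGptoss {
		return minCtxGptoss
	}
	return requested
}

func main() {
	fmt.Println(effectiveNumCtx("gptoss", 2048))  // 8192
	fmt.Println(effectiveNumCtx("gptoss", 32768)) // 32768
	fmt.Println(effectiveNumCtx("llama", 2048))   // 2048
}
```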
    
    * ggml: Multiply by numParallel for gptoss sliding window
    
    When computing the graph size estimate, the context size is already
    multiplied by numParallel so estimates reflect that. However, since
    sliding window models use a smaller, fixed context size, they need
    to manually take numParallel into account.
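    The adjustment can be sketched as follows, assuming numCtx already includes numParallel while the sliding window is a per-sequence size. The function layerCacheSize and the 128-token window in the example are hypothetical.

```go
package main

import "fmt"

// layerCacheSize returns how many KV cache entries a layer contributes to the
// graph size estimate. numCtx is assumed to already equal the per-sequence
// context multiplied by numParallel; window is a fixed per-sequence size.
func layerCacheSize(numCtx, numParallel, window int, sliding bool) int {
	if sliding {
		// The window does not include numParallel, so scale it explicitly.
		return window * numParallel
	}
	return numCtx
}

func main() {
	numParallel := 4
	numCtx := 8192 * numParallel // already multiplied by numParallel
	fmt.Println(layerCacheSize(numCtx, numParallel, 128, false)) // 32768
	fmt.Println(layerCacheSize(numCtx, numParallel, 128, true))  // 512
}
```

Without the explicit multiplication, a sliding-window layer would be estimated at 128 entries here, underestimating its memory by a factor of numParallel.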
    
    * gpt-oss integration
    
    Includes the harmony parser, thinking levels, and related changes.
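    As an illustration only, thinking levels might be validated as shown here, assuming the levels are low, medium, and high; the ThinkLevel type and parseThinkLevel function are hypothetical and not the API added in this change.

```go
package main

import (
	"fmt"
	"strings"
)

// ThinkLevel is a hypothetical representation of a thinking level.
type ThinkLevel string

const (
	ThinkLow    ThinkLevel = "low"
	ThinkMedium ThinkLevel = "medium"
	ThinkHigh   ThinkLevel = "high"
)

// parseThinkLevel normalizes case and surrounding whitespace, then rejects
// anything that is not one of the known levels.
func parseThinkLevel(s string) (ThinkLevel, error) {
	switch l := ThinkLevel(strings.ToLower(strings.TrimSpace(s))); l {
	case ThinkLow, ThinkMedium, ThinkHigh:
		return l, nil
	default:
		return "", fmt.Errorf("unknown thinking level %q (want low, medium, or high)", s)
	}
}

func main() {
	lvl, err := parseThinkLevel(" High ")
	fmt.Println(lvl, err) // high <nil>
}
```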
    
    * fix sync
    
    * fix tests
    
    * fix lint
    
    ---------
    Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
    Co-authored-by: Jesse Gross <jesse@ollama.com>
    Co-authored-by: Devon Rifkin <drifkin@drifkin.net>