    gpt-oss (#11672) · fa7776fd
    Michael Yang authored
    * bf16
    
    * tests
    
    * gpt-oss
    
    * enable gptoss for engine
    
    * rough estimate
    
    * convert to mxfp4
    
    * handle safetensors U8
    
    * clamp glu/linear
    
    * update tokenizer
    
    * MXFP4 support
    
    This implements the Open Compute Microscaling (MX) FP4 format
    as a tensor type with backend implementations focusing
    on mulmat and mulmatid on CPU, CUDA, and Metal.
    
    * Unit tests for MXFP4 support
    
    This exercises various operations and shapes on both CPU and GPU (if detected
    on the system).
    
    * cuda graph
    
    * unit test adjustments
    
    * cuda: optimize memory access
    
    Read 4 bytes (8 elements) at a time when performing mul_mat_vec_mxfp4.
    
    * mac: fix crash on old macos versions
    
    cblas_sgemm is only supported on v13.3 and up; however, bf16 is
    only supported on v14+, so we were falling back to ggml-blas and
    crashing on bf16 tensors. Checking for the function being null
    seems to be the simplest way to conditionally avoid registering the
    backend.
    
    * server: Minimum context length for gptoss
    
    This model requires a minimum context ...
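
The `MXFP4 support` entry refers to the Open Compute Microscaling (MX) FP4 format. As a rough illustration of what dequantizing such a tensor involves, the Go sketch below decodes one block under the OCP MX layout as we understand it: 32 elements share one E8M0 scale (a pure power of two, 2^(e-127)), and each element is a 4-bit E2M1 code. The `mxfp4Block` struct, its nibble order (low nibble first), and the helper names are hypothetical and are not ollama's or ggml's actual data structures. The inner loop reads 4 bytes (8 elements) at a time, echoing the idea behind the `cuda: optimize memory access` entry; it is not a port of that kernel.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// blockSize is the number of elements that share one scale in an MX block.
const blockSize = 32

// e2m1Values maps each 4-bit E2M1 code to its value: bit 3 is the sign and
// bits 2..0 select the magnitude. Code 8 (negative zero) decodes to 0.
var e2m1Values = [16]float32{
	0, 0.5, 1, 1.5, 2, 3, 4, 6,
	0, -0.5, -1, -1.5, -2, -3, -4, -6,
}

// mxfp4Block is a HYPOTHETICAL layout: one shared E8M0 scale byte plus 32 FP4
// codes packed two per byte, low nibble first. ggml's real block layout may
// order the nibbles differently.
type mxfp4Block struct {
	scale uint8
	qs    [blockSize / 2]byte
}

// e8m0ToFloat decodes an E8M0 scale, which is a pure power of two: 2^(e-127).
// The code 0xFF is reserved for NaN.
func e8m0ToFloat(e uint8) float32 {
	if e == 0xFF {
		return float32(math.NaN())
	}
	return float32(math.Ldexp(1, int(e)-127))
}

// dequantize expands the block into 32 float32 values, reading the packed
// codes 4 bytes (8 elements) at a time.
func (b *mxfp4Block) dequantize() [blockSize]float32 {
	var out [blockSize]float32
	s := e8m0ToFloat(b.scale)
	for i := 0; i < len(b.qs); i += 4 {
		word := binary.LittleEndian.Uint32(b.qs[i : i+4])
		for j := 0; j < 8; j++ {
			// Nibble j of the little-endian word holds element 2*i+j.
			code := (word >> (4 * j)) & 0xF
			out[2*i+j] = s * e2m1Values[code]
		}
	}
	return out
}

func main() {
	b := mxfp4Block{scale: 128} // scale = 2^(128-127) = 2
	b.qs[0] = 0x21              // element 0: code 1 (0.5), element 1: code 2 (1.0)
	b.qs[1] = 0xF7              // element 2: code 7 (6.0), element 3: code 15 (-6.0)
	v := b.dequantize()
	fmt.Println(v[0], v[1], v[2], v[3]) // prints "1 2 12 -12"
}
```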
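
The `mac: fix crash on old macos versions` entry avoids registering the BLAS backend when `cblas_sgemm` resolves to null. The real change lives in the C backend code; the Go sketch below only mirrors the pattern, using a nil function value to stand in for a symbol that is missing on an older OS. `sgemmFunc`, `backend`, `registry`, `registerBLAS`, and `naiveSgemm` are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// sgemmFunc is a hypothetical signature standing in for a platform BLAS
// routine such as cblas_sgemm (row-major C = A*B, A is m x k, B is k x n).
type sgemmFunc func(m, n, k int, a, b, c []float32)

// backend is a hypothetical registry entry.
type backend struct {
	name  string
	sgemm sgemmFunc
}

var registry []backend

// registerBLAS adds the BLAS backend only if the routine resolved. On an OS
// version without the symbol, the function value is nil, so the backend is
// never registered and cannot be selected later.
func registerBLAS(sgemm sgemmFunc) bool {
	if sgemm == nil {
		return false
	}
	registry = append(registry, backend{name: "blas", sgemm: sgemm})
	return true
}

// naiveSgemm is a plain Go stand-in used when the routine "exists".
func naiveSgemm(m, n, k int, a, b, c []float32) {
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			var sum float32
			for p := 0; p < k; p++ {
				sum += a[i*k+p] * b[p*n+j]
			}
			c[i*n+j] = sum
		}
	}
}

func main() {
	var missing sgemmFunc // nil: routine unavailable on this OS version
	ok := registerBLAS(missing)
	fmt.Println(ok, len(registry)) // prints "false 0"

	ok = registerBLAS(naiveSgemm)
	fmt.Println(ok, len(registry)) // prints "true 1"
}
```

Checking once at registration time, rather than at each call, means an unavailable backend is simply never offered, so nothing downstream can route bf16 tensors to it.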
rope.go 1.07 KB