1. 22 May, 2025 1 commit
      server: improve tensor quantization fallback logic (#10806) · fbe6ae28
      Bruce MacDonald authored
      Fall back to alternative quantization types when a tensor's dimensions aren't divisible by the block size required by the originally requested quantization type. If the retried quantization types also fail, the system ultimately falls back to F16 (half-precision floating point), which has a block size of 1 and can handle any tensor dimension. A sketch of this fallback follows this entry.
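      A minimal Go sketch of the fallback described in this commit. The type names, block sizes for the quantized types, and the fallback order are assumptions for illustration; only the final F16 step, with its block size of 1, comes from the commit message itself.

          package main

          import "fmt"

          // blockSize maps a quantization type to the number of elements packed into
          // one block; F16 has a block size of 1, so it fits any tensor dimension.
          // The non-F16 entries are assumed values for illustration.
          var blockSize = map[string]int{
              "Q4_K_M": 256,
              "Q4_0":   32,
              "F16":    1,
          }

          // fallbacks lists the alternative types tried when the requested type does
          // not fit; the exact order used by the server is an assumption here.
          var fallbacks = map[string][]string{
              "Q4_K_M": {"Q4_0", "F16"},
              "Q4_0":   {"F16"},
          }

          // pickQuantType returns the first candidate whose block size evenly divides
          // the tensor's row length, ultimately falling back to F16.
          func pickQuantType(requested string, rowLen int) string {
              for _, t := range append([]string{requested}, fallbacks[requested]...) {
                  if rowLen%blockSize[t] == 0 {
                      return t
                  }
              }
              return "F16"
          }

          func main() {
              fmt.Println(pickQuantType("Q4_K_M", 1000)) // 1000 is not divisible by 256 or 32 -> F16
              fmt.Println(pickQuantType("Q4_K_M", 4096)) // 4096 is divisible by 256 -> Q4_K_M
          }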
  2. 12 May, 2025 1 commit
      Follow up to #10363 (#10647) · 9d6df908
      Daniel Hiltgen authored
      The quantization PR didn't block all unsupported file types, which this PR fixes. It also updates the API docs to reflect the now reduced set of supported types. A sketch of such an allow-list check follows this entry.
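      A minimal Go sketch of the kind of allow-list check this fix implies. The supportedTypes set and function name are assumptions for illustration, not the actual list documented in the API docs.

          package main

          import (
              "fmt"
              "slices"
          )

          // supportedTypes is an assumed allow-list of quantization types the API
          // still accepts after the reduction mentioned in the commit message.
          var supportedTypes = []string{"q4_K_M", "q4_K_S", "q8_0"}

          // validateQuantType rejects a requested quantization type up front instead
          // of letting an unsupported type fail later during model creation.
          func validateQuantType(requested string) error {
              if !slices.Contains(supportedTypes, requested) {
                  return fmt.Errorf("unsupported quantization type %q", requested)
              }
              return nil
          }

          func main() {
              fmt.Println(validateQuantType("q4_K_M")) // <nil>
              fmt.Println(validateQuantType("q2_K"))   // unsupported quantization type "q2_K"
          }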
  3. 06 May, 2025 1 commit
      Move quantization to new backend (#10363) · 42481045
      Daniel Hiltgen authored
      * Move quantization logic to GGML via new backend
      
      This moves the model-aware logic to Go code and calls GGML's quantization code for model creation. A sketch of this split follows this entry.
      
      * Remove "add model quantizations"
      
      This is no longer needed now that quantization is implemented in Go+GGML code directly.
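      A minimal Go sketch of the split this commit describes: the model-aware choice of target type lives in Go, while the per-block quantization math is delegated to GGML. The tensor names, type choices, and the quantizeTensor stand-in are assumptions for illustration, not the actual backend API.

          package main

          import (
              "fmt"
              "strings"
          )

          // quantizeTensor is a stand-in for the call into GGML's quantization
          // routines (reached from Go in the real backend); it is not the actual API.
          func quantizeTensor(name, targetType string) {
              fmt.Printf("quantizing %s as %s\n", name, targetType)
          }

          // targetTypeFor is the model-aware part kept in Go: certain tensors, such
          // as token embeddings and the output head, are commonly held at a higher
          // precision than the bulk of the weights. The rules here are illustrative.
          func targetTypeFor(name, requested string) string {
              if strings.Contains(name, "token_embd") || strings.HasPrefix(name, "output.") {
                  return "Q6_K" // assumed higher-precision choice for sensitive tensors
              }
              return requested
          }

          func main() {
              for _, name := range []string{"token_embd.weight", "blk.0.attn_q.weight", "output.weight"} {
                  quantizeTensor(name, targetTypeFor(name, "Q4_K_M"))
              }
          }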