"test/torchaudio_unittest/transforms_test.py" did not exist on "55f61d1524d2b51bdcb3be51404abc91609c9665"
  1. 12 Aug, 2025 1 commit
    • ggml: Use ordinal IDs for AMD GPUs on Linux when UUID is unavailable · a343ae53
      Jesse Gross authored
      Some AMD GPUs do not provide UUIDs and report only "XX". In these
      cases, we use the ordinal ID as an alternate identifier, just as we
      already have to do on Windows for AMD.
      
      In addition, this prints out the ID for each GPU when enumerating
      them for easier debugging in the future.
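
      A minimal Go sketch of that fallback; the function and the example
      UUIDs below are illustrative, not the actual ggml patch:

      package main

      import "fmt"

      // deviceID returns a stable identifier for a GPU. Some AMD GPUs on
      // Linux report a placeholder UUID such as "XX", so we fall back to
      // the ordinal (enumeration index), as is already done on Windows.
      func deviceID(uuid string, ordinal int) string {
          if uuid == "" || uuid == "XX" {
              return fmt.Sprintf("%d", ordinal)
          }
          return uuid
      }

      func main() {
          // Print the ID of each enumerated GPU for easier debugging.
          uuids := []string{"GPU-7a5b0f5c-1111-2222-3333-444455556666", "XX"}
          for i, uuid := range uuids {
              fmt.Printf("gpu %d: id=%s\n", i, deviceID(uuid, i))
          }
      }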
  2. 08 Aug, 2025 1 commit
    • ggml: No-alloc mode · 79f6376f
      Jesse Gross authored
      Callers can set a backend buffer type to be no-alloc, meaning that
      it does not allocate memory for tensors or operations. This can
      be used for calculating memory requirements. Tensors and graphs
      must be recreated with no-alloc set to false before loading data.
      
      Defaults to false for newly created backend buffer types.
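
      A hedged sketch of the two-pass flow this enables; the Buffer type
      and Alloc method are invented for illustration, not the ggml API:

      package main

      import "fmt"

      // Buffer models a backend buffer type that can run in no-alloc
      // mode: it tracks sizes without reserving real memory.
      type Buffer struct {
          NoAlloc bool
          size    uint64
      }

      // Alloc reserves memory, or, in no-alloc mode, only records how
      // much would be needed.
      func (b *Buffer) Alloc(n uint64) {
          b.size += n
          if b.NoAlloc {
              return // measuring only; nothing is actually allocated
          }
          // ... allocate n bytes on the device here ...
      }

      func main() {
          // Pass 1: measure requirements without touching device memory.
          meas := &Buffer{NoAlloc: true}
          meas.Alloc(1 << 20)
          fmt.Printf("graph needs %d bytes\n", meas.size)

          // Pass 2: recreate tensors and graphs with no-alloc off before
          // loading data, now that the requirement is known.
          buf := &Buffer{}
          buf.Alloc(meas.size)
      }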
  3. 05 Aug, 2025 1 commit
    • gpt-oss (#11672) · fa7776fd
      Michael Yang authored
      
      
      * bf16
      
      * tests
      
      * gpt-oss
      
      * enable gptoss for engine
      
      * rough estimate
      
      * convert to mxfp4
      
      * handle safetensors U8
      
      * clamp glu/linear
      
      * update tokenizer
      
      * MXFP4 support
      
      This implements the Open Compute Microscaling (MX) FP4 format
      as a tensor type, with backend implementations focusing on
      mul_mat and mul_mat_id on CPU, CUDA, and Metal (see the decoding
      sketch after this commit message).
      
      * Unit tests for MXFP4 support
      
      This exercises various operations and shapes on both CPU and GPU
      (if one is detected on the system).
      
      * cuda graph
      
      * unit test adjustments
      
      * cuda: optimize memory access
      
      Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4
      
      * mac: fix crash on old macos versions
      
      cblas_sgemm is only supported on macOS v13.3 and up; however, bf16
      is only supported on v14+, so we were falling back to ggml-blas and
      crashing on bf16 tensors. Checking whether the function is null
      seems to be the simplest way to conditionally avoid registering the
      backend.
      
      * server: Minimum context length for gptoss
      
      This model requires a minimum context length of 8192 to function
      effectively. Users can set higher values through all the normal
      mechanisms, but lower values will be silently reset to the minimum.
      
      * ggml: Multiply by numParallel for gptoss sliding window
      
      When computing the graph size estimate, the context size is already
      multiplied by numParallel, so estimates reflect that. However, since
      sliding window models use a smaller, fixed context size, they need
      to take numParallel into account manually.
      
      * gpt-oss integration
      
      includes harmony parser and thinking levels, etc.
      
      * fix sync
      
      * fix tests
      
      * fix lint
      
      ---------
      Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
      Co-authored-by: Jesse Gross <jesse@ollama.com>
      Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
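
      For reference, the MX spec packs FP4 (E2M1) elements in blocks of 32
      that share one E8M0 scale byte, i.e. 17 bytes per block. Below is a
      hedged Go sketch of decoding one block; the layout (scale byte plus
      16 packed bytes, low nibble first) follows the OCP MX specification
      and is not necessarily this patch's internal representation:

      package main

      import (
          "fmt"
          "math"
      )

      // e2m1 holds the eight non-negative FP4 (E2M1) magnitudes; bit 3 of
      // a nibble is the sign.
      var e2m1 = [8]float32{0, 0.5, 1, 1.5, 2, 3, 4, 6}

      // decodeMXFP4 expands one MX block: a shared E8M0 scale byte
      // followed by 16 bytes holding 32 4-bit elements.
      func decodeMXFP4(scale byte, packed [16]byte) [32]float32 {
          // E8M0 is an unsigned exponent with bias 127: scale = 2^(e-127).
          s := float32(math.Pow(2, float64(scale)-127))
          var out [32]float32
          for i, b := range packed {
              for j, nib := range [2]byte{b & 0x0F, b >> 4} {
                  v := e2m1[nib&0x7]
                  if nib&0x8 != 0 {
                      v = -v
                  }
                  out[2*i+j] = v * s
              }
          }
          return out
      }

      func main() {
          var packed [16]byte
          packed[0] = 0x2 | 0x9<<4 // element 0: +1.0, element 1: -0.5
          out := decodeMXFP4(127, packed) // scale byte 127 means 2^0 = 1
          fmt.Println(out[:2])            // [1 -0.5]
      }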
  4. 30 Jul, 2025 1 commit
  5. 29 Jul, 2025 1 commit
  6. 09 Jul, 2025 1 commit
    • ggml: Report ordinal IDs for AMD GPUs on Windows · 35fda7b4
      Jesse Gross authored
      We don't get valid UUIDs for AMD GPUs on Windows, so the best option
      is to use the ordinal IDs. This brings us in line with what we
      currently do on the Ollama server - the only exception is AMD GPUs
      on Linux, where the server falls back to ordinal IDs when UUIDs are
      missing. The GGML implementation has no such fallback, but that case
      doesn't appear to occur for any of the GPUs that we support.
      
      It's also possible that there are collisions between ordinal IDs
      from different libraries; however, the only places where we use them
      are AMD on Windows and Metal on Mac, which can never occur on the
      same system.
  7. 26 Jun, 2025 1 commit
  8. 23 Jun, 2025 1 commit
    • Re-remove cuda v11 (#10694) · 1c6669e6
      Daniel Hiltgen authored
      * Re-remove cuda v11
      
      Revert the revert: drop CUDA v11 support, so drivers newer than
      Feb 2023 are now required.
      
      This reverts commit c6bcdc42.
      
      * Simplify layout
      
      With only one version of the GPU libraries, we can simplify things
      somewhat. (Jetsons still require special handling.)
      
      * distinct sbsa variant for linux arm64
      
      This avoids accidentally trying to load the sbsa cuda libraries on
      a Jetson system, which results in crashes.
      
      * temporarily prevent rocm+cuda mixed loading
  9. 18 Jun, 2025 2 commits
  10. 29 May, 2025 1 commit
    • ggml: Export GPU UUIDs · aaa78180
      Jesse Gross authored
      This enables matching up devices and information reported by the backend
      with system management libraries such as nvml to get accurate free
      memory reporting.
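
      As an illustration, a hedged sketch of the matching on the NVML side
      using the github.com/NVIDIA/go-nvml bindings, assuming the backend
      has exported the device's UUID:

      package main

      import (
          "fmt"
          "log"

          "github.com/NVIDIA/go-nvml/pkg/nvml"
      )

      // freeMemoryByUUID matches a UUID exported by the backend to an
      // NVML device and returns its current free memory in bytes.
      func freeMemoryByUUID(uuid string) (uint64, error) {
          dev, ret := nvml.DeviceGetHandleByUUID(uuid)
          if ret != nvml.SUCCESS {
              return 0, fmt.Errorf("no device with UUID %s: %s", uuid, nvml.ErrorString(ret))
          }
          mem, ret := dev.GetMemoryInfo()
          if ret != nvml.SUCCESS {
              return 0, fmt.Errorf("memory query failed: %s", nvml.ErrorString(ret))
          }
          return mem.Free, nil
      }

      func main() {
          if ret := nvml.Init(); ret != nvml.SUCCESS {
              log.Fatalf("nvml init: %s", nvml.ErrorString(ret))
          }
          defer nvml.Shutdown()

          free, err := freeMemoryByUUID("GPU-00000000-0000-0000-0000-000000000000")
          if err != nil {
              log.Fatal(err)
          }
          fmt.Printf("free: %d bytes\n", free)
      }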
  11. 23 May, 2025 1 commit
  12. 22 May, 2025 1 commit
    • ggml: Report graph memory for failed allocations · 6db8a377
      Jesse Gross authored
      GGML has a function to report the allocated size of a backend buffer.
      However, it returns 0 if we tried to allocate a buffer and the
      allocation failed. For memory management purposes, it's important to
      know how much we were trying to allocate. This extends the API to
      report the attempted size for every buffer and whether the
      allocation succeeded.
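
      A minimal sketch of the extended reporting; the BufferStatus type is
      illustrative, not the actual API:

      package main

      import "fmt"

      // BufferStatus records what we tried to allocate for one backend
      // buffer. The allocated size alone reads as 0 on failure, hiding
      // the attempted amount; keeping Size and Allocated separately
      // preserves it.
      type BufferStatus struct {
          Size      uint64 // bytes requested for this buffer
          Allocated bool   // false if the allocation failed
      }

      func main() {
          buffers := []BufferStatus{
              {Size: 512 << 20, Allocated: true},
              {Size: 2 << 30, Allocated: false}, // failed, size still known
          }
          var attempted uint64
          ok := true
          for _, b := range buffers {
              attempted += b.Size
              ok = ok && b.Allocated
          }
          fmt.Printf("attempted %d bytes, success=%v\n", attempted, ok)
      }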
  13. 20 May, 2025 1 commit
  14. 16 May, 2025 1 commit
  15. 14 May, 2025 2 commits
  16. 13 May, 2025 3 commits
  17. 12 May, 2025 1 commit
  18. 10 May, 2025 1 commit
  19. 08 May, 2025 1 commit
  20. 06 May, 2025 1 commit
    • Move quantization to new backend (#10363) · 42481045
      Daniel Hiltgen authored
      * Move quantization logic to GGML via new backend
      
      This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
      
      * Remove "add model quantizations"
      
      This is no longer needed now that quantization is implemented in Go+GGML code directly.
  21. 05 May, 2025 2 commits
  22. 02 May, 2025 2 commits
  23. 25 Apr, 2025 1 commit
  24. 24 Apr, 2025 1 commit
  25. 17 Apr, 2025 1 commit
  26. 16 Apr, 2025 1 commit
  27. 15 Apr, 2025 1 commit
  28. 03 Apr, 2025 1 commit
    • model: support for mistral-small in the ollama runner · 6bd0a983
      Bruce MacDonald authored
      Mistral is a popular research lab making open-source models. This
      updates the forward pass of llama-architecture models to support
      both llama and mistral models by accounting for the additional
      metadata present in mistral models and by finding the correct
      dimensions for the output projection.
  29. 31 Mar, 2025 1 commit
    • runner: clear cache when shift is not possible (#9433) · 66b25392
      Bruce MacDonald authored
      Clear the KV cache when the shift operation is not supported by the
      model. Added a KvCacheCanShift() check to handle models that can't
      perform cache shifts, falling back to a full cache clear while
      preserving the logical token history to maintain the expected
      behavior when the context window fills up.
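
      A hedged sketch of that decision; Cache and its fields are toy
      stand-ins, not the runner's real types:

      package main

      import "fmt"

      // Cache is a toy stand-in for the runner's KV cache plus its
      // logical token history.
      type Cache struct {
          tokens   []int // logical history, kept even when cache data is dropped
          canShift bool  // result of the KvCacheCanShift()-style check
      }

      // makeRoom frees `discard` slots at the front of a full context window.
      func (c *Cache) makeRoom(discard int) {
          if c.canShift {
              // The model supports shifting: slide the entries left.
              c.tokens = c.tokens[discard:]
              return
          }
          // No shift support: clear the cache entirely, but keep the
          // recent logical history so it can be re-decoded and the
          // visible behavior matches a shift.
          keep := append([]int(nil), c.tokens[discard:]...)
          c.tokens = keep
          // ... re-decode `keep` into the now-empty cache ...
      }

      func main() {
          c := &Cache{tokens: []int{1, 2, 3, 4, 5, 6}, canShift: false}
          c.makeRoom(2)
          fmt.Println(c.tokens) // [3 4 5 6]
      }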
  30. 27 Mar, 2025 1 commit
  31. 15 Mar, 2025 1 commit
  32. 11 Mar, 2025 1 commit
  33. 10 Mar, 2025 1 commit
  34. 07 Mar, 2025 1 commit