  1. 17 Dec, 2025 1 commit
    • GGML update to ec98e2002 (#13451) · 49a9c9ba
      Daniel Hiltgen authored
      * Revert "add support for NVIDIA Nemotron 3 Nano"
      
      This reverts commit e7d2ae9d69421012e9a8765c06a3fdf0e45b12f3.
      
      * GGML update to 380b4c984
      
      Remove MaskBatchPadding as GGML_KQ_MASK_PAD is no longer present (no
      padding required)
      
      * update to c45f89d55
      
      * ec98e2002
      
      solar pro needed more adjusting - needs verification
      
      * review comments
  2. 10 Dec, 2025 1 commit
  3. 04 Dec, 2025 1 commit
    • ggml update to b7108 (#12992) · 0cf7794b
      Daniel Hiltgen authored
      * Revert "vulkan: temporary cary of vulkan fixes (#12971)"
      
      This reverts commit 3a9e8e9f.
      
      * ggml update to b7087
      
      * fix argsort on metal
      
      * update to b7108
      
      * fix bakllava regression
      
      This model lacks the metadata for the projector type.
      
      * update to b7209
      
      * fix TopK perf
      
      * only build arm code on arm
  4. 06 Nov, 2025 2 commits
  5. 30 Oct, 2025 1 commit
    • ggml: Enable op_offload to improve partial offload performance · afaf7ce8
      Jesse Gross authored
      When a model is partially offloaded to system RAM, we can either
      do the calculations on the CPU or we can temporarily transfer the
      data to the GPU to do the calculations there. Small batches tend
      to be better on the CPU, large batches on the GPU.
      
      The llamarunner used the GPU in most cases and the ollamarunner
      used the CPU. Although the ollamarunner saw an improvement in
      token generation performance, there was a large performance hit
      in prompt processing (3-10x).
      
      There is an existing heuristic to dynamically switch between these
      two modes but in practice it doesn't have enough information to
      accurately make that decision. This adds authoritative data to make
      the check work to get the best of both worlds.
      
      Fixes #12037
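
The trade-off described in this commit message (small batches run better on the CPU, large batches on the GPU) can be illustrated with a minimal Go sketch. This is not the code from the commit: `Device`, `chooseDevice`, and the threshold of 32 are hypothetical names and values chosen only for illustration, and the real check relies on information the message does not spell out.

```go
package main

import "fmt"

// Device identifies where an operation on weights held in system RAM runs.
// Hypothetical type, not taken from the commit.
type Device int

const (
	CPU Device = iota
	GPU
)

// String returns a printable name for the device.
func (d Device) String() string {
	if d == GPU {
		return "GPU"
	}
	return "CPU"
}

// chooseDevice is a hypothetical helper: it sends a batch to the GPU only
// when the batch is large enough that the cost of temporarily copying the
// weights is likely to be amortized. minOffloadBatch is an assumed tunable.
func chooseDevice(batchSize, minOffloadBatch int) Device {
	if batchSize >= minOffloadBatch {
		return GPU
	}
	return CPU
}

func main() {
	const minOffloadBatch = 32 // assumed threshold, for illustration only

	// Token generation typically uses a batch of 1; prompt processing
	// uses much larger batches.
	for _, batch := range []int{1, 512} {
		fmt.Printf("batch=%d -> %s\n", batch, chooseDevice(batch, minOffloadBatch))
	}
}
```

With a check of this shape, single-token generation stays on the CPU while large prompt batches are computed on the GPU, which is the "best of both worlds" behavior the message describes.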
      afaf7ce8