1. 23 Dec, 2024 3 commits
  2. 22 Dec, 2024 1 commit
  3. 20 Dec, 2024 2 commits
  4. 19 Dec, 2024 1 commit
  5. 18 Dec, 2024 1 commit
  6. 17 Dec, 2024 6 commits
    • llama: Ensure KV cache is fully defragmented. · 08a832b4
      Jesse Gross authored
      Sometimes the KV cache requires defragmentation even without
      triggering the threshold heuristic. In this case, decoding
      will not be able to find a KV cache slot. This is particularly
      difficult for the caller to handle if it happens in between
      ubatches. To avoid this, we should immediately trigger a defrag.
      
      In addition, a heavily fragmented cache can require more than
      max_moves to defragment. Currently, we stop when we hit the limit
      but this can leave a cache that still does not have adequate space
      even after defragmentation is triggered. Instead, we should do
      multiple batches of processing until everything is complete.
      
      Fixes #7949
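The "multiple batches until everything is complete" idea from the commit above can be illustrated with a toy model. This is only a sketch under stated assumptions: the cache is modeled as a slice of integers where 0 means an empty cell, and `defragPass`, `defragAll`, and `maxMoves` are hypothetical names for illustration, not the functions changed by the commit.

```go
package main

import "fmt"

// defragPass is a toy stand-in for one bounded defrag batch. A cell value of
// 0 means "empty". It fills holes near the front with occupied cells taken
// from the back, stops after maxMoves moves, and returns the moves made.
func defragPass(cells []int, maxMoves int) int {
	moves := 0
	hole, tail := 0, len(cells)-1
	for moves < maxMoves {
		for hole < len(cells) && cells[hole] != 0 {
			hole++
		}
		for tail >= 0 && cells[tail] == 0 {
			tail--
		}
		if hole >= tail {
			break // no hole left in front of an occupied cell
		}
		cells[hole], cells[tail] = cells[tail], 0
		moves++
	}
	return moves
}

// defragAll repeats bounded passes until a pass has nothing left to move,
// and returns how many passes did work.
func defragAll(cells []int, maxMoves int) int {
	passes := 0
	for defragPass(cells, maxMoves) > 0 {
		passes++
	}
	return passes
}

func main() {
	cells := []int{1, 0, 2, 0, 0, 3, 0, 4, 5}
	passes := defragAll(cells, 2)
	fmt.Println(passes, cells)
}
```

A single pass with a limit of 2 would leave one hole in the middle of this example; looping until a pass makes no moves is what guarantees a contiguous free region at the end.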
    • llm: do not error on "null" format (#8139) · 2ddc32d5
      Blake Mizerany authored
      This fixes another regression in the previous commit that fixed other
      known bugs.
    • llm: do not silently fail for supplied, but invalid formats (#8130) · 87f0a49f
      Blake Mizerany authored
      Changes in #8002 introduced fixes for bugs with mangling JSON Schemas.
      It also fixed a bug where the server would silently fail when clients
      requested invalid formats. It also, unfortunately, introduced a bug
      where the server would reject requests with an empty format, which
      should be allowed.
      
      The change in #8127 updated the code to allow the empty format, but also
      reintroduced the regression where the server would silently fail when
      the format was set, but invalid.
      
      This commit fixes both regressions. The server does not reject the empty
      format, but it does reject invalid formats. It also adds tests to help
      us catch regressions in the future.
      
      Also, the updated code provides a more detailed error message when a
      client sends a non-empty, but invalid format, echoing the invalid format
      in the response.
      
      This commit also takes the opportunity to remove superfluous linter
      checks.
    • darwin: restore multiple runners for x86 (#8125) · 8f805dd7
      Daniel Hiltgen authored
      In 0.5.2 we simplified packaging to ship AVX only for macOS x86. It looks like
      there may still be some non-AVX systems out there, so this restores the prior
      logic of building no-AVX for the primary binary, and now 2 runners for AVX and AVX2.
      These will be packaged in the App bundle only, so the stand-alone binary will now be
      without AVX support on macOS. On arm, we'll also see these runners reported
      as available in the log, but they're dormant and will never be used at runtime.
  7. 16 Dec, 2024 2 commits
  8. 15 Dec, 2024 1 commit
  9. 14 Dec, 2024 2 commits
  10. 13 Dec, 2024 2 commits
  11. 12 Dec, 2024 2 commits
  12. 11 Dec, 2024 10 commits
  13. 10 Dec, 2024 7 commits
    • frob's avatar
      757eeacc
    • Remove unused runner CpuFeatures (#8032) · b9ccb374
      Daniel Hiltgen authored
      The final implementation of #7499 removed dynamic vector requirements
      in favor of a simpler filename-based model, and this was leftover logic that
      is no longer needed.
    • build: fix typo in override variable (#8031) · 82a02e18
      Daniel Hiltgen authored
      The "F" was missing.
    • build: Make target improvements (#7499) · 4879a234
      Daniel Hiltgen authored
      * llama: wire up builtin runner
      
      This adds a new entrypoint into the ollama CLI to run the cgo built runner.
      On Mac arm64, this will have GPU support, but on all other platforms it will
      be the lowest common denominator CPU build. After we fully transition
      to the new Go runners, more tech debt can be removed and we can stop building
      the "default" runner via make and always rely on the builtin.
      
      * build: Make target improvements
      
      Add a few new targets and help for building locally.
      This also adjusts the runner lookup to favor local builds, then
      runners relative to the executable, and finally payloads.
      
      * Support customized CPU flags for runners
      
      This implements a simplified custom CPU flags pattern for the runners.
      When built without overrides, the runner name contains the vector flag
      we check for (AVX) to ensure we don't try to run on unsupported systems
      and crash.  If the user builds a customized set, we omit the naming
      scheme and don't check for compatibility.  This avoids checking
      requirements at runtime, so that logic has been removed as well.  This
      can be used to build GPU runners with no vector flags, or CPU/GPU
      runners with additional flags (e.g. AVX512) enabled.
      
      * Use relative paths
      
      If the user checks out the repo in a path that contains spaces, make gets
      really confused, so use relative paths for everything in-repo to avoid breakage.
      
      * Remove payloads from main binary
      
      * install: clean up prior libraries
      
      This removes support for v0.3.6 and older versions (before the tar bundle)
      and ensures we clean up prior libraries before extracting the bundle(s).
      Without this change, runners and dependent libraries could leak when we
      update and lead to subtle runtime errors.
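The filename-based compatibility check described under "Support customized CPU flags for runners" could look roughly like the sketch below. The naming convention (a trailing `_avx` or `_avx2` token), the example runner names, and the functions `requiredFlag` and `compatible` are assumptions made for illustration; the commit message only states that the runner name contains the vector flag and that customized builds omit it and skip the check.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredFlag returns the vector flag encoded as the last "_"-separated
// token of a runner name, or "" when the name carries no known flag.
// The set of known flags is an assumption for this sketch.
func requiredFlag(name string) string {
	known := map[string]bool{"avx": true, "avx2": true}
	parts := strings.Split(strings.ToLower(name), "_")
	last := parts[len(parts)-1]
	if len(parts) > 1 && known[last] {
		return last
	}
	return ""
}

// compatible reports whether a runner may be used on a CPU with the given
// feature set. Names without a flag (e.g. custom builds) are not checked.
func compatible(name string, cpuFlags map[string]bool) bool {
	flag := requiredFlag(name)
	return flag == "" || cpuFlags[flag]
}

func main() {
	cpu := map[string]bool{"avx": true} // hypothetical host: AVX but no AVX2
	for _, name := range []string{"cpu", "cpu_avx", "cpu_avx2", "custom"} {
		fmt.Printf("%-10s compatible=%v\n", name, compatible(name, cpu))
	}
}
```

Encoding the requirement in the name means no feature metadata has to be read from the runner itself before deciding whether it is safe to launch.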