1. 17 Jan, 2024 1 commit
  2. 11 Jan, 2024 6 commits
      Fix up the CPU fallback selection · 7427fa13
      Daniel Hiltgen authored
      The memory changes and the multi-variant change had some merge
      glitches I missed. This fixes them so we actually get the CPU LLM
      library and the best variant for the given system.
      fix typo · d2be6387
      Michael Yang authored
      use x/exp/slices · defc1dbd
      Michael Yang authored
      Always dynamically load the llm server library · 39928a42
      Daniel Hiltgen authored
      This switches darwin to dynamic loading and refactors the code, now that
      the library is no longer statically linked on any platform.
      Build multiple CPU variants and pick the best · d88c527b
      Daniel Hiltgen authored
      This reduces the built-in Linux version to not use any vector extensions,
      which enables the resulting builds to run under Rosetta on macOS in
      Docker. At runtime, it then checks for the actual CPU vector
      extensions and loads the best CPU library available.
      Support multiple variants for a given llm lib type · 8da7bef0
      Daniel Hiltgen authored
      In some cases we may want multiple variants for a given GPU type or CPU.
      This adds an optional Variant, which we can use to select an optimal
      library and which also lets us try multiple variants in case some
      fail to load.

      This can be useful for scenarios such as the ROCm v5 vs v6
      incompatibility, or potentially CPU features.