  1. 17 Apr, 2024 1 commit
  2. 16 Apr, 2024 1 commit
  3. 01 Apr, 2024 2 commits
    • Apply 01-cache.diff · 0a0e9f3e
      Daniel Hiltgen authored
    • Switch back to subprocessing for llama.cpp · 58d95cc9
      Daniel Hiltgen authored
      This should resolve a number of memory-leak and stability defects by
      isolating llama.cpp in a separate process that we can shut down when
      idle and restart gracefully if it misbehaves. It also serves as a
      first step toward running multiple copies to support multiple models
      concurrently. A sketch of the lifecycle follows below.
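      A minimal sketch of the subprocess lifecycle this commit describes,
      assuming hypothetical names (llamaServer, start, reapWhenIdle); this is
      not Ollama's actual API, just an illustration of the isolate-and-reap
      pattern.

      ```go
      // Hypothetical sketch: run llama.cpp as a child process so crashes and
      // leaks are isolated, and reap it once it has been idle for a while.
      package llm

      import (
      	"context"
      	"os/exec"
      	"sync"
      	"time"
      )

      type llamaServer struct {
      	mu       sync.Mutex
      	cmd      *exec.Cmd
      	lastUsed time.Time
      }

      // start launches the llama.cpp binary as a subprocess; if it dies, only
      // that process is lost and the caller can start a fresh one.
      func start(ctx context.Context, bin string, args ...string) (*llamaServer, error) {
      	cmd := exec.CommandContext(ctx, bin, args...)
      	if err := cmd.Start(); err != nil {
      		return nil, err
      	}
      	return &llamaServer{cmd: cmd, lastUsed: time.Now()}, nil
      }

      // touch records activity so the idle reaper does not kill a busy server.
      func (s *llamaServer) touch() {
      	s.mu.Lock()
      	s.lastUsed = time.Now()
      	s.mu.Unlock()
      }

      // reapWhenIdle polls and shuts the subprocess down after idleTimeout of
      // inactivity; the next request simply starts a new one.
      func (s *llamaServer) reapWhenIdle(idleTimeout time.Duration) {
      	for {
      		time.Sleep(idleTimeout / 4)
      		s.mu.Lock()
      		idle := time.Since(s.lastUsed) > idleTimeout
      		s.mu.Unlock()
      		if idle {
      			s.cmd.Process.Kill()
      			s.cmd.Wait()
      			return
      		}
      	}
      }
      ```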
  4. 26 Mar, 2024 1 commit
  5. 23 Mar, 2024 1 commit
  6. 16 Mar, 2024 1 commit
  7. 12 Mar, 2024 3 commits
  8. 11 Mar, 2024 2 commits
  9. 09 Mar, 2024 1 commit
  10. 08 Mar, 2024 1 commit
  11. 01 Mar, 2024 1 commit
  12. 20 Feb, 2024 2 commits
  13. 14 Feb, 2024 1 commit
  14. 09 Feb, 2024 1 commit
    • Shutdown faster · 66807615
      Daniel Hiltgen authored
      Make sure that when a shutdown signal arrives, we shut down promptly
      instead of waiting for a potentially long exchange to wrap up. A sketch
      of the signal handling follows below.
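      A minimal sketch of prompt shutdown on a signal, assuming a stand-in for
      the long-running exchange; the structure (signal.NotifyContext plus a
      select on ctx.Done) is standard Go, not Ollama's actual handler.

      ```go
      // Hypothetical sketch: cancel in-flight work as soon as a shutdown
      // signal arrives rather than letting a long exchange run to completion.
      package main

      import (
      	"context"
      	"fmt"
      	"os"
      	"os/signal"
      	"syscall"
      	"time"
      )

      func main() {
      	// ctx is cancelled the moment SIGINT or SIGTERM is received.
      	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
      	defer stop()

      	// A long exchange races against cancellation instead of blocking it.
      	select {
      	case <-time.After(10 * time.Minute): // stands in for a slow generation
      		fmt.Println("exchange finished")
      	case <-ctx.Done():
      		fmt.Println("shutdown signal received, aborting exchange")
      	}
      }
      ```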
  15. 31 Jan, 2024 1 commit
  16. 22 Jan, 2024 1 commit
  17. 21 Jan, 2024 1 commit
  18. 17 Jan, 2024 1 commit
  19. 14 Jan, 2024 1 commit
  20. 11 Jan, 2024 1 commit
    • Support multiple variants for a given llm lib type · 8da7bef0
      Daniel Hiltgen authored
      In some cases we may want multiple variants for a given GPU type or
      CPU. This adds an optional Variant that we can use to select the
      optimal library, and it also lets us try the variants in turn when
      some fail to load.

      This is useful for scenarios such as the ROCm v5 vs. v6
      incompatibility, and potentially for CPU feature levels. A sketch of
      the fallback follows below.
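      A minimal sketch of variant fallback, assuming hypothetical names
      (tryVariants, loadLibrary) and an illustrative preference list; the
      real selection logic lives in Ollama's llm package.

      ```go
      // Hypothetical sketch: try library variants in preference order and
      // keep the first one that loads, e.g. rocm_v6 before rocm_v5 before
      // CPU feature levels.
      package llm

      import (
      	"fmt"
      	"log"
      )

      // tryVariants returns the first variant whose library loads, falling
      // through to the next candidate when one fails (e.g. a ROCm version
      // mismatch or a missing CPU feature).
      func tryVariants(variants []string) (string, error) {
      	for _, v := range variants {
      		if err := loadLibrary(v); err != nil {
      			log.Printf("variant %s failed to load: %v", v, err)
      			continue
      		}
      		return v, nil
      	}
      	return "", fmt.Errorf("no usable llm library variant found")
      }

      // loadLibrary is a stub standing in for dlopen-style loading of the
      // variant's shared library and verification of its symbols.
      func loadLibrary(variant string) error {
      	return nil
      }
      ```

      Usage would look like tryVariants([]string{"rocm_v6", "rocm_v5",
      "cpu_avx2", "cpu"}), with the list ordered from most to least
      preferred.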
  21. 10 Jan, 2024 1 commit
  22. 07 Jan, 2024 1 commit
  23. 04 Jan, 2024 1 commit