1. 22 Aug, 2024 1 commit
    • Fix embeddings memory corruption (#6467) · 90ca8417
      Daniel Hiltgen authored
      * Fix embeddings memory corruption
      
      The patch was leading to a buffer overrun corruption.  Once removed, though, parallelism
      in server.cpp led to hitting an assert because slot/seq IDs could be >= the token count.  To
      work around this, only slot 0 is used for embeddings.
      
      * Fix embed integration test assumption
      
      The token eval count has changed with recent llama.cpp bumps (0.3.5+).
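
      The slot-pinning workaround described above can be sketched as follows. This is an illustrative Go sketch, not the actual server.cpp code; `Slot` and `pickSlot` are hypothetical names.

      ```go
      package main

      import "fmt"

      // Slot models a hypothetical server slot.
      type Slot struct {
      	ID int
      }

      // pickSlot sketches the workaround: embedding requests are pinned to
      // slot 0 so their slot/seq ID can never reach the token count, while
      // generation requests may use any free slot (simplified here to the
      // last one).
      func pickSlot(slots []Slot, isEmbedding bool) Slot {
      	if isEmbedding {
      		return slots[0]
      	}
      	return slots[len(slots)-1]
      }

      func main() {
      	slots := []Slot{{0}, {1}, {2}, {3}}
      	fmt.Println(pickSlot(slots, true).ID)  // embeddings always use slot 0
      	fmt.Println(pickSlot(slots, false).ID) // generation may use other slots
      }
      ```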
  2. 21 Aug, 2024 1 commit
  3. 20 Aug, 2024 1 commit
    • Split rocm back out of bundle (#6432) · a017cf2f
      Daniel Hiltgen authored
      We're over budget for GitHub's maximum release artifact size with ROCm plus two CUDA
      versions.  This splits ROCm back out as a discrete artifact but keeps the layout, so it can
      be extracted into the same location as the main bundle.
  4. 19 Aug, 2024 6 commits
  5. 12 Aug, 2024 1 commit
  6. 11 Aug, 2024 2 commits
  7. 07 Aug, 2024 1 commit
  8. 06 Aug, 2024 1 commit
  9. 05 Aug, 2024 4 commits
  10. 02 Aug, 2024 1 commit
  11. 31 Jul, 2024 5 commits
  12. 30 Jul, 2024 1 commit
  13. 29 Jul, 2024 1 commit
  14. 27 Jul, 2024 1 commit
  15. 26 Jul, 2024 1 commit
  16. 25 Jul, 2024 1 commit
  17. 24 Jul, 2024 1 commit
  18. 22 Jul, 2024 6 commits
  19. 21 Jul, 2024 1 commit
  20. 20 Jul, 2024 2 commits
    • Adjust windows ROCm discovery · 283948c8
      Daniel Hiltgen authored
      The v5 HIP library returns unsupported GPUs that won't enumerate at
      inference time in the runner, so this makes sure discovery is aligned.  The
      gfx906 cards are no longer supported, so we shouldn't compile with that
      GPU type since it won't enumerate at runtime.
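      The discovery alignment above amounts to filtering detected devices against the gfx targets the runner is actually built for. A minimal Go sketch, assuming hypothetical `supportedGfx` and `filterGPUs` names (not ollama's real discovery code):

      ```go
      package main

      import "fmt"

      // supportedGfx lists gfx targets the runner is compiled for; gfx906
      // is deliberately absent since it no longer enumerates at runtime.
      // The specific entries here are illustrative assumptions.
      var supportedGfx = map[string]bool{
      	"gfx1030": true,
      	"gfx1100": true,
      }

      // filterGPUs keeps only devices the runner can actually use, so the
      // discovery phase agrees with what enumerates at inference time.
      func filterGPUs(detected []string) []string {
      	var usable []string
      	for _, gfx := range detected {
      		if supportedGfx[gfx] {
      			usable = append(usable, gfx)
      		}
      	}
      	return usable
      }

      func main() {
      	fmt.Println(filterGPUs([]string{"gfx906", "gfx1030"})) // gfx906 is dropped
      }
      ```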
    • add patch for tekken (#5807) · 1475eab9
      Jeffrey Morgan authored
  21. 16 Jul, 2024 1 commit