"vscode:/vscode.git/clone" did not exist on "95a7832879a3ca7debd3f7a4ee05b08ddd19a8a7"
  1. 22 Aug, 2024 1 commit
    • Fix embeddings memory corruption (#6467) · 90ca8417
      Daniel Hiltgen authored
      * Fix embeddings memory corruption
      
      The patch was leading to a buffer overrun corruption.  Once removed, though, parallelism
      in server.cpp led to hitting an assert due to slot/seq IDs being >= token count.  To
      work around this, only use slot 0 for embeddings.
      
      * Fix embed integration test assumption
      
      The token eval count has changed with recent llama.cpp bumps (0.3.5+)
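The workaround described in the commit — routing every embedding request to a single slot — can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`pickSlot`, `free`); the actual slot bookkeeping lives in the runner and server.cpp:

```go
package main

import "fmt"

// pickSlot chooses a slot index for a request. As a workaround for the
// slot/seq-ID assert, embedding requests are always pinned to slot 0;
// generation requests may take any free slot. Names are illustrative.
func pickSlot(isEmbedding bool, free []int) int {
	if isEmbedding {
		return 0 // embeddings always use slot 0
	}
	if len(free) > 0 {
		return free[0]
	}
	return -1 // no slot available
}

func main() {
	fmt.Println(pickSlot(true, []int{2, 3}))  // embedding: always 0
	fmt.Println(pickSlot(false, []int{2, 3})) // generation: first free slot
}
```

Pinning embeddings to one slot trades parallelism for correctness until the underlying seq-ID handling is fixed upstream.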
  2. 11 Aug, 2024 1 commit
  3. 06 Aug, 2024 1 commit
  4. 05 Aug, 2024 1 commit
  5. 30 Jul, 2024 1 commit
  6. 29 Jul, 2024 1 commit
  7. 22 Jul, 2024 1 commit
    • Enable windows error dialog for subprocess startup · e12fff88
      Daniel Hiltgen authored
      Make sure that if something goes wrong spawning the process, the user gets
      enough info to be able to try to self-correct, or at least file a bug
      with details so we can fix it.  Once the process starts, we immediately
      switch back to the recommended setting to prevent the blocking dialog.
      This ensures that if the model fails to load (OOM, unsupported model type,
      etc.), the process will exit quickly and we can scan the stdout/stderr
      of the subprocess for the reason to report via the API.
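The scan-the-subprocess-output idea above can be sketched in Go. `runAndCapture` is a hypothetical helper, and the `sh -c` command stands in for a model runner that fails to load; the real code attaches pipes to a long-running process rather than waiting for exit:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// runAndCapture starts a subprocess and, if it exits with an error,
// returns the last line of its combined stdout/stderr as the likely
// failure reason to surface via the API.
func runAndCapture(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err != nil {
		lines := strings.Split(strings.TrimSpace(string(out)), "\n")
		return lines[len(lines)-1], err
	}
	return "", nil
}

func main() {
	// The shell command simulates a runner that dies during model load.
	reason, err := runAndCapture("sh", "-c", "echo model load failed: OOM >&2; exit 1")
	fmt.Println(reason, err != nil)
}
```

Returning only the last output line is a simplification; a real implementation would match known error patterns in the full log.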
  8. 15 Jul, 2024 1 commit
    • Introduce `/api/embed` endpoint supporting batch embedding (#5127) · b9f5e16c
      royjhan authored
      * Initial Batch Embedding
      
      * Revert "Initial Batch Embedding"
      
      This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
      
      * Initial Draft
      
      * mock up notes
      
      * api/embed draft
      
      * add server function
      
      * check normalization
      
      * clean up
      
      * normalization
      
      * playing around with truncate stuff
      
      * Truncation
      
      * Truncation
      
      * move normalization to go
      
      * Integration Test Template
      
      * Truncation Integration Tests
      
      * Clean up
      
      * use float32
      
      * move normalize
      
      * move normalize test
      
      * refactoring
      
      * integration float32
      
      * input handling and handler testing
      
      * Refactoring of legacy and new
      
      * clear comments
      
      * merge conflicts
      
      * touches
      
      * embedding type 64
      
      * merge conflicts
      
      * fix hanging on single string
      
      * refactoring
      
      * test values
      
      * set context length
      
      * clean up
      
      * testing clean up
      
      * testing clean up
      
      * remove function closure
      
      * Revert "remove function closure"
      
      This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
      
      * remove function closure
      
      * remove redundant error check
      
      * clean up
      
      * more clean up
      
      * clean up
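Several of the steps above ("normalization", "move normalize to go", "use float32") revolve around L2-normalizing each embedding vector on the Go side. A minimal sketch of that operation, not necessarily the exact code from the PR:

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales a vector to unit (L2) length, as the /api/embed
// endpoint does for returned embeddings. A zero vector is returned
// unchanged to avoid dividing by zero.
func normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	n := math.Sqrt(sum)
	if n == 0 {
		return v
	}
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = float32(float64(x) / n)
	}
	return out
}

func main() {
	fmt.Println(normalize([]float32{3, 4})) // [0.6 0.8]
}
```

Accumulating in float64 and converting back to float32 at the end keeps the per-element precision loss to a single rounding step.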
  9. 07 Jul, 2024 2 commits
  10. 05 Jul, 2024 1 commit
  11. 03 Jul, 2024 1 commit
  12. 29 Jun, 2024 1 commit
  13. 19 Jun, 2024 1 commit
  14. 14 Jun, 2024 1 commit
  15. 11 Jun, 2024 1 commit
  16. 09 Jun, 2024 1 commit
  17. 01 Jun, 2024 1 commit
  18. 29 May, 2024 3 commits
  19. 23 May, 2024 2 commits
  20. 20 May, 2024 1 commit
    • feat: add support for flash_attn (#4120) · e15307fd
      Sam authored
      * feat: enable flash attention if supported
      
      * feat: enable flash attention if supported
      
      * feat: enable flash attention if supported
      
      * feat: add flash_attn support
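The "enable flash attention if supported" gating can be sketched as a simple capability check. All names here (`shouldEnableFlashAttn` and its parameters) are hypothetical; the real conditions are decided between the Go server and llama.cpp:

```go
package main

import "fmt"

// shouldEnableFlashAttn enables the flag only when the user requested
// it and the backend can actually support it. Hypothetical sketch of
// the "if supported" logic, not the real check.
func shouldEnableFlashAttn(requested, gpuBacked, headDimSupported bool) bool {
	return requested && gpuBacked && headDimSupported
}

func main() {
	fmt.Println(shouldEnableFlashAttn(true, true, true))  // supported: enabled
	fmt.Println(shouldEnableFlashAttn(true, false, true)) // no GPU: disabled
}
```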
  21. 09 May, 2024 1 commit
  22. 04 May, 2024 1 commit
  23. 30 Apr, 2024 3 commits
  24. 17 Apr, 2024 1 commit
  25. 16 Apr, 2024 1 commit
  26. 01 Apr, 2024 2 commits
    • Apply 01-cache.diff · 0a0e9f3e
      Daniel Hiltgen authored
    • Switch back to subprocessing for llama.cpp · 58d95cc9
      Daniel Hiltgen authored
      This should resolve a number of memory leak and stability defects by allowing
      us to isolate llama.cpp in a separate process, shut it down when idle, and
      gracefully restart it if it has problems.  This also serves as a first step
      toward running multiple copies to support multiple models concurrently.
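The isolate-and-restart idea can be sketched with `os/exec`. `superviseOnce` is an illustrative name, and `true`/`false` stand in for the runner binary; the real server adds idle shutdown, health checks, and backoff:

```go
package main

import (
	"fmt"
	"os/exec"
)

// superviseOnce runs the runner subprocess to completion and reports
// whether the supervisor should restart it: a nonzero exit or crash
// means restart, a clean exit (e.g. idle shutdown) means stop.
func superviseOnce(name string, args ...string) (restart bool) {
	cmd := exec.Command(name, args...)
	if err := cmd.Run(); err != nil {
		return true // crashed or exited nonzero: restart
	}
	return false // clean exit: no restart needed
}

func main() {
	fmt.Println(superviseOnce("true"))  // clean exit: false
	fmt.Println(superviseOnce("false")) // nonzero exit: true
}
```

Running the model in a child process means a llama.cpp crash or leak takes down only the runner, which the parent can then respawn.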
  27. 26 Mar, 2024 1 commit
  28. 23 Mar, 2024 1 commit
  29. 16 Mar, 2024 1 commit
  30. 12 Mar, 2024 2 commits