- 22 Aug, 2024 1 commit
Daniel Hiltgen authored
* Fix embeddings memory corruption: the patch was causing a buffer overrun. Once it was removed, however, parallelism in server.cpp hit an assert because slot/seq IDs could be >= the token count. To work around this, only slot 0 is used for embeddings.
* Fix embed integration test assumption: the token eval count has changed with recent llama.cpp bumps (0.3.5+).
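For illustration, a minimal sketch of the slot-0 workaround described above; the names (`slot`, `select_slot`) are hypothetical stand-ins, not the actual server.cpp code:

```cpp
// Hypothetical sketch: route every embedding request to slot 0 instead of
// picking a free parallel slot, so its slot/seq ID can never exceed the
// token count of a small embedding batch.
#include <vector>

struct slot { int id; bool busy; };

// assumption: the server keeps one slot per parallel sequence
slot *select_slot(std::vector<slot> &slots, bool is_embedding) {
    if (is_embedding) {
        return &slots[0]; // embeddings are pinned to slot 0
    }
    for (auto &s : slots) {
        if (!s.busy) {
            return &s; // completions may use any free slot
        }
    }
    return nullptr; // no slot available; caller queues the request
}
```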
- 06 Aug, 2024 1 commit
Jeffrey Morgan authored
- 31 Jul, 2024 2 commits
Michael Yang authored
jmorganca authored
- 29 Jul, 2024 1 commit
Jeffrey Morgan authored
- 26 Jul, 2024 1 commit
Jeffrey Morgan authored
- 25 Jul, 2024 1 commit
Jeffrey Morgan authored
This reverts commit bb46bbcf.
- 24 Jul, 2024 1 commit
Michael Yang authored
- 22 Jul, 2024 1 commit
Jeffrey Morgan authored
- 21 Jul, 2024 1 commit
Jeffrey Morgan authored
- 20 Jul, 2024 1 commit
Jeffrey Morgan authored
- 07 Jul, 2024 1 commit
Jeffrey Morgan authored
- 05 Jul, 2024 2 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
* Fix assert on small embedding inputs
* Update llm/patches/09-pooling.diff
- 03 Jul, 2024 1 commit
Daniel Hiltgen authored
On Windows, if the model directory contained Unicode characters, CLIP models would fail to load. This fixes the file name handling in clip.cpp to support UTF-16 on Windows.
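A minimal sketch of the usual shape of this class of fix, assuming the standard Win32 conversion path (`MultiByteToWideChar` plus `_wfopen`); it is an illustration, not the exact clip.cpp patch:

```cpp
// On Windows, fopen interprets the path in the ANSI code page and mangles
// non-ASCII names; converting the UTF-8 path to UTF-16 and opening with
// _wfopen avoids that. Elsewhere, fopen already accepts UTF-8.
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

static FILE *fopen_utf8(const char *path, const char *mode) {
#ifdef _WIN32
    // measure, then convert the UTF-8 path and mode strings to UTF-16
    int n = MultiByteToWideChar(CP_UTF8, 0, path, -1, nullptr, 0);
    std::wstring wpath(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, path, -1, &wpath[0], n);

    int m = MultiByteToWideChar(CP_UTF8, 0, mode, -1, nullptr, 0);
    std::wstring wmode(m, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, mode, -1, &wmode[0], m);

    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    return fopen(path, mode);
#endif
}
```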
- 27 Jun, 2024 1 commit
Jeffrey Morgan authored
- 17 Jun, 2024 1 commit
Jeffrey Morgan authored
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
- 07 Jun, 2024 1 commit
Jeffrey Morgan authored
- 30 May, 2024 1 commit
Jeffrey Morgan authored
* update llama.cpp submodule to `5921b8f089d3b7bda86aac5a66825df6a6c10603`
* add patch
- 23 May, 2024 2 commits
Michael Yang authored
Daniel Hiltgen authored
This doesn't expose a UX yet, but wires up the initial server portion of progress reporting during model load.
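A minimal sketch of the server-side hook this builds on, assuming llama.cpp's `progress_callback` field on `llama_model_params`; the reporting format here is illustrative only:

```cpp
// llama.cpp invokes progress_callback with a value in [0, 1] as tensors
// are loaded; returning false from the callback aborts the load. Surfacing
// this over the API (the UX part) is out of scope here.
#include <cstdio>
#include "llama.h"

int main(int argc, char **argv) {
    if (argc < 2) return 1;

    llama_model_params params = llama_model_default_params();
    params.progress_callback = [](float progress, void * /*user_data*/) {
        std::fprintf(stderr, "\rloading: %3.0f%%", progress * 100.0f);
        return true; // returning false would cancel the load
    };

    llama_model *model = llama_load_model_from_file(argv[1], params);
    if (model) llama_free_model(model);
    return model ? 0 : 1;
}
```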
- 16 May, 2024 1 commit
Jeffrey Morgan authored
- 06 May, 2024 1 commit
Jeffrey Morgan authored
* fix llava models not working after first request
* individual requests only for llava models
- 26 Apr, 2024 1 commit
Daniel Hiltgen authored
- 25 Apr, 2024 1 commit
jmorganca authored
- 02 Apr, 2024 1 commit
Daniel Hiltgen authored
- 23 Mar, 2024 1 commit
Daniel Hiltgen authored
The release just before the ggml-cuda.cu refactoring.
- 14 Mar, 2024 1 commit
Michael Yang authored
- 13 Mar, 2024 1 commit
Jeffrey Morgan authored
- 11 Mar, 2024 2 commits
Bruce MacDonald authored
Jeffrey Morgan authored
- 10 Mar, 2024 2 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
- 09 Mar, 2024 1 commit
Jeffrey Morgan authored
- 08 Mar, 2024 1 commit
Jeffrey Morgan authored
- 01 Mar, 2024 1 commit
Jeffrey Morgan authored
- 20 Feb, 2024 1 commit
Jeffrey Morgan authored
- 19 Feb, 2024 1 commit
Daniel Hiltgen authored
This should resolve the problem where we don't fully unload from the GPU when we go idle.
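An illustrative sketch of the intended behavior; the real logic lives in Ollama's scheduler, and everything below (`runner`, `touch`, `reap`) is a hypothetical stand-in:

```cpp
// Idle-unload idea: record the time of each request, and have a periodic
// reaper free the GPU-resident model once the keep-alive window passes
// with no activity, rather than holding VRAM indefinitely.
#include <chrono>
#include <cstdio>
#include <mutex>

struct runner {
    bool loaded = true;
    std::chrono::steady_clock::time_point last_used =
        std::chrono::steady_clock::now();
    std::mutex mu;

    void touch() { // call on every request
        std::lock_guard<std::mutex> lock(mu);
        last_used = std::chrono::steady_clock::now();
    }

    void reap(std::chrono::seconds keep_alive) { // call periodically
        std::lock_guard<std::mutex> lock(mu);
        if (loaded && std::chrono::steady_clock::now() - last_used > keep_alive) {
            loaded = false;
            std::printf("idle: unloading model, releasing VRAM\n");
        }
    }
};
```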
- 12 Feb, 2024 1 commit
Jeffrey Morgan authored
- 06 Feb, 2024 1 commit
Daniel Hiltgen authored