- 29 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 22 Jul, 2024 1 commit
-
-
Daniel Hiltgen authored
Make sure if something goes wrong spawning the process, the user gets enough info to be able to try to self correct, or at least file a bug with details so we can fix it. Once the process starts, we immediately change back to the recommended setting to prevent the blocking dialog. This ensures if the model fails to load (OOM, unsupported model type, etc.) the process will exit quickly and we can scan the stdout/stderr of the subprocess for the reason to report via API.
-
- 15 Jul, 2024 1 commit
-
-
royjhan authored
* Initial Batch Embedding * Revert "Initial Batch Embedding" This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29. * Initial Draft * mock up notes * api/embed draft * add server function * check normalization * clean up * normalization * playing around with truncate stuff * Truncation * Truncation * move normalization to go * Integration Test Template * Truncation Integration Tests * Clean up * use float32 * move normalize * move normalize test * refactoring * integration float32 * input handling and handler testing * Refactoring of legacy and new * clear comments * merge conflicts * touches * embedding type 64 * merge conflicts * fix hanging on single string * refactoring * test values * set context length * clean up * testing clean up * testing clean up * remove function closure * Revert "remove function closure" This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787. * remove function closure * remove redundant error check * clean up * more clean up * clean up
-
- 07 Jul, 2024 2 commits
-
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 05 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
* Use common prefix to select slot * actually report `longest`
-
- 03 Jul, 2024 1 commit
-
-
royjhan authored
* openai compatibility * Revert "openai compatibility" This reverts commit d3f98a811e00fc497d889c8c45b0cfec5b64690c. * remove erroneous subtraction of prompt cache
-
- 29 Jun, 2024 1 commit
-
-
Jeffrey Morgan authored
* Do not shift context for sliding window models * truncate prompt > 2/3 tokens * only target gemma2
-
- 19 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 14 Jun, 2024 1 commit
-
-
Daniel Hiltgen authored
-
- 11 Jun, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 09 Jun, 2024 1 commit
-
-
Jeffrey Morgan authored
* fix embedding by adding fixes from llama.cpp upstream * remove assert --------- Co-authored-by:Jesper Ek <deadbeef84@gmail.com>
-
- 01 Jun, 2024 1 commit
-
-
Michael Yang authored
* Revert "use `int32_t` for call to tokenize (#4738)" This reverts commit 763bb65d. * Revert "vocab only" This reverts commit bf54c845. * Revert "use ffi for tokenizing/detokenizing" This reverts commit 26a00a04.
-
- 29 May, 2024 3 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 23 May, 2024 2 commits
-
-
Michael Yang authored
-
Daniel Hiltgen authored
This doesn't expose a UX yet, but wires the initial server portion of progress reporting during load
-
- 20 May, 2024 1 commit
-
-
Sam authored
* feat: enable flash attention if supported * feat: enable flash attention if supported * feat: enable flash attention if supported * feat: add flash_attn support
-
- 09 May, 2024 1 commit
-
-
Michael Yang authored
-
- 04 May, 2024 1 commit
-
-
Michael Yang authored
-
- 30 Apr, 2024 3 commits
-
-
jmorganca authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
* Bump llama.cpp to b2761 * Adjust types for bump
-
- 17 Apr, 2024 1 commit
-
-
ManniX-ITA authored
-
- 16 Apr, 2024 1 commit
-
-
Jeffrey Morgan authored
* parse wide argv characters on windows * cleanup * move cleanup to end of `main`
-
- 01 Apr, 2024 2 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process and shutdown when idle, and gracefully restart if it has problems. This also serves as a first step to be able to run multiple copies to support multiple models concurrently.
-
- 26 Mar, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 23 Mar, 2024 1 commit
-
-
Daniel Hiltgen authored
The release just before ggml-cuda.cu refactoring
-
- 16 Mar, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 12 Mar, 2024 2 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-