- 15 Jul, 2024 1 commit
-
royjhan authored
* Initial Batch Embedding
* Revert "Initial Batch Embedding" (this reverts commit c22d54895a280b54c727279d85a5fc94defb5a29)
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure" (this reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787)
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
-
- 07 Jul, 2024 2 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 05 Jul, 2024 1 commit
-
Jeffrey Morgan authored
* Use common prefix to select slot
* actually report `longest`
-
- 03 Jul, 2024 1 commit
-
royjhan authored
* openai compatibility
* Revert "openai compatibility" (this reverts commit d3f98a811e00fc497d889c8c45b0cfec5b64690c)
* remove erroneous subtraction of prompt cache
-
- 29 Jun, 2024 1 commit
-
Jeffrey Morgan authored
* Do not shift context for sliding window models
* truncate prompt > 2/3 tokens
* only target gemma2
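The truncation described above (keep only the most recent tokens once a prompt exceeds 2/3 of the context window, instead of shifting context, which sliding-window models such as gemma2 cannot do safely) can be sketched roughly as follows. The function name and token representation here are illustrative assumptions, not the actual server code:

```go
package main

import "fmt"

// truncatePrompt drops the oldest tokens once a prompt exceeds 2/3 of
// the context window numCtx. Hypothetical helper for illustration, not
// the actual implementation.
func truncatePrompt(tokens []int, numCtx int) []int {
	limit := 2 * numCtx / 3
	if len(tokens) <= limit {
		return tokens
	}
	// Keep only the most recent tokens.
	return tokens[len(tokens)-limit:]
}

func main() {
	tokens := make([]int, 100)
	for i := range tokens {
		tokens[i] = i
	}
	out := truncatePrompt(tokens, 60) // limit = 2*60/3 = 40
	fmt.Println(len(out), out[0])     // prints "40 60"
}
```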
-
- 19 Jun, 2024 1 commit
-
Michael Yang authored
-
- 14 Jun, 2024 1 commit
-
Daniel Hiltgen authored
-
- 11 Jun, 2024 1 commit
-
Jeffrey Morgan authored
-
- 09 Jun, 2024 1 commit
-
Jeffrey Morgan authored
* fix embedding by adding fixes from llama.cpp upstream
* remove assert

Co-authored-by: Jesper Ek <deadbeef84@gmail.com>
-
- 01 Jun, 2024 1 commit
-
Michael Yang authored
* Revert "use `int32_t` for call to tokenize (#4738)" (this reverts commit 763bb65d)
* Revert "vocab only" (this reverts commit bf54c845)
* Revert "use ffi for tokenizing/detokenizing" (this reverts commit 26a00a04)
-
- 29 May, 2024 3 commits
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 23 May, 2024 2 commits
-
Michael Yang authored
-
Daniel Hiltgen authored
This doesn't expose any UX yet, but wires up the initial server portion of progress reporting during model load.
-
- 20 May, 2024 1 commit
-
Sam authored
* feat: enable flash attention if supported
* feat: add flash_attn support
-
- 09 May, 2024 1 commit
-
Michael Yang authored
-
- 04 May, 2024 1 commit
-
Michael Yang authored
-
- 30 Apr, 2024 3 commits
-
jmorganca authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
* Bump llama.cpp to b2761
* Adjust types for bump
-
- 17 Apr, 2024 1 commit
-
ManniX-ITA authored
-
- 16 Apr, 2024 1 commit
-
Jeffrey Morgan authored
* parse wide argv characters on windows
* cleanup
* move cleanup to end of `main`
-
- 01 Apr, 2024 2 commits
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This should resolve a number of memory-leak and stability defects by isolating llama.cpp in a separate process that shuts down when idle and gracefully restarts if it has problems. It also serves as a first step toward running multiple copies to support multiple models concurrently.
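The lifecycle described above (a separate runner process that exits cleanly when idle and is restarted if it crashes) can be sketched as a minimal supervisor loop. The function name and restart policy below are assumptions for illustration only, not the actual runner code:

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

// superviseRunner launches a subprocess and restarts it if it exits
// with an error, up to maxRestarts additional attempts. A clean exit
// (status 0) is treated as an intentional idle shutdown. Illustrative
// sketch only.
func superviseRunner(name string, args []string, maxRestarts int) error {
	var err error
	for attempt := 0; attempt <= maxRestarts; attempt++ {
		cmd := exec.Command(name, args...)
		if err = cmd.Run(); err == nil {
			return nil // clean exit: idle shutdown
		}
		log.Printf("runner exited: %v (restart %d/%d)", err, attempt+1, maxRestarts)
		time.Sleep(100 * time.Millisecond) // brief backoff before restarting
	}
	return err
}

func main() {
	// "true" exits 0 immediately, simulating an idle shutdown.
	if err := superviseRunner("true", nil, 2); err != nil {
		log.Fatal(err)
	}
	log.Println("runner stopped cleanly")
}
```

Running the heavy native code behind a process boundary like this means a crash or leak in llama.cpp takes down only the child, which the supervisor can then restart.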
-
- 26 Mar, 2024 1 commit
-
Jeffrey Morgan authored
-
- 23 Mar, 2024 1 commit
-
Daniel Hiltgen authored
The release just before the ggml-cuda.cu refactoring.
-
- 16 Mar, 2024 1 commit
-
Jeffrey Morgan authored
-
- 12 Mar, 2024 2 commits
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-