- 01 May, 2024 1 commit
Mark Ward authored
- 01 Apr, 2024 1 commit
Daniel Hiltgen authored
This should resolve a number of memory-leak and stability defects by letting us isolate llama.cpp in a separate process, shut it down when idle, and restart it gracefully if it runs into problems. This also serves as a first step toward running multiple copies to support multiple models concurrently.
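A minimal Go sketch of the supervisor idea described in this commit message, assuming a standalone llama.cpp server binary; the binary path, flags, and timings below are invented for illustration and are not Ollama's actual code:

```go
// Sketch only: supervise a llama.cpp server in a child process so native
// crashes or leaks stay isolated from the main Go process.
package main

import (
	"log"
	"os/exec"
	"sync"
	"time"
)

type runner struct {
	mu       sync.Mutex
	path     string        // path to the llama.cpp server binary (assumed)
	cmd      *exec.Cmd     // current child process, nil when stopped
	lastUsed time.Time     // refreshed on every request
	idle     time.Duration // shut down after this much inactivity
}

// start launches the child and arranges a restart if it exits on its own.
func (r *runner) start() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.startLocked()
}

func (r *runner) startLocked() error {
	cmd := exec.Command(r.path) // real code would pass model and port flags
	if err := cmd.Start(); err != nil {
		return err
	}
	r.cmd = cmd
	r.lastUsed = time.Now()
	go func() {
		err := cmd.Wait()
		r.mu.Lock()
		defer r.mu.Unlock()
		if r.cmd == cmd { // exited on its own, not stopped by us: restart
			log.Printf("runner exited (%v); restarting", err)
			_ = r.startLocked()
		}
	}()
	return nil
}

// stopIfIdle kills the child once it has sat unused past the idle window.
func (r *runner) stopIfIdle() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.cmd != nil && time.Since(r.lastUsed) > r.idle {
		cmd := r.cmd
		r.cmd = nil // mark as intentionally stopped so the waiter skips the restart
		_ = cmd.Process.Kill()
	}
}

func main() {
	r := &runner{path: "./llama-server", idle: 5 * time.Minute}
	if err := r.start(); err != nil {
		log.Fatal(err)
	}
	for range time.Tick(30 * time.Second) {
		r.stopIfIdle()
	}
}
```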
- 15 Feb, 2024 1 commit
Daniel Hiltgen authored
This focuses on Windows first, but could be used for Mac and possibly Linux in the future.
- 19 Dec, 2023 1 commit
Daniel Hiltgen authored
Run server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.
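As a rough illustration of the cgo approach this commit describes (not the actual binding to server.cpp; the C function below is a self-contained stand-in), the pattern is to compile the C/C++ code into the Go binary while the rest of the project keeps talking to a Go interface:

```go
// Sketch of the cgo pattern, assuming an entry point that server.cpp would
// export; the stand-in is defined inline so the example compiles on its own.
package llm

/*
#include <stdlib.h>
#include <string.h>

// Stand-in for the function the real change would export from server.cpp.
static int llama_server_start(const char *model_path) {
    return (model_path == NULL || strlen(model_path) == 0) ? 1 : 0;
}
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// LLM is the Go abstraction the rest of the codebase keeps using; only this
// package knows the work is done by C/C++ code linked in via cgo.
type LLM interface {
	Start(modelPath string) error
}

type llamaServer struct{}

func New() LLM { return &llamaServer{} }

func (s *llamaServer) Start(modelPath string) error {
	cPath := C.CString(modelPath)
	defer C.free(unsafe.Pointer(cPath))
	if rc := C.llama_server_start(cPath); rc != 0 {
		return fmt.Errorf("llama server failed to start (code %d)", rc)
	}
	return nil
}
```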
- 27 Nov, 2023 1 commit
Jason Jacobs authored
- 24 Nov, 2023 1 commit
Jing Zhang authored
* Support cuda build in Windows
* Enable dynamic NumGPU allocation for Windows
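"Dynamic NumGPU allocation" in the second item generally means choosing how many model layers to offload based on available VRAM rather than using a fixed number. A hedged sketch of that idea; the byte counts and the function name are assumptions, not the actual allocation logic:

```go
// Sketch only: choose how many layers to offload to the GPU from free VRAM.
// The sizes below are illustrative, not measurements.
package main

import "fmt"

// pickNumGPU returns how many of layerCount layers fit in freeVRAM, keeping
// some headroom for the KV cache and scratch buffers.
func pickNumGPU(freeVRAM, bytesPerLayer, headroom uint64, layerCount int) int {
	if freeVRAM <= headroom || bytesPerLayer == 0 {
		return 0 // not enough memory: run fully on the CPU
	}
	n := int((freeVRAM - headroom) / bytesPerLayer)
	if n > layerCount {
		n = layerCount // the whole model fits on the GPU
	}
	return n
}

func main() {
	const gib = uint64(1) << 30
	// e.g. an 8 GiB card, ~200 MiB per layer, 1 GiB headroom, 33-layer model
	fmt.Println(pickNumGPU(8*gib, 200<<20, 1*gib, 33))
}
```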
- 18 Nov, 2023 1 commit
Jeffrey Morgan authored
- 30 Aug, 2023 2 commits
Jeffrey Morgan authored
Bruce MacDonald authored
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
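One of the items above, "use request context for llama_cpp", is about tying generation to the lifetime of the HTTP request. A minimal sketch of that pattern; Predict and its token channel are invented names for illustration, not the actual API:

```go
// Sketch only: thread the HTTP request's context into the generation loop so
// an abandoned request stops consuming the model.
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"time"
)

// Predict streams tokens until generation finishes or ctx is cancelled
// (for example, because the client disconnected).
func Predict(ctx context.Context, prompt string, out chan<- string) error {
	defer close(out)
	for i := 0; i < 32; i++ { // stand-in for the real decode loop
		select {
		case <-ctx.Done():
			return ctx.Err() // request went away: stop generating
		case <-time.After(50 * time.Millisecond):
			out <- fmt.Sprintf("token-%d ", i)
		}
	}
	return nil
}

func handler(w http.ResponseWriter, r *http.Request) {
	out := make(chan string)
	go func() {
		_ = Predict(r.Context(), "hello", out) // cancellation follows the request
	}()
	for tok := range out {
		fmt.Fprint(w, tok)
	}
}

func main() {
	http.HandleFunc("/generate", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```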
- 28 Jul, 2023 1 commit
Jeffrey Morgan authored
- 22 Jul, 2023 1 commit
jk1jk authored
- 12 Jul, 2023 1 commit
Jeffrey Morgan authored
- 11 Jul, 2023 2 commits
Michael Yang authored
Michael Yang authored
- 06 Jul, 2023 2 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
- 26 Jun, 2023 1 commit
Bruce MacDonald authored
- 25 Jun, 2023 2 commits
Jeffrey Morgan authored
Jeffrey Morgan authored
- 23 Jun, 2023 3 commits
Bruce MacDonald authored
Bruce MacDonald authored
Bruce MacDonald authored
- 22 Jun, 2023 1 commit
Jeffrey Morgan authored