Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by isolating llama.cpp in a separate process, which we can shut down when idle and gracefully restart if it runs into problems. It is also a first step toward running multiple copies to support multiple models concurrently.
llm/llm_windows.go
0 → 100644
llm/payload.go
0 → 100644
llm/payload_test.go
deleted
100644 → 0
llm/server.go
0 → 100644
llm/status.go
0 → 100644
llm/utils.go
deleted
100644 → 0