Switch back to subprocessing for llama.cpp
This should resolve a number of memory leaks and stability defects by isolating llama.cpp in a separate process that we can shut down when idle and gracefully restart if it runs into problems. It is also a first step toward running multiple copies to serve multiple models concurrently.
Deleted files (mode 100644 → 0):
llm/dyn_ext_server.c
llm/dyn_ext_server.h
llm/llama.go