    subprocess llama.cpp server (#401) · 42998d79
    Bruce MacDonald authored
    * remove c code
    * pack llama.cpp
    * use request context for llama_cpp
    * let llama_cpp decide the number of threads to use
    * stop llama runner when app stops
    * remove sample count and duration metrics
    * use go generate to get libraries
    * tmp dir for running llm
development.md 318 Bytes