    Re-introduce the `llama` package (#5034) · 96efd905
    Jeffrey Morgan authored
    * Re-introduce the llama package
    
    This PR brings back the llama package, making it possible to call llama.cpp and
    ggml APIs from Go directly via CGo. This has a few advantages:
    
    - C APIs can be called directly from Go without needing to use the previous
      "server" REST API
    - On macOS and for CPU builds on Linux and Windows, Ollama can be built without
      a `go generate ./...` step, making it easy to get up and running to hack on
      parts of Ollama that don't require fast inference
    - Faster build times for AVX, AVX2, CUDA, and ROCm (a full build of all runners
      takes <5 min on a fast CPU)
    - No git submodule, making it easier to clone and build from source
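    
    To illustrate the first point above, here is a minimal, generic CGo sketch of
    calling a C function directly from Go. The `add` function is a hypothetical
    stand-in for a llama.cpp/ggml API call, not part of the actual llama.go
    bindings:
    
    ```go
    package main
    
    /*
    #include <stdlib.h>
    // Tiny C function standing in for a llama.cpp/ggml API (hypothetical).
    static int add(int a, int b) { return a + b; }
    */
    import "C"
    
    import "fmt"
    
    func main() {
    	// CGo exposes the C function under the C pseudo-package; no REST
    	// round-trip to a "server" process is needed.
    	sum := int(C.add(2, 3))
    	fmt.Println("sum:", sum)
    }
    ```
    
    Building this requires only `CGO_ENABLED=1` and a C toolchain; the real
    bindings link against the vendored llama.cpp/ggml sources in the same way.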
    
    This is a big PR, but much of it is vendor code except for:
    
    - llama.go CGo bindings
    - example/: a simple example of running inference
    - runner/: a subprocess server designed to replace the llm/ext_server package
    - Makefile: an as-minimal-as-possible Makefile to build the runner package for
      different...