Unverified Commit 96efd905 authored by Jeffrey Morgan's avatar Jeffrey Morgan Committed by GitHub
Browse files

Re-introduce the `llama` package (#5034)

* Re-introduce the llama package

This PR brings back the llama package, making it possible to call llama.cpp and
ggml APIs from Go directly via CGo. This has a few advantages:

- C APIs can be called directly from Go without needing to use the previous
  "server" REST API
- On macOS and for CPU builds on Linux and Windows, Ollama can be built without
  a go generate ./... step, making it easy to get up and running to hack on
  parts of Ollama that don't require fast inference
- Faster build times for AVX,AVX2,CUDA and ROCM (a full build of all runners
  takes <5 min on a fast CPU)
- No git submodule making it easier to clone and build from source

This is a big PR, but much of it is vendor code except for:

- llama.go CGo bindings
- example/: a simple example of running inference
- runner/: a subprocess server designed to replace the llm/ext_server package
- Makefile an as minimal as possible Makefile to build the runner package for
  different...
parent de982616
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment