Re-introduce the `llama` package (#5034)
* Re-introduce the llama package

This PR brings back the `llama` package, making it possible to call llama.cpp and ggml APIs from Go directly via CGo. This has a few advantages:

- C APIs can be called directly from Go without needing to use the previous "server" REST API
- On macOS, and for CPU builds on Linux and Windows, Ollama can be built without a `go generate ./...` step, making it easy to get up and running to hack on parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA, and ROCm (a full build of all runners takes <5 min on a fast CPU)
- No git submodule, making it easier to clone and build from source

This is a big PR, but much of it is vendor code except for:

- `llama.go`: CGo bindings
- `example/`: a simple example of running inference
- `runner/`: a subprocess server designed to replace the `llm/ext_server` package
- `Makefile`: a minimal Makefile to build the runner package for different...
llama/.gitignore
0 → 100644
llama/Dockerfile
0 → 100644
llama/Makefile
0 → 100644
llama/README.md
0 → 100644
llama/base64.hpp
0 → 100644
llama/build-info.cpp
0 → 100644
llama/clip.cpp
0 → 100644
llama/clip.h
0 → 100644
llama/common.cpp
0 → 100644
llama/common.h
0 → 100644
llama/ggml-aarch64.c
0 → 100644
llama/ggml-aarch64.h
0 → 100644