Re-introduce the `llama` package (#5034)
* Re-introduce the llama package

This PR brings back the `llama` package, making it possible to call llama.cpp and ggml APIs from Go directly via CGo. This has a few advantages:

- C APIs can be called directly from Go without needing to use the previous "server" REST API
- On macOS and for CPU builds on Linux and Windows, Ollama can be built without a `go generate ./...` step, making it easy to get up and running to hack on parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA, and ROCm (a full build of all runners takes <5 min on a fast CPU)
- No git submodule, making it easier to clone and build from source

This is a big PR, but much of it is vendor code except for:

- `llama.go`: CGo bindings
- `example/`: a simple example of running inference
- `runner/`: a subprocess server designed to replace the `llm/ext_server` package
- `Makefile`: an as-minimal-as-possible Makefile to build the runner package for different...
Files changed (all new files, mode 100644; diffs collapsed in this view):

- llama/make/Makefile.cuda_v11
- llama/make/Makefile.cuda_v12
- llama/make/Makefile.default
- llama/make/Makefile.rocm
- llama/make/common-defs.make
- llama/make/cuda.make
- llama/make/gpu.make
- llama/patches/01-cuda.diff
- (one collapsed patch diff, name not shown)
- llama/patches/03-metal.diff
- (several collapsed patch diffs, names not shown)
- llama/patches/11-blas.diff
- llama/runner/README.md
- llama/runner/cache.go
- llama/runner/cache_test.go
- llama/runner/requirements.go