Re-introduce the `llama` package (#5034)
* Re-introduce the llama package

This PR brings back the `llama` package, making it possible to call llama.cpp and ggml APIs directly from Go via CGo. This has a few advantages:

- C APIs can be called directly from Go without going through the previous "server" REST API
- On macOS, and for CPU builds on Linux and Windows, Ollama can be built without a `go generate ./...` step, making it easy to get up and running to hack on parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA, and ROCm (a full build of all runners takes under 5 minutes on a fast CPU)
- No git submodule, making it easier to clone and build from source

This is a big PR, but much of it is vendored code, except for:

- `llama.go`: CGo bindings
- `example/`: a simple example of running inference
- `runner/`: a subprocess server designed to replace the `llm/ext_server` package
- `Makefile`: a Makefile kept as minimal as possible, to build the runner package for different...
Showing changed files:

- `llama/runner/runner.go` (new file, mode 100644)
- `llama/runner/stop.go` (new file, mode 100644)
- `llama/runner/stop_test.go` (new file, mode 100644)
- `llama/sampling.cpp` (new file, mode 100644)
- `llama/sampling.h` (new file, mode 100644)
- `llama/sampling_ext.cpp` (new file, mode 100644)
- `llama/sampling_ext.h` (new file, mode 100644)
- `llama/sgemm.cpp` (new file, mode 100644)
- `llama/sgemm.h` (new file, mode 100644)
- `llama/stb_image.h` (new file, mode 100644)
- `llama/sync.sh` (new file, mode 100755)
- `llama/unicode-data.cpp` (new file, mode 100644)
- `llama/unicode-data.h` (new file, mode 100644)
- `llama/unicode.cpp` (new file, mode 100644)
- `llama/unicode.h` (new file, mode 100644)
- `llm/llm.go` (deleted, mode 100644 → 0)