Re-introduce the `llama` package (#5034)
* Re-introduce the llama package

This PR brings back the `llama` package, making it possible to call llama.cpp and ggml APIs directly from Go via CGo. This has a few advantages:

- C APIs can be called directly from Go without going through the previous "server" REST API
- On macOS, and for CPU builds on Linux and Windows, Ollama can be built without a `go generate ./...` step, making it easy to get up and running to hack on parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA, and ROCm (a full build of all runners takes under 5 minutes on a fast CPU)
- No git submodule, making it easier to clone and build from source

This is a big PR, but much of it is vendored code, except for:

- `llama.go`: CGo bindings
- `example/`: a simple example of running inference
- `runner/`: a subprocess server designed to replace the `llm/ext_server` package
- `Makefile`: a Makefile kept as minimal as possible, to build the runner package for different...
Showing changed files:

- `llama/runner/runner.go` (new file, mode 100644)
- `llama/runner/stop.go` (new file, mode 100644)
- `llama/runner/stop_test.go` (new file, mode 100644)
- `llama/sampling.cpp` (new file, mode 100644)
- `llama/sampling.h` (new file, mode 100644)
- `llama/sampling_ext.cpp` (new file, mode 100644)
- `llama/sampling_ext.h` (new file, mode 100644)
- `llama/sgemm.cpp` (new file, mode 100644)
- `llama/sgemm.h` (new file, mode 100644)
- `llama/stb_image.h` (new file, mode 100644)
- `llama/sync.sh` (new file, mode 100755)
- `llama/unicode-data.cpp` (new file, mode 100644)
- `llama/unicode-data.h` (new file, mode 100644)
- `llama/unicode.cpp` (new file, mode 100644)
- `llama/unicode.h` (new file, mode 100644)
- `llm/llm.go` (deleted, mode 100644 → 0)