Re-introduce the `llama` package (#5034) (96efd905) · Commits · orangecat / ollama

Commit 96efd905 authored Oct 08, 2024 by Jeffrey Morgan, committed by GitHub Oct 08, 2024

Re-introduce the `llama` package (#5034)

* Re-introduce the llama package

This PR brings back the llama package, making it possible to call llama.cpp and
ggml APIs from Go directly via CGo. This has a few advantages:

- C APIs can be called directly from Go without needing to use the previous
  "server" REST API
- On macOS and for CPU builds on Linux and Windows, Ollama can be built without
  a `go generate ./...` step, making it easy to get up and running to hack on
  parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA, and ROCm (a full build of all runners
  takes under 5 minutes on a fast CPU)
- No git submodule, which makes it easier to clone and build from source

This is a big PR, but much of it is vendored code except for:

- llama.go CGo bindings
- example/: a simple example of running inference
- runner/: a subprocess server designed to replace the llm/ext_server package
- Makefile: a Makefile, kept as minimal as possible, to build the runner package for
  different...

parent de982616
