Unverified Commit 4879a234 authored by Daniel Hiltgen, committed by GitHub

build: Make target improvements (#7499)

* llama: wire up builtin runner

This adds a new entrypoint into the ollama CLI to run the cgo-built runner.
On Mac arm64, this will have GPU support, but on all other platforms it will
be the lowest common denominator CPU build.  After we fully transition
to the new Go runners, more tech debt can be removed and we can stop building
the "default" runner via make and rely on the builtin always.

* build: Make target improvements

Add a few new targets and help for building locally.
This also adjusts the runner lookup to favor local builds, then
runners relative to the executable, and finally payloads.

* Support customized CPU flags for runners

This implements a simplified custom CPU flags pattern for the runners.
When built without overrides, the runner name contains the vector flag
we check for (AVX) to ensure we don't try to run on unsupported systems
and crash.  If the user builds a customized set, we omit the naming
scheme and don't check for compatibility.  This avoids checking
requirements at runtime, so that logic has been removed as well.  This
can be used to build GPU runners with no vector flags, or CPU/GPU
runners with additional flags (e.g. AVX512) enabled.

* Use relative paths

If the user checks out the repo in a path that contains spaces, make gets
really confused, so use relative paths for everything in-repo to avoid breakage.

* Remove payloads from main binary

* install: clean up prior libraries

This removes support for v0.3.6 and older versions (before the tar bundle)
and ensures we clean up prior libraries before extracting the bundle(s).
Without this change, runners and dependent libraries could leak when we
update and lead to subtle runtime errors.
parent 63269668
@@ -3,35 +3,24 @@
Install required tools:
- go version 1.22 or higher
- OS specific C/C++ compiler (see below)
- GNU Make
## Overview

Ollama uses a mix of Go and C/C++ code to interface with GPUs. The C/C++ code is compiled with both CGO and GPU library specific compilers. A set of GNU Makefiles is used to compile the project. GPU libraries are auto-detected based on the typical environment variables used by the respective libraries, but can be overridden if necessary. The default make target will build the runners and the primary Go Ollama application that will run within the repo directory. Throughout the examples below, `-j 5` is suggested for 5 parallel jobs to speed up the build. You can adjust the job count based on your CPU core count to reduce build times. If you want to relocate the built binaries, use the `dist` target and recursively copy the files in `./dist/$OS-$ARCH/` to your desired location. To learn more about the other make targets, use `make help`.

Once you have built the GPU/CPU runners, you can compile the main application with `go build .`
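A quick sketch of the local workflow described above (adjust `-j 5` to your core count; the `dist` invocation assumes the default top-level targets):

```bash
# Build the runners and the ollama binary inside the repo directory
make -j 5

# Rebuild just the main application once the runners exist
go build .

# Stage relocatable binaries under ./dist/$OS-$ARCH/
make -j 5 dist

# See the other available targets
make help
```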
### MacOS

[Download Go](https://go.dev/dl/)

```bash
make -j 5
```
Now you can run `ollama`:
```bash
@@ -51,64 +40,42 @@ _Your operating system distribution may already have packages for NVIDIA CUDA. D
Install `make`, `gcc` and `golang` as well as [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads)
development and runtime packages.

Typically the makefile will auto-detect CUDA, however, if your Linux distro
or installation approach uses alternative paths, you can specify the location by
overriding `CUDA_PATH` to the location of the CUDA toolkit. You can customize
a set of target CUDA architectures by setting `CUDA_ARCHITECTURES` (e.g. `CUDA_ARCHITECTURES=50;60;70`)
```
make -j 5
```
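If your CUDA toolkit lives in a non-standard location, the overrides described above can be passed straight to make. The path and architecture list below are illustrative examples, not required values:

```bash
make -j 5 CUDA_PATH=/usr/local/cuda-12.4 CUDA_ARCHITECTURES="50;60;70"
```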
If both v11 and v12 toolkits are detected, runners for both major versions will be built by default. You can build just v12 with `make cuda_v12`

#### Older Linux CUDA (NVIDIA)

To support older GPUs with Compute Capability 3.5 or 3.7, you will need to use an older version of the Driver from [Unix Driver Archive](https://www.nvidia.com/en-us/drivers/unix/) (tested with 470) and [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive) (tested with CUDA v11). When you build Ollama, you will need to set two make variables to adjust the minimum compute capability Ollama supports via `make -j 5 CUDA_ARCHITECTURES="35;37;50;52" EXTRA_GOLDLAGS="\"-X=github.com/ollama/ollama/discover.CudaComputeMajorMin=3\" \"-X=github.com/ollama/ollama/discover.CudaComputeMinorMin=5\""`. To find the Compute Capability of your older GPU, refer to [GPU Compute Capability](https://developer.nvidia.com/cuda-gpus).
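The same older-GPU build command from the paragraph above, split across lines for readability:

```bash
make -j 5 CUDA_ARCHITECTURES="35;37;50;52" \
  EXTRA_GOLDLAGS="\"-X=github.com/ollama/ollama/discover.CudaComputeMajorMin=3\" \"-X=github.com/ollama/ollama/discover.CudaComputeMinorMin=5\""
```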
#### Linux ROCm (AMD)

_Your operating system distribution may already have packages for AMD ROCm. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_

Install [ROCm](https://rocm.docs.amd.com/en/latest/) development packages first, as well as `make`, `gcc`, and `golang`.
Typically the build scripts will auto-detect ROCm, however, if your Linux distro
or installation approach uses unusual paths, you can specify the location by
specifying an environment variable `HIP_PATH` to the location of the ROCm
install (typically `/opt/rocm`). You can also customize
the AMD GPU targets by setting `HIP_ARCHS` (e.g. `HIP_ARCHS=gfx1101;gfx1102`)
```
make -j 5
```
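For example, to point the build at a non-default ROCm install and a specific set of GPU targets (the path and gfx identifiers here are examples only):

```bash
make -j 5 HIP_PATH=/opt/rocm-6.1 HIP_ARCHS="gfx1101;gfx1102"
```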
ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the `render` group, or run as root.

#### Containerized Linux Build
If you have Docker and buildx available, you can build linux binaries with `./scripts/build_linux.sh` which has the CUDA and ROCm dependencies included. The resulting artifacts are placed in `./dist` and by default the script builds both arm64 and amd64 binaries. If you want to build only amd64, you can build with `PLATFORM=linux/amd64 ./scripts/build_linux.sh`
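The two invocations described above, side by side; only the `PLATFORM` value differs:

```bash
# Build arm64 and amd64 artifacts into ./dist (default)
./scripts/build_linux.sh

# Build only amd64 artifacts
PLATFORM=linux/amd64 ./scripts/build_linux.sh
```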
### Windows

@@ -126,12 +93,8 @@ The following tools are required as a minimal development environment to build C

> [!NOTE]
> Due to bugs in the GCC C++ library for unicode support, Ollama should be built with clang on Windows.
```
make -j 5
```
#### GPU Support

@@ -173,3 +136,30 @@ pacman -S mingw-w64-clang-aarch64-clang mingw-w64-clang-aarch64-gcc-compat mingw
```

You will need to ensure your PATH includes go, cmake, gcc and clang mingw32-make to build ollama from source. (typically `C:\msys64\clangarm64\bin\`)
## Advanced CPU Vector Settings

On x86, running `make` will compile several CPU runners which can run on different CPU families. At runtime, Ollama will auto-detect the best variation to load. If GPU libraries are present at build time, Ollama also compiles GPU runners with the `AVX` CPU vector feature enabled. This provides a good performance balance when loading large models that split across GPU and CPU, while keeping broad compatibility. Some users may prefer no vector extensions (e.g. older Xeon/Celeron processors, or hypervisors that mask the vector features) while other users may prefer turning on many more vector extensions to further improve performance for split model loads.

To customize the set of CPU vector features enabled for a CPU runner and all GPU runners, use `CUSTOM_CPU_FLAGS` during the build.
To build without any vector flags:
```
make CUSTOM_CPU_FLAGS=""
```
To build with both AVX and AVX2:
```
make CUSTOM_CPU_FLAGS=avx,avx2
```
To build with AVX512 features turned on:
```
make CUSTOM_CPU_FLAGS=avx,avx2,avx512,avx512vbmi,avx512vnni,avx512bf16
```
> [!NOTE]
> If you are experimenting with different flags, make sure to do a `make clean` between each change to ensure everything is rebuilt with the new compiler flags.
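When experimenting with flag combinations, a minimal clean-and-rebuild cycle (reusing the AVX/AVX2 example above) looks like:

```bash
make clean
make -j 5 CUSTOM_CPU_FLAGS=avx,avx2
```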
@@ -28,6 +28,7 @@ Check your compute compatibility to see if your card is supported:
| 5.0 | GeForce GTX | `GTX 750 Ti` `GTX 750` `NVS 810` |
| | Quadro | `K2200` `K1200` `K620` `M1200` `M520` `M5000M` `M4000M` `M3000M` `M2000M` `M1000M` `K620M` `M600M` `M500M` |
For building locally to support older GPUs, see [development.md](./development.md#linux-cuda-nvidia)
### GPU Selection
......
@@ -10,6 +10,9 @@ curl -fsSL https://ollama.com/install.sh | sh

## Manual install
> [!NOTE]
> If you are upgrading from a prior version, you should remove the old libraries with `sudo rm -rf /usr/lib/ollama` first.
Download and extract the package:

```shell
......
@@ -83,3 +83,6 @@ If you'd like to install or integrate Ollama as a service, a standalone
and GPU library dependencies for Nvidia and AMD. This allows for embedding
Ollama in existing applications, or running it as a system service via `ollama
serve` with tools such as [NSSM](https://nssm.cc/).
> [!NOTE]
> If you are upgrading from a prior version, you should remove the old directories first.
@@ -175,7 +175,6 @@ func String(s string) func() string {
var (
LLMLibrary = String("OLLAMA_LLM_LIBRARY")
TmpDir = String("OLLAMA_TMPDIR")
CudaVisibleDevices = String("CUDA_VISIBLE_DEVICES")
HipVisibleDevices = String("HIP_VISIBLE_DEVICES")
@@ -250,7 +249,6 @@ func AsMap() map[string]EnvVar {
"OLLAMA_NUM_PARALLEL": {"OLLAMA_NUM_PARALLEL", NumParallel(), "Maximum number of parallel requests"},
"OLLAMA_ORIGINS": {"OLLAMA_ORIGINS", Origins(), "A comma separated list of allowed origins"},
"OLLAMA_SCHED_SPREAD": {"OLLAMA_SCHED_SPREAD", SchedSpread(), "Always schedule model across all GPUs"},
"OLLAMA_TMPDIR": {"OLLAMA_TMPDIR", TmpDir(), "Location for temporary files"},
"OLLAMA_MULTIUSER_CACHE": {"OLLAMA_MULTIUSER_CACHE", MultiUserCache(), "Optimize prompt caching for multi-user scenarios"}, "OLLAMA_MULTIUSER_CACHE": {"OLLAMA_MULTIUSER_CACHE", MultiUserCache(), "Optimize prompt caching for multi-user scenarios"},
// Informational // Informational
......
# top level makefile for Go server
include make/common-defs.make
RUNNER_TARGETS := default
# Determine which if any GPU runners we should build
ifeq ($(OS),windows)
CUDA_PATH?=$(shell cygpath -m -s "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\" 2>/dev/null)unknown
CUDA_BASE_DIR := $(dir $(shell cygpath -m -s "$(CUDA_PATH)\\.." 2>/dev/null))
CUDA_11:=$(shell ls -d $(CUDA_BASE_DIR)/v11.? 2>/dev/null)
CUDA_12:=$(shell ls -d $(CUDA_BASE_DIR)/v12.? 2>/dev/null)
HIP_LIB_DIR := $(shell ls -d $(HIP_PATH)/lib 2>/dev/null)
else ifeq ($(OS),linux)
HIP_PATH?=/opt/rocm
HIP_LIB_DIR := $(shell ls -d $(HIP_PATH)/lib 2>/dev/null)
CUDA_PATH?=/usr/local/cuda
CUDA_11:=$(shell ls -d $(CUDA_PATH)-11 2>/dev/null)
CUDA_12:=$(shell ls -d $(CUDA_PATH)-12 2>/dev/null)
endif
ifeq ($(OLLAMA_SKIP_CUDA_GENERATE),)
ifneq ($(CUDA_11),)
RUNNER_TARGETS += cuda_v11
endif
ifneq ($(CUDA_12),)
RUNNER_TARGETS += cuda_v12
endif
endif
ifeq ($(OLLAMA_SKIP_ROCM_GENERATE),)
ifneq ($(HIP_LIB_DIR),)
RUNNER_TARGETS += rocm
endif
endif
all: clean-payload .WAIT runners
runners: $(RUNNER_TARGETS)
$(RUNNER_TARGETS):
$(MAKE) -f make/Makefile.$@
help-sync apply-patches create-patches sync:
$(MAKE) -f make/Makefile.sync $@
clean:
rm -rf $(BUILD_DIR) $(DIST_RUNNERS) $(PAYLOAD_RUNNERS)
go clean -cache
clean-payload:
rm -rf $(addprefix $(RUNNERS_PAYLOAD_DIR)/, $(RUNNER_TARGETS) metal cpu cpu_avx cpu_avx2)
.PHONY: all runners clean clean-payload $(RUNNER_TARGETS) .WAIT
# Handy debugging for make variables
print-%:
@echo '$*=$($*)'
@@ -9,22 +9,24 @@ package llama
#cgo amd64,avx CXXFLAGS: -mavx
#cgo amd64,avx2 CFLAGS: -mavx2 -mfma
#cgo amd64,avx2 CXXFLAGS: -mavx2 -mfma
#cgo amd64,avx512 CFLAGS: -mavx512f -mavx512dq -mavx512bw
#cgo amd64,avx512 CXXFLAGS: -mavx512f -mavx512dq -mavx512bw
#cgo amd64,avx512bf16 CFLAGS: -mavx512bf16 -D__AVX512BF16__
#cgo amd64,avx512bf16 CXXFLAGS: -mavx512bf16 -D__AVX512BF16__
#cgo amd64,avx512vbmi CFLAGS: -mavx512vbmi -D__AVX512VBMI__
#cgo amd64,avx512vbmi CXXFLAGS: -mavx512vbmi -D__AVX512VBMI__
#cgo amd64,avx512vnni CFLAGS: -mavx512vnni -D__AVX512VNNI__
#cgo amd64,avx512vnni CXXFLAGS: -mavx512vnni -D__AVX512VNNI__
#cgo amd64,f16c CFLAGS: -mf16c
#cgo amd64,f16c CXXFLAGS: -mf16c
#cgo amd64,fma CFLAGS: -mfma
#cgo amd64,fma CXXFLAGS: -mfma
#cgo avx CFLAGS: -mavx
#cgo avx CXXFLAGS: -mavx
#cgo avx2 CFLAGS: -mavx2 -mfma -mf16c
#cgo avx2 CXXFLAGS: -mavx2 -mfma -mf16c
#cgo cuda CFLAGS: -fPIE -DGGML_USE_CUDA -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_BUILD=1
#cgo cuda CXXFLAGS: -DGGML_USE_CUDA -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_BUILD=1
#cgo cuda_jetpack5 LDFLAGS: -lggml_cuda_jetpack5
#cgo cuda_jetpack6 LDFLAGS: -lggml_cuda_jetpack6
#cgo cuda_v11 LDFLAGS: -lggml_cuda_v11
#cgo cuda_v12 LDFLAGS: -lggml_cuda_v12
#cgo darwin,amd64 CFLAGS: -Wno-incompatible-pointer-types-discards-qualifiers
#cgo darwin,amd64 CXXFLAGS: -Wno-incompatible-pointer-types-discards-qualifiers
#cgo darwin,amd64 LDFLAGS: -framework Foundation
@@ -36,28 +38,24 @@ package llama
#cgo darwin,arm64 LDFLAGS: -framework Foundation -framework Metal -framework MetalKit -framework Accelerate
#cgo linux CFLAGS: -D_GNU_SOURCE
#cgo linux CXXFLAGS: -D_GNU_SOURCE
#cgo linux,amd64 LDFLAGS: -L${SRCDIR}/build/linux-amd64
#cgo linux,arm64 CFLAGS: -D__aarch64__ -D__ARM_NEON -D__ARM_FEATURE_FMA
#cgo linux,arm64 CXXFLAGS: -D__aarch64__ -D__ARM_NEON -D__ARM_FEATURE_FMA
#cgo linux,arm64 LDFLAGS: -L${SRCDIR}/build/linux-arm64
#cgo linux,arm64,sve CFLAGS: -march=armv8.6-a+sve
#cgo linux,arm64,sve CXXFLAGS: -march=armv8.6-a+sve
#cgo linux,cuda LDFLAGS: -lcuda -lcudart -lcublas -lcublasLt -lpthread -ldl -lrt -lresolv
#cgo linux,rocm LDFLAGS: -lpthread -ldl -lrt -lresolv
#cgo rocm CFLAGS: -DGGML_USE_CUDA -DGGML_USE_HIPBLAS -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_BUILD=1
#cgo rocm CXXFLAGS: -DGGML_USE_CUDA -DGGML_USE_HIPBLAS -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_BUILD=1
#cgo rocm LDFLAGS: -L${SRCDIR} -lggml_rocm -lhipblas -lamdhip64 -lrocblas
#cgo windows CFLAGS: -Wno-discarded-qualifiers -D_WIN32_WINNT=0x602
#cgo windows CXXFLAGS: -D_WIN32_WINNT=0x602
#cgo windows LDFLAGS: -lmsvcrt -static-libstdc++ -static-libgcc -static
#cgo windows,amd64 LDFLAGS: -L${SRCDIR}/build/windows-amd64
#cgo windows,arm64 CFLAGS: -D__aarch64__ -D__ARM_NEON -D__ARM_FEATURE_FMA
#cgo windows,arm64 CXXFLAGS: -D__aarch64__ -D__ARM_NEON -D__ARM_FEATURE_FMA
#cgo windows,arm64 LDFLAGS: -L${SRCDIR}/build/windows-arm64
#cgo windows,cuda LDFLAGS: -lcuda -lcudart -lcublas -lcublasLt
#cgo windows,rocm LDFLAGS: -lggml_rocm -lhipblas -lamdhip64 -lrocblas
......
package runner
import (
"errors"
......
package runner
import (
"testing"
......
package runner
import (
"errors"
......
package runner
import (
"reflect"
......
package main
import (
"encoding/json"
"os"
"github.com/ollama/ollama/llama"
"github.com/ollama/ollama/version"
)
func printRequirements(fp *os.File) {
attrs := map[string]string{
"system_info": llama.PrintSystemInfo(),
"version": version.Version,
"cpu_features": llama.CpuFeatures,
}
enc := json.NewEncoder(fp)
_ = enc.Encode(attrs)
}
package runner
import (
"context"
@@ -895,32 +895,37 @@ func (s *Server) loadModel(
s.ready.Done()
}
func Execute(args []string) error {
if args[0] == "runner" {
args = args[1:]
}
fs := flag.NewFlagSet("runner", flag.ExitOnError)
mpath := fs.String("model", "", "Path to model binary file")
ppath := fs.String("mmproj", "", "Path to projector binary file")
parallel := fs.Int("parallel", 1, "Number of sequences to handle simultaneously")
batchSize := fs.Int("batch-size", 512, "Batch size")
nGpuLayers := fs.Int("n-gpu-layers", 0, "Number of layers to offload to GPU")
mainGpu := fs.Int("main-gpu", 0, "Main GPU")
flashAttention := fs.Bool("flash-attn", false, "Enable flash attention")
kvSize := fs.Int("ctx-size", 2048, "Context (or KV cache) size")
kvCacheType := fs.String("kv-cache-type", "", "quantization type for KV cache (default: f16)")
port := fs.Int("port", 8080, "Port to expose the server on")
threads := fs.Int("threads", runtime.NumCPU(), "Number of threads to use during generation")
verbose := fs.Bool("verbose", false, "verbose output (default: disabled)")
noMmap := fs.Bool("no-mmap", false, "do not memory-map model (slower load but may reduce pageouts if not using mlock)")
mlock := fs.Bool("mlock", false, "force system to keep model in RAM rather than swapping or compressing")
tensorSplit := fs.String("tensor-split", "", "fraction of the model to offload to each GPU, comma-separated list of proportions")
multiUserCache := fs.Bool("multiuser-cache", false, "optimize input cache algorithm for multiple users")
var lpaths multiLPath
fs.Var(&lpaths, "lora", "Path to lora layer file (can be specified multiple times)")
fs.Usage = func() {
fmt.Fprintf(fs.Output(), "Runner usage\n")
fs.PrintDefaults()
}
if err := fs.Parse(args); err != nil {
return err
} }
level := slog.LevelInfo
if *verbose {
@@ -983,7 +988,8 @@ func main() {
listener, err := net.Listen("tcp", addr)
if err != nil {
fmt.Println("Listen error:", err)
cancel()
return err
}
defer listener.Close()
@@ -999,7 +1005,9 @@ func main() {
log.Println("Server listening on", addr)
if err := httpServer.Serve(listener); err != nil {
log.Fatal("server error:", err)
return err
}
cancel()
return nil
}
package runner
import (
"strings"
......
package runner
import (
"reflect"
......
@@ -25,7 +25,6 @@ import (
"golang.org/x/sync/semaphore"
"github.com/ollama/ollama/api"
"github.com/ollama/ollama/build"
"github.com/ollama/ollama/discover" "github.com/ollama/ollama/discover"
"github.com/ollama/ollama/envconfig" "github.com/ollama/ollama/envconfig"
"github.com/ollama/ollama/format" "github.com/ollama/ollama/format"
...@@ -144,20 +143,13 @@ func NewLlamaServer(gpus discover.GpuInfoList, model string, ggml *GGML, adapter ...@@ -144,20 +143,13 @@ func NewLlamaServer(gpus discover.GpuInfoList, model string, ggml *GGML, adapter
// Loop through potential servers // Loop through potential servers
finalErr := errors.New("no suitable llama servers found") finalErr := errors.New("no suitable llama servers found")
availableServers := runners.GetAvailableServers()
var servers []string
if cpuRunner != "" {
servers = []string{cpuRunner}
} else {
servers = runners.ServersForGpu(gpus[0].RunnerName()) // All GPUs in the list are matching Library and Variant
}
demandLib := envconfig.LLMLibrary()
if demandLib != "" {
@@ -167,7 +159,7 @@ func NewLlamaServer(gpus discover.GpuInfoList, model string, ggml *GGML, adapter
} else {
slog.Info("user override", "OLLAMA_LLM_LIBRARY", demandLib, "path", serverPath)
servers = []string{demandLib}
if strings.HasPrefix(demandLib, "cpu") || (!(runtime.GOOS == "darwin" && runtime.GOARCH == "arm64") && demandLib == runners.BuiltinName()) {
// Omit the GPU flag to silence the warning
opts.NumGPU = -1
}
@@ -279,15 +271,16 @@ func NewLlamaServer(gpus discover.GpuInfoList, model string, ggml *GGML, adapter
}
for i := range servers {
builtin := servers[i] == runners.BuiltinName()
server := availableServers[servers[i]]
if server == "" {
// Shouldn't happen
finalErr = fmt.Errorf("[%d] server %s not listed in available servers %v", i, servers[i], availableServers)
slog.Error("server list inconsistent", "error", finalErr)
continue
}
if strings.HasPrefix(servers[i], "cpu") || (builtin && !(runtime.GOOS == "darwin" && runtime.GOARCH == "arm64")) {
gpus = discover.GetCPUInfo()
}
@@ -304,14 +297,16 @@ func NewLlamaServer(gpus discover.GpuInfoList, model string, ggml *GGML, adapter
slog.Debug("ResolveTCPAddr failed ", "error", err)
port = rand.Intn(65535-49152) + 49152 // get a random port in the ephemeral range
}
finalParams := []string{"runner"}
finalParams = append(finalParams, params...)
finalParams = append(finalParams, "--port", strconv.Itoa(port))
pathEnv := "LD_LIBRARY_PATH"
if runtime.GOOS == "windows" {
pathEnv = "PATH"
}
// Start with the server directory for the LD_LIBRARY_PATH/PATH
libraryPaths := []string{filepath.Dir(server)}
if libraryPath, ok := os.LookupEnv(pathEnv); ok {
// favor our bundled library dependencies over system libraries
@@ -325,22 +320,6 @@ func NewLlamaServer(gpus discover.GpuInfoList, model string, ggml *GGML, adapter
libraryPaths = append(gpus[0].DependencyPath, libraryPaths...)
}
server := filepath.Join(dir, "ollama_llama_server")
if runtime.GOOS == "windows" {
server += ".exe"
}
// Detect tmp cleaners wiping out the file
_, err := os.Stat(server)
if errors.Is(err, os.ErrNotExist) {
slog.Warn("llama server disappeared, reinitializing payloads", "path", server, "error", err)
_, err = runners.Refresh(build.EmbedFS)
if err != nil {
slog.Warn("failed to reinitialize payloads", "error", err)
return nil, err
}
}
// TODO - once fully switched to the Go runner, load the model here for tokenize/detokenize cgo access
s := &llmServer{
port: port,
@@ -417,7 +396,7 @@ func NewLlamaServer(gpus discover.GpuInfoList, model string, ggml *GGML, adapter
if err = s.cmd.Start(); err != nil {
// Detect permission denied and augment the message about noexec
if errors.Is(err, os.ErrPermission) {
finalErr = fmt.Errorf("unable to start server %w. %s may have noexec set. Set OLLAMA_TMPDIR for server to a writable executable directory", err, server)
continue
}
msg := ""
......
# Build the discrete cpu runner(s) for the platform which do not rely on 3rd party GPU libraries
include make/common-defs.make
CPU_GOFLAGS="-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=$(VERSION)\" \"-X=github.com/ollama/ollama/llama.CpuFeatures=$(subst $(space),$(comma),$(TARGET_CPU_FLAGS))\" $(TARGET_LDFLAGS)"
ifeq ($(ARCH),amd64)
ifeq ($(origin CUSTOM_CPU_FLAGS),undefined)
RUNNERS = cpu_avx cpu_avx2
endif
endif
DIST_RUNNERS = $(addprefix $(RUNNERS_DIST_DIR)/,$(addsuffix /ollama_llama_server$(EXE_EXT),$(RUNNERS)))
BUILD_RUNNERS = $(addprefix $(RUNNERS_BUILD_DIR)/,$(addsuffix /ollama_llama_server$(EXE_EXT),$(RUNNERS)))
cpu: $(BUILD_RUNNERS)
dist: $(DIST_RUNNERS)
$(RUNNERS_BUILD_DIR)/cpu_avx/ollama_llama_server$(EXE_EXT): TARGET_CPU_FLAGS="avx"
$(RUNNERS_BUILD_DIR)/cpu_avx/ollama_llama_server$(EXE_EXT): ./llama/*.go ./llama/runner/*.go $(COMMON_SRCS) $(COMMON_HDRS)
@-mkdir -p $(dir $@)
GOARCH=$(ARCH) go build -buildmode=pie $(CPU_GOFLAGS) -trimpath -tags $(subst $(space),$(comma),$(TARGET_CPU_FLAGS)) -o $@ ./cmd/runner
$(RUNNERS_BUILD_DIR)/cpu_avx2/ollama_llama_server$(EXE_EXT): TARGET_CPU_FLAGS="avx avx2"
$(RUNNERS_BUILD_DIR)/cpu_avx2/ollama_llama_server$(EXE_EXT): ./llama/*.go ./llama/runner/*.go $(COMMON_SRCS) $(COMMON_HDRS)
@-mkdir -p $(dir $@)
GOARCH=$(ARCH) go build -buildmode=pie $(CPU_GOFLAGS) -trimpath -tags $(subst $(space),$(comma),$(TARGET_CPU_FLAGS)) -o $@ ./cmd/runner
$(RUNNERS_DIST_DIR)/%: $(RUNNERS_BUILD_DIR)/%
@-mkdir -p $(dir $@)
cp $< $@
clean:
rm -f $(BUILD_RUNNERS) $(DIST_RUNNERS)
.PHONY: clean cpu dist
# Handy debugging for make variables
print-%:
......
# Build rules for CUDA v11 runner
include make/common-defs.make
include make/cuda-v11-defs.make
GPU_RUNNER_VARIANT := _v11
GPU_COMPILER=$(CUDA_11_COMPILER)
CUDA_ARCHITECTURES?=50;52;53;60;61;62;70;72;75;80;86
GPU_LIB_DIR = $(CUDA_11_LIB_DIR)
CGO_EXTRA_LDFLAGS = $(CUDA_11_CGO_EXTRA_LDFLAGS)
include make/cuda.make
include make/gpu.make
\ No newline at end of file
# Build rules for CUDA v12 runner
include make/common-defs.make
include make/cuda-v12-defs.make
GPU_RUNNER_VARIANT := _v12
GPU_COMPILER=$(CUDA_12_COMPILER)
CUDA_ARCHITECTURES?=60;61;62;70;72;75;80;86;87;89;90;90a
GPU_LIB_DIR = $(CUDA_12_LIB_DIR)
CGO_EXTRA_LDFLAGS = $(CUDA_12_CGO_EXTRA_LDFLAGS)
include make/cuda.make
include make/gpu.make
\ No newline at end of file
# Makefile for building top-level ollama binary
include make/common-defs.make
exe: $(OLLAMA_EXE)
dist_exe dist_ollama: $(DIST_OLLAMA_EXE)
GO_DEPS=$(foreach dir,$(shell go list -deps -f '{{.Dir}}' . ),$(wildcard $(dir)/*.go))
CPU_GOFLAGS="-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=$(VERSION)\" \"-X=github.com/ollama/ollama/llama.CpuFeatures=$(subst $(space),$(comma),$(TARGET_CPU_FLAGS))\" $(EXTRA_GOLDLAGS) $(TARGET_LDFLAGS)"
$(OLLAMA_EXE) $(DIST_OLLAMA_EXE): TARGET_CPU_FLAGS=$(CUSTOM_CPU_FLAGS)
$(OLLAMA_EXE) $(DIST_OLLAMA_EXE): $(COMMON_SRCS) $(COMMON_HDRS) $(GO_DEPS)
GOARCH=$(ARCH) go build -buildmode=pie $(CPU_GOFLAGS) -trimpath $(if $(CUSTOM_CPU_FLAGS),-tags $(subst $(space),$(comma),$(CUSTOM_CPU_FLAGS))) -o $@ .
.PHONY: ollama dist_ollama exe dist_exe
# Handy debugging for make variables
print-%:
@echo '$*=$($*)'