- 15 Nov, 2024 1 commit
Daniel Hiltgen authored
Fix a rebase glitch from the old C++ runner build model
- 12 Nov, 2024 1 commit
Daniel Hiltgen authored
This adds support for the Jetson JetPack variants into the Go runner
- 30 Oct, 2024 1 commit
Daniel Hiltgen authored
* Remove llama.cpp submodule and shift new build to top
* CI: install msys and clang gcc on win. Needed for deepseek to work properly on Windows.
- 27 Oct, 2024 1 commit
Daniel Hiltgen authored
- 08 Oct, 2024 1 commit
Jeffrey Morgan authored
* Re-introduce the llama package

This PR brings back the llama package, making it possible to call llama.cpp and ggml APIs from Go directly via CGo. This has a few advantages:
- C APIs can be called directly from Go without needing to use the previous "server" REST API
- On macOS, and for CPU builds on Linux and Windows, Ollama can be built without a `go generate ./...` step, making it easy to get up and running to hack on parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA and ROCm (a full build of all runners takes <5 min on a fast CPU)
- No git submodule, making it easier to clone and build from source

This is a big PR, but much of it is vendor code except for:
- llama.go: CGo bindings
- example/: a simple example of running inference
- runner/: a subprocess server designed to replace the llm/ext_server package
- Makefile: an as-minimal-as-possible Makefile to build the runner package for different...
- 12 Sep, 2024 1 commit
Daniel Hiltgen authored
* Optimize container images for startup

This change adjusts how runner payloads are handled to support container builds where we keep them extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images.
* Refactor payload logic and add buildx support for faster builds
* Move payloads around
* Review comments
* Converge to buildx based helper scripts
* Use docker buildx action for release
- 10 Sep, 2024 1 commit
Daniel Hiltgen authored
* Quiet down Docker's new lint warnings. Docker has recently added lint warnings to builds; this cleans up those warnings.
* Fix go lint regression
- 03 Sep, 2024 1 commit
R0CKSTAR authored
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
- 20 Aug, 2024 1 commit
Daniel Hiltgen authored
We're over budget for GitHub's maximum release artifact size with ROCm + 2 CUDA versions. This splits ROCm back out as a discrete artifact, but keeps the layout so it can be extracted into the same location as the main bundle.
- 19 Aug, 2024 7 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
Based on compute capability and driver version, pick the v12 or v11 CUDA variant.
Daniel Hiltgen authored
This adds new arm64 variants specific to Jetson platforms.
Daniel Hiltgen authored
This should help speed things up a little
Daniel Hiltgen authored
This adjusts Linux to follow a similar model to Windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model where the Go binary is fully self contained.
- 22 Jul, 2024 1 commit
Daniel Hiltgen authored
- 17 Jul, 2024 1 commit
lreed authored
- 15 Jul, 2024 1 commit
Daniel Hiltgen authored
- 02 Jul, 2024 1 commit
Daniel Hiltgen authored
The CentOS 7 arm mirrors have disappeared due to the EOL 2 days ago, and the vault sed workaround that works for x86 doesn't work for arm.
- 14 Jun, 2024 1 commit
Daniel Hiltgen authored
- 17 Apr, 2024 2 commits
- 11 Apr, 2024 1 commit
Daniel Hiltgen authored
- 01 Apr, 2024 1 commit
Daniel Hiltgen authored
This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process, shut it down when idle, and gracefully restart it if it has problems. This also serves as a first step toward being able to run multiple copies to support multiple models concurrently.
- 28 Mar, 2024 1 commit
Daniel Hiltgen authored
- 26 Mar, 2024 2 commits
Patrick Devine authored
Daniel Hiltgen authored
This reverts commit 5dacc1eb.
- 25 Mar, 2024 1 commit
Daniel Hiltgen authored
We had started using Rocky Linux 8, but they've updated to GCC 10.3, which breaks NVCC. 10.2 is compatible (as is 10.4, but that's not available from the Rocky Linux 8 repos yet).
- 21 Mar, 2024 1 commit
Bruce MacDonald authored
- 15 Mar, 2024 1 commit
Daniel Hiltgen authored
Flesh out our GitHub Actions CI so we can build official releases.
- 11 Mar, 2024 1 commit
Jeffrey Morgan authored
- 10 Mar, 2024 1 commit
Daniel Hiltgen authored
- 07 Mar, 2024 2 commits
Daniel Hiltgen authored
This refines where we extract the LLM libraries by adding a new OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both Windows and Linux. Given the large size of ROCm's tensor files, we split the dependency out. It's bundled into the installer on Windows, and a separate download on Linux. The Linux install script is now smart: it detects the presence of AMD GPUs, checks whether ROCm v6 is already present, and if not, downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use Go's Windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.
Jeffrey Morgan authored
- 29 Feb, 2024 1 commit
Daniel Hiltgen authored
Without this env var, podman's GPU logic doesn't map the GPU through.
- 26 Jan, 2024 2 commits
Daniel Hiltgen authored
This adds ROCm support back as a discrete image.
Daniel Hiltgen authored
The size increase for ROCm support in the standard image is problematic. We'll revisit multiple tags for ROCm support in a follow-up PR.
- 21 Jan, 2024 2 commits
Daniel Hiltgen authored
The Linux build now supports parallel CPU builds to speed things up. This also exposes AMD GPU targets as an optional setting for advanced users who want to alter our default set.
Daniel Hiltgen authored
This renames Dockerfile.build to Dockerfile, and adds some new stages to support two modes of building: the build_linux.sh script uses intermediate stages to extract the artifacts for ./dist, and the default build generates a container image usable by both CUDA and ROCm cards. This required transitioning the x86 base to the ROCm image to avoid layer bloat.