Commits · fda0d3be5224b59a4b4b031e18c89adca71657ed · OpenDAS / ollama

12 Sep, 2024 2 commits

Use GOARCH for build dirs (#6779) · fda0d3be
Daniel Hiltgen authored Sep 12, 2024
```
Corrects x86_64 vs amd64 discrepancy
```
fda0d3be
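The x86_64 vs amd64 discrepancy comes from two naming schemes. `uname -m` reports kernel machine names (`x86_64`, `aarch64`), while Go's `GOARCH` uses `amd64` and `arm64`. Below is a minimal Python sketch of the mapping, assuming only the two architectures these builds target. `to_goarch` is a hypothetical helper, not code from the repository:

```python
# Kernel / `uname -m` machine names mapped to Go's GOARCH spellings.
# Only the two architectures relevant to these builds are listed.
_UNAME_TO_GOARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",   # some systems (e.g. FreeBSD, Windows) already report amd64
    "aarch64": "arm64",
    "arm64": "arm64",   # macOS reports arm64
}


def to_goarch(machine: str) -> str:
    """Return the GOARCH name for a `uname -m` style machine string."""
    try:
        return _UNAME_TO_GOARCH[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine}") from None
```

For example, `to_goarch(platform.machine())` would name a build directory after the Go architecture instead of the kernel's.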

Optimize container images for startup (#6547) · cd5c8f64

Daniel Hiltgen authored Sep 12, 2024

* Optimize container images for startup

This change adjusts how runner payloads are handled to support
container builds where we keep them extracted in the filesystem.
This makes it easier to optimize the cpu/cuda vs cpu/rocm images for
size, and should result in faster startup times for container images.

* Refactor payload logic and add buildx support for faster builds

* Move payloads around

* Review comments

* Converge to buildx based helper scripts

* Use docker buildx action for release

cd5c8f64

23 Aug, 2024 1 commit
- llm: Align cmake define for cuda no peer copy (#6455) · 0b03b9c3
  Daniel Hiltgen authored Aug 23, 2024
```
Define changed recently and this slipped through the cracks with the old
name.
```
  0b03b9c3
20 Aug, 2024 1 commit

Split rocm back out of bundle (#6432) · a017cf2f

Daniel Hiltgen authored Aug 20, 2024

We're over budget for GitHub's maximum release artifact size with rocm + 2 cuda
versions. This splits rocm back out as a discrete artifact, but keeps the layout so it can
be extracted into the same location as the main bundle.

a017cf2f

19 Aug, 2024 4 commits

Adjust layout to bin+lib/ollama · 88bb9e33
Daniel Hiltgen authored Aug 14, 2024

88bb9e33
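The commit only names the new layout (`bin` plus `lib/ollama`). As a sketch under the assumption that the executable sits directly in `<prefix>/bin`, the library directory can be derived from the executable's own path; `lib_dir_for` is a hypothetical helper used for illustration:

```python
from pathlib import PurePosixPath


def lib_dir_for(executable: str) -> PurePosixPath:
    """Given <prefix>/bin/ollama, return <prefix>/lib/ollama.

    Assumes the executable lives directly in a `bin` directory whose
    sibling `lib/ollama` holds the dependent libraries.
    """
    exe = PurePosixPath(executable)
    if exe.parent.name != "bin":
        raise ValueError(f"expected an executable under bin/: {executable}")
    prefix = exe.parent.parent
    return prefix / "lib" / "ollama"
```

Resolving relative to the executable keeps the archive relocatable: extracting it under any prefix preserves the `bin` to `lib/ollama` relationship.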
Add Jetson cuda variants for arm · d470ebe7
Daniel Hiltgen authored May 30, 2024
```
This adds new variants for arm64 specific to Jetson platforms
```
d470ebe7
Wire up ccache and pigz in the docker based build · c7bcb003
Daniel Hiltgen authored Aug 09, 2024
```
This should help speed things up a little
```
c7bcb003

Refactor linux packaging · 74d45f01

Daniel Hiltgen authored Jul 08, 2024

This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to carry the primary executable and dependent libraries. Runners are
still carried as payloads inside the main binary.

Darwin retains the payload model where the go binary is fully self contained.

74d45f01

11 Jul, 2024 1 commit
- llm: dont link cuda with compat libs (#5621) · efbf41ed
  Jeffrey Morgan authored Jul 10, 2024
  
  efbf41ed
10 Jul, 2024 1 commit
- remove `GGML_CUDA_FORCE_MMQ=on` from build (#5588) · 4e262eb2
  Jeffrey Morgan authored Jul 10, 2024
  
  4e262eb2
08 Jul, 2024 1 commit
- Workaround broken ROCm p2p copy · 0bacb300
  Daniel Hiltgen authored Jul 05, 2024
```
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
```
  0bacb300
06 Jul, 2024 2 commits
- llm: add `-DBUILD_SHARED_LIBS=off` to common cpu cmake flags (#5520) · 4607c706
  Jeffrey Morgan authored Jul 06, 2024
  
  4607c706
- llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511) · 2cc854f8
  Jeffrey Morgan authored Jul 05, 2024
```
* Revert "fix cmake build (#5505)"

This reverts commit 4fd5f352.

* llm: fix missing dylibs by restoring old build behavior

* crlf -> lf
```
  2cc854f8
05 Jul, 2024 1 commit
- update llama.cpp submodule to `d7fd29f` (#5475) · 8f8e736b
  Jeffrey Morgan authored Jul 05, 2024
  
  8f8e736b
17 Jun, 2024 2 commits

Add back lower level parallel flags · b0930626

Daniel Hiltgen authored Jun 17, 2024

nvcc supports parallelism (threads) and cmake + make can use -j,
while msbuild requires /p:CL_MPcount=8

b0930626
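Each tool spells "use N jobs" differently. Here is a minimal Python sketch that translates one job count into the per-tool flags named in the commit message. `parallel_args` is hypothetical. `--threads` is nvcc's long option for parallel compilation (CUDA 11.2 and later), and the commit does not say which spelling it uses:

```python
import os
from typing import List, Optional


def parallel_args(tool: str, jobs: Optional[int] = None) -> List[str]:
    """Return the arguments that ask `tool` for `jobs`-way parallelism."""
    n = jobs or os.cpu_count() or 1
    if tool == "make":
        return [f"-j{n}"]
    if tool == "cmake":
        # `cmake --build <dir> -j N` (CMake 3.12+) forwards parallelism to the generator.
        return ["-j", str(n)]
    if tool == "msbuild":
        # MSBuild has no -j; as the commit notes, compiler parallelism is a property.
        return [f"/p:CL_MPcount={n}"]
    if tool == "nvcc":
        # nvcc can run compilation steps (e.g. per GPU architecture) in parallel threads.
        return ["--threads", str(n)]
    raise ValueError(f"unknown tool: {tool}")
```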

llm: update llama.cpp commit to `7c26775` (#4896) · 152fc202

Jeffrey Morgan authored Jun 17, 2024

* llm: update llama.cpp submodule to `7c26775`

* disable `LLAMA_BLAS` for now

* `-DLLAMA_OPENMP=off`

152fc202

07 Jun, 2024 1 commit

Add ability to skip oneapi generate · ab8c929e

Daniel Hiltgen authored Jun 07, 2024

This follows the same pattern for cuda and rocm to allow
disabling the build even when we detect the dependent libraries

ab8c929e
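The skip pattern amounts to a two-part check: build a GPU variant only when its libraries are detected and no skip variable is set. The sketch below is a minimal Python version. The `OLLAMA_SKIP_<VARIANT>_GENERATE` naming is an assumption for illustration, and so is treating any non-empty value as "skip" (which mirrors a shell `[ -z "$VAR" ]` test):

```python
from typing import Mapping


def should_build(variant: str, deps_detected: bool, env: Mapping[str, str]) -> bool:
    """Build `variant` only if its libraries were found and no skip flag is set.

    The OLLAMA_SKIP_<VARIANT>_GENERATE name is assumed for illustration;
    any non-empty value disables the variant even when deps are present.
    """
    skip_var = f"OLLAMA_SKIP_{variant.upper()}_GENERATE"
    return deps_detected and not env.get(skip_var)
```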

24 May, 2024 1 commit
- support ollama run on Intel GPUs · fd5971be
  Wang,Zhe authored May 24, 2024
  
  fd5971be
15 May, 2024 1 commit
- Port cuda/rocm skip build vars to linux · c48c1d7c
  Daniel Hiltgen authored May 15, 2024
```
Windows already implements these, carry over to linux.
```
  c48c1d7c
25 Apr, 2024 1 commit
- Remove trailing spaces (#3889) · 5f73c087
  Roy Yang authored Apr 25, 2024
  
  5f73c087
18 Apr, 2024 1 commit

Update gen_linux.sh · 440b7190

Jeremy authored Apr 18, 2024

Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS instead of OLLAMA_CUSTOM_GPU_DEFS

440b7190
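Splitting the single GPU variable into per-backend ones means each backend appends only its own extra CMake defines. The sketch below shows this in Python. The variable names come from the commit, but `cmake_defs` is hypothetical. It also assumes shell-style quoting via `shlex.split`, whereas a shell script expanding an unquoted variable would split on whitespace alone:

```python
import shlex
from typing import List, Mapping


def cmake_defs(backend: str, base: List[str], env: Mapping[str, str]) -> List[str]:
    """Append user-supplied CMake defines for one GPU backend.

    Reads OLLAMA_CUSTOM_<BACKEND>_DEFS (e.g. OLLAMA_CUSTOM_CUDA_DEFS) and
    splits it shell-style, so quoted values containing spaces stay intact.
    """
    extra = env.get(f"OLLAMA_CUSTOM_{backend.upper()}_DEFS", "")
    return list(base) + shlex.split(extra)
```

With separate variables, a CUDA-only define no longer leaks into the ROCm build.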

17 Apr, 2024 4 commits
- add support for custom gpu build flags for llama.cpp · 52f5370c
  Jeremy authored Apr 17, 2024
  
  52f5370c
- adds support for OLLAMA_CUSTOM_GPU_DEFS to customize GPU build flags · 7c000ec3
  Jeremy authored Apr 17, 2024
  
  7c000ec3
- rearranged conditional logic for static build, dockerfile updated · 8aec92fa
  Jeremy authored Apr 17, 2024
  
  8aec92fa
- move static build to its own flag · 70261b9b
  Jeremy authored Apr 17, 2024
  
  70261b9b
09 Apr, 2024 2 commits

Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564) · 1524f323
Blake Mizerany authored Apr 09, 2024

1524f323

build.go: introduce a friendlier way to build Ollama (#3548) · fccf3eec

Blake Mizerany authored Apr 09, 2024

This commit introduces a friendlier way to build Ollama dependencies
and the binary without abusing `go generate`, removing the
unnecessary extra steps it brings with it.

This script also provides nicer feedback to the user about what is
happening during the build process.

At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).

fccf3eec

07 Apr, 2024 1 commit
- update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to... · 63efa075
  Jeffrey Morgan authored Apr 07, 2024
```
update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528)
```
  63efa075
01 Apr, 2024 1 commit

Switch back to subprocessing for llama.cpp · 58d95cc9

Daniel Hiltgen authored Mar 14, 2024

This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process, shut it down when idle, and
gracefully restart it if it has problems. This also serves as a first step
toward running multiple copies to support multiple models concurrently.

58d95cc9

25 Mar, 2024 1 commit
- add support for libcudart.so for CUDA devices (adds Jetson support) · dfc6721b
  Jeremy authored Mar 25, 2024
  
  dfc6721b
15 Mar, 2024 1 commit
- Add Radeon gfx940-942 GPU support · d4c10df2
  Daniel Hiltgen authored Mar 15, 2024
  
  d4c10df2
11 Mar, 2024 1 commit

Avoid rocm runner and dependency clash · bc13da2b

Daniel Hiltgen authored Mar 11, 2024

Putting the rocm symlink next to the runners is risky.  This moves
the payloads into a subdir to avoid potential clashes.

bc13da2b

10 Mar, 2024 1 commit
- Harden for deps file being empty (or short) · 3dc1bb6a
  Daniel Hiltgen authored Mar 10, 2024
  
  3dc1bb6a
07 Mar, 2024 1 commit

Revamp ROCm support · 6c5ccb11

Daniel Hiltgen authored Feb 15, 2024

This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCm's tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on linux. The linux install script now detects
AMD GPUs, checks whether rocm v6 is already present, and if not,
downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.

6c5ccb11
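The OLLAMA_HOME behaviour described above (use the variable if set, else `~/.ollama`) fits in a few lines. This is a minimal Python sketch. `ollama_home` is a hypothetical helper, and treating an empty OLLAMA_HOME as unset is an assumption:

```python
import os
from typing import Mapping, Optional


def ollama_home(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the directory the LLM libraries are extracted to.

    Uses OLLAMA_HOME when set and non-empty (assumed), otherwise ~/.ollama.
    """
    env = os.environ if env is None else env
    home = env.get("OLLAMA_HOME")
    if home:
        return home
    return os.path.join(os.path.expanduser("~"), ".ollama")
```

Because extraction into a fixed location is idempotent, later startups can find the libraries already in place instead of unpacking them again.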

12 Feb, 2024 1 commit

Detect AMD GPU info via sysfs and block old cards · 6d84f075

Daniel Hiltgen authored Feb 11, 2024

This wires up some new logic to start using sysfs to discover AMD GPU
information and detects old cards we can't yet support so we can fall back to CPU mode.

6d84f075
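On Linux, the amdkfd driver exposes one `properties` file per topology node under `/sys/class/kfd/kfd/topology/nodes/<N>/`. Each file contains `key value` lines, including `gfx_target_version`, which encodes the GPU ISA as major*10000 + minor*100 + stepping. The minimal Python sketch below parses that text so it can run without AMD hardware. The function names are hypothetical, and the "major version 9 or newer" cutoff is an assumption chosen for illustration:

```python
from typing import Optional, Tuple


def parse_gfx_version(properties: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, stepping) from a KFD topology `properties` text.

    gfx_target_version encodes the ISA as major*10000 + minor*100 + stepping
    (e.g. 90006 -> gfx906). CPU nodes report 0, returned here as None.
    """
    for line in properties.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "gfx_target_version":
            value = int(parts[1])
            if value == 0:
                return None
            return value // 10000, (value // 100) % 100, value % 100
    return None


def is_supported(version: Tuple[int, int, int], min_major: int = 9) -> bool:
    """Treat GPUs older than `min_major` (an assumed cutoff) as unsupported."""
    return version[0] >= min_major
```

A GPU that fails the check would be skipped during discovery, letting the server fall back to CPU instead of crashing inside ROCm.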

25 Jan, 2024 1 commit
- Update gen_linux.sh to find libcudart in separate directory · a4564232
  mraiser authored Jan 25, 2024
  
  a4564232
21 Jan, 2024 1 commit

Make CPU builds parallel and customizable AMD GPUs · df54c723

Daniel Hiltgen authored Jan 21, 2024

The linux build now supports parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advanced
users who want to alter our default set.

df54c723

20 Jan, 2024 2 commits
- Add compute capability 5.0, 7.5, and 8.0 · a447a083
  Daniel Hiltgen authored Jan 20, 2024
  
  a447a083
- Add support for CUDA 5.2 cards · 681a9149
  Daniel Hiltgen authored Jan 20, 2024
  
  681a9149
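CMake's `CMAKE_CUDA_ARCHITECTURES` takes compute capabilities without the dot (`50`, not `5.0`), separated by semicolons. The minimal Python sketch below converts the capabilities named in these two commits into that form. The script's full default list is not shown here, and `cuda_architectures` is a hypothetical helper:

```python
from typing import Iterable


def cuda_architectures(capabilities: Iterable[str]) -> str:
    """Turn compute capabilities like "5.0" into a CMAKE_CUDA_ARCHITECTURES value.

    Each capability loses its dot ("5.0" -> "50"); duplicates are dropped
    and entries are sorted numerically, then joined with semicolons.
    """
    archs = {cap.replace(".", "") for cap in capabilities}
    return ";".join(sorted(archs, key=int))
```

For example, `cuda_architectures(["5.0", "5.2", "7.5", "8.0"])` yields the string passed as `-DCMAKE_CUDA_ARCHITECTURES=50;52;75;80`.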
17 Jan, 2024 1 commit
- Add multiple CPU variants for Intel Mac · 1b249748
  Daniel Hiltgen authored Jan 12, 2024
```
This also refines the build process for the ext_server build.
```
  1b249748