Commits · 8d1995c625e7f2ed2ff98eb099e1bd8d7e6e133e · OpenDAS / ollama

17 Apr, 2024 2 commits
- rearranged conditional logic for static build, dockerfile updated · 8aec92fa
  Jeremy authored Apr 17, 2024
  
  8aec92fa
- move static build to its own flag · 70261b9b
  Jeremy authored Apr 17, 2024
  
  70261b9b
09 Apr, 2024 2 commits

Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564) · 1524f323
Blake Mizerany authored Apr 09, 2024

1524f323

build.go: introduce a friendlier way to build Ollama (#3548) · fccf3eec

Blake Mizerany authored Apr 09, 2024

This commit introduces a friendlier way to build Ollama's dependencies
and the binary without abusing `go generate`, and removes the
unnecessary extra steps that approach brings with it.

This script also provides nicer feedback to the user about what is
happening during the build process.

At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).

fccf3eec

07 Apr, 2024 1 commit
- update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528) · 63efa075
  Jeffrey Morgan authored Apr 07, 2024
  
  63efa075
01 Apr, 2024 1 commit

Switch back to subprocessing for llama.cpp · 58d95cc9

Daniel Hiltgen authored Mar 14, 2024

This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process, shut it down when idle, and
gracefully restart it if it has problems. This also serves as a first step
toward running multiple copies to support multiple models concurrently.

58d95cc9

25 Mar, 2024 1 commit
- add support for libcudart.so for CUDA devices (adds Jetson support) · dfc6721b
  Jeremy authored Mar 25, 2024
  
  dfc6721b
15 Mar, 2024 1 commit
- Add Radeon gfx940-942 GPU support · d4c10df2
  Daniel Hiltgen authored Mar 15, 2024
  
  d4c10df2
11 Mar, 2024 1 commit

Avoid rocm runner and dependency clash · bc13da2b

Daniel Hiltgen authored Mar 11, 2024

Putting the rocm symlink next to the runners is risky.  This moves
the payloads into a subdir to avoid potential clashes.

bc13da2b

10 Mar, 2024 1 commit
- Harden for deps file being empty (or short) · 3dc1bb6a
  Daniel Hiltgen authored Mar 10, 2024
  
  3dc1bb6a
07 Mar, 2024 1 commit

Revamp ROCm support · 6c5ccb11

Daniel Hiltgen authored Feb 15, 2024

This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both Windows
and Linux. Given the large size of ROCm's tensor files, we split the
dependency out. It's bundled into the installer on Windows, and is a
separate download on Linux. The Linux install script now detects the
presence of AMD GPUs and checks whether ROCm v6 is already present;
if not, it downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use Go's Windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.

6c5ccb11

12 Feb, 2024 1 commit

Detect AMD GPU info via sysfs and block old cards · 6d84f075

Daniel Hiltgen authored Feb 11, 2024

This wires up some new logic to start using sysfs to discover AMD GPU
information and detect old cards we can't yet support so we can fall back to CPU mode.

6d84f075
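
A rough Go sketch of the "discover via sysfs, block old cards" idea follows. It is an illustration under stated assumptions, not the commit's implementation: it assumes the GPU's properties text contains a `gfx_target_version` line encoded as major*10000 + minor*100 + patch, and the names `parseGfxTarget`, `useGPU`, and the minimum-major threshold are all hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gfxVersion is a decoded GPU target version (hypothetical type).
type gfxVersion struct{ Major, Minor, Patch int }

// parseGfxTarget scans the text of a properties file for a
// "gfx_target_version N" line. The key name and its numeric encoding
// are assumptions made for this sketch.
func parseGfxTarget(properties string) (gfxVersion, bool) {
	sc := bufio.NewScanner(strings.NewReader(properties))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 || fields[0] != "gfx_target_version" {
			continue
		}
		n, err := strconv.Atoi(fields[1])
		if err != nil || n <= 0 {
			// Unparseable or zero: treat as "not a usable GPU".
			return gfxVersion{}, false
		}
		return gfxVersion{n / 10000, (n / 100) % 100, n % 100}, true
	}
	return gfxVersion{}, false
}

// useGPU reports whether the card is new enough; callers fall back to
// CPU mode when it returns false. minMajor is a hypothetical threshold.
func useGPU(properties string, minMajor int) bool {
	v, ok := parseGfxTarget(properties)
	return ok && v.Major >= minMajor
}

func main() {
	sample := "simd_count 256\ngfx_target_version 100300\n"
	v, ok := parseGfxTarget(sample)
	fmt.Println(v, ok, useGPU(sample, 9))
}
```

Returning a plain boolean lets the caller choose CPU mode up front rather than discovering the incompatibility through a crash later.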

25 Jan, 2024 1 commit
- Update gen_linux.sh to find libcudart in separate directory · a4564232
  mraiser authored Jan 25, 2024
  
  a4564232
21 Jan, 2024 1 commit

Make CPU builds parallel and customizable AMD GPUs · df54c723

Daniel Hiltgen authored Jan 21, 2024

The Linux build now supports parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advanced
users who want to alter our default set.

df54c723

20 Jan, 2024 2 commits
- Add compute capability 5.0, 7.5, and 8.0 · a447a083
  Daniel Hiltgen authored Jan 20, 2024
  
  a447a083
- Add support for CUDA 5.2 cards · 681a9149
  Daniel Hiltgen authored Jan 20, 2024
  
  681a9149
17 Jan, 2024 1 commit
- Add multiple CPU variants for Intel Mac · 1b249748
  Daniel Hiltgen authored Jan 12, 2024
```
This also refines the build process for the ext_server build.
```
  1b249748
16 Jan, 2024 1 commit

Bump llama.cpp to b1842 and add new cuda lib dep · 795674dd

Daniel Hiltgen authored Jan 10, 2024

Upstream llama.cpp has added a new dependency on the
NVIDIA CUDA Driver Libraries (libcuda.so), which is part of the
driver distribution, not the general CUDA libraries, and is not
available as an archive, so we cannot statically link it.  This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.

795674dd

14 Jan, 2024 1 commit
- Let gpu.go and gen_linux.sh also find CUDA on Arch Linux · f4bf1d51
  Alexander F. Rødseth authored Jan 14, 2024
  
  f4bf1d51
12 Jan, 2024 1 commit
- improve cuda detection (rel. issue #1704) · 905862e1
  Fabian Preiss authored Jan 09, 2024
  
  905862e1
11 Jan, 2024 3 commits

Always dynamically load the llm server library · 39928a42

Daniel Hiltgen authored Jan 09, 2024

This switches darwin to dynamic loading, and refactors the code now that
static linking of the library is no longer used on any platform.

39928a42

Build multiple CPU variants and pick the best · d88c527b

Daniel Hiltgen authored Jan 07, 2024

This reduces the built-in Linux version to not use any vector extensions,
which enables the resulting builds to run under Rosetta on macOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available.

d88c527b
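
The runtime selection described here can be sketched in Go as below. This is a simplified illustration, not the commit's code: the `cpuFeatures` struct, the `bestCPUVariant` function, and the variant names are hypothetical, and real feature detection (which this sketch takes as input) would come from a CPU-feature library.

```go
package main

import "fmt"

// cpuFeatures holds the vector extensions detected on the host.
// In a real program these flags would be filled in by a CPU-feature
// detection library; here they are plain inputs (assumption).
type cpuFeatures struct {
	AVX  bool
	AVX2 bool
}

// bestCPUVariant picks the most capable library variant the host can
// run, falling back to the baseline build with no vector extensions.
// The variant names are hypothetical.
func bestCPUVariant(f cpuFeatures) string {
	switch {
	case f.AVX2:
		return "cpu_avx2"
	case f.AVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	fmt.Println(bestCPUVariant(cpuFeatures{AVX: true}))
}
```

Keeping the baseline variant free of vector extensions is what lets the same binary start under emulation, where those instructions may be unavailable.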

Support multiple variants for a given llm lib type · 8da7bef0

Daniel Hiltgen authored Jan 05, 2024

In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.

This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.

8da7bef0

04 Jan, 2024 3 commits

Code shuffle to clean up the llm dir · 77d96da9
Daniel Hiltgen authored Jan 04, 2024

77d96da9

update cmake flags for `amd64` macOS (#1780) · 29340c2e

Jeffrey Morgan authored Jan 03, 2024

* update cmake flags for intel macOS

* remove `LLAMA_K_QUANTS`

* put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`

29340c2e

Fix CPU only builds · ddbfa6fe

Daniel Hiltgen authored Jan 03, 2024

Go embed doesn't like it when there are no matching files, so put
a dummy placeholder in to allow building without any GPU support.
If no "server" library is found, it's safely ignored at runtime.

ddbfa6fe

03 Jan, 2024 1 commit

Improve maintainability of Radeon card list · 16f4603b

Daniel Hiltgen authored Jan 03, 2024

This moves the list of AMD GPUs into a more maintainable form, which
should make it easier to update over time.

16f4603b

02 Jan, 2024 2 commits

Switch windows build to fully dynamic · d966b730

Daniel Hiltgen authored Dec 23, 2023

Refactor where we store build outputs, and support a fully dynamic loading
model on Windows so the base executable has no special dependencies and thus
doesn't require a special PATH.

d966b730

Refactor how we augment llama.cpp · 9a70aecc

Daniel Hiltgen authored Dec 22, 2023

This changes the model for llama.cpp inclusion so we're not applying a patch,
but instead have the C++ code directly in the ollama tree, which should make it
easier to refine and update over time.

9a70aecc

22 Dec, 2023 1 commit
- Remove CPU build, fixup linux build script · fa24e73b
  Daniel Hiltgen authored Dec 21, 2023
  
  fa24e73b
20 Dec, 2023 1 commit

Revamp the dynamic library shim · 7555ea44

Daniel Hiltgen authored Dec 20, 2023

This switches the default llama.cpp to be CPU based, and builds the GPU variants
as dynamically loaded libraries which we can select at runtime.

This also bumps the ROCm library to version 6 given 5.7 builds don't work
on the latest ROCm library that just shipped.

7555ea44

19 Dec, 2023 3 commits
- Refine build to support CPU only · 1b991d0b
  Daniel Hiltgen authored Dec 13, 2023
```
If someone checks out the ollama repo and doesn't install the CUDA
library, this will ensure they can build a CPU only version
```
  1b991d0b
- Adapted rocm support to cgo based llama.cpp · 35934b2e
  Daniel Hiltgen authored Nov 29, 2023
  
  35934b2e
- Add cgo implementation for llama.cpp · d4cd6957
  Daniel Hiltgen authored Nov 13, 2023
```
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
```
  d4cd6957