Commits · 88bb9e332877dfbba40030c19570fdbe00f41a21 · OpenDAS / ollama

19 Aug, 2024 3 commits

Adjust layout to bin+lib/ollama · 88bb9e33

Daniel Hiltgen authored Aug 14, 2024

88bb9e33

Add windows cuda v12 + v11 support · 927d98a6

Daniel Hiltgen authored Jul 12, 2024

927d98a6

Daniel Hiltgen authored Jul 08, 2024

This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to carry the primary executable and dependent libraries. Runners are
still carried as payloads inside the main binary.

Darwin retains the payload model where the go binary is fully self-contained.

74d45f01

20 Jul, 2024 1 commit

Adjust windows ROCm discovery · 283948c8

Daniel Hiltgen authored Jul 19, 2024

The v5 hip library returns unsupported GPUs which won't enumerate at
inference time in the runner, so this makes sure we align discovery. The
gfx906 cards are no longer supported, so we shouldn't compile with that
GPU type as it won't enumerate at runtime.
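
The alignment described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the `supportedGfx` allow-list, its entries, and the `filterSupported` name are assumptions. The only detail taken from the commit text is that gfx906 is treated as unsupported.

```go
package main

import "fmt"

// supportedGfx is a hypothetical allow-list used for illustration; the real
// set depends on the bundled ROCm version. gfx906 is deliberately absent,
// matching the commit description.
var supportedGfx = map[string]bool{
	"gfx1030": true,
	"gfx1100": true,
}

// filterSupported keeps only the discovered GPUs that appear in the
// allow-list, so discovery never reports a device the runner cannot use.
func filterSupported(discovered []string) []string {
	var usable []string
	for _, name := range discovered {
		if supportedGfx[name] {
			usable = append(usable, name)
		}
	}
	return usable
}

func main() {
	fmt.Println(filterSupported([]string{"gfx906", "gfx1030", "gfx1100"}))
}
```

Filtering at discovery time means an unsupported card is simply not reported, rather than being reported and then failing to enumerate in the runner.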

283948c8

10 Jul, 2024 1 commit

Bump ROCm on windows to 6.1.2 · 1f50356e

Daniel Hiltgen authored Jul 10, 2024

This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.

1f50356e

08 Jul, 2024 1 commit
- Workaround broken ROCm p2p copy · 0bacb300
  Daniel Hiltgen authored Jul 05, 2024
```
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
```
  0bacb300
06 Jul, 2024 2 commits
- llm: statically link pthread and stdc++ dependencies in windows build · f1a379aa
  jmorganca authored Jul 06, 2024
  
  f1a379aa
- llm: add `GGML_STATIC` flag to windows static lib · 9ae14699
  jmorganca authored Jul 06, 2024
  
  9ae14699
05 Jul, 2024 1 commit
- update llama.cpp submodule to `d7fd29f` (#5475) · 8f8e736b
  Jeffrey Morgan authored Jul 05, 2024
  
  8f8e736b
17 Jun, 2024 4 commits

Add back lower level parallel flags · b0930626

Daniel Hiltgen authored Jun 17, 2024

nvcc supports parallelism (threads) and cmake + make can use -j,
while msbuild requires /p:CL_MPcount=8

b0930626

Revert "More parallelism on windows generate" · e890be48
Daniel Hiltgen authored Jun 17, 2024
```
This reverts commit 0577af98.
```
e890be48

Move libraries out of users path · b2799f11

Daniel Hiltgen authored Jun 15, 2024

We update the PATH on windows to get the CLI mapped, but this has
an unintended side effect of causing other apps that may use our bundled
DLLs to get terminated when we upgrade.

b2799f11

llm: update llama.cpp commit to `7c26775` (#4896) · 152fc202

Jeffrey Morgan authored Jun 17, 2024

* llm: update llama.cpp submodule to `7c26775`

* disable `LLAMA_BLAS` for now

* `-DLLAMA_OPENMP=off`

152fc202

15 Jun, 2024 1 commit
- More parallelism on windows generate · 0577af98
  Daniel Hiltgen authored Jun 13, 2024
```
Make the build faster
```
  0577af98
07 Jun, 2024 1 commit

Add ability to skip oneapi generate · ab8c929e

Daniel Hiltgen authored Jun 07, 2024

This follows the same pattern for cuda and rocm to allow
disabling the build even when we detect the dependent libraries

ab8c929e

24 May, 2024 1 commit
- support ollama run on Intel GPUs · fd5971be
  Wang,Zhe authored May 24, 2024
  
  fd5971be
27 Apr, 2024 2 commits
- Do not build AVX runners on ARM64 · 8a65717f
  Hernan Martinez authored Apr 26, 2024
  
  8a65717f
- Use architecture specific folders in the generate script · b438d485
  Hernan Martinez authored Apr 26, 2024
  
  b438d485
26 Apr, 2024 5 commits
- Fine grain control over windows generate steps · e4859c45
  Daniel Hiltgen authored Apr 26, 2024
```
This will speed up CI which already tries to only build static for unit tests
```
  e4859c45
- Fix target in gen_windows.ps1 · ed5fb088
  Daniel Hiltgen authored Apr 26, 2024
  
  ed5fb088
- Put back non-avx CPU build for windows · 421c878a
  Daniel Hiltgen authored Apr 26, 2024
  
  421c878a
- Refactor windows generate for more modular usage · 8671fded
  Daniel Hiltgen authored Apr 25, 2024
  
  8671fded
- Move cuda/rocm dependency gathering into generate script · 8feb97dc
  Daniel Hiltgen authored Apr 25, 2024
```
This will make it simpler for CI to accumulate artifacts from prior steps
```
  8feb97dc
23 Apr, 2024 1 commit

Move nested payloads to installer and zip file on windows · 058f6cd2

Daniel Hiltgen authored Apr 23, 2024

Now that the llm runner is an executable and not just a dll, more users are facing
problems with security policy configurations on windows that prevent users
from writing to directories and then executing binaries from the same location.
This change removes payloads from the main executable on windows and shifts them
over to be packaged in the installer and discovered based on the executable's location.
This also adds a new zip file for people who want to "roll their own" installation model.
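
Discovery "based on the executable's location" can be sketched as below. This is a minimal illustration under assumptions: the `ollama_runners` directory name and the `runnersDirFor` helper are hypothetical and may not match the layout the installer actually uses.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runnersDirFor returns a payload directory located next to the given
// executable. The "ollama_runners" name is an assumption for illustration.
func runnersDirFor(exePath string) string {
	return filepath.Join(filepath.Dir(exePath), "ollama_runners")
}

func main() {
	// os.Executable reports the path of the running binary.
	exe, err := os.Executable()
	if err != nil {
		fmt.Println("cannot locate executable:", err)
		return
	}
	fmt.Println(runnersDirFor(exe))
}
```

Resolving the directory relative to the binary avoids extracting payloads into a writable location and then executing them there, which is the pattern the security policies mentioned above tend to block.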

058f6cd2

21 Apr, 2024 1 commit
- Update gen_windows.ps1 · 9c0db4cc
  Jeremy authored Apr 21, 2024
```
Fixed improper env references
```
  9c0db4cc
18 Apr, 2024 2 commits
- Update gen_windows.ps1 · 6f18297b
  Jeremy authored Apr 18, 2024
```
Forgot a " on the write-host
```
  6f18297b
- Update gen_windows.ps1 · 15016413
  Jeremy authored Apr 18, 2024
```
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS to customize GPU builds on Windows
```
  15016413
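
As a rough sketch of how variables such as `OLLAMA_CUSTOM_CUDA_DEFS` could feed extra definitions into a build: the actual logic lives in the PowerShell script `gen_windows.ps1`, so the Go below is only an illustration. The `withCustomDefs` helper and the `-DLLAMA_CUDA=ON` base argument are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// withCustomDefs returns the base cmake arguments followed by any extra
// whitespace-separated definitions supplied by the user. The base slice is
// copied so the caller's arguments are not modified.
func withCustomDefs(base []string, custom string) []string {
	out := append([]string{}, base...)
	return append(out, strings.Fields(custom)...)
}

func main() {
	// The base argument is illustrative; the real script builds its own list.
	base := []string{"-DLLAMA_CUDA=ON"}
	fmt.Println(withCustomDefs(base, os.Getenv("OLLAMA_CUSTOM_CUDA_DEFS")))
}
```

When the variable is unset, `strings.Fields` returns an empty slice and the base arguments pass through unchanged.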
09 Apr, 2024 2 commits

Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564) · 1524f323
Blake Mizerany authored Apr 09, 2024

1524f323

build.go: introduce a friendlier way to build Ollama (#3548) · fccf3eec

Blake Mizerany authored Apr 09, 2024

This commit introduces a friendlier way to build Ollama dependencies
and the binary without abusing `go generate`, and removes the
unnecessary extra steps it brings with it.

This script also provides nicer feedback to the user about what is
happening during the build process.

At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).

fccf3eec

07 Apr, 2024 1 commit
- update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to... · 63efa075
  Jeffrey Morgan authored Apr 07, 2024
```
update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528)
```
  63efa075
04 Apr, 2024 2 commits
- Fail fast if mingw missing on windows · 36bd9677
  Daniel Hiltgen authored Apr 04, 2024
  
  36bd9677
- fix dll compress in windows building · 4de01267
  mofanke authored Apr 04, 2024
  
  4de01267
01 Apr, 2024 1 commit

Switch back to subprocessing for llama.cpp · 58d95cc9

Daniel Hiltgen authored Mar 14, 2024

This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process, shut it down when idle, and
gracefully restart it if it has problems. This also serves as a first step toward
running multiple copies to support multiple models concurrently.

58d95cc9

26 Mar, 2024 1 commit
- remove need for `$VSINSTALLDIR` since build will fail if `ninja` cannot be found (#3350) · 856b8ec1
  Jeffrey Morgan authored Mar 26, 2024
  
  856b8ec1
15 Mar, 2024 2 commits
- Add Radeon gfx940-942 GPU support · d4c10df2
  Daniel Hiltgen authored Mar 15, 2024
  
  d4c10df2
- Wire up more complete CI for releases · 540f4af4
  Daniel Hiltgen authored Mar 07, 2024
```
Flesh out our github actions CI so we can build official releases.
```
  540f4af4
12 Mar, 2024 1 commit
- Adapt our build for imported server.cpp · 85129d3a
  Daniel Hiltgen authored Mar 12, 2024
  
  85129d3a
07 Mar, 2024 1 commit

Revamp ROCm support · 6c5ccb11

Daniel Hiltgen authored Feb 15, 2024

This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.
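
The default described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `ollamaHome` helper is hypothetical, and only the `OLLAMA_HOME` name and the `~/.ollama` default come from the commit text.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ollamaHome returns the override when it is set, and otherwise falls back
// to a ".ollama" directory under the user's home directory.
func ollamaHome(override, userHome string) string {
	if override != "" {
		return override
	}
	return filepath.Join(userHome, ".ollama")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	fmt.Println(ollamaHome(os.Getenv("OLLAMA_HOME"), home))
}
```

Passing the environment value and home directory in as arguments keeps the fallback rule easy to test without touching the real environment.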

We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCm's tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on linux. The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.

6c5ccb11

21 Feb, 2024 1 commit
- reset with `init_vars` ahead of each cpu build in `gen_windows.ps1` (#2654) · efe040f8
  Jeffrey Morgan authored Feb 21, 2024
  
  efe040f8
16 Feb, 2024 1 commit
- Fix duplicate menus on update and exit on signals · df6dc4fd
  Daniel Hiltgen authored Feb 16, 2024
```
Also fixes a few fit-and-finish items for better developer experience
```
  df6dc4fd