Commits · 4879a234c4bd3f2bbc99d9b09c44bd99fc337679 · OpenDAS / ollama

10 Dec, 2024 1 commit

build: Make target improvements (#7499) · 4879a234

Daniel Hiltgen authored Dec 10, 2024

* llama: wire up builtin runner

This adds a new entrypoint into the ollama CLI to run the cgo built runner.
On Mac arm64, this will have GPU support, but on all other platforms it will
be the lowest common denominator CPU build.  After we fully transition
to the new Go runners more tech-debt can be removed and we can stop building
the "default" runner via make and rely on the builtin always.

* build: Make target improvements

Add a few new targets and help for building locally.
This also adjusts the runner lookup to favor local builds, then
runners relative to the executable, and finally payloads.

* Support customized CPU flags for runners

This implements a simplified custom CPU flags pattern for the runners.
When built without overrides, the runner name contains the vector flag
we check for (AVX) to ensure we don't try to run on unsupported systems
and crash.  If the user builds a customized set, we omit the naming
scheme and don't check for compatibility.  This avoids checking
requirements at runtime, so that logic has been removed as well.  This
can be used to build GPU runners with no vector flags, or CPU/GPU
runners with additional flags (e.g. AVX512) enabled.

* Use relative paths

If the user checks out the repo in a path that contains spaces, make gets
really confused, so use relative paths for everything in-repo to avoid breakage.

* Remove payloads from main binary

* install: clean up prior libraries

This removes support for v0.3.6 and older versions (before the tar bundle)
and ensures we clean up prior libraries before extracting the bundle(s).
Without this change, runners and dependent libraries could leak when we
update and lead to subtle runtime errors.
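
The runner lookup order described under "build: Make target improvements" (local builds, then runners relative to the executable, then payloads) can be read as a first-match search over an ordered list of directories. The sketch below is only an illustration of that ordering: the directory names and the functions `runnerCandidates` and `firstExistingDir` are hypothetical and are not taken from the ollama source.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runnerCandidates lists search locations in priority order.
// All directory names here are hypothetical placeholders.
func runnerCandidates(cwd, exeDir, payloadDir string) []string {
	return []string{
		filepath.Join(cwd, "build", "runners"),  // 1. local developer build
		filepath.Join(exeDir, "lib", "runners"), // 2. relative to the executable
		payloadDir,                              // 3. extracted payloads, last resort
	}
}

// firstExistingDir returns the first candidate for which exists reports true.
// The predicate is injected so the ordering can be tested without a filesystem.
func firstExistingDir(candidates []string, exists func(string) bool) (string, bool) {
	for _, c := range candidates {
		if exists(c) {
			return c, true
		}
	}
	return "", false
}

// dirExists reports whether path exists and is a directory.
func dirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Println("cannot determine working directory:", err)
		return
	}
	exe, err := os.Executable()
	if err != nil {
		fmt.Println("cannot determine executable path:", err)
		return
	}
	payloads := filepath.Join(os.TempDir(), "runner-payloads") // hypothetical location
	candidates := runnerCandidates(cwd, filepath.Dir(exe), payloads)
	if dir, ok := firstExistingDir(candidates, dirExists); ok {
		fmt.Println("using runners from", dir)
	} else {
		fmt.Println("no runner directory found")
	}
}
```

Keeping the priority list in one place makes the precedence easy to audit, and favoring the local build first means a developer's freshly built runner wins over an installed one.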

4879a234

12 Nov, 2024 1 commit
- Jetpack support for Go server (#7217) · df011054
  Daniel Hiltgen authored Nov 12, 2024
```
This adds support for the Jetson JetPack variants into the Go runner
```
  df011054
02 Nov, 2024 1 commit

nvidia libs have inconsistent ordering (#7473) · 29ab9fa7

Daniel Hiltgen authored Nov 02, 2024

The runtime and management libraries may not always have
identical ordering, so use the device UUID to correlate instead of ID.
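
As a rough illustration of correlating by UUID rather than by index, the sketch below joins two device lists that are ordered differently. The `device` type and `correlateByUUID` function are hypothetical; the real code talks to the NVIDIA libraries through cgo, which is not shown here.

```go
package main

import "fmt"

// device is a hypothetical, simplified view of one GPU as reported by a library.
type device struct {
	Index int    // position in that library's enumeration
	UUID  string // stable identifier shared by both libraries
	Free  uint64 // free memory in bytes, as reported by that library
}

// correlateByUUID maps each runtime-library index to the management-library
// entry that has the same UUID. It returns false if any runtime device has
// no counterpart in the management list.
func correlateByUUID(runtimeDevs, mgmtDevs []device) (map[int]device, bool) {
	byUUID := make(map[string]device, len(mgmtDevs))
	for _, d := range mgmtDevs {
		byUUID[d.UUID] = d
	}
	matched := make(map[int]device, len(runtimeDevs))
	for _, r := range runtimeDevs {
		m, found := byUUID[r.UUID]
		if !found {
			return nil, false
		}
		matched[r.Index] = m
	}
	return matched, true
}

func main() {
	runtimeDevs := []device{{Index: 0, UUID: "GPU-a"}, {Index: 1, UUID: "GPU-b"}}
	// The management library enumerates the same GPUs in the opposite order.
	mgmtDevs := []device{{Index: 0, UUID: "GPU-b", Free: 2048}, {Index: 1, UUID: "GPU-a", Free: 4096}}

	matched, ok := correlateByUUID(runtimeDevs, mgmtDevs)
	if !ok {
		fmt.Println("device lists do not match")
		return
	}
	for _, r := range runtimeDevs {
		fmt.Printf("runtime index %d -> management index %d\n", r.Index, matched[r.Index].Index)
	}
}
```

Matching on index alone would pair runtime device 0 with the wrong card in this example, so the free-memory figure would be attributed to the wrong GPU.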

29ab9fa7

17 Oct, 2024 1 commit
- Rename gpu package discover (#7143) · 05cd82ef
  Daniel Hiltgen authored Oct 16, 2024
```
Cleaning up go package naming
```
  05cd82ef
15 Oct, 2024 1 commit

Discovery CPU details for default thread selection (#6264) · 24636dfa

Daniel Hiltgen authored Oct 15, 2024

On windows, detect large multi-socket systems and reduce to the number of cores
in one socket for best performance
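
A minimal sketch of the thread-selection idea, assuming CPU discovery yields a per-socket core count. The `socket` type and `defaultThreads` function are hypothetical names, and the fallback to `runtime.NumCPU()` is an assumption rather than something the commit states.

```go
package main

import (
	"fmt"
	"runtime"
)

// socket is a hypothetical summary of one CPU package.
type socket struct {
	Cores int // physical cores in this package
}

// defaultThreads limits the thread count to the cores of a single socket.
// On a single-socket machine this is simply all of its cores. If discovery
// produced nothing usable, it falls back to the logical CPU count (assumption).
func defaultThreads(sockets []socket) int {
	if len(sockets) == 0 || sockets[0].Cores <= 0 {
		return runtime.NumCPU()
	}
	return sockets[0].Cores
}

func main() {
	dualSocket := []socket{{Cores: 32}, {Cores: 32}}
	fmt.Println("threads on a dual-socket system:", defaultThreads(dualSocket))
	fmt.Println("threads with no discovery data:", defaultThreads(nil))
}
```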

24636dfa

14 Oct, 2024 1 commit
- Track GPU discovery failure information (#5820) · f3c8b898
  Daniel Hiltgen authored Oct 14, 2024
```
* Expose GPU discovery failure information

* Remove exposed API for now
```
  f3c8b898
21 Sep, 2024 1 commit

Fix missing dep path on windows CPU runners (#6884) · 6c2eb73a

Daniel Hiltgen authored Sep 21, 2024

GPU runners handled the dependency path properly, but CPU runners didn't, which
resulted in missing VC redist libraries on systems where the user didn't
already have them installed from some other app.

6c2eb73a

12 Sep, 2024 1 commit

Optimize container images for startup (#6547) · cd5c8f64

Daniel Hiltgen authored Sep 12, 2024

* Optimize container images for startup

This change adjusts how to handle runner payloads to support
container builds where we keep them extracted in the filesystem.
This makes it easier to optimize the cpu/cuda vs cpu/rocm images for
size, and should result in faster startup times for container images.

* Refactor payload logic and add buildx support for faster builds

* Move payloads around

* Review comments

* Converge to buildx based helper scripts

* Use docker buildx action for release

cd5c8f64

27 Aug, 2024 1 commit
- Move ollama executable out of bin dir (#6535) · 93ea9240
  Daniel Hiltgen authored Aug 27, 2024
  
  93ea9240
23 Aug, 2024 1 commit

gpu: Ensure driver version set before variant (#6480) · 7a1e1c1c

Daniel Hiltgen authored Aug 23, 2024

During rebasing, the ordering was inverted, which broke the CUDA version
selection logic: the driver version was evaluated as zero, incorrectly
causing a downgrade to v11.
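
The ordering problem can be illustrated with a small sketch. The `gpuInfo` type, the `cudaVariant` and `populate` functions, and the thresholds (compute capability 6, driver major version 12) are assumptions chosen for illustration; the commit itself only says that the driver version must be set before the variant is chosen.

```go
package main

import "fmt"

// gpuInfo is a hypothetical, simplified GPU record.
type gpuInfo struct {
	ComputeMajor int
	DriverMajor  int
	DriverMinor  int
	Variant      string
}

// cudaVariant picks a CUDA variant from compute capability and driver version.
// The thresholds are illustrative assumptions, not the project's exact rules.
func cudaVariant(g gpuInfo) string {
	if g.ComputeMajor < 6 || g.DriverMajor < 12 {
		return "v11"
	}
	return "v12"
}

// populate fills in the record in the required order.
func populate(driverMajor, driverMinor, computeMajor int) gpuInfo {
	g := gpuInfo{ComputeMajor: computeMajor}
	// The driver version must be set before the variant is computed.
	// If these lines were swapped, DriverMajor would still be zero and
	// cudaVariant would always fall back to v11.
	g.DriverMajor = driverMajor
	g.DriverMinor = driverMinor
	g.Variant = cudaVariant(g)
	return g
}

func main() {
	fmt.Println("correct order:", populate(12, 4, 8).Variant)
	// Simulates the bug: variant evaluated while the driver version is unset.
	fmt.Println("inverted order:", cudaVariant(gpuInfo{ComputeMajor: 8}))
}
```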

7a1e1c1c

19 Aug, 2024 5 commits

Review comments · f9e31da9
Daniel Hiltgen authored Aug 15, 2024

f9e31da9
Adjust layout to bin+lib/ollama · 88bb9e33
Daniel Hiltgen authored Aug 14, 2024

88bb9e33
Add cuda v12 variant and selection logic · 4fe3a556
Daniel Hiltgen authored Jun 13, 2024
```
Based on compute capability and driver version, pick
v12 or v11 cuda variants.
```
4fe3a556
Add Jetson cuda variants for arm · d470ebe7
Daniel Hiltgen authored May 30, 2024
```
This adds new variants for arm64 specific to Jetson platforms
```
d470ebe7

Refactor linux packaging · 74d45f01

Daniel Hiltgen authored Jul 08, 2024

This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to carry the primary executable and dependent libraries. Runners are
still carried as payloads inside the main binary.

Darwin retains the payload model where the go binary is fully self contained.

74d45f01

09 Aug, 2024 1 commit
- Harden intel bootstrap for nil pointers · 5bca2e60
  Daniel Hiltgen authored Aug 09, 2024
  
  5bca2e60
02 Aug, 2024 1 commit
- lint · b732beba
  Michael Yang authored Aug 01, 2024
  
  b732beba
22 Jul, 2024 3 commits
- string · e2c3f6b3
  Michael Yang authored Jul 03, 2024
  
  e2c3f6b3
- bool · 55cd3ddc
  Michael Yang authored Jul 03, 2024
  
  55cd3ddc
- rfc: dynamic environ lookup · 35b89b2e
  Michael Yang authored Jul 03, 2024
  
  35b89b2e
11 Jul, 2024 1 commit

llm: avoid loading model if system memory is too small (#5637) · c4cf8ad5

Jeffrey Morgan authored Jul 11, 2024

* llm: avoid loading model if system memory is too small

* update log

* Instrument swap free space

On linux and windows, expose how much swap space is available
so we can take that into consideration when scheduling models

* use `systemSwapFreeMemory` in check

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
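
A minimal sketch of the check this commit describes, assuming the memory requirement and the free memory figures are already known in bytes. The name `systemSwapFreeMemory` comes from the commit message; `checkSystemMemory`, `systemFreeMemory`, and the error value are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errInsufficientMemory is a hypothetical sentinel error for this sketch.
var errInsufficientMemory = errors.New("model requires more system memory than is available")

// checkSystemMemory refuses a load when the requirement exceeds free RAM plus
// free swap. Counting swap as available follows the commit description.
func checkSystemMemory(required, systemFreeMemory, systemSwapFreeMemory uint64) error {
	available := systemFreeMemory + systemSwapFreeMemory
	if required > available {
		return fmt.Errorf("%w: need %d bytes, have %d", errInsufficientMemory, required, available)
	}
	return nil
}

func main() {
	const gib = 1 << 30
	// 10 GiB needed, 6 GiB RAM free, 5 GiB swap free: the load is allowed.
	fmt.Println(checkSystemMemory(10*gib, 6*gib, 5*gib))
	// 12 GiB needed with the same resources: the load is refused.
	fmt.Println(checkSystemMemory(12*gib, 6*gib, 5*gib))
}
```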

c4cf8ad5

09 Jul, 2024 1 commit

Detect CUDA OS Overhead · f6f759fc

Daniel Hiltgen authored Jul 09, 2024

This adds logic to detect skew between the driver and
management library which can be attributed to OS overhead
and records that so we can adjust subsequent management
library free VRAM updates and avoid OOM scenarios.
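
One way to read this is as a two-step calculation: record the skew once, then subtract it from later readings. The sketch below assumes the management library reports more free VRAM than the driver and treats the difference as overhead; that direction, and the names `osOverhead` and `adjustedFree`, are assumptions made for illustration.

```go
package main

import "fmt"

// osOverhead records the skew between the two free-memory readings.
// Assumption: the management library reports the larger value; if it
// does not, no overhead is recorded.
func osOverhead(driverFree, mgmtFree uint64) uint64 {
	if mgmtFree > driverFree {
		return mgmtFree - driverFree
	}
	return 0
}

// adjustedFree applies the recorded overhead to a later management-library
// reading, clamping at zero to avoid unsigned underflow.
func adjustedFree(mgmtFree, overhead uint64) uint64 {
	if overhead >= mgmtFree {
		return 0
	}
	return mgmtFree - overhead
}

func main() {
	// Values are in MiB for readability.
	overhead := osOverhead(7000, 7500)
	fmt.Println("recorded overhead:", overhead)
	fmt.Println("usable free VRAM on a later update:", adjustedFree(6000, overhead))
}
```

Subtracting the overhead makes later estimates more conservative, which is the point of the change: it is better to underestimate free VRAM than to schedule a model that then runs out of memory.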

f6f759fc

03 Jul, 2024 1 commit

Better nvidia GPU discovery logging · ef757da2

Daniel Hiltgen authored Jul 03, 2024

Refine the way we log GPU discovery to improve the non-debug
output, and report more actionable log messages when possible
to help users troubleshoot on their own.

ef757da2

19 Jun, 2024 2 commits
- Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"" · d34d88e4
  Daniel Hiltgen authored Jun 19, 2024
```
This reverts commit 755b4e4f.
```
  d34d88e4
- Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)" · 755b4e4f
  Wang,Zhe authored Jun 19, 2024
```
This reverts commit 163cd3e7.
```
  755b4e4f
17 Jun, 2024 2 commits

Move libraries out of users path · b2799f11

Daniel Hiltgen authored Jun 15, 2024

We update the PATH on windows to get the CLI mapped, but this has
an unintended side effect of causing other apps that may use our bundled
DLLs to get terminated when we upgrade.

b2799f11

gpu: add env var for detecting Intel oneapi gpus (#5076) · 163cd3e7
Jeffrey Morgan authored Jun 16, 2024
```
* gpu: add env var for detecting intel oneapi gpus

* fix build error
```
163cd3e7

14 Jun, 2024 7 commits
- review comments and coverage · 6f351bf5
  Daniel Hiltgen authored Jun 05, 2024
  
  6f351bf5
- Refine CPU load behavior with system memory visibility · fc37c192
  Daniel Hiltgen authored Jun 03, 2024
  
  fc37c192
- Reintroduce nvidia nvml library for windows · 434dfe30
  Daniel Hiltgen authored Jun 03, 2024
```
This library will give us the most reliable free VRAM reporting on windows
to enable concurrent model scheduling.
```
  434dfe30
- Refactor intel gpu discovery · 4e2b7e18
  Daniel Hiltgen authored May 29, 2024
  
  4e2b7e18
- Improve multi-gpu handling at the limit · 6fd04ca9
  Daniel Hiltgen authored May 18, 2024
```
Still not complete: the prediction needs refinement to account for the space
available on each discrete GPU, so we can see how many layers fit on each one.
Because a single layer can't be split across multiple GPUs, free space can't
be treated as one logical block.
```
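The constraint in this message, that a layer cannot span GPUs, means capacity has to be computed per device rather than from pooled free space. A minimal sketch, assuming every layer has the same size; `layersPerGPU` is a hypothetical function, and real layers vary in size.

```go
package main

import "fmt"

// layersPerGPU assigns whole layers to each GPU in order. A layer is never
// split, so each GPU holds floor(free / layerSize) layers at most.
// Assumption: all layers are the same size.
func layersPerGPU(free []uint64, layerSize uint64, totalLayers int) []int {
	counts := make([]int, len(free))
	if layerSize == 0 {
		return counts
	}
	remaining := totalLayers
	for i, f := range free {
		if remaining <= 0 {
			break
		}
		fit := int(f / layerSize)
		if fit > remaining {
			fit = remaining
		}
		counts[i] = fit
		remaining -= fit
	}
	return counts
}

func main() {
	// Two GPUs with 10 and 7 units free, layers of 4 units each.
	// Pooled free space (17) would suggest 4 layers, but only 3 fit.
	fmt.Println(layersPerGPU([]uint64{10, 7}, 4, 32)) // prints [2 1]
}
```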
  6fd04ca9
- Refine GPU discovery to bootstrap once · 43ed358f
  Daniel Hiltgen authored May 15, 2024
```
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
```
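Splitting one-time discovery from repeated free-memory updates can be sketched with `sync.Once`. This is a swapped-in standard-library technique used for illustration; the commit does not say how the split is implemented, and the `discovery` type, its `GPUs` method, and the injected callbacks are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// gpu is a hypothetical, simplified GPU record.
type gpu struct {
	ID   string
	Free uint64
}

// discovery caches the device list after the first enumeration.
type discovery struct {
	once sync.Once
	gpus []gpu
}

// GPUs enumerates devices only on the first call, then refreshes free memory
// on every call. The refresh loop is not synchronized, so callers that share
// a discovery across goroutines would need their own locking.
func (d *discovery) GPUs(enumerate func() []gpu, queryFree func(id string) uint64) []gpu {
	d.once.Do(func() { d.gpus = enumerate() })
	for i := range d.gpus {
		d.gpus[i].Free = queryFree(d.gpus[i].ID)
	}
	return d.gpus
}

func main() {
	enumerations := 0
	enumerate := func() []gpu {
		enumerations++
		return []gpu{{ID: "gpu0"}}
	}
	queryFree := func(id string) uint64 { return 1024 }

	d := &discovery{}
	d.GPUs(enumerate, queryFree)
	d.GPUs(enumerate, queryFree)
	fmt.Println("enumerations:", enumerations)
}
```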
  43ed358f
- Revert "Limit GPU lib search for now (#4777)" · efac4886
  Daniel Hiltgen authored Jun 03, 2024
```
This reverts commit 476fb8e8.
```
  efac4886
13 Jun, 2024 1 commit
- Actually skip PhysX on windows · aac36763
  Daniel Hiltgen authored Jun 13, 2024
  
  aac36763
04 Jun, 2024 1 commit
- lint linux · bf7edb0d
  Michael Yang authored May 22, 2024
  
  bf7edb0d
02 Jun, 2024 1 commit
- Limit GPU lib search for now (#4777) · 476fb8e8
  Jeffrey Morgan authored Jun 01, 2024
```
* fix oneapi errors on windows 10
```
  476fb8e8
24 May, 2024 2 commits
- Move envconfig and consolidate env vars (#4608) · 4cc3be30
  Patrick Devine authored May 24, 2024
  
  4cc3be30
- support ollama run on Intel GPUs · fd5971be
  Wang,Zhe authored May 24, 2024
  
  fd5971be
10 May, 2024 1 commit

Bump VRAM buffer back up · 30a7d709

Daniel Hiltgen authored May 10, 2024

Under stress scenarios we're seeing OOMs so this should help stabilize
the allocations under heavy concurrency stress.

30a7d709