"tests/test_utils/test_misc.py" did not exist on "12e5913bb92188c236860341eddf085de86ddfff"
- 15 Oct, 2024 1 commit
-
-
Daniel Hiltgen authored
On Windows, detect large multi-socket systems and reduce the default thread count to the number of cores in one socket for best performance.
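As a rough illustration of the kind of cap described here, the Go sketch below uses hypothetical `physicalSocketCount` and `coresPerSocket` helpers standing in for a Windows CPU topology query; neither name comes from the actual change, and the values are placeholders.

```go
package main

import (
	"fmt"
	"runtime"
)

// physicalSocketCount and coresPerSocket are hypothetical stand-ins for a
// Windows-specific CPU topology query; the returned values are placeholders.
func physicalSocketCount() int { return 2 }
func coresPerSocket() int      { return 24 }

// defaultThreads caps the default worker thread count on large multi-socket
// systems to the cores available in a single socket.
func defaultThreads() int {
	n := runtime.NumCPU()
	if physicalSocketCount() > 1 && coresPerSocket() < n {
		return coresPerSocket()
	}
	return n
}

func main() {
	fmt.Println("default thread count:", defaultThreads())
}
```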
- 14 Oct, 2024 1 commit
Daniel Hiltgen authored
* Expose GPU discovery failure information
* Remove exposed API for now
- 21 Sep, 2024 1 commit
Daniel Hiltgen authored
GPUs handled the dependency path properly, but CPU runners didn't, which resulted in missing VC redist libraries on systems where the user didn't already have them installed from some other app.
- 12 Sep, 2024 1 commit
Daniel Hiltgen authored
* Optimize container images for startup
  This change adjusts how we handle runner payloads to support container builds where we keep them extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images.
* Refactor payload logic and add buildx support for faster builds
* Move payloads around
* Review comments
* Converge to buildx based helper scripts
* Use docker buildx action for release
- 11 Sep, 2024 1 commit
Daniel Hiltgen authored
This adds back a check, lost many releases back, that verifies /dev/kfd permissions; when they are lacking, it can lead to the confusing failure mode "rocBLAS error: Could not initialize Tensile host: No devices found". This implementation does not hard-fail the serve command but instead falls back to CPU with an error log. In the future we can include this in the GPU discovery UX to show detected but unsupported devices.
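A minimal sketch of such a permissions check, assuming a plain open of /dev/kfd is an adequate probe; this is illustrative, not the project's actual discovery code.

```go
package main

import (
	"log"
	"os"
)

// haveKFDAccess reports whether the process can open the AMD KFD device
// node that ROCm needs; a permission error here is what produces the
// confusing "No devices found" rocBLAS failure.
func haveKFDAccess() bool {
	f, err := os.OpenFile("/dev/kfd", os.O_RDWR, 0)
	if err != nil {
		return false
	}
	f.Close()
	return true
}

func main() {
	if !haveKFDAccess() {
		// Don't hard-fail serve: log the problem and continue on CPU.
		log.Println("amdgpu detected but /dev/kfd is not accessible; falling back to CPU")
		return
	}
	// ... continue with ROCm GPU discovery ...
}
```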
- 04 Sep, 2024 1 commit
Daniel Hiltgen authored
It looks like driver 525 (i.e., CUDA driver 12.0) has problems with the CUDA v12 library we compile against, so run the v11 variant on those older drivers when detected.
- 27 Aug, 2024 1 commit
Daniel Hiltgen authored
- 23 Aug, 2024 2 commits
Daniel Hiltgen authored
The recent CUDA variant changes uncovered a bug in ByLibrary, which failed to group GPU types by their common variant.
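To show what grouping by common variant means, here is a hedged sketch using a simplified, hypothetical `GpuInfo` type; the real ByLibrary operates on the project's richer discovery records.

```go
package main

import "fmt"

// GpuInfo is a simplified stand-in for a discovery record.
type GpuInfo struct {
	Library string // e.g. "cuda", "rocm", "cpu"
	Variant string // e.g. "v11", "v12"
	ID      string
}

// byLibrary groups devices so that all entries sharing a library and its
// variant (when present) land in the same bucket.
func byLibrary(gpus []GpuInfo) [][]GpuInfo {
	buckets := map[string][]GpuInfo{}
	var order []string
	for _, g := range gpus {
		key := g.Library
		if g.Variant != "" {
			key += "_" + g.Variant // group by library+variant, not library alone
		}
		if _, ok := buckets[key]; !ok {
			order = append(order, key)
		}
		buckets[key] = append(buckets[key], g)
	}
	out := make([][]GpuInfo, 0, len(order))
	for _, k := range order {
		out = append(out, buckets[k])
	}
	return out
}

func main() {
	gpus := []GpuInfo{{"cuda", "v12", "0"}, {"cuda", "v12", "1"}, {"rocm", "", "2"}}
	fmt.Println(byLibrary(gpus))
}
```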
Daniel Hiltgen authored
During rebasing, the ordering was inverted, breaking the CUDA version selection logic: the driver version was incorrectly evaluated as zero, causing a downgrade to v11.
- 19 Aug, 2024 6 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
Based on compute capability and driver version, pick the v12 or v11 CUDA variant.
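A sketch of this kind of variant selection follows. The driver 12.0 (525) special case reflects the note above; the compute-capability cutoff is an illustrative assumption, not the project's exact threshold.

```go
package main

import "fmt"

// cudaVariant picks which bundled CUDA runtime to use for a GPU.
// The compute-capability cutoff below is an assumption for illustration.
func cudaVariant(ccMajor, ccMinor, driverMajor, driverMinor int) string {
	if ccMajor < 5 {
		return "v11" // assume older GPUs are not supported by the v12 toolkit
	}
	// Driver 525 (CUDA driver 12.0) has known issues with the v12 library,
	// so require at least CUDA driver 12.1 before choosing v12.
	if driverMajor < 12 || (driverMajor == 12 && driverMinor == 0) {
		return "v11"
	}
	return "v12"
}

func main() {
	fmt.Println(cudaVariant(8, 6, 12, 4)) // v12
	fmt.Println(cudaVariant(8, 6, 12, 0)) // v11 (driver 525)
	fmt.Println(cudaVariant(3, 7, 12, 4)) // v11 (old GPU)
}
```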
Daniel Hiltgen authored
Daniel Hiltgen authored
This adds new arm64 variants specific to Jetson platforms.
Daniel Hiltgen authored
This adjusts Linux to follow a model similar to Windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model where the Go binary is fully self-contained.
- 09 Aug, 2024 1 commit
Daniel Hiltgen authored
- 05 Aug, 2024 3 commits
Daniel Hiltgen authored
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp. If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.
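A sketch of this detection, assuming NUMA nodes are counted via sysfs and the extra arguments are handed straight to the llama.cpp server (whose --numa flag accepts "numactl" and "distribute"); the real flag plumbing in the project differs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// numaArgs returns extra llama.cpp server arguments when the host exposes
// more than one NUMA node.
func numaArgs() []string {
	entries, err := os.ReadDir("/sys/devices/system/node")
	if err != nil {
		return nil
	}
	nodes := 0
	for _, e := range entries {
		if strings.HasPrefix(e.Name(), "node") {
			nodes++
		}
	}
	if nodes <= 1 {
		return nil // single NUMA node: nothing to do
	}
	if _, err := exec.LookPath("numactl"); err == nil {
		return []string{"--numa", "numactl"}
	}
	return []string{"--numa", "distribute"}
}

func main() {
	fmt.Println(numaArgs())
}
```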
Michael Yang authored
Michael Yang authored
- 02 Aug, 2024 1 commit
Michael Yang authored
- 24 Jul, 2024 1 commit
Daniel Hiltgen authored
For systems that enumerate more than 10 CPUs, the default lexicographical sort order interleaves CPUs and GPUs.
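The issue is easiest to see with a toy example; the IDs below are made up, but they show how lexicographic comparison of numeric ID strings misorders entries once the count passes 10.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

func main() {
	ids := []string{"0", "1", "2", "3", "10", "11", "12"}

	lex := append([]string(nil), ids...)
	sort.Strings(lex)
	fmt.Println("lexicographic:", lex) // [0 1 10 11 12 2 3]

	// Numeric-aware comparison keeps the entries in the expected order.
	num := append([]string(nil), ids...)
	sort.Slice(num, func(i, j int) bool {
		a, _ := strconv.Atoi(num[i])
		b, _ := strconv.Atoi(num[j])
		return a < b
	})
	fmt.Println("numeric:", num) // [0 1 2 3 10 11 12]
}
```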
- 22 Jul, 2024 3 commits
Michael Yang authored
Michael Yang authored
Michael Yang authored
- 20 Jul, 2024 1 commit
Daniel Hiltgen authored
The v5 HIP library returns unsupported GPUs which won't enumerate at inference time in the runner, so this makes sure discovery stays aligned. The gfx906 cards are no longer supported, so we shouldn't compile with that GPU type since it won't enumerate at runtime.
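A hedged sketch of aligning discovery with what the runners actually support, using an illustrative allow list; the real list of gfx targets lives in the build configuration and changes between releases.

```go
package main

import "fmt"

// supportedGfx is an illustrative allow list of AMD gfx targets the bundled
// runners are built for; gfx906 is shown as unsupported per the note above.
var supportedGfx = map[string]bool{
	"gfx1030": true,
	"gfx1100": true,
	"gfx90a":  true,
	// "gfx906" intentionally absent: it won't enumerate at runtime.
}

// filterSupported drops devices the runner cannot use so that discovery and
// inference agree on the set of usable GPUs.
func filterSupported(gfxTargets []string) []string {
	var out []string
	for _, t := range gfxTargets {
		if supportedGfx[t] {
			out = append(out, t)
		} else {
			fmt.Printf("skipping unsupported amdgpu %s\n", t)
		}
	}
	return out
}

func main() {
	fmt.Println(filterSupported([]string{"gfx1030", "gfx906"}))
}
```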
- 11 Jul, 2024 1 commit
Jeffrey Morgan authored
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
  On linux and windows, expose how much swap space is available so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
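A sketch of that pre-flight check, with `systemFreeMemory` as a hypothetical stand-in for a platform probe and `systemSwapFreeMemory` named after the helper mentioned in the commit message; the placeholder values are not real measurements.

```go
package main

import (
	"fmt"
	"log"
)

// Hypothetical stand-ins for platform-specific probes (e.g. /proc/meminfo on
// Linux, GlobalMemoryStatusEx on Windows); values are placeholders.
func systemFreeMemory() uint64     { return 8 << 30 } // 8 GiB
func systemSwapFreeMemory() uint64 { return 2 << 30 } // 2 GiB

// canLoad refuses to load a model whose estimated footprint exceeds free RAM
// plus free swap, rather than letting the OS kill the runner mid-load.
func canLoad(modelBytes uint64) bool {
	available := systemFreeMemory() + systemSwapFreeMemory()
	if modelBytes > available {
		log.Printf("model requires %d bytes but only %d available (incl. swap)",
			modelBytes, available)
		return false
	}
	return true
}

func main() {
	fmt.Println(canLoad(6 << 30))  // true with the placeholder values
	fmt.Println(canLoad(16 << 30)) // false
}
```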
- 10 Jul, 2024 1 commit
Daniel Hiltgen authored
This also adjusts our algorithm to favor our bundled ROCm. I've confirmed VRAM reporting still doesn't work properly, so we can't yet enable concurrency by default.
- 09 Jul, 2024 1 commit
Daniel Hiltgen authored
This adds logic to detect skew between the driver and the management library, which can be attributed to OS overhead, and records it so we can adjust subsequent management-library free VRAM updates and avoid OOM scenarios.
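A sketch of recording and applying that skew, with both VRAM probes as hypothetical stand-ins for the driver-side and management-library queries; the numbers are placeholders.

```go
package main

import "fmt"

// Hypothetical probes: one for the free VRAM the driver reports at load
// time, one for what the management library reports. Any persistent
// difference is treated as OS or other-process overhead.
func driverFreeVRAM() uint64     { return 10 << 30 } // placeholder
func managementFreeVRAM() uint64 { return 11 << 30 } // placeholder

// measureOverhead records how much the management library over-reports
// relative to the driver, so later readings can be corrected.
func measureOverhead() uint64 {
	m, d := managementFreeVRAM(), driverFreeVRAM()
	if m > d {
		return m - d
	}
	return 0
}

// adjustedFree applies the recorded overhead to a later management-library
// reading, so scheduling decisions don't overcommit VRAM and trigger OOMs.
func adjustedFree(reported, overhead uint64) uint64 {
	if reported < overhead {
		return 0
	}
	return reported - overhead
}

func main() {
	overhead := measureOverhead()
	fmt.Println("overhead:", overhead)
	fmt.Println("adjusted free:", adjustedFree(managementFreeVRAM(), overhead))
}
```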
- 06 Jul, 2024 1 commit
Jeffrey Morgan authored
- 03 Jul, 2024 1 commit
Daniel Hiltgen authored
Refine the way we log GPU discovery to improve the non-debug output, and report more actionable log messages when possible to help users troubleshoot on their own.
- 21 Jun, 2024 1 commit
Daniel Hiltgen authored
Until ROCm v6.2 ships, we won't be able to get accurate free memory reporting on Windows, which makes automatic concurrency too risky. Users can still opt in, but they will need to pay attention to model sizes; otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs now have accurate VRAM reporting wired up, so we can turn on concurrency by default.
- 20 Jun, 2024 3 commits
- 19 Jun, 2024 4 commits
Daniel Hiltgen authored
This reverts commit 755b4e4f.
Daniel Hiltgen authored
Pointer dereferences weren't correct for a few libraries, which explains some crashes on older systems or with miswired symlinks for the discovery libraries.
Wang,Zhe authored
- 18 Jun, 2024 1 commit
Daniel Hiltgen authored
This seems to be the ROCm version, not actually the driver version, but it may be useful for toggling VRAM reporting logic in the future.
- 17 Jun, 2024 1 commit
Daniel Hiltgen authored
We update the PATH on Windows to get the CLI mapped, but this has an unintended side effect: other apps that use our bundled DLLs may get terminated when we upgrade.