- 27 Aug, 2024 1 commit
  - Daniel Hiltgen authored
- 23 Aug, 2024 2 commits
  - Daniel Hiltgen authored
    The recent CUDA variant changes uncovered a bug in ByLibrary, which failed to group GPU types by their common variant.
  - Daniel Hiltgen authored
    During rebasing, the ordering was inverted, breaking the CUDA version selection logic: the driver version was incorrectly evaluated as zero, causing a downgrade to v11.
- 19 Aug, 2024 6 commits
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
    Based on compute capability and driver version, pick the v12 or v11 CUDA variant.
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
    This adds new arm64 variants specific to Jetson platforms.
  - Daniel Hiltgen authored
    This adjusts Linux to follow a model similar to Windows, with a discrete archive (zip/tgz) carrying the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model, where the Go binary is fully self-contained.
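The v11/v12 selection described above could be sketched roughly as follows. The function name and version thresholds are illustrative assumptions, not Ollama's actual implementation; the zero-driver-version guard mirrors the ordering bug fixed in the 23 Aug commit.

```go
package main

import "fmt"

// pickCUDAVariant chooses between the bundled v11 and v12 CUDA runtimes.
// The thresholds are illustrative assumptions: a v12 runtime generally
// needs a sufficiently new driver and compute capability.
func pickCUDAVariant(computeMajor, driverMajor int) string {
	// A driver version of zero means detection failed; fall back to the
	// more broadly compatible v11 variant. An inverted evaluation order
	// can make this zero-check fire incorrectly and downgrade to v11.
	if driverMajor == 0 || driverMajor < 12 {
		return "v11"
	}
	if computeMajor < 6 {
		return "v11"
	}
	return "v12"
}

func main() {
	fmt.Println(pickCUDAVariant(8, 12)) // modern GPU and driver
	fmt.Println(pickCUDAVariant(8, 0))  // failed driver detection
}
```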
- 09 Aug, 2024 1 commit
  - Daniel Hiltgen authored
- 05 Aug, 2024 3 commits
  - Daniel Hiltgen authored
    If the system has multiple NUMA nodes, enable NUMA support in llama.cpp. If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.
  - Michael Yang authored
  - Michael Yang authored
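The NUMA handling described in the 05 Aug commit might look roughly like this sketch; the flag values and the node-count parameter are assumptions for illustration, not the exact flags Ollama passes.

```go
package main

import (
	"fmt"
	"os/exec"
)

// numaStrategy returns which NUMA strategy to hand to llama.cpp.
// Illustrative assumption: with multiple nodes, prefer numactl-based
// placement when the tool is on the PATH, else the built-in
// "distribute" mode.
func numaStrategy(numaNodeCount int) string {
	if numaNodeCount < 2 {
		return "" // single node: no NUMA handling needed
	}
	if _, err := exec.LookPath("numactl"); err == nil {
		return "numactl"
	}
	return "distribute"
}

func main() {
	fmt.Println(numaStrategy(1))
	fmt.Println(numaStrategy(2))
}
```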
- 02 Aug, 2024 1 commit
  - Michael Yang authored
- 24 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    For systems that enumerate more than 10 CPUs, the default lexicographical sort order interleaves CPUs and GPUs.
- 22 Jul, 2024 3 commits
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
- 20 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    The v5 HIP library returns unsupported GPUs, which won't enumerate at inference time in the runner, so this makes sure we align discovery. The gfx906 cards are no longer supported, so we shouldn't compile with that GPU type since it won't enumerate at runtime.
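Aligning discovery with what the runner can actually drive could be sketched as a simple allow-list filter. The supported set here is an illustrative assumption, except that gfx906 being dropped comes from the commit above.

```go
package main

import "fmt"

// supportedGfx is an assumed allow-list of gfx targets the bundled
// runtime can drive; gfx906 is deliberately absent, per the commit.
var supportedGfx = map[string]bool{
	"gfx1030": true,
	"gfx1100": true,
}

// filterSupported drops GPUs the runner would not enumerate at
// inference time, keeping discovery and runtime in agreement.
func filterSupported(gpus []string) []string {
	var out []string
	for _, g := range gpus {
		if supportedGfx[g] {
			out = append(out, g)
		}
	}
	return out
}

func main() {
	fmt.Println(filterSupported([]string{"gfx906", "gfx1030"}))
}
```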
- 11 Jul, 2024 1 commit
  - Jeffrey Morgan authored
    * llm: avoid loading model if system memory is too small
    * update log
    * Instrument swap free space: on Linux and Windows, expose how much swap space is available so we can take that into consideration when scheduling models
    * use `systemSwapFreeMemory` in check
    Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
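A minimal sketch of that pre-load check, assuming a simple policy of free RAM plus free swap; the function and parameter names are illustrative, not the repository's actual identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// canLoad refuses to load a model whose estimated size exceeds free
// system memory plus free swap. Sizes are in bytes; the exact policy
// here is an assumption for illustration.
func canLoad(modelSize, freeMemory, swapFreeMemory uint64) error {
	if modelSize > freeMemory+swapFreeMemory {
		return errors.New("model requires more system memory than is available")
	}
	return nil
}

func main() {
	// 8 GiB model against 4 GiB free RAM + 2 GiB free swap: rejected.
	fmt.Println(canLoad(8<<30, 4<<30, 2<<30))
	// Same model with 8 GiB free RAM: allowed.
	fmt.Println(canLoad(8<<30, 8<<30, 2<<30))
}
```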
- 10 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    This also adjusts our algorithm to favor our bundled ROCm. I've confirmed VRAM reporting still doesn't work properly, so we can't yet enable concurrency by default.
- 09 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    This adds logic to detect skew between the driver and the management library, which can be attributed to OS overhead, and records it so we can adjust subsequent management-library free VRAM updates and avoid OOM scenarios.
- 06 Jul, 2024 1 commit
  - Jeffrey Morgan authored
- 03 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    Refine the way we log GPU discovery to improve the non-debug output, and report more actionable log messages when possible to help users troubleshoot on their own.
- 21 Jun, 2024 1 commit
  - Daniel Hiltgen authored
    Until ROCm v6.2 ships, we won't be able to get accurate free memory reporting on Windows, which makes automatic concurrency too risky. Users can still opt in, but will need to pay attention to model sizes; otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs have accurate VRAM reporting wired up now, so we can turn on concurrency by default.
- 20 Jun, 2024 3 commits
- 19 Jun, 2024 4 commits
  - Daniel Hiltgen authored
    This reverts commit 755b4e4f.
  - Daniel Hiltgen authored
    Pointer dereferences weren't correct in a few libraries, which explains some crashes on older systems or with miswired symlinks for discovery libraries.
  - Wang,Zhe authored
- 18 Jun, 2024 1 commit
  - Daniel Hiltgen authored
    This seems to be the ROCm version, not actually the driver version, but it may be useful for toggling VRAM-reporting logic in the future.
- 17 Jun, 2024 3 commits
  - Daniel Hiltgen authored
    We update the PATH on Windows to get the CLI mapped, but this has an unintended side effect: other apps that may use our bundled DLLs can get terminated when we upgrade.
  - Lei Jitang authored
    Signed-off-by: Lei Jitang <leijitang@outlook.com>
  - Jeffrey Morgan authored
    * gpu: add env var for detecting intel oneapi gpus
    * fix build error
- 16 Jun, 2024 1 commit
  - Daniel Hiltgen authored
    Also removes an unused overall count variable.
- 15 Jun, 2024 1 commit
  - Lei Jitang authored
    Signed-off-by: Lei Jitang <leijitang@outlook.com>
- 14 Jun, 2024 2 commits
  - Daniel Hiltgen authored
    This should aid in troubleshooting by capturing and reporting the GPU settings at startup in the logs, along with all the other server settings.
  - Daniel Hiltgen authored
    Implement support for GPU env var workarounds, and leverage this for the Vega RX 56, which needs HSA_ENABLE_SDMA=0 set to work properly.
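Such a workaround table could be sketched like this; the map shape, lookup function, and the exact GPU-name key are assumptions, with the Vega RX 56 / HSA_ENABLE_SDMA=0 pairing taken from the commit above.

```go
package main

import "fmt"

// gpuEnvWorkarounds maps a GPU name to environment variables it needs
// to work properly. The key format is an assumed illustration; the
// Vega RX 56 entry reflects the workaround named in the commit.
var gpuEnvWorkarounds = map[string][]string{
	"Vega RX 56": {"HSA_ENABLE_SDMA=0"},
}

// workaroundsFor returns any env var workarounds for the given GPU,
// or nil if none are known.
func workaroundsFor(gpuName string) []string {
	return gpuEnvWorkarounds[gpuName]
}

func main() {
	fmt.Println(workaroundsFor("Vega RX 56"))
	fmt.Println(workaroundsFor("unknown GPU") == nil)
}
```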