- 19 Jun, 2024 4 commits
-
-
Daniel Hiltgen authored
Fix levelzero empty symbol detect
-
Blake Mizerany authored
The Digest type in its current form is awkward to work with, and the way it serializes via String with the '-' prefix complicates matters further. We currently only use this in ollama.com, so we'll move our specific digest parsing and validation needs there.
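For illustration, a minimal Go sketch of the '-'-prefixed digest form described here; the type, names, and validation below are assumptions, not the actual code that moved to ollama.com:

```go
package model

import (
	"fmt"
	"strings"
)

// Digest is a hypothetical stand-in for the type described above.
type Digest struct {
	Type string // hash algorithm, e.g. "sha256"
	Sum  string // hex-encoded hash
}

// String serializes using the '-' prefix form, e.g. "sha256-abc123...".
func (d Digest) String() string { return d.Type + "-" + d.Sum }

// ParseDigest splits a "type-sum" string back into its parts.
func ParseDigest(s string) (Digest, error) {
	typ, sum, ok := strings.Cut(s, "-")
	if !ok || typ == "" || sum == "" {
		return Digest{}, fmt.Errorf("invalid digest %q", s)
	}
	return Digest{Type: typ, Sum: sum}, nil
}
```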
-
Wang,Zhe authored
-
- 18 Jun, 2024 7 commits
-
-
Michael Yang authored
deepseek v2 graph
-
Michael Yang authored
-
Daniel Hiltgen authored
Handle models with divergent layer sizes
-
Daniel Hiltgen authored
The recent memory prediction refactoring assumed all layers are the same size, but for some models (like deepseek-coder-v2) this is not the case, so our predictions were significantly off.
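A minimal sketch of the fix being described, assuming per-layer sizes are available (the names are illustrative, not the actual ollama/llm code):

```go
// estimateOffload sums the real size of each offloaded layer instead
// of multiplying one layer's size by the layer count, which breaks
// when layer sizes diverge (e.g. deepseek-coder-v2).
func estimateOffload(layerSizes []uint64, layers int) uint64 {
	var total uint64
	for i := 0; i < layers && i < len(layerSizes); i++ {
		total += layerSizes[i]
	}
	return total
}
```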
-
Daniel Hiltgen authored
Tighten up memory prediction logging
-
Daniel Hiltgen authored
Prior to this change, we logged the memory prediction multiple times as the scheduler iterated to find a suitable configuration, which was confusing since only the last log before the server starts is actually valid. We now log once, just before starting the server, with the final configuration. The log also reports which library is in use instead of always saying "offloading to gpu" even when running on the CPU.
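Sketched below is what logging once on the final configuration might look like, using Go's log/slog; the message and field names are assumptions, not the actual server output:

```go
package main

import "log/slog"

// logFinalPrediction runs once, just before the server starts, after
// the scheduler has settled on a configuration.
func logFinalPrediction(library string, layers int, vramBytes uint64) {
	// Report the real library ("cuda", "rocm", "cpu", ...) rather
	// than unconditionally saying "offloading to gpu".
	slog.Info("memory prediction", "library", library, "layers", layers, "vram", vramBytes)
}
```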
-
Daniel Hiltgen authored
Adjust mmap logic for cuda windows for faster model load
-
- 17 Jun, 2024 9 commits
-
-
Daniel Hiltgen authored
On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can distinguish a user-provided value of true or false from an unspecified value.
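One common way to model such a tri-state in Go is a *bool, where nil means unspecified; this is a sketch of the idea, not necessarily how ollama implements it:

```go
// Options holds a tri-state mmap setting: nil = unspecified,
// otherwise an explicit user-provided true/false.
type Options struct {
	UseMMap *bool
}

// useMMap resolves the tri-state into a concrete setting.
func useMMap(o Options, isWindows bool) bool {
	if o.UseMMap != nil {
		return *o.UseMMap // explicit user choice always wins
	}
	return !isWindows // default off on Windows, on elsewhere
}
```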
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Revert powershell jobs, but keep nvcc and cmake parallelism
-
Daniel Hiltgen authored
Implement custom github release action
-
Daniel Hiltgen authored
nvcc supports parallelism (threads), and cmake + make can use -j, while msbuild requires /p:CL_MPcount=8.
-
Daniel Hiltgen authored
This reverts commit 0577af98.
-
Jeffrey Morgan authored
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
-
Lei Jitang authored
Signed-off-by: Lei Jitang <leijitang@outlook.com>
-
Jeffrey Morgan authored
* gpu: add env var for detecting intel oneapi gpus
* fix build error
-
- 16 Jun, 2024 4 commits
-
-
Daniel Hiltgen authored
Add some more debugging logs for intel discovery
-
Daniel Hiltgen authored
Also removes an unused overall count variable
-
royjhan authored
* Add Mod Time to Show
* Error Handling
-
Jeffrey Morgan authored
* docs: add missing instruction for powershell build

  The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list.

* Update development.md
-
- 15 Jun, 2024 9 commits
-
-
Daniel Hiltgen authored
gpu: Fix build warning
-
Daniel Hiltgen authored
This implements the release logic we want via the gh CLI: it updates releases with rc tags in place while retaining release notes and other community reactions.
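Roughly, the update-in-place flow via the gh CLI could look like the Go sketch below (assumes gh is installed and authenticated; the helper names are hypothetical):

```go
package main

import "os/exec"

// ensureRelease leaves an existing release (e.g. an rc tag) in place
// so its notes and community reactions are retained.
func ensureRelease(tag string) error {
	if exec.Command("gh", "release", "view", tag).Run() == nil {
		return nil // already exists; update assets in place instead
	}
	return exec.Command("gh", "release", "create", tag, "--draft", "--title", tag).Run()
}

// uploadAssets replaces existing assets without recreating the release.
func uploadAssets(tag string, files ...string) error {
	args := append([]string{"release", "upload", tag, "--clobber"}, files...)
	return exec.Command("gh", args...).Run()
}
```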
-
Daniel Hiltgen authored
More parallelism on windows generate
-
Daniel Hiltgen authored
Make the build faster
-
Daniel Hiltgen authored
Rocm gfx900 workaround
-
Daniel Hiltgen authored
Rocm v6 bump
-
Daniel Hiltgen authored
Centralize GPU configuration vars
-
Lei Jitang authored
Signed-off-by: Lei Jitang <leijitang@outlook.com>
-
Daniel Hiltgen authored
fix: "Skip searching for network devices"
-
- 14 Jun, 2024 7 commits
-
-
Daniel Hiltgen authored
This should aid in troubleshooting by capturing the GPU settings at startup and reporting them in the logs along with all the other server settings.
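A sketch of what capturing GPU settings at startup might look like; the variable list is an illustrative subset, not the full set ollama reports:

```go
package main

import (
	"log/slog"
	"os"
)

// logGPUSettings reports GPU-related environment variables alongside
// the other server settings at startup.
func logGPUSettings() {
	for _, k := range []string{
		"CUDA_VISIBLE_DEVICES",
		"HIP_VISIBLE_DEVICES",
		"HSA_OVERRIDE_GFX_VERSION",
	} {
		slog.Info("gpu setting", "key", k, "value", os.Getenv(k))
	}
}
```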
-
Daniel Hiltgen authored
Implement support for GPU env var workarounds, and leverage this for the Vega RX 56, which needs HSA_ENABLE_SDMA=0 set to work properly.
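A minimal sketch of a per-GPU workaround table; only the HSA_ENABLE_SDMA=0 entry for the Vega RX 56 comes from this commit, while the names and matching logic are assumed:

```go
package main

import (
	"os"
	"strings"
)

// gpuWorkarounds maps a GPU name to env vars it needs to work properly.
var gpuWorkarounds = map[string][]string{
	"Vega RX 56": {"HSA_ENABLE_SDMA=0"},
}

// applyWorkarounds sets any env var workarounds for the detected GPU.
func applyWorkarounds(gpuName string) {
	for _, kv := range gpuWorkarounds[gpuName] {
		if k, v, ok := strings.Cut(kv, "="); ok {
			os.Setenv(k, v)
		}
	}
}
```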
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Enhanced GPU discovery and multi-gpu support with concurrency
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
While models are loading, the VRAM metrics are dynamic, so try to load on a GPU that doesn't have a model actively loading, or wait, to avoid races that lead to OOMs.
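The placement rule reads roughly like this sketch (types and fields are assumptions): skip GPUs whose free-VRAM reading is stale because a load is in flight, and let the caller wait and retry if nothing fits:

```go
type gpuStatus struct {
	id      string
	free    uint64 // free VRAM; only trustworthy when loading == 0
	loading int    // models currently loading on this GPU
}

// pickGPU returns a GPU with settled VRAM metrics and enough room,
// or false so the caller waits and retries instead of risking an OOM.
func pickGPU(gpus []gpuStatus, need uint64) (string, bool) {
	for _, g := range gpus {
		if g.loading == 0 && g.free >= need {
			return g.id, true
		}
	}
	return "", false
}
```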
-