Commits · aae56abb7cc96b8495a1c761a08b92cfd136d9d2 · OpenDAS / ollama

28 Jun, 2024 3 commits
- Document concurrent behavior and settings · aae56abb
  Daniel Hiltgen authored Jun 28, 2024
  
  aae56abb
- Ollama Show: Check for Projector Type (#5307) · b910fa90
  royjhan authored Jun 28, 2024
```
* Check exists projtype

* Maintain Ordering
```
  b910fa90
- Update docs (#5312) · 6d421908
  royjhan authored Jun 28, 2024
  
  6d421908
27 Jun, 2024 5 commits
- Merge pull request #5340 from ollama/mxyng/mem · 1ed4f521
  Michael Yang authored Jun 27, 2024
```
gemma2 graph
```
  1ed4f521
- gemma2 graph · de2163da
  Michael Yang authored Jun 27, 2024
  
  de2163da
- update readme for gemma 2 (#5333) · 2cc7d050
  Michael authored Jun 27, 2024
```
* update readme for gemma 2
```
  2cc7d050
- zip: prevent extracting files into parent dirs (#5314) · 123a722a
  Michael Yang authored Jun 26, 2024
  
  123a722a
- llm: architecture patch (#5316) · 4d311eb7
  Jeffrey Morgan authored Jun 26, 2024
  
  4d311eb7
25 Jun, 2024 2 commits

- llm: speed up gguf decoding by a lot (#5246) · cb42e607
  Blake Mizerany authored Jun 24, 2024

Previously, some costly operations made loading GGUF files, along with
their metadata and tensor information, very slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in an
    excessive number of syscalls and disk I/O operations.

The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro
M3.

This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.

cb42e607

- cmd: defer stating model info until necessary (#5248) · 2aa91a93
  Blake Mizerany authored Jun 24, 2024

This commit changes the 'ollama run' command to defer fetching model
information until it is actually needed, that is, in interactive mode.

It also removes one case where the model information was fetched twice:
once just before calling generateInteractive and then again, first
thing, in generateInteractive.

This positively impacts the performance of the command:

    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
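
The deferral described in this message can be sketched with a lazily evaluated lookup. This is an illustration under stated assumptions, not the change made in the commit: `modelInfo`, `fetchModelInfo`, `fetchCount`, and `run` are hypothetical names, and `sync.OnceValues` requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// modelInfo and fetchModelInfo are hypothetical stand-ins for a show-style
// request to the server; they are not the names used in the commit.
type modelInfo struct{ Family string }

// fetchCount records how often the (expensive) lookup actually runs.
var fetchCount int

func fetchModelInfo(name string) (modelInfo, error) {
	fetchCount++
	return modelInfo{Family: "llama"}, nil
}

// run sketches the control flow: a one-shot prompt never touches model
// info, while interactive mode fetches it once, on first use.
func run(name, prompt string) error {
	info := sync.OnceValues(func() (modelInfo, error) {
		return fetchModelInfo(name)
	})

	if prompt != "" {
		fmt.Println("generate:", prompt) // non-interactive: no lookup
		return nil
	}

	// Interactive mode: both call sites share a single fetch.
	if _, err := info(); err != nil {
		return err
	}
	m, err := info()
	if err != nil {
		return err
	}
	fmt.Println("interactive session, family:", m.Family)
	return nil
}

func main() {
	if err := run("llama3", "hi"); err != nil {
		panic(err)
	}
	fmt.Println("fetches after one-shot run:", fetchCount)

	if err := run("llama3", ""); err != nil {
		panic(err)
	}
	fmt.Println("fetches after interactive run:", fetchCount)
}
```

Wrapping the lookup in a once-only function covers both points in the message: the one-shot path skips the request entirely, and a second call site in the interactive path cannot trigger a duplicate fetch.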

2aa91a93

21 Jun, 2024 5 commits
- Merge pull request #5205 from dhiltgen/modelfile_use_mmap · ccef9431
  Daniel Hiltgen authored Jun 21, 2024
```
Fix use_mmap parsing for modelfiles
```
  ccef9431
- Docs (#5149) · 9a9e7d83
  royjhan authored Jun 21, 2024
  
  9a9e7d83
- Merge pull request #5206 from ollama/mxyng/quantize · 189a43ca
  Michael Yang authored Jun 21, 2024
```
fix: quantization with template
```
  189a43ca
- fix: quantization with template · e835ef18
  Michael Yang authored Jun 21, 2024
  
  e835ef18
- Fix use_mmap parsing for modelfiles · 7e774922
  Daniel Hiltgen authored Jun 21, 2024
```
Add the new tristate parsing logic for the code path for modelfiles,
as well as a unit test.
```
  7e774922
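
The "tristate parsing" mentioned in the use_mmap entry above means distinguishing three states: explicitly on, explicitly off, and not set (leave the choice to the program). The following is a minimal sketch of that idea using a `*bool`, not the representation or parser from the commit. `parseTriState` and `describe` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseTriState is a hypothetical helper. It returns nil when the value was
// not supplied, so callers can tell "unset" apart from an explicit false.
func parseTriState(raw string, present bool) (*bool, error) {
	if !present {
		return nil, nil
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return nil, fmt.Errorf("invalid boolean %q: %w", raw, err)
	}
	return &v, nil
}

// describe shows how a consumer might branch on the three states.
func describe(v *bool) string {
	switch {
	case v == nil:
		return "auto"
	case *v:
		return "on"
	default:
		return "off"
	}
}

func main() {
	// A stand-in for parameters collected from a model file.
	params := map[string]string{"use_mmap": "false"}

	raw, ok := params["use_mmap"]
	v, err := parseTriState(raw, ok)
	if err != nil {
		panic(err)
	}
	fmt.Println("use_mmap:", describe(v))
}
```

A plain `bool` cannot carry the "unset" state, because its zero value is indistinguishable from an explicit `false`. That is the usual reason for reaching for a pointer or a dedicated three-valued type.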
20 Jun, 2024 9 commits
- Merge pull request #5194 from dhiltgen/linux_mmap_auto · c7c2f3bc
  Daniel Hiltgen authored Jun 20, 2024
```
Refine mmap default logic on linux
```
  c7c2f3bc
- Merge pull request #5125 from dhiltgen/fedora39 · 54a79d6a
  Daniel Hiltgen authored Jun 20, 2024
```
Bump latest fedora cuda repo to 39
```
  54a79d6a
- Refine mmap default logic on linux · 5bf5aeec
  Daniel Hiltgen authored Jun 20, 2024
```
If we try to use mmap when the model is larger than the system's free memory, loading is slower than the no-mmap approach.
```
  5bf5aeec
- Merge pull request #5192 from ollama/mxyng/kv · e01e535c
  Michael Yang authored Jun 20, 2024
```
handle asymmetric embedding KVs
```
  e01e535c
- Merge pull request #5188 from ollama/jyan/tmpdir2 · 0195d6a2
  Josh authored Jun 20, 2024
```
fix: skip os.removeAll() if PID does not exist
```
  0195d6a2
- handle asymmetric embedding KVs · 8e0641a9
  Michael Yang authored Jun 20, 2024
  
  8e0641a9
- err!=nil check · 662568d4
  Josh Yan authored Jun 20, 2024
  
  662568d4
- reformat error check · 4ebb66c6
  Josh Yan authored Jun 20, 2024
  
  4ebb66c6
- skip os.removeAll() if PID does not exist · 23e899f3
  Josh Yan authored Jun 20, 2024
  
  23e899f3
19 Jun, 2024 15 commits
- Extend api/show and ollama show to return more model info (#4881) · fedf7163
  royjhan authored Jun 19, 2024
```
* API Show Extended

* Initial Draft of Information
Co-Authored-By: Patrick Devine <pdevine@sonic.net>

* Clean Up

* Descriptive arg error messages and other fixes

* Second Draft of Show with Projectors Included

* Remove Chat Template

* Touches

* Prevent wrapping from files

* Verbose functionality

* Docs

* Address Feedback

* Lint

* Resolve Conflicts

* Function Name

* Tests for api/show model info

* Show Test File

* Add Projector Test

* Clean routes

* Projector Check

* Move Show Test

* Touches

* Doc update

---------
Co-authored-by: Patrick Devine <pdevine@sonic.net>
```
  fedf7163
- Merge pull request #5074 from dhiltgen/app_log_rotation · 97c59be6
  Daniel Hiltgen authored Jun 19, 2024
```
Implement log rotation for tray app
```
  97c59be6
- Implement log rotation for tray app · 9d8a4988
  Daniel Hiltgen authored Jun 15, 2024
  
  9d8a4988
- Merge pull request #5147 from ollama/mxyng/cleanup · 1ae0750a
  Michael Yang authored Jun 19, 2024
```
remove confusing log message
```
  1ae0750a
- remove confusing log message · 9d91e5e5
  Michael Yang authored Jun 19, 2024
  
  9d91e5e5
- Merge pull request #5072 from dhiltgen/windows_path · 96624aa4
  Daniel Hiltgen authored Jun 19, 2024
```
Move libraries out of users path
```
  96624aa4
- Merge pull request #5146 from dhiltgen/backout · 10f33b85
  Daniel Hiltgen authored Jun 19, 2024
```
Put back temporary intel GPU env var
```
  10f33b85
- Merge pull request #5145 from dhiltgen/bad_loads · 4a633cc2
  Daniel Hiltgen authored Jun 19, 2024
```
Fix bad symbol load detection
```
  4a633cc2
- Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"" · d34d88e4
  Daniel Hiltgen authored Jun 19, 2024
```
This reverts commit 755b4e4f.
```
  d34d88e4
- Fix bad symbol load detection · 52ce350b
  Daniel Hiltgen authored Jun 19, 2024
```
Pointer dereferences weren't correct on a few libraries, which explains
some crashes on older systems or with miswired symlinks for discovery libraries.
```
  52ce350b
- Merge pull request #5128 from zhewang1-intc/fix_levelzero_empty_symbol_detect · 2abebb2c
  Daniel Hiltgen authored Jun 19, 2024
```
Fix levelzero empty symbol detect
```
  2abebb2c
- types/model: remove Digest · 380e06e5
  Blake Mizerany authored Jun 18, 2024
```
The Digest type in its current form is awkward to work with and presents
challenges with regard to how it serializes via String using the '-'
prefix.

We currently only use this in ollama.com, so we'll move our specific
needs around digest parsing and validation there.
```
  380e06e5
- get real func ptr. · badf975e
  Wang,Zhe authored Jun 19, 2024
  
  badf975e
- Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)" · 755b4e4f
  Wang,Zhe authored Jun 19, 2024
```
This reverts commit 163cd3e7.
```
  755b4e4f
- Bump latest fedora cuda repo to 39 · 1a1c99e3
  Daniel Hiltgen authored Jun 18, 2024
  
  1a1c99e3
18 Jun, 2024 1 commit
- Merge pull request #5121 from ollama/mxyng/deepseekv2 · 21adf8b6
  Michael Yang authored Jun 18, 2024
```
deepseek v2 graph
```
  21adf8b6