- 02 Jul, 2024 4 commits
-
-
Daniel Hiltgen authored
Switch ARM64 container image base to Rocky Linux 8
-
Daniel Hiltgen authored
The CentOS 7 ARM mirrors have disappeared due to the EOL two days ago, and the vault sed workaround that works for x86 doesn't work for ARM.
-
Daniel Hiltgen authored
CentOS 7 EOL broke mirrors
-
Daniel Hiltgen authored
As of July 1st, 2024: "Could not resolve host: mirrorlist.centos.org". This is expected given the EOL dates.
-
- 01 Jul, 2024 12 commits
-
-
Josh authored
fix: trim spaces around the FROM argument, but don't trim inside quotes
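A minimal sketch of that quoting rule (hypothetical helper, not the actual parser code): trim around the whole argument first, then strip one matching pair of surrounding quotes so the inner spacing survives.

```go
package main

import (
	"fmt"
	"strings"
)

// trimFromArg is a hypothetical sketch of the fix: trim whitespace
// around the whole FROM argument first, then strip one matching pair
// of surrounding quotes, so spaces *inside* the quotes survive.
func trimFromArg(s string) string {
	s = strings.TrimSpace(s)
	if len(s) >= 2 && (s[0] == '"' || s[0] == '\'') && s[len(s)-1] == s[0] {
		return s[1 : len(s)-1] // inner spacing kept verbatim
	}
	return s
}

func main() {
	fmt.Printf("%q\n", trimFromArg(`  "./models/my model.gguf"  `)) // "./models/my model.gguf"
	fmt.Printf("%q\n", trimFromArg("  llama3  "))                   // "llama3"
}
```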
-
Josh authored
fix: add unsupported architecture message for linux/windows
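A rough sketch of such a guard (the message wording and placement are assumptions, not the actual change):

```go
package main

import (
	"fmt"
	"runtime"
)

// checkArch is a hypothetical sketch: fail fast with a clear message
// on architectures the linux/windows builds don't ship for.
func checkArch() error {
	if (runtime.GOOS == "linux" || runtime.GOOS == "windows") &&
		runtime.GOARCH != "amd64" && runtime.GOARCH != "arm64" {
		return fmt.Errorf("%s/%s is not supported", runtime.GOOS, runtime.GOARCH)
	}
	return nil
}

func main() {
	if err := checkArch(); err != nil {
		fmt.Println(err)
	}
}
```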
-
Josh Yan authored
-
Josh Yan authored
-
Daniel Hiltgen authored
Fix case for NumCtx
-
Daniel Hiltgen authored
Document concurrent behavior and settings
-
Daniel Hiltgen authored
This may confuse users into thinking "auto" is an acceptable string; it must be numeric.
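A hedged sketch of the numeric-only convention being documented (the zero-means-automatic behavior here is an assumption):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseNumeric is a hypothetical sketch: settings like
// OLLAMA_NUM_PARALLEL accept only integers, where 0 (or unset) means
// "choose automatically"; a string such as "auto" is rejected.
func parseNumeric(name string) (int, error) {
	v := os.Getenv(name)
	if v == "" {
		return 0, nil // unset: automatic
	}
	n, err := strconv.Atoi(v)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("%s must be a non-negative integer, got %q", name, v)
	}
	return n, nil
}

func main() {
	n, err := parseNumeric("OLLAMA_NUM_PARALLEL")
	fmt.Println(n, err)
}
```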
-
Daniel Hiltgen authored
-
Josh Yan authored
-
Daniel Hiltgen authored
Enable concurrency by default
-
RAPID ARCHITECT authored
* Update README.md: add Mesop example to web & desktop
* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
Eduard authored
Runs fine on an NVIDIA GeForce GTX 1050 Ti
-
- 29 Jun, 2024 2 commits
-
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
* Do not shift context for sliding window models
* Truncate prompts that exceed 2/3 of the context tokens
* Only target gemma2
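A sketch of the truncation idea, assuming "exceed 2/3 of the context tokens" means clamping over-long prompts to two-thirds of the context window while keeping the most recent tokens:

```go
package runner

// truncatePrompt is a hypothetical sketch: for sliding-window models
// (here, gemma2) the KV cache is not shifted; an over-long prompt is
// clamped to 2/3 of the context window, keeping the most recent tokens.
func truncatePrompt(tokens []int, numCtx int) []int {
	limit := numCtx * 2 / 3
	if len(tokens) <= limit {
		return tokens
	}
	return tokens[len(tokens)-limit:]
}
```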
-
- 28 Jun, 2024 4 commits
-
-
Daniel Hiltgen authored
-
royjhan authored
-
royjhan authored
* Check that projtype exists
* Maintain ordering
-
royjhan authored
-
- 27 Jun, 2024 7 commits
-
-
Michael Yang authored
gemma2 graph
-
Michael Yang authored
-
Josh Yan authored
-
Josh Yan authored
-
Michael authored
* Update README for Gemma 2
-
Michael Yang authored
-
Jeffrey Morgan authored
-
- 25 Jun, 2024 2 commits
-
-
Blake Mizerany authored
Previously, some costly things were causing the loading of GGUF files, their metadata, and tensor information to be very slow:

* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in an unacceptable amount of syscalls/disk I/O

The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro M3.

This commit also makes it possible to skip collecting large arrays of values when decoding GGUFs, if desired. When such keys are encountered, their values are null and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
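A minimal sketch of the two I/O fixes (illustrative code, not the actual decoder): one buffered reader amortizes syscalls, and one reusable scratch buffer avoids a fresh allocation per decoded string.

```go
package gguf

import (
	"bufio"
	"encoding/binary"
	"io"
	"os"
)

// readStrings decodes n length-prefixed strings. A single buffered
// reader means each key/value read no longer hits the disk, and a
// reusable scratch buffer avoids allocating per string.
func readStrings(f *os.File, n int) ([]string, error) {
	r := bufio.NewReaderSize(f, 1<<20)
	scratch := make([]byte, 4096)
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		var length uint64
		if err := binary.Read(r, binary.LittleEndian, &length); err != nil {
			return nil, err
		}
		if uint64(len(scratch)) < length {
			scratch = make([]byte, length)
		}
		buf := scratch[:length]
		if _, err := io.ReadFull(r, buf); err != nil {
			return nil, err
		}
		out = append(out, string(buf)) // the only per-string allocation
	}
	return out, nil
}
```
-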
Blake Mizerany authored
This commit changes the 'ollama run' command to defer fetching model information until it is actually needed, i.e. when in interactive mode. It also removes one case where the model information was fetched in duplicate: just before calling generateInteractive, and then again, first thing, inside generateInteractive.

This positively impacts the performance of the command:

; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total

; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total

; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total

; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total

; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total

; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?
./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total

; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total

; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
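A sketch of the deferral pattern (stand-in types and names; not the actual CLI code):

```go
package main

import (
	"fmt"
	"sync"
)

// modelInfo stands in for the data returned by the show API.
type modelInfo struct{ Family string }

func fetchModelInfo(name string) (*modelInfo, error) {
	fmt.Println("fetching model info for", name) // the slow call
	return &modelInfo{Family: "llama"}, nil
}

// lazyInfo defers the fetch until first use and caches the result, so
// a one-shot `run` never pays for it and interactive mode pays once.
func lazyInfo(name string) func() (*modelInfo, error) {
	var (
		once sync.Once
		mi   *modelInfo
		err  error
	)
	return func() (*modelInfo, error) {
		once.Do(func() { mi, err = fetchModelInfo(name) })
		return mi, err
	}
}

func main() {
	getInfo := lazyInfo("llama3")
	interactive := true
	if interactive {
		mi, _ := getInfo() // first and only fetch happens here
		fmt.Println("family:", mi.Family)
	}
}
```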
-
- 21 Jun, 2024 8 commits
-
-
Daniel Hiltgen authored
Fix use_mmap parsing for modelfiles
-
Daniel Hiltgen authored
Provide consistent ordering for the ps command - longest duration listed first
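For illustration, a sort like the following (hypothetical types) yields that deterministic order:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// procModel is a stand-in for an entry in the `ollama ps` listing.
type procModel struct {
	Name  string
	Until time.Duration // how much longer the model stays loaded
}

func main() {
	models := []procModel{
		{"llama3", 2 * time.Minute},
		{"gemma2", 4 * time.Minute},
	}
	// Longest duration first, so the listing order is stable.
	sort.Slice(models, func(i, j int) bool {
		return models[i].Until > models[j].Until
	})
	for _, m := range models {
		fmt.Printf("%-8s %v\n", m.Name, m.Until)
	}
}
```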
-
royjhan authored
-
Daniel Hiltgen authored
Until ROCm v6.2 ships, we won't be able to get accurate free memory reporting on Windows, which makes automatic concurrency too risky. Users can still opt in, but will need to pay attention to model sizes; otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs have accurate VRAM reporting wired up now, so we can turn on concurrency by default.
-
Daniel Hiltgen authored
This adjusts our default settings to enable multiple models and parallel requests to a single model. Users can still override these via the same env var settings as before. Parallel has a direct impact on num_ctx, which in turn can have a significant impact on small-VRAM GPUs, so this change also refines the algorithm: when parallel is not explicitly set by the user, we try to find a reasonable default that fits the model on their GPU(s). As before, multiple models will only load concurrently if they fully fit in VRAM.
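As a sketch of that selection (the cap of 4 and the fits predicate are assumptions, not the actual scheduler code):

```go
package sched

// pickParallel is a hypothetical sketch: walk down from a default cap
// and return the largest parallel value whose effective context
// (parallel * num_ctx) still fits the model in free VRAM; 1 is the floor.
func pickParallel(numCtx int, fits func(effectiveCtx int) bool) int {
	for p := 4; p > 1; p-- { // assumed default cap of 4
		if fits(p * numCtx) {
			return p
		}
	}
	return 1
}
```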
-
Michael Yang authored
fix: quantization with template
-
Michael Yang authored
-
Daniel Hiltgen authored
Add the new tristate parsing logic to the modelfile code path, along with a unit test.
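For illustration, a tristate is typically modeled as a *bool so that unset (nil) stays distinct from an explicit true/false; a hedged sketch, not the actual parser:

```go
package modelfile

import (
	"fmt"
	"strings"
)

// parseTristate is a hypothetical sketch: parameters like use_mmap
// have three states, where unset (nil) lets the runtime decide.
func parseTristate(v string) (*bool, error) {
	switch strings.ToLower(strings.TrimSpace(v)) {
	case "":
		return nil, nil // unset
	case "true", "1":
		t := true
		return &t, nil
	case "false", "0":
		f := false
		return &f, nil
	default:
		return nil, fmt.Errorf("invalid boolean value %q", v)
	}
}
```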
-
- 20 Jun, 2024 1 commit
-
-
Daniel Hiltgen authored
Refine mmap default logic on Linux
-