Commits · 0c819e167becd7f08312d2a1a1e2ac8e8ea5d4da · OpenDAS / ollama

23 Aug, 2024 1 commit
- convert safetensor adapters into GGUF (#6327) · 0c819e16
  Patrick Devine authored Aug 23, 2024
  
  0c819e16
12 Aug, 2024 1 commit
- add conversion for microsoft phi 3 mini/medium 4k, 128 · 6ffb5cb0
  Michael Yang authored Jun 03, 2024
  
  6ffb5cb0
31 Jul, 2024 1 commit
- update convert test to check result data · 6b252918
  Michael Yang authored Jun 03, 2024
  
  6b252918
10 Jul, 2024 1 commit
- chatglm graph · 5a739ff4
  Michael Yang authored Jul 10, 2024
  
  5a739ff4
27 Jun, 2024 1 commit
- gemma2 graph · de2163da
  Michael Yang authored Jun 27, 2024
  
  de2163da
25 Jun, 2024 1 commit

llm: speed up gguf decoding by a lot (#5246) · cb42e607

Blake Mizerany authored Jun 24, 2024

Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in an
    excessive number of syscalls and disk I/O operations.

The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.

This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.

cb42e607
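
The two costs named in this commit message (per-string allocations and one disk read per key or value) can be illustrated with a minimal sketch. It is not the code from this commit. It assumes GGUF strings are stored as a little-endian uint64 length followed by that many bytes, and the names `stringReader`, `decodeAll`, and `maxStringLen` are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxStringLen is a hypothetical sanity limit so that a corrupt length
// prefix cannot trigger a huge allocation.
const maxStringLen = 1 << 20

// stringReader decodes length-prefixed strings from a buffered source.
type stringReader struct {
	r       *bufio.Reader // batches many small reads into few large ones
	lenBuf  [8]byte       // reused for every length prefix
	scratch []byte        // reused for every string body
}

func newStringReader(src io.Reader) *stringReader {
	return &stringReader{r: bufio.NewReaderSize(src, 64<<10)}
}

// next reads one string: a little-endian uint64 length (assumed layout)
// followed by that many bytes.
func (s *stringReader) next() (string, error) {
	if _, err := io.ReadFull(s.r, s.lenBuf[:]); err != nil {
		return "", err
	}
	n := binary.LittleEndian.Uint64(s.lenBuf[:])
	if n > maxStringLen {
		return "", fmt.Errorf("string length %d exceeds limit", n)
	}
	if uint64(cap(s.scratch)) < n {
		s.scratch = make([]byte, n) // grow only when a longer string appears
	}
	buf := s.scratch[:n]
	if _, err := io.ReadFull(s.r, buf); err != nil {
		return "", err
	}
	// Converting to string copies the bytes, so the scratch buffer can be reused.
	return string(buf), nil
}

// decodeAll decodes count strings from an in-memory byte slice.
func decodeAll(data []byte, count int) ([]string, error) {
	sr := newStringReader(bytes.NewReader(data))
	out := make([]string, 0, count)
	for i := 0; i < count; i++ {
		v, err := sr.next()
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}
```

With a file as the source, the `bufio.Reader` turns thousands of small key and value reads into a handful of larger reads, and the shared scratch buffer leaves one allocation per decoded string (the final copy).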

20 Jun, 2024 1 commit
- handle asymmetric embedding KVs · 8e0641a9
  Michael Yang authored Jun 20, 2024
  
  8e0641a9
18 Jun, 2024 1 commit
- deepseek v2 graph · e873841c
  Michael Yang authored Jun 18, 2024
  
  e873841c
14 Jun, 2024 1 commit

Improve multi-gpu handling at the limit · 6fd04ca9

Daniel Hiltgen authored May 18, 2024

Still not complete: our prediction needs refinement to account for the space
available on each discrete GPU, so we can see how many layers fit on each one.
Because a single layer can't be split across multiple GPUs, free space can't
be treated as one logical block.

6fd04ca9
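
The constraint described in this message (a layer cannot be split across GPUs) can be sketched as follows. This is an illustrative model only, not the scheduler from this commit. The functions `assignLayers` and `pooledLayerCount` are hypothetical, and sizes are plain byte counts.

```go
package main

// assignLayers places layers, in order, onto GPUs with the given free memory.
// A layer is never split: it goes wholly onto the current GPU or, if it does
// not fit there, onto the next one. It returns the layer count per GPU.
func assignLayers(freeBytes, layerBytes []uint64) []int {
	remaining := append([]uint64(nil), freeBytes...) // do not modify the caller's slice
	counts := make([]int, len(remaining))
	gpu := 0
	for _, size := range layerBytes {
		// Advance to the next GPU that has room for the whole layer.
		for gpu < len(remaining) && remaining[gpu] < size {
			gpu++
		}
		if gpu == len(remaining) {
			break // no GPU has room; the remaining layers stay on the CPU
		}
		remaining[gpu] -= size
		counts[gpu]++
	}
	return counts
}

// pooledLayerCount shows the estimate this message warns against: it treats
// the free space of all GPUs as one logical block.
func pooledLayerCount(freeBytes, layerBytes []uint64) int {
	var total uint64
	for _, f := range freeBytes {
		total += f
	}
	n := 0
	for _, size := range layerBytes {
		if total < size {
			break
		}
		total -= size
		n++
	}
	return n
}
```

For two GPUs with 10 units free each and layers of 6 units, the pooled estimate fits three layers while the per-GPU assignment fits one layer on each device, which is the overestimate the message refers to.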

11 Jun, 2024 1 commit

Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" · 7bdcd1da

Michael Yang authored Jun 11, 2024

This reverts commit f5f245cc, reversing
changes made to 94d37fdc.

This change broke GGUF v2, which was incorrectly detected as big endian.

7bdcd1da

08 Jun, 2024 1 commit
- fix parsing big endian gguf · 620d5c56
  Michael Yang authored Jun 08, 2024
  
  620d5c56
06 Jun, 2024 1 commit
- detect chat template from KV · 9b6c2e6e
  Michael Yang authored Jun 03, 2024
  
  9b6c2e6e
24 May, 2024 2 commits
- Update llm/ggml.go · d51f1525
  Michael Yang authored May 24, 2024
```
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
```
  d51f1525
- fix q5_0, q5_1 · 8f440d57
  Michael Yang authored May 24, 2024
  
  8f440d57
23 May, 2024 1 commit
- Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL (#4322) · d6f692ad
  Bruce MacDonald authored May 23, 2024
```
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
```
  d6f692ad
21 May, 2024 1 commit
- simplify safetensors reading · 171eb040
  Michael Yang authored May 20, 2024
  
  171eb040
10 May, 2024 1 commit
- add phi2 mem · 1eb382da
  Michael Yang authored May 10, 2024
  
  1eb382da
08 May, 2024 1 commit
- skip if same quantization · eeb69526
  Michael Yang authored May 07, 2024
  
  eeb69526
06 May, 2024 2 commits
- comments · 01811c17
  Michael Yang authored Apr 23, 2024
  
  01811c17
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
  9685c345
23 Apr, 2024 1 commit
- fix: mixtral graph · 435cc866
  Michael Yang authored Apr 22, 2024
  
  435cc866
17 Apr, 2024 2 commits
- add stablelm graph calculation · 3cf483fe
  Michael Yang authored Apr 17, 2024
  
  3cf483fe
- account for all non-repeating layers · a8b9b930
  Michael Yang authored Apr 17, 2024
  
  a8b9b930
11 Apr, 2024 1 commit
- mixtral mem · 3397eff0
  Michael Yang authored Apr 11, 2024
  
  3397eff0
10 Apr, 2024 2 commits
- partial offloading · 7e33a017
  Michael Yang authored Apr 05, 2024
  
  7e33a017
- refactor tensor query · 8b2c1006
  Michael Yang authored Apr 03, 2024
  
  8b2c1006
04 Apr, 2024 1 commit
- add command-r graph estimate · 01f77ae2
  Michael Yang authored Apr 04, 2024
  
  01f77ae2
03 Apr, 2024 1 commit
- update graph size estimate · 12e923e1
  Michael Yang authored Apr 02, 2024
  
  12e923e1
02 Apr, 2024 1 commit
- default head_kv to 1 · 90f071c6
  Michael Yang authored Apr 02, 2024
  
  90f071c6
01 Apr, 2024 2 commits
- update memory calculations · 91b3e4d2
  Michael Yang authored Mar 18, 2024
```
count each layer independently when deciding gpu offloading
```
  91b3e4d2
- refactor model parsing · d338d704
  Michael Yang authored Mar 13, 2024
  
  d338d704
29 Mar, 2024 1 commit
- Add gemma safetensors conversion (#3250) · 5a5efee4
  Patrick Devine authored Mar 28, 2024
```
Co-authored-by: Michael Yang <mxyng@pm.me>
```
  5a5efee4
12 Mar, 2024 1 commit
- refactor readseeker · 00852979
  Michael Yang authored Mar 09, 2024
  
  00852979
08 Mar, 2024 1 commit
- decode ggla · 76bdebba
  Michael Yang authored Mar 08, 2024
  
  76bdebba
07 Mar, 2024 1 commit
- Convert Safetensors to an Ollama model (#2824) · 2c017ca4
  Patrick Devine authored Mar 06, 2024
  
  2c017ca4
21 Feb, 2024 1 commit
- add gguf file types (#2532) · 949d7b1c
  Michael Yang authored Feb 20, 2024
  
  949d7b1c
12 Jan, 2024 1 commit
- add max context length check · eaed6f8c
  Michael Yang authored Jan 12, 2024
  
  eaed6f8c
09 Jan, 2024 1 commit
- fix lint · 2bb2bdd5
  Michael Yang authored Dec 15, 2023
  
  2bb2bdd5
08 Jan, 2024 1 commit

Offload layers to GPU based on new model size estimates (#1850) · 08f1e189

Jeffrey Morgan authored Jan 08, 2024



* select layers based on estimated model memory usage

* always account for scratch vram

* don't load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

08f1e189
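
The bullet points in this message (estimate memory use, always reserve scratch VRAM, add CUDA overhead, do not load an extra layer) suggest a calculation along the following lines. This is a rough sketch under those assumptions, not the logic in `llm/llm.go`. The function `layersToOffload` and its parameters are hypothetical, and it assumes all layers are the same size.

```go
package main

// layersToOffload estimates how many whole layers fit in GPU memory after
// reserving scratch space and a fixed runtime overhead. All sizes are bytes.
func layersToOffload(freeVRAM, scratch, overhead, layerSize uint64, totalLayers int) int {
	reserved := scratch + overhead // always set aside before counting layers
	if layerSize == 0 || freeVRAM <= reserved {
		return 0
	}
	// Integer division rounds down, so a partially fitting layer is not counted.
	n := int((freeVRAM - reserved) / layerSize)
	if n > totalLayers {
		n = totalLayers // never offload more layers than the model has
	}
	return n
}
```

For example, with 1000 bytes free, 100 reserved for scratch, 50 for overhead, and 200 per layer, the estimate is four layers, since a fifth would exceed the 850 bytes left.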

19 Dec, 2023 1 commit

deprecate ggml · 811b1f03

Bruce MacDonald authored Nov 24, 2023



- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>

811b1f03