Commits · 45cacbaf0568a4d38d74ebdd0957fe01bd06719d · OpenDAS / ollama

14 Jun, 2024 1 commit

Improve multi-gpu handling at the limit · 6fd04ca9

Daniel Hiltgen authored May 18, 2024

Still not complete; needs some refinement to our prediction to understand the
discrete GPUs' available space so we can see how many layers fit in each one.
Since we can't split one layer across multiple GPUs, we can't treat free space
as one logical block.

6fd04ca9
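
The point in this commit message, that free space on several GPUs cannot be pooled because a layer must sit entirely on one device, can be illustrated with a small sketch. This is not the code from the commit: `layersPerGPU`, `pooledLayers`, and the uniform `layerSize` are hypothetical names and simplifications chosen for illustration.

```go
package main

import "fmt"

// layersPerGPU returns how many whole layers fit on each GPU.
// free holds each GPU's free memory in bytes; layerSize is an assumed
// uniform per-layer size in bytes (real layers may differ in size).
func layersPerGPU(free []uint64, layerSize uint64) []int {
	counts := make([]int, len(free))
	if layerSize == 0 {
		return counts
	}
	for i, f := range free {
		// Integer division drops the remainder: leftover space on one GPU
		// cannot hold part of a layer.
		counts[i] = int(f / layerSize)
	}
	return counts
}

// pooledLayers is the naive estimate that treats all free memory
// as one logical block.
func pooledLayers(free []uint64, layerSize uint64) int {
	if layerSize == 0 {
		return 0
	}
	var total uint64
	for _, f := range free {
		total += f
	}
	return int(total / layerSize)
}

func main() {
	free := []uint64{5 << 30, 5 << 30} // two GPUs with 5 GiB free each
	layer := uint64(2 << 30)           // 2 GiB per layer

	perGPU := layersPerGPU(free, layer)
	fitted := 0
	for _, n := range perGPU {
		fitted += n
	}
	// Per-GPU accounting fits 4 layers; the pooled estimate claims 5.
	fmt.Println(perGPU, fitted, pooledLayers(free, layer))
}
```

With these numbers the pooled estimate overcounts by one layer, which is the kind of error that shows up when a model is close to the memory limit.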

11 Jun, 2024 1 commit

Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" · 7bdcd1da

Michael Yang authored Jun 11, 2024

This reverts commit f5f245cc, reversing
changes made to 94d37fdc.

this change broke gguf v2 which is incorrectly detected as big endian

7bdcd1da

08 Jun, 2024 1 commit
- fix parsing big endian gguf · 620d5c56
  Michael Yang authored Jun 08, 2024
  
  620d5c56
06 Jun, 2024 1 commit
- detect chat template from KV · 9b6c2e6e
  Michael Yang authored Jun 03, 2024
  
  9b6c2e6e
24 May, 2024 2 commits
- Update llm/ggml.go · d51f1525
  Michael Yang authored May 24, 2024
```
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
```
  d51f1525
- fix q5_0, q5_1 · 8f440d57
  Michael Yang authored May 24, 2024
  
  8f440d57
23 May, 2024 1 commit
- Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL (#4322) · d6f692ad
  Bruce MacDonald authored May 23, 2024
```
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
```
  d6f692ad
21 May, 2024 1 commit
- simplify safetensors reading · 171eb040
  Michael Yang authored May 20, 2024
  
  171eb040
10 May, 2024 1 commit
- add phi2 mem · 1eb382da
  Michael Yang authored May 10, 2024
  
  1eb382da
08 May, 2024 1 commit
- skip if same quantization · eeb69526
  Michael Yang authored May 07, 2024
  
  eeb69526
06 May, 2024 2 commits
- comments · 01811c17
  Michael Yang authored Apr 23, 2024
  
  01811c17
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
  9685c345
23 Apr, 2024 1 commit
- fix: mixtral graph · 435cc866
  Michael Yang authored Apr 22, 2024
  
  435cc866
17 Apr, 2024 2 commits
- add stablelm graph calculation · 3cf483fe
  Michael Yang authored Apr 17, 2024
  
  3cf483fe
- account for all non-repeating layers · a8b9b930
  Michael Yang authored Apr 17, 2024
  
  a8b9b930
11 Apr, 2024 1 commit
- mixtral mem · 3397eff0
  Michael Yang authored Apr 11, 2024
  
  3397eff0
10 Apr, 2024 2 commits
- partial offloading · 7e33a017
  Michael Yang authored Apr 05, 2024
  
  7e33a017
- refactor tensor query · 8b2c1006
  Michael Yang authored Apr 03, 2024
  
  8b2c1006
04 Apr, 2024 1 commit
- add command-r graph estimate · 01f77ae2
  Michael Yang authored Apr 04, 2024
  
  01f77ae2
03 Apr, 2024 1 commit
- update graph size estimate · 12e923e1
  Michael Yang authored Apr 02, 2024
  
  12e923e1
02 Apr, 2024 1 commit
- default head_kv to 1 · 90f071c6
  Michael Yang authored Apr 02, 2024
  
  90f071c6
01 Apr, 2024 2 commits
- update memory calculations · 91b3e4d2
  Michael Yang authored Mar 18, 2024
```
count each layer independently when deciding gpu offloading
```
  91b3e4d2
- refactor model parsing · d338d704
  Michael Yang authored Mar 13, 2024
  
  d338d704
29 Mar, 2024 1 commit
- Add gemma safetensors conversion (#3250) · 5a5efee4
  Patrick Devine authored Mar 28, 2024
```
Co-authored-by: Michael Yang <mxyng@pm.me>
```
  5a5efee4
12 Mar, 2024 1 commit
- refactor readseeker · 00852979
  Michael Yang authored Mar 09, 2024
  
  00852979
08 Mar, 2024 1 commit
- decode ggla · 76bdebba
  Michael Yang authored Mar 08, 2024
  
  76bdebba
07 Mar, 2024 1 commit
- Convert Safetensors to an Ollama model (#2824) · 2c017ca4
  Patrick Devine authored Mar 06, 2024
  
  2c017ca4
21 Feb, 2024 1 commit
- add gguf file types (#2532) · 949d7b1c
  Michael Yang authored Feb 20, 2024
  
  949d7b1c
12 Jan, 2024 1 commit
- add max context length check · eaed6f8c
  Michael Yang authored Jan 12, 2024
  
  eaed6f8c
09 Jan, 2024 1 commit
- fix lint · 2bb2bdd5
  Michael Yang authored Dec 15, 2023
  
  2bb2bdd5
08 Jan, 2024 1 commit

Offload layers to GPU based on new model size estimates (#1850) · 08f1e189

Jeffrey Morgan authored Jan 08, 2024



* select layers based on estimated model memory usage

* always account for scratch vram

* don't load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

08f1e189
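
The bullets in this commit describe choosing how many layers to offload from an estimate of memory use, with room reserved for scratch VRAM and CUDA overhead. A minimal sketch of that idea follows. The function name, its parameters, and the uniform layer size are all hypothetical, and reading "dont load +1 layers" as "round down and never exceed the model's layer count" is an assumption, not something the log confirms.

```go
package main

import "fmt"

// MiB is one mebibyte in bytes.
const MiB = uint64(1) << 20

// offloadLayers estimates how many layers to place on the GPU.
// overhead and scratch are reserved before any layer is counted;
// layerSize is an assumed uniform per-layer size in bytes.
func offloadLayers(freeVRAM, overhead, scratch, layerSize uint64, totalLayers int) int {
	reserved := overhead + scratch
	if totalLayers <= 0 || layerSize == 0 || freeVRAM <= reserved {
		return 0
	}
	// Round down: a partially fitting layer is not offloaded.
	n := (freeVRAM - reserved) / layerSize
	if n > uint64(totalLayers) {
		// Never report more layers than the model has.
		return totalLayers
	}
	return int(n)
}

func main() {
	// 8 GiB free, 512 MiB overhead, 512 MiB scratch, 32-layer model.
	fmt.Println(offloadLayers(8192*MiB, 512*MiB, 512*MiB, 300*MiB, 32)) // 23
	fmt.Println(offloadLayers(8192*MiB, 512*MiB, 512*MiB, 200*MiB, 32)) // 32
}
```

Reserving the overhead first keeps the estimate conservative: a model that does not fit entirely is offloaded partially instead of failing on allocation.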

19 Dec, 2023 1 commit

deprecate ggml · 811b1f03

Bruce MacDonald authored Nov 24, 2023



- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>

811b1f03

10 Dec, 2023 2 commits
- seek to end of file when decoding older model formats · d9a250e9
  Jeffrey Morgan authored Dec 09, 2023
  
  d9a250e9
- seek to eof for older model binaries · 944519ed
  Jeffrey Morgan authored Dec 09, 2023
  
  944519ed
05 Dec, 2023 3 commits
- seek instead of copyn · 72e7a49a
  Michael Yang authored Nov 29, 2023
  
  72e7a49a
- split from into one or more models · 2cb0fa7d
  Michael Yang authored Nov 24, 2023
  
  2cb0fa7d
- unnecessary ReadSeeker for DecodeGGML · b2816bca
  Michael Yang authored Nov 22, 2023
  
  b2816bca
23 Oct, 2023 1 commit

ggufv3 · 125d0a01

Michael Yang authored Oct 23, 2023

ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.

loosen version check to be more forward compatible. unless specified,
gguf versions other than v1 will be decoded into v2.

125d0a01

03 Oct, 2023 1 commit
- starcoder · c02c0cd4
  Michael Yang authored Oct 02, 2023
  
  c02c0cd4
25 Sep, 2023 1 commit
- unbound max num gpu layers (#591) · 86279f4a
  Bruce MacDonald authored Sep 25, 2023
```
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
```
  86279f4a