Commits · ea0fdaed2893f83c34dfd4328c37103da4e2d9a9 · OpenDAS / ollama

10 May, 2024 1 commit
- add phi2 mem · 1eb382da
  Michael Yang authored May 10, 2024
  
  1eb382da
08 May, 2024 1 commit
- skip if same quantization · eeb69526
  Michael Yang authored May 07, 2024
  
  eeb69526
06 May, 2024 2 commits
- comments · 01811c17
  Michael Yang authored Apr 23, 2024
  
  01811c17
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
  9685c345
23 Apr, 2024 1 commit
- fix: mixtral graph · 435cc866
  Michael Yang authored Apr 22, 2024
  
  435cc866
17 Apr, 2024 2 commits
- add stablelm graph calculation · 3cf483fe
  Michael Yang authored Apr 17, 2024
  
  3cf483fe
- account for all non-repeating layers · a8b9b930
  Michael Yang authored Apr 17, 2024
  
  a8b9b930
11 Apr, 2024 1 commit
- mixtral mem · 3397eff0
  Michael Yang authored Apr 11, 2024
  
  3397eff0
10 Apr, 2024 2 commits
- partial offloading · 7e33a017
  Michael Yang authored Apr 05, 2024
  
  7e33a017
- refactor tensor query · 8b2c1006
  Michael Yang authored Apr 03, 2024
  
  8b2c1006
04 Apr, 2024 1 commit
- add command-r graph estimate · 01f77ae2
  Michael Yang authored Apr 04, 2024
  
  01f77ae2
03 Apr, 2024 1 commit
- update graph size estimate · 12e923e1
  Michael Yang authored Apr 02, 2024
  
  12e923e1
02 Apr, 2024 1 commit
- default head_kv to 1 · 90f071c6
  Michael Yang authored Apr 02, 2024
  
  90f071c6
01 Apr, 2024 2 commits
- update memory calculations · 91b3e4d2
  Michael Yang authored Mar 18, 2024
```
count each layer independently when deciding gpu offloading
```
  91b3e4d2
- refactor model parsing · d338d704
  Michael Yang authored Mar 13, 2024
  
  d338d704
29 Mar, 2024 1 commit
- Add gemma safetensors conversion (#3250) · 5a5efee4
  Patrick Devine authored Mar 28, 2024
```
Co-authored-by: Michael Yang <mxyng@pm.me>
```
  5a5efee4
12 Mar, 2024 1 commit
- refactor readseeker · 00852979
  Michael Yang authored Mar 09, 2024
  
  00852979
08 Mar, 2024 1 commit
- decode ggla · 76bdebba
  Michael Yang authored Mar 08, 2024
  
  76bdebba
07 Mar, 2024 1 commit
- Convert Safetensors to an Ollama model (#2824) · 2c017ca4
  Patrick Devine authored Mar 06, 2024
  
  2c017ca4
21 Feb, 2024 1 commit
- add gguf file types (#2532) · 949d7b1c
  Michael Yang authored Feb 20, 2024
  
  949d7b1c
12 Jan, 2024 1 commit
- add max context length check · eaed6f8c
  Michael Yang authored Jan 12, 2024
  
  eaed6f8c
09 Jan, 2024 1 commit
- fix lint · 2bb2bdd5
  Michael Yang authored Dec 15, 2023
  
  2bb2bdd5
08 Jan, 2024 1 commit

Offload layers to GPU based on new model size estimates (#1850) · 08f1e189

Jeffrey Morgan authored Jan 08, 2024



* select layers based on estimated model memory usage

* always account for scratch vram

* don't load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

08f1e189

19 Dec, 2023 1 commit

deprecate ggml · 811b1f03

Bruce MacDonald authored Nov 24, 2023



- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>

811b1f03

10 Dec, 2023 2 commits
- seek to end of file when decoding older model formats · d9a250e9
  Jeffrey Morgan authored Dec 09, 2023
  
  d9a250e9
- seek to eof for older model binaries · 944519ed
  Jeffrey Morgan authored Dec 09, 2023
  
  944519ed
05 Dec, 2023 3 commits
- seek instead of copyn · 72e7a49a
  Michael Yang authored Nov 29, 2023
  
  72e7a49a
- split from into one or more models · 2cb0fa7d
  Michael Yang authored Nov 24, 2023
  
  2cb0fa7d
- unnecessary ReadSeeker for DecodeGGML · b2816bca
  Michael Yang authored Nov 22, 2023
  
  b2816bca
23 Oct, 2023 1 commit

ggufv3 · 125d0a01

Michael Yang authored Oct 23, 2023

ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.

loosen version check to be more forward compatible. unless specified,
gguf versions other than v1 will be decoded as v2.

125d0a01

03 Oct, 2023 1 commit
- starcoder · c02c0cd4
  Michael Yang authored Oct 02, 2023
  
  c02c0cd4
25 Sep, 2023 1 commit
- unbound max num gpu layers (#591) · 86279f4a
  Bruce MacDonald authored Sep 25, 2023
```
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
```
  86279f4a
21 Sep, 2023 1 commit

remove tmp directories created by previous servers (#559) · 4cba75ef

Bruce MacDonald authored Sep 21, 2023



* remove tmp directories created by previous servers

* clean up on server stop

* Update routes.go

* Update server/routes.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir

* check file exists before creating

---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>

4cba75ef

18 Sep, 2023 1 commit

subprocess improvements (#524) · 66003e1d

Bruce MacDonald authored Sep 18, 2023

* subprocess improvements

- increase start-up timeout
- when a runner fails to start, fail rather than timing out
- try runners in order rather than choosing 1 runner
- embed metal runner in metal dir rather than gpu
- refactor logging and error messages

* Update llama.go

* Update llama.go

* simplify by using glob

66003e1d

14 Sep, 2023 1 commit

support for packaging in multiple cuda runners (#509) · 2540c918

Bruce MacDonald authored Sep 14, 2023



* enable packaging multiple cuda versions
* use nvcc cuda version if available

---------
Co-authored-by: Michael Yang <mxyng@pm.me>

2540c918

12 Sep, 2023 1 commit
- fix falcon decode · 7dee25a0
  Michael Yang authored Sep 12, 2023
```
get model and file type from bin file
```
  7dee25a0
07 Sep, 2023 1 commit
- GGUF support (#441) · 09dd2aef
  Bruce MacDonald authored Sep 07, 2023
  
  09dd2aef
24 Aug, 2023 1 commit
- add 34b model type · b1cececb
  Michael Yang authored Aug 24, 2023
  
  b1cececb
17 Aug, 2023 1 commit
- model and file type as strings · a894cc79
  Michael Yang authored Aug 17, 2023
  
  a894cc79
11 Aug, 2023 1 commit
- ggml: fix off by one error · 6ed991c8
  Michael Yang authored Aug 11, 2023
```
remove unused Unknown FileType
```
  6ed991c8