- 11 Jun, 2024 1 commit

Jeffrey Morgan authored

- 09 Jun, 2024 2 commits

Craig Hughes authored
Critical fix from llama.cpp to the JSON grammar to forbid unescaped escape characters inside strings, which break parsing. (#3782)

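For context, a minimal sketch (not from the commit itself) of why a stray, unescaped escape character inside a JSON string breaks parsing, demonstrated here with Python's `json` module rather than the llama.cpp grammar:

```python
import json

# JSON allows a backslash only as the start of a valid escape sequence
# (\" \\ \/ \b \f \n \r \t \uXXXX). A stray, unescaped backslash inside
# a string is exactly the class of input the grammar fix forbids.
bad = '{"text": "bad \\escape"}'      # JSON sees "bad \escape" -> invalid \e
try:
    json.loads(bad)
    bad_parsed = True
except json.JSONDecodeError:
    bad_parsed = False

# Doubling the backslash produces a valid escape and parses cleanly.
good = '{"text": "bad \\\\escape"}'   # JSON sees "bad \\escape"
result = json.loads(good)

print(bad_parsed)       # False: the stray backslash is rejected
print(result["text"])   # bad \escape
```

A grammar that permits the first form generates output that downstream JSON parsers reject, which is why the fix was marked critical.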
Jeffrey Morgan authored
* fix embedding by adding fixes from llama.cpp upstream
* remove assert
Co-authored-by: Jesper Ek <deadbeef84@gmail.com>

- 08 Jun, 2024 1 commit

Michael Yang authored

- 07 Jun, 2024 3 commits

Michael Yang authored

Daniel Hiltgen authored
This follows the same pattern as for cuda and rocm, allowing the build to be disabled even when we detect the dependent libraries.

Jeffrey Morgan authored

- 06 Jun, 2024 1 commit

Michael Yang authored

- 04 Jun, 2024 4 commits

Michael Yang authored

Michael Yang authored

Michael Yang authored

Michael Yang authored

- 01 Jun, 2024 1 commit

Michael Yang authored
* Revert "use `int32_t` for call to tokenize (#4738)"
  This reverts commit 763bb65d.
* Revert "vocab only"
  This reverts commit bf54c845.
* Revert "use ffi for tokenizing/detokenizing"
  This reverts commit 26a00a04.

- 31 May, 2024 2 commits

Jeffrey Morgan authored
* use `int32_t` for call to tokenize
* variable naming
* cleanup
* fix crash

Jeffrey Morgan authored

- 30 May, 2024 3 commits

Jeffrey Morgan authored
* partial offloading: allow flash attention and disable mmap
* allow mmap with num_gpu=0

Michael Yang authored

Jeffrey Morgan authored
* update llama.cpp submodule to `5921b8f089d3b7bda86aac5a66825df6a6c10603`
* add patch

- 29 May, 2024 3 commits

Michael Yang authored

Michael Yang authored

Michael Yang authored

- 28 May, 2024 2 commits

Daniel Hiltgen authored
On some systems, 1 minute isn't sufficient to finish the load after it hits 100%. This creates 2 distinct timers; they're both set to the same value for now, so we can refine the timeouts further.

Lei Jitang authored
Signed-off-by: Lei Jitang <leijitang@outlook.com>

- 25 May, 2024 1 commit

Daniel Hiltgen authored
If the client closes the connection before we finish loading the model, we abort, so let's make the log message clearer about why, to help users understand this failure mode.

- 24 May, 2024 4 commits

Michael Yang authored
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

Michael Yang authored

Patrick Devine authored

Wang,Zhe authored

- 23 May, 2024 4 commits

Michael Yang authored

Daniel Hiltgen authored
This doesn't expose a UX yet, but wires up the initial server portion of progress reporting during load.

Bruce MacDonald authored
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>

Jeffrey Morgan authored
* put flash attention behind a flag for now
* add test
* remove print
* up timeout for scheduler tests

- 21 May, 2024 1 commit

Michael Yang authored

- 20 May, 2024 6 commits

Michael Yang authored

Michael Yang authored

Patrick Devine authored

jmorganca authored

Josh Yan authored

Sam authored
* feat: enable flash attention if supported
* feat: add flash_attn support

- 16 May, 2024 1 commit

Jeffrey Morgan authored