Commits · d8def1ff9432ef60d1067e5e6dde0d700dd95021 · OpenDAS / ollama

07 Jul, 2024 2 commits
- llm: allow gemma 2 to context shift (#5534) · d8def1ff
  Jeffrey Morgan authored Jul 07, 2024
  
  d8def1ff
- llm: print caching notices in debug only (#5533) · 0e09c380
  Jeffrey Morgan authored Jul 07, 2024
  
  0e09c380
06 Jul, 2024 1 commit

llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511) · 2cc854f8

Jeffrey Morgan authored Jul 05, 2024

* Revert "fix cmake build (#5505)"

This reverts commit 4fd5f352.

* llm: fix missing dylibs by restoring old build behavior

* crlf -> lf

2cc854f8

05 Jul, 2024 3 commits
- fix cmake build (#5505) · 4fd5f352
  Jeffrey Morgan authored Jul 05, 2024
  
  4fd5f352
- update llama.cpp submodule to `d7fd29f` (#5475) · 8f8e736b
  Jeffrey Morgan authored Jul 05, 2024
  
  8f8e736b
- Use slot with cached prompt instead of least recently used (#5492) · d89454de
  Jeffrey Morgan authored Jul 05, 2024
```
* Use common prefix to select slot

* actually report `longest`
```
  d89454de
03 Jul, 2024 1 commit

Return Correct Prompt Eval Count Regardless of Cache Prompt (#5371) · 3b5a4a77

royjhan authored Jul 03, 2024

* openai compatibility

* Revert "openai compatibility"

This reverts commit d3f98a811e00fc497d889c8c45b0cfec5b64690c.

* remove erroneous subtraction of prompt cache

3b5a4a77

29 Jun, 2024 1 commit
- Do not shift context for sliding window models (#5368) · 717f7229
  Jeffrey Morgan authored Jun 28, 2024
```
* Do not shift context for sliding window models

* truncate prompt > 2/3 tokens

* only target gemma2
```
  717f7229
19 Jun, 2024 1 commit
- remove confusing log message · 9d91e5e5
  Michael Yang authored Jun 19, 2024
  
  9d91e5e5
14 Jun, 2024 1 commit
- Fix server.cpp for the new cuda build macros · fb9cdfa7
  Daniel Hiltgen authored May 18, 2024
  
  fb9cdfa7
11 Jun, 2024 1 commit
- llm: fix seed value not being applied to requests (#4986) · ead259d8
  Jeffrey Morgan authored Jun 11, 2024
  
  ead259d8
09 Jun, 2024 1 commit

llm: always add bos token to prompt (#4941) · 34f14279

Jeffrey Morgan authored Jun 08, 2024



* fix embedding by adding fixes from llama.cpp upstream

* remove assert

---------
Co-authored-by: Jesper Ek <deadbeef84@gmail.com>

34f14279

01 Jun, 2024 1 commit

revert tokenize ffi (#4761) · 829ff87b

Michael Yang authored May 31, 2024

* Revert "use `int32_t` for call to tokenize (#4738)"

This reverts commit 763bb65d.

* Revert "vocab only"

This reverts commit bf54c845.

* Revert "use ffi for tokenizing/detokenizing"

This reverts commit 26a00a04.

829ff87b

29 May, 2024 3 commits
- rm unused infill · de781b37
  Michael Yang authored May 12, 2024
  
  de781b37
- rm unused system prompt · 3e217993
  Michael Yang authored May 12, 2024
  
  3e217993
- use ffi for tokenizing/detokenizing · 26a00a04
  Michael Yang authored May 11, 2024
  
  26a00a04
23 May, 2024 2 commits
- bump (#4597) · 714adb8b
  Michael Yang authored May 23, 2024
  
  714adb8b
- Wire up load progress · b37b496a
  Daniel Hiltgen authored May 20, 2024
```
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
```
  b37b496a
20 May, 2024 1 commit

feat: add support for flash_attn (#4120) · e15307fd

Sam authored May 21, 2024

* feat: enable flash attention if supported

* feat: enable flash attention if supported

* feat: enable flash attention if supported

* feat: add flash_attn support

e15307fd

09 May, 2024 1 commit
- log clean up · 58876091
  Michael Yang authored May 09, 2024
  
  58876091
04 May, 2024 1 commit
- omit prompt and generate settings from final response · 44869c59
  Michael Yang authored May 03, 2024
  
  44869c59
30 Apr, 2024 3 commits
- llm: add back check for empty token cache · fcf4d60e
  jmorganca authored Apr 30, 2024
  
  fcf4d60e
- update llama.cpp submodule to `f364eb6` (#4060) · 18d9a7e1
  Jeffrey Morgan authored Apr 30, 2024
  
  18d9a7e1
- Update llama.cpp (#4036) · 23d23409
  Daniel Hiltgen authored Apr 29, 2024
```
* Bump llama.cpp to b2761

* Adjust types for bump
```
  23d23409
17 Apr, 2024 1 commit
- Fixed startup sequence to report model loading · c942e4a0
  ManniX-ITA authored Apr 17, 2024
  
  c942e4a0
16 Apr, 2024 1 commit
- Support unicode characters in model path (#3681) · 7c9792a6
  Jeffrey Morgan authored Apr 16, 2024
```
* parse wide argv characters on windows

* cleanup

* move cleanup to end of `main`
```
  7c9792a6
01 Apr, 2024 2 commits

Apply 01-cache.diff · 0a0e9f3e
Daniel Hiltgen authored Mar 19, 2024

0a0e9f3e

Switch back to subprocessing for llama.cpp · 58d95cc9

Daniel Hiltgen authored Mar 14, 2024

This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.

58d95cc9

26 Mar, 2024 1 commit
- add license in file header for vendored llama.cpp code (#3351) · f5ca7f8c
  Jeffrey Morgan authored Mar 26, 2024
  
  f5ca7f8c
23 Mar, 2024 1 commit
- Bump llama.cpp to b2474 · 43799532
  Daniel Hiltgen authored Mar 23, 2024
```
The release just before ggml-cuda.cu refactoring
```
  43799532
16 Mar, 2024 1 commit
- llama: remove server static assets (#3174) · e95ffc74
  Jeffrey Morgan authored Mar 15, 2024
  
  e95ffc74
12 Mar, 2024 3 commits
- Adapt our build for imported server.cpp · 85129d3a
  Daniel Hiltgen authored Mar 12, 2024
  
  85129d3a
- Import server.cpp as of b2356 · 9ac6440d
  Daniel Hiltgen authored Mar 12, 2024
  
  9ac6440d
- chore: fix typo (#3073) · 53c107e2
  racerole authored Mar 13, 2024
```
Signed-off-by: racerole <jiangyifeng@outlook.com>
```
  53c107e2
11 Mar, 2024 2 commits
- relay load model errors to the client (#3065) · b80661e8
  Bruce MacDonald authored Mar 11, 2024
  
  b80661e8
- update llama.cpp submodule to `ceca1ae` (#3064) · 369eda65
  Jeffrey Morgan authored Mar 11, 2024
  
  369eda65
09 Mar, 2024 1 commit
- update llama.cpp submodule to `77d1ac7` (#3030) · 1ffb1e28
  Jeffrey Morgan authored Mar 09, 2024
  
  1ffb1e28
08 Mar, 2024 1 commit
- update llama.cpp submodule to `6cdabe6` (#2999) · 0e4669b0
  Jeffrey Morgan authored Mar 08, 2024
  
  0e4669b0
01 Mar, 2024 1 commit
- update llama.cpp submodule to `c29af7e` (#2868) · 21347e1e
  Jeffrey Morgan authored Mar 01, 2024
  
  21347e1e
20 Feb, 2024 1 commit
- update llama.cpp submodule to `66c1968f7` (#2618) · 4613a080
  Jeffrey Morgan authored Feb 20, 2024
  
  4613a080