Commits · 717f7229eb4f9220d4070aae617923950643d327 · OpenDAS / ollama

29 Jun, 2024 1 commit
- Do not shift context for sliding window models (#5368) · 717f7229
  Jeffrey Morgan authored Jun 28, 2024
```
* Do not shift context for sliding window models

* truncate prompt > 2/3 tokens

* only target gemma2
```
  717f7229
19 Jun, 2024 1 commit
- remove confusing log message · 9d91e5e5
  Michael Yang authored Jun 19, 2024
  
  9d91e5e5
14 Jun, 2024 1 commit
- Fix server.cpp for the new cuda build macros · fb9cdfa7
  Daniel Hiltgen authored May 18, 2024
  
  fb9cdfa7
11 Jun, 2024 1 commit
- llm: fix seed value not being applied to requests (#4986) · ead259d8
  Jeffrey Morgan authored Jun 11, 2024
  
  ead259d8
09 Jun, 2024 1 commit
- llm: always add bos token to prompt (#4941) · 34f14279
  Jeffrey Morgan authored Jun 08, 2024

  * fix embedding by adding fixes from llama.cpp upstream

  * remove assert

  ---------
  Co-authored-by: Jesper Ek <deadbeef84@gmail.com>

  34f14279
01 Jun, 2024 1 commit
- revert tokenize ffi (#4761) · 829ff87b
  Michael Yang authored May 31, 2024

  * Revert "use `int32_t` for call to tokenize (#4738)"

    This reverts commit 763bb65d.

  * Revert "vocab only"

    This reverts commit bf54c845.

  * Revert "use ffi for tokenizing/detokenizing"

    This reverts commit 26a00a04.

  829ff87b
29 May, 2024 3 commits
- rm unused infill · de781b37
  Michael Yang authored May 12, 2024
  
  de781b37
- rm unused system prompt · 3e217993
  Michael Yang authored May 12, 2024
  
  3e217993
- use ffi for tokenizing/detokenizing · 26a00a04
  Michael Yang authored May 11, 2024
  
  26a00a04
23 May, 2024 2 commits
- bump (#4597) · 714adb8b
  Michael Yang authored May 23, 2024
  
  714adb8b
- Wire up load progress · b37b496a
  Daniel Hiltgen authored May 20, 2024
```
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
```
  b37b496a
20 May, 2024 1 commit
- feat: add support for flash_attn (#4120) · e15307fd
  Sam authored May 21, 2024

  * feat: enable flash attention if supported

  * feat: enable flash attention if supported

  * feat: enable flash attention if supported

  * feat: add flash_attn support

  e15307fd
09 May, 2024 1 commit
- log clean up · 58876091
  Michael Yang authored May 09, 2024
  
  58876091
04 May, 2024 1 commit
- omit prompt and generate settings from final response · 44869c59
  Michael Yang authored May 03, 2024
  
  44869c59
30 Apr, 2024 3 commits
- llm: add back check for empty token cache · fcf4d60e
  jmorganca authored Apr 30, 2024
  
  fcf4d60e
- update llama.cpp submodule to `f364eb6` (#4060) · 18d9a7e1
  Jeffrey Morgan authored Apr 30, 2024
  
  18d9a7e1
- Update llama.cpp (#4036) · 23d23409
  Daniel Hiltgen authored Apr 29, 2024
```
* Bump llama.cpp to b2761

* Adjust types for bump
```
  23d23409
17 Apr, 2024 1 commit
- Fixed startup sequence to report model loading · c942e4a0
  ManniX-ITA authored Apr 17, 2024
  
  c942e4a0
16 Apr, 2024 1 commit
- Support unicode characters in model path (#3681) · 7c9792a6
  Jeffrey Morgan authored Apr 16, 2024
```
* parse wide argv characters on windows

* cleanup

* move cleanup to end of `main`
```
  7c9792a6
01 Apr, 2024 2 commits
- Apply 01-cache.diff · 0a0e9f3e
  Daniel Hiltgen authored Mar 19, 2024
  
  0a0e9f3e
- Switch back to subprocessing for llama.cpp · 58d95cc9
  Daniel Hiltgen authored Mar 14, 2024

  This should resolve a number of memory leak and stability defects by allowing
  us to isolate llama.cpp in a separate process and shutdown when idle, and
  gracefully restart if it has problems. This also serves as a first step to be
  able to run multiple copies to support multiple models concurrently.

  58d95cc9
26 Mar, 2024 1 commit
- add license in file header for vendored llama.cpp code (#3351) · f5ca7f8c
  Jeffrey Morgan authored Mar 26, 2024
  
  f5ca7f8c
23 Mar, 2024 1 commit
- Bump llama.cpp to b2474 · 43799532
  Daniel Hiltgen authored Mar 23, 2024
```
The release just before ggml-cuda.cu refactoring
```
  43799532
16 Mar, 2024 1 commit
- llama: remove server static assets (#3174) · e95ffc74
  Jeffrey Morgan authored Mar 15, 2024
  
  e95ffc74
12 Mar, 2024 2 commits
- Adapt our build for imported server.cpp · 85129d3a
  Daniel Hiltgen authored Mar 12, 2024
  
  85129d3a
- Import server.cpp as of b2356 · 9ac6440d
  Daniel Hiltgen authored Mar 12, 2024
  
  9ac6440d