- 15 Jul, 2024 1 commit
-
royjhan authored
* Initial Batch Embedding
* Revert "Initial Batch Embedding" (this reverts commit c22d54895a280b54c727279d85a5fc94defb5a29)
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure" (this reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787)
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
-
- 07 Jul, 2024 2 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 05 Jul, 2024 1 commit
-
Jeffrey Morgan authored
* Use common prefix to select slot
* actually report `longest`
-
- 03 Jul, 2024 1 commit
-
royjhan authored
* openai compatibility
* Revert "openai compatibility" (this reverts commit d3f98a811e00fc497d889c8c45b0cfec5b64690c)
* remove erroneous subtraction of prompt cache
-
- 29 Jun, 2024 1 commit
-
Jeffrey Morgan authored
* Do not shift context for sliding window models
* truncate prompt > 2/3 tokens
* only target gemma2
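The truncation described above (keep only the most recent tokens once a prompt exceeds 2/3 of the context window, instead of shifting context, which sliding-window models such as gemma2 cannot do safely) can be sketched roughly as follows. The function name and token representation here are illustrative assumptions, not the actual server code:

```go
package main

import "fmt"

// truncatePrompt drops the oldest tokens once a prompt exceeds 2/3 of
// the context window numCtx. Hypothetical helper for illustration, not
// the actual implementation.
func truncatePrompt(tokens []int, numCtx int) []int {
	limit := 2 * numCtx / 3
	if len(tokens) <= limit {
		return tokens
	}
	// Keep only the most recent tokens.
	return tokens[len(tokens)-limit:]
}

func main() {
	tokens := make([]int, 100)
	for i := range tokens {
		tokens[i] = i
	}
	out := truncatePrompt(tokens, 60) // limit = 2*60/3 = 40
	fmt.Println(len(out), out[0])     // prints "40 60"
}
```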
-
- 19 Jun, 2024 1 commit
-
Michael Yang authored
-
- 14 Jun, 2024 1 commit
-
Daniel Hiltgen authored
-
- 11 Jun, 2024 1 commit
-
Jeffrey Morgan authored
-
- 09 Jun, 2024 1 commit
-
Jeffrey Morgan authored
* fix embedding by adding fixes from llama.cpp upstream
* remove assert

Co-authored-by: Jesper Ek <deadbeef84@gmail.com>
-
- 01 Jun, 2024 1 commit
-
Michael Yang authored
* Revert "use `int32_t` for call to tokenize (#4738)" (this reverts commit 763bb65d)
* Revert "vocab only" (this reverts commit bf54c845)
* Revert "use ffi for tokenizing/detokenizing" (this reverts commit 26a00a04)
-
- 29 May, 2024 3 commits
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 23 May, 2024 2 commits
-
Michael Yang authored
-
Daniel Hiltgen authored
This doesn't expose any UX yet, but wires up the initial server portion of progress reporting during model load.
-
- 20 May, 2024 1 commit
-
Sam authored
* feat: enable flash attention if supported
* feat: add flash_attn support
-
- 09 May, 2024 1 commit
-
Michael Yang authored
-
- 04 May, 2024 1 commit
-
Michael Yang authored
-
- 30 Apr, 2024 3 commits
-
jmorganca authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
* Bump llama.cpp to b2761
* Adjust types for bump
-
- 17 Apr, 2024 1 commit
-
ManniX-ITA authored
-
- 16 Apr, 2024 1 commit
-
Jeffrey Morgan authored
* parse wide argv characters on windows
* cleanup
* move cleanup to end of `main`
-
- 01 Apr, 2024 2 commits
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This should resolve a number of memory-leak and stability defects by isolating llama.cpp in a separate process that shuts down when idle and gracefully restarts if it has problems. It also serves as a first step toward running multiple copies to support multiple models concurrently.
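The lifecycle described above (a separate runner process that exits cleanly when idle and is restarted if it crashes) can be sketched as a minimal supervisor loop. The function name and restart policy below are assumptions for illustration only, not the actual runner code:

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

// superviseRunner launches a subprocess and restarts it if it exits
// with an error, up to maxRestarts additional attempts. A clean exit
// (status 0) is treated as an intentional idle shutdown. Illustrative
// sketch only.
func superviseRunner(name string, args []string, maxRestarts int) error {
	var err error
	for attempt := 0; attempt <= maxRestarts; attempt++ {
		cmd := exec.Command(name, args...)
		if err = cmd.Run(); err == nil {
			return nil // clean exit: idle shutdown
		}
		log.Printf("runner exited: %v (restart %d/%d)", err, attempt+1, maxRestarts)
		time.Sleep(100 * time.Millisecond) // brief backoff before restarting
	}
	return err
}

func main() {
	// "true" exits 0 immediately, simulating an idle shutdown.
	if err := superviseRunner("true", nil, 2); err != nil {
		log.Fatal(err)
	}
	log.Println("runner stopped cleanly")
}
```

Running the heavy native code behind a process boundary like this means a crash or leak in llama.cpp takes down only the child, which the supervisor can then restart.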
-
- 26 Mar, 2024 1 commit
-
Jeffrey Morgan authored
-
- 23 Mar, 2024 1 commit
-
Daniel Hiltgen authored
The release just before the ggml-cuda.cu refactoring.
-
- 16 Mar, 2024 1 commit
-
Jeffrey Morgan authored
-
- 12 Mar, 2024 2 commits
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-