Commits · b3e5491e41811294de9d81649a96581af6522d08 · OpenDAS / ollama

21 Jul, 2024 1 commit
- llm: consider `head_dim` in llama arch (#5817) · 5534f2cc
  Jeffrey Morgan authored Jul 20, 2024
  
  5534f2cc
20 Jul, 2024 1 commit
- add patch for tekken (#5807) · 1475eab9
  Jeffrey Morgan authored Jul 20, 2024
  
  1475eab9
07 Jul, 2024 1 commit
- Update llama.cpp submodule to `a8db2a9c` (#5530) · 571dc619
  Jeffrey Morgan authored Jul 07, 2024
  
  571dc619
05 Jul, 2024 2 commits
- update llama.cpp submodule to `d7fd29f` (#5475) · 8f8e736b
  Jeffrey Morgan authored Jul 05, 2024
  
  8f8e736b
- Fix assert on small embedding inputs (#5491) · e9188e97
  Jeffrey Morgan authored Jul 05, 2024
```
* Fix assert on small embedding inputs

* Update llm/patches/09-pooling.diff
```
  e9188e97
03 Jul, 2024 1 commit

- Fix clip model loading with unicode paths · 6298f498
  Daniel Hiltgen authored Jul 03, 2024

  On Windows, if the model directory contained Unicode characters,
  clip models would fail to load. This fixes the file name
  handling in clip.cpp to support UTF-16 on Windows.

  6298f498

27 Jun, 2024 1 commit
- llm: architecture patch (#5316) · 4d311eb7
  Jeffrey Morgan authored Jun 26, 2024
  
  4d311eb7
17 Jun, 2024 1 commit

- llm: update llama.cpp commit to `7c26775` (#4896) · 152fc202
  Jeffrey Morgan authored Jun 17, 2024

  * llm: update llama.cpp submodule to `7c26775`
  * disable `LLAMA_BLAS` for now
  * `-DLLAMA_OPENMP=off`

  152fc202

07 Jun, 2024 1 commit
- llm: patch to fix qwen 2 temporarily on nvidia (#4897) · ce0dc33c
  Jeffrey Morgan authored Jun 06, 2024
  
  ce0dc33c
30 May, 2024 1 commit
- Update llama.cpp submodule to `5921b8f0` (#4731) · 22f5c12c
  Jeffrey Morgan authored May 30, 2024
```
* update llama.cpp submodule to `5921b8f089d3b7bda86aac5a66825df6a6c10603`

* add patch
```
  22f5c12c
23 May, 2024 2 commits
- bump (#4597) · 714adb8b
  Michael Yang authored May 23, 2024
  
  714adb8b
- Wire up load progress · b37b496a
  Daniel Hiltgen authored May 20, 2024
```
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
```
  b37b496a
16 May, 2024 1 commit
- update llama.cpp submodule to `614d3b9` (#4414) · 583c1f47
  Jeffrey Morgan authored May 16, 2024
  
  583c1f47
06 May, 2024 1 commit
- Fix llava models not working after first request (#4164) · 1b0e6c9c
  Jeffrey Morgan authored May 05, 2024
```
* fix llava models not working after first request

* individual requests only for llava models
```
  1b0e6c9c
26 Apr, 2024 1 commit
- Fix clip log import · 85801317
  Daniel Hiltgen authored Apr 26, 2024
  
  85801317
25 Apr, 2024 1 commit
- use matrix multiplication kernels in more cases · ddf5c09a
  jmorganca authored Apr 25, 2024
  
  ddf5c09a
02 Apr, 2024 1 commit
- Bump to b2581 · 0035e31a
  Daniel Hiltgen authored Mar 25, 2024
  
  0035e31a
23 Mar, 2024 1 commit
- Bump llama.cpp to b2474 · 43799532
  Daniel Hiltgen authored Mar 23, 2024
```
The release just before ggml-cuda.cu refactoring
```
  43799532
14 Mar, 2024 1 commit
- fix: clip memory leak · 291c6638
  Michael Yang authored Mar 14, 2024
  
  291c6638
13 Mar, 2024 1 commit
- restore locale patch (#3091) · e72c567c
  Jeffrey Morgan authored Mar 12, 2024
  
  e72c567c
11 Mar, 2024 2 commits
- relay load model errors to the client (#3065) · b80661e8
  Bruce MacDonald authored Mar 11, 2024
  
  b80661e8
- update llama.cpp submodule to `ceca1ae` (#3064) · 369eda65
  Jeffrey Morgan authored Mar 11, 2024
  
  369eda65
10 Mar, 2024 2 commits
- fix `03-locale.diff` · 41b00b98
  Jeffrey Morgan authored Mar 10, 2024
  
  41b00b98
- patch: use default locale in wpm tokenizer (#3034) · 908005d9
  Jeffrey Morgan authored Mar 09, 2024
  
  908005d9
09 Mar, 2024 1 commit
- update llama.cpp submodule to `77d1ac7` (#3030) · 1ffb1e28
  Jeffrey Morgan authored Mar 09, 2024
  
  1ffb1e28
08 Mar, 2024 1 commit
- update llama.cpp submodule to `6cdabe6` (#2999) · 0e4669b0
  Jeffrey Morgan authored Mar 08, 2024
  
  0e4669b0
01 Mar, 2024 1 commit
- update llama.cpp submodule to `c29af7e` (#2868) · 21347e1e
  Jeffrey Morgan authored Mar 01, 2024
  
  21347e1e
20 Feb, 2024 1 commit
- update llama.cpp submodule to `66c1968f7` (#2618) · 4613a080
  Jeffrey Morgan authored Feb 20, 2024
  
  4613a080
19 Feb, 2024 1 commit

- Fix cuda leaks · fc39a6cd
  Daniel Hiltgen authored Feb 18, 2024

  This should resolve the problem where we don't fully unload from the GPU
  when we go idle.

  fc39a6cd

12 Feb, 2024 1 commit
- patch: always add token to cache_tokens (#2459) · 26b13fc3
  Jeffrey Morgan authored Feb 12, 2024
  
  26b13fc3
06 Feb, 2024 1 commit
- Bump llama.cpp to b2081 · de76b95d
  Daniel Hiltgen authored Feb 06, 2024
  
  de76b95d
31 Jan, 2024 1 commit

- Bump llama.cpp to b1999 · 72b12c3b
  Daniel Hiltgen authored Jan 29, 2024

  This requires an upstream change to support graceful termination,
  carried as a patch.

  72b12c3b

25 Jan, 2024 1 commit
- Fix clearing kv cache between requests with the same prompt (#2186) · a64570dc
  Jeffrey Morgan authored Jan 25, 2024
```
* Fix clearing kv cache between requests with the same prompt

* fix powershell script
```
  a64570dc