Commits · 95b1133d0cd84f027d23ddd78f4d01e62d806490 · OpenDAS / ollama

23 May, 2024 1 commit

Daniel Hiltgen authored May 20, 2024

This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load

b37b496a

16 May, 2024 1 commit
- update llama.cpp submodule to `614d3b9` (#4414) · 583c1f47
  Jeffrey Morgan authored May 16, 2024
  
  583c1f47
06 May, 2024 1 commit
- Fix llava models not working after first request (#4164) · 1b0e6c9c
  Jeffrey Morgan authored May 05, 2024
```
* fix llava models not working after first request

* individual requests only for llava models
```
  1b0e6c9c
26 Apr, 2024 1 commit
- Fix clip log import · 85801317
  Daniel Hiltgen authored Apr 26, 2024
  
  85801317
25 Apr, 2024 1 commit
- use matrix multiplication kernels in more cases · ddf5c09a
  jmorganca authored Apr 25, 2024
  
  ddf5c09a
02 Apr, 2024 1 commit
- Bump to b2581 · 0035e31a
  Daniel Hiltgen authored Mar 25, 2024
  
  0035e31a
23 Mar, 2024 1 commit
- Bump llama.cpp to b2474 · 43799532
  Daniel Hiltgen authored Mar 23, 2024
```
The release just before ggml-cuda.cu refactoring
```
  43799532
14 Mar, 2024 1 commit
- fix: clip memory leak · 291c6638
  Michael Yang authored Mar 14, 2024
  
  291c6638
13 Mar, 2024 1 commit
- restore locale patch (#3091) · e72c567c
  Jeffrey Morgan authored Mar 12, 2024
  
  e72c567c
11 Mar, 2024 2 commits
- relay load model errors to the client (#3065) · b80661e8
  Bruce MacDonald authored Mar 11, 2024
  
  b80661e8
- update llama.cpp submodule to `ceca1ae` (#3064) · 369eda65
  Jeffrey Morgan authored Mar 11, 2024
  
  369eda65
10 Mar, 2024 2 commits
- fix `03-locale.diff` · 41b00b98
  Jeffrey Morgan authored Mar 10, 2024
  
  41b00b98
- patch: use default locale in wpm tokenizer (#3034) · 908005d9
  Jeffrey Morgan authored Mar 09, 2024
  
  908005d9
09 Mar, 2024 1 commit
- update llama.cpp submodule to `77d1ac7` (#3030) · 1ffb1e28
  Jeffrey Morgan authored Mar 09, 2024
  
  1ffb1e28
08 Mar, 2024 1 commit
- update llama.cpp submodule to `6cdabe6` (#2999) · 0e4669b0
  Jeffrey Morgan authored Mar 08, 2024
  
  0e4669b0
01 Mar, 2024 1 commit
- update llama.cpp submodule to `c29af7e` (#2868) · 21347e1e
  Jeffrey Morgan authored Mar 01, 2024
  
  21347e1e
20 Feb, 2024 1 commit
- update llama.cpp submodule to `66c1968f7` (#2618) · 4613a080
  Jeffrey Morgan authored Feb 20, 2024
  
  4613a080
19 Feb, 2024 1 commit

- Fix cuda leaks · fc39a6cd
  Daniel Hiltgen authored Feb 18, 2024
  
  This should resolve the problem where we don't fully unload from the GPU
  when we go idle.
  
  fc39a6cd

12 Feb, 2024 1 commit
- patch: always add token to cache_tokens (#2459) · 26b13fc3
  Jeffrey Morgan authored Feb 12, 2024
  
  26b13fc3
06 Feb, 2024 1 commit
- Bump llama.cpp to b2081 · de76b95d
  Daniel Hiltgen authored Feb 06, 2024
  
  de76b95d
31 Jan, 2024 1 commit

- Bump llama.cpp to b1999 · 72b12c3b
  Daniel Hiltgen authored Jan 29, 2024
  
  This requires an upstream change to support graceful termination,
  carried as a patch.
  
  72b12c3b

25 Jan, 2024 1 commit
- Fix clearing kv cache between requests with the same prompt (#2186) · a64570dc
  Jeffrey Morgan authored Jan 25, 2024
```
* Fix clearing kv cache between requests with the same prompt

* fix powershell script
```
  a64570dc