Commits · dc18eee39d8db35e6cbbc416a39ecbbda68fa962 · OpenDAS / ollama

09 May, 2024 2 commits
- add done_reason to the api (#4235) · cfa84b84
  Bruce MacDonald authored May 09, 2024
- Refine subprocess reaping · 84ac7ce1
  Daniel Hiltgen authored May 09, 2024
08 May, 2024 2 commits
- Record GPU usage information · bee2f4a3
  Daniel Hiltgen authored May 04, 2024
```
This records more GPU usage information for eventual UX inclusion.
```
- skip if same quantization · eeb69526
  Michael Yang authored May 07, 2024
07 May, 2024 2 commits
- Detect noexec and report a better error · 72700279
  Daniel Hiltgen authored May 07, 2024
```
This will bubble up a much more informative error message if noexec
is preventing us from running the subprocess
```
- llm: add minimum based on layer size · 4736391b
  Michael Yang authored May 06, 2024
06 May, 2024 5 commits
- comments · 01811c17
  Michael Yang authored Apr 23, 2024
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
- Use our libraries first · 380378cc
  Daniel Hiltgen authored May 05, 2024
```
Trying to live off the land for cuda libraries was not the right strategy.  We need to use the version we compiled against to ensure things work properly
```
- Fix `no slots available` error with concurrent requests (#4160) · ed740a25
  Jeffrey Morgan authored May 06, 2024
- Fix llava models not working after first request (#4164) · 1b0e6c9c
  Jeffrey Morgan authored May 05, 2024
```
* fix llava models not working after first request

* individual requests only for llava models
```
05 May, 2024 1 commit
- Centralize server config handling · f56aa200
  Daniel Hiltgen authored May 04, 2024

  This moves all the env var reading into one central module
  and logs the loaded config once at startup which should
  help in troubleshooting user server logs
04 May, 2024 1 commit
- omit prompt and generate settings from final response · 44869c59
  Michael Yang authored May 03, 2024
01 May, 2024 5 commits
- Removing go routine calling .wait from load. · 321d57e1
  Mark Ward authored May 01, 2024
- it will always return an error due to Kill() discarding Wait() errors · ba26c7aa
  Mark Ward authored Apr 29, 2024
- log when the waiting for the process to stop to help debug when other tasks... · 63c76368
  Mark Ward authored Apr 29, 2024
```
Log while waiting for the process to stop, to help debug when other tasks execute during this wait.
Expire timer: clear the timer reference because it will not be reused.
Close will clean up expireTimer if the calling code has not already done so.
```
- fix sched to wait for the runner to terminate to ensure following vram check will be more accurate · 948114e3
  Mark Ward authored Apr 28, 2024
- gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068) · f0c454ab
  Jeffrey Morgan authored May 01, 2024
30 Apr, 2024 4 commits
- llm: add back check for empty token cache · fcf4d60e
  jmorganca authored Apr 30, 2024
- update llama.cpp commit to `952d03d` · e33d5c2d
  jmorganca authored Apr 30, 2024
- update llama.cpp submodule to `f364eb6` (#4060) · 18d9a7e1
  Jeffrey Morgan authored Apr 30, 2024
- Update llama.cpp (#4036) · 23d23409
  Daniel Hiltgen authored Apr 29, 2024
```
* Bump llama.cpp to b2761

* Adjust types for bump
```
29 Apr, 2024 1 commit
- llm: don't cap context window limit to training context window (#3988) · 7aa08a77
  Jeffrey Morgan authored Apr 29, 2024
27 Apr, 2024 3 commits
- Do not build AVX runners on ARM64 · 8a65717f
  Hernan Martinez authored Apr 26, 2024
- Use architecture specific folders in the generate script · b438d485
  Hernan Martinez authored Apr 26, 2024
- Add import declaration for windows,arm64 to llm.go · 86e67fc4
  Hernan Martinez authored Apr 26, 2024
26 Apr, 2024 9 commits
- Fine grain control over windows generate steps · e4859c45
  Daniel Hiltgen authored Apr 26, 2024
```
This will speed up CI which already tries to only build static for unit tests
```
- Fix target in gen_windows.ps1 · ed5fb088
  Daniel Hiltgen authored Apr 26, 2024
- fix gemma, command-r layer weights · f81f3081
  Michael Yang authored Apr 26, 2024
- return code `499` when user cancels request while a model is loading (#3955) · bb31def0
  Jeffrey Morgan authored Apr 26, 2024
- Put back non-avx CPU build for windows · 421c878a
  Daniel Hiltgen authored Apr 26, 2024
- Fix clip log import · 85801317
  Daniel Hiltgen authored Apr 26, 2024
- Bump llama.cpp to b2737 · 2ed0d659
  Daniel Hiltgen authored Apr 25, 2024
- Refactor windows generate for more modular usage · 8671fded
  Daniel Hiltgen authored Apr 25, 2024
- Move cuda/rocm dependency gathering into generate script · 8feb97dc
  Daniel Hiltgen authored Apr 25, 2024
```
This will make it simpler for CI to accumulate artifacts from prior steps
```
25 Apr, 2024 4 commits
- llm: limit generation to 10x context size to avoid run on generations (#3918) · 993cf8bf
  Jeffrey Morgan authored Apr 25, 2024
```
* llm: limit generation to 10x context size to avoid run on generations

* add comment

* simplify condition statement
```
- only count output tensors · 7bb7cb8a
  Michael Yang authored Apr 25, 2024
- use matrix multiplication kernels in more cases · ddf5c09a
  jmorganca authored Apr 25, 2024
- Remove trailing spaces (#3889) · 5f73c087
  Roy Yang authored Apr 25, 2024
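
Commit 993cf8bf in the 25 Apr group limits generation to 10x the context size to avoid run-on generations. The Go sketch below shows one way such a cap could look. It is an assumption-laden illustration, not the code from that commit: `capPredict` is a hypothetical helper, and treating a negative request as "no explicit limit" is assumed here rather than confirmed by the log.

```go
package main

import "fmt"

// capPredict returns how many tokens may be generated for a request.
// Hypothetical helper: a negative requested value is assumed to mean
// "no explicit limit", which is then bounded at 10x the context size.
func capPredict(requested, numCtx int) int {
	limit := 10 * numCtx
	if requested < 0 || requested > limit {
		return limit
	}
	return requested
}

func main() {
	fmt.Println(capPredict(-1, 2048))  // unbounded request is capped
	fmt.Println(capPredict(128, 2048)) // small request passes through
}
```

Bounding the unbounded case, instead of rejecting it, keeps existing callers working while still guaranteeing that generation terminates.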
24 Apr, 2024 1 commit
- fixes for gguf (#3863) · 14476d48
  Patrick Devine authored Apr 23, 2024