Commits · cc5a71e0e339cc9d3742674f2b0ccb40100be539 · OpenDAS / ollama

23 Apr, 2024 3 commits

- Request and model concurrency · 34b9db5a
  Daniel Hiltgen authored Mar 30, 2024
  
  This change adds support for multiple concurrent requests, as well as
  loading multiple models by spawning multiple runners. The default
  settings are currently set at 1 concurrent request per model and only 1
  loaded model at a time, but these can be adjusted by setting
  OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
  
  34b9db5a
- fix: mixtral graph · 435cc866
  Michael Yang authored Apr 22, 2024
  
  435cc866
- Trim spaces and quotes from llm lib override · aa72281e
  Daniel Hiltgen authored Apr 22, 2024
  
  aa72281e

21 Apr, 2024 2 commits
- Update gen_windows.ps1 · 9c0db4cc
  Jeremy authored Apr 21, 2024
```
Fixed improper env references
```
  9c0db4cc
- chore: use errors.New to replace fmt.Errorf will much better (#3789) · 62be2050
  Cheng authored Apr 21, 2024
  
  62be2050
18 Apr, 2024 3 commits

- Update gen_windows.ps1 · 6f18297b
  Jeremy authored Apr 18, 2024
```
Forgot a " on the write-host
```
  6f18297b
- Update gen_windows.ps1 · 15016413
  Jeremy authored Apr 18, 2024
  
  Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS to customize GPU builds on Windows
  
  15016413
- Update gen_linux.sh · 440b7190
  Jeremy authored Apr 18, 2024
  
  Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS instead of OLLAMA_CUSTOM_GPU_DEFS
  
  440b7190

17 Apr, 2024 6 commits
- add stablelm graph calculation · 3cf483fe
  Michael Yang authored Apr 17, 2024
  
  3cf483fe
- add support for custom gpu build flags for llama.cpp · 52f5370c
  Jeremy authored Apr 17, 2024
  
  52f5370c
- adds support for OLLAMA_CUSTOM_GPU_DEFS to customize GPU build flags · 7c000ec3
  Jeremy authored Apr 17, 2024
  
  7c000ec3
- rearranged conditional logic for static build, dockerfile updated · 8aec92fa
  Jeremy authored Apr 17, 2024
  
  8aec92fa
- account for all non-repeating layers · a8b9b930
  Michael Yang authored Apr 17, 2024
  
  a8b9b930
- move static build to its own flag · 70261b9b
  Jeremy authored Apr 17, 2024
  
  70261b9b
16 Apr, 2024 6 commits
- fix padding to only return padding · e74163af
  Michael Yang authored Apr 15, 2024
  
  e74163af
- scale graph based on gpu count · 26df6747
  Michael Yang authored Apr 16, 2024
  
  26df6747
- Support unicode characters in model path (#3681) · 7c9792a6
  Jeffrey Morgan authored Apr 16, 2024
```
* parse wide argv characters on windows

* cleanup

* move cleanup to end of `main`
```
  7c9792a6
- darwin: no partial offloading if required memory greater than system · 41a272de
  Michael Yang authored Apr 16, 2024
  
  41a272de
- update llama.cpp submodule to `7593639` (#3665) · f3357222
  Jeffrey Morgan authored Apr 15, 2024
  
  f3357222
- fix padding in decode · 969238b1
  Michael Yang authored Apr 15, 2024
```
TODO: update padding() to _only_ returning the padding
```
  969238b1
15 Apr, 2024 2 commits
- Add llama2 / torch models for `ollama create` (#3607) · 9f8691c6
  Patrick Devine authored Apr 15, 2024
  
  9f8691c6
- Terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading (#3653) · a0b8a32e
  Jeffrey Morgan authored Apr 15, 2024
```
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading

* use `unload` in signal handler
```
  a0b8a32e
13 Apr, 2024 1 commit
- update llama.cpp submodule to `4bd0f93` (#3627) · 309aef7f
  Jeffrey Morgan authored Apr 13, 2024
  
  309aef7f
11 Apr, 2024 1 commit
- mixtral mem · 3397eff0
  Michael Yang authored Apr 11, 2024
  
  3397eff0
10 Apr, 2024 2 commits
- partial offloading · 7e33a017
  Michael Yang authored Apr 05, 2024
  
  7e33a017
- refactor tensor query · 8b2c1006
  Michael Yang authored Apr 03, 2024
  
  8b2c1006
09 Apr, 2024 4 commits

- Handle very slow model loads · c5ff443b
  Daniel Hiltgen authored Apr 09, 2024
```
During testing, we're seeing some models take over 3 minutes.
```
  c5ff443b
- Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564) · 1524f323
  Blake Mizerany authored Apr 09, 2024
  
  1524f323
- build.go: introduce a friendlier way to build Ollama (#3548) · fccf3eec
  Blake Mizerany authored Apr 09, 2024
  
  This commit introduces a more friendly way to build Ollama dependencies
  and the binary without abusing `go generate` and removing the
  unnecessary extra steps it brings with it.
  
  This script also provides nicer feedback to the user about what is
  happening during the build process.
  
  At the end, it prints a helpful message to the user about what to do
  next (e.g. run the new local Ollama).
  
  fccf3eec
- update llama.cpp submodule to `1b67731` (#3561) · 5ec12cec
  Jeffrey Morgan authored Apr 09, 2024
  
  5ec12cec

08 Apr, 2024 1 commit
- cgo quantize · 9502e566
  Michael Yang authored Apr 05, 2024
  
  9502e566
07 Apr, 2024 1 commit
- update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to... · 63efa075
  Jeffrey Morgan authored Apr 07, 2024
```
update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528)
```
  63efa075
06 Apr, 2024 1 commit
- no rope parameters · be517e49
  Michael Yang authored Apr 05, 2024
  
  be517e49
04 Apr, 2024 3 commits
- add command-r graph estimate · 01f77ae2
  Michael Yang authored Apr 04, 2024
  
  01f77ae2
- Fail fast if mingw missing on windows · 36bd9677
  Daniel Hiltgen authored Apr 04, 2024
  
  36bd9677
- fix dll compress in windows building · 4de01267
  mofanke authored Apr 04, 2024
  
  4de01267
03 Apr, 2024 3 commits
- Fix CI release glitches · e4a7e5b2
  Daniel Hiltgen authored Apr 03, 2024
```
The subprocess change moved the build directory
arm64 builds weren't setting cross-compilation flags when building on x86
```
  e4a7e5b2
- update graph size estimate · 12e923e1
  Michael Yang authored Apr 02, 2024
  
  12e923e1
- Fix macOS builds on older SDKs (#3467) · cd135317
  Jeffrey Morgan authored Apr 03, 2024
  
  cd135317
02 Apr, 2024 1 commit
- Revert options as a ref in the server · 6589eb8a
  Daniel Hiltgen authored Apr 02, 2024
  
  6589eb8a