- 04 May, 2024 1 commit
  - Michael Yang authored
- 01 May, 2024 5 commits
  - Mark Ward authored
  - Mark Ward authored
  - Mark Ward authored
    Log when waiting for the process to stop, to help debug cases where other tasks execute during this wait. On expiry, the timer clears its own reference because it will not be reused. Close also cleans up expireTimer if the calling code has not already done so.
  - Mark Ward authored
  - Jeffrey Morgan authored
- 30 Apr, 2024 4 commits
  - jmorganca authored
  - jmorganca authored
  - Jeffrey Morgan authored
  - Daniel Hiltgen authored
    * Bump llama.cpp to b2761
    * Adjust types for bump
- 29 Apr, 2024 1 commit
  - Jeffrey Morgan authored
- 27 Apr, 2024 3 commits
  - Hernan Martinez authored
  - Hernan Martinez authored
  - Hernan Martinez authored
- 26 Apr, 2024 9 commits
  - Daniel Hiltgen authored
    This will speed up CI, which already tries to build only static binaries for unit tests.
  - Daniel Hiltgen authored
  - Michael Yang authored
  - Jeffrey Morgan authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
    This will make it simpler for CI to accumulate artifacts from prior steps.
- 25 Apr, 2024 4 commits
  - Jeffrey Morgan authored
    * llm: limit generation to 10x context size to avoid run-on generations
    * add comment
    * simplify condition statement
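The guard described in this commit can be sketched as a simple predicate. This is an illustrative version under the commit's stated rule (stop once generated tokens reach 10x the context size); the function and variable names are assumptions, not Ollama's actual identifiers.

```go
package main

import "fmt"

// exceedsGenerationLimit reports whether generation should stop:
// per the commit above, output is capped at 10x the context size
// to avoid run-on generations.
func exceedsGenerationLimit(numGenerated, numCtx int) bool {
	return numGenerated >= 10*numCtx
}

func main() {
	numCtx := 2048
	for n := 0; ; n++ { // stand-in for the token generation loop
		if exceedsGenerationLimit(n, numCtx) {
			fmt.Println("stopping after", n, "tokens") // 10 * 2048 = 20480
			break
		}
	}
}
```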
  - Michael Yang authored
  - jmorganca authored
  - Roy Yang authored
- 24 Apr, 2024 2 commits
  - Patrick Devine authored
  - Daniel Hiltgen authored
    If we get our predictions wrong, this can be used to set a lower memory limit as a workaround. The recent multi-GPU refactoring accidentally removed it, so this adds it back.
- 23 Apr, 2024 6 commits
  - Daniel Hiltgen authored
    Now that the llm runner is an executable and not just a DLL, more users on Windows are facing problems with security-policy configurations that prevent writing to a directory and then executing binaries from the same location. This change removes payloads from the main executable on Windows and instead packages them in the installer, discovering them relative to the executable's location. It also adds a new zip file for people who want to "roll their own" installation model.
  - Daniel Hiltgen authored
    Temp cleaners can nuke the file out from underneath us. This detects the missing runner and re-initializes the payloads.
  - Daniel Hiltgen authored
    This change adds support for multiple concurrent requests, as well as loading multiple models, by spawning multiple runners. The defaults are currently 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
  - Daniel Hiltgen authored
  - Michael Yang authored
  - Daniel Hiltgen authored
- 21 Apr, 2024 2 commits
- 18 Apr, 2024 3 commits