Commits · 6f351bf586642e0c1c7086af028cdff0e856a254 · OpenDAS / ollama

14 Jun, 2024 15 commits
- review comments and coverage · 6f351bf5
  Daniel Hiltgen authored Jun 05, 2024
  
- Prevent multiple concurrent loads on the same gpus · ff4f0cbd
  Daniel Hiltgen authored Jun 04, 2024
```
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait
to avoid races that lead to OOMs
```
- Refine CPU load behavior with system memory visibility · fc37c192
  Daniel Hiltgen authored Jun 03, 2024
  
- Reintroduce nvidia nvml library for windows · 434dfe30
  Daniel Hiltgen authored Jun 03, 2024
```
This library will give us the most reliable free VRAM reporting on Windows
to enable concurrent model scheduling.
```
- Refactor intel gpu discovery · 4e2b7e18
  Daniel Hiltgen authored May 29, 2024
  
- Harden unload for empty runners · 48702dd1
  Daniel Hiltgen authored May 30, 2024
  
- refined test timing · 68dfc623
  Daniel Hiltgen authored May 31, 2024
```
Adjust timing on some tests so they don't time out on small/slow GPUs.
```
- Support forced spreading for multi GPU · 5e8ff556
  Daniel Hiltgen authored May 08, 2024
```
Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one.  This exposes that
tunable behavior.
```
- Improve multi-gpu handling at the limit · 6fd04ca9
  Daniel Hiltgen authored May 18, 2024
```
Still not complete. Our prediction needs refinement to account for the space
available on each discrete GPU, so we can see how many layers fit on each one.
Because a single layer can't be split across multiple GPUs, free space can't be
treated as one logical block.
```
- Fix concurrency integration test to work locally · 206797bd
  Daniel Hiltgen authored May 23, 2024
```
This worked remotely but wound up trying to spawn multiple servers
locally, which doesn't work.
```
- Refine GPU discovery to bootstrap once · 43ed358f
  Daniel Hiltgen authored May 15, 2024
```
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
```
- Use DRM driver for VRAM info for amd · b32ebb4f
  Daniel Hiltgen authored May 14, 2024
```
The amdgpu driver's free VRAM reporting does not account for some other apps,
so leverage the upstream DRM driver, which keeps better tabs on things.
```
- Fix server.cpp for the new cuda build macros · fb9cdfa7
  Daniel Hiltgen authored May 18, 2024
  
- Revert "Limit GPU lib search for now (#4777)" · efac4886
  Daniel Hiltgen authored Jun 03, 2024
```
This reverts commit 476fb8e8.
```
- update 40xx gpu compat matrix (#5036) · 4dc7fb95
  Patrick Devine authored Jun 13, 2024
  
13 Jun, 2024 9 commits
- Merge pull request #5032 from dhiltgen/actually_skip · c39761c5
  Daniel Hiltgen authored Jun 13, 2024
```
Actually skip PhysX on windows
```
- Actually skip PhysX on windows · aac36763
  Daniel Hiltgen authored Jun 13, 2024
  
- Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16 · 15a687ae
  Michael Yang authored Jun 13, 2024
```
fix: multibyte utf16
```
- fix utf16 for multibyte runes · d528e1af
  Michael Yang authored Jun 13, 2024
  
- parser: add test for multibyte runes · cd234ce2
  Michael Yang authored Jun 13, 2024
  
- add OLLAMA_MODELS to envconfig (#5029) · 94618b23
  Patrick Devine authored Jun 13, 2024
  
- server: remove jwt decoding error (#5027) · 1fd236d1
  Jeffrey Morgan authored Jun 13, 2024
  
- Merge pull request #5025 from ollama/mxyng/revert-parser-scan · e87fc720
  Michael Yang authored Jun 13, 2024
```
Revert "proper utf16 support"
```
- Revert "proper utf16 support" · 20b9f8e6
  Michael Yang authored Jun 13, 2024
```
This reverts commit 66ab4877.

This change broke UTF-8 scanning of multi-byte runes.
```
12 Jun, 2024 3 commits
- move OLLAMA_HOST to envconfig (#5009) · c69bc19e
  Patrick Devine authored Jun 12, 2024
  
- Merge pull request #5004 from ollama/mxyng/fix-templates · bba5d177
  Michael Yang authored Jun 12, 2024
```
fix: multiple templates when creating from model
```
- fix: multiple templates when creating from model · c16f8af9
  Michael Yang authored Jun 12, 2024
```
Multiple templates may appear in a model if it is created from
another model that 1) has an autodetected template and 2) defines a
custom template.
```
11 Jun, 2024 4 commits
- Merge pull request #4987 from ollama/mxyng/revert-byte-order · 217f60c3
  Michael Yang authored Jun 11, 2024
```
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
```
- Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" · 7bdcd1da
  Michael Yang authored Jun 11, 2024
```
This reverts commit f5f245cc, reversing
changes made to 94d37fdc.

This change broke GGUF v2, which is incorrectly detected as big endian.
```
- llm: fix seed value not being applied to requests (#4986) · ead259d8
  Jeffrey Morgan authored Jun 11, 2024
  
- Add Ollama-hpp to Community Libraries in README. (#4983) · 2ff45d57
  James Montgomery authored Jun 11, 2024
  
10 Jun, 2024 6 commits
- Merge pull request #4715 from ollama/mxyng/utf16-parser · 0f3cf1d4
  Michael Yang authored Jun 10, 2024
```
proper utf16 support
```
- Merge pull request #4921 from ollama/mxyng/import-md · 5bc029c5
  Michael Yang authored Jun 10, 2024
```
update import.md
```
- Merge pull request #4965 from ollama/mxyng/skip-layer-remove · e9a9c6a8
  Michael Yang authored Jun 10, 2024
```
fix: skip removing layers that no longer exist
```
- fix: skip removing layers that no longer exist · 515f497e
  Michael Yang authored Jun 10, 2024
  
- add test · b27268aa
  Michael Yang authored Jun 10, 2024
  
- Merge pull request #4938 from ollama/mxyng/fix-byte-order · f5f245cc
  Michael Yang authored Jun 10, 2024
```
fix parsing big endian gguf
```
09 Jun, 2024 3 commits
- fix: examples/langchain-python-rag-privategpt/requirements.txt (#3382) · 94d37fdc
  Jim Scardelis authored Jun 09, 2024
- Critical fix from llama.cpp JSON grammar to forbid un-escaped escape... · b84aea16
  Craig Hughes authored Jun 09, 2024

  Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. (#3782)
- Add instructions to easily install specific versions on faq.md (#4084) · 896495de
  Napuh authored Jun 09, 2024

  * Added instructions to easily install specific versions on faq.md
  * Small typo
  * Moved instructions on how to install specific version to linux.md
  * Update docs/linux.md
  * Update docs/linux.md

  ---------
  Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>