Commits · 96cfb626415cd811ab545c134bd0d16fa7aca044 · OpenDAS / ollama

17 Jan, 2024 1 commit
- fix: normalize name path before splitting · 96cfb626
  Michael Yang authored Jan 16, 2024
  
  96cfb626
16 Jan, 2024 4 commits
- Merge pull request #1937 from jmorganca/mxyng/remove-client-py · 598d6d55
  Michael Yang authored Jan 16, 2024
```
remove client.py
```
  598d6d55
- do not cache prompt (#2018) · a897e833
  Bruce MacDonald authored Jan 16, 2024
```
- prompt cache causes inference to hang after some time
```
  a897e833
- Fix show parameters (#2017) · eef50acc
  Patrick Devine authored Jan 16, 2024
  
  eef50acc
- Merge pull request #1968 from jmorganca/mxyng/fix-request-retry · 05d53de7
  Michael Yang authored Jan 16, 2024
```
fix: request retry with error
```
  05d53de7
15 Jan, 2024 1 commit
- Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection · 8795447d
  Daniel Hiltgen authored Jan 14, 2024
```
improve cuda detection (rel. issue #1704)
```
  8795447d
14 Jan, 2024 4 commits
- Merge pull request #1988 from dhiltgen/fix_intel_mac · 95ad9a9f
  Daniel Hiltgen authored Jan 14, 2024
```
Fix typo in arm mac arch script
```
  95ad9a9f
- Fix typo in arm mac arch script · 3ca5f69c
  Daniel Hiltgen authored Jan 14, 2024
  
  3ca5f69c
- Merge pull request #1982 from dhiltgen/fix_intel_mac · cfa63379
  Daniel Hiltgen authored Jan 14, 2024
```
Fix intel mac build
```
  cfa63379
- Disable `mmap` with lora layers (#1985) · 557110d0
  Jeffrey Morgan authored Jan 13, 2024
  
  557110d0
13 Jan, 2024 3 commits
- Fix intel mac build · 2ecb2472
  Daniel Hiltgen authored Jan 13, 2024
```
Make sure we're building an x86 ext_server lib when cross-compiling
```
  2ecb2472
- add `gcc -lstdc++` flag for linux cpu (#1974) · 288ef8ff
  Jeffrey Morgan authored Jan 13, 2024
  
  288ef8ff
- use g++ to build `libext_server.so` on linux (#1972) · 4cf17990
  Jeffrey Morgan authored Jan 13, 2024
  
  4cf17990
12 Jan, 2024 10 commits
- Merge pull request #1961 from jmorganca/mxyng/rm-double-newline · b6c0ef1e
  Michael Yang authored Jan 12, 2024
```
remove double newlines in /set parameter
```
  b6c0ef1e
- Merge pull request #1971 from jmorganca/mxyng/max-context-length · 356d178f
  Michael Yang authored Jan 12, 2024
```
add max context length check
```
  356d178f
- add max context length check · eaed6f8c
  Michael Yang authored Jan 12, 2024
  
  eaed6f8c
- fix: request retry with error · cf29bd2d
  Michael Yang authored Jan 12, 2024
```
this fixes a subtle bug with makeRequestWithRetry where an HTTP status
error on a retried request will potentially not return the right err
```
  cf29bd2d
- improve cuda detection (rel. issue #1704) · 905862e1
  Fabian Preiss authored Jan 09, 2024
  
  905862e1
- Convert the REPL to use /api/chat for interactive responses (#1936) · 565f8a3c
  Patrick Devine authored Jan 12, 2024
  
  565f8a3c
- remove double newlines in /set parameter · 5121b7ac
  Michael Yang authored Jan 12, 2024
  
  5121b7ac
- Update README.md · a70262c6
  Michael Yang authored Jan 12, 2024
```
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
```
  a70262c6
- Add group delete to uninstall instructions (#1924) · 40a0a90a
  Tristram Oaten authored Jan 12, 2024
````
After executing the `userdel ollama` command, I saw this message:

```sh
  $ sudo userdel ollama
  userdel: group ollama not removed because it has other members.
```

Which reminded me that I had to remove the dangling group too. For completeness, the uninstall instructions should do this too.

Thanks!
````
  40a0a90a
- update readme · cbe20c43
  Michael Yang authored Jan 11, 2024
  
  cbe20c43
11 Jan, 2024 17 commits
- remove client.py · 5ffbbea1
  Michael Yang authored Jan 11, 2024
  
  5ffbbea1
- Merge pull request #1935 from dhiltgen/cpu_fallback · 3773fb64
  Daniel Hiltgen authored Jan 11, 2024
```
Fix up the CPU fallback selection
```
  3773fb64
- Fix up the CPU fallback selection · 7427fa13
  Daniel Hiltgen authored Jan 11, 2024
```
The memory changes and multi-variant change had some merge
glitches I missed.  This fixes them so we actually get the cpu llm lib
and best variant for the given system.
```
  7427fa13
- Merge pull request #1934 from jmorganca/mxyng/fix-slices · f84537e0
  Michael Yang authored Jan 11, 2024
```
fix build and lint
```
  f84537e0
- fix typo · d2be6387
  Michael Yang authored Jan 11, 2024
  
  d2be6387
- import fmt · d7af35d3
  Michael Yang authored Jan 11, 2024
  
  d7af35d3
- use x/exp/slices · defc1dbd
  Michael Yang authored Jan 11, 2024
  
  defc1dbd
- Merge pull request #1819 from dhiltgen/multi_variant · de2fbdec
  Daniel Hiltgen authored Jan 11, 2024
```
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
```
  de2fbdec
- Add semantic kernel to Readme (#1931) · f5faf79a
  Eduard van Valkenburg authored Jan 11, 2024
  
  f5faf79a
- Merge pull request #1552 from jmorganca/mxyng/lint-test · f4f939de
  Michael Yang authored Jan 11, 2024
```
add lint and test on pull_request
```
  f4f939de
- Always dynamically load the llm server library · 39928a42
  Daniel Hiltgen authored Jan 09, 2024
```
This switches darwin to dynamic loading, and refactors the code now that no
static linking of the library is used on any platform
```
  39928a42
- Build multiple CPU variants and pick the best · d88c527b
  Daniel Hiltgen authored Jan 07, 2024
```
This reduces the built-in Linux version to not use any vector extensions,
which enables the resulting builds to run under Rosetta on macOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available.
```
  d88c527b
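The runtime selection described in this commit can be pictured with a small Go sketch. The type `cpuFeatures`, the function `bestCPUVariant`, and the variant names are hypothetical and are not taken from the ollama source. Real detection would query the CPU instead of taking flags as input.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical summary of the vector extensions
// detected on the host at runtime.
type cpuFeatures struct {
	AVX  bool
	AVX2 bool
}

// bestCPUVariant returns the name of the most capable library variant
// the host can run, falling back to a build with no vector extensions.
func bestCPUVariant(f cpuFeatures) string {
	switch {
	case f.AVX2:
		return "cpu_avx2"
	case f.AVX:
		return "cpu_avx"
	default:
		// Baseline build: usable even where no vector extensions are
		// reported, as in the Rosetta case the commit message mentions.
		return "cpu"
	}
}

func main() {
	fmt.Println(bestCPUVariant(cpuFeatures{AVX: true}))
}
```

Keeping the baseline build free of vector extensions means there is always a variant that loads, and the faster variants are an optimization chosen on top of it.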
- fix gpu_test.go Error (same type) uint64->uint32 (#1921) · 3bc8b983
  Fabian Preiß authored Jan 11, 2024
  
  3bc8b983
- revisit memory allocation to account for full kv cache on main gpu · ab6be852
  Jeffrey Morgan authored Jan 11, 2024
  
  ab6be852
- DRY out the Dockerfile.build · 052b33b8
  Daniel Hiltgen authored Jan 06, 2024
  
  052b33b8
- Support multiple variants for a given llm lib type · 8da7bef0
  Daniel Hiltgen authored Jan 05, 2024
```
In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.

This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.
```
  8da7bef0
- Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) · b24e8d17
  Jeffrey Morgan authored Jan 10, 2024
```
* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc
```
  b24e8d17