- 18 Oct, 2024 1 commit
  - Patrick Devine authored
    Co-authored-by: jmorganca <jmorganca@gmail.com>
    Co-authored-by: Michael Yang <mxyng@pm.me>
    Co-authored-by: Jesse Gross <jesse@ollama.com>
- 01 Oct, 2024 1 commit
  - Alex Mavrogiannis authored
- 11 Sep, 2024 2 commits
  - Patrick Devine authored
  - Michael Yang authored
    Fixes line wrapping on long texts.
- 05 Sep, 2024 2 commits
  - Daniel Hiltgen authored
    With the new very-large-parameter models, some users are willing to wait a very long time for a model to load.
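A minimal sketch of the kind of override this points at: bounding model load with a user-configurable timeout instead of a hard-coded one. The variable name OLLAMA_LOAD_TIMEOUT and the 5-minute fallback are assumptions here, not confirmed by this log:

```go
// Sketch only: a load timeout read from the environment, with a fallback.
// OLLAMA_LOAD_TIMEOUT and the 5m default are assumed names/values.
package main

import (
	"context"
	"fmt"
	"os"
	"time"
)

func loadTimeout() time.Duration {
	if d, err := time.ParseDuration(os.Getenv("OLLAMA_LOAD_TIMEOUT")); err == nil && d > 0 {
		return d
	}
	return 5 * time.Minute // fallback when unset or malformed
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), loadTimeout())
	defer cancel()
	// A real loader would block on ctx until the weights are resident.
	deadline, _ := ctx.Deadline()
	fmt.Println("model load must finish by:", deadline)
}
```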
  - Daniel Hiltgen authored
    Provide a mechanism for users to set aside an amount of VRAM on each GPU to make room for other applications they want to start after Ollama, or to work around memory-prediction bugs.
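A minimal Go sketch of how such a per-GPU reservation might be applied during scheduling. The variable name OLLAMA_GPU_OVERHEAD and its bytes unit are assumptions for illustration:

```go
// Sketch: subtract a user-configured per-GPU reservation from the free
// VRAM figure before placing model layers. Assumes the reservation is a
// byte count in an environment variable (OLLAMA_GPU_OVERHEAD, assumed).
package main

import (
	"fmt"
	"os"
	"strconv"
)

// usableVRAM returns how much of a GPU's free memory the scheduler may use
// after honoring the user's reservation.
func usableVRAM(freeBytes uint64) uint64 {
	overhead, err := strconv.ParseUint(os.Getenv("OLLAMA_GPU_OVERHEAD"), 10, 64)
	if err != nil {
		return freeBytes // unset or malformed: reserve nothing
	}
	if overhead >= freeBytes {
		return 0 // reservation swallows the whole GPU
	}
	return freeBytes - overhead
}

func main() {
	fmt.Println(usableVRAM(8 << 30)) // e.g. an 8 GiB GPU
}
```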
- 01 Sep, 2024 1 commit
  - Vimal Kumar authored
- 23 Aug, 2024 1 commit
  - Patrick Devine authored
- 21 Aug, 2024 1 commit
  - Michael Yang authored
- 14 Aug, 2024 1 commit
  - longtao authored
    Fix typo and improve readability
    Summary:
    * Rename updatAvailableMenuID to updateAvailableMenuID
    * Replace unused cmd parameter with _ in RunServer function
    * Fix typos in comments
    (cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)
    * Update api/client.go
    Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
- 12 Aug, 2024 1 commit
  - Josh authored
- 02 Aug, 2024 1 commit
  - Michael Yang authored
- 26 Jul, 2024 2 commits
  - Michael Yang authored
  - Michael Yang authored
- 23 Jul, 2024 1 commit
  - Daniel Hiltgen authored
- 22 Jul, 2024 2 commits
  - Michael Yang authored
  - Daniel Hiltgen authored
    The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM scenarios. With concurrency support it was no longer wired up, and the simplistic value doesn't map to multi-GPU setups. Users can still set `num_gpu` to limit memory usage and avoid OOM if we get our predictions wrong.
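A hedged sketch of the remaining knob: capping GPU memory use by limiting offloaded layers with the `num_gpu` option in a request. The model name and layer count are illustrative:

```go
// Sketch: send a generate request with num_gpu limited, so only that many
// layers are offloaded to GPU and the rest stay on CPU.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3", // illustrative model name
		"prompt": "hi",
		"stream": false,
		"options": map[string]any{
			"num_gpu": 20, // illustrative layer cap
		},
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```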
- 14 Jul, 2024 1 commit
  - Patrick Devine authored
- 28 Jun, 2024 2 commits
- 27 Jun, 2024 1 commit
  - Michael Yang authored
- 25 Jun, 2024 1 commit
  - Blake Mizerany authored
    This commit changes the 'ollama run' command to defer fetching model information until it really needs it, that is, when in interactive mode. It also removes one case where the model information was fetched in duplicate: just before calling generateInteractive, and then again, first thing, in generateInteractive. This positively impacts the performance of the command:
    ; time ./before run llama3 'hi'  ->  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'  ->  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'  ->  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'   ->  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'   ->  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'   ->  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'   ->  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'   ->  0.02s user 0.01s system 5% cpu 0.507 total
    (each run also prints the model's short greeting, elided here)
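A minimal sketch of the deferral pattern described here, using Go's sync.OnceValue so the fetch runs at most once and only on the path that needs it. The names fetchModelInfo and ModelInfo are illustrative, not the actual code:

```go
// Sketch: wrap an expensive fetch in a lazily evaluated closure so the
// non-interactive path never pays for it.
package main

import (
	"fmt"
	"sync"
	"time"
)

type ModelInfo struct{ Family string }

func fetchModelInfo(name string) ModelInfo {
	time.Sleep(500 * time.Millisecond) // stand-in for a network round trip
	return ModelInfo{Family: "llama"}
}

func main() {
	interactive := true
	// info() performs the fetch on first call, then returns the cached value.
	info := sync.OnceValue(func() ModelInfo { return fetchModelInfo("llama3") })

	if interactive {
		// Only the interactive path needs the details.
		fmt.Println("family:", info().Family)
		fmt.Println("again:", info().Family) // cached; no second fetch
	}
}
```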
- 19 Jun, 2024 1 commit
  - royjhan authored
    API Show Extended
    * Initial Draft of Information
      Co-Authored-By: Patrick Devine <pdevine@sonic.net>
    * Clean Up
    * Descriptive arg error messages and other fixes
    * Second Draft of Show with Projectors Included
    * Remove Chat Template
    * Touches
    * Prevent wrapping from files
    * Verbose functionality
    * Docs
    * Address Feedback
    * Lint
    * Resolve Conflicts
    * Function Name
    * Tests for api/show model info
    * Show Test File
    * Add Projector Test
    * Clean routes
    * Projector Check
    * Move Show Test
    * Touches
    * Doc update
    Co-authored-by: Patrick Devine <pdevine@sonic.net>
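A hedged example of calling the extended endpoint this change grew. The request shape follows the public /api/show route with its verbose flag; the exact response fields are worth checking against the API docs:

```go
// Sketch: POST /api/show with verbose set, printing the raw JSON reply
// (modelfile, parameters, template, model info, and so on).
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{"model": "llama3", "verbose": true}`) // illustrative model name
	resp, err := http.Post("http://localhost:11434/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```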
- 12 Jun, 2024 1 commit
  - Patrick Devine authored
- 04 Jun, 2024 4 commits
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
- 30 May, 2024 3 commits
  - Josh Yan authored
  - Josh Yan authored
  - Lei Jitang authored
    * envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY
    * serve: Add more env vars to the help message of ollama serve
      Add more environment variables to `ollama serve --help` to let users know what can be configured.
    Signed-off-by: Lei Jitang <leijitang@outlook.com>
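A minimal sketch of the pattern this commit applies: appending known environment variables to a command's --help output so they are discoverable. The variable list here is abbreviated and the descriptions are paraphrased, not the actual help text:

```go
// Sketch: a custom flag.Usage that lists environment variables after the
// regular flag help. Map iteration order is unordered; a real help screen
// would sort the names.
package main

import (
	"flag"
	"fmt"
	"os"
)

var envHelp = map[string]string{
	"OLLAMA_HOST":        "IP address and port the server binds to",
	"OLLAMA_LLM_LIBRARY": "force a specific LLM library",
	"OLLAMA_MAX_QUEUE":   "maximum number of queued requests",
}

func main() {
	flag.Usage = func() {
		fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0])
		flag.PrintDefaults()
		fmt.Fprintln(os.Stderr, "\nEnvironment Variables:")
		for name, desc := range envHelp {
			fmt.Fprintf(os.Stderr, "  %s\t%s\n", name, desc)
		}
	}
	flag.Parse()
}
```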
- 24 May, 2024 1 commit
  - Patrick Devine authored
- 20 May, 2024 2 commits
  - Patrick Devine authored
  - Patrick Devine authored
- 18 May, 2024 1 commit
  - Patrick Devine authored
- 16 May, 2024 3 commits
- 15 May, 2024 2 commits