- 04 Mar, 2025 1 commit
-
-
Daniel Hiltgen authored
* Include unified vision layers in memory prediction

  For newer vision models with a single gguf, include the projection estimates.

* Adjust CLI to handle both styles of vision model metadata

* Wire up new tokenizers for new engine

  If we're loading the new engine, use the new model text processor instead of calling into cgo wrappers for llama.cpp. This also cleans up some tech debt from the older tokenization flow for the C++ server, which was no longer used. It also adjusts the grammar handling logic to pass through to the new engine instead of using the cgo schema-to-grammar call.

* Lay foundation for auto selection of new engine
-
- 03 Mar, 2025 1 commit
-
-
CYJiang authored
-
- 14 Feb, 2025 1 commit
-
-
Jesse Gross authored
This provides integration with the new Ollama engine (58245413 next ollama runner (#7913)) and the rest of the Ollama infrastructure such as the runner and Ollama server. In addition, it builds out the KV cache infrastructure to support the requirements of how Ollama runs models, such as:

- Parallel processing
- Memory management for defragmentation and shifting
- Multi-modal models

Both old and new engines continue to be supported. By default, only the old engine is used. To enable the new engine:

1. Start the server with the OLLAMA_NEW_ENGINE environment variable set:
   `OLLAMA_NEW_ENGINE=1 ./ollama serve`
2. Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
   `./ollama run jessegross/llama3.1`
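As a rough illustration of the cache requirements listed above (parallel sequences plus shifting out old entries), here is a hedged toy sketch in Go; the types and method names are invented for this example and are not Ollama's actual kvcache code.

```go
// Toy sketch only: per-sequence cached entries with a simple "shift" operation.
package main

import "fmt"

type entry struct {
	pos   int       // token position in the sequence
	key   []float32 // cached key vector (placeholder)
	value []float32 // cached value vector (placeholder)
}

// toyCache keeps one slice of cached entries per parallel sequence.
type toyCache struct {
	seqs map[int][]entry
}

func newToyCache() *toyCache { return &toyCache{seqs: make(map[int][]entry)} }

// Put appends a cached key/value pair for a sequence at a given position.
func (c *toyCache) Put(seq, pos int, k, v []float32) {
	c.seqs[seq] = append(c.seqs[seq], entry{pos: pos, key: k, value: v})
}

// Shift drops everything before keepFrom, e.g. when the context window slides.
func (c *toyCache) Shift(seq, keepFrom int) {
	kept := c.seqs[seq][:0]
	for _, e := range c.seqs[seq] {
		if e.pos >= keepFrom {
			kept = append(kept, e)
		}
	}
	c.seqs[seq] = kept
}

func main() {
	c := newToyCache()
	for pos := 0; pos < 8; pos++ {
		c.Put(0, pos, nil, nil) // sequence 0
		c.Put(1, pos, nil, nil) // sequence 1, processed in parallel
	}
	c.Shift(0, 4) // slide sequence 0's window forward
	fmt.Println("seq 0 entries:", len(c.seqs[0]), "seq 1 entries:", len(c.seqs[1]))
}
```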
-
- 16 Jan, 2025 1 commit
-
-
Patrick Devine authored
-
- 11 Jan, 2025 1 commit
-
-
Patrick Devine authored
-
- 09 Jan, 2025 1 commit
-
-
Patrick Devine authored
-
- 01 Jan, 2025 1 commit
-
-
Patrick Devine authored
Changes `POST /api/create` to use JSON instead of a Modelfile. This is a breaking change.
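For illustration, a minimal client-side call to the JSON-based create endpoint might look like the Go sketch below. The exact request fields are not spelled out in this log; `model`, `from`, and `system` are assumptions to verify against the current API documentation.

```go
// Hedged sketch: create a model via the JSON create API (field names assumed).
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "mario",    // name of the model to create (assumed field)
		"from":   "llama3.1", // base model to derive from (assumed field)
		"system": "You are Mario from Super Mario Bros.",
	})
	resp, err := http.Post("http://localhost:11434/api/create", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status) // the server streams progress as JSON lines
}
```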
-
- 11 Dec, 2024 1 commit
-
-
Blake Mizerany authored
Fixes #7944
-
- 10 Dec, 2024 1 commit
-
-
Daniel Hiltgen authored
* llama: wire up builtin runner

  This adds a new entrypoint into the ollama CLI to run the cgo-built runner. On Mac arm64, this will have GPU support, but on all other platforms it will be the lowest common denominator CPU build. After we fully transition to the new Go runners, more tech debt can be removed and we can stop building the "default" runner via make and always rely on the builtin.

* build: Make target improvements

  Add a few new targets and help for building locally. This also adjusts the runner lookup to favor local builds, then runners relative to the executable, and finally payloads (see the sketch after this list).

* Support customized CPU flags for runners

  This implements a simplified custom CPU flags pattern for the runners. When built without overrides, the runner name contains the vector flag we check for (AVX) to ensure we don't try to run on unsupported systems and crash. If the user builds a customized set, we omit the naming scheme and don't check for compatibility. This avoids checking requirements at runtime, so that logic has been removed as well. This can be used to build GPU runners with no vector flags, or CPU/GPU runners with additional flags (e.g. AVX512) enabled.

* Use relative paths

  If the user checks out the repo in a path that contains spaces, make gets really confused, so use relative paths for everything in-repo to avoid breakage.

* Remove payloads from main binary

* install: clean up prior libraries

  This removes support for v0.3.6 and older versions (before the tar bundle) and ensures we clean up prior libraries before extracting the bundle(s). Without this change, runners and dependent libraries could leak when we update and lead to subtle runtime errors.
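A hedged Go sketch of the runner lookup order described above (local builds, then runners relative to the executable, then payloads); the directory names here are invented for illustration and do not reflect the real layout.

```go
// Illustrative only: probe candidate runner directories in priority order.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func findRunnerDir() (string, bool) {
	exe, _ := os.Executable()
	candidates := []string{
		filepath.Join("build", "runners"),                // local build (assumed path)
		filepath.Join(filepath.Dir(exe), "runners"),      // relative to the executable
		filepath.Join(os.TempDir(), "ollama", "runners"), // extracted payloads (assumed path)
	}
	for _, dir := range candidates {
		if info, err := os.Stat(dir); err == nil && info.IsDir() {
			return dir, true
		}
	}
	return "", false
}

func main() {
	if dir, ok := findRunnerDir(); ok {
		fmt.Println("using runners from:", dir)
	} else {
		fmt.Println("no runner directory found")
	}
}
```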
-
- 06 Dec, 2024 1 commit
-
-
Parth Sareen authored
-
- 05 Dec, 2024 2 commits
-
-
Parth Sareen authored
-
Parth Sareen authored
Adds structured outputs to the chat endpoint.

Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>
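A hedged Go sketch of calling the chat endpoint with structured outputs follows; it assumes the JSON schema is passed in a `format` field, which should be checked against the current API reference.

```go
// Sketch: ask the chat endpoint to reply conforming to a JSON schema.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"capital":    map[string]any{"type": "string"},
			"population": map[string]any{"type": "integer"},
		},
		"required": []string{"capital", "population"},
	}
	body, _ := json.Marshal(map[string]any{
		"model":    "llama3.1",
		"messages": []map[string]string{{"role": "user", "content": "Tell me about Canada."}},
		"format":   schema, // constrain the reply to this JSON schema (assumed field)
		"stream":   false,
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```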
-
- 03 Dec, 2024 1 commit
-
-
Sam authored
-
- 25 Nov, 2024 1 commit
-
-
Bruce MacDonald authored
After a user pushes their model, it is not clear what to do next. Add a link to the output of `ollama push` that tells the user where their model can now be found.
-
- 22 Nov, 2024 2 commits
-
-
Bruce MacDonald authored
In the past, the ollama.com server would return a JWT that contained information about the user being authenticated. This was used to return different error messages to the user. This is no longer possible since the token used to authenticate no longer contains information about the user. This change removes the code, which no longer works. Follow-up changes will improve the error messages returned here, but it is good to clean up first.
-
Daniel Hiltgen authored
This avoids emitting the progress indicators to stderr, and the interactive prompts to the output file or pipe. Running `ollama run model > out.txt` now exits immediately, and `echo hello | ollama run model > out.txt` produces zero stderr output and a typical response in out.txt.
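A minimal Go sketch of the underlying idea, using terminal detection to decide when to emit progress and interactive prompts; this is an illustration of the approach, not the actual CLI code.

```go
// Only show progress/prompts when attached to a terminal, so redirected
// output stays clean.
package main

import (
	"fmt"
	"os"

	"golang.org/x/term"
)

func main() {
	interactive := term.IsTerminal(int(os.Stdout.Fd()))
	showProgress := term.IsTerminal(int(os.Stderr.Fd()))

	if showProgress {
		fmt.Fprintln(os.Stderr, "loading model...") // spinner would go here
	}
	if interactive {
		fmt.Println("entering interactive prompt")
	} else {
		fmt.Println("reading prompt from stdin and writing only the response")
	}
}
```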
-
- 14 Nov, 2024 1 commit
-
-
Blake Mizerany authored
-
- 25 Oct, 2024 1 commit
-
-
Daniel Hiltgen authored
If we're not writing out to a terminal, avoid setting the console mode on Windows, which corrupts the output file.
-
- 22 Oct, 2024 1 commit
-
-
Patrick Devine authored
-
- 18 Oct, 2024 1 commit
-
-
Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
-
- 01 Oct, 2024 1 commit
-
-
Alex Mavrogiannis authored
-
- 11 Sep, 2024 2 commits
-
-
Patrick Devine authored
-
Michael Yang authored
Fixes line wrapping on long texts.
-
- 05 Sep, 2024 2 commits
-
-
Daniel Hiltgen authored
With the new very large models, some users are willing to wait a very long time for models to load.
-
Daniel Hiltgen authored
Provide a mechanism for users to set aside an amount of VRAM on each GPU to make room for other applications they want to start after Ollama, or to work around memory prediction bugs.
-
- 01 Sep, 2024 1 commit
-
-
Vimal Kumar authored
-
- 23 Aug, 2024 1 commit
-
-
Patrick Devine authored
-
- 21 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 14 Aug, 2024 1 commit
-
-
longtao authored
* Fix typo and improve readability

  Summary:
  * Rename updatAvailableMenuID to updateAvailableMenuID
  * Replace unused cmd parameter with _ in RunServer function
  * Fix typos in comments

  (cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
- 12 Aug, 2024 1 commit
-
-
Josh authored
-
- 02 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 26 Jul, 2024 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
- 23 Jul, 2024 1 commit
-
-
Daniel Hiltgen authored
-
- 22 Jul, 2024 2 commits
-
-
Michael Yang authored
-
Daniel Hiltgen authored
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM scenarios. With concurrency, this was no longer wired up, and the simplistic value doesn't map to multi-GPU setups. Users can still set `num_gpu` to limit memory usage and avoid OOM if we get our predictions wrong.
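For example, a hedged Go sketch of limiting GPU offload through `num_gpu` in request options; the endpoint shape follows the public generate API, and the layer count is a placeholder to adapt to your hardware.

```go
// Sketch: cap GPU memory use by offloading fewer layers via num_gpu.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3.1",
		"prompt": "Why is the sky blue?",
		"stream": false,
		"options": map[string]any{
			"num_gpu": 20, // offload only 20 layers; keep the rest on the CPU
		},
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```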
-
- 14 Jul, 2024 1 commit
-
-
Patrick Devine authored
-
- 28 Jun, 2024 2 commits
- 27 Jun, 2024 1 commit
-
-
Michael Yang authored
-