- 18 Sep, 2024 1 commit
Jeffrey Morgan authored
- 12 Sep, 2024 1 commit
Daniel Hiltgen authored
* Optimize container images for startup
  This change adjusts how runner payloads are handled to support container builds, where they are kept extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images.
* Refactor payload logic and add buildx support for faster builds
* Move payloads around
* Review comments
* Converge to buildx-based helper scripts
* Use docker buildx action for release
- 11 Sep, 2024 1 commit
Patrick Devine authored
- 05 Sep, 2024 3 commits
Daniel Hiltgen authored
This reverts commit a60d9b89.
Daniel Hiltgen authored
Tobias Heinze authored
- 28 Aug, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 27 Aug, 2024 2 commits
Michael Yang authored
Jeffrey Morgan authored
- 23 Aug, 2024 1 commit
Patrick Devine authored
- 22 Aug, 2024 1 commit
Daniel Hiltgen authored
* Fix embeddings memory corruption
  The patch was causing a buffer overrun and memory corruption. Once it was removed, however, parallelism in server.cpp led to hitting an assert because slot/seq IDs could be >= the token count. To work around this, only use slot 0 for embeddings.
* Fix embed integration test assumption
  The token eval count has changed with recent llama.cpp bumps (0.3.5+).
- 21 Aug, 2024 1 commit
Michael Yang authored
- 19 Aug, 2024 1 commit
Jeffrey Morgan authored
- 18 Aug, 2024 2 commits
Richard Lyons authored
Richard Lyons authored
- 17 Aug, 2024 1 commit
Richard Lyons authored
- 16 Aug, 2024 1 commit
zwwhdls authored
Signed-off-by: zwwhdls <zww@hdls.me>
- 15 Aug, 2024 1 commit
Michael Yang authored
- 14 Aug, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 13 Aug, 2024 3 commits
Blake Mizerany authored
The previous value of 64 was WAY too high and unnecessary; it was well past the point of diminishing returns. This is a more reasonable number for _most_ normal cases. For users on cloud servers with excellent network quality, downloads will still scream along without hitting our CDN limits. For users with relatively poor network quality, it keeps them from saturating their network and causing other issues.
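A minimal sketch of the idea, capping in-flight part downloads with a buffered-channel semaphore; downloadParts and fetchPart are hypothetical names, and the limit below is a placeholder since the message does not quote the new value:

```go
package main

import "sync"

// Placeholder limit: the commit lowers the cap from 64, but the new value is
// not quoted in the message above, so this number is illustrative only.
const maxDownloadParts = 8

// downloadParts caps concurrent part downloads with a buffered-channel
// semaphore; fetchPart is a hypothetical stand-in for the real chunk fetcher.
func downloadParts(parts []int, fetchPart func(part int) error) {
	sem := make(chan struct{}, maxDownloadParts)
	var wg sync.WaitGroup
	for _, p := range parts {
		wg.Add(1)
		sem <- struct{}{} // blocks once maxDownloadParts fetches are in flight
		go func(part int) {
			defer wg.Done()
			defer func() { <-sem }()
			_ = fetchPart(part) // error handling elided in this sketch
		}(p)
	}
	wg.Wait()
}

func main() {
	downloadParts([]int{0, 1, 2, 3}, func(part int) error { return nil })
}
```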
Michael Yang authored
- fixes printf: non-constant format string in call to fmt.Printf
- fixes SA1032: arguments have the wrong order
- disables testifylint
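A hypothetical illustration of the first two findings (the real call sites in the repository differ): the printf fix is to pass a constant format string, and SA1032 flags errors.Is calls whose (err, target) arguments appear swapped.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

func main() {
	msg := "user-supplied %s text"

	// printf check: use a constant format string instead of fmt.Printf(msg).
	fmt.Printf("%s\n", msg)

	// SA1032: errors.Is takes (err, target), not (target, err).
	_, err := os.Open("does-not-exist")
	if errors.Is(err, os.ErrNotExist) {
		fmt.Println("file is missing")
	}
}
```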
royjhan authored
* load on empty input
* no load on invalid input
- 12 Aug, 2024 3 commits
- 11 Aug, 2024 1 commit
Jeffrey Morgan authored
For simplicity, parallelize embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler, since it builds on existing parallel request handling, similar to the existing text completion functionality.
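A minimal sketch of handler-side fan-out, assuming an errgroup over per-input calls; embedAll and embedOne are hypothetical names, not the repository's actual functions:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// embedOne is a hypothetical stand-in for a single embedding call into the
// runner; the real function has a different signature.
func embedOne(ctx context.Context, input string) ([]float32, error) {
	_ = ctx
	return []float32{float32(len(input))}, nil
}

// embedAll fans the inputs out from the API handler, mirroring how parallel
// requests are already scheduled for text completion.
func embedAll(ctx context.Context, inputs []string) ([][]float32, error) {
	results := make([][]float32, len(inputs))
	g, ctx := errgroup.WithContext(ctx)
	for i, input := range inputs {
		i, input := i, input // capture loop variables for the goroutine
		g.Go(func() error {
			emb, err := embedOne(ctx, input)
			if err != nil {
				return err
			}
			results[i] = emb
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return results, nil
}

func main() {
	embs, err := embedAll(context.Background(), []string{"a", "bb", "ccc"})
	fmt.Println(len(embs), err)
}
```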
- 09 Aug, 2024 1 commit
Daniel Hiltgen authored
It seems this can fail in some cases, but proceed with the download anyway.
- 08 Aug, 2024 2 commits
Jitang Lei authored
Signed-off-by: Jitang Lei <leijitang@outlook.com>
Jesse Gross authored
Commit 1829fb61 ("manifest: Fix crash on startup when trying to clean up unused files (#5840)") changed the config layer stored in manifests from a pointer to a value. This was done to avoid potential nil pointer dereferences after it is deserialized from JSON in the event that the field is missing. This change stores the Layers slice by value as well, so the two objects are handled consistently.
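A simplified sketch of why storing these fields by value avoids the nil dereference; the Manifest and Layer shapes below are illustrative, not the repository's real definitions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative shapes only; the repository's Manifest and Layer types have
// more fields.
type Layer struct {
	Digest string `json:"digest"`
	Size   int64  `json:"size"`
}

type Manifest struct {
	// Stored by value: a manifest with no "config" field decodes to a zero
	// Layer rather than a nil pointer that could be dereferenced later.
	Config Layer   `json:"config"`
	Layers []Layer `json:"layers"`
}

func main() {
	var m Manifest
	_ = json.Unmarshal([]byte(`{"layers":[]}`), &m) // "config" is absent
	fmt.Println(m.Config.Digest == "")              // true, and no panic
}
```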
- 07 Aug, 2024 3 commits
Jesse Gross authored
When creating a model, the config layer is appended to the list of layers and the last layer is then used as the config when writing the manifest. This change uses the config layer directly when writing the manifest. There is no behavior change, but it is less error-prone.
Jesse Gross authored
Currently, if the config field is missing from the manifest file (or corrupted), Ollama will crash when it tries to read it. This can happen at startup or when pulling new models. This data is mostly used for showing model information, so we can tolerate it being absent - it is not required to run the models. Besides avoiding the crash, this also gives us the ability to restructure the config in the future by pulling it into the main manifest file.
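A rough sketch of the tolerant read described above, assuming the config is a separate JSON blob used only for display; modelInfo is a hypothetical helper, not the actual implementation:

```go
package manifest

import (
	"encoding/json"
	"os"
)

// modelInfo is a hypothetical helper: it returns whatever display-only config
// it can read, and an empty map when the blob is missing or corrupt, instead
// of treating either case as fatal.
func modelInfo(path string) map[string]any {
	info := map[string]any{}
	data, err := os.ReadFile(path)
	if err != nil {
		return info // config missing: tolerate it rather than crash
	}
	if err := json.Unmarshal(data, &info); err != nil {
		return info // config corrupt: likewise tolerate it
	}
	return info
}
```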
Jesse Gross authored
If there is an error when opening a manifest file (corrupted, permission denied, etc.), the referenced layers will not be included in the list of active layers. This causes them to be deleted when pruning happens at startup or when a model is pulled. In such a situation, we should prefer to preserve data in the hope that it can be recovered rather than being aggressive about deletion.
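A sketch of the safer pruning policy under these assumptions; pruneLayers and layerDigests are hypothetical names and the real code is organized differently:

```go
package prune

import "fmt"

// layerDigests is a hypothetical stand-in that reads one manifest and returns
// the layer digests it references.
func layerDigests(path string) ([]string, error) { return nil, nil }

// pruneLayers aborts the prune if any manifest cannot be read, so layers that
// might still be referenced are preserved instead of deleted.
func pruneLayers(manifests []string, deleteUnreferenced func(used map[string]bool)) error {
	used := map[string]bool{}
	for _, path := range manifests {
		digests, err := layerDigests(path)
		if err != nil {
			// Cannot tell what this manifest references; prefer keeping data.
			return fmt.Errorf("manifest %s unreadable, skipping prune: %w", path, err)
		}
		for _, d := range digests {
			used[d] = true
		}
	}
	deleteUnreferenced(used)
	return nil
}
```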
- 06 Aug, 2024 1 commit
Daniel Hiltgen authored
The file.Truncate call on Windows will write the whole file unless the sparse flag is set, leading to heavy I/O at the beginning of a download. This should improve our I/O behavior on Windows and put less stress on the user's disk.
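A minimal Windows-only sketch of marking the partially downloaded file sparse before pre-sizing it, assuming golang.org/x/sys/windows; this illustrates the idea, not the repository's implementation:

```go
//go:build windows

package main

import (
	"os"

	"golang.org/x/sys/windows"
)

// FSCTL_SET_SPARSE marks a file as sparse, so extending it only reserves the
// range instead of writing zeroes to disk.
const fsctlSetSparse = 0x000900C4

// setSparse is a sketch of the idea, not the repository's implementation.
func setSparse(f *os.File) error {
	var returned uint32
	return windows.DeviceIoControl(
		windows.Handle(f.Fd()), fsctlSetSparse,
		nil, 0, nil, 0, &returned, nil,
	)
}

func main() {
	f, err := os.Create("part.bin")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := setSparse(f); err != nil {
		panic(err)
	}
	// Pre-sizing is now cheap: no zero-fill I/O up front.
	if err := f.Truncate(1 << 30); err != nil {
		panic(err)
	}
}
```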
- 02 Aug, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 01 Aug, 2024 3 commits
Vyacheslav Moskalev authored
Vyacheslav Moskalev authored
Vyacheslav Moskalev authored