- 16 Sep, 2024 1 commit
-
-
Pepo authored
-
- 15 Sep, 2024 1 commit
-
-
Edward Cui authored
-
- 13 Sep, 2024 1 commit
-
-
Daniel Hiltgen authored
scripts: fix incremental builds on linux or similar
-
- 12 Sep, 2024 6 commits
-
-
Daniel Hiltgen authored
Corrects x86_64 vs amd64 discrepancy
-
Daniel Hiltgen authored
* Optimize container images for startup. This change adjusts how to handle runner payloads to support container builds where we keep them extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images.
* Refactor payload logic and add buildx support for faster builds
* Move payloads around
* Review comments
* Converge to buildx based helper scripts
* Use docker buildx action for release
-
dcasota authored
-
Adrian Cole authored
-
RAPID ARCHITECT authored
-
Jesse Gross authored
runner: Flush pending responses before returning
-
- 11 Sep, 2024 6 commits
-
-
Jesse Gross authored
If there are any pending responses (such as from potential stop tokens) then we should send them back before ending the sequence. Otherwise, we can be missing tokens at the end of a response. Fixes #6707
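A minimal sketch of the idea in this message, in Python rather than the runner's own code. The function name `stream_with_stop` and the string-based stop matching are assumptions made for illustration. Text that might be the start of a stop sequence is held back, and whatever is still held when the sequence ends is flushed instead of dropped.

```python
def stream_with_stop(tokens, stop):
    """Yield text pieces, holding back any tail that could begin `stop`.

    Hypothetical helper for illustration; not the runner's implementation.
    """
    pending = ""
    for tok in tokens:
        pending += tok
        idx = pending.find(stop)
        if idx != -1:
            # Full stop sequence seen: emit the text before it and end.
            if idx > 0:
                yield pending[:idx]
            return
        # Find the longest suffix of `pending` that is a proper prefix of `stop`.
        hold = 0
        for n in range(min(len(stop) - 1, len(pending)), 0, -1):
            if stop.startswith(pending[-n:]):
                hold = n
                break
        emit = pending[:len(pending) - hold]
        if emit:
            yield emit
        pending = pending[len(pending) - hold:]
    # The sequence ended without a stop match: flush what was held back.
    if pending:
        yield pending
```

Without the final flush, a response that happens to end on a partial stop-sequence match would lose its last characters.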
-
Patrick Devine authored
-
Michael Yang authored
refactor show output
-
Michael Yang authored
fixes line wrapping on long texts
-
Petr Mironychev authored
-
Daniel Hiltgen authored
This adds back a check, lost many releases ago, that verifies /dev/kfd permissions. When those permissions are lacking, the result can be a confusing failure such as: "rocBLAS error: Could not initialize Tensile host: No devices found". This implementation does not hard fail the serve command but instead falls back to CPU with an error log. In the future we can include this in the GPU discovery UX to show devices we detected but do not support.
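The behaviour described above can be sketched as follows. This is an illustrative Python sketch, not the project's code. The function name `pick_backend`, the returned labels, and the choice to test for both read and write access are assumptions.

```python
import logging
import os


def pick_backend(kfd_path="/dev/kfd"):
    """Return "rocm" if the device node is usable, otherwise "cpu".

    Hypothetical helper: names and return values are illustrative only.
    """
    if not os.path.exists(kfd_path):
        # No device node at all: nothing to report, just use the CPU.
        return "cpu"
    if not os.access(kfd_path, os.R_OK | os.W_OK):
        # Log and fall back instead of failing hard.
        logging.error(
            "insufficient permissions on %s, falling back to CPU", kfd_path
        )
        return "cpu"
    return "rocm"
```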
-
- 10 Sep, 2024 5 commits
-
-
Michael Yang authored
add *_proxy to env map for debugging
-
Michael Yang authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
* Quiet down Docker's new lint warnings. Docker has recently added lint warnings to build. This cleans up those warnings.
* Fix go lint regression
-
Patrick Devine authored
-
- 08 Sep, 2024 2 commits
-
-
Jeffrey Morgan authored
-
RAPID ARCHITECT authored
-
- 07 Sep, 2024 3 commits
-
-
frob authored
-
Jeffrey Morgan authored
Includes small improvements to document layout and code blocks
-
Yaroslav authored
-
- 06 Sep, 2024 5 commits
-
-
nickthecook authored
-
imoize authored
-
Daniel Hiltgen authored
When we determine a GPU is too small for any layers, it's not always clear why. This will help troubleshoot those scenarios.
-
frob authored
-
Patrick Devine authored
-
- 05 Sep, 2024 10 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This reverts commit a60d9b89.
-
Daniel Hiltgen authored
With the new very large parameter models, some users are willing to wait for a very long time for models to load.
-
Daniel Hiltgen authored
Provide a mechanism for users to set aside an amount of VRAM on each GPU to make room for other applications they want to start after Ollama, or to work around memory prediction bugs.
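As a rough illustration of the arithmetic this implies, the sketch below subtracts a fixed reserve from each GPU's free memory before any scheduling decision. The function and parameter names are hypothetical, and the actual setting and accounting used by the project are not shown in this log.

```python
def usable_vram(free_bytes_per_gpu, reserve_bytes):
    """Subtract a per-GPU reserve from free VRAM, never going below zero.

    Hypothetical helper for illustration only.
    """
    return [max(0, free - reserve_bytes) for free in free_bytes_per_gpu]
```

For example, with a 1 GiB reserve, a GPU reporting 8 GiB free would be treated as having 7 GiB available, and a GPU with only 512 MiB free would be treated as having none.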
-
Daniel Hiltgen authored
-
Michael Yang authored
llama3.1 memory
-
Zeyo authored
-
Michael authored
* Update gpu.md. Seems strange that the laptop versions of 3050 and 3050 Ti would be supported but not the non-notebook, but this is what the page (https://developer.nvidia.com/cuda-gpus) says. Signed-off-by: bean5 <2052646+bean5@users.noreply.github.com>
* Update gpu.md. Remove notebook reference
---------
Signed-off-by: bean5 <2052646+bean5@users.noreply.github.com>
-
Tobias Heinze authored
-
Vitaly Zdanevich authored
-