- 22 May, 2025 (1 commit)
  - Harrison Saturley-Hall authored
- 20 May, 2025 (2 commits)
  - Harrison Saturley-Hall authored
  - Harrison Saturley-Hall authored
- 15 May, 2025 (2 commits)
  - mohammedabdulwahhab authored
  - Anant Sharma authored
- 14 May, 2025 (2 commits)
  - Anant Sharma authored
  - Harrison Saturley-Hall authored
- 13 May, 2025 (3 commits)
  - Biswa Panda authored
    Co-authored-by: Graham King <grahamk@nvidia.com>
    Co-authored-by: hongkuan <hongkuanz@nvidia.com>
    Co-authored-by: Ubuntu <ubuntu@crusoe-prod--inst-2wjuoekvfq72mlpdrcugujrtgfp.us-east1-a.compute.internal>
  - Anant Sharma authored
  - Hongkuan Zhou authored
- 12 May, 2025 (1 commit)
  - Anant Sharma authored
- 10 May, 2025 (2 commits)
  - ishandhanani authored
  - Harrison Saturley-Hall authored
- 09 May, 2025 (8 commits)
  - Harrison Saturley-Hall authored
    Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
  - ishandhanani authored
  - Graham King authored
    That avoids passing the `--model-config` param to dynamo-run when using llamacpp.
  - Harrison Saturley-Hall authored
  - wxsm authored
    Allow either password or TLS auth; if neither is provided, fall back to no auth (see the sketch after this list). Closes #657.
  - Biswa Panda authored
  - ishandhanani authored
    Co-authored-by: ishandhanani <ishandhananai@gmail.com>
  - Adit Ranadive authored
    NIXL uses UCX, which will have EFA support as of 1.19. Explicitly use the 1.19 branch of UCX with Dynamo.
    Signed-off-by: Adit Ranadive <aranadive@nvidia.com>
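As an illustration of the password/TLS fallback in the wxsm commit above, here is a minimal Rust sketch of that selection pattern. Everything in it (`AuthConfig`, `AuthMode`, `select_auth`, and the password-before-TLS precedence) is a hypothetical stand-in, not Dynamo's actual API.

```rust
// Hypothetical sketch of the auth fallback described in the commit above.
// None of these types exist in Dynamo; they only illustrate the pattern.

#[derive(Debug)]
enum AuthMode {
    Password { user: String, password: String },
    Tls { cert_path: String, key_path: String },
    None, // fallback when nothing is configured
}

struct AuthConfig {
    user: Option<String>,
    password: Option<String>,
    tls_cert: Option<String>,
    tls_key: Option<String>,
}

/// Pick password auth when credentials are set, TLS when both cert and
/// key are set, and otherwise fall back to no auth instead of erroring.
fn select_auth(cfg: &AuthConfig) -> AuthMode {
    match (&cfg.user, &cfg.password, &cfg.tls_cert, &cfg.tls_key) {
        (Some(u), Some(p), _, _) => AuthMode::Password {
            user: u.clone(),
            password: p.clone(),
        },
        (_, _, Some(c), Some(k)) => AuthMode::Tls {
            cert_path: c.clone(),
            key_path: k.clone(),
        },
        _ => AuthMode::None,
    }
}

fn main() {
    // An empty config degrades to AuthMode::None rather than failing.
    let cfg = AuthConfig { user: None, password: None, tls_cert: None, tls_key: None };
    println!("{:?}", select_auth(&cfg));
}
```

The ordering of the match arms carries the design: a fully empty config falls through to `AuthMode::None`, which is the "fall back to no auth" behaviour the commit describes.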
- 08 May, 2025 (9 commits)
  - Hongkuan Zhou authored
  - julienmancuso authored
    Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
  - hhzhang16 authored
  - Graham King authored
    - New mistralrs and llamacpp versions
    - mistralrs: handle Gemma 3 and Llama 4 as vision models
    - Update the dynamo-run docs to use Qwen 3
    - Our pre-processor now supports Llama 4's newer multi-modal `config.json`
    - Upgrade minijinja to handle Qwen 3's prompt template
    For Llama 4 we'll need to limit the max seq len (a back-of-the-envelope KV-cache estimate follows this list). vllm says:
    > To serve at least one request with the model's max seq len (10485760), 240.00 GiB KV cache is needed, ...
    I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.
  - Ryan McCormick authored
  - Anthony Casagrande authored
    Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
  - Yan Ru Pei authored
  - Anant Sharma authored
  - hhzhang16 authored
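On the KV-cache number quoted in the Graham King commit above: at a max seq len of 10,485,760 tokens, 240.00 GiB works out to roughly 24 KiB of K/V state per token. A generic back-of-the-envelope estimate of that footprint (a standard formula, not necessarily vllm's exact accounting; the symbols are illustrative, not values taken from the commit):

```latex
% Generic KV-cache size estimate (assumed formula, not vLLM's exact accounting).
% One K and one V tensor per attention layer, stored for every token;
% b is bytes per element (2 for fp16/bf16).
\[
\text{KV bytes} \approx 2 \cdot n_{\text{layers}} \cdot n_{\text{kv\_heads}}
                 \cdot d_{\text{head}} \cdot L_{\text{seq}} \cdot b
\]
```

Because the estimate is linear in the sequence length, capping the max seq len is the practical lever, which is what the commit does for Llama 4.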
- 07 May, 2025 (10 commits)
  - Hongkuan Zhou authored
  - Kris Hung authored
  - Graham King authored
    Signed-off-by: Graham King <graham@gkgk.org>
    Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  - Ryan McCormick authored
  - Biswa Panda authored
  - Tanmay Verma authored
    Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
    Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  - 祝健聪 authored
    Signed-off-by: Chasing1020 <chasing1020@gmail.com>
  - Anthony Casagrande authored
  - Graham King authored
    vllm and sglang are now the sub-process engines from #954. Also updated the docs on running vllm and sglang multi-GPU (tensor parallel) and multi-node (pipeline parallel).
  - ptarasiewiczNV authored