- 17 Mar, 2025 38 commits
-
ishandhanani authored
-
Suman Tatiraju authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Neelay Shah authored
-
ishandhanani authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
kkranen authored
-
ishandhanani authored
Signed-off-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
-
Neelay Shah authored
-
Alec authored
Co-authored-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
-
Harrison Saturley-Hall authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Suman Tatiraju authored
Co-authored-by: Vikram Sharma <vsm2@illinois.edu>
Co-authored-by: Ziqi Fan <ziqif@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
ptarasiewiczNV authored
Co-authored-by: ptarasiewicz@nvidia.com <Piotr Tarasiewicz>
-
Alec authored
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Sean <choishsean@gmail.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Neelay Shah authored
-
Suman Tatiraju authored
-
Suman Tatiraju authored
-
Suman Tatiraju authored
-
Hongkuan Zhou authored
-
Meenakshi Sharma authored
-
Meenakshi Sharma authored
-
Anant Sharma authored
-
Harrison Saturley-Hall authored
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Neelay Shah authored
-
Graham King authored
-
Anant Sharma authored
-
Harrison Saturley-Hall authored
-
Anant Sharma authored
-
Graham King authored
Previously, several parts of the stack ensured that max tokens (for the individual request) was set. Now only text input sets it (to 8k); everything else leaves it as is, potentially blank. The engines themselves have very small defaults: 16 for vllm and 128 for sglang.
Also fix the dynamo-run CUDA startup message so that it only prints when we're using an engine that would benefit from it (mistralrs, llamacpp).
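A rough sketch of the max-tokens behaviour described in this commit message, not the actual dynamo-run code. The function names, the `input_kind` strings, and reading "8k" as 8192 are all assumptions made for illustration; only the engine defaults (16 for vllm, 128 for sglang) come from the message itself.

```python
from typing import Optional

# Assumption: "8k" in the commit message is read as 8192 tokens.
TEXT_INPUT_MAX_TOKENS = 8192

# Engine defaults as quoted in the commit message.
ENGINE_DEFAULT_MAX_TOKENS = {"vllm": 16, "sglang": 128}


def forwarded_max_tokens(input_kind: str, requested: Optional[int]) -> Optional[int]:
    """Hypothetical helper: the max-tokens value passed on to the engine.

    None means the field is left blank.
    """
    if requested is not None:
        # Assumption: an explicit per-request value is passed through unchanged.
        return requested
    if input_kind == "text":
        # Only the text input path fills in a value.
        return TEXT_INPUT_MAX_TOKENS
    # Every other input leaves the field as is.
    return None


def effective_max_tokens(engine: str, forwarded: Optional[int]) -> Optional[int]:
    """Hypothetical helper: the limit that applies once the engine default kicks in."""
    if forwarded is not None:
        return forwarded
    # Engines not named in the commit message return None (unknown default).
    return ENGINE_DEFAULT_MAX_TOKENS.get(engine)
```

Under these assumptions, a request arriving through a non-text input with no limit set would be capped at the engine's own small default, which is why callers on those paths may want to set the limit explicitly.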
-
nnshah1 authored
-
Anant Sharma authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
ptarasiewiczNV authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
-
Suman Tatiraju authored
-
GuanLuo authored
-
ishandhanani authored
-
Ryan McCormick authored
-
Ryan McCormick authored
-
Neelay Shah authored
-
Anant Sharma authored
-
Anant Sharma authored
- 16 Mar, 2025 2 commits
-
Dmitry Tokarev authored
-
Anant Sharma authored