- 18 Mar, 2025 8 commits
-
-
Neelay Shah authored
-
Suman Tatiraju authored
-
Anant Sharma authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Meenakshi Sharma authored
-
Harrison Saturley-Hall authored
-
Meenakshi Sharma authored
-
Meenakshi Sharma authored
Co-authored-by: Nicolas Noble <nicolasnoble@users.noreply.github.com>
-
Pavithra Vijayakrishnan authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
- 17 Mar, 2025 32 commits
-
-
Nicolas Noble authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
ishandhanani authored
-
Suman Tatiraju authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Neelay Shah authored
-
ishandhanani authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
kkranen authored
-
ishandhanani authored
Signed-off-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
-
Neelay Shah authored
-
Alec authored
Co-authored-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
-
Harrison Saturley-Hall authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Suman Tatiraju authored
Co-authored-by: Vikram Sharma <vsm2@illinois.edu>
Co-authored-by: Ziqi Fan <ziqif@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
ptarasiewiczNV authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
Alec authored
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Sean <choishsean@gmail.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Neelay Shah authored
-
Suman Tatiraju authored
-
Suman Tatiraju authored
-
Suman Tatiraju authored
-
Hongkuan Zhou authored
-
Meenakshi Sharma authored
-
Meenakshi Sharma authored
-
Anant Sharma authored
-
Harrison Saturley-Hall authored
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Neelay Shah authored
-
Graham King authored
-
Anant Sharma authored
-
Harrison Saturley-Hall authored
-
Anant Sharma authored
-
Graham King authored
Previously, several parts of the stack ensured that max tokens (for a single request) was set. Now only text input sets it (to 8k). Everything else leaves the value as is, potentially blank. The engines themselves have very small defaults: 16 for vllm and 128 for sglang. Also fix the dynamo-run CUDA startup message so that it prints only when we're using an engine that would benefit from it (mistralrs, llamacpp).
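
A minimal sketch of how the effective limit could resolve after this change, not the actual dynamo-run code. The function name, the `input_kind` labels, and the reading of "8k" as 8192 are assumptions for illustration; only the engine defaults (16 for vllm, 128 for sglang) come from the message above.

```python
from typing import Optional

# Engine-side defaults quoted in the commit message.
ENGINE_DEFAULT_MAX_TOKENS = {"vllm": 16, "sglang": 128}

# Assumption: "8k" is read as 8192; the message does not give the exact number.
TEXT_INPUT_MAX_TOKENS = 8192


def effective_max_tokens(
    requested: Optional[int], input_kind: str, engine: str
) -> Optional[int]:
    """Hypothetical helper: resolve the max-token limit for one request."""
    if requested is not None:
        # An explicit per-request value is passed through unchanged.
        return requested
    if input_kind == "text":
        # Only the text input path fills in a value.
        return TEXT_INPUT_MAX_TOKENS
    # Every other path leaves the field blank, so the engine's own
    # (small) default applies; None means the default is not listed here.
    return ENGINE_DEFAULT_MAX_TOKENS.get(engine)
```

Under these assumptions, a request that arrives without a limit on a non-text path ends up capped at 16 tokens on vllm, which is why callers may want to set the value explicitly.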
-
nnshah1 authored
-
Anant Sharma authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
ptarasiewiczNV authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
-
Suman Tatiraju authored
-