- 28 Aug, 2025 2 commits
-
-
atchernych authored
-
ryan-lempka authored
Signed-off-by: Ryan Lempka <rlempka@nvidia.com>
-
- 26 Aug, 2025 2 commits
-
-
Chi McIsaac authored
-
Ayush Agarwal authored
-
- 25 Aug, 2025 2 commits
-
-
nachiketb-nvidia authored
-
nachiketb-nvidia authored
A couple of refactors; added a new dependency, openai-harmony; implemented the gpt oss parser.
-
- 22 Aug, 2025 2 commits
-
-
Graham King authored
-
Ayush Agarwal authored
-
- 21 Aug, 2025 2 commits
-
-
nachiketb-nvidia authored
-
Graham King authored
-
- 20 Aug, 2025 2 commits
-
-
Ayush Agarwal authored
-
nachiketb-nvidia authored
Change the chat completions response objects from structs to types of dynamo_async_openai. Implement aggregator traits for the chat completion structs. Add reasoning_content under message and delta message in lib/async-openai.
-
- 19 Aug, 2025 2 commits
-
-
nachiketb-nvidia authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
atchernych authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
-
- 18 Aug, 2025 1 commit
-
-
Ayush Agarwal authored
-
- 15 Aug, 2025 1 commit
-
-
Ayush Agarwal authored
-
- 14 Aug, 2025 1 commit
-
-
Greg Clark authored
Signed-off-by: Greg Clark <grclark@nvidia.com>
-
- 13 Aug, 2025 1 commit
-
-
Elyas Mehtabuddin authored
-
- 12 Aug, 2025 1 commit
-
-
KrishnanPrash authored
feat: Add frontend support for `min_tokens` and `ignore_eos` (outside of `nvext`) and Structured Output / Guided Decoding (#2380)
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Ayush Agarwal <ayushag@nvidia.com>
-
- 07 Aug, 2025 1 commit
-
-
Ayush Agarwal authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
- 18 Jul, 2025 1 commit
-
-
Ryan Olson authored
-
- 17 Jul, 2025 1 commit
-
-
Ryan Olson authored
-
- 09 Jul, 2025 1 commit
-
-
Paul Hendricks authored
-
- 01 Jul, 2025 2 commits
-
-
Nathan Barry authored
-
Paul Hendricks authored
-
- 26 Jun, 2025 4 commits
-
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
- 25 Jun, 2025 2 commits
-
-
Zhongdongming Dai authored
-
ishandhanani authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
- 24 Jun, 2025 2 commits
-
-
Paul Hendricks authored
-
Paul Hendricks authored
-
- 11 Jun, 2025 1 commit
-
-
Hongkuan Zhou authored
-
- 04 Jun, 2025 2 commits
-
-
Paul Hendricks authored
-
Tom O'Brien authored
-
- 03 Jun, 2025 1 commit
-
-
Hongkuan Zhou authored
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
- 19 May, 2025 1 commit
-
-
Tom O'Brien authored
Implements OpenAI embeddings (interface only): adds ModelType::Embedding, adds OpenAI embedding request/response structs, and adds support for embedding model discovery.
-
- 17 Mar, 2025 1 commit
-
-
Graham King authored
Previously, several parts of the stack ensured that max tokens (for this single request) was set. Now only text input sets it (to 8k). Everything else leaves it as is, potentially blank. The engines themselves have very small defaults: 16 for vllm and 128 for sglang. Also fix the dynamo-run CUDA startup message to print only if we're using an engine that would benefit from it (mistralrs, llamacpp).
-
- 14 Mar, 2025 1 commit
-
-
Ryan McCormick authored
-