- 11 Mar, 2026 1 commit
-
-
MatejKosec authored
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
-
- 06 Mar, 2026 1 commit
-
-
Yuewei Na authored
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
-
- 10 Feb, 2026 1 commit
-
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
- 05 Feb, 2026 1 commit
-
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 02 Jan, 2026 1 commit
-
-
Tushar Sharma authored
Signed-off-by: Tushar Sharma <tusharma@nvidia.com>
-
- 19 Dec, 2025 1 commit
-
-
milesial authored
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
-
- 02 Dec, 2025 1 commit
-
-
GuanLuo authored
fix: ModelDeploymentCard obtains full set of eos_token_ids by taking union from different files (#3192)

Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com>
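A minimal sketch of the union named in this commit title, assuming the EOS ids live in Hugging Face style `config.json` and `generation_config.json` files where `eos_token_id` may be a single integer or a list. The helper name `collect_eos_token_ids`, the file list, and the sample ids are illustrative assumptions, not the ModelDeploymentCard implementation.

```python
import json
import tempfile
from pathlib import Path


def collect_eos_token_ids(model_dir, filenames=("config.json", "generation_config.json")):
    """Return the sorted union of `eos_token_id` values found in the given files.

    Hypothetical helper: missing files and missing keys are skipped.
    """
    ids = set()
    for name in filenames:
        path = Path(model_dir) / name
        if not path.is_file():
            continue
        value = json.loads(path.read_text()).get("eos_token_id")
        if value is None:
            continue
        # The field may be a single int or a list of ints.
        ids.update([value] if isinstance(value, int) else value)
    return sorted(ids)


def demo():
    # Two files that disagree: one lists a single id, the other lists two.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "config.json").write_text(json.dumps({"eos_token_id": 2}))
        Path(d, "generation_config.json").write_text(json.dumps({"eos_token_id": [2, 7]}))
        return collect_eos_token_ids(d)
```

Taking the union rather than the first value found means generation stops on any EOS id that either file declares.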
-
- 17 Nov, 2025 1 commit
-
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 03 Nov, 2025 1 commit
-
-
KrishnanPrash authored
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
-
- 31 Oct, 2025 1 commit
-
-
milesial authored
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
-
- 27 Oct, 2025 1 commit
-
-
milesial authored
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
-
- 17 Sep, 2025 2 commits
-
-
Chi McIsaac authored
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
- 05 Sep, 2025 1 commit
-
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
- 03 Sep, 2025 1 commit
-
-
KrishnanPrash authored
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
-
- 22 Aug, 2025 1 commit
-
-
Graham King authored
-
- 19 Aug, 2025 1 commit
-
-
nachiketb-nvidia authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
- 12 Aug, 2025 1 commit
-
-
KrishnanPrash authored
feat: Add frontend support for `min_tokens` and `ignore_eos` (outside of `nvext`) and Structured Output / Guided Decoding (#2380)

Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Ayush Agarwal <ayushag@nvidia.com>
-
- 07 Aug, 2025 2 commits
-
-
Graham King authored
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 22 May, 2025 1 commit
-
-
Graham King authored
Removed the hard-coded sleeps and explained what we're testing. Closes https://github.com/ai-dynamo/dynamo/issues/1132

The race condition is that `apply_event` sends a message on a channel; it does not directly apply the event. At some later point the tokio runtime schedules the task running the channel receiver, which applies the event. If that had not happened yet, the test would fail.
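The race can be illustrated with a small asyncio analogue. This is a sketch in Python rather than the Rust/tokio code the commit touches; the `Indexer` class, its `wait_for_count` helper, and the event name are hypothetical stand-ins. The test waits on a condition with a timeout instead of sleeping for a fixed interval.

```python
import asyncio


class Indexer:
    """Toy stand-in: apply_event only enqueues; a background task applies."""

    def __init__(self):
        self.applied = []
        self._queue = asyncio.Queue()
        self._changed = asyncio.Condition()

    def apply_event(self, event):
        # Returns immediately; the event has only been sent, not applied.
        self._queue.put_nowait(event)

    async def run(self):
        # Receiver task: applies events whenever the event loop schedules it.
        while True:
            event = await self._queue.get()
            async with self._changed:
                self.applied.append(event)
                self._changed.notify_all()

    async def wait_for_count(self, n, timeout=1.0):
        # Block until at least n events are applied, or raise on timeout.
        async with self._changed:
            await asyncio.wait_for(
                self._changed.wait_for(lambda: len(self.applied) >= n), timeout
            )


async def main():
    indexer = Indexer()
    worker = asyncio.create_task(indexer.run())
    indexer.apply_event("stored:block-1")
    seen_immediately = list(indexer.applied)  # receiver has not run yet
    await indexer.wait_for_count(1)           # no fixed sleep needed
    worker.cancel()
    return seen_immediately, list(indexer.applied)
```

Asserting right after `apply_event` would observe an empty list, which is the failure mode described above; waiting on the condition makes the test independent of scheduling order.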
-
- 06 May, 2025 1 commit
-
-
Graham King authored
Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:

```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move pre-processor to ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:
- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
- Publish the model deployment card to ETCD
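The last two steps listed for `register_llm` (push files to the object store, publish the card) can be outlined with in-memory stand-ins. Everything here is an assumption made for illustration: `FakeBus`, `register_model_sketch`, the key layout, and the card fields are hypothetical and do not reflect the real `register_llm` internals, NATS, or etcd clients.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeBus:
    """In-memory stand-ins; a real deployment would use NATS and etcd."""
    object_store: dict = field(default_factory=dict)  # stand-in for the NATS object store
    kv: dict = field(default_factory=dict)            # stand-in for etcd


def register_model_sketch(bus, endpoint, model_name, local_files):
    """Hypothetical outline of the registration steps; not the real register_llm.

    `local_files` maps filename -> bytes and represents a model that has
    already been downloaded and whose card inputs are already on disk.
    """
    card = {"model": model_name, "endpoint": endpoint, "files": {}}
    for filename, data in local_files.items():
        key = f"{model_name}/{filename}"
        bus.object_store[key] = data       # push tokenizer config etc.
        card["files"][filename] = key      # the card records where each file lives
    bus.kv[f"models/{model_name}"] = json.dumps(card)  # publish the card
    return card
```

Because the card only holds keys into the object store, an ingress process on another machine could read the card and then fetch the tokenizer files it references.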
-
- 08 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
-
- 05 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
- 27 Feb, 2025 1 commit
-
-
Paul Hendricks authored
-
- 26 Feb, 2025 1 commit
-
-
Paul Hendricks authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
- 25 Feb, 2025 3 commits
-
-
Graham King authored
Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization). Add example engine `echo_core` (`out=echo_core`) which takes and returns tokens. A nice side effect is that it echoes the full prompt template with the system prompt, whereas `echo_full` echoes only the user prompt.
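The difference between the two echo engines can be sketched in a few lines of Python. This is a toy model of the idea, not the Rust implementation: the chat template, the one-token-per-character tokenizer, and the `with_preprocessor` wrapper are all assumptions made for illustration.

```python
def render_template(messages):
    # Hypothetical chat template; real models ship their own.
    return "".join(f"<|{m['role']}|>{m['content']}\n" for m in messages)


def tokenize(text):
    # Toy tokenizer: one token per character.
    return [ord(c) for c in text]


def detokenize(tokens):
    return "".join(chr(t) for t in tokens)


def echo_core(tokens):
    # Core engine: takes tokens, returns the same tokens.
    return list(tokens)


def echo_full(messages):
    # Full engine: sees the request itself and echoes only the last user message.
    return next(m["content"] for m in reversed(messages) if m["role"] == "user")


def with_preprocessor(core_engine):
    # Wrap a token-in/token-out engine with templating and tokenization.
    def engine(messages):
        tokens = tokenize(render_template(messages))
        return detokenize(core_engine(tokens))
    return engine
```

With a system message and a user message as input, the wrapped `echo_core` returns the whole rendered template, system prompt included, while `echo_full` returns only the user text.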
-
Ryan McCormick authored
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
- 24 Feb, 2025 1 commit
-
-
Biswa Panda authored
-