- 29 Apr, 2025 1 commit
-
-
Abrar Shivani authored
Adds support for specifying default request parameters through a json template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides. Changes: - Add --request-template CLI flag to specify template file path - Integrate template support in HTTP, batch and text input modes - Template values can be overridden by individual request parameters - Example template.json: ``` { "model": "Qwen2.5-3B-Instruct", "temperature": 0.7, "max_completion_tokens": 4096 } ```
-
- 21 Apr, 2025 1 commit
-
-
Graham King authored
"echo_core" is an engine that echoes the post-processed request back to you so you can see the template. Good for testing. It needed an extra flag set to work correctly.
-
- 26 Mar, 2025 1 commit
-
-
Ryan Olson authored
-
- 08 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by:Biswa Panda <biswa.panda@gmail.com>
-
- 05 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by:Graham King <grahamk@nvidia.com>
-
- 28 Feb, 2025 1 commit
-
-
Paul Hendricks authored
-
- 27 Feb, 2025 2 commits
-
-
Paul Hendricks authored
-
Paul Hendricks authored
-
- 26 Feb, 2025 1 commit
-
-
Paul Hendricks authored
Co-authored-by:Graham King <grahamk@nvidia.com>
-
- 25 Feb, 2025 1 commit
-
-
Neelay Shah authored
Signed-off-by:
Neelay Shah <neelays@nvidia.com> Co-authored-by:
Ryan McCormick <rmccormick@nvidia.com>
-
- 10 Feb, 2025 1 commit
-
-
Ryan Olson authored
Signed-off-by:
Ryan Olson <ryanolson@users.noreply.github.com> Co-authored-by:
Ryan McCormick <rmccormick@nvidia.com> Co-authored-by:
Neelay Shah <neelays@nvidia.com>
-