- 17 Jul, 2025 1 commit
-
-
GuanLuo authored
-
- 09 Jul, 2025 1 commit
-
-
Paul Hendricks authored
-
- 30 Jun, 2025 1 commit
-
-
Paul Hendricks authored
-
- 25 Jun, 2025 1 commit
-
-
ishandhanani authored
Co-authored-by:Ryan McCormick <rmccormick@nvidia.com>
-
- 04 Jun, 2025 1 commit
-
-
Paul Hendricks authored
-
- 29 May, 2025 1 commit
-
-
Graham King authored
- Add Granite to our tokenizer - Fix pre-processor to load context length correctly - Add strftime_now Jinja function for prompt templates - Update llama.cpp - Handle trtllm errors when not using trtllm Support depends on the engine: - `mistral.rs`, our default engine, doesn't support Granite yet. - `llama.cpp` does and works very well: ``` dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384 ``` - `vllm` also works very well: ``` dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384 ``` - `sglang` mostly works, but it doesn't catch the stop token, so we do in the HTTP ingress, and log an error. The Text ingress doesn't catch it because I disabled it to make the raw echo engine work. A bit of work to do here. Closes: #1245
-
- 01 May, 2025 1 commit
-
-
Graham King authored
Part of https://github.com/ai-dynamo/dynamo/issues/743
-
- 28 Apr, 2025 1 commit
-
-
Olga Andreeva authored
Signed-off-by:Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
-
- 24 Mar, 2025 1 commit
-
-
Graham King authored
This lets us do: ``` dynamo-run out=llamacpp <gguf_file> ``` Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.
-
- 06 Mar, 2025 1 commit
-
-
Ryan McCormick authored
-
- 28 Feb, 2025 1 commit
-
-
Paul Hendricks authored
-
- 27 Feb, 2025 1 commit
-
-
Paul Hendricks authored
-
- 26 Feb, 2025 1 commit
-
-
Paul Hendricks authored
Co-authored-by:Graham King <grahamk@nvidia.com>
-
- 25 Feb, 2025 2 commits
-
-
Ryan McCormick authored
Signed-off-by:Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
Signed-off-by:
Neelay Shah <neelays@nvidia.com> Co-authored-by:
Ryan McCormick <rmccormick@nvidia.com>
-