- 30 May, 2025 6 commits
-
-
Alec authored
-
jthomson04 authored
-
julienmancuso authored
-
ishandhanani authored
-
Biswa Panda authored
-
Tanmay Verma authored
-
- 29 May, 2025 20 commits
-
-
Graham King authored
Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is only the default for safetensors, `llama.cpp` becomes the default for GGUF.

Why?
- Since #1177 `llama.cpp` is built-in by default, so we can switch.
- `llama.cpp` is very very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.
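The selection rule described in this commit can be sketched roughly as below. This is an illustration only, not Dynamo's actual implementation: the names `DEFAULT_ENGINE_BY_FORMAT`, `detect_format`, and `pick_engine` are hypothetical, and detecting the format from a `.gguf` file extension is an assumption made for the sketch.

```python
from pathlib import Path
from typing import Optional

# Hypothetical table of per-format defaults; the engine names mirror the
# `out=` values used in the commit messages.
DEFAULT_ENGINE_BY_FORMAT = {
    "gguf": "llamacpp",
    "safetensors": "mistralrs",
}


def detect_format(model_path: str) -> str:
    """Assumption: a path ending in .gguf is GGUF, anything else is safetensors."""
    return "gguf" if Path(model_path).suffix.lower() == ".gguf" else "safetensors"


def pick_engine(model_path: str, out: Optional[str] = None) -> str:
    """Return the explicit `out` engine if given, else the default for the format."""
    if out is not None:
        return out
    return DEFAULT_ENGINE_BY_FORMAT[detect_format(model_path)]
```

With this sketch, a GGUF path resolves to `llamacpp` unless an explicit `out="mistralrs"` is passed, which corresponds to the `out=mistralrs` override mentioned above.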
-
Tanmay Verma authored
-
jthomson04 authored
-
J Wyman authored
This change corrects the README.md file in the examples/multimodal folder:
- Correct "vllm worker" to "decode worker"
- Correct assertion that data is moved via NATS when embeddings are moved via RDMA.

Additionally, this change updates the textual graphs with Mermaid graphs for improved presentation on github.com.
-
Alec authored
-
Graham King authored
-
jthomson04 authored
-
Graham King authored
- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:
- `mistral.rs`, our default engine, doesn't support Granite yet.
- `llama.cpp` does and works very well:
```
dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
```
- `vllm` also works very well:
```
dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
```
- `sglang` mostly works, but it doesn't catch the stop token, so we do in the HTTP ingress, and log an error. The Text ingress doesn't catch it because I disabled it to make the raw echo engine work. A bit of work to do here.

Closes: #1245
-
Ryan Olson authored
-
Jacky authored
-
Tom O'Brien authored
-
Harrison Saturley-Hall authored
-
Anant Sharma authored
-
Hongkuan Zhou authored
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
-
Graham King authored
-
Alec authored
-
Tushar Sharma authored
-
jthomson04 authored
-
Alec authored
-
Ryan McCormick authored
-
- 28 May, 2025 14 commits
-
-
Kris Hung authored
Co-authored-by: J Wyman <jwyman@nvidia.com>
-
Hongkuan Zhou authored
-
Biswa Panda authored
-
Graham King authored
Changes from default:
- Disable the noisy status messages
- Disable generating docstrings and unit tests
-
Graham King authored
Fixes #286
-
mohammedabdulwahhab authored
-
hhzhang16 authored
-
mohammedabdulwahhab authored
-
Graham King authored
It was removed from the docs in 0.2.1 and replaced with writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python). Also remove the associated `dynamo-run` feature `python`. Releasing this in 0.3.0 will resolve #784 and #1109.
-
Kris Hung authored
-
Tanmay Verma authored
-
mohammedabdulwahhab authored
-
Hongkuan Zhou authored
-
Alec authored
-