- 02 Jun, 2025 6 commits
-
Graham King authored
This allows building:
- only the `mistral.rs` engine: `--no-default-features --features mistralrs`
- or only the `llama.cpp` engine: `--no-default-features --features llamacpp`

Since `llama.cpp` became a default feature, we had only tested building both at once. The docs already said building just one was supported, but some combination of Rust features didn't build. This is the fix.
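A minimal sketch of how Cargo features decide which engines end up in a build, assuming the crate declares features named `mistralrs` and `llamacpp` as in the flags above. The `Engine` enum and `compiled_engines` function are hypothetical illustrations, not Dynamo's actual code.

```rust
/// Hypothetical engine identifiers, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    MistralRs,
    LlamaCpp,
}

/// Hypothetical helper: lists the engines enabled in this build.
/// `cfg!` expands to `true` or `false` at compile time; both branches
/// are still type-checked, so this stays buildable under every feature combo.
pub fn compiled_engines() -> Vec<Engine> {
    let mut engines = Vec::new();
    if cfg!(feature = "mistralrs") {
        engines.push(Engine::MistralRs);
    }
    if cfg!(feature = "llamacpp") {
        engines.push(Engine::LlamaCpp);
    }
    engines
}
```

Real engine code would sit behind `#[cfg(feature = "...")]` attributes instead, so that a build with only one feature never references the other engine's crate. That is also where a feature combination can silently fail to compile if only the "both enabled" case is ever tested.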
-
ptarasiewiczNV authored
-
julienmancuso authored
-
Ryan McCormick authored
-
Hongkuan Zhou authored
-
Graham King authored
It was confusing to have two names for one type. This tidy-up, started in #1064, is now complete.
-
- 31 May, 2025 3 commits
-
Ryan McCormick authored
-
Hongkuan Zhou authored
-
mohammedabdulwahhab authored
-
- 30 May, 2025 13 commits
-
Biswa Panda authored
-
Olga Andreeva authored
-
Kris Hung authored
-
Ryan McCormick authored
-
jain-ria authored
-
Graham King authored
Unify them with all our other logs, so that we can filter them with DYN_LOG and so that they will eventually go to the log aggregation, etc.
-
Anant Sharma authored
-
Alec authored
-
jthomson04 authored
-
julienmancuso authored
-
ishandhanani authored
-
Biswa Panda authored
-
Tanmay Verma authored
-
- 29 May, 2025 18 commits
-
Graham King authored
Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is only the default for safetensors; `llama.cpp` becomes the default for GGUF.

Why?
- Since #1177, `llama.cpp` is built in by default, so we can switch.
- `llama.cpp` is very, very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for each specific format or model. We can still run GGUF with `mistral.rs` by passing `out=mistralrs`.
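The selection rule described above can be sketched as follows, assuming the format is detected from a `.gguf` file extension and that a safetensors model is passed as a directory. `Engine` and `select_engine` are hypothetical names, and only the two engines discussed in this commit are handled; any other `out=` value is rejected purely to keep the sketch small.

```rust
use std::path::Path;

/// Hypothetical engine identifiers, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    MistralRs,
    LlamaCpp,
}

/// Hypothetical: choose an engine for `model_path`.
/// An explicit `out=` value always wins; otherwise GGUF files default
/// to llama.cpp and everything else (e.g. safetensors dirs) to mistral.rs.
pub fn select_engine(model_path: &Path, out_override: Option<&str>) -> Result<Engine, String> {
    match out_override {
        Some("mistralrs") => return Ok(Engine::MistralRs),
        Some("llamacpp") => return Ok(Engine::LlamaCpp),
        Some(other) => return Err(format!("unsupported engine in this sketch: {other}")),
        None => {}
    }
    let is_gguf = model_path
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.eq_ignore_ascii_case("gguf"))
        .unwrap_or(false);
    Ok(if is_gguf { Engine::LlamaCpp } else { Engine::MistralRs })
}
```

Keeping the override check first means the format-based default never prevents a user from picking a specific engine, which is the `out=mistralrs` escape hatch mentioned above.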
-
Tanmay Verma authored
-
jthomson04 authored
-
J Wyman authored
This change corrects the README.md file in the examples/multimodal folder:
- Correct "vllm worker" to "decode worker".
- Correct the assertion that data is moved via NATS; the embeddings are moved via RDMA.

Additionally, this change replaces the textual graphs with Mermaid graphs for improved presentation on github.com.
-
Alec authored
-
Graham King authored
-
jthomson04 authored
-
Graham King authored
- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:
- `mistral.rs`, our default engine, doesn't support Granite yet.
- `llama.cpp` does, and works very well:
  ```
  dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
  ```
- `vllm` also works very well:
  ```
  dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
  ```
- `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress and log an error. The Text ingress doesn't catch it because I disabled that check to make the raw echo engine work. A bit of work to do here.

Closes: #1245
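For the `sglang` case above, catching a missed stop token in the ingress amounts to truncating the output at the earliest stop string. A minimal sketch, assuming the full text is available at once (a streaming ingress would also need to buffer stop strings split across chunks). `truncate_at_stop` is a hypothetical name and the stop string in the test is a placeholder, not Granite's real token.

```rust
/// Hypothetical: cut `text` at the earliest occurrence of any stop string.
/// Returns the text before the stop string and whether one was found,
/// so the caller can log an error when the engine failed to stop itself.
pub fn truncate_at_stop<'a>(text: &'a str, stop_tokens: &[&str]) -> (&'a str, bool) {
    let earliest = stop_tokens
        .iter()
        .copied()
        .filter(|s| !s.is_empty())
        .filter_map(|s| text.find(s))
        .min();
    match earliest {
        // `find` returns a byte index on a char boundary, so slicing is safe.
        Some(idx) => (&text[..idx], true),
        None => (text, false),
    }
}
```

Returning a flag instead of logging inside the function keeps the decision about how loudly to report an engine that missed its stop token in the ingress itself.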
-
Ryan Olson authored
-
Jacky authored
-
Tom O'Brien authored
-
Harrison Saturley-Hall authored
-
Anant Sharma authored
-
Hongkuan Zhou authored
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
-
Graham King authored
-
Alec authored
-
Tushar Sharma authored
-
jthomson04 authored
-