- 03 Jun, 2025 4 commits
-
J Wyman authored
Creates a README.md file for Connect. The README contains an overview, examples with diagrams, and documentation of the important classes. The README is not intended to be comprehensive; instead, it is meant as a "getting started" or "learn the basics" guide. More comprehensive documentation is available in the inline comments. Additionally, updates the Multimodal Example: - Moves the remote and local prefill code from the `generate` method into `remote_prefill` and `local_prefill`, respectively (code changes made). - Replaces references to "agent" with "worker" throughout the inline documentation for consistency (comments only; no code changes). The intention of this change is to improve the readability of the example code and to provide clearer examples to reference from within the documentation. DIS-101
-
Hongkuan Zhou authored
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Hongkuan Zhou authored
-
ptarasiewiczNV authored
-
- 02 Jun, 2025 10 commits
-
hhzhang16 authored
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
-
Ryan McCormick authored
-
Graham King authored
Do not include it by default, because it needs `libgomp1` at runtime. Add a feature to enable it at build time.
-
julienmancuso authored
-
Graham King authored
This allows building: - only the `mistral.rs` engine: `--no-default-features --features mistralrs` - or only the `llama.cpp` engine: `--no-default-features --features llamacpp`. Since `llama.cpp` became a default, we had only tested building both at once. The docs already said we supported single-engine builds, but some combination of Rust features didn't build. This is the fix.
-
ptarasiewiczNV authored
-
julienmancuso authored
-
Ryan McCormick authored
-
Hongkuan Zhou authored
-
Graham King authored
It was confusing to have two names for one type. This tidy-up, started in #1064, is now complete.
-
- 31 May, 2025 3 commits
-
Ryan McCormick authored
-
Hongkuan Zhou authored
-
mohammedabdulwahhab authored
-
- 30 May, 2025 13 commits
-
Biswa Panda authored
-
Olga Andreeva authored
-
Kris Hung authored
-
Ryan McCormick authored
-
jain-ria authored
-
Graham King authored
Unify them with all our other logs, so that we can filter them with `DYN_LOG` and so that they will eventually go to log aggregation, etc.
-
Anant Sharma authored
-
Alec authored
-
jthomson04 authored
-
julienmancuso authored
-
ishandhanani authored
-
Biswa Panda authored
-
Tanmay Verma authored
-
- 29 May, 2025 10 commits
-
Graham King authored
Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is the default only for safetensors, and `llama.cpp` becomes the default for GGUF. Why? - Since #1177, `llama.cpp` is built in by default, so we can switch. - `llama.cpp` is very good at running GGUF (but can't run other model types), so we should switch. Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for each specific format or model. We can still run GGUF with mistral.rs by passing `out=mistralrs`.
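The default-engine rule above can be sketched as a small selection function. This is a minimal, hypothetical illustration: the `ModelFormat` enum, `detect_format`, `default_engine`, and the extension-based format detection are assumptions made for this sketch, not Dynamo's actual code. Only the engine names and the `out=` override come from the commit message.

```rust
use std::path::Path;

/// Hypothetical model formats relevant to default-engine selection.
#[derive(Debug, PartialEq)]
enum ModelFormat {
    Gguf,
    Safetensors,
}

/// Hypothetical detection: a file with a `.gguf` extension is GGUF; anything
/// else (e.g. a directory of safetensors weights) is treated as safetensors.
fn detect_format(path: &Path) -> ModelFormat {
    match path.extension().and_then(|e| e.to_str()) {
        Some(ext) if ext.eq_ignore_ascii_case("gguf") => ModelFormat::Gguf,
        _ => ModelFormat::Safetensors,
    }
}

/// Pick an engine: an explicit `out=` choice always wins; otherwise use
/// llama.cpp for GGUF and mistral.rs for safetensors.
fn default_engine(path: &Path, explicit_out: Option<&str>) -> String {
    if let Some(engine) = explicit_out {
        return engine.to_string();
    }
    match detect_format(path) {
        ModelFormat::Gguf => "llamacpp".to_string(),
        ModelFormat::Safetensors => "mistralrs".to_string(),
    }
}
```

For example, `default_engine(Path::new("granite-3.3-2b-instruct-Q4_K_M.gguf"), None)` selects `"llamacpp"`, while passing `Some("mistralrs")` keeps the old behaviour for that GGUF file. Letting the explicit choice short-circuit format detection mirrors the commit's point that the new default is only a default.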
-
Tanmay Verma authored
-
jthomson04 authored
-
J Wyman authored
This change corrects the README.md file in the examples/multimodal folder: - Correct "vllm worker" to "decode worker". - Correct the assertion that data is moved via NATS, when the embeddings are actually moved via RDMA. Additionally, this change replaces the text graphs with Mermaid graphs for improved presentation on github.com.
-
Alec authored
-
Graham King authored
-
jthomson04 authored
-
Graham King authored
- Add Granite to our tokenizer
- Fix the pre-processor to load context length correctly
- Add a `strftime_now` Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:
- `mistral.rs`, our default engine, doesn't support Granite yet.
- `llama.cpp` does and works very well:
  ```
  dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
  ```
- `vllm` also works very well:
  ```
  dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
  ```
- `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress and log an error. The Text ingress doesn't catch it, because I disabled that check to make the raw echo engine work. A bit of work to do here.

Closes: #1245
-
Ryan Olson authored
-
Jacky authored
-