- 17 Mar, 2025 17 commits
-
Neelay Shah authored
-
Graham King authored
-
Anant Sharma authored
-
Harrison Saturley-Hall authored
-
Anant Sharma authored
-
Graham King authored
Previously, several parts of the stack ensured that max tokens (for a single request) was set. Now only text input sets it (to 8k); everything else leaves it as is, potentially blank. The engines themselves have very small defaults: 16 for vllm and 128 for sglang. Also fix the dynamo-run CUDA startup message to print only when using an engine that would benefit from it (mistralrs, llamacpp).
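Since the engine defaults are now so small, callers that care about output length should set max tokens per request. A minimal sketch, assuming an OpenAI-compatible HTTP frontend in front of the engine (the URL, port, and model name here are illustrative assumptions, not from this commit):
```
# Hypothetical request; endpoint, port, and model are assumptions.
# Setting max_tokens explicitly avoids falling back to the engines'
# small defaults (16 for vllm, 128 for sglang).
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "Qwen/Qwen2.5-3B-Instruct",
        "messages": [{"role": "user", "content": "the prompt"}],
        "max_tokens": 256
      }'
```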
-
nnshah1 authored
-
Anant Sharma authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
ptarasiewiczNV authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
-
Suman Tatiraju authored
-
GuanLuo authored
-
ishandhanani authored
-
Ryan McCormick authored
-
Ryan McCormick authored
-
Neelay Shah authored
-
Anant Sharma authored
-
Anant Sharma authored
-
- 16 Mar, 2025 10 commits
-
Dmitry Tokarev authored
-
Anant Sharma authored
-
David Zier authored
-
Neelay Shah authored
-
ptarasiewiczNV authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
-
Harrison Saturley-Hall authored
-
julienmancuso authored
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>
-
Maksim Khadkevich authored
-
ishandhanani authored
-
April Yang authored
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>
-
- 15 Mar, 2025 10 commits
-
Biswa Panda authored
-
Neelay Shah authored
-
ptarasiewiczNV authored
-
Matthew Kotila authored
-
Biswa Panda authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
ptarasiewiczNV authored
-
Harrison Saturley-Hall authored
-
Maksim Khadkevich authored
-
julienmancuso authored
-
Graham King authored
```
dynamo-run in=batch:prompts.jsonl out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct/
```
The file has genai format, one entry per line:
```
{"text": "the prompt"}
{"text": ..etc
```
Each prompt is evaluated and the output written to `output.jsonl` in the same folder as the input. At the end of the run, various statistics are printed:
> Ran 5 files in 8s 679ms. Tokens in: 40 (5/s). Tokens out: 346 (43/s)
This is also helpful for pushing load into the system and stressing its various components. It is a batch inference tool, not intended for performance measurement.
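A quick way to exercise this mode end to end; the prompt text below is an illustrative assumption, while the command shape and output location come from the commit message:
```
# Generate a small prompts.jsonl, one {"text": ...} entry per line,
# then run it through the batch engine. Model path is illustrative.
for i in $(seq 1 5); do
  echo "{\"text\": \"Write one sentence about topic $i\"}"
done > prompts.jsonl
dynamo-run in=batch:prompts.jsonl out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct/
# Results are written to output.jsonl in the same folder as the input.
```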
-
- 14 Mar, 2025 3 commits
-
Anant Sharma authored
-
Graham King authored
Engines mistralrs, sglang and vllm are included by default. They can be disabled like this: `cargo build --no-default-features --features <add-back-what-you-want>`. Added a `--features vulkan` option for llamacpp.
Build-time message if CUDA or Metal would help and are missing; that's the best we can do:
> warning: dynamo-run@0.1.0: CUDA not enabled, re-run with `--features cuda`
Runtime message if CUDA, Metal or Vulkan are enabled:
> 2025-03-14T21:59:26.501937Z INFO dynamo_run: CUDA on
Runtime message if they are missing:
> 2025-03-14T22:02:37.439404Z INFO dynamo_run: CPU mode. Rebuild with `--features cuda|metal|vulkan` for better performance
The default engine message includes the available engines:
> 2025-03-14T21:59:26.503612Z INFO dynamo_run: Using default engine: mistralrs. Use out=<engine> to specify one of echo_core, echo_full, mistralrs, llamacpp, sglang, vllm, pystr, pytok
The really important outcome is that this should now "just work":
```
cargo install dynamo-run
dynamo-run Qwen/Qwen2.5-3B-Instruct
```
Sadly you still need `--features cuda|metal` for performance; I couldn't automate that.
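As a concrete instance of the flags above, a sketch of a trimmed build that keeps a single engine plus GPU support (assuming the engine and accelerator features named in the message can be combined in one `--features` list):
```
# Keep only the mistralrs engine and enable CUDA,
# dropping the default sglang and vllm engines.
cargo build --no-default-features --features mistralrs,cuda
```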
-
Pavithra Vijayakrishnan authored
-