"examples/vscode:/vscode.git/clone" did not exist on "2d39ded64cbf3025b6ced809fd2a3e50bf1fb72d"
- 16 Mar, 2025 5 commits

Harrison Saturley-Hall authored

julienmancuso authored
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>

Maksim Khadkevich authored

ishandhanani authored

April Yang authored
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>
- 15 Mar, 2025 10 commits

Biswa Panda authored

Neelay Shah authored

ptarasiewiczNV authored

Matthew Kotila authored

Biswa Panda authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

ptarasiewiczNV authored

Harrison Saturley-Hall authored

Maksim Khadkevich authored

julienmancuso authored

Graham King authored
```
dynamo-run in=batch:prompts.jsonl out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct/
```
The input file is in genai format, one JSON entry per line:
```
{"text": "the prompt"}
{"text": ..etc
```
Each prompt is evaluated and the output is written to `output.jsonl` in the same folder as the input. At the end of the run, various statistics are printed:
> Ran 5 files in 8s 679ms. Tokens in: 40 (5/s). Tokens out: 346 (43/s)

This is also helpful for pushing load into the system and stressing its various components. It is a batch inference tool, not intended for performance measurement.
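For reference, the input is plain JSONL and is easy to generate or consume; a minimal Rust sketch of a reader for it, assuming the serde and serde_json crates (the `read_prompts` helper is hypothetical, not part of dynamo-run):
```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

use serde::Deserialize;

/// One line of the batch input file: {"text": "the prompt"}
#[derive(Deserialize)]
struct BatchEntry {
    text: String,
}

fn read_prompts(path: &str) -> std::io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    let mut prompts = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // tolerate blank lines
        }
        let entry: BatchEntry = serde_json::from_str(&line)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
        prompts.push(entry.text);
    }
    Ok(prompts)
}
```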
- 14 Mar, 2025 21 commits

Anant Sharma authored

Graham King authored
Engines mistralrs, sglang and vllm are included by default. They can be disabled like this: `cargo build --no-default-features --features <add-back-what-you-want>`. Added a `--features vulkan` option, for llamacpp.

Build-time message if CUDA or Metal would help and are missing. That's the best we can do:
> warning: dynamo-run@0.1.0: CUDA not enabled, re-run with `--features cuda`

Runtime message if CUDA, Metal or Vulkan are enabled:
> 2025-03-14T21:59:26.501937Z INFO dynamo_run: CUDA on

Runtime message if they are missing:
> 2025-03-14T22:02:37.439404Z INFO dynamo_run: CPU mode. Rebuild with `--features cuda|metal|vulkan` for better performance

Default engine message includes the available engines:
> 2025-03-14T21:59:26.503612Z INFO dynamo_run: Using default engine: mistralrs. Use out=<engine> to specify one of echo_core, echo_full, mistralrs, llamacpp, sglang, vllm, pystr, pytok

The really important outcome is that this should now "just work":
```
cargo install dynamo-run
dynamo-run Qwen/Qwen2.5-3B-Instruct
```
Sadly you still need `--features cuda|metal` for performance; I couldn't automate that.
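Build-time messages like the one above can be emitted from a Cargo build script via `cargo:warning=`; a minimal sketch, where probing for `nvcc` is an illustrative heuristic rather than the crate's actual detection logic:
```rust
// build.rs -- sketch of a build-time hint via `cargo:warning=`.
fn main() {
    // Cargo sets CARGO_FEATURE_<NAME> for each enabled feature.
    let cuda_enabled = std::env::var("CARGO_FEATURE_CUDA").is_ok();
    // Probing for `nvcc` is an illustrative heuristic only.
    let cuda_available = std::process::Command::new("nvcc")
        .arg("--version")
        .output()
        .is_ok();
    if !cuda_enabled && cuda_available {
        // Rendered by Cargo as: warning: <crate>@<version>: <message>
        println!("cargo:warning=CUDA not enabled, re-run with `--features cuda`");
    }
}
```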
Pavithra Vijayakrishnan authored

Hongkuan Zhou authored

Ryan McCormick authored

ishandhanani authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>

Graham King authored
On Mac, embedded Python interpreters don't pick up the active virtual environment. This seems to be a known problem, so fix sys.path ourselves.
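A minimal sketch of this kind of fix, assuming pyo3 and the standard `VIRTUAL_ENV` environment variable; the helper name and the hard-coded path layout are illustrative:
```rust
use pyo3::prelude::*;

/// Prepend the active virtualenv's site-packages to sys.path.
/// Sketch only: the `python3.12` segment is illustrative; real code
/// would derive it from the interpreter version.
fn add_venv_to_sys_path(py: Python<'_>) -> PyResult<()> {
    if let Ok(venv) = std::env::var("VIRTUAL_ENV") {
        let site_packages = format!("{venv}/lib/python3.12/site-packages");
        let sys = py.import("sys")?;
        let path = sys.getattr("path")?;
        path.call_method1("insert", (0, site_packages))?;
    }
    Ok(())
}
```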
Hongkuan Zhou authored

Hongkuan Zhou authored

Anant Sharma authored

Ryan McCormick authored

Ryan McCormick authored

Anant Sharma authored

Tanmay Verma authored

Graham King authored
- Mac doesn't have the `pipe2` syscall, so use plain `pipe`.
- rtnetlink isn't a dependency on Mac, so don't use the type.
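A minimal sketch of the first fix, using the `libc` crate; the `cloexec_pipe` helper is illustrative, not dynamo's actual code:
```rust
use std::io;

/// Create a close-on-exec pipe portably. Linux's `pipe2` sets
/// O_CLOEXEC atomically; macOS has no `pipe2`, so fall back to
/// `pipe` followed by fcntl(F_SETFD, FD_CLOEXEC).
fn cloexec_pipe() -> io::Result<[libc::c_int; 2]> {
    let mut fds = [0 as libc::c_int; 2];

    #[cfg(target_os = "linux")]
    let rc = unsafe { libc::pipe2(fds.as_mut_ptr(), libc::O_CLOEXEC) };

    #[cfg(target_os = "macos")]
    let rc = unsafe {
        let rc = libc::pipe(fds.as_mut_ptr());
        if rc == 0 {
            for fd in &fds {
                // Not atomic with the pipe creation, but the best
                // available on macOS.
                libc::fcntl(*fd, libc::F_SETFD, libc::FD_CLOEXEC);
            }
        }
        rc
    };

    if rc == 0 {
        Ok(fds)
    } else {
        Err(io::Error::last_os_error())
    }
}
```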
hhzhang16 authored
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>

Ryan McCormick authored

Ryan McCormick authored

Ryan McCormick authored

Ryan McCormick authored

Ryan Olson authored
- 13 Mar, 2025 4 commits

Ryan McCormick authored

Ziqi Fan authored

Anant Sharma authored

Graham King authored
Previously we tokenized the output ourselves and counted tokens to stop when max tokens was reached. Now we let the mistral.rs engine do it, which saves the extra tokenization step. Also, dynamo-run now prints which engines are compiled in in the help message, plus some minor lint fixes.
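The shape of that change, as a hedged sketch with hypothetical names (`Engine`, `next_chunk`, `count_tokens`); mistral.rs's real interface differs:
```rust
/// Hypothetical streaming-engine interface, for illustration only;
/// this is not the mistral.rs API.
trait Engine {
    fn next_chunk(&mut self) -> Option<String>;
}

/// Before: re-tokenize the accumulated output after every chunk to
/// enforce the limit ourselves -- an extra tokenization pass per chunk.
fn generate_counting(
    engine: &mut dyn Engine,
    count_tokens: &dyn Fn(&str) -> usize,
    max_tokens: usize,
) -> String {
    let mut out = String::new();
    while let Some(chunk) = engine.next_chunk() {
        out.push_str(&chunk);
        if count_tokens(&out) >= max_tokens {
            break;
        }
    }
    out
}

/// After: max_tokens travels with the request, the engine stops
/// generation itself, and no tokenization happens on our side.
fn generate_delegating(engine: &mut dyn Engine) -> String {
    let mut out = String::new();
    while let Some(chunk) = engine.next_chunk() {
        out.push_str(&chunk);
    }
    out
}
```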