Commits · 16310b269f866e6f4b7968ba6780e54a4f7b76f6 · OpenDAS / dynamo

24 Apr, 2025 1 commit
- feat: Warm‑up mistral.rs engine to reduce latency on subsequent requests (#796) · 4761baa6
  Abrar Shivani authored Apr 24, 2025
```
Send a warm‑up request to the mistralrs engine so that subsequent requests are faster.
```
21 Apr, 2025 4 commits
- fix: Fix cancellation flow in python component graph (#765) · 420b7a82
  Pankaj Gupta authored Apr 21, 2025
  
- feat: add custom lease to worker components (#748) · c392c341
  ishandhanani authored Apr 21, 2025
  
- chore(dynamo-run): Fix echo_core for EOS tokens (#759) · 4e75b04b
  Graham King authored Apr 21, 2025
```
"echo_core" is an engine that echoes the post-processed request back to you so you can see the template. Good for testing. It needed an extra flag set to work correctly.
```
- feat: add additional packages to log filters (#752) · ee865ca0
  Abrar Shivani authored Apr 21, 2025
  
18 Apr, 2025 3 commits
- chore: Remove TRT-LLM C++ engine in favor of Python one (#747) · 675a9bf5
  Graham King authored Apr 18, 2025
  
- feat: gracefully shutdown endpoint by revoking etcd lease + python binding (#730) · 4c38680e
  Hongkuan Zhou authored Apr 18, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
- feat(dynamo-engine-vllm): vllm 0.8.X support (#728) · a745a980
  Graham King authored Apr 18, 2025
```
It's different enough that I made a new engine vllm0_8 and renamed the previous engine to vllm0_7.

`dynamo-run out=vllm` now expects 0.8. This matches the container change in #690.

For older versions, use `dynamo-run out=vllm0_7`.
```
17 Apr, 2025 3 commits
- feat: configure logger with detail info (#654) · 50aa390b
  tlipoca9 authored Apr 18, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
- feat: adding dynamo-tokens crate (#718) · 99b76ba4
  Ryan Olson authored Apr 17, 2025
  
- docs: Remove outdated python-wheels directory reference (#719) · f4780e85
  Ryan McCormick authored Apr 16, 2025
  
12 Apr, 2025 1 commit

feat: ETCD prefix watcher + python binding + runtime reconfiguration for... · 08fd2897

Hongkuan Zhou authored Apr 11, 2025

feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router (#581)
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>


11 Apr, 2025 1 commit
- docs: add docstring for llm.rs (#267) · 447840c2
  Cole authored Apr 10, 2025
  
09 Apr, 2025 2 commits
- feat: Extract Common Configs + Log Configs on Init + Add `test_` to... · 0292feb5
  jon-chuang authored Apr 09, 2025
```
feat: Extract Common Configs + Log Configs on Init + Add `test_` to `sdk/tests` filenames required for pytest (#434)
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
- chore: update versions to 0.1.1 (#552) · fa7ee14c
  Anant Sharma authored Apr 09, 2025
  
07 Apr, 2025 1 commit

feat(dynamo-run): Basic routing choice (#524) · ec2e7307

Graham King authored Apr 07, 2025

As a first step towards KV routing:
- introduce a `--router-mode` in dynamo-run that only does random and round-robin right now. Not that interesting yet.
- Make the vllm engine publish the KV events received from our patched vllm.

Now we "just" need to connect the two. Easy right?
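
The two routing choices named above can be sketched roughly as follows. This is only an illustration, not the dynamo-run implementation: `RouterMode`, `Router`, and `pick` are hypothetical names, and a tiny xorshift generator stands in for a real random number crate.

```rust
/// Hypothetical routing modes mirroring the two choices in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RouterMode {
    Random,
    RoundRobin,
}

/// Hypothetical router that only tracks what it needs to choose a worker index.
struct Router {
    mode: RouterMode,
    next: usize,    // round-robin cursor
    rng_state: u64, // xorshift64 state (assumption: stands in for a real RNG)
}

impl Router {
    fn new(mode: RouterMode, seed: u64) -> Self {
        // xorshift must not start from zero, so clamp the seed to at least 1.
        Router { mode, next: 0, rng_state: seed.max(1) }
    }

    /// Pick a worker index in `0..num_workers`, or `None` if there are no workers.
    fn pick(&mut self, num_workers: usize) -> Option<usize> {
        if num_workers == 0 {
            return None;
        }
        let idx = match self.mode {
            RouterMode::RoundRobin => {
                // Cycle through the workers in order.
                let i = self.next % num_workers;
                self.next = self.next.wrapping_add(1);
                i
            }
            RouterMode::Random => {
                // One xorshift64 step, then reduce to the worker range.
                let mut x = self.rng_state;
                x ^= x << 13;
                x ^= x >> 7;
                x ^= x << 17;
                self.rng_state = x;
                (x % num_workers as u64) as usize
            }
        };
        Some(idx)
    }
}
```

With three workers, a `RoundRobin` router built this way yields indices 0, 1, 2, 0, 1 on successive calls, while a `Random` router yields some index below 3 each time. Neither mode looks at KV events, which is the part the commit says still needs connecting.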


04 Apr, 2025 3 commits

feat: KV recorder for dumping router events into a jsonl (#505) · 4b6cfc1b
Yan Ru Pei authored Apr 04, 2025


chore: Upgrade Rust to 1.86 (#518) · e99aa1e1

Graham King authored Apr 04, 2025

Also upgrade the cargo resolver to v3, the default.

New clippy lints:
- `next_back()` instead of `last()` for a double-ended iterator. That avoids walking the whole list.
- `repeat_n` instead of `repeat().take()`. That avoids an extra clone.
- Doc indenting
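
A small sketch of the first two lints, using only the standard library. The function names `last_token` and `padding` are made up for illustration and do not come from the repository.

```rust
use std::iter::repeat_n;

/// Last element of a slice, taken from the back of a double-ended iterator.
/// `next_back()` reads one element from the end, whereas the default
/// `Iterator::last()` implementation consumes the iterator from the front.
fn last_token(tokens: &[u32]) -> Option<u32> {
    tokens.iter().copied().next_back()
}

/// Build `n` copies of `pad`.
/// `repeat_n(pad, n)` moves the original value into the final item, so it
/// clones n - 1 times; `repeat(pad).take(n)` clones for every item.
fn padding(pad: String, n: usize) -> Vec<String> {
    repeat_n(pad, n).collect()
}
```

`std::iter::repeat_n` needs a reasonably recent toolchain (it was stabilised in Rust 1.82), which the 1.86 upgrade satisfies.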


feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425

Graham King authored Apr 04, 2025

Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd.

Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.

For NIM.
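
To illustrate why only one static worker can exist per namespace / component / endpoint trio, here is a toy sketch. It is not how dynamo names or registers endpoints: `endpoint_name`, `Registry`, and the `-{id}` suffix are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical naming scheme: a static worker is addressed by its trio alone,
/// while a dynamic worker also carries a unique instance id.
fn endpoint_name(
    namespace: &str,
    component: &str,
    endpoint: &str,
    instance_id: Option<u64>,
) -> String {
    match instance_id {
        None => format!("{namespace}.{component}.{endpoint}"),
        Some(id) => format!("{namespace}.{component}.{endpoint}-{id:x}"),
    }
}

/// Toy registry that rejects a name that is already in use.
#[derive(Default)]
struct Registry {
    names: HashSet<String>,
}

impl Registry {
    fn register(&mut self, name: String) -> Result<(), String> {
        if self.names.insert(name.clone()) {
            Ok(())
        } else {
            Err(format!("name already registered: {name}"))
        }
    }
}
```

In this sketch a second static worker for the same trio produces the same name and is rejected, while dynamic workers stay distinct through their instance id. A client can compute the static name without any discovery step, which is the property the commit relies on.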


03 Apr, 2025 3 commits

refactor: migrate engines to standalone crates (#453) · 84985d3f

Ryan Olson authored Apr 03, 2025

Moved everything in `lib/llm/src/engines` into separate crates, e.g. `lib/engines/mistralrs`. This allows publishing the `dynamo-llm` crate, since it will no longer have any GitHub dependencies.

The only engines in dynamo-llm will be the demo `echo` ones.
Co-authored-by: Graham King <grahamk@nvidia.com>


chore: rename duration to timeout (#503) · 3c49a02c
tlipoca9 authored Apr 03, 2025

fix: adding missing file (#501) · 6795e645
Ryan Olson authored Apr 03, 2025


02 Apr, 2025 1 commit
- feat: kv aware router executable (#399) · c4106e6a
  Ryan Olson authored Apr 02, 2025
  
01 Apr, 2025 2 commits
- feat: unified logging (#472) · 5b682f48
  Ryan Olson authored Apr 01, 2025
  
- fix: sglang worker log extraction error (#447) · 10682826
  Kiv Chen authored Apr 01, 2025
  
31 Mar, 2025 3 commits
- refactor: prometheus upgrade (#452) · de290537
  Ryan Olson authored Mar 31, 2025
  
- chore: Upgrade llamacpp dependency (#449) · 5f9d1fc3
  Graham King authored Mar 31, 2025
  
- fix: potential out-of-bound (#420) · 2dc18dbc
  Tianer Zhou authored Mar 31, 2025
```
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
```
28 Mar, 2025 1 commit
- feat: dynamo deploy hello world example to k8s (#205) · 8621d914
  Biswa Panda authored Mar 28, 2025
  
26 Mar, 2025 1 commit
- fix: disabling sse keep-alive (#408) · 50564320
  Ryan Olson authored Mar 26, 2025
  
25 Mar, 2025 1 commit

feat: Allow passing any arguments to vllm and sglang engines (#368) · 670661f6

Graham King authored Mar 25, 2025

Put the arguments in a JSON file:
```
{
    "dtype": "half",
    "trust_remote_code": true
}
```

Pass it like this:
```
dynamo-run out=sglang ~/llm_models/Llama-3.2-3B-Instruct --extra-engine-args sglang_extra.json
```

Requested here https://github.com/ai-dynamo/dynamo/issues/290 (`dtype`) and here https://github.com/ai-dynamo/dynamo/issues/360 (`trust_remote_code`).


24 Mar, 2025 1 commit

feat: Build pre-processor from GGUF (#344) · c7067fc2

Graham King authored Mar 24, 2025

This lets us do:
```
dynamo-run out=llamacpp <gguf_file>
```

Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.


21 Mar, 2025 1 commit
- chore: add warn log when fix_venv failed (#338) · aa21a03b
  zhaohaidao authored Mar 22, 2025
  
20 Mar, 2025 1 commit

feat: add more useful APIs for tokens (#313) · d4d93b6a

Nora authored Mar 20, 2025



Add `AsMut`, `DerefMut` and `IntoIterator` trait impls for the `Tokens` structure.
Signed-off-by: nora-coder-dot <nora6677@gmail.com>
Co-authored-by: nora-coder-dot <nora6677@gmail.com>
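
As a rough illustration of what these three impls make possible, here is a sketch on a stand-in type. The `Tokens(Vec<u32>)` newtype below is an assumption for the example; the real structure's fields and element type may differ.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the crate's `Tokens` type: a newtype over token ids.
#[derive(Debug, Clone, PartialEq, Default)]
struct Tokens(Vec<u32>);

// `Deref` is required before `DerefMut` can be implemented.
impl Deref for Tokens {
    type Target = [u32];
    fn deref(&self) -> &[u32] {
        &self.0
    }
}

// Lets callers use mutable slice methods and index assignment directly.
impl DerefMut for Tokens {
    fn deref_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

// Lets `Tokens` be passed to generic code bounded by `AsMut<[u32]>`.
impl AsMut<[u32]> for Tokens {
    fn as_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

// Lets `Tokens` be consumed in a `for` loop or collected by value.
impl IntoIterator for Tokens {
    type Item = u32;
    type IntoIter = std::vec::IntoIter<u32>;
    fn into_iter(self) -> Self::IntoIter {
        self.0.into_iter()
    }
}
```

With these in place, `tokens[0] = 10` works through `DerefMut`, `tokens.as_mut()` hands out a mutable slice, and `for id in tokens { ... }` consumes the value.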


19 Mar, 2025 4 commits

fix: update crates metadata (#264) · 68d953f7
Anant Sharma authored Mar 19, 2025
```
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
```

chore: Don't depend on openssl (#292) · 7c3fd5c9

Graham King authored Mar 19, 2025

This makes all the Rust parts use the ring / rustls libraries instead of a local install of openssl. It's a step on the journey to being statically linked.

Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with feature flag).
- Move shared dependencies up into workspace
- The new `rand` crate version renames some APIs for compatibility with future Rust.
- Ensure the dependency doesn't creep back in by enforcing it with cargo deny.


fix(mistralrs): Disable paged attention (#234) · fd95f37b

Graham King authored Mar 19, 2025

Under load it sometimes drops a request. The request gets added to the batch (sequence) and immediately gets a FinishReason Stop. Not sure why. It doesn't happen with the default scheduler (non-paged attention), so switch to that for now.


fix(dynamo-run): Fix build if llamacpp and mistralrs are disabled (#262) · 3ac95a90
Graham King authored Mar 19, 2025


18 Mar, 2025 2 commits
- docs: fix links in docs (#256) · 548578f4
  Dmitry Tokarev authored Mar 18, 2025
```
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
- fix: temporary documentation for crates.io (#255) · 1ccd4caa
  Harrison Saturley-Hall authored Mar 18, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```