Commits · d346782c7c4bcbbb5ad0fdf0deeace42506ce887 · OpenDAS / dynamo

25 Apr, 2025 1 commit

chore: Publish Model Deployment Card to NATS (#799) · d346782c

Graham King authored Apr 25, 2025

This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), which runs on a different machine than the worker, to be able to see those three files.

To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC, with those NATS URLs, to the key-value store.

The key-value store has an interface so any store (NATS, etcd, Redis, etc.) can be supported. Implementations for memory and NATS are provided.
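
As a rough illustration of the "interface plus implementations" idea, here is a minimal sketch of a storage trait with an in-memory backend. Everything in it is an assumption made for illustration: the names `KeyValueStore`, `MemoryStore` and `publish_card`, the `mdc/<model>` key layout, and the synchronous signatures are hypothetical and are not taken from this PR (the real interface is likely async and shaped differently).

```rust
use std::collections::HashMap;

/// Hypothetical storage interface. Any backend (memory, NATS, etcd, ...)
/// would implement these two operations.
pub trait KeyValueStore {
    /// Store `value` under `key`, returning the previous value if there was one.
    fn put(&mut self, key: &str, value: Vec<u8>) -> Option<Vec<u8>>;
    /// Fetch the value stored under `key`, if any.
    fn get(&self, key: &str) -> Option<&[u8]>;
}

/// In-memory backend, handy for tests and single-process runs.
#[derive(Default)]
pub struct MemoryStore {
    entries: HashMap<String, Vec<u8>>,
}

impl KeyValueStore for MemoryStore {
    fn put(&mut self, key: &str, value: Vec<u8>) -> Option<Vec<u8>> {
        self.entries.insert(key.to_string(), value)
    }

    fn get(&self, key: &str) -> Option<&[u8]> {
        self.entries.get(key).map(|v| v.as_slice())
    }
}

/// Publish a serialized card under a hypothetical `mdc/<model>` key.
/// The caller is written against the trait, so the backend can be swapped.
pub fn publish_card<S: KeyValueStore>(store: &mut S, model_name: &str, card_json: &str) -> String {
    let key = format!("mdc/{}", model_name);
    store.put(&key, card_json.as_bytes().to_vec());
    key
}
```

Because `publish_card` only depends on the trait, a NATS-backed store could be substituted without touching the publishing code.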

Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit.

Part of #743

d346782c

24 Apr, 2025 1 commit
- feat: Warm‑up mistral.rs engine to reduce latency on subsequent requests (#796) · 4761baa6
  Abrar Shivani authored Apr 24, 2025
```
Send a warm‑up request to the mistralrs engine so that subsequent requests are faster.
```
  4761baa6
21 Apr, 2025 4 commits
- fix: Fix cancellation flow in python component graph (#765) · 420b7a82
  Pankaj Gupta authored Apr 21, 2025
  
  420b7a82
- feat: add custom lease to worker components (#748) · c392c341
  ishandhanani authored Apr 21, 2025
  
  c392c341
- chore(dynamo-run): Fix echo_core for EOS tokens (#759) · 4e75b04b
  Graham King authored Apr 21, 2025
```
"echo_core" is an engine that echoes the post-processed request back to you so you can see the template. Good for testing. It needed an extra flag set to work correctly.
```
  4e75b04b
- feat: add additional packages to log filters (#752) · ee865ca0
  Abrar Shivani authored Apr 21, 2025
  
  ee865ca0
18 Apr, 2025 3 commits
- chore: Remove TRT-LLM C++ engine in favor of Python one (#747) · 675a9bf5
  Graham King authored Apr 18, 2025
  
  675a9bf5
- feat: gracefully shutdown endpoint by revoking etcd lease + python binding (#730) · 4c38680e
  Hongkuan Zhou authored Apr 18, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
  4c38680e
- feat(dynamo-engine-vllm): vllm 0.8.X support (#728) · a745a980
  Graham King authored Apr 18, 2025
```
It's different enough that I made a new engine vllm0_8 and renamed the previous engine to vllm0_7.

`dynamo-run out=vllm` now expects 0.8. This matches the container change in #690.

For older versions, use `dynamo-run out=vllm0_7`.
```
  a745a980
17 Apr, 2025 3 commits
- feat: configure logger with detail info (#654) · 50aa390b
  tlipoca9 authored Apr 18, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
  50aa390b
- feat: adding dynamo-tokens crate (#718) · 99b76ba4
  Ryan Olson authored Apr 17, 2025
  
  99b76ba4
- docs: Remove outdated python-wheels directory reference (#719) · f4780e85
  Ryan McCormick authored Apr 16, 2025
  
  f4780e85
12 Apr, 2025 1 commit

feat: ETCD prefix watcher + python binding + runtime reconfiguration for... · 08fd2897

Hongkuan Zhou authored Apr 11, 2025

feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router (#581)
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

08fd2897

11 Apr, 2025 1 commit
- docs: add docstring for llm.rs (#267) · 447840c2
  Cole authored Apr 10, 2025
  
  447840c2
09 Apr, 2025 2 commits
- feat: Extract Common Configs + Log Configs on Init + Add `test_` to... · 0292feb5
  jon-chuang authored Apr 09, 2025
```
feat: Extract Common Configs + Log Configs on Init + Add `test_` to `sdk/tests` filenames required for pytest (#434)
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
  0292feb5
- chore: update versions to 0.1.1 (#552) · fa7ee14c
  Anant Sharma authored Apr 09, 2025
  
  fa7ee14c
07 Apr, 2025 1 commit

feat(dynamo-run): Basic routing choice (#524) · ec2e7307

Graham King authored Apr 07, 2025

As a first step towards KV routing:
- Introduce a `--router-mode` flag in dynamo-run that only does random and round-robin right now. Not that interesting yet.
- Make the vllm engine publish the KV events received from our patched vllm.

Now we "just" need to connect the two. Easy right?
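
For readers unfamiliar with the two modes, the following is a minimal sketch of random and round-robin worker selection. It is not the dynamo-run implementation: `RouterMode`, `Router`, `pick` and the seed parameter are hypothetical names, and the random branch uses a tiny xorshift generator purely as a dependency-free stand-in for a real RNG.

```rust
/// Hypothetical mirror of the two routing modes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RouterMode {
    Random,
    RoundRobin,
}

pub struct Router {
    mode: RouterMode,
    workers: Vec<String>,
    next: usize, // round-robin cursor
    state: u64,  // xorshift state (stand-in for a real RNG)
}

impl Router {
    pub fn new(mode: RouterMode, workers: Vec<String>, seed: u64) -> Self {
        // xorshift must not start from zero, so clamp the seed to at least 1.
        Router { mode, workers, next: 0, state: seed.max(1) }
    }

    /// Choose the worker for the next request, or `None` if there are no workers.
    pub fn pick(&mut self) -> Option<&str> {
        if self.workers.is_empty() {
            return None;
        }
        let idx = match self.mode {
            RouterMode::RoundRobin => {
                // Hand out workers in order, wrapping around at the end.
                let i = self.next;
                self.next = (self.next + 1) % self.workers.len();
                i
            }
            RouterMode::Random => {
                // One xorshift64 step, then reduce to a valid index.
                self.state ^= self.state << 13;
                self.state ^= self.state >> 7;
                self.state ^= self.state << 17;
                (self.state % self.workers.len() as u64) as usize
            }
        };
        Some(self.workers[idx].as_str())
    }
}
```

Neither mode looks at worker state, which is why the commit calls this only a first step: a KV-aware mode would need the published KV events as an extra input to `pick`.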

ec2e7307

04 Apr, 2025 3 commits

feat: KV recorder for dumping router events into a jsonl (#505) · 4b6cfc1b
Yan Ru Pei authored Apr 04, 2025

4b6cfc1b

chore: Upgrade Rust to 1.86 (#518) · e99aa1e1

Graham King authored Apr 04, 2025

Also upgrade the cargo resolver to v3, the default.

New clippy lints:
- `next_back()` instead of `last()` for a double-ended iterator. That avoids walking the whole list.
- `repeat_n` instead of `repeat().take()`. That avoids an extra clone.
- Doc indenting
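
The first two lints can be illustrated with a short sketch. The helper names `last_token` and `padding` are made up for this example and do not come from the repository; `std::iter::repeat_n` needs a recent toolchain (it was stabilized in Rust 1.82).

```rust
use std::iter;

/// Last element of a slice via a double-ended iterator.
/// `next_back()` takes the element from the back directly, whereas the
/// default `Iterator::last()` consumes the iterator from the front.
pub fn last_token(tokens: &[u32]) -> Option<u32> {
    tokens.iter().next_back().copied()
}

/// Build `n` copies of `pad`.
/// `repeat_n` can move the original value out for the final item, while
/// `repeat(..).take(n)` clones for every item it yields.
pub fn padding(pad: &str, n: usize) -> Vec<String> {
    iter::repeat_n(pad.to_string(), n).collect()
}
```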

e99aa1e1

feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425

Graham King authored Apr 04, 2025

Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd.

Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.

For NIM.

88ad3425

03 Apr, 2025 3 commits

refactor: migrate engines to standalone crates (#453) · 84985d3f

Ryan Olson authored Apr 03, 2025

Moved all of `lib/llm/src/engines` to their own crates, e.g. `lib/engines/mistralrs`. This will allow publishing of the `dynamo-llm` crate as it won't have any GitHub dependencies.

The only engines in dynamo-llm will be the demo `echo` ones.
Co-authored-by: Graham King <grahamk@nvidia.com>

84985d3f

chore: rename duration to timeout (#503) · 3c49a02c
tlipoca9 authored Apr 03, 2025

3c49a02c
fix: adding missing file (#501) · 6795e645
Ryan Olson authored Apr 03, 2025

6795e645

02 Apr, 2025 1 commit
- feat: kv aware router executable (#399) · c4106e6a
  Ryan Olson authored Apr 02, 2025
  
  c4106e6a
01 Apr, 2025 2 commits
- feat: unified logging (#472) · 5b682f48
  Ryan Olson authored Apr 01, 2025
  
  5b682f48
- fix: sglang worker log extraction error (#447) · 10682826
  Kiv Chen authored Apr 01, 2025
  
  10682826
31 Mar, 2025 3 commits
- refactor: prometheus upgrade (#452) · de290537
  Ryan Olson authored Mar 31, 2025
  
  de290537
- chore: Upgrade llamacpp dependency (#449) · 5f9d1fc3
  Graham King authored Mar 31, 2025
  
  5f9d1fc3
- fix: potential out-of-bound (#420) · 2dc18dbc
  Tianer Zhou authored Mar 31, 2025
```
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
```
  2dc18dbc
28 Mar, 2025 1 commit
- feat: dynamo deploy hello world example to k8s (#205) · 8621d914
  Biswa Panda authored Mar 28, 2025
  
  8621d914
26 Mar, 2025 1 commit
- fix: disabling sse keep-alive (#408) · 50564320
  Ryan Olson authored Mar 26, 2025
  
  50564320
25 Mar, 2025 1 commit

feat: Allow passing any arguments to vllm and sglang engines (#368) · 670661f6

Graham King authored Mar 25, 2025

Put the arguments in a JSON file:
```
{
    "dtype": "half",
    "trust_remote_code": true
}
```

Pass it like this:
```
dynamo-run out=sglang ~/llm_models/Llama-3.2-3B-Instruct --extra-engine-args sglang_extra.json
```

Requested here https://github.com/ai-dynamo/dynamo/issues/290 (`dtype`) and here https://github.com/ai-dynamo/dynamo/issues/360 (`trust_remote_code`).

670661f6

24 Mar, 2025 1 commit

feat: Build pre-processor from GGUF (#344) · c7067fc2

Graham King authored Mar 24, 2025

This lets us do:
```
dynamo-run out=llamacpp <gguf_file>
```

Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.

c7067fc2

21 Mar, 2025 1 commit
- chore: add warn log when fix_venv failed (#338) · aa21a03b
  zhaohaidao authored Mar 22, 2025
  
  aa21a03b
20 Mar, 2025 1 commit

feat: add more useful APIs for tokens (#313) · d4d93b6a

Nora authored Mar 20, 2025



Add `AsMut`, `DerefMut` and `IntoIterator` trait impl for the `Tokens` structure.
Signed-off-by: nora-coder-dot <nora6677@gmail.com>
Co-authored-by: nora-coder-dot <nora6677@gmail.com>
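
To show what these three trait impls enable, here is a minimal sketch on a stand-in type. The `Tokens(Vec<u32>)` definition below is an assumption for illustration; the real `Tokens` structure may wrap a different token type or carry extra fields.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the real `Tokens` structure.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Tokens(Vec<u32>);

impl From<Vec<u32>> for Tokens {
    fn from(v: Vec<u32>) -> Self {
        Tokens(v)
    }
}

impl Deref for Tokens {
    type Target = [u32];
    fn deref(&self) -> &[u32] {
        &self.0
    }
}

/// `DerefMut` exposes the mutating slice methods (`sort`, `reverse`, ...).
impl DerefMut for Tokens {
    fn deref_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

/// `AsMut` lets generic code accept `Tokens` wherever `&mut [u32]` is wanted.
impl AsMut<[u32]> for Tokens {
    fn as_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

/// `IntoIterator` allows `for token in tokens` and iterator adaptors by value.
impl IntoIterator for Tokens {
    type Item = u32;
    type IntoIter = std::vec::IntoIter<u32>;
    fn into_iter(self) -> Self::IntoIter {
        self.0.into_iter()
    }
}
```

With these in place, a caller can write `tokens.sort()` or `for t in tokens { ... }` without reaching into the wrapper's inner vector.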

d4d93b6a

19 Mar, 2025 4 commits

fix: update crates metadata (#264) · 68d953f7
Anant Sharma authored Mar 19, 2025
```
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
```
68d953f7

chore: Don't depend on openssl (#292) · 7c3fd5c9

Graham King authored Mar 19, 2025

This makes all the Rust parts use the ring / rustls libraries instead of a local install of OpenSSL. It's a step on the journey to being statically linked.

Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with feature flag).
- Move shared dependencies up into workspace
- The new `rand` crate release renames some items for compatibility with future Rust.
- Ensure the dependency doesn't creep back in by enforcing it with cargo deny.

7c3fd5c9

fix(mistralrs): Disable paged attention (#234) · fd95f37b

Graham King authored Mar 19, 2025

Under load it sometimes drops a request. The request gets added to the batch (sequence) and immediately gets a FinishReason Stop. Not sure why. It doesn't happen with the default scheduler (non-paged attention), so switch to that for now.

fd95f37b

fix(dynamo-run): Fix build if llamacpp and mistralrs are disabled (#262) · 3ac95a90
Graham King authored Mar 19, 2025

3ac95a90

18 Mar, 2025 1 commit
- docs: fix links in docs (#256) · 548578f4
  Dmitry Tokarev authored Mar 18, 2025
```
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
  548578f4