Commits · f11fc3f3781c16687692e57215a501a2b6e4fe4b · OpenDAS / dynamo

25 Jun, 2025 2 commits
- fix: Disable NIXL backend for TRTLLM on ARM (#1639) · e84b1e77
  Tanmay Verma authored Jun 25, 2025
  
  e84b1e77
- chore: Add SERVED_MODEL_NAME for consistent model name regardless of MODEL_PATH (#1632) · 2becce56
  Ryan McCormick authored Jun 25, 2025
  
  2becce56
24 Jun, 2025 1 commit
- feat: Using NIXL for KV cache transfer when using disaggregated serving in TRTLLM (#1591) · 0b7cdf55
  Tanmay Verma authored Jun 24, 2025
```
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  0b7cdf55
17 Jun, 2025 1 commit
- fix: Fix message truncation in disagg flow (#1572) · 84454ab4
  Tanmay Verma authored Jun 17, 2025
  
  84454ab4
16 Jun, 2025 1 commit
- build: DIS-148 use the tensorrt_llm public wheel from pypi by default in container build (#1525) · 47d05d7a
  richardhuo-nv authored Jun 16, 2025
  
  47d05d7a
13 Jun, 2025 2 commits
- fix: Fix NATS_SERVER value, add details on customizing MOUNTS (#1520) · ae7e08a3
  Ryan McCormick authored Jun 14, 2025
  
  ae7e08a3
- docs: Add multi-node TRTLLM worker example (Deepseek R1) (#1511) · 40ca062f
  Ryan McCormick authored Jun 14, 2025
  
  40ca062f
12 Jun, 2025 1 commit
- docs: fix the README link to the perf.sh file (#1501) · 0bba09a4
  richardhuo-nv authored Jun 12, 2025
  
  0bba09a4
11 Jun, 2025 3 commits
- docs: Add note about ignore_eos for MTP (#1475) · 150e983a
  Ryan McCormick authored Jun 12, 2025
  
  150e983a
- docs: MTP + TensorRT LLM + DS R1 disaggregated example (#1473) · 3363d8b6
  richardhuo-nv authored Jun 11, 2025
  
  3363d8b6
- docs: add message to guide users to the stable version (#1457) · e32fe675
  richardhuo-nv authored Jun 11, 2025
  
  e32fe675
07 Jun, 2025 2 commits
- docs: Reference Deepseek R1 configs in TRTLLM README (#1414) · 9281c95f
  Ryan McCormick authored Jun 08, 2025
  
  9281c95f
- docs: add aggregated example turning on MTP with DeepSeek R1 (#1421) · 4de7f44c
  richardhuo-nv authored Jun 06, 2025
  
  4de7f44c
05 Jun, 2025 1 commit
- fix: Use Rust Ingress (dynamo-run) for the Frontend (#1391) · 568eb100
  Tanmay Verma authored Jun 04, 2025
  
  568eb100
04 Jun, 2025 2 commits
- feat: decouple bento dependency (#1266) · afb8495e
  Biswa Panda authored Jun 04, 2025
  
  afb8495e
- fix: add speculative decoding config to dynamo serve + trtllm (#1356) · 5c9a2d49
  richardhuo-nv authored Jun 04, 2025
  
  5c9a2d49
02 Jun, 2025 1 commit
- fix: Flatten pytorch_backend_config section to address breaking change to trtllm config (#1326) · d9f6d7a5
  Ryan McCormick authored Jun 03, 2025
  
  d9f6d7a5
31 May, 2025 1 commit
- fix: Update breaking change to enable_overlap_scheduler field from TRTLLM commit b4e5df0e (#1310) · 859944f4
  Ryan McCormick authored May 31, 2025
  
  859944f4
30 May, 2025 2 commits
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
- fix: Fix mypy errors on trtllm examples (#1277) · 003c4270
  Tanmay Verma authored May 29, 2025
  
  003c4270
29 May, 2025 1 commit

- feat: expose estimated kv cache hit in dynamo-run (#1246) · c9eb6a83
  Hongkuan Zhou authored May 29, 2025

  Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

  c9eb6a83

27 May, 2025 1 commit
- docs: fix minor typo (#1206) · a8bdc0be
  Akash authored May 28, 2025
```
Signed-off-by: Akash <akpaul@nvidia.com>
```
  a8bdc0be
21 May, 2025 2 commits
- feat: rename dynamo decorator (#1133) · 6d46288c
  Biswa Panda authored May 21, 2025
  
  6d46288c
- fix: Fix the protocol in the example (#1146) · 84377e5d
  Tanmay Verma authored May 21, 2025
  
  84377e5d
20 May, 2025 1 commit
- fix: Incrementally decode token to reduce the overhead from Processor (#1129) · b3da9427
  Tanmay Verma authored May 20, 2025
  
  b3da9427
19 May, 2025 1 commit

- feat: Support multiple models on single ingress node (#1127) · aeb79e62
  Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
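
The `dyn://` addresses in the commands above appear to follow a `pipeline.component.endpoint` layout (for example, `nemotron_ultra.backend.generate`). The following is a minimal sketch, assuming that three-part layout, of how discovered instances could be grouped by pipeline. `DynEndpoint`, `parse_dyn_address`, and `group_by_pipeline` are hypothetical helpers written for illustration; they are not part of dynamo-run and do not reflect its actual discovery code.

```python
from typing import Dict, Iterable, List, NamedTuple


class DynEndpoint(NamedTuple):
    """Hypothetical three-part view of a dyn:// address (assumed layout)."""
    pipeline: str
    component: str
    endpoint: str


def parse_dyn_address(address: str) -> DynEndpoint:
    """Split 'dyn://pipeline.component.endpoint' into its three parts."""
    prefix = "dyn://"
    if not address.startswith(prefix):
        raise ValueError(f"not a dyn:// address: {address!r}")
    parts = address[len(prefix):].split(".")
    # Assumption: exactly three non-empty, dot-separated parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected pipeline.component.endpoint: {address!r}")
    return DynEndpoint(*parts)


def group_by_pipeline(addresses: Iterable[str]) -> Dict[str, List[DynEndpoint]]:
    """Group parsed addresses by pipeline name, keeping one entry per instance."""
    groups: Dict[str, List[DynEndpoint]] = {}
    for address in addresses:
        ep = parse_dyn_address(address)
        groups.setdefault(ep.pipeline, []).append(ep)
    return groups


# Four worker instances as in the example: two per pipeline.
instances = [
    "dyn://nemotron_ultra.backend.generate",
    "dyn://nemotron_ultra.backend.generate",
    "dyn://nemotron_super.backend.generate",
    "dyn://nemotron_super.backend.generate",
]
groups = group_by_pipeline(instances)
for pipeline, endpoints in groups.items():
    print(pipeline, len(endpoints))
```

Grouping by the first part of the address mirrors the behaviour described in the commit message, where a single `out=dyn` ingress routes to more than one pipeline.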

aeb79e62

16 May, 2025 1 commit
- chore: Add example TRTLLM configs for Deepseek R1 (GB200) (#1099) · b6774b88
  Ryan McCormick authored May 15, 2025
  
  b6774b88
15 May, 2025 1 commit
- feat: Add ignore_eos/nvext support for legacy completions (#1080) · 7275d496
  Ryan McCormick authored May 14, 2025
  
  7275d496
09 May, 2025 1 commit
- feat: decouple dynamo sdk to support mutiple deployment targets (#905) · d675d221
  Biswa Panda authored May 08, 2025
  
  d675d221
08 May, 2025 1 commit
- docs: Add slurm env var workaround for MPI spawn errors (#992) · 57402e70
  Ryan McCormick authored May 08, 2025
  
  57402e70
07 May, 2025 2 commits
- fix: Check nvext for ignore_eos and set min_tokens for benchmark consistency (#988) · 0a894cc3
  Ryan McCormick authored May 07, 2025
  
  0a894cc3
- build: Cleans the TensorRTLLM + Dynamo container build (#968) · 7dd79013
  Tanmay Verma authored May 07, 2025
```
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  7dd79013
06 May, 2025 1 commit
- refactor: refactor dynamo deploy subfolder (#927) · 403344e5
  hhzhang16 authored May 06, 2025
  
  403344e5
02 May, 2025 2 commits
- feat: Update to support completion endpoint in TRTLLM (#837) · 960ee927
  Tanmay Verma authored May 02, 2025
  
  960ee927
- docs: Add multi-node TRTLLM steps to README (#930) · f0ac8e2b
  Ryan McCormick authored May 02, 2025
  
  f0ac8e2b
01 May, 2025 1 commit
- fix: add dedicated llmapi config for trtllm disagg kv routing example (#916) · 0086ebc6
  Ziqi Fan authored Apr 30, 2025
  
  0086ebc6
30 Apr, 2025 1 commit
- fix: trtllm example (#909) · 49517f2a
  Biswa Panda authored Apr 30, 2025
  
  49517f2a
29 Apr, 2025 1 commit

- refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change (#866) · 3c1c2ac3
  Ziqi Fan authored Apr 28, 2025
  
  3c1c2ac3

28 Apr, 2025 2 commits

- fix: change the processor number to 5 to reduce the tokenization bottleneck (#865) · 6630fa5c
  richardhuo-nv authored Apr 28, 2025

  We observed a 40% performance drop compared with `trtllm serve` when benchmarking with isl=1000 and osl=200 at concurrency levels above 128.

  The number of tokenization workers was the bottleneck. After increasing the number of tokenization processors to 5, dynamo's benchmark performance matched that of `trtllm serve`.

  6630fa5c

- feat: Add unified x86 / aarch64 (ARM) build for VLLM image (#839) · 566068dc
  Ryan McCormick authored Apr 28, 2025
  
  566068dc