Commits · 39d645e58647d6adb074650e46be5de25f3f3bc6 · OpenDAS / dynamo

12 Feb, 2026 1 commit
- docs: migrate Fern docs from fern/ into docs/ (#6206) · 39d645e5
  Jonathan Tong authored Feb 11, 2026
```
Signed-off-by: Jont828 <jt572@cornell.edu>
```
  39d645e5
07 Feb, 2026 1 commit
- docs: full migration of docs/ to fern format in fern/ (#6050) · 2c3066bd
  dagil-nvidia authored Feb 06, 2026
```
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
```
  2c3066bd
30 Jan, 2026 1 commit
- docs: update Fern docs for main branch (#5706) · 7ca6a562
  Jonathan Tong authored Jan 30, 2026
```
Signed-off-by: Jont828 <jt572@cornell.edu>
```
  7ca6a562
26 Jan, 2026 1 commit

- docs: migrate existing docs to fern (#5445) · f9050aae
  Jonathan Tong authored Jan 26, 2026
  Signed-off-by: Jont828 <jt572@cornell.edu>
  Signed-off-by: Neal Vaidya <nealv@nvidia.com>
  Co-authored-by: Neal Vaidya <nealv@nvidia.com>
  f9050aae

21 Nov, 2025 1 commit
- chore: merge KvIndexer and ApproxKvIndexer (#4500) · c61e0dd3
  Yan Ru Pei authored Nov 21, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  c61e0dd3
19 Nov, 2025 1 commit
- feat: Only monitor NATS metrics if using NATS request plane (#4442) · 69797b5a
  Graham King authored Nov 19, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  69797b5a
11 Nov, 2025 1 commit
- chore: Remove static mode (#4235) · e1af3af6
  Graham King authored Nov 11, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  e1af3af6
31 Oct, 2025 1 commit
- refactor: move backend deploy, launch and slurm files from components to examples (#3849) · 8bd37c96
  Anant Sharma authored Oct 31, 2025
```
Signed-off-by: Anant Sharma <anants@nvidia.com>
```
  8bd37c96
22 Oct, 2025 1 commit

- docs: address Harry/VDR feedback + fixing broken links across repository (#3802) · c6b59045
  Anish authored Oct 22, 2025
  Signed-off-by: Harry Kim <harry_kim@live.com>
  Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
  Signed-off-by: akshatha-k <33278067+akshatha-k@users.noreply.github.com>
  Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
  Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
  Co-authored-by: Harry Kim <harry_kim@live.com>
  Co-authored-by: Claude <noreply@anthropic.com>
  Co-authored-by: akshatha-k <33278067+akshatha-k@users.noreply.github.com>
  Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
  c6b59045

16 Oct, 2025 1 commit
- docs: reorganizing documentation to make things clearer (#3658) · 598cbbb7
  Anish authored Oct 16, 2025
```
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Co-authored-by: Claude <noreply@anthropic.com>
```
  598cbbb7
08 Oct, 2025 2 commits
- chore: Remove llama.cpp engine (#3499) · 0aa0768f
  Graham King authored Oct 08, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  0aa0768f
- chore: Remove GGUF support (#3488) · 1b1265e6
  Graham King authored Oct 08, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  1b1265e6
16 Sep, 2025 1 commit
- fix: Interactive inputs actually stops, does not ignore stop token (#3057) · 87e6e052
  Graham King authored Sep 16, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  87e6e052
03 Sep, 2025 1 commit

- refactor: Split ModelType to ModelInput for request and response type;... · 27fad26f
  Olga Andreeva authored Sep 03, 2025
  refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads (#2714)
  Signed-off-by: Guan Luo <gluo@nvidia.com>
  Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
  Co-authored-by: Guan Luo <gluo@nvidia.com>
  Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
  27fad26f

02 Sep, 2025 1 commit
- feat: FT Request Cancellation feature and test for 0.5.0 (#2500) · 6c539fbd
  Jacky authored Sep 02, 2025
  
  6c539fbd
06 Aug, 2025 2 commits
- docs(dynamo-run): Remove vllm/sglang/trtllm engines from dynamo-run docs (#2332) · 6be5c196
  Graham King authored Aug 06, 2025
  
  6be5c196
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
  6a1a801c
05 Aug, 2025 1 commit
- feat: Pass user_data to register_llm for LoRA support (#2286) · 433f6012
  Chi authored Aug 05, 2025
  
  433f6012
01 Aug, 2025 1 commit

- test: Request Migration Docs and E2E vLLM Tests (#2177) · ae51b3f4
  Jacky authored Aug 01, 2025
  Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
  Co-authored-by: Neelay Shah <neelays@nvidia.com>
  ae51b3f4

28 Jul, 2025 1 commit
- chore: Add Request Migration docs and minor enhancements (#2038) · fdcf611f
  Jacky authored Jul 28, 2025
  
  fdcf611f
22 Jul, 2025 1 commit
- docs: Cleanup index.rst (#2007) · c49a13eb
  atchernych authored Jul 22, 2025
  
  c49a13eb
18 Jul, 2025 1 commit

- feat: enable / disable chunked prefill for mockers (#2015) · e330d969
  Yan Ru Pei authored Jul 18, 2025
  Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
  e330d969

17 Jul, 2025 1 commit
- feat(runtime): Support tokio-console (#1986) · 1eadc013
  Graham King authored Jul 17, 2025
  
  1eadc013
16 Jul, 2025 1 commit
- feat: integrate mocker with dynamo-run and python cli (#1927) · f31732a2
  Yan Ru Pei authored Jul 16, 2025
  
  f31732a2
14 Jul, 2025 1 commit
- feat: prefill aware routing (#1895) · df91fce2
  Yan Ru Pei authored Jul 14, 2025
  
  df91fce2
10 Jul, 2025 1 commit
- feat: allow using ApproxKvIndexer for routing via use_kv_events flag (#1869) · 13640e15
  Yan Ru Pei authored Jul 10, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>
```
  13640e15
08 Jul, 2025 1 commit

- feat: predictive active blocks for routing without load metrics (#1731) · 84e71e27
  Yan Ru Pei authored Jul 08, 2025
  Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
  Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
  84e71e27

02 Jul, 2025 1 commit
- chore: fix typo for dynamo-run docs (#1720) · 7fd379a7
  Zhongdongming Dai authored Jul 02, 2025
  
  7fd379a7
30 Jun, 2025 1 commit
- docs: Update dynamo_run.md with the information how to resolve ModuleNotFou… (#1691) · 8f485b18
  tzulingk authored Jun 30, 2025
  
  8f485b18
12 Jun, 2025 1 commit

- docs: DIS-133 and DIS-134 plus copyediting (#1439) · 0e7d4d82
  Kristen Kelleher authored Jun 12, 2025
  Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  0e7d4d82

04 Jun, 2025 1 commit

- docs: fix sphinx errors admonitions adobe config (#1179) · 5e9370d3
  Kristen Kelleher authored Jun 04, 2025
  Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
  - Content, format, and structural changes to the Dynamo docs for 0.3.0.
  - Includes copyediting and the first batch of changes from the DMO review.
  5e9370d3

03 Jun, 2025 1 commit
- docs: Add documentation for verbosity flag in `dynamo-run` (#1353) · 9bf79b67
  Paul Hendricks authored Jun 03, 2025
  
  9bf79b67
02 Jun, 2025 2 commits
- feat: Make llama.cpp Gnu OpenMP dependency optional (#1331) · d3ca7661
  Graham King authored Jun 02, 2025
```
Do not include by default as it needs libgomp1 at runtime. Add a feature to enable it at build time.
```
  d3ca7661
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
29 May, 2025 1 commit
- chore: Make llama.cpp a default engine (#1177) · b889948c
  Graham King authored May 29, 2025
  
  b889948c
28 May, 2025 1 commit
- feat: Enable dynamo-run out=trtllm (#1223) · 1b1e089a
  Tanmay Verma authored May 28, 2025
  
  1b1e089a
22 May, 2025 2 commits

- feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32
  Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.

Previously hard coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
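
To make the effect of the block size concrete, here is a minimal sketch of how many KV cache blocks a sequence would occupy. It assumes the block size is measured in tokens per block; the function name `blocks_needed` and the constant name are hypothetical and are not taken from the dynamo code base.

```python
# Illustrative only: assumes --kv-cache-block-size is a token count per block.
DEFAULT_KV_CACHE_BLOCK_SIZE = 16  # the default stated in the commit message


def blocks_needed(num_tokens: int, block_size: int = DEFAULT_KV_CACHE_BLOCK_SIZE) -> int:
    """Return how many fixed-size blocks are needed to hold num_tokens tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Ceiling division: a partially filled block still occupies a whole block.
    return (num_tokens + block_size - 1) // block_size


if __name__ == "__main__":
    for size in (16, 64):
        print(f"block size {size}: {blocks_needed(1000, size)} blocks for 1000 tokens")
```

Under this assumption, a larger block size means fewer blocks to track per sequence, at the cost of more unused slots in the last block.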

183f2b32

- feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821
  Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting stop_conditions.max_tokens. mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.
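
The first todo item could be sketched roughly as follows. This is not the dynamo pre-processor; `clamp_max_tokens` is a hypothetical helper, and it additionally assumes that the prompt's tokens count against the context length, which the commit message does not state.

```python
from typing import Optional


def clamp_max_tokens(
    prompt_tokens: int,
    context_length: int,
    requested_max_tokens: Optional[int] = None,
) -> int:
    """Limit a request's max_tokens so prompt plus output fits in the context.

    Hypothetical helper: assumes prompt tokens share the same budget as
    generated tokens.
    """
    # Tokens left for generation once the prompt is accounted for.
    remaining = max(context_length - prompt_tokens, 0)
    if requested_max_tokens is None:
        # No explicit limit requested: allow everything that still fits.
        return remaining
    return min(requested_max_tokens, remaining)


if __name__ == "__main__":
    print(clamp_max_tokens(prompt_tokens=4000, context_length=4096, requested_max_tokens=500))
```

A result of zero here means the prompt alone already fills or exceeds the context, which a real implementation would likely reject rather than pass through.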

6d5da821

21 May, 2025 2 commits

- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  3e8e38a9
- docs: Add sphinx-theme based userguides (#528) · 8d636ebd
  Suman Tatiraju authored May 21, 2025
  Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
  Signed-off-by: Anant Sharma <anants@nvidia.com>
  Co-authored-by: Anant Sharma <anants@nvidia.com>
  Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
  Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
  Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
  Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
  Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
  8d636ebd