Commits · 17827e1d55de1f2b9a919b3f37511cfb4b534502 · OpenDAS / dynamo

26 Mar, 2025 4 commits
- feat: Decode -> Prefill cached kv transfer (#340) · 17827e1d
  ptarasiewiczNV authored Mar 26, 2025
  
  17827e1d
- fix: limit rust build parallel jobs (#366) · 405222ce
  Hongkuan Zhou authored Mar 26, 2025
  
  405222ce
- chore: Bump bentoml version to 1.4.6 (#404) · c396fa3e
  Dmitry Tokarev authored Mar 26, 2025
  
  c396fa3e
- chore: more Pythonic kv router cleanups in examples (#396) · a544d823
  Yan Ru Pei authored Mar 25, 2025
  
  a544d823
25 Mar, 2025 2 commits

fix: add codeowners for examples (#325) · cce0c028
Sean SH Choi authored Mar 25, 2025

cce0c028

feat: Allow passing any arguments to vllm and sglang engines (#368) · 670661f6

Graham King authored Mar 25, 2025

Put the arguments in a JSON file:
```
{
    "dtype": "half",
    "trust_remote_code": true
}
```

Pass it like this:
```
dynamo-run out=sglang ~/llm_models/Llama-3.2-3B-Instruct --extra-engine-args sglang_extra.json
```

Requested here https://github.com/ai-dynamo/dynamo/issues/290 (`dtype`) and here https://github.com/ai-dynamo/dynamo/issues/360 (`trust_remote_code`).

670661f6

24 Mar, 2025 4 commits
- docs: Fix capitalization (#367) · a03dd474
  Yiming Cheng authored Mar 24, 2025
  
  a03dd474
- feat: Build pre-processor from GGUF (#344) · c7067fc2
  Graham King authored Mar 24, 2025
  
  This lets us do:
  ```
  dynamo-run out=llamacpp <gguf_file>
  ```
  
  Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.
  
  c7067fc2
- feat: conditional disagg based on prefill queue size (#303) · d29f7fcc
  Hongkuan Zhou authored Mar 24, 2025
  
  d29f7fcc
- fix: Attach lease to etcd key (#364) · d7165149
  Graham King authored Mar 24, 2025
  
  That ensures it gets removed when the process stops.
  
  d7165149
22 Mar, 2025 1 commit
- docs: fix grammar in architecture support note (#346) · c52c11fb
  Yiming Cheng authored Mar 22, 2025
  
  c52c11fb
21 Mar, 2025 6 commits
- chore: Clarified docs, added more informative error prints (#342) · 1831c9cc
  Olga Andreeva authored Mar 21, 2025
```
Co-authored-by: Olga Andreeva <oandreeva@oandreeva-mlt.client.nvidia.com>
```
  1831c9cc
- docs: Update support_matrix.md - list glibc min version (#341) · ac863f32
  Dmitry Tokarev authored Mar 21, 2025
  
  ac863f32
- chore: add warn log when fix_venv failed (#338) · aa21a03b
  zhaohaidao authored Mar 22, 2025
  
  aa21a03b
- docs: fix typo in dynamo_serve.md (#314) · 9242cfa0
  Ikko Eltociear Ashimine authored Mar 22, 2025
  
  9242cfa0
- docs: Update main and guide readmes (#332) · 66c6330a
  Harry Kim authored Mar 21, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
  66c6330a
- fix: use rustup-init for rust install (#319) · d1427ac5
  Anant Sharma authored Mar 21, 2025
  
  d1427ac5
20 Mar, 2025 5 commits

chore: KV router Pythonic cleanups (#324) · 43ff14ce
Yan Ru Pei authored Mar 20, 2025

43ff14ce
ci: Add External Contribution label (#322) · bb35f36f
Meenakshi Sharma authored Mar 20, 2025

bb35f36f

chore: Make debug profile use all optimizations (#317) · 00e54337

Graham King authored Mar 20, 2025

It hardly slows the build down, and it makes things run much faster. That allows us to switch to the debug (default) profile for development, and keep the release profile for, well, releasing.

Motivated by changes in https://github.com/ai-dynamo/dynamo/pull/279

00e54337

feat: add more useful APIs for tokens (#313) · d4d93b6a

Nora authored Mar 20, 2025



Add `AsMut`, `DerefMut`, and `IntoIterator` trait impls for the `Tokens` structure.

Signed-off-by: nora-coder-dot <nora6677@gmail.com>
Co-authored-by: nora-coder-dot <nora6677@gmail.com>
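
To illustrate what these trait impls make possible, here is a minimal sketch. It assumes `Tokens` is a newtype over `Vec<u32>`; that definition, the `From<Vec<u32>>` impl, and the `shift_and_sum` helper are hypothetical stand-ins for illustration, and the real type in the repository may differ.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the project's `Tokens` type (assumed layout).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Tokens(Vec<u32>);

impl From<Vec<u32>> for Tokens {
    fn from(ids: Vec<u32>) -> Self {
        Tokens(ids)
    }
}

// `Deref` is a prerequisite for `DerefMut`; it exposes read-only slice methods.
impl Deref for Tokens {
    type Target = [u32];
    fn deref(&self) -> &[u32] {
        &self.0
    }
}

// `DerefMut` exposes mutable slice methods such as `iter_mut` and indexing.
impl DerefMut for Tokens {
    fn deref_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

// `AsMut` lets generic code borrow the tokens as a mutable slice.
impl AsMut<[u32]> for Tokens {
    fn as_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

// `IntoIterator` lets callers consume the tokens by value, e.g. in a `for` loop.
impl IntoIterator for Tokens {
    type Item = u32;
    type IntoIter = std::vec::IntoIter<u32>;
    fn into_iter(self) -> Self::IntoIter {
        self.0.into_iter()
    }
}

/// Hypothetical helper: add `offset` to every token in place, reverse the
/// order, then consume the tokens and return their sum.
pub fn shift_and_sum(mut tokens: Tokens, offset: u32) -> u32 {
    // `iter_mut` is reached through `DerefMut` to `[u32]`.
    for t in tokens.iter_mut() {
        *t += offset;
    }
    // Explicit mutable-slice view through `AsMut`.
    let slice: &mut [u32] = tokens.as_mut();
    slice.reverse();
    // Consuming iteration through `IntoIterator`.
    tokens.into_iter().sum()
}
```

With impls along these lines, callers can index and mutate a `Tokens` value as if it were a slice, and hand it to any API that accepts an iterator of token ids.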

d4d93b6a

fix: helm tmpl (#307) · 001b07d9
gujing authored Mar 20, 2025
```
Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
```
001b07d9

19 Mar, 2025 10 commits

feat: `Frontend` component uses served_model_name instead of model (#302) · 1f6ccc7f
ishandhanani authored Mar 19, 2025

1f6ccc7f
chore: remove older unused components (#300) · 476174f3
ishandhanani authored Mar 19, 2025

476174f3
chore: Update dynamo.code-workspace (#282) · 19a8a6ec
Elton Leander Pinto authored Mar 19, 2025
```
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
```
19a8a6ec
fix: update crates metadata (#264) · 68d953f7
Anant Sharma authored Mar 19, 2025
```
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
```
68d953f7
fix: Add __init__.py for compoments folder in llm example (#299) · ff3413be
Piotr Marcinkiewicz authored Mar 19, 2025

ff3413be

chore: Don't depend on openssl (#292) · 7c3fd5c9

Graham King authored Mar 19, 2025

This makes the Rust parts all use ring / rustls library instead of local install of openssl. It's a step on the journey to being statically linked.

Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with feature flag).
- Move shared dependencies up into workspace
- New `rand` crate has some renames for future rust
- Ensure the dependency doesn't creep back in by enforcing it with cargo deny.

7c3fd5c9

feat: enable LTO and codegen-units = 1 optimizations (#279) · af8ee9db

Alexander Zaitsev authored Mar 19, 2025

#### Overview:

This PR enables more aggressive compiler optimizations for the project which should lead to better performance and smaller binary sizes.

In this PR, I decided to use Fat LTO instead of ThinLTO since it provides a higher optimization level.

I ran quick tests (AMD Ryzen 5900x, Fedora 41, Rust 1.85.1, the latest version of the project at the time, `cargo build --release` command). Here are the resulting binary sizes:

| Binary\Build mode | dynamo-run | libdynamo_llm_capi.so | http | llmctl | metrics | mock_worker |
| --- | --- | --- | --- | --- | --- | --- |
| Release | 55 MiB | 14 MiB | 19 MiB | 14 MiB | 21 MiB | 14 MiB |
| Release + `codegen-units = 1` + ThinLTO | 43 MiB | 11 MiB | 15 MiB | 11 MiB | 17 MiB | 11 MiB |
| Release + `codegen-units = 1` + FatLTO | 38 MiB | 9.2 MiB | 13 MiB | 9.6 MiB | 15 MiB | 9.6 MiB |

#### Details:

Enable `codegen-units = 1` and Fat LTO for better optimizations.

#### Where should the reviewer start?

Just check the `Cargo.toml` file ;)

#### Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

- closes GitHub issue: #278

af8ee9db

fix(mistralrs): Disable paged attention (#234) · fd95f37b

Graham King authored Mar 19, 2025

Under load it sometimes drops a request. The request gets added to the batch (sequence) and immediately gets a FinishReason Stop. Not sure why. It doesn't happen with the default scheduler (non-paged attention), so switch to that for now.

fd95f37b

docs: Move back dynamo deploy file to the guides subfolder in docs (#295) · 48a59890
mohammedabdulwahhab authored Mar 19, 2025
```
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
```
48a59890
fix(dynamo-run): Fix build if llamacpp and mistralrs are disabled (#262) · 3ac95a90
Graham King authored Mar 19, 2025

3ac95a90

18 Mar, 2025 8 commits
- docs: proper installation steps + Ubuntu 24.04 support (#275) · ba33b2bd
  Dmitry Tokarev authored Mar 18, 2025
```
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
  ba33b2bd
- docs: Update README.md - add missing python3-pip package (#263) · 004b6e6a
  Dmitry Tokarev authored Mar 18, 2025
  
  004b6e6a
- fix: update readme discord link (#271) · 16d0d60f
  ishandhanani authored Mar 18, 2025
```
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
  16d0d60f
- docs: dynamo serve guide (#270) · a5113e46
  ishandhanani authored Mar 18, 2025
```
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
```
  a5113e46
- docs: Clean up of readme for deploying to K8s using helm (#266) · 610ef375
  mohammedabdulwahhab authored Mar 18, 2025
```
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
  610ef375
- docs(dynamo-run): Move README into docs/guides/ , add Quickstart (#265) · 40c55a24
  Graham King authored Mar 18, 2025
  
  40c55a24
- feat: add local gpu allocation (#232) · 9f0181a8
  Biswa Panda authored Mar 18, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
  9f0181a8
- docs: fix links in docs (#256) · 548578f4
  Dmitry Tokarev authored Mar 18, 2025
```
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
  548578f4