Commits · af8ee9db45805d481863eec030383e766bca6700 · OpenDAS / dynamo

19 Mar, 2025 4 commits

feat: enable LTO and codegen-units = 1 optimizations (#279) · af8ee9db

Alexander Zaitsev authored Mar 19, 2025

#### Overview:

This PR enables more aggressive compiler optimizations for the project, which should lead to better performance and smaller binaries.

I chose Fat LTO over ThinLTO because it provides a higher optimization level.

I ran quick tests (AMD Ryzen 5900x, Fedora 41, Rust 1.85.1, the latest version of the project at the time, `cargo build --release`). The resulting binary sizes are shown below.

| Build mode / Binary | dynamo-run | libdynamo_llm_capi.so | http | llmctl | metrics | mock_worker |
| --- | --- | --- | --- | --- | --- | --- |
| Release | 55 MiB | 14 MiB | 19 MiB | 14 MiB | 21 MiB | 14 MiB |
| Release + `codegen-units = 1` + ThinLTO | 43 MiB | 11 MiB | 15 MiB | 11 MiB | 17 MiB | 11 MiB |
| Release + `codegen-units = 1` + FatLTO | 38 MiB | 9.2 MiB | 13 MiB | 9.6 MiB | 15 MiB | 9.6 MiB |

#### Details:

Enable `codegen-units = 1` and Fat LTO for better optimizations.
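
For readers unfamiliar with these settings, the fragment below is a minimal sketch of a Cargo release profile that turns them on. It is not the exact diff from this PR. Placing the profile in the workspace root `Cargo.toml` is an assumption; `lto` and `codegen-units` are standard Cargo profile keys.

```toml
# Sketch only: a release profile with the optimizations described above.
# Assumption: this section lives in the workspace root Cargo.toml.
[profile.release]
lto = "fat"        # whole-program ("fat") LTO; "thin" would select ThinLTO instead
codegen-units = 1  # one codegen unit per crate: slower builds, more room for optimization
```

Both settings tend to lengthen `cargo build --release` times, so the trade-off is build time against binary size and runtime performance.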

#### Where should the reviewer start?

Just check the `Cargo.toml` file ;)

#### Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

- closes GitHub issue: #278


fix(mistralrs): Disable paged attention (#234) · fd95f37b

Graham King authored Mar 19, 2025

Under load, the paged-attention scheduler sometimes drops a request: the request is added to the batch (sequence) and immediately gets a FinishReason Stop. The cause is unknown. This doesn't happen with the default scheduler (non-paged attention), so switch to that for now.


docs: Move back dynamo deploy file to the guides subfolder in docs (#295) · 48a59890

mohammedabdulwahhab authored Mar 19, 2025
```
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
```

fix(dynamo-run): Fix build if llamacpp and mistralrs are disabled (#262) · 3ac95a90

Graham King authored Mar 19, 2025


18 Mar, 2025 20 commits
- docs: proper installation steps + Ubuntu 24.04 support (#275) · ba33b2bd
  Dmitry Tokarev authored Mar 18, 2025
```
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
- docs: Update README.md - add missing python3-pip package (#263) · 004b6e6a
  Dmitry Tokarev authored Mar 18, 2025
  
- fix: update readme discord link (#271) · 16d0d60f
  ishandhanani authored Mar 18, 2025
```
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- docs: dynamo serve guide (#270) · a5113e46
  ishandhanani authored Mar 18, 2025
```
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
```
- docs: Clean up of readme for deploying to K8s using helm (#266) · 610ef375
  mohammedabdulwahhab authored Mar 18, 2025
```
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- docs(dynamo-run): Move README into docs/guides/ , add Quickstart (#265) · 40c55a24
  Graham King authored Mar 18, 2025
  
- feat: add local gpu allocation (#232) · 9f0181a8
  Biswa Panda authored Mar 18, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- docs: fix links in docs (#256) · 548578f4
  Dmitry Tokarev authored Mar 18, 2025
```
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
- chore: remove dynamo from vllm whl version (#257) · 792b747c
  Anant Sharma authored Mar 18, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- fix: temporary documentation for crates.io (#255) · 1ccd4caa
  Harrison Saturley-Hall authored Mar 18, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- Update README.md · 05d19c23
  Meenakshi Sharma authored Mar 17, 2025
  
- fix: created documentation to deploy_to_k8s_using_helm (#245) · 3983830e
  Maksim Khadkevich authored Mar 17, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- fix: update default to dev (#251) · 93a46969
  Neelay Shah authored Mar 17, 2025
  
- docs: update guides and filenames (#252) · c2a6b368
  Suman Tatiraju authored Mar 17, 2025
  
- chore: rename patched vllm wheel to ai_dynamo_vllm (#250) · 5161250a
  Anant Sharma authored Mar 17, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- Update README.md · 8f9dcad4
  Meenakshi Sharma authored Mar 17, 2025
  
- fix: more closely mimic perf analyzer location to previous behavior (#246) · 5e70dd60
  Harrison Saturley-Hall authored Mar 17, 2025
  
- Docs: Update README.md (#249) · 0ba0df4b
  Meenakshi Sharma authored Mar 17, 2025
  
- docs: Discord banner (#248) · 708b1aae
  Meenakshi Sharma authored Mar 17, 2025
```
Co-authored-by: Nicolas Noble <nicolasnoble@users.noreply.github.com>
```
- docs: add support matrix (#210) · 63974527
  Pavithra Vijayakrishnan authored Mar 17, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
17 Mar, 2025 16 commits
- docs: Adding Discord banner (#238) · f189b79c
  Nicolas Noble authored Mar 17, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- fix: propogate env vars from input cli/yaml into process (#208) · a611726e
  ishandhanani authored Mar 17, 2025
  
- docs: add guides to docs (#243) · 9be75482
  Suman Tatiraju authored Mar 17, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- chore: Update README.md (#242) · eca57f66
  Neelay Shah authored Mar 17, 2025
  
- docs: point to right sdk (#241) · 18ce1f9e
  ishandhanani authored Mar 17, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- fix: kkranen re-codeowner (#240) · 29b2a7c5
  kkranen authored Mar 17, 2025
  
- docs: add docs for SDK and CLI and how to use (#209) · b0f433ee
  ishandhanani authored Mar 17, 2025
```
Signed-off-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
```
- feat: minor improvements (#239) · 03953479
  Neelay Shah authored Mar 17, 2025
  
- docs: add disclaimer about examples (#236) · e1553c39
  Alec authored Mar 17, 2025
```
Co-authored-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
```
- ci: always run pre-merge-python for prs (#237) · 7b393db3
  Harrison Saturley-Hall authored Mar 17, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- docs: Add kv cache manager documentation (#225) · 793d024e
  Suman Tatiraju authored Mar 17, 2025
```
Co-authored-by: Vikram Sharma <vsm2@illinois.edu>
Co-authored-by: Ziqi Fan <ziqif@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- docs: Update dynamo serve disagg example (#212) · 41ec2338
  ptarasiewiczNV authored Mar 17, 2025
```
Co-authored-by: ptarasiewicz@nvidia.com <Piotr Tarasiewicz>
```
- docs: first draft kv-router doc (#228) · d788b63e
  Alec authored Mar 17, 2025
```
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Sean <choishsean@gmail.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- docs: quick start (#215) · 0ee128ea
  Neelay Shah authored Mar 17, 2025
  
- docs: remove future plans (#235) · fa373c19
  Suman Tatiraju authored Mar 17, 2025
  
- docs: Fix links (#233) · a14bafa2
  Suman Tatiraju authored Mar 17, 2025
  