- 19 Mar, 2025 10 commits
-
-
ishandhanani authored
-
ishandhanani authored
-
Elton Leander Pinto authored
Co-authored-by:Ryan Olson <ryanolson@users.noreply.github.com>
-
Anant Sharma authored
Co-authored-by:Dmitry Tokarev <dtokarev@nvidia.com>
-
Piotr Marcinkiewicz authored
-
Graham King authored
This makes all the Rust parts use the ring / rustls libraries instead of a local install of openssl. It's a step on the journey to being statically linked. Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with a feature flag).
- Move shared dependencies up into the workspace.
- The new `rand` crate has some renames for future Rust.
- Ensure the dependency doesn't creep back in by enforcing it with cargo deny.
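As an illustration of the cargo deny enforcement mentioned in this commit, a ban on the openssl crates could be expressed in `deny.toml` roughly as follows. This is a sketch only: the crate names listed and the exact entries are assumptions, not taken from the commit itself.

```toml
# deny.toml (hypothetical excerpt; the real file in the repository may differ)
[bans]
# Fail `cargo deny check bans` if any of these crates appears in the dependency graph.
deny = [
    { name = "openssl" },
    { name = "openssl-sys" },
]
```

With a configuration along these lines, running `cargo deny check bans` in CI would reject any change that pulls openssl back in through a transitive dependency.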
-
Alexander Zaitsev authored
#### Overview:

This PR enables more aggressive compiler optimizations for the project, which should lead to better performance and smaller binary sizes. I decided to use Fat LTO instead of ThinLTO since it provides a higher optimization level. I ran quick tests (AMD Ryzen 5900x, Fedora 41, Rust 1.85.1, the latest version of the project at the time, `cargo build --release`). Here are the binary size results:

| Build mode | dynamo-run | libdynamo_llm_capi.so | http | llmctl | metrics | mock_worker |
| --- | --- | --- | --- | --- | --- | --- |
| Release | 55 MiB | 14 MiB | 19 MiB | 14 MiB | 21 MiB | 14 MiB |
| Release + `codegen-units = 1` + ThinLTO | 43 MiB | 11 MiB | 15 MiB | 11 MiB | 17 MiB | 11 MiB |
| Release + `codegen-units = 1` + FatLTO | 38 MiB | 9.2 MiB | 13 MiB | 9.6 MiB | 15 MiB | 9.6 MiB |

#### Details:

Enable `codegen-units = 1` and Fat LTO for better optimizations.

#### Where should the reviewer start?

Just check the `Cargo.toml` file ;)

#### Related Issues:

- closes GitHub issue: #278
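For readers unfamiliar with these settings, they are normally set in a Cargo profile. A minimal sketch, assuming the change targets the `[profile.release]` section of the workspace `Cargo.toml` (the actual diff is not shown in this log):

```toml
# Cargo.toml (hypothetical excerpt)
[profile.release]
# Compile each crate as a single codegen unit: slower builds,
# but more room for cross-function optimization.
codegen-units = 1
# Whole-program ("fat") link-time optimization across all crates.
# Use "thin" instead for the ThinLTO variant measured in the table.
lto = "fat"
```

Both settings trade longer release build times for the smaller binaries reported above.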
-
Graham King authored
Under load it sometimes drops a request. The request gets added to the batch (sequence) and immediately gets a FinishReason Stop. Not sure why. It doesn't happen with the default scheduler (non-paged attention), so switch to that for now.
-
mohammedabdulwahhab authored
Co-authored-by:mabdulwahhab <mabdulwahhab@nvidia.com>
-
Graham King authored
-
- 18 Mar, 2025 20 commits
-
-
Dmitry Tokarev authored
Co-authored-by:Anant Sharma <anants@nvidia.com>
-
Dmitry Tokarev authored
-
ishandhanani authored
Co-authored-by:Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
ishandhanani authored
Co-authored-by:Dmitry Tokarev <dtokarev@nvidia.com>
-
mohammedabdulwahhab authored
Co-authored-by:mabdulwahhab <mabdulwahhab@nvidia.com>
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Graham King authored
-
Biswa Panda authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Dmitry Tokarev authored
Co-authored-by:Anant Sharma <anants@nvidia.com>
-
Anant Sharma authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Harrison Saturley-Hall authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Meenakshi Sharma authored
-
Maksim Khadkevich authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Neelay Shah authored
-
Suman Tatiraju authored
-
Anant Sharma authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Meenakshi Sharma authored
-
Harrison Saturley-Hall authored
-
Meenakshi Sharma authored
-
Meenakshi Sharma authored
Co-authored-by:Nicolas Noble <nicolasnoble@users.noreply.github.com>
-
Pavithra Vijayakrishnan authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
- 17 Mar, 2025 10 commits
-
-
Nicolas Noble authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
ishandhanani authored
-
Suman Tatiraju authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Neelay Shah authored
-
ishandhanani authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
kkranen authored
-
ishandhanani authored
Signed-off-by:ishandhanani <ishandhanani@gmail.com>
Co-authored-by:mabdulwahhab <mabdulwahhab@nvidia.com>
-
Neelay Shah authored
-
Alec authored
Co-authored-by:Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
-
Harrison Saturley-Hall authored
Co-authored-by:Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-