- 20 Mar, 2025 3 commits
-
-
Graham King authored
It hardly slows the build down, and it makes things run much faster. That allows us to switch to the debug (default) profile for development, and keep the release profile for, well, releasing. Motivated by changes in https://github.com/ai-dynamo/dynamo/pull/279
-
Nora authored
Add `AsMut`, `DerefMut` and `IntoIterator` trait impls for the `Tokens` structure.
Signed-off-by: nora-coder-dot <nora6677@gmail.com>
Co-authored-by: nora-coder-dot <nora6677@gmail.com>
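For readers unfamiliar with these traits, here is a minimal sketch of what such impls typically look like. The real `Tokens` definition is not shown in this log, so the newtype over `Vec<u32>` below, the `Deref` impl, and the `demo` helper are all assumptions made for illustration.

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical stand-in for the crate's `Tokens` type (assumed: a newtype over token ids).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Tokens(Vec<u32>);

// `DerefMut` requires `Deref`, so both are shown.
impl Deref for Tokens {
    type Target = [u32];
    fn deref(&self) -> &[u32] {
        &self.0
    }
}

impl DerefMut for Tokens {
    fn deref_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

// Explicit mutable view, useful for generic code bounded on `AsMut<[u32]>`.
impl AsMut<[u32]> for Tokens {
    fn as_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

// Consuming iteration, so `for t in tokens` works.
impl IntoIterator for Tokens {
    type Item = u32;
    type IntoIter = std::vec::IntoIter<u32>;
    fn into_iter(self) -> Self::IntoIter {
        self.0.into_iter()
    }
}

/// Hypothetical helper that exercises all three impls.
pub fn demo() -> Vec<u32> {
    let mut tokens = Tokens(vec![3, 1, 2]);
    tokens.sort(); // slice method reached through `DerefMut`
    let view: &mut [u32] = tokens.as_mut(); // mutable view through `AsMut`
    view[0] = 10;
    tokens.into_iter().map(|t| t + 1).collect() // consumed through `IntoIterator`
}
```

With impls like these, callers can use slice methods directly on a `Tokens` value and pass it to any API that accepts an iterator of token ids.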
-
gujing authored
Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
-
- 19 Mar, 2025 10 commits
-
-
ishandhanani authored
-
ishandhanani authored
-
Elton Leander Pinto authored
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
-
Anant Sharma authored
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
-
Piotr Marcinkiewicz authored
-
Graham King authored
This makes the Rust parts all use the ring / rustls libraries instead of a local install of openssl. It's a step on the journey to being statically linked.
Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with a feature flag).
- Move shared dependencies up into the workspace.
- New `rand` crate has some renames for future Rust.
- Ensure the dependency doesn't creep back in by enforcing it with cargo deny.
-
Alexander Zaitsev authored
#### Overview:
This PR enables more aggressive compiler optimizations for the project, which should lead to better performance and smaller binary sizes. I chose Fat LTO over ThinLTO since it provides a higher optimization level. I ran quick tests (AMD Ryzen 5900x, Fedora 41, Rust 1.85.1, the latest version of the project at the time, `cargo build --release`); here are the binary size results.

| Build mode | dynamo-run | libdynamo_llm_capi.so | http | llmctl | metrics | mock_worker |
| --- | --- | --- | --- | --- | --- | --- |
| Release | 55 MiB | 14 MiB | 19 MiB | 14 MiB | 21 MiB | 14 MiB |
| Release + `codegen-units = 1` + ThinLTO | 43 MiB | 11 MiB | 15 MiB | 11 MiB | 17 MiB | 11 MiB |
| Release + `codegen-units = 1` + FatLTO | 38 MiB | 9.2 MiB | 13 MiB | 9.6 MiB | 15 MiB | 9.6 MiB |

#### Details:
Enable `codegen-units = 1` and Fat LTO for better optimizations.

#### Where should the reviewer start?
Just check the `Cargo.toml` file ;)

#### Related Issues:
(use one of the action keywords Closes / Fixes / Resolves / Relates to)
- closes GitHub issue: #278
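
For reference, settings like these are normally expressed in the release profile. The fragment below is a sketch only: it assumes the options live under `[profile.release]` in the workspace root `Cargo.toml`, and the project's actual file may set other keys as well.

```toml
# Sketch only: assumed to sit in the workspace root Cargo.toml.
[profile.release]
codegen-units = 1  # one codegen unit per crate: slower builds, more optimization opportunities
lto = "fat"        # whole-program LTO across all crates ("thin" is the cheaper alternative)
```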
-
Graham King authored
Under load it sometimes drops a request. The request gets added to the batch (sequence) and immediately gets a FinishReason Stop. Not sure why. It doesn't happen with the default scheduler (non-paged attention), so switch to that for now.
-
mohammedabdulwahhab authored
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
-
Graham King authored
-
- 18 Mar, 2025 20 commits
-
-
Dmitry Tokarev authored
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Dmitry Tokarev authored
-
ishandhanani authored
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
ishandhanani authored
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
-
mohammedabdulwahhab authored
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Graham King authored
-
Biswa Panda authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Dmitry Tokarev authored
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Anant Sharma authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Harrison Saturley-Hall authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Meenakshi Sharma authored
-
Maksim Khadkevich authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Neelay Shah authored
-
Suman Tatiraju authored
-
Anant Sharma authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Meenakshi Sharma authored
-
Harrison Saturley-Hall authored
-
Meenakshi Sharma authored
-
Meenakshi Sharma authored
Co-authored-by: Nicolas Noble <nicolasnoble@users.noreply.github.com>
-
Pavithra Vijayakrishnan authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
- 17 Mar, 2025 7 commits
-
-
Nicolas Noble authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
ishandhanani authored
-
Suman Tatiraju authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Neelay Shah authored
-
ishandhanani authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
kkranen authored
-
ishandhanani authored
Signed-off-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
-