- 26 Mar, 2025 4 commits
-
-
ptarasiewiczNV authored
-
Hongkuan Zhou authored
-
Dmitry Tokarev authored
-
Yan Ru Pei authored
-
- 25 Mar, 2025 2 commits
-
-
Sean SH Choi authored
-
Graham King authored
Put the arguments in a JSON file:

```
{
  "dtype": "half",
  "trust_remote_code": true
}
```

Pass it like this:

```
dynamo-run out=sglang ~/llm_models/Llama-3.2-3B-Instruct --extra-engine-args sglang_extra.json
```

Requested here https://github.com/ai-dynamo/dynamo/issues/290 (`dtype`) and here https://github.com/ai-dynamo/dynamo/issues/360 (`trust_remote_code`).
-
- 24 Mar, 2025 4 commits
-
-
Yiming Cheng authored
-
Graham King authored
This lets us do:

```
dynamo-run out=llamacpp <gguf_file>
```

Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.
-
Hongkuan Zhou authored
-
Graham King authored
That ensures it gets removed when the process stops.
-
- 22 Mar, 2025 1 commit
-
-
Yiming Cheng authored
-
- 21 Mar, 2025 6 commits
-
-
Olga Andreeva authored
Co-authored-by: Olga Andreeva <oandreeva@oandreeva-mlt.client.nvidia.com>
-
Dmitry Tokarev authored
-
zhaohaidao authored
-
Ikko Eltociear Ashimine authored
-
Harry Kim authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Anant Sharma authored
-
- 20 Mar, 2025 5 commits
-
-
Yan Ru Pei authored
-
Meenakshi Sharma authored
-
Graham King authored
It hardly slows the build down, and it makes things run much faster. That allows us to switch to the debug (default) profile for development, and keep the release profile for, well, releasing. Motivated by changes in https://github.com/ai-dynamo/dynamo/pull/279
-
Nora authored
Add `AsMut`, `DerefMut` and `IntoIterator` trait impls for the `Tokens` structure.

Signed-off-by: nora-coder-dot <nora6677@gmail.com>
Co-authored-by: nora-coder-dot <nora6677@gmail.com>
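As a rough illustration of what these trait impls make possible, here is a minimal sketch. It assumes `Tokens` is a newtype over `Vec<u32>`, which the commit message does not confirm; the struct below is a hypothetical stand-in, not the project's actual definition.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the `Tokens` structure (assumed: a newtype over token IDs).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Tokens(Vec<u32>);

// `DerefMut` requires `Deref`, so both are shown here.
impl Deref for Tokens {
    type Target = [u32];
    fn deref(&self) -> &[u32] {
        &self.0
    }
}

impl DerefMut for Tokens {
    fn deref_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

impl AsMut<[u32]> for Tokens {
    fn as_mut(&mut self) -> &mut [u32] {
        &mut self.0
    }
}

impl IntoIterator for Tokens {
    type Item = u32;
    type IntoIter = std::vec::IntoIter<u32>;
    fn into_iter(self) -> Self::IntoIter {
        self.0.into_iter()
    }
}

fn main() {
    let mut tokens = Tokens(vec![1, 2, 3]);
    // DerefMut: index straight through to the underlying slice.
    tokens[0] = 10;
    // AsMut: borrow the contents as a mutable slice.
    let slice: &mut [u32] = tokens.as_mut();
    slice.reverse();
    // IntoIterator: consume the wrapper and iterate over owned token IDs.
    let doubled: Vec<u32> = tokens.into_iter().map(|t| t * 2).collect();
    println!("{:?}", doubled);
}
```

With impls like these, callers can mutate and iterate over the tokens without reaching for the inner field directly.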
-
gujing authored
Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
-
- 19 Mar, 2025 10 commits
-
-
ishandhanani authored
-
ishandhanani authored
-
Elton Leander Pinto authored
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
-
Anant Sharma authored
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
-
Piotr Marcinkiewicz authored
-
Graham King authored
This makes the Rust parts all use the ring / rustls libraries instead of a local install of openssl. It's a step on the journey to being statically linked.

Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with feature flag).
- Move shared dependencies up into workspace
- New `rand` crate has some renames for future Rust
- Ensure the dependency doesn't creep back in by enforcing it with cargo deny.
-
Alexander Zaitsev authored
#### Overview:

This PR enables more aggressive compiler optimizations for the project, which should lead to better performance and smaller binary sizes. I decided to use Fat LTO instead of ThinLTO since it provides a higher optimization level.

I ran quick tests (AMD Ryzen 5900x, Fedora 41, Rust 1.85.1, the latest version of the project at the moment, `cargo build --release` command). Here are the binary size results:

| Build mode | dynamo-run | libdynamo_llm_capi.so | http | llmctl | metrics | mock_worker |
| --- | --- | --- | --- | --- | --- | --- |
| Release | 55 MiB | 14 MiB | 19 MiB | 14 MiB | 21 MiB | 14 MiB |
| Release + `codegen-units = 1` + ThinLTO | 43 MiB | 11 MiB | 15 MiB | 11 MiB | 17 MiB | 11 MiB |
| Release + `codegen-units = 1` + FatLTO | 38 MiB | 9.2 MiB | 13 MiB | 9.6 MiB | 15 MiB | 9.6 MiB |

#### Details:

Enable `codegen-units = 1` and Fat LTO for better optimizations.

#### Where should the reviewer start?

Just check the `Cargo.toml` file ;)

#### Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

- closes GitHub issue: #278
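To put the size table in relative terms, the following sketch computes the percentage reduction from the plain Release build to the Release + `codegen-units = 1` + Fat LTO build. The sizes are copied from the table above; the helper name `reduction_percent` is made up for this illustration.

```rust
/// Relative size reduction, in percent, going from `before` to `after`.
/// (Hypothetical helper, not part of the project.)
fn reduction_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // (binary, Release size, Release + codegen-units = 1 + Fat LTO size), in MiB,
    // taken from the table in the PR description.
    let sizes = [
        ("dynamo-run", 55.0, 38.0),
        ("libdynamo_llm_capi.so", 14.0, 9.2),
        ("http", 19.0, 13.0),
        ("llmctl", 14.0, 9.6),
        ("metrics", 21.0, 15.0),
        ("mock_worker", 14.0, 9.6),
    ];
    for (name, before, after) in sizes {
        println!("{name}: {:.1}% smaller", reduction_percent(before, after));
    }
}
```

By this arithmetic every binary in the table shrinks by roughly 29% to 34% with Fat LTO, for example about 31% for `dynamo-run` (55 MiB to 38 MiB).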
-
Graham King authored
Under load it sometimes drops a request. The request gets added to the batch (sequence) and immediately gets a FinishReason Stop. Not sure why. It doesn't happen with the default scheduler (non-paged attention), so switch to that for now.
-
mohammedabdulwahhab authored
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
-
Graham King authored
-
- 18 Mar, 2025 8 commits
-
-
Dmitry Tokarev authored
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Dmitry Tokarev authored
-
ishandhanani authored
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
ishandhanani authored
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
-
mohammedabdulwahhab authored
Co-authored-by: mabdulwahhab <mabdulwahhab@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Graham King authored
-
Biswa Panda authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Dmitry Tokarev authored
Co-authored-by: Anant Sharma <anants@nvidia.com>
-