"vscode:/vscode.git/clone" did not exist on "2ee29443b6ec02d875d1e5bdf1c47667123f4be4"
- 25 Apr, 2025 1 commit
-
-
Graham King authored
This will allow an ingress-side pre-processor to see it without needing a model checkout. Currently pre-processing is done in the worker, which has access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`) locally. We want to move the pre-processor to the ingress side to support KV routing. That requires ingress side (i.e the HTTP server), on a different machine than the worker to be able to see those three files. To support that this PR makes the worker upload the contents of those files to the NATS object store, and publishes the MDC with those NATS urls to the key-value store. The key-value store has an interface so any store (nats, etcd, redis, etc) can be supported. Implementations for memory and NATS are provided. Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit. Part of #743
-
- 24 Apr, 2025 1 commit
-
-
Abrar Shivani authored
Send a warm‑up request to the mistralrs engine so that subsequent requests are faster.
-
- 21 Apr, 2025 4 commits
-
-
Pankaj Gupta authored
-
ishandhanani authored
-
Graham King authored
"echo_core" is an engine that echoes the post-processed request back to you so you can see the template. Good for testing. It needed an extra flag set to work correctly.
-
Abrar Shivani authored
-
- 18 Apr, 2025 3 commits
-
-
Graham King authored
-
Hongkuan Zhou authored
Co-authored-by:ishandhanani <82981111+ishandhanani@users.noreply.github.com>
-
Graham King authored
It's different enough that I made a new engine vllm0_8 and renamed the previous engine to vllm0_7. `dynamo-run out=vllm` now expects 0.8. This matches the container change in #690. For older use `dynamo-run out=vllm0_7`.
-
- 17 Apr, 2025 3 commits
-
-
tlipoca9 authored
Co-authored-by:ishandhanani <82981111+ishandhanani@users.noreply.github.com>
-
Ryan Olson authored
-
Ryan McCormick authored
-
- 12 Apr, 2025 1 commit
-
-
Hongkuan Zhou authored
feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router (#581) Signed-off-by:
Hongkuan Zhou <tedzhouhk@gmail.com> Co-authored-by:
Neelay Shah <neelays@nvidia.com>
-
- 11 Apr, 2025 1 commit
-
-
Cole authored
-
- 09 Apr, 2025 2 commits
-
-
jon-chuang authored
feat: Extract Common Configs + Log Configs on Init + Add `test_` to `sdk/tests` filenames required for pytest (#434) Co-authored-by:ishandhanani <82981111+ishandhanani@users.noreply.github.com>
-
Anant Sharma authored
-
- 07 Apr, 2025 1 commit
-
-
Graham King authored
As a first step towards KV routing: - introduce a `--router-mode` in dynamo-run that only does random and round-robin right now. Not that interesting yet. - Make the vllm engine publish the KV events received from our patched vllm. Now we "just" need to connect the two. Easy right?
-
- 04 Apr, 2025 3 commits
-
-
Yan Ru Pei authored
-
Graham King authored
Also upgrade the cargo resolver to v3, the default. New clippy lints: - `next_back()` instead of `last()` for a double-ended iterator. That avoids walking the whole list. - ` repeat_n` instead of `repeat.take`. That avoids cloning. - Doc indenting
-
Graham King authored
Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio. This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd. Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere. For NIM.
-
- 03 Apr, 2025 3 commits
-
-
Ryan Olson authored
Moved all of `lib/llm/src/engines` to their own crates as e.g. `lib/engines/mistralrs`. This will allow publishing of the `dynamo-llm` crate as it won't have any github dependencies. The only engines in dynamo-llm will be the demo `echo` ones. Co-authored-by:Graham King <grahamk@nvidia.com>
-
tlipoca9 authored
-
Ryan Olson authored
-
- 02 Apr, 2025 1 commit
-
-
Ryan Olson authored
-
- 01 Apr, 2025 2 commits
-
-
Ryan Olson authored
-
Kiv Chen authored
-
- 31 Mar, 2025 3 commits
-
-
Ryan Olson authored
-
Graham King authored
-
Tianer Zhou authored
Signed-off-by:Tianer Zhou <ezhoureal@gmail.com>
-
- 28 Mar, 2025 1 commit
-
-
Biswa Panda authored
-
- 26 Mar, 2025 1 commit
-
-
Ryan Olson authored
-
- 25 Mar, 2025 1 commit
-
-
Graham King authored
Put the arguments in a JSON file: ``` { "dtype": "half", "trust_remote_code": true } ``` Pass it like this: ``` dynamo-run out=sglang ~/llm_models/Llama-3.2-3B-Instruct --extra-engine-args sglang_extra.json ``` Requested here https://github.com/ai-dynamo/dynamo/issues/290 (`dtype`) and here https://github.com/ai-dynamo/dynamo/issues/360 (`trust_remote_code`).
-
- 24 Mar, 2025 1 commit
-
-
Graham King authored
This lets us do: ``` dynamo-run out=llamacpp <gguf_file> ``` Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.
-
- 21 Mar, 2025 1 commit
-
-
zhaohaidao authored
-
- 20 Mar, 2025 1 commit
-
-
Nora authored
Add `AsMut`, `DerefMut` and `IntoIterator` trait impl for the `Tokens` structure. Signed-off-by:
nora-coder-dot <nora6677@gmail.com> Co-authored-by:
nora-coder-dot <nora6677@gmail.com>
-
- 19 Mar, 2025 4 commits
-
-
Anant Sharma authored
Co-authored-by:Dmitry Tokarev <dtokarev@nvidia.com>
-
Graham King authored
This makes the Rust parts all use ring / rustls library instead of local install of openssl. It's a step on the journey to being statically linked. Pieces: - `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with feature flag). - Move shared dependencies up into workspace - New `rand` crate has some renames for future rust - Ensure the dependency doesn't creep back in by enforcing it with cargo deny.
-
Graham King authored
Under load it sometimes drops a request. The request gets added to the batch (sequence) and immediately gets a FinishReason Stop. Not sure why. It doesn't happen with the default scheduler (non-paged attention), so switch to that for now.
-
Graham King authored
-
- 18 Mar, 2025 1 commit
-
-
Dmitry Tokarev authored
Co-authored-by:Anant Sharma <anants@nvidia.com>
-