- 25 Apr, 2025 10 commits
-
-
hhzhang16 authored
-
Anant Sharma authored
-
Ziqi Fan authored
-
julienmancuso authored
-
Anant Sharma authored
-
Piotr Marcinkiewicz authored
-
mohammedabdulwahhab authored
-
Graham King authored
This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), running on a different machine than the worker, to be able to see those three files.

To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC with those NATS URLs to the key-value store. The key-value store has an interface so that any store (NATS, etcd, Redis, etc.) can be supported. Implementations for memory and NATS are provided.

Fetching the MDC from the store, doing pre-processing on the ingress side, and publishing a card backed by a GGUF are all left for a later commit.

Part of #743
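The upload-then-publish flow described in this commit message can be illustrated with a minimal sketch. It is not the project's implementation: the `KeyValueStore` interface, the `MemoryStore` class, the `publish_card` helper, the key layout, and the `nats://` URL form are all hypothetical names chosen for illustration, and an in-memory store stands in for the NATS object store.

```python
import json
from typing import Dict, Optional, Protocol

# The three MDC files named in the commit message.
MDC_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


class KeyValueStore(Protocol):
    """Hypothetical minimal interface; any backend could implement it."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class MemoryStore:
    """In-memory backend. Here it also stands in for the object store."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def publish_card(
    model_name: str,
    files: Dict[str, bytes],
    object_store: KeyValueStore,
    kv_store: KeyValueStore,
) -> str:
    """Upload each file's contents, then publish a card pointing at the uploads."""
    card = {"name": model_name, "files": {}}
    for filename in MDC_FILES:
        object_key = f"{model_name}/{filename}"
        object_store.put(object_key, files[filename])
        # Assumption: the URL scheme below is made up for this sketch.
        card["files"][filename] = f"nats://{object_key}"
    kv_key = f"mdc/{model_name}"  # assumption: hypothetical key layout
    kv_store.put(kv_key, json.dumps(card).encode("utf-8"))
    return kv_key
```

In this sketch a reader on another machine would need only the key-value entry to locate the three files, which is the property the ingress-side pre-processor relies on. Swapping `MemoryStore` for another backend would not change `publish_card`, since it depends only on the interface.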
Biswa Panda authored
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
-
julienmancuso authored
-
- 24 Apr, 2025 9 commits
-
-
Alec authored
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
-
ishandhanani authored
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
-
julienmancuso authored
-
Ryan McCormick authored
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
-
hhzhang16 authored
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
-
Abrar Shivani authored
Send a warm‑up request to the mistralrs engine so that subsequent requests are faster.
-
Ryan McCormick authored
-
Tanmay Verma authored
-
Ryan McCormick authored
-
- 23 Apr, 2025 8 commits
-
-
julienmancuso authored
-
Abrar Shivani authored
#### Overview:

This PR adds a command-line verbosity flag (-v, -vv) to dynamo-run to control log levels.

- Added new verbosity flag to Flags struct:
  - -v: Sets log level to debug
  - -vv: Sets log level to trace
  - No flag (default): Keeps log level at info

#### Details:

- closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/567
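The flag-to-level mapping listed above can be sketched as a counted flag. This is an illustration in Python, not the dynamo-run code: the parser name, the `verbosity` destination, and the `log_level_for` helper are hypothetical, and treating three or more `-v` as trace is an assumption the commit message does not state.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where each repeated -v increments a counter."""
    parser = argparse.ArgumentParser(prog="verbosity-sketch")
    parser.add_argument(
        "-v",
        dest="verbosity",
        action="count",
        default=0,
        help="-v for debug logging, -vv for trace logging",
    )
    return parser


def log_level_for(verbosity: int) -> str:
    """Map the -v count to a log level name."""
    if verbosity <= 0:
        return "info"  # no flag: keep the default level
    if verbosity == 1:
        return "debug"  # -v
    return "trace"  # -vv (assumption: more than two also means trace)


def level_from_args(argv: list) -> str:
    """Parse argv and return the resulting log level name."""
    args = build_parser().parse_args(argv)
    return log_level_for(args.verbosity)
```

For example, `level_from_args(["-vv"])` would select the trace level, while an empty argument list leaves the level at info.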
-
Anant Sharma authored
Signed-off-by: Anant Sharma <anants@nvidia.com>
-
julienmancuso authored
-
Anant Sharma authored
-
KennyMcCormick authored
Signed-off-by: cormick <cormick1080@gmail.com>
-
Ryan McCormick authored
-
julienmancuso authored
-
- 22 Apr, 2025 4 commits
-
-
GuanLuo authored
-
Tushar Sharma authored
-
julienmancuso authored
-
hhzhang16 authored
-
- 21 Apr, 2025 8 commits
-
-
hhzhang16 authored
-
Pankaj Gupta authored
-
ptarasiewiczNV authored
-
Harry Kim authored
Signed-off-by: Harry Kim <harry_kim@live.com>
-
ishandhanani authored
-
Graham King authored
"echo_core" is an engine that echoes the post-processed request back to you so you can see the template, which is useful for testing. It needed an extra flag set to work correctly.
-
Abrar Shivani authored
-
Zhongdongming Dai authored
-
- 18 Apr, 2025 1 commit
-
-
GuanLuo authored
-