- 07 May, 2025 6 commits
-
-
祝健聪 authored
Signed-off-by: Chasing1020 <chasing1020@gmail.com>
-
Anthony Casagrande authored
-
Graham King authored
vllm and sglang are now the sub-process engines from #954. Also updated the docs on running vllm and sglang multi-GPU (tensor parallel) and multi-node (pipeline parallel).
-
ptarasiewiczNV authored
-
ptarasiewiczNV authored
-
julienmancuso authored
-
- 06 May, 2025 8 commits
-
-
jthomson04 authored
-
Hongkuan Zhou authored
-
Graham King authored
New vllm and sglang engines that run in a sub-process. They will hopefully replace the existing embedded Python engines. Why?
- Pure Python; does not require knowing Rust to work on it. Much simpler to maintain.
- No embedded Python interpreter, which avoids linking libpython and avoids the macOS virtualenv issues.
- Should have better performance, as it's "native" vllm / sglang.
- Works with any version of vllm (including v1!) and sglang. Less upgrade struggle.
-
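A minimal sketch of the sub-process pattern described above (not the actual engine code; the worker script and the line-delimited JSON framing are invented for illustration): the parent writes one JSON request per line to the child's stdin and reads one JSON response per line from its stdout.

```python
import json
import subprocess
import sys

# Hypothetical worker: echoes each JSON request back with a "result" field.
# A real sub-process engine would import vllm / sglang here instead.
WORKER = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    print(json.dumps({"id": req["id"], "result": req["prompt"].upper()}), flush=True)
"""

def ask(proc, req_id, prompt):
    """Send one request line to the worker and read one response line back."""
    proc.stdin.write(json.dumps({"id": req_id, "prompt": prompt}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

proc = subprocess.Popen(
    [sys.executable, "-c", WORKER],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
resp = ask(proc, 1, "hello")
proc.stdin.close()
proc.wait()
```

Because the engine lives in its own process, the parent never links libpython and the child can run whatever vllm/sglang version its virtualenv provides.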
jthomson04 authored
-
Graham King authored
Approved by OSRB in Slack. Note that we don't check for the closing delimiter, to allow the longer copyright format. The motivation is that this reduces context usage by 12 lines for every file in the project, which helps tools like Cursor and Claude Code fit more, go faster, and cost less.
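A sketch of the kind of header check this implies (the marker string and the five-line window are assumptions for illustration, not the repo's actual script): it looks only for the opening license line and deliberately ignores any closing delimiter, so both the short header and the longer copyright block pass.

```python
def has_license_header(text: str, marker: str = "SPDX-License-Identifier:") -> bool:
    """Check only that the opening marker appears near the top of the file.

    No closing delimiter is required, so the longer copyright block
    format is also accepted.
    """
    head = text.splitlines()[:5]  # header expected within the first few lines
    return any(marker in line for line in head)
```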
-
hhzhang16 authored
-
hhzhang16 authored
-
Graham King authored
Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```
Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus. The `register_llm` call does this:
- Downloads the model from HF if necessary
- Loads the model deployment card from the HF folder, or extracts it from GGUF
- Pushes the tokenizer config etc. into the NATS object store so the ingress can access it from a different machine
- Publishes the model deployment card to etcd
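The registration steps above can be outlined as a plain pipeline (every body below is a stand-in; the real work happens inside dynamo's `register_llm`, with NATS as the object store and etcd as the registry):

```python
def register_llm_sketch(model: str, object_store: dict, registry: dict) -> dict:
    """Illustrative outline of the registration steps; not the real implementation."""
    # 1. Download the model from HF if necessary (stubbed as a local path).
    local_path = f"/models/{model}"
    # 2. Load the model deployment card from the HF folder, or extract from GGUF (stubbed).
    card = {"name": model, "path": local_path}
    # 3. Push the tokenizer config etc. into an object store (NATS in dynamo),
    #    so an ingress on a different machine can fetch it.
    object_store[f"{model}/tokenizer_config"] = {"model": model}
    # 4. Publish the model deployment card to the registry (etcd in dynamo).
    registry[model] = card
    return card

object_store, registry = {}, {}
card = register_llm_sketch("Qwen/Qwen2.5-0.5B-Instruct", object_store, registry)
```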
-
- 05 May, 2025 6 commits
-
-
julienmancuso authored
-
Hongkuan Zhou authored
-
richardhuo-nv authored
-
julienmancuso authored
-
Harrison Saturley-Hall authored
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Hongkuan Zhou authored
-
- 02 May, 2025 3 commits
-
-
Tanmay Verma authored
-
Ryan McCormick authored
-
Kris Hung authored
-
- 01 May, 2025 7 commits
-
-
hhzhang16 authored
-
Graham King authored
Part of https://github.com/ai-dynamo/dynamo/issues/743
-
Biswa Panda authored
-
Abrar Shivani authored
The build script currently fails on macOS due to an incompatible Bash version. This PR adds a version check to ensure the correct Bash version is being used before proceeding. Closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/318
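The check itself lives in the Bash build script, but the version-parsing step can be illustrated in Python (the helper name is invented; the version strings in the test are sample `bash --version` output). macOS ships Bash 3.2 by default, which is what trips up the build script.

```python
import re

def bash_major_version(version_output: str) -> int:
    """Pull the major version out of `bash --version` output."""
    m = re.search(r"version (\d+)\.", version_output)
    if m is None:
        raise ValueError("unrecognized bash version output")
    return int(m.group(1))

# The stock macOS shell fails a ">= 4" requirement:
assert bash_major_version("GNU bash, version 3.2.57(1)-release") < 4
```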
-
Abrar Shivani authored
Allow `hf://` prefix on command line. Closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/829
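Accepting an optional `hf://` prefix amounts to normalizing the argument before resolving it; a minimal sketch (the helper is hypothetical; dynamo's actual parsing lives in its Rust CLI code):

```python
HF_PREFIX = "hf://"

def normalize_model_arg(arg: str) -> str:
    """Strip an optional hf:// scheme so both spellings resolve identically."""
    return arg[len(HF_PREFIX):] if arg.startswith(HF_PREFIX) else arg
```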
-
Yan Ru Pei authored
-
Ziqi Fan authored
-
- 30 Apr, 2025 5 commits
-
-
Biswa Panda authored
-
ishandhanani authored
-
Yan Ru Pei authored
-
hhzhang16 authored
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
-
julienmancuso authored
-
- 29 Apr, 2025 5 commits
-
-
mohammedabdulwahhab authored
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
-
julienmancuso authored
-
wxsm authored
Signed-off-by: wxsm <wxsms@foxmail.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>
-
Abrar Shivani authored
Adds support for specifying default request parameters through a JSON template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides.

Changes:
- Add `--request-template` CLI flag to specify the template file path
- Integrate template support in HTTP, batch, and text input modes
- Template values can be overridden by individual request parameters

Example `template.json`:
```
{
  "model": "Qwen2.5-3B-Instruct",
  "temperature": 0.7,
  "max_completion_tokens": 4096
}
```
-
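The override behavior described for the template (template supplies defaults, individual request parameters win) is a shallow merge; a minimal sketch, assuming the `template.json` shown above:

```python
import json

TEMPLATE = json.loads("""
{
  "model": "Qwen2.5-3B-Instruct",
  "temperature": 0.7,
  "max_completion_tokens": 4096
}
""")

def apply_template(template: dict, request: dict) -> dict:
    """Template values are defaults; per-request parameters override them."""
    merged = dict(template)
    merged.update(request)
    return merged

# A request that overrides temperature but inherits the rest:
req = apply_template(TEMPLATE, {"temperature": 0.2, "messages": []})
```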
Graham King authored
-