- 14 Mar, 2025 14 commits
-
-
Hongkuan Zhou authored
-
Hongkuan Zhou authored
-
Anant Sharma authored
-
Ryan McCormick authored
-
Ryan McCormick authored
-
Anant Sharma authored
-
Tanmay Verma authored
-
Graham King authored
- macOS doesn't have the `pipe2` syscall, so use plain `pipe`.
- `rtnetlink` isn't a dependency on macOS, so don't use its type there.
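The portability fix can be illustrated with a small Python sketch. It is an analogy, not the Rust change in this commit, and it assumes the reason for using `pipe2` was to set close-on-exec atomically (the commit does not say which flags were used). It calls `os.pipe2(os.O_CLOEXEC)` where the platform provides it and otherwise falls back to `os.pipe` plus `fcntl`. Python's own `os.pipe` already returns non-inheritable descriptors, so the `fcntl` step only mirrors what a C or Rust caller would have to do. `make_pipe` is a hypothetical helper name.

```python
import fcntl
import os


def make_pipe() -> tuple[int, int]:
    """Create a pipe whose ends are closed on exec, with or without pipe2."""
    if hasattr(os, "pipe2"):
        # pipe2 available (e.g. Linux): set close-on-exec atomically at creation.
        return os.pipe2(os.O_CLOEXEC)
    # No pipe2 (e.g. macOS): create a plain pipe, then set FD_CLOEXEC on each end.
    read_fd, write_fd = os.pipe()
    for fd in (read_fd, write_fd):
        flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    return read_fd, write_fd
```

The fallback is not atomic: another thread could fork and exec between `os.pipe()` and the `fcntl` calls. This race is exactly what `pipe2` exists to avoid, so the fallback is only acceptable where `pipe2` is unavailable.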
-
hhzhang16 authored
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
-
Ryan McCormick authored
-
Ryan McCormick authored
-
Ryan McCormick authored
-
Ryan McCormick authored
-
Ryan Olson authored
-
- 13 Mar, 2025 11 commits
-
-
Ryan McCormick authored
-
Ziqi Fan authored
-
Anant Sharma authored
-
Graham King authored
Previously we tokenized the output and counted tokens to stop when max tokens was reached. Now we let the mistral.rs engine do it, which saves the extra tokenization step. Also, dynamo-run's help message now lists which engines are compiled in, and there are some minor lint fixes.
-
Anant Sharma authored
-
Graham King authored
"netlink" doesn't exist on Mac. We print the primary network interface to help multi-node setup, which is also unlikely on Mac.
-
Pawel Ziecina authored
-
Dmitry Tokarev authored
-
Biswa Panda authored
-
Tanmay Verma authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Graham King authored
- Any engine can take the name of a Hugging Face repository. It will be downloaded before calling the engine.
- The default engine (previously always mistralrs) depends on what is compiled in.
- Text can be piped in and will result in a single run of the model.

All of those together mean that if you build with `--features vllm`, you can do this and it will download the model, run it with vllm, answer your question, and exit:

```
echo "What is the capital of Costa Rica?" | dynamo-run Qwen/Qwen2.5-3B-Instruct
```

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
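Two of the behaviours listed above can be sketched in Python: accepting a Hugging Face repository name in place of a local path, and switching to a single run when text is piped in. This is not dynamo-run's code (dynamo-run is a Rust binary built with cargo features). `classify_model_arg`, `piped_prompt` and the `owner/name` pattern are hypothetical, and the real CLI may decide differently.

```python
import os
import re
import sys
from typing import Optional, TextIO

# Hypothetical heuristic: an "owner/name" string that is not an existing local
# path is treated as a Hugging Face repository id (e.g. Qwen/Qwen2.5-3B-Instruct).
_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def classify_model_arg(arg: str) -> str:
    """Return "local" for an existing path, "hf_repo" for a repository id."""
    if os.path.exists(arg):
        return "local"
    if _REPO_ID.match(arg):
        # In the described behaviour the repository would be downloaded
        # before the engine is called; this sketch only classifies it.
        return "hf_repo"
    raise ValueError(f"not a local path or repository id: {arg!r}")


def piped_prompt(stream: TextIO = sys.stdin) -> Optional[str]:
    """Return the piped text for a single run, or None when attached to a terminal."""
    if stream.isatty():
        return None  # interactive: keep prompting the user
    return stream.read().strip()
```

With `echo "What is the capital of Costa Rica?" | ...`, standard input is not a terminal, so `piped_prompt()` returns the question. A caller would then run the model once and exit instead of entering an interactive loop.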
-
- 12 Mar, 2025 14 commits
-
-
hhzhang16 authored
-
Graham King authored
Command-line arguments are passed to the python engine like this:

```
dynamo-run out=pystr:my_python_engine.py -- -n 42 --custom-arg Orange --yes
```

The python engine receives the arguments in `sys.argv`. The argument list will include some standard ones as well as anything after the `--`.

This input:

```
dynamo-run out=pystr:my_engine.py /opt/models/Llama-3.2-3B-Instruct/ --model-name llama_3.2 --tensor-parallel-size 4 -- -n 1
```

is read like this:

```
import sys

async def generate(request):
    ...  # engine logic as before

if __name__ == "__main__":
    print(f"MAIN: {sys.argv}")
```

and produces this output:

```
MAIN: ['my_engine.py', '--model-path', '/opt/models/Llama-3.2-3B-Instruct/', '--model-name', 'llama_3.2', '--http-port', '8080', '--tensor-parallel-size', '4', '--base-gpu-id', '0', '--num-nodes', '1', '--node-rank', '0', '-n', '1']
```

This allows quick iteration on the engine setup. Note that `-n 1` is included, without the `--` separator. The flags `--leader-addr` and `--model-config` will also be added if they are provided to `dynamo-run`.
-
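For the change above that passes command-line arguments to a python engine through `sys.argv`, an engine that needs more than a debug print can hand those arguments to `argparse`. The sketch below is an assumption, not part of dynamo-run. It declares the standard flags seen in the example output, the optional `--leader-addr` and `--model-config`, and a hypothetical integer `-n`. The integer types and defaults are also assumed. It uses `parse_known_args` so that custom flags the engine does not declare, such as `--custom-arg Orange`, don't abort it. `parse_engine_args` is a hypothetical name.

```python
import argparse
import sys
from typing import List, Optional


def parse_engine_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the flags dynamo-run passes to a python engine via sys.argv."""
    parser = argparse.ArgumentParser(prog="my_engine.py")
    # Standard flags, as seen in the example sys.argv output (types assumed).
    parser.add_argument("--model-path")
    parser.add_argument("--model-name")
    parser.add_argument("--http-port", type=int)
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    parser.add_argument("--base-gpu-id", type=int, default=0)
    parser.add_argument("--num-nodes", type=int, default=1)
    parser.add_argument("--node-rank", type=int, default=0)
    # Only present when they were passed to dynamo-run.
    parser.add_argument("--leader-addr")
    parser.add_argument("--model-config")
    # Hypothetical engine-specific flag, given after `--` on the dynamo-run command line.
    parser.add_argument("-n", type=int, default=1)
    # parse_known_args returns undeclared flags separately instead of exiting.
    args, _unknown = parser.parse_known_args(sys.argv[1:] if argv is None else argv)
    return args


if __name__ == "__main__":
    print(parse_engine_args())
```

Using `parse_known_args` rather than `parse_args` trades strictness for tolerance. A typo in a custom flag is silently ignored, so an engine that wants to fail fast could inspect `_unknown` and report it.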
Ryan McCormick authored
-
Hongkuan Zhou authored
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
ptarasiewiczNV authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
Maksim Khadkevich authored
-
Dmitry Tokarev authored
-
Neelay Shah authored
-
Anant Sharma authored
-
Maksim Khadkevich authored
-
Hongkuan Zhou authored
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
-
Tanmay Verma authored
-
Ziqi Fan authored
feat: rename dynamo-sdk to dynamo; add dynamo run to call dynamo-run under the hood for unification (#104)
-
Neelay Shah authored
-
- 11 Mar, 2025 1 commit
-
-
hhzhang16 authored
-