- 13 Mar, 2025 9 commits
-
-
Anant Sharma authored
-
Graham King authored
Previously we tokenized the output and counted tokens so that we could stop when max tokens was reached. Now we let the mistral.rs engine do it, which saves the extra tokenization step. Also, dynamo-run now prints which engines are compiled in as part of its help message, and there are some minor lint fixes.
-
Anant Sharma authored
-
Graham King authored
"netlink" doesn't exist on Mac. We print the primary network interface to help multi-node setup, which is also unlikely on Mac.
-
Pawel Ziecina authored
-
Dmitry Tokarev authored
-
Biswa Panda authored
-
Tanmay Verma authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Graham King authored
- Any engine can take the name of a Hugging Face repository. It will be downloaded before calling the engine.
- The default engine (previously always mistralrs) depends on what is compiled in.
- Text can be piped in and will result in a single run of the model.

All of those together mean if you build with `--features vllm` you can do this and it will download the model and run it with vllm, answer your question, and exit:

```
echo "What is the capital of Costa Rica?" | dynamo-run Qwen/Qwen2.5-3B-Instruct
```

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
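
The piped-input behavior described in this commit can be illustrated with a small sketch. This is not how `dynamo-run` implements it (the log does not show that code); it is a hypothetical Python illustration of how a command-line tool might choose between a single run and an interactive session by checking whether its input is a terminal. The names `choose_mode`, `read_prompt`, and `main` are made up for this example.

```python
import io
import sys


def choose_mode(stream) -> str:
    """Return "single" when input is piped, "interactive" when it is a terminal."""
    # isatty() is False for pipes, regular files and in-memory streams.
    return "interactive" if stream.isatty() else "single"


def read_prompt(stream) -> str:
    """Read the whole piped input and strip surrounding whitespace."""
    return stream.read().strip()


def main(stream=sys.stdin) -> str:
    """Describe what a tool would do for the given input stream."""
    if choose_mode(stream) == "single":
        # Piped text: answer once and exit.
        return f"single run: {read_prompt(stream)}"
    # Terminal: keep prompting the user.
    return "interactive session"


if __name__ == "__main__":
    # Simulate `echo "..." | tool` with an in-memory stream, which is never a TTY.
    piped = io.StringIO("What is the capital of Costa Rica?\n")
    print(main(piped))
```

Passing the stream in as a parameter, rather than reading `sys.stdin` directly inside each function, keeps the sketch easy to exercise without a real pipe.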
-
- 12 Mar, 2025 14 commits
-
-
hhzhang16 authored
-
Graham King authored
Command line arguments are passed to the python engine like this:

```
dynamo-run out=pystr:my_python_engine.py -- -n 42 --custom-arg Orange --yes
```

The python engine receives the arguments in `sys.argv`. The argument list will include some standard ones as well as anything after the `--`.

This input:

```
dynamo-run out=pystr:my_engine.py /opt/models/Llama-3.2-3B-Instruct/ --model-name llama_3.2 --tensor-parallel-size 4 -- -n 1
```

is read like this:

```
async def generate(request):
    .. as before ..

if __name__ == "__main__":
    print(f"MAIN: {sys.argv}")
```

and produces this output:

```
MAIN: ['my_engine.py', '--model-path', '/opt/models/Llama-3.2-3B-Instruct/', '--model-name', 'llama3.2', '--http-port', '8080', '--tensor-parallel-size', '4', '--base-gpu-id', '0', '--num-nodes', '1', '--node-rank', '0', '-n', '1']
```

This allows quick iteration on the engine setup. Note how the `-n` `1` is included. Flags `--leader-addr` and `--model-config` will also be added if provided to `dynamo-run`.
-
Ryan McCormick authored
-
Hongkuan Zhou authored
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
ptarasiewiczNV authored
Co-authored-by: ptarasiewicz@nvidia.com <Piotr Tarasiewicz>
-
Maksim Khadkevich authored
-
Dmitry Tokarev authored
-
Neelay Shah authored
-
Anant Sharma authored
-
Maksim Khadkevich authored
-
Hongkuan Zhou authored
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
-
Tanmay Verma authored
-
Ziqi Fan authored
feat: rename dynamo-sdk to dynamo; add dynamo run to call dynamo-run under the hood for unification (#104)
-
Neelay Shah authored
-
- 11 Mar, 2025 17 commits
-
-
hhzhang16 authored
-
Tanmay Verma authored
-
Neelay Shah authored
-
Ryan McCormick authored
-
Graham King authored
In https://github.com/ai-dynamo/dynamo/pull/89 `dynamo-run` was moved into a workspace. That means it builds in that workspace, so into `launch/target` not `launch/dynamo-run/target`. Update docs to match.
-
Graham King authored
If the python file raises an exception we print it like Python would.

```
$ ./target/debug/dynamo-run in=http out=pystr:~/Temp/cn47/1_e.py --model-name test
Traceback (most recent call last):
  File "/home/graham/Temp/cn47/1_e.py", line 17, in generate
    raise MyException("The message")
1_e.MyException: The message
```
-
Hongkuan Zhou authored
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
-
Graham King authored
- Latest from repo, many improvements
- Support most of the OpenAI request features (temperature, top_p, etc)
- Download models from Hugging Face if necessary
-
Neelay Shah authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Ryan McCormick authored
-
julienmancuso authored
-
Alec authored
-
Hongkuan Zhou authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
-
ptarasiewiczNV authored
-
Hongkuan Zhou authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
-
Tanmay Verma authored
-
Piotr Marcinkiewicz authored
-