- 13 Mar, 2025 2 commits
-
-
Graham King authored
Previously we tokenized and counted tokens ourselves so that we could stop when max tokens was reached. Now we let the mistral.rs engine enforce that limit, which saves the extra tokenization step. Also, dynamo-run now prints which engines are compiled in as part of its help message, plus some minor lint fixes.
-
Graham King authored
- Any engine can take the name of a Hugging Face repository. It will be downloaded before calling the engine.
- The default engine (previously always mistralrs) depends on what is compiled in.
- Text can be piped in and will result in a single run of the model.

All of those together mean if you build with `--features vllm` you can do this and it will download the model and run it with vllm, answer your question, and exit:

```
echo "What is the capital of Costa Rica?" | dynamo-run Qwen/Qwen2.5-3B-Instruct
```

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
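The piped-input behaviour described above can be sketched roughly as follows. This is not the dynamo-run implementation: the `Mode` enum and the `choose_mode` and `mode_from_stdin` functions are hypothetical names introduced for illustration, and falling back to interactive mode on empty piped input is an assumption. The only real API relied on is the standard library's `std::io::IsTerminal` (Rust 1.70+).

```rust
use std::io::{self, IsTerminal, Read};

/// Hypothetical run modes, named here for illustration only.
#[derive(Debug, PartialEq)]
enum Mode {
    /// A prompt arrived on stdin: run the model once and exit.
    SingleShot(String),
    /// stdin is a terminal (or nothing useful was piped): interactive session.
    Interactive,
}

/// Pick the mode from whether stdin is a terminal and what was read from it.
/// Assumption: whitespace-only piped input is treated as "no prompt".
fn choose_mode(stdin_is_terminal: bool, piped: &str) -> Mode {
    let prompt = piped.trim();
    if stdin_is_terminal || prompt.is_empty() {
        Mode::Interactive
    } else {
        Mode::SingleShot(prompt.to_string())
    }
}

/// Read stdin only when it is piped, then pick the mode.
fn mode_from_stdin() -> io::Result<Mode> {
    let stdin = io::stdin();
    if stdin.is_terminal() {
        // Nothing was piped in; do not block waiting for input.
        return Ok(Mode::Interactive);
    }
    let mut piped = String::new();
    stdin.lock().read_to_string(&mut piped)?;
    Ok(choose_mode(false, &piped))
}
```

A caller would invoke `mode_from_stdin()` once at startup and, on `Mode::SingleShot(prompt)`, send the prompt to the engine, print the answer, and exit. Keeping the decision in `choose_mode` separate from the I/O makes it testable without a real pipe.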
-
- 11 Mar, 2025 1 commit
-
-
Graham King authored
If the Python file raises an exception we print it like Python would.

```
$ ./target/debug/dynamo-run in=http out=pystr:~/Temp/cn47/1_e.py --model-name test
Traceback (most recent call last):
  File "/home/graham/Temp/cn47/1_e.py", line 17, in generate
    raise MyException("The message")
1_e.MyException: The message
```
-
- 08 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
-