- 17 Mar, 2025 10 commits
-
-
Anant Sharma authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
ptarasiewiczNV authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
-
Suman Tatiraju authored
-
GuanLuo authored
-
ishandhanani authored
-
Ryan McCormick authored
-
Ryan McCormick authored
-
Neelay Shah authored
-
Anant Sharma authored
-
Anant Sharma authored
-
- 16 Mar, 2025 10 commits
-
-
Dmitry Tokarev authored
-
Anant Sharma authored
-
David Zier authored
-
Neelay Shah authored
-
ptarasiewiczNV authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
-
Harrison Saturley-Hall authored
-
julienmancuso authored
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>
-
Maksim Khadkevich authored
-
ishandhanani authored
-
April Yang authored
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>
-
- 15 Mar, 2025 10 commits
-
-
Biswa Panda authored
-
Neelay Shah authored
-
ptarasiewiczNV authored
-
Matthew Kotila authored
-
Biswa Panda authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
ptarasiewiczNV authored
-
Harrison Saturley-Hall authored
-
Maksim Khadkevich authored
-
julienmancuso authored
-
Graham King authored
```
dynamo-run in=batch:prompts.jsonl out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct/
```

The input file has genai format, one JSON entry per line:

```
{"text": "the prompt"}
{"text": ..etc
```

Each prompt is evaluated and the output is written to `output.jsonl` in the same folder as the input. At the end of the run, various statistics are printed:

> Ran 5 files in 8s 679ms. Tokens in: 40 (5/s). Tokens out: 346 (43/s)

This is also helpful for pushing load into the system and stressing the various components. It is a batch inference tool, not intended for performance measurement.
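A minimal sketch of generating a batch input file in the format described above (the filename and the prompt texts here are just examples, not from the commit):

```python
import json

# Example prompts; in practice these would come from your own dataset.
prompts = [
    "What is the capital of France?",
    "Summarize the plot of Hamlet in one sentence.",
]

# One JSON object per line, each with a "text" field (the genai format).
with open("prompts.jsonl", "w") as f:
    for text in prompts:
        f.write(json.dumps({"text": text}) + "\n")

# Sanity-check: every line must parse as JSON and carry a "text" key.
with open("prompts.jsonl") as f:
    entries = [json.loads(line) for line in f]
assert all("text" in entry for entry in entries)
```

The resulting `prompts.jsonl` can then be passed via `in=batch:prompts.jsonl` as shown above.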
-
- 14 Mar, 2025 10 commits
-
-
Anant Sharma authored
-
Graham King authored
Engines mistralrs, sglang and vllm are included by default. They can be disabled like this: `cargo build --no-default-features --features <add-back-what-you-want>`.

Added a `--features vulkan` option for llamacpp.

A build-time message is printed if CUDA or Metal are missing and would help. That's the best we can do:

> warning: dynamo-run@0.1.0: CUDA not enabled, re-run with `--features cuda`

A runtime message is printed if CUDA, Metal or Vulkan are enabled:

> 2025-03-14T21:59:26.501937Z INFO dynamo_run: CUDA on

And a runtime message if they are missing:

> 2025-03-14T22:02:37.439404Z INFO dynamo_run: CPU mode. Rebuild with `--features cuda|metal|vulkan` for better performance

The default engine message includes the available engines:

> 2025-03-14T21:59:26.503612Z INFO dynamo_run: Using default engine: mistralrs. Use out=<engine> to specify one of echo_core, echo_full, mistralrs, llamacpp, sglang, vllm, pystr, pytok

The really important outcome is that this should now "just work":

```
cargo install dynamo-run
dynamo-run Qwen/Qwen2.5-3B-Instruct
```

Sadly you still need `--features cuda|metal` for performance; I couldn't automate that.
-
Pavithra Vijayakrishnan authored
-
Hongkuan Zhou authored
-
Ryan McCormick authored
-
ishandhanani authored
Co-authored-by:Biswa Panda <biswa.panda@gmail.com>
-
Graham King authored
On Mac, embedded Python interpreters don't pick up the virtual env. This seems to be a known problem. Fix `sys.path`.
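A minimal sketch of the kind of fix described, assuming the active virtual env is found via the `VIRTUAL_ENV` environment variable (the actual commit may locate it differently):

```python
import os
import site
import sys


def add_venv_site_packages() -> None:
    """If the embedded interpreter missed an active virtual env,
    append that env's site-packages directory to sys.path."""
    venv = os.environ.get("VIRTUAL_ENV")
    if not venv:
        return
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    site_packages = os.path.join(venv, "lib", version, "site-packages")
    if os.path.isdir(site_packages) and site_packages not in sys.path:
        # site.addsitedir also processes .pth files, unlike a bare append.
        site.addsitedir(site_packages)
```

Using `site.addsitedir` rather than `sys.path.append` keeps `.pth`-based packages working inside the embedded interpreter.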
-
Hongkuan Zhou authored
-
Hongkuan Zhou authored
-
Anant Sharma authored
-