"examples/vscode:/vscode.git/clone" did not exist on "2d39ded64cbf3025b6ced809fd2a3e50bf1fb72d"
- 16 Mar, 2025 5 commits

Harrison Saturley-Hall authored

julienmancuso authored
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>

Maksim Khadkevich authored

ishandhanani authored

April Yang authored
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>
- 15 Mar, 2025 10 commits

Biswa Panda authored

Neelay Shah authored

ptarasiewiczNV authored

Matthew Kotila authored

Biswa Panda authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

ptarasiewiczNV authored

Harrison Saturley-Hall authored

Maksim Khadkevich authored

julienmancuso authored

Graham King authored
```
dynamo-run in=batch:prompts.jsonl out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct/
```
The input file is in genai format, one JSON entry per line:
```
{"text": "the prompt"}
{"text": ..etc
```
Each prompt is evaluated and the output is written to `output.jsonl` in the same folder as the input. At the end of the run, various statistics are printed:
> Ran 5 files in 8s 679ms. Tokens in: 40 (5/s). Tokens out: 346 (43/s)

This is also helpful for pushing load into the system and stressing its various components. It is a batch inference tool, not intended for performance measurement.
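For reference, the input is plain JSONL and is easy to generate or consume; a minimal Rust sketch of a reader for it, assuming the serde and serde_json crates (the `read_prompts` helper is hypothetical, not part of dynamo-run):
```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

use serde::Deserialize;

/// One line of the batch input file: {"text": "the prompt"}
#[derive(Deserialize)]
struct BatchEntry {
    text: String,
}

fn read_prompts(path: &str) -> std::io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    let mut prompts = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // tolerate blank lines
        }
        let entry: BatchEntry = serde_json::from_str(&line)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
        prompts.push(entry.text);
    }
    Ok(prompts)
}
```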
- 14 Mar, 2025 21 commits

Anant Sharma authored

Graham King authored
Engines mistralrs, sglang and vllm are included by default. They can be disabled like this: `cargo build --no-default-features --features <add-back-what-you-want>`. Added a `--features vulkan` option, for llamacpp.

Build-time message if CUDA or Metal would help and are missing. That's the best we can do:
> warning: dynamo-run@0.1.0: CUDA not enabled, re-run with `--features cuda`

Runtime message if CUDA, Metal or Vulkan are enabled:
> 2025-03-14T21:59:26.501937Z INFO dynamo_run: CUDA on

Runtime message if they are missing:
> 2025-03-14T22:02:37.439404Z INFO dynamo_run: CPU mode. Rebuild with `--features cuda|metal|vulkan` for better performance

Default engine message includes the available engines:
> 2025-03-14T21:59:26.503612Z INFO dynamo_run: Using default engine: mistralrs. Use out=<engine> to specify one of echo_core, echo_full, mistralrs, llamacpp, sglang, vllm, pystr, pytok

The really important outcome is that this should now "just work":
```
cargo install dynamo-run
dynamo-run Qwen/Qwen2.5-3B-Instruct
```
Sadly you still need `--features cuda|metal` for performance; I couldn't automate that.
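Build-time messages like the one above can be emitted from a Cargo build script via `cargo:warning=`; a minimal sketch, where probing for `nvcc` is an illustrative heuristic rather than the crate's actual detection logic:
```rust
// build.rs -- sketch of a build-time hint via `cargo:warning=`.
fn main() {
    // Cargo sets CARGO_FEATURE_<NAME> for each enabled feature.
    let cuda_enabled = std::env::var("CARGO_FEATURE_CUDA").is_ok();
    // Probing for `nvcc` is an illustrative heuristic only.
    let cuda_available = std::process::Command::new("nvcc")
        .arg("--version")
        .output()
        .is_ok();
    if !cuda_enabled && cuda_available {
        // Rendered by Cargo as: warning: <crate>@<version>: <message>
        println!("cargo:warning=CUDA not enabled, re-run with `--features cuda`");
    }
}
```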
Pavithra Vijayakrishnan authored

Hongkuan Zhou authored

Ryan McCormick authored

ishandhanani authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>

Graham King authored
On Mac, embedded Python interpreters don't pick up the active virtual environment. This seems to be a known problem, so fix sys.path ourselves.
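A minimal sketch of this kind of fix, assuming pyo3 and the standard `VIRTUAL_ENV` environment variable; the helper name and the hard-coded path layout are illustrative:
```rust
use pyo3::prelude::*;

/// Prepend the active virtualenv's site-packages to sys.path.
/// Sketch only: the `python3.12` segment is illustrative; real code
/// would derive it from the interpreter version.
fn add_venv_to_sys_path(py: Python<'_>) -> PyResult<()> {
    if let Ok(venv) = std::env::var("VIRTUAL_ENV") {
        let site_packages = format!("{venv}/lib/python3.12/site-packages");
        let sys = py.import("sys")?;
        let path = sys.getattr("path")?;
        path.call_method1("insert", (0, site_packages))?;
    }
    Ok(())
}
```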
Hongkuan Zhou authored

Hongkuan Zhou authored

Anant Sharma authored

Ryan McCormick authored

Ryan McCormick authored

Anant Sharma authored

Tanmay Verma authored

Graham King authored
- Mac doesn't have the `pipe2` syscall, so use plain `pipe`.
- rtnetlink isn't a dependency on Mac, so don't use the type.
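A minimal sketch of the first fix, using the `libc` crate; the `cloexec_pipe` helper is illustrative, not dynamo's actual code:
```rust
use std::io;

/// Create a close-on-exec pipe portably. Linux's `pipe2` sets
/// O_CLOEXEC atomically; macOS has no `pipe2`, so fall back to
/// `pipe` followed by fcntl(F_SETFD, FD_CLOEXEC).
fn cloexec_pipe() -> io::Result<[libc::c_int; 2]> {
    let mut fds = [0 as libc::c_int; 2];

    #[cfg(target_os = "linux")]
    let rc = unsafe { libc::pipe2(fds.as_mut_ptr(), libc::O_CLOEXEC) };

    #[cfg(target_os = "macos")]
    let rc = unsafe {
        let rc = libc::pipe(fds.as_mut_ptr());
        if rc == 0 {
            for fd in &fds {
                // Not atomic with the pipe creation, but the best
                // available on macOS.
                libc::fcntl(*fd, libc::F_SETFD, libc::FD_CLOEXEC);
            }
        }
        rc
    };

    if rc == 0 {
        Ok(fds)
    } else {
        Err(io::Error::last_os_error())
    }
}
```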
hhzhang16 authored
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>

Ryan McCormick authored

Ryan McCormick authored

Ryan McCormick authored

Ryan McCormick authored

Ryan Olson authored
- 13 Mar, 2025 4 commits

Ryan McCormick authored

Ziqi Fan authored

Anant Sharma authored

Graham King authored
Previously we tokenized the output ourselves and counted tokens to stop when max tokens was reached. Now we let the mistral.rs engine do it, which saves the extra tokenization step. Also, dynamo-run now prints which engines are compiled in in the help message, plus some minor lint fixes.
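The shape of that change, as a hedged sketch with hypothetical names (`Engine`, `next_chunk`, `count_tokens`); mistral.rs's real interface differs:
```rust
/// Hypothetical streaming-engine interface, for illustration only;
/// this is not the mistral.rs API.
trait Engine {
    fn next_chunk(&mut self) -> Option<String>;
}

/// Before: re-tokenize the accumulated output after every chunk to
/// enforce the limit ourselves -- an extra tokenization pass per chunk.
fn generate_counting(
    engine: &mut dyn Engine,
    count_tokens: &dyn Fn(&str) -> usize,
    max_tokens: usize,
) -> String {
    let mut out = String::new();
    while let Some(chunk) = engine.next_chunk() {
        out.push_str(&chunk);
        if count_tokens(&out) >= max_tokens {
            break;
        }
    }
    out
}

/// After: max_tokens travels with the request, the engine stops
/// generation itself, and no tokenization happens on our side.
fn generate_delegating(engine: &mut dyn Engine) -> String {
    let mut out = String::new();
    while let Some(chunk) = engine.next_chunk() {
        out.push_str(&chunk);
    }
    out
}
```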