Commits · 177fb3569e517031e0135019300ca55811123257 · OpenDAS / dynamo

15 Mar, 2025 5 commits

fix: Specify vLLM prebuilt wheel location (#176) · 177fb356
ptarasiewiczNV authored Mar 15, 2025

feat: Support a small runtime container (#167) · f64e2366
Harrison Saturley-Hall authored Mar 15, 2025

chore: Update CODEOWNERS /deploy/ (#174) · 58e7461a
Maksim Khadkevich authored Mar 14, 2025

fix: fix helm chart deployment (#172) · ffd47bca
julienmancuso authored Mar 14, 2025


feat(dynamo-run): Batch mode (#142) · 2cca070c

Graham King authored Mar 14, 2025

```
dynamo-run in=batch:prompts.jsonl out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct/
```

The file has genai format, one entry per line:
```
{"text": "the prompt"}
{"text": ..etc
```

Each prompt is evaluated and the output is written to `output.jsonl` in the
same folder as the input.

At the end of the run various statistics are printed:
> Ran 5 files in 8s 679ms. Tokens in: 40 (5/s). Tokens out: 346 (43/s)

This is also helpful for pushing load into the system and stressing the
various components. It is not intended for performance measurement; it is a
batch inference tool.
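
As a rough illustration of the input format described above, the following Python sketch builds the contents of a `prompts.jsonl` file with one `{"text": ...}` object per line. The helper name `format_prompts` is hypothetical and not part of dynamo-run; only the `text` key and the one-entry-per-line layout come from the commit message.

```python
import json


def format_prompts(prompts):
    """Return JSONL text: one {"text": <prompt>} object per line.

    Hypothetical helper for illustration; the key name "text" is taken
    from the commit message, everything else is an assumption.
    """
    lines = [json.dumps({"text": prompt}, ensure_ascii=False) for prompt in prompts]
    # Terminate every entry with a newline so each prompt sits on its own line.
    return "".join(line + "\n" for line in lines)
```

The returned string could then be saved as `prompts.jsonl` (for example with `pathlib.Path.write_text`) and passed to `in=batch:prompts.jsonl`.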


14 Mar, 2025 21 commits
- revert: "build: use wheel for vllm install (#163)" (#170) · 5cfcfe61
  Anant Sharma authored Mar 14, 2025
  
- feat(dynamo-run): Various UX improvements (#168) · 1fb31d6a
  Graham King authored Mar 14, 2025
````
Engines mistralrs, sglang and vllm are included by default. They can be disabled like this: `cargo build --no-default-features --features <add-back-what-you-want>`.

Added `--features vulkan` option, for llamacpp.

Build-time message if CUDA or Metal would help and are missing. That's the best we can do:
> warning: dynamo-run@0.1.0: CUDA not enabled, re-run with `--features cuda`

Runtime message if CUDA, Metal or Vulkan are enabled:
> 2025-03-14T21:59:26.501937Z  INFO dynamo_run: CUDA on

Runtime message if they are missing:
> 2025-03-14T22:02:37.439404Z  INFO dynamo_run: CPU mode. Rebuild with `--features cuda|metal|vulkan` for better performance

Default engine message includes available engines:
> 2025-03-14T21:59:26.503612Z  INFO dynamo_run: Using default engine: mistralrs. Use out=<engine> to specify one of echo_core, echo_full, mistralrs, llamacpp, sglang, vllm, pystr, pytok

The really important outcome is that this should now "just work":
```
  cargo install dynamo-run
  dynamo-run Qwen/Qwen2.5-3B-Instruct
```

Sadly you still need `--features cuda|metal` for performance; I couldn't automate that.
````
- ci: Improve summarizing the test report (#153) · f465aca3
  Pavithra Vijayakrishnan authored Mar 14, 2025
  
- fix: indent issue (duplicated source code) (#165) · 0694d6b5
  Hongkuan Zhou authored Mar 14, 2025
  
- feat: Support caching nixl build stage (#147) · 2abe926d
  Ryan McCormick authored Mar 14, 2025
  
- feat(sdk): add initial graph structure for prebuilt components (#130) · b8120504
  ishandhanani authored Mar 14, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
- fix(mac): Fix for virtual env (#164) · 4f7f4b40
  Graham King authored Mar 14, 2025
```
On Mac, embedded Python interpreters don't pick up the virtual env. This seems to be a known problem. The fix is to adjust `sys.path`.
```
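
The commit message does not show how `sys.path` is adjusted. As a sketch of one common workaround, assuming the virtual env is advertised through the `VIRTUAL_ENV` environment variable and uses the usual POSIX layout (`lib/pythonX.Y/site-packages`), an embedded interpreter could register that directory itself. Both function names are hypothetical; the actual fix in the repository may differ.

```python
import os
import site
import sys
from pathlib import Path


def venv_site_packages(venv_root, version_info=sys.version_info):
    """Build the site-packages path for a POSIX-style virtual env (assumed layout)."""
    version_dir = f"python{version_info[0]}.{version_info[1]}"
    return Path(venv_root) / "lib" / version_dir / "site-packages"


def add_virtualenv_to_sys_path(environ=os.environ):
    """Add the active virtual env's site-packages to sys.path, if one is set.

    Returns the directory that was added, or None when there is nothing to add.
    """
    venv_root = environ.get("VIRTUAL_ENV")
    if not venv_root:
        return None
    site_dir = venv_site_packages(venv_root)
    if not site_dir.is_dir():
        return None
    # addsitedir appends the directory to sys.path and processes its .pth files.
    site.addsitedir(str(site_dir))
    return site_dir
```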
- chore: add alec to .py and .rs code owner (#162) · 663cde81
  Hongkuan Zhou authored Mar 14, 2025
  
- fix: wrong indent -> only one worker metric (#161) · 75c249a2
  Hongkuan Zhou authored Mar 14, 2025
  
- build: use wheel for vllm install (#163) · 7713e25c
  Anant Sharma authored Mar 14, 2025
  
- fix: Improve error handling for failed HF download (#160) · 0f4529e9
  Ryan McCormick authored Mar 14, 2025
  
- refactor: Update default log level to INFO and promote/demote a few log messages (#159) · 6a93d2c7
  Ryan McCormick authored Mar 14, 2025
  
- build: reorganize python packaging to build new wheels (#118) · c1c22703
  Anant Sharma authored Mar 14, 2025
  
- feat: LLMAPI PoC with dynamo-run launcher (#114) · e0bb5bd3
  Tanmay Verma authored Mar 14, 2025
  
- fix: Various for MacOS (#155) · 76b79149
  Graham King authored Mar 14, 2025
```
- Mac doesn't have the `pipe2` syscall, so use plain `pipe`.
- rtnetlink isn't a dependency on Mac, so don't use the type.
```
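
The same portability concern exists in other languages. As an analogy only (the actual change is presumably in dynamo's Rust code, and the commit message does not say which flags it passes), this Python sketch prefers `os.pipe2` where the platform provides it and falls back to plain `os.pipe` otherwise. The function name and the choice of non-blocking, close-on-exec behaviour are assumptions made for illustration.

```python
import os


def make_nonblocking_pipe():
    """Return (read_fd, write_fd), both non-blocking and not inherited by children."""
    if hasattr(os, "pipe2"):
        # Platforms with pipe2 (e.g. Linux) can set both flags atomically.
        return os.pipe2(os.O_NONBLOCK | os.O_CLOEXEC)
    # Platforms without pipe2 (e.g. macOS): create a plain pipe, then set flags.
    # os.pipe() already returns non-inheritable descriptors in Python 3.4+.
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    return read_fd, write_fd
```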
- feat: add helm charts for deployment (#145) · 82f455d5
  hhzhang16 authored Mar 14, 2025
```
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
```
- fix: Fix cargo doc warnings for lib/runtime (#150) · 0f4c1c58
  Ryan McCormick authored Mar 14, 2025
  
- fix: Fix cargo doc warnings for lib/llm (#151) · dac63127
  Ryan McCormick authored Mar 14, 2025
  
- fix: Add missing binaries back into container build (#152) · 7df6bb18
  Ryan McCormick authored Mar 14, 2025
  
- refactor: Remove STANDARD and VLLM_NIXL choices from build/run (#148) · cd14a1c5
  Ryan McCormick authored Mar 14, 2025
  
- feat: global kv block manager (#45) · f04359cf
  Ryan Olson authored Mar 13, 2025
  
13 Mar, 2025 11 commits
- build: Remove nats and etcd sources from final build (#144) · 530a6be0
  Ryan McCormick authored Mar 13, 2025
  
- refactor: update dynamo run cmd to hint user to install dynamo-run when missing (#143) · 28eb8530
  Ziqi Fan authored Mar 13, 2025
  
- build: add top level rust workspace (#137) · 3d292851
  Anant Sharma authored Mar 13, 2025
  
- feat(mistralrs): Let the engine enforce max tokens (#134) · 404a78e9
  Graham King authored Mar 13, 2025
```
Previously we tokenized and counted tokens to stop when max tokens was reached. Now we let the mistral.rs engine do it, which saves the extra tokenization step.

Also, dynamo-run now prints which engines are compiled in as part of its help message, plus some minor lint fixes.
```
- build: remove hardcoded cargo jobs args (#136) · 941032da
  Anant Sharma authored Mar 13, 2025
  
- fix(dynamo-run): Network interface detection is Linux only (#133) · b0d3eba1
  Graham King authored Mar 13, 2025
```
"netlink" doesn't exist on Mac. We print the primary network interface to help multi-node setup, which is also unlikely on Mac.
```
- build: add CARGO_BUILD_JOBS argument to Dockerfiles (#132) · 3d9ade88
  Pawel Ziecina authored Mar 13, 2025
  
- docs: Updated macOS build instructions for dynamo-run. (#131) · 05465f78
  Dmitry Tokarev authored Mar 13, 2025
  
- feat: onboard nixl based vllm example to dynamo serve (#120) · cab65e1a
  Biswa Panda authored Mar 12, 2025
  
- fix: Fix TRTLLM chat to work with latest ToT (#127) · 8435b993
  Tanmay Verma authored Mar 12, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
- feat(dynamo-run): Download models from HF, smart model defaults (#126) · 089f8e1b
  Graham King authored Mar 12, 2025
````
- Any engine can take the name of a Hugging Face repository. It will be downloaded before calling the engine.

- The default engine (previously always mistralrs) depends on what is compiled in.

- Text can be piped in and will result in a single run of the model.

All of those together mean that if you build with `--features vllm` you can do this, and it will download the model, run it with vllm, answer your question, and exit:
```
  echo "What is the capital of Costa Rica?" | dynamo-run Qwen/Qwen2.5-3B-Instruct
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
````
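
The "text can be piped in" behaviour is described but not shown. As a hedged illustration of the general technique (not dynamo-run's own code, which is not quoted in the commit message), a program can check whether standard input is a terminal and, if it is not, treat everything piped in as a single prompt. The function name `read_piped_prompt` is hypothetical.

```python
import sys


def read_piped_prompt(stream=sys.stdin):
    """Return piped-in text as one prompt, or None for interactive use.

    When the stream is a terminal there is nothing piped in, so the caller
    would fall back to its interactive mode.
    """
    if stream.isatty():
        return None
    text = stream.read().strip()
    # An empty pipe (e.g. `echo -n "" | prog`) is treated as no prompt.
    return text or None
```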
12 Mar, 2025 3 commits

test: enable unit tests for Dynamo API server (#107) · 1d856345
hhzhang16 authored Mar 12, 2025


feat(pystr): Pass command line arguments (#123) · 995f71cc

Graham King authored Mar 12, 2025

Command line arguments are passed to the python engine like this:
```
dynamo-run out=pystr:my_python_engine.py -- -n 42 --custom-arg Orange --yes
```

The python engine receives the arguments in `sys.argv`. The argument list will include some standard ones as well as anything after the `--`.

This input:
```
dynamo-run out=pystr:my_engine.py /opt/models/Llama-3.2-3B-Instruct/ --model-name llama3.2 --tensor-parallel-size 4 -- -n 1
```

is read like this:
```
import sys

async def generate(request):
    ...  # as before

if __name__ == "__main__":
    # Show every argument dynamo-run passed to this engine script.
    print(f"MAIN: {sys.argv}")
```

and produces this output:
```
MAIN: ['my_engine.py', '--model-path', '/opt/models/Llama-3.2-3B-Instruct/', '--model-name', 'llama3.2', '--http-port', '8080', '--tensor-parallel-size', '4', '--base-gpu-id', '0', '--num-nodes', '1', '--node-rank', '0', '-n', '1']
```

This allows quick iteration on the engine setup. Note how the `-n` `1` is included. Flags `--leader-addr` and `--model-config` will also be added if provided to `dynamo-run`.
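
One way an engine script might consume these arguments is with `argparse`. The sketch below is an assumption about how a user could write their own engine, not code from the repository: the function name `parse_engine_args` is hypothetical, the flag names are taken from the example output and the note above, and `parse_known_args` is used so that flags the sketch does not declare are tolerated rather than rejected.

```python
import argparse


def parse_engine_args(argv):
    """Parse the argument list an engine script receives in sys.argv.

    Returns (namespace, extras), where extras holds any arguments this
    sketch does not declare.
    """
    parser = argparse.ArgumentParser(prog=argv[0] if argv else None)
    # Standard flags, as they appear in the example output.
    parser.add_argument("--model-path")
    parser.add_argument("--model-name")
    parser.add_argument("--http-port", type=int)
    parser.add_argument("--tensor-parallel-size", type=int)
    parser.add_argument("--base-gpu-id", type=int)
    parser.add_argument("--num-nodes", type=int)
    parser.add_argument("--node-rank", type=int)
    # Only present when provided to dynamo-run, per the commit message.
    parser.add_argument("--leader-addr")
    parser.add_argument("--model-config")
    # Custom engine argument from the example (everything after `--`).
    parser.add_argument("-n", type=int, default=None)
    return parser.parse_known_args(argv[1:])
```

In an engine script this would be called as `args, extras = parse_engine_args(sys.argv)` under the `if __name__ == "__main__":` guard.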


feat: Support prometheus push gateway for use cases behind a firewall (#64) · 666cf87b
Ryan McCormick authored Mar 12, 2025
