Commits · 3d2928510c77fc6cb29c497c91a493e3d06c0cc1 · OpenDAS / dynamo

13 Mar, 2025 9 commits

build: add top level rust workspace (#137) · 3d292851
Anant Sharma authored Mar 13, 2025

3d292851

feat(mistralrs): Let the engine enforce max tokens (#134) · 404a78e9

Graham King authored Mar 13, 2025

Previously we tokenized and counted tokens ourselves to stop once max tokens was reached. Now the mistral.rs engine enforces the limit itself, which saves the extra tokenization step.

dynamo-run also now lists the compiled-in engines in its help message, and there are some minor lint fixes.

404a78e9

build: remove hardcoded cargo jobs args (#136) · 941032da
Anant Sharma authored Mar 13, 2025

941032da

fix(dynamo-run): Network interface detection is Linux only (#133) · b0d3eba1

Graham King authored Mar 13, 2025

"netlink" doesn't exist on Mac. We print the primary network interface to help multi-node setup, which is also unlikely on Mac.

b0d3eba1

build: add CARGO_BUILD_JOBS argument to Dockerfiles (#132) · 3d9ade88
Pawel Ziecina authored Mar 13, 2025

3d9ade88
docs: Updated macOS build instructions for dynamo-run. (#131) · 05465f78
Dmitry Tokarev authored Mar 13, 2025

05465f78
feat: onboard nixl based vllm example to dynamo serve (#120) · cab65e1a
Biswa Panda authored Mar 12, 2025

cab65e1a
fix: Fix TRTLLM chat to work with latest ToT (#127) · 8435b993
Tanmay Verma authored Mar 12, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
8435b993

feat(dynamo-run): Download models from HF, smart model defaults (#126) · 089f8e1b

Graham King authored Mar 12, 2025



- Any engine can take the name of a Hugging Face repository. It will be downloaded before calling the engine.

- The default engine (previously always mistralrs) depends on what is compiled in.

- Text can be piped in and will result in a single run of the model.

Taken together, this means that if you build with `--features vllm` you can run the command below. It downloads the model, runs it with vllm, answers your question, and exits:
```
echo "What is the capital of Costa Rica?"  | dynamo-run Qwen/Qwen2.5-3B-Instruct
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

089f8e1b
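One common way for a command-line tool to tell piped text from an interactive session is to check whether standard input is a terminal. The Python sketch below illustrates that general technique only. dynamo-run is built with cargo and its actual detection logic is not shown in this commit message, so the function name `read_piped_prompt` and its behavior are assumptions for illustration.

```python
import io
import sys


def read_piped_prompt(stream=None):
    """Return the piped text, or None when input comes from a terminal.

    Hypothetical helper: not part of dynamo-run.
    """
    stream = sys.stdin if stream is None else stream
    if stream.isatty():
        # Interactive session: nothing was piped in.
        return None
    text = stream.read().strip()
    return text or None


# A StringIO stands in for a pipe here: its isatty() returns False.
fake_pipe = io.StringIO("What is the capital of Costa Rica?\n")
print(read_piped_prompt(fake_pipe))
```

A tool built this way can run the model once on the returned text and exit, and fall back to its interactive mode when the function returns `None`.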

12 Mar, 2025 14 commits
- test: enable unit tests for Dynamo API server (#107) · 1d856345
  hhzhang16 authored Mar 12, 2025
  
  1d856345
- feat(pystr): Pass command line arguments (#123) · 995f71cc
  Graham King authored Mar 12, 2025
  Command line arguments are passed to the python engine like this:
  ```
  dynamo-run out=pystr:my_python_engine.py -- -n 42 --custom-arg Orange --yes
  ```

  The python engine receives the arguments in `sys.argv`. The argument list will include some standard ones as well as anything after the `--`.

  This input:
  ```
  dynamo-run out=pystr:my_engine.py /opt/models/Llama-3.2-3B-Instruct/ --model-name llama_3.2 --tensor-parallel-size 4 -- -n 1
  ```

  is read like this:
  ```
  import sys

  async def generate(request):
      ...  # as before

  if __name__ == "__main__":
      print(f"MAIN: {sys.argv}")
  ```

  and produces this output:
  ```
  MAIN: ['my_engine.py', '--model-path', '/opt/models/Llama-3.2-3B-Instruct/', '--model-name', 'llama3.2', '--http-port', '8080', '--tensor-parallel-size', '4', '--base-gpu-id', '0', '--num-nodes', '1', '--node-rank', '0', '-n', '1']
  ```

  This allows quick iteration on the engine setup. Note how the `-n` `1` is included. Flags `--leader-addr` and `--model-config` will also be added if provided to `dynamo-run`.
  995f71cc
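As a sketch of how an engine file might consume these arguments, the snippet below uses the standard `argparse` module. The flag names are copied from the sample output in the commit message. The helper `parse_engine_args`, the chosen defaults, and the decision to treat `-n` as an integer are assumptions for illustration, not part of dynamo-run.

```python
import argparse


def parse_engine_args(argv):
    """Split argv into the flags this sketch declares and everything else.

    Hypothetical helper: `-n` stands in for a user-defined option
    passed after `--` on the dynamo-run command line.
    """
    parser = argparse.ArgumentParser(prog=argv[0])
    parser.add_argument("--model-path")
    parser.add_argument("--model-name")
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    parser.add_argument("-n", type=int, default=1)
    # parse_known_args tolerates flags the parser does not declare
    # (for example --http-port or --node-rank) and returns them separately.
    return parser.parse_known_args(argv[1:])
```

Calling `parse_engine_args(sys.argv)` from the `__main__` block returns a namespace of the declared flags plus a list of the remaining arguments, so the engine keeps working if `dynamo-run` adds further standard flags.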
- feat: Support prometheus push gateway for use cases behind a firewall (#64) · 666cf87b
  Ryan McCormick authored Mar 12, 2025
  
  666cf87b
- chore: add codeowner (#122) · e7233b2d
  Hongkuan Zhou authored Mar 12, 2025
```
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
  e7233b2d
- chore: Reduce conditional prefill logs (#121) · a092bcf4
  ptarasiewiczNV authored Mar 12, 2025
```
Co-authored-by: ptarasiewicz@nvidia.com <Piotr Tarasiewicz>
```
  a092bcf4
- fix: Update trigger_ci.yml to enable CI pipeline trigger for dynamo deploy (#119) · 1725c02d
  Maksim Khadkevich authored Mar 12, 2025
  
  1725c02d
- Create SECURITY.md (#95) · 6e342cdc
  Dmitry Tokarev authored Mar 12, 2025
  
  6e342cdc
- test: add basic support for pytest codeblocks (#117) · fcac394a
  Neelay Shah authored Mar 12, 2025
  
  fcac394a
- revert: "feat: added back smart routing and basic vllm examples (#111)" (#115) · 1fd24d78
  Anant Sharma authored Mar 12, 2025
  
  1fd24d78
- feat: added back smart routing and basic vllm examples (#111) · e3f14051
  Maksim Khadkevich authored Mar 11, 2025
  
  e3f14051
- fix: bump nixl version (#110) · d57847b2
  Hongkuan Zhou authored Mar 11, 2025
```
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
```
  d57847b2
- build: Changes to support building trtllm image in CI (#108) · 175c1762
  Tanmay Verma authored Mar 11, 2025
  
  175c1762
- feat: rename dynamo-sdk to dynamo; add dynamo run to call dynamo-run under the... · 7db61a43
  Ziqi Fan authored Mar 11, 2025
```
feat: rename dynamo-sdk to dynamo; add dynamo run to call dynamo-run under the hood for unification (#104)
```
  7db61a43
- fix: Add missing util files to vllm example (#105) · ab33729b
  Neelay Shah authored Mar 11, 2025
  
  ab33729b
11 Mar, 2025 17 commits
- feat: remove Dynamo API Server's dependency on NDS (#54) · b0655a34
  hhzhang16 authored Mar 11, 2025
  
  b0655a34
- fix: Fix build regression in TRTLLM container with the rename. (#103) · 4f4f6bb7
  Tanmay Verma authored Mar 11, 2025
  
  4f4f6bb7
- refactor: rename vllm_nixl to vllm and make default (#100) · 5bcdb734
  Neelay Shah authored Mar 11, 2025
  
  5bcdb734
- fix: Add missing arg to echo_full example and source cargo env in setup steps (#101) · a7c35dcf
  Ryan McCormick authored Mar 11, 2025
  
  a7c35dcf
- docs(dynamo-run): Fix for workspace (#102) · 8992e895
  Graham King authored Mar 11, 2025
```
In https://github.com/ai-dynamo/dynamo/pull/89 `dynamo-run` was moved into a workspace. That means it builds in that workspace, so into `launch/target` not `launch/dynamo-run/target`.

Update docs to match.
```
  8992e895
- fix(pystr): Output python errors (#99) · 9c7b1ead
  Graham King authored Mar 11, 2025
  If the python file raises an exception we print it like Python would.
  ```
  $ ./target/debug/dynamo-run in=http out=pystr:~/Temp/cn47/1_e.py --model-name test

  Traceback (most recent call last):
    File "/home/graham/Temp/cn47/1_e.py", line 17, in generate
      raise MyException("The message")
  1_e.MyException: The message
  ```
  9c7b1ead
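For comparison, the report format in this commit message is the one Python's own `traceback` module produces. The sketch below is a standalone illustration in pure Python and makes no claim about how dynamo-run formats the error internally. `MyException` and `generate` mirror the names in the sample output, and their bodies are assumptions.

```python
import traceback


class MyException(Exception):
    """Stand-in for the exception raised by the engine file."""


def generate():
    raise MyException("The message")


try:
    generate()
except MyException as exc:
    # format_exception returns the same lines the interpreter would print.
    report = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    print(report)
```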
- feat: kv aware disagg router (#98) · a954a1c6
  Hongkuan Zhou authored Mar 11, 2025
```
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
  a954a1c6
- feat(dynamo-run): Upgrade mistral.rs (#97) · d99b188d
  Graham King authored Mar 11, 2025
```
- Latest from repo, many improvements
- Support most of the OpenAI request features (temperature, top_p, etc)
- Download models from Hugging Face if necessary
```
  d99b188d
- refactor: Move rust binaries out of examples, update nixl dockerfile (#89) · e5db9e86
  Neelay Shah authored Mar 11, 2025
```
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
  e5db9e86
- fix: Support vllm_nixl (custom vllm patch) from dynamo-run (#84) · e1a95dab
  Ryan McCormick authored Mar 11, 2025
  
  e1a95dab
- feat: add script to deploy a Dynamo pipeline in k8s using helm (#42) · 86bc5442
  julienmancuso authored Mar 11, 2025
  
  86bc5442
- feat: add new metrics and simple router cost fn (#88) · 3f84cdad
  Alec authored Mar 11, 2025
  
  3f84cdad
- fix: inconsistent router args (#94) · 2153ee81
  Hongkuan Zhou authored Mar 11, 2025
```
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
```
  2153ee81
- fix: update vLLM patch to 0aa204 (#92) · b4281383
  ptarasiewiczNV authored Mar 11, 2025
  
  b4281383
- fix: add multi-node deployment instruction for vllm-nixl (#93) · e0571935
  Hongkuan Zhou authored Mar 11, 2025
```
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
```
  e0571935
- chore: Simplify the container build instructions for LLMAPI example (#87) · f784b36a
  Tanmay Verma authored Mar 11, 2025
  
  f784b36a
- fix: Include GAP hot fix in VLLM NIXL container (#90) · 28f3b1bb
  Piotr Marcinkiewicz authored Mar 11, 2025
  
  28f3b1bb