Commits · 663cde816de5d036919131c3893783b67d88da14 · OpenDAS / dynamo

14 Mar, 2025 14 commits
- chore: add alec to .py and .rs code owner (#162) · 663cde81
  Hongkuan Zhou authored Mar 14, 2025
  
  663cde81
- fix: wrong indent -> only one worker metric (#161) · 75c249a2
  Hongkuan Zhou authored Mar 14, 2025
  
  75c249a2
- build: use wheel for vllm install (#163) · 7713e25c
  Anant Sharma authored Mar 14, 2025
  
  7713e25c
- fix: Improve error handling for failed HF download (#160) · 0f4529e9
  Ryan McCormick authored Mar 14, 2025
  
  0f4529e9
- refactor: Update default log level to INFO and promote/demote a few log messages (#159) · 6a93d2c7
  Ryan McCormick authored Mar 14, 2025
  
  6a93d2c7
- build: reorganize python packaging to build new wheels (#118) · c1c22703
  Anant Sharma authored Mar 14, 2025
  
  c1c22703
- feat: LLMAPI PoC with dynamo-run launcher (#114) · e0bb5bd3
  Tanmay Verma authored Mar 14, 2025
  
  e0bb5bd3
- fix: Various for MacOS (#155) · 76b79149
  Graham King authored Mar 14, 2025
```
- Mac doesn't have `pipe2` syscall so use plain `pipe`.
- rtnetlink isn't a dependency on mac so don't use the type
```
  76b79149
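The `pipe2` fallback described in #155 can be illustrated in Python, which exposes the same platform difference: `os.pipe2` is only present where the operating system provides the syscall. This is a minimal sketch and not the repository's Rust code. The helper name `make_nonblocking_pipe` and the choice of non-blocking mode are assumptions made for illustration.

```python
import os


def make_nonblocking_pipe():
    """Return (read_fd, write_fd) for a non-blocking pipe on Linux or macOS."""
    if hasattr(os, "pipe2"):
        # Linux: one syscall creates the pipe and sets the flags atomically.
        return os.pipe2(os.O_NONBLOCK | os.O_CLOEXEC)
    # macOS: no pipe2, so create a plain pipe and set the flags afterwards.
    # os.pipe() already returns non-inheritable (close-on-exec) descriptors.
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    return read_fd, write_fd


if __name__ == "__main__":
    r, w = make_nonblocking_pipe()
    os.write(w, b"ping")
    print(os.read(r, 4))
    os.close(r)
    os.close(w)
```

The trade-off of the fallback path is that the flags are applied in separate calls after the pipe exists, rather than atomically at creation.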
- feat: add helm charts for deployment (#145) · 82f455d5
  hhzhang16 authored Mar 14, 2025
```
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
```
  82f455d5
- fix: Fix cargo doc warnings for lib/runtime (#150) · 0f4c1c58
  Ryan McCormick authored Mar 14, 2025
  
  0f4c1c58
- fix: Fix cargo doc warnings for lib/llm (#151) · dac63127
  Ryan McCormick authored Mar 14, 2025
  
  dac63127
- fix: Add missing binaries back into container build (#152) · 7df6bb18
  Ryan McCormick authored Mar 14, 2025
  
  7df6bb18
- refactor: Remove STANDARD and VLLM_NIXL choices from build/run (#148) · cd14a1c5
  Ryan McCormick authored Mar 14, 2025
  
  cd14a1c5
- feat: global kv block manager (#45) · f04359cf
  Ryan Olson authored Mar 13, 2025
  
  f04359cf
13 Mar, 2025 11 commits
- build: Remove nats and etcd sources from final build (#144) · 530a6be0
  Ryan McCormick authored Mar 13, 2025
  
  530a6be0
- refactor: update dynamo run cmd to hint user to install dynamo-run when missing (#143) · 28eb8530
  Ziqi Fan authored Mar 13, 2025
  
  28eb8530
- build: add top level rust workspace (#137) · 3d292851
  Anant Sharma authored Mar 13, 2025
  
  3d292851
- feat(mistralrs): Let the engine enforce max tokens (#134) · 404a78e9
  Graham King authored Mar 13, 2025
```
Previously we tokenized and counted tokens to stop when max tokens was reached. Now we let the mistral.rs engine do it, which saves the extra tokenization step.

Also, dynamo-run now prints which engines are compiled in as part of its help message, and there are some minor lint fixes.
```
  404a78e9
- build: remove hardcoded cargo jobs args (#136) · 941032da
  Anant Sharma authored Mar 13, 2025
  
  941032da
- fix(dynamo-run): Network interface detection is Linux only (#133) · b0d3eba1
  Graham King authored Mar 13, 2025
```
"netlink" doesn't exist on Mac. We print the primary network interface to help multi-node setup, which is also unlikely on Mac.
```
  b0d3eba1
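The platform gate from #133 can be sketched as follows. The actual change concerns netlink in the Rust code. This Python sketch swaps in a different technique, reading the kernel's `/proc/net/route` table, purely to show the "detect on Linux, skip elsewhere" shape. The function names `default_interface` and `primary_interface` are hypothetical.

```python
import sys
from typing import Optional


def default_interface(route_table: str) -> Optional[str]:
    """Return the interface of the default route in /proc/net/route text."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Column 0 is the interface, column 1 the destination in hex;
        # an all-zero destination marks the default route.
        if len(fields) > 1 and fields[1] == "00000000":
            return fields[0]
    return None


def primary_interface() -> Optional[str]:
    """Look up the primary interface on Linux; return None elsewhere."""
    if not sys.platform.startswith("linux"):
        return None  # e.g. macOS: skip detection instead of failing
    try:
        with open("/proc/net/route") as f:
            return default_interface(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(primary_interface())
```

Returning `None` on other platforms keeps the caller simple: the interface hint is printed only when one was found.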
- build: add CARGO_BUILD_JOBS argument to Dockerfiles (#132) · 3d9ade88
  Pawel Ziecina authored Mar 13, 2025
  
  3d9ade88
- docs: Updated macOS build instructions for dynamo-run. (#131) · 05465f78
  Dmitry Tokarev authored Mar 13, 2025
  
  05465f78
- feat: onboard nixl based vllm example to dynamo serve (#120) · cab65e1a
  Biswa Panda authored Mar 12, 2025
  
  cab65e1a
- fix: Fix TRTLLM chat to work with latest ToT (#127) · 8435b993
  Tanmay Verma authored Mar 12, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  8435b993
- feat(dynamo-run): Download models from HF, smart model defaults (#126) · 089f8e1b
  Graham King authored Mar 12, 2025
```
- Any engine can take the name of a Hugging Face repository. It will be downloaded before calling the engine.

- The default engine (previously always mistralrs) depends on what is compiled in.

- Text can be piped in and will result in a single run of the model.

All of those together mean if you build with `--features vllm` you can do this and it will download the model and run it with vllm, answer your question, and exit:

  echo "What is the capital of Costa Rica?" | dynamo-run Qwen/Qwen2.5-3B-Instruct

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  089f8e1b
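Two of the behaviours listed in #126, accepting a Hugging Face repository name in place of a local path and treating piped text as a single prompt, can be sketched like this. It is an illustration in Python under stated assumptions, not how `dynamo-run` implements them. The `<owner>/<name>` pattern, the rule that an existing local path takes precedence, and both function names are hypothetical.

```python
import io
import os
import re
from typing import Optional, TextIO

# Assumed shape of a Hugging Face repository id: "<owner>/<name>".
HF_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def classify_model_arg(arg: str) -> str:
    """Classify a model argument as 'local', 'hf_repo', or 'unknown'."""
    if os.path.exists(arg):
        return "local"  # an existing path wins over a look-alike repo id
    if HF_REPO_ID.match(arg):
        return "hf_repo"  # would be downloaded before the engine starts
    return "unknown"


def piped_prompt(stream: TextIO) -> Optional[str]:
    """Return piped-in text, or None when the stream is a terminal or empty."""
    if stream.isatty():
        return None
    text = stream.read().strip()
    return text or None


if __name__ == "__main__":
    print(classify_model_arg("Qwen/Qwen2.5-3B-Instruct"))
    # io.StringIO stands in for a piped stdin in this demo.
    print(piped_prompt(io.StringIO("What is the capital of Costa Rica?\n")))
```

In a real launcher the stream would be `sys.stdin`, and a non-`None` prompt would trigger one model run followed by exit.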
12 Mar, 2025 14 commits
- test: enable unit tests for Dynamo API server (#107) · 1d856345
  hhzhang16 authored Mar 12, 2025
  
  1d856345
- feat(pystr): Pass command line arguments (#123) · 995f71cc
  Graham King authored Mar 12, 2025
```
Command line arguments are passed to the python engine like this:

  dynamo-run out=pystr:my_python_engine.py -- -n 42 --custom-arg Orange --yes

The python engine receives the arguments in `sys.argv`. The argument list will include some standard ones as well as anything after the `--`.

This input:

  dynamo-run out=pystr:my_engine.py /opt/models/Llama-3.2-3B-Instruct/ --model-name llama_3.2 --tensor-parallel-size 4 -- -n 1

is read like this:

  import sys

  async def generate(request):
      ...  # as before

  if __name__ == "__main__":
      print(f"MAIN: {sys.argv}")

and produces this output:

  MAIN: ['my_engine.py', '--model-path', '/opt/models/Llama-3.2-3B-Instruct/', '--model-name', 'llama_3.2', '--http-port', '8080', '--tensor-parallel-size', '4', '--base-gpu-id', '0', '--num-nodes', '1', '--node-rank', '0', '-n', '1']

This allows quick iteration on the engine setup. Note that the trailing `-n 1` is included. Flags `--leader-addr` and `--model-config` will also be added if provided to `dynamo-run`.
```
  995f71cc
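On the engine side, the forwarded `sys.argv` from #123 could be consumed with `argparse`. This is a minimal sketch: the flag names are the ones shown in the commit message's example output, `-n` is treated as an engine-specific integer flag by assumption, and `parse_engine_args` is a hypothetical helper.

```python
import argparse
import sys


def parse_engine_args(argv):
    """Parse flags forwarded by the launcher plus the engine's own flags."""
    parser = argparse.ArgumentParser(prog="my_engine.py")
    # Flags that appear in the example sys.argv output of the commit message.
    parser.add_argument("--model-path")
    parser.add_argument("--model-name")
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    # Assumed engine-specific flag, passed after `--` on the command line.
    parser.add_argument("-n", type=int, default=1)
    # parse_known_args tolerates the remaining standard flags
    # (--http-port, --base-gpu-id, ...) without declaring each one.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown


if __name__ == "__main__":
    args, unknown = parse_engine_args(sys.argv[1:])
    print(args, unknown)
```

Using `parse_known_args` rather than `parse_args` means the engine keeps working if the launcher starts forwarding additional standard flags.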
- feat: Support prometheus push gateway for use cases behind a firewall (#64) · 666cf87b
  Ryan McCormick authored Mar 12, 2025
  
  666cf87b
- chore: add codeowner (#122) · e7233b2d
  Hongkuan Zhou authored Mar 12, 2025
```
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
  e7233b2d
- chore: Reduce conditional prefill logs (#121) · a092bcf4
  ptarasiewiczNV authored Mar 12, 2025
```
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
```
  a092bcf4
- fix: Update trigger_ci.yml to enable CI pipeline trigger for dynamo deploy (#119) · 1725c02d
  Maksim Khadkevich authored Mar 12, 2025
  
  1725c02d
- Create SECURITY.md (#95) · 6e342cdc
  Dmitry Tokarev authored Mar 12, 2025
  
  6e342cdc
- test: add basic support for pytest codeblocks (#117) · fcac394a
  Neelay Shah authored Mar 12, 2025
  
  fcac394a
- revert: "feat: added back smart routing and basic vllm examples (#111)" (#115) · 1fd24d78
  Anant Sharma authored Mar 12, 2025
  
  1fd24d78
- feat: added back smart routing and basic vllm examples (#111) · e3f14051
  Maksim Khadkevich authored Mar 11, 2025
  
  e3f14051
- fix: bump nixl version (#110) · d57847b2
  Hongkuan Zhou authored Mar 11, 2025
```
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
```
  d57847b2
- build: Changes to support building trtllm image in CI (#108) · 175c1762
  Tanmay Verma authored Mar 11, 2025
  
  175c1762
- feat: rename dynamo-sdk to dynamo; add dynamo run to call dynamo-run under the hood for unification (#104) · 7db61a43
  Ziqi Fan authored Mar 11, 2025
  
  7db61a43
- fix: Add missing util files to vllm example (#105) · ab33729b
  Neelay Shah authored Mar 11, 2025
  
  ab33729b
11 Mar, 2025 1 commit
- feat: remove Dynamo API Server's dependency on NDS (#54) · b0655a34
  hhzhang16 authored Mar 11, 2025
  
  b0655a34