Commits · 5f14e467cdc7212f7b892e6f2f40b61d08c653fb · OpenDAS / dynamo

01 Mar, 2025 1 commit
- Add lazy import to nixl.py · 5f14e467
  Piotr Marcinkiewicz authored Mar 01, 2025
  
  5f14e467
28 Feb, 2025 8 commits
- refactor: use async-openai CompletionRequest (#310) · 9162f3ad
  Paul Hendricks authored Feb 28, 2025
  
  9162f3ad
- feat: TensorRT-LLM engine (#317) · 057f8f47
  Graham King authored Feb 28, 2025
```
Engine, `tio` support and docs.

Proof of concept / experimental.
```
  057f8f47
- [fix] KV Router Example fixes (#314) · 11a36651
  Alec authored Feb 28, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  11a36651
- feat: Add initial prometheus/grafana support for count (#303) · d38325c2
  Ryan McCormick authored Feb 28, 2025
  
  d38325c2
- feat: vllm engine (#308) · 6e0cfbd9
  Graham King authored Feb 28, 2025
```
triton-distributed-llm component and support in tio
```
  6e0cfbd9
- ci: initial public ci for PRs (#241) · 37a8ebaf
  Harrison Saturley-Hall authored Feb 28, 2025
```
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
  37a8ebaf
- Add missing inits (#307) · f229b41b
  Piotr Marcinkiewicz authored Feb 28, 2025
  
  f229b41b
- Updates to support DS R1 in TRTLLM example (#301) · 12d73a82
  NVShreyas authored Feb 28, 2025
  
  12d73a82
27 Feb, 2025 12 commits
- feat: llama.cpp engine for tio (#298) · e584e96f
  Graham King authored Feb 27, 2025
```
Docs in README
```
  e584e96f
- fix: add skip_serializing if none (#297) · b20ef999
  Paul Hendricks authored Feb 27, 2025
  
  b20ef999
- refactor: removes wrapper for ChatCompletionContent and adds documentation (#296) · 151a2a1d
  Paul Hendricks authored Feb 27, 2025
  
  151a2a1d
- refactor: service/endpoint stats_handler (#282) · 85cc7b67
  Ryan Olson authored Feb 27, 2025
  
  85cc7b67
- Update wheel build in Dockerfile.vllm_nixl (#295) · 0a393dcb
  ptarasiewiczNV authored Feb 27, 2025
  
  0a393dcb
- feat: vLLM + NIXL example · f7a60cba
  ptarasiewiczNV authored Feb 27, 2025
```
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: nnshah1 <neelays@nvidia.com>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
```
  f7a60cba
- ci: build wheel from root directory (#274) · ea401e3b
  Anant Sharma authored Feb 27, 2025
  
  ea401e3b
- refactor: rename ChatCompletionResponseDelta to NvCreateChatCompletionStreamResponse (#292) · 110f3f8c
  Paul Hendricks authored Feb 27, 2025
  
  110f3f8c
- refactor: rename ChatCompletionResponse to NvCreateChatCompletionResponse (#291) · c13ea718
  Paul Hendricks authored Feb 27, 2025
  
  c13ea718
- refactor: rename ChatCompletionRequest to NvCreateChatCompletionRequest (#284) · 96866f43
  Paul Hendricks authored Feb 27, 2025
  
  96866f43
- feat: LLM API example integration (#182) · 4b42b232
  Tanmay Verma authored Feb 27, 2025
```
Co-authored-by: NVShreyas <158103197+NVShreyas@users.noreply.github.com>
```
  4b42b232
- feat: Add HTTP completions endpoint to kv router example (#258) · 03d0f6a2
  Sean SH Choi authored Feb 26, 2025
```
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
  03d0f6a2
26 Feb, 2025 9 commits
- chore: Remove 'Roff' section from repo language stats (#286) · 7357b432
  Ryan McCormick authored Feb 26, 2025
  
  7357b432
- refactor: using async_openai · 86aff237
  Paul Hendricks authored Feb 26, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
  86aff237
- refactor: Move arg parsing into app for cleaner signatures (#281) · d694ca6e
  Ryan McCormick authored Feb 26, 2025
  
  d694ca6e
- fix: Fix stream::until_deadline bug and improve metric examples (#280) · 494d5625
  Ryan McCormick authored Feb 26, 2025
```
Co-authored-by: Ryan Olson <rolson@nvidia.com>
```
  494d5625
- Updated genai-perf for hot-fix for vLLM container · cec8248d
  Piotr Marcinkiewicz authored Feb 26, 2025
  
  cec8248d
- feat: Endpoint defaults for namespace/component/other (#277) · 31d27ab2
  Graham King authored Feb 26, 2025
```
This means we don't need to explain the parts to the users until they are ready. We use what they provide and default the rest.

Allows all of this and more:
- `tio out=tdr://test`
- `tio out=tdr://llama_8b_pool`
- `tio in=tdr://corp_ai_research_group/model_next-20250226`
- `tio out=tdr://AIRE.NIM.migrate.mistralrs.1802`

Python, API, etc all untouched.
```
  31d27ab2
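
The defaulting described in this commit can be illustrated with a minimal Rust sketch. It is not the code from the repository. The shape of the `Endpoint` struct, the default values, the rule that parts are read in the order namespace, component, endpoint name, and the treatment of both `/` and `.` as separators are all assumptions made for illustration.

```rust
use std::str::FromStr;

// Hypothetical defaults: the real values used by tio are not shown in the commit.
const DEFAULT_NAMESPACE: &str = "default-namespace";
const DEFAULT_COMPONENT: &str = "default-component";
const DEFAULT_ENDPOINT: &str = "default-endpoint";
const SCHEME: &str = "tdr://";

/// Illustrative endpoint address (assumed shape, not the project's type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Endpoint {
    pub namespace: String,
    pub component: String,
    pub name: String,
}

impl FromStr for Endpoint {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let path = s
            .strip_prefix(SCHEME)
            .ok_or_else(|| format!("expected a {} URL, got {:?}", SCHEME, s))?;

        // Assumption: parts are separated by '/' or '.', and anything after
        // the second separator is kept whole as the endpoint name.
        let mut parts = path
            .splitn(3, |c: char| c == '/' || c == '.')
            .filter(|p| !p.is_empty());

        // Use what the user provided and default the rest.
        let namespace = parts.next().unwrap_or(DEFAULT_NAMESPACE).to_string();
        let component = parts.next().unwrap_or(DEFAULT_COMPONENT).to_string();
        let name = parts.next().unwrap_or(DEFAULT_ENDPOINT).to_string();

        Ok(Endpoint { namespace, component, name })
    }
}
```

Under these assumptions, `"tdr://test".parse::<Endpoint>()` fills in only the namespace and defaults the other two fields, while a string without the `tdr://` prefix is rejected with an error.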
- ci: fix rust deny workflow (#275) · 76439997
  Anant Sharma authored Feb 26, 2025
  
  76439997
- Update Dockerfile to include genai-perf hot fix (#276) · 0db2e6c8
  Piotr Marcinkiewicz authored Feb 26, 2025
```
Signed-off-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
```
  0db2e6c8
- fix: Add TritonNcclConnector back to vllm patch (#273) · 85535ba0
  Alec authored Feb 25, 2025
  
  85535ba0
25 Feb, 2025 10 commits

- feat: sglang backend for tio (#271) · e97493eb
  Graham King authored Feb 25, 2025

  - Set up venv

  ```
  uv venv
  source .venv/bin/activate
  uv pip install pip
  uv pip install sgl-kernel --force-reinstall --no-deps
  uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
  ```

  - Build: `cargo build --release --features sglang`

  - Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model`

  - Run DeepSeek multi-GPU / multi-node:

  Node 1:
  ```
  tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876
  ```

  Node 2:
  ```
  tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876
  ```

  e97493eb

- chore: updating docs after restructure · c70de37f
  Neelay Shah authored Feb 25, 2025

  c70de37f
- feat: Add completion endpoint to http server and llmctl (#230) · b760c569
  Alec authored Feb 25, 2025
```
Co-authored-by: aflowers <aflowers@nvidia.com>
```
  b760c569
- refactor: modify code owners · 113f4d91
  Neelay Shah authored Feb 25, 2025

  113f4d91

- feat: enable metrics polling · 861c5098
  GuanLuo authored Feb 25, 2025

  Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
  Signed-off-by: Neelay Shah <neelays@nvidia.com>
  Co-authored-by: Neelay Shah <neelays@nvidia.com>
  Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
  Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
  Co-authored-by: Biswa Panda <biswapanda@users.noreply.github.com>
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

  861c5098

- refactor: moving tio to launch dir · eb022ec9
  Neelay Shah authored Feb 25, 2025

  eb022ec9
- refactor: adds `TryFrom<&str>` and `FromStr` for `Endpoint` (#263) · e0e9f4a2
  Paul Hendricks authored Feb 25, 2025

  e0e9f4a2

- feat: tio support preprocessor (#265) · 72064d84
  Graham King authored Feb 25, 2025

  Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization).

  Add example engine `echo_core` (`out=echo_core`), which takes and returns tokens. A nice side effect is that it echoes the full prompt template, including the system prompt, whereas `echo_full` echoes only the user prompt.

  ![image](https://github.com/user-attachments/assets/27ec0a7b-a27d-4e69-96ea-1ffa0822ea90)

  72064d84
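
The wrapping described in this commit can be sketched in a few lines of Rust. This is an illustration only, not the repository's implementation: the `CoreEngine` trait, the `Preprocessed` wrapper, the toy chat template, and the one-token-per-character tokenizer are all hypothetical stand-ins. The point it shows is why an echo engine placed behind a preprocessor returns the full templated prompt, system prompt included.

```rust
/// A "core" engine works on token IDs only (hypothetical trait for this sketch).
pub trait CoreEngine {
    fn generate(&self, tokens: Vec<u32>) -> Vec<u32>;
}

/// Toy stand-in for an echo engine: returns exactly the tokens it was given.
pub struct EchoCore;

impl CoreEngine for EchoCore {
    fn generate(&self, tokens: Vec<u32>) -> Vec<u32> {
        tokens
    }
}

/// Wraps a core engine with prompt templating and (de)tokenization.
pub struct Preprocessed<E: CoreEngine> {
    pub engine: E,
    pub system_prompt: String,
}

impl<E: CoreEngine> Preprocessed<E> {
    /// Toy chat template (assumed format, for illustration only).
    fn render(&self, user: &str) -> String {
        format!("<system>{}</system><user>{}</user>", self.system_prompt, user)
    }

    /// Toy tokenizer: one token per Unicode scalar value.
    fn tokenize(text: &str) -> Vec<u32> {
        text.chars().map(|c| c as u32).collect()
    }

    /// Inverse of the toy tokenizer; invalid code points are skipped.
    fn detokenize(tokens: &[u32]) -> String {
        tokens.iter().filter_map(|&t| std::char::from_u32(t)).collect()
    }

    /// Template the prompt, tokenize it, run the core engine, decode the result.
    pub fn chat(&self, user: &str) -> String {
        let prompt = self.render(user);
        let output = self.engine.generate(Self::tokenize(&prompt));
        Self::detokenize(&output)
    }
}
```

Because the engine only ever sees the tokens produced after templating, echoing them back reproduces the whole rendered prompt rather than just the user's message.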

- ci: Add rust checks to missing directories (#239) · c06b95ff
  Ryan McCormick authored Feb 25, 2025
```
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
```
  c06b95ff
- chore: Update code workspace directories after restructure (#260) · 5f1af25a
  Ryan McCormick authored Feb 24, 2025

  5f1af25a