Commits · ae559fba7352649f1eb495e68d1faaea6f18c60f · OpenDAS / dynamo

03 Mar, 2025 3 commits
- Merge branch 'mkhadkevich/publish-compound-ai-sdk-gl' into 'main' · ae559fba
  Biswa Ranjan Panda authored Mar 03, 2025
```
feat: Add compound AI python SDK

See merge request dl/triton/triton-distributed!5
```
- feat: Add compound AI python SDK · 07a1a8a1
  Biswa Ranjan Panda authored Mar 03, 2025
- fix: Install specific toolchain (#329) · 2d906fb4
  Graham King authored Mar 03, 2025
```
`cargo build --locked` won't let you use "1.85.0" if you only have "stable" installed, even if those are the same thing right now.
```
02 Mar, 2025 4 commits
- fix: Copy nixl example for vllm_nixl container (#323) · d316e576
  Piotr Marcinkiewicz authored Mar 02, 2025
- chore: Tweak prometheus/grafana config defaults (#319) · 5cbeb5e7
  Ryan McCormick authored Mar 02, 2025
- chore: Update README.md · 3f12b570
  Neelay Shah authored Mar 01, 2025
```
Signed-off-by: Neelay Shah <neelays@nvidia.com>
```
- [fix] OpenAI API object: completion to text_completion (#318) · a48ffc52
  Alec authored Mar 01, 2025
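
Commit a48ffc52 changes the `object` field of the OpenAI-compatible completions response from `completion` to `text_completion`, which is the value OpenAI's Completions API uses for that endpoint. A minimal Python sketch of a conforming non-streaming body follows; `build_completion_response` is a hypothetical helper, not code from this repository, the field set follows OpenAI's public Completions schema, and `usage` accounting is omitted for brevity.

```python
import json
import time
import uuid


def build_completion_response(model: str, text: str, finish_reason: str = "stop") -> dict:
    """Hypothetical helper: an OpenAI-style, non-streaming completions body."""
    return {
        "id": f"cmpl-{uuid.uuid4().hex}",
        # Completions responses are tagged "text_completion", not "completion".
        "object": "text_completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "text": text,
                "logprobs": None,
                "finish_reason": finish_reason,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_completion_response("my_model", "Hello!"), indent=2))
```

Clients that switch on `object` (for example, to tell completions apart from `chat.completion` responses) rely on this exact string.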
01 Mar, 2025 1 commit
- Add lazy import to nixl.py · 5f14e467
  Piotr Marcinkiewicz authored Mar 01, 2025
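
Commit 5f14e467 adds a lazy import to `nixl.py`; the diff is not shown here. A common Python pattern for this defers importing an optional dependency until it is first used, so the enclosing module can still be imported where that dependency is missing. The sketch below assumes that pattern; `lazy_module` and `encode` are hypothetical names, and the stdlib `json` module stands in for the real optional package.

```python
import functools
import importlib
from types import ModuleType


@functools.lru_cache(maxsize=None)
def lazy_module(name: str) -> ModuleType:
    """Import `name` on first call; later calls return the cached module object."""
    # The import cost, and any ImportError for a missing optional package, is
    # paid here at first use instead of when the enclosing module is imported.
    return importlib.import_module(name)


def encode(payload: dict) -> str:
    # Hypothetical caller: "json" stands in for an optional heavy dependency.
    return lazy_module("json").dumps(payload)


if __name__ == "__main__":
    print(encode({"a": 1}))
```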
28 Feb, 2025 8 commits
- refactor: use async-openai CompletionRequest (#310) · 9162f3ad
  Paul Hendricks authored Feb 28, 2025
- feat: TensorRT-LLM engine (#317) · 057f8f47
  Graham King authored Feb 28, 2025
```
Engine, `tio` support and docs.

Proof of concept / experimental.
```
- [fix] KV Router Example fixes (#314) · 11a36651
  Alec authored Feb 28, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
- feat: Add initial prometheus/grafana support for count (#303) · d38325c2
  Ryan McCormick authored Feb 28, 2025
- feat: vllm engine (#308) · 6e0cfbd9
  Graham King authored Feb 28, 2025
```
triton-distributed-llm component and support in tio
```
- ci: initial public ci for PRs (#241) · 37a8ebaf
  Harrison Saturley-Hall authored Feb 28, 2025
```
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
- Add missing inits (#307) · f229b41b
  Piotr Marcinkiewicz authored Feb 28, 2025
- Updates to support DS R1 in TRTLLM example (#301) · 12d73a82
  NVShreyas authored Feb 28, 2025
27 Feb, 2025 12 commits
- feat: llama.cpp engine for tio (#298) · e584e96f
  Graham King authored Feb 27, 2025
```
Docs in README
```
- fix: add skip_serializing if none (#297) · b20ef999
  Paul Hendricks authored Feb 27, 2025
- refactor: removes wrapper for ChatCompletionContent and adds documentation (#296) · 151a2a1d
  Paul Hendricks authored Feb 27, 2025
- refactor: service/endpoint stats_handler (#282) · 85cc7b67
  Ryan Olson authored Feb 27, 2025
- Update wheel build in Dockerfile.vllm_nixl (#295) · 0a393dcb
  ptarasiewiczNV authored Feb 27, 2025
- feat: vLLM + NIXL example · f7a60cba
  ptarasiewiczNV authored Feb 27, 2025
```
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: nnshah1 <neelays@nvidia.com>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
```
- ci: build wheel from root directory (#274) · ea401e3b
  Anant Sharma authored Feb 27, 2025
- refactor: rename ChatCompletionResponseDelta to NvCreateChatCompletionStreamResponse (#292) · 110f3f8c
  Paul Hendricks authored Feb 27, 2025
- refactor: rename ChatCompletionResponse to NvCreateChatCompletionResponse (#291) · c13ea718
  Paul Hendricks authored Feb 27, 2025
- refactor: rename ChatCompletionRequest to NvCreateChatCompletionRequest (#284) · 96866f43
  Paul Hendricks authored Feb 27, 2025
- feat: LLM API example integration (#182) · 4b42b232
  Tanmay Verma authored Feb 27, 2025
```
Co-authored-by: NVShreyas <158103197+NVShreyas@users.noreply.github.com>
```
- feat: Add HTTP completions endpoint to kv router example (#258) · 03d0f6a2
  Sean SH Choi authored Feb 26, 2025
```
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
26 Feb, 2025 9 commits
- chore: Remove 'Roff' section from repo language stats (#286) · 7357b432
  Ryan McCormick authored Feb 26, 2025
- refactor: using async_openai · 86aff237
  Paul Hendricks authored Feb 26, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
- refactor: Move arg parsing into app for cleaner signatures (#281) · d694ca6e
  Ryan McCormick authored Feb 26, 2025
- fix: Fix stream::until_deadline bug and improve metric examples (#280) · 494d5625
  Ryan McCormick authored Feb 26, 2025
```
Co-authored-by: Ryan Olson <rolson@nvidia.com>
```
- Updated genai-perf for hot-fix for vLLM container · cec8248d
  Piotr Marcinkiewicz authored Feb 26, 2025
- feat: Endpoint defaults for namespace/component/other (#277) · 31d27ab2
  Graham King authored Feb 26, 2025
```
This means we don't need to explain the parts to the users until they are ready. We use what they provide and default the rest.

Allows all of this and more:
- `tio out=tdr://test`
- `tio out=tdr://llama_8b_pool`
- `tio in=tdr://corp_ai_research_group/model_next-20250226`
- `tio out=tdr://AIRE.NIM.migrate.mistralrs.1802`

Python, API, etc all untouched.
```
- ci: fix rust deny workflow (#275) · 76439997
  Anant Sharma authored Feb 26, 2025
- Update Dockerfile to include genai-perf hot fix (#276) · 0db2e6c8
  Piotr Marcinkiewicz authored Feb 26, 2025
```
Signed-off-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
```
- fix: Add TritonNcclConnector back to vllm patch (#273) · 85535ba0
  Alec authored Feb 25, 2025
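
The endpoint-defaults commit (#277, 31d27ab2) says `tio` keeps whatever parts of a `tdr://` address the user supplies and defaults the rest. The commit does not describe the parsing rules, so the Python sketch below rests on stated assumptions: parts are separated by `/` or `.`, they fill namespace, component and endpoint from left to right, and any parts beyond the third are joined back into the endpoint name. `parse_endpoint`, `EndpointId` and the `DEFAULT_*` values are hypothetical placeholders, not the project's real names or defaults.

```python
from dataclasses import dataclass

SCHEME = "tdr://"

# Hypothetical placeholder defaults: the commit does not say what the real ones are.
DEFAULT_COMPONENT = "default_component"
DEFAULT_ENDPOINT = "default_endpoint"


@dataclass(frozen=True)
class EndpointId:
    namespace: str
    component: str
    endpoint: str


def parse_endpoint(uri: str) -> EndpointId:
    """Keep the parts the user supplied and default the rest.

    Assumed layout (not taken from the repository): parts separated by '/' or '.'
    fill namespace, component and endpoint from left to right, and any parts
    beyond the third are joined back into the endpoint name with '.'.
    """
    if not uri.startswith(SCHEME):
        raise ValueError(f"expected a {SCHEME} URI, got {uri!r}")
    parts = [p for p in uri[len(SCHEME):].replace(".", "/").split("/") if p]
    if not parts:
        raise ValueError(f"no endpoint path in {uri!r}")
    component = parts[1] if len(parts) > 1 else DEFAULT_COMPONENT
    endpoint = ".".join(parts[2:]) if len(parts) > 2 else DEFAULT_ENDPOINT
    return EndpointId(parts[0], component, endpoint)


if __name__ == "__main__":
    for uri in (
        "tdr://test",
        "tdr://llama_8b_pool",
        "tdr://corp_ai_research_group/model_next-20250226",
        "tdr://AIRE.NIM.migrate.mistralrs.1802",
    ):
        print(uri, "->", parse_endpoint(uri))
```

The point of the design, as the commit puts it, is that users need not learn the namespace/component/endpoint split until they want to. Under these assumptions a bare name such as `tdr://test` is a complete address.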
25 Feb, 2025 3 commits
- feat: sglang backend for tio (#271) · e97493eb
  Graham King authored Feb 25, 2025

  - Setup venv

    ```
    uv venv
    source .venv/bin/activate
    uv pip install pip
    uv pip install sgl-kernel --force-reinstall --no-deps
    uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
    ```

  - Build: `cargo build --release --features sglang`
  - Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model`
  - Run Deepseek multi-gpu / multi-node:

    Node 1:
    ```
    tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876
    ```

    Node 2:
    ```
    tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876
    ```

- chore: updating docs after restructure · c70de37f
  Neelay Shah authored Feb 25, 2025
- feat: Add completion endpoint to http server and llmctl (#230) · b760c569
  Alec authored Feb 25, 2025
```
Co-authored-by: aflowers <aflowers@nvidia.com>
```
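
In the multi-node recipe from e97493eb, the two `tio` invocations share `--model-path`, `--tensor-parallel-size`, `--num-nodes` and `--dist-init-addr`. They differ only in `--node-rank` and in the input: rank 0 takes `in=http`, the other node `in=none`. The Python sketch below generates both command lines from those shared settings. `node_command` is a hypothetical helper; the flags are exactly the ones used in the commit message, and nothing else about `tio` or sglang is assumed.

```python
from typing import List


def node_command(model_path: str, rank: int, num_nodes: int, tp_size: int,
                 dist_init_addr: str) -> List[str]:
    """Hypothetical helper: argv for one node of a multi-node `tio out=sglang` run."""
    if not 0 <= rank < num_nodes:
        raise ValueError(f"rank {rank} out of range for {num_nodes} nodes")
    # As in the recipe, only rank 0 accepts HTTP requests; other ranks use in=none.
    input_mode = "in=http" if rank == 0 else "in=none"
    return [
        "tio", input_mode, "out=sglang",
        "--model-path", model_path,
        "--tensor-parallel-size", str(tp_size),
        "--num-nodes", str(num_nodes),
        "--node-rank", str(rank),
        # Every node passes the same --dist-init-addr value.
        "--dist-init-addr", dist_init_addr,
    ]


if __name__ == "__main__":
    for rank in range(2):
        argv = node_command("~/llm_models/DeepSeek-R1-Distill-Llama-70B/", rank, 2, 8,
                            "10.217.98.122:9876")
        # Plain join (not shlex.join) so the shell still expands the leading "~".
        print(f"Node {rank + 1}: {' '.join(argv)}")
```

Generating both lines from one set of values keeps the shared flags from drifting apart between nodes.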