- 27 Feb, 2025 7 commits
-
-
ptarasiewiczNV authored
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: nnshah1 <neelays@nvidia.com>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
-
Anant Sharma authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Tanmay Verma authored
Co-authored-by: NVShreyas <158103197+NVShreyas@users.noreply.github.com>
-
Sean SH Choi authored
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
-
- 26 Feb, 2025 9 commits
-
-
Ryan McCormick authored
-
Paul Hendricks authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
Ryan McCormick authored
-
Ryan McCormick authored
Co-authored-by: Ryan Olson <rolson@nvidia.com>
-
Piotr Marcinkiewicz authored
-
Graham King authored
This means we don't need to explain the parts to the users until they are ready. We use what they provide and default the rest. Allows all of this and more:
- `tio out=tdr://test`
- `tio out=tdr://llama_8b_pool`
- `tio in=tdr://corp_ai_research_group/model_next-20250226`
- `tio out=tdr://AIRE.NIM.migrate.mistralrs.1802`

Python, API, etc. all untouched.
-
Anant Sharma authored
-
Piotr Marcinkiewicz authored
Signed-off-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
-
Alec authored
-
- 25 Feb, 2025 12 commits
-
-
Graham King authored
- Set up the venv:
  ```
  uv venv
  source .venv/bin/activate
  uv pip install pip
  uv pip install sgl-kernel --force-reinstall --no-deps
  uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
  ```
- Build: `cargo build --release --features sglang`
- Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model`
- Run DeepSeek multi-GPU / multi-node:

  Node 1:
  ```
  tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876
  ```
  Node 2:
  ```
  tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876
  ```
-
Neelay Shah authored
-
Alec authored
Co-authored-by: aflowers <aflowers@nvidia.com>
-
Neelay Shah authored
-
GuanLuo authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswapanda@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
-
Paul Hendricks authored
-
Graham King authored
Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization). Add example engine `echo_core` (`out=echo_core`) which takes and returns tokens. A nice side effect is that it echoes the full prompt template, including the system prompt, whereas `echo_full` echoes only the user prompt.
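The wrapping idea can be sketched in a few lines of plain Rust. All names here (`CoreEngine`, `Preprocessor`, the toy template, byte-level "tokenization") are hypothetical stand-ins for illustration, not the actual tio API:

```rust
// A token-in/token-out engine, like the example `echo_core`.
trait CoreEngine {
    fn generate(&self, tokens: &[u32]) -> Vec<u32>;
}

struct EchoCore; // returns its input tokens unchanged
impl CoreEngine for EchoCore {
    fn generate(&self, tokens: &[u32]) -> Vec<u32> {
        tokens.to_vec()
    }
}

// Wraps a core engine with prompt templating and (toy) tokenization,
// presenting a text-in/text-out interface.
struct Preprocessor<E: CoreEngine> {
    engine: E,
}

impl<E: CoreEngine> Preprocessor<E> {
    fn generate_text(&self, user_prompt: &str) -> String {
        // Apply a toy prompt template, then "tokenize" (bytes as token ids).
        let templated =
            format!("<system>You are helpful.</system><user>{user_prompt}</user>");
        let tokens: Vec<u32> = templated.bytes().map(u32::from).collect();
        let out = self.engine.generate(&tokens);
        // "Detokenize" back to text (ASCII-only toy scheme).
        out.into_iter().map(|t| t as u8 as char).collect()
    }
}

fn main() {
    let engine = Preprocessor { engine: EchoCore };
    let out = engine.generate_text("hi");
    // Because the core engine only sees post-template tokens, the echo
    // includes the full template, not just the user prompt.
    assert!(out.contains("<system>"));
    assert!(out.contains("hi"));
    println!("{out}");
}
```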
-
Ryan McCormick authored
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
-
Ryan McCormick authored
-
Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
-
- 24 Feb, 2025 3 commits
-
-
Ryan Olson authored
What does the PR do?
- Adds an etcd method to atomically create or validate a kv entry.
- Adds integration tests to validate the behavior.
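The create-or-validate semantics can be sketched with a plain map standing in for etcd. This is purely illustrative of the behavior, not the PR's API: a real implementation would use an etcd transaction (e.g. comparing the key's create revision) so the check and the write happen atomically on the server.

```rust
use std::collections::HashMap;

// Sketch of "atomic create or validate" semantics on a kv store.
// Returns true if the key was created, or already held the same value;
// false if the key exists with a different value.
fn create_or_validate(store: &mut HashMap<String, String>, key: &str, value: &str) -> bool {
    match store.get(key) {
        // Key absent: create it and report success.
        None => {
            store.insert(key.to_string(), value.to_string());
            true
        }
        // Key present: succeed only if the existing value matches.
        Some(existing) => existing == value,
    }
}

fn main() {
    let mut store = HashMap::new();
    assert!(create_or_validate(&mut store, "model/llama", "v1")); // created
    assert!(create_or_validate(&mut store, "model/llama", "v1")); // validated
    assert!(!create_or_validate(&mut store, "model/llama", "v2")); // mismatch
    println!("ok");
}
```

This pattern lets multiple workers race to register the same entry: exactly one creates it, and the rest either confirm agreement or detect a conflict.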
-
Biswa Panda authored
-
Meenakshi Sharma authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
- 22 Feb, 2025 3 commits
-
-
Ryan Olson authored
- Minor update to DeadlineStream
- Adding tests
-
Ryan Olson authored
Enables `#[tokio::test]` via `Runtime::from_current()`. This uses the current handle as both the primary and the secondary runtime.
-
Alec authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
-
- 21 Feb, 2025 6 commits
-
-
Graham King authored
Add support in tio for distributed components and discovery.

Node 1:
```
tio in=http out=tdr://ns/backend/mistralrs
```
Node 2:
```
tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
```

This will use etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint, and one will be picked at random each time. The `ns/backend/mistralrs` parts are purely symbolic; pick anything, as long as it has three parts and matches the other node.
-
Ryan Olson authored
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Ryan McCormick authored
-
Alec authored
Co-authored-by: Sean Choi <choishsean@gmail.com>
Co-authored-by: aflowers <aflowers@nvidia.com>
-
Meenakshi Sharma authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Piotr Marcinkiewicz authored
-