Commits · 76439997a755b0d1aa01397cb591303154ca0aa8 · OpenDAS / dynamo

26 Feb, 2025 3 commits
- ci: fix rust deny workflow (#275) · 76439997
  Anant Sharma authored Feb 26, 2025
  
  76439997
- Update Dockerfile to include genai-perf hot fix (#276) · 0db2e6c8
  Piotr Marcinkiewicz authored Feb 26, 2025
```
Signed-off-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
```
  0db2e6c8
- fix: Add TritonNcclConnector back to vllm patch (#273) · 85535ba0
  Alec authored Feb 25, 2025
  
  85535ba0
25 Feb, 2025 12 commits

feat: sglang backend for tio (#271) · e97493eb

Graham King authored Feb 25, 2025

- Setup venv

```
uv venv
source .venv/bin/activate
uv pip install pip
uv pip install sgl-kernel --force-reinstall --no-deps
uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
```

- Build: `cargo build --release --features sglang`

- Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model`

- Run Deepseek multi-gpu / multi-node:

Node 1:
```
tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876
```

Node 2:
```
tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876
```

e97493eb

chore: updating docs after restructure · c70de37f
Neelay Shah authored Feb 25, 2025

c70de37f
feat: Add completion endpoint to http server and llmctl (#230) · b760c569
Alec authored Feb 25, 2025
```
Co-authored-by: aflowers <aflowers@nvidia.com>
```
b760c569
refactor: modify code owners · 113f4d91
Neelay Shah authored Feb 25, 2025

113f4d91

feat: enable metrics polling · 861c5098

GuanLuo authored Feb 25, 2025

Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswapanda@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

861c5098

refactor: moving tio to launch dir · eb022ec9
Neelay Shah authored Feb 25, 2025

eb022ec9
refactor: adds `TryFrom<&str>` and `FromStr` for `Endpoint` (#263) · e0e9f4a2
Paul Hendricks authored Feb 25, 2025

e0e9f4a2

feat: tio support preprocessor (#265) · 72064d84

Graham King authored Feb 25, 2025

Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization).

Add example engine `echo_core` (`out=echo_core`) which takes and returns tokens. A nice side effect is that it echos the full prompt template with system prompt, whereas `echo_full` echos only user prompt.

![image](https://github.com/user-attachments/assets/27ec0a7b-a27d-4e69-96ea-1ffa0822ea90)

72064d84

ci: Add rust checks to missing directories (#239) · c06b95ff
Ryan McCormick authored Feb 25, 2025
```
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
```
c06b95ff
chore: Update code workspace directories after restructure (#260) · 5f1af25a
Ryan McCormick authored Feb 24, 2025

5f1af25a

refactor: move libs to lib dir · 08fcd7e9

Neelay Shah authored Feb 24, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

08fcd7e9

refactor: remove python native runtime · 0bfd9a76
Neelay Shah authored Feb 24, 2025

0bfd9a76

24 Feb, 2025 3 commits
- feat: adding etcd method kv_create_or_validate (#249) · 8f741f14
  Ryan Olson authored Feb 24, 2025
```
What does the PR do?
 - adds etcd method to atomic create or validate a kv entry.
 - adds integration tests to validate the behavior
```
  8f741f14
- feat: add rust based tokenizer · 4f6f63cd
  Biswa Panda authored Feb 24, 2025
  
  4f6f63cd
- Update trigger_ci.yml to remove ignore_path for documentation (#255) · 53163693
  Meenakshi Sharma authored Feb 24, 2025
```
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
```
  53163693
22 Feb, 2025 3 commits
- refactor: adding tests and simplifying DeadlineStream (#243) · 00bc41d8
  Ryan Olson authored Feb 22, 2025
```
-   Minor update to DeadlineStream
-   Adding tests
```
  00bc41d8
- fix: simplifying the ability to test using from_current (#242) · 26f6008a
  Ryan Olson authored Feb 22, 2025
```
Enables `#[tokio::test]` via `Runtime::from_current()`

This uses the current handle as both the primary and secondary.
```
  26f6008a
- [fix] initialize vLLM engine before runtime (#238) · ccd153af
  Alec authored Feb 21, 2025
```
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
```
  ccd153af
21 Feb, 2025 6 commits

feat(tio): Distributed inference! (#235) · 32a748e4

Graham King authored Feb 21, 2025

Add support in tio for distributed components and discovery.

Node 1:
```
tio in=http out=tdr://ns/backend/mistralrs
```

Node 2:
```
tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
```

This will use etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint and it will pick one at random each time.

The `ns/backend/mistralrs` are purely symbolic, pick anything as long as it has three parts, and it matches the other node.

32a748e4

feat: event plane + count · 3b7a462d

Ryan Olson authored Feb 21, 2025


Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

3b7a462d

build: Multi-stage VLLM Build (#225) · 6e0ccccb
Ryan McCormick authored Feb 21, 2025

6e0ccccb

feat: Add http support to vllm kv router example (#217) · 15de1807

Alec authored Feb 21, 2025


Co-authored-by: Sean Choi <choishsean@gmail.com>
Co-authored-by: aflowers <aflowers@nvidia.com>

15de1807

Create docs PR wokflow (#226) · 0439d3b5

Meenakshi Sharma authored Feb 21, 2025


Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

0439d3b5

Add etcd to vLLM dockerfile (#231) · e6be74ba
Piotr Marcinkiewicz authored Feb 21, 2025

e6be74ba

20 Feb, 2025 7 commits
- build: fix wheel naming and add icp dependencies (#208) · 7eee891a
  Anant Sharma authored Feb 20, 2025
  
  7eee891a
- ci: Add Biswa and Thomas as Rust reviewers (#224) · 45f7dfc1
  Graham King authored Feb 20, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  45f7dfc1
- feat(tio): Defaults for in and out, support HF repos (#223) · 7ab5df5d
  Graham King authored Feb 20, 2025
```
You can now run an HF repo directly:
```
  tio ~/llm_models/Llama-3.2-1B-Instruct/
```

or a GGUF
```
  tio ~/llm_models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
```

Also cleanup kv_router so I can merge.
```
  7ab5df5d
- ci: add instruction for running github actions locally (#218) · b90535aa
  Biswa Panda authored Feb 20, 2025
  
  b90535aa
- feat: add cli args for example http service (#221) · 73c10ae9
  Biswa Panda authored Feb 20, 2025
```
Co-authored-by: Biswa Ranjan Panda <biswaranjanp@nvidia.com>
```
  73c10ae9
- feat: use vllm out of process engine · 60a73634
  ptarasiewiczNV authored Feb 20, 2025
```
Signed-off-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
Co-authored-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  60a73634
- feat: add local model card (#216) · 65a2dfab
  Biswa Panda authored Feb 20, 2025
  
  65a2dfab
19 Feb, 2025 1 commit
- test: add unit tests for RuntimeConfig (#215) · 7f85dcc3
  Thomas Montfort authored Feb 19, 2025
  
  7f85dcc3
18 Feb, 2025 5 commits
- feat: add openai endpoint to the vllm example (#183) · 9e4a548d
  ptarasiewiczNV authored Feb 18, 2025
```
Co-authored-by: Ryan Olson <rolson@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  9e4a548d
- fix: triton.code-workpace; update dependencies (#207) · 695754ae
  Ryan Olson authored Feb 18, 2025
  
  695754ae
- [fix] add kv routing to vllm dockerfile · 9c9f0086
  aflowers authored Feb 18, 2025
  
  9c9f0086
- feat: http + llmctl (#181) · d0d35a9e
  Ryan Olson authored Feb 18, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  d0d35a9e
- bug: tio can run without NATS or ETCD (#203) · 4e6f3fef
  Graham King authored Feb 18, 2025
  
  4e6f3fef