- 27 Feb, 2025 7 commits
-
-
ptarasiewiczNV authored
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: nnshah1 <neelays@nvidia.com>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
-
Anant Sharma authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Tanmay Verma authored
Co-authored-by: NVShreyas <158103197+NVShreyas@users.noreply.github.com>
-
Sean SH Choi authored
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
-
- 26 Feb, 2025 9 commits
-
-
Ryan McCormick authored
-
Paul Hendricks authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
Ryan McCormick authored
-
Ryan McCormick authored
Co-authored-by: Ryan Olson <rolson@nvidia.com>
-
Piotr Marcinkiewicz authored
-
Graham King authored
This means we don't need to explain the parts to the users until they are ready. We use what they provide and default the rest. Allows all of this and more:
- `tio out=tdr://test`
- `tio out=tdr://llama_8b_pool`
- `tio in=tdr://corp_ai_research_group/model_next-20250226`
- `tio out=tdr://AIRE.NIM.migrate.mistralrs.1802`

Python, API, etc. all untouched.
-
Anant Sharma authored
-
Piotr Marcinkiewicz authored
Signed-off-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
-
Alec authored
-
- 25 Feb, 2025 12 commits
-
-
Graham King authored
- Set up the venv:
  ```
  uv venv
  source .venv/bin/activate
  uv pip install pip
  uv pip install sgl-kernel --force-reinstall --no-deps
  uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
  ```
- Build: `cargo build --release --features sglang`
- Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model`
- Run DeepSeek multi-GPU / multi-node:

  Node 1:
  ```
  tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876
  ```
  Node 2:
  ```
  tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876
  ```
-
Neelay Shah authored
-
Alec authored
Co-authored-by: aflowers <aflowers@nvidia.com>
-
Neelay Shah authored
-
GuanLuo authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswapanda@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
-
Paul Hendricks authored
-
Graham King authored
Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization). Add example engine `echo_core` (`out=echo_core`) which takes and returns tokens. A nice side effect is that it echoes the full prompt template, including the system prompt, whereas `echo_full` echoes only the user prompt.
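The wrapping idea can be sketched in a few lines of plain Rust. All names here (`CoreEngine`, `Preprocessor`, the toy template, byte-level "tokenization") are hypothetical stand-ins for illustration, not the actual tio API:

```rust
// A token-in/token-out engine, like the example `echo_core`.
trait CoreEngine {
    fn generate(&self, tokens: &[u32]) -> Vec<u32>;
}

struct EchoCore; // returns its input tokens unchanged
impl CoreEngine for EchoCore {
    fn generate(&self, tokens: &[u32]) -> Vec<u32> {
        tokens.to_vec()
    }
}

// Wraps a core engine with prompt templating and (toy) tokenization,
// presenting a text-in/text-out interface.
struct Preprocessor<E: CoreEngine> {
    engine: E,
}

impl<E: CoreEngine> Preprocessor<E> {
    fn generate_text(&self, user_prompt: &str) -> String {
        // Apply a toy prompt template, then "tokenize" (bytes as token ids).
        let templated =
            format!("<system>You are helpful.</system><user>{user_prompt}</user>");
        let tokens: Vec<u32> = templated.bytes().map(u32::from).collect();
        let out = self.engine.generate(&tokens);
        // "Detokenize" back to text (ASCII-only toy scheme).
        out.into_iter().map(|t| t as u8 as char).collect()
    }
}

fn main() {
    let engine = Preprocessor { engine: EchoCore };
    let out = engine.generate_text("hi");
    // Because the core engine only sees post-template tokens, the echo
    // includes the full template, not just the user prompt.
    assert!(out.contains("<system>"));
    assert!(out.contains("hi"));
    println!("{out}");
}
```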
-
Ryan McCormick authored
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
-
Ryan McCormick authored
-
Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
-
- 24 Feb, 2025 3 commits
-
-
Ryan Olson authored
What does the PR do?
- Adds an etcd method to atomically create or validate a kv entry.
- Adds integration tests to validate the behavior.
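The create-or-validate semantics can be sketched with a plain map standing in for etcd. This is purely illustrative of the behavior, not the PR's API: a real implementation would use an etcd transaction (e.g. comparing the key's create revision) so the check and the write happen atomically on the server.

```rust
use std::collections::HashMap;

// Sketch of "atomic create or validate" semantics on a kv store.
// Returns true if the key was created, or already held the same value;
// false if the key exists with a different value.
fn create_or_validate(store: &mut HashMap<String, String>, key: &str, value: &str) -> bool {
    match store.get(key) {
        // Key absent: create it and report success.
        None => {
            store.insert(key.to_string(), value.to_string());
            true
        }
        // Key present: succeed only if the existing value matches.
        Some(existing) => existing == value,
    }
}

fn main() {
    let mut store = HashMap::new();
    assert!(create_or_validate(&mut store, "model/llama", "v1")); // created
    assert!(create_or_validate(&mut store, "model/llama", "v1")); // validated
    assert!(!create_or_validate(&mut store, "model/llama", "v2")); // mismatch
    println!("ok");
}
```

This pattern lets multiple workers race to register the same entry: exactly one creates it, and the rest either confirm agreement or detect a conflict.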
-
Biswa Panda authored
-
Meenakshi Sharma authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
- 22 Feb, 2025 3 commits
-
-
Ryan Olson authored
- Minor update to DeadlineStream
- Adding tests
-
Ryan Olson authored
Enables `#[tokio::test]` via `Runtime::from_current()`. This uses the current handle as both the primary and the secondary runtime.
-
Alec authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
-
- 21 Feb, 2025 6 commits
-
-
Graham King authored
Add support in tio for distributed components and discovery.

Node 1:
```
tio in=http out=tdr://ns/backend/mistralrs
```
Node 2:
```
tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
```

This will use etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint, and one will be picked at random each time. The `ns/backend/mistralrs` parts are purely symbolic; pick anything, as long as it has three parts and matches the other node.
-
Ryan Olson authored
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Ryan McCormick authored
-
Alec authored
Co-authored-by: Sean Choi <choishsean@gmail.com>
Co-authored-by: aflowers <aflowers@nvidia.com>
-
Meenakshi Sharma authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Piotr Marcinkiewicz authored
-