- 01 Mar, 2025 1 commit
-
-
Piotr Marcinkiewicz authored
-
- 28 Feb, 2025 8 commits
-
-
Paul Hendricks authored
-
Graham King authored
Engine, `tio` support and docs. Proof of concept / experimental.
-
Alec authored
Co-authored-by:Ryan McCormick <rmccormick@nvidia.com>
-
Ryan McCormick authored
-
Graham King authored
triton-distributed-llm component and support in tio
-
Harrison Saturley-Hall authored
Signed-off-by:
Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com> Signed-off-by:
Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com> Co-authored-by:
Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Piotr Marcinkiewicz authored
-
NVShreyas authored
-
- 27 Feb, 2025 12 commits
-
-
Graham King authored
Docs in README
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Ryan Olson authored
-
ptarasiewiczNV authored
-
ptarasiewiczNV authored
Co-authored-by:
Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local> Co-authored-by:
nnshah1 <neelays@nvidia.com> Co-authored-by:
alec-flowers <aflowers@nvidia.com>
-
Anant Sharma authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Tanmay Verma authored
Co-authored-by:NVShreyas <158103197+NVShreyas@users.noreply.github.com>
-
Sean SH Choi authored
Co-authored-by:Alec <35311602+alec-flowers@users.noreply.github.com>
-
- 26 Feb, 2025 9 commits
-
-
Ryan McCormick authored
-
Paul Hendricks authored
Co-authored-by:Graham King <grahamk@nvidia.com>
-
Ryan McCormick authored
-
Ryan McCormick authored
Co-authored-by:Ryan Olson <rolson@nvidia.com>
-
Piotr Marcinkiewicz authored
-
Graham King authored
This means we don't need to explain the parts to the users until they are ready. We use what they provide and default the rest. Allows all of this and more: - `tio out=tdr://test` - `tio out=tdr://llama_8b_pool` - `tio in=tdr://corp_ai_research_group/model_next-20250226` - `tio out=tdr://AIRE.NIM.migrate.mistralrs.1802` Python, API, etc all untouched.
-
Anant Sharma authored
-
Piotr Marcinkiewicz authored
Signed-off-by:Piotr Marcinkiewicz <piotrm@nvidia.com>
-
Alec authored
-
- 25 Feb, 2025 10 commits
-
-
Graham King authored
- Setup venv ``` uv venv source .venv/bin/activate uv pip install pip uv pip install sgl-kernel --force-reinstall --no-deps uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/ ``` - Build: `cargo build --release --features sglang` - Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model` - Run Deepseek multi-gpu / multi-node: Node 1: ``` tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876 ``` Node 2: ``` tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876 ```
-
Neelay Shah authored
-
Alec authored
Co-authored-by:aflowers <aflowers@nvidia.com>
-
Neelay Shah authored
-
GuanLuo authored
Signed-off-by:
Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com> Signed-off-by:
Neelay Shah <neelays@nvidia.com> Co-authored-by:
Neelay Shah <neelays@nvidia.com> Co-authored-by:
Ryan Olson <ryanolson@users.noreply.github.com> Co-authored-by:
Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com> Co-authored-by:
Biswa Panda <biswapanda@users.noreply.github.com> Co-authored-by:
Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
-
Paul Hendricks authored
-
Graham King authored
Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization). Add example engine `echo_core` (`out=echo_core`) which takes and returns tokens. A nice side effect is that it echos the full prompt template with system prompt, whereas `echo_full` echos only user prompt. 
-
Ryan McCormick authored
Signed-off-by:Ryan McCormick <rmccormick@nvidia.com>
-
Ryan McCormick authored
-