"tests/vscode:/vscode.git/clone" did not exist on "c9a48a52e1c1f79461420d4dea25ff45b0be0711"
- 27 Feb, 2025 8 commits
-
Graham King authored
Docs in README
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Ryan Olson authored
-
Anant Sharma authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
- 26 Feb, 2025 4 commits
-
Paul Hendricks authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
Ryan McCormick authored
Co-authored-by: Ryan Olson <rolson@nvidia.com>
-
Graham King authored
This means we don't need to explain the parts to users until they are ready. We use what they provide and default the rest. Allows all of this and more:

- `tio out=tdr://test`
- `tio out=tdr://llama_8b_pool`
- `tio in=tdr://corp_ai_research_group/model_next-20250226`
- `tio out=tdr://AIRE.NIM.migrate.mistralrs.1802`

Python, API, etc. all untouched.
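The defaulting behavior above can be sketched as follows. This is a minimal illustration, not the actual tio parser: the function name `parse_tdr`, the `Endpoint` struct, and the `"default"` fallback values are all hypothetical, and the real segment semantics may differ.

```rust
#[derive(Debug, PartialEq)]
struct Endpoint {
    namespace: String,
    component: String,
    endpoint: String,
}

/// Parse a `tdr://` path of one to three segments; missing leading
/// segments fall back to defaults, so `tdr://test` "just works" while
/// fully qualified paths are still accepted.
fn parse_tdr(input: &str) -> Option<Endpoint> {
    let rest = input.strip_prefix("tdr://")?;
    let parts: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    match parts.as_slice() {
        [e] => Some(Endpoint {
            namespace: "default".into(),
            component: "default".into(),
            endpoint: (*e).into(),
        }),
        [c, e] => Some(Endpoint {
            namespace: "default".into(),
            component: (*c).into(),
            endpoint: (*e).into(),
        }),
        [n, c, e] => Some(Endpoint {
            namespace: (*n).into(),
            component: (*c).into(),
            endpoint: (*e).into(),
        }),
        _ => None,
    }
}

fn main() {
    // A bare name gets every other part defaulted.
    let e = parse_tdr("tdr://test").unwrap();
    println!("{} / {} / {}", e.namespace, e.component, e.endpoint);
}
```

The payoff of this shape is that the fully-qualified form and the shorthand form go through one code path, so documentation can introduce the extra segments only when a user actually needs them.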
-
Anant Sharma authored
-
- 25 Feb, 2025 8 commits
-
Graham King authored
- Setup venv:

  ```
  uv venv
  source .venv/bin/activate
  uv pip install pip
  uv pip install sgl-kernel --force-reinstall --no-deps
  uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
  ```

- Build: `cargo build --release --features sglang`
- Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model`
- Run Deepseek multi-gpu / multi-node:

  Node 1:

  ```
  tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876
  ```

  Node 2:

  ```
  tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876
  ```
-
Neelay Shah authored
-
Alec authored
Co-authored-by:aflowers <aflowers@nvidia.com>
-
GuanLuo authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswapanda@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Paul Hendricks authored
-
Graham King authored
Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization). Add example engine `echo_core` (`out=echo_core`) which takes and returns tokens. A nice side effect is that it echoes the full prompt template with system prompt, whereas `echo_full` echoes only the user prompt.
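The wrapper pattern described above can be sketched roughly as below. Everything here is illustrative, not tio's actual API: the `CoreEngine` trait, `StaticCoreWrapper`, and the byte-level stub tokenizer are hypothetical stand-ins for the real prompt templating and tokenization.

```rust
/// A token-in/token-out engine, analogous to the `echo_core` example.
trait CoreEngine {
    fn generate(&self, tokens: &[u32]) -> Vec<u32>;
}

/// Echo engine: returns exactly the tokens it was given.
struct EchoCore;

impl CoreEngine for EchoCore {
    fn generate(&self, tokens: &[u32]) -> Vec<u32> {
        tokens.to_vec()
    }
}

/// Preprocessor wrapper: applies a prompt template, tokenizes, calls the
/// core engine, then detokenizes. Tokenization is stubbed as one token
/// per byte purely for the sketch.
struct StaticCoreWrapper<E: CoreEngine> {
    engine: E,
    system_prompt: String,
}

impl<E: CoreEngine> StaticCoreWrapper<E> {
    fn generate(&self, user_prompt: &str) -> String {
        // Apply a (trivial) prompt template: system prompt + user prompt.
        let templated = format!("<system>{}</system>{}", self.system_prompt, user_prompt);
        // Stub tokenizer.
        let tokens: Vec<u32> = templated.bytes().map(u32::from).collect();
        let out = self.engine.generate(&tokens);
        // Stub detokenizer.
        out.into_iter().map(|t| t as u8 as char).collect()
    }
}

fn main() {
    let w = StaticCoreWrapper { engine: EchoCore, system_prompt: "be brief".into() };
    // Because the core engine only sees post-template tokens, its echo
    // includes the system prompt, matching the side effect noted above.
    println!("{}", w.generate("hi"));
}
```

This also shows why the echo differs between the two example engines: the wrapper runs the template before the core engine ever sees the request, so a token-level echo necessarily reflects the full templated prompt.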
-
Ryan McCormick authored
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-