"tests/vscode:/vscode.git/clone" did not exist on "c9a48a52e1c1f79461420d4dea25ff45b0be0711"
- 27 Feb, 2025 8 commits
-
Graham King authored
Docs in README
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Ryan Olson authored
-
Anant Sharma authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
Paul Hendricks authored
-
- 26 Feb, 2025 4 commits
-
Paul Hendricks authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
Ryan McCormick authored
Co-authored-by: Ryan Olson <rolson@nvidia.com>
-
Graham King authored
This means we don't need to explain the parts to users until they are ready. We use what they provide and default the rest. Allows all of this and more:

- `tio out=tdr://test`
- `tio out=tdr://llama_8b_pool`
- `tio in=tdr://corp_ai_research_group/model_next-20250226`
- `tio out=tdr://AIRE.NIM.migrate.mistralrs.1802`

Python, API, etc. all untouched.
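The defaulting behavior above can be sketched as follows. This is a minimal illustration, not the actual tio parser: the function name `parse_tdr`, the `Endpoint` struct, and the `"default"` fallback values are all hypothetical, and the real segment semantics may differ.

```rust
#[derive(Debug, PartialEq)]
struct Endpoint {
    namespace: String,
    component: String,
    endpoint: String,
}

/// Parse a `tdr://` path of one to three segments; missing leading
/// segments fall back to defaults, so `tdr://test` "just works" while
/// fully qualified paths are still accepted.
fn parse_tdr(input: &str) -> Option<Endpoint> {
    let rest = input.strip_prefix("tdr://")?;
    let parts: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    match parts.as_slice() {
        [e] => Some(Endpoint {
            namespace: "default".into(),
            component: "default".into(),
            endpoint: (*e).into(),
        }),
        [c, e] => Some(Endpoint {
            namespace: "default".into(),
            component: (*c).into(),
            endpoint: (*e).into(),
        }),
        [n, c, e] => Some(Endpoint {
            namespace: (*n).into(),
            component: (*c).into(),
            endpoint: (*e).into(),
        }),
        _ => None,
    }
}

fn main() {
    // A bare name gets every other part defaulted.
    let e = parse_tdr("tdr://test").unwrap();
    println!("{} / {} / {}", e.namespace, e.component, e.endpoint);
}
```

The payoff of this shape is that the fully-qualified form and the shorthand form go through one code path, so documentation can introduce the extra segments only when a user actually needs them.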
-
Anant Sharma authored
-
- 25 Feb, 2025 8 commits
-
Graham King authored
- Setup venv:

  ```
  uv venv
  source .venv/bin/activate
  uv pip install pip
  uv pip install sgl-kernel --force-reinstall --no-deps
  uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
  ```

- Build: `cargo build --release --features sglang`
- Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model`
- Run Deepseek multi-gpu / multi-node:

  Node 1:

  ```
  tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876
  ```

  Node 2:

  ```
  tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876
  ```
-
Neelay Shah authored
-
Alec authored
Co-authored-by:aflowers <aflowers@nvidia.com>
-
GuanLuo authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswapanda@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Paul Hendricks authored
-
Graham King authored
Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization). Add example engine `echo_core` (`out=echo_core`) which takes and returns tokens. A nice side effect is that it echoes the full prompt template with system prompt, whereas `echo_full` echoes only the user prompt.
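The wrapper pattern described above can be sketched roughly as below. Everything here is illustrative, not tio's actual API: the `CoreEngine` trait, `StaticCoreWrapper`, and the byte-level stub tokenizer are hypothetical stand-ins for the real prompt templating and tokenization.

```rust
/// A token-in/token-out engine, analogous to the `echo_core` example.
trait CoreEngine {
    fn generate(&self, tokens: &[u32]) -> Vec<u32>;
}

/// Echo engine: returns exactly the tokens it was given.
struct EchoCore;

impl CoreEngine for EchoCore {
    fn generate(&self, tokens: &[u32]) -> Vec<u32> {
        tokens.to_vec()
    }
}

/// Preprocessor wrapper: applies a prompt template, tokenizes, calls the
/// core engine, then detokenizes. Tokenization is stubbed as one token
/// per byte purely for the sketch.
struct StaticCoreWrapper<E: CoreEngine> {
    engine: E,
    system_prompt: String,
}

impl<E: CoreEngine> StaticCoreWrapper<E> {
    fn generate(&self, user_prompt: &str) -> String {
        // Apply a (trivial) prompt template: system prompt + user prompt.
        let templated = format!("<system>{}</system>{}", self.system_prompt, user_prompt);
        // Stub tokenizer.
        let tokens: Vec<u32> = templated.bytes().map(u32::from).collect();
        let out = self.engine.generate(&tokens);
        // Stub detokenizer.
        out.into_iter().map(|t| t as u8 as char).collect()
    }
}

fn main() {
    let w = StaticCoreWrapper { engine: EchoCore, system_prompt: "be brief".into() };
    // Because the core engine only sees post-template tokens, its echo
    // includes the system prompt, matching the side effect noted above.
    println!("{}", w.generate("hi"));
}
```

This also shows why the echo differs between the two example engines: the wrapper runs the template before the core engine ever sees the request, so a token-level echo necessarily reflects the full templated prompt.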
-
Ryan McCormick authored
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
-
Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-