Commits · 57402e7059eb73708b4bca2c109ffec44f08c9bd · OpenDAS / dynamo

06 May, 2025 1 commit

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
# `endpoint` must already exist, and `await` must run inside an async function.
await register_llm(endpoint, MODEL, 3)
```

Full vLLM example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config and related files into the NATS object store so ingress can access them from a different machine
- Publish the model deployment card to etcd
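
The last two steps above can be sketched as follows. This is a minimal illustration, not the actual `register_llm` implementation: the plain dicts stand in for the NATS object store and etcd, and `register_llm_sketch`, the key layout, and the `nats://` URL form are all assumptions made for the example. The download and GGUF-extraction steps are left out.

```python
import json
from pathlib import Path

# Hypothetical in-memory stand-ins for the NATS object store and etcd.
object_store: dict[str, bytes] = {}
etcd: dict[str, str] = {}

# File names taken from the model deployment card description in this log.
MDC_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def register_llm_sketch(model_dir: Path, model_name: str) -> dict:
    """Upload the card's files, then publish a card that points at them."""
    card = {"name": model_name, "files": {}}
    for filename in MDC_FILES:
        path = model_dir / filename
        if not path.exists():
            continue  # skip files this model folder does not have
        key = f"{model_name}/{filename}"
        object_store[key] = path.read_bytes()  # ingress can fetch this remotely
        card["files"][filename] = f"nats://{key}"  # illustrative URL form
    # Publish the card itself so ingress can discover the model.
    etcd[f"models/{model_name}"] = json.dumps(card)
    return card
```

The card only holds references, so the ingress side needs access to the two stores but not to a model checkout.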


01 May, 2025 1 commit

chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027

Graham King authored May 01, 2025

Part of https://github.com/ai-dynamo/dynamo/issues/743

29 Apr, 2025 1 commit

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to the ingress side, we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.
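
A rough sketch of that two-step split, written in Python for brevity. The real types are Rust generics; the `ingress_preprocessing` field, the `prepare` method, and the toy tokenizer are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ModelDeploymentCard:
    name: str
    ingress_preprocessing: bool  # hypothetical field name


class Client:
    """Step 1: discovers endpoints; makes no pre-processing decision."""

    def __init__(self, endpoints: list[str]):
        self.endpoints = endpoints


class PushRouter:
    """Step 2: built once the card is known; the card picks the request type."""

    def __init__(self, client: Client, card: ModelDeploymentCard):
        self.client = client
        self.card = card

    def prepare(self, prompt: str):
        if self.card.ingress_preprocessing:
            # Toy tokenizer: one code point per token.
            return [ord(ch) for ch in prompt]
        return prompt  # worker does its own pre-processing
```

The request type (tokens or raw text) is only fixed when the `PushRouter` is constructed, which is after discovery.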

Part of #743


25 Apr, 2025 1 commit

chore: Publish Model Deployment Card to NATS (#799) · d346782c

Graham King authored Apr 25, 2025

This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), running on a different machine than the worker, to be able to see those three files.

To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC, with those NATS URLs, to the key-value store.

The key-value store has an interface so any store (NATS, etcd, Redis, etc.) can be supported. Implementations for memory and NATS are provided.
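
As a sketch of what such a pluggable store could look like, here is a small interface with an in-memory implementation. The real interface is in Rust; the names `KeyValueStore`, `MemoryStore`, `publish_card`, and the `mdc/` key prefix are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """Minimal interface any backing store would implement."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...


class MemoryStore(KeyValueStore):
    """In-memory implementation, useful for tests and single-process runs."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def publish_card(store: KeyValueStore, name: str, card_json: bytes) -> None:
    # The caller depends only on the interface, not on a concrete store.
    store.put(f"mdc/{name}", card_json)
```

Because `publish_card` takes the interface, swapping the memory store for a networked one would not change the publishing code.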

Fetching the MDC from the store, doing pre-processing on the ingress side, and publishing a card backed by a GGUF are all left for a later commit.

Part of #743


09 Apr, 2025 1 commit

feat: Extract Common Configs + Log Configs on Init + Add `test_` to... · 0292feb5

jon-chuang authored Apr 09, 2025


feat: Extract Common Configs + Log Configs on Init + Add `test_` to `sdk/tests` filenames required for pytest (#434)
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>


04 Apr, 2025 1 commit

chore: Upgrade Rust to 1.86 (#518) · e99aa1e1

Graham King authored Apr 04, 2025

Also upgrade the cargo resolver to v3, the default.

New clippy lints:
- `next_back()` instead of `last()` for a double-ended iterator. That avoids walking the whole list.
- `repeat_n` instead of `repeat().take()`. That avoids an unnecessary clone.
- Doc indenting


24 Mar, 2025 1 commit

feat: Build pre-processor from GGUF (#344) · c7067fc2

Graham King authored Mar 24, 2025

This lets us do:
```
dynamo-run out=llamacpp <gguf_file>
```

Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.


08 Mar, 2025 1 commit

chore: rename dynamo (#44) · 602352ce

Neelay Shah authored Mar 08, 2025

Co-authored-by: Biswa Panda <biswa.panda@gmail.com>

05 Mar, 2025 1 commit

refactor: rename triton_distributed to dynemo (#22) · 1af7433b

Neelay Shah authored Mar 05, 2025

Co-authored-by: Graham King <grahamk@nvidia.com>

25 Feb, 2025 2 commits

feat: tio support preprocessor (#265) · 72064d84

Graham King authored Feb 25, 2025

Add backend type `EngineConfig::StaticCore` that wraps the engine in a preprocessor (prompt templating and tokenization).

Add example engine `echo_core` (`out=echo_core`), which takes and returns tokens. A nice side effect is that it echoes the full prompt template including the system prompt, whereas `echo_full` echoes only the user prompt.
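
The difference between the two echo engines can be sketched like this. It is a toy model in Python, not the Rust code: the template format, the byte-level tokenizer, and `with_preprocessor` are made up for the example.

```python
def apply_template(system: str, user: str) -> str:
    # Made-up chat template; real templates come from the model's config.
    return f"<system>{system}</system><user>{user}</user>"


def tokenize(text: str) -> list[int]:
    return list(text.encode("utf-8"))  # toy tokenizer: one byte per token


def detokenize(tokens: list[int]) -> str:
    return bytes(tokens).decode("utf-8")


def echo_core(tokens: list[int]) -> list[int]:
    """Core engine: takes tokens and returns them unchanged."""
    return tokens


def echo_full(user_prompt: str) -> str:
    """Full engine: takes text, so it only ever sees the user prompt."""
    return user_prompt


def with_preprocessor(core_engine, system: str):
    """Wrap a token-level engine with templating and tokenization."""

    def engine(user: str) -> str:
        tokens = tokenize(apply_template(system, user))
        return detokenize(core_engine(tokens))

    return engine
```

Because the wrapper templates the prompt before the core engine sees it, echoing the tokens back reveals the whole rendered template.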

![image](https://github.com/user-attachments/assets/27ec0a7b-a27d-4e69-96ea-1ffa0822ea90)


refactor: move libs to lib dir · 08fcd7e9

Neelay Shah authored Feb 24, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
