Commits · 0e77d3442ee227eeaa84ac013633ce67be6b99b8 · OpenDAS / dynamo

17 Nov, 2025 1 commit

refactor: centralize environment variable constants (#4083) · 0e77d344

Keiven C authored Nov 17, 2025

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>


31 Oct, 2025 1 commit
- fix: update model express common to use HF_HOME (if specified) (#3638) · 7f0ed081
  Biswa Panda authored Oct 31, 2025
  
24 Sep, 2025 1 commit
- feat: modelexpress dynamo integration (#3191) · da3b1dbd
  Hyunjae Woo authored Sep 24, 2025
  
16 Sep, 2025 1 commit
- chore(llm): Remove extra license headers (#3065) · bc0a7633
  Graham King authored Sep 16, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
28 Aug, 2025 1 commit
- feat: Integrate Model Express Client into Dynamo Model Downloads (#2574) · 95ce83d5
  KavinKrishnan authored Aug 28, 2025
```
Signed-off-by: Kavin Krishnan <kavink@nvidia.com>
Co-authored-by: KavinKrishnan <kavin.krishnan@nvidia.com>
```
25 Aug, 2025 1 commit
- feat: support HF_HOME/_ENDPOINT env for Hugging Face models (#2642) · a24221d4
  Hyeonki Hong authored Aug 26, 2025
```
Signed-off-by: Hyeonki Hong <hhk7734@gmail.com>
```
20 Aug, 2025 1 commit
- fix(hub): Download faster from Hugging Face (#2566) · d5b66fa2
  Graham King authored Aug 20, 2025
  
31 Jul, 2025 1 commit
- feat: skip downloading model weights if using mocker (only tokenizer) (#2213) · bae25dc6
  Yan Ru Pei authored Jul 31, 2025
  
28 May, 2025 1 commit
- fix(dynamo-llm): Use HF_TOKEN env var (#1249) · 471a352f
  Graham King authored May 28, 2025
```
Fixes #286
```
06 May, 2025 1 commit

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
# Call from inside an async function; `endpoint` is an endpoint
# the script has already created.
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on the earlier work that moved the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder, or extract it from the GGUF file
- Push the tokenizer config, etc., into the NATS object store so that ingress can access it from a different machine
- Publish the model deployment card to etcd
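
The four steps above can be sketched roughly as follows. This is an illustration only, not the actual `register_llm` implementation: `register_llm_sketch`, `fake_download`, the dict-backed `object_store` and `kv_store` (stand-ins for the NATS object store and etcd), the key layout, and the card fields are all hypothetical, and the Hugging Face download is replaced by a caller-supplied function.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def register_llm_sketch(
    model: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
    object_store: Dict[str, bytes],
    kv_store: Dict[str, str],
) -> dict:
    """Hypothetical walk-through of the four steps; not the real API."""
    model_dir = cache_dir / model.replace("/", "--")

    # 1. Download the model only if it is not already present locally.
    if not model_dir.exists():
        download(model, model_dir)

    # 2. Build a (made-up) deployment card from the model folder.
    card = {"name": model, "files": sorted(p.name for p in model_dir.iterdir())}

    # 3. Push tokenizer/config files to the object store (a dict standing in
    #    for NATS) so another machine could fetch them.
    for path in model_dir.glob("*.json"):
        object_store[f"{model}/{path.name}"] = path.read_bytes()

    # 4. Publish the card to the key-value store (a dict standing in for etcd).
    kv_store[f"models/{model}"] = json.dumps(card)
    return card


def fake_download(model: str, dest: Path) -> None:
    """Stand-in for a Hugging Face download: writes a tiny tokenizer config."""
    dest.mkdir(parents=True)
    (dest / "tokenizer_config.json").write_text('{"model_max_length": 128}')


if __name__ == "__main__":
    objects, kv = {}, {}
    with tempfile.TemporaryDirectory() as tmp:
        card = register_llm_sketch(
            "Qwen/Qwen2.5-0.5B-Instruct", Path(tmp), fake_download, objects, kv
        )
    print(card["files"])
    print(sorted(kv))
```

Passing the download function and both stores in as arguments keeps the sketch runnable without a network, NATS, or etcd; a real integration would replace them with actual clients.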


17 Mar, 2025 1 commit
- fix(runtime): Shutdown message from eprintln to tracing debug (#219) · f46f6d0e
  Graham King authored Mar 17, 2025
  
14 Mar, 2025 1 commit
- fix: Improve error handling for failed HF download (#160) · 0f4529e9
  Ryan McCormick authored Mar 14, 2025
  
13 Mar, 2025 1 commit

feat(dynamo-run): Download models from HF, smart model defaults (#126) · 089f8e1b

Graham King authored Mar 12, 2025

- Any engine can take the name of a Hugging Face repository. It will be downloaded before calling the engine.

- The default engine (previously always mistralrs) depends on what is compiled in.

- Text can be piped in and will result in a single run of the model.

Together, these mean that if you build with `--features vllm`, the following command downloads the model, runs it with vllm, answers your question, and exits:
```
echo "What is the capital of Costa Rica?" | dynamo-run Qwen/Qwen2.5-3B-Instruct
```
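
The piped-input behaviour can be illustrated with a small Python sketch. It is not how `dynamo-run` itself is implemented; `read_piped_prompt` and `run_once` are hypothetical names, and the sketch only shows the general technique of checking whether standard input is a terminal before deciding to do a single run.

```python
import sys
from typing import Callable, Optional, TextIO


def read_piped_prompt(stream: TextIO) -> Optional[str]:
    """Return the piped text, or None when input comes from a terminal."""
    if stream.isatty():
        return None  # interactive session: no piped prompt
    text = stream.read().strip()
    return text or None


def run_once(stream: TextIO, generate: Callable[[str], str]) -> Optional[str]:
    """Answer a single piped prompt and return; None means 'go interactive'."""
    prompt = read_piped_prompt(stream)
    if prompt is None:
        return None
    return generate(prompt)


if __name__ == "__main__":
    # `generate` here is a placeholder that echoes the prompt, not a model.
    answer = run_once(sys.stdin, lambda p: f"(model output for: {p})")
    if answer is not None:
        print(answer)
```

Saved under an assumed name such as `piped.py`, running `echo "hello" | python piped.py` would take the single-run path, while running `python piped.py` from a terminal would print nothing.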
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
