    feat: dynamo-run <-> python interop (#934) · 99cd9d85
    Graham King authored
    Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
    ```
    from dynamo.llm import register_llm

    MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
    # `endpoint` must already exist, and `await` needs an enclosing async function.
    await register_llm(endpoint, MODEL, 3)
    ```
    
    Full vLLM example, with pre-processing in dynamo:
    - `dynamo-run in=text out=dyn://dynamo.backend.generate`
    - `cd lib/bindings/python/examples/hello_world`
    - `python server_vllm.py`
    
    This builds on the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus.
    
    The `register_llm` call does this:
    
    - Download the model from Hugging Face (HF) if necessary
    - Load the model deployment card from the HF folder, or extract it from the GGUF file
    - Push the tokenizer config etc. into the NATS object store so that ingress can access it from a different machine
    - Publish the model deployment card to etcd
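
The four steps above can be sketched roughly as follows. This is an illustration only, not the actual `register_llm` implementation: plain dicts stand in for the NATS object store and etcd, and every name here (`register_llm_sketch`, `fetch_model`, `ModelDeploymentCard`, the key layout, the placeholder tokenizer config) is hypothetical.

```python
import asyncio
import json
from dataclasses import dataclass, asdict

# Hypothetical in-memory stand-ins. The real flow uses a NATS object
# store and etcd; these dicts only mimic "put bytes under a key".
OBJECT_STORE: dict[str, bytes] = {}
KV_STORE: dict[str, str] = {}


@dataclass
class ModelDeploymentCard:
    """Assumed minimal card: just enough for ingress to find the tokenizer."""
    model_name: str
    tokenizer_key: str


async def fetch_model(model: str, cache: dict[str, dict]) -> dict:
    """Step 1: 'download' the model only if it is not already cached."""
    if model not in cache:
        # Placeholder content, not a real Hugging Face download.
        cache[model] = {"tokenizer_config": {"model_max_length": 4096}}
    return cache[model]


async def register_llm_sketch(
    endpoint_path: str, model: str, cache: dict[str, dict]
) -> ModelDeploymentCard:
    files = await fetch_model(model, cache)

    # Step 2: build the deployment card from the local model files.
    tokenizer_key = f"{model}/tokenizer_config.json"
    card = ModelDeploymentCard(model_name=model, tokenizer_key=tokenizer_key)

    # Step 3: push the tokenizer config where a remote ingress can read it.
    OBJECT_STORE[tokenizer_key] = json.dumps(files["tokenizer_config"]).encode()

    # Step 4: publish the card under a key derived from the endpoint.
    KV_STORE[f"models/{endpoint_path}"] = json.dumps(asdict(card))
    return card


if __name__ == "__main__":
    card = asyncio.run(
        register_llm_sketch("dynamo.backend.generate", "Qwen/Qwen2.5-0.5B-Instruct", {})
    )
    print(card)
```

The point of the split is that ingress only needs the published card and the object-store key to do pre-processing, so it never has to touch the worker's local model folder.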
Cargo.toml 3.17 KB