1. 08 May, 2025 8 commits
  2. 07 May, 2025 12 commits
  3. 06 May, 2025 8 commits
    • jthomson04 · c4213899
    • docs: add drt doc (#951) · 2d4f8b50
      Hongkuan Zhou authored
    • feat(dynamo-run): vllm and sglang subprocess engines (#954) · 28fd481c
      Graham King authored
      New vllm and sglang engines that run in a subprocess, intended to replace the existing embedded Python engines.
          
      Why?
          
        - Pure Python: no Rust knowledge is needed to work on it, making it much simpler to maintain.
        - No embedded Python interpreter, which avoids linking libpython and sidesteps the macOS virtualenv issues.
        - Should perform better, since it runs "native" vllm / sglang.
        - Works with any version of vllm (including v1!) and sglang, so upgrades are less of a struggle.
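      The subprocess-engine idea above can be sketched with a stdlib-only toy. This is an illustration of the pattern, not dynamo's actual wire protocol: the line-delimited JSON framing and the echo worker are assumptions made for the example.

      ```python
      import asyncio
      import json
      import sys

      # Toy "engine" worker: reads line-delimited JSON requests on stdin and
      # writes JSON responses on stdout. Stands in for a real vllm/sglang engine.
      WORKER = r"""
      import json, sys
      for line in sys.stdin:
          req = json.loads(line)
          resp = {"id": req["id"], "text": "echo: " + req["prompt"]}
          print(json.dumps(resp), flush=True)
      """

      async def main() -> str:
          # Launch the engine as a subprocess: the parent never embeds a Python
          # interpreter, it only speaks a pipe protocol to the child process.
          proc = await asyncio.create_subprocess_exec(
              sys.executable, "-c", WORKER,
              stdin=asyncio.subprocess.PIPE,
              stdout=asyncio.subprocess.PIPE,
          )
          request = {"id": 1, "prompt": "hello"}
          proc.stdin.write((json.dumps(request) + "\n").encode())
          await proc.stdin.drain()
          reply = json.loads(await proc.stdout.readline())
          proc.stdin.close()       # closing stdin lets the worker loop finish
          await proc.wait()
          return reply["text"]

      result = asyncio.run(main())
      print(result)  # echo: hello
      ```

      The real engines speak a richer protocol, but the decoupling benefit is the same: the child can run any vllm/sglang version and be upgraded independently of the Rust side.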
    • chore: Add John as Codeowner (#962) · 9f0e12a0
      jthomson04 authored
    • chore: Two-line copyright check (#958) · a9068dc6
      Graham King authored
      Approved by OSRB in Slack.
      
      Note that we don't check for the closing delimiter, which allows the longer copyright format.
      
      The motivation is that it saves 12 lines of context in every file in the project, which helps tools like Cursor and Claude Code fit more, run faster, and cost less.
    • ci: lock cuda at 12.8 (#957) · 632158be
      hhzhang16 authored
    • hhzhang16 · 403344e5
    • feat: dynamo-run <-> python interop (#934) · 99cd9d85
      Graham King authored
      Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
      ```python
      from dynamo.llm import register_llm
      
      MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
      await register_llm(endpoint, MODEL, 3)
      ```
      
      Full vllm example, with pre-processing in dynamo:
      - `dynamo-run in=text out=dyn://dynamo.backend.generate`
      - `cd lib/bindings/python/examples/hello_world`
      - `python server_vllm.py`
      
      This builds on top of the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus.
      
      The `register_llm` call does this:
      
      - Download the model from HF if necessary
      - Load the model deployment card from the HF folder or extract from GGUF
      - Push the tokenizer config, etc. into the NATS object store so the ingress can access it from a different machine
      - Publish the model deployment card to ETCD
  4. 05 May, 2025 6 commits
  5. 02 May, 2025 3 commits
  6. 01 May, 2025 3 commits