1. 09 May, 2025 3 commits
  2. 06 May, 2025 1 commit
    • feat(dynamo-run): vllm and sglang subprocess engines (#954) · 28fd481c
      Graham King authored
      New vllm and sglang engines that run in a sub-process. They will hopefully replace the existing embedded Python engines.
          
      Why?
          
        - Pure Python, so working on it does not require knowing Rust. Much simpler to maintain.
        - No embedded Python interpreter, which avoids linking libpython and avoids the macOS virtualenv issues.
        - Should have better performance as it's "native" vllm / sglang.
        - Works with any version of vllm (including v1!) and sglang. Less upgrade struggle.
  3. 01 May, 2025 1 commit
  4. 29 Apr, 2025 1 commit
    • chore: Split PushRouter from Client (#817) · a1a10365
      Graham King authored
      In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.
      
      As part of moving pre-processing back to ingress-side we need to split this into two steps:
      - Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
      - PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.
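A minimal sketch of the two-step split described above, not the actual implementation. Every type, field, and function here (`ModelDeploymentCard::needs_ingress_preprocessing`, `RawRequest`, `TokenizedRequest`, `Router`, `build_router`) is a hypothetical stand-in, and discovery is faked by passing the endpoints and card in directly.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the Model Deployment Card.
#[derive(Debug, Clone)]
struct ModelDeploymentCard {
    /// Assumed flag: true if the workers expect already-tokenized input.
    needs_ingress_preprocessing: bool,
}

/// Step 1: the client only knows about endpoints and their card.
/// It carries no request or response types.
struct Client {
    endpoints: Vec<String>,
    card: ModelDeploymentCard,
}

impl Client {
    /// Discovery is faked here; the caller supplies what would be discovered.
    fn discover(endpoints: Vec<String>, card: ModelDeploymentCard) -> Self {
        Client { endpoints, card }
    }
}

/// Request as received at ingress (plain text).
#[allow(dead_code)]
struct RawRequest(String);

/// Request after ingress-side pre-processing (token ids).
#[allow(dead_code)]
struct TokenizedRequest(Vec<u32>);

/// Step 2: the router is generic over the request type it pushes to workers.
struct PushRouter<Req> {
    client: Client,
    next: usize,
    _req: PhantomData<Req>,
}

impl<Req> PushRouter<Req> {
    fn from_client(client: Client) -> Self {
        PushRouter { client, next: 0, _req: PhantomData }
    }

    /// Round-robin choice of the next endpoint.
    fn next_endpoint(&mut self) -> Option<&str> {
        if self.client.endpoints.is_empty() {
            return None;
        }
        let i = self.next % self.client.endpoints.len();
        self.next = self.next.wrapping_add(1);
        Some(self.client.endpoints[i].as_str())
    }
}

/// The generic parameter is chosen only after the card is known.
enum Router {
    Raw(PushRouter<RawRequest>),
    Preprocessed(PushRouter<TokenizedRequest>),
}

fn build_router(client: Client) -> Router {
    if client.card.needs_ingress_preprocessing {
        Router::Preprocessed(PushRouter::from_client(client))
    } else {
        Router::Raw(PushRouter::from_client(client))
    }
}
```

The point of the split is visible in `build_router`: the request type is picked from the card, which is only available once discovery has finished.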
      
      Part of #743
  5. 26 Apr, 2025 1 commit
  6. 25 Apr, 2025 3 commits
    • Harrison Saturley-Hall's avatar
    • Anant Sharma's avatar
      448e79a6
    • chore: Publish Model Deployment Card to NATS (#799) · d346782c
      Graham King authored
      This will allow an ingress-side pre-processor to see it without needing a model checkout.
      
      Currently pre-processing is done in the worker, which has access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`) locally. We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), which runs on a different machine than the worker, to be able to see those three files.
      
      To support that, this PR makes the worker upload the contents of those files to the NATS object store, and publish the MDC with those NATS URLs to the key-value store.
      
      The key-value store has an interface so any store (NATS, etcd, Redis, etc.) can be supported. Implementations for memory and NATS are provided.
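As an illustration of the store interface idea, here is a minimal sketch assuming a synchronous trait with only `put` and `get`. The names `KeyValueStore`, `MemoryStore`, `publish_card`, and the `mdc/` key prefix are hypothetical; the real interface is not shown in this commit message and may well be async and richer.

```rust
use std::collections::HashMap;

/// Hypothetical minimal interface; any backend (NATS, etcd, Redis, ...)
/// would implement the same two methods.
trait KeyValueStore {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<&[u8]>;
}

/// In-memory implementation, useful for tests and single-process runs.
#[derive(Default)]
struct MemoryStore {
    entries: HashMap<String, Vec<u8>>,
}

impl KeyValueStore for MemoryStore {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<&[u8]> {
        self.entries.get(key).map(|v| v.as_slice())
    }
}

/// Publish a serialized card under an assumed `mdc/<model>` key and
/// return the key so a reader can look it up later.
fn publish_card(store: &mut dyn KeyValueStore, model: &str, card_json: &str) -> String {
    let key = format!("mdc/{model}");
    store.put(&key, card_json.as_bytes().to_vec());
    key
}
```

Because `publish_card` takes `&mut dyn KeyValueStore`, the caller can swap the in-memory store for another backend without touching the publishing code.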
      
      Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit.
      
      Part of #743 
  7. 17 Apr, 2025 1 commit
  8. 09 Apr, 2025 1 commit
  9. 04 Apr, 2025 1 commit
    • chore: Upgrade Rust to 1.86 (#518) · e99aa1e1
      Graham King authored
      Also upgrade the cargo resolver to v3, the default.
      
      New clippy lints:
      - `next_back()` instead of `last()` for a double-ended iterator. That avoids walking the whole list.
      - `repeat_n(x, n)` instead of `repeat(x).take(n)`. That avoids an unnecessary clone.
      - Doc indenting
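The first two lints can be illustrated with a short sketch. The helper names `last_word` and `padding` are made up for this example and do not come from the codebase.

```rust
use std::iter::repeat_n;

/// `split_whitespace` returns a double-ended iterator, so `next_back()`
/// reads the final item from the end instead of consuming every item
/// the way `last()` would.
fn last_word(line: &str) -> Option<&str> {
    line.split_whitespace().next_back()
}

/// `repeat_n` moves the original value into the final item, so it makes
/// one fewer clone than `repeat(fill).take(count)`.
fn padding(fill: String, count: usize) -> Vec<String> {
    repeat_n(fill, count).collect()
}
```

`std::iter::repeat_n` needs a reasonably recent toolchain, which the upgrade in this commit provides.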
  10. 03 Apr, 2025 1 commit
  11. 31 Mar, 2025 1 commit
  12. 20 Mar, 2025 1 commit
  13. 19 Mar, 2025 2 commits
    • chore: Don't depend on openssl (#292) · 7c3fd5c9
      Graham King authored
      This makes all the Rust parts use the ring / rustls libraries instead of a local install of openssl. It's a step on the journey to being statically linked.
      
      Pieces:
      - `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with feature flag).
      - Move shared dependencies up into the workspace.
      - The new `rand` crate release has some renames for compatibility with future Rust.
      - Ensure the dependency doesn't creep back in by enforcing it with cargo deny.
    • feat: enable LTO and codegen-units = 1 optimizations (#279) · af8ee9db
      Alexander Zaitsev authored
      #### Overview:
      
      This PR enables more aggressive compiler optimizations for the project which should lead to better performance and smaller binary sizes.
      
      In this PR, I decided to use Fat LTO instead of ThinLTO since it provides a higher optimization level.
      
      I ran quick tests (AMD Ryzen 5900x, Fedora 41, Rust 1.85.1, the latest version of the project at the moment, `cargo build --release` command). Here are the resulting binary sizes:
      
      | Build mode \ Binary | dynamo-run | libdynamo_llm_capi.so | http | llmctl | metrics | mock_worker |
      | --- | --- | --- | --- | --- | --- | --- |
      | Release | 55 MiB | 14 MiB | 19 MiB | 14 MiB | 21 MiB | 14 MiB |
      | Release + `codegen-units = 1` + ThinLTO | 43 MiB | 11 MiB | 15 MiB | 11 MiB | 17 MiB | 11 MiB |
      | Release + `codegen-units = 1` + FatLTO | 38 MiB | 9.2 MiB | 13 MiB | 9.6 MiB | 15 MiB | 9.6 MiB |
      
      #### Details:
      
      Enable `codegen-units = 1` and Fat LTO for better optimizations.
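For reference, these two settings are normally expressed in a Cargo release profile. The fragment below is a sketch of what such a profile looks like, assuming it lives in the workspace-level `Cargo.toml`; the actual diff is not reproduced here.

```toml
# Assumed location: the workspace root Cargo.toml.
[profile.release]
codegen-units = 1   # compile each crate as a single unit so more can be optimized together
lto = "fat"         # whole-program link-time optimization; "thin" is the cheaper alternative
```

Both settings trade longer release build times for smaller and potentially faster binaries.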
      
      #### Where should the reviewer start?
      
      Just check the `Cargo.toml` file ;)
      
      #### Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
      
      - closes GitHub issue: #278 
  14. 14 Mar, 2025 1 commit
    • fix: Various for MacOS (#155) · 76b79149
      Graham King authored
      - macOS doesn't have the `pipe2` syscall, so use plain `pipe`.
      - rtnetlink isn't a dependency on macOS, so don't use its type there.
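One way to express the `pipe` / `pipe2` difference is conditional compilation. The sketch below is Unix-only and declares the C functions directly instead of using a bindings crate; `make_pipe` and `round_trip` are hypothetical helpers, and `pipe2` is called with flags set to 0 to keep the example free of platform-specific constants (real code would likely pass a flag such as close-on-exec).

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::os::fd::FromRawFd;
use std::os::raw::c_int;

unsafe extern "C" {
    // macOS has no pipe2, so only declare it elsewhere.
    #[cfg(not(target_os = "macos"))]
    fn pipe2(fds: *mut c_int, flags: c_int) -> c_int;
    #[cfg(target_os = "macos")]
    fn pipe(fds: *mut c_int) -> c_int;
}

/// Create a pipe and return (read end, write end) as owned files.
fn make_pipe() -> io::Result<(File, File)> {
    let mut fds: [c_int; 2] = [0; 2];

    #[cfg(not(target_os = "macos"))]
    let rc = unsafe { pipe2(fds.as_mut_ptr(), 0) };
    #[cfg(target_os = "macos")]
    let rc = unsafe { pipe(fds.as_mut_ptr()) };

    if rc != 0 {
        return Err(io::Error::last_os_error());
    }
    // Safety: both descriptors were just created and are owned only here.
    Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}

/// Write a small message into the pipe and read it back.
fn round_trip(msg: &[u8]) -> io::Result<Vec<u8>> {
    let (mut reader, mut writer) = make_pipe()?;
    writer.write_all(msg)?;
    drop(writer); // close the write end so the reader sees EOF
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    Ok(out)
}
```

Callers only see `make_pipe`, so the platform difference stays in one place.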
  15. 13 Mar, 2025 1 commit
  16. 11 Mar, 2025 1 commit
  17. 10 Mar, 2025 1 commit
  18. 09 Mar, 2025 1 commit
  19. 08 Mar, 2025 1 commit
  20. 05 Mar, 2025 1 commit
  21. 25 Feb, 2025 1 commit
  22. 21 Feb, 2025 1 commit
  23. 18 Feb, 2025 1 commit
  24. 13 Feb, 2025 1 commit
  25. 12 Feb, 2025 1 commit
  26. 11 Feb, 2025 1 commit
  27. 10 Feb, 2025 1 commit
  28. 06 Feb, 2025 1 commit
  29. 05 Feb, 2025 2 commits
  30. 04 Feb, 2025 1 commit