1. 16 Sep, 2025 1 commit
  2. 15 Sep, 2025 1 commit
  3. 05 Sep, 2025 1 commit
  4. 03 Sep, 2025 1 commit
  5. 02 Sep, 2025 1 commit
  6. 28 Aug, 2025 1 commit
  7. 26 Aug, 2025 1 commit
  8. 25 Aug, 2025 1 commit
  9. 22 Aug, 2025 2 commits
  10. 19 Aug, 2025 2 commits
  11. 14 Aug, 2025 1 commit
  12. 11 Aug, 2025 1 commit
  13. 07 Aug, 2025 1 commit
  14. 23 Jul, 2025 1 commit
  15. 10 Jul, 2025 1 commit
  16. 03 Jul, 2025 1 commit
  17. 27 Jun, 2025 1 commit
  18. 26 Jun, 2025 1 commit
  19. 25 Jun, 2025 1 commit
  20. 24 Jun, 2025 1 commit
  21. 11 Jun, 2025 1 commit
  22. 04 Jun, 2025 1 commit
  23. 03 Jun, 2025 1 commit
  24. 02 Jun, 2025 1 commit
  25. 29 May, 2025 1 commit
  26. 25 Apr, 2025 1 commit
    • chore: Publish Model Deployment Card to NATS (#799) · d346782c
      Graham King authored
      This will allow an ingress-side pre-processor to see it without needing a model checkout.
      
      Currently pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), which runs on a different machine than the worker, to be able to see those three files.
      
      To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC, with those NATS URLs, to the key-value store.
      
      The key-value store is defined as an interface, so any store (NATS, etcd, Redis, etc.) can be supported. Implementations for memory and NATS are provided.
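
      As a rough illustration of the interface idea, here is a minimal sketch. The names `KeyValueStore`, `MemoryStore`, `publish_card` and the `mdc/` key prefix are hypothetical and are not taken from the PR; the real trait is likely async and richer.

      ```rust
      use std::collections::HashMap;

      /// Hypothetical storage interface; the trait in the PR may look different.
      pub trait KeyValueStore {
          fn put(&mut self, key: &str, value: Vec<u8>);
          fn get(&self, key: &str) -> Option<&[u8]>;
      }

      /// In-memory backend, handy for tests and single-process runs.
      #[derive(Default)]
      pub struct MemoryStore {
          entries: HashMap<String, Vec<u8>>,
      }

      impl KeyValueStore for MemoryStore {
          fn put(&mut self, key: &str, value: Vec<u8>) {
              self.entries.insert(key.to_string(), value);
          }

          fn get(&self, key: &str) -> Option<&[u8]> {
              self.entries.get(key).map(|v| v.as_slice())
          }
      }

      /// Publish a serialized card through any backend that implements the trait.
      /// The "mdc/" key prefix is an assumption made for this sketch.
      pub fn publish_card(store: &mut dyn KeyValueStore, model: &str, card_json: &str) {
          store.put(&format!("mdc/{model}"), card_json.as_bytes().to_vec());
      }
      ```

      Because callers only see the trait, swapping the memory backend for a NATS-backed one would not change `publish_card`.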
      
      Fetching the MDC from the store, doing pre-processing on the ingress side, and publishing a card backed by a GGUF are all left for a later commit.
      
      Part of #743 
  27. 04 Apr, 2025 1 commit
    • chore: Upgrade Rust to 1.86 (#518) · e99aa1e1
      Graham King authored
      Also upgrade the cargo resolver to v3, the default.
      
      New clippy lints:
      - `next_back()` instead of `last()` for a double-ended iterator. That avoids walking the whole list.
      - `repeat_n` instead of `repeat().take()`. That avoids an extra clone.
      - Doc indenting
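
      A small sketch of the first two lints, using only the standard library. The helper names `last_item` and `copies` are made up for illustration.

      ```rust
      use std::iter;

      /// Last element of a slice via `next_back()`, which takes a single step
      /// from the end of a double-ended iterator.
      fn last_item(items: &[u32]) -> Option<u32> {
          items.iter().next_back().copied()
      }

      /// `n` copies of `s`. `repeat_n` can hand over the original value for the
      /// final item, so it needs one clone fewer than `repeat(s).take(n)`.
      fn copies(s: String, n: usize) -> Vec<String> {
          iter::repeat_n(s, n).collect()
      }
      ```

      `std::iter::repeat_n` is stable from Rust 1.82, so it is available after this upgrade.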
  28. 24 Mar, 2025 1 commit
  29. 17 Mar, 2025 1 commit
    • fix(vllm,sglang): Let the engine enforce max tokens (#216) · 05765cd4
      Graham King authored
      Previously several parts of the stack ensured max tokens (for this single request) was set.
      
      Now only text input sets it (to 8k). Everything else leaves it as is, potentially unset. The engines themselves have very small defaults: 16 for vllm and 128 for sglang.
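
      The rule can be pictured roughly as follows. This is a sketch under assumptions: the `Source` enum and `resolve_max_tokens` are hypothetical names, and "8k" is taken to mean 8192.

      ```rust
      /// Hypothetical request origin; not the project's real type.
      enum Source {
          TextInput,
          Other,
      }

      /// Assumed value for the "8k" mentioned in the commit message.
      const TEXT_INPUT_MAX_TOKENS: u32 = 8192;

      /// Only text input gets a value; everything else passes the caller's
      /// setting through untouched, so `None` lets the engine pick its default.
      fn resolve_max_tokens(source: Source, requested: Option<u32>) -> Option<u32> {
          match source {
              Source::TextInput => Some(TEXT_INPUT_MAX_TOKENS),
              Source::Other => requested,
          }
      }
      ```

      With this shape, a `None` that reaches vllm or sglang falls back to the engine default (16 or 128 tokens respectively).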
      
      Also fix dynamo-run CUDA startup message to only print if we're using an engine that would benefit from it (mistralrs, llamacpp).
  30. 15 Mar, 2025 1 commit
    • feat(dynamo-run): Batch mode (#142) · 2cca070c
      Graham King authored
      ```
      dynamo-run in=batch:prompts.jsonl out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct/
      ```
      
      The file has genai format, one entry per line:
      ```
      {"text": "the prompt"}
      {"text": ..etc
      ```
      
      Each prompt is evaluated and the output is written to `output.jsonl` in the
      same folder as the input.
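
      One way to derive that output location from the input path, as a sketch (the helper name `output_path` is hypothetical, not the function used in the PR):

      ```rust
      use std::path::{Path, PathBuf};

      /// Build the path of a sibling `output.jsonl` next to the input file.
      fn output_path(input: &Path) -> PathBuf {
          // `with_file_name` keeps the parent directory and swaps the last component.
          input.with_file_name("output.jsonl")
      }
      ```

      For `/data/prompts.jsonl` this gives `/data/output.jsonl`; a bare `prompts.jsonl` gives `output.jsonl` in the current directory.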
      
      At the end of the run various statistics are printed:
      > Ran 5 files in 8s 679ms. Tokens in: 40 (5/s). Tokens out: 346 (43/s)
      
      This is also helpful for pushing load into the system and stressing the
      various components. It is not intended for performance measurement; it's a
      batch inference tool.
  31. 14 Mar, 2025 1 commit
  32. 08 Mar, 2025 1 commit
  33. 05 Mar, 2025 1 commit
  34. 27 Feb, 2025 2 commits
  35. 26 Feb, 2025 1 commit
  36. 25 Feb, 2025 2 commits