1. 24 Mar, 2025 1 commit
  2. 17 Mar, 2025 1 commit
    • fix(vllm,sglang): Let the engine enforce max tokens (#216) · 05765cd4
      Graham King authored
      Previously several parts of the stack ensured max tokens (for this single request) was set.
      
      Now only text input sets it (to 8k). Everything else leaves it as is, potentially unset. Note that the engines themselves have very small defaults: 16 for vllm and 128 for sglang.
      
      Also fix dynamo-run CUDA startup message to only print if we're using an engine that would benefit from it (mistralrs, llamacpp).
  3. 08 Mar, 2025 1 commit
  4. 06 Mar, 2025 1 commit
  5. 05 Mar, 2025 1 commit
  6. 25 Feb, 2025 2 commits
  7. 24 Feb, 2025 1 commit
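
The max-token behavior described in commit 05765cd4 can be illustrated with a minimal sketch. It is not the project's actual code. The function names, the `input_kind` strings, and the exact value 8192 for "8k" are assumptions made for illustration. Only the engine defaults (16 for vllm, 128 for sglang) come from the commit message.

```python
from typing import Optional

# Assumed value for "8k"; the commit message does not give the exact number.
TEXT_INPUT_MAX_TOKENS = 8192

# Engine defaults as quoted in the commit message.
ENGINE_DEFAULT_MAX_TOKENS = {"vllm": 16, "sglang": 128}


def request_max_tokens(input_kind: str, requested: Optional[int]) -> Optional[int]:
    """Return the per-request max tokens to forward to the engine.

    Hypothetical helper: only text input sets a value. Every other input
    passes the caller's value through unchanged, which may be None (unset).
    """
    if input_kind == "text":
        return TEXT_INPUT_MAX_TOKENS
    return requested


def effective_max_tokens(
    engine: str, input_kind: str, requested: Optional[int] = None
) -> Optional[int]:
    """Return the limit that applies once the engine fills in its default."""
    value = request_max_tokens(input_kind, requested)
    if value is not None:
        return value
    # Nothing was set upstream, so the engine's own default applies.
    return ENGINE_DEFAULT_MAX_TOKENS.get(engine)


if __name__ == "__main__":
    print(effective_max_tokens("vllm", "http"))       # unset request, vllm default
    print(effective_max_tokens("sglang", "http"))     # unset request, sglang default
    print(effective_max_tokens("vllm", "text"))       # text input sets the limit
    print(effective_max_tokens("vllm", "http", 256))  # explicit value passes through
```

In practice, a request that arrives without max tokens is cut off after very few tokens on these engines, so callers of non-text inputs would likely want to set the value explicitly.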