- 23 Apr, 2025 1 commit
-
-
Abrar Shivani authored
#### Overview:

This PR adds a command-line verbosity flag (-v, -vv) to dynamo-run to control log levels.

- Added new verbosity flag to Flags struct:
  - -v: Sets log level to debug
  - -vv: Sets log level to trace
  - No flag (default): Keeps log level at info

#### Details:

- closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/567
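The flag-to-level mapping described above can be illustrated with a small sketch. This is not the dynamo-run implementation; the function name `log_level_from_args` and the choice to clamp `-vvv` and beyond to trace are assumptions made for the example.

```python
import argparse

# Level names taken from the commit message, ordered by number of -v flags.
LEVELS = ["info", "debug", "trace"]


def log_level_from_args(argv):
    """Return the log level name implied by the -v flags in argv (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="verbosity-sketch")
    # action="count" yields 1 for -v and 2 for -vv; default=0 means no flag.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args(argv)
    # Assumption: extra -v flags beyond -vv stay at the most verbose level.
    return LEVELS[min(args.verbose, len(LEVELS) - 1)]
```

Calling `log_level_from_args(["-vv"])` returns `"trace"`, and an empty argument list returns `"info"`.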
-
- 18 Apr, 2025 1 commit
-
-
Graham King authored
-
- 07 Apr, 2025 1 commit
-
-
Graham King authored
As a first step towards KV routing:

- Introduce a `--router-mode` in dynamo-run that only does random and round-robin right now. Not that interesting yet.
- Make the vllm engine publish the KV events received from our patched vllm.

Now we "just" need to connect the two. Easy right?
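For readers unfamiliar with the two modes, here is a minimal sketch of random and round-robin selection over a list of workers. The `Router` class, the mode strings, and the `seed` parameter are hypothetical and exist only for illustration; the commit does not show how dynamo-run implements this.

```python
import itertools
import random


class Router:
    """Pick one worker per request using one of the two modes named above."""

    def __init__(self, workers, mode="round-robin", seed=None):
        if not workers:
            raise ValueError("need at least one worker")
        if mode not in ("random", "round-robin"):
            raise ValueError(f"unknown router mode: {mode}")
        self.workers = list(workers)
        self.mode = mode
        # cycle() walks the workers in order and wraps around forever.
        self._cycle = itertools.cycle(self.workers)
        # A private RNG keeps the sketch reproducible when a seed is given.
        self._rng = random.Random(seed)

    def pick(self):
        if self.mode == "random":
            return self._rng.choice(self.workers)
        return next(self._cycle)
```

Neither mode looks at worker state, which is why KV-aware routing would need the published KV events as an additional input.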
-
- 25 Mar, 2025 1 commit
-
-
Graham King authored
Put the arguments in a JSON file:
```
{
    "dtype": "half",
    "trust_remote_code": true
}
```
Pass it like this:
```
dynamo-run out=sglang ~/llm_models/Llama-3.2-3B-Instruct --extra-engine-args sglang_extra.json
```
Requested here https://github.com/ai-dynamo/dynamo/issues/290 (`dtype`) and here https://github.com/ai-dynamo/dynamo/issues/360 (`trust_remote_code`).
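One way such a file could be consumed on the engine side is sketched below, assuming the file holds a single JSON object whose keys override engine defaults. The helper names `load_extra_engine_args` and `merge_engine_args` are hypothetical, and the override order is an assumption; the commit does not describe how dynamo-run merges the values.

```python
import json
from pathlib import Path


def load_extra_engine_args(path):
    """Read a JSON object from path and return it as a dict (hypothetical helper)."""
    data = json.loads(Path(path).read_text())
    if not isinstance(data, dict):
        raise ValueError("extra engine args file must contain a JSON object")
    return data


def merge_engine_args(defaults, extra):
    # Assumption: values from the file take precedence over the defaults.
    return {**defaults, **extra}
```

The merged dict could then be splatted into an engine constructor as keyword arguments.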
-
- 12 Mar, 2025 1 commit
-
-
Graham King authored
Command line arguments are passed to the python engine like this:
```
dynamo-run out=pystr:my_python_engine.py -- -n 42 --custom-arg Orange --yes
```
The python engine receives the arguments in `sys.argv`. The argument list will include some standard ones as well as anything after the `--`.

This input:
```
dynamo-run out=pystr:my_engine.py /opt/models/Llama-3.2-3B-Instruct/ --model-name llama_3.2 --tensor-parallel-size 4 -- -n 1
```
is read like this:
```
import sys

async def generate(request):
    ...  # as before

if __name__ == "__main__":
    print(f"MAIN: {sys.argv}")
```
and produces this output:
```
MAIN: ['my_engine.py', '--model-path', '/opt/models/Llama-3.2-3B-Instruct/', '--model-name', 'llama_3.2', '--http-port', '8080', '--tensor-parallel-size', '4', '--base-gpu-id', '0', '--num-nodes', '1', '--node-rank', '0', '-n', '1']
```
This allows quick iteration on the engine setup. Note how the `-n` `1` is included. Flags `--leader-addr` and `--model-config` will also be added if provided to `dynamo-run`.
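Inside the engine, the standard flags and the custom ones can be separated with `argparse`. The sketch below assumes the flag names shown in the sample output above; the function name `parse_engine_args` and the choice of integer types are assumptions made for the example.

```python
import argparse


def parse_engine_args(argv):
    """Split argv into the standard flags and any custom leftovers (hypothetical helper)."""
    # allow_abbrev=False prevents custom flags from matching a standard flag's prefix.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--model-path")
    parser.add_argument("--model-name")
    parser.add_argument("--http-port", type=int)
    parser.add_argument("--tensor-parallel-size", type=int)
    parser.add_argument("--base-gpu-id", type=int)
    parser.add_argument("--num-nodes", type=int)
    parser.add_argument("--node-rank", type=int)
    # parse_known_args returns (namespace, leftovers) instead of failing on unknown flags.
    known, custom = parser.parse_known_args(argv[1:])
    return known, custom
```

In an engine script this would be called as `parse_engine_args(sys.argv)`; the leftovers list holds whatever was passed after `--` and can go to a second parser that the engine author controls.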
-
- 11 Mar, 2025 1 commit
-
-
Graham King authored
- Latest from repo, many improvements
- Support most of the OpenAI request features (temperature, top_p, etc)
- Download models from Hugging Face if necessary
-
- 08 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
-
- 05 Mar, 2025 1 commit
-
-
Graham King authored
-
- 04 Mar, 2025 1 commit
-
-
Graham King authored
Needs more testing but good enough for now. I get the same results with this as with `vllm serve`.
-