- 11 Mar, 2025 8 commits
-
-
ptarasiewiczNV authored
-
Hongkuan Zhou authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
-
Tanmay Verma authored
-
Piotr Marcinkiewicz authored
-
ishandhanani authored
-
Hongkuan Zhou authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
-
Anant Sharma authored
-
Biswa Panda authored
-
- 10 Mar, 2025 8 commits
-
-
Anant Sharma authored
-
Ryan McCormick authored
-
Tanmay Verma authored
Co-authored-by: Shreyas Misra <shreyasm@nvidia.com>
-
Graham King authored
For the `echo` and `pystr` engines we previously required the user to pass `--model-name <x>` so that we would have a name for the model. If the input is HTTP, we do need the name so we can match it against the user's JSON request. If the input is Text, we don't need a name. So if the input is Text and we don't already have a name for the model, give it one.
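The rule in this commit message can be sketched as a small function. This is only an illustration: `resolve_model_name`, the `"http"`/`"text"` input labels, and the placeholder default name are all hypothetical, and the commit does not say what name is actually assigned or how a missing name is reported for HTTP input.

```python
from typing import Optional


def resolve_model_name(
    input_kind: str,
    model_name: Optional[str],
    default_name: str = "default-model",  # hypothetical placeholder name
) -> str:
    """Return the model name to use, following the rule described above."""
    if model_name:
        # An explicitly supplied name always wins.
        return model_name
    if input_kind == "text":
        # Text input never has to match a request, so any name will do.
        return default_name
    # Assumption: HTTP input still needs an explicit name, because the
    # name must match the model field in the user's JSON request.
    raise ValueError("--model-name is required when the input is HTTP")
```

With these assumptions, `resolve_model_name("text", None)` falls back to the placeholder, while `resolve_model_name("http", None)` raises.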
-
Harrison Saturley-Hall authored
-
Anant Sharma authored
-
Ryan McCormick authored
-
Dmitry Tokarev authored
-
- 09 Mar, 2025 8 commits
-
-
Alec authored
-
Neelay Shah authored
Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
-
Alec authored
-
Harrison Saturley-Hall authored
-
Neelay Shah authored
Co-authored-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
-
GuanLuo authored
-
Hongkuan Zhou authored
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
ptarasiewiczNV authored
Co-authored-by: ptarasiewicz@nvidia.com <Piotr Tarasiewicz>
-
- 08 Mar, 2025 7 commits
-
-
Meenakshi Sharma authored
-
Harrison Saturley-Hall authored
-
Dmitry Tokarev authored
-
Neelay Shah authored
-
Neelay Shah authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
-
Pavithra Vijayakrishnan authored
-
GuanLuo authored
-
- 07 Mar, 2025 9 commits
-
-
Anant Sharma authored
-
Ryan McCormick authored
-
Graham King authored
There are two etcd keys:
- The service
- The model

The second one is the interesting one for us. Previously we confused the two.
-
Biswa Panda authored
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
Ryan McCormick authored
Replaces hard-coded "kv-hit-rate" string in multiple places with KV_HIT_RATE_SUBJECT constant in lib/llm.
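As a language-neutral illustration of why a shared constant helps, here is a minimal Python sketch. The actual change lives in `lib/llm`; the `InMemoryBus` class and the payload shape below are hypothetical stand-ins, and only the constant name and the `"kv-hit-rate"` string come from the commit message.

```python
from collections import defaultdict

# Constant name and value taken from the commit message.
KV_HIT_RATE_SUBJECT = "kv-hit-rate"


class InMemoryBus:
    """Hypothetical stand-in for a message bus, for illustration only."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, subject, handler):
        self._handlers[subject].append(handler)

    def publish(self, subject, payload):
        # Deliver the payload to every handler registered on this subject.
        for handler in self._handlers[subject]:
            handler(payload)


bus = InMemoryBus()
received = []

# Publisher and subscriber both reference the constant, so a typo in the
# subject becomes a NameError instead of a silently unmatched string.
bus.subscribe(KV_HIT_RATE_SUBJECT, received.append)
bus.publish(KV_HIT_RATE_SUBJECT, {"hit_rate": 0.5})  # hypothetical payload
```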
-
ptarasiewiczNV authored
Co-authored-by: ptarasiewicz@nvidia.com <Piotr Tarasiewicz>
-
Graham King authored
Instead of using `out=pystr:<my.py>` we can now do this:

```
dynemo-run out=pytok:/home/graham/my_python_engine.py --model-path <hf-repo-checkout>
```

That engine will receive and respond with tokens. Here's an example engine file:

```
import asyncio

async def generate(request):
    yield {"token_ids":[791]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[6864]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[315]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[9822]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[374]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[12366]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[13]}
```

Also reduce duplication by making the bindings engine use the llm lib engine.
-
Piotr Marcinkiewicz authored
-
Neelay Shah authored
-