- 09 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
-
- 08 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
-
- 07 Mar, 2025 2 commits
-
-
Graham King authored
There are two etcd keys:
- The service
- The model
The second one is the interesting one for us. Previously we confused the two.
-
Ryan McCormick authored
Replaces hard-coded "kv-hit-rate" string in multiple places with KV_HIT_RATE_SUBJECT constant in lib/llm.
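A minimal sketch of the pattern this commit describes: the string literal is defined once as a constant and every call site refers to it by name. The constant name and its value come from the commit message; the `hit_rate_subject` helper and its dotted namespace format are hypothetical and only illustrate a call site.

```rust
/// Subject name shared by everything that publishes or subscribes to
/// KV hit-rate events. Name and value are taken from the commit message.
pub const KV_HIT_RATE_SUBJECT: &str = "kv-hit-rate";

/// Hypothetical call site (not from the repository): builds a namespaced
/// subject from the shared constant instead of repeating the literal.
pub fn hit_rate_subject(namespace: &str) -> String {
    format!("{}.{}", namespace, KV_HIT_RATE_SUBJECT)
}
```

With a single definition, a typo in the subject name becomes a compile error at the call site rather than a silently mismatched string at runtime.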
-
- 06 Mar, 2025 2 commits
-
-
Ryan McCormick authored
-
Ryan McCormick authored
-
- 05 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
- 27 Feb, 2025 1 commit
-
-
Ryan Olson authored
-
- 26 Feb, 2025 2 commits
-
-
Ryan McCormick authored
Co-authored-by: Ryan Olson <rolson@nvidia.com>
-
Graham King authored
This means we don't need to explain the parts to users until they are ready. We use what they provide and default the rest. This allows all of the following and more:
- `tio out=tdr://test`
- `tio out=tdr://llama_8b_pool`
- `tio in=tdr://corp_ai_research_group/model_next-20250226`
- `tio out=tdr://AIRE.NIM.migrate.mistralrs.1802`
Python, API, etc. are all untouched.
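The "use what they provide and default the rest" idea can be sketched as below. This is an illustration under stated assumptions, not the repository's parser: the commit message does not name the parts, so the three-part split (`namespace`, `component`, `endpoint`), the `.` and `/` separators, the left-to-right fill order, the joining of extra parts with `_`, and both default values are all hypothetical.

```rust
// Hypothetical defaults; the real values are not given in the commit message.
const DEFAULT_COMPONENT: &str = "default_component";
const DEFAULT_ENDPOINT: &str = "default_endpoint";

/// Hypothetical three-part address; the part names are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub struct EndpointPath {
    pub namespace: String,
    pub component: String,
    pub endpoint: String,
}

/// Parses `tdr://...`, filling parts left to right and defaulting the rest.
/// Returns `None` if the scheme is missing or no part is provided.
pub fn parse_endpoint(arg: &str) -> Option<EndpointPath> {
    let rest = arg.strip_prefix("tdr://")?;
    // Assumption: both '.' and '/' separate parts; empty parts are skipped.
    let mut parts = rest
        .split(|c: char| c == '.' || c == '/')
        .filter(|p| !p.is_empty());

    // At least one part is required; it is treated as the namespace.
    let namespace = parts.next()?.to_string();
    let component = parts.next().unwrap_or(DEFAULT_COMPONENT).to_string();

    // Assumption: any remaining parts are joined into the endpoint name.
    let tail: Vec<&str> = parts.collect();
    let endpoint = if tail.is_empty() {
        DEFAULT_ENDPOINT.to_string()
    } else {
        tail.join("_")
    };

    Some(EndpointPath { namespace, component, endpoint })
}
```

Under these assumptions, `tdr://test` yields a namespace of `test` with both other parts defaulted, while `tdr://AIRE.NIM.migrate.mistralrs.1802` fills all three parts.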
-
- 25 Feb, 2025 3 commits
-
-
Alec authored
Co-authored-by: aflowers <aflowers@nvidia.com>
-
Paul Hendricks authored
-
Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-