- 02 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 01 Aug, 2024 12 commits
-
-
royjhan authored
* docs without usage * no usage * rm metric note
-
royjhan authored
-
royjhan authored
* add prompt tokens to embed response * rm slog * metrics * types * prompt n * clean up * reset submodule * add tokens to v1/embeddings * separate usage
-
royjhan authored
* OpenAI Docs * Update docs/openai.md * Remove newline --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
Michael Yang authored
Fix context in /api/generate growing too much (#5980).
-
Michael Yang authored
refactor convert
-
Vyacheslav Moskalev authored
-
Vyacheslav Moskalev authored
-
Vyacheslav Moskalev authored
-
Vyacheslav Moskalev authored
-
Vyacheslav Moskalev authored
-
Michael Yang authored
fix modelfile message quotes
-
- 31 Jul, 2024 17 commits
-
-
Michael Yang authored
-
Michael Yang authored
patches: phi3 optional sliding window attention
-
Blake Mizerany authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
include modelfile messages
-
Michael Yang authored
fix: environ lookup
-
Daniel Nguyen authored
-
Jeffrey Morgan authored
-
Michael authored
Firebase Genkit
-
Jeffrey Morgan authored
Better example for multi-modal input
-
jmorganca authored
-
- 30 Jul, 2024 4 commits
-
-
royjhan authored
* add prompt tokens to embed response * rm slog * metrics * types * prompt n * clean up * reset submodule * update tests * test name * list metrics
-
Daniel Hiltgen authored
Prevent partial loading on mixed GPU brands
-
Daniel Hiltgen authored
In multi-brand GPU setups, if we couldn't fully load the model, we would fall through the scheduler and mistakenly try to load across a mix of brands. This makes sure we find the set of GPU(s) that best fits the partial load.
-
Kim Hallberg authored
* Update example models * Remove unused README.md
-
- 29 Jul, 2024 6 commits
-
-
Daniel Hiltgen authored
Better explain multi-gpu behavior
-
Daniel Hiltgen authored
Ensure amd gpu nodes are numerically sorted
-
Daniel Hiltgen authored
Report better error on cuda unsupported os/arch
-
royjhan authored
* hot fix * backend stream support * clean up * finish reason * move to openai
-
Daniel Hiltgen authored
Explain font problems on windows 10
-
Jeffrey Morgan authored
-