- 08 May, 2024 (2 commits)
  - Daniel Hiltgen authored: This records more GPU usage information for eventual UX inclusion.
  - Michael Yang authored
- 07 May, 2024 (1 commit)
  - Michael Yang authored
- 06 May, 2024 (5 commits)
  - Michael Yang authored
  - Michael Yang authored:
    - FROM /path/to/{safetensors,pytorch}
    - FROM /path/to/fp{16,32}.bin
    - FROM model:fp{16,32}
  - Daniel Hiltgen authored: Trying to live off the land for CUDA libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly.
  - Jeffrey Morgan authored
  - Jeffrey Morgan authored:
    - fix llava models not working after first request
    - individual requests only for llava models
- 05 May, 2024 (1 commit)
  - Daniel Hiltgen authored: This moves all the env var reading into one central module and logs the loaded config once at startup, which should help in troubleshooting user server logs.
- 04 May, 2024 (1 commit)
  - Michael Yang authored
- 01 May, 2024 (5 commits)
  - Mark Ward authored
  - Mark Ward authored
  - Mark Ward authored: Log while waiting for the process to stop, to help debug when other tasks execute during this wait. On expiry, clear the timer reference because it will not be reused. close() cleans up expireTimer if the calling code has not already done so.
  - Mark Ward authored
  - Jeffrey Morgan authored
- 30 Apr, 2024 (4 commits)
  - jmorganca authored
  - jmorganca authored
  - Jeffrey Morgan authored
  - Daniel Hiltgen authored:
    - Bump llama.cpp to b2761
    - Adjust types for bump
- 29 Apr, 2024 (1 commit)
  - Jeffrey Morgan authored
- 27 Apr, 2024 (3 commits)
  - Hernan Martinez authored
  - Hernan Martinez authored
  - Hernan Martinez authored
- 26 Apr, 2024 (9 commits)
  - Daniel Hiltgen authored: This will speed up CI, which already tries to only build static for unit tests.
  - Daniel Hiltgen authored
  - Michael Yang authored
  - Jeffrey Morgan authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored: This will make it simpler for CI to accumulate artifacts from prior steps.
- 25 Apr, 2024 (4 commits)
  - Jeffrey Morgan authored:
    - llm: limit generation to 10x context size to avoid run-on generations
    - add comment
    - simplify condition statement
  - Michael Yang authored
  - jmorganca authored
  - Roy Yang authored
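The guard from the 25 Apr "limit generation to 10x context size" commit reduces to a single comparison. A minimal Go sketch, with illustrative names rather than the actual ollama identifiers:

```go
// Minimal sketch of a run-on-generation guard: stop once the number of
// generated tokens exceeds 10x the context size.
package main

import "fmt"

// exceedsLimit reports whether generation should stop to avoid a
// run-on generation.
func exceedsLimit(generated, numCtx int) bool {
	return generated > 10*numCtx
}

func main() {
	numCtx := 2048
	fmt.Println(exceedsLimit(20480, numCtx)) // exactly 10x: false
	fmt.Println(exceedsLimit(20481, numCtx)) // just past 10x: true
}
```

A hard multiple of the context size is a cheap backstop for models that fail to emit a stop token: normal completions finish well under the limit, while a degenerate loop gets cut off deterministically.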
- 24 Apr, 2024 (2 commits)
  - Patrick Devine authored
  - Daniel Hiltgen authored: If we get our predictions wrong, this can be used to set a lower memory limit as a workaround. Recent multi-GPU refactoring accidentally removed it, so this adds it back.
- 23 Apr, 2024 (2 commits)
  - Daniel Hiltgen authored: Now that the llm runner is an executable and not just a DLL, more users are facing problems with security policy configurations on Windows that prevent writing to a directory and then executing binaries from the same location. This change removes payloads from the main executable on Windows and shifts them over to be packaged in the installer and discovered based on the executable's location. It also adds a new zip file for people who want to "roll their own" installation model.
  - Daniel Hiltgen authored: Tmp cleaners can nuke the file out from underneath us. This detects the missing runner and re-initializes the payloads.
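The recovery path from the 23 Apr "tmp cleaners" commit can be sketched as an existence check before launch: if the runner binary has been removed out from under the server, re-extract the payloads instead of failing. The function names and the re-init callback are hypothetical.

```go
// Sketch of a missing-runner recovery check: before launching, verify
// the runner executable still exists (tmp cleaners may have deleted it)
// and re-initialize the payloads if it does not.
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// ensureRunner re-initializes payloads when the runner binary is missing.
func ensureRunner(path string, reinit func() error) error {
	if _, err := os.Stat(path); errors.Is(err, fs.ErrNotExist) {
		fmt.Println("runner missing, re-initializing payloads")
		return reinit()
	} else if err != nil {
		return err
	}
	return nil
}

func main() {
	err := ensureRunner("/nonexistent/runner", func() error {
		// In a real server this would re-extract the payload files.
		return nil
	})
	fmt.Println("recovered:", err == nil)
}
```

Checking with `errors.Is(err, fs.ErrNotExist)` rather than `err != nil` matters here: permission errors and other failures should surface to the caller, while only genuine deletion triggers a re-extract.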