1. 06 Aug, 2024 1 commit
    • Ensure sparse files on windows during download · fc85f50a
      Daniel Hiltgen authored
      The file.Truncate call on Windows writes out the whole file
      unless the sparse flag is set, leading to heavy I/O at the
      beginning of a download.  This change should improve our
      I/O behavior on Windows and put less stress on the user's disk.
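The ordering described in the sparse-file commit above (mark the file as sparse first, then grow it) can be sketched in Go roughly as follows. This is a minimal illustration, not the code from the commit: `sparseHook`, `noopSparse`, `preallocate`, and `preallocatedSize` are hypothetical names. On Windows the hook would presumably issue the `FSCTL_SET_SPARSE` control code through `DeviceIoControl`; the no-op version below assumes a filesystem where growing a file with `Truncate` leaves a hole instead of writing zeros.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// sparseHook marks a freshly created file as sparse before it is grown.
// Hypothetical type for this sketch: a Windows build would wrap the
// FSCTL_SET_SPARSE control code here.
type sparseHook func(f *os.File) error

// noopSparse stands in on platforms where no explicit flag is assumed
// to be needed.
func noopSparse(*os.File) error { return nil }

// preallocate creates path, applies the sparse hook, then grows the file
// to size bytes so download parts can later be written at any offset.
func preallocate(path string, size int64, mark sparseHook) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	// The hook runs before Truncate; marking the file afterwards would be
	// too late to avoid the up-front write of the zero-filled range.
	if err := mark(f); err != nil {
		return err
	}
	return f.Truncate(size)
}

// preallocatedSize preallocates a temporary file and reports its logical
// size, which should equal the requested size.
func preallocatedSize(size int64, mark sparseHook) (int64, error) {
	dir, err := os.MkdirTemp("", "sparse-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "blob.partial")
	if err := preallocate(path, size, mark); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := preallocatedSize(1<<20, noopSparse)
	if err != nil {
		panic(err)
	}
	fmt.Println("preallocated bytes:", size)
}
```

Passing the hook as a parameter keeps the platform-specific call out of the shared download path; a real project might instead select the implementation with build-tagged files.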
  2. 02 Aug, 2024 2 commits
  3. 01 Aug, 2024 5 commits
  4. 31 Jul, 2024 5 commits
  5. 30 Jul, 2024 2 commits
    • Add Metrics to `api\embed` response (#5709) · 1b44d873
      royjhan authored
      * add prompt tokens to embed response
      
      * rm slog
      
      * metrics
      
      * types
      
      * prompt n
      
      * clean up
      
      * reset submodule
      
      * update tests
      
      * test name
      
      * list metrics
    • Prevent partial loading on mixed GPU brands · 34542099
      Daniel Hiltgen authored
      In multi-brand GPU setups, if we couldn't fully load the model we
      would fall through the scheduler and mistakenly try to load across
      a mix of brands.  This makes sure we find the set of GPU(s) from a
      single brand that best fits the partial load.
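The idea in the mixed-brand commit above, keeping a partial load within one brand, could look something like the Go sketch below. Everything in it is an assumption made for illustration: the `gpu` type, the `bestSingleBrand` function, and especially the fit criterion (most combined free memory per brand) are hypothetical and are not taken from the scheduler itself.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical, simplified device record for this sketch.
type gpu struct {
	ID      string
	Brand   string // e.g. "cuda" or "rocm"
	FreeMiB uint64
}

// bestSingleBrand groups GPUs by brand and returns the group with the most
// combined free memory, so a partial load never spans brands.
// Assumption: "best fit" is approximated by total free memory.
func bestSingleBrand(gpus []gpu) []gpu {
	groups := map[string][]gpu{}
	free := map[string]uint64{}
	for _, g := range gpus {
		groups[g.Brand] = append(groups[g.Brand], g)
		free[g.Brand] += g.FreeMiB
	}

	// Sort brand names so ties are broken deterministically.
	brands := make([]string, 0, len(groups))
	for b := range groups {
		brands = append(brands, b)
	}
	sort.Strings(brands)

	best := ""
	for _, b := range brands {
		if best == "" || free[b] > free[best] {
			best = b
		}
	}
	return groups[best] // nil when no GPUs were supplied
}

func main() {
	gpus := []gpu{
		{ID: "0", Brand: "cuda", FreeMiB: 8000},
		{ID: "1", Brand: "rocm", FreeMiB: 6000},
		{ID: "2", Brand: "rocm", FreeMiB: 6000},
	}
	for _, g := range bestSingleBrand(gpus) {
		fmt.Println(g.ID, g.Brand)
	}
}
```

A real scheduler would more likely estimate how many model layers each candidate set can hold rather than compare raw free memory; the sketch only shows the grouping step that prevents a cross-brand load.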
  6. 26 Jul, 2024 3 commits
  7. 25 Jul, 2024 1 commit
  8. 22 Jul, 2024 10 commits
  9. 21 Jul, 2024 1 commit
  10. 20 Jul, 2024 1 commit
  11. 19 Jul, 2024 1 commit
  12. 18 Jul, 2024 3 commits
  13. 17 Jul, 2024 2 commits
  14. 16 Jul, 2024 3 commits