1. 02 Aug, 2024 2 commits
  2. 01 Aug, 2024 5 commits
  3. 31 Jul, 2024 5 commits
  4. 30 Jul, 2024 2 commits
    • Add Metrics to `api\embed` response (#5709) · 1b44d873
      royjhan authored
      * add prompt tokens to embed response
      
      * rm slog
      
      * metrics
      
      * types
      
      * prompt n
      
      * clean up
      
      * reset submodule
      
      * update tests
      
      * test name
      
      * list metrics
    • Prevent partial loading on mixed GPU brands · 34542099
      Daniel Hiltgen authored
      In multi-brand GPU setups, if we couldn't fully load the model, we
      would fall through the scheduler and mistakenly try to load across
      a mix of brands. This makes sure we find the set of GPU(s) that
      best fits the partial load.
  5. 26 Jul, 2024 3 commits
  6. 25 Jul, 2024 1 commit
  7. 22 Jul, 2024 10 commits
  8. 21 Jul, 2024 1 commit
  9. 20 Jul, 2024 1 commit
  10. 19 Jul, 2024 1 commit
  11. 18 Jul, 2024 3 commits
  12. 17 Jul, 2024 2 commits
  13. 16 Jul, 2024 4 commits