1. 18 Jun, 2024 2 commits
    • Tighten up memory prediction logging · 7784ca33
      Daniel Hiltgen authored
      Prior to this change, we logged the memory prediction multiple times
      as the scheduler iterated to find a suitable configuration, which could be
      confusing since only the last log before the server starts is actually valid.
      This change logs the prediction once, just before starting the server on the
      final configuration. It also reports which library is in use instead of always
      saying "offloading to gpu", even when using CPU.
    • Merge pull request #5105 from dhiltgen/cuda_mmap · c9c8c98b
      Daniel Hiltgen authored
      Adjust mmap logic for cuda windows for faster model load
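The logging change described in 7784ca33 can be pictured with a minimal Go sketch. It is not the project's actual code: the `candidate` type, `pickConfig`, `describe`, and the memory figures are all hypothetical names and values chosen for illustration. The point is only the pattern: iterate quietly over candidate configurations, then emit a single log line for the configuration that is actually used, naming its library.

```go
package main

import "fmt"

// candidate is a hypothetical configuration the scheduler might consider.
type candidate struct {
	Library     string // assumed labels such as "cuda" or "cpu"
	Layers      int    // number of layers offloaded
	RequiredMiB int    // predicted memory requirement
}

// pickConfig returns the first candidate that fits in availMiB.
// It deliberately does no logging while iterating.
func pickConfig(cands []candidate, availMiB int) (candidate, bool) {
	for _, c := range cands {
		if c.RequiredMiB <= availMiB {
			return c, true
		}
	}
	return candidate{}, false
}

// describe builds the one log line for the final configuration,
// naming the library instead of assuming a GPU.
func describe(c candidate) string {
	return fmt.Sprintf("offload to %s: layers=%d required=%dMiB",
		c.Library, c.Layers, c.RequiredMiB)
}

func main() {
	cands := []candidate{
		{Library: "cuda", Layers: 33, RequiredMiB: 9000},
		{Library: "cuda", Layers: 20, RequiredMiB: 6000},
		{Library: "cpu", Layers: 0, RequiredMiB: 4000},
	}
	// Log once, only after the final configuration is known.
	if c, ok := pickConfig(cands, 5000); ok {
		fmt.Println(describe(c))
	}
}
```

Keeping the message construction in a separate function from the search loop is one way to guarantee that intermediate, ultimately discarded predictions never reach the log.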
  2. 17 Jun, 2024 9 commits
  3. 16 Jun, 2024 4 commits
  4. 15 Jun, 2024 9 commits
  5. 14 Jun, 2024 16 commits