1. 06 Dec, 2024 1 commit
    • Nicolas Patry's avatar
      Auto max prefill (#2797) · 5df80590
      Nicolas Patry authored
      * Attempt at automatic max batch prefill.
      
      * Taking into account number of shards.
      
      * Adding more cards.
      
      * Adding A100 + H100
      
      * Adding a few more cards.
      
      * Logprobs cost too much.
      
      * h100 better name, and keep factor of 2
      
      * Damn inflated sparse tflops.
      
      * Typo in h100.
      
      * Updated the flops calculation (checked with fvcore).
      
      * chunking by default.
      
      * Fix prefix caching for chat completion since we removed logprobs.
      
      * More tests.
      
      * Dropping all the prefill logprobs.
      
      * Add a flag that enables users to get logprobs back.
      
      * Repairing prompt token counting.
      
      * Fixing a few tests.
      
      * Remove some scaffolding.
      
      * Attempting to reduces the issues (workarounds for now).
      5df80590
  2. 15 Oct, 2024 1 commit
    • Alvaro Bartolome's avatar
      Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat` (#2651) · ffe05ccd
      Alvaro Bartolome authored
      As spotted by @philschmid, the payload was compliant with Vertex AI, but
      just partially, since ideally the most compliant version would be with
      the generation kwargs flattened to be on the same level as the
      `messages`; meaning that Vertex AI would still expect a list of
      instances, but each instance would be an OpenAI-compatible instance,
      which is more clear; and more aligned with the SageMaker integration
      too, so kudos to him for spotting that; and sorry from my end for any
      inconvenience @Narsil.
      ffe05ccd
  3. 24 Sep, 2024 1 commit
    • Nicolas Patry's avatar
      Cleanup Vertex + Chat (#2553) · c032280b
      Nicolas Patry authored
      * Cleanup Vertex + Chat
      
      * logprobs defaults to false.
      
      * Parameters are optional
      
      * Fix  docs.
      
      * Changing back this logprobs default.
      
      * Fixup doc.
      
      * Let's debug that.
      
      * Not unstable.
      
      * Updating Cargo ?
      
      * Wat?
      
      * Dummy change.
      
      * Trying some other install.
      
      * Trying smething.
      
      * Revert everything.
      
      * Update Cargo lock.
      
      * Fixing the pre-commit after rebase.
      c032280b