1. 06 Dec, 2024 1 commit
    • Auto max prefill (#2797) · 5df80590
      Nicolas Patry authored
      * Attempt at automatic max batch prefill.
      
      * Taking into account number of shards.
      
      * Adding more cards.
      
      * Adding A100 + H100
      
      * Adding a few more cards.
      
      * Logprobs cost too much.
      
      * h100 better name, and keep factor of 2
      
      * Damn inflated sparse tflops.
      
      * Typo in h100.
      
      * Updated the flops calculation (checked with fvcore).
      
      * chunking by default.
      
      * Fix prefix caching for chat completion since we removed logprobs.
      
      * More tests.
      
      * Dropping all the prefill logprobs.
      
      * Add a flag that enables users to get logprobs back.
      
      * Repairing prompt token counting.
      
      * Fixing a few tests.
      
      * Remove some scaffolding.
      
      * Attempting to reduce the issues (workarounds for now).
  2. 24 Oct, 2024 1 commit
  3. 30 Sep, 2024 1 commit
    • feat: support phi3.5 moe (#2479) · 93a7042d
      drbh authored
      * feat: support phi3.5 moe model loading
      
      * fix: prefer llama base model and improve rotary logic
      
      * feat: return reasonable generation and add integration test
      
      * fix: run lint and update docs
      
      * fix: rerun lint for openapi docs
      
      * fix: prefer do_sample false unless temp is set by user, and update chat tests
      
      * fix: small typo adjustments
      
      * fix: consolidate long rope paths
      
      * fix: revert greedy by default and test changes
      
      * Vendor configuration so that we don't have to `trust_remote_code`
      
      * Use SparseMoELayer
      
      * Add support for dense MoE
      
      * Some type annotations
      
      * Add the usual model tests
      
      * Ruff.
      
      ---------
      Co-authored-by: Daniël de Kok <me@danieldk.eu>
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
  4. 16 Sep, 2024 1 commit