• Choosing input/total tokens automatically based on available VRAM? (#2673) · 0c9b6cdd
    Nicolas Patry authored
    * Choosing input/total tokens automatically based on available VRAM?
    
    * Update doc.
    
    * Remove generated files.
    
    * Trying to fix non chunking targets.
    
    * Attempt #2
    
    * fix.
    
    * QuantLinear is rocm compatible.
    
    * Much simpler logic after the overhead.
    
    * Updating logic + non flash.
    
    * Revert doc text.
    
    * Simple updates.
    
    * Fix integration mt0 (transformers update).
server.py 11.1 KB