    [Infer] Serving example w/ ray-serve (multiple GPU case) (#4841) · 573f2705
    Yuanheng Zhao authored
    * fix imports
    
    * add ray-serve with Colossal-Infer tp
    
    * trivial: send requests script
    
    * add README
    
    * fix worker port
    
    * fix readme
    
    * use app builder and autoscaling
    
    * trivial: input args
    
    * clean code; revise readme
    
    * testci (skip example test)
    
    * use auto model/tokenizer
    
    * revert imports fix (fixed in other PRs)
send_requests.py 764 Bytes