    Report first-token-latency and token-latency percentiles (#736) · 5c9e1e28
    Lyu Han authored
    * update profile scripts
    
    * add top_p, top_k and temperature as input arguments
    
    * fix input_ids
    
    * update profile_throughput
    
    * update profile_restful_api
    
    * update profile_serving
    
    * update
    
    * update
    
    * add progress bar
    
    * remove TODO comments
    
    * update
    
    * remove useless profile_* argument
    
    * remove log level
    
    * change concurrency default value to 64
    
    * update restful_api.md
    
    * update according to review comments
    
    * fix docstring
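The commit title says the profiling scripts now report percentiles of first-token latency and per-token latency. The message also says that top_p, top_k and temperature became input arguments and that the default concurrency changed to 64. Below is a minimal Python sketch of that kind of reporting. It is not the code in profile_generation.py. The flag names mirror the commit message, but the defaults for the sampling parameters are assumptions. The helpers `parse_args`, `percentile` and `summarize` are hypothetical. The latency definitions are also assumptions: first-token latency is measured from request send to first token, and token latency is the gap between consecutive tokens.

```python
import argparse
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record: (send time, arrival time of each generated token), in seconds.
Request = Tuple[float, List[float]]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical CLI mirroring the commit message; only the concurrency
    # default of 64 comes from the source, the other defaults are assumptions.
    parser = argparse.ArgumentParser(description='Latency profiling sketch')
    parser.add_argument('--concurrency', type=int, default=64)
    parser.add_argument('--top_p', type=float, default=1.0)
    parser.add_argument('--top_k', type=int, default=1)
    parser.add_argument('--temperature', type=float, default=1.0)
    return parser.parse_args(argv)


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0 <= q <= 100), interpolating linearly
    between the two nearest order statistics."""
    if not values:
        raise ValueError('values must be non-empty')
    if not 0 <= q <= 100:
        raise ValueError('q must be in [0, 100]')
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def summarize(requests: Sequence[Request],
              qs: Sequence[int] = (50, 75, 95, 99)) -> Dict[str, Dict[int, float]]:
    first_token: List[float] = []
    token: List[float] = []
    for start, stamps in requests:
        if not stamps:
            continue  # request produced no tokens; nothing to measure
        # Assumed first-token latency: send time -> first token arrival.
        first_token.append(stamps[0] - start)
        # Assumed token latency: gaps between consecutive tokens of one request.
        token.extend(b - a for a, b in zip(stamps, stamps[1:]))
    report: Dict[str, Dict[int, float]] = {}
    if first_token:
        report['first_token_latency'] = {q: percentile(first_token, q) for q in qs}
    if token:
        report['token_latency'] = {q: percentile(token, q) for q in qs}
    return report


if __name__ == '__main__':
    args = parse_args()
    print(f'concurrency={args.concurrency}')
    demo = [(0.0, [0.5, 0.6, 0.7]), (1.0, [1.2, 1.3])]
    for name, pcts in summarize(demo).items():
        print(name, {q: round(v, 3) for q, v in pcts.items()})
```

The sketch pools token gaps from all requests before it takes percentiles. The alternative is to average per request first, which hides tail stalls inside long generations. The pooled form is a design choice made for this illustration and may not match the real scripts.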
profile_generation.py 15.4 KB