    Report the inference benchmark of models with different size (#794) · ebe90bc9
    Lyu Han authored
    * update test scripts for models with different sizes
    
    * update
    
    * only test after tuning gemm
    
    * chmod +x
    
    * fix typo
    
    * benchmark on a100
    
    * fix typo
    
    * fix typo
    
    * per-token latency percentile in profile_throughput
    
    * fix
    
    * fix
    
    * rename
    
    * make the script accept parameters
    
    * minor fix
    
    * indent
    
    * reformat table
    
    * change to 3000
    
    * minor fix
benchmark_70b.sh 1.9 KB