1. 06 Dec, 2023 1 commit
    • Report the inference benchmark of models with different size (#794) · ebe90bc9
      Lyu Han authored
      * update test scripts for models with different sizes
      
      * update
      
      * only test after tuning gemm
      
      * chmod +x
      
      * fix typo
      
      * benchmark on a100
      
      * fix typo
      
      * fix typo
      
      * per-token latency percentile in profile_throughput
      
      * fix
      
      * fix
      
      * rename
      
      * make the script accept parameters
      
      * minor fix
      
      * indent
      
      * reformat table
      
      * change to 3000
      
      * minor fix