Report the inference benchmark of models with different sizes (#794)
* update test scripts for models with different sizes
* update
* only test after tuning gemm
* chmod +x
* fix typo
* benchmark on a100
* fix typo
* fix typo
* per-token latency percentile in profile_throughput
* fix
* fix
* rename
* make the script accept parameters
* minor fix
* indent
* reformat table
* change to 3000
* minor fix
benchmark/benchmark_13b.sh: 0 → 100755
benchmark/benchmark_20b.sh: 0 → 100755
benchmark/benchmark_70b.sh: 0 → 100755
benchmark/benchmark_7b.sh: 0 → 100755