Commit 8492333b authored by Piotr Marcinkiewicz, committed by GitHub

Adjustment for multi node benchmark


Signed-off-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
parent 04e50aba
@@ -36,7 +36,7 @@ throughput within the SLA.
 For example for input sequence length 3000 and output sequence length
 150 after sweeping different tensor parallellism strategies on two
-8 x H100 GPU nodes, we've found that using 2 instances of TP 4 for
+8 x H100 GPU nodes, we've found that using 4 instances of TP 2 for
 context (on one node) and using 1 instance of TP 8 for generate (on
 the second node) gives the best throughput at a latency target of 50
 tokens per sec per user.
@@ -102,11 +102,10 @@ genai-perf profile \
 The following results are given as an example, are not fully
 optimized, and do not indicate what you may get locally.
-| label | configuration | concurrency | output_token_throughput_per_request | output_token_throughput_per_gpu | time_to_first_token | inter_token_latency |
+| label | configuration | concurrency | output token throughput per request | output token throughput per gpu | time to first token | inter token latency |
 |----------|--------------------------------|-------------|-------------------------------------|---------------------------------|---------------------|---------------------|
-| disagg | context_tp2dp4_generate_tp8dp1 | 48 | 49.18197330348195 | 87.55798331 | 1157.4852116520833 | 15.935926391666667 |
-| baseline | baseline_tp4dp1 | 4 | 50.27116554062172 | 56.26445983 | 709.2506074249999 | 15.265875249999999 |
+| disagg | context tp2dp4 generate tp8dp1 | 48 | 49.18197330348195 | 87.55798331 | 1157.4852116520833 | 15.935926391666667 |
+| baseline | baseline tp4dp1 | 4 | 50.27116554062172 | 56.26445983 | 709.2506074249999 | 15.265875249999999 |
 ### Baseline Comparison
......
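The configuration labels in the results table appear to encode each role as `tp<N>dp<M>`. Reading N as the tensor-parallel degree and M as the number of instances is an assumption, though it agrees with the "4 instances of TP 2" wording for `tp2dp4`. Under that reading, the sketch below recovers the GPU count of each configuration and compares the per-GPU output token throughput values copied from the table. `gpus_in_label` is a hypothetical helper, not part of genai-perf.

```python
import re

# Hypothetical label format: tp<N>dp<M>, N = tensor-parallel degree,
# M = number of instances (assumed, not documented by the source).
LABEL_PART = re.compile(r"tp(\d+)dp(\d+)")


def gpus_in_label(label: str) -> int:
    """Sum tp * dp over every tp<N>dp<M> group found in a configuration label."""
    groups = LABEL_PART.findall(label)
    if not groups:
        raise ValueError(f"no tp<N>dp<M> group in {label!r}")
    return sum(int(tp) * int(dp) for tp, dp in groups)


# Output token throughput per GPU, copied from the results table.
results = {
    "context tp2dp4 generate tp8dp1": 87.55798331,
    "baseline tp4dp1": 56.26445983,
}

for label, per_gpu in results.items():
    print(f"{label}: {gpus_in_label(label)} GPUs, "
          f"output token throughput per GPU {per_gpu:.2f}")

# Relative per-GPU throughput of the disaggregated setup versus the baseline.
ratio = results["context tp2dp4 generate tp8dp1"] / results["baseline tp4dp1"]
print(f"per-GPU throughput ratio (disagg / baseline): {ratio:.3f}")
```

The pattern matches both the underscore labels removed by this commit and the space-separated labels that replace them. On the table's numbers the disaggregated setup reaches roughly 1.56x the baseline's per-GPU output token throughput, with the same caveat as the table: these results are not fully optimized.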