OpenDAS / dynamo
Commit 73e0f8ca (Unverified)
Authored Jun 12, 2025 by J Wyman; committed by GitHub, Jun 12, 2025
docs: Fix Markdown Render Error (#1502)
Parent: 0e7d4d82
Showing 1 changed file with 22 additions and 21 deletions
examples/llm/benchmarks/README.md
...
...
@@ -85,8 +85,8 @@ With the Dynamo repository, benchmarking image and model available, and **NATS a
./container/run.sh --mount-workspace
```
- > [!Tip]
- > The huggingface home source mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
+ > [!Tip]
+ > The huggingface home source mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
2. Start disaggregated services
...
...
@@ -95,8 +95,8 @@ With the Dynamo repository, benchmarking image and model available, and **NATS a
dynamo serve benchmarks.disagg:Frontend -f benchmarks/disagg.yaml 1> disagg.log 2>&1 &
```
- > [!Tip]
- > Check the `disagg.log` to make sure the service is fully started before collecting performance numbers.
+ > [!Tip]
+ > Check the `disagg.log` to make sure the service is fully started before collecting performance numbers.
3. Collect the performance numbers:
...
...
@@ -130,8 +130,8 @@ With the Dynamo repository, benchmarking image and model available, and **NATS a
./container/run.sh --mount-workspace
```
- > [!Tip]
- > The huggingface home source mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
+ > [!Tip]
+ > The huggingface home source mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
2. Config NATS and ETCD (node 1)
...
...
@@ -140,8 +140,8 @@ With the Dynamo repository, benchmarking image and model available, and **NATS a
export ETCD_ENDPOINTS="<node_0_ip_addr>:2379"
```
- > [!Important]
- > Node 1 must be able to reach Node 0 over the network for the above services.
+ > [!Important]
+ > Node 1 must be able to reach Node 0 over the network for the above services.
3. Start workers (node 0)
...
...
@@ -150,8 +150,8 @@ With the Dynamo repository, benchmarking image and model available, and **NATS a
dynamo serve benchmarks.disagg_multinode:Frontend -f benchmarks/disagg_multinode.yaml 1> disagg_multinode.log 2>&1 &
```
- > [!Tip]
- > Check the `disagg_multinode.log` to make sure the service is fully started before collecting performance numbers.
+ > [!Tip]
+ > Check the `disagg_multinode.log` to make sure the service is fully started before collecting performance numbers.
4. Start workers (node 1)
...
...
@@ -160,8 +160,8 @@ With the Dynamo repository, benchmarking image and model available, and **NATS a
dynamo serve components.prefill_worker:PrefillWorker -f benchmarks/disagg_multinode.yaml 1> prefill_multinode.log 2>&1 &
```
- > [!Tip]
- > Check the `prefill_multinode.log` to make sure the service is fully started before collecting performance numbers.
+ > [!Tip]
+ > Check the `prefill_multinode.log` to make sure the service is fully started before collecting performance numbers.
5. Collect the performance numbers:
...
...
@@ -188,8 +188,8 @@ With the Dynamo repository and the benchmarking image available, perform the fol
./container/run.sh --mount-workspace
```
- > [!Tip]
- > The Hugging Face home source mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
+ > [!Tip]
+ > The Hugging Face home source mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
2. Start vLLM serve
...
...
@@ -212,10 +212,10 @@ With the Dynamo repository and the benchmarking image available, perform the fol
--port 8002 1> vllm_1.log 2>&1 &
```
- > [!Tip]
- > Check the `vllm_0.log` and `vllm_1.log` to make sure the service is fully started before collecting performance numbers.
- >
- > If benchmarking with two or more nodes, `--tensor-parallel-size 8` should be used and only run one `vllm serve` instance per node.
+ > [!Tip]
+ > Check the `vllm_0.log` and `vllm_1.log` to make sure the service is fully started before collecting performance numbers.
+ >
+ > If benchmarking with two or more nodes, `--tensor-parallel-size 8` should be used and only run one `vllm serve` instance per node.
3. Use NGINX as load balancer
...
...
@@ -225,8 +225,8 @@ With the Dynamo repository and the benchmarking image available, perform the fol
service nginx restart
```
- > [!Note]
- > If benchmarking over 2 nodes, the `upstream` configuration will need to be updated to link to the `vllm serve` on the second node.
+ > [!Note]
+ > If benchmarking over 2 nodes, the `upstream` configuration will need to be updated to link to the `vllm serve` on the second node.
4. Collect the performance numbers:
...
...
@@ -258,7 +258,8 @@ Note: As each `perf.sh` adds a new artifacts directory in the `artifacts_root` a
> @ [GitHub](https://github.com/triton-inference-server/perf_analyzer) for additional information about how to run GenAI-Perf
> and how to interpret results.
- ## Iterpreting Results
+ ## Interpreting Results
### Plotting Pareto Graphs
...
...
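
Reviewer note on the multinode hunk: the `[!Important]` callout requires that Node 1 can reach Node 0, and this can be checked before starting any workers. The following is a minimal sketch, not part of the README. The helper names `check_port` and `check_endpoint` are hypothetical, and it assumes `ETCD_ENDPOINTS` holds a single `host:port` value with no URL scheme, as in the `export` line shown in the diff. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the README): succeed if a TCP connection
# to host:port can be opened within 3 seconds.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Hypothetical helper: same check for an endpoint written as "host:port".
# Assumes no URL scheme and a single endpoint.
check_endpoint() {
  local endpoint="$1"
  check_port "${endpoint%:*}" "${endpoint##*:}"
}

# Only runs when ETCD_ENDPOINTS is set, e.g. after the export step on node 1.
if [ -n "${ETCD_ENDPOINTS:-}" ]; then
  if check_endpoint "${ETCD_ENDPOINTS}"; then
    echo "etcd endpoint ${ETCD_ENDPOINTS} is reachable"
  else
    echo "etcd endpoint ${ETCD_ENDPOINTS} is NOT reachable" >&2
  fi
fi
```

The same function could be pointed at the NATS server on Node 0, using whatever port that server was started with. A successful TCP connection only shows that the port is open; it does not confirm that the service behind it is healthy.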