To address the growing demands of distributed inference serving, NVIDIA introduced Dynamo.
The following diagram outlines Dynamo's high-level architecture. To enable large-scale distributed and disaggregated inference serving, Dynamo includes four key features.
Dynamo is a Python-based SDK for building and deploying distributed inference applications. It leverages concepts from open-source projects like [BentoML](https://github.com/bentoml/bentoml) to provide a developer-friendly experience for going from local development to K8s deployment.
## Installation
```bash
pip install ai-dynamo
```
## Quickstart
Let's build a simple distributed pipeline with three components: `Frontend`, `Middle`, and `Backend`. The structure of the pipeline looks like this:
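```
Client ──HTTP──▶ Frontend ──▶ Middle ──▶ Backend
```

Below is a minimal sketch of what a `pipeline.py` implementing these three components could look like. The decorator and helper names (`service`, `dynamo_endpoint`, `api`, `depends`) and the request-passing details are assumptions about the SDK surface rather than a verbatim copy of its API; treat this as an illustration and consult the SDK documentation linked at the end of this section for the exact interface.

```python
# pipeline.py -- sketch of a three-stage Dynamo pipeline.
# NOTE: the decorator/helper names below are assumptions; see the SDK docs for the exact API.
from pydantic import BaseModel

from dynamo.sdk import api, depends, dynamo_endpoint, service


class Request(BaseModel):
    text: str


@service(dynamo={"enabled": True, "namespace": "inference"})
class Backend:
    """Final stage: appends '-back' to the incoming text."""

    @dynamo_endpoint()
    async def generate(self, req: Request):
        yield f"{req.text}-back"


@service(dynamo={"enabled": True, "namespace": "inference"})
class Middle:
    """Middle stage: appends '-mid' and forwards the request to Backend."""

    backend = depends(Backend)

    @dynamo_endpoint()
    async def generate(self, req: Request):
        next_request = Request(text=f"{req.text}-mid").model_dump_json()
        async for text in self.backend.generate(next_request):
            yield text


@service()
class Frontend:
    """Entry point: serves an HTTP POST /generate endpoint and streams results."""

    middle = depends(Middle)

    @api
    async def generate(self, text: str):
        async for result in self.middle.generate(Request(text=text).model_dump_json()):
            yield result
```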
You can run this pipeline locally by first spinning up ETCD and NATS:
```bash
# Spin up ETCD and NATS
docker compose -f deploy/docker-compose.yml up -d
```
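Before starting the pipeline, you can check that both containers are running, for example:
```bash
# List the services started by the compose file and their status
docker compose -f deploy/docker-compose.yml ps
```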
Then serve the pipeline:
```bash
# Run the pipeline
dynamo serve pipeline:Frontend
```
Once it's up and running, you can make a request to the pipeline:
```bash
curl -X POST http://localhost:3000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "federer"}'
```
You should see the following output, with the `Middle` and `Backend` components each appending a suffix to the request text:
```bash
federer-mid-back
```
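The same request can also be made from Python, for example with the `requests` library (a minimal sketch; it assumes the pipeline is still serving on `localhost:3000` as above):

```python
import requests

# POST the same payload the curl example sends and print the returned text.
response = requests.post(
    "http://localhost:3000/generate",
    json={"text": "federer"},
)
print(response.text)  # expected to contain: federer-mid-back
```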
You can find in-depth documentation for the Dynamo SDK [here](../../deploy/dynamo/sdk/docs/sdk/README.md) and for the Dynamo CLI [here](../../deploy/dynamo/sdk/docs/cli/README.md).