# Dynamo SDK

Dynamo is a Python-based SDK for building and deploying distributed inference applications. It borrows concepts from open-source projects such as [BentoML](https://github.com/bentoml/bentoml) to provide a developer-friendly path from local development to Kubernetes (K8s) deployment.

## Installation

```bash
pip install ai-dynamo
```

## Quickstart
Let's build a simple distributed pipeline with three components: `Frontend`, `Middle`, and `Backend`. The pipeline is structured as follows:

```
Users/Clients (HTTP)
      │
      ▼
┌─────────────┐
│  Frontend   │  HTTP API endpoint (/generate)
└─────────────┘
      │
      ▼
┌─────────────┐
│   Middle    │
└─────────────┘
      │
      ▼
┌─────────────┐
│  Backend    │
└─────────────┘
```

The code for the pipeline looks like this:

```python
# filename: pipeline.py

from dynamo.sdk import service, dynamo_endpoint, depends, api
from pydantic import BaseModel

class RequestType(BaseModel):
    text: str

# Services are defined leaf-first so that each depends(...) call
# refers to a class that already exists.
@service(
    resources={"cpu": "1"},
    dynamo={"enabled": True, "namespace": "inference"}
)
class Backend:
    @dynamo_endpoint()
    async def generate(self, req: RequestType):
        # Last stage: append the suffix and stream the result token by token.
        text = f"{req.text}-back"
        for token in text.split():
            yield f"Backend: {token}"

@service(
    resources={"cpu": "1"},
    dynamo={"enabled": True, "namespace": "inference"}
)
class Middle:
    backend = depends(Backend)

    @dynamo_endpoint()
    async def generate(self, req: RequestType):
        # Append the suffix, forward to Backend, and relay its stream.
        next_request = RequestType(text=f"{req.text}-mid")
        async for response in self.backend.generate(next_request.model_dump_json()):
            yield f"Mid: {response}"

@service(resources={"cpu": "1"})
class Frontend:
    middle = depends(Middle)

    @api
    async def generate(self, text: str):
        # HTTP entry point: wrap the text and stream Middle's responses.
        request = RequestType(text=text)
        async for response in self.middle.generate(request.model_dump_json()):
            yield f"Frontend: {response}"
```
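
To see how streamed responses compose across the three stages without running Dynamo, here is a minimal sketch in plain `asyncio`. It is not the Dynamo API: the function names (`backend_generate`, `middle_generate`, `frontend_generate`, `collect`) are hypothetical stand-ins for the services, and the sketch assumes each stage appends its suffix, forwards the text downstream, and prefixes every chunk it relays.

```python
import asyncio
from typing import AsyncIterator


async def backend_generate(text: str) -> AsyncIterator[str]:
    # Last stage: append its suffix and stream the result token by token.
    for token in f"{text}-back".split():
        yield f"Backend: {token}"


async def middle_generate(text: str) -> AsyncIterator[str]:
    # Middle stage: append its suffix, call downstream, relay each chunk.
    async for response in backend_generate(f"{text}-mid"):
        yield f"Mid: {response}"


async def frontend_generate(text: str) -> AsyncIterator[str]:
    # Entry point: forward the request and relay each streamed chunk.
    async for response in middle_generate(text):
        yield f"Frontend: {response}"


async def collect(text: str) -> list[str]:
    # Drain the stream into a list so the chunks can be inspected.
    return [chunk async for chunk in frontend_generate(text)]


if __name__ == "__main__":
    print(asyncio.run(collect("federer")))
```

Because the last stage splits on whitespace, a single-word input produces one chunk, and each additional word produces one more chunk.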

To run this pipeline locally, first start etcd and NATS:

```bash
# Spin up ETCD and NATS
docker compose -f deploy/docker-compose.yml up -d
```
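
If the pipeline later fails to start, it can help to confirm that both services accept connections. The sketch below is a hypothetical helper, not part of Dynamo. It assumes the compose file exposes etcd and NATS on their usual default client ports (2379 and 4222) on `localhost`, so adjust `SERVICES` if your setup differs.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Assumed defaults: etcd client port 2379, NATS client port 4222.
SERVICES = {"etcd": 2379, "NATS": 4222}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {state}")
```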

Then start the pipeline:

```bash
# Run the pipeline
dynamo serve pipeline:Frontend
```

Once it's up and running, send a request to the pipeline:

```bash
curl -X POST http://localhost:3000/generate \
    -H "Content-Type: application/json" \
    -d '{"text": "federer"}'
```
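
The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper names `build_generate_request` and `post_generate` are hypothetical, and it assumes the server listens on `localhost:3000` as in the curl command and returns a text body.

```python
import json
import urllib.request


def build_generate_request(
    text: str, base_url: str = "http://localhost:3000"
) -> urllib.request.Request:
    """Build the same POST /generate request as the curl command."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_generate(text: str, timeout: float = 30.0) -> str:
    """Send the request and return the full response body as text.

    Requires the pipeline to be running locally.
    """
    request = build_generate_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the pipeline running, `print(post_generate("federer"))` issues the request and prints whatever the server returns.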

The response should contain the input text with the `-mid` and `-back` suffixes appended:

```
federer-mid-back
```

In-depth documentation is available for the [Dynamo SDK](../../deploy/dynamo/sdk/docs/sdk/README.md) and the [Dynamo CLI](../../deploy/dynamo/sdk/docs/cli/README.md).