# API

## Endpoints

- [Generate a completion](#generate-a-completion)
- [Create a model](#create-a-model)
- [List local models](#list-local-models)
- [Copy a model](#copy-a-model)
- [Delete a model](#delete-a-model)
- [Pull a model](#pull-a-model)
- [Generate embeddings](#generate-embeddings)

## Conventions

### Model names

Model names follow a `model:tag` format. Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, defaults to `latest`. The tag is used to identify a specific version.

### Durations

All durations are returned in nanoseconds.

## Generate a completion

```
POST /api/generate
```

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for

Advanced parameters:

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt to use (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory

### Request

```
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'
```

### Response

A stream of JSON objects:

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}
```

The final response in the stream also includes additional data about the generation:

- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `sample_count`: number of samples generated
- `sample_duration`: time spent generating samples
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by 10^9 (since durations are reported in nanoseconds).

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```
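
The curl examples above send a single request; the sketch below, using only Python's standard library, consumes the stream, keeps the returned `context` for a follow-up request, and derives tokens per second from the final statistics. It assumes the default server address `http://localhost:11434` and that each JSON object in the stream arrives on its own line.

```python
import json
import urllib.request

def generate(prompt, context=None):
    body = {"model": "llama2:7b", "prompt": prompt}
    if context is not None:
        body["context"] = context  # reuse the conversation state from a previous call
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # assumes one JSON object per line
            if not line.strip():
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                # the final object carries the statistics documented above
                print(f"\n{chunk['eval_count'] / chunk['eval_duration'] * 1e9:.1f} token/s")
                return chunk.get("context")
            print(chunk["response"], end="", flush=True)

ctx = generate("Why is the sky blue?")
generate("Summarize that in one sentence.", context=ctx)
```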

## Create a Model

```
POST /api/create
```

Create a model from a [`Modelfile`](./modelfile.md).

### Parameters

- `name`: name of the model to create
- `path`: path to the Modelfile

### Request

```
curl -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}'
```

### Response

A stream of JSON objects. When finished, `status` is `success`.

```json
{
  "status": "parsing modelfile"
}
```
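
As a rough sketch (same standard-library approach as the generate example above), the snippet below sends the create request and prints each status update until `success` is reported. The model name and Modelfile path are taken from the request example above and assume the Modelfile exists at that path.

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/create",
    data=json.dumps({"name": "mario", "path": "~/Modelfile"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:  # one status object per line (assumption)
        if not line.strip():
            continue
        status = json.loads(line)["status"]
        print(status)
        if status == "success":
            break
```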

## List Local Models

```
GET /api/tags
```

List models that are available locally.

### Request

```
curl http://localhost:11434/api/tags
```

### Response

```json
{
  "models": [
    {
      "name": "llama2:7b",
      "modified_at": "2023-08-02T17:02:23.713454393-07:00",
      "size": 3791730596
    },
    {
      "name": "llama2:13b",
      "modified_at": "2023-08-08T12:08:38.093596297-07:00",
      "size": 7323310500
    }
  ]
}
```

## Copy a Model

```
POST /api/copy
```

Copy a model. Creates a model with another name from an existing model.

### Request

```
curl http://localhost:11434/api/copy -d '{
  "source": "llama2:7b",
  "destination": "llama2-backup"
}'
```

## Delete a Model

```
DELETE /api/delete
```

Delete a model and its data.

### Parameters

- `name`: model name to delete

### Request

```
curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama2:13b"
}'
```

## Pull a Model

```
POST /api/pull
```

Download a model from the model registry. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

### Parameters

- `name`: name of the model to pull

### Request

```
curl -X POST http://localhost:11434/api/pull -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208
}
```
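
A minimal sketch of watching the pull stream from Python, assuming one JSON object per line and only the fields shown in the example above (`status`, `digest`, `total`):

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/pull",
    data=json.dumps({"name": "llama2:7b"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:  # one status object per line (assumption)
        if not line.strip():
            continue
        update = json.loads(line)
        # only "status", "digest" and "total" are documented above; other fields may appear
        print(update.get("status"), update.get("digest", ""), update.get("total", ""))
```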

## Generate Embeddings

```
POST /api/embeddings
```

Generate embeddings from a model.

### Parameters

- `model`: name of model to generate embeddings from
- `prompt`: text to generate embeddings for

### Request

```
curl -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'
```

### Response

```json
{
  "embeddings": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
    0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
  ]
}
```
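
As an illustration of using the result, the sketch below (standard library only) fetches embeddings for two prompts and compares them with cosine similarity. It reads the `embeddings` field as shown in the example response above; the prompts themselves are made up for illustration.

```python
import json
import math
import urllib.request

def embed(prompt):
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=json.dumps({"model": "llama2:7b", "prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]  # field name as shown in the example above

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# hypothetical prompts, just for illustration
print(cosine(embed("Here is an article about llamas..."),
             embed("Llamas are members of the camelid family.")))
```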