# API

## Endpoints

- [Generate a completion](#generate-a-completion)
- [Create a Model](#create-a-model)
- [List Local Models](#list-local-models)
- [Show Model Information](#show-model-information)
- [Copy a Model](#copy-a-model)
- [Delete a Model](#delete-a-model)
- [Pull a Model](#pull-a-model)
- [Push a Model](#push-a-model)
- [Generate Embeddings](#generate-embeddings)

## Conventions

### Model names

Model names follow a `model:tag` format. Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.
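
The default-tag rule can be applied client-side. A minimal sketch (illustrative only, not part of the API):

```python
def parse_model_name(name: str) -> tuple:
    """Split a `model:tag` reference; the tag defaults to "latest"."""
    model, sep, tag = name.partition(":")
    return model, tag if sep else "latest"

print(parse_model_name("orca-mini:3b-q4_1"))  # ('orca-mini', '3b-q4_1')
print(parse_model_name("llama2"))             # ('llama2', 'latest')
```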

### Durations

All durations are returned in nanoseconds.

### Streaming responses

Certain endpoints stream responses as JSON objects delimited by newline (`\n`) characters.
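
Once the raw stream has been read, each line can be decoded independently. A minimal client-side sketch (the payload below is a stand-in, not a real server response):

```python
import json

def parse_stream(raw: str) -> list:
    """Decode a newline-delimited JSON stream into a list of objects."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

raw = '{"response": "The", "done": false}\n{"response": "", "done": true}\n'
objects = parse_stream(raw)
print(len(objects))  # 2
```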

## Generate a completion

```shell
POST /api/generate
```

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so the reply will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for

Advanced parameters (optional):

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
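
As an illustration of the `context` parameter, a follow-up request can carry forward the `context` array from the final response of the previous call. A sketch (the helper function below is hypothetical, not part of the API):

```python
import json
from typing import Optional

def next_request(model: str, prompt: str, previous: Optional[dict] = None) -> str:
    """Build a /api/generate body, reusing `context` from a prior final response."""
    body = {"model": model, "prompt": prompt}
    if previous and "context" in previous:
        body["context"] = previous["context"]
    return json.dumps(body)

final = {"done": True, "context": [1, 2, 3]}  # final object from a prior stream
print(next_request("llama2:7b", "And why is it red at sunset?", final))
```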

### Request

```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'
```

### Response

A stream of JSON objects:

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}
```

The final response in the stream also includes additional data about the generation:

- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `sample_count`: number of samples generated
- `sample_duration`: time spent generating samples
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
- `response`: empty if the response was streamed; if not streamed, this will contain the full response

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by 10^9 (durations are reported in nanoseconds).

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```
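
Applied to the example above, the token/s calculation is:

```python
# Values taken from the final response example above; durations are nanoseconds.
final = {"eval_count": 113, "eval_duration": 1325948000}
tokens_per_second = final["eval_count"] / final["eval_duration"] * 1e9
print(round(tokens_per_second, 1))  # 85.2
```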

## Create a Model

```shell
POST /api/create
```

Create a model from a [`Modelfile`](./modelfile.md).

### Parameters

- `name`: name of the model to create
- `path`: path to the Modelfile
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects

### Request

```shell
curl -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}'
```

### Response

A stream of JSON objects. When finished, `status` is `success`.

```json
{
  "status": "parsing modelfile"
}
```

## List Local Models

```shell
GET /api/tags
```

List models that are available locally.

### Request

```shell
curl http://localhost:11434/api/tags
```

### Response

```json
{
  "models": [
    {
      "name": "llama2:7b",
      "modified_at": "2023-08-02T17:02:23.713454393-07:00",
      "size": 3791730596
    },
    {
      "name": "llama2:13b",
      "modified_at": "2023-08-08T12:08:38.093596297-07:00",
      "size": 7323310500
    }
  ]
}
```

## Show Model Information

```shell
POST /api/show
```

Show details about a model including modelfile, template, parameters, license, and system prompt.

### Parameters

- `name`: name of the model to show

### Request

```shell
curl http://localhost:11434/api/show -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
  "license": "<contents of license block>",
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llama2:latest\n\nFROM /Users/username/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8\nTEMPLATE \"\"\"[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] \"\"\"\nSYSTEM \"\"\"\"\"\"\nPARAMETER stop [INST]\nPARAMETER stop [/INST]\nPARAMETER stop <<SYS>>\nPARAMETER stop <</SYS>>\n",
  "parameters": "stop                           [INST]\nstop                           [/INST]\nstop                           <<SYS>>\nstop                           <</SYS>>",
  "template": "[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] "
}
```

## Copy a Model

```shell
POST /api/copy
```

Copy a model. Creates a model with another name from an existing model.

### Request

```shell
curl http://localhost:11434/api/copy -d '{
  "source": "llama2:7b",
  "destination": "llama2-backup"
}'
```

## Delete a Model

```shell
DELETE /api/delete
```

Delete a model and its data.

### Parameters

- `name`: model name to delete

### Request

```shell
curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama2:13b"
}'
```

## Pull a Model

```shell
POST /api/pull
```

Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

### Parameters

- `name`: name of the model to pull
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects

### Request

```shell
curl -X POST http://localhost:11434/api/pull -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208
}
```

## Push a Model

```shell
POST /api/push
```

Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.

### Parameters

- `name`: name of the model to push in the form of `<namespace>/<model>:<tag>`
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects

### Request

```shell
curl -X POST http://localhost:11434/api/push -d '{
  "name": "mattw/pygmalion:latest"
}'
```

### Response

Streaming response that starts with:

```json
{ "status": "retrieving manifest" }
```

and then:

```json
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
```

Then there is a series of uploading responses:

```json
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
```

Finally, when the upload is complete:

```json
{"status":"pushing manifest"}
{"status":"success"}
```

## Generate Embeddings

```shell
POST /api/embeddings
```

Generate embeddings from a model.

### Parameters

- `model`: name of model to generate embeddings from
- `prompt`: text to generate embeddings for

Advanced parameters:

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`

### Request

```shell
curl -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'
```

### Response

```json
{
  "embeddings": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
    0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
  ]
}
```
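
Embedding vectors are typically compared with a distance measure such as cosine similarity. A client-side sketch (illustrative only, not part of the API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v = [0.5670403838157654, 0.009260174818336964, 0.23178744316101074]
print(round(cosine_similarity(v, v), 6))  # 1.0
```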