# API

## Endpoints

- [Generate a completion](#generate-a-completion)
- [Create a Model](#create-a-model)
- [List Local Models](#list-local-models)
- [Show Model Information](#show-model-information)
- [Copy a Model](#copy-a-model)
- [Delete a Model](#delete-a-model)
- [Pull a Model](#pull-a-model)
- [Push a Model](#push-a-model)
- [Generate Embeddings](#generate-embeddings)

## Conventions

### Model names

Model names follow a `model:tag` format. Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.

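For example, resolving a name into its model and tag can be sketched in a few lines (an illustrative helper, not part of the API):

```python
def split_model_name(name):
    """Split a "model:tag" name into (model, tag), defaulting the tag to "latest"."""
    model, _, tag = name.partition(":")
    return model, tag or "latest"

print(split_model_name("orca-mini:3b-q4_1"))  # ('orca-mini', '3b-q4_1')
print(split_model_name("llama2"))             # ('llama2', 'latest')
```
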
### Durations

All durations are returned in nanoseconds.

### Streaming responses

Certain endpoints stream responses as JSON objects delineated with the newline (`\n`) character.

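A client can therefore read the body line by line and decode each non-empty line as a JSON object. A minimal sketch (the `chunks` list stands in for lines read from the HTTP response):

```python
import json

def parse_stream(lines):
    """Decode a newline-delimited JSON stream into a list of objects."""
    objects = []
    for line in lines:
        line = line.strip()
        if line:  # skip empty lines between objects
            objects.append(json.loads(line))
    return objects

chunks = ['{"response": "The", "done": false}', '{"response": "", "done": true}']
for obj in parse_stream(chunks):
    print(obj["done"])  # prints False, then True
```
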
## Generate a completion

```shell
POST /api/generate
```

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for

Advanced parameters (optional):

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt to use (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects

### Request

```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'
```

### Response

If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}
```

The final response in the stream also includes additional data about the generation:
- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `sample_count`: number of samples generated
- `sample_duration`: time spent generating samples
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
- `response`: empty if the response was streamed; if not streamed, this will contain the full response

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by 10^9, since durations are in nanoseconds.

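With the sample statistics from the example below, the arithmetic works out as follows (a sketch; the durations are nanoseconds):

```python
eval_count = 113            # tokens in the response
eval_duration = 1325948000  # nanoseconds spent generating them

# Convert tokens per nanosecond to tokens per second.
tokens_per_second = eval_count / eval_duration * 1e9
print(round(tokens_per_second, 1))  # 85.2
```
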
```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```

If `stream` is set to `false`, the response will be a single JSON object:

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "The sky is blue because it is the color of the sky.",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```

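A client consuming this endpoint typically joins the streamed `response` fragments and keeps the final object's `context` for the next request. A minimal sketch over already-parsed stream objects (the helper name is illustrative):

```python
def collect(stream_objects):
    """Join streamed response fragments and capture the final stats object."""
    parts = []
    final = None
    for obj in stream_objects:
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            final = obj  # the last object carries context and timing stats
    return "".join(parts), final

stream = [
    {"response": "The", "done": False},
    {"response": " sky", "done": False},
    {"response": "", "done": True, "context": [1, 2, 3]},
]
text, final = collect(stream)
print(text)              # The sky
print(final["context"])  # [1, 2, 3] -- send this back to keep the conversation going
```
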
## Create a Model

```shell
POST /api/create
```

Create a model from a [`Modelfile`](./modelfile.md)

### Parameters

- `name`: name of the model to create
- `path`: path to the Modelfile
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects

### Request

```shell
curl -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}'
```

### Response

A stream of JSON objects. When finished, `status` is `success`.

```json
{
  "status": "parsing modelfile"
}
```

## List Local Models

```shell
GET /api/tags
```

List models that are available locally.

### Request

```shell
curl http://localhost:11434/api/tags
```

### Response

A single JSON object will be returned.

```json
{
  "models": [
    {
      "name": "llama2:7b",
      "modified_at": "2023-08-02T17:02:23.713454393-07:00",
      "size": 3791730596
    },
    {
      "name": "llama2:13b",
      "modified_at": "2023-08-08T12:08:38.093596297-07:00",
      "size": 7323310500
    }
  ]
}
```

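The `models` array can then be summarized however the client likes; for instance, pulling out names and approximate sizes (an illustrative helper over the parsed JSON):

```python
def summarize_models(tags_response):
    """Return (name, size in GB) pairs from an /api/tags response."""
    return [(m["name"], round(m["size"] / 1e9, 1)) for m in tags_response["models"]]

resp = {"models": [
    {"name": "llama2:7b", "modified_at": "2023-08-02T17:02:23.713454393-07:00", "size": 3791730596},
]}
print(summarize_models(resp))  # [('llama2:7b', 3.8)]
```
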
## Show Model Information

```shell
POST /api/show
```

Show details about a model including modelfile, template, parameters, license, and system prompt.

### Parameters

- `name`: name of the model to show

### Request

```shell
curl http://localhost:11434/api/show -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
  "license": "<contents of license block>",
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llama2:latest\n\nFROM /Users/username/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8\nTEMPLATE \"\"\"[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] \"\"\"\nSYSTEM \"\"\"\"\"\"\nPARAMETER stop [INST]\nPARAMETER stop [/INST]\nPARAMETER stop <<SYS>>\nPARAMETER stop <</SYS>>\n",
  "parameters": "stop                           [INST]\nstop                           [/INST]\nstop                           <<SYS>>\nstop                           <</SYS>>",
  "template": "[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] "
}
```

## Copy a Model

```shell
POST /api/copy
```
Copy a model. Creates a model with another name from an existing model.

### Request

```shell
curl http://localhost:11434/api/copy -d '{
  "source": "llama2:7b",
  "destination": "llama2-backup"
}'
```

### Response

The only response is a 200 OK if successful.

## Delete a Model

```shell
DELETE /api/delete
```

Delete a model and its data.
### Parameters
- `name`: model name to delete
### Request

```shell
curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama2:13b"
}'
```

### Response

If successful, the only response is a 200 OK.

## Pull a Model

```shell
POST /api/pull
```

Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
### Parameters
- `name`: name of the model to pull
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects

### Request

```shell
curl -X POST http://localhost:11434/api/pull -d '{
  "name": "llama2:7b"
}'
```

### Response
If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:

The first object is the manifest:

```json
{
  "status": "pulling manifest"
}
```

Then there is a series of downloading responses. Until a download completes, the `completed` key may not be included. The number of files to be downloaded depends on the number of layers specified in the manifest.

```json
{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208,
  "completed": 241970
}
```

After all the files are downloaded, the final responses are:

```json
{
    "status": "verifying sha256 digest"
}
{
    "status": "writing manifest"
}
{
    "status": "removing any unused layers"
}
{
    "status": "success"
}
```

If `stream` is set to `false`, then the response is a single JSON object:

```json
{
  "status": "success"
}
```

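Clients usually turn the `completed` and `total` fields of each downloading response into a progress figure. A sketch (note `completed` may be absent before a download starts):

```python
def progress_percent(status_obj):
    """Return download progress as a percentage, or None if it has not started."""
    total = status_obj.get("total")
    completed = status_obj.get("completed")
    if not total or completed is None:
        return None
    return 100.0 * completed / total

print(progress_percent({"status": "downloading digestname",
                        "total": 2142590208,
                        "completed": 241970}))
```
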
## Push a Model

```shell
POST /api/push
```

Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.

### Parameters

- `name`: name of the model to push in the form of `<namespace>/<model>:<tag>`
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects

### Request

```shell
curl -X POST http://localhost:11434/api/push -d '{
  "name": "mattw/pygmalion:latest"
}'
```

### Response
If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:

```json
{ "status": "retrieving manifest" }
```

and then:

```json
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
```

Then there is a series of uploading responses:

```json
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
```

Finally, when the upload is complete:

```json
{"status":"pushing manifest"}
{"status":"success"}
```

If `stream` is set to `false`, then the response is a single JSON object:

```json
{"status":"success"}
```

## Generate Embeddings

```shell
POST /api/embeddings
```

Generate embeddings from a model

### Parameters

- `model`: name of model to generate embeddings from
- `prompt`: text to generate embeddings for

Advanced parameters:

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`

### Request

```shell
curl -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'
```

### Response

```json
{
  "embedding": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
    0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
  ]
}
```
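
Embedding vectors returned by this endpoint are typically compared with cosine similarity; for example (a sketch using only the standard library):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```
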