# API

## Endpoints

- [Generate a completion](#generate-a-completion)
- [Create a Model](#create-a-model)
- [List Local Models](#list-local-models)
- [Show Model Information](#show-model-information)
- [Copy a Model](#copy-a-model)
- [Delete a Model](#delete-a-model)
- [Pull a Model](#pull-a-model)
- [Push a Model](#push-a-model)
- [Generate Embeddings](#generate-embeddings)

## Conventions

### Model names

Model names follow a `model:tag` format. Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.

### Durations

All durations are returned in nanoseconds.

### Streaming responses

Certain endpoints stream responses as JSON objects delimited by the newline (`\n`) character.
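A client can consume such a stream by splitting on newlines and decoding each line on its own. A minimal Python sketch (the `parse_stream` helper is illustrative, not part of the API):

```python
import json

def parse_stream(lines):
    """Decode a stream of newline-delimited JSON objects, skipping
    blank lines, and stop after the object marked "done": true."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        yield obj
        if obj.get("done"):
            break
```

Feeding it lines such as `{"response": "The", "done": false}` yields each decoded object in turn until the final `"done": true` object arrives.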

## Generate a completion

```shell
POST /api/generate
```

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for

Advanced parameters:

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt to use (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
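As a sketch of how the `context` round trip might look on the client side, the hypothetical helper below builds the next request body from the final response object of the previous call (all names here are illustrative, not part of the API):

```python
import json

def followup_payload(model, prompt, prev_final=None):
    """Build a /api/generate payload, carrying over the `context`
    field from the previous request's final response (if any) so the
    model keeps a short conversational memory."""
    payload = {"model": model, "prompt": prompt}
    if prev_final and "context" in prev_final:
        payload["context"] = prev_final["context"]
    return payload

# The JSON body to send with the next POST to /api/generate:
body = json.dumps(followup_payload("llama2:7b", "And why is it red at sunset?",
                                   prev_final={"done": True, "context": [1, 2, 3]}))
```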

### Request

```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'
```

### Response

A stream of JSON objects:

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}
```

The final response in the stream also includes additional data about the generation:

- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `sample_count`: number of samples generated
- `sample_duration`: time spent generating samples
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by 10^9 (durations are in nanoseconds).
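A quick sketch of that calculation in Python, using the values from the example response in this section:

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Durations are reported in nanoseconds, so scale by 1e9 to get token/s."""
    return eval_count / eval_duration_ns * 1e9

# Values from the example response below:
rate = tokens_per_second(113, 1325948000)  # ≈ 85.2 token/s
```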

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```

## Create a Model

```shell
POST /api/create
```

Create a model from a [`Modelfile`](./modelfile.md)

### Parameters

- `name`: name of the model to create
- `path`: path to the Modelfile

### Request

```shell
curl -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}'
```

### Response

A stream of JSON objects. When finished, `status` is `success`.

```json
{
  "status": "parsing modelfile"
}
```
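A client that wants to block until the model is created can simply drain the stream and check the last `status`; a minimal sketch (the helper name is illustrative, not part of the API):

```python
import json

def wait_for_success(lines):
    """Consume the status stream from /api/create; return True when the
    final object reports "status": "success"."""
    status = None
    for line in lines:
        line = line.strip()
        if line:
            status = json.loads(line).get("status")
    return status == "success"
```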

## List Local Models

```shell
GET /api/tags
```

List models that are available locally.

### Request

```shell
curl http://localhost:11434/api/tags
```

### Response

```json
{
  "models": [
    {
      "name": "llama2:7b",
      "modified_at": "2023-08-02T17:02:23.713454393-07:00",
      "size": 3791730596
    },
    {
      "name": "llama2:13b",
      "modified_at": "2023-08-08T12:08:38.093596297-07:00",
      "size": 7323310500
    }
  ]
}
```

## Show Model Information

```shell
POST /api/show
```

Show details about a model including modelfile, template, parameters, license, and system prompt.

### Parameters

- `name`: name of the model to show

### Request

```shell
curl http://localhost:11434/api/show -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
    "license": "<contents of license block>",
    "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llama2:latest\n\nFROM /Users/username/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8\nTEMPLATE \"\"\"[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] \"\"\"\nSYSTEM \"\"\"\"\"\"\nPARAMETER stop [INST]\nPARAMETER stop [/INST]\nPARAMETER stop <<SYS>>\nPARAMETER stop <</SYS>>\n",
    "parameters": "stop                           [INST]\nstop                           [/INST]\nstop                           <<SYS>>\nstop                           <</SYS>>",
    "template": "[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] "
}
```

## Copy a Model

```shell
POST /api/copy
```

Copy a model. Creates a model with another name from an existing model.

### Request

```shell
curl http://localhost:11434/api/copy -d '{
  "source": "llama2:7b",
  "destination": "llama2-backup"
}'
```

## Delete a Model

```shell
DELETE /api/delete
```

Delete a model and its data.

### Parameters

- `name`: model name to delete

### Request

```shell
curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama2:13b"
}'
```

## Pull a Model

```shell
POST /api/pull
```

Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

### Parameters

- `name`: name of the model to pull
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.

### Request

```shell
curl -X POST http://localhost:11434/api/pull -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208
}
```

## Push a Model

```shell
POST /api/push
```

Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.

### Parameters

- `name`: name of the model to push in the form of `<namespace>/<model>:<tag>`
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.

### Request

```shell
curl -X POST http://localhost:11434/api/push -d '{
  "name": "mattw/pygmalion:latest"
}'
```

### Response

Streaming response that starts with:

```json
{"status":"retrieving manifest"}
```

and then:

```json
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
```

Then there is a series of uploading responses:

```json
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
```

Finally, when the upload is complete:

```json
{"status":"pushing manifest"}
{"status":"success"}
```

## Generate Embeddings

```shell
POST /api/embeddings
```

Generate embeddings from a model

### Parameters

- `model`: name of model to generate embeddings from
- `prompt`: text to generate embeddings for

Advanced parameters:

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`

### Request

```shell
curl -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'
```

### Response

```json
{
  "embeddings": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
    0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
  ]
}
```
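Embedding vectors like this are typically compared with cosine similarity. A minimal sketch in plain Python (not part of the API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Texts whose embeddings have a similarity close to 1 are semantically close; values near 0 indicate unrelated content.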