# API

## Endpoints

- [Generate a completion](#generate-a-completion)
- [Create a Model](#create-a-model)
- [List Local Models](#list-local-models)
- [Show Model Information](#show-model-information)
- [Copy a Model](#copy-a-model)
- [Delete a Model](#delete-a-model)
- [Pull a Model](#pull-a-model)
- [Push a Model](#push-a-model)
- [Generate Embeddings](#generate-embeddings)

## Conventions

### Model names

Model names follow a `model:tag` format. Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.
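
As an illustration (not part of the API itself), splitting a name into its parts and applying the `latest` default might look like:

```python
def parse_model_name(name: str) -> tuple[str, str]:
    """Split a 'model:tag' name, defaulting the tag to 'latest'."""
    model, sep, tag = name.partition(":")
    return model, tag if sep else "latest"

print(parse_model_name("orca-mini:3b-q4_1"))  # ('orca-mini', '3b-q4_1')
print(parse_model_name("llama2"))             # ('llama2', 'latest')
```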

### Durations

All durations are returned in nanoseconds.
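
Converting a reported duration to seconds is a single division. A small sketch (a local helper, not an API call):

```python
NANOS_PER_SECOND = 1_000_000_000

def ns_to_seconds(duration_ns: int) -> float:
    """Convert a duration field such as total_duration into seconds."""
    return duration_ns / NANOS_PER_SECOND

print(ns_to_seconds(5_589_157_167))  # ≈ 5.59 seconds
```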

## Generate a completion

```shell
POST /api/generate
```

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so the reply will be a series of JSON objects. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for

Advanced parameters:

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt to use (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory

### Request

```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'
```

### Response

A stream of JSON objects:

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}
```
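
Consuming the stream amounts to reading one JSON object per line and concatenating the `response` fragments until `done` is true. A minimal sketch (the `collect_stream` helper and the sample lines are illustrative, not part of the API):

```python
import json

def collect_stream(lines):
    """Join 'response' fragments from a stream of JSON objects; return the
    full text plus the final object (which carries the statistics)."""
    parts, final = [], None
    for line in lines:
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            final = obj
    return "".join(parts), final

stream = [
    '{"model":"llama2:7b","response":"The","done":false}',
    '{"model":"llama2:7b","response":" sky is blue.","done":false}',
    '{"model":"llama2:7b","response":"","done":true,"eval_count":113}',
]
text, final = collect_stream(stream)
print(text)  # The sky is blue.
```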

The final response in the stream also includes additional data about the generation:

- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `sample_count`: number of samples generated
- `sample_duration`: time spent generating samples
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by 10^9 (durations are reported in nanoseconds).

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```
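
Applied to the sample payload above, the token/s calculation looks like this (a small sketch; `tokens_per_second` is an illustrative helper, not an API field):

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """eval_duration is in nanoseconds, so convert to seconds before dividing."""
    return eval_count / (eval_duration_ns / 1_000_000_000)

# Values taken from the final-response example above.
print(round(tokens_per_second(113, 1_325_948_000), 1))  # 85.2
```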

## Create a Model

```shell
POST /api/create
```

Create a model from a [`Modelfile`](./modelfile.md)

### Parameters

- `name`: name of the model to create
- `path`: path to the Modelfile

### Request

```shell
curl -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}'
```

### Response

A stream of JSON objects. When finished, `status` is `success`.

```json
{
  "status": "parsing modelfile"
}
```
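
Since the endpoint streams status objects, a caller can simply watch for the terminal `success` status. A minimal sketch over already-read lines (the helper name and sample statuses are illustrative):

```python
import json

def reached_success(lines) -> bool:
    """Return True once any status object in the stream reports 'success'."""
    return any(json.loads(line).get("status") == "success" for line in lines)

print(reached_success(['{"status":"parsing modelfile"}',
                       '{"status":"success"}']))  # True
```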

## List Local Models

```shell
GET /api/tags
```

List models that are available locally.

### Request

```shell
curl http://localhost:11434/api/tags
```

### Response

```json
{
  "models": [
    {
      "name": "llama2:7b",
      "modified_at": "2023-08-02T17:02:23.713454393-07:00",
      "size": 3791730596
    },
    {
      "name": "llama2:13b",
      "modified_at": "2023-08-08T12:08:38.093596297-07:00",
      "size": 7323310500
    }
  ]
}
```
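
The `size` field is a byte count. Formatting it for display is a one-liner; a small sketch (decimal gigabytes is an arbitrary choice here, and `human_size` is an illustrative helper):

```python
def human_size(num_bytes: int) -> str:
    """Format a 'size' byte count from /api/tags as decimal gigabytes."""
    return f"{num_bytes / 1e9:.1f} GB"

# Sizes taken from the example response above.
print(human_size(3791730596))  # 3.8 GB
print(human_size(7323310500))  # 7.3 GB
```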

## Show Model Information

```shell
POST /api/show
```

Show details about a model including modelfile, template, parameters, license, and system prompt.

### Parameters

- `name`: name of the model to show

### Request

```shell
curl http://localhost:11434/api/show -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
    "license": "<contents of license block>",
    "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llama2:latest\n\nFROM /Users/username/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8\nTEMPLATE \"\"\"[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] \"\"\"\nSYSTEM \"\"\"\"\"\"\nPARAMETER stop [INST]\nPARAMETER stop [/INST]\nPARAMETER stop <<SYS>>\nPARAMETER stop <</SYS>>\n",
    "parameters": "stop                           [INST]\nstop                           [/INST]\nstop                           <<SYS>>\nstop                           <</SYS>>",
    "template": "[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] "
}
```

## Copy a Model

```shell
POST /api/copy
```

Copy a model. Creates a model with another name from an existing model.

### Request

```shell
curl http://localhost:11434/api/copy -d '{
  "source": "llama2:7b",
  "destination": "llama2-backup"
}'
```

## Delete a Model

```shell
DELETE /api/delete
```

Delete a model and its data.

### Parameters

- `name`: model name to delete

### Request

```shell
curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama2:13b"
}'
```

## Pull a Model

```shell
POST /api/pull
```

Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

### Parameters

- `name`: name of the model to pull
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.

### Request

```shell
curl -X POST http://localhost:11434/api/pull -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208
}
```

## Push a Model

```shell
POST /api/push
```

Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.

### Parameters

- `name`: name of the model to push in the form of `<namespace>/<model>:<tag>`
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.

### Request

```shell
curl -X POST http://localhost:11434/api/push -d '{
  "name": "mattw/pygmalion:latest"
}'
```

### Response

Streaming response that starts with:

```json
{"status":"retrieving manifest"}
```

and then:

```json
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
```

Then there is a series of uploading responses:

```json
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
```

Finally, when the upload is complete:

```json
{"status":"pushing manifest"}
{"status":"success"}
```

## Generate Embeddings

```shell
POST /api/embeddings
```

Generate embeddings from a model.

### Parameters

- `model`: name of model to generate embeddings from
- `prompt`: text to generate embeddings for

Advanced parameters:

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`

### Request

```shell
curl -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'
```

### Response

```json
{
  "embeddings": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
    0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
  ]
}
```