# API

## Endpoints

- [Generate a completion](#generate-a-completion)
- [Create a model](#create-a-model)
- [List local models](#list-local-models)
- [Copy a model](#copy-a-model)
- [Delete a model](#delete-a-model)
- [Pull a model](#pull-a-model)
- [Generate embeddings](#generate-embeddings)

## Conventions

### Model names

Model names follow a `model:tag` format. Some examples are `orca:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, defaults to `latest`. The tag is used to identify a specific version.

### Durations

All durations are returned in nanoseconds.

## Generate a completion

```
POST /api/generate
```

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for

Advanced parameters:

- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt to use (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)

### Request

```
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'
```

### Response

A stream of JSON objects:

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}
```
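
Each chunk is a JSON object on its own line, so the full completion text can be recovered by joining the `response` fields client-side. A minimal sketch using `jq` (an assumption: `jq` is installed and the server is running on the default port):

```shell
# Stream a completion and join the partial `response` strings into one text.
# `-j` prints raw strings without adding newlines between them.
curl -s -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}' | jq -j 'select(.done == false) | .response'
```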

The final response in the stream also includes additional data about the generation:

- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `sample_count`: number of samples generated
- `sample_duration`: time spent generating samples
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by 10^9, since durations are reported in nanoseconds.

```json
{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```
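
As a worked example using the figures above (durations are in nanoseconds, hence the 10^9 factor):

```shell
# 113 tokens / 1325948000 ns * 1e9 ns/s ≈ 85.2 tokens per second
awk 'BEGIN { printf "%.1f tokens/s\n", 113 / 1325948000 * 1e9 }'
```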

## Create a Model

```
POST /api/create
```

Create a model from a [`Modelfile`](./modelfile.md).

### Parameters

- `name`: name of the model to create
- `path`: path to the Modelfile

### Request

```
curl -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}'
```

### Response

A stream of JSON objects. When finished, `status` is `success`.

```json
{
  "status": "parsing modelfile"
}
```
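
Because the stream ends with a `success` status, a script can follow progress by printing each `status` field. A sketch with `jq` (an assumption: `jq` is installed):

```shell
# Print each status message as the model is created; the last line is "success".
curl -s -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}' | jq -r '.status'
```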

## List Local Models

```
GET /api/tags
```

List models that are available locally.

### Request

```
curl http://localhost:11434/api/tags
```

### Response

```json
{
  "models": [
    {
      "name": "llama2:7b",
      "modified_at": "2023-08-02T17:02:23.713454393-07:00",
      "size": 3791730596
    },
    {
      "name": "llama2:13b",
      "modified_at": "2023-08-08T12:08:38.093596297-07:00",
      "size": 7323310500
    }
  ]
}
```
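
To extract just the model names from the listing, the response can be filtered with `jq` (a sketch; assumes `jq` is installed):

```shell
# Print the name of each locally available model, one per line.
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
```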

## Copy a Model

```
POST /api/copy
```

Copy a model. Creates a model with another name from an existing model.

### Request

```
curl http://localhost:11434/api/copy -d '{
  "source": "llama2:7b",
  "destination": "llama2-backup"
}'
```

## Delete a Model

```
DELETE /api/delete
```

Delete a model and its data.

### Parameters

- `name`: name of the model to delete

### Request

```
curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama2:13b"
}'
```

## Pull a Model

```
POST /api/pull
```

Download a model from the model registry. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

### Parameters

- `name`: name of the model to pull

### Request

```
curl -X POST http://localhost:11434/api/pull -d '{
  "name": "llama2:7b"
}'
```

### Response

```json
{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208
}
```
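
Because pulls stream progress objects, the download can be followed by printing each `status` field as it arrives (a sketch; assumes `jq` is installed):

```shell
# Follow pull progress by printing each status message as it streams in.
curl -s -X POST http://localhost:11434/api/pull -d '{"name": "llama2:7b"}' | jq -r '.status'
```
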

## Generate Embeddings

```
POST /api/embeddings
```

Generate embeddings from a model.

### Parameters

- `model`: name of model to generate embeddings from
- `prompt`: text to generate embeddings for

### Request

```
curl -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'
```

### Response

```json
{
  "embeddings": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
    0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
  ]
}
```
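
The returned vector can be inspected directly; for example, its dimensionality is simply the array length (a sketch; assumes `jq` is installed):

```shell
# Print the number of dimensions in the returned embedding vector.
curl -s -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}' | jq '.embeddings | length'
```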