@@ -24,7 +24,7 @@ All durations are returned in nanoseconds.
### Streaming responses
-Certain endpoints stream responses as JSON objects delineated with the newline (`\n`) character.
+Certain endpoints stream responses as JSON objects.
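For illustration only (the values are placeholders, not captured output), a call to the generate endpoint and the newline-delimited shape of its streamed reply might look like:
```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
# The body arrives as one standalone JSON object per line; the final object is marked "done": true
# and carries the summary statistics:
# {"model":"llama2","created_at":"...","response":"The","done":false}
# {"model":"llama2","created_at":"...","response":" sky","done":false}
# ...
# {"model":"llama2","created_at":"...","response":"","done":true}
```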
## Generate a completion
...
...
@@ -32,7 +32,7 @@ Certain endpoints stream responses as JSON objects delineated with the newline (
POST /api/generate
```
-Generate a response for a given prompt with a provided model. This is a streaming endpoint, so will be a series of responses. The final response object will include statistics and additional data from the request.
+Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
### Parameters
...
...
@@ -47,7 +47,7 @@ Advanced parameters (optional):
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`, this can be used to keep a short conversational memory
- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
-- `raw`: if `true` no formatting will be applied to the prompt and no context will be returned. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API, and are managing history yourself.
+- `raw`: if `true` no formatting will be applied to the prompt. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API.
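As a sketch of the `context` round trip described in the list above (the context array shown is a placeholder; use the value returned by your previous call):
```shell
# First request; the final streamed object (the one with "done": true) includes a "context" array.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'

# Follow-up request: send that array back to keep a short conversational memory.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "And at sunset?",
  "context": [1, 2, 3]
}'
```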
### JSON mode
...
...
@@ -57,7 +57,7 @@ Enable JSON mode by setting the `format` parameter to `json`. This will structur
### Examples
-#### Request
+#### Request (Prompt)
```shell
curl http://localhost:11434/api/generate -d '{
...
...
@@ -114,6 +114,8 @@ To calculate how fast the response is generated in tokens per second (token/s),
+#### Request (No streaming)
+A response can be received in one reply when streaming is off.
```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
...
...
@@ -144,9 +146,9 @@ If `stream` is set to `false`, the response will be a single JSON object:
}
```
-#### Request (Raw mode)
+#### Request (Raw Mode)
-In some cases you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable formatting and context.
+In some cases you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable formatting.
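A minimal sketch of such a raw request (the bracketed instruction format is illustrative; supply whatever template your model actually expects):
```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "[INST] Why is the sky blue? [/INST]",
  "raw": true,
  "stream": false
}'
```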
Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
### Parameters
- `model`: (required) the [model name](#model-names)
- `messages`: the messages of the chat, this can be used to keep a chat memory
Advanced parameters (optional):
- `format`: the format to return a response in. Currently the only accepted value is `json`
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
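As a sketch of a chat request (assuming the chat endpoint is `POST /api/chat`, consistent with the Ollama API; the model and message are illustrative):
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
```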
err = fmt.Errorf("%v: this model may be incompatible with your version of Ollama. If you previously pulled this model, try updating it by running `ollama pull %s`", err, model.ShortName)