Unverified commit 55ea963c authored by Jeffrey Morgan, committed by GitHub

update default model to llama3.2 (#6959)

parent e9e9bdb8
@@ -35,10 +35,10 @@ The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `olla
 ## Quickstart
-To run and chat with [Llama 3.1](https://ollama.com/library/llama3.1):
+To run and chat with [Llama 3.2](https://ollama.com/library/llama3.2):
 ```
-ollama run llama3.1
+ollama run llama3.2
 ```
 ## Model library
@@ -49,6 +49,8 @@ Here are some example models that can be downloaded:
 | Model | Parameters | Size | Download |
 | ------------------ | ---------- | ----- | ------------------------------ |
+| Llama 3.2 | 3B | 2.0GB | `ollama run llama3.2` |
+| Llama 3.2 | 1B | 1.3GB | `ollama run llama3.2:1b` |
 | Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
 | Llama 3.1 | 70B | 40GB | `ollama run llama3.1:70b` |
 | Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
@@ -99,16 +101,16 @@ See the [guide](docs/import.md) on importing models for more information.
 ### Customize a prompt
-Models from the Ollama library can be customized with a prompt. For example, to customize the `llama3.1` model:
+Models from the Ollama library can be customized with a prompt. For example, to customize the `llama3.2` model:
 ```
-ollama pull llama3.1
+ollama pull llama3.2
 ```
 Create a `Modelfile`:
 ```
-FROM llama3.1
+FROM llama3.2
 # set the temperature to 1 [higher is more creative, lower is more coherent]
 PARAMETER temperature 1
@@ -143,7 +145,7 @@ ollama create mymodel -f ./Modelfile
 ### Pull a model
 ```
-ollama pull llama3.1
+ollama pull llama3.2
 ```
 > This command can also be used to update a local model. Only the diff will be pulled.
@@ -151,13 +153,13 @@ ollama pull llama3.1
 ### Remove a model
 ```
-ollama rm llama3.1
+ollama rm llama3.2
 ```
 ### Copy a model
 ```
-ollama cp llama3.1 my-model
+ollama cp llama3.2 my-model
 ```
 ### Multiline input
@@ -181,14 +183,14 @@ The image features a yellow smiley face, which is likely the central focus of th
 ### Pass the prompt as an argument
 ```
-$ ollama run llama3.1 "Summarize this file: $(cat README.md)"
+$ ollama run llama3.2 "Summarize this file: $(cat README.md)"
 Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
 ```
 ### Show model information
 ```
-ollama show llama3.1
+ollama show llama3.2
 ```
 ### List models on your computer
@@ -206,7 +208,7 @@ ollama ps
 ### Stop a model which is currently running
 ```
-ollama stop llama3.1
+ollama stop llama3.2
 ```
 ### Start Ollama
@@ -228,7 +230,7 @@ Next, start the server:
 Finally, in a separate shell, run a model:
 ```
-./ollama run llama3.1
+./ollama run llama3.2
 ```
 ## REST API
@@ -239,7 +241,7 @@ Ollama has a REST API for running and managing models.
 ```
 curl http://localhost:11434/api/generate -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "prompt":"Why is the sky blue?"
 }'
 ```
@@ -248,7 +250,7 @@ curl http://localhost:11434/api/generate -d '{
 ```
 curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": [
 { "role": "user", "content": "why is the sky blue?" }
 ]
......
@@ -142,7 +142,7 @@ SetupAppRunningError=Another Ollama installer is running.%n%nPlease cancel or fi
 ;FinishedHeadingLabel=Run your first model
-;FinishedLabel=%nRun this command in a PowerShell or cmd terminal.%n%n%n ollama run llama3.1
+;FinishedLabel=%nRun this command in a PowerShell or cmd terminal.%n%n%n ollama run llama3.2
 ;ClickFinish=%n
 [Registry]
......
@@ -4,5 +4,5 @@ write-host "Welcome to Ollama!"
 write-host ""
 write-host "Run your first model:"
 write-host ""
-write-host "`tollama run llama3.1"
+write-host "`tollama run llama3.2"
 write-host ""
\ No newline at end of file
@@ -69,7 +69,7 @@ Enable JSON mode by setting the `format` parameter to `json`. This will structur
 ```shell
 curl http://localhost:11434/api/generate -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "prompt": "Why is the sky blue?"
 }'
 ```
@@ -80,7 +80,7 @@ A stream of JSON objects is returned:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-08-04T08:52:19.385406455-07:00",
 "response": "The",
 "done": false
@@ -102,7 +102,7 @@ To calculate how fast the response is generated in tokens per second (token/s),
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-08-04T19:22:45.499127Z",
 "response": "",
 "done": true,
@@ -124,7 +124,7 @@ A response can be received in one reply when streaming is off.
 ```shell
 curl http://localhost:11434/api/generate -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "prompt": "Why is the sky blue?",
 "stream": false
 }'
@@ -136,7 +136,7 @@ If `stream` is set to `false`, the response will be a single JSON object:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-08-04T19:22:45.499127Z",
 "response": "The sky is blue because it is the color of the sky.",
 "done": true,
@@ -194,7 +194,7 @@ curl http://localhost:11434/api/generate -d '{
 ```shell
 curl http://localhost:11434/api/generate -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "prompt": "What color is the sky at different times of the day? Respond using JSON",
 "format": "json",
 "stream": false
@@ -205,7 +205,7 @@ curl http://localhost:11434/api/generate -d '{
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-11-09T21:07:55.186497Z",
 "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
 "done": true,
@@ -327,7 +327,7 @@ If you want to set custom options for the model at runtime rather than in the Mo
 ```shell
 curl http://localhost:11434/api/generate -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "prompt": "Why is the sky blue?",
 "stream": false,
 "options": {
@@ -368,7 +368,7 @@ curl http://localhost:11434/api/generate -d '{
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-08-04T19:22:45.499127Z",
 "response": "The sky is blue because it is the color of the sky.",
 "done": true,
@@ -390,7 +390,7 @@ If an empty prompt is provided, the model will be loaded into memory.
 ```shell
 curl http://localhost:11434/api/generate -d '{
-"model": "llama3.1"
+"model": "llama3.2"
 }'
 ```
@@ -400,7 +400,7 @@ A single JSON object is returned:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-12-18T19:52:07.071755Z",
 "response": "",
 "done": true
@@ -415,7 +415,7 @@ If an empty prompt is provided and the `keep_alive` parameter is set to `0`, a m
 ```shell
 curl http://localhost:11434/api/generate -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "keep_alive": 0
 }'
 ```
@@ -426,7 +426,7 @@ A single JSON object is returned:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2024-09-12T03:54:03.516566Z",
 "response": "",
 "done": true,
@@ -472,7 +472,7 @@ Send a chat message with a streaming response.
 ```shell
 curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": [
 {
 "role": "user",
@@ -488,7 +488,7 @@ A stream of JSON objects is returned:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-08-04T08:52:19.385406455-07:00",
 "message": {
 "role": "assistant",
@@ -503,7 +503,7 @@ Final response:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-08-04T19:22:45.499127Z",
 "done": true,
 "total_duration": 4883583458,
@@ -521,7 +521,7 @@ Final response:
 ```shell
 curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": [
 {
 "role": "user",
@@ -536,7 +536,7 @@ curl http://localhost:11434/api/chat -d '{
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-12-12T14:13:43.416799Z",
 "message": {
 "role": "assistant",
@@ -560,7 +560,7 @@ Send a chat message with a conversation history. You can use this same approach
 ```shell
 curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": [
 {
 "role": "user",
@@ -584,7 +584,7 @@ A stream of JSON objects is returned:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-08-04T08:52:19.385406455-07:00",
 "message": {
 "role": "assistant",
@@ -598,7 +598,7 @@ Final response:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-08-04T19:22:45.499127Z",
 "done": true,
 "total_duration": 8113331500,
@@ -656,7 +656,7 @@ curl http://localhost:11434/api/chat -d '{
 ```shell
 curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": [
 {
 "role": "user",
@@ -674,7 +674,7 @@ curl http://localhost:11434/api/chat -d '{
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2023-12-12T14:13:43.416799Z",
 "message": {
 "role": "assistant",
@@ -696,7 +696,7 @@ curl http://localhost:11434/api/chat -d '{
 ```
 curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": [
 {
 "role": "user",
@@ -735,7 +735,7 @@ curl http://localhost:11434/api/chat -d '{
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at": "2024-07-22T20:33:28.123648Z",
 "message": {
 "role": "assistant",
@@ -771,7 +771,7 @@ If the messages array is empty, the model will be loaded into memory.
 ```
 curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": []
 }'
 ```
@@ -779,7 +779,7 @@ curl http://localhost:11434/api/chat -d '{
 ##### Response
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at":"2024-09-12T21:17:29.110811Z",
 "message": {
 "role": "assistant",
@@ -798,7 +798,7 @@ If the messages array is empty and the `keep_alive` parameter is set to `0`, a m
 ```
 curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": [],
 "keep_alive": 0
 }'
@@ -810,7 +810,7 @@ A single JSON object is returned:
 ```json
 {
-"model": "llama3.1",
+"model": "llama3.2",
 "created_at":"2024-09-12T21:33:17.547535Z",
 "message": {
 "role": "assistant",
@@ -989,7 +989,7 @@ Show information about a model including details, modelfile, template, parameter
 ```shell
 curl http://localhost:11434/api/show -d '{
-"name": "llama3.1"
+"name": "llama3.2"
 }'
 ```
@@ -1050,7 +1050,7 @@ Copy a model. Creates a model with another name from an existing model.
 ```shell
 curl http://localhost:11434/api/copy -d '{
-"source": "llama3.1",
+"source": "llama3.2",
 "destination": "llama3-backup"
 }'
 ```
@@ -1105,7 +1105,7 @@ Download a model from the ollama library. Cancelled pulls are resumed from where
 ```shell
 curl http://localhost:11434/api/pull -d '{
-"name": "llama3.1"
+"name": "llama3.2"
 }'
 ```
......
@@ -63,7 +63,7 @@ docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 114
 Now you can run a model:
 ```
-docker exec -it ollama ollama run llama3.1
+docker exec -it ollama ollama run llama3.2
 ```
 ### Try different models
......
@@ -32,7 +32,7 @@ When using the API, specify the `num_ctx` parameter:
 ```shell
 curl http://localhost:11434/api/generate -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "prompt": "Why is the sky blue?",
 "options": {
 "num_ctx": 4096
@@ -232,7 +232,7 @@ curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
 To preload a model using the CLI, use the command:
 ```shell
-ollama run llama3.1 ""
+ollama run llama3.2 ""
 ```
 ## How do I keep a model loaded in memory or make it unload immediately?
@@ -240,7 +240,7 @@ ollama run llama3.1 ""
 By default models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you're making numerous requests to the LLM. If you want to immediately unload a model from memory, use the `ollama stop` command:
 ```shell
-ollama stop llama3.1
+ollama stop llama3.2
 ```
 If you're using the API, use the `keep_alive` parameter with the `/api/generate` and `/api/chat` endpoints to set the amount of time that a model stays in memory. The `keep_alive` parameter can be set to:
@@ -251,12 +251,12 @@ If you're using the API, use the `keep_alive` parameter with the `/api/generate`
 For example, to preload a model and leave it in memory use:
 ```shell
-curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "keep_alive": -1}'
+curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": -1}'
 ```
 To unload the model and free up memory use:
 ```shell
-curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "keep_alive": 0}'
+curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}'
 ```
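A duration string works as well when you want the model resident for a bounded window rather than indefinitely; a minimal sketch (the `10m` value here is an arbitrary choice, not part of the diff above):

```shell
# Keep the model loaded for 10 minutes after this request completes
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": "10m"}'
```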
 Alternatively, you can change the amount of time all models are loaded into memory by setting the `OLLAMA_KEEP_ALIVE` environment variable when starting the Ollama server. The `OLLAMA_KEEP_ALIVE` variable uses the same parameter types as the `keep_alive` parameter mentioned above. Refer to the section explaining [how to configure the Ollama server](#how-do-i-configure-ollama-server) to correctly set the environment variable.
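For the server-wide setting, a minimal sketch assuming you launch the binary by hand rather than through a service manager:

```shell
# Keep every model in memory for 24 hours after its last use
OLLAMA_KEEP_ALIVE=24h ollama serve
```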
......
@@ -50,7 +50,7 @@ INSTRUCTION arguments
 An example of a `Modelfile` creating a mario blueprint:
 ```modelfile
-FROM llama3.1
+FROM llama3.2
 # sets the temperature to 1 [higher is more creative, lower is more coherent]
 PARAMETER temperature 1
 # sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
@@ -72,10 +72,10 @@ More examples are available in the [examples directory](../examples).
 To view the Modelfile of a given model, use the `ollama show --modelfile` command.
 ```bash
-> ollama show --modelfile llama3.1
+> ollama show --modelfile llama3.2
 # Modelfile generated by "ollama show"
 # To build a new Modelfile based on this one, replace the FROM line with:
-# FROM llama3.1:latest
+# FROM llama3.2:latest
 FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
 TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
@@ -103,7 +103,7 @@ FROM <model name>:<tag>
 #### Build from existing model
 ```modelfile
-FROM llama3.1
+FROM llama3.2
 ```
 A list of available base models:
......
@@ -25,7 +25,7 @@ chat_completion = client.chat.completions.create(
 'content': 'Say this is a test',
 }
 ],
-model='llama3.1',
+model='llama3.2',
 )
 response = client.chat.completions.create(
@@ -46,13 +46,13 @@ response = client.chat.completions.create(
 )
 completion = client.completions.create(
-model="llama3.1",
+model="llama3.2",
 prompt="Say this is a test",
 )
 list_completion = client.models.list()
-model = client.models.retrieve("llama3.1")
+model = client.models.retrieve("llama3.2")
 embeddings = client.embeddings.create(
 model="all-minilm",
@@ -74,7 +74,7 @@ const openai = new OpenAI({
 const chatCompletion = await openai.chat.completions.create({
 messages: [{ role: 'user', content: 'Say this is a test' }],
-model: 'llama3.1',
+model: 'llama3.2',
 })
 const response = await openai.chat.completions.create({
@@ -94,13 +94,13 @@ const response = await openai.chat.completions.create({
 })
 const completion = await openai.completions.create({
-model: "llama3.1",
+model: "llama3.2",
 prompt: "Say this is a test.",
 })
 const listCompletion = await openai.models.list()
-const model = await openai.models.retrieve("llama3.1")
+const model = await openai.models.retrieve("llama3.2")
 const embedding = await openai.embeddings.create({
 model: "all-minilm",
@@ -114,7 +114,7 @@ const embedding = await openai.embeddings.create({
 curl http://localhost:11434/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "messages": [
 {
 "role": "system",
@@ -154,13 +154,13 @@ curl http://localhost:11434/v1/chat/completions \
 curl http://localhost:11434/v1/completions \
 -H "Content-Type: application/json" \
 -d '{
-"model": "llama3.1",
+"model": "llama3.2",
 "prompt": "Say this is a test"
 }'
 curl http://localhost:11434/v1/models
-curl http://localhost:11434/v1/models/llama3.1
+curl http://localhost:11434/v1/models/llama3.2
 curl http://localhost:11434/v1/embeddings \
 -H "Content-Type: application/json" \
@@ -274,7 +274,7 @@ curl http://localhost:11434/v1/embeddings \
 Before using a model, pull it locally with `ollama pull`:
 ```shell
-ollama pull llama3.1
+ollama pull llama3.2
 ```
 ### Default model names
@@ -282,7 +282,7 @@ ollama pull llama3.1
 For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name:
 ```
-ollama cp llama3.1 gpt-3.5-turbo
+ollama cp llama3.2 gpt-3.5-turbo
 ```
 Afterwards, this new model name can be specified in the `model` field:
......
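As a rough sketch of how the copied name is then used (assuming the `ollama cp llama3.2 gpt-3.5-turbo` step above has been run), the OpenAI-compatible endpoint accepts it like any other model name:

```shell
# The temporary name resolves to the copied llama3.2 model
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            { "role": "user", "content": "Say this is a test" }
        ]
    }'
```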
@@ -33,7 +33,7 @@ Omitting a template in these models puts the responsibility of correctly templat
 To add templates in your model, you'll need to add a `TEMPLATE` command to the Modelfile. Here's an example using Meta's Llama 3.
 ```dockerfile
-FROM llama3.1
+FROM llama3.2
 TEMPLATE """{{- if .System }}<|start_header_id|>system<|end_header_id|>
......
@@ -15,7 +15,7 @@ import { Ollama } from "@langchain/community/llms/ollama";
 const ollama = new Ollama({
 baseUrl: "http://localhost:11434",
-model: "llama3.1",
+model: "llama3.2",
 });
 const answer = await ollama.invoke(`why is the sky blue?`);
@@ -23,7 +23,7 @@ const answer = await ollama.invoke(`why is the sky blue?`);
 console.log(answer);
 ```
-That will get us the same thing as if we ran `ollama run llama3.1 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
+That will get us the same thing as if we ran `ollama run llama3.2 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
 ```bash
 npm install cheerio
......
@@ -29,7 +29,7 @@ Ollama uses unicode characters for progress indication, which may render as unkn
 Here's a quick example showing API access from `powershell`
 ```powershell
-(Invoke-WebRequest -method POST -Body '{"model":"llama3.1", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
+(Invoke-WebRequest -method POST -Body '{"model":"llama3.2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
 ```
 ## Troubleshooting
......
@@ -35,7 +35,7 @@ func main() {
 ctx := context.Background()
 req := &api.ChatRequest{
-Model: "llama3.1",
+Model: "llama3.2",
 Messages: messages,
 }
......
@@ -4,10 +4,10 @@ This example provides an interface for asking questions to a PDF document.
 ## Setup
-1. Ensure you have the `llama3.1` model installed:
+1. Ensure you have the `llama3.2` model installed:
 ```
-ollama pull llama3.1
+ollama pull llama3.2
 ```
 2. Install the Python Requirements.
......
@@ -51,7 +51,7 @@ while True:
 template=template,
 )
-llm = Ollama(model="llama3.1", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))
+llm = Ollama(model="llama3.2", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))
 qa_chain = RetrievalQA.from_chain_type(
 llm,
 retriever=vectorstore.as_retriever(),
......
@@ -4,10 +4,10 @@ This example summarizes the website, [https://ollama.com/blog/run-llama2-uncenso
 ## Running the Example
-1. Ensure you have the `llama3.1` model installed:
+1. Ensure you have the `llama3.2` model installed:
 ```bash
-ollama pull llama3.1
+ollama pull llama3.2
 ```
 2. Install the Python Requirements.
......
@@ -5,7 +5,7 @@ from langchain.chains.summarize import load_summarize_chain
 loader = WebBaseLoader("https://ollama.com/blog/run-llama2-uncensored-locally")
 docs = loader.load()
-llm = Ollama(model="llama3.1")
+llm = Ollama(model="llama3.2")
 chain = load_summarize_chain(llm, chain_type="stuff")
 result = chain.invoke(docs)
......
@@ -4,10 +4,10 @@ This example is a basic "hello world" of using LangChain with Ollama.
 ## Running the Example
-1. Ensure you have the `llama3.1` model installed:
+1. Ensure you have the `llama3.2` model installed:
 ```bash
-ollama pull llama3.1
+ollama pull llama3.2
 ```
 2. Install the Python Requirements.
......
 from langchain.llms import Ollama
 input = input("What is your question?")
-llm = Ollama(model="llama3.1")
+llm = Ollama(model="llama3.2")
 res = llm.predict(input)
 print (res)
......
-FROM llama3.1
+FROM llama3.2
 PARAMETER temperature 1
 SYSTEM """
 You are Mario from super mario bros, acting as an assistant.
......
@@ -2,12 +2,12 @@
 # Example character: Mario
-This example shows how to create a basic character using Llama3.1 as the base model.
+This example shows how to create a basic character using Llama 3.2 as the base model.
 To run this example:
 1. Download the Modelfile
-2. `ollama pull llama3.1` to get the base model used in the model file.
+2. `ollama pull llama3.2` to get the base model used in the model file.
 3. `ollama create NAME -f ./Modelfile`
 4. `ollama run NAME`
@@ -18,7 +18,7 @@ Ask it some questions like "Who are you?" or "Is Peach in trouble again?"
 What the model file looks like:
 ```
-FROM llama3.1
+FROM llama3.2
 PARAMETER temperature 1
 SYSTEM """
 You are Mario from Super Mario Bros, acting as an assistant.
......
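Putting the steps above together, a complete run might look like the following sketch, where `mario` is just a placeholder for whatever name you choose:

```shell
# Fetch the base model, build the character from the Modelfile above, then chat with it
ollama pull llama3.2
ollama create mario -f ./Modelfile
ollama run mario
```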