@@ -18,7 +18,7 @@ Get up and running with large language models.

### Linux

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

@@ -42,7 +42,7 @@ The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `olla

To run and chat with [Llama 3.2](https://ollama.com/library/llama3.2):

```shell
ollama run llama3.2
```

@@ -92,13 +92,13 @@ Ollama supports importing GGUF models in the Modelfile:

2. Create the model in Ollama

   ```shell
   ollama create example -f Modelfile
   ```

3. Run the model

   ```shell
   ollama run example
   ```

@@ -110,7 +110,7 @@ See the [guide](docs/import.md) on importing models for more information.

Models from the Ollama library can be customized with a prompt. For example, to customize the `llama3.2` model:

```shell
ollama pull llama3.2
```
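
Next, a `Modelfile` with a system prompt is created and a new model is built from it. A minimal sketch (the system prompt and model name below are illustrative):

```
FROM llama3.2
# set a custom system prompt
SYSTEM """You are Mario from Super Mario Bros. Answer as Mario, the assistant, only."""
```

```shell
ollama create mario -f ./Modelfile
ollama run mario
```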

@@ -145,13 +145,13 @@ For more information on working with a Modelfile, see the [Modelfile](docs/model

`ollama create` is used to create a model from a Modelfile.

```shell
ollama create mymodel -f ./Modelfile
```

### Pull a model

```shell
ollama pull llama3.2
```

@@ -159,13 +159,13 @@ ollama pull llama3.2

### Remove a model

```shell
ollama rm llama3.2
```

### Copy a model

```shell
ollama cp llama3.2 my-model
```

@@ -184,37 +184,39 @@ I'm a basic program that prints the famous "Hello, world!" message to the console.

```
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
```

> **Output**: The image features a yellow smiley face, which is likely the central focus of the picture.

### Pass the prompt as an argument

```shell
ollama run llama3.2 "Summarize this file: $(cat README.md)"
```

> **Output**: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.

### Show model information

```shell
ollama show llama3.2
```

### List models on your computer

```shell
ollama list
```

### List which models are currently loaded

```shell
ollama ps
```

### Stop a model which is currently running

```shell
ollama stop llama3.2
```

@@ -230,13 +232,13 @@ See the [developer guide](https://github.com/ollama/ollama/blob/main/docs/develo

Next, start the server:

```shell
./ollama serve
```

Finally, in a separate shell, run a model:

```shell
./ollama run llama3.2
```

@@ -246,7 +248,7 @@ Ollama has a REST API for running and managing models.

@@ -24,7 +24,7 @@ By default, Ollama uses a context window size of 2048 tokens.

To change this when using `ollama run`, use `/set parameter`:

```shell
/set parameter num_ctx 4096
```
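
When using the API instead, the context window can be set per request through the `num_ctx` option. A minimal sketch (the model name and size are placeholders):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "options": {
    "num_ctx": 4096
  }
}'
```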

@@ -46,10 +46,15 @@ Use the `ollama ps` command to see what models are currently loaded into memory.

```shell
ollama ps
```

> **Output**:
>
> ```
> NAME          ID              SIZE     PROCESSOR    UNTIL
> llama3:70b    bcfb190ca3a7    42 GB    100% GPU     4 minutes from now
> ```

The `Processor` column will show which memory the model was loaded into:

* `100% GPU` means the model was loaded entirely into the GPU
* `100% CPU` means the model was loaded entirely in system memory

@@ -88,7 +93,7 @@ If Ollama is run as a systemd service, environment variables should be set using

4. Reload `systemd` and restart Ollama:

   ```shell
   systemctl daemon-reload
   systemctl restart ollama
   ```

@@ -221,16 +226,19 @@ properties.

If you are using the API you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints.

To preload the mistral model using the generate endpoint, use:
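
A minimal request of this kind, assuming the default local port, looks like:

```shell
curl http://localhost:11434/api/generate -d '{"model": "mistral"}'
```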

@@ -28,7 +28,7 @@ A model file is the blueprint to create and share models with Ollama.

The format of the `Modelfile`:

```
# comment
INSTRUCTION arguments
```

@@ -49,7 +49,7 @@ INSTRUCTION arguments

An example of a `Modelfile` creating a mario blueprint:

```
FROM llama3.2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
```

@@ -69,24 +69,30 @@ To use this:

To view the Modelfile of a given model, use the `ollama show --modelfile` command.

```shell
ollama show --modelfile llama3.2
```

> **Output**:
>
> ```
> # Modelfile generated by "ollama show"
> # To build a new Modelfile based on this one, replace the FROM line with:
> # FROM llama3.2:latest
> FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
> TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
>
> {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
>
> {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
>
> {{ .Response }}<|eot_id|>"""
> PARAMETER stop "<|start_header_id|>"
> PARAMETER stop "<|end_header_id|>"
> PARAMETER stop "<|eot_id|>"
> PARAMETER stop "<|reserved_special_token"
> ```

## Instructions

@@ -94,13 +100,13 @@ To view the Modelfile of a given model, use the `ollama show --modelfile` command.

The `FROM` instruction defines the base model to use when creating a model.

```
FROM <model name>:<tag>
```

#### Build from existing model

```
FROM llama3.2
```

@@ -111,7 +117,7 @@ Additional models can be found at:

#### Build from a Safetensors model

```
FROM <model directory>
```

@@ -125,7 +131,7 @@ Currently supported model architectures:

#### Build from a GGUF file

```
FROM ./ollama-model.gguf
```

@@ -136,7 +142,7 @@ The GGUF file location should be specified as an absolute path or relative to th

The `PARAMETER` instruction defines a parameter that can be set when the model is run.

```
PARAMETER <parameter> <parametervalue>
```
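
For instance, a few illustrative settings (the values here are examples only, not recommendations):

```
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
PARAMETER stop "<|eot_id|>"
```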

@@ -183,7 +189,7 @@ TEMPLATE """{{ if .System }}<|im_start|>system

The `SYSTEM` instruction specifies the system message to be used in the template, if applicable.

```
SYSTEM """<system message>"""
```

@@ -193,7 +199,7 @@ The `ADAPTER` instruction specifies a fine tuned LoRA adapter that should apply

#### Safetensor adapter

```
ADAPTER <path to safetensor adapter>
```

@@ -204,7 +210,7 @@ Currently supported Safetensor adapters:

#### GGUF adapter

```
ADAPTER ./ollama-lora.gguf
```

@@ -212,7 +218,7 @@ ADAPTER ./ollama-lora.gguf

The `LICENSE` instruction allows you to specify the legal license under which the model used with this Modelfile is shared or distributed.

```
LICENSE """
<license text>
"""
```

@@ -222,7 +228,7 @@ LICENSE """

The `MESSAGE` instruction allows you to specify a message history for the model to use when responding. Use multiple iterations of the MESSAGE command to build up a conversation which will guide the model to answer in a similar way.
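
For example, a short few-shot exchange (the questions and answers below are purely illustrative):

```
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in California?
MESSAGE assistant yes
```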

> [!NOTE]
> OpenAI compatibility is experimental and is subject to major adjustments including breaking changes. For fully-featured access to the Ollama API, see the Ollama [Python library](https://github.com/ollama/ollama-python), [JavaScript library](https://github.com/ollama/ollama-js) and [REST API](https://github.com/ollama/ollama/blob/main/docs/api.md).

Ollama provides experimental compatibility with parts of the [OpenAI API](https://platform.openai.com/docs/api-reference) to help connect existing applications to Ollama.

The OpenAI API does not have a way of setting the context size for a model. If you need to change the context size, create a `Modelfile` which looks like:
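
A minimal sketch of such a `Modelfile` (the base model and context size are placeholders):

```
FROM llama3.2
PARAMETER num_ctx 8192
```

The model can then be built with `ollama create` (for example, `ollama create llama3.2-8k -f Modelfile`, where the name is illustrative) and requested by that name through the OpenAI-compatible endpoints.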

You can set OLLAMA_LLM_LIBRARY to any of the available LLM libraries to bypass autodetection, so for example, if you have a CUDA card, but want to force the CPU LLM library with AVX2 vector support, use:

```shell
OLLAMA_LLM_LIBRARY="cpu_avx2" ollama serve
```

You can see what features your CPU has with the following.
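
On Linux, for example, the CPU flags can be inspected directly:

```shell
cat /proc/cpuinfo | grep flags | head -1
```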

If you run into problems on Linux and want to install an older version, or you'd like to try out a pre-release before it's officially released, you can tell the install script which version to install.

```shell
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.7 sh
```

@@ -47,6 +47,7 @@ If Ollama is already running, Quit the tray application and relaunch it from the

## API Access

Here's a quick example showing API access from `powershell`:

```powershell
(Invoke-WebRequest -method POST -Body '{"model":"llama3.2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate).Content | ConvertFrom-Json
```

@@ -8,7 +8,7 @@ Ollama vendors [llama.cpp](https://github.com/ggerganov/llama.cpp/) and [ggml](h

If you update the vendoring code, start by running the following command to establish the tracking llama.cpp repo in the `./vendor/` directory.

```shell
make -f Makefile.sync apply-patches
```

@@ -22,7 +22,7 @@ When updating to a newer base commit, the existing patches may not apply cleanly

Start by applying the patches. If any of the patches have conflicts, the `git am` will stop at the first failure.

```shell
make -f Makefile.sync apply-patches
```

@@ -30,7 +30,7 @@ If there are conflicts, you will see an error message. Resolve the conflicts in

Once all patches are applied, commit the changes to the tracking repository.

```shell
make -f Makefile.sync format-patches sync
```

@@ -38,13 +38,13 @@ make -f Makefile.sync format-patches sync

When working on new fixes or features that impact vendored code, use the following workflow. First get a clean tracking repo with all current patches applied:

```shell
make -f Makefile.sync clean apply-patches
```

Iterate until you're ready to submit PRs. Once your code is ready, commit a change in the `./vendor/` directory, then generate the patches for ollama with:
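
The exact invocation isn't shown here; based on the `format-patches` target referenced above, it is presumably along the lines of:

```shell
make -f Makefile.sync format-patches
```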