> This command can also be used to update a local model. Only the diff will be pulled.
...
### Remove a model
```
ollama rm llama3.2
```
### Copy a model
```
ollama cp llama3.2 my-model
```
### Multiline input
...
### Pass the prompt as an argument
```
$ ollama run llama3.2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
```
### Show model information
```
ollama show llama3.2
```
### List models on your computer
...
### Stop a model which is currently running
```
ollama stop llama3.2
```
### Start Ollama
...
Finally, in a separate shell, run a model:
```
./ollama run llama3.2
```
## REST API
Ollama has a REST API for running and managing models.
...
To preload a model using the CLI, use the command:
```shell
ollama run llama3.2 ""
```
## How do I keep a model loaded in memory or make it unload immediately?
...
By default models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you're making numerous requests to the LLM. If you want to immediately unload a model from memory, use the `ollama stop` command:
```shell
ollama stop llama3.2
```
If you're using the API, use the `keep_alive` parameter with the `/api/generate` and `/api/chat` endpoints to set the amount of time that a model stays in memory. The `keep_alive` parameter can be set to:
...
For example, to preload a model and leave it in memory use:
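One option, sketched here assuming the server is reachable on the default port 11434, is an empty `/api/generate` request with a negative `keep_alive`:

```shell
# Load llama3.2 and keep it in memory indefinitely (no prompt, so nothing is generated)
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": -1}'
```

Sending `"keep_alive": 0` in the same request would instead unload the model as soon as the call returns.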
Alternatively, you can change the amount of time all models are loaded into memory by setting the `OLLAMA_KEEP_ALIVE` environment variable when starting the Ollama server. The `OLLAMA_KEEP_ALIVE` variable uses the same parameter types as the `keep_alive` parameter types mentioned above. Refer to the section explaining [how to configure the Ollama server](#how-do-i-configure-ollama-server) to correctly set the environment variable.
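For instance, when starting the server directly from a terminal (an assumption here; the linked section covers the other ways of setting server environment variables), this could look like:

```shell
# Keep every model loaded for 24 hours after its last use
OLLAMA_KEEP_ALIVE=24h ollama serve
```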
...
```javascript
import { Ollama } from "@langchain/community/llms/ollama";

const ollama = new Ollama({
  baseUrl: "http://localhost:11434",
  model: "llama3.2",
});

const answer = await ollama.invoke(`why is the sky blue?`);

console.log(answer);
```
That will get us the same thing as if we ran `ollama run llama3.2 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
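The install step might look something like the sketch below; `@langchain/community` is assumed here to be the package providing **CheerioWebBaseLoader**, to match the import used for the `Ollama` class above.

```shell
# Cheerio for fetching and parsing the page, plus the LangChain community package (assumed) for its CheerioWebBaseLoader
npm install cheerio @langchain/community
```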
...
Here's a quick example showing API access from `powershell`
```powershell
(Invoke-WebRequest -method POST -Body '{"model":"llama3.2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate).Content | ConvertFrom-Json
```