@@ -357,7 +357,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
...
@@ -357,7 +357,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
-[OpenTalkGpt](https://github.com/adarshM84/OpenTalkGpt)(Chrome Extension to manage open-source models supported by Ollama, create custom models, and chat with models from a user-friendly UI)
-[OpenTalkGpt](https://github.com/adarshM84/OpenTalkGpt)(Chrome Extension to manage open-source models supported by Ollama, create custom models, and chat with models from a user-friendly UI)
-[VT](https://github.com/vinhnx/vt.ai)(A minimal multimodal AI chat app, with dynamic conversation routing. Supports local models via Ollama)
-[VT](https://github.com/vinhnx/vt.ai)(A minimal multimodal AI chat app, with dynamic conversation routing. Supports local models via Ollama)
-[Nosia](https://github.com/nosia-ai/nosia)(Easy to install and use RAG platform based on Ollama)
-[Nosia](https://github.com/nosia-ai/nosia)(Easy to install and use RAG platform based on Ollama)
-[Witsy](https://github.com/nbonamy/witsy)(An AI Desktop application avaiable for Mac/Windows/Linux)
-[Witsy](https://github.com/nbonamy/witsy)(An AI Desktop application available for Mac/Windows/Linux)
-[Abbey](https://github.com/US-Artificial-Intelligence/abbey)(A configurable AI interface server with notebooks, document storage, and YouTube support)
-[Abbey](https://github.com/US-Artificial-Intelligence/abbey)(A configurable AI interface server with notebooks, document storage, and YouTube support)
-[Minima](https://github.com/dmayboroda/minima)(RAG with on-premises or fully local workflow)
-[Minima](https://github.com/dmayboroda/minima)(RAG with on-premises or fully local workflow)
@@ -80,7 +80,7 @@ If you are using a container to run Ollama, make sure you've set up the containe
...
@@ -80,7 +80,7 @@ If you are using a container to run Ollama, make sure you've set up the containe
Sometimes the Ollama can have difficulties initializing the GPU. When you check the server logs, this can show up as various error codes, such as "3" (not initialized), "46" (device unavailable), "100" (no device), "999" (unknown), or others. The following troubleshooting techniques may help resolve the problem
Sometimes the Ollama can have difficulties initializing the GPU. When you check the server logs, this can show up as various error codes, such as "3" (not initialized), "46" (device unavailable), "100" (no device), "999" (unknown), or others. The following troubleshooting techniques may help resolve the problem
- If you are using a container, is the container runtime working? Try `docker run --gpus all ubuntu nvidia-smi` - if this doesn't work, Ollama wont be able to see your NVIDIA GPU.
- If you are using a container, is the container runtime working? Try `docker run --gpus all ubuntu nvidia-smi` - if this doesn't work, Ollama won't be able to see your NVIDIA GPU.
- Is the uvm driver loaded? `sudo nvidia-modprobe -u`
- Is the uvm driver loaded? `sudo nvidia-modprobe -u`
- Try reloading the nvidia_uvm driver - `sudo rmmod nvidia_uvm` then `sudo modprobe nvidia_uvm`
- Try reloading the nvidia_uvm driver - `sudo rmmod nvidia_uvm` then `sudo modprobe nvidia_uvm`
# RAG Hallucination Checker using Bespoke-Minicheck
# RAG Hallucination Checker using Bespoke-Minicheck
This example allows the user to ask questions related to a document, which can be specified via an article url. Relevant chunks are retreived from the document and given to `llama3.2` as context to answer the question. Then each sentence in the answer is checked against the retrieved chunks using `bespoke-minicheck` to ensure that the answer does not contain hallucinations.
This example allows the user to ask questions related to a document, which can be specified via an article url. Relevant chunks are retrieved from the document and given to `llama3.2` as context to answer the question. Then each sentence in the answer is checked against the retrieved chunks using `bespoke-minicheck` to ensure that the answer does not contain hallucinations.
// Model will still need to fit in VRAM. If this setting wont fit
// Model will still need to fit in VRAM. If this setting won't fit
// we'll back off down to 1 to try to get it to fit
// we'll back off down to 1 to try to get it to fit
vardefaultParallel=4
vardefaultParallel=4
...
@@ -501,7 +501,7 @@ func (s *Scheduler) updateFreeSpace(allGpus discover.GpuInfoList) {
...
@@ -501,7 +501,7 @@ func (s *Scheduler) updateFreeSpace(allGpus discover.GpuInfoList) {
}elseif(allGpus[i].TotalMemory-p)<allGpus[i].FreeMemory{// predicted free is smaller than reported free, use it
}elseif(allGpus[i].TotalMemory-p)<allGpus[i].FreeMemory{// predicted free is smaller than reported free, use it
// TODO maybe we should just always trust our numbers, since cuda's free memory reporting is laggy
// TODO maybe we should just always trust our numbers, since cuda's free memory reporting is laggy
// and we might unload models we didn't actually need to. The risk is if some other GPU intensive app is loaded
// and we might unload models we didn't actually need to. The risk is if some other GPU intensive app is loaded
// after we start our first runner, then we'll never acount for that, so picking the smallest free value seems prudent.
// after we start our first runner, then we'll never account for that, so picking the smallest free value seems prudent.
allGpus[i].FreeMemory=allGpus[i].TotalMemory-p
allGpus[i].FreeMemory=allGpus[i].TotalMemory-p
}
}
slog.Info("updated VRAM based on existing loaded models","gpu",allGpus[i].ID,"library",allGpus[i].Library,"total",format.HumanBytes2(allGpus[i].TotalMemory),"available",format.HumanBytes2(allGpus[i].FreeMemory))
slog.Info("updated VRAM based on existing loaded models","gpu",allGpus[i].ID,"library",allGpus[i].Library,"total",format.HumanBytes2(allGpus[i].TotalMemory),"available",format.HumanBytes2(allGpus[i].FreeMemory))
@@ -6,7 +6,7 @@ The instructions in this section override those in the task description and styl
...
@@ -6,7 +6,7 @@ The instructions in this section override those in the task description and styl
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.
# User Preamble
# User Preamble
You are a knowledgable assistant. You can answer questions and perform tasks.
You are a knowledgeable assistant. You can answer questions and perform tasks.
## Available Tools
## Available Tools
Here is a list of tools that you have available to you:
Here is a list of tools that you have available to you:
You are a knowledgable assistant. You can answer questions and perform tasks.
You are a knowledgeable assistant. You can answer questions and perform tasks.
In addition to plain text responses, you can chose to call one or more of the provided functions.
In addition to plain text responses, you can chose to call one or more of the provided functions.
Use the following rule to decide when to call a function:
Use the following rule to decide when to call a function:
...
@@ -14,4 +14,4 @@ If you decide to call functions:
...
@@ -14,4 +14,4 @@ If you decide to call functions:
* make sure you pick the right functions that match the user intent
* make sure you pick the right functions that match the user intent
Available functions as JSON spec:
Available functions as JSON spec:
[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","required":["location","format"],"properties":{"format":{"type":"string","description":"The temperature unit to use. Infer this from the users location.","enum":["celsius","fahrenheit"]},"location":{"type":"string","description":"The city and state, e.g. San Francisco, CA"}}}}}]<|eot_id|><|start_header_id|><|end_header_id|>You are a knowledgable assistant. You can answer questions and perform tasks.<|eot_id|><|start_header_id|>user<|end_header_id|>What's the weather like today in Paris?<|eot_id|><|start_header_id|>assistant<|end_header_id|> functools[{"name": "get_current_weather", "arguments": {"format":"celsius","location":"Paris, France"}}]<|eot_id|><|start_header_id|>tool<|end_header_id|>22<|eot_id|><|start_header_id|>assistant<|end_header_id|>The current temperature in Paris, France is 22 degrees Celsius.<|eot_id|><|start_header_id|>user<|end_header_id|>What's the weather like today in San Francisco and Toronto?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","required":["location","format"],"properties":{"format":{"type":"string","description":"The temperature unit to use. Infer this from the user's location.","enum":["celsius","fahrenheit"]},"location":{"type":"string","description":"The city and state, e.g. San Francisco, CA"}}}}}]<|eot_id|><|start_header_id|><|end_header_id|>You are a knowledgeable assistant. You can answer questions and perform tasks.<|eot_id|><|start_header_id|>user<|end_header_id|>What's the weather like today in Paris?<|eot_id|><|start_header_id|>assistant<|end_header_id|> functools[{"name": "get_current_weather", "arguments": {"format":"celsius","location":"Paris, France"}}]<|eot_id|><|start_header_id|>tool<|end_header_id|>22<|eot_id|><|start_header_id|>assistant<|end_header_id|>The current temperature in Paris, France is 22 degrees Celsius.<|eot_id|><|start_header_id|>user<|end_header_id|>What's the weather like today in San Francisco and Toronto?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
You are a knowledgable assistant. You can answer questions and perform tasks. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
You are a knowledgeable assistant. You can answer questions and perform tasks. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tools> {"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","required":["location","format"],"properties":{"format":{"type":"string","description":"The temperature unit to use. Infer this from the users location.","enum":["celsius","fahrenheit"]},"location":{"type":"string","description":"The city and state, e.g. San Francisco, CA"}}}} </tools><|eot_id|><|start_header_id|>user<|end_header_id|>
<tools> {"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","required":["location","format"],"properties":{"format":{"type":"string","description":"The temperature unit to use. Infer this from the user's location.","enum":["celsius","fahrenheit"]},"location":{"type":"string","description":"The city and state, e.g. San Francisco, CA"}}}} </tools><|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather like today in Paris?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
What's the weather like today in Paris?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
[INST] What's the weather like today in Paris?[/INST][TOOL_CALLS] [{"name": "get_current_weather", "arguments": {"format":"celsius","location":"Paris, France"}}]</s>[TOOL_RESULTS] {"content": 22}[/TOOL_RESULTS] The current temperature in Paris, France is 22 degrees Celsius.</s>[AVAILABLE_TOOLS] [{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","required":["location","format"],"properties":{"format":{"type":"string","description":"The temperature unit to use. Infer this from the users location.","enum":["celsius","fahrenheit"]},"location":{"type":"string","description":"The city and state, e.g. San Francisco, CA"}}}}}][/AVAILABLE_TOOLS][INST] You are a knowledgable assistant. You can answer questions and perform tasks.
[INST] What's the weather like today in Paris?[/INST][TOOL_CALLS] [{"name": "get_current_weather", "arguments": {"format":"celsius","location":"Paris, France"}}]</s>[TOOL_RESULTS] {"content": 22}[/TOOL_RESULTS] The current temperature in Paris, France is 22 degrees Celsius.</s>[AVAILABLE_TOOLS] [{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","required":["location","format"],"properties":{"format":{"type":"string","description":"The temperature unit to use. Infer this from the user's location.","enum":["celsius","fahrenheit"]},"location":{"type":"string","description":"The city and state, e.g. San Francisco, CA"}}}}}][/AVAILABLE_TOOLS][INST] You are a knowledgeable assistant. You can answer questions and perform tasks.
What's the weather like today in San Francisco and Toronto?[/INST]
What's the weather like today in San Francisco and Toronto?[/INST]
You are a knowledgable assistant. You can answer questions and perform tasks.
You are a knowledgeable assistant. You can answer questions and perform tasks.
<tool> {"type":"function","function":{"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","required":["location","format"],"properties":{"format":{"type":"string","description":"The temperature unit to use. Infer this from the users location.","enum":["celsius","fahrenheit"]},"location":{"type":"string","description":"The city and state, e.g. San Francisco, CA"}}}}} </tool>
<tool> {"type":"function","function":{"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","required":["location","format"],"properties":{"format":{"type":"string","description":"The temperature unit to use. Infer this from the user's location.","enum":["celsius","fahrenheit"]},"location":{"type":"string","description":"The city and state, e.g. San Francisco, CA"}}}}} </tool>