OpenDAS / ollama — Commit dccac8c8
Authored Oct 30, 2023 by Michael Yang
k8s example
Parent: c05ab9a8

Showing 3 changed files with 134 additions and 0 deletions:
- examples/kubernetes/README.md (+36, -0)
- examples/kubernetes/cpu.yaml (+42, -0)
- examples/kubernetes/gpu.yaml (+56, -0)
examples/kubernetes/README.md
# Deploy Ollama to Kubernetes

## Prerequisites

- Ollama: https://ollama.ai/download
- A Kubernetes cluster. This example uses Google Kubernetes Engine.

## Steps

1. Create the Ollama namespace, deployment, and service:

   ```bash
   kubectl apply -f cpu.yaml
   ```

1. Port forward the Ollama service to connect and use it locally:

   ```bash
   kubectl -n ollama port-forward service/ollama 11434:80
   ```

1. Pull and run `orca-mini:3b`:

   ```bash
   ollama run orca-mini:3b
   ```
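With the port-forward from the previous step active, the deployment can also be exercised over HTTP. A quick sanity check against Ollama's generate endpoint (assuming the `orca-mini:3b` model has already been pulled) might look like:

```shell
# The service's port 80 is forwarded to localhost:11434, so the
# standard Ollama API is reachable there.
curl http://localhost:11434/api/generate -d '{
  "model": "orca-mini:3b",
  "prompt": "Why is the sky blue?"
}'
```

This streams JSON response chunks back as the model generates tokens.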
## (Optional) Hardware Acceleration

Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin). Follow the link for more details.

Once configured, create a GPU-enabled Ollama deployment:

```bash
kubectl apply -f gpu.yaml
```
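Before applying `gpu.yaml`, it can be worth confirming that the device plugin is actually advertising GPUs on the cluster's nodes; this check is not part of the original example:

```shell
# Print each node's allocatable nvidia.com/gpu count; a non-empty,
# non-<none> value means the device plugin registered the GPU.
kubectl get nodes -o custom-columns="NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```

Nodes showing `<none>` either have no GPU or the plugin is not running on them.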
examples/kubernetes/cpu.yaml

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  selector:
    matchLabels:
      name: ollama
  template:
    metadata:
      labels:
        name: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - name: http
          containerPort: 11434
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  type: ClusterIP
  selector:
    name: ollama
  ports:
  - port: 80
    name: http
    targetPort: http
    protocol: TCP
```
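After applying this manifest, the resources it creates can be verified with standard kubectl commands; a short check (not part of the original example) might be:

```shell
# Block until the deployment's pods are available, then list the
# pods and the ClusterIP service in the ollama namespace.
kubectl -n ollama rollout status deployment/ollama
kubectl -n ollama get pods,svc
```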
examples/kubernetes/gpu.yaml

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      name: ollama
  template:
    metadata:
      labels:
        name: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        env:
        - name: PATH
          value: /usr/local/nvidia/bin:/usr/local/nvidia/lib64:/usr/bin:/usr/sbin:/bin:/sbin
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib64
        ports:
        - name: http
          containerPort: 11434
          protocol: TCP
        resources:
          limits:
            nvidia.com/gpu: 1
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  type: ClusterIP
  selector:
    name: ollama
  ports:
  - port: 80
    name: http
    targetPort: http
    protocol: TCP
```
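The `nvidia.com/gpu: 1` limit makes the scheduler place the pod on a GPU node, and the toleration allows it onto nodes tainted for GPU workloads. Once the pod is running, one way to confirm the GPU was actually granted (a suggested check, not part of the original example; the exact log wording varies by Ollama version) is:

```shell
# Show the pod's GPU resource allocation, then inspect Ollama's
# startup logs for GPU detection messages.
kubectl -n ollama describe pod -l name=ollama | grep -i "nvidia.com/gpu"
kubectl -n ollama logs deploy/ollama
```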