# Import a model

This guide walks through creating an Ollama model from an existing PyTorch, Safetensors, or GGUF model on HuggingFace. It optionally covers pushing the model to [ollama.ai](https://ollama.ai/library).

## Supported models

Ollama supports a set of model architectures, with support for more coming soon:

- Llama & Mistral
- Falcon & RW
- GPT-NeoX
- BigCode

To view a model's architecture, check its `config.json` file. You should see an entry under `architectures` (e.g. `LlamaForCausalLM`).
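
The check above can be sketched from the command line. The helper name `model_arch` is hypothetical, and the `sed` pattern assumes the key is spelled `architectures` and holds a list of strings, as in typical HuggingFace `config.json` files:

```shell
# Hypothetical helper: print the first architecture listed in a model's config.json.
# Assumes the key is "architectures" and its value is a list of strings.
model_arch() {
  # $1: path to the cloned model directory (defaults to the current directory)
  tr -d '\n' < "${1:-.}/config.json" |
    sed -n 's/.*"architectures"[[:space:]]*:[[:space:]]*\[[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Running `model_arch <path to model directory>` prints the first entry, which you can compare against the supported list. This is a text-matching shortcut rather than a real JSON parser, so unusual formatting may defeat it.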

## Importing

### Step 1: Clone the HuggingFace repository

```
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
cd Mistral-7B-Instruct-v0.1
```

### Step 2: Convert and quantize

A [Docker image](https://hub.docker.com/r/ollama/quantize) with the tooling required to convert and quantize models is available.

First, install [Docker](https://www.docker.com/get-started/).

Next, to convert and quantize your model, run:

```
docker run --rm -v .:/model ollama/quantize -q q4_0 /model
```

This will output two files into the directory:

- `f16.bin`: the model converted to GGUF
- `q4_0.bin`: the model quantized to 4 bits (we will use this file to create the Ollama model)
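
Before moving on, it may be worth confirming that both files were written. The sketch below uses a hypothetical helper, `check_outputs`, and assumes the quantized file is always named `<level>.bin` after the value passed to `-q`. The guide only confirms this naming for `q4_0`.

```shell
# Hypothetical helper: confirm the conversion outputs exist and are non-empty.
# $1: model directory, $2: quantization level passed to -q (defaults to q4_0)
check_outputs() {
  dir="${1:-.}"
  quant="${2:-q4_0}"
  # Assumption: the quantized file is named "<level>.bin"
  for f in f16.bin "$quant.bin"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: f16.bin and $quant.bin found in $dir"
}
```

Calling `check_outputs .` from the model directory returns a non-zero status if either file is absent, which makes it easy to chain with later steps using `&&`.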

### Step 3: Write a `Modelfile`

Next, create a `Modelfile` for your model. This file is the blueprint for your model, specifying weights, parameters, prompt templates and more.

```
FROM ./q4_0.bin
```

(Optional) Many chat models require a prompt template in order to answer correctly. A default prompt template can be specified with the `TEMPLATE` instruction in the `Modelfile`:

```
FROM ./q4_0.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
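
If you script your imports, the `Modelfile` can be generated rather than written by hand. The following is a minimal sketch: the helper name `write_modelfile` is hypothetical, and the two `PARAMETER stop` lines are an assumption about what suits a Mistral-style instruct template, not something this guide prescribes.

```shell
# Hypothetical helper: write a Modelfile for a quantized model file.
# $1: path to the weights (e.g. ./q4_0.bin), $2: output path (defaults to ./Modelfile)
write_modelfile() {
  cat > "${2:-Modelfile}" <<EOF
FROM $1
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
EOF
}
```

For example, `write_modelfile ./q4_0.bin` writes a `Modelfile` in the current directory. Adjust the template and stop sequences to match the prompt format your model was trained with.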

### Step 4: Create an Ollama model

Finally, create a model from your `Modelfile`:

```
ollama create example -f Modelfile
```

Next, test the model with `ollama run`:

```
ollama run example "What is your favourite condiment?"
```

### Step 5: Publish your model (optional - in alpha)

Publishing models is in early alpha. If you'd like to publish your model to share with others, follow these steps:

1. Create [an account](https://ollama.ai/signup)
2. Ollama uses SSH keys similar to Git. Find your public key with `cat ~/.ollama/id_ed25519.pub` and copy it to your clipboard.
3. Add your public key to your [Ollama account](https://ollama.ai/settings/keys)

Next, copy your model to your username's namespace:

```
ollama cp example <your username>/example
```

Then push the model:

```
ollama push <your username>/example
```

After publishing, your model will be available at `https://ollama.ai/<your username>/example`
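
The copy and push commands above can be combined into one step. This is only a convenience sketch. The function name `publish_model` is hypothetical, and it assumes your public key has already been added to your account.

```shell
# Hypothetical helper: copy a local model into a user namespace, then push it.
# $1: local model name, $2: ollama.ai username
publish_model() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: publish_model <model> <username>" >&2
    return 2
  fi
  # Only push if the copy succeeded
  ollama cp "$1" "$2/$1" && ollama push "$2/$1"
}
```

For example, `publish_model example <your username>` runs the same two commands shown above.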

## Quantization reference

The quantization options are as follows (from highest to lowest level of quantization). Note: some architectures, such as Falcon, do not support K quants.

- `q2_K`
- `q3_K`
- `q3_K_S`
- `q3_K_M`
- `q3_K_L`
- `q4_0` (recommended)
- `q4_1`
- `q4_K`
- `q4_K_S`
- `q4_K_M`
- `q5_0`
- `q5_1`
- `q5_K`
- `q5_K_S`
- `q5_K_M`
- `q6_K`
- `q8_0`
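
To catch a mistyped level before starting a long conversion, the list above can be checked up front. In this sketch, `is_supported_quant` and `quantize_with` are hypothetical names, and the Docker invocation is the one from Step 2 with the level substituted. It does not check whether a given architecture supports K quants.

```shell
# Hypothetical helper: succeed only if $1 is one of the levels listed above.
is_supported_quant() {
  case "$1" in
    q2_K|q3_K|q3_K_S|q3_K_M|q3_K_L) return 0 ;;
    q4_0|q4_1|q4_K|q4_K_S|q4_K_M) return 0 ;;
    q5_0|q5_1|q5_K|q5_K_S|q5_K_M) return 0 ;;
    q6_K|q8_0) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper around the Docker command from Step 2.
quantize_with() {
  if ! is_supported_quant "$1"; then
    echo "unsupported quantization level: $1" >&2
    return 1
  fi
  docker run --rm -v .:/model ollama/quantize -q "$1" /model
}
```

Run from the model directory, `quantize_with q5_K_M` would start the conversion with that level, while an unknown level fails immediately without invoking Docker.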

## Manually converting & quantizing models

### Prerequisites

Start by cloning the `llama.cpp` repo to your machine in another directory:

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
```

Next, install the Python dependencies:

```
pip install -r requirements.txt
```

Finally, build the `quantize` tool:

```
make quantize
```

### Convert the model

Run the correct conversion script for your model architecture:

```shell
# LlamaForCausalLM or MistralForCausalLM
python convert.py <path to model directory>

# FalconForCausalLM
python convert-falcon-hf-to-gguf.py <path to model directory>

# GPTNeoXForCausalLM
python convert-gptneox-hf-to-gguf.py <path to model directory>

# GPTBigCodeForCausalLM
python convert-starcoder-hf-to-gguf.py <path to model directory>
```
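
If you convert models of several architectures, the choice of script can be automated. The sketch below uses a hypothetical helper, `convert_script_for`. The script names are assumed to match your `llama.cpp` checkout (in particular, it assumes a dedicated `convert-gptneox-hf-to-gguf.py` exists for GPT-NeoX), so verify them before relying on it.

```shell
# Hypothetical helper: map a config.json architecture name to a llama.cpp conversion script.
# Script names are assumptions about your llama.cpp checkout; confirm that they exist.
convert_script_for() {
  case "$1" in
    LlamaForCausalLM|MistralForCausalLM) echo "convert.py" ;;
    FalconForCausalLM)                   echo "convert-falcon-hf-to-gguf.py" ;;
    GPTNeoXForCausalLM)                  echo "convert-gptneox-hf-to-gguf.py" ;;
    GPTBigCodeForCausalLM)               echo "convert-starcoder-hf-to-gguf.py" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}
```

It can then be used as `python "$(convert_script_for MistralForCausalLM)" <path to model directory>`. An unrecognized architecture prints an error and returns a non-zero status instead of guessing.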

### Quantize the model

```
quantize <path to model dir>/ggml-model-f32.bin <path to model dir>/q4_0.bin q4_0
```