# Importing a model

## Table of Contents

  * [Importing a Safetensors adapter](#importing-a-fine-tuned-adapter-from-safetensors-weights)
  * [Importing a Safetensors model](#importing-a-model-from-safetensors-weights)
  * [Importing a GGUF file](#importing-a-gguf-based-model-or-adapter)
  * [Quantizing a model](#quantizing-a-model)
  * [Sharing models on ollama.com](#sharing-your-model-on-ollamacom)

## Importing a fine tuned adapter from Safetensors weights

First, create a `Modelfile` with a `FROM` command pointing at the base model you used for fine-tuning, and an `ADAPTER` command pointing at the directory that contains your Safetensors adapter:

```dockerfile
FROM <base model name>
ADAPTER /path/to/safetensors/adapter/directory
```

Make sure that the `FROM` command uses the same base model that you used to create the adapter; otherwise you will get erratic results. Most fine-tuning frameworks use their own quantization methods, so it's best to use non-quantized (i.e. non-QLoRA) adapters. If your adapter is in the same directory as your `Modelfile`, use `ADAPTER .` to specify the adapter path.

Now run `ollama create` from the directory where the `Modelfile` was created:

```bash
ollama create my-model
```

Lastly, test the model:

```bash
ollama run my-model
```
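
If you import adapters often, the steps above can be wrapped in a small shell function. This is a minimal sketch, not something Ollama ships: `import_adapter` is a hypothetical helper name, and it assumes the `ollama` CLI is on your `PATH`. It overwrites any `Modelfile` in the current directory with a two-line one and then runs `ollama create`, which reads `Modelfile` from the current directory as in the steps above.

```shell
# import_adapter: hypothetical helper (not part of Ollama).
# Usage: import_adapter <base model> <adapter dir> <new model name>
import_adapter() {
  if [ "$#" -ne 3 ]; then
    echo "usage: import_adapter <base model> <adapter dir> <new model name>" >&2
    return 2
  fi
  base=$1
  adapter_dir=$2
  name=$3

  # FROM must name the same base model the adapter was trained on.
  # This overwrites any existing Modelfile in the current directory.
  printf 'FROM %s\nADAPTER %s\n' "$base" "$adapter_dir" > Modelfile

  # Build the model from ./Modelfile.
  ollama create "$name"
}
```

For example, `import_adapter llama3.2 ./my-adapter my-model` followed by `ollama run my-model`, where `llama3.2` and `./my-adapter` stand in for your own base model and adapter directory.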

Ollama supports importing adapters based on several different model architectures including:

  * Llama (including Llama 2, Llama 3, Llama 3.1, and Llama 3.2);
  * Mistral (including Mistral 1, Mistral 2, and Mixtral); and
  * Gemma (including Gemma 1 and Gemma 2)

You can create the adapter with any fine-tuning framework or tool that can output adapters in the Safetensors format, such as:

  * Hugging Face [fine tuning framework](https://huggingface.co/docs/transformers/en/training)
  * [Unsloth](https://github.com/unslothai/unsloth)
  * [MLX](https://github.com/ml-explore/mlx)


## Importing a model from Safetensors weights

First, create a `Modelfile` with a `FROM` command which points to the directory containing your Safetensors weights:

```dockerfile
FROM /path/to/safetensors/directory
```

If you create the `Modelfile` in the same directory as the weights, you can use `FROM .`.

Now run the `ollama create` command from the directory where you created the `Modelfile`:

```shell
ollama create my-model
```

Lastly, test the model:

```shell
ollama run my-model
```
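
Before running `ollama create`, it can save time to confirm that the directory named in `FROM` actually contains Safetensors weights. The function below is a hypothetical pre-flight check, not part of Ollama. It only looks for at least one `*.safetensors` file; it does not check that the architecture is supported or that other files Ollama may need (such as the model's configuration and tokenizer) are present.

```shell
# has_safetensors: hypothetical pre-flight check (not part of Ollama).
# Returns 0 if the directory (default: current directory) contains at least
# one *.safetensors file; prints a message and returns 1 otherwise.
has_safetensors() {
  dir=${1:-.}
  for f in "$dir"/*.safetensors; do
    # With no match the glob stays unexpanded, so check for a real file.
    [ -f "$f" ] && return 0
  done
  echo "no .safetensors files found in $dir" >&2
  return 1
}
```

For example: `has_safetensors /path/to/safetensors/directory && ollama create my-model`.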

Ollama supports importing models for several different architectures including:

  * Llama (including Llama 2, Llama 3, Llama 3.1, and Llama 3.2);
  * Mistral (including Mistral 1, Mistral 2, and Mixtral);
  * Gemma (including Gemma 1 and Gemma 2); and
  * Phi3

This includes importing foundation models as well as any fine-tuned models that have been _fused_ with a foundation model.

## Importing a GGUF based model or adapter

If you have a GGUF based model or adapter, you can import it into Ollama. You can obtain a GGUF model or adapter by:

  * converting a Safetensors model with the `convert_hf_to_gguf.py` script from llama.cpp;
  * converting a Safetensors adapter with the `convert_lora_to_gguf.py` script from llama.cpp; or
  * downloading a model or adapter from a place such as Hugging Face.

To import a GGUF model, create a `Modelfile` containing:

```dockerfile
FROM /path/to/file.gguf
```
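
GGUF files begin with the four ASCII bytes `GGUF`, so a quick sanity check before pointing `FROM` at a downloaded file is to read its header. This is a minimal sketch with a hypothetical helper name; it only catches obviously wrong files (for example, an HTML error page saved with a `.gguf` extension) and does not validate the rest of the file.

```shell
# is_gguf: hypothetical helper (not part of Ollama).
# Returns 0 if the file exists and starts with the GGUF magic bytes "GGUF".
is_gguf() {
  [ -f "$1" ] || return 1
  magic=$(head -c 4 "$1")
  [ "$magic" = "GGUF" ]
}
```

For example: `is_gguf /path/to/file.gguf && ollama create my-model`.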

For a GGUF adapter, create the `Modelfile` with:

```dockerfile
FROM <model name>
ADAPTER /path/to/file.gguf
```

When importing a GGUF adapter, it's important to use the same base model that the adapter was created with. The base model can be:

  * a model from Ollama;
  * a GGUF file; or
  * a Safetensors based model.

Once you have created your `Modelfile`, use the `ollama create` command to build the model.

```shell
ollama create my-model
```

## Quantizing a Model

Quantizing a model allows you to run models faster and with less memory consumption but at reduced accuracy. This allows you to run a model on more modest hardware.

Ollama can quantize FP16 and FP32 based models into different quantization levels using the `-q/--quantize` flag with the `ollama create` command.

First, create a `Modelfile` that points at the FP16 or FP32 based model you wish to quantize.

```dockerfile
FROM /path/to/my/gemma/f16/model
```

Then use `ollama create` to create the quantized model.

```shell
$ ollama create --quantize q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```

### Supported Quantizations

- `q4_0`
- `q4_1`
- `q5_0`
- `q5_1`
- `q8_0`

#### K-quants

- `q3_K_S`
- `q3_K_M`
- `q3_K_L`
- `q4_K_S`
- `q4_K_M`
- `q5_K_S`
- `q5_K_M`
- `q6_K`
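
When producing several quantized variants of one model, a small loop saves typing, and checking each level against the list above catches typos before a long conversion starts. This is a sketch under assumptions: `is_supported_quant` and `quantize_all` are hypothetical helper names, the hard-coded list may fall out of date as Ollama changes, names are matched exactly as written above (Ollama itself may be more lenient about case), and `./Modelfile` is assumed to point at an FP16 or FP32 model. Each variant is created under its own tag, e.g. `mymodel:q4_K_M`.

```shell
# is_supported_quant: hypothetical helper; accepts only the levels listed above.
is_supported_quant() {
  case "$1" in
    q4_0|q4_1|q5_0|q5_1|q8_0) return 0 ;;
    q3_K_S|q3_K_M|q3_K_L|q4_K_S|q4_K_M|q5_K_S|q5_K_M|q6_K) return 0 ;;
    *) return 1 ;;
  esac
}

# quantize_all: hypothetical helper.
# Usage: quantize_all <model name> <level>...
# Validates every level first, then builds one tagged model per level
# from ./Modelfile (assumed to point at an FP16 or FP32 model).
quantize_all() {
  base_name=$1
  shift
  for q in "$@"; do
    if ! is_supported_quant "$q"; then
      echo "unsupported quantization: $q" >&2
      return 1
    fi
  done
  for q in "$@"; do
    ollama create --quantize "$q" "$base_name:$q" || return 1
  done
}
```

For example, `quantize_all mymodel q4_K_M q8_0` builds `mymodel:q4_K_M` and `mymodel:q8_0`.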

## Sharing your model on ollama.com

You can share any model you have created by pushing it to [ollama.com](https://ollama.com) so that other users can try it out.

First, use your browser to go to the [Ollama Sign-Up](https://ollama.com/signup) page. If you already have an account, you can skip this step.

<img src="images/signup.png" alt="Sign-Up" width="40%">

The `Username` field will be used as part of your model's name (e.g. `jmorganca/mymodel`), so make sure you are comfortable with the username that you have selected.

Now that you have created an account and are signed in, go to the [Ollama Keys Settings](https://ollama.com/settings/keys) page.

Follow the directions on the page to determine where your Ollama Public Key is located.

<img src="images/ollama-keys.png" alt="Ollama Keys" width="80%">

Click on the `Add Ollama Public Key` button, and copy and paste the contents of your Ollama Public Key into the text field.

To push a model to [ollama.com](https://ollama.com), first make sure that it is named correctly with your username. You may have to use the `ollama cp` command to copy your model under the correct name. Once you're happy with your model's name, use the `ollama push` command to push it to [ollama.com](https://ollama.com).

```shell
ollama cp mymodel myuser/mymodel
ollama push myuser/mymodel
```
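
The rename-and-push steps above can be combined into one helper. This is a sketch under assumptions: `push_model` is a hypothetical name, your Ollama public key has already been added on the Keys Settings page, and the `ollama` CLI is on your `PATH`. Any namespace already on the local name is replaced with your username, and the copy step is skipped when the name is already correct.

```shell
# push_model: hypothetical helper (not part of Ollama).
# Usage: push_model <ollama.com username> <local model name>
push_model() {
  user=$1
  model=$2
  # Replace any existing namespace, e.g. "other/mymodel" -> "myuser/mymodel".
  target="$user/${model##*/}"
  if [ "$model" != "$target" ]; then
    ollama cp "$model" "$target" || return 1
  fi
  ollama push "$target"
}
```

For example, `push_model myuser mymodel` runs the same `ollama cp` and `ollama push` commands shown above.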

Once your model has been pushed, other users can pull and run it by using the command:

```shell
ollama run myuser/mymodel
```