# Import a model

This guide walks through importing a GGUF, PyTorch or Safetensors model.

## Importing (GGUF)

### Step 1: Write a `Modelfile`

Start by creating a `Modelfile`. This file is the blueprint for your model, specifying weights, parameters, prompt templates and more.

```
FROM ./mistral-7b-v0.1.Q4_0.gguf
```

(Optional) Many chat models require a prompt template to answer correctly. A default prompt template can be specified with the `TEMPLATE` instruction in the `Modelfile`:

```
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
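
The `Modelfile` also accepts other optional instructions, such as a default system message or sampling parameters. A minimal sketch, where the system message and parameter values are illustrative placeholders rather than recommendations:

```
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"

# Optional: default system message and sampling parameters (placeholder values)
SYSTEM "You are a concise, helpful assistant."
PARAMETER temperature 0.7
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```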

### Step 2: Create the Ollama model

Next, create a model from your `Modelfile`:

```
ollama create example -f Modelfile
```
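
If the create step succeeds, the new model should show up in your local model list:

```
ollama list
```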

### Step 3: Run your model

Finally, test the model with `ollama run`:

```
ollama run example "What is your favourite condiment?"
```

## Importing (PyTorch & Safetensors)

> Importing from PyTorch and Safetensors is a longer process than importing from GGUF. Improvements that make it easier are a work in progress.

### Setup

First, clone the `ollama/ollama` repo:

```
git clone git@github.com:ollama/ollama.git ollama
cd ollama
```

and then fetch its `llama.cpp` submodule:

```shell
git submodule init
git submodule update llm/llama.cpp
```

Next, install the Python dependencies:

```
python3 -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt
```

Then build the `quantize` tool:

```
make -C llm/llama.cpp quantize
```

### Clone the HuggingFace repository (optional)

If the model is currently hosted in a HuggingFace repository, first clone that repository to download the raw model.

Install [Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage), verify it's installed, and then clone the model's repository:

```
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 model
```

### Convert the model

> Note: some model architectures require using specific convert scripts. For example, Qwen models require running `convert-hf-to-gguf.py` instead of `convert.py`.

```
python llm/llama.cpp/convert.py ./model --outtype f16 --outfile converted.bin
```
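
For architectures that need the HF-specific script (such as Qwen, per the note above), the invocation is similar; a sketch whose flags may vary with the `llama.cpp` version vendored in the repo:

```
python llm/llama.cpp/convert-hf-to-gguf.py ./model --outtype f16 --outfile converted.bin
```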

### Quantize the model

```
llm/llama.cpp/quantize converted.bin quantized.bin q4_0
```

### Write a `Modelfile`

Next, create a `Modelfile` for your model:

```
FROM quantized.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```

### Create the Ollama model

Next, create a model from your `Modelfile`:

```
ollama create example -f Modelfile
```

### Run your model

Finally, test the model with `ollama run`:

```
ollama run example "What is your favourite condiment?"
```

## Publishing your model (optional – early alpha)

Publishing models is in early alpha. If you'd like to publish your model to share with others, follow these steps:

1. Create [an account](https://ollama.com/signup)
2. Run `cat ~/.ollama/id_ed25519.pub` to view your Ollama public key. Copy this to the clipboard.
3. Add your public key to your [Ollama account](https://ollama.com/settings/keys)

Next, copy your model to your username's namespace:

```
ollama cp example <your username>/example
```

Then push the model:

```
ollama push <your username>/example
```

After publishing, your model will be available at `https://ollama.com/<your username>/example`.
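
Anyone with Ollama installed should then be able to pull and run it by name, for example:

```
ollama run <your username>/example
```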

## Quantization reference

The quantization options are as follows (from the highest to the lowest level of quantization). Note: some architectures, such as Falcon, do not support K quants.

- `q2_K`
- `q3_K`
- `q3_K_S`
- `q3_K_M`
- `q3_K_L`
- `q4_0` (recommended)
- `q4_1`
- `q4_K`
- `q4_K_S`
- `q4_K_M`
- `q5_0`
- `q5_1`
- `q5_K`
- `q5_K_S`
- `q5_K_M`
- `q6_K`
- `q8_0`
- `f16`
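
To use one of these levels, pass it as the last argument to the `quantize` tool built earlier. For example, a `q5_K_M` variant of the model converted above:

```
llm/llama.cpp/quantize converted.bin quantized.bin q5_K_M
```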