# Import a model

This guide walks through importing a GGUF, PyTorch or Safetensors model.

## Importing (GGUF)

### Step 1: Write a `Modelfile`

Start by creating a `Modelfile`. This file is the blueprint for your model, specifying weights, parameters, prompt templates and more.

```
FROM ./mistral-7b-v0.1.Q4_0.gguf
```

(Optional) Many chat models require a prompt template in order to answer correctly. A default prompt template can be specified with the `TEMPLATE` instruction in the `Modelfile`:

```
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
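
Other `Modelfile` instructions can be added in the same way. As a sketch, the `SYSTEM` message and `PARAMETER` values below are illustrative placeholders, not settings this model requires:

```
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"

# Illustrative defaults; tune these for your model.
SYSTEM "You are a concise, helpful assistant."
PARAMETER temperature 0.7
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```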

### Step 2: Create the Ollama model

Next, create a model from your `Modelfile`:

```
ollama create example -f Modelfile
```
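
If the create step succeeded, the new model should appear when you list your local models:

```
ollama list
```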

### Step 3: Run your model

Finally, test the model with `ollama run`:

```
ollama run example "What is your favourite condiment?"
```

## Importing (PyTorch & Safetensors)

> Importing from PyTorch and Safetensors is a longer process than importing from GGUF. Improvements that make it easier are a work in progress.

### Setup

First, clone the `ollama/ollama` repo:

```
git clone git@github.com:ollama/ollama.git ollama
cd ollama
```

and then fetch its `llama.cpp` submodule:

```shell
git submodule init
git submodule update llm/llama.cpp
```
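
To confirm the submodule was fetched, you can check its status (a `-` prefix in the output means it is still uninitialized):

```
git submodule status llm/llama.cpp
```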

Next, install the Python dependencies:

```
python3 -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt
```
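
As a quick sanity check that the environment is ready, you can ask the conversion script for its usage (this assumes the virtual environment is still active):

```
python3 llm/llama.cpp/convert.py --help
```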

Then build the `quantize` tool:

```
make -C llm/llama.cpp quantize
```
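
Running the freshly built binary with no arguments should print a usage message (the exact text varies across `llama.cpp` versions), which confirms the build succeeded:

```
llm/llama.cpp/quantize
```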

### Clone the HuggingFace repository (optional)

If the model is currently hosted in a HuggingFace repository, first clone that repository to download the raw model.

Install [Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage), verify it's installed, and then clone the model's repository:

```
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 model
```
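
Since the weights live in Git LFS, it's worth confirming they were actually downloaded rather than left as pointer files. With a standard Git LFS setup, `git lfs ls-files` marks fully downloaded files with `*` and pointers (which still need `git lfs pull`) with `-`:

```
cd model
git lfs ls-files
cd ..
```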

### Convert the model

> Note: some model architectures require using specific convert scripts. For example, Qwen models require running `convert-hf-to-gguf.py` instead of `convert.py`.

```
python llm/llama.cpp/convert.py ./model --outtype f16 --outfile converted.bin
```
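
For architectures that need the alternate script, such as Qwen per the note above, the invocation is analogous (flag names may differ slightly between `llama.cpp` versions):

```
python llm/llama.cpp/convert-hf-to-gguf.py ./model --outtype f16 --outfile converted.bin
```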

### Quantize the model

```
llm/llama.cpp/quantize converted.bin quantized.bin q4_0
```
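
The final argument selects the quantization level; see the reference at the end of this guide. For example, to trade a larger file for somewhat higher fidelity, `q5_K_M` could be used instead:

```
llm/llama.cpp/quantize converted.bin quantized.bin q5_K_M
```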

### Write a `Modelfile`

Next, create a `Modelfile` for your model:

```
FROM quantized.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```

### Create the Ollama model

Next, create a model from your `Modelfile`:

```
ollama create example -f Modelfile
```

### Run your model

Finally, test the model with `ollama run`:

```
ollama run example "What is your favourite condiment?"
```

## Publishing your model (optional – early alpha)

Publishing models is in early alpha. If you'd like to publish your model to share with others, follow these steps:

1. Create [an account](https://ollama.ai/signup)
2. Run `cat ~/.ollama/id_ed25519.pub` to view your Ollama public key. Copy this to the clipboard.
3. Add your public key to your [Ollama account](https://ollama.ai/settings/keys)

Next, copy your model to your username's namespace:

```
ollama cp example <your username>/example
```

Then push the model:

```
ollama push <your username>/example
```

After publishing, your model will be available at `https://ollama.ai/<your username>/example`.
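
If the published model is public, anyone (including you, from another machine) should then be able to run it directly, which pulls it on first use:

```
ollama run <your username>/example "What is your favourite condiment?"
```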

## Quantization reference

The quantization options are as follows (from highest to lowest level of quantization). Note: some architectures, such as Falcon, do not support K quants.

- `q2_K`
- `q3_K`
- `q3_K_S`
- `q3_K_M`
- `q3_K_L`
- `q4_0` (recommended)
- `q4_1`
- `q4_K`
- `q4_K_S`
- `q4_K_M`
- `q5_0`
- `q5_1`
- `q5_K`
- `q5_K_S`
- `q5_K_M`
- `q6_K`
- `q8_0`
- `f16`