# Import

GGUF models and select Safetensors models can be imported directly into Ollama.

## Import GGUF

A binary GGUF file can be imported directly into Ollama through a Modelfile.

```dockerfile
FROM /path/to/file.gguf
```
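Once the Modelfile points at the GGUF file, the model can be created and run with the Ollama CLI. A minimal sketch, where `mymodel` is a placeholder name:

```shell
# Build the model from the Modelfile in the current directory,
# then run it interactively. "mymodel" is a placeholder name.
ollama create mymodel -f Modelfile
ollama run mymodel
```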

## Import Safetensors

If the model being imported uses one of the following architectures, it can be imported directly into Ollama through a Modelfile:

 - LlamaForCausalLM
 - MistralForCausalLM
 - GemmaForCausalLM

```dockerfile
FROM /path/to/safetensors/directory
```

For architectures not directly convertible by Ollama, see llama.cpp's [guide](https://github.com/ggerganov/llama.cpp/blob/master/README.md#prepare-and-quantize) on conversion. After conversion, see [Import GGUF](#import-gguf).
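As a rough sketch, converting a Safetensors model with llama.cpp looks like the following. The script name and flags vary between llama.cpp versions, and the paths here are placeholders:

```shell
# Clone llama.cpp and install the Python dependencies its
# conversion script needs.
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# Convert a Hugging Face model directory to a single GGUF file.
# The script name may differ in your llama.cpp version.
python llama.cpp/convert-hf-to-gguf.py /path/to/model/directory --outfile model.gguf
```

The resulting `model.gguf` can then be referenced from a `FROM` line as shown in [Import GGUF](#import-gguf).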

## Automatic Quantization

> [!NOTE]
> Automatic quantization requires v0.1.35 or higher.

Ollama is capable of quantizing FP16 or FP32 models to any of the supported quantizations with the `-q/--quantize` flag in `ollama create`.

```dockerfile
FROM /path/to/my/gemma/f16/model
```

```shell
$ ollama create -q Q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```

### Supported Quantizations

<details>
<summary>Legacy Quantization</summary>

- `Q4_0`
- `Q4_1`
- `Q5_0`
- `Q5_1`
- `Q8_0`

</details>

<details>
<summary>K-means Quantization</summary>

- `Q3_K_S`
- `Q3_K_M`
- `Q3_K_L`
- `Q4_K_S`
- `Q4_K_M`
- `Q5_K_S`
- `Q5_K_M`
- `Q6_K`

</details>

> [!NOTE]
> Activation-aware Weight Quantization (i.e. the `IQ` quantization types) is not currently supported for automatic quantization; however, you can still import an already-quantized model into Ollama. See [Import GGUF](#import-gguf).

## Template Detection

> [!NOTE]
> Template detection requires v0.1.42 or higher.

Ollama uses model metadata, specifically `tokenizer.chat_template`, to automatically create a template appropriate for the model you're importing.

```dockerfile
FROM /path/to/my/gemma/model
```

```shell
$ ollama create mymodel
transferring model data
using autodetected template gemma-instruct
creating new layer sha256:baa2a0edc27d19cc6b7537578a9a7ba1a4e3214dc185ed5ae43692b319af7b84
creating new layer sha256:ba66c3309914dbef07e5149a648fd1877f030d337a4f240d444ea335008943cb
writing manifest
success
```

Defining a template in the Modelfile disables this detection, which may be useful if you want to use a template other than the autodetected one.
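As a sketch, a Modelfile that overrides detection with an explicit template could look like this (the Go-template body below is illustrative only, not a real model's chat template):

```dockerfile
FROM /path/to/my/gemma/model

# An explicit TEMPLATE takes precedence over the autodetected one.
# This template body is a placeholder for illustration.
TEMPLATE """{{ if .System }}{{ .System }}
{{ end }}{{ .Prompt }}"""
```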