import.md 2.23 KB
Newer Older
Michael Yang's avatar
Michael Yang committed
1
# Import
2

Michael Yang's avatar
Michael Yang committed
3
GGUF models and select Safetensors models can be imported directly into Ollama.
4

Michael Yang's avatar
Michael Yang committed
5
## Import GGUF
Jeffrey Morgan's avatar
Jeffrey Morgan committed
6

Michael Yang's avatar
Michael Yang committed
7
A binary GGUF file can be imported directly into Ollama through a Modelfile.
Jeffrey Morgan's avatar
Jeffrey Morgan committed
8

Michael Yang's avatar
Michael Yang committed
9
10
```dockerfile
FROM /path/to/file.gguf
Jeffrey Morgan's avatar
Jeffrey Morgan committed
11
12
```

Michael Yang's avatar
Michael Yang committed
13
## Import Safetensors
Jeffrey Morgan's avatar
Jeffrey Morgan committed
14

Michael Yang's avatar
Michael Yang committed
15
If the model being imported is one of these architectures, it can be imported directly into Ollama through a Modelfile:
Jeffrey Morgan's avatar
Jeffrey Morgan committed
16

Michael Yang's avatar
Michael Yang committed
17
18
 - LlamaForCausalLM
 - MistralForCausalLM
Michael Yang's avatar
Michael Yang committed
19
 - MixtralForCausalLM
Michael Yang's avatar
Michael Yang committed
20
 - GemmaForCausalLM
Michael Yang's avatar
Michael Yang committed
21
 - Phi3ForCausalLM
Jeffrey Morgan's avatar
Jeffrey Morgan committed
22

Michael Yang's avatar
Michael Yang committed
23
24
```dockerfile
FROM /path/to/safetensors/directory
Jeffrey Morgan's avatar
Jeffrey Morgan committed
25
26
```

Michael Yang's avatar
Michael Yang committed
27
For architectures not directly convertable by Ollama, see llama.cpp's [guide](https://github.com/ggerganov/llama.cpp/blob/master/README.md#prepare-and-quantize) on conversion. After conversion, see [Import GGUF](#import-gguf).
Jeffrey Morgan's avatar
Jeffrey Morgan committed
28

Michael Yang's avatar
Michael Yang committed
29
## Automatic Quantization
Jeffrey Morgan's avatar
Jeffrey Morgan committed
30

Michael Yang's avatar
Michael Yang committed
31
32
> [!NOTE]
> Automatic quantization requires v0.1.35 or higher.
Jeffrey Morgan's avatar
Jeffrey Morgan committed
33

Michael Yang's avatar
Michael Yang committed
34
Ollama is capable of quantizing FP16 or FP32 models to any of the supported quantizations with the `-q/--quantize` flag in `ollama create`.
Jeffrey Morgan's avatar
Jeffrey Morgan committed
35

Michael Yang's avatar
Michael Yang committed
36
37
```dockerfile
FROM /path/to/my/gemma/f16/model
38
```
Jeffrey Morgan's avatar
Jeffrey Morgan committed
39

40
```shell
Michael Yang's avatar
Michael Yang committed
41
42
43
44
45
46
47
$ ollama create -q Q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
48
49
```

Michael Yang's avatar
Michael Yang committed
50
### Supported Quantizations
51

Michael Yang's avatar
Michael Yang committed
52
53
54
55
56
- `Q4_0`
- `Q4_1`
- `Q5_0`
- `Q5_1`
- `Q8_0`
57

Jeffrey Morgan's avatar
Jeffrey Morgan committed
58
#### K-means Quantizations
59

Michael Yang's avatar
Michael Yang committed
60
61
62
63
64
65
66
67
- `Q3_K_S`
- `Q3_K_M`
- `Q3_K_L`
- `Q4_K_S`
- `Q4_K_M`
- `Q5_K_S`
- `Q5_K_M`
- `Q6_K`
68

Michael Yang's avatar
Michael Yang committed
69
## Template Detection
70

Michael Yang's avatar
Michael Yang committed
71
72
> [!NOTE]
> Template detection requires v0.1.42 or higher.
73

Michael Yang's avatar
Michael Yang committed
74
Ollama uses model metadata, specifically `tokenizer.chat_template`, to automatically create a template appropriate for the model you're importing.
75

Michael Yang's avatar
Michael Yang committed
76
77
```dockerfile
FROM /path/to/my/gemma/model
78
79
```

Michael Yang's avatar
Michael Yang committed
80
81
82
83
84
85
86
87
```shell
$ ollama create mymodel
transferring model data
using autodetected template gemma-instruct
creating new layer sha256:baa2a0edc27d19cc6b7537578a9a7ba1a4e3214dc185ed5ae43692b319af7b84
creating new layer sha256:ba66c3309914dbef07e5149a648fd1877f030d337a4f240d444ea335008943cb
writing manifest
success
88
89
```

Michael Yang's avatar
Michael Yang committed
90
Defining a template in the Modelfile will disable this feature which may be useful if you want to use a different template than the autodetected one.