# Import

GGUF models and select Safetensors models can be imported directly into Ollama.

## Import GGUF

A binary GGUF file can be imported directly into Ollama through a Modelfile.

```dockerfile
FROM /path/to/file.gguf
```
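With the Modelfile saved, the model can be built and run using the standard `ollama create` and `ollama run` commands (the model name `mymodel` and the Modelfile path here are illustrative):

```shell
# Build the model from the Modelfile in the current directory
ollama create mymodel -f Modelfile

# Chat with the imported model
ollama run mymodel
```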

## Import Safetensors

If the model being imported is one of these architectures, it can be imported directly into Ollama through a Modelfile:

 - LlamaForCausalLM
 - MistralForCausalLM
 - GemmaForCausalLM

```dockerfile
FROM /path/to/safetensors/directory
```

For architectures not directly convertible by Ollama, see llama.cpp's [guide](https://github.com/ggerganov/llama.cpp/blob/master/README.md#prepare-and-quantize) on conversion. After conversion, see [Import GGUF](#import-gguf).
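As a sketch, the llama.cpp conversion step typically looks like the following; the script name and flags vary between llama.cpp versions (older releases ship `convert.py`), and the paths and output filename here are illustrative, so consult the guide above for the exact invocation:

```shell
# Convert a Safetensors checkpoint to GGUF with llama.cpp's conversion script
# (script name varies by llama.cpp version)
python convert-hf-to-gguf.py /path/to/safetensors/directory --outfile model.gguf
```

The resulting `model.gguf` can then be referenced from a `FROM` line as described in [Import GGUF](#import-gguf).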

## Automatic Quantization

> [!NOTE]
> Automatic quantization requires v0.1.35 or higher.

Ollama can quantize FP16 or FP32 models to any of the supported quantizations with the `-q/--quantize` flag in `ollama create`.

```dockerfile
FROM /path/to/my/gemma/f16/model
```

```shell
$ ollama create -q Q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```

### Supported Quantizations

- `Q4_0`
- `Q4_1`
- `Q5_0`
- `Q5_1`
- `Q8_0`

#### K-means Quantizations

- `Q3_K_S`
- `Q3_K_M`
- `Q3_K_L`
- `Q4_K_S`
- `Q4_K_M`
- `Q5_K_S`
- `Q5_K_M`
- `Q6_K`

## Template Detection

> [!NOTE]
> Template detection requires v0.1.42 or higher.

Ollama uses model metadata, specifically `tokenizer.chat_template`, to automatically create a template appropriate for the model you're importing.

```dockerfile
FROM /path/to/my/gemma/model
```

```shell
$ ollama create mymodel
transferring model data
using autodetected template gemma-instruct
creating new layer sha256:baa2a0edc27d19cc6b7537578a9a7ba1a4e3214dc185ed5ae43692b319af7b84
creating new layer sha256:ba66c3309914dbef07e5149a648fd1877f030d337a4f240d444ea335008943cb
writing manifest
success
```

Defining a template in the Modelfile disables this feature, which may be useful if you want to use a different template from the autodetected one.
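For example, a Modelfile that defines its own `TEMPLATE` might look like the following sketch; the template shown assumes a Gemma-style instruct format and is purely illustrative, so adapt it to the model you are importing:

```dockerfile
FROM /path/to/my/gemma/model

# Defining TEMPLATE disables template autodetection
TEMPLATE """<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>
"""
```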