# Import

GGUF models and select Safetensors models can be imported directly into Ollama.

## Import GGUF

A binary GGUF file can be imported directly into Ollama through a Modelfile.

```dockerfile
FROM /path/to/file.gguf
```
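
With the Modelfile in place, the model can be created and run with the Ollama CLI. A minimal sketch, assuming the file above is saved as `Modelfile` in the current directory and `mymodel` is the name you want to give it (both are placeholders):

```shell
# Build the model from the Modelfile in the current directory
$ ollama create mymodel -f Modelfile

# Chat with the imported model
$ ollama run mymodel
```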

## Import Safetensors

If the model being imported is one of these architectures, it can be imported directly into Ollama through a Modelfile:

 - LlamaForCausalLM
 - MistralForCausalLM
 - GemmaForCausalLM

```dockerfile
FROM /path/to/safetensors/directory
```

For architectures not directly convertible by Ollama, see llama.cpp's [guide](https://github.com/ggerganov/llama.cpp/blob/master/README.md#prepare-and-quantize) on conversion. After conversion, see [Import GGUF](#import-gguf).
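
As a rough sketch of that workflow, llama.cpp ships a conversion script (named `convert_hf_to_gguf.py` in recent versions; the exact script name and flags vary between releases, so follow the linked guide for your checkout):

```shell
# Convert a Hugging Face / Safetensors model directory to a single GGUF file
# (script name, flags, and supported output types depend on the llama.cpp version)
$ python convert_hf_to_gguf.py /path/to/model/directory --outfile model.gguf --outtype f16
```

The resulting `model.gguf` can then be referenced from a `FROM` line as described in [Import GGUF](#import-gguf).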

## Automatic Quantization

> [!NOTE]
> Automatic quantization requires v0.1.35 or higher.

Ollama is capable of quantizing FP16 or FP32 models to any of the supported quantizations with the `-q/--quantize` flag in `ollama create`.

```dockerfile
FROM /path/to/my/gemma/f16/model
```

```shell
$ ollama create -q Q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```

### Supported Quantizations

- `Q4_0`
- `Q4_1`
- `Q5_0`
- `Q5_1`
- `Q8_0`

#### K-means Quantizations

- `Q3_K_S`
- `Q3_K_M`
- `Q3_K_L`
- `Q4_K_S`
- `Q4_K_M`
- `Q5_K_S`
- `Q5_K_M`
- `Q6_K`

## Template Detection

> [!NOTE]
> Template detection requires v0.1.42 or higher.

Ollama uses model metadata, specifically `tokenizer.chat_template`, to automatically create a template appropriate for the model you're importing.

```dockerfile
FROM /path/to/my/gemma/model
```

```shell
$ ollama create mymodel
transferring model data
using autodetected template gemma-instruct
creating new layer sha256:baa2a0edc27d19cc6b7537578a9a7ba1a4e3214dc185ed5ae43692b319af7b84
creating new layer sha256:ba66c3309914dbef07e5149a648fd1877f030d337a4f240d444ea335008943cb
writing manifest
success
```

Defining a template in the Modelfile will disable this feature, which may be useful if you want to use a different template than the autodetected one.
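
As an illustration, a Modelfile that pins its own template might look like the sketch below. The template body here is a simplified Gemma-style chat format chosen only for the example, not necessarily the one Ollama would autodetect:

```dockerfile
FROM /path/to/my/gemma/model

# An explicit TEMPLATE takes precedence over template autodetection
TEMPLATE """<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>
"""
```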