# Import

GGUF models and select Safetensors models can be imported directly into Ollama.

## Import GGUF

A binary GGUF file can be imported directly into Ollama through a Modelfile.

```dockerfile
FROM /path/to/file.gguf
```
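For example, assuming the Modelfile above is saved as `Modelfile` in the current directory (use `-f` to point at a different file), the model can then be created and run; the name `mymodel` here is just a placeholder:

```shell
$ ollama create mymodel
$ ollama run mymodel
```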
## Import Safetensors

If the model being imported is one of these architectures, it can be imported directly into Ollama through a Modelfile:

- LlamaForCausalLM
- MistralForCausalLM
- GemmaForCausalLM

```dockerfile
FROM /path/to/safetensors/directory
```
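For illustration, a typical Hugging Face-style checkpoint directory looks something like this; exact file names vary by model, but the `config.json` and tokenizer files should be kept alongside the weights:

```shell
$ ls /path/to/safetensors/directory
config.json
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
tokenizer.json
tokenizer_config.json
```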
For architectures not directly convertible by Ollama, see llama.cpp's [guide](https://github.com/ggerganov/llama.cpp/blob/master/README.md#prepare-and-quantize) on conversion. After conversion, see [Import GGUF](#import-gguf).
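As a rough sketch, a conversion with llama.cpp looks like the following; the script name and flags change between llama.cpp releases, so defer to the linked guide for the current invocation:

```shell
$ git clone https://github.com/ggerganov/llama.cpp
$ pip install -r llama.cpp/requirements.txt
$ python llama.cpp/convert-hf-to-gguf.py /path/to/safetensors/directory --outfile converted.gguf
```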
## Automatic Quantization

> [!NOTE]
> Automatic quantization requires v0.1.35 or higher.

Ollama can quantize FP16 or FP32 models to any of the supported quantizations with the `-q/--quantize` flag in `ollama create`.

```dockerfile
FROM /path/to/my/gemma/f16/model
```

```shell
$ ollama create -q Q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```
### Supported Quantizations

<details>
<summary>Legacy Quantization</summary>

- `Q4_0`
- `Q4_1`
- `Q5_0`
- `Q5_1`
- `Q8_0`
</details>

<details>
<summary>K-means Quantization</summary>

- `Q3_K_S`
- `Q3_K_M`
- `Q3_K_L`
- `Q4_K_S`
- `Q4_K_M`
- `Q5_K_S`
- `Q5_K_M`
- `Q6_K`
</details>

> [!NOTE]
> Activation-aware Weight Quantization (i.e. IQ) is not currently supported for automatic quantization; however, you can still import the quantized model into Ollama. See [Import GGUF](#import-gguf).

## Template Detection
> [!NOTE]
> Template detection requires v0.1.42 or higher.

Ollama uses model metadata, specifically `tokenizer.chat_template`, to automatically create a template appropriate for the model you're importing.

```dockerfile
FROM /path/to/my/gemma/model
```

```shell
$ ollama create mymodel
transferring model data
using autodetected template gemma-instruct
creating new layer sha256:baa2a0edc27d19cc6b7537578a9a7ba1a4e3214dc185ed5ae43692b319af7b84
creating new layer sha256:ba66c3309914dbef07e5149a648fd1877f030d337a4f240d444ea335008943cb
writing manifest
success
```
Defining a template in the Modelfile will disable this feature, which may be useful if you want to use a different template than the autodetected one.
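For example, this Modelfile pins its own template rather than relying on detection; the template below is only a sketch of Gemma's instruct format and should be adapted to the target model's actual prompt format:

```dockerfile
FROM /path/to/my/gemma/model
TEMPLATE """<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}"""
```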