# Import a model
This guide walks through importing a GGUF, PyTorch or Safetensors model.
## Importing (GGUF)
### Step 1: Write a `Modelfile`
Start by creating a `Modelfile`. This file is the blueprint for your model, specifying weights, parameters, prompt templates, and more.
```
FROM ./mistral-7b-v0.1.Q4_0.gguf
```
(Optional) Many chat models require a prompt template in order to answer correctly. A default prompt template can be specified with the `TEMPLATE` instruction in the `Modelfile`:
```
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
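The `Modelfile` can also set default inference parameters and stop sequences. As a sketch, assuming an Instruct-style template (the stop strings and temperature below are illustrative, not required):
```
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
# illustrative defaults; adjust or omit as needed
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER temperature 0.7
```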
### Step 2: Create the Ollama model
Next, create a model from your `Modelfile`:
```
ollama create example -f Modelfile
```
### Step 3: Run your model
Finally, test the model with `ollama run`:
```
ollama run example "What is your favourite condiment?"
```
## Importing (PyTorch & Safetensors)
> Importing from PyTorch and Safetensors is a longer process than importing from GGUF. Improvements that make it easier are a work in progress.
### Setup
First, clone the `ollama/ollama` repo:
```
git clone git@github.com:ollama/ollama.git ollama
cd ollama
```
and then fetch its `llama.cpp` submodule:
```shell
git submodule init
git submodule update llm/llama.cpp
```
Next, install the Python dependencies:
```
python3 -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt
```
Then build the `quantize` tool:
```
make -C llm/llama.cpp quantize
```
### Clone the HuggingFace repository (optional)
If the model is currently hosted in a HuggingFace repository, first clone that repository to download the raw model.
Install [Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage), verify it's installed, and then clone the model's repository:
```
git lfs install
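# optionally verify the installation before cloning
git lfs --version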
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 model
```
### Convert the model
> Note: some model architectures require using specific convert scripts. For example, Qwen models require running `convert-hf-to-gguf.py` instead of `convert.py`.
```
python llm/llama.cpp/convert.py ./model --outtype f16 --outfile converted.bin
```
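For an architecture that needs the HF conversion script (such as Qwen, per the note above), the invocation is similar; this is a sketch, and the exact flags may vary between `llama.cpp` versions:
```
python llm/llama.cpp/convert-hf-to-gguf.py ./model --outtype f16 --outfile converted.bin
```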
### Quantize the model
```
llm/llama.cpp/quantize converted.bin quantized.bin q4_0
```
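The last argument selects the quantization level (see the quantization reference at the end of this guide). For example, to produce a `q5_K_M` build instead, assuming the architecture supports K quants:
```
llm/llama.cpp/quantize converted.bin quantized.bin q5_K_M
```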
### Write a `Modelfile`
Next, create a `Modelfile` for your model:
```
FROM quantized.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
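A default system prompt can also be baked in with the `SYSTEM` instruction, provided the template references it. As a sketch (the template and prompt text here are illustrative):
```
FROM quantized.bin
TEMPLATE "[INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]"
SYSTEM "You are a helpful assistant."
```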
### Create the Ollama model
Next, create a model from your `Modelfile`:
```
ollama create example -f Modelfile
```
### Run your model
Finally, test the model with `ollama run`:
```
ollama run example "What is your favourite condiment?"
```
## Publishing your model (optional – early alpha)
Publishing models is in early alpha. If you'd like to publish your model to share with others, follow these steps:
1. Create [an account](https://ollama.com/signup)
2. Copy your Ollama public key:
- macOS: `cat ~/.ollama/id_ed25519.pub | pbcopy`
2024-02-22 02:08:03 -05:00
- Windows: `type %USERPROFILE%\.ollama\id_ed25519.pub`
- Linux: `cat /usr/share/ollama/.ollama/id_ed25519.pub`
3. Add your public key to your [Ollama account](https://ollama.com/settings/keys)
Next, copy your model to your username's namespace:
```
ollama cp example <your username>/example
```
> Note: model names may only contain lowercase letters, digits, and the characters `.`, `-`, and `_`.
Then push the model:
```
ollama push <your username>/example
```
After publishing, your model will be available at `https://ollama.com/<your username>/example`.
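Others can then pull and run it by name, for example:
```
ollama run <your username>/example
```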
## Quantization reference
The quantization options are as follows (from highest to lowest level of quantization). Note: some architectures, such as Falcon, do not support K quants.
- `q2_K`
- `q3_K`
- `q3_K_S`
- `q3_K_M`
- `q3_K_L`
- `q4_0` (recommended)
- `q4_1`
- `q4_K`
- `q4_K_S`
- `q4_K_M`
- `q5_0`
- `q5_1`
- `q5_K`
- `q5_K_S`
- `q5_K_M`
- `q6_K`
- `q8_0`
- `f16`