# Import a model
This guide walks through importing a GGUF, PyTorch or Safetensors model.
## Importing (GGUF)
### Step 1: Write a `Modelfile`
Start by creating a `Modelfile`. This file is the blueprint for your model, specifying weights, parameters, prompt templates and more.
```
FROM ./mistral-7b-v0.1.Q4_0.gguf
```
(Optional) Many chat models require a prompt template in order to answer correctly. A default prompt template can be specified with the `TEMPLATE` instruction in the `Modelfile`:
```
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
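If you prefer to script this step, the `Modelfile` above can be written from the shell with a here-document. This is a minimal sketch that assumes the GGUF file sits in the current directory under the file name used above; adjust the `FROM` path for your own weights.

```shell
# Write the Modelfile shown above into the current directory.
# The quoted 'EOF' delimiter disables shell expansion inside the here-document,
# so the template text is written to the file exactly as typed.
cat > Modelfile <<'EOF'
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
EOF

# Print the result to confirm both instructions are present.
cat Modelfile
```

Quoting the delimiter is a defensive choice: it keeps the file contents literal even if a template later contains characters such as `$` or backticks.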
### Step 2: Create the Ollama model
Next, create a model from your `Modelfile`:
```
ollama create example -f Modelfile
```
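One way to catch a mistyped weights path before running `ollama create` is to check that the file named on the `FROM` line exists. The helpers below are hypothetical (they are not part of Ollama) and assume that `FROM` holds a local file path, as in this guide, and that a relative path is resolved from the directory you run the check in.

```shell
# Hypothetical helper (not part of Ollama): print the value of the first
# FROM instruction in the Modelfile given as $1.
modelfile_from() {
  sed -n 's/^FROM[[:space:]][[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical helper: succeed only if that value names an existing file.
# Assumes FROM holds a local path rather than a model name.
weights_exist() {
  [ -f "$(modelfile_from "$1")" ]
}
```

For example, `weights_exist Modelfile && ollama create example -f Modelfile` runs the create step only when the check passes.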
### Step 3: Run your model
Finally, test the model with `ollama run`:
```
ollama run example "What is your favourite condiment?"
```
## Importing (PyTorch & Safetensors)
> Importing from PyTorch and Safetensors is a longer process than importing from GGUF. Improvements that make it easier are a work in progress.
### Setup
First, clone the `ollama/ollama` repo:
```
git clone git@github.com:ollama/ollama.git ollama
cd ollama
```
and then fetch its `llama.cpp` submodule:
```shell
git submodule init
git submodule update llm/llama.cpp
```
Next, install the Python dependencies:
```
python3 -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt
```
Then build the `quantize` tool:
```
make -C llm/llama.cpp quantize
```
### Clone the HuggingFace repository (optional)
If the model is currently hosted in a HuggingFace repository, first clone that repository to download the raw model.
Install [Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage), verify it's installed, and then clone the model's repository:
```
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 model
```
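If Git LFS was not active during the clone, the large weight files are left as small text pointer stubs instead of the real data. The sketch below is one way to detect that. It relies on Git LFS pointer files beginning with a `version https://git-lfs.github.com/spec/...` line; the helper names are hypothetical, and the `.safetensors` and `.bin` extensions are an assumption about how the weights are named.

```shell
# Hypothetical helper: succeed if $1 is still a Git LFS pointer stub rather
# than the real file. Pointer stubs are small text files whose first line
# starts with "version https://git-lfs.github.com/spec/".
is_lfs_pointer() {
  head -c 200 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/'
}

# Hypothetical helper: print every weight file directly under directory $1
# that is still a pointer stub.
list_lfs_pointers() {
  for f in "$1"/*.safetensors "$1"/*.bin; do
    [ -f "$f" ] || continue   # skip glob patterns that matched nothing
    if is_lfs_pointer "$f"; then
      echo "$f"
    fi
  done
}
```

Running `list_lfs_pointers model` after the clone should print nothing; any path it prints is a file that still needs to be fetched, for example with `git lfs pull` inside the `model` directory.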
### Convert the model
> Note: some model architectures require specific convert scripts. For example, Qwen models require running `convert-hf-to-gguf.py` instead of `convert.py`.
```
python llm/llama.cpp/convert.py ./model --outtype f16 --outfile converted.bin
```
### Quantize the model
```
llm/llama.cpp/quantize converted.bin quantized.bin q4_0
```
### Write a `Modelfile`
Next, create a `Modelfile` for your model:
```
FROM quantized.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
### Create the Ollama model
Then create a model from your `Modelfile`:
```
ollama create example -f Modelfile
```
### Run your model
Finally, test the model with `ollama run`:
```
ollama run example "What is your favourite condiment?"
```
## Publishing your model (optional – early alpha)
Publishing models is in early alpha. If you'd like to publish your model to share with others, follow these steps:
1. Create [an account](https://ollama.com/signup)
2. Run `cat ~/.ollama/id_ed25519.pub` (or `type %USERPROFILE%\.ollama\id_ed25519.pub` on Windows) to view your Ollama public key. Copy this to the clipboard.
3. Add your public key to your [Ollama account](https://ollama.com/settings/keys)
Next, copy your model to your username's namespace:
```
ollama cp example <your username>/example
```
Then push the model:
```
ollama push <your username>/example
```
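The copy and push steps share the same namespaced name, so it can help to build it once and review the commands before running them. This is a minimal sketch: `namespaced_model` is a hypothetical helper, and `jdoe` is a placeholder username to replace with your own.

```shell
# Hypothetical helper: build the namespaced model name "<username>/<model>".
namespaced_model() {
  printf '%s/%s\n' "$1" "$2"
}

# Placeholder value; replace "jdoe" with your own Ollama username.
OLLAMA_USERNAME="jdoe"
TARGET=$(namespaced_model "$OLLAMA_USERNAME" example)

# Print the two commands from this section for review instead of running them.
echo "ollama cp example $TARGET"
echo "ollama push $TARGET"
```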
After publishing, your model will be available at `https://ollama.com/<your username>/example`.
## Quantization reference
The quantization options are as follows (from highest to lowest level of quantization). Note: some architectures, such as Falcon, do not support K quants.
- `q2_K`
- `q3_K`
- `q3_K_S`
- `q3_K_M`
- `q3_K_L`
- `q4_0` (recommended)
- `q4_1`
- `q4_K`
- `q4_K_S`
- `q4_K_M`
- `q5_0`
- `q5_1`
- `q5_K`
- `q5_K_S`
- `q5_K_M`
- `q6_K`
- `q8_0`
- `f16`
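
A typo in the quantization type is easy to make, so a script that wraps the `quantize` step may want to validate the name first. The sketch below checks a value against the list above; `is_known_quant` is a hypothetical helper, and the match is case-sensitive to mirror the spelling used in this reference.

```shell
# Quantization types copied from the reference list in this guide.
QUANT_TYPES="q2_K q3_K q3_K_S q3_K_M q3_K_L q4_0 q4_1 q4_K q4_K_S q4_K_M q5_0 q5_1 q5_K q5_K_S q5_K_M q6_K q8_0 f16"

# Hypothetical helper: succeed if $1 is one of the listed types.
# Padding both sides with spaces makes the match apply to whole names only.
is_known_quant() {
  case " $QUANT_TYPES " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_known_quant q4_0 && llm/llama.cpp/quantize converted.bin quantized.bin q4_0` runs the quantize step only for a listed type. The check does not account for architectures that lack K-quant support.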