From 6002cebd2c379072a254e468e048870767d95d43 Mon Sep 17 00:00:00 2001
From: Jeffrey Morgan
Date: Sun, 15 Oct 2023 00:11:51 -0400
Subject: [PATCH] `import.md`: convert and quantize docs

---
 docs/import.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/import.md b/docs/import.md
index ffcba07c..2781a667 100644
--- a/docs/import.md
+++ b/docs/import.md
@@ -25,11 +25,11 @@ cd Mistral-7B-Instruct-v0.1
 
 ### Step 2: Convert and quantize
 
-- Install [Docker](https://www.docker.com/get-started/)
+A [Docker image](https://hub.docker.com/r/ollama/quantize) with the tooling required to convert and quantize models is available.
 
-Until Ollama supports conversion and quantization as a built-in feature, a [Docker image](https://hub.docker.com/r/ollama/quantize) with this tooling built-in is available.
+First, Install [Docker](https://www.docker.com/get-started/).
 
-To convert and quantize your model, run:
+Next, to convert and quantize your model, run:
 
 ```
 docker run --rm -v .:/model ollama/quantize -q q4_0 /model
@@ -38,7 +38,7 @@ docker run --rm -v .:/model ollama/quantize -q q4_0 /model
 This will output two files into the directory:
 
 - `f16.bin`: the model converted to GGUF
-- `q4_0.bin` the model quantized to a 4-bit quantization
+- `q4_0.bin` the model quantized to a 4-bit quantization (we will use this file to create the Ollama model)
 
 ### Step 3: Write a `Modelfile`
 
@@ -142,16 +142,16 @@ Run the correct conversion script for your model architecture:
 
 ```shell
 # LlamaForCausalLM or MistralForCausalLM
-python3 convert.py
+python convert.py
 
 # FalconForCausalLM
-python3 convert-falcon-hf-to-gguf.py
+python convert-falcon-hf-to-gguf.py
 
 # GPTNeoXForCausalLM
-python3 convert-falcon-hf-to-gguf.py
+python convert-falcon-hf-to-gguf.py
 
 # GPTBigCodeForCausalLM
-python3 convert-starcoder-hf-to-gguf.py
+python convert-starcoder-hf-to-gguf.py
 ```
 
 ### Quantize the model