applied Mike's comments
Signed-off-by: Matt Williams <m@technovangelist.com>
This commit is contained in:
parent 3c975f898f
commit b2974a7095
2 changed files with 10 additions and 9 deletions

@@ -133,7 +133,7 @@ PARAMETER <parameter> <parametervalue>

### TEMPLATE

-`TEMPLATE` of the full prompt template to be passed into the model. It may include (optionally) a system prompt and a user's prompt. This is used to create a full custom prompt, and syntax may be model specific. You can usually find the template for a given model in the read
+`TEMPLATE` of the full prompt template to be passed into the model. It may include (optionally) a system prompt and a user's prompt. This is used to create a full custom prompt, and syntax may be model specific. You can usually find the template for a given model in the readme for that model.
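
As a rough illustration (not part of this change, and with model-specific tags you should verify against the model's readme), a Llama-2-chat style template could look like the sketch below; `{{ .System }}` and `{{ .Prompt }}` are the template variables described in the next section.

```
# Sketch only: the exact prompt format depends on the model.
TEMPLATE """[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]"""
SYSTEM """You are a helpful assistant."""
```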

#### Template Variables

@@ -4,10 +4,10 @@ Sometimes the model you want to work with is not available at [https://ollama.ai

## Figure out if we can run the model?

-Not all models will work with Ollama. There are a number of factors that go into whether we are able to work with the next cool model. First it has to work with llama.cpp. Then we have to have implemented the features of llama.cpp that it requires. And then, sometimes, even with both of those, the model might not work…
+Not all models will work with Ollama. There are a number of factors that go into whether we are able to work with the next cool model. First it has to work with llama.cpp. Then we have to have implemented the features of llama.cpp that it requires. And then, sometimes, even with both of those, the model might not work...

1. What is the model you want to convert and upload?
-2. Visit the model’s page on HuggingFace.
+2. Visit the model's page on HuggingFace.
3. Switch to the **Files and versions** tab.
4. Click on the **config.json** file. If there is no config.json file, it may not work.
5. Take note of the **architecture** list in the json file.
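
As a quick (hypothetical) way to do steps 4 and 5 from the command line, you can fetch config.json directly and look for the architecture entry; the repository name below is only an example:

```shell
# Pull config.json for an example model and print its "architectures" entry.
curl -s https://huggingface.co/mistralai/Mistral-7B-v0.1/raw/main/config.json | grep -A 2 '"architectures"'
```

For Mistral 7B this should list `MistralForCausalLM`, one of the architectures discussed in the convert section below.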

@@ -25,8 +25,9 @@ At this point there are two processes you can use. You can either use a Docker c

## Convert and Quantize with Docker

-Run `docker run --rm -v /path/to/model/repo:/repo ollama/quantize -q quantlevel /repo`. For instance, if you have downloaded the latest Mistral 7B model, then clone it to your machine. Then change into that directory and you can run
+Run `docker run --rm -v /path/to/model/repo:/repo ollama/quantize -q quantlevel /repo`. For instance, if you have downloaded the latest Mistral 7B model, then clone it to your machine. Then change into that directory and you can run:

```shell
docker run --rm -v .:/repo ollama/quantize -q q4_0 /repo
```
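
If you have not already cloned the weights, a full (illustrative) sequence, assuming `git-lfs` is installed and using Mistral 7B as the example repository, might be:

```shell
# Clone the Hugging Face repo (the weights come down via git-lfs), then quantize it in place.
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-v0.1
cd Mistral-7B-v0.1
docker run --rm -v .:/repo ollama/quantize -q q4_0 /repo
```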

@@ -34,6 +35,8 @@ You can find the different quantization levels below under **Quantize the Model*

This will output two files into the directory. First is a f16.bin file that is the model converted to GGUF. The second file is a q4_0.bin file which is the model quantized to a 4 bit quantization. You should rename it to something more descriptive.
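
For example, with the q4_0 output above (file names are placeholders):

```shell
# Give the quantized model a more descriptive file name.
mv q4_0.bin mistral-7b-v0.1-q4_0.bin
```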

+You can find the repository for the Docker container here: [https://github.com/mxyng/quantize](https://github.com/mxyng/quantize)

## Convert and Quantize Manually

### Clone llama.cpp to your machine

@@ -49,7 +52,7 @@ If we know the model has a chance of working, then we need to convert and quanti

### Convert the model to GGUF

1. Decide on the right convert script to run. What was the model architecture you found in the first section.
   1. LlamaForCausalLM or MistralForCausalLM:
      run `python3 convert.py <modelfilename>`
      No need to specify fp16 or fp32.
   2. FalconForCausalLM or RWForCausalLM:
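
As a sketch of the LlamaForCausalLM/MistralForCausalLM branch above (the path is a placeholder, and the script is run from inside the llama.cpp checkout):

```shell
# convert.py takes the directory containing the downloaded weights and writes an f16 GGUF file next to them.
python3 convert.py ../Mistral-7B-v0.1
```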

@@ -85,11 +88,9 @@ The quantization options are as follows. Note that some architectures such as Fa

- Q5_K_M
- Q6_K
- Q8_0
-- F16
-- F32

Run the following command `quantize <converted model from above> <output file> <quantization type>`
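
For example, continuing with the converted model (file names are illustrative; the `quantize` binary is the one built in the llama.cpp directory):

```shell
# Quantize the f16 GGUF produced by the convert step down to 4-bit (q4_0).
./quantize ggml-model-f16.gguf ggml-model-q4_0.gguf q4_0
```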

## Now Create the Model

Now you can create the Ollama model. Refer to the [modelfile](./modelfile.md) doc for more information on doing that.
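
A minimal sketch of that last step, assuming the quantized file from above and placeholder names, could be:

```shell
# Point a Modelfile at the quantized weights, then build and run the Ollama model.
echo "FROM ./mistral-7b-v0.1-q4_0.bin" > Modelfile
ollama create my-mistral -f Modelfile
ollama run my-mistral "Why is the sky blue?"
```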