add faqs for memory pre-loading and the keep_alive setting (#2601)
parent 2653191222
commit 9a7a4b9533
1 changed file with 34 additions and 0 deletions
34
docs/faq.md
@@ -131,3 +131,37 @@ This can impact both installing Ollama, as well as downloading models.
Open `Control Panel > Networking and Internet > View network status and tasks` and click on `Change adapter settings` on the left panel. Find the `vEthernet (WSL)` adapter, right-click it, and select `Properties`.
Click on `Configure` and open the `Advanced` tab. Search through each of the properties until you find `Large Send Offload Version 2 (IPv4)` and `Large Send Offload Version 2 (IPv6)`. *Disable* both of these
properties.

## How can I pre-load a model to get faster response times?

If you are using the API, you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints.

To preload the mistral model using the generate endpoint, use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "mistral"}'
```

To preload the same model using the chat endpoint, use:
```shell
curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
```
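The two requests above differ only in the endpoint path, so they can be wrapped in a small shell helper. This is a minimal sketch, not part of Ollama: the function names `preload_body` and `preload_model` and the `BASE_URL` variable are hypothetical, and the server address is assumed to be the default one used above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Ollama.

# Assumed server address, matching the examples above.
BASE_URL="${BASE_URL:-http://localhost:11434}"

# Print the empty-request JSON body that preloads the given model.
preload_body() {
  printf '{"model": "%s"}' "$1"
}

# Send the empty request to /api/generate (default) or /api/chat.
# Usage: preload_model MODEL [generate|chat]
preload_model() {
  curl -s "$BASE_URL/api/${2:-generate}" -d "$(preload_body "$1")"
}
```

For example, `preload_model mistral chat` sends the same request as the second command above.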

## How do I keep a model loaded in memory or make it unload immediately?

By default, models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you are making numerous requests to the LLM. You may, however, want to free up the memory before the 5 minutes have elapsed or keep the model loaded indefinitely. Use the `keep_alive` parameter with either the `/api/generate` or `/api/chat` API endpoint to control how long the model is left in memory.

The `keep_alive` parameter can be set to:
* a duration string (such as "10m" or "24h")
* a number in seconds (such as 3600)
* any negative number, which will keep the model loaded in memory indefinitely (e.g. -1 or "-1m")
* `0`, which will unload the model immediately after generating a response

For example, to preload a model and leave it in memory, use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'
```

To unload the model and free up memory, use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
```
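Because `keep_alive` accepts both numbers and duration strings, a script that builds these requests has to decide whether to quote the value. The sketch below shows one way to do that. `keep_alive_body` is a hypothetical helper, not part of Ollama, and its integer check is deliberately simple.

```shell
#!/bin/sh
# Hypothetical helper for illustration; it is not part of Ollama.

# Print a JSON request body with a keep_alive value.
# Plain integers (3600, 0, -1) are emitted as JSON numbers;
# anything else ("10m", "-1m") is emitted as a quoted duration string.
keep_alive_body() {
  model=$1
  keep_alive=$2
  case $keep_alive in
    ''|-|*[!0-9-]*|?*-*)
      # Empty, a lone "-", non-digit characters, or a misplaced "-": quote it.
      printf '{"model": "%s", "keep_alive": "%s"}' "$model" "$keep_alive" ;;
    *)
      # Optional leading "-" followed by digits only: emit as a number.
      printf '{"model": "%s", "keep_alive": %s}' "$model" "$keep_alive" ;;
  esac
}
```

For example, `curl http://localhost:11434/api/chat -d "$(keep_alive_body llama2 10m)"` would ask the server to keep llama2 in memory for ten minutes after the request.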