# Ollama

Ollama is a tool for running large language models on your machine. It's designed to be easy to use and fast, supporting as many models as possible by using the fastest loader available for your platform and model.
> Note: this project is a work in progress. Certain models that can be run with `ollama` are intended for research and/or non-commercial use only.
## Install

Using `pip`:

```shell
pip install ollama
```

Using `docker`:

```shell
docker run ollama/ollama
```
## Quickstart

To run a model, use `ollama run`:

```shell
ollama run orca-mini-3b
```

You can also run models from Hugging Face:

```shell
ollama run huggingface.co/TheBloke/orca_mini_3B-GGML
```

Or directly from a downloaded model file:

```shell
ollama run ~/Downloads/orca-mini-13b.ggmlv3.q4_0.bin
```
## Python SDK

### Example

```python
import ollama
ollama.generate("orca-mini-3b", "hi")
```

### `ollama.generate(model, message)`

Generate a completion:

```python
ollama.generate("./llama-7b-ggml.bin", "hi")
```
### `ollama.models()`

List available local models:

```python
models = ollama.models()
```
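A short sketch combining `models()` with `generate`. It assumes `models()` returns a list of model name strings, which is not spelled out above:

```python
import ollama

# List local models and run a prompt against the first one.
# Assumes models() returns a list of model name strings.
local = ollama.models()
if local:
    print("Available models:", local)
    print(ollama.generate(local[0], "hi"))
else:
    print("No local models found; try ollama.pull() first.")
```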
### `ollama.load(model)`

Manually load a model for generation:

```python
ollama.load("model")
```
### `ollama.unload(model)`

Unload a model:

```python
ollama.unload("model")
```
### `ollama.pull(model)`

Download a model:

```python
ollama.pull("huggingface.co/thebloke/llama-7b-ggml")
```
## Coming Soon

### `ollama.search("query")`

Search for compatible models that Ollama can run:

```python
ollama.search("llama-7b")
```