No description
Find a file
Quinn Slack f4432e1dba
treat stop as stop sequences, not exact tokens (#442)
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.

Fixes https://github.com/jmorganca/ollama/issues/295.
2023-08-30 11:53:42 -04:00
api Merge pull request #428 from jmorganca/mxyng/upload-chunks 2023-08-30 07:47:17 -07:00
app app: package ggml-metal.metal from correct directory 2023-08-17 23:55:45 -04:00
cmd add model IDs (#439) 2023-08-28 20:50:24 -07:00
docs treat stop as stop sequences, not exact tokens (#442) 2023-08-30 11:53:42 -04:00
examples Merge pull request #303 from jmorganca/matt/dockerit 2023-08-16 08:04:34 -07:00
format Generate private/public keypair for use w/ auth (#324) 2023-08-11 10:58:23 -07:00
llm treat stop as stop sequences, not exact tokens (#442) 2023-08-30 11:53:42 -04:00
parser Merge pull request #290 from jmorganca/add-adapter-layers 2023-08-10 17:23:01 -07:00
progressbar vendor in progress bar and change to bytes instead of bibytes (#130) 2023-07-19 17:24:03 -07:00
scripts build release mode 2023-08-22 09:52:43 -07:00
server treat stop as stop sequences, not exact tokens (#442) 2023-08-30 11:53:42 -04:00
vector embed text document in modelfile 2023-08-08 11:27:17 -04:00
version add version 2023-08-22 09:40:58 -07:00
.dockerignore add .env to .dockerignore 2023-08-21 09:32:02 -07:00
.gitignore add ggml-metal.metal to .gitignore 2023-07-28 11:04:21 -04:00
.prettierrc.json move .prettierrc.json to root 2023-07-02 17:34:46 -04:00
Dockerfile fix compilation issue in Dockerfile, remove from README.md until ready 2023-07-11 19:51:08 -07:00
go.mod check memory requirements before loading 2023-08-10 09:23:11 -07:00
go.sum check memory requirements before loading 2023-08-10 09:23:11 -07:00
LICENSE proto -> ollama 2023-06-26 15:57:13 -04:00
main.go set non-zero error code on error 2023-08-14 14:09:58 -07:00
README.md update orca to orca-mini 2023-08-27 13:26:30 -04:00

logo

Ollama

Discord

Run, create, and share large language models (LLMs).

Note: Ollama is in early preview. Please report any issues you find.

Download

Quickstart

To run and chat with Llama 2, the new model by Meta:

ollama run llama2

Model library

Ollama supports a list of open-source models available on ollama.ai/library

Here are some example open-source models that can be downloaded:

Model Parameters Size Download
Llama2 7B 3.8GB ollama pull llama2
Llama2 13B 13B 7.3GB ollama pull llama2:13b
Llama2 70B 70B 39GB ollama pull llama2:70b
Llama2 Uncensored 7B 3.8GB ollama pull llama2-uncensored
Code Llama 7B 3.8GB ollama pull codellama
Orca Mini 3B 1.9GB ollama pull orca-mini
Vicuna 7B 3.8GB ollama pull vicuna
Nous-Hermes 7B 3.8GB ollama pull nous-hermes
Nous-Hermes 13B 13B 7.3GB ollama pull nous-hermes:13b
Wizard Vicuna Uncensored 13B 7.3GB ollama pull wizard-vicuna

Note: You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.

Examples

Run a model

ollama run llama2
>>> hi
Hello! How can I help you today?

For multiline input, you can wrap text with """:

>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.

Create a custom model

Pull a base model:

ollama pull llama2

To update a model to the latest version, run ollama pull llama2 again. The model will be updated (if necessary).

Create a Modelfile:

FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Next, create and run the model:

ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.

For more examples, see the examples directory. For more information on creating a Modelfile, see the Modelfile documentation.

Pull a model from the registry

ollama pull orca-mini

Listing local models

ollama list

Model packages

Overview

Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.

logo

Building

You will also need a C/C++ compiler such as GCC for MacOS and Linux or Mingw-w64 GCC for Windows.

go build .

To run it start the server:

./ollama serve &

Finally, run a model!

./ollama run llama2

REST API

See the API documentation for all endpoints.

Ollama has an API for running and managing models. For example to generate text from a model:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'

Tools using Ollama