
Ollama

Run large language models with llama.cpp.

Note: certain models that can be run with Ollama are intended for research and/or non-commercial use only.

Features

  • Download and run popular large language models
  • Switch between multiple models on the fly
  • Hardware acceleration where available (Metal, CUDA)
  • Fast inference server written in Go, powered by llama.cpp
  • REST API to use with your application (Python and TypeScript SDKs coming soon)

Install

  • Download for macOS
  • Download for Windows (coming soon)

You can also build the binary from source.

Quickstart

Run a fast and simple model.

ollama run orca

Example models

💬 Chat

Have a conversation.

ollama run vicuna "Why is the sky blue?"

🗺️ Instructions

Get a helping hand.

ollama run orca "Write an email to my boss."

🔎 Ask questions about documents

Send the contents of a document and ask questions about it.

ollama run nous-hermes "$(cat input.txt), please summarize this story"
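
The whole prompt must be passed as a single quoted argument, so the instruction can also go before the substituted file contents. For example, using the same input.txt:

ollama run nous-hermes "Summarize the following story: $(cat input.txt)"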

📖 Storytelling

Venture into the unknown.

ollama run nous-hermes "Once upon a time"

Advanced usage

Run a local model

ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin

Building

go build .

To run it, start the server:

./ollama server &
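
The API examples later in this README assume the server's default address, http://localhost:11434. A quick reachability check (just a sketch; any HTTP client works, and the exact response from the root path may vary):

curl -i http://localhost:11434/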

Finally, run a model!

./ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin

API Reference

POST /api/pull

Download a model

curl -X POST http://localhost:11434/api/pull -d '{"model": "orca"}'

POST /api/generate

Complete a prompt

curl -X POST http://localhost:11434/api/generate -d '{"model": "orca", "prompt": "hello!"}'
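
The server streams the completion back as it is generated. A rough sketch of consuming that stream from the shell, assuming newline-delimited JSON chunks with a response field holding the generated text (field names may differ in your build) and that jq is installed:

# Print the generated text as chunks arrive; '.response // empty' skips chunks without that field.
curl -s -X POST http://localhost:11434/api/generate -d '{"model": "orca", "prompt": "hello!"}' | jq -r '.response // empty'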