baalajimaestro/ollama

Jeffrey Morgan b43ddd84be update README.md instructions section

2023-07-08 19:19:31 -04:00

Ollama

Run large language models with llama.cpp.

Note: certain models that can be run with Ollama are intended for research and/or non-commercial use only.

Features

Download and run popular large language models
Switch between multiple models on the fly
Hardware acceleration where available (Metal, CUDA)
Fast inference server written in Go, powered by llama.cpp
REST API to use with your application (Python and TypeScript SDKs coming soon)

Install

Download for macOS
Download for Windows (coming soon)
Docker: docker run -p 11434:11434 ollama/ollama

You can also build the binary from source.

Quickstart

Run a fast and simple model.

ollama run orca

Example models

💬 Chat

Have a conversation.

ollama run vicuna "Why is the sky blue?"

🗺️ Instructions

Get a helping hand.

ollama run orca "Write an email to my boss."

📖 Storytelling

Venture into the unknown.

ollama run nous-hermes "Once upon a time"

Advanced usage

Run a local model

ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin

Building

make

To run it, start the server:

./ollama server &

Finally, run a model!

./ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin

API Reference

`POST /api/pull`

Download a model

curl -X POST http://localhost:11434/api/pull -d '{"model": "orca"}'
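
For use from an application, the same request can be built in Python. This is a minimal sketch using only the standard library. The helper name `build_pull_request` is hypothetical, and the sketch assumes the server listens on port 11434, the port published in the Docker command. The response format is not documented here, so the helper only constructs the request and leaves sending it to the caller.

```python
import json
import urllib.request

# Assumption: the server listens on port 11434, as in the Docker command.
OLLAMA_URL = "http://localhost:11434"


def build_pull_request(model, base_url=OLLAMA_URL):
    """Build (but do not send) a POST /api/pull request for `model`."""
    # Same JSON body as the curl example: {"model": "<name>"}
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/api/pull", data=body, method="POST")


# Hypothetical usage, with the server running:
#   with urllib.request.urlopen(build_pull_request("orca")) as resp:
#       print(resp.status)
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made.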

`POST /api/generate`

Complete a prompt

curl -X POST http://localhost:11434/api/generate -d '{"model": "orca", "prompt": "hello!", "stream": true}'
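
The curl call above can be mirrored from Python as a rough sketch. The function names `build_generate_request` and `stream_generate` are hypothetical. The shape of the streamed response is not documented here, so the sketch assumes only that it arrives as line-oriented text and yields each line unparsed.

```python
import json
import urllib.request


def build_generate_request(model, prompt, stream=True,
                           base_url="http://localhost:11434"):
    """Build a POST /api/generate request with the same JSON body as the curl call."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": stream})
    return urllib.request.Request(
        f"{base_url}/api/generate", data=body.encode("utf-8"), method="POST"
    )


def stream_generate(model, prompt, base_url="http://localhost:11434"):
    """Send the request and yield the response one text line at a time.

    Assumption: the streamed response is line-oriented text. Each line is
    yielded as plain text without parsing, since its schema is not given here.
    """
    req = build_generate_request(model, prompt, stream=True, base_url=base_url)
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:
            yield raw_line.decode("utf-8").rstrip("\n")


# Hypothetical usage, with the server running and the model already pulled:
#   for line in stream_generate("orca", "hello!"):
#       print(line)
```

Using a generator lets the caller display output as it arrives rather than waiting for the full completion.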
