Andrei Betlen 6cf5876538 Deprecate generate method		2023-04-12 14:06:04 -04:00
.github/workflows	Update workflow permissions	2023-04-10 02:35:00 -04:00
docs	Update model paths to be more clear they should point to file	2023-04-09 22:45:55 -04:00
examples	Added iterative search to prevent instructions from being echoed, add ignore eos, add no-mmap, fixed 1 character echo too much bug	2023-04-10 16:35:38 +02:00
llama_cpp	Deprecate generate method	2023-04-12 14:06:04 -04:00
tests	Make Llama instance pickleable. Closes #27	2023-04-05 06:52:17 -04:00
vendor	Update llama.cpp	2023-04-11 23:53:46 -04:00
.gitignore	Ignore ./idea folder	2023-04-05 18:23:17 -04:00
.gitmodules	Add llama.cpp to vendor folder	2023-03-23 05:37:26 -04:00
CMakeLists.txt	Build shared library with make on unix platforms	2023-04-08 02:39:17 -04:00
LICENSE.md	Initial commit	2023-03-23 05:33:06 -04:00
mkdocs.yml	Add search to mkdocs	2023-03-31 00:01:53 -04:00
poetry.lock	Add basic tests. Closes #24	2023-04-05 03:23:15 -04:00
pyproject.toml	Bump version	2023-04-10 12:56:48 -04:00
README.md	Update model paths to be more clear they should point to file	2023-04-09 22:45:55 -04:00
setup.py	Bump version	2023-04-10 12:56:48 -04:00

README.md

🦙 Python Bindings for `llama.cpp`

Simple Python bindings for @ggerganov's llama.cpp library. This package provides:

- Low-level access to the C API via a ctypes interface
- High-level Python API for text completion
  - OpenAI-like API
  - LangChain compatibility

Installation

Install from PyPI:

pip install llama-cpp-python

High-level API

>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/7B/ggml-model.bin")
>>> output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
>>> print(output)
{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "text_completion",
  "created": 1679561337,
  "model": "./models/7B/ggml-model.bin",
  "choices": [
    {
      "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
      "index": 0,
      "logprobs": None,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 28,
    "total_tokens": 42
  }
}
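The completion comes back as a plain Python dict in the OpenAI-style shape shown above. Below is a minimal sketch of pulling the generated text out of it, assuming that shape. `completion_text` is a hypothetical helper, not part of `llama_cpp`. Because the call above passes `echo=True`, the prompt is repeated at the start of `text`, so the helper can optionally strip it.

```python
from typing import Any, Dict


def completion_text(output: Dict[str, Any], prompt: str = "") -> str:
    """Return the first choice's text, minus an echoed prompt if one is given.

    Hypothetical helper: assumes the OpenAI-style dict shown above.
    """
    text = output["choices"][0]["text"]
    # With echo=True the prompt is repeated at the start of the generated text.
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


prompt = "Q: Name the planets in the solar system? A: "
# Copy of the example response above (id, created, and model omitted).
output = {
    "object": "text_completion",
    "choices": [
        {
            "text": prompt + "Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 14, "completion_tokens": 28, "total_tokens": 42},
}

print(completion_text(output, prompt))
# Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.
```

Without `echo=True` the prompt is not included, and calling `completion_text(output)` without a prompt simply returns the stripped text.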

Web Server

llama-cpp-python offers a web server that aims to act as a drop-in replacement for the OpenAI API. This lets you use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).

To install the server package and get started:

pip install 'llama-cpp-python[server]'
export MODEL=./models/7B/ggml-model.bin
python3 -m llama_cpp.server

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
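Since the server mimics the OpenAI API, any HTTP client can talk to it. Below is a minimal standard-library sketch that builds a completion request. It assumes the server is running on the default address above and exposes an OpenAI-style `/v1/completions` route. Check the OpenAPI docs page for the exact routes and fields your version supports. `build_completion_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import List, Optional

# Assumption: default host/port from above and an OpenAI-style completions route.
BASE_URL = "http://localhost:8000"


def build_completion_request(
    prompt: str, max_tokens: int = 32, stop: Optional[List[str]] = None
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style completion request."""
    body = {"prompt": prompt, "max_tokens": max_tokens}
    if stop:
        body["stop"] = list(stop)
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "Q: Name the planets in the solar system? A: ", stop=["Q:", "\n"]
)

# With the server running, send the request and read the JSON reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

Building the request separately from sending it keeps the payload easy to inspect. The reply should follow the same OpenAI-style shape as the high-level API output.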

Low-level API

The low-level API is a direct ctypes binding to the C API provided by llama.cpp. The entire API can be found in llama_cpp/llama_cpp.py and should mirror llama.h.

Documentation

Documentation is available at https://abetlen.github.io/llama-cpp-python. If you find any issues with the documentation, please open an issue or submit a PR.

Development

This package is under active development and I welcome any contributions.

To get started, clone the repository and install the package in development mode:

git clone git@github.com:abetlen/llama-cpp-python.git
cd llama-cpp-python
git submodule update --init --recursive
# Will need to be re-run any time vendor/llama.cpp is updated
python3 setup.py develop

How does this compare to other Python bindings of `llama.cpp`?

I originally wrote this package for my own use with two goals in mind:

- Provide a simple process to install llama.cpp and access the full C API in llama.h from Python
- Provide a high-level Python API that can be used as a drop-in replacement for the OpenAI API so existing apps can be easily ported to use llama.cpp

Any contributions and changes to this package will be made with these goals in mind.

License

This project is licensed under the terms of the MIT license.