🦙 Python Bindings for llama.cpp
Simple Python bindings for @ggerganov's llama.cpp library.
This package provides:
- Low-level access to C API via ctypes interface.
- High-level Python API for text completion
- OpenAI-like API
- LangChain compatibility
Installation
Install from PyPI (requires a C compiler):
pip install llama-cpp-python
The above command will attempt to install the package and build llama.cpp from source.
This is the recommended installation method as it ensures that llama.cpp is built with the available optimizations for your system.
This method defaults to using make to build llama.cpp on Linux / macOS and cmake on Windows.
You can force the use of cmake on Linux / macOS by setting the FORCE_CMAKE=1 environment variable before installing.
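For scripted installs, the environment variable can be set from Python before invoking pip. The following is a minimal sketch, assuming pip is run through subprocess; the helper names build_install_invocation and install are hypothetical and not part of this package.

```python
import os
import subprocess
import sys


def build_install_invocation(force_cmake=False, base_env=None):
    """Return the pip command and environment for installing llama-cpp-python.

    Hypothetical helper for illustration; it only prepares the call.
    """
    # Copy the environment so the caller's os.environ is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    if force_cmake:
        # FORCE_CMAKE=1 is the variable named in the installation notes above.
        env["FORCE_CMAKE"] = "1"
    command = [sys.executable, "-m", "pip", "install", "llama-cpp-python"]
    return command, env


def install(force_cmake=False):
    """Run the install; raises CalledProcessError if pip fails."""
    command, env = build_install_invocation(force_cmake)
    subprocess.run(command, env=env, check=True)
```

Calling install(force_cmake=True) would then run the same pip command as above with FORCE_CMAKE=1 set only for that subprocess.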
High-level API
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/7B/ggml-model.bin")
>>> output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
>>> print(output)
{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "text_completion",
  "created": 1679561337,
  "model": "./models/7B/ggml-model.bin",
  "choices": [
    {
      "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
      "index": 0,
      "logprobs": None,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 28,
    "total_tokens": 42
  }
}
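The printed result is a dictionary, so individual fields can be read with ordinary indexing. The sketch below works on a hand-copied, abbreviated version of the output shown above so that it runs without a model; the helper names answer_only and token_summary are hypothetical. Because the call above uses echo=True, the returned text starts with the prompt, which the first helper strips.

```python
PROMPT = "Q: Name the planets in the solar system? A: "

# Abbreviated copy of the example output above (not produced by running a model).
sample_completion = {
    "object": "text_completion",
    "choices": [
        {
            "text": PROMPT + "Mercury, Venus, Earth, Mars, Jupiter, "
                             "Saturn, Uranus, Neptune and Pluto.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 14, "completion_tokens": 28, "total_tokens": 42},
}


def answer_only(completion, prompt):
    """Return the first choice's text with the echoed prompt removed."""
    text = completion["choices"][0]["text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


def token_summary(completion):
    """Format the token counts from the usage section as one line."""
    usage = completion["usage"]
    return (f"{usage['prompt_tokens']} prompt + "
            f"{usage['completion_tokens']} completion = "
            f"{usage['total_tokens']} total")


print(answer_only(sample_completion, PROMPT))
print(token_summary(sample_completion))
```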
Web Server
llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API.
This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
To install the server package and get started:
pip install llama-cpp-python[server]
export MODEL=./models/7B/ggml-model.bin
python3 -m llama_cpp.server
Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
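Once the server is running, it can be queried with plain HTTP. The sketch below uses only the Python standard library and assumes the server exposes an OpenAI-style /v1/completions route on port 8000; that path is an assumption based on the drop-in-replacement goal, so check the /docs page for the routes your version actually serves. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host and port from the text above


def build_completion_request(prompt, max_tokens=32, base_url=BASE_URL):
    """Build (but do not send) a POST request for a text completion."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "stop": ["Q:", "\n"]}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",  # assumed OpenAI-style route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response.

    Requires the server to be running; raises URLError otherwise.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started as shown above, complete("Q: Name the planets in the solar system? A: ") should return a dictionary shaped like the high-level API output, assuming the route matches.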
Low-level API
The low-level API is a direct ctypes binding to the C API provided by llama.cpp.
The entire API can be found in llama_cpp/llama_cpp.py and should mirror llama.h.
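To illustrate what a direct ctypes binding looks like in general, the sketch below binds strlen from the C standard library rather than any llama.cpp function. It assumes a POSIX system where the C library can be loaded this way, and the wrapper name c_strlen is hypothetical. The bindings in llama_cpp/llama_cpp.py may be organized differently; that file remains the authority on the actual API.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes a POSIX system).
# If find_library returns None, CDLL(None) falls back to the symbols
# already loaded into the current process, which include libc on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Call the C strlen function on a bytes object."""
    return libc.strlen(text)


print(c_strlen(b"llama"))
```

Declaring argtypes and restype up front lets ctypes convert arguments and return values correctly instead of assuming C int everywhere.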
Documentation
Documentation is available at https://abetlen.github.io/llama-cpp-python. If you find any issues with the documentation, please open an issue or submit a PR.
Development
This package is under active development and I welcome any contributions.
To get started, clone the repository and install the package in development mode:
git clone --recurse-submodules git@github.com:abetlen/llama-cpp-python.git
# Will need to be re-run any time vendor/llama.cpp is updated
python3 setup.py develop
How does this compare to other Python bindings of llama.cpp?
I originally wrote this package for my own use with two goals in mind:
- Provide a simple process to install llama.cpp and access the full C API in llama.h from Python
- Provide a high-level Python API that can be used as a drop-in replacement for the OpenAI API so existing apps can be easily ported to use llama.cpp
Any contributions and changes to this package will be made with these goals in mind.
License
This project is licensed under the terms of the MIT license.