llama.cpp/docs/server.md
Damian Stewart aab74f0b2b
Multimodal Support (Llava 1.5) (#821)
* llava v1.5 integration

* Point llama.cpp to fork

* Add llava shared library target

* Fix type

* Update llama.cpp

* Add llava api

* Revert changes to llama and llama_cpp

* Update llava example

* Add types for new gpt-4-vision-preview api

* Fix typo

* Update llama.cpp

* Update llama_types to match OpenAI v1 API

* Update ChatCompletionFunction type

* Reorder request parameters

* More API type fixes

* Even More Type Updates

* Add parameter for custom chat_handler to Llama class

* Fix circular import

* Convert to absolute imports

* Fix

* Fix pydantic Jsontype bug

* Accept list of prompt tokens in create_completion

* Add llava1.5 chat handler

* Add Multimodal notebook

* Clean up examples

* Add server docs

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-11-07 22:48:51 -05:00

1.9 KiB

OpenAI Compatible Server

llama-cpp-python offers an OpenAI API compatible web server.

This web server can be used to serve local models and easily connect them to existing clients.

Setup

Installation

The server can be installed by running the following command:

pip install llama-cpp-python[server]

Running the server

The server can then be started by running the following command:

python3 -m llama_cpp.server --model <model_path>

Server options

For a full list of options, run:

python3 -m llama_cpp.server --help

NOTE: All server options are also available as environment variables. For example, --model can be set by setting the MODEL environment variable.

Guides

Multi-modal Models

llama-cpp-python supports the llava1.5 family of multi-modal models which allow the language model to read information from both text and images.

You'll first need to download one of the available multi-modal models in GGUF format:

Then when you run the server you'll need to also specify the path to the clip model used for image embedding

python3 -m llama_cpp.server --model <model_path> --clip-model-path <clip_model_path>

Then you can just use the OpenAI API as normal

from openai import OpenAI

client = OpenAI(base_url="http://<host>:<port>/v1", api_key="sk-xxx")
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "<image_url>"
                    },
                },
                {"type": "text", "text": "What does the image say"},
            ],
        }
    ],
)
print(response)