From ec43e8920f2231639bea33d3db6651841bcec45c Mon Sep 17 00:00:00 2001 From: Andrei Betlen Date: Fri, 24 May 2024 11:54:15 -0400 Subject: [PATCH] docs: Update multi-modal model section --- README.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index d5ced86..3255e69 100644 --- a/README.md +++ b/README.md @@ -499,13 +499,16 @@ llm = Llama.from_pretrained( `llama-cpp-python` supports such as llava1.5 which allow the language model to read information from both text and images. -You'll first need to download one of the available multi-modal models in GGUF format: +Below are the supported multi-modal models and their respective chat handlers (Python API) and chat formats (Server API). -- [llava-v1.5-7b](https://huggingface.co/mys/ggml_llava-v1.5-7b) -- [llava-v1.5-13b](https://huggingface.co/mys/ggml_llava-v1.5-13b) -- [bakllava-1-7b](https://huggingface.co/mys/ggml_bakllava-1) -- [llava-v1.6-34b](https://huggingface.co/cjpais/llava-v1.6-34B-gguf) -- [moondream2](https://huggingface.co/vikhyatk/moondream2) +| Model | `LlamaChatHandler` | `chat_format` | +| --- | --- | --- | +| [llava-v1.5-7b](https://huggingface.co/mys/ggml_llava-v1.5-7b) | `Llava15ChatHandler` | `llava-1-5` | +| [llava-v1.5-13b](https://huggingface.co/mys/ggml_llava-v1.5-13b) | `Llava15ChatHandler` | `llava-1-5` | +| [llava-v1.6-34b](https://huggingface.co/cjpais/llava-v1.6-34B-gguf) | `Llava16ChatHandler` | `llava-1-6` | +| [moondream2](https://huggingface.co/vikhyatk/moondream2) | `MoondreamChatHandler` | `moondream2` | +| [nanollava](https://huggingface.co/abetlen/nanollava) | `NanollavaChatHandler` | `nanollava` | +| [llama-3-vision-alpha](https://huggingface.co/abetlen/llama-3-vision-alpha) | `Llama3VisionAlphaChatHandler` | `llama-3-vision-alpha` | Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.