From c2a234a0860f5a65ae1a9655ac1038ef294c8357 Mon Sep 17 00:00:00 2001
From: Andrei Betlen
Date: Thu, 15 Feb 2024 23:15:50 -0500
Subject: [PATCH] docs: Add embeddings section

---
 README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/README.md b/README.md
index 3d8d4d4..7da8e5f 100644
--- a/README.md
+++ b/README.md
@@ -398,6 +398,22 @@ llama = Llama(
 )
 ```
 
+### Embeddings
+
+`llama-cpp-python` supports generating embeddings from text.
+
+```python
+import llama_cpp
+
+llm = llama_cpp.Llama(model_path="path/to/model.gguf", embedding=True)
+
+embeddings = llm.create_embedding("Hello, world!")
+
+# or batched
+
+embeddings = llm.create_embedding(["Hello, world!", "Goodbye, world!"])
+```
+
 ### Adjusting the Context Window
 
 The context window of the Llama models determines the maximum number of tokens that can be processed at once. By default, this is set to 512 tokens, but can be adjusted based on your requirements.
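
For readers of the section this patch adds: a minimal sketch of consuming the result, assuming `create_embedding` returns an OpenAI-style response dict whose `data` list holds `{"embedding": ..., "index": ...}` entries; the model path is a placeholder.

```python
import llama_cpp

# embedding=True enables the embedding pooling mode on model load
llm = llama_cpp.Llama(model_path="path/to/model.gguf", embedding=True)

# Batched call: one entry in response["data"] per input string
response = llm.create_embedding(["Hello, world!", "Goodbye, world!"])

for item in response["data"]:
    vector = item["embedding"]
    print(f"input {item['index']}: {len(vector)}-dimensional embedding")
```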