From ea4fbadab39548673e2a835223968b023006e539 Mon Sep 17 00:00:00 2001
From: AgentJ-WR <60302956+AgentJ-WR@users.noreply.github.com>
Date: Fri, 7 Jul 2023 23:24:57 -0400
Subject: [PATCH] Show how to adjust context window in README.md

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index fb652a9..0322c73 100644
--- a/README.md
+++ b/README.md
@@ -105,6 +105,15 @@ Below is a short example demonstrating how to use the high-level API to generate
 }
 ```
 
+### Adjusting the Context Window
+The context window of the Llama models determines the maximum number of tokens the model can process at once. By default, this is set to 512 tokens, but it can be adjusted based on your requirements.
+
+For instance, if you want to work with larger contexts, you can expand the context window by setting the `n_ctx` parameter when initializing the `Llama` object:
+
+```python
+llm = Llama(model_path="./models/7B/ggml-model.bin", n_ctx=2048)
+```
+
 ## Web Server
 
 `llama-cpp-python` offers a web server which aims to act as a drop-in replacement for the OpenAI API.