docs: setting n_gqa is no longer required
commit 68238b7883
parent 198178225c
1 changed file with 0 additions and 7 deletions
@@ -143,13 +143,6 @@ For instance, if you want to work with larger contexts, you can expand the conte
 llm = Llama(model_path="./models/7B/llama-model.gguf", n_ctx=2048)
 ```
 
-### Loading llama-2 70b
-
-Llama2 70b must set the `n_gqa` parameter (grouped-query attention factor) to 8 when loading:
-
-```python
-llm = Llama(model_path="./models/70B/llama-model.gguf", n_gqa=8)
-```
 
 ## Web Server
 
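With this change, a 70B GGUF model loads without the `n_gqa` override, since the grouped-query attention configuration is read from the GGUF metadata. A minimal sketch of the post-change usage, assuming a llama-cpp-python build that includes this commit (the model path and `n_ctx` value are illustrative):

```python
from llama_cpp import Llama

# No n_gqa needed: the GGUF file carries the grouped-query attention
# settings, so the loader configures them automatically.
llm = Llama(model_path="./models/70B/llama-model.gguf", n_ctx=2048)
```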