docs: setting n_gqa is no longer required
commit 68238b7883
parent 198178225c
1 changed file with 0 additions and 7 deletions
@@ -143,13 +143,6 @@ For instance, if you want to work with larger contexts, you can expand the conte
 llm = Llama(model_path="./models/7B/llama-model.gguf", n_ctx=2048)
 ```
 
-### Loading llama-2 70b
-
-Llama2 70b must set the `n_gqa` parameter (grouped-query attention factor) to 8 when loading:
-
-```python
-llm = Llama(model_path="./models/70B/llama-model.gguf", n_gqa=8)
-```
 
 ## Web Server
 
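With this change, a 70B GGUF model loads without the `n_gqa` override, since the grouped-query attention configuration is read from the GGUF metadata. A minimal sketch of the post-change usage, assuming a llama-cpp-python build that includes this commit (the model path and `n_ctx` value are illustrative):

```python
from llama_cpp import Llama

# No n_gqa needed: the GGUF file carries the grouped-query attention
# settings, so the loader configures them automatically.
llm = Llama(model_path="./models/70B/llama-model.gguf", n_ctx=2048)
```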