c283edd7f2
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but in case that's still a problem I changed it back to the default. Also set the auto-determined number of threads to half the system core count: ggml will sometimes lock cores at 100% while doing nothing. This is being addressed upstream, but it can make for a bad user experience if cores stay pegged at 100%.
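As a rough illustration of the defaults described above, a caller could pass the values explicitly. This is only a sketch: the `model_path` is hypothetical, and the keyword names (`n_batch`, `n_threads`) should be checked against the current `Llama` constructor in this repo.

```python
import multiprocessing

from llama_cpp import Llama

# Batch size reset to the llama.cpp default of 8.
n_batch = 8

# Use half the logical cores so ggml's busy-waiting threads don't
# peg every core at 100% while effectively idle.
n_threads = max(multiprocessing.cpu_count() // 2, 1)

llm = Llama(
    model_path="./models/ggml-model.bin",  # hypothetical model path
    n_batch=n_batch,
    n_threads=n_threads,
)
```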