llama.cpp/llama_cpp/server
Latest commit: f165048a69 by Limour, 2024-04-01 10:19:28 -04:00

feat: add support for KV cache quantization options (#1307)

* add KV cache quantization options

  https://github.com/abetlen/llama-cpp-python/discussions/1220
  https://github.com/abetlen/llama-cpp-python/issues/1305

* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
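The commit message above only lists the steps, so here is a minimal sketch of how the new options would typically be used. It assumes, based on the bullets above ("ggml_type instead of string"), that the feature surfaced as `type_k` / `type_v` settings taking `GGML_TYPE_*` values; the exact field names should be checked against `settings.py` and `model.py` in this directory.

```python
# Minimal sketch, not taken verbatim from the repository: assumes the KV cache
# quantization options landed as `type_k` / `type_v` parameters that accept
# ggml_type values, as the commit bullets above suggest.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="./models/example.gguf",   # hypothetical model path
    n_ctx=8192,
    type_k=llama_cpp.GGML_TYPE_Q8_0,      # quantize the K cache to q8_0
    type_v=llama_cpp.GGML_TYPE_Q8_0,      # quantize the V cache to q8_0
)

# For the server ("Add server support" above), the same fields would go into
# the per-model settings, e.g. in the JSON config used by the multi-model
# server, with the ggml_type given as its integer value (q8_0 == 8):
# {"models": [{"model": "./models/example.gguf", "type_k": 8, "type_v": 8}]}
```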
File          Latest commit                                                         Last updated
__init__.py   llama_cpp server: app is now importable, still runnable as a module   2023-04-29 11:41:25 -07:00
__main__.py   [Feat] Multi model support (#931)                                      2023-12-22 05:51:25 -05:00
app.py        feat: Add logprobs support to chat completions (#1311)                2024-03-31 13:30:13 -04:00
cli.py        Fix python3.8 support                                                  2024-01-19 08:17:49 -05:00
errors.py     misc: Format                                                           2024-02-28 14:27:40 -05:00
model.py      feat: add support for KV cache quantization options (#1307)           2024-04-01 10:19:28 -04:00
settings.py   feat: add support for KV cache quantization options (#1307)           2024-04-01 10:19:28 -04:00
types.py      feat: Add logprobs support to chat completions (#1311)                 2024-03-31 13:30:13 -04:00