Latest commit f165048a69: add KV cache quantization options
(https://github.com/abetlen/llama-cpp-python/discussions/1220,
https://github.com/abetlen/llama-cpp-python/issues/1305)

* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
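For illustration, a minimal sketch of what these options could look like from the Python API, assuming the `type_k`/`type_v` parameters and the `GGML_TYPE_*` constants this commit describes (the model path and prompt are placeholders, not from the source):

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="models/model.gguf",  # placeholder path
    n_ctx=4096,
    # Store the K half of the KV cache as Q8_0 instead of the default F16,
    # trading a little accuracy for roughly half the K-cache memory.
    type_k=llama_cpp.GGML_TYPE_Q8_0,
    # Quantizing the V half (type_v) may additionally require flash
    # attention support in the underlying llama.cpp build, so it is
    # left at its default here.
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```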
Files:

* __init__.py
* __main__.py
* app.py
* cli.py
* errors.py
* model.py
* settings.py
* types.py
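For the "Add server support" part of the commit, the new options presumably surface in settings.py as ggml_type-valued fields. A hypothetical sketch of what such fields might look like; the class name, field names, and descriptions are assumptions, not taken from this listing:

```python
from typing import Optional
from pydantic import BaseModel, Field

class ModelSettings(BaseModel):
    """Hypothetical subset of the server's model settings."""

    # Assumed fields mirroring Llama()'s type_k/type_v parameters;
    # values are raw ggml_type enum integers (e.g. GGML_TYPE_Q8_0).
    type_k: Optional[int] = Field(
        default=None,
        description="ggml_type used for the K half of the KV cache.",
    )
    type_v: Optional[int] = Field(
        default=None,
        description="ggml_type used for the V half of the KV cache.",
    )
```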