llama.cpp

Junpei Kawamoto 320a5d7ea5 feat: Add `.close()` method to `Llama` class to explicitly free model from memory (#1513)	2024-06-13 04:16:14 -04:00

* feat: add explicit methods to free model

  This commit introduces a `close` method to both `Llama` and `_LlamaModel`, allowing users to explicitly free the model from RAM/VRAM.

  The previous implementation relied on the destructor of `_LlamaModel` to free the model. However, in Python, the timing of destructor calls is unclear; for instance, the `del` statement does not guarantee immediate invocation of the destructor.

  This commit provides an explicit method to release the model, which works immediately and allows the user to load another model without memory issues.

  Additionally, this commit implements a context manager in the `Llama` class, enabling the automatic closure of the `Llama` object when used with the `with` statement.

* feat: Implement ContextManager in _LlamaModel, _LlamaContext, and _LlamaBatch

  This commit enables automatic resource management by implementing the `ContextManager` protocol in `_LlamaModel`, `_LlamaContext`, and `_LlamaBatch`. This ensures that resources are properly managed and released within a `with` statement, enhancing robustness and safety in resource handling.

* feat: add ExitStack for Llama's internal class closure

  This update implements ExitStack to manage and close internal classes in Llama, enhancing efficient and safe resource management.

* Use contextlib ExitStack and closing

* Explicitly free model when closing resources on server

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
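The pattern this commit message describes (an explicit `close()`, a context manager, and `contextlib.ExitStack` with `closing` for the internal objects) can be illustrated roughly as follows. This is a minimal sketch, not the code from the commit: `FakeModel` and `FakeLlama` are hypothetical stand-ins that only track a `freed` flag, and the real `Llama` and `_LlamaModel` classes may be structured differently.

```python
from contextlib import ExitStack, closing


class FakeModel:
    """Hypothetical stand-in for a native model handle (not a llama_cpp class)."""

    def __init__(self, name):
        self.name = name
        self.freed = False

    def close(self):
        # Release the resource right away; calling close() again is harmless.
        self.freed = True


class FakeLlama:
    """Hypothetical wrapper that owns its internal resources via an ExitStack."""

    def __init__(self, name):
        self._stack = ExitStack()
        # closing() turns any object with a close() method into a context
        # manager; the stack will call model.close() when it is closed.
        self.model = self._stack.enter_context(closing(FakeModel(name)))

    def close(self):
        # Explicit release that does not depend on garbage-collection timing.
        self._stack.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions raised inside the block
```

With this shape, a caller can either call `close()` directly or write `with FakeLlama("a") as llm: ...`, and the wrapped resource is released as soon as the block exits rather than whenever the destructor happens to run.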
..
__init__.py	llama_cpp server: app is now importable, still runnable as a module	2023-04-29 11:41:25 -07:00
__main__.py	feat: Add support for yaml based configs	2024-04-10 02:47:01 -04:00
app.py	feat: add MinTokensLogitProcessor and min_tokens argument to server (#1333)	2024-05-14 09:50:53 -04:00
cli.py	Fix python3.8 support	2024-01-19 08:17:49 -05:00
errors.py	misc: Format	2024-02-28 14:27:40 -05:00
model.py	feat: Add `.close()` method to `Llama` class to explicitly free model from memory (#1513)	2024-06-13 04:16:14 -04:00
settings.py	feat: adding `rpc_servers` parameter to `Llama` class (#1477)	2024-06-04 10:38:21 -04:00
types.py	feat: add MinTokensLogitProcessor and min_tokens argument to server (#1333)	2024-05-14 09:50:53 -04:00