Commit graph

1779 commits

Author SHA1 Message Date
Junpei Kawamoto
320a5d7ea5
feat: Add .close() method to Llama class to explicitly free model from memory (#1513)
* feat: add explicit methods to free model

This commit introduces a `close` method to both `Llama` and `_LlamaModel`,
allowing users to explicitly free the model from RAM/VRAM.

The previous implementation relied on the destructor of `_LlamaModel` to free
the model. However, in Python, the timing of destructor calls is unclear—for
instance, the `del` statement does not guarantee immediate invocation of the
destructor.

This commit provides an explicit method to release the model, which works
immediately and allows the user to load another model without memory issues.

Additionally, this commit implements a context manager in the `Llama` class,
enabling the automatic closure of the `Llama` object when used with the `with`
statement.

* feat: Implement ContextManager in _LlamaModel, _LlamaContext, and _LlamaBatch

This commit enables automatic resource management by
implementing the `ContextManager` protocol in `_LlamaModel`,
`_LlamaContext`, and `_LlamaBatch`. This ensures that
resources are properly managed and released within a `with`
statement, enhancing robustness and safety in resource handling.

* feat: add ExitStack for Llama's internal class closure

This update implements ExitStack to manage and close internal
classes in Llama, enhancing efficient and safe resource
management.

* Use contextlib ExitStack and closing

* Explicitly free model when closing resources on server

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 04:16:14 -04:00
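A minimal usage sketch of the explicit cleanup described in the entry above, assuming a local GGUF model (the path is a placeholder):

```python
from llama_cpp import Llama

# Free the model explicitly instead of waiting for the destructor.
llm = Llama(model_path="./models/example-7b.Q4_K_M.gguf")  # placeholder path
try:
    out = llm("Q: Name the planets in the solar system. A: ", max_tokens=32)
finally:
    llm.close()  # releases RAM/VRAM immediately

# Or let the context manager added in this change close it automatically:
with Llama(model_path="./models/example-7b.Q4_K_M.gguf") as llm:
    out = llm("Hello", max_tokens=8)
# the model is freed on exit from the with-block
```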
Sigbjørn Skjæret
dbcf64cf07
feat: Support SPM infill (#1492)
* Support SPM infill

* typo--

* one less layer of parentheses necessary

* new required internals

* manually add bos/eos if model requires it

* add bos even when unknown

This is identical behaviour to llama.cpp

I guess any model that doesn't use BOS is recent enough to have the add_bos_token metadata.

* don't add bos/eos on non-infill pre-tokenized prompt

* add tokenizer hack to remove leading space in suffix

* I keep forgetting metadata are strings

* check if bos exists

* add example

* add cls/sep instead of bos/eos for WPM vocab

* simplify

* color-code filtered suffix

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 03:45:24 -04:00
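A rough sketch of the manual BOS handling the bullets above describe, assuming a loaded `Llama` instance and the usual GGUF metadata key (the actual infill code path may differ):

```python
def maybe_prepend_bos(llm, tokens):
    """Prepend BOS only if the model has one and its metadata asks for it."""
    # GGUF metadata values are strings, hence the comparison against "true".
    add_bos = llm.metadata.get("tokenizer.ggml.add_bos_token", "true") == "true"
    bos = llm.token_bos()
    if add_bos and bos != -1 and (not tokens or tokens[0] != bos):
        return [bos] + tokens
    return tokens
```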
Andrei Betlen
e342161371 feat: Update llama.cpp 2024-06-13 03:38:11 -04:00
Andrei Betlen
86a38ad4a0 chore: Bump version 2024-06-10 11:14:33 -04:00
Andrei Betlen
1615eb9e5b feat: Update llama.cpp 2024-06-10 11:05:45 -04:00
Andrei Betlen
83d6b26e6f feat: Update llama.cpp 2024-06-08 23:14:22 -04:00
Andrei Betlen
255e1b4495 feat: Update llama.cpp 2024-06-07 02:02:12 -04:00
nullname
d634efcdd9
feat: adding rpc_servers parameter to Llama class (#1477)
* passthru rpc_servers params

wip

* enable llama rpc by default

* convert string to bytes

* add rpc package

* Revert "enable llama rpc by default"

This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790.

* update readme

* Only set rpc_servers when provided

* Add rpc servers to server options

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:38:21 -04:00
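A hedged sketch of the new parameter, assuming a llama.cpp build with RPC support and reachable backends (host addresses and the model path are placeholders):

```python
from llama_cpp import Llama

# Offload work to remote llama.cpp RPC servers.
llm = Llama(
    model_path="./models/example-7b.Q4_K_M.gguf",          # placeholder path
    rpc_servers="192.168.1.50:50052,192.168.1.51:50052",   # comma-separated host:port list
)
```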
Asghar Ghorbani
6e0642ca19
fix: fix logprobs when BOS is not present (#1471)
* Fix logprobs when BOS is not present

* Fix logprobs when BOS is not available
2024-06-04 10:18:38 -04:00
Sigbjørn Skjæret
027f7bc678
fix: Avoid duplicate special tokens in chat formats (#1439)
* Templates sometimes have BOS in them; remove the duplicate

* tokenize chat format prompts before completion

This is to ensure that we don't duplicate any special tokens.

Hopefully I amended the existing formats correctly?

* updated comment

* corrected a few

* add some missing internals

* proper bos/eos detection

* just let tokenizer do the job

* typo--

* align test with new response

* changed to a warning

* move to another PR

* Use python warnings module

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:15:41 -04:00
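A simplified sketch of the duplicate-token guard described above, assuming a tokenized chat prompt and a known BOS id (the real check lives inside the chat format handlers and may differ in detail):

```python
import warnings

def warn_on_duplicate_bos(tokens, bos_token_id):
    """Warn when both the chat template and the tokenizer added a BOS token."""
    if len(tokens) >= 2 and tokens[0] == bos_token_id == tokens[1]:
        warnings.warn(
            "Duplicate leading BOS token detected; the chat template probably "
            "already contains it.",
            RuntimeWarning,
        )
```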
Andrei Betlen
951e39caf9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-04 00:49:26 -04:00
Andrei Betlen
c3ef41ba06 chore: Bump version 2024-06-04 00:49:24 -04:00
Engininja2
ae5682f500
fix: Disable Windows+CUDA workaround when compiling for HIPBLAS (#1493)
* Disable Windows+CUDA workaround when compiling for HIPBLAS

* fix spacing

* change condition to check for Windows & CUDA

Co-authored-by: Andrei <abetlen@gmail.com>

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-06-04 00:42:34 -04:00
Andrei Betlen
cd3f1bb387 feat: Update llama.cpp 2024-06-04 00:35:47 -04:00
Andrei Betlen
6b018e00b1 misc: Improve llava error messages 2024-06-03 11:19:10 -04:00
Andrei Betlen
a6457ba74b Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-01 18:10:13 -04:00
Andrei Betlen
af3ed503e9 fix: Use numpy recarray for candidates data, fixes bug with temp < 0 2024-06-01 18:09:24 -04:00
Andrei Betlen
165b4dc6c1 fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488 2024-05-29 02:29:44 -04:00
Andrei Betlen
91d05aba46 fix: adjust kv_override member names to match llama.cpp 2024-05-29 02:28:58 -04:00
Andrei Betlen
df45a4b3fe fix: fix string value kv_overrides. Closes #1487 2024-05-29 02:02:22 -04:00
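For context, `kv_overrides` lets callers override GGUF metadata at load time; a hedged example covering the bool and string cases (the keys shown are illustrative, and string values are what #1487 fixed):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-7b.Q4_K_M.gguf",   # placeholder path
    kv_overrides={
        "tokenizer.ggml.add_bos_token": True,        # bool override
        "general.name": "my-renamed-model",          # string override
    },
)
```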
Andrei Betlen
10b7c50cd2 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-28 22:52:30 -04:00
Andrei Betlen
2907c26906 misc: Update debug build to keep all debug symbols for easier gdb debugging 2024-05-28 22:52:28 -04:00
Andrei Betlen
c26004b1be feat: Update llama.cpp 2024-05-28 22:52:03 -04:00
dependabot[bot]
c564007ff6
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 (#1472)
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-27 10:57:17 -04:00
Andrei Betlen
454c9bb1cb feat: Update llama.cpp 2024-05-27 10:51:57 -04:00
Andrei Betlen
2d89964147 docs: Fix table formatting 2024-05-24 11:55:41 -04:00
Andrei Betlen
9e8d7d55bd fix(docs): Fix link typo 2024-05-24 11:55:01 -04:00
Andrei Betlen
ec43e8920f docs: Update multi-modal model section 2024-05-24 11:54:15 -04:00
Andrei Betlen
a4c9ab885d chore: Bump version 2024-05-24 01:59:25 -04:00
Linghan Zhong
5cae1040e3
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-24 01:49:44 -04:00
Andrei Betlen
087cc0b036 feat: Update llama.cpp 2024-05-24 01:43:36 -04:00
Andrei Betlen
5a595f035a feat: Update llama.cpp 2024-05-22 02:40:31 -04:00
Andrei Betlen
3dbfec74e7 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-18 01:19:20 -04:00
Andrei Betlen
d8a3b013c3 feat: Update llama.cpp 2024-05-18 01:19:19 -04:00
Radoslav Gerganov
03f171e810
example: LLM inference with Ray Serve (#1465) 2024-05-17 13:27:26 -04:00
Andrei Betlen
b564d05806 chore: Bump version 2024-05-16 00:41:21 -04:00
Andrei Betlen
d99a6ba607 fix: segfault for models without eos / bos tokens. Closes #1463 2024-05-16 00:37:27 -04:00
Andrei Betlen
e811a81066 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-15 23:59:18 -04:00
Andrei Betlen
ca8e3c967d feat: Update llama.cpp 2024-05-15 23:59:17 -04:00
twaka
5212fb08ae
feat: add MinTokensLogitProcessor and min_tokens argument to server (#1333)
* implement min_tokens

* set default to 0

* pass min_tokens

* fix

* remove copy

* implement MinTokensLogitsProcessor

* format

* fix condition
2024-05-14 09:50:53 -04:00
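A minimal sketch of the idea behind `MinTokensLogitsProcessor`, following llama-cpp-python's logits-processor calling convention (numpy arrays in, scores out); the server's actual implementation may differ: suppress EOS until at least `min_tokens` completion tokens have been generated.

```python
import numpy as np

class MinTokensLogitsProcessor:
    """Mask the EOS logit until at least `min_tokens` tokens have been generated."""

    def __init__(self, min_tokens: int, eos_token_id: int):
        self.min_tokens = min_tokens
        self.eos_token_id = eos_token_id
        self.prompt_length = None  # set on the first call

    def __call__(self, input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
        if self.prompt_length is None:
            self.prompt_length = len(input_ids)
        if len(input_ids) - self.prompt_length < self.min_tokens:
            scores[self.eos_token_id] = -np.inf  # EOS cannot be sampled yet
        return scores
```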
Sigbjørn Skjæret
389e09c2f5
misc: Remove unnecessary metadata lookups (#1448)
Special tokens are already mapped from metadata by llama.cpp
2024-05-14 09:44:09 -04:00
dependabot[bot]
4b54f79330
chore(deps): bump pypa/cibuildwheel from 2.17.0 to 2.18.0 (#1453)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.18.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.17.0...v2.18.0)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-14 09:35:52 -04:00
Andrei Betlen
50f5c74ecf Update llama.cpp 2024-05-14 09:30:04 -04:00
Andrei Betlen
43ba1526c8 feat: Update llama.cpp 2024-05-13 09:39:08 -04:00
Andrei Betlen
3f8e17af63 fix(ci): Use version without extra platform tag in pep503 index 2024-05-12 11:45:55 -04:00
Andrei Betlen
3c19faa0d4 chore: Bump version 2024-05-12 10:32:52 -04:00
Andrei Betlen
3fe8e9a8f3 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-12 10:30:24 -04:00
Andrei Betlen
9dc5e20fb6 feat: Update llama.cpp 2024-05-12 10:30:23 -04:00
Peng Yu
1547202b77
docs: Fix typo in README.md (#1444) 2024-05-10 10:35:51 -04:00
Andrei Betlen
7f59856fa6 fix: Enable CUDA backend for llava. Closes #1324 2024-05-10 10:18:47 -04:00