Andrei Betlen
4c1d74c0ae
fix: Make destructor automatically call .close() method on Llama class.
2024-06-19 10:07:20 -04:00
Andrei Betlen
f4491c4903
feat: Update llama.cpp
2024-06-17 11:56:03 -04:00
Andrei Betlen
8401c6f2d1
feat: Update llama.cpp
2024-06-13 11:31:31 -04:00
Olivier DEBAUCHE
9e396b3ebd
feat: Update workflows and pre-built wheels ( #1416 )
...
* Update build-wheels-cuda.yaml
* Update build-wheels-cuda.yaml
* revert
* Bump python from 3.8 to 3.9
* Remove python 3.8
* Remove Python 3.7 and 3.8 deprecated
* Bump python from 3.8 to 3.9
* Add python 3.9
* Add python 3.9, remove macos-11 deprecated, add macos-14
* Bump python 3.8 to 3.9
* Add python 3.13
* Add python 3.13
* python 3.13 remove
* remove python 3.13
* remove python 3.8
* Bump macos-13 to macos-14
* Update build-wheels-metal.yaml
* Update build-wheels-metal.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
* Update generate-index-from-release.yaml
Add avx, avx2 and avx512
* Update test.yaml
* Update test-pypi.yaml
* Update publish.yaml
* Update publish-to-test.yaml
* Update build-wheels-cuda.yaml
Cuda with AVX2 by default
* Update build-wheels-cuda.yaml
* remove DEPRECATED 32 bits
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
Upgrade matrix os to latest version
* Update build-wheels-metal.yaml
* Update build-wheels-cuda.yaml
* Update test.yaml
* Update test-pypi.yaml
* Update test.yaml
Add cache: 'pip'
* Update publish-to-test.yaml
* Update build-wheels-metal.yaml
Add cache: 'pip'
* Update build-wheels-cuda.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
remove x86_64
* Update build-wheels-metal.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
* Update build-wheels-metal.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
* revert
* Remove cpu variants
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 10:19:57 -04:00
dependabot[bot]
5af81634cb
chore(deps): bump pypa/cibuildwheel from 2.18.1 to 2.19.0 ( #1522 )
...
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel ) from 2.18.1 to 2.19.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases )
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md )
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.18.1...v2.19.0 )
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 10:12:02 -04:00
Junpei Kawamoto
320a5d7ea5
feat: Add .close() method to Llama class to explicitly free model from memory ( #1513 )
...
* feat: add explicit methods to free model
This commit introduces a `close` method to both `Llama` and `_LlamaModel`,
allowing users to explicitly free the model from RAM/VRAM.
The previous implementation relied on the destructor of `_LlamaModel` to free
the model. However, in Python, the timing of destructor calls is unclear—for
instance, the `del` statement does not guarantee immediate invocation of the
destructor.
This commit provides an explicit method to release the model, which works
immediately and allows the user to load another model without memory issues.
Additionally, this commit implements a context manager in the `Llama` class,
enabling the automatic closure of the `Llama` object when used with the `with`
statement.
* feat: Implement ContextManager in _LlamaModel, _LlamaContext, and _LlamaBatch
This commit enables automatic resource management by
implementing the `ContextManager` protocol in `_LlamaModel`,
`_LlamaContext`, and `_LlamaBatch`. This ensures that
resources are properly managed and released within a `with`
statement, enhancing robustness and safety in resource handling.
* feat: add ExitStack for Llama's internal class closure
This update implements ExitStack to manage and close internal
classes in Llama, enhancing efficient and safe resource
management.
* Use contextlib ExitStack and closing
* Explicitly free model when closing resources on server
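The pattern described in the bullets above (an explicit `close`, context-manager support, and an `ExitStack` that owns the internal objects) can be sketched as follows. This is a minimal illustration, not the code from this commit: `FakeModel` and `FakeLlama` are hypothetical stand-ins, and only `contextlib.ExitStack` and `contextlib.closing` are real standard-library names.

```python
import contextlib


class FakeModel:
    """Hypothetical stand-in for a native model handle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.freed = False

    def close(self) -> None:
        # A real implementation would release RAM/VRAM here.
        self.freed = True


class FakeLlama:
    """Hypothetical wrapper that owns its internals through an ExitStack."""

    def __init__(self, name: str) -> None:
        self._stack = contextlib.ExitStack()
        # closing() arranges for model.close() to run when the stack unwinds.
        self._model = self._stack.enter_context(
            contextlib.closing(FakeModel(name))
        )

    def close(self) -> None:
        # ExitStack.close() is safe to call more than once.
        self._stack.close()

    def __enter__(self) -> "FakeLlama":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Returning None lets any exception from the with-body propagate.
        self.close()
```

With this shape, `with FakeLlama("a") as llm: ...` releases the model as soon as the block exits, instead of whenever the garbage collector happens to run the destructor.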
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 04:16:14 -04:00
Sigbjørn Skjæret
dbcf64cf07
feat: Support SPM infill ( #1492 )
...
* Support SPM infill
* typo--
* one less layer of parenthesis necessary
* new required internals
* manually add bos/eos if model requires it
* add bos even when unknown
This is identical behaviour to llama.cpp
I guess any model that doesn't use BOS is recent enough to have the add_bos_token metadata.
* don't add bos/eos on non-infill pre-tokenized prompt
* add tokenizer hack to remove leading space in suffix
* I keep forgetting metadata are strings
* check if bos exists
* add example
* add cls/sep instead of bos/eos for WPM vocab
* simplify
* color-code filtered suffix
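As a rough sketch of what the SPM option changes: infill prompts are commonly assembled in prefix/suffix/middle (PSM) order, and SPM places the suffix segment first. The function below, its parameter names, and the token IDs in the usage note are all hypothetical; the exact layout used by this commit (including BOS/EOS or CLS/SEP handling) is not reproduced here.

```python
from typing import List


def build_infill_tokens(
    prefix: List[int],
    suffix: List[int],
    *,
    prefix_id: int,
    suffix_id: int,
    middle_id: int,
    spm_infill: bool = False,
) -> List[int]:
    """Assemble an infill prompt in PSM or SPM order (illustrative only)."""
    # Each segment is introduced by its own (assumed) special token.
    prefix_part = [prefix_id] + prefix
    suffix_part = [suffix_id] + suffix
    if spm_infill:
        ordered = suffix_part + prefix_part  # suffix, prefix, middle
    else:
        ordered = prefix_part + suffix_part  # prefix, suffix, middle
    # The middle token asks the model to generate the missing span.
    return ordered + [middle_id]
```

For example, with made-up special-token IDs 100, 101, and 102, `build_infill_tokens([1, 2], [3], prefix_id=100, suffix_id=101, middle_id=102, spm_infill=True)` puts the suffix segment ahead of the prefix segment.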
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 03:45:24 -04:00
Andrei Betlen
e342161371
feat: Update llama.cpp
2024-06-13 03:38:11 -04:00
Andrei Betlen
86a38ad4a0
chore: Bump version
2024-06-10 11:14:33 -04:00
Andrei Betlen
1615eb9e5b
feat: Update llama.cpp
2024-06-10 11:05:45 -04:00
Andrei Betlen
83d6b26e6f
feat: Update llama.cpp
2024-06-08 23:14:22 -04:00
Andrei Betlen
255e1b4495
feat: Update llama.cpp
2024-06-07 02:02:12 -04:00
nullname
d634efcdd9
feat: adding rpc_servers parameter to Llama class ( #1477 )
...
* passthru rpc_servers params
wip
* enable llama rpc by default
* convert string to byte
* add rpc package
* Revert "enable llama rpc by default"
This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790.
* update readme
* Only set rpc_servers when provided
* Add rpc servers to server options
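Two of the bullets above ("convert string to byte" and "Only set rpc_servers when provided") amount to a small pass-through step, sketched here under assumptions. `ModelParamsSketch` and `apply_rpc_servers` are hypothetical names, the address in the usage note is made up, and the real parameter struct is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelParamsSketch:
    """Hypothetical stand-in for the native model-parameter struct."""

    rpc_servers: Optional[bytes] = None


def apply_rpc_servers(
    params: ModelParamsSketch, rpc_servers: Optional[str]
) -> ModelParamsSketch:
    # Leave the default untouched unless the caller supplied a value.
    if rpc_servers is not None:
        # Native string fields expect bytes, so encode the Python str.
        params.rpc_servers = rpc_servers.encode("utf-8")
    return params
```

Calling `apply_rpc_servers(ModelParamsSketch(), "127.0.0.1:50052")` stores the encoded address, while passing `None` leaves the field at its default.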
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:38:21 -04:00
Asghar Ghorbani
6e0642ca19
fix: fix logprobs when BOS is not present ( #1471 )
...
* Fix logprobs when BOS is not present
* Fix logprobs when bos is not available
2024-06-04 10:18:38 -04:00
Sigbjørn Skjæret
027f7bc678
fix: Avoid duplicate special tokens in chat formats ( #1439 )
...
* Templates sometimes have BOS in them, remove duplicate
* tokenize chat format prompts before completion
This is to ensure that we don't duplicate any special tokens.
Hopefully I amended the existing formats correctly?
* updated comment
* corrected a few
* add some missing internals
* proper bos/eos detection
* just let tokenizer do the job
* typo--
* align test with new response
* changed to a warning
* move to another PR
* Use python warnings module
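The duplicate-token check with a warning, as described in the bullets above, might look roughly like this. It is a sketch under assumptions: `warn_on_duplicate_bos` is a hypothetical helper and the message text is invented; only the standard `warnings` module is real.

```python
import warnings
from typing import List


def warn_on_duplicate_bos(tokens: List[int], bos_token_id: int) -> bool:
    """Warn if a tokenized prompt starts with two BOS tokens (illustrative)."""
    # A template that already contains BOS plus a tokenizer that adds one
    # yields two leading BOS tokens.
    if tokens[:2] == [bos_token_id, bos_token_id]:
        warnings.warn(
            "Detected duplicate leading BOS token in prompt; "
            "this may degrade output quality.",
            RuntimeWarning,
        )
        return True
    return False
```

Using a warning rather than an exception keeps existing prompts working while still surfacing the problem to the caller.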
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:15:41 -04:00
Andrei Betlen
951e39caf9
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-06-04 00:49:26 -04:00
Andrei Betlen
c3ef41ba06
chore: Bump version
2024-06-04 00:49:24 -04:00
Engininja2
ae5682f500
fix: Disable Windows+CUDA workaround when compiling for HIPBLAS ( #1493 )
...
* Disable Windows+CUDA workaround when compiling for HIPBLAS
* fix spacing
* change condition to check for Windows & CUDA
Co-authored-by: Andrei <abetlen@gmail.com>
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-06-04 00:42:34 -04:00
Andrei Betlen
cd3f1bb387
feat: Update llama.cpp
2024-06-04 00:35:47 -04:00
Andrei Betlen
6b018e00b1
misc: Improve llava error messages
2024-06-03 11:19:10 -04:00
Andrei Betlen
a6457ba74b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-06-01 18:10:13 -04:00
Andrei Betlen
af3ed503e9
fix: Use numpy recarray for candidates data, fixes bug with temp < 0
2024-06-01 18:09:24 -04:00
Andrei Betlen
165b4dc6c1
fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488
2024-05-29 02:29:44 -04:00
Andrei Betlen
91d05aba46
fix: adjust kv_override member names to match llama.cpp
2024-05-29 02:28:58 -04:00
Andrei Betlen
df45a4b3fe
fix: fix string value kv_overrides. Closes #1487
2024-05-29 02:02:22 -04:00
Andrei Betlen
10b7c50cd2
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-28 22:52:30 -04:00
Andrei Betlen
2907c26906
misc: Update debug build to keep all debug symbols for easier gdb debugging
2024-05-28 22:52:28 -04:00
Andrei Betlen
c26004b1be
feat: Update llama.cpp
2024-05-28 22:52:03 -04:00
dependabot[bot]
c564007ff6
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 ( #1472 )
...
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-27 10:57:17 -04:00
Andrei Betlen
454c9bb1cb
feat: Update llama.cpp
2024-05-27 10:51:57 -04:00
Andrei Betlen
2d89964147
docs: Fix table formatting
2024-05-24 11:55:41 -04:00
Andrei Betlen
9e8d7d55bd
fix(docs): Fix link typo
2024-05-24 11:55:01 -04:00
Andrei Betlen
ec43e8920f
docs: Update multi-modal model section
2024-05-24 11:54:15 -04:00
Andrei Betlen
a4c9ab885d
chore: Bump version
2024-05-24 01:59:25 -04:00
Linghan Zhong
5cae1040e3
feat: Improve Llama.eval performance by avoiding list conversion ( #1476 )
...
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-24 01:49:44 -04:00
Andrei Betlen
087cc0b036
feat: Update llama.cpp
2024-05-24 01:43:36 -04:00
Andrei Betlen
5a595f035a
feat: Update llama.cpp
2024-05-22 02:40:31 -04:00
Andrei Betlen
3dbfec74e7
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-18 01:19:20 -04:00
Andrei Betlen
d8a3b013c3
feat: Update llama.cpp
2024-05-18 01:19:19 -04:00
Radoslav Gerganov
03f171e810
example: LLM inference with Ray Serve ( #1465 )
2024-05-17 13:27:26 -04:00
Andrei Betlen
b564d05806
chore: Bump version
2024-05-16 00:41:21 -04:00
Andrei Betlen
d99a6ba607
fix: segfault for models without eos / bos tokens. Closes #1463
2024-05-16 00:37:27 -04:00
Andrei Betlen
e811a81066
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-15 23:59:18 -04:00
Andrei Betlen
ca8e3c967d
feat: Update llama.cpp
2024-05-15 23:59:17 -04:00
twaka
5212fb08ae
feat: add MinTokensLogitsProcessor and min_tokens argument to server ( #1333 )
...
* implement min_tokens
* set default to 0
* pass min_tokens
* fix
* remove copy
* implement MinTokensLogitsProcessor
* format
* fix condition
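A minimum-token constraint of the kind listed above is usually enforced by masking the end-of-sequence logit until enough tokens have been generated. The sketch below illustrates that idea with plain Python lists; `MinTokensSketch` is a hypothetical class, not the processor added in this commit, and its call signature is an assumption.

```python
import math
from typing import List, Optional


class MinTokensSketch:
    """Suppress EOS until at least `min_tokens` tokens are generated."""

    def __init__(self, min_tokens: int, eos_token_id: int) -> None:
        self.min_tokens = min_tokens
        self.eos_token_id = eos_token_id
        self.prompt_len: Optional[int] = None

    def __call__(self, input_ids: List[int], scores: List[float]) -> List[float]:
        # The first call sees only the prompt; remember its length.
        if self.prompt_len is None:
            self.prompt_len = len(input_ids)
        generated = len(input_ids) - self.prompt_len
        if generated < self.min_tokens:
            # Modify in place (no copy): EOS can no longer be sampled.
            scores[self.eos_token_id] = -math.inf
        return scores
```

With `min_tokens=0` the processor never alters the scores, which is consistent with the default mentioned in the bullets.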
2024-05-14 09:50:53 -04:00
Sigbjørn Skjæret
389e09c2f5
misc: Remove unnecessary metadata lookups ( #1448 )
...
Special tokens are already mapped from metadata by llama.cpp
2024-05-14 09:44:09 -04:00
dependabot[bot]
4b54f79330
chore(deps): bump pypa/cibuildwheel from 2.17.0 to 2.18.0 ( #1453 )
...
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel ) from 2.17.0 to 2.18.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases )
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md )
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.17.0...v2.18.0 )
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-14 09:35:52 -04:00
Andrei Betlen
50f5c74ecf
Update llama.cpp
2024-05-14 09:30:04 -04:00
Andrei Betlen
43ba1526c8
feat: Update llama.cpp
2024-05-13 09:39:08 -04:00
Andrei Betlen
3f8e17af63
fix(ci): Use version without extra platform tag in pep503 index
2024-05-12 11:45:55 -04:00