Commit graph

1821 commits

Author SHA1 Message Date
e9b337b312
Merge https://github.com/abetlen/llama-cpp-python 2024-06-25 06:55:33 +05:30
Andrei Betlen
04959f1884 feat: Update llama_cpp.py bindings 2024-06-21 16:56:15 -04:00
dependabot[bot]
35c980eb2e
chore(deps): bump pypa/cibuildwheel from 2.18.1 to 2.19.1 (#1527)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.18.1 to 2.19.1.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.18.1...v2.19.1)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-21 12:10:43 -04:00
dependabot[bot]
398fe81547
chore(deps): bump docker/build-push-action from 5 to 6 (#1539)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 5 to 6.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v5...v6)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-21 12:10:34 -04:00
Jon Craton
27d53589ff
docs: Update readme examples to use newer Qwen2 model (#1544) 2024-06-21 12:10:15 -04:00
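
This commit only touches the README; as a hedged illustration of the kind of example it updates (the repo id and quant filename below are assumptions, not taken from the diff):

```python
from llama_cpp import Llama

# Pull a Qwen2 GGUF from the Hugging Face Hub and run a chat completion.
# Repo id and quantization filename are illustrative assumptions.
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2-0.5B-Instruct-GGUF",
    filename="*q8_0.gguf",
    verbose=False,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Name the planets in the solar system."}]
)
print(out["choices"][0]["message"]["content"])
```
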
Andrei Betlen
5beec1a1fd feat: Update llama.cpp 2024-06-21 12:09:14 -04:00
Andrei Betlen
d98a24a25b docs: Remove references to deprecated opencl backend. Closes #1512 2024-06-20 10:50:40 -04:00
Andrei Betlen
6c331909ca chore: Bump version 2024-06-19 10:10:01 -04:00
Andrei Betlen
554fd08e7d feat: Update llama.cpp 2024-06-19 10:07:28 -04:00
Andrei Betlen
4c1d74c0ae fix: Make destructor automatically call .close() method on Llama class. 2024-06-19 10:07:20 -04:00
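
A minimal sketch of the pattern this one-line fix describes (simplified; the real class tracks several internal model/context handles):

```python
class Llama:
    # Simplified sketch; the actual class manages model, context, and batch
    # resources behind close().
    def close(self) -> None:
        """Explicitly free the model from RAM/VRAM (idempotent)."""
        ...

    def __del__(self) -> None:
        # The destructor delegates to close() so resources are still freed
        # when the object is garbage collected without an explicit call.
        self.close()
```
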
Andrei Betlen
f4491c4903 feat: Update llama.cpp 2024-06-17 11:56:03 -04:00
5f5ea0a49c
Merge https://github.com/abetlen/llama-cpp-python 2024-06-15 10:16:33 +05:30
Andrei Betlen
8401c6f2d1 feat: Update llama.cpp 2024-06-13 11:31:31 -04:00
Olivier DEBAUCHE
9e396b3ebd
feat: Update workflows and pre-built wheels (#1416)
* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* revert

* Bump python from 3.8 to 3.9

* Remove python 3.8

* Remove deprecated Python 3.7 and 3.8

* Bump python from 3.8 to 3.9

* Add python 3.9

* Add python 3.9, remove deprecated macos-11, add macos-14

* Bump python 3.8 to 3.9

* Add python 3.13

* Add python 3.13

* python 3.13 remove

* remove python 3.13

* remove python 3.8

* Bump macos-13 to macos-14

* Update build-wheels-metal.yaml

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update generate-index-from-release.yaml

Add avx, avx2 and avx512

* Update test.yaml

* Update test-pypi.yaml

* Update publish.yaml

* Update publish-to-test.yaml

* Update build-wheels-cuda.yaml

Cuda with AVX2 by default

* Update build-wheels-cuda.yaml

* remove deprecated 32-bit builds

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

Upgrade matrix OS to latest version

* Update build-wheels-metal.yaml

* Update build-wheels-cuda.yaml

* Update test.yaml

* Update test-pypi.yaml

* Update test.yaml

Add cache: 'pip'

* Update publish-to-test.yaml

* Update build-wheels-metal.yaml

Add cache: 'pip'

* Update build-wheels-cuda.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

remove x86_64

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* revert

* Remove cpu variants

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 10:19:57 -04:00
dependabot[bot]
5af81634cb
chore(deps): bump pypa/cibuildwheel from 2.18.1 to 2.19.0 (#1522)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.18.1 to 2.19.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.18.1...v2.19.0)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 10:12:02 -04:00
Junpei Kawamoto
320a5d7ea5
feat: Add .close() method to Llama class to explicitly free model from memory (#1513)
* feat: add explicit methods to free model

This commit introduces a `close` method to both `Llama` and `_LlamaModel`,
allowing users to explicitly free the model from RAM/VRAM.

The previous implementation relied on the destructor of `_LlamaModel` to free
the model. However, in Python, the timing of destructor calls is unclear—for
instance, the `del` statement does not guarantee immediate invocation of the
destructor.

This commit provides an explicit method to release the model, which works
immediately and allows the user to load another model without memory issues.

Additionally, this commit implements a context manager in the `Llama` class,
enabling the automatic closure of the `Llama` object when used with the `with`
statement.

* feat: Implement ContextManager in _LlamaModel, _LlamaContext, and _LlamaBatch

This commit enables automatic resource management by
implementing the `ContextManager` protocol in `_LlamaModel`,
`_LlamaContext`, and `_LlamaBatch`. This ensures that
resources are properly managed and released within a `with`
statement, enhancing robustness and safety in resource handling.

* feat: add ExitStack for Llama's internal class closure

This update uses an ExitStack to manage and close Llama's internal
objects, making resource management more efficient and safe.

* Use contextlib ExitStack and closing

* Explicitly free model when closing resources on server

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 04:16:14 -04:00
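
Based on the PR description above, a short usage sketch of the two release patterns it adds (model paths are placeholders):

```python
from llama_cpp import Llama

# Explicit release: free the model from RAM/VRAM immediately instead of
# waiting on the garbage collector, then load another model safely.
llm = Llama(model_path="./model-a.gguf")
print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])
llm.close()

# Context-manager form added by the same PR: close() runs automatically.
with Llama(model_path="./model-b.gguf") as llm:
    print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```
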
Sigbjørn Skjæret
dbcf64cf07
feat: Support SPM infill (#1492)
* Support SPM infill

* typo--

* one less layer of parentheses necessary

* new required internals

* manually add bos/eos if model requires it

* add bos even when unknown

This is identical behaviour to llama.cpp

I guess any model that doesn't use BOS is recent enough to have the add_bos_token metadata.

* don't add bos/eos on non-infill pre-tokenized prompt

* add tokenizer hack to remove leading space in suffix

* I keep forgetting metadata are strings

* check if bos exists

* add example

* add cls/sep instead of bos/eos for WPM vocab

* simplify

* color-code filtered suffix

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 03:45:24 -04:00
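
A hedged sketch of infill usage consistent with this PR (model path is a placeholder; the `suffix` parameter of `create_completion` carries the text after the insertion point):

```python
from llama_cpp import Llama

# Assumption for illustration: a code model whose vocabulary defines
# infill tokens (e.g. a CodeLlama-style GGUF with an SPM vocab).
llm = Llama(model_path="./code-model.Q4_K_M.gguf")

# Prefix goes in `prompt`, the text after the cursor in `suffix`; the
# library assembles the infill token sequence (and, per this PR, adds
# BOS/EOS or CLS/SEP as the vocabulary type requires).
out = llm.create_completion(
    prompt="def add(a, b):\n    ",
    suffix="\n\nprint(add(2, 3))\n",
    max_tokens=32,
)
print(out["choices"][0]["text"])
```
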
Andrei Betlen
e342161371 feat: Update llama.cpp 2024-06-13 03:38:11 -04:00
64058abaa0
Merge https://github.com/abetlen/llama-cpp-python 2024-06-12 06:54:43 +05:30
Andrei Betlen
86a38ad4a0 chore: Bump version 2024-06-10 11:14:33 -04:00
Andrei Betlen
1615eb9e5b feat: Update llama.cpp 2024-06-10 11:05:45 -04:00
c2e4d5820a
Merge https://github.com/abetlen/llama-cpp-python 2024-06-09 10:46:41 +05:30
Andrei Betlen
83d6b26e6f feat: Update llama.cpp 2024-06-08 23:14:22 -04:00
Andrei Betlen
255e1b4495 feat: Update llama.cpp 2024-06-07 02:02:12 -04:00
nullname
d634efcdd9
feat: Add rpc_servers parameter to Llama class (#1477)
* passthru rpc_servers params

wip

* enable llama rpc by default

* convert string to bytes

* add rpc package

* Revert "enable llama rpc by default"

This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790.

* update readme

* Only set rpc_servers when provided

* Add rpc servers to server options

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:38:21 -04:00
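
From the PR title and bullet list, a hedged usage sketch (the comma-separated host:port format and addresses are assumptions; llama.cpp must be built with its RPC backend enabled):

```python
from llama_cpp import Llama

# rpc_servers is forwarded (encoded to bytes, per the "convert string to
# bytes" commit above) to llama.cpp, which offloads work to the listed
# RPC workers. Addresses below are placeholders.
llm = Llama(
    model_path="./model.gguf",
    rpc_servers="192.168.1.10:50052,192.168.1.11:50052",
)
```
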
Asghar Ghorbani
6e0642ca19
fix: fix logprobs when BOS is not present (#1471)
* Fix logprobs when BOS is not present

* Fix logprobs when BOS is not available
2024-06-04 10:18:38 -04:00
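
For context, a hedged sketch of the API path this fix touches (model path is a placeholder; `logits_all=True` is needed so per-token logits are retained):

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", logits_all=True)

# logprobs=N returns the top-N log-probabilities per generated token; the
# fix corrects the offset bookkeeping when the prompt has no BOS token.
out = llm.create_completion("Hello", max_tokens=4, logprobs=3)
print(out["choices"][0]["logprobs"]["top_logprobs"])
```
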
Sigbjørn Skjæret
027f7bc678
fix: Avoid duplicate special tokens in chat formats (#1439)
* Templates sometimes have BOS in them, remove duplicate

* tokenize chat format prompts before completion

This is to ensure that we don't duplicate any special tokens.

Hopefully I amended the existing formats correctly?

* updated comment

* corrected a few

* add some missing internals

* proper bos/eos detection

* just let tokenizer do the job

* typo--

* align test with new response

* changed to a warning

* move to another PR

* Use python warnings module

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:15:41 -04:00
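
An illustrative sketch of the duplication this PR guards against (the function and warning text are hypothetical; the real change lives in the chat-format handlers):

```python
import warnings

def tokenize_chat_prompt(llama, rendered_prompt: str) -> list[int]:
    # Chat templates sometimes already contain the BOS token text, so
    # tokenizing with add_bos=True would emit BOS twice.
    tokens = llama.tokenize(
        rendered_prompt.encode("utf-8"), add_bos=True, special=True
    )
    if len(tokens) > 1 and tokens[0] == llama.token_bos() == tokens[1]:
        warnings.warn("Duplicate leading BOS token in chat prompt.")
        tokens = tokens[1:]
    return tokens
```
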
Andrei Betlen
951e39caf9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-04 00:49:26 -04:00
Andrei Betlen
c3ef41ba06 chore: Bump version 2024-06-04 00:49:24 -04:00
Engininja2
ae5682f500
fix: Disable Windows+CUDA workaround when compiling for HIPBLAS (#1493)
* Disable Windows+CUDA workaround when compiling for HIPBLAS

* fix spacing

* change condition to check for Windows & CUDA

Co-authored-by: Andrei <abetlen@gmail.com>

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-06-04 00:42:34 -04:00
Andrei Betlen
cd3f1bb387 feat: Update llama.cpp 2024-06-04 00:35:47 -04:00
Andrei Betlen
6b018e00b1 misc: Improve llava error messages 2024-06-03 11:19:10 -04:00
Andrei Betlen
a6457ba74b Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-01 18:10:13 -04:00
Andrei Betlen
af3ed503e9 fix: Use numpy recarray for candidates data, fixes bug with temp < 0 2024-06-01 18:09:24 -04:00
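
A hedged sketch of the recarray technique named in the commit (field names mirror llama.cpp's `llama_token_data`: id, logit, p; dtype flags and sizes are assumptions):

```python
import numpy as np

# Keep sampling candidates in a structured array so whole columns can be
# updated in bulk, with a recarray view for attribute-style field access.
candidates_dtype = np.dtype(
    [("id", np.intc), ("logit", np.single), ("p", np.single)], align=True
)
n_vocab = 32000  # assumption: model vocabulary size
data = np.recarray(n_vocab, dtype=candidates_dtype)
data.id[:] = np.arange(n_vocab, dtype=np.intc)
data.logit[:] = 0.0
data.p[:] = 0.0
```
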
fc2af04c15
Merge https://github.com/abetlen/llama-cpp-python 2024-05-30 08:20:49 +05:30
Andrei Betlen
165b4dc6c1 fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488 2024-05-29 02:29:44 -04:00
Andrei Betlen
91d05aba46 fix: adjust kv_override member names to match llama.cpp 2024-05-29 02:28:58 -04:00
Andrei Betlen
df45a4b3fe fix: fix string value kv_overrides. Closes #1487 2024-05-29 02:02:22 -04:00
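
From the commit title, a hedged usage sketch (the metadata keys are illustrative examples, not taken from the commit):

```python
from llama_cpp import Llama

# kv_overrides replaces GGUF metadata at load time; it accepts bool, int,
# float, and (after this fix) str values.
llm = Llama(
    model_path="./model.gguf",
    kv_overrides={
        "tokenizer.ggml.add_bos_token": False,  # bool override
        "tokenizer.ggml.pre": "llama3",         # string override, the fixed path
    },
)
```
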
Andrei Betlen
10b7c50cd2 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-28 22:52:30 -04:00
Andrei Betlen
2907c26906 misc: Update debug build to keep all debug symbols for easier gdb debugging 2024-05-28 22:52:28 -04:00
Andrei Betlen
c26004b1be feat: Update llama.cpp 2024-05-28 22:52:03 -04:00
dependabot[bot]
c564007ff6
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 (#1472)
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-27 10:57:17 -04:00
Andrei Betlen
454c9bb1cb feat: Update llama.cpp 2024-05-27 10:51:57 -04:00
Andrei Betlen
2d89964147 docs: Fix table formatting 2024-05-24 11:55:41 -04:00
Andrei Betlen
9e8d7d55bd fix(docs): Fix link typo 2024-05-24 11:55:01 -04:00
Andrei Betlen
ec43e8920f docs: Update multi-modal model section 2024-05-24 11:54:15 -04:00
Andrei Betlen
a4c9ab885d chore: Bump version 2024-05-24 01:59:25 -04:00
Linghan Zhong
5cae1040e3
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-24 01:49:44 -04:00
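
A hedged, self-contained sketch of the optimization this PR names: copy the ctypes logits buffer into the score matrix through a numpy view instead of materializing a Python list (the buffer and sizes are stand-ins):

```python
import ctypes
import numpy as np

n_tokens, n_vocab = 4, 8
# Stand-in for the C float buffer that llama.cpp exposes after evaluation.
buf = (ctypes.c_float * (n_tokens * n_vocab))(*range(n_tokens * n_vocab))
scores = np.zeros((n_tokens, n_vocab), dtype=np.single)

# Slow path removed by the PR: every float round-trips through a Python object.
scores[:, :] = np.array(list(buf), dtype=np.single).reshape(n_tokens, n_vocab)

# Fast path: view the ctypes buffer as a numpy array and copy in bulk.
scores[:, :] = np.ctypeslib.as_array(buf).reshape(n_tokens, n_vocab)
```
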
Andrei Betlen
087cc0b036 feat: Update llama.cpp 2024-05-24 01:43:36 -04:00
Andrei Betlen
5a595f035a feat: Update llama.cpp 2024-05-22 02:40:31 -04:00