Andrei Betlen
554fd08e7d
feat: Update llama.cpp
2024-06-19 10:07:28 -04:00
Andrei Betlen
4c1d74c0ae
fix: Make destructor automatically call .close() method on Llama class.
2024-06-19 10:07:20 -04:00
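A minimal sketch of the pattern this commit describes (the real class holds more state; the method bodies here are illustrative):

```python
class Llama:
    def close(self) -> None:
        # release the model and context; safe to call more than once
        ...

    def __del__(self) -> None:
        # cleanup still happens even if the caller never calls close()
        self.close()
```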
Andrei Betlen
f4491c4903
feat: Update llama.cpp
2024-06-17 11:56:03 -04:00
Andrei Betlen
8401c6f2d1
feat: Update llama.cpp
2024-06-13 11:31:31 -04:00
Olivier DEBAUCHE
9e396b3ebd
feat: Update workflows and pre-built wheels (#1416)
* Update build-wheels-cuda.yaml
* Update build-wheels-cuda.yaml
* revert
* Bump python from 3.8 to 3.9
* Remove python 3.8
* Remove deprecated Python 3.7 and 3.8
* Bump python from 3.8 to 3.9
* Add python 3.9
* Add python 3.9, remove deprecated macos-11, add macos-14
* Bump python 3.8 to 3.9
* Add python 3.13
* Add python 3.13
* python 3.13 remove
* remove python 3.13
* remove python 3.8
* Bump macos-13 to macos-14
* Update build-wheels-metal.yaml
* Update build-wheels-metal.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
* Update generate-index-from-release.yaml
Add avx, avx2 and avx512
* Update test.yaml
* Update test-pypi.yaml
* Update publish.yaml
* Update publish-to-test.yaml
* Update build-wheels-cuda.yaml
Cuda with AVX2 by default
* Update build-wheels-cuda.yaml
* remove deprecated 32-bit builds
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
Upgrade matrix os to latest version
* Update build-wheels-metal.yaml
* Update build-wheels-cuda.yaml
* Update test.yaml
* Update test-pypi.yaml
* Update test.yaml
Add cache: 'pip'
* Update publish-to-test.yaml
* Update build-wheels-metal.yaml
Add cache: 'pip'
* Update build-wheels-cuda.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
remove x86_64
* Update build-wheels-metal.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
* Update build-wheels-metal.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-and-release.yaml
* Update build-wheels-metal.yaml
* revert
* Remove cpu variants
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 10:19:57 -04:00
dependabot[bot]
5af81634cb
chore(deps): bump pypa/cibuildwheel from 2.18.1 to 2.19.0 (#1522)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.18.1 to 2.19.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.18.1...v2.19.0)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-13 10:12:02 -04:00
Junpei Kawamoto
320a5d7ea5
feat: Add .close() method to Llama class to explicitly free model from memory (#1513)
* feat: add explicit methods to free model
This commit introduces a `close` method to both `Llama` and `_LlamaModel`,
allowing users to explicitly free the model from RAM/VRAM.
The previous implementation relied on the destructor of `_LlamaModel` to free
the model. However, in Python, the timing of destructor calls is unclear—for
instance, the `del` statement does not guarantee immediate invocation of the
destructor.
This commit provides an explicit method to release the model, which works
immediately and allows the user to load another model without memory issues.
Additionally, this commit implements a context manager in the `Llama` class,
enabling the automatic closure of the `Llama` object when used with the `with`
statement.
* feat: Implement ContextManager in _LlamaModel, _LlamaContext, and _LlamaBatch
This commit enables automatic resource management by
implementing the `ContextManager` protocol in `_LlamaModel`,
`_LlamaContext`, and `_LlamaBatch`. This ensures that
resources are properly managed and released within a `with`
statement, enhancing robustness and safety in resource handling.
* feat: add ExitStack for Llama's internal class closure
This update implements ExitStack to manage and close internal
classes in Llama, enhancing efficient and safe resource
management.
* Use contextlib ExitStack and closing
* Explicitly free model when closing resources on server
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 04:16:14 -04:00
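A hedged usage sketch of the API this commit adds (the model path and prompt are illustrative):

```python
from llama_cpp import Llama

# explicit release: free the model from RAM/VRAM as soon as you are done
llm = Llama(model_path="model.gguf")
print(llm("Q: What is 2+2? A:", max_tokens=8))
llm.close()

# or rely on the context manager to close automatically
with Llama(model_path="model.gguf") as llm:
    print(llm("Q: What is 2+2? A:", max_tokens=8))
```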
Sigbjørn Skjæret
dbcf64cf07
feat: Support SPM infill (#1492)
* Support SPM infill
* typo--
* one less layer of parenthesis necessary
* new required internals
* manually add bos/eos if model requires it
* add bos even when unknown
This is identical behaviour to llama.cpp
I guess any model that doesn't use BOS is recent enough to have the add_bos_token metadata.
* don't add bos/eos on non-infill pre-tokenized prompt
* add tokenizer hack to remove leading space in suffix
* I keep forgetting metadata are strings
* check if bos exists
* add example
* add cls/sep instead of bos/eos for WPM vocab
* simplify
* color-code filtered suffix
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 03:45:24 -04:00
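For orientation, a hedged sketch of the two infill orderings; the <PRE>/<SUF>/<MID> sentinel strings follow the CodeLlama convention and are illustrative only, since the commit resolves each model's own special tokens:

```python
def psm_prompt(prefix: str, suffix: str) -> str:
    # prefix-suffix-middle: the usual infill layout
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


def spm_prompt(prefix: str, suffix: str) -> str:
    # suffix-prefix-middle: the prefix comes last, so the generated
    # middle continues it directly
    return f"<PRE> <SUF>{suffix} <MID> {prefix}"


print(spm_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))"))
```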
Andrei Betlen
e342161371
feat: Update llama.cpp
2024-06-13 03:38:11 -04:00
Andrei Betlen
86a38ad4a0
chore: Bump version
2024-06-10 11:14:33 -04:00
Andrei Betlen
1615eb9e5b
feat: Update llama.cpp
2024-06-10 11:05:45 -04:00
Andrei Betlen
83d6b26e6f
feat: Update llama.cpp
2024-06-08 23:14:22 -04:00
Andrei Betlen
255e1b4495
feat: Update llama.cpp
2024-06-07 02:02:12 -04:00
nullname
d634efcdd9
feat: adding rpc_servers parameter to Llama class (#1477)
* passthru rpc_servers params
wip
* enable llama rpc by default
* convert string to byte
* add rpc package
* Revert "enable llama rpc by default"
This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790.
* update readme
* Only set rpc_servers when provided
* Add rpc servers to server options
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:38:21 -04:00
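A hedged usage sketch of the new parameter (hosts and ports are illustrative):

```python
from llama_cpp import Llama

# rpc_servers is a comma-separated list of llama.cpp RPC worker addresses
llm = Llama(
    model_path="model.gguf",
    rpc_servers="192.168.1.10:50052,192.168.1.11:50052",
)
```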
Asghar Ghorbani
6e0642ca19
fix: fix logprobs when BOS is not present (#1471)
* Fix logprobs when BOS is not present
* Fix logprobs when bos is not available
2024-06-04 10:18:38 -04:00
Sigbjørn Skjæret
027f7bc678
fix: Avoid duplicate special tokens in chat formats (#1439)
* Templates sometimes have BOS in them, remove duplicate
* tokenize chat format prompts before completion
This is to ensure that we don't duplicate any special tokens.
Hopefully I amended the existing formats correctly?
* updated comment
* corrected a few
* add some missing internals
* proper bos/eos detection
* just let tokenizer do the job
* typo--
* align test with new response
* changed to a warning
* move to another PR
* Use python warnings module
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:15:41 -04:00
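The core idea as a hedged sketch: skip the tokenizer's automatic BOS when the rendered chat template already begins with one (the template and BOS strings are illustrative):

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")
prompt = "<s>[INST] Hello [/INST]"  # template output that already has BOS
bos = "<s>"

tokens = llm.tokenize(
    prompt.encode("utf-8"),
    add_bos=not prompt.startswith(bos),  # avoid a duplicate BOS token
    special=True,  # parse special tokens embedded in the template
)
```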
Andrei Betlen
951e39caf9
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-06-04 00:49:26 -04:00
Andrei Betlen
c3ef41ba06
chore: Bump version
2024-06-04 00:49:24 -04:00
Engininja2
ae5682f500
fix: Disable Windows+CUDA workaround when compiling for HIPBLAS (#1493)
...
* Disable Windows+CUDA workaround when compiling for HIPBLAS
* fix spacing
* change condition to check for Windows & CUDA
Co-authored-by: Andrei <abetlen@gmail.com>
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-06-04 00:42:34 -04:00
Andrei Betlen
cd3f1bb387
feat: Update llama.cpp
2024-06-04 00:35:47 -04:00
Andrei Betlen
6b018e00b1
misc: Improve llava error messages
2024-06-03 11:19:10 -04:00
Andrei Betlen
a6457ba74b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-06-01 18:10:13 -04:00
Andrei Betlen
af3ed503e9
fix: Use numpy recarray for candidates data, fixes bug with temp < 0
2024-06-01 18:09:24 -04:00
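A hedged sketch of the data layout: llama.cpp's llama_token_data carries (id, logit, p), and a numpy recarray lets all three fields be filled with vectorized writes rather than per-element copies:

```python
import numpy as np

n_vocab = 32000
candidates_dtype = np.dtype(
    [("id", np.intc), ("logit", np.single), ("p", np.single)]
)
candidates = np.recarray(n_vocab, dtype=candidates_dtype)
candidates.id[:] = np.arange(n_vocab, dtype=np.intc)
candidates.logit[:] = np.random.randn(n_vocab).astype(np.single)
candidates.p[:] = 0.0  # filled in later by the sampler
```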
Andrei Betlen
165b4dc6c1
fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488
2024-05-29 02:29:44 -04:00
Andrei Betlen
91d05aba46
fix: adjust kv_override member names to match llama.cpp
2024-05-29 02:28:58 -04:00
Andrei Betlen
df45a4b3fe
fix: fix string value kv_overrides. Closes #1487
2024-05-29 02:02:22 -04:00
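A hedged usage sketch; the key and value are illustrative, and the point is that string values now work alongside bool/int/float overrides:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",
    kv_overrides={"tokenizer.ggml.pre": "llama3"},  # string-valued override
)
```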
Andrei Betlen
10b7c50cd2
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-28 22:52:30 -04:00
Andrei Betlen
2907c26906
misc: Update debug build to keep all debug symbols for easier gdb debugging
2024-05-28 22:52:28 -04:00
Andrei Betlen
c26004b1be
feat: Update llama.cpp
2024-05-28 22:52:03 -04:00
dependabot[bot]
c564007ff6
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 (#1472)
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-27 10:57:17 -04:00
Andrei Betlen
454c9bb1cb
feat: Update llama.cpp
2024-05-27 10:51:57 -04:00
Andrei Betlen
2d89964147
docs: Fix table formatting
2024-05-24 11:55:41 -04:00
Andrei Betlen
9e8d7d55bd
fix(docs): Fix link typo
2024-05-24 11:55:01 -04:00
Andrei Betlen
ec43e8920f
docs: Update multi-modal model section
2024-05-24 11:54:15 -04:00
Andrei Betlen
a4c9ab885d
chore: Bump version
2024-05-24 01:59:25 -04:00
Linghan Zhong
5cae1040e3
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-24 01:49:44 -04:00
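A hedged sketch of the effect: assigning a numpy array directly into the preallocated scores buffer avoids the per-element cost of a Python-list round trip (shapes are illustrative):

```python
import numpy as np

n_vocab, n_tokens, n_past = 32000, 8, 0
scores = np.zeros((1024, n_vocab), dtype=np.single)
new_logits = np.random.randn(n_tokens * n_vocab).astype(np.single)

# slower: converting to a Python list boxes every float
scores[n_past : n_past + n_tokens, :].reshape(-1)[:] = list(new_logits)

# faster: a direct array-to-array copy stays in C
scores[n_past : n_past + n_tokens, :].reshape(-1)[:] = new_logits
```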
Andrei Betlen
087cc0b036
feat: Update llama.cpp
2024-05-24 01:43:36 -04:00
Andrei Betlen
5a595f035a
feat: Update llama.cpp
2024-05-22 02:40:31 -04:00
Andrei Betlen
3dbfec74e7
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-18 01:19:20 -04:00
Andrei Betlen
d8a3b013c3
feat: Update llama.cpp
2024-05-18 01:19:19 -04:00
Radoslav Gerganov
03f171e810
example: LLM inference with Ray Serve (#1465)
2024-05-17 13:27:26 -04:00
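A hedged sketch of the shape of such an example (deployment name, model path, and request schema are illustrative, not the exact code in the repo):

```python
from llama_cpp import Llama
from ray import serve
from starlette.requests import Request


@serve.deployment
class LlamaDeployment:
    def __init__(self, model_path: str):
        self._llm = Llama(model_path=model_path)

    async def __call__(self, request: Request) -> dict:
        body = await request.json()
        # returns the usual OpenAI-style completion dict
        return self._llm(body["prompt"], max_tokens=body.get("max_tokens", 64))


app = LlamaDeployment.bind("model.gguf")
# serve.run(app) then POST {"prompt": "..."} to http://127.0.0.1:8000/
```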
Andrei Betlen
b564d05806
chore: Bump version
2024-05-16 00:41:21 -04:00
Andrei Betlen
d99a6ba607
fix: segfault for models without eos / bos tokens. Closes #1463
2024-05-16 00:37:27 -04:00
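A hedged sketch of the kind of guard involved: llama.cpp reports a missing special token as -1, and using it unchecked is what could crash (the surrounding code is illustrative):

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # path illustrative

tokens = llm.tokenize(b"hello", add_bos=False)
bos = llm.token_bos()
if bos != -1:  # some models define no BOS token at all
    tokens = [bos] + tokens
```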
Andrei Betlen
e811a81066
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-15 23:59:18 -04:00
Andrei Betlen
ca8e3c967d
feat: Update llama.cpp
2024-05-15 23:59:17 -04:00
twaka
5212fb08ae
feat: add MinTokensLogitsProcessor and min_tokens argument to server (#1333)
* implement min_tokens
* set default to 0
* pass min_tokens
* fix
* remove copy
* implement MinTokensLogitsProcessor
* format
* fix condition
2024-05-14 09:50:53 -04:00
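A sketch of the processor described above: suppress EOS until at least min_tokens completion tokens have been generated (the exact implementation in the PR may differ in detail):

```python
from typing import List, Optional

import numpy as np


class MinTokensLogitsProcessor:
    """Forbid EOS until min_tokens completion tokens exist."""

    def __init__(self, min_tokens: int, token_eos: int):
        self.min_tokens = min_tokens
        self.token_eos = token_eos
        self.prompt_tokens: Optional[int] = None

    def __call__(self, input_ids: List[int], scores: np.ndarray) -> np.ndarray:
        # the first call sees only the prompt; remember its length
        if self.prompt_tokens is None:
            self.prompt_tokens = len(input_ids)
        if len(input_ids) - self.prompt_tokens < self.min_tokens:
            scores[self.token_eos] = -np.inf  # EOS cannot be sampled yet
        return scores
```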
Sigbjørn Skjæret
389e09c2f5
misc: Remove unnecessary metadata lookups (#1448)
Special tokens are already mapped from metadata by llama.cpp
2024-05-14 09:44:09 -04:00
dependabot[bot]
4b54f79330
chore(deps): bump pypa/cibuildwheel from 2.17.0 to 2.18.0 (#1453)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.18.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.17.0...v2.18.0)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-14 09:35:52 -04:00
Andrei Betlen
50f5c74ecf
Update llama.cpp
2024-05-14 09:30:04 -04:00
Andrei Betlen
43ba1526c8
feat: Update llama.cpp
2024-05-13 09:39:08 -04:00