Commit graph

1774 commits

Author SHA1 Message Date
Andrei Betlen
83d6b26e6f feat: Update llama.cpp 2024-06-08 23:14:22 -04:00
Andrei Betlen
255e1b4495 feat: Update llama.cpp 2024-06-07 02:02:12 -04:00
nullname
d634efcdd9
feat: add rpc_servers parameter to Llama class (#1477)
* passthru rpc_servers params

wip

* enable llama rpc by default

* convert string to byte

* add rpc package

* Revert "enable llama rpc by default"

This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790.

* update readme

* Only set rpc_servers when provided

* Add rpc servers to server options

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:38:21 -04:00
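The PR above threads an `rpc_servers` value through to llama.cpp, with commit steps "convert string to byte" and "Only set rpc_servers when provided". A minimal sketch of that conversion is below; the parameter name comes from the PR title, but the helper function itself is a hypothetical illustration, not the library's actual code:

```python
# Hypothetical sketch of the rpc_servers pass-through described in PR #1477.
# A comma-separated "host:port,host:port" string is encoded to the UTF-8
# bytes a C-level API would expect, and left unset when not provided.
from typing import Optional

def encode_rpc_servers(rpc_servers: Optional[str]) -> Optional[bytes]:
    if not rpc_servers:  # "Only set rpc_servers when provided"
        return None
    return rpc_servers.encode("utf-8")
```

Per the PR, the value would then be passed as something like `Llama(model_path=..., rpc_servers="localhost:50052")`, assuming llama.cpp was built with RPC support.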
Asghar Ghorbani
6e0642ca19
fix: fix logprobs when BOS is not present (#1471)
* Fix logprobs when BOS is not present

* Fix logprobs when bos is not available
2024-06-04 10:18:38 -04:00
Sigbjørn Skjæret
027f7bc678
fix: Avoid duplicate special tokens in chat formats (#1439)
* Templates sometimes have BOS in them, remove duplicate

* tokenize chat format prompts before completion

This is to ensure that we don't duplicate any special tokens.

Hopefully I amended the existing formats correctly?

* updated comment

* corrected a few

* add some missing internals

* proper bos/eos detection

* just let tokenizer do the job

* typo--

* align test with new response

* changed to a warning

* move to another PR

* Use python warnings module

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:15:41 -04:00
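The commit trail above ("Templates sometimes have BOS in them, remove duplicate", "just let tokenizer do the job", "Use python warnings module") describes stripping a doubled BOS token when a chat template already contains one. A toy sketch of the idea, with a made-up token id; in the real fix the llama.cpp tokenizer decides whether to add BOS:

```python
# Illustrative sketch of the duplicate-BOS problem fixed in PR #1439.
# The token id is hypothetical; llama-cpp-python relies on the tokenizer
# itself rather than a helper like this.
import warnings

BOS_ID = 1  # hypothetical BOS token id

def dedupe_leading_bos(tokens: list) -> list:
    """If the rendered template already began with BOS and the tokenizer
    prepended another, keep only one and warn (the PR emits a warning)."""
    if len(tokens) >= 2 and tokens[0] == BOS_ID and tokens[1] == BOS_ID:
        warnings.warn("Duplicate BOS token detected in chat prompt")
        return tokens[1:]
    return tokens
```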
Andrei Betlen
951e39caf9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-04 00:49:26 -04:00
Andrei Betlen
c3ef41ba06 chore: Bump version 2024-06-04 00:49:24 -04:00
Engininja2
ae5682f500
fix: Disable Windows+CUDA workaround when compiling for HIPBLAS (#1493)
* Disable Windows+CUDA workaround when compiling for HIPBLAS

* fix spacing

* change condition to check for Windows & CUDA

Co-authored-by: Andrei <abetlen@gmail.com>

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-06-04 00:42:34 -04:00
Andrei Betlen
cd3f1bb387 feat: Update llama.cpp 2024-06-04 00:35:47 -04:00
Andrei Betlen
6b018e00b1 misc: Improve llava error messages 2024-06-03 11:19:10 -04:00
Andrei Betlen
a6457ba74b Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-01 18:10:13 -04:00
Andrei Betlen
af3ed503e9 fix: Use numpy recarray for candidates data, fixes bug with temp < 0 2024-06-01 18:09:24 -04:00
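The fix above stores sampling candidates in a numpy recarray. A small sketch of what that data layout looks like, mirroring llama.cpp's `llama_token_data` struct (fields `id`, `logit`, `p`); the vocabulary size and values here are toy examples:

```python
# Sketch of candidate token data held in a numpy recarray: the three
# fields stay interleaved in a single buffer, matching the C struct
# layout, instead of living in separate Python lists or arrays.
import numpy as np

n_vocab = 4  # toy vocabulary size for illustration
candidates = np.recarray(
    (n_vocab,),
    dtype=np.dtype(
        [("id", np.intc), ("logit", np.single), ("p", np.single)], align=True
    ),
)
candidates.id[:] = np.arange(n_vocab, dtype=np.intc)
candidates.logit[:] = [0.1, 2.0, -1.0, 0.5]
candidates.p[:] = 0.0
```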
Andrei Betlen
165b4dc6c1 fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488 2024-05-29 02:29:44 -04:00
Andrei Betlen
91d05aba46 fix: adjust kv_override member names to match llama.cpp 2024-05-29 02:28:58 -04:00
Andrei Betlen
df45a4b3fe fix: fix string value kv_overrides. Closes #1487 2024-05-29 02:02:22 -04:00
Andrei Betlen
10b7c50cd2 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-28 22:52:30 -04:00
Andrei Betlen
2907c26906 misc: Update debug build to keep all debug symbols for easier gdb debugging 2024-05-28 22:52:28 -04:00
Andrei Betlen
c26004b1be feat: Update llama.cpp 2024-05-28 22:52:03 -04:00
dependabot[bot]
c564007ff6
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 (#1472)
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-27 10:57:17 -04:00
Andrei Betlen
454c9bb1cb feat: Update llama.cpp 2024-05-27 10:51:57 -04:00
Andrei Betlen
2d89964147 docs: Fix table formatting 2024-05-24 11:55:41 -04:00
Andrei Betlen
9e8d7d55bd fix(docs): Fix link typo 2024-05-24 11:55:01 -04:00
Andrei Betlen
ec43e8920f docs: Update multi-modal model section 2024-05-24 11:54:15 -04:00
Andrei Betlen
a4c9ab885d chore: Bump version 2024-05-24 01:59:25 -04:00
Linghan Zhong
5cae1040e3
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-24 01:49:44 -04:00
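The performance PR above avoids round-tripping logits through a Python list when filling the scores buffer. A before/after sketch of that kind of hotspot; the buffer names and shapes are assumptions for illustration, not the library's actual code:

```python
# Illustrative version of the list-conversion hotspot PR #1476 removes.
import numpy as np

n_tokens, n_vocab = 8, 320
scores = np.zeros((n_tokens, n_vocab), dtype=np.single)
logits_src = np.arange(n_tokens * n_vocab, dtype=np.single)

# slower: every float round-trips through a Python list object
# scores[:, :] = np.array(logits_src.tolist(), dtype=np.single).reshape(n_tokens, n_vocab)

# faster: reshape the existing buffer and copy directly
scores[:, :] = logits_src.reshape(n_tokens, n_vocab)
```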
Andrei Betlen
087cc0b036 feat: Update llama.cpp 2024-05-24 01:43:36 -04:00
Andrei Betlen
5a595f035a feat: Update llama.cpp 2024-05-22 02:40:31 -04:00
Andrei Betlen
3dbfec74e7 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-18 01:19:20 -04:00
Andrei Betlen
d8a3b013c3 feat: Update llama.cpp 2024-05-18 01:19:19 -04:00
Radoslav Gerganov
03f171e810
example: LLM inference with Ray Serve (#1465) 2024-05-17 13:27:26 -04:00
Andrei Betlen
b564d05806 chore: Bump version 2024-05-16 00:41:21 -04:00
Andrei Betlen
d99a6ba607 fix: segfault for models without eos / bos tokens. Closes #1463 2024-05-16 00:37:27 -04:00
Andrei Betlen
e811a81066 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-15 23:59:18 -04:00
Andrei Betlen
ca8e3c967d feat: Update llama.cpp 2024-05-15 23:59:17 -04:00
twaka
5212fb08ae
feat: add MinTokensLogitsProcessor and min_tokens argument to server (#1333)
* implement min_tokens

* set default to 0

* pass min_tokens

* fix

* remove copy

* implement MinTokensLogitsProcessor

* format

* fix condition
2024-05-14 09:50:53 -04:00
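The commits above implement a logits processor that enforces a minimum completion length. A minimal sketch of the idea: until `min_tokens` tokens have been generated, force the EOS logit to negative infinity so sampling cannot end the completion early. The `(input_ids, scores)` interface follows the usual logits-processor convention; the real class in the library is `MinTokensLogitsProcessor`, but this list-based version is a simplified illustration:

```python
# Sketch of a min-tokens logits processor in the spirit of PR #1333.
class MinTokensSketchProcessor:
    def __init__(self, min_tokens: int, token_eos: int):
        self.min_tokens = min_tokens
        self.token_eos = token_eos
        self.prompt_tokens = None

    def __call__(self, input_ids, scores):
        # Remember the prompt length on the first call, then suppress EOS
        # until at least min_tokens completion tokens have been produced.
        if self.prompt_tokens is None:
            self.prompt_tokens = len(input_ids)
        if len(input_ids) - self.prompt_tokens < self.min_tokens:
            scores = list(scores)
            scores[self.token_eos] = float("-inf")
        return scores
```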
Sigbjørn Skjæret
389e09c2f5
misc: Remove unnecessary metadata lookups (#1448)
Special tokens are already mapped from metadata by llama.cpp
2024-05-14 09:44:09 -04:00
dependabot[bot]
4b54f79330
chore(deps): bump pypa/cibuildwheel from 2.17.0 to 2.18.0 (#1453)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.18.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.17.0...v2.18.0)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-14 09:35:52 -04:00
Andrei Betlen
50f5c74ecf Update llama.cpp 2024-05-14 09:30:04 -04:00
Andrei Betlen
43ba1526c8 feat: Update llama.cpp 2024-05-13 09:39:08 -04:00
Andrei Betlen
3f8e17af63 fix(ci): Use version without extra platform tag in pep503 index 2024-05-12 11:45:55 -04:00
Andrei Betlen
3c19faa0d4 chore: Bump version 2024-05-12 10:32:52 -04:00
Andrei Betlen
3fe8e9a8f3 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-12 10:30:24 -04:00
Andrei Betlen
9dc5e20fb6 feat: Update llama.cpp 2024-05-12 10:30:23 -04:00
Peng Yu
1547202b77
docs: Fix typo in README.md (#1444) 2024-05-10 10:35:51 -04:00
Andrei Betlen
7f59856fa6 fix: Enable CUDA backend for llava. Closes #1324 2024-05-10 10:18:47 -04:00
Andrei Betlen
73165021bb chore: Bump version 2024-05-10 09:44:18 -04:00
Andrei Betlen
eafb6ec5e8 feat: Update llama.cpp 2024-05-10 08:39:55 -04:00
Andrei Betlen
ac55d0a175 fix: Clear kv cache to avoid kv bug when image is evaluated first 2024-05-10 02:38:10 -04:00
Andrei Betlen
4badac3a60 chore: Bump version 2024-05-10 00:56:19 -04:00
Sigbjørn Skjæret
561e880654
fix(security): Render all jinja templates in immutable sandbox (#1441)
Chat templates are rendered with ImmutableSandboxedEnvironment in transformers so no need to do otherwise here.

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:49:40 -04:00
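The security fix above renders chat templates with Jinja2's immutable sandbox, matching what transformers does. A small sketch, assuming `jinja2` is installed; the template string is a toy example, not a real model's chat template:

```python
# Rendering a chat-style template inside Jinja2's immutable sandbox,
# as PR #1441 does: the sandbox blocks attribute mutation and other
# unsafe operations a hostile template might attempt.
from jinja2.sandbox import ImmutableSandboxedEnvironment

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(
    "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
)
prompt = template.render(messages=[{"role": "user", "content": "hi"}])
```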