Commit graph

1774 commits

Author SHA1 Message Date
Andrei Betlen
83d6b26e6f feat: Update llama.cpp 2024-06-08 23:14:22 -04:00
Andrei Betlen
255e1b4495 feat: Update llama.cpp 2024-06-07 02:02:12 -04:00
nullname
d634efcdd9
feat: add rpc_servers parameter to Llama class (#1477)
* passthru rpc_servers params

wip

* enable llama rpc by default

* convert string to byte

* add rpc package

* Revert "enable llama rpc by default"

This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790.

* update readme

* Only set rpc_servers when provided

* Add rpc servers to server options

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:38:21 -04:00
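The PR above threads an `rpc_servers` value through to llama.cpp, with commit steps "convert string to byte" and "Only set rpc_servers when provided". A minimal sketch of that conversion is below; the parameter name comes from the PR title, but the helper function itself is a hypothetical illustration, not the library's actual code:

```python
# Hypothetical sketch of the rpc_servers pass-through described in PR #1477.
# A comma-separated "host:port,host:port" string is encoded to the UTF-8
# bytes a C-level API would expect, and left unset when not provided.
from typing import Optional

def encode_rpc_servers(rpc_servers: Optional[str]) -> Optional[bytes]:
    if not rpc_servers:  # "Only set rpc_servers when provided"
        return None
    return rpc_servers.encode("utf-8")
```

Per the PR, the value would then be passed as something like `Llama(model_path=..., rpc_servers="localhost:50052")`, assuming llama.cpp was built with RPC support.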
Asghar Ghorbani
6e0642ca19
fix: fix logprobs when BOS is not present (#1471)
* Fix logprobs when BOS is not present

* Fix logprobs when bos is not available
2024-06-04 10:18:38 -04:00
Sigbjørn Skjæret
027f7bc678
fix: Avoid duplicate special tokens in chat formats (#1439)
* Templates sometimes have BOS in them, remove duplicate

* tokenize chat format prompts before completion

This is to ensure that we don't duplicate any special tokens.

Hopefully I amended the existing formats correctly?

* updated comment

* corrected a few

* add some missing internals

* proper bos/eos detection

* just let tokenizer do the job

* typo--

* align test with new response

* changed to a warning

* move to another PR

* Use python warnings module

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:15:41 -04:00
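The commit trail above ("Templates sometimes have BOS in them, remove duplicate", "just let tokenizer do the job", "Use python warnings module") describes stripping a doubled BOS token when a chat template already contains one. A toy sketch of the idea, with a made-up token id; in the real fix the llama.cpp tokenizer decides whether to add BOS:

```python
# Illustrative sketch of the duplicate-BOS problem fixed in PR #1439.
# The token id is hypothetical; llama-cpp-python relies on the tokenizer
# itself rather than a helper like this.
import warnings

BOS_ID = 1  # hypothetical BOS token id

def dedupe_leading_bos(tokens: list) -> list:
    """If the rendered template already began with BOS and the tokenizer
    prepended another, keep only one and warn (the PR emits a warning)."""
    if len(tokens) >= 2 and tokens[0] == BOS_ID and tokens[1] == BOS_ID:
        warnings.warn("Duplicate BOS token detected in chat prompt")
        return tokens[1:]
    return tokens
```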
Andrei Betlen
951e39caf9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-04 00:49:26 -04:00
Andrei Betlen
c3ef41ba06 chore: Bump version 2024-06-04 00:49:24 -04:00
Engininja2
ae5682f500
fix: Disable Windows+CUDA workaround when compiling for HIPBLAS (#1493)
* Disable Windows+CUDA workaround when compiling for HIPBLAS

* fix spacing

* change condition to check for Windows & CUDA

Co-authored-by: Andrei <abetlen@gmail.com>

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-06-04 00:42:34 -04:00
Andrei Betlen
cd3f1bb387 feat: Update llama.cpp 2024-06-04 00:35:47 -04:00
Andrei Betlen
6b018e00b1 misc: Improve llava error messages 2024-06-03 11:19:10 -04:00
Andrei Betlen
a6457ba74b Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-01 18:10:13 -04:00
Andrei Betlen
af3ed503e9 fix: Use numpy recarray for candidates data, fixes bug with temp < 0 2024-06-01 18:09:24 -04:00
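The fix above stores sampling candidates in a numpy recarray. A small sketch of what that data layout looks like, mirroring llama.cpp's `llama_token_data` struct (fields `id`, `logit`, `p`); the vocabulary size and values here are toy examples:

```python
# Sketch of candidate token data held in a numpy recarray: the three
# fields stay interleaved in a single buffer, matching the C struct
# layout, instead of living in separate Python lists or arrays.
import numpy as np

n_vocab = 4  # toy vocabulary size for illustration
candidates = np.recarray(
    (n_vocab,),
    dtype=np.dtype(
        [("id", np.intc), ("logit", np.single), ("p", np.single)], align=True
    ),
)
candidates.id[:] = np.arange(n_vocab, dtype=np.intc)
candidates.logit[:] = [0.1, 2.0, -1.0, 0.5]
candidates.p[:] = 0.0
```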
Andrei Betlen
165b4dc6c1 fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488 2024-05-29 02:29:44 -04:00
Andrei Betlen
91d05aba46 fix: adjust kv_override member names to match llama.cpp 2024-05-29 02:28:58 -04:00
Andrei Betlen
df45a4b3fe fix: fix string value kv_overrides. Closes #1487 2024-05-29 02:02:22 -04:00
Andrei Betlen
10b7c50cd2 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-28 22:52:30 -04:00
Andrei Betlen
2907c26906 misc: Update debug build to keep all debug symbols for easier gdb debugging 2024-05-28 22:52:28 -04:00
Andrei Betlen
c26004b1be feat: Update llama.cpp 2024-05-28 22:52:03 -04:00
dependabot[bot]
c564007ff6
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 (#1472)
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-27 10:57:17 -04:00
Andrei Betlen
454c9bb1cb feat: Update llama.cpp 2024-05-27 10:51:57 -04:00
Andrei Betlen
2d89964147 docs: Fix table formatting 2024-05-24 11:55:41 -04:00
Andrei Betlen
9e8d7d55bd fix(docs): Fix link typo 2024-05-24 11:55:01 -04:00
Andrei Betlen
ec43e8920f docs: Update multi-modal model section 2024-05-24 11:54:15 -04:00
Andrei Betlen
a4c9ab885d chore: Bump version 2024-05-24 01:59:25 -04:00
Linghan Zhong
5cae1040e3
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-24 01:49:44 -04:00
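The performance PR above avoids round-tripping logits through a Python list when filling the scores buffer. A before/after sketch of that kind of hotspot; the buffer names and shapes are assumptions for illustration, not the library's actual code:

```python
# Illustrative version of the list-conversion hotspot PR #1476 removes.
import numpy as np

n_tokens, n_vocab = 8, 320
scores = np.zeros((n_tokens, n_vocab), dtype=np.single)
logits_src = np.arange(n_tokens * n_vocab, dtype=np.single)

# slower: every float round-trips through a Python list object
# scores[:, :] = np.array(logits_src.tolist(), dtype=np.single).reshape(n_tokens, n_vocab)

# faster: reshape the existing buffer and copy directly
scores[:, :] = logits_src.reshape(n_tokens, n_vocab)
```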
Andrei Betlen
087cc0b036 feat: Update llama.cpp 2024-05-24 01:43:36 -04:00
Andrei Betlen
5a595f035a feat: Update llama.cpp 2024-05-22 02:40:31 -04:00
Andrei Betlen
3dbfec74e7 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-18 01:19:20 -04:00
Andrei Betlen
d8a3b013c3 feat: Update llama.cpp 2024-05-18 01:19:19 -04:00
Radoslav Gerganov
03f171e810
example: LLM inference with Ray Serve (#1465) 2024-05-17 13:27:26 -04:00
Andrei Betlen
b564d05806 chore: Bump version 2024-05-16 00:41:21 -04:00
Andrei Betlen
d99a6ba607 fix: segfault for models without eos / bos tokens. Closes #1463 2024-05-16 00:37:27 -04:00
Andrei Betlen
e811a81066 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-15 23:59:18 -04:00
Andrei Betlen
ca8e3c967d feat: Update llama.cpp 2024-05-15 23:59:17 -04:00
twaka
5212fb08ae
feat: add MinTokensLogitsProcessor and min_tokens argument to server (#1333)
* implement min_tokens

* set default to 0

* pass min_tokens

* fix

* remove copy

* implement MinTokensLogitsProcessor

* format

* fix condition
2024-05-14 09:50:53 -04:00
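The commits above implement a logits processor that enforces a minimum completion length. A minimal sketch of the idea: until `min_tokens` tokens have been generated, force the EOS logit to negative infinity so sampling cannot end the completion early. The `(input_ids, scores)` interface follows the usual logits-processor convention; the real class in the library is `MinTokensLogitsProcessor`, but this list-based version is a simplified illustration:

```python
# Sketch of a min-tokens logits processor in the spirit of PR #1333.
class MinTokensSketchProcessor:
    def __init__(self, min_tokens: int, token_eos: int):
        self.min_tokens = min_tokens
        self.token_eos = token_eos
        self.prompt_tokens = None

    def __call__(self, input_ids, scores):
        # Remember the prompt length on the first call, then suppress EOS
        # until at least min_tokens completion tokens have been produced.
        if self.prompt_tokens is None:
            self.prompt_tokens = len(input_ids)
        if len(input_ids) - self.prompt_tokens < self.min_tokens:
            scores = list(scores)
            scores[self.token_eos] = float("-inf")
        return scores
```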
Sigbjørn Skjæret
389e09c2f5
misc: Remove unnecessary metadata lookups (#1448)
Special tokens are already mapped from metadata by llama.cpp
2024-05-14 09:44:09 -04:00
dependabot[bot]
4b54f79330
chore(deps): bump pypa/cibuildwheel from 2.17.0 to 2.18.0 (#1453)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.18.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.17.0...v2.18.0)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-14 09:35:52 -04:00
Andrei Betlen
50f5c74ecf Update llama.cpp 2024-05-14 09:30:04 -04:00
Andrei Betlen
43ba1526c8 feat: Update llama.cpp 2024-05-13 09:39:08 -04:00
Andrei Betlen
3f8e17af63 fix(ci): Use version without extra platform tag in pep503 index 2024-05-12 11:45:55 -04:00
Andrei Betlen
3c19faa0d4 chore: Bump version 2024-05-12 10:32:52 -04:00
Andrei Betlen
3fe8e9a8f3 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-05-12 10:30:24 -04:00
Andrei Betlen
9dc5e20fb6 feat: Update llama.cpp 2024-05-12 10:30:23 -04:00
Peng Yu
1547202b77
docs: Fix typo in README.md (#1444) 2024-05-10 10:35:51 -04:00
Andrei Betlen
7f59856fa6 fix: Enable CUDA backend for llava. Closes #1324 2024-05-10 10:18:47 -04:00
Andrei Betlen
73165021bb chore: Bump version 2024-05-10 09:44:18 -04:00
Andrei Betlen
eafb6ec5e8 feat: Update llama.cpp 2024-05-10 08:39:55 -04:00
Andrei Betlen
ac55d0a175 fix: Clear kv cache to avoid kv bug when image is evaluated first 2024-05-10 02:38:10 -04:00
Andrei Betlen
4badac3a60 chore: Bump version 2024-05-10 00:56:19 -04:00
Sigbjørn Skjæret
561e880654
fix(security): Render all jinja templates in immutable sandbox (#1441)
Chat templates are rendered with ImmutableSandboxedEnvironment in transformers so no need to do otherwise here.

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:49:40 -04:00
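The security fix above renders chat templates with Jinja2's immutable sandbox, matching what transformers does. A small sketch, assuming `jinja2` is installed; the template string is a toy example, not a real model's chat template:

```python
# Rendering a chat-style template inside Jinja2's immutable sandbox,
# as PR #1441 does: the sandbox blocks attribute mutation and other
# unsafe operations a hostile template might attempt.
from jinja2.sandbox import ImmutableSandboxedEnvironment

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(
    "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
)
prompt = template.render(messages=[{"role": "user", "content": "hi"}])
```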